Transcripts from Open Science Summit 2010

My name is Michael .. I am a theoretical physicist and I am about to publish a
book on open science.

We're going to up the gain on this microphone. We'll get on that. How is it
here? Are we live there? Okay, just a second. Just to get.. Australians, my
name is Freder Nadegem. Council in the UK. I am interested in open science,
sharing science, processing, how to make the web suck less, how to do science
with others. I'm Victoria Stodden, I am a statistician at the university, and I
am going to talk about the credibility crisis in computational science and how
the solution is in this room, how open science will help it, increasing the
visibility and the path of science. I am Jason Hoyt at Mendeley, and I am here
to talk about what you want to talk about. You'll hear me in a few minutes. I
am Mike Gretes from Mind the Health Gap, on how open science and open
innovation can be applied to research and development. Thanks. It sounds like
it's much better.

We're going to do about 15 minutes of general Q&A and discussion among the
panelists here, and then we launch into our first presentation with Mike
Gretes, who is hereby on notice: be ready to start in 15 minutes. Why don't we
have the panelists open up for any general questions the audience would like
to ask. Can we start with a good definition of open science? I thought that
science was open by definition.

The "open" bit. The open bit means that it is available to anyone in the world
to do whatever they like with it, without any strings attached. I would agree
with that. It's pretty clear that we don't have that situation right now in
science. I have been thinking for some time that open science is not something
you want to define too precisely. It's a great banner, and a great rallying
point for people with quite different priorities. There are people who very
strongly advocate public access to the published peer-reviewed literature;
there are people who are strongly arguing for access to data, and that includes
the open government data movements; and then there are others who are
interested in improving the efficiency of the way that we do science. All of
these people might not agree on the priorities. We see some different
priorities in the open access movement. There are some differences in science.

Have you ever downloaded someone else's scripts without an open license, and
run them? There's the horror story of IP- it gets in the way, it's actually
illegal. This is just one issue that is pervasive and a real barrier to
sharing and openness in science. There are certainly horror stories in terms of
patents, and maybe I'll let other people talk about that. Here's a brief
personal story. I was involved in a project we were trying to get funded.
It was in nanotech and heart disease. It was important, it concerns people a
great deal. It was for treating the degradation of hearts in heart disease.
We had to have trials, we had to have patents; at the same time, we're talking
about putting nanoparticles into people. We need to have a lengthy, appropriate
conversation with a lot of the community about what kind of safeguards, and we
can't have that conversation until we've got the patent. The therapy- if it
works- it's not popular. These things feed on each other in different
directions, and they are not always good. They can be problems, personal
stories at least. So, I have a few. Hearsay, but it's true. One university in
the UK started putting its theses online, and one of the scientists discovered
that these theses- horror- were being read by other people, and some of it
might be patentable. People thought that theses are for obscuring science. So
now the university has put an embargo on theses so nobody can read any of
them. We have to educate people on best practices. I did a survey of the
machine learning community: what was preventing them from putting code and
data online, and what enabled them if they did it? A number of senior
professors said that they wouldn't put the code up, because they would patent
it and do a startup, and the scientific community doesn't get it. At Mendeley,
we've had a couple of people contact us, shocked that their metadata ended up
in our search catalog. People wanted that taken out. That was to their own
detriment; now people can't discover it. It was quite weird. I can't even
think about the IP issues- we have people getting upset about sharing
publications because of these IPs, so we have a long way to go apparently.


I just wanted to contribute an example of a horror story. For the last 18
months, we've been doing a project on gene therapy. A number of scientists-
these are great scientists, they are bright- the majority of scientists get
sto.. they said nothing. This is a real problem, because we're going to break
this impasse of intellectual property dominating science. It's going to take
courage, it's going to take scientists... evidence of how their research is
being impacted. The evidence came from Peter Mac. This is a world renowned
cancer research institute in Melbourne. Publicly funded. For 2 years, research
on BRCA1 gene sequences was delayed because of the Australian and .. over who
actually could give permission to these guys to do this work. Eventually, the
research was granted, and it had been delayed, and the cost had tripled. Let me
also add one more comment. This idea that you need a patent, or that without a
patent you aren't going to get a license, that's just nonsense. It really is
nonsense. Until 1978, you couldn't even get a patent on this, and we have
medicines. By the way, the world's first antibiotic (penicillin) was developed
without patent protection. What happened before the patent system even
existed? Why do we have this innovation? It's not because of the patent
system. Scientists have to stop believing the nonsense of patent lawyers. They
have done a good job of making sure you understand this... that you need a
patent to successfully commercialize your research. You do not.

A very simple question: why now? Why is openness important now, but not in
1995, or 1865, or so on? The human race is growing up and this is an important
part of its future? Has that ever not been true? We haven't understood the
importance of openness until now. When I started 50 years ago, it wasn't an
issue; it was perhaps implicit then, and we've been through a period of
digital land grabs, so now we're realizing the price we're paying for the
digital land grabs. In the 1600s, the world, with Henry Oldenburg, went
through a similar process of openness in science. What's the jugular issue
today? Sorry, I don't want to push you too hard. I can take a stab at that. A
main theme in my research is that in the 1660s we had certain standards. So
why are we not adhering to them now? Until 20 years ago, when computation
became more pervasive, you had strict standards on what you would include so
that other scientists could replicate your experiments and verify your
results. When you introduce computation, the results are, um, so detailed and
complex in terms of the steps that you've taken- the invocation of scripts and
code, what parameters you used- that it doesn't fit into a section in a paper.
We've not embraced reproducibility... A lot of my work has been around
fostering reproducibility. We've had this since the 1660s; our tools have
changed. We need to recognize and adapt to this, to simplify something that we
already have. 400 years ago, the methods for disseminating science matched the
research output, and they did for 400 years. And now, as Victoria is alluding
to, we have so much data, so many IP issues, licensing etc., that to
disseminate all of this takes quite a bit more effort than it did 400 years
ago. I don't know if the debate over open science is any larger now, but it
certainly seems like it. We need more channels. I guess for me, the answer is
that the web has reached a stage of maturity where we can share more details
of what the process of science and the outputs of science are, and we haven't
used it as effectively in the past 5 years as we could have.. we've taken
traditional print-publication and previous.. and that has been about the end
of it.. we can do a lot better, we need to do a lot better. We have a lot more
scientists, we have a lot more people and a lot more projects. Actually being
able to facilitate effective communication between the right subsets of
people- it should have been done 5 years ago, but it should happen over the
next 5 years.

This isn't so much a question as a comment to spark further comments from the
panel. A lot of the arguments for and against open access to the literature-
which probably won't find much sympathy in this room- some of them seem
disproved by current existing counter-examples. If you have pre-publication
preprints, then supposedly you can't have a real publication because the
journals won't accept it, and there can't be peer review- and that's sort of
nonsensical. This is the norm in math and physics- my background is in math-
as soon as you have something that is reasonable, you put it up on the arXiv
preprint server, and you have published it and everyone can see that you got
there first. As for the peer review process, that's what happens when you
submit it to a journal. So it seems like a historical thing in the life
sciences; how is it that it keeps persisting despite clear evidence that
these arguments don't make sense?

Very quickly: it varies between domains. In math, computer science and
physics, preprints have flourished. If you publish a word in chemistry, the
American Chemical Society will not publish it. We have to change the way that
they look at the world. It will be tough. This is true in various areas. If
you let the publishers run things, you will end up with a lot of these
problems. If we run things, we can decide what the rules are. We are our own
worst enemies; we'll end up with incredible conservatism. We do this to
ourselves. You can see this very easily: go into the top ranking universities
and look at the people doing things that are radical and unusual. They are
people who are middle-aged, who have some sense of security, who are small in
number, not terribly successful- they got tenure. The people who are most
conservative are postdocs and tenure-track professors, who are desperately
doing whatever they can to fit into the mold to get the job. There's a whole
industry serving their conservatism.. because of the intense pressure. You
have to validate or take account of the value of the different types.. I think
that will change, but I think that's one reason.

I am interested in data that is inconclusive and in negative results. I am
moving from physics to bioinformatics.. er, something. A lot of data has
negative results. I can negotiate the process, but it puts up a barrier. Some
stuff is a secret- like bad drugs. Later on, as you put those drugs on the
market without explaining all the bad effects- maybe as a society we might
want to start proposing some rules on publishing this data. In big data sets
you can find random stuff in the data. There are scientists who are advocating
the idea of publishing failed experiments. It would be nice to have failed
results published, because at least 25% of the work in a lab is stuff that
they already did in the lab 5 years ago. There's a lot of work wasted. At
least get boring science published. Then we move to stuff that is exciting. I
put up 5 years of failed and boring PhD research, so it's up on the web to
download (applause). This is an extremely serious problem. Part of the open
science debate is this multiple comparisons problem: data mining across data
sets, then picking your one significant result, as if you had just tested for
that, and publishing it. When you are going to do a scientific experiment,
before you look at the data, write down the hypotheses, list them with a
timestamp on a website, and then verify the hypotheses. It's extremely
pervasive; I know just from doing statistical consulting in a number of
fields- I'm not going to point fingers. And a lot of, some kind of mechanism
to keep track of technical an.. broader issue. I think this is something where
openness- in terms of reproducibility through workflow tracking and data
probing software, and things that get the actual methods out into visibility
and at least out of the lab- could help. So something that could be
incorporated into this: recording your experiments and being able to
communicate them in a way that lets others figure out what's working and not
working.. and p-values. There's one other thing. The problem is that there are
no venues.. no, that's not it.. there are journals of negative results. How
many scientists in the room have had a bunch of results to publish, but hadn't
got the time to write the paper? There's about 3 hands. There's a real need to
make this work.. where we can record that research in a way that can be
published with a very easy press of a button.. writing a paper takes a long
time. This is a reason for open notebooks: not so that others can read them,
but so that we can build systems that make single-button publishing possible.
Maybe not into a journal, but at least it's there. For him, it's not worth his
time to write up those negative results. If the incentives were there, maybe
you could change his mind.. what incentives could be put in place to make this
happen?
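The multiple comparisons problem described above can be made concrete with a
small simulation (a hypothetical sketch, not anything presented at the summit:
the experiment counts and thresholds are invented for illustration). Run many
experiments where nothing real is going on, test each at the nominal 5% level,
and a handful will look "significant" by chance alone- exactly the results
that get cherry-picked and published.

```python
import random

random.seed(0)

# 100 "experiments" in which the null hypothesis is true by construction:
# each flips a fair coin 100 times and tests whether the heads rate
# deviates from 0.5 using a normal-approximation z-test.
n_experiments, n_flips = 100, 100
false_positives = 0
for _ in range(n_experiments):
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    # Under the null: mean = n*0.5, sd = sqrt(n*0.5*0.5)
    z = (heads - n_flips * 0.5) / (n_flips * 0.25) ** 0.5
    if abs(z) > 1.96:  # nominal two-sided 5% threshold
        false_positives += 1

# Roughly 5% of pure-noise experiments clear the bar; reporting only
# those, as if they were the sole hypotheses tested, is the problem
# the speaker describes.
print(false_positives)
```

Pre-registering the hypotheses with a timestamp, as suggested above, or
applying a multiple-testing correction (e.g. dividing the threshold by the
number of tests, Bonferroni-style), removes most of these spurious hits.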

I haven't heard the distinction between academic research that is publicly
funded- and under those circumstances, that data should be open- and the
derivatives of that science, like technology that is commercialized, which
permit the existence of intellectual property through copyrights or patent
rights. The other issue is quality control. Regarding the authenticity of
data: all this openness is making a mess of the signal to noise ratio. How do
you know what you have? I have an example from the business that I am in: the
mess made by irresponsible marketing of genetic analysis data, which I think
is a very serious setback both to the open science concept and to the
commercial development of what can come out of sequencing. I'll leave it to
you guys to argue about it.


There's an assumption about the information that is coming in from
researchers: that more stuff being published is bad because we'd have to read
all of it. That assumption is fundamentally flawed, and you don't need to look
very far to see why. You want more stuff published- even more rubbish
published- because then someone can build Google for science. We need to make
the discovery process better. We're never going back to reading the table of
contents. Filter failure- think of science in terms of filter failure; we
always assumed that we had to filter stuff before publishing. That's crap.
There's a moral argument for public availability of science. In a sense, the
more interesting question is: what is the business case for open approaches to
technology development? In which cases are they going to be successful, and
how can we build legal structures to support these open innovations and the
best commercial return on public investment?

I think I have a quick point. Tax payer rights: like, I am a tax payer, and I
paid to fund you as a scientist, so I want to read what you write. But it
isn't about the benefits to the tax payers. It's really about society as a
whole: we've decided to do this and give it back to society as a public good.
A problem with the tax payer argument is that not everyone is a tax payer..
what we've decided is that as a society we're going to do it for society's
benefit. The second point is about the quality of data and analyzing it: we're
going to have to use machines to help us. We publish PDFs and GIFs and stuff
which machines can't deal with. We're looking at semantic publishing of data
so that machines can referee those bits that machines are good at. The data
has to be fit for review by humans, and search by humans. In some other
disciplines this is already true- astronomers do it well. We do not put enough
effort into it, and we are going to have to.

I usually agree with everything Cameron says.. but not today. Many younger
scientists have become conservative in their behavior. But they don't start
out conservative. If you look at what 12 year olds are posting on Facebook,
they want to share everything, and within science this is also true. There are
many young scientists who really get what open science is about. Our culture
has bashed them over the head about why they shouldn't be open.. they read
this as part of the culture of science. What we need to do, as the "older"
crowd, as the older leaders of this community, is not just give examples, but
also try to solve those existential crises about why these people no longer
want to be open. That's the most important thing here: how to keep them being
open, like they want to be. I think it's the other way around.

I'll respond to that. I agree with you. That's why I used the term postdocs
instead of students. I agree with you. We need to do something radical with
the way we train young scientists, and be radical about the way to be radical.
A change of policy would definitely help with that. Speaking as someone who
left academia to pursue a career in science as a technologist, I am trying to
create the tools- not to shame scientists into putting their data out there,
but it's going to .. make their life more difficult not to do it, when they
see that their peers are promoting their data, and their peers are getting
recognized and they are not. I will hint later about how we are going to do
that.

This question about how you get more people into open science: some of it is
inertia, it's not malicious, it's quite hard. In a big genomics institute,
making sure that data gets out the door, and making sure there's metadata-
there are lots of funding requirements and stuff. We make the metadata a
requirement. Open access publishing requirements- that's something else we
require. We have to force it with an internal incentive, but there's also a
wider incentive about people keeping secrets. There's this U.S. Bayh-Dole Act.
Things that get developed in academia get translated into products. The fix
that was designed was to let grantees make money out of it, to incentivize
them to go through this. There's been lots of tech transfer, doubling their
costs, and a lot of patents which industry actually says they can't get
access to, because academics have inappropriate views of how much they're
worth.. so that's a big block. They will not do anything else.. so it looks
like. It's just the wrong term. The objective was to improve translation, and
they tried this mechanism, which made things worse. We've come across this
idea of openness instead: how about aggressive openness- keep the academics
pure- and that might be better for translation, instead of the Bayh-Dole Act.
Thank you so much for that. That was not planned. We had slides that we had to
get queued up. Cameron, please step down, and Mike Gretes, and Organ, and
Martha, and we're going to run through the ten minute presentations starting
with Mike.

Neglected R&D: How can open science bridge the health gap? Mike Gretes,
MindTheHealthGap.org

Okay, hi. Good evening. Thanks to the organizers for listening and setting
this up. From what I understand, there are two arguments: the ethical
dimension of why to do open science- because humanity has a right to the
fruits of scientific research- and the fact that we can make science better if
we have greater openness. I wanted to touch on these themes, and also kind of
give that moral, an .. as well.

Okay, so, my name is Mike Gretes. I am going to talk about neglected disease
R&D and how open science can help with that. What is the health gap? There are
differences in terms of life expectancy. In the dark green there, where most
of us live, you get to live to 70 or 80. You can expect half of that in other
parts of the world. Why is this? People haven't always lived to 80. If you
made it to the age of 33, like Jesus did in the year 0, you were actually
doing pretty well for that time. The situation around the world is still
trapped in the health conditions of hundreds of years ago. So why is that?
There are a lot of reasons for that- violence. Infectious disease is a huge
element of it. Hear me out: malaria, HIV, tuberculosis. The top ten killers of
people. Four of the top seven in this list are infectious diseases, including
TB, HIV, and brain infections. It's not necessarily "what doesn't kill you
makes you stronger." Look at the amount of suffering and disability caused by
infectious diseases- these are disability-adjusted life years. The same
pattern persists, and the same countries suffer from disability as well as
death. Parasitic infections contribute to this; they don't kill people, but
it's a huge amount of disability. It's billions of people. 2.5 billion people
affected by TB. And a billion and a half affected by other infections. It
becomes not only a technological question, but also a .. income.. purchasing
power. The countries that are sicker and die.. this isn't a surprise to
anyone, but the fact that it is a surprise shows how profound this question
should be. It still persists. So, I wanted to talk about a lot of diseases
that people haven't heard about. I am going to talk about one- whether HIV is
a neglected disease or not, 25 years ago it was definitely a disease. No cure.
HIV, you're going to die. For a while it was not recognized as a cause of
AIDS. From 1987 to 1994, you can see the number of people, and HIV just
continuing to climb. It's a lot more staggering if you compress the X axis,
and there's this trend, and nobody knew what was going to happen. And then the
turn-around, and what caused that to happen. This is Joseph. I won't tell you
where he is right now; he's in a health care facility. The health outcome he's
seeing right now is from HIV; it could have been anyone in the 1980s. But what
turned this around was anti-retroviral drugs and therapies, and these tailed
off the infections in the United States and much of the developed world. What
does that mean?

This is my neighbor. He has been living with HIV. He was dying of a cancer
caused by HIV/AIDS. In the 1990s, he heard about anti-retroviral therapy, and
he asked them to put him on it. He heard about it from his friends. He made a
full recovery; he's very healthy. In his worst state, he looked a lot like
Joseph did. So thanks to these drugs, he turned his life around. Joseph was
looking like that in the year 2000 in Haiti, though. There's this sort of
biomedical victory.. the great achievement of antiviral drugs.. but without
the social innovation to actually get these drugs to everyone, to most of the
world it's an incomplete victory. There's a whole story about that. Need...
smuggling drugs down to Haiti.. in 6 months, Joseph had recovered. This
matters to us because we're interested in doing research, and at least in
part.. drug development is a way of helping people in the immediate term. The
burden of disease is in the colored bars here. There's cancer, HIV/AIDS, TB,
malaria, and other diseases that line up. The number of stacked pills there
represents the number of countries... the rate of new drug development from
1975, over a 30 year period. There were no drugs whatsoever for HIV; now there
are many. The rate of drug development is far less than that for
cardiovascular disease. It falls along the line of rich and poor. We would
like to see the same number of drugs for most of these diseases. What are the
challenges for this? We had our PhD team here. What happens for those cancer
and cardiovascular disease drugs? We have patients like my neighbor John, and
then we have funders, like the Heart&Stroke Foundation, and a large profitable
industry that can invest a lot of resources and put it into the pipeline; a
lot of universities help out, and then there's MTHG. I'll skip over this. This
is what the pipeline looks like in detail. It costs a lot in detail. Why can't
we do drug development more cheaply? We should. It's important: in phase 3,
we're talking about 3,000 patients for several years, and that's hundreds of
millions of dollars per drug, and the estimate varies based on who is asking.
The public isn't going to pay that (?). If there really isn't money to pay for
it, even with small industry interest and a huge patient population, you're
not going to see anything for some time. The situation is changing. Linux. I
don't know what the open science mascot is, so I just used the Linux penguins.

thesynapticleap.org
tropicaldisease.org
sandler.ucsf.edu/lnf
rarediseases.
pd2.lilly.com
gsk.com
collaborativedrug.com
Medicines for Neglected Diseases (Boston)

info@mindthehealthgap.org



Two ideas for open science. Victoria Stodden. In a way that it will catch fire
across all scientific disciplines: open science as a movement, from my
perspective; reproducibility as a framing principle; and I'll also touch on
what I believe is a credibility crisis, at least in computational science,
based on the fact that we're not sharing data and scripts in the way that we
would be if we were following scientific principles. Code needs to be included
in this framework of open science, underscored by this concept of
reproducibility. My understanding- or one way to think of a movement- is that
it's something that is emerging across multiple disciplines. It's not just
happening in biology or crystallography; it's writ large. We have changing
communication modalities and pervasive computation, and that changes the type
of knowledge and the questions we can ask, and the different ways of using it.
There's also a cultural component. Data standards are being discussed in many
circles, along with what should accompany the publication of your paper. We
also had the opening discussion, where Tim mentioned how in the UK there are
data release plans, and the NSF has decided that data release plans will
accompany all grants starting in October, and that will be peer reviewed.
Another dimension of the cultural aspect is standards and expectations. There
are journals and so on, but the strongest incentive, as scientists, is: what
do our peers expect? And what are the standards in my local community? So my
thesis for this mini-talk is that our adaptation to the technology and the new
openness and sharing is not happening fast enough, and that is bringing about
a credibility crisis. In Climategate, there were many emails, but also some
documentation files and other pieces of code, from a university in the UK, one
of the premiere climate research schools. This was a failure of information
sharing: we didn't know how the results were being generated. It's not so much
that the scientists were being bad; we just wanted to know what was happening.
Something this week was .. some groundbreaking work on using genetic data to
know what drugs will best treat your cancer, and there are clinical trials at
Duke, and there's a lot of scrutiny as other scientists found mistakes. The
work was award-winning.. and the mistakes shed a lot of light on what may
actually be flawed science, so this scandal is ongoing. I don't think that
scandal is too strong a word.. what type of review does our work go under? And
what could be a foundation for safe clinical.. mistakes in publications.. So
what's with all these stories?
There's lots of risks .. it's a problem. This has actually started to seep into
.. an off-band .. lso what I think the solution to thisi s is this getting the
code and data out there that there is a way to reproduce the data, so that
other peopl can be shared.. under the published results at the time of
publication. Reproducibility. ................ cooking, cleaning, wriet up
results, and just build up the resultsi nto the public. So, um, I would like to
also argue that all of these aspects of what a scientist does, and there's deep
intellectual contributions. For example, in all of the knowledge of all of
these, it's all important for the replication of these results. Data filtering
is not trivial. This is something that is not only hard and complex to
replicate, but it can really impact the outcomes. Leaving a few observations
here and there, that dramatically changes the results. Data analysis, there's
um, typically in many cases, a large amount of intellectual capital in terms of
the statistical methods and the modeling that can embody many deep intellectual
contributions to science, so all this, the filtering and analysis and the
software necessary for replication, and it would be an oversight to leave these
out of the discussion when we're talking abuot transferring knowledge and so
on. Open code is as much an important part of this as much as open data, so it
must have an important role to play here. One thing that I have been working on
is something I called the Reproducible Research Standard. So I had a licensing
framework for code and data and for a published paper so that scientists could
attach this license, so here's one recommendation: all of the work can be
freely shared consistent with scientific norms and not in violation of
copyright, so my recommendation in brief is to attach an appropriate
attribution license to each component. Use my work however you like, just
attribute me, or put it into the public domain. This notion of the research
compendium that we're seeing discussed more and more.. there's a paper,
code, data. Tools that are developing rapidly in different areas and it's
exciting - making it easier for scientists to get the code and data into a
shared format so that others can verify the work. There's many more.
Publication is being assisted by Sweave, so when you are compiling your
published doc, it re-compiles your data and so on. There's also GenePattern.
Sharing software platforms: mloss.org, DANSE, Madagascar, Taverna,
Pegasus, Trident Workbench, Galaxy, Sumatra. These allow the community to do
this; each is very specialized as a platform, and is able to understand the data and use the
tools for the data. Madagascar is another platform for sharing in geophysics
and lots of workflow and tracking. We have .. this is her work. Pegasus.
Trident. Galaxy. It goes on and on. This will facilitate the openness of code
and data in terms of reproducibility. My final slide. Open code and data is a
unified principle which will allow us to do what we talked about in the very
beginning. Make it a movement that goes across all scientific fields.. we can
rely on the notion of reproducibility and reproducible research. This is
nothing new in science, it's just something we signed up for when we signed up
for being a scientist. We are not updating the social contract, what we're
doing is returning to the scientific method which has been around for hundreds
of years. (applause)
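The Sweave idea mentioned above, recompiling a paper's numbers from the data at publication time, can be sketched in miniature. This is a toy illustration in Python rather than R, with invented observations standing in for a study's data:

```python
import statistics
import string

# Hypothetical raw observations standing in for a study's data.
observations = [2.1, 2.5, 1.9, 2.4, 2.2]

# The "paper" is a template; reported numbers are never typed in by hand.
paper_template = string.Template(
    "We observed a mean response of $mean (n=$n, sd=$sd)."
)

def compile_paper(data):
    """Recompute every reported figure from the raw data on each compile,
    so the published numbers cannot drift from the analysis."""
    return paper_template.substitute(
        mean=round(statistics.mean(data), 2),
        sd=round(statistics.stdev(data), 2),
        n=len(data),
    )

print(compile_paper(observations))
```

If the data changes, recompiling the document regenerates every figure, which is the reproducibility property the talk argues for.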

Peter Murray-Rust. I am a chemist. I don't know about slides because I do not
do PowerPoint. If any of you have anything, here we are, right, you can type
murrayrust blog and you will see that. Click on various things as we go
through. My main method of presentation is flowerpoint. I am old enough to have
remembered the 60s and not to have been at Berkeley but it has made a huge
contribution to our culture. The Open Knowledge Foundation will adopt this as a
way of making my points. We have many different areas - maybe 50 - that come
under open, that relate to knowledge in general. If we can scroll down.. First
of all, my petals are going to talk about various aspects of openness. So I
will cover those things there, if you can go down to the second link, the open
knowledge definition. This is the most important thing in this. A piece of
knowledge is open if you are free to use, re-use and re-distribute it, subject
only to attribute and share-alike. That's a wonderfully powerful algorithm. If
you can do that, it's open. If not, it's not open according to this definition.
What the OKF has done, another picture, Panton Principles. It's a place called
a pub. It's 200 meters from the chemistry department where I work, and between
the pub and the chemistry lab is the Open Knowledge Foundation. Rufus has been
successful in getting people to work on this. A lot of this is about government,
public relations. How many people have written open source software? What about
open access papers? How many of them had a full CC-BY license? If they didn't,
they didn't work as open objects. CC-NC licenses cause more problems than they solve.
How many people have either published or have people in their group who have
published a digital thesis, not many, right? How many of those explicitly carry
the CC-BY license? That's an area where we have to work. Open Theses are a
part of what we're trying to set up in the Open Knowledge Foundation, made the
semantics available, LaTeX, Word, whatever they wrote it in, that would be
enormously helpful. The digital landgrab in theses is starting and we have to
stop it. There are many things we can do. There are two projects, and these
have been funded. okay, So, Open Bibliography and Open Citations. At the
moment, we're being governed by non-accountable proprietary organizations who
measure our scholarly worth by citations and metrics that they invent because
they are easy to manage and retain control of our scholarship. We can reclaim
that within a year or two, and gather all of our citation data, and
bibliographic data, and we can then, if we want to do metrics (I am not a
fan), be the ones doing them, and not some unaccountable body. Anyone can get
involved in Open Bibliography and Open Citations. The next is open data, and
the next is very straightforward. Jordan Hatcher, John Wilbanks from Science
Commons, that has shown that open data is complex. I think it's going to take
10 years. This is a group involved in the Panton Principles, I can't point to
them. Jenny Malone, Jenny is a student. The power of our students..
undergraduates are not held back by fear and conventions. She has done a
fantastic job in the Open Knowledge Foundation. Jordan, then Rufus, John
Wilbanks, Cameron, and me, and anyway, we came up with the Panton Principles,
so if you go back a slide, you will see the Panton Principles, and let's just deal
with the first one. Data related to public science should be explicitly placed
in the public domain. There are four principles to use when you publish data.
What came out of all of this work is that, one should use a license that
explicitly puts your data in the public domain - CC0, or PPL from the Open
Knowledge Foundation. So, the motto that I have brought to this, which I've
been using and which has been taken up by.. our general library in the UK, on
the reverse of the flower: reclaim our scholarship. That's a very simple idea,
one that's possible if enough people in the world look to reclaiming
scholarship, we can do it. There are many more difficult things that have been
done by concerted activists. We can bring back our scholarship where we control
it, and not others. I would like to thank the people on these projects, Open
Citations, and our funders and collaborators: JISC, who funds it, BioMed
Central who also sponsors this, Open .. Public Library of Science. (applause)

Biotorrents: a file sharing service for scientific data. Morgan Langille.
Here's a way to share your data right now. First, I'd like to acknowledge the
Moore Foundation and my supervisor, who let me take this tangent. You can send
me comments via twitter. I think we all agree that data is growing. We're
drowning in data, I hate that term. I'm going to throw you some terms. If we
want to continue to share open data and more openly, it should be simple and so
on. Three sort of personal challenges on a day to day basis, this is why I
built biotorrents. I want download speed and reliability; I just want to grab
some data, and it's annoying that it takes 3 days before I can get it. I want
to share all data associated with a study. The easiest way was to package it up
and share it somewhere. With biotorrents you can do that. It's not super
elegant, but at least it gets out there. Traditional file transfer.. you
connect with your main server, and the other one basically downloads that
data, and the white bar indicates how much of that file has been transferred, and
another one doesn't get the data because of the bandwidth. Unfortunately the
data has to travel across entire continents, and in between the two
institutions, your actual download speed is very limited sometimes for whatever
reasons. If the site goes down for planned maintenance or by accident, that
data doesn't get out there, which is not good if the data is time-sensitive.
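The single-server bottleneck described above is easy to quantify. The figures below are illustrative (a ~216 GB dataset at 0.5 MB/s, roughly matching the GenBank numbers mentioned later in the talk); the ten-peer swarm rate is an assumption, not a measurement:

```python
def transfer_days(size_gb, rate_mb_per_s):
    """Days needed to fetch a dataset at a fixed aggregate rate."""
    seconds = (size_gb * 1024) / rate_mb_per_s
    return seconds / 86400

# ~216 GB at 0.5 MB/s from one server: about five days.
single_source = transfer_days(216, 0.5)

# In a swarm, aggregate bandwidth grows with the number of seeders;
# with, say, 10 peers each contributing 0.5 MB/s, the wait collapses.
swarm = transfer_days(216, 0.5 * 10)

print(round(single_source, 1), round(swarm, 1))
```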
You also want to check that the data is the original copy. Using traditional
method, you have to .. it's not in the protocol by default. There has to be a
better method. Today, I can download movies much faster than, movies that are
legal, than data that is open. In a p2p file transfer method, like bittorent,
the data set is now broken up into small pieces, and each computer has a
piece; you still have that sole provider, but the other users can share with
each other as long as they have different pieces, so bandwidth grows as users
increase. The other
computers might be geographically dispersed so that others might be nearby. If
there's at least one full copy, everyone can still get the data. There's
also a SHA1 cryptographic hash so that the data can be guaranteed to be the
original. It's really well tested.. at least 25% of all internet traffic is bit
torrent. We can use it to share movies, but also data. So how easy is it to
use? You install a bit torrent client, then you download a .torrent file from a
tracker/server like biotorrents.org. The data is not hosted on the server;
what's hosted on biotorrents.org is just the .torrent file, not the giant data
set. It's metadata. And then it's communicated to other computers. Behind the
scene, the software is connecting to the tracker, getting IP addresses, and the
peers start communicating with each other and sharing data. There's a few other
bit torrent features.. a lot of people talk about unique IDs; whenever you
create a dataset, there's a hash of the whole data set, so you're guaranteed that
another person sitting beside you is using the same dataset. There's also the
client software's distributed hash table, so peers can find each other without
connecting to a tracker even if it is down, and if data is posted to different
trackers, the clients can find each other through those other trackers. There's
also local peer discovery, so when lots of people nearby have downloaded a data
set, a person can find someone and transfer the data over the local network. I
found out about this by accident, and I started testing it
and it was blazing fast, that was nice. If there was data hosted on traditional
methods like FTP, they can be added and they just act as an existing or extra
seed. You can upload your favorite genome to any one of these. Lots of these
exist already, and a lot of these have illegal copyright file sharing issues.
There's a few other trackers, but not very many. So, on top of that, even if
you did upload it there, it would be hard to find, because the
community there isn't oriented to science. So, that's why I made biotorrents.
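The integrity mechanism described above, splitting a dataset into pieces and recording a SHA-1 digest of each in the .torrent metadata, can be sketched like this. The piece size and the data bytes are illustrative (real torrents use pieces of e.g. 256 KiB):

```python
import hashlib

PIECE_SIZE = 4  # bytes here; real torrents use much larger pieces

def piece_hashes(data, piece_size=PIECE_SIZE):
    """SHA-1 digest of each fixed-size piece, as stored in a .torrent's
    metadata; a downloader verifies every received piece against these."""
    return [
        hashlib.sha1(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]

original = b"ACGTACGTACGT"
expected = piece_hashes(original)

# A single corrupted byte is caught: only that piece's hash mismatches,
# so only that piece needs to be re-downloaded.
corrupted = b"ACGTACGAACGT"
received = piece_hashes(corrupted)
bad = [i for i, (a, b) in enumerate(zip(expected, received)) if a != b]
print(bad)
```

This is why a torrented dataset is "guaranteed to be the original": every piece must match the hashes published by the uploader.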
Of course, all data must be open. No illegal file hosting. The biological
domain. As I mentioned, the data is not hosted on biotorrents, but I'm
mirroring the data on a separate server; in the long term it's up to the users
to provide the seeding of that data. You can search and browse by particular
scientific categories, and also by license and username. You have to set some
kind of license when you upload it. There's a large list there. There's an
"Other" category but they usually pick one of those licenses. Anyone can
download the data without a username, but if you want to interact with the
site, you can use your own username. Hopefully people will get a reputation for
sharing good data down the road. A few cool things about this - there's an RSS
feed where you can automatically download data sets. There's also versioning of
data sets: if the data is revised, or just expands, or whatever, you can put
that through versions with RSS feeds, where you basically subscribe to the
versions for a certain data set and from then on you get all new versions of
that data set, which basically means it also handles updates. Lastly,
there's an upload script. So far, there's about 1,000 users, and it's
pretty limited on the number of data sets; whatever you've been sitting on for
years, and do we really need it? Here's an example of GenBank. By FTP it took 6
hours. Right now the only way is to get it from NCBI.. and I can only get 0.5
MB/sec, and that means 5 days. So that sucks. So who uses biotorrents? Existing
large data providers, scientists sharing published data, scientists sharing
unpublished data. There's issues with any sort of technology. Metalink.
Volunteer computing. THat's it. My final message is that data transfer should
be fast and easy. Embrace technologies such as bit torrent. Hopefully..
thanks.
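The RSS-based version subscription described above could be consumed with nothing but a feed parser. The feed structure below is a guess for illustration, not BioTorrents' actual schema, and the URLs are invented:

```python
import xml.etree.ElementTree as ET

# A hypothetical versioned-dataset feed; the real biotorrents.org
# feed layout may differ.
feed = """<rss version="2.0"><channel>
  <title>my-genome-dataset</title>
  <item><title>my-genome-dataset v2</title>
        <link>http://example.org/my-genome-v2.torrent</link></item>
  <item><title>my-genome-dataset v1</title>
        <link>http://example.org/my-genome-v1.torrent</link></item>
</channel></rss>"""

def latest_torrent(rss_text):
    """Return the .torrent link of the newest version (the first item),
    which a subscribed client would fetch automatically."""
    root = ET.fromstring(rss_text)
    item = root.find("./channel/item")
    return item.findtext("link")

print(latest_torrent(feed))
```

A client polling this feed gets each new version's .torrent as it is published, which is how the "subscribe to versions" update mechanism works.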

It's time we change how research is done

We are going to change the world of reference management. This is a bold
statement, a ridiculous one I would have said to you a couple of years ago. What we
are doing and what we are seeing. To explain the why, we are going to enlist
the help of Tim Berners-Lee, and what Tim said, the paraphrase is that we have
all this information on cancer, stem cells, diseases, it's all siloed away to
different computers. So Tim has issued a challenge to unblock this data. It's
not just about technology. There's this huge social norm. There's the behavior
of people. The lack of open data and open science is actually obstructing
human progress in the world. The U.S. National Academy of Engineering issued a few grand
challenges; one of these was the tools of scientific discovery. How can we
address this challenge? With Mendeley, we are trying to make science more
transparent and open, and we're trying to build the world's largest academic
database. In these next slides, we are keeping these things in mind. Helping
researchers work with scattered, extractable text and PDF files. There's a PDF..
annotate with Microsoft Word or Open Office, and then what we do is take that
research data, and aggregate it into the cloud. By doing this then, we are
helping researchers collaborate and we're making that data more transparent. So
then this is a screenshot of what you get when you sign up for Mendeley Web,
and you start seeing what's going on with people you're collaborating with on
different projects. What separates us from other reference managers? We find
statistical trends, like the most popular author or paper for the upcoming
week. And then if you're familiar with twitter or trending tags, then we show
some of that. So we take all of that data being siloed away, and then we built
a search catalog on top of it. The big difference between our search catalog
and something like PubMed: the 27 and those numbers, that's the number of
readers for that particular article; you can't get that if you're just doing
something on PubMed. So I
clicked through to a landing page, with the standard citation information, but
then we also start digging down into the demographics. Who are these readers?
PhD students, professors, where are they from, what discipline are they in?
Because most of these papers are multi-disciplinary. And then of course we show
some related research, like TIDEF, and then also collaborative filtering, like
the research papers that you should be reading but may have been missing. So
we've been in public beta for 18 months, we have actually 450,000 users. These
are the top 20 universities so far. In terms of the number of papers we're
aggregating, we have 29M papers for which metadata has been uploaded. The size of
this? The Thomson Web of Knowledge database has 40M papers, and it took them 50
years to do this. We might be able to match that amount in just 2 years. So one of
the things.. we created an open API so that others can access the same metadata
and statistics and there's some mashups that developers are creating: chemical
compounds, location-based mashups of Alzheimer's research, SWAN data, grant
search engines, twitter streams, we have people building Google Wave mashups,
some Microsoft Word mashups, Google Docs mashups with these open APIs. And as
far as the future goes at Mendeley, getting back to what Tim Berners-Lee said,
all that data is filed away on individual computers. There's a vast amount of
knowledge in our heads. How do we re-use and repurpose scientific knowledge? How
about semantic markup of other papers? Does this sentence support this
paragraph or sentence from this other paper? So we're creating a
human-curated, high-throughput, crowdsourced system for semantically
linking PDF papers that would be impossible to link even if this was
machine-readable, and just to get back to this statement here. How do we change
the social behavior of scientists who are skeptical of sharing their own
publications? So one of the things we're experimenting with and haven't
released yet are reputation metrics. We might show the number of downloads of
your publications or the page-views, and an incentive in your reputation metric
for scientists to upload their PDF files, and later on their own data. So
getting back to end this to what Tim was saying.. are we unlocking the data
that we have been siloing away for years? I hope we're doing that, and I hope
that our project will encourage others to do similar things, maybe with
biotorrents.
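The collaborative filtering mentioned above ("papers you should be reading but may have been missing") can be sketched as a toy co-readership count. The users, papers, and scoring rule here are invented for illustration, not Mendeley's actual algorithm:

```python
from collections import Counter

# Hypothetical libraries: which users have saved which papers.
libraries = {
    "alice": {"paper_a", "paper_b", "paper_c"},
    "bob":   {"paper_a", "paper_b", "paper_d"},
    "carol": {"paper_b", "paper_d", "paper_e"},
}

def recommend(user, libraries):
    """Suggest papers the user lacks, ranked by how many users with
    overlapping libraries also saved them."""
    mine = libraries[user]
    scores = Counter()
    for other, theirs in libraries.items():
        if other == user or not (mine & theirs):
            continue  # skip the user themselves and non-overlapping readers
        for paper in theirs - mine:
            scores[paper] += 1
    return [p for p, _ in scores.most_common()]

print(recommend("alice", libraries))
```

Aggregated over millions of user libraries, this kind of co-readership signal is what makes the reference-manager data more useful than a plain citation index.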

Open, candid discussion of science

Martha Bagnall thirdreviewer.com

This conference is long overdue. Scientists have conversations about the
quality and evaluations of the published literature all the time. Every day you
walk into the lab and you see this recent paper, how are you thinking about it;
scientists are constantly using this information in the published literature to
design new experiments, build on the published literature, or whatever. What are
the venues for these kinds of communication?

ok, I need a break.. uploading.