Reinventing Discovery

Michael Nielsen

Open Science Summit, July 2010

Thank you all for coming along; it's a great pleasure to be talking about this opportunity to accelerate the rate of scientific discovery. When I think about open science, I find it helpful to think about the opportunities in front of us. The first, which has already been discussed, is open data. Very broadly, that means not just data in the conventional sense, but also making your code and design specifications open. Some of that has been discussed today. A second broad set of ideas that have been discussed includes citizen science and open access: anything that allows us to build bridges between scientific institutions and the broader community. I am not going to focus on that, but I wanted to mention it as one of the three big things.

What I am going to focus on is the third idea, the open collaborative process, and I am going to talk around it by telling you a story from early last year. On the 27th of January 2009, this man, Tim Gowers, had an idea and began a project. Gowers is a mathematician at Cambridge University, and a very good one: he is among the world's leading mathematicians, and he has received one of the highest honors in mathematics. He is also a blogger, and on that day he published a very interesting post on his blog asking: is massively collaborative mathematics possible? Gowers' idea was that it might be possible to use his blog to attack a hard mathematical problem in the open, specifically in the comment section, where anyone in the world could come in and contribute. In the event, it was not just Gowers' blog that was used: the blog of another Fields Medalist was also involved, and the work spread to a very active wiki set up for the purpose. He dubbed this the Polymath Project. I won't describe the problem in detail, but it was a deep open problem related to the density Hales-Jewett theorem, right at the boundary of current mathematics. He would have loved to solve it, but it was a very difficult and challenging open problem. Before getting started, there was some discussion about how the collaboration should proceed, and Gowers proposed some rules with two broad themes. First, people should be polite, both to one another and to themselves as individuals: don't beat up on yourself, and so on; in the event, this was done admirably well. Second, each comment should contain a single idea, and in particular people should feel free to propose half-baked, incomplete thoughts in the comments.
People were discouraged from going off and doing extensive work on their own; it was all to be done in the open.

The conversation opened on February 1. Thirty-seven days, 27 contributors, roughly 800 comments, and 170,000 words later, Gowers announced that they had solved the heart of the problem; it took a few more months to work through the details. He described the process as being to normal research as driving is to pushing a car: the rate at which ideas were being proposed was intense. This was remarkably good work. It was ultimately written up as a paper, and then a second paper on the Moser numbers, under the collective pseudonym DHJ Polymath, since it was a large group of people. Given the success of the initial problem, several follow-up projects have been run by mathematicians, and several are ongoing. What conclusions can we draw? It happened quickly. The participants collectively had all the expertise needed to solve the problem; that expertise was latent in the community. The tools, blogs and wikis, coordinated the attention of the people involved in a new way that activated that latent expertise. They restructured expert attention, and that restructuring is what produced the solution. Expert attention is often the scarcest and most important resource in science; that's why these tools matter. By restructuring expert attention, we extend our problem-solving ability. That's the promise of the open collaborative process. And it's not just the Polymath Project; there are all sorts of other examples. Here are a couple.

One is MathOverflow, a site in the style of Stack Overflow for research mathematics. Many leading mathematicians are involved; it started about a year ago and is doing very well. They get a few dozen research-level problems each day, and about 90% of those problems are ultimately answered, frequently within minutes and often within hours. Some of those answers have even fed into publications. It's another way of restructuring expert attention. There's also the BioStar site, which is doing something similar in bioinformatics. So this open collaborative process is open and fun, and you can begin to see the potential for an explosion in the rate of scientific progress.

Unfortunately, of course, a lot of interesting ideas along these lines can only be described as failing, and I want to dwell on the failures and what distinguishes them from successful projects. Here's a quick example: a science wiki started in 2005, which got a lot of attention. The idea was to build a kind of super-textbook for quantum computing research: people would go to the wiki and upload ideas and reference materials relevant to the field. A lot of people were interested, but not interested enough to contribute, and so it was overrun by spam. There are similar stories about other wikis: quantwiki, the string theory wiki, and many more. Those sites failed quite systematically. The same goes for science comment sites: many attempts have been made to build an Amazon-style review section for papers, where people write commentary, and even high-profile attempts have failed. Then there's the idea of a Facebook for science; tons of these have been started, and they are all ghost towns as far as I can see. The problem is that there's very little incentive for scientists to contribute to these sites. Why would you share specialized knowledge there when, first of all, the sites are not very good, because not enough people are contributing? And second, anything you share might help your competitors; you don't have much time, and you won't get academic credit, so why not just write papers and get the credit? Something every scientist knows, but that bears repeating: when we share results in journal papers, we are revealing valuable information to other scientists, but we get a reputational reward in return. That's the scientific economy, so to speak. When we contribute to a science wiki or something similar, we're giving something up without getting enough reputational reward in return for it to be practical. So these sites have potential, but the opportunity cost leads people to do other things, like write papers.
So the successes tend to be projects that feed back into conventional rewards, like papers; the Polymath Project ultimately produced papers. That's wonderful, but it's also a constraint: it limits us to projects compatible with conventional publication. To go further we need to move to a more open culture, and that's more daunting. You might agree that such a culture would be great, but as an individual you don't have easy actions you can take to make it happen. Imagine a scientist who decides to work completely in the open for six months or so: it is very difficult, because they are making a big effort to share while others aren't doing the same, so the benefits never arrive. You can decide to change, but that doesn't mean others will cooperate with you. That's the core problem, the core paradox. To make an analogy, it's like trying to change the side of the road that others drive on just by changing yourself; that's not going to happen. It's hard to do, but not impossible: Sweden made exactly that change at 5 o'clock one morning in 1967, not without problems, but it happened. There was a similar problem in the 1600s. Elizabeth Eisenstein, the historian of the printing press, observed that exploitation of the new mass medium was more common among pseudoscientists and quacks; serious scholars often withheld their works from the press and were very conservative in adopting the new medium. Recall Hooke's law from high-school science: the force exerted by a spring is proportional to its extension. When Hooke discovered it, in 1676, he published it as an anagram. What this meant was that if someone else later made the same discovery, Hooke could reveal the anagram and establish his priority, in his own time. Galileo and Leonardo also employed anagrams and similar devices, to be revealed later. Discoveries were routinely kept secret.

That secretive culture of discovery arises where there is little personal gain in sharing discoveries. What changed was that the promise of scientific advancement motivated patrons and governments to subsidize science, and the public benefit was strongest when discoveries were freely shared; in effect, they were subsidizing the sharing of science. It required social change: a shift in which it was the sharing of discoveries, not merely the making of them, that conferred prestige on journal authors. When someone asked Michael Faraday the secret of his success, he replied: work, finish, publish. By that time, the open science revolution had been achieved by subsidizing scientists who published their work in journals. Today, that same subsidy is inhibiting the adoption of newer tools and technologies, because it gives scientists little reason to adopt them.

Okay, so let me just finish with the question of how to change this reputation economy. There are many ideas about how to do it; I want to talk about one in particular: creating new ways for scientists to build reputation based on the new tools. This has been done in the culture of physics. Preprints in physics have a long history, but in recent years in particular, preprints have achieved a status as an end in themselves. There's this wonderful site, the arXiv, where scientists can upload their preprints and anyone can freely download them. Preprints used to be a stepping stone toward real publication, not an end in themselves. That has changed in recent years, principally because of another site, SPIRES, which does citation tracking, and which made an interesting decision: it tracks citations to preprints and publications alike. You can see that a paper has been cited, say, 13 times, including citations to the preprint version, so you're creating an ecology where it's possible to track the impact of preprints. The impact of your work can be demonstrated even if it is not published in the conventional sense. SPIRES focuses on particle physics in particular. I have sat in meetings where we were evaluating young scientists and everyone brought up SPIRES on their laptops to see the impact of people's work, including the impact of their preprints, and this makes a real difference to people's attitudes toward preprints. By making preprints citable and measurable, you create a new way to build reputation, and that can drive a small change toward open science. You can adopt the same basic strategy elsewhere: make data citable, and then the impact of that data can be tracked. You could do the same with code, ideas, any piece of science, and hopefully this leads to a more open science, and perhaps a second open science revolution. Thank you all very much for your attention.