William Gunn http://mendeley.com/
I am the head of academic outreach for Mendeley, and I invite anyone to blog and share this presentation. I am going to talk about what we're doing at Mendeley, but what I really want to talk about is my perspective on the research process, since I come from a research background and moved into a startup environment. I want to share some ideas about how we can help catalyze open science as a movement, and how to catalyze effective research.
I was at a meeting with the National Academy of Sciences on the question: how do we get to a cure faster? There is a lot of talk about DARPA as a model for engineering these things to work faster. It struck me that what we're trying to do, studying diseases and so on, is way more complicated than engineering a jet engine or something like that. Any kind of innovation is hard. Take the light bulb: how many years did we have the light bulb before we had a room lit like this one?
Before I get into this, I want to tell a story that we should consider. Does anyone recognize this guy? This is Henry Kremer, a business magnate in the 1960s who was fascinated by human-powered flight. He thought it could be transformative for the world, and he wanted to see it happen. He put up £50,000 and £100,000 prizes, which made them the XPRIZE of their day. A lot of aeronautics companies worked on prototypes, making them lighter, giving them more lift, and so on. They would get funding, build these prototypes, test them, and watch them crash into the ground 50 meters later. Then they would test again a year later. The prize went unclaimed for more than ten years, even with all these experts in aeronautics, until Paul MacCready came along. He saw that the problem was not how to make the planes light enough for people to fly them, but how to fix the testing process so that the engineering could iterate without being expensive and slow.
So he built prototypes out of what was initially just plastic tubing. They wouldn't even fly to begin with; they were breaking all the time. But they were so light and cheap that he could tape them back together and do two or three tests per day instead of per year. His planes won both of the prizes, and he's known as one of the most brilliant engineers in the whole aeronautics industry.
This probably sounds familiar in research. There are these large time scales; the publication timescale is huge. Take the treatment for HER2-positive breast cancer: the first paper was back in 1994, but you didn't see the activity until after 2000. That's a huge time lag. As you can see, the citations characteristically lag the publications by about two years in this process. Toward the front of the curve it takes a nosedive at 2010, but this data was pulled yesterday, so the most recent papers haven't had time to accrue citations. There's a huge lag, so what's the nature of this impact? The speakers who came before me talked about the problems with the system; I wanted to make that point concrete here.
How do we take all this money, $300 million from the NIH and other mechanisms, and put it into this black box of research? It takes 12 years to get a drug to market. The average age at which a researcher gets their first NIH grant is 42; at that point, with all that buy-in, maybe you don't want to rock the boat. So how do we make sure what comes out of that black box is tasty sausage and not something else? How do we take that 12 years and make it 12 months?
Here are some ideas. You could look at the papers that get cited, but that takes a long time. You could go for the journal with the most selective process, but history has shown that you can't pick ahead of time which papers will have the most impact, or which ones will be retracted. There's actually a pretty good correlation between impact factor and the number of retractions, which is a sign that this process is broken.
Can we get more and better data, faster? If we publish excessively, will that catalyze things? We kicked this off at Mendeley: we wanted to make this real-time. To do that, we built a tool, a desktop client that people download and use to organize their research papers. We aggregate this into a crowdsourced catalog carrying all these signals from domain experts. We have a fairly good reputation and a ton of data to dig into. All of this data is associated with author profiles in the system, which adds a rich social and demographic layer. Putting all of this together, we can come up with summaries that show you the top researchers, the key concepts you want to dig into, and the breakout papers for a subject.
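To make that concrete, here is a minimal sketch of pulling crowdsourced readership for one paper out of a Mendeley-style catalog API. The base URL, parameters, and response fields below are assumptions for illustration, not a guaranteed description of Mendeley's production API; you would obtain the token through their OAuth flow.

```python
# Minimal sketch: look up crowdsourced readership for a paper by DOI.
# The endpoint shape and field names are assumptions, not a verified
# description of the live Mendeley API.
import requests

API_BASE = "https://api.mendeley.com"  # assumed base URL
TOKEN = "YOUR_OAUTH_TOKEN"             # placeholder; use a real OAuth token

def readership_by_doi(doi):
    resp = requests.get(
        f"{API_BASE}/catalog",
        params={"doi": doi, "view": "stats"},
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    resp.raise_for_status()
    docs = resp.json()
    if not docs:
        return None
    doc = docs[0]
    # reader_count aggregates how many users saved this paper; the
    # breakdown fields expose the social/demographic layer he describes.
    return {
        "title": doc.get("title"),
        "readers": doc.get("reader_count"),
        "by_discipline": doc.get("reader_count_by_subdiscipline"),
        "by_status": doc.get("reader_count_by_academic_status"),
    }

print(readership_by_doi("10.1371/journal.pone.0000000"))  # placeholder DOI
```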
But I'm not going to stand here and say that using Mendeley is going to help you cure faster. That's not true on its own. It's about using the data we've collected to drive your process faster. There's more to do with this data than there are developers we could hire to do it. If Jason, our cofounder, were here... we put up a prize to incentivize people to take the data and build something cool. A cool open science project came out of it: opensnp.org, a platform where you can share your 23andMe or deCODEme genetic data. It aggregates all of this; the data is CC-licensed, you can download it and look for correlations, and it has literature search built in.
Some other people took our data and came up with simple versions of metrics. This one is a person index, another way of looking at an author's impact: roughly, if you have 7 papers on Mendeley and 4 of them each have at least 4 readers, your index is 4. The difference here is that it's updated on a daily basis. As soon as a paper comes out, it starts accumulating these metrics, so it cuts two years out of that cycle. As you will see later on, other people are taking this data and making products out of it. Jason will talk about impact later.
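Here is an illustrative sketch of that kind of metric: an h-index-style score computed from Mendeley readership counts rather than citations. This is my reading of the "person index" described above; the exact definition the developers used is an assumption here.

```python
# Illustrative h-index-style metric over readership counts
# (assumed definition; the community project's exact formula may differ).
def readership_index(reader_counts):
    """Largest k such that k papers each have at least k readers."""
    counts = sorted(reader_counts, reverse=True)
    k = 0
    for rank, readers in enumerate(counts, start=1):
        if readers >= rank:
            k = rank
        else:
            break
    return k

# 7 papers; 4 of them have at least 4 readers, so the index is 4.
print(readership_index([40, 12, 9, 4, 3, 1, 0]))  # -> 4
```

Because readership accrues from the day a paper appears, a score like this can move daily instead of waiting the two or so years citations take.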
There are people taking this data and trying to come up with new peer review models, attaching Mendeley objects to their peer review systems. The idea is to leverage systems that already exist rather than take on the overhead of building a peer review system from scratch.
This is another data project: encouraging people to build semantic links between documents. I think it's a neat idea. So we have all this data from all these academics, and now we are trying to segment it and come up with focused areas. We can treat whole institutions as blocks now, and we have the Mendeley Institutional Edition. It's like Google Analytics for your research. If you are a library, you want to know who at your institution is reading which journals, and which ones you really need. That gives libraries leverage against the traditionally bad deals the publishing industry has forced on them. It's another pressure point within the whole theme.
If you are a researcher in that institution, you might want to know who is reading your papers and in which industries you are having an impact. You can get that out of it, along with which major papers are having an impact on your department or your colleagues.
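As a toy sketch of the "Google Analytics for research" idea, the following ranks journals by distinct readers at one institution, which is the question a library asks when deciding which subscriptions earn their keep. The event format is invented for illustration; the Institutional Edition's real data model isn't described in this talk.

```python
# Toy institutional analytics: rank journals by distinct readers.
# The (user_id, journal) event format is hypothetical sample data.
from collections import defaultdict

reading_events = [
    ("u1", "Nature"), ("u2", "Nature"), ("u1", "PLOS ONE"),
    ("u3", "J. Aeronautics"), ("u2", "PLOS ONE"), ("u3", "PLOS ONE"),
]

readers_per_journal = defaultdict(set)
for user, journal in reading_events:
    readers_per_journal[journal].add(user)

# Which journals does this institution actually read?
for journal, readers in sorted(readers_per_journal.items(),
                               key=lambda kv: -len(kv[1])):
    print(f"{journal}: {len(readers)} readers")
```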
So that's what I've talked about: our data and what we've been doing with it. You should build products on top of Mendeley data and help accelerate science. Thank you.
What do you see as the primary direction for growth?
The major growth direction is getting individual institutions to sign up en masse, so that instead of 1,000 people here and 2,000 people there spread around, we get whole teams and departments. We are going to bring in the collaboration software and team packages, and build a picture of the total impact an individual institution has. This will help grant funders make better allocation decisions. We have seen interest from libraries around this product.
So you see this as your biggest obstacle: just usage, just getting people to rely on your data as a critical research element?
Yeah, I think we're doing pretty well with adoption. We have a good signup and adoption curve; it's not a severe problem, but getting more people is the number one challenge.

Do you feel that people see the results you've gotten as a tool they want to spend more time on, enough to cross that threshold?

As you're going to see from the next speakers, other startups are basing their products on our data, and that's a vote of confidence in what we're doing as a valuable, robust data source. Filtering is important too, as the prior talk we were listening to said: once you have enough numbers, the right material gets surfaced and the junk and noise ends up so far down the page that it's effectively filtered out. Our API gives that to people who need to filter. For example, PLOS ONE, and all of PLOS, uses article-level metrics, and they use Mendeley to report the number of readers.

On the quality of being trusted: is there a way of accurately establishing that value and seeing whether it's growing? What you're becoming is a Google-like proxy. Google was successful because when you did your first search you got all the relevant results on the first page; you're working toward that point, but with science.

It's going to allow personalization and segmentation on top of that: not just what's globally popular or popular in geographic regions, but what's popular in my subdiscipline.

What is your revenue model?

We have three different revenue models: individual upgraded accounts that give you more storage space; the Mendeley Institutional Edition analytics package, sold to universities along with upgraded accounts, which is a very good revenue model for us; and the platform, the API that other startups are consuming data from, which is an additional opportunity that could turn out to be big.
So you mentioned a relationship between the number of retractions and traditional journal impact factor. Is there anything you're doing to change the trend of bad science?

That's one of the things we want to do. Our main challenge is getting enough good developers, and people who understand science as well, to take a crack at these problems. Right now we're kind of like Twitter: we're trying to keep the service up and scaling, and dealing with massive volumes of uploads. We haven't had a chance to breathe and attack these really interesting other things. We're starting to take that on and use our own API; there's just so much that could be done.
So, kind of related to that last point: we live in a world of diverse incentives in biomedical science. There is peer review in journals, and then we base our funding on peer review, so we're doubling down on a bad system. And there's the simple act of publishing: when you look at a paper, prestigious or not, you're looking at something cherry-picked, at what the PI has decided to interpret and present to you. Sharing the actual raw data would change the incentives so that scientists who generate good data are rewarded. So, do you have thoughts on how to incentivize scientists to share raw data, rather than relying on this ridiculous process of publishing?
If you can speed up the process a little bit, you can do micropublishing. The other thing is, you have all these experts around the world doing a certain task or procedure. What if their incentive didn't come from publishing? What if they could just put their expertise in a marketplace and let the market decide? Like Science Exchange.
You showed some organizations and projects. Can you speak to whether Mendeley has plans to work with companies that want partnerships to reach your audience, or vice versa, to reach their customers, beyond just your data API?

Yes, we're interested in how to do that kind of thing.

Is there anything going on like that?

Yes, there are some projects.