transcripts/startup-science-2012/william-gunn-mendeley.mdwn

William Gunn http://mendeley.com/

I am the head of academic outreach for Mendeley. I would like to invite anyone
to blog about and share this presentation. So I am going to talk to you about
what we're doing at Mendeley, but really what I want to talk about is my
perspective on the research process, since I'm from a research background and
have moved into a startup environment. I want to share some ideas about how we
can help catalyze open science as a movement and how to catalyze effective
research.

I was at a meeting with the National Academy of Sciences: how do we get to a
cure faster? There was a lot of talk about DARPA as a model for engineering
these things to work faster. You know, it struck me that what we're trying to
do, study diseases and so on, is way more complicated than engineering a jet
engine or something like that. It's a complicated sort of thing that we have to
deal with. Just any kind of innovation is hard. The light bulb: how many years
did we have the light bulb before we had this room?

Before I get into this, I want to tell a story that we should consider. Does
anyone recognize this guy? This is Henry Kremer, a business magnate in the '60s
who was fascinated by human-powered flight. He thought that it could be
transformative for the world, and he wanted to see it happen. He put up $50k and
$100k prizes in the 1960s, which made it the XPRIZE of its day. A lot of
aeronautics companies worked on prototypes, making them lighter, giving them
more lift, and so on. They would get funding, build these prototypes, test them,
and watch them crash into the ground 50 meters later. Then they would test again
a year later. The prize was not won for more than 10 years, even with all these
experts in aeronautics, until Paul MacCready came along. He saw that the problem
was not how to make planes light enough for people to fly them, but how to fix
the testing process so that you could engineer this stuff without it being
expensive or boring.

So he built prototypes that were initially just plastic tubing. They wouldn't
even fly to begin with, and they were breaking all the time, but they were so
light and cheap that he could tape them back together, and he could do 2 or 3
tests per day instead of per year. One of his planes did it. He won both of the
prizes. He's known as one of the most brilliant engineers in the whole
aeronautics industry.

This probably sounds familiar in research. There are these large time scales;
the publication timescale is huge. Take the treatment for HER2-positive breast
cancer. The first paper was back in 1994, but you didn't see the activity until
after 2000 or so. That's a huge time lag. As you can see, the citations
characteristically lag the publications by about two years in this process. At
the recent end, it takes a nose dive at 2010. This was data pulled yesterday.
There's a huge lag, so what's the nature of this impact? The speakers who came
before me talked about the problems with the system. I wanted to make a good
point here.

How do we take all this money, $300 million from the NIH and other mechanisms,
and put it into this black box of research? Twelve years to get a drug to
market? 42 is the average age at which a researcher gets their first NIH grant.
By that time, with all that buy-in, maybe you don't want to rock the boat. So
how do we get a product out there that is tasty sausage and not something else
that it could be? How do we take that 12 years and make it 12 months?

Here are some ideas. You could look at the papers that get cited, but that takes
a long time; it's not fast. You could go for the journals with the most
selective process. But history has shown that you can't pick ahead of time which
papers will have the most impact, or which ones will be retracted. There's
actually a pretty good correlation between impact factor and the number of
retractions, and this is a sign that the process is broken.

Can we get more and better data faster? If we publish excessively, will that
catalyze things? We kicked this off at Mendeley because we wanted to get this in
real time. To do that, we built a tool, a desktop client that people download
and use to organize their research papers. We aggregate all of this into a
crowdsourced catalog carrying all these signals from the domain experts. We have
a fairly good reputation, and we have a ton of data to dig into. All of this
data is associated with author profiles in the system, which is a rich social
and demographic layer. With all this together, we can come up with great
summaries that show you the top researchers, the key concepts you want to dig
into, and the breakout papers for a subject.

But I'm not going to stand here and say that using Mendeley is going to help you
get to a cure faster. That's not true. It's about using the data we've collected
to drive your process faster. There's more to do with this data than we could
ever hire developers to do. If Jason, our cofounder, were here... We put up a
prize to incentivize people to take our data and build something cool. One cool
open science project that came out of it was opensnp.org, where someone built a
platform where you can share your 23andMe or deCODEme genetic data. It just
aggregates all this stuff. All the data is CC-licensed, so you can download it
and look at correlations, and it has literature search.

Some other people took our data and came up with simple versions of metrics.
This is the person index, another way of looking at an author's impact. Say you
have A sub 4 of 7, if you have 7 papers on Mendeley... The difference here is
that this is updated on a daily basis: as soon as a paper comes out, these
metrics start accumulating, so it cuts 2 years out of that cycle. As you will
see later on, there are other people who are taking this data and making a
product out of it. Jason will talk about impact later.
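The index described above is only partly audible in the recording. One plausible reading is an h-index-style measure computed over Mendeley reader counts instead of citation counts. The sketch below assumes exactly that: it uses the standard h-index definition (the largest h such that h papers each have at least h readers). The function name and the input list are hypothetical and are not taken from Mendeley or from any tool built on its data.

```python
def readership_h_index(reader_counts):
    """Return the largest h such that at least h papers have >= h readers.

    Assumption: this mirrors the classic h-index, with Mendeley reader
    counts substituted for citation counts. The name is hypothetical.
    """
    # Rank papers from most-read to least-read.
    ranked = sorted(reader_counts, reverse=True)
    h = 0
    for rank, readers in enumerate(ranked, start=1):
        # The paper at this rank must have at least `rank` readers.
        if readers >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical reader counts for an author's seven papers.
papers = [25, 12, 9, 4, 3, 1, 0]
print(readership_h_index(papers))  # 4
```

Reader counts accumulate as soon as a paper lands in people's libraries, so a number like this could be recomputed daily. That is the speed advantage the talk claims over citation-based metrics.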

There are people taking this and trying to come up with new peer review models,
attaching Mendeley objects to the peer review system. The idea is to leverage
the systems that already exist rather than take on the overhead of creating a
peer review system from scratch.

This is another data project: encouraging people to build semantic links between
documents. I think it's a neat idea. So we have all this data from all these
academics, and now we are trying to segment it and come up with focused areas.
We now have whole institutions as blocks, and so we have Mendeley Institutional
Edition. It's like Google Analytics for your research. If you are a library, you
want to know who at your institution is reading which journals: which are the
ones we really need? It gives libraries leverage against the bad deals that the
publishing industry has traditionally forced on them. It's another pressure
point, I guess, within the whole theme here.

If you are a researcher at that institution, you might want to know: who is
reading my paper? In which industries am I having an impact? You can get that
out. What major papers are having an impact on my department or my colleagues?

So that's what I've talked about: our data and what we've been doing with it.
You should build products on top of Mendeley data and help accelerate science.
Thank you.

What do you see as the primary direction for growth?

The major growth direction is getting individual institutions to sign up en
masse, so that instead of 1000 people here and 2000 people there spread around,
we get teams and departments. The collaboration software and team packages: we
are going to get that stuff in, to get a picture of the total map of the impact
an individual institution has. This will help grant funders make better
allocation decisions. We have seen interest from libraries in this product.

So you see this as your biggest obstacle: just usage, just getting people to
rely on your data as a critical research element?

Yeah, I think we're doing pretty well with adoption. We have a signup and
adoption curve... it's not a severe problem, but the number one challenge is
getting more people.

Do you feel that people see the results you have gotten as a tool they want to
spend more time on, to cross that threshold?

As you're going to see from these guys, Total Impact and some other startups are
basing their startups on our data; it's a vote of confidence in what we're
doing, as a good, valuable, robust data source. Filtering is important, as the
prior talk we were listening to said: because you have enough numbers, the
right parts keep surfacing, and the junk and noise sink so far down the page
that they are automatically filtered out. Our API gives that to people who need
to filter. For example, PLOS ONE, and all of PLOS, uses article-level metrics,
and they use Mendeley to see the number of readers.

Quality of being trusted... is that one way of accurately establishing a value,
to see if that value is growing? What you're becoming is a Google-like proxy.
Google was successful because when you did the first search, you got all the
results you needed on the first page, so the level at which you're working is
that point, but with science. It's going to allow personalization and
segmentation, not just what's globally popular or popular in geographic regions,
but what's popular in my subdiscipline?

What is your revenue model?

We have three different revenue models. The first is an individual upgraded
account that gives you storage space. The second is the Mendeley Institutional
analytics packages, which are sold to the university along with upgraded
accounts; that's a very good revenue model for us. The third is the platform we
have, the API that other startups are consuming data from, which is an
additional opportunity that could turn out to be big.

So you mentioned a relationship between the number of retractions and
traditional journal impact factor. Is there anything you're doing to change the
trend of bad science?

That's one of the things that we want to do. Our main challenge really is
getting enough good developers, and people who understand science as well, to
take a crack at these problems. Right now we're kind of like Twitter: we're
trying to keep the service up, scale it, and deal with the massive volumes of
uploads. We haven't had a chance to breathe and attack these other really
interesting things. We're starting to take that on and use our API; there's just
so much stuff. There's stuff that could be done, yes.

So, kind of related to that last point: we live in a world of perverse
incentives in biomedical science. There is peer review in journals, and then we
base our funding on peer review, so we're doubling down on a bad system. Take
the simple act of publishing: when you look at a paper, prestigious or not,
you're looking at something cherry-picked, at what the PI has decided to
interpret and present to you. What about sharing the actual raw data, and
changing the incentives so that good scientists who produce good data are
rewarded? Do you have thoughts on how to incentivize scientists to share raw
data, rather than this ridiculous process of publishing?

If you can speed up the process a little bit, you can do micropublishing. And
then the other thing is, you know, you have all these experts doing a certain
task or certain procedure around the world. What if their incentive didn't come
from publishing? What if they just put their expertise in a marketplace and let
the market decide? Like Science Exchange.

You showed some organizations and projects. Can you speak to whether Mendeley
has plans to work with companies that want to do partnerships to reach your
audience, or vice versa? Not just through your data API.

Yes, we're interested in how to do that stuff.

Is there anything going on like that?

Yes, there are some projects.