transcripts/startup-science-2012/heather-piwowar-jason-priem.mdwn


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221

Heather Piwowar Jason Priem

We are with Total Impact. I am really gratified with the stuff that wa stalked
about earlier. We want to bring together these topics that we are talking
about. I think this is a revolutionary thing. The first scientific revolution
was the publication of the first journal, the Philosophical Transactions of the
Royal Society. They were using the best available technology to disseminate
updates. Instead of sending letters all over the place, let's have a standard
way to produce these in an industrial scale way.

I think the second revolution of science will not be hegemoizing but
hetergonizing, it will be diversity. We have the ability to apply technology to
these problems. Traditionally we have articles or stories about what you've
found. There are conversations about science, analysis, data, we haven't had
the means to publish this because of this homongeous system. Eventually,
science will publish all these aspects- analysis, data, if you do it in R you
could pubslih your code and make it minable. Maybe I will publish blog posts
instead of articles.

Even in the Republic of Letters, it was all about discussing with your
colleagues.

How does this new technology enable diverse ecosystems? Well, Mendeley. We've
talked about that. Another example is twitter. Some of my research, I'm a PhD
student, I want to see how scientists are using twitter. There are al ot of
scientific citations on twitter. Don't quote me on the number, I will deny it.
The number is impressive. This is the same conversation from faculty lounges,
but now it's leaving a trace. One of the participants in my study is that
twitter is like a jury that pre-selects what they look like. We talk a lot
about the filtering problem, and a lot of filtering is happening in this
conversation.

The diverse scientific communications ecosystem- we only look at academics that
cite things- but you have impact on patients and lots of other people. Another
kind of resource.. we only look at one kind of use, there are a lot ofo ther
things we might use research or. In this new world of scholarly diversity, we
need something to mine impact on this next one. Bibliometrics, impact factor
from the first one, we want to look at the impact on the enxt one, and we want
to see... what Richard introduced is looking at these indicators that we
weren't able to see before, looking at all these indicators, and trying to get
a broader or more timely picture of the impact of research. This is part of a
larger thing, like stock prices, elections, it's an area in SV, because of
Gradient6, dataset, postrank are being acquired by big companies, big
valuations, it's even part of our culture.. money ball, awesome baseball team
based on stats analysis. I think we are at this moment where the tech is coming
together to let big data analysis for scholarly work.

So, again, a couple of people have alluded to Total Metric.. there has been a
growing set of academic research on All metrics. There is an article both
academia and popular press... who thinks that these metrics are valuable?
Funders are very interested in this. A foundder found one tweet from a nurse
that liked a study they funded, they were desperate to see this. Publishers are
very interested in this. We have interest from librarians that want to sit down
with their faculty to build impact profiles. Repositories also, who aren't well
served by normal citations.

Heather will now show you some working software. Um, okay.

http://total-impact-webapp.something...

http://researchremix.org/wordpress/wp-content/uploads...

Our web app and API scratches my itch as a researcher and I don't think I'm
alone about scratching these itches. This is my old CV. Papers! I submitted a
paper to PLOS Biology in 2007, and they didn't accept, so they wanted to submit
to PLOS One. So I published it there and I loved it. But nobody in my field had
heard of it, it wasn't indexed, it wasn't really rewarded and it didn't have
context. I have these things on my CV but nobody has no idea what to do with
it. Nobody knows about these conference papers. And nobody knows about my
posters, demos, my data sets, my passion is who wants to publish data, so I
list them on my CV, but nobody knows what to think about that. Nobody knows who
uses that, nobody knows how it relates to what other people have done. I have
done tons of presentations. How many people have been at those things? And then
way down here, I blog. But I spend a lot of time blogging but you would not
know that.

Your total impact.. we just pushed on heroku.. our main url doesn't redirect
here.

http://total-impact-webapp.herokuapp.com/

Major proceedings article, my journal article including my PLOS Ones, I pushed
on github, the total impact impact source code.

http://guthub.com/jasonpriem/total-impact

I put it on the Total Impact page, and here, is the total impact result. So,
here are articles and their metrics. These are from PLOS, from the HTML views,
full text views, the number of times on Mendeley that have read it, tweeted it
on twitter, on facebook likes, the number of times that my data sets have been
viewed on the data repostiory, or bookmarked on Mendeley, or the people on
github watching the repository, how many people have tweeted about it.. slides,
how many people have viewed the slides, slideshared it, delicioused it,
webpages.. we need to do add more about webpages. We need more webdata about
that. How many people have read it, or how many people have read that article
from Chronicle of Higher Education. You can cclick on these and drill down on
who is tweeting and what is twitting.

Total impact.. we want to build a web app, it has to be something... so this is
what it is when embedded, you pull in our javascript and the DOI will
automatically go in and display that information. We told them that it was
early coded, use at your own risk.. but they embedded it anyway alreay on Peer
Evaluation. We've been unded by the Open Society Foundation. We started it at a
hackathon, and then 6, then 3, and Jason and I decided that we really want to
run with it. We have Sloan Foundation Grant for 130k to work on this for a
year. You can follow us on our blog, and we will show a live demo since it was
working on Bryan's computer.

The Sloan Foundation.. where is this money coming from? What are the business
models? Where is this coming from? We had to take some .. why do we need it?
This metric infrastructure is exactly that. This is next generation science.
Once you have all these metrics, they are putting these on their resumes. It's
so much mroe than that. What is peer review? Itk t's the assessment of people
in their social circle. If I tell you when you wake up in the morning, the one
article for today, the 20 people that you follow the most on twitter, because
they have been bookmarking it and such, why would you ever open up a journal
again? You have up to date advice, wso between the recommendation system and
metrics, I think this is what science needs to go on. it needs to be open,
anyone should look at it and mine it, so we're on a social entrepreneurship
model. We have foundation funds and we want to keep working on that and ways to
make it more sustainable. We want metrics inrastructures to be available for
everyone else, they can add their own secret sauce or open sauce which we would
like even more. The metrics are out there adn you can build open science on
this.

Is this a cloud for science, is this a cloud for scientist, or it.. is that a
bad thing? I think, cloud for science.. we've been throwing it aruond? I think
there's a lot of terrible things about Klout.. we want to get away from this,
we want to be open infrastructure that other people can build rating systems on
top. Google was able to build, in many ways, the web we had today, they could
crawl stuff and make, we want to be the open web that people can do on top of
us. We will probably build some metrics because it's fun.

So you have this metrics base.. a thing that is how important things is, a
recommendation system which may or may not leverage this.. so I just looked it
up, and I see that the research around whether there is a relationship between
tweets and citations seems to be controversial. What is your understanding of
the state of that relationship? How do you think this relates to
recommendations?

The research is in its infimacy. So if you look at citation research.. it took
him 10 years to convince anyone to fund him to create a citation index, and
then another 10 years to convince anyone that it was useful. So he was able to
predict nobel prize winners after a while, and people thought that citations
tracked impact. I think tracking mendeley, twitter, and these correlations and
different samples.. it's a big field, I thik it's evidence o correlation, but
over the long-term, the bigger win is not the orrelation of citations, it's to
provide a broader picture of impact than just citations. Citations are the
intentional tracks.. you say, this is what I said influenced me.. but what
actually influenced you is the informal communication. Tracking that inormal
communication will be much deeper and broader.

The story is easier if correlates perfectly with citations, and it's better if
they don't because they measure a different thing. THe correlation isn't
perfect, ad that allows us to identify different types of scholarly impact. I
want to take a github id? Ok, kanzure. Slideshare? williamgunn.

Here are some random PLOS One IDs... let's see here. There's a bug here... it
takes a while.. it's going to these providers to get things.. it's going to top
secret twitters, it's going to plos and pulling these in in real time. Oh look,
this software has been tweeted and facebooked.

What about the invisible elephant room - the implicit basis of the myers brigg
personality type, not everyone is into twitter and facebook? Our community is
very much involved, I think we're in the tip of the iceberg.. I think there's
lots of relevant work that isn't entering this, I think this is a big
motivation for starting metrics in the first place... the problem you just
described is this current problem with current citations...

The amount of money that NSF puts into identify research is vast. We want
diversity to be rewarded. plumbanalytics are doing this. They are targetting
press release officers. So, about, perhaps not everyone is going out to write a
blog and tweet. If people see your paper, other people tweet about it, that's
where the metrics come from.. it's not about whehter you're blogging, it's
about how much impact your papers are having on the community.. a lot of
academics don't use the internet. The other people will do the spreading for
you.

If you use the internet not to push, but to bookmark on delicious or citeulike,
or you just use it because you want those lists on those two computers? Then
that data is also for us to include, same with download stats and other things.
Some people mentioned the idea of one journal to rule them all. Instead of
balkinizing instead this huge multi-silo thing, we don't want to trust the
entirety of scholarly communication to one super organization.. we don't want
one giant organization, we want hundreds of thousands, and then aggregate them.
I don't want a company to do the web, I want a set of standards to build things
on top of. Let's listen to the conversation where it is. I think the idea of
listening. I don't think you just bookmark something.. you bookmark it, then
Mendeley takes care of the rest.. instead of people to abstract articles, let's
listen to what's out there. NSF has no secret that the early work was on
citation graphs.. listen to what people are already talkign about, don't ask
them.

It seems that there's a very big difference between the closed walled garden
universe that we started with with the journals, and the open architecture that
you're suggesting, aggregating a lot of references and elements. You are
implying a data-driven process, much like the way Google does analysis of
things. How far are you planning on taking the data driven, could you talk to
us.. like are you looking at things inside papers, common phrases, referenced
items, how do you introduce new references? Like say only these three phrases
in papers, so this inspired me to do X, which now becomes very popular, and
maybe people want to follow that trace that back.

I don't think I can address all of that, but the text mining. I am very
interested in text mining, or texts around citations or reasons for attribution
is important. Part of my interest is that if something is cited, is it in the
context of data reuse, is it because someone reused this data. But you might
want to use lots of other attributes about attributions as well, the problem
has really been lack of access to articles for doing text mining. We are busy
trying to push lots of licensing and rights and laws and things to help with
that. I think there will be tools that benefit from that. It's certainly
something we're interested in.

The concept of the semantic web and microformats might allow something along
those lines. Science is not just references per se, because references are too
dilute.. they don't call specific information really.. how do you call out
information to be able to add it into the master view of impact that you are
trying to do?

So if someone makes an API we'd love to consume it.