Heather Piwowar and Jason Priem
We are with Total Impact. I'm really gratified by what was talked about earlier; we want to bring together the topics we've been discussing. I think this is a revolutionary moment. The first scientific revolution was the publication of the first journal, the Philosophical Transactions of the Royal Society. They were using the best available technology to disseminate updates: instead of sending letters all over the place, let's have a standard way to produce these at industrial scale.
I think the second revolution of science will not be homogenizing but heterogenizing: it will be about diversity. We now have the ability to apply technology to these problems. Traditionally we have had articles, stories about what you've found. But there are also conversations about science, analysis, data; we haven't had the means to publish those because of this homogeneous system. Eventually, science will publish all these aspects: analysis, data; if you do your work in R you could publish your code and make it minable. Maybe I will publish blog posts instead of articles.
Even in the Republic of Letters, it was all about discussing with your colleagues.
How does this new technology enable diverse ecosystems? Well, Mendeley; we've talked about that. Another example is Twitter. Some of my research (I'm a PhD student) looks at how scientists are using Twitter. There are a lot of scientific citations on Twitter. Don't quote me on the number, I will deny it, but the number is impressive. This is the same conversation that used to happen in faculty lounges, but now it's leaving a trace. One of the participants in my study said that Twitter is like a jury that pre-selects what they look at. We talk a lot about the filtering problem, and a lot of filtering is happening in this conversation.
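To make "leaving a trace" concrete: spotting article citations in tweets mostly comes down to recognizing DOIs and publisher links in the tweet text. Here is a minimal sketch, with a made-up DOI, sample tweets, and a deliberately rough pattern (real matching would also have to expand shortened URLs and handle trailing punctuation):

```python
import re

# Rough DOI pattern; real-world matching needs more care.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/\S+")

def find_article_citations(tweets):
    """Return (tweet, doi) pairs for tweets that appear to cite a paper."""
    citations = []
    for tweet in tweets:
        for doi in DOI_PATTERN.findall(tweet):
            citations.append((tweet, doi.rstrip(".,;)")))
    return citations

# Illustrative sample data, not real tweets.
sample = [
    "Great read on data sharing: http://dx.doi.org/10.1371/journal.pone.0012345",
    "Conference was fun, no papers to report.",
]
print(find_article_citations(sample))
```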
The diverse scientific communications ecosystem: we only look at academics who cite things, but you also have impact on patients and lots of other people. And we only look at one kind of use; there are a lot of other things we might use research for. In this new world of scholarly diversity, we need something to mine impact on this next ecosystem. Bibliometrics and the impact factor came from the first one; we want to look at the impact on the next one. What Richard introduced is looking at indicators that we weren't able to see before, and trying to get a broader and more timely picture of the impact of research. This is part of a larger trend, like stock prices and elections; it's a hot area in Silicon Valley, where companies like Gradient6, DataSift, and PostRank are being acquired by big companies at big valuations. It's even part of our culture: Moneyball, the awesome baseball team built on stats analysis. I think we are at the moment where the technology is coming together to enable big-data analysis of scholarly work.
So, again, a couple of people have alluded to this already: there has been a growing body of academic research on altmetrics, and there are articles in both academia and the popular press. Who thinks these metrics are valuable? Funders are very interested in this. One funder found a single tweet from a nurse who liked a study they had funded, and they were desperate to see that kind of thing. Publishers are very interested in this. We have interest from librarians who want to sit down with their faculty to build impact profiles. Repositories too, since they aren't well served by normal citation counts.
Heather will now show you some working software. Um, okay.
http://total-impact-webapp.something...
http://researchremix.org/wordpress/wp-content/uploads...
Our web app and API scratch my itch as a researcher, and I don't think I'm alone in having these itches. This is my old CV. Papers! I submitted a paper to PLOS Biology in 2007, and they didn't accept it, so they wanted me to submit it to PLOS ONE. So I published it there, and I loved it. But nobody in my field had heard of it, it wasn't indexed, it wasn't really rewarded, and it didn't have context. I have these things on my CV, but nobody has any idea what to do with them. Nobody knows about these conference papers. Nobody knows about my posters, demos, or my datasets; data sharing is my passion, so I list my datasets on my CV, but nobody knows what to think about that. Nobody knows who uses them, nobody knows how they relate to what other people have done. I have done tons of presentations; how many people have been at those? And then way down here, I blog. I spend a lot of time blogging, but you would not know that.
Your total impact... we just pushed to Heroku; our main URL doesn't redirect here.
http://total-impact-webapp.herokuapp.com/
My major proceedings articles, my journal articles including my PLOS ONE papers, and what I pushed to GitHub: the Total Impact source code itself.
http://github.com/jasonpriem/total-impact
I put these on the Total Impact page, and here is the Total Impact result. So, here are the articles and their metrics: from PLOS, the HTML views and full-text views; the number of people on Mendeley who have read it; tweets on Twitter; Facebook likes; the number of times my datasets have been viewed in the data repository or bookmarked on Mendeley; the people watching the repository on GitHub and how many people have tweeted about it; for slides, how many people have viewed them on SlideShare or saved them to Delicious; webpages... we need to add more data about webpages, like how many people have read that article from the Chronicle of Higher Education. You can click on these and drill down into who is tweeting and what they're saying.
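Behind a page like that is essentially one record per research product, with a bag of provider metrics attached. The shape below is purely illustrative (the keys and numbers are assumptions for the sketch, not the actual Total Impact schema), but it shows the kind of structure the report is rendering:

```python
# Hypothetical shape of one "item plus metrics" record of the kind the
# demo page displays; the keys and numbers are illustrative only.
record = {
    "item": {"type": "article", "id": "doi:10.1371/journal.pone.0012345"},
    "metrics": {
        "plos:html_views": 1243,
        "mendeley:readers": 57,
        "twitter:tweets": 18,
        "facebook:likes": 9,
    },
}

def summarize(rec):
    """Print one line per provider metric, the way the report lists them."""
    for name, value in sorted(rec["metrics"].items()):
        provider, metric = name.split(":")
        print(f"{provider:>10}  {metric:<12} {value}")

summarize(record)
```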
Total Impact... we want to build a web app, and it has to be embeddable, so this is what it looks like when embedded: you pull in our JavaScript, and the DOI will automatically go in and display that information. We told them it was early code, use at your own risk, but they embedded it anyway, already, on Peer Evaluation. We've been funded by the Open Society Foundation. We started it at a hackathon, then there were 6 of us, then 3, and Jason and I decided that we really wanted to run with it. We have a Sloan Foundation grant for $130k to work on this for a year. You can follow us on our blog, and we will show a live demo since it was working on Bryan's computer.
The Sloan Foundation... where is this money coming from? What are the business models? Why do we need it? This metrics infrastructure is exactly that: this is next-generation science. Once people have all these metrics, they are putting them on their résumés. But it's so much more than that. What is peer review? It's assessment by the people in your social circle. If, when you wake up in the morning, I can tell you the one article to read today, based on what the 20 people you follow most on Twitter have been bookmarking and discussing, why would you ever open up a journal again? You would have up-to-date advice. So between recommendation systems and metrics, I think this is what science needs to move forward. It needs to be open; anyone should be able to look at it and mine it, so we're on a social-entrepreneurship model. We have foundation funds, and we want to keep working on that and on ways to make it more sustainable. We want metrics infrastructure to be available to everyone else; they can add their own secret sauce, or open sauce, which we would like even more. The metrics are out there, and you can build open science on this.
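The "one article to read this morning" idea can be sketched very simply: rank articles by how many of the people you follow have bookmarked or tweeted them. A toy illustration with made-up names and IDs, not a description of any existing system:

```python
from collections import Counter

# Toy data: which articles the people you follow have bookmarked or tweeted.
activity = {
    "alice": ["doi:10.1/aaa", "doi:10.1/bbb"],
    "bob":   ["doi:10.1/aaa"],
    "carol": ["doi:10.1/ccc", "doi:10.1/aaa"],
}

def top_pick(activity_by_person, n=1):
    """Rank articles by how many followed people engaged with them."""
    counts = Counter(doi for dois in activity_by_person.values() for doi in dois)
    return counts.most_common(n)

print(top_pick(activity))  # [('doi:10.1/aaa', 3)]
```

A real recommender would also weight who did the engaging and when, but the raw material is exactly the open metrics described above.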
Is this a Klout for science, a Klout for scientists, or... is that a bad thing? I think, Klout for science... we've been throwing that around. I think there's a lot of terrible things about Klout; we want to get away from that, we want to be open infrastructure that other people can build rating systems on top of. Google was able to build, in many ways, the web we have today because they could crawl stuff; we want to be the open web that people can build on top of us. We will probably build some metrics ourselves because it's fun.
So you have this metrics base, a measure of how important things are, and a recommendation system which may or may not leverage it. I just looked it up, and the research around whether there is a relationship between tweets and citations seems to be controversial. What is your understanding of the state of that relationship? How do you think this relates to recommendations?
The research is in its infancy. If you look at citation research: it took Eugene Garfield 10 years to convince anyone to fund him to create a citation index, and then another 10 years to convince anyone that it was useful. He was able to predict Nobel Prize winners after a while, and people came to accept that citations track impact. Tracking Mendeley, Twitter, and these correlations across different samples is a big field; I think there's evidence of correlation. But over the long term, the bigger win is not the correlation with citations, it's providing a broader picture of impact than citations alone. Citations are the intentional tracks: you say, this is what influenced me. But what actually influenced you is the informal communication. Tracking that informal communication will be much deeper and broader.
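The studies mentioned here typically boil down to a rank correlation between early altmetric counts and later citation counts for a set of articles. A minimal sketch with made-up numbers, assuming SciPy is available:

```python
from scipy.stats import spearmanr

# Made-up counts for ten articles: early tweets vs. later citations.
tweets    = [12, 0, 3, 45, 7, 1, 0, 22, 5, 9]
citations = [ 8, 1, 2, 30, 4, 0, 2, 15, 3, 6]

rho, p_value = spearmanr(tweets, citations)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```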
The story is easier if it correlates perfectly with citations, and it's better if they don't, because then they measure different things. The correlation isn't perfect, and that allows us to identify different types of scholarly impact. Can I take a GitHub ID? Okay, kanzure. SlideShare? williamgunn.
Here are some random PLOS ONE IDs... let's see here. There's a bug here... it takes a while; it's going out to these providers to get things, going to Topsy for Twitter, going to PLOS, and pulling these in in real time. Oh look, this software has been tweeted and facebooked.
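The wait in the demo is the app querying each provider live and merging the answers. Conceptually it is something like the sketch below, where the provider functions and their responses are stand-ins rather than the real integrations:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in provider calls; the real app queries each provider's API over HTTP.
def fetch_mendeley(doi):
    return {"mendeley:readers": 42}

def fetch_twitter(doi):
    return {"twitter:tweets": 7}

def fetch_plos(doi):
    return {"plos:html_views": 980}

PROVIDERS = [fetch_mendeley, fetch_twitter, fetch_plos]

def collect(doi):
    """Query all providers in parallel and merge their metrics."""
    metrics = {}
    with ThreadPoolExecutor(max_workers=len(PROVIDERS)) as pool:
        for result in pool.map(lambda fetch: fetch(doi), PROVIDERS):
            metrics.update(result)
    return metrics

print(collect("10.1371/journal.pone.0012345"))
```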
What about the invisible elephant in the room: the implicit bias toward certain personality types (the Myers-Briggs sort), since not everyone is into Twitter and Facebook? Our community is very heavily involved; I think we're at the tip of the iceberg. I think there's lots of relevant work that isn't entering this stream, and that's a big motivation for starting these metrics in the first place... the problem you just described is also the problem with current citations.
The amount of money that NSF puts into identifying research is vast. We want diversity to be rewarded. Plum Analytics is doing this too; they are targeting press-release officers. As for the point that perhaps not everyone is going to go out and write a blog post and tweet: if people see your paper, other people tweet about it, and that's where the metrics come from. It's not about whether you're blogging, it's about how much impact your papers are having on the community. A lot of academics don't use the internet; the other people will do the spreading for you.
What if you use the internet not to push, but to bookmark on Delicious or CiteULike, or you just use it because you want those lists synced between your two computers? Then that data is also there for us to include, same with download stats and other things. Some people mentioned the idea of one journal to rule them all: instead of balkanizing, one huge multi-silo thing. But we don't want to trust the entirety of scholarly communication to one super-organization. We don't want one giant organization; we want hundreds of thousands, and then to aggregate them. I don't want a company to do the web, I want a set of standards to build things on top of. Let's listen to the conversation where it is. I like the idea of listening: you don't just bookmark something... you bookmark it, then Mendeley takes care of the rest. Instead of asking people to abstract articles, let's listen to what's out there. It's no secret that the early work at NSF was on citation graphs... listen to what people are already talking about, don't ask them.
It seems that there's a very big difference between the closed, walled-garden universe that we started with, with the journals, and the open architecture that you're suggesting, aggregating a lot of references and elements. You are implying a data-driven process, much like the way Google does analysis of things. How far are you planning on taking the data-driven approach? Could you talk to us about that... are you looking at things inside papers, common phrases, referenced items? How do you introduce new references? Say a phrase appears in papers, like "this inspired me to do X", which becomes very popular, and maybe people want to trace that back.
I don't think I can address all of that, but on text mining: I am very interested in text mining, and the text around citations, the reasons for attribution, is important. Part of my interest is whether, if something is cited, it is cited in the context of data reuse, because someone reused the data. You might want to look at lots of other attributes of attributions as well. The problem has really been lack of access to articles for doing text mining. We are busy trying to push on licensing and rights and laws and things to help with that. I think there will be tools that benefit from that. It's certainly something we're interested in.
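A toy sketch of what classifying a citation context could look like: take the sentence around the citation and look for reuse language. The cue phrases and example sentences here are illustrative only, not a validated classifier:

```python
REUSE_CUES = ("reused", "re-analyzed", "obtained the data", "dataset from")

def classify_citation_context(sentence):
    """Crude guess at whether a citing sentence describes data reuse."""
    text = sentence.lower()
    return "data reuse" if any(cue in text for cue in REUSE_CUES) else "other"

examples = [
    "We obtained the data from Smith et al. [12] and re-analyzed it.",
    "Similar methods were described previously [12].",
]
for sentence in examples:
    print(classify_citation_context(sentence), "-", sentence)
```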
The concept of the semantic web and microformats might allow something along those lines. In science it's not just references per se, because references are too dilute; they don't really call out specific information. How do you call out information so it can be added into the master view of impact that you are trying to build?
So if someone makes an API we'd love to consume it.