Why we do it

Jonathan Izant

Sage Bionetworks

www.sagebase.org

The earlier speakers have given context to the title. Why do we do stuff at Sage? We do it to try to cure diseases. As one of the comments put it: once we get all this data, how do we figure out how to use it? Quick point. I noticed on the schedule that Sage was written in all capitals. It should be Sage, and that should be used; it's not an acronym. SAGE is serial analysis of gene expression. We actually asked him if he minded if we are called Sage. We are trying to avoid being confused with Survivors Against Global Exploitation, which is a group trying to protect people from the slave trade. With that, let's jump right in with what I want to talk about. In terms of context, why.

A lot of it has been touched on this morning. On many of the big issues that we face in developing truly effective advanced drugs, many of us are in an incredibly chronic state of denial. Specifically, at this point, in this young stage of genomics, we don't know how to use genomic information in its current form, or the huge amounts that are going to be coming, to make good decisions about clinical care, not to mention mechanisms of disease and drug action. We heard this morning about pharma and biotech: the drug development pipeline is very inefficient and broken.

Standards of care: we heard some interesting things earlier about people refusing the first-choice care options. Why would they do this? If you look at cancer, there are many forms of cancer where, for as many as 60% of patients, the recommended standard-of-care first-line treatment gives them no benefit. There's this majority of patients who have been getting treatments that have had various side effects and costs. We need to fix how we use information about clinical care as well.

The paradoxical thing is that often, as we look for people to blame, we find ourselves looking in a mirror: it's academics and academic enterprises where we find blocks to really doing things. I want to start with a genetics timeline and some of the traditions that have made things maybe more difficult than they might be. Across the top we have some of the heroes. The person on the right, that's Oswald Avery: the substance of DNA really does have heritable properties. Importantly, on the bottom are the icons: peas from Gregor Mendel, a bacteriophage, and gene chips, where we get massive amounts of information. Many of us, myself included, grew up on how genes give you phenotypes and how they give you disease. Starting with Mendel, it was one mutation, one change in the phenotype. A lot of great work was done by Mendel. At the start of biochemistry networks, there were separate pathways. Everything was linear, everything in a nice siloed fashion. This meant that when we looked at gene regulation a number of years ago, people thought in terms of these simple linear pathways, and there was a huge amount of work on identifying these steps. One of the confusing things was these meetings on, say, cell proliferation, where speaker after speaker would show the same slide, but with different genes in each of those boxes. They can't all be regulating, but each of these things, because of the nature of the tools, was being looked at in this single-gene-pathway type of fashion. Well, gradually, people began to look at how networks of genes worked, and this was how the Wnt pathways worked. You could see interactions, molecular phenomena, genetic, and a few pathways that talked with each other.

As the Genome Project and some of the tech for measuring DNA variation and expression variation proceeded, you could develop complex systems where you can find a whole number of different genes, but you have this confusing field when it comes to the development of therapeutics: where is my target, which one is the best? This is very much the genesis of what we've been trying to do. When you think about what you can do with genetic information, there are genome-wide association studies, with causal inference, but no sort of mechanism. There are association studies, correlation studies, looking at patterns of gene expression off of gene chips, and they are good, but they don't show cause and effect. You have huge numbers of potential targets. The important aspect of this is that biological networks are highly responsive and flexible. When people started doing knock-out mutations, there was lots of redundancy, and even if you had a good drug and you knocked down a regulator, well, something else came in to compensate. How do you find the weak points and sweet spots in the genetic networks to better target your drugs?

One of the things I am going to tell you about is some work that we've been doing, ways of doing complex analysis of DNA variation, RNA expression, protein expression, along with phenotypic information, which in a clinical setting means clinical data, and integrating these in order to get some causal models. All of this started with Steve Friend and Lee Hartwell and something called the Seattle Project a number of years ago, to see if they could use large amounts of genetic information. This was Rosetta Inpharmatics. It was eventually bought by Merck, and they hired Eric Schadt, who came in and started a 5-year program at Rosetta, with very large resources, taking a wide variety of ostensibly unrelated data and integrating them with Bayesian network technology, in order to make predictive models of where in these gene networks one might want to go in to perturb things to create predictable therapeutics. This went on for a number of years, building some of the tech and expertise. Creating Bayesian networks, doing small perturbations of everything in your whole network, recomputing things: it's very computationally intensive, and it's one of the reasons why these things are expensive. The hypothesis is great, but unless you can validate it in animal models or drugs under development, it's no good. Then we looked at opportunities to validate, which we can publish in journals, both for specific diseases, such as metabolic disease, and for general methods, and they were able to find that, my goodness, maybe this stuff really works. Maybe we can use computational biology and network theory in order to make some detailed priority list of which genes are important.

Diseases such as cancer are highly diverse; they are diverse between and within patients. This gives us an opportunity to do sophisticated individual medicine. We had some of these stunning technologies, and, to use one of our technical terms, HEAPS OF DATA. We were looking for non-redundant genes that would be better targets for biomarkers and therapeutic intervention. And we, as well as others, are developing ways of handling these massive amounts of information. This is the essence of Sage. The mission is to try to create a publicly accessible repository of these bionetworks, where people can come in and are obliged to leave data, to leave the database richer by adding stuff in, better than they found it. You have to combine databases; it makes a real difference to have more.

We're new. We're a new NPO. We've been going for over a year. Some of the noticeable activity over the last year is across the top. We lease offices from the Cancer Research Center. We've been fortunate to receive some rather inspired catalytic philanthropic funding, and we've been fortunate with large traditional grants, and we've had to go through the process of becoming a new institution in NIH land, and that's a joy in itself. A year ago, I had a full head of dark brown hair, and now I don't. We're making good progress.

We are small and we plan to stay small. We see ourselves working through partners. Some of our partners are commercial groups. Many of the commercial groups are some of our best partners as far as open access and sharing. Many of you are aware that Merck and Lilly and others overseas are opening up a lot of early-stage resources, because they have come to realize that it's very difficult, if not impossible, to put real monetary value on early-stage resources. We've worked with academic collaborators, and with the National Cancer Institute, which is giving us funds to set up a new systems biology for cancer institute. The Commons is the core feature. There's a bit of a description on the website. I hate to say go see the website, but let's just say that this is an emerging, developing resource where anyone can come and get highly annotated, globally coherent data. That's the key: globally coherent. It means data for which you have multiple layers of information, clinical, DNA, DNA variation, and you have it correlated, so that you can get the full richness of the biological networks. We have it stratified. Some datasets are completely available; some of you are doing that in the back, you're not just checking Facebook. Others need annotation or require release, with IRB and other sorts of challenges, which we are spending time on and will continue to. There's a large set of datasets in progress that would be great additions to this repository. We have some tools; other groups have some great tools as well. The ones we developed are compatible with the standards we've developed.

There were some challenges from the Congress meeting in April with 220 people. They were scientists, IT experts, networking experts, funders, policy experts and publishers. Out of it came a number of work groups, none of them, I should say, are shared by people with... So this is getting to our external partnerships, and it's acting as a catalytic enabler: standards, tools to use these databases, citation. No surprise to anyone, citation is one of the most emotive issues we had at our Congress as well, and that's the heart of academics. There were internationalization issues. Public engagement: how do we engage and inform the public, the progressive patient advocacy groups? What are the barriers?

We decided to get empirical, not just vague. We sent out a questionnaire, and we got a lot of comments; these are verbatim, including the misspellings. Visualization tools. Lots of barriers. One of the things that stuck out is, well, even in today's age, where you have these policies that everyone who submits an NIH grant has to put in something about resource sharing, how easy is it to get data, and how easy is it to get data you can use? It turns out it's still pretty hard. Over 80% of the people felt that most of the data was held in the laboratories where it was generated. That doesn't mean it's not accessible. What percentage of clinical and genomic data is accessible, that's in the blue. What percentage is accessible in a form that you can use today, to build networks or use with your data? Most people said that most data was not available to them.

Well, why? We did another study. Alex Pecko sent out a question to everyone who went to the Congress. We asked them to mention every professional link: would you meet them, have you met them, would you not care to meet them? We put this into a database, then used some software commonly used to look at genetic databases. Can we model individual researchers as genes, and see where the regulatory nodes are, how well our people are interacting, etc.? Well, this is not good social science research, but it was fun and informative. Circles show how many links; the biggest circles were the people who organized the meeting. As you start to tease this apart into the different types of people at the conference, some groups, such as the network thinkers, were highly interactive. Some people, like scientists, were not really interactive. IT people were interested in talking to public policy people, funders and so on, who weren't that interested in talking to them.
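To make the "circles show how many links" idea concrete, here is a minimal sketch of the counting behind such a picture. It is not the software or the data used for the Congress survey: the names, roles, and links below are invented placeholders, and treating every reported link as undirected is a simplifying assumption.

```python
from collections import Counter, defaultdict

# Hypothetical survey answers: (respondent, person they report a link to).
# These names are placeholders, not Congress attendees.
links = [
    ("ana", "ben"), ("ana", "cy"), ("ana", "dee"),
    ("ben", "cy"), ("dee", "eli"),
]

# Hypothetical role labels for each person.
roles = {
    "ana": "organizer",
    "ben": "network thinker",
    "cy": "network thinker",
    "dee": "scientist",
    "eli": "IT",
}


def degree_counts(pairs):
    """Count distinct partners per person, treating links as undirected.

    The count plays the part of the circle size in the talk's figure.
    """
    neighbors = defaultdict(set)
    for a, b in pairs:
        if a != b:  # ignore self-links
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {person: len(found) for person, found in neighbors.items()}


def mean_degree_by_role(degrees, role_of):
    """Average the link counts within each role group."""
    totals, sizes = Counter(), Counter()
    for person, role in role_of.items():
        totals[role] += degrees.get(person, 0)
        sizes[role] += 1
    return {role: totals[role] / sizes[role] for role in sizes}


degrees = degree_counts(links)
by_role = mean_degree_by_role(degrees, roles)

if __name__ == "__main__":
    for role, mean in sorted(by_role.items()):
        print(f"{role}: {mean:.1f} links on average")
```

Keeping the links directed instead (who wants to talk to whom) would be the natural next step for capturing the one-sided interest described for the IT people.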

Thinking about the science: how do we get there? Biomedical science is a cottage industry. People had their individual age in their individual apartment or village, townships and universities or something else like that. The motivations and the scorekeeping and the success metrics were around how that one little cottage succeeded or failed the grand.. and to a large extent, that kind of tradition has stayed into this very complex ecosystem in many university settings, and it's what complicates tech transfer, promotion, hiring and the sorts of things that we have. I am not going to go through this in detail, but yes, it's tough, and this is where some of the resistance is. Faculty, tenure, etc. Between labs, people do great.

One of the people that spoke, Josh Sommer, had a great way to talk about this: if only we could transfer between labs as rapidly as within a lab, we would be miles away. He published a wonderful piece about how this is costing lives in Nature Biotech.

One of the things that isn't there anymore is holding on to individual genes. This is a piece of pseudodata, one of those inverse Moore's laws, and it looks at the big payoffs over time, like growth hormone and insulin, but then gradually, as we go into this century, individual genes, you couldn't get a price for them. Some of them are deals I negotiated back in a former lifetime, where I was doing tech transfer for a large research institute and doing startups and stuff. The price per gene is insignificant. It's patterns of gene expression, and complex patterns. What are the incentive issues? We looked back at the people at the Congress. They had a number of different thoughts about high-priority incentive issues. We started talking with like-minded labs, and we started an experiment called The Federation. We can't ask others to enter this world unless we do it too. We have four other labs, where we are going to try to fuse the genomic database activities so that your data is someone else's data, fusing their access to tools, and see whether we can get a consortium where, whether you're an academic collaborator or an industrial collaborator, you can't see the difference between those. We're just setting up now, but we feel that we can make this experiment work, and what we learn from this we can better apply with others.

We're focused on improving treatments for disease; it's not just an open access project. We're working catalytically with a number of other partnerships. A number of the things that have been described in the Congress as evil forces are some of our biggest challenges. People see that there's motivation to get on board, because if they don't, they are going to lose out on opportunities down the line. Thank you very much.