Why we do it

Jonathan Izant

Sage Bionetworks

www.sagebase.org

The earlier speakers have given context to the title. Why do we do stuff at
Sage? We do it to try to cure diseases. As one of the comments put it- once we
get all this data, how do we figure out how to use it? Quick point. I noticed
on the schedule that Sage was down in all capitals. It should be Sage, and
that should be used; it's not an acronym. SAGE is serial analysis of gene
expression. We actually asked him if he minded if we are called Sage. We are
trying to avoid being confused with Survivors against global exploitation,
which is a group trying to protect people from slave trade. With that, let's
jump right in with what I want to talk about. In terms of context, why.

A lot of it has been touched on by the things this morning. On many of the big
issues that we face in developing truly effective advanced drugs, many of us are
in an incredibly chronic state of denial. Specifically, at this point,
in this young stage of genomics, we don't know how to use genomic
information in its current form, and the huge amounts that are going to be
coming, to make good decisions about clinical care, not to mention mechanisms
of disease and drug action. We heard this morning about pharma and biotech-
the drug development pipeline process is very inefficient and broken.

Standards of care- we heard some interesting things earlier, about people
refusing the first choice care options. Why would they do this? If you look at
cancer, there are many forms of cancer for which as many as 60% of patients
find that the recommended standard-of-care first-line treatment gives them no
benefit. There's this majority of patients who have been getting treatments
that have had various side effects and costs. We need to fix how we use
information about clinical care as well.

The paradoxical thing is that often, as we look for people to blame, we find
ourselves looking in a mirror. We think it's academics and academic
enterprises where we find blocks to really doing things. I want to start with
a genetics timeline and some of the traditions that have made things maybe more
difficult than they might be. Across the top we have some of the heroes. On
the right, that's Oswald Avery, who showed that the substance of DNA really
does have heritable properties. Importantly, on the bottom are the icons- peas
from Gregor Mendel, a bacteriophage, and gene chips where we get massive
amounts of information. Many of us, and I have to include myself, grew up on
how genes give you phenotypes and how they give you disease. Starting with
Mendel, it was one mutation, one change in the phenotype. There was a lot of
great work done by Mendel. There were separate pathways at the start of
biochemistry networks. Everything was linear, everything in a nice siloed
fashion. This meant that as we looked at gene regulation a number of years
ago, people thought of these as simple linear pathways, and there was a huge
amount of work on identifying these steps. One of the confusing things was
these meetings where it would be on cell proliferation, and speaker after
speaker would show the same slide, but with different genes in each of those
boxes. They can't all be regulating, but each of these things, because of the
nature of the tools, was being looked at in this single-gene-pathway type of
fashion. Well, gradually, people began to look at how networks of genes
worked, and this was how the Wnt pathways worked. You could see interaction,
molecular phenomena, genetics, and a few pathways that talked with each other.

As the Genome Project and some of the tech for measuring DNA variation and
expression variation proceeded, you could develop complex systems where you can
find a whole number of different genes, but you have this confusing field when
it comes to the development of therapeutics: where is my target, which one is
the best? This is very much the genesis of what we've been trying to do.
When you think about what you can do with genetic information, there are
genome-wide association studies, with causal inference, but no sort of
mechanism. There are association studies, correlation studies, looking at
patterns of gene expression off of gene chips, and they are good, but they
don't show cause and effect. You have huge numbers of potential targets. The
important aspect of this is that biological networks are highly responsive and
flexible. When people started doing knock-out mutations, they found there's
lots of redundancy, and even if you had a good drug, and you knocked down a
regulator, well, something else came in to compensate. How do you find the
weak points and sweet spots in the genetic networks to better target your drugs?

One of the things I am going to tell you about is some work that we've been
doing, ways of doing complex analysis of DNA variation, RNA expression, protein
expression, along with phenotypic information, which in a clinical setting
means clinical data, and integrating these in order to get some causal models.
All of this started with Steve Friend and Lee Hartwell and something called
the Seattle Project a number of years ago, to see if they could use large
amounts of genetic information. This was Rosetta Inpharmatics. It was
eventually bought by Merck, and they hired Eric Schadt, who came in and started
a 5 year program at Rosetta, with very large resources, taking a wide variety of
ostensibly unrelated data and integrating them with Bayesian network technology,
in order to make predictive models of where in these gene networks one might
want to go in to perturb things to create predictable therapeutics. This went
on for a number of years, building some of the tech and expertise: creating
Bayesian networks, doing small perturbations of everything in your whole
network, recomputing things. It's very computationally intensive, and it's one
of the reasons why these things are expensive. The hypothesis is great, but
unless you can validate it in animal models or drugs under development, it's
no good. Then we looked at opportunities to validate and to publish in
journals, both for specific diseases, such as metabolic disease, and for
general methods, and they were able to find that, my goodness, maybe this
stuff really works. Maybe we can use computational biology and network theory
in order to make some detailed priority list of which genes are important.
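
To make the "perturb everything, recompute, rank" idea concrete, here is a minimal sketch. It is not the Rosetta or Sage method and it is not a Bayesian network: it is a toy linear propagation over a tiny directed network. The gene names, edge weights and the `simulate` and `rank_targets` helpers are all hypothetical. It only illustrates how knocking down each node in turn and recomputing the output gives a priority list, and how a partly redundant regulator softens the effect of a knock-down.

```python
# Hypothetical toy network: each edge weight says how strongly a
# regulator drives its target. These values are invented for illustration.
EDGES = {
    ("gene_a", "gene_c"): 0.8,
    ("gene_b", "gene_c"): 0.7,   # gene_b is partly redundant with gene_a
    ("gene_c", "phenotype"): 1.0,
    ("gene_d", "phenotype"): 0.2,
}
# Nodes listed in topological order, so every regulator precedes its targets.
ORDER = ["gene_a", "gene_b", "gene_d", "gene_c", "phenotype"]
# Baseline activity of the upstream genes (assumed values).
BASELINE = {"gene_a": 1.0, "gene_b": 1.0, "gene_d": 1.0}


def simulate(knocked_down=None):
    """Propagate activity through the network; return the phenotype level."""
    level = {}
    for node in ORDER:
        if node == knocked_down:
            level[node] = 0.0  # perturbation: silence this node entirely
            continue
        incoming = sum(
            weight * level[src]
            for (src, dst), weight in EDGES.items()
            if dst == node
        )
        level[node] = BASELINE.get(node, 0.0) + incoming
    return level["phenotype"]


def rank_targets():
    """Knock down each gene in turn and rank genes by phenotype change."""
    reference = simulate()
    effects = {
        gene: reference - simulate(knocked_down=gene)
        for gene in ORDER
        if gene != "phenotype"
    }
    return sorted(effects.items(), key=lambda item: -item[1])


if __name__ == "__main__":
    for gene, effect in rank_targets():
        print(f"{gene}: phenotype drops by {effect:.2f}")
```

In this toy, knocking down `gene_a` alone only partly lowers the phenotype because `gene_b` still drives `gene_c`, while `gene_c` sits at a point with no such backup. Real networks are far larger and the recomputation is probabilistic, which is where the computational expense mentioned above comes from.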

Diseases such as cancer are highly diverse diseases; they are diverse between
and within patients. This gives us an opportunity to do sophisticated
individual medicine. We had some of these stunning technologies- one of our
technical words- HEAPS OF DATA- and we were looking for non-redundant genes that
would be better targets for biomarkers and therapeutic intervention. And we,
as well as others, are developing ways of handling these massive amounts of
information. This is the essence of Sage. The mission is to try and create a
publicly accessible database repository of these bionetworks, where people can
come in and are obliged to leave data, to leave the database richer by adding
stuff in, better than they found it. You have to combine databases; it makes a
real difference to have more.

We're new. We're a new NPO. We've been going for over a year. Some of the
noticeable activity over the last year is across the top. We lease offices from
the Cancer Research Center. We've been fortunate to receive some rather
inspired catalytic philanthropic funding and we've been fortunate with large
traditional grants, and we've had to go through the process of becoming a new
institution in NIH land, and that's a joy in itself. A year ago, I had a full
head of dark brown hair, and now I don't. We're making good progress.

We are small and we plan to stay small. We see ourselves working through
partners. Some of our partners are commercial groups. Many of the commercial
groups are some of our best partners as far as open access and sharing. Many
of you are aware that Merck and Lilly and others overseas are opening up a lot
of early stage research, because they have come to realize that it's very
difficult if not impossible to put real monetary value on early stage
research. We've worked with academic collaborators, and the National Cancer
Institute is giving us funds to set up a new systems biology for cancer
institute. The commons is the core feature. There's a bit of a description on
the website. I hate to say go see the website, but let's just say that this is
an emerging, developing resource where anyone can come and get highly annotated,
globally coherent data. That's the key- globally coherent. It means data for
which you have multiple layers of information- clinical, DNA, DNA variation-
and you have it correlated, so that you can get the full richness of the
biological networks. We have it stratified. Some datasets are completely
available; some of you are downloading them in the back, you're not just
checking Facebook. Others need annotation or things requiring release, IRB and
other sorts of challenges, and there's a large set of datasets that are
in progress that would be great additions to this repository. We have some
tools; other groups have some great tools as well. We have some that we
developed, and they are compatible with the standards we've developed.

There were some challenges from the Congress meeting in April with 220 people.
They were scientists, IT experts, networking experts, funders, policy experts
and publishers. Out of it came a number of work groups, none of them I should
say, are shared by people with... so this is getting to our external
partnerships, and it's acting as a catalytic enabler: standards, tools to use
these databases, citation. No surprise to anyone, citation was one of the most
emotive issues we had at our Congress as well, and that's the heart of
academics. There were internationalization issues. Public engagement- how do
we engage and inform the public, the progressive patient advocacy groups? What
are the barriers?

We decided to get empirical, not just vague. We sent out a questionnaire, and
we got a lot of comments; these are verbatim, including the misspellings.
Visualization tools. Lots of barriers. One of the things that stuck out is,
well, even in today's age, where you have these policies that everyone who
submits an NIH grant has to put in something about resource sharing, how easy
is it to get data, and how easy is it to get data you can use? It turns out
it's still pretty hard. Over 80% of the people felt that most of the data was
held in the laboratories where it was generated. That doesn't mean it's not
accessible. What percentage of clinical and genomic data is accessible, that's
in the blue. What percent is accessible in a form that you can use today, to
build networks, or use with your data? Most people said that most data was
not available to them.

Well, why? We did another study. Alex Pico sent out a question to everyone
who went to the congress. We asked them to mention every professional link:
would you meet them, have you, would you not care to meet them? We put this
into a database, then used some software commonly used to look at genetic
databases. Can we model individual researchers as genes, and see where the
regulatory nodes are, how well our people are interacting, etc.? Well, this is
not good social science research, but it was fun and informative. Circles show
how many links; the biggest circles were the people who organized the meeting.
As you start to tease this apart into different types of people at the
conference, some groups, such as the network thinkers, were highly
interactive. Some people, like scientists, were not really interactive.
IT people were interested in talking to public policy, funders and so on, who
weren't that interested in talking to them.
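
As an illustration of the kind of analysis described above, here is a minimal sketch that treats survey answers as a network and sizes each person by their number of links. The names, the groups and the helper functions are hypothetical, and the data is invented; the actual study used existing network software rather than anything like this.

```python
from collections import Counter, defaultdict

# Hypothetical survey answers: (respondent, person they report a link to).
LINKS = [
    ("organizer", "scientist_a"),
    ("organizer", "it_expert"),
    ("organizer", "funder"),
    ("organizer", "network_thinker"),
    ("network_thinker", "scientist_a"),
    ("network_thinker", "it_expert"),
    ("it_expert", "funder"),
]

# Hypothetical attendee categories.
GROUPS = {
    "organizer": "organizers",
    "scientist_a": "scientists",
    "it_expert": "IT",
    "funder": "funders",
    "network_thinker": "network thinkers",
}


def degree_counts(edges):
    """Count distinct neighbours per person, treating links as undirected."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {person: len(others) for person, others in neighbours.items()}


def group_link_counts(edges, group_of):
    """Count links between each unordered pair of attendee groups."""
    counts = Counter()
    for a, b in edges:
        pair = tuple(sorted((group_of[a], group_of[b])))
        counts[pair] += 1
    return counts


if __name__ == "__main__":
    degrees = degree_counts(LINKS)
    # Largest "circles" first; ties broken alphabetically.
    for person in sorted(degrees, key=lambda p: (-degrees[p], p)):
        print(person, degrees[person])
    print(dict(group_link_counts(LINKS, GROUPS)))
```

Treating links as undirected is a simplification. The one-sided interest mentioned above, where IT people wanted to talk to funders more than the reverse, would need the direction of each answer to be kept.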

Thinking about the science- how do we get there? Biomedical science is a
cottage industry. People had their individual lab in their individual
department or village, townships and universities or something else like that.
The motivations and the scorekeeping and the success metrics were around how
that one little cottage succeeded or failed the grand.. and to a large extent,
that kind of tradition has stayed on in this very complex ecosystem in many
university settings, and it's what complicates tech transfer, promotion,
hiring and the sorts of things that we have. I am not going to go through this
in detail, but yes, it's tough, and this is where some of the resistance is.
Faculty, tenure, etc. Between labs, people do great.

One of the people that spoke, Josh Sommer, had a great way to talk about this:
if only we could transfer information between labs as rapidly as it moves
within a lab, we would be miles ahead. He published a wonderful piece about
how this is costing lives in Nature Biotech.

One of the things that isn't there anymore is holding on to individual genes.
This is a piece of pseudodata, one of those inverse Moore's laws,
and it looks at the big payoffs over time, like growth hormone, insulin, but
then gradually, as we go into the century, individual genes, you couldn't get a
price for them. Some of them are deals I negotiated back in a
former lifetime, when I was doing tech transfer for a large research institute
and doing startups and stuff. The price per gene is insignificant. It's
patterns of gene expression, and complex patterns. What are the incentive
issues? We looked back at the people at the congress. They had a number of
different thoughts about high priority incentive issues. We started talking
with like-minded labs, and we started an experiment called The Federation. We
can't ask others to enter this world unless we do it too. We have four other
labs, where we are going to try to fuse the genomic database activities
so that your data is someone else's data, fusing their access to tools,
and see whether we can get a consortium where, whether you're an academic
collaborator or whether you're an industrial collaborator, you can't see the
difference between those. We're just setting up now, but we feel that we can
make this experiment work, and what we can learn from this we can better apply
with others.

We're focused on improving treatments for disease; it's not just an open
access project. We're working catalytically with a number of other
partnerships, and a number of the things that have been described as evil
forces at the congress are some of our biggest challenges. People see that
there's motivation to get on board, because if they don't they are going to
lose out on opportunities down the line. Thank you very much.