Lessons learned: from rEcoli to 1 million genomes

George Church

https://www.youtube.com/watch?v=6nirrqtUSUk&list=PLHpV_30XFQ8R2Kpcc1pwwXsFnJOCQVndt&index=2&t39m50s

So this is my conflict of interest slide. And the question I'm asking is: why make millions or billions or trillions of synthetic genomes? I think that is why we want to bring down the cost. This is our progress curve for reading and writing DNA. We are now at the point where we can make synthetic oligonucleotides covering the entire human genome, and in fact I think this has already been done for various purposes, like SNP analysis, expression analysis, and so on. Assembly is another matter, which is this even lower curve here. There's a gap between the amount we can sequence and the amount we can synthesize, and then an even larger gap between what we can write and what we can assemble. And the more radical the changes we make in the genome, the bigger that gap gets, as you will hear from the yeast and E. coli projects today.

Making billions of genomes combinatorially is routine. It was fairly routine in 2009, when Harris Wang et al. (2009) did an experiment in which we made 4.3 billion combinatorial genome variants per day. They weren't just random; they were highly designed to optimize the ribosome binding sites in a pathway. If we go forward to today, we have 8 major tools for genome engineering and editing, including group II introns, Cas9, meganucleases, recombinases, zinc fingers, TALEs, RecA (CAGE), and lambda Beta/Exo (MAGE). They use either DNA, RNA, or protein as the matter that goes and finds the needle in the haystack, finds that 1 bp out of 6 billion and edits it. We have crystal structures for most of these, but not, to my dismay, for lambda Beta. We still have a way to go before all of them are as easy to manipulate.
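
To make the combinatorics concrete, here is a minimal Python sketch of the design-space size encoded by degenerate MAGE oligos, in the spirit of the Wang et al. 2009 RBS optimization. The degenerate RBS pattern and the number of targeted sites below are hypothetical, chosen only to illustrate the arithmetic, not taken from the actual design.

```python
# Back-of-the-envelope size of a combinatorial MAGE library built from
# degenerate oligos. The RBS pattern and site count are hypothetical.

IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

def variants_per_oligo(degenerate_seq: str) -> int:
    """Number of distinct sequences one degenerate oligo encodes."""
    count = 1
    for base in degenerate_seq.upper():
        count *= IUPAC_DEGENERACY[base]
    return count

# Hypothetical degenerate ribosome-binding-site region: a mix of 3-fold
# (D) and 2-fold (R) degenerate positions around a Shine-Dalgarno core.
rbs_pattern = "DDRRRRRDDDD"
per_site = variants_per_oligo(rbs_pattern)   # 3**6 * 2**5 = 23,328

n_sites = 20   # e.g. one tuned RBS per gene in a 20-gene pathway

# The full design space; what one MAGE run actually samples is bounded
# by the population size (roughly 1e9 cells per culture).
print(f"variants per RBS site: {per_site}")
print(f"total design space:    {float(per_site) ** n_sites:.3e}")
```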

The project that will be presented in a moment started where we changed one codon genome-wide in E. coli, and in that project we achieved very efficient use of non-standard amino acids, genetic isolation and containment, and multi-virus resistance, down 100,000x in this strain. This was one codon only.

That is genome-wide, though it happened to be a rare codon. As you go to more codons, you still have to go genome-wide. As we get to the larger eukaryotic genomes, like mouse, pig, human, and so forth, the exons that compose the protein-coding regions are only about 1% of the genome, so it might be more efficient to engineer the 100 kb swap-ins that Jef described... but these are examples of coding-region-level changes, where you might want to change the species, or its virus resistance, via recoding.
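
To make the codon-level recoding concrete, here is a minimal sketch, assuming a toy coding sequence, of replacing one codon everywhere in-frame. In the actual rEcoli work, all 321 TAG amber stop codons were replaced by TAA in living cells via MAGE and CAGE, not by string edits like this.

```python
# Minimal sketch: replace every in-frame occurrence of one codon in a
# coding sequence. Toy data; the real project recoded TAG -> TAA at all
# 321 amber stop codons of E. coli in vivo.

def recode_cds(cds: str, old: str = "TAG", new: str = "TAA") -> str:
    """Swap old -> new codon, respecting the reading frame."""
    assert len(cds) % 3 == 0, "CDS length must be a multiple of 3"
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join(new if codon == old else codon for codon in codons)

# Toy ORF ending in the amber stop codon TAG; note the out-of-frame
# "TAG" inside GCTAGC is left alone, unlike a naive string replace.
orf = "ATGGCTAGCGAATAG"
print(recode_cds(orf))   # ATGGCTAGCGAATAA
```

A real recoding design must additionally avoid disrupting overlapping genes and regulatory elements, which is part of why the design-to-assembly gap grows with more radical changes.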

But we've heard how, for the most part, both in yeast and, as you will soon hear, in E. coli, we have left the non-coding regions intact, because we don't understand them well enough to engineer them. This will have to change. Even things that seem to be junk DNA, like Alu repeats, have cases where single point mutations can change transcriptional activity and be associated with leukemia, for example.

Editing of repetitive elements: ERVs, LINEs, SINEs, centromeres, telomeres, etc. Simultaneous mutations to test chromosome-folding rules and their readout. Heterozygous mutations in cis-regulatory elements, genome-wide.

... but every base pair, because there are chromosome-folding rules that we know nothing about. We can multiplex and put together a large number of changes in a single genome: if you make every cis-regulatory mutation heterozygous, you can measure allele-specific expression, and you can do hundreds of thousands of different promoters scattered throughout the animal genome. We'd like to do them all at once.
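
A minimal sketch of what that allele-specific readout could look like computationally, assuming per-site read counts from RNA-seq at marker SNPs linked to each edited promoter. The site names and counts here are hypothetical; the unedited allele on the homologous chromosome serves as the internal control.

```python
# Minimal sketch of an allele-specific expression readout. Each edited
# cis-regulatory element is heterozygous, so the unedited allele on the
# homologous chromosome is the internal control. Site names and read
# counts are hypothetical.

from scipy.stats import binomtest

# (site, reads from edited allele, reads from control allele)
sites = [
    ("promoter_0001", 180, 120),
    ("promoter_0002", 95, 105),
    ("promoter_0003", 30, 170),
]

for site, edited, control in sites:
    total = edited + control
    # Null hypothesis: the edit has no cis effect, so edited-allele
    # reads are Binomial(total, 0.5).
    p = binomtest(edited, total, p=0.5).pvalue
    print(f"{site}: allelic ratio {edited / total:.2f}, p = {p:.1e}")
```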

So the Encyclopedia of DNA Elements (ENCODE) is based on observations of epigenetic changes, RNA changes, and chromosome changes. There is transcription throughout the entire genome; there are binding proteins everywhere. It is a wonderful project, but just because something is expressed at the RNA or protein level doesn't mean it is functional. We need the organoids to assess the changes to every base pair, and to try to do as many base pairs at once as we can.

We can make ancestral genomes: not just by reading ancient genomes, but by reconstructing them from phylogenetic trees.
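
One classical way to infer an ancestral state from a phylogenetic tree is Fitch parsimony; here is a minimal single-site sketch, with a hypothetical tree and observed bases. Real ancestral-genome reconstruction typically uses probabilistic models over whole alignments, but the parsimony pass shows the core idea.

```python
# Minimal sketch: Fitch parsimony for one alignment column on a fixed,
# fully resolved tree. Tree topology and observed bases are hypothetical.

def fitch(tree):
    """Bottom-up Fitch pass; returns the set of most parsimonious
    states at the root. A tree is a leaf base ('A') or a (left, right)
    tuple of subtrees."""
    if isinstance(tree, str):                 # leaf: observed base
        return {tree}
    left, right = fitch(tree[0]), fitch(tree[1])
    # Intersect when the children agree; otherwise union (one change).
    return left & right if left & right else left | right

# ((human, chimp), (mouse, rat)) bases observed at a single site:
tree = (("A", "A"), ("G", "A"))
print(fitch(tree))   # {'A'}: A is the most parsimonious root state
```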

Q: ...

A: So the idea is that there's some investment you put into the analysis. If you want to treat the entire genome as a cis-regulatory element (even the coding regions in some organisms act as cis-regulatory elements), let's say there are 3 billion positions you want to change, and there's an overhead to analyzing each one: you change 1 bp and you want to ask whether it changes the brain or the liver or something. So maybe you do a serial section through a mouse embryo and see what's different. If you measure allele-specific RNA expression, or any allele-specific epigenetic signal, and you make a mutation, you have a beautiful internal control, which is the other chromosome of the diploid. To some extent the cis-regulatory elements, if the changes are subtle, are independent: a heterozygous state at gene A is not necessarily influencing the heterozygous state at gene B, so you could change them simultaneously and then read them out in parallel in every cell of the body.

Q: ...

A: If you greatly overexpress something that is haplo-insufficient... but if you do a bunch of these, there are going to be a billion of these, in many combinations...

Q: ...

Q: How much epigenetic modification do you want to do? TET1 enzymes, dCas9, and targeted epigenetic modifications?

A: You need to control the epigenetics, that's very important, yes; also so that you can test whether you understand the impact of both mutations and epigenetic changes. You can systematically go through epigenetic factors with organoids. We now have a complete collection of human transcription factors, and we can go forward in a systematic manner. We can come up with high-throughput ways of testing developmental biology, and maybe we can do it in a few microns of tissue rather than at the expense of having a mouse facility or a bowhead whale facility.

Okay, thank you very much.