Evolution of hgp-write and progress to date

Jef Boeke

George Church



What is gp-write?

I am going to start simple. We have a diverse crowd here. We have some scientists that want to know about DNA sequences; we have some reporters that might not want that kind of fine-grained discussion. So let's start big picture. As we heard from Nancy, it's a project to support technology development in genome writing. It could also be described as genome design and synthesis, and then an evaluation of the impact of genome design changes.

This uote here is paraphrasing what we wrote in our whitepaper. We want to reduce the cost of designing, synthesizing, assembling and testing genomes in cells by 1000 fold over the next 10 years. It's an aggressive goal. Based on what we saw with the human genome project, the reading project if you will, we think we can do this. We think that such technology improvements will really revolutionize ohw we as scientists learn about the world and then how we use that information to engineer.

GP-write vs GP-read

Nancy allude3d to the development of a roadmap for GP-write. I want to talk a little bit about that. GP-write vs GP-read first. So, GP-read is what we now refer to as the human genome project, the sequencing project of the human genome which was of course expanded to include many model organism genomes like the fruit fly and mouse etc. And so, GP-write is the next phase. It's interesting to contemplate the difference between reading and writing. You kind of know when you're done reading, you finished the book, maybe there are some complexities as to interpretation, but reading is really well suited to a purely scientific endeavor because it's really clear when you have finished decoding.

On the other hand, writing has an element of creativity to it, it has an artistic side if you will; when are you finished? You could write one book, or a thousand books. It's fundamentally different from reading, and this excites me and others.

GP-write vs HGP-write

What about gp-write versus HGP-write? Last year the meeting was titled hgp-write. As you will see in a moment, we have defined GP-write as the overarhing effort to develop genome writing technologies. We are all passionately interested in our genomes, and the notion that we could write a human genome is simultaneously thrilling to some and not so thrilling to others. And so we recognize that this is going to take a lot of discussion from a lot of stakeholders so we defined HGP-write as an initiative within GP-write that perhaps we will have to move more slowly on as we engage these stakeholders. This has dominated our thinking about the large-scale roadmap.

A very important component of this that has been lost in a lot of the writing is that HGP-write we will do in cells only, not producing organisms.


The first 5 years has a heavy technology development focus and to develop pilot projects that might have immediate benefit to society. You will hear more about that in a moment. During this time period, we want an intense debate about the hgp-write component and engage a diverse group of stakeholders.

As a genome designer, I am interested in identifying what are the interesting designs for hgp-write. Are there more worthy goals?

After that, we can pursue highly worthy genome projects while continuing to develop technologies. We need that price drop.

Name change

Why did we change the name? Well, we listened to a lot of you a year ago, and we thought that combining the two components was really helpful to the future of the project. Many journalists bring the same question back to me for hgp-write in particular, why are you doing this?

I'll offer two kinds of answers.

I want to know the rules that make a genome tick. I want to learn about it. This was paraphrasized from a quote allegedly found on Richard Feynman's blackboard, "What I cannot create I cannot understand" and this has become a kind of manifesto for our field. What we learn about this is the 3d structure of the synthetic chromosomes in the synthetic yeast cells; on the right here you see maps of chromosomes taking trajectories through the nucleus of the cell. This was done using a technique known as HiC. This was one such synthetic chromosome, the fifth chromosome, which you will hear later. This is the native version; and this is what happens when you make a synthetic version with thousands of changes; we removed repetitive DNA, all the tRNA genes, and yet there's only a minimal impact of the trajectory and the overall structure.

Secondly, we want to do good things- not bad things. Why? Well, I thought I would go biblical on you. There's the 10 plagues from Exodus- here they are, frogs, blight, water, etc. Today, we have environmental destruction, we have invasive species, we have emerging pathogens, we have food security problems, and we have climate change. All of these things have potential biological solutions that GP-write we think would be an integral part of.

History of writing

Mesopotamia, egypt 3000 BC. It goes way back. George will tell you that he has encoded an entire movie in DNA, every pixel. But actually on this when we talk about DNA writing in cells, we're kind of in this Gutenberg phase... we can write millions of letters of DNA, but that's about it. We're at the very beginning. In fact, the history of DNA writing in cells started around 1976, with transfer RNA gene synthesis. It has expanded extensively, there are viruses in 2002, Venter mycoplasma in 2010, and this year the synthetic yeast genome, and maybe humans in the near future.

DNA writing recent activity

Our Sc2.0 project- which you will hear a lot about- has generated a lot of excitement. We have participants here from Australia and China and many other places. We have new funding of various types which George will go into in more depth. This is an international project.

Dark matter project

We are launching a dark matter project. In case you don't know what that is, you can look it up on Amazon, Nessa Carey Junk DNA.

George Church

GP-write related funding

This will be publicly available at some point-- it's about $200M total from all of our institutions.

Exponentially improving DNA read-write

I think it's important that we interconnect reading, writing and testing. We have 3 million fold improvement in sequencing and a billion fold improvement in synthesis of nucleotides on chips. But we want to turn that into testing in cells and organisms. This particular example here, is ... ... this particular meta-project is on ultra-safe cells for manufacturing therapies; this could be any mammalian cells for producing pharmaceuticals and vaccines, probably also stem cells, transplants; we want radiation resistance, senescence resistance, we want to remove endogenous retroviruses which Yihan will talk about later; and virus resistance; germline negative but pluripotent for stem cells... we want fail-proof self-destruct which we have already demonstrated using non-standard amino acids, we want to do that in mammalian cells. Some of the people taking this on are postdocs and grad students, same thing for mammalian repeats.

Engineering mammalian repeats

We are engineering mammalian repeats...

  • rDNA repeats
  • SINEs (Alu)
  • LINEs
  • SSR (triplets)
  • Telomeres

There has never been a human or mammalian genome sequenced anywhere in the world; you heard it here if you haven't heard it before. We need many human genomes sequenced all the way through, we're still not there yet.

Non-standard amino acids

  • biocontainment
  • genetic isolation
  • metabolic isolation
  • multi-virus resistance-- resistant to all viruses, even some we have never seen before

GP-write and test: gene therapy

We need to know the deleterious effects, a gene therapy testline. We need to sequence everyone on the planet. We will need cells from the personal genome project.

I have already mentioned epigenome engineering; I just want to take a moment about this; in order to do read-write and test, the testing needs to have almost any kind of cell or organ in the human body without making a human being. It's hard to read-write a genome, and once you make it with writing you want to read it and check the results. It's critical.

VUS-pipeline: Full transcription factor library

This includes all human transcription factor genes, including multiple alternative spliceforms to determine causality there. We use it for making almost any cell type we set our mind to, I don't know any failure since we have done this. This will be distributed through Addgene, a non-profit great way to distribute plasmids and so on.

Cell types

We have developed neurons, musculature, endothelial blood vessels, glia, and so forth.

Patient mutants

We take a cell line from the personal genome project, which is freely available. It's a stem cell line that can be engineered with any of the GP-write tools to make a one base pair change. This is a clonal cell line so the issues about off-target or on-target problems don't really apply. You can sequence it; you can epigenetically reprogram these from fibroblasts to stem cells to cardiac tissue where you can see the cardiac repeat structure here. With a single base pair change, you can edit morphology and the contractile nature showing that in one patient, not necessarily in a gigantic cohort, you have something more convincing which is cause-effect. You can complement this with messenger RNA (mRNA).

Aging reversal

This might be something appropriate for genome wide; we want pathogen resistance, senescence resistant, and cancer resistance. A previous postdoc of mine has made a database for including 45 gene therapies for aging, called GenAge, the aging gene database, at a cell and tissue level.

I want to wrap up with the ultrasafe cell line slide.


How many of you have your genome sequenced? How many of you do not?

Q: Genome synthesis project... I would like to understand how proteins interact, how they fold, how they interact with DNA and many other types of features. This seems to be lacking right now. Are there any plans regarding human genome synthesis, or not?

A: To engineer systems that do amazing things for society, you do not need to have full understanding. We did not understand viruses when we began to wipe out smallpox in the 1900s. We can begin to look at 3d structure of the nucleus, a new 3d structure, cryo-EM is getting better, crystallography of things as large as ribosomes. We are engineering proteins from scratch, both de novo and also based on currently-existing proteins. We are making a biocontained organism by engineering essential genes that are dependent on non-essential amino acids, designed by the Rosetta protein design tools. We're excited about the problems you're talking about.

Q: In order for this project to be successful, which existing technologies need to be improved or corrected in order to give us the huge scale required?

A: Obviously the first order of business is DNA synthesis cost reduction. As George mentioned, we can make millions of oligoucleotides on chips, but producing them in practical format that can be used to actually write DNA at low cost for genomes, is still a work in progress. That's one area, we're in the 10 cent per base pair range right now in terms of practical cost for DNA synthesis. That's at top of the list for me. We need to reduce that 1000x fold. We need to also assemble these DNA molecules into larger structures. How can we make 1000mers into 1 million mers and more efficiently make these? How can we make them much more accurately? And also, delivering large DNAs into cells is a real bottleneck particularly in mammalian systems. Those would be 3 very high on the list on the DNA end. As George mentioned, the read-outs that we need, the testing of the phenotypes of cells very deep testing and phenotypic testing are something we would really like to see the cost drop on.

Q: What about cell-free systems?

A: We're open to suggestions, it's probably not well-integrated. We are somewhat focused on in vivo at the moment, but we're open minded.