This is a wish list for protein folding and engineering. It contains some speculation and brainstorming, and not every item should be considered viable yet.
Wishlist
Given a 3D shape (of some nanostructure), produce an amino acid sequence that reliably folds into that shape. (done as of 2023?)
Control over protein functional properties, such as catalytic domains and sites, as well as designing specific conformational changes and controlling when they occur.
DNA data storage: faster polymerases
Proteins that make molecular display techniques such as mRNA display and ribosome display easier, simplifying lab bench protocols. Easier molecular display would be very valuable for projects that use directed evolution.
Better protein-based nanopores for DNA sequencing, amino acid sequencing, and protein sensing.
Human-controlled DNA polymerase synthesis activity (choosing each nucleotide as it is added), or an instrumented ribosome that controls protein production independently of the mRNA sequence.
Molecular protein Lego: connect multiple protein building blocks to build large-scale protein structures, which is generally useful for modeling and for nanostructures. Blocks could be joined by DNA addresses or other high-affinity, ligand-specific binding, giving a stable toolbox of known protein structures and shapes from which larger structures are built up out of small parts.
Protein mechanical logic: protein structures that have internal logic and state, based on mechanical motion, catalytic reactions, or other interactions.
Generalized, fully-programmable molecular nanotechnology: programmable nanomachines and nanofactories that can produce other nanostructures to exact specifications, without uncertainty regarding protein folding.
TODO
- What were those long, tube-like protein molecular-chemistry factories called? (Nonribosomal peptide synthetases, or NRPSs.) They occur naturally and have multiple sites along their length that each modify a molecule as it progresses along the protein.
Other interesting targets
- gene editing proteins (see gene-editing)
- enzymes for DNA synthesis
- molecular recording (e.g., in vivo DNA-based recording devices for debugging or other uses, lineage tracing techniques; see "Of toasters and molecular ticker tapes")
- protein binding affinity (protein-protein interactions)
- catalytic activity: enhancing or reducing catalysis
- synthetic metabolisms
- biosensors
Structural protein design with machine learning
Well, it's probably time to update this page: there has been a lot of recent progress in machine learning for protein design.
- AlphaFold2: Highly accurate protein structure prediction with AlphaFold
- RoseTTAFold: Accurate prediction of protein structures and interactions using a three-track neural network
- RFdiffusion: Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models
- A new protein design era with protein diffusion
- A high-level programming language for generative protein design
- Codon language embeddings provide strong signals for protein engineering
- OpenFold
- De novo design of high-affinity protein binders to bioactive helical peptides
- Illuminating protein space with a programmable generative model
References
See https://diyhpl.us/~bryan/papers2/bio/protein-engineering/