Week 3: Protein Design

From a chemical point of view, proteins are by far the most structurally complex and functionally sophisticated molecules known.
--- Molecular Biology of the Cell, 5th edition

What is this week on Protein Design about?

Shuguang Zhang from MIT gave us an overall introduction to proteins as the folding of a sequence of various amino acids, using the beautiful metaphor of architecture to describe the patterns found in proteins. Thras Karydis from DeepCure talked about the modern way to design proteins with computational tools.


Into the world of proteins

How many proteins do you eat in a 17.6-ounce (~500 gram) steak?

A biology book I really like, Cell Biology by the Numbers, shows how diverse proteins can be. So let's narrow this question down to think about it more clearly. Different kinds of meat vary in their protein composition. Take beef as an example: it contains about 26 grams of protein per 100 grams.
Beef is mainly cow muscle. (Different tissue types contain different kinds of proteins.) The central unit of muscle is the sarcomere, which is mainly composed of myosin (taken here as ~50 kDa, a rough figure) and actin (~41.8 kDa), so I use an average of about 45 kDa (45,000 g/mol) per protein. (26*5/45000)*6*10^23 ~ 1.73 * 10^21 proteins in 500 grams of steak!
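The back-of-the-envelope estimate above can be reproduced in a few lines of Python. This is only a sketch: the function name and parameters are my own, and the inputs (26% protein by mass, ~45 kDa average protein) are the rough figures from the text, not measured values. It uses Avogadro's number as 6.022e23 rather than the rounded 6e23, so the result differs slightly in the third digit.

```python
AVOGADRO = 6.022e23  # molecules per mole


def protein_count(mass_g, protein_fraction, avg_mass_da):
    """Estimate the number of protein molecules in a piece of food.

    mass_g           -- total mass of the food in grams
    protein_fraction -- fraction of that mass that is protein (0..1)
    avg_mass_da      -- assumed average protein mass in daltons (g/mol)
    """
    protein_g = mass_g * protein_fraction   # grams of protein
    moles = protein_g / avg_mass_da         # moles of protein molecules
    return moles * AVOGADRO                 # number of molecules


# 500 g steak, 26 g protein per 100 g, ~45 kDa average protein (assumed)
steak_proteins = protein_count(500, 0.26, 45_000)
print(f"{steak_proteins:.2e} proteins")
```

Changing `avg_mass_da` shows how sensitive the answer is to the assumed protein size: a ten times heavier average protein gives ten times fewer molecules.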

Why are there only 20 natural amino acids?

In biology, the cell uses a codon of three nucleotides to represent one amino acid. There are four kinds of bases: A, T, C, G in DNA and A, U, C, G in RNA, so three positions give 4^3 = 64 possible codons. More than one codon can represent the same amino acid, and some codons carry signals for translation, such as STOP (UAA, UAG, UGA) and START (most commonly AUG, with UUG and GUG as alternatives). In the end, we have only 20 natural amino acids, each mapped to its codons in the codon table. What an amazing number, 20!
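To make the counting concrete, here is a small sketch that builds the standard genetic code in RNA letters and checks the numbers above. The compact 64-letter string is the usual way of writing the standard table with bases ordered U, C, A, G; the variable names are my own. Note that in the standard table UUG and GUG encode leucine and valine, and only serve as alternative start codons in some organisms.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
# '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Map every three-letter codon to its one-letter amino acid code.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many codons encode each amino acid (the degeneracy of the code).
degeneracy = Counter(CODON_TABLE.values())
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
num_amino_acids = len(set(CODON_TABLE.values()) - {"*"})

print(len(CODON_TABLE), "codons encode", num_amino_acids, "amino acids")
print("stop codons:", stop_codons)
print("codons per amino acid:", dict(degeneracy))
```

The degeneracy counts show that the mapping is uneven: leucine, serine and arginine each have six codons, while methionine and tryptophan have only one.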

Why are most molecular helices right-handed?

Chirality in life is really mysterious. It shows up in small chemicals, in the amino acids of proteins (almost always the left-handed L form), in DNA and RNA, and even at a larger scale, with most humans being right-handed. DNA, as a double-stranded helix, also shows this characteristic.

Where did amino acids come from before enzymes that make them, and before life started?

The origin of life is still a really hard but interesting question to chase. DNA, RNA and amino acids: which came first? Maybe the answer is blurry, or some combination of interactions among them. I recommend a book called Protocells: Bridging Nonliving and Living Matter, published by MIT Press. It collects a series of articles on how life began and on the possible evolution from nonliving to living matter. Amino acids are not only the building blocks of proteins but may also act as chemical messengers between cells. Scientists believe that before enzymes existed, amino acids could be produced by abiotic chemical synthesis.

What do digital databases and nucleosomes have in common?

Nucleosomes act as the packaging unit for DNA in eukaryotes. The nucleosome core consists of a consistently sized stretch of DNA, about 146 base pairs, wrapped around eight histone proteins. Digital databases likewise use methods to compress and index data, which reduces the data size and makes it possible to read a record from the index without searching through all the data.

Folding an electrically conductive protein

Electrically conductive proteins are really interesting. Recent publications revealed that one of the conductive pili proteins from Geobacter sulfurreducens has remarkably high conductivity. This protein is encoded by the pilA gene. In 2020, researchers from the University of Massachusetts Amherst used E. coli to produce these protein nanowires, which makes them much more approachable for the research community.
The PilA protein from Derek R. Lovley's publication in mBio has 61 amino acids, which is quite small for a protein. Its amino acid sequence is FTLIELLIVVAIIGILAAIAIPQFSAYRVKAYNSAASSDLRNLKTALESAFADDQTYPPES. The most frequent amino acid is alanine (~18%). According to the BLASTp result, the sequence originally comes from the 2013 publication by Patrick N. Reardon and Karl T. Mueller at Pacific Northwest National Laboratory. PilA belongs to the type IV pilus assembly proteins, which are part of the general secretory pathway protein family. Its conserved domains belong to the Comp_DUS superfamily. Digging through the data returned by the NCBI database, the different annotations feel a little messy.
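The length and composition quoted above are easy to check directly from the sequence. The following is a minimal sketch using only the Python standard library; the variable names are my own.

```python
from collections import Counter

# PilA sequence as given in the text above.
PILA_SEQ = "FTLIELLIVVAIIGILAAIAIPQFSAYRVKAYNSAASSDLRNLKTALESAFADDQTYPPES"

length = len(PILA_SEQ)
counts = Counter(PILA_SEQ)

# Most frequent residue and its share of the sequence.
top_residue, top_count = counts.most_common(1)[0]
top_percent = 100 * top_count / length

print(f"length: {length} residues")
print(f"most frequent: {top_residue} ({top_count} residues, {top_percent:.1f}%)")

# Full composition, most frequent first.
for residue, n in counts.most_common():
    print(f"{residue}: {n:2d} ({100 * n / length:.1f}%)")
```

Pasting a different sequence into `PILA_SEQ` gives the same summary for any other protein.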

[Image: Blastp input] [Image: Blastp 02]

From the taxonomy results of the BLASTp search, Deltaproteobacteria such as Geobacter sulfurreducens, Geobacter lovleyi, Geobacter metallireducens, Geobacter bremensis and Geobacter pelophilus have homologs of this protein in their pilus structures. The protein structure was solved in 2013. Proteobacteria with this extracellular structure are capable of cycling minerals in their environments. The electrically conductive protein nanowires facilitate the transport of electrons along their filamentous fibers. The nanowires are polymeric assemblies. The structure was reported from high-resolution NMR data on the PilA protein from G. sulfurreducens. More than 85% of the structure is alpha-helical.

[Image: Blastp 03] [Image: Blastp 04]

In the RCSB PDB database, I searched with the protein ID 2M7G (searching for "pilA" is not useful, because so many proteins are related to pili that the results are hard to sift through). The RCSB page provides much more information on the protein structure. It also describes in detail the experimental method used to solve the structure. On the RCSB page, every structure has metrics showing its quality, and you can also see the full wwPDB validation report.
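The same entry can also be pulled programmatically instead of through the web page. The sketch below assumes the RCSB file download URL pattern `https://files.rcsb.org/download/<ID>.pdb`; the helper names `rcsb_pdb_url` and `fetch_pdb_text` are hypothetical, and the fetch needs network access.

```python
from urllib.request import urlopen


def rcsb_pdb_url(pdb_id):
    """Build the download URL for a PDB-format file (assumed URL pattern)."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id}.pdb"


def fetch_pdb_text(pdb_id, timeout=30):
    """Download the PDB file as text (requires an internet connection)."""
    with urlopen(rcsb_pdb_url(pdb_id), timeout=timeout) as response:
        return response.read().decode("utf-8")


print(rcsb_pdb_url("2m7g"))
```

Calling `fetch_pdb_text("2M7G")` and saving the result as `2M7G.pdb` gives a file that can be opened in a molecular viewer.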

[Image: Blastp 04]

Then I downloaded PyMOL to interact with the protein structure. The red color in the protein model represents helix structure, which is the major structure in this protein. There is no obvious pocket in the structure. I further labeled the amino acids with hydrophobic side chains: alanine, isoleucine, leucine, methionine, valine, phenylalanine, tryptophan and tyrosine.
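The hydrophobic labeling can also be done on the sequence itself. This sketch marks the eight residue types listed above in the PilA sequence and builds a residue-name selection string that could be pasted into PyMOL's `select` command. The positions are counted from 1 along the sequence as written here, which is an assumption: the residue numbering inside the 2M7G file may differ.

```python
# PilA sequence as given earlier in these notes.
PILA_SEQ = "FTLIELLIVVAIIGILAAIAIPQFSAYRVKAYNSAASSDLRNLKTALESAFADDQTYPPES"

# One-letter to three-letter codes for the hydrophobic residues listed above.
HYDROPHOBIC = {
    "A": "ALA", "I": "ILE", "L": "LEU", "M": "MET",
    "V": "VAL", "F": "PHE", "W": "TRP", "Y": "TYR",
}

# 1-based positions of hydrophobic residues along the sequence.
positions = [i for i, aa in enumerate(PILA_SEQ, start=1) if aa in HYDROPHOBIC]
fraction = len(positions) / len(PILA_SEQ)

# Residue-name selection expression, e.g. for: select hydrophobic, <selection>
selection = "resn " + "+".join(sorted(HYDROPHOBIC.values()))

# Visual mask: hydrophobic residues kept, everything else shown as '-'.
mask = "".join(aa if aa in HYDROPHOBIC else "-" for aa in PILA_SEQ)

print(f"{len(positions)} of {len(PILA_SEQ)} residues are hydrophobic ({fraction:.0%})")
print(selection)
print(mask)
```

The mask makes it easy to see that the hydrophobic residues cluster toward the start of the sequence.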

[Image: Blastp 04] [Image: Blastp 04]

Part B: How to (almost) Fold (almost) Anything

[Image: Blastp 04]

Part C: Protein Design by Machine Learning


References

  • Wikipedia: Nucleosome https://en.wikipedia.org/wiki/Nucleosome
  • BIOT: GreatBay SZ - BIOT https://2020.igem.org/Team:GreatBay_SZ/Proof_Of_Concept
  • 2020. Power generation from ambient humidity using protein nanowires. Nature
  • 2017. Expressing the Geobacter metallireducens PilA in Geobacter sulfurreducens Yields Pili with Exceptional Conductivity. mBio
  • 2020. An Escherichia coli Chassis for Production of Electrically Conductive Protein Nanowires. ACS Synth. Biol
  • 2013. Structure of the type IVa major pilin from the electrically conductive bacterial nanowires of Geobacter sulfurreducens. J. Biol. Chem
  • 2009. Emergence of a Code in the Polymerization of Amino Acids along RNA Templates. PLOS ONE
  • 2010. An Evolutionary Perspective on Amino Acids. Scitable
  • 2009. A thermodynamic basis for prebiotic amino acid synthesis and the nature of the first genetic code. Astrobiology
  • 2005. Evolution of the genetic triplet code via two types of doublet codons. Journal of Molecular Evolution
  • 2018. Abiotic synthesis of amino acids in the recesses of the oceanic lithosphere. Nature
  • 2020. Amino Acid Modified RNA Bases as Building Blocks of an Early Earth RNA-Peptide World. Chemistry A European Journal
  • 2018. Going the Distance: Long-Range Conductivity in Protein and Peptide Bioelectronic Materials. The Journal of Physical Chemistry
  • 1962. Electrical Conductivity of Proteins. Nature