Part A (From Pranam)
- Sign up for HuggingFace (we will be using PepMLM: https://huggingface.co/ChatterjeeLab/PepMLM-650M)
- Once you login, go to the page (https://huggingface.co/settings/tokens). ClickÂ
+Create new token.
- Go to the page (https://huggingface.co/ChatterjeeLab/PepMLM-650M) and find their Colab Notebook (link).
- Make a copy to your Google Drive, choose T4 GPU and run each block.
- Find the amino acid sequence for SOD1 in UniProt (ID: P00441), a protein when mutated, can cause Amyotrophic lateral sclerosis (ALS). In fact, the A4V (when you change position 4 from Alanine to Valine) causes the most aggressive form of ALS, so make that change in the sequence
- Enter your mutated SOD1 sequence into the PepMLM inference API and generate 4 peptides of length 12 amino acids (Step 5 takes a while so you can also just pick 1 or 2 peptides)
- To your list, add this known SOD1-binding peptide to your list: FLYRWLPSRRGG [from -https://genesdev.cshlp.org/content/22/11/1451]
- Go to AlphaFold-Multimer (https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb). This is similar to what you did for homework last week but instead for a protein-peptide complex
- Set model_type: alphafold2_multimer_v3 (this model has been shown to recapitulate peptide-protein binding accurately: https://www.frontiersin.org/articles/10.3389/fbinf.2022.959160/full). * Add your query sequence - Its the SOD1Sequence:PeptideSequence.
- After running AlphaFold-Multimer with your 5 peptides alongside your mutated SOD1 sequence, plot the ipTM scores, which measures the relative confidence of the binding region.
- Provide a 1 paragraph write-up of your results
Part B (Final Project: L-Protein Mutants)
<aside>
💡
This homework requires computation that might take you a while to run. So please get started early.
</aside>
Bacteriophage MS2 is a single stranded RNA virus whose genome only encodes 4 proteins -the maturation protein (A-protein), the lysis (L-Protein) protein, the coat protein (cp), and the replicase (rep) protein. Bacteriophages infect E-coli. Upon infection, the L-Protein forms pores in the E-coli cell membrane which eventually leads to breakdown of the membrane (Lysis). DnaJ is a chaperone protein in E-coli (chaperone proteins are proteins that assist during protein folding). It is thought to be involved in the lysis mechanism. In this homework, we will explore if computational models we learnt about in the last class are useful for designing variants/mutants of the lysis protein sequence. We will study the effects of L-protein mutants on the bacteriophage infectivity.
Goal:
Our goal for this part of the homework is to create mutants of L-protein that affect its lysis activity and/or its interaction with DNAj. Making a mutation for L-protein without a way to computationally predict what happens to lysis or its interaction with DNAj is hard. So we are going to try various hypotheses on how to use the models from last week and also try a few other tools. These mutants will be tested in the lab.
L-Protein and DNAj Sequence