We’ve all been there. The research is going great, the results are coming in and a picture of how everything works together is starting to emerge. Then, you hit that wall: an annoying unknown protein sitting in the middle of the model.
The project has gone smoothly, except for this question mark. And now, your PI asks you to identify that missing player to resolve the mystery. So, you go on Google or PubMed and start looking for ways to identify unknown proteins. The answer to your query is clear: you need mass spectrometry.
But you don't know how it works, and you will have to explain it at the next lab meeting.
Don’t worry, we can help.
In this blog post, we explain the processes behind protein identification by LC-MS/MS. We'll describe how you should prepare your sample, how the machine works, how a program can identify proteins, and even share our optimized protocol for in-gel protein identification. You will be on your way to identifying the unknown protein in no time.
Fig. 1: Example of a silver-stained gel from a plasma sample
This really depends on how you found out you had a protein to identify. Did you purify your protein of interest and find another protein eluting in the same preparation? Or did you immunoprecipitate your target protein, stain the SDS-PAGE gel and find a major band besides your bait?
In either case, you will want to redo the protocol with clean material to limit keratin contamination of your sample.
Those pesky proteins are present everywhere and can really hinder the identification of your protein if they are too concentrated in your sample.
Since we focus on identifying a protein from a gel, we suggest you take a look at our protocol for identifying a protein from a Coomassie Blue-stained gel for an example of the bench work itself. As stated in our post on sample preparation, you should always work with gloves, filtered pipette tips, and low-binding Eppendorf tubes. These precautions greatly improve peptide recovery and the chances that your experiment succeeds. Whatever sample preparation method you use, keep in mind that mass spectrometry always calls for a clean environment and appropriate tools.
As you know, proteins are chains of amino acids covalently linked to one another. To identify your protein by LC-MS/MS, you have to break it into smaller pieces called peptides. This is usually done by enzymatic digestion with a protease such as trypsin or chymotrypsin. Which enzyme to use depends on the sequence of your protein of interest.
For an unknown protein, we usually choose trypsin because of its low miscleavage rate and its high activity in fairly denaturing buffers. Again, the digestion protocol will vary with the matrix your protein is in. If it is in a buffer in a relatively pure state (from an HPLC purification, for example), you can probably just add the enzyme and proceed with the digestion. If your protein is in a gel, the procedure is a bit more complex. The end result, however, should always be the same: a clean peptide mixture that you can analyze by LC-MS/MS.
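As a rough illustration of trypsin's specificity, the sketch below performs an in silico digestion. It assumes the common textbook rule (cleave after lysine or arginine, except when the next residue is proline) and no missed cleavages; `trypsin_digest` is a hypothetical helper, not part of any software mentioned in this post.

```python
def trypsin_digest(sequence: str) -> list[str]:
    """Simplified in silico trypsin digestion (hypothetical helper).

    Cleaves after lysine (K) or arginine (R) unless the next residue is
    proline (P). Missed cleavages are not modelled.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Whatever is left after the last cleavage site is the C-terminal peptide.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


print(trypsin_digest("AKPLRGKDER"))  # ['AKPLR', 'GK', 'DER']
```

In the toy sequence, the K followed by a P is left uncut. Search software typically lets you allow one or more missed cleavages on top of this rule.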
Now that you have your peptide mixture, you are ready to ship it to your favorite mass spectrometry service provider for analysis. Several acquisition modes can be used to identify proteins, but in this post we focus on information-dependent acquisition (IDA, also known as data-dependent acquisition, or DDA). IDA takes advantage of the high speed and resolution of the mass spectrometer to measure the masses of the peptides in your sample and of their fragments. Here's how it works:
The peptide mixture is loaded onto a chromatographic column. Peptides with different amino acid compositions have different affinities for the column. By running a mobile phase with an increasing concentration of organic solvent through the column over time, we can separate the peptides from one another: each peptide is released once the organic content of the mobile phase is high enough to overcome its affinity for the column, and it then enters the mass spectrometer. The chromatographic separation concentrates each peptide into a narrow time window, which makes it easier for the machine to detect. On average, with a 60-minute gradient on a micro-LC system, it takes approximately 45 seconds for all the copies of a peptide to elute from the column.
In mass spectrometry, a "cycle" is the sequence of small tasks that the machine performs repeatedly. In IDA mode, for example, a cycle includes one survey scan and several product ion scans (up to 40 on high-resolution mass spectrometers). Each component of a cycle takes a small amount of time. If a cycle has a 200-millisecond survey scan and 40 product ion scans of 35 milliseconds each, one cycle takes roughly 1.6 seconds (200 ms + (40 × 35 ms) = 1600 ms). Once a cycle is complete, the machine starts a new one, and this goes on for the whole length of the analysis. During a 60-minute gradient, the mass spectrometer therefore completes approximately 2,250 cycles (3,600 s ÷ 1.6 s). No wonder an MS analysis produces so much data! Here are the two main scan types required for an IDA analysis.
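The cycle arithmetic can be written out in a few lines. This minimal sketch uses the example timings from the text (a 200 ms survey scan and 40 product ion scans of 35 ms) and ignores any overhead the instrument may add between scans; the function names are hypothetical.

```python
def cycle_time_ms(survey_ms: int, n_product_scans: int, product_scan_ms: int) -> int:
    """Duration of one IDA cycle: one survey scan plus N product ion scans."""
    return survey_ms + n_product_scans * product_scan_ms


def cycles_per_gradient(gradient_min: int, cycle_ms: int) -> int:
    """Number of complete cycles that fit in the LC gradient (overhead ignored)."""
    return (gradient_min * 60 * 1000) // cycle_ms


cycle = cycle_time_ms(200, 40, 35)
print(cycle)                           # 1600 (ms)
print(cycles_per_gradient(60, cycle))  # 2250 (cycles)
```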
To identify proteins in IDA mode, the mass spectrometer first performs a full scan of the incoming ion flow, called a survey scan. The survey scan records the mass-to-charge ratio (m/z) of every ion entering the machine at that moment and lasts about 200 milliseconds. At the end of the scan, the machine knows which ions were present at that time point and can rank them by signal intensity.
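For reference, the m/z an instrument records for a peptide depends on how many protons the peptide carries. The sketch below assumes positive-mode [M + zH]z+ ions and uses the monoisotopic neutral mass of the example peptide PEPTIDE; the names `mz` and `PROTON_MASS_DA` are illustrative.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton, in daltons


def mz(neutral_mass_da: float, charge: int) -> float:
    """m/z of a peptide ion carrying `charge` extra protons, i.e. [M + zH]z+."""
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


# Monoisotopic neutral mass of the example peptide PEPTIDE (about 799.360 Da).
for z in (1, 2, 3):
    print(f"{z}+  m/z = {mz(799.35994, z):.3f}")
# prints 800.367, 400.687 and 267.461 for 1+, 2+ and 3+
```

The same peptide can therefore appear at several m/z values in a survey scan, one per charge state.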
The second part of the IDA method is the product ion scan. From the survey scan, the machine builds a list of the N most intense ions (up to 40) entering the machine. The first product ion scan isolates the most intense ion, fragments it, and records the m/z ratios of all its fragments, which takes approximately 35 milliseconds. The machine then runs a second product ion scan, isolating, fragmenting and recording the fragments of the second most intense ion, and so on until it reaches the number of ions to analyze in the cycle (up to 40). Then the first cycle ends and the second cycle starts.

A peptide elutes from the column for approximately 45 seconds in a 60-minute LC gradient. Since a cycle lasts only 1.6 seconds, the same peptide could be detected roughly 28 times during its elution (45 s ÷ 1.6 s per cycle ≈ 28). And since the machine fragments only the 40 most intense ions out of several hundred present at any one time, a lot of information could be missed. This is where a useful feature of the IDA method, dynamic exclusion, comes into play. We can tell the machine to fragment the same ion only twice and then to exclude it for a set number of seconds. In the following cycles, while the excluded ion still shows up in the survey scan, it is not counted among the 40 product ion scans; instead, the machine isolates and fragments the 41st most intense ion. Because this repeats throughout the run, the machine achieves good sequencing depth even though it is limited to 40 product ion scans per cycle!
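The precursor selection and exclusion logic can be sketched as a toy model. It assumes ions are identified by hypothetical labels (real instruments match precursors by m/z within a tolerance), that an ion is excluded after two fragmentations as described in the text, and a hypothetical 10-second exclusion window; `select_precursors` is not an instrument API.

```python
def select_precursors(
    survey: dict[str, float],
    now_s: float,
    top_n: int,
    times_selected: dict[str, int],
    excluded_until: dict[str, float],
    repeat_count: int = 2,
    exclusion_s: float = 10.0,
) -> list[str]:
    """Pick up to `top_n` precursors from one survey scan, honoring exclusion."""
    # Rank the ions seen in the survey scan from most to least intense.
    ranked = sorted(survey, key=lambda ion: survey[ion], reverse=True)
    selected = []
    for ion in ranked:
        if len(selected) == top_n:
            break
        # Skip ions whose exclusion window has not expired yet.
        if excluded_until.get(ion, float("-inf")) > now_s:
            continue
        selected.append(ion)
        times_selected[ion] = times_selected.get(ion, 0) + 1
        # After `repeat_count` fragmentations, exclude the ion for `exclusion_s` s.
        if times_selected[ion] >= repeat_count:
            excluded_until[ion] = now_s + exclusion_s
            times_selected[ion] = 0
    return selected


# Hypothetical survey scan that repeats unchanged over three 1.6 s cycles.
survey = {"A": 100.0, "B": 80.0, "C": 60.0, "D": 40.0}
times_selected, excluded_until = {}, {}
for cycle in range(3):
    print(select_precursors(survey, cycle * 1.6, 2, times_selected, excluded_until))
# ['A', 'B']
# ['A', 'B']
# ['C', 'D']
```

With only two product ion scans per cycle, the third cycle reaches the less intense ions C and D precisely because A and B are excluded.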
The end result of the IDA method is a list of precursor ions (recorded during the survey scans) and their associated fragments (recorded during the product ion scans). The files produced by the MS are then fed into a program that identifies the proteins in two steps*.
*We work with ProteinPilot Software (by Sciex) and will cover how this suite of algorithms works for protein identification. Note that other tools can do similar analyses, each with slight differences in their outputs.
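To show what a peptide's "associated fragments" look like, the sketch below computes the singly charged b- and y-ion ladders that collision-induced fragmentation of the peptide backbone typically produces. It assumes unmodified residues (it ignores, for example, the cysteine alkylation common in gel protocols), monoisotopic masses and charge 1+; `fragment_ladders` is a hypothetical helper.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276


def fragment_ladders(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b- and y-ion m/z values, b1..b(n-1) and y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        # b ions keep the N-terminus: first i residues plus a proton.
        b_ions.append(sum(masses[:i]) + PROTON)
        # y ions keep the C-terminus: last i residues plus water plus a proton.
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b, y = fragment_ladders("PEPTIDE")
print([round(m, 3) for m in b])
print([round(m, 3) for m in y])
```

Comparing observed fragment m/z values with ladders like these, for candidate peptides whose precursor m/z fits, is the basic idea behind database searching, although each search tool applies its own scoring.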
Fig. 2: Example of a Paragon search.
The taglets identified by Paragon are listed at the top of the figure. The thicker the red line under a taglet, the higher the score it received. In this example, only one protein is shown. The red lines underneath the sequence show where the taglets map onto the protein's amino acid sequence. The more taglets a region has, the "hotter" Paragon considers it. Paragon then allocates search effort to each region according to its relative "temperature": the hotter a region, the more extensively Paragon searches for peptides in it. For example, a "hot" region will be searched for all known post-translational modifications and for many rare miscleavages, whereas a "cold" region will only be searched for the most common PTMs. This figure was taken from the original Paragon paper from Sciex (2007).
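The idea of "hot" regions can be illustrated with a toy per-residue count of how many taglets map onto each position. This is not Paragon's actual scoring or temperature model; it assumes taglets are plain sequence strings matched exactly, which also ignores the fact that leucine and isoleucine have the same mass. `tag_heat` is a hypothetical helper.

```python
def tag_heat(protein: str, tags: list[str]) -> list[int]:
    """Count, for each residue, how many sequence tags map onto it."""
    heat = [0] * len(protein)
    for tag in tags:
        start = protein.find(tag)
        # Record every occurrence of the tag, including overlapping ones.
        while start != -1:
            for i in range(start, start + len(tag)):
                heat[i] += 1
            start = protein.find(tag, start + 1)
    return heat


print(tag_heat("MKTAYIAKQR", ["TAY", "AYI"]))  # [0, 0, 1, 2, 2, 1, 0, 0, 0, 0]
```

Residues covered by both tags score 2 and would be the "hottest" positions in this toy model.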
By following the process we’ve outlined in this post, the unknown band on your gel is now an identified protein. Interestingly, the same MS process can be coupled with other sample preparation techniques, such as immunoprecipitation, to identify protein-protein interaction partners or post-translational modifications on your protein of interest.
The screenshots below illustrate what a protein identification looks like in the ProteinPilot software. They also show how complex such an analysis is, and how much data a single MS experiment can generate!
Fig. 3: Screenshots from the ProteinPilot software.
A) Screenshot of a peptide identification. The bottom right panel displays the MS/MS spectrum of one peptide. B) Screenshot of a protein identification. The lower panel shows the peptides detected during the peptide identification step. Green letters represent peptides identified with high confidence, while yellow and red letters represent peptides with medium and low confidence, respectively.
Mass spectrometry is a powerful tool to identify proteins from a sample. Whether it is from a very complex sample or from a single protein band cut from a polyacrylamide gel, LC-MS/MS is the most efficient way to get your proteomics project moving forward.
More questions about this post or how mass spec can help you in your research? Contact our experts.