Finding proteins of interest
With demultiplexed files in Skyline, I can export my results to a .csv file for analysis. While I still need to create a Retention Time Calculator and apply it to the data in Skyline, I’m taking an initial stab at finding differentially expressed proteins.
The first step is to define the metrics I want to export:
I opened the resulting .csv file in Excel; it looks like this, with Area and Peptide Peak Found Ratio for each sample in separate columns.
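If I do move this out of Excel, the first job is reading the export and handling the missing values. Below is a minimal Python sketch, assuming one row per peptide with a `Protein` column plus one area column per sample, and assuming missing areas appear as `#N/A`. The column names (`Protein`, `S1 Area`, `S2 Area`) and the helper names are hypothetical, not Skyline’s actual headers.

```python
import csv
import io

def parse_area(value):
    """Convert an exported cell to a float, or None for a missing value (assumed '#N/A')."""
    value = value.strip()
    if value in ("", "#N/A", "N/A"):
        return None
    return float(value)

def read_export(text, sample_columns):
    """Read CSV text into a list of dicts: {'Protein': str, <sample column>: float or None}.

    'Protein' and the sample column names are hypothetical placeholders.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {"Protein": record["Protein"]}
        for col in sample_columns:
            row[col] = parse_area(record[col])
        rows.append(row)
    return rows

# Toy example (made-up numbers, not my data):
example = "Protein,Peptide,S1 Area,S2 Area\nP1,PEPA,100,#N/A\nP1,PEPB,50,20\n"
rows = read_export(example, ["S1 Area", "S2 Area"])
```

Keeping missing values as `None` instead of 0 matters later: a peak Skyline didn’t find is not the same as a peak with zero area.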
For my initial analysis I’m using Excel (I’ll graduate to R soon, I promise myself!). For now, I used a Pivot Table to pull **Sum Area by Protein & Sample #**:
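The pivot table step amounts to summing peptide areas within each protein, separately for each sample. Here is a rough Python equivalent, assuming rows shaped as `{"Protein": ..., <sample>: area or None}`; the function name is hypothetical. One deliberate choice: a protein/sample pair with no detected peptides stays `None` rather than becoming 0.

```python
from collections import defaultdict

def sum_area_by_protein(rows, samples):
    """Pivot peptide-level areas into {protein: {sample: summed area or None}}.

    Missing areas (None) are skipped; if every peptide for a protein is
    missing in a sample, that cell stays None instead of 0.
    """
    totals = defaultdict(lambda: {s: None for s in samples})
    for row in rows:
        protein_totals = totals[row["Protein"]]
        for s in samples:
            area = row.get(s)
            if area is None:
                continue
            if protein_totals[s] is None:
                protein_totals[s] = area
            else:
                protein_totals[s] += area
    return dict(totals)

# Toy example (made-up numbers):
rows = [
    {"Protein": "P1", "S1": 100.0, "S2": None},
    {"Protein": "P1", "S1": 50.0, "S2": 20.0},
    {"Protein": "P2", "S1": None, "S2": None},
]
summed = sum_area_by_protein(rows, ["S1", "S2"])
```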
I need to normalize the sum area by the total amount of protein, i.e., to remove any differences due to how much protein we loaded for each sample. We can do this with the Total Ion Current (TIC) for each sample file, which Emma provided. Here’s a screenshot of her file:
On a new tab in my .xls spreadsheet I transposed the TIC data, then pulled all the Sum Area data from my pivot table into this tab and normalized it by dividing by the TIC for each sample:
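The normalization itself is just normalized area = Sum Area / TIC, matched by sample file. A small Python sketch of that division, assuming the summed areas are stored as `{protein: {sample: area or None}}` and the TIC as `{sample: TIC}` (both layouts and the function name are my own, hypothetical):

```python
def normalize_by_tic(sum_areas, tic):
    """Divide each protein's summed area by the TIC of the matching sample file.

    sum_areas: {protein: {sample: area or None}}
    tic:       {sample: total ion current}
    Missing areas stay None.
    """
    normalized = {}
    for protein, by_sample in sum_areas.items():
        normalized[protein] = {
            sample: (area / tic[sample] if area is not None else None)
            for sample, area in by_sample.items()
        }
    return normalized

# Toy example (made-up numbers):
sum_areas = {"P1": {"S1": 150.0, "S2": 20.0}, "P2": {"S1": None, "S2": 40.0}}
tic = {"S1": 1000.0, "S2": 200.0}
norm = normalize_by_tic(sum_areas, tic)
```

Looking up `tic[sample]` by name (rather than by column position, as in a transposed spreadsheet) means a mismatched sample raises a `KeyError` instead of silently dividing by the wrong TIC.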
It looks like an enormous number of peptide peaks were not identified by Skyline (hence all the N/A’s). I’m hoping my RT calculator will improve this. While I don’t necessarily want to work with such sparse data, it’s good to struggle through Excel so I realize how much I need to transition to R... so I did the following:
Using the normalized data, I calculated the average of the two replicates for each treatment/site; if only one area value was present, I used that single data point. Then I calculated fold change from the averages as Eelgrass / Bare, so the ratios in the following spreadsheet represent expression in eelgrass relative to bare:
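The averaging rule (use both replicates when present, the single value when only one exists) and the Eelgrass / Bare ratio can be written out as below. This is a sketch of the logic described above, not a record of the spreadsheet formulas; the function names are hypothetical, and returning `None` when either side has no data (or bare averages to zero) is my own assumption.

```python
def mean_present(values):
    """Average the non-missing replicate values; a lone value is used as-is; None if none exist."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present) / len(present)

def fold_change(eelgrass_reps, bare_reps):
    """Fold change = mean(eelgrass) / mean(bare), or None if it can't be computed."""
    eel = mean_present(eelgrass_reps)
    bare = mean_present(bare_reps)
    if eel is None or bare is None or bare == 0:
        return None
    return eel / bare

# Toy examples (made-up numbers):
fc_both = fold_change([2.0, 4.0], [1.0, 1.0])     # both replicates present
fc_single = fold_change([3.0, None], [6.0, 6.0])  # only one eelgrass value
fc_missing = fold_change([None, None], [1.0])     # no eelgrass data
```

A ratio above 1 means higher expression in eelgrass; below 1 means higher in bare.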
And I’ve highlighted the proteins that were differentially expressed, i.e., with a fold change either <0.5 or >2.0:
Pulling them out, I’ve identified 31 proteins:
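The highlight-and-pull-out step is a simple threshold filter. A sketch, assuming fold changes stored as `{protein: ratio or None}` (hypothetical layout and function name); the strict `<` and `>` match the “<0.5 or >2.0” cutoffs, so a ratio of exactly 2.0 is not selected.

```python
def differentially_expressed(fold_changes, low=0.5, high=2.0):
    """Return proteins with fold change < low or > high; missing ratios are skipped."""
    return sorted(
        protein
        for protein, ratio in fold_changes.items()
        if ratio is not None and (ratio < low or ratio > high)
    )

# Toy example (made-up ratios):
ratios = {"P1": 3.0, "P2": 1.0, "P3": 0.25, "P4": None, "P5": 2.0}
hits = differentially_expressed(ratios)
```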
Now to see what these proteins are! Using Galaxy, I joined the P. generosa gonad annotated transcriptome with my proteins-of-interest .csv file, and here are the results; those marked with a “0.5” weren’t perfect matches (i.e., only the comp # matched). Interesting to see the “universal stress response” pop up...
I also found that some of my proteins aren’t in the annotation file; the ones not highlighted in red are missing from it:
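I did the join in Galaxy, but the same logic (exact match first, comp-level match as a fallback, and a list of proteins with no match at all) can be sketched in Python. This assumes Trinity-style IDs such as `comp100_c0_seq1`, where the comp # is everything before the first underscore; the ID format, the 1.0/0.5 scores, and the helper names are assumptions meant to mirror the “0.5” flag, not Galaxy’s actual output.

```python
def comp_id(transcript_id):
    """Hypothetical helper: the comp portion of a Trinity-style ID ('comp100_c0_seq1' -> 'comp100')."""
    return transcript_id.split("_", 1)[0]

def annotate(proteins, annotation):
    """Match protein IDs to annotations.

    Returns (matches, missing):
      matches: {protein: (score, annotation_text)}, score 1.0 = exact ID match,
               0.5 = only the comp ID matched
      missing: proteins with neither an exact nor a comp-level match
    """
    # For comp-level matches, keep the first annotation seen for each comp ID.
    by_comp = {}
    for tid, text in annotation.items():
        by_comp.setdefault(comp_id(tid), text)

    matches, missing = {}, []
    for protein in proteins:
        if protein in annotation:
            matches[protein] = (1.0, annotation[protein])
        elif comp_id(protein) in by_comp:
            matches[protein] = (0.5, by_comp[comp_id(protein)])
        else:
            missing.append(protein)
    return matches, missing

# Toy example (made-up IDs and annotations):
annotation = {"comp1_c0_seq1": "annotation A", "comp2_c0_seq1": "annotation B"}
proteins = ["comp1_c0_seq1", "comp2_c1_seq3", "comp9_c0_seq1"]
matches, missing = annotate(proteins, annotation)
```

Because a comp can contain several sequences with different annotations, a 0.5 match is only a hint and is worth checking by hand.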