Interesting Proteins, part II

To determine which proteins were over- or under-expressed in eelgrass vs. bare treatments, I did the following:

1) Downloaded the report from Skyline; replaced all “N/A” values with blanks.

2) Created a pivot table to sum the total peak area for each protein, broken down by sample #.
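For anyone who would rather script steps 1 and 2 than use a spreadsheet, here is a minimal Python sketch of the same idea: skip blank or “N/A” peak areas, then sum what remains per protein and sample. The row layout (protein, sample, area) and the example values are assumptions for illustration, not the actual Skyline report columns.

```python
from collections import defaultdict


def sum_peak_area(rows):
    """Sum peak areas per (protein, sample), skipping blank or "N/A" areas."""
    totals = defaultdict(float)
    for protein, sample, area in rows:
        if area in ("", "N/A", None):
            continue  # missing value: contributes nothing to the sum
        totals[(protein, sample)] += float(area)
    return dict(totals)


# Made-up rows in an assumed (protein, sample, area) layout.
rows = [
    ("protA", "1", "100"),
    ("protA", "1", "50"),
    ("protA", "2", "N/A"),
    ("protB", "1", ""),
    ("protB", "2", "30"),
]
totals = sum_peak_area(rows)
```

Note that a protein/sample pair with only missing values does not appear in the result at all, which is roughly what a blank pivot-table cell represents.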

3) I copied the pivot table results and pasted them into a new tab; I did this because, oddly, referencing the pivot table directly in a subsequent formula was not working. Entered the Total Ion Current (TIC) values into

4) On a new tab, I assigned each sample # its corresponding site & treatment, then normalized each protein's summed peak area by the TIC: %TIC = [peak area / TIC]*100. I highlighted cells with %TIC between 20%-99% in green, and those >100% in red. I’ll need to investigate why some proteins have a peak area greater than the TIC.
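The normalization and highlighting in step 4 could be sketched in Python roughly as follows. The function names are hypothetical, and the exact color boundaries (green from 20% up to and including 100%, red above 100%) are my assumption about how to close the gap between 99% and 100%.

```python
def percent_tic(peak_area, tic):
    """Normalize a summed peak area by the sample's Total Ion Current."""
    return peak_area / tic * 100.0


def highlight(pct):
    """Mimic the spreadsheet coloring (boundaries are an assumption)."""
    if pct > 100:
        return "red"    # peak area exceeds the TIC; needs investigation
    if pct >= 20:
        return "green"
    return ""           # below 20%: no highlight


pct = percent_tic(50.0, 200.0)  # made-up numbers
color = highlight(pct)
```

Any value that comes back "red" is one of the cases flagged above for follow-up, since a single protein's peak area should not exceed the total ion current.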

5) On a new tab, I averaged technical replicates’ %TIC (i.e. the same sample was run twice on the Lumos), then calculated the fold change in eelgrass beds compared to bare beds, by SITE. For example: Fold change @ Case Inlet = Average %TIC CI-Eelgrass / Average %TIC CI-Bare
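As a rough sketch of the step 5 arithmetic, assuming each treatment at a site has a short list of replicate %TIC values (the numbers below are invented):

```python
from statistics import mean


def fold_change(eelgrass_pct, bare_pct):
    """Average replicate %TIC per treatment, then divide eelgrass by bare.

    Returns None when the bare average is zero, since the ratio is
    undefined there (how the spreadsheet handled this is not recorded).
    """
    bare = mean(bare_pct)
    if bare == 0:
        return None
    return mean(eelgrass_pct) / bare


# Two technical replicates per treatment, made-up values.
fc = fold_change([2.0, 4.0], [0.5, 0.5])
```

A value above 1 means the protein had a higher average %TIC in eelgrass than in bare at that site; a value below 1 means the opposite.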

6) On a new tab, created another pivot table showing fold change for each protein in eelgrass beds compared to bare, organized by site. I highlighted proteins over-expressed by 5x in green, and under-expressed by 5x in red.

7) Then, I did some more re-organization and extracted a list of proteins in each site that were over/under expressed 5x.
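The 5x filter in steps 6 and 7 could be sketched like this. I am assuming that “over-expressed by 5x” means a fold change of at least 5 and “under-expressed by 5x” means a fold change of at most 1/5; the nested-dictionary layout and the site codes are hypothetical.

```python
def classify(fc, factor=5.0):
    """Label a fold change as "over", "under", or None (not flagged)."""
    if fc is None:
        return None
    if fc >= factor:
        return "over"
    if fc <= 1.0 / factor:
        return "under"
    return None


def candidates_by_site(fold_changes, factor=5.0):
    """Collect flagged proteins per site from {protein: {site: fold change}}."""
    out = {}
    for protein, by_site in fold_changes.items():
        for site, fc in by_site.items():
            label = classify(fc, factor)
            if label is not None:
                out.setdefault(site, {})[protein] = label
    return out


# Made-up fold changes; "CI" and "FB" are illustrative site codes.
fold_changes = {
    "protA": {"CI": 6.0, "FB": 1.2},
    "protB": {"CI": 0.1, "FB": 0.15},
}
flagged = candidates_by_site(fold_changes)
```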

8) I moved this list to a new (much smaller) file. Next task: identify which proteins are consistently expressed differently in the two treatments. I re-structured the list of proteins & associated fold change into 1 column, and assigned each data point a site. Then, I created yet another pivot table to organize these candidate “interesting” proteins by site. As shown in the screenshot, in total there are 2382 proteins that were over/under expressed by a factor of 5 in at least one site: 866 in Case Inlet, 1153 in Fidalgo Bay, 910 in Port Gamble, and 666 in Skokomish. (The per-site counts sum to more than 2382 because a protein can be flagged at more than one site.)

9) In an adjacent column I calculated the # of sites that had data for each protein, then sorted by that count. I highlighted all over-expressed proteins in green, and under-expressed in red. This allowed me to visually review all proteins, starting with those that were differentially expressed by a factor of 5 in all 4 sites, and assign “Over” / “Under” to each protein. I assigned a label to proteins that were consistently over- or under-expressed in at least 3 sites; I also spot-checked proteins for which 2 sites were over/under by a factor of 5 and that had very consistent or very pronounced differential expression. I labeled these as “interesting.” I copied this list and pasted it into a new file that I’ll use in Galaxy to join to the annotated transcriptome.
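I did this review by eye, but the “at least 3 sites” rule could be approximated in Python along these lines. This sketch assumes “consistently” means every flagged site agrees on the direction, which is stricter than a visual review might be. It does not cover the spot-checked 2-site cases, and the site codes are illustrative.

```python
from collections import Counter


def consistent_direction(labels_by_site, min_sites=3):
    """Return "over" or "under" if at least min_sites sites are flagged
    and all flagged sites agree on the direction; otherwise None."""
    counts = Counter(labels_by_site.values())
    for direction, other in (("over", "under"), ("under", "over")):
        if counts[direction] >= min_sites and counts[other] == 0:
            return direction
    return None


# Made-up labels for a single protein across sites.
label = consistent_direction({"CI": "over", "FB": "over", "PG": "over"})
```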

10) In Galaxy, I uploaded the annotated geoduck gonad transcriptome file (https://raw.githubusercontent.com/sr320/paper-pano-go/master/data-results/Geo-v3-join-uniprot-all0916-condensed.tab) and my protein list.
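The join itself was done in Galaxy, but the general idea can be illustrated in plain Python. This is only a sketch of an inner join on the first column of two tab-separated tables; the IDs, columns, and values below are invented and do not reflect the real transcriptome file or the Galaxy tool settings.

```python
import csv
import io


def join_on_first_column(proteins_tsv, annotations_tsv):
    """Inner-join two tab-separated tables on their first column."""
    annotations = {}
    for row in csv.reader(io.StringIO(annotations_tsv), delimiter="\t"):
        if row:
            annotations[row[0]] = row[1:]
    joined = []
    for row in csv.reader(io.StringIO(proteins_tsv), delimiter="\t"):
        if row and row[0] in annotations:
            # keep the protein-list columns, then append the annotation columns
            joined.append(row + annotations[row[0]])
    return joined


# Invented example data (not from the real files).
proteins = "contigA\tOver\ncontigB\tUnder\n"
annotations = "contigA\tP12345\tExample protein\ncontigC\tQ99999\tOther protein\n"
result = join_on_first_column(proteins, annotations)
```

Proteins without a matching annotation row (contigB here) drop out of an inner join, so it is worth comparing the row count of the result to the input list.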

11) I downloaded the results, added the headers, scanned the proteins, and bolded a few that were significantly differentially expressed.

12) Files are saved on GitHub, and on Owl, files dated 5/15. Most important file (at the moment) is: 2017-05-15_Proteins of Definite Interest_Annotated

Written on May 15, 2017