# Finalized list of SRM data analysis stats

### The Thanksgiving weekend provided time to ponder/reflect on the SRM stats that I’ve run thus far, what else needs to be done, and how to finish up. Received guidance from Dave-o. Note: next time, include a housekeeping protein in list of targets.

This is my “finalized” list of SRM & environmental stats to run. In the last few days I’ve completed much of this. In **bold** are those remaining. Tomorrow I’ll hopefully be done, and will post my scripts and results.

### SRM Protein & Environmental Data Analysis Steps

#### Each Protein: (assume proteins are independent)

- Test for normality
- Lambda transformation
- Test for normality post transformation
- Assess outliers, remove if necessary
- N-way ANOVA by: a) location b) habitat c) site d) region
- Determine adjusted P-value, correcting for multiple comparisons (Bonferroni method, P/13)
- Post-hoc test to ID differences (is this really necessary?)
- Ultimate goal: which proteins are different between locations?
- Compare total abundance between sites (sum peptide abundance)
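
The multiple-comparison step above can be sketched roughly as follows. This is a minimal Python illustration, assuming a significance level of 0.05 and 13 tests (the P/13 in the list). The function name `bonferroni` and the p-values are hypothetical placeholders, not results from this analysis.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: return (threshold, adjusted p-values, reject flags)."""
    m = len(p_values)                               # number of tests (13 proteins assumed)
    threshold = alpha / m                           # per-test significance threshold
    adjusted = [min(1.0, p * m) for p in p_values]  # adjusted p-values, capped at 1
    reject = [p < threshold for p in p_values]      # True where the test stays significant
    return threshold, adjusted, reject


# Hypothetical raw ANOVA p-values, one per protein (made up for illustration).
raw_p = [0.001, 0.004, 0.03, 0.2, 0.5, 0.0005, 0.049,
         0.8, 0.01, 0.06, 0.3, 0.002, 0.9]

threshold, adjusted, reject = bonferroni(raw_p)
print(f"per-test threshold: {threshold:.5f}")
print(f"proteins still significant: {sum(reject)} of {len(raw_p)}")
```

Comparing raw p-values against alpha/13 and comparing adjusted p-values (p × 13) against alpha give the same decisions; which one to report is a matter of convention.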

#### Each environmental variable:

- Download tidal chart data for each site
- Edit pH, DO & Salinity data:

a. Remove data from exposed time points, as determined from tidal charts

b. Identify and remove outliers from pH, DO & Salinity data

c. Recombine outlier-scrubbed data with Temp, Tide data.

- Assess normality of each env. variable (all time points)
- Found to be non-normal (pH is kinda, but let’s assume not). Dataset is large (>6000 for each parameter), so did not determine lambda via the `tukeytransform` function. Instead, used Kruskal-Wallis non-parametric analysis in lieu of ANOVA
- KW test for each env. variable by location, by region
- Dunn post-hoc test to ID differences
- Use Bonferroni correction for P-adjusted in tests
- Ultimate goal: which env. variables are different between locations?

a. basically all of them.
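
To make the Kruskal-Wallis step above concrete, here is a minimal Python sketch of the H statistic with the usual tie correction. It is an illustration under assumptions, not the script used for this analysis: the function name `kruskal_wallis_h` and the pH readings are hypothetical. In practice the p-value comes from comparing H to a chi-square distribution with (number of groups - 1) degrees of freedom, which R's `kruskal.test` does for you.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)

    # Assign each distinct value its average rank (tied values share a rank).
    counts = Counter(pooled)
    rank_of = {}
    next_rank = 1
    for value in sorted(counts):
        c = counts[value]
        rank_of[value] = next_rank + (c - 1) / 2
        next_rank += c

    # H is built from the squared rank sum of each group.
    total = 0.0
    for g in groups:
        rank_sum = sum(rank_of[x] for x in g)
        total += rank_sum ** 2 / len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction; it is zero only if every observation is identical.
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical pH readings from three locations (made up for illustration).
site_a = [7.6, 7.7, 7.8]
site_b = [7.9, 8.0, 8.1]
site_c = [8.2, 8.3, 8.4]

h_stat = kruskal_wallis_h([site_a, site_b, site_c])
print(f"H = {h_stat:.2f}")
```

Because the test works on ranks only, it does not need the normality assumption that the environmental data failed.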

#### Prep for regression model:

- Calculate summary statistics: mean, variance, sd, min, max, median, %>1 sd from mean, %>2 sd from mean
- Plot() all env. variables: are any linearly related, aka not independent? If so, need to include interaction parameter in regression model.
- Plot() protein peptides against each other to confirm linear correlation; equation should be ~1:1.
- If all are correlated, select 1 peptide to use in regression model; highest abundance is best.
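
The prep steps above can be sketched in a few lines of Python. This is a rough illustration under assumptions: "%>1 sd from mean" is read here as the share of points farther than 1 sd from the mean in either direction, the sd is the sample sd (n - 1), and the names `summarize` and `fit_line` plus all values are hypothetical placeholders.

```python
import statistics


def summarize(values):
    """Summary statistics for one environmental variable."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)

    def pct_beyond(k):
        # Percent of points more than k sd away from the mean, either side.
        return 100.0 * sum(abs(v - mean) > k * sd for v in values) / len(values)

    return {
        "mean": mean,
        "variance": statistics.variance(values),
        "sd": sd,
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "pct_gt_1sd": pct_beyond(1),
        "pct_gt_2sd": pct_beyond(2),
    }


def fit_line(x, y):
    """Least-squares slope, intercept and R^2 for y against x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


# Hypothetical readings for one variable (made up for illustration).
stats = summarize([2, 4, 4, 4, 5, 5, 7, 9])

# Hypothetical abundances of two peptides from the same protein.
peptide_1 = [1.0, 2.0, 3.0, 4.0]
peptide_2 = [1.1, 2.1, 3.1, 4.1]
slope, intercept, r2 = fit_line(peptide_1, peptide_2)
print(stats)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r2:.2f}")
```

A slope near 1 with a high R^2 between peptides of the same protein would support picking a single representative peptide for the regression models.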

#### Run regression models for each representative peptide:

- **Step-wise linear regression models with all env. variables; I would expect that only the variables that were found to be different via the ANOVA would significantly contribute to the model**
- **General linear model with variables ID’d in step-wise lm**
- **Figure out when to add a constant, and if I should do that in this scenario**
- **Run anova on best fit model, find P-value of the env. variables to determine confidence in the influence of each env. variable on proteins.**
- **Run model on the other peptides in the protein (not used as representative peptides); ID the R^2 and P values**

Written on November 28, 2017