# LC-MS data pre-processing

Input peak data and variable information to download, as a zip file, the results of all possible combinations of the following pre-processing operations:

## Removal of variables with too many zeros (Z)

You will be asked for the percentage of zeros you are prepared to accept for a variable.

Variables with more than this percentage of zeros will be removed, and the variable information file will be updated accordingly.
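As an illustration of the zero filter, the following is a minimal sketch rather than the tool's actual code. It assumes the data are held as a list of rows (one per observation) and that the variable information is a list with one entry per column. The function name `remove_sparse_variables` and the sample values are hypothetical.

```python
def remove_sparse_variables(data, var_info, max_zero_pct):
    """Drop variables (columns) whose percentage of zeros exceeds max_zero_pct.

    data: list of rows, one per observation, each a list of floats.
    var_info: list with one entry per variable (column).
    Returns the filtered data and the matching variable information.
    """
    n_obs = len(data)
    keep = []
    for j in range(len(var_info)):
        zeros = sum(1 for row in data if row[j] == 0)
        # Keep the variable only if its share of zeros is within the limit.
        if 100.0 * zeros / n_obs <= max_zero_pct:
            keep.append(j)
    filtered = [[row[j] for j in keep] for row in data]
    return filtered, [var_info[j] for j in keep]


# Hypothetical data: 4 observations, 3 variables.
data = [
    [0.0, 5.0, 1.0],
    [0.0, 0.0, 2.0],
    [0.0, 7.0, 3.0],
    [4.0, 8.0, 0.0],
]
# Hypothetical variable information: (variable number, m/z, retention time).
var_info = [(1, 100.05, 1.2), (2, 200.10, 3.4), (3, 300.15, 5.6)]

filtered, kept_info = remove_sparse_variables(data, var_info, max_zero_pct=50)
```

Here the first variable is zero in 3 of 4 observations (75%), so it is dropped along with its entry in the variable information.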

## Normalisation to total ion count (N)
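The exact normalisation used is not spelled out here. One common reading, assumed in this sketch, is that each observation's intensities are divided by the sum of all intensities in that observation. The function name `normalise_to_tic` and the handling of all-zero rows are assumptions for illustration.

```python
def normalise_to_tic(data):
    """Divide every value in an observation by that observation's total signal.

    data: list of rows, one per observation, each a list of floats.
    Rows whose total is zero are returned unchanged (an assumption made
    here to avoid division by zero).
    """
    normalised = []
    for row in data:
        total = sum(row)
        if total == 0:
            normalised.append(list(row))
        else:
            normalised.append([value / total for value in row])
    return normalised


tic_normalised = normalise_to_tic([[1.0, 3.0], [2.0, 2.0], [0.0, 0.0]])
```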

## QC correction (QC)

Each variable will be corrected for batch differences by subtracting the mean of the two closest QC values (in data collection order) for this variable.
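A minimal sketch of this correction for a single variable is given below. It takes "closest" literally as the smallest distance in run order. Two details are assumptions made for the sketch rather than documented behaviour: ties are broken in favour of the earlier QC, and a QC observation counts as its own closest QC. The name `qc_correct` and the sample values are hypothetical.

```python
def qc_correct(values, qc_positions):
    """Subtract from each value the mean of its two nearest QC values.

    values: intensities of one variable, in data collection order.
    qc_positions: 0-based run-order indices of the QC observations.
    """
    corrected = []
    for i, value in enumerate(values):
        # Sort QCs by distance in run order; ties go to the earlier QC.
        nearest = sorted(qc_positions, key=lambda q: (abs(q - i), q))[:2]
        reference = sum(values[q] for q in nearest) / len(nearest)
        corrected.append(value - reference)
    return corrected


# Hypothetical variable with QCs at run positions 0, 3 and 5.
values = [10.0, 12.0, 13.0, 14.0, 11.0, 16.0]
qc_corrected = qc_correct(values, qc_positions=[0, 3, 5])
```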

## BG correction (BG)

Each variable will be corrected for batch differences by subtracting the relevant value from a smoothed trend obtained over all observations (QC and non-QC, in data collection order) for this variable.
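The smoothing method is not specified in this description. The sketch below uses a centred moving average purely as a stand-in to show the shape of the calculation: estimate a trend over all observations in run order, then subtract the trend value at each position. The function names and the `window` parameter are hypothetical, and the actual smoother may differ.

```python
def moving_average_trend(values, window=3):
    """Centred moving average, with the window truncated at both ends."""
    half = window // 2
    trend = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        trend.append(sum(values[lo:hi]) / (hi - lo))
    return trend


def bg_correct(values, window=3):
    """Subtract the smoothed trend value from each observation of one variable.

    values: intensities of one variable (QC and non-QC), in run order.
    """
    trend = moving_average_trend(values, window)
    return [value - t for value, t in zip(values, trend)]


bg_corrected = bg_correct([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
```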

See Rusilowicz et al. (2016) for more details of the correction methods.

## Data files

The data file should be a space-delimited text file with batch information given on the first line as:

m_{1} m_{2} m_{3} … m_{N}
where m_{i} is the number of non-QC observations between the consecutive QCs QC_{i} and QC_{i+1}. There are N+1 QCs and M = m_{1} + m_{2} + m_{3} + … + m_{N} + N + 1 observations in total.

For example, if there are 4 QCs with 3 non-QC observations between QC_{1} and QC_{2}, 5 between QC_{2} and QC_{3}, and 4 between QC_{3} and QC_{4}, the first line of the data file would be:

3 5 4
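The arithmetic behind this first line can be checked with a short sketch. It assumes, as the formula for M implies, that the run starts and ends with a QC, and it reports QC positions as 0-based row indices. The function name `qc_layout` is hypothetical.

```python
def qc_layout(header_line):
    """Derive batch sizes, QC row positions and total row count from line 1.

    header_line: the space-delimited batch line, e.g. "3 5 4".
    Returns (batch_sizes, qc_positions, total_observations), where
    qc_positions are 0-based indices into the M data rows.
    """
    batch_sizes = [int(token) for token in header_line.split()]
    qc_positions = [0]  # the first observation is assumed to be a QC
    for m in batch_sizes:
        # Skip m non-QC observations, then land on the next QC.
        qc_positions.append(qc_positions[-1] + m + 1)
    total = sum(batch_sizes) + len(batch_sizes) + 1
    return batch_sizes, qc_positions, total


batch_sizes, qc_positions, total = qc_layout("3 5 4")
```

For the example line, this gives QCs at rows 0, 4, 10 and 15, and M = 3 + 5 + 4 + 3 + 1 = 16 observations.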

The following M lines of the file should contain the LC-MS data, with one line per observation (sample), in run order.

Variable information should be given in a space-delimited text file with a variable number followed by the m/z value and retention time on each line.

