Calculate a distance matrix between MS/MS spectra in the MetFamily feature Matrix
calculateDistanceMatrix.Rd
Computes a distance matrix between MS/MS spectra in the feature matrix representation using various distance or similarity measures. The function supports multiple distance metrics, of which the following are also available in the MetFamily GUI: "Jaccard", "Jaccard (intensity-weighted)", "Jaccard (fragment-count-weighted)" and "NDP (Normalized dot product)". The following list defines the metrics and their characteristics:
Arguments
- dataList
List object containing precursor, feature and MS/MS data.
- filter
Logical or integer vector indicating for which precursors to include the MS/MS spectra.
- distanceMeasure
Character string specifying the distance metric to use. Supported values include "Jaccard", "Manhatten", "NDP (Normalized dot product)", and others.
- progress
Logical or NA. If TRUE, progress is reported via incProgress().
Value
A list with elements:
- distanceMatrix
A numeric matrix of pairwise distances.
- filter
The filter vector used, same as the input parameter.
- distanceMeasure
The distance metric used, same as the input parameter.
Details
"Jaccard": Jaccard distance \(1 - |A \cap B| / |A \cup B|\): the fraction of matching fragments among all fragments, where A and B are MS/MS features of two different precursors.
"Jaccard (intensity-weighted)": Jaccard distance weighted by the intensity of features. First, intensities are discretized: [0.01-0.2[ are set down to 0.01, [0.2, 0.4[ down to 0.2, and >= 0.4 increased to 1. For matching fragments the higher intensity is used. Then, Jacard becomes the sum of matching intensities among the sum of all intensities in A and B. \(1 - \sum_{i\in matches} max(A_i, B_i) / (sum(A) + sum(B))\)
"Jaccard (fragment-count-weighted)": Jaccard distance weighted by relative occurance of the fragments among the precursors after filtering counts: \(1 - \sum_{i\in matches} freq(A_i) / (sum_{\notin matches} freq(A) + sum_{\notin matches}(B))\)
"NDP (Normalized dot product)": Normalized dot product similarity: \(NDP = \frac{\left( \sum_{i}^{\text{S1\&S2}} W_{\text{S1},i} W_{\text{S2},i} \right)^2}{\sum_i W_{\text{S1},i}^2 \sum_i W_{\text{S2},i}^2}\) as described in Gaquerel et al. 2015. (10.1073/pnas.1610218113)https://www.pnas.org/doi/10.1073/pnas.1610218113#sec-4-5