MetFrag is available as a command line tool, which matches the functionality present on MetFragWeb. It combines the efficient fragmenter and additional scoring functions to rank the retrieved candidates, including mass spectral match, retention time information from liquid chromatography and reference information if desired.
Once downloaded, the executable MetFrag jar can be run using the following command (where X.Y.Z should be replaced by the version number matching the downloaded jar file):
All input parameters for MetFrag CL are specified in the parameter file, which contains all necessary settings to process a given MS/MS peak list. An example parameter file for querying PubChem can be downloaded here, while the corresponding example MS/MS peak list can be downloaded here. Further details about the parameter options are given in the section "Defining Parameters" below. Note that it is also possible to use the MetFrag Web interface to generate parameter files by selecting all desired settings and pressing the "Download Parameters" button in the "Fragmentation Settings & Processing" section.
#
# data file containing mz intensity peak pairs (one per line)
#
PeakListPath = example_data.txt
#
# database parameters -> how to retrieve candidates
#
#
MetFragDatabaseType = PubChem
NeutralPrecursorMolecularFormula = C9H11Cl3NO3PS
NeutralPrecursorMass = 348.926284
#
# peak matching parameters
#
FragmentPeakMatchAbsoluteMassDeviation = 0.001
FragmentPeakMatchRelativeMassDeviation = 5
PrecursorIonMode = 1
IsPositiveIonMode = True
#
# scoring parameters
#
MetFragScoreTypes = FragmenterScore
MetFragScoreWeights = 1.0
#
# output
# SDF, XLS, CSV, ExtendedXLS, ExtendedFragmentsXLS
#
MetFragCandidateWriter = XLS
SampleName = example_1
ResultsPath = .
#
# following parameters can be kept as they are
#
MaximumTreeDepth = 2
MetFragPreProcessingCandidateFilter = UnconnectedCompoundFilter
MetFragPostProcessingCandidateFilter = InChIKeyFilter
# NumberThreads = 1
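The parameter file above is a plain list of `Key = Value` lines with `#` comments. As a minimal sketch of how such a file could be read for inspection or scripting, the following Python helper parses that layout. The function name `parse_parameters` is hypothetical and this is not MetFrag's own parser; it only assumes the line format visible in the example.

```python
def parse_parameters(text: str) -> dict[str, str]:
    """Parse 'Key = Value' lines, skipping blank lines and '#' comments."""
    params: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # comments (including commented-out parameters) are ignored
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Not a 'Key = Value' line: {raw_line!r}")
        params[key.strip()] = value.strip()
    return params


# A shortened copy of the example parameter file.
example = """\
# data file containing mz intensity peak pairs (one per line)
PeakListPath = example_data.txt
MetFragDatabaseType = PubChem
NeutralPrecursorMolecularFormula = C9H11Cl3NO3PS
# NumberThreads = 1
"""

params = parse_parameters(example)
print(params["MetFragDatabaseType"])
print("NumberThreads" in params)
```

All values are kept as strings, so numeric settings such as mass deviations would still need converting before use.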
This example can be run using the following command (for MetFragCL v2.4.5):
This will generate the following output:
First, MetFrag uses the defined database parameters to retrieve candidates. In this case the molecular formula C9H11Cl3NO3PS is used, resulting in 8 matching candidates. Processing then starts, with progress reported in percent. Once processing has finished, MetFrag prints a short summary of the number of candidates discarded by the defined pre- and post-processing filters and of the errors that occurred during processing. Such errors can be caused by, for example, InChI parsing problems.
The result files are stored in the results directory given in the parameter file (ResultsPath). Their format is set by the parameter MetFragCandidateWriter.
PeakListPath = EQ300804_Nicotine_peaks.txt
ResultsPath = .
IsPositiveIonMode = true
PrecursorIonMode = 1
IonizedPrecursorMass = 163.1229
DatabaseSearchRelativeMassDeviation = 5.0
FragmentPeakMatchRelativeMassDeviation = 5.0
FragmentPeakMatchAbsoluteMassDeviation = 0.001
SampleName = EQ300804_MetFragCL_PCL
MetFragCandidateWriter = XLS
OfflineSpectralDatabaseFile = MoNA-export-LC-MS-MS_Spectra-20241014-0.005.mb
MetFragDatabaseType = LocalCSV
LocalDatabasePath = PubChemLite_exposomics_20241227.csv
MetFragScoreTypes = FragmenterScore,OfflineIndividualMoNAScore,AnnoTypeCount,Patent_Count,PubMed_Count
MetFragScoreWeights = 1.0,1.0,1.0,1.0,1.0
MetFragPreProcessingCandidateFilter = UnconnectedCompoundFilter,IsotopeFilter
MaximumTreeDepth = 2
NumberThreads = 2
UseSmiles = true
This example can be run using the following command (for MetFragCL v2.4.5):
This will generate the following output:
First, MetFrag uses the ion settings (IsPositiveIonMode = true; PrecursorIonMode = 1; IonizedPrecursorMass = 163.1229) and the tolerance (DatabaseSearchRelativeMassDeviation = 5.0) to retrieve candidates from PubChemLite via the LocalCSV option (here, the file is saved in the same folder as the MetFrag jar file). In this example, 110 matching candidates were retrieved; because the local CSV file is quite large, this step can take some time. Processing then starts, with progress reported in percent. Once processing has finished, a short summary reports the number of candidates discarded by the defined pre- and post-processing filters and the errors that occurred during processing. Since PubChemLite was optimized for MetFrag and mass spectrometry data processing, these numbers are usually zero.
The result files are stored in the results directory given in the parameter file (ResultsPath). Their format is set by the parameter MetFragCandidateWriter. The results are sorted by the "Score" column (maximum score of 5, because 5 score terms are set, each with weight = 1) and clearly show that Nicotine is ranked better (Score = 4.57, MoNA match 0.999, i.e. a Level 2a identification) than the remaining candidates (Score = 1.89 or less). The XLS output contains all columns available in PubChemLite, not just the selected scoring terms, which allows further interpretation of the results using the classification categories. See the PubChemLite website for more information about PubChemLite.
PeakListPath = example_data.txt
# Database settings
MetFragDatabaseType = ...
LocalDatabasePath = ... (only needed for LocalSDF, LocalCSV or LocalPSV)
ChemSpiderToken = ... (only needed for ChemSpiderRest)
# Retrieval settings (at least one of these three groups must be defined)
NeutralPrecursorMass = ...
DatabaseSearchRelativeMassDeviation = ... (a value in ppm)
# AND/OR
NeutralPrecursorMolecularFormula = ...
# AND/OR
PrecursorCompoundIDs = ...
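The rule that at least one retrieval group must be defined can be checked before a run is started. The sketch below is a hypothetical pre-flight check, not part of MetFrag: it takes the parameters as a dictionary and reports which of the three groups are present. It assumes that the mass group needs both a precursor mass and DatabaseSearchRelativeMassDeviation, and that IonizedPrecursorMass is accepted in place of NeutralPrecursorMass.

```python
def defined_retrieval_groups(params: dict[str, str]) -> list[str]:
    """Return the names of the retrieval groups that are specified in params."""
    groups = []
    # Assumption: a mass search needs a precursor mass plus a ppm tolerance.
    has_mass = "NeutralPrecursorMass" in params or "IonizedPrecursorMass" in params
    if has_mass and "DatabaseSearchRelativeMassDeviation" in params:
        groups.append("mass")
    if "NeutralPrecursorMolecularFormula" in params:
        groups.append("formula")
    if "PrecursorCompoundIDs" in params:
        groups.append("compound IDs")
    return groups


by_formula = {
    "MetFragDatabaseType": "PubChem",
    "NeutralPrecursorMolecularFormula": "C9H11Cl3NO3PS",
}
incomplete = {"MetFragDatabaseType": "PubChem"}

print(defined_retrieval_groups(by_formula))
print(defined_retrieval_groups(incomplete))
```

An empty list indicates that the parameter set defines no way to retrieve candidates.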
Different database (MetFragDatabaseType) options for retrieving candidate molecules include:
MetFragDatabaseType = PubChem
NeutralPrecursorMolecularFormula = C9H11Cl3NO3PS
The following example instead retrieves candidates from PubChemLite (LocalCSV option, with the file downloaded into the same directory as the jar file) via exact mass:
MetFragDatabaseType = LocalCSV
LocalDatabasePath = PubChemLite_exposomics_20241025.csv
NeutralPrecursorMass = 253.966126
DatabaseSearchRelativeMassDeviation = 5
FragmentPeakMatchAbsoluteMassDeviation = 0.001
FragmentPeakMatchRelativeMassDeviation = 5
PrecursorIonMode = 1
IsPositiveIonMode = True
MetFragCandidateWriter = XLS
SampleName = example_1
ResultsPath = .
MaximumTreeDepth = 2
MetFragPreProcessingCandidateFilter = UnconnectedCompoundFilter,IsotopeFilter
MetFragPostProcessingCandidateFilter = InChIKeyFilter
NumberThreads = 1
UseSmiles = true
MetFragScoreTypes = FragmenterScore
MetFragScoreWeights = 1.0
There are several different options for more advanced scoring schemes, depending on the database. Selecting ExtendedPubChem enables the inclusion of patent (PubChemNumberPatents) and reference/literature information (PubChemNumberPubMedReferences) for the retrieved candidates. This can be defined as follows:
MetFragDatabaseType = ExtendedPubChem
MetFragScoreTypes = FragmenterScore,PubChemNumberPatents,PubChemNumberPubMedReferences
MetFragScoreWeights = 1.0,1.0,1.0
For local file databases (LocalSDF, LocalCSV, LocalPSV), additional numerical scoring terms can be included using unique column headers (PSV, CSV) or tags (SDF). As an example, the recommended scoring terms for PubChemLite (DOI: 10.1186/s13321-021-00489-0) are as follows (including the recommended spectral library matching option, see next section):
MetFragDatabaseType = LocalCSV
LocalDatabasePath = PubChemLite_exposomics_20241025.csv
MetFragScoreTypes = FragmenterScore,OfflineIndividualMoNAScore,AnnoTypeCount,Patent_Count,PubMed_Count
MetFragScoreWeights = 1.0,1.0,1.0,1.0,1.0
It is possible to adjust the weights. For example, up to 5 reference scores can be retrieved from ChemSpider; weighting each of them at 0.2 gives a combined reference score of at most 1 (and a total score of at most 2), as follows:
MetFragDatabaseType = ChemSpiderRest
ChemSpiderToken = ...
MetFragScoreTypes = FragmenterScore,ChemSpiderReferenceCount,ChemSpiderNumberExternalReferences,ChemSpiderRSCCount,ChemSpiderNumberPubMedReferences,ChemSpiderDataSourceCount
MetFragScoreWeights = 1.0,0.2,0.2,0.2,0.2,0.2
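The arithmetic behind these maximum scores can be sketched in a few lines of Python. The helper below is hypothetical and assumes, in line with the maxima quoted in this documentation, that each individual score term contributes at most 1 before its weight is applied, so the highest possible total is the sum of the weights. It also checks that MetFragScoreTypes and MetFragScoreWeights list the same number of entries.

```python
def max_total_score(score_types: str, score_weights: str) -> float:
    """Upper bound of the combined score for comma-separated types and weights."""
    types = [t.strip() for t in score_types.split(",")]
    weights = [float(w) for w in score_weights.split(",")]
    if len(types) != len(weights):
        raise ValueError(
            f"{len(types)} score types but {len(weights)} weights were given"
        )
    # Assumption: every score term is scaled to a maximum of 1 before weighting.
    return sum(weights)


chemspider = max_total_score(
    "FragmenterScore,ChemSpiderReferenceCount,ChemSpiderNumberExternalReferences,"
    "ChemSpiderRSCCount,ChemSpiderNumberPubMedReferences,ChemSpiderDataSourceCount",
    "1.0,0.2,0.2,0.2,0.2,0.2",
)
pubchemlite = max_total_score(
    "FragmenterScore,OfflineIndividualMoNAScore,AnnoTypeCount,Patent_Count,PubMed_Count",
    "1.0,1.0,1.0,1.0,1.0",
)
print(round(chemspider, 3), round(pubchemlite, 3))
```

With these inputs the ChemSpider scheme comes out at about 2 and the PubChemLite scheme at 5, up to floating-point rounding.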
The parameter file tells MetFrag which information to include in the final scoring via the database, scoring term and associated weight. If in doubt, use the MetFrag Web interface to generate example parameter files by selecting all desired settings (it is also possible to adjust the weights) and pressing the "Download Parameters" button. For local databases, suitable additional scoring terms, if available, will appear automatically on the web interface in the "Candidate Filter & Score Settings" section (bottom right).
MetFragScoreTypes = FragmenterScore,AutomatedPeakFingerprintAnnotationScore,AutomatedLossFingerprintAnnotationScore
MetFragScoreTypes = FragmenterScore,OfflineIndividualMoNAScore,OfflineMetFusionScore
MetFragScoreWeights = 1.0,1.0,1.0
The spectral library can be given either as a single file or as a directory, in which case MetFrag reads all .mb files in that directory:
OfflineSpectralDatabaseFile = /path/to/MoNA-export-LC-MS.mb
# OR
OfflineSpectralDatabaseFile = /path/to/
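To preview which files a given OfflineSpectralDatabaseFile setting would cover, a small Python check can mirror the file-or-directory rule. This is a sketch only: the function name `spectral_library_files` is hypothetical, and it assumes that a directory is scanned non-recursively for names ending in `.mb`, which may differ from what MetFrag does internally.

```python
from pathlib import Path


def spectral_library_files(path: str) -> list[Path]:
    """List the .mb files a path refers to: the file itself, or a directory's contents."""
    location = Path(path)
    if location.is_dir():
        # Assumption: only the top level of the directory is considered.
        return sorted(location.glob("*.mb"))
    return [location]


# A path that is not an existing directory is treated as a single library file.
for library in spectral_library_files("/path/to/MoNA-export-LC-MS.mb"):
    print(library.name)
```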
The OfflineIndividualMoNAScore matches spectra to the candidates using the InChIKey and reports the best similarity match if multiple spectra with the same InChIKey exist. This option allows the generation of "Level 2a" annotations (spectral similarity match, according to DOI: 10.1021/es5002105) when the match values are sufficiently high (e.g., >0.9). The OfflineMetFusionScore uses the MetFusion scoring approach and returns spectral match values even if no spectrum exists for a candidate, using both spectral and structural similarity (see Gerlich et al., DOI: 10.1002/jms.3123).
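The "best match per InChIKey" behaviour described above can be illustrated with a short Python sketch. It is not MetFrag's implementation: the input is assumed to be a list of (InChIKey, similarity) pairs that have already been computed, the keys `KEY_A` and `KEY_B` are placeholders, and the 0.9 cut-off is the example threshold mentioned in the text rather than a fixed rule.

```python
def best_match_per_inchikey(matches: list[tuple[str, float]]) -> dict[str, float]:
    """Keep the highest similarity value seen for each InChIKey."""
    best: dict[str, float] = {}
    for inchikey, similarity in matches:
        if similarity > best.get(inchikey, float("-inf")):
            best[inchikey] = similarity
    return best


# Placeholder keys and made-up similarity values, for illustration only.
library_matches = [("KEY_A", 0.62), ("KEY_A", 0.95), ("KEY_B", 0.40)]
best = best_match_per_inchikey(library_matches)

# Candidates whose best match exceeds the example threshold of 0.9.
level_2a_candidates = [key for key, score in best.items() if score > 0.9]
print(best)
print(level_2a_candidates)
```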
PrecursorIonMode | Adduct
1 | [M+H]+
18 | [M+NH4]+
23 | [M+Na]+
39 | [M+K]+
33 | [M+CH3OH+H]+
42 | [M+ACN+H]+
64 | [M+ACN+Na]+
83 | [M+2ACN+H]+
-1 | [M-H]-
35 | [M+Cl]-
45 | [M+HCOO]-
59 | [M+CH3COO]-
0 | [M]+/-
The PrecursorIonMode and IsPositiveIonMode parameters can be coupled with IonizedPrecursorMass (instead of NeutralPrecursorMass) to use the charged mass (m/z) from the instrument to perform the database search (candidate retrieval), as follows. See the PubChemLite section above for a full example.
IsPositiveIonMode = true
PrecursorIonMode = 1
IonizedPrecursorMass = 163.1229
DatabaseSearchRelativeMassDeviation = 5.0
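To make these settings more concrete, the Python sketch below holds the PrecursorIonMode codes and adduct labels listed above in a lookup table and shows what a relative deviation of 5 ppm means at m/z 163.1229. The names `PRECURSOR_ION_MODES` and `ppm_window` are hypothetical, and the symmetric window is an assumption about how a ppm tolerance is usually applied; MetFrag's internal handling (for example, the adduct correction applied before the search) is not reproduced here.

```python
# PrecursorIonMode code -> adduct label, as listed in this documentation.
PRECURSOR_ION_MODES = {
    1: "[M+H]+",
    18: "[M+NH4]+",
    23: "[M+Na]+",
    39: "[M+K]+",
    33: "[M+CH3OH+H]+",
    42: "[M+ACN+H]+",
    64: "[M+ACN+Na]+",
    83: "[M+2ACN+H]+",
    -1: "[M-H]-",
    35: "[M+Cl]-",
    45: "[M+HCOO]-",
    59: "[M+CH3COO]-",
    0: "[M]+/-",
}


def ppm_window(mass: float, ppm: float) -> tuple[float, float]:
    """Symmetric search window around mass for a relative deviation in ppm."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


low, high = ppm_window(163.1229, 5.0)
print(PRECURSOR_ION_MODES[1])
print(f"{low:.4f} to {high:.4f}")
```

At this mass, 5 ppm corresponds to a half-width of roughly 0.0008 on either side of the measured value.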