MetFrag CL

MetFrag is available as a command line tool (MetFrag CL) that provides the same functionality as MetFragWeb. It combines the efficient in silico fragmenter with additional scoring functions to rank the retrieved candidates. These functions include mass spectral matching, retention time information from liquid chromatography and, if desired, reference information.



Usage

Once downloaded, the executable MetFrag jar can be run using the following command (where X.Y.Z should be replaced by the version number matching the downloaded jar file):

# java -jar MetFragCommandLine-X.Y.Z.jar [parameter file]

All input parameters for MetFrag CL are specified in the parameter file, which contains all necessary settings to process a given MS/MS peak list. An example parameter file for querying PubChem can be downloaded here, while the corresponding example MS/MS peak list can be downloaded here. Further details about the parameter options are given in the section "Defining Parameters" below. Note that it is also possible to use the MetFrag Web interface to generate parameter files by selecting all desired settings and pressing the "Download Parameters" button in the "Fragmentation Settings & Processing" section.
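If you process many peak lists, it can be convenient to generate parameter files from a script. The following Python sketch writes one `Key = value` line per parameter, joining list values with commas as in the parameter files shown on this page, and calls the jar with `java -jar`. The helper names `write_parameter_file` and `run_metfrag` are hypothetical and not part of MetFrag. The jar and file names are placeholders.

```python
import subprocess
from pathlib import Path
from typing import Optional


def write_parameter_file(params: dict, path: Optional[str] = None) -> str:
    """Format parameters as 'Key = value' lines; optionally save them to path."""
    lines = []
    for key, value in params.items():
        # Several score types or weights are given as comma-separated lists.
        if isinstance(value, (list, tuple)):
            value = ",".join(str(item) for item in value)
        lines.append(f"{key} = {value}")
    text = "\n".join(lines) + "\n"
    if path is not None:
        Path(path).write_text(text)
    return text


def run_metfrag(jar: str, parameter_file: str) -> None:
    """Run MetFrag CL on one parameter file; raises CalledProcessError on failure."""
    subprocess.run(["java", "-jar", jar, parameter_file], check=True)
```

For example, `write_parameter_file({"PeakListPath": "example_data.txt", "MetFragScoreWeights": [1.0, 1.0]}, "my_parameters.txt")` saves the two lines `PeakListPath = example_data.txt` and `MetFragScoreWeights = 1.0,1.0`. Afterwards, `run_metfrag("MetFrag2.4.5-CL.jar", "my_parameters.txt")` runs MetFrag CL on that file.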

Running the Example

The example parameter file for running MetFrag can be viewed using a text editor and looks like this. Lines starting with # are comments and are not used by MetFrag.

#
# data file containing mz intensity peak pairs (one per line)
#
PeakListPath = example_data.txt
#
# database parameters -> how to retrieve candidates
#
#
MetFragDatabaseType = PubChem
NeutralPrecursorMolecularFormula = C9H11Cl3NO3PS
NeutralPrecursorMass = 348.926284
#
# peak matching parameters
#
FragmentPeakMatchAbsoluteMassDeviation = 0.001
FragmentPeakMatchRelativeMassDeviation = 5
PrecursorIonMode = 1
IsPositiveIonMode = True
#
# scoring parameters
#
MetFragScoreTypes = FragmenterScore
MetFragScoreWeights = 1.0
#
# output
# SDF, XLS, CSV, ExtendedXLS, ExtendedFragmentsXLS
#
MetFragCandidateWriter = XLS
SampleName = example_1
ResultsPath = .
#
# the following parameters can be kept as they are
#
MaximumTreeDepth = 2
MetFragPreProcessingCandidateFilter = UnconnectedCompoundFilter
MetFragPostProcessingCandidateFilter = InChIKeyFilter
# NumberThreads = 1
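The NeutralPrecursorMass above is the monoisotopic mass of the formula given in NeutralPrecursorMolecularFormula. You can check such a value with the following Python sketch, which sums standard monoisotopic atomic masses. The element table only covers the elements of this example, and the simple parser does not handle brackets, charges or isotopes. Any other element raises a KeyError.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
# Only the elements of the example formula are listed; extend as needed.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
    "Cl": 34.96885268,
}


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass of a simple formula such as 'C9H11Cl3NO3PS'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC_MASS[element] * (int(count) if count else 1)
    return total


print(round(monoisotopic_mass("C9H11Cl3NO3PS"), 6))  # prints 348.926284
```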

This example can be run using the following command (for MetFragCL v2.4.5):

# java -jar MetFrag2.4.5-CL.jar example_parameter_file.txt

This will generate the following output:

INFO de.ipbhalle.metfraglib.database.OnlinePubChemDatabase - Fetching candidates from PubChem
INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - Got 8 candidate(s)
INFO de.ipbhalle.metfraglib.process.CombinedSingleCandidateMetFragProcess - 10 %
INFO de.ipbhalle.metfraglib.process.CombinedSingleCandidateMetFragProcess - 30 %
INFO de.ipbhalle.metfraglib.process.CombinedSingleCandidateMetFragProcess - 40 %
INFO de.ipbhalle.metfraglib.process.CombinedSingleCandidateMetFragProcess - 50 %
INFO de.ipbhalle.metfraglib.process.CombinedSingleCandidateMetFragProcess - 60 %
INFO de.ipbhalle.metfraglib.process.CombinedSingleCandidateMetFragProcess - 80 %
INFO de.ipbhalle.metfraglib.process.CombinedSingleCandidateMetFragProcess - 90 %
INFO de.ipbhalle.metfraglib.process.CombinedSingleCandidateMetFragProcess - 100 %
INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - 0 candidate(s) were discarded before processing due to pre-filtering
INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - 0 candidate(s) discarded during processing due to errors
INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - 1 candidate(s) discarded after processing due to post-filtering
INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - Stored 7 candidate(s)

First, MetFrag uses the defined database parameters to retrieve candidates. In this case the molecular formula C9H11Cl3NO3PS is used, which yields 8 matching candidates. Processing then starts, and progress is reported in percent. When processing has finished, MetFrag prints a short summary. The summary gives the number of candidates discarded by the defined pre- and post-processing filters and by errors that occurred during processing. Such errors can be caused by, for example, InChI parsing problems. Here, the post-processing filter (InChIKeyFilter) removed one candidate, leaving 7 stored candidates.
The result file(s) are written to the directory given by ResultsPath in the parameter file. The format(s) are set by MetFragCandidateWriter.
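When you run many parameter files, it can be useful to check these summary lines automatically. The following Python sketch extracts the counts from log text like the example output above. The patterns are copied from that output and may need adapting for other versions. The sketch also checks that every retrieved candidate was either discarded or stored, as in this example (8 = 0 + 0 + 1 + 7). This bookkeeping rule is an assumption based on the example, and the function names are hypothetical.

```python
import re

# Summary lines printed by CombinedMetFragProcess, as in the example output above.
SUMMARY_PATTERNS = {
    "retrieved": r"Got (\d+) candidate\(s\)",
    "pre_filtered": r"(\d+) candidate\(s\) were discarded before processing",
    "errors": r"(\d+) candidate\(s\) discarded during processing",
    "post_filtered": r"(\d+) candidate\(s\) discarded after processing",
    "stored": r"Stored (\d+) candidate\(s\)",
}


def summarize_log(log_text: str) -> dict:
    """Extract candidate counts; a count is None if its line is missing."""
    counts = {}
    for name, pattern in SUMMARY_PATTERNS.items():
        match = re.search(pattern, log_text)
        counts[name] = int(match.group(1)) if match else None
    return counts


def counts_add_up(counts: dict) -> bool:
    """Assumed bookkeeping: retrieved = pre-filtered + errors + post-filtered + stored."""
    if None in counts.values():
        return False
    discarded = counts["pre_filtered"] + counts["errors"] + counts["post_filtered"]
    return counts["retrieved"] == discarded + counts["stored"]
```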

Running Another Example - PubChemLite with Charged Mass

The following parameters (download file here) run MetFrag with PubChemLite coupled to MoNA, including all recommended scoring terms. They use the IonizedPrecursorMass setting for a nicotine spectrum extracted from MassBank (download formatted peak list here). The local files (PubChemLite and the MoNA MetFrag library) were saved in the same directory as the MetFrag jar file.

PeakListPath = EQ300804_Nicotine_peaks.txt
ResultsPath = .
IsPositiveIonMode = true
PrecursorIonMode = 1
IonizedPrecursorMass = 163.1229
DatabaseSearchRelativeMassDeviation = 5.0
FragmentPeakMatchRelativeMassDeviation = 5.0
FragmentPeakMatchAbsoluteMassDeviation = 0.001
SampleName = EQ300804_MetFragCL_PCL
MetFragCandidateWriter = XLS
OfflineSpectralDatabaseFile = MoNA-export-LC-MS-MS_Spectra-20241014-0.005.mb
MetFragDatabaseType = LocalCSV
LocalDatabasePath = PubChemLite_exposomics_20241227.csv
MetFragScoreTypes = FragmenterScore,OfflineIndividualMoNAScore,AnnoTypeCount,Patent_Count,PubMed_Count
MetFragScoreWeights = 1.0,1.0,1.0,1.0,1.0
MetFragPreProcessingCandidateFilter = UnconnectedCompoundFilter,IsotopeFilter
MaximumTreeDepth = 2
NumberThreads = 2
UseSmiles = true

This example can be run using the following command (for MetFragCL v2.4.5):

# java -jar MetFrag2.4.5-CL.jar MetFragCL_EQ300804_PCL_MpHp.txt

This will generate the following output:

INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - Got 110 candidate(s)
INFO de.ipbhalle.metfraglib.process.ProcessingStatus - 10 %
INFO de.ipbhalle.metfraglib.process.ProcessingStatus - 20 %
INFO de.ipbhalle.metfraglib.process.ProcessingStatus - 30 %
INFO de.ipbhalle.metfraglib.process.ProcessingStatus - 40 %
INFO de.ipbhalle.metfraglib.process.ProcessingStatus - 50 %
INFO de.ipbhalle.metfraglib.process.ProcessingStatus - 60 %
INFO de.ipbhalle.metfraglib.process.ProcessingStatus - 70 %
INFO de.ipbhalle.metfraglib.process.ProcessingStatus - 80 %
INFO de.ipbhalle.metfraglib.process.ProcessingStatus - 90 %
INFO de.ipbhalle.metfraglib.process.ProcessingStatus - 100 %
INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - 0 candidate(s) were discarded before processing due to pre-filtering
INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - 0 candidate(s) discarded during processing due to errors
INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - 0 candidate(s) discarded after processing due to post-filtering
INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - Stored 110 candidate(s)

First, MetFrag uses the ion settings (IsPositiveIonMode = true; PrecursorIonMode = 1; IonizedPrecursorMass = 163.1229) and the search tolerance (DatabaseSearchRelativeMassDeviation = 5.0). With these it retrieves candidates from PubChemLite via the LocalCSV option; here, the file is saved in the same folder as the MetFrag jar file. A total of 110 matching candidates were retrieved. Because the PubChemLite CSV file is quite large, this step can take some time. Processing then starts, and progress is reported in percent. When processing has finished, a short summary gives the number of candidates discarded by the defined pre- and post-processing filters and by errors that occurred during processing. Since PubChemLite was optimized for MetFrag and mass spectrometry data processing, these numbers are usually zero.
The result file(s) are written to the directory given by ResultsPath, in the format(s) set by MetFragCandidateWriter. The results are sorted by the "Score" column. The maximum score is 5, because 5 score terms are used, each with weight = 1. The results clearly show that nicotine is ranked first (Score = 4.57, MoNA match 0.999, i.e. a Level 2a identification), well ahead of the remaining candidates (Score = 1.89 or less). The XLS output contains all columns available in PubChemLite, not just the selected scoring terms, so the results can also be interpreted using the classification categories. See the PubChemLite website for more information about PubChemLite.

Defining Parameters

The following headings describe the main groups of parameters.

Peak List Path

This parameter defines the path to the peak list (MS/MS fragments). The peak list is a two- or three-column text file with m/z values in the first column and intensities in the second column. The optional third column allows files that contain both absolute and relative intensities to be read.
PeakListPath = example_data.txt
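A peak list in this format can be read with a few lines of Python, for example to check a file before running MetFrag. This sketch assumes whitespace-separated columns (spaces or tabs) and keeps only the m/z value and the intensity from the second column. It does not model how MetFrag itself uses a third column. The function name is hypothetical.

```python
def read_peak_list(text: str) -> list:
    """Parse 'm/z intensity [intensity]' lines into (mz, intensity) tuples."""
    peaks = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) not in (2, 3):
            raise ValueError(f"line {line_number}: expected 2 or 3 columns, got {len(fields)}")
        peaks.append((float(fields[0]), float(fields[1])))
    return peaks
```

Pass it the file contents, e.g. `read_peak_list(Path("example_data.txt").read_text())` after `from pathlib import Path`.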

Database Parameters - Retrieving Candidates

These parameters define the settings for candidate retrieval, which combine database and retrieval parameters. By default, neutral species are queried (i.e., by neutral exact mass or molecular formula). To query with charged masses (m/z values) instead, see the section "Adduct and Charged Mass Handling" below. If multiple candidate retrieval options are defined, PrecursorCompoundIDs overrides NeutralPrecursorMolecularFormula, which in turn overrides NeutralPrecursorMass.
# Database settings
MetFragDatabaseType = ...
# only needed for LocalSDF, LocalCSV or LocalPSV:
LocalDatabasePath = ...
# only needed for ChemSpiderRest:
ChemSpiderToken = ...

# Retrieval settings (at least one of these three groups must be defined)
# DatabaseSearchRelativeMassDeviation is given in ppm
NeutralPrecursorMass = ...
DatabaseSearchRelativeMassDeviation = ...
# AND/OR
NeutralPrecursorMolecularFormula = ...
# AND/OR
PrecursorCompoundIDs = ...
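The precedence rule for retrieval options described above can be written as a small lookup over the parameter names. This Python sketch only mirrors that documented order. It does not cover IonizedPrecursorMass, because its place in this order is not stated here. The function name is hypothetical.

```python
# Documented order: compound IDs override the formula, which overrides the mass.
RETRIEVAL_PRECEDENCE = (
    "PrecursorCompoundIDs",
    "NeutralPrecursorMolecularFormula",
    "NeutralPrecursorMass",
)


def effective_retrieval_option(params: dict) -> str:
    """Return the retrieval parameter that takes effect for these settings."""
    for key in RETRIEVAL_PRECEDENCE:
        if params.get(key) not in (None, ""):
            return key
    raise ValueError("at least one candidate retrieval option must be defined")
```

The first example parameter file sets both NeutralPrecursorMolecularFormula and NeutralPrecursorMass, so the formula is used for retrieval there.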

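Options for the database used to retrieve candidate molecules (MetFragDatabaseType) include online databases and local files. The online databases include PubChem, ExtendedPubChem, KEGG and ChemSpider (ChemSpiderRest). The local file types are LocalSDF, LocalCSV and LocalPSV.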

Using a database from a local file (LocalSDF, LocalCSV, LocalPSV) requires setting the path to the database file (LocalDatabasePath). The KEGG, PubChem and ChemSpider databases can be queried in three ways:
- by database-dependent compound IDs (PrecursorCompoundIDs)
- by molecular formula (NeutralPrecursorMolecularFormula)
- by neutral monoisotopic mass and relative mass deviation (NeutralPrecursorMass, DatabaseSearchRelativeMassDeviation)

This is an example query to retrieve candidates from PubChem via molecular formula:
MetFragDatabaseType = PubChem
NeutralPrecursorMolecularFormula = C9H11Cl3NO3PS

The following example query retrieves candidates from PubChemLite via exact mass. It uses the LocalCSV option, with the file downloaded into the same directory as the jar file:

MetFragDatabaseType = LocalCSV
LocalDatabasePath = PubChemLite_exposomics_20241025.csv
NeutralPrecursorMass = 253.966126
DatabaseSearchRelativeMassDeviation = 5
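DatabaseSearchRelativeMassDeviation is given in ppm, so the absolute search window grows with the mass. Here is a minimal Python sketch, assuming a symmetric window of plus or minus the given ppm around the query mass. MetFrag's internal calculation may differ in detail.

```python
def ppm_window(mass: float, ppm: float) -> tuple:
    """Lower and upper bound of a symmetric +/- ppm window around mass."""
    delta = mass * ppm * 1e-6
    return mass - delta, mass + delta


low, high = ppm_window(253.966126, 5)
print(f"{low:.6f} to {high:.6f}")  # prints 253.964856 to 253.967396
```

At this mass, 5 ppm corresponds to a half-width of about 0.00127 Da.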

Peak Matching Parameters (Fragmentation Settings)

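The peak matching parameters, or fragmentation settings, are defined with the following options. The absolute (in Da) and relative (in ppm) mass deviations are additive, i.e. the tolerance used to match a fragment peak is the sum of both. For the PrecursorIonMode options, see "Adduct and Charged Mass Handling" below.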
FragmentPeakMatchAbsoluteMassDeviation = 0.001
FragmentPeakMatchRelativeMassDeviation = 5
PrecursorIonMode = 1
IsPositiveIonMode = True
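Because the two deviations are additive, the matching window grows with m/z. The following Python sketch shows this for the values above. It assumes the relative part is computed from the observed peak m/z. The function names are hypothetical.

```python
def fragment_tolerance(mz: float, abs_dev: float = 0.001, rel_dev_ppm: float = 5.0) -> float:
    """Tolerance (Da) = absolute deviation + relative (ppm) deviation at this m/z."""
    return abs_dev + mz * rel_dev_ppm * 1e-6


def peak_matches(peak_mz: float, fragment_mz: float,
                 abs_dev: float = 0.001, rel_dev_ppm: float = 5.0) -> bool:
    """True if a fragment's m/z lies within the additive tolerance of a peak."""
    return abs(peak_mz - fragment_mz) <= fragment_tolerance(peak_mz, abs_dev, rel_dev_ppm)


for mz in (100.0, 300.0):
    print(mz, round(fragment_tolerance(mz), 6))  # 0.0015 Da at m/z 100, 0.0025 Da at m/z 300
```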

Output Parameters

The output is defined using the following three parameters:
- MetFragCandidateWriter sets the output format(s): one or more of SDF, XLS, CSV, ExtendedXLS and ExtendedFragmentsXLS. The latter two give additional outputs (including images) that are not possible in the CSV or SDF formats.
- SampleName defines the name of the results file.
- ResultsPath defines the directory the output is written to.
MetFragCandidateWriter = XLS
SampleName = example_1
ResultsPath = .

Additional Parameters

For advanced users, the following parameters offer additional options. These include different pre- or post-processing filters, more fragmentation steps (MaximumTreeDepth) or more threads (NumberThreads).
- The pre-processing option UnconnectedCompoundFilter eliminates salts and mixtures.
- The pre-processing option IsotopeFilter removes non-standard isotope forms (containing deuterium, 13C, 15N etc.) that would not be observed at the query mass/formula.
- The post-processing option InChIKeyFilter merges all candidates that share the same first InChIKey block (the structural skeleton) into one entry, keeping the results of the best-scoring candidate.
- UseSmiles defines whether the SMILES or the InChIs of the candidates are used for fragmentation. SMILES are recommended, since InChIs have some non-standard tautomer definitions that can adversely affect the fragmentation results.

For most use cases, these parameters should remain at the default settings given below:
MaximumTreeDepth = 2
MetFragPreProcessingCandidateFilter = UnconnectedCompoundFilter,IsotopeFilter
MetFragPostProcessingCandidateFilter = InChIKeyFilter
NumberThreads = 1
UseSmiles = true
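To illustrate what these filters do, here is a simplified Python sketch that works on SMILES strings and InChIKeys. It is an approximation for illustration only, not MetFrag's implementation, and all names are hypothetical. Unconnected compounds are detected by the "." that separates disconnected components in SMILES. Isotope labels are detected by a mass number inside a bracket atom (e.g. [2H], [13C]). The InChIKey step keeps the best-scoring candidate per first InChIKey block.

```python
import re


def is_unconnected(smiles: str) -> bool:
    # In SMILES, '.' separates disconnected components (e.g. a salt and its counterion).
    return "." in smiles


def has_isotope_label(smiles: str) -> bool:
    # Explicit isotopes are written as a mass number inside a bracket atom, e.g. [2H], [13C].
    return re.search(r"\[\d+[A-Za-z]", smiles) is not None


def collapse_by_first_block(candidates: list) -> list:
    """Keep the best-scoring candidate per first InChIKey block, sorted by score.

    candidates: dicts with at least 'InChIKey' and 'Score' keys.
    """
    best = {}
    for candidate in candidates:
        block = candidate["InChIKey"].split("-")[0]
        if block not in best or candidate["Score"] > best[block]["Score"]:
            best[block] = candidate
    return sorted(best.values(), key=lambda c: c["Score"], reverse=True)
```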

Advanced Database Scoring Options

For basic MetFrag use, the following score settings can be used. However, more advanced scoring settings will usually improve performance. These include spectral library matching and the other additional scoring terms described below.
MetFragScoreTypes = FragmenterScore
MetFragScoreWeights = 1.0

There are several different options for more advanced scoring schemes, depending on the database. Selecting ExtendedPubChem enables the inclusion of patent (PubChemNumberPatents) and reference/literature information (PubChemNumberPubMedReferences) for the retrieved candidates. This can be defined as follows:

MetFragDatabaseType = ExtendedPubChem
MetFragScoreTypes = FragmenterScore,PubChemNumberPatents,PubChemNumberPubMedReferences
MetFragScoreWeights = 1.0,1.0,1.0

For local file databases (LocalSDF, LocalCSV, LocalPSV), additional numerical scoring terms can be included using unique column headers (PSV, CSV) or tags (SDF). As an example, the recommended scoring terms for PubChemLite (DOI: 10.1186/s13321-021-00489-0) are as follows. They include the recommended spectral library matching option; see "Spectral Library Match Scores" below.

MetFragDatabaseType = LocalCSV
LocalDatabasePath = PubChemLite_exposomics_20241025.csv
MetFragScoreTypes = FragmenterScore,OfflineIndividualMoNAScore,AnnoTypeCount,Patent_Count,PubMed_Count
MetFragScoreWeights = 1.0,1.0,1.0,1.0,1.0

The weights can also be adjusted. For example, up to five reference scores can be retrieved from ChemSpider. Weighting each of them by 0.2 combines them into a reference score of at most 1, giving a total score of at most 2:

MetFragDatabaseType = ChemSpiderRest
ChemSpiderToken = ...
MetFragScoreTypes = FragmenterScore,ChemSpiderReferenceCount,ChemSpiderNumberExternalReferences,ChemSpiderRSCCount,ChemSpiderNumberPubMedReferences,ChemSpiderDataSourceCount
MetFragScoreWeights = 1.0,0.2,0.2,0.2,0.2,0.2
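The effect of the weights can be sketched as a weighted sum. This Python example assumes that each raw score term is non-negative and is scaled to [0, 1] by dividing by its maximum over all candidates. This matches the maximum totals stated on this page (5 for five terms of weight 1.0, 2 for the ChemSpider weights). It is not necessarily how MetFrag normalizes every term. The function name is hypothetical.

```python
import math


def combined_scores(raw_scores: list, weights: list) -> list:
    """Weighted sum per candidate; raw_scores holds one row of term values per candidate."""
    n_terms = len(weights)
    if any(len(row) != n_terms for row in raw_scores):
        raise ValueError("each candidate needs one value per score type")
    # Assumed normalization: divide each term by its maximum over all candidates.
    maxima = [max(row[i] for row in raw_scores) for i in range(n_terms)]
    totals = []
    for row in raw_scores:
        total = 0.0
        for value, weight, top in zip(row, weights, maxima):
            total += weight * (value / top if top > 0 else 0.0)
        totals.append(total)
    return totals


# ChemSpider example above: FragmenterScore plus five reference terms.
chemspider_weights = [1.0, 0.2, 0.2, 0.2, 0.2, 0.2]
print(math.fsum(chemspider_weights))  # maximum possible total, prints 2.0
```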

The parameter file tells MetFrag which information to include in the final scoring via the database, scoring term and associated weight. If in doubt, use the MetFrag Web interface to generate example parameter files by selecting all desired settings (it is also possible to adjust the weights) and pressing the "Download Parameters" button. For local databases, suitable additional scoring terms, if available, will appear automatically on the web interface in the "Candidate Filter & Score Settings" section (bottom right).

Statistical Scoring

MetFrag also includes scoring terms based on a statistical learning approach (Bayesian model). These scores can be used along with the FragmenterScore as follows:
MetFragScoreTypes = FragmenterScore,AutomatedPeakFingerprintAnnotationScore,AutomatedLossFingerprintAnnotationScore
MetFragScoreWeights = 1.0,1.0,1.0
This model has been included since MetFrag2.4.5-CL.jar. Examples to try include spectra from the CASMI2016 contest in positive and negative mode. More examples can be found on GitHub.

Spectral Library Match Scores

MetFrag offers two scores that take spectral library matches into account, using local files created from MassBank of North America (MoNA) download files. DOI: 10.5281/zenodo.13951786 redirects to the latest LC-MS .mb file for download. The conversion script used to create these .mb files is available here. Note that this is a slightly non-standard format, due to the fingerprint required for MetFusion. Zero, one or both spectral library terms can be used by including them in MetFragScoreTypes (shown here in combination with the FragmenterScore):
MetFragScoreTypes = FragmenterScore,OfflineIndividualMoNAScore,OfflineMetFusionScore
MetFragScoreWeights = 1.0,1.0,1.0

The spectral library can be defined as a single file or as a directory. If a directory is given, MetFrag reads all .mb files in that directory:

OfflineSpectralDatabaseFile = /path/to/MoNA-export-LC-MS.mb
# OR
OfflineSpectralDatabaseFile = /path/to/

The OfflineIndividualMoNAScore matches library spectra to the candidates via the InChIKey. If multiple spectra with the same InChIKey exist, it reports the best similarity. With sufficiently high match values (e.g., >0.9), this allows "Level 2a" annotations (library spectrum match, according to DOI: 10.1021/es5002105). The OfflineMetFusionScore uses the MetFusion scoring approach, which combines spectral similarity with structural similarity. It therefore returns spectral match values even if the library contains no spectrum of the candidate itself (see Gerlich et al., DOI: 10.1002/jms.3123).
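The per-InChIKey logic of the OfflineIndividualMoNAScore can be sketched in three steps. First, keep the best similarity per InChIKey over all library spectra. Then look up each candidate's key. Finally, flag values above a chosen threshold (0.9, as in the example above) as possible Level 2a annotations. This Python sketch rests on several assumptions. The similarity values are taken as given (the spectral similarity measure itself is not shown), and candidates without a library spectrum get 0.0. All names are hypothetical.

```python
def best_similarity_by_inchikey(library_hits) -> dict:
    """library_hits: iterable of (inchikey, similarity) pairs, one per library spectrum."""
    best = {}
    for inchikey, similarity in library_hits:
        if similarity > best.get(inchikey, float("-inf")):
            best[inchikey] = similarity
    return best


def individual_library_score(candidate_inchikey: str, best: dict) -> float:
    # Assumption: candidates without any library spectrum score 0.0.
    return best.get(candidate_inchikey, 0.0)


def possible_level_2a(score: float, threshold: float = 0.9) -> bool:
    """Flag sufficiently high spectral matches for closer review as Level 2a."""
    return score > threshold
```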

Adduct and Charged Mass Handling

The adduct type of the precursor (PrecursorIonMode) is used to calculate fragment masses. Select one of the following adduct types by setting PrecursorIonMode to the corresponding numerical code:

positive (IsPositiveIonMode = True)
1: [M+H]+
18: [M+NH4]+
23: [M+Na]+
39: [M+K]+
33: [M+CH3OH+H]+
42: [M+ACN+H]+
64: [M+ACN+Na]+
83: [M+2ACN+H]+

negative (IsPositiveIonMode = False)
-1: [M-H]-
35: [M+Cl]-
45: [M+HCOO]-
59: [M+CH3COO]-

no adduct (IsPositiveIonMode = True/False)
0: [M]+/-

The PrecursorIonMode and IsPositiveIonMode parameters can be combined with IonizedPrecursorMass (instead of NeutralPrecursorMass). This uses the charged mass (m/z) measured on the instrument for the database search (candidate retrieval), as follows. See "Running Another Example - PubChemLite with Charged Mass" above for a full example.

IsPositiveIonMode = true
PrecursorIonMode = 1
IonizedPrecursorMass = 163.1229
DatabaseSearchRelativeMassDeviation = 5.0
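With IonizedPrecursorMass, the neutral mass M is obtained by subtracting the adduct from the measured m/z. The following Python sketch does this for the singly charged PrecursorIonMode codes listed above; code 0 is left out. The mass shifts are approximate values taken from commonly used adduct tables. They are assumed here, and MetFrag's internal constants may differ slightly. The function name is hypothetical.

```python
# Approximate mass shifts (Da) added to the neutral monoisotopic mass M, keyed by
# PrecursorIonMode code; assumed values from common adduct tables.
ADDUCT_SHIFT = {
    1: 1.007276,      # [M+H]+
    18: 18.033823,    # [M+NH4]+
    23: 22.989218,    # [M+Na]+
    39: 38.963158,    # [M+K]+
    33: 33.033489,    # [M+CH3OH+H]+
    42: 42.033823,    # [M+ACN+H]+
    64: 64.015765,    # [M+ACN+Na]+
    83: 83.060370,    # [M+2ACN+H]+
    -1: -1.007276,    # [M-H]-
    35: 34.969402,    # [M+Cl]-
    45: 44.998201,    # [M+HCOO]-
    59: 59.013851,    # [M+CH3COO]-
}


def neutral_mass(ionized_mass: float, precursor_ion_mode: int) -> float:
    """Neutral monoisotopic mass M from a singly charged precursor m/z."""
    if precursor_ion_mode not in ADDUCT_SHIFT:
        raise ValueError(f"no mass shift defined for PrecursorIonMode {precursor_ion_mode}")
    return ionized_mass - ADDUCT_SHIFT[precursor_ion_mode]


# Nicotine example: [M+H]+ at m/z 163.1229
print(round(neutral_mass(163.1229, 1), 6))  # prints 162.115624
```

The result lies within 0.5 ppm of the monoisotopic mass of nicotine (C10H14N2, 162.115698), well inside the 5 ppm search window.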

Further Help

As mentioned above, it is possible to use the MetFrag Web interface to generate parameter files by selecting all desired settings and pressing the "Download Parameters" button. Please post a GitHub issue if any parameters require further explanation. If you are having issues with the settings, please check the MetFrag log file or inline output (which usually provide informative but rather verbose error messages) and previous issue postings before posting a GitHub issue. Please include as many details as possible, such as parameter settings, log messages, version number and operating system. Thank you!