MetFrag CL

MetFrag is available as a command-line tool (MetFrag CL). It combines the efficient fragmenter with functionality for including additional information when scoring the retrieved candidates, such as retention time information from liquid chromatography and reference information.

Download MetFrag CL | MetFrag CL on GitHub


Usage

After downloading the executable jar, MetFrag can be run with:

# java -jar MetFragCommandLine-VERSION-jar-with-dependencies.jar [parameter file]
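
For scripted or batch runs, the command above can be assembled programmatically. The following is a minimal sketch, not part of MetFrag itself: the helper name build_metfrag_command is hypothetical, and the jar and parameter file names are the ones used in the example run on this page. It only builds and prints the command; actually running it assumes Java is installed and the jar is in the working directory.

```python
import shlex


def build_metfrag_command(jar_path: str, parameter_file: str) -> list[str]:
    """Return the argument list for 'java -jar <jar> <parameter file>'."""
    return ["java", "-jar", jar_path, parameter_file]


cmd = build_metfrag_command("MetFrag2.4.5-CL.jar", "example_parameter_file.txt")
print(shlex.join(cmd))

# To execute the command (requires Java and the jar file):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.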

Parameter file

MetFrag CL needs a parameter file with a specific layout as input. The parameter file contains all the information required to process a given MS/MS peak list. An example parameter file can be downloaded here; the corresponding example data file containing the m/z peak list is also needed.

To view the contents of the example file, open it with a text editor. Lines starting with # are comments and are ignored by MetFrag.

#
# data file containing mz intensity peak pairs (one per line)
#
PeakListPath = example_data.txt
#
# database parameters -> how to retrieve candidates
#
#
MetFragDatabaseType = PubChem
NeutralPrecursorMolecularFormula = C9H11Cl3NO3PS
NeutralPrecursorMass = 348.926284
#
# peak matching parameters
#
FragmentPeakMatchAbsoluteMassDeviation = 0.001
FragmentPeakMatchRelativeMassDeviation = 5
PrecursorIonMode = 1
IsPositiveIonMode = True
#
# scoring parameters
#
MetFragScoreTypes = FragmenterScore
MetFragScoreWeights = 1.0
#
# output
# SDF, XLS, CSV, ExtendedXLS, ExtendedFragmentsXLS
#
MetFragCandidateWriter = XLS
SampleName = example_1
ResultsPath = .
#
# following parameters can be kept as they are
#
MaximumTreeDepth = 2
MetFragPreProcessingCandidateFilter = UnconnectedCompoundFilter
MetFragPostProcessingCandidateFilter = InChIKeyFilter
# NumberThreads = 1
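
The layout shown above (one "Name = Value" pair per line, with # starting a comment) is simple enough to check with a few lines of code before starting a run. The sketch below is an illustration only and is not MetFrag's own parser; the function name parse_parameter_text is hypothetical, and splitting on the first "=" is an assumption about the format.

```python
def parse_parameter_text(text: str) -> dict[str, str]:
    """Parse 'Name = Value' lines, skipping blank lines and '#' comments."""
    params: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # comment or empty line
        key, sep, value = line.partition("=")  # assumption: split on the first '='
        if not sep:
            raise ValueError(f"not a 'Name = Value' line: {raw_line!r}")
        params[key.strip()] = value.strip()
    return params


# A shortened excerpt of the example parameter file shown above.
EXAMPLE = """\
#
# data file containing mz intensity peak pairs (one per line)
#
PeakListPath = example_data.txt
MetFragDatabaseType = PubChem
NeutralPrecursorMass = 348.926284
# NumberThreads = 1
"""

params = parse_parameter_text(EXAMPLE)
for name, value in params.items():
    print(f"{name} -> {value}")
```

The commented-out NumberThreads line is skipped, just as MetFrag ignores comment lines.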

A first example run can be started with the following command:

# java -jar MetFrag2.4.5-CL.jar example_parameter_file.txt

You will get the following output:

INFO de.ipbhalle.metfraglib.database.OnlinePubChemDatabase - Fetching candidates from PubChem
INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - Got 8 candidate(s)
INFO de.ipbhalle.metfraglib.process.CombinedSingleCandidateMetFragProcess - 10 %
INFO de.ipbhalle.metfraglib.process.CombinedSingleCandidateMetFragProcess - 30 %
INFO de.ipbhalle.metfraglib.process.CombinedSingleCandidateMetFragProcess - 40 %
INFO de.ipbhalle.metfraglib.process.CombinedSingleCandidateMetFragProcess - 50 %
INFO de.ipbhalle.metfraglib.process.CombinedSingleCandidateMetFragProcess - 60 %
INFO de.ipbhalle.metfraglib.process.CombinedSingleCandidateMetFragProcess - 80 %
INFO de.ipbhalle.metfraglib.process.CombinedSingleCandidateMetFragProcess - 90 %
INFO de.ipbhalle.metfraglib.process.CombinedSingleCandidateMetFragProcess - 100 %
INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - 0 candidate(s) were discarded before processing due to pre-filtering
INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - 0 candidate(s) discarded during processing due to errors
INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - 1 candidate(s) discarded after processing due to post-filtering
INFO de.ipbhalle.metfraglib.process.CombinedMetFragProcess - Stored 7 candidate(s)

First, MetFrag uses the defined database parameters to retrieve candidates. In this case the molecular formula (C9H11Cl3NO3PS) is used, resulting in 8 matching candidates. Then the processing starts and the progress is reported in percent. After the processing has finished, MetFrag gives a short summary of the number of candidates discarded by the defined pre- and post-processing filters and by errors that occurred during processing. The latter can be caused, for example, by InChI parsing errors.
The result file(s) are stored in the result directory given in the parameter file (ResultsPath). The format of the result file is set by the parameter MetFragCandidateWriter.

Databases

Different databases can be queried for candidate molecules. The database is selected with the parameter MetFragDatabaseType.

If you use a local file database (LocalSDF, LocalCSV, LocalPSV), you have to provide the path to the database file (LocalDatabasePath). The KEGG, PubChem and ChemSpider databases are queried by database-dependent compound IDs (PrecursorCompoundIDs), by molecular formula (NeutralPrecursorMolecularFormula), or by neutral monoisotopic mass (NeutralPrecursorMass) together with a relative mass deviation (DatabaseSearchRelativeMassDeviation). If more than one of these is defined, they are used in the order given here.

In addition to PubChem, an extended PubChem database is available that fetches patent (PubChemNumberPatents) and reference (PubChemNumberPubMedReferences) information for the retrieved candidates. These values can be used as additional scoring terms, just like the additional information that comes with a ChemSpider database query: the number of references (ChemSpiderReferenceCount), external references (ChemSpiderNumberExternalReferences), citations in Royal Society of Chemistry journals (ChemSpiderRSCCount), references in PubMed (ChemSpiderNumberPubMedReferences) and data sources (ChemSpiderDataSourceCount).

To tell MetFrag which information to include in the final scoring, adapt the parameter file: set the proper database (ExtendedPubChem), add the additional scoring term (PubChemNumberPatents), and add a weight that defines the influence of that term on the final score.
MetFragDatabaseType = ExtendedPubChem
MetFragScoreTypes = FragmenterScore,PubChemNumberPatents
MetFragScoreWeights = 1.0,0.2
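
Because MetFragScoreTypes and MetFragScoreWeights are two parallel comma-separated lists, it is easy to end up with a different number of entries in each. The sketch below pairs the two lists and checks their lengths; it then shows a weighted sum to illustrate what "influence on the final scoring" means. Both helper names are hypothetical, the candidate values are made up, and how MetFrag normalizes the individual scores before combining them is not covered by this page, so the second function is an illustration under that assumption rather than MetFrag's actual computation.

```python
def pair_scores_with_weights(score_types: str, score_weights: str) -> dict[str, float]:
    """Pair each entry of MetFragScoreTypes with its MetFragScoreWeights entry."""
    names = [name.strip() for name in score_types.split(",") if name.strip()]
    weights = [float(w) for w in score_weights.split(",") if w.strip()]
    if len(names) != len(weights):
        raise ValueError(
            f"{len(names)} score type(s) but {len(weights)} weight(s) given"
        )
    return dict(zip(names, weights))


def weighted_sum(normalized_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Illustrative only: weighted sum of scores assumed to be scaled to [0, 1]."""
    return sum(weights[name] * normalized_scores[name] for name in weights)


weights = pair_scores_with_weights("FragmenterScore,PubChemNumberPatents", "1.0,0.2")
print(weights)

# Made-up, already normalized values for a single hypothetical candidate.
candidate = {"FragmenterScore": 0.8, "PubChemNumberPatents": 0.5}
print(weighted_sum(candidate, weights))
```

With a weight of 0.2, the patent count can shift the ranking between candidates with similar fragmenter scores without dominating it.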

Statistical Scoring (new)

MetFrag now includes new scoring terms based on a statistical learning approach, in which annotations of fragment structures and m/z peaks are learned by a Bayesian model. The new scores can be used along with the FragmenterScore:
MetFragScoreTypes = FragmenterScore,AutomatedPeakFingerprintAnnotationScore,AutomatedLossFingerprintAnnotationScore
You can find examples from the CASMI 2016 contest for positive and negative mode. Starting with MetFrag2.4.5-CL.jar, the tool includes a trained model that can be used directly with the provided parameter files. The new scoring terms have been shown to improve MetFrag's annotation results. More examples can be found on GitHub.

Spectral library MetFusion scores

OfflineSpectralDatabaseFile
MetFrag implements two kinds of scores that take a spectral library into account. You can specify a single file:

OfflineSpectralDatabaseFile = /path/to/MoNA-export-LC-MS.mb
or specify a directory, and MetFrag will read all contained .mb files from that directory:

OfflineSpectralDatabaseFile = /path/to/
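
To preview which files a directory setting would pick up, the file-or-directory behaviour described above can be mimicked as follows. This is a sketch under assumptions: the helper name resolve_mb_files is hypothetical, and it assumes that only files directly inside the directory with the exact suffix .mb are read. Whether MetFrag also descends into subdirectories or matches the suffix case-insensitively is not stated here.

```python
import tempfile
from pathlib import Path


def resolve_mb_files(path_value: str) -> list[Path]:
    """Return the .mb files implied by an OfflineSpectralDatabaseFile value."""
    path = Path(path_value)
    if path.is_dir():
        # Assumption: non-recursive, exact '.mb' suffix.
        return sorted(p for p in path.iterdir() if p.is_file() and p.suffix == ".mb")
    return [path]  # a single file was given


# Demonstration with a temporary directory holding two .mb files and one other file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.mb", "b.mb", "notes.txt"):
        (Path(tmp) / name).write_text("")
    found = [p.name for p in resolve_mb_files(tmp)]

print(found)
```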

Further Parameters

PrecursorIonMode
The adduct type of the precursor is used to calculate fragment masses. An adduct type is selected by its numerical value, as listed below:

positive (IsPositiveIonMode = True)
1: [M+H]+
18: [M+NH4]+
23: [M+Na]+
39: [M+K]+
33: [M+CH3OH+H]+
42: [M+ACN+H]+
64: [M+ACN+Na]+
83: [M+2ACN+H]+

negative (IsPositiveIonMode = False)
-1: [M-H]-
35: [M+Cl]-
45: [M+HCOO]-
59: [M+CH3COO]-

no adduct (IsPositiveIonMode = True/False)
0: [M]+/-
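
Since PrecursorIonMode and IsPositiveIonMode must agree, a small lookup can catch mismatched settings (for example a negative-mode adduct code combined with IsPositiveIonMode = True) before a run. The tables below copy the values listed above; the function name adduct_label is hypothetical, and the check is a convenience sketch rather than something MetFrag itself is documented to perform.

```python
POSITIVE_ADDUCTS = {
    1: "[M+H]+",
    18: "[M+NH4]+",
    23: "[M+Na]+",
    39: "[M+K]+",
    33: "[M+CH3OH+H]+",
    42: "[M+ACN+H]+",
    64: "[M+ACN+Na]+",
    83: "[M+2ACN+H]+",
}

NEGATIVE_ADDUCTS = {
    -1: "[M-H]-",
    35: "[M+Cl]-",
    45: "[M+HCOO]-",
    59: "[M+CH3COO]-",
}


def adduct_label(precursor_ion_mode: int, is_positive_ion_mode: bool) -> str:
    """Return the adduct for a PrecursorIonMode / IsPositiveIonMode pair."""
    if precursor_ion_mode == 0:
        # No adduct: valid in either polarity.
        return "[M]+" if is_positive_ion_mode else "[M]-"
    table = POSITIVE_ADDUCTS if is_positive_ion_mode else NEGATIVE_ADDUCTS
    if precursor_ion_mode not in table:
        polarity = "positive" if is_positive_ion_mode else "negative"
        raise ValueError(
            f"PrecursorIonMode {precursor_ion_mode} is not listed for {polarity} mode"
        )
    return table[precursor_ion_mode]


print(adduct_label(1, True))
print(adduct_label(-1, False))
```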