MetFragR

MetFrag is available as an R package.


The R package makes the functionality of the MetFrag command-line tool available within the R programming language.

Install

Local Install

First, check out the MetFragR GitHub repository and build the package (on the command line):

git clone https://github.com/ipb-halle/MetFragR.git
cd MetFragR
R CMD check metfRag
R CMD build metfRag

After a successful build, switch to R and install the package:

install.packages("metfRag",repos=NULL,type="source")
library(metfRag)

Net Install (over GitHub)

The easiest way to install MetFragR is to use the GitHub link:

library(devtools)
install_github("c-ruttkies/MetFragR/metfRag")
library(metfRag)

Example

The following example shows how to run a simple MetFrag query in R.

#
# first define the settings object
#
settingsObject<-list()
#
# set database parameters to select candidates
#
settingsObject[["DatabaseSearchRelativeMassDeviation"]]<-5.0
settingsObject[["FragmentPeakMatchAbsoluteMassDeviation"]]<-0.001
settingsObject[["FragmentPeakMatchRelativeMassDeviation"]]<-5.0
settingsObject[["MetFragDatabaseType"]]<-"PubChem"
#
# the more information about the precursor is available,
# the more precise the candidate selection is
#
settingsObject[["NeutralPrecursorMass"]]<-253.966126
settingsObject[["NeutralPrecursorMolecularFormula"]]<-"C7H5Cl2FN2O3"
settingsObject[["PrecursorCompoundIDs"]]<-c("50465", "57010914", "56974741", "88419651", "23354334")
#
# pre- and post-processing filters
#
# define filters to filter unconnected compounds (e.g. salts)
settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("UnconnectedCompoundFilter","IsotopeFilter")
settingsObject[["MetFragPostProcessingCandidateFilter"]]<-c("InChIKeyFilter")
#
# define the peaklist as 2-dimensional matrix
#
settingsObject[["PeakList"]]<-matrix(c(
90.97445, 681,
106.94476, 274,
110.02750, 110,
115.98965, 95,
117.98540, 384,
124.93547, 613,
124.99015, 146,
125.99793, 207,
133.95592, 777,
143.98846, 478,
144.99625, 352,
146.00410, 999,
151.94641, 962,
160.96668, 387,
163.00682, 782,
172.99055, 17,
178.95724, 678,
178.97725, 391,
180.97293, 999,
196.96778, 720,
208.96780, 999,
236.96245, 999,
254.97312, 999), ncol=2, byrow=TRUE)
#
# run MetFrag
#
scored.candidates<-run.metfrag(settingsObject)
#
# scored.candidates is a data.frame with scores and candidate properties
#
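
The two inputs that most often need preparing for a query like the one above are the mass window and the peak list. The sketch below uses base R only and does not call metfRag. It assumes that the relative mass deviation settings are given in ppm, which the text above does not state explicitly, and `ppm.window` is a hypothetical helper name introduced here for illustration.

```r
# Assumption: relative mass deviations such as
# DatabaseSearchRelativeMassDeviation are interpreted as ppm.
# ppm.window is a hypothetical helper, not part of metfRag.
ppm.window <- function(mass, ppm) {
  delta <- mass * ppm / 1e6
  c(lower = mass - delta, upper = mass + delta)
}

# mass window searched for the neutral precursor mass at 5 ppm
window <- ppm.window(253.966126, 5.0)

# build the 2-column peak list (m/z, intensity) from two separate vectors
mz        <- c(90.97445, 106.94476, 110.02750)
intensity <- c(681, 274, 110)
peak.list <- unname(cbind(mz, intensity))

# peak.list has the same layout as the matrix passed as "PeakList" above
```

Building the matrix with `cbind` avoids interleaving m/z and intensity values by hand, which is easy to get wrong for long peak lists.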

Candidate Filters

Pre-processing Candidate Filters

Filters can be defined to filter candidates prior to fragmentation. The following filters are available:

UnconnectedCompoundFilter - filter non-connected compounds (e.g. salts)
IsotopeFilter - filter compounds containing non-standard isotopes
MinimumElementsFilter - filter by minimum of contained elements
MaximumElementsFilter - filter by maximum of contained elements
SmartsSubstructureInclusionFilter - filter by presence of defined sub-structures
SmartsSubstructureExclusionFilter - filter by absence of defined sub-structures
ElementInclusionFilter - filter by presence of defined elements (other elements are allowed)
ElementInclusionExclusiveFilter - filter by presence of defined elements (no other elements are allowed)
ElementExclusionFilter - filter by absence of defined elements


Some pre-processing filters require an additional parameter to be defined:

MinimumElementsFilter - FilterMinimumElements
MaximumElementsFilter - FilterMaximumElements
SmartsSubstructureInclusionFilter - FilterSmartsInclusionList
SmartsSubstructureExclusionFilter - FilterSmartsExclusionList
ElementInclusionFilter - FilterIncludedElements
ElementInclusionExclusiveFilter - FilterIncludedElements
ElementExclusionFilter - FilterExcludedElements
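
Because a filter and its parameter are set in two separate entries of the settings list, it is easy to forget one of them. The following base R sketch checks a settings list against the mapping listed above. `missing.filter.params` is a hypothetical helper written for this page; it is not part of metfRag.

```r
# mapping taken from the list above: filter name -> required parameter
required.filter.params <- c(
  MinimumElementsFilter             = "FilterMinimumElements",
  MaximumElementsFilter             = "FilterMaximumElements",
  SmartsSubstructureInclusionFilter = "FilterSmartsInclusionList",
  SmartsSubstructureExclusionFilter = "FilterSmartsExclusionList",
  ElementInclusionFilter            = "FilterIncludedElements",
  ElementInclusionExclusiveFilter   = "FilterIncludedElements",
  ElementExclusionFilter            = "FilterExcludedElements"
)

# hypothetical helper (not part of metfRag):
# returns the names of required parameters that are not yet set
missing.filter.params <- function(settings) {
  filters <- settings[["MetFragPreProcessingCandidateFilter"]]
  known   <- intersect(filters, names(required.filter.params))
  needed  <- unique(required.filter.params[known])
  setdiff(needed, names(settings))
}

# usage: a filter is selected but its parameter is still missing
checkSettings <- list()
checkSettings[["MetFragPreProcessingCandidateFilter"]] <- c("UnconnectedCompoundFilter", "MinimumElementsFilter")
missing.filter.params(checkSettings)
```

Filters without an entry in the mapping, such as UnconnectedCompoundFilter, need no extra parameter and are ignored by the check.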


Examples:

#
# MinimumElementsFilter
#
settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("MinimumElementsFilter")
# include compounds with at least 2 nitrogens and 3 oxygens
settingsObject[["FilterMinimumElements"]]<-"N2O3"
#
# MaximumElementsFilter
#
settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("MaximumElementsFilter")
# keep compounds with at most 5 nitrogens and 7 oxygens
settingsObject[["FilterMaximumElements"]]<-"N5O7"
#
# SmartsSubstructureInclusionFilter
#
settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("SmartsSubstructureInclusionFilter")
# include compounds containing benzene
settingsObject[["FilterSmartsInclusionList"]]<-c("c1ccccc1")
#
# SmartsSubstructureExclusionFilter
#
settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("SmartsSubstructureExclusionFilter")
# filter out compounds containing hydroxyl groups
settingsObject[["FilterSmartsExclusionList"]]<-c("[OX2H]")
#
# ElementInclusionFilter
#
settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("ElementInclusionFilter")
# include compounds containing nitrogen and oxygen
settingsObject[["FilterIncludedElements"]]<-c("N","O")
#
# ElementExclusionFilter
#
settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("ElementExclusionFilter")
# filter out compounds including bromine or chlorine
settingsObject[["FilterExcludedElements"]]<-c("Cl","Br")

Defining multiple filters at once is possible:

settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("MinimumElementsFilter", "MaximumElementsFilter")
settingsObject[["FilterMinimumElements"]]<-"N2O3"
settingsObject[["FilterMaximumElements"]]<-"N5O7"
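
To make the meaning of the element count strings concrete, the sketch below re-implements the minimum/maximum check in base R. It reflects one reading of the filter descriptions above (elements not named in the string are unrestricted) and is not MetFrag's own code. All three function names are hypothetical.

```r
# hypothetical helper: parse a formula such as "N2O3" into named counts
# (assumes every element occurs only once in the string)
parse.formula <- function(formula) {
  tokens   <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
  elements <- sub("[0-9]+$", "", tokens)
  counts   <- sub("^[A-Za-z]+", "", tokens)
  counts[counts == ""] <- "1"          # no number means one atom
  setNames(as.integer(counts), elements)
}

# look up how often each limited element occurs in the candidate (0 if absent)
element.counts <- function(candidate, limits) {
  have <- parse.formula(candidate)[names(limits)]
  have[is.na(have)] <- 0L
  have
}

# TRUE if the candidate contains at least the given element counts
has.at.least <- function(candidate, minimum) {
  limits <- parse.formula(minimum)
  all(element.counts(candidate, limits) >= limits)
}

# TRUE if the candidate contains at most the given element counts
has.at.most <- function(candidate, maximum) {
  limits <- parse.formula(maximum)
  all(element.counts(candidate, limits) <= limits)
}

# the precursor formula from the example passes both limits
has.at.least("C7H5Cl2FN2O3", "N2O3") && has.at.most("C7H5Cl2FN2O3", "N5O7")
```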

Post-processing Candidate Filters

Filters can be defined to filter candidates after fragmentation and scoring. The following filter is available:

InChIKeyFilter - filter stereoisomers by comparing the first part of the candidates' InChIKeys; only the best-scored candidate of each group remains in the result list
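
The effect of this filter can be pictured with a few lines of base R. The sketch below is an illustration of the idea, not MetFrag's implementation. The candidate identifiers and scores are made up, and the column names `Identifier`, `InChIKey` and `Score` are assumptions about the result layout. The first block of an InChIKey is its first 14 characters and encodes the connectivity, so stereoisomers share it.

```r
# made-up candidates: L-alanine, D-alanine (stereoisomers) and glycine
candidates <- data.frame(
  Identifier = c("cand1", "cand2", "cand3"),
  InChIKey   = c("QNAYBMKLOCPYGJ-REOHZSCVSA-N",
                 "QNAYBMKLOCPYGJ-UWTATZPHSA-N",
                 "DHMQDGOQFOQNFH-UHFFFAOYSA-N"),
  Score      = c(0.8, 0.9, 0.5),
  stringsAsFactors = FALSE
)

# sort by score (best first), then keep the first candidate per first block
ordered     <- candidates[order(-candidates$Score), ]
first.block <- substr(ordered$InChIKey, 1, 14)
filtered    <- ordered[!duplicated(first.block), ]

# filtered keeps cand2 (the better-scored alanine) and cand3
```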


Candidate Scores

MetFrag can combine several scores into a final score, which is used to rank the candidates in the candidate list. Besides the pre-defined scores, database-dependent scores can be used. For every score that is defined, a weight must be given; the weights are used to calculate the final score of each candidate.
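
As a rough picture of how weights turn several scores into one ranking, the sketch below computes a weighted sum in base R. It assumes that each score is first normalised to its maximum over all candidates and that the final score is the weighted sum of the normalised values; the text above does not spell out the exact formula, so treat this as an assumption. The score values are made up.

```r
# made-up scores for three candidates
scores <- data.frame(
  FragmenterScore    = c(100, 80, 40),
  RetentionTimeScore = c(0.2, 1.0, 0.5)
)
weights <- c(FragmenterScore = 1.0, RetentionTimeScore = 0.5)

# assumption: every score column is divided by its maximum
normalised <- sweep(as.matrix(scores), 2, apply(scores, 2, max), "/")

# weighted sum per candidate and resulting rank order (best first)
final.score <- as.vector(normalised %*% weights[colnames(normalised)])
ranking     <- order(final.score, decreasing = TRUE)
```

With these numbers the second candidate ranks first: its lower FragmenterScore is outweighed by the better retention time score, which shows how the weights shift the ranking.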

Pre-defined Candidate Scores

FragmenterScore - uses intensities, m/z values and bond energies of fragment-peak matches
SmartsSubstructureInclusionScore - scores candidates by presence of defined substructures
SmartsSubstructureExclusionScore - scores candidates by absence of defined substructures
SuspectListScore - scores candidates by presence in a defined suspect list
RetentionTimeScore - scores candidates using retention time information
OfflineMetFusionScore - uses a predefined spectral library to calculate a MetFusion-like similarity score


Some additional scores require further parameters to be defined:

SmartsSubstructureInclusionScore - ScoreSmartsInclusionList
SmartsSubstructureExclusionScore - ScoreSmartsExclusionList
SuspectListScore - ScoreSuspectLists
- path of a file containing the InChIKeys of the suspect list, one per line
RetentionTimeScore - RetentionTimeTrainingFile, ExperimentalRetentionTimeValue
- RetentionTimeTrainingFile is the path of a file containing retention times and InChIs for logP calculation, one entry per line
- example file can be found here
- mandatory columns are: RetentionTime and either InChI or UserLogP
- InChI is used to calculate logP values with the models included in CDK
- UserLogP (if defined) is used instead of InChI, but then has to be available in the candidate list as well
- RetentionTime values and ExperimentalRetentionTimeValue need to be acquired on the same system
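
A suspect list file in the format described above (one InChIKey per line) can be written directly from R. The sketch below uses a temporary file and three well-known compounds (water, benzene, caffeine) purely as placeholders. The weights are arbitrary example values.

```r
# write a suspect list with one InChIKey per line
suspect.file <- tempfile(fileext = ".txt")
writeLines(c(
  "XLYOFNOQVPJJNP-UHFFFAOYSA-N",   # water
  "UHOVQNZJYSORNB-UHFFFAOYSA-N",   # benzene
  "RYYVLZVUVIJVGH-UHFFFAOYSA-N"    # caffeine
), suspect.file)

# reuse an existing settings object if there is one
if (!exists("settingsObject")) settingsObject <- list()

# add the suspect list score next to the FragmenterScore
settingsObject[["MetFragScoreTypes"]]   <- c("FragmenterScore", "SuspectListScore")
settingsObject[["ScoreSuspectLists"]]   <- suspect.file
settingsObject[["MetFragScoreWeights"]] <- c(1.0, 1.0)
```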


Database Dependent Scores

Depending on the database used, different scores are available. When using local file databases, any score defined as a candidate property can be used as a scoring term.

The following scoring terms are pre-defined for the available databases:

ExtendedPubChem - PubChemNumberPatents, PubChemNumberPubMedReferences
ChemSpider - ChemSpiderReferenceCount, ChemSpiderNumberExternalReferences, ChemSpiderRSCCount, ChemSpiderNumberPubMedReferences, ChemSpiderDataSourceCount


Examples

Defining FragmenterScore, SmartsSubstructureInclusionScore and RetentionTimeScore together with necessary parameters.

settingsObject[["MetFragScoreTypes"]]<-c("FragmenterScore","SmartsSubstructureInclusionScore","RetentionTimeScore")
settingsObject[["ScoreSmartsInclusionList"]]<-c("[OX2H]","c1ccccc1")
settingsObject[["RetentionTimeTrainingFile"]]<-"C:/Documents/RetentionTimeFile.csv"
settingsObject[["ExperimentalRetentionTimeValue"]]<-9.4
settingsObject[["MetFragScoreWeights"]]<-c(1.0,0.5,0.5)
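
Since every score needs a weight, the two vectors must have the same length and the same order. A small check before running the query can catch a mismatch early. `check.score.weights` below is a hypothetical helper in base R, not a metfRag function.

```r
# hypothetical helper (not part of metfRag):
# stops if score types and weights do not line up,
# otherwise returns the weights named by their score type
check.score.weights <- function(settings) {
  types   <- settings[["MetFragScoreTypes"]]
  weights <- settings[["MetFragScoreWeights"]]
  if (length(types) != length(weights)) {
    stop(sprintf("%d score types but %d weights defined",
                 length(types), length(weights)))
  }
  setNames(as.numeric(weights), types)
}

# usage with the score definition from the example above
scoreSettings <- list()
scoreSettings[["MetFragScoreTypes"]]   <- c("FragmenterScore", "SmartsSubstructureInclusionScore", "RetentionTimeScore")
scoreSettings[["MetFragScoreWeights"]] <- c(1.0, 0.5, 0.5)
named.weights <- check.score.weights(scoreSettings)
```

Printing `named.weights` shows at a glance which weight belongs to which score.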

Defining FragmenterScore and database-dependent scoring terms for the ExtendedPubChem database.

settingsObject[["MetFragDatabaseType"]]<-c("ExtendedPubChem")
settingsObject[["MetFragScoreTypes"]]<-c("FragmenterScore","PubChemNumberPatents","PubChemNumberPubMedReferences")
settingsObject[["MetFragScoreWeights"]]<-c(1.0,0.5,0.5)