MetFrag is available as an R package.
The R package makes the functionality of the MetFrag command-line tool available within the R programming language.
First, check out the MetFragR GitHub repository, then check and build the package on the command line:
git clone https://github.com/ipb-halle/MetFragR.git
cd MetFragR
R CMD check metfRag
R CMD build metfRag
After the successful build, switch to R and install the package:
install.packages("metfRag",repos=NULL,type="source")
library(metfRag)
Alternatively, the easiest way to install MetFragR is directly from GitHub:
library(devtools)
install_github("c-ruttkies/MetFragR/metfRag")
library(metfRag)
The following example shows how to run a simple MetFrag query in R.
#
# first define the settings object
#
settingsObject<-list()
#
# set database parameters to select candidates
#
settingsObject[["DatabaseSearchRelativeMassDeviation"]]<-5.0
settingsObject[["FragmentPeakMatchAbsoluteMassDeviation"]]<-0.001
settingsObject[["FragmentPeakMatchRelativeMassDeviation"]]<-5.0
settingsObject[["MetFragDatabaseType"]]<-"PubChem"
#
# the more information about the precursor is available,
# the more precise the candidate selection is
#
settingsObject[["NeutralPrecursorMass"]]<-253.966126
settingsObject[["NeutralPrecursorMolecularFormula"]]<-"C7H5Cl2FN2O3"
settingsObject[["PrecursorCompoundIDs"]]<-c("50465", "57010914", "56974741", "88419651", "23354334")
#
# pre- and post-processing filters
#
# pre-processing: remove unconnected compounds (e.g. salts) and compounds with non-standard isotopes
settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("UnconnectedCompoundFilter","IsotopeFilter")
# post-processing: keep only the best-scored candidate among stereoisomers
settingsObject[["MetFragPostProcessingCandidateFilter"]]<-c("InChIKeyFilter")
#
# define the peak list as a 2-dimensional matrix (one row per peak: m/z, intensity)
#
settingsObject[["PeakList"]]<-matrix(c(
90.97445, 681,
106.94476, 274,
110.02750, 110,
115.98965, 95,
117.98540, 384,
124.93547, 613,
124.99015, 146,
125.99793, 207,
133.95592, 777,
143.98846, 478,
144.99625, 352,
146.00410, 999,
151.94641, 962,
160.96668, 387,
163.00682, 782,
172.99055, 17,
178.95724, 678,
178.97725, 391,
180.97293, 999,
196.96778, 720,
208.96780, 999,
236.96245, 999,
254.97312, 999), ncol=2, byrow=TRUE)
#
# run MetFrag
#
scored.candidates<-run.metfrag(settingsObject)
#
# scored.candidates is a data.frame with scores and candidate properties
#
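Once the query returns, the result can be handled like any other data.frame. The following sketch ranks candidates and keeps the top hits. It uses a small mock data.frame in place of a real result, and the column names `Identifier` and `Score` are assumptions made for illustration; they may differ from what `run.metfrag` actually returns, so check `colnames(scored.candidates)` first. The helper `top.candidates` is hypothetical and not part of metfRag.

```r
# Mock stand-in for the data.frame returned by run.metfrag().
# The column names "Identifier" and "Score" are assumed, not confirmed,
# and the score values are made up.
mock.candidates <- data.frame(
  Identifier = c("50465", "57010914", "56974741"),
  Score      = c(0.42, 1.00, 0.87),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rank candidates by descending score
# and return the first n rows.
top.candidates <- function(candidates, n = 2, score.column = "Score") {
  stopifnot(is.data.frame(candidates),
            score.column %in% colnames(candidates))
  ranking <- order(candidates[[score.column]], decreasing = TRUE)
  head(candidates[ranking, , drop = FALSE], n)
}

best <- top.candidates(mock.candidates, n = 2)
print(best)
```

With a real result, the same call would be `top.candidates(scored.candidates)`, provided the score column name is adjusted to match.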
Filters can be defined to remove candidates prior to fragmentation. The following filters are available:
Filter | Description
UnconnectedCompoundFilter | filter non-connected compounds (e.g. salts)
IsotopeFilter | filter compounds containing non-standard isotopes
MinimumElementsFilter | filter by minimum of contained elements
MaximumElementsFilter | filter by maximum of contained elements
SmartsSubstructureInclusionFilter | filter by presence of defined substructures
SmartsSubstructureExclusionFilter | filter by absence of defined substructures
ElementInclusionFilter | filter by presence of defined elements (other elements are allowed)
ElementInclusionExclusiveFilter | filter by presence of defined elements (no other elements are allowed)
ElementExclusionFilter | filter by absence of defined elements
Some pre-processing filters require a further parameter to be defined:
Filter | Required parameter
MinimumElementsFilter | FilterMinimumElements
MaximumElementsFilter | FilterMaximumElements
SmartsSubstructureInclusionFilter | FilterSmartsInclusionList
SmartsSubstructureExclusionFilter | FilterSmartsExclusionList
ElementInclusionFilter | FilterIncludedElements
ElementInclusionExclusiveFilter | FilterIncludedElements
ElementExclusionFilter | FilterExcludedElements
Examples:
#
# MinimumElementsFilter
#
settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("MinimumElementsFilter")
# include compounds with at least 2 nitrogens and 3 oxygens
settingsObject[["FilterMinimumElements"]]<-"N2O3"
#
# MaximumElementsFilter
#
settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("MaximumElementsFilter")
# keep only compounds with at most 5 nitrogens and 7 oxygens
settingsObject[["FilterMaximumElements"]]<-"N5O7"
#
# SmartsSubstructureInclusionFilter
#
settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("SmartsSubstructureInclusionFilter")
# include compounds containing benzene
settingsObject[["FilterSmartsInclusionList"]]<-c("c1ccccc1")
#
# SmartsSubstructureExclusionFilter
#
settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("SmartsSubstructureExclusionFilter")
# filter out compounds containing hydroxyl groups
settingsObject[["FilterSmartsExclusionList"]]<-c("[OX2H]")
#
# ElementInclusionFilter
#
settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("ElementInclusionFilter")
# include compounds containing nitrogen and oxygen
settingsObject[["FilterIncludedElements"]]<-c("N","O")
#
# ElementExclusionFilter
#
settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("ElementExclusionFilter")
# filter out compounds including bromine or chlorine
settingsObject[["FilterExcludedElements"]]<-c("Cl","Br")
Defining multiple filters at once is possible:
settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("MinimumElementsFilter", "MaximumElementsFilter")
settingsObject[["FilterMinimumElements"]]<-"N2O3"
settingsObject[["FilterMaximumElements"]]<-"N5O7"
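Because each filter expects its own parameter, it is easy to select a filter and forget the matching setting. The following sketch turns the filter/parameter table above into a small sanity check that can be run before a query. The helper `missing.filter.parameters` is hypothetical and not part of metfRag; it only inspects the settings list and does not call MetFrag.

```r
# Parameter required by each pre-processing filter
# (copied from the filter/parameter table above).
required.filter.parameter <- c(
  MinimumElementsFilter             = "FilterMinimumElements",
  MaximumElementsFilter             = "FilterMaximumElements",
  SmartsSubstructureInclusionFilter = "FilterSmartsInclusionList",
  SmartsSubstructureExclusionFilter = "FilterSmartsExclusionList",
  ElementInclusionFilter            = "FilterIncludedElements",
  ElementInclusionExclusiveFilter   = "FilterIncludedElements",
  ElementExclusionFilter            = "FilterExcludedElements"
)

# Hypothetical helper: return the parameter names that the selected
# filters need but that are not yet present in the settings list.
missing.filter.parameters <- function(settings) {
  filters <- settings[["MetFragPreProcessingCandidateFilter"]]
  known   <- intersect(filters, names(required.filter.parameter))
  needed  <- required.filter.parameter[known]
  unique(unname(needed[!(needed %in% names(settings))]))
}

# Example: two element-count filters selected, but only one parameter set.
settings <- list()
settings[["MetFragPreProcessingCandidateFilter"]] <-
  c("UnconnectedCompoundFilter", "MinimumElementsFilter", "MaximumElementsFilter")
settings[["FilterMinimumElements"]] <- "N2O3"

missing.parameters <- missing.filter.parameters(settings)
print(missing.parameters)
```

Filters without an entry in the table, such as UnconnectedCompoundFilter, are ignored by the check because they need no extra parameter.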
Filters can also be defined to remove candidates after fragmentation and scoring. The following filter is available:
Filter | Description
InChIKeyFilter | filter stereoisomers by comparing the first part of the compounds' InChIKeys; only the best-scored candidate remains in the result list
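To illustrate the idea behind this filter, the sketch below groups candidates by the first block of their InChIKey and keeps the best-scored entry of each group. It is a conceptual illustration only, not MetFrag's implementation: the helper `keep.best.per.first.block` is hypothetical, and the InChIKeys, scores and column names are made up.

```r
# Mock candidates; InChIKeys and scores are invented for illustration.
# The first two keys share the same first block and therefore count
# as stereoisomers of one another in this sketch.
candidates <- data.frame(
  InChIKey = c("AAAAAAAAAAAAAA-UHFFFAOYSA-N",
               "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
               "CCCCCCCCCCCCCC-UHFFFAOYSA-N"),
  Score    = c(0.70, 0.90, 0.50),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the best-scored candidate
# for each distinct first InChIKey block.
keep.best.per.first.block <- function(candidates) {
  ranking <- order(candidates$Score, decreasing = TRUE)
  ranked  <- candidates[ranking, , drop = FALSE]
  first.block <- sub("-.*$", "", ranked$InChIKey)   # text before the first hyphen
  ranked[!duplicated(first.block), , drop = FALSE]
}

filtered <- keep.best.per.first.block(candidates)
print(filtered)
```

Sorting by score first means that `duplicated` always discards the lower-scored members of a group.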
MetFrag can combine several scores into a final score, which is used to rank the candidates in the candidate list. Besides the pre-defined scores, database-dependent scores can be defined. For each additionally defined score, a weight must be given; the weights are used to calculate the final score of each candidate. The following scores are pre-defined:
Score | Description
FragmenterScore | uses intensities, m/z values and bond energies of fragment-peak matches
SmartsSubstructureInclusionScore | scores candidates by presence of defined substructures
SmartsSubstructureExclusionScore | scores candidates by absence of defined substructures
SuspectListScore | scores candidates by presence in a defined suspect list
RetentionTimeScore | scores candidates with retention time information
OfflineMetFusionScore | uses a predefined spectral library to calculate a MetFusion-like similarity score
Some additional scores require further parameters to be defined:
Score | Required parameters
SmartsSubstructureInclusionScore | ScoreSmartsInclusionList
SmartsSubstructureExclusionScore | ScoreSmartsExclusionList
SuspectListScore | ScoreSuspectLists
- ScoreSuspectLists is the path of a file containing the InChIKeys of the suspect list, one per line
RetentionTimeScore | RetentionTimeTrainingFile, ExperimentalRetentionTimeValue
- RetentionTimeTrainingFile is the path of a file containing retention times and InChIs for logP calculation, one entry per line
- example file can be found here
- mandatory columns are: RetentionTime, and InChI or UserLogP
- InChI is used to calculate logP values with models included within CDK
- UserLogP (if defined) is used instead of InChI but has to be available within the candidate list as well
- RetentionTime values and ExperimentalRetentionTimeValue need to be acquired on the same system
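As an illustration of the SuspectListScore parameters, the sketch below writes a suspect list with one InChIKey per line and points `ScoreSuspectLists` at it. The two InChIKeys (benzene and caffeine) are arbitrary examples, the temporary file and its `.txt` extension are assumptions, and the equal weights are chosen only for illustration. The settings list is started from scratch here to keep the example self-contained; in practice the entries would be added to an existing settings object.

```r
# Suspect list: one InChIKey per line (arbitrary example compounds).
suspect.inchikeys <- c(
  "UHOVQNZJYSORNB-UHFFFAOYSA-N",  # benzene
  "RYYVLZVUVIJVGH-UHFFFAOYSA-N"   # caffeine
)

# Write the list to a temporary file; any readable path would do.
suspect.file <- tempfile(fileext = ".txt")
writeLines(suspect.inchikeys, suspect.file)

# Add the suspect list score next to the FragmenterScore.
# One weight per score type, in the same order (weights are illustrative).
settingsObject <- list()
settingsObject[["MetFragScoreTypes"]]   <- c("FragmenterScore", "SuspectListScore")
settingsObject[["ScoreSuspectLists"]]   <- c(suspect.file)
settingsObject[["MetFragScoreWeights"]] <- c(1.0, 1.0)
```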
Depending on the database used, different database-dependent scores are available. When using local file databases, any score defined as a candidate property can be used as a scoring term.
The following scoring terms are pre-defined for the available databases:
Database | Scoring terms
ExtendedPubChem | PubChemNumberPatents, PubChemNumberPubMedReferences
ChemSpider | ChemSpiderReferenceCount, ChemSpiderNumberExternalReferences, ChemSpiderRSCCount, ChemSpiderNumberPubMedReferences, ChemSpiderDataSourceCount
Example: defining FragmenterScore, SmartsSubstructureInclusionScore and RetentionTimeScore together with the necessary parameters:
settingsObject[["MetFragScoreTypes"]]<-c("FragmenterScore","SmartsSubstructureInclusionScore","RetentionTimeScore")
settingsObject[["ScoreSmartsInclusionList"]]<-c("[OX2H]","c1ccccc1")
settingsObject[["RetentionTimeTrainingFile"]]<-"C:/Documents/RetentionTimeFile.csv"
settingsObject[["ExperimentalRetentionTimeValue"]]<-9.4
# one weight per score type, in the same order as MetFragScoreTypes
settingsObject[["MetFragScoreWeights"]]<-c(1.0,0.5,0.5)
Example: defining FragmenterScore and database-dependent scoring terms for the ExtendedPubChem database:
settingsObject[["MetFragDatabaseType"]]<-c("ExtendedPubChem")
settingsObject[["MetFragScoreTypes"]]<-c("FragmenterScore","PubChemNumberPatents","PubChemNumberPubMedReferences")
settingsObject[["MetFragScoreWeights"]]<-c(1.0,0.5,0.5)
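To make the role of the weights more concrete, the sketch below combines per-candidate scores into a single final score. It assumes that each score is first scaled by its maximum over all candidates and that the final score is the weighted sum of the scaled values; this scheme is an assumption made for illustration and is not taken from the text above. The helper `combine.scores` is hypothetical and all score values are made up.

```r
# Made-up raw scores for three candidates, one column per score type.
scores <- cbind(
  FragmenterScore               = c(120, 80, 60),
  PubChemNumberPatents          = c(2, 10, 0),
  PubChemNumberPubMedReferences = c(0, 4, 8)
)

# One weight per column, in the same order as the score types.
weights <- c(1.0, 0.5, 0.5)

# Hypothetical helper: scale each column by its maximum (assumed scheme),
# then build the weighted sum per candidate.
combine.scores <- function(scores, weights) {
  stopifnot(is.matrix(scores), ncol(scores) == length(weights))
  column.max <- apply(scores, 2, max)
  column.max[column.max == 0] <- 1      # avoid division by zero
  scaled <- sweep(scores, 2, column.max, "/")
  as.vector(scaled %*% weights)
}

final.score <- combine.scores(scores, weights)
print(final.score)
print(order(final.score, decreasing = TRUE))   # candidate ranking
```

The length check at the top of the helper mirrors the requirement that every score type has exactly one weight.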