MetFrag is available as an R package.
The R package makes the functionality of the MetFrag command line tool available within the R programming language.
First, check out the MetFragR GitHub repository and build the package (on command line):
| git clone https://github.com/ipb-halle/MetFragR.git |
| cd MetFragR |
| R CMD check metfRag |
| R CMD build metfRag |
After the successful build, switch to R and install the package:
| install.packages("metfRag",repos=NULL,type="source") |
| library(metfRag) |
Alternatively, the easiest way to install MetFragR is directly from GitHub using devtools:
| library(devtools) |
| install_github("c-ruttkies/MetFragR/metfRag") |
| library(metfRag) |
The following example shows how to run a simple MetFrag query in R.
| # |
| # first define the settings object |
| # |
| settingsObject<-list() |
| # |
| # set database parameters to select candidates |
| # |
| settingsObject[["DatabaseSearchRelativeMassDeviation"]]<-5.0 |
| settingsObject[["FragmentPeakMatchAbsoluteMassDeviation"]]<-0.001 |
| settingsObject[["FragmentPeakMatchRelativeMassDeviation"]]<-5.0 |
| settingsObject[["MetFragDatabaseType"]]<-"PubChem" |
| # |
| # the more information about the precursor is available, |
| # the more precise the candidate selection is |
| # |
| settingsObject[["NeutralPrecursorMass"]]<-253.966126 |
| settingsObject[["NeutralPrecursorMolecularFormula"]]<-"C7H5Cl2FN2O3" |
| settingsObject[["PrecursorCompoundIDs"]]<-c("50465", "57010914", "56974741", "88419651", "23354334") |
| # |
| # pre- and post-processing filters |
| # |
| # filter out unconnected compounds (e.g. salts) and compounds with non-standard isotopes |
| settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("UnconnectedCompoundFilter","IsotopeFilter") |
| settingsObject[["MetFragPostProcessingCandidateFilter"]]<-c("InChIKeyFilter") |
| # |
| # define the peaklist as 2-dimensional matrix |
| # |
| settingsObject[["PeakList"]]<-matrix(c( |
| 90.97445, 681, |
| 106.94476, 274, |
| 110.02750, 110, |
| 115.98965, 95, |
| 117.98540, 384, |
| 124.93547, 613, |
| 124.99015, 146, |
| 125.99793, 207, |
| 133.95592, 777, |
| 143.98846, 478, |
| 144.99625, 352, |
| 146.00410, 999, |
| 151.94641, 962, |
| 160.96668, 387, |
| 163.00682, 782, |
| 172.99055, 17, |
| 178.95724, 678, |
| 178.97725, 391, |
| 180.97293, 999, |
| 196.96778, 720, |
| 208.96780, 999, |
| 236.96245, 999, |
| 254.97312, 999), ncol=2, byrow=TRUE) |
| # |
| # run MetFrag |
| # |
| scored.candidates<-run.metfrag(settingsObject) |
| # |
| # scored.candidates is a data.frame with scores and candidate properties |
| # |
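The peak list in the example interleaves m/z and intensity values, which is why the matrix is filled with `byrow=TRUE`. As a minimal sketch, the same matrix can be assembled from two separate vectors, and the search window implied by a relative mass deviation can be computed by hand. The helper `mass.window` is hypothetical and not part of metfRag, and the sketch assumes the relative deviation is expressed in ppm.

```r
# First three peaks of the example spectrum, split into two vectors
mz        <- c(90.97445, 106.94476, 110.02750)
intensity <- c(681, 274, 110)

# cbind() places m/z in column 1 and intensity in column 2;
# unname() drops the column names so the result is a plain numeric matrix
peaklist <- unname(cbind(mz, intensity))

# The interleaved form used in the example yields the same matrix
interleaved <- matrix(c(
  90.97445, 681,
  106.94476, 274,
  110.02750, 110), ncol = 2, byrow = TRUE)

# Hypothetical helper (not part of metfRag): mass window for a relative
# deviation, ASSUMING the deviation is given in ppm
mass.window <- function(mass, ppm) {
  delta <- mass * ppm * 1e-6
  c(lower = mass - delta, upper = mass + delta)
}

# Window around the neutral precursor mass from the example settings
window <- mass.window(253.966126, 5.0)
```

Either form of the matrix can be assigned to `settingsObject[["PeakList"]]`; the two-vector form is convenient when the spectrum is already available as separate m/z and intensity columns.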
Filters can be defined to filter candidates prior to fragmentation. The following filters are available:
| UnconnectedCompoundFilter | - | filter non-connected compounds (e.g. salts) |
| IsotopeFilter | - | filter compounds containing non-standard isotopes |
| MinimumElementsFilter | - | filter by minimum of contained elements |
| MaximumElementsFilter | - | filter by maximum of contained elements |
| SmartsSubstructureInclusionFilter | - | filter by presence of defined sub-structures |
| SmartsSubstructureExclusionFilter | - | filter by absence of defined sub-structures |
| ElementInclusionFilter | - | filter by presence of defined elements (other elements are allowed) |
| ElementInclusionExclusiveFilter | - | filter by presence of defined elements (no other elements are allowed) |
| ElementExclusionFilter | - | filter by absence of defined elements |
When defining pre-processing filters, further parameters have to be defined:
| MinimumElementsFilter | - | FilterMinimumElements |
| MaximumElementsFilter | - | FilterMaximumElements |
| SmartsSubstructureInclusionFilter | - | FilterSmartsInclusionList |
| SmartsSubstructureExclusionFilter | - | FilterSmartsExclusionList |
| ElementInclusionFilter | - | FilterIncludedElements |
| ElementInclusionExclusiveFilter | - | FilterIncludedElements |
| ElementExclusionFilter | - | FilterExcludedElements |
Examples:
| # |
| # MinimumElementsFilter |
| # |
| settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("MinimumElementsFilter") |
| # include compounds with at least 2 nitrogens and 3 oxygens |
| settingsObject[["FilterMinimumElements"]]<-"N2O3" |
| # |
| # MaximumElementsFilter |
| # |
| settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("MaximumElementsFilter") |
| # keep compounds with at most 5 nitrogens and 7 oxygens |
| settingsObject[["FilterMaximumElements"]]<-"N5O7" |
| # |
| # SmartsSubstructureInclusionFilter |
| # |
| settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("SmartsSubstructureInclusionFilter") |
| # include compounds containing benzene |
| settingsObject[["FilterSmartsInclusionList"]]<-c("c1ccccc1") |
| # |
| # SmartsSubstructureExclusionFilter |
| # |
| settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("SmartsSubstructureExclusionFilter") |
| # filter out compounds containing hydroxyl groups |
| settingsObject[["FilterSmartsExclusionList"]]<-c("[OX2H]") |
| # |
| # ElementInclusionFilter |
| # |
| settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("ElementInclusionFilter") |
| # include compounds containing nitrogen and oxygen |
| settingsObject[["FilterIncludedElements"]]<-c("N","O") |
| # |
| # ElementExclusionFilter |
| # |
| settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("ElementExclusionFilter") |
| # filter out compounds including bromine or chlorine |
| settingsObject[["FilterExcludedElements"]]<-c("Cl","Br") |
Defining multiple filters at once is possible:
| settingsObject[["MetFragPreProcessingCandidateFilter"]]<-c("MinimumElementsFilter", "MaximumElementsFilter") |
| settingsObject[["FilterMinimumElements"]]<-"N2O3" |
| settingsObject[["FilterMaximumElements"]]<-"N5O7" |
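Since each of these filters needs its own parameter, it is easy to select a filter and forget the setting that goes with it. The following sketch turns the parameter table above into a small consistency check. The names `required.filter.params` and `missing.filter.params` are hypothetical helpers, not part of metfRag.

```r
# Filter name -> required parameter, copied from the parameter table above
required.filter.params <- c(
  MinimumElementsFilter             = "FilterMinimumElements",
  MaximumElementsFilter             = "FilterMaximumElements",
  SmartsSubstructureInclusionFilter = "FilterSmartsInclusionList",
  SmartsSubstructureExclusionFilter = "FilterSmartsExclusionList",
  ElementInclusionFilter            = "FilterIncludedElements",
  ElementInclusionExclusiveFilter   = "FilterIncludedElements",
  ElementExclusionFilter            = "FilterExcludedElements")

# Hypothetical helper: returns the parameter names that the selected
# pre-processing filters need but that are not set in the settings list
missing.filter.params <- function(settings) {
  filters <- settings[["MetFragPreProcessingCandidateFilter"]]
  # filters without a table entry (e.g. UnconnectedCompoundFilter) need nothing
  known  <- intersect(filters, names(required.filter.params))
  needed <- unname(required.filter.params[known])
  setdiff(needed, names(settings))
}

settings <- list()
settings[["MetFragPreProcessingCandidateFilter"]] <-
  c("UnconnectedCompoundFilter", "MinimumElementsFilter", "MaximumElementsFilter")
settings[["FilterMinimumElements"]] <- "N2O3"

# FilterMaximumElements has not been set yet, so it is reported
not.set <- missing.filter.params(settings)
```

Running such a check before `run.metfrag` is one way to catch a mistyped or missing parameter name early.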
Filters can be defined to filter candidates after fragmentation and scoring. The following filter is available:
| InChIKeyFilter | - | filter stereoisomers by comparing the first part of the compounds' InChIKeys; only the best-scored candidate remains in the result list |
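To illustrate the idea behind this filter, the sketch below keeps only the best-scored candidate for each InChIKey first block on a toy data.frame. It is not MetFrag's implementation; the column names `Identifier`, `InChIKey` and `Score` are assumptions, and the InChIKeys and scores are made-up placeholders.

```r
# Toy candidate list; A and B share the first InChIKey block,
# i.e. they are treated as stereoisomers of the same structure
candidates <- data.frame(
  Identifier = c("A", "B", "C"),
  InChIKey   = c("AAAAAAAAAAAAAA-UHFFFAOYSA-N",
                 "AAAAAAAAAAAAAA-XXXXXXXXSA-N",
                 "BBBBBBBBBBBBBB-UHFFFAOYSA-N"),
  Score      = c(0.8, 0.9, 0.5),
  stringsAsFactors = FALSE)

# Sort by descending score so the best candidate of each group comes first
ranked <- candidates[order(-candidates$Score), ]

# The first block is everything before the first hyphen
first.block <- sub("-.*$", "", ranked$InChIKey)

# duplicated() marks every later occurrence of a block, so negating it
# keeps only the best-scored candidate per block
kept <- ranked[!duplicated(first.block), ]
```

In this toy list, candidate A is dropped because B has the same first block and a higher score.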
MetFrag can combine several scores into a final score that is used to rank the candidates in the candidate list. Besides the pre-defined scores, database-dependent scores can be defined. For each additional score, a weight needs to be defined; the weights are used to calculate the final score of each candidate.
| FragmenterScore | - | Uses intensities, m/z values and bond energies of fragment-peak-matches |
| SmartsSubstructureInclusionScore | - | Score candidates by presence of defined substructures |
| SmartsSubstructureExclusionScore | - | Score candidates by absence of defined substructures |
| SuspectListScore | - | Score candidates by presence in defined suspect list |
| RetentionTimeScore | - | Score candidates with retention time information |
| OfflineMetFusionScore | - | Uses predefined spectral library to calculate MetFusion-like similarity score |
When defining additional scores, further parameters need to be defined:
| SmartsSubstructureInclusionScore | - | ScoreSmartsInclusionList |
| SmartsSubstructureExclusionScore | - | ScoreSmartsExclusionList |
| SuspectListScore | - | ScoreSuspectLists |
| - file path of a file containing the InChIKeys of the suspect list, one per line |
| RetentionTimeScore | - | RetentionTimeTrainingFile, ExperimentalRetentionTimeValue |
| - RetentionTimeTrainingFile is the path of a file containing retention times and InChIs for logP calculation, one entry per line |
| - example file can be found here |
| - mandatory columns are: RetentionTime and either InChI or UserLogP |
| - InChI is used to calculate logP values with models included in CDK |
| - UserLogP (if defined) is used instead of InChI but has to be available in the candidate list as well |
| - RetentionTime values and ExperimentalRetentionTimeValue need to be acquired on the same system |
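As a minimal sketch of the suspect list format described above, the following writes a few InChIKeys to a file, one per line, and points `ScoreSuspectLists` at it. The `.txt` extension, the use of a temporary file and the score weights are assumptions for illustration; the three InChIKeys (benzene, ethanol, water) are only examples.

```r
# Example InChIKeys: benzene, ethanol, water
suspects <- c(
  "UHOVQNZJYSORNB-UHFFFAOYSA-N",
  "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
  "XLYOFNOQVPJJNP-UHFFFAOYSA-N")

# Write one InChIKey per line; a temporary file keeps the sketch self-contained
suspect.file <- tempfile(fileext = ".txt")
writeLines(suspects, suspect.file)

# Reuse an existing settings list if there is one, otherwise start fresh
if (!exists("settingsObject")) settingsObject <- list()

# Score by fragmentation and by presence in the suspect list;
# the weights are placeholder values
settingsObject[["MetFragScoreTypes"]]   <- c("FragmenterScore", "SuspectListScore")
settingsObject[["ScoreSuspectLists"]]   <- suspect.file
settingsObject[["MetFragScoreWeights"]] <- c(1.0, 1.0)
```

In practice the file would live at a permanent path rather than in a temporary location.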
Depending on the database used, different scores are available. When using local file databases, any score defined as a candidate property can be used as a scoring term.
The following scoring terms are pre-defined for the available databases:
| ExtendedPubChem | - | PubChemNumberPatents,PubChemNumberPubMedReferences |
| ChemSpider | - | ChemSpiderReferenceCount,ChemSpiderNumberExternalReferences,ChemSpiderRSCCount,ChemSpiderNumberPubMedReferences,ChemSpiderDataSourceCount |
Example: defining FragmenterScore, SmartsSubstructureInclusionScore and RetentionTimeScore together with the necessary parameters:
| settingsObject[["MetFragScoreTypes"]]<-c("FragmenterScore","SmartsSubstructureInclusionScore","RetentionTimeScore") |
| settingsObject[["ScoreSmartsInclusionList"]]<-c("[OX2H]","c1ccccc1") |
| settingsObject[["RetentionTimeTrainingFile"]]<-"C:/Documents/RetentionTimeFile.csv" |
| settingsObject[["ExperimentalRetentionTimeValue"]]<-9.4 |
| settingsObject[["MetFragScoreWeights"]]<-c(1.0,0.5,0.5) |
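To make the role of the weights concrete, the sketch below combines several score columns into one final score. It assumes that each score is first divided by its maximum over the candidate list and that the final score is the weighted sum of these normalized values; the exact calculation MetFrag performs is not described here, so treat this as an illustration only. The function `combine.scores` and the toy values are hypothetical.

```r
# Hypothetical helper: weighted sum of max-normalized score columns.
# 'scores' is a numeric matrix with one row per candidate and one column
# per score type; 'weights' has one entry per column.
combine.scores <- function(scores, weights) {
  stopifnot(is.matrix(scores), ncol(scores) == length(weights))
  col.max <- apply(scores, 2, max)
  # avoid division by zero for columns that are zero everywhere
  col.max[col.max == 0] <- 1
  normalized <- sweep(scores, 2, col.max, "/")
  as.vector(normalized %*% weights)
}

# Two toy candidates with two score types
toy.scores <- cbind(
  FragmenterScore                  = c(100, 50),
  SmartsSubstructureInclusionScore = c(1, 2))

final <- combine.scores(toy.scores, c(1.0, 0.5))
```

Under these assumptions, a larger weight lets a score type contribute more to the ranking relative to the other score types.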
Example: defining FragmenterScore and database-dependent scoring terms for the ExtendedPubChem database:
| settingsObject[["MetFragDatabaseType"]]<-c("ExtendedPubChem") |
| settingsObject[["MetFragScoreTypes"]]<-c("FragmenterScore","PubChemNumberPatents","PubChemNumberPubMedReferences") |
| settingsObject[["MetFragScoreWeights"]]<-c(1.0,0.5,0.5) |
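Because score types and weights are given as two separate vectors, a mismatch in length is easy to overlook. The sketch below pairs them up and stops if the lengths differ. It assumes that weights are matched to score types by position, as the examples suggest; `named.score.weights` is a hypothetical helper, not part of metfRag.

```r
# Hypothetical helper: returns the weights named by their score type,
# or stops if the two settings do not have the same length
named.score.weights <- function(settings) {
  types   <- settings[["MetFragScoreTypes"]]
  weights <- settings[["MetFragScoreWeights"]]
  if (length(types) != length(weights)) {
    stop("MetFragScoreTypes and MetFragScoreWeights differ in length")
  }
  setNames(weights, types)
}

# Settings from the ExtendedPubChem example
example.settings <- list()
example.settings[["MetFragDatabaseType"]] <- c("ExtendedPubChem")
example.settings[["MetFragScoreTypes"]] <-
  c("FragmenterScore", "PubChemNumberPatents", "PubChemNumberPubMedReferences")
example.settings[["MetFragScoreWeights"]] <- c(1.0, 0.5, 0.5)

weights.by.type <- named.score.weights(example.settings)
```

Printing `weights.by.type` shows at a glance which weight belongs to which scoring term.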