Converting CAS registry numbers to PubChem CID identifiers in R

Date: 2022-09-08 14:57:36

An annoying problem many chemists face is converting CAS registry numbers of chemical compounds (stored in some commercial database that is not readily accessible) to PubChem identifiers (openly available). PubChem sort of supports conversion between the two, but only through its manual web interface, not through its official PUG REST programmatic interface.

A solution in Ruby is given here, based on the e-utilities interface: http://depth-first.com/articles/2007/09/13/hacking-pubchem-convert-cas-numbers-into-pubchem-cids-with-ruby/

Does anybody know how this would translate into R?

EDIT: based on the answer below, the most elegant solution is:

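library(XML)
library(RCurl)

# Search the pccompound database via esearch and return the matching CIDs
CAStocids = function(query) {
  # current HTTPS E-utilities endpoint; the CAS number is appended as the search term
  url = paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pccompound&retmax=100&term=", query)
  xmlresponse = xmlParse(getURL(url))
  # each <Id> node in the reply holds one PubChem CID
  cids = sapply(xpathSApply(xmlresponse, "//Id"), function(n){xmlValue(n)})
  return(cids)
}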

> CAStocids("64318-79-2")
[1] "6434870" "5282237"

cheers, Tom

2 answers

#1 (score: 5)

This is how the Ruby code does it, translated to R using RCurl and XML:

> xmlresponse = xmlParse( getURL("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pccompound&retmax=100&term=64318-79-2") )

And here's how to extract the values of the Id nodes:

> sapply(xpathSApply(xmlresponse, "//Id"), function(n){xmlValue(n)})
 [1] "6434870" "5282237"

Wrap all that in a function:

convertU = function(query){
    # esearch against the pccompound database; each <Id> in the reply is a CID
    xmlresponse = xmlParse(getURL(
        paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pccompound&retmax=100&term=", query)))
    sapply(xpathSApply(xmlresponse, "//Id"), function(n){xmlValue(n)})
}

> convertU("64318-79-2")
[1] "6434870" "5282237"
> convertU("64318-79-1")
list()
> convertU("64318-78-2")
list()
> convertU("64313-78-2")
[1] "313"

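It probably needs a check for the case where nothing is found, since the function then returns an empty list() rather than a character vector.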
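
One way to add that check, as a rough sketch: split the parsing from the download and always return a character vector, so that "not found" becomes a zero-length result instead of list(). The names extract_cids and cas_to_cids are made up for illustration, the URL is the current HTTPS E-utilities endpoint, and returning NA with a warning is just one possible convention, not something the original code does.

```r
library(XML)

# Hypothetical helper: pull the <Id> values out of an esearch XML string.
# Always returns a character vector (length 0 when there are no hits).
extract_cids <- function(xml_text) {
  doc <- xmlParse(xml_text, asText = TRUE)
  as.character(unlist(xpathSApply(doc, "//Id", xmlValue)))
}

# Hypothetical wrapper: download the esearch result for one CAS number.
# Returning NA with a warning when nothing is found is an assumed convention.
cas_to_cids <- function(cas) {
  url <- paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
                "?db=pccompound&retmax=100&term=", cas)
  cids <- extract_cids(RCurl::getURL(url))
  if (length(cids) == 0) {
    warning("no PubChem CID found for ", cas)
    return(NA_character_)
  }
  cids
}
```

Keeping extract_cids separate means the parsing step can be tried on a saved response without touching the network; cas_to_cids("64318-79-2") would then be the call that actually queries NCBI.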

#2 (score: 0)

I think you should still be able to convert CAS numbers to PubChem IDs using PUG, by entering the CAS number in place of the compound name. Of course, this might be less specific if CAS numbers overlap. I haven't tested it.

An example with aspirin: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/50-78-2/cids/JSON
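
In R, the same kind of request could be sketched roughly as follows. This is untested against the live service and rests on a few assumptions: it asks for the plain-text (TXT) output instead of JSON so that no JSON package is needed, the helper names pug_cid_url and parse_cid_text are invented for illustration, and the error case (PUG REST answering with a fault message instead of CIDs) is not handled.

```r
# Hypothetical helper: build the PUG REST URL for a lookup by name,
# passing the CAS number where the compound name would normally go.
pug_cid_url <- function(cas) {
  paste0("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/",
         utils::URLencode(cas, reserved = TRUE), "/cids/TXT")
}

# Hypothetical helper: the TXT reply is assumed to hold one CID per line.
parse_cid_text <- function(txt) {
  lines <- trimws(strsplit(txt, "\n", fixed = TRUE)[[1]])
  lines[nzchar(lines)]
}

# Possible usage (needs network access and the RCurl package):
# cids <- parse_cid_text(RCurl::getURL(pug_cid_url("50-78-2")))
```

As with the esearch approach, a lookup by name may return more than one CID, so the result is best treated as a vector.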
