How to query the SUMO ontology with SPARQL

Time: 2022-04-02 18:06:07

I'm using the SUMO ontology, which I want to query with SPARQL. A typical entry in SUMO, e.g. for a city, looks like this:

<owl:Thing rdf:ID="MadridSpain">
 <rdfs:isDefinedBy rdf:resource="http://www.ontologyportal.org/SUMO.owl"/>
 <rdf:type rdf:resource="#City"/>
 <owl:comment xml:lang="en">The City of Madrid in Spain.</owl:comment>
 <geographicSubregion rdf:resource="#Spain" />
 <externalImage rdf:datatype="xsd:anyURI">[...]</externalImage>
 <rdfs:label xml:lang="en">madrid spain</rdfs:label>
</owl:Thing>

If I want to get all cities from the ontology I use this example query (which works fine):

String prefix = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
              + "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> ";
String rdq = prefix + "SELECT ?N ?O WHERE { ?N rdf:type <http://www.ontologyportal.org/SUMO.owl#City> }";

My problem starts when I want to filter the results. Suppose I only want the cities that are a geographicSubregion of Spain. First I tried to solve this by analyzing all the results in Java with Jena, which takes a huge amount of time (5-10 s for each result, ~10,000 results in total).

import org.apache.jena.query.*;
import org.apache.jena.rdf.model.*;

// Assumes Jena 3+ (org.apache.jena packages); owlModel is the Model holding SUMO.owl.
Query myQuery = QueryFactory.create(rdq);
// try-with-resources closes the QueryExecution when we are done
try (QueryExecution qexec = QueryExecutionFactory.create(myQuery, owlModel)) {
    ResultSet results = qexec.execSelect();
    while (results.hasNext()) {
        QuerySolution sol = results.nextSolution();
        Resource res = sol.getResource("N");
        // List every property of this city and filter on the Java side (slow)
        StmtIterator it = res.listProperties();
        while (it.hasNext()) {
            Statement state = it.next();
            // Doing some filtering
            System.out.println("predicate: " + state.getPredicate().toString());
            System.out.println("subject: " + state.getSubject().toString());
            System.out.println("object: " + state.getObject().toString());
        }
    }
} catch (Exception e) {
    e.printStackTrace();
    System.err.println("Query Error " + e.getMessage());
}

Of course this isn't efficient, and there must be an easier way using the right query, but at the moment I'm stuck defining such a query. I tried the following ones, but none of them works:

SELECT ?N ?O WHERE { ?N rdf:type <http://www.ontologyportal.org/SUMO.owl#City> . 
 { SELECT ?N WHERE { (rdf:type ?b rdf:statement) .
 (rdf:Predicate ?b <http://www.ontologyportal.org/SUMO.owl#geographicSubregion>) .
 (rdf:Object ?b <http://www.ontologyportal.org/SUMO.owl#Spain>) } } }

SELECT ?N ?O WHERE { (rdf:statement ?b) .
 (rdf:Predicate ?b <http://www.ontologyportal.org/SUMO.owl#geographicSubregion>) . 
 (rdf:Object ?b <http://www.ontologyportal.org/SUMO.owl#Spain>) . }

Does anyone have an idea how to write a query that gets all cities within a country?

1 solution

#1

I took the RDF you presented to make a minimal RDF file that I could query against:

<rdf:RDF xmlns="http://www.ontologyportal.org/SUMO.owl#"
         xml:base="http://www.ontologyportal.org/SUMO.owl"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <owl:Thing rdf:ID="MadridSpain">
    <rdfs:isDefinedBy rdf:resource="http://www.ontologyportal.org/SUMO.owl"/>
    <rdf:type rdf:resource="#City"/>
    <owl:comment xml:lang="en">The City of Madrid in Spain.</owl:comment>
    <geographicSubregion rdf:resource="#Spain" />
    <externalImage rdf:datatype="xsd:anyURI">[...]</externalImage>
    <rdfs:label xml:lang="en">madrid spain</rdfs:label>
  </owl:Thing>
</rdf:RDF>

SPARQL is a query language for matching data in RDF graphs. The edges in an RDF graph are triples: simple statements of the form subject predicate object. Your query was matching a single triple pattern:

?N rdf:type <http://www.ontologyportal.org/SUMO.owl#City>
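
To make "matching a single triple pattern" concrete, here is a toy sketch in plain Java (no Jena): the graph is just a list of triples, and a pattern term starting with ? matches any value. The names Triple and SingleTriplePattern, and the extra sumo:ParisFrance / sumo:France data, are hypothetical and made up for illustration; a real store such as Jena uses indexes rather than scanning a list.

```java
import java.util.ArrayList;
import java.util.List;

// Toy RDF graph edge: (subject, predicate, object). Hypothetical, not Jena's API.
record Triple(String s, String p, String o) {}

final class SingleTriplePattern {
    // A pattern term starting with '?' is a variable and matches any value;
    // any other term must match exactly. (Repeated variables inside one
    // pattern, such as ?x ?p ?x, are not handled by this toy version.)
    static boolean termMatches(String term, String value) {
        return term.startsWith("?") || term.equals(value);
    }

    // Return every triple in the graph that matches the pattern (s, p, o).
    static List<Triple> match(List<Triple> graph, String s, String p, String o) {
        List<Triple> out = new ArrayList<>();
        for (Triple t : graph) {
            if (termMatches(s, t.s()) && termMatches(p, t.p()) && termMatches(o, t.o())) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // sumo:ParisFrance and sumo:France are made-up sample data.
        List<Triple> graph = List.of(
                new Triple("sumo:MadridSpain", "rdf:type", "sumo:City"),
                new Triple("sumo:MadridSpain", "sumo:geographicSubregion", "sumo:Spain"),
                new Triple("sumo:ParisFrance", "rdf:type", "sumo:City"),
                new Triple("sumo:ParisFrance", "sumo:geographicSubregion", "sumo:France"));
        // Same shape as: ?N rdf:type sumo:City
        for (Triple t : match(graph, "?N", "rdf:type", "sumo:City")) {
            System.out.println(t.s()); // prints sumo:MadridSpain, then sumo:ParisFrance
        }
    }
}
```

A single pattern like this can only filter on one edge at a time, which is why every city comes back regardless of its country.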

Your query will be easier to write if you define a prefix for sumo:. With that prefix (and with ?N renamed to ?city), we end up with:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix sumo: <http://www.ontologyportal.org/SUMO.owl#>
select ?city where { 
  ?city rdf:type sumo:City .
}

That selects all the cities, as you've seen. Now you just need to match an additional triple, so we just add it to the query:

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix sumo: <http://www.ontologyportal.org/SUMO.owl#>
select ?city where { 
  ?city rdf:type sumo:City .
  ?city sumo:geographicSubregion sumo:Spain .
}
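
Adding a second pattern that reuses ?city behaves like a join: a solution of the first pattern survives only if the second pattern also matches with the same value for ?city. The toy evaluator below sketches that idea in plain Java; BasicGraphPattern, evaluate and the sample data are hypothetical, and this is not how Jena actually plans or executes queries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy RDF graph edge (hypothetical, not Jena's API).
record Triple(String s, String p, String o) {}

final class BasicGraphPattern {
    // Try to make pattern term `term` agree with `value` under `binding`.
    // Constants must be equal; an unbound variable gets bound; a bound
    // variable must already hold the same value.
    private static boolean unify(String term, String value, Map<String, String> binding) {
        if (!term.startsWith("?")) {
            return term.equals(value);
        }
        String bound = binding.get(term);
        if (bound == null) {
            binding.put(term, value);
            return true;
        }
        return bound.equals(value);
    }

    // Evaluate triple patterns (each a String[3]) one after another, keeping
    // only the solutions that stay consistent on shared variables.
    static List<Map<String, String>> evaluate(List<Triple> graph, List<String[]> patterns) {
        List<Map<String, String>> solutions = new ArrayList<>();
        solutions.add(new HashMap<>()); // start from one empty solution
        for (String[] pat : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> sol : solutions) {
                for (Triple t : graph) {
                    Map<String, String> b = new HashMap<>(sol); // copy, discarded on failure
                    if (unify(pat[0], t.s(), b) && unify(pat[1], t.p(), b) && unify(pat[2], t.o(), b)) {
                        next.add(b);
                    }
                }
            }
            solutions = next;
        }
        return solutions;
    }

    public static void main(String[] args) {
        // sumo:ParisFrance and sumo:France are made-up sample data.
        List<Triple> graph = List.of(
                new Triple("sumo:MadridSpain", "rdf:type", "sumo:City"),
                new Triple("sumo:MadridSpain", "sumo:geographicSubregion", "sumo:Spain"),
                new Triple("sumo:ParisFrance", "rdf:type", "sumo:City"),
                new Triple("sumo:ParisFrance", "sumo:geographicSubregion", "sumo:France"));
        List<String[]> query = List.of(
                new String[] {"?city", "rdf:type", "sumo:City"},
                new String[] {"?city", "sumo:geographicSubregion", "sumo:Spain"});
        for (Map<String, String> sol : evaluate(graph, query)) {
            System.out.println(sol.get("?city")); // prints sumo:MadridSpain only
        }
    }
}
```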

To make that query look nicer, two abbreviations can be applied. First, in SPARQL, rdf:type can be written as a, because it is so common (so we no longer need to declare the rdf: prefix). Second, when several triples share the same subject, you can write the subject once and list the predicate-object pairs separated by semicolons. We end up with:

prefix sumo: <http://www.ontologyportal.org/SUMO.owl#>
select ?city where { 
  ?city a sumo:City ;
        sumo:geographicSubregion sumo:Spain .
}

When I run this against the RDF above using Jena's command line tools, I get the following results:

$ arq --data sumo.rdf --query query.sparql
--------------------
| city             |
====================
| sumo:MadridSpain |
--------------------
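
The question asked for the cities within any country, not only Spain. One way to reuse the final query is to build it from the country's SUMO local name. The sketch below uses a hypothetical helper, CitiesInCountry.query, in plain Java; it rejects anything that is not a simple identifier so a caller cannot inject extra SPARQL (that restriction is an assumption and may be stricter than SUMO's real names). The resulting string could then be passed to QueryFactory.create and executed against the model as in the question's code; Jena also provides a ParameterizedSparqlString class intended for this kind of substitution, which may be the better choice.

```java
import java.util.regex.Pattern;

final class CitiesInCountry {
    private static final String SUMO_NS = "http://www.ontologyportal.org/SUMO.owl#";

    // Conservative (assumed) check: only simple local names such as "Spain".
    private static final Pattern LOCAL_NAME = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    // Build the "cities in a country" query for the given SUMO local name.
    static String query(String countryLocalName) {
        if (!LOCAL_NAME.matcher(countryLocalName).matches()) {
            throw new IllegalArgumentException("Unexpected local name: " + countryLocalName);
        }
        return "PREFIX sumo: <" + SUMO_NS + ">\n"
                + "SELECT ?city WHERE {\n"
                + "  ?city a sumo:City ;\n"
                + "        sumo:geographicSubregion sumo:" + countryLocalName + " .\n"
                + "}\n";
    }

    public static void main(String[] args) {
        System.out.print(query("Spain"));
    }
}
```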

Why the other queries didn't work

What you were trying to do with patterns like this

(rdf:type ?b rdf:statement) .
(rdf:Predicate ?b <http://www.ontologyportal.org/SUMO.owl#geographicSubregion>) .
(rdf:Object ?b <http://www.ontologyportal.org/SUMO.owl#Spain>)

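was to use the RDF reification vocabulary. First, the syntax was wrong: SPARQL triple patterns are written in the order subject predicate object, without parentheses (in SPARQL, parentheses denote RDF collections), and the vocabulary terms are case-sensitive (rdf:Statement, rdf:subject, rdf:predicate, rdf:object). The pattern would need to be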

?b a rdf:Statement ;
   rdf:subject ?city ;
   rdf:predicate sumo:geographicSubregion ;
   rdf:object sumo:Spain .

in order to match a reified triple of the form you needed to answer your query. However, this pattern requires four such triples to be present in the graph, and they aren't in the model. Just because a triple is in the graph doesn't mean a reified version of it is. (After all, if it were, the four triples used to reify the first triple would also have to be reified, and then the triples reifying those, and so on without end.) SPARQL only lets you query the triples that are actually in the data.
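
A small sketch of that point in plain Java, with hypothetical names: reifying one triple means adding four separate triples about a statement resource, and a graph containing only the original triple gives the four-pattern reification query nothing to match. The statement label _:b0 is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy RDF graph edge (hypothetical, not Jena's API).
record Triple(String s, String p, String o) {}

final class Reification {
    // The four reification triples that describe `t`, using `stmt` as the
    // identifier of the rdf:Statement resource.
    static List<Triple> reify(Triple t, String stmt) {
        return List.of(
                new Triple(stmt, "rdf:type", "rdf:Statement"),
                new Triple(stmt, "rdf:subject", t.s()),
                new Triple(stmt, "rdf:predicate", t.p()),
                new Triple(stmt, "rdf:object", t.o()));
    }

    // Mirrors the pattern ?b a rdf:Statement ; rdf:subject s ; rdf:predicate p ; rdf:object o .
    static boolean hasReified(List<Triple> graph, String s, String p, String o) {
        for (Triple t : graph) {
            if (!t.p().equals("rdf:type") || !t.o().equals("rdf:Statement")) {
                continue;
            }
            String b = t.s();
            if (graph.contains(new Triple(b, "rdf:subject", s))
                    && graph.contains(new Triple(b, "rdf:predicate", p))
                    && graph.contains(new Triple(b, "rdf:object", o))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Triple fact = new Triple("sumo:MadridSpain", "sumo:geographicSubregion", "sumo:Spain");
        List<Triple> graph = new ArrayList<>(List.of(fact));
        // The plain triple is present, but no reified version of it is.
        System.out.println(hasReified(graph, fact.s(), fact.p(), fact.o())); // false
        // Only after the four reification triples are added does the pattern match.
        graph.addAll(reify(fact, "_:b0"));
        System.out.println(hasReified(graph, fact.s(), fact.p(), fact.o())); // true
    }
}
```

The direct pattern ?city sumo:geographicSubregion sumo:Spain matches the plain triple immediately, which is why it is the right query here.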
