I have a small problem. Given this simple SPARQL query:
SELECT ?abstract
WHERE {
  <http://dbpedia.org/resource/Mitsubishi> <http://dbpedia.org/ontology/abstract> ?abstract .
  FILTER langMatches(lang(?abstract), 'en')
}
I get this result (SPARQL Result), and it contains non-English characters. Is there a way to remove them and retrieve just the English words?
1 Answer
You'll need to define exactly which characters you do and don't want in your result, but you can use the replace function to replace characters outside a given range with, e.g., the empty string. If you wanted to exclude everything except the Basic Latin, Latin-1 Supplement, Latin Extended-A, and Latin Extended-B ranges (which together span U+0000–U+024F), you could do the following:
SELECT ?abstract ?cleanAbstract
WHERE {
  dbpedia:Mitsubishi dbpedia-owl:abstract ?abstract .
  FILTER langMatches(lang(?abstract), 'en')
  # Strip every character outside U+0000–U+024F
  BIND(replace(?abstract, "[^\\x{0000}-\\x{024f}]", "") AS ?cleanAbstract)
}
Or even simpler:
SELECT (replace(?abstract_, "[^\\x{0000}-\\x{024f}]", "") AS ?abstract)
WHERE {
  dbpedia:Mitsubishi dbpedia-owl:abstract ?abstract_ .
  FILTER langMatches(lang(?abstract_), 'en')
}
The Mitsubishi Group (, Mitsubishi Gurūpu) (also known as the Mitsubishi Group of Companies or Mitsubishi Companies) is a group of autonomous Japanese multinational companies covering a range of businesses which share the Mitsubishi brand, trademark, and legacy.The Mitsubishi group of companies form a loose entity, the Mitsubishi Keiretsu, which is often referenced in Japanese and US media and official reports; in general these companies all descend from the zaibatsu of the same name. The top 25 companies are also members of the Mitsubishi Kin'yōkai, or "Friday Club", and meet monthly. In addition the Mitsubishi.com Committee exists to facilitate communication and access of the Mitsubishi brand through a portal web site.
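If you would rather clean the text on the client after fetching it, the same range filter can be sketched in Python. This is only an illustration under assumptions: the function name strip_non_latin and the sample string are hypothetical and not part of the original answer, and the sample is a hand-written stand-in rather than the actual DBpedia abstract.

```python
import re

# Matches any single character outside U+0000–U+024F, i.e. outside
# Basic Latin, Latin-1 Supplement, Latin Extended-A and Latin Extended-B.
NON_LATIN = re.compile(r"[^\u0000-\u024F]")


def strip_non_latin(text: str) -> str:
    """Return text with every character outside U+0000–U+024F removed."""
    return NON_LATIN.sub("", text)


# Hypothetical sample: Japanese characters followed by a romanized form.
sample = "The Mitsubishi Group (三菱グループ, Mitsubishi Gurūpu)"
print(strip_non_latin(sample))
```

Note that "ū" (U+016B) survives because it lies inside Latin Extended-A, while the Japanese characters fall outside the range and are dropped, leaving the empty "(, " seen in the result above.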
You may find the Latin script in Unicode Wikipedia article useful.