How can you tell an XML parser to ignore entities that are referenced but not declared?
I am getting exceptions like this:
org.xml.sax.SAXParseException: The entity "alpha" was referenced, but not declared.
What I want is for the parser to treat the string "&alpha;" as a plain string of characters, not as an entity reference.
Also, I have a lot of these entities, so I can't tell the parser to ignore them one at a time.
1 solution
#1
5
You could write a script (using sed or perl, for example) that preprocesses the input documents with a regular-expression replacement, escaping every ampersand except those that begin a reference XML already recognizes (i.e. the five predefined entities, and any that you have declared).
E.g. the script would replace "&" with "&amp;" at the beginning of strings like "&alpha;", yielding "&amp;alpha;". But it would leave "&lt;" and "&#160;" alone.
The question you're asking boils down to "How do I get tools that are designed to parse XML (i.e. well-formed XML) to handle non-XML (i.e. not-well-formed XML)?" And the answer will pretty much always be to use non-XML tools first to fix up the input and make it well-formed.
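Since the exception in the question comes from a Java parser, the same preprocessing could also be done in Java before the text reaches the parser. The following is only a minimal sketch under stated assumptions: the class name EntityEscaper and the method escapeUndeclared are hypothetical, and the pattern protects only the five predefined entities and numeric character references. Entities declared in your own DTD would have to be added to the pattern by hand.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of any library): escapes ampersands that do
// not begin a reference every XML parser already understands.
final class EntityEscaper {

    // An ampersand NOT followed by one of the five predefined entity names
    // (amp, lt, gt, quot, apos) or by a numeric character reference such as
    // &#160; or &#xA0;. Assumption: the document declares no entities itself.
    private static final Pattern BARE_AMP = Pattern.compile(
            "&(?!(?:amp|lt|gt|quot|apos);|#[0-9]+;|#x[0-9A-Fa-f]+;)");

    // Returns the input with every unprotected "&" rewritten as "&amp;".
    static String escapeUndeclared(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String raw = "<p>&alpha; &lt; &#160; AT&T</p>";
        // Prints: <p>&amp;alpha; &lt; &#160; AT&amp;T</p>
        System.out.println(escapeUndeclared(raw));
    }
}
```

After this step the parser sees "&amp;alpha;" and reports the text as the literal characters "&alpha;", which is what the question asks for. Note that a plain regex pass like this does not know about CDATA sections or comments, where a bare "&" is legal, so input that uses them would need extra handling.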