I am having trouble parsing XHTML that contains a DOCTYPE declaration with a DOM parser.
Error: java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd%20
Declaration: DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
Is there a way to parse the XHTML into a Document object while ignoring the DOCTYPE declaration?
4 solutions
#1
4
A solution that works for me is to give the DocumentBuilder a fake EntityResolver that returns an empty stream. There is a good explanation here (see the last message from kdgregory):
http://forums.sun.com/thread.jspa?threadID=5362097
Here is kdgregory's solution:
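import java.io.IOException;
import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
(...)
// Answer every external entity lookup (including the DTD) with an empty stream
documentBuilder.setEntityResolver(new EntityResolver()
{
    public InputSource resolveEntity(String publicId, String systemId)
        throws SAXException, IOException
    {
        return new InputSource(new StringReader(""));
    }
});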
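For anyone who wants to try the empty-resolver approach end to end, here is a minimal sketch. The class name, the `parse` helper and the sample XHTML string are made up for illustration; only the resolver idea comes from the answer above. One caveat I am fairly sure of: with an empty DTD, named entities such as `&nbsp;` are no longer declared, so documents that use them may not parse cleanly.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EmptyResolverExample {
    // Hypothetical sample input carrying the DOCTYPE from the question
    static final String XHTML =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
        + "<head><title>t</title></head><body><p>hello</p></body></html>";

    // Parses the given XML text without fetching any external entity
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Every external entity, including the DTD, resolves to an empty stream
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XHTML);
        // Print the root element name to show that parsing succeeded
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

The lambda is equivalent to the anonymous EntityResolver class in the answer; use whichever form your Java version supports.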
#2
1
By default the parser will try to download the DTD, but you may be able to get around that by setting the standalone attribute in the <?xml ... ?> declaration.
Note, however, that this particular error is most likely triggered by confusion between XML Schema definitions and DTD URLs. See http://www.w3schools.com/xhtml/xhtml_dtd.asp for details. The right one is:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
#3
1
The easiest thing to do is to call setValidating(false) on your DocumentBuilderFactory. If you do want validation, download the DTD and use a local copy. As Rachel commented above, this is discussed by the World Wide Web Consortium (W3C).
In short, because the default DocumentBuilderFactory downloads the DTD every time it validates, the W3C servers were getting hit every time a typical programmer tried to parse an XHTML file in Java. They can't afford that much traffic, so they respond with an error.
#4
0
Instead of using a fake resolver, the following code snippet instructs the parser to really ignore the external DTD named in the DOCTYPE declaration:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
(...)
DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
f.setValidating(false);
// Xerces-specific setting: do not fetch the external DTD named in the DOCTYPE
f.setAttribute("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
DocumentBuilder builder = f.newDocumentBuilder();
Document document = builder.parse( ... );
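A self-contained variant of the same idea, in case it helps. This sketch uses `setFeature` instead of `setAttribute`, which is my substitution and not what the answer wrote. It assumes a Xerces-based parser such as the JDK default; other parsers may reject the feature name with a ParserConfigurationException. The class name, `parse` helper and sample input are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NoExternalDtdExample {
    // Xerces feature name taken from the answer above
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Hypothetical sample input carrying the DOCTYPE from the question
    static final String XHTML =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
        + "<head><title>t</title></head><body><p>hello</p></body></html>";

    // Parses the given XML text with external DTD loading switched off
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        // Assumption: the underlying parser recognizes this Xerces feature
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XHTML);
        // Print the root element name to show that parsing succeeded
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

As with the empty resolver, entities declared only in the external DTD are unavailable to the parser once loading is disabled.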