We hit an issue where the XML validator bundled with the Java JRE became very slow in JRE 1.6u24, and the slowdown is still present in the most recent update.
Validating 1000 XML documents takes us:
~1.4 seconds for versions <= 1.6u23; ~15.2 seconds for versions >= 1.6u24
That is roughly 10 times slower! I searched to see whether anyone had already reported the issue, but I couldn't find anything.
We can of course work around it by using another library such as Woodstox, but we would prefer the pure JRE, and I can't believe this could have been there for such a long time without any improvement being made...
The code we test looks like this:
public static void main(String[] args) throws XMLStreamException, SAXException, IOException, ParserConfigurationException {
    // Compile both schemas once, outside the timed loop.
    SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    Schema schema = factory.newSchema(new Source[] {
            new StreamSource(new File("schema1.xsd")),
            new StreamSource(new File("schema2.xsd")) });
    Validator validator = schema.newValidator();
    XMLInputFactory staxFactory = XMLInputFactory.newInstance();
    // FileUtils comes from Apache Commons IO.
    String xml = FileUtils.readFileToString(new File("to_validate.xml"), "UTF-8");
    Date start = new Date();
    // Timed section: parse and validate the same document 1000 times.
    for (int i = 0; i < 1000; i++) {
        XMLStreamReader xmlr = staxFactory.createXMLStreamReader(new StringReader(xml));
        StAXSource ss = new StAXSource(xmlr);
        validator.validate(ss);
    }
    Date end = new Date();
    System.out.println("seconds needed: " + (end.getTime() - start.getTime()) / 1000f);
}
1 Solution
#1
I don't think Woodstox includes an XSD validator; if you want an alternative, you could try the Apache version of Xerces (which is usually far better than the version bundled with the JDK), or Saxon-EE.
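If you want to try the Apache Xerces suggestion without changing the rest of the benchmark, one option is to ask JAXP for that factory by class name. The sketch below is only an illustration under some assumptions: the class `SchemaFactorySelection` and the method `preferApacheXerces` are made-up names, and it assumes the Apache Xerces-J jar (whose schema factory class is `org.apache.xerces.jaxp.validation.XMLSchemaFactory`) may or may not be on the classpath, falling back to the JDK default when it is not.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

// Hypothetical helper, not part of the original question's code.
class SchemaFactorySelection {

    // Schema factory class shipped in the Apache Xerces-J jar.
    static final String APACHE_XERCES_FACTORY =
            "org.apache.xerces.jaxp.validation.XMLSchemaFactory";

    /** Returns the Apache Xerces factory if it can be loaded, else the JDK default. */
    static SchemaFactory preferApacheXerces() {
        try {
            return SchemaFactory.newInstance(
                    XMLConstants.W3C_XML_SCHEMA_NS_URI,
                    APACHE_XERCES_FACTORY,
                    SchemaFactorySelection.class.getClassLoader());
        } catch (IllegalArgumentException e) {
            // Thrown when the named class cannot be loaded or instantiated,
            // for example because the Xerces jar is not on the classpath.
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        }
    }

    public static void main(String[] args) {
        SchemaFactory factory = preferApacheXerces();
        // Printing the class name shows which implementation is actually in use.
        System.out.println("schema factory in use: " + factory.getClass().getName());
    }
}
```

Printing the implementation class is a cheap way to confirm which validator each run is really measuring before comparing timings.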
The most likely reason for a slowdown like this is that you are fetching DTDs from the W3C website. I don't know why that would change between JDK versions, but W3C's behavior certainly changed about a year ago, when it made a policy decision to throttle requests for common DTDs by deliberately delaying the response. If this is the problem, the solution is to use a catalog to redirect access to a local copy; alternatively, Saxon 9.4 has copies of the most common DTDs built in.
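A catalog is a configuration file, but the same redirection can be sketched in code. Since the benchmark parses with StAX, one possible approach is to install an `XMLResolver` on the `XMLInputFactory` that serves local copies and records what the parser asks for, which also tells you whether external DTDs are being requested at all. Everything named here is an assumption for illustration: the class `LocalDtdResolverDemo`, the resolver `LocalCopyResolver`, the placeholder URL `http://example.invalid/sample.dtd`, and the one-line stub DTD. A real setup would load the actual DTD files from disk.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLResolver;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of the original question's code.
class LocalDtdResolverDemo {

    // Placeholder system ID; a real setup would list the DTDs its documents reference.
    static final String DTD_URL = "http://example.invalid/sample.dtd";

    /** Records every external entity the parser asks for and serves local copies. */
    static final class LocalCopyResolver implements XMLResolver {
        final Map<String, byte[]> localCopies = new HashMap<String, byte[]>();
        final List<String> requested = new ArrayList<String>();

        @Override
        public Object resolveEntity(String publicId, String systemId,
                                    String baseUri, String namespace) {
            requested.add(systemId);
            byte[] copy = localCopies.get(systemId);
            // Unknown entities get an empty stream, so this resolver never
            // hands a request back to the parser's default (network) lookup.
            return new ByteArrayInputStream(copy != null ? copy : new byte[0]);
        }
    }

    /** Parses the document to the end and returns the system IDs the parser requested. */
    static List<String> parseWithLocalCopies(String xml) throws XMLStreamException {
        LocalCopyResolver resolver = new LocalCopyResolver();
        // Stub DTD content standing in for a local copy of the real file.
        resolver.localCopies.put(DTD_URL,
                "<!ELEMENT root (#PCDATA)>".getBytes(Charset.forName("UTF-8")));

        XMLInputFactory staxFactory = XMLInputFactory.newInstance();
        staxFactory.setXMLResolver(resolver);

        XMLStreamReader reader = staxFactory.createXMLStreamReader(new StringReader(xml));
        while (reader.hasNext()) {
            reader.next();
        }
        reader.close();
        return resolver.requested;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<!DOCTYPE root SYSTEM \"" + DTD_URL + "\"><root>hello</root>";
        System.out.println("external entities requested: " + parseWithLocalCopies(xml));
    }
}
```

If the recorded list stays empty for your own documents, remote DTD fetching is probably not what slows the loop down, and the regression lies elsewhere. Schema loading has a separate hook, `SchemaFactory.setResourceResolver`, but in the benchmark above that step runs before the timer starts.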