How to use Saxon's built-in catalog feature

Date: 2022-06-02 15:34:10

I downloaded SaxonHE9-4-0-6J and want to process XHTML on the command line. However, Saxon tries to load the DTD from the W3C site, which makes even a simple command take far too long.

I have an XML catalog, which I use successfully with xmllint by setting an environment variable that points to the catalog file, but I have no idea how to make Saxon use it. Googling turns up a whole history of changes (and thus confusion) regarding the use of catalogs with Saxon, and none of the results helped.

I downloaded resolver.jar and added it to my CLASSPATH, but I can't make Saxon use it. After trying various combinations, I followed http://www.saxonica.com/documentation/sourcedocs/xml-catalogs.xml and used just the -catalog option, like this:

-catalog:path-to-my-catalog

(I tried both URIs and regular paths), without setting the -r, -x, or -y switches, but Saxon doesn't see it. I get this error:

Query processing failed: Failed to load Apache catalog resolver library

resolver.jar is on my classpath, and I can run it from the command line:

C:\temp>java org.apache.xml.resolver.apps.resolver
Usage: resolver [options] keyword

Where:

-c catalogfile  Loads a particular catalog file.
-n name         Sets the name.
-p publicId     Sets the public identifier.
-s systemId     Sets the system identifier.
-a              Makes the system URI absolute before resolution
-u uri          Sets the URI.
-d integer      Set the debug level.
keyword         Identifies the type of resolution to perform:
                doctype, document, entity, notation, public, system,
                or uri.

On the other hand, the Saxon archive itself already includes the XHTML DTD and various other DTDs, so there must be a simple way out of this frustration.

How do I use Saxon on the command line and instruct it to use local DTDs?

2 Answers

#1


5  

From the saxonica link in your question:

When the -catalog option is used on the command line, this overrides the internal resolver used in Saxon (from 9.4) to redirect well-known W3C references (such as the XHTML DTD) to Saxon's local copies of these resources. Because both these features rely on setting the XML parser's EntityResolver, it is not possible to use them in conjunction.

This sounds to me like Saxon automatically uses local copies of the well-known W3C DTDs, but if you specify -catalog, it does not use the internal resolver and you have to specify these explicitly in your catalog.
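
To illustrate why the two mechanisms cannot be combined, here is a minimal Java sketch of the underlying idea. It uses only the JDK's built-in parser rather than Saxon itself. An EntityResolver intercepts the DOCTYPE lookup and returns a local copy of the DTD, and a parser holds only one such resolver at a time, so whichever is set last wins. The class name, the in-memory DTD text, and the base URI are assumptions made for this illustration; this is not Saxon's actual implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class LocalDtdResolverSketch {
    // Hypothetical "local copy" of the DTD, kept in memory for the demo.
    static final String TEST_PUBLIC_ID = "-//TEST//Dan Test//EN";
    static final String TEST_DTD =
        "<!ELEMENT test (foo)>\n<!ELEMENT foo (#PCDATA)>\n";

    // Maps a known public ID to the local DTD; returning null tells the
    // parser to fall back to its default behaviour (fetch the system ID).
    static final EntityResolver RESOLVER = (publicId, systemId) -> {
        if (!TEST_PUBLIC_ID.equals(publicId)) {
            return null;
        }
        InputSource local = new InputSource(new StringReader(TEST_DTD));
        local.setPublicId(publicId);
        local.setSystemId(systemId);
        return local;
    };

    // Parses the document with the resolver installed and returns the
    // text of the first <foo> element.
    static String parseFoo(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(RESOLVER); // only one resolver per parser
        InputSource doc = new InputSource(new StringReader(xml));
        doc.setSystemId("file:///C:/so_test/test.xml"); // assumed base URI
        Document dom = builder.parse(doc);
        return dom.getElementsByTagName("foo").item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<!DOCTYPE test PUBLIC \"-//TEST//Dan Test//EN\" "
            + "\"dir_that_doesnt_exist/test.dtd\">\n"
            + "<test><foo>Success!</foo></test>";
        System.out.println(parseFoo(xml));
    }
}
```

Because the resolver answers for the public identifier, the non-existent system identifier is never fetched. A catalog resolver installed in the same slot would replace this one, which is the conflict the documentation describes.";
assert LocalDtdResolverSketch.parseFoo(xml).equals("Success!");
</test>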


Here's a working example of using a catalog with Saxon...

File/directory structure of my example

C:/so_test/lib
C:/so_test/lib/catalog.xml
C:/so_test/lib/resolver.jar
C:/so_test/lib/saxon9he.jar
C:/so_test/lib/test.dtd
C:/so_test/test.xml

XML DTD (so_test/lib/test.dtd)

<!ELEMENT test (foo)>
<!ELEMENT foo (#PCDATA)>

XML Instance (so_test/test.xml)

Note that the system identifier points to a location that doesn't exist to make sure the catalog is being used.

<!DOCTYPE test PUBLIC "-//TEST//Dan Test//EN" "dir_that_doesnt_exist/test.dtd">
<test>
    <foo>Success!</foo>
</test>

XML Catalog (so_test/lib/catalog.xml)

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <group prefer="public" xml:base="file:///C:/so_test/lib">
        <public publicId="-//TEST//Dan Test//EN" uri="lib/test.dtd"/>
    </group>
</catalog>

Command Line

Note the -dtd option to enable validation.

C:\so_test>java -cp lib/saxon9he.jar;lib/resolver.jar net.sf.saxon.Query -s:"test.xml" -qs:"<results>{data(/test/foo)}</results>" -catalog:"lib/catalog.xml" -dtd

Results

<results>Success!</results>

If I make the XML instance invalid:

<!DOCTYPE test PUBLIC "-//TEST//Dan Test//EN" "dir_that_doesnt_exist/test.dtd">
<test>
    <x/>
    <foo>Success!</foo>
</test>

and run the same command line as above, here is the result:

Recoverable error on line 4 column 6 of test.xml:
  SXXP0003: Error reported by XML parser: Element type "x" must be declared.
Recoverable error on line 6 column 8 of test.xml:
  SXXP0003: Error reported by XML parser: The content of element type "test" must match "(foo)".
Query processing failed: The XML parser reported two validation errors

Hopefully this example will help you figure out what to change with your setup.

Also, using the -t option gives you additional information, such as which catalog was loaded and whether the public identifier was resolved:

Loading catalog: file:///C:/so_test/lib/catalog.xml
Saxon-HE 9.4.0.6J from Saxonica
Java version 1.6.0_35
Analyzing query from {<results>{data(/test/foo)}</results>}
Analysis time: 122.70132 milliseconds
Processing file:/C:/so_test/test.xml
Using parser org.apache.xml.resolver.tools.ResolvingXMLReader
Building tree for file:/C:/so_test/test.xml using class net.sf.saxon.tree.tiny.TinyBuilder
Resolved public: -//TEST//Dan Test//EN
        file:/C:/so_test/lib/test.dtd
Tree built in 0 milliseconds
Tree size: 5 nodes, 8 characters, 0 attributes
<?xml version="1.0" encoding="UTF-8"?><results>Success!</results>Execution time: 19.482079ms
Memory used: 20648808

Additional Information

Saxon distributes the Apache version of Xerces, so use the resolver.jar found in the Apache Xerces distribution.

#2


0  

Daniel Haley has answered better than I could about how to use an explicit catalog with Saxon.

As for using built-in copies of the well-known DTDs, Saxon 9.4 will indeed do this automatically by default if it recognizes the system ID or public ID of the required resource. If it's going to the W3C site, the first thing we need to discover is the precise form of the DOCTYPE you are using.

The error message about failure to load the Apache catalog resolver actually means that Saxon has been unable to load the class org.apache.xml.resolver.CatalogManager. I wonder if you're using a version of the resolver that doesn't include this class? I can't think of any other explanation.
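
One way to test this diagnosis is to ask the JVM directly whether it can see the class. The following is a small sketch; the class name ResolverClassCheck and the helper isLoadable are made up for illustration. Run it with exactly the same classpath arrangement you use when launching Saxon.

```java
public class ResolverClassCheck {
    // Returns true if the named class can be found on the current classpath.
    // The class is located but not initialised.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false,
                          ResolverClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.xml.resolver.CatalogManager";
        System.out.println(name + " loadable: " + isLoadable(name));
        // Show what the JVM actually received as its classpath.
        System.out.println("java.class.path = "
            + System.getProperty("java.class.path"));
    }
}
```

Keep in mind that when Java is started with -jar, the CLASSPATH environment variable and the -cp option are ignored, so a jar that is visible to a plain "java SomeClass" invocation may still be invisible to a Saxon run launched that way.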

