I'm writing a Clojure library to parse Mac OS X's XML-based property list files. The code works fine unless you give it a large input file, at which point you get java.lang.OutOfMemoryError: Java heap space.
Here's an example input file (small enough to work fine):
<plist version="1.0">
  <dict>
    <key>Integer example</key>
    <integer>5</integer>
    <key>Array example</key>
    <array>
      <integer>2</integer>
      <real>3.14159</real>
    </array>
    <key>Dictionary example</key>
    <dict>
      <key>Number</key>
      <integer>8675309</integer>
    </dict>
  </dict>
</plist>
clojure.xml/parse turns this into:
{:tag :plist, :attrs {:version "1.0"}, :content [
  {:tag :dict, :attrs nil, :content [
    {:tag :key, :attrs nil, :content ["Integer example"]}
    {:tag :integer, :attrs nil, :content ["5"]}
    {:tag :key, :attrs nil, :content ["Array example"]}
    {:tag :array, :attrs nil, :content [
      {:tag :integer, :attrs nil, :content ["2"]}
      {:tag :real, :attrs nil, :content ["3.14159"]}
    ]}
    {:tag :key, :attrs nil, :content ["Dictionary example"]}
    {:tag :dict, :attrs nil, :content [
      {:tag :key, :attrs nil, :content ["Number"]}
      {:tag :integer, :attrs nil, :content ["8675309"]}
    ]}
  ]}
]}
My code turns this into the Clojure data structure:
{"Dictionary example" {"Number" 8675309},
"Array example" [2 3.14159],
"Integer example" 5}
The relevant part of my code looks like this:
(require 'clojure.xml)
; extract the content contained within e.g. <integer>...</integer>
(defn- first-content
  [c]
  (first (c :content)))
; return a parsed version of the given tag
(defmulti content (fn [c] (c :tag)))
(defmethod content :array
  [c]
  (apply vector (for [item (c :content)] (content item))))
(defmethod content :dict
  [c]
  (apply hash-map (for [item (c :content)] (content item))))
(defmethod content :integer
  [c]
  (Long/parseLong (first-content c)))
(defmethod content :key
  [c]
  (first-content c))
(defmethod content :real
  [c]
  (Double/parseDouble (first-content c)))
; take a java.io.File (or similar) and return the parsed version;
; the <plist> element's first child is the top-level value
(defn parse-plist
  [source]
  (content (first-content (clojure.xml/parse source))))
The meat of the code is the content function, a multimethod that dispatches on the :tag (the name of the XML tag). I'm wondering whether there is something different I should be doing in order to make this recursion work better. I tried replacing all three calls to content with trampoline content, but that didn't work. Is there anything fancy I should do to get this mutual recursion to work more efficiently? Or am I taking a fundamentally wrong approach?
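A minimal sketch of why that swap is a no-op: trampoline only keeps bouncing while the function returns another function (a thunk). Every content method returns a finished value (a number, string, vector or map), so (trampoline content node) makes one ordinary call and returns its result, and the nested calls inside the :array and :dict methods still use the JVM stack as before. The cut-down multimethod and the node value below are hypothetical stand-ins for the full code above.

```clojure
;; Hypothetical, cut-down version of the multimethod above:
;; only the :integer case is kept.
(defmulti content :tag)

(defmethod content :integer
  [c]
  (Long/parseLong (first (:content c))))

;; Hypothetical example node, shaped like clojure.xml/parse output.
(def node {:tag :integer, :attrs nil, :content ["5"]})

;; trampoline calls (content node); the result 5 is not a fn,
;; so it is returned immediately, exactly like a plain call.
(= (content node) (trampoline content node))
```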
Edit: By the way, this code is available on GitHub, in which form it might be easier to play around with.
#1
You have multiple (one per child) recursive calls from a single method, so your code isn't tail-recursive (and can't be without a heavy reorganization). trampoline is intended for mutually tail-recursive functions.
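For contrast, a sketch of the pattern trampoline is designed for: two functions that call each other only in tail position. Each returns a thunk (#(...)) instead of calling the other directly, and trampoline invokes those thunks in a loop, so the JVM stack does not grow. The names my-even? and my-odd? are illustrative and not part of the question's code.

```clojure
;; Illustrative mutually tail-recursive pair (hypothetical names).
(declare my-odd?)

(defn my-even?
  "True if the non-negative integer n is even. Instead of calling my-odd?
  directly, returns a thunk for trampoline to invoke."
  [n]
  (if (zero? n)
    true
    #(my-odd? (dec n))))

(defn my-odd?
  "True if the non-negative integer n is odd; mirror image of my-even?."
  [n]
  (if (zero? n)
    false
    #(my-even? (dec n))))

;; One million bounces without a StackOverflowError.
(trampoline my-even? 1000000)
```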
How deep and how long is your large XML file? I'm asking because you are getting an OutOfMemoryError (OoM), not a StackOverflowError (SO).
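One way to answer that is to measure the parsed tree. The hypothetical helper xml-stats below (not part of the question's library) counts the elements in a clojure.xml tree with tree-seq and computes the deepest element nesting, counting the root as depth 1. A large element count with a small depth would suggest the problem is the total size of the tree rather than recursion depth. Note that it still needs the whole file parsed into memory, so it can only be run on inputs that do load.

```clojure
(defn xml-stats
  "Hypothetical diagnostic: given a clojure.xml element map, return the
  total number of elements and the deepest element nesting (root = 1)."
  [root]
  (letfn [(depth [node]
            ;; text content (strings) adds no depth
            (if (map? node)
              (inc (reduce max 0 (map depth (:content node))))
              0))]
    {:elements  (count (filter map? (tree-seq map? :content root)))
     :max-depth (depth root)}))
```

For example, (xml-stats (clojure.xml/parse (java.io.File. "big.plist"))), where "big.plist" is a placeholder path. For the small example file in the question, it should return {:elements 12, :max-depth 4}.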
Anyway, to solve your recursion problem (which is unlikely to be what is causing the exception), you have to walk down your XML data structure (e.g. with xml-zip) while maintaining a stack (vector or list) representing your result tree under construction. It's ironic that the traversal of the XML data structure is somewhat equivalent to the SAX events that were used to build the structure.
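A minimal sketch of that idea, assuming the same clojure.xml element maps and only the five tags the question handles (:array, :dict, :integer, :key, :real). It swaps the zipper for a plain loop/recur over an explicit stack of frames, each holding a container's tag, its children still to visit, and the values converted so far. The names plist-value, leaf-value and finish-container are hypothetical. This only removes the dependence on JVM call-stack depth; the input tree and the result still live in the heap, so it does not by itself fix an OutOfMemoryError.

```clojure
;; Hypothetical helpers for an explicit-stack plist converter.
(defn- leaf-value
  "Convert a leaf element (<integer>, <real> or <key>) to a Clojure value."
  [{:keys [tag content]}]
  (let [text (first content)]
    (case tag
      :integer (Long/parseLong text)
      :real    (Double/parseDouble text)
      :key     text)))

(defn- finish-container
  "Build an <array> or <dict> value from its already-converted children."
  [tag children]
  (case tag
    :array (vec children)
    :dict  (apply hash-map children)))

(def ^:private container? #{:array :dict})

(defn plist-value
  "Convert a clojure.xml element into plist data without recursive calls.
  The stack is a list of [tag remaining-children converted-children] frames."
  [root]
  (if-not (container? (:tag root))
    (leaf-value root)
    (loop [stack (list [(:tag root) (:content root) []])]
      (let [[tag remaining done] (peek stack)
            below (pop stack)]
        (if (seq remaining)
          (let [child  (first remaining)
                parent [tag (rest remaining) done]]
            (if (container? (:tag child))
              ;; descend: keep the parent frame and push a fresh one for the child
              (recur (conj below parent [(:tag child) (:content child) []]))
              ;; leaf: convert it now and append it to the parent's results
              (recur (conj below (update parent 2 conj (leaf-value child))))))
          ;; no children left: finish this container and hand it to its parent
          (let [value (finish-container tag done)]
            (if (empty? below)
              value
              (recur (conj (pop below) (update (peek below) 2 conj value))))))))))
```

Called as (plist-value (first (:content (clojure.xml/parse source)))), it should produce the same map as the question's parse-plist for the example file.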
#2
Heavy recursion will cause a StackOverflowError, not an OutOfMemoryError. Also, the recursion does not seem to be very deep here (only 3 levels, as per the XML file in your example).
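A small sketch that makes the difference visible: a deliberately non-tail-recursive function uses one JVM stack frame per level, so a deep enough call fails with java.lang.StackOverflowError (an Error, not an Exception) long before heap space becomes an issue. The functions nest-depth and try-depth are hypothetical, and the exact depth at which the overflow happens depends on the thread's stack size (-Xss).

```clojure
;; Hypothetical demo functions, not from the question.
(defn nest-depth
  "Count down from n without tail calls: each level waits for the next,
  so every level keeps a JVM stack frame alive."
  [n]
  (if (zero? n)
    0
    (inc (nest-depth (dec n)))))

(defn try-depth
  "Return (nest-depth n), or :stack-overflow if the call stack runs out."
  [n]
  (try
    (nest-depth n)
    (catch StackOverflowError _
      :stack-overflow)))

(try-depth 3)         ;=> 3
(try-depth 10000000)  ; expected :stack-overflow under default JVM stack sizes
```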
My guess is that the OutOfMemoryError is being thrown because the data structures your large XML files are being parsed into are too large to fit in the JVM heap. You can try increasing the heap size using the -Xms and -Xmx options. However, the correct way to parse huge XML files is to use SAX events rather than building a tree (DOM or Clojure data structure).
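A minimal sketch of the SAX approach using the JDK's javax.xml.parsers.SAXParserFactory and a proxy of org.xml.sax.helpers.DefaultHandler. As an illustration only, it counts elements per tag name instead of building plist data: the parser calls startElement as it reads, and only the running counts are kept in memory, never a tree. The function name count-tags is hypothetical; a real streaming plist reader would need to maintain its own stack of partially built arrays and dicts inside the handler, and it only saves memory if you don't need the entire result held at once.

```clojure
(import '(javax.xml.parsers SAXParserFactory)
        '(org.xml.sax.helpers DefaultHandler)
        '(java.io InputStream))

(defn count-tags
  "Hypothetical example: stream an XML document with SAX and return a map
  of tag keyword -> number of occurrences. No element tree is built."
  [^InputStream in]
  (let [counts  (atom {})
        handler (proxy [DefaultHandler] []
                  ;; called once per opening tag; qname is the tag name
                  (startElement [uri local-name qname attrs]
                    (swap! counts update (keyword qname) (fnil inc 0))))]
    (.parse (.newSAXParser (SAXParserFactory/newInstance)) in handler)
    @counts))
```

For a file on disk: (with-open [in (java.io.FileInputStream. "big.plist")] (count-tags in)), where "big.plist" is a placeholder path.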