I have to read an XML file of about 4,000 lines on Android. First I tried the SimpleXML library because it's the easiest, and it took about 2 minutes on my HTC Desire. I thought SimpleXML might be slow because of reflection and all the other magic the library uses, so I rewrote my parser using the built-in DOM parsing method, with special attention to performance. That helped a bit, but it still took about 60 seconds, which is still totally unacceptable. After a bit of research I found this article on developer.com. It has some graphs showing that the other two available methods, the SAX parser and Android's XML Pull-Parser, are equally slow. At the end of the article you'll find the following statement:
The first surprise I had was at how slow all three methods were. Users don't want to wait long for results on mobile phones, so parsing anything more than a few dozen records may mandate a different method.
What might be a "different method"? What to do if you have more than "a few dozen records"?
8 Answers
#1
33
Original answer, in 2012
(note: make sure you read the 2016 update below!)
I just did some perf testing comparing parsers on Android (and other platforms). The XML file being parsed is only 500 lines or so (it's a Twitter search Atom feed), but Pull and DOM parsing can only churn through about 5 such documents a second on a Samsung Galaxy S2 or Motorola Xoom2. SimpleXML (pink in the chart), as used by the OP, ties for slowest with DOM parsing.
SAX Parsing is an order of magnitude faster on both of my Android devices, managing 40 docs/sec single-threaded, and 65+/sec multi-threaded.
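As a rough illustration of the SAX approach mentioned above, here is a minimal sketch that collects the text of every `title` element from a document. The class name `SaxTitleCollector` and the choice of `title` as the target tag are assumptions made for this example; it is not the benchmark code referred to in this answer.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: streams through the XML once and keeps only <title> text.
class SaxTitleCollector {
    static List<String> titles(byte[] xml) {
        final List<String> titles = new ArrayList<>();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(xml), new DefaultHandler() {
                private final StringBuilder text = new StringBuilder();
                private boolean inTitle;

                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    // The factory is not namespace-aware here, so match on qName.
                    if ("title".equals(qName)) {
                        inTitle = true;
                        text.setLength(0);
                    }
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    // May be called several times per element, so accumulate.
                    if (inTitle) {
                        text.append(ch, start, length);
                    }
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    if ("title".equals(qName)) {
                        titles.add(text.toString().trim());
                        inTitle = false;
                    }
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return titles;
    }
}
```

Because the handler keeps only the fields it cares about, no tree is built in memory, which is the usual reason SAX is cheaper than DOM on constrained devices.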
[Chart: benchmark results on Android 2.3.4]
The code is available on GitHub, and there is a discussion here.
Update 18th March 2016
OK, so it's been almost 4 years and the world has moved on. I finally got around to re-running the tests on:
- A Samsung Galaxy S3 running Android 4.1.2
- A Nexus 7 (2012) running Android 4.4.4
- A Nexus 5 running Android 6.0.1
Somewhere between Android 4.4.4 and Android 6.0.1 the situation changed drastically and we have a new winner: Pull Parsing FTW at more than twice the throughput of SAX. Unfortunately I don't know exactly when this change arrived as I don't have any devices running Android > 4.4.4 and < 6.0.1.
[Charts: benchmark results on Android 4.1.2, Android 4.4.4, and Android 6.0.1]
#2
5
I think the best way to work with XML on Android is to use the VTD-XML library.
My XML file contains more than 60,000 lines, and VTD-XML handles it as follows:
Nexus 5: 2055 ms
Galaxy Note 4: 2498 ms
You can find more benchmark reports here: VTD-XML Benchmark
Short example of XML file
<database name="products">
    <table name="category">
        <column name="catId">20</column>
        <column name="catName">Fruit</column>
    </table>
    <table name="category">
        <column name="catId">31</column>
        <column name="catName">Vegetables</column>
    </table>
    <table name="category">
        <column name="catId">45</column>
        <column name="catName">Rice</column>
    </table>
    <table name="category">
        <column name="catId">50</column>
        <column name="catName">Potatoes</column>
    </table>
</database>
Configuration of "build.gradle" file
dependencies {
    compile files('libs/vtd-xml.jar')
}
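Source code example:
import android.util.Log;
import com.ximpleware.AutoPilot;
import com.ximpleware.VTDGen;
import com.ximpleware.VTDNav;

// Fragment: place inside a method that handles the VTD-XML checked exceptions.
final String TAG = "VtdExample"; // Log.d takes a tag and a message
String fileName = "products.xml";
VTDGen vg = new VTDGen();
if (vg.parseFile(fileName, true)) { // true = namespace-aware
    VTDNav vn = vg.getNav();
    AutoPilot table = new AutoPilot(vn);
    table.selectXPath("/database/table");
    // XPath results are stepped through with evalXPath(); -1 means no more nodes.
    while (table.evalXPath() != -1) {
        String tableName = vn.toString(vn.getAttrVal("name"));
        if (tableName.equals("category")) {
            vn.push(); // save the cursor so the XPath iteration can resume
            AutoPilot column = new AutoPilot(vn);
            column.selectElement("column");
            while (column.iterate()) {
                String text = vn.toNormalizedString(vn.getText());
                String name = vn.toString(vn.getAttrVal("name"));
                if (name.equals("catId")) {
                    Log.d(TAG, "Category ID = " + text);
                } else if (name.equals("catName")) {
                    Log.d(TAG, "Category Name = " + text);
                }
            }
            vn.pop(); // restore the cursor to the current <table>
        }
    }
}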
Result
Category ID = 20
Category Name = Fruit
Category ID = 31
Category Name = Vegetables
Category ID = 45
Category Name = Rice
Category ID = 50
Category Name = Potatoes
It works for me, and I hope it helps you.
#3
0
Using the SAX parser, I can parse a 15,000-line XML file in around 10 seconds on my HTC Desire. I suspect there is some other issue involved.
Are you populating a database from the XML? If so, are you remembering to wrap your entire parse operation in a DB transaction? That alone can speed things up by an order of magnitude.
#4
0
If you are parsing Dates within your XML, that can significantly slow down your parsing. With more recent versions of Android this becomes less of a problem (as they optimised the loading of timezone info).
If you have Dates that are being parsed and you don't need them, then you could use a SAX parser and ignore any of the Date elements.
Or if you can change your XML schema, consider storing the Dates as integers rather than formatted strings.
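To make the difference concrete, here is a small sketch comparing the two representations. The class name `DateFieldParsing`, the ISO-style pattern, and the use of epoch milliseconds as the integer form are assumptions for illustration; the original question does not say which date format it uses.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper contrasting formatted-string dates with integer dates.
class DateFieldParsing {
    // Formatted string: needs a SimpleDateFormat, a pattern, and time zone data.
    static long fromFormatted(String value) {
        SimpleDateFormat format =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.US);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return format.parse(value).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad date: " + value, e);
        }
    }

    // Integer (assumed here to be epoch milliseconds): a single numeric parse.
    static long fromEpochMillis(String value) {
        return Long.parseLong(value.trim());
    }
}
```

If formatted dates cannot be avoided, creating the formatter once and reusing it for the whole parse (from a single thread, since `SimpleDateFormat` is not thread-safe) avoids constructing it for every element.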
You mentioned you are making string comparisons; this can be pretty expensive as well. Perhaps consider using a HashMap for the strings you are comparing, as this can give noticeable performance benefits.
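One way to read the HashMap suggestion is to replace a chain of `equals` calls with a single lookup that maps each known name to an enum constant. The sketch below assumes the field names `catId` and `catName` purely as examples, and the class name `TagLookup` is hypothetical. Whether it beats a short `if`/`else` chain depends on how many names you compare against, so it is worth measuring.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup table: one hash lookup instead of many equals() calls.
class TagLookup {
    enum Field { CAT_ID, CAT_NAME, UNKNOWN }

    private static final Map<String, Field> FIELDS = new HashMap<>();
    static {
        FIELDS.put("catId", Field.CAT_ID);
        FIELDS.put("catName", Field.CAT_NAME);
    }

    // Returns UNKNOWN for names that are not in the table.
    static Field classify(String name) {
        Field field = FIELDS.get(name);
        return field != null ? field : Field.UNKNOWN;
    }
}
```

The parser callback can then `switch` on the returned enum rather than comparing strings again.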
#5
0
It's very hard to tell you why your code is slow without seeing your code, and it's very hard to believe your assertion that the slowness is due to the XML parser when you haven't provided details of any measurements to prove this.
#6
0
We're using the pull parser very effectively for 1 MB XML files, and they are read in about 10-20 seconds on my Desire. So if your code is okay, the speed will be as well. It's obvious that DOM is very slow in a limited-memory environment, but pull or SAX really aren't.
#7
0
If you're parsing from a Socket, it's the I/O that's taking the time, not the parsing. Try consuming the data first, then parse once it is loaded and measure the performance. If the file is too big for that, consider a BufferedInputStream with a very large buffer; this should improve performance.
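A minimal sketch of that measurement, assuming a SAX parser with an empty handler as the "parse" step: the class name `LoadThenParse` and the 64 KB buffer size are illustrative choices, not recommendations from this answer.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper that times I/O and parsing separately.
class LoadThenParse {
    // Drains the stream into memory through a large buffer.
    static byte[] readFully(InputStream in) throws IOException {
        BufferedInputStream buffered = new BufferedInputStream(in, 64 * 1024);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = buffered.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // Returns { loadNanos, parseNanos } so the two costs can be compared.
    static long[] timeLoadAndParse(InputStream in) {
        try {
            long start = System.nanoTime();
            byte[] data = readFully(in);
            long loaded = System.nanoTime();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(data), new DefaultHandler());
            long parsed = System.nanoTime();
            return new long[] { loaded - start, parsed - loaded };
        } catch (Exception e) {
            throw new IllegalStateException("Load or parse failed", e);
        }
    }
}
```

If the first number dominates, the bottleneck is the network or storage rather than the XML parser.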
I very seriously doubt Simple XML takes 2 minutes to load 4,000 lines. I realise a handset is going to be a lot slower than a workstation, but I can load 200,000 lines of XML in 600 ms on my workstation.
#8
-1
Rather than making it a synchronous process, make it asynchronous. You can have a button that starts an IntentService which will process the data for you and will update the results and show a notification when it is done. That way you don't stop the UI thread.
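The same idea can be sketched in plain Java with an `ExecutorService` instead of an `IntentService`; this is a substitute technique chosen so the example runs outside Android, not the Android API named above. The names `BackgroundParse`, `Callback`, and `countRecords` are hypothetical, and `countRecords` is only a stand-in for real parsing work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: runs the parse on a worker thread and reports back.
class BackgroundParse {
    interface Callback {
        void onDone(int recordCount);
    }

    static Future<?> parseAsync(final String xml, final Callback callback) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> future = executor.submit(new Runnable() {
            @Override
            public void run() {
                // Runs off the calling thread, so the caller is not blocked.
                callback.onDone(countRecords(xml));
            }
        });
        executor.shutdown(); // the worker thread exits once the task finishes
        return future;
    }

    // Stand-in for real parsing: counts occurrences of "<table".
    static int countRecords(String xml) {
        int count = 0;
        int index = xml.indexOf("<table");
        while (index != -1) {
            count++;
            index = xml.indexOf("<table", index + 1);
        }
        return count;
    }
}
```

On Android the callback runs on the worker thread, so any UI update would still have to be posted back to the main thread. Note that this keeps the UI responsive but does not make the parse itself any faster.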