I was given the task of doing a quality check on a machine-translation XML file. The translations are from English into a foreign language. I have about 2000 translation blocks in the file, and I have to check 200 of them by adding my remarks to each block, enclosed in a <comment> tag with a quality attribute. Is there a Linux command or some text editor out there that can count the number of comment tags I add, or just the number of times the string '/comment' occurs, so I don't have to keep track manually?
5 solutions
#1
grep '/comment' yourfile.xml -o | wc -l
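A minimal sketch that wraps the grep -o | wc -l pipeline into a running progress report, assuming every remark is closed with a literal </comment> tag. The function name review_progress is hypothetical; the default target of 200 comes from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count </comment> tags in a file and report how many
# remain before reaching a target (default 200, as in the question).
review_progress() {
    local file=$1
    local target=${2:-200}
    local count
    # -o prints every match on its own line, so wc -l counts matches rather
    # than lines; tr strips the padding some wc implementations add.
    count=$(grep -o '</comment>' "$file" | wc -l | tr -d ' ')
    # Clamp "to go" at zero once the target has been reached.
    echo "$count of $target reviewed, $(( target > count ? target - count : 0 )) to go"
}
```

Running review_progress translations.xml (a hypothetical file name) after each editing session prints a single line of the form "N of 200 reviewed, M to go".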
#2
This XSLT stylesheet can be run on any platform and will tell you how many comment elements there are in the XML document:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8" omit-xml-declaration="yes"/>
<xsl:template match="/">
<xsl:value-of select="count(//comment)"/>
</xsl:template>
</xsl:stylesheet>
If you add an XSLT processing instruction near the top of the XML file (just after the XML declaration) that points to this stylesheet, e.g. <?xml-stylesheet href="countComments.xsl" type="text/xsl"?>, you can simply load the XML file in a browser and see the number displayed.
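If you would rather not open a browser, a command-line XSLT processor can apply the same stylesheet. A minimal sketch, assuming libxslt's xsltproc is installed and on PATH; the function name count_comment_elements is hypothetical, and the stylesheet is written to a temporary file so the snippet is self-contained.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count <comment> elements with xsltproc, using the
# same count(//comment) stylesheet as above. Assumes xsltproc is installed.
count_comment_elements() {
    local xml=$1
    local xsl
    xsl=$(mktemp)
    cat > "$xsl" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>
  <xsl:template match="/">
    <xsl:value-of select="count(//comment)"/>
  </xsl:template>
</xsl:stylesheet>
EOF
    # xsltproc STYLESHEET XMLFILE writes the transformation result to stdout.
    xsltproc "$xsl" "$xml"
    local status=$?
    rm -f "$xsl"
    return "$status"
}
```

Unlike the grep-based answers, this counts actual comment elements after parsing, so line layout does not matter, but the file must be well-formed XML.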
#3
If you know that the </comment> tag doesn't occur more than once per line, just use grep -c "</comment>" yourfile.xml. Example:
[~/.logs]> grep -c ldap johnf.2010-02-12.log
103
This searches for the string ldap in the file johnf.2010-02-12.log. The string appears on 103 distinct lines.
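The once-per-line caveat matters because grep -c counts matching lines, not matches. A small sketch on a made-up two-line sample (the sample content is hypothetical) shows how grep -c and grep -o | wc -l diverge when two comments share a line.

```shell
#!/usr/bin/env bash
# Hypothetical sample: the second line holds two </comment> tags.
sample='<tu><comment quality="good">ok</comment></tu>
<tu><comment quality="bad">a</comment><comment quality="good">b</comment></tu>'

# grep -c reports the number of lines that match at least once.
lines=$(printf '%s\n' "$sample" | grep -c '</comment>')

# grep -o prints each match separately, so wc -l counts every occurrence.
matches=$(printf '%s\n' "$sample" | grep -o '</comment>' | wc -l | tr -d ' ')

echo "grep -c: $lines, grep -o | wc -l: $matches"
# prints: grep -c: 2, grep -o | wc -l: 3
```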
#4
As long as the comments appear on their own line, you could try
grep -c '<comment' file
The -c stands for 'count'.
#5
Your tag says Linux, so I assume you have *nix tools like awk:
awk '{ c += gsub(/<\/comment>/, "") } END { print "total: " c+0 }' xmlfile
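Since each remark carries a quality attribute, the same tools can also break the total down by rating. A minimal sketch, assuming the attribute is written as quality="..." with double quotes inside the opening <comment> tag; the function name tally_quality is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count <comment> tags per quality value.
tally_quality() {
    # grep -o emits each '<comment ... quality="..."' match on its own line,
    # sed keeps only the attribute value, and sort | uniq -c tallies them.
    grep -o '<comment[^>]*quality="[^"]*"' "$1" \
        | sed 's/.*quality="\([^"]*\)"/\1/' \
        | sort | uniq -c
}
```

Each output line is a count followed by a quality value, so summing the counts gives the same total as the other answers.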