I am trying to use Arelle to read a zip file of an XBRL filing.
This is done by giving the command:
C:\a>python arelleCmdLine.py -f C:\Python33\sec\2010\03\0000002809-0001047469-10-002778-xbrl.zip
I am getting a UnicodeDecodeError:
C:\a>python arelleCmdLine.py -f C:\Python33\sec\2010\03\0000002809-0001047469-10-002778-xbrl.zip
[xmlSchema:syntax] Unrecoverable error: 'utf-8' codec can't decode byte 0x81 in position 11: invalid start byte, 0000002809-0001047469-10-002778-xbrl.zip, importing source element - 0000002809-0001047469-10-002778-xbrl.zip
Traceback (most recent call last):
  File "C:\a\arelle\ModelDocument.py", line 131, in load
    xmlDocument = etree.parse(file,parser=_parser,base_url=filepath)
  File "lxml.etree.pyx", line 3239, in lxml.etree.parse (src\lxml\lxml.etree.c:69970)
  File "parser.pxi", line 1770, in lxml.etree._parseDocument (src\lxml\lxml.etree.c:102272)
  File "parser.pxi", line 1790, in lxml.etree._parseFilelikeDocument (src\lxml\lxml.etree.c:102531)
  File "parser.pxi", line 1685, in lxml.etree._parseDocFromFilelike (src\lxml\lxml.etree.c:101457)
  File "parser.pxi", line 1134, in lxml.etree._BaseParser._parseDocFromFilelike (src\lxml\lxml.etree.c:97084)
  File "parser.pxi", line 582, in lxml.etree._ParserContext._handleParseResultDoc (src\lxml\lxml.etree.c:91290)
  File "parser.pxi", line 679, in lxml.etree._handleParseResult (src\lxml\lxml.etree.c:92441)
  File "lxml.etree.pyx", line 327, in lxml.etree._ExceptionContext._raise_if_stored (src\lxml\lxml.etree.c:10196)
  File "parser.pxi", line 373, in lxml.etree._FileReaderContext.copyToBuffer (src\lxml\lxml.etree.c:89098)
  File "C:\Python33\lib\codecs.py", line 301, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 11: invalid start byte
It has something to do with UTF-8 encoding and the character that byte represents, but I cannot figure out what I should do. I found some guides, but they didn't help me resolve the issue.
2 Answers
#1
0
The error occurred because the program expects to parse not the whole zip archive but a specific file inside it (in this case the instance document), which lies in a subdirectory of the archive.
To access a file inside the zip archive, append its path within the archive to the zip path:
For example, if the file inside the archive is 1.xml:
C:\a>python arelleCmdLine.py -f C:\Python33\sec\2010\03\0000002809-0001047469-10-002778-xbrl.zip\1.xml
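If you don't know the name of the file inside the archive, you can list the members first. This is only a minimal sketch using Python's standard zipfile module; the helper name list_xml_members is made up for illustration, and which .xml member is the actual instance document still has to be decided by looking at the names.

```python
import sys
import zipfile


def list_xml_members(zip_path):
    """Return the names of archive members ending in .xml (case-insensitive).

    Hypothetical helper: the returned names are the candidates that could be
    appended to the zip path when calling arelleCmdLine.py.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(".xml")]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_members.py path\to\filing.zip
    for member in list_xml_members(sys.argv[1]):
        print(member)
```

Each printed name is a path relative to the root of the archive, so a member in a subdirectory shows up with that subdirectory in front of it.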
Verdict:
The UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 was caused by passing the whole zip archive to the parser instead of a file inside it.
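To illustrate why the raw archive cannot be read as text: a zip file starts with a binary header rather than XML, and a byte such as 0x81 is not valid as the first byte of a UTF-8 sequence. The sketch below builds the first 14 bytes of a zip local file header by hand. The timestamp values are made up for the demonstration and are not taken from the actual filing; they are chosen so that the high byte of the time field is 0x81.

```python
import struct


def utf8_error_for(data):
    """Try to decode bytes as UTF-8; return the UnicodeDecodeError, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc
    return None


# Start of a zip local file header: signature, version needed, reserved byte,
# flags, compression method, last-modified time, last-modified date.
# The time (0x8100) and date (0x3C61) are illustrative values only.
header = struct.pack("<4s2B4H", b"PK\x03\x04", 20, 0, 0, 0, 0x8100, 0x3C61)

err = utf8_error_for(header)
print(err.start, err.reason)

# Ordinary XML text decodes without any error.
print(utf8_error_for(b"<xbrl/>"))
```

In the standard zip header layout the last-modified time occupies offsets 10 and 11, which would be consistent with the "position 11" in the error message, although I have not checked the bytes of the actual file.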
#2
-1
After a quick search, the misbehaving byte appears to be a control (Ctrl) character, according to the Unicode character database. Since this control character exists only as a hex number and doesn't have its own letter, I'm thinking that UTF-8 is having trouble printing it as a visible char, so the above error arises.