I'm trying to use BeautifulSoup to parse an XML file. One of the elements has a hyphen in it: distribution-code
How do I access it? I've tried:
soup.distribution-code
soup."distribution-code" (tried single quotes too)
soup.[distribution-code]
but none of these work.
1 Answer
#1
4
You can access non-hyphenated elements by attribute reference using regular Python syntax, i.e. obj.name. However, - is not a valid character in that syntax (Python treats it as the minus operator), so you cannot access hyphenated elements that way.
Instead, use soup.find() or soup.find_all():
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<thing><id>1234</id><distribution-code>555444333</distribution-code></thing>', 'html.parser')
>>> soup.thing
<thing><id>1234</id><distribution-code>555444333</distribution-code></thing>
>>> soup.id
<id>1234</id>
>>> soup.distribution-code
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'code' is not defined
>>> soup.find('distribution-code')
<distribution-code>555444333</distribution-code>
Or, as pointed out in chepner's comment, you can use getattr() and setattr() to get and set attributes whose names contain hyphens. I think soup.find() is the more common way to access those elements.
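To put the options side by side, here is a minimal sketch using the same sample markup as above. The 'html.parser' argument and the variable names are assumptions made for illustration, not something the question specified; for a real XML file you may prefer a dedicated XML parser.

```python
from bs4 import BeautifulSoup

xml = "<thing><id>1234</id><distribution-code>555444333</distribution-code></thing>"
# Assumption: 'html.parser' is used because it ships with Python.
soup = BeautifulSoup(xml, "html.parser")

# find() returns the first matching tag, or None if nothing matches.
by_find = soup.find("distribution-code")

# find_all() returns a list-like collection of every matching tag.
all_codes = soup.find_all("distribution-code")

# getattr() takes the name as a string, so the hyphen is never parsed as minus.
by_getattr = getattr(soup, "distribution-code")

print(by_find.get_text())
print(len(all_codes))
print(by_getattr is by_find)
```

Note that setattr() on the soup object would presumably just set an ordinary Python attribute rather than edit the parsed tree, so for reading elements getattr() or find() is the relevant call.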