Python 3 Beautiful Soup: get HTML tag anchor

Date: 2022-03-08 17:03:09

I am trying to use BS4 and Python to save and then replace the content of the first <translate> tag in an HTML file.

Currently I am doing something like this:

translate_bs4 = bs4_object.find('translate')
translate_key = '{{ key }}'
# save the original content before overwriting it
translate_initial = str(translate_bs4)
translate_bs4.string = translate_key

My test case is:

<translate>tag with <other_tag>some text</other_tag></translate>
<much_longer_file>...</much_longer_file>

The resulting HTML is as expected:

<translate>{{ key }}</translate>
<much_longer_file>...</much_longer_file>

but the value of translate_initial is

<translate>tag with <other_tag>some text</other_tag></translate>

instead of the expected

tag with <other_tag>some text</other_tag>

I know that it can easily be extracted with a regex, but I would prefer a more DOM-oriented solution.

1 solution

#1


Try this:

translate_bs4 = bs4_object.find('translate')
translate_initial = translate_bs4.decode_contents(formatter="html")
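
For reference, here is a self-contained sketch of this approach run against the test case from the question. The `html` string and the `soup` variable name are assumptions made for illustration, and the built-in `html.parser` is assumed as the parser. `decode_contents` serializes only the children of the tag, so the enclosing <translate> tag is left out of the saved string.

```python
from bs4 import BeautifulSoup

# Test case from the question (assumed here as an inline string).
html = (
    "<translate>tag with <other_tag>some text</other_tag></translate>"
    "<much_longer_file>...</much_longer_file>"
)

# html.parser ships with Python; another parser may normalize the markup differently.
soup = BeautifulSoup(html, "html.parser")

translate_bs4 = soup.find("translate")

# Inner markup only, without the surrounding <translate>...</translate>.
translate_initial = translate_bs4.decode_contents(formatter="html")

# Replace the children of the tag with the placeholder key.
translate_bs4.string = "{{ key }}"

print(translate_initial)  # tag with <other_tag>some text</other_tag>
print(soup)
```

Note that the content has to be saved before `.string` is assigned, since the assignment discards the tag's existing children.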
