Python: read the HTML source from a URL and get the data into a program

Time: 2022-12-04 15:51:59

I'm a beginner at Python and I want to read information from a site and show some of the data in my textbox (I use EasyGUI). I have found the code below to get the HTML source of a URL, but now I want to work with the HTML output. I know how to work with XML, and I guess HTML is much the same. Is there any way to work with the elements and attributes?


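import urllib  # Python 2; in Python 3 the function is urllib.request.urlopen

filehandle = urllib.urlopen('URL')  # 'URL' is a placeholder for the real address

for line in filehandle.readlines():
    print line

filehandle.close()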

Thanks in advance.


2 answers

#1


3  

As suggested, Beautiful Soup is a library that can help you. Its documentation at http://www.crummy.com/software/BeautifulSoup/bs3/download/2.x/documentation.html shows a straightforward example.


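from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3; version 4 uses "from bs4 import BeautifulSoup"
soup = BeautifulSoup(filehandle.read())  # filehandle is the object returned by urllib.urlopen in the question
titleTag = soup.html.head.title  # the page's <title> tag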

Python has a built-in parser too: http://docs.python.org/library/htmlparser.html


Beautiful Soup is very good at handling broken HTML, though.
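
To give a rough idea of how the built-in parser exposes elements and attributes, here is a minimal sketch for Python 3, where the module is called html.parser. The class name LinkAndTitleParser and the sample_html string are made up for illustration; in practice the string would come from reading the URL handle, as in the question.

```python
from html.parser import HTMLParser  # Python 3 name; the Python 2 module is HTMLParser


class LinkAndTitleParser(HTMLParser):
    """Hypothetical helper: collects the <title> text and every <a href> value."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "title":
            self.in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        # Text between tags arrives here; keep it only while inside <title>
        if self.in_title:
            self.title += data


# Made-up sample page standing in for the HTML read from the URL
sample_html = """<html><head><title>Example page</title></head>
<body><a href="/first">First</a> <a href="/second">Second</a></body></html>"""

parser = LinkAndTitleParser()
parser.feed(sample_html)
parser.close()

print(parser.title)
print(parser.links)
```

The parser is event-driven, so you override a handler for each thing you care about instead of navigating a tree the way soup.html.head.title does. That is usually more code than Beautiful Soup for the same task, but it needs nothing beyond the standard library.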


#2


0  

If you're familiar with jQuery's syntax to select HTML elements, you may find pyquery useful.

