I am trying to extract information (playlist name, video names, video links) from the HTML source of a YouTube playlist page.
I know that parsing HTML with regex is considered bad practice, but this program is only for personal use and I read just one line per video in the playlist, so it doesn't need to be very sophisticated.
As I said, there is basically only one line per video that I need.
Example:
<tr class="pl-video yt-uix-tile " data-video-id="VIDEO-ID" data-set-video-id="" data-title="TITLE"><td class="pl-video-handle "></td><td class="pl-video-index"></td><td class="pl-video-thumbnail"><a href="reflink inside palylist" class="ux-thumb-wrap yt-uix-sessionlink contains-addto pl-video-thumb" data-sessionlink="sessionlink"> <span class="video-thumb yt-thumb yt-thumb-72"
The only two pieces of information I need are VIDEO-ID and TITLE. My regex pattern looks like this so far:
Pattern pLine = Pattern.compile("<tr class=\"(?<line>.*)");
It finds exactly the lines I need, but every attempt to capture only TITLE and VIDEO-ID gave me no results :/
I'm sorry if this is a trivial question or one that shouldn't be asked here, but that is my situation so far. And no, this is NOT homework ;)
2 Solutions
#1
3
.*?data-video-id="(.*?)".*?data-title="(.*?)"
This should do it. Extract capture group 1 (the video ID) and capture group 2 (the title).
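In Java, the expression above could be applied roughly as follows. This is a minimal sketch, assuming each playlist row has already been read into a single string; the class name `PlaylistLineDemo` and the helper `extract` are hypothetical names chosen for illustration. It uses `Matcher.find()` rather than `matches()`, because the pattern only covers part of the line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaylistLineDemo {
    // The expression from this answer, with the quotes escaped for a Java string literal.
    static final Pattern ROW =
            Pattern.compile(".*?data-video-id=\"(.*?)\".*?data-title=\"(.*?)\"");

    // Hypothetical helper: returns {videoId, title}, or null if the line has no such attributes.
    static String[] extract(String line) {
        Matcher m = ROW.matcher(line);
        // find() searches within the line; matches() would require the whole line to match.
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        // Shortened version of the example row from the question.
        String line = "<tr class=\"pl-video yt-uix-tile \" data-video-id=\"VIDEO-ID\" "
                + "data-set-video-id=\"\" data-title=\"TITLE\"><td class=\"pl-video-handle \"></td>";
        String[] result = extract(line);
        System.out.println(result[0] + " | " + result[1]); // prints "VIDEO-ID | TITLE"
    }
}
```

Note that this pattern expects `data-video-id` to appear before `data-title` on the same line, as it does in the example row.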
#2
1
The following expressions match the video ID and the title in your example.
ID: "data-video-id=\"([^\"]+)\""
Title: "data-title=\"([^\"]+)\""
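A minimal sketch of how these two expressions might be used, assuming one playlist row per input string. The class name `PlaylistAttributeDemo` and the helper `firstGroup` are hypothetical. Because each attribute is matched independently, the order of the attributes on the line does not matter. The watch URL format in `main` is an assumption about how the video link would be built from the ID, not something taken from the page source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaylistAttributeDemo {
    // The two expressions from this answer.
    static final Pattern ID = Pattern.compile("data-video-id=\"([^\"]+)\"");
    static final Pattern TITLE = Pattern.compile("data-title=\"([^\"]+)\"");

    // Hypothetical helper: capture group 1 of the first match, or null if there is none.
    static String firstGroup(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened version of the example row from the question.
        String line = "<tr class=\"pl-video yt-uix-tile \" data-video-id=\"VIDEO-ID\" "
                + "data-set-video-id=\"\" data-title=\"TITLE\"><td class=\"pl-video-handle \"></td>";
        String id = firstGroup(ID, line);
        String title = firstGroup(TITLE, line);
        System.out.println(title + " -> " + id);
        // Assumption: a video link can be formed from the ID like this.
        System.out.println("https://www.youtube.com/watch?v=" + id);
    }
}
```

Since `[^\"]+` requires at least one character, an empty attribute such as `data-title=""` yields no match, and `firstGroup` returns null in that case.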