I have the following code which grabs YouTube URLs stored in a string variable:
function getVideoUrlsFromString($html) {
    $regex = '#((?:www\.)?(?:youtube\.com\/(?:watch\?v=|embed\/)|youtu\.be\/)([a-zA-Z0-9-]*))#i';
    preg_match_all($regex, $html, $matches);
    $matches = array_unique($matches[0]);
    usort($matches, function($a, $b) {
        return strlen($b) - strlen($a);
    });
    return $matches;
}
$html = 'https://www.youtube-nocookie.com/embed/VWrlXsmcL2E';
$html = getVideoUrlsFromString($html);
print_r($html);
But it doesn't work with:
https://www.youtube-nocookie.com/embed/VWrlXsmcL2E
http://www.youtube.com/v/NLqAF9hrVbY?fs=1&hl=en_US
Is there any way to alter the regex to work with these 2 common YouTube URLs?
2 Solutions
#1
2
Something like this should do the trick:
<?php
function getVideoUrlsFromString($html) {
    $regex = '#((?:www\.)?(?:youtube\.com\/(?:watch\?v=|embed\/|v\/)|youtu\.be\/|youtube\-nocookie\.com\/embed\/)([a-zA-Z0-9-]*))#i';
    preg_match_all($regex, $html, $matches);
    $matches = array_unique($matches[0]);
    usort($matches, function($a, $b) {
        return strlen($b) - strlen($a);
    });
    return $matches;
}
$html = '
https://www.youtube-nocookie.com/embed/VWrlXsmcL2E
http://www.youtube.com/v/NLqAF9hrVbY?fs=1&hl=en_US
';
$html = getVideoUrlsFromString($html);
print_r($html);
Output:
Array
(
[0] => www.youtube-nocookie.com/embed/VWrlXsmcL2E
[1] => www.youtube.com/v/NLqAF9hrVbY
)
Here's a diff of the two to see what was added:
#((?:www\.)?(?:youtube\.com\/(?:watch\?v=|embed\/ )|youtu\.be\/ )([a-zA-Z0-9-]*))#i
#((?:www\.)?(?:youtube\.com\/(?:watch\?v=|embed\/|v\/)|youtu\.be\/|youtube\-nocookie\.com\/embed\/)([a-zA-Z0-9-]*))#i
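If only the video IDs are needed rather than the full matched URLs, the trailing capture group can be returned instead. The following is a minimal sketch, not part of the original answer: the function name `getVideoIdsFromString` is hypothetical, and it assumes IDs may also contain underscores, which the `[a-zA-Z0-9-]` class above would cut short.

```php
<?php
// Hypothetical helper (not from the original post): returns the unique
// video IDs found in a string instead of the full matched URLs.
function getVideoIdsFromString($html) {
    // Same URL alternatives as the pattern above. Assumption: the ID class
    // also allows "_" and must contain at least one character.
    $regex = '#(?:www\.)?(?:youtube\.com/(?:watch\?v=|embed/|v/)|youtu\.be/|youtube-nocookie\.com/embed/)([a-zA-Z0-9_-]+)#i';
    preg_match_all($regex, $html, $matches);
    // $matches[1] holds the only capture group: the ID after the URL prefix.
    // array_values() re-indexes the array after array_unique().
    return array_values(array_unique($matches[1]));
}

$html = '
https://www.youtube-nocookie.com/embed/VWrlXsmcL2E
http://www.youtube.com/v/NLqAF9hrVbY?fs=1&hl=en_US
';
print_r(getVideoIdsFromString($html));
```

Because `#` is used as the delimiter, the forward slashes in the pattern do not need escaping.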
#2
0
The problem is that your current expression does not account for the -nocookie in your first example, nor for the ...com/v/ path and the extra characters at the end of your second.
You can try changing it to something like the following, which matches both of them:
((?:www\.)?(?:youtube(?:-nocookie)?\.com\/(?:v\/|watch\?v=|embed\/)|youtu\.be\/)([a-zA-Z0-9?&=_-]*))
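To try this expression in PHP it needs delimiters. A minimal sketch, assuming the same `#...#i` wrapping as the question's code; the variable names are illustrative only:

```php
<?php
// Sketch: the expression from this answer wrapped in "#" delimiters with
// the "i" flag, as in the question's code.
$regex = '#((?:www\.)?(?:youtube(?:-nocookie)?\.com\/(?:v\/|watch\?v=|embed\/)|youtu\.be\/)([a-zA-Z0-9?&=_-]*))#i';

$html = '
https://www.youtube-nocookie.com/embed/VWrlXsmcL2E
http://www.youtube.com/v/NLqAF9hrVbY?fs=1&hl=en_US
';

preg_match_all($regex, $html, $matches);
// $matches[0] holds the full matches; drop duplicates and re-index.
$urls = array_values(array_unique($matches[0]));
print_r($urls);
```

Since the character class includes `?`, `&` and `=`, the second match keeps its query string (`?fs=1&hl=en_US`). If only the ID is wanted, those characters would have to be left out of the class.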