How do I get the base domain from a URL using PHP?

Date: 2022-08-23 10:37:24

I need to get the domain name from a URL. The following examples should all return google.com:

google.com
images.google.com
new.images.google.com
www.google.com

Similarly, the following URLs should all return google.co.uk:

google.co.uk
images.google.co.uk
new.images.google.co.uk
http://www.google.co.uk

I'm hesitant to use regular expressions, because something like domain.com/google.com could return incorrect results.

How can I get the top-level domain using PHP? This needs to work on all platforms and hosts.

5 answers

#1


16  

You could do this:

$urlData = parse_url($url);

$host = $urlData['host'];

** Update **

The best way I can think of is to have a mapping of all the TLDs that you want to handle, since certain TLDs can be tricky (co.uk).

// Suffixes to recognise; you can add more if you want
$urlMap = array('com', 'co.uk');

$host = "";
$url = "http://www.google.co.uk";

$urlData = parse_url($url);
// Split the host into labels and reverse them, so index 0 is the TLD
$hostData = explode('.', $urlData['host']);
$hostData = array_reverse($hostData);

// Try the two-label suffix (e.g. co.uk) first, then the single-label TLD
if(isset($hostData[2]) && in_array($hostData[1] . '.' . $hostData[0], $urlMap, true)) {
  $host = $hostData[2] . '.' . $hostData[1] . '.' . $hostData[0];
} elseif(isset($hostData[1]) && in_array($hostData[0], $urlMap, true)) {
  $host = $hostData[1] . '.' . $hostData[0];
}

echo $host;
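
The same mapping idea can be generalised so that suffixes of any length are tried, longest first. The following is only a sketch under stated assumptions: the function name baseDomainFromSuffixes is hypothetical and not part of the answer, it takes a bare host rather than a full URL, and it assumes PHP 7.1 or later for the nullable return type.

```php
<?php
// Hypothetical helper (not from the answer above): returns the matched
// suffix plus one more label, or null when no suffix in $suffixes matches.
function baseDomainFromSuffixes(string $host, array $suffixes): ?string
{
    $labels = explode('.', strtolower($host));
    $count = count($labels);
    // Start at 1 so at least one label remains in front of the suffix;
    // a smaller $i means a longer candidate suffix, so the longest match wins.
    for ($i = 1; $i < $count; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $suffixes, true)) {
            return implode('.', array_slice($labels, $i - 1));
        }
    }
    return null;
}

$suffixes = array('com', 'co.uk');
echo baseDomainFromSuffixes('new.images.google.co.uk', $suffixes), "\n"; // google.co.uk
echo baseDomainFromSuffixes('www.google.com', $suffixes), "\n";          // google.com
```

Returning null for an unknown suffix leaves it to the caller to decide whether to fall back to the full host or treat the input as an error.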

#2


6  

Top-level domains and second-level domains may be 2 characters long, but a registered subdomain must be at least 3 characters long.

EDIT: Because of pjv's comment, I learned that Australian domain names are an exception, because they allow 5 TLDs as SLDs (com, net, org, asn, id), for example somedomain.com.au. I'm guessing com.au is a nationally controlled domain name that is "shared". So, technically, "com.au" would still be the "base domain", but that's not useful.

EDIT: There are 47,952 possible three-character domain names (pattern: [a-zA-Z0-9][a-zA-Z0-9-][a-zA-Z0-9], or 36 * 37 * 36). Combined with just 8 of the most common TLDs (com, org, etc.), that gives 383,616 possibilities, without even adding in the entire scope of TLDs. 1-letter and 2-letter domain names still exist, but are not valid going forward.

In google.com, "google" is a subdomain of "com".

In google.co.uk, "google" is a subdomain of "co", which in turn is a subdomain of "uk" (or really a second-level domain, since "co" is also a valid top-level domain).

In www.google.com, "www" is a subdomain of "google", which is a subdomain of "com".

"co.uk" is NOT a valid host, because there is no valid domain name.

Going with that assumption, this function will return the proper "basedomain" in almost all cases, without requiring a "url map".

If you happen to be one of the rare cases, perhaps you can modify this to fulfill particular needs...

EDIT: You must pass the domain string as a URL with its protocol (http://, ftp://, etc.), or parse_url() will not consider it a valid URL (unless you want to modify the code to behave differently).

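function basedomain( $str = '' )
{
    // $str must be passed WITH protocol. ex: http://domain.com
    $url = @parse_url( $str );
    if ( empty( $url['host'] ) ) return;
    $parts = explode( '.', $url['host'] );
    $count = count( $parts );
    // Keep 3 labels when the second-to-last label is 2 characters long (e.g. "co" in co.uk), otherwise 2.
    // Indexing $parts directly avoids passing a function result to reset(), which expects a reference.
    $slice = ( $count > 2 && strlen( $parts[$count - 2] ) == 2 ) ? 3 : 2;
    return implode( '.', array_slice( $parts, ( 0 - $slice ), $slice ) );
}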
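
Since the function above only works when a protocol is present, scheme-less input such as images.google.com needs a scheme added first. A minimal sketch of that step follows; the helper name withScheme and the default of http are assumptions made for illustration, not part of the answer.

```php
<?php
// Hypothetical helper: prepend a default scheme when none is present, so
// that parse_url() fills in the 'host' key.
function withScheme(string $str, string $default = 'http'): string
{
    // A scheme is a letter followed by letters, digits, '+', '-' or '.',
    // and is followed here by "://".
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $str)) {
        return $str;
    }
    return $default . '://' . $str;
}

// Without a scheme the whole string is treated as a path, so there is no host.
var_dump(parse_url('images.google.com', PHP_URL_HOST));                  // NULL
var_dump(parse_url(withScheme('images.google.com'), PHP_URL_HOST));      // string(17) "images.google.com"
var_dump(parse_url(withScheme('domain.com/google.com'), PHP_URL_HOST));  // string(10) "domain.com"
```

The last call shows why parse_url() is safer than a regular expression for the domain.com/google.com case from the question: the part after the slash ends up in the path, not the host.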

If you need to be accurate, use fopen or curl to open this URL: http://data.iana.org/TLD/tlds-alpha-by-domain.txt

Then read the lines into an array and use that to compare the domain parts.
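
Reading the list into an array and checking a host against it might look roughly like this. The function names parseTldList and hasKnownTld are hypothetical, and the sample string is a stand-in for the downloaded file, which as far as I know is one uppercase TLD per line after a leading "#" comment line.

```php
<?php
// Hypothetical helper: turn the contents of the list into an array of
// lowercase TLDs, skipping "#" comment lines and blank lines.
function parseTldList(string $contents): array
{
    $tlds = array();
    foreach (preg_split('/\R/', $contents) as $line) {
        $line = strtolower(trim($line));
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        $tlds[] = $line;
    }
    return $tlds;
}

// Hypothetical helper: true when the last label of $host is in $tlds.
function hasKnownTld(string $host, array $tlds): bool
{
    $parts = explode('.', strtolower($host));
    return in_array(end($parts), $tlds, true);
}

// Tiny stand-in for the downloaded file; in practice the contents might come
// from file_get_contents() on the URL above or on a cached local copy.
$sample = "# sample data, not the real list\nCOM\nORG\nUK\n";
$tlds = parseTldList($sample);

var_dump(hasKnownTld('www.google.co.uk', $tlds)); // bool(true)
var_dump(hasKnownTld('example.invalid', $tlds));  // bool(false)
```

Because that list holds only top-level domains, it can confirm that "uk" is a TLD but says nothing about second-level suffixes such as co.uk, so it complements the length heuristic rather than replacing it.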

EDIT: To allow for Australian domains:

function au_basedomain( $str = '' )
{
    // $str must be passed WITH protocol. ex: http://domain.com
    $url = @parse_url( $str );
    if ( empty( $url['host'] ) ) return;
    $parts = explode( '.', $url['host'] );
    $count = count( $parts );
    // Keep 3 labels when the second-to-last label is 2 characters long, otherwise 2.
    $slice = ( $count > 2 && strlen( $parts[$count - 2] ) == 2 ) ? 3 : 2;
    // Australian second-level domains are 3 characters long, so force 3 labels for them.
    if ( preg_match( '/\.(com|net|asn|org|id)\.au$/i', $url['host'] ) ) $slice = 3;
    return implode( '.', array_slice( $parts, ( 0 - $slice ), $slice ) );
}

IMPORTANT ADDITIONAL NOTES: I don't use this function to validate domains. It is generic code that I only use to extract the base domain of the server it is running on from the global $_SERVER['SERVER_NAME'], for use within various internal scripts. Since I have only ever worked on sites within the US, I had never encountered the Australian variants that pjv asked about. It is handy for internal use, but it is a long way from a complete domain validation process. If you are trying to use it that way, I recommend against it, because there are too many ways for it to match invalid domains.

#3


4  

Try using: http://php.net/manual/en/function.parse-url.php. Something like this should work:

$urlParts = parse_url($yourUrl);
$hostParts = explode('.', $urlParts['host']);
$hostParts = array_reverse($hostParts);
$host = $hostParts[1] . '.' . $hostParts[0];

#4


0  

Building on xil3's answer, this is what I came up with. It also checks for localhost and IP addresses, so it works in a development environment too.
You still have to define which TLDs you want to handle. Other than that, everything works fine.

<?php
function getTopLevelDomain($url){
    $urlData = parse_url($url);
    $urlHost = isset($urlData['host']) ? $urlData['host'] : '';
    /** If the host is an IP address, return it unchanged */
    if(filter_var($urlHost, FILTER_VALIDATE_IP) !== false){
        return $urlHost;
    }
    /** Add/edit your TLDs here */
    $urlMap = array('com', 'com.pk', 'co.uk');

    $host = "";
    $hostData = explode('.', $urlHost);
    if(isset($hostData[1])){ /** A host such as "localhost" has no TLD and skips this branch */
        $hostData = array_reverse($hostData);

        /** Try the two-label suffix (e.g. co.uk) first, then the single-label TLD */
        if(isset($hostData[2]) && in_array($hostData[1] . '.' . $hostData[0], $urlMap, true)) {
            $host = $hostData[2] . '.' . $hostData[1] . '.' . $hostData[0];
        } elseif(in_array($hostData[0], $urlMap, true)) {
            $host = $hostData[1] . '.' . $hostData[0];
        }
        return $host;
    }
    return ((isset($hostData[0]) && $hostData[0] != '') ? $hostData[0] : 'error no domain'); /* You can change this error message later */
}
?>

You can use it like this:

$string = 'http://googl.com.pk';
echo getTopLevelDomain( $string ) . '<br>';

$string = 'http://googl.com.pk:23';
echo getTopLevelDomain( $string ) . '<br>';

$string = 'http://googl.com';
echo getTopLevelDomain( $string ) . '<br>';

$string = 'http://googl.com:23';
echo getTopLevelDomain( $string ) . '<br>';

$string = 'http://adad.asdasd.googl.com.pk';
echo getTopLevelDomain( $string ) . '<br>';

$string = 'http://adad.asdasd.googl.com.pk:23';
echo getTopLevelDomain( $string ) . '<br>';

$string = 'http://adad.asdasd.googl.com';
echo getTopLevelDomain( $string ) . '<br>';

$string = 'http://adad.asdasd.googl.com:23';
echo getTopLevelDomain( $string ) . '<br>';

$string = 'http://192.168.0.101:23';
echo getTopLevelDomain( $string ) . '<br>';

$string = 'http://192.168.0.101';
echo getTopLevelDomain( $string ) . '<br>';

$string = 'http://localhost';
echo getTopLevelDomain( $string ) . '<br>';

$string = 'https;//';
echo getTopLevelDomain( $string ) . '<br>';

$string = '';
echo getTopLevelDomain( $string ) . '<br>';

You'll get output like this:

googl.com.pk
googl.com.pk
googl.com
googl.com
googl.com.pk
googl.com.pk
googl.com
googl.com
192.168.0.101
192.168.0.101
localhost
error no domain
error no domain

#5


-3  

Use this function:

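function getHost($url){
    // Prepend a scheme only when the URL does not already start with one.
    // (strpos() returns 0 for a match at the start, which a bare if() treats as false.)
    if (preg_match('#^https?://#i', $url)){
        $httpurl=$url;
    } else {
        $httpurl="http://".$url;
    }
    $parse = parse_url($httpurl);
    $domain=$parse['host'];

    // Keep only the last two labels of the host.
    $portion=explode(".",$domain);
    $count=sizeof($portion)-1;
    if ($count>1){
        $result=$portion[$count-1].".".$portion[$count];
    } else {
        $result=$domain;
    }
    return $result;
}

This covers the google.com example URLs. Note that it always keeps the last two labels, so the google.co.uk examples come back as co.uk.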
