Format a string as a UK phone number

Date: 2023-01-14 14:24:04

I'm looking for a routine that will format a string of numbers as a UK phone number. The routine should account for UK area codes that require different formatting (e.g. London compared with Edinburgh or Worcester), as well as mobile numbers.

My phone numbers are stored in the database as strings, containing only numeric characters.

So far I have come up with this, but the performance seems poor.

/// <summary>
/// Formats a string as a UK phone number
/// </summary>
/// <remarks>
/// 02012345678 becomes 020 1234 5678
/// 01311234567 becomes 0131 123 4567
/// 01905123456 becomes 01905 123456
/// 07816123456 becomes 07816 123456
/// </remarks>
public static string FormatPhoneNumber(string phoneNumber)
{
    string formattedPhoneNumber = null;

    if (!string.IsNullOrEmpty(phoneNumber))
    {
        System.Text.RegularExpressions.Regex area1 = new System.Text.RegularExpressions.Regex(@"^0[1-9]0");
        System.Text.RegularExpressions.Regex area2 = new System.Text.RegularExpressions.Regex(@"^01[1-9]1");

        string formatString;

        if (area1.Match(phoneNumber).Success)
        {
            formatString = "0{0:00 0000 0000}";
        }
        else if (area2.Match(phoneNumber).Success)
        {
            formatString = "0{0:000 000 0000}";
        }
        else
        {
            formatString = "0{0:0000 000000}";
        }

        formattedPhoneNumber = string.Format(formatString, Int64.Parse(phoneNumber));
    }

    return formattedPhoneNumber;
}

Thoughts welcomed on how to improve this...

Edit

My initial thoughts are that I should store phone numbers as numeric fields in the database, then I can go without the Int64.Parse and know that they are truly numeric.

Edit 2

The phone numbers will all be UK geographic or UK mobile numbers, so special cases like 0800 do not need to be considered

4 solutions

#1


11  

UK telephone numbers vary in length from 7 digits to 10 digits, not including the leading zero. "area" codes can vary between 2 and usually 4 (but occasionally 5) digits.

All of the tables that show the area code and total length for each number prefix are available from OFCOM's website. NB: These tables are very long.

Also, there's no standard for exactly where spaces are put. Some people might put them in different places depending on how "readable" it makes the resulting text.

#2


2  

** I'm looking for a routine that will format a string of numbers as a UK phone number. **

You could download the Ofcom database that lists the formats for each number range, including national dialling only numbers, and do a lookup for each number you need to format. The database lists the SABCDE digits and the format: 0+10, 2+8, 3+7, 4+6, 4+5, 5+5, or 5+4 for each range.
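
A lookup along those lines might be sketched as below. This is only an illustration under stated assumptions: `NumberRangeLookup` and `FindFormat` are hypothetical names, the three rows are hand-written stand-ins rather than real Ofcom data (only the 2075 row comes from this answer; the other two are inferred from the examples in the question), and SABCDE is assumed to mean at most six leading digits.

```csharp
using System;
using System.Collections.Generic;

public static class NumberRangeLookup
{
    // Hypothetical sample rows keyed by the leading digits after the 0 trunk code.
    // A real table would be loaded from the downloaded Ofcom files.
    private static readonly Dictionary<string, string> Formats = new Dictionary<string, string>
    {
        { "2075", "2+8" },  // example given in this answer
        { "131",  "3+7" },  // assumed from 0131 123 4567 in the question
        { "1905", "4+6" },  // assumed from 01905 123456 in the question
    };

    // Returns the format of the longest matching prefix, or null when no row matches.
    public static string FindFormat(string phoneNumber)
    {
        if (string.IsNullOrEmpty(phoneNumber) || phoneNumber[0] != '0')
        {
            return null;
        }

        string digits = phoneNumber.Substring(1); // drop the trunk code

        // Try the longest prefix first so a more specific range wins.
        for (int length = Math.Min(6, digits.Length); length > 0; length--)
        {
            string format;
            if (Formats.TryGetValue(digits.Substring(0, length), out format))
            {
                return format;
            }
        }
        return null;
    }
}
```

With these sample rows, `NumberRangeLookup.FindFormat("02075123456")` returns "2+8", and a number with no matching row returns null so the caller can decide how to fall back.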

There are a small number of errors in the database (especially for the 01697 and 0169 77 codes), but fewer than ten in more than a quarter of a million entries.

There are four files covering 01 and 02 numbers, and separate files for various non-geographic number ranges.

0+10 numbers are 'National Dialling Only' and are written without parentheses around the area code part. The area code will be 02x for all 02 numbers, 01xx for all 011x and 01x1 numbers, and 01xxx for most other 01 numbers (a very small number - about a dozen - will be 01xx xx though).

Parentheses surround the area code on all other 01 and 02 numbers (that is, use parentheses on 01 and 02 numbers where the local number part does not begin with a 0 or a 1). Parentheses show that local dialling is possible within the same area by omitting the digits enclosed by the parentheses.

The 2+8 nomenclature shows the area code and local number length, with the entry 2075 : 2+8 meaning the number is formatted as (020) 75xx xxxx. Remember the leading zero is not 'counted' in the 2+8 determination.

** UK telephone numbers vary in length from 8 digits to 12 digits **

No. Since 2000, most have 10 digits after the '0' trunk code. A few still have 9 digits after the '0' trunk code.

There are also a few special numbers such as 0800 1111 and 0845 4647 to consider.

** "area" codes can vary between 2 and 4 digits. **

Area codes can vary between 2 and 5 digits (the leading zero is not counted). To be clear, '020' is classed as a 2-digit area code because the leading 0 is actually the trunk code. There are also 011x and 01x1 area codes, and most other numbers have 01xxx area codes. The latter may have local numbers that are only 5 digits long instead of the more common 6-digit local numbers. A very small number have an 01xx xx area code, and these have 5- or 4-digit local numbers.
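
As a rough illustration of those pattern rules, the area code length of an 01 or 02 number can be guessed from its first four characters. `AreaCodeRules` and `AreaCodeDigits` are hypothetical names, and the sketch deliberately does not detect the handful of 01xx xx codes, which would need an explicit list.

```csharp
using System;

public static class AreaCodeRules
{
    // Returns the number of digits in the area code, not counting the 0 trunk code,
    // for 01 and 02 numbers. The few 01xx xx codes are NOT detected here and are
    // reported as ordinary 01xxx codes.
    public static int AreaCodeDigits(string phoneNumber)
    {
        if (phoneNumber == null || phoneNumber.Length < 4 || phoneNumber[0] != '0')
        {
            throw new ArgumentException("Expected a national number starting with 0.");
        }

        if (phoneNumber[1] == '2')
        {
            return 2; // 02x
        }

        if (phoneNumber[1] == '1')
        {
            bool is011x = phoneNumber[2] == '1';
            bool is01x1 = phoneNumber[3] == '1';
            return (is011x || is01x1) ? 3 : 4; // 011x or 01x1, otherwise 01xxx
        }

        throw new ArgumentException("Not an 01 or 02 number.");
    }
}
```

For the numbers in the question this gives 2 for 02012345678, 3 for 01311234567, and 4 for 01905123456.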

** Also, there's no standard for exactly where spaces are put. **

There is always a space between the area code part and the local number part for all 01 and 02 numbers.

It is also traditional for (01xx xx) area codes to have a space within the area code as shown. This represents the old local exchange groupings where this system is still in use. Other (shorter) area codes are not split.

Local numbers with 7 or 8 digits have a split before the fourth digit from the end. Local numbers with 4, 5, or 6 digits are not split. This applies to geographic and non-geographic numbers alike.
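
Taken together, the spacing and parenthesis rules for 01 and 02 numbers could be applied roughly as follows, given a split such as "2+8" from the lookup. `GeographicFormatter` is a hypothetical name, the 0+10 case is deliberately left unsupported, and this is a sketch of the rules as described in this answer rather than a complete implementation.

```csharp
using System;

public static class GeographicFormatter
{
    // Formats an 01 or 02 number (with its leading 0) using a split such as "2+8".
    public static string Format(string phoneNumber, string split)
    {
        string[] parts = split.Split('+');
        int areaDigits = int.Parse(parts[0]);
        int localDigits = int.Parse(parts[1]);

        if (areaDigits < 2 || areaDigits > 5 || localDigits < 4
            || phoneNumber.Length != 1 + areaDigits + localDigits)
        {
            throw new ArgumentException("Unsupported split or wrong length: " + split);
        }

        // The leading 0 is not counted in the split, so the area code is one character longer.
        string area = phoneNumber.Substring(0, areaDigits + 1);
        string local = phoneNumber.Substring(areaDigits + 1);

        // Local parts starting with 0 or 1 are National Dialling Only: no parentheses.
        bool nationalOnly = local[0] == '0' || local[0] == '1';

        // 01xx xx area codes traditionally carry a space inside the code.
        if (areaDigits == 5)
        {
            area = area.Insert(4, " ");
        }

        // 7- and 8-digit local numbers split before the fourth digit from the end.
        if (local.Length >= 7)
        {
            local = local.Insert(local.Length - 4, " ");
        }

        return (nationalOnly ? area : "(" + area + ")") + " " + local;
    }
}
```

For example, `GeographicFormatter.Format("02075123456", "2+8")` yields "(020) 7512 3456", while a local part that starts with 0 or 1 is written without the parentheses.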

For most 03, 08, and 09 numbers, the number is written as 0xxx xxx xxxx.

Some 0800 and all 0500 numbers are written 0xxx xxxxxx.

For 055, 056, and 070 numbers the number is written 0xx xxxx xxxx.

For mobile and pager numbers, use 07xxx xxxxxx.
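
The non-geographic and mobile layouts listed above could be sketched like this. `NonGeographicFormatter` is a hypothetical name; the code assumes the number is stored with its leading 0, treats a 0800 or 0500 number with 9 digits after the 0 as the 0xxx xxxxxx case, and returns anything it does not recognise unchanged.

```csharp
using System;

public static class NonGeographicFormatter
{
    public static string Format(string phoneNumber)
    {
        if (string.IsNullOrEmpty(phoneNumber) || phoneNumber[0] != '0')
        {
            return phoneNumber;
        }

        // Some 0800 and all 0500 numbers have 9 digits after the 0: 0xxx xxxxxx
        if (phoneNumber.Length == 10
            && (HasPrefix(phoneNumber, "0800") || HasPrefix(phoneNumber, "0500")))
        {
            return phoneNumber.Insert(4, " ");
        }

        if (phoneNumber.Length != 11)
        {
            return phoneNumber; // anything else of unusual length is left unchanged
        }

        if (HasPrefix(phoneNumber, "055") || HasPrefix(phoneNumber, "056") || HasPrefix(phoneNumber, "070"))
        {
            return phoneNumber.Insert(7, " ").Insert(3, " "); // 0xx xxxx xxxx
        }

        if (HasPrefix(phoneNumber, "07"))
        {
            return phoneNumber.Insert(5, " "); // 07xxx xxxxxx
        }

        if (HasPrefix(phoneNumber, "03") || HasPrefix(phoneNumber, "08") || HasPrefix(phoneNumber, "09"))
        {
            return phoneNumber.Insert(7, " ").Insert(4, " "); // 0xxx xxx xxxx
        }

        return phoneNumber;
    }

    private static bool HasPrefix(string value, string prefix)
    {
        return value.StartsWith(prefix, StringComparison.Ordinal);
    }
}
```

The 070 check has to come before the general 07 check, otherwise personal numbers would be formatted as mobiles.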

** except some people use '08000 abc def' instead of '0800 0abc def' **

That usage is incorrect. Do be aware that some 0800 numbers have 9 digits after the 0 trunk code, whilst others have 10 digits after the 0 trunk code.

So, both 0800 xxxxxx and 0800 xxx xxxx are correct.

0500 numbers use only 0500 xxxxxx.

Most 03, 08, and 09 numbers are written as 0xxx xxx xxxx.

See also: http://en.wikipedia.org/wiki/Local_conventions_for_writing_telephone_numbers#United_Kingdom

#3


2  

I spent some time going through the OFCOM sheets and came up with the following.

using System;
using System.Text.RegularExpressions;

public static class TelephoneHelper
{

    #region Regex Patterns
    // Ordered from most to least specific; the first pattern that matches wins.
    // Each pattern is anchored at the start so it cannot match in the middle of a number.
    private static readonly Regex[] patterns = 
    {
        // 01xx xx area codes with short local numbers
        new Regex(@"^(?<first>013873)(?<second>\d{5})"),
        new Regex(@"^(?<first>015242)(?<second>\d{5})"),
        new Regex(@"^(?<first>015394)(?<second>\d{5})"),
        new Regex(@"^(?<first>015395)(?<second>\d{5})"),
        new Regex(@"^(?<first>015396)(?<second>\d{5})"),
        new Regex(@"^(?<first>016973)(?<second>\d{5})"),
        new Regex(@"^(?<first>016974)(?<second>\d{5})"),
        new Regex(@"^(?<first>016977)(?<second>\d{4}\d?)"),
        new Regex(@"^(?<first>017683)(?<second>\d{5})"),
        new Regex(@"^(?<first>017684)(?<second>\d{5})"),
        new Regex(@"^(?<first>017687)(?<second>\d{5})"),
        new Regex(@"^(?<first>019467)(?<second>\d{5})"),
        // 02x, non-geographic and mobile ranges
        new Regex(@"^(?<first>02\d)(?<second>\d{4})(?<third>\d{4})"),
        new Regex(@"^(?<first>03\d{2})(?<second>\d{3})(?<third>\d{4})"),
        // 0500 is split into two groups so that it is written 0500 xxxxxx
        new Regex(@"^(?<first>0500)(?<second>\d{6})"),
        new Regex(@"^(?<first>05\d{3})(?<second>\d{6})"),
        new Regex(@"^(?<first>07\d{3})(?<second>\d{6})"),
        new Regex(@"^(?<first>08\d{2})(?<second>\d{3})(?<third>\d{3}\d?)"),
        new Regex(@"^(?<first>09\d{2})(?<second>\d{3})(?<third>\d{4})"),
        // 01x1 and 011x area codes, then the general 01xxx case
        new Regex(@"^(?<first>01\d1)(?<second>\d{3})(?<third>\d{4})"),
        new Regex(@"^(?<first>011\d)(?<second>\d{3})(?<third>\d{4})"),
        new Regex(@"^(?<first>01\d{3})(?<second>\d{5}\d?)")
    };
    #endregion

    public static string FormatAsUkTelephone(this string number)
    {
        foreach (Regex pattern in patterns)
        {
            Match match = pattern.Match(number);
            if (match.Success)
            {
                // Groups.Count includes group 0 (the whole match),
                // so a pattern with three named groups has a count of 4.
                if (match.Groups.Count == 4)
                {
                    return String.Format("{0} {1} {2}", match.Groups["first"], match.Groups["second"], match.Groups["third"]);
                }
                return String.Format("{0} {1}", match.Groups["first"], match.Groups["second"]);
            }
        }
        // No pattern matched: return the input unchanged.
        return number;
    }
}

#4


1  

I'd be tempted to use a tighter set of rules that only check the bare minimum. On the assumption that the leading zero is in the database, the logic, sketched in C#, would be:

public static string FormatPhoneNumber(string phoneNumber)
{
    // Assumes an 11-character string of digits starting with 0.
    if (phoneNumber[1] == '2')
    {
        // 000 0000 0000
        return phoneNumber.Insert(7, " ").Insert(3, " ");
    }
    else if (phoneNumber[1] == '1' && (phoneNumber[2] == '1' || phoneNumber[3] == '1'))
    {
        // 0000 000 0000 (011x and 01x1 area codes)
        return phoneNumber.Insert(7, " ").Insert(4, " ");
    }
    else
    {
        // 00000 000000
        return phoneNumber.Insert(5, " ");
    }
}

NB: your patterns are slightly wrong. 023 is a three-digit code, and 0800 is not.
