Splitting names using a regular expression

Time: 2022-02-11 21:40:49

I'm trying to come up with a regular expression that will split full names.

The first part is validation - I want to make sure the name matches the pattern "Name Name" or "Name MI Name", where MI can be one character optionally followed by a period. This weeds out complex names like "Jose Jacinto De La Pena" - and that's fine. The expression I came up with is ^([a-zA-Z]+\s)([a-zA-Z](\.?)\s){0,1}([a-zA-Z'-]+)$ and it seems to do the job.

But how do I modify it to split the name into two parts only? If a middle initial is present, I want it to be part of the first "name"; in other words, "James T. Kirk" should be split into "James T." and "Kirk". TIA.

4 solutions

#1


3  

Just add some parentheses:

^(([a-z]+\s)([a-z](\.?))\s){0,1}([a-z'-]+)$

The first name and middle initial will now be in group 1:

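// Requires: using System.Text.RegularExpressions;
string resultString = null;
try {
    // Group 1 is the "First MI. " prefix, including its trailing space;
    // its Value is an empty string when the input does not match.
    resultString = Regex.Match(subjectString, @"^(([a-z]+\s)([a-z](\.?))\s){0,1}([a-z'-]+)$", RegexOptions.IgnoreCase).Groups[1].Value;
} catch (ArgumentException) {
    // Syntax error in the regular expression
}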

Also, I made the regex case-insensitive so that it can be shorter (a-z instead of a-zA-Z).

Update 1

The numbered groups don't work well when there is no middle initial, so I rewrote the regex from scratch:

^(\w+\s(\w\.\s)?)(\w+)$

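\w stands for any word character (letters, digits and the underscore), which may be what you need (you can replace it with a-z if that works better). Note that, unlike your original pattern, this version requires the period after the initial and does not allow ' or - in the last name.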
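
To see how the numbered groups of this pattern behave, here is a minimal C# sketch. The class and method names (NumberedGroupSketch, TrySplit) are made up for illustration and are not from the answer; the pattern is the one shown above. Group 1 keeps its trailing space, so the sketch trims it, and the last name lands in group 3 because the optional initial occupies group 2.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class NumberedGroupSketch
{
    // Group 1: first name plus optional "X. " (with trailing space).
    // Group 2: the optional "X. " on its own. Group 3: last name.
    private static readonly Regex Pattern =
        new Regex(@"^(\w+\s(\w\.\s)?)(\w+)$");

    // Returns false when the input does not match the pattern.
    public static bool TrySplit(string fullName, out string first, out string last)
    {
        first = last = string.Empty;
        Match m = Pattern.Match(fullName);
        if (!m.Success) return false;

        first = m.Groups[1].Value.TrimEnd(); // drop the captured trailing space
        last = m.Groups[3].Value;            // group 2 is the initial, so skip it
        return true;
    }
}
```

With this sketch, "James T. Kirk" should come back as "James T." and "Kirk", and "James Kirk" as "James" and "Kirk". "Miles O'Brien" is rejected because \w does not match an apostrophe.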

Update 2

There is a nice feature in C# where you can name your captures

^(?<First>\w+\s(?:\w\.\s)?)(?<Last>\w+)$

Now you can refer to the groups by name instead of by number (I think it's a bit more readable):

var subjectString = "James T. Kirk";
Regex regexObj = new Regex(@"^(?<First>\w+\s(?:\w\.\s)?)(?<Last>\w+)$", RegexOptions.IgnoreCase);

var groups = regexObj.Match(subjectString).Groups;
var firstName = groups["First"].Value;
var lastName = groups["Last"].Value;

#2


0  

You can accomplish this by turning what is currently your second capturing group into a non-capturing group (add ?: just after its opening parenthesis) and then moving that entire group to the end of the first group, so it becomes the following:

^([a-zA-Z]+\s(?:[a-zA-Z]\.?\s)?)([a-zA-Z'-]+)$

Note that I also replaced the {0,1} with ?, because they are equivalent.

This will result in two capturing groups, one for the first name and middle initial (if it exists), and one for the last name.
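
As a rough illustration of this idea in C#, the sketch below uses a pattern that captures exactly two groups: the period is not captured and the pattern is anchored with $ so that it still validates the whole input. The class and method names (TwoGroupSketch, TrySplit) are hypothetical and only there to make the example self-contained.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class TwoGroupSketch
{
    // Group 1: first name plus optional middle initial (with trailing space).
    // Group 2: last name. The (?:...) part is matched but not captured.
    private static readonly Regex Pattern =
        new Regex(@"^([a-zA-Z]+\s(?:[a-zA-Z]\.?\s)?)([a-zA-Z'-]+)$");

    // Returns false when the input does not match the pattern.
    public static bool TrySplit(string fullName, out string first, out string last)
    {
        first = last = string.Empty;
        Match m = Pattern.Match(fullName);
        if (!m.Success) return false;

        first = m.Groups[1].Value.TrimEnd(); // group 1 ends with the separating space
        last = m.Groups[2].Value;
        return true;
    }
}
```

Because the period is optional here, both "James T. Kirk" and "James T Kirk" should match, giving "James T." or "James T" as the first part.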

#3


0  

I'm not sure if this is what you want, but there is a way to do it without regular expressions.

If the name is in the form of Name Name then you could do this:

// fullName is a string that has the full name, in the form of 'Name Name'
string[] parts = fullName.Split(' ');
string firstName = parts[0];
string lastName = parts[1];

And if the name is in the form of Name MI. Name (with a period after the initial) then you can do this:

string firstName = fullName.Split('.')[0] + ".";
string lastName = fullName.Split('.')[1].Trim();
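
One possible way to cover both forms with a single code path, sketched here as an assumption rather than as part of the snippets above, is to split at the last space instead. The class and method names (LastSpaceSplitSketch, TrySplit) are hypothetical. Unlike the regex approaches, this performs no validation, so a longer name is simply split before its final word.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class LastSpaceSplitSketch
{
    // Everything before the last space becomes the first part
    // (first name plus any middle initial); the rest is the last name.
    public static bool TrySplit(string fullName, out string first, out string last)
    {
        first = last = string.Empty;
        string trimmed = fullName.Trim();
        int lastSpace = trimmed.LastIndexOf(' ');
        if (lastSpace < 0) return false; // a single word: nothing to split

        first = trimmed.Substring(0, lastSpace).TrimEnd();
        last = trimmed.Substring(lastSpace + 1);
        return true;
    }
}
```

This works whether or not the initial is followed by a period, since it never looks for the '.' character.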

Hope this helps!

#4


0  

Just put the optional part in the first capturing group:

(?i)^([a-z]+(?:\s[a-z]\.?)?)\s([a-z'-]+)$
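
A minimal C# sketch of how this pattern could be used follows. The class and method names (InlineOptionSketch, TrySplit) are hypothetical. Since the separating whitespace sits between the two groups rather than inside the first one, no trimming is needed, and the inline (?i) option takes the place of RegexOptions.IgnoreCase.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class InlineOptionSketch
{
    // Group 1: first name plus optional middle initial (no trailing space).
    // Group 2: last name. (?i) makes the whole pattern case-insensitive.
    private static readonly Regex Pattern =
        new Regex(@"(?i)^([a-z]+(?:\s[a-z]\.?)?)\s([a-z'-]+)$");

    // Returns false when the input does not match the pattern.
    public static bool TrySplit(string fullName, out string first, out string last)
    {
        first = last = string.Empty;
        Match m = Pattern.Match(fullName);
        if (!m.Success) return false;

        first = m.Groups[1].Value;
        last = m.Groups[2].Value;
        return true;
    }
}
```

For "James T. Kirk" this should yield "James T." and "Kirk" directly from the two groups.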
