"Last name, first name" -> "first name last name" in serialized strings

Time: 2021-10-02 08:22:26

I have a bunch of strings that contain lists of names in last name, first name format, separated by commas, like so:

names <- c('Beaufoy, Simon, Boyle, Danny','Nolan, Christopher','Blumberg, Stuart, Cholodenko, Lisa','Seidler, David','Sorkin, Aaron')

What's the easiest way to convert all these names within the strings to first name last name format?

3 solutions

#1


9  

If you can be certain that a comma isn't going to be in a person's name, this might work:

mynames <- c('Beaufoy, Simon, Boyle, Danny',
             'Nolan, Christopher',
             'Blumberg, Stuart, Cholodenko, Lisa',
             'Seidler, David',
             'Sorkin, Aaron',
             'Hoover, J. Edgar')
mynames2 <- strsplit(mynames, ", ")

unlist(lapply(mynames2, 
              function(x) paste(x[1:length(x) %% 2 == 0], 
                                x[1:length(x) %% 2 != 0])))
# [1] "Simon Beaufoy"     "Danny Boyle"       "Christopher Nolan"
# [4] "Stuart Blumberg"   "Lisa Cholodenko"   "David Seidler"    
# [7] "Aaron Sorkin"      "J. Edgar Hoover"        

I've added J. Edgar Hoover in there for good measure.

If you want the names that were quoted together to stay together, add collapse = ", " to your paste() function:

unlist(lapply(mynames2, 
              function(x) paste(x[1:length(x) %% 2 == 0], 
                                x[1:length(x) %% 2 != 0],
                                collapse = ", ")))
# [1] "Simon Beaufoy, Danny Boyle"       "Christopher Nolan"               
# [3] "Stuart Blumberg, Lisa Cholodenko" "David Seidler"                   
# [5] "Aaron Sorkin"                     "J. Edgar Hoover"    
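
The even/odd indexing above can be wrapped in a small function. This is only a sketch: `flip_names` and its `keep_together` argument are made-up names, not part of the original answer, and it assumes every string holds complete "Last, First" pairs separated by ", ".

```r
# Hypothetical helper (not from the original answer): flip "Last, First" pairs.
flip_names <- function(x, keep_together = TRUE) {
  parts <- strsplit(x, ", ", fixed = TRUE)
  flipped <- lapply(parts, function(p) {
    # Assumption: each string contains complete last/first pairs.
    stopifnot(length(p) %% 2 == 0)
    idx   <- seq_along(p)
    first <- p[idx %% 2 == 0]  # even positions hold first names
    last  <- p[idx %% 2 == 1]  # odd positions hold last names
    # collapse = NULL keeps one element per person; ", " rejoins each string
    paste(first, last, collapse = if (keep_together) ", " else NULL)
  })
  unlist(flipped)
}

flip_names(c("Beaufoy, Simon, Boyle, Danny", "Hoover, J. Edgar"))
# [1] "Simon Beaufoy, Danny Boyle" "J. Edgar Hoover"

flip_names("Beaufoy, Simon, Boyle, Danny", keep_together = FALSE)
# [1] "Simon Beaufoy" "Danny Boyle"
```

Using seq_along(p) instead of 1:length(p) is just a defensive habit for empty input; the indexing logic is otherwise the same as in the answer.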

#2


3  

(1) Maintain same names in each element. This can be done with a single gsub (assuming there are no commas within names):

> gsub("([^, ][^,]*), ([^,]+)", "\\2 \\1", names)
[1] "Simon Beaufoy, Danny Boyle"       "Christopher Nolan"               
[3] "Stuart Blumberg, Lisa Cholodenko" "David Seidler"                   
[5] "Aaron Sorkin"    

> gsub("([^, ][^,]*), ([^,]+)", "\\2 \\1", "Hoover, J. Edgar")
[1] "J. Edgar Hoover"

(2) Separate into one name per element. If you want each first name last name in a separate element, then use (a) scan:

scan(text = out, sep = ",", what = "", strip.white = TRUE)

where out is the result of the gsub above. To get the same result directly, try (b) strapply:

> library(gsubfn)
> strapply(names, "([^, ][^,]*), ([^,]+)", x + y ~ paste(y, x), simplify = c)
[1] "Simon Beaufoy"     "Danny Boyle"       "Christopher Nolan"
[4] "Stuart Blumberg"   "Lisa Cholodenko"   "David Seidler"    
[7] "Aaron Sorkin"     

> strapply("Hoover, Edgar J.", "([^, ][^,]*), ([^,]+)", x + y ~ paste(y, x), 
+   simplify = c)
[1] "Edgar J. Hoover"

Note that all examples above used the same regular expression for matching.
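
For anyone who would rather not load gsubfn, here is a base-R sketch of the same idea, with the shared pattern stored once and its pieces annotated. The variable names (`pat`, `authors`, `out`, `one_per_element`) are only illustrative, and it still assumes there are no commas within names.

```r
# The pattern shared by the examples above, piece by piece:
#   ([^, ][^,]*)  last name: starts with a non-comma, non-space character
#                 and runs up to the next comma
#   ", "          the literal separator between last and first name
#   ([^,]+)       first name: everything up to the next comma or the end
pat <- "([^, ][^,]*), ([^,]+)"

authors <- c("Beaufoy, Simon, Boyle, Danny", "Nolan, Christopher")

# Swap the two captured groups, keeping one element per input string
out <- gsub(pat, "\\2 \\1", authors)
out
# [1] "Simon Beaufoy, Danny Boyle" "Christopher Nolan"

# Split on the remaining ", " to get one person per element
one_per_element <- unlist(strsplit(out, ", ", fixed = TRUE))
one_per_element
# [1] "Simon Beaufoy"     "Danny Boyle"       "Christopher Nolan"
```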

UPDATE: removed comma separating first and last name.

UPDATE: added code to separate out each first name last name into a separate element in case that is the preferred output format.

#3


1  

I'm in favor of @AnandaMahto's Answer, but just for fun, this illustrates another method using scan, split, and rapply.

names <- c(names, 'Chambers, John, Ihaka, Ross, Gentleman, Robert')

# extract names
snames <- lapply(names, function(x) {
  scan(text = x, what = '', sep = ',', strip.white = TRUE, quiet = TRUE)
})

# break up names
snames <- lapply(snames, function(x) split(x, rep(seq(length(x) %/% 2), each = 2)))

# collapse together, reversed
rapply(snames, function(x) paste(x[2:1], collapse=' '))
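
A variation on the same pairing step, sketched with a two-row matrix instead of split and rapply. `flip_with_matrix` is a made-up name, not from the answer above, and it assumes the input contains an even number of comma-separated fields.

```r
# Hypothetical variant: pair the fields up by filling a 2-row matrix.
flip_with_matrix <- function(x) {
  parts <- scan(text = x, what = "", sep = ",", strip.white = TRUE, quiet = TRUE)
  # Column-wise fill puts last names in row 1 and first names in row 2
  m <- matrix(parts, nrow = 2)
  paste(m[2, ], m[1, ])
}

flip_with_matrix("Chambers, John, Ihaka, Ross, Gentleman, Robert")
# [1] "John Chambers"    "Ross Ihaka"       "Robert Gentleman"
```

Unlike the lapply version, this returns one person per element and does not keep track of which input string each name came from.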
