I have a data.frame in R with a column containing character strings of the form {some letters}-{a number}{a letter}, e.g. x <- 'KFKGDLDSKFDSKJJFDI-4567W'. I want to get a new column containing the number, e.g. '4567' for that particular row. There is only one number per string, but it can be of any reasonable length. How can I extract the number from each row of the data.frame?
2 answers
#1
Use regular expressions to extract substrings. Use as.numeric to convert the resulting character string to a number:
string = 'KFKGDLDSKFDSKJJFDI-4567W'
as.numeric(regmatches(string, regexpr('\\d+', string)))
# [1] 4567
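Because regexpr() and regmatches() are vectorized, the same expression handles a whole vector of strings at once, and \\d+ matches a digit run of any length. A minimal sketch, assuming every string contains exactly one number; apart from the first, the example strings are hypothetical:

```r
# Strings of the form {letters}-{number}{letter} with digit runs of
# different lengths (all but the first value are made-up examples).
strings <- c('KFKGDLDSKFDSKJJFDI-4567W', 'AB-1X', 'QWERTY-987654321Z')

# regexpr() finds the first run of digits in each string,
# regmatches() extracts those substrings, and as.numeric() converts them.
nums <- as.numeric(regmatches(strings, regexpr('\\d+', strings)))
print(nums)
```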
You can easily use this to create a new column in your data frame:
data = data.frame(x = rep(string, 10))  # example data: ten identical rows
transform(data, y = as.numeric(regmatches(x, regexpr('\\d+', x))))
# x y
# 1 KFKGDLDSKFDSKJJFDI-4567W 4567
# 2 KFKGDLDSKFDSKJJFDI-4567W 4567
# 3 KFKGDLDSKFDSKJJFDI-4567W 4567
# 4 KFKGDLDSKFDSKJJFDI-4567W 4567
…
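One caveat: regmatches() silently drops elements that have no match, so if some row contains no digits, the vector passed to transform() is shorter than the data frame and the call can fail or misalign values. A minimal sketch of an NA-safe variant, assuming a character column named x (the example rows are hypothetical):

```r
# Hypothetical example data: the third row contains no digits at all.
data <- data.frame(x = c('KFKGDLDSKFDSKJJFDI-4567W', 'AB-12C', 'NODIGITS-X'),
                   stringsAsFactors = FALSE)

# regexpr() returns -1 for rows with no match.
m <- regexpr('\\d+', data$x)

# Start with NA for every row, then fill in only the rows that matched,
# so the new column always has exactly one value per row.
data$y <- NA_real_
data$y[m != -1] <- as.numeric(regmatches(data$x, m))
print(data)
```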
#2
Try this one (it returns the number as a character string; wrap it in as.numeric() if you need a number):
gsub("[a-zA-Z]+-([0-9]+)[a-zA-Z]","\\1", "KFKGDLDSKFDSKJJFDI-4567W")
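The same capture-group idea can be applied to a whole data frame column. A minimal sketch, assuming a character column named x (the data frame and its rows are hypothetical); it swaps in sub() for gsub(), since the anchored pattern (^ ... $) can match at most once per string:

```r
# Hypothetical data frame with strings of the form {letters}-{number}{letter}.
df <- data.frame(x = c('KFKGDLDSKFDSKJJFDI-4567W', 'ABC-12Z'),
                 stringsAsFactors = FALSE)

# Capture the digits between the hyphen and the final letter. The anchors
# mean only strings matching the full format are rewritten.
digits <- sub('^[A-Za-z]+-([0-9]+)[A-Za-z]$', '\\1', df$x)

# sub() returns character strings; convert them to numbers.
df$y <- as.numeric(digits)
print(df)
```

A string that does not match the pattern is returned unchanged by sub(), so as.numeric() turns it into NA with a warning rather than producing a wrong number.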