I have a column containing 1200 character strings. In each one, every four character group is hexadecimal for a number. i.e. 300 numbers in hexadecimal crammed into a 1200 character string, in every row. I need to get each number out into decimal, and into its own column (300 new columns) named 1-300. Here's what I've figured out so far:
我有一个包含1200个字符串的列。在每个字符组中,每四个字符组都是一个数字的十六进制。也就是说,在每一行中,在一个1200字串中塞入300个十六进制数字。我需要把每一个数字都写成小数,并放入它自己的列(300个新列)命名为1-300。以下是我到目前为止所发现的:
Data.frame:
BigString
[1] 0043003E803C0041004A...(etc...)
Here's what I've done so far:
以下是我到目前为止所做的:
decimal.fours <- function(x) {
strtoi(substring(BigString[x], seq(1,1197,4), seq(4,1197,4)), 16L)
}
decimal.fours(1)
[1] 283 291 239 177 ...
But now I'm stuck. How can I output these individual number, (and the remaining 296, into new columns? I have fifty total rows/strings. It would be great to do them all at once, i.e. 300 new columns, containing split up substrings from 50 strings.
但现在我卡住了。如何将这些单独的数字(以及其余的296)输出到新的列中?总共有50行/字符串。如果能一次完成所有这些操作,那就太好了,比如300个新列,其中包含50个字符串的分割子字符串。
3 个解决方案
#1
1
Obligatory tidyverse example:
义务tidyverse例子:
library(tidyverse)
Setup some data
设置一些数据
set.seed(1492)
bet <- c(0:9, LETTERS[1:6]) # alphabet for hex digit sequences
i <- 8 # number of rows
n <- 10 # number of 4-hex-digit sequences
df <- data_frame(
some_other_col=LETTERS[1:i],
big_str=map_chr(1:i, ~sample(bet, 4*n, replace=TRUE) %>% paste0(collapse=""))
)
df
## # A tibble: 8 × 2
## some_other_col big_str
## <chr> <chr>
## 1 A 432100D86CAA388C15AEA6291E985F2FD3FB6104
## 2 B BC2673D112925EBBB3FD175837AF7176C39B4888
## 3 C B4E99FDAABA47515EADA786715E811EE0502ABE8
## 4 D 64E622D7037D35DE6ADC40D0380E1DC12D753CBC
## 5 E CF7CDD7BBC610443A8D8FCFD896CA9730673B181
## 6 F ED86AEE8A7B65F843200B823CFBD17E9F3CA4EEF
## 7 G 2B9BCB73941228C501F937DA8E6EF033B5DD31F6
## 8 H 40823BBBFDF9B14839B7A95B6E317EBA9B016ED5
Do the manipulation
做这个操作
read_fwf(paste0(df$big_str, collapse="\n"),
fwf_widths(rep(4, n)),
col_types=paste0(rep("c", n), collapse="")) %>%
mutate_all(strtoi, base=16) %>%
bind_cols(df) %>%
select(some_other_col, everything(), -big_str)
## # A tibble: 8 × 11
## some_other_col X1 X2 X3 X4 X5 X6 X7 X8 X9
## <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 A 17185 216 27818 14476 5550 42537 7832 24367 54267
## 2 B 48166 29649 4754 24251 46077 5976 14255 29046 50075
## 3 C 46313 40922 43940 29973 60122 30823 5608 4590 1282
## 4 D 25830 8919 893 13790 27356 16592 14350 7617 11637
## 5 E 53116 56699 48225 1091 43224 64765 35180 43379 1651
## 6 F 60806 44776 42934 24452 12800 47139 53181 6121 62410
## 7 G 11163 52083 37906 10437 505 14298 36462 61491 46557
## 8 H 16514 15291 65017 45384 14775 43355 28209 32442 39681
## # ... with 1 more variables: X10 <int>
#2
1
You can use read.fwf
which read in files with fixed width for each column:
您可以使用阅读。fwf,读取每个列的固定宽度文件:
# an example vector of big strings
BigString = c("0043003E803C0041004A", "0043003E803C0041004A", "0043003E803C0041004A")
n = 5 # n is the number of columns for your result(300 for your real case)
as.data.frame(
lapply(read.fwf(file = textConnection(BigString),
widths = rep(4, n),
colClasses = "character"),
strtoi, base = 16))
# V1 V2 V3 V4 V5
#1 67 62 32828 65 74
#2 67 62 32828 65 74
#3 67 62 32828 65 74
If you'd like to keep the decimal.hours
function, you can modify it as follows and call lapply
to convert your bigStrings to list of integers which can be further converted to data.frame with do.call(rbind, ...)
pattern:
如果你想保留小数点的话。函数,您可以如下所示修改它,并调用lapply将您的bigstring转换为可以进一步转换为data.frame的整数列表。调用(rbind…)模式:
decimal.fours <- function(x) {
strtoi(substring(x, seq(1,1197,4), seq(4,1197,4)), 16L)
}
do.call(rbind, lapply(BigString, decimal.fours))
#3
1
just a try using base-R
试试用base-R
BigString = c("0043003E803C0041004A", "0043003E803C0041004A", "0043003E803C0041004A")
df = data.frame(BigString)
t(sapply(df$BigString, function(x) strtoi(substring(x, seq(1, 297, 4)[1:5],
seq(4, 300, 4)[1:5]), base = 16)))
# [,1] [,2] [,3] [,4] [,5]
#[1,] 67 62 32828 65 74
#[2,] 67 62 32828 65 74
#[3,] 67 62 32828 65 74
# you can set the columns together at the end using `paste0("new_col", 1:300)`
# [1:5] was just used for this example, because i had strings of length 20cahr
#1
1
Obligatory tidyverse example:
义务tidyverse例子:
library(tidyverse)
Setup some data
设置一些数据
set.seed(1492)
bet <- c(0:9, LETTERS[1:6]) # alphabet for hex digit sequences
i <- 8 # number of rows
n <- 10 # number of 4-hex-digit sequences
df <- data_frame(
some_other_col=LETTERS[1:i],
big_str=map_chr(1:i, ~sample(bet, 4*n, replace=TRUE) %>% paste0(collapse=""))
)
df
## # A tibble: 8 × 2
## some_other_col big_str
## <chr> <chr>
## 1 A 432100D86CAA388C15AEA6291E985F2FD3FB6104
## 2 B BC2673D112925EBBB3FD175837AF7176C39B4888
## 3 C B4E99FDAABA47515EADA786715E811EE0502ABE8
## 4 D 64E622D7037D35DE6ADC40D0380E1DC12D753CBC
## 5 E CF7CDD7BBC610443A8D8FCFD896CA9730673B181
## 6 F ED86AEE8A7B65F843200B823CFBD17E9F3CA4EEF
## 7 G 2B9BCB73941228C501F937DA8E6EF033B5DD31F6
## 8 H 40823BBBFDF9B14839B7A95B6E317EBA9B016ED5
Do the manipulation
做这个操作
read_fwf(paste0(df$big_str, collapse="\n"),
fwf_widths(rep(4, n)),
col_types=paste0(rep("c", n), collapse="")) %>%
mutate_all(strtoi, base=16) %>%
bind_cols(df) %>%
select(some_other_col, everything(), -big_str)
## # A tibble: 8 × 11
## some_other_col X1 X2 X3 X4 X5 X6 X7 X8 X9
## <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 A 17185 216 27818 14476 5550 42537 7832 24367 54267
## 2 B 48166 29649 4754 24251 46077 5976 14255 29046 50075
## 3 C 46313 40922 43940 29973 60122 30823 5608 4590 1282
## 4 D 25830 8919 893 13790 27356 16592 14350 7617 11637
## 5 E 53116 56699 48225 1091 43224 64765 35180 43379 1651
## 6 F 60806 44776 42934 24452 12800 47139 53181 6121 62410
## 7 G 11163 52083 37906 10437 505 14298 36462 61491 46557
## 8 H 16514 15291 65017 45384 14775 43355 28209 32442 39681
## # ... with 1 more variables: X10 <int>
#2
1
You can use read.fwf
which read in files with fixed width for each column:
您可以使用阅读。fwf,读取每个列的固定宽度文件:
# an example vector of big strings
BigString = c("0043003E803C0041004A", "0043003E803C0041004A", "0043003E803C0041004A")
n = 5 # n is the number of columns for your result(300 for your real case)
as.data.frame(
lapply(read.fwf(file = textConnection(BigString),
widths = rep(4, n),
colClasses = "character"),
strtoi, base = 16))
# V1 V2 V3 V4 V5
#1 67 62 32828 65 74
#2 67 62 32828 65 74
#3 67 62 32828 65 74
If you'd like to keep the decimal.hours
function, you can modify it as follows and call lapply
to convert your bigStrings to list of integers which can be further converted to data.frame with do.call(rbind, ...)
pattern:
如果你想保留小数点的话。函数,您可以如下所示修改它,并调用lapply将您的bigstring转换为可以进一步转换为data.frame的整数列表。调用(rbind…)模式:
decimal.fours <- function(x) {
strtoi(substring(x, seq(1,1197,4), seq(4,1197,4)), 16L)
}
do.call(rbind, lapply(BigString, decimal.fours))
#3
1
just a try using base-R
试试用base-R
BigString = c("0043003E803C0041004A", "0043003E803C0041004A", "0043003E803C0041004A")
df = data.frame(BigString)
t(sapply(df$BigString, function(x) strtoi(substring(x, seq(1, 297, 4)[1:5],
seq(4, 300, 4)[1:5]), base = 16)))
# [,1] [,2] [,3] [,4] [,5]
#[1,] 67 62 32828 65 74
#[2,] 67 62 32828 65 74
#[3,] 67 62 32828 65 74
# you can set the columns together at the end using `paste0("new_col", 1:300)`
# [1:5] was just used for this example, because i had strings of length 20cahr