Create new columns from a long string split into 300 substrings?

Time: 2023-01-05 22:57:49

I have a column containing 1200-character strings. In each string, every group of four characters is the hexadecimal representation of a number, i.e. 300 hexadecimal numbers are crammed into a 1200-character string in every row. I need to convert each number to decimal and put it into its own column (300 new columns, named 1-300). Here's what I've figured out so far:

  Data.frame:
                      BigString
                 [1]  0043003E803C0041004A...(etc...)

Here's what I've done so far:

decimal.fours <- function(x) {
    strtoi(substring(BigString[x], seq(1, 1197, 4), seq(4, 1200, 4)), 16L)
}
decimal.fours(1)
[1] 283   291   239   177 ...

But now I'm stuck. How can I output these individual numbers (and the remaining 296) into new columns? I have fifty rows/strings in total. It would be great to do them all at once, i.e. 300 new columns containing the split-up substrings from all 50 strings.

3 solutions

#1



Obligatory tidyverse example:

library(tidyverse)

Set up some data:

set.seed(1492)

bet <- c(0:9, LETTERS[1:6]) # alphabet for hex digit sequences
i <- 8                      # number of rows
n <- 10                     # number of 4-hex-digit sequences

df <- tibble(
   some_other_col=LETTERS[1:i],
   big_str=map_chr(1:i, ~sample(bet, 4*n, replace=TRUE) %>% paste0(collapse=""))
)

df
## # A tibble: 8 × 2
##   some_other_col                                  big_str
##            <chr>                                    <chr>
## 1              A 432100D86CAA388C15AEA6291E985F2FD3FB6104
## 2              B BC2673D112925EBBB3FD175837AF7176C39B4888
## 3              C B4E99FDAABA47515EADA786715E811EE0502ABE8
## 4              D 64E622D7037D35DE6ADC40D0380E1DC12D753CBC
## 5              E CF7CDD7BBC610443A8D8FCFD896CA9730673B181
## 6              F ED86AEE8A7B65F843200B823CFBD17E9F3CA4EEF
## 7              G 2B9BCB73941228C501F937DA8E6EF033B5DD31F6
## 8              H 40823BBBFDF9B14839B7A95B6E317EBA9B016ED5

Do the manipulation:

read_fwf(paste0(df$big_str, collapse="\n"),
         fwf_widths(rep(4, n)),
         col_types=paste0(rep("c", n), collapse="")) %>%
  mutate_all(strtoi, base=16) %>%
  bind_cols(df) %>%
  select(some_other_col, everything(), -big_str)
## # A tibble: 8 × 11
##   some_other_col    X1    X2    X3    X4    X5    X6    X7    X8    X9
##            <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1              A 17185   216 27818 14476  5550 42537  7832 24367 54267
## 2              B 48166 29649  4754 24251 46077  5976 14255 29046 50075
## 3              C 46313 40922 43940 29973 60122 30823  5608  4590  1282
## 4              D 25830  8919   893 13790 27356 16592 14350  7617 11637
## 5              E 53116 56699 48225  1091 43224 64765 35180 43379  1651
## 6              F 60806 44776 42934 24452 12800 47139 53181  6121 62410
## 7              G 11163 52083 37906 10437   505 14298 36462 61491 46557
## 8              H 16514 15291 65017 45384 14775 43355 28209 32442 39681
## # ... with 1 more variables: X10 <int>

#2



You can use read.fwf, which reads input in which every column has a fixed width:

# an example vector of big strings
BigString = c("0043003E803C0041004A", "0043003E803C0041004A", "0043003E803C0041004A")

n = 5                  # number of result columns (300 in your real case)
as.data.frame(
      lapply(read.fwf(file = textConnection(BigString), 
                      widths = rep(4, n), 
                      colClasses = "character"), 
             strtoi, base = 16))

#  V1 V2    V3 V4 V5
#1 67 62 32828 65 74
#2 67 62 32828 65 74
#3 67 62 32828 65 74

If you'd like to keep the decimal.fours function, you can modify it as follows so that it takes a string rather than a row index, then call lapply to convert your BigString values to a list of integer vectors. That list can be combined into a matrix (and from there a data.frame) with the do.call(rbind, ...) pattern:

decimal.fours <- function(x) {
    strtoi(substring(x, seq(1, 1197, 4), seq(4, 1200, 4)), 16L)
}

do.call(rbind, lapply(BigString, decimal.fours))
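To get from that matrix to columns named 1-300 alongside the original data, a minimal sketch could look like the following. The helper name `hex_fours` and the small stand-in data frame are assumptions for illustration; the helper derives the group positions from the string length instead of hard-coding 1200.

```r
# Hypothetical helper: split one string into 4-character hex groups
# and convert each group to a decimal integer.
hex_fours <- function(x, width = 4L) {
  starts <- seq(1L, nchar(x), by = width)
  strtoi(substring(x, starts, starts + width - 1L), base = 16L)
}

# Stand-in data: 3 rows of 5 groups instead of 50 rows of 300 groups.
df <- data.frame(
  BigString = rep("0043003E803C0041004A", 3),
  stringsAsFactors = FALSE
)

# One matrix row per input string.
m <- do.call(rbind, lapply(df$BigString, hex_fours))
colnames(m) <- seq_len(ncol(m))   # "1", "2", ... ("1" to "300" on the real data)

# check.names = FALSE stops data.frame() from renaming "1" to "X1".
result <- data.frame(df, m, check.names = FALSE)
```

Because the column names are digits, they need backticks or `[[ ]]` when used, e.g. `result[["3"]]`.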

#3



Just a try using base R:

BigString = c("0043003E803C0041004A", "0043003E803C0041004A", "0043003E803C0041004A")
df = data.frame(BigString)


t(sapply(df$BigString, function(x) strtoi(substring(x, seq(1, 1197, 4)[1:5],
                                                    seq(4, 1200, 4)[1:5]), base = 16)))
#     [,1] [,2]  [,3] [,4] [,5]
#[1,]   67   62 32828   65   74
#[2,]   67   62 32828   65   74
#[3,]   67   62 32828   65   74

# you can set the column names at the end using `paste0("new_col", 1:300)`
# [1:5] is only used in this example because the sample strings are 20 characters long
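The naming step mentioned in the comments can be sketched as follows. This is one possible variant, not the answer's exact code: it swaps `sapply` for `vapply` and computes the number of groups from the string length, and the names `n`, `starts`, `out` and `new_cols` are chosen here for illustration.

```r
BigString <- c("0043003E803C0041004A", "0043003E803C0041004A", "0043003E803C0041004A")

n <- nchar(BigString[1]) %/% 4L              # groups per string (5 here, 300 for 1200 characters)
starts <- seq(1L, by = 4L, length.out = n)   # 1, 5, 9, ...

# vapply() checks that every string yields exactly n integers.
out <- t(vapply(
  BigString,
  function(x) strtoi(substring(x, starts, starts + 3L), base = 16L),
  integer(n),
  USE.NAMES = FALSE
))

colnames(out) <- paste0("new_col", seq_len(n))
new_cols <- as.data.frame(out)
```

The sketch assumes every string has the same length as the first one; `vapply` stops with an error rather than silently padding if that does not hold.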
