How do I create a table from text in R?

Time: 2022-02-12 01:12:24

In R, what would be the best way to separate the following data into a table with 2 columns?

March 09, 2018
0.084752
March 10, 2018
0.084622
March 11, 2018
0.084622
March 12, 2018
0.084437
March 13, 2018
0.084785
March 14, 2018
0.084901

I considered using a for loop but was advised against it. I do not know how to parse things very well, so if the best method involves this process please be as clear as possible.

The final table should look something like this:

https://i.stack.imgur.com/u5hII.png

Thank you!

3 solutions

#1


1  

Input:

input <- c("March 09, 2018",
"0.084752",
"March 10, 2018",
"0.084622",
"March 11, 2018",
"0.084622",
"March 12, 2018",
"0.084437",
"March 13, 2018",
"0.084785",
"March 14, 2018",
"0.084901")

Method:

library(dplyr)
library(lubridate)
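# Fill a two-column matrix row by row, naming the columns up front because
# recent versions of as_tibble() expect a matrix to have column names
df <- matrix(input, ncol = 2, byrow = TRUE,
             dimnames = list(NULL, c("V1", "V2"))) %>% 
  as_tibble() %>% 
  mutate(V1 = mdy(V1), V2 = as.numeric(V2))  # parse the dates and the numbers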

Output:

df
# A tibble: 6 x 2
  V1             V2
  <date>      <dbl>
1 2018-03-09 0.0848
2 2018-03-10 0.0846
3 2018-03-11 0.0846
4 2018-03-12 0.0844
5 2018-03-13 0.0848
6 2018-03-14 0.0849

Use names() or rename() to rename the columns.

names(df) <- c("Date", "Value")
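The rename() route mentioned above can be sketched as follows. This is a minimal example that assumes dplyr is installed; the two-row data frame is only a stand-in for df, and the new names Date and Value are the same ones used with names().

```r
library(dplyr)

# Stand-in for `df` from the method above (default names V1 and V2)
df <- data.frame(
  V1 = as.Date(c("2018-03-09", "2018-03-10")),
  V2 = c(0.084752, 0.084622)
)

# rename() takes new_name = old_name pairs and returns a modified copy
df <- rename(df, Date = V1, Value = V2)
names(df)
```

Unlike names(), rename() only touches the columns you list, which helps when a table has many columns and just a few need new names.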

#2


1  

data.table::fread can read "...a string (containing at least one \n)...." The 'f' in fread stands for 'fast', so the code below should also work on fairly large inputs.

library(data.table)

x = 'March 09, 2018
0.084752
March 10, 2018
0.084622
March 11, 2018
0.084622
March 12, 2018
0.084437
March 13, 2018
0.084785
March 14, 2018
0.084901'

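# Read every line of the string into a single character column, V1
o = fread(x, sep = '\n', header = FALSE)
# Put the following line (the value) next to each date
o[, V1L := shift(V1, type = "lead")]
# Flag the odd rows, which hold the dates
o[, keep := seq_len(.N) %% 2 != 0]

# Keep only the date rows, then drop the helper column
z = o[(keep)]
z[, keep := NULL]
z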
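At this point z should still carry the default column names V1 and V1L, with the values stored as text. A minimal sketch of tidying that up, assuming data.table is installed; the two-row table is a stand-in for z, and the names date and value are illustrative choices rather than part of the answer.

```r
library(data.table)

# Stand-in for `z`: dates in V1, values in V1L, both character
z <- data.table(
  V1  = c("March 09, 2018", "March 10, 2018"),
  V1L = c("0.084752", "0.084622")
)

# setnames() renames columns by reference, without copying the table
setnames(z, c("V1", "V1L"), c("date", "value"))

# Replace the text column with its numeric equivalent
z[, value := as.numeric(value)]
z
```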

#3


0  

# Fill a 2-column matrix row by row so each date is paired with its value
result = data.frame(matrix(input, ncol = 2, byrow = TRUE), stringsAsFactors = FALSE)
result
#               X1       X2
# 1 March 09, 2018 0.084752
# 2 March 10, 2018 0.084622
# 3 March 11, 2018 0.084622
# 4 March 12, 2018 0.084437
# 5 March 13, 2018 0.084785
# 6 March 14, 2018 0.084901

You should next adjust the names and classes, something like this:

names(result) = c("date", "value")
result$value = as.numeric(result$value)
# etc.
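The remaining class adjustment is presumably the date column. A base R sketch of that step, assuming an English locale (the "%B" format code matches month names in the current locale); the two-row data frame is only a stand-in for result.

```r
# Small stand-in for `result` after the renaming step above
result <- data.frame(
  date  = c("March 09, 2018", "March 10, 2018"),
  value = c(0.084752, 0.084622),
  stringsAsFactors = FALSE
)

# "%B" is the full month name, "%d" the day, "%Y" the four-digit year
result$date <- as.Date(result$date, format = "%B %d, %Y")
str(result)
```

If the month names do not match the locale, as.Date() returns NA instead of raising an error, so it is worth checking the column for missing values afterwards.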

Using Nik's nice input:

input = c(
    "March 09, 2018",
    "0.084752",
    "March 10, 2018",
    "0.084622",
    "March 11, 2018",
    "0.084622",
    "March 12, 2018",
    "0.084437",
    "March 13, 2018",
    "0.084785",
    "March 14, 2018",
    "0.084901"
)
