I have the following data:
id,response,date
123,{"showAgain":1421547783703,"answer":null,"details":null,"user_id":2423553}, 2015-01-11 02:23:03
124,{"showAgain":1421683620119,"answer":["Never"],"details":null,"user_id":4933822,"company_id":992211,"category":"apple"}, 2015-01-12 16:06:56
125,{"showAgain":1421692043509,"answer":["Sometimes","other"],"details":"I like bread.","user_id":2390922,"company_id":119988,"category":"banana"},2015-01-12 18:27:23
To be clear, the "response" column values are what you see within the curly brackets.
I need to break that response into new columns, but the string doesn't always contain the same number of fields. The desired output would be this:
id,answer,details,user_id,company_id,category,date
123,NA,NA,2423553,NA,NA,2015-01-11 02:23:03
124,Never,NA,4933822,992211,apple,2015-01-12 16:06:56
125,Other,"I like bread",2390922,119988,banana,2015-01-12 18:27:23
The NA can also be blank or NULL; I'm indifferent. On row 3, "answer" could also be a concatenation of the two replies, "Sometimes.Other", or it could be broken out into a new column called answer2. There will never be more than 2 values in the incoming "answer" field (95% of the time it will be 1 value).
Any clues on how to approach this would be welcome.
1 solution
#1
Here's a start:
library(stringr)
library(dplyr)
library(jsonlite)
library(data.table)
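# Read the raw file; the first line is the header
lines <- readLines("data.txt")
# x is the match matrix for one line: x[2] = id, x[3] = JSON string, x[4] = date
build_cols <- function(x) {
  data.frame(cbind(id=x[2], date=x[4], rbind(fromJSON(x[3]))))
}
# Split each data line into id, JSON, and date, then stack the rows;
# fill=TRUE pads the keys that are missing from a row
rbindlist(lapply(str_match_all(lines[2:length(lines)],
                               "([[:digit:]]+),(\\{.*\\}),(.*$)"),
                 build_cols), fill=TRUE) %>%
  select(id,answer,details,user_id,company_id,category,date)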
## id answer details user_id company_id category date
## 1: 123 NULL NULL 2423553 NULL NULL 2015-01-11 02:23:03
## 2: 124 Never NULL 4933822 992211 apple 2015-01-12 16:06:56
## 3: 125 Sometimes,other I like bread. 2390922 119988 banana 2015-01-12 18:27:23
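The output above still holds NULL entries and a multi-valued answer cell. If plain NA values and a "."-joined answer are preferred (both were listed as acceptable in the question), a minimal sketch along these lines may help. It uses only base R plus jsonlite; the helper name parse_line and the inline sample rows are assumptions for illustration, not part of the original answer.

```r
library(jsonlite)

# Sample rows copied from the question; in practice: lines <- readLines("data.txt")[-1]
lines <- c(
  '123,{"showAgain":1421547783703,"answer":null,"details":null,"user_id":2423553}, 2015-01-11 02:23:03',
  '125,{"showAgain":1421692043509,"answer":["Sometimes","other"],"details":"I like bread.","user_id":2390922,"company_id":119988,"category":"banana"},2015-01-12 18:27:23'
)

# Keys to pull out of the JSON payload, in output order
fields <- c("answer", "details", "user_id", "company_id", "category")

# parse_line is a hypothetical helper: one raw line in, one-row data frame out
parse_line <- function(line) {
  # Capture id, JSON payload, and date; [[:space:]]* drops the stray space before some dates
  m <- regmatches(line, regexec("^([0-9]+),(\\{.*\\}),[[:space:]]*(.*)$", line))[[1]]
  json <- fromJSON(m[3])
  # Missing or null keys become NA; multi-valued answers are joined with "."
  vals <- lapply(fields, function(f) {
    v <- json[[f]]
    if (is.null(v)) NA_character_ else paste(v, collapse = ".")
  })
  names(vals) <- fields
  data.frame(id = m[2], vals, date = m[4], stringsAsFactors = FALSE)
}

result <- do.call(rbind, lapply(lines, parse_line))
print(result)
```

Every column comes back as character here, so user_id and company_id would need as.integer() if numeric types matter. Writing the result with write.csv(result, row.names = FALSE) should give a file close to the desired layout.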