I've got the following JSON string in a txt file and I'm trying to extract a data frame from the 'visualLogs' variable. I can read the JSON string in and I can access the visualLogs list, but I have failed all day long to convert it into a 9-column data frame of floating point numbers.
{
  "visualScore" : 0,
  "selfReportingResults" : 5,
  "voiceScore" : "No Data",
  "selfReportScore" : 0,
  "subject" : "Baseline for patient: 108",
  "email" : "steven.vannoy@gmail.com",
  "visualLogs" : [
    "time,anger,contempt,disgust,engagement,joy,sadness,surprise,valence\r22.61086,0.00633,0.19347,0.56258,0.18005,0.00223,0.0165,0.31969,0.0\r22.81096,0.00478,0.19439,0.45847,0.09747,0.00188,0.02188,0.22043,0.0\r"
  ],
  "askedQuestions" : [
    "What is your name?",
    "How old are you?",
    "What tim is it?"
  ],
  "voiceCompleteResults" : {
    "status" : "fail"
  }
}
import json

with open(f4lJasonFileName) as data_file:
    feelDat = json.load(data_file)
x = feelDat['visualLogs'][0]  # Ultimately there will be more than one of these
All of my attempts to convert x to a data frame have failed. I've achieved getting a 1 column data frame of text values, but that's not what I need.
I've replaced those '\r' characters with commas, which produces the one-column text data frame, but I want 9 labelled columns followed by rows of floating point numbers.
1 solution
Once you have loaded the json, you need to split on \r then on the commas:
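import pandas as pd
# d is the dict returned by json.load (feelDat in the question).
spl = d["visualLogs"][0].split("\r")
# The filter drops the empty row left by the trailing \r; astype(float) converts the text values.
df = pd.DataFrame([v for v in map(lambda x: x.split(","), spl[1:]) if v[0]], columns=spl[0].split(",")).astype(float)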
This is probably easier to follow when broken into parts:
import pandas as pd
# split into lines, creating an iterator so we don't have to slice.
spl = iter(d["visualLogs"][0].rstrip().split("\r"))
# split the first line to get the column names.
columns = next(spl).split(",")
# split the remaining lines into individual rows, skipping any empty row.
rows = [v for v in (sub_str.split(",") for sub_str in spl) if len(v) > 1]
# the split values are strings, so convert them to floats.
df = pd.DataFrame(data=rows, columns=columns).astype(float)
We could also just use spl = iter(d["visualLogs"][0].split()), because split() with no arguments splits on any whitespace (including \r) and the string contains no other whitespace.
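To try the whitespace-split variant end to end, here is a minimal self-contained sketch that uses the log string from the question. The names log, header and records are my own and not part of the answer above; the final astype(float) is there because splitting text only ever yields strings.

```python
import pandas as pd

# The single visualLogs entry from the question; \r separates the records.
log = (
    "time,anger,contempt,disgust,engagement,joy,sadness,surprise,valence\r"
    "22.61086,0.00633,0.19347,0.56258,0.18005,0.00223,0.0165,0.31969,0.0\r"
    "22.81096,0.00478,0.19439,0.45847,0.09747,0.00188,0.02188,0.22043,0.0\r"
)

# split() with no arguments splits on runs of whitespace and never
# returns empty strings, so the trailing \r needs no special handling.
header, *records = log.split()

# Each record is still text at this point, so cast the whole frame to float.
df = pd.DataFrame([r.split(",") for r in records], columns=header.split(","))
df = df.astype(float)

print(df.dtypes)
```

If a value could ever be non-numeric, astype(float) will raise, which is probably what you want while debugging.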
Or use read_csv with a StringIO object:
from io import StringIO
import pandas as pd
spl = d["visualLogs"][0]
# read_csv parses the header row and converts the numeric columns to float.
df = pd.read_csv(StringIO(spl))
Which gives you:
       time    anger  contempt  disgust  engagement      joy  sadness  \
0  22.61086  0.00633   0.19347  0.56258     0.18005  0.00223  0.01650
1  22.81096  0.00478   0.19439  0.45847     0.09747  0.00188  0.02188

   surprise  valence
0   0.31969        0
1   0.22043        0
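The question mentions that visualLogs will eventually hold more than one entry. One way to handle that, sketched here under assumptions, is to parse each entry with read_csv and stack the results with pd.concat. The second log entry below is invented purely for illustration, and the log column and the names frames and df_all are my own additions.

```python
from io import StringIO

import pandas as pd

# Stand-in for the dict returned by json.load. The first entry is the one
# from the question; the second entry is made up for this example.
d = {
    "visualLogs": [
        "time,anger,contempt,disgust,engagement,joy,sadness,surprise,valence\r"
        "22.61086,0.00633,0.19347,0.56258,0.18005,0.00223,0.0165,0.31969,0.0\r"
        "22.81096,0.00478,0.19439,0.45847,0.09747,0.00188,0.02188,0.22043,0.0\r",
        "time,anger,contempt,disgust,engagement,joy,sadness,surprise,valence\r"
        "23.01106,0.00512,0.19001,0.40112,0.08123,0.00201,0.02011,0.20115,0.0\r",
    ]
}

# Parse every entry and tag its rows with the index of the entry they came from.
frames = [
    pd.read_csv(StringIO(log)).assign(log=i)
    for i, log in enumerate(d["visualLogs"])
]

# Stack the per-entry frames into one frame with a fresh 0..n-1 index.
df_all = pd.concat(frames, ignore_index=True)

print(df_all)
```

Dropping the assign call gives a plain 9-column frame if you don't need to know which entry each row came from.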