When screen-scraping a website, I extract data from <script> tags. The data I get is not in standard JSON format, so I cannot use json.loads().
# from
js_obj = '{x:1, y:2, z:3}'
# to
py_obj = {'x':1, 'y':2, 'z':3}
Currently, I use regex to transform the raw data into JSON format, but this becomes painful when I encounter complicated data structures.
Is there a better solution?
3 solutions
#1
19
demjson.decode()
import demjson
# from
js_obj = '{x:1, y:2, z:3}'
# to
py_obj = demjson.decode(js_obj)
_jsonnet.evaluate_snippet()
import json, _jsonnet
# from
js_obj = '{x:1, y:2, z:3}'
# to
py_obj = json.loads(_jsonnet.evaluate_snippet('snippet', js_obj))
ast.literal_eval() (works only when the keys are already quoted, so the string is also a valid Python literal)
import ast
# from
js_obj = "{'x':1, 'y':2, 'z':3}"
# to
py_obj = ast.literal_eval(js_obj)
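To make the limits of the ast.literal_eval() option concrete, here is a minimal sketch using only the standard library. The helper name parse_python_literal is made up for this illustration. It shows that the quoted-key string parses, while the unquoted string from the question and a string containing the JavaScript literals true and null are both rejected.

```python
import ast


def parse_python_literal(text):
    """Hypothetical helper: return (True, value) if text is a valid
    Python literal, otherwise (False, None)."""
    try:
        return True, ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # literal_eval refuses anything that is not a plain Python literal,
        # including bare names such as x, true, or null.
        return False, None


quoted = "{'x':1, 'y':2, 'z':3}"               # also a valid Python dict literal
unquoted = "{x:1, y:2, z:3}"                   # the form from the question
js_literals = "{'ok': true, 'value': null}"    # JavaScript-only literals

print(parse_python_literal(quoted))       # parses
print(parse_python_literal(unquoted))     # rejected: x, y, z are bare names
print(parse_python_literal(js_literals))  # rejected: true and null are bare names
```

So ast.literal_eval() is probably only an option when the scraped text happens to look like Python already; for unquoted keys, one of the other two options is the better fit.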
#2
1
This will likely not work everywhere, but as a start, here's a simple regex that should convert the keys into quoted strings so you can pass the result to json.loads. Or is this what you're already doing?
In[70] : quote_keys_regex = r'([\{\s,])(\w+)(:)'
In[71] : re.sub(quote_keys_regex, r'\1"\2"\3', js_obj)
Out[71]: '{"x":1, "y":2, "z":3}'
In[72] : js_obj_2 = '{x:1, y:2, z:{k:3,j:2}}'
In[73] : re.sub(quote_keys_regex, r'\1"\2"\3', js_obj_2)
Out[73]: '{"x":1, "y":2, "z":{"k":3,"j":2}}'
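As a rough sketch of how the session above could be wrapped up for reuse, the snippet below combines the same pattern with json.loads. The function name loads_unquoted_keys and the tricky input are invented for this illustration. It also shows one case where the "will likely not work everywhere" caveat bites: text inside a string value that looks like a key.

```python
import json
import re

# Same pattern as in the session above: a run of word characters that is
# preceded by '{', whitespace, or ',' and followed by ':' is treated as a key.
QUOTE_KEYS_REGEX = re.compile(r'([\{\s,])(\w+)(:)')


def loads_unquoted_keys(js_obj):
    """Hypothetical helper: quote bare keys, then parse with json.loads."""
    return json.loads(QUOTE_KEYS_REGEX.sub(r'\1"\2"\3', js_obj))


print(loads_unquoted_keys('{x:1, y:2, z:{k:3,j:2}}'))

# Limitation: the pattern cannot tell a key from text inside a string value.
# Here ' b:' inside the string gets quoted too, which corrupts the JSON.
tricky = '{note:"see a, b:c"}'
try:
    loads_unquoted_keys(tricky)
except json.JSONDecodeError as exc:
    print("regex approach failed:", exc)
```

For data that contains free text, a real parser (such as the library options in the first answer) is likely the safer choice.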
#3
-4
Simply:
import json
py_obj = json.loads(js_obj_stringified)
The above is the Python portion of the code. In the JavaScript portion of the code:
js_obj_stringified = JSON.stringify(data);
JSON.stringify turns a JavaScript object into JSON text and stores that JSON text in a string. It is a safe way to pass (via POST/GET) a JavaScript object to Python for processing.
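A small sketch of the Python side, assuming the string has already arrived from JavaScript. The value of js_obj_stringified is hardcoded here to what JSON.stringify({x: 1, y: 2, z: 3}) produces, and the raw object-literal text from the question is included for contrast.

```python
import json

# Hardcoded stand-in for the text produced by JSON.stringify({x: 1, y: 2, z: 3})
# on the JavaScript side: keys are quoted, so it is valid JSON.
js_obj_stringified = '{"x":1,"y":2,"z":3}'
py_obj = json.loads(js_obj_stringified)
print(py_obj)

# The raw object-literal text from the question is not valid JSON,
# because its keys are not quoted.
raw = '{x:1, y:2, z:3}'
try:
    json.loads(raw)
except json.JSONDecodeError as exc:
    print("not valid JSON:", exc)
```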