How do I convert a raw JavaScript object to a Python dictionary?

Date: 2021-07-28 20:19:31

When screen-scraping some website, I extract data from <script> tags.
The data I get is not in standard JSON format. I cannot use json.loads().

# from
js_obj = '{x:1, y:2, z:3}'

# to
py_obj = {'x':1, 'y':2, 'z':3}

Currently, I use regular expressions to transform the raw data into JSON format.
But this approach becomes painful when I run into complicated data structures.

Do you have some better solutions?

3 answers

#1


19  

demjson.decode()

import demjson

# from
js_obj = '{x:1, y:2, z:3}'

# to
py_obj = demjson.decode(js_obj)

jsonnet.evaluate_snippet()

import json, _jsonnet

# from
js_obj = '{x:1, y:2, z:3}'

# to
py_obj = json.loads(_jsonnet.evaluate_snippet('snippet', js_obj))

ast.literal_eval()

import ast

# from (note: the keys are already quoted here, so this is a valid Python literal)
js_obj = "{'x':1, 'y':2, 'z':3}"

# to
py_obj = ast.literal_eval(js_obj)
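
A caveat on the last option, sketched below: ast.literal_eval() only accepts Python literal syntax, so input with unquoted keys or with JavaScript literals such as true or null is rejected. The helper name try_literal_eval is hypothetical and only serves the illustration.

```python
import ast


def try_literal_eval(text):
    """Return the parsed Python object, or None if text is not a Python literal."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None


# Keys quoted: this is a valid Python dict literal.
quoted = try_literal_eval("{'x':1, 'y':2, 'z':3}")

# Bare keys parse as names, which literal_eval refuses to evaluate.
unquoted = try_literal_eval("{x:1, y:2, z:3}")

# JavaScript's lowercase true is a name in Python, not the literal True.
js_literal = try_literal_eval("{'ok': true}")
```

In other words, this option fits only when the scraped text already happens to look like Python; for the unquoted form in the question, one of the first two options is the closer match.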

#2


1  

This will likely not work everywhere, but as a start, here's a simple regex that should convert the keys into quoted strings so you can pass into json.loads. Or is this what you're already doing?

In[68] : import re

In[69] : js_obj = '{x:1, y:2, z:3}'

In[70] : quote_keys_regex = r'([\{\s,])(\w+)(:)'

In[71] : re.sub(quote_keys_regex, r'\1"\2"\3', js_obj)
Out[71]: '{"x":1, "y":2, "z":3}'

In[72] : js_obj_2 = '{x:1, y:2, z:{k:3,j:2}}'

In[73] : re.sub(quote_keys_regex, r'\1"\2"\3', js_obj_2)
Out[73]: '{"x":1, "y":2, "z":{"k":3,"j":2}}'
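
As a rough sketch of how this regex could be wrapped up for reuse, the snippet below quotes the keys and hands the result to json.loads. The function name js_obj_to_dict is hypothetical. It also shows one case where the "will likely not work everywhere" warning applies: a string value that itself contains something shaped like a key.

```python
import json
import re

# Same pattern as in the answer: a bare word that follows '{', whitespace,
# or ',' and is directly followed by ':' is treated as a key.
QUOTE_KEYS_REGEX = r'([\{\s,])(\w+)(:)'


def js_obj_to_dict(js_text):
    """Quote bare keys with the regex, then parse the result as JSON."""
    return json.loads(re.sub(QUOTE_KEYS_REGEX, r'\1"\2"\3', js_text))


simple = js_obj_to_dict('{x:1, y:2, z:3}')
nested = js_obj_to_dict('{x:1, y:2, z:{k:3,j:2}}')

# Known limitation: the regex cannot tell keys from text inside string values,
# so ' b:' inside the string below gets quoted too and the JSON becomes invalid.
try:
    js_obj_to_dict('{t:"a, b:c"}')
    broke_on_string = False
except json.JSONDecodeError:
    broke_on_string = True
```

For data with free-form string values, a real parser (such as the ones in answer #1) is likely the safer route.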

#3


-4  

Simply:

import json
py_obj = json.loads(js_obj_stringified)

Above is the Python portion of the code. In the JavaScript portion of the code:

js_obj_stringified = JSON.stringify(data);

JSON.stringify turns a JavaScript object into JSON text and stores that JSON text in a string. It is a safe way to pass a JavaScript object (via POST/GET) to Python for processing.
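
To make the Python side concrete, here is a minimal sketch. It assumes the string received from the JavaScript side is what JSON.stringify({x: 1, y: 2, z: 3}) would produce; how that string reaches Python (the POST/GET handling) is left out.

```python
import json

# Assumed payload: the text JSON.stringify({x: 1, y: 2, z: 3}) would produce.
js_obj_stringified = '{"x":1,"y":2,"z":3}'

# Standard JSON, so the standard library parser handles it directly.
py_obj = json.loads(js_obj_stringified)
```

This only applies when you control the JavaScript that produces the data.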