I have two lines of text from which I would like to extract certain information.
The first line (first few terms) looks like
line = "{"af":"16.63","al":"11.58",..."
I would like to extract only the keys (the letters between the first pair of quotes in each term) into a list if possible, e.g. ["af","al",...].
The second line is very long and contains a sequence which looks like
line = "...,"name":"Papua New Guinea"},..."
I just want the string that follows "name": (the <country> part) to go into another list if possible, e.g. [...,"Papua New Guinea",...]. The same pattern, "name":"<country>"}, appears again and again; I would just like the countries.
Both could perhaps be piped into two lists in different files using sed. I just need to get rid of all of the surrounding "fluff".
I've tried a combination of regexes, but it doesn't work; I can't get the syntax right. Thanks in advance.
1 answer
#1
You are looking at JSON data; use the json module to parse it into Python structures. The rest of your tasks are then easy:
import json
first_structure = json.loads(line)
print(list(first_structure))  # the keys, e.g. ['af', 'al', ...]
second_structure = json.loads(countries_text)  # assumed to be a JSON list of objects
print([d['name'] for d in second_structure])
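As a self-contained sketch of the same idea, the snippet below runs the two extractions on small made-up inputs. The sample strings, the "code" field, and the helper names extract_keys and extract_names are assumptions for illustration only; the real lines are much longer and may be shaped differently.

```python
import json

# Hypothetical sample inputs standing in for the two (much longer) real lines.
first_line = '{"af":"16.63","al":"11.58"}'
countries_text = (
    '[{"code":"pg","name":"Papua New Guinea"},'
    '{"code":"al","name":"Albania"}]'
)


def extract_keys(text):
    """Return the keys of a JSON object, in the order they appear."""
    return list(json.loads(text))


def extract_names(text):
    """Return the "name" value of every object in a JSON array."""
    return [item["name"] for item in json.loads(text)]


codes = extract_keys(first_line)
names = extract_names(countries_text)

print(codes)  # ['af', 'al']
print(names)  # ['Papua New Guinea', 'Albania']
```

If the lists need to end up in separate files, as the question suggests, each one can then be written out with something like "\n".join(codes) and an ordinary open(..., "w") call.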