Suppose we have JSON strings in the following format, one record per line:
```
{"detail_time":"2016-03-30 16:00:00","device_id":"123456","os":"Html5Wap","session_flow_id":"1d1819f3-8e19-4597-b50d-ba379adcd8e5","user_longitude":0.0000,"user_latitude":0.0000,"search_id":xxx,"search_guid":-543326548,"search_type":7,"AAA":4,"BBB":-1,"CCC":[],"DDD":3,"EEE":2,"FFF":1459267200,"GGG":1459353600,"aaa":90954603,"bbb":[{"xxx":1500848,"x":1,"bf":0,"pp":2,"sroom":2,"ppp":108,"cost":97.2,"coupon":108,"drr":108},{"xxx":1500851,"x":1,"bf":0,"pp":1,"sroom":2,"ppp":108,"cost":97.2,"coupon":108,"drr":108},{"xxx":2336691,"x":1,"bf":1,"pp":1,"sroom":3,"ppp":199,"cost":169.15,"coupon":191,"drr":199},{"xxx":2336692,"x":1,"bf":1,"pp":2,"sroom":4,"ppp":102,"cost":91.8,"coupon":102,"drr":102},{"xxx":1500848,"x":1,"bf":0,"pp":2,"sroom":3,"ppp":118,"cost":106.2,"coupon":118,"drr":118},{"xxx":1500851,"x":1,"bf":0,"pp":1,"sroom":3,"ppp":118,"cost":106.2,"coupon":118,"drr":118},{"xxx":2336693,"x":1,"bf":1,"pp":1,"sroom":5,"ppp":199,"cost":169.15,"coupon":191,"drr":199},{"xxx":2336694,"x":1,"bf":1,"pp":2,"sroom":6,"ppp":112,"cost":100.3,"coupon":112,"drr":112},{"xxx":1500848,"x":1,"bf":0,"pp":2,"sroom":1,"ppp":98,"cost":88.2,"coupon":98,"drr":98},{"xxx":1500851,"x":1,"bf":0,"pp":1,"sroom":1,"ppp":98,"cost":88.2,"coupon":98,"drr":98},{"xxx":2336687,"x":1,"bf":1,"pp":1,"sroom":1,"ppp":189,"cost":160.65,"coupon":182,"drr":189},{"xxx":2336689,"x":1,"bf":1,"pp":2,"sroom":2,"ppp":93,"cost":83.3,"coupon":93,"drr":93},{"xxx":1500848,"x":1,"bf":0,"pp":2,"sroom":4,"ppp":128,"cost":115.2,"coupon":128,"drr":128},{"xxx":1500851,"x":1,"bf":0,"pp":1,"sroom":4,"ppp":128,"cost":115.2,"coupon":128,"drr":128},{"xxx":2336695,"x":1,"bf":1,"pp":1,"sroom":7,"ppp":239,"cost":203.15,"coupon":230,"drr":239},{"xxx":2336696,"x":1,"bf":1,"pp":2,"sroom":8,"ppp":121,"cost":108.8,"coupon":121,"drr":121}],"ppp_min":93.00,"ppp_max":239.00,"ppp_avg":134.88,"ppp_med":118.00,"ppp_min_cost":83.30,"ppp_min_promotion_type":-1,"ppp_min_promotion_amount":-1,"bf_ppp_min":149.00,"bf_ppp_min_cost":83.30,"bf_ppp_min_promotion_type":-1,"bf_ppp_min_promotion_amount":-1}
```
We want to extract the value of device_id. The most straightforward way is to parse each JSON string, as follows:
```python
#!/usr/bin/env python3
import json
import sys
import time


def t1():
    start = time.perf_counter()
    for line in sys.stdin:
        try:
            line = line.strip()
            decoded = json.loads(line)
            device_id = decoded["device_id"]
            print(device_id)
        except (ValueError, KeyError, TypeError):
            # Skip lines that are not valid JSON objects or lack "device_id"
            pass
    end = time.perf_counter()
    print("The cost time is: %f" % (end - start))


t1()
```
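The same JSON approach can be wrapped in a small function, which makes it easier to test. This is a minimal sketch, assuming each input line holds one JSON object; the function name `extract_device_ids_json` and the shortened sample lines are hypothetical. It also shows a side effect of the catch-and-skip error handling: a malformed line, such as one containing the anonymized `xxx` placeholder from the sample above, is silently dropped rather than reported.

```python
import json
from typing import Iterable, List


def extract_device_ids_json(lines: Iterable[str]) -> List[str]:
    """Parse each line as JSON and collect its "device_id" value.

    Lines that are not valid JSON, are not JSON objects, or lack the key
    are skipped, mirroring the try/except in the script above.
    """
    ids = []
    for line in lines:
        try:
            ids.append(json.loads(line.strip())["device_id"])
        except (ValueError, KeyError, TypeError):
            # ValueError covers json.JSONDecodeError; TypeError covers
            # valid JSON that is not an object (e.g. a list).
            continue
    return ids


if __name__ == "__main__":
    # Hypothetical, heavily shortened records for illustration only.
    sample = [
        '{"detail_time":"2016-03-30 16:00:00","device_id":"123456","os":"Html5Wap"}',
        '{"device_id":"abc","search_id":xxx}',  # invalid JSON: skipped
        '{"os":"Html5Wap"}',                    # no device_id: skipped
    ]
    print(extract_device_ids_json(sample))  # ['123456']
```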
The code above gets the job done.
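Unfortunately, this is the era of big data, and data volumes are naturally large. In a test with 10,000 records, the run time reached a staggering... nearly 10 s.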
So let's change tack and use regular-expression matching instead:
```python
#!/usr/bin/env python3
import re
import sys
import time


def t1():
    start = time.perf_counter()
    count = 0
    # Compile the pattern once, outside the loop
    pattern = re.compile(r'(?:"device_id":")([^"]+)')
    for line in sys.stdin:
        line = line.strip()
        search = pattern.search(line)
        if search:
            count += 1
            # print(search.group(1))
    end = time.perf_counter()
    print("The count is: %d" % count)
    print("The cost time is: %f" % (end - start))


t1()
```
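Wrapped as a function, the regex approach might look like the sketch below (`extract_device_ids_regex` is a hypothetical name). It hints at the trade-off: the regex never parses the rest of the line, so it still finds the value on a line that is not valid JSON, but it depends on the exact textual layout. The optional `\s*` around the colon is an assumption added here to tolerate `"device_id": "..."` with a space; the original pattern requires `"device_id":"` with no whitespace. A numeric (unquoted) device_id, or escaped quotes inside the value, are still not handled.

```python
import re
from typing import Iterable, List

# Non-capturing group for the key, capturing group for the quoted value.
# The \s* allowances are an assumption beyond the original pattern.
DEVICE_ID_RE = re.compile(r'(?:"device_id"\s*:\s*")([^"]+)')


def extract_device_ids_regex(lines: Iterable[str]) -> List[str]:
    """Return the first quoted "device_id" value found on each line."""
    ids = []
    for line in lines:
        match = DEVICE_ID_RE.search(line)
        if match:
            ids.append(match.group(1))
    return ids


if __name__ == "__main__":
    sample = [
        '{"detail_time":"2016-03-30 16:00:00","device_id":"123456","os":"Html5Wap"}',
        '{"device_id":"abc","search_id":xxx}',  # not valid JSON, still matched
        '{"device_id": "def"}',                 # space after the colon
        '{"device_id":42}',                     # unquoted value: not matched
    ]
    print(extract_device_ids_regex(sample))  # ['123456', 'abc', 'def']
```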
Note the pattern used for matching:
```python
re.compile(r'(?:"device_id":")([^"]+)')
```
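The first group, `(?:...)`, is non-capturing: it only has to match the `"device_id":"` prefix. Only the second group, `([^"]+)`, is captured, and it holds the device_id value (everything up to the next double quote).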
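A quick way to see what `(?:...)` changes: with the non-capturing group, `groups()` contains only the value, while an ordinary capturing group shifts the value to group 2. The sample line is a shortened, hypothetical record.

```python
import re

line = '{"os":"Html5Wap","device_id":"123456","search_type":7}'

# (?:...) groups without capturing, so only the value is captured.
non_capturing = re.compile(r'(?:"device_id":")([^"]+)')
m1 = non_capturing.search(line)
print(m1.groups())  # ('123456',)

# With a plain (...) group, the key prefix is captured as well.
capturing = re.compile(r'("device_id":")([^"]+)')
m2 = capturing.search(line)
print(m2.groups())  # ('"device_id":"', '123456')
print(m2.group(2))  # 123456

# The literal prefix needs no group at all; the value is still group 1.
plain = re.compile(r'"device_id":"([^"]+)')
print(plain.search(line).group(1))  # 123456
```

For this particular extraction the `(?:...)` wrapper is optional, as the last pattern shows; it becomes useful when the prefix needs alternation or a quantifier.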
On the same 10,000 records, the run time was... 0.05 s. Going by nearly 10 s versus 0.05 s, that is roughly a 200-fold speedup.
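The two timings above are not strictly like-for-like: the JSON script prints every device_id, while the regex script only counts matches, and both read from stdin. The sketch below times just the extraction step on the same in-memory lines with `timeit`. The records are synthetic (the hypothetical `make_line` helper builds a record with a 16-element nested list, loosely shaped like the sample above), so the absolute numbers, and even the ratio, will depend on record size, Python version, and machine; no particular speedup is implied.

```python
import json
import re
import timeit

PATTERN = re.compile(r'(?:"device_id":")([^"]+)')


def make_line(i: int) -> str:
    """Build one synthetic JSON record (hypothetical, for benchmarking only)."""
    record = {
        "detail_time": "2016-03-30 16:00:00",
        "device_id": str(100000 + i),
        "os": "Html5Wap",
        "bbb": [
            {"xxx": 1500848 + j, "x": 1, "bf": j % 2, "pp": 2,
             "ppp": 108 + j, "cost": 97.2, "coupon": 108, "drr": 108}
            for j in range(16)
        ],
        "ppp_min": 93.0,
    }
    # Compact separators give "device_id":"..." with no spaces,
    # which is the layout the regex expects.
    return json.dumps(record, separators=(",", ":"))


def with_json(lines):
    return [json.loads(line)["device_id"] for line in lines]


def with_regex(lines):
    ids = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            ids.append(m.group(1))
    return ids


if __name__ == "__main__":
    lines = [make_line(i) for i in range(10_000)]
    # Both approaches must agree before their timings mean anything.
    assert with_json(lines) == with_regex(lines)
    t_json = min(timeit.repeat(lambda: with_json(lines), number=1, repeat=3))
    t_regex = min(timeit.repeat(lambda: with_regex(lines), number=1, repeat=3))
    print("json.loads: %.3f s" % t_json)
    print("regex:      %.3f s" % t_regex)
    print("ratio:      %.1fx" % (t_json / t_regex))
```

Part of the regex's advantage is likely that `search` stops as soon as it finds the key near the start of the line, whereas `json.loads` always builds the full object, including the nested list.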
That's all for this comparison of parsing JSON strings versus regex matching in Python. I hope it serves as a useful reference, and thank you for supporting 服务器之家.
Original article: https://blog.csdn.net/bitcarmanlee/article/details/51026548