Implementing jq-like JSON path filtering in Python

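When calling APIs during development I often use jq to filter the JSON responses, but it never felt quite comfortable, so I wrote something I like better ^_^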

First, the requirements:

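Input: argument 1 is the object to filter (JSON text, a dict, or a list); argument 2 is the filter path

Output: the filtered result as a Python object; by default the result is also pretty-printed, with dict keys sorted alphabetically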
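Because the first argument may be either JSON text or an already-decoded object, the first step is normalizing it. Below is a minimal sketch of that step, assuming Python 3 (so JSON text arrives as `str` or `bytes`); `load_input` is a hypothetical helper name, not part of the original code.

```python
import json

def load_input(obj):
    """Normalize the first argument: decode JSON text, pass dicts/lists through."""
    # Assumption: Python 3, so JSON text arrives as str or bytes
    if isinstance(obj, (bytes, bytearray)):
        obj = obj.decode("utf-8")
    if isinstance(obj, str):
        return json.loads(obj)  # raises ValueError if obj is not valid JSON
    return obj  # already a Python object (dict, list, ...)

print(load_input('{"a": [1, 2]}'))
print(load_input([1, 2, 3]))
```

One consequence of this design: any plain Python string passed as the first argument is treated as JSON text, so a non-JSON string raises an error instead of being filtered as a string.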

Supported filters:

dict key lookup: .key
dict key list: .keys()
dict value list: .values()
dict (key, value) pairs: .items()
list index: .3 or .[3]
list negative index: .-2 or .[-2]
list slice 1: .2:6 or .[2:6]
list slice 2: .2: or .[2:]
list slice 3: .:6 or .[:6]
list step 1: .1:6:2 or .[1:6:2]
list step 2: .1::2 or .[1::2]
list step 3: .::2 or .[::2]
string indexing: same as list
string slicing: same as list
string slicing with step: same as list
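To make the mapping from this syntax to Python operations concrete, here is a sketch that classifies a single path segment purely by its syntax. `classify_segment` and the regexes are hypothetical illustrations; the actual `ppt` function decides by the type of the current object instead, so under this syntax-only view a dict key such as `"10"` would be read as a list index.

```python
import re

# start:stop or start:stop:step, each field optional and possibly negative
SLICE_RE = re.compile(r"^(-?\d+)?(:(-?\d+)?){1,2}$")
INDEX_RE = re.compile(r"^-?\d+$")

def classify_segment(seg):
    """Classify one dot-separated path segment by its syntax alone."""
    # .[3] and .3 mean the same thing, so strip optional brackets first
    inner = seg[1:-1] if seg.startswith("[") and seg.endswith("]") else seg
    if SLICE_RE.match(inner):
        # Empty fields become None so Python supplies the defaults
        parts = [int(p) if p else None for p in inner.split(":")]
        return ("slice", slice(*parts))
    if INDEX_RE.match(inner):
        return ("index", int(inner))
    if seg.endswith("()"):
        return ("method", seg[:-2])
    return ("key", seg)

for seg in ".a.-2.[2:6].::2.keys()".split("."):
    if seg:
        print(repr(seg), classify_segment(seg))
```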

Enough talk, here is the core code. It works on both Python 2 and Python 3:

from __future__ import unicode_literals

import json
import six  # third-party: pip install six

def ppt(obj, path='.', with_print=True, normal_path_print=False):
    base_string = str if six.PY3 else basestring
    # Accept JSON text as well as already-decoded dicts/lists
    obj = json.loads(obj) if isinstance(obj, base_string) else obj
    # Templates for rebuilding the plain Python lookup:
    # 0 = dict key, 1 = index/slice, 3 = method call
    find_str, find_map = '', ['["%s"]', '[%s]', '%s', '.%s']
    for im in path.split('.'):
        if not im:
            continue
        if isinstance(obj, (list, tuple, base_string)):
            # .[3] is equivalent to .3
            if im.startswith('[') and im.endswith(']'):
                im = im[1:-1]
            if ':' in im:
                # Empty fields become None so Python picks the defaults;
                # fixed defaults such as [0, len(obj), 1] break negative steps
                obj, quota = obj[slice(
                    *[int(sli) if sli else None for sli in im.split(':')])], 1
            else:
                obj, quota = obj[int(im)], 1
        else:
            if im in obj:
                obj, quota = obj[im], 0
            elif im.endswith('()'):
                # .keys() / .values() / .items()
                obj, quota = list(getattr(obj, im[:-2])()), 3
            elif im.isdigit():
                # dicts with integer keys (never produced by json.loads)
                obj, quota = obj[int(im)], 1
            else:
                raise KeyError(im)
        find_str += find_map[quota] % im
    if with_print:
        print(obj if isinstance(obj, base_string) else
              json.dumps(obj, indent=4, sort_keys=True, ensure_ascii=False))
    if normal_path_print:
        print('get it normally with: <obj>%s' % find_str)
    return obj
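One subtlety when turning slice text such as `1::2` into a `slice` object: filling empty fields with fixed defaults like `[0, len(obj), 1]` works for positive steps but breaks negative ones, because `slice(0, len(obj), -1)` selects nothing. Leaving empty fields as `None` lets Python choose the correct defaults for either direction. A small sketch (`parse_slice` is a hypothetical helper name):

```python
def parse_slice(text):
    """Turn 'start:stop:step' text into a slice; empty fields become None."""
    parts = [int(p) if p else None for p in text.split(":")]
    return slice(*parts)

data = [1, 3, 4, 9, 10, 0, 5, 3, 7]

# Fixed defaults with a negative step: start=0, stop=9, step=-1 selects nothing
print(data[slice(0, len(data), -1)])  # []
# None defaults: Python walks the whole list backwards
print(data[parse_slice("::-1")])      # [7, 3, 5, 0, 10, 9, 4, 3, 1]
print(data[parse_slice("1::2")])      # [3, 9, 0, 3]
```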

Function name: ppt, short for pretty print. I couldn't come up with a better short name ?_?

Parameters:

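obj: the input object (JSON text, a dict, or a list)
path='.': the filter path
with_print=True: whether to pretty-print the filtered result
normal_path_print=False: whether to also print the plain Python lookup that the filter path translates to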
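The `normal_path_print` option works by translating each path segment back into ordinary Python indexing. Here is a syntax-only sketch of that translation; `to_python_expr` is a hypothetical helper, and unlike `ppt` it never looks at the data, so a dict key that looks like a number would be rendered as a list index.

```python
import json

def to_python_expr(path):
    """Rewrite a dotted ppt-style path as a plain Python lookup expression."""
    out = []
    for seg in path.split("."):
        if not seg:
            continue
        if seg.endswith("()"):
            out.append("." + seg)                 # method call: .keys()
        elif seg.startswith("[") and seg.endswith("]"):
            out.append(seg)                       # already bracketed: [3]
        elif ":" in seg or seg.lstrip("-").isdigit():
            out.append("[%s]" % seg)              # index or slice: [0], [::2]
        else:
            # dict key as a double-quoted string literal: ["a"]
            out.append("[%s]" % json.dumps(seg, ensure_ascii=False))
    return "<obj>" + "".join(out)

print(to_python_expr(".a.::2"))
print(to_python_expr(".c.0.keys()"))
```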

Examples:

> test = '{"a": [1, 3, 4, 9, 10, 0, 5, 3, 7], "c": [{"h": 1, "d": [{"e": ["f", "g"]}]}], "b": "1234567890", "d": null}'

> ppt(test)
{
    "a": [
        1,
        3,
        4,
        9,
        10,
        0,
        5,
        3,
        7
    ],
    "b": "1234567890",
    "c": [
        {
            "d": [
                {
                    "e": [
                        "f",
                        "g"
                    ]
                }
            ],
            "h": 1
        }
    ],
    "d": null
}

The keys in the output above are sorted alphabetically.

> ppt(test, '.a.::2', normal_path_print=True)
[
    1,
    4,
    10,
    5,
    7
]
get it normally with: <obj>["a"][::2]

> ppt(test, '.c.0.keys()', normal_path_print=True)
[
    "h",
    "d"
]
get it normally with: <obj>["c"][0].keys()
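Note that the `.keys()` result comes out as `["h", "d"]`, not alphabetically. `.keys()` produces a list, and `sort_keys=True` in `json.dumps` only orders the keys of dicts, never the elements of lists. On Python 3.7+ dicts preserve insertion order, so the keys appear in the order they had in the JSON source. A quick check with made-up data:

```python
import json

obj = {"h": 1, "d": [3, 1, 2]}

# sort_keys reorders dict keys only; list elements keep their order
print(json.dumps(obj, sort_keys=True))                # {"d": [3, 1, 2], "h": 1}
print(json.dumps(list(obj.keys()), sort_keys=True))   # ["h", "d"]
```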

Where this comes in handy:

Take a convoluted lookup such as

['all_angles'][0]['nodes'][-1]['children'][1]['children'][3]['id']

Switching to the dotted path lets you locate the same data much more quickly:

'all_angles.0.nodes.-1.children.1.children.3.id'
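Resolving the dotted form is essentially a left fold over the segments. A stripped-down sketch (hypothetical helpers `step` and `resolve`, supporting only dict keys and list indices, no slices or methods) confirms that both forms reach the same value on made-up sample data:

```python
from functools import reduce

def step(obj, seg):
    """Apply one path segment: sequences take an int index, dicts take a key."""
    if isinstance(obj, (list, tuple, str)):
        return obj[int(seg)]
    return obj[seg]

def resolve(obj, path):
    # Skip empty segments so a leading dot is optional
    return reduce(step, [s for s in path.split(".") if s], obj)

# Hypothetical sample data shaped to match the lookup above
data = {"all_angles": [{"nodes": [{}, {"children": [
    {}, {"children": [{}, {}, {}, {"id": 42}]}]}]}]}

dotted = resolve(data, "all_angles.0.nodes.-1.children.1.children.3.id")
chained = data["all_angles"][0]["nodes"][-1]["children"][1]["children"][3]["id"]
assert dotted == chained
print(dotted)
```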
