How to get content from web and convert from bytes to str/json

I have a very large txt file to read and process, but as I'm new to Python, I don't know what the format of the file is or how to read it. Below is a sample:

[
    {"content": "111111", "n": "ITEM 1", "a": "ANOTHER", "t": 1},
    {"content": "222222", "n": "ITEM 2", "a": "ANOTHER", "t": 1},
    {"content": "333333", "n": "ITEM 3", "a": "ANOTHER", "t": 1}
]

So, I need to loop over each item inside the list '[]' (which I think I did), and then over each field of an item, such as "content", "n", "a", "t".

I tried to read the file and loop over it like this:

for item in thecontent:
    data = json.load(item)

pprint(data)

I think each 'item' in the loop above comes out as a string, not as JSON.

Edit 2: I think that I need to use the ujson data type, as the sample I found in the documentation is the same as the one above. If you want to know more, go to the documentation page.

>>> import ujson
>>> ujson.dumps([{"key": "value"}, 81, True])
'[{"key":"value"},81,true]'
>>> ujson.loads("""[{"key": "value"}, 81, true]""")
[{u'key': u'value'}, 81, True]

Thanks everyone!

Edit 3: I kept looking for an answer to the problem I had, and found that it wasn't about 'how to read' a list of tuples, because I had already managed that with the file.

The main problem was how to convert bytes to a string when getting content from the web, and I solved it in this topic, more specifically in this reply.

The code I wrote to get the web content and convert it to JSON is:

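import json
import requests

def get_json_by_url(url):
    r = requests.get(url)
    r.raise_for_status()  # raise an exception on HTTP error status codes
    return json.loads(r.content.decode('utf-8'))  # bytes -> str -> Python objects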
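The decode step can be tried on its own, without a network connection. This is only a minimal sketch, assuming the body is UTF-8 encoded JSON; `parse_json_bytes` and the sample payload are hypothetical names used for illustration and are not part of the code above:

```python
import json


def parse_json_bytes(raw, encoding="utf-8"):
    """Decode a bytes payload to str, then parse the str as JSON."""
    text = raw.decode(encoding)  # bytes -> str
    return json.loads(text)      # str -> Python objects (list, dict, ...)


# Hypothetical payload standing in for r.content from requests.get(url)
payload = b'[{"content": "111111", "n": "ITEM 1", "a": "ANOTHER", "t": 1}]'
items = parse_json_bytes(payload)
print(items[0]["n"])
```

With requests, `r.json()` does the decoding and parsing in one call, so the explicit `decode` is mainly useful when you only have the raw bytes.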

So, since this may be a solution for anyone looking for the same thing, I've changed the title from 'How to read a list of tuples (or json) in python' to 'How to get content from web and convert from bytes to str/json', which was the problem I actually had.

I'm sorry for not explaining the problem very well; as I'm new to Python, it sometimes takes me a lot of time to work out what the problem actually is.

Thanks all!

1 Solution

#1

These two solutions both worked for me, and both assume that the file is in the format shown in your example above. Which one to use depends on what you want to do with the data after you've loaded it from the file (you didn't specify this).

First, the simple/fast version, which ends up with all the data in one list (a list of dictionaries):

import json

with open("myFile.txt", "r") as f:
    data = json.load(f)  #  load the entire file content into one list of many dictionaries

#  process data here as desired, possibly in a loop if you like
print(data)

Output (Python 2):

[{u'content': u'111111', u'a': u'ANOTHER', u't': 1, u'n': u'ITEM 1'}, {u'content': u'222222', u'a': u'ANOTHER', u't': 1, u'n': u'ITEM 2'}, {u'content': u'333333', u'a': u'ANOTHER', u't': 1, u'n': u'ITEM 3'}]
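Once the data is loaded, each element of the list is a dictionary, so the fields from the question ("content", "n", "a", "t") can be read by key. A small sketch under stated assumptions: it targets Python 3, and it parses an inline string (`sample`, a hypothetical stand-in for the file contents) with `json.loads` so that it runs without `myFile.txt`:

```python
import json

# Sample text in the same format as the file in the question
sample = """[
    {"content": "111111", "n": "ITEM 1", "a": "ANOTHER", "t": 1},
    {"content": "222222", "n": "ITEM 2", "a": "ANOTHER", "t": 1},
    {"content": "333333", "n": "ITEM 3", "a": "ANOTHER", "t": 1}
]"""

data = json.loads(sample)  # a list of dictionaries

names = []
for item in data:
    # each item is a dict, so its fields are read by key
    names.append(item["n"])
    print(item["content"], item["n"], item["a"], item["t"])
```

Note that `json.load` takes a file object while `json.loads` takes a string, which is why the loop in the question (calling `json.load` on each line) does not work.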

For a very large file, or if you don't want all the data in a single list:

import json

with open("myFile.txt", "r") as f:
    for line in f:                       #  for each line in the file
        line = line.strip(", ][\n")      #  strip off any leading and trailing commas, spaces, square brackets and newlines
        if len(line):                    #  if there is anything left in the line it should look like "{ key: value... }"
            try:
                data = json.loads(line)  #  load the line into a single dictionary
                #  process a single item (dictionary) of data here in whatever way you like
                print(data)
            except ValueError:           #  raised by json.loads when the line is not valid JSON
                print("invalid json:  " + line)

Output (Python 2):

{u'content': u'111111', u'a': u'ANOTHER', u't': 1, u'n': u'ITEM 1'}
{u'content': u'222222', u'a': u'ANOTHER', u't': 1, u'n': u'ITEM 2'}
{u'content': u'333333', u'a': u'ANOTHER', u't': 1, u'n': u'ITEM 3'}
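The same line-by-line idea can be wrapped in a generator so the caller decides what to do with each item. This is a rough sketch for Python 3, not a general streaming JSON parser: it assumes, as the code above does, that every object sits on its own line. `iter_items` and `fake_file` are hypothetical names, and `io.StringIO` stands in for the open file:

```python
import io
import json


def iter_items(lines):
    """Yield one dict per line from a bracketed, one-object-per-line JSON list."""
    for line in lines:
        line = line.strip(", ][\n")  # drop commas, spaces, brackets, newlines
        if not line:
            continue  # skip lines that held only "[" or "]"
        try:
            yield json.loads(line)
        except ValueError:
            # json.JSONDecodeError is a subclass of ValueError
            print("invalid json:  " + line)


# io.StringIO stands in for an open file here
fake_file = io.StringIO(
    '[\n'
    '    {"content": "111111", "n": "ITEM 1", "a": "ANOTHER", "t": 1},\n'
    '    {"content": "222222", "n": "ITEM 2", "a": "ANOTHER", "t": 1}\n'
    ']\n'
)
items = list(iter_items(fake_file))
```

Because only one line is held in memory at a time, replacing `list(...)` with a plain `for` loop keeps memory use flat even for a very large file.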

The first option should be fine in most cases, even for reasonably large files.
