Extracting Reddit comments with Python PRAW and building a DataFrame from the results

Time: 2022-09-27 22:56:13

I'm looking to pull all the comments from a Reddit post and ultimately get the author name, comment, and upvotes into a DataFrame. I'm fairly new to programming, so I'm having a tough time.


Right now I'm pulling the stickied post using PRAW and trying to use a for loop to iterate through its comments and build a list of dictionaries holding the author and comment. For some reason it only adds the first author/comment dictionary to the list and repeats it. Here's what I have:


import praw
import pandas as pd
import pprint

reddit = praw.Reddit(xxx)  # credentials elided in the original post
sub = reddit.subreddit('ethtrader')
hot_python = sub.hot(limit=1)

for submission in hot_python:  # loop variable must match the name used below
    if submission.stickied:
        print('Title: {}, ups: {}, downs: {}'.format(submission.title, submission.ups, submission.downs))
        post = {}
        postlist = []
        submission.comments.replace_more(limit=0)
        for comment in submission.comments:
            post['Author'] = comment.author
            post['Comment'] = comment.body
            postlist.append(post)

Any ideas? Apologies for the ugly code; I'm a novice here. Thanks!


1 solution

#1


for submission in hot_python:
    if submission.stickied:
        print('Title: {}, ups: {}, downs: {}'.format(submission.title, submission.ups, submission.downs))
        postlist = []
        submission.comments.replace_more(limit=0)
        for comment in submission.comments:
            post = {}  # create a fresh dict on every iteration (moved inside the loop)
            post['Author'] = comment.author
            post['Comment'] = comment.body
            postlist.append(post)

You should create a new post dict inside the for loop. When you append a dict to the list, you are appending a reference to that dict, not a copy. On the next iteration you overwrite the same dict with the new comment's data, and the change is visible through every reference to it. The list you end up with is just a list of references to one single dict.
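A small, network-free sketch of the difference, using plain tuples as hypothetical stand-ins for PRAW comment objects (the names `build_shared` and `build_fresh` are illustrative only). It also shows that with the shared dict every list entry ends up holding the last comment's data, because each assignment overwrites the one object that all list slots point to.

```python
# Hypothetical stand-ins for PRAW comments: (author, body) tuples.
comments = [("alice", "first!"), ("bob", "hodl"), ("carol", "gm")]

def build_shared(comments):
    """Buggy version: one dict is created before the loop and reused."""
    post = {}
    postlist = []
    for author, body in comments:
        post["Author"] = author
        post["Comment"] = body
        postlist.append(post)  # appends a reference to the SAME dict each time
    return postlist

def build_fresh(comments):
    """Fixed version: a new dict is created on every iteration."""
    postlist = []
    for author, body in comments:
        post = {"Author": author, "Comment": body}  # new object per comment
        postlist.append(post)
    return postlist

shared = build_shared(comments)
fresh = build_fresh(comments)

print(shared)  # every entry shows carol / gm, the last comment
print(fresh)   # one distinct entry per comment
print(all(p is shared[0] for p in shared))  # True: all slots are one object
```

If you prefer to keep a single dict, appending a copy (`postlist.append(dict(post))`) would also avoid the aliasing, but creating the dict inside the loop is simpler.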

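The question also asks for upvotes and a DataFrame, which the answer does not cover. Below is a minimal sketch, assuming pandas is installed. It reads PRAW's `Comment.score` as the upvote count, stores the author's name instead of the `Redditor` object (the author is `None` for deleted accounts), and uses `submission.comments.list()` after `replace_more(limit=0)` so nested replies are included; iterating `submission.comments` directly yields only top-level comments. The function name `comments_to_dataframe` and the sample data are hypothetical, and the PRAW calls appear only as comments so the block runs offline.

```python
from types import SimpleNamespace

import pandas as pd

def comments_to_dataframe(comments):
    """Build a DataFrame with one row per comment.

    `comments` is any iterable of objects exposing `.author`, `.body` and
    `.score`, as PRAW Comment objects do (hypothetical helper).
    """
    rows = []
    for comment in comments:
        rows.append({
            # author is None when the account was deleted
            "Author": comment.author.name if comment.author is not None else None,
            "Comment": comment.body,
            "Upvotes": comment.score,
        })
    return pd.DataFrame(rows, columns=["Author", "Comment", "Upvotes"])

# With PRAW (not executed here; `submission` obtained as in the question):
#   submission.comments.replace_more(limit=0)
#   df = comments_to_dataframe(submission.comments.list())

# Offline check with stand-in objects (hypothetical data):
fake = [
    SimpleNamespace(author=SimpleNamespace(name="alice"), body="first!", score=12),
    SimpleNamespace(author=None, body="[deleted]", score=1),
]
df = comments_to_dataframe(fake)
print(df)
```

`replace_more(limit=0)` simply drops the "load more comments" placeholders; `replace_more(limit=None)` fetches all of them instead, at the cost of extra API requests on large threads.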
