Python re.sub(): expected string or bytes-like object

Time: 2021-09-13 15:30:08

I'm gathering tweets and filtering each one by replacing any word that starts with http(s)://, # or @ with an empty string. I have a filter function that does just this:

import re

def filter(line):
    # Remove http(s):// URLs, #hashtags and @mentions (note: shadows built-in filter)
    pattern = r"(https?://|[@#])\S*"
    s = re.sub(pattern, '', line)
    return s
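
On a plain string the function behaves as intended. The sketch below simply calls it on a made-up sample tweet string (the sample text is an assumption, not from the question); each removed token leaves its surrounding spaces behind.

```python
import re

def filter(line):
    # Same pattern as in the question; the name shadows Python's built-in filter()
    pattern = r"(https?://|[@#])\S*"
    return re.sub(pattern, '', line)

# Hypothetical sample tweet text
sample = "hello @bob see https://t.co/xyz #python"
print(repr(filter(sample)))  # prints 'hello  see  '
```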

When I try to run this, Python returns this error:

  File "C:\Users\John\Desktop\Sia_prefExer02.py", line 36, in <module>
    filteredLine = filter(line)
  File "C:\Users\John\Desktop\Sia_prefExer02.py", line 25, in filter
    s = re.sub(pattern, '', line)
  File "C:\Users\John\AppData\Local\Programs\Python\Python36-32\lib\re.py", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object

Line 36 is: filteredLine = filter(line)
Line 25 is: s = re.sub(pattern, '', line)

What is the problem here? Or is this related to the API? I'm using Tweepy.

1 answer

#1

I would assume that you are running your filter(line) function on each of the tweets, i.e. on the tweet objects themselves rather than on their text.
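
A minimal reproduction of the error, as a sketch: FakeStatus is a hypothetical stand-in for Tweepy's Status class (the real one has many more attributes); the only point is that it is an object, not a string. Newer Python versions may append the offending type to the error message, so only its beginning is shown here.

```python
import re

pattern = r"(https?://|[@#])\S*"

class FakeStatus:
    """Hypothetical stand-in for a Tweepy Status: an object, not a str."""
    def __init__(self, text):
        self.text = text

tweet = FakeStatus("hello @bob")

try:
    # Passing the object itself reproduces the question's TypeError
    re.sub(pattern, '', tweet)
except TypeError as exc:
    message = str(exc)
    print(message)  # begins with: expected string or bytes-like object

# Passing the string attribute works
print(repr(re.sub(pattern, '', tweet.text)))  # prints 'hello '
```

Passing tweet.text instead of tweet makes the error go away, which is the fix this answer suggests.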

However, based on the "Hello Tweepy" example in the Tweepy docs, you should be running it on tweet.text:

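import tweepy

# consumer_key, consumer_secret, access_token and access_token_secret
# are your own Twitter API credentials
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

public_tweets = api.home_timeline()
for tweet in public_tweets:
    # Each tweet is a Status object; its text is the string in tweet.text
    print(tweet.text)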

If you look at the API Reference, you can see that most methods return Status objects rather than strings. (Admittedly, that code is hard to read!)
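
A quick way to confirm what you are actually passing to filter() is to print the type of the loop variable. The sketch below uses types.SimpleNamespace as a hypothetical stand-in for the Status objects returned by api.home_timeline(); with real Tweepy objects, type(line) should name Tweepy's Status class instead.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for Status objects (not real Tweepy objects):
# each carries the tweet's string in a .text attribute
public_tweets = [
    SimpleNamespace(text="hi @bob"),
    SimpleNamespace(text="#tbt https://t.co/abc"),
]

for line in public_tweets:
    # Diagnostic: is the loop variable a str, and is its .text a str?
    print(type(line).__name__, isinstance(line, str), isinstance(line.text, str))
    # prints: SimpleNamespace False True
```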

You could wrap your function like so, and change line 36 to filteredLine = filter_tweet(line):

def filter_tweet(status):
    # A Status object is not a string; filter its text attribute instead
    return filter(status.text)
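
Putting the wrapper and the original filter together, a self-contained sketch of the corrected call might look like this. The tweet_status object is a hypothetical SimpleNamespace stand-in for a real Status, and the sample text is made up.

```python
import re
from types import SimpleNamespace

def filter(line):
    # Question's function (shadows the built-in filter())
    pattern = r"(https?://|[@#])\S*"
    return re.sub(pattern, '', line)

def filter_tweet(status):
    # Pull the string out of the Status-like object before filtering
    return filter(status.text)

# Hypothetical stand-in for one Status object from api.home_timeline()
tweet_status = SimpleNamespace(text="great talk @alice #pycon")
filtered_line = filter_tweet(tweet_status)
print(repr(filtered_line))  # prints 'great talk  '
```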

Side note: if this is the case, I would rename your line variable to something like tweet_status, so that it's clear it holds a Status object, not a string.
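
Along the same lines, one option (a design choice, not something Tweepy requires) is to give the string-level function a name that doesn't shadow Python's built-in filter() and to check its argument type, so the mistake produces a clearer message. The names filter_text and URL_TAG_MENTION below are hypothetical.

```python
import re

# Same pattern as in the question, compiled once
URL_TAG_MENTION = re.compile(r"(https?://|[@#])\S*")

def filter_text(text: str) -> str:
    """Remove URLs, #hashtags and @mentions from a tweet's text."""
    if not isinstance(text, str):
        # Fail early with a message that points at the likely mistake
        raise TypeError(
            f"filter_text() expected str, got {type(text).__name__}; "
            "pass status.text rather than the Status object"
        )
    return URL_TAG_MENTION.sub('', text)

def filter_tweet(tweet_status) -> str:
    """Filter a Status-like object by its .text attribute."""
    return filter_text(tweet_status.text)

print(repr(filter_text("RT @bob: nice")))  # prints 'RT  nice'
```

Passing a Status object straight to filter_text() now fails with a message that names the object's type and suggests using .text.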

If this doesn't work, please post your code so that we can diagnose the issue correctly. I'm really going out on a limb here.
