What is the best way to get the data I want from Twitter?

Date: 2023-01-14 11:17:44

I'm currently saving some Twitter data in MySQL. My host only lets me run cron jobs every hour, so to semi-simulate realtime results, I've copied the same file 6 times, and run one every 10 minutes (the host DOES let you control the hourly offset). This is stupid, I think.

Is there some mechanism I can learn about that would push the data my way? Any thoughts or suggestions welcome.

(I've steered myself away from just querying their server with each page view; I know enough to know that's poor practice)

5 Answers

#1


How about adding a cron entry on the client side (your home system) that accesses a web page hosted on the server, which in turn executes the program:

/usr/bin/curl http://yourserver.com/twitter
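
For instance, a crontab entry on the client machine that hits that URL every ten minutes might look like the line below (the -s flag just silences curl's progress output):

*/10 * * * * /usr/bin/curl -s http://yourserver.com/twitter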

Otherwise, you can run the following bash script every hour:

#!/bin/bash
# Run this from the single hourly cron job: it hits the endpoint six
# times, ten minutes apart, covering the whole hour.
for (( i = 0; i < 6; i += 1 )); do
    /usr/bin/curl 'http://yourserver.com/twitter'
    sleep 600
done

#2


You can sanely pull Twitter data triggered by your requests. It's a little esoteric, but essentially you store locking data in a table to ensure that only one request polls Twitter every N minutes (or however often you need it). Example (a code sketch follows the list):

  1. A request checks whether new Twitter data needs to be retrieved.
  2. Check the lock table to see whether another request is already talking to Twitter.
  3. Add a record to the lock table. Make sure to write it to a column that is set to unique via a database constraint; this keeps you from taking two locks.
  4. Talk to Twitter and save the data.
  5. Remove the lock record.
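
A minimal sketch of that flow in Python, assuming a lock table created as CREATE TABLE twitter_lock (name VARCHAR(32) PRIMARY KEY), a tweets table with a fetched_at timestamp column, and hypothetical fetch_from_twitter()/save_tweets() helpers (all names invented here for illustration):

import time
import mysql.connector

FRESHNESS = 600  # poll Twitter at most once every N = 600 seconds

def maybe_refresh(conn):
    cur = conn.cursor()
    # 1. Does new Twitter data need to be retrieved?
    cur.execute("SELECT UNIX_TIMESTAMP(MAX(fetched_at)) FROM tweets")
    (last,) = cur.fetchone()
    if last is not None and time.time() - last < FRESHNESS:
        return  # cache is still fresh; serve straight from MySQL
    try:
        # 2-3. Take the lock. The unique (PRIMARY KEY) constraint makes a
        # second concurrent INSERT fail instead of creating a second lock.
        cur.execute("INSERT INTO twitter_lock (name) VALUES ('twitter')")
        conn.commit()
    except mysql.connector.IntegrityError:
        conn.rollback()
        return  # another request is already talking to Twitter
    try:
        # 4. Talk to Twitter, save the data (hypothetical helpers).
        save_tweets(cur, fetch_from_twitter())
        conn.commit()
    finally:
        # 5. Remove the lock record, even if the fetch failed.
        cur.execute("DELETE FROM twitter_lock WHERE name = 'twitter'")
        conn.commit()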

For speed, ensure your lock table is held in memory (e.g. MySQL's MEMORY engine) or use memcached instead. Of course, if you can use memcached you probably have full control over cron anyway. :)

#3


A relatively simple solution is to run a cron job on another computer. It would make the requests to Twitter and then execute an HTTP POST to a designated page on the server (e.g. http://foo.com/latestTwitterData). Of course, you would want some authentication to prevent random crap from getting sent to you.

I don't know if this is reasonable for your situation.

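A rough sketch of that push in Python, run by cron on the second machine; the endpoint URL is the example one above, and the X-Auth-Token shared-secret header is just one assumed way to do the authentication:

import json
import urllib.request

SHARED_SECRET = "change-me"  # the receiving page checks this value

def push(data):
    req = urllib.request.Request(
        "http://foo.com/latestTwitterData",
        data=json.dumps(data).encode(),
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": SHARED_SECRET,  # crude guard against junk POSTs
        },
    )
    urllib.request.urlopen(req)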

#4


It's pretty easy to run code every second or so.

# Runnable Python version of the pseudocode; do_request() is a
# placeholder for whatever fetches and stores the Twitter data.
import time

while True:
    do_request()       # poll Twitter
    time.sleep(1)      # wait one second between requests

#5


Why not just put a while loop into your program and sleep N seconds between updates, for however long you need them? You can then have it die after 59 minutes 30 seconds, just before the next hourly cron run starts a fresh copy.

Alternatively, instead of copying the file multiple times, you can chain multiple calls to your program within a single cron line. Something like:

./prog.pl; sleep 60; ./prog.pl

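A small sketch of the self-terminating loop in Python, assuming the actual work lives in a hypothetical poll_twitter() function:

import time

INTERVAL = 60            # N seconds between updates
LIFETIME = 59 * 60 + 30  # die after 59 minutes 30 seconds

start = time.time()
while time.time() - start < LIFETIME:
    poll_twitter()       # hypothetical: fetch and store the data
    time.sleep(INTERVAL)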
