How do I run a Python program to periodically update an existing pandas DataFrame?

Time: 2021-04-10 09:25:07

I am creating panel data by importing from a database's API using a function called instance. It generates a pd.DataFrame column of 200 dict objects, one for each of the 200 members of the panel, and each dict contains the values of the same variables (e.g. "Number of comments" and "Number of views").

This data is constantly updated in real time, and the database does not keep a history of it. In other words, to track how the data changes over time, one must manually call the function instance at every desired interval (e.g. every hour).

I am wondering how to write a program that passively runs my instance function every hour and appends each result to the results of the previous runs. For this purpose, I have found the threading module of potential interest, particularly its Timer class, but have had difficulty applying it effectively. This is what I have come up with:

def instance_log(year, month, day, loglength):
    start = datetime.datetime.now()    
    log = instance(year,month,day)
    t = threading.Timer(60, log.join(instance(year, month, day)))
    t.start()
    if datetime.datetime.now() > start+datetime.timedelta(hours=loglength):
        t.cancel()
        return(log)

I tried running this program with loglength=1 (i.e. updating the log DataFrame every minute for an hour), but it failed. Any help diagnosing what I did wrong, or suggesting an alternative way to achieve what I want, would be greatly appreciated.

By the way, to avoid confusion, I should clarify that the inputs year, month, and day are used to identify the 200 panel members, so that I use the same panelists for each iteration of instance.

1 Solution

#1


Without knowing much about the API of your Instance (assuming it's a class), this is how I would do it:

#!/usr/bin/env python


from __future__ import print_function


from circuits import Event, Component, Timer


class Instance(object):
    """My Instance Object"""


class App(Component):

    def init(self, instance, year, month, day):
        self.instance = instance

        # Create a scheduled event every hour. The arguments given to
        # Event.create() are passed to the handler each time the event fires.
        Timer(60 * 60, Event.create("log_instance", year, month, day), persist=True).register(self)

    def log_instance(self, year, month, day):
        """Event Handler for scheduled log_instance Event"""
        log = self.instance(year, month, day)
        print(log)  # Do something with log


instance = Instance()  # create instance?
App(instance, 2021, 4, 10).run()  # placeholder year, month, day

This doesn't use Python's threading library, but it provides a reusable and composable event-driven structure that you can extend using the circuits framework. (Caveat: I'm the author of this framework/library and am biased towards event-driven approaches!)

NB: This is untested code as I'm not familiar with your exact requirements or your Instance's API (nor have you really shown that in the question).
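
For comparison, here is a minimal sketch that uses only the standard library and pandas. Two things in the question's code are worth noting: `threading.Timer(60, log.join(...))` evaluates `log.join(...)` immediately and hands its result to the timer rather than a callable, and a `Timer` fires only once, so it does not repeat on its own. The sketch swaps the timer for a plain loop with `time.sleep` and stacks the snapshots with `pd.concat` instead of `join`. The names `collect_snapshots`, `fake_instance`, and the `fetched_at` column are hypothetical, and it assumes `instance(year, month, day)` returns a DataFrame.

```python
import time

import pandas as pd


def collect_snapshots(fetch, interval_seconds, duration_seconds):
    """Call fetch() every interval_seconds until duration_seconds have elapsed.

    Returns a single DataFrame with all snapshots stacked row-wise and a
    'fetched_at' column recording when each snapshot was taken.
    """
    snapshots = []
    deadline = time.monotonic() + duration_seconds
    while True:
        snapshot = fetch().copy()
        snapshot["fetched_at"] = pd.Timestamp.now()
        snapshots.append(snapshot)
        # Stop if the next fetch would fall after the deadline.
        if time.monotonic() + interval_seconds > deadline:
            break
        time.sleep(interval_seconds)
    return pd.concat(snapshots, ignore_index=True)


def fake_instance(year, month, day):
    """Hypothetical stand-in for the asker's instance(year, month, day)."""
    return pd.DataFrame({"member": [1, 2], "views": [10, 20]})


if __name__ == "__main__":
    # Short interval for demonstration; the hourly case would use 60 * 60.
    log = collect_snapshots(
        lambda: fake_instance(2021, 4, 10),  # placeholder year, month, day
        interval_seconds=0.1,
        duration_seconds=0.3,
    )
    print(log)
```

The loop blocks the calling thread while it sleeps, which is usually acceptable for a dedicated logging script. Because each snapshot carries its own timestamp, the result is in long format and can be pivoted later if a wide panel is preferred.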
