Insert data into a MySQL table every second (once per second)

I'm new to Python, Raspberry Pi and MySQL, and I hope you can help me. I'm trying to write a script in Python that inserts data into a MySQL table every second. I can insert data, but not at the regular one-second interval I want; I've already tried a lot and can't find a solution to my problem. Here are my Python code and the data inserted into the MySQL table:

Python code:

#!/usr/bin/env python

import MySQLdb
import time

while True:
    db = MySQLdb.connect("localhost", "mauro", "12345", "temps")
    curs=db.cursor()
    try:
        curs.execute ("""INSERT INTO thetemps 
                values(0, CURRENT_DATE(), NOW(), 28)""")
        db.commit()
        print "Data committed"
    except:
        print "Error"
        db.rollback()
    db.close()
    time.sleep(1)

Table Result:

+-----+------------+----------+------+
| id  | date       | time     | temp |
+-----+------------+----------+------+
| 107 | 2015-11-06 | 19:16:41 |   28 |
| 108 | 2015-11-06 | 19:16:42 |   28 |
| 109 | 2015-11-06 | 19:16:45 |   28 |
| 110 | 2015-11-06 | 19:16:46 |   28 |
| 111 | 2015-11-06 | 19:16:47 |   28 |
| 112 | 2015-11-06 | 19:16:48 |   28 |
| 113 | 2015-11-06 | 19:16:56 |   28 |
| 114 | 2015-11-06 | 19:17:00 |   28 |
| 115 | 2015-11-06 | 19:17:03 |   28 |
| 116 | 2015-11-06 | 19:17:05 |   28 |
| 117 | 2015-11-06 | 19:17:06 |   28 |
| 118 | 2015-11-06 | 19:17:07 |   28 |
| 119 | 2015-11-06 | 19:17:08 |   28 |
| 120 | 2015-11-06 | 19:17:09 |   28 |
| 121 | 2015-11-06 | 19:17:10 |   28 |
| 122 | 2015-11-06 | 19:17:11 |   28 |
+-----+------------+----------+------+

As you can see, sometimes the script inserts data every second, and sometimes there is an 8-second gap between rows. So my question is: is it possible for the interval between rows to be 1 second every time? What am I doing wrong? Sorry for the bad English and thanks in advance!

5 Answers

#1 (2 votes)

You're establishing a new connection to the database server on each iteration. This can take an arbitrary amount of time. Moving .connect(), etc. outside of the loop may give you more consistent timings:

db = MySQLdb.connect("localhost", "mauro", "12345", "temps")
curs = db.cursor()

while True:    
    try:
        curs.execute ("""INSERT INTO thetemps 
                values(0, CURRENT_DATE(), NOW(), 28)""")
        db.commit()
        print "Data committed"
    except:
        print "Error"
        db.rollback()    
    time.sleep(1)
db.close()

#2 (1 vote)

Don't use a transaction for this; the table may be locked when you attempt to insert a new row.

#3 (1 vote)

Is it possible for the interval between rows to be 1 second every time?

Theoretically, yes, but in practice there are too many other factors outside of your control that are more likely to get in the way. Some of these include, but are not limited to:

理论上是这样,但实际上有太多你无法控制的因素更有可能阻碍你。其中包括但不限于:

  1. The OS kernel's task scheduler
  2. Task priorities relative to other processes
  3. Overall system load
  4. The amount of data already in the table (check the time complexity of binary trees)

This means that even if your system were idle most of the time, time.sleep(1) is not guaranteed to always sleep for exactly 1 second, and even if it did, the system may have been doing something else (e.g. more I/O) and required different amounts of time to perform the same operations each time.

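If long-term drift (rather than one-off jitter) is what you care about, one common mitigation is to sleep until the next absolute deadline instead of sleeping a fixed 1 second, so delays in one iteration don't accumulate. A minimal sketch, where do_work() is a hypothetical stand-in for the sensor read and INSERT:

import time

INTERVAL = 1.0
next_run = time.time()

while True:
    do_work()  # hypothetical: read the sensor and INSERT the row
    next_run += INTERVAL
    # Sleep only for the time left until the next deadline, so delays
    # inside do_work() don't accumulate into long-term drift.
    delay = next_run - time.time()
    if delay > 0:
        time.sleep(delay)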

Also, instead of creating a new connection every time inside the loop, you should keep the connection open and save that overhead.

What am I doing wrong?

I don't think you're doing anything particularly wrong here. The code looks OK, except for the extra overhead of creating a new connection every time, which you shouldn't do. That aside, the issue here boils down to factors outside of your control.

That being said, there are some things you can do to improve your chances.

A few additional suggestions to improve performance

Storage Engine

In addition to avoiding the overhead of opening/closing the database connection on every iteration, you should check the storage engine used for the table. For example, depending on your MySQL version, the default might still be MyISAM, which requires table locking for writing.

In contrast, InnoDB only requires row locking when writing to the table, which should improve things if something else is using the table. If you find you're not using InnoDB, issue an alter table ... query to change the storage engine.

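For instance, a one-off check and switch might look like this (a sketch using the thetemps table from the question; the same statements can also be run directly in the mysql client):

import MySQLdb

db = MySQLdb.connect("localhost", "mauro", "12345", "temps")
curs = db.cursor()

# The second column of SHOW TABLE STATUS is the storage engine
curs.execute("SHOW TABLE STATUS WHERE Name = 'thetemps'")
print "Engine:", curs.fetchone()[1]

# Switch to InnoDB if it turns out to be MyISAM
curs.execute("ALTER TABLE thetemps ENGINE = InnoDB")
db.close()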

Auto-Commit instead of Transaction

Transactions are meant to group two or more queries into a single unit, but you're submitting individual queries. Instead, you should enable autocommit in MySQL, so that the server doesn't have to wait for an explicit commit request after your query is submitted and executed, saving some communication overhead between the server and your client.

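With MySQLdb this is a one-line change on the connection (a sketch; with autocommit enabled, the explicit db.commit() call is no longer needed):

import time
import MySQLdb

db = MySQLdb.connect("localhost", "mauro", "12345", "temps")
db.autocommit(True)  # each INSERT is committed as soon as it executes
curs = db.cursor()

while True:
    curs.execute("""INSERT INTO thetemps
            values(0, CURRENT_DATE(), NOW(), 28)""")
    time.sleep(1)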

Influence the OS Scheduler by Increasing Your Priority

You can set a higher priority for your program so that the scheduler treats it more favourably. It might also help to do the same for the database service/process.

Other user-level tasks could also have their priorities lowered a bit, if necessary.

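From Python itself, one option on Linux is os.nice(); lowering the nice value raises the scheduling priority, and negative increments require root. A minimal sketch:

import os

# Lower this process's nice value by 10 to raise its priority.
# Negative increments need root; fall back gracefully otherwise.
try:
    os.nice(-10)
except OSError:
    print "Could not raise priority (try running as root)"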

#4 (0 votes)

Try creating the connection to the db before the while loop, to keep the connection open.

#5 (0 votes)

The problem I see is that connecting + inserting takes time; this will add up, and your process will eventually fall behind.

What I would do is separate the data gathering (make sure you read your temperatures every second) from the data loading (data loading can take longer than a second if needed, but you won't fall behind).

So, if I were you, I'd have two separate scripts running in parallel and communicating through some simple, fast and reliable mechanism. A list in Redis would probably work great. You could also use something like ZMQ.

Something like this:

# gather.py
import pickle, time
from datetime import datetime
import redis

r = redis.Redis()
while True:
    temp = get_temp()  # your sensor-reading function
    r.lpush('temps', pickle.dumps([temp, datetime.now()]))
    time.sleep(1)

and..

# load.py
import pickle
import MySQLdb
import redis

r = redis.Redis()
while True:
    # Block until a new temp is available in Redis (brpop returns key, value)
    key, data = r.brpop('temps')
    # Drain any other temps already queued behind it
    datapoints = [data]
    item = r.rpop('temps')
    while item is not None:
        datapoints.append(item)
        item = r.rpop('temps')
    db = MySQLdb.connect("localhost", "mauro", "12345", "temps")
    curs = db.cursor()
    try:
        for p in datapoints:
            temp, ts = pickle.loads(p)
            # Parameterized query; the driver quotes the timestamp correctly
            curs.execute("INSERT INTO thetemps (temp, timestamp) VALUES (%s, %s)",
                         (temp, ts))
        db.commit()
        print "Data committed"
    except MySQLdb.Error:
        print "Error"
        db.rollback()
    db.close()

You could add some improvements to the above, like reusing the db connection, making sure you don't lose the temps if there's a DB error, using a namedtuple instead of a list for the datapoints, etc.
