Thread-safe writing to a file in Python

Time: 2021-10-22 11:47:30

How can I write data to a file in Python in a thread-safe way? I want to save some variables to a file for every request, and every hour I want to do some grouping and write it to MySQL.


In Java I currently put it in a cached array, which is written to a file when the array is full.


How can I do this in Python? There are many concurrent requests so it has to be thread-safe.


EDIT:

We ended up using the logging module, which works fine.


3 solutions

#1


Look at the Queue class; it is thread-safe.


from Queue import Queue
writeQueue = Queue()

In a worker thread:

writeQueue.put(repr(some_object))

Then, to dump it to a file:


outFile = open(path, 'w')
while writeQueue.qsize():
    outFile.write(writeQueue.get())
outFile.flush()
outFile.close()

Queue will accept any Python object, so if you're trying to do something other than print to a file, just store the objects from the worker threads via Queue.put.

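For example, here is a minimal sketch (the record layout and the path are made up for illustration) of worker threads enqueuing tuples instead of pre-formatted strings, with the flush step deciding how each record is serialized:

from Queue import Queue

writeQueue = Queue()

# in a worker thread: enqueue a structured record instead of a string
writeQueue.put(("user123", "/index.html", 200))

# when flushing: format each record however the output requires
path = "requests.log"  # hypothetical output path
with open(path, "a") as outFile:
    while writeQueue.qsize():
        user, url, status = writeQueue.get()
        outFile.write("%s %s %d\n" % (user, url, status))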

If you need to split the commits across multiple invocations of the script, you'll need a way to cache partially built commits to disk. To avoid multiple copies trying to write to the file at the same time, use the lockfile module, available via pip. I usually use json to encode data for these purposes; it supports serializing strings, unicode, lists, numbers, and dicts, and is safer than pickle.


import json
import lockfile

with lockfile.LockFile('/path/to/file.sql'):
    fin = open('/path/to/file')
    data = json.loads(fin.read())
    data.append(newdata)
    fin.close()
    fout = open('/path/to/file', 'w')
    fout.write(json.dumps(data))
    fout.close()

Note that depending on OS features, the time taken to lock and unlock the file, as well as rewrite it for every request, may be more than you expect. If possible, try to just append to the file, as that will be faster. Also, you may want to use a client/server model, where each 'request' launches a worker script which connects to a server process and forwards the data on via a network socket. This sidesteps the need for lockfiles; depending on how much data you're dealing with, the server may be able to hold it all in memory, or it may need to serialize it to disk and pass it to the database that way.

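A rough sketch of the append-only variant, under the same lockfile assumption as above and with newdata being the per-request payload from the earlier snippet, writing one JSON document per line so nothing has to be re-read or rewritten on each request:

import json
import lockfile

with lockfile.LockFile('/path/to/file'):
    # appending is much cheaper than reading, parsing and rewriting the whole file
    with open('/path/to/file', 'a') as fout:
        fout.write(json.dumps(newdata) + '\n')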

WSGI server example:


from Queue import Queue

q = Queue()

def flushQueue():
    # 'path' is the output file, defined elsewhere; append mode so earlier
    # batches are not overwritten on each flush
    with open(path, 'a') as f:
        while q.qsize():
            f.write(q.get())

def application(env, start_response):
    q.put("Hello World!")
    # write out a batch once roughly 1000 items have accumulated
    if q.qsize() > 999:
        flushQueue()
    start_response('200 OK', [('Content-Type', 'text/html')])
    return ["Hello!"]

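One caveat with this sketch: items that never reach the 1000-item threshold just sit in the queue, so it may be worth registering a final flush at interpreter shutdown, for example:

import atexit

# flush whatever is still queued when the worker process exits
atexit.register(flushQueue)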
#2


We used the logging module:


import logging

logpath = "/tmp/log.log"
logger = logging.getLogger('log')
logger.setLevel(logging.INFO)
ch = logging.FileHandler(logpath)
ch.setFormatter(logging.Formatter('%(message)s'))
logger.addHandler(ch)


def application(env, start_response):
    # %-style arguments are formatted lazily by the logging module,
    # and logging handlers are thread-safe
    logger.info("%s %s", "hello", "world!")
    start_response('200 OK', [('Content-Type', 'text/html')])
    return ["Hello!"]

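Since the question mentions hourly grouping, the standard-library TimedRotatingFileHandler could be swapped in for the plain FileHandler, so the file rolls over every hour and each completed file can then be parsed and loaded into MySQL. This is just a sketch of that variant, not necessarily what was used:

import logging
from logging.handlers import TimedRotatingFileHandler

logpath = "/tmp/log.log"
logger = logging.getLogger('log')
logger.setLevel(logging.INFO)
# roll the log file over every hour; completed files get a timestamp suffix
ch = TimedRotatingFileHandler(logpath, when='h', interval=1)
ch.setFormatter(logging.Formatter('%(message)s'))
logger.addHandler(ch)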
#3


I've made a simple writer that uses threading and Queue, and it works fine with multiple threads. Pros: in theory it can accept data from multiple threads without blocking them, and the writing happens asynchronously in a separate thread. Cons: the extra writer thread consumes resources, and in CPython threading doesn't give true parallelism.


from Queue import Queue, Empty
from threading import Thread

class SafeWriter:
    def __init__(self, *args):
        self.filewriter = open(*args)
        self.queue = Queue()
        self.finished = False
        # a dedicated thread drains the queue and does the actual writing
        Thread(name="SafeWriter", target=self.internal_writer).start()

    def write(self, data):
        # thread-safe: Queue.put can be called from any thread
        self.queue.put(data)

    def internal_writer(self):
        while not self.finished:
            try:
                # wake up once a second so the finished flag is noticed
                data = self.queue.get(True, 1)
            except Empty:
                continue
            self.filewriter.write(data)
            self.queue.task_done()

    def close(self):
        # wait until everything queued so far has been written out
        self.queue.join()
        self.finished = True
        self.filewriter.close()

# use it like an ordinary open(), like this:
w = SafeWriter("filename", "w")
w.write("can be used among multiple threads")
w.close()  # it is important to call close(), otherwise the program will not exit

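If it helps to make sure close() is never forgotten, the class could also be given context-manager support. This is a hypothetical extension (SafeWriterCM is not part of the original answer):

class SafeWriterCM(SafeWriter):
    # hypothetical subclass adding with-statement support
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

# the file is flushed and closed even if the block raises
with SafeWriterCM("filename", "w") as w:
    w.write("can be used among multiple threads\n")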