I'm using the cloudfiles module to upload files to Rackspace Cloud Files, using something like this pseudocode:
import cloudfiles

username = '---'
api_key = '---'

conn = cloudfiles.get_connection(username, api_key)
testcontainer = conn.create_container('test')

for f in get_filenames():
    obj = testcontainer.create_object(f)
    obj.load_from_filename(f)
My problem is that I have a lot of small files to upload, and it takes too long this way.
Buried in the documentation, I see that there is a ConnectionPool class, which supposedly can be used to upload files in parallel.
Could someone please show how I can make this piece of code upload more than one file at a time?
1 Answer
#1
The ConnectionPool class is meant for a multithreading application that occasionally has to send something to Rackspace.
That way you can reuse your connection, but you don't have to keep 100 connections open if you have 100 threads.
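For illustration, here is a minimal sketch of that pattern, assuming python-cloudfiles' ConnectionPool accepts the same credentials as get_connection and exposes Queue-style get()/put() methods; upload_one is a hypothetical helper, and get_filenames() is the question's own:

import threading
import cloudfiles

USERNAME = '---'
API_KEY = '---'

# One shared pool; idle connections are kept here and reused across threads
pool = cloudfiles.ConnectionPool(USERNAME, API_KEY)

def upload_one(filename):
    '''Borrow a connection from the pool, upload a single file, return it.'''
    conn = pool.get()      # take an idle connection, or create a new one
    try:
        container = conn.create_container('test')
        obj = container.create_object(filename)
        obj.load_from_filename(filename)
    finally:
        pool.put(conn)     # hand the connection back for the next caller

# Example use: one thread per file sharing the pool
threads = []
for f in get_filenames():
    t = threading.Thread(target=upload_one, args=(f,))
    t.start()
    threads.append(t)
for t in threads:
    t.join()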
You are simply looking for a multithreading/multiprocessing uploader. Here's an example using the multiprocessing library:
import cloudfiles
import multiprocessing

USERNAME = '---'
API_KEY = '---'

def get_container():
    conn = cloudfiles.get_connection(USERNAME, API_KEY)
    testcontainer = conn.create_container('test')
    return testcontainer

def uploader(filenames):
    '''Worker process to upload the given files'''
    container = get_container()

    # Keep going till you reach STOP
    for filename in iter(filenames.get, 'STOP'):
        # Create the object and upload
        obj = container.create_object(filename)
        obj.load_from_filename(filename)

def main():
    NUMBER_OF_PROCESSES = 16

    # Add your filenames to this queue
    filenames = multiprocessing.Queue()

    # Start worker processes
    for i in range(NUMBER_OF_PROCESSES):
        multiprocessing.Process(target=uploader, args=(filenames,)).start()

    # You can keep adding tasks until you add STOP
    filenames.put('some filename')

    # Stop all child processes
    for i in range(NUMBER_OF_PROCESSES):
        filenames.put('STOP')

if __name__ == '__main__':
    multiprocessing.freeze_support()
    main()
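To connect this to the question's code, main() would enqueue the real filenames and then wait for the workers to finish; a sketch of that wiring, reusing uploader() from above (get_filenames() is the question's own helper):

def main():
    NUMBER_OF_PROCESSES = 16
    filenames = multiprocessing.Queue()

    # Keep handles on the workers so they can be joined at the end
    workers = [multiprocessing.Process(target=uploader, args=(filenames,))
               for i in range(NUMBER_OF_PROCESSES)]
    for w in workers:
        w.start()

    # Enqueue every file, then one STOP sentinel per worker
    for f in get_filenames():
        filenames.put(f)
    for i in range(NUMBER_OF_PROCESSES):
        filenames.put('STOP')

    # Block until every queued upload has finished
    for w in workers:
        w.join()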