I have files in an S3 bucket. The data files are named with a date on the end of a standard label.
For example, a file key looks like this:
test_file_2016-12-01.tar.gz
I wish to download files from date x to date y. I can do this like so:
import boto
from boto.s3.key import Key
from datetime import date, timedelta as td

# bucket_name is assumed to be defined elsewhere
conn = boto.connect_s3(host="s3-eu-west-1.amazonaws.com")
bucket = conn.get_bucket(bucket_name)
k = Key(bucket)
key_prefix = "test_file_"
date_o = date(2016, 11, 30)
date_1 = date(2016, 12, 1)
day_delta = date_1 - date_o
for i in range(day_delta.days + 1):
    file_key = key_prefix + str(date_o + td(days=i)) + ".tar.gz"
    # Get the file
    k.key = file_key
    # Location for download destination
    temp_location = "./tmp/" + file_key
    k.get_contents_to_filename(temp_location)
However, I am now harvesting finer resolution data and wish to add data with hour resolution.
Thus the files look like this:
test_file_2016-12-01-10.tar.gz
I can handle day deltas well using the timedelta feature of datetime, but this approach does not seem to handle hours as well.
How can I adjust this to specify capturing the files between something like:
date_o = datetime(2016, 11, 30, 1, 0, 0)
date_1 = datetime(2016, 12, 1, 12, 0, 0)
1 Solution
#1
Internally, the datetime module stores a timedelta as days, seconds, and microseconds, so hours are converted to seconds. This means we first have to complete the calculation in seconds and then divide by 3600 to get the desired range of hours. After that, we just need to supply strftime with the desired format as we iterate in order to display individual hours.
import datetime as dt

date_o = dt.datetime(2016, 11, 30, 0)
date_1 = dt.datetime(2016, 12, 1, 0)
# Add one hour so that the end hour itself is included in the range
delta_hours = (date_1 - date_o + dt.timedelta(hours=1)).total_seconds() / 3600
for hour in range(int(delta_hours)):
    current_time = date_o + dt.timedelta(hours=hour)
    file_name = 'test_file_' + current_time.strftime('%Y-%m-%d-%H') + '.tar.gz'
    print(file_name)
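For the exact range in the question (01:00 on 2016-11-30 to 12:00 on 2016-12-01), the same idea could be wrapped in a small generator. This is only a sketch: the function name hourly_keys and its prefix and suffix parameters are made up for illustration, and it assumes both end points fall on whole hours and should both be included.

```python
from datetime import datetime, timedelta


def hourly_keys(start, end, prefix="test_file_", suffix=".tar.gz"):
    """Yield one key per hour from start to end, both inclusive (hypothetical helper)."""
    # A timedelta only stores days, seconds and microseconds,
    # so convert the difference to whole hours via total_seconds().
    total_hours = int((end - start).total_seconds() // 3600)
    for offset in range(total_hours + 1):
        stamp = start + timedelta(hours=offset)
        yield prefix + stamp.strftime("%Y-%m-%d-%H") + suffix


keys = list(hourly_keys(datetime(2016, 11, 30, 1), datetime(2016, 12, 1, 12)))
print(keys[0])   # first key in the range
print(keys[-1])  # last key in the range
print(len(keys))
```

Each generated key could then be assigned to k.key inside the download loop from the question, in place of the daily file_key.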