I have a range of JSON files stored in an S3 bucket on AWS.
I wish to use the AWS Lambda Python runtime to parse this JSON and send the parsed results to an AWS RDS MySQL database.
I have a stable Python script for doing the parsing and writing to the database. I need the Lambda script to iterate through the JSON files as they are added.
Each JSON file contains a list, simply consisting of results = [content].
In pseudo-code what I want is:
- Connect to the S3 bucket (jsondata)
- Read the contents of the JSON file (results)
- Execute my script for this data (results)
I can list the buckets I have by:
import boto3

s3 = boto3.resource('s3')
for bucket in s3.buckets.all():
    print(bucket.name)
Giving:
jsondata
But I cannot access this bucket to read its results; there doesn't appear to be a read or load function.
I wish for something like:
for bucket in s3.buckets.all():
    print(bucket.contents)
EDIT
I was misunderstanding something. Rather than reading the file directly in S3, the Lambda function must download it first.
From here it seems that you must give Lambda a download path, from which it can access the files itself:
import uuid
import boto3  # plus whatever libraries the parsing script needs

s3_client = boto3.client('s3')

# def function_to_be_executed(results):
#     blah blah

def handler(event, context):
    # Triggered by an S3 ObjectCreated event notification
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        download_path = '/tmp/{}{}'.format(uuid.uuid4(), key)
        s3_client.download_file(bucket, key, download_path)
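Continuing from the download step, a minimal sketch of the remaining work, reading the downloaded file and handing the list to the existing script, might look like this (process_results is a hypothetical stand-in for that script):

```python
import json

def process_results(results):
    # Hypothetical stand-in for the existing parse-and-write-to-MySQL script.
    return len(results)

def handle_downloaded_file(download_path):
    # Each file body is a JSON list (results = [content]), so json.load
    # returns a Python list we can pass straight to the parsing script.
    with open(download_path) as f:
        results = json.load(f)
    return process_results(results)
```

The handler above would call handle_downloaded_file(download_path) after each download_file call.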
2 Answers
#1
7
You can use bucket.objects.all() to get a list of all the objects in the bucket (you also have alternative methods like filter, page_size and limit depending on your need).
These methods return an iterator with S3.ObjectSummary objects in it; from there you can use the method object.get to retrieve the file.
#2
13
s3 = boto3.client('s3')
response = s3.get_object(Bucket=bucket, Key=key)
emailcontent = response['Body'].read().decode('utf-8')
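Since each file holds a plain JSON list, the decoded body can be fed straight to json.loads; a small sketch of that last step on its own:

```python
import json

def parse_s3_body(raw_bytes):
    # get_object's response['Body'].read() returns bytes;
    # decode to text, then parse the JSON list.
    return json.loads(raw_bytes.decode('utf-8'))

# With a payload shaped like the files described in the question:
print(parse_s3_body(b'["content-1", "content-2"]'))  # → ['content-1', 'content-2']
```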