Reading data from S3 using Lambda

Posted: 2020-12-06 23:11:01

I have a range of json files stored in an S3 bucket on AWS.


I wish to use AWS lambda python service to parse this json and send the parsed results to an AWS RDS MySQL database.


I have a stable python script for doing the parsing and writing to the database. I need the lambda script to iterate through the json files (when they are added).


Each json file contains a list, simply of the form results = [content]


In pseudo-code what I want is:


  1. Connect to the S3 bucket (jsondata)
  2. Read the contents of the JSON file (results)
  3. Execute my script for this data (results)
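Step 2's parsing can be sketched as a small pure helper, assuming each file holds a JSON list as described above (the `parse_results` name is hypothetical, not part of my script):

```python
import json

def parse_results(raw_json):
    # Hypothetical helper: parse one file's body into the `results` list.
    # Each file is assumed to contain a JSON list, i.e. results = [content].
    data = json.loads(raw_json)
    if not isinstance(data, list):
        raise ValueError('expected a JSON list')
    return data
```

Keeping the parsing separate from the S3 access makes it easy to test without AWS.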

I can list the buckets I have by:


import boto3

s3 = boto3.resource('s3')

for bucket in s3.buckets.all():
    print(bucket.name)

Giving:


jsondata

But I cannot access this bucket to read its results.


There doesn't appear to be a read or load function.


I wish for something like


for bucket in s3.buckets.all():
   print(bucket.contents)

EDIT


I am misunderstanding something. Rather than reading the file in S3, lambda must download it itself.


From here it seems that you must give lambda a download path, from which it can access the files itself:


import uuid

import boto3

s3_client = boto3.client('s3')

def process_file(path):
    # placeholder for my existing parse-and-write-to-MySQL script
    pass

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        download_path = '/tmp/{}{}'.format(uuid.uuid4(), key)
        s3_client.download_file(bucket, key, download_path)
        process_file(download_path)
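The handler above is driven by S3 put notifications. A trimmed sketch of that event shape, with the bucket/key extraction pulled into a helper (the names and values here are illustrative, not from my setup):

```python
def extract_objects(event):
    # Return (bucket, key) pairs for every S3 record in the event.
    return [(r['s3']['bucket']['name'], r['s3']['object']['key'])
            for r in event['Records']]

# A trimmed S3 put-notification event; the values are made up.
sample_event = {
    'Records': [{
        's3': {
            'bucket': {'name': 'jsondata'},
            'object': {'key': 'results.json'},
        }
    }]
}
```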

2 Answers

#1



You can use bucket.objects.all() to get a list of all the objects in the bucket (there are also alternative methods like filter, page_size and limit, depending on your needs)


These methods return an iterator of S3.ObjectSummary objects; from there you can use the get method to retrieve the file.


#2



import boto3

s3 = boto3.client('s3')
# bucket and key come from the triggering S3 event record
response = s3.get_object(Bucket=bucket, Key=key)
emailcontent = response['Body'].read().decode('utf-8')
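For the JSON files in the question, the same pattern extends with a decode step. Here `decode_body` is a hypothetical helper, split out so the parsing is testable without S3:

```python
import json

def decode_body(body_bytes):
    # Decode an S3 object body (bytes) into the parsed JSON value.
    return json.loads(body_bytes.decode('utf-8'))

# With a live client this would be used as:
#   response = s3.get_object(Bucket=bucket, Key=key)
#   results = decode_body(response['Body'].read())
```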
