Reading data from S3 using Lambda

Posted: 2020-12-06 23:11:01

I have a range of json files stored in an S3 bucket on AWS.


I wish to use AWS lambda python service to parse this json and send the parsed results to an AWS RDS MySQL database.


I have a stable python script for doing the parsing and writing to the database. I need the lambda script to iterate through the json files (when they are added).


Each json file contains a list, simply of the form results = [content]


In pseudo-code what I want is:


  1. Connect to the S3 bucket (jsondata)
  2. Read the contents of the JSON file (results)
  3. Execute my script for this data (results)
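Step 2's parsing can be sketched as a small pure helper, assuming each file holds a JSON list as described above (the `parse_results` name is hypothetical, not part of my script):

```python
import json

def parse_results(raw_json):
    # Hypothetical helper: parse one file's body into the `results` list.
    # Each file is assumed to contain a JSON list, i.e. results = [content].
    data = json.loads(raw_json)
    if not isinstance(data, list):
        raise ValueError('expected a JSON list')
    return data
```

Keeping the parsing separate from the S3 access makes it easy to test without AWS.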

I can list the buckets I have by:


import boto3

s3 = boto3.resource('s3')

for bucket in s3.buckets.all():
    print(bucket.name)

Giving:


jsondata

But I cannot access this bucket to read its results.


There doesn't appear to be a read or load function.


I wish for something like


for bucket in s3.buckets.all():
   print(bucket.contents)

EDIT


I am misunderstanding something. Rather than reading the file in S3, lambda must download it itself.


From here it seems that you must give lambda a download path, from which it can access the files itself:


import uuid

import boto3

s3_client = boto3.client('s3')

def process_file(path):
    # placeholder for my existing parse-and-write-to-MySQL script
    pass

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        download_path = '/tmp/{}{}'.format(uuid.uuid4(), key)
        s3_client.download_file(bucket, key, download_path)
        process_file(download_path)
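The handler above is driven by S3 put notifications. A trimmed sketch of that event shape, with the bucket/key extraction pulled into a helper (the names and values here are illustrative, not from my setup):

```python
def extract_objects(event):
    # Return (bucket, key) pairs for every S3 record in the event.
    return [(r['s3']['bucket']['name'], r['s3']['object']['key'])
            for r in event['Records']]

# A trimmed S3 put-notification event; the values are made up.
sample_event = {
    'Records': [{
        's3': {
            'bucket': {'name': 'jsondata'},
            'object': {'key': 'results.json'},
        }
    }]
}
```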

2 Answers

#1



You can use bucket.objects.all() to get a list of all the objects in the bucket (there are also alternative methods like filter, page_size and limit, depending on your needs)


These methods return an iterator of S3.ObjectSummary objects; from there you can use the get method to retrieve the file.


#2



import boto3

s3 = boto3.client('s3')
# bucket and key come from the triggering S3 event record
response = s3.get_object(Bucket=bucket, Key=key)
emailcontent = response['Body'].read().decode('utf-8')
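For the JSON files in the question, the same pattern extends with a decode step. Here `decode_body` is a hypothetical helper, split out so the parsing is testable without S3:

```python
import json

def decode_body(body_bytes):
    # Decode an S3 object body (bytes) into the parsed JSON value.
    return json.loads(body_bytes.decode('utf-8'))

# With a live client this would be used as:
#   response = s3.get_object(Bucket=bucket, Key=key)
#   results = decode_body(response['Body'].read())
```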
