I'm learning how to use Python in the AWS Lambda service. I'm trying to read characters from an S3 object and write them to another S3 object. I realize I can copy the S3 object to a local tmp file, but I want to "stream" the S3 input into the script, process it, and write the output without the local copy stage if possible. I'm using code from this * (second answer), which suggests a way to do this.
This code contains two "yield" statements, which cause my otherwise working script to throw a "generator is not JSON serializable" error. I'm trying to understand why a "yield" statement would cause this error. Is this a Lambda environment restriction, or is something specific to my code creating the serialization issue (likely the use of an S3 file object)?
Here is the code that I run in Lambda. If I comment out the two yield statements, it runs, but the output file is empty.
from __future__ import print_function

import json
import urllib
import uuid
import boto3
import re

print('Loading IO function')

s3 = boto3.client('s3')


def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))

    # Get the object from the event and show its content type
    inbucket = event['Records'][0]['s3']['bucket']['name']
    outbucket = "outlambda"
    inkey = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode('utf8'))
    outkey = "out" + inkey
    try:
        infile = s3.get_object(Bucket=inbucket, Key=inkey)
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(inkey, bucket))
        raise e

    tmp_path = '/tmp/{}{}'.format(uuid.uuid4(), "tmp.txt")
    # upload_path = '/tmp/resized-{}'.format(key)

    with open(tmp_path,'w') as out:
        unfinished_line = ''
        for byte in infile:
            byte = unfinished_line + byte
            #split on whatever, or use a regex with re.split()
            lines = byte.split('\n')
            unfinished_line = lines.pop()
            for line in lines:
                out.write(line)
                yield line # This line causes JSON error if uncommented
        yield unfinished_line # This line causes JSON error if uncommented
    #
    # Upload the file to S3
    #
    tmp = open(tmp_path,"r")
    try:
        outfile = s3.put_object(Bucket=outbucket,Key=outkey,Body=tmp)
    except Exception as e:
        print(e)
        print('Error putting object {} from bucket {} Body {}. Make sure they exist and your bucket is in the same region as this function.'.format(outkey, outbucket,"tmp.txt"))
        raise e

    tmp.close()
2 Solutions
#1
2
A function that contains yield is actually a generator function: calling it returns a generator object instead of running the body. The Lambda handler, by contrast, needs to be a regular function that optionally returns a JSON-serializable value.
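To make the point above concrete, here is a minimal sketch that runs locally without Lambda. It assumes the runtime serializes the handler's return value in a way comparable to `json.dumps`; the names `handler_with_yield`, `read_lines`, and `handler` are hypothetical and chosen only for illustration.

```python
import json


def handler_with_yield(event, context):
    # The presence of `yield` turns this into a generator function:
    # calling it returns a generator object and runs none of the body.
    yield "line 1"
    yield "line 2"


def read_lines(event):
    # Hypothetical helper: the generator lives outside the handler.
    for line in event.get("lines", []):
        yield line.upper()


def handler(event, context):
    # Drain the generator and return a plain, JSON-serializable dict.
    processed = list(read_lines(event))
    return {"count": len(processed), "lines": processed}


# Serializing the generator object fails with a TypeError.
result = handler_with_yield({}, None)
try:
    json.dumps(result)
    serialization_error = None
except TypeError as exc:
    serialization_error = exc

# Serializing the value returned by the regular handler succeeds.
ok_payload = json.dumps(handler({"lines": ["a", "b"]}, None))
```

Keeping the generator in a helper and consuming it inside the handler means the handler itself still returns an ordinary value.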
#2
0
Thanks to Lei Shi for answering the specific point I was asking about, and thanks to FujiApple for pointing out a coding mistake I had missed in the original code. I first developed a solution without yield that seemed to copy the input file to the output correctly. With Lei Shi's and FujiApple's comments, I then modified that code to move the work into a sub-function, called by the Lambda handler, which can be a generator.
from __future__ import print_function

import json
import urllib
import uuid
import boto3
import re

print('Loading IO function')

s3 = boto3.client('s3')


def processFile(inbucket, inkey, outbucket, outkey):
    # Generator: yields each complete line while copying the object via /tmp
    try:
        infile = s3.get_object(Bucket=inbucket, Key=inkey)
    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(inkey, inbucket))
        raise

    inbody = infile['Body']
    tmp_path = '/tmp/{}{}'.format(uuid.uuid4(), "tmp.txt")

    with open(tmp_path, 'w') as out:
        unfinished_line = ''
        chunk = inbody.read(4096)
        while chunk:
            chunk = unfinished_line + chunk
            # split on whatever, or use a regex with re.split()
            lines = chunk.split('\n')
            print("bytes %s" % chunk)
            # the last piece may be an incomplete line; carry it into the next read
            unfinished_line = lines.pop()
            for line in lines:
                print("line %s" % line)
                # split() removed the newline, so put it back
                out.write(line + '\n')
                yield line
            chunk = inbody.read(4096)
        # whatever follows the last newline never shows up in lines
        if unfinished_line:
            out.write(unfinished_line)

    #
    # Upload the file to S3
    #
    with open(tmp_path, "r") as tmp:
        try:
            s3.put_object(Bucket=outbucket, Key=outkey, Body=tmp)
        except Exception as e:
            print(e)
            print('Error putting object {} from bucket {} Body {}. Make sure they exist and your bucket is in the same region as this function.'.format(outkey, outbucket, "tmp.txt"))
            raise


def lambda_handler(event, context):
    print("Received event: " + json.dumps(event, indent=2))

    # Get the object from the event and show its content type
    inbucket = event['Records'][0]['s3']['bucket']['name']
    outbucket = "outlambda"
    inkey = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode('utf8'))
    outkey = "out" + inkey

    # A generator body runs only while it is iterated, so drain it here
    line_count = sum(1 for _ in processFile(inbucket, inkey, outbucket, outkey))
    return {'lines': line_count}

I'm posting the solution that uses yield in a separate generator function called by the handler. The handler iterates over the generator, because a generator's body runs only while it is being consumed. The text after the last newline never appears among the split lines, so the if clause after the read loop writes it.
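The chunk-and-split technique used above can also be sketched independently of S3, which makes it easy to try locally. This is a Python 3 sketch under stated assumptions, not the code from the answer: `iter_lines` and `transform` are hypothetical helpers, the body is assumed to be any file-like object with a `read(n)` method (as the object returned under `'Body'` is used above), and uppercasing stands in for the real processing step.

```python
import io


def iter_lines(body, chunk_size=4096, encoding="utf-8"):
    """Yield decoded lines from a file-like object that supports read(n)."""
    unfinished = b""
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        # Split on the newline byte before decoding, so a multi-byte
        # character cut in half by a chunk boundary is never decoded early.
        pieces = (unfinished + chunk).split(b"\n")
        # The last piece may be an incomplete line; carry it forward.
        unfinished = pieces.pop()
        for piece in pieces:
            yield piece.decode(encoding)
    # Emit whatever follows the final newline.
    if unfinished:
        yield unfinished.decode(encoding)


def transform(body, chunk_size=4096):
    """Process each line and collect the output in an in-memory buffer."""
    out = io.BytesIO()
    for line in iter_lines(body, chunk_size):
        # Placeholder processing step: uppercase every line.
        out.write(line.upper().encode("utf-8") + b"\n")
    out.seek(0)
    return out
```

The input is read chunk by chunk, but the output is still buffered in memory here, so this avoids the /tmp file rather than achieving end-to-end streaming. The returned buffer is a seekable file-like object, which is the kind of value `put_object` accepts as `Body`. Recent botocore versions also expose an `iter_lines()` method on the streaming body, which may make the manual splitting unnecessary.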