I want to create a Lambda that fetches a zip archive (which may contain a list of CSV files) from S3, unpacks it, and uploads the contents back to S3. Since Lambda is limited in memory/disk size, I have to stream the data from S3 and back into it. I'm using Python (boto3); please see the code below:
count = 0
obj = s3.Object(bucket_name, key)
buffer = io.BytesIO(obj.get()["Body"].read())
print(buffer)
z = zipfile.ZipFile(buffer)
for x in z.filelist:
    with z.open(x) as foo2:
        print(sys.getsizeof(foo2))
        line_counter = 0
        out_buffer = io.BytesIO()
        for f in foo2:
            out_buffer.write(f)
            # out_buffer.writelines(f)
            line_counter += 1
        print(line_counter)
        print(foo2.name)
        s3.Object(bucket_name, "output/" + foo2.name + "_output").upload_fileobj(out_buffer)
        out_buffer.close()
z.close()
The result is that empty files get created in the bucket. For example: if input.zip contains the files 1.csv and 2.csv, I get empty CSV files with the corresponding names in the bucket. Also, I'm not sure whether it really streams the files or just downloads the whole zip. Thanks.
Posted on 2018-01-31 02:57:23
You need to seek back to the beginning of the BytesIO object before uploading.
out_buffer = io.BytesIO()
for f in foo2:
    out_buffer.write(f)
    # out_buffer.writelines(f)
    line_counter += 1
out_buffer.seek(0)  # Change stream position to beginning of file
s3.Object(bucket_name, "output/" + foo2.name + "_output").upload_fileobj(out_buffer)
out_buffer.close()
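The effect of the missing seek can be reproduced locally, without S3: after writing, a BytesIO's position sits at the end of the stream, so a subsequent read (which is what upload_fileobj does internally) returns nothing. A minimal sketch:

```python
import io

buf = io.BytesIO()
buf.write(b"col1,col2\n1,2\n")

# The stream position is now at the end, so reading yields nothing --
# this is why upload_fileobj sent empty files.
print(buf.read())   # b''

# Rewinding to the start makes the written bytes readable again.
buf.seek(0)
print(buf.read())   # b'col1,col2\n1,2\n'
```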
Posted on 2020-06-09 23:32:30
You can unzip a file from S3 and extract its contents back to S3:
import boto3
import zipfile
from io import BytesIO

s3Bucket = "s3-bucket"    # Provide S3 bucket name
file_name = "test.zip"    # Provide zip file name
s3 = boto3.resource('s3')
zip_obj = s3.Object(bucket_name=s3Bucket, key=file_name)
buffer = BytesIO(zip_obj.get()["Body"].read())
z = zipfile.ZipFile(buffer)
for file in z.namelist():
    file_info = z.getinfo(file)
    s3.meta.client.upload_fileobj(
        z.open(file),
        Bucket=s3Bucket,
        Key=file,
        ExtraArgs={'ServerSideEncryption': 'aws:kms', 'SSEKMSKeyId': 'alias/<alias_name>'})
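The member handle returned by z.open(file) is itself a readable file object, which is why it can be passed straight to upload_fileobj without an intermediate buffer or a seek. The pattern can be checked locally with no S3 involved by building a zip in memory (standing in for the object fetched from the bucket) and reading its members the same way — a sketch of the read side only:

```python
import io
import zipfile

# Build a small zip in memory, standing in for the object fetched from S3.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("1.csv", "a,b\n1,2\n")
    zf.writestr("2.csv", "c,d\n3,4\n")
buffer.seek(0)

# Iterate members exactly as in the answer; each z.open(name) handle
# yields the decompressed bytes and could be handed to upload_fileobj.
z = zipfile.ZipFile(buffer)
for name in z.namelist():
    with z.open(name) as member:
        data = member.read()
        print(name, len(data))
```

Note that only one member is decompressed at a time, although the whole zip archive is still held in memory, so the original concern about Lambda limits applies to the archive size, not the extracted size.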
Reference - https://github.com/vhvinod/ftp-to-s3/blob/master/extract-s3-to-s3.py
https://stackoverflow.com/questions/48525338