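Below is a commonly shared function for iterating over all the objects in a bucket. But what if I only want to iterate over specific keys? Suppose the test URI is: s3://S3-data-lake/test1/test2/
There are five JSON files under test2, e.g. s3://test-data-lake/test1/test2/test1.json ...
How can I modify this code to handle the case above?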
import boto3


def iterate_bucket_items(bucket):
    """
    Generator that iterates over all objects in a given s3 bucket
    See http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_objects_v2
    for return data format
    :param bucket: name of s3 bucket
    :return: dict of metadata for an object
    """
    client = boto3.client('s3')
    paginator = client.get_paginator('list_objects_v2')
    page_iterator = paginator.paginate(Bucket=bucket)
    for page in page_iterator:
        # Skip pages with no objects; 'Contents' is only read when KeyCount > 0
        if page['KeyCount'] > 0:
            for item in page['Contents']:
                yield item


for i in iterate_bucket_items(bucket='my_bucket'):
    print(i)
Posted on 2021-10-08 00:01:47
You can use the Prefix parameter:
import boto3


def iterate_bucket_items(bucket, prefix=''):
    """
    Generator that iterates over the objects in a given s3 bucket
    whose keys start with the given prefix
    See http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.list_objects_v2
    for return data format
    :param bucket: name of s3 bucket
    :param prefix: only keys starting with this string are listed ('' lists everything)
    :return: dict of metadata for an object
    """
    client = boto3.client('s3')
    paginator = client.get_paginator('list_objects_v2')
    page_iterator = paginator.paginate(Bucket=bucket, Prefix=prefix)
    for page in page_iterator:
        # Skip pages with no objects; 'Contents' is only read when KeyCount > 0
        if page['KeyCount'] > 0:
            for item in page['Contents']:
                yield item


for i in iterate_bucket_items(bucket='my_bucket', prefix='test1/test2/'):
    print(i)
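If you are starting from a full s3:// URI like the one in the question, you may also want to split it into a bucket and a prefix first, and keep only the .json keys. Here is a minimal sketch of that idea. The helper names split_s3_uri and iterate_keys are my own (they are not part of boto3), and the client is passed in as an argument so the helpers can be tried without AWS access. It relies on 'Contents' usually being absent from a page with no matching objects, so it uses .get() rather than checking KeyCount.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Hypothetical helper: split 's3://bucket/some/prefix/' into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("expected an s3://bucket/prefix URI, got %r" % uri)
    # The path starts with '/', but S3 keys do not, so strip the leading slash
    return parsed.netloc, parsed.path.lstrip("/")


def iterate_keys(client, uri, suffix=""):
    """Hypothetical helper: yield keys under the URI's prefix that end with suffix."""
    bucket, prefix = split_s3_uri(uri)
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Fall back to an empty list when the page carries no 'Contents'
        for item in page.get("Contents", []):
            if item["Key"].endswith(suffix):
                yield item["Key"]


# Usage (needs boto3 and AWS credentials; bucket name taken from the question):
#   import boto3
#   uri = "s3://test-data-lake/test1/test2/"
#   for key in iterate_keys(boto3.client("s3"), uri, suffix=".json"):
#       print(key)
```

Keep in mind that Prefix is a plain string match on the start of the key, so 'test1/test2/' also returns objects in deeper "folders" such as test1/test2/sub/file.json. The suffix check above narrows by extension only, not by depth.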
https://stackoverflow.com/questions/69488682