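I'm trying to do an AWS multipart upload with the AWS SDK and Spark. The file is about 14 GB, and I get an out-of-memory error on this line: `val bytes: Array[Byte] = IOUtils.toByteArray(is)`
I have tried raising both driver and executor memory to 100G, and I have tried some other Spark optimizations.
Here is the code I'm using: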
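```scala
import java.io.{ByteArrayInputStream, InputStream}
import com.amazonaws.services.s3.model.{CannedAccessControlList, ObjectMetadata, PutObjectRequest, SSEAwsKeyManagementParams}
import com.amazonaws.services.s3.transfer.TransferManagerBuilder
import com.amazonaws.util.IOUtils
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val tm = TransferManagerBuilder.standard.withS3Client(s3Client).build
val fs = FileSystem.get(new Configuration())
val filePath = new Path(hdfsFilePath)
val is: InputStream = fs.open(filePath)
val om = new ObjectMetadata()
// The OutOfMemoryError is thrown here: the whole file is read into one Array[Byte]
val bytes: Array[Byte] = IOUtils.toByteArray(is)
om.setContentLength(bytes.length)
val byteArrayInputStream: ByteArrayInputStream = new ByteArrayInputStream(bytes)
val request = new PutObjectRequest(bucketName, keyName, byteArrayInputStream, om).withSSEAwsKeyManagementParams(new SSEAwsKeyManagementParams(kmsKey)).withCannedAcl(CannedAccessControlList.BucketOwnerFullControl)
val upload = tm.upload(request)
```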
Here is the exception I get:
```
java.lang.OutOfMemoryError
at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at com.amazonaws.util.IOUtils.toByteArray(IOUtils.java:45)
```
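The trace ends in `ByteArrayOutputStream.hugeCapacity`, which in the JDK throws when the requested buffer size overflows an `Int`. That is consistent with hitting the JVM's array-size limit rather than running out of heap, which would explain why raising memory to 100G did not help. The sketch below works out the numbers, assuming the file is 14 GiB, and shows how many parts a multipart upload would need under S3's limits of at most 10,000 parts and a 5 MiB minimum size for every part except the last. `UploadSizing` and its methods are hypothetical names, and the part-size rule only illustrates the idea of picking the smallest part size that respects both limits; it is not the SDK's actual code.

```scala
object UploadSizing {
  // Assumption: "about 14 GB" from the question, taken as 14 GiB for illustration.
  val fileSize: Long = 14L * 1024 * 1024 * 1024

  // JVM arrays are indexed by Int, so one Array[Byte] can never hold more than
  // Int.MaxValue bytes, no matter how large the heap is.
  def fitsInOneByteArray(size: Long): Boolean = size <= Int.MaxValue

  // S3 multipart limits: at most 10,000 parts; every part but the last is >= 5 MiB.
  val MaxParts: Long = 10000L
  val MinPartSize: Long = 5L * 1024 * 1024

  // Smallest part size that keeps the part count within MaxParts (ceiling division).
  def partSize(size: Long): Long = math.max(MinPartSize, (size + MaxParts - 1) / MaxParts)

  // Number of parts needed at that part size (ceiling division again).
  def partCount(size: Long): Long = {
    val p = partSize(size)
    (size + p - 1) / p
  }

  def main(args: Array[String]): Unit = {
    println(s"fits in one Array[Byte]: ${fitsInOneByteArray(fileSize)}")
    println(s"part size: ${partSize(fileSize)} bytes, parts: ${partCount(fileSize)}")
  }
}
```

For 14 GiB this gives `false` for the single-array check and 2,868 parts of 5 MiB each (the last one 1 MiB), so a multipart upload only ever deals in pieces far below the array limit.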
Answered 2019-06-25 05:35:32
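`PutObjectRequest` also has a constructor that takes a `File`, so the SDK can read the data from disk instead of from an in-memory byte array:
```java
public PutObjectRequest(String bucketName, String key, File file)
```
Something like the following should work (I haven't tested it):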
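```scala
import java.io.File
import com.amazonaws.services.s3.model.{CannedAccessControlList, PutObjectRequest, SSEAwsKeyManagementParams}
import com.amazonaws.services.s3.transfer.TransferManagerBuilder

// java.io.File has no constructor that takes a Hadoop Path, and it can only read the
// local filesystem, so hdfsFilePath must point to a local copy of the file here.
val result = TransferManagerBuilder.standard.withS3Client(s3Client)
  .build
  .upload(
    new PutObjectRequest(
      bucketName,
      keyName,
      new File(hdfsFilePath)
    )
    .withSSEAwsKeyManagementParams(new SSEAwsKeyManagementParams(kmsKey))
    .withCannedAcl(CannedAccessControlList.BucketOwnerFullControl)
  )
```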
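If the file has to stay on HDFS, another option (not part of the original answer) is to drop `IOUtils.toByteArray` and hand the HDFS stream straight to `PutObjectRequest`, taking the content length from the file status. The AWS SDK for Java v1 `TransferManager` Javadoc says a stream upload without a content length is buffered entirely in memory, so setting the length is what keeps memory bounded. This is an untested sketch: it assumes SDK v1 and `hadoop-common` on the classpath, and `HdfsToS3`, `buildRequest` and `upload` are hypothetical names. My understanding is that the SDK uploads stream parts one at a time, so this may be slower than the parallel `File`-based path.

```scala
import com.amazonaws.services.s3.AmazonS3
import com.amazonaws.services.s3.model.{CannedAccessControlList, ObjectMetadata, PutObjectRequest, SSEAwsKeyManagementParams}
import com.amazonaws.services.s3.transfer.TransferManagerBuilder
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsToS3 {
  // Builds a request that streams the file instead of copying it into an Array[Byte].
  // The content length comes from the file status, so nothing has to be read up front.
  def buildRequest(fs: FileSystem, path: Path, bucketName: String, keyName: String, kmsKey: String): PutObjectRequest = {
    val om = new ObjectMetadata()
    om.setContentLength(fs.getFileStatus(path).getLen)
    new PutObjectRequest(bucketName, keyName, fs.open(path), om)
      .withSSEAwsKeyManagementParams(new SSEAwsKeyManagementParams(kmsKey))
      .withCannedAcl(CannedAccessControlList.BucketOwnerFullControl)
  }

  // Uploads and blocks until the transfer finishes; s3Client and the other
  // arguments are assumed to be set up by the caller as in the question.
  def upload(s3Client: AmazonS3, hdfsFilePath: String, bucketName: String, keyName: String, kmsKey: String): Unit = {
    val path = new Path(hdfsFilePath)
    val fs = path.getFileSystem(new Configuration())
    val tm = TransferManagerBuilder.standard.withS3Client(s3Client).build
    val request = buildRequest(fs, path, bucketName, keyName, kmsKey)
    try {
      tm.upload(request).waitForCompletion()
    } finally {
      request.getInputStream.close()
      tm.shutdownNow(false) // stop the transfer threads but leave s3Client open
    }
  }
}
```

Calling `HdfsToS3.upload(s3Client, hdfsFilePath, bucketName, keyName, kmsKey)` on the driver would replace the whole `IOUtils.toByteArray` block from the question.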
https://stackoverflow.com/questions/56739977