I'm trying to write Parquet to S3 against LocalStack running in Testcontainers, and I get the following error:
org.apache.hadoop.fs.s3a.RemoteFileChangedException: open `s3a://***.snappy.parquet': Change reported by S3 during open at position ***. ETag *** was unavailable
It works against real S3, and it also worked with Spark 2.4 and Hadoop 2.7.
I am using: Scala 2.12.15, Spark 3.2.1, hadoop-aws 3.3.1, testcontainers-scala-localstack 0.40.8.
The code is very simple; it just writes data to an S3 location:
val path = "s3a://***"
import spark.implicits._
val df = Seq(UserRow("1", List("10", "20"))).toDF()
df.write.parquet(path)
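For context, the post doesn't show how the Spark session is wired to LocalStack. A minimal sketch of the S3A configuration such a setup typically needs is below; the endpoint URL and the dummy credentials are assumptions, not values from the original post:

import org.apache.spark.sql.SparkSession

// Minimal sketch, assuming LocalStack exposes S3 on http://localhost:4566
// with dummy credentials; substitute the endpoint and keys reported by
// your Testcontainers LocalStack instance.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("localstack-parquet-test")
  .config("spark.hadoop.fs.s3a.endpoint", "http://localhost:4566")
  .config("spark.hadoop.fs.s3a.access.key", "test")
  .config("spark.hadoop.fs.s3a.secret.key", "test")
  // LocalStack does not resolve virtual-host-style bucket URLs,
  // so path-style access is required.
  .config("spark.hadoop.fs.s3a.path.style.access", "true")
  .getOrCreate()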
Posted on 2022-07-18 14:00:46
You can disable versioning on the bucket when you create it. Here is an example:
import org.testcontainers.containers.localstack.LocalStackContainer;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.BucketVersioningStatus;

// create an S3 client pointing at the LocalStack container
S3Client s3Client = S3Client.builder()
    .endpointOverride(localStackContainer.getEndpointOverride(LocalStackContainer.Service.S3))
    .credentialsProvider(StaticCredentialsProvider.create(AwsBasicCredentials
        .create(localStackContainer.getAccessKey(), localStackContainer.getSecretKey())))
    .region(Region.of(localStackContainer.getRegion()))
    .build();

// create the desired bucket
s3Client.createBucket(builder -> builder.bucket(<your-bucket-name>));

// disable (suspend) versioning on the bucket
s3Client.putBucketVersioning(builder -> builder
    .bucket(<your-bucket-name>)
    .versioningConfiguration(builder1 -> builder1
        .status(BucketVersioningStatus.SUSPENDED)));
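As an alternative to changing the bucket (not part of the original answer), hadoop-aws 3.x also exposes client-side change-detection settings that match this failure mode; a sketch, assuming they are set when the Spark session is built:

import org.apache.spark.sql.SparkSession

// Sketch of a client-side workaround, assuming hadoop-aws 3.3.x:
// relax S3A change detection so a missing ETag no longer aborts the read.
val spark = SparkSession.builder()
  .master("local[*]")
  // "none" disables the consistency check; "warn" would only log it.
  .config("spark.hadoop.fs.s3a.change.detection.mode", "none")
  // Do not fail when the store returns no ETag/version for an object.
  .config("spark.hadoop.fs.s3a.change.detection.version.required", "false")
  .getOrCreate()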
https://stackoverflow.com/questions/72617282