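I am trying to read a BigQuery dataset in Dataflow. It cannot find the BigQuery dataset/table I specified.
The job_name is preprocess-ga360-190523-130005.
For some reason, it searches for the dataset in location 'US'.
Module versions are apache-beam 2.5.0, google-cloud-dataflow 2.0.0, google-cloud-bigquery 0.25.0.
I searched the documentation and could not find an answer as to why this happens.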
import apache_beam as beam

# job_name, PROJECT, my_query and to_csv are defined elsewhere in the script
OUTPUT_DIR = "gs://some-bucket/some-folder/"

# dictionary of pipeline options
options = {
    "staging_location": "gs://some-bucket/some-folder/stage/",
    "temp_location": "gs://some-bucket/some-folder/tmp/",
    "job_name": job_name,
    "project": PROJECT,
    "runner": "DirectRunner",
    "location": "europe-west2",
    "region": "europe-west2",
}

# instantiate PipelineOptions object using options dictionary
opts = beam.pipeline.PipelineOptions(flags=[], **options)

# instantiate Pipeline object using PipelineOptions
with beam.Pipeline(options=opts) as p:
    outfile = "gs://some-bucket/some-folder/train.csv"
    (
        p
        | "read_train" >> beam.io.Read(beam.io.BigQuerySource(query=my_query, use_standard_sql=True))
        | "tocsv_train" >> beam.Map(to_csv)
        | "write_train" >> beam.io.Write(beam.io.WriteToText(outfile))
    )

print("Done")
Response:
'cache-control': 'private', 'date': 'Thu, 23 May 2019 13:00:08 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json; ... "reason": "notFound"}, "status": "NOT_FOUND"} }
Answered 2019-05-24 01:16:56
In the Apache Beam 2.5.0 Python SDK, non-US query sources weren't yet supported.
It looks like support was added in the Apache Beam 2.8.0 Python SDK [Release Notes, PR, JIRA].
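If it helps, here is a minimal sketch of how you might check whether the installed SDK is at least the 2.8.0 release mentioned above. The helper names `parse_version` and `supports_non_us_queries` are hypothetical (they are not part of Beam), and the 2.8.0 threshold is taken from this answer rather than verified independently.

```python
import re

# Minimum Beam release that, per the answer above, supports non-US query sources.
# Treat this threshold as an assumption taken from the answer.
MIN_BEAM_VERSION = (2, 8, 0)


def parse_version(version):
    """Return the leading numeric components of a version string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    # A missing patch component (e.g. "2.8") is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def supports_non_us_queries(version):
    """Hypothetical helper: True if `version` is at least MIN_BEAM_VERSION."""
    return parse_version(version) >= MIN_BEAM_VERSION


if __name__ == "__main__":
    try:
        import apache_beam as beam
    except ImportError:
        print("apache-beam is not installed")
    else:
        # apache_beam exposes its release string as __version__.
        print(beam.__version__, supports_non_us_queries(beam.__version__))
```

With the versions listed in the question (apache-beam 2.5.0) this check fails, so upgrading, for example with `pip install --upgrade "apache-beam[gcp]"`, would be the first thing to try before re-running the pipeline against the europe-west2 dataset.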
https://stackoverflow.com/questions/56277513