I want to insert PubSub message data from a topic into a BigQuery table. The pipeline runs, but in the BigQuery table I see unreadable strings like "߈���". This is my pipeline:
p.apply(PubsubIO.Read.named("ReadFromPubsub").topic("projects/project-name/topics/topic-name"))
    .apply(ParDo.named("Transformation").of(new StringToRowConverter()))
    .apply(BigQueryIO.Write.named("Write into BigQuery").to("project-name:dataset-name.table")
        .withSchema(schema)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED));
My simple StringToRowConverter DoFn is:
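import com.google.api.services.bigquery.model.TableRow;
import com.google.cloud.dataflow.sdk.transforms.DoFn;

class StringToRowConverter extends DoFn<String, TableRow> {
    private static final long serialVersionUID = 0;

    @Override
    public void processElement(ProcessContext c) {
        // Split the message payload on commas and emit one row per non-empty token.
        for (String word : c.element().split(",")) {
            if (!word.isEmpty()) {
                System.out.println(word);
                c.output(new TableRow().set("data", word));
            }
        }
    }
}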
This is what I send in the POST request:
POST https://pubsub.googleapis.com/v1/projects/project-name/topics/topic-name:publish
{
  "messages": [
    {
      "attributes": {
        "key": "tablet, smartphone, desktop",
        "value": "eng"
      },
      "data": "34gf5ert"
    }
  ]
}
What am I missing? Thanks!
Posted on 2015-09-17 08:04:52
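According to https://cloud.google.com/pubsub/reference/rest/v1/PubsubMessage, the data field of a message in the JSON publish request is base64-encoded. By default, PubsubIO in Dataflow uses a UTF-8 string coder. The example string you provided, "34gf5ert", when base64-decoded and then interpreted as a UTF-8 string, gives exactly "߈���". To receive "34gf5ert" in your pipeline, you need to base64-encode it before putting it in the data field.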
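To make this concrete, here is a minimal sketch that uses only java.util.Base64 (not the Pub/Sub or Dataflow client APIs). It assumes the string your DoFn receives is simply the base64-decoded data bytes read as UTF-8, which is how the answer describes PubsubIO's default coder; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: models the REST "data" field versus the string a
// UTF-8 string coder would hand to the pipeline.
class PubsubDataEncoding {

    // What a publisher should put in "data": base64 of the payload's UTF-8 bytes.
    static String encodeForPublish(String payload) {
        return Base64.getEncoder()
                .encodeToString(payload.getBytes(StandardCharsets.UTF_8));
    }

    // Roughly what the pipeline sees: "data" is base64-decoded into raw bytes,
    // then those bytes are read as UTF-8 text.
    static String decodeAsPipelineString(String dataField) {
        byte[] raw = Base64.getDecoder().decode(dataField);
        // new String(...) replaces invalid UTF-8 sequences with U+FFFD.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "34gf5ert" is itself valid base64, so sending it unencoded yields
        // the bytes DF 88 1F E5 EA ED instead of the intended text.
        String garbled = decodeAsPipelineString("34gf5ert");
        System.out.println(garbled.equals("34gf5ert")); // false

        // Encoding first makes the round trip return the original text.
        String data = encodeForPublish("34gf5ert");
        System.out.println(data); // MzRnZjVlcnQ=
        System.out.println(decodeAsPipelineString(data)); // 34gf5ert
    }
}
```

With this, the request body from the question would send "data": "MzRnZjVlcnQ=" instead of "data": "34gf5ert", and the DoFn would receive "34gf5ert".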
Posted on 2016-02-08 04:14:35
This is how I unpack my Pub/Sub messages:
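import java.util.HashMap;
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;

@Override
public void processElement(ProcessContext c) {
    String json = c.element();
    HashMap<String, String> items = new Gson().fromJson(json, new TypeToken<HashMap<String, String>>(){}.getType());
    String unpacked = items.get("JsonKey"); // "JsonKey" is the field name in your JSON payload
}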
Hope this works for you.
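The Gson snippet above expects each message's payload to be a JSON object. A minimal sketch of the publishing side under that assumption: the JSON text is base64-encoded into the REST data field, as the first answer requires. The field name JsonKey comes from the snippet; the class and method names are hypothetical, and the JSON is built by hand, so it assumes the value contains no quotes, backslashes, or control characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical publisher-side helper for the Gson-based unpacking above.
class JsonPubsubPayload {

    // Builds {"JsonKey":"<value>"} by hand; assumes value needs no JSON escaping.
    static String toJson(String value) {
        return "{\"JsonKey\":\"" + value + "\"}";
    }

    // Builds a REST publish body whose "data" field is the base64 of the JSON text.
    static String publishBody(String value) {
        String data = Base64.getEncoder()
                .encodeToString(toJson(value).getBytes(StandardCharsets.UTF_8));
        return "{\"messages\":[{\"data\":\"" + data + "\"}]}";
    }

    public static void main(String[] args) {
        System.out.println(publishBody("34gf5ert"));
    }
}
```

After Pub/Sub base64-decodes data and the UTF-8 coder turns it into a string, c.element() would be the JSON text, so items.get("JsonKey") returns "34gf5ert".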
https://stackoverflow.com/questions/32632602