将JSON写入BigQuery表时类型不兼容

基础概念

BigQuery是云数据仓库服务，支持结构化数据的存储和查询。JSON（JavaScript Object Notation）是一种轻量级的数据交换格式，易于人阅读和编写，同时也易于机器解析和生成。

类型不兼容问题

当将JSON数据写入BigQuery表时，可能会遇到类型不兼容的问题。这是因为JSON中的数据类型可能与BigQuery表定义的数据类型不匹配。

原因

数据类型不匹配：JSON中的某些字段类型与BigQuery表中定义的字段类型不一致。例如，JSON中的字符串可能对应BigQuery中的整数类型。
嵌套结构问题：JSON中的嵌套结构可能没有正确映射到BigQuery表的嵌套结构。
空值处理：JSON中的空值可能没有正确处理，导致类型不匹配。

解决方法

1. 数据类型转换

在将JSON数据写入BigQuery之前，确保JSON中的数据类型与BigQuery表定义的数据类型一致。可以使用编程语言进行数据类型转换。

import json
from google.cloud import bigquery

# 示例JSON数据
json_data = {
    "id": "123",
    "name": "John Doe",
    "age": "30"
}

# 将JSON数据转换为字典
data_dict = json.loads(json_data)

# 数据类型转换
data_dict["id"] = int(data_dict["id"])
data_dict["age"] = int(data_dict["age"])

# BigQuery表结构
schema = [
    bigquery.SchemaField("id", "INTEGER"),
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("age", "INTEGER")
]

# 写入BigQuery表
client = bigquery.Client()
table_ref = client.dataset("your_dataset").table("your_table")
table = bigquery.Table(table_ref, schema=schema)
rows_to_insert = [data_dict]
errors = client.insert_rows(table, rows_to_insert)

if errors == []:
    print("New rows have been added.")
else:
    print("Encountered errors while inserting rows: {}".format(errors))

2. 嵌套结构处理

确保JSON中的嵌套结构正确映射到BigQuery表的嵌套结构。

# 示例嵌套JSON数据
nested_json_data = {
    "id": "123",
    "name": "John Doe",
    "address": {
        "street": "123 Main St",
        "city": "Anytown"
    }
}

# 将JSON数据转换为字典
nested_data_dict = json.loads(nested_json_data)

# BigQuery表结构（嵌套）
nested_schema = [
    bigquery.SchemaField("id", "INTEGER"),
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("address", "RECORD", fields=[
        bigquery.SchemaField("street", "STRING"),
        bigquery.SchemaField("city", "STRING")
    ])
]

# 写入BigQuery表
table_ref = client.dataset("your_dataset").table("your_table")
table = bigquery.Table(table_ref, schema=nested_schema)
rows_to_insert = [nested_data_dict]
errors = client.insert_rows(table, rows_to_insert)

if errors == []:
    print("New rows have been added.")
else:
    print("Encountered errors while inserting rows: {}".format(errors))

3. 空值处理

确保JSON中的空值正确处理，可以使用默认值或忽略空值。

# 示例JSON数据包含空值
json_data_with_null = {
    "id": "123",
    "name": None,
    "age": "30"
}

# 将JSON数据转换为字典
data_dict_with_null = json.loads(json_data_with_null)

# 处理空值
data_dict_with_null["name"] = ""  # 使用默认值

# 写入BigQuery表
rows_to_insert = [data_dict_with_null]
errors = client.insert_rows(table, rows_to_insert)

if errors == []:
    print("New rows have been added.")
else:
    print("Encountered errors while inserting rows: {}".format(errors))