Data Integration

Last updated: 2024-11-25 09:31:41

1. How to handle ClickHouse to DLC sync reporting network timeout

Problem description:
When users sync ClickHouse data to DLC, the sync succeeds for small datasets but reports a network timeout for larger tables.



Locating the Issue:
1. Log in to the execution environment POD and Telnet to the target ClickHouse datasource. The connection succeeds and logging in to ClickHouse also works, which rules out basic network connectivity issues.
2. Suspect that the large ClickHouse data volume or high concurrency is causing the read timeout, and reduce the concurrency.
3. If the same error persists after limiting concurrency and increasing the data read operation timeout in ClickHouse, check whether the customer's execution environment resource group and ClickHouse are in the same VPC. If they are not, the networks must be connected via CCN (Cloud Connect Network).
4. Confirm with the CCN product how much public network bandwidth cross-region connectivity requires, and check whether the user has connected the two VPCs across regions via CCN without enabling public network bandwidth.
5. For testing purposes, CCN provides only 10Kb of traffic by default, which explains why small data syncs and network tests succeed while large data volumes report network issues.

Cause:
The execution resource group and the datasource are not in the same VPC, and the cross-region VPC connection via CCN does not have public network bandwidth enabled. By default, CCN supports less than 10Kb of traffic, and exceeding this limit causes a disconnection. This is why small data volumes sync successfully but large data volumes fail.
Solution:
Add public network bandwidth in CCN.
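
The Telnet check used while locating this issue can be approximated with a short Python sketch. The function name and the example host and port are illustrative assumptions, not part of WeData. Note that a successful TCP connection only proves reachability; as this case shows, it says nothing about whether the link has enough bandwidth for a large sync.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example call with a hypothetical endpoint (replace with the real one):
# can_connect("clickhouse.example.internal", 9000)
```

If this check passes but large syncs still time out, the bandwidth limit described in the cause above is a more likely suspect than basic connectivity.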

2. How to handle a disconnected network when public network access is required

Problem description:

Locating the Issue:
Locate through logs.
Cause:
The data source is accessed over the public network, but the resource group's network does not have public network access by default.
Solution:
Add the subnet of the resource group to the NAT Gateway. For details, refer to NAT Gateway Configuration and Access the Internet through NAT Gateway.
Note
If it is the main account's resource group, you can replace the resource group's subnet routing policy with EKS.
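
To judge whether a data source endpoint is a public address, and therefore whether the resource group needs a NAT Gateway route, one option is to classify its IP address. The following is a minimal sketch using Python's standard ipaddress module. The function name is an illustrative assumption, and a hostname would have to be resolved to an IP address first.

```python
import ipaddress


def is_public_ip(address: str) -> bool:
    """Return True if address is a globally routable IP address.

    Private ranges (for example 10.0.0.0/8 or 192.168.0.0/16), loopback,
    and other reserved ranges return False.
    """
    return ipaddress.ip_address(address).is_global
```

For example, is_public_ip("10.0.0.5") is False, because that address belongs to a private range and is reached inside a VPC rather than through a NAT Gateway.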

3. How to handle data synchronization failure when it prompts that the data source is not accessible

Problem description:
An offline synchronization task reports that it cannot connect to the database, even though the database is actually accessible.



Locating the Issue:
The WeData DataInLong resource group and CDB are not in the same VPC, and the two networks are not interconnected.
Cause:
The DataInLong execution resource group and the data source both reside in the user's VPC networks. They need to be in the same VPC; otherwise the network is not reachable and the execution resource group cannot synchronize data correctly. If they are in different VPCs, the two networks must be interconnected.
Solution:
Use CCN or Peering Connection to enable cross-VPC network intercommunication. If the data source is a public network instance, configure the NAT Gateway.
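
The choice between these fixes can be summarized as a small decision sketch. The function name, the VPC ID strings, and the convention of passing None for a public network instance are assumptions made for illustration. The returned text simply restates the guidance above.

```python
from typing import Optional


def recommend_network_fix(resource_group_vpc: str,
                          datasource_vpc: Optional[str]) -> str:
    """Map the network layout to the fix described in this section.

    datasource_vpc is None when the data source is a public network instance.
    """
    if datasource_vpc is None:
        return "Configure the NAT Gateway"
    if datasource_vpc == resource_group_vpc:
        # Same VPC: no cross-VPC interconnection is required.
        return "No cross-VPC setup needed"
    return "Use CCN or Peering Connection"
```

For example, recommend_network_fix("vpc-a", "vpc-b") returns "Use CCN or Peering Connection", where "vpc-a" and "vpc-b" are placeholder IDs.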

4. How to handle Hive On COS table permission error during data write

Problem description:
An offline synchronization task uses a Hive data source whose table data is stored in COS. Writing data fails with the error: java.lang.Exception: Retrieve the file metadata file failure.



Locating the Issue:
Check if the account is associated with the WeData_QCSRole role and if this role has the COS CAM policy configured.
Cause:
For Hive on COS, integration tasks write data using temporary credentials obtained from CAM through the WeData_QCSRole role. If no COS-related CAM policy is attached to the WeData_QCSRole role, the temporary credentials will not have permission to read or write the COS bucket.
Solution:
On the role page of CAM, search for "wedata", find WeData_QCSRole, and check if it includes COS in the associated policies. If not, add QcloudCOSFullAccess.
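
If the list of policy names attached to WeData_QCSRole has been copied out of the CAM console, the check can be sketched as below. The function name is an illustrative assumption, and the name match is only a rough heuristic: it looks for "cos" anywhere in a policy name, so confirm the actual permissions in CAM.

```python
from typing import Iterable


def has_cos_policy(policy_names: Iterable[str]) -> bool:
    """Return True if any attached policy name mentions COS (case-insensitive)."""
    return any("cos" in name.lower() for name in policy_names)
```

For example, has_cos_policy(["QcloudCOSFullAccess"]) is True, while an empty list gives False, which indicates that QcloudCOSFullAccess still needs to be added.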