
Using Custom Templates for Data Quality

Last updated: 2025-04-09 14:30:09

Background

Data quality in Wedata, Tencent Cloud's data development and governance platform, supports creating custom templates and managing them in batches, so you can tailor table quality inspection logic to your business scenarios. This document describes how to create a rule template on the custom template page and then create detection rules for tables on the data monitoring page based on that template.

Operation Process





Step 1 Preparations

1. Create a user and a project. Within the Wedata product, you first need to create a user and a project. For detailed instructions, see Preparations.
2. Create a scheduling resource group. Running quality inspection tasks requires a scheduling resource group. For detailed instructions, see Scheduling Resource Group.

Step 2 Create a Custom Template

1. Go to Data Quality > Rule Template, click Custom Template, add a template, and save it.
SQL expression:
select count(table1.${table_1.column_2}) AS count
from ${table_1} table1
join ${table_2} table2
on table1.${table_1.column_1} = table2.${table_2.column_1}
where table1.${table_1.column_3} >= ${param_1} and table1.${table_1.column_3} <= ${param_2}
and table2.${table_2.column_2} >= ${param_3} and table2.${table_2.column_2} <= ${param_4};
Explanation:
Two tables appear in the SQL expression above: ${table_1} and ${table_2}.
${table_1} is the primary table scanned by the monitoring rule;
${table_2} is another table in the same data source and database (in actual use, you can also choose the primary table itself);
Four fields of Table 1 are described:
${table_1.column_1}: used to join with Table 2;
${table_1.column_2}: used for the result count;
${table_1.column_3}: used as a filter condition, greater than or equal to Parameter 1 and less than or equal to Parameter 2;
${table_1.column_4}: represents the partition field of Table 1; filtering on it avoids scanning the full data and saves significant computing resources (the SQL expression shown above does not reference this field);
Two fields of Table 2 are used:
${table_2.column_1}: used to join with Table 1;
${table_2.column_2}: used as a filter condition, greater than or equal to Parameter 3 and less than or equal to Parameter 4;
Four WHERE parameters are used:
${param_1}: minimum value of Field 3 in Table 1;
${param_2}: maximum value of Field 3 in Table 1;
${param_3}: minimum value of Field 2 in Table 2;
${param_4}: maximum value of Field 2 in Table 2.
Final calculation result: a single number, the count of non-null Field 2 values in Table 1 among the rows that satisfy the join and filter conditions.
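
To make the placeholders concrete, here is a minimal Python sketch of what the template might look like once every variable is filled in. It does not show how Wedata performs the substitution internally. The table names (orders, customers), column names, and parameter values are hypothetical, and an in-memory SQLite database stands in for the real execution engine. The template's trailing semicolon is omitted.

```python
import re
import sqlite3

# The SQL template from this page (trailing semicolon omitted).
TEMPLATE = """
select count(table1.${table_1.column_2}) AS count
from ${table_1} table1
join ${table_2} table2
on table1.${table_1.column_1} = table2.${table_2.column_1}
where table1.${table_1.column_3} >= ${param_1} and table1.${table_1.column_3} <= ${param_2}
and table2.${table_2.column_2} >= ${param_3} and table2.${table_2.column_2} <= ${param_4}
"""

# Hypothetical mapping, for illustration only.
MAPPING = {
    "table_1": "orders",
    "table_2": "customers",
    "table_1.column_1": "customer_id",  # join key in Table 1
    "table_1.column_2": "order_id",     # counted field
    "table_1.column_3": "amount",       # filtered by param_1 / param_2
    "table_2.column_1": "id",           # join key in Table 2
    "table_2.column_2": "age",          # filtered by param_3 / param_4
    "param_1": "10",
    "param_2": "100",
    "param_3": "18",
    "param_4": "60",
}

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def render(template, mapping):
    """Replace every ${name} placeholder with its mapped value."""
    return PLACEHOLDER.sub(lambda m: mapping[m.group(1)], template)


def run_demo():
    """Render the template and run it against a tiny sample data set."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table customers (id integer, age integer);
        create table orders (order_id integer, customer_id integer, amount integer);
        insert into customers values (1, 30), (2, 70);
        insert into orders values (101, 1, 50), (102, 1, 500), (103, 2, 50), (104, 1, 20);
    """)
    sql = render(TEMPLATE, MAPPING)
    (count,) = conn.execute(sql).fetchone()
    conn.close()
    return sql, count


if __name__ == "__main__":
    sql, count = run_demo()
    print(sql)
    print("count =", count)
```

With the sample rows above, orders 101 and 104 are the only ones whose amount and customer age both fall inside the configured ranges, so the query returns 2.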

Screenshot example:




Step 3 Create a Quality Rule

1. Go to Data Monitoring, find the table to be monitored, and click Configure Monitoring Task.



2. Click Add Rule, select Custom Template as the rule type, select the newly created template, set the database and table parameters and the WHERE parameters according to the template variables, configure the trigger conditions and level, and click Save.



Notes:
Before using a custom template, work out what each field in the template means, and then map the fields accordingly.
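
One way to review a mapping before entering it in the console is to list the placeholders the template actually references and compare them with the values you plan to supply. The Python sketch below is a hypothetical offline helper, not a Wedata feature; the function name check_mapping and the sample mapping are assumptions made for illustration.

```python
import re

# The SQL template from Step 2 of this page (trailing semicolon omitted).
TEMPLATE = """
select count(table1.${table_1.column_2}) AS count
from ${table_1} table1
join ${table_2} table2
on table1.${table_1.column_1} = table2.${table_2.column_1}
where table1.${table_1.column_3} >= ${param_1} and table1.${table_1.column_3} <= ${param_2}
and table2.${table_2.column_2} >= ${param_3} and table2.${table_2.column_2} <= ${param_4}
"""

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def check_mapping(template, mapping):
    """Return (missing, unused) placeholder names, each as a sorted list.

    missing: referenced by the template but absent from the mapping.
    unused:  present in the mapping but never referenced by the template.
    """
    used = set(PLACEHOLDER.findall(template))
    missing = sorted(used - set(mapping))
    unused = sorted(set(mapping) - used)
    return missing, unused


# Hypothetical draft mapping: param_4 was forgotten, and a partition
# field was mapped although the template text never references it.
DRAFT = {
    "table_1": "orders",
    "table_2": "customers",
    "table_1.column_1": "customer_id",
    "table_1.column_2": "order_id",
    "table_1.column_3": "amount",
    "table_1.column_4": "dt",
    "table_2.column_1": "id",
    "table_2.column_2": "age",
    "param_1": "10",
    "param_2": "100",
    "param_3": "18",
}

if __name__ == "__main__":
    missing, unused = check_mapping(TEMPLATE, DRAFT)
    print("missing:", missing)
    print("unused:", unused)
```

For the draft mapping shown, the helper reports param_4 as missing and table_1.column_4 as unused, which is the kind of mismatch worth resolving before saving the rule.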

Step 4 Test-Run

1. Click trial run, select an execution engine, a computing resource, and an execution resource, and select the rule you just created in the validation rules.



2. Click view execution results to go to the Ops management page and view the execution results.



3. Click Results & Logs to view the run logs.
In the logs, the line EXECUTING SQL: xxxxxx shows the SQL submitted to the Hive/Spark/DLC engine for quality inspection.