With the partition catalog feature, you can store data with different characteristics in different catalogs. In this way, when exploring data, you can filter data by partition through the where condition. This greatly reduces the scanned data volume and improves the query efficiency.
Note
Partitions within the same table should utilize the same data type and format.
Data Lake Compute's native tables implement implicit partitioning, allowing you to disregard the partition directory structure.
Creating a Partitioned Table
Specify the partition field through the PARTITIONED BY parameter in the table creation statement.
Example: Creating the test_part partition table
Add partitions using the alter table add partition command.
If your data partition catalog uses the Hive partitioning rule (partition column name=partition column value), the rule can be used to add partitions. The catalog is organized as follows:
Add partitions by specifying the location with the alter table command.
If your data adopts a general COS catalog (not in the "partition column name=partition column value" format), you can specify a catalog when adding a partition.
Sample SQL:
Utilizing MSCK REPAIR for Automatic Partition Addition
By using the MSCK REPAIR TABLE statement, the system scans the data directory specified during table creation. If new partition directories exist, the system will automatically add these partitions to the metadata information of the data table.
SQL Reference:
We recommend adding partitions primarily through the 'alter table' method. If you choose to use 'msck repair' for automatic partition addition, the following constraints apply:
The MSCK REPAIR TABLE command only adds partitions to the table metadata, it does not delete partitions.
When dealing with large volumes of data, it is not recommended to use the MSCK REPAIR TABLE method, as it scans the entire data volume and may lead to timeouts.
If the partition directory does not follow Hive's partitioning rule: partition column name = partition column value, the MSCK REPAIR TABLE method cannot be used.