With Data Lake Compute, you can run data analysis queries directly on data stored in COS within minutes. It currently supports multiple formats, including CSV, ORC, PARQUET, JSON, AVRO, and text files.
Preliminary Preparations
Before initiating a query, you need to grant the internal permissions of Data Lake Compute and configure the path for query results.
Step 1: Establish the necessary internal permissions for Data Lake Compute.
Note
If the user already has the necessary permissions, or if they are the root account administrator, this step can be disregarded.
If you are logging in as a sub-account for the first time, in addition to the necessary CAM authorization, you also need to ask a Data Lake Compute admin or the root account admin to grant you the required Data Lake Compute permissions on the Permission Management page in the left sidebar of the Data Lake Compute console (for a detailed explanation of permissions, see DLC Permission Overview).
1. Table permissions: Grant read and write permissions on the corresponding catalog, database, table, and view.
2. Engine permissions: Grant usage, monitoring, and modification permissions on the compute engine.
Note
The system automatically provides each user with a shared public engine based on the Presto kernel, so you can quickly try the service without purchasing a private cluster first.
1. When using Data Lake Compute for the first time, you must first configure the path for query results. Once configured, query results will be saved to this COS path.
2. Navigate to Data Exploration via the left sidebar menu.
3. Select Database, click +, choose Create Database to establish a new database. As shown below:
Enter the database name and its descriptive information.
4. After selecting the execution engine in the upper right corner, execute the generated 'create database' statement to complete the database creation.
The details are as shown below:
For detailed operation steps and configuration methods, please refer to Database Management.
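For reference, the generated statement is ordinary SQL DDL. The following is a minimal sketch, assuming Hive/Spark-style syntax; the database name demo_db is only an example, and the actual statement is produced by the console:

-- Create the database that will hold the tables for analysis; the name is an example.
CREATE DATABASE IF NOT EXISTS demo_db;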
Step 2: Create an External Table
If you are familiar with SQL statements, you can write the CREATE TABLE statement directly in the query editor and skip the creation wizard.
2. Navigate to Data Exploration via the left sidebar menu.
3. Select the database/table, right-click the newly created database, and choose Create External Table.
Note
External tables typically refer to data files stored in your own COS bucket. Data Lake Compute can create external tables over them directly for analysis, without any additional data loading. Given the nature of external tables, operations such as DROP TABLE in Data Lake Compute will not delete your original data; they only remove the table's metadata.
4. Follow the guide to generate the table creation statement, completing each step in the following order: Data Path > Data Format > Data Format Configuration > Edit Partition.
Step 1: Select the COS path where the data files are stored (the path must be a directory within the COS bucket, not the bucket itself). A shortcut for quickly uploading files to COS is also provided here; this operation requires the relevant COS permissions.
Step 2: Select the data format. Currently, Data Lake Compute supports text files, CSV, JSON, PARQUET, ORC, and AVRO.
Note
Structure inference is an auxiliary tool for table creation and cannot guarantee 100% accuracy. You still need to check whether the inferred field names and types meet your expectations and correct them as needed.
Step 3: If there are no partitions, you can skip this step. Using partitions appropriately can improve analysis performance. For details about partitions, see Query Partition Table.
5. Click Complete to generate the SQL table creation statement. Execute the generated statement after selecting the data engine to complete the table creation.
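For reference, the statement generated by the wizard for CSV files stored in COS generally follows Hive-style DDL. The sketch below is illustrative only: the database, table, column names, partition column, and cosn:// path are hypothetical, and the exact statement depends on your data format configuration:

-- External table over CSV files in a COS directory; all names and the path are examples.
CREATE EXTERNAL TABLE IF NOT EXISTS demo_db.demo_audit_table (
  _c0 STRING,
  _c1 STRING,
  _c5 STRING
)
PARTITIONED BY (dt STRING)  -- include only if the data is partitioned
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'cosn://examplebucket-appid/audit-data/';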
Step 3: Execute SQL Analysis
After the data is prepared, write the SQL analysis statement, select an appropriate compute engine, and start data analysis.
Sample
Write a SQL statement that queries all data whose result is SUCCESS, select a compute engine, and run the statement.
SELECT * FROM DataLakeCatalog.demo2.demo_audit_table WHERE _c5 = 'SUCCESS'
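If the table were partitioned (for example by a hypothetical dt column), filtering on the partition column limits the amount of data scanned, which is where the performance gain mentioned in the partition step comes from. An illustrative variant, assuming such a column exists:

-- Only files under the matching dt partition are read; the column and value are examples.
SELECT count(*) FROM DataLakeCatalog.demo2.demo_audit_table WHERE dt = '2023-06-01' AND _c5 = 'SUCCESS'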