Business research involves building the data warehouse framework along business and data dimensions, using layering, categorization, and subject-domain abstraction. Two kinds of definition are required:
Data Hierarchy Definition covers the ODS, DWD, DWS, ADS, and DIM layers, which are established by associating each logical layer with a physical database.
Business Type Definition covers business classification, subject domain, and business process, and manages data objects through a custom business catalog.
Once these definitions are complete, the data warehouse architecture is generated automatically. The later steps of defining models, metrics, and dimensions rely on this architecture for their definition and management.
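As a rough illustration of how logical layers might map to physical databases and drive table naming, here is a minimal Python sketch. The database names, the `BusinessCatalog` class, and the `layer_domain_process_suffix` naming pattern are all assumptions made for this example; they are not conventions defined by the product.

```python
from dataclasses import dataclass

# Hypothetical mapping from each logical layer to a physical database.
LAYER_DATABASES = {
    "ODS": "ods_db",
    "DWD": "dwd_db",
    "DWS": "dws_db",
    "ADS": "ads_db",
    "DIM": "dim_db",
}


@dataclass(frozen=True)
class BusinessCatalog:
    """One node of an assumed business catalog."""
    classification: str    # business classification, e.g. "retail"
    subject_domain: str    # subject domain, e.g. "trade"
    business_process: str  # business process, e.g. "order"


def table_name(layer: str, catalog: BusinessCatalog, suffix: str) -> str:
    """Build a qualified table name from the layer and the catalog node."""
    layer = layer.upper()
    if layer not in LAYER_DATABASES:
        raise ValueError(f"unknown layer: {layer}")
    parts = [layer.lower(), catalog.subject_domain,
             catalog.business_process, suffix]
    return LAYER_DATABASES[layer] + "." + "_".join(parts)
```

With this assumed pattern, a detail table for the order process in the trade domain would be named `dwd_db.dwd_trade_order_detail`.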
Model Design
After the data warehouse architecture is built, the designer defines the logical model and physical model based on data characteristics and business scenarios. Logical tables are named according to the data warehouse architecture so that naming stays standardized. During field configuration, metadata standards and value range standards can also be bound to fields, which completes the standardized definition process.
The model design process combines data-driven (bottom-up) and business-driven (top-down) approaches:
1. From the data-driven perspective, the designer first synchronizes raw data from the production source system to the interface layer. After cleansing and transformation, the raw data in the interface layer forms the detail table, also known as the fact table, which stores data at the finest granularity. The detail table is the source for metric statistics, and its fields are bound to basic metrics and dimension conditions.
2. From the business-driven perspective, the designer defines an aggregation layer and a mart layer based on business scenarios. A summary table in the aggregation layer stores aggregated metric data under different dimension conditions and serves as the target table for derived metrics, with each derived metric bound one-to-one to a field.
3. Defined analysis dimensions produce dimension tables, which store the attribute and hierarchy data of each dimension. Each dimension table is bound one-to-one to a common dimension and can be generated automatically during dimension definition.
After completing the logical model design, the physical model can be generated through the publishing action, linking the design and development processes. If some physical models have already been created, they can be reverse-imported to generate logical models, completing the design phase.
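The bottom-up and top-down steps above can be sketched end to end with an in-memory SQLite database. This is only an illustration: the table names, columns, sample rows, and cleansing rules are assumptions chosen for the example, and a real warehouse would use its own engine and publishing workflow.

```python
import sqlite3


def build_layers() -> list:
    """Load raw rows, derive a detail table, then a summary table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Interface layer: raw data as synchronized from the source system.
        CREATE TABLE ods_order_raw (order_id TEXT, channel TEXT, amount TEXT);
        INSERT INTO ods_order_raw VALUES
            ('o1', ' web ', '10.50'),
            ('o2', 'app',   '20.00'),
            ('o3', 'web',   '4.50'),
            ('o4', 'app',   NULL);

        -- Detail (fact) table: cleansed, typed, finest granularity.
        CREATE TABLE dwd_order_detail AS
        SELECT order_id,
               TRIM(channel)        AS channel,
               CAST(amount AS REAL) AS order_amount
        FROM ods_order_raw
        WHERE amount IS NOT NULL;

        -- Summary table: metrics aggregated under a dimension condition.
        CREATE TABLE dws_order_by_channel AS
        SELECT channel,
               SUM(order_amount) AS total_order_amount,
               COUNT(*)          AS order_count
        FROM dwd_order_detail
        GROUP BY channel;
    """)
    rows = conn.execute(
        "SELECT channel, total_order_amount, order_count "
        "FROM dws_order_by_channel ORDER BY channel"
    ).fetchall()
    conn.close()
    return rows
```

Calling `build_layers()` returns one row per channel from the summary table, which is the shape a derived metric field would be bound to.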
Business Definition
During business research, the designer abstracts indicators and dimensions from business scenarios and defines them:
1. Indicator Definition
Indicators, also referred to as metrics in this document, fall into two categories: basic metrics and derived metrics.
1.1 Basic metrics are measures that carry no dimensional conditions. Each one needs its basic attributes, statistical caliber (the calculation rule), unit, and precision defined. Derived metrics inherit the unit of their basic metric. The data for a basic metric comes from a specific field in the detail table, so that field must be associated in the indicator definition.
1.2 Derived metrics are defined by adding dimensional conditions to a basic metric to restrict it to a specific scope, such as user growth for a certain channel or product type. They can also be the result of a combined calculation over multiple derived metrics under the same dimensional conditions, such as a growth rate. Once defined, a derived metric is bound to a field in a summary table, which supports indicator production.
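The relationship between basic and derived metrics can be modeled roughly as follows. The class names, attributes, and the `growth_rate` helper are hypothetical and exist only for this sketch; they show unit inheritance and one possible combined calculation, not the product's internal data model.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class BasicMetric:
    name: str
    source_table: str   # detail table holding the measure
    source_field: str   # field associated in the indicator definition
    aggregation: str    # statistical caliber, e.g. "COUNT" or "SUM"
    unit: str
    precision: int


@dataclass(frozen=True)
class DerivedMetric:
    name: str
    base: BasicMetric
    conditions: Mapping[str, str]  # dimensional conditions, e.g. channel
    target_table: str              # summary table
    target_field: str              # field bound one-to-one to this metric

    @property
    def unit(self) -> str:
        # A derived metric inherits the unit of its basic metric.
        return self.base.unit


def growth_rate(current: float, previous: float) -> float:
    """Combine two values of a derived metric into a growth rate."""
    if previous == 0:
        raise ZeroDivisionError("previous period value is zero")
    return (current - previous) / previous
```

For example, a derived metric built on a basic metric whose unit is "person" reports the same unit, and `growth_rate(120, 100)` evaluates to `0.2`.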
2. Dimension Definition
Dimensions can be classified into the following categories:
2.1 Common dimension: This can be understood as the GROUP BY clause in SQL. A common dimension corresponds to exactly one dimension table, and the two are associated during dimension modeling.
2.2 Business Constraint: Also known as a modifier, it filters data by characteristic values of a business dimension.
2.3 Time period: A time-based limiting condition.
2.4 Degenerate Dimension: A dimension kept directly in the fact table. This usually happens when the dimension has no attributes other than its key, even though it is a legitimate dimension. Keeping it in the fact table reduces the number of joins and improves query performance.
Indicators and dimensions must be defined and published in order, so that they can be associated with the table model and referenced by later derived definitions. This guides the implementation of indicator production.
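To make the four dimension categories concrete, the sketch below expresses each one as a part of a single SQL query run against in-memory SQLite. The tables, columns, and sample rows are invented for illustration: the common dimension appears as the GROUP BY column joined from a dimension table, the business constraint and time period appear in the WHERE clause, and the order number is a degenerate dimension read directly from the fact table.

```python
import sqlite3


def book_sales_by_channel() -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_channel (channel_id INTEGER, channel_name TEXT);
        INSERT INTO dim_channel VALUES (1, 'web'), (2, 'app');

        -- order_no has no attributes of its own, so it stays in the
        -- fact table as a degenerate dimension.
        CREATE TABLE dwd_order_detail (
            order_no TEXT, channel_id INTEGER, product_type TEXT,
            order_date TEXT, amount REAL);
        INSERT INTO dwd_order_detail VALUES
            ('A001', 1, 'book', '2024-01-05', 30.0),
            ('A001', 1, 'book', '2024-01-05', 20.0),
            ('A002', 2, 'book', '2024-01-10', 15.0),
            ('A003', 1, 'toy',  '2024-01-12', 99.0),
            ('A004', 2, 'book', '2024-02-01', 40.0);
    """)
    rows = conn.execute("""
        SELECT c.channel_name,                        -- common dimension
               SUM(f.amount)              AS sales_amount,
               COUNT(DISTINCT f.order_no) AS order_count
        FROM dwd_order_detail AS f
        JOIN dim_channel AS c ON c.channel_id = f.channel_id
        WHERE f.product_type = 'book'                 -- business constraint
          AND f.order_date BETWEEN '2024-01-01' AND '2024-01-31'  -- time period
        GROUP BY c.channel_name
        ORDER BY c.channel_name
    """).fetchall()
    conn.close()
    return rows
```

Counting distinct order numbers needs no extra join, which is the practical benefit of keeping a degenerate dimension in the fact table.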
Data Standard
When models and metrics are defined, and throughout development and production, operations must adhere to a unified data standard, so rules need to be defined for business objects. Standards management covers the following four modules:
1. Standard Rule Definition
You can define standards at the table, field, and indicator levels. Metadata standards define the naming and type specifications of business objects, and value range standards describe the characteristics of their permitted values.
After a rule is released, it can be bound to fields in model design or in the physical model.
During later data development, ETL tasks can also perform standard conversion, based on conversion rules that you configure.
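A metadata standard and a value range standard could be checked along the following lines. This is a minimal sketch: the two dataclasses, the regular-expression naming rule, and the helper functions are assumptions for illustration and do not reflect how the product stores or evaluates its rules.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MetadataStandard:
    name_pattern: str  # regular expression the field name must match
    data_type: str     # required data type


@dataclass(frozen=True)
class ValueRangeStandard:
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def check_field(name: str, data_type: str, std: MetadataStandard) -> List[str]:
    """Return a list of violations; an empty list means the field conforms."""
    problems = []
    if re.fullmatch(std.name_pattern, name) is None:
        problems.append(f"name '{name}' does not match {std.name_pattern}")
    if data_type.upper() != std.data_type.upper():
        problems.append(f"type '{data_type}' is not {std.data_type}")
    return problems


def in_range(value: float, std: ValueRangeStandard) -> bool:
    """Check a value against the optional lower and upper bounds."""
    if std.minimum is not None and value < std.minimum:
        return False
    if std.maximum is not None and value > std.maximum:
        return False
    return True
```

Here a field named `order_amt` of type `decimal` would pass a standard such as `MetadataStandard(r"[a-z][a-z0-9_]*_amt", "DECIMAL")`, while `OrderAmount` of type `string` would produce two violations.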
2. Standard Encoding Definition
Enumerated data needs standard encodings so that it can be managed consistently. After an encoding is defined and released, it can be referenced in standard rules.
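One way to picture a standard encoding is as a lookup table that a conversion step applies to raw enumerated values. The gender code set below, the fallback code `"0"`, and the `to_standard_code` function are all hypothetical examples, not codes shipped with the product.

```python
from typing import Mapping, Optional

# Hypothetical standard encoding for an enumerated field.
GENDER_ENCODING = {
    "m": "1", "male": "1",
    "f": "2", "female": "2",
}
UNKNOWN_CODE = "0"  # assumed fallback for values outside the encoding


def to_standard_code(raw: Optional[str],
                     encoding: Mapping[str, str] = GENDER_ENCODING,
                     unknown: str = UNKNOWN_CODE) -> str:
    """Normalize a raw value and map it to its standard code."""
    if raw is None:
        return unknown
    return encoding.get(raw.strip().lower(), unknown)
```

Normalizing case and whitespace before the lookup keeps the encoding table small, and routing unmatched values to a single fallback code makes nonconforming data easy to find later.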
3. Measurement Unit Definition
Indicator definitions use measurement units. The system presets common units, and custom units can be defined in this module.
4. Terminology Dictionary Definition
Industry-standard metadata is defined in bulk in the terminology dictionary. After it is defined and released, it can be referenced in standard rules.