Basic Information Configuration
Log in to the TI-ONE console and choose Model Service > Online Services in the left sidebar to go to the Online Services list page. On the list page, click Create Service to go to the service startup page, where you configure the parameters for the online service.
1. Basic service information

Parameter description:
Parameter | Description |
Service Name | Name of the service. You can enter it based on the rules prompted on the interface. |
Service Version | Version number automatically generated by the system. |
Service Description | Configure a description for the service as needed. |
Region | Services under the same account are isolated by region. The value of the Region field is automatically entered based on the region you selected on the service list page. |
Deployment Mode | Multiple deployment modes are supported: Standard deployment: 1 instance runs under a single replica, suitable for most standard scenarios. Multi-machine distributed deployment: Multiple instances run in coordination under a single replica, suitable for scenarios where models require multi-machine parallelism. Multi-role deployment (prefill and decode separation): Currently only available through allowlist; suitable for scenarios where the service needs to be deployed with multiple roles (such as prefill and decode roles). There are multiple roles under a single replica, and each role has 1 or more instances running in coordination. Note: After a service is created, the deployment mode cannot be modified by updating the service or adding versions. Choose carefully during creation. |
Source of Machine | You can choose Select from CVM instances or Purchased on TI-ONE: In the Select from CVM instances mode, you can deploy services on resource groups of CVM instances purchased in the Resource Group Management module. The computing resource fee was paid when the resource group was purchased, so no fees are deducted when the service is started. In the Purchased on TI-ONE mode, you do not need to purchase a resource group in advance. Fees are charged based on the CVM instance specifications required by the service. When the service is started, fees for the first two hours are frozen; after that, fees are charged hourly based on the number of running instances. |
Resource Group | If you choose the Select from CVM instances mode, you can select resource groups from the Resource Group Management module. |
Service Replica Configuration

Parameter description:
Parameter | Description |
Model Source/Model | Multiple model sources are supported (the supported model sources may not be entirely consistent across different deployment modes): Built-in LLM: industry-leading open-source LLMs provided in the platform's Model Hub. Cloud Storage (CFS/GooseFSx/COS) CFS: The model files required for service deployment are stored in Cloud File Storage (CFS). Select the CFS instance where the model is located. When entering the path, specify it down to the level of the model's directory (for example, if the model is a fine-tuned checkpoint500, enter /a/b/checkpoint500 as the path). CFS Turbo is only supported when the CVM instance source is Select from CVM instances. GooseFSx: The model files required for service deployment are stored in GooseFSx. Select the GooseFSx instance where the model is located, and enter the directory of the model in the path field. GooseFSx is only supported when the CVM instance source is Select from CVM instances. COS: The model files required for service deployment are stored in Cloud Object Storage (COS). Select the bucket instance where the model is located and enter the directory. COS is only supported when the CVM instance source is Select from CVM instances. Container image: The custom image required for service deployment has encapsulated the model files, eliminating the need for model file mounting, and the custom image has been uploaded to Tencent Container Registry (TCR). Resource Group Preloading: Only applicable to the scenario of Select from CVM instances. You can preload model files/image files to the corresponding resource group in advance, improving the model loading speed during service startup. Data Source: Select the data source file that you have pre-added in the Platform Management - Data Source Management module. The Data Source Management module provides unified file permission management. Model repository: The model files required for service deployment have been imported into the Model Repository module. |
Image | Built-in LLM: The platform provides a corresponding built-in running image for each built-in LLM; you do not need to modify it. Cloud Storage/Resource Group Preloading/Data Source: All support 2 types of image selection. Built-in: The platform provides various open-source built-in images and Tencent's self-developed images for inference acceleration. Custom: You can choose TCR or enter a custom image address (along with the username and password for the private image repository, if applicable). Container image: You can directly select a custom image that has been uploaded to TCR, and the image contains the required model files. Model Repository: The running environment will automatically assign values based on the configuration information of the model repository. If you need to use a custom image to start the service, it is recommended that the image file size does not exceed 34 GB. |
Storage Mount | When creating an online service, you can configure file mount paths for input and output in addition to the model files. |
Enable gRPC | The switch is disabled by default. When the switch is disabled, only calls using HTTP are supported. When the switch is enabled, calls using the gRPC protocol are supported. |
Port | You can configure the ports exposed by the container. The valid port range is 1024–65535, excluding 8502–8510, 6006, and 9092. Note two special ports: 8500 is the default port for gRPC and 8501 is the default port for REST; do not confuse them. |
Resource | 1. In the yearly/monthly subscription (resource group) mode, you can set how many resources to request from the selected resource group to start the current service. 2. In the pay-as-you-go mode, you can choose the CVM instance specifications required for starting the current service as needed. |
Startup Command | Optional. You can configure the startup command of a container. |
Environment Variable | Optional. You can configure the environment variables of a container. |
Graceful Shutdown Period | Corresponds to Kubernetes' terminationGracePeriodSeconds. Pods whose shutdown time exceeds this limit will be forcibly terminated. The default value is 30s. |
PreStop | Optional. Corresponds to Kubernetes' PreStop command. Pods run this command before termination to achieve a graceful shutdown. The command format is a string array, for example: ["sleep", "70"]. |
Select a Sidecar TCR | Optional. You can customize the sidecar container image. |
Service Feature Configuration

Parameter description:
Parameter | Description |
Request Traffic Throttling | You can configure traffic throttling values for the service: When traffic throttling is not applied, the default maximum queries per second (QPS) per service is 500. If the configured traffic throttling value exceeds 500, it is capped at 500. After an upgrade package is purchased for the service, the total traffic throttling value of the service is subject to the upper limit of the purchased upgrade package. Maximum QPS per replica limits the queries per second handled by one instance; maximum concurrency per replica limits the concurrent requests handled by one instance. Note: These throttling values apply to a single replica. When the service scales, the overall throttling value of the service is the set value multiplied by the number of replicas. The maximum QPS for a single service group is 500. If the total traffic throttling value set for the services under the service group exceeds 500, it is capped at 500. |
Replica Adjustment | Manual: You can customize the number of service replicas, with a minimum value of 1. Auto: You can choose a timed policy, an HPA-based automatic policy, or a combined timed+HPA policy. For details, see Online Service Operation. |
Use RDMA | When the number of replicas or the number of instances per replica exceeds 1, and the selected GPU model supports RDMA, you can enable RDMA. After RDMA is enabled, services are preferentially scheduled to nodes supporting RDMA. If a single node lacks sufficient IP addresses for RDMA NICs, services may be scheduled to multiple nodes. |
Scheduling Policy | For the "Standard Deployment" mode, you can customize the node scheduling policy for multiple replicas: Prioritize Filling the Entire Machine: When multiple replicas are scheduled, prioritize packing them onto a single node to reduce the likelihood of GPU fragmentation. Prioritize Distributed Deployment: When multiple replicas are scheduled, prioritize distributing them across different nodes to enhance service high availability (note: successful scattered deployment requires sufficient resources). |
Generate authentication token | If authentication is enabled, signature authentication will be performed during service calling. For started services, you can view the signature key and signature calculation guide on the service calling page. After authentication is enabled, the first key for the service will be automatically generated. You can view and create additional keys under Service Authentication on the service details page. |
CLS Log Shipping | The platform provides free storage of service logs for the last 15 days. If you require persistent log storage, more flexible log search capabilities, and log monitoring alarm capabilities, you can enable CLS Log Shipping, and then service logs will be shipped to Cloud Log Service (CLS) according to the logset and log topic. |
Rolling Update Policy | You can configure the rolling update policy to ensure smooth service upgrades: MaxSurge: maximum number of extra replicas that can be created beyond the desired number during a rolling update. MaxUnavailable: upper limit of the number of unavailable replicas allowed during a rolling update. |
Health Check | The health check mechanism of Kubernetes automatically detects and recovers failed containers, ensuring traffic is routed only to healthy instances. Liveness probe: verifies whether the service process is alive and the container is running normally. Trigger phase: runs continuously after container startup (throughout the entire lifecycle). Readiness probe: verifies whether the service is ready to process requests and the container is prepared to receive traffic. Trigger phase: runs continuously after container startup (throughout the entire lifecycle). Startup probe: monitors slow-startup containers and verifies whether the application within the container has completed initialization. Trigger phase: runs only during container startup (stops after success). Three check methods are supported: HTTPGet, TCPSocket, and Exec. |
Auto stop | The platform supports the automatic stop of model services. After Automatic Stop is enabled, online services will stop automatically at the specified stop time, and the computing power billing for the service will also stop. |
Tag | You can add tags to services for authorization based on tags. |
After confirming that the configuration information of the service is correct, click Start Service to deploy the service. During deployment, a gateway will be created, and computing resources will be scheduled for you. This process takes some time. Once the service is successfully deployed, its status will change to Running.
Deployment Parameters Best Practices
This section provides a detailed explanation of some important parameters for service deployment and offers best practice recommendations for parameter configuration.
Health Check
1. What Are Health Checks
Health checks are a critical mechanism to ensure the stable operation of your online services. By periodically checking the health status of service instances (Pods), they ensure that traffic is only routed to healthy instances. Unhealthy instances are restarted or replaced, thereby enhancing the service's availability and reliability.
TI-ONE provides three types of health check probes based on Kubernetes:
Probe Types | Startup Probe | Readiness Probe | Liveness Probe |
Core role | Startup probes verify whether the application within the container has completed initialization. | Check whether the container is ready to receive traffic. | Check whether the container is running properly. |
Trigger phase | Runs only during the container startup phase (stops upon success). | Continuously runs after container startup (throughout the entire lifecycle). | Continuously runs after container startup (throughout the entire lifecycle). |
Failure Consequence | Terminate the container and restart it. | The instance is removed from the service CLB pool and traffic forwarding stops (the container is not restarted). | Terminate the container and restart it. |
Typical Scenario | Slow startup applications (such as Java service warm-up and data loading) | Dependencies initialization completed (such as database connections and configuration file loading) | Handle deadlocks, process crashes, or unrecoverable exceptions. |
Detection Frequency | Low frequency (usually with longer intervals, such as 10s) | Medium frequency (adjustable based on business requirements) | High frequency (quickly detect abnormalities, such as 5s) |
Priority | Highest (other probes are disabled while the startup probe runs) | Medium | Low |
2. The Role of Health Checks
Improve service availability
Through readiness probes, ensure that only prepared instances receive traffic, preventing requests from being sent to instances that are not ready yet and thereby reducing request failures.
Automatically recover from failures
Through liveness probes, when a failure in a running instance is detected (such as application deadlock but process still running), the system automatically restarts the instance to rapidly restore service.
Graceful handling of startup and termination
For applications with slow startup, readiness probes prevent traffic reception during initialization; for applications requiring graceful shutdown, readiness probes coordinate with lifecycle hooks to ensure removal from load balancers before termination.
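As an illustration of readiness gating, here is a minimal standard-library sketch of a service whose health endpoint returns 503 until initialization completes (the /health path and port are conventions chosen for this example, not platform requirements):

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

ready = threading.Event()  # set once initialization (model load, etc.) is done

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # Readiness probes treat 200-399 as success, so return
            # 503 until startup work has finished.
            self.send_response(200 if ready.is_set() else 503)
        else:
            self.send_response(404)
        self.end_headers()

    def log_message(self, *args):  # keep probe traffic out of the service log
        pass

def serve(port=8080):
    """Start the health endpoint in a background thread."""
    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Until `ready.set()` is called, the readiness probe fails and the instance stays out of the CLB pool; once initialization finishes, traffic starts flowing without any restart.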
3. How to Configure Health Checks
On the TI-ONE platform, you can configure health checks in the parameter form when creating or updating a service. You will need to select the appropriate detection method based on your application characteristics.

3.1 Detection Method
TI-ONE supports the following three detection methods:
Detection Method | Method Description | Scenarios | Configuration Example |
HTTP GET Probe | Send an HTTP GET request to the specified path and port within the container. If the returned status code is between 200 and 399, it is considered successful. | Web services, API services | Path: /health, Port: 8080 |
TCP Socket Probe | Attempt to establish a TCP connection with the container's specified port. If the connection is successfully established, it is considered successful. | Databases, caches, non-HTTP services | Port: 6379 (Redis) |
Exec Command Execution Probe | Execute the specified command within the container. If the command's exit code is 0, it is considered successful. | Services requiring complex check logic | Command: python check_health.py |
3.2 Configuration Parameter Description
The supported configuration parameters for the three types of probes are described as follows:
Parameter | Parameter Description | Recommended Values for Parameters |
Delay for waiting service startup | How long to wait after the container starts before initiating the first probe. Set this value based on the application startup time so the probe does not fail simply because the application has not finished starting. | For example: 30-60 seconds, to give the application sufficient startup time and avoid spurious failures. |
Polling Probe Interval | The interval between each probe. Too high a frequency imposes significant overhead on Pods, while too low a frequency fails to promptly reflect container failures. | For example: 5-15 seconds, to balance real-time requirements and system overhead. |
Check Timeout Duration | Timeout for a single probe; if it is exceeded, the probe is considered failed. | For example: 3-5 seconds, to prevent spurious failures caused by network jitter. |
Failure Threshold | Number of consecutive failures before the instance is marked as unhealthy. | For example: 3, so the instance is marked unhealthy only after 3 consecutive failures, avoiding false alarms from transient glitches. |
Success Threshold | Number of consecutive successes before the instance is marked as healthy. | For example: 1, so an instance that recovers from a transient failure returns to service promptly. |
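These parameters together determine how quickly a dead instance is detected. A back-of-the-envelope sketch (our own arithmetic, not a platform formula):

```python
def worst_case_detection_seconds(interval, timeout, failure_threshold):
    """Rough upper bound on time to mark an instance unhealthy:
    after the last successful probe, up to `failure_threshold`
    probes spaced `interval` seconds apart must fail, and the
    final failing probe may take up to `timeout` seconds."""
    return failure_threshold * interval + timeout

# With the suggested values (10s interval, 5s timeout, 3 failures),
# a dead instance is detected within roughly 35 seconds.
print(worst_case_detection_seconds(10, 5, 3))  # prints 35
```

Shortening the interval or the failure threshold detects failures faster at the cost of more probe traffic and a higher risk of false alarms.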
3.3 Configuration Example
Suppose you have a Web service listening on port 8080 with a health check endpoint at /health. This service takes approximately 30 seconds to start.
Liveness Probe Configuration (Recommended):
Check Method: HTTP GET
Invocation Path: /health
Port: 8080
Delay for waiting service startup: 40 seconds (ensuring the application has started)
Polling Probe Interval: 10 seconds
Check Timeout Duration: 5 seconds
Failure Threshold: 3
Success Threshold: 1
Readiness Probe Configuration (Recommended):
Check Method: HTTP GET
Invocation Path: /health
Port: 8080
Delay for waiting service startup: 40 seconds
Polling Interval: 5 seconds (Readiness checks can be more frequent to allow quick addition to CLB)
Check Timeout Duration: 5 seconds
Failure Threshold: 1 (Marked as not ready after a single failure)
Success Threshold: 1
Note:
If your service does not have a dedicated health check interface, you can use the root path / or perform a TCP port check. For non-Web services, you may choose TCP checks or command execution checks.
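For reference, the success rule of the HTTP GET check above (a status code from 200 through 399 passes; anything else, a connection error, or a timeout fails) can be sketched as a small client. The function name is ours, not a platform API:

```python
import urllib.request
import urllib.error

def http_get_probe(url, timeout=5):
    """Mimic an HTTP GET health check: request the health path with
    the check timeout; status codes 200-399 count as success."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status <= 399
    except urllib.error.HTTPError as exc:
        # urlopen raises for 4xx/5xx; apply the same 200-399 rule.
        return 200 <= exc.code <= 399
    except (urllib.error.URLError, TimeoutError):
        return False  # connection refused, DNS failure, or timeout
```

Running this against your own endpoint before deploying (e.g. `http_get_probe("http://127.0.0.1:8080/health")`) is a quick way to confirm the probe configuration will pass.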
4. How to View Health Check Status
On the TI-ONE platform, you can view the status and events of health checks by:
4.1 Service Instance List
Go to the online service details page and open the Instance List tab. Each instance displays its current status (Running, Ready, and so on). If an instance restarts due to health check failures, you can see the number of container restarts and the status changes.
4.2 Event Log
Go to the online service details page and open the Events tab. The platform records events such as health check failures and container restarts. Here you can see specific failure causes, such as "Liveness probe failed" or "Readiness probe failed".
