Tencent Cloud TI-ONE Platform (TI-ONE) provides a labeling feature for Large Language Model (LLM) and Multimodal Large Language Model (MLLM) datasets. You can customize the schema information of a dataset to flexibly build a custom labeling workbench.
Creating a Labeling Task
Choose Data Center > Datasets, select the LLM dataset you created, and then choose Operation > Label. The backend automatically creates the corresponding labeling workbench based on the schema configuration of the dataset.
Must-Knows:
The Data Center only associates this dataset with your Cloud File Storage (CFS) path and does not copy or dump your original data files.
When you label this dataset on TI-ONE, labeling results will be written directly and in real time into the original files of your dataset. Therefore, if you do not want the original files to be modified, you need to back up the original files in advance.
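Because labeling results are written back into the original files, the backup step can be scripted before a labeling task is started. The following is a minimal sketch, assuming the CFS path is mounted on a machine you can access; the function name `backup_dataset` and the directory layout are hypothetical and not part of TI-ONE.

```python
import shutil
import time
from pathlib import Path


def backup_dataset(src_dir: str, backup_root: str) -> Path:
    """Copy the whole dataset directory into a timestamped backup folder."""
    src = Path(src_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"
    # copytree raises an error if dest already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(src, dest)
    return dest
```

Call it with the mounted dataset directory and a backup location of your choice, and keep the returned path so that the original files can be restored later if needed.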
LLM Labeling Workbenches
The flexible schema configuration of TI-ONE supports labeling scenarios including but not limited to: filtering of high-quality text Q&A pairs, text data cleansing, review or modification of image Q&A pairs, comparative evaluation of image Q&A models, image multi-turn Q&A, and multimodal reading comprehension for image text descriptions.
The following are three examples of labeling workbenches corresponding to different schema configurations:
1. Multi-image multi-turn Q&A
desc: Multi-image multi-turn Q&A
record_fields:
  - name: img # Annotation component name displayed in the annotation console.
    key: img # Name of the JSON field for exporting annotation results.
    type: ImageListInput # Component type
    help: "Please add the field description." # Component help description
    value:
      {{- range .Values.img }} # Use a loop to reference the image list.
      - {{ . }}
      {{- end }}
  - name: target # Annotation component name displayed in the annotation console.
    key: target # Name of the JSON field for exporting annotation results.
    type: List # Component type
    help: "Please add the field description." # Component help description
    value:
      {{- range .Values.target }} # Use a loop to expand the contents of the list.
      -
        - name: Question # Annotation component name displayed in the annotation console.
          key: question # Name of the JSON field for exporting annotation results.
          type: TextInput # Component type
          help: "Please add the field description." # Component help description.
          value: "{{ .question }}"
          size: MultiLine
        - name: Answer # Annotation component name displayed in the annotation console.
          key: answer # Name of the JSON field for exporting annotation results.
          type: TextInput # Component type.
          help: "Please add the field description." # Component help description.
          value: "{{ .answer }}"
          size: LongArticle
      {{- end }}
In this scenario, you can configure schemas to display multiple images and Q&A pairs, and you can add, delete, or reorder Q&A pairs. You can also configure different input box sizes based on the text length of the question and answer fields.
Detailed features of a labeling workbench:
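Before a labeling task is created with this schema, it can help to confirm that each record carries the fields the template references. The following is a minimal sketch, assuming that `.Values` maps to the top-level fields of one dataset record, so that a record holds an `img` list and a `target` list of question/answer turns. The function name `check_record` and the sample record are hypothetical.

```python
import json


def check_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    imgs = record.get("img")
    if not isinstance(imgs, list) or not all(isinstance(p, str) for p in imgs):
        problems.append("img must be a list of image paths")
    turns = record.get("target")
    if not isinstance(turns, list):
        problems.append("target must be a list of Q&A turns")
    else:
        for i, turn in enumerate(turns):
            for field in ("question", "answer"):
                if not isinstance(turn, dict) or field not in turn:
                    problems.append(f"target[{i}] is missing '{field}'")
    return problems


# Hypothetical record shaped like the fields used in the schema above.
sample = json.loads(
    '{"img": ["a.jpg", "b.jpg"],'
    ' "target": [{"question": "What is in image 1?", "answer": "A cat."}]}'
)
print(check_record(sample))  # prints []
```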
Click Zoom View above the image area on the left side to adjust the display size of an image.
Click Settings on the right side of the title bar to configure the font size and whether to render Markdown content in the text.
Click to switch between the Unlabeled and Labeled statuses. If there are modifications to the labeling content in the current sample, the backend will automatically change the status to Labeled. If there are no modifications in the current sample, you can also manually click Unlabeled to change the status to Labeled.
2. Multi-model evaluation for a single image
desc: Automatically generated YAML
record_fields:
  - name: Image # Annotation component name displayed in the annotation console.
    key: Images # Name of the JSON field for exporting annotation results. (The name can contain letters and underscores and cannot start with a digit.)
    type: ImageViewer # Component type.
    help: "Please add the field description." # Component help description
    value: "{{ .Values.Images }}"
  - name: Question # Annotation component name displayed in the annotation console.
    key: Query # Name of the JSON field for exporting annotation results. (The name can contain letters and underscores and cannot start with a digit.)
    type: TextInput # Component type
    help: "Please add the field description." # Component help description.
    value: "{{ .Values.Query }}"
    size: MultiLine
  - name: Reference answer # Annotation component name displayed in the annotation console.
    key: sn_vl_0_6_0_10b_8k_beta_0624 # Name of the JSON field for exporting annotation results. (The name can contain letters and underscores and cannot start with a digit.)
    type: TextInput # Component type
    help: "Please add the field description." # Component help description.
    value: "{{ .Values.gtmodel }}"
    size: LongArticle
  - name: Select the best model. # Annotation component name displayed in the annotation console.
    key: correct_model # JSON field key corresponding to the component when JSON annotation results are exported.
    type: StringSelector # Indicates that the component type is a string selection component.
    option: SingleSelector # Indicates that this component allows only single selection. Valid values of this field: SingleSelector/MultiSelector.
    help: Model evaluation # Component help description.
    choices: # Specify the option content.
      - Model 1
      - Model 2
      - Model 3
      - Discard all
  - name: Answer of model 1 # Annotation component name displayed in the annotation console.
    key: kzx2npurd5 # Name of the JSON field for exporting annotation results. (The name can contain letters and underscores and cannot start with a digit.)
    type: TextInput # Component type
    help: "Please add the field description." # Component help description.
    value: '{{ index .Values.model1 }}'
    size: MultiLine
  - name: Answer of model 2 # Annotation component name displayed in the annotation console.
    key: FT_qw15_sft_0626_v100_800 # Name of the JSON field for exporting annotation results. (The name can contain letters and underscores and cannot start with a digit.)
    type: TextInput # Component type.
    help: "Please add the field description." # Component help description.
    value: "{{ .Values.model2 }}"
    size: MultiLine
  - name: Answer of model 3 # Annotation component name displayed in the annotation console.
    key: V4_FT_qw15_sft_0726_temp_old_500 # Name of the JSON field for exporting annotation results. (The name can contain letters and underscores and cannot start with a digit.)
    type: TextInput # Component type.
    help: "Please add the field description" # Component help description.
    value: "{{ .Values.model3 }}"
    size: MultiLine
In this scenario, you can configure schemas to display a single test image and the inference results of different models, and configure the names of the models to be evaluated.
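Once samples are labeled with this schema, the selections stored under the `correct_model` key can be summarized offline. The following is a minimal sketch, assuming the labeled data is available as one JSON object per line and that a SingleSelector value is stored as a plain string; both points are assumptions about the export format, and the function name `tally_best_model` is hypothetical.

```python
import json
from collections import Counter


def tally_best_model(lines):
    """Count how often each choice was selected in the correct_model field."""
    votes = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        choice = record.get("correct_model")
        if choice:  # skip samples that have no selection yet
            votes[choice] += 1
    return votes
```

For example, passing an open file handle of the labeled file to `tally_best_model` returns a `Counter` whose `most_common()` method lists the choices from most to least selected.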
3. Filtering of high-quality text Q&A pairs
desc: Automatically generated YAML.
record_fields:
  - name: Question # Annotation component name displayed in the annotation console.
    key: question # Name of the JSON field for exporting annotation results. (The name can contain letters and underscores and cannot start with a digit.)
    type: TextViewer # Component type.
    help: "Please add the field description." # Component help description.
    value: "{{ .Values.question }}"
    size: MultiLine
  - name: Answer # Annotation component name displayed in the annotation console.
    key: answer # Name of the JSON field for exporting annotation results. (The name can contain letters and underscores and cannot start with a digit.)
    type: TextInput # Component type
    help: "Please add the field description." # Component help description
    value: "{{ .Values.answer }}"
    size: MultiLine
  - name: tag # Annotation component name displayed in the annotation console.
    key: tag # Name of the JSON field for exporting annotation results. (The name can contain letters and underscores and cannot start with a digit.)
    type: TextInput # Component type.
    help: "Please add the field description." # Component help description.
    value: "{{ .Values.tag }}"
    size: SingleLine
  - name: Correct or not # Annotation component name displayed in the annotation console.
    key: correct # JSON field key corresponding to the component when JSON annotation results are exported.
    type: StringSelector # Indicates that the component type is a string selection component.
    help: Please determine whether the answer is correct. # Component help description.
    option: SingleSelector # Indicates that this component allows only single selection. Valid values of this field: SingleSelector/MultiSelector.
    choices: # Content of the specified option.
      - Correct
      - Discard
      - Questionable
In this scenario, you can configure a schema that displays the Question field as read-only to prevent accidental edits during labeling. You can also make the Answer field editable and set the custom filtering enumeration values to Correct, Discard, and Questionable.
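After labeling, the `correct` field can be used to keep only the high-quality pairs. The following is a minimal sketch, assuming the labeled data is stored as one JSON object per line and that the selected choice is written as a plain string under `correct`; these are assumptions about the file format, and the function name `keep_correct` is hypothetical.

```python
import json


def keep_correct(lines):
    """Yield question/answer pairs whose 'correct' field equals 'Correct'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        # Records marked Discard or Questionable, or not labeled yet, are dropped.
        if record.get("correct") == "Correct":
            yield {"question": record["question"], "answer": record["answer"]}
```

Because the function is a generator, it can stream a large file line by line without loading every record into memory.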