Extracting Object Content

Last updated: 2023-09-13 11:05:25

Feature Overview

This document provides an overview of APIs and SDK code samples for object content extraction.
API
Operation
Description
Extracting Object Content
Extracts the content of a specified object

Extracting Object Content

Note

This API is used to extract content from a specific object.

Method prototype

select_object_content(Bucket, Key, Expression, ExpressionType, InputSerialization, OutputSerialization, RequestProgress=None, **kwargs)

Sample Request

# -*- coding=utf-8
from qcloud_cos import CosConfig
from qcloud_cos import CosS3Client
import sys
import os
import logging

# Under normal circumstances, use the INFO log level. To locate issues, change it to DEBUG, and the SDK will print communication information with the server.
logging.basicConfig(level=logging.INFO, stream=sys.stdout)

# 1. Set user attributes, including secret_id, secret_key, and region. Appid has been removed from CosConfig; include it in the Bucket parameter instead, in the format BucketName-Appid.
secret_id = os.environ['COS_SECRET_ID']  # User SecretId. We recommend that you use a sub-account key and follow the principle of least privilege to reduce risks. For more information on how to obtain a sub-account key, visit https://cloud.tencent.com/document/product/598/37140.
secret_key = os.environ['COS_SECRET_KEY']  # User SecretKey. The same sub-account key recommendation applies; see https://cloud.tencent.com/document/product/598/37140.
region = 'ap-beijing'  # Replace it with the actual region, which can be viewed in the console at https://console.cloud.tencent.com/cos5/bucket.
# For a list of all regions supported by COS, visit https://cloud.tencent.com/document/product/436/6224.
token = None  # A token is required for temporary keys but not for permanent keys. For more information about how to generate and use a temporary key, see https://cloud.tencent.com/document/product/436/14048.
scheme = 'https'  # Protocol used to access COS, either 'http' or 'https'. Optional; defaults to 'https'.

config = CosConfig(Region=region, SecretId=secret_id, SecretKey=secret_key, Token=token, Scheme=scheme)
client = CosS3Client(config)

response = client.select_object_content(
    Bucket='examplebucket-1250000000',
    Key='exampleobject',
    Expression='Select * from COSObject',
    ExpressionType='SQL',
    InputSerialization={
        'CompressionType': 'NONE',
        'JSON': {
            'Type': 'LINES'
        }
    },
    OutputSerialization={
        'CSV': {
            'RecordDelimiter': '\n'
        }
    }
)

# Obtain the EventStream instance encapsulated in the response result
event_stream = response['Payload']

# Retrieve all extraction results at once
# Note: EventStream reads the results as a stream, so calling get_select_result() a second time returns an empty result.
result = event_stream.get_select_result()
print(result)

Sample request with all parameters

response = client.select_object_content(
    Bucket='examplebucket-1250000000',
    Key='exampleobject',
    Expression='Select * from COSObject',
    ExpressionType='SQL',
    InputSerialization={
        'CompressionType': 'GZIP',
        'JSON': {
            'Type': 'LINES'
        }
    },
    OutputSerialization={
        'CSV': {
            'RecordDelimiter': '\n'
        }
    },
    RequestProgress={
        'Enabled': 'FALSE'
    }
)
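
The request above declares the object as GZIP-compressed JSON with Type LINES, that is, one JSON document per line. As a minimal sketch of what such an object body could look like, the following builds one in memory using only the standard library. The helper name `to_gzip_json_lines` and the sample records are hypothetical and not part of the SDK; uploading the bytes to the bucket is left out.

```python
import gzip
import json


def to_gzip_json_lines(records):
    """Serialize dicts as JSON Lines (one JSON document per line), then gzip."""
    lines = "".join(json.dumps(record) + "\n" for record in records)
    return gzip.compress(lines.encode("utf-8"))


# Hypothetical sample data, used only for illustration.
sample_records = [
    {"name": "alice", "age": 30},
    {"name": "bob", "age": 25},
]
body = to_gzip_json_lines(sample_records)

# Round-trip check: decompressing gives back one JSON document per line.
decoded = gzip.decompress(body).decode("utf-8").splitlines()
print(len(decoded), json.loads(decoded[0])["name"])
```

An object stored this way would match 'CompressionType': 'GZIP' together with 'JSON': {'Type': 'LINES'} in InputSerialization.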

Parameter description

Parameter name | Description | Type | Required
--- | --- | --- | ---
Bucket | Bucket name in the format of BucketName-APPID | String | Required
Key | Object key, the unique identifier of the object in the bucket. For example, in the object's access domain name examplebucket-1250000000.cos.ap-guangzhou.myqcloud.com/doc/pic.jpg, the object key is doc/pic.jpg | String | Required
Expression | SQL statement that specifies the extraction to perform | String | Required
ExpressionType | Expression type. This parameter exists for future extension; currently, only SQL is supported | String | Required
InputSerialization | Format of the object to extract. For more information, see Sample request | Dict | Required
OutputSerialization | Output format of the extraction results. For more information, see Sample request | Dict | Required
RequestProgress | Specifies whether to return the query progress information (QueryProgress). If this parameter is used, COS Select will periodically return the query progress | Dict | Not required
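
To show how the parameters fit together, here is a minimal sketch of a helper that assembles the keyword arguments for select_object_content. The function `build_select_request` is hypothetical and not part of the SDK; it only mirrors the parameter names and the JSON-in, CSV-out serialization used in the sample requests, and it does not contact COS.

```python
def build_select_request(bucket, key, expression, compression="NONE",
                         record_delimiter="\n", request_progress=None):
    """Assemble keyword arguments for select_object_content (hypothetical helper)."""
    # Bucket, Key, and Expression are required strings.
    for name, value in (("Bucket", bucket), ("Key", key), ("Expression", expression)):
        if not isinstance(value, str) or not value:
            raise ValueError("%s must be a non-empty string" % name)

    request = {
        "Bucket": bucket,
        "Key": key,
        "Expression": expression,
        "ExpressionType": "SQL",  # the only expression type currently supported
        "InputSerialization": {
            "CompressionType": compression,
            "JSON": {"Type": "LINES"},
        },
        "OutputSerialization": {
            "CSV": {"RecordDelimiter": record_delimiter},
        },
    }
    # RequestProgress is optional, so it is added only when supplied.
    if request_progress is not None:
        request["RequestProgress"] = request_progress
    return request


request = build_select_request(
    "examplebucket-1250000000", "exampleobject", "Select * from COSObject",
    compression="GZIP", request_progress={"Enabled": "FALSE"},
)
print(sorted(request))
# With a configured client: response = client.select_object_content(**request)
```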

Response description

The extraction result is in dict format.
{
'Payload': EventStream()
}
In the response, there is only one key-value pair where the key is 'Payload' and the value is the EventStream instance. The extraction result of the object is encapsulated in the EventStream instance. You can call the next_event(), get_select_result(), and get_select_result_to_file() methods to get the extraction result.
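
Because the stream can be consumed only once, it is usually simplest to call get_select_result() a single time and keep the returned value. The sketch below shows one possible way to turn a CSV-formatted result into rows. The helper `parse_csv_result` is hypothetical, and `sample_result` is stand-in data rather than real COS output; the sketch assumes the result arrives as UTF-8 bytes or as text and accepts either.

```python
import csv
import io


def parse_csv_result(result):
    """Split a CSV extraction result into a list of rows (hypothetical helper)."""
    # Assumption: the result may be bytes or str; decode bytes as UTF-8.
    if isinstance(result, bytes):
        result = result.decode("utf-8")
    reader = csv.reader(io.StringIO(result, newline=""))
    # Skip empty rows, such as one produced by a trailing record delimiter.
    return [row for row in reader if row]


# Stand-in for: result = response['Payload'].get_select_result()
sample_result = b"alice,30\nbob,25\n"
rows = parse_csv_result(sample_result)
for name, age in rows:
    print(name, int(age))
```

Keeping the parsed rows in a variable avoids a second call to get_select_result(), which would find the stream already exhausted.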