Uploading and Downloading Data

Last updated: 2026-03-10 10:16:57

1. Directly Called Functions

1.1. API List

download_from_hdfs(hdfs_url, hdfs_path, local_path)
Download files from HDFS to a local machine.
:param hdfs_url: WebHDFS address, for example, http://10.0.3.16:4008.
:type hdfs_url: str
:param hdfs_path: path on HDFS.
:type hdfs_path: str
:param local_path: local path.
:type local_path: str
:return: local result path.
:rtype: str

upload_to_hdfs(local_path, hdfs_url, hdfs_path)
Upload a local directory to HDFS.
:param local_path: local path.
:type local_path: str
:param hdfs_url: WebHDFS address, for example, http://10.0.3.16:4008.
:type hdfs_url: str
:param hdfs_path: path on HDFS.
:type hdfs_path: str
:return: result path of HDFS.
:rtype: str

upload_to_hive_by_hdfs(local_path, hdfs_url, hive_server, table_name, database='default', auth='CUSTOM', username=None, password=None, overwrite=False, partition='')
Import the data in a local file into a Hive table with HDFS as the intermediate storage.
Process: Upload the local file to HDFS first, and then import the file from HDFS into the Hive table.
:param local_path: local file or folder. The folder cannot contain subfolders.
:type local_path: str
:param hdfs_url: WebHDFS URL, for example, http://10.0.3.16:4008.
:type hdfs_url: str
:param hive_server: HiveServer2 address.
:type hive_server: str
:param table_name: Hive table name.
:type table_name: str
:param database: database name.
:type database: str
:param auth: authentication method.
:type auth: str
:param username: username for database authentication.
:type username: str
:param password: password for database authentication.
:type password: str
:param overwrite: whether to overwrite (delete) the table's existing data before loading.
:type overwrite: bool
:param partition: partition selection.
:type partition: str
:return:
:rtype:

export_from_hive_by_hdfs(local_path, hdfs_url, hive_server, table_name='', sql='', database='default', auth='CUSTOM', username=None, password=None, row_format="row format delimited fields terminated by ','")
Export a Hive table to a local machine with HDFS as the intermediate storage. For large files, this method is more efficient than writing directly from the Hive table to the local machine.
Process: Export the Hive table to HDFS first, and then download it from HDFS to the local machine.
:param local_path: local directory.
:type local_path: str
:param hdfs_url: WebHDFS URL, for example, http://10.0.3.16:4008.
:type hdfs_url: str
:param hive_server: HiveServer2 address.
:type hive_server: str
:param table_name: Hive table name. This parameter can be ignored when SQL is specified.
:type table_name: str
:param sql: SQL statement for querying data, for example, select * from t1.
:type sql: str
:param database: database name.
:type database: str
:param auth: authentication method.
:type auth: str
:param username: username for database authentication.
:type username: str
:param password: password for database authentication.
:type password: str
:param row_format: row output format.
:type row_format: str
:return:
:rtype:

1.2. Usage

import tikit

tikit.upload_to_hdfs("dir1/file1", "http://10.0.3.16:4008", "/dir1/file1")
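
The Hive helpers follow the same pattern. A hedged sketch of a roundtrip through HDFS; the endpoints, credentials, and table names below are placeholders, and the import is guarded so the snippet stays illustrative when the SDK is absent:

```python
# Illustrative only: endpoints, credentials, and table names are placeholders.
try:
    import tikit
except ImportError:
    tikit = None  # SDK not installed in this environment

HDFS_URL = "http://10.0.3.16:4008"  # WebHDFS address (placeholder)
HIVE_SERVER = "10.0.3.16:7001"      # HiveServer2 address (placeholder)

if tikit is not None:
    # Load a local CSV into Hive, staging through HDFS.
    tikit.upload_to_hive_by_hdfs(
        "dir1/data.csv", HDFS_URL, HIVE_SERVER, "t1",
        database="default", username="hadoop", password="your_password",
        overwrite=False)
    # Export a query result back to a local directory.
    tikit.export_from_hive_by_hdfs(
        "out_dir", HDFS_URL, HIVE_SERVER, sql="select * from t1",
        database="default", username="hadoop", password="your_password")
```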

2. Methods Called Through the Client

describe_cos_buckets(self)
List all buckets.
:return: bucket list.
:rtype: dict
The returned results are as follows:
{
    "Owner": {
        "ID": "qcs::cam::uin/100011011262:uin/100011011262",
        "DisplayName": "100011011162"
    },
    "Buckets": {
        "Bucket": [
            {
                "Name": "bucket-58565",
                "Location": "ap-beijing-fsi",
                "CreationDate": "2021-07-21T11:06:00Z",
                "BucketType": "cos"
            },
            {
                "Name": "tai-1300158565",
                "Location": "ap-guangzhou",
                "CreationDate": "2021-10-22T11:04:40Z",
                "BucketType": "cos"
            }
        ]
    }
}
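
Since the result is a plain dict, fields can be pulled out with ordinary traversal. A small sketch using the sample payload above (`bucket_names` is a local helper, not part of the SDK):

```python
# Sample payload in the shape returned by describe_cos_buckets().
result = {
    "Owner": {"ID": "qcs::cam::uin/100011011262:uin/100011011262",
              "DisplayName": "100011011162"},
    "Buckets": {"Bucket": [
        {"Name": "bucket-58565", "Location": "ap-beijing-fsi",
         "CreationDate": "2021-07-21T11:06:00Z", "BucketType": "cos"},
        {"Name": "tai-1300158565", "Location": "ap-guangzhou",
         "CreationDate": "2021-10-22T11:04:40Z", "BucketType": "cos"},
    ]},
}

def bucket_names(result):
    """Extract the bucket names from a describe_cos_buckets() result."""
    return [b["Name"] for b in result["Buckets"]["Bucket"]]

print(bucket_names(result))  # ['bucket-58565', 'tai-1300158565']
```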

describe_cos_path(self, bucket, path, maker='', max_keys=1000, encoding_type='')
Obtain information on a Cloud Object Storage (COS) directory. At most 1,000 entries are returned per call. To list the contents of a directory, add a slash (/) at the end of its path.
:param bucket: COS bucket.
:type bucket: str
:param path: path.
:type path: str
:param maker: list entries starting from this marker (the parameter is spelled "maker" in the signature).
:type maker: str
:param max_keys: maximum number of entries returned in one call. The maximum value is 1000.
:type max_keys: int
:param encoding_type: encoding method for the returned results. The only supported value is url.
:type encoding_type: str
:return: directory information.
:rtype: dict
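
Because a single call returns at most 1,000 entries, larger directories must be paged with the marker parameter. A sketch of the loop, here against a stubbed client standing in for `client.describe_cos_path`; the response fields `Contents`, `IsTruncated`, and `NextMarker` follow the usual COS list-objects convention and are an assumption here:

```python
def list_all(describe, bucket, path, page_size=1000):
    """Collect every entry under a COS path by paging with `maker`."""
    entries, marker = [], ""
    while True:
        resp = describe(bucket, path, maker=marker, max_keys=page_size)
        entries.extend(resp["Contents"])
        if resp.get("IsTruncated") != "true":
            return entries
        marker = resp["NextMarker"]

# Stub standing in for client.describe_cos_path, serving 2 items per page.
ITEMS = [{"Key": "dir/f%d" % i} for i in range(5)]

def fake_describe(bucket, path, maker="", max_keys=1000):
    start = int(maker or 0)
    page = ITEMS[start:start + 2]
    truncated = start + 2 < len(ITEMS)
    return {"Contents": page,
            "IsTruncated": "true" if truncated else "false",
            "NextMarker": str(start + 2)}

assert len(list_all(fake_describe, "bucket-58565", "dir/")) == 5
```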

upload_to_cos(self, local_path, bucket, cos_path)
Upload files or directories from a local path to COS.
:param local_path: local path.
:type local_path: str
:param bucket: COS bucket.
:type bucket: str
:param cos_path: COS path.
:type cos_path: str
:return: None. Errors are reported by raising exceptions.
:rtype:

download_from_cos(self, bucket, cos_path, local_path)
Download files or directories from COS to a local machine.
Note: If local files exist, they will be directly overwritten. When cos_path is a directory and local_path is an existing directory, the folder name of cos_path will be retained as a subdirectory.
:param bucket: COS bucket.
:type bucket: str
:param cos_path: COS path.
:type cos_path: str
:param local_path: local path.
:type local_path: str
:return: None. Errors are reported by raising exceptions.
:rtype:
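
The note above means the local destination of a directory download depends on whether `local_path` already exists. A small helper (an illustration of the rule, not part of the SDK) that predicts where the files land:

```python
import os

def predicted_destination(cos_path, local_path, local_exists):
    """Where a COS *directory* download lands, per the rule above:
    if local_path already exists, the last component of cos_path
    is kept as a subdirectory under it."""
    if local_exists:
        return os.path.join(local_path, os.path.basename(cos_path.rstrip("/")))
    return local_path

print(predicted_destination("data/models/", "/tmp/out", local_exists=True))
# /tmp/out/models
print(predicted_destination("data/models/", "/tmp/out", local_exists=False))
# /tmp/out
```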


delete_cos_path(self, bucket, delete_path)
Delete a COS path. The trailing slash decides how the path is treated: a path without a trailing slash (/) is deleted as a file, and a path with a trailing slash (/) is deleted as a folder together with its contents.
:param bucket: COS bucket.
:type bucket: str
:param delete_path: path to be deleted.
:type delete_path: str
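
Since a trailing slash decides whether the path is removed as a file or as a folder, it is worth normalizing the path explicitly before calling. Two small helpers (local illustrations, not part of the SDK) that make the intent explicit:

```python
def as_folder(path):
    """Ensure a trailing slash so delete_cos_path treats `path` as a folder."""
    return path if path.endswith("/") else path + "/"

def as_file(path):
    """Strip any trailing slash so delete_cos_path treats `path` as a file."""
    return path.rstrip("/")

print(as_folder("models/v1"))  # models/v1/
print(as_file("models/v1/"))   # models/v1
# e.g. client.delete_cos_path("bucket-58565", as_folder("models/v1"))
```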

3. Methods Called Through HiveInitial

3.1. Initialization

from tikit.client import Client
from tikit.hive import HiveInitial

client = Client("your_secret_id", "your_secret_key", "<region>")

hive_init = HiveInitial(client)
hive_init.hive_initial("emr-xsjhbhf", "hadoop", "./emr.keytab")

3.2. APIs

def spark_hive_initial_wedata(self, wedata_id, source_account=None):
"""Initialize WeData Hive for Spark. (After calling this method, you can use Spark to perform Hive operations.)

:param wedata_id: WeData data source ID.
:type wedata_id: int
:param source_account: If Hive is a system source, the account UIN needs to be specified.
:type source_account: str
:rtype:
"""
def hive_initial_wedata(self, wedata_id, source_account=None):
"""Obtain the WeData Hive connection handle.

:param wedata_id: WeData data source ID.
:type wedata_id: int
:param source_account: If Hive is a system source, the account UIN needs to be specified.
:type source_account: str
:rtype:
"""

def spark_hive_initial(self, emr_id, username=None, keytab=None):
"""Initialize EMR Hive for Spark. (After calling this method, you can use Spark to perform Hive operations.)

:param emr_id: Tencent Cloud EMR ID.
:type emr_id: str
:param username: If Kerberos authentication is used, the corresponding username needs to be specified.
:type username: str
:param keytab: keytab file path. If the default account (for example, hadoop) of a cluster is used, the keytab path needs to be provided.
:type keytab: str
:rtype:
"""

def hive_initial(self, emr_id, username=None, keytab=None):
"""Obtain the EMR Hive connection handle.

:param emr_id: Tencent Cloud EMR ID.
:type emr_id: str
:param username: If Kerberos authentication is used, the corresponding username needs to be specified.
:type username: str
:param keytab: keytab file path. If the default account (for example, hadoop) of a cluster is used, the keytab path needs to be provided.
:type keytab: str
:rtype:
"""

def hive_initial_custom(
self,
host=None,
port=None,
scheme=None,
username=None,
database='default',
auth=None,
configuration=None,
kerberos_service_name=None,
password=None,
check_hostname=None,
ssl_cert=None,
thrift_transport=None):
"""Connect to HiveServer2

:param host: What host HiveServer2 runs on
:param port: What port HiveServer2 runs on. Defaults to 10000.
:param auth: The value of hive.server2.authentication used by HiveServer2.
Defaults to ``NONE``.
:param configuration: A dictionary of Hive settings (functionally same as the `set` command)
:param kerberos_service_name: Use with auth='KERBEROS' only
:param password: Use with auth='LDAP' or auth='CUSTOM' only
:param thrift_transport: A ``TTransportBase`` for custom advanced usage.
Incompatible with host, port, auth, kerberos_service_name, and password.

The LDAP and GSSAPI support originates from cloudera/impyla:
https://github.com/cloudera/impyla/blob/255b07ed973d47a3395214ed92d35ec0615ebf62
/impala/_thrift_api.py#L152-L160
"""

def upload_to_wedata_hive(self, wedata_id, local_path, table_name, database='default', overwrite=False,
partition='', source_account=None):
"""Upload files to WeData Hive.

:param wedata_id: WeData data source ID.
:type wedata_id: int
:param local_path: local file path.
:type local_path: str
:param table_name: table name.
:type table_name: str
:param database: database.
:type database: str
:param overwrite: whether to overwrite (delete) the table's existing data before loading.
:type overwrite: bool
:param partition: partition selection.
:type partition: str
:param source_account: If Hive is a system source, the account UIN needs to be specified.
:type source_account: str
:rtype:
"""

def export_from_wedata_hive(self, wedata_id, local_path, table_name='', database='default', sql='',
row_format="row format delimited fields terminated by ','", source_account=None):
"""Export WeData Hive data to a local machine.

:param wedata_id: WeData data source ID.
:type wedata_id: int
:param local_path: local file path.
:type local_path: str
:param table_name: table name.
:type table_name: str
:param database: database.
:type database: str
:param sql: SQL statement for querying data, for example, select * from t1.
:type sql: str
:param row_format: row output format.
:type row_format: str
:param source_account: If Hive is a system source, the account UIN needs to be specified.
:type source_account: str
:rtype:
"""