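I am trying to use the Google Cloud Translation API to translate Excel (or CSV) documents that contain text in several languages. My target language is English.
I would like to use the "Translate text in batches (Advanced edition only)" code sample (link: https://cloud.google.com/translate/docs/samples/translate-v3-batch-translate-text), but that sample has a line that defines the source language, so only one source language is possible.
However, I need to detect the languages in the document first and then translate the text into English. There is a code sample for detecting the language of a simple text string, "Detecting languages (Advanced)" (link: https://cloud.google.com/translate/docs/advanced/detecting-language-v3), but I need to combine the first sample, which translates a document but defines a single source language, with language detection instead of a fixed source language.
Is there a code sample of this kind in the reference documentation? How can I solve this?
Here is the sample code in question: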
from google.cloud import translate


def batch_translate_text(
    input_uri="gs://YOUR_BUCKET_ID/path/to/your/file.txt",
    output_uri="gs://YOUR_BUCKET_ID/path/to/save/results/",
    project_id="YOUR_PROJECT_ID",
    timeout=180,
):
    """Translates a batch of texts on GCS and stores the result in a GCS location."""
    client = translate.TranslationServiceClient()
    location = "us-central1"
    # Supported file types: https://cloud.google.com/translate/docs/supported-formats
    gcs_source = {"input_uri": input_uri}
    input_configs_element = {
        "gcs_source": gcs_source,
        "mime_type": "text/plain",  # Can be "text/plain" or "text/html".
    }
    gcs_destination = {"output_uri_prefix": output_uri}
    output_config = {"gcs_destination": gcs_destination}
    parent = f"projects/{project_id}/locations/{location}"
    # Supported language codes: https://cloud.google.com/translate/docs/language
    operation = client.batch_translate_text(
        request={
            "parent": parent,
            "source_language_code": "en",
            "target_language_codes": ["ja"],  # Up to 10 language codes here.
            "input_configs": [input_configs_element],
            "output_config": output_config,
        }
    )
    print("Waiting for operation to complete...")
    response = operation.result(timeout)
    print("Total Characters: {}".format(response.total_characters))
    print("Translated Characters: {}".format(response.translated_characters))
Posted on 2021-07-28 11:28:03
Unfortunately, you cannot pass an array of values to the source_language_code field of batchTranslateText. I suggest running detectLanguage and translateText on each file instead.
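What the code below does:
1. Reads each CSV file. In this example sample1.csv is in tl (Tagalog) and sample2.csv is in es (Spanish).
2. Calls detect_language() to get the detected language code.
3. Calls translate_text() to translate the text.
Note: the code below was only tested with CSV files that have a single column. Edit the code in main() to populate the content from whichever column you want to extract.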
from google.cloud import translate
import csv


def listToString(s):
    """Transform list to string."""
    str1 = " "
    return str1.join(s)


def detect_language(project_id, content):
    """Detecting the language of a text string."""
    client = translate.TranslationServiceClient()
    location = "global"
    parent = f"projects/{project_id}/locations/{location}"
    response = client.detect_language(
        content=content,
        parent=parent,
        mime_type="text/plain",  # mime types: text/plain, text/html
    )
    # Return the code of the first language listed in the response.
    for language in response.languages:
        return language.language_code


def translate_text(text, project_id, source_lang):
    """Translating Text."""
    client = translate.TranslationServiceClient()
    location = "global"
    parent = f"projects/{project_id}/locations/{location}"
    # Detail on supported types can be found here:
    # https://cloud.google.com/translate/docs/supported-formats
    response = client.translate_text(
        request={
            "parent": parent,
            "contents": [text],
            "mime_type": "text/plain",  # mime types: text/plain, text/html
            "source_language_code": source_lang,
            "target_language_code": "en-US",
        }
    )
    # Display the translation for each input text provided
    for translation in response.translations:
        print("Translated text: {}".format(translation.translated_text))


def main():
    project_id = "your-project-id"
    csv_files = ["sample1.csv", "sample2.csv"]
    # Perform your content extraction here if you have a different file format #
    for csv_path in csv_files:
        content_csv = []
        # Use a context manager so each file is closed after it is read.
        with open(csv_path, newline="") as csv_file:
            for row in csv.reader(csv_file):
                content_csv.extend(row)
        content = listToString(content_csv)  # convert list to string
        detect = detect_language(project_id=project_id, content=content)
        translate_text(text=content, project_id=project_id, source_lang=detect)


if __name__ == "__main__":
    main()
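The note about single-column CSV files suggests adapting main() to the column you actually need. A minimal sketch of that extraction step follows, using only the standard csv module. The helper name read_column and its parameters are hypothetical and not part of the original answer, and the in-memory sample is made up for illustration.

```python
import csv
import io


def read_column(csv_file, column=0, skip_header=False):
    """Return the non-empty cells of one column from an open CSV file."""
    reader = csv.reader(csv_file)
    if skip_header:
        next(reader, None)  # drop the header row, if there is one
    cells = []
    for row in reader:
        # Skip short rows and blank cells instead of raising IndexError.
        if len(row) > column and row[column].strip():
            cells.append(row[column].strip())
    return cells


# Usage with an in-memory file; a real file would be opened with
# open(path, newline="") and passed in the same way.
sample = io.StringIO("id,text\n1,kamusta\n2,ayos\n")
print(" ".join(read_column(sample, column=1, skip_header=True)))
# prints: kamusta ayos
```

The returned list could replace content_csv in main(), so only the chosen column is joined and sent for detection and translation.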
sample1.csv:
kamusta
ayos
sample2.csv:
cómo estás
okey
Output of the script for the two sample files:
Translated text: how are you okay
Translated text: how are you ok
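If the only goal is to avoid hard-coding the source language, another option may be to leave source_language_code out of the translateText request. As far as I know, the v3 API then detects the language itself and reports it in the detected_language_code field of each translation, but please verify this against the current API reference. The sketch below relies on that assumption. translate_autodetect and its parameters are hypothetical names, and the client is passed in as an argument so the function can be exercised without credentials.

```python
def translate_autodetect(client, project_id, texts, target="en", location="global"):
    """Translate each text on its own, letting the service detect its language.

    `client` is assumed to behave like translate.TranslationServiceClient().
    Returns a list of (detected_language_code, translated_text) tuples.
    """
    parent = f"projects/{project_id}/locations/{location}"
    results = []
    for text in texts:
        response = client.translate_text(
            request={
                "parent": parent,
                "contents": [text],
                "mime_type": "text/plain",
                # "source_language_code" is deliberately omitted (assumption:
                # the service then detects the source language per request).
                "target_language_code": target,
            }
        )
        for translation in response.translations:
            results.append(
                (translation.detected_language_code, translation.translated_text)
            )
    return results


# Possible usage (needs google-cloud-translate and valid credentials):
#   from google.cloud import translate
#   client = translate.TranslationServiceClient()
#   for lang, text in translate_autodetect(client, "your-project-id",
#                                          ["kamusta", "cómo estás"]):
#       print(lang, text)
```

Sending one text per request keeps rows in different languages separate, at the cost of more API calls. This applies to translateText only; the batch method in the question still expects a single source language.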
https://stackoverflow.com/questions/68548387