Google Translate API: detect language + translate documents (xlsx, csv)

Stack Overflow user
Asked on 2021-07-28 00:13:28
Answers: 1 · Views: 409 · Followers: 0 · Votes: 0

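I am trying to use the Google Cloud Translation API to translate Excel (or csv) documents that contain text in several languages. My target language is English.

I would like to use the "Translate text in batches (Advanced edition only)" code sample (link: https://cloud.google.com/translate/docs/samples/translate-v3-batch-translate-text), but the sample has a line that defines the source language, so only one source language is possible.

However, I need to detect the languages in the document first and then translate the text into English. There is a code sample for detecting the language of a simple text string under "Detecting languages (Advanced)" (link: https://cloud.google.com/translate/docs/advanced/detecting-language-v3), but I need to combine the first sample, which translates a document but accepts only one source language, with the ability to detect the language instead of defining a single source language.

Is there a code sample of this kind in the reference documentation? How can this be solved?

Here is the sample code in question: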

from google.cloud import translate


def batch_translate_text(
    input_uri="gs://YOUR_BUCKET_ID/path/to/your/file.txt",
    output_uri="gs://YOUR_BUCKET_ID/path/to/save/results/",
    project_id="YOUR_PROJECT_ID",
    timeout=180,
):
    """Translates a batch of texts on GCS and stores the result in a GCS location."""

    client = translate.TranslationServiceClient()

    location = "us-central1"
    # Supported file types: https://cloud.google.com/translate/docs/supported-formats
    gcs_source = {"input_uri": input_uri}

    input_configs_element = {
        "gcs_source": gcs_source,
        "mime_type": "text/plain",  # Can be "text/plain" or "text/html".
    }
    gcs_destination = {"output_uri_prefix": output_uri}
    output_config = {"gcs_destination": gcs_destination}
    parent = f"projects/{project_id}/locations/{location}"

    # Supported language codes: https://cloud.google.com/translate/docs/language
    operation = client.batch_translate_text(
        request={
            "parent": parent,
            "source_language_code": "en",
            "target_language_codes": ["ja"],  # Up to 10 language codes here.
            "input_configs": [input_configs_element],
            "output_config": output_config,
        }
    )

    print("Waiting for operation to complete...")
    response = operation.result(timeout)

    print("Total Characters: {}".format(response.total_characters))
    print("Translated Characters: {}".format(response.translated_characters))

1 Answer

Stack Overflow user

Answered on 2021-07-28 11:28:03

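Unfortunately, it is not possible to pass an array of values to the source_language_code field when using batchTranslateText. What I can suggest is to perform detectLanguage and then translateText for each file.

What the code below does:

  1. It extracts the content to be translated. For testing purposes, the csv files used have only 1 column; the content of sample1.csv is in tl (Tagalog) and that of sample2.csv is in es (Spanish).
  2. It passes the extracted content to detect_language() to get the detected language code.
  3. It passes all required parameters to translate_text() to perform the translation.

Note: the code below was tested only with csv files that have a single column. Edit the code in main() to pick out the column you want to extract data from.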

from google.cloud import translate
import csv


def listToString(s):
    """ Transform list to string"""
    str1 = " "
    return (str1.join(s))

def detect_language(project_id,content):
    """Detecting the language of a text string."""

    client = translate.TranslationServiceClient()
    location = "global"
    parent = f"projects/{project_id}/locations/{location}"

    response = client.detect_language(
        content=content,
        parent=parent,
        mime_type="text/plain",  # mime types: text/plain, text/html
    )

    for language in response.languages:
        return language.language_code


def translate_text(text, project_id,source_lang):
    """Translating Text."""

    client = translate.TranslationServiceClient()
    location = "global"
    parent = f"projects/{project_id}/locations/{location}"

    # Detail on supported types can be found here:
    # https://cloud.google.com/translate/docs/supported-formats
    response = client.translate_text(
        request={
            "parent": parent,
            "contents": [text],
            "mime_type": "text/plain",  # mime types: text/plain, text/html
            "source_language_code": source_lang,
            "target_language_code": "en-US",
        }
    )

    # Display the translation for each input text provided
    for translation in response.translations:
        print("Translated text: {}".format(translation.translated_text))
        
def main():

    project_id="your-project-id"
    csv_files = ["sample1.csv","sample2.csv"]
    # Perform your content extraction here if you have a different file format #
    for csv_path in csv_files:
        # Open each file as UTF-8 (an assumption about the input) and close it when done.
        with open(csv_path, newline="", encoding="utf-8") as csv_file:
            read_csv = csv.reader(csv_file)
            content_csv = []

            for row in read_csv:
                content_csv.extend(row)
        content = listToString(content_csv) # convert list to string
        detect = detect_language(project_id=project_id,content=content)
        translate_text(text=content,project_id=project_id,source_lang=detect)

if __name__ == "__main__":
    main()
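
For a csv file with more than one column, the extraction step in main() could be narrowed to a single column. This is only a minimal sketch using the standard library; the helper name read_column and the headers "id" and "text" are hypothetical and not part of the code above.

```python
import csv
import io


def read_column(csv_stream, column):
    """Return the non-empty values of one named column from a CSV stream."""
    reader = csv.DictReader(csv_stream)  # first row is treated as the header
    values = []
    for row in reader:
        # Missing columns and empty cells are both skipped.
        value = (row.get(column) or "").strip()
        if value:
            values.append(value)
    return values


# Example with an in-memory file; "id" and "text" are hypothetical headers.
sample = io.StringIO("id,text\n1,kamusta\n2,ayos\n3,\n")
cells = read_column(sample, "text")
print(cells)  # ['kamusta', 'ayos']
```

The returned list can be joined with listToString() as before, or kept as separate strings if each cell should be handled on its own.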

sample1.csv:

kamusta
ayos

sample2.csv:

cómo estás
okey

Output of the code above:

Translated text: how are you okay
Translated text: how are you ok
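
Each sample file above holds a single language, whereas the question describes a document that mixes languages. One possible variation, sketched here under assumptions, is to send each cell as its own string and leave out source_language_code, so that the v3 translate_text call detects the language of every string and reports it in the detected_language_code field of each translation. The helper name translate_cells and the batch size of 100 are hypothetical choices, and the client is passed in as a parameter so the function can be tried without credentials.

```python
def translate_cells(client, parent, cells, target="en", batch_size=100):
    """Translate each cell on its own, letting the API detect its language.

    `client` is expected to be a translate.TranslationServiceClient.
    Returns a list of (original, detected_language_code, translated_text).
    """
    results = []
    for start in range(0, len(cells), batch_size):
        batch = cells[start:start + batch_size]
        response = client.translate_text(
            request={
                "parent": parent,
                "contents": batch,
                "mime_type": "text/plain",
                # No "source_language_code": the service detects it per string.
                "target_language_code": target,
            }
        )
        # One translation is returned per input string, in the same order.
        for original, translation in zip(batch, response.translations):
            results.append(
                (original,
                 translation.detected_language_code,
                 translation.translated_text)
            )
    return results


# Usage sketch (requires google-cloud-translate and valid credentials):
# from google.cloud import translate
# client = translate.TranslationServiceClient()
# parent = "projects/your-project-id/locations/global"
# for original, lang, english in translate_cells(client, parent, ["kamusta", "cómo estás"]):
#     print(lang, original, "->", english)
```

Keeping the cells separate also preserves the row structure, which the joined string in main() loses. Detection on very short cells may be less reliable than detection on a whole file, so the per-file approach above can still be the better fit when each file is known to contain one language.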
Original page content provided by Stack Overflow; translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link: https://stackoverflow.com/questions/68548387
