String field 'google.cloud.language.v1beta2.TextSpan.content' contains invalid UTF-8 data when parsing a protocol buffer

Stack Overflow user
Asked 2018-05-27 23:21:45
2 answers, 760 views, 0 followers, 1 vote

I am trying to run the Python script from the Google Cloud Natural Language API Python samples:

https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/language/cloud-client/v1beta2/snippets.py

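I have not modified it, so I expected it to work as is. Specifically, I want to run entity analysis on a text file/document. The relevant part of the code is shown below.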

# Imports needed by this excerpt (the names below come from these modules).
from google.cloud import language_v1beta2
from google.cloud.language_v1beta2 import enums, types


def entities_file(gcs_uri):
    """Detects entities in the file located in Google Cloud Storage."""
    client = language_v1beta2.LanguageServiceClient()

    # Instantiates a plain text document.
    document = types.Document(
        gcs_content_uri=gcs_uri,
        type=enums.Document.Type.PLAIN_TEXT)

    # Detects entities in the document. You can also analyze HTML with:
    #   document.type == enums.Document.Type.HTML
    entities = client.analyze_entities(document).entities

    # entity types from enums.Entity.Type
    entity_type = ('UNKNOWN', 'PERSON', 'LOCATION', 'ORGANIZATION',
                   'EVENT', 'WORK_OF_ART', 'CONSUMER_GOOD', 'OTHER')

    for entity in entities:
        print('=' * 20)
        print(u'{:<16}: {}'.format('name', entity.name))
        print(u'{:<16}: {}'.format('type', entity_type[entity.type]))
        print(u'{:<16}: {}'.format('metadata', entity.metadata))
        print(u'{:<16}: {}'.format('salience', entity.salience))
        print(u'{:<16}: {}'.format('wikipedia_url',
              entity.metadata.get('wikipedia_url', '-')))

I have put my text file (UTF-8 encoded) on Cloud Storage at gs://neotokyo-cloud-bucket/TXT/TTS-01.txt

I am running the script in Google Cloud Shell. When I run the file:

python snippets.py entities-file gs://neotokyo-cloud-bucket/TXT/TTS-01.txt

I get the following error, which seems to be related to protocol buffers.

[libprotobuf ERROR google/protobuf/wire_format_lite.cc:629]. 
String field 'google.cloud.language.v1beta2.TextSpan.content' 
contains invalid UTF-8 data when parsing a protocol buffer. 
Use the 'bytes' type if you intend to send raw bytes.

ERROR:root:Exception deserializing message!
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/grpc/_common.py", line 87, in _transform
    return transformer(message)
DecodeError: Error parsing message
Traceback (most recent call last):
  File "snippets.py", line 336, in <module>
    entities_file(args.gcs_uri)
  File "snippets.py", line 114, in entities_file
    entities = client.analyze_entities(document).entities
  File "/usr/local/lib/python2.7/dist-packages/google/cloud/language_v1beta2/gapic/language_service_client.py", line 226, in analyze_entities
    return self._analyze_entities(request, retry=retry, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/google/api_core/gapic_v1/method.py", line 139, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/google/api_core/retry.py", line 260, in retry_wrapped_func
    on_error=on_error,
  File "/usr/local/lib/python2.7/dist-packages/google/api_core/retry.py", line 177, in retry_target
    return target()
  File "/usr/local/lib/python2.7/dist-packages/google/api_core/timeout.py", line 206, in func_with_timeout
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/google/api_core/grpc_helpers.py", line 56, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "/usr/local/lib/python2.7/dist-packages/six.py", line 737, in raise_from
    raise value
google.api_core.exceptions.InternalServerError: 500 Exception deserializing response!

I don't know protobuf, so any help is appreciated!

2 Answers

Stack Overflow user

Answered 2018-05-31 07:07:15

Where does your text file come from?

Python's ParseFromString/SerializeToString work with bytes. Try converting the text file to bytes before parsing.
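To act on the byte-level view suggested here, one option is to read the file as raw bytes and check whether it really decodes as UTF-8 before uploading it. The sketch below uses only the standard library; the helper name `find_invalid_utf8` is hypothetical and not part of the sample or of any Google library, and the thread does not confirm that the file itself is the cause of the error.

```python
def find_invalid_utf8(data):
    """Return (offset, byte_value) for the first byte that breaks UTF-8
    decoding of `data` (a bytes object), or None if `data` is valid UTF-8.

    Hypothetical helper for illustration only.
    """
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start, bytearray(data)[err.start]
    return None


def check_file(path):
    """Read `path` in binary mode and describe its UTF-8 validity."""
    with open(path, 'rb') as handle:
        problem = find_invalid_utf8(handle.read())
    if problem is None:
        return 'valid UTF-8'
    return 'invalid byte 0x{:02x} at offset {}'.format(problem[1], problem[0])
```

Calling `check_file` on a local copy of the text file before copying it to the bucket would show whether the file contains bytes that are not valid UTF-8, for example because it was actually saved as Latin-1.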

Votes: 0

Stack Overflow user

Answered 2019-12-14 08:24:06

It looks like your file starts with a byte order mark (utf-8-sig). Try converting the content to standard UTF-8 before calling the client.
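A minimal sketch of that conversion, assuming the file is available locally before it is uploaded to Cloud Storage. The function names `has_utf8_bom` and `to_plain_utf8` are hypothetical; only the standard-library `codecs.BOM_UTF8` constant and the `utf-8-sig` codec are relied on. Whether a BOM is actually what triggers the error in the question is not confirmed in this thread.

```python
import codecs


def has_utf8_bom(raw):
    """Return True if the bytes start with the UTF-8 byte order mark."""
    return raw.startswith(codecs.BOM_UTF8)


def to_plain_utf8(raw):
    """Return `raw` re-encoded as UTF-8 without a leading byte order mark.

    The 'utf-8-sig' codec drops a leading BOM if one is present and
    otherwise behaves like 'utf-8'. It raises UnicodeDecodeError if the
    input is not valid UTF-8 at all.
    """
    text = raw.decode('utf-8-sig')
    return text.encode('utf-8')


def rewrite_without_bom(src_path, dst_path):
    """Copy src_path to dst_path, stripping a UTF-8 BOM if there is one."""
    with open(src_path, 'rb') as src:
        cleaned = to_plain_utf8(src.read())
    with open(dst_path, 'wb') as dst:
        dst.write(cleaned)
```

The cleaned copy written by `rewrite_without_bom` could then be uploaded to the bucket in place of the original and the script rerun against it.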

Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/50553628
