首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >问答首页 >面向视频智能的谷歌云语音转录

面向视频智能的谷歌云语音转录
EN

Stack Overflow用户
提问于 2021-01-24 04:48:58
回答 1查看 65关注 0票数 0

我打算将Google Cloud语音转录用于视频智能。以下代码仅对视频的部分片段进行分析。

代码语言:javascript
运行
复制
video_uri = "gs://cloudmleap/video/next/JaneGoodall.mp4"
language_code = "en-GB"
segment = types.VideoSegment()
segment.start_time_offset.FromSeconds(55)
segment.end_time_offset.FromSeconds(80)
response = transcribe_speech(video_uri, language_code, [segment])

def transcribe_speech(video_uri, language_code, segments=None):
    video_client = videointelligence.VideoIntelligenceServiceClient()
    features = [enums.Feature.SPEECH_TRANSCRIPTION]
    config = types.SpeechTranscriptionConfig(
        language_code=language_code,
        enable_automatic_punctuation=True,
    )
    context = types.VideoContext(
        segments=segments,
        speech_transcription_config=config,
    )

    print(f'Processing video "{video_uri}"...')
    operation = video_client.annotate_video(
        input_uri=video_uri,
        features=features,
        video_context=context,
    )
    return operation.result()

如何自动分析整个视频,而不是定义特定的片段?

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2021-01-25 10:04:36

您可以在Video Intelligence google doc中学习本教程。本教程展示了如何转录整个视频。您的输入应该存储在GCS存储桶中,我在您的示例代码中看到,您的视频确实存储在GCS存储桶中,因此您应该不会有任何问题。

只需确保您已经安装了latest Video Intelligence library.

代码语言:javascript
运行
复制
pip install --upgrade google-cloud-videointelligence

这是转录音频的code snippet from the Video Intelligence doc

代码语言:javascript
运行
复制
"""Transcribe speech from a video stored on GCS."""
from google.cloud import videointelligence

path="gs://your_gcs_bucket/your_video.mp4"
video_client = videointelligence.VideoIntelligenceServiceClient()
features = [videointelligence.Feature.SPEECH_TRANSCRIPTION]

config = videointelligence.SpeechTranscriptionConfig(
    language_code="en-US", enable_automatic_punctuation=True
)
video_context = videointelligence.VideoContext(speech_transcription_config=config)

operation = video_client.annotate_video(
    request={
        "features": features,
        "input_uri": path,
        "video_context": video_context,
    }
)

print("\nProcessing video for speech transcription.")

result = operation.result(timeout=600)

# There is only one annotation_result since only
# one video is processed.
annotation_results = result.annotation_results[0]
for speech_transcription in annotation_results.speech_transcriptions:

    # The number of alternatives for each transcription is limited by
    # SpeechTranscriptionConfig.max_alternatives.
    # Each alternative is a different possible transcription
    # and has its own confidence score.
    for alternative in speech_transcription.alternatives:
        print("Alternative level information:")

        print("Transcript: {}".format(alternative.transcript))
        print("Confidence: {}\n".format(alternative.confidence))

        print("Word level information:")
        for word_info in alternative.words:
            word = word_info.word
            start_time = word_info.start_time
            end_time = word_info.end_time
            print(
                "\t{}s - {}s: {}".format(
                    start_time.seconds + start_time.microseconds * 1e-6,
                    end_time.seconds + end_time.microseconds * 1e-6,
                    word,
                )
            )
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/65864221

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档