
Making Your Own Digital Human Presenter Video

Yunjie Ge
Published 2024-01-22 15:16:15
Column: 数据库与编程 (Databases and Programming)

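This tutorial walks you through making your own digital human presenter video: a talking-photo video generated from a face image and a speech audio clip.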

[Video: two sample clips generated with this tool]

The tool is SadTalker, and the test environment is Colab, which Google provides for free. The steps are as follows:

1. Confirm that the GPU and CUDA environment are available

# Confirm that the GPU and CUDA environment are available
!nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader
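
The CSV output of the command above is easy to read into Python if you want to check the free memory programmatically. The following is a minimal sketch, not part of SadTalker; the function name `parse_gpu_csv` and the sample line are hypothetical, and the real values depend on the GPU that Colab assigns.

```python
import csv
import io


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader` output into a list of dicts.

    Assumes the three queried fields are name, memory.total, memory.free,
    in that order, as in the command above.
    """
    gpus = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(row) != 3:
            continue  # skip blank or unexpected lines
        name, total, free = (field.strip() for field in row)
        gpus.append({"name": name, "memory_total": total, "memory_free": free})
    return gpus


# Hypothetical sample line for illustration only.
sample = "Tesla T4, 15360 MiB, 15101 MiB\n"
print(parse_gpu_csv(sample))
```

In a Colab cell, the real output can be captured with `out = !nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader` and passed in as `parse_gpu_csv("\n".join(out))`. An empty result suggests that no GPU runtime is attached.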

2. Set up the environment and download the source code

# Register Python 3.8 as the preferred python3, then install it
!update-alternatives --install /usr/local/bin/python3 python3 /usr/bin/python3.8 2
!update-alternatives --install /usr/local/bin/python3 python3 /usr/bin/python3.9 1
!sudo apt install -y python3.8

!sudo apt-get install -y python3.8-distutils

!python --version

!apt-get update

!apt install -y software-properties-common

# Remove the system pip packages, then reinstall pip
!sudo dpkg --remove --force-remove-reinstreq python3-pip python3-setuptools python3-wheel

!apt-get install -y python3-pip

print('Git clone project and install requirements...')
!git clone https://github.com/Winfredy/SadTalker &> /dev/null
%cd SadTalker
# Each "!" command runs in its own subshell, so "!export" would not persist; %env does
%env PYTHONPATH=/content/SadTalker
!python3.8 -m pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
!apt update
# -y is needed here because the output, including any prompt, is discarded
!apt install -y ffmpeg &> /dev/null
!python3.8 -m pip install -r requirements.txt
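
Because several of the install commands discard their output, a failed install can go unnoticed. The following is a minimal sketch for checking that the executables used in this tutorial are on `PATH`; the helper names `find_tools` and `missing_tools` are hypothetical and not part of SadTalker.

```python
import shutil


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names):
    """Return the names from `names` that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


# The executables this tutorial calls directly.
print("missing:", missing_tools(["python3.8", "ffmpeg", "git"]))
```

An empty list means all three were found. This only confirms that the executables exist; it does not verify that the Python packages in `requirements.txt` installed correctly.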

3. Download the pretrained models

print('Downloading pretrained models...')
!rm -rf checkpoints
!bash scripts/download_models.sh
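
After the download script finishes, listing what actually landed on disk is a quick sanity check. The following is a minimal sketch; it assumes the script writes into the `checkpoints` directory that the cell above clears, and the helper name `list_files_with_size` is hypothetical. It makes no assumption about which files the script downloads.

```python
from pathlib import Path


def list_files_with_size(directory):
    """Return (relative path, size in MiB) for every file under `directory`.

    Returns an empty list if the directory does not exist.
    """
    root = Path(directory)
    if not root.is_dir():
        return []
    files = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            size_mib = round(path.stat().st_size / 2**20, 1)
            files.append((str(path.relative_to(root)), size_mib))
    return files


# Run from the SadTalker directory; prints nothing if `checkpoints` is absent.
for name, size_mib in list_files_with_size("checkpoints"):
    print(f"{size_mib:>8.1f} MiB  {name}")
```

Files that show up with a size of 0.0 MiB are a hint that a download was interrupted and the script should be run again.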

4. Generate the digital human presenter video

Prepare a photo and an audio file. The photo must show a clear face; any audio clip of someone speaking will do:

Photo: examples/source_image/face.png

Audio file: examples/driven_audio/jack.mp3
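
Inference takes many minutes, so it is worth confirming that both files are in place before starting. The following is a minimal sketch; the helper name `check_inputs` is hypothetical, and it only checks that each file exists and is not empty, not that the photo contains a detectable face.

```python
from pathlib import Path


def check_inputs(image_path, audio_path):
    """Return a list of human-readable problems; an empty list means both files look usable."""
    problems = []
    for label, raw in (("image", image_path), ("audio", audio_path)):
        path = Path(raw)
        if not path.is_file():
            problems.append(f"{label} file not found: {raw}")
        elif path.stat().st_size == 0:
            problems.append(f"{label} file is empty: {raw}")
    return problems


# Paths from this tutorial, relative to the SadTalker directory.
for problem in check_inputs("examples/source_image/face.png",
                            "examples/driven_audio/jack.mp3"):
    print(problem)
```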

# Use face.png as the digital human image and jack.mp3 as the audio file
img = 'examples/source_image/face.png'
print(img)
!python3.8 inference.py --driven_audio ./examples/driven_audio/jack.mp3 \
           --source_image {img} \
           --result_dir ./results --still --preprocess full --enhancer gfpgan
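
If you want to run the same command for several photos or audio clips, building the argument list in Python avoids shell quoting problems. The following is a minimal sketch that reuses only the flags shown in the command above; the helper name `build_inference_cmd` and its default arguments are hypothetical.

```python
import shlex


def build_inference_cmd(source_image, driven_audio,
                        result_dir="./results", python="python3.8"):
    """Build the inference.py argument list with the same flags as the cell above."""
    return [
        python, "inference.py",
        "--driven_audio", driven_audio,
        "--source_image", source_image,
        "--result_dir", result_dir,
        "--still",
        "--preprocess", "full",
        "--enhancer", "gfpgan",
    ]


cmd = build_inference_cmd("examples/source_image/face.png",
                          "./examples/driven_audio/jack.mp3")
print(shlex.join(cmd))  # shell-safe rendering, for inspection only
```

The list can then be executed with `subprocess.run(cmd, check=True)` from inside the SadTalker directory. Passing a list rather than a single string means paths that contain spaces need no extra quoting.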

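When generation finishes, the tool prints output like the following, which includes the name of the final video file: ./results/2024_01_18_15.04.41.mp4. This sample log comes from a run that used face3.png and a different audio file, so the file names in it differ from the command above.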

examples/source_image/face3.png
using safetensor as default
3DMM Extraction for source image
landmark Det:: 100% 1/1 [00:00<00:00, 15.77it/s]
3DMM Extraction In Video:: 100% 1/1 [00:00<00:00, 22.80it/s]
mel:: 100% 1787/1787 [00:00<00:00, 18679.82it/s]
audio2exp:: 100% 179/179 [00:00<00:00, 295.25it/s]
Face Renderer:: 100% 894/894 [08:45<00:00,  1.70it/s]
IMAGEIO FFMPEG_WRITER WARNING: input image is not divisible by macro_block_size=16, resizing from (256, 254) to (256, 256) to ensure video compatibility with most codecs and players. To prevent resizing, make your input image divisible by the macro_block_size or set the macro_block_size to 1 (risking incompatibility).
The generated video is named ./results/2024_01_18_15.04.41/face3##2023zongjie.mp4
OpenCV: FFMPEG: tag 0x5634504d/'MP4V' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'
seamlessClone:: 100% 1787/1787 [01:01<00:00, 28.90it/s]
The generated video is named ./results/2024_01_18_15.04.41/face3##2023zongjie_full.mp4
face enhancer....
Face Enhancer:: 100% 1787/1787 [15:12<00:00,  1.96it/s]
The generated video is named ./results/2024_01_18_15.04.41/face3##2023zongjie_enhanced.mp4
The generated video is named: ./results/2024_01_18_15.04.41.mp4
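
The output paths can be pulled out of a log like this one with a small regular expression. The following is a minimal sketch, assuming the log keeps the "The generated video is named" wording shown above; the helper name `generated_videos` is hypothetical, and the sample text is copied from the log in this article.

```python
import re

# Matches both "... is named <path>" and "... is named: <path>".
VIDEO_LINE = re.compile(
    r"^The generated video is named:?[ \t]+(\S.*?)[ \t]*$", re.MULTILINE
)


def generated_videos(log_text):
    """Return every video path announced in the log, in order of appearance."""
    return VIDEO_LINE.findall(log_text)


sample_log = """\
face enhancer....
The generated video is named ./results/2024_01_18_15.04.41/face3##2023zongjie_enhanced.mp4
The generated video is named: ./results/2024_01_18_15.04.41.mp4
"""

paths = generated_videos(sample_log)
print(paths[-1])  # the last announced path is the final video in this log
```

In the log above, the earlier paths are intermediate files inside the timestamped directory, and the last line names the final video.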
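Originally published 2024-01-19 on the WeChat official account 山东Oracle用户组 (Shandong Oracle User Group) and shared through the Tencent Cloud self-media syndication program. For infringement concerns, contact cloudcommunity@tencent.com.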
