The text-to-speech tool Bark generates long audio differently from short audio. A few days ago I published a tutorial on generating short audio with Bark: "The most powerful text-to-speech tool: Bark — detailed tutorial on local installation, cloud deployment, and the online demo, with one-click AI speech and singing that carries tone and emotion". Before following this long-audio tutorial, I recommend reading that one first to get familiar with the basic installation steps. Bark's official long-form generation notebook is on GitHub: https://github.com/suno-ai/bark/blob/main/notebooks/long_form_generation.ipynb. Today we will use Bark to generate audio longer than 14 seconds; the concrete steps are demonstrated below.
1. Google Colab cloud deployment
First open Google Colaboratory at https://colab.research.google.com and click [File] → [New notebook].
Mount your Google Drive first, then create a new code cell and run the following command to install Bark:
!pip install git+https://github.com/suno-ai/bark.git
There are three ways to generate long speech: 1. simple mode, 2. advanced mode, 3. dialogue mode.
The first, simple mode, uses nltk to split longer text into sentences and generates the audio one sentence at a time.
First run the following code to download nltk's punkt sentence tokenizer:
import nltk
nltk.download('punkt')
Once punkt has finished downloading, run the following code:
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from IPython.display import Audio
import nltk  # we'll use this to split into sentences
import numpy as np

from bark.generation import (
    generate_text_semantic,
    preload_models,
)
from bark.api import semantic_to_waveform
from bark import generate_audio, SAMPLE_RATE

preload_models()

script = """
Hey, have you heard about this new text-to-audio model called "Bark"?
Apparently, it's the most realistic and natural-sounding text-to-audio model
out there right now. People are saying it sounds just like a real person speaking.
I think it uses advanced machine learning algorithms to analyze and understand the
nuances of human speech, and then replicates those nuances in its own speech output.
It's pretty impressive, and I bet it could be used for things like audiobooks or podcasts.
In fact, I heard that some publishers are already starting to use Bark to create audiobooks.
It would be like having your own personal voiceover artist. I really think Bark is going to
be a game-changer in the world of text-to-audio technology.
""".replace("\n", " ").strip()

sentences = nltk.sent_tokenize(script)

SPEAKER = "v2/en_speaker_6"
silence = np.zeros(int(0.25 * SAMPLE_RATE))  # quarter second of silence

pieces = []
for sentence in sentences:
    audio_array = generate_audio(sentence, history_prompt=SPEAKER)
    pieces += [audio_array, silence.copy()]

Audio(np.concatenate(pieces), rate=SAMPLE_RATE)
This code converts the text in script into speech. SPEAKER sets the voice; open the link below to see the full list of available voices: https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c
Converting this text to speech took a very long time on my machine: after 1 hour and 32 minutes it still had not finished, so I gave up waiting. This model really does demand a powerful GPU.
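Independent of Bark itself, the stitching step in the loop above is plain NumPy: each generated sentence is a 1-D array of samples, and a quarter second of zeros is appended after each one. A minimal sketch, using zero-filled arrays as stand-ins for two generated sentences (Bark's output rate is 24 kHz):

```python
import numpy as np

SAMPLE_RATE = 24_000  # Bark outputs 24 kHz audio
silence = np.zeros(int(0.25 * SAMPLE_RATE))  # quarter second of silence

# stand-ins for two generated sentences: 1 s and 2 s of samples
sentence_a = np.zeros(1 * SAMPLE_RATE)
sentence_b = np.zeros(2 * SAMPLE_RATE)

pieces = []
for audio_array in (sentence_a, sentence_b):
    pieces += [audio_array, silence.copy()]

combined = np.concatenate(pieces)
print(combined.shape[0] / SAMPLE_RATE)  # total length in seconds: 3.5
```

The quarter second of silence between sentences is what keeps the concatenated audio from sounding rushed at sentence boundaries.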
The second mode: advanced mode
Sometimes Bark produces extra audio at the end of a prompt. We can work around this by lowering the threshold at which Bark stops generating, using the min_eos_p argument of generate_text_semantic.
The full code to generate the audio:
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from IPython.display import Audio
from scipy.io.wavfile import write as write_wav
import nltk  # we'll use this to split into sentences
import numpy as np

from bark.generation import (
    generate_text_semantic,
    preload_models,
)
from bark.api import semantic_to_waveform
from bark import generate_audio, SAMPLE_RATE

preload_models()

script = """
Hey, have you heard about this new text-to-audio model called "Bark"?
Apparently, it's the most realistic and natural-sounding text-to-audio model
out there right now. People are saying it sounds just like a real person speaking.
I think it uses advanced machine learning algorithms to analyze and understand the
nuances of human speech, and then replicates those nuances in its own speech output.
It's pretty impressive, and I bet it could be used for things like audiobooks or podcasts.
In fact, I heard that some publishers are already starting to use Bark to create audiobooks.
It would be like having your own personal voiceover artist. I really think Bark is going to
be a game-changer in the world of text-to-audio technology.
""".replace("\n", " ").strip()

sentences = nltk.sent_tokenize(script)

GEN_TEMP = 0.6
SPEAKER = "v2/en_speaker_6"  # change the voice here
silence = np.zeros(int(0.25 * SAMPLE_RATE))  # quarter second of silence

pieces = []
for sentence in sentences:
    semantic_tokens = generate_text_semantic(
        sentence,
        history_prompt=SPEAKER,
        temp=GEN_TEMP,
        min_eos_p=0.05,  # tune this to trim extra audio at the ends
    )
    audio_array = semantic_to_waveform(semantic_tokens, history_prompt=SPEAKER)
    pieces += [audio_array, silence.copy()]

Audio(np.concatenate(pieces), rate=SAMPLE_RATE)
write_wav("bark_generation.wav", SAMPLE_RATE, np.concatenate(pieces))
You only need to change script (the text to convert), min_eos_p (the threshold for trimming extra audio at the ends), and SPEAKER (the voice); everything else can be left alone if you are unsure what it does.
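The write_wav call at the end of the script is ordinary SciPy rather than anything Bark-specific: it takes a filename, a sample rate, and a NumPy array of samples. A standalone sketch that writes half a second of silence (the filename here is just an example, and in Colab you can point it at a mounted Google Drive path instead):

```python
import numpy as np
from scipy.io.wavfile import write as write_wav

SAMPLE_RATE = 24_000  # Bark's output sample rate
audio = np.zeros(int(0.5 * SAMPLE_RATE), dtype=np.float32)  # half a second of silence

# writes a playable .wav file to the current directory
write_wav("example.wav", SAMPLE_RATE, audio)
```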
The third mode: dialogue mode
You can write a custom dialogue and assign a different voice to each character. The full example code:
"CUDA_VISIBLE_DEVICES"] = "0"import Audiowavfile import write as write_wav# we'll use this to split into sentencesimport (
generate_text_semantic, semantic_to_waveform generate_audio, SAMPLE_RATE
preload_models"Samantha": "v2/en_speaker_9", "John": "v2/en_speaker_2"}# Script generated by chat GPT"""
Samantha: Hey, have you heard about this new text-to-audio model called "Bark"?
John: No, I haven't. What's so special about it?
Samantha: Well, apparently it's the most realistic and natural-sounding text-to-audio model out there right now. People are saying it sounds just like a real person speaking.
John: Wow, that sounds amazing. How does it work?
Samantha: I think it uses advanced machine learning algorithms to analyze and understand the nuances of human speech, and then replicates those nuances in its own speech output.
John: That's pretty impressive. Do you think it could be used for things like audiobooks or podcasts?
Samantha: Definitely! In fact, I heard that some publishers are already starting to use Bark to create audiobooks. And I bet it would be great for podcasts too.
John: I can imagine. It would be like having your own personal voiceover artist.
Samantha: Exactly! I think Bark is going to be a game-changer in the world of text-to-audio technology."""().split("\n")for s in script if s]int(0.5*SAMPLE_RATE)) line.split(": ") generate_audio(text, history_prompt=speaker_lookup[speaker], )audio_array, silence.copy()]pieces), rate=SAMPLE_RATE)"bark_generation.wav", SAMPLE_RATE, np.concatenate(pieces))#直接生成音频文件,可以加入谷歌云盘路径自动保存到谷歌云盘
If you are not familiar with the code, just change the voices in speaker_lookup and the dialogue text in script; everything else can be left as-is.
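The script-parsing part of the dialogue example can be tried without Bark at all: each line is split into a speaker name and their text, and the name is looked up in speaker_lookup to pick that character's voice preset. A minimal sketch with a shortened, made-up dialogue:

```python
speaker_lookup = {"Samantha": "v2/en_speaker_9", "John": "v2/en_speaker_2"}

script = """
Samantha: Hey, have you tried Bark yet?
John: Not yet, what does it do?
""".strip().split("\n")

for line in [s.strip() for s in script if s.strip()]:
    speaker, text = line.split(": ", 1)  # split only on the first ": "
    print(speaker_lookup[speaker], "->", text)
```

Splitting only on the first ": " means a character's line may itself contain colons without breaking the parse.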
Generation takes quite a while, but give it a try if you need it.