Generating Video Subtitles with Whisper: From Audio Extraction to Batch Processing

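Generating subtitles is a core need in many video-processing workflows. This article walks through using OpenAI's Whisper model to generate SRT subtitles for video files (for example, the TV series Normal People or the film In the Mood for Love). We start by extracting the audio, then generate subtitles step by step, and finish with a Python script for batch processing. We also look at how to handle non-English audio (such as Chinese) and how to improve subtitle quality.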

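Prerequisites

Before you start, make sure the following tools are installed:

1. FFmpeg: used to extract audio from video.

Installation:
Windows: download FFmpeg and add it to the system PATH.
macOS: brew install ffmpeg
Linux: sudo apt-get install ffmpeg (Ubuntu/Debian) or sudo dnf install ffmpeg (Fedora)

2. Python 3.8+: used to run the script and Whisper.

Install Python from python.org.

3. Whisper: OpenAI's speech-to-text model.

Install with pip: pip install openai-whisper

4. uv (optional): used to manage Python project environments.

Installation: pip install uv

5. Video files: have MP4 or MKV video files ready (such as Normal People or In the Mood for Love).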

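Step 1: Extract the Audio

The first step is to extract the audio from the video file. We use FFmpeg to save the video's audio stream as an AAC file.

Example command

Extract the audio for Normal People, season 1, episode 1: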

ffmpeg -i /path/to/Normal.People.S01E01.mp4 -vn -acodec copy /path/to/audio/Normal.People.S01E01.aac

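-i: path to the input video file.
-vn: disables the video stream (audio only).
-acodec copy: copies the audio stream without re-encoding, preserving the original quality. This only produces a valid .aac file when the source audio track is already AAC; for other codecs, re-encode with -acodec aac instead.
Output: saved as /path/to/audio/Normal.People.S01E01.aac.

Notes

Make sure the output directory (such as /path/to/audio/) exists.
Replace /path/to/ with your actual file paths.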
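
Because stream copy only works when the source track is already AAC, it can help to check the codec before extracting. The sketch below is one possible way to do that and is not part of the original workflow: probe_audio_codec and build_extract_command are hypothetical helper names, and the ffprobe call assumes ffprobe was installed alongside FFmpeg.

```python
import subprocess
from pathlib import Path


def probe_audio_codec(video_path):
    """Return the codec name of the first audio stream, e.g. 'aac' or 'ac3'."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-select_streams", "a:0",
            "-show_entries", "stream=codec_name",
            "-of", "default=noprint_wrappers=1:nokey=1",
            str(video_path),
        ],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def build_extract_command(video_path, audio_dir, codec):
    """Build the ffmpeg argument list: copy AAC tracks, re-encode anything else."""
    output_path = Path(audio_dir) / (Path(video_path).stem + ".aac")
    # Stream copy is only safe when the source track is already AAC.
    audio_codec = "copy" if codec == "aac" else "aac"
    return [
        "ffmpeg", "-i", str(video_path),
        "-vn", "-c:a", audio_codec,
        str(output_path),
    ]
```

Calling build_extract_command(video, "/path/to/audio", probe_audio_codec(video)) gives an argument list that can be passed to subprocess.run. The output keeps the .aac extension so the later steps can find the file unchanged.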

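Step 2: Generate Subtitles

Use the Whisper model to convert the audio file into an SRT subtitle file. Whisper offers several models (tiny, base, small, medium, large, and turbo); turbo is fast, which makes it a good fit for quick tests.

Example command

Generate subtitles for the extracted audio:

whisper /path/to/audio/Normal.People.S01E01.aac --model turbo --output_format srt --output_dir /path/to/generated_subs/

--model turbo: uses the turbo model (fast, but possibly at the cost of accuracy).
--output_format srt: writes subtitles in SRT format.
--output_dir: sets the directory the subtitles are written to.
Output: generates /path/to/generated_subs/Normal.People.S01E01.srt.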

Example output

The first few generated subtitles might look like this:

1
00:00:00,000 --> 00:00:24,000
It's a simple game. You have 15 players. Give one of them the ball.
Get it into the net.

2
00:00:24,000 --> 00:00:26,000
Very simple. Isn't it?
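
To make the SRT layout concrete (a running index, a time range written as HH:MM:SS,mmm --> HH:MM:SS,mmm, the text, then a blank line), here is a minimal sketch that turns a list of segments into SRT text. The helper names format_timestamp and segments_to_srt are hypothetical, and the input is assumed to be a list of dictionaries with start, end (in seconds), and text keys, which mirrors the shape of the segments Whisper's Python API returns.

```python
def format_timestamp(seconds):
    """Convert seconds (int or float) to the SRT form HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments as SRT text, numbering cues from 1."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        text = segment["text"].strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 24.0, "text": " It's a simple game."},
        {"start": 24.0, "end": 26.0, "text": " Very simple. Isn't it?"},
    ]
    print(segments_to_srt(demo))
```

Note that SRT uses a comma, not a period, before the milliseconds, and the arrow is two ASCII hyphens followed by a greater-than sign.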

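Step 3: Batch Processing Script

Generating subtitles for many videos by hand is inefficient. The following Python script processes every video file in a directory automatically, extracting the audio and generating subtitles.

Full script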

import os
import subprocess
import argparse

def extract_audio(input_dir, output_dir):
    """Extract audio from video files in input_dir and save to output_dir."""
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    for filename in os.listdir(input_dir):
        if filename.endswith(('.mp4', '.mkv')):
            input_path = os.path.join(input_dir, filename)
            audio_filename = os.path.splitext(filename)[0] + '.aac'
            output_path = os.path.join(output_dir, audio_filename)
            # Stream copy assumes the source audio track is already AAC.
            command = [
                'ffmpeg', '-i', input_path, '-vn', '-acodec', 'copy', output_path
            ]
            print(f"Extracting audio: {command}")
            try:
                subprocess.run(command, check=True)
            except subprocess.CalledProcessError as e:
                # Report the failure and continue with the next file.
                print(f"Error extracting audio from {filename}: {e}")

def generate_subtitles(input_dir, output_dir):
    """Generate subtitles for audio files using Whisper."""
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    for filename in os.listdir(input_dir):
        if filename.endswith('.aac'):
            input_path = os.path.join(input_dir, filename)
            command = [
                'whisper', input_path, '--model', 'turbo',
                '--output_format', 'srt', '--output_dir', output_dir
            ]
            print(f"Generating subtitles: {command}")
            try:
                subprocess.run(command, check=True)
            except subprocess.CalledProcessError as e:
                print(f"Error generating subtitles for {filename}: {e}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Extract audio and generate subtitles.")
    parser.add_argument("input_dir", help="Directory containing video files.")
    parser.add_argument("audio_dir", help="Directory to save extracted audio files.")
    parser.add_argument("subtitle_dir", help="Directory to save generated subtitles.")
    args = parser.parse_args()
    # Run the two stages in order: audio first, then subtitles.
    extract_audio(args.input_dir, args.audio_dir)
    generate_subtitles(args.audio_dir, args.subtitle_dir)

Usage

Save the script as generate_subtitles.py.

Run the script, specifying the directory paths:

python generate_subtitles.py /path/to/videos /path/to/audio /path/to/generated_subs
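
When a batch run is interrupted or new episodes are added later, re-running the script repeats work that is already done. One possible refinement, sketched here under the assumption that each subtitle file shares its base name with the video (as in the Step 2 output), is to list only the videos that do not yet have an .srt file. The function name pending_videos is hypothetical.

```python
from pathlib import Path

VIDEO_EXTENSIONS = {".mp4", ".mkv"}


def pending_videos(video_dir, subtitle_dir):
    """Return videos in video_dir that have no matching .srt in subtitle_dir."""
    video_dir, subtitle_dir = Path(video_dir), Path(subtitle_dir)
    done = set()
    if subtitle_dir.is_dir():
        # Base names of subtitles that already exist, e.g. 'Normal.People.S01E01'.
        done = {path.stem for path in subtitle_dir.glob("*.srt")}
    return sorted(
        path
        for path in video_dir.iterdir()
        if path.suffix.lower() in VIDEO_EXTENSIONS and path.stem not in done
    )


if __name__ == "__main__":
    for video in pending_videos("/path/to/videos", "/path/to/generated_subs"):
        print(f"Still needs subtitles: {video.name}")
```

The batch script could loop over this list instead of every file in the directory, so a second run only touches unfinished videos.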

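Step 4: Improve Subtitle Quality

The generated subtitles may have the following problems; here is how to address each one.

Problem 1: Inaccurate timestamps

Solutions:
- Use --word_timestamps True together with --max_line_width 50 and --max_line_count 2 to limit subtitle length. The two line-limit options only take effect when word timestamps are enabled.
- Adjust the timestamps in post-processing (example code):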

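import pysrt

# Example values: shift cues that start within the first 18 seconds by 18 seconds.
# Choose the threshold and offset that match your own video.
subs = pysrt.open('subtitles.srt')
for sub in subs:
    # start.ordinal is the full start time in milliseconds;
    # start.seconds is only the seconds field (0-59).
    if sub.start.ordinal < 18_000:
        sub.shift(seconds=18)
subs.save('adjusted_subtitles.srt')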

Problem 2: Subtitles that are too long

Solution:
- Split the text into sentences with NLTK (example code):

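import nltk
from nltk.tokenize import sent_tokenize

# Sentence tokenizer data: recent NLTK releases look for 'punkt_tab', older ones for 'punkt'.
nltk.download('punkt')
nltk.download('punkt_tab')

def split_long_subtitle(text):
    """Split one long subtitle text into a list of sentences."""
    return sent_tokenize(text)

long_text = "It's a simple game. You have 15 players. Give one of them the ball."
sentences = split_long_subtitle(long_text)  # Output: ["It's a simple game.", 'You have 15 players.', …]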
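
Splitting the text is only half of the job: each sentence also needs its own time range. A simple heuristic, shown here as a sketch rather than anything Whisper or NLTK does for you, is to divide the original cue's duration in proportion to each sentence's character count. The function split_cue is hypothetical, and it uses a plain regular expression in place of NLTK so the example has no dependencies.

```python
import re


def split_cue(start_ms, end_ms, text):
    """Split one cue into per-sentence cues, sharing time by character count."""
    # Break after '.', '!' or '?' followed by whitespace (a rough approximation).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    total_chars = sum(len(s) for s in sentences)
    duration = end_ms - start_ms
    cues = []
    cursor = start_ms
    consumed = 0
    for position, sentence in enumerate(sentences):
        consumed += len(sentence)
        if position == len(sentences) - 1:
            # The last sentence always ends exactly where the original cue ended.
            cue_end = end_ms
        else:
            cue_end = start_ms + duration * consumed // total_chars
        cues.append((cursor, cue_end, sentence))
        cursor = cue_end
    return cues


if __name__ == "__main__":
    for cue in split_cue(0, 24_000, "It's a simple game. You have 15 players."):
        print(cue)
```

Character count is only a rough proxy for speaking time, so pauses and music inside the cue will still throw the estimate off; word-level timestamps are the more accurate route when they are available.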

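Problem 3: Inconsistent punctuation

Solutions:
- Use the --append_punctuations ".,!?" option. It controls which punctuation marks are merged with the preceding word, and it only takes effect with --word_timestamps True.
- Post-process with spaCy to add punctuation (example code):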

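import spacy

nlp = spacy.load("en_core_web_sm")
text = "It's a simple game You have 15 players"
doc = nlp(text)
# spaCy does not restore punctuation by itself; as an approximation, take the
# sentence boundaries guessed by the parser and end each sentence with a period.
sentences = [sent.text.strip() for sent in doc.sents]
punctuated_text = " ".join(s if s[-1] in ".!?" else s + "." for s in sentences if s)
# Boundaries on unpunctuated text are a guess, so review the result by hand.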
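
For the simpler cases of inconsistent punctuation (stray spaces before marks, doubled spaces, a missing final period), a regular-expression pass may be enough. The sketch below is an illustrative assumption about what "consistent" means here, and normalize_punctuation is a hypothetical helper; it only handles the ASCII marks . , ! ? and does not try to restore punctuation inside a sentence.

```python
import re


def normalize_punctuation(text):
    """Tidy spacing around punctuation and make sure the text ends with a mark."""
    # Collapse runs of whitespace into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Remove spaces that appear before a punctuation mark.
    text = re.sub(r"\s+([.,!?])", r"\1", text)
    # Add a period when the text does not already end a sentence.
    if text and text[-1] not in ".!?":
        text += "."
    return text


if __name__ == "__main__":
    print(normalize_punctuation("Very simple .  Isn't it ?"))
    print(normalize_punctuation("Get it into the net"))
```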

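Step 5: Handle Non-English Audio (Such as Chinese)

Example command

Generate Chinese subtitles. The transcribe task keeps the original language; to get English subtitles, run a separate pass with the task set to translate: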

whisper /path/to/In.the.Mood.for.Love.mp4 --model large --output_format srt --output_dir /path/to/generated_subs --language zh --task transcribe
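
Since transcription and translation are separate passes, it can be convenient to build both commands from one helper. This is a minimal sketch: build_whisper_command is a hypothetical function, and the subdirectory names zh and en are assumptions. Both passes write a file named after the input, so separate output directories keep the second pass from overwriting the first.

```python
def build_whisper_command(media_path, output_dir, model="large",
                          language=None, task="transcribe"):
    """Build the whisper CLI argument list for one pass over a media file."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"Unsupported task: {task}")
    command = [
        "whisper", media_path,
        "--model", model,
        "--task", task,
        "--output_format", "srt",
        "--output_dir", output_dir,
    ]
    if language:
        # Naming the spoken language skips Whisper's automatic detection.
        command += ["--language", language]
    return command


if __name__ == "__main__":
    movie = "/path/to/In.the.Mood.for.Love.mp4"
    # Pass 1: subtitles in the original language.
    print(build_whisper_command(movie, "/path/to/generated_subs/zh", language="zh"))
    # Pass 2: English subtitles via Whisper's translate task.
    print(build_whisper_command(movie, "/path/to/generated_subs/en",
                                language="zh", task="translate"))
```

Each list can be passed to subprocess.run(command, check=True), in the same way as the batch script does.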

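Optimization tips

Use the large model: non-English audio needs higher accuracy.

Specify the dialect: for Cantonese, use --language yue (available in recent Whisper releases).

Preprocess the audio: example denoising command. Audio filters cannot be combined with stream copy, so this command re-encodes the filtered audio to AAC:

ffmpeg -i input.mp4 -af "afftdn" -vn -acodec aac output.aac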

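Notes

Performance: the large model requires more compute resources.

File formats: make sure your files are in a compatible format such as MP4, MKV, or AAC.

Debugging: use --verbose True (the default) to print progress and debug messages.

Summary

With FFmpeg and Whisper, you can readily generate good-quality subtitles for video. The batch script automates audio extraction and subtitle generation, and the techniques for fixing timestamps, subtitle length, and punctuation improve the result further. For non-English audio (such as Chinese), the keys are using the large model, preprocessing the audio, and keeping transcription and translation as separate passes.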
