Skip to content

Commit

Permalink
Merge pull request #4 from AuYang261/caption
Browse files Browse the repository at this point in the history
添加自动生成字幕功能
  • Loading branch information
AuYang261 authored Apr 9, 2024
2 parents b5bb312 + 1c4da21 commit 034d49a
Show file tree
Hide file tree
Showing 15 changed files with 224 additions and 35 deletions.
4 changes: 3 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -5,4 +5,6 @@ old/
*.exe
build/
dist/
*.spec
*.spec
whisper_models/
release_*/
78 changes: 67 additions & 11 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,49 +8,105 @@

## 使用:下载指定课程

[下载并解压](https://github.com/AuYang261/BIT_yanhe_download/releases/latest/download/yanhe.zip)
[下载并解压](https://github.com/AuYang261/BIT_yanhe_download/releases/latest/download/release_downloader.zip)

[延河课堂 (yanhekt.cn)](https://www.yanhekt.cn/recordCourse)中找到想下载的课程,以链接为https://www.yanhekt.cn/course/40524 的课程为例,复制地址栏最后的五位编号40524。注意是课程列表的链接(以`yanhekt.cn/course/五位编号`开头),不是视频界面的链接(以`yanhekt.cn/session/六位编号`开头)。

![image-20231018204208066](md/README/image-20231018204208066.png)

双击运行`main.exe`(Release中的)或`run.bat`文件,并输入你想下载的课程编号(40524)。输出课程视频列表:

![image-20230926124749421](md/README/image-20230926124749421.png)
![image-20240409103306945](md/README/image-20240409103306945.png)

输入想下载的视频编号,用英文逗号(,)分隔,回车。接着选择下载video视频录像(即教室后的摄像头录像)还是下载screen信号(即教室电脑的屏幕),默认为视频录像。回车即开始下载:
输入想下载的视频编号,用英文逗号(,)分隔,回车。接着输入数字选择下载video视频录像(即教室后的摄像头录像)还是下载screen信号(即教室电脑的屏幕),默认为视频录像。回车即开始下载:

![image-20230926124841432](md/README/image-20230926124841432.png)
![image-20240409103338980](md/README/image-20240409103338980.png)

下载完成的文件在`output/`目录下以`课程名-video/screen`格式命名的文件夹中。

![image-20230926124922726](md/README/image-20230926124922726.png)

## 自动生成字幕

本项目提供自动生成字幕功能,使用openai的[whisper](https://github.com/openai/whisper)项目及其模型在本地进行语音转文字生成字幕。

最好使用GPU运行,否则速度较慢,依赖见[下文](#依赖)

下载[字幕生成程序gen_caption](https://github.com/AuYang261/BIT_yanhe_download/releases/latest),由于程序比较大,采用了分卷压缩发布。全部下载并解压,得到一个`gen_caption.exe`可执行文件,保存在上述`release_downloader.zip`解压的目录中,和保存视频的目录`output/`同级,如下所示:

![image-20240409105228362](md/README/image-20240409105228362.png)

下载完视频后,双击运行`gen_caption.exe`(文件较大,需要等一会),输入数字选择视频,回车。再输入数字选择使用多大的模型,越往下效果越好,但所需时间也越长,默认使用base模型。第一次使用会自动下载模型(几百M),请耐心等待。如下所示:

![image-20240409131033038](md/README/image-20240409131033038.png)

等待程序运行完成,生成的字幕文件为`.srt`格式,与视频文件在同级目录下,用支持字幕的播放器(如potplayer)打开视频即可看到带字幕的视频。

*tips: 语音转文字所需的时间较长,可以先观看视频,字幕生成好了再重新打开视频享受字幕。使用GPU大约需要几分钟,不使用GPU则需要更长时间。*

## 依赖

* ffmpeg,已在Release中提供。
* ffmpeg,已在Release中提供。若在Linux环境下运行,需手动安装ffmpeg:

```bash
sudo apt update
sudo apt install ffmpeg
```

*若想用python环境运行,需安装这些依赖*
* **若使用GPU运行自动生成字幕功能,需要先安装cuda,安装方法见[cuda安装](https://blog.csdn.net/chen565884393/article/details/127905428)**

*若想用python环境运行,需安装以下依赖*

* python,[下载](https://www.python.org/ftp/python/3.9.4/python-3.9.4-amd64.exe)并安装

* python第三方库requests。打开命令行(按win+r,在打开的窗口中输入cmd,回车),运行如下命令安装:

```bash
pip install -r requirements.txt
```
```bash
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```

* 安装语音转文字的依赖:(依赖于pytorch,若未安装pytorch,会自动安装,但是cpu版本。安装cuda版本的pytorch方法见[pytorch官网](https://pytorch.org/get-started/locally/)。)

```bash
pip install -r requirements_whisper.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
```

## 注意

* 需要关闭本机上的代理,否则会提示类似`check_hostname requires server_hostname`的报错信息。
* 可以下载无权限的课程,只要知道课程链接(中的课程编号)就行。

## 打包
## 打包(仅开发者需要)

如果想要运行时不依赖python环境,可将python程序打包成可执行文件。Release中已打包。

使用如下命令打包:

```bash
Pyinstaller -F main.py
# 若未安装pyinstaller,运行以下命令安装
pip install pyinstaller
# 打包
pyinstaller -F main.py
pyinstaller -F gen_caption.py
```
打包`gen_caption.py`时可能会失败,提示递归过深:

<img src="md/README/image-20240409095211597.png" alt="image-20240409095211597" style="zoom:50%;" />

解决方法参考[这里](https://zhuanlan.zhihu.com/p/661325305),需要修改项目根目录下的`gen_caption.spec`配置文件,在文件开始处加上以下代码:

```python
import sys ; sys.setrecursionlimit(sys.getrecursionlimit() * 5)
```

再使用如下命令打包:

```bash
pyinstaller --clean .\gen_caption.spec
```

打包完成后运行若出现Temp目录下的文件未找到:

![image-20240409095831766](md/README/image-20240409095831766.png)

解决方法参考[这个](https://blog.csdn.net/qq_42324086/article/details/118280341),将项目`hooks`目录下的`hook-whisper.py``hook-zhconv.py`文件复制到pyinstaller的hook目录下(通常在`python根目录\Lib\site-packages\PyInstaller\hooks`),。
96 changes: 96 additions & 0 deletions gen_caption.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
import whisper
import time
from zhconv import convert # 简繁体转换
import sys
import os


def seconds_to_hmsm(seconds):
"""
输入一个秒数,输出为H:M:S:M时间格式
@params:
seconds - Required : 秒 (float)
"""
hours = str(int(seconds // 3600))
minutes = str(int((seconds % 3600) // 60))
seconds = seconds % 60
milliseconds = str(int(int((seconds - int(seconds)) * 1000))) # 毫秒留三位
seconds = str(int(seconds))
# 补0
if len(hours) < 2:
hours = "0" + hours
if len(minutes) < 2:
minutes = "0" + minutes
if len(seconds) < 2:
seconds = "0" + seconds
if len(milliseconds) < 3:
milliseconds = "0" * (3 - len(milliseconds)) + milliseconds
return f"{hours}:{minutes}:{seconds},{milliseconds}"


def main():
# 视频文件路径
video_paths = []
if len(sys.argv) >= 2:
video_paths.append(sys.argv[1])
else:
files = []
for dirpath, dirnames, filenames in os.walk("."):
for filename in filenames:
if filename.endswith(".mp4"):
files.append(os.path.join(dirpath, filename).replace("\\", "/"))
for i, f in enumerate(files):
print(f"[{i}]: ", f)
input_list = eval(
"[" + input("select a video file by input a num(split with ','): ") + "]"
)
for i in input_list:
video_paths.append(files[i])
print("selected video files:", video_paths)
models = []
for model in whisper.available_models():
if ".en" in model:
continue
print(f"[{len(models)}]: ", model)
models.append(model)
model_index = input("select a model by input a num(default 'base'): ")
try:
model_name = models[eval(model_index)]
except:
model_name = "base"
print("selected model:", model_name)

for video_path in video_paths:
audio_path = video_path.replace("mp4", "m4a")
cmd = f'ffmpeg -i "{video_path}" -vn -ar {whisper.audio.SAMPLE_RATE} "{audio_path}"'
os.system(cmd)

model = whisper.load_model(model_name, download_root="whisper_models/")

start = time.time()
result = model.transcribe(audio_path, verbose=False, language="zh")
print("Time cost: ", time.time() - start)

# 写入字幕文件
with open(video_path.replace("mp4", "srt"), "w", encoding="utf-8") as f:
i = 1
for r in result["segments"]:
f.write(str(i) + "\n")
f.write(
seconds_to_hmsm(float(r["start"]))
+ " --> "
+ seconds_to_hmsm(float(r["end"]))
+ "\n"
)
i += 1
f.write(
convert(r["text"], "zh-cn") + "\n"
) # 结果可能是繁体,转为简体zh-cn
f.write("\n")

# 删除音频文件
os.remove(audio_path)


if __name__ == "__main__":
main()
3 changes: 3 additions & 0 deletions hooks/hook-whisper.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
from PyInstaller.utils.hooks import collect_data_files

datas = collect_data_files("whisper")
3 changes: 3 additions & 0 deletions hooks/hook-zhconv.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
from PyInstaller.utils.hooks import collect_data_files

datas = collect_data_files("zhconv")
73 changes: 50 additions & 23 deletions main.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,55 +6,82 @@
import json
import os
import cProfile
headers={
'Origin': 'https://www.yanhekt.cn',

headers = {
"Origin": "https://www.yanhekt.cn",
"xdomain-client": "web_user",
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36 Edg/107.0.1418.26'
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36 Edg/107.0.1418.26",
}


# courseID = 31425
def main():
if len(sys.argv) == 1:
courseID = eval(input('Please input course ID: '))
courseID = eval(input("Please input course ID: "))
else:
courseID = sys.argv[1]

course = requests.get(f'https://cbiz.yanhekt.cn/v1/course?id={courseID}&with_professor_badges=true', headers=headers)
req = requests.get(f'https://cbiz.yanhekt.cn/v2/course/session/list?course_id={courseID}', headers=headers)
if course.json()['code'] != '0' and course.json()['code'] != 0:
print(course.json()['code'])
print(course.json()['message'])
raise Exception("Please Check your course ID, note that it should be started with yanhekt.cn/course/***, not yanhekt.cn/session/***")
print(course.json()['data']['name_zh'])
videoList = req.json()['data']
course = requests.get(
f"https://cbiz.yanhekt.cn/v1/course?id={courseID}&with_professor_badges=true",
headers=headers,
)
req = requests.get(
f"https://cbiz.yanhekt.cn/v2/course/session/list?course_id={courseID}",
headers=headers,
)
if course.json()["code"] != "0" and course.json()["code"] != 0:
print(course.json()["code"])
print(course.json()["message"])
raise Exception(
"Please Check your course ID, note that it should be started with yanhekt.cn/course/***, not yanhekt.cn/session/***"
)
print(course.json()["data"]["name_zh"])
videoList = req.json()["data"]
# print(json.dumps(videoList, indent=2))
for i, c in enumerate(videoList):
print(i, ":", c['title'])
print(f"[{i}]: ", c["title"])

index = eval('[' + input('select(split by \',\', such as: 0,2,4):') + ']')
vga = input('video(1) or screen(2)?(input 1 or 2, default video):')
if not os.path.exists('output/'):
os.mkdir('output/')
index = eval("[" + input("select(split by ',', such as: 0,2,4): ") + "]")
vga = input("video(1) or screen(2)?(input 1 or 2, default video):")
if not os.path.exists("output/"):
os.mkdir("output/")
for i in index:
c = videoList[i]
name = course.json()['data']['name_zh'].strip() + '-' + course.json()['data']['professors'][0]['name'] + '-' + c['title']
name = (
course.json()["data"]["name_zh"].strip()
+ "-"
+ course.json()["data"]["professors"][0]["name"]
+ "-"
+ c["title"]
)
print(name)
if vga == "2":
print("Downloading screen...")
m3u8dl.M3u8Download(c['videos'][0]['vga'], 'output/' + course.json()['data']['name_zh'].strip() + '-screen', name)
m3u8dl.M3u8Download(
c["videos"][0]["vga"],
"output/" + course.json()["data"]["name_zh"].strip() + "-screen",
name,
)
else:
print("Downloading video...")
m3u8dl.M3u8Download(c['videos'][0]['main'], 'output/'+ course.json()['data']['name_zh'].strip() + '-video', name)
m3u8dl.M3u8Download(
c["videos"][0]["main"],
"output/" + course.json()["data"]["name_zh"].strip() + "-video",
name,
)


if __name__ == '__main__':
if __name__ == "__main__":
try:
main()
# cProfile.run('main()', 'output/profile.txt')
except Exception as e:
print(e)
print("If the problem is still not solved, you can report an issue in https://github.com/AuYang261/BIT_yanhe_download/issues.")
print(
"If the problem is still not solved, you can report an issue in https://github.com/AuYang261/BIT_yanhe_download/issues."
)
print("Or contact with the author [email protected]. Thanks for your report!")
print("如果问题仍未解决,您可以在https://github.com/AuYang261/BIT_yanhe_download/issues 中报告问题。")
print(
"如果问题仍未解决,您可以在https://github.com/AuYang261/BIT_yanhe_download/issues 中报告问题。"
)
print("或者联系作者[email protected]。感谢您的报告!")
Binary file removed md/README/image-20230926124749421.png
Binary file not shown.
Binary file removed md/README/image-20230926124841432.png
Binary file not shown.
Binary file added md/README/image-20240409095211597.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added md/README/image-20240409095831766.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added md/README/image-20240409103306945.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added md/README/image-20240409103338980.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added md/README/image-20240409105228362.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added md/README/image-20240409131033038.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
2 changes: 2 additions & 0 deletions requirements-whisper.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
openai-whisper
zhconv

0 comments on commit 034d49a

Please sign in to comment.