Commit 8c5cfce

Update README.md
1 parent 1211c77 commit 8c5cfce

2 files changed

+98
-91
lines changed

README-EN.md

Lines changed: 96 additions & 91 deletions
@@ -1,98 +1,103 @@
11
# Whisper-Speech-to-Text-API 🎙️➡️📜
22

3-
欢迎来到 **[Whisper-Speech-to-Text-API](https://github.com/Evil0ctal/Whisper-Speech-to-Text-API)** 项目!本项目为开发者们提供了一个快速、可靠的 API,通过调用 [OpenAI Whisper](https://github.com/openai/whisper) 模型,将多种格式的视频或音频文件高效转换为文本,适合语音识别、字幕生成和文本分析需求。
4-
5-
## 项目地址 📂
6-
7-
* **GitHub 地址**[Whisper-Speech-to-Text-API](https://github.com/Evil0ctal/Whisper-Speech-to-Text-API)
8-
9-
## 🌟 特性
10-
11-
* **高性能 API 接口**:基于 FastAPI 实现异步操作,支持后台处理任务并将其存储在 SQLite 数据库中,实现任务可控管理。
12-
* **多格式支持**:支持音频文件、视频文件 (如 MP4) 等多种格式,转换基于 `ffmpeg`,确保高兼容性。
13-
* **CUDA 加速**:为有 GPU 的用户提供 CUDA 加速处理,显著提高转录速度。
14-
* **模型优化**:精细调优后的 Whisper 模型,更高的识别精度,适用于多语言音频识别。(敬请期待🔜)
15-
* **文本分析**:支持文本内容的进一步处理,如摘要生成、内容分析等,满足二次开发需求。
16-
17-
## 🚀 快速部署
18-
19-
1. **Python 环境**:确保 Python 版本 >= 3.8,本项目广泛使用 `asyncio` 库进行异步处理。
20-
2. **安装 FFmpeg**:根据你的系统来执行以下命令来安装 FFmpeg。
21-
```
22-
# Ubuntu or Debian System
23-
sudo apt update && sudo apt install ffmpeg
24-
25-
# Arch Linux System
26-
sudo pacman -S ffmpeg
27-
28-
# MacOS System -> Homebrew
29-
brew install ffmpeg
30-
31-
# Windows System -> Chocolatey(Method one)
32-
choco install ffmpeg
33-
34-
# Windows System -> Scoop(Method two)
35-
scoop install ffmpeg
36-
```
37-
3. **安装 CUDA**:如需 GPU 加速,请下载并安装 [CUDA](https://developer.nvidia.com/cuda-12-4-0-download-archive),仅使用 CPU 的用户可跳过。
38-
4. **安装支持CUDA的PyTorch**: `python3 -m pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118`
39-
5. **安装项目依赖**: `pip install -r requirements.txt`
40-
41-
## ⚗️ 技术栈
42-
43-
* **[Whisper](https://github.com/openai/whisper)** - 语音识别模型
44-
* **[ffmpeg](https://ffmpeg.org/)** - 音视频格式转换
45-
* **[torch](https://pytorch.org/)** - 深度学习框架
46-
* **[FastAPI](https://github.com/fastapi/fastapi)** - 高性能 API 框架
47-
* **[aiofile](https://github.com/Tinche/aiofiles)** - 异步文件操作
48-
* **[aiosqlite](https://github.com/omnilib/aiosqlite)** - 异步数据库操作
49-
* **[moviepy](https://github.com/Zulko/moviepy)** - 视频编辑
50-
* **[pydub](https://github.com/jiaaro/pydub)** - 音频编辑
51-
52-
## 💡 项目结构
3+
[Chinese](README.md) | [English](README-EN.md)
534

5+
Welcome to the **[Whisper-Speech-to-Text-API](https://github.com/Evil0ctal/Whisper-Speech-to-Text-API)** project! This project provides developers with a fast and reliable API, enabling efficient transcription of various video and audio file formats into text using the [OpenAI Whisper](https://github.com/openai/whisper) model. It’s ideal for speech recognition, subtitle generation, and text analysis needs.
6+
7+
## Project Link 📂
8+
9+
* **GitHub** : [Whisper-Speech-to-Text-API](https://github.com/Evil0ctal/Whisper-Speech-to-Text-API)
10+
11+
## 🌟 Features
12+
13+
* **High-Performance API** : Built on FastAPI with asynchronous request handling; transcription jobs run as background tasks and are stored in an SQLite database so they can be tracked and managed.
14+
* **Multi-Format Support** : Supports audio and video files (e.g., MP4) and utilizes `ffmpeg` for broad compatibility.
15+
* **CUDA Acceleration** : For users with GPUs, offers CUDA-accelerated processing, significantly speeding up transcription.
16+
* **Model Optimization** : Fine-tuned Whisper model for higher recognition accuracy, supporting multilingual audio transcription. (Coming soon🔜)
17+
* **Text Analysis** : Transcribed text can be processed further, for example for summarization or content analysis, which supports downstream development.
18+
* **Automatic Language Detection** : The Whisper model detects the spoken language automatically from the first 30 seconds of the media file and uses it as the transcription language.
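
As a rough sketch of how this kind of detection can be done with the upstream `openai-whisper` package: the calls below follow the usage shown in that package's README, but whether this project invokes them the same way is an assumption. The `base` model size and the `sample.mp3` file name are placeholders, and `most_likely_language` is a hypothetical helper.

```python
def most_likely_language(probs):
    """Return the language code with the highest probability."""
    return max(probs, key=probs.get)


def detect_language(path, model_name="base"):
    """Detect the spoken language from the first 30 seconds of a media file."""
    # Imported lazily; requires `pip install openai-whisper` and ffmpeg on PATH.
    import whisper

    model = whisper.load_model(model_name)
    # load_audio decodes the file via ffmpeg; pad_or_trim pads or cuts the
    # waveform to the 30-second window that Whisper works on.
    audio = whisper.pad_or_trim(whisper.load_audio(path))
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
    _, probs = model.detect_language(mel)
    return most_likely_language(probs)


# Example call (placeholder file name):
# print(detect_language("sample.mp3"))
```

Loading the model is the expensive part, so a long-running service would normally load it once and reuse it across requests.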
19+
20+
## 🚀 Quick Deployment
21+
22+
1. **Python Environment** : Ensure Python version >= 3.8. This project makes extensive use of the `asyncio` library for asynchronous processing.
23+
2. **Install FFmpeg** : Install FFmpeg using the command for your system.
24+
25+
```bash
26+
# Ubuntu or Debian System
27+
sudo apt update && sudo apt install ffmpeg
28+
29+
# Arch Linux System
30+
sudo pacman -S ffmpeg
31+
32+
# macOS System -> Homebrew
33+
brew install ffmpeg
34+
35+
# Windows System -> Chocolatey(Method one)
36+
choco install ffmpeg
37+
38+
# Windows System -> Scoop(Method two)
39+
scoop install ffmpeg
5440
```
41+
42+
3. **Install CUDA** : To enable GPU acceleration, download and install [CUDA](https://developer.nvidia.com/cuda-12-4-0-download-archive); CPU-only users can skip this step.
43+
4. **Install CUDA-Supported PyTorch** : `python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118`
44+
5. **Install Project Dependencies** : `pip install -r requirements.txt`
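
After completing the steps above, a small script along these lines can confirm that the prerequisites are in place. This is a minimal sketch rather than part of the project: `check_environment` is a hypothetical helper, and it only checks that `ffmpeg` is on the `PATH`, that `torch` is importable, and that PyTorch can see a CUDA device.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Report whether the deployment prerequisites appear to be satisfied."""
    report = {
        "python_ok": sys.version_info >= (3, 8),
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch

            # False on CPU-only installs or when no usable GPU/driver is found.
            report["cuda_available"] = bool(torch.cuda.is_available())
        except ImportError:
            # torch is present but failed to import (e.g. broken install).
            report["torch_installed"] = False
    return report


# Example call:
# for name, ok in check_environment().items():
#     print(f"{name}: {ok}")
```

CPU-only users can ignore a `cuda_available` value of `False`.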
45+
46+
## ⚗️ Technology Stack
47+
48+
* **[Whisper](https://github.com/openai/whisper)** - Speech recognition model
49+
* **[ffmpeg](https://ffmpeg.org/)** - Audio and video format conversion
50+
* **[torch](https://pytorch.org/)** - Deep learning framework
51+
* **[FastAPI](https://github.com/fastapi/fastapi)** - High-performance API framework
52+
* **[aiofiles](https://github.com/Tinche/aiofiles)** - Asynchronous file operations
53+
* **[aiosqlite](https://github.com/omnilib/aiosqlite)** - Asynchronous database operations
54+
* **[moviepy](https://github.com/Zulko/moviepy)** - Video editing
55+
* **[pydub](https://github.com/jiaaro/pydub)** - Audio editing
56+
57+
## 💡 Project Structure
58+
59+
```text
5560
./📂 Whisper-Speech-to-Text-API/
56-
├── 📂 app/                       # 主应用目录
57-
  ├── 📂 api/                   # API 路由
58-
  │   ├── 📄 health_check.py     # 健康检查接口
59-
  │   └── 📄 transcribe.py       # 转录功能接口
60-
  ├── 📂 database/               # 数据库模块
61-
  │   ├── 📄 database.py         # 数据库连接与初始化
62-
  │   └── 📄 models.py           # 数据库模型定义
63-
  ├── 📂 models/                 # 数据模型
64-
  │   └── 📄 APIResponseModel.py # API 响应模型
65-
  ├── 📂 services/               # 服务层逻辑
66-
  │   ├── 📄 whisper_service.py # Whisper 模型处理逻辑
67-
  │   └── 📄 whisper_service_instance.py # Whisper 服务单例
68-
  ├── 📂 utils/                 # 实用工具
69-
  │   ├── 📄 file_utils.py       # 文件处理工具
70-
  │   └── 📄 logging_utils.py   # 日志处理工具
71-
  └── 📄 main.py                 # 应用启动入口
72-
├── 📂 config/                     # 配置文件
73-
  └── 📄 settings.py             # 应用设置
74-
├── 📂 scripts/                   # 脚本文件
75-
  ├── 📄 run_server.sh           # 服务器启动脚本
76-
  └── 📄 setup.sh               # 环境初始化脚本
77-
├── 📁 log_files/                 # 📒 默认日志文件夹
78-
├── 📁 temp_files/                 # 📂 默认临时文件夹
79-
├── 📄 requirements.txt           # 依赖库列表
80-
├── 📄 start.py                   # 启动脚本
81-
└── 📄 tasks.db                   # 📊 默认数据库文件
├── 📂 app/                              # Main app directory
│   ├── 📂 api/                          # API routes
│   │   ├── 📄 health_check.py           # Health check endpoint
│   │   └── 📄 transcribe.py             # Transcription endpoint
│   ├── 📂 database/                     # Database module
│   │   ├── 📄 database.py               # Database connection and initialization
│   │   └── 📄 models.py                 # Database models
│   ├── 📂 models/                       # Data models
│   │   └── 📄 APIResponseModel.py       # API response model
│   ├── 📂 services/                     # Service layer logic
│   │   ├── 📄 whisper_service.py        # Whisper model handling logic
│   │   └── 📄 whisper_service_instance.py # Whisper service singleton
│   ├── 📂 utils/                        # Utilities
│   │   ├── 📄 file_utils.py             # File handling utilities
│   │   └── 📄 logging_utils.py          # Logging utilities
│   └── 📄 main.py                       # Application entry point
├── 📂 config/                           # Configuration files
│   └── 📄 settings.py                   # Application settings
├── 📂 scripts/                          # Scripts
│   ├── 📄 run_server.sh                 # Server start script
│   └── 📄 setup.sh                      # Environment setup script
├── 📁 log_files/                        # 📒 Default log folder
├── 📁 temp_files/                       # 📂 Default temp folder
├── 📄 requirements.txt                  # Dependency list
├── 📄 start.py                          # Start script
└── 📄 tasks.db                          # 📊 Default database file
8287
```
8388

84-
## 🛠️ 使用指南
89+
## 🛠️ User Guide
8590

86-
- 切换到项目目录,使用下面的命令启动API服务:
87-
- `python3 start.py`
88-
- 随后你可以访问`http://localhost`来查看接口文档,并且在网页上测试。
91+
* Switch to the project directory, then start the API service with:
92+
* `python3 start.py`
93+
* Then open `http://localhost` in a browser to view the API documentation and test the endpoints interactively.
8994

90-
### API 使用示例
95+
### API Usage Example
9196

92-
- 添加一个识别任务
97+
* Add a transcription task
9398

9499
```bash
95-
curl -X 'POST' \
100+
curl -X 'POST' \
96101
'http://127.0.0.1/transcribe/task/create' \
97102
-H 'accept: application/json' \
98103
-H 'Content-Type: multipart/form-data' \
@@ -112,7 +117,7 @@
112117
-F 'initial_prompt='
113118
```
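
The same request can be built from Python. The sketch below uses only the standard library and constructs the multipart body by hand. The endpoint path and the `initial_prompt` field come from the curl example; the upload field name `file` is an assumption (the diff elides the full parameter list), and `build_create_request` is a hypothetical helper.

```python
import uuid
from urllib import request

BASE_URL = "http://127.0.0.1"


def build_create_request(filename, data, fields, base_url=BASE_URL):
    """Build a multipart/form-data POST for the task-creation endpoint."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields, e.g. {"initial_prompt": ""}.
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    # The media file; the field name "file" is an assumption.
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        + data
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return request.Request(
        f"{base_url}/transcribe/task/create",
        data=b"".join(parts),
        method="POST",
        headers={
            "accept": "application/json",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Example call (placeholder file name):
# with open("sample.mp3", "rb") as f:
#     req = build_create_request("sample.mp3", f.read(), {"initial_prompt": ""})
# with request.urlopen(req) as resp:
#     print(resp.read().decode())
```

In practice a client library such as `requests` or `httpx` handles the multipart encoding for you; the manual version is shown here only to keep the example dependency-free.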
114119

115-
- 响应
120+
- Response
116121

117122
```json
118123
{
@@ -157,15 +162,15 @@
157162
}
158163
```
159164

160-
- 查看任务结果
165+
- View task results
161166

162167
```bash
163168
curl -X 'GET' \
164169
'http://127.0.0.1/transcribe/tasks/result?task_id=1' \
165170
-H 'accept: application/json'
166171
```
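
A Python equivalent of this lookup might look like the following sketch. It uses only the standard library; the path and the `task_id` query parameter are taken from the curl example, while `build_result_url` and `fetch_task_result` are hypothetical helper names. The shape of the returned JSON is not assumed here, so the function simply returns the parsed body.

```python
import json
from urllib import parse, request

BASE_URL = "http://127.0.0.1"


def build_result_url(task_id, base_url=BASE_URL):
    """Return the result URL for a given task ID."""
    query = parse.urlencode({"task_id": task_id})
    return f"{base_url}/transcribe/tasks/result?{query}"


def fetch_task_result(task_id, base_url=BASE_URL, timeout=10):
    """GET the task result and return the decoded JSON body."""
    req = request.Request(
        build_result_url(task_id, base_url),
        headers={"accept": "application/json"},
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Example call (requires the API server to be running):
# print(fetch_task_result(1))
```

Because transcription runs as a background task, a client may need to repeat this call until the task has finished.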
167172

168-
- 响应
173+
- Response
169174

170175
```json
171176
{
@@ -830,12 +835,12 @@ curl -X 'GET' \
830835
}
831836
```
832837

833-
**在请求体中包含音频或视频文件,API 将返回转录的文本结果。**
838+
**Include an audio or video file in the task-creation request; once the task completes, the transcribed text is available from the task result endpoint.**
834839

835-
### 文本分析与扩展功能
840+
### Text Analysis and Extended Functionality
836841

837-
**转录完成的文本可以直接用于进一步处理,如内容摘要、语义分析等,适合二次分析或文本挖掘需求。**
842+
**The transcribed text can be used for further processing, such as content summarization and semantic analysis, suitable for secondary analysis or text mining needs.**
838843

839-
## 贡献指南
844+
## Contribution Guide
840845

841-
**非常欢迎大家提出意见和建议!可以通过 GitHub issue 与我们联系,如果希望贡献代码,请 fork 项目并提交 pull request。我们期待你的加入!💪**
846+
**Feedback and suggestions are highly welcome! Reach out through GitHub issues, and if you’d like to contribute, please fork the project and submit a pull request. We look forward to your participation! 💪**

README.md

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
11
# Whisper-Speech-to-Text-API 🎙️➡️📜
22

3+
[Chinese](README.md) | [English](README-EN.md)
4+
35
欢迎来到 **[Whisper-Speech-to-Text-API](https://github.com/Evil0ctal/Whisper-Speech-to-Text-API)** 项目!本项目为开发者们提供了一个快速、可靠的 API,通过调用 [OpenAI Whisper](https://github.com/openai/whisper) 模型,将多种格式的视频或音频文件高效转换为文本,适合语音识别、字幕生成和文本分析需求。
46

57
## 项目地址 📂
