# Whisper-Speech-to-Text-API 🎙️➡️📜

[Chinese](README.md) | [English](README-EN.md)

Welcome to the **[Whisper-Speech-to-Text-API](https://github.com/Evil0ctal/Whisper-Speech-to-Text-API)** project! This project provides developers with a fast and reliable API, enabling efficient transcription of various video and audio file formats into text using the [OpenAI Whisper](https://github.com/openai/whisper) model. It’s ideal for speech recognition, subtitle generation, and text analysis needs.

## Project Link 📂

* **GitHub**: [Whisper-Speech-to-Text-API](https://github.com/Evil0ctal/Whisper-Speech-to-Text-API)

## 🌟 Features

* **High-Performance API**: Built with FastAPI for fully asynchronous operation; background tasks are stored in an SQLite database so they can be tracked and managed.
* **Multi-Format Support**: Supports audio and video files (e.g., MP4), using `ffmpeg` for broad compatibility.
* **CUDA Acceleration**: Offers CUDA-accelerated processing for users with GPUs, significantly speeding up transcription.
* **Model Optimization**: A fine-tuned Whisper model for higher recognition accuracy, supporting multilingual audio transcription. (Coming soon 🔜)
* **Text Analysis**: Enables further processing, such as summarization and content analysis, suitable for extended development needs.
* **Automatic Language Detection**: Whisper automatically detects the spoken language from the first 30 seconds of the media file and sets the target language accordingly (see the sketch below).
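
Language detection comes from the underlying [OpenAI Whisper](https://github.com/openai/whisper) package rather than from this project's own code. As a standalone illustration of the mechanism, here is a minimal sketch; the model size and audio path are placeholders, not values used by this project:

```python
# Minimal sketch of Whisper's built-in language detection; the model size and
# audio path are placeholders, not values used by this project.
import whisper

model = whisper.load_model("base")

# Load the audio and pad/trim it to the 30-second window Whisper analyzes.
audio = whisper.load_audio("example.mp3")
audio = whisper.pad_or_trim(audio)

# Compute the log-Mel spectrogram and pick the most probable language.
mel = whisper.log_mel_spectrogram(audio).to(model.device)
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")
```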

## 🚀 Quick Deployment

1. **Python Environment**: Ensure Python version >= 3.8. This project makes extensive use of the `asyncio` library for asynchronous processing.
2. **Install FFmpeg**: Install FFmpeg with the appropriate command for your system.

```bash
# Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# Arch Linux
sudo pacman -S ffmpeg

# macOS -> Homebrew
brew install ffmpeg

# Windows -> Chocolatey (method 1)
choco install ffmpeg

# Windows -> Scoop (method 2)
scoop install ffmpeg
```

3. **Install CUDA**: To enable GPU acceleration, download and install [CUDA](https://developer.nvidia.com/cuda-12-4-0-download-archive); CPU-only users can skip this step.
4. **Install CUDA-Supported PyTorch**: `python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118` (a quick way to verify the GPU setup is shown after this list)
5. **Install Project Dependencies**: `pip install -r requirements.txt`
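
As an optional sanity check (this snippet is not part of the project's own scripts), you can confirm that the installed PyTorch build actually sees your GPU before starting the API:

```python
# Optional check: confirm the CUDA-enabled PyTorch build detects the GPU.
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available:  {torch.cuda.is_available()}")
if torch.cuda.is_available():
    # Name of the first visible GPU, e.g. "NVIDIA GeForce RTX 4090".
    print(f"GPU device:      {torch.cuda.get_device_name(0)}")
```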

## ⚗️ Technology Stack

* **[Whisper](https://github.com/openai/whisper)** - Speech recognition model
* **[ffmpeg](https://ffmpeg.org/)** - Audio and video format conversion
* **[torch](https://pytorch.org/)** - Deep learning framework
* **[FastAPI](https://github.com/fastapi/fastapi)** - High-performance API framework
* **[aiofiles](https://github.com/Tinche/aiofiles)** - Asynchronous file operations
* **[aiosqlite](https://github.com/omnilib/aiosqlite)** - Asynchronous database operations
* **[moviepy](https://github.com/Zulko/moviepy)** - Video editing
* **[pydub](https://github.com/jiaaro/pydub)** - Audio editing

## 💡 Project Structure

```text
./📂 Whisper-Speech-to-Text-API/
├── 📂 app/                                 # Main app directory
│   ├── 📂 api/                             # API routes
│   │   ├── 📄 health_check.py              # Health check endpoint
│   │   └── 📄 transcribe.py                # Transcription endpoint
│   ├── 📂 database/                        # Database module
│   │   ├── 📄 database.py                  # Database connection and initialization
│   │   └── 📄 models.py                    # Database models
│   ├── 📂 models/                          # Data models
│   │   └── 📄 APIResponseModel.py          # API response model
│   ├── 📂 services/                        # Service layer logic
│   │   ├── 📄 whisper_service.py           # Whisper model handling logic
│   │   └── 📄 whisper_service_instance.py  # Whisper service singleton
│   ├── 📂 utils/                           # Utilities
│   │   ├── 📄 file_utils.py                # File handling utilities
│   │   └── 📄 logging_utils.py             # Logging utilities
│   └── 📄 main.py                          # Application entry point
├── 📂 config/                              # Configuration files
│   └── 📄 settings.py                      # Application settings
├── 📂 scripts/                             # Scripts
│   ├── 📄 run_server.sh                    # Server start script
│   └── 📄 setup.sh                         # Environment setup script
├── 📁 log_files/                           # 📒 Default log folder
├── 📁 temp_files/                          # 📂 Default temp folder
├── 📄 requirements.txt                     # Dependency list
├── 📄 start.py                             # Start script
└── 📄 tasks.db                             # 📊 Default database file
```

## 🛠️ User Guide

* Switch to the project directory and start the API service: `python3 start.py`
* Then open `http://localhost` to view the API documentation and try the endpoints directly in your browser.

### API Usage Example

* Add a transcription task

```bash
curl -X 'POST' \
  'http://127.0.0.1/transcribe/task/create' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \

  -F 'initial_prompt='
```

* Response

```json
{

}
```

* View task results

```bash
curl -X 'GET' \
  'http://127.0.0.1/transcribe/tasks/result?task_id=1' \
  -H 'accept: application/json'
```

* Response

```json
{

}
```

**Include an audio or video file in the request body, and the API will return the transcribed text result.**
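
For programmatic access, here is a minimal Python sketch using the `requests` library. The endpoints and `task_id=1` match the curl examples above, but the multipart field name (`file`), the example file path, and the response layout are assumptions, since the full request body is not reproduced here; check the interactive API documentation for the exact parameters.

```python
# Minimal client sketch; `requests` is an external dependency, and the form
# field name "file", the example path, and the response layout are assumptions.
import requests

BASE_URL = "http://127.0.0.1"

# 1. Create a transcription task by uploading a media file.
with open("example.mp4", "rb") as media:
    created = requests.post(
        f"{BASE_URL}/transcribe/task/create",
        headers={"accept": "application/json"},
        files={"file": media},        # assumed form field name
        data={"initial_prompt": ""},
    )
created.raise_for_status()
print(created.json())                 # task info, including the new task's ID

# 2. Fetch the task result once the background task has finished.
result = requests.get(
    f"{BASE_URL}/transcribe/tasks/result",
    headers={"accept": "application/json"},
    params={"task_id": 1},            # use the ID returned when creating the task
)
print(result.json())
```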

### Text Analysis and Extended Functionality

**The transcribed text can be used directly for further processing, such as content summarization and semantic analysis, making it suitable for secondary analysis or text mining needs.**

## Contribution Guide

**Feedback and suggestions are highly welcome! Reach out through GitHub issues, and if you'd like to contribute code, please fork the project and submit a pull request. We look forward to your participation! 💪**