A high-performance, scalable media deduplication server written in C++17 on top of the Poco libraries and SQLite. It identifies and manages duplicate media files (images, videos, audio) using SHA-256 hashing, and tracks file metadata and processing status in a SQLite database.
- Multi-format Support: Handles images (JPG, PNG, GIF, etc.), videos (MP4, AVI, MOV, etc.), and audio files (MP3, WAV, FLAC, etc.)
- Intelligent Deduplication: Uses SHA-256 hashing for accurate duplicate detection
- Database Management: SQLite-based storage with Poco Data for efficient data handling
- Unified Configuration System: Observable YAML-based configuration with live updates
- RESTful HTTP API: Static OpenAPI 3.0 specification with 13 endpoints
- Thread Pool Management: Configurable thread pools with per-type resource allocation
- Scheduler Service: Advanced job scheduling with jitter, backoff, and drift management
- File Management: Automated file scanning and media location monitoring
- Instance Management: Prevents multiple server instances from running simultaneously
- Console Interface: Interactive command-line interface with graceful shutdown
- Performance Optimized: Asynchronous processing, caching, and connection pooling
- Extensible Architecture: Modular design for easy feature additions
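To make the SHA-256 deduplication feature above concrete, here is a minimal Python sketch of hash-based duplicate grouping. It is an illustration only, not the server's C++ implementation; the function names `sha256_of_file` and `find_duplicates` are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat for large video files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under `root` by digest and keep groups with 2+ files."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[sha256_of_file(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Files with identical bytes always share a digest, so this only finds exact duplicates. Re-encoded or resized copies need a content-based comparison.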
The server is built with a modular, layered architecture:
┌────────────────────────────┐
│       HTTP API Layer       │
├────────────────────────────┤
│    Business Logic Layer    │
├────────────────────────────┤
│   Media Processing Layer   │
├────────────────────────────┤
│       Database Layer       │
├────────────────────────────┤
│    Configuration Layer     │
└────────────────────────────┘
- C++17 compatible compiler (GCC 7+, Clang 5+, MSVC 2017+)
- CMake 3.16+
- Poco Libraries (Foundation, Data, SQLite, Util, Net)
- SQLite3 development libraries
- yaml-cpp (YAML configuration parsing)
- ImageMagick++ (Image processing and RAW file support)
- LibRaw (RAW image file validation)
- libtiff (TIFF file validation)
- pkg-config (for dependency detection)
- Python 3 (for ONNX model downloading)
- huggingface_hub Python package (for automatic model downloads)
- curl or wget (for model downloads)
Prerequisites:
- Homebrew package manager
- Xcode Command Line Tools:
xcode-select --install
Install all dependencies:
# Install Homebrew if not already installed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install all required dependencies
brew install cmake poco sqlite3 pkg-config imagemagick libraw libtiff yaml-cpp
# Install Python dependencies for ONNX models
pip3 install huggingface_hub
Note for Apple Silicon (M1/M2/M3):
- Homebrew installs to /opt/homebrew by default
- The build system will automatically detect dependencies in /opt/homebrew
- If you have an Intel Mac, dependencies will be in /usr/local (also auto-detected)
# Debian/Ubuntu (apt)
sudo apt update
sudo apt install build-essential cmake libpoco-dev libsqlite3-dev \
pkg-config libyaml-cpp-dev libmagick++-dev libraw-dev libtiff-dev \
python3-pip
pip3 install huggingface_hub
# RHEL/CentOS (yum)
sudo yum install gcc-c++ cmake3 poco-devel sqlite-devel pkgconfig \
yaml-cpp-devel ImageMagick-c++-devel libraw-devel libtiff-devel \
python3-pip
pip3 install huggingface_hub
# Clone the repository
git clone <repository-url>
cd dedup_server
# Make build script executable
chmod +x build.sh
# Build the project (recommended)
./build.sh
# Or build with tests
./build.sh --clean --test
# Or manual CMake build
mkdir build && cd build
cmake ..
make -j$(sysctl -n hw.ncpu) # macOS: use all CPU cores
# Linux: use make -j$(nproc)
Build Options:
- ./build.sh - Standard release build
- ./build.sh --clean - Clean build (removes old artifacts)
- ./build.sh --test - Build and run tests
- ./build.sh --clean --test - Clean build and run all tests
- ./build.sh --debug - Debug build with symbols
- ./build.sh --help - Show all options
Build Output:
- Executable: build/bin/media_dedup_server
- Test executable: build/bin/all_unit_tests
- Configuration: config/config.yaml
💡 For development with IDE: See Build Tasks Documentation for VS Code/Cursor build tasks and keyboard shortcuts (Cmd+Shift+B).
sudo make install
The server includes advanced image processing capabilities using ONNX models for deep learning-based deduplication. These models are automatically downloaded during the build process if ONNX Runtime is available.
When you build the project, CMake will automatically:
- Download CLIP ViT-B/32 model (~350MB) for quality image processing
- Download CLIP RN50 model (~600MB) for alternative quality processing
- Place models in the models/ directory
- Configure the server to use these models
If you need to manage models manually:
# Download all models
make download_models
# Download specific models
make download_clip_vitb32
make download_clip_rn50
# Use custom model URLs
python3 scripts/fetch_clip_from_hub.py
./scripts/fetch_clip_rn50_onnx.sh <MODEL_URL>
- CLIP ViT-B/32: models/clip-image-vitb32.onnx (~350MB)
- CLIP RN50: models/clip-RN50.onnx (~600MB)
The server will work without these models, but the Quality image processing pipeline will be disabled.
See config/CONFIGURATION_REFERENCE.md for the canonical list of configuration keys, defaults, and live effects. A minimal sample is provided in config/config.yaml and includes TPM defaults (tpm.pool.max, tpm.killTimeoutMs) and examples for per-type shares (tpm.types.<name>.share).
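As a rough illustration of how per-type shares might divide a pool capped by tpm.pool.max, the Python sketch below splits threads proportionally and hands leftovers to the largest fractional remainders. This is an assumption about the semantics, not the server's documented algorithm; the function `allocate_threads` and the type names `image`, `video`, and `audio` are hypothetical, so check CONFIGURATION_REFERENCE.md for the real behavior.

```python
def allocate_threads(pool_max, shares):
    """Split pool_max threads across types in proportion to their shares.

    Assumed semantics: shares are relative weights, not absolute counts.
    """
    total = sum(shares.values())
    if pool_max < 0 or total <= 0:
        raise ValueError("pool_max must be >= 0 and shares must sum to > 0")
    exact = {name: pool_max * share / total for name, share in shares.items()}
    # Start from the rounded-down allocation for every type.
    allocation = {name: int(value) for name, value in exact.items()}
    leftover = pool_max - sum(allocation.values())
    # Give the remaining threads to the largest fractional parts first.
    by_fraction = sorted(shares, key=lambda n: exact[n] - allocation[n], reverse=True)
    for name in by_fraction[:leftover]:
        allocation[name] += 1
    return allocation
```

With a pool of 8 and shares of 2, 1, and 1, this yields 4, 2, and 2 threads, and the total never exceeds the pool maximum.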
# Start server with default configuration
./build/bin/media_dedup_server
# Or use the convenience script (builds if needed)
./start
# Build and start in one command
./start --build
The start script provides an easy way to run the server:
# Start with default configuration
./start
# Build first, then start
./start --build
# Start in debug mode
./start --debug
# Start on a specific port
./start --port 9090
# Start on localhost only
./start --host localhost
# Use a custom config file
./start --config /path/to/config.yaml
# Start with default configuration
./build/bin/media_dedup_server
# Start with custom config file
./build/bin/media_dedup_server --config=config/config.yaml
# Start with custom database path
./build/bin/media_dedup_server --database=data/dedup_server.db
# Start on specific host and port
./build/bin/media_dedup_server --host=localhost --port=8080
# Show help
./build/bin/media_dedup_server --help
- --config=<file>: Specify configuration file path (default: config/config.yaml)
- --database=<path>: Specify database file path (default: data/dedup_server.db)
- --host=<address>: Server host address (default: 0.0.0.0)
- --port=<number>: Server port number (default: 8080)
- --help: Show help information
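If you launch the server from a script or test harness, the command-line options above can be assembled programmatically. The Python sketch below builds the argument list in the `--name=value` form shown above; the helper `build_server_command` is hypothetical and not part of the project.

```python
SERVER_BINARY = "./build/bin/media_dedup_server"


def build_server_command(config=None, database=None, host=None, port=None):
    """Return the argv list for the server; omitted options keep server defaults."""
    command = [SERVER_BINARY]
    options = {"config": config, "database": database, "host": host, "port": port}
    for name, value in options.items():
        if value is not None:
            # The server expects the --name=value form for each option.
            command.append(f"--{name}={value}")
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)` once the binary has been built.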
Once the server is running, you can access:
- Web API: http://localhost:8080/api/v1/config
- OpenAPI Spec: http://localhost:8080/api/openapi.json
- Swagger UI: http://localhost:8080/ (if available)
To stop the server:
- Press Ctrl+C in the terminal
- Or type exit in the console interface
- Or send SIGTERM signal: kill <pid>
The web server provides a comprehensive RESTful API with 13 endpoints for configuration management, user settings, media locations, and system monitoring. Full documentation is available in WEB_SERVER_README.md.
- GET /api/v1/config - Get all configuration properties
- GET /api/v1/config/{key} - Get specific configuration property
- PUT /api/v1/config/{key} - Update configuration property
- POST /api/v1/config/reload - Reload configuration from file
- GET /api/v1/config/status - Get system status
- GET /api/v1/user-settings - Get all user settings
- GET /api/v1/user-settings/{key} - Get specific user setting
- PUT /api/v1/user-settings/{key} - Create/update user setting
- DELETE /api/v1/user-settings/{key} - Delete user setting
- POST /api/v1/media-locations/register - Register media location
- POST /api/v1/media-locations/deregister - Deregister media location
- GET /api/v1/tpm/status - Get Thread Pool Manager status
- GET /api/openapi.json - OpenAPI 3.0 specification
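A small Python client for the endpoints above might look like the sketch below, which uses only the standard library. The helpers `build_request` and `call` are hypothetical, and the request and response bodies are assumed to be JSON; consult the OpenAPI specification for the actual schemas.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # default host and port from this README


def build_request(method, path, payload=None):
    """Create a urllib Request for an API path, JSON-encoding any payload."""
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def call(method, path, payload=None, timeout=5.0):
    """Send the request and decode the response body (assumed to be JSON)."""
    request = build_request(method, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `call("GET", "/api/v1/tpm/status")` would fetch the Thread Pool Manager status. A PUT such as `call("PUT", "/api/v1/user-settings/theme", {"value": "dark"})` uses a made-up key and an assumed payload shape.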
The server uses SQLite with the following main tables:
- user_settings: Key-value user settings (key TEXT PRIMARY KEY, value TEXT NOT NULL)
- scanned_files: File metadata and processing status with fields for different processing modes
- Additional tables: Created dynamically as needed for media processing
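To show how the user_settings table behaves as a key-value store, here is a short Python sketch using the standard sqlite3 module. The column definitions come from the list above; the helper functions are hypothetical, and the server itself accesses the database through Poco Data rather than this code.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS user_settings (
    key   TEXT PRIMARY KEY,
    value TEXT NOT NULL
)
"""


def open_settings(path=":memory:"):
    """Open (or create) a database and ensure the user_settings table exists."""
    connection = sqlite3.connect(path)
    connection.execute(SCHEMA)
    return connection


def put_setting(connection, key, value):
    """Insert a setting, replacing the existing row if the key is already present."""
    with connection:  # commits on success, rolls back on error
        connection.execute(
            "INSERT OR REPLACE INTO user_settings (key, value) VALUES (?, ?)",
            (key, value),
        )


def get_setting(connection, key, default=None):
    """Return the stored value for key, or default when the key is absent."""
    row = connection.execute(
        "SELECT value FROM user_settings WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else default
```

Because key is the primary key, writing the same key twice leaves a single row holding the latest value.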
dedup_server/
├── include/                # Header files
│   ├── config/             # Configuration management
│   ├── core/               # Core server components
│   ├── database/           # Database management
│   ├── filesmanager/       # File scanning and management
│   └── orchestration/      # Thread pool and scheduler
├── src/                    # Source files
│   ├── config/             # Configuration implementation
│   ├── core/               # Core server implementation
│   ├── database/           # Database implementation
│   ├── filesmanager/       # File scanning implementation
│   ├── orchestration/      # Thread pool and scheduler implementation
│   └── webserver/          # Web API handlers and static files
│       └── static/         # Static HTML/CSS/JS and OpenAPI spec
├── tests/                  # Test files
├── config/                 # Configuration files
├── scripts/                # Utility scripts
├── CMakeLists.txt          # Build configuration
└── README.md               # This file
# Debug build
mkdir build-debug && cd build-debug
cmake -DCMAKE_BUILD_TYPE=Debug ..
make -j$(sysctl -n hw.ncpu) # macOS: use all CPU cores
# Linux: use make -j$(nproc)
# Or use the build script
./build.sh --debug
# Run all tests (after building)
cd build
./bin/all_unit_tests
# Or use the build script with test flag
./build.sh --test
# Or clean build and test
./build.sh --clean --test
Test Output:
- Test results: build/test_results.xml
- Test executable: build/bin/all_unit_tests
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Issue: "Poco not found" or "yaml-cpp not found"
# Ensure Homebrew is properly installed and in PATH
eval "$(/opt/homebrew/bin/brew shellenv)" # Apple Silicon
# or
eval "$(/usr/local/bin/brew shellenv)" # Intel
# Verify dependencies are installed
brew list | grep -E "(poco|yaml-cpp|imagemagick|libraw|libtiff)"
Issue: "ImageMagick not found"
# Install ImageMagick via Homebrew
brew install imagemagick
# Verify installation
pkg-config --modversion Magick++
Issue: Build fails with "libtiff not found"
# Install libtiff
brew install libtiff
# Verify installation
pkg-config --modversion libtiff-4
Issue: "Permission denied" when running build.sh
# Make script executable
chmod +x build.sh
chmod +x start
Issue: Python dependencies not found
# Use pip3 explicitly
pip3 install --user huggingface_hub
# Or install globally (requires sudo)
sudo pip3 install huggingface_hub
- Issues: Use GitHub Issues
- Documentation:
  - docs/CONFIGURATION_REFERENCE.md - Complete configuration reference
  - docs/WEB_SERVER_README.md - Web API documentation
  - docs/START_SCRIPT_README.md - Start script usage
  - docs/BUILD_TASKS.md - IDE build configuration
- Web-based management interface
- Advanced media analysis (content-based deduplication)
- Cloud storage integration
- Machine learning-based duplicate detection
- Real-time file monitoring
- Multi-node clustering support
- Plugin system for custom processors
- File Processing: Configurable scanning with scheduler-based processing
- Thread Management: Auto-detected or configurable thread pools with per-type resource allocation
- Database Operations: Connection pooling with configurable timeouts and backoff
- Memory Usage: Efficient memory management with configurable limits
- Storage Overhead: Minimal - only metadata and processing status stored
- Configuration: Live updates without server restart for most settings
- Scheduling: Advanced job scheduling with jitter, backoff, and drift management
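For readers unfamiliar with the jitter and backoff terms used above, the Python sketch below shows one common form: a capped exponential delay with a symmetric random spread. The function `backoff_delay` and its default parameters are hypothetical, drift management is not covered, and the scheduler's real policy may differ.

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0, jitter=0.2, rng=random):
    """Return the delay in seconds before retry number `attempt` (0-based).

    The delay doubles per attempt up to `cap`, then is shifted by a random
    amount within +/- `jitter` (a fraction of the delay) so that many jobs
    retrying at once do not all fire at the same instant.
    """
    delay = min(cap, base * (2 ** attempt))
    spread = delay * jitter
    return delay + rng.uniform(-spread, spread)
```

Passing a seeded `random.Random` instance as `rng` makes the delays reproducible in tests.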
Built with ❤️ using C++ and modern libraries