IOC Threat Intelligence Aggregator

License: MIT | Python: 3.10+ | Status: Production Ready

Automated IOC threat intelligence aggregator that collects, normalizes, deduplicates, and exports indicators of compromise from 29 open-source threat feeds into Sumo Logic SIEM-compatible STIX 2.1 CSV format for enterprise threat detection.

Aggregates from: AlienVault OTX, PhishTank, ThreatFox, Feodo, Spamhaus, URLhaus, MalwareBazaar, Pulsedive, Hybrid Analysis, CAPE Sandbox, OpenPhish, FireHOL, Botvrij, ThreatView, C2IntelFeeds, C2Tracker, DataPlane, AbuseSSL, IPsum, CINScore, PhishingArmy, VXVault, URLAbuse, TweetFeed, VirusShare, MISP CERT-FR, Blocklist.de, EmergingThreats, and more.


📊 SOC Context

A threat intelligence analyst runs this aggregator weekly to pull fresh IOCs from 29 feeds and upload them to Sumo Logic, so L1 analysts get automatic alerts when those indicators appear in live logs. This closes the gap between threat feed discovery and SIEM detection and speeds up response to known malicious infrastructure.


🎯 Key Features

| Feature | Details |
| --- | --- |
| Parallel Fetching | 29 threat intelligence sources with 10 concurrent threads for speed |
| Confidence Scoring | Per-source trust scores (range 65–95) based on feed reputation |
| IOC Deduplication | Automatic deduplication keeping highest-confidence threat type |
| 5-Week Rolling Window | Master CSV maintains rolling history; auto-archives oldest week |
| Timestamp Refresh | Re-seen IOCs get validity extended so they don't expire in Sumo Logic |
| STIX 2.1 Export | Sumo Logic-compatible 10-column CSV format, no header |
| Smart Splitting | Split output files at 9,999 rows per file for SIEM compatibility |
| Auto-Archiving | Week 6+ IOCs move to permanent Archive_IOC.csv |
| Cross-Platform | Runs on Windows (Task Scheduler) and Linux (Cron) |
| Multi-IOC Type | IP addresses, domains, URLs, MD5/SHA-1/SHA-256 hashes |

🗺️ MITRE ATT&CK Coverage

This aggregator enables detection across multiple attack phases:

| Threat Category | Sources | MITRE ID | Tactic | Use Case |
| --- | --- | --- | --- | --- |
| C2 Infrastructure | Feodo, C2IntelFeeds, C2Tracker | T1071 | Command & Control | Block outbound comms to known C2 servers |
| Phishing URLs | PhishTank, OpenPhish, PhishingArmy | T1566.002 | Initial Access | Alert on phishing landing page visits |
| Malware Hashes | MalwareBazaar, Hybrid Analysis, CAPE | T1204.002 | Execution | Detect malware execution by file hash |
| Botnet IPs | Spamhaus, FireHOL, DataPlane | T1583.005 | Resource Development | Block traffic from botnet source IPs |
| Malware Domains | URLhaus, Botvrij, ThreatView | T1566.002 | Initial Access | Block DNS requests to malware domains |

📦 Threat Intelligence Sources

| # | Source | IOC Types | Threat Category | Confidence | API Key? |
| --- | --- | --- | --- | --- | --- |
| 1 | Feodo | IP | C2 | 95 | No |
| 2 | PhishTank | URL / Domain | Phishing | 90 | No |
| 3 | C2IntelFeeds | IP | C2 | 90 | No |
| 4 | OpenPhish | URL / Domain | Phishing | 88 | No |
| 5 | MalwareBazaar | SHA-256 Hash | Malware | 88 | No |
| 6 | ThreatFox | IP / Domain / URL / Hash | Malware | 85 | No |
| 7 | URLhaus | URL / Domain | Malware | 85 | No |
| 8 | AbuseSSL | IP / Hash | Malware | 83 | No |
| 9 | MISP CERT-FR | Hash | Malware | 83 | No |
| 10 | Spamhaus | IP | Botnet | 80 | No |
| 11 | OTX (AlienVault) | IP / Domain / URL / Hash | Malware | 70 | Yes |
| 12 | Pulsedive | IP / Domain / URL | Malware | 65 | Yes |
| 13 | Hybrid Analysis | Hash | Malware | 82 | Yes |
| 14 | CAPE Sandbox | Hash | Malware | 80 | Yes |
| 15 | Botvrij | IP / Domain / URL / Hash | Malware | 70 | No |

+ 14 more sources including FireHOL, Blocklist.de, C2Tracker, DataPlane, ThreatView (5 feeds), Bazaar, IPsum, CINScore, PhishingArmy, VXVault, URLAbuse, TweetFeed, VirusShare, Botvrij Hashes
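
The confidence column can drive deduplication when the same indicator arrives from several feeds. The sketch below is one plausible way to keep the highest-confidence record per indicator. The scores are copied from the table, but `SOURCE_CONFIDENCE`, `deduplicate`, and the tuple layout are hypothetical names chosen for this example, not the script's actual internals.

```python
# Scores copied from the source table; the dict and function names are hypothetical.
SOURCE_CONFIDENCE = {
    "Feodo": 95, "PhishTank": 90, "ThreatFox": 85,
    "URLhaus": 85, "OTX": 70, "Pulsedive": 65,
}


def deduplicate(records):
    """Keep one record per indicator, preferring the highest-confidence source.

    `records` is an iterable of (indicator, source, threat_type) tuples.
    Ties keep the first record seen; unknown sources score 0.
    """
    best = {}
    for indicator, source, threat_type in records:
        confidence = SOURCE_CONFIDENCE.get(source, 0)
        current = best.get(indicator)
        if current is None or confidence > current["confidence"]:
            best[indicator] = {
                "source": source,
                "confidence": confidence,
                "threat_type": threat_type,
            }
    return best


records = [
    ("192.0.2.1", "OTX", "Malware"),
    ("192.0.2.1", "Feodo", "C2"),
    ("evil.com", "PhishTank", "Phishing"),
]
unique = deduplicate(records)  # 192.0.2.1 is kept once, attributed to Feodo
```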


🚀 Quick Start

Prerequisites

  • Python: 3.10 or higher
  • OS: Windows 10+ or Ubuntu 20.04+
  • Internet: Required (fetches from 29 external sources)
  • Disk: ~500 MB for master + archive + weekly CSVs

Installation

# 1. Clone the repository
git clone https://github.com/siva404e/IOC_AUTOMATION.git
cd IOC_AUTOMATION

# 2. Install Python dependencies
pip install -r requirements.txt

# 3. Create config file from template
cp config.ini.example config.ini

# 4. Add your API keys (optional but recommended)
# Edit config.ini and fill in:
#   - otx_api_key (AlienVault OTX)
#   - pulsedive_api_key (Pulsedive)
#   - hybrid_analysis_api_key (Hybrid Analysis)
#   - cape_api_token (CAPE Sandbox)

# 5. Run the aggregator
python final_ioc_weekly_split.py
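
For orientation, the optional keys from step 4 could be read with the standard library's `configparser`, roughly as sketched here. The four key names come from the steps above; the section name `API_KEYS` and the helper `load_api_keys` are assumptions, so check config.ini.example for the real layout.

```python
import configparser

# Key names as listed in the installation steps.
KEY_NAMES = (
    "otx_api_key",
    "pulsedive_api_key",
    "hybrid_analysis_api_key",
    "cape_api_token",
)


def load_api_keys(config_text: str, section: str = "API_KEYS") -> dict[str, str]:
    """Return only the API keys that are present and non-empty.

    The section name is an assumption made for this sketch.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    keys = {}
    for name in KEY_NAMES:
        # fallback="" covers both a missing section and a missing option
        value = parser.get(section, name, fallback="").strip()
        if value:
            keys[name] = value
    return keys


sample = "[API_KEYS]\notx_api_key = abc123\ncape_api_token =\n"
configured = load_api_keys(sample)  # only otx_api_key is set
```

A source whose key is absent from the returned dict would then be skipped, in line with the statement that API keys are optional.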

First Run Output

19:12:40  INFO    Config loaded from : /IOC_Scripts/config.ini
19:12:40  INFO    Output directory   : /home/analyst/IOC_Output
19:12:40  INFO    Starting IOC aggregator — 29 sources configured
19:12:47  INFO    [+] ThreatFox fetched
19:12:50  INFO    [+] PhishTank fetched (56294 rows)
19:12:52  WARNING [-] CAPE skipped — CAPE_API_TOKEN not set
19:13:28  INFO    Total unique IOCs fetched: 438977
19:13:31  INFO    Master updated — 343349 new | 95628 re-seen | 438977 total
19:13:34  INFO    IP        39379 rows  →  4 part file(s)
19:13:34  INFO    Domain   273135 rows  →  28 part file(s)
19:13:34  INFO    URL       20912 rows  →  3 part file(s)
19:13:34  INFO    Hash       9923 rows  →  1 part file(s)

📂 Project Structure

IOC_AUTOMATION/
├── final_ioc_weekly_split.py              ← Main aggregator script (1200+ lines)
├── config.ini                             ← Your API keys (git ignored)
├── config.ini.example                     ← Configuration template
├── requirements.txt                       ← Python dependencies
├── README.md                              ← This file
├── SOP_IOC_ThreatIntelligence_Aggregator.md ← Complete operational guide
├── .gitignore                             ← Prevents API key leaks
└── LICENSE                                ← MIT License

📊 Output Files

The aggregator produces Sumo Logic-compatible STIX 2.1 CSV files:

| File | Format | Updated | Purpose |
| --- | --- | --- | --- |
| Master_IOC.csv | CSV (8 cols) | Every run | Rolling 5-week IOC history with WeekTag |
| Archive_IOC.csv | CSV (8 cols) | Week 6+ | Permanent archive of evicted batches |
| IOC_Weekly_IP_PartN_.csv | CSV (10 cols) | Every run | IPv4 addresses (max 9,999 rows/file) |
| IOC_Weekly_Domain_PartN_.csv | CSV (10 cols) | Every run | Domain names (max 9,999 rows/file) |
| IOC_Weekly_URL_PartN_.csv | CSV (10 cols) | Every run | Malware/phishing URLs (max 9,999 rows/file) |
| IOC_Weekly_Hash_PartN_.csv | CSV (10 cols) | Every run | MD5/SHA-1/SHA-256 hashes (max 9,999 rows/file) |
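
The 9,999-row limit determines how many part files each IOC type produces. The following sketch shows the arithmetic and a simple chunking helper; `split_rows` and `part_count` are illustrative names, not functions taken from the script.

```python
import math

MAX_ROWS_PER_FILE = 9_999  # limit stated in the table above


def split_rows(rows, max_rows=MAX_ROWS_PER_FILE):
    """Split a list of rows into consecutive chunks of at most max_rows."""
    return [rows[i:i + max_rows] for i in range(0, len(rows), max_rows)]


def part_count(total_rows, max_rows=MAX_ROWS_PER_FILE):
    """Number of part files needed for total_rows rows."""
    return math.ceil(total_rows / max_rows)


chunks = split_rows(list(range(20_000)))
sizes = [len(chunk) for chunk in chunks]  # two full parts and a short remainder
```

Applied to the row counts in the First Run Output log, `part_count` gives 4 for 39,379 IPs, 28 for 273,135 domains, 3 for 20,912 URLs, and 1 for 9,923 hashes, matching the part-file counts logged there.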

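CSV Format (Sumo Logic STIX 2.1)

The first line below lists the column order for reference; as noted under Key Features, the exported files carry no header row.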

id,indicator,type,source,validFrom,validUntil,confidence,threatType,actors,killChain
0001,192.0.2.1,ipv4-addr,Feodo,2026-03-18T13:00:00.000Z,2027-03-18T13:00:00.000Z,95,malicious-activity,,command-and-control
0002,evil.com,domain-name,PhishTank,2026-03-18T13:00:00.000Z,2027-03-18T13:00:00.000Z,90,malicious-activity,,delivery
0003,https://malware.xyz/pay.html,url,URLhaus,2026-03-18T13:00:00.000Z,2027-03-18T13:00:00.000Z,85,malicious-activity,,initial-access
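
As a rough illustration of the 10-column layout, the sketch below builds one row and serializes it with the `csv` module. The one-year validity window and the fixed `malicious-activity` threat type are inferred from the sample rows above; `build_row`, `format_ts`, and `rows_to_csv` are hypothetical helpers, not the script's own functions.

```python
import csv
import io
from datetime import datetime, timedelta, timezone


def format_ts(moment: datetime) -> str:
    """Format an aware datetime as UTC with millisecond precision and a Z suffix."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def build_row(row_id, indicator, ioc_type, source, confidence, kill_chain,
              valid_from, validity_days=365):
    """Return the 10 columns in the order shown in the CSV sample.

    validity_days=365 is an assumption inferred from the sample rows.
    """
    valid_until = valid_from + timedelta(days=validity_days)
    return [
        f"{row_id:04d}", indicator, ioc_type, source,
        format_ts(valid_from), format_ts(valid_until),
        str(confidence), "malicious-activity", "", kill_chain,
    ]


def rows_to_csv(rows) -> str:
    """Serialize rows without a header line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


seen = datetime(2026, 3, 18, 13, 0, tzinfo=timezone.utc)
row = build_row(1, "192.0.2.1", "ipv4-addr", "Feodo", 95, "command-and-control", seen)
text = rows_to_csv([row])
```

Using `csv.writer` rather than string joins keeps URLs containing commas or quotes correctly escaped.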

🔄 5-Week Rolling Window Architecture

The master file automatically maintains a rolling five-week window:

| Week | Master Contains | Archive Action |
| --- | --- | --- |
| 1 | W01 | |
| 2 | W01–W02 | |
| 3 | W01–W03 | |
| 4 | W01–W04 | |
| 5 | W01–W05 | |
| 6 | W02–W06 | W01 → Archive |
| 7 | W03–W07 | W02 → Archive |

Each IOC gets a WeekTag (e.g., 2026-W11), so eviction is deterministic: once a sixth week arrives, every row carrying the oldest tag moves to the archive. Re-seen IOCs get a timestamp refresh so they do not expire in Sumo Logic.
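
The eviction rule can be sketched in a few lines. This example assumes the WeekTag is the ISO year and week number, zero-padded so that tags sort chronologically as plain strings; `week_tag` and `roll_window` are hypothetical names, and the real script may compute or store the tag differently.

```python
from datetime import date


def week_tag(day: date) -> str:
    """Build a tag such as 2026-W12 from the ISO calendar (assumed tag format)."""
    iso = day.isocalendar()
    return f"{iso[0]}-W{iso[1]:02d}"


def roll_window(master, window=5):
    """Split master rows into (kept, evicted) by keeping the newest `window` tags.

    Zero-padded tags sort chronologically as strings, so sorted() is enough.
    """
    tags = sorted({row["WeekTag"] for row in master})
    keep = set(tags[-window:])
    kept = [row for row in master if row["WeekTag"] in keep]
    evicted = [row for row in master if row["WeekTag"] not in keep]
    return kept, evicted


# Six weeks of data: the oldest week should be evicted to the archive.
master = [{"indicator": f"192.0.2.{i}", "WeekTag": f"2026-W{i:02d}"} for i in range(1, 7)]
kept, evicted = roll_window(master)
```

With six distinct tags present, `kept` holds W02 through W06 and `evicted` holds W01, which mirrors the week 6 row of the table.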


⏱️ Scheduled Execution

Windows (Task Scheduler)

Program: C:\Python310\python.exe
Arguments: C:\IOC_Scripts\final_ioc_weekly_split.py
Start in: C:\IOC_Scripts
Trigger: Weekly, Monday 07:00

Linux (Cron)

crontab -e
# Add line:
0 7 * * 1 cd /path/to/IOC_Scripts && python3 final_ioc_weekly_split.py >> /path/to/IOC_Scripts/cron.log 2>&1

📤 Uploading to Sumo Logic

  1. Run the aggregator → generates CSV files in IOC_Output/
  2. Log in to Sumo Logic → Security → Threat Intelligence
  3. Click Add Source → Manual Upload → CSV
  4. Upload each IOC_Weekly_<Type>_Part<N>.csv file
  5. Sumo Logic auto-detects indicators and creates detection rules

Verify: Row count in Sumo Logic matches your CSV file row count.
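
To get the local side of that comparison, the rows in each weekly part file can be counted with a short helper such as the one below. The glob pattern follows the `IOC_Weekly_` prefix used above; the function names and the choice to skip blank lines are assumptions for this sketch.

```python
import csv
from pathlib import Path


def count_csv_rows(path: Path) -> int:
    """Count non-empty rows in a headerless CSV file."""
    with path.open(newline="", encoding="utf-8") as handle:
        return sum(1 for row in csv.reader(handle) if row)


def count_output_rows(output_dir: Path, pattern: str = "IOC_Weekly_*.csv") -> dict[str, int]:
    """Map each weekly part file name to its row count."""
    return {path.name: count_csv_rows(path) for path in sorted(output_dir.glob(pattern))}
```

Calling `count_output_rows(Path("IOC_Output"))` returns per-file counts that can be checked against what Sumo Logic reports after each upload.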


📖 Complete Documentation

For detailed setup, troubleshooting, and operational procedures, see:

👉 SOP_IOC_ThreatIntelligence_Aggregator.md

Covers:

  • System overview & architecture
  • Pre-requisites & dependencies
  • Initial setup procedure
  • Running & scheduling (Windows + Linux)
  • Monitoring & log interpretation
  • 29 threat source details
  • Master file rolling window
  • Sumo Logic upload steps
  • Troubleshooting guide

⚠️ Limitations

  • Internet Dependent: Requires connectivity to all 29 source URLs
  • Rate Limits: Free API tiers have rate limits; may skip sources if throttled
  • Batch Only: Weekly batch aggregation, not real-time feed updates
  • Manual Upload: Sumo Logic upload requires manual CSV import (can automate with API)
  • API Keys: OTX, Pulsedive, Hybrid Analysis, CAPE require free registration for full functionality

🔮 Future Improvements

  • GitHub Actions – Scheduled weekly runs with auto-upload
  • AbuseIPDB Integration – Add IP reputation scoring
  • Sumo Logic API – Automated upload via API (no manual CSV import)
  • Slack Alerts – Notify SOC on completion with summary stats
  • Database Backend – PostgreSQL for historical queries
  • Web Dashboard – Real-time feed status & IOC analytics
  • Elasticsearch Export – Alternative to Sumo Logic

🛠️ Troubleshooting

Issue: FileNotFoundError: config.ini not found

Solution: Copy config.ini.example to config.ini and place it in the same directory as the script.

Issue: ModuleNotFoundError: No module named 'requests'

Solution: Run pip install -r requirements.txt

Issue: [-] <Source> failed: Connection error

Solution: Check internet connection. Source failures are non-fatal; other sources continue.
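
The non-fatal behaviour described here can be pictured with a thread pool that records each failure and carries on. This is a minimal sketch using stub fetchers in place of real network calls; `fetch_all` and the stub functions are hypothetical, and only the 10-thread pool size is taken from the feature list.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(fetchers, max_workers=10):
    """Run every fetcher in a thread pool; collect results and failures separately."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): name for name, fn in fetchers.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # one bad source must not stop the others
                failures[name] = str(exc)
    return results, failures


# Stub fetchers standing in for real feed downloads (illustrative only).
def fetch_ok():
    return ["192.0.2.1"]


def fetch_broken():
    raise ConnectionError("Connection error")


results, failures = fetch_all({"Feodo": fetch_ok, "PhishTank": fetch_broken})
```

After the call, `results` contains the Feodo indicators while `failures` records the PhishTank error, so the run can continue with whatever was fetched.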

Issue: No IOCs fetched — aborting

Solution: Check internet connection and verify firewall allows HTTPS outbound.

For more troubleshooting, see SOP_IOC_ThreatIntelligence_Aggregator.md#11-troubleshooting


📊 Typical Weekly Statistics

Total IOCs fetched:     438,977
  - New IOCs:           343,349
  - Re-seen IOCs:        95,628
  - Unique IOCs:        438,977 (after dedup)

Breakdown by type:
  - IP addresses:        39,379 (4 part files)
  - Domains:            273,135 (28 part files)
  - URLs:                20,912 (3 part files)
  - Hashes:               9,923 (1 part file)
  
Confidence distribution:
  - 95 (Critical):       12,450 (Feodo, C2Intel)
  - 85-90 (High):       187,234 (PhishTank, ThreatFox, URLhaus)
  - 70-82 (Medium):     239,293 (OTX, Hybrid Analysis, others)

Processing time: ~2-3 minutes
Archive size: ~2.5 GB (cumulative)

🤝 Contributing

Found an issue? Want to add a source? Open an issue or pull request!

Areas for contribution:

  • Additional threat feeds
  • Performance optimizations
  • Enhanced logging
  • Integration examples (Splunk, ELK, etc.)

📝 License

This project is licensed under the MIT License — see LICENSE for details.


👤 Author

Sivamuthu Selvadurai M


🙏 Acknowledgments

  • Threat Feeds: AlienVault OTX, abuse.ch, MalwareBazaar, Spamhaus, PhishTank, and 23+ open-source feeds
  • Libraries: requests, pandas, beautifulsoup4, python-whois
  • SIEM Integration: Sumo Logic STIX 2.1 format compliance

Last Updated: May 2026
Version: 1.0
Status: ✅ Production Ready
