-
Notifications
You must be signed in to change notification settings - Fork 19
feat(cleanup): Smart Cleanup and Disk Space Optimizer (#125) #298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(cleanup): Smart Cleanup and Disk Space Optimizer (#125) #298
Conversation
- Add CleanupScanner for identifying cleanup opportunities - Add DiskCleaner for executing cleanup operations - Add CleanupManager for quarantine/undo functionality - Add CleanupScheduler for automatic scheduling (systemd/cron) - Add schedule subcommand to cortex cleanup CLI - Add comprehensive unit tests (88% coverage) - Add CLEANUP_GUIDE.md documentation Closes cortexlinux#125
WalkthroughAdds a new disk-cleanup subsystem: scanner (finds reclaimable items), cleaner (performs removals/compression with dry-run), manager (quarantine/restore), scheduler (systemd/cron), CLI integration, documentation, and unit tests. Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant CLI as Cortex CLI
participant Scanner as CleanupScanner
participant Cleaner as DiskCleaner
participant Manager as CleanupManager
participant Scheduler as CleanupScheduler
participant System as OS/Filesystem/Services
Note over User,CLI: Scan
User->>CLI: cortex cleanup scan
CLI->>Scanner: scan_all()
Scanner->>System: Inspect apt cache, /var/log, /tmp, run simulate commands
System-->>Scanner: file lists, simulated outputs
Scanner-->>CLI: ScanResult[]
Note over User,CLI: Run
User->>CLI: cortex cleanup run (--safe/--dry-run)
CLI->>Cleaner: run_cleanup(scan_results, safe)
Cleaner->>System: run apt, gzip, rm (or simulate)
System-->>Cleaner: stdout/exit codes, file ops
Cleaner->>Manager: quarantine_file(...) (when needed)
Manager->>System: move files to ~/.cortex/trash, update metadata
Manager-->>Cleaner: item_id / status
Cleaner-->>CLI: {category: freed_bytes}
Note over User,CLI: Undo
User->>CLI: cortex cleanup undo
CLI->>Manager: list_items()
Manager->>System: read metadata.json, list quarantine files
Manager-->>CLI: QuarantineItem[]
CLI->>Manager: restore_item(item_id)
Manager->>System: move file back to original path
System-->>Manager: success/failure
Manager-->>CLI: restore status
Note over User,CLI: Schedule
User->>CLI: cortex cleanup schedule --interval weekly
CLI->>Scheduler: enable_schedule(WEEKLY, safe_mode)
Scheduler->>System: try systemd timer setup
alt systemd available
System-->>Scheduler: timer enabled
else
Scheduler->>System: install cron entry
System-->>Scheduler: cron added
end
Scheduler->>System: write config file
Scheduler-->>CLI: status
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used📓 Path-based instructions (1)**/*.py📄 CodeRabbit inference engine (AGENTS.md)
Files:
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 15
🧹 Nitpick comments (18)
docs/CLEANUP_GUIDE.md (1)
127-150: Add a short “platform assumptions” note (apt + system directories) to reduce surprise on non-Debian systems.
Right now the guide reads distro-agnostic, but it hardcodes apt paths and/var/logbehaviors.Also applies to: 185-190
cortex/cleanup/scanner.py (2)
28-45: MakeCleanupScannerconfigurable + add missing type hints on public APIs.
This improves testability and portability (non-apt systems, custom temp dirs).-class CleanupScanner: +class CleanupScanner: @@ - def __init__(self): - self.apt_cache_dir = Path("/var/cache/apt/archives") - self.log_dir = Path("/var/log") - self.temp_dirs = [Path("/tmp"), Path.home() / ".cache"] + def __init__( + self, + apt_cache_dir: Path | None = None, + log_dir: Path | None = None, + temp_dirs: list[Path] | None = None, + ) -> None: + self.apt_cache_dir = apt_cache_dir or Path("/var/cache/apt/archives") + self.log_dir = log_dir or Path("/var/log") + self.temp_dirs = temp_dirs or [Path("/tmp"), Path.home() / ".cache"]
161-175: Avoid following symlinks / unbounded traversal inrglobscans.
On real systems this can explode in runtime and unintentionally scan outside intended dirs via symlinks.Also applies to: 200-210
cortex/cleanup/scheduler.py (2)
100-116: Makesave_config()atomic to prevent corrupt JSON on partial writes.
Write to a temp file in the same dir andreplace()it.
284-292: Write unit files with explicit encoding (and consider perms).
At least setencoding="utf-8"; optionallychmod(0o600)since it’s user-controlled executable metadata.cortex/cleanup/__init__.py (1)
11-21: Consider re-exporting scheduler types for a complete cleanup public API.
Ifcortex.cleanup.scheduler.CleanupScheduleris meant to be public, add it (andScheduleInterval/ScheduleConfig) to__all__.tests/test_cleanup_manager.py (3)
57-68: Align metadata round-trip test with the real quarantine schema.
Right now it validates JSON IO, but not that metadata matches whatlist_items()/QuarantineItem(**v)expects.
145-163: Avoidtime.sleep()in tests; patch time for deterministic ordering.
This will reduce flakes under load/CI jitter.
215-216: Drop the__main__pytest runner in test modules.
Repo test execution should be driven bypytest, not ad-hoc module execution.tests/test_cleanup_scanner.py (1)
235-236: Drop the__main__runner in pytest suites (optional).
Not harmful, but it’s uncommon in repo test modules and can confuse execution patterns.cortex/cli.py (2)
625-672: Scheduling enable hardcodessafe_mode=True(no CLI control).
If “unsafe/aggressive scheduled cleanup” is a non-goal, OK; otherwise expose a flag (e.g.,--unsafe) and plumb it through.
676-682: _format_size(): clamp/guard negatives.
If a future computation yields a negative delta, this will print negative sizes; considersize_bytes = max(size_bytes, 0).tests/test_cleanup_scheduler.py (2)
87-96: Avoid bypassing__init__in tests; prefer an injectable config path.
PatchingCleanupScheduler.__init__+ using__new__is brittle (future init logic changes won’t be exercised). Consider updatingCleanupScheduler.__init__(config_dir: Optional[Path]=None)and use that in tests.
174-210: Systemd/cron behavior needs at least one “assert command called with …” test.
Right now it mostly assertsmethodoutput; adding a minimal assertion onsubprocess.runcall args would prevent regressions in the systemd detection / cron install commands.cortex/cleanup/cleaner.py (1)
52-78:remove_orphaned_packages(packages)ignorespackagesinput.
If intentional (apt decides what’s orphaned), drop the parameter to avoid implying the list is honored; otherwise include the packages explicitly (with correct apt semantics).tests/test_cleanup_cleaner.py (2)
129-152: Add a non-dry-run temp-file test for quarantine success + freed bytes.
Onceclean_temp_files()is fixed to count bytes on successful quarantine, add a test that patchesCleanupManager.quarantine_fileto return an id and asserts returned bytes > 0.
190-217: Consider testingrun_cleanup(..., safe=False)once behavior is implemented.
This will prevent thesafeknob from regressing again (currently unused).cortex/cleanup/manager.py (1)
10-27: Consider avoidingidas a field name and tightening the dataclass.
Usingidcan be a little confusing vs Python’s built-inid();item_idis clearer. Also considerslots=True(and possiblyfrozen=True) for a metadata model.-@dataclass +@dataclass(slots=True) class QuarantineItem: @@ - id: str + item_id: str original_path: str quarantine_path: str timestamp: float size_bytes: int
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
cortex/cleanup/__init__.py(1 hunks)cortex/cleanup/cleaner.py(1 hunks)cortex/cleanup/manager.py(1 hunks)cortex/cleanup/scanner.py(1 hunks)cortex/cleanup/scheduler.py(1 hunks)cortex/cli.py(3 hunks)docs/CLEANUP_GUIDE.md(1 hunks)tests/test_cleanup_cleaner.py(1 hunks)tests/test_cleanup_manager.py(1 hunks)tests/test_cleanup_scanner.py(1 hunks)tests/test_cleanup_scheduler.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs
Files:
cortex/cleanup/__init__.pytests/test_cleanup_cleaner.pytests/test_cleanup_scheduler.pycortex/cleanup/cleaner.pycortex/cleanup/manager.pytests/test_cleanup_manager.pytests/test_cleanup_scanner.pycortex/cli.pycortex/cleanup/scanner.pycortex/cleanup/scheduler.py
{setup.py,setup.cfg,pyproject.toml,**/__init__.py}
📄 CodeRabbit inference engine (AGENTS.md)
Use Python 3.10 or higher as the minimum supported version
Files:
cortex/cleanup/__init__.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Maintain >80% test coverage for pull requests
Files:
tests/test_cleanup_cleaner.pytests/test_cleanup_scheduler.pytests/test_cleanup_manager.pytests/test_cleanup_scanner.py
🧬 Code graph analysis (6)
cortex/cleanup/__init__.py (3)
cortex/cleanup/scanner.py (2)
CleanupScanner(24-216)ScanResult(9-22)cortex/cleanup/cleaner.py (1)
DiskCleaner(12-227)cortex/cleanup/manager.py (2)
CleanupManager(28-176)QuarantineItem(11-26)
tests/test_cleanup_cleaner.py (2)
cortex/cleanup/cleaner.py (6)
clean_package_cache(28-50)remove_orphaned_packages(52-77)_parse_freed_space(79-107)clean_temp_files(109-151)compress_logs(153-194)run_cleanup(196-227)cortex/cleanup/scanner.py (1)
ScanResult(9-22)
cortex/cleanup/cleaner.py (2)
cortex/cleanup/scanner.py (2)
ScanResult(9-22)scan_package_cache(47-71)cortex/cleanup/manager.py (1)
quarantine_file(57-101)
tests/test_cleanup_manager.py (1)
cortex/cleanup/manager.py (8)
QuarantineItem(11-26)_ensure_dir(37-40)_load_metadata(42-50)_save_metadata(52-55)quarantine_file(57-101)restore_item(103-135)list_items(137-148)cleanup_old_items(150-176)
cortex/cli.py (4)
cortex/cleanup/scanner.py (2)
CleanupScanner(24-216)scan_all(33-45)cortex/cleanup/cleaner.py (2)
DiskCleaner(12-227)run_cleanup(196-227)cortex/cleanup/manager.py (3)
CleanupManager(28-176)list_items(137-148)restore_item(103-135)cortex/cleanup/scheduler.py (5)
CleanupScheduler(61-448)ScheduleInterval(18-22)get_status(188-204)disable_schedule(166-186)enable_schedule(118-164)
cortex/cleanup/scanner.py (1)
src/sandbox_executor.py (1)
success(52-54)
🪛 markdownlint-cli2 (0.18.1)
docs/CLEANUP_GUIDE.md
34-34: Dollar signs used before commands without showing output
(MD014, commands-show-output)
38-38: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
221-221: Bare URL used
(MD034, no-bare-urls)
🔇 Additional comments (1)
cortex/cleanup/scanner.py (1)
80-90: Remove the critical bug assertion; theresult.successcheck is correct as written.The
CommandResultclass definessuccessas aboolattribute (not a method), andrun_command()returnsCommandResultwithsuccess=result.returncode == 0. All existing callers throughout the codebase useif result.success:as a boolean check. The defensivegetattr/callablepattern is unnecessary.However, update the inline comment: it states "We use strict=False" but the code passes
validate=Truetorun_command(). Either correct the comment or clarify the intent.
| def run_cleanup(self, scan_results: List[ScanResult], safe: bool = True) -> Dict[str, int]: | ||
| """ | ||
| Run cleanup based on scan results. | ||
| Args: | ||
| scan_results (List[ScanResult]): Results from scanner. | ||
| safe (bool): If True, perform safe cleanup (default). | ||
| Returns: | ||
| Dict[str, int]: Summary of bytes freed per category. | ||
| """ | ||
| summary = { | ||
| "Package Cache": 0, | ||
| "Orphaned Packages": 0, | ||
| "Temporary Files": 0, | ||
| "Old Logs": 0 | ||
| } | ||
|
|
||
| for result in scan_results: | ||
| if result.category == "Package Cache": | ||
| summary["Package Cache"] = self.clean_package_cache() | ||
|
|
||
| elif result.category == "Orphaned Packages": | ||
| summary["Orphaned Packages"] = self.remove_orphaned_packages(result.items) | ||
|
|
||
| elif result.category == "Temporary Files": | ||
| summary["Temporary Files"] = self.clean_temp_files(result.items) | ||
|
|
||
| elif result.category == "Old Logs": | ||
| summary["Old Logs"] = self.compress_logs(result.items) | ||
|
|
||
| return summary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
safe parameter is unused in run_cleanup() (API misleading).
Either use it (at least for temp files: quarantine vs delete) or remove it from the public API.
🤖 Prompt for AI Agents
In cortex/cleanup/cleaner.py around lines 196 to 227, the public parameter safe
is declared but unused making the API misleading; update run_cleanup to honor
safe (or remove it). Implement the safe behavior by branching when handling
"Temporary Files": if safe is True call a new or existing
quarantine_temp_files(items) path (move files to quarantine/staging area and
return bytes quarantined) else call clean_temp_files(items) to permanently
delete; propagate the safe flag to any other category that should behave
differently (e.g., use a safe variant for package cache or logs if needed) or
remove the safe parameter from the signature and callers if no safe-mode
behavior is required, and update docstring and unit tests accordingly.
| # --- Cleanup Command --- | ||
| def cleanup(self, args): | ||
| """Handle cleanup commands""" | ||
| from cortex.cleanup.scanner import CleanupScanner | ||
| from cortex.cleanup.cleaner import DiskCleaner | ||
| from cortex.cleanup.manager import CleanupManager | ||
| from rich.table import Table | ||
| from rich.prompt import Confirm | ||
| from datetime import datetime | ||
|
|
||
| if args.cleanup_action == 'scan': | ||
| scanner = CleanupScanner() | ||
| self._print_status("🔍", "Scanning for cleanup opportunities...") | ||
|
|
||
| # Configure scanner based on args if needed (e.g. days_old) | ||
| # For now using defaults | ||
| results = scanner.scan_all() | ||
|
|
||
| console.print("\n[bold cyan]💾 Cleanup Opportunities:[/bold cyan]") | ||
|
|
||
| table = Table(show_header=True, header_style="bold green", box=None) | ||
| table.add_column("Category") | ||
| table.add_column("Items") | ||
| table.add_column("Size") | ||
|
|
||
| total_size = 0 | ||
| for res in results: | ||
| size_str = self._format_size(res.size_bytes) | ||
| table.add_row(res.category, str(res.count), size_str) | ||
| total_size += res.size_bytes | ||
|
|
||
| console.print(table) | ||
| console.print(f"\n[bold]Total reclaimable: {self._format_size(total_size)}[/bold]") | ||
| return 0 | ||
|
|
||
| elif args.cleanup_action == 'run': | ||
| scanner = CleanupScanner() | ||
|
|
||
| # Check for force/safe flags | ||
| is_safe = not args.force | ||
|
|
||
| # Re-scan to get current state | ||
| self._print_status("🔍", "Scanning...") | ||
| results = scanner.scan_all() | ||
|
|
||
| total_reclaimable = sum(r.size_bytes for r in results) | ||
| if total_reclaimable == 0: | ||
| self._print_success("Nothing to clean!") | ||
| return 0 | ||
|
|
||
| console.print(f"Found {self._format_size(total_reclaimable)} to clean.") | ||
|
|
||
| if not args.yes: | ||
| if not Confirm.ask("Are you sure you want to proceed?"): | ||
| return 0 | ||
|
|
||
| self._print_status("🧹", "Cleaning...") | ||
|
|
||
| cleaner = DiskCleaner(dry_run=False) | ||
| summary = cleaner.run_cleanup(results, safe=is_safe) | ||
|
|
||
| console.print("\n[bold green]Cleanup Complete![/bold green]") | ||
| total_freed = 0 | ||
| for category, freed in summary.items(): | ||
| if freed > 0: | ||
| console.print(f"✓ {category}: {self._format_size(freed)}") | ||
| total_freed += freed | ||
|
|
||
| console.print(f"\n[bold]Total freed: {self._format_size(total_freed)}[/bold]") | ||
| return 0 | ||
|
|
||
| elif args.cleanup_action == 'undo': | ||
| manager = CleanupManager() | ||
| if not args.id: | ||
| # List undoable items | ||
| items = manager.list_items() | ||
| if not items: | ||
| console.print("No undoable items found.") | ||
| return 0 | ||
|
|
||
| table = Table(show_header=True, header_style="bold yellow", box=None) | ||
| table.add_column("ID") | ||
| table.add_column("File") | ||
| table.add_column("Size") | ||
| table.add_column("Date") | ||
|
|
||
| for item in items: | ||
| date_str = datetime.fromtimestamp(item.timestamp).strftime('%Y-%m-%d %H:%M') | ||
| table.add_row(item.id, os.path.basename(item.original_path), self._format_size(item.size_bytes), date_str) | ||
|
|
||
| console.print(table) | ||
| console.print("\nRun [bold]cortex cleanup undo <id>[/bold] to restore.") | ||
| return 0 | ||
| else: | ||
| # Restore specific item | ||
| if manager.restore_item(args.id): | ||
| self._print_success(f"Restored item {args.id}") | ||
| return 0 | ||
| else: | ||
| self._print_error(f"Failed to restore item {args.id}") | ||
| return 1 | ||
|
|
||
| elif args.cleanup_action == 'schedule': | ||
| from cortex.cleanup.scheduler import CleanupScheduler, ScheduleInterval | ||
|
|
||
| scheduler = CleanupScheduler() | ||
|
|
||
| if args.show: | ||
| # Show current schedule status | ||
| status = scheduler.get_status() | ||
| console.print("\n[bold cyan]🕐 Cleanup Schedule Status:[/bold cyan]") | ||
| if status['enabled']: | ||
| console.print(f"Status: [green]Enabled[/green]") | ||
| console.print(f"Interval: [yellow]{status['interval']}[/yellow]") | ||
| console.print(f"Safe mode: {'Yes' if status['safe_mode'] else 'No'}") | ||
| if status['systemd_active']: | ||
| console.print("Method: systemd timer") | ||
| elif status['cron_active']: | ||
| console.print("Method: cron") | ||
| else: | ||
| console.print("Status: [dim]Disabled[/dim]") | ||
| return 0 | ||
|
|
||
| if args.disable: | ||
| result = scheduler.disable_schedule() | ||
| if result['success']: | ||
| self._print_success(result['message']) | ||
| else: | ||
| self._print_error(result.get('message', 'Failed to disable schedule')) | ||
| return 0 if result['success'] else 1 | ||
|
|
||
| if args.enable: | ||
| interval = ScheduleInterval(args.interval) if args.interval else ScheduleInterval.WEEKLY | ||
| result = scheduler.enable_schedule(interval=interval, safe_mode=True) | ||
| if result['success']: | ||
| self._print_success(result['message']) | ||
| else: | ||
| self._print_error(result.get('message', 'Failed to enable schedule')) | ||
| return 0 if result['success'] else 1 | ||
|
|
||
| # Default: show status | ||
| status = scheduler.get_status() | ||
| console.print("\n[bold cyan]🕐 Cleanup Schedule Status:[/bold cyan]") | ||
| if status['enabled']: | ||
| console.print(f"Status: [green]Enabled[/green]") | ||
| console.print(f"Interval: [yellow]{status['interval']}[/yellow]") | ||
| else: | ||
| console.print("Status: [dim]Disabled[/dim]") | ||
| console.print("\nUse [bold]cortex cleanup schedule --enable[/bold] to enable.") | ||
| return 0 | ||
|
|
||
| return 0 | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleanup reporting likely undercounts freed space (temp files show 0).
DiskCleaner.clean_temp_files() (non-dry-run) currently returns 0 even when quarantine succeeds, so Total freed is misleading. This conflicts with the PR’s “summary of reclaimed space” objective.
🤖 Prompt for AI Agents
cortex/cli.py lines 523-675: the cleanup "Total freed" output is misleading
because DiskCleaner.clean_temp_files() returns 0 even when quarantine succeeds;
update the cleaner so each cleanup method (especially clean_temp_files) computes
and returns the actual number of bytes freed (or moved to quarantine), modify
DiskCleaner.run_cleanup to collect and sum those returned byte counts per
category and return that summary dict to the CLI, and ensure the CLI uses the
returned summary values (not assumed zeros) when printing per-category and total
freed; also add/adjust tests to assert returned sizes match actual file sizes.
Fix --safe flag behavior (currently ineffective / misleading).
--safe is store_true with default=True (can’t be turned off), and cleanup() ignores it anyway (is_safe = not args.force). This will confuse users and docs.
- run_parser.add_argument('--safe', action='store_true', default=True, help='Safe cleanup (default)')
- run_parser.add_argument('--force', '--all', action='store_true', help='Clean all found items')
+ mx = run_parser.add_mutually_exclusive_group()
+ mx.add_argument('--safe', action='store_true', default=True, help='Safe cleanup (default)')
+ mx.add_argument('--force', '--all', action='store_true', help='Aggressive cleanup (may be destructive)')- # Check for force/safe flags
- is_safe = not args.force
+ is_safe = not args.force # or: is_safe = bool(args.safe) and not args.forceAlso applies to: 780-803
🤖 Prompt for AI Agents
In cortex/cli.py around lines 523-675 (and also update the related handling at
780-803), the --safe flag is defined/used incorrectly (default True and
ignored); change the CLI flag to be a proper store_true with default False (or
use store_false with inverted name) so it can be toggled, and update the cleanup
run logic to use args.safe directly (is_safe = args.safe) instead of deriving
safety from args.force; ensure args.force remains a separate explicit flag (and
adjust any help text/docs), and update the schedule-related code paths at
780-803 to reference the corrected args.safe semantics consistently.
Handle cortex cleanup with no subcommand (currently silent no-op).
Notify does this; cleanup should too, otherwise users get no feedback.
def cleanup(self, args):
"""Handle cleanup commands"""
+ if not args.cleanup_action:
+ self._print_error("Please specify a subcommand (scan/run/undo/schedule)")
+ return 1🤖 Prompt for AI Agents
In cortex/cli.py around lines 523 to 675, the cleanup() handler silently no-ops
when args.cleanup_action is missing; add an early check for a missing/None
cleanup_action and print a concise user-facing message (e.g., console.print("No
cleanup subcommand specified. Run 'cortex cleanup --help' to see available
actions.") or display the cleanup usage/help) then return 0 (or appropriate exit
code). Implement this by inserting the check at the top of the function before
the existing if/elif chain so users receive feedback when running "cortex
cleanup" with no subcommand.
- manager.py: Harden directory permissions, atomic writes, robust metadata - cleaner.py: Add sudo -n flag, fix freed bytes calculation, use safe param - scheduler.py: Pass safe_mode to systemd/cron setup - cli.py: Add error message when no subcommand specified - test_cleanup_scanner.py: Reduce test file size from 150MB to 2MB - test_cleanup_cleaner.py: Fix test for safe mode behavior - CLEANUP_GUIDE.md: Fix markdownlint issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (9)
cortex/cleanup/scheduler.py (1)
118-164: LGTM:enable_schedulenow correctly passessafe_modeto backend methods.The previous review concern about
safe_modebeing ignored has been addressed. Both_setup_systemd_timerand_setup_cronnow receive thesafe_modeparameter.docs/CLEANUP_GUIDE.md (1)
1-50: LGTM: Documentation structure and quick start section are well-organized.The markdown formatting issues from previous reviews have been addressed - bash blocks no longer have
$prompts, and output blocks use thetextlanguage identifier.cortex/cli.py (2)
523-536: LGTM: Missing subcommand handling has been addressed.The early check for
args.cleanup_actionwith a user-friendly error message addresses the previous review concern.
562-596: The--safeflag is still ineffective and misleading.At line 566,
is_safe = not args.forceeffectively ignoresargs.safe. Combined with the CLI definition at line 793 where--safehasaction='store_true', default=True, the--safeflag can never be false and serves no purpose.Consider either removing
--safeentirely (since it's always true unless--forceis used), or use a mutually exclusive group:- run_parser.add_argument('--safe', action='store_true', default=True, help='Safe cleanup (default)') - run_parser.add_argument('--force', '--all', action='store_true', help='Clean all found items') + mode_group = run_parser.add_mutually_exclusive_group() + mode_group.add_argument('--safe', action='store_true', default=True, help='Safe cleanup (default)') + mode_group.add_argument('--force', '--all', action='store_true', help='Clean all found items')cortex/cleanup/cleaner.py (3)
28-50: Previous review concern addressed:sudo -nflag added.The command now uses
sudo -nto prevent hanging on password prompts in non-interactive contexts.
52-78: Previous review concern addressed:sudo -nflag added for autoremove.Consistent with
clean_package_cache, this method now usessudo -nto prevent interactive prompts.
110-144: Previous review concern addressed:clean_temp_filesnow correctly computes freed bytes.The method now stats the file before quarantine and accumulates
freed_byteswhenquarantine_file()succeeds.cortex/cleanup/manager.py (2)
69-113: Critical: Address unresolved safety issues from previous review.Several major issues from the previous review remain unaddressed:
No file type validation (lines 79-81): The method can quarantine directories or symlinks, which may cause issues during restore or cleanup operations.
TOCTOU vulnerability (lines 92-93): The
os.accesscheck creates a time-of-check to time-of-use race condition.No rollback on metadata save failure (lines 105-107): If
_save_metadatafails after the file has been moved (line 95), the file is orphaned in quarantine with no undo record.UUID collision risk (line 83): Using only 8 characters from the UUID significantly increases collision probability compared to the full hex representation.
Apply this diff to address these issues:
def quarantine_file(self, filepath_str: str) -> Optional[str]: """ Move a file to quarantine and return its ID. Args: filepath_str (str): Path to the file to quarantine. Returns: Optional[str]: ID of the quarantined item, or None if failed. """ filepath = Path(filepath_str) if not filepath.exists(): return None + # Only quarantine regular files, not directories or symlinks + if not filepath.is_file(): + return None + if filepath.is_symlink(): + return None - item_id = str(uuid.uuid4())[:8] + item_id = uuid.uuid4().hex # Use full UUID to avoid collisions filename = filepath.name quarantine_path = self.quarantine_dir / f"{item_id}_{filename}" try: # Get file stats before moving size = filepath.stat().st_size - # Check if we have write access to the file - if not os.access(filepath, os.W_OK): - return None - shutil.move(str(filepath), str(quarantine_path)) item = QuarantineItem( id=item_id, original_path=str(filepath), quarantine_path=str(quarantine_path), timestamp=time.time(), size_bytes=size ) metadata = self._load_metadata() metadata[item_id] = asdict(item) - self._save_metadata(metadata) + try: + self._save_metadata(metadata) + except OSError: + # Rollback to avoid orphaned file + try: + shutil.move(str(quarantine_path), str(filepath)) + except Exception: + pass + return None return item_id except Exception: - # Log error? return None
115-147: Critical: Path validation and overwrite behavior remain unaddressed.Two critical issues from the previous review are still present:
No path containment validation (lines 129-131): The method trusts
quarantine_pathfrom metadata without verifying it's actually within the quarantine directory. If metadata is tampered with, this could restore files from arbitrary locations or overwrite arbitrary paths.Undefined overwrite behavior (line 141): If
original_pathalready exists, the behavior is OS-dependent and may silently overwrite existing files, causing data loss.Apply this diff to validate paths and prevent overwrites:
def restore_item(self, item_id: str) -> bool: """ Restore a file from quarantine. Args: item_id (str): ID of the item to restore. Returns: bool: True if restored successfully, False otherwise. """ metadata = self._load_metadata() if item_id not in metadata: return False item_data = metadata[item_id] original_path = Path(item_data['original_path']) quarantine_path = Path(item_data['quarantine_path']) + + # Validate quarantine_path is within quarantine_dir + try: + qdir = self.quarantine_dir.resolve() + qpath = quarantine_path.resolve() + if not qpath.is_relative_to(qdir): + return False + except Exception: + return False + + # Refuse to overwrite existing files + if original_path.exists(): + return False if not quarantine_path.exists(): return False try: # Ensure parent dir exists - if not original_path.parent.exists(): - original_path.parent.mkdir(parents=True) + original_path.parent.mkdir(parents=True, exist_ok=True) shutil.move(str(quarantine_path), str(original_path)) del metadata[item_id] self._save_metadata(metadata) return True except Exception: return False
🧹 Nitpick comments (5)
docs/CLEANUP_GUIDE.md (1)
86-99: Remove shell prompts from the example output block.The example at lines 89 and 96 still contains
$prompts which can trigger MD014 lint warnings. Consider removing them for consistency with the rest of the documentation.**Example:** ```text -$ cortex cleanup undo +cortex cleanup undo ID File Size Date abc123 temp_file.txt 1.2 MB 2024-01-15 10:30 def456 old_log.log 500 KB 2024-01-15 10:30 Run 'cortex cleanup undo <id>' to restore. -$ cortex cleanup undo abc123 +cortex cleanup undo abc123 ✓ Restored item abc123</blockquote></details> <details> <summary>tests/test_cleanup_cleaner.py (1)</summary><blockquote> `190-221`: **Consider adding assertions for all categories in `test_run_cleanup_empty`.** The test only asserts `Package Cache` and `Orphaned Packages`. For completeness, consider asserting all four categories. ```diff def test_run_cleanup_empty(self, cleaner): """Test run_cleanup with empty results.""" summary = cleaner.run_cleanup([]) assert summary["Package Cache"] == 0 assert summary["Orphaned Packages"] == 0 + assert summary["Temporary Files"] == 0 + assert summary["Old Logs"] == 0cortex/cleanup/cleaner.py (1)
189-222:safeparameter now partially used but could be more comprehensive.The
safeparameter now controls whether orphaned packages are removed (good!). However, consider whether other categories should also respect this flag. For instance, in--forcemode, temp files could be permanently deleted rather than quarantined.The current behavior is reasonable - orphaned package removal is the riskiest operation. If the intent is to keep temp files always quarantined for undo capability, this is a valid design choice. Consider adding a comment clarifying the design decision.
cortex/cleanup/manager.py (2)
149-160: Make list_items resilient to malformed metadata entries.The method will fail completely if any metadata entry is missing required keys or has been modified. This was flagged in the previous review but remains unaddressed.
Apply this diff to skip malformed entries gracefully:
def list_items(self) -> List[QuarantineItem]: """ List all items in quarantine. Returns: List[QuarantineItem]: List of quarantined items sorted by date. """ metadata = self._load_metadata() - items = [] + items: List[QuarantineItem] = [] for k, v in metadata.items(): - items.append(QuarantineItem(**v)) + try: + items.append(QuarantineItem(**v)) + except (TypeError, ValueError): + # Skip malformed entries + continue return sorted(items, key=lambda x: x.timestamp, reverse=True)
185-192: Defensively access 'quarantine_path' to avoid KeyError.If a metadata entry is missing the
quarantine_pathkey, line 186 will raise a KeyError and halt the cleanup process partway through.Apply this diff to handle missing keys gracefully:
for item_id in to_remove: - path = Path(metadata[item_id]['quarantine_path']) + qpath = metadata[item_id].get('quarantine_path') + if qpath is None: + del metadata[item_id] + continue + path = Path(qpath) if path.exists(): try: path.unlink() except OSError: pass del metadata[item_id]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
cortex/cleanup/cleaner.py(1 hunks)cortex/cleanup/manager.py(1 hunks)cortex/cleanup/scheduler.py(1 hunks)cortex/cli.py(3 hunks)docs/CLEANUP_GUIDE.md(1 hunks)tests/test_cleanup_cleaner.py(1 hunks)tests/test_cleanup_scanner.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_cleanup_scanner.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs
Files:
tests/test_cleanup_cleaner.pycortex/cleanup/manager.pycortex/cleanup/scheduler.pycortex/cleanup/cleaner.pycortex/cli.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Maintain >80% test coverage for pull requests
Files:
tests/test_cleanup_cleaner.py
🧠 Learnings (1)
📚 Learning: 2025-12-11T12:03:24.071Z
Learnt from: CR
Repo: cortexlinux/cortex PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T12:03:24.071Z
Learning: Applies to **/*install*.py : No silent sudo execution - require explicit user confirmation
Applied to files:
cortex/cleanup/cleaner.py
🧬 Code graph analysis (3)
tests/test_cleanup_cleaner.py (2)
cortex/cleanup/cleaner.py (7)
DiskCleaner(12-222)clean_package_cache(28-50)remove_orphaned_packages(52-78)_parse_freed_space(80-108)clean_temp_files(110-144)compress_logs(146-187)run_cleanup(189-222)cortex/cleanup/scanner.py (1)
ScanResult(9-22)
cortex/cleanup/cleaner.py (2)
cortex/cleanup/scanner.py (2)
ScanResult(9-22)scan_package_cache(47-71)cortex/cleanup/manager.py (2)
CleanupManager(28-195)quarantine_file(69-113)
cortex/cli.py (4)
cortex/cleanup/scanner.py (2)
CleanupScanner(24-216)scan_all(33-45)cortex/cleanup/cleaner.py (2)
DiskCleaner(12-222)run_cleanup(189-222)cortex/cleanup/manager.py (3)
CleanupManager(28-195)list_items(149-160)restore_item(115-147)cortex/cleanup/scheduler.py (5)
CleanupScheduler(61-454)ScheduleInterval(18-22)get_status(188-204)disable_schedule(166-186)enable_schedule(118-164)
🔇 Additional comments (19)
cortex/cleanup/scheduler.py (5)
1-16: LGTM: Module setup and imports are well-organized.Type hints, logging setup, and imports follow best practices.
18-59: LGTM: ScheduleInterval enum and ScheduleConfig dataclass are well-designed.The serialization methods (
to_dict/from_dict) handle the enum conversion properly, and defaults are sensible.
61-116: LGTM: CleanupScheduler initialization and config persistence.Good defensive error handling in
load_configwith fallback to defaults on parse errors. The config directory creation withmode=0o700ensures privacy.
240-316: LGTM: systemd timer setup correctly usessafe_mode.The implementation properly generates the mode flag based on
safe_modeand includes good error handling with timeouts.
357-410: LGTM: cron setup correctly usessafe_mode.The cron entry management properly handles existing entries and generates the correct command with the mode flag.
docs/CLEANUP_GUIDE.md (2)
129-181: LGTM: Cleanup categories and safety features are well-documented.Clear explanations of what each category targets, the quarantine system, and safety modes.
183-223: LGTM: Troubleshooting and configuration sections are helpful.Good coverage of common issues and their solutions. The final link is properly formatted as a markdown link.
cortex/cli.py (2)
680-685: LGTM:_format_sizehelper is correctly implemented.The size formatting logic handles the full range of units appropriately.
784-807: LGTM: Cleanup CLI argument parsing is well-structured.The subcommand structure follows the existing pattern established by the notify command.
tests/test_cleanup_cleaner.py (5)
1-16: LGTM: Test file setup and imports are correct.Proper imports for pytest, mocking utilities, and the modules under test.
17-38: LGTM: Test class and fixtures are well-defined.The fixtures provide clean instances for both normal and dry-run modes.
40-78: LGTM: Package cache tests cover success, failure, and dry-run scenarios.Good coverage of the three main code paths.
79-128: LGTM: Orphaned packages and parsing tests are thorough.Tests cover empty list, success path, dry-run mode, and parsing with different units (MB, KB, no match).
129-189: LGTM: Temp file and log compression tests are comprehensive.Good use of
tmp_pathfixture for creating real temporary files. Tests verify both the returned values and filesystem state.cortex/cleanup/cleaner.py (3)
1-11: LGTM: Imports and module setup are correct.Type hints are present and logging is properly configured.
12-27: LGTM: DiskCleaner initialization with proper type hints and docstring.The class docstring and
__init__method follow PEP 8 guidelines.
146-187: LGTM: Log compression with proper error handling.The implementation correctly calculates freed space as the difference between original and compressed sizes, with appropriate dry-run estimation.
cortex/cleanup/manager.py (2)
32-46: LGTM! Permission hardening properly implemented.The initialization and directory setup correctly address the previous critical issue by enforcing secure permissions even when the quarantine directory pre-exists.
48-68: LGTM! Metadata operations are robust and atomic.The implementation properly addresses the previous major issue with atomic writes, comprehensive exception handling, and restricted file permissions.
| def _parse_freed_space(self, stdout: str) -> int: | ||
| """ | ||
| Helper to parse freed space from apt output. | ||
| Args: | ||
| stdout (str): Output from apt command. | ||
| Returns: | ||
| int: Bytes freed. | ||
| """ | ||
| freed_bytes = 0 | ||
| for line in stdout.splitlines(): | ||
| if "disk space will be freed" in line: | ||
| parts = line.split() | ||
| try: | ||
| for i, part in enumerate(parts): | ||
| if part.isdigit() or part.replace('.', '', 1).isdigit(): | ||
| val = float(part) | ||
| unit = parts[i+1] | ||
| if unit.upper().startswith('KB'): | ||
| freed_bytes = int(val * 1024) | ||
| elif unit.upper().startswith('MB'): | ||
| freed_bytes = int(val * 1024 * 1024) | ||
| elif unit.upper().startswith('GB'): | ||
| freed_bytes = int(val * 1024 * 1024 * 1024) | ||
| break | ||
| except Exception: | ||
| pass | ||
| return freed_bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential IndexError when parsing freed space.
At line 98, parts[i+1] could raise an IndexError if the numeric value is the last element in parts. The broad except Exception masks this silently.
try:
for i, part in enumerate(parts):
if part.isdigit() or part.replace('.', '', 1).isdigit():
+ if i + 1 >= len(parts):
+ continue
val = float(part)
unit = parts[i+1]🤖 Prompt for AI Agents
In cortex/cleanup/cleaner.py around lines 80 to 108, the parsing loop may access
parts[i+1] without ensuring it exists which can raise IndexError and the broad
except hides it; update the loop to check that i+1 < len(parts) before reading
the unit, and replace the broad except with a more specific handler (e.g.,
ValueError/IndexError) or remove it so parsing errors surface; additionally, log
or ignore only known parsing failures instead of silently swallowing all
exceptions.
| if args.enable: | ||
| interval = ScheduleInterval(args.interval) if args.interval else ScheduleInterval.WEEKLY | ||
| result = scheduler.enable_schedule(interval=interval, safe_mode=True) | ||
| if result['success']: | ||
| self._print_success(result['message']) | ||
| else: | ||
| self._print_error(result.get('message', 'Failed to enable schedule')) | ||
| return 0 if result['success'] else 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Schedule enable hardcodes safe_mode=True, ignoring user preference.
The CLI has no option to disable safe mode for scheduled cleanup. If this is intentional for safety, consider documenting it. Otherwise, add a --no-safe or --force option to the schedule subcommand.
if args.enable:
interval = ScheduleInterval(args.interval) if args.interval else ScheduleInterval.WEEKLY
- result = scheduler.enable_schedule(interval=interval, safe_mode=True)
+ # Use safe mode by default for scheduled cleanup
+ safe = not getattr(args, 'force', False)
+ result = scheduler.enable_schedule(interval=interval, safe_mode=safe)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if args.enable: | |
| interval = ScheduleInterval(args.interval) if args.interval else ScheduleInterval.WEEKLY | |
| result = scheduler.enable_schedule(interval=interval, safe_mode=True) | |
| if result['success']: | |
| self._print_success(result['message']) | |
| else: | |
| self._print_error(result.get('message', 'Failed to enable schedule')) | |
| return 0 if result['success'] else 1 | |
| if args.enable: | |
| interval = ScheduleInterval(args.interval) if args.interval else ScheduleInterval.WEEKLY | |
| # Use safe mode by default for scheduled cleanup | |
| safe = not getattr(args, 'force', False) | |
| result = scheduler.enable_schedule(interval=interval, safe_mode=safe) | |
| if result['success']: | |
| self._print_success(result['message']) | |
| else: | |
| self._print_error(result.get('message', 'Failed to enable schedule')) | |
| return 0 if result['success'] else 1 |
🤖 Prompt for AI Agents
In cortex/cli.py around lines 658 to 665 the call to scheduler.enable_schedule
currently hardcodes safe_mode=True which ignores user preference; add a CLI flag
(e.g. --no-safe or --force) to the schedule subcommand, parse it into args
(default preserving current safe_mode=True unless user passes the flag), convert
the flag into a boolean safe_mode value and pass that value to
scheduler.enable_schedule, and update the command help text and any docs to
reflect the new option.
Skip sudo options like -n when validating the actual command. This allows sudo -n apt-get clean to be properly validated.
|




Related Issue
Closes #125
Summary
Implements the Smart Cleanup and Disk Space Optimizer feature with automatic scheduling support.
Changes
Features
Example Usage
Summary by CodeRabbit
New Features
Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.