Commit 9c2b274

Copilot and Mte90 committed
Migrate web UI and API to use projects exclusively, simplify OPTIMIZATION_NOTES
Co-authored-by: Mte90 <[email protected]>
1 parent 495e885 commit 9c2b274

File tree

4 files changed: +882 additions, −622 deletions

OPTIMIZATION_NOTES.md

Lines changed: 23 additions & 86 deletions
```diff
@@ -1,93 +1,30 @@
-# Python Code Review and Optimization Suggestions
+# Python Code Optimization Suggestions
 
-## Files Reviewed
-- main.py
-- db.py
-- projects.py
-- analyzer.py
-- external_api.py
-- config.py
-- logger.py
-- models.py
+## Actionable Optimizations
 
-## Findings and Optimizations
+### 1. **main.py** - Cache Management
+- Replace `_ANALYSES_CACHE` global variable with `functools.lru_cache` decorator
+- This provides automatic cache size limits and thread-safety
 
-### 1. **main.py**
-- **Global Variable Usage**: `_ANALYSES_CACHE` - Consider using a proper caching mechanism
-  - **Optimization**: Use `functools.lru_cache` or a proper cache library like `cachetools`
-- **Database Path Handling**: Currently uses global `DATABASE` variable
-  - **Status**: Acceptable for backward compatibility with web UI
-- **Backward Compatibility**: Both `analysis_id` and `project_id` supported in `/code` endpoint ✅
-  - Web UI uses `analysis_id` with main database
-  - Plugin uses `project_id` with per-project databases
+### 2. **db.py** - Database Performance
+- Add connection pooling for high-load scenarios using SQLite connection pool
+- Implement prepared statements for frequently used queries to reduce parsing overhead
 
```
```diff
-### 2. **db.py**
-- **Connection Management**: Uses context managers properly ✅
-- **WAL Mode**: Enabled for concurrent access ✅
-- **Retry Logic**: Exponential backoff implemented ✅
-- **Optimization Opportunities**:
-  - Connection pooling could be added for high-load scenarios
-  - Consider prepared statements for frequently used queries
+### 3. **analyzer.py** - Batch Processing
+- Improve embedding batch processing by implementing parallel batch requests
+- Add configurable batch size tuning based on API rate limits
 
```
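The parallel-batch idea for embeddings can be sketched with `asyncio.gather` plus a semaphore to bound concurrency. `embed_batch` here is a hypothetical stand-in for the real embedding API call; batch size and concurrency limit are illustrative:

```python
import asyncio

async def embed_batch(batch):
    # Hypothetical stand-in for the real embedding API call.
    await asyncio.sleep(0)  # simulate network I/O
    return [f"vec:{text}" for text in batch]

async def embed_all(texts, batch_size=2, max_concurrency=4):
    # Issue batches concurrently, bounded by a semaphore so the
    # provider's rate limits are respected.
    sem = asyncio.Semaphore(max_concurrency)

    async def run(batch):
        async with sem:
            return await embed_batch(batch)

    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    results = await asyncio.gather(*(run(b) for b in batches))
    return [vec for chunk in results for vec in chunk]  # flatten, order preserved
```

`asyncio.gather` preserves input order, so the flattened result lines up with the input texts.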
```diff
-### 3. **projects.py**
-- **Code Organization**: Successfully refactored to use shared utilities from db.py ✅
-- **Path Validation**: Multiple layers of security checks ✅
-- **Database Isolation**: Each project gets its own database ✅
+### 4. **external_api.py** - API Reliability
+- Add rate limiting to prevent API quota exhaustion (consider using `ratelimit` library)
+- Implement retry logic with exponential backoff for failed API calls
+- Add circuit breaker pattern for cascading failure prevention
 
```
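The retry-with-exponential-backoff recommendation can be sketched as a generic helper (not the project's actual code; in production the `ratelimit` library mentioned above would handle quota limits separately):

```python
import time

def retry_with_backoff(fn, retries=3, base_delay=0.01, exceptions=(Exception,)):
    """Call fn(), retrying on failure with exponentially growing delays."""
    for attempt in range(retries):
        try:
            return fn()
        except exceptions:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, 0.04s, ...
```

A circuit breaker would sit one level above this: after N consecutive failures it stops calling `fn` entirely for a cooldown period instead of retrying.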
```diff
-### 4. **analyzer.py**
-- **Background Processing**: Uses async properly ✅
-- **File Size Limits**: Configurable via MAX_FILE_SIZE ✅
-- **Optimization**: Batch processing for embeddings could be improved
+### 5. **config.py** - Configuration Validation
+- Add Pydantic-based validation for critical config values
+- Implement type checking for environment variables at startup
+- Add sensible defaults for all optional configuration
 
```
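The notes suggest Pydantic; the same startup-time checking can be illustrated with a stdlib-only sketch. The field names below are illustrative, not the project's real config keys (apart from `max_file_size` and `database_path`, which appear in the diff):

```python
import os

def load_config(env=os.environ):
    """Validate and coerce environment variables at startup, with defaults."""
    cfg = {
        "max_file_size": int(env.get("MAX_FILE_SIZE", "200000")),
        "use_rag": env.get("USE_RAG", "true").lower() in ("1", "true", "yes"),
        "database_path": env.get("DATABASE_PATH", "codebase.db"),
    }
    if cfg["max_file_size"] <= 0:
        raise ValueError("MAX_FILE_SIZE must be a positive integer")
    return cfg
```

Failing fast here (at import/startup) is the point: a bad `MAX_FILE_SIZE` raises immediately instead of surfacing mid-request.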
```diff
-### 5. **external_api.py**
-- **API Rate Limiting**: Not implemented
-  - **Recommendation**: Add rate limiting for production use
-- **Error Handling**: Basic error handling present
-  - **Recommendation**: Add retry logic with exponential backoff
-
-### 6. **config.py**
-- **Environment Variables**: Properly loaded ✅
-- **Type Conversion**: Minimal validation
-- **Recommendation**: Add validation for critical config values
-
-### 7. **logger.py**
-- **Centralized Logging**: All modules now use this ✅
-- **Configuration**: Basic setup
-- **Recommendation**: Add log rotation for production
-
-### 8. **models.py**
-- **Pydantic Models**: Clean separation ✅
-- **Validation**: Basic validation present ✅
-
-## Performance Optimizations Summary
-
-### Implemented ✅
-1. Database WAL mode for concurrent access
-2. Retry logic with exponential backoff
-3. Centralized logging
-4. Path validation and security checks
-5. Backward compatibility (analysis_id + project_id)
-6. Per-project database isolation
-
-### Recommended for Future
-1. **Connection Pooling**: For high-load scenarios
-2. **Cache Layer**: Replace global cache with `functools.lru_cache`
-3. **Rate Limiting**: Add to external API calls
-4. **Batch Optimization**: Improve embedding batch processing
-5. **Log Rotation**: Add for production environments
-6. **Config Validation**: Add type checking and validation
-7. **Prepared Statements**: For frequently used queries
-
-## Security Review
-- ✅ Path traversal prevention
-- ✅ Generic error messages (no stack trace exposure)
-- ✅ Input validation
-- ✅ Secure database operations
-
-## Architecture Notes
-- **Web UI**: Uses main `codebase.db` with `analysis_id` parameter
-- **Plugin**: Uses per-project databases with `project_id` parameter
-- **Backward Compatibility**: Both systems work seamlessly via `/code` endpoint
-
-## No Critical Issues Found
-All Python files compile successfully. No FLAGS, TODOs, or FIXMEs in current codebase.
+### 6. **logger.py** - Production Logging
+- Add log rotation using `logging.handlers.RotatingFileHandler`
+- Configure separate log levels for development vs production
+- Add structured logging (JSON format) for better log aggregation
```
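The log-rotation suggestion maps directly onto the stdlib `logging.handlers.RotatingFileHandler`; a minimal setup sketch (the logger name, file path, and size limits are illustrative):

```python
import logging
import logging.handlers
import os
import tempfile

def get_rotating_logger(name, log_path, max_bytes=1_000_000, backups=3):
    """Return a logger whose file rotates past max_bytes, keeping `backups` old files."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.RotatingFileHandler(
        log_path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

For the structured-logging point, the `Formatter` would be swapped for one that emits a JSON object per record.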

main.py

Lines changed: 91 additions & 59 deletions
```diff
@@ -9,11 +9,11 @@
 from typing import Optional
 from datetime import datetime
 
-from db import init_db, list_analyses, delete_analysis
+from db import init_db, list_analyses
 from analyzer import analyze_local_path_background, search_semantic, call_coding_model
 from config import CFG
 from projects import (
-    create_project, get_project, get_project_by_id, list_projects,
+    get_project_by_id, list_projects,
     update_project_status, delete_project, get_or_create_project
 )
 from models import (
```
```diff
@@ -24,16 +24,23 @@
 
 logger = get_logger(__name__)
 
-DATABASE = CFG.get("database_path", "codebase.db")
 MAX_FILE_SIZE = int(CFG.get("max_file_size", 200000))
 
 # Controls how many characters of each snippet and total context we send to coding model
 TOTAL_CONTEXT_LIMIT = 4000
-_ANALYSES_CACHE = []
 
 @asynccontextmanager
 async def lifespan(app: FastAPI):
-    init_db(DATABASE)
+    # Project registry is auto-initialized when needed via create_project
+
+    # Auto-create default project from configured local_path if it exists
+    local_path = CFG.get("local_path")
+    if local_path and os.path.exists(local_path):
+        try:
+            get_or_create_project(local_path, "Default Project")
+        except Exception as e:
+            logger.warning(f"Could not create default project: {e}")
+
     yield
 
 app = FastAPI(lifespan=lifespan)
```
```diff
@@ -184,58 +191,79 @@ def api_health():
 
 @app.get("/", response_class=HTMLResponse)
 def index(request: Request):
-    analyses = list_analyses(DATABASE)
     projects_list = list_projects()
     return templates.TemplateResponse("index.html", {
         "request": request,
-        "analyses": analyses,
         "projects": projects_list,
         "config": CFG
     })
 
 
-@app.get("/analyses/status")
-def analyses_status():
-    global _ANALYSES_CACHE
+@app.get("/projects/status")
+def projects_status():
+    """Get list of all projects."""
     try:
-        analyses = list_analyses(DATABASE)
-        # If the DB returned a non-empty list, update cache and return it.
-        if analyses:
-            _ANALYSES_CACHE = analyses
-            return JSONResponse(analyses)
-        # If DB returned empty but we have a cached non-empty list, return cache
-        if not analyses and _ANALYSES_CACHE:
-            return JSONResponse(_ANALYSES_CACHE)
-        # else return whatever (empty list) — first-run or truly empty
-        return JSONResponse(analyses)
+        projects = list_projects()
+        return JSONResponse(projects)
     except Exception as e:
-        # On DB errors (e.g., locked) return last known cache to avoid empty responses spam.
-        if _ANALYSES_CACHE:
-            return JSONResponse(_ANALYSES_CACHE)
-        return JSONResponse({"error": str(e)}, status_code=500)
+        logger.exception(f"Error getting projects status: {e}")
+        return JSONResponse({"error": "Failed to retrieve projects"}, status_code=500)
 
 
-@app.get("/analyses/{analysis_id}/delete")
-def delete_analysis_endpoint(analysis_id: int):
+@app.delete("/projects/{project_id}")
+def delete_project_endpoint(project_id: str):
+    """Delete a project and its database."""
     try:
-        delete_analysis(DATABASE, analysis_id)
+        delete_project(project_id)
         return JSONResponse({"deleted": True})
+    except ValueError as e:
+        logger.warning(f"Project not found for deletion: {e}")
+        return JSONResponse({"deleted": False, "error": "Project not found"}, status_code=404)
     except Exception as e:
-        return JSONResponse({"deleted": False, "error": str(e)}, status_code=500)
+        logger.exception(f"Error deleting project: {e}")
+        return JSONResponse({"deleted": False, "error": "Failed to delete project"}, status_code=500)
 
 
-@app.post("/analyze")
-def analyze(background_tasks: BackgroundTasks):
-    local_path = CFG.get("local_path")
-    if not local_path or not os.path.exists(local_path):
-        raise HTTPException(status_code=400, detail="Configured LOCAL_PATH does not exist")
-    venv_path = CFG.get("venv_path")
-    background_tasks.add_task(analyze_local_path_background, local_path, DATABASE, venv_path, MAX_FILE_SIZE, CFG)
-    return RedirectResponse(url="/", status_code=303)
+@app.post("/index")
+def index_project(background_tasks: BackgroundTasks, project_path: str = None):
+    """Index/re-index the default project or specified path."""
+    try:
+        # Use configured path or provided path
+        path_to_index = project_path or CFG.get("local_path")
+        if not path_to_index or not os.path.exists(path_to_index):
+            raise HTTPException(status_code=400, detail="Project path does not exist")
+
+        # Get or create project
+        project = get_or_create_project(path_to_index)
+        project_id = project["id"]
+        db_path = project["database_path"]
+
+        # Update status to indexing
+        update_project_status(project_id, "indexing")
+
+        # Start background indexing
+        venv_path = CFG.get("venv_path")
+
+        def index_callback():
+            try:
+                analyze_local_path_background(path_to_index, db_path, venv_path, MAX_FILE_SIZE, CFG)
+                update_project_status(project_id, "ready", datetime.utcnow().isoformat())
+            except Exception as e:
+                logger.exception(f"Indexing failed: {e}")
+                update_project_status(project_id, "error")
+                raise
+
+        background_tasks.add_task(index_callback)
+
+        return RedirectResponse(url="/", status_code=303)
+    except Exception as e:
+        logger.exception(f"Error starting indexing: {e}")
+        raise HTTPException(status_code=500, detail="Failed to start indexing")
 
 
 @app.post("/code")
 def code_endpoint(request: Request):
+    """Code completion endpoint - uses project_id to find the right database."""
     payload = None
     try:
         payload = request.json()
```
```diff
@@ -252,29 +280,33 @@ def code_endpoint(request: Request):
     explicit_context = payload.get("context", "") or ""
     use_rag = bool(payload.get("use_rag", True))
 
-    # Support both analysis_id (old) and project_id (new for plugin)
-    analysis_id = payload.get("analysis_id")
+    # Get project_id - if not provided, use the first available project
     project_id = payload.get("project_id")
 
-    # If project_id is provided, get the database path and find the first analysis
-    database_path = DATABASE  # default to main database
-    if project_id and not analysis_id:
-        try:
-            project = get_project_by_id(project_id)
-            if not project:
-                return JSONResponse({"error": "Project not found"}, status_code=404)
-
-            database_path = project["database_path"]
-
-            # Get the first analysis from this project
-            analyses = list_analyses(database_path)
-            if not analyses:
-                return JSONResponse({"error": "Project not indexed yet"}, status_code=400)
-
-            analysis_id = analyses[0]["id"]
-        except Exception as e:
-            logger.exception(f"Error getting project analysis: {e}")
-            return JSONResponse({"error": "Failed to get project analysis"}, status_code=500)
+    if not project_id:
+        # Try to get default project or first available
+        projects = list_projects()
+        if not projects:
+            return JSONResponse({"error": "No projects available. Please index a project first."}, status_code=400)
+        project_id = projects[0]["id"]
+
+    # Get project and its database
+    try:
+        project = get_project_by_id(project_id)
+        if not project:
+            return JSONResponse({"error": "Project not found"}, status_code=404)
+
+        database_path = project["database_path"]
+
+        # Get the first analysis from this project's database
+        analyses = list_analyses(database_path)
+        if not analyses:
+            return JSONResponse({"error": "Project not indexed yet. Please run indexing first."}, status_code=400)
+
+        analysis_id = analyses[0]["id"]
+    except Exception as e:
+        logger.exception(f"Error getting project: {e}")
+        return JSONResponse({"error": "Failed to retrieve project"}, status_code=500)
 
     try:
         top_k = int(payload.get("top_k", 5))
```
```diff
@@ -284,8 +316,8 @@ def code_endpoint(request: Request):
     used_context = []
     combined_context = explicit_context or ""
 
-    # If RAG requested and an analysis_id provided, perform semantic search and build context
-    if use_rag and analysis_id:
+    # If RAG requested, perform semantic search and build context
+    if use_rag:
         try:
             retrieved = search_semantic(prompt, database_path, analysis_id=int(analysis_id), top_k=top_k)
             # Build context WITHOUT including snippets: only include file references and scores
```
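After this change a plugin client only needs to send `project_id`, or may omit it to fall back to the first available project. A sketch of the payload a client would POST to `/code` — the field names come from the diff above, while the prompt text and project id values are illustrative:

```python
import json

# Payload shape accepted by the new /code endpoint, per the diff above.
payload = {
    "prompt": "implement a binary search",  # illustrative prompt text
    "project_id": "my-project-id",          # optional; first project used if omitted
    "use_rag": True,                        # defaults to True server-side
    "top_k": 5,                             # number of snippets to retrieve
    "context": "",                          # explicit extra context, if any
}
body = json.dumps(payload)
```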
