Related Code Files:
- code-intelligence-toolkit/data_flow_tracker.py - Current implementation
- code-intelligence-toolkit/DATA_FLOW_TRACKER_GUIDE.md - User guide
- code-intelligence-toolkit/DATA_FLOW_TRACKER_ADVANCED_EXAMPLES.md - Examples
The Code Intelligence Toolkit delivers two critical capabilities:
- Prevents disasters - Through SafeGIT, Safe File Manager, and reversible operations
- Accelerates understanding - Through data flow tracking, code analysis, and algorithm visualization
The data_flow_tracker.py exemplifies this dual approach: it helps you understand complex code flows (intelligence) while ensuring you can refactor safely by showing exactly what will be affected (safety).
Goal: Show the "blast radius" of any code change by identifying where data ultimately escapes its scope.
Key Question: "If I change this variable, what are all the observable outputs and side effects?"
Exit Point Detection:
```python
class ImpactAnalyzer:
    def identify_exit_points(self, var_name):
        exit_points = {
            'returns': [],         # Functions returning the value
            'side_effects': [],    # Writes to files, network, console
            'state_changes': [],   # Global/class member modifications
            'external_calls': []   # API calls, database writes
        }
        # AST visitor passes populate these lists
        return exit_points
```

Usage:
```bash
# Analyze the full impact of changing a configuration value
./run_any_python_tool.sh data_flow_tracker.py --var db_connection_string --show-impact
```

Output:

```
Impact Analysis for 'db_connection_string':
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ RETURNS (3 functions):
- get_connection() at db.py:45
- initialize_pool() at pool.py:23
- create_backup_connection() at backup.py:67
🌐 EXTERNAL EFFECTS (2):
- Database connection at db.py:89
- Log file write at logger.py:34
📝 STATE CHANGES (1):
- Class member 'self._conn_str' at manager.py:12
```

Benefits:
- Precise refactoring confidence
- Prevent unintended consequences
- Clear visualization of data boundaries
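To make the exit-point idea above concrete, here is a minimal, hedged sketch using Python's `ast` module. It is not the tool's actual implementation: the `SIDE_EFFECT_CALLS` set and the sample code are illustrative assumptions, and only three of the four categories are detected.

```python
import ast

# Assumed side-effect sinks for this sketch only.
SIDE_EFFECT_CALLS = {'print', 'write', 'send'}

def identify_exit_points(source, var_name):
    """Record line numbers where var_name 'escapes': returned,
    passed to a side-effecting call, or stored on an object."""
    exit_points = {'returns': [], 'side_effects': [], 'state_changes': []}
    for node in ast.walk(ast.parse(source)):
        # return <var>  -> the value escapes via a return
        if isinstance(node, ast.Return) and isinstance(node.value, ast.Name):
            if node.value.id == var_name:
                exit_points['returns'].append(node.lineno)
        # print(<var>), f.write(<var>), ... -> observable side effect
        elif isinstance(node, ast.Call):
            func = node.func
            name = func.id if isinstance(func, ast.Name) else getattr(func, 'attr', None)
            if name in SIDE_EFFECT_CALLS and any(
                isinstance(a, ast.Name) and a.id == var_name for a in node.args
            ):
                exit_points['side_effects'].append(node.lineno)
        # obj.attr = <var> -> state change on an object
        elif isinstance(node, ast.Assign) and isinstance(node.value, ast.Name):
            if node.value.id == var_name and any(
                isinstance(t, ast.Attribute) for t in node.targets
            ):
                exit_points['state_changes'].append(node.lineno)
    return exit_points

code = """
def use(conn_str):
    self_obj.conn = conn_str
    print(conn_str)
    return conn_str
"""
print(identify_exit_points(code, 'conn_str'))
```

A real implementation would use scoped `ast.NodeVisitor` passes rather than a flat walk, but the escape categories map one-to-one onto the `exit_points` dict above.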
Goal: Extract the minimal "critical path" showing exactly how a value was calculated, filtering out noise.
Key Question: "What is the exact recipe for this final value, ignoring irrelevant branches?"
Path Pruning Algorithm:
```python
class CalculationPathAnalyzer:
    def build_critical_path(self, target_var):
        # Build the full dependency graph
        full_graph = self.build_dependency_graph(target_var)
        # Prune branches that never feed the target value
        critical_nodes = self.identify_critical_nodes(full_graph)
        # Extract the minimal calculation sequence
        return self.extract_calculation_steps(critical_nodes)
```

Usage:
```bash
# Understand a complex price calculation
./run_any_python_tool.sh data_flow_tracker.py --var final_price --show-calculation-path
```

Output:

```
Calculation Path for 'final_price':
════════════════════════════════════
1. base_price = 100.00 [pricing.py:10]
   ↓
2. tax_rate = get_tax_rate(location) [tax.py:45]
   ↓
3. tax_amount = base_price * tax_rate [pricing.py:25]
   ↓
4. discount = apply_coupon(coupon_code) [discounts.py:89]
   ↓
5. final_price = base_price + tax_amount - discount [pricing.py:30]

Critical Functions: 3
Ignored Branches: 7 (logging, validation, unrelated calculations)
```

Benefits:
- Accelerated debugging
- Clear algorithm understanding
- Focus on what matters
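The pruning step can be sketched in a few lines: given an assignment-level dependency graph, keep only the nodes that transitively feed the target. This is a hedged illustration, not the tool's algorithm; the variable names in `deps` are invented to echo the pricing example above.

```python
def critical_path(deps, target):
    """deps maps each variable to the variables it is computed from.
    Returns the target's transitive dependencies (the 'recipe'),
    discarding every branch that never reaches the target."""
    keep, stack = set(), [target]
    while stack:
        var = stack.pop()
        if var in keep:
            continue                  # already on the critical path
        keep.add(var)
        stack.extend(deps.get(var, []))  # follow inputs upstream
    return keep

deps = {
    'final_price': ['base_price', 'tax_amount', 'discount'],
    'tax_amount':  ['base_price', 'tax_rate'],
    'discount':    ['coupon_code'],
    'log_line':    ['base_price'],    # noise: never feeds final_price
    'audit_hash':  ['log_line'],      # noise
}
print(sorted(critical_path(deps, 'final_price')))
```

Note the asymmetry: `log_line` reads `base_price` but is still pruned, because the walk follows edges backwards from the target, never forwards into consumers.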
Goal: Track how variable types and states evolve through the system.
Key Question: "What type is this variable at each step, and what are its possible values?"
Type Inference Engine:
```python
class TypeStateTracker:
    def track_type_evolution(self, var_name):
        type_history = []
        # Infer types from literals
        # Read explicit type hints
        # Track transformations (assignments, calls)
        # Monitor state changes (mutations)
        return TypeEvolutionReport(type_history)
```

Usage:
```bash
# Track type and state changes
./run_any_python_tool.sh data_flow_tracker.py --var user_data --track-state
```

Output:

```
Type & State Evolution for 'user_data':
═══════════════════════════════════════
1. user_data = {}                 # dict (empty)       [main.py:10]
2. user_data['name'] = input()    # dict (1 key)       [main.py:15]
3. user_data = validate(user_data) # dict|None (⚠️)     [validate.py:30]
4. user_data.update(defaults)     # dict (5 keys)      [main.py:20]
5. return UserModel(**user_data)  # UserModel instance [main.py:25]

⚠️ WARNINGS:
- Possible None at step 3 (validate.py:30)
- Type changes from dict → UserModel at step 5
```

Benefits:
- Catch type mismatches early
- Understand data transformations
- Spot None/null pointer bugs
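As a hedged sketch of just the first inference step listed above (literal-based inference), the fragment below records the inferred type of each assignment to a variable. The sample source is invented; hints, mutations, and cross-file flow are out of scope here.

```python
import ast

def track_type_evolution(source, var_name):
    """Return (lineno, inferred_type) for each assignment to var_name,
    inferring from literals only; call results are left 'unknown'."""
    history = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
        if var_name not in targets:
            continue
        value = node.value
        if isinstance(value, ast.Constant):
            inferred = type(value.value).__name__   # e.g. 'str', 'int'
        elif isinstance(value, ast.Dict):
            inferred = 'dict'
        elif isinstance(value, ast.List):
            inferred = 'list'
        elif isinstance(value, ast.Call):
            inferred = 'unknown (call result)'      # needs return-type info
        else:
            inferred = 'unknown'
        history.append((node.lineno, inferred))
    return sorted(history)

code = """
user_data = {}
user_data = validate(user_data)
user_data = "serialized"
"""
print(track_type_evolution(code, 'user_data'))
# -> [(2, 'dict'), (3, 'unknown (call result)'), (4, 'str')]
```

Even this shallow pass surfaces the interesting events: every `unknown (call result)` entry is a point where real inference (or a `dict|None` warning, as in step 3 above) would be needed.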
- Extend existing AST visitors
- Add impact point detection
- Implement basic type inference
- Calculation path pruning
- State transition tracking
- Cross-file type propagation
- Interactive HTML reports
- GraphViz enhancements
- VS Code extension integration
- Debugging Speed: 10x faster root cause analysis
- Algorithm Understanding: Clear visualization of complex logic
- Code Comprehension: See data flow at a glance
- Learning Acceleration: Understand unfamiliar codebases quickly
- Refactoring Confidence: See exact impact before changing code
- Bug Prevention: Catch type/state issues before runtime
- Change Validation: Verify nothing unexpected is affected
- Risk Assessment: Know the "blast radius" of any modification
Unlike IDE-based tools that struggle with large codebases, our approach:
- Handles 10k+ line files without breaking
- Works from command line (CI/CD friendly)
- Language agnostic architecture
- No indexing or language server overhead
- Validate concept with user feedback
- Create proof-of-concept for impact analysis
- Design type inference system
- Build minimal viable product
- Iterate based on real-world usage
These enhancements strengthen both pillars of our toolkit:
For Safety: Impact analysis shows exactly what could break. Calculation paths reveal hidden dependencies. Type tracking prevents runtime errors.
For Intelligence: Developers gain deep insights into code behavior. Complex algorithms become transparent. Debugging time drops dramatically.
This is why the Code Intelligence Toolkit is unique - it's not just about preventing mistakes OR understanding code. It's about having the confidence to work quickly because you have both deep understanding AND safety nets.
This roadmap evolves data_flow_tracker.py into a comprehensive code intelligence tool that makes developers both faster AND safer.