Refactor BaseDoclingAgent from Pydantic BaseModel to Abstract Base Class (ABC)

### Background

`BaseDoclingAgent` currently inherits from Pydantic's `BaseModel`. This is a design mismatch: Pydantic is intended for data validation and serialization, whereas agent classes are behavioral objects that execute actions.

**Current Implementation:**
```python
# docling_agent/agent/base.py
from pydantic import BaseModel, ConfigDict

class BaseDoclingAgent(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)
    
    agent_type: DoclingAgentType
    backend: BaseBackend
    tools: list
    max_iteration: int = 16
```

**Agent Subclasses:**
All agent classes (DoclingRAGAgent, DoclingEnrichingAgent, DoclingEditingAgent, etc.) inherit from `BaseDoclingAgent` and define Pydantic fields, but then override `__init__` to manually set these fields, defeating Pydantic's purpose.

**Example from DoclingRAGAgent:**
```python
class DoclingRAGAgent(BaseDoclingAgent):
    max_iterations: Annotated[int, Field(description="...")] = 5
    verbose: Annotated[bool, Field(description="...")] = False
    # ... more fields
    
    def __init__(self, *, tools, backend=None, max_iterations=5, verbose=False, ...):
        super().__init__(
            agent_type=DoclingAgentType.DOCLING_DOCUMENT_RAG,
            backend=backend or self.default_backend(),
            tools=tools,
        )
        # Manually setting fields that are already defined as Pydantic fields
        self.max_iterations = max_iterations
        self.verbose = verbose
        # ...
```

### Problem Statement

1. **Semantic Mismatch**: Pydantic is designed for data models (validation, serialization, deserialization), not behavioral objects
2. **Confusing Initialization**: Agents declare Pydantic fields and then assign the same attributes by hand in `__init__`, so it is unclear which mechanism is authoritative
3. **Unnecessary Overhead**: Pydantic's validation and serialization features are never used for agents
4. **Misleading API**: Developers might expect Pydantic features (`model_dump`, `model_validate`) to work meaningfully on agents
5. **Code Duplication**: Every attribute is written twice, once as a field definition and once as a manual assignment in `__init__`

### Motivation

**Agents are behavioral objects, not data models:**
- They execute actions and maintain state
- They have methods that "do things" with natural language instructions
- They don't need serialization, validation, or schema generation
- They are never persisted or transmitted as data

**Pydantic is appropriate for:**
- ✅ Configuration models (`AgentTask`, `BackendConfig`, `ModelConfig`)
- ✅ LLM output models (`SectionSelection`, `AnswerAttempt`, `RAGResult`)
- ✅ Operation validation models (`UpdateContentOperation`, `RewriteContentOperation`)
- ✅ Data persistence models (`DocLibraryEntry`, `DocLibraryIndex`)

**Pydantic is NOT appropriate for:**
- ❌ Agent classes (behavioral objects)

### Proposed Solution

Refactor `BaseDoclingAgent` to use Python's Abstract Base Class (ABC) pattern.
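
A minimal sketch of what the refactored classes could look like, assuming the attributes shown in the current implementation above. `DoclingAgentType` here is a stand-in enum, `backend` is typed as `Any` instead of `BaseBackend`, and the abstract `run` method is hypothetical, since this issue does not specify which methods should be abstract:

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any, Optional


class DoclingAgentType(Enum):
    # Stand-in for the project's real enum; the value is illustrative.
    DOCLING_DOCUMENT_RAG = "docling_document_rag"


class BaseDoclingAgent(ABC):
    """Base class for agents: plain attributes, no Pydantic fields."""

    def __init__(
        self,
        *,
        agent_type: DoclingAgentType,
        backend: Any,  # BaseBackend in the real code
        tools: list,
        max_iteration: int = 16,
    ) -> None:
        self.agent_type = agent_type
        self.backend = backend
        self.tools = tools
        self.max_iteration = max_iteration

    @abstractmethod
    def run(self, task: str) -> str:
        """Hypothetical abstract method; the real interface may differ."""


class DoclingRAGAgent(BaseDoclingAgent):
    def __init__(
        self,
        *,
        tools: list,
        backend: Optional[Any] = None,
        max_iterations: int = 5,
        verbose: bool = False,
    ) -> None:
        super().__init__(
            agent_type=DoclingAgentType.DOCLING_DOCUMENT_RAG,
            backend=backend,  # the real class falls back to a default backend
            tools=tools,
        )
        # Each attribute is now defined in exactly one place.
        self.max_iterations = max_iterations
        self.verbose = verbose

    def run(self, task: str) -> str:
        return f"handled: {task}"
```

Note that `ABC` only blocks direct instantiation when at least one method is marked `@abstractmethod`, so the base class needs at least one abstract method for the change to enforce anything.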

### Benefits

1. **Clearer Design Intent**: ABC clearly signals "interface/behavior" vs "data model"
2. **Simpler Initialization**: Standard Python `__init__`, no Pydantic confusion
3. **Lower Overhead**: Agent construction no longer goes through Pydantic validation
4. **Easier Maintenance**: No mixing of Pydantic fields with manual assignments
5. **More Pythonic**: Standard OOP patterns for behavioral objects
6. **Better Documentation**: Field documentation can use docstrings instead of Pydantic Field
7. **Type Safety**: Type hints are retained for static checkers; only Pydantic's runtime validation goes away

### Required Changes

#### Files to Modify:
1. **`docling_agent/agent/base.py`**
   - Change `BaseDoclingAgent` from `BaseModel` to `ABC`
   - Remove `model_config`
   - Convert fields to `__init__` parameters
   - Keep all methods unchanged

2. **All Agent Classes:**
   - `docling_agent/agent/rag.py` - DoclingRAGAgent
   - `docling_agent/agent/enricher.py` - DoclingEnrichingAgent
   - `docling_agent/agent/editor.py` - DoclingEditingAgent
   - `docling_agent/agent/writer.py` - DoclingWritingAgent
   - `docling_agent/agent/extractor.py` - DoclingExtractingAgent
   - `docling_agent/agent/orchestrator.py` - DoclingOrchestratorAgent
   
   For each:
   - Remove Pydantic field definitions
   - Keep the existing `__init__` methods (they already assign every attribute explicitly)
   - Update `super().__init__()` calls if needed
   - Remove any Pydantic-specific code

3. **Tests:**
   - Update any tests that rely on Pydantic features
   - Verify all agent instantiation still works
   - Check that no code uses `model_dump`, `model_validate`, etc. on agents
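
One possible way to run the last check is a small `ast`-based scan. This is a sketch only: the function name and attribute list are illustrative, and the scan cannot tell whether the receiver is an agent or a legitimate Pydantic model, so every hit still needs manual review:

```python
import ast

# Pydantic v2 model API names worth flagging for review.
PYDANTIC_ATTRS = {
    "model_dump",
    "model_dump_json",
    "model_validate",
    "model_fields",
    "model_json_schema",
}


def find_pydantic_attribute_uses(source: str) -> list[tuple[int, str]]:
    """Return (line number, attribute name) for each flagged attribute access."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and node.attr in PYDANTIC_ATTRS:
            hits.append((node.lineno, node.attr))
    return sorted(hits)
```

Running it over each file in the package and the test suite gives a list of call sites to inspect by hand.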

### Backward Compatibility

**Breaking Changes:**
- Any code that treats agents as Pydantic models will break
- This includes code that calls `model_dump()` or `model_validate()`, or reads `model_fields`, on agent classes or instances

**Mitigation:**
- Search codebase for Pydantic method calls on agents (likely none exist)
- Add deprecation warnings if needed
- Document changes in CHANGELOG
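
If deprecation warnings turn out to be needed, one option is a temporary shim that keeps `model_dump()` callable for a release while warning callers. The mixin below is hypothetical and not part of the existing codebase; `ExampleAgent` stands in for a real agent class:

```python
import warnings
from typing import Any, Dict


class DeprecatedPydanticAPIMixin:
    """Hypothetical transition shim for callers that still use model_dump()."""

    def model_dump(self) -> Dict[str, Any]:
        warnings.warn(
            "Agents are no longer Pydantic models; model_dump() will be removed.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not the shim
        )
        # Shallow copy of instance attributes; no Pydantic serialization is applied.
        return dict(vars(self))


class ExampleAgent(DeprecatedPydanticAPIMixin):
    def __init__(self, *, tools: list, verbose: bool = False) -> None:
        self.tools = tools
        self.verbose = verbose
```

The returned dictionary is only a shallow attribute copy, so it is not a drop-in replacement for Pydantic's output. If the codebase search finds no such calls, the shim can be skipped entirely.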

### Additional Notes

- This refactoring does NOT affect data models (AgentTask, BackendConfig, etc.) which correctly use Pydantic
- LLM output models (SectionSelection, AnswerAttempt, etc.) should remain Pydantic models
- The change is primarily architectural - functionality remains the same

### References

- Python ABC documentation: https://docs.python.org/3/library/abc.html
- Pydantic documentation: https://docs.pydantic.dev/
