Background
Currently, BaseDoclingAgent inherits from Pydantic's BaseModel, which creates a design inconsistency between the intended use of Pydantic (data validation and serialization) and the actual purpose of agent classes (behavioral objects that execute actions).
Current Implementation:
```python
# docling_agent/agent/base.py
from pydantic import BaseModel, ConfigDict

# (project imports for DoclingAgentType and BaseBackend omitted)


class BaseDoclingAgent(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)

    agent_type: DoclingAgentType
    backend: BaseBackend
    tools: list
    max_iteration: int = 16
```
Agent Subclasses:
All agent classes (DoclingRAGAgent, DoclingEnrichingAgent, DoclingEditingAgent, etc.) inherit from BaseDoclingAgent and define Pydantic fields, but then override __init__ to manually set these fields, defeating Pydantic's purpose.
Example from DoclingRAGAgent:
```python
from typing import Annotated

from pydantic import Field


class DoclingRAGAgent(BaseDoclingAgent):
    max_iterations: Annotated[int, Field(description="...")] = 5
    verbose: Annotated[bool, Field(description="...")] = False
    # ... more fields

    def __init__(self, *, tools, backend=None, max_iterations=5, verbose=False, ...):
        super().__init__(
            agent_type=DoclingAgentType.DOCLING_DOCUMENT_RAG,
            backend=backend or self.default_backend(),
            tools=tools,
        )
        # Manually setting fields that are already defined as Pydantic fields
        self.max_iterations = max_iterations
        self.verbose = verbose
        # ...
```
Problem Statement
- Semantic Mismatch: Pydantic is designed for data models (validation, serialization, deserialization), not behavioral objects
- Confusing Initialization: Mixing Pydantic field definitions with manual __init__ assignments
- Unnecessary Overhead: Pydantic validation/serialization features are never used for agents
- Misleading API: Developers might expect Pydantic features (model_dump, model_validate) to work meaningfully
- Code Duplication: Field definitions + manual assignments in __init__
Motivation
Agents are behavioral objects, not data models:
- They execute actions and maintain state
- They have methods that "do things" with natural language instructions
- They don't need serialization, validation, or schema generation
- They are never persisted or transmitted as data
Pydantic is appropriate for:
- ✅ Configuration models (AgentTask, BackendConfig, ModelConfig)
- ✅ LLM output models (SectionSelection, AnswerAttempt, RAGResult)
- ✅ Operation validation models (UpdateContentOperation, RewriteContentOperation)
- ✅ Data persistence models (DocLibraryEntry, DocLibraryIndex)
Pydantic is NOT appropriate for:
- ❌ Agent classes (behavioral objects)
Proposed Solution
Refactor BaseDoclingAgent to use Python's Abstract Base Class (ABC) pattern.
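The following is a minimal sketch of an ABC-based base class. It is not the final implementation. So that the snippet runs on its own, DoclingAgentType and BaseBackend are replaced by tiny hypothetical stand-ins. The abstract run() method and its signature are also illustrative assumptions, not the project's actual interface. Existing methods on the real class would carry over unchanged and are omitted. The former Pydantic fields become keyword-only __init__ parameters, and the class docstring documents them.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any


# Hypothetical stand-ins for the real project types, so the sketch runs on its own.
class DoclingAgentType(Enum):
    DOCLING_DOCUMENT_RAG = "docling_document_rag"


class BaseBackend:
    """Placeholder for the project's backend abstraction."""


class BaseDoclingAgent(ABC):
    """Base class for all Docling agents: behavioral objects, not data models.

    Attributes:
        agent_type: Which kind of agent this is.
        backend: Backend used by the agent.
        tools: Tools the agent may invoke.
        max_iteration: Iteration limit (default 16, as in the current class).
    """

    def __init__(
        self,
        *,
        agent_type: DoclingAgentType,
        backend: BaseBackend,
        tools: list,
        max_iteration: int = 16,
    ) -> None:
        # Plain attribute assignment: no Pydantic validation or field machinery.
        self.agent_type = agent_type
        self.backend = backend
        self.tools = tools
        self.max_iteration = max_iteration

    @abstractmethod
    def run(self, task: str, **kwargs: Any) -> Any:
        """Execute the agent on a natural-language task (hypothetical signature)."""
```

The class inherits from ABC and declares an abstract method, so calling BaseDoclingAgent(...) directly raises TypeError. Only concrete subclasses that implement run() can be instantiated.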
Benefits
- Clearer Design Intent: ABC clearly signals "interface/behavior" vs "data model"
- Simpler Initialization: Standard Python __init__, no Pydantic confusion
- Better Performance: No Pydantic validation overhead
- Easier Maintenance: No mixing of Pydantic fields with manual assignments
- More Pythonic: Standard OOP patterns for behavioral objects
- Better Documentation: Field documentation can use docstrings instead of Pydantic Field
- Type Safety: Type hints are kept for static checkers such as mypy, although without Pydantic they are no longer enforced at runtime
Required Changes
Files to Modify:
- docling_agent/agent/base.py
  - Change BaseDoclingAgent from BaseModel to ABC
  - Remove model_config
  - Convert fields to __init__ parameters
  - Keep all methods unchanged
- All Agent Classes:
  - docling_agent/agent/rag.py - DoclingRAGAgent
  - docling_agent/agent/enricher.py - DoclingEnrichingAgent
  - docling_agent/agent/editor.py - DoclingEditingAgent
  - docling_agent/agent/writer.py - DoclingWritingAgent
  - docling_agent/agent/extractor.py - DoclingExtractingAgent
  - docling_agent/agent/orchestrator.py - DoclingOrchestratorAgent
  For each:
  - Remove Pydantic field definitions
  - Keep __init__ methods (already correct)
  - Update super().__init__() calls if needed
  - Remove any Pydantic-specific code
- Tests:
  - Update any tests that rely on Pydantic features
  - Verify all agent instantiation still works
  - Check that no code uses model_dump, model_validate, etc. on agents
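The sketch below shows how the per-class changes might look when applied to DoclingRAGAgent. It is self-contained, so it repeats a condensed ABC base class and the same hypothetical stand-ins (DoclingAgentType, BaseBackend). The bodies of default_backend() and run() are placeholders, not the real implementations. Each setting is now declared once, as an __init__ parameter. The docstring marks where the existing Field descriptions would move. A pytest-style test at the end covers the "Tests" items.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any, Optional


# Hypothetical stand-ins so the sketch runs on its own; the real types live in the package.
class DoclingAgentType(Enum):
    DOCLING_DOCUMENT_RAG = "docling_document_rag"


class BaseBackend:
    """Placeholder for the project's backend abstraction."""


class BaseDoclingAgent(ABC):
    """Condensed ABC version of the base class."""

    def __init__(self, *, agent_type: DoclingAgentType, backend: BaseBackend,
                 tools: list, max_iteration: int = 16) -> None:
        self.agent_type = agent_type
        self.backend = backend
        self.tools = tools
        self.max_iteration = max_iteration

    @abstractmethod
    def run(self, task: str, **kwargs: Any) -> Any:
        """Execute the agent on a task (hypothetical signature)."""


class DoclingRAGAgent(BaseDoclingAgent):
    """RAG agent over Docling documents.

    Args:
        tools: Tools available to the agent.
        backend: Backend to use; defaults to default_backend() when omitted.
        max_iterations: <existing Field description moves here>
        verbose: <existing Field description moves here>
    """

    def __init__(self, *, tools: list, backend: Optional[BaseBackend] = None,
                 max_iterations: int = 5, verbose: bool = False) -> None:
        super().__init__(
            agent_type=DoclingAgentType.DOCLING_DOCUMENT_RAG,
            backend=backend or self.default_backend(),
            tools=tools,
        )
        # Each setting is assigned once, here; no parallel Pydantic field declarations.
        # (As in the current code, the base keeps max_iteration and this class adds max_iterations.)
        self.max_iterations = max_iterations
        self.verbose = verbose

    @staticmethod
    def default_backend() -> BaseBackend:
        # Hypothetical placeholder for the project's real default backend.
        return BaseBackend()

    def run(self, task: str, **kwargs: Any) -> Any:
        # Placeholder: the real RAG logic is unaffected by the refactor and omitted here.
        raise NotImplementedError


def test_rag_agent_is_plain_object() -> None:
    agent = DoclingRAGAgent(tools=[], verbose=True)
    assert agent.agent_type is DoclingAgentType.DOCLING_DOCUMENT_RAG
    assert agent.max_iterations == 5 and agent.verbose is True
    # The Pydantic-only API is gone, so a stray model_dump() call would fail loudly.
    assert not hasattr(agent, "model_dump")
```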
Backward Compatibility
Breaking Changes:
- Any code that treats agents as Pydantic models will break
- Code using model_dump(), model_validate(), model_fields, etc. on agents
Mitigation:
- Search codebase for Pydantic method calls on agents (likely none exist)
- Add deprecation warnings if needed
- Document changes in CHANGELOG
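A small standard-library script can partly automate the first mitigation step. This is a rough sketch. It uses Python's ast module to report every attribute access whose name matches a Pydantic BaseModel attribute. Because of that, it cannot tell agents apart from legitimate data models (AgentTask, RAGResult, etc.), and every hit needs a manual look. The attribute list is illustrative, not exhaustive.

```python
import ast
from pathlib import Path

# Pydantic v2 BaseModel attributes that would stop working on agents after the refactor.
# Illustrative, not exhaustive.
PYDANTIC_ATTRS = {
    "model_dump", "model_dump_json", "model_validate", "model_validate_json",
    "model_fields", "model_copy", "model_json_schema", "model_config",
}


def find_pydantic_usages(source: str, filename: str = "<string>") -> list[tuple[int, str]]:
    """Return sorted (line, attribute) pairs for each access to a name in PYDANTIC_ATTRS."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        # Matches obj.model_dump, Cls.model_validate, etc.; the receiver type is unknown.
        if isinstance(node, ast.Attribute) and node.attr in PYDANTIC_ATTRS:
            hits.append((node.lineno, node.attr))
    return sorted(hits)


def scan_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under root; map file path to its matches, skipping clean files."""
    report: dict[str, list[tuple[int, str]]] = {}
    for path in sorted(Path(root).rglob("*.py")):
        try:
            hits = find_pydantic_usages(path.read_text(encoding="utf-8"), str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be decoded or parsed
        if hits:
            report[str(path)] = hits
    return report
```

For example, scan_tree("docling_agent") returns a mapping from file path to (line, attribute) pairs. Hits whose receiver is an agent instance are the ones to fix before the refactor. Hits on data models can stay.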
Additional Notes
- This refactoring does NOT affect data models (AgentTask, BackendConfig, etc.), which correctly use Pydantic
- LLM output models (SectionSelection, AnswerAttempt, etc.) should remain Pydantic models
- The change is primarily architectural; functionality remains the same