Architecture Refactoring Plan Documentation #535
Conversation
Document six priority fixes to address architectural complexity:

1. Unify HTTP handling - Replace dual HTTP wrappers with a single HttpSession
2. Simplify variable system - Structured context instead of mixed dicts
3. Refactor Scraper - Split into Parser/Extractor/Renderer components
4. Fix notification flow - Event-based or session validation instead of backwards calls
5. Replace index-based discovery - Type-safe registry with unique IDs
6. Extract strategy pattern - Support HTML and JSON parsing natively

Includes detailed implementation plans, code examples, migration paths, and a timeline for each fix. Total estimated effort: 6-8 weeks.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Walkthrough

Centralizes HTTP into a new

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Coordinator
    participant Registry
    participant Scraper as ScraperInstance
    participant Http as HttpSession
    participant Parser as ParserFactory
    Note over Coordinator,Registry: Load platform by unique_id (registry-driven)
    Coordinator->>Registry: get_scraper(unique_id)
    Registry-->>Coordinator: ScraperInstance
    Coordinator->>Scraper: start_scrape(request, scrape_ctx)
    Scraper->>Http: request(url, auth_ctx)
    Http-->>Scraper: Response (cookies/state updated)
    Scraper->>Parser: parse(response)
    Parser-->>Scraper: ParsedContent
    Scraper->>Scraper: extract values (VariableRenderer / ValueExtractor)
    Scraper-->>Coordinator: scrape_result
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 6
🧹 Nitpick comments (1)
docs/ARCHITECTURE_REFACTORING.md (1)
344-344: Refine wording for conciseness. Consider replacing "mixed together without clear boundaries" with "commingled without clear boundaries" or simply "mixed without clear separation" for conciseness.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/ARCHITECTURE_REFACTORING.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/ARCHITECTURE_REFACTORING.md
[style] ~344-~344: ‘mixed together’ might be wordy. Consider a shorter alternative.
Context: ...bles come from multiple sources and are mixed together without clear boundaries: 1. **Form Va...
(EN_WORDINESS_PREMIUM_MIXED_TOGETHER)
🪛 markdownlint-cli2 (0.18.1)
docs/ARCHITECTURE_REFACTORING.md
51-51: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
988-988: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (2)
docs/ARCHITECTURE_REFACTORING.md (2)
1728-1763: Validate that the JSONPath implementation covers intended use cases. The basic JSONPath implementation (lines 1728–1763) handles dot notation and integer indices, but complex expressions (e.g., filters, wildcards) are not supported. Before implementation, verify this covers the expected JSON API queries from users, or plan to integrate `jsonpath-ng` as the optional migration mentioned at line 1938.
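The trade-off the reviewer raises can be made concrete. A minimal dot-notation resolver of the kind the plan describes might look like the sketch below; the function name and behavior are illustrative assumptions, not the document's actual implementation:

```python
from typing import Any

def resolve_json_path(data: Any, path: str) -> Any:
    """Resolve a basic JSONPath-style expression supporting only dot
    notation and integer list indices, e.g. 'sensors.0.value'.
    Filters and wildcards ($..x, [?(...)]) are deliberately unsupported."""
    current = data
    for part in path.lstrip("$.").split("."):
        if isinstance(current, list):
            current = current[int(part)]  # integer index into a list
        else:
            current = current[part]       # key lookup into a dict
    return current

payload = {"sensors": [{"name": "temp", "value": 21.5}]}
result = resolve_json_path(payload, "sensors.0.value")  # → 21.5
```

Anything beyond this (filters like `[?(@.price > 10)]`, recursive descent) is exactly where a library such as `jsonpath-ng` would take over.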
1-2072: Excellent comprehensive refactoring plan with sound architectural decisions. This Architecture Refactoring Plan is well-structured and thoughtfully designed. Each of the six priority fixes addresses specific architectural problems with clear problem analysis, proposed solutions with working code examples, realistic benefits, and practical migration paths. The phased 8-week timeline is realistic, success metrics are measurable, and the decision to defer breaking changes to configuration shows good judgment.
The proposed patterns (Strategy for parsers, Registry for discovery, Context objects for variables, Session validation for authentication) are architecturally sound and follow established best practices.
However, address the markdown linting issues (missing language specifications on code blocks) before merging, and consider validating the JSONPath implementation scope.
```python
# In __init__.py
if form_submit_config:
    form_http = create_http_wrapper(config_name, form_submit_config, hass, file_manager)
    form_submitter = create_form_submitter(
        config_name,
        form_submit_config,
        hass,
        form_http,  # First HTTP wrapper
        file_manager,
        parser,
    )

http = create_http_wrapper(config_name, conf, hass, file_manager)  # Second HTTP wrapper
```
Add language specifications to fenced code blocks.

Code blocks at lines 26–40 and 51–76 lack language identifiers in the markdown fence. Lines 51–76 should specify `text` for the ASCII diagram; lines 26–40 should specify `python`.

Update the fenced code block opening from ``` to ```python (or ```text for the diagram).
Also applies to: 51-76
🤖 Prompt for AI Agents
In docs/ARCHITECTURE_REFACTORING.md around lines 26–40 and 51–76, the fenced
code blocks are missing language identifiers; update the opening fence on lines
26–40 to use ```python for the Python snippet and update the opening fence on
lines 51–76 to use ```text for the ASCII diagram so that syntax
highlighting/rendering is applied correctly.
```python
# In entity.py:145 - Mixing form variables into scraping
attr_value = self.scraper.scrape(
    attr_selector,
    self._name,
    name,
    variables=self.coordinator.form_variables  # ← Form variables passed everywhere
)

# In coordinator.py:233 - Property chain
@property
def form_variables(self):
    return self._request_manager.form_variables  # ← Law of Demeter violation

# In selector.py - Templates receive mixed variable context
value = selector.value_template.async_render(
    variables=variables,  # ← What's in here? Form vars? Template vars? Both?
    parse_result=True
)
```
Add language specifications to additional code blocks.

Multiple code blocks lack language identifiers in markdown fences throughout Priority Fix #2 (lines 353–371, 385–453, 457–521, 525–566, 570–595). Add `python` language specifiers to each.
Also applies to: 385-453, 457-521, 525-566, 570-595
🤖 Prompt for AI Agents
In docs/ARCHITECTURE_REFACTORING.md around lines 353-371 (and additionally
385-453, 457-521, 525-566, 570-595), several fenced code blocks are missing
language specifiers; update each Markdown code fence in those ranges to include
the python language tag (```python) so they render with correct syntax
highlighting — locate every triple-backtick fence in the listed line ranges and
change the opening fence to ```python, leaving the block contents unchanged.
```python
def scrape(self, selector, sensor, attribute=None, variables: dict = {}):
    """60+ line method doing everything."""

    # Template rendering (should be in VariableRenderer)
    if selector.just_value:
        result = selector.value_template.async_render_with_possible_json_value(...)
        return selector.value_template._parse_result(result)

    # Content type detection (should be in Parser)
    content_stripped = self._data.lstrip() if self._data else ""
    if content_stripped and content_stripped[0] in ["{", "["]:
        raise ValueError("JSON cannot be scraped...")

    # CSS selection (should be in Selector)
    if selector.is_list:
        tags = self._soup.select(selector.list)
    else:
        tag = self._soup.select_one(selector.element)

    # Value extraction (should be in Extractor)
    if selector.attribute is not None:
        value = tag[selector.attribute]
    else:
        value = self.extract_tag_value(tag, selector)

    # Template rendering again (should be in VariableRenderer)
    if value is not None and selector.value_template is not None:
        value = selector.value_template.async_render(...)
```
Add language specifications to code blocks in Priority Fix #3.

All fenced code blocks in this section (lines 638–667, 673–765, 768–816, 819–958) lack `python` language specifiers. Add them to each fence opening.
Also applies to: 673-765, 768-816, 819-958
🤖 Prompt for AI Agents
docs/ARCHITECTURE_REFACTORING.md lines 638-667 (and also 673-765, 768-816,
819-958): several fenced code blocks are missing the language specifier; update
each opening fence (``` ) in these ranges to include the python specifier
(```python) so syntax highlighting and tooling recognize the blocks; scan the
listed line ranges and replace every ``` with ```python for those code blocks,
leaving the fence closings unchanged.
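To make the Parser/Extractor/Renderer split behind Priority Fix #3 concrete, here is a hedged sketch of how the monolithic `scrape()` responsibilities could be divided. The class names mirror the plan's naming, but the bodies are illustrative stand-ins operating on plain dicts, with no BeautifulSoup or Home Assistant templates involved:

```python
class ValueExtractor:
    """Pulls a raw value out of a parsed node (attribute or text)."""
    def extract(self, node, attribute=None):
        return node["attrs"][attribute] if attribute else node["text"]

class VariableRenderer:
    """Applies a (stand-in) template callable to the extracted value."""
    def render(self, value, template=None):
        return template(value) if template else value

class Scraper:
    """Thin orchestrator: select -> extract -> render."""
    def __init__(self, extractor, renderer):
        self._extractor = extractor
        self._renderer = renderer

    def scrape(self, node, attribute=None, template=None):
        raw = self._extractor.extract(node, attribute)
        return self._renderer.render(raw, template)

node = {"text": "21.5", "attrs": {"unit": "°C"}}
scraper = Scraper(ValueExtractor(), VariableRenderer())
result = scraper.scrape(node, template=float)  # → 21.5
```

Each component can then be unit-tested in isolation, which is the testability gain the plan's success metrics aim at.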
```text
Entity (scraping fails)
    ↓
Coordinator.notify_scrape_exception()
    ↓
ContentRequestManager.notify_scrape_exception()
    ↓
FormSubmitter.notify_scrape_exception()
    ↓
Sets should_submit = True
```
Add language specifications to all code blocks in Priority Fix #4.

Multiple code blocks lack language identifiers: lines 988–998 should be `text`, and lines 1002–1013, 1029–1113, 1074–1113, 1115–1180, 1184–1259 should all be `python`. Update each fence opening.
Also applies to: 1002-1013, 1029-1113, 1074-1113, 1115-1180, 1184-1259
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
988-988: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In docs/ARCHITECTURE_REFACTORING.md around lines 988-998 (and also apply to
1002-1013, 1029-1113, 1074-1113, 1115-1180, 1184-1259), several fenced code
blocks are missing language specifiers; change the fence opening on lines
988-998 to ```text and update the other listed fence openings to ```python so
each code block has the correct language tag; ensure you only modify the fence
lines (the opening triple-backticks) and leave the block contents unchanged.
```python
# In __init__.py:121-142
hass.data[DOMAIN][SCRAPER_DATA].append(
    {SCRAPER: scraper, COORDINATOR: coordinator}
)

for platform_domain in PLATFORMS:
    if platform_domain not in conf:
        continue

    for platform_conf in conf[platform_domain]:
        hass.data[DOMAIN][platform_domain].append(platform_conf)
        platform_idx = len(hass.data[DOMAIN][platform_domain]) - 1

        load = discovery.async_load_platform(
            hass,
            platform_domain,
            DOMAIN,
            {SCRAPER_IDX: scraper_idx, PLATFORM_IDX: platform_idx},  # ← Indices!
            config,
        )

# In platforms (sensor.py, binary_sensor.py, etc):
async def async_get_config_and_coordinator(hass, platform_domain, discovery_info):
    shared_data = hass.data[DOMAIN][SCRAPER_DATA][discovery_info[SCRAPER_IDX]]  # ← Array lookup
    conf = hass.data[DOMAIN][platform_domain][discovery_info[PLATFORM_IDX]]  # ← Array lookup
    coordinator = shared_data[COORDINATOR]
    scraper = shared_data[SCRAPER]
    return conf, coordinator, scraper
```
Add language specifications to all code blocks in Priority Fix #5.

All fenced code blocks in this section (lines 1310–1339, 1354–1453, 1457–1525, 1529–1578) should specify the `python` language.
Also applies to: 1354-1453, 1457-1525, 1529-1578
🤖 Prompt for AI Agents
In docs/ARCHITECTURE_REFACTORING.md around lines 1310 to 1339 (and also apply
the same change to ranges 1354–1453, 1457–1525, 1529–1578), several fenced code
blocks lack a language specifier; update each triple-backtick fence to include
"python" (i.e., change ``` to ```python) so all code blocks in those ranges are
marked as Python.
```python
async def set_content(self, content):
    """Set the content to be scraped."""
    self._data = content

    # Try to detect JSON more robustly
    content_stripped = content.lstrip() if content else ""
    if content_stripped and content_stripped[0] in ["{", "["]:
        _LOGGER.debug(
            "%s # Response seems to be json. Skip parsing with BeautifulSoup.",
            self._config_name,
        )
    else:
        try:
            _LOGGER.debug(
                "%s # Loading the content in BeautifulSoup.",
                self._config_name,
            )
            self._soup = await self._hass.async_add_executor_job(
                BeautifulSoup, self._data, self._parser
            )
```

**Then during scraping**:

```python
def scrape(self, selector, sensor, attribute=None, variables: dict = {}):
    # Check AGAIN if content is JSON
    content_stripped = self._data.lstrip() if self._data else ""
    if content_stripped and content_stripped[0] in ["{", "["]:
        raise ValueError(
            "JSON cannot be scraped. Please provide a value template to parse JSON response."
        )
```
Add language specifications to all code blocks in Priority Fix #6.

All fenced code blocks lack language specifiers: lines 1608–1641, 1656–1861, 1865–1904 should be `python`; lines 1908–1919 should be `yaml`.
Also applies to: 1656-1861, 1865-1904, 1908-1919
🤖 Prompt for AI Agents
In docs/ARCHITECTURE_REFACTORING.md around lines 1608–1641, 1656–1861, 1865–1904
and 1908–1919, the fenced code blocks are missing language specifiers; update
the opening fence for the code blocks in lines 1608–1641, 1656–1861 and
1865–1904 to ```python and update the block in lines 1908–1919 to ```yaml so
each fenced block includes the correct language label for syntax highlighting.
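The duplicated JSON sniffing quoted above is exactly what the strategy extraction targets: detect the content type once and pick a parser. A minimal sketch of that idea using only the stdlib; the names (`ContentParser`, `parser_for`) follow the plan's vocabulary, but the `HtmlParser` body is a stub standing in for the BeautifulSoup-backed implementation:

```python
import json
from abc import ABC, abstractmethod

class ContentParser(ABC):
    @abstractmethod
    def parse(self, content: str): ...

class JsonParser(ContentParser):
    def parse(self, content: str):
        return json.loads(content)

class HtmlParser(ContentParser):
    def parse(self, content: str):
        # Stand-in: the actual fix would hand content to BeautifulSoup here.
        return {"html": content}

def parser_for(content: str) -> ContentParser:
    """Sniff the content once, in one place, instead of in every scrape call."""
    stripped = content.lstrip()
    if stripped and stripped[0] in "{[":
        return JsonParser()
    return HtmlParser()

parsed = parser_for('{"value": 42}').parse('{"value": 42}')  # → {'value': 42}
```

New formats (XML, CSV) then become additional `ContentParser` subclasses rather than more `if` branches inside `scrape()`.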
Corrections based on code review:

- Both HTTP wrappers already use get_async_client(hass) for connection pooling
- The real issue is cookie management, not separate clients
- Per-request cookies are necessary because the shared HA client can't maintain per-integration session state
- Updated the HttpSession design to use get_async_client() and manage cookies in a _session_cookies dict
- Changed close() to not close the shared client, only clean up session state

The problem was cookie-juggling complexity, not connection pooling.
…e jar

Major improvement based on discussion: instead of using HA's shared client and manually managing cookies, use a dedicated httpx.AsyncClient per instance.

Why this is better:

- httpx DOES handle cookies automatically via its built-in cookie jar
- Problem: can't use the shared HA client's cookie jar (shared by all integrations)
- Solution: each multiscrape instance gets its own httpx.AsyncClient
- Result: zero manual cookie extraction/passing; httpx does it all!

This is how httpx is meant to be used for stateful scraping.

Trade-offs:

- Pro: natural session management, simpler code, standards-compliant
- Con: more connection pools (but still pooled within each instance)
- Con: deviates from HA's get_async_client() pattern (but justified)

The current approach of manual cookie juggling defeats the purpose of using an HTTP library that has session management built in.
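The cookie-isolation argument can be demonstrated without httpx at all. This toy `HttpSession` (names and behavior are hypothetical; the real design would simply hold a dedicated `httpx.AsyncClient`, whose built-in cookie jar does this bookkeeping automatically) shows why per-instance jars must not be shared:

```python
class HttpSession:
    """Toy session: each instance owns its cookie jar, mimicking what a
    dedicated httpx.AsyncClient provides via its built-in cookie jar."""
    def __init__(self):
        self._cookies = {}

    def handle_response(self, set_cookie_headers):
        # httpx does this automatically for every Set-Cookie header it sees.
        self._cookies.update(set_cookie_headers)

    def request_headers(self):
        # httpx attaches stored cookies to subsequent requests automatically.
        return {"Cookie": "; ".join(f"{k}={v}" for k, v in sorted(self._cookies.items()))}

site_a = HttpSession()
site_b = HttpSession()
site_a.handle_response({"session_id": "abc"})  # login on instance A only
# site_b's jar stays empty: no cross-instance pollution, unlike a
# single shared client whose one jar would mix both sites' cookies.
```

With one shared jar, instance B would silently send instance A's `session_id`, which is the conflict the commit message describes.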
Actionable comments posted: 0
♻️ Duplicate comments (1)
docs/ARCHITECTURE_REFACTORING.md (1)
26-40: Add language specifiers to all fenced code blocks throughout the document. All fenced code blocks are missing language identifiers. Prior reviewers flagged this across multiple sections; these remain unaddressed. This affects syntax highlighting, rendering, and tooling support.

Code block language mappings required:

- Priority Fix #1 (HTTP): lines 26–40, 91–262, 266–304, 308–360 → `python`; lines 51–76 → `text`
- Priority Fix #2 (Variables): lines 406–425, 438–506, 510–574, 578–619, 623–648 → `python`
- Priority Fix #3 (Scraper): lines 691–720, 727–818, 820–869, 872–1011 → `python`
- Priority Fix #4 (Notification): lines 1041–1051 → `text`; lines 1055–1066, 1082–1123, 1125–1166, 1168–1233, 1237–1312 → `python`
- Priority Fix #5 (Registry): lines 1363–1392, 1406–1506, 1510–1578, 1582–1631 → `python`
- Priority Fix #6 (Parser): lines 1661–1694 → `text`; lines 1708–1914, 1918–1957 → `python`; lines 1961–1972 → `yaml`

Fix approach: use a shell script or find-and-replace to systematically update all fence openings:

```bash
#!/bin/bash
# Find all triple-backtick fences and add language specifiers.
# This is a template; adapt to your specific needs.

# For Python code blocks (majority):
sed -i '/^```$/s/^```$/```python/' docs/ARCHITECTURE_REFACTORING.md

# Then manually verify and correct specific exceptions (text, yaml blocks)
```

Alternatively, manually edit each fence opening across the listed line ranges to include the language tag.
Also applies to: 51-76, 353-371, 385-453, 457-521, 525-566, 570-595, 638-667, 673-765, 768-816, 819-958, 988-998, 1002-1013, 1029-1113, 1074-1113, 1115-1180, 1184-1259, 1310-1339, 1354-1453, 1457-1525, 1529-1578, 1608-1641, 1656-1861, 1865-1904, 1908-1919
🧹 Nitpick comments (1)
docs/ARCHITECTURE_REFACTORING.md (1)
397-397: Minor: Consider rewording for conciseness. Line 397: "are mixed together without clear boundaries" is slightly wordy. Consider: "are intermingled without clear boundaries" or "lack clear separation."
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/ARCHITECTURE_REFACTORING.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/ARCHITECTURE_REFACTORING.md
[style] ~397-~397: ‘mixed together’ might be wordy. Consider a shorter alternative.
Context: ...bles come from multiple sources and are mixed together without clear boundaries: 1. **Form Va...
(EN_WORDINESS_PREMIUM_MIXED_TOGETHER)
🪛 markdownlint-cli2 (0.18.1)
docs/ARCHITECTURE_REFACTORING.md
60-60: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
1041-1041: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (1)
docs/ARCHITECTURE_REFACTORING.md (1)
1-2120: Comprehensive, well-structured refactoring plan that clearly articulates architectural improvements. The documentation excels at:
- Problem identification: Each section opens with concrete code examples and file references showing current issues (e.g., dual HTTP wrappers, cookie juggling, backwards notification chains)
- Solution design: Proposed solutions follow established design patterns (Strategy, Factory, Registry) with clear rationales
- Migration path: Each fix includes step-by-step migration guidance
- Holistic scope: Coverage spans HTTP management, variable context, parser abstraction, event-based error handling, and type-safe discovery—addressing root causes rather than symptoms
- Tradeoff analysis: Honest discussion of pros/cons (e.g., dedicated httpx client vs. HA's shared client)
- Realistic timeline: 8-week phased approach with clear milestones and testing strategy
- Breaking-change clarity: Explicit assurance that no config changes break existing setups
Substantive strengths:

- HTTP Session (#1): the reasoning for a dedicated `httpx.AsyncClient` is sound; leveraging httpx's built-in cookie jar instead of manual extraction/passing is idiomatic and simpler.
- Variable Context (#2): structured `AuthContext` and `ScrapeContext` classes eliminate variable-source ambiguity and enable type safety.
- Scraper Refactoring (#3): splitting parsing, extraction, and rendering into separate classes reduces cognitive load and enables independent testing.
- Parser Strategy (#6): an abstract `ContentParser` hierarchy with `HtmlParser`/`JsonParser` cleanly supports multi-format scraping without modifying core logic.
- Registry Pattern (#5): replaces fragile index-based discovery with typed dataclasses and unique IDs, giving safer reloads and better debuggability.
- Notification Flow (#4): the session-validation approach is pragmatic for the MVP, with a clear path to an event-based architecture if needed later.
Minor observations:
- Success metrics section is measurable and actionable (Scraper 197 lines → ~60 lines, cyclomatic complexity <10, >90% unit test coverage).
- Questions & Decisions section shows thoughtful trade-off analysis (e.g., basic JSONPath vs. advanced, session validation vs. events).
- References are helpful (DataUpdateCoordinator, Strategy Pattern, JSONPath spec).
Critical findings from HA and httpx source code analysis:

1. httpx DOES automatically persist cookies by default
   - Any Set-Cookie headers go into client.cookies
   - Stored cookies are sent with subsequent requests
   - This is built-in behavior, not opt-in
2. Home Assistant's get_async_client() does NOT disable this
   - The shared client has an active cookie jar
   - All integrations using it share the same jar!
   - This is fine for stateless requests, problematic for sessions
3. The current multiscrape workaround uses a deprecated pattern
   - Passes cookies per-request via the cookies= parameter
   - httpx deprecated this in favor of client-level cookies
   - This prevents accidental pollution of the shared jar
4. Why a dedicated client is correct:
   - HA's shared client is for STATELESS requests
   - Stateful scraping needs an isolated cookie jar
   - HA docs mention create_async_httpx_client kwargs "i.e. for cookies"
   - Multiple multiscrape instances would conflict on a shared jar

The solution is justified: a dedicated httpx client per instance is the right pattern for session-based scraping.
Actionable comments posted: 0
🧹 Nitpick comments (2)
docs/ARCHITECTURE_REFACTORING.md (2)
26-40: Add language specifications to all fenced code blocks throughout the document. All code fences in this document lack language identifiers. These should be added for proper syntax highlighting and tool recognition.
Affected ranges and required language tags:

- Lines 26–40, 97–268, 272–310, 314–366, 353–371, 385–453, 457–521, 525–566, 570–595, 638–667, 673–765, 768–816, 819–958, 1002–1013, 1029–1113, 1115–1180, 1184–1259, 1310–1339, 1354–1453, 1457–1525, 1529–1578, 1608–1641, 1656–1861, 1865–1904: use `python`
- Lines 51–76, 988–998: use `text`
- Lines 1908–1919: use `yaml`

For example, change ``` to ```python at line 26, and ``` to ```text at line 51.

Also applies to: 51-76, 97-268, 272-310, 314-366, 353-371, 385-453, 457-521, 525-566, 570-595, 638-667, 673-765, 768-816, 819-958, 988-998, 1002-1013, 1029-1113, 1074-1113, 1115-1180, 1184-1259, 1310-1339, 1354-1453, 1457-1525, 1529-1578, 1608-1641, 1656-1861, 1865-1904, 1908-1919
407-407: Simplify wording: "mixed together" → "intermixed" or "combined". Line 407 uses "mixed together," which is slightly wordy. Consider a more concise alternative.
Current: "Variables come from multiple sources and are mixed together without clear boundaries:"
Suggested: "Variables come from multiple sources and are intermixed without clear boundaries:"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/ARCHITECTURE_REFACTORING.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/ARCHITECTURE_REFACTORING.md
[style] ~407-~407: ‘mixed together’ might be wordy. Consider a shorter alternative.
Context: ...bles come from multiple sources and are mixed together without clear boundaries: 1. **Form Va...
(EN_WORDINESS_PREMIUM_MIXED_TOGETHER)
🪛 markdownlint-cli2 (0.18.1)
docs/ARCHITECTURE_REFACTORING.md
66-66: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
1051-1051: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
Research confirms Home Assistant's aiohttp helper has identical behavior:

1. aiohttp.ClientSession creates a CookieJar() by default
   - Automatically stores cookies from Set-Cookie headers
   - Automatically sends cookies with subsequent requests
   - Can be disabled with DummyCookieJar()
2. HA's async_get_clientsession() does NOT disable this
   - No cookie_jar parameter is passed to ClientSession
   - Uses the default behavior, i.e. an active cookie jar
   - Same docstring: "with kwargs, i.e. for cookies"
3. Identical issue to httpx
   - Both shared clients have active cookie jars
   - All integrations share the same jar
   - Fine for stateless requests, bad for sessions

Conclusion: whether using httpx or aiohttp, a dedicated client per instance is the correct pattern for session-based scraping.
Summary
This PR adds comprehensive documentation for refactoring the ha-multiscrape architecture to address complexity issues identified in the codebase.
What's Included
A detailed refactoring plan covering 6 priority architectural fixes:

1. Unify HTTP Handling - Replace dual HTTP wrapper instances with a single `HttpSession` class that properly manages cookies, authentication, and session state
2. Simplify Variable System - Replace the confusing mix of form variables, template variables, and selector variables with a structured `ScrapeContext` system
3. Refactor Scraper Class - Split the monolithic 197-line `Scraper` class into focused components using the Strategy pattern (`Parser`, `Extractor`, `Renderer`)
4. Fix Notification Flow - Eliminate backwards dependency flow by using session validation or an event-based architecture instead of `notify_scrape_exception()` chains
5. Replace Index-Based Discovery - Replace fragile array indices with a type-safe `ScraperRegistry` using unique IDs
6. Extract Strategy Pattern for Parsers - Enable native JSON API support alongside HTML scraping using pluggable parser strategies
Each Section Includes
Additional Content
Why This Matters
The current architecture works but has accumulated complexity that makes it:
This refactoring plan provides a roadmap to make the codebase more maintainable, testable, and extensible while maintaining backwards compatibility.
Next Steps
This is documentation only - no code changes yet. The plan serves as:
Feedback welcome!