Examining your PR now. Feedback coming soon!
    const simpleScrape = async () => {
      const inputVal = document.getElementById("urlInput").value;
      const data = await fetch("/api/simple-crawl", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ url: inputVal }),
      });
      const resp = await data.json();
      console.log("ok our html: ", resp);

      const parser = new DOMParser();
      const doc = parser.parseFromString(resp.data, "text/html");

      const simplified = simplifyDOMTree(doc.body);
      console.dir(simplified, { depth: null });
    };
The simpleScrape() function lacks error handling. If the fetch fails or JSON parsing errors occur, users will see unhandled promise rejections in the console with no UI feedback. Wrap the function body in a try-catch block and display error messages using the existing showStatus() function.
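One way the suggested try-catch could look, as a minimal sketch rather than the PR's actual code. `simpleScrapeSafe` is a hypothetical name, `fetchImpl` and `showStatus` are passed in as parameters so the sketch runs outside a browser, and the `showStatus(message, type)` signature is an assumption because the existing function is not shown here. The `response.ok` check goes slightly beyond the comment: `fetch` does not reject on HTTP error statuses, so they need an explicit check.

```javascript
// Sketch only. Assumes showStatus(message, type) and a fetch-compatible fetchImpl.
async function simpleScrapeSafe(url, fetchImpl, showStatus) {
  try {
    const response = await fetchImpl("/api/simple-crawl", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url }),
    });

    // fetch resolves on 4xx/5xx responses, so surface those as errors too.
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }

    const resp = await response.json(); // may throw on a malformed body
    return resp.data;
  } catch (error) {
    // Network failures, HTTP errors, and JSON parse errors all end up here.
    showStatus(`Crawl failed: ${error.message}`, "error");
    return null;
  }
}
```

In the page itself this would be called with the browser's `fetch` and the existing `showStatus`, and the returned HTML would go on to `DOMParser` only when it is not `null`.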
    />
    <button id="crawlBtn" onclick="crawlUrl()">Crawl</button>
    <!-- <button id="crawlBtn" onclick="crawlUrl()">Crawl</button> -->
    <button id="crawlBtn" onclick="simpleScrape()">Crawl</button>
The button now calls simpleScrape() but the Enter key handler in app.js line 246 still calls crawlUrl(). This creates inconsistent behavior where clicking the button and pressing Enter trigger different functions. Update the Enter key handler to call simpleScrape() instead, or ensure both paths use the same function.
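A rough sketch of routing both paths through one function. `bindCrawlTriggers` is a hypothetical helper, and the use of a `keydown` listener is an assumption, since the existing Enter key handler in app.js is not shown here.

```javascript
// Hypothetical helper: the click and the Enter key both call the same handler,
// so the two entry points cannot drift apart again.
function bindCrawlTriggers(button, input, handler) {
  button.addEventListener("click", () => handler());
  input.addEventListener("keydown", (event) => {
    if (event.key === "Enter") handler();
  });
}
```

In the browser this might be wired up as `bindCrawlTriggers(document.getElementById("crawlBtn"), document.getElementById("urlInput"), simpleScrape)`, with the inline `onclick` attribute removed so the click does not fire twice.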
Got it, and am I able to ask you additional questions right through these comments?
    function findByText(simplifiedNode, keyword, path = "html > body") {
      const matches = [];
      const lowerKeyword = keyword.toLowerCase();

      function recurse(node, currentPath) {
        if (!node) return;

        const thisPath = `${currentPath} > ${node.tag || "text"}`;

        // Check for a match in text content
        if (node.text && node.text.toLowerCase().includes(lowerKeyword)) {
          matches.push({
            tag: node.tag || "text",
            path: thisPath,
            text: node.text,
          });
        }

        // Recurse on children
        if (node.children && node.children.length > 0) {
          for (const child of node.children) {
            recurse(child, thisPath);
          }
        }
      }

      recurse(simplifiedNode, path);
      return matches;
    }
The findByText() function is defined but never called anywhere in the codebase. This is dead code that should either be integrated into the tree functionality or removed to keep the codebase clean.
And @roomote what do you think about line 98 above, where we use recurse(), is there any chance this could lead to a bug?
    app.post("/api/simple-crawl", async (req, res) => {
      try {
        const { url } = req.body;

        if (!url) {
          return res.status(400).json({ error: "URL is required" });
        }

        const data = await fetch(url);
        const html = await data.text();

        res.json({
          success: true,
          message: "Crawl completed",
          data: html,
        });
      } catch (error) {
        console.error("Crawl error:", error);
        res.status(500).json({
          error: "Failed to crawl URL",
          message: error.message,
        });
      }
    });
The /api/simple-crawl endpoint doesn't validate the URL format before fetching. Invalid URLs (like 'not-a-url' or malformed URLs) will cause the fetch to fail with unclear error messages. Add URL validation using the URL constructor in a try-catch block before attempting to fetch, and return a clear 400 error for invalid URLs.
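A small sketch of the validation step described above. `parseHttpUrl` is a hypothetical helper name, and restricting the protocol to `http:` and `https:` is an extra assumption beyond the comment, added because the `URL` constructor also accepts schemes such as `ftp:` or `file:` that the crawler presumably should not fetch.

```javascript
// Hypothetical helper: returns a parsed URL object, or null when the input
// is not a well-formed http(s) URL.
function parseHttpUrl(input) {
  try {
    const parsed = new URL(input); // throws a TypeError on malformed input
    const allowed = parsed.protocol === "http:" || parsed.protocol === "https:";
    return allowed ? parsed : null;
  } catch {
    return null;
  }
}
```

Inside the route, a `null` result could map to `res.status(400).json({ error: "Invalid URL" })` before `fetch` is called, which keeps malformed input out of the generic 500 handler.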