
added placeholder button to see full html tree #1

Open
Ollynov wants to merge 11 commits into main from featureBranch


Ollynov (Owner) commented Oct 29, 2025

No description provided.

roomote-v0 (Bot) commented Oct 29, 2025

Examining your PR now. Feedback coming soon!


Comment thread public/index.html
Comment on lines +102 to +117
const simpleScrape = async () => {
  const inputVal = document.getElementById("urlInput").value;
  const data = await fetch("/api/simple-crawl", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url: inputVal }),
  });
  const resp = await data.json();
  console.log("ok our html: ", resp);

  const parser = new DOMParser();
  const doc = parser.parseFromString(resp.data, "text/html");

  const simplified = simplifyDOMTree(doc.body);
  console.dir(simplified, { depth: null });
};

The simpleScrape() function lacks error handling. If the fetch fails or JSON parsing errors occur, users will see unhandled promise rejections in the console with no UI feedback. Wrap the function body in a try-catch block and display error messages using the existing showStatus() function.
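A minimal sketch of the suggested try-catch, under stated assumptions: `fetchCrawlHtml` is a hypothetical helper name, the injected `fetchImpl` parameter exists only so the sketch can be exercised outside a browser, and `showStatus(message, type)` is an assumed signature because the PR does not show how the existing helper is declared.

```javascript
// Sketch only, not the author's implementation.
// Assumption: showStatus(message, type) matches the existing helper's shape.
async function fetchCrawlHtml(url, { fetchImpl = globalThis.fetch, showStatus = () => {} } = {}) {
  try {
    const response = await fetchImpl("/api/simple-crawl", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url }),
    });

    // fetch() resolves even on HTTP error statuses, so check explicitly.
    if (!response.ok) {
      throw new Error(`Request failed with status ${response.status}`);
    }

    const payload = await response.json(); // may throw on malformed JSON
    return payload.data;
  } catch (error) {
    // Surface the failure in the UI instead of leaving an unhandled rejection.
    showStatus(`Crawl failed: ${error.message}`, "error");
    return null;
  }
}
```

In the page, simpleScrape() could call this helper and only run DOMParser and simplifyDOMTree() when the result is not null.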

Comment thread public/index.html
/>
<button id="crawlBtn" onclick="crawlUrl()">Crawl</button>
<!-- <button id="crawlBtn" onclick="crawlUrl()">Crawl</button> -->
<button id="crawlBtn" onclick="simpleScrape()">Crawl</button>

The button now calls simpleScrape() but the Enter key handler in app.js line 246 still calls crawlUrl(). This creates inconsistent behavior where clicking the button and pressing Enter trigger different functions. Update the Enter key handler to call simpleScrape() instead, or ensure both paths use the same function.
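One way to keep the click and Enter paths in sync is to route both through the same function. The sketch below is illustrative only: `onEnter` is a hypothetical helper, and the wiring shown in the comment is an assumption, since the handler at app.js line 246 is not part of this diff.

```javascript
// Hypothetical helper: builds a keydown listener that runs `action` on Enter.
function onEnter(action) {
  return (event) => {
    if (event.key === "Enter") {
      event.preventDefault(); // avoid any default form submission
      action();
    }
  };
}

// Assumed browser wiring (element id taken from the quoted code):
// document.getElementById("urlInput").addEventListener("keydown", onEnter(simpleScrape));
```

With this shape, switching the button from crawlUrl() to simpleScrape() only requires passing the same function to the key handler.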

Ollynov (Owner, Author) replied:

Got it, and am I able to ask you additional questions right through these comments?

Comment thread public/index.html
Comment on lines +72 to +100
function findByText(simplifiedNode, keyword, path = "html > body") {
  const matches = [];
  const lowerKeyword = keyword.toLowerCase();

  function recurse(node, currentPath) {
    if (!node) return;

    const thisPath = `${currentPath} > ${node.tag || "text"}`;

    // Check for a match in text content
    if (node.text && node.text.toLowerCase().includes(lowerKeyword)) {
      matches.push({
        tag: node.tag || "text",
        path: thisPath,
        text: node.text,
      });
    }

    // Recurse on children
    if (node.children && node.children.length > 0) {
      for (const child of node.children) {
        recurse(child, thisPath);
      }
    }
  }

  recurse(simplifiedNode, path);
  return matches;
}

The findByText() function is defined but never called anywhere in the codebase. This is dead code that should either be integrated into the tree functionality or removed to keep the codebase clean.

Ollynov (Owner, Author) replied:

And @roomote, what do you think about line 98 above, where we call recurse()? Is there any chance this could lead to a bug?
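One possible issue with the top-level call `recurse(simplifiedNode, path)` can be checked with a small experiment. This is a sketch under assumptions: simplifyDOMTree() is not shown in the PR, so the sample tree below is hypothetical and assumes nodes of the form `{ tag, text, children }` with a root whose tag is `"body"`. If that holds, the default path `"html > body"` already names the root, so the root tag is appended a second time.

```javascript
// findByText as quoted in the PR (public/index.html, lines +72 to +100).
function findByText(simplifiedNode, keyword, path = "html > body") {
  const matches = [];
  const lowerKeyword = keyword.toLowerCase();

  function recurse(node, currentPath) {
    if (!node) return;
    const thisPath = `${currentPath} > ${node.tag || "text"}`;
    if (node.text && node.text.toLowerCase().includes(lowerKeyword)) {
      matches.push({ tag: node.tag || "text", path: thisPath, text: node.text });
    }
    if (node.children && node.children.length > 0) {
      for (const child of node.children) {
        recurse(child, thisPath);
      }
    }
  }

  recurse(simplifiedNode, path);
  return matches;
}

// Hypothetical tree; the real output shape of simplifyDOMTree() is an assumption.
const sampleTree = {
  tag: "body",
  children: [{ tag: "div", children: [{ tag: "p", text: "Hello World" }] }],
};

const matches = findByText(sampleTree, "hello");
console.log(matches[0].path); // "html > body > body > div > p"
```

Under the same assumption, calling `findByText(sampleTree, "hello", "html")` yields `"html > body > div > p"`, so changing the default path is one possible fix. Separately, the recursion depth follows the tree depth, which could matter for unusually deep documents.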

Comment thread src/index.js Outdated
Comment on lines +19 to +42
app.post("/api/simple-crawl", async (req, res) => {
  try {
    const { url } = req.body;

    if (!url) {
      return res.status(400).json({ error: "URL is required" });
    }

    const data = await fetch(url);
    const html = await data.text();

    res.json({
      success: true,
      message: "Crawl completed",
      data: html,
    });
  } catch (error) {
    console.error("Crawl error:", error);
    res.status(500).json({
      error: "Failed to crawl URL",
      message: error.message,
    });
  }
});

The /api/simple-crawl endpoint doesn't validate the URL format before fetching. Invalid URLs (like 'not-a-url' or malformed URLs) will cause the fetch to fail with unclear error messages. Add URL validation using the URL constructor in a try-catch block before attempting to fetch, and return a clear 400 error for invalid URLs.
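The validation described above could look roughly like this. `parseHttpUrl` is a hypothetical helper name, and the restriction to `http:` and `https:` goes beyond what the review asks for; it is included as an assumption because `new URL()` also accepts schemes such as `ftp:` that the crawler probably does not intend to fetch.

```javascript
// Hypothetical helper: returns a parsed URL object, or null if the input is
// not a well-formed absolute http(s) URL.
function parseHttpUrl(input) {
  try {
    const parsed = new URL(input); // throws TypeError on malformed input
    const isHttp = parsed.protocol === "http:" || parsed.protocol === "https:";
    return isHttp ? parsed : null;
  } catch {
    return null;
  }
}

// Assumed use inside the route handler, before calling fetch(url):
// if (!parseHttpUrl(url)) {
//   return res.status(400).json({ error: "Invalid URL" });
// }
```

Returning null rather than throwing keeps the handler's existing catch block reserved for genuine fetch failures, which still map to a 500 response.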

