added placeholder button to see full html tree by Ollynov · Pull Request #1 · Ollynov/web-scraper

Ollynov · 2025-10-29T06:12:36Z

No description provided.

roomote-v0 · 2025-10-29T06:13:22Z

Examining your PR now. Feedback coming soon!

roomote-v0 · 2025-10-29T06:14:28Z

+      const simpleScrape = async () => {
+        const inputVal = document.getElementById("urlInput").value;
+        const data = await fetch("/api/simple-crawl", {
+          method: "POST",
+          headers: { "Content-Type": "application/json" },
+          body: JSON.stringify({ url: inputVal }),
+        });
+        const resp = await data.json();
+        console.log("ok our html: ", resp);
+
+        const parser = new DOMParser();
+        const doc = parser.parseFromString(resp.data, "text/html");
+
+        const simplified = simplifyDOMTree(doc.body);
+        console.dir(simplified, { depth: null });
+      };


The simpleScrape() function lacks error handling. If the fetch fails or JSON parsing errors occur, users will see unhandled promise rejections in the console with no UI feedback. Wrap the function body in a try-catch block and display error messages using the existing showStatus() function.
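One way the suggested try-catch could look is sketched below. It is not the PR's code: `simpleScrapeSafe` is a hypothetical name, the fetch function and `showStatus` are injected so the flow can run outside the browser, and the `(message, type)` signature of `showStatus` is an assumption because its definition is not shown in this diff.

```javascript
// Sketch only. Assumptions: showStatus(message, type) exists in app.js with
// roughly this signature, and the endpoint answers with JSON shaped like
// { data } on success or { error, message } on failure.
async function simpleScrapeSafe(
  url,
  { fetchImpl = globalThis.fetch, showStatus = console.error } = {}
) {
  try {
    const response = await fetchImpl("/api/simple-crawl", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url }),
    });

    // response.json() rejects on a non-JSON body; the catch below handles it.
    const payload = await response.json();

    // fetch resolves on HTTP error statuses, so check response.ok explicitly.
    if (!response.ok) {
      throw new Error(payload.message || payload.error || `HTTP ${response.status}`);
    }

    return payload.data;
  } catch (error) {
    // Surface the failure in the UI instead of leaving an unhandled rejection.
    showStatus(`Crawl failed: ${error.message}`, "error");
    return null;
  }
}

// In the page, the caller would pass the input value and the real showStatus:
//   const html = await simpleScrapeSafe(
//     document.getElementById("urlInput").value,
//     { showStatus }
//   );
```

Returning `null` on failure keeps the DOM parsing step out of the error path; the caller can skip `parseFromString` when no HTML came back.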

roomote-v0 · 2025-10-29T06:14:40Z

        />
-        <button id="crawlBtn" onclick="crawlUrl()">Crawl</button>
+        <!-- <button id="crawlBtn" onclick="crawlUrl()">Crawl</button> -->
+        <button id="crawlBtn" onclick="simpleScrape()">Crawl</button>


The button now calls simpleScrape() but the Enter key handler in app.js line 246 still calls crawlUrl(). This creates inconsistent behavior where clicking the button and pressing Enter trigger different functions. Update the Enter key handler to call simpleScrape() instead, or ensure both paths use the same function.
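A minimal sketch of the "same function for both paths" option follows. `bindCrawlTriggers` is a hypothetical helper, not something in the PR, and using it would mean dropping the inline `onclick` attribute and the separate Enter handler in app.js.

```javascript
// Hypothetical helper: routes both the button click and the Enter key
// through one handler so the two entry points cannot drift apart.
function bindCrawlTriggers(button, input, handler) {
  button.addEventListener("click", () => handler());
  input.addEventListener("keydown", (event) => {
    if (event.key === "Enter") {
      handler();
    }
  });
}

// In the page (assuming the element ids from the diff):
//   bindCrawlTriggers(
//     document.getElementById("crawlBtn"),
//     document.getElementById("urlInput"),
//     simpleScrape
//   );
```

With a single binding point, swapping `simpleScrape` back to `crawlUrl` later is a one-line change.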

Ollynov · 2025-10-29

Got it. Am I able to ask you additional questions right through these comments?

roomote-v0 · 2025-10-29T06:14:52Z

+      function findByText(simplifiedNode, keyword, path = "html > body") {
+        const matches = [];
+        const lowerKeyword = keyword.toLowerCase();
+
+        function recurse(node, currentPath) {
+          if (!node) return;
+
+          const thisPath = `${currentPath} > ${node.tag || "text"}`;
+
+          // Check for a match in text content
+          if (node.text && node.text.toLowerCase().includes(lowerKeyword)) {
+            matches.push({
+              tag: node.tag || "text",
+              path: thisPath,
+              text: node.text,
+            });
+          }
+
+          // Recurse on children
+          if (node.children && node.children.length > 0) {
+            for (const child of node.children) {
+              recurse(child, thisPath);
+            }
+          }
+        }
+
+        recurse(simplifiedNode, path);
+        return matches;
+      }


The findByText() function is defined but never called anywhere in the codebase. This is dead code that should either be integrated into the tree functionality or removed to keep the codebase clean.
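If the function is kept, a call site might look like the sketch below. The function body is a condensed copy of the diff above; the sample tree is hypothetical, and its `{ tag, text, children }` shape is inferred from the fields `findByText` reads rather than from `simplifyDOMTree`, which is not shown here.

```javascript
// Condensed copy of findByText from the diff, same matching logic.
function findByText(simplifiedNode, keyword, path = "html > body") {
  const matches = [];
  const lowerKeyword = keyword.toLowerCase();

  function recurse(node, currentPath) {
    if (!node) return;
    const thisPath = `${currentPath} > ${node.tag || "text"}`;
    if (node.text && node.text.toLowerCase().includes(lowerKeyword)) {
      matches.push({ tag: node.tag || "text", path: thisPath, text: node.text });
    }
    for (const child of node.children || []) {
      recurse(child, thisPath);
    }
  }

  recurse(simplifiedNode, path);
  return matches;
}

// Hypothetical simplified tree (shape is an assumption, see lead-in).
const sampleTree = {
  tag: "body",
  children: [
    { tag: "div", children: [{ tag: "p", text: "Pricing starts at $5", children: [] }] },
    { tag: "footer", text: "Contact us", children: [] },
  ],
};

const hits = findByText(sampleTree, "pricing");
console.log(hits);
// [ { tag: 'p', path: 'html > body > body > div > p', text: 'Pricing starts at $5' } ]
```

Under this assumed shape, the default `path` of `"html > body"` combined with a root node tagged `body` yields a repeated `body` segment in every reported path, which may be worth checking before wiring the function into the tree view.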

Ollynov · 2025-10-29

And @roomote, what do you think about line 98 above, where we use recurse()? Is there any chance this could lead to a bug?

roomote-v0 · 2025-10-29T06:15:17Z

+app.post("/api/simple-crawl", async (req, res) => {
+  try {
+    const { url } = req.body;
+
+    if (!url) {
+      return res.status(400).json({ error: "URL is required" });
+    }
+
+    const data = await fetch(url);
+    const html = await data.text();
+
+    res.json({
+      success: true,
+      message: "Crawl completed",
+      data: html,
+    });
+  } catch (error) {
+    console.error("Crawl error:", error);
+    res.status(500).json({
+      error: "Failed to crawl URL",
+      message: error.message,
+    });
+  }
+});


The /api/simple-crawl endpoint doesn't validate the URL format before fetching. Invalid URLs (like 'not-a-url' or malformed URLs) will cause the fetch to fail with unclear error messages. Add URL validation using the URL constructor in a try-catch block before attempting to fetch, and return a clear 400 error for invalid URLs.
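The validation step could be sketched as a small helper built on the standard `URL` constructor. `parseHttpUrl` is a hypothetical name, and restricting the scheme to http and https is an extra assumption beyond what the comment asks for.

```javascript
// Hypothetical helper: returns a parsed URL for http(s) input, or null when
// the string is not a valid absolute URL. The scheme restriction is an
// assumption added for this sketch, not a requirement stated in the review.
function parseHttpUrl(raw) {
  try {
    const parsed = new URL(raw);
    const isHttp = parsed.protocol === "http:" || parsed.protocol === "https:";
    return isHttp ? parsed : null;
  } catch {
    // new URL() throws a TypeError for malformed or relative input.
    return null;
  }
}

// Inside the route, before the fetch (sketch):
//   const target = parseHttpUrl(url);
//   if (!target) {
//     return res.status(400).json({ error: "Invalid URL" });
//   }
//   const data = await fetch(target.href);
```

Keeping the check in its own function leaves the existing try-catch in the route free to report genuine fetch failures as 500s.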

added placeholder button to see full html tree

d9bb52c

roomote-v0 Bot reviewed Oct 29, 2025


Ollynov added 10 commits January 4, 2026 12:23

setup initial architecture

036ed8a

priority 3 tackled

073a34c

priority 4 done

e50d7f4

swagger and open api specs

0231559

added prisma

93999bd

db migration and updates

c9104fd

further setup

8154df5

full tailwind setup

7c2c92b

ready to remove all auth logic

98150db

create more generic functionality

38f4b0f

Ollynov wants to merge 11 commits into main from featureBranch
