
fix(linkedin): three independent adapter bugs (profile-experience timeout, connect false-positive, thread-snapshot leak)#1913

Open
archits01 wants to merge 6 commits into jackwener:main from archits01:openslide-vendor-patches

@archits01

Summary

Three independent bugs in clis/linkedin/ adapters, found while building a downstream LinkedIn skill on top of OpenCLI v1.8.3. Each commit is self-contained and can be cherry-picked individually.

1. profile-experience / profile-projects: `page.wait` timeout unit mismatch

Both adapters pass `timeout: 10000` to `page.wait({text/selector, timeout})`, assuming milliseconds. The implementation in base-page.js treats `options.timeout` as seconds and multiplies it by 1000, so the wait was actually 10,000 s (10,000,000 ms, about 2.8 hours). The 60 s command-level timeout fires first, so profile-experience always errors out with `linkedin/profile-experience timed out after 60s`. `profile-projects` happens to work because the "Projects" text loads fast enough, but the bug is the same.

  • Repro: `opencli linkedin profile-experience --profile-url https://www.linkedin.com/in// --site-session persistent` against any valid profile → consistently times out at 60 s.
  • Trace shows pre_navigate completes, then the adapter hangs at the `page.wait({ text: 'Experience', timeout: 10000 })` call with no further actions.
  • Fix: `10000 → 10` at 4 call sites (profile-experience.js:499/503/643, profile-projects.js:288). Matches how `search.js` already calls the same API (`timeout: 8`, `timeout: 10`).
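A minimal sketch of why the value is scaled twice, assuming only what is described above (the wait helper converts `options.timeout` from seconds to milliseconds). `waitTimeoutMs` is a hypothetical stand-in, not the actual base-page.js code.

```javascript
// Hypothetical stand-in for the conversion described above: page.wait()
// treats options.timeout as SECONDS and multiplies by 1000 internally.
function waitTimeoutMs(options) {
  return options.timeout * 1000;
}

// The 60 s command-level cap mentioned in the PR.
const COMMAND_TIMEOUT_MS = 60 * 1000;

// Buggy call sites passed a millisecond value, so it was scaled twice.
const buggy = waitTimeoutMs({ text: 'Experience', timeout: 10000 });
// Fixed call sites pass seconds, as search.js already does (timeout: 8 / 10).
const fixed = waitTimeoutMs({ text: 'Experience', timeout: 10 });

console.log(buggy, (buggy / 3600000).toFixed(2), 'hours'); // 10000000 2.78 hours
console.log(buggy > COMMAND_TIMEOUT_MS, fixed < COMMAND_TIMEOUT_MS); // true true
```

Only the fixed value resolves inside the 60 s command budget, which is why profile-experience could never succeed before the change.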

2. connect: false-positive `already_connected` on unconnected profiles

`buildProfileProbeScript` scans every visible button on the page for the label "Message". The safety logic then checks `alreadyConnected` before `connectAvailable`. Sidebar widgets ("People also viewed"), premium-tier prompts, and overlay UI render "Message" elements even on profiles you have no connection with, so `connect` reports `status: not_connectable, reason: already_connected` for strangers whose profiles clearly display a Connect button.

  • Repro: pick any profile of someone you are not connected to. `opencli linkedin connect '' --expected-name ''` returns `already_connected` even though the page has a visible Connect button.
  • Empirically reproduced on multiple unconnected profiles.
  • Fix: gate `alreadyConnected` on `!connectAvailable` at the safety check (connect.js:119) — the explicit Connect button is the authoritative tell. Smallest possible diff (one line). Alternative cleaner fix would scope the button-label scan to the top profile card.
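The ordering change can be sketched as a pure function over the probe result. This is a hypothetical reduction: the probe fields `alreadyConnected` and `connectAvailable` come from the PR, but the function names, the return shape, and the `connect_unavailable` reason are illustrative only.

```javascript
// Hypothetical reduction of the connect.js safety check. Probe field names
// come from the PR; function names and 'connect_unavailable' are illustrative.

// Before: any "Message" button anywhere on the page wins.
function classifyBefore(probe) {
  if (probe.alreadyConnected) {
    return { connectable: false, reason: 'already_connected' };
  }
  if (probe.connectAvailable) return { connectable: true };
  return { connectable: false, reason: 'connect_unavailable' };
}

// After: only trust "Message" when there is no explicit Connect button.
function classifyAfter(probe) {
  if (probe.alreadyConnected && !probe.connectAvailable) {
    return { connectable: false, reason: 'already_connected' };
  }
  if (probe.connectAvailable) return { connectable: true };
  return { connectable: false, reason: 'connect_unavailable' };
}

// A stranger's profile with a stray sidebar "Message" button:
const stranger = { alreadyConnected: true, connectAvailable: true };
console.log(classifyBefore(stranger)); // { connectable: false, reason: 'already_connected' }
console.log(classifyAfter(stranger)); // { connectable: true }
```

A real first-degree connection (Message present, Connect absent) still classifies as `already_connected` under the new check, which is the regression case in the test plan.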

3. thread-snapshot — sidebar conversation leak in `snapshot_json`

The adapter exposes two leaks of unrelated content via the `snapshot_json` output field:

  • `bodyText: refreshedText` in the return object dumps the entire `document.body.innerText` — which on the messaging page includes the sidebar conversation list (other people's thread previews), "Sponsored" InMails, and navigation chrome.
  • `headerSelectors` includes nine selectors, six of which are unscoped, including `main h1`, `main h2`, `[data-anonymize="person-name"]`, `a[href*="/in/"]`, and `.msg-conversation-card__participant-names`. On the messaging page these match the sidebar's conversation cards, so `headerNames` returns every contact in the user's recent inbox.

Empirical repro on a real thread: `headerNames` returned 10 names, including the active thread participant and 8 unrelated sidebar contacts. `bodyText` was ~5 KB of unrelated conversation previews.

Fix:

  • Remove `bodyText: refreshedText,` from the return. The internal `refreshedText` variable is still used to compute the `latestMessageText` fallback inside the IIFE; only the externally-exposed output drops it.
  • Reduce `headerSelectors` to three thread-scoped selectors: `.msg-thread__link-to-profile`, `.msg-thread__link-to-profile span[aria-hidden="true"]`, and `.msg-thread .msg-entity-lockup__entity-title`. Recipient extraction still works because the consumer reads `headerNames[0]`, which the first selector still provides.
  • No test changes — existing `thread-snapshot.test.js` does not assert on `bodyText` or unrelated `headerNames`.
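A minimal sketch of the narrowed extraction, assuming the three thread-scoped selectors listed above. `collectHeaderNames` and `buildSnapshot` are hypothetical helpers (the de-duplication and whitespace normalisation are assumptions of this sketch); `root` is anything with `querySelectorAll`, so `document` in the page or a stub in a test.

```javascript
// The three thread-scoped selectors from the fix. Unscoped ones such as
// `main h1` or `a[href*="/in/"]` are gone because they also match sidebar cards.
const HEADER_SELECTORS = [
  '.msg-thread__link-to-profile',
  '.msg-thread__link-to-profile span[aria-hidden="true"]',
  '.msg-thread .msg-entity-lockup__entity-title',
];

// Hypothetical helper: unique, non-empty names in selector order, so
// headerNames[0] comes from the canonical thread header link.
function collectHeaderNames(root, selectors = HEADER_SELECTORS) {
  const seen = new Set();
  for (const sel of selectors) {
    for (const el of root.querySelectorAll(sel)) {
      const name = (el.textContent || '').replace(/\s+/g, ' ').trim();
      if (name) seen.add(name);
    }
  }
  return [...seen];
}

// Hypothetical output shape: no bodyText field, so document.body.innerText
// (sidebar previews, InMails, nav chrome) never reaches snapshot_json.
function buildSnapshot(root, messages) {
  return { headerNames: collectHeaderNames(root), messages };
}
```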

Test plan

  • `profile-experience --profile-url ` returns 7 entries with title/company/dates/location/description in ~25 s (was timing out at 60 s).
  • `profile-projects --profile-url ` still returns the projects list (regression check).
  • `connect --profile-url --expected-name ` now returns `connectable: true` (was `already_connected`). With `--send true`, real invite delivered and `delivery_verified: true`.
  • `connect` on a profile you ARE connected to still correctly returns `already_connected` (no regression).
  • `thread-snapshot --thread-url ` returns `headerNames: [""]` only (was returning 10+ unrelated names). `messages` array still complete (41 messages on the test thread). `snapshot_json` no longer contains any unrelated conversation text.

All three fixes are independent. Happy to split into separate PRs if preferred, or to land any subset.

🤖 Generated with Claude Code

archits01 and others added 6 commits June 10, 2026 18:01
profile-experience.js and profile-projects.js passed `timeout: 10000`
to page.wait({text/selector, timeout}) assuming milliseconds. The
page.wait() implementation in base-page.js treats `options.timeout` as
seconds and multiplies by 1000, so the adapters were actually waiting
10,000,000ms (~2.8 hours) instead of 10s.

In practice the command-level 60s timeout fires first, so
profile-experience always failed with "linkedin/profile-experience
timed out after 60s". profile-projects happened to work because
"Projects" text rendered fast enough that the wait resolved before the
60s cap — but the bug was still present.

Fix: change `timeout: 10000` → `timeout: 10` (4 occurrences across the
two files), bringing them in line with how search.js uses the same API
(`timeout: 8`, `timeout: 10`).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…nect button is present

buildProfileProbeScript scans ALL visible buttons on the profile page
for "Message" text (line 143) and ALL connect anchors (line 145), then
the safety logic checks alreadyConnected BEFORE connectAvailable. On
LinkedIn profiles, a stray "Message" element in the sidebar, "People
also viewed" cards, or premium-tier overlays trips the
alreadyConnected flag even when the main Connect button is visible.

Result: `connect` reports `status: not_connectable, reason:
already_connected` for profiles you've never connected with.

Repro: any profile of someone you're not connected to who has any
sidebar widget rendering a "Message" button. Empirically reproduced on
multiple unconnected profiles (Marie Curie, Sundar Pichai namesakes).

Fix: gate the alreadyConnected check on !connectAvailable. If the
profile clearly has a Connect button, prefer that signal. The Connect
button is the authoritative tell — if it's there, you can connect; if
it's not AND there's a Message button, then you're connected.

Minimal-diff approach: single line change at the safety check.
Alternative cleaner fix would restructure the probe to scope the
button scan to the top profile card, but this matches the existing
code shape with smallest risk surface.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…debar leak

The thread-snapshot adapter exposed two leaks of unrelated content via
the `snapshot_json` field:

1. `bodyText: refreshedText` (return object) dumped the entire
   `document.body.innerText`, including the messaging sidebar's
   conversation list with previews of other people's threads,
   "Sponsored" InMails, and the navigation chrome. Anyone consuming
   snapshot_json got every conversation preview the user could see.

2. `headerSelectors` collected names from 9 selectors including
   unscoped `main h1`, `main h2`, `[data-anonymize="person-name"]`,
   `a[href*="/in/"]` and `.msg-conversation-card__participant-names`
   — all of which match sidebar conversation cards. Result:
   `headerNames` returned every contact in the user's recent inbox.

Empirical repro on a real thread: `headerNames` contained 10 names
including the active thread participant AND 8 unrelated sidebar
contacts. The exposed `bodyText` field was ~5KB of unrelated
conversation previews + nav chrome.

Fix:
- Remove `bodyText: refreshedText,` from the return object. The
  internal `refreshedText` variable is still used to compute the
  `latestMessageText` fallback inside the IIFE — that logic is
  unchanged. Only the externally-exposed output drops it.
- Reduce `headerSelectors` to 3 thread-scoped selectors:
  `.msg-thread__link-to-profile`,
  `.msg-thread__link-to-profile span[aria-hidden="true"]`, and
  `.msg-thread .msg-entity-lockup__entity-title` (the .msg-thread
  prefix gates it to the active conversation only). Recipient
  extraction (consumer reads headerNames[0]) continues to work
  because `.msg-thread__link-to-profile` is the canonical thread
  header link.

After the fix: `headerNames` contains only the active thread
participant. `messages` array is unchanged (its selectors were already
scoped to `.msg-s-message-list__event` etc.). No test changes — the
existing thread-snapshot.test.js does not assert on bodyText or on
unrelated headerNames.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The cards selector at line 26 scans for `li, div, section, article`
elements whose text contains "withdraw". This matches not only the
real invitation cards but also their wrapping section element — the
wrapper's innerText includes the page header counter (e.g.
"People (81)") at the top of the list.

The fallback name-extraction at line 39 (lines.find skipping a few
common keywords) then picks "People (81)" as the row's name.

Result: rank 1 of sent-invitations output is a phantom row with
`name: "People (81)"` and the profile_url of whatever real invite
happens to be the first descendant link. Empirical repro: rank 1
returned `name: "People (81)", profile_url: <real-marie-curie-url>`,
and rank 2 returned the real `name: "Marie Curie",
profile_url: <same-url>` — the actual invitation card.

Fix: drop rows whose parsed name matches header-counter shapes:
`/^people\s*\(\d+\)$/i` and `/^\d+\s+invitations?$/i`. Minimal diff
(3 lines), keeps the rest of the parsing path intact.
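A minimal sketch of the header-counter guard, using the two regular expressions quoted above. `isHeaderCounter` and `dropPhantomRows` are hypothetical names; the row shape (`name`, `profile_url`) follows the output fields described in this commit message.

```javascript
// Header-counter shapes quoted in the fix. A row whose parsed name matches
// one of these came from the wrapping section, not an invitation card.
const HEADER_COUNTER_PATTERNS = [/^people\s*\(\d+\)$/i, /^\d+\s+invitations?$/i];

function isHeaderCounter(name) {
  const trimmed = String(name || '').trim();
  return HEADER_COUNTER_PATTERNS.some((re) => re.test(trimmed));
}

// Hypothetical post-filter over already-parsed rows.
function dropPhantomRows(rows) {
  return rows.filter((row) => !isHeaderCounter(row.name));
}

const rows = [
  { name: 'People (81)', profile_url: 'https://www.linkedin.com/in/example/' },
  { name: 'Marie Curie', profile_url: 'https://www.linkedin.com/in/example/' },
];
console.log(dropPhantomRows(rows).map((r) => r.name)); // [ 'Marie Curie' ]
```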

A cleaner fix would scope the cards selector to a known invitation-row
class instead of `li, div, section, article` — but the header rows
already provide a stable signature, and this matches the existing
defensive style elsewhere in the script.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…olution needed)

LinkedIn's standard people-search page (/search/results/people/) accepts
several filter knobs as URL query parameters whose values are
JSON-stringified. The adapter previously built only `?keywords=<encoded>`,
matching opencli's existing CLI surface of "keywords + limit", while the
underlying LinkedIn UI exposes far more.

This commit adds 7 new flags for the filters that don't require any
ID/URN resolution against LinkedIn's typeahead — they accept free text
or a small enum and get passed straight through:

  --first-name <text>         -> firstName="<text>"
  --last-name <text>          -> lastName="<text>"
  --title <text>              -> title="<text>"
  --school-keyword <text>     -> schoolFreetext="<text>"
  --network 1,2,3             -> network=["F","S","O"]
                                 (1=1st, 2=2nd, 3=3rd+; CSV maps to F/S/O)
  --profile-language en,fr    -> profileLanguage=["en","fr"]
                                 (ISO 639-1 2-letter codes, validated)
  --open-to proBono,boardMember -> serviceCategory=["proBono","boardMember"]
                                 (validated against LinkedIn's enum)
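The two mapped/validated flags above can be sketched as small parsers. This is a hypothetical illustration, not the adapter's code: `parseNetwork` and `parseProfileLanguage` are invented names, lower-casing language codes is an assumption, and the `--open-to` enum is omitted because its full value set is not listed here.

```javascript
// --network CSV (1,2,3) -> F/S/O, per the mapping above.
const NETWORK_CODES = { 1: 'F', 2: 'S', 3: 'O' };

function parseCsv(value) {
  return String(value ?? '')
    .split(',')
    .map((s) => s.trim())
    .filter(Boolean);
}

function parseNetwork(value) {
  return parseCsv(value).map((v) => {
    const code = NETWORK_CODES[v];
    if (!code) throw new Error(`--network: expected 1, 2 or 3, got "${v}"`);
    return code;
  });
}

// ISO 639-1 codes are two letters. This checks shape only, not whether the
// code is actually assigned; normalising case is an assumption of this sketch.
function parseProfileLanguage(value) {
  return parseCsv(value).map((v) => {
    const lower = v.toLowerCase();
    if (!/^[a-z]{2}$/.test(lower)) {
      throw new Error(`--profile-language: expected a 2-letter ISO 639-1 code, got "${v}"`);
    }
    return lower;
  });
}

console.log(parseNetwork('1,2,3')); // [ 'F', 'S', 'O' ]
console.log(parseProfileLanguage('en,fr')); // [ 'en', 'fr' ]
```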

`buildSearchUrl` now takes an options object instead of a bare keywords
string. The function preserves byte-for-byte compatibility for the
zero-filter case (still emits `?keywords=<encoded>`), and accepts a
bare string as input for back-compat with the existing test (which
calls `buildSearchUrl('site reliability engineer')` directly).

Filter values get JSON-stringified then percent-encoded — matching how
LinkedIn's UI re-canonicalises the URL when you click filter chips.
Empirically verified (2026-06-10): the page accepts these param names
on its server-rendered search results route, applies the filters, and
shows the same `1st 2nd 3rd+` chips it would for a UI-driven filter.
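A minimal sketch of the URL-building behaviour described above: bare-string back-compat, a keywords-only URL in the zero-filter case, and JSON-stringify-then-percent-encode for each filter. `buildSearchUrlSketch`, the option keys, the parameter ordering, and the use of `encodeURIComponent` are assumptions; the real `buildSearchUrl` may differ in those details.

```javascript
const SEARCH_BASE = 'https://www.linkedin.com/search/results/people/';

// Hypothetical option key -> LinkedIn query parameter, per the flag table.
const PARAM_NAMES = {
  firstName: 'firstName',
  lastName: 'lastName',
  title: 'title',
  schoolKeyword: 'schoolFreetext',
  network: 'network',
  profileLanguage: 'profileLanguage',
  openTo: 'serviceCategory',
};

function buildSearchUrlSketch(input) {
  // Back-compat: a bare string is treated as { keywords }.
  const opts = typeof input === 'string' ? { keywords: input } : { ...input };
  const parts = [`keywords=${encodeURIComponent(opts.keywords ?? '')}`];
  for (const [key, param] of Object.entries(PARAM_NAMES)) {
    const value = opts[key];
    if (value == null || (Array.isArray(value) && value.length === 0)) continue;
    // JSON-stringify first (strings gain quotes, arrays become ["F","S"]),
    // then percent-encode the result.
    parts.push(`${param}=${encodeURIComponent(JSON.stringify(value))}`);
  }
  return `${SEARCH_BASE}?${parts.join('&')}`;
}

console.log(buildSearchUrlSketch('site reliability engineer'));
// https://www.linkedin.com/search/results/people/?keywords=site%20reliability%20engineer
console.log(buildSearchUrlSketch({ keywords: 'sre', title: 'manager', network: ['F', 'S'] }));
// https://www.linkedin.com/search/results/people/?keywords=sre&title=%22manager%22&network=%5B%22F%22%2C%22S%22%5D
```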

A follow-up commit will add ID-resolution filters (current-company,
past-company, industry, location, school by name) by calling LinkedIn's
typeahead Voyager endpoint.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…r yet)

Adds direct ID-acceptance for LinkedIn's structured filters that
normally require a typeahead lookup to resolve a name to a URN/ID:

  --current-company-id <id1,id2,...>   -> currentCompany=["<id>",...]
  --past-company-id <id1,id2,...>      -> pastCompany=["<id>",...]
  --industry-id <id1,id2,...>          -> industry=["<id>",...]
  --location-id <id1,id2,...>          -> geoUrn=["<id>",...]
  --school-id <id1,id2,...>            -> schoolFilter=["<id>",...]

Empirically verified (2026-06-10) that LinkedIn's
/search/results/people/ route accepts these param values as
JSON-stringified arrays of bare numeric IDs — the full
`urn:li:fs_<type>:<id>` prefix is not required. Confirmed by:

  ?currentCompany=["1441"]    -> filtered to Google employees
  ?geoUrn=["105214831"]       -> filtered to Bengaluru-located profiles

Each flag accepts either a bare numeric ID or a full URN; URNs are
stripped to their numeric tail before sending.
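A minimal sketch of the ID normalisation and encoding described above, assuming URNs of the `urn:li:<type>:<id>` shape mentioned in this commit. `normalizeId`, `idFilterParams`, and the camelCase option keys are hypothetical names for illustration.

```javascript
// Hypothetical option key -> query parameter, per the flag table above.
const ID_FILTER_PARAMS = {
  currentCompanyId: 'currentCompany',
  pastCompanyId: 'pastCompany',
  industryId: 'industry',
  locationId: 'geoUrn',
  schoolId: 'schoolFilter',
};

// Accept "1441" or a URN such as "urn:li:fsd_company:1441"; keep the numeric tail.
function normalizeId(raw) {
  const match = /^(?:urn:li:\w+:)?(\d+)$/.exec(String(raw).trim());
  if (!match) throw new Error(`expected a numeric ID or URN, got "${raw}"`);
  return match[1];
}

// Each flag value is a CSV; the result is a JSON array of bare numeric IDs,
// percent-encoded, e.g. currentCompany=["1441"].
function idFilterParams(opts) {
  const parts = [];
  for (const [key, param] of Object.entries(ID_FILTER_PARAMS)) {
    if (!opts[key]) continue;
    const ids = String(opts[key])
      .split(',')
      .map((s) => s.trim())
      .filter(Boolean)
      .map(normalizeId);
    if (ids.length) parts.push(`${param}=${encodeURIComponent(JSON.stringify(ids))}`);
  }
  return parts.join('&');
}

console.log(idFilterParams({ currentCompanyId: '1441', locationId: 'urn:li:fs_geo:105214831' }));
// currentCompany=%5B%221441%22%5D&geoUrn=%5B%22105214831%22%5D
```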

This is a "v2-light" implementation. A future commit could add a
typeahead-based name resolver (`--current-company "Google"` ->
resolve to "1441"), but LinkedIn's classic Voyager typeahead endpoint
(`/voyager/api/typeahead/hits`) was removed/moved in 2024 to a GraphQL
endpoint with rotating queryId hashes. Implementing that cleanly is a
separate, larger lift; the ID-direct flags ship now so the filter
surface is usable today.

To find an ID: visit the relevant page in LinkedIn UI (e.g. a
company's `/company/<slug>/` page), open the network panel, look for
the `fsd_company` URN in the response; or apply the filter once via
the search results UI and read the resulting URL's query string. The
flag --help text documents this lookup pattern.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
