This project provides a game engine and analysis tooling to investigate word choices and other patterns in the game. It contains components to run static or interactive analysis in the browser and via CLI.
Wordle is a trademark of The New York Times Company. This project is not affiliated with or endorsed by The New York Times. The word lists included in this project originate from Wardle's publicly visible code and are common online. The author is not aware of the NYT attempting to restrict distribution of these word lists.
A dev.md file exists and contains some minimal info.
- `Analysis` can run a game with a known word and a known `Strategy`
- TODO add more details on what it can do?
- `Game` can run a game where it knows the word and a UI can let the player make moves
- `Game` can run a game where it doesn't know the word and is supporting a `Strategy` trying to guess a player's word.
- `Core` can partition a set of words into groups that would look the same after a specific word is played.
- `Suggester` flexibly provides playable words from the strategies it is given. It can pick words only from possible answers or from the full word list.
- The number of words chosen from each strategy, how they are picked, and how strictly that should be followed when the options are limited are all configurable.
- TODO This may need to support differentiating deterministic and non-deterministic strategies and a way to pre-compute them so that they're fast. Although: deterministic strategies will have to handle different states if they suddenly have unexpected word choices and thus an unexpected state.
- TODO If this is all in one worker, it can cache partitions for each word being considered, given a current game state, making strategies slightly faster.
See strategies and filters for info about those options.
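The partitioning that `Core` performs can be sketched as follows. This is a minimal illustration in Python, not the project's code: the function names `grade` and `partition` and the `G`/`Y`/`-` pattern encoding are assumptions made for the example.

```python
from collections import Counter, defaultdict


def grade(guess: str, answer: str) -> str:
    """Return a color pattern: G = green, Y = yellow, - = gray."""
    pattern = ["-"] * len(guess)
    # Answer letters that are not exact matches; only these can earn a yellow.
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = "G"
        elif unmatched[g] > 0:
            # Yellows are handed out left to right until the letter runs out.
            pattern[i] = "Y"
            unmatched[g] -= 1
    return "".join(pattern)


def partition(guess: str, candidates: list[str]) -> dict[str, list[str]]:
    """Group candidate answers by the pattern the guess would produce."""
    groups: dict[str, list[str]] = defaultdict(list)
    for word in candidates:
        groups[grade(guess, word)].append(word)
    return dict(groups)


groups = partition("light", ["might", "night", "fight", "lobby"])
# Words in the same group look identical to the player after playing "light".
```

Here `might`, `night`, and `fight` land in one group (`-GGGG`) while `lobby` lands alone (`G----`), so the guess cannot tell the first three apart.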
- Command line
  - Run `./wordle` to play in the console. Use `-h` to see options. The `TerminalIO` abstraction supports both native terminal and xterm.js rendering
    - User plays against a computer-chosen word
      - Basic (the default mode): player must generate their own guesses without help
      - `-e`/`--explain`: Shows how many words are left, how the guess ranks according to a fixed `Strategy`, and what the best word would have been.
      - Quick-play `-q`/`--quickplay`: User is given a short list of plausible guesses on each turn
        - Pressing a number key fills in the corresponding word
    - `-g`/`--grade`: Computer plays against human word, with the human grading the guesses
      - Use `-w` to supply the word in advance and watch the computer play
      - Use `-e` to see words remaining on the committed line
      - The computer picks its first guess randomly from a small pool of good openers
      - Use ↑↓←→, WASD, or `Y` `G` `Space` `Delete` to grade the guess, then press Enter. Use Ctrl+Z to undo a committed grading.
- Analysis HTML page
- Explore strategies, see consequences
- See ui-spec
- Rough design:
- Card: mostly dumb, any info is pulled from the one thing inside it.
- Can highlight itself when any other card with the same channel is active (LATER)
- Component: inputs or outputs a bundle of things (e.g. a game, strategy, etc). Kinda vague... and doesn't deal with cases where the game is shared but the strategy isn't.
- Either analysis or selector
- Component layout: abstracts multiple components into a visible shape, able to be slotted into a card instead of a single component. But this way the card only has one thing inside it.
- Channels correlate to game or strategy stores
- Word list channel is LATER
- E.g. One card for either playing a game or entering an existing game state, then one card that just shows the current constraints in a different format, then one that allows exploring the full space of remaining 5 letter "words", then one that shows a decision tree for a particular algorithm. So one source of data can influence stats shown elsewhere.
- Selector components for strategy
- maybe one selector shows a minimal-ish set of options in an always-displayed config area, while another shows a gear icon with a popover? Or maybe some analyses work well for comparing two things and it's a mix of the two styles, with the selected feature for comparison always shown, and other settings a little more hidden
- Component to visualize the implicit decision tree (for deterministic strategies)
- Component to see the constraints. Useful for fact checking any typed inputs that an AI transforms.
- Quick-play component: choose a word from a prepared menu on each turn, using a combination of random words and best or near-best words from multiple strategies.
- Maybe allow filtering to options that include one player-chosen letter.
- Separate pages for pre-built static and/or dynamic analysis report, and for build-your-own.
- Any text-based analyses? E.g. that put the analysis into words instead of graphics.
- Hypotheses and notes
- The information-theoretic approaches likely provide similar quality of choices
- Always picking a worse word by a metric, such as 90% best, might do worse regardless of which two information-theoretic strategies are used for top-1 and beats-90%.
- Potential hypothesis «luck dominates between strategies that pick any reasonable top-quartile guess, because follow-up play recovers from a slightly suboptimal choice.»
- There are different objectives: average moves, avoid high numbers of moves (e.g. within 6 moves). But the odds of needing more than 6 moves might already be low for any decent algorithm, other than dealing with traps in hard mode. So they might look pretty similar, again, other than the impact of traps in hard mode.
- Full-depth trees for either objective might be meaningfully different. But we aren't calculating those yet.
- Could show quality of a move in different strategies (score and order)
- Could show this for a random sample of words. Inversions of order might be particularly useful to show, but that's harder to calculate. Well, maybe not that hard: calculate scores for options, then look for examples that differed by a lot between any two measures, or look at max and min values that differ a lot, maybe also ensuring each algorithm gets some representative highs and lows. Not sure if a graph would help at all. There's sort of too many words to do the snake graph thing (bump/slope chart) where options get re-ordered. Maybe one order is used as the baseline, it gets colors assigned, and then the re-ordered words are shown with that color system?
- React in Claude
- TBD, but likely things based on the CLI and Analysis UIs, plus support for anything the skill needs.
- Some kind of coach
- Suggest some possible, decent moves
- Trap detection: "You have 6 remaining words and they differ only in the first letter: _IGHT. You can't distinguish them one at a time. Do you have a guess that tests multiple first letters at once?"
- What else would a coach do?
- Give some gamestate statistics
- How many words remain
- Expected move count to win
- Give move statistics (what-if analysis and post-move analysis)
- How many groups are there, what would the distribution of their sizes look like?
- Could offer features inspired by the official analysis
- For instance rating the "luck" of a move vs the optimalness.
- Maybe estimate popular human moves, based on biases or other properties?
- Only if directly asked, give the move that a specified algorithm would select
- Initialize a game with a random word and let the player play it?
- Ability to import a game that's in the middle of play. E.g. words played and their outputs grey/green/yellow or simply a summary of what's known.
- Awareness of hard mode or other constraints.
- Structural note: interacting with an active game artifact should be different from a self-contained question and answer.
- Tool to identify likely first guess words of friends. E.g. given the letterless (color-only) info for the first line and the actual word of the day, find compatible words, especially given data for multiple days.
- Perhaps enable transforming pictures of games into a set format? E.g. json, markdown, or a flexible html layout. Or enabling custom strategies to be exported in some way that lets them be verifiably consistent across sessions.
- Check if a word is in the wordle list. Or possibly just find words that have a property via a filter.
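The friend's-first-guess idea above could work roughly like this. It is a sketch under assumptions: the names `grade` and `compatible_openers` are hypothetical, the pattern encoding (`G`, `Y`, `-`) is invented for the example, and it assumes the friend opens with the same word every day.

```python
from collections import Counter


def grade(guess: str, answer: str) -> str:
    """Return a color pattern: G = green, Y = yellow, - = gray."""
    pattern = ["-"] * len(guess)
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = "G"
        elif unmatched[g] > 0:
            pattern[i] = "Y"
            unmatched[g] -= 1
    return "".join(pattern)


def compatible_openers(observations, guess_list):
    """Keep guesses that reproduce every observed (answer, pattern) pair.

    observations: one (word_of_the_day, first_row_pattern) pair per day.
    """
    return [
        word
        for word in guess_list
        if all(grade(word, answer) == pattern for answer, pattern in observations)
    ]


guess_list = ["slate", "plate", "crane"]
one_day = compatible_openers([("crate", "--GGG")], guess_list)
two_days = compatible_openers([("crate", "--GGG"), ("spine", "G---G")], guess_list)
```

With one day of data both `slate` and `plate` remain possible. The second day rules out `plate`, which is why data from multiple days helps.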
- Smallest average group size
- This is the same as maximizing the number of groups
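- Entropy minimization: minimize the expected entropy of the words remaining after the guess
- This is the same as minimizing the sum of n*lg(n) across groups, where n is the size of each group.
- Minimize the number of words in the same group as a word
- This is the same as minimizing the sum of squared group sizes, because the expected size of the answer's group is the sum of n² across groups divided by the total number of words.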
- Minimax group size
- Something using entropy better? Essentially optimizing over multiple moves instead of just one
- Filtered Strategies: limit the words another strategy considers by applying zero or more `Filters`.
- Actively add randomness to the strategy so that the number of guesses needed for any particular word is minimized. The purpose is to make a strategy that can't be forced into taking the maximum turns, even when the opponent knows the strategy, like playing rock, paper, and scissors randomly with equal frequency. Keywords: Nash Equilibrium, repeated games, mixed strategy.
- Minimize the maximum expected guesses to find a word. No matter what word the adversary picks, the expected number of guesses needed should be minimized.
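The single-step scores listed above (group count, entropy, same-group size, minimax) can all be computed from the group sizes a guess produces. The sketch below is illustrative only: the function name and dictionary keys are assumptions, not the project's API.

```python
import math


def single_step_scores(group_sizes: list[int]) -> dict[str, float]:
    """Score one guess from the sizes of the groups it splits the words into."""
    total = sum(group_sizes)
    return {
        # More groups is equivalent to a smaller average group size.
        "group_count": len(group_sizes),
        "average_group_size": total / len(group_sizes),
        # Expected bits still unknown after the guess: sum(n * lg n) / N.
        "expected_remaining_bits": sum(n * math.log2(n) for n in group_sizes) / total,
        # Expected size of the group containing the answer: sum(n^2) / N.
        "expected_group_size": sum(n * n for n in group_sizes) / total,
        # Minimax looks only at the worst case.
        "max_group_size": max(group_sizes),
    }


even_split = single_step_scores([3, 3, 2])
lopsided = single_step_scores([5, 1, 1, 1])
```

The two example splits show how the metrics can disagree: `[5, 1, 1, 1]` wins on group count (4 vs 3), while `[3, 3, 2]` wins on maximum group size (3 vs 5) and on expected group size (2.75 vs 3.5).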
Filters can be used post-strategy to find a subset of ranked results, or they can be used to filter the inputs to a strategy using a `FilteredStrategy`. Some filters are designed more for pre-filtering and some more for post-filtering, but they can all be used either way. Post-filtering is useful for a human who wants words from a strategy that have some particular property.
- Letter Exploration: only use unexplored letters, if possible
- Vowel Exploration: try new vowels where possible. This prefers checking for the existence of other distinct vowels instead of locating yellow vowels.
- Anti Vowel Exploration: only use vowels that have already been tried, if possible
- Must Contain: accepted words must contain the full specified letter multiset
- Scrabble: accepted words must only use letters in the specified letter multiset
- Keyboard: accepted words must only use letters in the specified letter set
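The difference between the multiset filters (Must Contain, Scrabble) and the set filter (Keyboard) is easiest to see in code. This is a sketch with hypothetical function names. The `filter_or_fallback` helper is one guess at how the "if possible" behavior might work, not a description of the actual implementation.

```python
from collections import Counter


def must_contain(word: str, letters: str) -> bool:
    """Word must include every required letter, counting repeats."""
    have = Counter(word)
    return all(have[ch] >= n for ch, n in Counter(letters).items())


def scrabble(word: str, letters: str) -> bool:
    """Word may use each available letter at most as often as it is supplied."""
    rack = Counter(letters)
    return all(rack[ch] >= n for ch, n in Counter(word).items())


def keyboard(word: str, letters: str) -> bool:
    """Word may use only the allowed letters, any number of times."""
    return set(word) <= set(letters)


def filter_or_fallback(words, predicate):
    """Apply a filter, but keep the original list if nothing would survive."""
    kept = [w for w in words if predicate(w)]
    return kept if kept else list(words)
```

For example, `scrabble("sleep", "elps")` is false because only one `e` is supplied, while `keyboard("sleep", "elps")` is true because letters on a keyboard can be reused.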
- Analysis HTML UI
- Decision tree
- Show max moves?
- Show tiny bar chart?
- Add arg/toggle for words vs bits?
- Add max height to columns
- Allow choosing a different word for comparison?
- Alternate tree views? Collapsible tree or treemap? But will struggle on turn 1 with 120-140 groups.
- Once game board is added, make it re-anchor when a game exists and a move is made
- Also: optionally track the partial input for the game and use that instead of the best move for the current game state
- Game board
- Separately: visualize constraints
- A word has up to 5 letters. Could just show one row per letter found with green/yellow per square. Then a list of gray letters.
- If built, make it possible to switch between col major and row major views.
- Not really liking this.
- Or could show per slot: The letter known or else the letters it excludes. Still a separate list of gray letters.
- Per-position possibility sets: may need to avoid moving the letters. Maybe two or three columns per position, with a dividing line to keep positions separate?
- At three columns, could use a rotated keyboard layout.
- Or use a single through-line for each letter, with it effectively dropping out if it doesn't go there. Could combine with the letter frequency idea, varying width and/or color intensity based on remaining word compatibility. Color itself could reflect yellow vs green vs untested. To make the lines denser, can use two columns for the letter labels, with the bars reaching in between the letters of the second col to reach the first (think witch's stairs). Show a double letter label when a letter has two yellows. (Recall: 3 yellows is impossible for five letter words)
- Hover or tap to highlight a row. Tap ought to also support dragging across them to highlight whichever is currently under the tap? Sounds a bit complicated. See if it looks good first.
- Letter frequency over remaining words. Bar or sparkline? Better choice may just be normalizing the counts and using color intensity within a set range for the letters that have non-zero instances. Then another color or visual style for technically possible but incompatible with remaining words. And technically impossible letters (known gray tile or yellow tile) are just missing.
- Something venn-like or a bit amorphous/fluid?
- Ideally does know when a letter appears multiple times or has both a green and a yellow (or two!) remaining.
- Separately: word entry (virtual keyboard)
- Maybe use dots in the corner of the key? E.g. 1 green per green, 1 yellow per yellow, and grey for "no more than that". Main question is the primary bg color and any outline.
- Should it lock to the bottom of the screen? By default or optionally? Useful on mobile, not otherwise. So probably optionally on narrow screens. It would be too tall to lock on short screens. If it doesn't know the page size well enough, then don't do this. Maybe that means it needs to be set up programmatically by passing a reference to the screen?
- Hard mode toggle
- Other analysis components? E.g. compare across strategies
- Pre-built composite components (LATER)
- Brute force options list
- When connected to a game, while the player is typing a guess: treats entered letters, other than ones that disagree with the constraints, as green. Or maybe treat all typed letters as green, even if they disagree with constraints? LATER
- Should consider how the scroll position is preserved while the player types or deletes, and after committing a guess.
- Distribution Chart
- Determine how the average is calculated, document. (does it include words past 6 guesses? Coerce them to 6 or 7?)
- Strategies: adjust scores for display to normalize values. E.g. avg group size, expected Shannon entropy, expected group size, max group size. Instead of just using unnormalized values when the denominator is always the same.
- Add descriptions for strategies?
- Card
- Highlight when channel active elsewhere? LATER
- Consider additional themes? E.g. move the distinct header/content colors of the current chart demo card into its own color scheme
- On bar hover, show representative words? Or all words?
- Strategy selector
- Additional variants? E.g. a popover from a settings icon.
- Idea: could have a popover settings for whole card, then one for each component. The card can specify whether something is a default or locked. Then the individual components don't offer that setting. Combine with a +component design so a user can add/remove components and compare how they want.
- Given every first word, how do the information theoretic approaches do on subsequent turns? Might go in a report page (or component) where finalized calculations and the code to make them are both stored?
- This is effectively looking at each strategy's resilience to first-turn perturbation. Perhaps there's a comparable property for looking at global expected entropy or similar before turn 3 or 4?
- Filters: tweak vowel anti-exploration? Maybe others? e.g. don't retry known bad letters
- The letter and vowel exploration filters do nothing in hard mode because we are already limited to words that match what we know. Either the filter removes every remaining word and we fall back to the unfiltered set, or no letters have matched yet and the filter is a no-op anyway. The word list being used (hard mode on/off, all or only possible answers) needs to be clearer in the UI.
- Test suite: HTML ?
- Revisit the three worker classes and how they relate?
- CLI: analysis? Delay until after HTML UI
- Cache precomputed rankings for first-turn guesses?
- Probably create a script that knows the best first move on a few dimensions at once: all-guesses vs valid answers, hard mode on/off (won't matter for information-theoretic first turns, only exhaustive search), strategy, maybe certain filters. Assumes unchanged answer-word list. Probably include a hash of the list to validate the list and computed values match. Maybe warn somewhere in a build script if the computations are stale.
- Might also cache related statistics for that first move? But if only needing stats for one guess, it shouldn't be too bad to recompute.
- For some strategies, like the exhaustive ones or allowing all valid guesses, caching the second turn might also be useful.
- May also want some kind of pool of first words of varying quality? A cache of some kind will help with the suggestions.
- Claude Skill
- Test suite: Claude Skill ?
- Strategy: full-depth calculation (min avg or minimax depth)
- Strategy: mixed strategy nash equilibrium ?
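The idea of storing a hash of the word list next to cached first-turn rankings, so stale caches can be detected, might look like the sketch below. The cache layout, the key name `word_list_sha256`, and the function names are all assumptions for illustration.

```python
import hashlib


def word_list_hash(words: list[str]) -> str:
    """Hash the word list independent of order, so reordering is not 'stale'."""
    joined = "\n".join(sorted(words)).encode("utf-8")
    return hashlib.sha256(joined).hexdigest()


def cache_is_fresh(cache: dict, words: list[str]) -> bool:
    """True when the cache was computed for exactly this set of words."""
    return cache.get("word_list_sha256") == word_list_hash(words)


answers = ["crane", "slate"]
cache = {
    "word_list_sha256": word_list_hash(answers),
    # Hypothetical payload: best first move per configuration.
    "best_first_move": {"entropy/all-guesses/normal": "slate"},
}
```

A build script could call `cache_is_fresh` and print a warning when it returns false. Sorting before hashing is a design choice: it treats a reordered list as unchanged, which is only right if rankings do not depend on list order.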
This was initially generated by LLM summary of a discussion.
- Do single-step heuristics (group count, x², entropy, minimax) produce meaningfully different average guess counts?
- How much does the distribution overlap vs diverge? Is variance within a strategy (across words) larger than variance between strategies (for the same word)?
- Variance within a strategy: entropy (for example) solves some words in 2 guesses and others in 5. The spread across different answer words is large — maybe a standard deviation of 0.8 guesses. (AI Hypothesis)
- Variance between strategies: for the same answer word, entropy takes 4 guesses and x² also takes 4, or maybe 3. The difference between strategies on any given word is small — usually 0 or 1 guess. (AI Hypothesis)
- But note that those aren't very comparable. So we might care more about cases where strategies produce 2 or more turns of difference for a word. Not quite sure how this should be calculated. "bits remaining" when the word was chosen is more like luck and just how it got split when the word was chosen. Naming and showing the variances is useful, but good framing for them or how to explore them in an interesting way is more difficult.
- Which specific words cause the most disagreement between strategies? What properties do those words share?
- How does the "penalty curve" differ — where does entropy tolerate a large group that x² wouldn't, and vice versa?
- For a given game state, compare the top-k word recommendations across strategies: which words appear in multiple strategies' top lists, which are unique to one?
- Given a guess and remaining words, what does the partition look like? Group count, size distribution, largest group.
- Compare two guesses' partitions side by side.
- "What if" exploration: pick a guess, pick an outcome group, see the next level of partitions. Essentially walking the decision tree interactively.
- For deterministic strategies, show the full pre-computed decision tree (or a subtree rooted at the current state).
- Does a filter (avoid explored letters, vowel exploration) change which guess a strategy picks? How often?
- How does filter auto-disable behavior interact with game progression — at what turn or remaining-word-count do filters typically deactivate?
- Compare filtered vs unfiltered strategy performance distributions.
- Identify trap states in the word list: groups of words differing in only one position. How many exist? How large?
- How do different strategies handle a known trap state? Does minimax outperform entropy here as predicted?
- Hard mode vs normal mode performance comparison per strategy. Where does the constraint hurt most?
- For hard mode specifically: how often does the optimal guess (by any single-step metric) differ from the optimal guess that also satisfies hard mode constraints?
- Shannon entropy gives a lower bound (bits needed). What's the actual average bits obtained per guess? How far is the gap?
- Does the gap between theoretical and achievable information vary by game state? Is it largest in trap-like states?
- Across the full word list, what's the distribution of "first guess information yield" — how many bits does each possible opening guess provide?
- For a completed game: was the answer in a large or small partition group at each step? A player who lands in small groups repeatedly was lucky regardless of strategy.
- Separate "was this a good guess given what was known" from "did the outcome happen to be favorable."
- Could show: expected remaining words after a guess vs actual remaining words, across turns.
- What are the best opening words by each metric? How much do the top-10 lists overlap?
- After a fixed opening word, what second-guess does each strategy prefer? Does this vary more than the opening choice?
- Once heuristic strategy results exist: on the words where strategies disagree, what does the optimal full-tree solution actually do? Does it match any single heuristic, or something none of them found?
- How often does the true optimal guess rank outside the top 10% of any single-step heuristic? This determines whether beam search is safe.
- Human bias catalog items are testable against the word list: which words have unusual positional letter frequencies, atypical phonology, rare bigrams, duplicate letters? These properties could be displayed alongside strategy recommendations as "human difficulty predictors."
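The trap-state question above (groups of words differing in only one position) has a simple first-pass answer: mask each position in turn and bucket words by the masked form. The function name `find_traps` and the `min_size` threshold are assumptions for this sketch, and it assumes the word list has no duplicates.

```python
from collections import defaultdict


def find_traps(words: list[str], min_size: int = 3) -> dict[str, list[str]]:
    """Map a masked pattern like '_ight' to the words that share it."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for word in words:
        for i in range(len(word)):
            # Replace one position with a wildcard; words that differ only
            # in that position end up with the same key.
            buckets[word[:i] + "_" + word[i + 1:]].append(word)
    return {
        mask: sorted(group)
        for mask, group in buckets.items()
        if len(group) >= min_size
    }


sample = ["light", "might", "night", "fight", "crane", "crate"]
traps = find_traps(sample)
```

On the sample list this reports a single trap, `_ight`, with four words. Lowering `min_size` to 2 also reports `cra_e`, which is probably too small to count as a trap in practice.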
- CLI
- If the UI will become more complicated, consider using Ink and ink-web or ink-canvas
- Quickplay
- Flag to not require enter after pressing a number?
- Flag to not even allow a custom word in quickplay?
- Explanation flag to show how the words actually ranked?
- Use a Suggester instead of the weaker reimplementation currently being used
- Grading mode
- Show additional stats on each move, when -e is used? (In addition to words remaining.)
- Support a choice of algorithm? This might belong in the analysis section instead of here.
- Flashcard mode: The player is rewarded for choosing the actual best move, and it's always available. Filler words might not be the next-best words overall, but instead be words that are the best on the next turn or two according to the deterministic algorithm.
- Analyses. Delay at least until after HTML UI for analyses
- «terminal background detection is unreliable, design with mid-range colors that work on both, offer a flag as override, and optionally attempt OSC 11 as a nicety.»
- Support config menus in addition to CLI args? Possibly not
- Mode picker might be most useful GUI
- Config format for things like filtered strategies would need to be determined. This is related to how any HTML component configs could be saved, stored, or specified by a Skill. But config "files" are out of scope for now.
- Maybe start with one game per command run, but then allow it to go repeatedly, possibly with a flag to control that behavior. Repeated runs allow any GUI-specified settings to persist.
- Strategies
- What if we didn't restrict the word at all and any sequence of letters were allowed?
- Mainly postponed because we don't want to check 26^5 (10^7) options. If we can reduce the set of letters we might want to check on the first turn to 16 options, then that's only 10^6, which is still a lot of brute force... But with some thought, this might have a better approach. But letter frequency and place frequency could make this complicated. And we don't necessarily want the most frequent letters, but letters and positions that split as many groups as possible. Which might point towards just trying all the vowels on the first move, since most words will have 2 or 3 of them. Which might imply that we'd want to place each vowel where it is most common in words with only one distinct vowel, but effects on other paths might overwhelm that... So I'm now leaning towards an incremental search algorithm that explores variants of a strategy. E.g. swap two letters, replace a letter, while tracking options discarded. Might need to prove convex properties to trust that, but it might just do ok. And some varied starting choices converging to the same result could provide a sanity check in lieu of a proof.
- Back out guesses from constraints? Filter the list of words to ones that comply with all constraints without implying knowledge beyond those constraints, then make a best effort to pick words that enforce the remaining constraints, until all constraints are matched and no extra ones are introduced.
- Make text sizes more accessible: avoid px size for fonts and anything calculated based on them. This would be a lot of work, most likely, for an unknown number of users. Currently 0 of 0 users who aren't the author want this feature.
- I'm curious about what might bias a human against quickly finding a word.
- Weight words by usage or common knowledge
- E.g. most people may know a word that is relatively rare in some text corpora. A word might be used more often verbally than in formal or informal writing. Or name something that doesn't need to be discussed often but is still commonly known.
- Players might also believe a "dirty" or "sensitive" word is unlikely to be the answer. Note that `penis` is excluded from the default answer list, but `lynch` and `kinky` are included.
- Note: it appears that many plurals and "derived" words like `times` and `timed` are excluded. Even `tired` is excluded.
- Words with letters that aren't common in a particular position. Rareness may be defined relative to other letters known to be in the answer (fixed location or unknown location).
- Words with letters, especially vowels, pronounced differently in that position than other words with the letter in that position. Rareness may be defined relative to other letters known to be in the answer (fixed location or unknown location).
- Words that avoid certain common pairs of letters. E.g. -er, -ed, -th-, -ch-, etc
- Words with 2 or 3 copies of a letter
- Maybe use observations from these properties to simulate how a player might guess when blind to the impact on the remaining word list. Then maybe see how well any one or two properties alone do at picking words that are "hard" for players with the full set of expected biases.
- The simulated biases may need some level of variation, in addition to the randomness? Or maybe a single level of randomness is strong enough.
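Two of the properties above, repeated letters and letters that are rare in their position, can be measured directly from a word list. The sketch below is a starting point only: the function names and feature keys are hypothetical, and nothing here claims these features actually predict human difficulty.

```python
from collections import Counter


def positional_frequencies(words: list[str]) -> list[Counter]:
    """One Counter per position, counting the letters seen there."""
    length = len(words[0])
    return [Counter(word[i] for word in words) for i in range(length)]


def difficulty_features(word: str, words: list[str]) -> dict[str, float]:
    """Rough 'might be hard for a human' features for one word."""
    freqs = positional_frequencies(words)
    total = len(words)
    # Share of list words that have the same letter in the same position.
    shares = [freqs[i][ch] / total for i, ch in enumerate(word)]
    return {
        "max_letter_copies": max(Counter(word).values()),
        "rarest_position_share": min(shares),
    }


word_list = ["crane", "crate", "slate", "mamma"]
mamma = difficulty_features("mamma", word_list)
crate = difficulty_features("crate", word_list)
```

In this toy list `mamma` has three copies of one letter and every letter is unusual for its position, while `crate` shares each letter-position pair with at least half the list.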
Things considered but skipped for now
- Word lengths other than 5
- i18n
- Allow selecting words from the larger valid word list
- Filters ("strategy subvariants") currently are designed to be hard, prepass filters with self-disable rules. That makes the logic clear and legible, with the input surface solely using integers. Filters can also be applied post-strategy.
- Slightly more complex logical rules are possible but generally out of scope.
- An alternative further out of scope would be to create and combine quality weights. But the process for doing that isn't obvious. How do we penalize letters that we've already used: multiply by 1 for each unused letter, .1 for each grey letter, .2 for each yellow or green letter in a position we haven't tried yet, and 0.05 otherwise? That might be too extreme. What's the right level of penalty for our goal? And how do we convert the preferences of any given strategy into a weight? What do the interactions between those weights do? How do we combine the filter and strategy weights? Multiply them? Take the minimum of the two, plus some portion of the larger one, but capped at twice the minimum and renormalized since that allowed the weights to go over 1? To avoid renormalizing, we could do `lo + min(lo, hi/x)*(1-lo)` or just `lo + (1-lo)*lo*hi`, but do we expect either of those to genuinely produce "good" or "meaningful" results for "good" choices of filter and strategy? The choice for combining likely depends on how much signal is intended to come from both halves, and possibly how many orders of magnitude the weights span.
- See bias estimation doc. Extending player biases and player move analysis into guessing how a specific player would play in response to a word.
- Completely censor a friend's first turn until I've finished the daily game, so I don't accidentally gain knowledge from it.
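For the weight-combining idea in the out-of-scope notes above, the two candidate formulas can be compared numerically. This sketch assumes both weights lie in [0, 1], that `lo` and `hi` mean the smaller and larger of the two, and that `x` is a tunable divisor of at least 1. The notes do not define `x`, so the default of 2 here is arbitrary.

```python
def combine_capped(filter_weight: float, strategy_weight: float, x: float = 2.0) -> float:
    """lo + min(lo, hi/x) * (1 - lo): never exceeds 1 or twice the minimum."""
    lo, hi = sorted((filter_weight, strategy_weight))
    return lo + min(lo, hi / x) * (1 - lo)


def combine_product(filter_weight: float, strategy_weight: float) -> float:
    """lo + (1 - lo) * lo * hi: a smoother variant with no extra parameter."""
    lo, hi = sorted((filter_weight, strategy_weight))
    return lo + (1 - lo) * lo * hi


capped = combine_capped(0.2, 0.8)    # about 0.36
product = combine_product(0.2, 0.8)  # about 0.328
```

Both stay within [0, 1] without renormalizing, and both return 0 when either weight is 0, so a hard filter rejection still wins. Whether the resulting ordering is meaningful remains the open question raised in the notes.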