If you're not a "Never tell me the odds!" type of person, read on to see the most efficient way of solving Wordle puzzles...
TL;DR: Guess “trade” and “lions” for the first two words. If you want to guess a third, use “chump”. Together, those three words will turn up at least 3 of the 5 letters about 80% of the time.
The gory details…
In the last week or so, I started seeing posts like this on Twitter:
Wordle 204 5/6
⬜⬜?⬜⬜
⬜?⬜⬜⬜
⬜⬜⬜⬜⬜
⬜?⬜??
?????
It’s for the word game called “Wordle”, where you need to guess a 5-letter word. After each guess, it shows you which letters are in the word and in the right place (green), which are in the word but in the wrong place (yellow), and which aren’t in the word at all (gray). You’re allowed a total of 6 guesses, and there’s a single new puzzle every day.
After several days of seeing those posts, I finally decided to Google it and play it myself. On my first attempt, I got down to my last guess. I knew that the word looked like “_ _ _ al” and included an “n” that wasn’t in the second position.
Now, as a programmer, I thought, “How can I use ‘grep’ to solve this?” The Wordle keyboard shows the letters you’ve already guessed, so you can see which letters are still possible. I knew the word could only contain letters from “qahjlzxvbn”. Linux has a word list file (/usr/share/dict/words on many systems), so I grepped that file for 5-letter words made up of only those letters:
```
$ egrep '^[qahjlzxvbn]{5}$' /usr/share/dict/words
banal
naval
```
That left me two choices: banal and naval. I guessed “naval” and was right! Yay, grep!
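The character class alone ignores what the earlier guesses revealed about positions. A stricter pattern can encode those clues too. This is a minimal sketch, assuming the clues described above (the word ends in “al”, contains an “n”, and that “n” is not in the second position); `last_guess` is a hypothetical helper name:

```shell
# Hypothetical helper: list words that fit the clues from this puzzle.
#   - only letters not yet ruled out: qahjlzxvbn
#   - positions 4 and 5 are "a" and "l"
#   - position 2 cannot be "n"
#   - the word must contain an "n" somewhere
# Usage: last_guess [word-file]   (defaults to /usr/share/dict/words)
last_guess() {
  grep -E '^[qahjlzxvbn][qahjlzxvb][qahjlzxvbn]al$' "${1:-/usr/share/dict/words}" |
    grep n
}
```

Both “banal” and “naval” satisfy the extra constraints, so here the result is the same pair; the positional clues matter more when the letters alone leave a long list.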
This got me thinking about the most efficient way to solve Wordle puzzles. What would be the best words to start with?
First, I grepped the words file to create a list of only the 5-letter words: there were 4608 in total. Then, I grepped that list for each letter of the alphabet and piped the results to wc (word count) to see how many words contained each letter. That came up with this list:
# | Letter | Words containing it |
---|---|---|
1 | s | 2254 |
2 | e | 2153 |
3 | a | 1744 |
4 | r | 1407 |
5 | o | 1303 |
6 | i | 1232 |
7 | l | 1180 |
8 | t | 1176 |
9 | n | 976 |
10 | d | 926 |
11 | u | 814 |
12 | c | 759 |
13 | p | 710 |
14 | h | 633 |
15 | y | 623 |
16 | m | 614 |
17 | g | 528 |
18 | b | 527 |
19 | k | 466 |
20 | w | 416 |
21 | f | 405 |
22 | v | 254 |
23 | x | 89 |
24 | z | 88 |
25 | j | 70 |
26 | q | 42 |
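The two steps that produced this table might look roughly like the sketch below. It assumes lowercase-only entries, which drops capitalized proper nouns and words with apostrophes; that filter may not match the one behind the 4608 count. The function names are hypothetical:

```shell
# Hypothetical helper: keep only words made of exactly five lowercase letters.
five_letter_words() {
  grep -E '^[[:lower:]]{5}$' "$1"
}

# Hypothetical helper: for each letter a-z, print "<letter> <count>", where
# <count> is the number of words containing that letter at least once
# (grep -c counts matching lines, not occurrences). Most common first.
letter_counts() {
  for letter in a b c d e f g h i j k l m n o p q r s t u v w x y z; do
    printf '%s %s\n' "$letter" "$(grep -c "$letter" "$1")"
  done | sort -k2,2nr
}
```

For example, `five_letter_words /usr/share/dict/words > 5-letter-words` followed by `letter_counts 5-letter-words` prints one line per letter; the exact counts depend on which word list your system has.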
The 8 most common letters in 5-letter words were “searoilt”, with a drop-off after that. The next most common letters were “nducphymgb”. Three 5-letter guesses can cover at most 15 different letters, so finding the best first 3 guesses meant finding the set of 15 letters that appears in the most words.
I wrote a script that tried various combinations of the most common letters and, for each one, counted how many words had all 5, at least 4, or at least 3 of their letters from that combination, like this:
```
$ egrep "[searoiltnducphy]{5}" 5-letter-words | wc -l
$ egrep "[searoiltnducphy]{4,5}" 5-letter-words | wc -l
$ egrep "[searoiltnducphy]{3,5}" 5-letter-words | wc -l
```
I decided to rank the combinations by a score calculated as:
3 × (words with all 5 letters from the set)
+ 2 × (words with at least 4 letters from the set)
+ 1 × (words with at least 3 letters from the set)
That gave me these results:
# | Letter set | All 5 | 4+ letters | 3+ letters | Score |
---|---|---|---|---|---|
1 | searoiltcdhmnpu | 1791 | 2940 | 3682 | 14935 |
2 | searoiltcdgmnpu | 1761 | 2890 | 3762 | 14825 |
3 | searoiltbcdmnpu | 1762 | 2849 | 3739 | 14723 |
4 | searoiltcdghnpu | 1722 | 2896 | 3672 | 14630 |
5 | searoiltbdgmnpu | 1739 | 2829 | 3706 | 14581 |
6 | searoiltcdmnpuy | 1735 | 2833 | 3639 | 14510 |
7 | searoiltdghmnpu | 1682 | 2894 | 3628 | 14462 |
8 | searoiltdgmnpuy | 1716 | 2816 | 3602 | 14382 |
9 | searoiltbcdhnpu | 1706 | 2808 | 3636 | 14370 |
10 | searoiltbcdgnpu | 1701 | 2771 | 3692 | 14337 |
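The counting and scoring can be sketched as a single shell function. One thing to watch in the commands above: the patterns are not anchored, so `[searoiltnducphy]{4,5}` only matches 4 or more consecutive letters from the set. A word such as “stage” has 4 letters from that set (all but the “g”), but the “g” splits the run, so it is missed by the 4-letter count. The sketch below uses awk and the hypothetical name `score_set`, counts set letters in any position (so its 3- and 4-letter counts may come out higher than the tables'), and assumes a file with one 5-letter word per line:

```shell
# Hypothetical helper: score one candidate letter set against a word list.
# Usage: score_set LETTERS FILE
# Prints: LETTERS all5 atleast4 atleast3 score
# Each letter of a word is checked individually, so a set letter counts
# wherever it appears, not only inside an unbroken run.
# score = 3*all5 + 2*atleast4 + 1*atleast3, the weighting described above.
score_set() {
  awk -v set="$1" '
    {
      hits = 0
      for (i = 1; i <= length($0); i++)
        if (index(set, substr($0, i, 1)) > 0) hits++
      if (hits == 5) all5++
      if (hits >= 4) four++
      if (hits >= 3) three++
    }
    END {
      printf "%s %d %d %d %d\n", set, all5, four, three, 3 * all5 + 2 * four + three
    }' "$2"
}
```

Running it for each candidate set, e.g. `score_set searoiltcdhmnpu 5-letter-words`, and sorting on the last column with `sort -k5,5nr` gives a ranking in the same shape as the table above.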
So, the best-scoring set of 15 letters is “searoiltcdhmnpu”. Then, I grepped the word list for words made only of these letters to see if I could find 3 words that together used all 15 of them. I came up with:
- Match
- Lions
- Prude
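Since 3 words of 5 letters have to cover 15 letters, each word must use 5 different letters and no two words can share one. A minimal sketch of narrowing the candidates (the helper name `candidate_words` is hypothetical, and the input is assumed to be the 5-letter word file from earlier):

```shell
# Hypothetical helper: words built only from LETTERS that use no letter twice.
# Usage: candidate_words LETTERS FILE
candidate_words() {
  grep -E "^[$1]{5}\$" "$2" |
    grep -v '\(.\).*\1'   # BRE back-reference: drop words with a repeated letter
}
```

Picking three words from that list that share no letters is then a much smaller job.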
So, if you guess those 3 words first, you’ll have the best chance, by this measure, of finding all the letters in the actual word. With those three words, you’ll find:
- All 5 letters 39% of the time
- At least 4 of 5 letters 64% of the time
- At least 3 of 5 letters 80% of the time
After that, you’ll need to use your last 3 guesses to figure out the word yourself.
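These percentages line up with the top row of the table divided by the 4608 five-letter words, rounded to the nearest percent. A quick check with shell integer arithmetic (`pct` is a hypothetical helper):

```shell
# Hypothetical helper: COUNT as a whole-number percentage of TOTAL,
# rounded to nearest via integer arithmetic: (100*COUNT + TOTAL/2) / TOTAL.
# Usage: pct COUNT TOTAL
pct() {
  echo $(( (100 * $1 + $2 / 2) / $2 ))
}

# Top row of the three-word table, out of 4608 five-letter words.
for count in 1791 2940 3682; do
  printf '%s%%\n' "$(pct "$count" 4608)"   # prints 39%, 64%, 80%
done
```

The same helper gives 10, 31 and 52 for the top two-word row (465, 1406 and 2374).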
Next, I wanted to see which words to use if you only want to guess 2 words first. Using the same method as above, I got these results:
# | Letter set | All 5 | 4+ letters | 3+ letters | Score |
---|---|---|---|---|---|
1 | searoiltdn | 465 | 1406 | 2374 | 6581 |
2 | searoiltnp | 484 | 1310 | 2349 | 6421 |
3 | searoiltdp | 505 | 1283 | 2241 | 6322 |
4 | searoiltnu | 383 | 1332 | 2348 | 6161 |
5 | searoiltmn | 444 | 1270 | 2282 | 6154 |
6 | searoiltcn | 422 | 1249 | 2338 | 6102 |
7 | searoiltdm | 479 | 1250 | 2154 | 6091 |
8 | searoiltgn | 426 | 1249 | 2292 | 6068 |
9 | searoiltmp | 482 | 1206 | 2196 | 6054 |
10 | searoiltcd | 447 | 1214 | 2209 | 5978 |
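Every row in both tables keeps the 8 letters “searoilt” and adds letters from “nducphymgb”, so the candidate sets appear to have been built that way. A sketch of generating the 45 ten-letter candidates, assuming exactly that construction (`pairs` is a hypothetical helper):

```shell
# Hypothetical helper: print every 2-letter combination of the characters
# in $1, keeping their original order, one per line.
pairs() {
  rest=$1
  while [ -n "$rest" ]; do
    first=${rest%"${rest#?}"}   # first character of $rest
    rest=${rest#?}              # $rest minus its first character
    for c in $(printf '%s' "$rest" | fold -w1); do
      printf '%s%s\n' "$first" "$c"
    done
  done
}

# Candidate 10-letter sets: "searoilt" plus any 2 of "nducphymgb" (45 sets).
for p in $(pairs nducphymgb); do
  printf 'searoilt%s\n' "$p"
done
```

Each printed set could then be scored with the weighting described earlier and sorted by score. The 15-letter case is the same idea with 7 of the 10 extra letters (equivalently, 3 left out), which gives 120 sets.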
The 10 most common letters in 5-letter words, “searoiltdn”, are also the top-scoring set here. You can split them into two words:
- Trade
- Lions
With those two words, you’ll find:
- All 5 letters 10% of the time
- At least 4 of 5 letters 31% of the time
- At least 3 of 5 letters 52% of the time
Then, I compared the 10 most common letters with the best-scoring 15-letter set. The 15-letter set has exactly 5 letters that the 10-letter set doesn't: chmpu
Hey, I can make a word from those! “Chump”!
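The letter bookkeeping behind “chump” is easy to check in the shell. A minimal sketch; `letter_set` is a hypothetical helper that prints the distinct letters of its arguments in sorted order:

```shell
# Letters in the 15-letter set that are missing from the 10-letter set.
# tr -d deletes every character listed in its argument.
printf '%s\n' searoiltcdhmnpu | tr -d searoiltdn   # prints: chmpu

# Hypothetical helper: distinct letters of all arguments, sorted, joined.
letter_set() {
  printf '%s' "$@" | fold -w1 | sort -u | tr -d '\n'
}

all=$(letter_set trade lions chump)
# 15 distinct letters from three 5-letter words means no letter repeats,
# and they should be exactly the best-scoring 15-letter set.
[ "${#all}" -eq 15 ] && echo "no repeated letters"
[ "$all" = "$(letter_set searoiltcdhmnpu)" ] && echo "same letters as searoiltcdhmnpu"
```

The same check with `letter_set match lions prude` yields the identical 15 letters, which is why the three-word percentages carry over to “trade”, “lions”, and “chump”.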
So…
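Start by guessing “trade” and “lions”. Then, if you want to use probabilities to guess one more word, guess “chump”. Together, those three words cover the same 15 letters as “match”, “lions”, and “prude”, so the same odds apply.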
Also...
If you want to solve more than one puzzle a day and test out my process, you can try the Wordle clone "hello wordl" (not mine).
Caveats / Notes
1. My wife says this is cheating. I prefer to call it "probability-based heuristics". Seriously, though... It's just math.
2. My calculations assume the target word is chosen at random from the word list. The words used recently (e.g. "naval", "chomp", "query") look like they're manually curated to be "interesting", so that will change the odds a bit.
2 Comments
Fascinating... And yes, they are (I've read too much about this game) "curated," and the list of 1000-2000 words is available and might raise those percentages of yours. Simple things like the presence, absence, or "partial exclusion" of simple plurals could have a big impact. You may not want to spend any more time on this, but it would be worth "back testing" to see what the algorithm's average solve number would be (I think I saw that someone else pegged his at 3.7). And optimizing for "never needing seven" might be very different from shooting for a low average. I 100% agree with you vis-à-vis your wife. The very first game I played came down to #6, as happened with you... I used a "crossword solver" tool (maybe it's grep-based) and guessed wrong with the 2 it provided.
Close! I've been opening with cater, lions, and pudgy, which is pretty similar.