ray
Made It and Played It
Posts: 11
Airdate: 11/30/16
Winnings: $14,350 (half of total winnings; played with sister Debra)
Post by ray on Nov 4, 2015 13:35:50 GMT -5
As I’ve been preparing to go on the show, a few questions have come up while I’ve been playing. For example, I know what the most common letters are in general, but does letter frequency change for certain categories like Proper Name, On the Map, or Food & Drink? And if a letter that commonly pairs with h, such as t, g, c, or s, is revealed, does h become a much better guess?
So I did what any self-respecting nerd would do and wrote a computer program to analyze puzzles! I was inspired by this post on reddit’s /r/dataisbeautiful subreddit: https://www.reddit.com/r/dataisbeautiful/comments/2r6bfp/which_letters_should_you_pick_in_wheel_of_fortune/ and scraped the same website he did to get a large set of puzzles to study. I now have a list of over 20,000 puzzles, roughly two-thirds of which have category information. My program takes that list, sorts the puzzles by category, and then analyzes them for two different things:
1. Raw letter frequency (what percentage of all letters does this letter make up?)
2. Frequency of letter presence at least once in a puzzle (in what percentage of puzzles does this letter occur?)
I find number 2 more useful, since it tells me which letters are good guesses when I’m trying to avoid duds. I have the program sort the letters by decreasing percentage, so the best guesses come first.
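For anyone curious how the two numbers differ, here is a minimal Python sketch of both analyses. It assumes each puzzle is a plain string; the function name `letter_stats` and the sample puzzles are hypothetical, and this is not the program described in the post. The output shape (a list of (letter, percent) tuples sorted best-first) mirrors the results posted in this thread.

```python
from collections import Counter
from string import ascii_lowercase


def letter_stats(puzzles):
    """Return (raw_frequency, presence) as lists of (letter, percent) tuples,
    each sorted from most to least common."""
    letter_counts = Counter()  # total occurrences of each letter (analysis 1)
    puzzle_counts = Counter()  # number of puzzles containing each letter (analysis 2)
    for puzzle in puzzles:
        letters = [ch for ch in puzzle.lower() if ch in ascii_lowercase]
        letter_counts.update(letters)
        puzzle_counts.update(set(letters))  # a set counts each letter once per puzzle

    total_letters = sum(letter_counts.values())
    raw = [(c, 100 * letter_counts[c] / total_letters) for c in ascii_lowercase]
    presence = [(c, 100 * puzzle_counts[c] / len(puzzles)) for c in ascii_lowercase]

    def by_percent(pairs):
        # Python's sort is stable, so ties stay in alphabetical order.
        return sorted(pairs, key=lambda pair: pair[1], reverse=True)

    return by_percent(raw), by_percent(presence)


sample = ["WHEEL OF FORTUNE", "NEW YORK CITY", "PEANUT BUTTER AND JELLY"]
raw, presence = letter_stats(sample)
print(presence[:3])
```

With the three sample puzzles, e, n, r, and t each appear in every puzzle, so they all score 100% on the presence measure even though their raw counts differ.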
I can also filter puzzles using regular expressions, which lets me ask questions such as: given that a puzzle has a word with t as the fourth-from-last letter, what is the chance that the puzzle contains an n? (This targets the -tion suffix.)
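A regular-expression filter like this can be sketched in a few lines of Python. This is a minimal illustration, assuming puzzles are plain strings; the pattern name `T_FOURTH_FROM_LAST`, the helper `presence_given`, and the sample puzzles are hypothetical.

```python
import re

# A t followed by exactly three more letters and then the end of the word,
# i.e. t is the fourth-from-last letter (as in "-tion" or "butter").
T_FOURTH_FROM_LAST = re.compile(r"t[a-z]{3}\b")


def presence_given(puzzles, pattern, letter):
    """Percent of puzzles matching `pattern` that also contain `letter`."""
    matching = [p.lower() for p in puzzles if pattern.search(p.lower())]
    if not matching:
        return None  # no puzzles satisfy the condition
    hits = sum(1 for p in matching if letter in p)
    return 100 * hits / len(matching)


puzzles = ["STATION WAGON", "BUTTER", "CAT AND DOG", "THIS OLD HOUSE"]
print(presence_given(puzzles, T_FOURTH_FROM_LAST, "n"))
```

Here three of the four sample puzzles match the pattern ("CAT AND DOG" does not), and only one of those three contains an n.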
Some of my questions have been duds (t, g, c, and s do not strongly change the chance of an h appearing), and some have been quite useful (t is much less common in the Proper Name and On the Map categories). I’ll post some of the more interesting results here, and I’d love suggestions for other things to look for. Running an analysis is fairly easy, so feel free to post an idea in this thread and I’ll try to get you some results. For example: in a category that isn’t normally plural, what happens to the frequency of s?
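The per-category split and the s question above could look something like this. A minimal sketch, assuming puzzles are stored as (category, text) pairs and that puzzles without category information are labeled None; `presence_by_category` and the sample data are hypothetical.

```python
from collections import defaultdict


def presence_by_category(puzzles, letter):
    """Map each category to the percent of its puzzles containing `letter`.

    `puzzles` is an iterable of (category, text) pairs; puzzles whose
    category is unknown (None) are skipped.
    """
    by_category = defaultdict(list)
    for category, text in puzzles:
        if category is not None:
            by_category[category].append(text.lower())
    return {
        category: 100 * sum(letter in text for text in texts) / len(texts)
        for category, texts in by_category.items()
    }


sample = [
    ("Food & Drink", "PEANUT BUTTER COOKIES"),
    ("Food & Drink", "APPLE PIE"),
    ("Proper Name", "ABRAHAM LINCOLN"),
    (None, "SOMETHING WITHOUT A CATEGORY"),
]
print(presence_by_category(sample, "s"))
```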
Post by ray on Nov 4, 2015 13:36:15 GMT -5
Baseline percentages (Type 2 analysis) All puzzles: [('e', 86.50118410665732), ('a', 79.81317428295763), ('i', 74.41452504166301), ('t', 74.00228050171037), ('r', 72.78308920270152), ('o', 71.80949039557933), ('n', 71.77879133409351), ('s', 67.32304183843523), ('l', 56.65292518200158), ('h', 50.32014735549514), ('c', 47.3861941934918), ('d', 45.36005613542672), ('g', 41.72441013946145), ('u', 39.860538549250066), ('m', 36.3520743794404), ('p', 34.73379528111569), ('f', 29.080782387509867), ('b', 28.905359179019385), ('y', 28.6027541443733), ('w', 24.967108148408034), ('k', 21.835803876852907), ('v', 14.349618454521535), ('j', 4.3373388299272), ('x', 2.9251820015788086), ('z', 2.6445048679940357), ('q', 1.6884483817209017)]
In general, the consonants t, r, and n (and, to a slightly lesser degree, s) are roughly equally good guesses.
Category: Proper Name [('e', 83.95904436860067), ('a', 80.54607508532423), ('n', 79.86348122866895), ('r', 76.45051194539249), ('o', 70.64846416382252), ('s', 66.89419795221842), ('t', 65.52901023890784), ('i', 64.84641638225256), ('l', 52.55972696245734), ('c', 44.027303754266214), ('h', 37.54266211604095), ('d', 36.51877133105802), ('b', 30.034129692832767), ('m', 30.034129692832767), ('p', 30.034129692832767), ('y', 29.351535836177472), ('u', 23.208191126279864), ('j', 22.18430034129693), ('g', 21.843003412969285), ('k', 20.477815699658702), ('w', 19.453924914675767), ('f', 16.38225255972696), ('v', 8.19112627986348), ('x', 5.1194539249146755), ('z', 3.4129692832764507), ('q', 2.04778156996587)]
As I suspected, things are different in this category: n and r are in a league of their own, and t drops quite a bit (from about 74% to about 66%), although it is still a top-five consonant.
Category: On the Map [('a', 86.83651804670913), ('n', 79.40552016985139), ('e', 74.09766454352442), ('i', 69.85138004246284), ('o', 66.66666666666666), ('r', 63.26963906581741), ('t', 61.78343949044586), ('s', 59.23566878980891), ('l', 54.56475583864119), ('c', 45.64755838641189), ('h', 42.25053078556263), ('d', 37.57961783439491), ('u', 31.422505307855626), ('m', 31.210191082802545), ('b', 26.53927813163482), ('g', 25.902335456475583), ('y', 23.991507430997878), ('p', 21.656050955414013), ('k', 20.59447983014862), ('w', 19.32059447983015), ('v', 18.046709129511676), ('f', 17.40976645435244), ('x', 8.280254777070063), ('z', 4.45859872611465), ('j', 4.033970276008493), ('q', 1.2738853503184715)]
Similarly, n is rather prominent, and t has dropped down the list.
Category: Food & Drink
[('e', 88.78378378378379), ('a', 85.54054054054055), ('s', 78.51351351351352), ('r', 73.37837837837839), ('i', 71.08108108108108), ('t', 69.1891891891892), ('o', 69.05405405405406), ('n', 67.56756756756756), ('c', 66.75675675675676), ('l', 59.189189189189186), ('d', 57.027027027027025), ('h', 55.945945945945944), ('u', 52.43243243243243), ('p', 43.37837837837838), ('m', 41.75675675675676), ('g', 35.810810810810814), ('b', 35.0), ('f', 32.567567567567565), ('w', 28.91891891891892), ('k', 25.945945945945947), ('y', 21.62162162162162), ('v', 9.18918918918919), ('z', 5.27027027027027), ('j', 4.864864864864865), ('q', 2.027027027027027), ('x', 1.6216216216216217)]
s jumps up dramatically in this category, from about 67% across all puzzles to about 79%.
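To see how far a category moves each letter, one option is to line its list up against the baseline. A minimal sketch, assuming both inputs are sorted (letter, percent) lists like the ones above and cover the same letters; `compare_to_baseline` is a hypothetical name. The usage example keeps only t, r, n, and s, rounded from the "All puzzles" and "Proper Name" lists, so the ranks shown are within those four letters only.

```python
def compare_to_baseline(baseline, category):
    """Return (letter, baseline_rank, category_rank, shift_in_points) tuples,
    ordered by how much the category raises each letter's presence percent.

    Both arguments are lists of (letter, percent) pairs sorted best-first.
    """
    base_pct = dict(baseline)
    base_rank = {letter: i + 1 for i, (letter, _) in enumerate(baseline)}
    rows = []
    for rank, (letter, pct) in enumerate(category, start=1):
        rows.append((letter, base_rank[letter], rank, pct - base_pct[letter]))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Consonants t, r, n, s only, rounded from the posted lists.
baseline = [("t", 74.0), ("r", 72.8), ("n", 71.8), ("s", 67.3)]
proper_name = [("n", 79.9), ("r", 76.5), ("s", 66.9), ("t", 65.5)]
for letter, old_rank, new_rank, shift in compare_to_baseline(baseline, proper_name):
    print(f"{letter}: rank {old_rank} -> {new_rank}, {shift:+.1f} points")
```

Reporting the shift in percentage points keeps it directly comparable to the posted percentages.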
Post by ray on Nov 4, 2015 13:46:11 GMT -5
An example of a dud: given that a puzzle has a t, c, g, or s that is not the last letter of a word:
Category: All puzzles [('e', 87.4518643749413), ('a', 80.51563820794591), ('t', 78.0501549732319), ('i', 75.19958673804827), ('r', 73.67333521179675), ('n', 72.35841081994928), ('o', 72.14238752700291), ('s', 70.28740490278952), ('l', 56.48539494693341), ('h', 52.06161360007514), ('c', 50.63398140321217), ('d', 44.810744810744815), ('g', 43.35963182117028), ('u', 40.42453273222504), ('m', 36.23086315394008), ('p', 35.03803888419273), ('f', 29.036348267117496), ('b', 28.346012961397577), ('y', 28.261482107635956), ('w', 24.61726307880154), ('k', 22.189349112426036), ('v', 14.163614163614163), ('j', 4.165492627031089), ('x', 2.977364515826054), ('z', 2.5077486615948152), ('q', 1.7375786606555836)]
h only moved up about 2 percentage points (from about 50.3% to 52.1%), which isn't a meaningful difference.
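The condition in this dud fits in a single regular expression. A minimal sketch, assuming puzzles are plain strings and that any non-letter (space, apostrophe, hyphen) ends a word; `PAIRABLE_NOT_LAST`, `h_shift`, and the sample puzzles are hypothetical.

```python
import re

# One of t, c, g, s immediately followed by another letter, so it is not the
# last letter of its word.
PAIRABLE_NOT_LAST = re.compile(r"[tcgs][a-z]")


def h_shift(puzzles):
    """Percentage-point change in h presence when restricting to puzzles
    with a t, c, g, or s that is not at the end of a word.

    Assumes at least one puzzle matches the condition.
    """
    texts = [p.lower() for p in puzzles]
    matching = [text for text in texts if PAIRABLE_NOT_LAST.search(text)]

    def pct_with_h(group):
        return 100 * sum("h" in text for text in group) / len(group)

    return pct_with_h(matching) - pct_with_h(texts)


print(h_shift(["THE CAT", "DOG", "BIG HAT", "OHIO"]))
```

In the sample, only "THE CAT" qualifies ("DOG" and "BIG HAT" end their g and t at word boundaries), so the shift compares 100% against the overall 75%.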
An example of an analysis that works but doesn't really change how you would play: given that there is a word with t as the fourth-to-last letter:
[('t', 100.0), ('e', 90.92984640929846), ('a', 82.6068908260689), ('i', 80.71814030718141), ('n', 77.60481527604816), ('o', 76.81610626816106), ('r', 75.05188875051888), ('s', 70.44416770444167), ('l', 55.064342050643425), ('h', 51.26608551266085), ('c', 47.55085097550851), ('g', 42.465753424657535), ('u', 42.258198422581984), ('d', 40.59775840597759), ('m', 39.91282689912827), ('p', 34.62017434620174), ('f', 29.86716479867165), ('y', 29.036944790369446), ('w', 24.63677874636779), ('b', 24.4914902449149), ('k', 21.21212121212121), ('v', 16.27231216272312), ('x', 2.926525529265255), ('j', 2.7189705271897053), ('z', 1.847239518472395), ('q', 1.432129514321295)]
n does become about 6 percentage points more likely (77.6% vs. 71.8% across all puzzles), but it is still roughly even with r, and n was already a perfectly fine guess once t had been called anyway, so this isn't the most game-changing information.
Post by ray on Nov 5, 2015 16:05:58 GMT -5
For puzzles with no words less than five letters long: [('e', 84.86030707274101), ('a', 77.28416813491064), ('r', 75.9249937075258), ('i', 72.25018877422602), ('s', 69.30531084822552), ('n', 68.71381827334508), ('t', 67.757362194815), ('o', 65.31588220488295), ('l', 56.40573873647118), ('c', 50.56632267807702), ('d', 40.83815756355399), ('u', 38.358922728416815), ('g', 37.81776994714321), ('h', 36.32016108733954), ('p', 36.13138686131387), ('m', 33.3375283161339), ('b', 25.409010823055628), ('y', 23.445758872388623), ('f', 21.356657437704506), ('w', 18.26076013088346), ('k', 17.115529826327712), ('v', 14.384596023156304), ('j', 3.3098414296501386), ('x', 2.617669267556003), ('z', 2.5044047319405993), ('q', 2.0639315378806944)]
r is now significantly better than t (75.9% vs. 67.8%), and t is relegated to the next tier of letters. I'm going to clean up my code a bit and make it output results in a more readable format, but I'll keep any interesting finds coming.
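Here is one way to express the "no words shorter than five letters" filter, plus a more compact output format. A minimal sketch: the word definition (runs of letters and apostrophes), the function names `no_short_words` and `format_stats`, and the row layout are all assumptions rather than the code described in the post.

```python
import re

# Assumed word definition: runs of letters, with apostrophes allowed inside.
WORD = re.compile(r"[a-z']+")


def no_short_words(puzzle, min_length=5):
    """True if every word in the puzzle has at least `min_length` letters."""
    words = WORD.findall(puzzle.lower())
    return all(len(w.replace("'", "")) >= min_length for w in words)


def format_stats(stats, per_line=6):
    """Render (letter, percent) pairs as aligned rows like 'E  86.5%  A  79.8%'."""
    cells = [f"{letter.upper()} {pct:5.1f}%" for letter, pct in stats]
    rows = ["  ".join(cells[i:i + per_line]) for i in range(0, len(cells), per_line)]
    return "\n".join(rows)


puzzles = ["PEANUT BUTTER", "NEW YORK CITY", "GRAND CANYON"]
print([p for p in puzzles if no_short_words(p)])
print(format_stats([("e", 86.50118410665732), ("a", 79.81317428295763), ("i", 74.41452504166301)]))
```

Rounding to one decimal and padding each percentage to a fixed width keeps columns lined up when a full 26-letter list is printed.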