Scrabble Probability Question
FeebleMinded
Finally Preemie!
4472 Posts

So one of my favorite hobbies is playing Scrabble. Believe it or not, there is a fairly large subculture of people who play this game very seriously. Yes, many of them are really dorky and have virtually no social skills, but there are some "normal" people too. Anyway, the game goes far beyond what most people have played casually. There are well over 100,000 words that serious players memorize, and many of them (I'd say roughly 75%) are words you would never know if you did not see them listed and study them. [/lame intro]



Seen above is a screen capture of a program called ZYZZYVA (which incidentally is a word) that unscrambles words, creates quizzes, etc. One of the functions it has is determining probability, which is a great way to study. In other words, you are much better off studying words like AILERON or DARIOLE made with common letters with higher probability than words like FILIBEG or FUMULUS, at least initially. So what I was wondering is how does this program go about determining the probability of playing a word? Here is the letter distribution:



At first I thought I could assign a probability to each letter (for instance E=12/100, G=3/100, Z=1/100) and then just multiply all these numbers together, take the inverse of the product, and whichever letter combination had the highest value would be the most probable. Well, just by using this method on a few different cases I found it to be an epic FAIL. I would personally think a word like BEEBEES would be near the top considering it has 4 E's, which appear at the highest probability, along with BBS, which are not totally uncommon. However BEEBEES is about 23,000 of 24,000 7-letter words.

Being able to calculate this during a game would be an immense help. So if anyone out there could give me some help, I would be greatly appreciative.

3/18/2009 6:46:52 AM

Jrb599
All American
8845 Posts

I just woke up, but let me make one comment.

Take a word like ZEE (if it is a word, I only use it cause it's the two letters you give us).

One thing I noticed is the probability by your method would not be

(1/100)*(12/100)*(12/100)

it would be

(1/100)*(12/99)*(11/98)

Because once you get to your e (12/99), you've already pulled out one letter and when you get to your second e you've already pulled out 2 letters, one which is an e.
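For anyone who wants to see the size of that correction, here is a minimal Python sketch. It assumes the standard 100-tile English distribution (the `TILE_COUNTS` table and the function name are illustrative, not taken from ZYZZYVA) and compares drawing Z, E, E in that exact order with and without putting tiles back.

```python
from collections import Counter
from fractions import Fraction

# Assumed standard English tile counts ("?" is the blank); they sum to 100.
TILE_COUNTS = {
    "A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
    "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
    "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1, "?": 2,
}

def ordered_draw_probability(word, replace=False):
    """Probability of drawing the letters of `word` in exactly this order."""
    bag = Counter(TILE_COUNTS)
    total = sum(bag.values())
    p = Fraction(1)
    for ch in word.upper():
        p *= Fraction(bag[ch], total)
        if not replace:
            # The drawn tile leaves the bag, shrinking both counts.
            bag[ch] -= 1
            total -= 1
    return p

with_replacement = ordered_draw_probability("ZEE", replace=True)
without_replacement = ordered_draw_probability("ZEE")
print(float(with_replacement), float(without_replacement))
```

The gap is small for ZEE, but it grows for words that repeat a scarce letter, since the second copy is drawn from a much smaller supply.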

[Edited on March 18, 2009 at 8:11 AM. Reason : ]

3/18/2009 8:08:39 AM

FeebleMinded
Finally Preemie!
4472 Posts

I agree that I didn't take into account the fewer number of remaining tiles, but I don't believe that would affect the results because it's a common error across the board.

3/19/2009 1:04:21 AM

ncsu919
All American
1067 Posts

you'd be surprised...

3/19/2009 10:06:09 PM

1985
All American
2174 Posts

it especially matters when letters already have low probabilities. For instance, if b was 2/100, well, once you use the first, the probability of getting the second is nearly cut in half.

3/20/2009 11:36:06 PM

BigHitSunday
Dick Danger
51059 Posts

this is why scrabble is bullshit and i can never win

haha

god i hate this game, but i love it


and no, i dont have a clue what u talkin bout...but im feelin it man

3/22/2009 1:53:41 AM

A Tanzarian
drip drip boom
10992 Posts

If you alone were to draw letters from an initially full set, the probability of drawing BEEBEES, in that order, is:

(2/100)*(12/99)*(11/98)*(1/97)*(10/96)*(9/95)*(4/94) = 1.2E-9

However, there are multiple orders in which the letters can be drawn. Also, you can draw 'different' letters; e.g., a different set of 4 E's than you drew the previous time. Each individual scenario that gives the necessary letters must be taken into account to get a 'true' probability of a particular word.

I'm sure ZYZZYVA is making assumptions about how and when the letters are drawn. If you want to duplicate ZYZZYVA probabilities, you're going to need to know those assumptions.

[Edited on March 22, 2009 at 1:04 PM. Reason : ]

3/22/2009 12:45:44 PM

aaronian
All American
3299 Posts

I'll stick to playing dumbass slutbags on lexulous on facebook..

3/24/2009 12:06:35 AM

wolfpackgrrr
All American
39759 Posts

The official Scrabble dictionary is filled with such bs words.

3/24/2009 4:41:59 AM

Jrb599
All American
8845 Posts

Quote :
"Also, you can draw 'different' letters; e.g., a different set of 4 E's than you drew the previous time."


You can only draw a set of 4 Es one way.



EEEE is the same as EEEE

3/24/2009 11:07:28 AM

A Tanzarian
drip drip boom
10992 Posts

...but you're choosing 4 out of 12 E's. There are 495 ways to do that.

Ignoring the blank tiles and assuming you're simply pulling tiles from a bag:

[ C(2, 2) * C(12,4) * C(4,1) ] / C(100,7)

[ 1 * 495 * 4 ] / 1.60E10

1980 / 1.60E10

1.24E-7

which is about 1 in 8.1 million. 1 in 7 million if you drop the two blank tiles.
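A quick way to check the arithmetic above is Python's `math.comb`. This is only a sketch of the calculation in this post: it uses the tile counts B=2, E=12, S=4, and the variable names are made up for illustration.

```python
from math import comb

# Ways to pick both B's, 4 of the 12 E's, and 1 of the 4 S's.
favorable = comb(2, 2) * comb(12, 4) * comb(4, 1)

# All 7-tile draws from the full bag, and from a bag with the 2 blanks removed.
with_blanks_in_bag = comb(100, 7)
letters_only_bag = comb(98, 7)

print(favorable)
print(with_blanks_in_bag / favorable)   # "1 in N" odds with a 100-tile bag
print(letters_only_bag / favorable)     # "1 in N" odds with a 98-tile bag
```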

We need FeebleMinded to tell us what the probability is according to ZYZZYVA.

3/24/2009 8:41:12 PM

FeebleMinded
Finally Preemie!
4472 Posts

It doesn't say what the probability is, it simply ranks the words in order of most to least probable. If anyone is a computer programmer type person, the source code is on the website. I couldn't even begin to comprehend it though.

http://www.zyzzyva.net/

3/26/2009 12:31:48 AM

aaronian
All American
3299 Posts

whats the probability of me starting with 8 vowels in back to back games? because it happened.

3/26/2009 9:40:29 AM

ncsu919
All American
1067 Posts

(1/whatever chances of starting with 8 vowels)*(1/whatever chances of starting with 8 vowels)

..overall it's probably a pretty high chance respectively.

3/26/2009 3:40:35 PM

Jrb599
All American
8845 Posts

^^^^
So take your BEEBEES example.

The probability of drawing that in that order is

(2/100)*(12/99)*(11/98)*(1/97)*(10/96)*(9/95)*(4/94) = 1.2E-9

but if you take a different set of E's it's still the same scenario, because your (12/99) captures all the ways you can get an E, not an individual E. What you need to factor in is the order in which you can draw the different letters.

[Edited on March 26, 2009 at 3:58 PM. Reason : ]

3/26/2009 3:54:26 PM

FeebleMinded
Finally Preemie!
4472 Posts

Quote :
"whats the probability of me starting with 8 vowels in back to back games? because it happened."


If you are playing by the rules, the probability is zero because you only ever have 7 tiles on your rack at once.

3/26/2009 4:49:30 PM

A Tanzarian
drip drip boom
10992 Posts

^^ Take a look at http://svn.pietdepsi.com/repos/projects/zyzzyva/trunk/src/libzyzzyva/LetterBag.cpp.

It looks like he's using combinations to calculate probabilities.

^ For each word he's calculating the probability based on drawing the number of tiles in the word from an initially full bag; i.e. he's determining the probability of spelling a three letter word after drawing 3 tiles, not the probability of being able to spell a particular 3 letter word after drawing 7 tiles. He includes the blank tiles (which I didn't do above).
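Here is a rough Python sketch of what a combination count that includes the blanks could look like. To be clear, this is not ZYZZYVA's code: it only follows the description above (draw as many tiles as the word has letters, from a full bag, blanks allowed as wildcards), and the names `TILES`, `BLANKS`, and `ways_to_draw` are hypothetical. It also assumes the standard English tile counts.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb

# Assumed standard English letter counts (98 letters) plus 2 blanks.
TILES = {
    "A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
    "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
    "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1,
}
BLANKS = 2
BAG_SIZE = sum(TILES.values()) + BLANKS

def ways_to_draw(word, max_blanks=BLANKS):
    """Count the tile sets of size len(word) that can spell `word`."""
    need = Counter(word.upper())
    letters = sorted(need)
    total = 0
    for b in range(max_blanks + 1):
        # Choose which b letters of the word are covered by blanks.
        for covered in combinations_with_replacement(letters, b):
            cover = Counter(covered)
            if any(cover[ch] > need[ch] for ch in cover):
                continue  # cannot cover more copies than the word contains
            ways = comb(BLANKS, b)
            for ch in letters:
                ways *= comb(TILES[ch], need[ch] - cover[ch])
            total += ways
    return total

def word_probability(word, max_blanks=BLANKS):
    return ways_to_draw(word, max_blanks) / comb(BAG_SIZE, len(word))

print(ways_to_draw("BEEBEES", max_blanks=0), ways_to_draw("BEEBEES"))
print(word_probability("AILERON"), word_probability("BEEBEES"))
```

With `max_blanks=0` this reduces to the plain product of binomial coefficients used earlier in the thread. Allowing blanks raises every word's count, and it helps most for words that lean on scarce letters.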

3/26/2009 6:30:43 PM

Jrb599
All American
8845 Posts

I'll explain it better when I get outta class, but a combination would say that EEEE=EEEE

Essentially, what you are saying is that each E is unique. If that is the case, then the probability of pulling an E is 1/100, not 12/100.

Another thing you're saying is that the order of E's matters, but then you use C(100,7) as your denominator, which doesn't care about order. The denominator, counting all the ways you can draw 7 letters in order, will be

100!/93!

100 choices for the first letter, 99 for the second, and so on.

[Edited on March 26, 2009 at 6:53 PM. Reason : ]

3/26/2009 6:44:37 PM

A Tanzarian
drip drip boom
10992 Posts

That's why I started using combinations, because order doesn't matter.

Quote :
"[ C(2, 2) * C(12,4) * C(4,1) ] / C(100,7)"


The order you select E's doesn't matter, but how many ways you can select 4 E's from 12 E's does matter.

[Edited on March 26, 2009 at 7:00 PM. Reason : I'll be back in awhile]

3/26/2009 6:59:56 PM

Jrb599
All American
8845 Posts

you can only select 4Es from 12 one way. It's because they aren't unique.

You get EEEE, you're trying to make the E's unique.


I'll come back with a much longer explanation tomorrow.

3/26/2009 7:07:33 PM

aaronian
All American
3299 Posts

Quote :
"If you are playing by the rules, the probability is zero because you only ever have 7 tiles on your rack at once."


true. but I forgot to mention I play lexulous on facebook which gives you 8 tiles.

3/26/2009 8:45:34 PM

Cabbage
All American
2046 Posts

It's a pretty straightforward probability problem to determine the probability of picking any particular 7 letters:

Let C(n,r) be the binomial coefficient: n! / [r!(n-r)!], or 0 if r > n.

The method is easiest to illustrate by example:

There are C(100,7) ways of selecting 7 tiles at the beginning of the game.

If you want to calculate the probability of, say, "aaeeejt": Count the number of ways you can get this combination:

C(9,2) * C(12,3) * C(1,1) * C(6,1)

There's exactly one binomial coefficient for each distinct letter:

Count how many ways to select 2 of the 9 a's

times

Count how many ways to select 3 of the 12 e's

times

Count how many ways to select 1 of the 1 j's

times

Count how many ways to select 1 of the 6 t's

That's how to get the numerator.

Then divide by C(100,7) to get the actual probability.

With this formula, it's more or less straightforward to program a computer to calculate the probability of any 7 letter combination, then order them from most to least likely.
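As a minimal sketch of that program in Python: one binomial coefficient per distinct letter, divided by C(100,7), then a sort. The tile table assumes the standard English distribution, blanks are not used as wildcards here, and the function names and the short word list are only for illustration.

```python
from collections import Counter
from math import comb, prod

# Assumed standard English tile counts ("?" is the blank); they sum to 100.
TILE_COUNTS = {
    "A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
    "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
    "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1, "?": 2,
}

def rack_ways(letters):
    """Number of ways to pick exactly these letters from a full bag."""
    need = Counter(letters.upper())
    return prod(comb(TILE_COUNTS[ch], n) for ch, n in need.items())

def rack_probability(letters):
    bag_size = sum(TILE_COUNTS.values())
    return rack_ways(letters) / comb(bag_size, len(letters))

print(rack_ways("aaeeejt"), rack_probability("aaeeejt"))

# Rank a few words from this thread from most to least likely.
words = ["FUMULUS", "BEEBEES", "AILERON", "FILIBEG", "DARIOLE"]
ranked = sorted(words, key=rack_probability, reverse=True)
print(ranked)
```

Since every 7-letter word shares the same denominator, sorting by `rack_ways` alone would give the same order.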

3/31/2009 9:55:32 PM

Jrb599
All American
8845 Posts

^Sorry, error there.

First, you use C(100,7); you need a permutation. Your numerator is off too.

I forgot about this thread, I'll write something up in a bit.

[Edited on March 31, 2009 at 10:29 PM. Reason : ]

3/31/2009 10:26:12 PM

Cabbage
All American
2046 Posts

You're going to need to tell me why I'm wrong. Just telling me I'm wrong doesn't make it so.

Edited to add: For that matter, I think it's clear that this is a combination problem, not a permutation problem. If you draw a,e,e,e,e,e,e, how's that any different from e,e,a,e,e,e,e? You still have one a and 6 e's--that's all that matters.

[Edited on March 31, 2009 at 10:35 PM. Reason : adding stuff]

3/31/2009 10:33:33 PM

Jrb599
All American
8845 Posts

For aaeeejt

You have (9/100)*(8/99)*(12/98)*(11/97)*(10/96)*(1/95)*(6/94)

that is the probability you draw aaeeejt in that order. However, what if you draw jtaaeee? I mean, you can still play aaeeejt. So we need to factor that in too. So there are c(7,3) ways to place the E's, c(4,2) ways to place the a's, and 2 ways to place the j and t.

(9/100)*(8/99)*(12/98)*(11/97)*(10/96)*(1/95)*(6/94)*c(7,3)*c(4,2)*2;

that's your probability.


Also notice, the denominator is 100!/93!, a permutation not a combination.

Quote :
"If you draw a,e,e,e,e,e,e, how's that any different from e,e,a,e,e,e,e?"

They're different. That's simple probability. Suppose you flip a coin twice and you want to know how many ways you can get heads once and tails once. You can get HT and TH, and they are different. Your first term has an a in the first spot and the second has the a in the third spot.


[Edited on March 31, 2009 at 10:53 PM. Reason : ]

3/31/2009 10:37:27 PM

Cabbage
All American
2046 Posts

Quote :
"For aaeeejt

You have (9/100)*(8/99)*(12/98)*(11/97)*(10/96)*(1/95)*(6/94)

that is the probability you draw aaeeejt in that order. However, what if you draw jtaaeee? I mean, you can still play aaeeejt. So we need to factor that in too. So there are c(7,3) ways to place the E's, c(4,2) ways to place the a's, and 2 ways to place the j and t.

(9/100)*(8/99)*(12/98)*(11/97)*(10/96)*(1/95)*(6/94)*c(7,3)*c(4,2)*2;

that's your probability."

How's that any different from mine? I mean, obviously the method is different, but try calculating mine before you say my method is wrong; you may be surprised. If I'm wrong, then you're wrong, too. My method just seems more natural to me; I respect that the same may be true for you with your method.

Quote :
"They're different. That's simple probability. Suppose you flip a coin twice and you want to know how many ways you can get heads once and tails once. You can get HT and TH, and they are different. Your first term has an a in the first spot and the second has the a in the third spot."


They're not different if you're interested in combinations instead of permutations. Playing Scrabble, you're drawing a combination of letters, not a permutation. I can make exactly the same plays on the board with a,e,e,e,e,e,e as I can with e,e,a,e,e,e,e. The order I pull the tiles out of the bag is irrelevant.

3/31/2009 11:33:01 PM

ncsu919
All American
1067 Posts

the odds you draw any 1 combination that you are looking for is so low, it isnt worth trying to study the "most" likely 7 letter word combos.

4/1/2009 12:08:19 PM

aaronian
All American
3299 Posts

this takes me back to st311

4/1/2009 1:00:33 PM

Jrb599
All American
8845 Posts

Quote :
"How's that any different from mine? I mean, obviously the method is different, but try calculating mine before you say my method is wrong; you may be surprised. If I'm wrong, then you're wrong, too. My method just seems more natural to me; I respect that the same may be true for you with your method."


We get different numbers, I'm sorry but the solution I presented is right.

Quote :
"They're not different if you're interested in combinations instead of permutations. Playing Scrabble, you're drawing a combination of letters, not a permutation. I can make exactly the same plays on the board with a,e,e,e,e,e,e as I can with e,e,a,e,e,e,e. The order I pull the tiles out of the bag is irrelevant."


You're interested in permutation.


Can I ask what probability classes you've taken.

[Edited on April 1, 2009 at 1:44 PM. Reason : ]

4/1/2009 1:43:50 PM

Cabbage
All American
2046 Posts

Quote :
"We get different numbers, I'm sorry but the solution I presented is right."

Did you actually try calculating both? If you're not getting the same numbers then you've made a mistake somewhere in your calculations. You should get 2.968597189*10^(-6) for both.
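For anyone following along, here is a small Python sketch that computes aaeeejt both ways with exact fractions. It assumes the standard English tile counts, and the function names are just labels for the two approaches posted in this thread.

```python
from collections import Counter
from fractions import Fraction
from math import comb, factorial, prod

# Assumed standard English tile counts ("?" is the blank); they sum to 100.
TILE_COUNTS = {
    "A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
    "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
    "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1, "?": 2,
}
BAG_SIZE = sum(TILE_COUNTS.values())

def by_combinations(rack):
    """Product of C(available, needed) per letter, over C(100, rack size)."""
    need = Counter(rack.upper())
    ways = prod(comb(TILE_COUNTS[ch], n) for ch, n in need.items())
    return Fraction(ways, comb(BAG_SIZE, len(rack)))

def by_ordered_draws(rack):
    """Probability of one specific draw order, times the number of orders."""
    left = dict(TILE_COUNTS)
    remaining = BAG_SIZE
    p = Fraction(1)
    for ch in rack.upper():
        p *= Fraction(left[ch], remaining)
        left[ch] -= 1
        remaining -= 1
    # Distinct arrangements of the rack: n! / (product of repeat counts!).
    orders = factorial(len(rack))
    for n in Counter(rack.upper()).values():
        orders //= factorial(n)
    return p * orders

a = by_combinations("aaeeejt")
b = by_ordered_draws("aaeeejt")
print(a == b, float(a))
```

The two agree because the 7! orderings of any particular seven tiles cancel between the ordered numerator and the ordered denominator.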
Quote :
"You're interested in permutation."

No I'm not. A permutation is when order matters. In Scrabble, order doesn't matter. If I pull out seven letters and can make the seven letter word "feature", it doesn't matter if I pulled them out in the order f-e-a-t-u-r-e or in the order a-e-e-f-r-t-u or in any other order--I still have the same combination of seven letters, and can still make exactly the same plays on the board. That's exactly what it means to be a combination as opposed to a permutation.
Quote :
"Can I ask what probability classes you've taken."

Of course. At State I've taken MA 546. I've taken two or three other probability classes at VA Tech, but that was years ago and I forget the course numbers.

4/1/2009 3:15:41 PM

Jrb599
All American
8845 Posts

If you got the same number as me, rock on. I must have messed up calculating one of the numbers. All I know is that mine is right.

Quote :
"No I'm not. A permutation is when order matters. In Scrabble, order doesn't matter. If I pull out seven letters and can make the seven letter word "feature", it doesn't matter if I pulled them out in the order f-e-a-t-u-r-e or in the order a-e-e-f-r-t-u or in any other order--I still have the same combination of seven letters, and can still make exactly the same plays on the board. That's exactly what it means to be a combination as opposed to a permutation."


I know the difference; it can be tackled both ways. I was thinking your combination way was wrong, but I guess not. I was wrong when I thought you got a different number than me, which led me to believe you did it wrong with combinations. So I thought I would explain it with permutations. Whoops.

So I guess we've posted two different ways to do it.

[Edited on April 1, 2009 at 5:21 PM. Reason : ]

4/1/2009 5:03:21 PM

Cabbage
All American
2046 Posts

By the way, I was curious how many different seven letter combinations you can get in Scrabble, so I got a CAS to expand the generating function for me:

1 + 27*x + 373*x**2 + 3509*x**3 + 25254*x**4 + 148150*x**5 + 737311*x**6 + 3199724*x**7 + 12353822*x**8 + 43088473*x**9 + 137412392*x**10 + 404600079*x**11 + 1108793943*x**12 + 2847262062*x**13 + 6890404765*x**14 + 15792242064*x**15 + 34425824044*x**16 + 71646518736*x**17 + 142827698985*x**18 + 273533670283*x**19 + 504576050285*x**20 + 898623709228*x**21 + 1548387401915*x**22 + 2586170833356*x**23 + 4194275182613*x**24 + 6615385384601*x**25 + 10161692700549*x**26 + 15221174189579*x**27 + 22259221214607*x**28 + 31813753798288*x**29 + 44482134367066*x**30 + 60898641337468*x**31 + 81701986711369*x**32 + 107493329723951*x**33 + 138786376090493*x**34 + 175952346689553*x**35 + 219163709706077*x**36 + 268341443489446*x**37 + 323111088944227*x**38 + 382772844896252*x**39 + 446290391042394*x**40 + 512301987174498*x**41 + 579155760119564*x**42 + 644969083769945*x**43 + 707709770134396*x**44 + 765294643135632*x**45 + 815699194394498*x**46 + 857070636209692*x**47 + 887835941961195*x**48 + 906796502925404*x**49 + 913201857455724*x**50 + 906796502925404*x**51 + 887835941961195*x**52 + 857070636209692*x**53 + 815699194394498*x**54 + 765294643135632*x**55 + 707709770134396*x**56 + 644969083769945*x**57 + 579155760119564*x**58 + 512301987174498*x**59 + 446290391042394*x**60 + 382772844896252*x**61 + 323111088944227*x**62 + 268341443489446*x**63 + 219163709706077*x**64 + 175952346689553*x**65 + 138786376090493*x**66 + 107493329723951*x**67 + 81701986711369*x**68 + 60898641337468*x**69 + 44482134367066*x**70 + 31813753798288*x**71 + 22259221214607*x**72 + 15221174189579*x**73 + 10161692700549*x**74 + 6615385384601*x**75 + 4194275182613*x**76 + 2586170833356*x**77 + 1548387401915*x**78 + 898623709228*x**79 + 504576050285*x**80 + 273533670283*x**81 + 142827698985*x**82 + 71646518736*x**83 + 34425824044*x**84 + 15792242064*x**85 + 6890404765*x**86 + 2847262062*x**87 + 1108793943*x**88 + 404600079*x**89 + 137412392*x**90 + 43088473*x**91 + 12353822*x**92 + 3199724*x**93 + 737311*x**94 + 
148150*x**95 + 25254*x**96 + 3509*x**97 + 373*x**98 + 27*x**99 + x**100

The exponent corresponds to how many tiles you draw, and the corresponding coefficient counts the number of different combinations. So if you draw seven tiles (like in the regular rules) there are 3,199,724 different letter combinations you could possibly get.
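If you don't have a CAS handy, the same expansion can be sketched in a few lines of Python. The generating function is the product of (1 + x + ... + x^c) over the 27 tile types, where c is how many copies of that tile exist. The tile table below assumes the standard English distribution, and the function name is made up for illustration.

```python
# Assumed standard English tile counts ("?" is the blank); 27 tile types.
TILE_COUNTS = {
    "A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
    "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
    "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1, "?": 2,
}

def rack_count_polynomial(counts):
    """Coefficients of the product of (1 + x + ... + x**c) over all c."""
    coeffs = [1]
    for c in counts:
        new = [0] * (len(coeffs) + c)
        for i, a in enumerate(coeffs):
            # Multiply by (1 + x + ... + x**c): take 0..c copies of this tile.
            for j in range(c + 1):
                new[i + j] += a
        coeffs = new
    return coeffs

coeffs = rack_count_polynomial(TILE_COUNTS.values())
print(coeffs[1], coeffs[2], coeffs[7])
```

Here `coeffs[k]` is the number of distinct k-tile combinations, so `coeffs[7]` is the count of distinct opening racks.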

4/1/2009 11:50:00 PM

Jrb599
All American
8845 Posts

^Generating functions are really helpful, but you almost always need a computer to expand them.

[Edited on April 2, 2009 at 11:08 AM. Reason : ]

4/2/2009 11:07:41 AM

Jrb599
All American
8845 Posts

Quote :
"27*x"


How can you have 27 1-letter combinations with only 26 letters in the alphabet? I guess it's including the empty tile?

4/2/2009 7:06:23 PM

Cabbage
All American
2046 Posts

Yes, I included the blank tile.

4/2/2009 10:42:40 PM

FeebleMinded
Finally Preemie!
4472 Posts

Quote :
"the odds you draw any 1 combination that you are looking for is so low, it isnt worth trying to study the "most" likely 7 letter word combos."


This is false on so many different levels.

Yes, the odds of simply drawing the 7 tiles on your first turn are not that great. However, the strategy is to "play off" tiles that are bad for bingos (like Z, Q, etc.), knowing that you will more than likely draw a high-probability tile. So the whole idea is that you learn words that contain either 7 very high-probability tiles, or 6 very high-probability tiles and one outlier. Trust me, it works: I have played, and seen played, lots and lots of high-probability words and very few low-probability words.

4/5/2009 12:48:59 AM

David0603
All American
12762 Posts

What language did they use to code it?

4/5/2009 3:29:35 PM

A Tanzarian
drip drip boom
10992 Posts

C++

and

Yay, combinations!

4/6/2009 4:52:15 PM
