UC Berkeley Stat2.2x Probability (Introductory Probability) Study Notes: Section 5 The accuracy of simple random samples

Time: 2023-03-08 20:50:42

The Stat2.2x Probability course was taught by the University of California, Berkeley on the edX platform in 2014.

PDF notes download (Academia.edu)

Summary

  • Zeros and Ones: Sum of a sample with replacement
    $S$ is the number of successes: $n$ independent trials, chance of success on a single trial is $p$ $$E(S)=n\cdot p,\ SE(S)=\sqrt{n\cdot p\cdot(1-p)}$$ Binomial formula: $$P(S=k)=C_{n}^{k}\cdot p^{k}\cdot(1-p)^{n-k}$$ where $k=0, 1, 2, \ldots, n$. R code:
    dbinom(x = k, size = n, prob = p)
  • Zeros and Ones: Sum of a sample without replacement
    $S$ is the number of good elements in a simple random sample: $n$ elements drawn from $N=G+B$ elements of which $G$ are good. $$E(S)=n\cdot\frac{G}{N},\ SE(S)=\sqrt{n\cdot\frac{G}{N}\cdot\frac{B}{N}}\cdot\sqrt{\frac{N-n}{N-1}}$$ Hypergeometric formula: $$P(S=g)=\frac{C_{G}^{g}\cdot C_{B}^{n-g}}{C_{N}^{n}}$$ where $g$ is the number of good elements in the sample. R code:
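    dhyper(x = g, m = G, n = B, k = n)  # R's arguments: x = good in sample, m = good in population, n = bad in population, k = sample size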
  • Zeros and Ones: Sample proportion of ones
    $n$ is the sample size, $X$ is the sample proportion of ones. Binomial setting: $$E(X)=p,\ SE(X)=\sqrt{\frac{p\cdot(1-p)}{n}}$$ Hypergeometric setting: $$E(X)=\frac{G}{N},\ SE(X)=\sqrt{\frac{\frac{G}{N}\cdot\frac{B}{N}}{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$
  • Sample sum
    Population mean is $\mu$, $SD$ is $\sigma$, sample size is $n$, sample sum is $S$, and population size is $N$. With replacement: $$E(S)=n\cdot\mu,\ SE(S)=\sqrt{n}\cdot\sigma$$ Without replacement: $$E(S)=n\cdot\mu,\ SE(S)=\sqrt{n}\cdot\sigma\cdot\sqrt{\frac{N-n}{N-1}}$$
  • Sample mean
    Population mean is $\mu$, $SD$ is $\sigma$, sample size is $n$, sample mean is $M$, and population size is $N$. With replacement: $$E(M)=\mu,\ SE(M)=\frac{\sigma}{\sqrt{n}}$$ Without replacement: $$E(M)=\mu,\ SE(M)=\frac{\sigma}{\sqrt{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$
  • Square Root Law
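    If you multiply the sample size by a factor, the accuracy of the sample mean (or sample proportion) goes up by the square root of that factor: its SE is divided by the square root of the factor. For example, quadrupling the sample size halves the SE.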
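
The summary formulas can be collected into a few R functions. This is a minimal sketch; the function names `fpc`, `sum_stats` and `mean_stats` are hypothetical, chosen here for illustration. It also checks that a 0/1 population (mean $G/N$, SD $\sqrt{\frac{G}{N}\cdot\frac{B}{N}}$) plugged into the sample-sum formula reproduces the exact hypergeometric mean and SE computed from `dhyper()`.

```r
# Hypothetical helpers implementing the summary formulas.

# Correction factor for sampling without replacement from N elements.
fpc <- function(N, n) sqrt((N - n) / (N - 1))

# E and SE of the sample sum; N = Inf means sampling with replacement.
sum_stats <- function(mu, sigma, n, N = Inf) {
  cf <- if (is.infinite(N)) 1 else fpc(N, n)
  c(E = n * mu, SE = sqrt(n) * sigma * cf)
}

# E and SE of the sample mean; N = Inf means sampling with replacement.
mean_stats <- function(mu, sigma, n, N = Inf) {
  cf <- if (is.infinite(N)) 1 else fpc(N, n)
  c(E = mu, SE = sigma / sqrt(n) * cf)
}

# Zeros and ones: G good elements out of N = G + B, sample of size n.
G <- 26; B <- 26; N <- G + B; n <- 13
x <- 0:n
probs <- dhyper(x, m = G, n = B, k = n)         # exact hypergeometric probabilities
E_exact  <- sum(x * probs)
SE_exact <- sqrt(sum((x - E_exact)^2 * probs))
zo <- sum_stats(mu = G / N, sigma = sqrt((G / N) * (B / N)), n = n, N = N)
c(E_exact = E_exact, SE_exact = SE_exact, zo)

# Square root law: quadrupling n halves the SE of the sample mean.
mean_stats(0, 1, 100)[["SE"]] / mean_stats(0, 1, 400)[["SE"]]  # 2
```

Passing `N = Inf` for sampling with replacement keeps one function per statistic instead of two.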

PRACTICE

PROBLEM 1

Find the expected value and standard error of

a) your average net gain per bet, if you bet \$1 independently 200 times on “red” at roulette (the bet pays 1 to 1 and the chance of winning is 18/38)

b) the proportion of times you win, if you bet 200 times independently on red as above

c) the total income of a simple random sample of 100 people taken from a population of 5000 people whose average income is \$50,000 with an SD of \$30,000

d) the average income of the sampled people in (c)

e) the number of black cards in a bridge hand (13 cards dealt at random without replacement from a deck consisting of 26 black cards and 26 red cards)

f) the percent of black cards in a bridge hand, described in (e)

Solution

a) Sample mean with replacement (the bets are independent). $$E(\text{average net gain})=\mu=1\times\frac{18}{38}+(-1)\times\frac{20}{38}=-\frac{1}{19}\doteq-0.05263158$$ $$SE(\text{average net gain})=\frac{SD}{\sqrt{n}}=\frac{\sqrt{E((x-\mu)^2)}}{\sqrt{n}}$$ $$=\frac{\sqrt{(1+\frac{1}{19})^2\times\frac{18}{38}+(-1+\frac{1}{19})^2\times\frac{20}{38}}}{\sqrt{200}}\doteq0.07061267$$
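
A short R sketch of the arithmetic in (a), computing the mean and SD of the net gain of a single bet directly from its two outcomes (variable names are illustrative):

```r
# Problem 1(a): average net gain over 200 independent $1 bets on red.
p_win    <- 18 / 38
outcomes <- c(1, -1)                  # net gain on a win, on a loss
probs    <- c(p_win, 1 - p_win)
mu    <- sum(outcomes * probs)                 # expected net gain per bet
sigma <- sqrt(sum((outcomes - mu)^2 * probs))  # SD of the net gain per bet
n <- 200
se_avg <- sigma / sqrt(n)             # independent bets: no correction factor
c(E = mu, SE = se_avg)
```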

b) Sample proportion of ones binomial setting. $$E(\text{proportion of winning times})=p=\frac{18}{38}\doteq0.4736842$$ $$SE(\text{proportion of winning times})=\sqrt{\frac{p\cdot(1-p)}{n}}$$ $$=\sqrt{\frac{\frac{18}{38}\times(1-\frac{18}{38})}{200}}\doteq0.03530634$$
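
The same numbers in R, as a sketch. Because the net gain of one bet equals $2\times(\text{win indicator})-1$, the SE of the average net gain in (a) should be exactly twice the SE of the proportion of wins, which the last entry checks.

```r
# Problem 1(b): proportion of wins in 200 independent bets on red.
p <- 18 / 38
n <- 200
E_prop  <- p
SE_prop <- sqrt(p * (1 - p) / n)
# Net gain = 2 * (win indicator) - 1, so SE(average net gain) = 2 * SE(proportion).
c(E = E_prop, SE = SE_prop, twice_SE = 2 * SE_prop)
```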

c) Sample sum without replacement. $$E(\text{total income})=n\cdot\mu=100\times50000=5000000$$ $$SE(\text{total income})=\sqrt{n}\cdot\sigma\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{100}\times30000\times\sqrt{\frac{5000-100}{5000-1}}\doteq 297014.6$$
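
A minimal R check of (c), with the correction factor computed separately so its size (about 0.99) is visible:

```r
# Problem 1(c): total income of an SRS of n = 100 from N = 5000 people.
mu <- 50000; sigma <- 30000
n <- 100; N <- 5000
cf <- sqrt((N - n) / (N - 1))    # correction factor, about 0.99
E_total  <- n * mu
SE_total <- sqrt(n) * sigma * cf
c(E = E_total, SE = SE_total, cf = cf)
```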

d) Sample mean without replacement. $$E(\text{average income})=\mu=50000$$ $$SE(\text{average income})=\frac{\sigma}{\sqrt{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\frac{30000}{\sqrt{100}}\times\sqrt{\frac{5000-100}{5000-1}}\doteq2970.146$$
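
A sketch of (d) in R. Since the sample mean is the sample sum divided by $n$, its SE is the SE of the sum divided by $n$; the last line confirms this numerically.

```r
# Problem 1(d): average income of the SRS of n = 100 from N = 5000 people.
mu <- 50000; sigma <- 30000
n <- 100; N <- 5000
cf <- sqrt((N - n) / (N - 1))
E_avg  <- mu
SE_avg <- sigma / sqrt(n) * cf
SE_sum <- sqrt(n) * sigma * cf   # SE of the sample sum
all.equal(SE_avg, SE_sum / n)    # mean = sum / n, so SE(mean) = SE(sum) / n
```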

e) Sum of a sample without replacement (zeros and ones, counting each black card as 1), with $N=52$, $n=13$ and $p=\frac{G}{N}=\frac{26}{52}=\frac{1}{2}$. $$E(\text{black cards in a bridge hand})=n\cdot p=13\times\frac{26}{52}=6.5$$ $$SE(\text{black cards in a bridge hand})=\sqrt{n\cdot p\cdot(1-p)}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{13\times\frac{1}{2}\times\frac{1}{2}}\times\sqrt{\frac{52-13}{52-1}}\doteq1.576482$$
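
As a sanity check, one can compute the mean and SE of the number of black cards directly from the exact hypergeometric distribution with `dhyper()` and compare with the formula. A sketch:

```r
# Problem 1(e): black cards in 13 cards dealt from 26 black + 26 red.
G <- 26; B <- 26; N <- G + B; n <- 13
x <- 0:n
probs <- dhyper(x, m = G, n = B, k = n)   # P(x black cards in the hand)
E_exact  <- sum(x * probs)
SE_exact <- sqrt(sum((x - E_exact)^2 * probs))
p <- G / N
SE_formula <- sqrt(n * p * (1 - p)) * sqrt((N - n) / (N - 1))
c(E = E_exact, SE_exact = SE_exact, SE_formula = SE_formula)
```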

f) Sample proportion of ones hypergeometric setting. $$E(\text{proportion of black cards in a bridge hand})=p=\frac{1}{2}$$ $$SE(\text{proportion of black cards in a bridge hand})=\sqrt{\frac{p\cdot(1-p)}{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{\frac{\frac{1}{2}\times(1-\frac{1}{2})}{13}}\times\sqrt{\frac{52-13}{52-1}}\doteq0.1212678$$ As a percent: the expected percent of black cards is $50\%$, with an SE of about $12.1\%$.
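
The proportion is just the count from (e) divided by 13, so a sketch in R can reuse the exact hypergeometric probabilities on the proportion scale:

```r
# Problem 1(f): proportion of black cards in the hand = count / 13.
G <- 26; B <- 26; N <- G + B; n <- 13
x <- 0:n
probs <- dhyper(x, m = G, n = B, k = n)
prop  <- x / n                            # possible sample proportions
E_prop  <- sum(prop * probs)
SE_prop <- sqrt(sum((prop - E_prop)^2 * probs))
100 * c(E = E_prop, SE = SE_prop)         # in percent
```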

PROBLEM 2

I play a gambling game repeatedly; the games are independent of each other. In 100 games, my expected average net gain per game is -10 cents, with an SE of 5 cents. In 1000 games, my expected average net gain per game is ________ cents, with an SE of ________ cents.

Solution

The expected average net gain per game does not change when the number of games goes up. Thus $$E(\text{1000 games})=\mu=-10$$ The SE of the average, however, goes down as the number of games goes up: multiplying the number of games by 10 divides the SE by $\sqrt{10}$ (the square root law). Thus $$SE(\text{1000 games})=\frac{\sigma}{\sqrt{1000}}=\frac{SE(\text{100 games})\cdot\sqrt{100}}{\sqrt{1000}}\doteq1.581139$$
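
The rescaling can be written out in R as a sketch: first recover the per-game SD from the 100-game SE, then divide by $\sqrt{1000}$.

```r
# Problem 2: from 100 games to 1000 games (net gains in cents).
E_100 <- -10; SE_100 <- 5
sigma   <- SE_100 * sqrt(100)    # SD of the net gain in a single game
E_1000  <- E_100                 # the expected average does not depend on n
SE_1000 <- sigma / sqrt(1000)    # equivalently SE_100 / sqrt(10)
c(E = E_1000, SE = SE_1000)
```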

PROBLEM 3

In a population of tens of thousands of voters, 48% are Democrats. A simple random sample of 125 voters is taken. Approximately what is the chance that a majority of the sampled voters are Democrats?

Solution

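A majority of the 125 sampled voters means at least 63 Democrats. With tens of thousands of voters in the population the correction factor is very close to 1, so the binomial distribution with $n=125$, $p=0.48$ and $k=63, 64, \ldots, 125$ is a good approximation: $$P(\text{majority of 125 sampled voters are Democrats})$$ $$=\sum_{k=63}^{125}C_{125}^{k}\cdot 0.48^k\cdot0.52^{125-k}\doteq0.3269725$$ R code: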

```r
sum(dbinom(63:125, 125, 0.48))
# [1] 0.3269725
```

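Alternatively, using the normal approximation (sample proportion of ones): $$p=0.48,\ \sigma=\sqrt{p\cdot(1-p)}$$ $$SE=\frac{\sigma}{\sqrt{125}},\ Z=\frac{0.5-p}{SE}$$ The cutoff $0.5=\frac{62.5}{125}$ lies halfway between 62 and 63 Democrats, so it already acts as a continuity correction. Calculating in R: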

```r
p <- 0.48; sigma <- sqrt(p * (1 - p)); se <- sigma / sqrt(125)
z <- (0.5 - p) / se
1 - pnorm(z)
# [1] 0.3272311
```

The two results are very close: the chance is roughly $32.7\%$.
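
To see how little the binomial approximation costs here, the sketch below compares it with the exact hypergeometric answer from `phyper()`. The population size $N=50{,}000$ is an assumption made only for this illustration; the problem says just "tens of thousands".

```r
# Problem 3: binomial approximation vs. exact hypergeometric probability.
N <- 50000                      # ASSUMED population size (hypothetical)
D <- round(0.48 * N)            # Democrats in the assumed population
n <- 125
# "Majority" means at least 63 Democrats, i.e. P(count > 62).
p_binom <- pbinom(62, size = n, prob = 0.48, lower.tail = FALSE)
p_hyper <- phyper(62, m = D, n = N - D, k = n, lower.tail = FALSE)
c(binomial = p_binom, hypergeometric = p_hyper)
```

`pbinom(62, ..., lower.tail = FALSE)` gives the same value as `sum(dbinom(63:125, 125, 0.48))` without building the vector of terms.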

PROBLEM 4

Suppose you are trying to estimate the percent of Democrat voters. Other things being equal, is a simple random sample of 200 voters taken from 100,000 voters about as accurate as a simple random sample of 200 voters taken from 200,000 voters?

Solution

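Sample proportion of ones (hypergeometric setting). "Other things being equal" means the population SD $\sigma=\sqrt{p\cdot(1-p)}$ is the same in both cases. $$SE(\text{100000 voters})=\frac{\sigma}{\sqrt{200}}\cdot\sqrt{\frac{100000-200}{100000-1}}=0.9990045\cdot\frac{\sigma}{\sqrt{200}}$$ $$SE(\text{200000 voters})=\frac{\sigma}{\sqrt{200}}\cdot\sqrt{\frac{200000-200}{200000-1}}=0.9995024\cdot\frac{\sigma}{\sqrt{200}}$$ Both correction factors are very close to 1, so the two samples are about equally accurate: what matters is the sample size, not the population size.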
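
The two correction factors, and how much they differ, in a short R sketch:

```r
# Problem 4: correction factors for n = 200 from two population sizes.
n  <- 200
N  <- c(100000, 200000)
cf <- sqrt((N - n) / (N - 1))
cf               # both very close to 1
cf[2] / cf[1]    # the two SEs differ by only about 0.05%
```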

UNGRADED EXERCISE SET C

PROBLEM 1

A coin is tossed 2500 times. There is about a 68% chance that the percent of heads is in the range 50% plus or minus? (a percentage)

Solution

About $68\%$ of the area under the normal curve lies between -1 and 1 standard units, so the range is $50\%$ plus or minus $1SE$: $$p=0.5, n=2500$$ $$SE=\sqrt{\frac{p\cdot(1-p)}{n}}=\sqrt{\frac{0.5\times0.5}{2500}}=0.01$$ Thus, there is about a $68\%$ chance that the percentage of heads is in the range $50\%$ plus or minus $1\%$.
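
A quick R sketch of both ingredients: the SE of the proportion of heads, and the normal area within one SE of the center.

```r
# Exercise C1: a coin tossed 2500 times.
p <- 0.5; n <- 2500
se   <- sqrt(p * (1 - p) / n)     # SE of the proportion of heads
area <- pnorm(1) - pnorm(-1)      # normal area between -1 and 1 standard units
c(SE_percent = 100 * se, area = area)
```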

PROBLEM 2

A simple random sample of 50 students is taken from a class of 300 students. In the class,

  • the average midterm score is 67 and the $SD$ is 12;
  • there are 72 women.

Let $W$ be the number of women in the sample, and let $S$ be the average midterm score of the sampled students.

2A Find $E(W)$.

2B Find $SE(W)$.

2C Find $E(S)$.

2D Find $SE(S)$.

Solution

2A) $$E(W)=50\times\frac{72}{300}=12$$

2B) Sample without replacement. $$N=300, n=50, p=\frac{72}{300}$$ $$SE(W)=\sqrt{n\cdot p\cdot(1-p)}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\sqrt{50\times0.24\times0.76}\times\sqrt{\frac{300-50}{300-1}}\doteq2.761416$$
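
A sketch of 2A and 2B in R, with a cross-check of the formula against the exact hypergeometric distribution of $W$ from `dhyper()`:

```r
# Exercise C2A/2B: number of women W in an SRS of 50 from 300 students.
N <- 300; n <- 50; G <- 72
p  <- G / N
cf <- sqrt((N - n) / (N - 1))
E_W  <- n * p
SE_W <- sqrt(n * p * (1 - p)) * cf
# Cross-check: exact SD of W from the hypergeometric distribution.
w <- 0:n
probs <- dhyper(w, m = G, n = N - G, k = n)
SE_exact <- sqrt(sum((w - E_W)^2 * probs))
c(E = E_W, SE = SE_W, SE_exact = SE_exact)
```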

2C) $$E(S)=\mu=67$$

2D) Sample mean without replacement. $$\sigma=12, n=50, N=300$$ $$SE(S)=\frac{\sigma}{\sqrt{n}}\cdot\sqrt{\frac{N-n}{N-1}}$$ $$=\frac{12}{\sqrt{50}}\times\sqrt{\frac{300-50}{300-1}}\doteq1.551782$$
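
The same computation for 2C and 2D, as a minimal R sketch:

```r
# Exercise C2C/2D: average midterm score S of the 50 sampled students.
mu <- 67; sigma <- 12
N <- 300; n <- 50
E_S  <- mu
SE_S <- sigma / sqrt(n) * sqrt((N - n) / (N - 1))
c(E = E_S, SE = SE_S)
```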

PROBLEM 3

In a city of over 1,000,000 residents, 14% of the residents are senior citizens. In a simple random sample of 1200 residents, there is about a 95% chance that the percent of senior citizens is in the interval [pick the best option; even if you can provide a sharper answer than you see among the choices, please just pick the best among the options] $9\%-19\%$; $10\%-18\%$; $11\%-17\%$; $12\%-16\%$; $13\%-15\%$.

Solution

Firstly, about $95\%$ of the area under the normal curve lies within $2SE$ of the expected value. This asks for a sample proportion (using the binomial setting, since the correction factor is very close to 1 for a city of over 1,000,000 residents): $$E=p=0.14, n=1200$$ $$SE=\sqrt{\frac{p\cdot(1-p)}{n}}=\sqrt{\frac{0.14\times0.86}{1200}}\doteq0.01001665$$ Thus, the interval is $E\pm2SE\approx0.14\pm0.02$, i.e. $[12\%, 16\%]$.
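
A sketch of the interval in R, rounded to whole percents to match the answer options:

```r
# Exercise C3: about-95% range for the percent of seniors in an SRS of 1200.
p <- 0.14; n <- 1200
se <- sqrt(p * (1 - p) / n)       # correction factor ignored (N > 1,000,000)
interval <- p + c(-2, 2) * se     # expected value plus or minus 2 SE
round(100 * interval)             # in percent
```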

PROBLEM 4

City A has 1,000,000 people; City B has 4,000,000 people. Suppose the goal is to try to predict the percent of Purple Party voters in a sample. Other things being equal, a simple random sample of 1% of the people in City A has about the same accuracy as a simple random sample of ________% of the people in City B. Pick the best option below to fill in the blank.

Solution

For the same accuracy we need the same sample size, not the same sampling fraction: 1% of City A is $10^6\times1\%=10{,}000$ people. Thus the percentage of City B should be $$\frac{10^6\times1\%}{4\times10^6}=0.25\%$$
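
A short R sketch: the same 10,000 people as a percent of City B, plus the two correction factors, which are both close to 1 and therefore barely change the SE.

```r
# Exercise C4: same sample size in both cities.
N_A <- 1e6; N_B <- 4e6
n <- 0.01 * N_A                   # 1% of City A = 10,000 people
pct_B <- 100 * n / N_B            # the same 10,000 as a percent of City B
Ns <- c(N_A, N_B)
cf <- sqrt((Ns - n) / (Ns - 1))   # correction factors for A and B
c(pct_B = pct_B, cf_A = cf[1], cf_B = cf[2])
```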