Data Analysis and Statistical Inference with R - Spring 2019
Homework 3
DUE IN: Friday, 01.03.2019 at 23:59
HOW: electronically as a PDF via submission to www.turnitin.com
Class id: depends on lab group (see announcement on piazza.com)
enrollment password: 20TiTaNic19
Please register for the class on turnitin ahead of time.
GROUP WORK: is allowed with a maximum of 2 persons per group. PLEASE stay within the
same group throughout the semester. Only one solution is accepted and graded per group.
Please include the names of all group members on each assignment.
HOW MANY: There will be a total of six homework assignments in this semester. A random
selection of questions will be graded. Each week a total of ten points can be gained.
Only the five best homeworks will be counted.
DUE DATES: 15.02., 22.02., 01.03., 08.03., 15.03., 22.03., 29.03. (tentatively, subject to change)
FORMAT: Please do the required analyses and provide answers in complete sentences. Provide
the R syntax for the commands. Extract and report those statistics that are
relevant; do not copy complete R output without providing proper answers to the assignment
questions. Integrate requested figures or tables into your document and give a brief verbal
comment/caption on them.
Gender discrimination in College Admission?
You are engaged in equal opportunity rights promotion. Hence, you look at one of the classic cases
of alleged gender discrimination. The file UCBAdmission.Rdata (an R data set, uploaded on piazza)
refers to individuals who applied for admission to one of the six largest graduate departments at
the University of California, Berkeley, for the Fall 1973 session. The variables for a total of 4526
applicants are denoted by:
Admit (A): Whether applicant was admitted or rejected
Gender (G): Gender of applicant (male, female)
Dept (D): Department to which application was sent (A, B, C, D, E, or F)
1. Construct the two-way table for gender and whether admitted. Besides the frequency table,
also generate the tables with row and column percents.
(a) (half a point) How many applicants were rejected?
(b) (half a point) Of those who were rejected, what percentage was female?
(c) (half a point) What share of males was rejected? What share of females?
(d) (1 point) Which of the above percentages would you report if you wanted to make a claim
of gender discrimination? Give a reason for your answer.
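The tables asked for in question 1 can be built along the following lines. This is only a sketch: it assumes that R's built-in UCBAdmissions table (dimensions Admit, Gender, Dept) can stand in for the course file UCBAdmission.Rdata, whose object name and layout are not given here.

```r
# Assumption: the built-in UCBAdmissions table stands in for the course file.
# Collapse over Dept to obtain the two-way table Gender (rows) x Admit (columns).
tab <- margin.table(UCBAdmissions, c(2, 1))

# Frequency table with row and column totals
addmargins(tab)

# Row percentages: distribution of Admit within each gender
row_pct <- 100 * prop.table(tab, margin = 1)

# Column percentages: distribution of Gender within each admission outcome
col_pct <- 100 * prop.table(tab, margin = 2)

round(row_pct, 1)
round(col_pct, 1)
```

Row percentages condition on gender and column percentages condition on the admission outcome, so the two tables answer different questions.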
2. You continue with analysing the relationship between admission and gender.
(a) (1 point) Find the odds ratio for admission of females vs. males and interpret. For which
gender is the probability of admission higher?
(b) (half a point) Draw a mosaic plot for the cross-classification of admission and gender. Is
a relationship between the two variables visible? If so, what is it?
(c) (half a point) Compute the risk of being rejected for females and males separately. What
is the relative risk of being rejected, comparing females with males?
(d) (half a point) Compute the Φ-coefficient for this table. Does it indicate a strong relationship
between admission and gender?
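One possible way to obtain the measures in question 2 is sketched below. It again assumes the built-in UCBAdmissions table as a stand-in for the course file, and it computes Φ from the uncorrected Pearson χ², which gives the magnitude of the coefficient but not its sign.

```r
# Assumption: the built-in UCBAdmissions table stands in for the course file.
tab <- margin.table(UCBAdmissions, c(2, 1))   # Gender x Admit

# Odds of admission within each gender, then the odds ratio females vs. males
odds <- tab[, "Admitted"] / tab[, "Rejected"]
or_f_vs_m <- unname(odds["Female"] / odds["Male"])

# Risk of rejection within each gender, then the relative risk females vs. males
risk_rej <- tab[, "Rejected"] / rowSums(tab)
rr_f_vs_m <- unname(risk_rej["Female"] / risk_rej["Male"])

# Phi as sqrt(chi^2 / n), using the Pearson statistic without continuity correction
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
phi <- sqrt(chi2 / sum(tab))

# Mosaic plot of the same two-way table
mosaicplot(tab, main = "Admission by gender", color = TRUE)

or_f_vs_m
rr_f_vs_m
phi
```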
3. Now, you assess the relationship between Admission and Gender using the χ²-statistic.
(a) (1 point) Calculate the χ²-statistic to assess the relationship between Admission and
Gender.
(b) (1.5 points) Calculate the expected frequencies under the assumption that gender has no
effect on admission. For which cells are expected frequencies higher than the observed
ones?
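A minimal sketch for question 3, assuming the built-in UCBAdmissions table as a stand-in for the course file. Whether the course expects Yates' continuity correction is not stated, so the call below switches it off to get the plain Pearson statistic; that choice is an assumption.

```r
# Assumption: the built-in UCBAdmissions table stands in for the course file.
tab <- margin.table(UCBAdmissions, c(2, 1))   # Gender x Admit

# Pearson chi-squared test without continuity correction
test <- chisq.test(tab, correct = FALSE)
test$statistic

# Expected frequencies under independence, as returned by chisq.test ...
test$expected

# ... and by hand: (row total * column total) / grand total
by_hand <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# TRUE marks cells where the expected frequency exceeds the observed one
test$expected > tab
```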
4. One of your relatives tells you that she is fairly sure that the relationship you found is an
artefact and that you should look at rejection rates by department. So, you have a first look at that.
(a) (half a point) Generate the cross-classification of admission and department. Sort the
departments by decreasing rejection rates.
(b) (half a point) Calculate the χ²-statistic to assess the relationship between Admit and
Dept.
(c) (1 point) Calculate the expected frequencies under the assumption that department has
no effect on admission. For which department are expected admission frequencies higher
than the observed ones?
(d) (half a point) Visualise the relationships using mosaic plots. Are the differences between
departments in relation to admission visible in the plot? Try out both orders of the two
variables Admit and Dept when creating the contingency table underlying the mosaic
plot. Which variant conveys the message better?
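Parts (a) and (d) of question 4 could be approached as in the sketch below, again assuming the built-in UCBAdmissions table in place of the course file. The plot titles are illustrative only.

```r
# Assumption: the built-in UCBAdmissions table stands in for the course file.
dept_admit <- margin.table(UCBAdmissions, c(3, 1))   # Dept x Admit

# Rejection rate per department, then departments sorted by decreasing rate
rej_rate <- prop.table(dept_admit, margin = 1)[, "Rejected"]
dept_sorted <- dept_admit[order(rej_rate, decreasing = TRUE), ]
dept_sorted

# Order 1: Dept on the horizontal axis, split by Admit within each department
mosaicplot(dept_sorted, main = "Dept, then Admit", color = TRUE)

# Order 2: Admit on the horizontal axis, split by Dept within each outcome
mosaicplot(t(dept_sorted), main = "Admit, then Dept", color = TRUE)
```

In a mosaic plot the first variable sets the column widths and the second splits each column, so swapping the order changes which conditional distribution the eye reads off.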
5. Now you investigate the relationship between department and admission for the two sexes
separately. Create the corresponding contingency tables.
(a) (half a point) For each department, report the percentage of female applicants who have
been rejected.
(b) (1 point) Comment on the differences in rejection rates for each department between
males and females. Do they indicate gender discrimination?
(c) (half a point) Visualise the relationships using mosaic plots. Are the differences between
females and males in relation to admission visible in the plots?
(d) (half a point) Visualise the relationships using mosaic plots. Are the differences between
females and males in relation to admission visible in the plots?
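For question 5, the gender-specific tables can be sliced out of the three-way table. The sketch below assumes the built-in UCBAdmissions table as a stand-in for the course file; the object names female, male and rej_pct are illustrative.

```r
# Assumption: the built-in UCBAdmissions table stands in for the course file.
# Slice by gender and transpose so that rows are Dept and columns are Admit.
female <- t(UCBAdmissions[, "Female", ])
male   <- t(UCBAdmissions[, "Male", ])

# Percentage rejected per department, side by side for both genders
rej_pct <- 100 * cbind(
  Female = prop.table(female, margin = 1)[, "Rejected"],
  Male   = prop.table(male,   margin = 1)[, "Rejected"]
)
round(rej_pct, 1)

# One mosaic plot per gender, drawn next to each other
old_par <- par(mfrow = c(1, 2))
mosaicplot(female, main = "Female applicants", color = TRUE)
mosaicplot(male,   main = "Male applicants",   color = TRUE)
par(old_par)
```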
6. (2.5 points) Now you investigate the relationship between admission and gender for each
department separately. Create the corresponding contingency tables. For each department,
report the odds ratio to assess the relationship between admission and gender.
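The per-department odds ratios in question 6 can be computed by applying one function to each Admit x Gender slice. This sketch assumes the built-in UCBAdmissions table in place of the course file and reports the odds of admission for females relative to males; the direction of the ratio is a choice, not something the assignment fixes.

```r
# Assumption: the built-in UCBAdmissions table stands in for the course file.
# apply() over the third dimension passes one Admit x Gender table per department.
or_by_dept <- apply(UCBAdmissions, 3, function(tab) {
  odds_female <- tab["Admitted", "Female"] / tab["Rejected", "Female"]
  odds_male   <- tab["Admitted", "Male"]   / tab["Rejected", "Male"]
  odds_female / odds_male
})

round(or_by_dept, 2)
```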
7. (2.5 points) For each department, compute Cramér's V to assess the relationship between
admission and gender. What do you conclude from these numbers?
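Base R has no built-in function for Cramér's V, so question 7 needs either an add-on package or a small helper. The sketch below defines a hypothetical helper cramers_v using V = sqrt(χ² / (n (min(r, c) - 1))) with the uncorrected Pearson χ², and assumes the built-in UCBAdmissions table as a stand-in for the course file.

```r
# Hypothetical helper (not part of base R): Cramer's V for a two-way table
cramers_v <- function(tab) {
  chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(dim(tab)) - 1   # equals 1 for a 2 x 2 table
  sqrt(chi2 / (n * k))
}

# Assumption: the built-in UCBAdmissions table stands in for the course file.
# One Admit x Gender table per department
v_by_dept <- apply(UCBAdmissions, 3, cramers_v)
round(v_by_dept, 3)
```

For a 2 x 2 table, V coincides with the absolute value of the Φ-coefficient.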
8. (2.5 points) Visualise the three-way table between Admit, Dept and Gender using a single
mosaicplot. Try different sortings of the variables to achieve the display that conveys best
the information of “no gender discrimination when controlling for department”.
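Reordering the dimensions of a three-way table for question 8 can be done with aperm(). The order shown below is just one candidate to try, not a recommended answer, and the built-in UCBAdmissions table is again assumed as a stand-in for the course file.

```r
# Assumption: the built-in UCBAdmissions table stands in for the course file.
# Original dimension order is Admit, Gender, Dept; permute to Dept, Gender, Admit.
tab3 <- aperm(UCBAdmissions, c(3, 2, 1))
names(dimnames(tab3))

# A single mosaic plot of the three-way table in the chosen order
mosaicplot(tab3, main = "Dept, Gender, Admit", color = TRUE)
```

Other permutations, such as c(2, 3, 1), can be tried by changing the second argument of aperm().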