Name________________________________
This assignment is worth 100 points. Each problem/part is
worth 5 points.
Due Date: Thursday, May 21, 2020 by 11:59 p.m. EST.
You do not have to type your responses. If you wish to
provide hand-written work and scan your document, that is fine. Upload your
final document with complete responses to the ‘Test of Association Assignment’
in Canvas.
1. Some students in the master's of analytics
program at Harrisburg University claim they are really good at soccer. To see
whether there is an association among students and faculty in terms of the
goals they can score in a soccer game, the analytics faculty invited students and
teaching assistants (Ph.D. students) to play in soccer matches. Every player was allowed to play in only one match.
Over many matches, we counted the number of players who scored goals.
a. The soccer.txt file is attached to the
‘Test of Association Assignment’ in Canvas. Import this file into R. Copy and
paste your code below that shows where you imported the data. Name the data
frame that you create ‘soccer.’
soccer <- read.delim("soccer.txt")  # assumes a tab-delimited file with a header row
b. When you examine the data frame you
imported into R, you will notice that it is not in the appropriate format for
performing a chi-square test. Using some of the R categorical functions that
you have learned in the course, convert the soccer data frame that you imported
in part a into a table such that the Job
variable is on the rows and the Score variable
is on the columns. Name the table you create ‘soccer_t’. Copy and paste your code below that created the
table.
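# Assumes soccer has columns job, score, and freq (one row per job/score pair with its count)
soccer_t <- xtabs(freq ~ job + score, data = soccer)
Table created below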
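As a separate illustration of the conversion step, here is a minimal sketch that turns a frequency-form data frame into a two-way table with xtabs(). The data frame soccer_demo, its column names, and every count in it are made up for the example; they are not the contents of soccer.txt.

```r
# Hypothetical frequency-form data: one row per Job/Score pair with its count.
# These values are illustrative only and are NOT taken from soccer.txt.
soccer_demo <- data.frame(
  Job   = rep(c("Student", "TA", "Faculty"), each = 2),
  Score = rep(c("No", "Yes"), times = 3),
  Freq  = c(30, 20, 15, 15, 5, 15)
)

# xtabs() sums Freq within each Job/Score cell: Job on the rows, Score on the columns.
demo_t <- xtabs(Freq ~ Job + Score, data = soccer_demo)
demo_t
```

Calling table() directly on a count column would treat each distinct count as a category, which is why a formula with the counts on the left-hand side is used here.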
c. Develop the appropriate null and
alternative hypothesis for testing an association among these variables. State
the null and alternative hypothesis in the context of this problem.
H0: There is no relationship between occupation and scoring in a soccer
match.
H1: There is a relationship between occupation and scoring in a soccer
match.
d. Run the chi-square test in R using the
assocstats() function. Copy and paste your code and the output below. What is
the value of the chi-square test statistic? What is the p-value related to the
chi-square test statistic?
library(vcd)  # provides assocstats()
tab <- table(job, freq)
summary(assocstats(tab))
The chi-square test statistic is 12.
The p-value is 0.2851.
e. Compute the χ2
test statistic that you obtained from the R output in part d. You should show your computations. Show the
computations of the expected counts. Provide a final table that includes the
observed counts with the expected counts in parentheses for each cell. The
final table should have row and column margin totals and a grand total.
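chisq.test(tab)$expected   # expected counts under independence
chisq.test(tab)$statistic  # chi-square statistic to compare with the hand computation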
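The hand computation asked for in this part can be sketched in R as follows. Each expected count is (row total × column total) / grand total, and the statistic is the sum of (observed − expected)² / expected over all cells. The counts in obs are hypothetical and chosen only so the arithmetic is easy to follow; they are not the soccer.txt data.

```r
# Hypothetical 3 x 2 table of observed counts (NOT the soccer.txt data).
obs <- matrix(c(30, 20,
                15, 15,
                 5, 15),
              nrow = 3, byrow = TRUE,
              dimnames = list(Job = c("Student", "TA", "Faculty"),
                              Score = c("No", "Yes")))

row_tot <- rowSums(obs)   # 50, 30, 20
col_tot <- colSums(obs)   # 50, 50
n       <- sum(obs)       # 100

# Expected count for each cell: row total * column total / grand total.
expected <- outer(row_tot, col_tot) / n

# Pearson chi-square statistic and its degrees of freedom, (rows - 1) * (cols - 1).
chi_sq <- sum((obs - expected)^2 / expected)   # 7
df     <- (nrow(obs) - 1) * (ncol(obs) - 1)    # 2

addmargins(obs)   # observed counts with row, column, and grand totals
expected          # expected counts to place in parentheses beside each observed count

# Cross-check against the built-in test (no continuity correction for tables larger than 2 x 2).
builtin <- unname(chisq.test(obs)$statistic)
```

For these made-up counts the hand-computed value and the chisq.test() value are both 7 on 2 degrees of freedom.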
f. Does the test statistic you found by hand in part e match the χ2 test statistic from the
assocstats() output in part d? State Yes or No.
No
g. Use the pchisq() function in R to find the
p-value associated with the χ2
test statistic. Copy and paste your code below.
Does the p-value you found with the pchisq() function match the p-value
from the assocstats() output in part d? State Yes or No.
Yes
1 - pchisq(12, df = 10, ncp = 0, lower.tail = TRUE, log.p = FALSE)
The p-value is 0.2850565, which rounds to 0.2851 at four decimal places,
so it matches the p-value from assocstats().
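The upper-tail probability can also be requested directly from pchisq() with lower.tail = FALSE, which avoids the subtraction from 1. The sketch below uses the statistic (12) and degrees of freedom (10) reported in this answer; the variable names are only for illustration.

```r
# Two equivalent ways to get the upper-tail p-value for a chi-square statistic.
stat <- 12   # chi-square statistic reported in part d
df   <- 10   # degrees of freedom used in this answer

p_subtract <- 1 - pchisq(stat, df = df)                  # complement of the lower tail
p_upper    <- pchisq(stat, df = df, lower.tail = FALSE)  # upper tail directly

round(p_upper, 4)   # 0.2851
```

For very small p-values the lower.tail = FALSE form is usually preferred because it does not lose precision in the subtraction.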
h. Using the p-value for this test, do you
reject or not reject the null hypothesis?
Do not reject the null
hypothesis
i. State a conclusion back in terms of the context of the problem.
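Because the p-value (0.2851) is greater than 0.05, we do not reject the null hypothesis.
There is not sufficient evidence of a relationship between occupation and scoring in a soccer match.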