Data Statistics Assignment 2 Sample

QUESTION 1. Single-factor experiments

(a) By constructing an appropriate QQ plot, determine if the sample of fish lengths appears to be normally distributed [3 marks].

(b) Construct a 95% two-sided confidence interval for mean fish length (i.e. the expected value of fish length).

(c) Using significance level 0.05, perform a test to determine if the median length of fish with very high mercury concentration is less than 54 cm. Write down the null and alternative hypotheses [1 mark], the test statistic and associated p-value [1 mark], the test decision (providing a reason for this) [1 mark] and a conclusion using a minimum of mathematical language [1 mark].

(d) Using significance level 0.05, perform a test to determine if mean length of fish from the Waccamaw River is higher than mean length of fish from the Lumber River. Write down the null and alternative hypotheses [1 mark], the test statistic and associated p-value [1 mark], the test decision (providing a reason for this) [1 mark] and a conclusion using a minimum of mathematical language [1 mark].

QUESTION 2. Two-factor experiment [14 marks]

(a) Construct a single chart that displays eight boxplots (one for each combination of factor levels) [2 marks].

(b) Write down the statistical model for a completely randomised block design consistent with the sample data, excluding interaction between the factors [2 marks].
Identify the treatments and measurement units [2 marks].

(c) Using significance level 0.05, perform two-way ANOVA (without interaction) and document the F-test for the experimental factor. Write down the null and alternative hypotheses [1 mark], the test statistic and associated p-value [1 mark], the test decision (providing a reason for this) [1 mark] and a conclusion using a minimum of mathematical language [1 mark].

(d) Using significance level 0.05, perform Tukey post-hoc analysis on the experimental factor for each level of the blocking factor and determine which pairs of concentrations of mercury are associated with statistically different average fish lengths [2 marks].

(e) Using diagnostic plots of the residuals, assess whether the assumptions of normality and constant variance have been met [2 marks].

QUESTION 3. Simple linear regression [14 marks]

In this question we build a simple linear regression to model the relationship between engine power (pow) and engine displacement (disp). We consider the population model pow = β0 + β1 * disp + ε, where ε is a random error term.

(a) Fit the model described above, write down the regression equation [1 mark] and use the model to calculate the difference in predicted average engine power for vehicles with a 25 cubic inch difference in engine displacement [2 marks].

(b) Construct a scatter plot of engine power on the vertical axis and engine displacement on the horizontal axis and superimpose the fitted regression line over the top [3 marks].

(c) Using 0.05 significance level, test whether average engine power increases by more than 0.33 units for each additional cubic inch in engine displacement. Write down the null and alternative hypotheses [1 mark], the test statistic and p-value [1 mark], the test decision with reason [1 mark] and a conclusion using a minimum of mathematical language [1 mark].

(d) Is there any statistical evidence against the assumption of independent errors [2 marks]?

(e) Provide an estimate of [2 marks].

QUESTION 4. Multiple linear regression [18 marks]

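In this question we extend the model from Q3 into a multiple linear regression. We now consider a population model in which engine power depends on engine displacement, gearbox type (manual or not) and vehicle weight. Note that R will create the dummy variable for the gearbox automatically.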

(a) Fit the model described above, write down the regression equation that applies for vehicles with a manual gearbox [1 mark] and provide interpretations of the estimated coefficients and [2 marks].

(b) Using 0.05 significance level, determine if the regression is significant. Write down the null and alternative hypotheses [1 mark], the test statistic and p-value [1 mark], the test decision with reason [1 mark] and a conclusion using a minimum of mathematical language [1 mark].

(c) Compute the 95% mean confidence interval for engine power of a vehicle with a manual gearbox, with engine displacement 295 in³ and weight 2875 lbs [2 marks].

(d) Using 0.05 significance level perform a normality test on the residuals of the fitted model. Write down the null and alternative hypotheses [1 mark], the test statistic and p-value [1 mark], the test decision with reason [1 mark] and a conclusion using a minimum of mathematical language [1 mark].

(e) Is there any indication of multicollinearity in the fitted model [2 marks]?

(f) Provide the Cook’s D of the most influential point [1 mark], refit the regression model on sample data excluding this point and write down the fitted equation [2 marks].

Solution

Question 1

a)

R Code

In a QQ plot, when the data follow a normal distribution, the points tend to fall along the 45° reference line. When the data violate the normality assumption, the points deviate from that line. In the QQ plot above, the points clearly move away from the 45° reference line, indicating that fish length does not follow a normal distribution.
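As an illustration of the kind of code part (a) calls for, here is a minimal sketch. The data are simulated (the vector name `fish_length` and its values are hypothetical, not the assignment sample), so the plot will not match the one discussed above.

```r
# Hypothetical data: simulated fish lengths in cm, NOT the assignment sample
set.seed(1)
fish_length <- rnorm(60, mean = 40, sd = 5)

# Normal QQ plot; qqline() adds a reference line through the quartiles
qqnorm(fish_length, main = "Normal QQ plot of fish length")
qqline(fish_length, col = "red")

# The plotted coordinates can also be extracted without drawing
qq <- qqnorm(fish_length, plot.it = FALSE)
head(cbind(theoretical = qq$x, sample = qq$y))
```

Note that `qqline()` draws a line through the first and third quartiles; it coincides with a 45° line only when the data are standardised.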

b)

R Code

The 95% confidence interval for the true mean fish length is (38.7 cm, 41.3 cm).
That is, if repeated samples were taken from the same population and an interval were constructed from each, about 95% of those intervals would contain the true mean fish length.
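A t-based interval of this kind could be computed along the following lines. This is a sketch on simulated data (the vector `fish_length` is hypothetical), so it will not reproduce the interval reported above; it only shows that the manual formula and `t.test()` agree.

```r
# Hypothetical data: simulated fish lengths in cm, NOT the assignment sample
set.seed(1)
fish_length <- rnorm(60, mean = 40, sd = 5)

n      <- length(fish_length)
xbar   <- mean(fish_length)
se     <- sd(fish_length) / sqrt(n)      # standard error of the mean
t_crit <- qt(0.975, df = n - 1)          # two-sided 95% critical value

# Manual interval: xbar +/- t_crit * se
ci_manual <- c(xbar - t_crit * se, xbar + t_crit * se)

# Same interval from the built-in one-sample t procedure
ci_ttest <- t.test(fish_length, conf.level = 0.95)$conf.int

ci_manual
ci_ttest
```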

c)

R Code

Null Hypothesis: H0: Md ≥ 54
That is, median length of fish with very high mercury concentration is not less than 54 cm
Alternative Hypothesis: Ha: Md < 54
That is, median length of fish with very high mercury concentration is less than 54 cm

The Wilcoxon signed rank test gives a test statistic of 25 with a p-value of 0.02.

Since the p-value falls below 0.05, there is sufficient statistical evidence to reject the null hypothesis at the 5% level of significance. Therefore, we conclude that the median length of fish with very high mercury concentration is less than 54 cm.
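A one-sided Wilcoxon signed rank test against a hypothesised median of 54 might be run as sketched below. The sample `very_high` is simulated for illustration only, so the statistic and p-value will differ from those reported above.

```r
# Hypothetical data: lengths (cm) of fish with very high mercury concentration
set.seed(2)
very_high <- rnorm(15, mean = 50, sd = 6)

# H0: median >= 54  versus  Ha: median < 54
wt <- wilcox.test(very_high, mu = 54, alternative = "less")

wt$statistic   # signed rank statistic V
wt$p.value

# Decision at the 5% level
if (wt$p.value < 0.05) "reject H0" else "fail to reject H0"
```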

d)

R Code

Null Hypothesis: H0: µLumber ≥ µWaccamaw
That is, mean length of fish from the Waccamaw River is not higher than mean length of fish from the Lumber River
Alternative Hypothesis: Ha: µLumber < µWaccamaw
That is, mean length of fish from the Waccamaw River is higher than mean length of fish from the Lumber River.

The value of the t test statistic is -0.7

The p-value is 0.2

Conclusion

Here, the p-value of the t test statistic is above 0.05, so we fail to reject the null hypothesis at the 5% level. Therefore, there is no statistical evidence to conclude that the mean length of fish from the Waccamaw River is higher than the mean length of fish from the Lumber River.
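The one-sided two-sample comparison can be sketched as follows. The two samples are simulated and their names are hypothetical. R's `t.test()` uses the Welch (unequal variance) version by default; whether the original solution used Welch or the pooled test is not stated, so that choice is an assumption here.

```r
# Hypothetical data: fish lengths (cm) from each river
set.seed(3)
lumber   <- rnorm(30, mean = 39, sd = 6)
waccamaw <- rnorm(30, mean = 40, sd = 6)

# Ha: mean(lumber) < mean(waccamaw), i.e. Waccamaw fish are longer on average
tt <- t.test(lumber, waccamaw, alternative = "less")

tt$statistic   # negative when the Lumber sample mean is the smaller one
tt$p.value
```

Passing `var.equal = TRUE` would give the pooled-variance test instead.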

Question 2

a)

R Code
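A chart of the kind part (a) asks for, with one boxplot per river and mercury combination, could be drawn roughly as below. The data frame `fish` and its columns (`len`, `mercury`, `river`) are simulated stand-ins, not the assignment data.

```r
# Hypothetical data: 10 fish per mercury level per river
set.seed(4)
conc <- c("low", "medium", "high", "very high")
fish <- expand.grid(rep     = 1:10,
                    mercury = factor(conc, levels = conc),
                    river   = factor(c("Lumber", "Waccamaw")))
fish$len <- rnorm(nrow(fish), mean = 35 + 3 * as.integer(fish$mercury), sd = 4)

# One chart, eight boxplots: every mercury level within every river
bp <- boxplot(len ~ mercury + river, data = fish, las = 2, xlab = "",
              ylab = "Fish length (cm)",
              main = "Fish length by mercury concentration and river")
bp$names
```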

b)

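The model for the randomized complete block design, without interaction, is given below
Lengthij = µ + Mercuryi + Riverj + εij
where µ is the overall mean, Mercuryi is the effect of the i-th mercury concentration, Riverj is the effect of the j-th river (block) and εij is the random error.
The treatments are the four mercury concentrations (low, medium, high and very high). River (two levels, Lumber and Waccamaw) is the blocking factor. The measurement units are the individual fish.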

c)

R code

Main Effect River

Null Hypothesis: H0: µ1 = µ2

That is, there is no difference in mean fish length between the two rivers

Alternative Hypothesis: Ha: µ1 ≠ µ2

That is, there is a difference in mean fish length between the two rivers

Main Effect Mercury

Null Hypothesis: H0: µ1 = µ2 = µ3 = µ4

That is, there is no difference in mean fish length among the four mercury concentration groups

Alternative Hypothesis: Ha: µi ≠ µj for at least one pair i ≠ j

That is, mean fish length differs for at least one pair of mercury concentrations

The two-way ANOVA output is given below



From the above output, we see that the value of the F test statistic for the main effect River is 0.91 and its corresponding p-value is 0.34 > 0.05, so we fail to reject the null hypothesis: there is no evidence of a difference in mean fish length between the two rivers.

From the above output, we see that the value of the F test statistic for the main effect Mercury is 37.33 and its corresponding p-value is reported as 0.000 < 0.05, so we reject the null hypothesis: mean fish length differs for at least one pair of mercury concentrations.

That is, fish length was significantly influenced by mercury concentration.
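An additive two-way ANOVA of this form can be fitted with `aov()`. The sketch below uses a simulated data frame (`fish`, with hypothetical columns `len`, `mercury` and `river`), so the F statistics will not match those quoted above.

```r
# Hypothetical data: 10 fish per mercury level per river
set.seed(4)
conc <- c("low", "medium", "high", "very high")
fish <- expand.grid(rep     = 1:10,
                    mercury = factor(conc, levels = conc),
                    river   = factor(c("Lumber", "Waccamaw")))
fish$len <- rnorm(nrow(fish), mean = 35 + 3 * as.integer(fish$mercury), sd = 4)

# Block (river) plus treatment (mercury), no interaction term
fit_aov <- aov(len ~ river + mercury, data = fish)
tab <- summary(fit_aov)[[1]]
tab

# F statistic and p-value for the experimental factor
mercury_row <- trimws(rownames(tab)) == "mercury"
tab[mercury_row, c("F value", "Pr(>F)")]
```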

d)

R Code

From the above Tukey post hoc test, we see that

Mean fish length is higher for medium mercury concentration than for low mercury concentration.

Mean fish length is higher for high mercury concentration than for low mercury concentration.

Mean fish length is higher for very high mercury concentration than for low mercury concentration.

Mean fish length is higher for high mercury concentration than for medium mercury concentration.

Mean fish length is higher for very high mercury concentration than for medium mercury concentration.

Mean fish length is higher for very high mercury concentration than for high mercury concentration.
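Pairwise comparisons like these come from `TukeyHSD()` applied to the fitted ANOVA. The sketch below again uses simulated data with hypothetical names, and for brevity it runs the comparison once on the additive model rather than separately within each river.

```r
# Hypothetical data: 10 fish per mercury level per river
set.seed(4)
conc <- c("low", "medium", "high", "very high")
fish <- expand.grid(rep     = 1:10,
                    mercury = factor(conc, levels = conc),
                    river   = factor(c("Lumber", "Waccamaw")))
fish$len <- rnorm(nrow(fish), mean = 35 + 3 * as.integer(fish$mercury), sd = 4)

fit_aov <- aov(len ~ river + mercury, data = fish)

# All pairwise differences between mercury levels, family-wise 95% level
tk <- TukeyHSD(fit_aov, which = "mercury", conf.level = 0.95)
tk$mercury

# Pairs whose adjusted p-value is below 0.05
sig <- tk$mercury[tk$mercury[, "p adj"] < 0.05, , drop = FALSE]
rownames(sig)
```

A positive `diff` means the first-named level in the pair has the larger mean.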

e) Residual Plots

The normal probability plot of the residuals supports the normality assumption.
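Both assumptions are usually checked with two plots: residuals against fitted values (constant variance) and a normal QQ plot of the residuals (normality). A minimal sketch on simulated data with hypothetical names:

```r
# Hypothetical data: 10 fish per mercury level per river
set.seed(4)
conc <- c("low", "medium", "high", "very high")
fish <- expand.grid(rep     = 1:10,
                    mercury = factor(conc, levels = conc),
                    river   = factor(c("Lumber", "Waccamaw")))
fish$len <- rnorm(nrow(fish), mean = 35 + 3 * as.integer(fish$mercury), sd = 4)

fit_aov <- aov(len ~ river + mercury, data = fish)
res <- residuals(fit_aov)

op <- par(mfrow = c(1, 2))
# Constant variance: look for a similar vertical spread across fitted values
plot(fitted(fit_aov), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
# Normality: points should follow the reference line
qqnorm(res)
qqline(res)
par(op)
```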

Question 3

a)

R Codes

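# Read the data: three numeric columns followed by one factor column
q3q4.data <- read.csv("F:/q3q4data.csv", header = TRUE,
                      colClasses = c(rep("numeric", times = 3), "factor"))
# Fit the simple linear regression of engine power on engine displacement
fit <- lm(pow ~ disp, data = q3q4.data)
summary(fit)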

The regression equation is
Engine Power = 45.7345 + 0.4376 * Engine Displacement
For two vehicles whose engine displacements differ by 25 cubic inches, the intercept cancels and the difference in predicted average engine power is
0.4376 * 25 = 10.94
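The difference in predicted engine power between two displacements depends only on the slope, because the intercept cancels. A short check using the coefficients reported above (the displacements 100 and 125 are arbitrary illustrative values):

```r
b0 <- 45.7345   # intercept reported above
b1 <- 0.4376    # slope reported above

# Predicted average engine power at a given displacement
pred <- function(disp) b0 + b1 * disp

# Difference for a 25 cubic inch gap, computed two ways
diff_power <- b1 * 25
diff_check <- pred(125) - pred(100)

diff_power
diff_check
```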

b)

R Code

c)

Null Hypothesis: H0: β1 ≤ 0.33

That is, average engine power does not increase by more than 0.33 units for each additional cubic inch in engine displacement

Alternative Hypothesis: Ha: β1 > 0.33

That is, average engine power increases by more than 0.33 units for each additional cubic inch in engine displacement.

The test statistic is t = (b1 - 0.33) / SE(b1), where b1 is the estimated slope.

Thus, the value of the t test statistic is 1.7411 and its corresponding p-value is 0.046

Since the p-value falls below 0.05, we reject the null hypothesis and conclude that average engine power increases by more than 0.33 units for each additional cubic inch in engine displacement.
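`summary()` only tests a slope against zero, so a test against 0.33 has to be assembled from the estimate and its standard error. The sketch below uses R's built-in `mtcars` data (`hp` for power, `disp` for displacement) as a stand-in; that it matches the assignment file is an assumption.

```r
# Stand-in data: built-in mtcars, ASSUMED (not confirmed) to resemble q3q4data.csv
fit <- lm(hp ~ disp, data = mtcars)

coefs <- coef(summary(fit))
est   <- coefs["disp", "Estimate"]
se    <- coefs["disp", "Std. Error"]

# H0: beta1 <= 0.33  versus  Ha: beta1 > 0.33
t_stat <- (est - 0.33) / se
p_val  <- pt(t_stat, df = df.residual(fit), lower.tail = FALSE)

c(t = t_stat, p = p_val)
```

The upper tail is used because the alternative hypothesis is one-sided (greater than).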

d)

From the residual plot above, there is no apparent evidence against the assumption of independent errors.
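Besides a residual plot, the Durbin-Watson statistic is a common numerical check for first-order serial correlation. The sketch below computes it by hand in base R on the built-in `mtcars` data, used here as an assumed stand-in for the assignment file.

```r
# Stand-in data: built-in mtcars, ASSUMED (not confirmed) to resemble q3q4data.csv
fit <- lm(hp ~ disp, data = mtcars)
e <- residuals(fit)

# Durbin-Watson statistic: values near 2 suggest no first-order autocorrelation
dw <- sum(diff(e)^2) / sum(e^2)
dw

# Residuals in observation order; runs or trends would hint at dependence
plot(e, type = "b", xlab = "Observation order", ylab = "Residual")
abline(h = 0, lty = 2)
```

A formal p-value for this statistic is available from add-on packages such as lmtest (`dwtest`).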

e)

Engine Power = 45.7345 + 0.4376 * Engine Displacement

Question 4

a)

R Code

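Engine Power = - 11.86 + 0.516 * Engine Displacement + 50.39 * Manual + 5.907 * Weight

For vehicles with a manual gearbox (Manual = 1), this becomes
Engine Power = 38.53 + 0.516 * Engine Displacement + 5.907 * Weight

The coefficient of the manual gearbox dummy is 50.39, indicating that a vehicle with a manual gearbox has, on average, 50.39 hp more engine power than a vehicle without one, with the other independent variables held constant.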

b)

Null Hypothesis: H0: β1 = β2 = β3 = 0

That is, none of the predictors is linearly related to engine power

Alternative Hypothesis: Ha: βi ≠ 0 for at least one i

That is, at least one regression coefficient differs from zero

The value of the F test statistic is 34.3 and its corresponding p-value at (3, 28) degrees of freedom is 0.000000145

Since the p-value falls well below 0.05, we reject the null hypothesis. There is sufficient statistical evidence to conclude that the regression is significant, that is, at least one of the predictors is useful in explaining engine power
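The p-value for the overall F test is the upper tail of the F distribution at the observed statistic. A quick check using the statistic and degrees of freedom quoted above:

```r
f_stat <- 34.3   # F statistic reported above
df1 <- 3         # numerator degrees of freedom
df2 <- 28        # denominator degrees of freedom

# Upper-tail probability under H0
p_val <- pf(f_stat, df1, df2, lower.tail = FALSE)

# Critical value at the 5% level, for the equivalent rejection-region decision
f_crit <- qf(0.95, df1, df2)

p_val
f_stat > f_crit
```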

c)

Thus, the 95% mean confidence interval for engine power of a vehicle with a manual gearbox, with engine displacement 295 in³ and weight 2875 lbs is (173, 242).
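An interval for the mean response is obtained from `predict()` with `interval = "confidence"`. The sketch below uses the built-in `mtcars` data as an assumed stand-in, where `wt` is recorded in thousands of pounds (so 2875 lbs is entered as 2.875) and `am` codes the gearbox. The names `cars`, `gearbox` and `fit4` are introduced here for illustration.

```r
# Stand-in data: built-in mtcars, ASSUMED (not confirmed) to resemble q3q4data.csv
cars <- transform(mtcars,
                  gearbox = factor(am, levels = 0:1,
                                   labels = c("automatic", "manual")))

# R builds the dummy variable for gearbox automatically
fit4 <- lm(hp ~ disp + gearbox + wt, data = cars)

# Manual gearbox, 295 cubic inches, 2875 lbs (wt is in 1000 lbs in mtcars)
new_car <- data.frame(disp    = 295,
                      gearbox = factor("manual", levels = levels(cars$gearbox)),
                      wt      = 2.875)

ci <- predict(fit4, newdata = new_car, interval = "confidence", level = 0.95)
ci
```

Using `interval = "prediction"` instead would give the wider interval for a single new vehicle rather than for the mean.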

d)

R Code

Null Hypothesis: H0: the residuals are normally distributed

Alternative Hypothesis: Ha: the residuals are not normally distributed

The value of the W test statistic is 0.9 and its corresponding p-value is 0.004

Decision

Reject the null hypothesis since the p-value falls below 0.05

Conclusion

There is sufficient evidence to conclude that the residuals of the fitted model violate the normality assumption.
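A W statistic of this kind is what the Shapiro-Wilk test reports; that the original solution used `shapiro.test()` is an assumption. A sketch on the built-in `mtcars` data, again only a stand-in for the assignment file:

```r
# Stand-in data: built-in mtcars, ASSUMED (not confirmed) to resemble q3q4data.csv
cars <- transform(mtcars,
                  gearbox = factor(am, levels = 0:1,
                                   labels = c("automatic", "manual")))
fit4 <- lm(hp ~ disp + gearbox + wt, data = cars)

# H0: residuals are normally distributed
sw <- shapiro.test(residuals(fit4))

sw$statistic   # W
sw$p.value

if (sw$p.value < 0.05) "reject H0" else "fail to reject H0"
```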

e)

R code

Here, the VIF for weight is greater than 5, indicating that weight is strongly correlated with the other predictors (displacement and manual gearbox).
Here, the VIF for displacement falls between 1 and 5, indicating that displacement is moderately correlated with the other predictors (weight and manual gearbox).

Thus, there is an indication of multicollinearity in the fitted model.
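The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the remaining ones. The helper `vif_manual` below is a hypothetical base-R sketch of that definition, applied to the built-in `mtcars` data as an assumed stand-in; the original solution may have used a package function instead.

```r
# Stand-in data: built-in mtcars, ASSUMED (not confirmed) to resemble q3q4data.csv
cars <- transform(mtcars,
                  gearbox = factor(am, levels = 0:1,
                                   labels = c("automatic", "manual")))
fit4 <- lm(hp ~ disp + gearbox + wt, data = cars)

# Hypothetical helper: VIF for each column of the design matrix
vif_manual <- function(model) {
  X <- model.matrix(model)[, -1, drop = FALSE]   # drop the intercept column
  sapply(colnames(X), function(v) {
    others <- X[, colnames(X) != v, drop = FALSE]
    r2 <- summary(lm(X[, v] ~ others))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(fit4)
vifs
```

Values above about 5 are commonly read as a sign of problematic collinearity.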

f)

Cook's distance identifies observations 15, 29 and 31 as the most influential points.



Engine Power = - 11.86 + 0.516 * Engine Displacement + 50.39 * Manual + 5.907 * Weight
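Finding the point with the largest Cook's distance and refitting without it can be sketched as follows. The built-in `mtcars` data serves as an assumed stand-in, so the flagged observation and the refitted coefficients are illustrative only.

```r
# Stand-in data: built-in mtcars, ASSUMED (not confirmed) to resemble q3q4data.csv
cars <- transform(mtcars,
                  gearbox = factor(am, levels = 0:1,
                                   labels = c("automatic", "manual")))
fit4 <- lm(hp ~ disp + gearbox + wt, data = cars)

# Cook's distance for every observation, and the most influential one
cd <- cooks.distance(fit4)
worst <- which.max(cd)
cd[worst]

# Refit on the sample with that single observation removed
cars_reduced <- cars[-worst, ]
refit <- lm(hp ~ disp + gearbox + wt, data = cars_reduced)
coef(refit)
```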
