STAT 501 Mid-Term Exam 2 Spring 2015

A mid-term exam assessing statistical analysis and hypothesis testing.

Contributor: Charlotte Garcia

Due April 12

Instructions: Use Word to type your answers within this document. Then, submit your answers in the appropriate dropbox in ANGEL by the due date and within 3 hours of downloading the exam. The point distribution is located next to each question.

1. (4x2 = 8 points) State which of the following statements is TRUE and which is FALSE. For the statements that are false, explain why they are false.
a. Removing an outlier in a regression analysis will result in narrower confidence intervals.
b. In a simple linear regression (SLR) model, if a log transformation is performed on X to remedy some non-linearity, the mean value of Y is bound to change.
c. In model selection, the highest adjusted R²-value and the smallest S-value criteria always yield the same "best" models.
d. Regression models with different responses, but the same predictor X matrix, will have the same leverage values.

2. (3+3+4+4+3+3 = 20 points) Open the “SalaryData” file. The dataset consists of current salaries (Salary, in thousands of dollars) for 63 individuals, along with information about their years of work experience (YrsExp) and highest degree attained (Degree). Your goal is to fit a regression model to express the dependence of Y (Salary) on X (YrsExp) and Degree.
a. Clearly define a set of indicator variables that could be used in a regression model to represent the qualitative variable Degree. [Hint: Think carefully about the number of indicator variables needed given the number of levels of Degree, and use “Bachelor” as the reference level.]
b. Write a population multiple linear regression equation for predicting the current salary in terms of YrsExp and Degree. Since education level could impact the dependence of Y on X, the model should contain an interaction effect between YrsExp and Degree, together with their main effects. [Hint: Your equation should include Y, X, the indicator variables you defined in part (a), interaction terms, and population regression coefficients (β’s).]
c. Conduct a hypothesis test for whether the average annual salary increase per year of experience differs by level of education (i.e., test if the slopes for two or more Degree categories differ). Write out the null and alternative hypotheses, the test statistic, the p-value, and the conclusion. [Minitab v17: Select Salary as the Response, YrsExp as the Continuous predictor, and Degree as the Categorical predictor; click “Model,” select both YrsExp and Degree together in the Predictors box, and click the Add button next to “Interactions through order 2.” Minitab v16: Create interaction terms using Calc > Calculator before fitting the regression model.]
d. Write a new population regression equation based on your conclusion to part (c). Fit this model and conduct two separate hypothesis tests for whether the mean salary for a fixed number of years’ experience differs by education level. For each test, write out the null and alternative hypotheses, the test statistic, the p-value, and the conclusion.

e. Based on your conclusion to part (d), write three fitted sample regression equations that can be used to predict the current salary for each education level. [Hint: Your equations should include number values, not β’s.]
f. Based on one of the equations from part (e), predict the current salary of a PhD degree holder with 10 years of work experience. [Hint: A point estimate is sufficient, so there is no need for an interval.]

3. (4x2 = 8 points) Consider the following four graphs, where the vertical axis represents Y and the horizontal axis represents X. Choose the most appropriate plot for each of the following models (where D1 and D2 represent a set of indicator variables):
a. $E(Y) = \beta_0 + \beta_1 X$
b. $E(Y) = \beta_0 + \beta_1 X + \beta_4 D_1 X + \beta_5 D_2 X$
c. $E(Y) = \beta_0 + \beta_1 X + \beta_2 D_1 + \beta_3 D_2$
d. $E(Y) = \beta_0 + \beta_1 X + \beta_2 D_1 + \beta_3 D_2 + \beta_4 D_1 X$
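The indicator coding asked for in Question 2 and the four models in Question 3 can be explored with a short sketch. It assumes Degree has three levels (consistent with the "three fitted equations" in part (e)), that D1 codes a level labeled "Master" here and D2 codes "PhD" (the "Master" label and this D1/D2 mapping are hypothetical), and that the full model is $E(Y) = \beta_0 + \beta_1 X + \beta_2 D_1 + \beta_3 D_2 + \beta_4 D_1 X + \beta_5 D_2 X$, matching the coefficient indexing of Question 3. The coefficients in the demo are placeholders, not values fitted to SalaryData.

```python
# Sketch: indicator (dummy) coding for a three-level factor with
# "Bachelor" as the reference level, plus the group-specific lines
# implied by E(Y) = b0 + b1*X + b2*D1 + b3*D2 + b4*D1*X + b5*D2*X.
# The "Master" label and the D1/D2 mapping are hypothetical.

LEVELS = ("Bachelor", "Master", "PhD")  # "Master" is an assumed label
REFERENCE = "Bachelor"


def indicators(degree: str) -> tuple:
    """Return (D1, D2) for one observation; the reference level maps to (0, 0)."""
    if degree not in LEVELS:
        raise ValueError(f"unknown degree level: {degree!r}")
    # k levels need only k - 1 indicators; the reference level gets none.
    non_ref = [lvl for lvl in LEVELS if lvl != REFERENCE]
    return tuple(int(degree == lvl) for lvl in non_ref)


def group_line(beta: list, degree: str) -> tuple:
    """Intercept and slope of E(Y) versus X for one Degree level.

    beta = [b0, b1, b2, b3, b4, b5]; set unused terms to 0 to obtain the
    reduced models of Question 3 (e.g. b4 = b5 = 0 gives parallel lines).
    """
    b0, b1, b2, b3, b4, b5 = beta
    d1, d2 = indicators(degree)
    intercept = b0 + b2 * d1 + b3 * d2
    slope = b1 + b4 * d1 + b5 * d2
    return intercept, slope


def predict(beta: list, degree: str, x: float) -> float:
    """Point prediction of E(Y) at X = x for the given Degree level."""
    intercept, slope = group_line(beta, degree)
    return intercept + slope * x


if __name__ == "__main__":
    # Placeholder coefficients for illustration only (not fitted to SalaryData).
    parallel = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0]   # common slope, as in model (c)
    separate = [1.0, 2.0, 3.0, 4.0, 0.5, -0.5]  # full interaction model
    for lvl in LEVELS:
        print(lvl, group_line(parallel, lvl), group_line(separate, lvl))
    print(predict(parallel, "PhD", 10))
```

Setting b4 = b5 = 0 forces a common slope for every level (parallel lines, as in model (c) of Question 3), while setting b2 = b3 = 0 forces a common intercept with level-specific slopes (as in model (b)); the reference level always reduces to b0 + b1X.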

4. (5+2+5+3+3 = 18 points) The file SavingsData contains the savings of 33 individuals along with their ages. It is apparent that Y = Savings (in $) has a positive association with X = Age (in years). An appropriate regression model relating Savings to Age could be useful for predicting savings based on age. The most straightforward approach would be to fit a simple linear regression (SLR) model for Y vs. X, provided that the LINE assumptions are satisfied. [Consult “Worked Examples Using Minitab” in the Online Notes for help with any Minitab procedures.]
a. Fit an SLR model for Y vs. X and perform a residual plot analysis to determine if the LINE assumptions are satisfied. Include a numerical test when checking for normality (use the Ryan-Joiner test in Minitab). Discuss your findings and include any relevant graphs.
b. Based on your conclusion in part (a), determine if any transformations are suggested for X and/or Y. [Hint: You should find that both X and Y need to be transformed.]
c. Fit an SLR model for the transformed variable(s) and comment on this model’s validity with supporting statements, numerical tests, and/or plots.
d. Use Minitab to compute a 95% confidence interval for the mean amount of savings (in $) expected for 40-year-olds based on the fitted model in part (c). [Hint: Remember to take into account the transformations to X and Y.]
e. Use Minitab to compute a 95% prediction interval for the amount of savings (in $) predicted for a randomly selected 40-year-old based on the fitted model in part (c). [Hint: Remember to take into account the transformations to X and Y.]

5. (2+1+3+2 = 8 points) The following Minitab output resulted from a multiple linear regression model fit to the response variable, Y, and the predictor terms X1, X2, and X1X2:

Coefficients
Term       Coef     SE Coef   T-Value   P-Value
Constant   4.49     1.89      2.37      0.022
X1         0.759    0.374     2.03      0.048
X2         0.965    0.426     2.26      0.028
X1*X2      0.1742   0.0821    2.12      0.039

a. Conduct a hypothesis test for whether the interaction term, X1X2, can be dropped from the model. Write out the population model, the null and alternative hypotheses, the test statistic, the p-value, and the conclusion.

In this case the population model is
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + e$
Here we want to test whether the interaction term is significant, so the null and alternative hypotheses are
$H_0: \beta_3 = 0$ and $H_a: \beta_3 \neq 0$
From the output above, the test statistic for this hypothesis test is t = 2.12, with a corresponding p-value of 0.039. Now assuming
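Three calculations behind Questions 4 and 5 can be sketched in plain Python. First, a Ryan-Joiner-type normality statistic, computed as the correlation between the ordered residuals and their normal scores; the normal scores here use Blom's approximation $\Phi^{-1}((i - 3/8)/(n + 1/4))$, which is an assumption, and the p-value lookup is not reproduced, so the p-value should still come from Minitab. Second, back-transforming an interval: assuming (as a hypothetical choice) that both variables are natural-log transformed, the interval Minitab reports at $X^* = \ln(40)$ is on the $\ln(\text{Savings})$ scale, and exponentiating its endpoints returns it to dollars (strictly, under normal errors on the log scale this targets the median rather than the mean of Savings). Third, the T-Value column in the Question 5 output is Coef / SE Coef.

```python
import math
from statistics import NormalDist


def ryan_joiner(residuals: list) -> float:
    """Correlation between ordered residuals and their normal scores.

    Normal scores use Blom's approximation (i - 3/8) / (n + 1/4), an
    assumption here; values close to 1 are consistent with normality.
    """
    n = len(residuals)
    if n < 3:
        raise ValueError("need at least 3 residuals")
    y = sorted(residuals)
    std_normal = NormalDist()
    b = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    y_bar = sum(y) / n
    b_bar = sum(b) / n  # essentially 0 by symmetry; kept for exactness
    sxy = sum((yi - y_bar) * (bi - b_bar) for yi, bi in zip(y, b))
    sxx = sum((yi - y_bar) ** 2 for yi in y)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    if sxx == 0:
        raise ValueError("residuals have zero variance")
    return sxy / math.sqrt(sxx * sbb)


def back_transform_log(lower: float, upper: float) -> tuple:
    """Map an interval on the natural-log scale back to the original scale."""
    return math.exp(lower), math.exp(upper)


def t_value(coef: float, se_coef: float) -> float:
    """T-Value as reported in the Minitab coefficients table: Coef / SE Coef."""
    return coef / se_coef


if __name__ == "__main__":
    # X1*X2 row of the Question 5 output.
    print(round(t_value(0.1742, 0.0821), 2))
    # Predictor value to enter in Minitab if X is natural-log transformed (assumed).
    print(math.log(40))
```

Because exponentiation is monotone, the back-transformed interval keeps its coverage; only the interpretation of the target (median versus mean of Savings) changes, which is why the transformation hints in parts (d) and (e) matter.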