Solution Manual for Using R for Introductory Statistics, 2nd Edition

Solutions Manual for Using R for Introductory Statistics, Second Edition, by John Verzani

Contents

1 Getting Started
2 Univariate data
3 Bivariate data
4 Multivariate data
5 Multivariate graphics
6 Populations
7 Statistical inference
8 Confidence intervals
9 Significance tests
10 Goodness of fit
11 Linear regression
12 Analysis of variance
13 Extensions of linear model
14 Thanks

1 Getting Started

1.1 The only thing to remember is the placement of parentheses, and the need to use * for multiplication:

    1 + 2*(3+4)
    ## [1] 15
    4^3 + 3^(2+1)
    ## [1] 91
    sqrt((4+3)*(2+1))
    ## [1] 4.582576
    ( (1+2)/(3+4) )^2
    ## [1] 0.1836735

1.2 These would be (2+3)-4, 2+(3*4), (2/3)/4, and 2^(3^4); the last working right to left.

1.3 Translating this to R requires attention to the use of parentheses and using an asterisk for multiplication:

    (1 + 2*3^4) / (5/6 - 7)

    ## [1] -26.43243

1.4 We use the 1/2 power as an alternative to the sqrt function:

    (0.25 - 0.2) / (0.2 * (1 - 0.2)/100)^(1/2)
    ## [1] 1.25

1.5 We don't use c below, as it is a very commonly used function in R:

    a <- 2; b <- 3; d <- 4; e <- 5
    a * b * d * e
    ## [1] 120

1.6 It is 1770.

1.7 It is 2510. Instead of scanning, this can be automated:

    require(UsingR)
    ## Loading required package: UsingR
    ## Loading required package: MASS
    ##
    ## Attaching package: 'UsingR'
    ##
    ## The following object is masked from 'package:ggplot2':
    ##
    ##     movies
    ##
    ## The following object is masked from 'package:survival':
    ##
    ##     cancer
    max(exec.pay)
    ## [1] 2510
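As a small illustration of automating the lookup in 1.7, the sketch below uses a made-up vector (pay is hypothetical and stands in for exec.pay, so UsingR is not needed). max returns the largest value and which.max reports where it sits:

```r
# Hypothetical stand-in for exec.pay; the values are invented for illustration
pay <- c(12, 45, 7, 310, 28)

largest  <- max(pay)        # largest value in the vector
position <- which.max(pay)  # index of the first occurrence of the maximum

largest
position
```

Knowing the position is handy when the vector has names or sits beside other variables, since the same index can be used to look up the matching entry.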

1.8 These values come from:

    require(UsingR)
    mean(exec.pay)
    ## [1] 59.88945
    min(exec.pay)
    ## [1] 0
    max(exec.pay)
    ## [1] 2510

1.9 This is done with:

    require(UsingR)
    mean(exec.pay)
    ## [1] 59.88945
    mean(exec.pay, trim=0.10)
    ## [1] 29.96894

The difference is large because the few very large salaries, which the trimmed mean discards, strongly influence the ordinary average computed by mean.

1.10 The variable names are printed when the data set is displayed. They are Tree, age, and circumference.

1.11 The only trick is to reference the variable appropriately:

    mean(Orange$age)

    ## [1] 922.1429

1.12 The largest value in a collection is returned by max:

    max(Orange$circumference)
    ## [1] 214
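The three Orange answers (1.10 to 1.12) can be checked in one pass. This sketch uses only the built-in Orange data set; names lists the variables, and with evaluates an expression inside the data frame so the Orange$ prefix can be dropped. The names avg_age and max_circ are illustrative only:

```r
# Orange ships with R (datasets package), so nothing needs to be loaded
names(Orange)                                 # variable names, as in 1.10

avg_age  <- with(Orange, mean(age))           # same as mean(Orange$age)
max_circ <- with(Orange, max(circumference))  # same as max(Orange$circumference)

avg_age
max_circ
```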

2 Univariate data

2.1 For example:

    p <- c(2, 3, 5, 7, 11, 13, 17, 19)

2.2 The diff function returns the distance between fill-ups, so mean(diff(gas)) is your average mileage per fill-up, and mean(gas) is the uninteresting average of the recorded mileage.

2.3 The data may be entered using c and then manipulated in a natural way.

    x <- c(2, 5, 4, 10, 8)
    x^2
    ## [1]   4  25  16 100  64
    x - 6
    ## [1] -4 -1 -2  4  2
    (x - 9)^2
    ## [1] 49 16 25  1  1

2.4 These can be done with

    rep("a", 10)
    ##  [1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a"
    seq(1, 99, by=2)
    ##  [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39
    ## [21] 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79
    ## [41] 81 83 85 87 89 91 93 95 97 99
    rep(1:3, rep(3,3))
    ## [1] 1 1 1 2 2 2 3 3 3
    rep(1:3, 3:1)
    ## [1] 1 1 1 2 2 3
    c(1:5, 4:1)
    ## [1] 1 2 3 4 5 4 3 2 1

2.5 These can be done with the following commands:

    fib <- c(1, 2, 3, 5, 8, 13, 21, 34)    # Fibonacci numbers
    ns <- 1:10
    recips <- 1/ns
    cubes <- (1:6)^3
    years <- 1964:2014
    subway <- c(14, 18, 23, 28, 34, 42, 50, 59, 66, 72, 79, 86, 96, 103, 110)
    by25 <- seq(0, 1000, by=25)

2.6 We have:

    sum(abs(rivers - mean(rivers))) / length(rivers)

    ## [1] 313.5508

To elaborate, rivers - mean(rivers) centers the values and is a data vector. Calling abs makes all the values non-negative, and sum reduces the result to a single number, which is then divided by the length.

2.7 The unary minus is evaluated before the colon:

    -1:3    # like (-1):3
    ## [1] -1  0  1  2  3

However, the colon is evaluated before multiplication in the latter:

    1:2*3    # not like 1:(2*3)
    ## [1] 3 6

2.8 If we know the cities starting with a "J" then this is just an exercise in indexing by the names attribute, as with:

    precip["Juneau"]
    ## Juneau
    ##   54.7

Getting the cities with names beginning with "J" can be done by sorting and inspecting, say with sort(names(precip)). This gives:

    j_cities <- c("Jackson", "Jacksonville", "Juneau")
    precip[j_cities]
    ##      Jackson Jacksonville       Juneau
    ##         49.2         54.5         54.7

Inspecting the names by scanning can be tedious for large data sets. The grepl function can be useful here, but requires the specification of a regular expression to indicate words that start with "J". As a teaser, here is how this could be done:

    precip[grepl("^J", names(precip))]
    ##       Juneau Jacksonville      Jackson
    ##         54.7         54.5         49.2

Regular expressions are described in the help page ?regexp.

2.9 There are many ways to do this; the following uses paste:

    paste("Trial", 1:10)
    ##  [1] "Trial 1"  "Trial 2"  "Trial 3"  "Trial 4"  "Trial 5"
    ##  [6] "Trial 6"  "Trial 7"  "Trial 8"  "Trial 9"  "Trial 10"

2.10 This answer will vary depending on the underlying system. One answer is:

    paste(dname, fname, sep=.Platform$file.sep)
    ## [1] "/Library/Frameworks/R.framework/Versions/3.2/Resources/library/UsingR/DESCRIPTION"

2.11 The number of levels and number of cases are returned by:

    require(MASS)
    man <- Cars93$Manufacturer
    length(man)    # number of cases
    ## [1] 93
    length(levels(man))    # number of levels
    ## [1] 32
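For counting levels as in 2.11, base R also provides nlevels, which gives the same result as length(levels(...)). A minimal sketch on a made-up factor (the maker values are hypothetical, not taken from Cars93):

```r
# Hypothetical factor standing in for Cars93$Manufacturer
maker <- factor(c("Audi", "Volvo", "Audi", "BMW", "Volvo", "Audi"))

n_cases  <- length(maker)    # number of cases
n_levels <- nlevels(maker)   # number of levels; same as length(levels(maker))
counts   <- table(maker)     # number of cases in each level

n_cases
n_levels
counts
```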

2.12 Looking at the levels, we see that one is rotary, which is clearly not numeric. As for the 5-cylinder cars, we can get them as follows:

    cyl <- Cars93$Cylinders
    levels(cyl)    # "rotary"
    ## [1] "3"      "4"      "5"      "6"      "8"      "rotary"
    which(cyl == "5")    # just 5 is also okay
    ## [1] 89 93
    Cars93$Manufacturer[ which(cyl == 5) ]    # which companies
    ## [1] Volkswagen Volvo
    ## 32 Levels: Acura Audi BMW Buick Cadillac Chevrolet ... Volvo

2.13 The factor function allows this to be done by specifying the labels argument:

    mtcars$am <- factor(mtcars$am, labels=c("automatic", "manual"))

This produces a modified, local copy of mtcars. The ordering of the labels should match the following: sort(unique(as.character(mtcars$am))).

2.14 The answer is no:

    require(HistData)
    any(Arbuthnot$Female > Arbuthnot$Male)
    ## [1] FALSE

Read the help page to see how this could be construed to show the "guiding hand of a divine being."

2.15 We have:

    A <- c(TRUE, FALSE, TRUE, TRUE)
    B <- c(TRUE, FALSE, TRUE, TRUE)
    !(A & B)
    ## [1] FALSE  TRUE FALSE FALSE
    !A | !B
    ## [1] FALSE  TRUE FALSE FALSE

It is not necessary to express the latter as (!A) | (!B), as the unary ! operator has higher precedence than the binary | operator.

2.16 We use logical extraction for this task:

    names(precip[precip > 50])
    ## [1] "Mobile"       "Juneau"       "Jacksonville" "Miami"
    ## [5] "New Orleans"  "San Juan"

2.17 After parsing the question, it can be seen that this expression answers it:

    m <- mean(precip)
    trimmed_m <- mean(precip, trim=0.25)
    any(precip > m + 1.5 * trimmed_m)
    ## [1] FALSE

A similar question is used for the algorithmic determination of "outliers" in a data set.

2.18 The comparison of strings is done lexicographically. That is, comparisons are done character by character until a tie is broken. The comparison of characters varies with the locale. It may be decided by ASCII codes (which yields alphabetical ordering) but need not be. See ?locales for more detail.

2.19 First we store the data, then we analyze it.

    commutes <- c(17, 16, 20, 24, 22, 15, 21, 15, 17, 22)
    commutes[commutes == 24] <- 18
    max(commutes)
    ## [1] 22
    min(commutes)
    ## [1] 15
    mean(commutes)
    ## [1] 18.3
    sum(commutes >= 20)
    ## [1] 4
    sum(commutes < 18)/length(commutes)
    ## [1] 0.5

2.20 We need to know that the months with 31 days are 1, 3, 5, 7, 8, 10, and 12.

    cds <- c(79, 74, 161, 127, 133, 210, 99, 143, 249, 249, 368, 302)
    longmos <- c(1, 3, 5, 7, 8, 10, 12)
    long <- cds[longmos]
    short <- cds[-longmos]
    mean(long)
    ## [1] 166.5714
    mean(short)
    ## [1] 205.6
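The split in 2.20 can also be sketched with a logical index instead of negative indexing. Here is_long and group_means are illustrative names, not ones from the text; is_long marks the 31-day months, and tapply then averages each group:

```r
cds <- c(79, 74, 161, 127, 133, 210, 99, 143, 249, 249, 368, 302)
longmos <- c(1, 3, 5, 7, 8, 10, 12)

# TRUE for the 31-day months, FALSE for the others
is_long <- seq_along(cds) %in% longmos

# Group means: the "FALSE" entry is the short months, "TRUE" the long months
group_means <- tapply(cds, is_long, mean)
group_means
```

A logical index keeps both groups in one pass and avoids having to remember that a negative index means "everything except".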

2.21 Enter the data as follows:

    x <- c(0.57, 0.89, 1.08, 1.12, 1.18, 1.07, 1.17, 1.38, 1.441, 1.72)
    names(x) <- 1990:1999

Using diff gives

    diff(x)
    ##   1991   1992   1993   1994   1995   1996   1997   1998   1999
    ##  0.320  0.190  0.040  0.060 -0.110  0.100  0.210  0.061  0.279

We can see that one year was negative:

    which(diff(x) < 0)
    ## 1995
    ##    5

The jump between 1994 and 1995 was negative (there was a work stoppage that year). The percentage difference is found by dividing by x[-10] and multiplying by 100. (Recall that x[-10] is all but the tenth number of x.) The first year's jump was the largest.

    diff(x)/x[-10] * 100
    ##      1991      1992      1993      1994      1995      1996
    ## 56.140351 21.348315  3.703704  5.357143 -9.322034  9.345794
    ##      1997      1998      1999
    ## 17.948718  4.420290 19.361554

2.22 We have:

    mean_distance <- function(x) {
      distances <- abs(x - mean(x))
      mean(distances)
    }
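A quick check of mean_distance, as a sketch: on the vector from 2.3 the result can be verified by hand, and on the built-in rivers data it should agree with the one-line expression used in 2.6. The name vals is illustrative only:

```r
mean_distance <- function(x) {
  distances <- abs(x - mean(x))   # distance of each value from the mean
  mean(distances)
}

vals <- c(2, 5, 4, 10, 8)         # the mean is 5.8
mean_distance(vals)               # (3.8 + 0.8 + 1.8 + 4.2 + 2.2) / 5

# Compare with the expression from 2.6 on the built-in rivers data
all.equal(mean_distance(rivers),
          sum(abs(rivers - mean(rivers))) / length(rivers))
```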

2.23 This can be done through:

    f <- function(x) {
      mean(x^2) - mean(x)^2
    }
    f(1:10)
    ## [1] 8.25

2.24 A simple answer is just given by:

    iseven <- function(x) x %% 2 == 0

Then isodd would be:

    isodd <- function(x) x %% 2 == 1

The following implementation ensures integers are used, and adds names:

    iseven <- function(x) {
      x <- as.integer(x)
      ans <- x %% 2 == 0
      setNames(ans, x)    # add names
    }
    iseven(1:10)
    ##     1     2     3     4     5     6     7     8     9    10
    ## FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE

Restricting a function to handle only integer inputs can be achieved by using generic functions, such as described in Appendix ??.

2.25 A simple implementation looks like this. One could improve it by only looking at integer factors less than or equal to the square root of x.

    isprime <- function(x) {
      if (x < 3) return(x == 2)    # 2:(x-1) would count down when x < 3
      !any(x %% 2:(x-1) == 0)
    }
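The square-root improvement mentioned in 2.25 could be sketched as follows. The name isprime_sqrt is hypothetical, and treating values below 2 as not prime is an assumption about the intended behavior. Like the simple version, it expects a single number rather than a vector:

```r
isprime_sqrt <- function(x) {
  if (x < 2) return(FALSE)           # assumption: 0, 1 and negatives are not prime
  if (x < 4) return(TRUE)            # 2 and 3 are prime
  divisors <- 2:floor(sqrt(x))       # candidate factors up to sqrt(x)
  !any(x %% divisors == 0)           # prime when no candidate divides x
}

# Apply to each value in turn and keep the ones flagged as prime
Filter(isprime_sqrt, 1:20)
```

Stopping at the square root works because any factor larger than sqrt(x) must pair with one smaller than it, so checking the smaller candidates is enough.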
