CHAPTER 9

Statistics for Three or More Variables



The final analytic chapter of this book addresses a few common methods for exploring and describing the relationships between multiple variables. These methods include the single most useful procedure in any analyst’s toolbox: multiple regression. We will also cover the two-factor analysis of variance, cluster analysis, and principal components (or factor) analysis.

Multiple regression

The goal of regression is simple: take a collection of predictor variables and use them to predict scores on a single, quantitative outcome variable. Multiple regression is the most flexible approach we will cover in this book. All of the other parametric procedures that we have covered—t-tests, ANOVA, correlation, and bivariate regression—can be seen as special cases of multiple regression.

In this section, we will start by looking at the simplest version of multiple regression: simultaneous entry, in which all of the predictors are entered as a group and all of them are retained in the equation. (It is the selection and entry of variables that causes most of the fuss in regression modeling.) We will begin by loading the USJudgeRatings data from R’s datasets package. See ?USJudgeRatings for more information.

Sample: sample_9_1.R

# LOAD DATA

require("datasets")  # Load the datasets package.

data(USJudgeRatings)  # Load data into the workspace.

USJudgeRatings[1:3, 1:8]  # Display 8 variables for 3 cases.

               CONT INTG DMNR DILG CFMG DECI PREP FAMI

AARONSON,L.H.   5.7  7.9  7.7  7.3  7.1  7.4  7.1  7.1

ALEXANDER,J.M.  6.8  8.9  8.8  8.5  7.8  8.1  8.0  8.0

ARMENTANO,A.J.  7.2  8.1  7.8  7.8  7.5  7.6  7.5  7.5

The default function in R for regression is lm(), which stands for “linear model” (see ?lm for more information). The basic structure is lm(outcome ~ predictor1 + predictor2). We can run this function on the outcome variable, RTEN (i.e., “worthy of retention”), and the eleven predictors using the code that follows and save the model to an object that we’ll call reg1, for regression 1. Then, by calling only the name reg1, we can get the regression coefficients, and by calling summary(reg1), we can get several statistics on the model.

# MULTIPLE REGRESSION: DEFAULTS

# Simultaneous entry

# Save regression model to the object.

reg1 <- lm(RTEN ~ CONT + INTG + DMNR + DILG + CFMG +

           DECI + PREP + FAMI + ORAL + WRIT + PHYS,

           data = USJudgeRatings)

Once we have saved the regression model, we can just call the object’s name, reg1, and get a list of regression coefficients:
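reg1  # Display the regression coefficients.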

Coefficients:

(Intercept)         CONT         INTG         DMNR         DILG 

   -2.11943      0.01280      0.36484      0.12540      0.06669 

       CFMG         DECI         PREP         FAMI         ORAL 

   -0.19453      0.27829     -0.00196     -0.13579      0.54782 

       WRIT         PHYS 

   -0.06806      0.26881

For more detailed information about the model, including the distribution of the residuals, standard errors and inferential tests for the coefficients, and overall fit statistics, we can just type summary(reg1):
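summary(reg1)  # Inferential statistics on the model.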

Residuals:

     Min       1Q   Median       3Q      Max

-0.22123 -0.06155 -0.01055  0.05045  0.26079

Coefficients:

            Estimate Std. Error t value Pr(>|t|)   

(Intercept) -2.11943    0.51904  -4.083 0.000290 ***

CONT         0.01280    0.02586   0.495 0.624272   

INTG         0.36484    0.12936   2.820 0.008291 **

DMNR         0.12540    0.08971   1.398 0.172102   

DILG         0.06669    0.14303   0.466 0.644293   

CFMG        -0.19453    0.14779  -1.316 0.197735   

DECI         0.27829    0.13826   2.013 0.052883 . 

PREP        -0.00196    0.24001  -0.008 0.993536   

FAMI        -0.13579    0.26725  -0.508 0.614972   

ORAL         0.54782    0.27725   1.976 0.057121 . 

WRIT        -0.06806    0.31485  -0.216 0.830269   

PHYS         0.26881    0.06213   4.326 0.000146 ***

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1174 on 31 degrees of freedom

Multiple R-squared:  0.9916,  Adjusted R-squared:  0.9886

F-statistic: 332.9 on 11 and 31 DF,  p-value: < 2.2e-16

All of the predictor variables are included in this model, which means that their coefficients and probability values are only valid when taken together. Two things are notable about this model. First, it has an extraordinarily high predictive value, with an R² of 99%. Second, the two most important predictors in this simultaneous-entry model are (a) INTG, or judicial integrity, which makes obvious sense, and (b) PHYS, or physical ability, whose t-value (4.33) is even larger than that for integrity (2.82). This second finding is less intuitive, but it is what the data support.

Additional information about the regression model is available from the following functions when the model’s name is entered in the parentheses (a brief example appears after the list):

  • anova(), which gives an ANOVA table for the regression model.
  • coef() or coefficients(), which gives the same coefficients that we got by calling the model’s name, reg1.
  • confint(), which gives confidence intervals for the coefficients.
  • resid() or residuals(), which gives case-by-case residual values.
  • hist(residuals()), which gives a histogram of the residuals.
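Here is a brief sketch that collects those calls, using the reg1 model saved above:

# ADDITIONAL OUTPUT ON THE REGRESSION MODEL
anova(reg1)            # ANOVA table for the regression model.
coef(reg1)             # Model coefficients (same as calling reg1).
confint(reg1)          # Confidence intervals for the coefficients.
resid(reg1)            # Case-by-case residual values.
hist(residuals(reg1))  # Histogram of the residuals.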

Multiple regression is potentially a very complicated procedure, with an enormous number of variations and much room for analytical judgment calls. The version we conducted above is the simplest: all of the variables were entered at once in their original state (i.e., without any transformations), no interactions were specified, and no adjustments were made once the model was calculated.

R’s base installation provides many other options and the available packages give hundreds, and possibly thousands, of other options for multiple regression.[21] I will just mention two of R’s built-in options, both of which are based on stepwise procedures. Stepwise regression models work by using a simple criterion to include or exclude variables from a model, and they can greatly simplify analysis. Such models, however, are very susceptible to capitalizing on the quirks of data, leading one author, in exasperation, to call them “positively satanic in their temptations toward Type I errors.”[22]

With those stern warnings in mind, we will nonetheless take a brief look at two versions of stepwise regression because they are very common—and commonly requested—procedures. The first variation we will examine is backwards removal, in which all possible variables are initially entered and then variables that do not make statistically significant contributions to the overall model are removed one at a time.

The first step is to create a full regression model, just like we did for simultaneous regression. Then, the R function step() is called with that regression model as its first argument and direction = "backward" as the second. An optional argument, trace = 0, prevents R from printing out all of the summary statistics at each step. Finally, we can use summary() to get summary statistics on the new model, which was saved as regb, as in “regression backwards.”

# MULTIPLE REGRESSION: STEPWISE: BACKWARDS REMOVAL

reg1 <- lm(RTEN ~ CONT + INTG + DMNR + DILG + CFMG +

           DECI + PREP + FAMI + ORAL + WRIT + PHYS,

           data = USJudgeRatings)

regb <- step(reg1,  # Stepwise regression, starts with the full model.

             direction = "backward",  # Backwards removal

             trace = 0)  # Don't print the steps.

summary(regb)  # Give the hypothesis testing info.

Residuals:

      Min        1Q    Median        3Q       Max

-0.240656 -0.069026 -0.009474  0.068961  0.246402

Coefficients:

            Estimate Std. Error t value Pr(>|t|)   

(Intercept) -2.20433    0.43611  -5.055 1.19e-05 ***

INTG         0.37785    0.10559   3.579 0.000986 ***

DMNR         0.15199    0.06354   2.392 0.021957 * 

DECI         0.16672    0.07702   2.165 0.036928 * 

ORAL         0.29169    0.10191   2.862 0.006887 **

PHYS         0.28292    0.04678   6.048 5.40e-07 ***

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1119 on 37 degrees of freedom

Multiple R-squared:  0.9909,  Adjusted R-squared:  0.9897

F-statistic: 806.1 on 5 and 37 DF,  p-value: < 2.2e-16

Using a stepwise regression model with backwards removal, the predictive ability (R²) was still 99%. Only five variables remained in the model and, as with the simultaneous-entry model, physical ability was still the single biggest contributor.

A more common approach to stepwise regression is forward selection, which starts with no variables and then adds them one at a time if they make statistically significant contributions to predictive ability. This approach is slightly more complicated in R because it requires the creation of a “minimal” model that contains nothing more than the intercept, which is the mean score on the outcome variable. This model is created by using the number 1 as the only predictor in the formula. Then the step() function is called again, with the minimal model as the starting point and direction = "forward" as one of the arguments. The possible variables to include are listed in scope. Finally, trace = 0 prevents the intermediate steps from being printed.

# MULTIPLE REGRESSION: STEPWISE: FORWARDS SELECTION

# Start with a model that has nothing but a constant.

reg0 <- lm(RTEN ~ 1, data = USJudgeRatings)  # Intercept only

regf <- step(reg0,  # Start with intercept only.

             direction = "forward",  # Forward addition

             # scope is a list of possible variables to include.

             scope = (~ CONT + INTG + DMNR + DILG + CFMG + DECI +

                        PREP + FAMI + ORAL + WRIT + PHYS),

             data = USJudgeRatings,

             trace = 0)  # Don't print the steps.

summary(regf)  # Statistics on model.

Residuals:

      Min        1Q    Median        3Q       Max

-0.240656 -0.069026 -0.009474  0.068961  0.246402

Coefficients:

            Estimate Std. Error t value Pr(>|t|)   

(Intercept) -2.20433    0.43611  -5.055 1.19e-05 ***

ORAL         0.29169    0.10191   2.862 0.006887 **

DMNR         0.15199    0.06354   2.392 0.021957 * 

PHYS         0.28292    0.04678   6.048 5.40e-07 ***

INTG         0.37785    0.10559   3.579 0.000986 ***

DECI         0.16672    0.07702   2.165 0.036928 * 

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1119 on 37 degrees of freedom

Multiple R-squared:  0.9909,  Adjusted R-squared:  0.9897

F-statistic: 806.1 on 5 and 37 DF,  p-value: < 2.2e-16

Given the possible fluctuations of stepwise regression, it is reassuring that both approaches finished with the same model, although the variables are listed in a different order.

Again, it is important to remember that multiple regression can be a very complicated and subtle procedure and that many analysts have criticized stepwise methods vigorously. Fortunately, R and its available packages offer many alternatives—and more are added on a regular basis—so I would encourage you to explore your options before committing to a single approach.

Once you have saved your work, you should clean the workspace by removing any variables or objects you created.

# CLEAN UP

detach("package:datasets", unload = TRUE)  # Unloads the datasets package.

rm(list = ls())  # Remove all objects from the workspace.

Two-factor ANOVA

The multiple regression procedure that we discussed in the previous section is enormously flexible, and the procedure that we will discuss in this section, the two-factor analysis of variance (ANOVA), can accurately be described as a special case of multiple regression. There are, however, advantages to using the specialized procedures of ANOVA. The most important advantage is that it was developed specifically to work in situations where two categorical variables—called factors in ANOVA—are used simultaneously to predict a single quantitative outcome. ANOVA gives easily interpreted results for the main effect of each factor and a third result for their interaction. We will examine these effects by using the warpbreaks data from R’s datasets package.

Sample: sample_9_2.R

# LOAD DATA

require("datasets")  # Load the datasets package.

data(warpbreaks)

There are two different ways to specify a two-factor ANOVA in R, but both use the aov() function. In the first method, the main effects and interaction are explicitly specified, as shown in the following code. The results of that analysis can be viewed with the summary() function that we have used elsewhere.

# ANOVA: METHOD 1

aov1 <- aov(breaks ~ wool + tension + wool:tension,

            data = warpbreaks)

summary(aov1)  # ANOVA table

             Df Sum Sq Mean Sq F value   Pr(>F)   

wool          1    451   450.7   3.765 0.058213 . 

tension       2   2034  1017.1   8.498 0.000693 ***

wool:tension  2   1003   501.4   4.189 0.021044 * 

Residuals    48   5745   119.7                    

---

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

These results show a strong main effect of the level of tension on the breakage of wool, along with a smaller but still significant interaction between tension and the kind of wool used. These results make sense, given the pattern of means we saw in the grouped bar chart back in Chapter 8. Figure 32 is reproduced below as Figure 37 for your convenience:

Figure 37: Grouped Bar Chart of Means

A second method for specifying the ANOVA spells out only the interaction and leaves the main effects as implicit, with the same results as the first method.

# ANOVA: METHOD 2

aov2 <- aov(breaks ~ wool*tension,

            data = warpbreaks)

R is also able to provide a substantial amount of additional information via the model.tables() function. For example, the command model.tables(aov1, type = "means") gives tables of all the marginal and cell means, while the command model.tables(aov1, type = "effects") reinterprets those means as coefficients.

Finally, if one or both of the factors has more than two levels, it may be necessary to do a post-hoc test. As with the one-factor ANOVA discussed in Chapter 7, a good choice is Tukey’s HSD (Honestly Significant Difference) test, with the R command TukeyHSD().
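Because those commands appear only in the prose above, here is a brief sketch that collects them, using the aov1 model from Method 1:

# ADDITIONAL TABLES AND POST-HOC TEST
model.tables(aov1, type = "means")    # Marginal and cell means.
model.tables(aov1, type = "effects")  # The same means expressed as effects.
TukeyHSD(aov1)                        # Tukey's HSD pairwise comparisons.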

We can finish this section by unloading the packages and clearing the workspace.

# CLEAN UP

detach("package:datasets", unload = TRUE)  # Unloads the datasets package.

rm(list = ls())  # Remove all objects from workspace.

Cluster analysis

Cluster analysis performs a fundamental task: determining which cases are similar. This task makes it possible to place cases—be they people, companies, regions of the country, etc.—into relatively homogeneous groups while distinguishing them from other groups. R has built-in functions that approach the formation of clusters in two ways. The first approach is k-means clustering with the kmeans() function. This approach requires that the researcher specify how many clusters they would like to form, although it is possible to try several variations. The second approach is hierarchical clustering with the hclust() function, in which each case starts by itself and then the cases are gradually joined together according to their similarity. We will discuss these two procedures in turn.

For these examples we will use a slightly reduced version of the mtcars data from R’s datasets package, where we remove two undefined variables from the data set.

Sample: sample_9_3.R

# LOAD DATA

require("datasets")  # Load the datasets package.

mtcars1 <- mtcars[, c(1:4, 6:7, 9:11)]  # New object, select variables.

mtcars1[1:3, ]  # Show the first three lines of the new object.

               mpg cyl disp  hp    wt  qsec am gear carb

Mazda RX4     21.0   6  160 110 2.620 16.46  1    4    4

Mazda RX4 Wag 21.0   6  160 110 2.875 17.02  1    4    4

Datsun 710    22.8   4  108  93 2.320 18.61  1    4    1

In order to use the kmeans() function, we must specify the number of clusters we want. For this example, we’ll try three clusters, although further inspection might suggest fewer or more clusters. This function produces a substantial amount of output that can be displayed by calling the name of the object with the results, which would be km in this case.

# CLUSTER ANALYSIS: K-MEANS

km <- kmeans(mtcars1, 3)  # Specify 3 clusters
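That output can be displayed, if desired, simply by entering the object’s name. (Note that k-means starts from randomly chosen centers, so the cluster numbering can vary from run to run.)

km  # Display cluster sizes, centers, and cluster assignments.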

Instead of the statistical output for the kmeans() function, it is more useful at this point to create a graph of the clusters. Unfortunately, the kmeans() function does not do this by default. We will instead use the clusplot() function from the cluster package.

# USE "CLUSTER" PACKAGE FOR K-MEANS GRAPH

require("cluster")

clusplot(mtcars1,        # Data frame

         km$cluster,     # Cluster data

         color = TRUE,   # Use color

         shade = FALSE,  # Colored lines in clusters (FALSE is default).

         lines = 0,      # Do not draw distance lines between clusters.

         labels = 2)     # Labels clusters and cases.

This command produces the chart shown in Figure 38.

Figure 38: Cluster Plot for K-Means Clustering

Figure 38 shows the three clusters bounded by colored ellipses and arranged on a plane defined by the two largest components. There is good separation between the clusters, but the wide spread within cluster 2 on the far left suggests that more than three clusters might be appropriate. Hierarchical clustering would be a good method for checking the number and size of the clusters.

In R, hierarchical clustering is done with the hclust() function. However, this function does not run on the raw data frame. Instead, it needs a distance or dissimilarity matrix, which can be created with the dist() function. Once the dist() and hclust() functions are run, it is then possible to display a dendrogram of the clusters using R’s generic plot() command on the model generated by hclust().

# HIERARCHICAL CLUSTERING

d <- dist(mtcars1)  # Calculate the distance matrix.

c <- hclust(d)  # Use distance matrix for clustering.

plot(c)  # Plot a dendrogram of clusters.

Figure 39 shows the default dendrogram produced by plot(). In this plot, each case is listed individually at the bottom. The lines above join each case to other similar cases, while cases that are more similar are joined lower down—such as the Mercedes-Benz 280 and 280C on the far right—and cases that are more different are joined higher up. For example, it is clear from this diagram that the Maserati Bora on the far left is substantially different from every other car in the data set.

Figure 39: Hierarchical Clustering Dendrogram with Defaults

Once the hierarchical model has been calculated, it is also possible to place the observations into groups using cutree(), which cuts the tree (i.e., the dendrogram) into a specified number of groups. You must, however, tell the function how or where to cut the tree. You can specify either the number of groups (k = 3) or the vertical height on the dendrogram at which to cut (h = 230); in this case, both produce the same result. For example, the following command will categorize the cases into three groups and then show the group IDs for the last three cases:

# PLACE OBSERVATIONS IN GROUPS

g3 <- cutree(c, k = 3)  # "g3" = "groups: 3"

g3[30:32]  # Show groups for the last three cases.

Ferrari Dino Maserati Bora    Volvo 142E

            1             3             1

As a note, it is also possible to do several groupings at once by specifying a range of groups (k = 2:5 will do groups of 2, 3, 4, and 5) or specific values (k = c(2, 4) will do groups of 2 and 4).
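For example, a minimal sketch of that option (the object name gm is just illustrative):

# SEVERAL GROUPINGS AT ONCE
gm <- cutree(c, k = 2:5)  # One column of group IDs for each value of k.
gm[1:3, ]                 # Show the group memberships for the first three cars.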

A final convenient feature of R’s hierarchical clustering function is the ability to draw boxes around groups in the dendrogram using rect.hclust(). The following code superimposes four sets of different colored boxes on the dendrogram:

# DRAW BORDERS AROUND CLUSTERS

rect.hclust(c, k = 2, border = "gray")

rect.hclust(c, k = 3, border = "blue")

rect.hclust(c, k = 4, border = "green4")

rect.hclust(c, k = 5, border = "red")

The result is shown in Figure 40.

Figure 40: Hierarchical Clustering Dendrogram with Boxes around Groups

From Figure 40, it is clear that large, American cars form groups that are distinct from smaller, imported cars. It is also clear, again, that the Maserati Bora is distinct from the group, as it is placed in its own category once we request at least four groups.

Once you have saved your work, you should clean the workspace by removing any variables or objects you created.

# CLEAN UP

detach("package:datasets", unload = TRUE)  # Unloads datasets package.

detach("package:cluster", unload = TRUE)  # Unloads datasets package.

rm(list = ls())  # Remove all objects from the workspace.

Principal components and factor analysis

The final pair of statistical procedures that we will discuss in this book is principal components analysis (PCA) and factor analysis (FA). These procedures are very closely related and are commonly used to explore relationships between variables with the intent of combining variables into groups. In that sense, these procedures are the complement of cluster analysis, which we covered in the last section. However, where cluster analysis groups cases, PCA and FA group variables. PCA and FA are terms that are often used interchangeably, even if that is not technically correct. One explanation of the differences between the two is given in the documentation for the psych package: “The primary empirical difference between a components model versus a factor model is the treatment of the variances for each item. Philosophically, components are weighted composites of observed variables while in the factor model, variables are weighted composites of the factors.”[23] In my experience, that can be a distinction without a difference. I personally have a very pragmatic approach to PCA and FA: the ability to interpret and apply the results is the most important outcome. Therefore, it sometimes helps to see the results of these analyses more as recommendations on how the variables could be grouped rather than as statistical dogma that must be followed.

With that caveat in mind, we can look at a simple example of how to run PCA and then FA in R. For this example, we will use the same mtcars data from R’s datasets package that we used in the last section to illustrate cluster analysis. We will exclude two variables from the data set because R does not provide explanations of their meaning. That leaves us with nine variables to work with.

Sample: sample_9_4.R

# LOAD DATA

require("datasets")  # Load the datasets package.

mtcars1 <- mtcars[, c(1:4, 6:7, 9:11)]  # Select the variables.

mtcars1[1:3, ]  # Show the first three cases.

               mpg cyl disp  hp    wt  qsec am gear carb

Mazda RX4     21.0   6  160 110 2.620 16.46  1    4    4

Mazda RX4 Wag 21.0   6  160 110 2.875 17.02  1    4    4

Datsun 710    22.8   4  108  93 2.320 18.61  1    4    1

The default function for principal components analysis in R is prcomp(). It is easiest to use when the entire data frame can be included in the analysis. Two additional arguments can standardize the variables and make the results more interpretable: center = TRUE, which centers the variables’ means at zero, and scale = TRUE, which sets their variances to one (i.e., unit variance). These two arguments essentially convert all of the observations to z-scores and give the data a form of homogeneity of variance, which helps stabilize the results of principal components analysis. See ?prcomp for more information on this function and the center and scale arguments.

# PRINCIPAL COMPONENTS

pc <- prcomp(mtcars1,

             center = TRUE,  # Centers means to 0 (optional).

             scale = TRUE)  # Sets unit variance (helpful).

By saving the analysis in an object—pc in this case—we can call several additional functions on it. The first is summary(), which gives the proportion of total variance accounted for by each component. The first line of the output, “Standard deviation,” contains the square roots of the eigenvalues of the covariance/correlation matrix.

# OUTPUT

summary(pc)  # Summary statistics

Importance of components:

                          PC1    PC2     PC3     PC4     PC5     PC6     PC7

Standard deviation     2.3391 1.5299 0.71836 0.46491 0.38903 0.35099 0.31714

Proportion of Variance 0.6079 0.2601 0.05734 0.02402 0.01682 0.01369 0.01118

Cumulative Proportion  0.6079 0.8680 0.92537 0.94939 0.96620 0.97989 0.99107

                           PC8    PC9

Standard deviation     0.24070 0.1499

Proportion of Variance 0.00644 0.0025

Cumulative Proportion  0.99750 1.0000

Some plots are also available for PCA. The generic plot() function, when applied to the output of prcomp(), gives a bar chart of the variances (i.e., the eigenvalues) of the components; although the bars are not individually labeled, the chart serves as an intuitive, scree-style check of how many components should be retained.

The function biplot() gives a two-dimensional plot with:

  1. The two largest components on the X and Y axes, respectively.
  2. Vectors to indicate the relationship of each variable in the data frame to those components.
  3. The labels for the individual cases to show where they fall on the two components.

With our data, biplot(pc) will give Figure 41.
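Both commands operate directly on the saved pc object:

# PCA PLOTS
plot(pc)    # Bar chart of the component variances (eigenvalues).
biplot(pc)  # Biplot of the first two components.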

Figure 41: Biplot of the Principal Components Analysis

The simplest use of factor analysis (FA) within R is to determine how many factors are needed to adequately represent the variability within the data. For example, with our data, we can run several iterations of the function factanal(), specifying a different number of possible factors each time and checking the probability value of the resulting chi-squared test. In this case, we are looking for a model that is not statistically significant (i.e., p > .05 as opposed to p < .05) because we want a model that corresponds well with the data and does not deviate substantially from it. In each of the following four analyses, a different number of factors is specified, and the resulting p-value is noted in the comment. The complete printout for the final command is also included.

# FACTOR ANALYSIS

factanal(mtcars1, 1)  # 1 factor, p < .05 (poor fit)

factanal(mtcars1, 2)  # 2 factors, p < .05 (poor fit)

factanal(mtcars1, 3)  # 3 factors, p < .05 (poor fit)

factanal(mtcars1, 4)  # 4 factors, first with p > .05 (good fit)

Call:

factanal(x = mtcars1, factors = 4)

Uniquenesses:

  mpg   cyl  disp    hp    wt  qsec    am  gear  carb

0.137 0.045 0.005 0.108 0.038 0.101 0.189 0.126 0.031

Loadings:

     Factor1 Factor2 Factor3 Factor4

mpg   0.636  -0.445  -0.453  -0.234

cyl  -0.601   0.701   0.277   0.163

disp -0.637   0.555   0.176   0.500

hp   -0.249   0.721   0.472   0.296

wt   -0.730   0.219   0.417   0.456

qsec -0.182  -0.897  -0.246        

am    0.891                  -0.100

gear  0.907           0.226        

carb          0.478   0.851        

               Factor1 Factor2 Factor3 Factor4

SS loadings      3.424   2.603   1.549   0.644

Proportion Var   0.380   0.289   0.172   0.072

Cumulative Var   0.380   0.670   0.842   0.913

Test of the hypothesis that 4 factors are sufficient.

The chi square statistic is 6.06 on 6 degrees of freedom.

The p-value is 0.416

The pattern of factor loadings suggests that the first factor has to do with the physical size of a car (with smaller cars getting higher factor scores), the second factor has to do with power and speed (with higher scores for more powerful and quicker cars), the third factor has to do with carburetor barrels (which reduces to a “Maserati Bora” factor, as it was the only car with eight carburetor barrels), and the fourth factor accounts for some additional variance among heavier cars with larger engines. These results can be compared with the biplot from the PCA and with the cluster analysis in the previous section to provide a more complete understanding of the relationships between the cases and variables in this data set.

Once you have saved your work, you should clean the workspace by removing any variables or objects you created.

# CLEAN UP

detach("package:datasets", unload = TRUE)  # Unloads the datasets package.

rm(list = ls())  # Remove all objects from the workspace.
