10. Model Formulae

Formulae are Splus expressions that state the structural form of a model in terms of the variables involved. For example, the formula

cholesterol ~ systol + age

tells us that the response variable, cholesterol, is to be modeled by an additive model in two predictors, systol and age. This model is the same as

cholesterol ~ 1 + systol + age

where the 1 indicates that an intercept is to be included in the model. The intercept is included by default, so the 1 is optional. To exclude the intercept, the model is written as

cholesterol ~ -1 + systol + age
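The effect of 1 and -1 can be checked without fitting a model. The following is a minimal sketch, assuming that terms() records the intercept in an attribute named "intercept", as it does in R; the object names with.int, explicit.int and no.int are illustrative only.

```r
# Build the terms object for each way of writing the formula.
with.int     <- terms(cholesterol ~ systol + age)
explicit.int <- terms(cholesterol ~ 1 + systol + age)
no.int       <- terms(cholesterol ~ -1 + systol + age)

# The "intercept" attribute is 1 when an intercept is present and 0 otherwise.
attr(with.int, "intercept")      # 1 (included by default)
attr(explicit.int, "intercept")  # 1 (requested explicitly)
attr(no.int, "intercept")        # 0 (removed with -1)
```

No data are needed here, because terms() only analyses the structure of the formula.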

The terms in a formula can be any Splus expression which, when evaluated, can be interpreted as a variable. For example:

log(cholesterol) ~ systol + age

Expressions appearing in a model formula are interpreted as ordinary Splus expressions except for the following operators:

+
    separates the terms to be included in the model
:
    denotes an interaction
*
    expansion operator for interaction
    e.g. systol*age is equivalent to systol + age + systol:age
-
    deletes terms from a model
    e.g. systol*diastol*age - systol:diastol:age deletes the third-order interaction term
%in%
    denotes nesting
    e.g. smoke %in% sex, where smoke is the number of cigarettes smoked and is nested within sex
    The model morbidity ~ sex + smoke %in% sex would be of the form:
    morbidity = intercept + (beta1)*sex + (alpha1)*sex1*smoke + (alpha2)*sex2*smoke
    where beta1 corresponds to the contrast for sex, sex1 and sex2 are indicators for the two levels of sex, and alpha1 and alpha2 are the separate slopes for smoke within each sex.
/
    expansion operator for nesting
    e.g. sex/smoke (sex, and then smoke within sex) is equivalent to sex + smoke %in% sex
^
    crosses all the terms up to the specified order
    e.g. (sex + smoke + diabetes)^2 is equivalent to
    sex + smoke + diabetes + sex:smoke + sex:diabetes + smoke:diabetes
poly()
    a function that generates a basis for polynomial regression
    e.g. poly(x,2), poly(x,y,3), where the last argument is the degree of the polynomial
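The equivalences listed above can be checked by expanding each formula and comparing the resulting terms. This is a minimal sketch, assuming that terms() stores the expanded terms in an attribute named "term.labels", as it does in R; labels.of is a hypothetical helper introduced only for this illustration, and y is a placeholder response.

```r
# Hypothetical helper: return the expanded term labels of a formula.
labels.of <- function(f) attr(terms(f), "term.labels")

# * expands to the main effects plus their interaction.
identical(labels.of(y ~ systol*age),
          labels.of(y ~ systol + age + systol:age))

# ^2 crosses the terms up to second order.
identical(labels.of(y ~ (sex + smoke + diabetes)^2),
          labels.of(y ~ sex + smoke + diabetes +
                        sex:smoke + sex:diabetes + smoke:diabetes))

# - removes a term from the expansion; six terms remain here.
labels.of(y ~ systol*diastol*age - systol:diastol:age)
```

The nesting operators are left out of this check because the way nested terms are labelled may differ between implementations.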
Model formulae can be saved in the same way as any other Splus object. A saved formula can then be reused or modified with the update() function. The first argument to update() is either a saved model formula or a model fitted from a model formula (a fitted model carries a component named call). The second argument is a modelling formula, such as y ~ a + b. A single . on either side of the ~ is replaced by the corresponding side of the formula in the first argument.

> chol1 <- chol ~ systol*bmi*age

> update(chol1, . ~ . - systol:bmi:age)
chol ~ systol + bmi + age + systol:bmi + systol:age + bmi:age
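The same mechanism applies to either side of the formula and to fitted models. The sketch below assumes that update() behaves as it does in R. The data frame d holds simulated placeholder values that are not taken from any real study, and lm() is used only as a convenient fitting function; the names chol2, chol3, d, fit1 and fit2 are illustrative.

```r
chol1 <- chol ~ systol*bmi*age

# Drop the three-way interaction; the response is carried over by the left-hand dot.
chol2 <- update(chol1, . ~ . - systol:bmi:age)

# Transform the response; the right-hand side is carried over by the right-hand dot.
chol3 <- update(chol1, log(.) ~ .)

# Simulated placeholder data (an assumption, not from the source),
# needed only so that a model can be fitted.
set.seed(1)
d <- data.frame(chol = rnorm(20), systol = rnorm(20),
                bmi = rnorm(20), age = rnorm(20))

# update() also accepts a fitted model and refits it with the new formula.
fit1 <- lm(chol ~ systol*bmi*age, data = d)
fit2 <- update(fit1, . ~ . - systol:bmi:age)
formula(fit2)
```

Updating the fitted model rather than the formula avoids retyping the other arguments of the original call, such as the data frame.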

Further Reading

John M. Chambers and Trevor J. Hastie, Statistical Models in S, Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, California, 1992, pp. 18-44.
