1. Entering Data

Data can be typed directly into Splus or read in from a file. Data objects can also be the result of an expression (a combination of data objects and constants with operators and functions). Splus objects are stored in the .Data directory and are saved from one session to the next. In order to save a data object, a name must be assigned to it. This is done using either the underscore character "_" or the less-than character followed by a hyphen "<-", with the name of the object on the left and the values on the right. Alternatively, the symbol "->" can be used with the values on the left and the name of the object on the right. The name must start with a letter and may contain letters, digits, and periods. Splus is case sensitive: x and X refer to two different objects. The following are examples of data assignments:
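As a minimal sketch of the assignment rules above, the lines below use "<-" and "->" (the underscore form behaves the same way in Splus but is not used here). The object names weight, weight.2, x and X are hypothetical, chosen only for illustration.

```r
# Hypothetical names, used only for illustration
weight <- 70          # "<-": name on the left, value on the right
72 -> weight.2        # "->": value on the left, name on the right
# Names are case sensitive: x and X are two different objects
x <- 1
X <- 2
x                     # the value 1
X                     # the value 2
```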

Scalar

> height_175                        
                            * read as "height gets 175"
                            * assigns the value 175 to the scalar
                              name height

> person_"Jo"                       
                            * character values are inserted in quotes
                            * if the quotes are omitted, Splus will
                              look for a data object Jo to assign
                              to person

Vector

> heights_c(160,140,155)              
                            * the function c() "collects" the values
                              160, 140, and 155 and stores them into
                              the vector heights

> people_c("Ned","Jill","Pat")        
                            * creates a vector of names

> names(heights)_people             
                            * the names() function assigns names to
                              the elements of a vector
                            * the word people is not in quotes
                              because it refers to the vector people,
                              not to the word itself

> heights
 Ned Jill Pat               * typing the name of an object by itself 
 160  140 155                 causes its value to be printed on the 
                              terminal

> heights["Ned"]
 Ned                        * when an object has a names attribute,
 160                          its elements can be referred to by name

> names(heights)_NULL              
                            * deletes the names attribute of the
                              vector heights
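A short sketch of the names attribute, repeating the steps above with "<-" in place of "_". Selecting several elements by name at once, as in the fourth line, is an illustration added here; the order of the names in the subscript sets the order of the result.

```r
heights <- c(160, 140, 155)
people <- c("Ned", "Jill", "Pat")
names(heights) <- people      # attach names to the elements
heights["Jill"]               # select one element by name
heights[c("Pat", "Ned")]      # several names at once, in any order
names(heights) <- NULL        # remove the names attribute again
names(heights)                # NULL: no names are left
```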

Extracting data from an object using subscripts

> object[subscript]                  
                            * syntax for subscripting, where object
                              is the name of the data object and
                              subscript defines which elements to
                              extract
                            * the expression heights["Ned"] above
                              is an example of extracting data using a
                              subscript
- Notice that square brackets are used instead of parentheses. Parentheses (round brackets) are used by functions (e.g. the c() and names() functions): a function uses the arguments provided in parentheses to perform a task. Subscripts require square brackets: the information provided in square brackets tells Splus which subset of a data object is being referred to.

> heights[2]
[1] 140                     * extracts the second element from heights
                            * [1] refers to the position of the first
                              element on the given line - this is very
                              useful when vectors are several lines
                              long

>  heights[c(2,1,2)]
[1] 140 160 140             * extracts the second, first, and second
                              elements from heights
                            * the c() function is used in the
                              subscript when more than one element
                              is listed

> heights[heights < 160]
[1] 140 155                 * returns all the values of heights
                              which are less than 160
                            * the condition is applied to the values
                              of heights, not to their positions, so
                              this is NOT equivalent to
                              > heights[1:159]
                              which would ask for the first 159
                              elements (i.e. the subscript numbers
                              less than 160)
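To make the difference between selecting by value and selecting by position concrete, here is a minimal sketch. The name keep is hypothetical; which() returns the positions of the TRUE elements of a logical vector.

```r
heights <- c(160, 140, 155)
keep <- heights < 160     # a logical vector: FALSE TRUE TRUE
heights[keep]             # 140 155, the same as heights[heights < 160]
which(keep)               # 2 3, the positions of the TRUE values
heights[c(2, 3)]          # selecting by those positions gives 140 155 again
```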
                              

> heights[-2] 
[1] 160 155                 * returns all except the second value in
                              heights

> heights[1]_162
> heights                   * assigns the value 162 to the first
[1] 162 140 155               element of heights
                            * the old object heights has now been
                              replaced by the new object heights

> heights[4]_135
> heights                   * appends the value 135 to the vector
[1] 162 140 155 135           heights
                                 
> heights_append(heights,height)
> heights                   * the function append() creates a new
[1] 162 140 155 135 175       vector with the first values the same
                              as heights and the last value as
                              height (recall that height
                              was previously assigned the value 175)
                            * the function append() binds two objects
                              into a vector
                            * the arguments may be vectors, scalars,
                              or both
                            * this is equivalent to
                              > heights_c(heights,height)
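A quick check of the equivalence stated above, written as a sketch with "<-" for assignment. The names a and b are hypothetical; the last line shows the after= argument used in the next example.

```r
heights <- c(162, 140, 155, 135)
height <- 175
a <- append(heights, height)        # add height at the end
b <- c(heights, height)             # the same result with c()
all(a == b)                         # TRUE
append(heights, 180, after = 2)     # insert 180 after the 2nd element
```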

> heights.1_append(heights,180,after=2)
> heights.1
[1] 162 140 180 155 135 175     * the argument after specifies
                                  the index of heights after which the
                                  new values are to be inserted
                                    
> heights_replace(heights,2,142)
> heights                   * replaces the second value in heights with
[1] 162 142 155 135 175       the value 142 and stores the new vector
                              in heights
                            * the first argument specifies the name of
                              the data object, the second specifies
                              the indices of the elements to be
                              replaced, and the third argument
                              specifies the values the elements are to
                              be replaced with
                            * this expression is equivalent to
                              > heights[2]_142
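One point worth seeing directly: replace() returns a new vector and does not change its first argument, whereas subscript assignment changes the object itself. A sketch; the name heights.new is hypothetical.

```r
heights <- c(162, 140, 155, 135, 175)
heights.new <- replace(heights, 2, 142)   # returns a modified copy
heights[2]                   # still 140: replace() left heights unchanged
heights.new[2]               # 142
heights[2] <- 142            # subscript assignment modifies heights itself
all(heights == heights.new)  # TRUE
```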
 
> heights.2_replace(heights,c(2,4),c(140,142))
> heights.2                 * replaces the second and fourth values of
[1] 162 140 155 142 175       heights by the values 140 and 142 
                              and stores the result into heights.2

> numbers_1:5 
> numbers                   * the operator ":" creates a sequence from
[1] 1 2 3 4 5                 1 to 5 
                            * the syntax for the sequence operator is
                              from:to
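A few more sequences, as a sketch. seq() is the general sequence function, and the by= argument used here sets the step; the names countdown and evens are hypothetical.

```r
numbers <- 1:5                  # 1 2 3 4 5
countdown <- 5:1                # from greater than to: 5 4 3 2 1
evens <- seq(2, 10, by = 2)     # a step other than 1: 2 4 6 8 10
length(evens)                   # 5
```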

> heights_heights.2[2:5]
> heights                   * assigns the last four elements of the
[1] 140 155 142 175           vector heights.2 to the vector heights

> length(heights)
[1] 4                       * returns the length (number of elements)
                              of the object heights
The data objects you have just created are stored in your .Data directory. To see a list of the data objects (and later on, functions) you have created, type

> ls()

To remove an object or function from your .Data directory, use the rm() function. For example, to remove the scalar height, type

> rm(height)

You can also remove more than one data object at a time. To remove the scalar person and the vector numbers, type

> rm(person,numbers)

Assigning a name already used by an Splus function may cause warning messages to appear on the screen:

> c_c(1,2,3)

> d_c(1,2,3)

Warning messages:
  Looking for object "c" of mode "function", ignored one of mode "numeric"
Here, the name c (the "concatenate" function) was assigned to a data object. The problem is solved by reassigning the object to another name and removing the numeric object from the directory:

> b_c

> rm(c)

This will cause the warning message to be printed one last time.

Matrices

> size.1_matrix(c(130,26,110,24,118,25,112,25),ncol=2)

> size.1            * the function matrix() reads data into a matrix
     [,1] [,2]      * the number of columns is specified using the
[1,]  130  118        argument ncol= #
[2,]   26   25      * alternatively, the number of rows can be
[3,]  110  112        specified using the argument nrow= # or both
[4,]   24   25        nrow and ncol can be specified
                    * when neither nrow nor ncol is specified, the
                      data are read in as a one-column matrix

> size.2_matrix(c(130,26,110,24,118,25,112,25),ncol=2,byrow=T)

> size.2            * specifying byrow=T forces Splus to read the
     [,1] [,2]        data in row by row
[1,]  130   26      * when the argument is not specified, or specified
[2,]  110   24        as byrow=F, Splus fills the matrix
[3,]  118   25        column by column
[4,]  112   25
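The relationship between byrow=T and the default column-by-column filling can be checked with the transpose function t(). This is a sketch; the names v, by.row and by.col.t are hypothetical.

```r
v <- c(130, 26, 110, 24, 118, 25, 112, 25)
by.row <- matrix(v, ncol = 2, byrow = TRUE)
# Filling a 2-row matrix column by column, then transposing it,
# gives the same 4 x 2 layout as byrow = TRUE
by.col.t <- t(matrix(v, nrow = 2))
all(by.row == by.col.t)    # TRUE
dim(by.row)                # 4 2
```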

Lists

Names can be assigned to the rows and to the columns of matrices using the dimnames() and list() functions. The list() function may be used to combine data objects of different modes (e.g. numeric, character, ...) or different types (vector, matrix) into one object of mode list. Here, the list() function is used to combine two vectors of different lengths. The list is therefore made up of two components: the first component corresponds to the row names, and the second component corresponds to the column names.

> size.names_list(c("Abe","Bob","Carol","Deb"),c("Weight","Waist"))
> size.names
[[1]]:
[1] "Abe"   "Bob"   "Carol" "Deb"

[[2]]:
[1] "Weight" "Waist"
Notice the double square brackets: whereas single square brackets are used to extract data from a vector, double square brackets are used to extract components from a list:

> size.names[[2]]
[1] "Weight" "Waist"
The individual components in the list retain their properties as vectors and as such, individual elements can be extracted from each component in the same way as in any other vector:

> size.names[[2]][2]
[1] "Waist"
Names can also be assigned to the components of a list:

> names(size.names)_c("Rows","Columns")
> size.names
$Rows:
[1] "Abe"   "Bob"   "Carol" "Deb"

$Columns:
[1] "Weight" "Waist"
The components of the list can then be extracted using their names attribute:

> size.names$Rows
[1] "Abe"   "Bob"   "Carol" "Deb"
> size.names$Rows[2]
[1] "Bob"
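The difference between single and double square brackets on a list is easy to test with is.list(). In this sketch the component names are given directly inside list(), an alternative to assigning them afterwards with names(); part and comp are hypothetical names.

```r
size.names <- list(Rows = c("Abe", "Bob", "Carol", "Deb"),
                   Columns = c("Weight", "Waist"))
part <- size.names[2]       # single brackets: a list with one component
comp <- size.names[[2]]     # double brackets: the character vector itself
is.list(part)               # TRUE
is.list(comp)               # FALSE
comp[2]                     # "Waist"
size.names$Columns[2]       # the same element, selected by component name
```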


> dimnames(size.2)_size.names

> size.2                          
      Weight Waist  * the dimnames() function assigns names to the
  Abe    130    26    dimensions of a data object (in this case,
  Bob    110    24    the rows and columns of size.2)
Carol    118    25  
  Deb    112    25  
                    

> size.2_matrix(c(130,26,110,24,118,25,112,25),ncol=2,byrow=T, 
+ dimnames=list(c("Abe","Bob","Carol","Deb"),c("Weight","Waist")))

                    * it is possible to assign dimnames directly from
                      within the matrix function
                    * expressions can be spread over several lines:
                      simply press return at the end of a line and
                      Splus prompts for a continuation line by means
                      of the "+" character (this also happens if
                      you forget to close a bracket or a string)

> dimnames(size.2)_list(NULL,c("Weight","Waist"))

                    * the NULL object is used when no dimnames are
                      to be assigned to a dimension

> abc_size.2
> dimnames(abc)_list(c("Abe","Bob","Carol","Deb"),dimnames(size.2)[[2]])

                    * this command assigns dimnames to the rows of abc
                      and assigns the column dimnames of size.2 to the
                      columns of abc
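Because dimnames() returns a list, its components can be extracted with double square brackets, and the row and column names can then be used as subscripts. A minimal sketch, rebuilding size.2 with named rows.

```r
size.2 <- matrix(c(130, 26, 110, 24, 118, 25, 112, 25), ncol = 2, byrow = TRUE,
                 dimnames = list(c("Abe", "Bob", "Carol", "Deb"),
                                 c("Weight", "Waist")))
dimnames(size.2)[[1]]      # the row names
dimnames(size.2)[[2]]      # the column names
size.2["Carol", "Waist"]   # one element, selected by row and column name
```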

> size_cbind(size.2,heights)
> size                     * cbind() (column bind) "binds" together
    Weight Waist heights     vectors and matrices columnwise into a
[1,]   130    26     140     new matrix
[2,]   110    24     155   * cbind() "binds" the vector heights
[3,]   118    25     142     columnwise to the matrix size.2 and
[4,]   112    25     175     stores the resulting matrix in size
                           * the name heights is automatically
                             assigned to the third column of the
                             matrix size

> size_rbind(size,c(128,26,170))
> size                     * rbind() (row bind) "binds" together
     Weight Waist heights    vectors and/or matrices rowwise into
[1,]    130    26     140    a new matrix
[2,]    110    24     155
[3,]    118    25     142
[4,]    112    25     175
[5,]    128    26     170
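A sketch of how cbind() and rbind() change the dimensions, rebuilding size from scratch. It assumes the vector being bound has as many elements as the matrix has rows (or columns, for rbind()); dim() is described under Matrix Attributes.

```r
size.2 <- matrix(c(130, 26, 110, 24, 118, 25, 112, 25), ncol = 2, byrow = TRUE,
                 dimnames = list(NULL, c("Weight", "Waist")))
heights <- c(140, 155, 142, 175)
size <- cbind(size.2, heights)        # 4 x 3: one column added
size <- rbind(size, c(128, 26, 170))  # 5 x 3: one row added
dim(size)                             # 5 3
```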

> x_c(1,2,3)
> y_diag(x)
> y                        * the function diag() creates a matrix with
     [,1] [,2] [,3]          the vector x on the main diagonal
[1,]    1    0    0        * the main diagonal of a matrix consists of
[2,]    0    2    0          the elements whose row number and column
[3,]    0    0    3          number are the same
                           * the number of rows or columns can be
                             specified using the arguments nrow or ncol

> diag(y)
[1] 1 2 3                  * alternatively, when the argument is a
                             matrix, diag() returns the diagonal of
                             the matrix

> col(y)
     [,1] [,2] [,3]        * the function col() returns a matrix of
[1,]    1    2    3          column numbers
[2,]    1    2    3        * similarly, the function row() returns a
[3,]    1    2    3          matrix of row numbers
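Since row() and col() return matrices of row and column numbers, comparing them gives a logical matrix that is TRUE exactly on the main diagonal. The sketch below uses it as a subscript; on.diag is a hypothetical name, and the last line shows that a single number passed to diag() produces an identity matrix of that size.

```r
x <- c(1, 2, 3)
y <- diag(x)
on.diag <- row(y) == col(y)   # TRUE where row number equals column number
y[on.diag]                    # 1 2 3, the same values as diag(y)
diag(3)                       # the 3 x 3 identity matrix
```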

Extracting data from a matrix

> size[2,3]
 heights                 * to extract one value from a matrix, it is
     155                   necessary to use two elements in the
                           subscript:  the first element applies to
                           the rows, the  second element applies to
                           the columns
                         * the full subscript expression applies to
                           the elements of the matrix that satisfy
                           both the row and the column condition
                         * in this case, the element in the second row,
                           third column of the matrix size is printed

> size[2,] 
 Weight Waist heights    * if one dimension is not specified in the
    110    24     155      subscript, all elements in that dimension
                           are extracted
                         * in this case, the columns are not specified
                           so all the columns are included

> size[,3]
[1] 140 155 142 175 170  * prints the third column of the matrix size
                         * in both examples, the comma must be kept in
                           as a marker to indicate which dimension is
                           specified
                         * in both of these examples, Splus drops the
                           extra dimension so that the result is a
                           vector

> size[2, ,drop=F]
    Weight Waist heights * to retain the matrix properties for the
[1,]   110    24     155   result (which might be necessary in some
                           computations), add drop=F to the
                           subscripts
                         * notice that two commas were used in the
                           subscript, one to separate the row from the
                           column (not specified) dimensions, the other
                           to separate the indices from the argument 
                           drop

> is.matrix(size[2,])
[1] F                    * is.matrix() tests whether an object is a
                           matrix and returns the logical value T or F

> is.matrix(size[2, ,drop=F])
[1] T                    * as seen above, when a single row or column
                           is extracted from a matrix, the matrix
                           properties are dropped unless otherwise
                           specified in the argument drop

> size[,c(1,3)]
     Weight heights        * the c() function is used in matrix
[1,]    130     140          subscripts in the same way as it is used
[2,]    110     155          in vector subscripts
[3,]    118     142        * here, the first and third columns of the
[4,]    112     175          matrix size are printed out
[5,]    128     170

> size[,c("Weight","Waist")]
     Weight Waist        * character subscripts are used in the same
[1,]    130    26          way as numeric subscripts: the first
[2,]    110    24          element in the subscript specifies the
[3,]    118    25          rows, and the second element in the
[4,]    112    25          subscript specifies the columns
[5,]    128    26

> size[-2,-3]
     Weight Waist       * negative subscripts have the same meaning
[1,]    130    26         for the rows and columns of matrices that
[2,]    118    25         they have for elements of a vector
[3,]    112    25
[4,]    128    26
Suppose you wish to print the weights of those people taller than 160 cm: the expression size[,1] prints all the weights in the matrix size. The rows to be printed must be limited to those where the value of heights (column 3) is greater than 160, i.e. those rows which satisfy the condition size[,3] > 160. Combining these two expressions gives

> size[size[,3] > 160,1]
[1] 112 128             * this command pulls out the weights (column 1)
                          of those people (rows) with height (size[,3])
                          greater than 160
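Conditions can be combined with the "and" operator & before they are used as a row subscript. A minimal sketch, rebuilding size; tall is a hypothetical name, and the waist threshold of 26 is chosen only for illustration.

```r
size <- matrix(c(130, 110, 118, 112, 128,
                 26, 24, 25, 25, 26,
                 140, 155, 142, 175, 170),
               ncol = 3,
               dimnames = list(NULL, c("Weight", "Waist", "heights")))
tall <- size[, 3] > 160
size[tall, 1]                     # 112 128, as above
size[tall & size[, 2] < 26, 1]    # taller than 160 AND waist below 26: 112
size[tall, "Weight"]              # the same as size[tall, 1], by column name
```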

Matrix Attributes

> dim(size)
[1] 5 3                 * the dim() function returns the dimensions of
                          an object
                        * in the case of matrices, the first element
                          is the number of rows in the matrix and the
                          second element is the number of columns


> nrow(size) 
[1] 5                 
> ncol(size)
[1] 3                   * the functions nrow() and ncol() are based
                          on the function dim() and return the number
                          of rows or the number of columns in the
                          matrix
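A sketch relating dim(), nrow() and ncol(), using a hypothetical 5 x 3 matrix of zeros. Note that length() counts every element of a matrix, not its rows.

```r
m <- matrix(0, nrow = 5, ncol = 3)   # hypothetical 5 x 3 matrix of zeros
d <- dim(m)                          # 5 3
d[1] == nrow(m)                      # TRUE
d[2] == ncol(m)                      # TRUE
length(m)                            # 15 = 5 * 3
```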

Further Reading:

Richard A. Becker, John M. Chambers, Allan R. Wilks, The New S Language: A Programming Environment for Data Analysis and Graphics, Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, California, 1988, pp. 11-13, 17, 18, 36, 37, 95-106, 111-113, 125-129

Exercises

Paste your solutions in a UNIX file.

a) Create the following matrix called marks, and put in the appropriate label names.

     Test1 Test2 Test3 Final
[1,]    20    23    18    48
[2,]    16    15    18    40
[3,]    25    20    22    40
[4,]    14    19    18    42
b) Add the following row to the bottom of the matrix:

        10    15    14    30
c) Change the fifth mark for test #2 from a 15 to a 17.

d) Print all the marks for test #3.

e) Print the final marks for those people with marks greater than 16 on test #1.

f) Print the marks matrix without the column for test #3.

g) Print the number of rows in the matrix.
