Data can be typed directly into Splus or read in from a file. Data objects can also be the result of an expression (a combination of data objects and constants with operators and functions). Splus objects are stored in the .Data directory and are saved from one session to the next. To save a data object, a name must be assigned to it. This is done using the underscore character "_" or the two-character symbol "<-" (a less-than sign followed by a hyphen), with the name of the object on the left and the values on the right. Alternatively, the symbol "->" can be used with the values on the left and the name of the object on the right. The name must start with a letter and may contain letters, digits, and periods. Splus is case sensitive: x and X refer to two different objects. The following are examples of data assignments:
> height_175
      * read as "height gets 175"
      * assigns the value 175 to the scalar name height
> person_"Jo"
      * character values are inserted in quotes
      * if the quotes are omitted, Splus will look for a data object Jo to assign to person
> heights_c(160,140,155)
      * the function c() "collects" the values 160, 140, and 155 and stores them into the vector heights
> people_c("Ned","Jill","Pat")
      * creates a vector of names
> names(heights)_people
      * the names() function assigns names to the elements of a vector
      * the word people is not inserted in quotes, it refers to the vector people, and not the word itself
> heights
 Ned Jill Pat
 160  140 155
      * typing the name of an object by itself causes its value to be printed on the terminal
> heights["Ned"]
 Ned
 160
      * when an object has a names attribute, its elements can be referred to by name
> names(heights)_NULL
      * deletes the names attribute of the vector heights
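The right-pointing form of assignment and the case sensitivity of names are described above but not shown. The following is a minimal sketch, written with the "<-" and "->" symbols; the names height2 and Height are hypothetical and are introduced only for this illustration.

```r
# The "<-" form: name on the left, value on the right
height <- 175

# The "->" form: value on the left, name on the right
# (height2 is a hypothetical name used only in this sketch)
175 -> height2

# Names are case sensitive, so Height is a separate object from height
Height <- 180

height == height2    # T: both forms stored the same value
height == Height     # F: height and Height are two different objects
```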
> object[subscript]
      * syntax for subscripting, where object is the name of the data object and subscript defines which elements to extract
      * the expression heights["Ned"] above is an example of extracting data using a subscript
Notice that square brackets are used instead of parentheses. Parentheses are used by functions (e.g. the c() and names() functions): a function uses the arguments provided in parentheses to perform a task. Subscripts require square brackets: the information provided in square brackets tells Splus what subset of a data object is being referred to.
> heights[2]
[1] 140
      * extracts the second element from heights
      * [1] refers to the position of the first element on the given line; this is very useful when vectors are several lines long
> heights[c(2,1,2)]
[1] 140 160 140
      * extracts the second, first, and second elements from heights
      * the c() function is used in the subscript when more than one element is listed
> heights[heights < 160]
[1] 140 155
      * returns all the values of heights which are less than 160
      * the comparison must name the object being tested: the condition applies to the values of heights, not to the subscript numbers, and heights[ < 160] on its own is not a valid expression
> heights[-2]
[1] 160 155
      * returns all except the second value in heights
> heights[1]_162
> heights
[1] 162 140 155
      * assigns the value 162 to the first element of heights
      * the old object heights has now been replaced by the new object heights
> heights[4]_135
> heights
[1] 162 140 155 135
      * appends the value 135 to the vector heights
> heights_append(heights,height)
> heights
[1] 162 140 155 135 175
      * the function append() creates a new vector with the first values the same as heights and the last value as height (recall that height was previously assigned the value 175)
      * the function append() binds two objects into a vector
      * the arguments may be vectors, scalars, or both
      * this is equivalent to
        > heights_c(heights,height)
> heights.1_append(heights,180,after=2)
> heights.1
[1] 162 140 180 155 135 175
      * the argument after specifies the index of heights after which the new values are to be inserted
> heights_replace(heights,2,142)
> heights
[1] 162 142 155 135 175
      * replaces the second value in heights with the value 142 and stores the new vector in heights
      * the first argument specifies the name of the data object, the second specifies the indices of the elements to be replaced, and the third argument specifies the values the elements are to be replaced with
      * this expression is equivalent to
        > heights[2]_142
> heights.2_replace(heights,c(2,4),c(140,142))
> heights.2
[1] 162 140 155 142 175
      * replaces the second and fourth values of heights by the values 140 and 142 and stores the result into heights.2
> numbers_1:5
> numbers
[1] 1 2 3 4 5
      * the operator ":" creates a sequence from 1 to 5
      * the syntax for the sequence operator is from:to
> heights_heights.2[2:5]
> heights
[1] 140 155 142 175
      * assigns the last four elements of the vector heights.2 to the vector heights
> length(heights)
[1] 4
      * returns the length (number of elements) of the object heights
The data objects you have just created are stored in your .Data directory. To see a list of the data objects (and later on, functions) you have created, type
> ls()
To remove an object or function from your .Data directory, use the rm() function. For example, to remove the scalar height, type
> rm(height)
You can also remove more than one data object at a time. To remove the scalar person and the vector numbers, type
> rm(person,numbers)
Assigning a name already used by an Splus function may cause warning messages to appear on the screen:
> c_c(1,2,3)
> d_c(1,2,3)
Warning messages: Looking for object "c" of mode "function", ignored one of mode "numeric"
Here, the name c (the "concatenate" function) was assigned to a data object. The problem is solved by reassigning the object to another name and removing the numeric object from the directory:
> b_c
> rm(c)
This will cause the warning message to be printed one last time.
> size.1_matrix(c(130,26,110,24,118,25,112,25),ncol=2)
> size.1
     [,1] [,2]
[1,]  130  118
[2,]   26   25
[3,]  110  112
[4,]   24   25
      * the function matrix() reads data into a matrix
      * the number of columns is specified using the argument ncol= #
      * alternatively, the number of rows can be specified using the argument nrow= #, or both nrow and ncol can be specified
      * when neither nrow nor ncol is specified, the data is read in as a one column matrix
> size.2_matrix(c(130,26,110,24,118,25,112,25),ncol=2,byrow=T)
> size.2
     [,1] [,2]
[1,]  130   26
[2,]  110   24
[3,]  118   25
[4,]  112   25
      * specifying byrow=T forces Splus to read the data in row by row
      * when the argument is not specified, or specified as byrow=F, Splus assumes the data is written in column by column
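The comments above mention the nrow argument and the one column default without showing them. The sketch below, written with the "<-" form of assignment, tries both on the same data; the names values, by.rows, by.cols, and one.col are hypothetical and used only here.

```r
# The same eight values used for size.1 (values is a hypothetical name)
values <- c(130,26,110,24,118,25,112,25)

by.rows <- matrix(values, nrow=4)   # 4 rows given, so 2 columns are implied
by.cols <- matrix(values, ncol=2)   # the form used for size.1 in the text
one.col <- matrix(values)           # neither nrow nor ncol: a one column matrix

dim(by.rows)                # 4 2
dim(one.col)                # 8 1
all(by.rows == by.cols)     # T: both calls fill the matrix column by column
```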
> size.names_list(c("Abe","Bob","Carol","Deb"),c("Weight","Waist"))
> size.names
[[1]]:
[1] "Abe"   "Bob"   "Carol" "Deb"
[[2]]:
[1] "Weight" "Waist"
Notice the double square brackets: whereas single square brackets are used to extract data from a vector, double square brackets are used to extract components from a list:
> size.names[[2]]
[1] "Weight" "Waist"
The individual components in the list retain their properties as vectors and, as such, individual elements can be extracted from each component in the same way as in any other vector:
> size.names[[2]][2]
[1] "Waist"
Names can also be assigned to the components of a list:
> names(size.names)_c("Rows","Columns")
> size.names
$Rows:
[1] "Abe"   "Bob"   "Carol" "Deb"
$Columns:
[1] "Weight" "Waist"
The components of the list can then be extracted using their names attribute:
> size.names$Rows
[1] "Abe"   "Bob"   "Carol" "Deb"
> size.names$Rows[2]
[1] "Bob"
> dimnames(size.2)_size.names
> size.2
      Weight Waist
  Abe    130    26
  Bob    110    24
Carol    118    25
  Deb    112    25
      * the dimnames() function assigns names to the dimensions of a data object (in this case, the rows and columns of size.2)
> size.2_matrix(c(130,26,110,24,118,25,112,25),ncol=2,byrow=T,
+ dimnames=list(c("Abe","Bob","Carol","Deb"),c("Weight","Waist")))
      * it is possible to assign dimnames directly from within the matrix function
      * expressions can be spread over several lines: simply hit return at the end of the line and Splus prompts for a continuation line by means of the "+" character (this may also happen if you forget to close all open brackets or strings)
> dimnames(size.2)_list(NULL,c("Weight","Waist"))
      * the NULL object is used when no dimnames are to be assigned to a dimension
> abc_size.2
> dimnames(abc)_list(c("Abe","Bob","Carol","Deb"),dimnames(size.2)[[2]])
      * this command assigns dimnames to the rows of abc and assigns the column dimnames of size.2 to the columns of abc
> size_cbind(size.2,heights)
> size
     Weight Waist heights
[1,]    130    26     140
[2,]    110    24     155
[3,]    118    25     142
[4,]    112    25     175
      * cbind() (column bind) "binds" together vectors and matrices columnwise into a new matrix
      * cbind() "binds" the vector heights columnwise to the matrix size.2 and stores the resulting matrix in size
      * the name heights is automatically assigned to the third column of the matrix size
> size_rbind(size,c(128,26,170))
> size
     Weight Waist heights
[1,]    130    26     140
[2,]    110    24     155
[3,]    118    25     142
[4,]    112    25     175
[5,]    128    26     170
      * rbind() (row bind) "binds" together vectors and/or matrices rowwise into a new matrix
> x_c(1,2,3)
> y_diag(x)
> y
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    0
[3,]    0    0    3
      * the function diag() creates a matrix with the vector x on the main diagonal
      * the main diagonal of a matrix consists of those elements whose row number and column number are the same
      * the number of rows or columns can be specified using the arguments nrow or ncol
> diag(y)
[1] 1 2 3
      * alternatively, when the argument is a matrix, diag() returns the diagonal of the matrix
> col(y)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    2    3
[3,]    1    2    3
      * the function col() returns a matrix of column numbers
      * similarly, the function row() returns a matrix of row numbers
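One common use of row() and col() is to build a condition on the position of each element. The sketch below, written with the "<-" form of assignment, compares the two to pick out the main diagonal; the name on.diag is hypothetical and used only for this illustration.

```r
x <- c(1,2,3)
y <- diag(x)                   # 3 by 3 matrix with x on the main diagonal

row(y)                         # matrix of row numbers, the counterpart of col(y)

# T where the row number equals the column number, F elsewhere
# (on.diag is a hypothetical name)
on.diag <- row(y) == col(y)

y[on.diag]                     # 1 2 3, the same values as diag(y)
y[row(y) < col(y)]             # 0 0 0, the elements above the main diagonal
```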
> size[2,3]
heights
    155
      * to extract one value from a matrix, it is necessary to use two elements in the subscript: the first element applies to the rows, the second element applies to the columns
      * the full subscript expression applies to the elements of the matrix that satisfy both the row and the column condition
      * in this case, the element in the second row, third column of the matrix size is printed
> size[2,]
 Weight Waist heights
    110    24     155
      * if one dimension is not specified in the subscript, all elements in that dimension are extracted
      * in this case, the columns are not specified so all the columns are included
> size[,3]
[1] 140 155 142 175 170
      * prints the third column of the matrix size
      * in both examples, the comma must be kept in as a marker to indicate which dimension is specified
      * in both of these examples, Splus drops the extra dimension so that the result is a vector
> size[2, ,drop=F]
     Weight Waist heights
[1,]    110    24     155
      * to retain the matrix properties for the result (which might be necessary in some computations), add drop=F to the subscripts
      * notice that two commas were used in the subscript, one to separate the row from the column (not specified) dimensions, the other to separate the indices from the argument drop
> is.matrix(size[2,])
[1] F
      * is.matrix() is a function that returns a logical value indicating whether an object is a matrix
> is.matrix(size[2, ,drop=F])
[1] T
      * as seen above, when a single row or column is extracted from a matrix, the matrix properties are dropped unless otherwise specified in the argument drop
> size[,c(1,3)]
     Weight heights
[1,]    130     140
[2,]    110     155
[3,]    118     142
[4,]    112     175
[5,]    128     170
      * the c() function is used in matrix subscripts in the same way as it is used in vector subscripts
      * here, the first and third columns of the matrix size are printed out
> size[,c("Weight","Waist")]
     Weight Waist
[1,]    130    26
[2,]    110    24
[3,]    118    25
[4,]    112    25
[5,]    128    26
      * character subscripts are used in the same way as numeric subscripts: the first element in the subscript specifies the rows, and the second element in the subscript specifies the columns
> size[-2,-3]
     Weight Waist
[1,]    130    26
[2,]    118    25
[3,]    112    25
[4,]    128    26
      * negative subscripts have the same meaning for the rows and columns of matrices that they have for elements of a vector
Suppose you wished to print the weights of those people taller than 160cm: the expression size[,1] will print all the weights in the matrix size. It is necessary to limit the rows to be printed to those rows where the value for heights (column 3) is greater than 160cm, i.e. those rows which satisfy the condition 'size[,3] > 160'. Combining these two expressions gives
> size[size[,3] > 160,1]
[1] 112 128
      * this command pulls out the weights (column 1) of those people (rows) with height (size[,3]) greater than 160
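It can help to look at the row condition on its own before using it as a subscript. The sketch below, written with the "<-" form of assignment, rebuilds the matrix size from the values printed earlier and stores the condition in a vector; the name tall is hypothetical and used only for this illustration.

```r
# Rebuild the matrix size from the values shown in the text
size <- matrix(c(130,26,140, 110,24,155, 118,25,142, 112,25,175, 128,26,170),
               ncol=3, byrow=T,
               dimnames=list(NULL, c("Weight","Waist","heights")))

# One logical value per row: T where the height is greater than 160
# (tall is a hypothetical name)
tall <- size[,3] > 160
tall                    # F F F T T

size[tall, 1]           # 112 128, the same result as size[size[,3] > 160,1]
size[tall, , drop=F]    # all three columns for the same rows, kept as a matrix
```

Only the rows whose entry in tall is T are kept, which is why the fourth and fifth weights are returned.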
> dim(size)
[1] 5 3
      * the dim() function returns the dimensions of an object
      * in the case of matrices, the first element is the number of rows in the matrix and the second element is the number of columns
> nrow(size)
[1] 5
> ncol(size)
[1] 3
      * the functions nrow() and ncol() are based on the function dim() and return the number of rows or the number of columns in the matrix
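The relationship between dim(), nrow(), and ncol() can be checked directly. This is a small sketch written with the "<-" form of assignment; it uses a placeholder matrix of zeros with the same shape as size, and the name d is hypothetical.

```r
# Placeholder with the same shape as size in the text (5 rows, 3 columns)
size <- matrix(0, nrow=5, ncol=3)

d <- dim(size)          # d is a hypothetical name for the dimension vector
d[1] == nrow(size)      # T: the first element of dim() is the number of rows
d[2] == ncol(size)      # T: the second element is the number of columns
length(size)            # 15: length() counts every element of the matrix
```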
a) Create the following matrix called marks, and put in the appropriate label names.
     Test1 Test2 Test3 Final
[1,]    20    23    18    48
[2,]    16    15    18    40
[3,]    25    20    22    40
[4,]    14    19    18    42
b) Add the following row to the bottom of the matrix:
        10    15    14    30
c) Change the fifth mark for test #2 from a 15 to a 17.
d) Print all the marks for test #3.
e) Print the final marks for those people with marks greater than 16 on test #1.
f) Print the marks matrix without the column for test #3.
g) Print the number of rows in the matrix.
Computations and Data Manipulations