Statistics Toolbox Release Notes 
New Features
This section summarizes the new features and enhancements introduced in the Statistics Toolbox 4.0.
If you are upgrading from a release earlier than Release 12.0, then you should also see New Features in the Statistics Toolbox 3.0 Release Notes.
Cluster Analysis
The new kmeans function performs K-means clustering and supports five different distance measures. The new function silhouette plots silhouettes of clusters created using either K-means or hierarchical clustering methods. The pdist function now allows several new distance measures and is more efficient for large datasets.
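As a minimal sketch of how these three functions fit together, the example below clusters a small, hypothetical data set. The data matrix and the starting centroids are made up for illustration, and the 'start' and 'cityblock' options are assumed to be available as in the kmeans and pdist reference pages; check the reference pages for your release.

```matlab
% Hypothetical data: two tight, well-separated groups in the plane.
X = [0 0; 0 1; 1 0; 10 10; 10 11; 11 10];

% K-means with k = 2. Explicit starting centroids make the run repeatable;
% by default kmeans chooses its starting points at random.
idx = kmeans(X, 2, 'start', [0 0; 10 10]);

% Silhouette values: a value close to 1 means the point sits well inside
% its own cluster and far from the other one.
s = silhouette(X, idx);

% Pairwise city block distances, one entry per pair of rows of X.
D = pdist(X, 'cityblock');
```

Calling silhouette without an output argument plots the silhouettes instead of returning the values.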
Factor Analysis
The new factoran function fits a common factor analysis model using maximum likelihood, including rotation of the estimated factor loadings and estimation of factor scores.
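The following sketch fits a one-factor model to simulated data. The data-generating loadings and noise level are assumptions chosen only so that a single common factor is plausible; they do not come from the toolbox documentation.

```matlab
% Hypothetical data: 200 observations of 4 variables driven by one common
% factor plus independent noise.
n = 200;
f = randn(n, 1);                                 % latent factor scores
X = f * [0.9 0.8 0.7 0.6] + 0.5 * randn(n, 4);

% Fit a one-factor model by maximum likelihood.
% lambda: estimated loadings (4-by-1); psi: specific variances (4-by-1).
[lambda, psi] = factoran(X, 1);
```

With more than one factor, additional outputs of factoran return the rotation matrix and the estimated factor scores.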
Multidimensional Scaling and Procrustes Analysis
The new cmdscale function performs classical (metric) multidimensional scaling to create a configuration of points in Euclidean space solely from distance data. The new function procrustes performs orthogonal Procrustes rotations to match one set of points onto another.
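A short sketch of the two functions working together, using a made-up configuration of four points: cmdscale reconstructs the points from their distances, and procrustes then aligns the reconstruction with the original. It assumes squareform is available to expand the pdist output into a full distance matrix.

```matlab
% Hypothetical configuration: four points in the plane.
X = [0 0; 3 0; 0 4; 3 4];

% Full matrix of pairwise Euclidean distances.
D = squareform(pdist(X));

% Classical MDS recovers a configuration from the distances alone,
% up to translation, rotation, and reflection.
Y = cmdscale(D);
Y = Y(:, 1:2);              % keep the first two coordinates

% Procrustes analysis maps Y back onto X. d is the standardized
% residual sum of squares; Z is the transformed copy of Y.
[d, Z] = procrustes(X, Y);
```

Because the distances here are exactly Euclidean, d should be zero up to rounding error and Z should reproduce X.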
Canonical Correlation Analysis
The new function canoncorr performs canonical correlation analysis to find the linear combinations of variables in two datasets that correlate most strongly with each other.
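As an illustration, the sketch below builds two simulated data sets that share one strongly related variable and then computes their canonical correlations. The data are hypothetical.

```matlab
% Hypothetical data: the first column of Y is a noisy copy of the first
% column of X; the second column of Y is unrelated noise.
n = 100;
X = randn(n, 3);
Y = [X(:,1) + 0.1 * randn(n, 1), randn(n, 1)];

% A and B hold the canonical coefficients for X and Y; r holds the
% canonical correlations in decreasing order.
[A, B, r] = canoncorr(X, Y);
```

The first canonical correlation should be close to 1 here, reflecting the shared variable, while the second should be much smaller.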
Discriminant Analysis
The classify function now supports three types of discrimination (linear, quadratic, and Mahalanobis) and allows specification of prior probabilities. 'linear' is now the default, and you must specify 'mahalanobis' to duplicate the behavior of the previous version.
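The calls below sketch the default, the backward-compatible option, and the prior-probability syntax. The training data and the prior values are hypothetical, and the argument order (sample, training, group, type, prior) is assumed from the classify reference page.

```matlab
% Hypothetical training data: two groups of four points each.
training = [1 1; 1 2; 2 1; 2 2.5; 6 6; 6 7; 7 6; 7 7.5];
group    = [1; 1; 1; 1; 2; 2; 2; 2];
sample   = [1.5 1.5; 6.5 6.5];          % new points to classify

% Default behavior in this release: linear discrimination.
cLinear = classify(sample, training, group);

% Request the previous version's behavior explicitly.
cMahal = classify(sample, training, group, 'mahalanobis');

% Linear discrimination with unequal prior probabilities (assumed values).
cPrior = classify(sample, training, group, 'linear', [0.3 0.7]);
```

Code written for the previous version that relies on Mahalanobis distances needs the explicit 'mahalanobis' argument to keep producing the same results.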
Classification and Regression Trees
A collection of new functions (treefit, treeprune, treedisp, treetest, treeval) performs classification and regression using decision trees. These functions fit trees to data, display them, prune them, compute error rates for them using test data or cross-validation, and apply them to new data.
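The workflow described above might look like the following sketch. It assumes that Fisher's iris data ships with the toolbox as fisheriris.mat, and that the option names 'crossvalidate' and 'level' match the treetest and treeprune reference pages for this release; treat both as assumptions to verify.

```matlab
% Fisher's iris data, assumed to be available as fisheriris.mat
% (meas: 150-by-4 measurements, species: 150 class labels).
load fisheriris

t = treefit(meas, species);     % fit a classification tree
treedisp(t)                     % display it graphically

% Estimate the error rate by cross-validation; best is the pruning level
% that treetest suggests.
[cost, secost, ntnodes, best] = treetest(t, 'crossvalidate', meas, species);

tpruned = treeprune(t, 'level', best);   % prune to that level
yfit = treeval(tpruned, meas);           % apply the pruned tree to data
```

For a regression tree, the same sequence applies with a numeric response in place of the class labels.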
Probability Distributions
Several new functions support the generation of random samples from multivariate distributions. There are functions for generating random matrices from the Wishart (wishrnd) or inverse Wishart (iwishrnd) distributions. Other functions (lhsdesign, lhsnorm) use Latin hypercube sampling methods to generate samples from the multivariate uniform and normal distributions. In addition, there have been improvements in other probability functions, particularly those for the negative binomial distribution. Finally, a new function (mvnpdf) computes the probability density function for the multivariate normal distribution.
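A brief sketch of the new functions, one call each. The dimensions, sample sizes, and degrees of freedom are arbitrary illustrative choices.

```matlab
% Density of the standard bivariate normal at the origin, which
% equals 1/(2*pi).
p = mvnpdf([0 0], [0 0], eye(2));

% Latin hypercube sample: 10 points in 3 dimensions on the unit cube.
U = lhsdesign(10, 3);

% Latin hypercube sample of 10 points from a bivariate normal.
Z = lhsnorm([0 0], eye(2), 10);

% One random matrix each from a Wishart and an inverse Wishart
% distribution with 5 degrees of freedom.
W  = wishrnd(eye(2), 5);
IW = iwishrnd(eye(2), 5);
```

In a Latin hypercube sample, each column of U places exactly one point in each of 10 equal-width intervals, which spreads the points more evenly than independent uniform sampling.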
Density Estimation
The new ksdensity function produces a nonparametric density estimate using a kernel smoothing technique.
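For example, the sketch below estimates the density of a simulated two-component sample. The mixture is hypothetical and chosen only because a single parametric distribution would fit it poorly.

```matlab
% Hypothetical sample: a mixture of two normal populations.
x = [randn(100, 1); 4 + randn(100, 1)];

% f is the estimated density at the points xi, which ksdensity chooses
% to cover the range of the data.
[f, xi] = ksdensity(x);
plot(xi, f)
```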
Empirical Cumulative Distribution
The new ecdf function computes the empirical cumulative distribution function (cdf) and confidence bounds for it. For censored data (common in survival analysis), it computes the Kaplan-Meier estimate of the cdf.
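The sketch below contrasts the two cases on a tiny, made-up set of failure times. The 'censoring' parameter name is assumed from the ecdf reference page.

```matlab
% Hypothetical failure times; censored(i) = 1 marks an observation that
% was still running when the study ended.
y        = [1; 2; 3; 4; 5];
censored = [0; 0; 1; 0; 0];

% Empirical cdf treating every observation as an exact failure time.
[f, x] = ecdf(y);

% Kaplan-Meier estimate of the cdf, accounting for the censored value.
[fkm, xkm] = ecdf(y, 'censoring', censored);
stairs(xkm, fkm)
```

The censored observation contributes no jump of its own to the Kaplan-Meier estimate; it only reduces the number of units at risk for later failures.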
Response Surface Designs
New functions support two commonly used designs: central composite designs (ccdesign) and Box-Behnken designs (bbdesign). Central composite designs fit a full quadratic model and can have three or five levels of each factor. ccdesign supports three design types: circumscribed, inscribed, and faced.
Box-Behnken designs are rotatable designs that also fit a full quadratic model but use just three levels of each factor.
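A minimal sketch of generating each kind of design. The 'type' parameter name and its 'faced' value are assumed from the ccdesign reference page; the factor counts are arbitrary.

```matlab
% Central composite design for 2 factors (circumscribed by default).
dCC = ccdesign(2);

% Faced variant: the star points lie on the faces of the factorial
% region, so each factor takes only three levels.
dCCF = ccdesign(2, 'type', 'faced');

% Box-Behnken design for 3 factors; every entry is -1, 0, or 1.
dBB = bbdesign(3);
```

Each row of the result is one run, and each column holds the coded level of one factor.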
D-Optimal Designs
The D-optimal design generation functions are faster than in the past. In addition, the two new functions candgen and candexch provide more control over the row-exchange algorithm for design generation.
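Used together, the two functions split design generation into two steps, as in the sketch below. The choice of two factors, a quadratic model, and eight runs is hypothetical.

```matlab
% Candidate points for 2 factors and a full quadratic model.
% xcand: factor settings; fxcand: the corresponding model terms.
[xcand, fxcand] = candgen(2, 'quadratic');

% Select 8 rows of the candidate set by row exchange.
rows = candexch(fxcand, 8);

% The selected design, expressed in factor settings.
design = xcand(rows, :);
```

Separating the steps lets you edit the candidate set, for example to remove infeasible factor combinations, before any rows are selected.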
Function Summary
Version 4.0 of the Statistics Toolbox provides the following:
New Functions
Function      Purpose
bbdesign      Generate Box-Behnken design
candexch      D-optimal design from candidate set using row exchanges
candgen       Generate candidate set for D-optimal design
canoncorr     Canonical correlation analysis
ccdesign      Generate central composite design
ecdf          Empirical (Kaplan-Meier) cumulative distribution function
factoran      Perform factor analysis by maximum likelihood
ksdensity     Compute a probability density estimate using a kernel smoothing method
lhsnorm       Generate a multivariate normal random matrix using Latin hypercube sampling
nbinfit       Parameter estimates and confidence intervals for negative binomial data
procrustes    Procrustes analysis
treefit       Fit a tree-based model for classification or regression
treeprune     Produce a sequence of subtrees by pruning
treedisp      Show classification or regression tree graphically
treetest      Compute error rate for tree
treeval       Compute fitted value for decision tree applied to data

Statistics Functions with New or Changed Capabilities
Function 
Enhancement or Change  

classify
A new syntax lets you specify the type of discriminant function as 'linear' (default), 'quadratic', or 'mahalanobis'. Specify 'mahalanobis' to duplicate the behavior of the previous version. Another new syntax enables you to specify prior probabilities for the groups. A new output returns an estimate of the misclassification error rate.
cluster
Now also allows clustering based on distance measures. A new syntax also enables you to specify values for these parameters:
  'cutoff'       Cutoff for the inconsistent or distance measure
  'maxclust'     Maximum number of clusters to form
  'criterion'    Either 'inconsistent' or 'distance'
  'depth'        Depth for computing inconsistent values
The old syntax still works but is undocumented.
clusterdata
The syntax clusterdata(Z,'param1',val1,'param2',val2,...) now enables you to specify parameters that clusterdata uses in calling pdist, linkage, and cluster:
  'distance'     Any of the distance metric names allowed by pdist
  'linkage'      Any of the linkage methods allowed by linkage
  'cutoff'       Cutoff for the inconsistent or distance measure
  'maxclust'     Maximum number of clusters to form
  'criterion'    Either 'inconsistent' or 'distance'
  'depth'        Depth for computing inconsistent values
A new syntax provides more control over design generation through a set of parameter-value pairs:
  'display'      Controls display of the iteration counter.
  'init'         Specifies an initial design. The default is a randomly selected set of points.
  'maxiter'      Specifies the maximum number of iterations. The default is 10.


corrcoef
Provides three new syntaxes, including [R,P] = corrcoef(...), which returns P, a matrix of p-values for testing the hypothesis of no correlation.

Consistent with a more general interpretation of the negative binomial distribution, the negative binomial functions now accept any positive value, including noninteger values, for the size parameter R.

pdist
Provides four new metrics for calculating the pairwise distance between observations.

A new syntax 