Statistics Toolbox Release Notes

# Chapter 1: Statistics Toolbox 4.0 Release Notes

New Features

This section summarizes the new features and enhancements introduced in the Statistics Toolbox 4.0.

If you are upgrading from a release earlier than Release 12.0, also see New Features in the Statistics Toolbox 3.0 Release Notes.

Multivariate Analysis

Cluster Analysis

The new `kmeans` function performs K-means clustering and supports five different distance measures. The new function `silhouette` plots silhouettes of clusters created using either K-means or hierarchical clustering methods. The `pdist` function now allows several new distance measures and is more efficient for large datasets.
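
As a minimal sketch of how these three functions fit together, the example below clusters a small synthetic dataset, plots its silhouettes, and computes pairwise distances. The data and the variable names (`X`, `idx`, `d`) are assumptions made for illustration.

```matlab
% Illustrative data (an assumption): two well-separated groups in the plane.
X = [randn(20,2) + 3; randn(20,2) - 3];

% Partition the rows of X into two clusters.
idx = kmeans(X, 2);

% Plot silhouette values to judge how well each point fits its cluster.
silhouette(X, idx);

% Pairwise distances between rows, using the cosine metric.
d = pdist(X, 'cosine');
```

`pdist` returns one distance per pair of rows, so `d` has 40*39/2 elements here.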

Factor Analysis

The new `factoran` function fits a Common Factor Analysis model using maximum likelihood, including rotation of the estimated factor loadings and estimation of factor scores.

Multidimensional Scaling and Procrustes Analysis

The new `cmdscale` function performs classical (metric) Multidimensional Scaling, to create a configuration of points in Euclidean space solely from distance data. The new function `procrustes` performs orthogonal Procrustes rotations to match one set of points onto another.
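
The following sketch, on made-up data, shows one way the two functions can be combined: `cmdscale` reconstructs a configuration from distances alone, and `procrustes` aligns it with the original points. The variable names are illustrative assumptions.

```matlab
% Illustrative configuration (an assumption): 10 random points in the plane.
X = randn(10, 2);

% cmdscale needs only the pairwise distances, here as a square matrix.
D = squareform(pdist(X));
Y = cmdscale(D);

% Y recovers X only up to rotation, reflection, and translation, so
% align it with X before comparing. d is the residual dissimilarity
% and Z is the transformed copy of Y.
[d, Z] = procrustes(X, Y(:, 1:2));
```

Because the distances here come from points that really lie in two dimensions, `d` should be close to zero.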

Canonical Correlation Analysis

The new function `canoncorr` performs Canonical Correlation Analysis, to find the linear combinations of variables in two datasets that are most highly correlated with each other.
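
A short sketch of a typical call follows. The synthetic data, in which one column of `Y` is built from a column of `X`, is an assumption made so that a strong canonical correlation exists.

```matlab
% Illustrative data (an assumption): Y shares one direction with X.
X = randn(50, 3);
Y = [X(:,1) + 0.2*randn(50,1), randn(50,1)];

% A and B hold the canonical coefficients for X and Y;
% r holds the canonical correlations, largest first.
[A, B, r] = canoncorr(X, Y);
```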

Discriminant Analysis

The `classify` function now supports three types of discrimination (linear, quadratic, and Mahalanobis) and allows specification of prior probabilities.

`'linear'` is now the default, and you must specify `'mahalanobis'` to duplicate the behavior of the previous version.
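
The sketch below illustrates the change in default on made-up training data. The variable names and data are assumptions.

```matlab
% Illustrative data (an assumption): two groups of 2-D training points.
training = [randn(30,2) + 2; randn(30,2) - 2];
group    = [ones(30,1); 2*ones(30,1)];
sample   = [2 2; -2 -2];

% With no type argument, classify uses linear discrimination.
cLinear = classify(sample, training, group);

% Request the Mahalanobis rule explicitly to match the previous version.
cMahal = classify(sample, training, group, 'mahalanobis');

% The optional second output estimates the misclassification error rate.
[cQuad, err] = classify(sample, training, group, 'quadratic');
```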

Nonlinear Regression Models

Classification and Regression Trees

A collection of new functions (`treefit`, `treeprune`, `treedisp`, `treetest`, `treeval`) performs classification and regression using decision trees. These functions fit trees to data, display them, prune them, compute error rates for them using test data or cross-validation, and apply them to new data.
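
A possible workflow with these functions is sketched below on synthetic data: fit, display, cross-validate, prune, and predict. The data and variable names are assumptions.

```matlab
% Illustrative data (an assumption): a noisy step function of one predictor.
X = (1:100)';
y = double(X > 50) + 0.1*randn(100,1);

t = treefit(X, y);     % numeric y, so a regression tree is fit
treedisp(t);           % display the tree graphically

% Cross-validated cost for each pruning level, plus a suggested best level.
[cost, secost, ntnodes, bestlevel] = treetest(t, 'crossvalidate', X, y);

tPruned = treeprune(t, 'level', bestlevel);   % prune to that level
yfit = treeval(tPruned, X);                   % apply the pruned tree to data
```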

Probability Distributions

Several new functions support the generation of random samples from multivariate distributions. There are functions for generating random matrices from the Wishart (`wishrnd`) or inverse Wishart (`iwishrnd`) distributions. Other functions (`lhsdesign`, `lhsnorm`) use Latin hypercube sampling methods to generate samples from the multivariate uniform and normal distributions. In addition, other probability functions have been improved, particularly those for the negative binomial distribution. Finally, a new function (`mvnpdf`) computes the probability density function for the multivariate normal distribution.
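
The calls below are a minimal sketch of the new sampling and density functions. The covariance matrix, mean, degrees of freedom, and sample sizes are arbitrary values chosen for illustration.

```matlab
sigma = [1 0.5; 0.5 2];   % illustrative covariance matrix (an assumption)
mu    = [0 0];

W  = wishrnd(sigma, 10);      % Wishart random matrix, 10 degrees of freedom
IW = iwishrnd(sigma, 10);     % inverse Wishart random matrix

U = lhsdesign(20, 2);         % 20 Latin hypercube points in the unit square
Z = lhsnorm(mu, sigma, 20);   % 20 Latin hypercube points from N(mu, sigma)

p = mvnpdf([0 0; 1 1], mu, sigma);   % density evaluated at two points
```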

Descriptive Statistics

Density Estimation

The new `ksdensity` function produces a nonparametric density estimate using a kernel smoothing technique.
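
As a brief sketch, the following estimates and plots the density of a bimodal sample. The sample itself is an assumption made for illustration.

```matlab
% Illustrative bimodal sample (an assumption).
x = [randn(50,1); randn(50,1) + 4];

% f is the density estimate evaluated at the points xi.
[f, xi] = ksdensity(x);
plot(xi, f);
```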

Empirical Cumulative Distribution

The new `ecdf` function computes the empirical cumulative distribution function (cdf) and confidence bounds for it. For censored data (common in survival analysis), it computes the Kaplan-Meier estimate of the cdf.
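
The sketch below shows the censored-data case on a small made-up set of failure times. The values and variable names are assumptions.

```matlab
% Illustrative survival data (an assumption): failure times, some censored.
y = [2 3 3 5 8 9 12 15]';
c = [0 0 1 0 1 0 0  1]';    % 1 marks a censored observation

% Kaplan-Meier estimate of the cdf with lower and upper confidence bounds.
[f, x, flo, fup] = ecdf(y, 'censoring', c);
stairs(x, f);
```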

Design of Experiments

Response Surface Designs

New functions support two commonly used designs: central composite designs (`ccdesign`) and Box-Behnken designs (`bbdesign`). Central composite designs fit a full quadratic model and can have three or five levels of each factor. `ccdesign` supports all three types of central composite design: circumscribed, inscribed, and faced.

Box-Behnken designs are rotatable designs that also fit a full quadratic model but use just three levels of each factor.
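
A minimal sketch of generating each kind of design follows. The factor counts are arbitrary choices for illustration. In each result, rows are runs and columns are factors.

```matlab
dCC    = ccdesign(2);                    % central composite design, 2 factors
dFaced = ccdesign(2, 'type', 'faced');   % faced variant: three levels per factor
dBB    = bbdesign(3);                    % Box-Behnken design, 3 factors
```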

D-Optimal Designs

The D-optimal design generation functions are faster than in the past. In addition, the two new functions `candgen` and `candexch` provide more control over the row-exchange algorithm for design generation.
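
One way the two functions might be used together is sketched below: generate a candidate set, then pick rows from it by row exchange. The factor count, model, and design size are assumptions chosen for illustration.

```matlab
% Candidate set for 2 factors under a quadratic model:
% xcand holds factor settings, fxcand the corresponding model terms.
[xcand, fxcand] = candgen(2, 'quadratic');

% Select 8 rows of the candidate set by row exchange.
rows = candexch(fxcand, 8);
design = xcand(rows, :);
```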

Function Summary

Version 4.0 of the Statistics Toolbox provides the following:

New Functions

| Function | Purpose |
| --- | --- |
| `bbdesign` | Generate Box-Behnken design |
| `candexch` | D-optimal design from candidate set using row exchanges |
| `candgen` | Generate candidate set for D-optimal design |
| `canoncorr` | Canonical correlation analysis |
| `ccdesign` | Generate central composite design |
| `cmdscale` | Classical multidimensional scaling |
| `ecdf` | Empirical (Kaplan-Meier) cumulative distribution function |
| `factoran` | Perform Factor Analysis by maximum likelihood |
| `iwishrnd` | Generate inverse Wishart random matrix |
| `kmeans` | K-means clustering |
| `ksdensity` | Compute a probability density estimate using a kernel smoothing method |
| `lhsdesign` | Generate a Latin hypercube sample |
| `lhsnorm` | Generate a multivariate normal random matrix using Latin hypercube sampling |
| `mvnpdf` | Multivariate normal probability density function (pdf) |
| `nbinfit` | Parameter estimates and confidence intervals for negative binomial data |
| `procrustes` | Procrustes Analysis |
| `silhouette` | Silhouette plot for clustered data |
| `treefit` | Fit a tree-based model for classification or regression |
| `treeprune` | Produce a sequence of subtrees by pruning |
| `treedisp` | Show classification or regression tree graphically |
| `treetest` | Compute error rate for tree |
| `treeval` | Compute fitted value for decision tree applied to data |
| `wishrnd` | Generate Wishart random matrix |

Statistics Functions with New or Changed Capabilities

| Function | Enhancement or Change |
| --- | --- |
| `classify` | A new syntax lets you specify the type of discriminant function as `'linear'` (default), `'quadratic'`, or `'mahalanobis'`. Specify `'mahalanobis'` to duplicate the behavior of the previous version. Another new syntax enables you to specify prior probabilities for the groups. A new output returns an estimate of the misclassification error rate. |
| `cluster` | Now also allows clustering based on distance measures. A new syntax also enables you to specify values for these parameters: `'cutoff'` (cutoff for inconsistent and distance measure), `'maxclust'` (maximum number of clusters to form), `'criterion'` (either `'inconsistent'` or `'distance'`), and `'depth'` (depth for computing inconsistent values). The old syntax still works but is undocumented. |
| `clusterdata` | `clusterdata(Z,'param1',val1,'param2',val2,...)` now enables you to specify parameters that `clusterdata` uses in calling `pdist`, `linkage`, and `cluster`: `'distance'` (any of the distance metric names allowed by `pdist`), `'linkage'` (any of the linkage methods allowed by `linkage`), `'cutoff'` (cutoff for inconsistent and distance measure), `'maxclust'` (maximum number of clusters to form), `'criterion'` (either `'inconsistent'` or `'distance'`), and `'depth'` (depth for computing inconsistent values). |
| `cordexch`, `daugment`, `dcovary`, `rowexch` | A new syntax, `function(...,'param1',value1,'param2',value2,...)`, provides more control over design generation through a set of parameter-value pairs. Valid parameters are: `'display'` (controls display of iteration counter), `'init'` (specifies an initial design; the default is a randomly selected set of points), and `'maxiter'` (specifies the maximum number of iterations; the default is 10). |
| `corrcoef` (MATLAB) | Provides three new syntaxes. `[R,P] = corrcoef(...)` returns `P`, a matrix of p-values for testing the hypothesis of no correlation. `[R,P,RLO,RUP] = corrcoef(...)` returns matrices `RLO` and `RUP`, which contain lower and upper bounds for a 95% confidence interval for each coefficient. `[...] = corrcoef(...,'param1',val1,'param2',val2,...)` accepts parameter-value pairs that enable you to override the default confidence interval and specify how to treat rows of `X` that contain `NaN`s. |
| `nbincdf`, `nbininv`, `nbinpdf`, `nbinrnd`, `nbinstat` | Consistent with a more general interpretation of the negative binomial, these functions now accept any positive value, including nonintegers, for the size parameter `R`. |
| `pdist` | Provides four new metrics for calculating the pairwise distance between observations: `'cosine'`, `'correlation'`, `'hamming'`, and `'jaccard'`. It now also accepts a function handle to a user-defined distance function. |
| `regstats` | A new syntax `stats = regstats(responses,DATA,model,whichstats)` creates an output structure `stats` containing the statistics listed in `whichstats`. `whichstats` can be a single name or a cell array of names. The list of available statistics remains the same. |
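
As an illustration of the new `corrcoef` syntaxes described above, the sketch below requests p-values and 99% confidence bounds while skipping rows that contain `NaN`. The data are made up, and the choice of `'alpha'` and `'rows'` values is an assumption for the example.

```matlab
% Illustrative data (an assumption): column 3 is correlated with column 1,
% and one entry is missing.
X = randn(30, 3);
X(:,3) = X(:,1) + 0.5*randn(30,1);
X(5,2) = NaN;

% P holds p-values; RLO and RUP bound a 99% confidence interval for each
% coefficient. Rows with NaN are dropped before computing.
[R, P, RLO, RUP] = corrcoef(X, 'alpha', 0.01, 'rows', 'complete');
```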
