Title: | A Collection of Functions for Graphing Correlation Matrices |
---|---|
Description: | Routines for the graphical representation of correlation matrices by means of correlograms, MDS maps and biplots obtained by PCA, PFA or WALS (weighted alternating least squares); see Graffelman & De Leeuw (2023) <doi: 10.1080/00031305.2023.2186952>. |
Authors: | Jan Graffelman [aut, cre], Jan De Leeuw [aut] |
Maintainer: | Jan Graffelman |
License: | GPL (>= 2) |
Version: | 1.1.0 |
Built: | 2025-01-08 05:59:09 UTC |
Source: | https://github.com/cran/Correlplot |
Four variables recorded for 21 types of aircraft.
data("aircraft")
A data frame with 21 observations on the following 4 variables.
SPR
specific power
RGF
flight range factor
PLF
payload
SLF
sustained load factor
Gower and Hand, Table 2.1
Gower, J.C. and Hand, D.J. (1996) Biplots, Chapman & Hall, London
data(aircraft)
str(aircraft)
Correlations between SPR (specific power), RGF (flight range factor), PLF (payload) and SLF (sustained load factor) for 21 types of aircraft.
data(aircraftR)
a matrix containing the correlations
Gower and Hand, Table 2.1
Gower, J.C. and Hand, D.J. (1996) Biplots, Chapman & Hall, London
Function angleToR
converts a vector of angles (in radians) to an
estimate of the correlation matrix, given an interpretation function.
angleToR(x, ifun = "cos")
x |
a vector of angles (in radians) |
ifun |
the interpretation function ("cos" or "lincos") |
A correlation matrix
Jan Graffelman
Graffelman, J. (2012) Linear-angle correlation plots: new graphs for revealing correlation structure. Journal of Computational and Graphical Statistics. 22(1): 92-106.
angles <- c(0,pi/3)
R <- angleToR(angles)
print(R)
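A minimal base-R sketch of the conversion, assuming that the correlation between variables i and j is the interpretation function applied to the angle between their vectors, |theta_i - theta_j|. The names `angle_to_r_sketch` and `lincos_fun` are hypothetical; this illustrates the idea and is not the package's source.

```r
# Hypothetical sketch of angleToR (not the package's own code).
# Assumption: r_ij = f(|theta_i - theta_j|), with f = cos or the
# piecewise-linear lincos function documented in this manual.
lincos_fun <- function(x) {
  x <- x %% (2 * pi)                        # map angles into [0, 2*pi)
  ifelse(x <= pi, -2 / pi * x + 1, 2 / pi * x - 3)
}

angle_to_r_sketch <- function(x, ifun = c("cos", "lincos")) {
  ifun <- match.arg(ifun)
  f <- if (ifun == "cos") cos else lincos_fun
  d <- abs(outer(x, x, "-"))                # pairwise angle differences
  f(d)                                      # the diagonal is f(0) = 1
}

angles <- c(0, pi / 3)
angle_to_r_sketch(angles)             # off-diagonal: cos(pi/3) = 0.5
angle_to_r_sketch(angles, "lincos")   # off-diagonal: 1 - 2/3 = 1/3
```

With the linear interpretation the same 60-degree angle represents a weaker correlation (1/3 instead of 0.5), which is the point of linear-angle plots.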
A 10 by 10 artificial correlation matrix
data(artificialR)
A matrix of correlations
Trosset (2005), Table 1.
Trosset, M.W. (2005) Visualizing correlation. Journal of Computational and Graphical Statistics, 14(1), pp. 1–19.
Correlation matrix of 12 characteristics of Australian athletes (Sex, Height, Weight, Lean Body Mass, RCC, WCC, Hc, Hg, Ferr, BMI, SSF, Bfat)
data(athletesR)
A matrix of correlations
Weisberg (2005), file ais.txt
Weisberg, S. (2005) Applied Linear Regression. Third edition, John Wiley & Sons, New Jersey.
The Swiss banknote data consist of six measurements taken on 200 banknotes, of which 100 are counterfeit and 100 are genuine.
data("banknotes")
A data frame with 200 observations on the following 7 variables.
Length
Banknote length
Left
Left width
Right
Right width
Bottom
Bottom margin
Top
Top margin
Diagonal
Length of the diagonal of the image
Counterfeit
0 = normal, 1 = counterfeit
Weisberg, S. (2005) Applied Linear Regression. Third edition. John Wiley & Sons, New Jersey.
data(banknotes)
Correlation matrix for sex, height and weight at ages 2, 9 and 18, and somatotype
data(berkeleyR)
A matrix of correlations
Weisberg (2005), file BGSBoys.txt
Weisberg, S. (2005) Applied Linear Regression. Third edition, John Wiley & Sons, New Jersey.
Correlation between nave height and total length of cathedrals
data(cathedralsR)
A matrix of correlations
Weisberg (2005), file cathedral.txt
Weisberg, S. (2005) Applied Linear Regression. Third edition, John Wiley & Sons, New Jersey.
correlogram
plots a correlogram for a correlation matrix.
correlogram(R,labs=colnames(R),ifun="cos",cex=1,main="",ntrials=50, xlim=c(-1.2,1.2),ylim=c(-1.2,1.2),pos=NULL,...)
R |
a correlation matrix. |
labs |
a vector of labels for the variables. |
ifun |
the interpretation function ("cos" or "lincos") |
cex |
character expansion factor for the variable labels |
main |
a title for the correlogram |
ntrials |
number of starting points for the optimization routine |
xlim |
limits for the x axis (e.g. c(-1.2,1.2)) |
ylim |
limits for the y axis (e.g. c(-1.2,1.2)) |
pos |
if specified, overrules the calculated label positions for the variables. |
... |
additional arguments for the |
correlogram
makes a correlogram on the basis of a set of
angles. All angles are measured from the positive x-axis. Variables are
represented by unit vectors emanating from the origin.
A vector of angles
Jan Graffelman
Trosset, M.W. (2005) Visualizing correlation. Journal of Computational and Graphical Statistics 14(1), pp. 1–19
X <- matrix(rnorm(90),ncol=3)
R <- cor(X)
angles <- correlogram(R)
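The geometry described above can be illustrated in base R. In this sketch (the helper names are hypothetical, not Correlplot functions) each variable becomes the unit vector (cos theta, sin theta); inner products of these unit vectors are the cosines of the angle differences, which is what a cosine-interpreted correlogram displays.

```r
# Hypothetical helpers (not Correlplot code): variable k is drawn as the unit
# vector (cos(theta_k), sin(theta_k)), with theta_k measured from the x-axis.
correlogram_coords <- function(theta, labs = paste0("V", seq_along(theta))) {
  data.frame(label = labs, x = cos(theta), y = sin(theta),
             stringsAsFactors = FALSE)
}

draw_correlogram_sketch <- function(theta, labs = paste0("V", seq_along(theta))) {
  xy <- correlogram_coords(theta, labs)
  plot(NA, xlim = c(-1.2, 1.2), ylim = c(-1.2, 1.2), asp = 1,
       xlab = "", ylab = "", axes = FALSE)
  u <- seq(0, 2 * pi, length.out = 200)
  lines(cos(u), sin(u), col = "grey")      # unit circle
  arrows(0, 0, xy$x, xy$y, length = 0.1)   # one vector per variable
  text(1.1 * xy$x, 1.1 * xy$y, xy$label)   # labels just beyond the tips
  invisible(xy)
}

theta <- c(0, pi / 3, pi / 2)
xy <- correlogram_coords(theta, c("A", "B", "C"))
P <- as.matrix(xy[, c("x", "y")])
round(P %*% t(P), 3)   # inner products equal cos(theta_i - theta_j)
# draw_correlogram_sketch(theta, c("A", "B", "C"))  # draws on the active device
```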
Correlations between infant mortality, educational and demographic variables (infd, phys, dens, agds, lit, hied, gnp)
data(countriesR)
A matrix of correlations
Chatterjee and Hadi (1988)
Chatterjee, S. and Hadi, A.S. (1988), Sensitivity Analysis in Regression. Wiley, New York.
fit_angles
finds a set of optimal angles for representing a
particular correlation matrix by angles between vectors
fit_angles(R, ifun = "cos", ntrials = 10, verbose = FALSE)
R |
a correlation matrix. |
ifun |
an angle interpretation function (cosine, by default). |
ntrials |
number of trials for optimization routine |
verbose |
be silent (FALSE), or produce more output (TRUE) |
a vector of angles (in radians)
anonymous
Trosset, M.W. (2005) Visualizing correlation. Journal of Computational and Graphical Statistics 14(1), pp. 1–19
X <- matrix(rnorm(90),ncol=3)
R <- cor(X)
angles <- fit_angles(R)
print(angles)
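One way to fit such angles is ordinary least squares on the off-diagonal correlations, repeated from several random starts in the spirit of `ntrials`. The sketch assumes the cosine interpretation only and uses `stats::optim`; `fit_angles_sketch` is a hypothetical name, and the package may use a different objective or optimizer.

```r
# Hypothetical least-squares sketch (not the package's algorithm):
# minimise sum_{i<j} (r_ij - cos(theta_i - theta_j))^2 over the angles,
# fixing theta_1 = 0 because only angle differences matter.
fit_angles_sketch <- function(R, ntrials = 10) {
  p <- ncol(R)
  low <- lower.tri(R)
  loss <- function(free) {
    theta <- c(0, free)
    sum((R[low] - cos(outer(theta, theta, "-"))[low])^2)
  }
  best <- NULL
  for (k in seq_len(ntrials)) {            # random restarts against local minima
    fit <- optim(runif(p - 1, 0, 2 * pi), loss, method = "BFGS")
    if (is.null(best) || fit$value < best$value) best <- fit
  }
  list(theta = c(0, best$par) %% (2 * pi), loss = best$value)
}

set.seed(1)
true_theta <- c(0, pi / 4, pi / 2)
R <- cos(outer(true_theta, true_theta, "-"))  # exactly representable by angles
out <- fit_angles_sketch(R)
out$loss                                       # close to zero
```

The fitted angles are only determined up to rotation and reflection, so they need not equal `true_theta`; what they must reproduce is the correlation matrix.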
Program FitRDeltaQSym
calculates a low rank factorization for a correlation matrix. It adjusts for column effects, and the approximation is therefore asymmetric.
FitRDeltaQSym(R, W = NULL, nd = 2, eps = 1e-10, delta = 0, q = colMeans(R), itmax.inner = 1000, itmax.outer = 1000, verbose = FALSE)
R |
A correlation matrix |
W |
A weight matrix (optional) |
nd |
The rank of the low rank approximation |
eps |
The convergence criterion |
delta |
Initial value for the scalar adjustment (zero by default) |
q |
Initial values for the column adjustments (by default, the column means of R) |
itmax.inner |
Maximum number of iterations for the inner loop of the algorithm |
itmax.outer |
Maximum number of iterations for the outer loop of the algorithm |
verbose |
Print information or not |
Program FitRDeltaQSym
implements an iterative algorithm for the low rank factorization of the correlation matrix. It decomposes the correlation matrix as R = delta J + 1 q' + G G' + E, where J is a matrix of ones, 1 is a vector of ones and E is a matrix of residuals. Because of the column adjustments q, the approximation of R is asymmetric, but the low rank factorization used for biplotting (G G') is symmetric.
A list object with fields:
delta |
The final scalar adjustment |
Rhat |
The final approximation to the correlation matrix |
C |
The matrix of biplot vectors |
rmse |
The root mean squared error |
q |
The final column adjustments |
Jan Graffelman
Graffelman, J. and De Leeuw, J. (2023) Improved approximation and visualization of the correlation matrix. The American Statistician pp. 1–20. Available online as latest article doi:10.1080/00031305.2023.2186952
data(HeartAttack)
X <- HeartAttack[,1:7]
X[,7] <- log(X[,7])
colnames(X)[7] <- "logPR"
R <- cor(X)
W <- matrix(1, 7, 7)
diag(W) <- 0
out.sym <- FitRDeltaQSym(R, W, eps=1e-6)
Rhat <- out.sym$Rhat
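To make the decomposition R = delta J + 1 q' + G G' + E concrete, the sketch below assembles an approximation from illustrative values of delta, q and G (made up here, not produced by FitRDeltaQSym). It shows why the column adjustments make the approximation asymmetric while the biplotted part G G' stays symmetric.

```r
# Illustration of the structure Rhat = delta*J + 1 q' + G G' with made-up
# values for delta, q and G (they are not output of FitRDeltaQSym).
p <- 3
delta <- 0.1
q <- c(0.05, -0.02, 0)                     # one adjustment per column
G <- matrix(c(0.8, 0.6, 0.2,
              0.1, -0.3, 0.7), ncol = 2)   # p x 2 biplot vectors
J <- matrix(1, p, p)                       # matrix of ones
Rhat <- delta * J + outer(rep(1, p), q) + G %*% t(G)
isSymmetric(G %*% t(G))  # TRUE: the part shown in the biplot
isSymmetric(Rhat)        # FALSE: column adjustments break the symmetry
Rhat[1, 2] - Rhat[2, 1]  # equals q[2] - q[1]
```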
Function FitRwithPCAandWALS
uses principal component analysis (PCA) and weighted alternating least squares (WALS) to
calculate different low-rank approximations to the correlation matrix.
FitRwithPCAandWALS(R, nd = 2, itmaxout = 10000, itmaxin = 10000, eps = 1e-08)
R |
The correlation matrix |
nd |
The dimensionality of the low-rank solution (2 by default) |
itmaxout |
Maximum number of iterations for the outer loop of the algorithm |
itmaxin |
Maximum number of iterations for the inner loop of the algorithm |
eps |
Numerical criterion for convergence of the outer loop |
Four methods are run successively: standard PCA; PCA with an additive adjustment; WALS avoiding the fit of the diagonal; and WALS avoiding the fit of the diagonal with an additive adjustment.
A list object with fields:
Rhat.pca |
Low-rank approximation obtained by PCA |
Rhat.pca.adj |
Low-rank approximation obtained by PCA with adjustment |
Rhat.wals |
Low-rank approximation obtained by WALS without fitting the diagonal |
Rhat.wals.adj |
Low-rank approximation obtained by WALS without fitting the diagonal and with adjustment |
Jan Graffelman
Graffelman, J. and De Leeuw, J. (2023) Improved approximation and visualization of the correlation matrix. The American Statistician pp. 1–20. Available online as latest article doi:10.1080/00031305.2023.2186952
data(HeartAttack)
X <- HeartAttack[,1:7]
X[,7] <- log(X[,7])
colnames(X)[7] <- "logPR"
R <- cor(X)
## Not run:
out <- FitRwithPCAandWALS(R)
## End(Not run)
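As a point of reference for the first of the four methods, a standard rank-nd PCA approximation can be obtained from the spectral decomposition of R. This sketch covers that plain PCA step only (`pca_rhat_sketch` is a hypothetical name); the adjusted and WALS variants are not reproduced here.

```r
# Sketch of a plain rank-nd PCA approximation: Rhat = G G', with
# G = V_nd * sqrt(L_nd) built from the leading eigenpairs of R.
pca_rhat_sketch <- function(R, nd = 2) {
  e <- eigen(R, symmetric = TRUE)
  G <- e$vectors[, 1:nd, drop = FALSE] %*% diag(sqrt(e$values[1:nd]), nd)
  G %*% t(G)                                # rank-nd approximation of R
}

set.seed(2)
X <- matrix(rnorm(200), ncol = 4)
R <- cor(X)
Rhat.pca <- pca_rhat_sketch(R, nd = 2)
round(R - Rhat.pca, 3)   # residuals, including those on the diagonal
```

Because PCA also fits the unit diagonal, part of its rank budget is spent there; the WALS variants avoid this by giving the diagonal zero weight.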
Correlations of 13 physiological variables (sys, dia, p.p., pul, cort, u.v., tot/100, adr/100, nor/100, adr/tot, tot/hr, adr/hr, nor/hr) obtained from 48 medical students
data(fysiologyR)
A matrix of correlations
Hills (1969), Table 1.
Hills, M. (1969) On looking at large correlation matrices. Biometrika 56(2): pp. 249.
Function ggbplot
creates a biplot of a matrix with ggplot2 graphics.
ggbplot(A, B, main = "", circle = TRUE, xlab = "", ylab = "", main.size = 8, xlim = c(-1, 1), ylim = c(-1, 1), rowcolor = "red", rowch = 1, colcolor = "blue", colch = 1, rowarrow = FALSE, colarrow = TRUE)
A |
A dataframe with coordinates and names for the biplot row markers |
B |
A dataframe with coordinates and names for the biplot column markers |
main |
A title for the biplot |
circle |
Draw a unit circle (TRUE) or not (FALSE) |
xlab |
The label for the x axis |
ylab |
The label for the y axis |
main.size |
Size of the main title |
xlim |
Limits for the horizontal axis |
ylim |
Limits for the vertical axis |
rowcolor |
Color used for the row markers |
rowch |
Symbol used for the row markers |
colcolor |
Color used for the column markers |
colch |
Symbol used for the column markers |
rowarrow |
Draw arrows from the origin to the row markers (TRUE) or not (FALSE) |
colarrow |
Draw arrows from the origin to the column markers (TRUE) or not (FALSE) |
Data frames A and B must consist of three columns labeled "PA1" and "PA2" (coordinates on the first and second principal axes) and "strings" (the labels of the markers).
Data frame B is optional. If it is not specified, a biplot with a single set of markers is constructed, for which the row settings must be specified.
A ggplot2 object
Jan Graffelman
Graffelman, J. and De Leeuw, J. (2023) On the visualisation of the correlation matrix. Available online. doi:10.48550/arXiv.2211.13150
data("HeartAttack")
X <- as.matrix(HeartAttack[,1:7])
n <- nrow(X)
Xt <- scale(X)/sqrt(n-1)
res.svd <- svd(Xt)
Fs <- sqrt(n)*res.svd$u # standardized principal components
Gp <- crossprod(t(res.svd$v),diag(res.svd$d)) # biplot coordinates for variables
rows.df <- data.frame(Fs[,1:2],as.character(1:n))
colnames(rows.df) <- c("PA1","PA2","strings")
cols.df <- data.frame(Gp[,1:2],colnames(X))
colnames(cols.df) <- c("PA1","PA2","strings")
ggbplot(rows.df,cols.df,xlab="PA1",ylab="PA2",main="PCA")
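Since ggbplot expects data frames with columns "PA1", "PA2" and "strings", a small helper can build them from any coordinate matrix. `as_bplot_df` is hypothetical and not part of Correlplot; the final ggbplot call is left commented out because it requires the package.

```r
# Hypothetical helper (not part of Correlplot): build a marker data frame
# with the columns "PA1", "PA2" and "strings" described above.
as_bplot_df <- function(coords, labels) {
  stopifnot(ncol(coords) >= 2, nrow(coords) == length(labels))
  df <- data.frame(coords[, 1], coords[, 2], as.character(labels),
                   stringsAsFactors = FALSE)
  colnames(df) <- c("PA1", "PA2", "strings")
  df
}

G <- matrix(c(0.9, 0.7, -0.2, 0.1, -0.4, 0.8), ncol = 2)  # 3 variables, 2 axes
cols.df <- as_bplot_df(G, c("V1", "V2", "V3"))
str(cols.df)
# ggbplot(cols.df, cols.df, main = "Correlation biplot")  # requires Correlplot
```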
Function ggcorrelogram
creates a correlogram of a correlation matrix using ggplot2 graphics.
ggcorrelogram(R, labs = colnames(R), ifun = "cos", cex = 1, main = "", ntrials = 50, xlim = c(-1.2, 1.2), ylim = c(-1.2, 1.2), hjust = 1, vjust = 2, size = 2, main.size = 8)
R |
a correlation matrix |
labs |
a vector of labels for the variables |
ifun |
the interpretation function ("cos" or "lincos") |
cex |
character expansion factor for the variable labels |
main |
a title for the correlogram |
ntrials |
number of starting points for the optimization routine |
xlim |
limits for the x axis (e.g. c(-1.2,1.2)) |
ylim |
limits for the y axis (e.g. c(-1.2,1.2)) |
hjust |
horizontal adjustment of variable labels (by default 1 for all variables) |
vjust |
vertical adjustment of variable labels (by default 2 for all variables) |
size |
font size for the labels of the variables |
main.size |
font size of the main title of the correlogram |
ggcorrelogram
makes a correlogram on the basis of a set of
angles. All angles are measured from the positive x-axis. Variables are
represented by unit vectors emanating from the origin.
A ggplot object. Field theta
of the output contains the angles for the variables.
Jan Graffelman
Trosset, M.W. (2005) Visualizing correlation. Journal of Computational and Graphical Statistics 14(1), pp. 1–19
set.seed(123)
X <- matrix(rnorm(90),ncol=3)
R <- cor(X)
angles <- ggcorrelogram(R)
Function ggtally
puts a series of dots along a biplot vector of a correlation matrix,
thereby marking the change in correlation along the vector at the specified values.
ggtally(G, p1, adj = 0, values = seq(-1, 1, by = 0.2), dotsize = 0.1, dotcolour = "black")
G |
A matrix (or vector) of biplot markers |
p1 |
A ggplot2 object with a biplot |
adj |
A scalar adjustment for the correlations |
values |
Values of the correlations to be marked off by dots |
dotsize |
Size of the dot |
dotcolour |
Colour of the dot |
Any set of correlation values can be marked off, though a standard scale with 0.2 increments is recommended.
A ggplot2 object with the updated biplot
Jan Graffelman
Graffelman, J. and De Leeuw, J. (2023) On the visualisation of the correlation matrix. Available online. doi:10.48550/arXiv.2211.13150
library(calibrate)
data(goblets)
R <- cor(goblets)
out.sd <- eigen(R)
V <- out.sd$vectors[,1:2]
Dl <- diag(out.sd$values[1:2])
Gp <- crossprod(t(V),sqrt(Dl))
pca.df <- data.frame(Gp)
pca.df$strings <- colnames(R)
colnames(pca.df) <- c("PA1","PA2","strings")
p1 <- ggbplot(pca.df,pca.df,main="PCA correlation biplot",xlab="",ylab="",rowarrow=TRUE,
              rowcolor="blue",rowch="",colch="")
p1 <- ggtally(Gp,p1,values=seq(-0.2,0.6,by=0.2),dotsize=0.1)
Correlations between 6 size measurements of archeological goblets
data(gobletsR)
A matrix of correlations
Manly (1989)
Manly, B.F.J. (1989) Multivariate statistical methods: a primer. Chapman and Hall, London.
Data set consisting of 101 observations of patients who suffered a heart attack.
data("HeartAttack")
A data frame with 101 observations on the following 8 variables.
Pulse
Pulse
CI
Cardiac index
SI
Systolic index
DBP
Diastolic blood pressure
PA
Pulmonary artery pressure
VP
Ventricular pressure
PR
Pulmonary resistance
Status
Deceased or survived
Table 18.1, (Saporta 1990, pp. 452–454)
Saporta, G. (1990) Probabilites analyse des donnees et statistique. Paris, Editions technip
data(HeartAttack)
str(HeartAttack)
Function ipSymLS
implements an alternating least squares algorithm that uses both decomposition and block relaxation
to find the optimal positive semidefinite approximation of given rank p to a known symmetric matrix of order n.
ipSymLS(target, w = matrix(1, dim(target)[1], dim(target)[2]), ndim = 2, init = FALSE, itmax = 100, eps = 1e-06, verbose = FALSE)
target |
Symmetric matrix to be approximated |
w |
Matrix of weights |
ndim |
Number of dimensions extracted (2 by default) |
init |
Initial value for the solution (optional; if supplied should be a matrix of dimensions |
itmax |
Maximum number of iterations |
eps |
Tolerance criterion for convergence |
verbose |
Show the iteration history (TRUE) or not (FALSE) |
A matrix with the coordinates for the variables
De Leeuw, J. (2006) A decomposition method for weighted least squares low-rank approximation of symmetric matrices. Department of Statistics, UCLA. Retrieved from https://escholarship.org/uc/item/1wh197mh
Graffelman, J. and De Leeuw, J. (2023) Improved approximation and visualization of the correlation matrix. The American Statistician pp. 1–20. Available online as latest article doi:10.1080/00031305.2023.2186952
data(banknotes)
R <- cor(banknotes)
W <- matrix(1,nrow(R),nrow(R))
diag(W) <- 0
Fp.als <- ipSymLS(R,w=W,verbose=TRUE,eps=1e-15)
Rhat.als <- Fp.als%*%t(Fp.als)
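The criterion behind this kind of fit is a weighted sum of squared differences between the target and F F'. The sketch below only evaluates that loss (the helper `wls_loss` is hypothetical and does not implement the decomposition and block relaxation algorithm) and shows how a zero-weight diagonal, as in the example above, removes the diagonal from the fit.

```r
# Evaluates the weighted least-squares criterion only (hypothetical helper;
# this is not De Leeuw's algorithm):
#   sum_ij w_ij * (t_ij - (Fm Fm')_ij)^2
wls_loss <- function(target, Fm, w = matrix(1, nrow(target), ncol(target))) {
  sum(w * (target - Fm %*% t(Fm))^2)
}

Fm <- matrix(c(0.8, 0.6), ncol = 1)          # rank-1 coordinates, 2 variables
target <- Fm %*% t(Fm) + diag(c(1, 2))       # differs from Fm Fm' on the diagonal only
W <- matrix(1, 2, 2)
diag(W) <- 0                                 # zero weight on the diagonal
wls_loss(target, Fm, W)                      # 0: the diagonal is not fitted
wls_loss(target, Fm)                         # about 5 = 1^2 + 2^2
```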
jointlim computes a sensible range for the x and y axes when two sets of points are to be plotted simultaneously.
jointlim(X, Y)
X |
Matrix of coordinates |
Y |
Matrix of coordinates |
xlim |
minimum and maximum for x-range |
ylim |
minimum and maximum for y-range |
Jan Graffelman
X <- matrix(runif(20),ncol=2)
Y <- matrix(runif(20),ncol=2)
print(jointlim(X,Y)$xlim)
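A minimal sketch, assuming the joint limits are simply the ranges of the pooled coordinates of both point sets; the package may add margins or force equal ranges, and `jointlim_sketch` is a hypothetical name.

```r
# Hypothetical sketch: pool both point sets and take the range per axis.
jointlim_sketch <- function(X, Y) {
  both <- rbind(X, Y)
  list(xlim = range(both[, 1]), ylim = range(both[, 2]))
}

X <- matrix(c(0, 1, 2, 3), ncol = 2)    # points (0,2) and (1,3)
Y <- matrix(c(-1, 4, 5, 0), ncol = 2)   # points (-1,5) and (4,0)
jointlim_sketch(X, Y)                   # xlim -1..4, ylim 0..5
```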
Keller
calculates a rank p approximation to a correlation matrix according to Keller's method.
Keller's method is based on iterated eigenvalue decompositions that are used to adjust the diagonal of the correlation matrix.
Keller(R, eps = 1e-06, nd = 2, itmax = 10)
R |
A correlation matrix |
eps |
Numerical criterion for convergence (default 1e-06) |
nd |
Number of dimensions used in the spectral decomposition (default 2) |
itmax |
The maximum number of iterations |
A matrix containing the approximation to the correlation matrix.
Jan Graffelman
Keller, J.B. (1962) Factorization of Matrices by Least-Squares. Biometrika, 49(1 and 2) pp. 239–242.
Graffelman, J. and De Leeuw, J. (2023) Improved approximation and visualization of the correlation matrix. The American Statistician pp. 1–20. Available online as latest article doi:10.1080/00031305.2023.2186952
data(Kernels)
R <- cor(Kernels)
Rhat <- Keller(R)
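The iterated diagonal adjustment can be sketched as follows, assuming that the diagonal of a working copy of R is repeatedly replaced by the diagonal of the current rank-nd fit until it stabilises. `keller_sketch` is hypothetical and may differ from Keller() in its starting values, stopping rule and defaults (its itmax is set higher than Keller's default of 10).

```r
# Hypothetical sketch of iterated diagonal adjustment (not the package code).
keller_sketch <- function(R, nd = 2, eps = 1e-6, itmax = 1000) {
  Rw <- R                                   # working copy; its diagonal is replaced
  for (it in seq_len(itmax)) {
    e <- eigen(Rw, symmetric = TRUE)
    V <- e$vectors[, 1:nd, drop = FALSE]
    Rhat <- V %*% diag(pmax(e$values[1:nd], 0), nd) %*% t(V)
    change <- max(abs(diag(Rhat) - diag(Rw)))
    diag(Rw) <- diag(Rhat)                  # adjust the diagonal
    if (change < eps) break
  }
  Rhat
}

lam <- c(0.9, 0.8, 0.7, 0.6)
R <- lam %o% lam
diag(R) <- 1                                # rank-one structure off the diagonal
Rhat <- keller_sketch(R, nd = 1, eps = 1e-12)
off <- row(R) != col(R)
max(abs(Rhat[off] - R[off]))                # off-diagonal misfit, near zero
```

Because the diagonal is no longer forced to 1, a single dimension can reproduce these off-diagonal correlations, which plain PCA of R could not do exactly.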
Wheat kernel data set taken from the UCI Machine Learning Repository
data("Kernels")
A data frame with 210 observations on the following 8 variables.
area
Area of the kernel
perimeter
Perimeter of the kernel
compactness
Compactness (C = 4*pi*A/P^2)
length
Length of the kernel
width
Width of the kernel
asymmetry
Asymmetry coefficient
groove
Length of the groove of the kernel
variety
Variety (1=Kama, 2=Rosa, 3=Canadian)
https://archive.ics.uci.edu/ml/datasets/seeds
M. Charytanowicz, J. Niewczas, P. Kulczycki, P.A. Kowalski, S. Lukasik, S. Zak, A Complete Gradient Clustering Algorithm for Features Analysis of X-ray Images. in: Information Technologies in Biomedicine, Ewa Pietka, Jacek Kawa (eds.), Springer-Verlag, Berlin-Heidelberg, 2010, pp. 15-24.
data(Kernels)
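The compactness formula listed above is easy to check: it equals 1 for a circle and is smaller for less compact shapes. The values below are illustrative geometry, not kernel measurements, and `compactness` is a hypothetical helper.

```r
# Compactness C = 4*pi*A / P^2 for area A and perimeter P.
compactness <- function(area, perimeter) 4 * pi * area / perimeter^2

r <- 2
compactness(pi * r^2, 2 * pi * r)   # circle of radius 2: 1
compactness(2 * 1, 2 * (2 + 1))     # 2 x 1 rectangle: about 0.698
```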
linangplot
produces a plot of two variables, such that the correlation between the two variables is linear in the angle.
linangplot(x, y, tmx = NULL, tmy = NULL, ...)
x |
x variable |
y |
y variable |
tmx |
vector of tickmarks for the x variable |
tmy |
vector of tickmarks for the y variable |
... |
additional arguments for the plot routine |
Xt |
coordinates of the points |
B |
axes for the plot |
r |
correlation coefficient |
angledegrees |
angle between axes in degrees |
angleradians |
angle between axes in radians |
Jan Graffelman
x <- runif(10)
y <- rnorm(10)
linangplot(x,y)
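The angle between the axes can be related to r through the lincos function documented in this manual: inverting r = 1 - 2 theta/pi on [0, pi] gives theta = pi (1 - r)/2, whereas an ordinary cosine representation would use acos(r). The sketch assumes this inversion; `linear_angle` is a hypothetical helper, not part of linangplot's output.

```r
# Hypothetical helper: angle (radians) whose lincos value equals r.
linear_angle <- function(r) pi * (1 - r) / 2

r <- cor(c(1, 2, 3, 4), c(2, 1, 4, 3))        # r = 0.6
res <- c(r = r,
         angleradians = linear_angle(r),
         angledegrees = linear_angle(r) * 180 / pi,   # 36 degrees
         cosine_degrees = acos(r) * 180 / pi)         # about 53.13 degrees
res
```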
Function lincos
linearizes the cosine function over the interval
[0,2pi]. The function returns -2/pi*x + 1 over [0,pi] and 2/pi*x - 3
over [pi,2pi].
lincos(x)
x |
angle in radians |
a real number in [-1,1].
Jan Graffelman
Graffelman, J. (2012) Linear-angle correlation plots: new graphs for revealing correlation structure. Journal of Computational and Graphical Statistics. 22(1): 92-106.
angle <- pi
y <- lincos(angle)
print(y)
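A direct transcription of the piecewise formula into base R (the name `lincos_sketch` is hypothetical). Unlike the cosine, it decreases at a constant rate of 2/pi per radian on [0, pi].

```r
# Piecewise-linear version of the cosine on [0, 2*pi].
lincos_sketch <- function(x) {
  stopifnot(all(x >= 0 & x <= 2 * pi))
  ifelse(x <= pi, -2 / pi * x + 1, 2 / pi * x - 3)
}

lincos_sketch(c(0, pi / 2, pi, 3 * pi / 2, 2 * pi))  # approximately 1, 0, -1, 0, 1
lincos_sketch(pi / 4)   # 0.5, whereas cos(pi/4) is about 0.707
```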
pco
is a program for Principal Coordinate Analysis.
pco(Dis)
Dis |
A distance or dissimilarity matrix |
The program pco
does a principal coordinates analysis of a
dissimilarity (or distance) matrix (Dij) where the diagonal elements,
Dii, are zero.
Note that when a similarity matrix rather than a distance matrix is available, a transformation is needed before calling pco. For instance, if Sij is a similarity matrix, Dij might be obtained as Dij = 1 - Sij/diag(Sij).
Goodness-of-fit calculations still need to be revised to handle negative eigenvalues (which can be done in different ways).
PC |
the principal coordinates |
Dl |
all eigenvalues of the solution |
Dk |
the positive eigenvalues of the solution |
B |
double centred matrix for the eigenvalue decomposition |
decom |
the goodness of fit table |
Jan Graffelman
library(calibrate) # provides textxy()
citynames <- c("Aberystwyth","Brighton","Carlisle","Dover","Exeter","Glasgow","Hull",
               "Inverness","Leeds","London","Newcastle","Norwich")
A <- matrix(c(
  0,244,218,284,197,312,215,469,166,212,253,270,
  244,0,350,77,167,444,221,583,242,53,325,168,
  218,350,0,369,347,94,150,251,116,298,57,284,
  284,77,369,0,242,463,236,598,257,72,340,164,
  197,167,347,242,0,441,279,598,269,170,359,277,
  312,444,94,463,441,0,245,169,210,392,143,378,
  215,221,150,236,279,245,0,380,55,168,117,143,
  469,583,251,598,598,169,380,0,349,531,264,514,
  166,242,116,257,269,210,55,349,0,190,91,173,
  212,53,298,72,170,392,168,531,190,0,273,111,
  253,325,57,340,359,143,117,264,91,273,0,256,
  270,168,284,164,277,378,143,514,173,111,256,0),ncol=12)
rownames(A) <- citynames
colnames(A) <- citynames
out <- pco(A)
plot(out$PC[,2],-out$PC[,1],pch=19,asp=1)
textxy(out$PC[,2],-out$PC[,1],rownames(A))
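For readers who want to see the computation, classical (Torgerson-Gower) principal coordinate analysis double-centres -D^2/2 and scales the eigenvectors of its positive eigenvalues by their square roots. The sketch assumes this textbook form, which may differ in details from pco(); `pco_sketch` is hypothetical, and `stats::cmdscale` implements the same classical method.

```r
# Hypothetical sketch of classical principal coordinate analysis.
pco_sketch <- function(D) {
  D <- as.matrix(D)
  n <- nrow(D)
  H <- diag(n) - matrix(1 / n, n, n)         # centring matrix
  B <- -0.5 * H %*% (D^2) %*% H              # double-centred matrix
  e <- eigen(B, symmetric = TRUE)
  pos <- e$values > 1e-10 * max(abs(e$values))
  PC <- e$vectors[, pos, drop = FALSE] %*% diag(sqrt(e$values[pos]), sum(pos))
  list(PC = PC, Dl = e$values, Dk = e$values[pos], B = B)
}

set.seed(3)
Z <- matrix(rnorm(20), ncol = 2)             # 10 points in the plane
out <- pco_sketch(dist(Z))
ncol(out$PC)                                 # 2 positive eigenvalues
max(abs(dist(out$PC) - dist(Z)))             # Euclidean input: distances reproduced
```

For non-Euclidean dissimilarities some eigenvalues of B are negative, which is why goodness-of-fit summaries need care, as noted above.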
Heights (in cm) of 1375 mother-daughter pairs in the UK, 1893-1898.
data(PearsonLee)
A data frame with variables Mheight (mother's height) and Dheight (daughter's height)
Weisberg, Chapter 1
Weisberg, S. (2005) Applied Linear Regression, John Wiley & Sons, New Jersey
Program pfa
performs (iterative) principal factor analysis, which
is based on the computation of eigenvalues of the reduced correlation matrix.
pfa(X, option = "data", m = 2, initial.communality = "R2", crit = 0.001, verbose = FALSE)
X |
A data matrix or correlation matrix |
option |
Specifies the type of matrix supplied by argument X: "data" (the default) for a data matrix or "cor" for a correlation matrix |
m |
The number of factors to extract (2 by default) |
initial.communality |
Method for computing initial communalities; the default is "R2" |
crit |
The criterion for convergence (default 0.001) |
verbose |
When set to |
Res |
Matrix of residuals |
Psi |
Diagonal matrix with specific variances |
La |
Matrix of loadings |
Shat |
Estimated correlation matrix |
Fs |
Factor scores |
Jan Graffelman
Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979) Multivariate Analysis, Academic Press London.
Rencher, A.C. (1995) Methods of Multivariate Analysis.
Satorra, A. and Neudecker, H. (1998) Least-Squares Approximation of off-Diagonal Elements of a Variance Matrix in the Context of Factor Analysis. Econometric Theory 14(1) pp. 156–157.
X <- matrix(rnorm(100),ncol=2)
out.pfa <- pfa(X)
# based on a correlation matrix
R <- cor(X)
out.pfa <- pfa(R,option="cor")
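A compact sketch of iterative principal factor analysis on a correlation matrix, assuming that the "R2" start means squared multiple correlations, 1 - 1/diag(R^-1). `pfa_sketch` is hypothetical and omits factor scores and the data-matrix option.

```r
# Hypothetical sketch of iterative principal factor analysis (not pfa() itself).
pfa_sketch <- function(R, m = 2, crit = 1e-3, itmax = 500) {
  h2 <- 1 - 1 / diag(solve(R))               # initial communalities (SMC)
  for (it in seq_len(itmax)) {
    Rred <- R
    diag(Rred) <- h2                         # reduced correlation matrix
    e <- eigen(Rred, symmetric = TRUE)
    La <- e$vectors[, 1:m, drop = FALSE] %*%
      diag(sqrt(pmax(e$values[1:m], 0)), m)  # loadings
    h2.new <- rowSums(La^2)                  # updated communalities
    done <- max(abs(h2.new - h2)) < crit
    h2 <- h2.new
    if (done) break
  }
  list(La = La, Psi = diag(1 - h2), Shat = La %*% t(La) + diag(1 - h2))
}

lam <- c(0.9, 0.8, 0.7, 0.6)
R <- lam %o% lam
diag(R) <- 1                                 # exact one-factor structure
out <- pfa_sketch(R, m = 1, crit = 1e-10, itmax = 5000)
round(abs(out$La[, 1]), 4)                   # close to lam (sign is arbitrary)
```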
Correlations between sources of protein for a number of countries (Red meat, White meat, Eggs, Milk, Fish, Cereals, Starchy food, Nuts, Fruits and vegetables).
data(proteinR)
A matrix of correlations
Manly (1989)
Manly, B.F.J. (1989) Multivariate statistical methods: a primer. Chapman and Hall, London.
Correlations between national track records for men (100m, 200m, 400m, 800m, 1500m, 5000m, 10,000m and Marathon)
data(recordsR)
A matrix of correlations
Johnson and Wichern, Table 8.6
Johnson, R.A. and Wichern, D.W. (2002) Applied Multivariate Statistical Analysis. Fifth edition. New Jersey: Prentice Hall.
Program rmse
calculates the RMSE for a matrix approximation.
rmse(R, Rhat, W = matrix(1, nrow(R), ncol(R)) - diag(nrow(R)), verbose = FALSE, per.variable = FALSE)
R |
The original matrix |
Rhat |
The approximating matrix |
W |
A symmetric matrix of weights |
verbose |
Print output (TRUE) or not (FALSE) |
per.variable |
Calculate the RMSE for the whole matrix (FALSE) or for each variable separately (TRUE) |
By default, function rmse
assumes a symmetric correlation matrix as input, together with its approximation. The approximation does not need to be symmetric.
Weight matrix W has to be symmetric. By default, the diagonal is excluded from RMSE calculations (W = J - I). To include it, specify W = J, that is, set W = matrix(1, nrow(R), ncol(R)).
the calculated rmse
Jan Graffelman
Graffelman, J. and De Leeuw, J. (2023) Improved approximation and visualization of the correlation matrix. The American Statistician pp. 1–20. doi:10.1080/00031305.2023.2186952
data(banknotes)
X <- as.matrix(banknotes[,1:6])
p <- ncol(X)
J <- matrix(1,p,p)
R <- cor(X)
out.sd <- eigen(R)
V <- out.sd$vectors
Dl <- diag(out.sd$values)
V2 <- V[,1:2]
D2 <- Dl[1:2,1:2]
Rhat <- V2%*%D2%*%t(V2)
rmse(R,Rhat,W=J)
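A minimal sketch of a weighted RMSE, assuming the normalisation sqrt(sum w_ij (r_ij - rhat_ij)^2 / sum w_ij); the package's exact definition may differ. `rmse_sketch` is hypothetical, and its per-variable branch (column sums) is likewise an assumption.

```r
# Hypothetical weighted RMSE; W = J - I (the default) ignores the diagonal.
rmse_sketch <- function(R, Rhat, W = matrix(1, nrow(R), ncol(R)) - diag(nrow(R)),
                        per.variable = FALSE) {
  E2 <- W * (R - Rhat)^2                     # weighted squared errors
  if (per.variable) sqrt(colSums(E2) / colSums(W)) else sqrt(sum(E2) / sum(W))
}

R <- matrix(c(1, 0.5, 0.5, 1), 2)
Rhat <- matrix(c(0.8, 0.4, 0.6, 0.8), 2)     # asymmetric approximation
rmse_sketch(R, Rhat)                         # 0.1: only off-diagonal errors count
rmse_sketch(R, Rhat, W = matrix(1, 2, 2))    # about 0.158: diagonal errors of 0.2 count too
rmse_sketch(R, Rhat, per.variable = TRUE)    # 0.1 for each variable
```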
Function rmsePCAandWALS
creates a table with the RMSE for each variable, for a low-rank
approximation to the correlation matrix obtained by PCA or WALS.
rmsePCAandWALS(R, output, digits = 4, omit.diagonals = c(FALSE,FALSE,TRUE,TRUE))
R |
The correlation matrix |
output |
A list object with four approximations to the correlation matrix |
digits |
The number of digits used in the output |
omit.diagonals |
Vector of four logicals for omitting the diagonal of the correlation matrix for RMSE calculations. Defaults to c(FALSE,FALSE,TRUE,TRUE), to include the diagonal for PCA and exclude it for WALS |
A matrix with one row per variable and four columns for RMSE statistics.
Jan Graffelman
Graffelman, J. and De Leeuw, J. (2023) Improved approximation and visualization of the correlation matrix. The American Statistician pp. 1–20. doi:10.1080/00031305.2023.2186952
data(HeartAttack)
X <- HeartAttack[,1:7]
X[,7] <- log(X[,7])
colnames(X)[7] <- "logPR"
R <- cor(X)
## Not run:
out <- FitRwithPCAandWALS(R)
Results <- rmsePCAandWALS(R,out)
## End(Not run)
Danish data from 1953-1977 giving the correlations between nesting storks, human birth rate and per capita electricity consumption.
data(storksR)
A matrix of correlations
Gabriel and Odoroff, Table 1.
Gabriel, K. R. and Odoroff, C. L. (1990) Biplots in biomedical research. Statistics in Medicine 9(5): pp. 469-485.
Matrix of marks for five exams, two with closed books and three with open books (Mechanics (C), Vectors (C), Algebra (O), Analysis (O) and Statistics (O)).
data(students)
A data matrix
Mardia et al., Table 1.2.1
Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979) Multivariate Analysis, Academic Press London.
Correlation matrix of marks for five exams, two with closed books and three with open books (Mechanics (C), Vectors (C), Algebra (O), Analysis (O) and Statistics (O)).
data(studentsR)
A matrix of correlations
Mardia et al., Table 1.2.1
Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979) Multivariate Analysis, Academic Press London.
Function tally
marks off a set of dots on a biplot vector. It is intended for biplot vectors that represent correlations,
so that their correlation scale becomes visible without a full calibration with tick marks and tick mark labels.
tally(G, adj = 0, values = seq(-1, 1, by = 0.2), pch = 19, dotcolor = "black", cex = 0.5, color.negative = "red", color.positive = "blue")
G |
Matrix with biplot coordinates of the variables |
adj |
A scalar adjustment for the correlations |
values |
The values of the correlations to be marked off by dots |
pch |
The character code used for marking off correlations |
dotcolor |
The colour of the dots that are marked off |
cex |
The character expansion factor for a dot. |
color.negative |
The colour of the segments of the negative part of the correlation scale |
color.positive |
The colour of the segments of the positive part of the correlation scale |
NULL
Jan Graffelman
Graffelman, J. and De Leeuw, J. (2023) Improved approximation and visualization of the correlation matrix. The American Statistician pp. 1–20. doi:10.1080/00031305.2023.2186952
library(calibrate) # provides the goblets data and bplot()
data(goblets)
R <- cor(goblets)
results <- eigen(R)
V <- results$vectors
Dl <- diag(results$values)
# Calculate correlation biplot coordinates
G <- crossprod(t(V[,1:2]),sqrt(Dl[1:2,1:2]))
# Make the biplot
bplot(G,G,rowch=NA,colch=NA,collab=colnames(R),
      xl=c(-1.1,1.1),yl=c(-1.1,1.1))
# Create a correlation tally stick for variable X1
tally(G[1,])
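Where do the dots go? For a biplot vector g, the point x = (v - adj) g / (g'g) lies on the line through g and satisfies adj + g'x = v, so another variable whose vector projects onto that point has approximate correlation v with g. This sketch assumes that treatment of `adj`; `tally_points` is a hypothetical helper, not the package's code.

```r
# Hypothetical helper: coordinates of the dots for correlation values
# `values` along biplot vector g (assumed model: r = adj + g'x).
tally_points <- function(g, values = seq(-1, 1, by = 0.2), adj = 0) {
  s <- (values - adj) / sum(g^2)             # position along g, in units of g
  cbind(x = s * g[1], y = s * g[2])
}

g <- c(0.6, 0.8)                             # unit-length biplot vector
P <- tally_points(g, values = c(-1, 0, 0.5, 1))
P
drop(P %*% g)                                # recovers the values -1, 0, 0.5, 1
# points(P, pch = 19, cex = 0.5)             # would add the dots to an open biplot
```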
tr
computes the trace of a matrix.
tr(X)
X |
a (square) matrix |
the trace (a scalar)
Jan Graffelman
X <- matrix(runif(25),ncol=5)
print(X)
print(tr(X))
Function wAddPCA
calculates a weighted least squares approximation of low rank to a given matrix.
wAddPCA(x, w = matrix(1, nrow(x), ncol(x)), p = 2, add = "all", bnd = "opt", itmaxout = 1000, itmaxin = 1000, epsout = 1e-06, epsin = 1e-06, verboseout = TRUE, verbosein = FALSE)
x |
The data matrix to be approximated |
w |
The weight matrix |
p |
The dimensionality of the low-rank solution (2 by default) |
add |
The additive adjustment to be employed. Can be "all" (default), "nul" (no adjustment), "one" (adjustment by a single scalar), "row" (adjustment by a row) or "col" (adjustment by a column). |
bnd |
Can be "opt" (default), "all", "row" or "col". |
itmaxout |
Maximum number of iterations for the outer loop of the algorithm |
itmaxin |
Maximum number of iterations for the inner loop of the algorithm |
epsout |
Numerical criterion for convergence of the outer loop |
epsin |
Numerical criterion for convergence of the inner loop |
verboseout |
Be verbose on the outer loop iterations |
verbosein |
Be verbose on the inner loop iterations |
A list object with fields:
a |
The left matrix (A) of the factorization X = AB' |
b |
The right matrix (B) of the factorization X = AB' |
z |
The product AB' |
f |
The final value of the loss function |
u |
Vector for rows used to construct rank 1 weights |
v |
Vector for columns used to construct rank 1 weights |
p |
The vector with row adjustments |
q |
The vector with column adjustments |
itel |
Iterations needed for convergence |
delta |
The additive adjustment |
y |
The low-rank approximation to |
Graffelman, J. and De Leeuw, J. (2023) Improved approximation and visualization of the correlation matrix. The American Statistician pp. 1–20. Available online as latest article doi:10.1080/00031305.2023.2186952
https://jansweb.netlify
data(HeartAttack)
X <- HeartAttack[,1:7]
X[,7] <- log(X[,7])
colnames(X)[7] <- "logPR"
R <- cor(X)
W <- matrix(1, 7, 7)
diag(W) <- 0
Wals.out <- wAddPCA(R, W, add = "nul", verboseout = FALSE)
Rhat <- Wals.out$y