Title: | Multivariate Tools for Compositional Data Analysis |
---|---|
Description: | Provides functions for multivariate analysis with compositional data. Includes a function for compositional canonical correlation analysis. This analysis requires two data matrices of compositions, which can be suitably transformed and used as input to a specialized program for canonical correlation analysis that can deal with singular covariance matrices. The methodology is described in Graffelman et al. (2017) <doi:10.1101/144584>. Functions for log-ratio principal component analysis with condition number computations and log-ratio discriminant analysis have been added to the package. |
Authors: | Jan Graffelman [aut, cre] |
Maintainer: | Jan Graffelman <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1.0 |
Built: | 2025-01-09 13:00:33 UTC |
Source: | https://github.com/cran/ToolsForCoDa |
The list object Artificial contains two data frames of 3-part compositions. The data refer to the example in Section 3.1 of Graffelman et al. (2017).
data(Artificial)
A list containing two data frames, each with 100 observations.
Laird, N. M. and Lange, C. Table 7.11, p. 124
Graffelman, J., Pawlowsky-Glahn, V., Egozcue, J.J. and Buccianti, A. (2017) Compositional Canonical Correlation Analysis.
The data consist of 14 geological samples from the US, with their major oxide composition (SiO2, Al2O3, Fe2O3, MnO, MgO, CaO, K2O, Na2O and H2O+) and their delta Deuterium and delta 18-Oxygen values (dD, d18O).
data("bentonites")
A data frame with 14 observations on the following 11 variables.
Si
a numeric vector
Al
a numeric vector
Fe
a numeric vector
Mn
a numeric vector
Mg
a numeric vector
Ca
a numeric vector
K
a numeric vector
Na
a numeric vector
H20
a numeric vector
dD
a numeric vector
d18O
a numeric vector
Cadrin, A.A.J. et al. (1995), Tables 1 and 2; Reyment, R. A. and Savazzi, E. (1999), pp. 220-222.
Cadrin, A.A.J., Kyser, T.K., Caldwell, W.G.E. and Longstaffe, F.J. (1995) Isotopic and chemical compositions of bentonites as paleoenvironmental indicators of the Cretaceous Western Interior Seaway. Palaeogeography, Palaeoclimatology, Palaeoecology 119, pp. 301–320.
Reyment, R. A. and Savazzi, E. (1999) Aspects of Multivariate Statistical Analysis in Geology, Elsevier Science B.V., Amsterdam.
data(bentonites)
Function canocov performs a canonical correlation analysis. It operates on raw data matrices, which are only centred within the function. It uses generalized inverses and can therefore deal with structurally singular covariance matrices.
canocov(X, Y)
X |
The n times p X matrix of observations |
Y |
The n times q Y matrix of observations |
canocov computes the solution by a singular value decomposition of the transformed between-set covariance matrix.
Returns a list with the following results:
ccor |
the canonical correlations |
A |
canonical weights of the X variables |
B |
canonical weights of the Y variables |
U |
canonical X variates |
V |
canonical Y variates |
Fs |
biplot markers for X variables (standard coordinates) |
Gs |
biplot markers for Y variables (standard coordinates) |
Fp |
biplot markers for X variables (principal coordinates) |
Gp |
biplot markers for Y variables (principal coordinates) |
Rxu |
canonical loadings, (correlations X variables, canonical X variates) |
Rxv |
canonical loadings, (correlations X variables, canonical Y variates) |
Ryu |
canonical loadings, (correlations Y variables, canonical X variates) |
Ryv |
canonical loadings, (correlations Y variables, canonical Y variates) |
Sxu |
covariance X variables, canonical X variates |
Sxv |
covariance X variables, canonical Y variates |
Syu |
covariance Y variables, canonical X variates |
Syv |
covariance Y variables, canonical Y variates |
fitRxy |
goodness of fit of the between-set correlation matrix |
fitXs |
adequacy coefficients of X variables |
fitXp |
redundancy coefficients of X variables |
fitYs |
adequacy coefficients of Y variables |
fitYp |
redundancy coefficients of Y variables |
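The computation described above can be illustrated with a short base-R sketch of canonical correlation analysis with generalized inverses. It is not the package's code: the helper names `pinv_sqrt()` and `cca_ginv()` are hypothetical, covariances use the divisor n - 1, and eigenvalues below a relative tolerance are treated as zero (an assumption about how structural singularity is handled). The canonical correlations are the singular values of Sxx^(-1/2) Sxy Syy^(-1/2), with the inverse square roots taken over the non-zero eigenvalues only.

```r
# Hypothetical sketch (not the package implementation): CCA via generalized inverses.

# Inverse square root of a symmetric PSD matrix, ignoring (near-)zero eigenvalues,
# so that structurally singular covariance matrices can be handled.
pinv_sqrt <- function(S, tol = 1e-10) {
  e <- eigen(S, symmetric = TRUE)
  keep <- e$values > tol * max(e$values)
  V <- e$vectors[, keep, drop = FALSE]
  V %*% diag(1 / sqrt(e$values[keep]), nrow = sum(keep)) %*% t(V)
}

cca_ginv <- function(X, Y) {
  Xc <- scale(as.matrix(X), center = TRUE, scale = FALSE)  # centring only
  Yc <- scale(as.matrix(Y), center = TRUE, scale = FALSE)
  n <- nrow(Xc)
  Sxx <- crossprod(Xc) / (n - 1)
  Syy <- crossprod(Yc) / (n - 1)
  Sxy <- crossprod(Xc, Yc) / (n - 1)
  Rx <- pinv_sqrt(Sxx)
  Ry <- pinv_sqrt(Syy)
  s <- svd(Rx %*% Sxy %*% Ry)          # SVD of the transformed between-set covariance matrix
  k <- min(qr(Xc)$rank, qr(Yc)$rank)   # number of non-trivial canonical pairs
  A <- Rx %*% s$u[, seq_len(k), drop = FALSE]  # canonical weights, X variables
  B <- Ry %*% s$v[, seq_len(k), drop = FALSE]  # canonical weights, Y variables
  list(ccor = s$d[seq_len(k)], A = A, B = B, U = Xc %*% A, V = Yc %*% B)
}

set.seed(123)
X <- matrix(runif(75), ncol = 3)
Y <- matrix(runif(75), ncol = 3)
res <- cca_ginv(X, Y)
res$ccor
```

For full-rank data the result should agree with `stats::cancor`; appending a redundant column (for example `X[, 1] + X[, 2]`) makes Sxx singular but should leave the canonical correlations unchanged.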
Jan Graffelman [email protected]
Hotelling, H. (1935) The most predictable criterion. Journal of Educational Psychology (26) pp. 139-142.
Hotelling, H. (1936) Relations between two sets of variates. Biometrika (28) pp. 321-377.
Johnson, R. A. and Wichern, D. W. (2002) Applied Multivariate Statistical Analysis. New Jersey: Prentice Hall.
set.seed(123)
X <- matrix(runif(75),ncol=3)
Y <- matrix(runif(75),ncol=3)
cca.results <- canocov(X,Y)
Function cen centres the columns of a matrix to mean zero.
cen(X,w=rep(1,nrow(X))/nrow(X))
X |
a raw data matrix. |
w |
a vector of case weights. |
Returns the centred matrix.
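A minimal base-R sketch of weighted column centring, assuming that cen subtracts the weighted column means t(X) w from every row (for the default weights 1/n, which sum to one, these are the ordinary column means). The name `cen_sketch` is hypothetical and the weighting scheme is an assumption, not taken from the package source.

```r
# Hypothetical re-implementation for illustration; assumes sum(w) == 1.
cen_sketch <- function(X, w = rep(1, nrow(X)) / nrow(X)) {
  X <- as.matrix(X)
  m <- as.vector(crossprod(X, w))                 # weighted column means, t(X) %*% w
  X - matrix(m, nrow(X), ncol(X), byrow = TRUE)   # subtract the means from every row
}

set.seed(1)
X <- matrix(runif(10), ncol = 2)
Y <- cen_sketch(X)
colMeans(Y)   # numerically zero for the default weights
```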
Jan Graffelman ([email protected])
X<-matrix(runif(10),ncol=2)
Y<-cen(X)
print(Y)
Function clrmat calculates the centred log-ratio transformation for a matrix of compositions.
clrmat(X)
X |
A matrix of compositions |
A matrix containing the transformed data
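The centred log-ratio transformation divides each part by the geometric mean of its row and takes logarithms, clr(x)_j = log(x_j / g(x)). A base-R sketch of the same computation (the name `clr_sketch` is hypothetical, and it assumes strictly positive entries):

```r
clr_sketch <- function(X) {
  L <- log(as.matrix(X))   # requires all parts > 0
  L - rowMeans(L)          # subtract the log geometric mean of each row
}

Xcom <- rbind(c(0.2, 0.3, 0.5), c(0.1, 0.1, 0.8))
Z <- clr_sketch(Xcom)
rowSums(Z)   # zero up to rounding: the clr covariance matrix is therefore singular
```

Because each row is divided by its own geometric mean, the result does not change if a row is multiplied by a positive constant, so closure before the transformation is optional.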
Jan Graffelman [email protected]
data(Artificial)
Xsim.com <- Artificial$Xsim.com
Xclr <- clrmat(Xsim.com)
Function largest.kappas calculates the condition numbers for all subcompositions of a given size for a particular compositional data set.
largest.kappas(Xcom, nparts = 3, sizetoplist = 10)
Xcom |
A data matrix with compositions in rows |
nparts |
The number of parts for the subcompositions to be analysed. |
sizetoplist |
The length of the list of the "best" subcompositions |
Log-ratio PCA is executed for each subcomposition, and the resulting eigenvalues and eigenvectors are stored.
A data frame with an ordered list of subcompositions
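The search over subcompositions can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's algorithm: the condition number of a subcomposition is taken here to be the ratio of the largest to the smallest non-zero singular value of its centred clr matrix, and subcompositions are ranked from largest to smallest condition number. The function name `kappa_subcomp` is hypothetical.

```r
# Hypothetical sketch; the condition-number definition and the ranking are assumptions.
kappa_subcomp <- function(Xcom, nparts = 3, sizetoplist = 10) {
  Xcom <- as.matrix(Xcom)
  labs <- if (is.null(colnames(Xcom))) seq_len(ncol(Xcom)) else colnames(Xcom)
  combs <- combn(ncol(Xcom), nparts)             # one column per subcomposition
  kap <- apply(combs, 2, function(idx) {
    S <- Xcom[, idx, drop = FALSE]
    S <- S / rowSums(S)                          # re-close the subcomposition
    L <- log(S)
    Z <- scale(L - rowMeans(L), scale = FALSE)   # centred clr coordinates
    d <- svd(Z)$d[seq_len(nparts - 1)]           # skip the structural zero singular value
    max(d) / min(d)
  })
  out <- data.frame(parts = apply(combs, 2, function(idx) paste(labs[idx], collapse = "-")),
                    kappa = kap)
  out <- out[order(out$kappa, decreasing = TRUE), ]
  head(out, sizetoplist)
}

set.seed(123)
X <- matrix(runif(600), ncol = 6)
Xcom <- X / rowSums(X)
kappa_subcomp(Xcom, nparts = 3, sizetoplist = 5)
```

With 6 parts and nparts = 3 there are choose(6, 3) = 20 candidate subcompositions, so the exhaustive loop is cheap; the number of combinations grows quickly with the number of parts.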
Jan Graffelman ([email protected])
X <- matrix(runif(600),ncol=6)
Xcom <- X/rowSums(X)
Results <- largest.kappas(Xcom)
Function lrcco is a wrapper function around canocov. It performs logratio canonical correlation analysis (LR-CCO), accepting two compositional data matrices as input.
lrcco(X, Y)
X |
The matrix of X compositions |
Y |
The matrix of Y compositions |
Matrices X and Y are assumed to contain positive elements only, with rows that sum to one.
Returns a list with the following results
ccor |
the canonical correlations |
A |
canonical weights of the X variables |
B |
canonical weights of the Y variables |
U |
canonical X variates |
V |
canonical Y variates |
Fs |
biplot markers for X variables (standard coordinates) |
Gs |
biplot markers for Y variables (standard coordinates) |
Fp |
biplot markers for X variables (principal coordinates) |
Gp |
biplot markers for Y variables (principal coordinates) |
Rxu |
canonical loadings, (correlations X variables, canonical X variates) |
Rxv |
canonical loadings, (correlations X variables, canonical Y variates) |
Ryu |
canonical loadings, (correlations Y variables, canonical X variates) |
Ryv |
canonical loadings, (correlations Y variables, canonical Y variates) |
Sxu |
covariance X variables, canonical X variates |
Sxv |
covariance X variables, canonical Y variates |
Syu |
covariance Y variables, canonical X variates |
Syv |
covariance Y variables, canonical Y variates |
fitRxy |
goodness of fit of the between-set correlation matrix |
fitXs |
adequacy coefficients of X variables |
fitXp |
redundancy coefficients of X variables |
fitYs |
adequacy coefficients of Y variables |
fitYp |
redundancy coefficients of Y variables |
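Canonical correlations are unchanged when either set is replaced by another set of variables spanning the same linear space. Clr and additive logratio (alr) coordinates of a composition are linear transformations of each other, so a clr analysis with generalized inverses (assuming this is what lrcco passes to canocov) should give the same canonical correlations as ordinary CCA on full-rank alr coordinates. The base-R sketch below checks this on simulated compositions; all helper names are hypothetical, and the equivalence is a mathematical argument that has not been checked against lrcco itself.

```r
set.seed(123)
Xc <- matrix(runif(75), ncol = 3); Xc <- Xc / rowSums(Xc)   # closure
Yc <- matrix(runif(75), ncol = 3); Yc <- Yc / rowSums(Yc)

clr <- function(C) { L <- log(C); L - rowMeans(L) }
alr <- function(C) log(C[, -ncol(C), drop = FALSE] / C[, ncol(C)])

# Inverse square root restricted to non-zero eigenvalues (clr covariances are singular)
pinv_sqrt <- function(S, tol = 1e-10) {
  e <- eigen(S, symmetric = TRUE)
  keep <- e$values > tol * max(e$values)
  V <- e$vectors[, keep, drop = FALSE]
  V %*% diag(1 / sqrt(e$values[keep]), nrow = sum(keep)) %*% t(V)
}

# (a) centred clr coordinates + generalized inverses (the common divisor cancels)
Zx <- scale(clr(Xc), scale = FALSE)
Zy <- scale(clr(Yc), scale = FALSE)
K <- pinv_sqrt(crossprod(Zx)) %*% crossprod(Zx, Zy) %*% pinv_sqrt(crossprod(Zy))
cc_clr <- svd(K)$d[1:2]   # D - 1 = 2 non-trivial canonical correlations

# (b) full-rank alr coordinates + stats::cancor
cc_alr <- cancor(alr(Xc), alr(Yc))$cor

all.equal(cc_clr, cc_alr)
```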
Jan Graffelman [email protected]
Hotelling, H. (1935) The most predictable criterion. Journal of Educational Psychology (26) pp. 139-142.
Hotelling, H. (1936) Relations between two sets of variates. Biometrika (28) pp. 321-377.
Johnson, R. A. and Wichern, D. W. (2002) Applied Multivariate Statistical Analysis. New Jersey: Prentice Hall.
Graffelman, J., Pawlowsky-Glahn, V., Egozcue, J.J. and Buccianti, A. (2018) Exploration of geochemical data with compositional canonical biplots. Journal of Geochemical Exploration 194, pp. 120–133. doi:10.1016/j.gexplo.2018.07.014
set.seed(123)
X <- matrix(runif(75),ncol=3)
Y <- matrix(runif(75),ncol=3)
Xc <- X/rowSums(X) # create compositions by closure
Yc <- Y/rowSums(Y)
out.lrcco <- lrcco(Xc,Yc)
Function lrlda implements logratio linear discriminant analysis for compositional data, using the centred logratio transformation (clr).
lrlda(Xtrain, group, Xtest = NULL, divisorn = FALSE, verbose = FALSE)
Xtrain |
A compositional data set, the training data for logratio-LDA. |
group |
A categorical variable defining the groups. |
Xtest |
A compositional data set for which group prediction is sought (the test data). If no test data is supplied, the training data itself is classified. |
divisorn |
Use divisor "n" (TRUE) or not (FALSE, the default). |
verbose |
Print output (TRUE) or not (FALSE, the default). |
Function lrlda uses the centred logratio transformation, which produces a singular covariance matrix; this singularity is dealt with by using a generalized inverse. When test data are supplied via argument Xtest, the scores of the linear classifier, the posterior probabilities and the predicted classes are calculated for the test data. If no test data are supplied, these quantities are calculated for the training data.
LD |
Scores on the linear classifier for the test observations. These are also the biplot coordinates of the individuals. |
Fp |
Biplot coordinates of the group means. |
Gs |
Biplot coordinates of the variables. |
Sp |
Pooled covariance matrix. |
Mc |
Matrix of centred clr mean vectors, one row for each group. |
S.list |
Covariance matrices of each group. |
la |
Vector of eigenvalues. |
pred |
Predicted class for the test observations. |
CM |
The confusion matrix. |
gsize |
Sample size of each group. |
Mclr |
Matrix of mean vectors for clr coordinates, one row for each group. |
prob.posterior |
Vector of posterior probabilities. |
decom |
Table with decomposition of variability as expressed by the eigenvalues. |
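The classification rule described above can be sketched in base R. This is a hypothetical reimplementation, not the package's code: it assumes priors proportional to the group sizes, a pooled covariance matrix with divisor n - G, and a Moore-Penrose inverse computed from the SVD; the function names are made up. The built-in iris measurements are closed to compositions purely for demonstration.

```r
# Hypothetical sketch of clr-based LDA with a generalized inverse (not the package code).

# Moore-Penrose inverse from the SVD, dropping (near-)zero singular values.
pinv <- function(S, tol = 1e-10) {
  s <- svd(S)
  keep <- s$d > tol * max(s$d)
  s$v[, keep, drop = FALSE] %*% (t(s$u[, keep, drop = FALSE]) / s$d[keep])
}

clr <- function(C) {
  L <- log(as.matrix(C))
  L - rowMeans(L)
}

lda_clr_sketch <- function(Xtrain, group, Xtest = Xtrain) {
  group <- factor(group)
  lev <- levels(group)
  Ztr <- clr(Xtrain)
  Zte <- clr(Xtest)
  # clr group means, one row per group
  M <- t(sapply(lev, function(g) colMeans(Ztr[group == g, , drop = FALSE])))
  # pooled within-group covariance (divisor n - G); singular because clr rows sum to 0
  W <- Reduce(`+`, lapply(lev, function(g) {
    crossprod(scale(Ztr[group == g, , drop = FALSE], scale = FALSE))
  }))
  Sp <- W / (nrow(Ztr) - length(lev))
  Spi <- pinv(Sp)
  prior <- as.numeric(table(group)) / length(group)   # assumption: priors from group sizes
  # linear discriminant scores: x' Sp^+ m_g - m_g' Sp^+ m_g / 2 + log(prior_g)
  const <- 0.5 * diag(M %*% Spi %*% t(M)) - log(prior)
  scores <- Zte %*% Spi %*% t(M) -
    matrix(const, nrow(Zte), length(lev), byrow = TRUE)
  post <- exp(scores - apply(scores, 1, max))   # numerically stable softmax
  post <- post / rowSums(post)
  pred <- factor(lev[max.col(scores, ties.method = "first")], levels = lev)
  list(pred = pred, prob.posterior = post, Sp = Sp)
}

X <- as.matrix(iris[, 1:4])
X <- X / rowSums(X)   # closed to compositions, for demonstration only
out <- lda_clr_sketch(X, iris$Species)
table(observed = iris$Species, predicted = out$pred)   # confusion matrix
```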
Jan Graffelman ([email protected])
data(Tubb)
sampleid <- Tubb$Sample
site <- factor(Tubb$site)
Oxides <- as.matrix(Tubb[,2:10])
rownames(Oxides) <- sampleid
Oxides <- Oxides/rowSums(Oxides)
out.lda <- lrlda(Oxides,site,verbose=FALSE)
Function lrpca performs logratio principal component analysis. It returns the variance decomposition, principal components, biplot coordinates and a table with condition indices.
lrpca(Xcom)
Xcom |
A matrix with compositions in its rows |
Calculations are based on the singular value decomposition of the clr-transformed compositions.
Fp |
matrix with principal components |
Fs |
matrix with standardized principal components |
Gp |
biplot markers for parts (principal coordinates) |
Gs |
biplot markers for parts (standard coordinates) |
La |
eigenvalues |
D |
singular values |
decom |
table with variance decomposition |
kappalist |
table with condition indices and eigenvectors |
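A base-R sketch of logratio PCA through the SVD of the column-centred clr matrix. The function name `lrpca_sketch` is hypothetical, and the divisor n - 1 used to turn squared singular values into eigenvalues is an assumption; the percentages in the variance decomposition do not depend on it. Because Fp %*% t(Gs) reproduces the centred clr matrix, rows of Fp and Gs can be plotted together as a biplot.

```r
# Hypothetical sketch of logratio PCA; the divisor n - 1 for the eigenvalues is an assumption.
lrpca_sketch <- function(Xcom) {
  L <- log(as.matrix(Xcom))
  Z <- scale(L - rowMeans(L), scale = FALSE)   # column-centred clr data
  s <- svd(Z)
  La <- s$d^2 / (nrow(Z) - 1)                  # eigenvalues of the clr covariance matrix
  pct <- 100 * La / sum(La)
  list(Fp = s$u %*% diag(s$d, nrow = length(s$d)),  # principal components
       Gs = s$v,                                    # parts in standard coordinates
       La = La,
       D = s$d,
       decom = data.frame(eigenvalue = La, percent = pct, cumulative = cumsum(pct)))
}

set.seed(42)
X <- matrix(rexp(200), ncol = 4)
Xcom <- X / rowSums(X)
out <- lrpca_sketch(Xcom)
out$decom   # the last eigenvalue is numerically zero because clr data are singular
```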
Jan Graffelman ([email protected])
data(bentonites)
Ben <- bentonites[,1:8]
Ben.com <- Ben/rowSums(Ben)
out.lrpca <- lrpca(Ben.com)
Data frame PinotNoir contains the composition of 17 chemical components for 37 Pinot Noir wines, as well as an Aroma evaluation.
data("PinotNoir")
A data frame with 37 observations on the following 18 variables.
Cd
Cadmium
Mo
Molybdenum
Mn
Manganese
Ni
Nickel
Cu
Copper
Al
Aluminium
Ba
Barium
Cr
Chromium
Sr
Strontium
Pb
Lead
B
Boron
Mg
Magnesium
Si
Silicon
Na
Sodium
Ca
Calcium
P
Phosphorus
K
Potassium
Aroma
Aroma evaluation
doi:10.1016/S0003-2670(00)84245-2
Frank, I.E. and Kowalski, B.R. (1984) Prediction of Wine Quality and Geographic Origin from Chemical Measurements by Partial Least-Squares Regression Modeling. Analytica Chimica Acta 162, pp. 241–251 doi:10.1016/S0003-2670(00)84245-2
data(PinotNoir)
Function ternaryplot accepts a matrix of three-part compositions or non-negative counts and presents these in a ternary diagram.
ternaryplot(X, vertexlab = colnames(X), vertex.cex = 1, pch = 19, addpoints = TRUE, grid = FALSE, gridlabels = TRUE, ...)
X |
A matrix of counts or compositions with three columns |
vertexlab |
Labels for the vertices of the ternary diagram |
vertex.cex |
Character expansion factor for vertex labels |
pch |
Plotting character for the compositions |
addpoints |
Show the compositions |
grid |
Place a grid over the ternary diagram |
gridlabels |
Place grid labels or not |
... |
Additional arguments for the |
NULL
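A ternary diagram places each closed three-part composition at the corresponding weighted average of the triangle's vertices. A minimal coordinate mapping, assuming vertices at (0, 0), (1, 0) and (1/2, sqrt(3)/2) for parts 1 to 3 (ternaryplot may orient the triangle differently); `ternary_xy` is a hypothetical name:

```r
ternary_xy <- function(X) {
  P <- as.matrix(X) / rowSums(X)        # counts are closed to proportions first
  cbind(x = P[, 2] + P[, 3] / 2,        # part 1 at (0, 0), part 2 at (1, 0)
        y = P[, 3] * sqrt(3) / 2)       # part 3 at (1/2, sqrt(3)/2)
}

V <- rbind(c(1, 0, 0), c(0, 1, 0), c(0, 0, 1), c(2, 2, 2))
xy <- ternary_xy(V)
xy   # the pure parts map to the vertices, equal counts to the centroid
```

The resulting coordinates could then be drawn with base graphics, for example with polygon() for the triangle and points() for the compositions.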
Jan Graffelman ([email protected])
data("Artificial")
Xsim.com <- Artificial$Xsim.com
colnames(Xsim.com) <- paste("X",1:3,sep="")
ternaryplot(Xsim.com)
Function tr computes the trace of a matrix.
tr(X)
X |
a (square) matrix |
the trace (a scalar)
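The trace is the sum of the diagonal elements and equals the sum of the eigenvalues. A base-R equivalent, with a hypothetical name `tr_sketch` and an explicit check that the matrix is square:

```r
tr_sketch <- function(X) {
  stopifnot(nrow(X) == ncol(X))   # the trace is defined for square matrices only
  sum(diag(X))
}

set.seed(1)
A <- crossprod(matrix(rnorm(25), 5))   # symmetric 5 x 5 matrix
c(trace = tr_sketch(A), eigensum = sum(eigen(A, symmetric = TRUE)$values))
```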
Jan Graffelman ([email protected])
X <- matrix(runif(25),ncol=5)
print(X)
print(tr(X))
A data frame with the major oxide composition of pottery found at Romano-British kiln sites in Wales, Gloucester and the New Forest, as determined by atomic absorption.
data("Tubb")
A data frame with 48 observations on the following 11 variables.
Sample
Sample identifier
Al2O3
Aluminium oxide
Fe2O3
Iron (III) oxide
MgO
Magnesium oxide
CaO
Calcium oxide
Na2O
Sodium oxide
K2O
Potassium oxide
TiO2
Titanium dioxide
MnO
Manganese oxide
BaO
Barium oxide
site
Geographical region of the sample. G=Gloucester, NF=New Forest, W=Wales.
Tubb, A., Parker, A.J. and Nickless, G. (1980) The analysis of Romano-British pottery by atomic absorption spectrophotometry. Archaeometry 22(2) pp. 153–171.
data(Tubb)