Package 'ToolsForCoDa' reference manual

Title:	Multivariate Tools for Compositional Data Analysis
Description:	Provides functions for multivariate analysis of compositional data. Includes a function for compositional canonical correlation analysis. This analysis requires two data matrices of compositions, which are suitably transformed and passed to a specialized canonical correlation program that can deal with singular covariance matrices. The methodology is described in Graffelman et al. (2017) <doi:10.1101/144584>. Functions for log-ratio principal component analysis with condition number computations and for log-ratio discriminant analysis have also been added to the package.
Authors:	Jan Graffelman [aut, cre]
Maintainer:	Jan Graffelman <[email protected]>
License:	GPL (>= 2)
Version:	1.1.0
Built:	2025-03-10 04:34:24 UTC
Source:	https://github.com/cran/ToolsForCoDa

Two sets of 3-part compositions

Description

The list object Artificial contains two data frames of 3-part compositions. The data refer to the example in Section 3.1 of Graffelman et al. (2017).

Usage

data(Artificial)

Format

A list of two data frames, each containing 100 observations.

Source

Laird, N. M. and Lange, C. Table 7.11, p. 124

References

Graffelman, J., Pawlowsky-Glahn, V., Egozcue, J.J. and Buccianti, A. (2017) Compositional Canonical Correlation Analysis.

Isotopic and chemical compositions of bentonites

Description

The data consist of 14 geological samples from the US with their major oxide composition (SiO2, Al2O3, Fe2O3, MnO, MgO, CaO, K2O, Na2O and H2O+) and delta-Deuterium and delta-18-Oxygen (dD, d18O).

Usage

data("bentonites")

Format

A data frame with 14 observations on the following 11 variables.

Si: a numeric vector
Al: a numeric vector
Fe: a numeric vector
Mn: a numeric vector
Mg: a numeric vector
Ca: a numeric vector
K: a numeric vector
Na: a numeric vector
H20: a numeric vector
dD: a numeric vector
d18O: a numeric vector

Source

Cadrin, A.A.J (1995), Tables 1 and 2. Reyment, R. A. and Savazzi, E. (1999), pp. 220-222.

References

Cadrin, A.A.J., Kyser, T.K., Caldwell, W.G.E. and Longstaffe, F.J. (1995) Isotopic and chemical compositions of bentonites as paleoenvironmental indicators of the Cretaceous Western Interior Seaway. Palaeogeography, Palaeoclimatology, Palaeoecology 119, pp. 301–320.

Reyment, R. A. and Savazzi, E. (1999) Aspects of Multivariate Statistical Analysis in Geology, Elsevier Science B.V., Amsterdam.

Examples

data(bentonites)

Canonical correlation analysis.

Description

Function canocov performs a canonical correlation analysis. It operates on raw data matrices, which are only centered in the program. It uses generalized inverses and can deal with structurally singular covariance matrices.

Usage

canocov(X, Y)

Arguments

`X`	The n times p X matrix of observations
`Y`	The n times q Y matrix of observations

Details

canocov computes the solution by a singular value decomposition of the transformed between-set covariance matrix.
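
The computation described above can be sketched in base R. This is a minimal illustration and not the package's source code: the helper names `ginv_sqrt` and `cca_sketch` are hypothetical, and the generalized inverse square roots are built from an eigendecomposition that drops near-zero eigenvalues, which is one assumed way of handling singular covariance matrices.

```r
# Hypothetical helper: generalized inverse square root of a symmetric
# positive semi-definite matrix, dropping (near-)zero eigenvalues.
ginv_sqrt <- function(S, tol = 1e-10) {
  e <- eigen(S, symmetric = TRUE)
  keep <- e$values > tol * max(e$values)
  V <- e$vectors[, keep, drop = FALSE]
  V %*% diag(1 / sqrt(e$values[keep]), nrow = sum(keep)) %*% t(V)
}

# Hypothetical sketch: canonical correlations as the singular values of
# the transformed between-set covariance matrix.
cca_sketch <- function(X, Y) {
  Xc <- scale(X, center = TRUE, scale = FALSE)  # centre only, no scaling
  Yc <- scale(Y, center = TRUE, scale = FALSE)
  n <- nrow(Xc)
  Sxx <- crossprod(Xc) / (n - 1)      # within-set covariance of X
  Syy <- crossprod(Yc) / (n - 1)      # within-set covariance of Y
  Sxy <- crossprod(Xc, Yc) / (n - 1)  # between-set covariance
  K <- ginv_sqrt(Sxx) %*% Sxy %*% ginv_sqrt(Syy)
  svd(K)$d
}

set.seed(123)
X <- matrix(runif(75), ncol = 3)
Y <- matrix(runif(75), ncol = 3)
ccor <- cca_sketch(X, Y)
```

For non-singular data such as these, the values in `ccor` should agree with `stats::cancor(X, Y)$cor`.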

Value

Returns a list with the following results

`ccor`	the canonical correlations
`A`	canonical weights of the X variables
`B`	canonical weights of the Y variables
`U`	canonical X variates
`V`	canonical Y variates
`Fs`	biplot markers for X variables (standard coordinates)
`Gs`	biplot markers for Y variables (standard coordinates)
`Fp`	biplot markers for X variables (principal coordinates)
`Gp`	biplot markers for Y variables (principal coordinates)
`Rxu`	canonical loadings, (correlations X variables, canonical X variates)
`Rxv`	canonical loadings, (correlations X variables, canonical Y variates)
`Ryu`	canonical loadings, (correlations Y variables, canonical X variates)
`Ryv`	canonical loadings, (correlations Y variables, canonical Y variates)
`Sxu`	covariance X variables, canonical X variates
`Sxv`	covariance X variables, canonical Y variates
`Syu`	covariance Y variables, canonical X variates
`Syv`	covariance Y variables, canonical Y variates
`fitRxy`	goodness of fit of the between-set correlation matrix
`fitXs`	adequacy coefficients of X variables
`fitXp`	redundancy coefficients of X variables
`fitYs`	adequacy coefficients of Y variables
`fitYp`	redundancy coefficients of Y variables

Author(s)

Jan Graffelman [email protected]

References

Hotelling, H. (1935) The most predictable criterion. Journal of Educational Psychology (26) pp. 139-142.

Hotelling, H. (1936) Relations between two sets of variates. Biometrika (28) pp. 321-377.

Johnson, R. A. and Wichern, D. W. (2002) Applied Multivariate Statistical Analysis. New Jersey: Prentice Hall.

Examples

set.seed(123)
X <- matrix(runif(75),ncol=3)
Y <- matrix(runif(75),ncol=3)
cca.results <- canocov(X,Y)

Centring of a data matrix

Description

Function cen centres the columns of a matrix to mean zero.

Usage

cen(X,w=rep(1,nrow(X))/nrow(X))

Arguments

`X`	a raw data matrix.
`w`	a vector of case weights.

Value

Returns the centred matrix.
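
As an illustration of what weighted column centring amounts to, the following base R sketch subtracts weighted column means. The function name `centre_sketch` is hypothetical, and it assumes that the case weights sum to one, as the default value of `w` does; it is not taken from the package source.

```r
# Hypothetical re-implementation of weighted column centring.
# Assumption: w is a vector of case weights that sums to one.
centre_sketch <- function(X, w = rep(1, nrow(X)) / nrow(X)) {
  wmeans <- colSums(w * X)        # weighted column means
  sweep(X, 2, wmeans, FUN = "-")  # subtract the mean of each column
}

set.seed(1)
X <- matrix(runif(10), ncol = 2)
Y <- centre_sketch(X)
```

With the default equal weights, the columns of `Y` have mean zero.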

Author(s)

Jan Graffelman ([email protected])

Examples

X<-matrix(runif(10),ncol=2)
Y<-cen(X)
print(Y)

Centred log-ratio transformation

Description

Program clrmat calculates the centred log-ratio transformation for a matrix of compositions.

Usage

clrmat(X)

Arguments

`X`	A matrix of compositions

Value

A matrix containing the transformed data
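
For readers unfamiliar with the transformation, the centred log-ratio of a composition is its log minus the mean of the logs of its parts. The sketch below uses a hypothetical function name, `clr_sketch`, and is an illustration in base R, not the code of clrmat.

```r
# Hypothetical sketch of the centred log-ratio (clr) transformation:
# take logs and subtract the mean log of each row.
clr_sketch <- function(X) {
  L <- log(as.matrix(X))
  L - rowMeans(L)
}

set.seed(1)
X <- matrix(runif(15), ncol = 3)
Xcom <- X / rowSums(X)   # closure: rows sum to one
Xclr <- clr_sketch(Xcom)
```

Each row of `Xclr` sums to zero, and the result does not change if the rows are rescaled before transforming.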

Author(s)

Jan Graffelman [email protected]

Examples

data(Artificial)
Xsim.com <- Artificial$Xsim.com
Xclr <- clrmat(Xsim.com)

Calculate condition indices for subcompositions

Description

Function largest.kappas calculates the condition numbers for all subcompositions of a given size, for a particular compositional data set.

Usage

largest.kappas(Xcom, nparts = 3, sizetoplist = 10)

Arguments

`Xcom`	A data matrix with compositions in rows
`nparts`	The number of parts for the subcompositions to be analysed.
`sizetoplist`	The length of the list of the "best" subcompositions

Details

Log-ratio PCA is executed for each subcomposition, and the resulting eigenvalues and eigenvectors are stored.
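
The enumeration over subcompositions can be sketched as follows. This is a hedged illustration only: the name `kappa_sketch` is hypothetical, the condition number is assumed here to be the ratio of the largest to the smallest non-trivial singular value of the centred clr data, and the decreasing sort order is likewise an assumption. The package may define and order these quantities differently.

```r
# Hypothetical sketch: a condition number for every subcomposition of a
# given size, obtained from a log-ratio PCA of each subcomposition.
kappa_sketch <- function(Xcom, nparts = 3) {
  subsets <- combn(ncol(Xcom), nparts)      # one column per subcomposition
  kappas <- apply(subsets, 2, function(idx) {
    S <- Xcom[, idx, drop = FALSE]
    S <- S / rowSums(S)                     # re-close the subcomposition
    L <- log(S)
    clr <- L - rowMeans(L)                  # clr transformation
    clr <- scale(clr, center = TRUE, scale = FALSE)
    d <- svd(clr)$d[seq_len(nparts - 1)]    # the last singular value is ~0
    d[1] / d[nparts - 1]                    # assumed definition of kappa
  })
  out <- data.frame(parts = apply(subsets, 2, paste, collapse = "-"),
                    kappa = kappas)
  out[order(out$kappa, decreasing = TRUE), ]
}

set.seed(1)
X <- matrix(runif(600), ncol = 6)
Xcom <- X / rowSums(X)
res <- kappa_sketch(Xcom)
```

With six parts and `nparts = 3`, `res` has one row for each of the 20 three-part subcompositions.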

Value

A data frame with an ordered list of subcompositions

Author(s)

Jan Graffelman ([email protected])

Examples

X <- matrix(runif(600),ncol=6)
Xcom <- X/rowSums(X)
Results <- largest.kappas(Xcom)

Logratio Canonical Correlation Analysis

Description

Function lrcco is a wrapper function around canocov. It performs logratio canonical correlation analysis (LR-CCO) accepting two compositional data matrices as input.

Usage

lrcco(X, Y)

Arguments

`X`	The matrix of X compositions
`Y`	The matrix of Y compositions

Details

Matrices X and Y are assumed to contain positive elements only, and their rows are assumed to sum to one.
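
These assumptions can be checked before the analysis. The helper `is_composition` below is hypothetical and not part of the package; it is a small base R sketch that tests for strictly positive entries and unit row sums.

```r
# Hypothetical helper: TRUE if all entries of X are strictly positive
# and every row sums to one (up to a numerical tolerance).
is_composition <- function(X, tol = 1e-8) {
  all(X > 0) && all(abs(rowSums(X) - 1) < tol)
}

set.seed(123)
X <- matrix(runif(75), ncol = 3)
Xc <- X / rowSums(X)   # closure turns positive data into compositions
ok_raw    <- is_composition(X)
ok_closed <- is_composition(Xc)
```

Here the raw matrix `X` fails the check, whereas the closed matrix `Xc` passes it.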

Value

Returns a list with the following results

`ccor`	the canonical correlations
`A`	canonical weights of the X variables
`B`	canonical weights of the Y variables
`U`	canonical X variates
`V`	canonical Y variates
`Fs`	biplot markers for X variables (standard coordinates)
`Gs`	biplot markers for Y variables (standard coordinates)
`Fp`	biplot markers for X variables (principal coordinates)
`Gp`	biplot markers for Y variables (principal coordinates)
`Rxu`	canonical loadings, (correlations X variables, canonical X variates)
`Rxv`	canonical loadings, (correlations X variables, canonical Y variates)
`Ryu`	canonical loadings, (correlations Y variables, canonical X variates)
`Ryv`	canonical loadings, (correlations Y variables, canonical Y variates)
`Sxu`	covariance X variables, canonical X variates
`Sxv`	covariance X variables, canonical Y variates
`Syu`	covariance Y variables, canonical X variates
`Syv`	covariance Y variables, canonical Y variates
`fitRxy`	goodness of fit of the between-set correlation matrix
`fitXs`	adequacy coefficients of X variables
`fitXp`	redundancy coefficients of X variables
`fitYs`	adequacy coefficients of Y variables
`fitYp`	redundancy coefficients of Y variables

Author(s)

Jan Graffelman [email protected]

References

Hotelling, H. (1935) The most predictable criterion. Journal of Educational Psychology (26) pp. 139-142.

Hotelling, H. (1936) Relations between two sets of variates. Biometrika (28) pp. 321-377.

Johnson, R. A. and Wichern, D. W. (2002) Applied Multivariate Statistical Analysis. New Jersey: Prentice Hall.

Graffelman, J., Pawlowsky-Glahn, V., Egozcue, J.J. and Buccianti, A. (2018) Exploration of geochemical data with compositional canonical biplots. Journal of Geochemical Exploration 194, pp. 120–133. doi:10.1016/j.gexplo.2018.07.014

Examples

set.seed(123)
X  <- matrix(runif(75),ncol=3)
Y  <- matrix(runif(75),ncol=3)
Xc <- X/rowSums(X) # create compositions by closure
Yc <- Y/rowSums(Y)
out.lrcco <- lrcco(Xc,Yc) # analyse the closed compositions

Logratio Linear Discriminant Analysis

Description

Function lrlda implements logratio linear discriminant analysis for compositional data, using the centred logratio transformation (clr)

Usage

lrlda(Xtrain, group, Xtest = NULL, divisorn = FALSE, verbose = FALSE)

Arguments

`Xtrain`	A compositional data set, the training data for logratio-LDA.
`group`	A categorical variable defining the groups.
`Xtest`	A compositional data set for which group prediction is sought (the test data). If no test data is supplied, the training data itself is classified.
`divisorn`	Use divisor "n" (`divisorn=TRUE`) or divisor "n-1" (`divisorn=FALSE`) in the calculation of the covariance matrices.
`verbose`	Print output (`verbose = TRUE`) or not.

Details

Function lrlda uses the centred logratio transformation, which produces a singular covariance matrix. This singularity is dealt with by using a generalized inverse. When test data is supplied via argument Xtest, the scores of the linear classifier, the posterior probabilities and the predicted classes are calculated for the test data. If no test data is supplied, these quantities are calculated for the training data.
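
The singularity mentioned above is easy to verify numerically. The base R sketch below is an illustration under stated assumptions, not the code of lrlda: it builds a Moore-Penrose inverse from the non-zero eigenvalues of a clr covariance matrix, which is one common choice of generalized inverse.

```r
set.seed(1)
X <- matrix(runif(300), ncol = 3)
Xcom <- X / rowSums(X)
L <- log(Xcom)
Xclr <- L - rowMeans(L)          # clr rows sum to zero
S <- cov(Xclr)                   # hence S is singular (rank D - 1)

ev <- eigen(S, symmetric = TRUE)
keep <- ev$values > 1e-10 * max(ev$values)   # drop the zero eigenvalue
V <- ev$vectors[, keep, drop = FALSE]
# Moore-Penrose generalized inverse built from the retained eigenpairs
Sginv <- V %*% diag(1 / ev$values[keep], nrow = sum(keep)) %*% t(V)
```

For three parts, only two eigenvalues are retained, and `Sginv` satisfies the defining property `S %*% Sginv %*% S = S`.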

Value

`LD`	Scores on the linear classifier for the test observations. These are also the biplot coordinates of the individuals.
`Fp`	Biplot coordinates of the group means.
`Gs`	Biplot coordinates of the variables.
`Sp`	Pooled covariance matrix.
`Mc`	Matrix of centred clr mean vectors, one row for each group.
`S.list`	Covariance matrices of each group.
`la`	Vector of eigenvalues.
`pred`	Predicted class for the test observations.
`CM`	The confusion matrix.
`gsize`	Sample size of each group.
`Mclr`	Matrix of mean vectors for clr coordinates, one row for each group.
`prob.posterior`	Vector of posterior probabilities.
`decom`	Table with decomposition of variability as expressed by the eigenvalues.

Author(s)

Jan Graffelman ([email protected])

Examples

  data(Tubb)
  sampleid <- Tubb$Sample
  site     <- factor(Tubb$site)
  Oxides   <- as.matrix(Tubb[,2:10])
  rownames(Oxides) <- sampleid
  Oxides   <- Oxides/rowSums(Oxides)
  out.lda  <- lrlda(Oxides,site,verbose=FALSE)

Logratio principal component analysis with condition indices

Description

Function lrpca performs logratio principal component analysis. It returns the variance decomposition, principal components, biplot coordinates and a table with condition indices.

Usage

lrpca(Xcom)

Arguments

`Xcom`	A matrix with compositions in its rows

Details

Calculations are based on the singular value decomposition of the clr transformed compositions.
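
A minimal base R sketch of such a calculation is given below. It is not the package's implementation: the divisor n - 1 and the object names (`la`, `Fp`, `decom`) are assumptions chosen for illustration, and the package may scale its output differently.

```r
set.seed(1)
X <- matrix(runif(400), ncol = 4)
Xcom <- X / rowSums(X)

L <- log(Xcom)
Xclr <- L - rowMeans(L)                           # clr transformation
Xcc <- scale(Xclr, center = TRUE, scale = FALSE)  # column centring
n <- nrow(Xcc)

s <- svd(Xcc / sqrt(n - 1))    # assumed divisor n - 1
k <- ncol(Xcc) - 1             # clr data have rank D - 1 at most
la <- s$d[1:k]^2               # eigenvalues (variances of the components)
Fp <- Xcc %*% s$v[, 1:k]       # principal components
decom <- rbind(eigenvalue = la,
               fraction   = la / sum(la),
               cumulative = cumsum(la) / sum(la))
```

Under these assumptions, `la` matches the squared standard deviations returned by `stats::prcomp(Xclr)` for the first `k` components.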

Value

`Fp`	matrix with principal components
`Fs`	matrix with standardized principal components
`Gp`	biplot markers for parts (principal coordinates)
`Gs`	biplot markers for parts (standard coordinates)
`La`	eigenvalues
`D`	singular values
`decom`	table with variance decomposition
`kappalist`	table with condition indices and eigenvectors

Author(s)

Jan Graffelman ([email protected])

Examples

data(bentonites)
Ben <- bentonites[,1:8]
Ben.com <- Ben/rowSums(Ben)
out.lrpca <- lrpca(Ben.com)

Chemical composition of Pinot Noir wines

Description

Dataframe PinotNoir contains the composition of 17 chemical components for 37 Pinot Noir wines, as well as an Aroma evaluation.

Usage

data("PinotNoir")

Format

A data frame with 37 observations on the following 18 variables.

Cd: Cadmium
Mo: Molybdenum
Mn: Manganese
Ni: Nickel
Cu: Copper
Al: Aluminium
Ba: Barium
Cr: Chromium
Sr: Strontium
Pb: Lead
B: Boron
Mg: Magnesium
Si: Silicon
Na: Sodium
Ca: Calcium
P: Phosphorus
K: Potassium
Aroma: Aroma evaluation

Source

doi:10.1016/S0003-2670(00)84245-2

References

Frank, I.E. and Kowalski, B.R. (1984) Prediction of Wine Quality and Geographic Origin from Chemical Measurements by Partial Least-Squares Regression Modeling. Analytica Chimica Acta 162, pp. 241–251 doi:10.1016/S0003-2670(00)84245-2

Examples

data(PinotNoir)

Create a Ternary Plot for three-part Compositions

Description

Function ternaryplot accepts a matrix of three-part compositions or non-negative counts and displays them in a ternary diagram.

Usage

ternaryplot(X, vertexlab = colnames(X), vertex.cex = 1, pch = 19, addpoints = TRUE,
            grid = FALSE, gridlabels = TRUE, ...)

Arguments

`X`	A matrix of counts or compositions with three columns
`vertexlab`	Labels for the vertices of the ternary diagram
`vertex.cex`	Character expansion factor for vertex labels
`pch`	Plotting character for the compositions
`addpoints`	Show the compositions (`addpoints=TRUE`) or not
`grid`	Place a grid over the ternary diagram
`gridlabels`	Place grid labels or not
`...`	Additional arguments for the `points` function

Value

NULL

Author(s)

Jan Graffelman ([email protected])

Examples

data("Artificial")
Xsim.com <- Artificial$Xsim.com
colnames(Xsim.com) <- paste("X",1:3,sep="")
ternaryplot(Xsim.com)

Compute the trace of a matrix

Description

tr computes the trace of a matrix.

Usage

tr(X)

Arguments

`X`	a (square) matrix

Value

the trace (a scalar)
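
For reference, the trace is the sum of the diagonal elements. The one-line sketch below uses a hypothetical name, `trace_sketch`, and is only an illustration in base R.

```r
# Hypothetical sketch: the trace as the sum of the diagonal elements.
trace_sketch <- function(X) sum(diag(X))

X <- matrix(1:9, ncol = 3)
tX <- trace_sketch(X)   # 1 + 5 + 9 = 15
```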

Author(s)

Jan Graffelman ([email protected])

Examples

X <- matrix(runif(25),ncol=5)
print(X)
print(tr(X))

Romano-British pottery oxides

Description

A dataframe with the major oxide composition of pottery found at Romano-British kiln sites in Wales, Gloucester and the New Forest as determined by atomic absorption.

Usage

data("Tubb")

Format

A data frame with 48 observations on the following 11 variables.

Sample: Sample identifier
Al2O3: Aluminium oxide
Fe2O3: Iron (III) oxide
MgO: Magnesium oxide
CaO: Calcium oxide
Na2O: Sodium oxide
K2O: Potassium oxide
TiO2: Titanium dioxide
MnO: Manganese oxide
BaO: Barium oxide
site: Geographical region of the sample. G=Gloucester, NF=New Forest, W=Wales.

References

Tubb, A., Parker, A.J. and Nickless, G. (1980) The analysis of Romano-British pottery by atomic absorption spectrophotometry. Archaeometry 22(2) pp. 153–171.

Examples

data(Tubb)
