Zoobenthic species measured in an intertidal area in Argentina

This work is based on:

- Ieno, E.N. 2000. Las comunidades bentónicas de fondos blandos del norte de la Provincia de Buenos Aires: su rol ecológico en el ecosistema costero. Tesis Doctoral Universidad Nacional de Mar del Plata, 247 pp.
- Ieno E.N., Martin J.P., Bastida R.O. & Zuur A.F. (2001). Wader population and macrozoobenthos on a South American estuary. Poster presentation at: Joint IAPSO/IABO Assembly 2001: An Ocean Odyssey October 2001, Mar del Plata, Argentina. Session: Ecological processes and pollution in estuarine and coastal waters. IB01-71.

In this case study, the principal component analysis (PCA) biplot, indirect gradient analysis and the redundancy analysis (RDA) triplot are used to detect species-environmental relationships. We also discuss why PCA and RDA should be used for this example and not CA or CCA. Finally, we show how discriminant analysis can be used to detect differences in species behavior at 3 transects. All results were obtained using Brodgar v1.8.5.

The original data consist of abundance counts of 4 zoobenthic species measured at 30 sites. Measurements were made in both Spring and Autumn 1997 in a salt marsh area in Argentina (Ieno, 2000). The 4 species were:

Number Species
1 Laeonereis acuta 
2 Heteromastus similis
3 Uca uruguayensis
4 Neanthes succinea

The samples were taken along three transects, each consisting of 10 sites. Additionally, four environmental variables were measured at each site: medium sand, fine sand, mud content and organic matter. The 30 sites were labelled A1,...,A10, B1,...,B10, C1,...,C10. The 4 species measured in Spring and Autumn are denoted by their species name followed by an "S" and "A" respectively. The underlying question is: are there any relationships between the species and the explanatory variables?

 

Transformation & correlations

We first discuss the transformation of the data. Figure 1 shows the boxplot of the 4 species measured in Spring and in Autumn.

Figure 1: Boxplot of original species data. Boxplots of the same species measured in Spring (S) and Autumn (A) are plotted beside each other.

Because some species had values between 0 and 150, and other species only between 0 and 10, we decided to apply a square root transformation. Next, we look at the distributions of the species (over the samples) and compare these for both periods. This can be done by calculating the correlation between the same species measured in the two periods. If there is no seasonal effect, one expects to find a high correlation between a particular species measured in the two periods. Table 1 shows the correlations between the (square root transformed) species. The correlation between L. acuta measured in Spring and Autumn is 0.29, which is not significant at the 5% level. Only the patterns of the species U. uruguayensis at the 30 sites were similar in both periods; the correlation was 0.93. Based on these results we decided to consider the 4 species measured in Spring and Autumn as 8 different response variables.

Table 1: Correlations between all species. Numbers in red indicate a significant correlation at the 5% level, and numbers in black are not significantly different from 0.

                   L. acuta S  H. similis S  U. uruguayensis S  N. succinea S  L. acuta A  H. similis A  U. uruguayensis A  N. succinea A
L. acuta S         1           -0.25         0.46               -0.52          0.29        -0.26         0.54               -0.49
H. similis S                   1             -0.51              -0.14          -0.33       -0.03         -0.52              0.27
U. uruguayensis S                            1                  -0.26          0.15        0.08          0.93               -0.17
N. succinea S                                                   1              -0.28       0.46          -0.22              0.43
L. acuta A                                                                     1           -0.10         0.27               -0.54
H. similis A                                                                               1             0.10               0.50
U. uruguayensis A                                                                                        1                  -0.20
N. succinea A                                                                                                               1
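
The 5% significance statement for a correlation can be checked by hand. The sketch below assumes the usual t test for a Pearson correlation with n = 30 sites (the source does not state which test Brodgar applies), and uses the tabulated two-sided 5% critical value of Student's t with 28 degrees of freedom. The function names are illustrative only.

```python
import math

N_SITES = 30          # number of sites, from the text
T_CRIT_28DF = 2.048   # two-sided 5% critical value of Student's t, 28 df (table value)

def t_statistic(r, n=N_SITES):
    """t statistic for H0: correlation = 0, given a sample correlation r."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

def is_significant(r, n=N_SITES, t_crit=T_CRIT_28DF):
    """True if |t| exceeds the critical value (assumed test, see lead-in)."""
    return abs(t_statistic(r, n)) > t_crit

# Smallest |r| that this test calls significant for n = 30
r_crit = T_CRIT_28DF / math.sqrt(T_CRIT_28DF ** 2 + (N_SITES - 2))

# The two Spring-versus-Autumn correlations quoted in the text
for name, r in {"L. acuta": 0.29, "U. uruguayensis": 0.93}.items():
    print(f"{name:16s} r = {r:4.2f}  t = {t_statistic(r):6.2f}  "
          f"significant: {is_significant(r)}")
print(f"critical |r| = {r_crit:.3f}")
```

Under these assumptions a correlation needs an absolute value above roughly 0.36 to be significant with 30 sites, which agrees with 0.29 being reported as not significant.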

 

Principal component analysis biplot

The correlation matrix can also be visualised with the PCA biplot, see Figure 2. Lines pointing in the same direction correspond to species which are correlated. Note that this is the case for (i) L. acuta (measured in Spring and Autumn), (ii) U. uruguayensis (measured in Spring and Autumn), and (iii) N. succinea (measured in Spring and Autumn). Lines pointing in opposite directions correspond to negatively correlated species, e.g. L. acuta and N. succinea. Based on the position of the samples, one can infer at which samples the species were abundant. For example, U. uruguayensis was abundant at the B sites and N. succinea mainly at the third transect (C). Note that the lines for H. similis are nearly perpendicular, indicating that this species did not behave similarly in the two seasons.

Figure 2. The principal component analysis biplot.

Brodgar gives the following numerical output for principal component analysis:

Column 1: axis
Column 2: eigenvalue
Column 3: eigenvalue as percentage
Column 4: eigenvalue as cumulative percentage

Col 1   Col 2   Col 3    Col 4
1       0.408   40.829   40.829
2       0.237   23.689   64.519
3       0.133   13.324   77.843

The first two axes explain 64.52% of the total variation. To simplify interpretation of the biplot axes, Brodgar can superimpose the explanatory variables (via the Tools menu from the biplot graph) on the graph, see Figure 3. Each explanatory variable is represented by a line from the origin to a point with coordinates (c1,c2). The coordinates c1 and c2 are the correlations between the explanatory variable and the first and second axis respectively. If such a line is long (say longer than 0.5), it indicates a strong correlation between a biplot axis (and everything related to that) and the corresponding explanatory variable. Results in Figure 3 indicate that only the explanatory variable medium sand is related to the first axis. The other variables are not strongly correlated with these axes.

Figure 3. The principal component analysis biplot with the explanatory variables superimposed.
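
The two steps of this indirect gradient analysis can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the Argentinean data and not Brodgar's implementation: it assumes a PCA on the correlation matrix, eigenvalues scaled to sum to 1 as in the output above, and a single hypothetical explanatory variable called `gradient`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the Argentinean data): 30 sites, 8 responses.
n_sites, n_species = 30, 8
gradient = rng.normal(size=n_sites)                # hypothetical explanatory variable
expected = np.exp(1.0 + 0.5 * gradient)[:, None]   # expected count per site
counts = rng.poisson(lam=expected, size=(n_sites, n_species))
Y = np.sqrt(counts)                                # square root transformation

# Step 1: PCA on the correlation matrix of the responses.
Z = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)   # standardised responses
R = np.corrcoef(Y, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]                  # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

eigen_fraction = eigvals / eigvals.sum()           # eigenvalues scaled to sum to 1
percent = 100 * eigen_fraction
cumulative = np.cumsum(percent)

# Step 2: correlate the explanatory variable with the first two axes.
scores = Z @ eigvecs[:, :2]                        # site scores on axes 1 and 2
c1 = np.corrcoef(gradient, scores[:, 0])[0, 1]
c2 = np.corrcoef(gradient, scores[:, 1])[0, 1]
line_length = np.hypot(c1, c2)                     # length of the superimposed line

for axis in range(3):
    print(axis + 1, round(eigen_fraction[axis], 3),
          round(percent[axis], 3), round(cumulative[axis], 3))
print("c1, c2, length:", round(c1, 2), round(c2, 2), round(line_length, 2))
```

Because the axis scores are uncorrelated with each other, the line length cannot exceed 1, which is why a threshold such as 0.5 is a meaningful rule of thumb.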

 

Redundancy analysis

The PCA biplot and superimposing explanatory variables are indirect gradient analysis techniques; in the first step ordination axes are calculated and in the second step these axes are correlated with explanatory variables. The underlying aim is to explain variation in terms of the measured explanatory variables. However, it might be possible that the third or the fourth axes are strongly correlated to the explanatory variables. An indirect gradient analysis might not pick this up. Instead, a direct gradient analysis can be applied, e.g. redundancy analysis (RDA). RDA is basically a PCA in which the axes are restricted to be linear combinations of explanatory variables.  RDA is very similar to canonical correspondence analysis (CCA), except that RDA is based on linear species-environmental relationships whereas CCA is based on (approximately) unimodal relations. Results of RDA are presented in a so-called triplot, see Figure 4.

Figure 4. The redundancy analysis triplot.
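
The description of RDA as a PCA with axes restricted to linear combinations of the explanatory variables can be made concrete: regress every response on the explanatory variables, then apply PCA to the fitted values. The sketch below is one common way to compute this, run on synthetic data. It assumes centred (not standardised) responses and eigenvalues expressed as a fraction of the total variance; the source does not confirm that Brodgar uses exactly these settings, and `rda_eigenvalues` is a hypothetical helper.

```python
import numpy as np

def rda_eigenvalues(Y, X):
    """Constrained eigenvalues of an RDA, as fractions of the total variance of Y."""
    Yc = Y - Y.mean(axis=0)                          # centre responses
    Xc = X - X.mean(axis=0)                          # centre explanatory variables
    coef, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)   # multivariate linear regression
    fitted = Xc @ coef                               # part of Y explained by X
    singular = np.linalg.svd(fitted, compute_uv=False)
    n_axes = min(X.shape[1], Y.shape[1])             # number of constrained axes
    return singular[:n_axes] ** 2 / (Yc ** 2).sum()

# Synthetic stand-in data: 30 sites, 4 explanatory variables, 8 responses.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
Y = X @ rng.normal(size=(4, 8)) + 2.0 * rng.normal(size=(30, 8))

eig = rda_eigenvalues(Y, X)
pct_species = 100 * np.cumsum(eig)                   # cumulative % of species variance
pct_relation = 100 * np.cumsum(eig) / eig.sum()      # cumulative % of species-environment relation

for axis, (lam, a, b) in enumerate(zip(eig, pct_species, pct_relation), start=1):
    print(axis, round(lam, 3), round(a, 2), round(b, 2))
```

The last cumulative value in `pct_relation` is always 100, whereas the last value in `pct_species` is the share of the total variance that the explanatory variables can account for.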

A triplot consists of various biplots, namely (i) explanatory variables and species, (ii) explanatory variables and samples and (iii) species and samples. Furthermore, the directions of the lines give information on correlations between explanatory variables and between species. The most obvious aspects of the triplot in Figure 4 are:

- Mud is positively correlated with organic matter and negatively correlated with fine sand.
- U. uruguayensis was abundant at sites with high values of medium sand and low values of organic matter.
- N. succinea (Spring) was abundant at sites characterised by fine sand.

The numerical output of Brodgar is given in Table 2.

Table 2. Eigenvalues, cumulative percentage of variance of species data and cumulative percentage variance of species-environmental relationships.

axis  eigenvalue  Cumulative percentage variance of species data  Cumulative percentage variance of species-environmental relationships
1     0.23        22.63                                           64.02
2     0.06        28.99                                           82.02
3     0.03        32.35                                           91.54

The second column shows the eigenvalue for each axis. The total sum of eigenvalues is 1. The third and fourth columns contain the cumulative percentage of variance of the species data and the cumulative percentage of variance of the species-environmental relationships explained by the axes respectively. Results indicate that the first two axes in Figure 4 represent 82.02% of the information that can be explained with all four explanatory (or: environmental) variables. Hence, a two dimensional presentation is a good representation in this case. However, these first two axes only explain 28.99% of the total species variance, which is rather low. This means that the four explanatory variables are not adequate in explaining the species variance and that the really important explanatory variables were not used in the analysis (or they were not measured). 
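
The two cumulative columns of Table 2 are linked by simple arithmetic. Assuming the usual definitions (the third column is relative to the total species variance, the fourth is relative to the variance explained by all four explanatory variables together), their ratio estimates the share of the total variance that the four variables explain jointly. The short check below uses only the values from Table 2.

```python
# Table 2: axis -> (cumulative % of species variance,
#                   cumulative % of species-environment relation)
table2 = {1: (22.63, 64.02), 2: (28.99, 82.02), 3: (32.35, 91.54)}

explained_by_all = {}
for axis, (pct_species, pct_relation) in table2.items():
    # Same numerator in both columns, so the ratio isolates the denominator.
    explained_by_all[axis] = 100 * pct_species / pct_relation
    print(f"axes 1..{axis}: all four variables explain about "
          f"{explained_by_all[axis]:.1f}% of the species variance")
```

Each row gives about 35%, so even with all constrained axes included roughly two thirds of the species variance would remain unexplained by the measured variables.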

 

Coenoclines

In the previous two sections, PCA and RDA were applied. These techniques are based on linear relationships. The reason we used PCA and RDA, and not correspondence analysis (CA) and canonical correspondence analysis (CCA), becomes clear by drawing coenoclines. These are smoothing curves of species abundances along a gradient. The gradient can either be (i) an ordination axis or (ii) an explanatory variable. Figure 5 shows the coenoclines along the explanatory variable mud. Most curves indicate a linear relationship between species and mud content. Coenoclines for other explanatory variables are similar. The vertical bars | at the bottom of the graph indicate where (along the gradient) samples were taken. Note that most samples were obtained in the midrange of the gradient and that there are a few samples with more extreme values.

Figure 5. Coenoclines.
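
A coenocline can be sketched with any scatterplot smoother. The example below uses a Gaussian kernel (Nadaraya-Watson) smoother as a stand-in, since the source does not say which smoother Brodgar uses. The data are synthetic, and the names `mud`, `abundance` and `kernel_smooth` are illustrative only.

```python
import numpy as np

def kernel_smooth(x, y, grid, bandwidth):
    """Gaussian-kernel weighted average of y, evaluated at each grid point."""
    x, y, grid = (np.asarray(a, dtype=float) for a in (x, y, grid))
    weights = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / bandwidth) ** 2)
    return weights @ y / weights.sum(axis=1)

# Synthetic stand-in data: one species at 30 sites along a mud gradient.
rng = np.random.default_rng(0)
mud = rng.uniform(5, 40, size=30)                               # hypothetical mud content
abundance = 1.0 + 0.1 * mud + rng.normal(scale=0.5, size=30)    # roughly linear response

grid = np.linspace(mud.min(), mud.max(), 50)     # points along the gradient
curve = kernel_smooth(mud, abundance, grid, bandwidth=5.0)

print("smoothed abundance at low / high mud:", round(curve[0], 2), round(curve[-1], 2))
```

A curve that rises or falls steadily over the sampled range supports linear methods such as PCA and RDA, while a curve with a clear peak inside the range would point towards CA or CCA.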

 

Conclusions based on PCA and RDA

Coenoclines indicated that most species-environmental relationships are linear, and as a consequence we applied PCA and RDA instead of CA and CCA. RDA revealed various interesting species-environmental relations, but the first two RDA axes explained only 28.99% of the total species variance. This is a clear indication that an important gradient was not measured in the experiment.

 

Discriminant analysis

In this section we focus on a slightly different question: are species interactions similar in the 3 transects? Discriminant analysis (DA), as implemented in Brodgar, can be used to answer this question. An explanation of DA and various practical applications are presented elsewhere on this web site. The species are used as 8 response variables and the 30 samples were divided into three groups, namely:

- group 1: samples A1,...,A10
- group 2: samples B1,...,B10
- group 3: samples C1,...,C10

A discriminant analysis on this grouping was applied. Group average and canonical coefficients are presented in Figures 6 and 7 respectively.

Figure 6. Group averages and sample scores. The numbers 1, 2 and 3 refer to the transects A, B and C respectively. Triangles represent group averages. 

Figure 7. Canonical coefficients.
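
The core computation behind such a discriminant analysis can be sketched as an eigenvalue problem on the within-group and between-group sums of squares. The example below is a minimal illustration on synthetic data with 3 groups of 10 samples and 8 response variables. It is not Brodgar's implementation, and the scaling of its eigenvalues may differ from Brodgar's lambda.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in data: 3 transects x 10 sites, 8 response variables.
n_groups, n_per_group, n_vars = 3, 10, 8
groups = np.repeat(np.arange(n_groups), n_per_group)
group_means = rng.normal(scale=2.0, size=(n_groups, n_vars))
Y = group_means[groups] + rng.normal(size=(n_groups * n_per_group, n_vars))

grand_mean = Y.mean(axis=0)
W = np.zeros((n_vars, n_vars))    # within-group sums of squares and products
B = np.zeros((n_vars, n_vars))    # between-group sums of squares and products
for g in range(n_groups):
    Yg = Y[groups == g]
    mean_g = Yg.mean(axis=0)
    W += (Yg - mean_g).T @ (Yg - mean_g)
    d = (mean_g - grand_mean)[:, None]
    B += len(Yg) * (d @ d.T)

# Discriminant axes are the eigenvectors of W^-1 B.
eigvals = np.linalg.eigvals(np.linalg.solve(W, B)).real
eigvals = np.sort(eigvals)[::-1]

# Only (number of groups - 1) eigenvalues are non-zero.
n_axes = int(np.sum(eigvals > 1e-8 * eigvals[0]))
percent = 100 * eigvals[:n_axes] / eigvals[:n_axes].sum()
print("number of discriminant axes:", n_axes)
print("eigenvalues as %:", np.round(percent, 3))
```

With three groups the between-group matrix has rank 2, so two axes always carry all of the discriminating information, whatever the number of response variables.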

Figure 6 indicates that there is a clear discrimination between the samples from the three groups. The eigenvalues of the first two axes (and other information) can be obtained by clicking on the button "Numerical information".

Eigenvalues (=lambda)
axis 	lambda 	lambda as % 	lambda cumulative %
1 	8.961 	65.855 		65.855
2 	4.646 	34.145 		100.000

Because there are three groups, the total number of axes is 2. Hence, the first two axes show 100% of the information. The first eigenvalue is approximately twice as large as the second. This means that the distinctions between transects A and C are more important than the differences between B and the other two. The dimensionality tests in Brodgar show that the discriminations along both axes are significant: the test statistics exceed the critical values of a chi-squared test with 16 and 7 degrees of freedom, which are 26.296 and 14.067 respectively at the 5% significance level.

Dimensionality tests for group separation
See p. 211-212 in Huberty (1994)
H0: No separation on any dimension
B0= 94.696
B0 is a chi-squared statistic with degrees of freedom: 16.000

H0: Separation on at most one dimension
B1= 40.677
B1 is a chi-squared statistic with degrees of freedom: 7.000
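
The numbers in this output can be cross-checked with a few lines of arithmetic. The sketch below recomputes the eigenvalue percentages and compares each test statistic with its critical value. The degrees-of-freedom formula (p - k)(g - k - 1), for p variables, g groups and k dimensions already accounted for, is the standard one for this type of test and reproduces the 16 and 7 reported above; whether Brodgar computes them this way is an assumption.

```python
# Eigenvalues reported for the two discriminant axes
eigenvalues = [8.961, 4.646]
total = sum(eigenvalues)
percent = [100 * lam / total for lam in eigenvalues]

n_variables, n_groups = 8, 3

def degrees_of_freedom(k):
    """df for H0: separation on at most k dimensions (assumed formula, see lead-in)."""
    return (n_variables - k) * (n_groups - k - 1)

# (name, test statistic, 5% critical value from the text, degrees of freedom)
tests = [
    ("B0", 94.696, 26.296, degrees_of_freedom(0)),
    ("B1", 40.677, 14.067, degrees_of_freedom(1)),
]

print("eigenvalues as %:", [round(p, 2) for p in percent])
for name, statistic, critical, df in tests:
    verdict = "reject H0" if statistic > critical else "do not reject H0"
    print(f"{name} = {statistic} with {df} df, critical value {critical}: {verdict}")
```

Both statistics lie well above their critical values, so both null hypotheses are rejected at the 5% level. The recomputed percentages differ from the printed ones only in the last decimal, because the eigenvalues shown are rounded.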

The canonical correlations indicate that transect C is different because of H. similis (S and A) and N. succinea (S and A). The behavior of U. uruguayensis allows for the discrimination of transect B. Obviously, not every species is equally important for discriminating between the three transects. Brodgar carries out a backward selection procedure (see the Egyptian skull example for details). Results are presented in Table 3. The species H. similis S, L. acuta S and N. succinea S were the most important species for discriminating between the samples. L. acuta A, U. uruguayensis S and H. similis A were the least important.

The biological interpretation of this analysis is that species interactions at the three transects are not the same. Especially H. similis S, L. acuta S and N. succinea S showed rather different behavior at some of the transects.

 

Table 3. Decrease in total sum of Mahalanobis distances between group means if a particular species is omitted from the analysis. The naive rank indicates the importance of a species. A low rank (1, 2, 3) means that the species was important  for discriminating between the transects, a high rank (8, 7, 6) indicates that its contribution was minor.

species            Decrease in total sum of Mahalanobis distances between group means  naive rank
L. acuta S         75.19                                                               2
H. similis S       63.59                                                               1
U. uruguayensis S  108.89                                                              7
N. succinea S      76.81                                                               3
L. acuta A         109.18                                                              8
H. similis A       108.34                                                              6
U. uruguayensis A  94.26                                                               5
N. succinea A      81.77                                                               4
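
The naive rank in Table 3 can be reproduced by sorting the tabulated values. The sketch below simply ranks the species by the second column in ascending order, which matches the ranks printed in the table; it makes no claim about how Brodgar derives the values themselves.

```python
# Second column of Table 3, keyed by species
table3 = {
    "L. acuta S": 75.19,
    "H. similis S": 63.59,
    "U. uruguayensis S": 108.89,
    "N. succinea S": 76.81,
    "L. acuta A": 109.18,
    "H. similis A": 108.34,
    "U. uruguayensis A": 94.26,
    "N. succinea A": 81.77,
}

# Rank 1 goes to the smallest tabulated value, rank 8 to the largest.
ordered = sorted(table3, key=table3.get)
naive_rank = {species: rank for rank, species in enumerate(ordered, start=1)}

for species in table3:
    print(f"{species:18s} {table3[species]:7.2f}  rank {naive_rank[species]}")
```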
