Hunting Spider Data

The aim of this example is to show how canonical correspondence analysis can be used in Brodgar to detect species-environment relationships. The data set concerns the distribution of wolf spiders in a dune area. The data were originally published by van der Aart & Smeenk-Enserink (1975) and have since been used in various publications to illustrate ordination techniques.

The original data consist of counted abundances of twelve species captured at 100 sites (pitfall traps) in a dune area in The Netherlands. Six environmental variables were measured at 28 sites: water content, bare sand, moss cover, light reflection, fallen twigs and herb cover. We use the observed abundances at the 28 sites where the environmental variables were monitored. The species data are square-root transformed. The 12 species are listed in Table 1.
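Taking square roots shrinks large counts much more than small ones, so a few very abundant species dominate the analysis less. A minimal sketch of the transformation in Python, assuming a sites-by-species table of counts; the numbers below are made up for illustration and are not the spider data:

```python
import math

# Hypothetical sites-by-species counts (illustrative only, not the spider data).
counts = [
    [0, 4, 25],
    [9, 1, 0],
    [16, 0, 2],
]


def sqrt_transform(table):
    """Return a new table in which every count is replaced by its square root."""
    return [[math.sqrt(value) for value in row] for row in table]


transformed = sqrt_transform(counts)
for row in transformed:
    print(row)
```

In the first site, the counts 25 and 4 (ratio 6.25) become 5 and 2 (ratio 2.5) after the transformation.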

Table 1: Twelve spider species

No  Species                No  Species
1   Arctosa lutetiana      7   Trochosa terricola
2   Pardosa lugubris       8   Alopecosa cuneata
3   Zora spinimana         9   Pardosa monticola
4   Pardosa nigriceps      10  Alopecosa accentuata
5   Pardosa pullata        11  Alopecosa fabrilis
6   Aulonia albimana       12  Arctosa perita


Note that the geographical coordinates of the 28 sites are not available, so geostatistical techniques such as variograms and kriging cannot be applied. To identify relationships between the 12 spider species and the environmental variables, canonical correspondence analysis (CCA) was applied. The resulting triplot is presented in Figure 1.

Figure 1: Snapshot of the triplot produced by Brodgar. Colors, fonts and font sizes of the labels and lines can be changed. High-quality graphical output can be obtained by exporting the graph in WMF format or by copying and pasting it directly into Word.

A triplot visualises the relationships between species, environmental variables and samples. We now explain how to read such a graph. The blue lines in Figure 1 correspond to the environmental variables, the black lines represent the 12 species, and the samples are denoted by their numbers. The triplot is interpreted as follows:

• Blue lines pointing in the same direction indicate that the corresponding explanatory variables are positively correlated with each other. Examples are moss cover and light reflection. Longer lines indicate more important variables than shorter ones.
• Lines pointing in opposite directions indicate negatively correlated variables; see, for example, the lines for water content and bare sand. It comes as no surprise that these two are negatively correlated. Lines at an angle of about 90 degrees indicate that the two variables are (approximately) uncorrelated. An example is moss cover and herb cover.
• The same interpretation holds for the species. For example, A. perita (denoted by arctperi) and A. fabrilis (Alopfabr) are highly correlated.
• The heads of the species lines and the lines for the environmental variables can be interpreted as a biplot. The same holds for (i) the species lines and the sample points, and (ii) the environmental lines and the sample points.
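The angle rule above can be made concrete: the cosine of the angle between two lines is read as an approximation of the correlation between the two variables (close to 1 for lines in the same direction, close to -1 for opposite lines, close to 0 at 90 degrees). A minimal Python sketch; the arrow coordinates are hypothetical values chosen for illustration, not read from Figure 1:

```python
import math


def arrow_angle_and_cosine(a, b):
    """Angle (degrees) between two 2-D arrows, and the cosine of that angle.

    In an ordination diagram the cosine is read as an approximate correlation.
    """
    dot = a[0] * b[0] + a[1] * b[1]
    cosine = dot / (math.hypot(a[0], a[1]) * math.hypot(b[0], b[1]))
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cosine)), cosine


# Hypothetical arrow tips (axis 1, axis 2); NOT taken from Figure 1.
water = (-0.8, 0.3)
bare_sand = (0.8, -0.3)
moss = (0.5, 0.6)
herb = (-0.6, 0.5)

pairs = {"water vs bare sand": (water, bare_sand), "moss vs herb": (moss, herb)}
for name, (u, v) in pairs.items():
    angle, cosine = arrow_angle_and_cosine(u, v)
    print(f"{name}: angle {angle:.1f} deg, cosine {cosine:.2f}")
```

The cosine ignores line length; the length is read separately, as an indication of how important a variable is.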

In general, these relationships are easier to interpret when each pair of score sets is presented as a separate biplot rather than as a triplot; see Figure 2 for the species and sample scores.

Figure 2. Biplot of species and sample scores

The sample scores can be projected onto the species lines in Figure 2, indicating at which sites a particular species behaved differently from the average pattern. Had Figure 2 been the output of a PCA or RDA biplot, the interpretation would have been: at which sites a particular species was abundant. This also illustrates the difference between PCA and RDA on the one hand, and CA and CCA on the other. In the latter two techniques, we look at the deviations from the average pattern (or average profile).
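The "deviation from the average profile" idea can be illustrated with the standardised deviations (observed - expected) / sqrt(expected), where the expected count assumes every site follows the average species profile. Up to a constant factor, this is the matrix that correspondence analysis decomposes; CCA additionally restricts the axes to linear combinations of the environmental variables. The sketch below only computes the deviations (it does not perform the ordination), and the tables are hypothetical:

```python
import math


def ca_deviations(table):
    """Standardised deviations (observed - expected) / sqrt(expected).

    'expected' is the count a cell would have if every site followed the
    average species profile: row total * column total / grand total.
    Assumes no site (row) or species (column) has a total of zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    deviations = []
    for i, row in enumerate(table):
        dev_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            dev_row.append((observed - expected) / math.sqrt(expected))
        deviations.append(dev_row)
    return deviations


# Hypothetical tables: two sites (rows) by two species (columns).
same_profile = [[30, 10], [60, 20]]       # site 2 has twice the counts, same proportions
different_profile = [[30, 10], [10, 30]]  # the two sites have opposite profiles

print(ca_deviations(same_profile))
print(ca_deviations(different_profile))
```

A PCA on the raw counts of same_profile would see site 2 as having much more of species 1 than site 1, whereas all deviations are zero here because both sites share the average profile.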

Figure 2 shows that P. lugubris behaved rather differently at various sites; indeed, some authors have analysed this data set without P. lugubris. The biplot of environmental and sample scores is presented in Figure 3.

Figure 3. Biplot of environmental and sample scores

By projecting the sample scores onto the blue lines, the environmental conditions at the sites can be inferred. Brodgar also allows the user to produce graphs containing only one set of scores; see Figure 4.
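Projecting a sample point onto an environmental line amounts to taking the dot product of the point with the line's direction divided by the line's length. A minimal Python sketch that ranks sites along one variable; the coordinates are hypothetical, not read from Figure 3:

```python
import math


def project_onto_arrow(point, arrow):
    """Signed length of the orthogonal projection of a sample point onto an arrow.

    A larger positive value suggests the variable is higher than its
    (weighted) average at that site; negative values suggest lower.
    """
    length = math.hypot(arrow[0], arrow[1])
    return (point[0] * arrow[0] + point[1] * arrow[1]) / length


# Hypothetical coordinates (axis 1, axis 2); NOT read from Figure 3.
bare_sand = (0.8, -0.3)
samples = {"site A": (1.2, -0.4), "site B": (-0.9, 0.2), "site C": (0.1, 0.9)}

ranking = sorted(
    samples,
    key=lambda name: project_onto_arrow(samples[name], bare_sand),
    reverse=True,
)
print(ranking)
```

Sorting by the projection gives the order of the sites along the bare-sand line, from most to least bare sand as inferred from the diagram.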

Figure 4. Sample scores

 

The numerical output produced by Brodgar is as follows:

Numerical output for canonical correspondence analysis

Column 1: axis
Column 2: eigenvalue
Column 3: eigenvalue as percentage of total inertia
Column 4: idem, but cumulative
Column 5: eigenvalue as percentage of sum of all canonical eigenvalues
Column 6: idem, but cumulative

Col 1   Col 2   Col 3    Col 4    Col 5    Col 6
1       0.502   43.663   43.663   49.530   49.530
2       0.181   15.762   59.425   17.880   67.410
3       0.062    5.433   64.858    6.163   73.573
4       0.019    1.627   66.485    1.845   75.418

Total inertia or total variance:
1.149

Sum of all canonical eigenvalues:
1.013
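Columns 3 to 6 follow directly from the eigenvalues and the two totals. A minimal Python sketch that recomputes them from the printed values; because the printed eigenvalues are rounded to three decimals, the recomputed percentages can differ from Brodgar's output in the second decimal:

```python
# Eigenvalues of the first four axes and the two totals, as printed above.
eigenvalues = [0.502, 0.181, 0.062, 0.019]
total_inertia = 1.149
sum_canonical = 1.013


def percentage_columns(eigs, total, canonical):
    """Rebuild columns 1-6: axis, eigenvalue, % of total inertia, cumulative,
    % of the sum of canonical eigenvalues, cumulative."""
    rows = []
    cum_total = cum_canonical = 0.0
    for axis, eig in enumerate(eigs, start=1):
        pct_total = 100 * eig / total
        pct_canonical = 100 * eig / canonical
        cum_total += pct_total
        cum_canonical += pct_canonical
        rows.append((axis, eig, pct_total, cum_total, pct_canonical, cum_canonical))
    return rows


for row in percentage_columns(eigenvalues, total_inertia, sum_canonical):
    print("{:d}  {:.3f}  {:.3f}  {:.3f}  {:.3f}  {:.3f}".format(*row))
```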

The second column contains the eigenvalues. The fourth column gives the cumulative amount of variation (inertia) explained by the axes, as a percentage of the total variation. Hence, the first two axes in Figure 2 represent 59.43% of the total variation. The sixth column, in contrast, gives the cumulative amount of variation explained by the axes as a percentage of the variation that can be explained by the explanatory variables. In this example, the first two axes in Figure 1 represent 67.41% of the variation that can be explained by the explanatory variables. If these two percentages differ considerably, this indicates that explanatory variables other than those used are important. In other words, there may be another important explanatory variable that was not used in the analysis (or perhaps was not measured).
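Because columns 4 and 6 share the same numerator (the summed eigenvalues of the first m axes), their ratio is the same for every axis and equals the sum of canonical eigenvalues divided by the total inertia. The gap between the two percentages therefore reflects how much of the total inertia the explanatory variables can explain at all. A worked check with the values above:

```latex
\frac{\text{column 4}}{\text{column 6}}
  = \frac{\sum_{k=1}^{m}\lambda_k \,/\, 1.149}{\sum_{k=1}^{m}\lambda_k \,/\, 1.013}
  = \frac{1.013}{1.149} \approx 0.882,
\qquad
\frac{59.425}{67.410} \approx 0.882 \quad (m = 2)
```

So the explanatory variables account for about 88.2% of the total inertia, leaving about 11.8% unexplained by the measured variables.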

The results of CCA can be summarised as follows:

• Various explanatory variables were correlated with each other.
• One species, P. lugubris, behaved rather differently. The sites at which this happened had many fallen twigs.
• The three species A. perita, A. fabrilis and A. cuneata behaved very similarly to each other, but differently from the other species. These species preferred a sandy environment.
• Most of the other species occurred at sites with a lot of herb cover.
• There were no important unmeasured gradients.

 

To produce Figure 1 in Brodgar, download the spiders.xls Excel spreadsheet and copy and paste the data into Brodgar via the data import process. Click on the "Data exploration" button, click on the tab labeled "Dimension reduction techniques", select canonical correspondence analysis, and click "Go". Figure 1 will now appear on your screen.