MAFA applied to zoobenthic time series data

MAFA stands for min/max autocorrelation factor analysis. The implementation of MAFA in Brodgar is based on:

- Solow, A.R. (1994). Detecting Change in the Composition of a Multispecies Community. Biometrics 50, 556-565.
- Shapiro, D.E. and Switzer, P. (1989). Extracting time trends from multiple monitoring sites. Technical Report No. 132. Department of Statistics, Stanford University, California.

MAFA can be described in various ways, e.g.

- A type of principal component analysis especially for (short) time series.
- A method for extracting trends from multiple time series.
- A method for estimating index functions from time series.
- A smoothing method.
- A signal extraction procedure.

The aim of this example is to show how MAFA can be used in Brodgar to detect changes in a multispecies community. To apply MAFA, click the "Time Series" button in the main menu, select the "General" tab, and click the "Go" button for MAFA.

Data 

In an intertidal area, abundances of 17 zoobenthic species were measured at various sites*. Sampling has taken place annually since 1970. For illustration purposes, abundances were summed over all sites, although the methodology does not require this. The species used in this example are given in Table 1. Figure 1 shows a plot of all 17 time series.

Table 1. List of zoobenthic species used in the MAFA example.

No  Species                  No  Species
1   Corophium spec.          10  Heteromastus filiformis
2   Eteone longa             11  Mytilus edulis
3   Arenicola marina         12  Nereis spec.
4   Hydrobia ulvae           13  Phyllodoce sp.
5   Cerastoderma edule       14  Mya arenaria
6   Ensis directus           15  Tellina tenuis
7   Macoma balthica          16  Scoloplos armiger
8   Marenzelleria wireni     17  Nephtys hombergii
9   Lanice conchilega

 

Figure 1: Time series plot of all 17 zoobenthic species.

  

MAFA

In principal component analysis, the first axis explains the most variance. In MAFA, the first axis has the highest auto-correlation at time lag 1, the second axis has the second highest auto-correlation at time lag 1, and so on. The underlying idea is that a trend is associated with high auto-correlation at time lag 1. Therefore, the first MAFA axis represents the trend, or the main underlying pattern, in the data. This axis can also be seen as an index function or smoothing curve. In summary, MAFA can be seen as a PCA-type analysis in which the axes represent trends.

Figure 2 shows the first two MAFA axes for the zoobenthic data. The first axis increases during 1973-1982 and 1992-2000; this is the main trend underlying the time series. The second axis decreases from 1970 until 1986 and increases thereafter (except for the period 1994-1995).
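The idea of ranking axes by their lag-1 auto-correlation can be sketched in Python with NumPy. This is a minimal illustration of one common min/max autocorrelation factor formulation, not Brodgar's implementation. The function name `mafa_axes`, the standardisation of each series, the use of mean squared successive differences, and the approximation r = 1 - lambda/2 are all assumptions made for this example.

```python
import numpy as np

def mafa_axes(X):
    """Sketch of min/max autocorrelation factors (hypothetical helper).

    X has one row per time point and one column per series.
    Returns (axes, autocorr, weights), with axes ordered from the
    highest to the lowest approximate lag-1 auto-correlation.
    """
    X = np.asarray(X, dtype=float)
    # ASSUMPTION: each series is centred and scaled to unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    n_time = Z.shape[0]

    # Correlation matrix of the series.
    S = Z.T @ Z / (n_time - 1)
    # Mean squared successive differences (not re-centred).
    D = np.diff(Z, axis=0)
    S_d = D.T @ D / (n_time - 1)

    # Whiten with respect to S; this fails if S is singular.
    evals, evecs = np.linalg.eigh(S)
    W = evecs / np.sqrt(evals)

    # Small eigenvalues of the whitened difference matrix correspond
    # to smooth (highly auto-correlated) linear combinations.
    lam, V = np.linalg.eigh(W.T @ S_d @ W)  # ascending order

    weights = W @ V
    axes = Z @ weights            # unit-variance axes, smoothest first
    autocorr = 1.0 - lam / 2.0    # approximate lag-1 auto-correlation
    return axes, autocorr, weights
```

Because `numpy.linalg.eigh` returns eigenvalues in ascending order, the first column of `axes` is the smoothest linear combination of the series, which is the role the text assigns to the first MAFA axis.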

Figure 2: First (left) and second (right) MAFA axes.

 

Just as in PCA, loadings are estimated, and these can be used to infer which species are related to a particular MAFA axis. Another option, used here, is to calculate the cross-correlation between each MAFA axis and each of the original species time series. We call these canonical correlations. Figures 3 and 4 show the canonical correlations for the first two MAFA axes. The results indicate that the first MAFA axis is important for A. marina, H. ulvae, C. edule, E. directus, M. balthica, H. filiformis, M. edulis, Phyllodoce sp., M. arenaria and S. armiger. Hence, all these species are characterised by a general increase (or a decrease, if the correlation is negative) in abundance. In Figure 5, the original time series of some of these species are highlighted, and one can indeed see a general increase. A similar graph can be made for some of the species related to the second MAFA axis.
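The correlations between axes and species described above can be computed as ordinary Pearson correlations at lag 0. The sketch below assumes that this is what is meant by canonical correlations here; the function name `canonical_correlations` and the array layout are hypothetical.

```python
import numpy as np

def canonical_correlations(axes, X):
    """Pearson correlation between each axis and each original series.

    axes: array with one row per time point and one column per axis
          (a single axis may be passed as a 1-D array).
    X:    array with one row per time point and one column per species.
    Returns an array with one row per species and one column per axis.
    """
    axes = np.asarray(axes, dtype=float)
    X = np.asarray(X, dtype=float)
    if axes.ndim == 1:
        axes = axes[:, None]
    # Standardise both sets of series, then average the cross-products.
    za = (axes - axes.mean(axis=0)) / axes.std(axis=0, ddof=1)
    zx = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return zx.T @ za / (X.shape[0] - 1)
```

A species with a correlation close to +1 follows the pattern of the axis, and one with a correlation close to -1 follows its mirror image.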

Figure 3: Canonical correlations for MAFA axis 1.

 

  

Figure 4: Canonical correlations for MAFA axis 2.

Figure 5: A few of the original time series highlighted.

 

Explanatory variables

If explanatory variables are selected during the data import process, Brodgar will automatically estimate the cross-correlations between the MAFA axes and all explanatory variables. For this data set, water temperature was measured as well. The correlations can be obtained by clicking the "Numerical output" button in the MAFA window. For these data, we have:

Correlations between explanatory variables and MAFA
1 	0.08 	0.31 	0.17 	-0.01 	0.04
Significance level for correlations: 0.37

Hence, the cross-correlation between temperature (the only explanatory variable) and the first MAFA axis is 0.08, and that with the second MAFA axis is 0.31. Both are smaller in absolute value than the reported significance level of 0.37, as are the correlations with axes 3 to 5. In summary, temperature is not significantly related to any of the first five MAFA axes.
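The comparison against the significance level can be written out explicitly. The correlations and the value 0.37 below are taken from the numerical output above. The text does not say how Brodgar derives 0.37; the helper `rule_of_thumb_threshold` uses the widely quoted approximation 2/sqrt(n) for two independent series of length n and is included purely as an assumption.

```python
import math

# Values copied from the numerical output for temperature.
correlations = [0.08, 0.31, 0.17, -0.01, 0.04]
threshold = 0.37

def flag_significant(correlations, threshold):
    """True where a correlation reaches the threshold in absolute value."""
    return [abs(r) >= threshold for r in correlations]

def rule_of_thumb_threshold(n_time_points):
    """ASSUMPTION: approximate 5% bound 2/sqrt(n); not confirmed for Brodgar."""
    return 2.0 / math.sqrt(n_time_points)

for axis, (r, sig) in enumerate(
        zip(correlations, flag_significant(correlations, threshold)), start=1):
    print(f"MAFA axis {axis}: r = {r:+.2f}, significant = {sig}")
```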

 

P-values of the MAFA axes

Solow (1994) described a randomization process to obtain p-values for the MAFA axes, and this has been implemented in Brodgar. The p-values can be used to decide how many axes to present. By default, p-values are not calculated, because the randomization process can be time-consuming for large data sets. To obtain p-values, click the "Settings" button in the main "Time Series" menu before MAFA is selected. Results for the zoobenthic data set are as follows:

Axis  Auto-correlation lag 1  P-value
1     0.969                   0.000
2     0.902                   0.006
3     0.867                   0.006
4     0.715                   0.238

These results indicate that MAFA axes 1, 2 and 3 are significant, and that the fourth and, consequently, all higher axes are not.
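A randomization test of this kind can be sketched as follows. The sketch assumes that the time order of the observations is shuffled, the analysis is repeated, and the p-value of an axis is the proportion of shuffles whose auto-correlation for that axis is at least as large as the observed one. These details, the function names and the default of 200 shuffles are assumptions for illustration and may differ from Solow (1994) and from Brodgar.

```python
import numpy as np

def maf_autocorrelations(X):
    """Approximate lag-1 auto-correlations of the axes, largest first.

    ASSUMPTION: same min/max autocorrelation factor sketch as a
    generalised eigenproblem on standardised series.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    n_time = Z.shape[0]
    S = Z.T @ Z / (n_time - 1)
    D = np.diff(Z, axis=0)
    S_d = D.T @ D / (n_time - 1)
    evals, evecs = np.linalg.eigh(S)
    W = evecs / np.sqrt(evals)
    lam = np.linalg.eigvalsh(W.T @ S_d @ W)  # ascending order
    return 1.0 - lam / 2.0

def randomisation_pvalues(X, n_perm=200, seed=0):
    """Return (observed auto-correlations, p-values), one per axis."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    observed = maf_autocorrelations(X)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        # Shuffling the rows destroys the temporal structure but
        # keeps the relationships between the series intact.
        shuffled = X[rng.permutation(X.shape[0])]
        exceed += maf_autocorrelations(shuffled) >= observed
    return observed, exceed / n_perm
```

The loop repeats the full analysis once per shuffle, which illustrates why the calculation can be slow for large data sets.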

*For copyright reasons, we used an artificial data set, which was generated from a real zoobenthic time series data set. The artificial data set therefore shows the same characteristics as any other zoobenthic time series data set.

 

Missing values

In order to apply MAFA, Brodgar replaces the missing values in the response variables (and explanatory variables) by the mean values of the corresponding variables. Alternatively, the user can fill in the missing values by more appropriate techniques prior to the data import process.
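Replacing missing values by the mean of the corresponding variable can be sketched as below. The sketch assumes that missing values are coded as NaN and that each column is one variable; the function name is hypothetical.

```python
import numpy as np

def fill_missing_with_column_means(X):
    """Return a copy of X in which each NaN is replaced by its column mean."""
    X = np.array(X, dtype=float)           # np.array makes a copy
    col_means = np.nanmean(X, axis=0)      # means that ignore NaN
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X
```

Mean substitution is simple but flattens the series at the filled time points, which is one reason to impute with a more appropriate technique before import when many values are missing.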

 

What can go wrong

MAFA calculates two correlation matrices, and numerical problems will arise if these matrices are singular. One way to solve singularity problems is to remove one of the response variables that has a very high cross-correlation with another. Such high correlations can be detected from the "Data exploration" menu by clicking the "Detect nearly linear relationships" button. You also need more time points than variables.
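Both conditions can be checked before running the analysis. The sketch below is not Brodgar's "Detect nearly linear relationships" tool: it only inspects pairwise correlations, and the limit of 0.95 and the function name are assumptions.

```python
import numpy as np

def mafa_input_warnings(X, names=None, r_limit=0.95):
    """List likely causes of singular correlation matrices.

    X has one row per time point and one column per variable.
    ASSUMPTION: pairs with |r| >= r_limit are treated as nearly collinear.
    """
    X = np.asarray(X, dtype=float)
    n_time, n_var = X.shape
    if names is None:
        names = [f"var{j + 1}" for j in range(n_var)]

    warnings = []
    if n_time <= n_var:
        warnings.append(
            f"only {n_time} time points for {n_var} variables; "
            "more time points than variables are needed")

    R = np.corrcoef(X, rowvar=False)
    for i in range(n_var):
        for j in range(i + 1, n_var):
            if abs(R[i, j]) >= r_limit:
                warnings.append(
                    f"{names[i]} and {names[j]} are nearly collinear "
                    f"(r = {R[i, j]:.2f})")
    return warnings
```

Pairwise correlations do not reveal every linear dependence, for example one variable that is the sum of several others, so an empty list does not guarantee that the matrices are non-singular.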