Detrended correspondence analysis


Detrended correspondence analysis (DCA) is a multivariate statistical technique widely used by ecologists to find the main factors or gradients in the large, species-rich but usually sparse data matrices typical of ecological community data. DCA is frequently used to suppress artifacts that most other multivariate analyses produce when applied to gradient data.

History

DCA was created in 1979 by Mark Hill of the United Kingdom's Institute for Terrestrial Ecology. It is an extension of correspondence analysis and was implemented in a FORTRAN code package called DECORANA. DCA is sometimes erroneously referred to as DECORANA. However, DCA is the underlying algorithm, while DECORANA is a program that implements it.

Issues addressed

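According to Hill and Gauch, DCA suppresses two artifacts that most other multivariate analyses produce when applied to gradient data. An example of gradient data is a time series of plant species colonising a new habitat: early-successional species are replaced by mid-successional species, and these are then replaced by late-successional ones. A standard ordination such as correspondence analysis produces two artifacts when applied to such data:
the edge effect, in which the sample scores at the beginning and end of the succession are compressed relative to those in the middle;
the arch (or horseshoe) effect, in which the points follow a curve rather than a straight line, even though species composition changes steadily and linearly along a single gradient.

The same artifacts also occur outside ecology whenever gradient data are analysed. The curve appears because the data themselves lie on a curve in multivariate space, and the curved projection represents that shape accurately.

Ter Braak and Prentice cite a simulation study of two-dimensional species-packing models in which DCA performed better than CA.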

Method

DCA is an iterative algorithm that has proved a reliable and useful tool for exploring and summarising data in community ecology. It works in three steps:
1. It runs a standard ordination (correspondence analysis) on the data. This produces the initial horseshoe curve, in which the second ordination axis is essentially a curved function of the first.
2. It divides the first axis into segments and shifts each segment so that its mean value on the second axis is zero. This effectively flattens the curve.
3. It rescales the first axis so that its ends are no longer compressed relative to its middle. After rescaling, one DCA unit (often called an SD unit) corresponds approximately to the same rate of species turnover throughout the data.

As a rule of thumb, an axis length of 4 DCA units indicates a complete turnover in community composition.
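The segment-wise detrending step can be sketched in a few lines. This is a minimal, simplified sketch under stated assumptions. It uses equal-width segments, unweighted means and no smoothing between neighbouring segments, so it is not DECORANA's exact procedure. The function name `detrend_by_segments` and its `n_segments` parameter are hypothetical.

```python
def detrend_by_segments(axis1, axis2, n_segments=5):
    """Centre axis-2 scores within equal-width segments of axis 1.

    Simplified, hypothetical sketch: unweighted segment means, no smoothing
    across segments (real DCA implementations differ in these details).
    """
    lo, hi = min(axis1), max(axis1)
    width = (hi - lo) / n_segments
    if width == 0:
        width = 1.0  # all axis-1 scores equal: everything falls in segment 0
    # Assign each sample to a segment; the maximum score goes in the last one
    seg = [min(int((a - lo) / width), n_segments - 1) for a in axis1]
    sums = [0.0] * n_segments
    counts = [0] * n_segments
    for s, y in zip(seg, axis2):
        sums[s] += y
        counts[s] += 1
    means = [sums[k] / counts[k] if counts[k] else 0.0 for k in range(n_segments)]
    # Subtracting each segment's mean gives it mean zero on axis 2
    return [y - means[s] for s, y in zip(seg, axis2)]


# Toy arch: axis 2 is a quadratic function of axis 1
axis1 = [i / 2 for i in range(-4, 5)]
axis2 = [a * a for a in axis1]
flat = detrend_by_segments(axis1, axis2, n_segments=4)
print(flat)
```

After detrending, the residual arch is much flatter than the original. A finer segmentation, or smoothing between segments as real implementations do, flattens it further.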
Ter Braak and Prentice warn against the non-linear rescaling of the axes because of robustness problems. They recommend using detrending by polynomials only.
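Detrending by polynomials replaces the segment means with a fitted polynomial. The second-axis scores are regressed on a polynomial in the first-axis scores, and only the residuals are kept. The sketch below assumes `numpy` and makes three further assumptions:
- The detrending is applied once, after the fact, with an unweighted fit. Real implementations detrend inside the iterative ordination and use weights.
- The polynomial degree is chosen arbitrarily.
- The function name `detrend_by_polynomial` is hypothetical.

```python
import numpy as np


def detrend_by_polynomial(axis1, axis2, degree=2):
    """Remove the part of axis 2 explained by a polynomial in axis 1.

    Hypothetical sketch: unweighted least squares, applied once after ordination.
    """
    axis1 = np.asarray(axis1, dtype=float)
    axis2 = np.asarray(axis2, dtype=float)
    coeffs = np.polyfit(axis1, axis2, degree)  # highest power first
    return axis2 - np.polyval(coeffs, axis1)


x = np.linspace(-1.0, 1.0, 21)
arch = 2 * x**2 - 1 + 0.05 * np.sin(7 * x)  # an arch plus a small extra signal
residual = detrend_by_polynomial(x, arch)
print(np.abs(residual).max())
```

The residuals are orthogonal to 1, x and x². The arch is therefore removed completely, while the part of the small extra signal that a quadratic cannot explain is kept.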

Drawbacks

No significance tests are available with DCA. There is, however, a constrained version called detrended canonical correspondence analysis (DCCA). In DCCA, the axes are forced by multiple linear regression to correlate optimally with a linear combination of other variables, such as environmental measurements. This allows a null model to be tested by Monte Carlo permutation.
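The idea behind a Monte Carlo permutation test can be sketched generically. Under the null model, the observed statistic is compared with the values obtained after randomly shuffling the explanatory variable among samples. This sketch is a deliberate simplification. It uses the absolute Pearson correlation between sample scores and one variable as the test statistic, whereas DCCA software uses statistics derived from the constrained ordination itself. The names `pearson` and `permutation_test` are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_test(scores, env, n_perm=999, seed=0):
    """Monte Carlo p-value for |r| between sample scores and a variable.

    Simplified, hypothetical statistic; the observed value counts as one
    of the n_perm + 1 arrangements.
    """
    rng = random.Random(seed)
    observed = abs(pearson(scores, env))
    env_perm = list(env)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(env_perm)  # break any link between scores and env
        if abs(pearson(scores, env_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


gradient = [float(i) for i in range(20)]
env = [2.0 * g + 1.0 for g in gradient]  # toy variable perfectly related to the scores
p = permutation_test(gradient, env)
print(p)
```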

Example

The example shows an idealised data set, with species in rows and samples in columns. Moving along the gradient, each successive sample gains one new species and loses another. The result is a sparse matrix in which a 1 indicates that a species is present in a sample. Except near the edges, each sample contains five species.
1234567891011121314151617181920
SP111100000000000000000
SP211110000000000000000
SP311111000000000000000
SP401111100000000000000
SP500111110000000000000
SP600011111000000000000
SP700001111100000000000
SP800000111110000000000
SP900000011111000000000
SP1000000001111100000000
SP1100000000111110000000
SP1200000000011111000000
SP1300000000001111100000
SP1400000000000111110000
SP1500000000000011111000
SP1600000000000001111100
SP1700000000000000111110
SP1800000000000000011111
SP1900000000000000001111
SP2000000000000000000111
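The matrix above follows a simple rule: species i occurs in samples i-2 through i+2, clipped at the first and last sample. The short sketch below regenerates it from that rule and counts the species in each sample. The function name `build_example_matrix` and its parameters are hypothetical.

```python
def build_example_matrix(n_species=20, n_samples=20, half_width=2):
    """Presence/absence matrix for the idealised gradient (hypothetical helper).

    Species i (0-based) occurs in samples i - half_width .. i + half_width;
    samples outside 0 .. n_samples - 1 simply do not exist, which clips the band.
    """
    return [
        [1 if abs(i - j) <= half_width else 0 for j in range(n_samples)]
        for i in range(n_species)
    ]


matrix = build_example_matrix()

# One row per species, in the same 0/1 layout as the table above
for i, row in enumerate(matrix, start=1):
    print(f"SP{i:<3}" + " ".join(str(v) for v in row))

# Number of species present in each sample (column sums)
species_per_sample = [sum(row[j] for row in matrix) for j in range(len(matrix[0]))]
print(species_per_sample)
```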

A plot of the first two axes of a correspondence analysis of this matrix clearly shows two drawbacks of the procedure. The first is the edge effect, in which the points are clustered at the ends of the first axis. The second is the arch effect.
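Both artifacts can also be checked numerically. The sketch below assumes `numpy`. It computes correspondence analysis through the singular value decomposition of the standardised residuals, which is one standard way to compute CA but not the reciprocal-averaging iteration used in DECORANA. It then compares two quantities:
- the spacing of neighbouring samples on axis 1 at the edge and in the middle, which tests the edge effect;
- the correlation between axis 2 and the square of axis 1, which tests the arch effect.

The helper names are hypothetical.

```python
import numpy as np


def example_matrix(n=20, half_width=2):
    """Band matrix of the example: species i present in samples i-2..i+2."""
    idx = np.arange(n)
    return (np.abs(idx[:, None] - idx[None, :]) <= half_width).astype(float)


def ca_sample_scores(X, n_axes=2):
    """CA via SVD of standardised residuals; returns sample (column) scores."""
    P = X / X.sum()
    r = P.sum(axis=1)  # species (row) masses
    c = P.sum(axis=0)  # sample (column) masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    _, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates of the samples on the first n_axes axes
    return (Vt[:n_axes] / np.sqrt(c)).T * sv[:n_axes]


X = example_matrix()
scores = ca_sample_scores(X)
axis1, axis2 = scores[:, 0], scores[:, 1]

# Edge effect: neighbouring samples lie closer together at the end of axis 1
print(abs(axis1[1] - axis1[0]), abs(axis1[10] - axis1[9]))
# Arch effect: axis 2 behaves like a quadratic function of axis 1
print(np.corrcoef(axis2, axis1 ** 2)[0, 1])
```

The signs of CA axes are arbitrary, so only the absolute value of the correlation matters.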