Discussion:
[R-sig-eco] repeated measures NMDS?
Eduard Szöcs
2010-11-08 14:39:31 UTC
Permalink
Hi listers,

I have species and environmental data for 24 sites that were sampled thrice. If I want to analyze the data with NMDS I could run metaMDS on the whole dataset (24 sites x 3 times = 72) and then fit environmental data, but this would be some kind of pseudoreplication given that the samplings are not independent and the gradients may be overestimated, wouldn`t it?

For environmental data a factor could be included for the sampling dates - but this would not be possible for species data.

Is there an elegant way either to aggregate data before ordination or to conduct sth. like a repeated measures NMDS?

Thank you in advance,
Eduard Szöcs
Gavin Simpson
2010-11-08 21:01:40 UTC
Permalink
Post by Eduard Szöcs
Hi listers,
I have species and environmental data for 24 sites that were sampled
thrice. If I want to analyze the data with NMDS I could run metaMDS on
the whole dataset (24 sites x 3 times = 72) and then fit environmental
data, but this would be some kind of pseudoreplication given that the
samplings are not independent and the gradients may be overestimated,
wouldn`t it?
For environmental data a factor could be included for the sampling
dates - but this would not be possible for species data.
Is there an elegant way either to aggregate data before ordination or
to conduct sth. like a repeated measures NMDS?
Thank you in advance,
Eduard Szöcs
Depends on how you want to fit the env data - the pseudo-replication
isn't relevant o the nMDS. If you are doing it via function `envfit()`,
then look at argument `'strata'` which should, in your case, be set to a
factor with 24 levels. This won't be perfect because your data are a
timeseries and, strictly, one should permute them whilst maintaining
their ordering in time, but as yet we don't have these types of
permutations hooked into vegan.

If you are doing the fitting some other way you'll need to include
"site" as a fixed effect factor to account for the within site
correlation.

You don't need to worry about the species data and accounting for
sampling interval. You aren't testing the nMDS "axes" or anything like
that, and all the species info has been reduced to dissimilarities and
thence to a set of nMDS coordinates. You need to account for the pseudo
rep at the environmental modelling level, not the species level.

HTH

G
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Vít Syrovátka
2010-11-08 22:41:01 UTC
Permalink
Hi listers,

I am facing a problem with pseudoreplications, similar to that by
Eduard Szöcs.
However, I want to try to explain the variation in species composition
by some environmental variables.
I decided to use adonis() for this, as there are many zeros in the
data-set, but don't know if it is correct, or how to do it so that it
was correct.

The design is:
Aquatic insect larvae were collected at 17 sites three times (spring,
summer, autumn), giving a total of 51 samples.
During each sampling campaign several chemical and substratum
variables were measured, while others were measured only once, as they
characterize the whole site and are not expected to change in time.

Now I want to relate the species composition to environmental
variables. I see that the samples are not independent, so I thought
about setting strata in the adonis model, but it didn't make me much
sense. I am interested in the differences among sites, therefore
strata should probably be season (?). Moreover, the effect of
environmental variables might differ among seasons, therefore
interaction with season should be probably included.

I ended up with this model:
adonis(log(spe+1) ~ env1 %in% season + env2 %in% season + ...,
distance= 'bray', permut= 999, data= env)
(where season is a factor with 3 levels)

I don't feel good about this model and would greatly appreciate any
suggestions on how to build an appropriate model.
Is there a way to analyze such data without splitting them into 3 data-
sets according to the season and analyze them separately?

Thanks in advance for any suggestions,
Vit Syrovatka
Eduard Szöcs
2010-11-10 12:57:25 UTC
Permalink
Thanks, that helped.

permuted.index2() generates these types of permutations. But envfit()
does not use this yet.
What if I modify vectorfit() (used by envfit() ) in such a way that it
uses permuted.index2() instead of permuted.index()?


Eduard Szöcs
Post by Gavin Simpson
Post by Eduard Szöcs
Hi listers,
I have species and environmental data for 24 sites that were sampled
thrice. If I want to analyze the data with NMDS I could run metaMDS on
the whole dataset (24 sites x 3 times = 72) and then fit environmental
data, but this would be some kind of pseudoreplication given that the
samplings are not independent and the gradients may be overestimated,
wouldn`t it?
For environmental data a factor could be included for the sampling
dates - but this would not be possible for species data.
Is there an elegant way either to aggregate data before ordination or
to conduct sth. like a repeated measures NMDS?
Thank you in advance,
Eduard Szöcs
Depends on how you want to fit the env data - the pseudo-replication
isn't relevant o the nMDS. If you are doing it via function `envfit()`,
then look at argument `'strata'` which should, in your case, be set to a
factor with 24 levels. This won't be perfect because your data are a
timeseries and, strictly, one should permute them whilst maintaining
their ordering in time, but as yet we don't have these types of
permutations hooked into vegan.
If you are doing the fitting some other way you'll need to include
"site" as a fixed effect factor to account for the within site
correlation.
You don't need to worry about the species data and accounting for
sampling interval. You aren't testing the nMDS "axes" or anything like
that, and all the species info has been reduced to dissimilarities and
thence to a set of nMDS coordinates. You need to account for the pseudo
rep at the environmental modelling level, not the species level.
HTH
G
Gavin Simpson
2010-11-10 13:25:30 UTC
Permalink
Post by Eduard Szöcs
Thanks, that helped.
permuted.index2() generates these types of permutations. But envfit()
does not use this yet.
What if I modify vectorfit() (used by envfit() ) in such a way that it
uses permuted.index2() instead of permuted.index()?
permuted.index2() is old code. Don't use it. Or at least if you do,
check it works for your permutation design.

I carved off this code into a new package `permuted` which is part of
the vegan stable on R-Forge. `permute` generates the correct
permutations but needs a bit of work on the surrounding helper code.

If you want the same permutation, temporally constrained, then really
you only have 72 permutations of your data and should evaluate all of
them, not at random. `permute` is designed to generate the set of
permutations for you, but there are bugs in it that mean it doesn't work
for all designs as yet.

Sorry I can't be much more help at the moment - I'm busy with work and
about to head into the field for 10 days.

HTH

G
Post by Eduard Szöcs
Eduard Szöcs
Post by Gavin Simpson
Post by Eduard Szöcs
Hi listers,
I have species and environmental data for 24 sites that were sampled
thrice. If I want to analyze the data with NMDS I could run metaMDS on
the whole dataset (24 sites x 3 times = 72) and then fit environmental
data, but this would be some kind of pseudoreplication given that the
samplings are not independent and the gradients may be overestimated,
wouldn`t it?
For environmental data a factor could be included for the sampling
dates - but this would not be possible for species data.
Is there an elegant way either to aggregate data before ordination or
to conduct sth. like a repeated measures NMDS?
Thank you in advance,
Eduard Szöcs
Depends on how you want to fit the env data - the pseudo-replication
isn't relevant o the nMDS. If you are doing it via function `envfit()`,
then look at argument `'strata'` which should, in your case, be set to a
factor with 24 levels. This won't be perfect because your data are a
timeseries and, strictly, one should permute them whilst maintaining
their ordering in time, but as yet we don't have these types of
permutations hooked into vegan.
If you are doing the fitting some other way you'll need to include
"site" as a fixed effect factor to account for the within site
correlation.
You don't need to worry about the species data and accounting for
sampling interval. You aren't testing the nMDS "axes" or anything like
that, and all the species info has been reduced to dissimilarities and
thence to a set of nMDS coordinates. You need to account for the pseudo
rep at the environmental modelling level, not the species level.
HTH
G
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Kay Cecil Cichini
2010-11-10 22:33:24 UTC
Permalink
hi eduard,

i faced similar problems recently and came to the below solution.
i only try to address the pseudoreplication with an appropiate
permutation scheme.
when it comes to testing the interactions, things may get more complicated.

the code is in no way approven of, but at least it maybe good enough
for a starter.

best,
kay

library(vegan)

### species matrix with 5 sp.
### one env.variable
### a factor denoting blocks of repeated measurments

sp<-matrix(runif(24*3*5,0,100),24*3,5)
env<-rnorm(24*3,10,2)
rep.mes<-gl(24,3)

### NMDS:
sol<-metaMDS(sp,trymax=5)

fit<-envfit(sol~env)
plot(sol)
plot(fit)

### testing code for appropiate randomization,
### permuting blocks of 3 as a whole:
permuted.index2(nrow(sp),permControl(strata = rep.mes,permute.strata=T))

B=4999

### setting up frame for population of r2 values:
pop<-rep(NA,B+1)
pop[1]<-fit$vectors$r

### loop:
for(i in 2:(B+1)){
fit.rand<-envfit(sol~env[permuted.index2(nrow(sp),
permControl(strata = rep.mes,permute.strata=T))])
pop[i]<-fit.rand$vectors$r
}

### p-value:
print(pval<-sum(pop>pop[1])/B+1)

### compare to anti-conservative p-value from envfit(),
### not restricting permutations:

envfit(sol~env,perm=B)
Post by Eduard Szöcs
Thanks, that helped.
permuted.index2() generates these types of permutations. But
envfit() does not use this yet.
What if I modify vectorfit() (used by envfit() ) in such a way that
it uses permuted.index2() instead of permuted.index()?
Eduard Szöcs
Post by Gavin Simpson
Post by Eduard Szöcs
Hi listers,
I have species and environmental data for 24 sites that were sampled
thrice. If I want to analyze the data with NMDS I could run metaMDS on
the whole dataset (24 sites x 3 times = 72) and then fit environmental
data, but this would be some kind of pseudoreplication given that the
samplings are not independent and the gradients may be overestimated,
wouldn`t it?
For environmental data a factor could be included for the sampling
dates - but this would not be possible for species data.
Is there an elegant way either to aggregate data before ordination or
to conduct sth. like a repeated measures NMDS?
Thank you in advance,
Eduard Szöcs
Depends on how you want to fit the env data - the pseudo-replication
isn't relevant o the nMDS. If you are doing it via function `envfit()`,
then look at argument `'strata'` which should, in your case, be set to a
factor with 24 levels. This won't be perfect because your data are a
timeseries and, strictly, one should permute them whilst maintaining
their ordering in time, but as yet we don't have these types of
permutations hooked into vegan.
If you are doing the fitting some other way you'll need to include
"site" as a fixed effect factor to account for the within site
correlation.
You don't need to worry about the species data and accounting for
sampling interval. You aren't testing the nMDS "axes" or anything like
that, and all the species info has been reduced to dissimilarities and
thence to a set of nMDS coordinates. You need to account for the pseudo
rep at the environmental modelling level, not the species level.
HTH
G
_______________________________________________
R-sig-ecology mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Gavin Simpson
2010-11-10 23:11:42 UTC
Permalink
Post by Kay Cecil Cichini
hi eduard,
i faced similar problems recently and came to the below solution.
i only try to address the pseudoreplication with an appropiate
permutation scheme.
when it comes to testing the interactions, things may get more complicated.
the code is in no way approven of, but at least it maybe good enough
for a starter.
best,
kay
Hi Kay,

I don't think you have this right.

If you have measured repeatedly, say 5 times, on the same 10
individuals, or if you have ten fields and you take 5 quadrats from
each, you need to permute *within* the individuals/fields, not permute
the individuals/fields which is what permute.strata does. permute.strata
would be useful in evaluating factors that vary at the block
(individuals/fields) level, not at the sample levels.
Kay Cecil Cichini
2010-11-11 08:50:40 UTC
Permalink
gavin,

sorry - of course it should be permute.strata=F, permuting within
individual sites!
but despite of this the code should work, doesn't it?

thanks,
kay
Post by Gavin Simpson
Post by Kay Cecil Cichini
hi eduard,
i faced similar problems recently and came to the below solution.
i only try to address the pseudoreplication with an appropiate
permutation scheme.
when it comes to testing the interactions, things may get more complicated.
the code is in no way approven of, but at least it maybe good enough
for a starter.
best,
kay
Hi Kay,
I don't think you have this right.
If you have measured repeatedly, say 5 times, on the same 10
individuals, or if you have ten fields and you take 5 quadrats from
each, you need to permute *within* the individuals/fields, not permute
the individuals/fields which is what permute.strata does. permute.strata
would be useful in evaluating factors that vary at the block
(individuals/fields) level, not at the sample levels.
Gavin Simpson
2010-11-11 09:16:09 UTC
Permalink
Post by Kay Cecil Cichini
gavin,
sorry - of course it should be permute.strata=F, permuting within
individual sites!
but despite of this the code should work, doesn't it?
Yes, it should - i.e the permutation will be random within the blocks.
Whether it does or not is another matter entirely. AFAICR, this option
in permuted.index2() did work.

*But*, this is doing exactly what the original permuted.index() does if
you set argument 'strata' to be the grouping factor - in your case:

envfit(sol ~ env, strata = rep.mes, perm = 999)

for example. So there is no need to code up the analysis by hand.

This of course doesn't account for any temporal correlation within sites
nor, if the observations on the 24 blocks were made at the same times,
that you might want to have the same permutation within each block.

In the former there are 3^24 possible permutations (time series within
blocks), so 999 random perms is reasonable, *but* some of these random
perms (in permuted.index()) will not respect the temporal ordering and
thus you aren't really exploring the correct NULL.

With the latter constraint (same temporal perm within blocks), there are
3 random permutations, so good luck getting a reasonable p-value from
that.

The two restricted permutations /should/ work correctly, /but/ if you
are doing this by hand, I'd look at the functions in the 'permute'
package - only on R-Forge, on the Vegan R-Forge area - as I know the
code to generate these permutations in that package *does* work. (It is
the helper cruft around it that needs more work.)

https://r-forge.r-project.org/R/?group_id=68

I've had a busy Summer and not made as much progress as I would have
liked, but I hope to finish this soon and make an initial release to
CRAN so we can start grafting it into vegan.

In the meantime, I can help people try to link the two packages if
needed, but I don't have much time till the end of this month.

G
Post by Kay Cecil Cichini
thanks,
kay
Post by Gavin Simpson
Post by Kay Cecil Cichini
hi eduard,
i faced similar problems recently and came to the below solution.
i only try to address the pseudoreplication with an appropiate
permutation scheme.
when it comes to testing the interactions, things may get more complicated.
the code is in no way approven of, but at least it maybe good enough
for a starter.
best,
kay
Hi Kay,
I don't think you have this right.
If you have measured repeatedly, say 5 times, on the same 10
individuals, or if you have ten fields and you take 5 quadrats from
each, you need to permute *within* the individuals/fields, not permute
the individuals/fields which is what permute.strata does. permute.strata
would be useful in evaluating factors that vary at the block
(individuals/fields) level, not at the sample levels.
Gavin Simpson
2010-11-11 10:08:21 UTC
Permalink
[Apologies - I replied with this only to Kay. Hopefully she won't mind
receiving it twice!]
thanks a lot for your elaborations.
of course, envfit(..,strata=rep.mes) does it.
then, at least, i consider it exercise for other cases, were you really
might need a handmade permutation
so, as to round this off, i actually can't analyse this very design in
such a way, with the right NULL concerned - but were to go from here?
You could hook up the code in 'permute'. It contains a new
permuted.index() function (and currently no NAMESPACE, so will overwrite
(mask) the vegan version if loaded after vegan) which will break all the
permutation code in vegan).

Here is your example, modified to use the code in permute. I post this
to illustrate how you'd use the new code. There are lots of examples
in ?permuted.index (for the permute package, not the vegan package
version), but *don't* touch the permutation t-test example code as it
uses permCheck() and it might call allPerms() and allPerms() *IS*
*WRONG* for some designs --- this is the last bit I need to fix/get
working before we can make our first release of this code.

HTH

G

Here's the example script:

## Load packages
require(vegan)
require(permute)

## Data
set.seed(123)
sp <- matrix(runif(24*3*5, 0, 100), nrow = 24 * 3, ncol = 5)
env <- rnorm(24*3, 10, 2)
rep.mes <- gl(24, 3)

### NMDS:
sol <- metaMDS(sp, trymax = 5)
fit <- envfit(sol~env, permutations = 0) ## perms now won't work!

B <- 999 ## number of perms

### setting up frame for population of r2 values:
pop <- rep(NA, B + 1)
pop[1] <- fit$vectors$r

## set-up a Control object:
ctrl <- permControl(strata = rep.mes,
within = Within(type = "series", mirror = FALSE))
## we turn off mirroring as time should only flow in one direction

## Number of observations
nobs <- nrow(sp)

## check it works
matrix(permuted.index(nobs, control = ctrl), ncol = 3, byrow = TRUE)
## Yep - Phew!!!

### loop:
set.seed(1)
for(i in 2:(B+1)){
idx <- permuted.index(nobs, control = ctrl)
fit.rand <- envfit(sol ~ env[idx], permutations = 0)
pop[i] <- fit.rand$vectors$r
}

### p-value:
pval <- sum(pop >= pop[1]) / (B + 1)
pval
pval
[1] 0.286

Now to compare with the actual permutation you'd have gotten from
env.fit, you first need:

detach(package:permute)
set.seed(1)
fit2 <- envfit(sol~env, permutations = 999, strata = rep.mes)
fit2
***VECTORS

NMDS1 NMDS2 r2 Pr(>r)
env 0.28727 0.95785 0.0315 0.321
P values based on 999 permutations, stratified within strata.
a simplistic approach could be, averaging sites, yielding n=24 for
further testing.
yours,
kay
Post by Gavin Simpson
Post by Kay Cecil Cichini
gavin,
sorry - of course it should be permute.strata=F, permuting within
individual sites!
but despite of this the code should work, doesn't it?
Yes, it should - i.e the permutation will be random within the blocks.
Whether it does or not is another matter entirely. AFAICR, this option
in permuted.index2() did work.
*But*, this is doing exactly what the original permuted.index() does if
envfit(sol ~ env, strata = rep.mes, perm = 999)
for example. So there is no need to code up the analysis by hand.
This of course doesn't account for any temporal correlation within sites
nor, if the observations on the 24 blocks were made at the same times,
that you might want to have the same permutation within each block.
In the former there are 3^24 possible permutations (time series within
blocks), so 999 random perms is reasonable, *but* some of these random
perms (in permuted.index()) will not respect the temporal ordering and
thus you aren't really exploring the correct NULL.
With the latter constraint (same temporal perm within blocks), there are
3 random permutations, so good luck getting a reasonable p-value from
that.
The two restricted permutations /should/ work correctly, /but/ if you
are doing this by hand, I'd look at the functions in the 'permute'
package - only on R-Forge, on the Vegan R-Forge area - as I know the
code to generate these permutations in that package *does* work. (It is
the helper cruft around it that needs more work.)
https://r-forge.r-project.org/R/?group_id=68
I've had a busy Summer and not made as much progress as I would have
liked, but I hope to finish this soon and make an initial release to
CRAN so we can start grafting it into vegan.
In the meantime, I can help people try to link the two packages if
needed, but I don't have much time till the end of this month.
G
Post by Kay Cecil Cichini
thanks,
kay
Post by Gavin Simpson
Post by Kay Cecil Cichini
hi eduard,
i faced similar problems recently and came to the below solution.
i only try to address the pseudoreplication with an appropiate
permutation scheme.
when it comes to testing the interactions, things may get more complicated.
the code is in no way approven of, but at least it maybe good enough
for a starter.
best,
kay
Hi Kay,
I don't think you have this right.
If you have measured repeatedly, say 5 times, on the same 10
individuals, or if you have ten fields and you take 5 quadrats from
each, you need to permute *within* the individuals/fields, not permute
the individuals/fields which is what permute.strata does. permute.strata
would be useful in evaluating factors that vary at the block
(individuals/fields) level, not at the sample levels.
Kay Cecil Cichini
2010-11-11 12:03:54 UTC
Permalink
thanks a lot for the illustrative example.

..referring to your quote:

"...This of course doesn't account for any temporal correlation within
sites nor, if the observations on the 24 blocks were made at the same
times, that you might want to have the same permutation within each block.
In the former there are 3^24 possible permutations (time series within
blocks), so 999 random perms is reasonable, *but* some of these random
perms (in permuted.index()) will not respect the temporal ordering and
thus you aren't really exploring the correct NULL.
With the latter constraint (same temporal perm within blocks), there are
3 random permutations, so good luck getting a reasonable p-value from
that."

..so for the time beeing we assume the former case -
and for the latter there is no way out.

yours,
kay

ps: in germany/austria there are two alternative spellings for the name
kai/kay - beeing a male name opposed to the english kay.
Post by Gavin Simpson
[Apologies - I replied with this only to Kay. Hopefully she won't mind
receiving it twice!]
thanks a lot for your elaborations.
of course, envfit(..,strata=rep.mes) does it.
then, at least, i consider it exercise for other cases, were you really
might need a handmade permutation
so, as to round this off, i actually can't analyse this very design in
such a way, with the right NULL concerned - but were to go from here?
You could hook up the code in 'permute'. It contains a new
permuted.index() function (and currently no NAMESPACE, so will overwrite
(mask) the vegan version if loaded after vegan) which will break all the
permutation code in vegan).
Here is your example, modified to use the code in permute. I post this
to illustrate how you'd use the new code. There are lots of examples
in ?permuted.index (for the permute package, not the vegan package
version), but *don't* touch the permutation t-test example code as it
uses permCheck() and it might call allPerms() and allPerms() *IS*
*WRONG* for some designs --- this is the last bit I need to fix/get
working before we can make our first release of this code.
HTH
G
## Load packages
require(vegan)
require(permute)
## Data
set.seed(123)
sp <- matrix(runif(24*3*5, 0, 100), nrow = 24 * 3, ncol = 5)
env <- rnorm(24*3, 10, 2)
rep.mes <- gl(24, 3)
sol <- metaMDS(sp, trymax = 5)
fit <- envfit(sol~env, permutations = 0) ## perms now won't work!
B <- 999 ## number of perms
pop <- rep(NA, B + 1)
pop[1] <- fit$vectors$r
ctrl <- permControl(strata = rep.mes,
within = Within(type = "series", mirror = FALSE))
## we turn off mirroring as time should only flow in one direction
## Number of observations
nobs <- nrow(sp)
## check it works
matrix(permuted.index(nobs, control = ctrl), ncol = 3, byrow = TRUE)
## Yep - Phew!!!
set.seed(1)
for(i in 2:(B+1)){
idx <- permuted.index(nobs, control = ctrl)
fit.rand <- envfit(sol ~ env[idx], permutations = 0)
pop[i] <- fit.rand$vectors$r
}
pval <- sum(pop >= pop[1]) / (B + 1)
pval
pval
[1] 0.286
Now to compare with the actual permutation you'd have gotten from
detach(package:permute)
set.seed(1)
fit2 <- envfit(sol~env, permutations = 999, strata = rep.mes)
fit2
***VECTORS
NMDS1 NMDS2 r2 Pr(>r)
env 0.28727 0.95785 0.0315 0.321
P values based on 999 permutations, stratified within strata.
a simplistic approach could be, averaging sites, yielding n=24 for
further testing.
yours,
kay
Post by Gavin Simpson
Post by Kay Cecil Cichini
gavin,
sorry - of course it should be permute.strata=F, permuting within
individual sites!
but despite of this the code should work, doesn't it?
Yes, it should - i.e the permutation will be random within the blocks.
Whether it does or not is another matter entirely. AFAICR, this option
in permuted.index2() did work.
*But*, this is doing exactly what the original permuted.index() does if
envfit(sol ~ env, strata = rep.mes, perm = 999)
for example. So there is no need to code up the analysis by hand.
This of course doesn't account for any temporal correlation within sites
nor, if the observations on the 24 blocks were made at the same times,
that you might want to have the same permutation within each block.
In the former there are 3^24 possible permutations (time series within
blocks), so 999 random perms is reasonable, *but* some of these random
perms (in permuted.index()) will not respect the temporal ordering and
thus you aren't really exploring the correct NULL.
With the latter constraint (same temporal perm within blocks), there are
3 random permutations, so good luck getting a reasonable p-value from
that.
The two restricted permutations /should/ work correctly, /but/ if you
are doing this by hand, I'd look at the functions in the 'permute'
package - only on R-Forge, on the Vegan R-Forge area - as I know the
code to generate these permutations in that package *does* work. (It is
the helper cruft around it that needs more work.)
https://r-forge.r-project.org/R/?group_id=68
I've had a busy Summer and not made as much progress as I would have
liked, but I hope to finish this soon and make an initial release to
CRAN so we can start grafting it into vegan.
In the meantime, I can help people try to link the two packages if
needed, but I don't have much time till the end of this month.
G
Post by Kay Cecil Cichini
thanks,
kay
Post by Gavin Simpson
Post by Kay Cecil Cichini
hi eduard,
i faced similar problems recently and came to the below solution.
i only try to address the pseudoreplication with an appropiate
permutation scheme.
when it comes to testing the interactions, things may get more complicated.
the code is in no way approven of, but at least it maybe good enough
for a starter.
best,
kay
Hi Kay,
I don't think you have this right.
If you have measured repeatedly, say 5 times, on the same 10
individuals, or if you have ten fields and you take 5 quadrats from
each, you need to permute *within* the individuals/fields, not permute
the individuals/fields which is what permute.strata does. permute.strata
would be useful in evaluating factors that vary at the block
(individuals/fields) level, not at the sample levels.
Gavin Simpson
2010-11-11 12:32:24 UTC
Permalink
Post by Kay Cecil Cichini
thanks a lot for the illustrative example.
"...This of course doesn't account for any temporal correlation within
sites nor, if the observations on the 24 blocks were made at the same
times, that you might want to have the same permutation within each block.
In the former there are 3^24 possible permutations (time series within
blocks), so 999 random perms is reasonable, *but* some of these random
perms (in permuted.index()) will not respect the temporal ordering and
thus you aren't really exploring the correct NULL.
With the latter constraint (same temporal perm within blocks), there are
3 random permutations, so good luck getting a reasonable p-value from
that."
..so for the time beeing we assume the former case -
and for the latter there is no way out.
Yes - for the case of wanting the same temporal permutation within each
block there only are 3 permutations (6 if you allow time to go backwards
[mirror = TRUE]), but this includes the observed ordering, so only 2 (5)
other permutations to try. This is where permutation tests fail. If the
observed statistic is bigger than the statistic from the two random
permutations, this is an exact statistic in the sense that you've
evaluated all possible orderings consistent with the null, but all you
can is that the p-value is p < 0.333.

Having said that, we can perhaps try to be reasonable and relax some of
the assumptions (how often do our data fully meet all the assumptions of
the parametric statistical approaches we use?) and be happy with a null
that respects temporal autocorrelation within block, but not across
blocks. One might then choose to accept as a significant result a
permutation p that is say p <= 0.01 or even p <= 0.001, rather than the
usual p <= 0.05, to help guard against using the wrong Null.
Post by Kay Cecil Cichini
yours,
kay
ps: in germany/austria there are two alternative spellings for the name
kai/kay - beeing a male name opposed to the english kay.
I am truly sorry for my mistake - please accept my apologies. Totally
unintentional I assure you.

All the best,

G
Post by Kay Cecil Cichini
Post by Gavin Simpson
[Apologies - I replied with this only to Kay. Hopefully she won't mind
receiving it twice!]
thanks a lot for your elaborations.
of course, envfit(..,strata=rep.mes) does it.
then, at least, i consider it exercise for other cases, were you really
might need a handmade permutation
so, as to round this off, i actually can't analyse this very design in
such a way, with the right NULL concerned - but were to go from here?
You could hook up the code in 'permute'. It contains a new
permuted.index() function (and currently no NAMESPACE, so will overwrite
(mask) the vegan version if loaded after vegan) which will break all the
permutation code in vegan).
Here is your example, modified to use the code in permute. I post this
to illustrate how you'd use the new code. There are lots of examples
in ?permuted.index (for the permute package, not the vegan package
version), but *don't* touch the permutation t-test example code as it
uses permCheck() and it might call allPerms() and allPerms() *IS*
*WRONG* for some designs --- this is the last bit I need to fix/get
working before we can make our first release of this code.
HTH
G
## Load packages
require(vegan)
require(permute)
## Data
set.seed(123)
sp <- matrix(runif(24*3*5, 0, 100), nrow = 24 * 3, ncol = 5)
env <- rnorm(24*3, 10, 2)
rep.mes <- gl(24, 3)
sol <- metaMDS(sp, trymax = 5)
fit <- envfit(sol~env, permutations = 0) ## perms now won't work!
B <- 999 ## number of perms
pop <- rep(NA, B + 1)
pop[1] <- fit$vectors$r
ctrl <- permControl(strata = rep.mes,
within = Within(type = "series", mirror = FALSE))
## we turn off mirroring as time should only flow in one direction
## Number of observations
nobs <- nrow(sp)
## check it works
matrix(permuted.index(nobs, control = ctrl), ncol = 3, byrow = TRUE)
## Yep - Phew!!!
set.seed(1)
for(i in 2:(B+1)){
idx <- permuted.index(nobs, control = ctrl)
fit.rand <- envfit(sol ~ env[idx], permutations = 0)
pop[i] <- fit.rand$vectors$r
}
pval <- sum(pop >= pop[1]) / (B + 1)
pval
pval
[1] 0.286
Now to compare with the actual permutation you'd have gotten from
detach(package:permute)
set.seed(1)
fit2 <- envfit(sol~env, permutations = 999, strata = rep.mes)
fit2
***VECTORS
NMDS1 NMDS2 r2 Pr(>r)
env 0.28727 0.95785 0.0315 0.321
P values based on 999 permutations, stratified within strata.
a simplistic approach could be, averaging sites, yielding n=24 for
further testing.
yours,
kay
Post by Gavin Simpson
Post by Kay Cecil Cichini
gavin,
sorry - of course it should be permute.strata=F, permuting within
individual sites!
but despite of this the code should work, doesn't it?
Yes, it should - i.e the permutation will be random within the blocks.
Whether it does or not is another matter entirely. AFAICR, this option
in permuted.index2() did work.
*But*, this is doing exactly what the original permuted.index() does if
envfit(sol ~ env, strata = rep.mes, perm = 999)
for example. So there is no need to code up the analysis by hand.
This of course doesn't account for any temporal correlation within sites
nor, if the observations on the 24 blocks were made at the same times,
that you might want to have the same permutation within each block.
In the former there are 3^24 possible permutations (time series within
blocks), so 999 random perms is reasonable, *but* some of these random
perms (in permuted.index()) will not respect the temporal ordering and
thus you aren't really exploring the correct NULL.
With the latter constraint (same temporal perm within blocks), there are
3 random permutations, so good luck getting a reasonable p-value from
that.
The two restricted permutations /should/ work correctly, /but/ if you
are doing this by hand, I'd look at the functions in the 'permute'
package - only on R-Forge, on the Vegan R-Forge area - as I know the
code to generate these permutations in that package *does* work. (It is
the helper cruft around it that needs more work.)
https://r-forge.r-project.org/R/?group_id=68
I've had a busy Summer and not made as much progress as I would have
liked, but I hope to finish this soon and make an initial release to
CRAN so we can start grafting it into vegan.
In the meantime, I can help people try to link the two packages if
needed, but I don't have much time till the end of this month.
G
Post by Kay Cecil Cichini
thanks,
kay
Post by Gavin Simpson
Post by Kay Cecil Cichini
hi eduard,
i faced similar problems recently and came to the below solution.
i only try to address the pseudoreplication with an appropiate
permutation scheme.
when it comes to testing the interactions, things may get more complicated.
the code is in no way approven of, but at least it maybe good enough
for a starter.
best,
kay
Hi Kay,
I don't think you have this right.
If you have measured repeatedly, say 5 times, on the same 10
individuals, or if you have ten fields and you take 5 quadrats from
each, you need to permute *within* the individuals/fields, not permute
the individuals/fields which is what permute.strata does. permute.strata
would be useful in evaluating factors that vary at the block
(individuals/fields) level, not at the sample levels.
Kay Cecil Cichini
2010-11-11 13:50:12 UTC
Permalink
many, many thanks,
i think this can serve as a pedagogically worthful example for anyone
facing troubles with dependence within species / environment data.

yours,
kay

ps: nevermind the mistake.
Post by Gavin Simpson
Post by Kay Cecil Cichini
thanks a lot for the illustrative example.
"...This of course doesn't account for any temporal correlation within
sites nor, if the observations on the 24 blocks were made at the same
times, that you might want to have the same permutation within each block.
In the former there are 3^24 possible permutations (time series within
blocks), so 999 random perms is reasonable, *but* some of these random
perms (in permuted.index()) will not respect the temporal ordering and
thus you aren't really exploring the correct NULL.
With the latter constraint (same temporal perm within blocks), there are
3 random permutations, so good luck getting a reasonable p-value from
that."
..so for the time beeing we assume the former case -
and for the latter there is no way out.
Yes - for the case of wanting the same temporal permutation within each
block there only are 3 permutations (6 if you allow time to go backwards
[mirror = TRUE]), but this includes the observed ordering, so only 2 (5)
other permutations to try. This is where permutation tests fail. If the
observed statistic is bigger than the statistic from the two random
permutations, this is an exact statistic in the sense that you've
evaluated all possible orderings consistent with the null, but all you
can is that the p-value is p < 0.333.
Having said that, we can perhaps try to be reasonable and relax some of
the assumptions (how often do our data fully meet all the assumptions of
the parametric statistical approaches we use?) and be happy with a null
that respects temporal autocorrelation within block, but not across
blocks. One might then choose to accept as a significant result a
permutation p that is say p <= 0.01 or even p <= 0.001, rather than the
usual p <= 0.05, to help guard against using the wrong Null.
Post by Kay Cecil Cichini
yours,
kay
ps: in germany/austria there are two alternative spellings for the name
kai/kay - beeing a male name opposed to the english kay.
I am truly sorry for my mistake - please accept my apologies. Totally
unintentional I assure you.
All the best,
G
Post by Kay Cecil Cichini
Post by Gavin Simpson
[Apologies - I replied with this only to Kay. Hopefully she won't mind
receiving it twice!]
thanks a lot for your elaborations.
of course, envfit(..,strata=rep.mes) does it.
then, at least, i consider it exercise for other cases, were you really
might need a handmade permutation
so, as to round this off, i actually can't analyse this very design in
such a way, with the right NULL concerned - but were to go from here?
You could hook up the code in 'permute'. It contains a new
permuted.index() function (and currently no NAMESPACE, so will overwrite
(mask) the vegan version if loaded after vegan) which will break all the
permutation code in vegan).
Here is your example, modified to use the code in permute. I post this
to illustrate how you'd use the new code. There are lots of examples
in ?permuted.index (for the permute package, not the vegan package
version), but *don't* touch the permutation t-test example code as it
uses permCheck() and it might call allPerms() and allPerms() *IS*
*WRONG* for some designs --- this is the last bit I need to fix/get
working before we can make our first release of this code.
HTH
G
## Load packages
require(vegan)
require(permute)
## Data
set.seed(123)
sp <- matrix(runif(24*3*5, 0, 100), nrow = 24 * 3, ncol = 5)
env <- rnorm(24*3, 10, 2)
rep.mes <- gl(24, 3)
sol <- metaMDS(sp, trymax = 5)
fit <- envfit(sol~env, permutations = 0) ## perms now won't work!
B <- 999 ## number of perms
pop <- rep(NA, B + 1)
pop[1] <- fit$vectors$r
ctrl <- permControl(strata = rep.mes,
within = Within(type = "series", mirror = FALSE))
## we turn off mirroring as time should only flow in one direction
## Number of observations
nobs <- nrow(sp)
## check it works
matrix(permuted.index(nobs, control = ctrl), ncol = 3, byrow = TRUE)
## Yep - Phew!!!
set.seed(1)
for(i in 2:(B+1)){
idx <- permuted.index(nobs, control = ctrl)
fit.rand <- envfit(sol ~ env[idx], permutations = 0)
pop[i] <- fit.rand$vectors$r
}
pval <- sum(pop >= pop[1]) / (B + 1)
pval
pval
[1] 0.286
Now to compare with the actual permutation you'd have gotten from
detach(package:permute)
set.seed(1)
fit2 <- envfit(sol~env, permutations = 999, strata = rep.mes)
fit2
***VECTORS
NMDS1 NMDS2 r2 Pr(>r)
env 0.28727 0.95785 0.0315 0.321
P values based on 999 permutations, stratified within strata.
a simplistic approach could be, averaging sites, yielding n=24 for
further testing.
yours,
kay
Post by Gavin Simpson
Post by Kay Cecil Cichini
gavin,
sorry - of course it should be permute.strata=F, permuting within
individual sites!
but despite of this the code should work, doesn't it?
Yes, it should - i.e the permutation will be random within the blocks.
Whether it does or not is another matter entirely. AFAICR, this option
in permuted.index2() did work.
*But*, this is doing exactly what the original permuted.index() does if
envfit(sol ~ env, strata = rep.mes, perm = 999)
for example. So there is no need to code up the analysis by hand.
This of course doesn't account for any temporal correlation within sites
nor, if the observations on the 24 blocks were made at the same times,
that you might want to have the same permutation within each block.
In the former there are 3^24 possible permutations (time series within
blocks), so 999 random perms is reasonable, *but* some of these random
perms (in permuted.index()) will not respect the temporal ordering and
thus you aren't really exploring the correct NULL.
With the latter constraint (same temporal perm within blocks), there are
3 random permutations, so good luck getting a reasonable p-value from
that.
The two restricted permutations /should/ work correctly, /but/ if you
are doing this by hand, I'd look at the functions in the 'permute'
package - only on R-Forge, on the Vegan R-Forge area - as I know the
code to generate these permutations in that package *does* work. (It is
the helper cruft around it that needs more work.)
https://r-forge.r-project.org/R/?group_id=68
I've had a busy Summer and not made as much progress as I would have
liked, but I hope to finish this soon and make an initial release to
CRAN so we can start grafting it into vegan.
In the meantime, I can help people try to link the two packages if
needed, but I don't have much time till the end of this month.
G
Post by Kay Cecil Cichini
thanks,
kay
Post by Gavin Simpson
Post by Kay Cecil Cichini
hi eduard,
i faced similar problems recently and came to the below solution.
i only try to address the pseudoreplication with an appropiate
permutation scheme.
when it comes to testing the interactions, things may get more complicated.
the code is in no way approven of, but at least it maybe good enough
for a starter.
best,
kay
Hi Kay,
I don't think you have this right.
If you have measured repeatedly, say 5 times, on the same 10
individuals, or if you have ten fields and you take 5 quadrats from
each, you need to permute *within* the individuals/fields, not permute
the individuals/fields which is what permute.strata does. permute.strata
would be useful in evaluating factors that vary at the block
(individuals/fields) level, not at the sample levels.
Eduard Szöcs
2010-11-15 21:23:09 UTC
Permalink
I can only agree with kay - thank you for your informative answers!

Eduard
Post by Kay Cecil Cichini
many, many thanks,
i think this can serve as a pedagogically worthful example for anyone
facing troubles with dependence within species / environment data.
yours,
kay
ps: nevermind the mistake.
Post by Gavin Simpson
Post by Kay Cecil Cichini
thanks a lot for the illustrative example.
"...This of course doesn't account for any temporal correlation
within sites nor, if the observations on the 24 blocks were made at
the same times, that you might want to have the same permutation
within each block.
In the former there are 3^24 possible permutations (time series within
blocks), so 999 random perms is reasonable, *but* some of these random
perms (in permuted.index()) will not respect the temporal ordering and
thus you aren't really exploring the correct NULL.
With the latter constraint (same temporal perm within blocks), there are
3 random permutations, so good luck getting a reasonable p-value from
that."
..so for the time beeing we assume the former case -
and for the latter there is no way out.
Yes - for the case of wanting the same temporal permutation within each
block there only are 3 permutations (6 if you allow time to go backwards
[mirror = TRUE]), but this includes the observed ordering, so only 2 (5)
other permutations to try. This is where permutation tests fail. If the
observed statistic is bigger than the statistic from the two random
permutations, this is an exact statistic in the sense that you've
evaluated all possible orderings consistent with the null, but all you
can is that the p-value is p < 0.333.
Having said that, we can perhaps try to be reasonable and relax some of
the assumptions (how often do our data fully meet all the assumptions of
the parametric statistical approaches we use?) and be happy with a null
that respects temporal autocorrelation within block, but not across
blocks. One might then choose to accept as a significant result a
permutation p that is say p <= 0.01 or even p <= 0.001, rather than the
usual p <= 0.05, to help guard against using the wrong Null.
Post by Kay Cecil Cichini
yours,
kay
ps: in germany/austria there are two alternative spellings for the
name kai/kay - beeing a male name opposed to the english kay.
I am truly sorry for my mistake - please accept my apologies. Totally
unintentional I assure you.
All the best,
G
Post by Kay Cecil Cichini
Post by Gavin Simpson
[Apologies - I replied with this only to Kay. Hopefully she won't mind
receiving it twice!]
thanks a lot for your elaborations.
of course, envfit(..,strata=rep.mes) does it.
then, at least, i consider it exercise for other cases, were you
really might need a handmade permutation
so, as to round this off, i actually can't analyse this very
design in such a way, with the right NULL concerned - but were to
go from here?
You could hook up the code in 'permute'. It contains a new
permuted.index() function (and currently no NAMESPACE, so will overwrite
(mask) the vegan version if loaded after vegan) which will break all the
permutation code in vegan).
Here is your example, modified to use the code in permute. I post this
to illustrate how you'd use the new code. There are lots of examples
in ?permuted.index (for the permute package, not the vegan package
version), but *don't* touch the permutation t-test example code as it
uses permCheck() and it might call allPerms() and allPerms() *IS*
*WRONG* for some designs --- this is the last bit I need to fix/get
working before we can make our first release of this code.
HTH
G
## Load packages
require(vegan)
require(permute)
## Data
set.seed(123)
sp <- matrix(runif(24*3*5, 0, 100), nrow = 24 * 3, ncol = 5)
env <- rnorm(24*3, 10, 2)
rep.mes <- gl(24, 3)
sol <- metaMDS(sp, trymax = 5)
fit <- envfit(sol~env, permutations = 0) ## perms now won't work!
B <- 999 ## number of perms
pop <- rep(NA, B + 1)
pop[1] <- fit$vectors$r
ctrl <- permControl(strata = rep.mes,
within = Within(type = "series", mirror = FALSE))
## we turn off mirroring as time should only flow in one direction
## Number of observations
nobs <- nrow(sp)
## check it works
matrix(permuted.index(nobs, control = ctrl), ncol = 3, byrow = TRUE)
## Yep - Phew!!!
set.seed(1)
for(i in 2:(B+1)){
idx <- permuted.index(nobs, control = ctrl)
fit.rand <- envfit(sol ~ env[idx], permutations = 0)
pop[i] <- fit.rand$vectors$r
}
pval <- sum(pop >= pop[1]) / (B + 1)
pval
pval
[1] 0.286
Now to compare with the actual permutation you'd have gotten from
detach(package:permute)
set.seed(1)
fit2 <- envfit(sol~env, permutations = 999, strata = rep.mes)
fit2
***VECTORS
NMDS1 NMDS2 r2 Pr(>r)
env 0.28727 0.95785 0.0315 0.321
P values based on 999 permutations, stratified within strata.
a simplistic approach could be, averaging sites, yielding n=24 for
further testing.
yours,
kay
Post by Gavin Simpson
Post by Kay Cecil Cichini
gavin,
sorry - of course it should be permute.strata=F, permuting
within individual sites!
but despite of this the code should work, doesn't it?
Yes, it should - i.e the permutation will be random within the blocks.
Whether it does or not is another matter entirely. AFAICR, this option
in permuted.index2() did work.
*But*, this is doing exactly what the original permuted.index() does if
envfit(sol ~ env, strata = rep.mes, perm = 999)
for example. So there is no need to code up the analysis by hand.
This of course doesn't account for any temporal correlation within sites
nor, if the observations on the 24 blocks were made at the same times,
that you might want to have the same permutation within each block.
In the former there are 3^24 possible permutations (time series within
blocks), so 999 random perms is reasonable, *but* some of these random
perms (in permuted.index()) will not respect the temporal
ordering and
thus you aren't really exploring the correct NULL.
With the latter constraint (same temporal perm within blocks), there are
3 random permutations, so good luck getting a reasonable p-value from
that.
The two restricted permutations /should/ work correctly, /but/ if you
are doing this by hand, I'd look at the functions in the 'permute'
package - only on R-Forge, on the Vegan R-Forge area - as I know the
code to generate these permutations in that package *does* work. (It is
the helper cruft around it that needs more work.)
https://r-forge.r-project.org/R/?group_id=68
I've had a busy Summer and not made as much progress as I would have
liked, but I hope to finish this soon and make an initial release to
CRAN so we can start grafting it into vegan.
In the meantime, I can help people try to link the two packages if
needed, but I don't have much time till the end of this month.
G
Post by Kay Cecil Cichini
thanks,
kay
Post by Gavin Simpson
Post by Kay Cecil Cichini
hi eduard,
i faced similar problems recently and came to the below solution.
i only try to address the pseudoreplication with an
appropiate permutation scheme.
when it comes to testing the interactions, things may get more complicated.
the code is in no way approven of, but at least it maybe good
enough for a starter.
best,
kay
Hi Kay,
I don't think you have this right.
If you have measured repeatedly, say 5 times, on the same 10
individuals, or if you have ten fields and you take 5 quadrats from
each, you need to permute *within* the individuals/fields, not permute
the individuals/fields which is what permute.strata does. permute.strata
would be useful in evaluating factors that vary at the block
(individuals/fields) level, not at the sample levels.
s***@sci.muni.cz
2010-11-10 09:50:12 UTC
Permalink
Dear listers,

I intended to send a separate question but it fell under comments on
Eduard Szöcs's question. Sorry for that.

The problem:
I collected aquatic insect larvae from 17 sites three times (spring,
summer, autumn), giving a total of 51 samples. During each sampling
campaign several chemical and substratum variables were measured, while
some others were measured only once, as they are not expected to change
over time.
Now I want to relate the assemblage taxonomic structure to environmental
variables and find those variables which explain considerable portion of
variance in the assemblage data-set.

Suggested analysis by Zuur et al.:
In chapter 26 of Zuur et al. 2007 (Analysing Ecological Data) the authors
analyze similar kind of data (12 time seris) and suggest using direct
ordination (RDA, CCA), or Mantel test if there are many zeros (because of
the possibility to select a measure of dissimilarity), to relate the
communities to the environmental variables and to identify the important
ones.

My thoughts:
As there are many zeros in the data, I decided to use adonis() (as a
better alternative to Mantel test).
But it seems that the samples are not independent - I have only 17 sites
with 3 replications for each. Am I correct?
I thought about setting strata=season in the adonis model as I am
interested in the differences among sites, but don't know if it solves the
problem? Moreover, the effect of environmental variables might differ
among seasons, therefore interaction with season should probably be
included.

I ended up with this model:
adonis(log(spe+1) ~ env1 %in% season + env2 %in% season + ..., distance=
'bray', permut= 999, data= env)
(where season is a factor with 3 levels)

I'm not convinced the model is correct and would greatly appreciate any
comments.

And, please, let me know if I am completely wrong.

Thank you very much and sorry for the long text,

Vit

--
Vit Syrovatka
Department of Botany and Zoology
Masaryk University
Kotlarska 2
CZ-611 37 Brno, Czech Republic

Tel.: +420-532146342
E-mail: ***@sci.muni.cz
Loading...