My Study Notes

Sunday, November 25, 2018

Wilcoxon matched-pairs signed-ranks test

The Wilcoxon signed-ranks test is a non-parametric equivalent of the paired t-test. It is most commonly used to test for a difference in the mean (or median) of paired observations - whether measurements on pairs of units or before and after measurements on the same unit. It can also be used as a one-sample test to test whether a particular sample came from a population with a specified median.
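As a rough illustration of the one-sample use mentioned above, here is a minimal sketch with wilcox.test() and its mu argument. The sample is the vx vector from the R session in this post; the null median m0 = 1.0 is a made-up value chosen only for the example.

```r
# Sample values reused from the R session in this post.
vx <- c(1.83, 1.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)

# Hypothetical null median, chosen only for illustration.
m0 <- 1.0

# One-sample signed-rank test: H0 is that the population median equals m0.
# Internally the test ranks |vx - m0| and sums the ranks of the positive differences.
one_sample <- wilcox.test(vx, mu = m0)
print(one_sample)
```

Every value in vx lies above 1.0, so all nine ranks count toward V, which is the most extreme result possible for n = 9.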

Unlike the t-test, the paired differences do not need to follow a normal distribution. But if you wish to test the median (= mean) difference, the distribution on each side of the median must have a similar shape. In other words, the distribution of the differences must be symmetrical. If the distribution of the differences is not symmetrical, you can only test the null hypothesis that the Hodges-Lehmann estimate of the median difference is zero. Unlike most rank tests, the outcome of this test is affected by a transformation before ranking, since the differences are ranked in order of their absolute size. It may thus be worth plotting the distribution of the differences after an appropriate transformation (for example, logarithmic) to see if it makes the distribution appear more symmetrical.
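To see why a transformation matters here, the sketch below ranks the absolute paired differences on the raw scale and on the log scale. It uses the vx and vy vectors from the R session in this post; the log transform is only an example and is not necessarily the right choice for these data.

```r
# Paired data reused from the R session in this post.
vx <- c(1.83, 1.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
vy <- c(0.88, 0.65, 0.60, 1.05, 1.06, 1.29, 1.06, 2.14, 1.29)

# Paired differences on the raw scale and on the log scale.
d_raw <- vx - vy
d_log <- log(vx) - log(vy)

# The signed-rank test ranks the ABSOLUTE differences, so a transformation
# applied before differencing can reorder the ranks.
r_raw <- rank(abs(d_raw))
r_log <- rank(abs(d_log))
print(rbind(raw = r_raw, log = r_log))

# Crude symmetry check: for a symmetric distribution the mean and
# median of the differences should be close.
print(c(mean = mean(d_raw), median = median(d_raw)))
print(c(mean = mean(d_log), median = median(d_log)))
```

With only nine pairs this is a very rough check; a histogram or boxplot of the differences is the more usual way to judge symmetry.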

A signed-ranks test on paired samples is slightly less powerful than the t-test (relative efficiency is about 95%) provided the differences are normally distributed. If they are not, and cannot be transformed so that they are, a paired t-test is not appropriate and the non-parametric test should be used.
(from here, a very nice website)
In R, this is again done with wilcox.test(). Alternatively, see here.
> vx=c(1.83, 1.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
> vy=c(0.88, 0.65, 0.60, 1.05, 1.06, 1.29, 1.06, 2.14, 1.29)
> wilcox.test(x=vx,y=vy,paired=T,conf.int=T,conf.level=0.95)

    Wilcoxon signed rank test

data: vx and vy
V = 45, p-value = 0.003906
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
0.465 1.025
sample estimates:
(pseudo)median
0.77

> wilcox.test(x=vx,y=vy,paired = T)

    Wilcoxon signed rank test

data: vx and vy
V = 45, p-value = 0.003906
alternative hypothesis: true location shift is not equal to 0
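The V and p-value printed above can be reproduced by hand, which makes it clearer what the test is doing. This is a minimal sketch that assumes no zero differences remain and no tied absolute differences, which holds for these data; psignrank() is the signed-rank distribution function in base R.

```r
# Paired data from the session above.
vx <- c(1.83, 1.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
vy <- c(0.88, 0.65, 0.60, 1.05, 1.06, 1.29, 1.06, 2.14, 1.29)

d <- vx - vy
d <- d[d != 0]          # zero differences are dropped before ranking
n <- length(d)

# Rank the absolute differences, then sum the ranks of the positive ones.
r <- rank(abs(d))
V <- sum(r[d > 0])

# Exact two-sided p-value from the null distribution of V.
p_upper <- psignrank(V - 1, n, lower.tail = FALSE)  # P(V* >= V)
p_lower <- psignrank(V, n)                          # P(V* <= V)
p_two_sided <- min(1, 2 * min(p_upper, p_lower))

print(c(V = V, p.value = p_two_sided))
```

All nine differences are positive, so V is the full rank sum 1 + 2 + ... + 9 = 45, and the two-sided p-value is 2/2^9 = 0.00390625, matching the output above.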
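However, this article says it is better to use wilcox.exact() from the exactRankTests package, because when there are ties (Japanese: 同順位; Chinese: 相持. Who the heck came up with that translation? Were they deliberately trying to make it impossible to understand?), wilcox.test() cannot compute an exact p-value. Note that the call below omits paired=T, so R runs the rank sum test rather than the signed rank test: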
> wilcox.test(x=vx,y=vy,conf.int=T,conf.level=0.95)
  Wilcoxon rank sum test with continuity correction
data: vx and vy
W = 74, p-value = 0.003534
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
0.3300431 1.1900409
sample estimates:
difference in location
0.6699332
Warning messages:
1: In wilcox.test.default(x = vx, y = vy, conf.int = T, conf.level = 0.95) :
cannot compute exact p-value with ties
2: In wilcox.test.default(x = vx, y = vy, conf.int = T, conf.level = 0.95) :
cannot compute exact confidence intervals with ties
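So what is a tie in statistics?
It simply means that the data contain identical values.
Take the test just now, for example:
vy=c(0.88, 0.65, 0.60, 1.05, 1.06, 1.29, 1.06, 2.14, 1.29)
1.29 appears twice and so does 1.06; these are statistical ties. In Japanese this is called 同顺位 (same rank), meaning that two or more values are identical.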
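A quick way to spot ties before running the test is to tabulate the values. The sketch below checks the pooled sample (what the rank sum test ranks) and the absolute paired differences (what the signed rank test ranks); the variable names pooled, counts and ties are just for this example.

```r
# Data from the session above.
vx <- c(1.83, 1.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
vy <- c(0.88, 0.65, 0.60, 1.05, 1.06, 1.29, 1.06, 2.14, 1.29)

# Rank sum test: ranks are assigned over the pooled sample,
# so any value that occurs more than once is a tie.
pooled <- c(vx, vy)
counts <- table(pooled)
ties <- counts[counts > 1]
print(ties)

# Signed rank test: ranks are assigned to |vx - vy|,
# so ties are repeated absolute differences instead.
d <- vx - vy
print(anyDuplicated(abs(d)) > 0)
```

Here the pooled sample has ties at 1.06 and 1.29, while the paired differences have none, which is why only the call without paired=T produced the warning.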
If we tweak these two ties so that the values are no longer identical, the warning goes away.
> vy=c(0.88, 0.65, 0.60, 1.05, 1.06, 1.29, 1.061, 2.14, 1.291)
> wilcox.test(x=vx,y=vy,conf.int=T,conf.level=0.95)

  Wilcoxon rank sum test

data: vx and vy
W = 74, p-value = 0.001851
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
0.33 1.19
sample estimates:
difference in location
0.67
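Of course, this does not solve the problem: the p-value is now computed for the altered data, not the original data. It is only meant to show what a tie is.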
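For the original data with ties, the wilcox.exact() function mentioned earlier is the suggested route. Below is a hedged sketch: it assumes the exactRankTests package is installed (the code skips the test otherwise) and that wilcox.exact() accepts the same x, y, paired and conf.int arguments as wilcox.test(), which is my understanding of its interface. I have not reproduced its output here.

```r
# Original data, ties included.
vx <- c(1.83, 1.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
vy <- c(0.88, 0.65, 0.60, 1.05, 1.06, 1.29, 1.06, 2.14, 1.29)

# Only run the exact test if the package is available.
if (requireNamespace("exactRankTests", quietly = TRUE)) {
  # Unpaired (rank sum) version, the case that triggered the ties warning.
  res <- exactRankTests::wilcox.exact(vx, vy, paired = FALSE, conf.int = TRUE)
  print(res)
} else {
  message("Package 'exactRankTests' is not installed; skipping the exact test.")
}
```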
