My Study Notes

Thursday, November 29, 2018

The Moses Extreme Reaction Test (Chinese: Moses 极端反应检验; Japanese: Moses の極値反応の検定)

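The Moses extreme reaction test:
It applies when the experimental condition produces extreme reactions in two opposite directions (mostly in medicine, e.g. a drug that makes some patients better while making others worse).
By comparing the experimental group with the control group, it tells you whether extreme reactions occurred.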

For two independent samples from a continuous field, this tests:
H0: Extreme values are equally likely in both populations
HA: Extreme values are more likely to occur in the population from which the sample with the larger range was drawn.

Span computation

Observations from both specified groups are jointly sorted and ranked, with the average rank being assigned in the case of ties. The smallest and largest ranks of the control group (the group defined by the first value in ascending order) are determined, and the span is computed as
 SPAN = (largest rank of control group) - (smallest rank of control group) + 1
If SPAN is not an integer, then it will be rounded to its nearest integer.

Significance Level

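Let $n_{c,f}$ and $n_{e,f}$ be the numbers of records in the control group and the experimental group respectively, incorporating the frequency weight, and let $g = \mathrm{SPAN} - n_{c,f} + 2h$. Then the exact one-tailed probability of span $s$ is
$$P(\mathrm{SPAN} \le s) = \frac{\sum_{i=0}^{g} \binom{i + n_{c,f} - 2h - 2}{i} \binom{n_{e,f} + 2h + 1 - i}{n_{e,f} - i}}{\binom{n_{c,f} + n_{e,f}}{n_{c,f}}}$$
where h = 0. The same formula is used below where h is not zero.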

Censoring the Range

The previous test is repeated, dropping the h lowest and h highest ranks from the control group, where h is a positive user-specified integer (by default the integer part of $0.05\,n_{c,f}$ or 1, whichever is greater). If $2h > n_{c,f} - 2$, the test is run with the largest integer h such that $2h \le n_{c,f} - 2$.
The exact one-tailed probability is calculated by the formula above, and a one-tailed p-value below the significance level α rejects the null hypothesis.
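A rough sketch of the span and its exact one-tailed probability in R, assuming the standard combinatorial form P(SPAN ≤ s) = Σ_{i=0}^{g} C(i + nc - 2h - 2, i) · C(ne + 2h + 1 - i, ne - i) / C(nc + ne, nc) with g = s - nc + 2h, and ignoring frequency weights. The names moses_span and moses_p and the toy data are made up for this note; they are not from any package.

```r
# Span of the control group in the jointly ranked data (ties get average ranks).
moses_span <- function(control, experimental, h = 0) {
  r  <- rank(c(control, experimental))
  rc <- sort(r[seq_along(control)])      # ranks belonging to the control group
  nc <- length(rc)
  round(rc[nc - h] - rc[h + 1] + 1)      # drop h lowest and h highest control ranks
}

# Exact one-tailed probability P(SPAN <= span) under H0 (assumed formula, see above).
moses_p <- function(span, nc, ne, h = 0) {
  g <- span - nc + 2 * h
  stopifnot(g >= 0)
  i <- 0:g
  sum(choose(i + nc - 2 * h - 2, i) * choose(ne + 2 * h + 1 - i, ne - i)) /
    choose(nc + ne, nc)
}

control      <- c(5, 6, 7)               # hypothetical data
experimental <- c(1, 2, 9, 10)
s <- moses_span(control, experimental)
p <- moses_p(s, length(control), length(experimental))
```

In this toy case the control ranks are 3, 4, 5, so the span is 3 and the probability is 5/35: only 5 of the 35 ways of placing three control ranks among seven are consecutive.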


Fisher Exact Test

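The textbook puts it this way:
In other words, to judge similarity on a nominal scale (whether there is a statistically significant difference) you can use either the Fisher exact test or the χ2 test. But when do you use Fisher and when χ2? This article explains it in detail; in short, use χ2 when the sample is large and Fisher when it is small.
Fisher's exact test is a method for checking whether the rows (or columns) of a contingency table (cross tabulation) are independent. It is also called the direct probability method (ref).
So what is this Fisher test for, and what does it mean to judge whether there is a statistically significant difference? An example:
Say two drugs are given to two groups of patients, A and B, with the results below. Is there a statistically significant difference between the two drugs?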
            Group A   Group B   Total
Died              1         5       6
Survived         24        15      39
Total            25        20      45
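As a quick sketch, this table can be passed straight to fisher.test in R. The matrix layout (rows = outcome, columns = group) and the object names are my own choice for illustration; the default alternative is two-sided.

```r
# Rows: died / survived; columns: group A / group B (counts from the drug table)
drug <- matrix(c(1, 24, 5, 15), nrow = 2,
               dimnames = list(outcome = c("died", "survived"),
                               group   = c("A", "B")))

res <- fisher.test(drug)   # two-sided Fisher exact test
res$p.value                # compare this against the chosen significance level
```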
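Another example: a jar holds 4 red balls and 6 white balls. You draw 5 balls and get 3 red and 2 white. Is that outcome "normal"?
The calculation goes like this:
Ball color (red, white)
Drawn         4, 1         3, 2        2, 3        1, 4        0, 5
Remaining     0, 5         1, 4        2, 3        3, 2        4, 1
Probability   0.02380952   0.2380952   0.4761905   0.2380952   0.02380952
The probabilities are computed like this (R): choose(4,r)*choose(6,w)/choose(10,5)
※ choose is R's function for computing combinations; r and w are the numbers of red and white balls drawn.
So the probability of drawing 3 red and 2 white is 0.2380952, and the probability of the more extreme case (4 red, 1 white) is 0.02380952. Together they give 0.2619048, which is the p-value. Since the p-value > 0.05, we do not reject H0 (the outcome counts as normal).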
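The hand calculation can be reproduced in a few lines of R. This is just a sketch with variable names of my own; dhyper is the built-in hypergeometric density and should give the same numbers as the choose() formula.

```r
red_drawn <- 0:4                                   # possible numbers of red balls in 5 draws
p <- choose(4, red_drawn) * choose(6, 5 - red_drawn) / choose(10, 5)
names(p) <- paste0("red", red_drawn, "/white", 5 - red_drawn)

# Same probabilities from the hypergeometric distribution:
# 4 red ("success") balls, 6 white balls, 5 drawn
p_builtin <- dhyper(red_drawn, m = 4, n = 6, k = 5)

# One-sided p-value: observed outcome (3 red) or more extreme (4 red)
p_one_sided <- sum(p[red_drawn >= 3])
```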
R also has a function for this, fisher.test:
> fisher.test(matrix(c(3,1,2,4),nrow=2),alternative="greater")

  Fisher's Exact Test for Count Data

data: matrix(c(3, 1, 2, 4), nrow = 2)
p-value = 0.2619
alternative hypothesis: true odds ratio is greater than 1
95 percent confidence interval:
0.3152217 Inf
sample estimates:
odds ratio
4.918388
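The result agrees with the manual calculation.
You can of course also run a two-sided test, which sums the probabilities of all outcomes on either side that are at least as extreme as red 3 / white 2 (that is, outcomes with probability ≤ 0.2380952).
= 0.02380952 (red 4, white 1) + 0.2380952 (red 3, white 2) + 0.2380952 (red 1, white 4) + 0.02380952 (red 0, white 5) = 0.5238095
Computing it in R: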
> fisher.test(matrix(c(3,1,2,4),nrow=2),alternative="two.sided")

  Fisher's Exact Test for Count Data

data: matrix(c(3, 1, 2, 4), nrow = 2)
p-value = 0.5238
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.218046 390.562917
sample estimates:
odds ratio
4.918388
https://oku.edu.mie-u.ac.jp/~okumura/stat/fishertest.html


Siegel-Tukey Test for Differences in Scale

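This is a test for checking whether two sets of ordinal-scale data have the same distribution (more precisely, whether they differ in scale, i.e. spread).
There doesn't seem to be a ready-made function in R for the Siegel-Tukey test, and the explanations of it online are all rather murky. For now it seems enough to know that the Siegel-Tukey test compares the distributions of two ordinal-scale samples.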
https://accendoreliability.com/siegel-tukey-test-differences-in-scale/


Sunday, November 25, 2018

Wilcoxon matched-pairs signed-ranks test

The Wilcoxon signed-ranks test is a non-parametric equivalent of the paired t-test. It is most commonly used to test for a difference in the mean (or median) of paired observations - whether measurements on pairs of units or before and after measurements on the same unit. It can also be used as a one-sample test to test whether a particular sample came from a population with a specified median.

Unlike the t-test, the paired differences do not need to follow a normal distribution. But if you wish to test the median (= mean) difference, the distribution on each side of the median must have a similar shape. In other words, the distribution of the differences must be symmetrical. If the distribution of the differences is not symmetrical, you can only test the null hypothesis that the Hodges-Lehmann estimate of the median difference is zero. Unlike most rank tests, the outcome of this test is affected by a transformation applied before ranking, since differences are ranked in order of their absolute size. It may thus be worth plotting the distribution of the differences after an appropriate transformation (for example logarithmic) to see if it makes the distribution appear more symmetrical.

A signed-ranks test on paired samples is less powerful than the t-test (relative efficiency is about 95%) provided the differences are normally distributed. If they are not, and cannot be transformed such that they are, a paired t-test is not appropriate and the non-parametric test should be used.
(from here, a very nice website)
In R this is again done with wilcox.test(). Or see here.
> vx=c(1.83, 1.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
> vy=c(0.88, 0.65, 0.60, 1.05, 1.06, 1.29, 1.06, 2.14, 1.29)
> wilcox.test(x=vx,y=vy,paired=T,conf.int=T,conf.level=0.95)

    Wilcoxon signed rank test

data: vx and vy
V = 45, p-value = 0.003906
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
0.465 1.025
sample estimates:
(pseudo)median
0.77

> wilcox.test(x=vx,y=vy,paired = T)

    Wilcoxon signed rank test

data: vx and vy
V = 45, p-value = 0.003906
alternative hypothesis: true location shift is not equal to 0
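However, this article recommends using wilcox.exact() from the exactRankTests package instead, because when there are ties (Japanese: 同順位; Chinese: 相持, and who on earth came up with that translation? Were they trying to make sure nobody understands it?) the p-value cannot be computed exactly.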
> wilcox.test(x=vx,y=vy,conf.int=T,conf.level=0.95)
  Wilcoxon rank sum test with continuity correction
data: vx and vy
W = 74, p-value = 0.003534
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
0.3300431 1.1900409
sample estimates:
difference in location
0.6699332
Warning messages:
1: In wilcox.test.default(x = vx, y = vy, conf.int = T, conf.level = 0.95) :
cannot compute exact p-value with ties
2: In wilcox.test.default(x = vx, y = vy, conf.int = T, conf.level = 0.95) :
cannot compute exact confidence intervals with ties
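So what is a tie in statistics?
It simply means the data contain identical values.
For instance, in the test just now
vy=c(0.88, 0.65, 0.60, 1.05, 1.06, 1.29, 1.06, 2.14, 1.29)
there are two 1.29s and two 1.06s; those are statistical ties. Japanese calls this 同順位 (same rank), meaning that two or more values are identical.
If we alter these two ties so that the values are no longer identical, the warning goes away.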
> vy=c(0.88, 0.65, 0.60, 1.05, 1.06, 1.29, 1.061, 2.14, 1.291)
> wilcox.test(x=vx,y=vy,conf.int=T,conf.level=0.95)

  Wilcoxon rank sum test

data: vx and vy
W = 74, p-value = 0.001851
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
0.33 1.19
sample estimates:
difference in location
0.67
Of course this doesn't make the p-value any more trustworthy and doesn't solve the problem; it is only meant to show what a tie is.
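A small sketch for spotting ties before running the test, instead of editing the data. The variable names are mine. Passing exact = FALSE asks wilcox.test for the normal approximation explicitly, which is what it falls back to anyway when ties are present.

```r
vx <- c(1.83, 1.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
vy <- c(0.88, 0.65, 0.60, 1.05, 1.06, 1.29, 1.06, 2.14, 1.29)

# Values that occur more than once in the pooled data are the ties
pooled      <- c(vx, vy)
tied_values <- sort(unique(pooled[duplicated(pooled)]))
tied_values

# Rank sum test with the normal approximation requested explicitly
res <- wilcox.test(vx, vy, exact = FALSE)
res$p.value
```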


Mann-Whitney U test

The Mann-Whitney U test is the non-parametric alternative to the independent-samples t-test. It compares two independent samples to test whether they come from the same population, and is often described as a test of whether the two groups differ in location. Usually, the Mann-Whitney U test is used when the data are ordinal or when the assumptions of the t-test are not met.
Because it is non-parametric, the Mann-Whitney U test makes no assumptions about the distribution of scores. It does, however, make some assumptions:
1. The sample drawn from the population is random.
2. Independence within the samples and mutual independence is assumed.  That means that an observation is in one group or the other (it cannot be in both).
3. Ordinal measurement scale is assumed.

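The test statistic can be written as
$$U = N_1 N_2 + \frac{N_1 (N_1 + 1)}{2} - \sum_{i=1}^{N_1} R_i$$
Where:
U = Mann-Whitney U statistic
N1 = size of sample one
N2 = size of sample two
Ri = rank of the i-th observation of sample one in the pooled, ranked data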
※Mann Whitney U test is also called the Mann–Whitney–Wilcoxon (MWW), Wilcoxon rank-sum test, or Wilcoxon–Mann–Whitney test

Use of Mann-Whitney

The Mann-Whitney U test is used in every field, but most frequently in psychology, healthcare, nursing, business, and many other disciplines. For example, in psychology, it is used to compare attitudes or behavior. In medicine, it is used to find out whether the effects of two medicines are equal. It is also used to find out whether a particular medicine cures the ailment. In business, it can be used to learn the preferences of different people and to see whether they change depending on location.
(from here)

In the textbook, the Mann-Whitney U test is used to compare the medians of two mutually independent samples.
Mann-Whitney test in R
wilcox.test
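To see what wilcox.test actually reports, here is a sketch that computes the statistic by hand from the pooled ranks, using the tie-free data from the Wilcoxon notes. I am assuming the usual definitions: R1 is the rank sum of the first sample, W = R1 - n1(n1+1)/2 is the value R prints, and n1*n2 - W is the other form of U.

```r
vx <- c(1.83, 1.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
vy <- c(0.88, 0.65, 0.60, 1.05, 1.06, 1.29, 1.061, 2.14, 1.291)

n1 <- length(vx)
n2 <- length(vy)
r  <- rank(c(vx, vy))                  # ranks in the pooled sample
R1 <- sum(r[seq_len(n1)])              # rank sum of the first sample

W       <- R1 - n1 * (n1 + 1) / 2      # statistic reported by wilcox.test
U_other <- n1 * n2 - W                 # equals n1*n2 + n1*(n1+1)/2 - R1

# W is also the number of (x, y) pairs with x > y
W_pairs <- sum(outer(vx, vy, ">"))

res <- wilcox.test(vx, vy)
```

Either W or n1*n2 - W can be called U; many textbooks use the smaller of the two.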



Levene Test for Equality of Variances

Levene's test is used to test if k samples have equal variances. Equal variances across samples is called homogeneity of variance. Some statistical tests, for example the analysis of variance, assume that variances are equal across groups or samples. The Levene test can be used to verify that assumption.
Levene's test is an alternative to the Bartlett test. The Levene test is less sensitive than the Bartlett test to departures from normality. If you have strong evidence that your data do in fact come from a normal, or nearly normal, distribution, then Bartlett's test has better performance.
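Use it when the data are not normally distributed. In Chinese it is called Levene方差齐性检验 (Levene's test for homogeneity of variance).

Again the calculation is quite involved, so it is convenient to just do it in R.
levene.test (in the lawstat package; the car package offers leveneTest)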
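For intuition, Levene's test amounts to a one-way ANOVA on the absolute deviations of each observation from its group center. Below is a minimal base-R sketch under that assumption; levene_sketch is a made-up name, the data are invented, and centering on the median is one of the variants (mean, median, trimmed mean) described on the NIST page.

```r
# Levene-type test: one-way ANOVA on |y - group center|
levene_sketch <- function(y, g, center = median) {
  g <- as.factor(g)
  z <- abs(y - ave(y, g, FUN = center))   # absolute deviation from the group center
  anova(lm(z ~ g))                        # F test across groups
}

grp <- rep(c("a", "b"), each = 3)

# Same spread in both groups: F should be essentially zero
same_spread <- levene_sketch(c(1, 2, 3, 11, 12, 13), grp)

# Very different spread: F is positive
diff_spread <- levene_sketch(c(1, 2, 3, 10, 20, 30), grp)
diff_spread[1, c("F value", "Pr(>F)")]
```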
https://www.itl.nist.gov/div898/handbook/eda/section3/eda35a.htm


Saturday, November 24, 2018

F Test

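An “F Test” is a catch-all term for any test that uses the F-distribution. In most cases, when people talk about the F-Test, what they are actually talking about is The F-Test to Compare Two Variances.
However, the F-statistic is used in a variety of tests including regression analysis, the Chow test and the Scheffe Test (a post-hoc ANOVA test).
F distribution:

See here (this site has several e-books worth consulting).
How the F distribution is computed isn't important; it is used to obtain the critical value for the decision, and the F value is usually just looked up in an F-table.

In short: an F test is any test that uses the F distribution (to obtain the critical value), but the term usually refers to comparing two variances.
The F statistic also shows up in many other tests, including regression analysis, the Chow test, the Scheffe test and ANOVA.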

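How to do an F-Test

1. State the null hypothesis and the alternative hypothesis.
2. Compute the F value: $F = s_1^2 / s_2^2$
3. Using the degrees of freedom (sample size minus one, for each sample) and the significance level, look up $F_\alpha$ in the table.
4. Compare the computed F with $F_\alpha$; if $F > F_\alpha$, the null hypothesis can be rejected.
In more detail:
H0: $\sigma_1^2 = \sigma_2^2$ (no change, no difference)
※ Some books state H0 with ≤ for the upper one-tailed test and with ≥ for the lower one-tailed test.
H1:
  Upper one tailed: $\sigma_1^2 > \sigma_2^2$
  Lower one tailed: $\sigma_1^2 < \sigma_2^2$
  Two tailed: $\sigma_1^2 \ne \sigma_2^2$
Decision rule (reject H0 when):
  Upper one tailed: $F > F_{\alpha,\, n_1-1,\, n_2-1}$
  Lower one tailed: $F < F_{1-\alpha,\, n_1-1,\, n_2-1}$
  Two tailed: $F < F_{1-\alpha/2,\, n_1-1,\, n_2-1}$ or $F > F_{\alpha/2,\, n_1-1,\, n_2-1}$
Here $F_{\alpha,\, \nu_1,\, \nu_2}$ is the value of the F distribution with $\nu_1$ and $\nu_2$ degrees of freedom that is exceeded with probability α.
The simplest way is still to run the F test directly in a tool such as R, because computing it by hand is tedious.
The textbook says the F-test applies to normally distributed data (other textbooks don't seem to say this).
For comparing the variances of non-normal data, the textbook says to use the Levene test.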
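The four steps can be sketched in R as follows for the upper one-tailed case. The data are invented for illustration; qf replaces the F-table lookup, and var.test should reproduce the same statistic and p-value.

```r
x <- c(20.1, 22.4, 19.8, 25.3, 18.7, 23.9, 21.0, 26.2)   # hypothetical sample 1
y <- c(21.5, 21.9, 20.8, 22.3, 21.1, 22.0, 21.4)         # hypothetical sample 2
alpha <- 0.05

F_stat <- var(x) / var(y)                 # step 2: F = s1^2 / s2^2
df1 <- length(x) - 1                      # step 3: degrees of freedom
df2 <- length(y) - 1
F_crit  <- qf(1 - alpha, df1, df2)        # critical value instead of the F-table
p_upper <- pf(F_stat, df1, df2, lower.tail = FALSE)

reject_H0 <- F_stat > F_crit              # step 4: same decision as p_upper < alpha

# The built-in test for comparison
res <- var.test(x, y, alternative = "greater")
```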

F Test in R

F Test to Compare Two Variances

Description

Performs an F test to compare the variances of two samples from normal populations.

Usage

var.test(x, ...)

## Default S3 method:
var.test(x, y, ratio = 1,
         alternative = c("two.sided", "less", "greater"),
         conf.level = 0.95, ...)

## S3 method for class 'formula'
var.test(formula, data, subset, na.action, ...)

Arguments

x, y: numeric vectors of data values, or fitted linear model objects (inheriting from class "lm").
ratio: the hypothesized ratio of the population variances of x and y.
alternative: a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.
conf.level: confidence level for the returned confidence interval.
formula: a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs a factor with two levels giving the corresponding groups.
data: an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).
subset: an optional vector specifying a subset of observations to be used.
na.action: a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").
...: further arguments to be passed to or from methods.

Details

The null hypothesis is that the ratio of the variances of the populations from which x and y were drawn, or in the data to which the linear models x and y were fitted, is equal to ratio.

Value

A list with class "htest" containing the following components:
statistic: the value of the F test statistic.
parameter: the degrees of the freedom of the F distribution of the test statistic.
p.value: the p-value of the test.
conf.int: a confidence interval for the ratio of the population variances.
estimate: the ratio of the sample variances of x and y.
null.value: the ratio of population variances under the null.
alternative: a character string describing the alternative hypothesis.
method: the character string "F test to compare two variances".
data.name: a character string giving the names of the data.
http://www.socr.ucla.edu/applets.dir/f_table.html
https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/hypothesis-testing/f-test/


Thursday, November 22, 2018

Two Sample T-Test & Paired t-Test

The two-sample t-test is applied to check whether the difference between the averages of two groups is really significant or is instead due to random chance. It helps to answer questions like whether the average sales figure is higher after implementing a new sales tool than before, or whether the test results of patients who received a drug are better than those of patients who received a placebo.
https://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm
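For what 1-sample, 2-sample and paired t-tests are, see this article.
The 1-sample t-test is what I studied a few days ago (see here): it tests whether the mean of one set of data equals some value m0.
The paired t-test compares paired data, such as before and after an improvement, or before and after taking a drug. In essence it is also a 1-sample t-test, except that the one sample is the set of before/after differences.
The 2-sample t-test compares the difference between the means of two sets of data. Its t value is computed as:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$
On when to use paired and when to use 2-sample, the textbook says: if the two sets of data are mutually independent, use 2-sample; if the data come in pairs, use paired.
In R, the 1-sample, 2-sample and paired tests are all done with t.test.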
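A short sketch to check two of the claims above with invented before/after data: the paired test gives the same result as a 1-sample test on the differences, and the 2-sample t value (unequal variances, which is t.test's default) can be computed by hand.

```r
before <- c(72, 68, 75, 80, 66, 71, 77, 69)   # hypothetical paired measurements
after  <- c(75, 70, 74, 85, 70, 73, 80, 72)

# Paired t-test vs. 1-sample t-test on the differences
paired     <- t.test(after, before, paired = TRUE)
one_sample <- t.test(after - before, mu = 0)

# 2-sample (Welch) t value by hand, treating the two vectors as independent groups
t_manual <- (mean(after) - mean(before)) /
  sqrt(var(after) / length(after) + var(before) / length(before))
welch <- t.test(after, before)
```

Treating paired data as independent usually gives a much smaller t value, which is why the choice between paired and 2-sample matters.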

t.test






Student's T-Test
Performs one and two sample t-tests on vectors of data.
Keywords
htest
Usage
t.test(x, ...)
# S3 method for default
t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95, ...)
# S3 method for formula
t.test(formula, data, subset, na.action, ...)
Arguments
x
a (non-empty) numeric vector of data values.
y
an optional (non-empty) numeric vector of data values.
alternative
a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter.
mu
a number indicating the true value of the mean (or difference in means if you are performing a two sample test).
paired
a logical indicating whether you want a paired t-test.
var.equal
a logical variable indicating whether to treat the two variances as being equal. If TRUE then the pooled variance is used to estimate the variance otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used.
conf.level
confidence level of the interval.
formula
a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs a factor with two levels giving the corresponding groups.
data
an optional matrix or data frame (or similar: see model.frame) containing the variables in the formula formula. By default the variables are taken from environment(formula).
subset
an optional vector specifying a subset of observations to be used.
na.action
a function which indicates what should happen when the data contain NAs. Defaults to getOption("na.action").
...
further arguments to be passed to or from methods.
Details
The formula interface is only applicable for the 2-sample tests.
alternative = "greater" is the alternative that x has a larger mean than y.
If paired is TRUE then both x and y must be specified and they must be the same length. Missing values are silently removed (in pairs if paired is TRUE). If var.equal is TRUE then the pooled estimate of the variance is used. By default, if var.equal is FALSE then the variance is estimated separately for both groups and the Welch modification to the degrees of freedom is used.
If the input data are effectively constant (compared to the larger of the two means) an error is generated.
Value
A list with class "htest" containing the following components:
statistic
the value of the t-statistic.
parameter
the degrees of freedom for the t-statistic.
p.value
the p-value for the test.
conf.int
a confidence interval for the mean appropriate to the specified alternative hypothesis.
estimate
the estimated mean or difference in means depending on whether it was a one-sample test or a two-sample test.
null.value
the specified hypothesized value of the mean or mean difference depending on whether it was a one-sample test or a two-sample test.
alternative
a character string describing the alternative hypothesis.
method
a character string indicating what type of t-test was performed.
data.name
a character string giving the name(s) of the data.

from RDocumentation


Type I error and Type II error

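Hypothesis tests come with Type I and Type II errors, which the textbook also explains:
The textbook treats alpha, beta, delta and sigma as the four quantities to consider when deciding the sample size for a hypothesis test.
alpha (normally use 5%; statistical synonym is "Type I" error)
beta (normally use 10-20%; statistical synonym is "Type II" error)
delta (to be determined case by case)
sigma (to be determined case by case)
So what exactly are alpha, beta, delta and sigma, and what do they mean? The textbook is vague about it.
It does explain alpha and beta, as follows.
The probability of rejecting the null hypothesis when it is actually true is alpha, also called the Type I error rate. It is the threshold we set for the p-value, also called the significance level. In other words, alpha is the significance level.
Conversely, the probability of failing to reject the null hypothesis when it is actually false is beta, the Type II error rate. Put differently, 1-β is the power of the test.
I couldn't make sense of the textbook's explanation of delta and sigma.
delta: The difference that you want to be able to detect in your hypothesis test (i.e. accuracy).
sigma: The estimated standard deviation of the population of the data from which you are sampling (i.e. precision)
The textbook also gives two examples for alpha and beta:
Example 1) If the suspect is really innocent but the court finds them guilty: alpha, Type I error. Conversely, if the suspect is really guilty but the court finds them not guilty: beta, Type II error.
Example 2) If the patient actually has no cancer but we diagnose cancer: alpha, Type I error. Conversely, if the patient really has cancer but we diagnose no cancer: beta, Type II error.
The figure above also shows Power (bottom right): the probability that the hypothesis test reaches the correct decision by rejecting a null hypothesis that is false. This is called the power of the hypothesis test.
However, I found a web page that explains all this very clearly.
First, it explains the significance level (有意水準) alpha like this:
(two-tailed)
(one-tailed)
(Type I and Type II errors and the power of the test)
The figures above make everything very clear.
The meaning of alpha, beta, delta and sigma is:
alpha = significance level
beta = Type II error (what really matters is the power, i.e. 1-β)
delta = true difference
sigma = true SD or true variation
The sample size depends on δ, σ, α and 1-β (or, put the other way round, 1-β depends on the sample size).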
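To make the dependence concrete, here is a sketch using power.t.test from base R for a two-sample, two-sided t-test. The numbers (δ = 1 or 0.5, σ = 1, α = 0.05, power = 0.90, n = 20) are arbitrary values picked for illustration; the function solves for whichever of n, delta, sd, sig.level or power is left out.

```r
# Sample size per group needed to detect delta = 1 when sigma = 1
res <- power.t.test(delta = 1, sd = 1, sig.level = 0.05, power = 0.90)
n_per_group <- ceiling(res$n)

# Halving the difference to detect requires a much larger sample
res_small <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.90)

# The other way round: power (1 - beta) achieved with n = 20 per group
achieved_power <- power.t.test(n = 20, delta = 1, sd = 1, sig.level = 0.05)$power
```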
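As for the detailed theory and how these values are determined, there is no need to dig deeper; the textbook doesn't cover it. After all, the point here is not to study statistics itself but to apply statistical thinking to process improvement. If needed, see section 5 here: Sample size for precision and power, multiple hypothesis testing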

An excellent statistics reference: https://oku.edu.mie-u.ac.jp/~okumura/stat/


Wednesday, November 21, 2018

Test of Proportion

Population Proportion

A proportion is simply a ratio; it is a parameter of the population.
What is the Population Proportion?
A population proportion is a fraction of the population that has a certain characteristic. For example, let’s say you had 1,000 people in the population and 237 of those people have blue eyes. The fraction of people who have blue eyes is 237 out of 1,000, or 237/1000. The letter p is used for the population proportion, so you would write this fact like this:
p = 237/1000.
You can also write 237/1000 as a decimal (by dividing 237 by 1000). If you did that, then p = 0.237.
So what is a proportion test?
We'll start our exploration of hypothesis tests by focusing on population proportions. Specifically, we'll derive the methods used for testing (1) whether a single population proportion p equals a particular value, p0, say, and (2) whether the difference in two population proportions p1 − p2 equals a particular value p0, say, with the most common value being 0, and thereby allowing us to test whether two population proportions are equal. Along the way, we'll learn two different approaches to hypothesis testing, one being the critical value approach and one being the P-value approach.
In other words, a proportion test checks one of two things:
1. Given a proportion p, does p equal a particular value p0?
2. Given two proportions p1 and p2, does their difference p1 − p2 equal a particular value p0? In particular, when p0 = 0 this tests whether p1 and p2 are equal.
Concrete procedures and examples can all be found here.
For example, among 500 people with lung cancer 480 smoke, while among 500 healthy people 400 smoke. Then p1 = 480/500 = 0.96 and p2 = 400/500 = 0.80.
Now we want to investigate the following:
1. Is the proportion of smokers the same in the lung cancer group and the healthy group?
2. Is the proportion of smokers in the lung cancer group > that in the healthy group?
3. Is the proportion of smokers in the lung cancer group < that in the healthy group?
This calls for a proportion test. Writing pA and pB for the two proportions, the hypotheses are:
Null hypothesis
    H0:     pA=pB
    H0:     pA≤pB
    H0:     pA≥pB
Alternative hypothesis
    Hα:     pA≠pB  (different)
    Hα:     pA>pB  (greater)
    Hα:     pA<pB  (less)
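The above is the "2 proportion test". A "1 proportion test" looks like this:
Say we run a cancer-inducing experiment on a group of mice, half male and half female (p = 0.5). Among them we find 160 mice that developed cancer, 95 male and 65 female. We want to test whether male mice are more prone to cancer.
So the proportion we expect is 0.5 (no effect of sex), and we want to investigate:
1. Is the proportion of males among the mice with cancer (95/160) equal to the expected value (0.5)?
2. Is the proportion of males among the mice with cancer > the expected value (0.5)?
3. Is the proportion of males among the mice with cancer < the expected value (0.5)?
Writing p1 for the observed proportion and p0 for the expected value, the hypotheses are:
Null hypothesis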
    H0:     p1=p0
    H0:     p1≤p0
    H0:     p1≥p0
Alternative hypothesis
    Hα:     p1≠p0  (different)
    Hα:     p1>p0  (greater)
    Hα:     p1<p0  (less)

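Manual calculation is explained fairly clearly here, here and here. The gist in each case is to compute a z value, obtain from it the probability under the normal distribution (the p-value), and then make the decision. If you don't need the full details, computing it directly in R saves a lot of effort.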
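A sketch of the z-value approach for the two examples above, next to prop.test. I am assuming the usual large-sample z statistics (pooled proportion for two samples, p0 for one sample) and switching off the continuity correction (correct = FALSE) so that prop.test's X-squared equals z squared.

```r
# Two proportions: smokers among lung cancer patients vs. healthy people
x <- c(480, 400)
n <- c(500, 500)
p_hat  <- x / n
p_pool <- sum(x) / sum(n)                          # pooled proportion under H0
z2 <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))
p2_two_sided <- 2 * pnorm(abs(z2), lower.tail = FALSE)
res2 <- prop.test(x, n, correct = FALSE)

# One proportion: 95 males among 160 mice with cancer, expected 0.5
z1 <- (95 / 160 - 0.5) / sqrt(0.5 * (1 - 0.5) / 160)
p1_greater <- pnorm(z1, lower.tail = FALSE)        # H1: proportion > 0.5
res1 <- prop.test(95, 160, p = 0.5, alternative = "greater", correct = FALSE)
```

With the default correct = TRUE, prop.test applies Yates' continuity correction, so its p-values come out slightly larger than the plain z calculation.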

Proportion Test in R

Test of Equal or Given Proportions

Description

prop.test can be used for testing the null that the proportions (probabilities of success) in several groups are the same, or that they equal certain given values.

Usage

prop.test(x, n, p = NULL,
          alternative = c("two.sided", "less", "greater"),
          conf.level = 0.95, correct = TRUE)

Arguments

x: a vector of counts of successes, a one-dimensional table with two entries, or a two-dimensional table (or matrix) with 2 columns, giving the counts of successes and failures, respectively.
n: a vector of counts of trials; ignored if x is a matrix or a table.
p: a vector of probabilities of success. The length of p must be the same as the number of groups specified by x, and its elements must be greater than 0 and less than 1.
alternative: a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less". You can specify just the initial letter. Only used for testing the null that a single proportion equals a given value, or that two proportions are equal; ignored otherwise.
conf.level: confidence level of the returned confidence interval. Must be a single number between 0 and 1. Only used when testing the null that a single proportion equals a given value, or that two proportions are equal; ignored otherwise.
correct: a logical indicating whether Yates' continuity correction should be applied where possible.

https://onlinecourses.science.psu.edu/stat414/node/222/
https://newonlinecourses.science.psu.edu/statprogram/reviews/statistical-concepts/proportions


Monday, November 19, 2018

Binomial Sign Test

To understand the Binomial Sign Test, you first need to understand the binomial distribution.

Binomial Distribution

Bernoulli trial
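A Bernoulli trial is a random experiment with only two possible outcomes, like tossing a coin.
When the trial is repeated, the probability of each outcome is the same in every trial, and the outcomes of the trials are independent of one another.
So how do we compute probabilities for trials with only two possible outcomes? Take coin tossing: if heads comes up with probability p, tails comes up with probability (1-p). So the probability of getting k heads in n tosses is:
$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$
Computing the probability of each number of heads in 10 tosses (with p = 0.5) gives:
Number of heads   Probability
0                 0.000976563
1                 0.009765625
2                 0.043945313
3                 0.1171875
4                 0.205078125
5                 0.24609375
6                 0.205078125
7                 0.1171875
8                 0.043945313
9                 0.009765625
10                0.000976563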
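The table can be regenerated in R as a quick check, either from the formula C(n, k) p^k (1-p)^(n-k) or with the built-in dbinom. The variable names are just for illustration.

```r
n <- 10
p <- 0.5
k <- 0:n

prob_formula <- choose(n, k) * p^k * (1 - p)^(n - k)   # binomial formula
prob_builtin <- dbinom(k, size = n, prob = p)          # built-in density

data.frame(heads = k, probability = prob_formula)
```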
Its distribution looks like this:
With 100 tosses it looks like this:
And as the probability p varies it looks like this (n=100):
This is the binomial distribution.
Note: the binomial distribution is perfectly symmetric only when p = 0.50.
OK, now let's look at the Binomial Test and the Sign Test.

Binomial Test

Use the binomial test when there are two possible outcomes. You know how many of each kind of outcome (traditionally called "success" and "failure") occurred in your experiment. You also have a hypothesis for what the true overall probability of "success" is. The binomial test answers this question: If the true probability of "success" is what your theory predicts, then how likely is it to find results that deviate as far, or further, from the prediction.
(from GraphPad)
另一个网站的说法:
The binomial test is used when an experiment has two possible outcomes (i.e. success/failure) and you have an idea about what the probability of success is. A binomial test is run to see if observed test results differ from what was expected.
Example: you theorize that 75% of physics students are male. You survey a random sample of 12 physics students and find that 7 are male. Do your results significantly differ from the expected results?
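The binomial test applies when there are two possible outcomes and you already know (or hypothesize) the probability of each. Having observed how many times one of the outcomes occurred in your experiment, you use the binomial test to judge whether what you observed (the count) is consistent with those probabilities. For example, if success and failure have probabilities 0.1 and 0.9, and 10 trials yield 3 successes, the test judges whether the experiment follows the 0.1/0.9 probabilities.
On the binomial test, this page explains things relatively clearly. It describes the two-sided test; for a one-sided test you only sum the probabilities in the one direction of interest (outcomes at least as small, or at least as large, as the one observed).
Ten coin tosses gave only 2 heads. Is the coin biased?
Ten people were picked at random from a population and asked their opinion: 2 in favor, 8 against. Can we say that those against are the majority in the whole population too?
To think about these questions, we set up a model (the null hypothesis) in which the coin is not biased (or in which for and against are equally common in the whole population). Then, assuming this null hypothesis is true, we compute the total probability of outcomes at least as far off as the one actually observed (2:8, 1:9, 0:10, and usually also the mirrored 8:2, 9:1, 10:0).
Note: this is not stated very clearly. For a two-tailed test you also add the probabilities in the opposite direction, or simply compare the one-tail probability against a significance level of 0.05/2.
This total probability is called the p-value. If the p-value is very small, what actually happened is hard to explain with this model, so we infer that the coin is probably biased (or that for and against are not equal). If the p-value is large, all we learn is that nothing can be said from this much data. Taking the boundary between large and small p-values (the significance level) to be 0.05, when p < 0.05 the departure from the null hypothesis is said to be "statistically significant", or "the null hypothesis is rejected". The value 0.05 has no special meaning, but it is the one traditionally used (physics usually imposes much stricter conditions).
Now, the number of heads when tossing coins follows a binomial distribution, so if heads and tails each come up with probability 1/2, the probability of r heads is C(10, r) (1/2)^10. The probabilities of heads:tails being 0:10, 1:9, 2:8, 8:2, 9:1, 10:0 are respectively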
> dbinom(c(0,1,2,8,9,10), 10, 0.5)
[1] 0.0009765625 0.0097656250 0.0439453125 0.0439453125
[5] 0.0097656250 0.0009765625
and their sum, i.e. the p-value, is
> sum(dbinom(c(0,1,2,8,9,10), 10, 0.5))
[1] 0.109375
The same result can be obtained with
> pbinom(2, 10, 0.5) * 2
[1] 0.109375
Also, as described in detail later, the function binom.test() performs the binomial test as well.
> binom.test(2, 10, 0.5)
Exact binomial test
data: 2 and 10
number of successes = 2, number of trials = 10, p-value = 0.1094
...
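So at significance level 0.05 the difference between heads and tails is not statistically significant, and for a survey it means you must not conclude "those in favor are few" from such a small sample.
With 1 head (1 person in favor), the p-value is about 0.02, which is significant at the 0.05 level.
This is the thinking behind the method that Fisher (R. A. Fisher, 1890-1962) called "tests of significance" (significance tests).
In other words, the binomial test takes the known probabilities of the two outcomes (here, heads and tails each with probability 0.5) and computes the probability of what you observed or something more extreme (2 heads, plus 1 and 0 heads, and for a two-sided test the mirrored outcomes as well). That probability is the p-value. If the p-value is statistically significant, the null hypothesis is rejected (the coin is biased, or for and against are not equally common); otherwise the null hypothesis cannot be rejected (the data are consistent with a fair coin, or with equal proportions for and against).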

Sign test

The sign test is a special case of the binomial case where your theory is that the two outcomes have equal probabilities. (from GraphPad)
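This sentence seems a bit off. It would be better to say that the sign test analyzes categorical data: it reduces the data to + and - signs and uses probability to judge whether the +/- occur at random or show a tendency. For example, after taking a drug, does a patient urinate more or less often than before? An increase is marked + and a decrease -, and the sign test can then be used to judge whether the drug has an effect.
So the sign test is a special kind of binomial test that only cares about the two cases + and -, each occurring with probability 0.5.
※1 Many sources are inconsistent here. For the binomial test, some say there must be only two possible outcomes while others don't mention this and illustrate with dice: the probability of any given number is 1/6, so if 60 throws give 15 sixes, has the die been tampered with? (The solution still works with the probabilities of "six" and "not six", so there seems to be no contradiction.)
※ Many sources explain the sign test incompletely or incorrectly.
The actual decision procedure is the same as for the binomial test above (the sign test is just a special case of the binomial test).
This site has a good example of the sign test.
Ten patients were given a sleeping drug, and their sleep increased by the following numbers of hours (Arthur R. Cushny and A. Roy Peebles, The Journal of Physiology 32, 501-510 1905):
1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4
So only 1 of the 10 values is negative and the remaining 9 are positive. There are 2^10 = 1024 ways of assigning positive/negative signs in total, of which
  • all positive: C(10,0) = 1 way
  • exactly one negative: C(10,1) = 10 ways
  • exactly two negative: C(10,2) = (10×9)/(2×1) = 45 ways
and so on. Adding up all the cases gives, naturally, 2^10 = 1024.
If positive and negative are equally likely, then
  • the probability of all positive is 1/1024
  • the probability of exactly one negative is 10/1024
  • the probability of exactly two negative is 45/1024
Since the actual data have just one negative value out of 10, adding the probability of this outcome and of the more extreme one (all positive) gives 10/1024 + 1/1024 = 11/1024. Conversely, the probability of all negative or exactly one positive is likewise 10/1024 + 1/1024 = 11/1024. So the probability that at most 1 of the 10 signs differs from the rest is 22/1024, and the p-value is about 0.02. That is, an event that happens by chance only about once in 50 times.
This kind of test is called the sign test.
In the sign test, data with a difference of 0 are left out.
In general, if there are n nonzero values and m of them are positive (or negative), the sign test can be run with binom.test(m, n). For the example above: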
> binom.test(1, 10)

 Exact binomial test

data:  1 and 10
number of successes = 1, number of trials = 10, p-value = 0.02148
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.002528579 0.445016117
sample estimates:
probability of success 
                   0.1 
which gives p = 0.02148.
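Starting from the raw sleep data, the whole sign test can be sketched like this. The variable names are mine; zeros are dropped before counting signs, and the hand-computed two-sided p-value doubles the smaller tail, which is valid here because the null probability is 0.5.

```r
gain <- c(1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4)

nonzero <- gain[gain != 0]       # differences of exactly 0 are excluded
n <- length(nonzero)             # number of usable observations
m <- sum(nonzero < 0)            # number of negative signs

res <- binom.test(m, n, p = 0.5) # sign test = binomial test with p = 0.5

# Two-sided p-value by hand: double the smaller tail (capped at 1)
p_by_hand <- min(1, 2 * pbinom(min(m, n - m), n, 0.5))
```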

Binomial test in R

A binomial test compares the number of successes observed in a given number of trials with a hypothesised probability of success. The test has the null hypothesis that the real probability of success is equal to some value denoted p, and the alternative hypothesis that it is not equal to p. The test can also be performed with a one-sided alternative hypothesis that the real probability of success is either greater than p or that it is less than p. (from Instant R; I find this explanation of the binomial test quite good too)

Exact Binomial Test

Description

Performs an exact test of a simple null hypothesis about the probability of success in a Bernoulli experiment.

Usage

binom.test(x, n, p = 0.5,
           alternative = c("two.sided", "less", "greater"),
           conf.level = 0.95)

Arguments

x: number of successes, or a vector of length 2 giving the numbers of successes and failures, respectively.
n: number of trials; ignored if x has length 2.
p: hypothesized probability of success.
alternative: indicates the alternative hypothesis and must be one of "two.sided", "greater" or "less". You can specify just the initial letter.
conf.level: confidence level for the returned confidence interval.
(from this page)
For example, the problem above of 2 heads in 10 coin tosses, run as a binomial test in R:
> binom.test(2,10,p=0.5,alternative="two.sided")

    Exact binomial test

data: 2 and 10
number of successes = 2, number of trials = 10,
p-value = 0.1094
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.02521073 0.55609546
sample estimates:
probability of success
0.2
The sign test is a special case of the binomial test, so it is done the same way. See here.
http://ogawas.cerp.u-toyama.ac.jp/e-stat/05.html

PS: 「よきにはからえ」 means "do as you see fit".
When someone asks for your decision on something you can't judge yourself, saying this should keep you out of trouble for the time being.
