Two-Mean Comparison Tests in R (Nonparametric Tests)

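1. Nonparametric tests for two independent samples

1.1. Wilcoxon rank-sum test

First treat the two samples as a single pooled sample, sort the observations in ascending order, and rank them jointly. If the null hypothesis (the two independent samples come from the same population) is true, the ranks will be spread roughly evenly across the two samples: small, medium and large ranks should be split about equally between them. If the alternative hypothesis (the two independent samples come from different populations) is true, one sample will contain more of the small ranks and so have a smaller rank sum, while the other will contain more of the large ranks and so have a larger rank sum.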
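The ranking procedure described above can be sketched in a few lines of base R. The two vectors below are made-up illustration data, not the forest data used later; the sketch only shows how the pooled ranks, the two rank sums, and the W statistic reported by wilcox.test relate to each other.

```r
# Hypothetical data: two small independent samples (illustration only)
x <- c(1.1, 2.3, 3.8, 4.0)
y <- c(5.2, 6.1, 7.4, 8.9, 9.5)

# Pool the samples and rank all observations jointly
pooled <- c(x, y)
r <- rank(pooled)  # tied values, if any, would receive average ranks

# Rank sum of each sample
rank_sum_x <- sum(r[seq_along(x)])
rank_sum_y <- sum(r[-seq_along(x)])

# wilcox.test reports W = rank sum of the first sample minus its
# smallest possible value n1 * (n1 + 1) / 2
n1 <- length(x)
w_manual <- rank_sum_x - n1 * (n1 + 1) / 2
w_r <- unname(wilcox.test(x, y)$statistic)

print(c(rank_sum_x = rank_sum_x, rank_sum_y = rank_sum_y))
print(c(w_manual = w_manual, w_r = w_r))
```

Here every value in x is smaller than every value in y, so x collects all the small ranks and its rank sum is as small as it can be.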


R function: wilcox.test


 

################## Mann-Whitney U test for independent samples
# sep must be a single character
Forest<-read.table(file="ForestData.txt",header=TRUE,sep=" ")
Forest$month<-factor(Forest$month,levels=c("jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec"))
Tmp<-subset(Forest,Forest$month=="jan" | Forest$month=="aug")
wilcox.test(temp~month,data=Tmp)

  

Wilcoxon rank sum test with continuity correction

data: temp by month
W = 2, p-value = 0.01653
alternative hypothesis: true location shift is not equal to 0


1.2. K-S test


################## K-S test for independent samples
x1<-subset(Forest,Forest$month=="jan")
x2<-subset(Forest,Forest$month=="aug")
ks.test(x1$temp,x2$temp)

  

Two-sample Kolmogorov-Smirnov test

data: x1$temp and x2$temp
D = 0.99457, p-value = 0.03992
alternative hypothesis: two-sided

1.3. Two paired samples


############### Wilcoxon signed-rank test for paired samples
ReportCard<-read.table(file="ReportCard.txt",header=TRUE,sep=" ")
ReportCard<-na.omit(ReportCard)
wilcox.test(ReportCard$chi,ReportCard$math,paired=TRUE)

sum(outer(ReportCard$chi,ReportCard$math,"-")<0)
sum(outer(ReportCard$math,ReportCard$chi,"-")<0)

  

Wilcoxon signed rank test with continuity correction

data: ReportCard$chi and ReportCard$math
V = 1695.5, p-value = 8.021e-11
alternative hypothesis: true location shift is not equal to 0

> sum(outer(ReportCard$chi,ReportCard$math,"-")<0)
[1] 332
> sum(outer(ReportCard$math,ReportCard$chi,"-")<0)
[1] 3026

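2. Permutation test for two-sample means

In experiments we often end up with only small samples because of practical constraints (time, funding, manpower, materials). If we want to learn about the population behind such small-sample results, a permutation test is needed.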

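Permutation test
The permutation test, proposed by Fisher in the 1930s, is a computationally intensive method that carries out statistical inference using all (or a random subset of) the permutations of the sample data. Because it makes no assumption about the population distribution, it is widely used. It is especially suited to small samples from an unknown population distribution and to hypothesis-testing problems that are hard to handle with conventional methods. In practice it resembles bootstrap methods: the sample is reordered, the test statistic is recomputed for each reordering to build an empirical distribution, and the p-value used for inference is obtained from that distribution.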
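As a minimal sketch of the idea just described, the base R code below runs a Monte Carlo permutation test on the difference of two means. The data, the number of permutations (5000), and all variable names are assumptions chosen for illustration; this is not how oneway_test() in the coin package is implemented.

```r
set.seed(1)

# Hypothetical small samples (illustration only)
g1 <- c(12.1, 14.3, 11.8, 13.5, 15.0)
g2 <- c(16.2, 17.8, 15.9, 18.4, 16.7, 17.1)

values <- c(g1, g2)
n1 <- length(g1)

# Observed test statistic: difference of the two sample means
obs <- mean(g1) - mean(g2)

# Randomly reassign the pooled values to the two groups and
# recompute the statistic each time to build the empirical distribution
n_perm <- 5000
perm <- replicate(n_perm, {
  idx <- sample(length(values), n1)
  mean(values[idx]) - mean(values[-idx])
})

# Two-sided p-value: share of permuted statistics at least as extreme
# as the observed one (small tolerance guards against rounding; the +1
# counts the observed arrangement itself)
p_value <- (sum(abs(perm) >= abs(obs) - 1e-12) + 1) / (n_perm + 1)
print(c(observed = obs, p_value = p_value))
```

With samples this small, all choose(11, 5) = 462 splits could also be enumerated exactly instead of sampled.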

2.1. Overview


The parameter being tested can also be the median, among others.

2.2. R program

oneway_test() (from the coin package)


 

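library(coin)  # provides oneway_test()
Forest<-read.table(file="ForestData.txt",header=TRUE,sep=" ")
Forest$month<-factor(Forest$month,levels=c("jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec"))
Tmp<-subset(Forest,Forest$month=="jan" | Forest$month=="aug")
# Classical two-sample t test, for comparison
t.test(temp~month,data=Tmp,paired=FALSE,var.equal=TRUE)
# Convert to a plain vector and back to a factor to drop the unused month levels
Tmp$month<-as.vector(Tmp$month)
Tmp$month<-as.factor(Tmp$month)
# Exact, asymptotic, and Monte Carlo versions of the permutation test
oneway_test(temp~month,data=Tmp,distribution="exact")
oneway_test(temp~month,data=Tmp,distribution="asymptotic")
# Note: recent coin versions name this argument nresample instead of B
oneway_test(temp~month,data=Tmp,distribution=approximate(B=1000))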

  

Two Sample t-test

data: temp by month
t = -4.8063, df = 184, p-value = 3.184e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-23.106033 -9.657011
sample estimates:
mean in group jan mean in group aug
5.25000 21.63152

 

 

Exact Two-Sample Fisher-Pitman Permutation Test

data: temp by month (aug, jan)
Z = 4.5426, p-value = 0.0001744
alternative hypothesis: true mu is not equal to 0

 

 

Asymptotic Two-Sample Fisher-Pitman Permutation Test

data: temp by month (aug, jan)
Z = 4.5426, p-value = 5.557e-06
alternative hypothesis: true mu is not equal to 0

 

Approximative Two-Sample Fisher-Pitman Permutation Test

data: temp by month (aug, jan)
Z = 4.5426, p-value < 2.2e-16
alternative hypothesis: true mu is not equal to 0

2.3. Permutation test for correlation coefficients

spearman_test

For the student score data, run a permutation test based on the Spearman correlation coefficient between the math and physics scores.

library(coin)  # provides spearman_test()
ReportCard<-read.table(file="ReportCard.txt",header=TRUE,sep=" ")
Tmp<-ReportCard[complete.cases(ReportCard),]
cor.test(Tmp[,5],Tmp[,7],alternative="two.sided",method="spearman")
# set.seed() makes the simulation reproducible. Random numbers are drawn here, so
# rerunning the code would otherwise give different results. Reproducibility matters
# when debugging or presenting results. 12345 is the seed value.
set.seed(12345)
spearman_test(math~phy,data=Tmp,distribution=approximate(B=1000))

  

sample estimates:
rho
0.7651233

Approximative Spearman Correlation Test

data: math by phy
Z = 5.7766, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
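To make the mechanics behind such a test concrete, here is a hand-rolled sketch in base R: the pairing is broken by shuffling one variable, and the Spearman coefficient is recomputed each time. The scores, the number of permutations (2000), and the variable names are all made up for illustration; the coin package's own algorithm may differ in detail.

```r
set.seed(12345)

# Hypothetical paired scores for ten students (illustration only)
math <- c(55, 62, 70, 48, 81, 90, 66, 73, 59, 85)
phy  <- c(58, 60, 75, 50, 78, 88, 61, 70, 65, 80)

# Observed Spearman rank correlation
rho_obs <- cor(math, phy, method = "spearman")

# Under the null hypothesis of no association, any pairing of the two
# score lists is equally likely, so shuffle one of them and recompute
n_perm <- 2000
rho_perm <- replicate(n_perm, cor(math, sample(phy), method = "spearman"))

# Two-sided Monte Carlo p-value
p_value <- (sum(abs(rho_perm) >= abs(rho_obs) - 1e-12) + 1) / (n_perm + 1)
print(c(rho_obs = rho_obs, p_value = p_value))
```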

 

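2.4. Chi-squared permutation test

For the student score data, apply a permutation test to the contingency table of sex by average-score grade to see whether the two variables are independent.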

library(coin)  # provides chisq_test()
Tmp<-ReportCard[complete.cases(ReportCard),]
CrossTable<-table(Tmp[,c(2,12)])  # contingency table of sex by average-score grade
chisq.test(CrossTable,correct=FALSE)  # classical Pearson chi-squared test
chisq_test(sex~avScore,data=Tmp,distribution="asymptotic")
set.seed(12345)
chisq_test(sex~avScore,data=Tmp,distribution=approximate(B=1000))

 

> CrossTable
   avScore
sex  B  C  D  E
  F  2 13 10  3
  M  2 11 12  5

Pearson's Chi-squared test

data: CrossTable
X-squared = 0.78045, df = 3, p-value = 0.8541

Asymptotic Pearson Chi-Squared Test

data: sex by avScore (B, C, D, E)
chi-squared = 0.78045, df = 3, p-value = 0.8541

 

Approximative Pearson Chi-Squared Test

data: sex by avScore (B, C, D, E)
chi-squared = 0.78045, p-value = 0.922

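Null hypothesis: sex and average-score grade are independent. With p-values this large, the null hypothesis should not be rejected.

2.5. Permutation test for two paired samples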

wilcoxsign_test

library(coin)  # provides wilcoxsign_test()
ReportCard<-read.table(file="ReportCard.txt",header=TRUE,sep=" ")
ReportCard<-na.omit(ReportCard)
wilcox.test(ReportCard$chi,ReportCard$math,paired=TRUE)
wilcoxsign_test(chi~math,data=ReportCard,distribution="asymptotic")

  

Wilcoxon signed rank test with continuity correction

data: ReportCard$chi and ReportCard$math
V = 1695.5, p-value = 8.021e-11
alternative hypothesis: true location shift is not equal to 0

 

Asymptotic Wilcoxon-Pratt Signed-Rank Test

data: y by x (pos, neg)
stratified by block
Z = 6.5041, p-value = 7.817e-11
alternative hypothesis: true mu is not equal to 0

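The two tests lead to the same conclusion.

3. Bootstrap test for the difference of two sample means

3.1. Overview

A permutation test on two sample means can show whether the two population means differ significantly, but it does not readily give a confidence interval for the difference of the population means. The usual confidence-interval calculation assumes that the sampling distribution of the difference in sample means is known and symmetric. When that assumption cannot be guaranteed, the bootstrap method can be used instead.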
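The following base R sketch illustrates the idea with a percentile bootstrap interval for a difference of means. The data, the number of replications (2000), and the choice to resample each group separately are assumptions made for this illustration, not taken from the analysis that follows.

```r
set.seed(1)

# Hypothetical samples (illustration only)
a <- c(12.1, 14.3, 11.8, 13.5, 15.0)
b <- c(16.2, 17.8, 15.9, 18.4, 16.7, 17.1)

# Observed difference of the sample means
obs <- mean(a) - mean(b)

# Resample each group with replacement and recompute the difference;
# the collection of results approximates the sampling distribution
n_boot <- 2000
boot_diff <- replicate(n_boot, {
  mean(sample(a, replace = TRUE)) - mean(sample(b, replace = TRUE))
})

# Percentile interval: take the empirical 2.5% and 97.5% quantiles,
# so no normality or symmetry assumption is needed
ci <- quantile(boot_diff, c(0.025, 0.975))
print(obs)
print(ci)
```

Because the interval is read directly off the empirical distribution, it can be asymmetric around the observed difference.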

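3.2. R implementation

1. Write a user-defined function

For example, for a bootstrap test on two sample means, the function computes the mean of each of the two samples and returns their difference.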

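DiffMean<-function(DataSet,indices){
 ReSample<-DataSet[indices,]  # draw the observations selected by indices from DataSet to form the bootstrap sample
 diff<-tapply(ReSample[,1],INDEX=as.factor(ReSample[,2]),FUN=mean)
 # group by column 2 of the bootstrap sample and compute the mean of column 1 in each group
 return(diff[1]-diff[2])
}
# Column 1 is the variable under test; column 2 identifies the population each observation
# comes from. indices is a vector of n random row positions, which determines the
# observations drawn from DataSet to form the bootstrap sample.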

  

2. Call the boot function to carry out the bootstrap test

library("boot")
# sep must be a single character
Forest<-read.table(file="ForestData.txt",header=TRUE,sep=" ")
Forest$month<-factor(Forest$month,levels=c("jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec"))
Tmp<-subset(Forest,Forest$month=="jan" | Forest$month=="aug")
# cbind stores the month factor as its integer codes (jan = 1, aug = 8)
Tmp<-cbind(Tmp$temp,Tmp$month)
set.seed(12345)
BootObject<-boot(data=Tmp,statistic=DiffMean,R=20)
# Calls the user-defined function with 20 bootstrap replications.
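One practical caveat: when one group is small, an ordinary (unstratified) resample can by chance contain no observations from it, and a statistic written like DiffMean then returns NA for that replicate. The boot function accepts a strata argument that resamples within each group. The sketch below uses simulated data and a hypothetical helper named diff_mean; it shows one possible variation, not the approach taken above.

```r
library(boot)
set.seed(12345)

# Hypothetical unbalanced data: column 1 = value, column 2 = group code
dat <- cbind(value = c(4.8, 5.7, 6.1, rnorm(30, mean = 21, sd = 5)),
             group = c(rep(1, 3), rep(2, 30)))

# Hypothetical statistic: difference of group means in the resample.
# Fixing the factor levels keeps both groups in the result even if
# one of them were missing from a resample.
diff_mean <- function(d, indices) {
  rs <- d[indices, ]
  m <- tapply(rs[, 1], INDEX = factor(rs[, 2], levels = c(1, 2)), FUN = mean)
  unname(m[1] - m[2])
}

# strata makes boot() resample within each group separately, so every
# bootstrap sample keeps the original group sizes
b <- boot(data = dat, statistic = diff_mean, R = 999, strata = dat[, 2])

# No replicate is NA because both groups are always represented
print(sum(is.na(b$t)))
ci <- boot.ci(b, conf = 0.95, type = "perc")
print(ci)
```

Using a larger R than 20 also tends to make percentile intervals more stable, at the cost of more computation.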

 

Call:
boot(data = Tmp, statistic = DiffMean, R = 20)

Bootstrap Statistics :
     original        bias    std. error
t1* -16.38152  -0.07459533    0.2012279

BootObject$t holds the M statistics computed from the bootstrap samples (here M = R = 20).

 

3. Obtain the results

BootObject$t0
mean(BootObject$t,na.rm=TRUE)
print(BootObject)
plot(BootObject)
boot.ci(BootObject,conf=0.95,type=c("norm","perc"))

  

CALL :
boot.ci(boot.out = BootObject, conf = 0.95, type = c("norm", "perc"))

Intervals :
Level      Normal              Percentile
95%   (-16.70, -15.91 )   (-16.85, -16.06 )
Calculations and Intervals on Original Scale

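The differences in sample means obtained from the bootstrap samples do not follow a normal distribution, so the confidence interval based on the normal distribution is not appropriate here.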
