a formula of the form x ~ group, where x is a numeric variable and group is a factor with one or multiple levels. For example, formula = TP53 ~ cancer_group. It’s also possible to perform the test for multiple response variables at the same time. For example, formula = c(TP53, PTEN) ~ cancer_group.
data: 使用的数据。
a data.frame containing the variables in the formula.
method: 使用的检验P值的方法。the type of test. Default is “wilcox.test”. Allowed values include:
“t.test” (parametric) and “wilcox.test”" (non-parametric). Perform comparison between two groups of samples. If the grouping variable contains more than two levels, then a pairwise comparison is performed.
“anova” (parametric) and “kruskal.test” (non-parametric). Perform one-way ANOVA test comparing multiple groups.
paired: 是否做配对检验。
a logical indicating whether you want a paired test. Used only in t.test and in wilcox.test.
group.by: 指定分组做P值检验。variables used to group the data set before applying the test. When specified the mean comparisons will be performed in each subset of the data formed by the different levels of the group.by variables.
a character string specifying the reference group. If specified, for a given grouping variable, each of the group levels will be compared to the reference group (i.e. control group). ref.group can be also “.all.”. In this case, each of the grouping variable levels is compared to all (i.e. base-mean). 这里的base-mean指的就是所有组的平均值
A list of length-2 vectors. The entries in the vector are either the names of 2 values on the x-axis or the 2 integers that correspond to the index of the groups of interest, to be compared.
hide.ns: 逻辑值,默认为FALSE。是否显示不显著时的符号ns。
logical value. If TRUE, hide ns symbol when displaying significance levels.
compare_means(len ~ supp, data = ToothGrowth) %>% knitr::kable()
.y.
group1
group2
p
p.adj
p.format
p.signif
method
len
OJ
VC
0.0644907
0.064
0.064
ns
Wilcoxon
Returned value is a data frame with the following columns:
.y.: the y variable used in the test.
p: the p-value
p.adj: the adjusted p-value. Default value for p.adjust.method = “holm”
p.format: the formatted p-value
p.signif: the significance level.
method: the statistical test used to compare groups.
Create a box plot with p-values:
p <- ggboxplot(data = ToothGrowth, x = "supp", y = "len", color = "supp", palette = "jco", add = "jitter") # add = "jitter"参数添加箱线图的数值点 # 添加p值 p + stat_compare_means() # 再画一幅,这次改变P值的统计方法 p + stat_compare_means(method = "t.test")
显示P值的标签位置可以通过如下参数来调整:label.x, label.y。
显示P值的标签默认为 compare_means() 返回值中的 method 和 p 的组合。也可以通过 aes() 函数指定为其他显示形式。例如:
For example,
aes(label = …p.format…) or aes(label = paste0(“p =”, …p.format…)): display only the formatted p-value (without the method name)
aes(label = …p.signif…): display only the significance level.
aes(label = paste0(…method…, “\n”, “p =”, …p.format…)): Use line break (“\n”) between the method name and the p-value.
# Default method = "kruskal.test" for multiple groups ggboxplot(ToothGrowth, x = "dose", y = "len", color = "dose", palette = "jco")+ stat_compare_means() # Change method to anova ggboxplot(ToothGrowth, x = "dose", y = "len", color = "dose", palette = "jco")+ stat_compare_means(method = "anova")
# Load myeloma data from GitHub myeloma <- read.delim("https://raw.githubusercontent.com/kassambara/data/master/myeloma.txt")
# Perform the test compare_means(DEPDC1 ~ molecular_group, data = myeloma, ref.group = ".all.", method = "t.test")
.y.
group1
group2
p
p.adj
p.format
p.signif
method
DEPDC1
.all.
Cyclin D-1
0.2877529
1.0e+00
0.29
ns
T-test
DEPDC1
.all.
Cyclin D-2
0.4244240
1.0e+00
0.42
ns
T-test
DEPDC1
.all.
MMSET
0.5784193
1.0e+00
0.58
ns
T-test
DEPDC1
.all.
MAF
0.2538126
1.0e+00
0.25
ns
T-test
DEPDC1
.all.
Hyperdiploid
0.0000000
2.0e-07
2.7e-08
****
T-test
DEPDC1
.all.
Proliferation
0.0000239
1.2e-04
2.4e-05
****
T-test
DEPDC1
.all.
Low bone disease
0.0000053
3.2e-05
5.3e-06
****
T-test
head(myeloma)
X
molecular_group
chr1q21_status
treatment
event
time
CCND1
CRIM1
DEPDC1
IRF4
TP53
WHSC1
group
GSM50986
Cyclin D-1
3 copies
TT2
0
69.24
9908.4
420.9
523.5
16156.5
10.0
261.9
cyclin.d.1
GSM50988
Cyclin D-2
2 copies
TT2
0
66.43
16698.8
52.0
21.1
16946.2
1056.9
363.8
cyclin.d.2
GSM50989
MMSET
2 copies
TT2
0
66.50
294.5
617.9
192.9
8903.9
1762.8
10042.9
mmset
GSM50990
MMSET
3 copies
TT2
1
42.67
241.9
11.9
184.7
11894.7
946.8
4931.0
mmset
GSM50991
MAF
NA
TT2
0
65.00
472.6
38.8
212.0
7563.1
361.4
165.0
maf
GSM50992
Hyperdiploid
2 copies
TT2
0
65.20
664.1
16.9
341.6
16023.4
2096.3
569.2
hyperdiploid
# 可视化 # legend = "none"参数去掉图形上方显示的每种颜色的box代表哪种分组的信息 # ggboxplot(myeloma, x = "molecular_group", y = "DEPDC1", color = "molecular_group", add = "jitter", legend = "none") + rotate_x_text(angle = 45)+ geom_hline(yintercept = mean(myeloma$DEPDC1), linetype = 2)+ # Add horizontal line at base mean stat_compare_means(method = "anova", label.y = 1600)+ # Add global annova p-value stat_compare_means(label = "p.signif", method = "t.test", ref.group = ".all.") # Pairwise comparison against all, 此处添加hide.ns = TRUE参数,将隐藏不显著的ns符号
From the plot above, we can conclude that DEPDC1 is significantly overexpressed in proliferation group and, it’s significantly downexpressed in Hyperdiploid and Low bone disease compared to all.
5. 多个分组变量
在使用另外一个变量分组后,两个独立样本之间的比较
compare_means(len ~ supp, data = ToothGrowth, group.by = "dose")
dose
.y.
group1
group2
p
p.adj
p.format
p.signif
method
0.5
len
OJ
VC
0.0231864
0.046
0.023
*
Wilcoxon
1.0
len
OJ
VC
0.0040304
0.012
0.004
**
Wilcoxon
2.0
len
OJ
VC
1.0000000
1.000
1.000
ns
Wilcoxon
Visualize (1/2). Create a multi-panel box plots facetted by group (here, “dose”):
# 多个小图facet p <- ggboxplot(data = ToothGrowth, x = "supp", y = "len", color = "supp", palette = "jco", add = "jitter", facet.by = "dose", short.panel.labs = F) # short.panel.labs默认为T,改为F显示分组dose:
# 仅显示P值,去除方法 p + stat_compare_means(label = "p.format")
# 或者显示显著性符号,而不是直接显示P值 p + stat_compare_means(label = "p.signif", label.x = 1.5)
# To hide the ‘ns’ symbol, use the argument hide.ns = TRUE.
Visualize (2/2). Create one single panel with all box plots. Plot y = “len” by x = “dose” and color by “supp”:
# 将所有小图组合放在一张图里 p <- ggboxplot(ToothGrowth, x = "dose", y = "len", color = "supp", palette = "jco", add = "jitter") # 去掉facet.by参数 p + stat_compare_means(aes(group = supp)) # 使用aes映射时,列名不能加双引号
# 只显示P值 p + stat_compare_means(aes(group = supp),label = "p.format")
# 单纯的只显示P的数值 p + stat_compare_means(aes(group = supp, label = paste(..p.format..)))
# 只显示显著性水平 p + stat_compare_means(aes(group = supp), label = "p.signif")