使用ggpubr绘制出版级别的图形
本文主要学习和介绍:
- 
比较两组或者多组数据的平均值 
- 
然后自动添加P值或者显著性水平到ggplot的各种图形中(such as box plots, dot plots, bar plots and line plots …) 
一、比较均值的方法
The standard methods to compare the means of two or more groups in R, have been largely described at: comparing means in R.
常用的几种比较方法如下:
| Methods | R function | Description | 
|---|---|---|
| T-test | t.test() | Compare two groups (parametric) | 
| Wilcoxon test | wilcox.test() | Compare two groups (non-parametric) | 
| ANOVA | aov() or anova() | Compare multiple groups (parametric) | 
| Kruskal-Wallis | kruskal.test() | Compare multiple groups (non-parametric) | 
如何使用这些方法进行计算和对结果进行解释,可以参考以下链接:
- Comparing one-sample mean to a standard known mean:
- Comparing the means of two independent groups:
- Comparing the means of paired samples:
- Comparing the means of more than two groups
- Analysis of variance (ANOVA, parametric):
- Kruskal-Wallis Test in R (non parametric alternative to one-way ANOVA)
 
二、R中添加P值的函数介绍
这里主要使用的是ggpubr包的的两个函数:
- compare_means(): 比较一组或多组均值,包括P值和显著性水平
- stat_compare_means(): 自动添加P值或显著性水平到ggplot的图形中
1. compare_means函数
compare_means函数用于比较输出一组或者多组的均值,使用不同的方法
| compare_means(formula, data, method = "wilcox.test", paired = FALSE, | 
- formula: 画图的公式y ~ x。
- a formula of the form x ~ group, where x is a numeric variable and group is a factor with one or multiple levels. For example, formula = TP53 ~ cancer_group. It’s also possible to perform the test for multiple response variables at the same time. For example, formula = c(TP53, PTEN) ~ cancer_group.
 
- data: 使用的数据。
- a data.frame containing the variables in the formula.
 
- method: 使用的检验P值的方法。the type of test. Default is “wilcox.test”. Allowed values include:
- “t.test” (parametric) and “wilcox.test”" (non-parametric). Perform comparison between two groups of samples. If the grouping variable contains more than two levels, then a pairwise comparison is performed.
- “anova” (parametric) and “kruskal.test” (non-parametric). Perform one-way ANOVA test comparing multiple groups.
 
- paired: 是否做配对检验。
- a logical indicating whether you want a paired test. Used only in t.test and in wilcox.test.
 
- group.by: 指定分组做P值检验。variables used to group the data set before applying the test. When specified the mean comparisons will be performed in each subset of the data formed by the different levels of the group.by variables.
- ref.group: 比较均值时指定参考的组,就像比较case-control的数据时,指定control组为对照一样,指定一个组做为参照。
- a character string specifying the reference group. If specified, for a given grouping variable, each of the group levels will be compared to the reference group (i.e. control group). ref.group can be also “.all.”. In this case, each of the grouping variable levels is compared to all (i.e. base-mean). 这里的base-mean指的就是所有组的平均值
 
2.stat_compare_means函数
功能与上一个函数相似,只是这个函数用于在ggplot图形中添加平均值比较的P值或者显著性水平
| stat_compare_means(mapping = NULL, comparisons = NULL hide.ns = FALSE, | 
- mapping: 和ggplot函数一样指定aes()映射。
- Set of aesthetic mappings created by aes().
 
- comparisons: 由两个向量组成的列表,这两个向量就是要对比组的名字或者index值。
- A list of length-2 vectors. The entries in the vector are either the names of 2 values on the x-axis or the 2 integers that correspond to the index of the groups of interest, to be compared.
 
- hide.ns: 逻辑值,默认为FALSE。是否显示不显著时的符号ns。
- logical value. If TRUE, hide ns symbol when displaying significance levels.
 
- label: 指定标签的类型,可以是P值(“p.format”)或者是显著性水平(“p.signif”)(用*号和ns表示)。
- character string specifying label type. Allowed values include “p.signif” (shows the significance levels), “p.format” (shows the formatted p value).
 
- label.x,label.y: 指定label的坐标位置。
- numeric values. coordinates (in data units) to be used for absolute positioning of the label. If too short they will be recycled.
 
- …: 其他的参数。
- other arguments passed to the function compare_means() such as method, paired, ref.group.
 
使用boxplot演示添加P值和显著性水平
1. 准备
| # 加载包 | 
| head(ToothGrowth) %>% knitr::kable(format = "markdown") | 
| len | supp | dose | 
|---|---|---|
| 4.2 | VC | 0.5 | 
| 11.5 | VC | 0.5 | 
| 7.3 | VC | 0.5 | 
| 5.8 | VC | 0.5 | 
| 6.4 | VC | 0.5 | 
| 10.0 | VC | 0.5 | 
2. 独立双样本比较
| compare_means(len ~ supp, data = ToothGrowth) %>% knitr::kable() | 
| .y. | group1 | group2 | p | p.adj | p.format | p.signif | method | 
|---|---|---|---|---|---|---|---|
| len | OJ | VC | 0.0644907 | 0.064 | 0.064 | ns | Wilcoxon | 
Returned value is a data frame with the following columns:
.y.: the y variable used in the test.
p: the p-value
p.adj: the adjusted p-value. Default value for p.adjust.method = “holm”
p.format: the formatted p-value
p.signif: the significance level.
method: the statistical test used to compare groups.
Create a box plot with p-values:
| p <- ggboxplot(data = ToothGrowth, | 

显示P值的标签位置可以通过如下参数来调整:label.x, label.y。
显示P值的标签默认为 compare_means() 返回值中的 method 和 p 的组合。也可以通过 aes() 函数指定为其他显示形式。例如:
For example,
aes(label = …p.format…) or aes(label = paste0(“p =”, …p.format…)): display only the formatted p-value (without the method name)
aes(label = …p.signif…): display only the significance level.
aes(label = paste0(…method…, “\n”, “p =”, …p.format…)): Use line break (“\n”) between the method name and the p-value.
如:
| p + stat_compare_means(aes(label = paste0(..method..,": ","p=",..p.format..)), | 

| # 单独显示字符型向量的显著水平 | 

| ## 单独显示P值 | 

3. 比较两个配对样本
| compare_means(len ~ supp, data = ToothGrowth, paired = TRUE) | 
| .y. | group1 | group2 | p | p.adj | p.format | p.signif | method | 
|---|---|---|---|---|---|---|---|
| len | OJ | VC | 0.0043126 | 0.0043 | 0.0043 | ** | Wilcoxon | 
| # 使用ggpaired函数可视化配对数据 | 

- palette: 选择不同的调色板
4.比较多组样本
1. 全局比较
| # 使用anova的方法, 全局比较 | 
| .y. | p | p.adj | p.format | p.signif | method | 
|---|---|---|---|---|---|
| len | 0 | 0 | 9.5e-16 | **** | Anova | 
| # Default method = "kruskal.test" for multiple groups | 

2. 成对比较
如果分组超过两个,默认将所有组两两比较, 默认方法为wilcox.test, 可以改为t.test
| compare_means(len ~ dose, data = ToothGrowth) | 
| .y. | group1 | group2 | p | p.adj | p.format | p.signif | method | 
|---|---|---|---|---|---|---|---|
| len | 0.5 | 1 | 0.0000070 | 1.4e-05 | 7.0e-06 | **** | Wilcoxon | 
| len | 0.5 | 2 | 0.0000001 | 2.0e-07 | 8.4e-08 | **** | Wilcoxon | 
| len | 1 | 2 | 0.0001772 | 1.8e-04 | 0.00018 | *** | Wilcoxon | 
| # 可视化所指定的比较 | 

| # 将分组比较的P值label使用label.y指定位置 | 

3. 指定一个参考组的两两比较
| # Pairwise comparison against reference | 
| .y. | group1 | group2 | p | p.adj | p.format | p.signif | method | 
|---|---|---|---|---|---|---|---|
| len | 0.5 | 1 | 1e-07 | 1e-07 | 1.3e-07 | **** | T-test | 
| len | 0.5 | 2 | 0e+00 | 0e+00 | 4.4e-14 | **** | T-test | 
| # Visualize | 

4. 相对于所有组的均值(base-mean)的多重两两比较的检验
| # Comparison of each group against base-mean | 
| .y. | group1 | group2 | p | p.adj | p.format | p.signif | method | 
|---|---|---|---|---|---|---|---|
| len | .all. | 0.5 | 0.0000003 | 9.0e-07 | 2.9e-07 | **** | T-test | 
| len | .all. | 1 | 0.5118773 | 5.1e-01 | 0.51 | ns | T-test | 
| len | .all. | 2 | 0.0000004 | 9.0e-07 | 4.3e-07 | **** | T-test | 
| # .all.参数比较其中一组与所有的组的均值(base-mean),在组比较多时,比较方便 | 

下面使用骨髓瘤的数据来说明和所有样本的均值(.all.)比较在一些典型的情况中是很有用的。
将依据患者molecular进行分组,绘制各个组别DEPDC1基因的表达水平。目的是比较各个组别之间是否存在差别,如果有差别,差别又在哪里。
要回答以上问题,可以在所有7个组之间进行两两比较(pairwise comparison)。 由于组别较多,将导致很多种组别组合,这将很难解释。
一个简单的解决办法是将7组中的每一组与“.all.”(如base-mean)比较。当检验结果显著时,可以得出结论:与所有组相比,xxx组的DEPDC1的表达水平显著降低或显著升高。
| # Load myeloma data from GitHub | 
| .y. | group1 | group2 | p | p.adj | p.format | p.signif | method | 
|---|---|---|---|---|---|---|---|
| DEPDC1 | .all. | Cyclin D-1 | 0.2877529 | 1.0e+00 | 0.29 | ns | T-test | 
| DEPDC1 | .all. | Cyclin D-2 | 0.4244240 | 1.0e+00 | 0.42 | ns | T-test | 
| DEPDC1 | .all. | MMSET | 0.5784193 | 1.0e+00 | 0.58 | ns | T-test | 
| DEPDC1 | .all. | MAF | 0.2538126 | 1.0e+00 | 0.25 | ns | T-test | 
| DEPDC1 | .all. | Hyperdiploid | 0.0000000 | 2.0e-07 | 2.7e-08 | **** | T-test | 
| DEPDC1 | .all. | Proliferation | 0.0000239 | 1.2e-04 | 2.4e-05 | **** | T-test | 
| DEPDC1 | .all. | Low bone disease | 0.0000053 | 3.2e-05 | 5.3e-06 | **** | T-test | 
| head(myeloma) | 
| X | molecular_group | chr1q21_status | treatment | event | time | CCND1 | CRIM1 | DEPDC1 | IRF4 | TP53 | WHSC1 | group | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GSM50986 | Cyclin D-1 | 3 copies | TT2 | 0 | 69.24 | 9908.4 | 420.9 | 523.5 | 16156.5 | 10.0 | 261.9 | cyclin.d.1 | 
| GSM50988 | Cyclin D-2 | 2 copies | TT2 | 0 | 66.43 | 16698.8 | 52.0 | 21.1 | 16946.2 | 1056.9 | 363.8 | cyclin.d.2 | 
| GSM50989 | MMSET | 2 copies | TT2 | 0 | 66.50 | 294.5 | 617.9 | 192.9 | 8903.9 | 1762.8 | 10042.9 | mmset | 
| GSM50990 | MMSET | 3 copies | TT2 | 1 | 42.67 | 241.9 | 11.9 | 184.7 | 11894.7 | 946.8 | 4931.0 | mmset | 
| GSM50991 | MAF | NA | TT2 | 0 | 65.00 | 472.6 | 38.8 | 212.0 | 7563.1 | 361.4 | 165.0 | maf | 
| GSM50992 | Hyperdiploid | 2 copies | TT2 | 0 | 65.20 | 664.1 | 16.9 | 341.6 | 16023.4 | 2096.3 | 569.2 | hyperdiploid | 
| # 可视化 | 

From the plot above, we can conclude that DEPDC1 is significantly overexpressed in proliferation group and, it’s significantly downexpressed in Hyperdiploid and Low bone disease compared to all.
5. 多个分组变量
- 在使用另外一个变量分组后,两个独立样本之间的比较
| compare_means(len ~ supp, data = ToothGrowth, | 
| dose | .y. | group1 | group2 | p | p.adj | p.format | p.signif | method | 
|---|---|---|---|---|---|---|---|---|
| 0.5 | len | OJ | VC | 0.0231864 | 0.046 | 0.023 | * | Wilcoxon | 
| 1.0 | len | OJ | VC | 0.0040304 | 0.012 | 0.004 | ** | Wilcoxon | 
| 2.0 | len | OJ | VC | 1.0000000 | 1.000 | 1.000 | ns | Wilcoxon | 
Visualize (1/2). Create a multi-panel box plots facetted by group (here, “dose”):
| # 多个小图facet | 

| # 或者显示显著性符号,而不是直接显示P值 | 

Visualize (2/2). Create one single panel with all box plots. Plot y = “len” by x = “dose” and color by “supp”:
| # 将所有小图组合放在一张图里 | 

| # 只显示P值 | 

| # 单纯的只显示P的数值 | 

| # 只显示显著性水平 | 

- 将数据按另一个变量分组后进行配对样本比较
| compare_means(len ~ supp, data = ToothGrowth, | 
| dose | .y. | group1 | group2 | p | p.adj | p.format | p.signif | method | 
|---|---|---|---|---|---|---|---|---|
| 0.5 | len | OJ | VC | 0.0329694 | 0.066 | 0.033 | * | Wilcoxon | 
| 1.0 | len | OJ | VC | 0.0136719 | 0.041 | 0.014 | * | Wilcoxon | 
| 2.0 | len | OJ | VC | 1.0000000 | 1.000 | 1.000 | ns | Wilcoxon | 
相比于非配对P检验,配对P检验的P值更大
| # Box plot facetted by "dose" | 

6. 其他类型的图表
6.1 不分组的图
- 条形图
| # 条形图 | 

- 折线图
| # Line plot of mean +/-se | 

6.2 分组的图
| # 分组条形图 | 

参考:




