I am trying to generate a four-way Venn diagram with draw.quad.venn from the VennDiagram package in R, but it keeps throwing this error:
ERROR [2019-05-14 11:28:24] Impossible: a7 <- n234 - a6 produces negative area
Error in draw.quad.venn(length(gene_lists[[1]]), length(gene_lists[[2]]), :
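Impossible: a7 <- n234 - a6 produces negative area
I am using 4 different gene lists as input. calculate.overlap runs fine. I then get the counts by applying length(x) to each overlap and collect them into a list. I pass all the overlap values, together with the total size of each group, to draw.quad.venn, but it keeps claiming that one of the groups is impossible because it would produce a negative number.
I have checked the numbers by hand and they clearly add up. I also tested the script on a random set of 20000 genes, generated with a script similar to the one below, and it worked fine, i.e. it produced a four-way Venn diagram. Apart from their size, there is no difference between the randomly generated gene lists and the ones I compiled from my real result files. Here is a minimal example: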
# minimal example that fails
library(VennDiagram)
# get vector of 10000 elements (representative of gene list)
values <- c(1:10000)
# generate 4 subsets by random sampling
list_1 <- sample(values, size = 5000, replace = FALSE)
list_2 <- sample(values, size = 4000, replace = FALSE)
list_3 <- sample(values, size = 3000, replace = FALSE)
list_4 <- sample(values, size = 2000, replace = FALSE)
# compile them in to a list
lists <- list(list_1, list_2, list_3, list_4)
# find overlap between all possible combinations (11 plus 4 unique to each list = 15 total)
overlap <- calculate.overlap(lists)
# get the lengths of each list - these will be the numbers used for the Venn diagram
overlap_values <- lapply(overlap, function(x) length(x))
# rename overlap values (easier to identify which groups are intersecting)
names(overlap_values) <- c("n1234", "n123", "n124", "n134", "n234", "n12", "n13", "n14", "n23", "n24", "n34", "n1", "n2", "n3", "n4")
# generate the venn diagram
draw.quad.venn(length(lists[[1]]), length(lists[[2]]), length(lists[[3]]), length(lists[[4]]), overlap_values$n12,
overlap_values$n13, overlap_values$n14, overlap_values$n23, overlap_values$n24, overlap_values$n34,
overlap_values$n123, overlap_values$n124, overlap_values$n134, overlap_values$n234, overlap_values$n1234)
I expected a four-way Venn diagram: even if some groups are 0, they should still be drawn, just labeled 0. It should look like this:

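I'm not sure whether this is because my real data contains 0 values, i.e. some groups with no overlap. Is there a way to force draw.quad.venn() to accept any values? If not, is there another package I could use to get the same result? Thanks a lot for your help!
Posted on 2019-05-14 21:26:16
Nothing I tried resolved this error with draw.quad.venn from the VennDiagram package; something is wrong with the way it is written. A Venn diagram is valid as long as the numbers inside each of the 4 ellipses add up to the total number of elements in that particular list. For some reason, VennDiagram only accepts data in which intersections of fewer groups have higher numbers, e.g. the intersection of groups 1, 2 and 3 must be higher than the intersection of all 4 groups. That does not represent real-world data: it is entirely possible for groups 1, 2 and 3 not to intersect at all while all 4 groups do. In a Venn diagram all the numbers are independent and represent the total number of elements shared by each intersection; they do not need to have any relationship with each other.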
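Whether a "groups 1, 2 and 3" count can be smaller than the four-way count depends on which count is meant. An inclusive count (elements in *at least* groups 1, 2 and 3) can never be smaller than the four-way count, because every element in all four groups is also in groups 1, 2 and 3. An exclusive count (elements in groups 1, 2 and 3 *but not* group 4) can be 0 while the four-way count is not. The base-R sketch below illustrates both; the toy sets and the variable names are hypothetical.

```r
# Hypothetical toy sets: element 4 is in all four sets,
# and nothing is shared by exactly sets 1, 2 and 3.
s1 <- c(1, 2, 4)
s2 <- c(2, 3, 4)
s3 <- c(3, 4, 5)
s4 <- c(4, 5, 6)

# Inclusive count: elements in at least sets 1, 2 and 3
inclusive_123 <- length(Reduce(intersect, list(s1, s2, s3)))

# Four-way count: elements in all four sets
n_1234 <- length(Reduce(intersect, list(s1, s2, s3, s4)))

# Exclusive count: elements in sets 1, 2 and 3 but not in set 4
exclusive_123 <- length(setdiff(Reduce(intersect, list(s1, s2, s3)), s4))

cat("inclusive 1-2-3:", inclusive_123, "\n")
cat("exclusive 1-2-3:", exclusive_123, "\n")
cat("all four:", n_1234, "\n")
```

Here the exclusive 1-2-3 region is empty while the four-way region is not, yet the inclusive 1-2-3 count is still at least as large as the four-way count.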
I looked at the eulerr package, but actually found a very simple way to draw a Venn diagram with venn from gplots, as follows:
# simple 4 way Venn diagram using gplots
library(gplots)
# get some mock data
values <- c(1:20000)
list_1 <- sample(values, size = 5000, replace = FALSE)
list_2 <- sample(values, size = 4000, replace = FALSE)
list_3 <- sample(values, size = 3000, replace = FALSE)
list_4 <- sample(values, size = 2000, replace = FALSE)
lists <- list(list_1, list_2, list_3, list_4)
# name the list (required for gplots)
names(lists) <- c("G1", "G2", "G3", "G4")
# get the venn table
v.table <- venn(lists)
# show venn table
print(v.table)
# plot Venn diagram
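plot(v.table)
I now consider this problem solved. Thanks for your help, zx8754!
Posted on 2019-05-17 01:30:28
I have looked at the package's source code. In case you are still interested in the cause of the error: there are two ways to pass data to venn.diagram. One is the nxxxx form (e.g. n134), the other is the an form (e.g. a5). In the example, n134 means "the elements that belong at least to groups 1, 3 and 4". a5, on the other hand, means "the elements that belong only to groups 1, 3 and 4". The relationship between the two forms is rather intricate; for example, a6 corresponds to n1234, which means that n134 = a5 + a6. The problem is that calculate.overlap returns the numbers in the an form, whereas draw.quad.venn by default expects them in the nxxxx form. To use the values from calculate.overlap, you can set direct.area to TRUE and pass the results of calculate.overlap in the area.vector argument. For example,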
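The table returned by venn lists, for each combination of set memberships, how many elements fall into exactly that combination. A rough base-R equivalent is sketched below as a hypothetical helper, `membership_table()`, which is not the gplots implementation. It also lets you check the validity condition stated in this answer: the regions inside each set must add up to that set's size.

```r
# Hypothetical helper: count elements per exact membership pattern,
# similar in spirit to the table printed by gplots::venn.
membership_table <- function(sets) {
  universe <- unique(unlist(sets))
  # Logical matrix: one row per element, one column per set
  member <- do.call(cbind, lapply(sets, function(s) universe %in% s))
  # Encode each row as a 0/1 pattern such as "1011"
  pattern <- apply(member * 1L, 1, paste, collapse = "")
  list(member = member, counts = table(pattern))
}

set.seed(1)
values <- 1:20000
lists <- list(G1 = sample(values, 5000), G2 = sample(values, 4000),
              G3 = sample(values, 3000), G4 = sample(values, 2000))
mt <- membership_table(lists)
print(mt$counts)

# Validity check: for every set, the exclusive regions it contains
# (patterns with a "1" in that set's position) add up to its size.
region_sums <- sapply(seq_along(lists), function(i) {
  inside <- substr(names(mt$counts), i, i) == "1"
  sum(mt$counts[inside])
})
stopifnot(all(region_sums == lengths(lists)))
```

The check assumes each list has no duplicated elements, which holds here because sample() draws without replacement by default.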
library(VennDiagram)
tmp <- calculate.overlap(list(a=c(1, 2, 3, 4, 10), b=c(3, 4, 5, 6), c=c(4, 6, 7, 8, 9), d=c(4, 8, 1, 9)))
overlap_values <- lapply(tmp, function(x) length(x))
draw.quad.venn(area.vector = c(overlap_values$a1, overlap_values$a2, overlap_values$a3, overlap_values$a4,
overlap_values$a5, overlap_values$a6, overlap_values$a7, overlap_values$a8,
overlap_values$a9, overlap_values$a10, overlap_values$a11, overlap_values$a12,
overlap_values$a13, overlap_values$a14, overlap_values$a15), direct.area = TRUE, category = c('a', 'b', 'c', 'd'))
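Alternatively, you can keep draw.quad.venn's default nxxxx form and compute each inclusive count directly from the lists instead of from calculate.overlap. A minimal sketch, assuming the lists contain no duplicated elements; the helper `n_count()` is hypothetical, and the drawing step only runs if VennDiagram is installed.

```r
# Toy input: the same four lists as in the example above
lists <- list(a = c(1, 2, 3, 4, 10), b = c(3, 4, 5, 6),
              c = c(4, 6, 7, 8, 9), d = c(4, 8, 1, 9))

# Hypothetical helper: number of elements that belong *at least*
# to the sets with the given indices (the nxxxx form)
n_count <- function(...) length(Reduce(intersect, lists[c(...)]))

n_form <- c(
  area1 = n_count(1), area2 = n_count(2), area3 = n_count(3), area4 = n_count(4),
  n12 = n_count(1, 2), n13 = n_count(1, 3), n14 = n_count(1, 4),
  n23 = n_count(2, 3), n24 = n_count(2, 4), n34 = n_count(3, 4),
  n123 = n_count(1, 2, 3), n124 = n_count(1, 2, 4),
  n134 = n_count(1, 3, 4), n234 = n_count(2, 3, 4),
  n1234 = n_count(1, 2, 3, 4)
)
print(n_form)

if (requireNamespace("VennDiagram", quietly = TRUE)) {
  # Pass the inclusive counts by name, in draw.quad.venn's default form
  do.call(VennDiagram::draw.quad.venn,
          c(as.list(n_form), list(category = names(lists))))
}
```

Because every count here is inclusive, each triple count is automatically at least the four-way count, which is the ordering draw.quad.venn checks for.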

If you are interested in something simpler and more flexible, I made the nVennR package for this kind of problem:
library(nVennR)
g1 <- c('AF029684', 'M28825', 'M32074', 'NM_000139', 'NM_000173', 'NM_000208', 'NM_000316', 'NM_000318', 'NM_000450', 'NM_000539', 'NM_000587', 'NM_000593', 'NM_000638', 'NM_000655', 'NM_000789', 'NM_000873', 'NM_000955', 'NM_000956', 'NM_000958', 'NM_000959', 'NM_001060', 'NM_001078', 'NM_001495', 'NM_001627', 'NM_001710', 'NM_001716')
g2 <- c('NM_001728', 'NM_001835', 'NM_001877', 'NM_001954', 'NM_001992', 'NM_002001', 'NM_002160', 'NM_002162', 'NM_002258', 'NM_002262', 'NM_002303', 'NM_002332', 'NM_002346', 'NM_002347', 'NM_002349', 'NM_002432', 'NM_002644', 'NM_002659', 'NM_002997', 'NM_003032', 'NM_003246', 'NM_003247', 'NM_003248', 'NM_003259', 'NM_003332', 'NM_003383', 'NM_003734', 'NM_003830', 'NM_003890', 'NM_004106', 'AF029684', 'M28825', 'M32074', 'NM_000139', 'NM_000173', 'NM_000208', 'NM_000316', 'NM_000318', 'NM_000450', 'NM_000539')
g3 <- c('NM_000655', 'NM_000789', 'NM_004107', 'NM_004119', 'NM_004332', 'NM_004334', 'NM_004335', 'NM_004441', 'NM_004444', 'NM_004488', 'NM_004828', 'NM_005214', 'NM_005242', 'NM_005475', 'NM_005561', 'NM_005565', 'AF029684', 'M28825', 'M32074', 'NM_005567', 'NM_003734', 'NM_003830', 'NM_003890', 'NM_004106', 'AF029684', 'NM_005582', 'NM_005711', 'NM_005816', 'NM_005849', 'NM_005959', 'NM_006138', 'NM_006288', 'NM_006378', 'NM_006500', 'NM_006770', 'NM_012070', 'NM_012329', 'NM_013269', 'NM_016155', 'NM_018965', 'NM_021950', 'S69200', 'U01351', 'U08839', 'U59302')
g4 <- c('NM_001728', 'NM_001835', 'NM_001877', 'NM_001954', 'NM_005214', 'NM_005242', 'NM_005475', 'NM_005561', 'NM_005565', 'ex1', 'ex2', 'NM_003890', 'NM_004106', 'AF029684', 'M28825', 'M32074', 'NM_000139', 'NM_000173', 'NM_000208', 'NM_000316', 'NM_000318', 'NM_000450', 'NM_000539')
myV <- plotVenn(list(g1=g1, g2=g2, g3=g3, g4=g4))
myV <- plotVenn(nVennObj = myV)
myV <- plotVenn(nVennObj = myV)
The last command is repeated on purpose. The result is:

Then you can explore the intersections:
> getVennRegion(myV, c('g1', 'g2', 'g4'))
[1] "NM_000139" "NM_000173" "NM_000208" "NM_000316" "NM_000318" "NM_000450" "NM_000539"
There is a vignette with more information.
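The getVennRegion output for g1, g2 and g4 lists the elements that are in those three groups and not in g3 (AF029684, M28825 and M32074 are in all four lists, so they are left out). If nVennR is not available, the same kind of exclusive-region query can be approximated in base R. The helper `region_members()` and the toy sets below are hypothetical and not part of nVennR.

```r
# Hypothetical helper: elements that are in every set named in `inside`
# and in none of the remaining sets (an exclusive Venn region).
region_members <- function(sets, inside) {
  in_all <- Reduce(intersect, sets[inside])
  outside <- setdiff(names(sets), inside)
  setdiff(in_all, unlist(sets[outside]))
}

# Hypothetical toy sets
sets <- list(g1 = c("x", "y", "z"), g2 = c("x", "y", "w"),
             g3 = c("y", "v"), g4 = c("x", "y", "u"))
region_members(sets, c("g1", "g2", "g4"))  # returns "x" ("y" is also in g3)
```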
https://stackoverflow.com/questions/56128614