Working with Venn Diagrams

In this post, we will learn how to create venn diagrams for gene lists and how to retrieve the genes present in each venn compartment with R.

Advertisements

In this post, we will learn how to create venn diagrams for gene lists and how to retrieve the genes present in each venn compartment with R.

In this particular example, we will generate random gene lists using the molbiotools gene set generator but you can use your own gene lists if you prefer. Specifically, we will generate a random list of 257 genes to represent those that are upregulated in condition and another list of 1570 genes to represent those that are upregulated in condition B.

Screen Shot 2016-06-21 at 2.01.15 PM

Then, we will sort and paste the gene lists in an excel document we will save as randomGeneLists.xlsx.

Now, let’s load the data into R using the gdata package.

library("gdata")
geneLists <- read.xls("randomGeneLists.xlsx", sheet=1, stringsAsFactors=FALSE, header=FALSE)
head(geneLists)

# Notice there are empty strings to complete the data frame in column 1 (V1)
tail(geneLists)

# To convert this data frame to separate gene lists with the empty strings removed we can use lapply() with our home made  function(x) x[x != ""]
geneLS <- lapply(as.list(geneLists), function(x) x[x != ""])

# If this is a bit confusing you can also write a function and then use it in lapply() 
removeEMPTYstrings <- function(x) {

 newVectorWOstrings <- x[x != ""]
 return(newVectorWOstrings)

}
geneLS2 <- lapply(as.list(geneLists), removeEMPTYstrings)

# You can print the last 6 entries of each vector stored in your list, as follows:
lapply(geneLS, tail)
lapply(geneLS2, tail) # Both methods return the same results

# We can rename our list vectors
names(geneLS) <- c("ConditionA", "ConditionB")

# Now we can plot a Venn diagram with the VennDiagram R package, as follows:
require("VennDiagram")

VENN.LIST <- geneLS
venn.plot <- venn.diagram(VENN.LIST , NULL, fill=c("darkmagenta", "darkblue"), alpha=c(0.5,0.5), cex = 2, cat.fontface=4, category.names=c("A", "B"), main="Random Gene Lists")

# To plot the venn diagram we will use the grid.draw() function to plot the venn diagram
grid.draw(venn.plot)

# To get the list of gene present in each Venn compartment we can use the gplots package
require("gplots")

a <- venn(VENN.LIST, show.plot=FALSE)

# You can inspect the contents of this object with the str() function
str(a)

# By inspecting the structure of the a object created, 
# you notice two attributes: 1) dimnames 2) intersections
# We can store the intersections in a new object named inters
inters <- attr(a,"intersections")

# We can summarize the contents of each venn compartment, as follows:
# in 1) ConditionA only, 2) ConditionB only, 3) ConditionA & ConditionB
lapply(inters, head) 

example2

Now you are ready, to review the genes in each section of the venn diagram separately. Alternatively, you can always use Venny web tool that is a great way to start looking at your data and then write a modified version of this R script to make a more exhaustive figure or facilitate downstream analysis in your script.

Feel free to leave comments or email me at info@rjbioinformatics.com.

Author: radiaj

I am a Research Scientist & Bioinformatician who specializes in Immunology and Cancer Biology. I routinely use R and other programing languages to explore genomic data of cancer cells to identify molecular changes and risk factors that contributes to cancer development.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s