Creating color palettes in R

In this R post, we will show how to create your own color palettes and how to work with other palettes, such as those from the RColorBrewer and wesanderson packages and hex codes from http://www.colorcombos.com.

Base R comes with several color palettes, such as rainbow(), heat.colors(), terrain.colors() and topo.colors(). We can visualize these as 3D pie charts using the plotrix R package.

# Let's create a pie chart with n=7 colors using each palette
library(plotrix)
sliceValues <- rep(10, 7) # each slice has value 10, so all 7 slices are equal in size
pie3D(sliceValues,explode=0, theta=1.2, col=rainbow(n=7), main="rainbow()")

# Let's create a figure with all 4 base color palettes
par(mfrow=c(1, 4))
pie3D(sliceValues,explode=0, theta=1.2, col=rainbow(n=7), main="rainbow()")
pie3D(sliceValues,explode=0, theta=1.2, col=heat.colors(n=7), main="heat.colors()")
pie3D(sliceValues,explode=0, theta=1.2, col=terrain.colors(n=7), main="terrain.colors()")
pie3D(sliceValues,explode=0, theta=1.2, col=topo.colors(n=7), main="topo.colors()")

[Figure: 3D pie charts of the rainbow(), heat.colors(), terrain.colors() and topo.colors() palettes]
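Under the hood, each of these palette functions simply returns a character vector of n hex color codes, which is what the col argument of pie3D() receives. The minimal sketch below uses only base R (grDevices and graphics) to print one palette and draw all four as strips of swatches; show_palette() is a hypothetical helper name, not a function from any package.

```r
# Each built-in palette function returns a character vector of n hex color codes
n <- 7
palettes <- list(
  "rainbow()"        = rainbow(n),
  "heat.colors()"    = heat.colors(n),
  "terrain.colors()" = terrain.colors(n),
  "topo.colors()"    = topo.colors(n)
)
print(palettes[["rainbow()"]])

# Hypothetical helper: draw a palette as a row of equal-width color swatches
show_palette <- function(pal, main = "") {
  barplot(rep(1, length(pal)), col = pal, border = NA, space = 0,
          axes = FALSE, main = main)
  invisible(pal)
}

# One panel per palette, mirroring the par(mfrow = c(1, 4)) layout above
op <- par(mfrow = c(1, 4))
for (nm in names(palettes)) show_palette(palettes[[nm]], main = nm)
par(op)
```

Swatch strips are often easier to compare than 3D pies, since slice angles and shading in pie3D() can distort how the colors look.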

Other popular sources of color palettes include the RColorBrewer package, which has a variety of sequential, diverging and qualitative palettes, and the wesanderson package, whose palettes are derived from Wes Anderson's films.

library(RColorBrewer)

# To see all palettes available in this package
par(mfrow=c(1, 1))
display.brewer.all()
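brewer.pal() only returns as many colors as a palette defines. The maximum for each palette is stored in the maxcolors column of brewer.pal.info (9 for the sequential palettes, 11 for the diverging ones), and asking for fewer than 3 colors returns 3 with a warning. When more shades are needed, one common workaround, sketched here as an option rather than the author's method, is to interpolate the full Brewer palette with base R's colorRampPalette().

```r
library(RColorBrewer)

# Palette metadata: maximum number of colors and category ("seq", "div" or "qual")
info <- brewer.pal.info
print(info[c("RdPu", "RdGy", "Set1"), ])

# brewer.pal() caps n at maxcolors, so interpolate the full palette for more shades
max_rdpu <- info["RdPu", "maxcolors"]
rdpu15 <- colorRampPalette(brewer.pal(max_rdpu, "RdPu"))(15)

# Draw the 15 interpolated shades as a strip of swatches
barplot(rep(1, 15), col = rdpu15, border = NA, space = 0, axes = FALSE,
        main = "RdPu interpolated to 15 colors")
```

Interpolation works well for sequential and diverging palettes; for qualitative palettes such as Set1 the in-between colors no longer represent distinct categories, so it is usually better to pick a palette with enough colors.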

# Create pie charts from a sequential, a diverging and a qualitative RColorBrewer palette
par(mfrow=c(1, 4))
pie3D(sliceValues,explode=0, theta=1.2, col=brewer.pal(7, "RdPu"), main="Sequential RdPu")
pie3D(sliceValues,explode=0, theta=1.2, col=brewer.pal(7, "RdGy"), main="Divergent RdGy")
pie3D(sliceValues,explode=0, theta=1.2, col=brewer.pal(7, "Set1"), main="Qualitative Set1")


# And add a pie chart with a wesanderson palette
# we only use 5 slices here since the Darjeeling2 palette only has 5 colors
library(wesanderson)
pie3D(sliceValues[1:5],explode=0, theta=1.2, col=wes_palette("Darjeeling2"), main="Darjeeling2")
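Instead of dropping slices, wes_palette() can also interpolate: with type = "continuous" it returns any number of colors, whereas asking for more than 5 discrete Darjeeling2 colors stops with an error. The sketch below colors all 7 slices this way, resetting the layout to a single panel first.

```r
library(wesanderson)
library(plotrix)

sliceValues <- rep(10, 7)

# type = "continuous" interpolates the 5 Darjeeling2 colors up to n = 7
darj7 <- wes_palette("Darjeeling2", 7, type = "continuous")

# as.character() drops the "palette" class so pie3D() gets a plain color vector
op <- par(mfrow = c(1, 1))
pie3D(sliceValues, explode = 0, theta = 1.2, col = as.character(darj7),
      main = "Darjeeling2, continuous, n = 7")
par(op)
```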

[Figure: 3D pie charts of the RdPu, RdGy, Set1 and Darjeeling2 palettes]

You can also create your own color palettes in R from colors of your choice, either by selecting them from the colors() function or by creating a vector of color names. A great review and cheat sheet for R colors can be found at http://research.stowers-institute.org/efg/R/Color/Chart/.

# To get an idea of the colors available
head(colors())
length(colors()) # 657

# To see all 657 colors as a color chart, you can source the R script from the chart page above to generate a PDF version in your working directory

[Figure: excerpt of the R color chart]

# We can choose a palette based on the R color chart as follows:
mycols <- colors()[c(8, 5, 30, 53, 118, 72)] # select colors by their index in colors()
# or you could enter the color names directly
# mycols <- c("aquamarine", "antiquewhite2", "blue4", "chocolate1", "deeppink2", "cyan4")

# You could also get and store all distinct colors in the cl object and use the sample function to select colors at random
cl <- colors(distinct = TRUE)
set.seed(15887) # to set random generator seed
mycols2 <- sample(cl, 7)
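Color names and hex codes are interchangeable in R graphics, so it can be handy to look up the hex code behind a name, for example to reuse the mycols choice in another tool. The sketch below combines col2rgb() and rgb() for this; col2hex() is a hypothetical helper name. It also confirms that resetting the same seed makes sample() return the same random colors.

```r
# Hypothetical helper: convert any R color specification to a "#RRGGBB" hex code
col2hex <- function(cols) {
  rgb_mat <- col2rgb(cols)                 # 3 x n matrix of red/green/blue (0-255)
  rgb(t(rgb_mat), maxColorValue = 255)     # one hex string per color
}

mycols <- c("aquamarine", "antiquewhite2", "blue4", "chocolate1", "deeppink2", "cyan4")
mycols_hex <- setNames(col2hex(mycols), mycols)
print(mycols_hex)

# Sampling is reproducible: the same seed returns the same colors
cl <- colors(distinct = TRUE)
set.seed(15887)
first_draw <- sample(cl, 7)
set.seed(15887)
second_draw <- sample(cl, 7)
identical(first_draw, second_draw)
```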

You can also create color palettes with hex color codes. A great source of these is the collection of popular color palettes on the http://www.colorcombos.com website, which offers many palettes to choose from and can even derive a color palette from your favorite websites. For example, let's grab the color palette used on the rjbioinformatics.com website with the tool at http://www.colorcombos.com/grabcolors.html.

[Figure: the colorcombos.com color grabber page]

After entering the URL of our website, we will receive the hex codes for the color scheme used on the website.

[Figure: hex codes of the color scheme grabbed from rjbioinformatics.com]

We can even export the colors as little pencils 🙂

[Figure: pencil swatches of the colors C6D4E1, 2F2016, FCFAEA and 456789]

You can also choose from hundreds of color schemes based on a color of your choice. For example, we will also create a color palette based on the color olive, ColorCombo382.

[Figure: ColorCombo382 swatches, colors C3D938, 772877, 7C821E, D8B98B and 7A4012]

# For the rjbioinformatics.com color palette
mycols3 <- c("#c6d4e1", "#2f2016", "#fcfaea", "#456789")

# For the ColorCombo382 palette
mycols4 <- c("#C3D938", "#772877", "#7C821E", "#D8B98B", "#7A4012")
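Hex codes can also be inspected and tweaked in base R. In the sketch below, col2rgb() decodes each ColorCombo382 code into its red, green and blue components, adjustcolor() returns the same colors at 50% opacity (as an 8-digit #RRGGBBAA code), which can help when colored points or polygons overlap, and a regular expression checks that each code is a well-formed 6-digit hex string.

```r
mycols4 <- c("#C3D938", "#772877", "#7C821E", "#D8B98B", "#7A4012")

# Decode hex codes into a 3 x 5 matrix of red/green/blue values (0-255)
rgb_vals <- col2rgb(mycols4)
print(rgb_vals)

# Same colors at 50% opacity; adjustcolor() appends an alpha byte (#RRGGBBAA)
mycols4_transparent <- adjustcolor(mycols4, alpha.f = 0.5)
print(mycols4_transparent)

# Quick check that every string is a well-formed 6-digit hex code
is_hex6 <- grepl("^#[0-9A-Fa-f]{6}$", mycols4)
all(is_hex6)
```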

# Now to get the pie charts for the last four palettes
par(mfrow=c(1, 4))
pie3D(sliceValues[1:6],explode=0, theta=1.2, col=mycols, main="colors()") # mycols has 6 colors
pie3D(sliceValues,explode=0, theta=1.2, col=mycols2, main="sample(colors(distinct=TRUE))")
pie3D(sliceValues[1:4],explode=0, theta=1.2, col=mycols3, main="rjbioinformatics.com color grab")
pie3D(sliceValues[1:5],explode=0, theta=1.2, col=mycols4, main="ColorCombos382 colorcombos.com")

[Figure: 3D pie charts of the colors(), sample(colors(distinct=TRUE)), rjbioinformatics.com and ColorCombo382 palettes]

We can also create a color palette with colorRampPalette() to use for heatmaps and other plots. For this example, we will use the leukemia dataset available in the GSVAdata package, which corresponds to microarray data from 37 human acute leukemias: 20 of these cases are acute lymphoblastic leukemia (ALL) and the other 17 are ALLs with mixed-lineage leukemia (MLL) gene rearrangements. For more information on the study, please see Armstrong et al. Nat Genet 30:41-47, 2002.
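Note that colorRampPalette() does not return colors directly: it returns a function that interpolates between the anchor colors you give it, and calling that function with n returns n hex codes. The minimal sketch below builds a blue-white-red diverging ramp, a common choice for centered expression values, and draws it as a single strip.

```r
# colorRampPalette() returns a function; calling it with n returns n interpolated colors
blue_white_red <- colorRampPalette(c("blue", "white", "red"))
ramp11 <- blue_white_red(11)
print(ramp11)

# Show the ramp as a single strip with image(), one cell per color
op <- par(mfrow = c(1, 1))
image(matrix(seq_along(ramp11), ncol = 1), col = ramp11, axes = FALSE,
      main = "colorRampPalette(c('blue', 'white', 'red'))(11)")
par(op)
```

With an odd number of colors, the middle color lands exactly on the middle anchor (white here), which keeps the neutral color centered.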

library(GSVAdata)
data(leukemia) # loads leukemia_eset

# Create a matrix from the gene expression eset object (exprs() comes from Biobase)
library(Biobase)
M1 <- exprs(leukemia_eset)

# Get a matrix of the top 50 most variable probes across the samples
library(genefilter)
topVarGenes <- head(order(-rowVars(M1)), 50)
mat <- M1[ topVarGenes, ]
mat <- mat - rowMeans(mat)

# For sample annotation information
head(pData(leukemia_eset))
table(leukemia_eset$subtype)

# Get the sample group (subtype) as a factor to index the ColSideColors
ALLgroup <- as.factor(pData(leukemia_eset)[colnames(M1), "subtype"])

# Colors for the two subtypes (ALL and MLL), in the order of levels(ALLgroup)
sidecols <- c("#4FD5D6", "#FF0000")

# Here is a fancy color palette inspired by http://www.colbyimaging.com/wiki/statistics/color-bars
cool = rainbow(50, start=rgb2hsv(col2rgb('cyan'))[1], end=rgb2hsv(col2rgb('blue'))[1])
warm = rainbow(50, start=rgb2hsv(col2rgb('red'))[1], end=rgb2hsv(col2rgb('yellow'))[1])
cols = c(rev(cool), rev(warm))
mypalette <- colorRampPalette(cols)(255)

library("gplots") # for the heatmap.2 function
par(mfrow=c(1,1))

png(filename="Heatmap_Example.png", width=12, height=10, units = 'in', res = 300)
heatmap.2(mat, trace="none", col=mypalette, ColSideColors=sidecols[ALLgroup],
          labRow=FALSE, labCol=FALSE, margins=c(6,12), scale="row", key.title="")
legend("topright", legend=levels(ALLgroup), fill=sidecols, title="", cex=1.2)
graphics.off()

[Figure: Heatmap_Example.png]
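To try the same steps without downloading the Bioconductor packages, here is a base-R sketch on simulated data. It rests on stated assumptions: the matrix is random numbers with an added group effect (not the leukemia data), the variance is computed with apply() instead of genefilter::rowVars(), and stats::heatmap() stands in for gplots::heatmap.2(). The cool-to-warm palette is built the same way as above.

```r
set.seed(1)

# Simulated expression matrix: 200 "probes" x 12 "samples" (random, not real data)
M_sim <- matrix(rnorm(200 * 12), nrow = 200,
                dimnames = list(paste0("probe", 1:200), paste0("s", 1:12)))
group <- factor(rep(c("ALL", "MLL"), each = 6))
M_sim[1:20, group == "MLL"] <- M_sim[1:20, group == "MLL"] + 2  # add a group effect

# Top 50 most variable rows, then center each row on its mean
top_var <- head(order(apply(M_sim, 1, var), decreasing = TRUE), 50)
mat_sim <- M_sim[top_var, ]
mat_sim <- mat_sim - rowMeans(mat_sim)

# Cool-to-warm palette built the same way as in the post
cool <- rainbow(50, start = rgb2hsv(col2rgb("cyan"))[1], end = rgb2hsv(col2rgb("blue"))[1])
warm <- rainbow(50, start = rgb2hsv(col2rgb("red"))[1], end = rgb2hsv(col2rgb("yellow"))[1])
mypalette <- colorRampPalette(c(rev(cool), rev(warm)))(255)

# Sample group colors are drawn as a bar along the top of the heatmap
sidecols <- c("#4FD5D6", "#FF0000")
heatmap(mat_sim, col = mypalette, scale = "row",
        ColSideColors = sidecols[group])
```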

Now you are all set to work with and create your own awesome color palettes! Happy R programming 🙂

 

Converting Gene Names in R with AnnotationDbi

There are many ways to convert gene accession numbers or ids to gene symbols or other types of ids in R, and several R/Bioconductor packages facilitate this process, including the AnnotationDbi, annotate and biomaRt packages. In this post, we are going to learn how to convert gene ids with the AnnotationDbi and org.Hs.eg.db packages. You could modify this code to work with other species, such as mouse, with the org.Mm.eg.db package.

For example, say we have a gene expression matrix stored in M1, created from an eset object downloaded from GEO, whose row names are Entrez gene ids. The study used in this example is "A Leukemic Stem Cell Expression Signature is Associated with Clinical Outcomes in Acute Myeloid Leukemia", deposited on GEO with the accession id GSE24006. To view the script that generates the expression set (eset) object, see the post "Retrieving Gene Expression Data Objects & Matrices From GEO".

# Convert your eset object to a matrix with the exprs() function
library(Biobase)
M1 <- exprs(eset)

# Map the row names (Entrez ids) to gene symbols
library("AnnotationDbi")
library("org.Hs.eg.db")
columns(org.Hs.eg.db) # lists the id types available for the keytype and column arguments

geneSymbols <- mapIds(org.Hs.eg.db, keys=rownames(M1), column="SYMBOL", keytype="ENTREZID", multiVals="first")
head(geneSymbols)

The mapIds() function from the AnnotationDbi package returns a vector named by the input keys, which makes it simple to retrieve the gene symbol for a given Entrez id as follows:

gene.to.search <- c("658", "1360")
geneSymbols[gene.to.search]

# returns the gene symbols of these Entrez ids:
# "BMPR1B" "CPB1"

We can create a function to return a matrix with gene symbols instead of entrez ids as follows:

getMatrixWithSymbols <- function(df){
  require("AnnotationDbi")
  require("org.Hs.eg.db")

  # map the row names (Entrez ids) to gene symbols
  geneSymbols <- mapIds(org.Hs.eg.db, keys=rownames(df), column="SYMBOL", keytype="ENTREZID", multiVals="first")

  # keep the Entrez ids that have a gene symbol, i.e. remove those mapped to NA
  inds <- which(!is.na(geneSymbols))
  found_genes <- geneSymbols[inds]

  # subset the matrix to the found genes (drop=FALSE keeps a matrix even for a single row)
  df2 <- df[names(found_genes), , drop=FALSE]
  rownames(df2) <- found_genes
  return(df2)
}

# Now, let's use the function to create a matrix for the genes with gene symbols
M1symb <- getMatrixWithSymbols(M1)
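Because org.Hs.eg.db is a large download, the filtering logic of getMatrixWithSymbols() can also be checked on its own with a stand-in. In the sketch below, lookup is a hypothetical two-entry table (using the 658 to BMPR1B and 1360 to CPB1 pairs shown above), and mapIdsMock() is a hypothetical function that only imitates the shape of mapIds() output: a vector named by the input keys, with NA for keys that have no match. It is not part of AnnotationDbi.

```r
# Hypothetical stand-in for mapIds(org.Hs.eg.db, ..., column = "SYMBOL", keytype = "ENTREZID")
lookup <- c("658" = "BMPR1B", "1360" = "CPB1")
mapIdsMock <- function(keys) setNames(unname(lookup[keys]), keys)

# Toy matrix whose row names are Entrez ids; "99999999" is a made-up id with no symbol here
M_toy <- matrix(1:6, nrow = 3,
                dimnames = list(c("658", "1360", "99999999"), c("s1", "s2")))

geneSymbols <- mapIdsMock(rownames(M_toy))
print(geneSymbols)   # named by the input ids; the unmapped id gives NA

# Same steps as getMatrixWithSymbols(): drop NAs, subset rows, rename them
found_genes <- geneSymbols[!is.na(geneSymbols)]
M_toy_symb <- M_toy[names(found_genes), , drop = FALSE]
rownames(M_toy_symb) <- found_genes
print(M_toy_symb)
```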

We can generalize this function to go back and forth between gene symbols and entrez ids (or other ids) as follows:

# This function accepts any of the columns(org.Hs.eg.db) as type (the id type to return) and keys (the id type of the row names), as long as the row names are ids of the keys type
getMatrixWithSelectedIds <- function(df, type, keys){
  require("AnnotationDbi")
  require("org.Hs.eg.db")

  selectedIds <- mapIds(org.Hs.eg.db, keys=rownames(df), column=type, keytype=keys, multiVals="first")

  # keep only the ids that were mapped, i.e. remove those mapped to NA
  inds <- which(!is.na(selectedIds))
  found_genes <- selectedIds[inds]

  # subset the matrix to the found genes and relabel its rows
  df2 <- df[names(found_genes), , drop=FALSE]
  rownames(df2) <- found_genes
  return(df2)
}

# for example, going from SYMBOL to ENTREZID
M1entrez <- getMatrixWithSelectedIds(M1symb, type="ENTREZID", keys="SYMBOL")
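One caveat when converting row names: several rows can end up with the same new id (for example, when several probes or ids map to the same symbol), and the matrix will then carry duplicated row names. The base-R sketch below uses a hypothetical id_table in place of org.Hs.eg.db and a hypothetical convertRownames() helper that relabels rows with match(), which keeps the first hit, much like multiVals="first". It then collapses duplicated symbols by averaging their rows, which is one common choice among several (keeping the most variable row is another).

```r
# Hypothetical id table standing in for org.Hs.eg.db; "2000001" is a made-up id
id_table <- data.frame(
  ENTREZID = c("658", "1360", "2000001"),
  SYMBOL   = c("BMPR1B", "CPB1", "CPB1"),   # two ids share a symbol on purpose
  stringsAsFactors = FALSE
)

# Hypothetical helper: relabel rows from one id type to another, dropping unmapped rows
convertRownames <- function(m, from, to, table) {
  new_ids <- table[[to]][match(rownames(m), table[[from]])]
  keep <- !is.na(new_ids)
  m2 <- m[keep, , drop = FALSE]
  rownames(m2) <- new_ids[keep]
  m2
}

M_toy <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
                dimnames = list(c("658", "1360", "2000001"), c("s1", "s2")))

M_symb <- convertRownames(M_toy, from = "ENTREZID", to = "SYMBOL", table = id_table)
print(M_symb)  # "CPB1" appears twice

# Collapse duplicated symbols by averaging their rows (one row per unique symbol)
groups <- split(seq_len(nrow(M_symb)), rownames(M_symb))
M_symb_unique <- t(vapply(groups, function(i) colMeans(M_symb[i, , drop = FALSE]),
                          numeric(ncol(M_symb))))
print(M_symb_unique)

# And back again: match() keeps the first Entrez id listed for each symbol
M_back <- convertRownames(M_symb_unique, from = "SYMBOL", to = "ENTREZID", table = id_table)
print(M_back)
```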

Stay tuned for more posts on converting gene names in R with the annotate and biomaRt packages.