Fundamentals of R Programming and Statistical Analysis Video Course

Here is the link to my new course at PACKT publishing. I apologize in advance for some of their video editing choices but you will definitely learn a lot and be able to work through a variety of practical examples to meet your bioinformatic needs. I will upload the R code on GitHub and post the links to the files for all the videos in the course section of my website rjbioinformatics.com. So be sure to stay tuned!

Advertisements

Here is the link to my new course at PACKT publishing. I apologize in advance for some of their video editing choices but you will definitely learn a lot and be able to work through a variety of practical examples to meet your bioinformatic needs. I will upload the R code on GitHub and post the links to the files for all the videos in the course section of my website https://rjbioinformatics.com/video-course/. So be sure to stay tuned!

Here is an overview of the course available at:

https://www.packtpub.com/big-data-and-business-intelligence/fundamentals-r-programming-and-statistical-analysis-video.

Video Description

The R language is widely used among statisticians and data miners to develop statistical software and data analysis.

In this course, we’ll start by diving into the different types of R data structures and you’ll learn how the R programming language handles data. Then we’ll look in-depth at manipulating different datasets in R. After that, we’ll dive into data visualization with R, using basic plots, heat maps, and networks. We’ll explore the different flow control loops of the R programming language, and you’ll learn how to debug your code.

In the second half of the course, you’ll get hands-on working with the various statistical methods in R programming. You’ll find out how to work with different probability distributions, various types of hypothesis testing, and statistical analysis with the R programming language.

By the end of this video course, you will be well-versed in the basics of R programming and the various concepts of statistical data analysis with R.

Style and Approach

This fast-paced, practical guide is filled with real-world examples that will take you on a journey through the various concepts and phases of statistical analysis using the R programming language.

Happy R programming :0)

Radia

Fundamentals of R Programming and Statistical Analysis Video Course link

The R code is available on GitHub at https://github.com/radiaj/fundaRprogStatistics.

Simulating genes and counts for DESeq2 analysis

Sometimes it is helpful to simulate gene expression data to test code or to see how your results look with simulated values from a particular probability distribution. Here I am going to show you how to simulate RNAseq expression data counts from a uniform distribution with a mininum = 0 and maximum = 1200.

Sometimes it is helpful to simulate gene expression data to test code or to see how your results look with simulated values from a particular probability distribution. Here I am going to show you how to simulate RNAseq expression data counts from a uniform distribution with a mininum = 0 and maximum = 1200.

# Get all human gene symbols from biomaRt
library("biomaRt")
mart <- useMart(biomart="ensembl", dataset = "hsapiens_gene_ensembl")
my_results <- getBM(attributes = c("hgnc_symbol"), mart=mart)
head(my_results)

# Simulate 100 gene names to be used for our cnts matrix
set.seed(32268)
my_genes <- with(my_results, sample(hgnc_symbol, size=100, replace=FALSE))
head(my_genes)

# Simulate a cnts matrix
cnts = matrix(runif(600, min=0, max=1200), ncol=6)
cnts = apply(cnts, c(1,2), as.integer)
head(cnts)
dim(cnts)

 

Now, say we run DESeq2 to look for differentially expressed genes between our two simulated groups.

# Running DESEQ2 based on https://bioconductor.org/packages/release/bioc/vignettes/gage/inst/doc/RNA-seqWorkflow.pdf
library("DESeq2")
grp.idx <- rep(c("KO", "WT"), each=3)
coldat=DataFrame(grp=factor(grp.idx, levels=c("WT", "KO")))

# Add the column names and gene names
colnames(cnts) <- paste(grp.idx, 1:6, sep="_")
rownames(cnts) <- my_genes
head(cnts)

# Run DESeq2 analysis on the simulated counts
dds <- DESeqDataSetFromMatrix(cnts, colData=coldat, design = ~ grp)
dds <- DESeq(dds)
deseq2.res <- results(dds)
deseq2.fc=deseq2.res$log2FoldChange
names(deseq2.fc)=rownames(deseq2.res)
exp.fc=deseq2.fc

head(exp.fc)
#  SDAD1 SVOPL SRGAP2C MTND1P2 CNN2P8 IL13
# -0.48840808 0.32122109 -0.55584857 0.00184246 -0.15371042 0.11555792 

Now let’s see how many simulated genes had a log2 fold change greater than 1 by chance.


# Load the fold changes from DESeq2 analysis and order in decreasing order
geneList = sort(exp.fc, decreasing = TRUE) # log FC is shown
head(geneList)

gene <- geneList[abs(geneList) >= 1]
head(gene)

# C1orf216
#-1.129836

Now it’s your turn!  What other probability distributions could we simulate data from to perform a mock RNA seq experiment to determine how many genes could be different by chance? You can even use a bootstrap approach to calculate the p-value after running 1000 permutations of the code. Of course, to circumvent these problems we use adjusted p values but it is always nice to go back to basics and stress the importance of applying statistical methods when looking at differentially expressed genes. I encourage you all to leave your answers in the comment section below to inspire others.

Happy R programming!

Creating Annotated Data Frames from GEO with the GEOquery package

In this post, we will go over how to use the GEOquery package to download a data matrix (or eset object) directly into R and append specific probe annotation information to this matrix for it to be exported as a csv file for easy manipulation in Excel or spreadsheet tools. This is especially useful for sharing data with collaborators who are not familiar with R and would rather look up there favorite genes in a spreadsheet format.

Mining gene expression data from publicly available databases is a great way to find evidence to support you working hypothesis that gene X is relevant in condition Y. You may also want to mine publicly available data to build on an existing hypothesis or simply to find additional support for your favorite gene in a different animal model or experimental condition. In this post, we will go over how to use the GEOquery package to download a data matrix (or eset object) directly into R and append specific probe annotation information to this matrix for it to be exported as a csv file for easy manipulation in Excel or spreadsheet tools. This is especially useful for sharing data with collaborators who are not familiar with R and would rather look up there favorite genes in a spreadsheet format.

First, let’s start by opening an R session and creating a function to return the eset (ExpressionSet) object or the original list object downloaded by the getGEO() function in R.


getGEOdataObjects <- function(x, getGSEobject=FALSE){
# Make sure the GEOquery package is installed
require("GEOquery")
# Use the getGEO() function to download the GEO data for the id stored in x
GSEDATA <- getGEO(x, GSEMatrix=T, AnnotGPL=FALSE)
# Inspect the object by printing a summary of the expression values for the first 2 columns
print(summary(exprs(GSEDATA[[1]])[, 1:2]))

# Get the eset object
eset <- GSEDATA[[1]]
# Save the objects generated for future use in the current working directory
save(GSEDATA, eset, file=paste(x, ".RData", sep=""))

# check whether we want to return the list object we downloaded on GEO or
# just the eset object with the getGSEobject argument
if(getGSEobject) return(GSEDATA) else return(eset)
}

We can test this function on a GEO dataset such as GSE73835 as follows:


# Store the dataset ids in a vector GEO_DATASETS just in case you want to loop through several GEO ids
GEO_DATASETS <- c("GSE73835")

# Use the function we created to return the eset object
eset <- getGEOdataObjects(GEO_DATASETS[1])
# Inspect the eset object to get the annotation GPL id
eset 

You will see the following output:

ExpressionSet (storageMode: lockedEnvironment)
assayData: 45281 features, 6 samples
element names: exprs
protocolData: none
phenoData
sampleNames: GSM1904293 GSM1904294 … GSM1904298 (6 total)
varLabels: title geo_accession … data_row_count (35 total)
varMetadata: labelDescription
featureData
featureNames: ILMN_1212602 ILMN_1212603 … ILMN_3163582 (45281 total)
fvarLabels: ID Species … ORF (30 total)
fvarMetadata: Column Description labelDescription
experimentData: use ‘experimentData(object)’
Annotation: GPL6887

We will first need to download the annotation file for GPL6887. Then we can create a data frame with the probe annotation categories we are most interested in as follows:


# Get the annotation GPL id (see Annotation: GPL10558)
gpl <- getGEO('GPL6887', destdir=".")
Meta(gpl)$title

# Inspect the table of the gpl annotation object
colnames(Table(gpl))

# Get the gene symbol and entrez ids to be used for annotations
Table(gpl)[1:10, c(1, 6, 9, 12)]
dim(Table(gpl))

# Get the gene expression data for all the probes with a gene symbol
geneProbes <- which(!is.na(Table(gpl)$Symbol))
probeids <- as.character(Table(gpl)$ID[geneProbes])

probes <- intersect(probeids, rownames(exprs(eset)))
length(probes)

geneMatrix <- exprs(eset)[probes, ]

inds <- which(Table(gpl)$ID %in% probes)
# Check you get the same probes
head(probes)
head(as.character(Table(gpl)$ID[inds]))

# Create the expression matrix with gene ids
geneMatTable <- cbind(geneMatrix, Table(gpl)[inds, c(1, 6, 9, 12)])
head(geneMatTable)

# Save a copy of the expression matrix as a csv file
write.csv(geneMatTable, paste(GEO_DATASETS[1], "_DataMatrix.csv", sep=""), row.names=T)

Let’s take a look at the first 6 lines of the data frame we just created with the head() function.

example1As you can see once we export this data frame as a csv file, it is much easier for others to open this file as a spreadsheet and get useful information such as the gene symbol or entrez id with the expression values across the samples.

Hope this helps and happy collaborations!

Creating color palettes in R

In the R post, we will present how to create your own color palettes and how to work with other palettes such as RColorBrewer, wesanderson and hex codes from http://www.colorcombos.com for exciting color palettes.

There are several color palettes available in R such as rainbow(), heat.colors(), terrain.colors(), and topo.colors(). We can visualize these  as 3D pie charts using the plotrix R package.

# Let's create a pie chart with n=7 colors using each palette
library(plotrix)
sliceValues <- rep(10, 7) # each slice value=10 for proportionate slices
pie3D(sliceValues,explode=0, theta=1.2, col=rainbow(n=7), main="rainbow()")

# Let's create a figure with all 4 base color palettes
par(mfrow=c(1, 4))
pie3D(sliceValues,explode=0, theta=1.2, col=rainbow(n=7), main="rainbow()")
pie3D(sliceValues,explode=0, theta=1.2, col=heat.colors(n=7), main="heat.colors()")
pie3D(sliceValues,explode=0, theta=1.2, col=terrain.colors(n=7), main="terrain.colors()")
pie3D(sliceValues,explode=0, theta=1.2, col=topo.colors(n=7), main="topo.colors()")

Screen Shot 2016-07-10 at 9.01.30 AM

Other popular color palettes include the RColorBrewer package that has a variety of sequential, divergent and qualitative palettes and the wesanderson package that has color palettes derived from his films.

library(RColorBrewer)

# To see all palettes available in this package
par(mfrow=c(1, 1))
display.brewer.all()

# To create pie charts from a sequential, divergent and qualitative RColorBrewer palette
par(mfrow=c(1, 4))
pie3D(sliceValues,explode=0, theta=1.2, col=brewer.pal(7, "RdPu"), main="Sequential RdPu")
pie3D(sliceValues,explode=0, theta=1.2, col=brewer.pal(7, "RdGy"), main="Divergent RdGy")
pie3D(sliceValues,explode=0, theta=1.2, col=brewer.pal(7, "Set1"), main="Qualitative Set1")


# And add pie chart with a wes_anderson palette
# we will only use 5 slices in the example since the Darjeeling palette only has 5 colors
library(wesanderson)
pie3D(sliceValues[1:5],explode=0, theta=1.2, col=wes_palette("Darjeeling2"), main="Darjeeling2")

Screen Shot 2016-07-10 at 9.01.45 AMYou can also create your own color palettes in R with your colors of choice with the colors() function or creating a vector with the color names. A great review and cheat sheet for R colors can be found at http://research.stowers-institute.org/efg/R/Color/Chart/.

# To get an idea of the colors available
head(colors())
length(colors()) # 657

# To see all 657 colors as a color chart you can source the R script to generate a pdf version in your working directory

Screen Shot 2016-07-09 at 5.18.32 PM

# We can create choose a palette based on the R chart as follow:
mycols <- colors()[c(8, 5, 30, 53, 118, 72)] #
# or you could enter the color names directly
# mycols <- c("aquamarine", "antiquewhite2", "blue4", "chocolate1", "deeppink2", "cyan4")

# You could also get and store all distinct colors in the cl object and use the sample function to select colors at random
cl <- colors(distinct = TRUE)
set.seed(15887) # to set random generator seed
mycols2 <- sample(cl, 7)

You can also create color palettes with hex color codes. A great example of this is to work with popular color palettes available on the http://www.colorcombos.com website. This website has various palettes you can choose from and even derive color palettes from your favorite websites. For example, let’s grab the color palette from the rjbioinformatics.com website at http://www.colorcombos.com/grabcolors.html .

Screen Shot 2016-07-09 at 5.36.02 PM

After entering the URL of our website, we will receive the hex codes for the color scheme used on the website.

Screen Shot 2016-07-09 at 5.38.34 PM

We can even export the colors as little pencils 🙂

C6D4E1-2F2016-FCFAEA-456789.png

You can also choose from hundred of color schemes based on your color of choice. For example, we will also create a color palette based on the color olive – ColorCombo382.

C3D938-772877-7C821E-D8B98B-7A4012

# For the rjbioinformatics.com color palette
mycols3 <- c("#c6d4e1", "#2f2016", "#fcfaea", "#456789")

# For ColorCombos382 palette
mycols4 <- c("#C3D938", "#772877", "#7C821E", "#D8B98B", "#7A4012")

# Now to get the pie charts for the last four palettes
pie3D(sliceValues,explode=0, theta=1.2, col=mycols, main="colors()")
pie3D(sliceValues,explode=0, theta=1.2, col=mycols2, main="sample(colors(distinct=TRUE)")
pie3D(sliceValues[1:4],explode=0, theta=1.2, col=mycols3, main="rjbioinformatics.com color grab")
pie3D(sliceValues[1:5],explode=0, theta=1.2, col=mycols4, main="ColorCombos382 colorcombos.com")

Screen Shot 2016-07-10 at 9.01.56 AM

We can also create a color palette with the colorRampPalette() to use for heatmaps and other plots. For this example, we will use the leukemia dataset available in the GSVAdata package, which corresponds to microarray data from 37 human acute leukemias where 20 of these cases are Acute lymphoblastic leukemia (ALL) and the other 17 are ALL’s with Mixed leukemia gene rearrangements. For more information on the study please see Armstrong et al. Nat Genet 30:41-47, 2002.

library(GSVAdata)
data(leukemia) # loads leukemia_eset

# Create a matrix from the gene expression eset object
M1 <- exprs(leukemia_eset)

# Get a matrix of the top 50 most variable probes accros the samples
library(genefilter)
topVarGenes <- head(order(-rowVars(M1)), 50)
mat <- M1[ topVarGenes, ]
mat <- mat - rowMeans(mat)

# For sample annotation information
head(pData(leukemia_eset))
table(leukemia_eset$subtype)

# Get sample group as a factor the ColSideColors
ALLgroup <- as.factor(pData(leukemia_eset)[colnames(M1), 1])

# Get the colors for the ALL subtype
sidecols <- c("#4FD5D6", "#FF0000")

# Here is a fancy color palette inspired by http://www.colbyimaging.com/wiki/statistics/color-bars
cool = rainbow(50, start=rgb2hsv(col2rgb('cyan'))[1], end=rgb2hsv(col2rgb('blue'))[1])
warm = rainbow(50, start=rgb2hsv(col2rgb('red'))[1], end=rgb2hsv(col2rgb('yellow'))[1])
cols = c(rev(cool), rev(warm))
mypalette <- colorRampPalette(cols)(255)

library("gplots") # for the heatmap.2 function
par(mfrow=c(1,1))

png(filename="Heatmap_Example.png", width=12, height=10, units = 'in', res = 300)
heatmap.2(mat, trace="none", col=mypalette, ColSideColors=sidecols[ALLgroup],
labRow=FALSE, labCol=FALSE, mar=c(6,12), scale="row", key.title="")
legend("topright", legend=levels(ALLgroup), fill=sidecols, title="", cex=1.2)
graphics.off()

Heatmap_Example

Now you are all set to work with and create your own awesome color palettes! Happy R programing 🙂

 

Working with Venn Diagrams

In this post, we will learn how to create venn diagrams for gene lists and how to retrieve the genes present in each venn compartment with R.

In this post, we will learn how to create venn diagrams for gene lists and how to retrieve the genes present in each venn compartment with R.

In this particular example, we will generate random gene lists using the molbiotools gene set generator but you can use your own gene lists if you prefer. Specifically, we will generate a random list of 257 genes to represent those that are upregulated in condition and another list of 1570 genes to represent those that are upregulated in condition B.

Screen Shot 2016-06-21 at 2.01.15 PM

Then, we will sort and paste the gene lists in an excel document we will save as randomGeneLists.xlsx.

Now, let’s load the data into R using the gdata package.

library("gdata")
geneLists <- read.xls("randomGeneLists.xlsx", sheet=1, stringsAsFactors=FALSE, header=FALSE)
head(geneLists)

# Notice there are empty strings to complete the data frame in column 1 (V1)
tail(geneLists)

# To convert this data frame to separate gene lists with the empty strings removed we can use lapply() with our home made  function(x) x[x != ""]
geneLS <- lapply(as.list(geneLists), function(x) x[x != ""])

# If this is a bit confusing you can also write a function and then use it in lapply() 
removeEMPTYstrings <- function(x) {

 newVectorWOstrings <- x[x != ""]
 return(newVectorWOstrings)

}
geneLS2 <- lapply(as.list(geneLists), removeEMPTYstrings)

# You can print the last 6 entries of each vector stored in your list, as follows:
lapply(geneLS, tail)
lapply(geneLS2, tail) # Both methods return the same results

# We can rename our list vectors
names(geneLS) <- c("ConditionA", "ConditionB")

# Now we can plot a Venn diagram with the VennDiagram R package, as follows:
require("VennDiagram")

VENN.LIST <- geneLS
venn.plot <- venn.diagram(VENN.LIST , NULL, fill=c("darkmagenta", "darkblue"), alpha=c(0.5,0.5), cex = 2, cat.fontface=4, category.names=c("A", "B"), main="Random Gene Lists")

# To plot the venn diagram we will use the grid.draw() function to plot the venn diagram
grid.draw(venn.plot)

# To get the list of gene present in each Venn compartment we can use the gplots package
require("gplots")

a <- venn(VENN.LIST, show.plot=FALSE)

# You can inspect the contents of this object with the str() function
str(a)

# By inspecting the structure of the a object created, 
# you notice two attributes: 1) dimnames 2) intersections
# We can store the intersections in a new object named inters
inters <- attr(a,"intersections")

# We can summarize the contents of each venn compartment, as follows:
# in 1) ConditionA only, 2) ConditionB only, 3) ConditionA & ConditionB
lapply(inters, head) 

example2

Now you are ready, to review the genes in each section of the venn diagram separately. Alternatively, you can always use Venny web tool that is a great way to start looking at your data and then write a modified version of this R script to make a more exhaustive figure or facilitate downstream analysis in your script.

Feel free to leave comments or email me at info@rjbioinformatics.com.

Pre-processing .CEL files in R

This post shows you how to compare data from two separate studies without the hassle of tackling batch effects, etc. By scaling and centring the data in both studies, you can look for trends in the data and look for gene expression changes that go in a similar direction.

One of the most efficient ways to pre-process microarray data in R, is to use the oligo R/Bioconductor package. In a few lines of code you can go from raw .CEL files to a normalized data matrix you can work with for downstream analysis. It is particularly useful, if you wish to reanalyse a subset of .CEL files from a previously published dataset.

For example, say we are interested in comparing relative gene expression levels of Atf1, Atf3, Brca1 i in the lung, liver, and bone marrow of CB17 mice to previously published mouse tissue expression data on GEO. From our study, we have a data matrix of normalized expression values we obtained from our microarray study stored in CB17mat object.

First, we will scale all gene expression values by row to obtain center and scale these values to have an idea which organ expresses the highest levels of Atf1, Atf3, and Brca1 relative to the others in our study. To do this we will use the scale() with the default settings center=TRUE and scale=TRUE. Since the scale default function scales and centers columns we will need to transpose our matrix before proceeding.


# From your gene expression matrix stored in CB17mat
genes = c("Atf1", "Atf3", "Brca1")
CB17scal <- t(apply(CB17mat[genes, ], 1, scale))

# We will also add the missing column names to our scaled matrix
colnames(CB17scal) <- colnames(CB17mat)

# You can also plot a heatmap to look at the effects of the scaling on the expression levels across the tissues using the gplots package

# Clusters the rows (and potentially columns) by Pearson correlation as distance method
corrdist = function(x) as.dist(1-cor(t(x), method="pearson"))

# and Ward method as the agglomeration method
hclust.avl = function(x) hclust(x, method="ward.D2")

# with dendrogram="row" and Colv=NA, we are only clustering the rows i.e. genes
png(filename="example1.png", width=10, height=10, units = 'in', res = 300)
heatmap.2(CB17scal, dendrogram="row", Colv=NA, scale="none", key.title="", col=rev(redblue(250)), trace='none', cexCol=1.2, cexRow=1.5, hclustfun=hclust.avl, distfun=corrdist, margins=c(8, 12))
graphics.off()

example1

As you can see Atf1 and Brca1 are relatively higher in bone marrow, whereas Atf3 is relatively higher in the lung compared to the other organs.

Now let’s analyze the liver, lung and bone marrow data from the Large-scale analysis of the human and mouse transcriptomes study from Su et al, 2002, PNAS Apr 2;99(7):4465-70. The individual files from the liver, lung and bone marrow were downloaded from GSE97 and the data was normalized with the oligo package as follows:


# Move the CEL files to ~/MY_WORKING_DIRECTORY/filesToAnalyse
# Set the working directory to the folder you would like to save your results
setwd("~/MY_WORKING_DIRECTORY")

# Load the libraries needed for the analysis
library("oligo")
library("pd.mg.u74a")

# Load the packages needed for the analysis
# You can choose to save the CEL files for your tissues of interest
geneCELs <- list.celfiles("~/MY_WORKING_DIRECTORY/filesToAnalyse", full.names=TRUE)

affyGeneFS <- read.celfiles(geneCELs)
affyGeneFS

# RMA at the probet level
geneCore <- rma(affyGeneFS)

# Inspect the eset object
head(exprs(geneCore))
head(featureNames(geneCore))

# For the featureData info for the array used in this study
library("annotate")
#library("mouse4302.db")

source("http://bioconductor.org/biocLite.R")
library("mgu74a.db")
probeids <- featureNames(geneCore)
geneAnnotation <- select(mgu74a.db, probeids, c("SYMBOL", "ENTREZID", "GENENAME"), multiVals="first")

# Save the ESET data and annotation
save(geneCore, geneAnnotation, file="geneCoreTissueV2.RData")
saveRDS(geneCore, "geneCore.rds")
saveRDS(geneAnnotation, "geneAnnotation.rds")

# Convert the eset object to a matrix to get the gene expression values
# for Atf1, Atf3, and Brca1
M1 <- exprs(geneCore)
head(M1)

Gene_Symbols <- sapply(rownames(M1), getGeneSymbol, df=geneAnnotation)
DM1 <- data.frame(M1, Gene_Symbols=Gene_Symbols[rownames(M1)], stringsAsFactors=FALSE)

# Use can create a function to select the probe with the top interquantile range with IQR()
# to represent the gene expression value or use the TopIqrSymbolMatrix()
# in the "Useful Functions to Work with Microarrays" post (coming soon!)

tissueM1 <- TopIqrSymbolMatrix(DM1)
colnames(tissueM1)
tissueExprs <- tissueM1[, 1:ncol(M1)]
rownames(tissueExprs) <- tissueM1$Gene_Symbols
head(tissueExprs[, 1:4])

# To get the geo file ids
geoColnames <- gsub(".CEL", "", colnames(tissueExprs))

# Get the new colnames based on tissue
# Create an excel sheet with the GEO .CEL file id and tissue sample names you would like to use
# Load the excel sheet as a data frame using the gdata package
library("gdata")
TissueNames <- read.xls("GSE97_FileSampleIdentifier.xlsx", sheet=1, stringsAsFactors=FALSE, row.names=1)
TissueSamples <- TissueNames[geoColnames, 2]

# Replace the tissue exprs with the tissue names
colnames(tissueExprs) <- TissueSamples
saveRDS(tissueExprs, "tissueExprs.rds")
head(tissueExprs)

# Now create a matrix for our 3 genes of interest
genes = c("Atf1", "Atf3", "Brca1")
genesMat <- as.matrix(M1[genes, ])

# Let's scale and center the expression values
genesMatScal <- t(apply(genesMat, 1, scale))

# Add the column names
colnames(genesMatScal) <- rep(c("liver", "lung", "bone marrow"), each=2)

png(filename="example1b.png", width=10, height=10, units = 'in', res = 300)
heatmap.2(genesMatScal, dendrogram="row", Colv=NA, scale="none", key.title="", col=rev(redblue(250)), trace='none', cexCol=1.2, cexRow=1.5, hclustfun=hclust.avl, distfun=corrdist, margins=c(8, 12))
graphics.off()

example1b

By comparing both heatmaps, we can see that Atf1 and Brca1 have more similar expression patterns across the tissues than Atf3 in these mice compared to our CB17 mice tissue data.
Therefore, Brca1 and Atf1 tissue difference might be more consistent among the tissues from different mice.

Alternatively, it could just be an artifact of the small sample size and/or how the samples were processed before running the arrays. That being said, it’s a good thing to reanalyse studies and compare with your arrays to get an idea of sample variability and how the experimental design, pre- and post- processing affects the overall interpretation of the results.

Feel free to leave comments or email me at info@rjbioinformatics.com.