A. Tracing the MDA-MB435 cell line lineage
MDA-N was derived from MDA-MB435 by transfection with a plasmid designed to express erbB2. It was not grown under selective pressure. By gene expression pattern, MDA-N and MDA-MB435 are very similar. Both lack expression of erbB2, suggesting that MDA-N lost erbB2 expression during serial culture. An independent, early passage isolate (MDA-MB435s) obtained from the American Type Culture Collection (ATCC) was significantly different in gene expression from both MDA-N and MDA-MB435 used by the DTP. It expressed some genes characteristic of the melanoma cluster but also expressed genes not expressed by the DTP isolate. RFLP analyses comparing the ATCC MDA-MB435s isolate with the Developmental Therapeutics Program (DTP) isolates revealed that the DTP isolates (MDA-MB435 and MDA-N) were identical to each other, and similar to but distinct from the ATCC isolate. MDA-MB435s differed from the DTP isolates by a gain of two RFLP bands and a loss of two bands, consistent with genotypic differences found in first-degree relatives. The combined results suggest that the DTP and ATCC isolates are clonally related and, therefore, from the same individual. It is unclear whether they are related by clonal evolution in vitro or whether two independent lines derived from the same individual co-existed in early isolates.
B. Reproducibility of the ratios measured in triplicate hybridizations comparing K562 and MCF7 to the mixed reference pool.
To assess the contribution of artifactual sources of variation to the experimentally measured expression patterns, K562 and MCF7 were each grown in three independent cultures, and the entire process of mRNA isolation, labeling, and hybridization was carried out independently on mRNA extracted from each culture.
The tables show the standard deviation of the triplicate ratios (log2 scale) as a function of the mean normalized fluorescent signal above background in the reference channel of the six reproducibility hybridizations (normalized to the one of the six hybridizations with the highest mean background intensity). The background-subtracted reference signal intensity was binned into gene sets with increasing minimum signal intensities and plotted against the mean of the standard deviation of the triplicates for each set of binned genes. In the raw data (prior to normalization), the standard deviation of the measured triplicates drops to near its minimum (~0.2) at approximately 150 to 300 counts above background (total signal dynamic range of 0 to ~50,000 counts).
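The binning described above can be illustrated with a minimal Python sketch. It assumes one background-subtracted reference signal per gene and three measured ratios per gene; the function name `mean_sd_by_min_signal`, the data layout, and the example values are hypothetical and are not taken from the NCI60 dataset or the authors' own analysis code.

```python
import math
import statistics

def mean_sd_by_min_signal(genes, thresholds):
    """For each minimum-signal threshold, average the per-gene standard
    deviation of the triplicate log2 ratios over all genes whose
    reference signal is at or above that threshold.

    genes: list of (reference_signal, [ratio1, ratio2, ratio3]) tuples.
    Returns a dict mapping threshold -> mean SD (None if no gene qualifies).
    """
    result = {}
    for t in thresholds:
        sds = [
            statistics.stdev([math.log2(r) for r in ratios])
            for signal, ratios in genes
            if signal >= t
        ]
        result[t] = statistics.mean(sds) if sds else None
    return result

# Hypothetical example: one noisy low-signal gene, two stable high-signal genes.
genes = [
    (50, [0.5, 1.0, 2.0]),
    (400, [2.0, 2.0, 2.0]),
    (1000, [1.0, 1.0, 1.0]),
]
table = mean_sd_by_min_signal(genes, [0, 200])
```

In this toy input the mean standard deviation falls once the low-signal gene is excluded, which is the qualitative behavior the tables report for the real data.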
Figure 3 of the manuscript depicts the cluster analysis of a set of 6831 genes that were selected based on a fluorescence signal intensity >200 intensity units above background (i.e. fluorescence approximately >0.4% of the dynamic range above background) in all six hybridizations in our test of reproducibility. We believe this represents the set of genes whose expression was measured with acceptable precision.
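As a rough illustration of this selection rule, the sketch below keeps only genes whose signal exceeds the cutoff in every one of the six reproducibility hybridizations, and checks that 200 counts is 0.4% of a ~50,000-count dynamic range. The dictionary layout, gene labels, and the function name `select_well_measured` are assumptions made for the example.

```python
def select_well_measured(signals, cutoff=200):
    """Return the genes whose background-subtracted signal exceeds
    `cutoff` in every hybridization.

    signals: dict mapping gene label -> list of signal values,
             one value per hybridization.
    """
    return [
        gene
        for gene, values in signals.items()
        if all(v > cutoff for v in values)
    ]

# Hypothetical signals for three genes across six hybridizations.
signals = {
    "geneA": [250, 300, 410, 220, 900, 1500],
    "geneB": [250, 300, 410, 180, 900, 1500],  # fails in one hybridization
    "geneC": [40, 60, 55, 30, 70, 45],
}
selected = select_well_measured(signals)

# 200 counts expressed as a fraction of a ~50,000-count dynamic range.
cutoff_fraction = 200 / 50000
```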
C. Quantitation of co-clustering.
A measure of the reproducibility of the measurement of gene expression across the set of 60 cell lines is the frequency with which multiple representations of the same gene cluster adjacent to or near one another in the gene cluster tree. Using the 6831 spots from the cluster diagram in Figure 3 (selected for high expression in the reference – see supplement A), the histograms below depict the frequency of correlation coefficient values between replicates, and the proximity of replicates in the cluster tree. The analysis was limited to the 218 named genes that were sequence validated and had multiple representations on the array.
60% of clones clustered immediately adjacent to one another, while 81% clustered within five genes of one another, out of a total cluster tree comprising 6831 genes. This illustrates the reproducibility and consistency of the measurements, and provides independent confirmation of many of our measurements. This property also demonstrates that these, and probably all, genes have nearly unique patterns of variation across the sixty cell lines. If this were not the case, and multiple genes had identical patterns of variation, we would not expect clustering on the basis of expression variation to distinguish duplicate copies of an individual gene from other genes with identical expression patterns.
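One way such a proximity measure could be computed is sketched below in Python. It assumes the cluster tree has been flattened to an ordered list of gene names (the leaf order), with replicated genes appearing more than once, and it takes the smallest gap between any two copies of a gene as that gene's distance. The exact counting rule used for the figures quoted above is not stated here, so this rule, the function names, and the example order are all assumptions.

```python
from collections import defaultdict

def replicate_distances(leaf_order):
    """Map each replicated gene to the smallest gap (in leaf positions)
    between two of its copies in the cluster tree order."""
    positions = defaultdict(list)
    for idx, name in enumerate(leaf_order):
        positions[name].append(idx)
    distances = {}
    for name, idxs in positions.items():
        if len(idxs) > 1:
            # idxs is already sorted because enumerate runs in order.
            distances[name] = min(b - a for a, b in zip(idxs, idxs[1:]))
    return distances

def fraction_within(distances, k):
    """Fraction of replicated genes whose copies lie within k positions."""
    return sum(1 for d in distances.values() if d <= k) / len(distances)

# Hypothetical leaf order: A is adjacent, B is 3 apart, C is 9 apart.
leaf_order = ["A", "A", "B", "C", "D", "B", "E", "F", "G", "H", "I", "J", "C"]
dist = replicate_distances(leaf_order)
adjacent = fraction_within(dist, 1)
within_five = fraction_within(dist, 5)
```

A distance of 1 corresponds to immediately adjacent leaves, and `fraction_within(dist, 5)` corresponds to clustering within five genes of one another.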
D. Search for housekeeping genes
Traditional approaches to measuring gene expression have relied upon the use of "housekeeping" genes that are assumed to have an invariant partial concentration across mRNA samples prepared from multiple specimens. As shown below using the set of 6831 genes analyzed in Figure 3 of the manuscript, every gene varied by a ratio of at least 2-fold across this set of cell lines.
Although there are a number of genes that are strongly expressed and vary by only two- to three-fold in the data set, we believe it would be a misnomer to characterize these as "housekeeping" genes. Many of the genes that varied the least across the cell lines were part of the proliferation cluster, and vary in relation to cell doubling time. Across a different set of lines or conditions, these genes might be regulated in relation to factors that do not vary across this particular set of lines. We believe the notion of "housekeeping genes", if defined by this experiment, would lead to misclassification of those genes as "unregulated", and they could therefore be erroneously used for normalization.
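The fold-variation criterion discussed in this section can be sketched as follows. The example assumes that variation is measured as the ratio of the highest to the lowest expression ratio for a gene across the cell lines, computed from log2 ratios; this definition, the function name `fold_range`, and the gene labels and values are assumptions for illustration only.

```python
def fold_range(log2_ratios):
    """Fold difference between the highest and lowest expression ratio
    across cell lines, given the ratios on a log2 scale."""
    return 2 ** (max(log2_ratios) - min(log2_ratios))

# Hypothetical log2 ratios for two genes across a handful of cell lines.
data = {
    "gene_low_variation": [-0.5, 0.0, 0.2, 0.5],   # spans exactly 2-fold
    "gene_high_variation": [-2.0, 0.0, 1.0, 3.0],  # spans 32-fold
}

# Candidate "housekeeping" genes would be those varying by less than 2-fold.
invariant = [gene for gene, values in data.items() if fold_range(values) < 2]
```

With this toy input no gene falls below the 2-fold threshold, mirroring the observation reported for the 6831-gene set.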