Material and Methods
cDNA clones and microarray production
The 8102 human cDNA
genes/clones used in this study were obtained from Research
Genetics (Huntsville, AL, USA) and were chosen from a set of
15,000 cDNA clones that corresponded to the Research Genetics
Human Gene Filters sets GF200-202 (http://www.resgen.com/). This set of genes contained some redundancy (approximately 300 genes were printed more than once on each array) and contained approximately 4000 named genes, 2000 genes with homology to named genes in other species, and approximately 2000 ESTs of unknown function. In addition to these sequence-verified clones, a small set (396) of clones whose sequence had not been verified was also included; all of these clones were excluded from all analyses presented in the figures. All of the non-sequence-verified clones are identified in the primary data tables by the prefix SID (Stanford Identifier).
The cDNA microarrays used in this study were made as
previously described1,2. Detailed protocols are available at
http://cmgm.stanford.edu/pbrown/mguide/index.html. All 84 microarray experiments in this study were conducted using microarrays from a single printing.
mRNA Isolations, Fluorescent cDNA Production and
Following their excision, breast tumor samples
were rapidly frozen in liquid N2 and then stored at -80 C until
use. mRNA was isolated from breast tumors as described in Perou et
al.3, using the Trizol Reagent (Gibco-BRL) and Invitrogen FastTrack
2.0 Kit (all Stanford samples, and see http://genome-www.stanford.edu/sbcmp/web.shtml for the detailed protocol), or using the Trizol Reagent followed by Dynal bead separation for the mRNA purification step (all Norway tissue samples). Isolation of mRNA from cell lines, and all mRNA labeling reactions (1.5-2 micrograms of mRNA/reaction) and microarray hybridizations were performed as described in Perou et al.3.
We identified a systematic difference in the R/G ratio of a subset of genes/cDNA elements that correlated perfectly with where the mRNA samples were prepared (i.e. Stanford or Norway). This "source" artifact affected a small subset of the cDNA elements and caused these genes/cDNAs to have a higher R/G ratio in one set of samples than in the other. The molecular cause of this difference is unknown; it may be due to differences in the mRNA isolation protocols, or it may reflect biological differences between the samples. At this time we cannot distinguish between these possibilities.
Common Reference Sample
Each of the 84 experimental samples
tested here was analyzed by a comparative hybridization, using a
common "reference" mRNA pool as a standard; this reference sample was
composed of equal mixtures of mRNA isolated from 11 established human
cell lines (MCF7, Hs578T, OVCAR3, HepG2, NTERA2, MOLT4, RPMI-8226,
NB4+ATRA, UACC-62, SW872, and Colo205; see Supplementary Information Table 1 for more details). The 11 cell lines were all grown to 70-90% confluence in RPMI medium containing 10% Fetal Calf Serum and Penicillin/Streptomycin. The cells were harvested either by scraping or by centrifugation and quickly resuspended in RNA lysis buffer, and mRNA was prepared as described in Perou et al.3. For each cell line, multiple individual mRNA preparations were collected, pooled, and analyzed by Northern analysis before final mixing to ensure the quality of the input mRNAs. The 11 mRNA samples were then mixed together in equal amounts, aliquoted in 10 mM Tris (pH 7.4), and stored at -80 C until use. For each microarray hybridization, 2 micrograms of the common reference sample was used, always labeled with Cy3.
Breast Tumor Pathology
The 39 individual breast tumor samples and the single fibroadenoma used in this study were collected at either Stanford University in Stanford CA, USA, or at the Haukeland University Hospital in Bergen, Norway. Twenty of the forty breast tumors analyzed here were sampled twice as part of a larger Norwegian study on locally advanced breast cancers (T3/T4 and/or N2 tumors) and have been described previously4; these patients underwent an open surgical biopsy before treatment with doxorubicin monotherapy (range 12-23 weeks), followed by the definitive surgical resection of the remaining tumor after therapy, and were evaluated for clinical responses according to UICC criteria5. In addition to the 20 pairs, there were 8 additional "before" specimens from Norway and 12 tissue specimens from Stanford (all Stanford tumors tested had a diameter of 3cm or larger). Finally, 2 of the 10 Stanford tumor specimens assayed were also paired with a lymph node metastasis from the same patient.
A single pathologist (MvdR) reviewed H&E sections of each tumor,
including all before and after pairs, and made a histological
evaluation of each while blinded to the source. Tumors were graded
using a modified version of the Bloom-Richardson method6. These data
are displayed in Supplementary Information Table 3 and a
representative H&E section of each tumor is posted on our website.
Immunohistochemistry was performed as described previously3,7; the antibodies used included CAM5.2 (specific for keratins 8/18, Becton Dickinson), anti-keratin 5/6 (Boehringer Mannheim), and anti-keratin 17 (Dako).
Microarray Data Analysis
The cDNA microarrays were scanned
with either a General Scanning (Watertown, MA) ScanArray 3000 at 20
micron resolution, or with a prototype Axon Instruments (Foster City,
CA) GenePix Scanner at 10 micron resolution. The output files, which
were TIFF images, were then analyzed using the program ScanAlyze
(M. Eisen; available at http://www.microarrays.org/software).
Fluorescent ratios and quantitative data on spot quality (see
ScanAlyze manual) were stored in a prototype of the AMAD database
(M. Eisen; available at http://www.microarrays.org/software).
Areas of the array with obvious blemishes were manually flagged and
excluded from subsequent analyses. The primary data tables can be
downloaded at http://genome-www.stanford.edu/molecularportraits/ in tab-delimited text format after obtaining a password.
Hierarchical-clustering gene selection criteria for Figure 1
(Supplementary Materials Figure 4). Data were extracted from the database in a single table, with each row representing an array element, each column a hybridization, and each cell the observed fluorescent ratio for the array element in the appropriate hybridization. This table had 9216 rows and 84 columns. Previously flagged spots were excluded, as were spots that did not pass the ScanAlyze quality control parameter of "%pixels > background" of at least 0.55 in both the red and green channels. Array elements were also removed if they did not meet this "%pixel" quality control measure in at least 80% of the hybridizations analyzed. The data table was then split into tissues and cell lines, and the two subtables were separately median polished (the rows and columns were iteratively adjusted to have median 0) before being rejoined into a single table. Finally, we selected the subset of genes whose expression varied by at least 4-fold from the median in this sample set in at least three of the samples tested (1753 genes satisfied these conditions).
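The filtering, median polishing, and fold-change selection described above can be sketched roughly as follows. This is a minimal illustration and not the code used in the study: it assumes the ratios have already been converted to log2 values, that excluded spots are stored as None, and that a fixed number of polishing passes is sufficient. The function names (enough_data, median_polish, select_variable) are hypothetical.

```python
import math
from statistics import median


def enough_data(row, min_fraction=0.8):
    """True if the array element has usable spots in >= min_fraction of hybridizations."""
    return sum(v is not None for v in row) >= min_fraction * len(row)


def median_polish(table, n_iter=10):
    """Iteratively subtract row and column medians so that both approach 0.

    `table` is a list of rows of log2 ratios (assumption); None marks an excluded spot.
    """
    t = [row[:] for row in table]
    for _ in range(n_iter):
        # Row sweep: center each array element on its own median.
        for row in t:
            vals = [v for v in row if v is not None]
            if vals:
                m = median(vals)
                for j, v in enumerate(row):
                    if v is not None:
                        row[j] = v - m
        # Column sweep: center each hybridization on its own median.
        for j in range(len(t[0])):
            vals = [row[j] for row in t if row[j] is not None]
            if vals:
                m = median(vals)
                for row in t:
                    if row[j] is not None:
                        row[j] -= m
    return t


def select_variable(table, fold=4.0, min_samples=3):
    """Indices of rows that differ from the (polished) median by >= fold in >= min_samples columns."""
    cutoff = math.log2(fold)
    return [i for i, row in enumerate(table)
            if sum(1 for v in row if v is not None and abs(v) >= cutoff) >= min_samples]
```

In the procedure described above, the polishing step would be applied separately to the tissue and cell-line subtables before the two are rejoined and passed to the selection step.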
We applied average-linkage hierarchical clustering, as implemented in
the program Cluster (M. Eisen; http://www.microarrays.org/software),
separately to both the genes and arrays. The results were analyzed,
and figures generated, using TreeView (M. Eisen; http://www.microarrays.org/software).
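For readers unfamiliar with the method, a textbook form of average-linkage agglomeration is sketched below. It is not the implementation in the Cluster program: the use of 1 minus the Pearson correlation as the distance, the plain average of pairwise distances as the linkage, and the function names (pearson, average_linkage) are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_linkage(profiles):
    """Agglomerate profiles; return merges as (members_a, members_b, distance)."""
    # Pairwise distance between the original profiles (assumed: 1 - correlation).
    dist = {(i, j): 1.0 - pearson(profiles[i], profiles[j])
            for i, j in combinations(range(len(profiles)), 2)}

    def linkage(a, b):
        # Average of all pairwise distances between members of the two clusters.
        total = sum(dist[min(i, j), max(i, j)] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [(i,) for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        # Join the two clusters with the smallest average distance.
        a, b = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges
```

Applying the same function once to the rows (genes) and once to the columns (arrays) of the data table corresponds to clustering the two dimensions separately, as described above.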
Selection of genes for the "intrinsic" gene subset
(Figure 2 and Supplementary Materials Figure 6). To select a set of genes from the entire 8102-gene set whose variation in expression optimally represented differences between tumors rather than just differences between tumor samples (i.e. the "intrinsic" gene subset used in Figure 2), we assigned a "within-between" score to each gene. This score was equal to the mean effect of the gene on the pairwise correlation coefficients of the 22 matched tumor pairs minus the mean effect of the gene on the remaining 210 tumor-tumor pairwise correlation coefficients. The "effect" of a gene on a pairwise correlation was defined as the difference in the correlation coefficient with and without data for the gene included. Higher "within-between" scores indicated that the gene tended to group paired samples together. The 496 genes with a score one standard deviation above the mean score were selected and defined as the "intrinsic" gene subset. To confirm the existence of an "intrinsic" set of genes and to verify that the "within-between" score identified these genes, we examined the predictive quality of the score using a type of "leave-one-out" cross-validation analysis. The entire analysis was repeated 22 times, each time with one of the 22 matched pairs completely removed from the analysis. If an "intrinsic" set of genes exists, and if the "within-between" score successfully identifies these genes, we would expect the genes with high scores in each reduced dataset to produce relatively high correlations in the excluded pair. Indeed, when the genes were sorted based on their "within-between" score in each reduced dataset, the correlation coefficient of the excluded matched pair in sliding windows of 250 genes increased progressively with increasing "within-between" score for nearly all of the matched pairs, while no such increase was found when randomly matched pairs were used.
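The "within-between" score defined above can be sketched as follows. This is an illustrative reading of the definition, not the authors' code: it assumes a complete table with no missing values, treats every non-matched sample pair as a "between" pair, and uses the population standard deviation for the selection cutoff. The function names (pearson, within_between_scores, select_intrinsic) are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def within_between_scores(data, matched):
    """Score each gene by how much it pulls matched samples together.

    data[g][s] is the expression of gene g in sample s;
    matched is a set of frozenset({s1, s2}) naming the matched sample pairs.
    """
    n_genes, n_samples = len(data), len(data[0])
    pairs = list(combinations(range(n_samples), 2))

    def corr(a, b, skip=None):
        # Correlation between samples a and b, optionally leaving one gene out.
        keep = [g for g in range(n_genes) if g != skip]
        return pearson([data[g][a] for g in keep], [data[g][b] for g in keep])

    full = {p: corr(*p) for p in pairs}
    scores = []
    for g in range(n_genes):
        within, between = [], []
        for a, b in pairs:
            # "Effect" = correlation with the gene minus correlation without it.
            effect = full[(a, b)] - corr(a, b, skip=g)
            (within if frozenset((a, b)) in matched else between).append(effect)
        scores.append(mean(within) - mean(between))
    return scores


def select_intrinsic(scores):
    """Indices of genes scoring more than one standard deviation above the mean."""
    cutoff = mean(scores) + pstdev(scores)
    return [g for g, s in enumerate(scores) if s > cutoff]
```

A leave-one-out check of the kind described above would recompute these scores with one matched pair removed from the data, then examine the correlation of the excluded pair over windows of genes sorted by score.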
Supplementary Information Methods References
1. Ross, D. T. et al. Systematic variation in gene expression patterns in human cancer cell lines. Nat Genet 24, 227-235 (2000).
2. Alizadeh, A. A. et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403, 503-511 (2000).
3. Perou, C. M. et al. Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc Natl Acad Sci U S A 96, 9212-9217 (1999).
4. Aas, T. et al. Specific P53 mutations are associated with de novo resistance to doxorubicin in breast cancer patients. Nat Med 2, 811-814 (1996).
5. Hayward, J. L. et al. Assessment of response to therapy in advanced breast cancer. Br J Cancer 35, 292-298 (1977).
6. Robbins, P. et al. Histological grading of breast carcinomas: a study of interobserver agreement. Hum Pathol 26, 873-879 (1995).
7. Bindl, J. M. & Warnke, R. A. Advantages of detecting monoclonal antibody binding to tissue sections with biotin and avidin reagents in Coplin jars. Am J Clin Pathol 85, 490-493 (1986).