Material and Methods 
cDNA clones and microarray production
The sets of 22 654 and 42 611 human cDNA clones used in this study were obtained from Research Genetics (Huntsville, AL, USA) (http://www.resgen.com/). The cDNA microarrays were made as previously described [1,2]. Detailed protocols are available at The Old Microarray Homepage and The Brown Lab's MGuide.

Common Reference Sample

Each of the 46 experimental samples tested here was analyzed by comparative hybridization, using a common "reference" mRNA pool as a standard. This reference sample was composed of equal amounts of mRNA isolated from 11 established human cell lines (MCF7, Hs578T, OVCAR3, HepG2, NTERA2, MOLT4, RPMI-8226, NB4+ATRA, UACC-62, SW872, and Colo205; see Common Reference Cell Line List for more details). The 11 cell lines were all grown to 70-90% confluence in RPMI medium containing 10% fetal calf serum and penicillin/streptomycin. The cells were harvested by either scraping or centrifugation and quickly resuspended in RNA lysis buffer, and mRNA was prepared as described in Perou et al. [3]. For each cell line, multiple individual mRNA preparations were collected, pooled, and checked by Northern analysis before final mixing to ensure the quality of the input mRNAs. The 11 mRNA samples were then mixed together in equal amounts, aliquoted in 10 mM Tris (pH 7.4), and stored at -80 °C until use. For each microarray hybridization, 2 micrograms of the common reference sample was used, and it was always labeled with Cy3.

Specimens and RNA isolation
Frozen tissue samples were archived from soft tissue tumors resected at the Vancouver Hospital & Health Sciences Centre, the Stanford University Medical Center, and the Hospital of the University of Pennsylvania in the period 1993-2000. A total of 41 specimens were used for this study, including 8 gastrointestinal stromal tumors (GIST), 8 monophasic synovial sarcomas (SS), 4 liposarcomas (1 dedifferentiated (STT563), 1 myxoid (STT419), 2 pleomorphic), 11 leiomyosarcomas (including one primary and metastatic pair), 8 malignant fibrous histiocytomas (MFH), and 2 benign peripheral nerve sheath tumors (Schwannoma). The clinical features of these tumors are shown in Supplemental Data Table 1. A frozen section was cut from each specimen prior to RNA isolation to confirm that the archived material was representative of the case. Frozen tissue specimens were anonymized and assigned an experimental code. Tissue was homogenized in Trizol reagent (GibcoBRL) and total RNA was prepared as described [3]; mRNA was then isolated using the FastTrack 2.0 method following the manufacturer’s protocol (see Chuck Perou's Tumor mRNA Isolation Protocol for the detailed protocol).

mRNA Labeling and hybridization to spotted cDNA microarrays
Preparation of Cy3 (green fluorescent) labeled cDNA from reference mRNA and Cy5 (red fluorescent) labeled cDNA from each tumor specimen mRNA, hybridization to 22 000 and 42 000 (22K and 42K) spotted cDNA microarrays, and subsequent analysis were performed as described [4]. Halfway through this experiment, a new 42K gene array type replaced the old 22K gene array type, expanding the total number of genes used from 22 654 to 42 611. For this reason, subsequent cases were analyzed on the larger arrays. The reference mRNA was isolated from a pool of 11 cell lines, identical to that described previously [4]. Both arrays were prepared as described [4], with detailed protocols available at The Brown Lab and http://genome-www.stanford.edu/molecularportraits/. Five specimens for which adequate amounts of mRNA were available were analyzed on both 22K (A specimens) and 42K (B specimens) gene arrays. This allowed us to use singular value decomposition (SVD) to identify and correct for the bias introduced by the different array types.

Data Analysis
(See Supplemental Figures 1-5b.) The levels of Cy3 and Cy5 fluorescence for each gene spot on the hybridized arrays were obtained with a GenePix 4000 scanner (Axon Instruments) and analyzed with GenePix 3.0 software (Axon Instruments). The primary data tables and the image files are stored in the Stanford Microarray Database. Fluorescent ratios were entered in the database for analysis. Uninterpretable spots were manually flagged and excluded. From the remaining spots, we selected only those with well-measured data in at least 80% of the 46 arrays and with a fluorescence ratio at least 3-fold greater than the geometric mean ratio in the specimens examined in at least two arrays. A further selection criterion was that each spot should have a ratio of signal over background greater than 1.4 in either the green or the red channel. In this manner, 7425 array elements were identified. Hierarchical clustering was then performed as described [5].
The expression pattern of the tumor set was measured using two different types of slide arrays, one with 22K genes and the other with 42K genes; the latter contained almost the entire gene set represented on the 22K slide plus approximately 20 000 additional cDNAs, for a total of 42 611 spots (Supplemental Figure 1). To enlarge the total data set, and thereby increase the number of tumors in any single group, the two array sets were combined. For this new combined data set, we included only those genes present on both the 22K and 42K arrays. The combined dataset yielded a clustering of the major diagnostic groups similar to that observed when either of the two datasets was analyzed separately (Supplemental Figures 1,2). However, in the combined dataset an influence of the type of array used (22K vs. 42K) on the clustering of the tumors was evident (Supplemental Figure 2). We performed singular value decomposition (SVD) in order to correct for this artifact (Supplemental Figure 3).
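To make the spot selection criteria described earlier in this section concrete, the following is a minimal sketch in Python with NumPy. It is not the Stanford Microarray Database filter itself: the function name `select_spots`, its parameters, and the convention that flagged spots are stored as NaN are all assumptions made for illustration. The signal-over-background criterion is omitted because it requires raw channel intensities, and the `two_sided` switch is an added option, since the text above only specifies ratios greater than the geometric mean.

```python
import numpy as np

def select_spots(ratios, min_present=0.8, fold=3.0, min_arrays=2, two_sided=False):
    """Return a boolean mask over spots (rows) that pass the selection.

    ratios: spots x arrays matrix of Cy5/Cy3 fluorescence ratios, with NaN
            marking flagged (uninterpretable) measurements (assumed encoding).
    """
    ratios = np.asarray(ratios, dtype=float)
    log_r = np.log2(ratios)

    # Fraction of arrays with a usable measurement for each spot.
    frac_present = (~np.isnan(log_r)).mean(axis=1)

    # The mean of log ratios is the log of the geometric mean ratio.
    center = np.nanmean(log_r, axis=1, keepdims=True)
    deviation = log_r - center
    if two_sided:
        # Optional variant: count changes in either direction.
        deviation = np.abs(deviation)

    # Count arrays in which the spot deviates by at least `fold`;
    # comparisons against NaN evaluate to False, so missing data never count.
    with np.errstate(invalid="ignore"):
        n_variant = np.sum(deviation >= np.log2(fold), axis=1)

    return (frac_present >= min_present) & (n_variant >= min_arrays)
```

Calling `select_spots(ratios)` on a spots-by-arrays matrix returns a mask that can be used to subset the rows before clustering. Singular value decomposition, introduced above, operates on a filtered matrix of this kind.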
SVD has previously been used to detect and correct artifacts in time course experiments [6] and has been applied in many other fields of research to filter noise from signal [7-9]. SVD determines unique, dominant, orthogonal (i.e. uncorrelated) gene and corresponding array expression patterns (“eigengenes” and “eigenarrays,” respectively) that can be associated with some of the independent pathways and corresponding cellular states that make up the similarities and differences among the distinct soft tissue tumor (STT) groups. A single “eigengene” was identified that correlated almost perfectly with the 22K versus 42K array bias (Supplemental Figure 4). The influence of this “eigengene” and corresponding “eigenarray” was subtracted from all data. This new data set was reselected for gene expression levels as described above, and hierarchical clustering was performed (Supplemental Figure 5). Subsequently, the final data set was again analyzed by SVD (Figures 2,3). A more detailed explanation of the methods, including SVD, is provided in the supplemental information section on this website (Supplemental Information). In addition to hierarchical clustering and SVD analysis, we used a supervised analytical method, SAM (Significance Analysis of Microarrays), to search for differentially expressed genes among different sarcoma diagnoses [10].
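The eigengene subtraction described in this section can be illustrated with a short NumPy sketch. It assumes a genes-by-arrays matrix of log ratios with no missing values (how missing values were handled is not stated here) and a 0/1 label per array for 22K versus 42K. The function name `remove_array_type_eigengene` and the tolerance parameter are hypothetical, and the code shows the general idea rather than the exact procedure used for this study.

```python
import numpy as np

def remove_array_type_eigengene(data, array_type, rel_tol=1e-10):
    """Subtract the SVD component that best tracks the array type.

    data:       genes x arrays matrix (assumed complete, e.g. log2 ratios).
    array_type: one label per array, e.g. 0 for 22K and 1 for 42K.
    Returns (corrected matrix, index of removed component, its correlation).
    """
    data = np.asarray(data, dtype=float)
    label = np.asarray(array_type, dtype=float)

    # data = u @ diag(s) @ vt; rows of vt are eigengenes (patterns across
    # arrays) and columns of u are the corresponding eigenarrays.
    u, s, vt = np.linalg.svd(data, full_matrices=False)

    # Correlate each eigengene with the array-type label, skipping
    # components whose singular value is numerically zero.
    corr = np.zeros(len(s))
    with np.errstate(invalid="ignore", divide="ignore"):
        for i in range(len(s)):
            if s[i] > rel_tol * s[0]:
                corr[i] = np.corrcoef(vt[i], label)[0, 1]
    corr = np.nan_to_num(corr)

    # Remove the rank-one contribution of the best-matching component.
    k = int(np.argmax(np.abs(corr)))
    corrected = data - s[k] * np.outer(u[:, k], vt[k])
    return corrected, k, corr[k]
```

In practice one would inspect the returned correlation before accepting the correction, since subtracting a component that only weakly tracks the array type would also remove biological signal.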

Material & Methods References
  1. Ross, D. T. et al. Systematic variation in gene expression patterns in human cancer cell lines. Nat Genet 24, 227-235 (2000).
  2. Alizadeh, A. A. et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403, 503-511 (2000).
  3. Perou, C. M. et al. Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc Natl Acad Sci USA 96, 9212-9217 (1999).
  4. Perou, C. M. et al. Molecular portraits of human breast tumours. Nature 406, 747-752 (2000).
  5. Eisen, M. B. et al. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95, 14863-14868 (1998).
  6. Alter, O. et al. Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci USA 97, 10101-10106 (2000).
  7. Swinnen, A. et al. Detection and multichannel SVD-based filtering of trigeminal somatosensory evoked potentials. Med Biol Eng Comput 38, 297-305 (2000).
  8. Zabel, M. et al. Analysis of 12-lead T-wave morphology for risk stratification after myocardial infarction. Circulation 102, 1252-1257 (2000).
  9. Calamante, F. et al. Delay and dispersion effects in dynamic susceptibility contrast MRI: simulations using singular value decomposition. Magn Reson Med 44, 466-473 (2000).
  10. Tusher, V. G. et al. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 98, 5116-5121 (2001).