Molecular Portraits of Soft Tissue Tumors  
GenExplore the total dataset of 46 arrays x 5520 genes  
Materials and Methods 
Description of Materials and Methods  
Figures and Tables  
Figures, Supplemental Web Figures, and Tables  
Supplemental Information 
Supplemental information on data analysis  
Download complete dataset  
H&E sections from the tumors  

Figures and Tables 
Figure 1.
Representative histology of specimens used for this study, including: gastrointestinal stromal tumor, synovial sarcoma, liposarcoma, leiomyosarcoma, malignant fibrous histiocytoma, and schwannoma. Histologic sections of all specimens used can be viewed on the accompanying webpage.
Figure 2.
A: Complete clustergram of the 46 soft tissue tumor specimens. Each row in the cluster represents the relative level of expression for a gene, centered at the geometric mean of its expression level among the 46 samples, and displayed using red (relatively high expression) and green (relatively low expression) coloration. Tumor specimens are arranged in columns. The dendrogram of the tumor clustering is displayed above and describes the degree of relatedness between tumor samples, with short branches denoting a high degree of similarity. The three most significant eigengenes and eigenarrays are aligned with the clustergram on the bottom and along the right side, respectively. Eigengene A correlates with the separation of the synovial sarcomas and GISTs from the remaining specimens, with a negative value corresponding to a diagnosis of either GIST or synovial sarcoma. Eigenarray A shows the genes that contribute to this distinction. Comparison with the clustergram shows that these genes fall into gene clusters that are specific for synovial sarcoma and/or GIST specimens. Likewise, eigengene B separates synovial sarcomas (positive value) from GIST specimens (negative value), with values for this eigengene around zero in the remaining specimens. Eigenarray B shows almost perfect correlation with the genes found in the synovial sarcoma and GIST clusters. Finally, eigengene C shows a near-perfect correlation with the subset of leiomyosarcomas that express a muscle gene cluster, including calponin. B: An essentially similar pattern of gene expression is obtained when the 22K and 42K datasets are centered separately and then combined.
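The centering step mentioned in the caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes expression is stored as positive ratios, that log base 2 is used, and that missing spots are marked with None. The function name is hypothetical.

```python
import math

def center_on_geometric_mean(ratios):
    """Center one gene's log2 ratios at the geometric mean across samples.

    `ratios` holds positive expression ratios for a single gene;
    None marks a missing measurement (an assumption of this sketch).
    """
    logs = [math.log2(r) for r in ratios if r is not None]
    mean_log = sum(logs) / len(logs)  # log2 of the geometric mean
    return [None if r is None else math.log2(r) - mean_log for r in ratios]

# Toy row: ratios 4, 1 and 0.25 have a geometric mean of 1
centered = center_on_geometric_mean([4.0, 1.0, 0.25, None])
```

Subtracting the mean of the logs is the same as dividing each ratio by the geometric mean, so a centered value of zero corresponds to black in a red/green display.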
Figure 3.
Representative portions of the tumor-specific gene clusters. The spectrum of green to red spots represents the relative centered expression for each gene (sidebar shows fold difference from mean); selected gene names are shown on the right. The branches of the array dendrogram are numbered from 1 to 5 as indicated in the text. The correlation coefficient bar shown to the right of the dendrogram indicates the degree of relatedness between branches of the dendrogram. Panel a: synovial sarcoma gene cluster (DACH: dachshund, EGFR: epidermal growth factor receptor, CRABP1: cellular retinoic acid binding protein-1, TGFB2: transforming growth factor β2, ENC1: ectodermal-neural cortex-1, NSP: neuron-specific protein Hs.79404, BMP2: bone morphogenetic protein 2, MSX2: msh homeo box homolog-2, SSX4: synovial sarcoma X breakpoint-4, SSX3: synovial sarcoma X breakpoint-3, FOXC1: forkhead box C1, BMP7: bone morphogenetic protein-7, RARG: retinoic acid receptor γ). Panel b: muscle gene cluster (ACTG2: actin γ2 smooth muscle enteric, MYH11: myosin heavy polypeptide 11 smooth muscle, MYPT2: myosin phosphatase target subunit-2, MYLK: myosin light polypeptide kinase, LMOD1: leiomodin-1 smooth muscle, ACTA2: actin α2 smooth muscle aorta, MYRL2: myosin regulatory light chain-2, SGCA: sarcoglycan α, SLAP: sarcolemmal-associated protein). Panel c: gastrointestinal stromal tumor gene cluster (SPRY1: sprouty homolog-1, CEP2: cdc42 effector protein-2, GUCY1A3: guanylate cyclase 1 α3, MYO6: myosin VI, ABCC4: ATP-binding cassette C4, PCAF: p300/CBP associated factor, prot kinase C: protein kinase C θ, kit: c-kit/CD117, SPRY4: sprouty homolog 4, INPP5a: inositol polyphosphate-5-phosphatase, PTP4A3: protein tyrosine phosphatase type 4A 3, ABCB1: ATP-binding cassette B1, DNCI1: dynein cytoplasmic intermediate peptide 1).
Supplemental Figure 1.
Hierarchical clustering dendrograms of the initial tumor sets hybridized to 22K and 42K arrays: A) 22K slide tumor set and B) 42K slide tumor set.
Supplemental Figure 2.
Hierarchical clustering dendrogram of the combined 22K and 42K arrays, before singular value decomposition. Tumor samples run on both array types are identified by A for 22K arrays and B for 42K arrays. For each experiment, the type of array is also noted.
Supplemental Figure 3.
3A. SVD analysis of the combined dataset of both 22K and 42K arrays. Raster display of the expression data, with overexpression (red), no change in expression (black), and underexpression (green) around the geometric mean of relative expression, showing linear transformation of the data from the 7425-genes x 46-arrays space to the reduced diagonalized 46-eigenarrays x 46-eigengenes space using the 7425-genes x 46-eigenarrays and 46-eigengenes x 46-arrays basis sets. 3B. Eigenarrays of the combined dataset of 22K and 42K arrays. (a) Complete clustergram of the 46 specimens. (b) Eigenarray expression in all 7425 genes. At least the top 4 significant eigenarrays, corresponding to the top 4 significant eigengenes, display some order when the genes are arranged in clustergram order. 3C. Eigengenes of the combined dataset of 22K and 42K arrays. (a) Raster display of the expression of 46 eigengenes in 46 arrays, with overexpression (red), no change in expression (black), and underexpression (green) around the geometric mean of the relative expression. (b) Bar chart of the probability of eigenexpression of each eigengene, showing about 16% of the overall relative expression in the most significant eigengene, which can be associated with the array-type bias, and about 14%, 10% and 6% of overall relative expression in the next 3 most significant eigengenes, which can be associated with the separation of synovial sarcomas and GIST from the remaining specimens, the separation of synovial sarcomas from the GISTs, and the separation of the subset of leiomyosarcomas that expresses a muscle gene cluster from the rest of the specimens, respectively.
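The percentages quoted for panel 3C(b) can be illustrated with a short sketch. It assumes that the probability of eigenexpression of an eigengene is its squared singular value divided by the sum of all squared singular values, which is the usual convention in SVD analysis of expression data. The function name and the singular values below are made up for illustration and are not taken from the study.

```python
def eigenexpression_fractions(singular_values):
    """Fraction of overall expression captured by each eigengene.

    Assumes the fraction is the squared singular value normalized
    by the sum of all squared singular values.
    """
    total = sum(s * s for s in singular_values)
    return [s * s / total for s in singular_values]

# Illustrative singular values only; squared they sum to 30
fractions = eigenexpression_fractions([4.0, 3.0, 2.0, 1.0])
```

In practice the singular values would come from a decomposition routine such as `numpy.linalg.svd` applied to the centered genes x arrays matrix.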
Supplemental Figure 4.
Combined clustergram (panel a), eigengene (panels b and c), and eigenarray (panel d) specific for 22K/42K array bias.
Supplemental Figure 5a.
Hierarchical clustering dendrogram of the combined 22K and 42K arrays, after subtraction of the eigengene (and corresponding eigenarray) that is associated with the 22K/42K array bias, and after repeating the gene selection procedure (see Materials and Methods).
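The subtraction of a bias eigengene can be sketched as a projection. This is a hedged illustration rather than the procedure used for the figure: it assumes the eigengene is a unit-length pattern across arrays, and the function name and toy data are hypothetical. When the eigengene is a right singular vector of the data matrix, removing its projection from every gene is equivalent to subtracting the corresponding rank-one term (singular value x eigenarray x eigengene).

```python
def remove_eigengene(matrix, eigengene):
    """Project a unit-length eigengene out of every gene (row) of `matrix`."""
    filtered = []
    for row in matrix:
        # Weight of the eigengene pattern in this gene's expression
        weight = sum(x * v for x, v in zip(row, eigengene))
        filtered.append([x - weight * v for x, v in zip(row, eigengene)])
    return filtered

# Hand-picked bias pattern: arrays 0-1 versus arrays 2-3 (unit length)
bias = [0.5, 0.5, -0.5, -0.5]
data = [
    [1.0, 1.0, -1.0, -1.0],  # gene driven entirely by the bias pattern
    [2.0, 0.0, 1.0, -1.0],   # gene with bias plus other signal
]
cleaned = remove_eigengene(data, bias)
```

After filtering, every row is orthogonal to the bias pattern, so a gene explained only by array type becomes flat, while other genes keep the remainder of their signal.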
Supplemental Figure 5b.
Comparison of the clustergrams from the cluster analysis of the initial combined data set and the subsequent data set that has undergone subtraction of the array-type bias followed by reselection of genes: initial combined data set before SVD (7425 genes), and data set after slide bias subtraction and reselection (5520 genes). The sidebars indicate the areas that encompass the gene sets unique for the GI stromal tumors (green), the synovial sarcomas (blue), and the calponin-positive subset of leiomyosarcomas (red).
Supplemental Figure 6.
Magnified section of the hierarchical cluster of 125 genes, including kit (CD117), that correlated with GIST. The first 2 columns show the negative projection rank order and the correlation rank order for each of the 125 genes of eigengene 2. The third column shows the rank order for each gene after SAM analysis that identified those genes responsible for separating GIST from all other tumors. GISTs are highlighted in green on the array dendrogram. The rank order of the genes in Web Figure 6 can be correlated with those reported in Web Tables 2 and 3. Please note that in those tables only named genes are included and duplicate genes were removed; hence there is no perfect correlation between the rank order numbers in Web Figure 6 and those in Web Tables 2 and 3. (See also Web Table 4.)
Web Table 1.
Clinical features of tumors.
Web Table 2.

SVD sorting of genes by projection and correlation with eigengenes.
SVD defines the expression pattern of each gene to be a superposition, i.e., a weighted sum, of the expression patterns of all eigengenes. The projection of a gene onto an eigengene is the amplitude, i.e., the weight, of this eigengene pattern in the expression of the given gene. The projection, therefore, measures the variation in expression of the gene along the direction defined by the eigengene. The correlation of a gene with an eigengene is the ratio between the corresponding projection and the overall amplitude of the expression pattern of the gene. The correlation, therefore, measures the similarity (or distance) between the expression pattern of the gene and that of the eigengene, and is independent of the overall amplitude of the expression pattern of the gene.
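The two sorting measures described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the eigengene is taken to be normalized to unit length, and the "overall amplitude" of a gene is taken to be the Euclidean norm of its expression pattern. The function names and example vectors are hypothetical.

```python
import math

def projection(gene, eigengene):
    """Weight of a unit-length eigengene in a gene's expression pattern."""
    return sum(g * v for g, v in zip(gene, eigengene))

def correlation(gene, eigengene):
    """Projection divided by the gene's overall amplitude (Euclidean norm)."""
    amplitude = math.sqrt(sum(g * g for g in gene))
    return projection(gene, eigengene) / amplitude if amplitude else 0.0

eigengene = [0.5, 0.5, -0.5, -0.5]   # unit-length pattern across 4 arrays
weak = [0.2, 0.2, -0.2, -0.2]        # same shape, small amplitude
strong = [2.0, 2.0, -2.0, -2.0]      # same shape, large amplitude
```

Here `projection(strong, eigengene)` is ten times `projection(weak, eigengene)`, while both genes have a correlation of 1 with the eigengene. This is why sorting by projection favors genes with large expression changes, whereas sorting by correlation favors genes whose pattern matches the eigengene regardless of amplitude.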
Web Table 3.
Significance Analysis of Microarrays (SAM) genes.
Web Table 4.
Comparison of classification of genes by hierarchical clustering, SVD and SAM.
Web Table 5.
List of misplaced genes in raw dataset and final dataset, obtained after SVD.

Web site last modified on March 12th, 2002