Figure 4. Genome-wide influence of DNA copy number alterations on mRNA levels. (a) For breast cancer cell lines (gray) and tumor samples (black), mean-centered mRNA fluorescence ratios (log2 scale) are summarized for each of five classes of genes, defined by tumor/normal DNA copy number ratio: deletion (<0.8), no change (0.8-1.2), low-level amplification (1.2-2), medium-level amplification (2-4), and high-level amplification (>4). Box plots indicate the 25th, 50th, and 75th percentiles. Diamonds indicate means, and vertical error bars indicate the standard error of the mean. P values for pairwise Student's t-tests comparing the means of adjacent classes (moving left to right) are 4 × 10⁻⁴⁹, 1 × 10⁻⁴⁹, 5 × 10⁻⁵, and 1 × 10⁻² (cell lines), and 1 × 10⁻⁴³, 1 × 10⁻²¹⁴, 5 × 10⁻⁴¹, and 1 × 10⁻⁴ (tumors). (b) Distribution of correlations between DNA copy number and mRNA levels for 6,095 human genes across 37 breast tumor samples. (c) Observed vs. expected correlation coefficients. Expected values were obtained by randomizing the sample labels in the DNA copy number data set. The line of unity is indicated. (d) Percent variance in gene expression among tumors that is directly explained by variation in gene copy number. Percent variance explained (black line) and fraction of data retained (gray line) are plotted against the fluorescence intensity/background cutoff (a rough surrogate for signal-to-noise). Fraction of data retained is relative to the 1.2 intensity/background cutoff. Details of the linear regression model used to estimate the fraction of gene-expression variation attributable to underlying DNA copy number alteration can be found in the Web supplementary information.
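Panel (a) groups observations into five copy-number classes and compares the means of adjacent classes. The Python sketch below illustrates that grouping and a pooled-variance (Student's) t statistic using only the standard library. It assumes that each observation is one (tumor/normal copy-number ratio, mean-centered log2 mRNA ratio) pair for a gene in a sample, and that each class's lower bound is inclusive (the caption does not specify boundary handling). All function names are hypothetical and are not taken from the original analysis.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

# Class boundaries on the tumor/normal DNA copy-number ratio, as given in the caption.
# Assumption: each lower bound is inclusive (e.g. a ratio of exactly 1.2 is "low gain").
CLASSES: List[Tuple[str, float, float]] = [
    ("deletion", 0.0, 0.8),
    ("no change", 0.8, 1.2),
    ("low gain", 1.2, 2.0),
    ("medium gain", 2.0, 4.0),
    ("high gain", 4.0, math.inf),
]


def classify(ratio: float) -> str:
    """Return the copy-number class label for one tumor/normal ratio."""
    for label, lo, hi in CLASSES:
        if lo <= ratio < hi:
            return label
    raise ValueError(f"ratio must be a non-negative number, got {ratio}")


def group_by_class(pairs: Sequence[Tuple[float, float]]) -> Dict[str, List[float]]:
    """Collect log2 mRNA ratios under the copy-number class of the same gene/sample."""
    groups: Dict[str, List[float]] = {label: [] for label, _, _ in CLASSES}
    for cn_ratio, log2_mrna in pairs:
        groups[classify(cn_ratio)].append(log2_mrna)
    return groups


def mean_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (requires at least two values)."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def student_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Student's t statistic with pooled variance; positive if mean(b) > mean(a)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(b) - statistics.mean(a)) / math.sqrt(pooled * (1 / na + 1 / nb))


def adjacent_t_stats(groups: Dict[str, List[float]]) -> List[Tuple[str, str, float]]:
    """t statistics for each pair of adjacent classes, moving left to right."""
    labels = [label for label, _, _ in CLASSES]
    return [(left, right, student_t(groups[left], groups[right]))
            for left, right in zip(labels, labels[1:])]
```

Converting each statistic to a two-sided P value uses the t distribution with n₁ + n₂ − 2 degrees of freedom. For example, `scipy.stats.ttest_ind(left, right)` (whose `equal_var` parameter defaults to `True`) returns the same magnitude with the opposite sign, because it subtracts the second sample's mean from the first, along with that P value.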
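Panels (b) and (c) rest on a per-gene correlation between copy number and expression across tumors, compared with a null obtained by randomizing sample labels in the copy-number data. The standard-library sketch below assumes that both data sets are stored as gene-by-tumor lists with matching row and column order, that one random permutation of tumor labels is applied to all genes at once, and (as is common for such plots, although the caption does not say so) that sorted expected values are paired with sorted observed values. The function names and the `seed` parameter are illustrative only.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # rows = genes, columns = tumor samples


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gene_correlations(dna: Matrix, rna: Matrix) -> List[float]:
    """One copy-number vs. expression correlation per gene, computed across tumors."""
    return [pearson(d, r) for d, r in zip(dna, rna)]


def permuted_correlations(dna: Matrix, rna: Matrix, seed: int = 0) -> List[float]:
    """Null correlations: shuffle tumor labels in the copy-number data only."""
    rng = random.Random(seed)
    perm = list(range(len(dna[0])))
    rng.shuffle(perm)  # the same permutation is applied to every gene
    shuffled_dna = [[row[j] for j in perm] for row in dna]
    return gene_correlations(shuffled_dna, rna)


def observed_vs_expected(dna: Matrix, rna: Matrix, seed: int = 0) -> List[Tuple[float, float]]:
    """(expected, observed) pairs, matched by rank, for an observed-vs-expected plot."""
    observed = sorted(gene_correlations(dna, rna))
    expected = sorted(permuted_correlations(dna, rna, seed))
    return list(zip(expected, observed))
```

Points falling on the line of unity would indicate no more association than chance. Repeating the shuffle with several seeds and averaging the sorted expected values would give a smoother null curve than the single permutation used here.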