Figure 4. Hox genes and topographic differentiation.
(A) Hierarchical clustering of fibroblast cultures based solely on the expression of genes encoding homeodomain proteins reproduces the clustering by site of origin. Of the 88 homeodomain-containing genes on the array, 51 were considered well measured, defined as a reference-channel intensity more than 1.5-fold over background and at least 80% informative data. Hierarchical clustering was performed with these 51 genes, and the result is displayed in the same format as in Fig. 1; the scale is the same as in Fig. 1C.
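
The "well measured" criterion in panel A can be read as a per-gene filter. The sketch below is one possible reading, not the authors' code: it assumes that a single measurement is informative when its log ratio is present (not NaN) and its reference-channel intensity exceeds 1.5 times background, and that a gene is kept when at least 80% of its arrays are informative. The function names, gene names, and toy values are hypothetical.

```python
import math

# Thresholds taken from the caption's wording; how they combine is an assumption.
MIN_FOLD_OVER_BACKGROUND = 1.5
MIN_INFORMATIVE_FRACTION = 0.8


def is_informative(ref_intensity, background, log_ratio):
    """Assumed rule: a spot is informative if its log ratio is present and
    the reference channel is more than 1.5-fold above background."""
    if math.isnan(log_ratio) or background <= 0:
        return False
    return ref_intensity / background > MIN_FOLD_OVER_BACKGROUND


def is_well_measured(ref_intensities, backgrounds, log_ratios):
    """Keep a gene if at least 80% of its arrays are informative."""
    flags = [is_informative(r, b, x)
             for r, b, x in zip(ref_intensities, backgrounds, log_ratios)]
    return bool(flags) and sum(flags) / len(flags) >= MIN_INFORMATIVE_FRACTION


# Toy data: two hypothetical genes measured on five arrays.
genes = {
    "gene_a": ([300, 320, 310, 290, 305], [100] * 5, [1.2, 0.8, 1.1, 0.9, 1.0]),
    "gene_b": ([120, 130, 500, 510, 520], [100] * 5,
               [0.1, 0.2, float("nan"), 0.3, 0.4]),
}
kept = [name for name, data in genes.items() if is_well_measured(*data)]
print(kept)  # ['gene_a']
```

Here gene_b fails because only two of its five arrays pass both conditions (40% < 80%). The genes that survive such a filter would then be passed to the hierarchical clustering step.
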
(B) Statistical significance of topographic clustering by homeobox genes. The 51 homeobox genes identified above were clustered using PAM (partitioning around medoids) with k = 6 clusters across the 45 arrays (see Methods). The sites of origin of the fibroblast samples (abdominal skin, arm, fetal buttock thigh, fetal lung, foreskin, toe and gum) were taken as the reference grouping of 6 clusters. The similarity score comparing the PAM clustering with the known sites of origin is 36 out of a maximum of 45. To assess the statistical significance of this score, 5000 sets of 51 genes, drawn at random from a data set of 19,081 genes filtered as in Fig. 4A, were subjected to the same analysis; the histogram of their similarity scores is shown. The median of the 5000 similarity scores (21 out of 45) is shown in blue. None of the 5000 trials achieved a score of 36; thus the empirical p-value is 0/5000 (p < 0.0002).
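
The caption defers the definition of the similarity score to Methods, so the sketch below assumes one plausible definition: the number of samples whose cluster is matched to their own site of origin under the best one-to-one mapping of clusters onto sites (its maximum is the number of arrays, consistent with "out of 45"). PAM itself is not implemented; `score_fn` is a hypothetical callback standing in for "re-cluster on these genes with PAM (k = 6) and score the result". The p-value is the fraction of null scores at least as large as the observed one, which gives 0/5000 when no trial reaches 36.

```python
import itertools
import random
from collections import Counter


def similarity_score(cluster_labels, reference_labels):
    """Assumed score: samples whose cluster maps to their reference group
    under the best one-to-one cluster-to-group mapping."""
    counts = Counter(zip(cluster_labels, reference_labels))
    clusters = sorted(set(cluster_labels))
    groups = sorted(set(reference_labels))
    # Brute-force every injective mapping; cheap for six clusters (720 cases).
    if len(clusters) <= len(groups):
        pairings = (zip(clusters, perm)
                    for perm in itertools.permutations(groups, len(clusters)))
    else:
        pairings = (zip(perm, groups)
                    for perm in itertools.permutations(clusters, len(groups)))
    return max(sum(counts[pair] for pair in pairing) for pairing in pairings)


def permutation_null(gene_pool, n_genes, n_trials, score_fn, seed=0):
    """Score n_trials random gene sets of size n_genes, drawn without
    replacement from gene_pool; score_fn is a hypothetical callback."""
    rng = random.Random(seed)
    pool = list(gene_pool)
    return [score_fn(rng.sample(pool, n_genes)) for _ in range(n_trials)]


def empirical_p_value(observed, null_scores):
    """Fraction of null scores at least as large as the observed score."""
    hits = sum(1 for s in null_scores if s >= observed)
    return hits / len(null_scores)


# Toy check of the score: the best mapping is 1->a, 2->b, 3->c.
reference = ["a", "a", "b", "b", "c", "c"]
clusters = [1, 1, 2, 2, 2, 3]
print(similarity_score(clusters, reference))  # 5 (maximum 6)

# Toy null: a stand-in score_fn that counts genes drawn from a "signal" set.
null = permutation_null(range(100), 5, 200,
                        lambda genes: sum(g < 10 for g in genes))
print(empirical_p_value(5, null))
```

A common, more conservative alternative to hits / n is (hits + 1) / (n + 1), which for zero hits in 5000 trials gives 1/5001, close to the p < 0.0002 bound.
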
(C) Robustness of topographic clustering. The analysis in Fig. 4B was repeated for 500 random subsets of 10, 20, 30, 40, or 50 homeobox genes. The distribution of the similarity scores is summarized using boxplots. The central box in each plot represents the interquartile range (IQR), defined as the difference between the 75th and 25th percentiles; the line in the middle of the box marks the median. Extreme values more than 1.5 IQR above the 75th percentile or more than 1.5 IQR below the 25th percentile were plotted individually. Site identity was reasonably well recovered with as few as 10 homeobox genes, a better result than that obtained with random subsets of 51 genes (compare with the median score of 21 in Fig. 4B).
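
The boxplot convention in panel C (box spanning the 25th to 75th percentiles, a median line, and points beyond 1.5 IQR drawn individually) can be computed per subset size as below. This is a minimal sketch: the percentile rule (linear interpolation via `statistics.quantiles` with `method="inclusive"`) and the whisker convention (whiskers end at the most extreme non-outlier scores, as in Tukey's boxplot) are assumptions, since plotting packages differ slightly. The function name and the scores are made up for illustration.

```python
import statistics


def boxplot_summary(scores, k=1.5):
    """Tukey-style boxplot statistics for one subset size."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [s for s in scores if low_fence <= s <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers reach the most extreme scores still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Points beyond either fence are drawn individually.
        "outliers": [s for s in scores if s < low_fence or s > high_fence],
    }


# Made-up similarity scores for illustration only.
summary = boxplot_summary([20, 22, 23, 24, 25, 26, 27, 28, 40])
print(summary["iqr"], summary["outliers"])  # 4.0 [40]
```

With these toy scores the fences are 17 and 33, so only the score of 40 is plotted as an individual point.
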