Welcome to Webminer
The purpose of this page is to introduce you to Webminer's features and
some of the applications for which it may be useful. Below you will find a
basic overview and two tutorials. The first tutorial walks you through the
search that Webminer was originally developed for, a hunt for
mating-specific membrane proteins that play a role in cell fusion,
which led to the identification of PRM1 (Heiman and Walter, J. Cell Biol.
2000). The second tutorial explores some of the more advanced ways to
combine pattern matching and boolean searches, using cell-type-specific
gene expression as a model.
Gene the Miner, courtesy of Sarah C. Mutka
Overview
Webminer contains the results of several hundred experiments performed by
different labs for various purposes. Each experiment measures the fold
change in expression of every yeast ORF between an experimental condition
and a control condition. Thus, the 200 or so experiments in Webminer (at
the time I write this) contain over a million individual data points.
The experiments are grouped by biological category, such as "Mating" and
"Meiosis". Additionally, there is a broad category called "ORF
information" that contains information on every ORF other than expression
levels. For instance, it contains all ORF promoters, predicted protein
sequences, predicted molecular weights, whether the gene is contained
within a duplicated block in the genome, as well as many other searchable
features.
To use Webminer, you must first frame a question of the style "What genes
are expressed in cells grown under such-and-such conditions that also
contain such-and-such structural features?" On the Webminer home page,
use the pull-down menu to select the relevant categories, then click on
the appropriate experiment. A short description of it will appear in the
boxes on the right. Clicking on the links beneath them will take you to
the author's web site, the PubMed listing of the paper's abstract, and the
full text of the paper on-line, if available.
Click the red arrow to add experiments to your search list. When you are
done, click the blue arrow to proceed to the next page. Here, you will be
asked to set cut-offs to use in searching the data. Genes that are
expressed more highly in the experimental condition compared to the
control are said to be induced more than one-fold. Genes whose expression
decreases in the experiment are said to be induced less than one-fold.
Depending on the particular experimental regime, a very large induction
may be anywhere from 20-fold to 1,000-fold. Genes induced less than
0.01-fold (i.e., repressed more than 100-fold) are rounded to 0.01-fold.
If there is no information on an ORF from a given experiment, then it will
never be included in the search results. Therefore, the absence of an ORF
from the search results should not be over-interpreted.
In general, Webminer is intended for coarsely panning these millions of
data points for interesting patterns. You should always return to the
primary data and verify that the information in Webminer is correct and
that the experiment was done as you expect it was before you invest
heavily in studying a particular ORF.
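The fold-induction convention described above can be sketched in a few
lines of Python. This is only an illustration of the arithmetic, not
Webminer's own code: the function names, the signal values passed in, and
the use of None for missing data are all assumptions made for the example.

```python
from typing import Optional

# The text says inductions below 0.01-fold are rounded to 0.01-fold.
FLOOR = 0.01


def fold_change(experimental: float, control: float) -> float:
    """Hypothetical helper: experimental/control ratio, floored at 0.01-fold.

    A value above 1 means the gene is induced in the experiment;
    a value below 1 means its expression decreased.
    """
    ratio = experimental / control
    return max(ratio, FLOOR)


def passes(value: Optional[float], cutoff: float, direction: str) -> bool:
    """Hypothetical helper: apply one cut-off to one ORF.

    An ORF with no data (None) never passes, mirroring the statement that
    such ORFs are never included in the search results.
    """
    if value is None:
        return False
    if direction == "more":
        return value > cutoff
    if direction == "less":
        return value < cutoff
    raise ValueError("direction must be 'more' or 'less'")


if __name__ == "__main__":
    print(fold_change(30.0, 10.0))    # induced 3-fold
    print(fold_change(2.0, 10.0))     # repressed 5-fold, i.e. 0.2-fold
    print(fold_change(1.0, 1000.0))   # would be 0.001-fold, floored to 0.01
    print(passes(None, 3.0, "more"))  # missing data never matches
```

Note that a missing value fails both "more than" and "less than" searches,
which is why an ORF's absence from a result list says little on its own.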
Tutorial #1
As an example, try reproducing the data that Webminer was first used to
collect. We were interested in identifying genes involved in cell fusion
during yeast mating, so we searched for ORFs predicted to encode membrane
proteins that were specifically expressed during alpha-factor treatment.
Part one: Building a search list
- From the Webminer home page, select "Cell cycle" in the pull-down menu.
- Click on "alpha-factor release, 0 min". In this experiment, cells are
still arrested by the mating pheromone alpha-factor.
- If you like, try clicking on the links on the right to visit the
author's web site or the reference material for this experiment.
- Click the red arrow to add this experiment to your search list.
- Select "ORF Information" from the pull-down menu.
- Click on "Maximum hydrophobicity". This dataset contains an index for
each ORF that reflects its likelihood of encoding a membrane protein.
- Click the red arrow to add it to your search list.
- Click the blue arrow to proceed.
Part two: Setting cut-offs for each criterion
- For the first criterion, expression in the alpha-factor experiment, we
want to find genes induced by pheromone treatment. I found that searching
for genes induced more than 3-fold returns a nice set of candidates.
- For the second criterion, we want genes that are likely to encode
membrane proteins. Many membrane proteins have a hydrophobicity score,
using this index, of >28, but some have lower scores. To avoid being too
restrictive, ask for proteins with MaxH values above 25.
- Click the Submit button to continue.
Part three: The results page
- If everything went well, Webminer should tell you it found 2,560 ORFs
meeting 1 criterion (i.e., either >3-fold induced by pheromone or with a
MaxH score >25), and 20 ORFs matching both criteria.
- You will also see a five-column table, sorted by gene name. Note that
named genes beginning with the letter Y or Z will be mixed in at the
bottom with the unnamed genes.
- Clicking on the blue ORF names in the left-hand column will take you to
whatever information on that ORF was available in the YPD database. The
pathway and function columns contain short descriptions of the ORF taken
from the SGD database.
- You should see that all the ORFs are pheromone-induced, potential
transmembrane proteins, including ten known to be involved in mating.
Additionally, there are ten ORFs of unknown function that, based on the
other genes in the set, are good candidates for regulating a step in the
mating pathway.
- Lastly, you may notice the inclusion of FUS2 in the list. While FUS2 is
a mating-specific gene involved in cell fusion, it is not a transmembrane
protein. In fact, its MaxH value is 25.49, sneaking in just above the
cut-off we had demanded.
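The logic of this two-criterion search can be sketched as follows. This is
a minimal illustration and not how Webminer itself is implemented: the ORF
names, dataset keys, and all values are invented for the example (ORF_D
borrows the MaxH value of 25.49 mentioned above to show a borderline hit).
It also assumes that "meeting 1 criterion" means exactly one, which the
results page does not state explicitly.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]

# Invented toy data: ORF -> value in each dataset (None = no data).
toy_data: Dict[str, Record] = {
    "ORF_A": {"alpha_factor_0min": 12.0, "max_hydrophobicity": 31.2},
    "ORF_B": {"alpha_factor_0min": 4.5, "max_hydrophobicity": 18.0},
    "ORF_C": {"alpha_factor_0min": 0.8, "max_hydrophobicity": 29.4},
    "ORF_D": {"alpha_factor_0min": 6.0, "max_hydrophobicity": 25.49},
    "ORF_E": {"alpha_factor_0min": None, "max_hydrophobicity": 33.0},
    "ORF_F": {"alpha_factor_0min": 1.1, "max_hydrophobicity": 12.0},
}

# Cut-offs from the tutorial: more than 3-fold induced, MaxH above 25.
cutoffs: Dict[str, float] = {
    "alpha_factor_0min": 3.0,
    "max_hydrophobicity": 25.0,
}


def count_matches(record: Record, cutoffs: Dict[str, float]) -> Optional[int]:
    """Number of criteria met, or None if any dataset lacks data."""
    if any(record.get(name) is None for name in cutoffs):
        return None
    return sum(record[name] > cutoff for name, cutoff in cutoffs.items())


def search(
    data: Dict[str, Record], cutoffs: Dict[str, float]
) -> Tuple[List[str], List[str]]:
    """Return (ORFs meeting every criterion, ORFs meeting exactly one)."""
    both: List[str] = []
    one: List[str] = []
    for orf in sorted(data):
        n = count_matches(data[orf], cutoffs)
        if n is None:
            continue  # ORFs with missing data never appear in results
        if n == len(cutoffs):
            both.append(orf)
        elif n == 1:
            one.append(orf)
    return both, one


if __name__ == "__main__":
    both, one = search(toy_data, cutoffs)
    print("both criteria:", both)
    print("one criterion:", one)
```

In this toy set, ORF_D passes only because 25.49 is just above 25, the same
way FUS2 does in the real search; raising the cut-off to 28 would drop it.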
Tutorial #2
For a more advanced search, try identifying the sets of haploid-specific
genes (hsg), a-specific genes (asg), and alpha-specific genes (@sg) by
combining two datasets using the right search cut-offs.
- hsg are expressed in MATa AND MATalpha cells but NOT in MATa/alpha
- asg are expressed in MATa cells but NOT in MATalpha or MATa/alpha
- @sg are expressed in MATalpha cells but NOT in MATa or MATa/alpha
Use the "MATa/alpha vs. MATa" and "MATa/alpha vs. MATalpha" experiments
in the "Mating" class to perform the following searches. Remember that a
gene expressed more than 5-fold more highly in a haploid than in
MATa/alpha would appear as "induced less than 0.2-fold" in these datasets.
| Gene set | MATa/alpha vs. MATa | MATa/alpha vs. MATalpha | Example gene you should find |
| asg | less than 0.2-fold | more than 0.2-fold | STE2, the alpha-factor receptor |
| @sg | more than 0.2-fold | less than 0.2-fold | STE3, the a-factor receptor |
| hsg | less than 0.2-fold | less than 0.2-fold | STE4, a shared subunit of the pheromone signalling machinery |
Thanks to Ira Herskowitz for this example.
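The cut-off combinations in the table can be written out as a small
decision rule. The sketch below is only an illustration of that logic; the
function name, argument names, and the test values are hypothetical, and it
assumes both datasets report the MATa/alpha signal divided by the haploid
signal, so that a haploid-enriched gene shows a value below 1.

```python
from typing import Optional

CUTOFF = 0.2  # 1/5: the haploid signal is 5 times the MATa/alpha signal


def classify(
    vs_mata: float, vs_matalpha: float, cutoff: float = CUTOFF
) -> Optional[str]:
    """Assign a gene set from two fold-induction values.

    vs_mata     -- value in the "MATa/alpha vs. MATa" dataset
    vs_matalpha -- value in the "MATa/alpha vs. MATalpha" dataset
    Returns "asg", "@sg", "hsg", or None if no row of the table applies.
    """
    low_a = vs_mata < cutoff          # enriched in MATa cells
    low_alpha = vs_matalpha < cutoff  # enriched in MATalpha cells
    high_a = vs_mata > cutoff
    high_alpha = vs_matalpha > cutoff

    if low_a and low_alpha:
        return "hsg"
    if low_a and high_alpha:
        return "asg"
    if high_a and low_alpha:
        return "@sg"
    return None


if __name__ == "__main__":
    print(classify(0.05, 1.0))   # low only vs. MATa
    print(classify(1.0, 0.05))   # low only vs. MATalpha
    print(classify(0.05, 0.1))   # low in both
    print(classify(1.0, 1.0))    # no cell-type specificity
```

A value exactly equal to the cut-off is neither "less than" nor "more than"
it, so such a gene falls into none of the three sets in this sketch.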