Welcome to Webminer

The purpose of this page is to introduce you to Webminer's features and some of the applications for which it may be useful. Below you will find a basic overview and two tutorials.

The first tutorial walks you through the search that Webminer was originally developed for, a hunt for mating-specific membrane proteins that play a role in cell fusion, which led to the identification of PRM1 (Heiman and Walter, J. Cell Biol. 2000). The second tutorial explores some of the more advanced ways to combine pattern matching and boolean searches using cell-type-specific gene expression as a model.

Gene the Miner courtesy of Sarah C. Mutka

Overview
Webminer contains the results of about 200 experiments (at the time I write this) performed by different labs for various purposes. Each experiment measures the fold change in expression of every yeast ORF between an experimental condition and a control condition. Together, these experiments contain over a million individual data points.

The experiments are grouped by biological category, such as "Mating" and "Meiosis". Additionally, there is a broad category called "ORF information" that contains information on every ORF other than expression levels. For instance, it contains all ORF promoters, predicted protein sequences, predicted molecular weights, whether the gene is contained within a duplicated block in the genome, as well as many other searchable features.

To use Webminer, you must first frame a question of the style "What genes are expressed in cells grown under such-and-such conditions that also contain such-and-such structural features?" On the Webminer home page, use the pull-down menu to select the relevant categories, then click on the appropriate experiment. A short description of it will appear in the boxes on the right. Clicking on the links beneath them will take you to the author's web site, the PubMed listing of the paper's abstract, and the full text of the paper on-line, if available.

Click the red arrow to add experiments to your search list. When you are done, click the blue arrow to proceed to the next page. Here, you will be asked to set cut-offs to use in searching the data. Genes that are expressed more highly in the experimental condition compared to the control are said to be induced more than one-fold. Genes whose expression decreases in the experiment are said to be induced less than one-fold.
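
As a rough illustration of this convention, the sketch below treats a fold change as the ratio of the experimental signal to the control signal. The function names and signal values are hypothetical, and Webminer's own data processing may differ.

```python
def fold_change(experimental: float, control: float) -> float:
    """Ratio of expression in the experimental condition to the control."""
    if control <= 0:
        raise ValueError("control signal must be positive")
    return experimental / control


def describe(fold: float) -> str:
    """Phrase a fold change in the terms used on the cut-off page."""
    if fold > 1:
        return f"induced more than one-fold ({fold:g}-fold)"
    if fold < 1:
        return f"induced less than one-fold ({fold:g}-fold)"
    return "unchanged (1-fold)"


# Hypothetical signal intensities, for illustration only.
print(describe(fold_change(600.0, 200.0)))  # expression tripled
print(describe(fold_change(50.0, 200.0)))   # expression fell to a quarter
```

Under this reading, a gene whose expression falls to a quarter of the control level is "induced .25-fold", which is the same as being repressed 4-fold.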

Depending on the particular experimental regime, a very large induction may be anywhere from 20-fold to 1,000-fold. Genes induced less than .01-fold (i.e., repressed more than 100-fold) are rounded to .01-fold. If there is no information on an ORF from a given experiment, then it will never be included in the search results. Therefore, the absence of ORFs from search lists should not be interpreted too strongly. In general, Webminer is intended for coarsely panning these millions of data points for interesting patterns. You should always return to the primary data and verify that the information in Webminer is correct and that the experiment was done as you expect it was before you invest heavily in studying a particular ORF.
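
The two rules above (the .01-fold floor and the exclusion of ORFs with no data) can be sketched as follows. This is only a minimal model of the behavior described in the text; the names stored_value and passes are made up for the example and are not part of Webminer.

```python
from typing import Optional

FLOOR = 0.01  # smallest fold change kept, per the text above


def stored_value(fold: Optional[float]) -> Optional[float]:
    """Raise strongly repressed values to the .01-fold floor."""
    if fold is None:
        return None  # no measurement exists for this ORF
    return max(fold, FLOOR)


def passes(fold: Optional[float], cutoff: float, above: bool = True) -> bool:
    """Test one ORF against a cut-off; an ORF with no data never passes."""
    value = stored_value(fold)
    if value is None:
        return False
    return value > cutoff if above else value < cutoff


# Hypothetical measurements: a 500-fold repression is kept as .01-fold.
print(stored_value(0.002))
print(passes(None, 3.0), passes(None, 0.2, above=False))
```

Because a missing measurement fails both the "more than" and the "less than" form of a search, an ORF that is absent from a result list may simply have no data in one of the chosen experiments.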


Tutorial #1
As an example, try reproducing the data that Webminer was first used to collect. We were interested in identifying genes involved in cell fusion during yeast mating, so we searched for ORFs predicted to encode membrane proteins that were specifically expressed during alpha-factor treatment.

Part one: Building a search list

  1. From the Webminer home page, select "Cell cycle" in the pull-down menu.
  2. Click on "alpha-factor release, 0 min". In this experiment, cells are still arrested by the mating pheromone alpha-factor.
  3. If you like, try clicking on the links on the right to visit the author's web site or the reference material for this experiment.
  4. Click the red arrow to add this experiment to your search list.
  5. Select "ORF Information" from the pull-down menu.
  6. Click on "Maximum hydrophobicity". This dataset contains an index for each ORF that reflects its likelihood of encoding a membrane protein.
  7. Click the red arrow to add it to your search list.
  8. Click the blue arrow to proceed.

Part two: Setting cut-offs for each criterion

  1. For the first criterion, expression in the alpha-factor experiment, we want to find genes induced by pheromone treatment. I found that searching for genes induced more than 3-fold returns a nice set of candidates.
  2. For the second criterion, we want genes that are likely to encode membrane proteins. Many membrane proteins score above 28 on this hydrophobicity index, but some score lower. To avoid being too restrictive, ask for proteins with maxh values above 25.
  3. Click the Submit button to continue.
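
The search built in parts one and two amounts to applying two cut-offs and combining the results. The sketch below shows that logic on a toy data set. The ORF labels and all values are invented for illustration, and the variable names are assumptions rather than anything Webminer exposes.

```python
# Hypothetical values for made-up ORF labels; not taken from Webminer.
alpha_factor_fold = {"ORF_A": 12.0, "ORF_B": 4.5, "ORF_C": 0.8, "ORF_D": 1.1}
max_hydrophobicity = {"ORF_A": 31.2, "ORF_B": 18.0, "ORF_C": 29.4, "ORF_D": 12.5}

INDUCTION_CUTOFF = 3.0  # induced more than 3-fold by pheromone
MAXH_CUTOFF = 25.0      # maximum hydrophobicity above 25

# Apply each cut-off separately.
induced = {orf for orf, fold in alpha_factor_fold.items() if fold > INDUCTION_CUTOFF}
membrane = {orf for orf, score in max_hydrophobicity.items() if score > MAXH_CUTOFF}

# Combine the two criteria.
at_least_one = induced | membrane
both = induced & membrane

print(f"{len(at_least_one)} ORFs meet at least one criterion")
print(f"{len(both)} ORFs meet both criteria: {sorted(both)}")
```

In this toy set only ORF_A is both pheromone-induced and hydrophobic enough to pass, so it is the only candidate that would be carried forward.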

Part three: The results page

  1. If everything went well, Webminer should tell you it found 2,560 ORFs meeting 1 criterion (i.e., either >3-fold induced by pheromone or with a maxh score >25), and 20 ORFs matching both criteria.
  2. You will also see a five-column table, sorted by gene name. Note that named genes beginning with the letter Y or Z will be mixed in at the bottom with the unnamed genes.
  3. Clicking on the blue ORF names in the left-hand column will take you to whatever information on that ORF was available in the YPD database. The pathway and function columns contain short descriptions of the ORF taken from the SGD database.
  4. You should see that all the ORFs are pheromone-induced, possible transmembrane proteins, including ten known to be involved in mating. Additionally, there are ten ORFs of unknown function that, based on the other genes in the set, are good candidates for regulating a step in the mating pathway.
  5. Lastly, you may notice the inclusion of FUS2 in the list. While FUS2 is a mating-specific gene involved in cell fusion, it does not encode a transmembrane protein. Its maxh value is 25.49, which sneaks in just above the cut-off of 25 that we set.
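
One possible explanation for the sort order noted in step 2 is sketched below. It assumes, and this is a guess rather than something this page states, that unnamed genes are listed under their systematic ORF names, which in yeast begin with "Y", and that the table is sorted as plain text. The two systematic-style names are made-up placeholders.

```python
# Display labels: the gene name where one exists, otherwise a systematic-style
# ORF name. YBR999C and YDL998W are invented placeholders, not real ORFs.
labels = ["STE2", "YAP1", "YDL998W", "FUS2", "ZWF1", "YBR999C", "PRM1"]

# A plain alphabetical sort, as assumed above.
ordered = sorted(labels)
print(ordered)
# → ['FUS2', 'PRM1', 'STE2', 'YAP1', 'YBR999C', 'YDL998W', 'ZWF1']
```

Under that assumption, a named gene such as YAP1 or ZWF1 lands among or after the unnamed ORFs, so it is worth scanning the bottom of the table for named genes as well.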

Tutorial #2
For a more advanced search, try identifying the sets of haploid-specific genes (hsg), a-specific genes (asg), and alpha-specific genes (@sg) by combining two datasets using the right search cut-offs.

Use the "MATa/alpha vs. MATa" and "MATa/alpha vs. MATalpha" experiments in the "Mating" class to perform the following searches. Remember that a gene expressed 5-fold more highly in a haploid than in MATa/alpha would appear as "induced less than .2-fold" in these datasets.

Gene set   MATa/alpha vs. MATa   MATa/alpha vs. MATalpha   Example gene you should find
asg        less than .2-fold     more than .2-fold         STE2, the alpha-factor receptor
@sg        more than .2-fold     less than .2-fold         STE3, the a-factor receptor
hsg        less than .2-fold     less than .2-fold         STE4, a shared subunit of the pheromone signalling machinery
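
The three searches listed above can be sketched as a small classifier. The function name classify and the example fold values are hypothetical. The sketch also treats any value that is not below .2-fold as "more than .2-fold", which is an assumption about how the boundary is handled.

```python
CUTOFF = 0.2  # 5-fold higher in the haploid appears as 1/5 = .2-fold


def classify(vs_mat_a: float, vs_mat_alpha: float) -> str:
    """Assign a gene set from its fold change in the two diploid-vs-haploid datasets."""
    low_vs_a = vs_mat_a < CUTOFF          # lower in MATa/alpha than in MATa
    low_vs_alpha = vs_mat_alpha < CUTOFF  # lower in MATa/alpha than in MATalpha
    if low_vs_a and low_vs_alpha:
        return "hsg"   # expressed in both haploid types
    if low_vs_a:
        return "asg"   # expressed in MATa cells only
    if low_vs_alpha:
        return "@sg"   # expressed in MATalpha cells only
    return "none"


# Hypothetical fold changes, for illustration only.
print(classify(0.05, 1.0))  # asg pattern
print(classify(0.9, 0.04))  # @sg pattern
print(classify(0.1, 0.1))   # hsg pattern
```

The key point is that each gene set is defined by the combination of both cut-offs, not by either dataset alone: a gene that is low in only one comparison is cell-type-specific, while a gene that is low in both is haploid-specific.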

Thanks to Ira Herskowitz for this example