Results of the motif search

Our global analysis revealed two subsets of mRNAs that may have a unique mode of regulation. The first subset includes the 53 genes for which most mRNA molecules are associated with a single ribosome (peak in the monosome; listed in supplemental Table 1), and the second includes the 31 genes for which the majority of mRNAs are not associated with ribosomes (low occupancy; listed in supplemental Table 2). We attempted to identify a common sequence motif that might regulate this behavior. For this, we used two motif search algorithms: BioProspector, which uses a Gibbs sampling strategy to identify binding sites (1), and MEME, which uses an expectation maximization strategy (2). We searched sequences 140 nts upstream or downstream of the genes in either the subset of 53 genes with a high peak in the monosome or the subset of 31 genes with low occupancy (excluding redundant sequences). We chose 140 nts because this is the average length of the 5'-UTR in yeast; similar results were obtained with sequences 500 nts long.
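As a rough illustration of how fixed-length flanking sequences of this kind might be collected, the sketch below cuts the regions upstream and downstream of a gene from a chromosome sequence and reports them 5' to 3' on the gene's own strand. The function name `flanking_regions`, the 0-based end-exclusive coordinates, and the toy chromosome are assumptions made for illustration; they are not taken from the original analysis.

```python
# Hypothetical helper, not the authors' pipeline.
# Coordinates are assumed to be 0-based and end-exclusive on the reference strand.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def flanking_regions(chrom_seq, start, end, strand, flank=140):
    """Return (upstream, downstream) flanks of a gene, read 5'->3' on the gene's strand."""
    left = chrom_seq[max(0, start - flank):start]   # truncated at the chromosome start
    right = chrom_seq[end:end + flank]              # slicing truncates at the chromosome end
    if strand == "+":
        return left, right
    # On the minus strand the upstream region lies to the right on the reference.
    return reverse_complement(right), reverse_complement(left)


# Toy example with a 3-nt flank around the gene at positions 6..15.
chrom = "AAACCCATGGGGTAATTTGGG"
print(flanking_regions(chrom, 6, 15, "+", flank=3))
print(flanking_regions(chrom, 6, 15, "-", flank=3))
```

With `flank=140` the same call would give sequences of the length used here, and `flank=500` the longer variant.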

 

We cannot draw any definitive conclusion about sequence motifs in the 5' or 3' regions of these genes, for the following reasons: 1) All motifs show a high degree of degeneracy, and where there are conserved sequences they are stretches of A or T. Similar stretches also appear in our control dataset (3 sets of randomly selected UTR sequences of 50 yeast genes each). 2) In most cases, fewer than half of the genes in each group contain the motif. 3) Only one motif (in the 5'-UTR of the genes that peak in the monosome) has a P value lower than 1e-02. This motif is highly degenerate and resembles other motifs with higher P values; its significance is not clear.
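The second point can be checked with simple arithmetic on the gene counts reported in the BioProspector tables of this document. The sketch below is only an illustration; the dictionary layout and function names are assumptions, while the numbers are copied from the tables.

```python
# Gene counts per motif, copied from the BioProspector tables in this document.
GROUP_SIZES = {"monosome": 53, "low occupancy": 31}
MOTIF_COUNTS = {
    ("monosome", "5'-UTR"): [27, 26, 17],
    ("monosome", "3'-UTR"): [23, 24, 22],
    ("low occupancy", "5'-UTR"): [11, 11, 7],
    ("low occupancy", "3'-UTR"): [16, 16, 18],
}


def motif_fractions(counts=MOTIF_COUNTS, sizes=GROUP_SIZES):
    """Fraction of genes in each group that contain each reported motif."""
    return {key: [n / sizes[key[0]] for n in hits] for key, hits in counts.items()}


def count_below_half(counts=MOTIF_COUNTS, sizes=GROUP_SIZES):
    """Return (number of motifs found in fewer than half of the genes, total motifs)."""
    all_fractions = [f for fs in motif_fractions(counts, sizes).values() for f in fs]
    return sum(1 for f in all_fractions if f < 0.5), len(all_fractions)


for (group, region), fracs in motif_fractions().items():
    print(group, region, [round(f, 2) for f in fracs])
print(count_below_half())
```

On these counts, 8 of the 12 reported motifs occur in fewer than half of the genes of their group.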

 

We are therefore currently investigating other analysis tools for identifying regulatory elements: tools that are designed specifically to identify RNA motifs (BioProspector and MEME were designed to identify DNA motifs) and that take into account other factors such as RNA structure.

1. Liu X, Brutlag DL, Liu JS. BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac Symp Biocomput. 2001:127-38.

2. Bailey TL, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. Menlo Park, California: AAAI Press; 1994. pp. 28-36.

 


Results of the motif search by BioProspector

Motifs at the 5'-UTR (140 nts) of genes that peak in the monosome (total of 53 genes)

Motif           # of genes with this motif   P value*
MMMASACMAMWM    27                           4e-03
MMMASAMMAMWH    26                           1.1e-02
RAMAAAYAAMAC    17                           1.6e-02

*The P value represents the probability of finding this motif in a random set of sequences.

Motifs at the 3'-UTR (140 nts) of genes that peak in the monosome (total of 53 genes)

Motif           # of genes with this motif   P value
DHYCARCAAYMR    23                           1e+00
WTGRWWCMWMVW    24                           1e+00
CMAKWTCRKSCA    22                           1e+00

Motifs at the 5'-UTR (140 nts) of genes with low occupancy (total of 31 genes)

Motif           # of genes with this motif   P value
WYMAAYWAGCCT    11                           2.8e-01
WYMMAMWMGCMW    11                           3.5e-01
SWCAMAHWMGCC    7                            4.9e-01

Motifs at the 3'-UTR (140 nts) of genes with low occupancy (total of 31 genes)

Motif           # of genes with this motif   P value
CKMCDSYYKRTY    16                           2e-02
RYTKKTCMASCT    16                           2e-02
WRRCKCCRSTYS    18                           6e-02

Motifs at the 5'-UTR (140 nts) of 3 sets of control genes (total of 50 genes in each set)

Motif           # of genes with this motif   P value
SKMWATYRVTRG    26                           4e-05
HCWAMYRWYAST    21                           4e-04
TTTTYTTTTTYT    26                           2e-03
WTYRATGAKRTK    20                           8e-04
TYTTWTTTYTTT    27                           1e-03
TTTTYTTTTTTT    24                           2e-03
MWYWAYTAWWHK    26                           1.5e-05
WWHRYTAWYAWH    39                           2e-04
CCTTYRYYRAYR    19                           3e-04

Motifs at the 3'-UTR (140 nts) of 3 sets of control genes (total of 50 genes in each set)

Motif           # of genes with this motif   P value
SKMWATYRVTRG    26                           4e-05
HCWAMYRWYAST    21                           4e-04
TTTTYTTTTTYT    26                           2e-03
WTYRATGAKRTK    20                           8e-04
TYTTWTTTYTTT    27                           1e-03
TTTTYTTTTTTT    24                           2e-03
MWYWAYTAWWHK    26                           1.5e-05
WWHRYTAWYAWH    39                           2e-04
CCTTYRYYRAYR    19                           3e-04

Redundancy code

R    A/G
Y    C/T
M    A/C
K    G/T
W    A/T
S    C/G
B    C/G/T
D    A/G/T
H    A/C/T
V    A/C/G
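To make the redundancy code concrete, the sketch below expands a degenerate motif into a regular expression and counts how many sequences contain at least one exact match on the given strand. This is a simplification stated as an assumption: BioProspector itself scores sites against a probabilistic model, so the counts from exact matching need not reproduce the "# of genes with this motif" columns. The function names and the toy sequences are hypothetical.

```python
import re

# Redundancy code from the table above, plus the four unambiguous bases.
REDUNDANCY_CODE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "W": "[AT]", "S": "[CG]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]",
}


def motif_to_regex(motif):
    """Compile a degenerate motif into a regular expression (exact matching only)."""
    return re.compile("".join(REDUNDANCY_CODE[base] for base in motif.upper()))


def count_sequences_with_motif(motif, sequences):
    """Count sequences that contain at least one match to the motif."""
    pattern = motif_to_regex(motif)
    return sum(1 for seq in sequences if pattern.search(seq.upper()))


# Toy sequences (not real UTRs) tested against one motif from the control tables.
toy_sequences = ["TTTTCTTTTTTT", "GGTTTTTTTTTTCTAA", "ACGTACGTACGTACGT"]
print(count_sequences_with_motif("TTTTYTTTTTYT", toy_sequences))
```

Here the first two toy sequences match and the third does not, so the count is 2.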


Results of the motif search by MEME

Motifs at the 5'-UTR (140 nts) of genes that peak in the monosome (total of 53 genes)

Motifs are presented in multilevel consensus, showing the most conserved letter(s) at each motif position.
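As an illustration of what a multilevel consensus is, the sketch below derives one from per-position letter frequencies: the top row holds the most frequent letter at each position, and further rows hold any other letters that pass a frequency cutoff. The cutoff of 0.2, the function name, and the toy frequencies are assumptions for illustration and are not taken from the analysis reported here.

```python
def multilevel_consensus(freqs, alphabet="ACGT", threshold=0.2):
    """Build consensus rows from per-position frequencies.

    freqs: one list of frequencies per motif position, ordered as `alphabet`.
    threshold: assumed minimum frequency for a letter to be listed.
    """
    columns = []
    for column in freqs:
        # Rank letters by decreasing frequency, breaking ties alphabetically.
        ranked = sorted(zip(column, alphabet), key=lambda pair: (-pair[0], pair[1]))
        kept = [letter for freq, letter in ranked if freq >= threshold]
        columns.append(kept or [ranked[0][1]])  # always keep the top letter
    depth = max(len(kept) for kept in columns)
    # Row i holds the i-th ranked letter of each position, or a space if there is none.
    return ["".join(kept[i] if i < len(kept) else " " for kept in columns)
            for i in range(depth)]


# Toy frequencies for a 3-position motif (columns ordered A, C, G, T).
toy_freqs = [
    [0.10, 0.10, 0.10, 0.70],
    [0.05, 0.45, 0.05, 0.45],
    [0.60, 0.30, 0.05, 0.05],
]
for row in multilevel_consensus(toy_freqs):
    print(repr(row))
```

For the toy input the rows are 'TCA' and ' TC': only the second and third positions have a second letter above the cutoff.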

 

Motif (top-level consensus)     Length   # of genes with this motif   E value*
GTATTTTTTTCTCTCTTTCTT           21       15                           2.8e-01
    lower-level letters (positions not preserved): CCCAACTGTCACTGTG
CAGGAAGCCAAAGAGTGCAGCAAATATC    28       5                            2.2e+01
    lower-level letters (positions not preserved): CCACTGACTTCGATCAAGCCATGCGTGGCCGTT
CCCGCGGG                        8        4                            5.5e+02
    lower-level letters (positions not preserved): TCT

*The E value is the expected number of alignments with the given information content in a set of random sequences of the same size.

Motifs at the 3'-UTR (140 nts) of genes that peak in the monosome (total of 53 genes)

Motif (top-level consensus)     Length   # of genes with this motif   E value
GGGCTGGCAGTTCAGGGGG             19       7                            1.3e+01
    lower-level letters (positions not preserved): AATACACCATCTA
TGGTTCTCCTT                     11       8                            2.2e+01
    lower-level letters (positions not preserved): TAG
CACAGTTCGTCAATCCCGCCT           21       5                            2.2e+02
    lower-level letters (positions not preserved): CTAATTGGCTAAACTGATGG

Motifs at the 5'-UTR (140 nts) of genes with low occupancy (total of 31 genes)

Motif (top-level consensus)                  Length   # of genes with this motif   E value
TTTTTTCTTTT                                  11       19                           2.2e+00
    lower-level letters (positions not preserved): CCCGA
TAGATAAGAGTAAAAGACAGAAAAGAGATAAAGAACGGGTT    41       6                            1.6e+02
    lower-level letters (positions not preserved): GAATACAGTGTTTATTAATA
TATCCAAATCGCTCTGCTACACTTTCTCCCTCAATAGAACC    41       4                            2.3e+02
    lower-level letters (positions not preserved): GCCTACGGAAAAGACGGGCAATATACCATCTCATCTTGTTCT

Motifs at the 3'-UTR (140 nts) of genes with low occupancy (total of 31 genes)

Motif (top-level consensus)     Length   # of genes with this motif   E value
CTTTCTTTTTC                     11       13                           7.3e-01
    lower-level letters (positions not preserved): TAGC
CACCTTTGAATGCCCATG              18       4                            4.6e+01
    lower-level letters (positions not preserved): ACCGATGGCCGGG
GAAAAAAAAAAAAT                  14       14                           1.2e+02
    lower-level letters (positions not preserved): CTTTTAC

Motifs at the 5'-UTR (140 nts) of 3 sets of control genes (total of 50 genes in each set)

Motif (top-level consensus)     Length   # of genes with this motif   E value
TTTTTTTCTTC                     11       22                           4.8e+00
    lower-level letters (positions not preserved): CCGCATG
GCGGTCTCCAGGCAG                 15       4                            2.9e+01
    lower-level letters (positions not preserved): TAGGATGGGTT
AGAATTGGCTG                     11       8                            8.8e+01
    lower-level letters (positions not preserved): TCCATG

Motifs at the 3'-UTR (140 nts) of 3 sets of control genes (total of 50 genes in each set)

Motif (top-level consensus)                           Length   # of genes with this motif   E value
GCGCAAAGCTTAAAAAGGGAAATAGAAATCAAGTAGCACCAGGATCAGAA    50       6                            1.7e+02
    lower-level letters (positions not preserved): GTGCCAGAGTCAAGCAGCGCTAAGCAATAGCGAATGGCTCTTTTCTGTTGTCT
AACGTCCCGAGAATGAAGAGATATTATAAAACAAAGATATTGCAGATCCA    50       5                            4.7e+01
    lower-level letters (positions not preserved): CGACGAGAGTTCACCGACATTGTGTCCTCCCTCAGAATGAGAATTAGTTTCTGGGTGCG
GGGCCCGCCG                                            10       2                            3.1e+02
    lower-level letters (positions not preserved): G