Table 1

From: Bioinformatic identification of novel regulatory DNA sequence motifs in Streptomyces coelicolor

| Matrix | Protein class | Number of ORFs in this set | Number of ORFs in this set belonging to this protein class | P value | Consensus sequence |
|---|---|---|---|---|---|
| 2083 | 2.1.3 Degradation of polysaccharides | 54 | 10 | 7.557e-10 | See Figure 2A |
| 2318 | 2.2.3 DNA – replication, repair, restriction / modification | 21 | 6 | 7.76e-08 | See Figure 2B |
| 1744 | 1.2.1 Chromosome replication | 24 | 3 | 2.12e-06 | See Figure 2C |
| 1909 | 4.1.7 Gram +ve exported / lipoprotein | 106 | 22 | 9.24e-08 | See Figure 2D |
| 46 | 6.2.1 sigma factor | 116 | 10 | 6.60e-08 | See Figure 2E |
| 2034 | 6.3.13 ArsR | 45 | 4 | 1.89e-06 | See Figure 2F |
| 1853 | 3.3.11 Nucleotide interconversions | 46 | 5 | 4.09e-07 | See Figure 2G |
| 363 | 3.8.0 Secondary metabolism | 9 | 5 | 4.89e-07 | See Figure 2H |
| 571 | 3.8.0 Secondary metabolism | 10 | 5 | 9.621e-07 | See Figure 2H |
| 293 | 3.8.0 Secondary metabolism | 10 | 5 | 9.62e-07 | See Figure 2H |
| 153 | 3.8.0 Secondary metabolism | 18 | 6 | 1.31e-06 | See Figure 2H |

Table 1. Position-specific weight matrices (PSWMs) that represent DNA sequence motifs shared by functionally coherent sets of genes in Streptomyces coelicolor. A library of 2497 matrices was generated from alignments of over-represented DNA sequence dyads as described in the Methods section. Each matrix is essentially a statistical model of a DNA sequence motif [58]. The non-coding regions of the S. coelicolor genome were searched against the matrices to find matches to each of the sequence motifs. The scanning method assigned a score (maximum 100) to each match site, and a minimum score threshold of 80 was applied. For each matrix, we recorded the number of genes whose upstream region contains at least one match site. We also recorded the number of those genes belonging to each functional category in the protein classification scheme, and calculated a P value to determine whether that functional category was significantly over-represented.
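
The caption does not state which statistical test produced the P values. As an illustration only, the sketch below uses a one-sided hypergeometric test, a common choice for over-representation of a category within a gene set; it is an assumption, not necessarily the authors' method. The function name `hypergeom_upper_tail` is hypothetical, and the genome size and class size in the example call are placeholder values, not figures taken from the paper.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """Return P(X >= k) for a hypergeometric draw.

    N: total number of genes considered (assumed background)
    K: number of those genes in the protein class of interest
    n: number of genes in the set matched by a matrix
    k: number of matched genes that belong to the protein class
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    lo = max(k, 0)
    hi = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible combinations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(lo, hi + 1))
    return tail / total


# Example with the set size and class count from the row for matrix 2083
# (54 ORFs, 10 in the class). N and K are PLACEHOLDER values chosen for
# illustration; the table does not report them.
p = hypergeom_upper_tail(k=10, n=54, K=50, N=7800)
print(f"P(X >= 10) = {p:.3e}")
```

Because the background totals are placeholders, the printed value is not expected to reproduce the P value reported in the table.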