Expressed Sequence Tags from the oomycete Plasmopara halstedii, an obligate parasite of the sunflower

Background Sunflower downy mildew is a major disease caused by the obligate biotrophic oomycete Plasmopara halstedii. Little is known about the molecular mechanisms underlying its pathogenicity. In this study we used a genomics approach to gain a first insight into the transcriptome of P. halstedii. Results To identify genes from the obligate biotrophic oomycete P. halstedii that are expressed during infection of sunflower (Helianthus annuus L.), we applied the suppression subtractive hybridization (SSH) method to sunflower seedlings infected by P. halstedii. Using this method and random sequencing of clones, a total of 602 expressed sequence tags (ESTs) corresponding to 230 unique sequence sets were identified. To determine the origin of the unisequences, PCR primers were designed to amplify these gene fragments from genomic DNA isolated either from P. halstedii sporangia or from H. annuus. Of these, 145 nonredundant ESTs, corresponding to a total of 373 ESTs (67.7%), proved to be derived from P. halstedii genes expressed during infection of sunflower. A set of 87 nonredundant sequences showed matches to sequences deposited in public databases. Nevertheless, about 7% of the ESTs seem to be unique to P. halstedii, with no homolog in any public database. Conclusion The assignment of nonredundant ESTs to functional categories, together with their relative abundance, is listed and discussed. Annotation of the ESTs revealed a number of genes that could function in virulence. We provide a first glimpse into the gene content of P. halstedii. These resources should accelerate research on this important pathogen.


Background
Sunflower downy mildew is a major disease caused by the oomycete Plasmopara halstedii (Farl.) Berl. et de Toni. The first physiological race of this obligate parasitic oomycete was identified by Zimmer in North America and Europe [1]. In both compatible and incompatible interactions, host penetration occurs at the lower part of the hypocotyl [2]. Usually, about thirteen days after artificial infection of susceptible lines, the parasite has invaded almost all plant tissues and is present in the cotyledons, epicotyls and leaves. In contrast, from the fifth day onwards, a hypersensitive-like reaction develops within the hypocotyl of resistant lines and, in many cases, the parasite's growth is arrested before it reaches the cotyledons [2,3]. Molecular analysis showed that resistance could be associated with an unusual delayed hypersensitive reaction and a systemic acquired response that take place inside the hypocotyls, while the seedlings show no apparent symptoms [3].
The establishment of either disease or resistance results from the expression of defence genes in the host and of virulence or pathogenicity genes in the parasite. In sunflower, some defence-related genes whose expression differs between compatible and incompatible interactions have been characterized [3,4]. In contrast, no P. halstedii genes potentially involved in the infection process have been reported yet. This may be explained by the obligate nature of P. halstedii, which can only develop on its host.
Since the completion of the Saccharomyces cerevisiae genome [5], genome sequences and expressed sequence tag (EST) collections have become increasingly available for other parasitic and symbiotic fungi that infect humans, other animals and plants [6,7]. More recently, the whole genome sequences of Phytophthora ramorum and Phytophthora sojae, two major oomycete pathogens, have been reported [8], providing a framework for comparative genomics studies [9] and for the identification of specific gene families potentially implicated in the infection process [10]. Similarly, the availability of the whole genome sequence of Hyaloperonospora parasitica should help in the discovery of similar genes in other oomycetes [11]. The EST approach is a relatively simple procedure for finding genes and generating information about their expression in organisms with no history of genetic research [12]. For example, in Blumeria graminis, 4908 ESTs representing 1669 individual genes were obtained by sequencing clones from two cDNA libraries from germinating and ungerminated conidia [13]. In a different study, van der Biezen et al. [14] used the cDNA-AFLP strategy to clone 10 cDNA fragments from the obligate biotrophic oomycete Hyaloperonospora parasitica (formerly Peronospora parasitica (Fr.)) during infection of Arabidopsis thaliana. Similarly, Casimiro et al. [15] used DD-PCR to identify 21 ESTs from H. parasitica infecting Brassica oleracea.
Here we report the cloning and analysis of 602 ESTs obtained by suppression subtractive hybridization (SSH) PCR [16] from sunflower seedlings infected by P. halstedii. In addition, the origin of these ESTs was checked by PCR using specific primers and genomic DNA isolated from either P. halstedii or sunflower.

Expressed sequence tags analysis
After two rounds of subtractive hybridization, cDNAs were cloned into the pGEM-T Easy vector and the bacterial clones were arrayed in 96-well plates. To estimate the average insert size, 40 clones were chosen at random and their inserts were amplified using the SP6 and T7 universal primers present on the vector. The amplification products were then separated by agarose gel electrophoresis. The estimated sizes were between 400 and 800 bp, with an average of about 500 bp. Subsequently, 602 clones were chosen at random and single-pass sequenced using the T7 universal primer. Good-quality sequences were on average 400 to 500 bases long. Fifty-one sequences were of poor quality or too short (<100 bases) and were therefore excluded from further analysis.
The remaining 551 sequences were compared with each other using the BlastN program [17] to identify overlapping sequences, and were assembled into contigs using the CAP3 program [18]. Of the 551 clones, 153 were singletons and the remaining 398 formed 77 contigs, each containing at least 2 clones. Thus, the 551 cDNA clones analysed represented 230 unisequences (153 singletons plus 77 contigs). The relative abundance of identical clones within the collection is shown in Table 1. The number of clones per contig ranged from 2 to 78. Half of the sequences were either unique or formed contigs containing 2 or 3 clones. Two contigs alone contained 52 and 78 clones, respectively, which together represent approximately 24% of the total (Table 1).
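The clone and contig counts above fit together as shown in the small arithmetic check below. It reuses only numbers stated in this section; the variable names are illustrative.

```python
# Clone counts reported in the text.
sequenced = 602
discarded = 51                       # poor quality or < 100 bases
retained = sequenced - discarded     # 551 clones analysed

singletons = 153     # clones that did not assemble with any other clone
contig_clones = 398  # clones that fell into multi-clone contigs
contigs = 77         # contigs containing at least 2 clones

# Every retained clone is either a singleton or part of a contig.
assert singletons + contig_clones == retained

# Each singleton and each contig counts as one unisequence.
unisequences = singletons + contigs

# Share of all retained clones captured by the two largest contigs.
largest_two_share = (52 + 78) / retained

print(unisequences)                        # 230
print(round(100 * largest_two_share, 1))   # 23.6
```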

PCR amplification and origin of ESTs
Because the 230 unisequences could originate either from P. halstedii or from sunflower genes induced upon infection, PCR primers were designed to amplify these gene fragments from DNA isolated either from P. halstedii sporangia or from Helianthus annuus. All 230 primer pairs amplified a fragment from either P. halstedii or H. annuus DNA; 145 primer pairs amplified fragments only from P. halstedii DNA, indicating that these ESTs derive from P. halstedii genes expressed during infection of sunflower. Figure 1 shows examples of PCR amplification from DNA isolated from P. halstedii sporangia or from Helianthus annuus.
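The decision rule applied to each primer pair can be written as a tiny function. This is a simplified sketch: the function name and the "ambiguous" label for a pair that amplifies nothing are illustrative assumptions, not part of the original protocol. It encodes the criterion stated above: an EST is assigned to P. halstedii only when its primers amplify P. halstedii DNA and not sunflower DNA.

```python
from typing import Literal

Origin = Literal["P. halstedii", "sunflower", "ambiguous"]


def assign_origin(amplifies_pathogen: bool, amplifies_host: bool) -> Origin:
    """Assign an EST to a source genome from PCR band presence (sketch).

    Parasite origin requires a product on P. halstedii DNA and none on
    sunflower DNA. Any product on sunflower DNA is taken as a host gene,
    since a faint band on sporangial DNA can come from host cells that
    contaminate the sporangia.
    """
    if amplifies_pathogen and not amplifies_host:
        return "P. halstedii"
    if amplifies_host:
        return "sunflower"
    return "ambiguous"  # hypothetical case: no product on either template


print(assign_origin(True, False))  # P. halstedii
print(assign_origin(True, True))   # sunflower
print(round(100 * 145 / 230))      # 63 (% of unisequences from the parasite)
```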

Homology search
To identify homologs of the 145 ESTs derived from P. halstedii, each EST sequence was queried against the NCBI non-redundant protein database using the BLASTX algorithm [17]. Table 2 shows the proportion of EST sequences with no significant similarity to known protein sequences (E-value > 1e-5), significant similarity (E-value between 1e-5 and 1e-20) or highly significant similarity (E-value < 1e-20). A total of 89 non-redundant sequences, corresponding to about 60% of the 145 ESTs, showed significant homology (E-value < 1e-5) to sequences in the NCBI database and were therefore retained for functional classification (Table 3).
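The three similarity classes of Table 2 amount to binning each EST's best BLASTX E-value. The sketch below assumes that an EST with no hit at all is passed as `None`, and that values exactly at 1e-5 or 1e-20 fall in the middle class; the text does not specify how the boundaries were handled. The E-values in the usage line are hypothetical.

```python
from collections import Counter
from typing import Optional


def similarity_class(evalue: Optional[float]) -> str:
    """Bin the best BLASTX hit of one EST into the classes of Table 2.

    None means "no hit at all". Boundary values (exactly 1e-5 or 1e-20)
    are placed in the middle class, which is an assumption.
    """
    if evalue is None or evalue > 1e-5:
        return "no significant similarity"
    if evalue >= 1e-20:
        return "significant"
    return "highly significant"


# Hypothetical best-hit E-values for four ESTs (illustrative, not real data).
best_hits = [None, 3e-2, 1e-8, 2e-45]
print(Counter(similarity_class(e) for e in best_hits))
```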

Functional classification of P. halstedii ESTs
ESTs were assigned to putative cellular roles using the categories defined by Bevan et al. [19] and the Expressed Gene Anatomy Database (EGAD) [20]. Two categories (elicitor and pathogenicity, and cell defence) were added as described by Kamoun et al. [21]. The assignment of non-redundant ESTs to functional categories, together with their relative abundance, is listed in Table 3. Most of the identified cDNAs were related to protein synthesis, cell metabolism, signal transduction and cell stress.

Protein-signature scanning and identification of putative secreted proteins
Because a large set of the ESTs (64) were predicted to encode hypothetical proteins or showed no significant homology with known proteins in the databases, we used SignalP 3.0 [22] to identify potential secreted proteins among this set. Additional information on these ESTs was obtained by protein-signature scanning, using InterProScan to compare the sequences with the InterPro database [23]. The results of these searches are summarized in Table 4. Most of these ESTs (47) displayed neither a known motif nor a signal peptide. Among those displaying significant protein signatures, five are predicted to contain a signal peptide and may therefore correspond to secreted proteins (Table 4). However, with an average length of about 500 bp, many ESTs probably do not cover the 5' end of the transcript, where an N-terminal signal peptide would be encoded, so the number of potentially secreted proteins is likely underestimated.

Identification of pathogenicity-related genes
Functional annotation identified at least four ESTs that could be involved in the infection process of P. halstedii. One EST [GenBank:CB174657] showed homology with a Kazal-like serine protease inhibitor from Phytophthora infestans [24]. Alignment of these sequences (Figure 2) shows that the P. halstedii Kazal-like protein contains the conserved cysteine backbone and the motif C-X3-C-X7-C-X10-C-X6-C-X9-C that defines the Kazal family signature [24].
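The Kazal signature C-X3-C-X7-C-X10-C-X6-C-X9-C translates directly into a regular expression, which gives a quick way to screen translated ESTs for it. This is a minimal sketch assuming exact spacings; natural Kazal domains tolerate some variation in the gaps between cysteines, so a strict pattern like this one can miss true members. The test sequence is synthetic, not a real protein.

```python
import re

# Six cysteines separated by runs of exactly 3, 7, 10, 6 and 9 arbitrary
# residues (X). Deliberately strict: no variation in spacing is allowed.
KAZAL_SPACINGS = (3, 7, 10, 6, 9)
KAZAL_PATTERN = re.compile(
    "C" + "".join(f"[A-Z]{{{n}}}C" for n in KAZAL_SPACINGS)
)


def find_kazal_motifs(protein: str) -> list[tuple[int, int]]:
    """Return 0-based, end-exclusive (start, end) spans of motif matches."""
    return [m.span() for m in KAZAL_PATTERN.finditer(protein.upper())]


# Synthetic sequence: filler residues placed between six cysteines.
demo = ("MK" + "C" + "A" * 3 + "C" + "G" * 7 + "C" + "S" * 10
        + "C" + "T" * 6 + "C" + "V" * 9 + "C" + "LL")
print(find_kazal_motifs(demo))  # [(2, 43)]
```

Building the pattern from the list of spacings keeps it readable and makes it easy to relax, for example by replacing `{n}` with a range such as `{n-1,n+1}` if a more tolerant screen is wanted.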
A second EST [GenBank:CB174713] showed strong homology with a Phytophthora infestans cystatin-like protein [25]. The P. halstedii cystatin-like protein displays 42% identity with the P. infestans cystatin-like protein (Figure 3). Interestingly, as in the P. infestans protein, the P. halstedii protein contains a potential signal peptide and conserved domains, including the N-terminal trunk (NT), first binding loop (L1) and second binding loop (L2) [25], suggesting that these proteins may have similar functions.

Figure 1: Examples of PCR amplification of ESTs originating either from sunflower (S) or from Plasmopara halstedii (P). Each primer pair was tested using DNA from either sunflower (S) or Plasmopara halstedii sporangia (P).
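Percent identity, such as the 42% reported for the two cystatin-like proteins, is computed from a pairwise alignment. Definitions vary between tools (some divide by alignment length, others by the length of the shorter sequence), so the sketch below illustrates one common convention. It is not necessarily how the 42% figure was obtained. Identity here is counted over alignment columns where neither sequence has a gap.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


# Toy alignment (not the real cystatin sequences): 4 identities in 6 columns.
print(round(percent_identity("MKT-LLV", "MRTALLI"), 1))  # 66.7
```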

Comparison with true fungi and other microbes for conserved virulence factors
PHI-base is a database of expertly curated molecular and biological information on genes proven to affect the outcome of pathogen-host interactions [26]. A BlastX search of this database identified 11 of the 145 P. halstedii ESTs with significant similarity (E-value < 1e-5).

Comparison with other Oomycetes
Searches for homologous sequences in the EST collections or whole-genome sequences of different oomycetes showed that 117 of the 145 P. halstedii ESTs share similarity with sequences in at least one oomycete taxon.

Figure 3: Alignment of cystatin-like proteins from P. infestans and P. halstedii.

Discussion
Many plant diseases are caused by parasitic microorganisms for which little molecular information is available. Large-scale sequencing of expressed sequence tags (ESTs) can therefore be considered a first step towards understanding the molecular basis of pathogenicity in these microorganisms. This approach allows rapid, broad sampling of transcripts that are regulated during the infection process. For example, 704 unisequences have been identified in the wheat pathogen Mycosphaerella graminicola (Septoria tritici) [27]. In this study, we aimed to identify transcripts produced by P. halstedii during infection of its host. A major challenge, however, is the biotrophic nature of this parasite: because it cannot be grown apart from its host, its transcripts can only be sampled from infected tissue, where they are mixed with host transcripts. To overcome this limitation, we used the suppression subtractive hybridization (SSH) method [16]. One of its main advantages is that it allows the detection of low-abundance, differentially expressed transcripts, such as many of those likely to be involved in signal transduction.

Redundancy
The 230 unisequences comprise 153 singleton clones and 398 redundant clones that formed 77 contigs of 2 to 78 ESTs. This redundancy rate of 72% is higher than those obtained in other EST sequencing programs, for example 49% of 1409 clones of N. crassa [28], 53% of 4809 clones from the cambial tissue of poplar [29], and 37% of 1000 clones from Phytophthora infestans [21]. The redundancy could be reduced by screening the library with the most abundant clones before sequencing further clones: the P. halstedii elongation factor and the Helianthus annuus asparagine synthase, represented by 78 and 56 copies respectively, together account for about a third of the observed redundancy.
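The 72% figure is reproduced if redundancy is taken as the fraction of analysed clones that fall into multi-clone contigs. The sketch below assumes that definition; other EST studies may define redundancy differently, so cross-study comparisons are approximate. The "third of the redundancy" follows from the two most abundant clones' copy numbers given in this paragraph.

```python
def redundancy_rate(clones_in_contigs: int, total_clones: int) -> float:
    """Fraction of clones that fall into multi-clone contigs (assumed definition)."""
    if total_clones <= 0:
        raise ValueError("total_clones must be positive")
    return clones_in_contigs / total_clones


rate = redundancy_rate(398, 551)
# Copies of the elongation factor and asparagine synthase clones,
# as a share of all redundant clones.
top_two_share = (78 + 56) / 398

print(f"{rate:.0%}")           # 72%
print(f"{top_two_share:.0%}")  # 34%
```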

Origin of the EST
The 230 unisequences may correspond either to sunflower genes induced upon infection by P. halstedii or to sequences originating from the parasite. To resolve this, 230 primer pairs were designed and used to amplify the corresponding fragments from sunflower and P. halstedii DNA. This analysis unambiguously identified 145 ESTs originating from P. halstedii, approximately 63% of the unisequences. The sequences of these primers have been deposited in GenBank along with the EST sequences. When an EST belonged to P. halstedii, an amplification product was observed only with P. halstedii DNA, extracted from sporangia collected on infected sunflowers, as template. Conversely, when an EST originated from sunflower, a faint amplification product was often observed with P. halstedii DNA (Figure 1). This is due to contamination of the sporangia with sunflower cells when they are collected from infected cotyledons. Although this PCR strategy is robust, it is not suitable for high-throughput sequencing projects. Alternatively, in a similar study, Thara et al. [30] confirmed the fungal origin of several ESTs from the obligate basidiomycete Puccinia tritici by hybridization with radioactively labelled total genomic DNA. When the whole genome sequence of the host is available, as for the model plant A. thaliana, it can be exploited to distinguish ESTs originating from the host from those originating from the parasite. This strategy was used by van der Biezen et al. [14] to identify 7 genes from the oomycete Peronospora parasitica. The G+C content of ESTs has also been used to distinguish Phytophthora sojae from soybean cDNAs [31]: the average G+C content of soybean ESTs was 46%, whereas that of Phytophthora sojae ESTs was 58%, and plotting the ESTs from infected soybean produced two distinct peaks of G+C percentage [31]. In the present study, the G+C contents of ESTs from P. halstedii and from sunflower overlapped entirely, with averages of 47% and 45% respectively (data not shown), making this criterion unsuitable for determining the origin of an EST in this pathosystem.
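The G+C criterion rests on a simple per-sequence statistic. The sketch below computes it over unambiguous A/C/G/T bases only; ignoring ambiguity codes such as N is an assumption, since the text does not say how they were treated. With means of 46% and 58% (soybean versus P. sojae), a single cutoff between the two peaks separates most ESTs; with means of 45% and 47% and fully overlapping distributions, as here, no cutoff can.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


print(round(gc_percent("ATGCGC"), 1))  # 66.7
print(round(gc_percent("atatgc"), 1))  # 33.3
```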
In contrast, all sequences showing homology with sequences from other oomycetes, such as Phytophthora species, proved to originate from P. halstedii. The growing number of sequences produced by oomycete EST sequencing projects and the availability of the Phytophthora and H. parasitica genome sequences should therefore facilitate the analysis of pathogen genes expressed during host-dependent stages.

Functional classification of P. halstedii ESTs
Many of the ESTs identified in this study are associated with basic metabolism, such as energy production, carbohydrate metabolism, and nucleotide and protein synthesis. Protein synthesis is highly represented, which may indicate that this process is particularly active during infection. The most abundant EST [GenBank:CB174619] shows significant similarity to the tef1 elongation factor from Phytophthora infestans [32], which is highly expressed during spore germination and mycelium formation [33]. Interestingly, the product of this gene has affinity for actin and tubulin and may be involved in regulating the cytoskeleton [33].
The EST [GenBank:CB174624] shows homology with cyclophilin, which has peptidyl-prolyl cis-trans isomerase activity [34]. A cyclophilin gene has also been identified as a virulence factor in the rice blast fungus Magnaporthe grisea [35]. It would be interesting to test whether this gene has conserved functions in different plant pathogen species, as hypothesized by Thara et al. [30].
The homology found between one EST [GenBank:CB174646] and an elicitor from P. megasperma [36] indicates that the EST approach can rapidly generate sequences potentially involved in pathogenicity. Whether this gene has a similar function in P. halstedii, however, remains to be demonstrated experimentally. In addition, many ESTs share homology with stress-related genes such as superoxide dismutase [GenBank:CB174636] or glutathione peroxidase [GenBank:CB174637], which may indicate that P. halstedii faces a hostile environment within sunflower tissues and that such enzymes detoxify compounds released by the host, such as hydrogen peroxide. For example, in Mycobacterium tuberculosis, a superoxide dismutase contributes to the pathogen's resistance to the oxidative burst in macrophages [37].

Identification of P. halstedii genes potentially involved in the pathogenesis process
Two of the ESTs share significant homology with protease inhibitors from P. infestans. The first [GenBank:CB174657] contains a Kazal-like domain and is similar to the serine protease inhibitor EPI9 from P. infestans [24]. At least 35 Kazal-like serine protease inhibitors have been reported in different oomycetes, and two of these secreted proteins (EPI1 and EPI10) have been shown to interact with and inhibit the apoplastic pathogenesis-related protease P69B, a subtilisin-like serine protease of tomato [24,38]. Interestingly, Catanzariti et al. [39] showed that the flax rust avirulence gene AvrP123-A encodes a Kazal-like protein that is recognized by the P1 and P2 resistance genes in flax. The putative P. halstedii Kazal-like protein possesses all the conserved domains defining the Kazal-like family. The second P. halstedii protease inhibitor-like protein [GenBank:CB174713] shares similarity with a P. infestans cystatin-like protein [38]. Both sequences possess all the signature sequences of cystatin-like protease inhibitors, including the N-terminal trunk, the first loop and the conserved Trp within the second loop. These sequence similarities may reflect conserved physiological functions. It would therefore be interesting to test experimentally whether these proteins similarly inhibit proteases of infected sunflowers.

Identification of potentially shared factors with true fungi
We exploited the recently developed PHI-base [40], a database that catalogues the phenotypes resulting from mutations in defined genes of both plant and animal pathogens [26]. A homology search using the P. halstedii ESTs identified 11 sequences with significant matches in this database; mutation of the corresponding genes led to reduced virulence in the respective hosts. For instance, the P. halstedii EST [GenBank:CB174644] is highly similar to a thiol peroxidase (PHI:386) from the basidiomycete fungus Cryptococcus neoformans. Like other peroxidases, this enzyme removes peroxides and provides defence against oxidative damage, and mutation of its gene significantly reduced the virulence of C. neoformans in mice [41]. We have shown that resistance of sunflower to P. halstedii is associated with an oxidative-like burst within the hypocotyls [3]. It would therefore be interesting to test whether the P. halstedii thiol peroxidase gene plays a similar role in detoxifying peroxides in sunflower. Sequence homology does not necessarily imply a conserved function, yet many animal and plant pathogens appear to use common signaling cascades and protective compounds during their development and pathogenesis [42].

Conclusion
In this study we combined an EST approach with SSH PCR to obtain, for the first time, genes expressed by the mycelium of the obligate oomycete P. halstedii during infection.
Nevertheless, in the long-term effort to identify pathogenicity factors in P. halstedii, it will be necessary to test the physiological function of each EST by genetically transforming P. halstedii in planta, as has been done for Erysiphe graminis f. sp. hordei [43]. Transient expression of GFP has recently been reported in P. halstedii sporangia using electroporation and a mechanoperforation method [44]; however, GFP expression was lost during subsequent rounds of infection. Alternatively, it would be interesting to determine to what extent metabolic pathways are conserved among oomycetes, and whether heterologous expression of P. halstedii genes in a transformable oomycete such as P. infestans is useful. Overall, these resources should accelerate research on this important pathogen and could open new perspectives for controlling sunflower downy mildew.

P. halstedii isolate and culture conditions
A single isolate, identified as physiological race 300 of P. halstedii on sunflower differentials, was used. This isolate was provided by Dr D. Tourvieille (INRA, Clermont-Ferrand, France). It was collected from infected fields in the south of France in 1995 and has been maintained by asexual reproduction on the sunflower genotype Peredovick, which is susceptible to all known races of P. halstedii, in a containment culture chamber [45]. Sunflower differentials are regularly infected to check the behaviour of the isolate. The infection method for germinated sunflower seeds and the growing conditions were those described by Mouzeyar et al. [2].

Preparation of mRNA and Suppression Subtractive Hybridization library construction
Total RNA was extracted from 15-day-old infected and non-infected sunflower plants using the method described by Bogorad et al. [46], and polyadenylated mRNA was isolated with the PolyATract mRNA Isolation System (Promega France). Using the Clontech PCR-Select™ cDNA Subtraction kit [16], double-stranded cDNAs were prepared from the two mRNA populations under comparison. To enrich the library for P. halstedii transcripts, cDNA from infected plants was used as the tester and cDNA from non-infected plants as the driver; because subtraction removes sequences common to both populations, the library retains pathogen transcripts as well as host transcripts induced upon infection. After subtraction, the PCR-amplified cDNAs were cloned into the pGEM-T Easy vector and transformed into E. coli strain JM109 (Promega France).

Sunflower and P. halstedii DNA extraction
Sunflower DNA was extracted from healthy young leaf tissue using the Nucleon PhytoPure kit (Amersham™). To extract P. halstedii DNA, 15-day-old infected seedlings were placed in a plastic bag for 2 days to induce sporulation on cotyledons and leaves. Zoosporangia were collected by gently washing the cotyledons and leaves with sterile water, and DNA was extracted using the Nucleon PhytoPure kit (Amersham™ France).

DNA sequencing
Clones from the subtracted library were selected at random and grown in 96-well plates. Each EST was single-pass sequenced using the dye-terminator method and the T7 primer (Genome Express, France).

Sequence analysis
Sequences with low-quality bases (Phred score below 20) and short sequences (<100 nucleotides) were removed from further analysis. Vector and polylinker sequences were trimmed manually. Sequences were then assembled into unisequences using the BlastN algorithm [17] and the CAP3 program [18]. Similarity searches were performed using the BlastX program [17] against the current version of the NCBI "nr" non-redundant protein database.
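A Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so the Q20 threshold means roughly a 1% chance that a base is wrong. The exact filtering procedure is not specified. The sketch below assumes one common approach: low-quality bases are trimmed from both read ends, and the read is discarded if fewer than 100 bases remain. The function names and this trim-then-filter order are assumptions, and manual vector trimming is not modelled.

```python
from typing import Sequence


def phred_to_error(q: float) -> float:
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def trim_low_quality_ends(seq: str, quals: Sequence[int], min_q: int = 20) -> str:
    """Drop leading and trailing bases whose Phred score is below min_q."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end]


def passes_filter(seq: str, quals: Sequence[int],
                  min_q: int = 20, min_len: int = 100) -> bool:
    """Keep a read only if at least min_len bases survive end trimming."""
    return len(trim_low_quality_ends(seq, quals, min_q)) >= min_len


# Synthetic 120-base read: poor-quality ends around a good 100-base core.
read = "ACGT" * 30
quals = [12] * 10 + [35] * 100 + [15] * 10
print(passes_filter(read, quals))               # True
print(passes_filter(read, quals, min_len=101))  # False
```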

Comparison with true fungi and other microbes for conserved virulence factors
PHI-base, which describes curated genes involved in pathogenicity in both animal and plant pathogens, was searched for homology with the P. halstedii ESTs [26]. Version 2.31, containing 592 proteins, was downloaded and searched with BlastX using the BioEdit package v7.0.8 [47]. Hits with an E-value < 1e-5 were considered significant.
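The search above was run through BioEdit. As an alternative that is not the authors' workflow, the same E-value cutoff could be applied to command-line BLAST+ results written with `blastx -outfmt 6`, whose default tabular output has the 12 columns listed below. The sketch assumes that format; the rows in the usage example and their identifiers (`est_001`, `phi_a`, and so on) are made up.

```python
import csv
import io

# Column order of BLAST+ "-outfmt 6" (default tabular) output.
OUTFMT6 = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_queries(tabular_text: str, max_evalue: float = 1e-5) -> set[str]:
    """Return query IDs with at least one hit whose E-value is below max_evalue."""
    hits = set()
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):  # skip blanks and comment lines
            continue
        record = dict(zip(OUTFMT6, row))
        if float(record["evalue"]) < max_evalue:
            hits.add(record["qseqid"])
    return hits


# Hypothetical rows: one hit passes the 1e-5 cutoff, one does not.
example = "\n".join([
    "\t".join(["est_001", "phi_a", "45.0", "120", "62", "2",
               "1", "360", "10", "129", "3e-12", "85.1"]),
    "\t".join(["est_002", "phi_b", "28.0", "80", "55", "3",
               "1", "240", "5", "84", "0.02", "30.4"]),
])
print(significant_queries(example))  # {'est_001'}
```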

Primers design and PCR amplification
Primer pairs were designed using the Primer3 program [48] and used to amplify each EST from DNA isolated from P. halstedii or from a sunflower line. PCR amplifications were carried out with 50 ng DNA in the presence of 0.2 mM of each dNTP, 1 U of Taq DNA polymerase (Advantage 2, Clontech), 1× Taq polymerase buffer and 0.5 μM of each primer. PCR was performed in a Perkin-Elmer 9600 thermocycler under the following conditions: 35 cycles of 10 s at 94°C (denaturation), 30 s at 60°C (primer annealing) and 1 min 30 s at 72°C (primer extension). PCR products were separated by standard TAE agarose gel electrophoresis.