Population diversity and antibody selective pressure to Plasmodium falciparum MSP1 block2 locus in an African malaria-endemic setting

Background: Genetic evidence for diversifying selection identified the Merozoite Surface Protein 1 block2 (PfMSP1 block2) as a putative target of protective immunity against Plasmodium falciparum. The locus displays three family types and one recombinant type. Each has multiple allelic forms that differ by single nucleotide polymorphisms and by the sequence, copy number and arrangement of tripeptide repeats. The family-specific antibody responses observed in endemic settings support immune selection operating at the family level. However, the factors contributing to the large intra-family allelic diversity remain unclear. To address this question, population allelic polymorphism and sequence variant-specific antibody responses were studied in a single rural Senegalese community where malaria transmission is intense and perennial.

Results: Family distribution showed no significant temporal fluctuation over the 10-year period surveyed. Sequencing of 358 PCR fragments identified 126 distinct alleles, including numerous novel alleles in each family and multiple novel alleles of recombinant type. The parasite population consisted of a large number of low-frequency alleles, alongside one high-frequency and three intermediate-frequency alleles. Population diversity tests supported positive selection at the family level. They showed no significant departure from neutrality when considering intra-family allelic sequence diversity and all families combined. Seroprevalence, analysed using biotinylated peptides displaying numerous sequence variants, was moderate and increased with age. Reactivity profiles were individual-specific. They mapped to the family-specific flanking regions and to repeat sequences shared by numerous allelic forms within a family type. Seroreactivity to the K1, Mad20 and RO33 families correlated with the relative family genotype distribution within the village. Antibody specificity remained unchanged with cumulated exposure to an increasingly large number of alleles.

Conclusion: The Pfmsp1 block2 locus presents a very large population sequence diversity. The lack of stable acquisition of novel antibody specificities despite exposure to novel allelic forms is reminiscent of clonal imprinting. The locus appears to be under antibody-mediated diversifying selection in a variable environment. This environment maintains a balance between the various family types without selecting for sequence-variant allelic forms. There is no evidence of positive selection for intra-family sequence diversity, consistent with the observed characteristics of the antibody response.


Background
Around 40% of the world's population is at risk of malaria. Widespread parasite drug resistance and mosquito insecticide resistance call for the urgent development of new control tools, including malaria vaccines. Rational vaccine development is challenged by the complexity of the parasite life cycle and by the large number of potential vaccine targets [1,2]. The search for genetic evidence of diversifying selection has been proposed as a strategy to identify major targets of protective immunity [3]. Several antigens under putative immune selection have been uncovered this way [4-7], including the N-terminal polymorphic domain of merozoite surface protein 1 (MSP1), called MSP1 block2 [3].
The K1 and MAD20 MSP1 block2 families are characterised by the presence of central tripeptide repeats.
The various K1- and MAD20-type block2 alleles differ in the number, sequence and relative arrangement of the tripeptide repeats, and in point-mutation polymorphism of the flanking regions. The non-repetitive RO33 alleles differ only by point mutations [8]. A fourth family type, called MR, was identified more recently and results from recombination between the Mad20 and RO33 families [11,16]. Within each MSP1 block2 family, multiple sequence variants have been described.
Antibody responses have been analysed in humans living in endemic areas [3,23-25,28,30-33,36]. These studies used up to four full-length recombinant proteins per family, together with recombinant sub-domains expressed in Escherichia coli, such as repeats alone or flanking regions alone. They showed family-specific responses, with no inter-family cross-reactivity. Antibodies to specific sub-types within each family were observed as well [23,25,28,31], and their prevalence varied with malaria transmission conditions [23,24,28]. The antigenic consequences of sequence variation at the single-epitope level were monitored using arrays of synthetic peptides [15,26,27,29]. Interestingly, this showed that sera from mice immunised with a full-length recombinant protein reacted with peptides derived from the immunising allele but not with any of its sequence variants [23,27]. Sequence-dependent specificity of individual epitopes was similarly demonstrated using monoclonal antibodies [15,22,37]. In African populations exposed to P. falciparum, the response to MSP1 block2 was assessed using synthetic sequence variants and displayed a restricted specificity [15,26,27].
The antibody response to MSP1 block2 correlated with PCR typing of the parasites present at the time of plasma collection in some settings [25]. It correlated only weakly in others [3,31] and not at all in yet others [27,33]. In Senegal, the fine specificity of antibodies to MSP1 block2 did not match the infecting type. Moreover, it was fixed over time, with no novel antibody specificity acquired upon cumulated exposure to multiple infections [27]. The interpretation of these studies has been limited in three ways:

- molecular sequence data and sequence-specific serological responses were not gathered from the same population or setting [15];
- sequence data were generated without exploring the immune response [9-14,16,17];
- immunological responses were studied without detailed knowledge of the actual sequence polymorphism of the local population [23-28,30,33].

Thus, it remains to be determined whether acquired antibodies to MSP1 block2 select for parasites presenting novel sequence variants, and whether they exert significant diversifying selection at the epitope level.
We set out to address this question by analysing Pfmsp1 block2 sequence polymorphism and sequence-specific antibody responses, using archived samples collected in Dielmo, a rural Senegalese setting. We analysed the sequence polymorphism of the locus over a 10-year period to gain a view of its overall polymorphism and possible temporal evolution. We explored the humoral response of the villagers to MSP1 block2 using synthetic peptides displaying numerous sequence variants. The serological studies comprised three parts:

- a cross-sectional study measuring point prevalence at the village level before a rainy season;
- a prospective study exploring the relationship between the presence of antibodies to MSP1 block2 at enrolment and protection from clinical malaria episodes during the following five months of intense transmission;
- longitudinal follow-up of individuals to study temporal antibody variation.

This showed evidence for family-specific responses possibly exerting balancing selection. It gave no support to the notion of antibody selection for variant sequence alleles.
Many samples contained more than one Pfmsp1 block2 type. The average multiplicity of infection was estimated from the number of fragments detected (estimated moi, see Methods) and was 1.73 Pfmsp1 block2 fragments/sample. This figure most probably does not reflect the actual number of distinct clones present in the patient, for two reasons. Distinct Pfmsp1 block2 alleles of similar size are not distinguished. In addition, parasites with identical Pfmsp1 block2 alleles may differ at multiple other loci across their genome.
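The estimated moi is a per-sample mean of detected fragments. The minimal sketch below shows why co-migrating alleles make this a possible underestimate. It assumes each clone can be summarised as a hypothetical (family, fragment size) pair. Same-family clones of identical size then collapse into a single fragment. The function names and fragment sizes are illustrative and are not taken from the study.

```python
from statistics import mean

def fragments_detected(clones):
    """Count the PCR fragments distinguishable in one sample.

    Each clone is a (family, fragment_size_bp) pair. Clones of the same family
    with the same fragment size co-migrate and are counted once, so this count
    can underestimate the number of clones actually present.
    """
    return len(set(clones))

def estimated_moi(samples):
    """Mean number of detected Pfmsp1 block2 fragments per sample."""
    return mean(fragments_detected(clones) for clones in samples)

# Hypothetical samples; fragment sizes are illustrative, not study data.
samples = [
    [("K1", 180)],
    [("K1", 180), ("RO33", 160)],
    [("MAD20", 150), ("MAD20", 150), ("K1", 200)],  # two same-size MAD20 clones
]
# Prints about 1.67, whereas the true mean is 6 clones / 3 samples = 2.0.
print(estimated_moi(samples))
```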
The number of Pfmsp1 block2 fragments detected was influenced by age (Kruskal-Wallis test, p = 0.0192) (Figure 2). It was highest in children aged 2-5 y and 6-9 y, and lowest in those aged ≥ 20 y. It was not associated with gender (Kruskal-Wallis test, p = 0.670), β-globin type (idem, p = 0.482), ABO blood group (idem, p = 0.234), Rhesus blood group (idem, p = 0.839) or year of study (idem, p = 0.508).
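The Kruskal-Wallis comparisons above rank all fragment counts together and ask whether the mean ranks differ between groups. The standard-library-only sketch below computes the tie-corrected H statistic and its chi-square p-value with k - 1 degrees of freedom. It is an illustration and is not the software used in the study. The function names are hypothetical. With many tied small counts, as fragment counts are, the chi-square approximation is only approximate.

```python
import math
from collections import Counter

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_r x^(r-1/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(root / math.sqrt(2)) + 2 * phi * total

def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and its chi-square p-value."""
    data = [v for g in groups for v in g]
    n = len(data)
    ranks = average_ranks(data)
    h, start = 0.0, 0
    for g in groups:
        h += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(data).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)
```

With the five age groups of Figure 2, the call would be `kruskal_wallis(g0_1, g2_5, g6_9, g10_19, g20_plus)`, giving 4 degrees of freedom. Each argument is a hypothetical list of fragments detected per sample.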
Alleles were assigned to one of three allelic families by nested PCR. Distribution is shown by calendar year. The number of samples typed each year is shown in Table 1. Colour symbols: black, K1-types; white, Mad20-types; grey, RO33-types. Note that hybrid alleles were not distinguished from the Mad20-types and are included in the Mad20 group.
Figure 2. Estimated multiplicity of infection by age group. The estimated multiplicity of infection (i.e. the mean number of Pfmsp1 block2 alleles detected per sample) was calculated from the PCR fragments generated in the nested PCR reaction. There were 51, 83, 61, 60 and 51 samples in the 0-1 y, 2-5 y, 6-9 y, 10-19 y and ≥ 20 y age groups, respectively. The figures shown are the mean and SD.
Similar findings were observed for the Mad20-type alleles. These differed mainly in the number, arrangement and coding sequence of six tripeptide motifs (coded 5-9). Two synonymous sequences coded for SGG (5 and 5), so all Mad20-type alleles contained an SGG-encoding motif [see Additional file 4]. In this family too, all alleles contained more than one motif sequence. The majority had four distinct nucleotide sequence motifs (Figure 4B), encoding three different tripeptide sequences (Figure 4C). Some di-motifs were highly represented; the SVA SGG motif (6 5 or 6 5) was present in virtually all alleles. The family split into two groups according to the first 5' motif, which was either 5/5 (group 1, 8 alleles) or 8 (group 2, 26 alleles) (Table 2). This group-specific 5' end was followed by a variable copy number and arrangement of six di-motif sequences. At the protein level, these translated into variable combinations of the SGG and SVA tripeptides. All Mad20-type block2 repeats except two (DM9 and DM29) terminated with the (5 6 5) sequence. The non-repeated flanking region upstream of the tripeptide motifs was identical in all alleles. Downstream of the repeats, a 9-amino-acid deletion (NSRRTNPSD) was observed in three alleles; otherwise the family-specific region was monomorphic.

Codes for the K1- and Mad20-type tripeptide repeats, together with their nucleotide sequences, are as proposed in [9,12]. K1-type allele sequences are grouped, as described in the text, according to the 5' di-motif and the presence of motifs 4 and 7. Mad20-type allele sequences are grouped according to the 5' motifs.

Frequency distribution of the number of tripeptide motifs used in the DK and DM alleles.

Figure 5. Distribution of Pfmsp1 block2 allele frequency in Dielmo.
A. Distribution by family, based on sequenced alleles: K1-types (N sequenced = 144), Mad20-types grouped together with hybrid types (N sequenced = 90) and RO33-types (N sequenced = 124). Each family is depicted separately, with alleles ranked clockwise by allele number, coded as shown in Table 2.
B. Relative individual allele frequency in the 358 sequenced fragments (top), and the same frequencies adjusted to the overall population (bottom). The adjustment uses the relative family distribution established by nested PCR on 524 PCR fragments.
Identical colour codes are used for A and B, ordered clockwise as follows: RD types (light blue colours), hybrids (green and orange), DM (orange-yellow) and DK alleles (indigo-dark blue colours). Alleles are ranked clockwise by allele number, coded as shown in Table 2.
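Figure 5B (bottom) adjusts sequenced-allele frequencies to the overall population using the family distribution from nested PCR. The sketch below shows one plausible way to perform such an adjustment; it is assumed here and not taken from the Methods. Each allele's frequency within its family is multiplied by that family's share of the nested-PCR fragments. This treats each family's sequenced fragments as a random sample of that family. The family totals (144, 90, 124) and RD0 = 97 of 124 follow the text. The remaining allele splits and the family shares are hypothetical.

```python
def population_allele_frequencies(sequenced_counts, family_share):
    """Rescale within-family allele frequencies to the whole parasite population.

    sequenced_counts: {family: {allele: number of sequenced fragments}}
    family_share: {family: relative frequency of that family by nested PCR}
    Assumes each family's sequenced fragments are a random sample of that family.
    """
    total_share = sum(family_share.values())
    frequencies = {}
    for family, alleles in sequenced_counts.items():
        n_family = sum(alleles.values())
        weight = family_share[family] / total_share
        for allele, count in alleles.items():
            frequencies[allele] = weight * count / n_family
    return frequencies

# Family totals and RD0 = 97/124 follow the text; other splits are hypothetical.
sequenced = {
    "K1": {"K1 allele a": 60, "K1 allele b": 84},
    "Mad20/MR": {"Mad20 allele a": 50, "Mad20 allele b": 40},
    "RO33": {"RD0": 97, "other RO33": 27},
}
shares = {"K1": 0.40, "Mad20/MR": 0.35, "RO33": 0.25}  # hypothetical family shares
freqs = population_allele_frequencies(sequenced, shares)
print(round(freqs["RD0"], 3))  # 0.196: dominant within RO33, diluted at population level
```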
Sequencing showed that 22 fragments assigned to the Mad20 family by semi-nested PCR were in fact Mad20/RO33 (MR) hybrids. We are confident that these alleles are bona fide hybrids and not artefactual PCR products, for two reasons. First, they were observed in 14 of the 22 samples in which no RO33 allele could be detected using multiple family-specific nested PCR reactions. Second, 7 of the 22 samples in which the MR allele was detected by sequencing were mono-infections, so no second template was available for PCR template switching. This MR hybrid family was quite diverse, with eight alleles observed. Allele DMR1 carried a group 1-type Mad20 sequence, while alleles DMR2-8 derived from Mad20 group 2. All DMR alleles carried the same 25-residue RO33-type downstream region. Interestingly, this region was of the RD5 allelic type, with a G97D D104N double mutation (Table 2). A novel hybrid, DMRK, displayed an RO33-K1 hybrid sequence in the family-specific 3' region; the 3' K1 sequence is underlined in Table 2 [for further analysis see Additional file 4].
The large local diversity was associated with a large number of low-frequency alleles in the K1 and Mad20/MR family types. This contrasted with the RO33 family, in which a dominant RD0 allele accounted for 78% (97 of 124) of the sequenced RO33-type alleles (Figure 5A). At the population level (Figure 5B)

Tests for neutrality
To gain insight into possible positive selection on this locus, Ewens-Watterson-Slatkin tests for neutrality [38,39] were conducted. The aim was to determine both the level (family and/or intra-family) and the type of selection operating. The family-level analysis grouped alleles by family type, i.e. three families irrespective of size or sequence polymorphism. It showed a significant departure from neutrality on a yearly basis and when all years were grouped together, i.e. over the 10-year period (Table 3). Thus, there was evidence for balancing selection at the family level in this setting, with observed homozygosity lower than expected under neutrality (Table 3).
We then considered the within-family diversity of the K1, Mad20/hybrid (DMR and DMRK) and RO33 alleles separately, to look for evidence of selection within each family (Table 3, lower panels). Tests were performed for each year separately and for the 10-year period. Alleles were differentiated either by size polymorphism alone or by both size and sequence polymorphism. Overall, the null hypothesis of neutrality was not rejected, so there was no evidence for significant within-family balancing selection on the Pfmsp1 block2 locus. However, the results of these Ewens-Watterson-Slatkin tests need to be interpreted with caution. The tests assume that no recurrent mutation has occurred at the locus studied. The mutation rate is known to be high in minisatellite/repetitive sequences, so this assumption may be violated. Recurrent mutations may therefore have occurred and artificially reduced our power to detect balancing selection at the intra-family level.
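These tests compare the observed homozygosity F = Σ pᵢ² with its distribution under neutrality, for the same sample size n and number of alleles k. The sketch below is a Monte Carlo version of that comparison. It is not the implementation of [38,39] used in the study, and all function names are hypothetical. It draws neutral samples with Hoppe's urn, i.e. the Ewens sampling formula of the infinite-alleles model, and keeps only samples with exactly k alleles. The Ewens distribution conditioned on k does not depend on θ, so θ only tunes the acceptance rate.

```python
import random
from statistics import mean

def homozygosity(counts):
    """Sum of squared allele frequencies (Watterson's F)."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)

def hoppe_urn_sample(n, theta, rng):
    """Allele counts for n genes drawn sequentially under the Ewens sampling formula."""
    counts = []
    for i in range(n):
        if rng.random() < theta / (theta + i):   # new allele (always when i == 0)
            counts.append(1)
        else:                                    # copy one of the i genes drawn so far
            r = rng.randrange(i)
            for j, c in enumerate(counts):
                if r < c:
                    counts[j] += 1
                    break
                r -= c
    return counts

def theta_matching_k(n, k):
    """theta whose expected number of alleles, sum theta/(theta+i), equals k."""
    lo, hi = 1e-6, 1e6
    for _ in range(100):                         # bisection on a log scale
        mid = (lo * hi) ** 0.5
        if sum(mid / (mid + i) for i in range(n)) < k:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5

def ewens_watterson_test(counts, reps=1000, seed=1):
    """Return observed F, mean neutral F, and P(F_neutral <= F_observed).

    A small lower-tail probability means alleles are more evenly spread than
    neutrality predicts, the pattern expected under balancing selection.
    """
    n, k = sum(counts), len(counts)
    f_obs = homozygosity(counts)
    rng = random.Random(seed)
    theta = theta_matching_k(n, k)
    sims = []
    while len(sims) < reps:
        sample = hoppe_urn_sample(n, theta, rng)
        if len(sample) == k:                     # rejection step: condition on k
            sims.append(homozygosity(sample))
    p_lower = sum(f <= f_obs for f in sims) / reps
    return f_obs, mean(sims), p_lower
```

For a family-level test, `counts` would be the numbers of K1, Mad20 and RO33 fragments in a given year. A lower-tail probability below 0.05 would correspond to the "homozygosity lower than expected" pattern reported in Table 3.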
Within the 124 RO33 PCR fragments sampled, there was no size polymorphism, and six different allele sequences were identified. An alignment of 126 nucleotides for all 124 alleles contained five polymorphic sites, all of which were non-synonymous single nucleotide polymorphisms. As no synonymous polymorphism was observed, dS = 0 and dN/dS is formally infinite. Nucleotide diversity was 4.84 × 10^-3 (π, the average number of nucleotide differences per site between any two sequences). Tajima's D and Fu and Li's D* and F* were calculated [40,41] to examine whether natural selection acts on the RO33 family. Given the high number of segregating sites (S = 5), these tests are expected to have high statistical power to detect natural selection. No evidence for departure from neutrality was obtained: Tajima's D and Fu and Li's D* and F* values were all non-significant (Table 4). This confirms the results obtained using the Ewens-Watterson test.
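Nucleotide diversity and Tajima's D can both be computed directly from an alignment such as the 126-nt RO33 alignment. The sketch below uses the standard Tajima (1989) constants. D compares the mean pairwise difference with Watterson's estimate S/a₁. Fu and Li's D* and F* are not covered. The function names and the toy alignment are hypothetical.

```python
import math
from itertools import combinations

def mean_pairwise_differences(seqs):
    """Average number of nucleotide differences between two sequences (k-hat)."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    return sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

def nucleotide_diversity(seqs):
    """pi per site: mean pairwise differences divided by alignment length."""
    return mean_pairwise_differences(seqs) / len(seqs[0])

def segregating_sites(seqs):
    """Number of alignment columns carrying more than one nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))

def tajimas_d(seqs):
    """Tajima's D; defined for n >= 4 sequences and at least one segregating site."""
    n, s = len(seqs), segregating_sites(seqs)
    if n < 4 or s == 0:
        raise ValueError("need n >= 4 sequences and S > 0")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    numerator = mean_pairwise_differences(seqs) - s / a1
    return numerator / math.sqrt(e1 * s + e2 * s * (s - 1))

toy = ["AAAA", "AAAA", "AAAT", "AATT"]  # hypothetical 4-nt alignment
print(nucleotide_diversity(toy), segregating_sites(toy), tajimas_d(toy))
```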

Anti-MSP1 block2 antibody prevalence and specificity
The sequence-specific antibody response was studied by ELISA using biotinylated MSP1 block2-derived peptides bound to streptavidin-coated plates. Together, these peptides gave fair coverage of the sequence diversity observed in the village [see Additional file 9]. Any individual reacting with one or more peptides was recorded as seropositive. Seroprevalence was analysed at the village level using samples from an archived cross-sectional study conducted at the beginning of the 1998 rainy season, to which 85% of the villagers had contributed. Overall, seroprevalence was 25% (62 of 243 sera analysed). Seroprevalence increased with age and reached 40.5% in adults (Figure 6). Confirming previous observations in this setting [26,27], all anti-block2 IgG was of the IgG3 subclass [see Additional file 10]. No anti-block2 IgM was detected.
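The seropositivity rule above (reaction with at least one peptide pool) and the one/two/three-family breakdown reported with Figure 7B can be tallied as sketched here. Pool 11 is assigned to K1 as in the text, since it carries the K1 block1/block2 junction peptides. Pools 1 and 2 (block1 and block3) are treated as non-family-specific. All other pool-to-family assignments, the sera and the function names are hypothetical.

```python
from collections import Counter

# Pool 11 -> K1 and the family-neutral pools 1 and 2 follow the text;
# the remaining assignments are hypothetical placeholders, not Table 5.
POOL_FAMILY = {1: None, 2: None, 11: "K1", 5: "Mad20", 8: "Mad20", 14: "RO33"}

def families_recognised(positive_pools):
    """Set of block2 families recognised by one plasma sample."""
    return {POOL_FAMILY[p] for p in positive_pools if POOL_FAMILY.get(p)}

def summarise(sera):
    """sera maps subject id -> set of positive pools (empty set = seronegative)."""
    positives = {sid: pools for sid, pools in sera.items() if pools}
    prevalence = len(positives) / len(sera)
    n_families = Counter(len(families_recognised(p)) for p in positives.values())
    return prevalence, n_families

sera = {"s1": set(), "s2": {11}, "s3": {11, 8}, "s4": {1}}  # hypothetical sera
prevalence, n_families = summarise(sera)
# 0.75; one subject each recognising 0, 1 and 2 families
print(prevalence, dict(n_families))
```

Applied to the village survey, 62 positive sera out of 243 would give 62/243 ≈ 0.255, the 25% reported above.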
Figure 6. Prevalence of anti-MSP1 block2 IgG by age group. Seroprevalence was determined using sera collected during a cross-sectional survey conducted before the 1998 rainy season (on 2-3 August 1998). During this survey, 243 villagers (i.e. 95% of the village population) donated a fingerprick blood sample. The presence of anti-MSP1 block2-specific IgG was assessed by ELISA on 16 pools of biotinylated peptides (sequence and composition of the pools described in Table 5). Plasma reacting with one or more pools was considered seropositive.

Table 5 note: The 15-mer peptide sequence is represented in single-letter code, and the location of the peptide in the region is indicated. Pools contained equimolar amounts of four to six biotinylated peptides (0.1 nM each).

The frequency of recognition of each allelic family mirrored the frequency distribution of the family types within the parasite population (Figure 7A). The antibody reaction was family-specific and usually restricted to one family. Of the positive plasma samples, 73% reacted with one allelic family, 23% with two and only 4% with all three (Figure 7B), consistent with our previous survey in this village [27]. Figure 7C shows that antibody responses to pools 1 and 2 were rare; these pools derive from the adjacent block1 and block3 domains, respectively. No immunodominant region was identified within block2. Antibodies to the repeats were detected alongside antibodies to the family-specific N- or C-terminal block2 sequences. We examined the response at the individual peptide level, in the context of the allelic sequence diversity within this village. Interestingly, antibodies reacted with motifs displayed by the vast majority of the alleles observed in the village. These were either frequent/universal di-motifs (such as motif 31 in the K1 family) or family-specific unique sequences. Indeed, 24 of 26 villagers with antibodies to K1-type peptides reacted with sequences present in 74 or more of the 77 observed K1 alleles. Similarly, all 16 responders to Mad20-type peptides reacted with sequences present in 32 or more of the 34 observed alleles.
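The coverage argument above amounts to counting, for each recognised peptide, how many local allele sequences contain it. An example is the finding that 24 of 26 K1 responders recognised sequences present in at least 74 of the 77 K1 alleles. The minimal sketch below assumes exact substring matching, which ignores cross-reactivity between sibling peptides. It uses toy sequences assembled from the SGG and SVA tripeptides rather than real Dielmo alleles, and the function names are hypothetical.

```python
def allele_coverage(peptide, alleles):
    """Number of allele protein sequences that contain `peptide` verbatim."""
    return sum(peptide in sequence for sequence in alleles.values())

def widely_shared(peptides, alleles, min_fraction=0.95):
    """Peptides displayed by at least `min_fraction` of the alleles."""
    n = len(alleles)
    return [p for p in peptides if allele_coverage(p, alleles) >= min_fraction * n]

# Toy Mad20-like repeat regions; not sequences observed in Dielmo.
alleles = {
    "toy1": "SGGSVASGGSVASGG",
    "toy2": "SGGSGGSVASGG",
    "toy3": "SVASGGSVASVASGG",
}
peptides = ["SVASGG", "SGGSGG", "SVASVA"]
print(widely_shared(peptides, alleles))  # ['SVASGG']
```

A peptide such as "SVASGG" is shared by every toy allele, so antibodies to it would react with any of them. The other two peptides each occur in a single allele and behave like sequence variant-specific targets.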
In addition to the family-specific antibodies, some villagers had sequence variant-specific antibodies. These villagers reacted with only one of a set of sibling peptides, while others reacted with multiple sibling peptides displaying sequence variants. One example is the group of sibling peptides derived from the N-terminus of Mad20 block2 (peptides #04, 13, 25, 11 and 29). Some villagers reacted with one peptide (#29), whilst others reacted with two (#29 and #04, or #29 and #11), but none reacted with all five peptides. Another is the group of sibling peptides derived from the K1 block1/block2 junction (peptides #46, 61 and 74). Some villagers reacted with one (#61), some with two (#61 and #74) and some with all three peptides. This suggests that sequence variation does translate into antigenic polymorphism. It is unclear whether antibody reaction with multiple sequence variants reflects serological cross-reaction or the accumulation of distinct antibody specificities.

Antibodies to MSP1 block2 and subsequent clinical malaria
To look for a putative association between anti-MSP1 block2 antibodies and protection against clinical malaria during the subsequent high-transmission season, we mined the database for clinical attacks within the 5-month period following the 1998 cross-sectional blood sampling studied above.

Figure 7. […] Table 5). Plasma reacting with one or more pools was considered seropositive. Seropositive plasma was grouped by family, irrespective of the number of peptide sequences recognised within each of the three family types. MR alleles were not treated as a separate family; seropositivity was allocated either to Mad20 or to RO33. The relative distribution of family genotypes was established by nested PCR on 306 samples collected longitudinally during the 1990-1999 period, as shown in Table 1. Colour codes: K1, dark blue; Mad20, orange; RO33, light blue.
B) Frequency of plasma with antibodies reacting with one, two and three allelic families. The number of families recognised is shown irrespective of the actual type recognised. For example, individuals reacting with only K1-types, only Mad20-types or only RO33-types are placed together in the group reacting with one family.
C) Frequency of reaction with each peptide pool.

infected bites/person during this time period.
Twenty-nine percent of the seronegative individuals (with no detected anti-MSP1 block2 antibodies) experienced a clinical attack during that period, compared with 15% of individuals with anti-block2 antibodies. Crude incidence rate ratios (IRR) of malaria attacks were estimated using a Poisson regression model, with individuals without antibodies as the reference group. For antibodies to one allelic family, the IRR was 0.55 (95% CI: 0.38-0.80); for antibodies to ≥ 2 families, it was 0.21 (95% CI: 0.08-0.58) (P < 0.0001). In a multivariate Poisson regression analysis, this association was independent of haemoglobin type and ethnic group. However, it was confounded by age: within age groups, there was no significant association between the incidence of clinical malaria attacks and the number of MSP1 block2 allelic families recognised.
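With a single categorical exposure (no antibodies, one family or ≥ 2 families recognised), the crude IRR from a Poisson regression with a log person-time offset equals the ratio of the observed incidence rates. Its Wald 95% CI is exp(log IRR ± 1.96 √(1/d₁ + 1/d₀)), where d₁ and d₀ are the numbers of attacks in the compared groups. The sketch below applies this identity to hypothetical counts. The study's own numbers of attacks and person-time are not given in this section, and the adjusted (multivariate) model is not reproduced. The function name is hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile

def crude_irr(events_exposed, time_exposed, events_reference, time_reference):
    """Crude incidence rate ratio with a Wald 95% CI computed on the log scale."""
    if events_exposed == 0 or events_reference == 0:
        raise ValueError("the Wald interval needs at least one event per group")
    irr = (events_exposed / time_exposed) / (events_reference / time_reference)
    se_log = math.sqrt(1 / events_exposed + 1 / events_reference)
    return irr, irr * math.exp(-Z_95 * se_log), irr * math.exp(Z_95 * se_log)

# Hypothetical: 10 attacks over 100 person-months with antibodies,
# 20 attacks over 100 person-months without.
irr, lower, upper = crude_irr(10, 100, 20, 100)
print(f"IRR = {irr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Because the interval is symmetric on the log scale, the product of its bounds equals IRR². This is a quick consistency check on published intervals, such as 0.38-0.80 around 0.55.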

Analysis of the response during a high transmission season
To study the impact of novel infections during the transmission season on the humoral response to MSP1 block2, we investigated the fingerprick blood samples collected from 25 seropositive individuals throughout the high transmission season.
By the end of December 1998, i.e. five months after the cross-sectional sampling, the anti-MSP1 block2 antibody level had changed as follows:

- it was reduced ≥ 2-fold in 15 subjects (60%) (typical profile in Figure 8, upper panel);
- it varied less than 2-fold in 9 individuals (36%) (typical profile in Figure 8, middle panel);
- it was ≥ 2-fold higher in one individual (Figure 8, lower panel).

Importantly, when a change was observed, it concerned the intensity of the reaction but not its specificity. Responding individuals usually reacted with the same pool(s) before and after the transmission season, and within the pool(s) with the same individual peptide(s). No studied individual stably acquired novel antibody specificities during that period, despite an elevated infection rate.
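The three categories above follow a simple 2-fold rule on the ratio of the December signal to the August signal. For 25 subjects, the 15/9/1 split corresponds to 60%, 36% and 4%. The sketch below assumes positive, background-subtracted signals, for example ELISA optical densities. The signal pairs and the function name are hypothetical.

```python
from collections import Counter

def classify_change(before, after, fold=2.0):
    """Classify the change in a positive antibody signal between two samples."""
    if before <= 0 or after <= 0:
        raise ValueError("signals must be positive, e.g. background-subtracted ODs")
    ratio = after / before
    if ratio <= 1 / fold:
        return f"decreased >= {fold:g}-fold"
    if ratio >= fold:
        return f"increased >= {fold:g}-fold"
    return f"changed < {fold:g}-fold"

# Hypothetical (August, December) signals for four subjects.
pairs = [(1.2, 0.4), (0.9, 0.8), (0.6, 1.5), (1.0, 0.5)]
tally = Counter(classify_change(before, after) for before, after in pairs)
for label, count in tally.items():
    print(f"{label}: {count} ({100 * count / len(pairs):.0f}%)")
```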
We then followed up the antibody responses of villagers who experienced clinical malaria during the 5-month transmission season. We used archived fingerprick sera collected monthly and, when available, sera collected on the day of the clinical malaria episode. Transient fluctuations were observed. In some cases, a pre-existing response was boosted (see a representative example in Figure 9A). In others, antibodies decreased (Figure 9B) or a short-lived response appeared (Figure 9C). The same patterns were observed in children experiencing multiple clinical episodes during the same period (Figure 9D). Parasites from peripheral blood collected at diagnosis of the clinical malaria episode were genotyped in 10 subjects. All three allelic families were detected in nine of them, and the remaining individual harboured only two allelic families. In all 10 cases, infection with an allele for which no pre-existing response was evident did not elicit any long-lasting novel antibody specificity.

Long term temporal analysis of the response to MSP1-block2
To analyse antibody patterns over several years, we used archived systematic blood samples collected during the longitudinal survey. Confirming a previous study in this village [27], once acquired, the response to MSP1 block2 was essentially fixed over time. A typical example is shown in Figure 10: a 6-year follow-up of child 01/13, starting at 6 months of age. The child had been exposed to a mean of 200 infected bites per year over the six years. This child recognised a single peptide pool from the age of 2.5 years onwards (Figure 10A). The intensity of the signal fluctuated subsequently, including drops during malaria attacks. For example, the 2/11/98 blood sample was collected on the day a malaria episode was diagnosed and presented a lower signal than the preceding (23/10/98) and following (4/12/98) samples. Nevertheless, there was a progressive increase with cumulated exposure. Fine specificity was analysed on the individual constituents of peptide pool 11. All positive samples from this child showed the same pattern: recognition of peptides #46, 61 and 74, i.e. of the K1-specific block1-block2 junction (Figure 10B). The clinical malaria episodes in this child temporarily reduced the signals (hence antibody levels), but were not associated with stable acquisition of any novel specificity.

Figure 8. Typical profiles of the temporal evolution of MSP1 block2-specific IgG before and after the 1998 rainy season. Antibodies were assayed from 25 individuals in August 1998 (yellow) and December 1998, i.e. after a rainy season during which each inhabitant was exposed to a mean of 170 infected bites. Anti-MSP1 block2-specific IgG was assessed by ELISA on 16 pools of biotinylated peptides. The upper, central and lower panels show representative examples of a reduction of specific antibodies, an essentially unchanged profile and a boosting of pre-existing responses, respectively.

Figure 9. Temporal fluctuation of MSP1 block2-specific IgG during the 1998 rainy season. Antibodies were assayed on 16 pools of biotinylated peptides (sequence and composition of the pools described in Table 5) […]

Discussion
This is the first detailed longitudinal survey of Pfmsp1 block2 sequence polymorphism combined with assessment of the specific humoral response within a single endemic setting. It provides novel insights into the locus at the population level and into the possible selective forces underpinning such polymorphism. A very large local polymorphism was detected, mainly due to microsatellite-type variation, resulting in a very large number of low-frequency alleles. Numerous novel alleles were identified here, including novel MR alleles, illustrating the value of in-depth analysis of local polymorphism. The humoral response of the villagers, as deduced from reactions with a series of 15-mer peptides, displayed features that illuminate its possible role in selection for diversity. The relative distribution of the family-specific antibody responses mirrored the relative distribution of the family types at the parasite population level. Seroprevalence was moderate. Responses were usually limited to a single family. They were frequently directed to family-specific sequences present in most of the alleles from that family circulating in the village. This is consistent with frequency-dependent selection operating at the family level. However, the serological analysis did not reveal frequent immune responses that could select for sequence variants within a family. It confirmed and expanded on previous observations in this setting [27] of an essentially fixed antibody specificity, despite intense exposure to a very large number of allelic types. Overall, the data point to possibly antibody-driven diversifying selection maintaining balanced family types within the population, as proposed by other groups [3,12,23,24,28,33]. They do not support the commonly accepted notion that antibodies select for sequence variants. In that view, the families accumulate mutations that allow the parasite to circumvent the host's capacity to build up an efficient immune response.

Figure 10. Serological longitudinal follow-up of child 01/13 from 6 months to 6 years of age. Antibodies were assayed on 16 pools of biotinylated peptides (A) and on each individual peptide from positive pool 11 (B). The peptide sequences and the composition of the pools are described in Table 5. The dates of blood sampling are shown to the right of the graph. A. Reactivity on the peptide pools. B. Reactivity of three representative blood samples on individual peptides from pool 11.
The study design used here differs from previous studies. It combines assessment of actual sequence polymorphism with analysis of sequence-specific immune responses. Previous studies instead combined PCR fragment size polymorphism with assessment of antibody responses using recombinant antigens [3,23,25,28,31-33]. No significant temporal fluctuation in the relative distribution of the allelic families was found over the 10-year period investigated. This is consistent with longitudinal studies in The Gambia using monoclonal antibody serotyping [42] and in Vietnam using PCR-based genotyping [20]. It differs in this regard from studies conducted in Brazil [28,43]. The family distribution obtained here for symptomatic, high-density infections was superimposable on the distribution observed in previous cross-sectional surveys of asymptomatic infections [44] [see Additional file 11]. Sequencing showed a very large number of low-frequency genetic variants, along with one dominant allele (RD0) and a few intermediate-frequency alleles (DK65, RD5, DM11). Only 29 of the 126 alleles were detected at a frequency above 1%. The level of polymorphism of the non-repeated RO33 family was similar to the level observed in the same setting for Pfmsp4, albeit with a much smaller (30-fold lower) sample size [45]. Tests for neutrality showed no significant departure from neutrality for the repeated domains of the K1-, Mad20- and MR-types, or for the repeatless RO33 family. Tajima's D for RO33 is consistent with selectively neutral mutations [46]. Testing repetitive sequences for selection is difficult, since the mutational and evolutionary processes underlying their diversification are not clearly understood. The Ewens-Watterson (E-W) test [38] is based on the idea that, under neutrality, the observed number of alleles should be consistent with the observed gene diversity.
Because of their particular mutation patterns and rates, neutral microsatellites naturally tend to show more alleles than expected from their observed gene diversity [47]. This phenomenon could mask the effect of balancing selection on the allele distribution, and thus reduce our ability to detect it. However, the effect of repeated mutations on the allele distribution is usually rather small. It occurs mainly when the observed gene diversity is low, which is not the case for the MSP1 repeat domains [47]. Hence, if strong balancing selection were acting on the MSP1 repetitive sequences, we should still be able to detect it. Furthermore, the reported evidence for diversifying selection on the Pfmsp1 block2 locus [3] included the analysis of such repeat-related polymorphisms. Fragment size polymorphism showed no evidence of departure from neutrality either. This contrasts with a recent report from Kenya [16], which used a different parasite sampling strategy. The 306 samples successfully genotyped here originated from 229 different villagers over a decade (approx. 85% of the village inhabitants, with all age groups included). In contrast, the 362 samples analysed in Kenya were collected by repeated sampling from 45 infants over a 4-year period [16]. Such repeated sampling from the same sub-group may have biased the analysis of population polymorphism. In particular, each successive clinical malaria attack experienced by a child is caused by "novel" parasite genotypes [48].
To assess the consequences of sequence diversity for antigenicity, and to search for evidence of antibody-driven diversifying selection, we opted for synthetic peptides encompassing a large number of sequence variants rather than recombinant proteins expressing an entire MSP1 block2 domain, which expose multiple antigenic determinants. Whereas recombinant proteins allow the study of family cross-reactivity, recognition at the single-epitope level is best monitored using synthetic peptides. Individual MSP1 epitopes are displayed by short peptide sequences, which are recognised by monoclonal antibodies [15] and human sera [15,26,27]. The use of synthetic peptides may under-represent certain epitopes, including conformational epitopes, and hence underestimate overall seroprevalence to the locus. Interestingly, however, this assessment using synthetic peptides outlined a strikingly similar relative distribution of family genotypes and family-specific antibodies in Dielmo, consistent with observations in other settings where immune responses were monitored using recombinant proteins [3,23,25,28,29,31-33,36].
The humoral response of the Dielmo villagers suggested family-specific selection pressure rather than antibody-mediated selection for sequence variants. Seroprevalence increased with age, but the number of peptides recognised was unrelated to age. Most individuals had antibodies to one family only, and within that family, polymorphic sites as well as common repeat motifs and the more conserved family-specific sequences were recognised. Importantly, antibody specificity remained essentially fixed over time. Confirming previous observations in this setting [27], long-term longitudinal follow-up showed that cumulative exposure to an increasing number of Pfmsp1 block2 alleles was usually not associated with stable acquisition of antibody specificities to additional sequence variants. Analysis of anti-MSP1 block2 responses during a transmission season showed that some individuals experiencing a high-density clinical episode had their pre-existing responses boosted, while in other patients antibodies became transiently undetectable. In some cases, novel specificities were acquired only transiently: they were rarely detected a few weeks after the episode and were not detected in subsequent longitudinal samplings, where a steady-state, essentially stable specificity profile was consistently observed. The response pattern to MSP1 block2 differs markedly from the progressively enlarging antibody repertoire to variant erythrocyte surface antigens (see [49] and references therein). This rather stable, steady-state specificity profile is highly reminiscent of clonal imprinting. It may reflect particular constraints on the response, or stimulation by chronic asymptomatic carriage and/or novel infections, both quite frequent in such a holoendemic setting. Clonal imprinting of responses to another P. falciparum merozoite surface antigen displaying variable repeats, MSP2, has been suggested in some studies [50,51], but was not supported by studies of PfMSP1 block2 responses in a hypoendemic Sudanese setting [25]. The best evidence in favour of clonal imprinting in malaria stems from studies of cellular responses to peptide variants of the CS protein [52].
Studies conducted in other African settings using recombinant proteins have outlined several features consistent with our observations in Dielmo: i) a moderate seroprevalence to MSP1 block2 that increases with age [3,24]; ii) recognition of a single family by a large proportion of responders [3,25,30]; iii) family-specific and subtype-specific responses [3,23-25], along with recognition of conserved family-specific flanking domains [23,24]; and iv) transient acquisition of antibody specificity or loss of a pre-existing response during a malaria attack [24,25]. Thus, in other African settings as well, the MSP1 block2-specific humoral response is unlikely to exert significant selection favouring the outgrowth of parasites presenting mutant epitopes. This does not rule out selection by cellular immune effectors, which was not assessed here. This deserves detailed study, since sequence variation at the block1-block2 junction has been shown to influence cellular responses [53].
Confirming studies in other areas [3,23,24], antibodies to one or more MSP1 block2 allelic families were prospectively associated with protection against subsequent clinical attacks. However, multivariate analysis showed this association to be confounded by age, and thus difficult to distinguish from the concomitant acquisition by Dielmo villagers of other responses involved in protection. Protection against clinical malaria has indeed been associated with an array of antigens in various endemic settings, including the variant antigen PfEMP1 exposed on the infected red blood cell surface [54,55], MSP1-19 [56], R23 [57] and MSP3 [58].
Apart from the RO33 types, the large sequence polymorphism observed in Dielmo was essentially of microsatellite type. Variation within the K1, Mad20 and MR families was concentrated mainly at the second and third codons of the tripeptide repeats and, furthermore, involved a restricted set of amino acid residues. As noted by others [16], fragment length did not adequately describe local genetic diversity: size polymorphism identified 55 alleles, whereas sequence analysis identified 126. All six RO33 alleles had the same size. Some size bins used to group alleles from other families turned out to contain a large number of distinct sequences (up to 11 for K1 types and up to 9 for Mad20 types) [see Additional file 12].
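The gap between size-based and sequence-based allele counts can be made concrete with a short sketch that bins fragments by length and counts distinct sequences per bin. The input format, the function name and the 9 bp bin width (one tripeptide repeat) are assumptions for illustration; the gel-based size bins used in the study were defined experimentally and may be coarser.

```python
from collections import defaultdict

def size_vs_sequence(alleles, bin_width=9):
    """Compare allele counts by fragment size versus by exact sequence.

    alleles: nucleotide sequences of block2 fragments (hypothetical input).
    bin_width: assumed size resolution in bp (9 bp = one tripeptide repeat).
    Returns (number of size bins, number of distinct sequences,
             {bin lower bound in bp: distinct sequences in that bin}).
    """
    bins = defaultdict(set)
    for seq in alleles:
        bins[len(seq) // bin_width].add(seq)  # same bin = indistinguishable by size
    per_bin = {b * bin_width: len(seqs) for b, seqs in sorted(bins.items())}
    return len(per_bin), len(set(alleles)), per_bin

toy = ["A" * 9, "A" * 8 + "T", "T" * 9, "A" * 18, "A" * 17 + "G", "A" * 9]
print(size_vs_sequence(toy))  # (2, 5, {9: 3, 18: 2})
```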
Sequence analysis identified numerous novel alleles and specific motif arrangements: 113 of the 126 Pfmsp1 block2 allele sequences observed in Dielmo were novel. The RO33 types displayed novel point mutations. Compared with the reported sequences, the K1 alleles from Dielmo were more diverse (higher number of distinct motifs), with more frequent use of motifs 3 and 4 and a novel K1-type motif encoding the SVT tripeptide (motif 7). The Mad20 types were longer (more repeats per allele) and used a restricted set of codons and particular motifs, with a higher occurrence of SGG-encoding motifs, more frequent use of motif 8 and less frequent use of motifs 7 and 4. The MR family accounted for up to 13.3% of all Pfmsp1 block2 alleles from Dielmo, a lower frequency than the 28-29% observed in a Kenyan holoendemic setting [11,16]. We could not identify any epidemiological parameter associated with the presence of MR alleles: there was no association with age, gender, ABO or Rhesus blood group. Interestingly, like the other three families, MR alleles from Dielmo presented specific characteristics.
All harboured an RD5-type RO33 moiety, unlike most MR alleles with a worldwide distribution [11,16]. Furthermore, DMR1 displayed a novel MR subtype with a 5 7 5 motif (Mad20 sub-group 1c) instead of an 8 7 5 motif (Mad20 sub-group 2c). In addition, a novel hybrid with a 3' RO33/K1 hybrid sequence was observed. Whether this DMRK allele was generated by insertion of an SPPADA-encoding DNA segment within an MR allele (possibly MR6), or whether this element was inserted within RD5 before recombination with a Mad20 allele, is unclear. Insertion of the SPPADA-encoding segment within an allele of the RO33 family has never been reported, but it was observed within the K1 type in this study (allele DK67) and in other settings [9]. The observation of a single RO33 progenitor together with a single Mad20 progenitor led Takala et al [16] to propose that the MR family arose from a single recombination event. The present data rather suggest that several separate recombination events involving distinct RO33-type and Mad20-type progenitors contributed to the generation of this hybrid family.
The characteristics of the Pfmsp1 block2 allelic repertoire in Dielmo are in line with the epidemiological conditions prevailing in the village. Unlike the surrounding area, where transmission is moderate and highly seasonal, transmission in Dielmo is perennial and intense [59]. Therefore, during the 9-10 months of the dry season, local transmission largely dominates over the import of alleles from the neighbouring area. As such, Dielmo constitutes a transmission area where a high level of genetic diversity can be maintained. Detailed analysis of parasite population structure and expansion in this setting awaits the study of additional genetic loci. Transmission in the village occurs throughout the year, albeit with marked seasonal fluctuation in entomological inoculation rates and vector species [59]. The seasonal pattern of family distribution may reflect different fitness or survival rates associated with different allelic families under different transmission conditions and/or with different Anopheles vector species. Additional studies are needed to explore this hypothesis further.
Previous studies have surveyed sequence polymorphism across large geographic areas, or with a small sample size in a single setting, and thus did not capture the micro-geographic features observed here. A better understanding at the micro-geographic level is essential for analysing immune responses in the context of the parasite population to which people are exposed. This is of critical importance for interpreting the selective forces acting on the parasite population and for designing rational control measures accordingly.

Conclusion
The Pfmsp1 block2 locus presents a larger population sequence diversity than could be anticipated from published studies. A very large local polymorphism was detected, mainly of microsatellite type. The humoral response observed here using synthetic peptides was consistent with frequency-dependent selection operating at the family level. However, there was no evidence for major humoral selection of sequence variants. Rather, antibody specificity remained fixed over time, despite exposure to novel allelic forms. Such a lack of stable acquisition of novel antibody specificities in response to novel infecting types is reminiscent of clonal imprinting. The locus appears to be under antibody-mediated diversifying selection in a variable environment that maintains a balance between the various family types without selecting for sequence-variant allelic forms. Within families, sequence diversity is consistent with neutral evolution and with the observed characteristics of the antibody response. Finally, the data reported here do not confirm an association of the acquired humoral response to MSP1 block2 with protection against subsequent clinical P. falciparum malaria attacks.

Study site and patient recruitment
Dielmo, located in Sine Saloum, Senegal, is a village of approximately 250 inhabitants, where malaria is holoendemic. In 1990, the entire village population was enrolled in a longitudinal prospective study described in detail elsewhere [60]. The main vectors in the village are Anopheles gambiae s.s. and An. funestus [59]. Informed consent was obtained from each adult participant and from parents or legal guardians of each child at the beginning of the study and was renewed on a yearly basis. Individuals could withdraw from the study at any time. Each year the project was reviewed and approved by the Joint Ministry of Health and Pasteur Institute Surveillance Committee.
The retrospective analysis has received ethical clearance from the National Ethical Committee of the Republic of Senegal.

Parasite samples
We studied 336 samples collected during mild malaria episodes, selected from the existing collection of frozen blood samples previously analysed for drug resistance markers [61]. The sampling strategy was as follows: from a list of approximately 3,400 samples collected longitudinally during malaria episodes, samples were chosen for molecular analysis so as to survey the largest possible panel of villagers.
Since in this hyperendemic setting the heaviest clinical malaria burden is in children under 10 years of age, and since some children are more susceptible than others [62], we needed to avoid bias from repeatedly sampling the more susceptible individuals. This reduced the risk not only of over-representing certain genotypes to which some individuals might be more susceptible, but also of overestimating polymorphism, because each of the successive clinical malaria attacks experienced by one person is caused by "novel" parasites [48]. We therefore set an interval of more than 3 years between two samples from the same individual, with the further restriction that no person could contribute more than three samples in all. Whether the failure to amplify Pfmsp1 block2 was due to polymorphism within the primer sequence or to a lower sensitivity of the reaction compared with the other loci is unknown. These DNAs were excluded from the analysis.
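A minimal sketch of the sampling rule above (more than 3 years between two samples from the same person, at most three samples per person), assuming each sample is described by a (person, time in decimal years) pair and that samples are kept greedily in chronological order. The text does not state which samples were preferred when several qualified, so the greedy order and the function name are hypothetical.

```python
from collections import defaultdict

def select_samples(samples, min_gap_years=3.0, max_per_person=3):
    """Greedy selection (hypothetical): keep a person's earliest sample, then each
    later sample taken more than min_gap_years after the last kept one, up to
    max_per_person samples per person.

    samples: iterable of (person_id, time in decimal years).
    """
    by_person = defaultdict(list)
    for person, t in samples:
        by_person[person].append(t)
    selected = []
    for person, times in by_person.items():
        kept = []
        for t in sorted(times):
            if len(kept) == max_per_person:
                break  # cap on the number of samples contributed by one person
            if not kept or t - kept[-1] > min_gap_years:
                kept.append(t)  # interval strictly greater than min_gap_years
        selected.extend((person, t) for t in kept)
    return selected

samples = [("A", 1990.5), ("A", 1992.0), ("A", 1993.6), ("A", 1997.0),
           ("A", 2000.5), ("B", 1995.0)]
print(select_samples(samples))
```

Person A's 1992.0 sample is skipped (gap of 1.5 years) and the 2000.5 sample is dropped by the three-sample cap.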
In mixed infections where different alleles belonging to the same family were detected by size polymorphism, bands of different sizes were excised from the agarose gel and re-amplified with specific primers to recheck the allele type.

Sequencing
PCR products obtained by semi-nested PCR using family-specific forward primers were sequenced directly. All Pfmsp1 block2-derived PCR products were purified using polyacrylamide P-100 gel (Bio-Gel, Bio-Rad, 150-4174) on 96-well plates equipped with a 0.45 μm filter (96-well format, Millipore, 1887, ref MAHVN4550). The purified product was quantified by comparison with DNA quantitation standards (Abgene ® QSK-101) after electrophoresis on a 1.2% agarose gel. The sequencing reaction contained 2 μL of PCR product (≥ 20 ng), 1.25 μL of 5× buffer, 1.5 μL of BigDye v3.1 and 2 μL of 2 μM primer in a 10 μL final volume. Amplification was performed in a GeneAmp 9700 (Applied Biosystems): 1 min at 94°C, followed by 35 cycles of 10 sec at 96°C, 5 sec at 50°C and 4 min at 60°C, then a hold at 4°C. The products were then precipitated and sequenced on both strands using an ABI ® Prism 3100 DNA analyzer as described [61]. In a few cases, sequencing of the excised band was not possible because of ambiguous base calling, probably reflecting a mixture of alleles of similar size. These samples were discarded from the analysis. We retained only sequences where base calling was unambiguous and the main peak accounted for more than 95% of the signal at each base position.
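The base-calling acceptance criterion can be sketched as a filter over per-position peak signals. The input format (one dictionary of base to signal per position) and the function name are hypothetical; real trace files store this information differently, and the sketch reads "more than 95%" strictly.

```python
def call_sequence(peaks, min_fraction=0.95):
    """Return the called sequence, or None if any position is ambiguous.

    peaks: list with one dict per base position mapping 'A'/'C'/'G'/'T' to
    signal intensity (hypothetical format). A position is accepted only if its
    strongest base carries more than min_fraction of the total signal.
    """
    called = []
    for signal in peaks:
        total = sum(signal.values())
        base, top = max(signal.items(), key=lambda item: item[1])
        if total == 0 or top / total <= min_fraction:
            return None  # ambiguous call: discard the whole sequence
        called.append(base)
    return "".join(called)

clean = [{"A": 100, "C": 2, "G": 1, "T": 1}, {"T": 50, "C": 1}]
mixed = [{"A": 60, "G": 40}]  # e.g. two co-migrating alleles of similar size
print(call_sequence(clean), call_sequence(mixed))
```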
False recombinant alleles can be generated during PCR as a result of template switching when long amplicons are generated, such as those spanning Pfmsp1 blocks 2-6, with crossover sites identified in the distal part of block 3 and in block 5 [63].
To reduce the risk of this potential pitfall, we first amplified short regions (i.e. upstream of the identified crossover sites), with PCR anchored in conserved regions but relatively close to the junction with polymorphic sequences. Second, we verified the absence of undesired amplification from deliberate mixtures of reference alleles using the semi-nested PCR strategy. The resulting PCR fragments were sequenced, and all were a perfect match with the reported sequences.

Sequence analysis
Pfmsp1 block2 alleles deposited in GenBank were retrieved by repeated BLAST searches using each individual 9-mer nucleotide sequence observed in K1-type or Mad20-type alleles and the full-length RO33-type block2 sequence. In addition, K1 alleles from Zambia reported by Tetteh et al [15] were included. The sequence curation indicated by Miller et al [8] was applied when needed. The various alleles were aligned using ClustalW and curated manually, and redundant alleles were discarded. This resulted in a total of 59 distinct K1-type [see Additional file 5], 52 Mad20-type [see Additional file 6], four RO33-type [see Additional file 3] and nine MR-type alleles [see Additional file 7]. The alleles from Dielmo were compared with the reported alleles for the structure of the microsatellites: frequency of the individual tripeptide motifs, overall number of repeats, and numbers of each individual tripeptide and combinations thereof (dimers, trimers and tetramers).
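The microsatellite comparison (motif frequencies, number of repeats, dimers, trimers and tetramers) amounts to counting runs of consecutive motifs. The sketch below assumes each allele has already been reduced to a list of motif identifiers, such as the numbered 9-mer motifs used in the text; the mapping from 9-mer sequence to motif number and all function names are hypothetical.

```python
from collections import Counter

def split_units(repeat_region, unit_len=9):
    """Split a repeat-region nucleotide sequence into consecutive 9-mer units."""
    if len(repeat_region) % unit_len:
        raise ValueError("repeat region length is not a multiple of unit_len")
    return [repeat_region[i:i + unit_len] for i in range(0, len(repeat_region), unit_len)]

def motif_ngrams(motifs, n):
    """Count runs of n consecutive motifs (1 = single motifs, 2 = dimers, ...)."""
    return Counter(tuple(motifs[i:i + n]) for i in range(len(motifs) - n + 1))

def repeat_profile(motifs, max_n=4):
    """Number of repeats plus single-motif, dimer, trimer and tetramer counts."""
    return len(motifs), {n: motif_ngrams(motifs, n) for n in range(1, max_n + 1)}

# Motif identifiers for one hypothetical allele, e.g. after mapping each 9-mer to its number
n_repeats, combos = repeat_profile([3, 4, 3, 4, 7])
print(n_repeats, combos[2][(3, 4)])  # 5 2
```

Pooling these counters over all alleles of a family gives the family-level motif and combination frequencies that can then be compared between Dielmo and the reported sequences.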

Neutrality tests
Allele distribution was analysed using the Ewens-Watterson-Slatkin (EWS) tests [38,39]. The tests were applied either by considering each family as a single allele (i.e. grouping all alleles from that family together) or by considering individual alleles within each family independently. In the latter case, individual alleles were classified 1) by size and nucleotide sequence polymorphism or 2) by size polymorphism alone. Ewens-Watterson tests were performed using the PyPop software [64]. Nucleotide diversity within the RO33 family was analysed using Tajima's D test [40] and Fu and Li's test [41] in DnaSP version 4.0, developed by Rozas et al [65].
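Tajima's D was computed in DnaSP. For reference, a minimal pure-Python sketch of the standard statistic, which compares the mean pairwise difference π with Watterson's estimate S/a1, is shown below. It assumes aligned, gap-free sequences and at least four of them (the variance constants vanish for n = 3); it is not DnaSP's code, the function name is hypothetical, and Fu and Li's test is not implemented.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(seqs):
    """Tajima's D for aligned, gap-free sequences of equal length.
    Returns None when there are no segregating sites."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    # S: number of segregating (polymorphic) sites
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if S == 0:
        return None
    # pi: mean number of pairwise nucleotide differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Standard constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    # Positive D: excess of intermediate-frequency variants; negative D: excess of rare ones
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))

print(tajimas_d(["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]))  # star-like sample: D < 0
```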

Serological analysis
Archived sera collected throughout the longitudinal follow-up were used. Seroprevalence was studied using 243 plasma samples (i.e. 95% of the village population) collected during a cross-sectional survey conducted on 2-3 August 1998, at the beginning of the rainy season (27, 25, 26, 40, 46 and 79 in the 0-2 y, 3-5 y, 6-8 y, 9-14 y, 15-24 y and ≥25 y age groups, respectively). A subset of 25 sera collected in December 1998 from individuals whose August 1998 sample scored positive for antibodies to one or more MSP1 block2-derived peptides was analysed. Ten individuals were followed up during the 1998 rainy season using the monthly fingerprick blood samples collected systematically, together with a fingerprick sample collected on diagnosis of clinical malaria when available. The entomological inoculation rate during the August-December 1998 period, assessed as described [59], was 170 infected bites/person. In addition, archived sera from children, collected longitudinally during the survey, were used to follow the acquisition of antibodies over a period of several years.
A set of 82 15-mers derived from MSP1 block2 tripeptide repeats and the family-specific flanking regions was synthesized by Chiron Mimotopes Pty. Ltd. (Clayton, Victoria, 3168, Australia): 34 K1-, 31 Mad20- and 12 RO33-specific sequences, plus 5 peptides derived from the junction with block1. The peptide sequences are described in Table 5. The peptides represented the tripeptide combinations observed in Dielmo for the K1 and Mad20 families [see Additional file 9]. They were synthesized with an N-terminal biotin group separated from the peptide sequence by an SGSG spacer, and with an amidated C-terminus. All peptides were soluble. A similar set of peptides was used to explore the humoral response of Dielmo villagers in previous studies [26,27]. Based on these results, which showed a restricted specificity, and in view of the limited volume available for several sera, we first screened individual sera using 16 peptide pools (4-6 peptides per pool, as described in Table 5) and, in a second step, analysed the reactivity of positive sera against the individual peptides of each positive pool. ELISA was performed on streptavidin-coated plates with either pools containing 0.1 nM of each biotinylated peptide or 0.5 nM of a single biotinylated peptide adsorbed in each well, as described [27]. We checked with control mouse sera and individual human positive controls that peptide dilution within the pools did not modify the outcome of the specificity analysis. Human plasma was tested in duplicate at a 1:500 dilution, and bound IgG or IgM was measured using horseradish peroxidase-conjugated goat F(ab')2 to human IgG Fc (γ) or to human IgM Fc (μ) (Cappel, Organon-Technica, Turnhout, Belgium). Optical density (OD) was measured at 450 nm on an Emax reader (Molecular Devices). Control wells without peptide were used to check for potential anti-streptavidin antibodies.
The wells that gave a signal twice the OD value of the wells without peptide were considered positive. IgG subclass analysis was performed as described [27].
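The two-step pooled screening and the positivity rule described above can be sketched as follows. Peptide and pool identifiers are hypothetical, OD values are assumed to be already averaged over the duplicate wells, and reading "twice the OD" as "at least twice" is our assumption.

```python
def is_positive(od, od_no_peptide, ratio=2.0):
    """Positivity rule: OD at least `ratio` times that of the no-peptide well
    ('at least' is an assumption about the original cut-off)."""
    return od >= ratio * od_no_peptide

def two_step_screen(pool_od, pools, peptide_od, od_no_peptide):
    """Step 1: screen one serum on peptide pools. Step 2: test individual
    peptides only from positive pools.
    Returns (positive peptides, number of single-peptide wells used in step 2)."""
    positives, step2_wells = [], 0
    for pool_id, od in pool_od.items():
        if not is_positive(od, od_no_peptide):
            continue  # negative pool: its peptides are never tested individually
        for pep in pools[pool_id]:
            step2_wells += 1
            if is_positive(peptide_od[pep], od_no_peptide):
                positives.append(pep)
    return positives, step2_wells

pools = {"P1": ["K1-a", "K1-b"], "P2": ["M20-a", "M20-b", "M20-c"]}
pool_od = {"P1": 0.9, "P2": 0.15}
peptide_od = {"K1-a": 0.8, "K1-b": 0.12}  # only peptides of positive pools are assayed
print(two_step_screen(pool_od, pools, peptide_od, od_no_peptide=0.1))
```

Because most sera recognised few peptides, deconvoluting only the positive pools saves serum volume compared with testing all peptides individually.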

Association with protection
This analysis was based on the data gathered during the longitudinal survey protocol and available in the database. Daily clinical surveillance was carried out over the August-December 1998 follow-up period, as described [60,66]. Each villager was visited at home for clinical surveillance, and blood films were made in case of fever. The protocol included notification of all febrile episodes to the medical staff and controlled use of anti-malarial drugs. A malaria attack was defined as the association of symptoms suggestive of malaria with parasitaemia above an age-specific threshold, as described [66,67]. An anti-malarial cure was administered by the medical staff for every malaria attack. Procedures to estimate association with protection have been described [56,57,68]. In brief, the number of clinical malaria attacks experienced during follow-up was analysed as the dependent variable in a Poisson regression model, using the number of days of presence in the village as the exposure variable. The association between the incidence of clinical malaria attacks and independent variables (presence of antibodies to allelic families, age, haemoglobin type and ethnic group) was tested.
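A minimal sketch of a Poisson regression with days of presence as the exposure variable, fitted by Newton-Raphson in pure Python so that the offset log(days) is explicit; the toy data and function names are invented for illustration. Exponentiated coefficients are incidence rate ratios, and multivariate adjustment (e.g. for age) simply adds columns to X. In practice a statistics package would be used, for example statsmodels `GLM` with a Poisson family and its `exposure` argument.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def poisson_with_exposure(X, y, days, max_iter=50, tol=1e-10):
    """Fit log E[y_i] = log(days_i) + x_i . beta by Newton-Raphson.

    X: covariate rows with a leading 1 (intercept); y: malaria attacks per person;
    days: days present in the village (exposure). Returns beta.
    """
    p = len(X[0])
    beta = [math.log(sum(y) / sum(days))] + [0.0] * (p - 1)  # start at the crude rate
    for _ in range(max_iter):
        mu = [t * math.exp(sum(b * x for b, x in zip(beta, row))) for row, t in zip(X, days)]
        score = [sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu)) for j in range(p)]
        info = [[sum(mi * row[j] * row[k] for row, mi in zip(X, mu)) for k in range(p)]
                for j in range(p)]  # Fisher information
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Invented toy data: intercept + antibody-positive indicator
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
y = [2, 4, 1, 1]
days = [100, 100, 100, 100]
beta = poisson_with_exposure(X, y, days)
print(f"incidence rate ratio (antibody+ vs antibody-): {math.exp(beta[1]):.3f}")
```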

Statistical analysis
The yearly distribution of the 524 PCR fragments by allelic family was analysed by Pearson's chi-square test, assuming that alleles co-infecting the same individual were independent. Allelic family distribution by gender, age, Hb type, ABO group, Rhesus group and month was analysed by Fisher's exact test. The allelic family infection rate (percentage of infected individuals harbouring one or more alleles from that family) by gender, β-globin type, ABO or Rhesus blood group, age (0-1 y, 2-5 y, 6-9 y, 10-19 y and ≥20 y) and season was analysed by Fisher's exact test. For the analysis of seasonality, the year was divided into three periods based on the rains, the vectors present and the entomological inoculation rate. The mean entomological inoculation rate was 32, 140 and 39 infected bites/person/year in February-May (dry season), June-October (rainy season) and November-January, respectively.
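The yearly comparison of allelic family distributions is a Pearson chi-square test on a year × family contingency table. A minimal sketch of the statistic and its degrees of freedom follows, with each PCR fragment counted as an independent observation as stated above; the table layout, the function name and the example counts are assumptions. A p-value can then be obtained from the chi-square distribution, e.g. with `scipy.stats.chi2.sf(statistic, df)`.

```python
def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    table: rows = survey years, columns = allelic families (fragment counts).
    Every row and column total must be > 0 (drop empty families first).
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # expected count if year and family are independent
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_tot) - 1)
    return stat, df

# Invented counts: 3 years x 4 families (K1, Mad20, RO33, MR)
print(pearson_chi2([[20, 15, 10, 5], [22, 14, 9, 6], [18, 16, 11, 4]]))
```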
The estimated multiplicity of infection (moi) was first analysed using a zero-truncated Poisson regression model, assuming a constant probability of detecting an additional allele in a homogeneous carrier population. The mean predicted moi was 1.193 alleles per infected individual. The predicted distribution was calculated, grouping the classes with estimated moi ≥ 4 and did not