Genetic epidemiology of infectious diseases in humans: design of population-based studies.

The spread and clinical manifestations of an infection in human populations depend on a variety of factors, among them host genetics. Familial linkage studies used in genetic epidemiology to identify host genes test for nonrandom segregation of a trait with a few candidate chromosomal regions or any regions in the genome (genomewide search). When a clear major gene model can be inferred and reliable epidemiologic information is collected (e.g., in schistosomiasis), parametric linkage studies are used. When the genetic model cannot be defined (e.g., in leprosy and malaria), nonparametric linkage studies (e.g., sibling-pair studies) are recommended. Once evidence of linkage is obtained, the gene can be identified by polymorphisms strongly associated with the trait. When the tested polymorphism is in strong linkage disequilibrium with the disease allele or is the disease allele itself (e.g., in HIV infection and malaria), association studies can directly identify the disease gene. Finally, the role of the detected polymorphism in causing the trait is validated by functional studies.

The profound influence of the host's genetic makeup on resistance to infections has been established in numerous animal studies (1,2) in which disease phenotypes, environmental factors, and crosses can be controlled. Furthermore, recent developments (e.g., use of gene knockout or mutant and transgenic mice) allow genetic analysis of complex traits involved in susceptibility or resistance to infectious pathogens (2,3). As a result of these new developments, the Lsh/Ity/Bcg gene, which controls innate early susceptibility to several Mycobacterium species as well as other intracellular pathogens (e.g., Salmonella Typhimurium, Leishmania donovani) (2,4), was isolated on mouse chromosome 1 and was further identified and designated natural resistance-associated macrophage protein 1 (Nramp1) (5). However, involvement of a gene in an experimental infection does not imply that differences in susceptibility or resistance to that infection in human populations can be accounted for by polymorphisms in the human homologue of this gene. Genetic epidemiology studies (6,7) combine epidemiologic and genetic information to identify the genes that substantially influence the expression of complex human phenotypes, such as infectious disease-related traits. Epidemiologic information includes measured risk factors that could influence the trait under study (e.g., contamination by the infectious agent, age). Genetic information is derived from familial relationships between study participants (collection of families) or from the typing of genetic markers. Recent maps of the human genome established on the basis of highly polymorphic markers (8) are a fundamental tool for studies involving genetic markers, and two strategies can be used in this context. The first, the candidate gene method, is the typing of a few markers in a limited number of chromosomal regions containing genes related to the phenotype under study. The second is a random search along the whole genome (genomewide search) for chromosomal regions that could be involved in the control of the phenotype.

Synopses
The genetic epidemiology of human infectious diseases differs from the genetic study of other complex phenotypes in three ways: 1) environmental factors influencing the risk for infection are generally known and, when accurately measured, can be included in the analysis; 2) choice of candidate genes is strongly determined by the gene's function and response to the studied pathogen or by mouse-human chromosome tests that exploit the identification of murine resistance loci; and 3) major genes involved in the response to a given pathogen can be identified by characterizing phenotypic response to pathogen exposure, such as clinical response, biologic response (intensity of infection), and immunologic response (levels of antibodies or cytokines). The role of genetic factors in the control of these phenotypic responses is generally suggested by twin studies, by strong ethnic differences, or by the great variability of individual phenotypes within their familial aggregation. Specific statistical methods are used to identify these genetic factors and to distinguish them from environmental factors causing the familial resemblance. All these statistical methods search for one or more genes that influence the studied phenotype and are classically divided into parametric and nonparametric. Parametric, or model-based, methods (segregation analysis and linkage analysis by the classical lod-score method) require defining the model and specifying the relationship between the phenotype and factors (mainly a putative gene and environmental covariates) that may influence its expression. Nonparametric, or model-free, methods (nonparametric linkage analysis and association studies) study the genetic factors influencing a phenotype without specifying the model. Each approach has advantages and disadvantages; however, the two approaches complement each other. The choice of a design for a particular study depends on several factors related to the phenotype (e.g., nature, frequency), population, accurate measurement of environmental factors, and known genetic background. Both approaches have led to successful gene localizations and identifications in the analysis of several infectious disease phenotypes (9,10).

Parametric (Model-Based) Studies
Parametric studies require explicit specification of the model, i.e., the definition of the relationship between the observed phenotype and the putative genotype. In a simple monogenic disease due to a diallelic gene (D,d), the model is specified by the frequency of the deleterious allele (D, for example) and the three probabilities for a person to have the disease, given the presence of genotype DD, Dd, or dd (penetrances). For complex instances, such as susceptibility/resistance, the susceptibility (or the resistance) depends not only on a putative genotype but also on environmental factors that may influence exposure. In such cases, the phenotype/genotype model includes, in addition to the frequency of the deleterious allele, all the parameters that describe and quantify the relationship between susceptibility and the relevant genetic and environmental factors. This relationship can be mathematically expressed in several ways, most recently by regression methods that define model parameters in terms of regression coefficients. Furthermore, regression methods can be used to analyze binary (11) as well as quantitative (12) phenotypes. In quantitative phenotypes, the effect of a genotype is defined in terms of three different phenotypic means depending on the genotypes of the study participants. Parametric methods are based on two kinds of complementary analyses, segregation analysis and linkage analysis by the classical lod-score method (13). Both require epidemiologic information (i.e., the measure of the phenotype and of all relevant environmental factors) for each family member. Linkage analysis also needs the typing of genetic markers.
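The simple monogenic model described above (one allele frequency plus three penetrances) can be sketched in a few lines. The sketch below assumes Hardy-Weinberg genotype proportions, which the text does not state, and ignores environmental covariates; the numerical values and the function names are hypothetical and serve only as illustration.

```python
def genotype_frequencies(q):
    """Frequencies of genotypes DD, Dd, dd when allele D has frequency q.

    Assumes Hardy-Weinberg proportions (an assumption of this sketch).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    p = 1.0 - q
    return {"DD": q * q, "Dd": 2.0 * p * q, "dd": p * p}


def prevalence(q, penetrance):
    """Population probability of disease: sum of genotype frequency x penetrance."""
    freqs = genotype_frequencies(q)
    return sum(freqs[g] * penetrance[g] for g in freqs)


def genotype_given_affected(q, penetrance):
    """Bayes' rule: probability of each genotype for an affected person."""
    freqs = genotype_frequencies(q)
    k = prevalence(q, penetrance)
    if k == 0.0:
        raise ValueError("prevalence is zero; no affected persons under this model")
    return {g: freqs[g] * penetrance[g] / k for g in freqs}


# Hypothetical values, chosen only for illustration.
example_q = 0.1
example_penetrance = {"DD": 0.8, "Dd": 0.1, "dd": 0.05}

if __name__ == "__main__":
    print(prevalence(example_q, example_penetrance))
    print(genotype_given_affected(example_q, example_penetrance))
```

The second function shows why the model matters for linkage analysis: genotypes at the phenotype locus are never observed directly and must be inferred from phenotypes through the penetrances.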

Parametric Segregation Analysis
Segregation analysis is the first step in determining from family data how a given phenotype was inherited. Familial aggregation of infection-related phenotypes can result from genetic relationships, shared environment, and cultural habits. The goal of segregation analysis is to discriminate between these factors, primarily to test for the existence of a single gene, called a major gene. The major gene is not the only gene involved in the expression of the phenotype; rather, of all involved genes, this one has an effect important enough to distinguish it from the others. For a binary clinical phenotype (affected/unaffected by the disease), this effect can be expressed in terms of relative risks, e.g., the ratio of the probability of being infected given a DD genotype to the probability of being infected given a dd genotype. For a quantitative phenotype, this effect is measured by the proportion of the phenotypic variance explained by the major gene (heritability due to the gene). Primarily, segregation analysis uses maximum likelihood methods to test whether the observed familial distributions of the phenotype fit the distributions expected under different hypotheses of familial transmission (in particular the segregation of a major gene). When evidence indicates a major gene, segregation analysis estimates the parameters of the phenotype/genotype model, which are required for parametric linkage analysis.

Parametric Linkage Analysis
Linkage analysis by the classical lod-score method (13) confirms and locates the gene detected by segregation analysis (denoted as the phenotype locus). Linkage analysis tests whether, in families, the phenotype locus is transmitted with genetic markers of known chromosomal location. The lod score is a likelihood ratio testing the hypothesis of linkage (against the hypothesis of no linkage) for different genetic distances (or recombination fractions) between the phenotype locus and the marker locus (14). Classically, two conclusions can be reached with a lod-score analysis: 1) linkage between the two loci when the lod score is above a given threshold, and 2) exclusion of linkage between the two loci when the lod score is below a given threshold. Linkage with the phenotype locus can be tested marker by marker (two-point analysis) or by a set of linked markers (multipoint analysis). In linkage, as in segregation analysis, all inferences for individual genotypes at the phenotype locus are made from individual phenotypes and the specified phenotype/genotype model; the lod-score method is most powerful when this model is well defined. A misspecification of the phenotype/genotype model, however, can lead both to inability to detect linkage (and therefore to false exclusion of the region containing the phenotype locus) and to a bias in the recombination fraction estimate (i.e., the genetic distance) between the phenotype locus and the marker locus (15). Nevertheless, such a misspecification does not affect the robustness of the method; i.e., it does not lead to false conclusions in favor of linkage, as long as only one phenotype/genotype model is tested. Correction for multiple testing should accompany the use of several phenotype/genotype models. Similar problems occur when several markers are tested, and guidelines have been proposed to adapt lod-score thresholds to the context of genomewide search (16). Another problem arises when marker data are missing for some family members. In this case, linkage analysis also depends on marker allele frequencies; misspecification of these frequencies can affect both the power and robustness of the method. Multiple marker testing and misspecification of marker allele frequencies are also problems common to the nonparametric methods.
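As an illustration of the lod score as a likelihood ratio, the following minimal sketch handles only the simplest textbook case: fully informative, phase-known meioses in which each transmission can be counted as recombinant or nonrecombinant. Real pedigree analyses rely on the full phenotype/genotype model and dedicated software; the function names and counts here are hypothetical. The conventional thresholds of 3 (linkage) and -2 (exclusion) are mentioned in a comment as commonly cited values, not as values taken from this article.

```python
import math


def lod_score(recombinants, nonrecombinants, theta):
    """Two-point lod score for counted meioses at recombination fraction theta.

    lod(theta) = log10[ L(theta) / L(0.5) ], with
    L(theta) = theta^r * (1 - theta)^(n - r).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    n = recombinants + nonrecombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + nonrecombinants * math.log10(1.0 - theta))
    log_l_null = n * math.log10(0.5)  # no linkage: theta = 0.5
    return log_l_theta - log_l_null


def max_lod(recombinants, nonrecombinants, step=0.01):
    """Scan a grid of theta values; return (best_theta, best_lod)."""
    n_steps = int(round(0.5 / step))
    grid = [i * step for i in range(1, n_steps + 1)]
    return max(((t, lod_score(recombinants, nonrecombinants, t)) for t in grid),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical data: 2 recombinants among 20 informative meioses.
    theta_hat, z_max = max_lod(2, 18)
    # Commonly cited thresholds: z >= 3 suggests linkage, z <= -2 excludes it.
    print(theta_hat, z_max)
```

Scanning theta on a grid mirrors the way lod scores are classically tabulated at several recombination fractions.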

Leprosy Studies
Several segregation analyses have been performed in infectious diseases; some suggest that a recessive major gene may play a role in leprosy subtypes (lepromatous or nonlepromatous) (17-19). A recessive major gene was also found to influence leprosy regardless of the clinically defined subtype in pedigrees of large families from a small Caribbean island (17); the frequency of the deleterious allele was estimated to be 0.3 (i.e., 9% of persons are homozygous and predisposed to leprosy); by age 60, the penetrance was approximately 0.6 for predisposed homozygotes, whereas it remained below 0.02 for others. Lod-score analysis could not find any linkage between this leprosy susceptibility locus and five markers (including HLA) that were typed in this population (20).

Malaria Studies
In malaria, segregation analyses have focused on a quantitative phenotype measuring the intensity of infection, i.e., parasitemia levels. Although one study showed the role of a recessive major gene controlling levels of parasitemia (21), two subsequent studies found evidence of a more complex genetic mechanism (22,23). The discrepancies in these results can be explained by several factors related to the host, the parasite, and mosquito transmission. However, all studies showed correlations between siblings and between age and infection (children becoming infected more often than adults). Further genetic analyses, such as sibling-pair (sib-pair) study designs, should focus on infection in young children.

Schistosomiasis Studies
Model-based studies have been particularly successful in finding susceptibility genes in schistosomiasis. Several reports indicated that infection intensity was largely determined by the susceptibility/resistance of infected persons (24). In a Brazilian population, segregation analysis showed that the intensity of infection by Schistosoma mansoni was controlled by a major gene (25). This gene, SM1, accounts for 66% of the infection intensity variance that remains after other covariate effects (water contact levels, age, gender) have been taken into account. Under this major gene model, approximately 3% of the population is homozygous and predisposed to very high infection levels, 68% is homozygous resistant, and 29% is heterozygous with intermediate levels of resistance (Figure 1). Parametric linkage analysis using the model estimated from segregation analysis was used to locate the gene. A genomewide search was carried out, and SM1 was mapped to human chromosome 5q31-q33, a genetic region that contains several genes encoding molecules that control T-lymphocyte differentiation (26). More recently, a study in a Senegalese population confirmed the presence of a locus influencing S. mansoni infection levels on chromosome 5q31-q33 (27). Furthermore, this region has been linked with loci related to immunoglobulin E (IgE) and eosinophilia production, i.e., a locus regulating IgE levels (28,29), a locus controlling bronchial hyperresponsiveness in asthma (30), and a locus involved in familial hypereosinophilia (31). This genetic localization, together with observations that human resistance to schistosomiasis is regulated by lymphokines characteristic of Th2 subsets (32) and that resistant homozygotes mount a Th0/2 response while susceptible homozygotes exhibit a Th0/1 response against schistosomes (V. Rodrigues, A. Dessein, unpub. data), argues strongly that differences in human susceptibility to schistosomiasis are influenced by polymorphisms in a gene controlling T-lymphocyte subset differentiation. In this regard, a segregation analysis showed that interleukin 5 (IL-5) levels are also under the control of a major gene in the same Brazilian population used in the study on infection intensity (33), raising the possibility that IL-5 might play a critical role in resistance, a view consistent with the known role of IL-5 in the defense against schistosome infections.
Another trait of interest in schistosomiasis is the phenotype of severe hepatic fibrosis due to S. mansoni infection for which the role of genetic factors has been suggested. Segregation analysis conducted in a Sudanese village found evidence of major gene involvement in severe hepatic periportal fibrosis (A. Dessein, L. Abel, unpub. data). Whether this gene and SM1 are one and the same is under investigation.
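The genotype proportions quoted above for SM1 (about 3% predisposed homozygotes, 29% heterozygotes, 68% resistant homozygotes) can be checked for internal consistency with a short calculation. The sketch below assumes Hardy-Weinberg proportions, which is an assumption of this example and is not stated in the text; the function name is hypothetical.

```python
import math


def hwe_from_homozygote_frequency(freq_homozygous_predisposed):
    """Infer the allele frequency and remaining genotype proportions.

    Assumes Hardy-Weinberg proportions: if the predisposing allele has
    frequency q, predisposed homozygotes have frequency q^2.
    """
    if not 0.0 <= freq_homozygous_predisposed <= 1.0:
        raise ValueError("frequency must lie in [0, 1]")
    q = math.sqrt(freq_homozygous_predisposed)
    p = 1.0 - q
    return {
        "allele_frequency": q,
        "heterozygous": 2.0 * p * q,
        "homozygous_resistant": p * p,
    }


if __name__ == "__main__":
    # 3% predisposed homozygotes, as reported for the Brazilian population.
    print(hwe_from_homozygote_frequency(0.03))
```

Under that assumption, a 3% frequency of predisposed homozygotes corresponds to an allele frequency of roughly 0.17, and the implied heterozygote and resistant homozygote proportions round to the 29% and 68% reported.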

Nonparametric (Model-Free) Studies
Nonparametric or model-free studies (nonparametric linkage analysis and association studies) examine the genetic factors influencing a phenotype without specifying the phenotype/genotype model. These studies are strongly recommended when little is known about the relationship between the phenotype and a putative gene, as in the study of complex traits (e.g., infectious disease-related traits) when either no segregation analysis has been performed or no clear major gene model can be inferred from segregation analysis. Nonparametric studies test whether or not the alleles of a given marker are distributed at random in persons having a certain phenotypic resemblance. Nonparametric linkage analyses study the distribution of marker alleles inherited from the same ancestor, i.e., alleles identical by descent (IBD), in persons from the same family (e.g., siblings), whereas association studies examine the distribution of a given marker allele, e.g., HLA-DR2, in persons not from the same family.

Nonparametric Linkage Analysis
The most commonly used nonparametric linkage analysis is the sib-pair method. Two siblings can share 0, 1, or 2 parental IBD alleles at any locus, and the respective proportions of this sharing under random segregation are simply 0.25, 0.5, and 0.25 (Figure 2). When the phenotype under study is a clinical disease (affected/unaffected), the method tests whether affected sib-pairs share more parental alleles than expected under random segregation. This excess allele sharing can be tested by a simple chi-square, in particular when all parental marker data are known. Maximum likelihood methods have also been developed to analyze affected sib-pair data, such as the maximum likelihood score (34) and a maximum likelihood binomial approach (35), and can lead to more powerful tests. When the phenotypic response under study is quantitative, the method tests whether siblings with close phenotype values share more IBD alleles than siblings with more distant values. This is the basis of the classical approach proposed by Haseman and Elston (36), which regresses the squared difference of the sib-pair phenotypic values on the expected proportion of alleles shared IBD by the sib-pair. Many recent studies have used other methods not detailed here (37-39). Some of these methods are implemented in popular packages, such as MAPMAKER/SIBS (40), which also allow multipoint analysis of sib-pair data. Sib-pair methods have the same problems as parametric linkage analysis with respect to missing parental marker data and testing with multiple markers; in particular, the number of comparisons made influences the significance levels of the tests, and suspected linkage should be confirmed by replication studies. However, affected sib-pair methods have been effective for several diseases, e.g., insulin-dependent diabetes mellitus (41,42), in genomewide searches for human susceptibility genes in a multifactorial phenotype.
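The two sib-pair ideas described above can be sketched as follows, assuming IBD status is known without ambiguity (e.g., all parental marker data available). The first function is a plain chi-square goodness-of-fit test of observed IBD sharing against the 0.25/0.5/0.25 expectation; it is not one of the maximum likelihood methods cited. The second computes the regression slope used in the Haseman and Elston approach, where a negative slope is the pattern expected under linkage. All counts, trait values, and function names are hypothetical.

```python
import math


def ibd_sharing_chi_square(n0, n1, n2):
    """Goodness-of-fit of affected sib-pair IBD counts to 0.25 / 0.5 / 0.25.

    n0, n1, n2: numbers of pairs sharing 0, 1, or 2 alleles IBD.
    Returns (statistic, p_value); the statistic has 2 degrees of freedom,
    for which the chi-square survival function is exactly exp(-x / 2).
    """
    n = n0 + n1 + n2
    if n == 0:
        raise ValueError("at least one sib-pair is required")
    expected = (0.25 * n, 0.5 * n, 0.25 * n)
    stat = sum((obs - exp) ** 2 / exp
               for obs, exp in zip((n0, n1, n2), expected))
    return stat, math.exp(-stat / 2.0)


def haseman_elston_slope(pairs):
    """Least-squares slope of squared trait difference on IBD proportion.

    pairs: iterable of (trait_sib1, trait_sib2, ibd_proportion), where
    ibd_proportion is the (expected) proportion of alleles shared IBD.
    """
    pairs = list(pairs)
    xs = [pi for _, _, pi in pairs]
    ys = [(a - b) ** 2 for a, b, _ in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("IBD proportions must vary across pairs")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical counts: 100 affected sib-pairs sharing 0, 1, 2 alleles IBD.
    print(ibd_sharing_chi_square(10, 45, 45))
    # Hypothetical quantitative data: (trait of sib 1, trait of sib 2, IBD proportion).
    print(haseman_elston_slope([(1.0, 3.0, 0.0), (2.0, 3.0, 0.5), (2.0, 2.0, 1.0)]))
```

The goodness-of-fit test is sensitive to any departure from the expected proportions, not only to excess sharing, which is one reason more targeted tests are often preferred.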

Leprosy Studies
Sib-pair methods in infectious diseases have focused on candidate regions and have not yet resulted in published genome scans. In leprosy studies using the HLA complex, sib-pair analyses have shown a nonrandom segregation of parental HLA haplotypes in sets of children with tuberculoid leprosy and in siblings with lepromatous leprosy, respectively (18,43,44). However, the observed random segregation of HLA haplotypes in all leprosy patients and in healthy siblings in families with multiple cases of leprosy argued against any involvement of HLA-linked factors in susceptibility to leprosy (44,45). The human gene NRAMP1 (46), homologue of the mouse gene Nramp1, has provided an excellent candidate gene for the study of susceptibility to leprosy. A recent sib-pair study in Vietnam has found linkage between leprosy and NRAMP1 haplotypes consisting of six intragenic variants of NRAMP1 and four polymorphic flanking markers (47) and provided the first evidence that NRAMP1 could be a susceptibility locus for leprosy. Furthermore, this study, combined with segregation analysis performed in the same population (18), suggested genetic heterogeneity according to the ethnic origin of the families (Vietnamese or Chinese), which may explain, at least in part, the results of two previous reports that showed no association between leprosy and distal chromosome 2q where NRAMP1 is located (48,49). Overall, these studies suggest genetic control on at least two levels: a first dependent on non-HLA-linked factors, among which NRAMP1 could play a role, and a second influenced by HLA-linked genes.

Malaria Studies
Two sib-pair studies focusing on candidate genes have been reported in malaria-related phenotypes. In one (50), nonrandom segregation of the MHC region was found in pairs of dizygotic twins with mild clinical malaria. In another (51), the 5q31-q33 region, previously shown to be linked to S. mansoni infection levels (26), appeared to be involved in the control of parasitemia due to Plasmodium falciparum, although the sample size was too small for a definitive conclusion; larger studies are ongoing.

Mycobacterium Studies
The recent demonstration that mutations in the interferon γ receptor 1 (IFNγR1) gene cause disseminated infection due to weakly pathogenic mycobacteria (52,53) was first based on homozygosity mapping (54), a nonparametric linkage method, which locates a rare recessive mutation in consanguineous families by searching for chromosomal regions for which all affected family members are homozygous IBD; i.e., they have received two copies of the same ancestral mutation. In consanguineous infected children from two families, two groups located the genetic defect on chromosome region 6q22-q23 and identified mutations in the IFNγR1 gene leading to the absence of expression of the receptor at the cell surface (52,53). In vitro experiments established the causative relationship between the presence of two mutated IFNγR1 alleles and impaired response to IFN by the cells of these patients (55). Although inherited IFNγR1 deficiency was found in additional families, IFNγR1 mutations were not found in other families with infected patients (J.L. Casanova, pers. comm.), which suggests that other genetic defects may be involved.

Association Studies
Classic association studies are population-based case-control studies that compare the frequency of a given marker allele in unrelated persons with the phenotype and in controls without the phenotype (6,7). G is the disease locus influencing the trait, and M is the marker locus under consideration; G is assumed to be diallelic (D,d) with D being the deleterious allele, and M has several alleles (M1, M2, ..., Mn). Association studies examine the role of a particular allele of M. As an example, M1 is said to be associated with the disease under study if it is found at a significantly higher or lower frequency in case-patients than in controls by a simple 2 x 2 contingency table. The simplest explanation for the association is that allele M1 is the deleterious allele D itself. Another explanation is that M1 has no direct effect on the phenotype but is in linkage disequilibrium with allele D. Linkage disequilibrium requires two conditions: 1) linkage between locus M and locus G (generally close linkage) and 2) preferential association of allele M1 with allele D; i.e., the DM1 haplotype is more frequent than expected from the respective frequencies of D and M1 (e.g., many present cases are due to one D allele from an ancestor bearing the DM1 haplotype). Even very close linkage alone (only the first condition is fulfilled) does not lead to association, and therefore, the absence of association does not exclude linkage. On the basis of these two explanations, association studies best use the candidate gene approach when they consider markers that are either within or in close linkage with a gene that is related to the phenotypic response. A final explanation for association is the existence of an artifact due to population admixture. For example, a case-control study conducted in a mixture of two subpopulations of which one has a higher disease prevalence and a higher M1 frequency than the second will show a positive association of allele M1 with the disease.
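The two quantities underlying this paragraph, the 2 x 2 case-control test and the excess haplotype frequency that defines linkage disequilibrium, can be sketched as below. The chi-square shown is the uncorrected Pearson statistic with one degree of freedom; the counts, frequencies, and function names are hypothetical and chosen only for illustration.

```python
import math


def case_control_chi_square(cases_with, cases_without,
                            controls_with, controls_without):
    """Pearson chi-square (1 df, no continuity correction) for a 2 x 2 table.

    Rows: case-patients / controls; columns: carrying allele M1 or not.
    Returns (statistic, p_value, odds_ratio).
    """
    a, b, c, d = cases_with, cases_without, controls_with, controls_without
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column total must be positive")
    stat = n * (a * d - b * c) ** 2 / margins
    # Survival function of a chi-square with 1 df, via the normal tail.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return stat, p_value, odds_ratio


def disequilibrium_coefficient(freq_haplotype_dm1, freq_d, freq_m1):
    """Excess of the DM1 haplotype over the product of allele frequencies.

    Zero means the two alleles are distributed independently, even if
    the two loci are closely linked.
    """
    return freq_haplotype_dm1 - freq_d * freq_m1


if __name__ == "__main__":
    # Hypothetical table: 60/100 cases and 40/100 controls carry M1.
    print(case_control_chi_square(60, 40, 40, 60))
    # Hypothetical frequencies: P(DM1) = 0.08, P(D) = 0.1, P(M1) = 0.3.
    print(disequilibrium_coefficient(0.08, 0.1, 0.3))
```

A coefficient of zero with tightly linked loci is the situation in which linkage exists but no association is seen.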
To avoid population admixture, family-based association methods have been developed (56), such as the transmission disequilibrium test (TDT) (57). The sampling unit in these methods consists of two parents with an affected child; parental alleles not transmitted to affected children are used as controls. More specifically, the TDT considers affected children of parents heterozygous for M1, e.g., M1M2, and simply tests whether these children have received M1 with a probability different from 0.5, the value expected under random segregation (Figure 3). The TDT is a very efficient method of detecting the effect of allele M1 when M1 is the deleterious allele D itself (58). Under this hypothesis that the tested allele M1 is the deleterious allele, TDT was more powerful than even the sib-pair method in the context of a genomewide search involving 500,000 diallelic polymorphisms (5 polymorphisms per gene for an assumed 100,000 genes) (58). However, in the more common situation where M1 is different from D, the power of TDT is highly dependent on the respective frequencies of M1 and D and the strength of the linkage disequilibrium between M1 and D (59). These results indicate that linkage methods are still useful for identifying genes involved in infectious diseases, at least until molecular resources become available for full genomic screening of human genes.
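The TDT statistic given in the Figure 3 caption, (x-y)^2/(x+y) with one degree of freedom, is simple enough to compute directly. The sketch below follows that formula; the function name and the transmission counts in the example are hypothetical.

```python
import math


def tdt(transmitted_m1, transmitted_m2):
    """Transmission disequilibrium test for heterozygous M1M2 parents.

    transmitted_m1 (x): affected children who received M1 from such a parent.
    transmitted_m2 (y): affected children who received M2 instead.
    Returns (statistic, p_value) for a chi-square with 1 degree of freedom.
    """
    x, y = transmitted_m1, transmitted_m2
    if x + y == 0:
        raise ValueError("no informative transmissions")
    stat = (x - y) ** 2 / (x + y)
    # Survival function of a chi-square with 1 df, via the normal tail.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts: M1 transmitted 60 times, M2 transmitted 40 times.
    print(tdt(60, 40))
```

Because only transmissions from heterozygous parents are counted, the comparison is made within families, which is what protects the test against the population admixture artifact.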

Leprosy Associations
Most reported associations between leprosy and different HLA alleles could be due to population admixture and statistical problems (multiple testing); therefore, replication studies are very important. In tuberculoid leprosy, the most consistent associations were found with HLA-DR2 (43,45). With HLA molecular typing, a recent study (60) in Indian patients found tuberculoid leprosy to be associated with alleles DRB1*1501, DRB1*1502 (both DR2 alleles), and DRB1*1404, which are characterized by arginines at position 13 or 70-71. Lepromatous leprosy was associated with HLA-DR3 in several studies (43,45). One report (44) analyzed the transmission of the parental DR3 allele to lepromatous children by a method similar to the TDT, which was presented several years later (57).

Malaria Associations
In malaria, population-based association studies have been used to test the hypothesis that certain genetic red cell defects, found more frequently in malaria-endemic areas than in nonendemic-disease areas, had a protective effect against severe malaria (cerebral malaria, severe anemia); the results supported the hypothesis that persons with certain abnormal hemoglobins (61) or glucose-6-phosphate dehydrogenase deficiency (62) had a reduced risk of developing severe malaria. More recently, a study in Gambia (63) showed that an HLA class I antigen and an HLA class II haplotype were independently associated with protection from severe malaria when a two-stage strategy was used to avoid the problem of multiple testing. In the same population, persons homozygous for a variant of the TNF-α gene promoter, denoted as TNF2, were found to have an increased risk (independent of their HLA alleles) for cerebral malaria (64). A recent work showing that TNF2 is a much stronger transcriptional activator than the more common allele TNF1 (65) indicates that TNF2 affects TNF-α expression and may be directly responsible for the reported association of TNF2 with cerebral malaria. These genetic findings are consistent with immunologic reports showing high TNF-α blood levels in cerebral malaria. Although these genetic polymorphisms (genetic defects of the red cell, HLA and TNF polymorphisms) have certainly played a role in selection among populations exposed to malaria infection (61,63), they cannot entirely explain the large interindividual variability of responses to the parasite; likely only a minority of genes influencing malaria resistance have been identified (66). This view is supported by a recent report that a coding polymorphism in the intercellular adhesion molecule-1 (ICAM-1), a molecule that affects adherence of infected red blood cells to small vessel endothelium, is associated with an increased susceptibility to cerebral malaria (67).

Figure 3. Principle of the transmission disequilibrium test (TDT) for investigating association between a disease and allele M1. The sample consists of x+y families with one affected child and two parents. For ease of presentation, we assume that only one parent is heterozygous for M1 (e.g., M1M2), although the second parent could be used for the test if that parent were also heterozygous for M1. There are x affected children who have received allele M1 from their M1M2 parent and y who have received M2. The TDT statistic is simply (x-y)^2/(x+y), which is distributed as a chi-square with one degree of freedom.

HIV Associations
A major advance in understanding the involvement of host factors in HIV-1 infection came when infection status (seropositive/seronegative) was associated with the gene encoding the CC-chemokine receptor 5 (CCR5), the coreceptor of macrophage-tropic HIV-1 strains (68). Two persons exposed many times to HIV-1, yet uninfected, were shown to be homozygous for a defective CCR5 allele containing an internal 32 base-pair deletion (∆32) (69), and several large cohort studies found HIV-1 infected patients not to be CCR5∆32 homozygous, whereas exposed HIV-1 seronegative persons did have the defective allele (70-72). Subsequent reports showed that this protection was not complete since some CCR5∆32 homozygous persons were found to be HIV-1 infected (10). Furthermore, several studies in HIV-1 infected persons found that CCR5∆32 heterozygous status may protect against disease progression (71,72), depending on virus strain (73). However, it is clear that CCR5∆32 does not alone explain HIV-1 infection status, especially in African populations where ∆32 is absent (70,74), and the search for other host genes involved in susceptibility/resistance to HIV infection will be of major interest.

Conclusions
Recently developed genetic epidemiology methods and dense human genetic maps, together with the growing availability of candidate genes, are essential for identifying genes that influence human infectious diseases. Nevertheless, investigating the role of genetic factors in a given phenotypic response depends on many different factors related to the phenotype, population, accurate measurement of environmental factors, and previous knowledge; no unique optimal design can be applied for most phenotypic responses related to infectious agents. Among possible study designs, familial linkage studies search for a chromosomal region showing a nonrandom segregation with the phenotype by either focusing on a few candidate regions or using a genomewide search. The main goals of the genomewide approach are to ensure that all major loci involved in the control of a phenotype are identified and to provide the opportunity to discover new major genes (and consequently physiopathologic pathways) involved in phenotypic responses. Parametric linkage studies are powerful when a clear major gene model can be inferred from segregation analysis. Nonparametric linkage studies are strongly recommended when little is known about the relationship between the studied phenotype and a putative gene, and sib-pair studies have led to successful gene localizations in the analysis of several complex traits, including infectious disease-related traits. Once evidence for linkage is obtained, fine genetic and physical mapping is performed to narrow down the genetic interval. The next step is the search, by molecular methods, for polymorphisms in candidate genes located within the identified interval. These candidate genes are selected from gene databanks or are obtained by a systematic characterization of the genes of the region (positional cloning).
On the other hand, association studies performed with candidate genes can directly identify the disease gene when the tested polymorphism is in strong linkage disequilibrium with the disease allele or is the disease allele itself. Finally, evidence for an association should be completed by functional analysis, which will test whether the detected polymorphism modifies the gene expression or the gene product in a manner that can affect susceptibility to the disease.
Progress in the genetic dissection of infectious diseases will also come from the integrated analysis of different phenotypic responses (clinical response, intensity of infection, immunologic response), which can all contribute to the pathologic process, as illustrated in malaria and schistosomiasis studies. The identification of host genes in human infectious diseases will provide new understanding of disease pathogenesis. How this genetic information will modify our approach to prevention and treatment of infectious diseases cannot yet be fully appreciated. However, the identification of susceptibility/resistance genes in schistosomiasis, mycobacterial, and HIV infections has already opened new avenues for the screening of genetically predisposed persons and the development of vaccines.

Dr. Dessein is professor at the Faculté de Médecine de Marseille-Université de la Méditerranée and head of INSERM Unit 399, Immunology and Genetic of Parasitic Diseases.