Molecular approaches to the identification of unculturable infectious agents.

New molecular biologic techniques, particularly representational difference analysis, consensus sequence-based polymerase chain reaction, and complementary DNA library screening, have led to the identification of several previously unculturable infectious agents. New agents have been found in tissues from patients with Kaposi's sarcoma, non-A, non-B hepatitis, hantavirus pulmonary syndrome, bacillary angiomatosis, and Whipple's disease by using these techniques without direct culture. The new methods rely on identifying subgenomic fragments from the suspected agent. After a unique nucleic acid fragment belonging to an agent is isolated from diseased tissues, the fragment can be sequenced and used as a probe to identify additional infected tissues or obtain extended portions of the agent's genome. For agents that cannot be cultured by standard techniques, these approaches have proved invaluable for identification and characterization studies. Applying these techniques to other human diseases of suspected infectious etiology may rapidly elucidate novel candidate pathogens.

Identifying the causative agent of an infectious disease is the cornerstone for its eventual control. In recent years, a great deal of progress has been made in identifying new agents associated with both well-known and newly emerging infectious diseases. A number of syndromes exist, however, in which infectious etiology is likely, but the pathogen resists cultivation with standard microbiologic techniques. For emerging diseases, rapid identification and characterization of the responsible agent are crucial first steps for epidemic control.
The rapid identification of a hantavirus responsible for an outbreak of severe pulmonary distress syndrome in the southwestern United States (1,2) demonstrated that applying molecular biologic approaches can accelerate the identification of an unknown agent. With extensive nucleic acid and protein databases readily available, isolating and sequencing genomic fragments from an unknown agent can provide important clues regarding its origin and biologic behavior. Once a new agent's phylogenetic relationship to other known organisms is established, appropriate culture conditions, serologic tests, and perhaps even therapeutic strategies can be rapidly developed.
Although they have revolutionized our ability to identify new pathogens, innovative molecular biologic techniques must be applied in conjunction with traditional epidemiologic procedures. Beral, Jaffe and colleagues, for example, showed that AIDS-related Kaposi's sarcoma (KS) is likely to be caused by a transmissible agent other than human immunodeficiency virus (HIV), before any likely causative agent was isolated (3,4). These findings focused the attention of investigators on AIDS-KS, which resulted in the isolation of viral DNA from AIDS-KS lesions and the description of a new human herpesvirus (5,6). Spurious associations between infectious agents and diseases are common, however, and only through careful epidemiologic studies can a causal link between an organism and a disease be established. Epidemiologic criteria for causality (7)(8)(9), superseding Koch's postulates, have been used for 30 years, and a critical phase in the process of new pathogen discovery involves the unambiguous establishment that an agent is central to the disease process.
The various molecular biologic approaches to agent identification differ in technical detail, but all rely on isolating nucleic acid fragments belonging to the agent's genome from diseased tissue. The formidable tasks of identifying and separating small unique nucleic acid fragments from human genomic material have been approached by various means, each with its own particular strengths and weaknesses. The appropriate technique depends on the type of infectious agent involved (e.g., bacterial or viral), whether the disease occurs in a normally sterile site, and whether it can be passaged through animals.
Figure. Schematic representation of different molecular approaches to the identification of unculturable infectious agents.
Once a fragment from the agent's genome is isolated and sequenced, standard genomic walking techniques are used to extend the known sequence, and computer homology searches can be used to identify the likely phylogenetic relationship of the agent to other known organisms. In this article, we highlight recent cases in which molecular approaches were used successfully to identify and characterize unknown agents of infectious diseases.

Representational Difference Analysis
Representational difference analysis is one of the more robust methods of identifying new agents since it does not require prior knowledge of the agent's class (10). The technique is based on polymerase chain reaction (PCR) enrichment of DNA fragments present in diseased tissue but absent from healthy tissues of the same patient (Figure). Representational difference analysis is also an important tool for identifying polymorphic DNA sequences associated with noninfectious diseases (11).
Representational difference analysis depends first on digesting DNA from both healthy and diseased tissues by using a restriction enzyme and then on separately "simplifying" the resulting genomes to reduce their sequence complexity. This is done by ligating PCR primers to both sets of DNA and nonspecifically amplifying the mixtures. Since PCR most efficiently amplifies fragments of 150 to 1500 bp, restriction fragments in this size range are enriched, and the fragments of the sequence represented outside this size range (90%) are reduced.
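The size-selection effect of the simplification step can be illustrated with a minimal Python sketch. The recognition site (GGATCC), the cut position, the function names, and the toy sequence are all assumptions chosen for illustration, not details taken from the studies cited here. The 150 to 1500 bp window is the range quoted above, treated here as a hard cutoff although real PCR efficiency falls off gradually.

```python
def digest(seq, site="GGATCC", cut_offset=1):
    """Cut seq at every occurrence of a recognition site.

    site and cut_offset are illustrative assumptions; the cut is made
    cut_offset bases into each occurrence of the site.
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # trailing fragment after the last site
    return fragments


def simplify(fragments, min_len=150, max_len=1500):
    """Keep only fragments in the size range that PCR amplifies efficiently."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy "genome": two recognition sites separating three stretches of sequence.
genome = "A" * 100 + "GGATCC" + "A" * 300 + "GGATCC" + "A" * 2000
fragments = digest(genome)
representation = simplify(fragments)

print([len(f) for f in fragments])       # [101, 306, 2005]
print([len(f) for f in representation])  # [306]
```

In this toy case only the 306-bp fragment survives into the representation; the shorter and longer fragments are lost, which is the reduction in sequence complexity that the technique relies on.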
Unique strands of DNA from diseased tissue representing restriction fragments of an exogenous agent are isolated in a subtractive hybridization process coupled to PCR amplification. First, the priming sequences ligated on DNA restriction fragments of both normal and infected tissues are removed. New primer sequences are ligated only to the diseased tissue DNA fragments. These are then hybridized with an excess of the healthy tissue representation. Human DNA fragments common to both diseased and healthy tissues reanneal to each other and, since the healthy tissue fragments are in excess, any given human fragment derived from the diseased tissue will reanneal to a complementary strand from the healthy tissue representation. Thus, common human sequences found in both representations will only have one PCR priming sequence or none present on two complementary strands. However, DNA fragments from the infectious agent will not find complementary strands in the healthy tissue representation and will reanneal with each other. Only hybrids with both strands derived from the diseased tissue representation will have priming sites and be able to undergo subsequent exponential PCR amplification. Several rounds of representational difference analysis are performed, which successively enrich the mixture for unique DNA sequences present only in the diseased tissue representation.
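The enrichment logic of successive rounds can be sketched as a toy numerical model in Python. This is a deliberately simplified illustration, not a description of the published protocol: the fragment names, copy numbers, and cycle count are hypothetical, and the model assumes that fragments shared with the healthy tissue representation always pair with an excess healthy-tissue strand and are amplified only linearly (one priming site), whereas fragments unique to diseased tissue double with every cycle.

```python
def rda_round(tester, driver, cycles=10):
    """One idealized round of subtractive hybridization plus PCR.

    tester: dict mapping fragment name -> copy number (diseased tissue)
    driver: set of fragment names present in the healthy tissue representation
    Assumption: shared fragments carry a priming site on one strand only and
    grow linearly; unique fragments carry sites on both strands and double
    each cycle.
    """
    enriched = {}
    for fragment, copies in tester.items():
        if fragment in driver:
            enriched[fragment] = copies * (1 + cycles)   # linear growth
        else:
            enriched[fragment] = copies * 2 ** cycles    # exponential growth
    return enriched


def fraction(pool, fragment):
    """Share of the pool made up of the given fragment."""
    return pool[fragment] / sum(pool.values())


# Hypothetical starting pool: one agent fragment among 2,000 human copies.
tester = {"human_1": 1000, "human_2": 1000, "agent": 1}
driver = {"human_1", "human_2"}

round1 = rda_round(tester, driver)
round2 = rda_round(round1, driver)

for label, pool in [("start", tester), ("round 1", round1), ("round 2", round2)]:
    print(label, round(fraction(pool, "agent"), 4))
```

Under these assumptions the agent fragment rises from about 0.05% of the pool to roughly 4% after one round and about 81% after two, which illustrates why several rounds are performed.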

KS-Associated Herpesvirus and KS
The power of representational difference analysis and the difficulties encountered in establishing the etiology of disease are illustrated by its application to KS (5), a vascular neoplasm that frequently occurs in homosexual men with AIDS (3). Geographic clustering (12,13) and association with specific sexual behavior (4,14) suggest that the disease is caused by a sexually transmitted agent. Several agents have been investigated, including human cytomegalovirus (CMV), human papillomavirus, human herpesvirus 6 (HHV-6), and HIV (for review, see [15]), but no convincing etiologic link has been established.
Using representational difference analysis, Chang and colleagues isolated two unique DNA sequences (KS330Bam and KS631Bam) from a KS lesion in an AIDS patient (5). These DNA sequences are homologous to portions of minor capsid and tegument protein genes from Epstein-Barr virus (EBV) and herpesvirus saimiri (HVS), a New World monkey herpesvirus. Both EBV and HVS are gamma herpesviruses associated with neoplastic disorders in humans (EBV) (16) and nonhuman primates (HVS) (17). These results suggested that a new human herpesvirus, now called KS-associated herpesvirus (KSHV or HHV-8), could be the KS agent.
The strong association between KS and KSHV was demonstrated by using Southern hybridization and PCR to amplify a 233-bp DNA fragment (KS330₂₃₃) internal to KS330Bam from all 25 intact and amplifiable DNA samples from KS lesions that were examined (5). Ninety tissues from patients without AIDS or KS were examined, and none showed evidence of KSHV infection.
Although controversy remains over the role of KSHV in KS (18)(19)(20), epidemiologic and biologic studies strongly suggest that the agent is a causal factor in KS (21). KSHV has now been identified in 211 (94.2%) of 224 KS lesions examined by a number of groups around the world, and the virus is found in all forms of KS, AIDS-related and non-AIDS-related (22)(23)(24)(25)(26)(27)(28)(29). Two independent studies have shown that KSHV is present in peripheral blood mononuclear cells of AIDS-KS patients before onset of KS (30,31), indicating that the virus is unlikely to be a "passenger virus" in KS lesions. Further, the virus has been localized to KS tumor tissues by semiquantitative PCR (25) and by PCR in-situ hybridization (26). The virus has been found in only 8 (1.8%) of 449 solid tissues from control patients without KS (5,22,25,27,28,32,38) and thus appears to be specifically associated with KS. Although early reports suggested that the virus is present in skin tumors from transplant patients without KS (19), subsequent studies have not confirmed this finding (33,34). One study has found evidence of viral DNA by nested PCR, but not unnested PCR, in semen from 23% of healthy donors; these results suggest that the prevalence of infection in North America may be higher than that indicated by tissue or lymphocyte studies (35). This intriguing finding has not been reproduced by other groups (23,36); however, it should be explored in large rigorous studies.
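The detection rates quoted above can be checked with a few lines of Python. The crude odds ratio in this sketch is an illustrative addition that is not reported in the studies cited; it simply pools the counts given in the text and is no substitute for a formal analysis that accounts for differences between studies, tissue types, and assay methods. The function names are hypothetical.

```python
def percent(part, total):
    """Percentage of total represented by part."""
    return 100.0 * part / total


def crude_odds_ratio(pos_cases, n_cases, pos_controls, n_controls):
    """Unadjusted odds ratio from pooled counts.

    Assumes at least one negative case and one positive control;
    otherwise the ratio is undefined.
    """
    neg_cases = n_cases - pos_cases
    neg_controls = n_controls - pos_controls
    return (pos_cases * neg_controls) / (neg_cases * pos_controls)


# Counts taken from the text: 211 of 224 KS lesions, 8 of 449 control tissues.
ks_rate = percent(211, 224)
control_rate = percent(8, 449)
odds_ratio = crude_odds_ratio(211, 224, 8, 449)

print(f"KS lesions positive:      {ks_rate:.1f}%")       # 94.2%
print(f"Control tissues positive: {control_rate:.1f}%")  # 1.8%
print(f"Crude pooled odds ratio:  {odds_ratio:.0f}")
```

The percentages reproduce the figures in the text, and the size of the pooled odds ratio, however crude, is consistent with the statement that the virus is specifically associated with KS.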
Lymphoproliferative disorders are common secondary malignancies in KS patients with and without AIDS (37). KSHV has also been found in a rare subset of AIDS-related, body cavity-based lymphomas, which are manifested as primary effusions (5,38). In these AIDS-related lymphomas, tumor cells are coinfected with EBV (38,39) but, unlike tumor cells in EBV-related Burkitt's lymphomas, these cells do not exhibit c-myc gene rearrangement. EBV-uninfected, KSHV-related body cavity-based lymphomas have also recently been identified (40,43). Another lymphoproliferative syndrome associated with KS is Castleman's disease in both AIDS patients and HIV-seronegative patients. KSHV has been found in AIDS-related multicentric Castleman's disease lesions at a high copy number and less frequently in tissues from Castleman's disease patients without AIDS (41).
Identifying KSHV DNA sequences by representational difference analysis also led to the quick identification of an in vitro culture system for growing the virus. KS330Bam and KS631Bam probes were used to identify a B-cell line derived from a body cavity-based lymphoma that was stably infected with both KSHV and EBV (42). Extended sequencing and transmission studies have been performed with body cavity-based lymphoma cell lines, which clearly define KSHV as a new human herpesvirus of the genus Rhadinovirus (6). This has recently been confirmed by detecting herpesvirus particles in a body cavity-based lymphoma cell line by electron microscopy (43).
Discovery of a continuously infected cell line has allowed the first generation of serologic assays for KSHV antibodies to be developed (6). Second-generation immunoblotting assays, which appear to be both sensitive and specific for detecting KSHV antibodies, indicate that KS patients are infected with the virus months to years before the disease develops and that few North American blood donors are infected (S-J Gao, P.S. Moore, unpublished observation). Thus, identifying a virus associated with KS by using molecular approaches has rapidly led to new assays that use traditional serologic techniques for detecting infection.

HHV-6 and Multiple Sclerosis
The difficulties of using representational difference analysis for new pathogen identification are also illustrated by the search for the agent that causes multiple sclerosis (MS) (44). An infectious cause for MS has been proposed (45)(46)(47), and geographic and household clustering of cases consistent with an infectious process have been shown (45,48,49,50). Challoner and colleagues used DNA from sclerotic plaques in brain tissues from MS patients and performed representational difference analysis against pooled DNA from peripheral blood leukocytes of healthy donors. Use of pooled lymphocyte DNA in representational difference analysis opens the possibility for examining rare banked tissues for which healthy control tissues are not available. A 341-bp representational difference analysis band was isolated from MS tissues with near sequence identity to a region of the major DNA-binding protein gene of HHV-6 (44). Detecting exogenous DNA in diseased tissue by this technique, however, does not exclude the possibility that a commensal agent may be identified that is not the cause of the disease being examined. HHV-6 is a neurotropic virus present in brain tissues from healthy control patients (51), and more than 70% of both MS patients and controls were positive for the representational difference analysis sequence by PCR (44). Although case-patients and controls cannot be differentiated by the presence or absence of these HHV-6 sequences, only MS tissues showed nuclear staining of oligodendrocytes surrounding plaques when monoclonal antibodies against HHV-6 viral proteins were used (44). There may be a subtle difference in tissue distribution for the virus in MS patients not found in controls, or the HHV-6 variant B group 2 infecting MS plaques may be particularly prone to generating the autoimmune response seen in this disease.

GB Hepatitis Viruses
Since representational difference analysis relies on PCR amplification, the technique is suited for detecting agents with a DNA-based genome; sequences can be amplified by generating a cDNA intermediate, but detecting RNA viruses is problematic since mRNA expression patterns differ between tissues and are likely to give spurious representational difference analysis bands. Simons and colleagues overcame this problem by passing an agent associated with non-A, non-B (NANB) hepatitis through primates and using cell-free extracts of primate plasma for representational difference analysis and discovered two new human hepatitis viruses (52,53).
A novel form of NANB hepatitis was first identified in a physician who became ill with hepatitis, and the disease was transmitted to primates by injecting serum from patients into the animals (54). Extensive studies demonstrated that the virus, named the GB agent, was different from all known human hepatitis viruses (hepatitis A [HAV], hepatitis B [HBV], hepatitis C [HCV], and hepatitis E) (55)(56)(57). Total RNA from the samples was reverse-transcribed to obtain cDNA by using cell-free preinfection and acute-phase plasma from infected monkeys. After cDNA synthesis, representational difference analysis was performed, and seven cDNA clones were found to be specifically associated with this form of hepatitis. Sequence analysis and comparison with other known hepatitis viruses identified two unique flaviviruses, GBV-A and GBV-B, as the GB agents (52,53,58). A third virus, GBV-C, was subsequently identified using the known GBV-A, GBV-B, and HCV consensus sequences (52b and see below). GBV-A and GBV-C are closely related phylogenetically. Cross-challenge experiments showed that GBV-C probably originated in human hepatitis patients, whereas GBV-A and GBV-B may be tamarin monkey viruses that were inadvertently passaged along with the human virus (53,58).
In addition to having difficulty in identifying RNA viruses, representational difference analysis has several other limitations. Polymorphic human DNA can be amplified through this technique (5), and not all bands generated by it belong to the suspected agent. Representational difference analysis may produce DNA fragments from an agent whose genome has not been sequenced (e.g., most bacteria and fungi), and sequence homology searches may be unable to distinguish the agent's genomic DNA fragments from unsequenced human DNA. Only normally sterile site tissues are appropriate for this technique because normal flora that differs between tissue sites could result in spurious amplification. Finally, for viruses with small genomes, multiple restriction digests may be required to identify a unique restriction fragment of the appropriate size that can be efficiently amplified. Although the technique has been successfully used by several groups to identify polymorphic DNA, the procedure is complex and not uniformly reproducible.

Consensus Sequence-Based PCR
Consensus sequence PCR relies on the use of highly conserved DNA sequences, such as ribosomal RNA (rRNA) gene sequences from known organisms, to amplify DNA from related organisms not yet discovered (59). This technique is simple and extremely successful in identifying new human pathogens. Subunit rRNA genes evolve in a relatively slow and uniform manner, which makes these sequences extremely useful for establishing phylogenetic relationships (59). By using conserved DNA sequences from bacterial 16S rRNA genes, at least two new bacteria associated with human diseases have been identified (60,61), and PCR amplification of conserved hantaviral capsid DNA sequences resulted in the rapid identification of a new hantavirus associated with an outbreak of severe pulmonary disease (1). Unlike using representational difference analysis, using consensus sequences to amplify DNA requires knowledge of the suspected agent's phylogenetic relationship to other organisms. The technique generally should be used on normally sterile site tissues if sequences from normal flora are also likely to be amplified. Although the lack of broadly amplifiable consensus primers limits use of this technique to prokaryotic and eukaryotic pathogens, consensus sequences are likely to exist among many of the classes of viruses that eventually could be used in screening panels of diseased tissues when a viral cause is suspected.

Bartonella: Bacillary Angiomatosis and Cat-Scratch Disease
Phylogenetic studies of bacteria have relied on comparisons of highly conserved rRNA gene sequences present in prokaryotes (62). rRNA genes are present in all living cells and contain regions of highly conserved sequences with intervening variable regions. Conserved sequences can be used to amplify and sequence the intervening variable regions by PCR. This technique was exploited by Relman and colleagues to identify the bacillus associated with bacillary angiomatosis (60).
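The principle of using conserved flanking sequences to recover an intervening variable region can be sketched as a simple "in silico PCR" in Python. All sequences below are short, made-up strings rather than real rRNA gene sequences or published primers, the function names are hypothetical, and exact string matching is assumed, whereas real primers tolerate some mismatches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse):
    """Return the amplicon (primers included) or None if either primer fails.

    forward matches the top strand; reverse is written 5'->3' as it would
    be ordered, so its reverse complement is searched on the top strand.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


def variable_region(amplicon, forward, reverse):
    """Strip the conserved primer sites, leaving the variable region."""
    return amplicon[len(forward):len(amplicon) - len(reverse)]


# Toy conserved flanks shared by two hypothetical organisms.
forward = "ACGTTGCA"
reverse = reverse_complement("GGCATTCA")

organism_a = "TT" + "ACGTTGCA" + "AAAACCCC" + "GGCATTCA" + "GG"
organism_b = "C" + "ACGTTGCA" + "TTTTGGGA" + "GGCATTCA" + "A"

for name, genome in [("A", organism_a), ("B", organism_b)]:
    amplicon = in_silico_pcr(genome, forward, reverse)
    print(name, variable_region(amplicon, forward, reverse))
```

The same primer pair amplifies both toy genomes, and the differing variable regions recovered between the primers are what would be sequenced and compared to place an unknown organism phylogenetically.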
Bacillary angiomatosis is an inflammatory vascular proliferative process that affects the skin, lymph nodes, and visceral organs of AIDS patients (63). Although bacilli can be identified by Warthin-Starry staining of lesions (64,65), the suspected causal organism was resistant to cultivation by standard techniques. Consensus oligonucleotide primers complementary to the 16S rRNA genes of eubacteria were used to amplify 16S rRNA gene fragments directly from tissue samples (60). Phylogenetic analysis of the amplified DNA sequence showed that the organism belonged to the genus Rochalimaea (now renamed Bartonella) (60). Bartonella organisms were also cultured from bacillary angiomatosis lesions (66) and blood from bacteremic patients (67,68), and serologic analyses have associated the organisms with cat-scratch disease (69). Current evidence suggests that bacillary angiomatosis in HIV-seropositive patients results from infection with either B. quintana or a newly described Bartonella species, B. henselae (68,70), whereas cat-scratch disease is primarily caused by infection with B. henselae (71) (for review, see [72]).

Whipple's Disease
Consensus sequence PCR was also used to identify an organism associated with Whipple's disease, one of the most persistent mysteries in microbiology (61). Whipple's disease is a systemic illness, first described in 1907, characterized by arthralgias, diarrhea, abdominal pain, and weight loss (73). Rod-shaped bacilli were identified histologically in Whipple's disease lesions in the early 1960s (74), but the suspected bacteria were not culturable by standard techniques (73). The agent was found to be a gram-positive actinomycete (Tropheryma whippelii gen. nov. sp. nov.), unrelated to any characterized species, when a bacterial 16S rRNA sequence was amplified and sequenced directly from infected tissue (61,75).

Hantavirus Pulmonary Syndrome
Consensus PCR primers have been successfully used to identify and classify bacteria, and similar techniques can be used to diagnose new viral agents. In May 1993, an outbreak of unexplained acute respiratory illness with a high death rate occurred in the southwestern United States (76). In the initial phases of the investigation, the cause of the syndrome was not clearly known to be infectious. Serologic tests quickly detected cross-reactive antibodies to known hantaviral antigens in the serum of patients; these results suggested that a previously unrecognized hantavirus was the cause of the disease (77). PCR primers, based on consensus sequences within the G2 protein coding region of the M segment of the genomes of known hantaviruses, were designed and used in nested reverse-transcription PCR to amplify a short segment of the viral genome from diseased tissues (1).
Sequence analysis of the PCR products showed that the amplicon differed from the other known hantaviral sequences by at least 30%, and phylogenetic analysis demonstrated that the new virus is most closely related to Prospect Hill hantavirus (1), a zoonotic hantavirus endemic in North America (78). Hantavirus antigens have been detected in endothelial cells from patients (1,79), and virus particles have been identified in infected pulmonary endothelial cells and macrophages (80). Deer mice (Peromyscus maniculatus) are the primary host for the virus (1), and the virus has been passaged through laboratory-bred deer mice and cultured in Vero E6 cells (2). This newly recognized virus was originally named Muerto Canyon virus (2) but has since been renamed Sin Nombre virus in light of nomenclatural disputes regarding the appropriateness of a descriptive name.

Identification of HCV and HGV
A third major approach to new organism identification relies on screening cDNA libraries made from diseased tissues by using hyperimmune serum from specimens. This method was successfully used to identify the cause of most cases of NANB hepatitis, HCV. When serologic tests for HAV and HBV became available, it became clear that most cases of transfusion-associated hepatitis in the United States were not caused by either virus (81,82). Conventional techniques failed to identify the agent responsible for most cases of NANB hepatitis (83), despite evidence that the disease was caused by a bloodborne, small, enveloped virus readily transmissible to chimpanzees (84,85).
Since the virus was presumed to be an RNA virus, Choo and colleagues made a cDNA expression library from RNA isolated from an infected chimpanzee's plasma (86). Whereas GBV-A, GBV-B, and GBV-C were identified by direct detection of viral cDNA through representational difference analysis, HCV cDNA was identified by immunologic detection of cDNA encoding viral protein. Viral antigens expressed from the cDNA library were identified by immunoscreening with convalescent-phase human sera. A cDNA clone was isolated encoding an antigen that could be used to screen convalescent-phase sera from patients with NANB hepatitis. Identification of the clone rapidly led to the production of a recombinant antigen used for serologic screening to detect specific antibodies in infected chimpanzees and patients with hepatitis after transfusion. Extended sequence analysis demonstrated that the agent, now known as HCV, is closely related to the family Flaviviridae and is the major cause of NANB hepatitis throughout the world (86,87). Recently, the same approach was successfully used to identify HGV, which has been found to be almost identical to GBV-C, the human hepatitis virus identified by representational difference analysis (88). The potential role of HGV and GBV-C in human disease still remains uncertain (88b).
Using convalescent-phase sera to screen cDNA libraries from diseased tissues is a novel method for pathogen identification. It is a potentially useful technique for diseases in which well-defined convalescent-phase sera are available, and it requires tissues with high titers of the agent. On the other hand, constructing and screening cDNA libraries are laborious, and cross-reactive antigens are likely to be detected, especially for diseases in which autoantibodies are common.

Future Directions
Nucleic acid database information is rapidly expanding for all classes of organisms, and a significant fraction of the human genome has already been sequenced. Even small laboratories can exploit new and relatively inexpensive molecular biologic technologies to search for new pathogens. Once a small, unique nucleic acid fragment from a pathogen has been identified, nucleic acid detection and serologic assays can often be rapidly developed to establish an etiologic link with disease.
New pathogens are likely to be identified by some of these molecular biologic approaches. A number of diseases have long been suspected to have an infectious cause and are appropriate candidates for these techniques. The eventual identification of infectious etiologic agents of diseases such as sarcoidosis (89), Kawasaki's disease (90), and type I diabetes mellitus is likely. New techniques, such as arbitrarily primed PCR and phage display libraries (91), have not been used for pathogen discovery but show promise for expanding the repertoire of techniques available for identifying unculturable agents. Three years passed between the initial descriptions of AIDS and the identification of HIV (92). Use of molecular biologic approaches could lead to rapid identification and control of the next pandemic caused by a newly emergent infectious disease.