The protocol was approved in advance by the ethical review board at UNC-CH and the Karolinska Institutet, and all subjects provided written informed consent. The parent study is described elsewhere [22–24], and we have previously shown that there were no differences in peripheral blood gene expression within monozygotic twin pairs discordant for chronic fatigue. We screened ~61,000 individual twins from the Swedish Twin Registry for symptoms of fatiguing illness. All twins were born in Sweden and were of Scandinavian ancestry. Of 5,597 monozygotic twin pairs in which both twins were alive and had provided usable responses to the CFS screening questions, we identified 140 pairs who met preliminary inclusion criteria: born 1935–1985, classified as monozygotic based on questionnaire responses, and discordant for chronic fatiguing illness (i.e., one twin reported substantial fatigue and the other was evidently well). Eligibility for participation was assessed by a telephone interview following a standardized script. Both members of each pair that remained eligible attended a half-day clinical assessment by a specially trained physician at the Karolinska Institutet in Stockholm. At this visit, a CFS-focused medical assessment was conducted that included a standardized medical history, physical examination, and screening biochemical, hormonal, and hematological studies in accordance with international recommendations.
Of the 140 preliminarily discordant monozygotic twin pairs, one or both twins declined participation in 23 pairs, 25 pairs were concordant for CFS-like illness, and 35 pairs did not meet inclusion criteria (e.g., chronic fatigue had resolved, or an illness that could explain the fatiguing symptoms, such as neoplasia, had emerged). After these 83 pairs were excluded, 57 pairs attended the clinical evaluation sessions; 10 of these were found not to meet inclusion criteria (9 pairs because they were concordant for the presence or absence of chronic fatigue or because a medical explanation for the fatigue was detected, and 1 pair because the twins were dizygotic). Serum samples were unavailable for both members of 2 further pairs. Zygosity was confirmed by genotyping 46 single nucleotide polymorphisms using two Sequenom iPlex panels.
The analysis sample consisted of 45 pairs of rigorously discordant, genetically confirmed monozygotic twins. Discordance required that one twin meet criteria for either idiopathic chronic fatigue (ICF, 13 pairs) or CFS (32 pairs) [1, 2] and that the co-twin never have experienced impairing unusual fatigue or tiredness lasting more than one month. Thus, all affected twins had current, long-standing (≥6 months), medically unexplained fatigue associated with substantial impairment in social and occupational functioning, and the unaffected co-twins were effectively well.
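The pair counts reported above can be reconciled with a short tally. This is only an arithmetic check built from the numbers stated in the text; the category labels are paraphrases, and the split into pre-visit and post-visit exclusions is an illustrative grouping.

```python
# Participant flow from the preliminary screen to the analysis sample,
# using only the pair counts stated in the text.
PRELIMINARY_PAIRS = 140

# Exclusions before the clinical evaluation (pairs).
pre_visit_exclusions = {
    "declined participation": 23,
    "concordant for CFS-like illness": 25,
    "inclusion criteria not met": 35,
}

# Exclusions at or after the clinical evaluation (pairs).
post_visit_exclusions = {
    "concordant or medical explanation found": 9,
    "dizygotic": 1,
    "serum unavailable for both twins": 2,
}


def remaining(start: int, exclusions: dict[str, int]) -> int:
    """Subtract every exclusion count from the starting number of pairs."""
    return start - sum(exclusions.values())


attended = remaining(PRELIMINARY_PAIRS, pre_visit_exclusions)
analysed = remaining(attended, post_visit_exclusions)

print(f"Excluded before visit: {sum(pre_visit_exclusions.values())}")  # 83
print(f"Attended clinical evaluation: {attended}")                     # 57
print(f"Analysis sample: {analysed}")                                   # 45

# The analysis sample should equal the ICF pairs plus the CFS pairs.
assert analysed == 13 + 32
```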
Biological sampling was standardized by having samples drawn from both members of a twin pair at the same place and time (~0900) after an overnight fast. We required that all subjects be in their usual state of health on the day of sampling (i.e., no acute illness or recent exacerbation of a chronic illness). It was neither practical nor ethical to study subjects medication-free, but we delayed assessment if there had been a recent significant dosage change. Peripheral venous blood was drawn using sterile technique.
Viral library preparation and sequencing
Serum samples from 45 pairs of affected and unaffected monozygotic twins were available for this study. Sample preparation for library construction was as described previously and, briefly, consisted of viral particle recovery and nucleic acid extraction, followed by amplification and cloning of viral nucleic acid. Serum samples (200 μl) from the affected twins were pooled separately from those of their unaffected co-twins. Each serum pool was then filtered through either a 0.22 μm or a 0.45 μm membrane filter (Millipore), and virus particles were concentrated by ultracentrifugation (41,000 rpm for 1.5 h at 4°C in a Beckman SW41 rotor). Exogenous nucleic acids were removed by DNase I and RNase A treatment, followed by extraction of viral DNA (Qiagen) or RNA (Trizol, Invitrogen). First-strand synthesis was carried out with a random primer containing an EcoRV site, using exonuclease-negative Klenow polymerase (Promega) for DNA and Superscript II reverse transcriptase (Invitrogen) for RNA. Second-strand synthesis for both reactions was carried out with exonuclease-negative Klenow polymerase (Promega). The products were then amplified with AmpliTaq Gold polymerase (Applied Biosystems) and a primer complementary to part of the random primer used in first-strand synthesis. PCR products were purified, digested with EcoRV, and separated by gel electrophoresis, and fragments of 500 bp to 5 kb were extracted from the gels. The blunt-ended PCR products were then cloned into pCR-Blunt (Invitrogen) and transformed into TOP10 chemically competent cells for sequencing of clones. The library was verified by conventional Sanger sequencing with DYEnamic Dye Terminator kits on a MegaBACE 1000 sequencer (GE Healthcare). Gel-purified blunt-ended PCR products (1.25–1.35 μg) were subjected to ultra-deep sequencing using the 454 FLX chemistry and sequencer (Roche) according to the manufacturer's instructions at the time.
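For readers who think in g-force rather than rotor speed, the stated 41,000 rpm can be converted with the standard relation RCF = 1.118 × 10⁻⁵ × r(cm) × rpm². This sketch assumes a maximum radius of 153.1 mm for the SW41 (SW 41 Ti) rotor, taken from the manufacturer's specifications rather than from the text. It also assumes that each of the 45 twins per group contributed 200 μl to its pool; the text gives the per-sample volume but not the total.

```python
# Convert the stated rotor speed to relative centrifugal force (RCF).
# ASSUMPTION: r_max for the Beckman SW 41 Ti rotor is taken from the
# manufacturer's published specifications, not from the text.
SW41_R_MAX_MM = 153.1


def rcf(rpm: float, radius_mm: float) -> float:
    """RCF (x g) = 1.118e-5 * r(cm) * rpm^2."""
    return 1.118e-5 * (radius_mm / 10.0) * rpm ** 2


speed_rpm = 41_000
print(f"RCF at r_max: {rcf(speed_rpm, SW41_R_MAX_MM):,.0f} x g")  # ~288,000 x g

# ASSUMPTION: every twin in a group contributes 200 ul to that group's pool.
twins_per_group = 45
volume_per_sample_ul = 200
pool_volume_ml = twins_per_group * volume_per_sample_ul / 1000
print(f"Pool volume per group: {pool_volume_ml} ml")  # 9.0 ml
```

Quoting RCF rather than rpm makes the spin reproducible on a different rotor, since the same rpm gives a different g-force at a different radius.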
Although the samples were enriched for viruses, most of the sequenced samples contained a large fraction of human reads. Because the analysis concerned viral content, human reads could be removed from the samples before assembly without affecting the results. Removing human sequences before assembly substantially reduces assembly time and the risk of misassembly. Most human reads are highly similar to human database sequences and can be identified with MegaBLAST. Multiple NCBI databases (EST-Human, Human Genomic, and Human Genomic Transcripts) were used to identify human reads. Highly repetitive human reads identified by MegaBLAST were also discarded. The remaining overlapping reads were then assembled into contigs using miraEST, which can perform a hybrid assembly of Roche/454 and traditional Sanger sequences.
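The human-read removal step can be sketched as a filter over MegaBLAST results. This sketch assumes the hits were written in BLAST+ tabular format (`-outfmt 6`, default 12 columns). The identity and e-value cut-offs are hypothetical; the text does not report the thresholds used, and the toy reads and hits are made up.

```python
# Minimal sketch: drop reads that have a MegaBLAST hit to a human database.
# ASSUMPTIONS: hits are in BLAST+ tabular format (-outfmt 6), whose default
# columns are qseqid sseqid pident length mismatch gapopen qstart qend sstart
# send evalue bitscore; the cut-offs below are illustrative only.
from typing import Iterable

MIN_PIDENT = 95.0   # hypothetical identity threshold (%)
MAX_EVALUE = 1e-10  # hypothetical e-value threshold


def human_read_ids(tabular_lines: Iterable[str]) -> set[str]:
    """Collect query IDs with at least one qualifying hit to a human sequence."""
    flagged = set()
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, pident, evalue = fields[0], float(fields[2]), float(fields[10])
        if pident >= MIN_PIDENT and evalue <= MAX_EVALUE:
            flagged.add(qseqid)
    return flagged


def remove_human(reads: dict[str, str], tabular_lines: Iterable[str]) -> dict[str, str]:
    """Return only the reads that were not flagged as human."""
    flagged = human_read_ids(tabular_lines)
    return {rid: seq for rid, seq in reads.items() if rid not in flagged}


# Usage with toy data (read IDs and hits are hypothetical).
reads = {"r1": "ACGTACGT", "r2": "TTGACCAA", "r3": "GGCCTTAA"}
blast_out = [
    "r1\tchr1_frag\t99.5\t200\t1\t0\t1\t200\t5000\t5199\t1e-95\t365\n",
    "r3\tEST_123\t80.0\t60\t12\t0\t1\t60\t10\t69\t1e-3\t40.1\n",
]
kept = remove_human(reads, blast_out)
print(sorted(kept))  # ['r2', 'r3']
```

Hits from several databases can simply be concatenated before filtering, since a read is removed if any line flags it.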
Before the contigs and singletons were classified, highly repetitive sequences were eliminated using the DUST algorithm. The remaining sequences were classified through a series of database alignment searches using NCBI BLAST. Alignment search tools trade speed for sensitivity; for metagenomic datasets, more distant homologues are identified efficiently by running progressively more sensitive searches rather than a single sensitive search. The progressive searches were MegaBLAST against NCBI NT, then BLASTn against NCBI NT, and finally BLASTx against NCBI NR. For example, for one set of Roche/454 RNA reads, 70% of the remaining sequences were classified in the first step, leaving far fewer sequences for the more time-consuming second and third steps. Sequences were then classified according to the closest homologue identified by the alignment searches. Two main categories were defined: classified sequences that are highly similar to a database sequence (>90% identity and >70% query coverage) and "remainder" sequences that may contain new findings. Each category was split into taxonomic divisions, and the virus division was further split into virus subgroups to aid analysis.
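The progressive search strategy and the two-category split can be sketched as a cascade in which each stage only sees sequences the previous, faster stage left unclassified. The search functions here are hypothetical stand-ins (toy dictionary lookups) for the MegaBLAST, BLASTn, and BLASTx runs; only the >90% identity and >70% query coverage thresholds come from the text.

```python
# Minimal sketch of a progressive search cascade plus the category rule.
# ASSUMPTION: each "search" is a hypothetical callable standing in for a
# BLAST run; it returns the best Hit for a sequence, or None if no hit.
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Hit:
    subject: str
    identity: float        # percent identity
    query_coverage: float  # percent of the query covered by the alignment


Search = Callable[[str], Optional[Hit]]


def progressive_classify(seqs: dict[str, str], stages: list[tuple[str, Search]]):
    """Return ({seq_id: (stage_name, hit)}, set of IDs no stage could hit)."""
    assigned = {}
    pending = dict(seqs)
    for stage_name, search in stages:
        still_pending = {}
        for sid, seq in pending.items():
            hit = search(seq)
            if hit is None:
                still_pending[sid] = seq
            else:
                assigned[sid] = (stage_name, hit)
        pending = still_pending  # later, slower stages see fewer sequences
    return assigned, set(pending)


def category(hit: Hit) -> str:
    """Thresholds stated in the text: >90% identity and >70% query coverage."""
    if hit.identity > 90 and hit.query_coverage > 70:
        return "classified"
    return "remainder"


# Usage with toy lookups (sequences and subjects are hypothetical).
toy_nt = {"AAAA": Hit("virus_A", 98.0, 95.0)}
toy_nr = {"CCCC": Hit("virus_B", 45.0, 80.0)}
stages = [
    ("megablast_nt", lambda s: toy_nt.get(s)),
    ("blastn_nt", lambda s: None),
    ("blastx_nr", lambda s: toy_nr.get(s)),
]
assigned, unassigned = progressive_classify(
    {"q1": "AAAA", "q2": "CCCC", "q3": "GGGG"}, stages)
for sid, (stage, hit) in sorted(assigned.items()):
    print(sid, stage, hit.subject, category(hit))
# q1 megablast_nt virus_A classified
# q2 blastx_nr virus_B remainder
print(sorted(unassigned))  # ['q3']
```

The saving comes from the `pending` dictionary shrinking at each stage, so the slow BLASTx step runs only on what the nucleotide searches could not place.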
Total nucleic acid extraction and PCR of individual serum samples
Serum samples (400 μl each) were used for total nucleic acid extraction with the Virus Mini M48 kit (Qiagen) according to the manufacturer's instructions. The automated extraction was carried out on a Qiagen BioRobot M48.
The presence of GBV-C in the samples was confirmed by nested PCR with primers specific for the 5' UTR of the viral RNA. The first-round, one-step RT-PCR contained 1× AmpliTaq buffer (Applied Biosystems), 2 mM MgCl2, 200 μM dNTP mix, 0.4 μM of each primer GBV-F1 (5' CGGCCAAAAGGTGGTGGATG 3') and GBV-R1 (5' CACTGGTCCTTGTCAACTCG 3'), 5 μl of sample, 4 units of AMV RT (Promega), 16 units of RNasin (Promega), and 1 unit of AmpliTaq DNA polymerase in a 50 μl reaction. Cycling conditions were 42°C for 60 min, followed by 35 cycles of 95°C for 1.5 min, 55°C for 2 min, and 72°C for 3 min. The expected product size was 299 bp. Five μl of the first-round reaction was used as template for the second-round PCR, which contained 1× AmpliTaq buffer, 2 mM MgCl2, 200 μM dNTP mix, 0.4 μM of each primer GBV-F2 (5' GGTGATGACAGGGTTGGTAG 3') and GBV-R2 (5' GCCTATTGGTCAAGAGAGACAT 3'), and 1.25 units of AmpliTaq DNA polymerase in a 50 μl reaction. Cycling conditions were 94°C for 10 min, 35 cycles of 94°C for 30 s, 60°C for 30 s, and 72°C for 1 min, and a final extension at 72°C for 10 min. The expected PCR product size was 251 bp.
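As a quick sanity check on the nested-PCR primers, their length, GC content, and an approximate melting temperature can be computed directly from the sequences given above. The Tm here uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C) °C, which is only a coarse estimate for short oligos and is an assumption of this sketch, not the method the authors used.

```python
# Length, GC content and a rough Wallace-rule Tm for the nested-PCR primers.
# ASSUMPTION: Tm = 2(A+T) + 4(G+C) degC is a coarse estimate only.
PRIMERS = {
    "GBV-F1": "CGGCCAAAAGGTGGTGGATG",
    "GBV-R1": "CACTGGTCCTTGTCAACTCG",
    "GBV-F2": "GGTGATGACAGGGTTGGTAG",
    "GBV-R2": "GCCTATTGGTCAAGAGAGACAT",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace rule: 2 degC per A/T plus 4 degC per G/C."""
    at = sum(base in "AT" for base in seq)
    gc = sum(base in "GC" for base in seq)
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Reverse primers anneal to the sense strand; their reverse complement
    is the sequence expected on the sense strand of the amplicon."""
    return seq.translate(COMPLEMENT)[::-1]


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.1f}%, "
          f"Tm~{wallace_tm(seq)} C, revcomp {reverse_complement(seq)}")
```

By this estimate all four primers fall between about 62 and 64 °C, comfortably above the 55 °C and 60 °C annealing temperatures used in the two rounds.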
To assess the diversity of the GBV-C reads, they were compared using BLAST against a database of 23 complete GBV-C genome sequences from GenBank. A read was classified as similar to a given isolate if its top BLAST hit had an e-value < 10⁻²⁰ and was at least 100 times more significant than the second-best hit.
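The isolate-assignment rule can be written as a small function. This sketch reads "100 times more significant" as the second-best isolate's e-value being at least 100-fold larger than the top hit's. It collapses multiple HSPs per isolate to the best e-value and refuses to assign exact ties (for example, two e-values of 0.0). These interpretations and the isolate names are assumptions, not details from the text.

```python
# Minimal sketch of the isolate-assignment rule for one GBV-C read.
# ASSUMPTIONS: "100 times more significant" means a >=100-fold e-value ratio;
# multiple HSPs per isolate are collapsed to the best e-value; ties are not
# assigned. Isolate names are placeholders.
from typing import Optional

E_MAX = 1e-20      # top hit must have e-value below this
MIN_RATIO = 100.0  # required fold difference to the second-best isolate


def assign_isolate(hits: list[tuple[str, float]]) -> Optional[str]:
    """hits: (isolate, e-value) pairs for one read. Returns isolate or None."""
    if not hits:
        return None
    best_per_isolate: dict[str, float] = {}
    for isolate, evalue in hits:
        if isolate not in best_per_isolate or evalue < best_per_isolate[isolate]:
            best_per_isolate[isolate] = evalue
    ranked = sorted(best_per_isolate.items(), key=lambda kv: kv[1])  # lowest first
    best_isolate, best_e = ranked[0]
    if best_e >= E_MAX:
        return None
    if len(ranked) == 1:
        return best_isolate
    second_e = ranked[1][1]
    # Multiply instead of dividing so an e-value of 0.0 does not raise.
    if second_e > best_e and second_e >= MIN_RATIO * best_e:
        return best_isolate
    return None


print(assign_isolate([("isolate_A", 1e-80), ("isolate_B", 1e-60)]))  # isolate_A
print(assign_isolate([("isolate_A", 1e-80), ("isolate_B", 1e-79)]))  # None
```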