Assembly of a pan-genome from deep sequencing of 910 humans of African descent
Recommended Citation
Sherman RM, Forman J, Antonescu V, Puiu D, Daya M, Rafaels N, Boorgula MP, Chavan S, Vergara C, Ortega VE, Levin AM, Eng C, Yazdanbakhsh M, Wilson JG, Marrugo J, Lange LA, Williams LK, Watson H, Ware LB, Olopade CO, Olopade O, Oliveira RR, Ober C, Nicolae DL, Meyers DA, Mayorga A, Knight-Madden J, Hartert T, Hansel NN, Foreman MG, Ford JG, Faruque MU, Dunston G, Caraballo L, Burchard E, Bleecker E, Araujo M, Herrera-Paz E, Campbell M, Foster C, Taub M, Beaty T, Ruczinski I, Mathias R, Barnes K, Salzberg S. Assembly of a pan-genome from deep sequencing of 910 humans of African descent. Nature Genetics 2019; 51(1):30-35.
Document Type
Article
Publication Date
1-1-2019
Publication Title
Nature Genetics
Abstract
We used a deeply sequenced dataset of 910 individuals, all of African descent, to construct a set of DNA sequences that is present in these individuals but missing from the reference human genome. We aligned 1.19 trillion reads from the 910 individuals to the reference genome (GRCh38), collected all reads that failed to align, and assembled these reads into contiguous sequences (contigs). We then compared all contigs to one another to identify a set of unique sequences representing regions of the African pan-genome missing from the reference genome. Our analysis revealed 296,485,284 bp in 125,715 distinct contigs present in the populations of African descent, demonstrating that the African pan-genome contains ~10% more DNA than the current human reference genome. Although the functional significance of nearly all of this sequence is unknown, 387 of the novel contigs fall within 315 distinct protein-coding genes, and the rest appear to be intergenic.
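The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not part of the authors' analysis. The function name `summarize_novel_sequence` and the constant `ASSUMED_REFERENCE_BP` are hypothetical, and the reference length of roughly 3.1 billion bp is an assumption, since the abstract does not state the GRCh38 size it used.

```python
# Figures quoted directly in the abstract.
NOVEL_BP = 296_485_284        # total novel sequence, in base pairs
NOVEL_CONTIGS = 125_715       # number of distinct novel contigs
CONTIGS_IN_GENES = 387        # novel contigs falling within protein-coding genes

# ASSUMPTION: approximate length of GRCh38; this value is not given in the abstract.
ASSUMED_REFERENCE_BP = 3_100_000_000


def summarize_novel_sequence(novel_bp, n_contigs, genic_contigs, reference_bp):
    """Return simple ratios derived from the abstract's headline numbers."""
    return {
        # Novel sequence as a fraction of the (assumed) reference length.
        "fraction_of_reference": novel_bp / reference_bp,
        # Average contig length in base pairs.
        "mean_contig_bp": novel_bp / n_contigs,
        # Share of novel contigs that fall within protein-coding genes.
        "genic_contig_fraction": genic_contigs / n_contigs,
    }


if __name__ == "__main__":
    stats = summarize_novel_sequence(
        NOVEL_BP, NOVEL_CONTIGS, CONTIGS_IN_GENES, ASSUMED_REFERENCE_BP
    )
    print(f"Novel sequence vs. reference: {stats['fraction_of_reference']:.1%}")
    print(f"Mean contig length: {stats['mean_contig_bp']:.0f} bp")
    print(f"Contigs within genes: {stats['genic_contig_fraction']:.2%}")
```

Under the assumed reference length, the novel sequence comes to a little under one tenth of the reference, in line with the "~10%" stated in the abstract. The contigs average a few thousand base pairs each, and well under 1% of them fall within protein-coding genes.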
PubMed ID
30455414
Volume
51
Issue
1
First Page
30
Last Page
35