User:Alfa-ketosav/Non-coding DNA

Non-coding DNA (ncDNA) refers to the segments of DNA that do not code for protein. Some non-coding DNA is transcribed into functional non-coding RNA (for example transfer RNA, microRNA, Piwi-interacting RNA, or regulatory RNA). Other functional regions of ncDNA include regulatory sequences that control gene expression, scaffold attachment regions, origins of replication, centromeres, and telomeres. Some non-coding regions, such as introns, pseudogenes, intergenic DNA, and fragments of transposons and viruses, are regarded as largely non-functional.

Fraction of non-coding genomic DNA

In bacteria, coding DNA typically makes up about 88% of the genome.[1] The remaining 12% does not code for protein, but most of it is still functional, consisting of genes whose product is a functional RNA transcript and of regulatory sequences; most of the bacterial genome is therefore functional.[1] The proportion of coding DNA is generally lower in eukaryotes, because eukaryotic genomes contain large amounts of repetitive DNA not found in prokaryotes. The human genome contains roughly 1–2% coding DNA.[2][3] The exact figure is unknown because of disputes over the number of functional coding exons and the total size of the human genome. This means that 98–99% of the human genome is ncDNA, which includes many functional elements such as noncoding genes and regulatory sequences.

Genome size in eukaryotes can vary considerably, even between closely related species. This was originally called the C-value paradox, where C refers to the haploid genome size.[4] The paradox was resolved by the finding that most of the size differences are due not to differences in the number of genes but to variation in the amount of repetitive sequences. Some researchers hold that most of the repetitive DNA is non-functional. The reasons for the changes in genome size are still unknown; this is the C-value enigma.[5]

This led to the observation that gene number does not correlate with perceived notions of complexity, because gene number is roughly constant; this is termed the G-value paradox.[6] For example, the unicellular Polychaos dubium (formerly Amoeba dubia) has been reported to contain more than 200 times as much DNA as humans (over 600 billion base pairs, compared with about 3 billion in humans).[7] The genome of the pufferfish Takifugu rubripes is about one eighth the size of the human genome, yet it has nearly the same number of genes. Genes make up about 30% of its genome, and about 10% is coding. The smaller genome size is presumably due to smaller introns and less repetitive DNA.[8][9]
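The size comparisons in this paragraph reduce to simple ratios. The following is a minimal sketch using only the rounded figures quoted in the text (about 3 billion bp for humans, over 600 billion bp for Polychaos dubium, and a Takifugu rubripes genome one eighth the human size, about 10% of it coding); the function and constant names are illustrative and not taken from any source.

```python
# Genome sizes in base pairs, as quoted in the text (rounded figures).
HUMAN_BP = 3_000_000_000
POLYCHAOS_BP = 600_000_000_000   # "more than 600 billion", so a lower bound
TAKIFUGU_BP = HUMAN_BP / 8       # "one eighth of the human genome"


def fold_difference(genome_bp: float, reference_bp: float = HUMAN_BP) -> float:
    """Return how many times larger genome_bp is than reference_bp."""
    return genome_bp / reference_bp


def coding_bp(genome_bp: float, coding_fraction: float) -> float:
    """Return the amount of coding DNA for a genome size and coding fraction."""
    return genome_bp * coding_fraction


# Lower bound on the Polychaos/human size ratio.
print(fold_difference(POLYCHAOS_BP))
# Approximate amount of coding DNA in the Takifugu genome (about 10% coding).
print(coding_bp(TAKIFUGU_BP, 0.10))
```

Because the inputs are rounded, the results are order-of-magnitude estimates rather than measured values.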

The nuclear genome of the humped bladderwort, Utricularia gibba, is very small (100.7 Mb) compared with that of most plants.[10][11] It probably derives from an ancestral genome of about 1,500 Mb.[11] It has roughly the same number of genes as other plants, but coding DNA makes up about 30% of the genome.[10][11]

The non-coding 70% of the genome consists of promoters and regulatory sequences that are shorter than those of other plants.[10] The genes contain introns, but these are fewer and smaller than the introns in other plant genomes.[10] There are noncoding genes, including many copies of ribosomal RNA genes.[11] The genome also contains telomeres and centromeres.[11] Much of the repetitive DNA is absent, having been lost since the lineage diverged from other plants. About 59% of the genome consists of transposon-related sequences, but because the genome is so small, this still represents a reduced amount of such DNA.[11] The authors of the original 2013 article note that claims of functional elements in the ncDNA of animals do not seem to apply to plant genomes.[10]

According to a New York Times article, over the course of the species' evolution the non-functional genetic material was expunged and what was necessary was kept.[12] According to Victor Albert of the University at Buffalo, the plant is able to remove its non-functional DNA and still live as a perfectly good multicellular plant with many different cells, organs, tissues and flowers, all without the junk; in his view, junk is not needed.[13]

Types of non-coding DNA sequences

Noncoding genes

There are two types of genes: protein-coding genes and noncoding genes.[14] Noncoding genes are an important part of ncDNA and include the genes for transfer RNA (tRNA) and ribosomal RNA (rRNA). These were discovered in the 1960s. Prokaryotic genomes contain genes for several other noncoding RNAs as well, but such genes are more common in eukaryotes.

Typical noncoding genes include those for small nuclear RNAs (snRNAs), small nucleolar RNAs (snoRNAs), microRNAs (miRNAs), short interfering RNAs (siRNAs), Piwi-interacting RNAs (piRNAs), and long noncoding RNAs (lncRNAs). In addition, there are many unique genes for catalytic RNAs.[15]

Noncoding genes account for a few percent of prokaryotic genomes,[16] but they can represent a larger fraction of eukaryotic genomes.[17]

The number of noncoding genes in the human genome is disputed. Some researchers estimate about 5,000, while others put the figure at more than 100,000. The disagreement stems largely from the debate over the number of lncRNA genes.[18]

Promoters and regulatory elements

Promoters are DNA segments near the 5' end of a gene where RNA polymerase binds to initiate RNA synthesis. Every gene has a noncoding promoter. Many regulatory sequences occur near promoters, usually upstream of the transcription start site of the gene. Some occur within a gene and a few are located downstream of the transcription termination site. In eukaryotes, some regulatory sequences are located at a considerable distance from the promoter region. These distant regulatory sequences are often called enhancers, but there is no rigorous definition of enhancer that distinguishes it from other transcription factor binding sites.[19][20]

 
Introns

Illustration of an unspliced pre-mRNA precursor, with five introns and six exons (top). After the introns have been removed via splicing, the mature mRNA sequence is ready for translation (bottom).

Group I and group II introns take up only a small percentage of the genome when they are present. Spliceosomal introns (see Figure) are only found in eukaryotes and they can represent a substantial proportion of the genome. In humans, for example, introns in protein-coding genes cover 37% of the genome. Combining that with about 1% coding sequences means that protein-coding genes occupy about 38% of the human genome. The calculations for noncoding genes are more complicated because there is considerable dispute over the total number of noncoding genes but taking only the well-defined examples means that noncoding genes occupy at least 6% of the genome.[21][2]
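The percentages in this paragraph can be tallied directly. The following is a minimal sketch using only the figures quoted above (37% of the genome in introns of protein-coding genes and about 1% in coding sequence); the variable names are illustrative, and the derived intron share is an arithmetic consequence of those two rounded numbers rather than a figure from the cited sources.

```python
# Approximate shares of the human genome, in percent, as quoted in the text.
intron_pct = 37.0   # introns of protein-coding genes
coding_pct = 1.0    # coding sequence (rounded; the text elsewhere gives 1-2%)

# Total footprint of protein-coding genes (introns plus coding sequence).
protein_coding_gene_pct = intron_pct + coding_pct

# Share of that footprint that is intronic rather than coding.
intron_share_of_genes = intron_pct / protein_coding_gene_pct

print(f"protein-coding genes: about {protein_coding_gene_pct:.0f}% of the genome")
print(f"intronic share of those genes: about {intron_share_of_genes:.0%}")
```

On these rounded inputs, nearly all of the span of a typical protein-coding gene is intron rather than coding sequence.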

Untranslated regions

Standard biochemistry and molecular biology textbooks describe non-coding nucleotides in mRNA located between the 5' end of the gene and the translation initiation codon. These regions are called 5'-untranslated regions or 5'-UTRs. Similar regions called 3'-untranslated regions (3'-UTRs) are found at the end of the gene. The 5'-UTRs and 3'-UTRs are very short in bacteria but can be several hundred nucleotides long in eukaryotes. They contain short elements that control the initiation of translation (5'-UTRs) and transcription termination (3'-UTRs), as well as regulatory elements that may control mRNA stability, processing, and targeting to different regions of the cell.[22][23][24]

Origins of replication

DNA synthesis begins at specific sites called origins of replication. These are regions of the genome where the DNA replication machinery is assembled and the DNA is unwound to begin DNA synthesis. In most cases, replication proceeds in both directions from the replication origin.

The main features of replication origins are sequences where specific initiation proteins are bound. A typical replication origin covers about 100-200 base pairs of DNA. Prokaryotes have one origin of replication per chromosome or plasmid but there are usually multiple origins in eukaryotic chromosomes. The human genome contains about 100,000 origins of replication representing about 0.3% of the genome.[25][26][27]
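As a rough consistency check on these figures, the fraction of the genome occupied by origins is the number of origins times their typical length, divided by the genome size. The sketch below assumes a human genome of about 3 billion base pairs (the round figure used earlier in this article) and non-overlapping origins; the function name is illustrative.

```python
def genome_fraction(n_elements: int, element_bp: int, genome_bp: int) -> float:
    """Fraction of the genome covered by n_elements non-overlapping elements."""
    return n_elements * element_bp / genome_bp


GENOME_BP = 3_000_000_000  # assumed round figure for the human genome

# Evaluate both ends of the quoted 100-200 bp range for a typical origin.
for length_bp in (100, 200):
    frac = genome_fraction(100_000, length_bp, GENOME_BP)
    print(f"{length_bp} bp per origin -> {frac:.2%} of the genome")
```

With these inputs, the 100 bp end of the range gives roughly 0.3%, in line with the quoted figure, while 200 bp per origin would give about twice that.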

Centromeres

Schematic karyogram of a human, showing an overview of the human genome on G banding, wherein non-coding DNA is present at the centromeres (shown as narrow segment of each chromosome), and also occurs to a greater extent in darker (GC poor) regions.[28]

Centromeres are the sites where spindle fibers attach to newly replicated chromosomes in order to segregate them into daughter cells when the cell divides. Each eukaryotic chromosome has a single functional centromere that is seen as a constricted region in a condensed metaphase chromosome. Centromeric DNA consists of a number of repetitive DNA sequences that often take up a significant fraction of the genome because each centromere can be millions of base pairs in length. In humans, for example, the sequences of all 24 centromeres have been determined[29] and they account for about 6% of the genome. However, it is unlikely that all of this noncoding DNA is essential since there is considerable variation in the total amount of centromeric DNA in different individuals.[30] Centromeres are another example of functional noncoding DNA sequences that have been known for almost half a century and it is likely that they are more abundant than coding DNA.

Telomeres

Telomeres are regions of repetitive DNA at the ends of chromosomes that protect them from deterioration during DNA replication. Studies have shown that telomeres also contribute to their own stability: telomeric repeat-containing RNAs (TERRA) are transcripts derived from telomeres, and TERRA has been shown to maintain telomerase activity and lengthen the ends of chromosomes.[31]

Scaffold attachment regions

Both prokaryotic and eukaryotic genomes are organized into large loops of protein-bound DNA. In eukaryotes, the bases of the loops are called scaffold attachment regions (SARs) and they consist of stretches of DNA that bind an RNA/protein complex to stabilize the loop. There are about 100,000 loops in the human genome and each SAR consists of about 100 bp of DNA. The total amount of DNA devoted to SARs accounts for about 0.3% of the human genome.[32]

Pseudogenes

Pseudogenes are mostly former genes that have become non-functional due to mutation but the term also refers to inactive DNA sequences that are derived from RNAs produced by functional genes (processed pseudogenes). Pseudogenes are only a small fraction of noncoding DNA in prokaryotic genomes because they are eliminated by negative selection. In some eukaryotes, however, pseudogenes can accumulate because selection is not powerful enough to eliminate them (see Nearly neutral theory of molecular evolution).

The human genome contains about 15,000 pseudogenes derived from protein-coding genes and an unknown number derived from noncoding genes.[33] They may cover a substantial fraction of the genome (~5%) since many of them contain former intron sequences.

Pseudogenes are junk DNA by definition and they evolve at the neutral rate as expected for junk DNA.[34] Some former pseudogenes have secondarily acquired a function and this leads some scientists to speculate that most pseudogenes are not junk because they have a yet-to-be-discovered function.[35]

Repeat sequences, transposons and viral elements

Mobile genetic elements in the cell (left) and how they can be acquired (right)

Transposons and retrotransposons are mobile genetic elements. Retrotransposon repeated sequences, which include long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs), account for a large proportion of the genomic sequences in many species. Alu sequences, classified as a short interspersed nuclear element, are the most abundant mobile elements in the human genome. Some examples have been found of SINEs exerting transcriptional control of some protein-encoding genes.[36][37][38]

Endogenous retrovirus sequences are the product of reverse transcription of retrovirus genomes into the genomes of germ cells. Mutation within these retro-transcribed sequences can inactivate the viral genome.[39]

Over 8% of the human genome is made up of (mostly decayed) endogenous retrovirus sequences, as part of the more than 42% that is recognizably derived from retrotransposons, while another 3% can be identified as the remains of DNA transposons. Much of the remaining half of the genome, which currently has no explained origin, is expected to derive from transposable elements that were active so long ago (> 200 million years) that random mutations have rendered them unrecognizable.[40] Genome size variation in at least two kinds of plants is mostly the result of retrotransposon sequences.[41][42]

Highly repetitive DNA

Highly repetitive DNA consists of short stretches of DNA that are repeated many times in tandem (one after the other). The repeat segments are usually between 2 bp and 10 bp but longer ones are known. Highly repetitive DNA is rare in prokaryotes but common in eukaryotes, especially those with large genomes. It is sometimes called satellite DNA.

Most of the highly repetitive DNA is found in centromeres and telomeres (see above) and most of it is functional although some might be redundant. The other significant fraction resides in short tandem repeats (STRs; also called microsatellites) consisting of short stretches of a simple repeat such as ATC. There are about 350,000 STRs in the human genome and they are scattered throughout the genome with an average length of about 25 repeats.[43][44]
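Tandem repeats of this kind are simple to detect computationally. The following is a minimal sketch, not a production STR caller: it uses a regular expression with a back-reference to find runs of a short unit repeated back to back. The function name, the 2–10 bp unit range (borrowed from the repeat lengths mentioned above), and the minimum of three copies are illustrative assumptions.

```python
import re


def find_tandem_repeats(seq, min_unit=2, max_unit=10, min_copies=3):
    """Return (start, unit, copies) for each perfect tandem repeat in seq.

    The lazy quantifier makes the regex try the shortest unit first, and the
    back-reference \\1 requires the same unit to follow at least
    (min_copies - 1) more times with no gaps or mismatches.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Five copies of the ATC unit embedded in flanking sequence.
example = "GGT" + "ATC" * 5 + "GTTA"
print(find_tandem_repeats(example))  # [(3, 'ATC', 5)]
```

The scan reports the first reading frame it encounters, so the same repeat may be described by a rotated unit (for example CAT instead of ATC) depending on the flanking bases. Imperfect repeats and sequencing errors, which real STR genotyping has to handle, are ignored here.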

Variations in the number of STR repeats can cause genetic diseases when they lie within a gene but most of these regions appear to be non-functional junk DNA where the number of repeats can vary considerably from individual to individual. This is why these length differences are used extensively in DNA fingerprinting.

Junk DNA

Junk DNA is DNA that has no biologically relevant function, such as pseudogenes and fragments of once-active transposons. Bacterial and viral genomes have very little junk DNA,[45][46] but some eukaryotic genomes may have a substantial amount.[47] The exact amount of nonfunctional DNA in humans and other species with large genomes has not been determined, and there is considerable controversy in the scientific literature.[48][49]

The nonfunctional DNA in bacterial genomes is mostly located in the intergenic fraction of non-coding DNA but in eukaryotic genomes it may also be found within introns. It is important to note that there are many examples of functional DNA elements in non-coding DNA and that it is erroneous to equate non-coding DNA with junk DNA.

Genome-wide association studies (GWAS) and non-coding DNA

Genome-wide association studies (GWAS) identify linkages between alleles and observable traits such as phenotypes and diseases. Most of the associations are between single-nucleotide polymorphisms (SNPs) and the trait being examined and most of these SNPs are located in non-functional DNA. The association establishes a linkage that helps map the DNA region responsible for the trait but it does not necessarily identify the mutations causing the disease or phenotypic difference.[50][51][52][53][54]

SNPs that are tightly linked to traits are the ones most likely to identify a causal mutation. (The association is referred to as tight linkage disequilibrium.) About 12% of these polymorphisms are found in coding regions; about 40% are located in introns; and most of the rest are found in intergenic regions, including regulatory sequences.[51]

  1. Kirchberger PC, Schmidt ML, Ochman H (2020). "The ingenuity of bacterial genomes". Annual Review of Microbiology 74: 815–834. DOI:10.1146/annurev-micro-020518-115822. PMID 32692614.
  2. Piovesan A, Antonaros F, Vitale L, Strippoli P, Pelleri MC, Caracausi M (2019). "Human protein-coding genes and gene feature statistics in 2019". BMC Research Notes 12 (1): 315. DOI:10.1186/s13104-019-4343-8. PMID 31164174. PMC 6549324.
  3. Omenn GS (2021). "Reflections on the HUPO Human Proteome Project, the Flagship Project of the Human Proteome Organization, at 10 Years". Molecular & Cellular Proteomics 20: 100062. DOI:10.1016/j.mcpro.2021.100062. PMID 33640492. PMC 8058560.
  4. Thomas CA (1971). "The genetic organization of chromosomes". Annual Review of Genetics 5: 237–256. DOI:10.1146/annurev.ge.05.120171.001321. PMID 16097657.
  5. Elliott TA, Gregory TR (2015). "What's in a genome? The C-value enigma and the evolution of eukaryotic genome content". Phil. Trans. R. Soc. B 370 (1678): 20140331. DOI:10.1098/rstb.2014.0331. PMID 26323762. PMC 4571570.
  6. Hahn MW, Wray GA (2002). "The g-value paradox". Evolution and Development 4 (2): 73–75. DOI:10.1046/j.1525-142X.2002.01069.x. PMID 12004964.
  7. Gregory TR, Hebert PD (April 1999). "The modulation of DNA content: proximate causes and ultimate consequences". Genome Research 9 (4): 317–324. DOI:10.1101/gr.9.4.317. PMID 10207154.
  8. Aparicio S, Chapman J, Stupka E, Putnam N, Chia JM, Dehal P, Christoffels A, Rash S, Hoon S, Smit A (2002). "Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes". Science 297 (5585): 1301–1310. DOI:10.1126/science.1072104. PMID 12142439.
  9. Ohno S (1972). "So much 'junk' DNA in our genome". Brookhaven Symposia in Biology 23: 366–370. PMID 5065367.
  10. Ibarra-Laclette E, Lyons E, Hernández-Guzmán G, Pérez-Torres CA, Carretero-Paulet L, Chang TH, Lan T, Welch AJ, Juárez MJ, Simpson J, et al. (2013). "Architecture and evolution of a minute plant genome". Nature 498 (7452): 94–98. DOI:10.1038/nature12132. PMID 23665961. PMC 4972453.
  11. Lan T, Renner T, Ibarra-Laclette E, Farr KM, Chang TH, Cervantes-Pérez SA, Zheng C, Sankoff D, Tang H, Purbojati RW (2017). "Long-read sequencing uncovers the adaptive topography of a carnivorous plant genome". Proceedings of the National Academy of Sciences 114 (22): E4435–E4441. DOI:10.1073/pnas.1702072114. PMID 28507139. PMC 5465930.
  12. Klein J (19 May 2017). "Genetic Tidying Up Made Humped Bladderworts Into Carnivorous Plants". New York Times. (Accessed: 30 May 2022.)
  13. University of Arizona (13 May 2013). "Carnivorous Plant Throws Out 'Junk' DNA". Press release.
  14. Kampourakis K (2017). Making sense of genes. Cambridge University Press, pp. 67–88. ISBN 978-1-107-12813-2.
  15. Cech TR, Steitz JA (2014). "The Noncoding RNA Revolution - Trashing Old Rules to Forge New Ones". Cell 157 (1): 77–94. DOI:10.1016/j.cell.2014.03.008. PMID 24679528.
  16. Rogozin IB, Makarova KS, Natale DA, Spiridonov AN, Tatusov RL, Wolf YI, Yin J, Koonin EV (October 2002). "Congruent evolution of different classes of non-coding DNA in prokaryotic genomes". Nucleic Acids Research 30 (19): 4264–4271. DOI:10.1093/nar/gkf549. PMID 12364605. PMC 140549.
  17. Bielawski JP, Jones C (2016). "Adaptive Molecular Evolution: Detection Methods". Encyclopedia of Evolutionary Biology, pp. 16–25. DOI:10.1016/B978-0-12-800049-6.00171-2. ISBN 978-0-12-800426-5.
  18. Ponting CP, Haerty W (2022). "Genome-Wide Analysis of Human Long Noncoding RNAs: A Provocative Review". Annual Review of Genomics and Human Genetics 23: 153–172. DOI:10.1146/annurev-genom-112921-123710. PMID 35395170.
  19. Compe E, Egly JM (2021). "The Long Road to Understanding RNAPII Transcription Initiation and Related Syndromes". Annual Review of Biochemistry 90: 193–219. DOI:10.1146/annurev-biochem-090220-112253. PMID 34153211.
  20. Visel A, Rubin EM, Pennacchio LA (September 2009). "Genomic views of distant-acting enhancers". Nature 461 (7261): 199–205. DOI:10.1038/nature08451. PMID 19741700. PMC 2923221.
  21. Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski F, Aken BL, Barrell D, Zadissa A, Searle S (2012). "GENCODE: the reference human genome annotation for The ENCODE Project". Genome Research 22 (9): 1760–1774. DOI:10.1101/gr.135350.111. PMID 22955987. PMC 3431492.
  22. Alberts B, Bray D, Lewis J, Raff M, Roberts K, Watson JD (1994). Molecular Biology of the Cell, 3rd edition. Garland Publishing Inc. [page needed]
  23. Lewin B (2004). Genes VIII. Pearson/Prentice Hall. [page needed]
  24. Moran L, Horton HR, Scrimgeour KG, Perry MD (2012). Principles of Biochemistry, Fifth Edition. Pearson. [page needed]
  25. Leonard AC, Méchali M (2013). "DNA replication origins". Cold Spring Harbor Perspectives in Biology 5 (10): a010116. DOI:10.1101/cshperspect.a010116. PMID 23838439. PMC 3783049.
  26. Urban JM, Foulk MS, Casella C, Gerbi SA (2015). "The hunt for origins of DNA replication in multicellular eukaryotes". F1000Prime Reports 7: 30. DOI:10.12703/P7-30. PMID 25926981. PMC 4371235.
  27. Prioleau M, MacAlpine DM (2016). "DNA replication origins—where do we begin?". Genes & Development 30 (15): 1683–1697. DOI:10.1101/gad.285114.116. PMID 27542827. PMC 5002974.
  28. Romiguier J, Roux C (2017). "Analytical Biases Associated with GC-Content in Molecular Evolution". Frontiers in Genetics 8: 16. DOI:10.3389/fgene.2017.00016. PMID 28261263. PMC 5309256.
  29. Altemose N, Logsdon GA, Bzikadze AV, Sidhwani P, Langley SA, Caldas GV, et al. (2021). "Complete genomic and epigenetic maps of human centromeres". Science 376 (6588): 56. DOI:10.1126/science.abl4178. PMID 35357911. PMC 9233505.
  30. Miga KH (2019). "Centromeric satellite DNAs: hidden sequence variation in the human population". Genes 10 (5): 353. DOI:10.3390/genes10050352. PMID 31072070. PMC 6562703.
  31. Cusanelli E, Chartrand P (May 2014). "Telomeric noncoding RNA: telomeric repeat-containing RNA in telomere biology". Wiley Interdisciplinary Reviews. RNA 5 (3): 407–419. DOI:10.1002/wrna.1220. PMID 24523222.
  32. Misteli T (2020). "The self-organizing genome: Principles of genome architecture and function". Cell 183 (1): 28–45. DOI:10.1016/j.cell.2020.09.014. PMID 32976797. PMC 7541718.
  33. Ensembl Human reference genome GRCh38.p13
  34. Xu J, Zhang J (2015). "Are human translated pseudogenes functional?". Molecular Biology and Evolution 33 (3): 755–760. DOI:10.1093/molbev/msv268. PMID 26589994. PMC 5009996.
  35. Wen YZ, Zheng LL, Qu LH, Ayala FJ, Lun ZR (2012). "Pseudogenes are not pseudo any more". RNA Biology 9 (1): 27–32. DOI:10.4161/rna.9.1.18277. PMID 22258143.
  36. Ponicsan SL, Kugel JF, Goodrich JA (April 2010). "Genomic gems: SINE RNAs regulate mRNA production". Current Opinion in Genetics & Development 20 (2): 149–155. DOI:10.1016/j.gde.2010.01.004. PMID 20176473. PMC 2859989.
  37. Häsler J, Samuelsson T, Strub K (July 2007). "Useful 'junk': Alu RNAs in the human transcriptome". Cellular and Molecular Life Sciences 64 (14): 1793–1800. DOI:10.1007/s00018-007-7084-0. PMID 17514354.
  38. Walters RD, Kugel JF, Goodrich JA (August 2009). "InvAluable junk: the cellular impact and function of Alu and B2 RNAs". IUBMB Life 61 (8): 831–837. DOI:10.1002/iub.227. PMID 19621349. PMC 4049031.
  39. Nelson PN, Hooley P, Roden D, Davari Ejtehadi H, Rylance P, Warren P, Martin J, Murray PG (October 2004). "Human endogenous retroviruses: transposable elements with potential?". Clinical and Experimental Immunology 138 (1): 1–9. DOI:10.1111/j.1365-2249.2004.02592.x. PMID 15373898. PMC 1809191.
  40. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al. (February 2001). "Initial sequencing and analysis of the human genome". Nature 409 (6822): 860–921. DOI:10.1038/35057062. PMID 11237011.
  41. Piegu B, Guyot R, Picault N, Roulin A, Sanyal A, Saniyal A, Kim H, Collura K, Brar DS, Jackson S, Wing RA, Panaud O (October 2006). "Doubling genome size without polyploidization: dynamics of retrotransposition-driven genomic expansions in Oryza australiensis, a wild relative of rice". Genome Research 16 (10): 1262–1269. DOI:10.1101/gr.5290206. PMID 16963705. PMC 1581435.
  42. Hawkins JS, Kim H, Nason JD, Wing RA, Wendel JF (October 2006). "Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium". Genome Research 16 (10): 1252–1261. DOI:10.1101/gr.5282906. PMID 16954538. PMC 1581434.
  43. Gymrek M, Willems T, Guilmatre A, Zeng H, Markus B, Georgiev S, Daly MJ, Price AL, Pritchard JK, Sharp AJ, Erlich Y (2016). "Abundant contribution of short tandem repeats to gene expression variation in humans". Nature Genetics 48 (1): 22–29. DOI:10.1038/ng.3461. PMID 26642241. PMC 4909355.
  44. Kronenberg ZN, Fiddes IT, Gordon D, Murali S, Cantsilieris S, Meyerson OS, Underwood JG, Nelson BJ, Chaisson MJ, Dougherty ML (2018). "High-resolution comparative analysis of great ape genomes". Science 360 (6393): 1085. DOI:10.1126/science.aar6343. PMID 29880660. PMC 6178954.
  45. Gil R, Latorre A (2012). "Factors behind junk DNA in bacteria". Genes 3 (4): 634–650. DOI:10.3390/genes3040634. PMID 24705080. PMC 3899985.
  46. (2016). "Gene overlapping and size constraints in the viral world". Biology Direct 11 (1): 26. DOI:10.1186/s13062-016-0128-3. ISSN 1745-6150. PMID 27209091. PMC 4875738.
  47. Palazzo AF, Gregory TR (May 2014). "The case for junk DNA". PLOS Genetics 10 (5): e1004351. DOI:10.1371/journal.pgen.1004351. PMID 24809441. PMC 4014423.
  48. Morange M (2014). "Genome as a Multipurpose Structure Built by Evolution". Perspectives in Biology and Medicine 57 (1): 162–171. DOI:10.1353/pbm.2014.0008. PMID 25345709.
  49. Haerty W, Ponting CP (2014). "No Gene in the Genome Makes Sense Except in the Light of Evolution". Annual Review of Genomics and Human Genetics 15: 71–92. DOI:10.1146/annurev-genom-090413-025621. PMID 24773316.
  50. Korte A, Farlow A (2013). "The advantages and limitations of trait analysis with GWAS: a review". Plant Methods 9: 29. DOI:10.1186/1746-4811-9-29. PMID 23876160. PMC 3750305.
  51. Manolio TA (July 2010). "Genomewide association studies and assessment of the risk of disease". The New England Journal of Medicine 363 (2): 166–76. DOI:10.1056/NEJMra0905980. PMID 20647212.
  52. Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, Yang J (2017). "10 Years of GWAS Discovery: Biology, Function, and Translation". American Journal of Human Genetics 101 (1): 5–22. DOI:10.1016/j.ajhg.2017.06.005. PMID 28686856. PMC 5501872.
  53. Gallagher MD, Chen-Plotkin AS (2018). "The Post-GWAS Era: From Association to Function". American Journal of Human Genetics 102 (5): 717–730. DOI:10.1016/j.ajhg.2018.04.002. PMID 29727686. PMC 5986732.
  54. Marigorta UM, Rodríguez JA, Gibson G, Navarro A (2018). "Replicability and Prediction: Lessons and Challenges from GWAS". Trends in Genetics 34 (7): 504–517. DOI:10.1016/j.tig.2018.03.005. PMID 29716745. PMC 6003860.

Further information
