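Title: Purifying selection on conserved secondary structures in the human genome and their role in transcriptional regulation networks
Author: Xie Haibing (谢海兵)
Degree type: Doctoral
Defense date: 2008-06
Degree-granting institution: Graduate University of Chinese Academy of Sciences (中国科学院研究生院)
Place of conferral: Beijing
Advisor: Zhang Yaping (张亚平)
Keywords: conserved secondary structure; purifying selection; transcriptional regulation network
Alternative title: Conserved secondary structures in the human genome are evolutionarily constrained and function in transcriptional regulation networks
Degree discipline: Zoology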
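Abstract (translated from the Chinese): Conserved sequences are genomic sequences that are conserved across species, and the vast majority of them do not code for protein. Conserved sequences play important roles in human genetic diseases. Some conserved sequences can fold into secondary structures. Some of the conserved secondary structures identified so far encode RNA molecules, such as microRNAs, RNA editing sequences, and the stem-loop structures in the 3' untranslated regions of histone mRNAs. For the great majority of conserved secondary structures, however, both their biological functions and the evolutionary forces acting on them remain unknown.
Population SNP data are very effective for analyzing the evolutionary forces acting on a sequence. The frequency of a SNP in a population varies with the evolutionary forces acting on it, regardless of whether the SNP lies in a mutation hotspot. SNPs under purifying selection generally have lower derived allele frequencies (DAFs) than neutral SNPs. Using bioinformatics methods, we found 746 SNPs in conserved secondary structures of the human genome. The mutation patterns of these 746 SNPs do not differ significantly from those of SNPs in other genomic regions, and mutation hotspots also exist within conserved secondary structures. Comparison with the distribution of SNPs in the flanking sequences shows that the SNP density in conserved secondary structures is about 2/3 of that in their flanking sequences. A higher proportion of SNPs in conserved secondary structures than in flanking sequences have low DAF values. These results suggest that many SNPs in conserved secondary structures have been eliminated from modern human populations by purifying selection. The differences in SNP density and DAF between conserved secondary structures and their flanking sequences are larger than the differences between conserved and nonconserved sequences, suggesting that conserved secondary structures are the class of conserved sequences under the most stringent purifying selection. We also found that the strength of purifying selection varies within conserved secondary structures. Stems have a lower SNP density than loops, and a higher proportion of stem SNPs have low DAF values. This result suggests that purifying selection on conserved secondary structures acts mainly on sites in stems. We speculate that this is because mutations in stems tend to affect the secondary structure more than mutations in loops.
We also analyzed the role of conserved secondary structures in transcriptional regulation networks by looking for overlaps between conserved secondary structures and the binding sites of the transcription factors SOX2, OCT4, NANOG, SUZ12, and C-MYC. The results show that many conserved secondary structures serve as transcription factor binding sites and regulate the expression of many genes that encode development-related transcription factors. The binding patterns between transcription factors and conserved secondary structures are complex: several transcription factors can bind the same conserved secondary structure, and a transcription factor can bind a conserved secondary structure associated with its own encoding gene. Binding of different transcription factors to conserved secondary structures can direct specific expression patterns of target genes. When most of the associated conserved secondary structures are bound by SUZ12, gene expression is repressed; when most of them are not bound by SUZ12, gene expression is activated. In the transcriptional regulation network, about 30% of the conserved secondary structures regulate gene expression as promoters. Because SOX2, OCT4, NANOG, SUZ12, and C-MYC bind only a small fraction of conserved secondary structures, more transcription factors probably bind conserved secondary structures. The transcriptional regulation network mediated by conserved secondary structures is therefore much more complex than is currently known.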
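The two comparisons described in the abstract above (SNP density, and the share of SNPs with a low DAF) can be sketched as follows. This is only a minimal illustration, not the thesis's actual pipeline: the function names, the 0.1 cutoff for "low DAF", and all input values are assumptions chosen for the example and are not taken from the thesis.

```python
from typing import Sequence


def snp_density(n_snps: int, region_length_bp: int) -> float:
    """Return SNPs per base pair for a region of the given length."""
    if region_length_bp <= 0:
        raise ValueError("region_length_bp must be positive")
    return n_snps / region_length_bp


def low_daf_fraction(dafs: Sequence[float], threshold: float = 0.1) -> float:
    """Return the fraction of SNPs whose derived allele frequency is below threshold.

    The default threshold of 0.1 is an assumption made for this sketch.
    """
    if not dafs:
        raise ValueError("need at least one SNP")
    return sum(1 for d in dafs if d < threshold) / len(dafs)


# Toy, made-up inputs (not data from the thesis).
structure_dafs = [0.02, 0.05, 0.08, 0.30, 0.60]
flank_dafs = [0.05, 0.20, 0.40, 0.55, 0.90]

# Density in the structures relative to the flanks, for equal-length regions.
density_ratio = snp_density(20, 10_000) / snp_density(30, 10_000)

structure_low = low_daf_fraction(structure_dafs)
flank_low = low_daf_fraction(flank_dafs)

print(f"density ratio (structure / flank): {density_ratio:.2f}")
print(f"low-DAF fraction, structure: {structure_low:.2f}, flank: {flank_low:.2f}")
```

A density ratio below 1 together with a larger low-DAF fraction inside the structures is the pattern the abstract interprets as purifying selection. A real analysis would also need a significance test and a reliable inference of the ancestral allele, neither of which is shown here.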
English abstract: Conserved sequences are genomic sequences shared by a wide spectrum of species, and most of them are non-coding. Conserved sequences play critical roles in genetic diseases. Some conserved sequences contain conserved secondary structures. Some conserved secondary structures have been recognized as RNAs, such as microRNAs, RNA editing sequences, and stem-loops in the 3'-UTR of histone mRNAs. However, for most conserved secondary structures, the biological functions and the evolutionary forces acting upon them remain largely unknown.
SNP data from populations are effective for analyzing the evolutionary forces acting on a sequence. The frequency of a SNP is affected by the evolutionary forces acting on it, and is not determined by whether the SNP is located inside a mutation hotspot. SNPs under purifying selection generally exhibit lower derived allele frequencies (DAFs) than neutral SNPs. Using bioinformatics methods, we identified 746 SNPs located inside conserved secondary structures. The mutation patterns of SNPs in conserved secondary structures do not differ significantly from those in other genomic regions, and mutation hotspots are also present in conserved secondary structures. By comparing the distribution of SNPs in conserved secondary structures with that in their flanking sequences, we found that the SNP density in the former is about 2/3 of that in the latter. Further, a higher fraction of SNPs in conserved secondary structures than in the flanking sequences have low DAFs. These results indicate that many mutations in conserved secondary structures have been removed from human populations by purifying selection. The differences in SNP density and DAF distribution are larger than the corresponding differences observed between conserved and nonconserved sequences, indicating that conserved secondary structures are the class of conserved sequences under the strongest purifying selection. Even inside conserved secondary structures, the intensity of purifying selection is unevenly distributed. Sites on stems have a lower SNP density than sites on loops, and a higher fraction of SNPs on stems than on loops have low DAFs. This result indicates that purifying selection on conserved secondary structures results mainly from purifying selection on sites in stems. We speculate that the difference arises because mutations in stems have a greater impact on the secondary structure than mutations in loops.
We investigated the roles of conserved secondary structures in transcriptional regulation networks by examining their overlaps with the binding sites of the transcription factors SOX2, OCT4, NANOG, SUZ12, and C-MYC. Our results indicate that many conserved secondary structures regulate genes encoding developmental transcription factors by providing binding sites for transcription factors. Transcription factors bind conserved secondary structures in complicated patterns: several transcription factors can bind a common conserved secondary structure, and some transcription factors can bind conserved secondary structures associated with their own encoding genes. The different binding patterns between transcription factors and conserved secondary structures direct specific expression patterns of target genes. Expression of a target gene is repressed when most of the associated intergenic conserved secondary structures are bound by SUZ12, and activated when most of them are devoid of SUZ12 binding. About 30% of the conserved secondary structures function as promoters in the transcriptional regulation networks. Because these transcription factors bind only a small fraction of conserved secondary structures, many other transcription factors may also bind conserved secondary structures. Therefore, the transcriptional regulation networks mediated by conserved secondary structures are likely to be much more complicated than currently appreciated.
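The overlap analysis between conserved secondary structures and transcription factor binding sites mentioned in the abstract can be pictured with a small interval-overlap sketch. It assumes half-open, 0-based coordinates and a brute-force pairwise comparison; the `Interval` type, the function names, and every coordinate below are hypothetical and chosen only for illustration, not taken from the thesis.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def factors_per_structure(
    structures: List[Interval], binding_sites: List[Interval]
) -> Dict[str, Set[str]]:
    """Map each structure name to the set of factors whose sites overlap it.

    Brute-force O(n * m) comparison; fine for a sketch, too slow genome-wide.
    """
    hits: Dict[str, Set[str]] = defaultdict(set)
    for structure in structures:
        for site in binding_sites:
            if overlaps(structure, site):
                hits[structure.name].add(site.name)
    return dict(hits)


# Made-up coordinates for illustration only.
structures = [
    Interval("chr1", 100, 180, "css_1"),
    Interval("chr1", 500, 560, "css_2"),
    Interval("chr2", 100, 180, "css_3"),
]
binding_sites = [
    Interval("chr1", 150, 400, "SOX2"),
    Interval("chr1", 170, 300, "OCT4"),
    Interval("chr2", 180, 250, "SUZ12"),  # starts where css_3 ends: no overlap
]

result = factors_per_structure(structures, binding_sites)
print(result)
```

In this toy input, `css_1` is bound by two factors, which corresponds to the case of several transcription factors sharing one conserved secondary structure. For genome-scale data, a sorted sweep or an interval tree would be a more practical choice than the pairwise loop.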
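Language: Chinese
Date available: 2010-10-14
Content type: Thesis
Source URL: [http://159.226.149.42:8088/handle/152453/6088]
Collection: Kunming Institute of Zoology_Molecular Evolution and Genomics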
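Recommended citation
GB/T 7714
Xie Haibing. Purifying selection on conserved secondary structures in the human genome and their role in transcriptional regulation networks [D]. Beijing. Graduate University of Chinese Academy of Sciences. 2008.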