China Animal Husbandry & Veterinary Medicine ›› 2022, Vol. 49 ›› Issue (4): 1413-1421. doi: 10.16431/j.cnki.1671-7236.2022.04.023

• Genetics and Breeding •

Screening of Specific SNP Sites in Tahe Red Deer Based on Resequencing Technology

DENG Yihua1,2, WANG Tianjiao2, WANG Hongliang2, DONG Yimeng2, LIU Xin1, XING Xiumei2

  1. College of Wildlife Resources, Northeast Forestry University, Harbin 150006, China;
  2. Key Laboratory of Molecular Biology of Special Economic Animals, Institute of Special Products, Chinese Academy of Agricultural Sciences, Changchun 130112, China
  • Received: 2021-09-30 Published: 2022-04-05 Online: 2022-03-25
  • Corresponding authors: LIU Xin, XING Xiumei E-mail: liuxin7415@163.com; xingxiumei2004@126.com
  • Funding:
    Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2021-ISAPS); Financial Science and Technology Program of the Xinjiang Production and Construction Corps (2021BC005)

Abstract: 【Objective】 To screen high-quality single nucleotide polymorphisms (SNPs) in Tahe red deer and construct specific molecular genetic markers, providing a reference for purebred identification of Tahe red deer. 【Method】 Whole-genome resequencing was performed on 32 Tahe red deer blood DNA samples collected and curated in 1999 by the National Special Economic Animal Resource Sharing Platform. Reads were aligned to the chromosome-level reference genome of sika deer, and the alignment rate, coverage and sequencing depth were calculated. SnpEff was used to summarize the distribution of SNPs on each chromosome, SNPhylo was used to construct a molecular phylogenetic tree from the SNP sites, and VCFtools was used to calculate the genetic differentiation index (Fst). Sites were ranked by Fst in descending order, and sites that did not meet the threshold Fst≥0.25 were discarded. Principal component analysis (PCA) was performed in R: the top 1 500 SNPs on each of the 33 chromosomes were pooled, the eigenvalues of the top 100 SNP set were extracted, and PCA was run on each set separately. 【Result】 Whole-genome resequencing of the 32 quality-checked Tahe red deer blood genomic DNA samples yielded 868 791 354 600 bp of effective data, and the sequencing quality met the requirements of downstream analysis. Across the 32 samples, the average alignment rate was 98.06%, the average coverage was 97.66%, and the average depth was 6.787×. After filtering, 20 139 122 high-quality SNPs were obtained; chromosome 4 carried the most SNPs, and more than 90% of the variants were located in intergenic and exonic regions. After tree construction, the number of candidate specific SNPs was 12 050 781. Screening with VCFtools yielded 544 717 SNPs. The top 1 500 SNPs on each chromosome were then selected for PCA, which showed no loss of discriminating power when the number of SNPs was reduced from 49 500 to 100. Finally, 100 SNPs with strong specificity and high stability were screened out. 【Conclusion】 A total of 100 Tahe red deer-specific SNPs were obtained through whole-genome resequencing and bioinformatics analysis, providing a theoretical basis for the identification of purebred Tahe red deer and the screening of core germplasm.
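
The Fst screening step described in the abstract (threshold Fst≥0.25, descending ranking, top 1 500 sites per chromosome) can be sketched roughly as follows. This is only an illustrative Python sketch, not the authors' pipeline, which relied on VCFtools and R. The function name `select_top_sites`, the `(chrom, pos, fst)` tuple layout, and the toy values are all assumptions made for the example.

```python
import math
from collections import defaultdict

FST_THRESHOLD = 0.25   # threshold stated in the abstract
TOP_PER_CHROM = 1500   # per-chromosome cutoff stated in the abstract


def select_top_sites(records, threshold=FST_THRESHOLD, top_n=TOP_PER_CHROM):
    """Keep sites with Fst >= threshold, then the top_n highest-Fst sites per chromosome.

    `records` is an iterable of (chrom, pos, fst) tuples. This layout is a
    hypothetical one chosen for illustration, not the authors' data structure.
    """
    by_chrom = defaultdict(list)
    for chrom, pos, fst in records:
        # Skip undefined values and sites below the threshold.
        if math.isnan(fst) or fst < threshold:
            continue
        by_chrom[chrom].append((chrom, pos, fst))

    selected = {}
    for chrom, sites in by_chrom.items():
        # Descending Fst order; ties are broken by position for determinism.
        sites.sort(key=lambda s: (-s[2], s[1]))
        selected[chrom] = sites[:top_n]
    return selected


# Toy input with made-up values, for illustration only.
toy = [
    ("chr1", 100, 0.90),
    ("chr1", 200, 0.10),
    ("chr1", 300, 0.40),
    ("chr1", 400, float("nan")),
    ("chr2", 150, 0.25),
    ("chr2", 250, 0.24),
]
result = select_top_sites(toy, top_n=1)
print(result)

# Size of the pooled set when each of 33 chromosomes contributes 1 500 sites.
print(33 * TOP_PER_CHROM)
```

The last line reproduces the 49 500 figure quoted in the abstract (33 chromosomes × 1 500 sites). How the final 100 SNPs were chosen from that pool is not detailed in the abstract, so it is left out of the sketch.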

Key words: Tahe red deer; resequencing technology; SNPs; screening
