China Animal Husbandry & Veterinary Medicine ›› 2021, Vol. 48 ›› Issue (5): 1664-1671. doi: 10.16431/j.cnki.1671-7236.2021.05.017

• Genetics and Breeding •

Comparison of Software (Minimac 3 and Beagle 5.1) for Genomic Imputation Using Holstein Cow Population

LUO Hanpeng1, DOU Jinhuan1, AN Tao1, CHEN Shaokan2, WANG Yachun1

  1. College of Animal Science and Technology, China Agricultural University, Beijing 100193, China;
  2. Beijing Sunlon Livestock Development Co., Ltd., Beijing 100029, China
  • Received: 2020-11-06 Online: 2021-05-20 Published: 2021-05-20
  • Corresponding author: WANG Yachun E-mail: wangyachun@cau.edu.cn
  • About the author: LUO Hanpeng (1993-), male, from Nanchang, Jiangxi; PhD candidate; research interest: molecular quantitative genetics; E-mail: hanpengluo@qq.com
  • Funding:
    Earmarked Fund for the China Agriculture Research System (Dairy Cattle) (CARS-36); Program for Changjiang Scholars and Innovative Research Team in University (IRT_15R62); Special Project for Agricultural Variety Improvement (2130135)

Abstract: This study aimed to demonstrate the genomic imputation workflow and to investigate the factors affecting imputation accuracy. Two widely used imputation programs, Beagle 5.1 and Minimac 3, were used to impute dairy cattle 50K SNP chip data to 150K. Concordance was calculated by comparing each individual's imputed genotypes with its true genotypes, and the two programs were compared for imputation accuracy, concordance, and the main factors affecting them. Minimac 3 required the target population to be phased with other software before imputation, whereas Beagle 5.1 performed phasing and imputation together. The correlation between the concordance of Beagle 5.1 and that of Minimac 3 was 0.98. For Beagle 5.1, the mean imputation accuracy (r²) and concordance were 0.9841 and 0.9914, respectively, and the correlation between accuracy and concordance was 0.39. For Minimac 3, the mean imputation accuracy (r²) and concordance were 0.9782 and 0.9911, respectively, and the correlation between them was 0.36. Because of the way the software calculates accuracy, r² was strongly affected by minor allele frequency. Concordance decreased as minor allele frequency and marker heterozygosity increased, and dropped sharply (to below 0.8) when heterozygosity exceeded 0.6. Nevertheless, at the same minor allele frequency and heterozygosity of the imputed sites, Beagle 5.1 outperformed Minimac 3. In conclusion, imputation accuracy (r²) was strongly affected by the heterozygosity of the imputed SNPs, and Beagle 5.1 imputed genomic data more accurately than Minimac 3. Using concordance as the criterion for imputation accuracy when selecting SNPs after imputation can avoid discarding useful markers needed for further study.

Key words: Holstein cow; population; genomic imputation; accuracy
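The per-SNP quantities named in the abstract (concordance, minor allele frequency, heterozygosity) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the 0/1/2 genotype coding, the function names, and the toy data are hypothetical, and the empirical squared correlation is only a stand-in for the r² reported by the software, not the formula used by Beagle 5.1 or Minimac 3. It shows how one miscalled genotype can leave concordance unchanged while lowering the squared correlation much more at a rare SNP than at a common one.

```python
# Genotypes are coded as counts of the alternate allele: 0, 1 or 2 (assumed coding).

def allele_stats(genotypes):
    """Return (minor allele frequency, observed heterozygosity) for one SNP."""
    n = len(genotypes)
    alt_freq = sum(genotypes) / (2 * n)
    maf = min(alt_freq, 1 - alt_freq)
    het = sum(1 for g in genotypes if g == 1) / n
    return maf, het

def concordance(true_gt, imputed_gt):
    """Fraction of individuals whose imputed genotype equals the true genotype."""
    matches = sum(1 for t, i in zip(true_gt, imputed_gt) if t == i)
    return matches / len(true_gt)

def squared_correlation(true_gt, imputed_gt):
    """Squared Pearson correlation; None if either genotype vector is constant."""
    n = len(true_gt)
    mean_t = sum(true_gt) / n
    mean_i = sum(imputed_gt) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_gt, imputed_gt))
    var_t = sum((t - mean_t) ** 2 for t in true_gt)
    var_i = sum((i - mean_i) ** 2 for i in imputed_gt)
    if var_t == 0 or var_i == 0:
        return None
    return cov * cov / (var_t * var_i)

# Toy data: 20 individuals per SNP, one heterozygote miscalled as 0 in each case.
rare_true = [0] * 18 + [1, 1]
rare_imputed = [0] * 19 + [1]
common_true = [0] * 5 + [1] * 10 + [2] * 5
common_imputed = [0] * 6 + [1] * 9 + [2] * 5

for name, t, i in [("rare", rare_true, rare_imputed),
                   ("common", common_true, common_imputed)]:
    maf, het = allele_stats(t)
    print(name, "MAF:", maf, "het:", het,
          "concordance:", concordance(t, i),
          "r2:", round(squared_correlation(t, i), 4))
```

In this toy case both SNPs have a concordance of 0.95, yet the squared correlation is about 0.47 for the rare SNP and about 0.91 for the common one, which is consistent with the abstract's point that filtering on r² alone may discard sites that were mostly imputed correctly.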
