中国畜牧兽医 (China Animal Husbandry & Veterinary Medicine) ›› 2022, Vol. 49 ›› Issue (7): 2534-2546. doi: 10.16431/j.cnki.1671-7236.2022.07.011

• Nutrition and Feed •

基于机器学习算法的奶牛疾病预测模型的研究

李尚汝, 宋佳美, 张城瑞, 孙雨坤, 张永根   

  1. 东北农业大学动物科学技术学院, 哈尔滨 150030
  • 收稿日期:2021-11-20 出版日期:2022-07-05 发布日期:2022-06-29
  • Corresponding author: ZHANG Yonggen, E-mail: zhangyonggen@sina.com
  • About the author: LI Shangru, E-mail: lisr1010@163.com
  • Funding:
    China Agriculture Research System (国家现代农业产业技术体系)

Study on Dairy Cow Disease Prediction Model Based on Machine Learning Algorithm

LI Shangru, SONG Jiamei, ZHANG Chengrui, SUN Yukun, ZHANG Yonggen   

  1. College of Animal Science and Technology, Northeast Agricultural University, Harbin 150030, China
  • Received: 2021-11-20 Online: 2022-07-05 Published: 2022-06-29

摘要: 【目的】评估建立奶牛疾病预测模型的6种机器学习(machine learning,ML)算法的性能及预测变量的重要性。【方法】选取2020年12月至2021年11月,共计944头泌乳牛的生产信息、行为信息作为预测因子,疾病信息作为输出变量,训练并验证模型。将日产奶量、反刍量、活动量、胎次和泌乳天数作为输入变量,利用ML算法建立奶牛疾病的预测模型,评估决策树(Decision Tree,DT) C5.0、CHAID算法、人工神经网络(Artificial Neural Network,ANN)、随机森林(Random Forests,RF)、贝叶斯网络(Bayesian Networks,BN)和逻辑回归(Logistic Regression,LR)6种ML算法的性能,评估预测变量的重要性,以及将胎次和泌乳天数纳入预测变量后模型性能的改善情况。采用敏感性和特异性评估模型性能,按照权重排序评估输入变量对模型预测的重要性。【结果】DT C5.0算法敏感性>85%,特异性>90%,为性能最佳的模型;RF总敏感性为56.8%,对各类牛预测的性能较稳定;ANN、BN、DT CHAID则对样本量较多的疾病预测性能较好,可达74.4%;LR对病牛正确识别率不足40.0%,大多识别为健康牛。产奶量为RF、ANN、LR最重要的预测变量,泌乳天数为DT C5.0、CHAID和BN最重要的预测变量;纳入胎次和泌乳天数后,模型预测的敏感性平均提高9.8%。【结论】ML算法在对奶牛疾病的预测方面表现出很大潜力,其中,DT C5.0更适合用于预测奶牛疾病。产奶量和泌乳天数为疾病预测模型中相对重要的变量,此外,将胎次和泌乳天数纳入预测变量,可提高模型的预测精度。

关键词: 奶牛; 机器学习; 疾病预测

Abstract: 【Objective】 This study evaluated the performance of six machine learning (ML) algorithms used to build dairy cow disease prediction models, as well as the importance of the predictors. 【Method】 Production and behavior records of 944 lactating cows, collected from December 2020 to November 2021, were used as predictors, and disease records were used as the output variable, to train and validate the models. Daily milk production, rumination, activity, parity, and lactation days were the input variables. Six ML algorithms were evaluated: Decision Tree (DT) C5.0, DT CHAID, Artificial Neural Network (ANN), Random Forests (RF), Bayesian Networks (BN), and Logistic Regression (LR). The importance of the predictors, and the improvement in model performance after including parity and lactation days as predictors, were also assessed. Model performance was evaluated by sensitivity and specificity, and the importance of each input variable was assessed by its weight ranking. 【Result】 DT C5.0 performed best, with sensitivity above 85% and specificity above 90%. RF had an overall sensitivity of 56.8%, and its performance was relatively stable across the classes of cows. ANN, BN, and DT CHAID predicted diseases with larger sample sizes better, reaching up to 74.4%. LR correctly identified fewer than 40.0% of sick cows and classified most of them as healthy. Milk production was the most important predictor for RF, ANN, and LR, and lactation days was the most important predictor for DT C5.0, DT CHAID, and BN. After parity and lactation days were added, model sensitivity improved by 9.8% on average. 【Conclusion】 ML algorithms showed great potential for predicting dairy cow diseases, and DT C5.0 was the most suitable of the algorithms tested. Milk production and lactation days were relatively important variables in the disease prediction models, and including parity and lactation days as predictors improved prediction accuracy.

Key words: dairy cow; machine learning; disease prediction
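
The abstract reports model performance as sensitivity and specificity, both overall and per class of cow. As a minimal sketch of how these two measures are computed from true and predicted labels, the Python below may help. It is not the authors' code. The function names, the class labels ("healthy", "disease_A", "disease_B"), and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def sensitivity_specificity(y_true, y_pred, negative="healthy"):
    """Binary view: any label other than `negative` counts as sick.

    Sensitivity = TP / (TP + FN): share of sick cows flagged as sick.
    Specificity = TN / (TN + FP): share of healthy cows flagged as healthy.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        truth_sick = truth != negative
        pred_sick = pred != negative
        if truth_sick and pred_sick:
            tp += 1
        elif truth_sick and not pred_sick:
            fn += 1
        elif not truth_sick and pred_sick:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def per_class_sensitivity(y_true, y_pred):
    """Per-class recall: correctly predicted cows of a class / all cows of that class."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


# Hypothetical toy labels (not data from the study).
y_true = ["disease_A", "disease_A", "disease_A", "disease_B",
          "healthy", "healthy", "healthy", "healthy", "healthy", "healthy"]
y_pred = ["disease_A", "disease_A", "healthy", "disease_B",
          "healthy", "healthy", "healthy", "healthy", "healthy", "disease_A"]

sens, spec = sensitivity_specificity(y_true, y_pred)
by_class = per_class_sensitivity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
print(by_class)
```

In this toy example, 3 of the 4 sick cows are flagged as sick and 5 of the 6 healthy cows are classified as healthy. A model such as the LR model described above, which labels most sick cows as healthy, would show low sensitivity alongside high specificity under this calculation.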

CLC number: