Abstract
Objective To identify risk factors for coronary heart disease in patients with type 2 diabetes mellitus and to establish a risk classification model, so as to provide a useful reference for clinical auxiliary diagnosis.
Methods Through the big data platform of Chongqing Medical University, 944 patients with type 2 diabetes mellitus who underwent coronary angiography and were discharged between January 1, 2014 and December 31, 2019 were collected. According to the angiography results, they were divided into 715 patients with type 2 diabetes mellitus and coronary heart disease (T2DM-CAD group) and 229 patients with type 2 diabetes mellitus without coronary heart disease (T2DM group). Propensity score matching (PSM) was used to balance confounding factors between the groups; after matching, there were 389 patients in the T2DM-CAD group and 221 in the T2DM group. Univariate analysis and logistic regression were used to screen independent risk factors for coronary heart disease. The Bayesian optimization (BO) algorithm was used to tune support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGB) and logistic regression models, and the classification performance of the four models was compared.
Results Thirty-five indicators with less than 30% missing values were collected, and univariate analysis identified 20 indicators with statistically significant differences. Forward stepwise logistic regression selected 11 risk factors: heart rate, smoking, diabetic nephropathy, serum creatinine, triglycerides, lipoprotein(a), albumin, total bilirubin, aspartate aminotransferase, glycated hemoglobin and urine glucose. Among the classification models built on these risk factors, the optimized RF model performed best in both 5-fold cross-validation (F1 = 0.711, AUC = 0.811) and the validation set (F1 = 0.752, AUC = 0.810).
Conclusion A parameter-optimized RF model with good performance was established; it can be used to determine whether patients with type 2 diabetes mellitus also have coronary heart disease.
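The F1 and AUC values reported in the abstract can be illustrated with a minimal sketch of how such metrics are typically computed. This is not the authors' code: the function names (`f1_score`, `roc_auc`, `k_fold_indices`) and the toy labels and scores are hypothetical, chosen only for illustration, and the fold splitter assumes the samples were shuffled beforehand.

```python
from typing import Iterator, List, Sequence, Tuple


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) for k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


if __name__ == "__main__":
    # Toy data (invented): 1 = coronary heart disease present, 0 = absent.
    y_true = [1, 1, 1, 0, 0, 1, 0, 0]
    y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.7, 0.2, 0.5]
    y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 decision threshold
    print(f"F1 = {f1_score(y_true, y_pred):.3f}, AUC = {roc_auc(y_true, y_score):.3f}")
```

On the toy data this gives F1 = 0.667 and AUC = 0.875. In a 5-fold setting, a model would be fitted on each `train_idx` split and the two metrics averaged over the corresponding `test_idx` splits.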
Key words
machine learning /
type 2 diabetes mellitus (T2DM) /
coronary heart disease /
diagnosis
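Funding
Chongqing Technology Innovation and Application Development Special Project, General Program (cstc2019jscx-msxmX0262); Chongqing Medical University Intelligent Medicine Project (ZHYX2019013, YJSZHYX202017)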