Document Type

Article

Publication Date

1-1-2018

Publication Title

PLoS One

Abstract

This study evaluates and compares the performance of different machine learning techniques in predicting which individuals are at risk of developing hypertension, and who are likely to benefit most from interventions, using cardiorespiratory fitness data. The dataset contains information on 23,095 patients who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 10-year follow-up. The variables include information on vital signs, diagnosis, and clinical laboratory measurements. Six machine learning techniques were investigated: LogitBoost (LB), Bayesian Network classifier (BN), Locally Weighted Naive Bayes (LWB), Artificial Neural Network (ANN), Support Vector Machine (SVM), and Random Tree Forest (RTF). Using different validation methods, the RTF model showed the best performance (AUC = 0.93) and outperformed all other machine learning techniques examined in this study. The results also show that it is critical to explore and evaluate the performance of machine learning models using several model evaluation methods, because the measured prediction accuracy can differ significantly between them.
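
The point that measured accuracy depends on the evaluation method can be illustrated with a minimal sketch. It is not the study's pipeline: the data are synthetic, the nearest-centroid scorer is a toy stand-in rather than one of the six techniques named above, and all function names (`auc`, `k_fold_indices`, `fit_centroids`, `risk_score`, `cross_validated_auc`) are hypothetical. It computes AUC as a rank statistic and compares the AUC measured on the training data itself with the mean AUC over held-out folds.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney rank statistic.

    Counts the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Toy stand-in model (assumption): mean feature vector of each class."""
    model = {}
    for cls in (0, 1):
        rows = [x for x, label in zip(X, y) if label == cls]
        model[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return model


def risk_score(model, x):
    """Higher when x lies closer to the positive-class centroid."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, model[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(x, model[1]))
    return d0 - d1


def cross_validated_auc(X, y, k=10, seed=0):
    """Mean AUC over k folds, each scored by a model that never saw it."""
    results = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        scores = [risk_score(model, X[i]) for i in held_out]
        results.append(auc([y[i] for i in held_out], scores))
    return sum(results) / len(results)


if __name__ == "__main__":
    # Synthetic two-feature data (assumption): NOT the study's patient records.
    rng = random.Random(42)
    y = [i % 2 for i in range(200)]
    X = [[rng.gauss(label, 1.0), rng.gauss(0.5 * label, 1.0)] for label in y]

    model = fit_centroids(X, y)
    train_auc = auc(y, [risk_score(model, x) for x in X])
    print(f"AUC on training data: {train_auc:.3f}")
    print(f"10-fold CV AUC:       {cross_validated_auc(X, y, k=10):.3f}")
```

Scoring a model on the same records it was fitted to tends to give an optimistic estimate, whereas held-out folds approximate performance on unseen patients; the two numbers printed by the script are two such estimates for the same toy model.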

Comments

© authors, Creative Commons Attribution License 4.0

Medical Subject Headings

Adolescent; Adult; Aged; Aged, 80 and over; Area Under Curve; Bayes Theorem; Cardiorespiratory Fitness; Databases, Factual; Exercise Test; Female; Humans; Hypertension; Machine Learning; Male; Middle Aged; Neural Networks (Computer); Support Vector Machine; Young Adult

PubMed ID

29668729

Volume

13

Issue

4

First Page

0195344

Last Page

0195344
