ECG Beat classification: Impact of linear dependent samples
Title language:
English
Original abstract:
The electrocardiogram (ECG) is a very valuable clinical tool for assessing the electrical function of the heart. It provides insight into the different phases of the heartbeat and the various kinds of disorders that may affect them. In the literature, the impact of linear dependency between feature signals on the classification outcome, and how to reduce it, has been extensively investigated and discussed. This study focuses on linear dependency between samples of imbalanced data sets, its relation to the observed overfitting with respect to majority classes, and how to reduce it. A set of 58 feature signals is used to train several LDA classifiers discriminating either 3 classes (normal, artefact, arrhythmic) or 5 classes (normal, artefact, atrial and ventricular premature contractions, and bundle branch blocks). The training data set is preprocessed using four sample reduction approaches and a nearest neighbour clustering method. For 5 classes, accuracies of 96.82 % were obtained on the imbalanced data and 97.44 % on the data preprocessed with the QR or SVD methods. For 3 classes, accuracies of 97.68 % and 98.12 % were achieved, respectively. With the nearest neighbour clustering method, only accuracies of 96.00 % for 5 classes and 97.37 % for 3 classes could be achieved. The results clearly show that imbalanced ECG data does contain linearly dependent samples. These cause a bias towards the majority class, which is then overfitted by the classifier. Sample reduction methods and algorithms that are not aware of the presence of linearly dependent samples, such as the nearest neighbour clustering approach, increase this bias even further or, even worse, destroy relevant information by merging samples that encode distinct aspects of the beat class.
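The abstract does not state how the QR or SVD methods select samples. As a minimal sketch, assuming that "linearly dependent samples" means rows of a class's feature matrix that are linear combinations of other rows of the same class, the Python/NumPy code below greedily keeps only the samples that enlarge the span of those already kept (a Gram-Schmidt process, closely related to a QR factorization of the transposed class matrix) and cross-checks the class rank with NumPy's SVD-based `np.linalg.matrix_rank`. The helper names `independent_sample_indices` and `reduce_per_class`, the tolerance `rtol`, and the 5-feature toy data are hypothetical illustrations, not the study's procedure or its 58 feature signals.

```python
import numpy as np


def independent_sample_indices(X, rtol=1e-10):
    """Hypothetical helper: indices of rows of X that are not linear
    combinations of earlier kept rows (modified Gram-Schmidt)."""
    X = np.asarray(X, dtype=float)
    basis = []  # orthonormal vectors spanning the kept samples
    kept = []
    for i, x in enumerate(X):
        r = x.copy()
        for q in basis:
            r -= (q @ r) * q  # remove the component along each basis vector
        norm_x = np.linalg.norm(x)
        # Keep the sample only if a non-negligible residual remains,
        # i.e. it is not (numerically) in the span of the kept samples.
        if norm_x > 0 and np.linalg.norm(r) > rtol * norm_x:
            basis.append(r / np.linalg.norm(r))
            kept.append(i)
    return kept


def reduce_per_class(X, y, rtol=1e-10):
    """Hypothetical helper: drop linearly dependent samples within each class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    keep = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        keep.extend(idx[independent_sample_indices(X[idx], rtol)])
    keep = np.sort(np.array(keep, dtype=int))
    return X[keep], y[keep]


# Toy data (assumption): 3 independent "majority" samples plus 2 dependent ones.
base = np.array([[1.0, 0.0, 2.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0, 3.0, 0.0],
                 [1.0, 1.0, 0.0, 0.0, 2.0]])
majority = np.vstack([base, base[0] + base[1], 2.0 * base[2]])
minority = np.array([[0.0, 0.0, 1.0, 1.0, 0.0],
                     [3.0, 0.0, 0.0, 0.0, 1.0]])
X = np.vstack([majority, minority])
y = np.array([0] * 5 + [1] * 2)

X_red, y_red = reduce_per_class(X, y)
# SVD-based rank of the majority class vs. class counts before/after reduction.
print(np.linalg.matrix_rank(majority, tol=1e-8), np.bincount(y), np.bincount(y_red))
# prints: 3 [5 2] [3 2]
```

Because the reduction runs per class, the majority class loses its two dependent copies while the minority class keeps both samples; the reduced set would then be used to train the classifier (for example an LDA). The tolerance `rtol` decides whether nearly dependent samples count as dependent, and with d features no class can keep more than d samples under this exact-dependency criterion.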