Zeki sınıflandırma ve kümeleme yöntemlerinin tıbbi tanı ve tedavide kullanımı = The usage of intelligent classification and clustering methods in medical diagnosis and treatment

Kocamaz, Uğur Erkin

URI: https://hdl.handle.net/20.500.12619/102652

Date: 2024

Abstract:

In this doctoral thesis, machine-learning-based classification and clustering algorithms are used to support physicians' decisions in the diagnosis and treatment of diseases. First, the role and importance of machine learning in medical diagnosis and treatment are described. The intelligent algorithms used in the classification and clustering stages are explained, and related scientific studies are summarized. Decision support systems, medical decision support systems, and the features and effects of medical decision support systems are examined. Next, the classification algorithms Learning Vector Quantization Neural Networks (LVQNN), Probabilistic Neural Networks (PNN) and Pattern Recognition Neural Networks (PRNN) are applied to the decision between vaginal delivery and cesarean section, the latter being one of the most frequently performed surgical procedures in the world. These three methods are compared and their results are evaluated in terms of success. Cancer is one of the most common fatal diseases today. To determine whether tumors are benign or malignant, the data set is first divided into clusters with the clustering neural network algorithm Self-Organizing Maps (SOM), and then the classification neural networks LVQNN, PNN and PRNN are applied. In addition, the data set is divided into clusters with Fuzzy C-Means (FCM), another important artificial intelligence clustering algorithm, after which LVQNN, PNN and PRNN are used. A further model, in which the SOM and FCM values are added as inputs, is also applied. Thanks to these clustering algorithms, the comparatively lower performance of cancer classification has improved. Diabetes and thyroid diseases are among today's most common chronic diseases and can damage many organs. Their early detection is important. The LVQNN, PNN and PRNN methods are also used for this task, and highly successful results are obtained. This research aims to improve prediction accuracy by identifying the most suitable method among different machine learning classification methods. The study demonstrates the effectiveness of the LVQNN, PNN and PRNN algorithms in classification for medical diagnosis and treatment. Each is found to be more successful for a different disease. In addition, the study shows that incorporating the SOM and FCM clustering algorithms into the classification process makes a positive contribution.

In this thesis, machine-learning-based classification and clustering algorithms are used to support doctors' decisions in the diagnosis and treatment of diseases. First, the role and importance of machine learning techniques in medical diagnosis and treatment are described. The intelligent algorithms used for classification and clustering are explained, and scientific studies conducted in this area are summarized. Decision support systems, medical decision support systems, and their features and effects are explained. Then, three neural classification algorithms, namely Learning Vector Quantization Neural Networks (LVQNN), Probabilistic Neural Networks (PNN) and Pattern Recognition Neural Networks (PRNN), are used to decide between vaginal delivery and cesarean section, the latter being one of the most commonly performed surgical operations in the world. The results of these artificial neural network algorithms are compared and evaluated in terms of success. Cancer is one of the most common fatal diseases today. To determine whether tumors are benign or malignant, the data set is first divided into clusters with the clustering neural network algorithm Self-Organizing Maps (SOM), and then the classification neural networks LVQNN, PNN and PRNN are applied. In addition, the data set is divided into clusters with Fuzzy C-Means (FCM), another significant intelligent clustering algorithm, and LVQNN, PNN and PRNN are applied again. A further model is applied in which the SOM and FCM values are added as inputs to the classification neural networks. Thanks to these clustering algorithms, the accuracy of cancer classification has increased. Diabetes and thyroid diseases are among today's most common chronic diseases and can damage many organs. Their early detection is important. The LVQNN, PNN and PRNN algorithms are applied to these diseases, and very successful results are obtained.
During pregnancy and childbirth, the baby and mother may face many risks, and the delivery method should be decided after regular monitoring. Cesarean section is preferred in cases where vaginal delivery poses a risk of morbidity or mortality for the baby or mother. Cesarean section is one of the most commonly performed surgical operations worldwide. An early and correct cesarean section decision helps reduce the problems that may occur for the mother and the baby. The appropriate delivery method can be determined by using intelligent methods. For this purpose, three classification neural network techniques are used in this thesis, namely LVQNN, PNN and PRNN. Each neural network is first trained and then tested with the Tabriz Health Center data set, which includes the childbirth results of 80 pregnant women and information about their age, delivery number, delivery time, blood pressure and heart status. The neural networks classify the data into two categories: vaginal delivery or cesarean section. Figures and tables show that all of the neural networks classify the delivery method successfully. The LVQNN model classifies the delivery method with only 2 errors in the 20 test samples and 10 errors in the 60 training samples. PNN and PRNN produce identical results: a total of 66 of the 80 cases (82.5%) are classified correctly, with 3 errors in the test set and 11 errors in the training set. The LVQNN model therefore gives better accuracy than the PNN and PRNN models. LVQNN reaches accuracy rates of 83.33% and 90% in the training and test phases, respectively, whereas PNN and PRNN achieve 81.67% and 85% for the training and test data, respectively.
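The accuracy rates quoted for the delivery-method classifiers follow directly from the reported error counts. The sketch below recomputes them; the function name `accuracy` and the dictionary layout are illustrative choices, not taken from the thesis.

```python
def accuracy(n_samples, n_errors):
    """Percentage of correctly classified samples."""
    return 100.0 * (n_samples - n_errors) / n_samples

# (training size, training errors, test size, test errors) as reported in the abstract
reported = {
    "LVQNN": (60, 10, 20, 2),
    "PNN/PRNN": (60, 11, 20, 3),
}

for name, (n_train, e_train, n_test, e_test) in reported.items():
    overall = accuracy(n_train + n_test, e_train + e_test)
    print(
        f"{name}: train {accuracy(n_train, e_train):.2f}%, "
        f"test {accuracy(n_test, e_test):.2f}%, overall {overall:.2f}%"
    )
```

For PNN and PRNN the overall figure reproduces the 66 of 80 (82.5%) stated in the text.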
Cancer, one of the major diseases of our age, is the uncontrolled and continuous proliferation of certain cells in the body due to DNA damage caused by environmental and genetic factors. After cardiovascular diseases, cancer is the second leading cause of death. In recent years, cancer screening has made early diagnosis possible. Thanks to advances in surgical and medical treatments, cancer has become a more treatable disease and survival rates have increased. In this thesis, a prediction system based on artificial neural networks is applied to the diagnosis of breast cancer, the most common type of cancer in women. The "Mammography Mass Breast Cancer" dataset from the UCI Machine Learning Repository is used, which consists of patient records of the Erlangen–Nuremberg University Hospital. Several neural network models are proposed, trained and tested. Of the 961 samples, 2/3 (641) are used for training and the remaining 320 for testing. Of the training samples, 355 are benign and 286 are malignant; of the test samples, 161 are benign and 159 are malignant. In Model-I, the tumor is classified as benign or malignant with LVQNN, PNN and PRNN. The PNN with a spread factor of 0.17 produces the best results, with an accuracy of 525/641 = 81.9% for the training data and 274/320 = 85.63% for the test data. In Model-II and Model-III, the breast cancer data set is first divided into 2 clusters (A and B) with SOM and FCM, respectively, since the Davies–Bouldin and Silhouette methods indicate that the optimum number of clusters is 2; each cluster is then trained and tested with LVQNN, PNN and PRNN. In Model-II, PRNN and LVQNN produce the most successful results for clusters A and B, respectively.
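To illustrate how a PNN with a spread parameter assigns a class, here is a minimal pure-Python sketch of the Parzen-window decision rule. It is not the thesis implementation: the function name `pnn_predict` and the toy two-feature data are hypothetical, the spread is treated as a Gaussian standard deviation (neural network toolboxes may scale it differently), and kernel responses are averaged per class, which is one common variant.

```python
import math

def pnn_predict(train_x, train_y, x, spread=0.17):
    """Return the class whose training samples give the highest mean Gaussian kernel response at x."""
    sums, counts = {}, {}
    for xi, yi in zip(train_x, train_y):
        # Squared Euclidean distance between the query and one stored pattern
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        # Assumption: spread acts as the Gaussian sigma
        response = math.exp(-d2 / (2.0 * spread ** 2))
        sums[yi] = sums.get(yi, 0.0) + response
        counts[yi] = counts.get(yi, 0) + 1
    scores = {label: sums[label] / counts[label] for label in sums}
    return max(scores, key=scores.get)

# Hypothetical toy data with features scaled to [0, 1]
train_x = [(0.10, 0.20), (0.15, 0.25), (0.80, 0.90), (0.85, 0.80)]
train_y = ["benign", "benign", "malignant", "malignant"]

print(pnn_predict(train_x, train_y, (0.12, 0.22)))
print(pnn_predict(train_x, train_y, (0.82, 0.88)))
```

A smaller spread makes the decision depend almost entirely on the nearest stored patterns, while a larger spread smooths the class boundaries, which is why the spread value is tuned per data set.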
For cluster A, the best results are obtained with a PRNN that has 46 neurons in its hidden layer and is trained with the Levenberg–Marquardt learning algorithm. For cluster B, the best results are obtained with an LVQNN that has 34 neurons in its hidden layer and is trained with the learnlv2 learning function. The total accuracy of these two networks is 526/641 = 82.06% for the training data and 279/320 = 87.19% for the test data. In Model-III, where FCM is used, PRNN and LVQNN again produce the most successful results for clusters A and B, respectively. The total accuracy of these networks is 543/641 = 84.71% for the training data and 279/320 = 87.19% for the test data. In Model-IV, where the SOM and FCM values are added as inputs to the classification neural networks, LVQNN, PNN and PRNN all reach the same test accuracy of 275/320 = 85.94%. Model-IV-C of LVQNN (both SOM and FCM values as inputs) gives the best training result, with 523/641 = 81.59% accuracy. To conclude, the proposed Model-III is the best of the applied models for classifying tumors as benign or malignant.
Diabetes is a serious chronic disease whose prevalence is increasing all over the world. It can occur at any age, although it is more common in adults, and its complications in different systems can cause loss of organs and functions and even death. Approximately 400 million people in the world are known to have diabetes, this number is increasing rapidly every year, and millions more are estimated to be at risk of developing the disease. By identifying people in the diabetes risk group and starting treatment early, damage to organs can be reduced. In this thesis, the Sylhet, Bangladesh early diabetes risk dataset from the UCI Machine Learning Repository is used. This data set contains a total of 17 attributes of 520 patients.
Of these, 320 have diabetes and 200 do not. 2/3 of the data is used for training and the remaining 1/3 for testing. Early diabetes risk is classified with LVQNN, PNN and PRNN. The PRNN, which has 49 neurons in its hidden layer and is trained with the Levenberg–Marquardt learning algorithm, produces the most successful test results. Its accuracy is 344/347 = 99.14% for the training data and 169/173 = 97.69% for the test data. LVQNN and PNN classify the test data with accuracy rates of 167/173 = 96.53% and 166/173 = 95.95%, respectively.
Thyroid hormones secreted from the thyroid gland regulate the metabolic rate. They affect heart rate, blood pressure, blood lipids, appetite, the digestive system, the musculoskeletal system and the nervous system. Thyroid hormone should be at a normal level for a healthy life; it is undesirable for the thyroid gland to be overactive or underactive. Thyroid dysfunction negatively affects the functioning of many tissues and systems and, if left untreated, may cause serious health problems. In this thesis, the thyroid function dataset from the Garavan Institute, Sydney, Australia, consisting of 22 features of 7200 people, is used. It is obtained from the UCI Machine Learning Repository. In this data set, 166 samples belong to the normal class, 368 to the hyperthyroidism class and 6666 to the hypothyroidism class. 2/3 of the data is used for training and the remaining 1/3 for testing. Thyroid function is classified with LVQNN, PNN and PRNN. The PRNN, which has 3 neurons in its hidden layer and is trained with the Bayesian learning algorithm, produces the most successful test results. Its accuracy is 4780/4800 = 99.58% for the training data and 2377/2400 = 99.04% for the test data.
LVQNN and PNN classify the test data with accuracy rates of 2228/2400 = 92.83% and 2203/2400 = 91.79%, respectively.
This research aims to improve the accuracy of predictions by using different machine learning classification algorithms. It shows the effectiveness of the LVQNN, PNN and PRNN algorithms in classification for medical diagnosis and treatment. Each of them appears to be more successful for a different disease. Furthermore, this thesis shows that incorporating the SOM and FCM clustering algorithms into the classification process makes a positive contribution.
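The training and test set sizes quoted for the three UCI data sets are consistent with a 2/3 to 1/3 split in which the training share is rounded to the nearest integer. The sketch below checks this; the rounding rule and the function name `split_sizes` are assumptions, since the abstract does not state how the split was computed.

```python
def split_sizes(n_samples):
    """Return (train, test) sizes for a 2/3 vs 1/3 split, rounding the training share."""
    n_train = round(2 * n_samples / 3)  # assumed rounding rule
    return n_train, n_samples - n_train

# Total sample counts as reported in the abstract
datasets = {
    "breast cancer": 961,
    "early diabetes risk": 520,
    "thyroid function": 7200,
}

for name, n in datasets.items():
    n_train, n_test = split_sizes(n)
    print(f"{name}: {n_train} training, {n_test} test")
```

The cesarean data set is not included because its 60/20 division of 80 samples corresponds to a 3/4 to 1/4 split.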

Description:

Made available as full text in accordance with the "Law on Amendments to the Higher Education Law and Certain Laws and Decree-Laws" published in Official Gazette No. 30352 dated 06.03.2018 and the "Directive on the Collection, Organization and Opening to Access of Graduate Theses in Electronic Form" dated 18.06.2018.
