Machine learning is one of the driving forces of science and industry; however, with the emergence of big data, traditional machine learning algorithms need to be modified to handle this enormous volume of data. The size and complexity of these data sets have made them difficult to handle with traditional data processing methods. This is where big data analytics technologies come into play. Thanks to these technologies, various medical conditions can be diagnosed accurately at the desired speed and performance, even when high-volume health data sets are involved. Big data analytics technologies use a variety of analytical methods to uncover the valuable information contained in data sets. Techniques such as statistical analysis, data mining, machine learning, and artificial intelligence make it possible to derive meaningful results from these large data sets. In addition, these technologies enable fast analysis by processing large data sets in a parallel and distributed manner. These capabilities have had a major impact in many fields, and healthcare is one of them. The primary aim of this study is to identify, by means of performance-optimized algorithms, the most influential questions related to anxiety and depression in pregnant women. In this way, the aim is to reach a result in less time with fewer questions. The next goal of the study is to build an instant remote health status prediction system for depression and anxiety in pregnant women by applying machine learning models to a big data stream. This goal builds on the Apache Spark big data processing engine. In this scalable system, the application receives data from pregnant women in order to predict the patient's health status. These data go through preprocessing steps so that they can be processed.
After a series of performance-oriented operations, the Naïve Bayes machine learning algorithm was found to produce the best results for this data set. In addition, the prediction system, which produces results instantly, was integrated into the Kubernetes clustering system. This made it possible to examine the resulting scalable architecture in terms of processing time. Considering the performance of the developed big data platform, it is possible to say that computer-based systems that run in real time with considerable accuracy and speed can be used instead of the traditional methods for detecting anxiety and depression in pregnant women.
Machine learning is one of the driving forces of science and industry. This technology, which enables computers to learn from data rather than rely on complex algorithms written by humans, has revolutionized many industries. Machine learning is used to discover patterns and relationships in large data sets. This enables businesses to better understand their data, uncover hidden information, and predict future trends. Machine learning also plays a large role in automated decision-making: in many areas, tasks that are difficult or time-consuming for humans can be delegated to machines. For these reasons, machine learning is an increasingly important technology. However, the prevalence of big data, which arrives at volumes and speeds that traditional approaches were not designed for, requires changes to the way machine learning techniques are applied. Big data is a valuable concept that plays an important role in machine learning. Today, the rapidly increasing amount of data across industries enables businesses to make better decisions, make predictions, and better understand customer needs. Big data analytics technologies are needed to process this volume of data under the required conditions. They play an indispensable role in processing and analyzing large data sets and extracting valuable information, and they enable businesses to handle complex masses of data, discover patterns, and make better decisions. One key advantage of big data analytics tools is their data processing capacity. Large data sets are generated quickly and often in high volume, and processing and analyzing them may not be possible with traditional methods. Big data analytics tools enable rapid analysis by processing these data sets in a parallel and distributed manner. In addition, they offer data visualization and reporting capabilities.
Because large data sets are often complex and voluminous, it is important to be able to make sense of the data and share it. Big data analytics technology improves understanding of data by presenting it through graphs, tables, and interactive visuals. Its reporting features also let users create shareable reports that summarize the data. In summary, big data analytics technologies help businesses effectively manage and analyze large data sets and extract valuable information from them. They facilitate the processing and interpretation of big data through their data processing capacity, analysis capabilities, visualization and reporting features, and forecasting capabilities. Big data is applied in many areas, and one of the most critical is medicine, which requires rapid decision-making. Psychiatry has an important place within medicine as the branch that focuses on mental health. It examines mental, emotional, and behavioral disorders, evaluates people's mental health, makes diagnoses, and applies appropriate treatments. Mental illnesses deeply affect the lives of individuals and need to be treated carefully. If psychological disorders such as anxiety and depression that occur in the perinatal period, which includes pregnancy, are not detected in time, they will have negative effects on both the mother and the baby. Their effects in the perinatal period and on pregnant women are therefore of great importance for the mental health of society. For this reason, rapid diagnosis is of great importance in psychiatry. Early diagnosis of mental health problems allows appropriate treatment to begin early and leads to better outcomes for patients.
Although anxiety and depression, which are common in the perinatal period, are known to physicians and their negative consequences are better understood day by day, they are not sufficiently recognized, and even when they are recognized, adequate treatment is not provided. Because they cause serious harm to society, anxiety and depression in the perinatal period need to be diagnosed and treated early. Big data analytics systems can address this need. With big data analytics, data collected from pregnant women can be processed quickly and transformed into meaningful information. This can enable doctors, healthcare professionals, and researchers to provide fast and accurate diagnoses, treatment plans, and healthcare services. In this study, we built a big data analytics system that produces instant results for the diagnosis of anxiety and depression in women in the perinatal period and that offers a scalable architecture thanks to its cluster structure. Technologies such as Apache Kafka, Apache Spark, and Kubernetes were used to build this system. The aim is thus to minimize the potential harm caused by depression and anxiety in women in the perinatal period. The primary aim of this study is to identify, by extracting features with performance-optimized algorithms, the questions that most influence the outcome for anxiety and depression in pregnant women. In this way, a result can be reached in less time with fewer questions. The second goal is to find the machine learning algorithm that gives the best results on the cleaned data. The third goal is to create an instant remote health status prediction module for depression and anxiety in pregnant women; the module is based on the Apache Spark big data processing engine and applies machine learning models to a big data stream. The final goal is to integrate the modules that serve these purposes into the Kubernetes clustering system.
In this study, the data were first prepared for processing by passing them through preprocessing stages. There are sixty questions that pregnant women need to answer. An optimized evolutionary feature selection algorithm reduced this number to sixteen, which it determined to be the optimum. These sixteen features are: gestational week, the baby's health problems, place of residence (urban or rural), communication with the spouse, education level, number of people living in the household, employment status, total income level, emotional support from the husband, presence of people with whom to share problems, exercise status, the husband's education level, whether the pregnancy was desired, the sex of the baby, smoking status, and the spouse's chronic illness. Several algorithms (Decision Tree, Naïve Bayes, K-Nearest Neighbor (K-NN), Random Forest, Gradient Boosted Tree (GBT), Logistic Regression, and Deep Feed Forward Neural Network (DFFNN)) were run on these data to predict the health status of the pregnant woman. The Naïve Bayes machine learning algorithm produced the best results, with an accuracy of 90.8% and a precision of 81.71%. All of this work was deployed on a Kubernetes cluster, yielding an infrastructure with a scalable architecture suitable for big data that produces results about the disease. A performance test on this infrastructure was used to observe how increasing the number of executors affects the results. The tests showed that Spark executors, which drive parallelization, contributed positively to speed. The effect of the number of executors on resource consumption was also examined, and CPU and memory usage are reported in diagrams.
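As a rough illustration of the Naïve Bayes classification and the accuracy and precision metrics mentioned above, the following is a minimal sketch in plain Python. It is not the study's implementation, which runs on Apache Spark. The class name `CategoricalNaiveBayes`, the two toy features (emotional support and exercise status, each encoded as "yes"/"no"), the labels "low" and "high", and all data values are hypothetical and chosen only for demonstration.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with Laplace smoothing (illustrative)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; 1.0 is an assumed default

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        # value_counts[(label, feature_index)][value] -> number of occurrences
        self.value_counts = defaultdict(Counter)
        # distinct values seen per feature, needed for the smoothing denominator
        self.feature_values = [set() for _ in rows[0]]
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.value_counts[(label, j)][value] += 1
                self.feature_values[j].add(value)
        return self

    def predict_one(self, row):
        total = sum(self.class_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(self.class_counts[c] / total)
            for j, value in enumerate(row):
                count = self.value_counts[(c, j)][value]
                denom = self.class_counts[c] + self.alpha * len(self.feature_values[j])
                score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    def predict(self, rows):
        return [self.predict_one(row) for row in rows]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive):
    """Among samples predicted as `positive`, the fraction that truly are."""
    true_of_predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not true_of_predicted:
        return 0.0
    return sum(t == positive for t in true_of_predicted) / len(true_of_predicted)


# Hypothetical toy data: (emotional_support, exercise) -> risk level.
train_rows = [("yes", "yes"), ("yes", "no"), ("no", "no"),
              ("no", "yes"), ("no", "no"), ("yes", "yes")]
train_labels = ["low", "low", "high", "high", "high", "low"]

model = CategoricalNaiveBayes().fit(train_rows, train_labels)

test_rows = [("yes", "no"), ("no", "yes"), ("no", "no")]
test_labels = ["low", "high", "low"]
predictions = model.predict(test_rows)

print(predictions)
print("accuracy:", accuracy(test_labels, predictions))
print("precision (high):", precision(test_labels, predictions, positive="high"))
```

Accuracy counts every correct prediction, whereas precision looks only at the cases flagged as positive, which is why the two figures reported for a classifier can differ noticeably.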
Considering the performance of the developed big data platform, it is possible to conclude that computer-based systems operating in real-time with significant accuracy and speed can be used to detect anxiety and depression in pregnant women, replacing traditional methods.