Air pollution has come to be regarded as a global crisis in recent years. Researchers have focused intensively on the issue because air pollution harms human health, vegetation, and water resources. Since this problem is a serious threat to living things, measures must be taken to cope with it. As an alternative approach to the air pollution problem, this study focuses on predicting and analyzing air quality through air quality modeling. Air pollution was modeled using the LSTM (Long Short-Term Memory) algorithm, with the aim of predicting future air pollution levels from historical data. The study used data based on hourly measurements of air quality parameters, with PM10, NO2, NOX, and CO as the pollutant parameters. The measurements come from the air pollution monitoring station established by UHKİA (Ulusal Hava Kalitesi İzleme Ağı, the National Air Quality Monitoring Network) in the garden of Sakarya Industrial Vocational High School (Sakarya Endüstri Meslek Lisesi). Data recorded from January 2020 to September 2022 were used for modeling. After the air pollution data were collected, they were cleaned using data preprocessing methods. Between January 2020 and September 2022, 24,282 records were obtained for each parameter, 97,128 in total. Two different preprocessing methods produced two separate datasets, which were then compared. An LSTM model was built from the 97,128 records and trained on the training data. Using the validation data, the model's performance was evaluated through hyperparameter optimization. The results indicate that the model can subsequently be used to predict future air pollution levels.
The proposed model was found to achieve high performance in predicting air pollution.
In recent years, air pollution has been recognized as a global crisis. Researchers have focused heavily on the issue because the problems it causes pose a serious danger to life. Studies on air pollution generally show that the composition of the air that sustains vital functions is being disrupted. Understanding these consequences is crucial for human health, as is recognizing that pollutant sources cannot be eliminated completely but can be reduced through in-depth, detailed interdisciplinary studies. Air pollution poses a significant threat to human health, causing diseases such as asthma, bronchitis, and upper respiratory tract infections, and in some cases life-threatening illnesses. In addition, acid rain caused by air pollution has profound and damaging effects on ecosystems. In the broadest sense, air pollution means chemical, physical, and biological changes in air quality caused by substances that pollute the air. Air pollutants can arise from natural cycles or from activities necessary for life. Common air pollutants include particulate matter (PM10, PM2.5), carbon monoxide (CO), ozone (O3), nitrogen oxides (NO2, NOX), and sulfur dioxide (SO2). Because these pollutants come from different sources, each must be limited at its source and evaluated separately. For this reason, the concentration of each pollutant is measured at regular intervals at air quality monitoring stations, and threshold values are set so that it poses the least possible threat to health. Air quality is then assessed on the basis of these measurements. In recent years, air pollution has also been modeled from the collected data, and various traditional methods have been used to predict future values and to take measures and precautions against possible pollution episodes.
The resulting predictions make it easier to plan future precautions. They also make it easier to detect measurement errors caused by technical failures in the daily data, so that error analyses can be evaluated more accurately and lead to effective results. As the Sakarya example shows, air quality data must be measured at regular intervals. Modeling air pollution from measurement results with artificial intelligence, and predicting future values, will further help in taking measures and precautions against possible pollution episodes. In this study, air quality was forecast for Sakarya province. The air quality data were obtained from the official website of the Ministry of Environment and Urbanization, www.havaizleme.gov.tr, and cover the period from January 2020 to September 2022. The data were collected by measuring stations established under the National Air Quality Monitoring Network. Of the four air quality monitoring stations in Sakarya, one was used in this study: the Sakarya Central station, located on Sakarya Street near the Vocational High School. It lies 3.5 meters from the nearest residence and 0.5 meters from the road. The station was established to measure traffic-related air pollution and has been measuring continuously since March 2013. Four parameters were used to forecast air quality: PM10, NO2, NOX, and CO. For each of the four parameters, hourly measurements from the central station between January 2020 and September 2022 were used.
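The idea that predictions make measurement errors easier to spot can be illustrated with a minimal sketch: compare each hourly measurement with the model's prediction and flag residuals that are unusually large. The helper name `flag_suspect_measurements`, the median-absolute-deviation scale, the threshold k = 3, and the sample values are all illustrative assumptions, not the procedure used in the study.

```python
import numpy as np

def flag_suspect_measurements(measured, predicted, k=3.0):
    """Hypothetical helper: mark hours whose residual (measured - predicted)
    lies more than k robust standard deviations from the median residual.
    k=3.0 is an illustrative assumption, not a value from the study."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = measured - predicted
    center = np.median(residuals)
    # Median absolute deviation, scaled to match a standard deviation for
    # normally distributed residuals; it is not inflated by the spikes we seek.
    mad = np.median(np.abs(residuals - center))
    scale = 1.4826 * mad if mad > 0 else np.std(residuals)
    return np.abs(residuals - center) > k * scale

# Made-up hourly PM10 values; hour 4 mimics a sensor spike.
measured = [40, 42, 41, 39, 250, 43, 40]
predicted = [41, 41, 40, 40, 42, 42, 41]
print(np.flatnonzero(flag_suspect_measurements(measured, predicted)))  # → [4]
```

A robust scale is used because a plain standard deviation of the residuals would be inflated by the very outliers being searched for.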
Within this date range there should be 97,128 records in total, 24,282 for each parameter. However, some data are missing because of sensor errors and similar problems. Data cleaning was performed in Python. The missing values were first detected and then filled using two simple methods, the mean and the k-nearest neighbors (kNN) algorithm, producing two separate datasets. The values in each dataset were then examined, and the periods in which the pollutant concentrations were highest and lowest were evaluated separately. Violin plots of the MSE results were created for the mean-filled and kNN-filled datasets, and the two methods were compared on that basis. According to the MSE results, kNN imputation yielded a better MSE distribution and mean than mean imputation for NO2, PM10, and CO, whereas the NOX parameter gave better results when filled with the mean.
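The mean-versus-kNN comparison can be sketched as follows: hide a random subset of known values, fill them with each method, and compute the MSE on the hidden entries for each pollutant column. This is a minimal numpy sketch under stated assumptions: the helper names, k = 5 neighbors, the 10% masking rate, the simplified distance (root-mean-square difference over the columns both rows observe), and the synthetic data are all illustrative, since the summary does not give the study's kNN settings. A library implementation such as scikit-learn's `KNNImputer` could be used instead.

```python
import numpy as np

def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def knn_impute(X, k=5):
    """Replace each NaN in column j with the mean of column j over the k nearest
    rows, using RMS distance over the columns both rows observe (simplified)."""
    observed = ~np.isnan(X)
    X_out = X.copy()
    for i, j in zip(*np.where(~observed)):
        donors = np.flatnonzero(observed[:, j])        # rows that have column j
        shared = observed[donors] & observed[i]        # columns usable for distance
        diff = np.where(shared, X[donors] - X[i], 0.0)
        n_shared = shared.sum(axis=1)
        usable = n_shared > 0
        if not usable.any():                           # no overlap: use the column mean
            X_out[i, j] = np.nanmean(X[:, j])
            continue
        dist = np.sqrt((diff[usable] ** 2).sum(axis=1) / n_shared[usable])
        nearest = donors[usable][np.argsort(dist)[:k]]
        X_out[i, j] = X[nearest, j].mean()
    return X_out

def masked_mse(X_complete, impute_fn, frac=0.1, seed=0):
    """Hide a random fraction of known values, impute them, and return the
    per-column MSE between imputed and true values on the hidden entries."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X_complete.shape) < frac
    X_missing = np.where(mask, np.nan, X_complete)
    X_filled = impute_fn(X_missing)
    sq_err = np.where(mask, (X_filled - X_complete) ** 2, np.nan)
    return np.nanmean(sq_err, axis=0)

# Synthetic stand-in for four correlated pollutant columns (not the Sakarya data).
rng = np.random.default_rng(42)
base = rng.normal(size=(500, 1))
X = base + 0.1 * rng.normal(size=(500, 4))
mse_mean = masked_mse(X, mean_impute)   # same seed, so both methods see the same mask
mse_knn = masked_mse(X, knn_impute)
# Pick, per column, the method with the lower MSE.
choice = np.where(mse_knn < mse_mean, "kNN", "mean")
print(mse_mean.round(3), mse_knn.round(3), choice)
```

Choosing the lower-MSE method per column mirrors the per-pollutant decision described in the text. On this synthetic, strongly correlated data kNN wins for every column, which says nothing about the real NOX result.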
Therefore, in the second stage it was decided to use the mean-filled data for NOX and the kNN-filled data for NO2, PM10, and CO. After preprocessing, 24,282 records were available for each pollutant parameter; 80% of them (19,423 samples) were allocated for training and the remaining 20% (4,859 samples) for testing. Examination of the dataset showed a strong relationship between PM10 and the pollutants NO2, NOX, and CO. LSTM (Long Short-Term Memory), a deep learning method, was chosen for the predictive analysis. To improve its prediction performance, hyperparameter optimization was applied so that suitable hyperparameters could be selected for each air quality parameter. Three hyperparameter search algorithms were used: random search, Bayesian search, and Hyperband. For each pollutant, RMSE and MSE values were calculated for the optimization results, and the hyperparameters were selected by examining them. Hyperband was found suitable for NOX and Bayesian search for NO2 and PM10, and a suitable search algorithm was likewise selected for CO from these results. The model was then retrained with the hyperparameter values found by the best search algorithm, and the resulting forecasts were evaluated separately. LSTM is a commonly preferred method for forecasting air quality. The results indicate that the model achieves high prediction accuracy, suggesting that the LSTM algorithm can be widely used to predict air quality data. The study is also considered a useful way to draw attention to air pollution, and expanding and refining the forecasting steps could yield more efficient results.
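The chronological 80/20 split and the LSTM recurrence described above can be sketched in numpy. The summary does not state the framework, input window, or network size, so the 24-hour lookback, hidden size 8, single-feature input, the hypothetical read-out vector `w_out`, and the random (untrained) weights are assumptions for illustration. Training by backpropagation and the hyperparameter search are omitted; the gate equations are those of the standard LSTM cell.

```python
import numpy as np

def make_windows(series, lookback):
    """Slice a 1-D hourly series into (samples, lookback, 1) inputs and
    next-hour targets."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[t:t + lookback] for t in range(len(series) - lookback)])
    y = series[lookback:]
    return X[..., np.newaxis], y

def chronological_split(X, y, train_frac=0.8):
    """First train_frac of the samples for training, the rest for testing.
    No shuffling, so the model never trains on hours after the test period."""
    cut = int(len(X) * train_frac)
    return X[:cut], X[cut:], y[:cut], y[cut:]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x_seq, W, U, b):
    """Run one LSTM layer over a (timesteps, n_in) sequence; return the last hidden state.
    Shapes: W (4H, n_in), U (4H, H), b (4H,); gate blocks ordered input, forget, candidate, output."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x_t in x_seq:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:H])              # input gate: how much new information to write
        f = sigmoid(z[H:2 * H])         # forget gate: how much old cell state to keep
        g = np.tanh(z[2 * H:3 * H])     # candidate values for the cell state
        o = sigmoid(z[3 * H:])          # output gate: how much of the cell to expose
        c = f * c + i * g               # cell state carries long-term memory
        h = o * np.tanh(c)              # hidden state is the per-step output
    return h

# Synthetic stand-in for one pollutant with a daily cycle (not the Sakarya data).
hours = np.arange(1000)
series = 50 + 10 * np.sin(2 * np.pi * hours / 24)
X, y = make_windows(series, lookback=24)          # assumed 24-hour lookback
X_train, X_test, y_train, y_test = chronological_split(X, y)
print(X_train.shape, X_test.shape)                # → (780, 24, 1) (196, 24, 1)

# Random, untrained weights; in practice they are learned and inputs are scaled.
rng = np.random.default_rng(0)
H, n_in = 8, 1
W = rng.normal(scale=0.1, size=(4 * H, n_in))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
w_out = rng.normal(scale=0.1, size=H)             # hypothetical dense read-out layer
y_hat = lstm_forward(X_train[0], W, U, b) @ w_out # one (untrained) next-hour forecast
```

Splitting by time rather than at random keeps the test set strictly in the future of the training set, which is what a forecasting evaluation requires.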