dc.description.abstract |
Because demand is shaped by many factors under today's conditions, demand forecasting helps businesses make decisions easily and effectively. Businesses aim to produce reliable demand forecasts in order to increase customer satisfaction, carry out sound production planning, and use resources efficiently. Making predictions about the future under uncertainty can give businesses significant advantages. Many factors affect automobile demand. Because automobile demand can be affected by economic, environmental, and social factors alike, forecasting it is not easy. Beyond predicting the future, demand forecasting also makes it possible to interpret events and situations that occurred in the past and may occur again. For these reasons, forecasting automobile demand is important. To obtain more reliable results than qualitative methods provide, algorithms based on mathematical methods are used in demand forecasting. Mathematical methods appear in the literature as frequently used, high-performing methods. In studies from recent years in particular, artificial intelligence methods occupy an important place in demand forecasting. In this study, an automobile demand forecasting application was carried out, and the forecasting performance of machine learning algorithms, artificial neural networks, and time series analysis methods was analyzed and compared. The application used data from 2014 to 2022. Automobile sales data were used as the dependent variable. The independent variables were the import quantity index for automobiles, the real sector confidence index, the consumer confidence index, average vehicle loan interest rates, Turkey's automobile production volumes, and time. 
Data preprocessing was performed to achieve higher performance from the methods and to make more reliable forecasts. For the machine learning algorithms, artificial neural networks, and time series analysis, outlier analysis was performed to address outliers, which are among the obstacles to obtaining reliable forecasts from data. The study used the machine learning algorithms Categorical Boosting (CatBoost), Gradient Boosting, Random Forest (RF), and Support Vector Regression (SVR), together with Artificial Neural Networks (ANN) and time series analysis; their performance was analyzed and the methods were compared. The CatBoost algorithm, a machine learning method, showed the highest performance, with an R² value of 0.85. The automotive sector is one of the influential sectors of the national economy and is also a supplier to other sectors. For this reason, responding reliably to demand in the automobile sector is becoming a necessity. Accordingly, this study aimed to identify the algorithm that provides an accurate and reliable approach to automobile demand. |
|
dc.description.abstract |
Because demand is shaped by various factors under today's conditions, demand forecasting enables businesses to make decisions easily and effectively. Companies must make and implement correct decisions in order to gain a competitive edge, and demand forecasting plays a significant role in this. Companies must adapt to the rapidly changing demands and desires of consumers. With the advancement of technology, meeting demand has become challenging for firms, especially in the automotive sector. For these reasons, analyzing data effectively and meeting customer expectations have become necessities for companies. Businesses aim to make reliable demand forecasts to ensure customer satisfaction, realize a good production plan, and use capacity efficiently. Making predictions about the future under uncertainty can provide significant advantages for businesses. Demand forecasting is classified into short-, medium-, and long-term categories, which apply to daily plans, supply process plans, and large-scale plans requiring capital investment, respectively. Demand forecasting begins with identifying the dependent variable and the independent variables that influence it. The data must then be collected and analyzed. Thorough data analysis and complete data are crucial for building high-performance models. The prediction periods need to be determined during the data analysis stage. To create an appropriate prediction model, the correct method must be selected. Finally, the model needs to be evaluated on data that it has not encountered before. Numerous economic, environmental, and social factors influence automobile demand, which makes it challenging to predict. Beyond predicting the future, demand forecasting allows past events and situations, as well as those that may occur in the future, to be interpreted. For these reasons, predicting automobile demand is important. 
Mathematical algorithms based on machine learning methods, artificial neural networks, and time series analysis are used for demand forecasting to obtain more reliable results than traditional methods. These methods appear in the literature as frequently used, high-performance techniques. In recent studies, artificial intelligence methods have played a significant role in demand forecasting. Artificial intelligence methods, including artificial neural networks (ANNs) and machine learning, are employed in various fields. These systems are built to mimic human intelligence: they acquire information, learn, and perform analysis through generalization. ANNs and machine learning methods are used not only for regression problems but also for classification problems, and ANNs can additionally be used to relate data. ANNs consist of input, hidden, and output layers. The input layer transmits data received from the external environment to the hidden layer. The hidden layer processes the data and draws inferences from it, then forwards the result to the output layer. The output layer communicates the processed data from the hidden layer to the external environment. Time series analysis is a demand forecasting method in which past data are analyzed to gain insight into future periods. A time series comprises four components: trend, seasonal, cyclical, and irregular. The trend component describes the long-term increase or decrease in the time series over time. The seasonal component describes the pattern in the series that repeats at fixed time intervals according to the seasons. Stationarity is an important concept in time series. Time series encompass trend, seasonality, and noise components. 
Therefore, stationarity is required in particular by the SARIMA method, which is commonly used in studies. To satisfy this condition, trend and seasonality are removed from the time series. There are various methods for making a time series stationary. One of the methods used most frequently in the literature is differencing, which is divided into seasonal and non-seasonal differencing operations. If the time series exhibits seasonality, seasonal differencing is performed. If the time series contains a trend, non-seasonal differencing is performed. Unit root tests are applied to check the stationarity of the series. Time series analysis provides more reliable results for short-term predictions. In this study, an automobile demand forecasting application was implemented, and the prediction performance of machine learning algorithms, artificial neural networks, and time series analysis methods was analyzed and compared. The application used data from 2014 to 2022. The dependent variable is automobile sales, and the independent variables are the import quantity index for automobiles, the real sector confidence index, the consumer confidence index, average vehicle loan interest rates, Turkey's automobile production quantities, and time. The import quantity index measures the change in automobile imports. The real sector confidence index indicates the trend in the manufacturing industry based on the views of senior managers engaged in production. The consumer confidence index reflects consumers' financial situations and future spending. Vehicle loan interest rates play a significant role in automobile sales. Economic conditions are among the factors that most influence automobile sales. Data preprocessing was performed to achieve higher performance with the methods and to make more reliable predictions. 
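The seasonal and non-seasonal differencing operations described above can be illustrated with a minimal sketch. The series, the season length of 4, and the helper name `difference` are hypothetical and chosen only for illustration; they are not taken from the study, whose data and seasonal period are not given here.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical series: a linear trend plus a pattern that repeats every 4 steps.
season = [10, -5, 0, -5]
sales = [100 + 2 * t + season[t % 4] for t in range(12)]

# Seasonal differencing (lag equal to the season length) removes the repeating pattern.
seasonal_diff = difference(sales, lag=4)

# Non-seasonal (first-order) differencing removes what is left of the trend.
stationary = difference(seasonal_diff, lag=1)

print(seasonal_diff)
print(stationary)
```

In practice a unit root test would then be run on the differenced series to confirm stationarity; that step is omitted from this sketch.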
Because outliers make it harder to obtain reliable predictions, outlier analysis was conducted for the machine learning algorithms and time series analysis using the local outlier factor (LOF) method, a density-based outlier detection approach. The LOF method examines the dataset as a whole rather than variable by variable, since variable-wise outlier analysis may be insufficient for multivariate datasets. When applying the LOF method, the correlation coefficients of the data were compared, and the threshold value required by the method was determined. The choice of the neighborhood and density parameters is crucial in the LOF method. After outlier analysis, the data were standardized before model creation to bring the variables to the same scale and make them comparable. For artificial neural network modeling, the dataset was standardized using the z-score method, which gives each variable a mean of zero and a standard deviation of one. For time series analysis, outlier analysis was conducted using the box-plot method, a univariate outlier analysis method. Before the models were created, the dependent and independent variables of the dataset were separated. The variables were then split into training and test sets. For the machine learning methods, the dataset was split into 80% training and 20% test data. For artificial neural networks, the dataset was divided into 65% training, 15% validation, and 20% test data. Different machine learning algorithms were tested, and the best-performing ones were selected for further application. The SARIMA method was used for time series analysis. After outlier analysis, the seasonal dataset was differenced to make it stationary. The parameters of the SARIMA model were determined using hyperparameter optimization, cross-validation for time series, and analysis of autocorrelation plots. 
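The z-score standardization and the 65% / 15% / 20% split described above can be sketched as follows. This is only an illustration under stated assumptions: the sample values, the helper names `zscore` and `split_rows`, the use of the population standard deviation, and splitting the rows in their original order are all assumptions, since none of these details are specified above.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize values to mean 0 and standard deviation 1.

    Assumption: the population standard deviation is used.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - mu) / sigma for v in values]


def split_rows(rows, train_pct, val_pct=0):
    """Split rows, in their existing order, into train / validation / test parts.

    Percentages are integers; whatever remains after train and validation is the test set.
    """
    n = len(rows)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


# Hypothetical values of one variable, for illustration only.
rates = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaled = zscore(rates)

# 65% training, 15% validation, 20% test, as in the neural network setup.
rows = list(range(100))
train, val, test = split_rows(rows, 65, 15)

# 80% training, 20% test, as in the machine learning setup.
ml_train, _, ml_test = split_rows(rows, 80)
```

To avoid leaking information from the test set, the mean and standard deviation would normally be computed on the training portion only and then reused for the validation and test portions; the sketch standardizes a single list for brevity.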
The study used the machine learning algorithms Categorical Boosting (CatBoost), Gradient Boosting, Random Forest (RF), and Support Vector Regression (SVR), together with Artificial Neural Networks (ANN) and, for time series analysis, the Seasonal Autoregressive Integrated Moving Average (SARIMA) method. Their performance was analyzed and the methods were compared. CatBoost is an algorithm based on Gradient Boosting. It makes reliable predictions with small datasets and can handle numerical, categorical, and text data. It learns quickly and can cope with overfitting. Gradient Boosting uses the entire dataset: it first builds a single tree and then builds subsequent trees to minimize the errors of the previous ones. Random Forest can be used for both classification and regression problems; it builds multiple decision trees and produces its output by averaging their predictions. It provides more reliable results with datasets containing a large number of data points. Support Vector Regression is a machine learning algorithm based on Support Vector Machines that aims to fit a hyperplane encompassing the data points while minimizing errors. The methods were evaluated using the coefficient of determination (R²) and the root mean square error (RMSE). As a result, the CatBoost algorithm was observed to have the highest performance among the machine learning methods. |
|