Makine öğrenmesi yöntemlerini kullanarak bir petrokimya firmasının hisse senedi fiyat tahmini = Stock price prediction of a petrochemical company using machine learning methods

Toprak, Şevval

dc.contributor.advisor	Assoc. Prof. Dr. Gültekin Çağıl
dc.date.accessioned	2023-06-20T08:31:21Z
dc.date.available	2023-06-20T08:31:21Z
dc.date.issued	2023
dc.identifier.citation	Toprak, Şevval. Makine öğrenmesi yöntemlerini kullanarak bir petrokimya firmasının hisse senedi fiyat tahmini = Stock price prediction of a petrochemical company using machine learning methods. (Unpublished master's thesis). Sakarya Üniversitesi, Fen Bilimleri Enstitüsü, Sakarya
dc.identifier.uri	https://hdl.handle.net/20.500.12619/101167
dc.description	The full text has been made openly accessible pursuant to the "Law on Amendments to the Higher Education Law and Certain Laws and Decree Laws", published in the Official Gazette No. 30352 of 06.03.2018, and the "Directive on the Collection, Organization and Opening to Access of Graduate Theses in Electronic Form" of 18.06.2018.
dc.description.abstract	Today, stock markets have become an important investment vehicle. Investors build their portfolios and try to generate income by relying on their experience and intuition. However, many factors affect the stocks and indices traded on an exchange, which makes them difficult to predict. For this reason, methods used in the literature for time series forecasting have also begun to be applied to forecasting stock and index values. In the stock prediction studies of recent years in particular, artificial intelligence methods have gained importance over traditional methods. In this study, the closing price of the Petkim Petrokimya Holding A.Ş. stock (PETKM) is predicted using a data set that contains important oil index and stock prices from Turkey, Russia and Qatar. The predictions are made with Random Forest Regression (RFR), Long Short-Term Memory (LSTM) and Convolutional Neural Network + Long Short-Term Memory (CNN+LSTM), and the results are compared. To measure prediction performance, the error metrics Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) are used together with the Coefficient of Determination (R2). No study that applies these three prediction methods and compares their results was found in the literature. Four input models were created for the prediction task. The first input data set (Model 1) uses the US dollar opening and average buying/selling exchange rates together with Turkish stock and index prices. The second (Model 2) uses the same dollar rates together with stock and index prices from Turkey and Qatar. The third (Model 3) uses the same dollar rates together with stock and index prices from Turkey and Russia. The fourth (Model 4) uses the same dollar rates together with stock and index prices from Turkey, Qatar and Russia. Because the data come from different countries, the trading days of the exchanges differ. Four methods were therefore used to preprocess the missing data: (1) removing the date with the missing value from the data set, (2) replacing missing values with 0, (3) filling the missing value with the mean, and (4) filling the missing value by linear interpolation. The Model 1 data were filled with each of these four methods, and predictions were made with the resulting data sets. Linear interpolation was found to give the best results for the data in this study, and it was then applied to the other models as well. The aim of the study is to observe which algorithm gives the most accurate results for the data used, how the hyperparameters of the algorithms affect prediction performance, and whether data from the countries that supply Turkey with oil have an effect on predicting the PETKM stock price. The study finds that the hyperparameter with the greatest effect on the prediction result is the training and test data set size for RFR, shuffling of the training data (Shuffle) for LSTM, and the pooling size, learning rate and kernel size for CNN+LSTM. Comparing the four input models leads to the conclusion that stock and index prices from Russia and Qatar have no meaningful effect on predicting the PETKM closing price. Comparing the algorithms, LSTM generally gave the best results. All comparisons are made and examined in detail in the Findings and Discussion chapter. The results obtained and suggestions for future work are presented in the conclusion chapter.
dc.description.abstract	Today, stock markets have become an important investment vehicle. Investors rely on their experience and intuition to build their portfolios and generate income. However, many factors affect stocks and indices, which makes them difficult to predict. For this reason, methods used in the literature for time series forecasting have also come to be used for forecasting stock and index values. While early forecasts in this area were made with traditional methods, artificial intelligence methods have gained importance over traditional ones in recent stock prediction studies. This is because artificial intelligence methods are intelligent algorithms that can think, make decisions, interpret and make inferences. In this study, the closing price of the Petkim Petrokimya Holding A.Ş. stock (PETKM) is predicted using a data set that includes the dollar exchange rate and important oil index and stock prices from Turkey, Russia and Qatar. Four input data sets were created for the prediction. The first input data set (Model 1) contains the USD/TRY opening and average buying/selling exchange rates, the opening and closing prices of Turkish stocks (PETKM and TUPRS, Türkiye Petrol Rafinerileri A.Ş.), and the opening and closing prices of Turkey's sectoral petroleum index (XKMYA, BIST Chem Petrol Plastic). The second input data set (Model 2) contains the USD/TRY opening and average buying/selling exchange rates, the opening and closing prices of the Turkish sectoral petroleum index and stocks (XKMYA, PETKM, TUPRS), and the opening and closing prices of a Qatari sectoral index (QECON, QE Consumer Goods & Services) and a Qatari stock (QFLS, Qatar Fuel Co).
The third input data set (Model 3) contains the USD/TRY opening and average buying/selling exchange rates, the opening and closing prices of the Turkish sectoral oil index and stocks (PETKM, TUPRS, XKMYA), the opening and closing prices of Russia's sectoral oil indices (MOEXOG, MOEX Oil & Gas, and RTSOG, RTS Oil & Gas), and the opening and closing prices of the stock of Russia's leading oil company (LKOH, Lukoil). The fourth input data set (Model 4) contains the USD/TRY opening and average buying/selling exchange rates and the opening and closing prices of the sectoral petroleum indices and stocks of Turkey (XKMYA, PETKM, TUPRS), Qatar (QECON, QFLS) and Russia (MOEXOG, RTSOG, LKOH). Data for these features from 03.01.2010 to 31.12.2020 were used. Because the data set includes data from foreign countries, the days on which the stock markets do not trade differ. Four methods were applied to the data set to handle the missing data: replacing the missing values with 0, removing the missing data from the data set, using the average of the values one day before and one day after the missing value, and filling the missing values by linear interpolation. These methods were applied to Model 1 and the prediction results were compared. In general, the data filled by linear interpolation gave the most successful results. For this reason, and to avoid data loss, linear interpolation was used to fill the missing values on all dates except those on which none of the features has any data. In addition, because the feature values in the data set do not all lie within the same range, all features were normalized with min-max normalization, a widely used normalization method. Normalization scales the feature values to the 0-1 range.
Skipping this step adversely affects the prediction algorithms and their results. The data sets were used for prediction with Random Forest Regression (RFR), a powerful machine learning method; Long Short-Term Memory (LSTM), which is widely used in time series forecasting; and Convolutional Neural Network + Long Short-Term Memory (CNN+LSTM), one of the hybrid deep learning methods that have come into use recently. These three methods were applied to all four data sets. The error metrics Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE), together with the Coefficient of Determination (R2), were used to evaluate the predictions. Hyperparameter tuning was used to reduce the errors. Different hyperparameter values produce different results on the data. For this reason, in the tests with high errors, experiments were made by changing the hyperparameters of the algorithm. Because these algorithms learn different features each time they run, they give different results from run to run. The algorithms were run at least 10 times in order to stabilize the results. In general, the evaluation results differed widely in the first runs and settled on similar values in later runs. The study was implemented in Python 3.7 with the Spyder 3 editor. Python's Pandas, scikit-learn, Keras, NumPy, math and Matplotlib libraries were used to build the algorithms. The machine on which the algorithms were run has 8.00 GB RAM, an Intel(R) Core(TM) i7-4510U CPU and the 64-bit Windows 10 Pro operating system. The data were kept in MS Excel 2013 and stored in CSV format.
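The error metrics named above can be written out in a few lines. The following is a minimal sketch using only the Python standard library, not the thesis code; the function name `regression_metrics` and the sample prices are hypothetical, and MAPE is reported in percent on the assumption that all actual prices are non-zero.

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE, MAPE (percent) and R^2 for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero; prices are assumed positive.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    # R^2 = 1 - (residual sum of squares / total sum of squares).
    # This assumes the actual values are not all identical (ss_tot > 0).
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot

    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}


# Hypothetical closing prices and predictions, for illustration only.
metrics = regression_metrics([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.5])
```

Lower MSE, RMSE, MAE and MAPE indicate better predictions, while R2 closer to 1 indicates that the predictions explain more of the variance in the actual prices.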
The aims of this study are: (1) to observe whether the sectoral oil indices and stock prices of the countries from which Turkey obtains oil affect the prediction of the stock (PETKM) of Petkim Petrokimya Holding A.Ş., which plays an important role in Turkey's petrochemical industry; (2) to compare the prediction methods and to find and interpret the most appropriate one; and (3) to determine the effect of these algorithms' hyperparameters on the prediction results. For the RFR algorithm, the data set split (training and test set sizes), the number of trees and the maximum number of features were tested and compared. The most influential hyperparameter for this algorithm was the size of the training and test sets: when the training set contained less data and the test set more, the algorithm was largely unsuccessful. The other hyperparameters affected the prediction results less than the training and test set sizes did. For the LSTM algorithm, the hyperparameters examined were the data set split (training, validation and test set sizes), number of epochs, number of layers, number of units per layer, threshold value, batch size, activation function, shuffling of the training data, optimizer and learning rate. Among these, shuffling had the greatest influence on the predictions. This hyperparameter causes the samples in the training set to be presented in random order while the algorithm runs. It takes the values Yes and No. In all tests, Yes gave mostly more accurate and more consistent results than No.
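The two hyperparameters singled out above, the data set split and shuffling of the training data, can be illustrated as follows. This is a sketch under stated assumptions rather than the thesis implementation: the sliding-window framing, the helper names (`make_windows`, `split_chronologically`, `shuffled`), the default fractions and the toy series are all hypothetical. The split is kept chronological so that the test set lies after the training set in time, and only the training samples are shuffled.

```python
import random


def make_windows(series, lookback):
    """Turn a 1-D series into (window, next_value) training pairs."""
    return [(series[i:i + lookback], series[i + lookback])
            for i in range(len(series) - lookback)]


def split_chronologically(samples, train_frac=0.7, val_frac=0.15):
    """Split samples into train/validation/test parts without reordering them."""
    n = len(samples)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return samples[:train_end], samples[train_end:val_end], samples[val_end:]


def shuffled(samples, seed=0):
    """Return a shuffled copy; the seed makes repeated runs comparable."""
    out = list(samples)
    random.Random(seed).shuffle(out)
    return out


# Toy series standing in for normalized closing prices (illustration only).
series = list(range(25))
samples = make_windows(series, lookback=5)
train, val, test = split_chronologically(samples, train_frac=0.5, val_frac=0.25)
train_shuffled = shuffled(train)  # corresponds to Shuffle = Yes
```

In Keras, the `shuffle` argument of `Model.fit` plays a comparable role by reshuffling the training data before each epoch.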
For the CNN+LSTM algorithm, the hyperparameters examined were the data set split (training, validation and test set sizes), number of epochs, number of layers, number of units per layer, threshold value, pool size, kernel size, optimizer, learning rate, batch size and shuffling of the training data. The most influential among these were the max-pooling size, learning rate and kernel size. The success of this algorithm is generally similar to that of LSTM. When the prediction results for the four data sets were evaluated, Model 1 gave the best result in most of the tests. The prediction performance of Model 2 and Model 3 is close to that of Model 1, while Model 4 performed worse than the other three data sets. It is therefore concluded that the oil indices and stocks of Qatar and Russia used in the models make no difference to the prediction of the PETKM closing price. In the Findings and Discussion section of the study, all comparisons are made and examined in detail. The results obtained and suggestions for future studies are presented in the conclusion section.
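The preprocessing described in the abstract, linear interpolation of missing values followed by min-max normalization to the 0-1 range, can be sketched as below. This is an illustrative standard-library version, not the thesis code: the function names and the sample prices are hypothetical, gaps are represented as `None`, and observations are assumed to be equally spaced so that interpolation can work on list positions.

```python
def interpolate_linear(values):
    """Fill None gaps by linear interpolation between the nearest known neighbours.

    Gaps before the first or after the last known value are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def min_max_normalize(values):
    """Scale values to the 0-1 range: (v - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no range; map it to 0.0 by convention.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical closing prices with non-trading days marked as None.
closes = [10.0, None, 14.0, None, None, 20.0]
filled = interpolate_linear(closes)      # gaps replaced by points on a straight line
normalized = min_max_normalize(filled)   # all values now lie in [0, 1]
```

With Pandas, which the study lists among its libraries, `Series.interpolate(method="linear")` provides the same kind of gap filling.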
dc.format.extent	xxvi, 130 leaves : figures, tables ; 30 cm.
dc.language	Turkish
dc.language.iso	TUR
dc.publisher	Sakarya Üniversitesi
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/
dc.rights.uri	info:eu-repo/semantics/openAccess
dc.subject	Endüstri ve Endüstri Mühendisliği
dc.subject	Industrial and Industrial Engineering
dc.title	Makine öğrenmesi yöntemlerini kullanarak bir petrokimya firmasının hisse senedi fiyat tahmini = Stock price prediction of a petrochemical company using machine learning methods
dc.type	masterThesis
dc.contributor.department	Sakarya Üniversitesi, Fen Bilimleri Enstitüsü, Endüstri Mühendisliği Anabilim Dalı, Endüstri Mühendisliği Bilim Dalı
dc.contributor.author	Toprak, Şevval
dc.relation.publicationcategory	TEZ