Pertübasyon yöntemi ile hassas veri güvenliğine yönelik çok değişkenli veriler için tahmin analizi = Prediction analysis for multivariate data with respect to sensitive data security using the perturbation method

İlter, İlker


dc.contributor.advisor	Assoc. Prof. Dr. Safiye Turgay
dc.date.accessioned	2024-01-26T12:23:01Z
dc.date.available	2024-01-26T12:23:01Z
dc.date.issued	2023
dc.identifier.citation	İlter, İlker. (2023). Pertübasyon yöntemi ile hassas veri güvenliğine yönelik çok değişkenli veriler için tahmin analizi = Prediction analysis for multivariate data with respect to sensitive data security using the perturbation method. (Yayınlanmamış Yüksek Lisans Tezi). Sakarya Üniversitesi Fen Bilimleri Enstitüsü
dc.identifier.uri	https://hdl.handle.net/20.500.12619/101774
dc.description	The full text has been made openly accessible in accordance with the "Law on Amending the Higher Education Law and Certain Laws and Decree-Laws", published in the Official Gazette dated 06.03.2018 and numbered 30352, and the "Directive on the Collection, Organization, and Opening to Access of Graduate Theses in Electronic Form" dated 18.06.2018.
dc.description.abstract	The widespread use of data mining, big data, and the Internet of Things (IoT) has brought the concept of data privacy to the fore in every sector, from manufacturing and healthcare to justice and banking. Data mining is the process of discovering interesting and previously unknown information in large data sets. The need to ensure data privacy while data are being processed therefore also arises. Carrying out data mining while protecting data privacy allows data analysis in a sounder and more effective environment. In this study, the privacy of data in static or dynamic environments is ensured with perturbation methods, and the data are analyzed before and after perturbation with their multidimensional structure taken into account. As technology develops, the use of big data is spreading at an increasing pace. The storage, analysis, and privacy protection of data have brought with them algorithmic methods that need to be developed. Privacy protection through data distortion is a technique for protecting record confidentiality and storing data without distorting the meaning of the values in the database or the relationships between variables. This study, which is based on protecting the privacy of quasi-identifier and sensitive numerical data, covers perturbation methods. Perturbation methods are mathematical techniques that add controlled noise or randomness in order to enable data analysis while protecting the privacy of the data. Various methods, such as randomized response, differential privacy, secure multi-party computation, noise addition, sampling, and aggregation, are used to prevent the disclosure or misuse of sensitive information. These methods are applied successfully in machine learning, statistics, and cryptography to ensure data privacy. However, the implementation must be designed carefully so that it does not compromise data accuracy or introduce bias into the analysis.
Overall, perturbation methods offer a promising approach to protecting data privacy in a variety of fields. In this study, multidimensional rotation-based perturbation and randomly generated noise addition mechanisms, intended to ensure data privacy and thereby data security, were analyzed using 4 data sets and 6 classification methods. Before classification, outliers and missing values in the data sets were replaced using different data mining methods. Numerical values in the data sets were pre-processed with normalization methods, and categorical values with one-hot and dummy-variable encoding. The four data sets were split into training and test sets by cross-validation. Accuracy values for the original data sets were computed with logistic regression (LR), k-nearest neighbors (KNN), neural networks (NN), support vector machines (SVM), gradient-boosted decision trees (XGBoost), and light gradient boosting machines (LightGBM). Two numerical quasi-identifier variables in the data sets were first rotated by 110 degrees, then Gaussian noise was added based on their means and standard deviations, and accuracy values and F1 scores were computed. The classification outputs were also compared using the computed Friedman ranking values, the best classification method was identified for the four data sets, and the performance metrics were interpreted. In the study, geometric perturbation methods were used to protect the privacy of numerical data, and classification methods were used to analyze the amount of information loss caused by this change. The main contribution of the study is a proposed framework for privacy-preserving security prediction analysis. The framework outlines the steps involved in applying perturbation methods to multivariate sensitive data, including data collection, perturbation techniques, analysis methodologies, and the assessment of privacy protection.
Another contribution of the study is a comparative analysis of different perturbation methods, assessing their effectiveness in protecting privacy while preserving the usefulness of the data for security prediction.
dc.description.abstract	With the widespread use of big data and the Internet of Things (IoT), data mining has brought the concept of data privacy to the fore in every sector, from manufacturing, health, and justice to banking. Data mining is the process of discovering interesting and previously unknown information in big data. The need to ensure data confidentiality while data are being processed therefore also arises. Performing data mining while protecting data privacy allows data analysis in a sounder and more effective environment. In this study, the confidentiality of data in static or dynamic environments is ensured with perturbation methods, and the data are analyzed before and after perturbation with their multidimensional structure taken into account. As technology develops, the use of big data is spreading at an increasing pace. The storage, analysis, and confidentiality of data have created a need for new algorithmic methods. Privacy protection through data distortion is a technique for protecting record confidentiality and storing data without disturbing the meaning of the values in the database or the relationships between variables. This study, which focuses on protecting the confidentiality of quasi-identifier and sensitive numerical data, covers perturbation methods. Sensitive data analysis is crucial to ensuring that appropriate security measures are in place to protect the confidentiality and integrity of information. One approach to handling sensitive data is data perturbation, which adds random noise or makes small changes to the data to maintain privacy while still allowing analysis. Perturbation methods are mathematical techniques that add controlled noise or randomness so that data can be analyzed while their confidentiality is maintained.
Various methods, such as randomized response, differential privacy, secure multi-party computation, noise addition, sampling, and aggregation, are used to prevent the disclosure or abuse of sensitive information. These methods are applied successfully in machine learning, statistics, and cryptography to ensure data privacy. However, the implementation should be designed carefully so that it does not compromise data accuracy or introduce bias into the analysis. Overall, perturbation methods offer a promising approach to protecting data privacy in a variety of areas. In security prediction analysis with multivariate data, data perturbation methods can protect sensitive information while maintaining the usefulness of the data for analysis. Examples of standard techniques used in the perturbation process include: Randomized Response: This technique introduces controlled randomness during data collection or analysis. Because individual data points cannot easily be attributed to specific individuals, confidentiality increases while the overall statistical characteristics of the data set are maintained. Differential Privacy: Differential privacy provides a mathematical framework for adding noise to data in a way that guarantees confidentiality. Carefully calibrated noise allows meaningful analysis of the aggregated data while maintaining the confidentiality of individual data points. Data Masking: In this approach, sensitive data are modified or transformed into less sensitive values. For example, instead of the exact ages of individuals, age ranges or categories can be used to obscure the exact values. Synthetic Data Generation: Synthetic data generation involves creating artificial data sets that mimic the statistical properties of the original data.
By generating synthetic data that are not directly linked to sensitive information, privacy can be maintained while enabling analysis. The choice of perturbation method depends on the specific requirements of the analysis and the sensitivity of the data. It is also important to comply with legal and ethical rules, such as obtaining appropriate consent when working with sensitive information and adhering to data protection regulations. This study proposes a privacy-preserving approach to security prediction analysis that applies perturbation methods to multivariate sensitive data. The goal is to protect the confidentiality and integrity of the data while maintaining the usefulness of the information for security prediction. Before classification, outliers and missing values in the data sets were replaced using different data mining methods. Numerical values in the data sets were pre-processed with normalization methods, and categorical values with one-hot and dummy-variable encoding. The four data sets were split into training and test sets by cross-validation. Accuracy values for the original data sets were computed with logistic regression (LR), k-nearest neighbors (KNN), neural networks (NN), support vector machines (SVM), gradient-boosted decision trees (XGBoost), and light gradient boosting machines (LightGBM). Accuracy values and F1 scores were then computed after first rotating the two numerical quasi-identifier variables in the data sets by 110 degrees and then adding Gaussian noise based on their means and standard deviations. The classification outputs were compared using the computed Friedman ranking values, the best classification method was identified for the four data sets, and the performance metrics were interpreted.
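The rotation-then-noise step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `rotate_pair` and `perturb`, the `noise_scale` parameter, and the fixed seed are hypothetical, and the abstract does not state how the noise is calibrated. The sketch assumes zero-mean Gaussian noise whose standard deviation is a fraction of each rotated column's standard deviation.

```python
import math
import random
import statistics


def rotate_pair(x, y, degrees=110.0):
    """Rotate the point (x, y) counter-clockwise about the origin."""
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return (x * cos_t - y * sin_t, x * sin_t + y * cos_t)


def perturb(col_a, col_b, degrees=110.0, noise_scale=0.1, seed=0):
    """Rotate two numeric columns jointly, then add zero-mean Gaussian noise.

    Assumption (not from the thesis): the noise standard deviation is
    noise_scale times the standard deviation of each rotated column.
    """
    rng = random.Random(seed)
    rotated = [rotate_pair(a, b, degrees) for a, b in zip(col_a, col_b)]
    rot_a = [p[0] for p in rotated]
    rot_b = [p[1] for p in rotated]
    sd_a = statistics.pstdev(rot_a)
    sd_b = statistics.pstdev(rot_b)
    noisy_a = [v + rng.gauss(0.0, noise_scale * sd_a) for v in rot_a]
    noisy_b = [v + rng.gauss(0.0, noise_scale * sd_b) for v in rot_b]
    return noisy_a, noisy_b


# Made-up, already normalized values for two quasi-identifier columns.
ages = [0.10, 0.35, 0.60, 0.90]
incomes = [0.20, 0.15, 0.70, 0.55]
masked_ages, masked_incomes = perturb(ages, incomes)
```

A pure rotation preserves the Euclidean distances between records, which is why distance-based classifiers such as KNN can be expected to lose little accuracy from that step alone; the added noise is what trades some accuracy for additional privacy.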
In the study, geometric perturbation methods were used to protect the confidentiality of numerical data, and classification methods were used to analyze the amount of information loss caused by this change. The main contribution of this study is a proposed framework for privacy-preserving security prediction analysis. The framework outlines the steps involved in applying perturbation methods to multivariate sensitive data, including data collection, perturbation techniques, analysis methodologies, and the assessment of privacy protection. It also provides a comparative analysis of different perturbation methods, assessing their effectiveness in protecting privacy while maintaining the usefulness of the data for security prediction. It discusses the trade-offs between privacy and data utility and provides insights into choosing the most appropriate perturbation method for the specific requirements of an analysis. Privacy concerns surrounding sensitive data have led to the development of perturbation methods that protect individual privacy while allowing meaningful analysis. This study aims to contribute to the field of security prediction analysis by recommending and evaluating perturbation techniques tailored to multivariate sensitive data. The contributions of this research lie in the following areas: New Perturbation Techniques: Perturbation methods designed to address the particular challenges of multivariate security prediction analysis are discussed. These techniques add controlled randomness, apply data masking approaches, offer differential privacy guarantees, and use synthetic data generation to ensure the confidentiality of sensitive data. Privacy-Utility Trade-off Assessment: The study includes a comprehensive evaluation of the privacy-utility trade-offs associated with various perturbation methods.
By quantitatively assessing the impact of each technique on both privacy protection and data utility, researchers and practitioners can make informed decisions about the most appropriate method for their specific security prediction analysis needs. Practical Implementation Guidelines: Acknowledging the practical considerations of applying perturbation methods in real-world security forecasting systems, the study provides guidelines and best practices for applying perturbation techniques appropriately. These cover data collection, perturbation algorithms, analysis methodologies, and the evaluation of privacy protection, so that practitioners can comply with legal and ethical standards while making accurate security predictions. Comparative Analysis: A comparative analysis of existing perturbation methods applied to multivariate sensitive data is conducted in the context of security prediction analysis. Comparing the performance, strengths, and limitations of these methods gives researchers insight into their effectiveness and suitability for different scenarios and facilitates the selection of appropriate techniques for specific applications. Future Research Directions: Directions for future work include exploring privacy-preserving techniques for security prediction analysis, artificial intelligence techniques and machine learning algorithms that protect data privacy, the challenges of evolving privacy regulations, and the impact of perturbation techniques on different security prediction models. Together, these contributions advance privacy-preserving security prediction analysis using perturbation methods on multivariate sensitive data. In data security settings involving sensitive information, researchers and practitioners can analyze data while protecting individual privacy, thereby encouraging secure and responsible data-driven decision-making.
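The Friedman ranking used in the abstract to compare classifiers across data sets can be sketched as follows. This is an illustration under stated assumptions, not the thesis code: the helper names `rank_row` and `friedman` are hypothetical, higher scores are assumed to be better, tied scores share their mean rank, and the statistic is the standard Friedman chi-square computed from average ranks.

```python
def rank_row(scores):
    """Rank the scores of one data set: rank 1 is the highest score.

    Tied scores receive the mean of the ranks they span.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks


def friedman(score_table):
    """Return (average rank per method, Friedman chi-square statistic).

    score_table[d][m] is the score of method m on data set d.
    """
    n = len(score_table)     # number of data sets
    k = len(score_table[0])  # number of methods
    rank_rows = [rank_row(row) for row in score_table]
    avg = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (sum(r * r for r in avg) - k * (k + 1) ** 2 / 4)
    return avg, chi2
```

In a setting like the one described, `score_table` would hold one row per data set and one column per classifier, computed separately for the original and the perturbed data; the method with the lowest average rank is the best performer overall.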
dc.format.extent	xxiv, 95 leaves : figures, tables ; 30 cm.
dc.language	Turkish
dc.language.iso	tur
dc.publisher	Sakarya Üniversitesi
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/
dc.rights.uri	info:eu-repo/semantics/openAccess
dc.subject	Industrial and Industrial Engineering
dc.title	Pertübasyon yöntemi ile hassas veri güvenliğine yönelik çok değişkenli veriler için tahmin analizi = Prediction analysis for multivariate data with respect to sensitive data security using the perturbation method
dc.type	masterThesis
dc.contributor.department	Sakarya University, Institute of Science and Technology, Department of Industrial Engineering, Division of Engineering Management
dc.contributor.author	İlter, İlker
dc.relation.publicationcategory	TEZ