Kural tabanlı şüpheli işlem önleme sistemlerinde kullanılmak üzere çizge veritabanı modeli önerisi = A graph database model proposal for use in rule based fraud transaction prevention systems

Demir, Bahadır Esad

URI: https://hdl.handle.net/20.500.12619/102995

Date: 2024

Abstract:

The methods and the number of suspicious transactions in the financial world are increasing every day, and suspicious credit card transactions are one such method. Credit card fraud refers to fraudulent activity in which unauthorized or deceptive methods are used in credit card transactions and someone attempts to gain financially by using another person's credit card information. As the number of credit card transactions grows, many banks and organizations use systems to detect and prevent fraud cases and work on improving those systems. Although many methods based on machine learning algorithms have recently been proposed for this purpose, rule-based systems are still preferred. In rule-based fraud detection and prevention systems, cases are analyzed at or after transaction time, and rules are written, depending on whether a transaction is suspicious, to support review by authorized staff and to block transactions when necessary. Analyzing these data afterwards makes it possible to detect fraudulent transactions and recurring suspicious transaction patterns and to enrich the rules accordingly. Rule-based systems use different database technologies, for example relational databases and document-based databases. What matters most in these systems is the ability to detect relationship networks that are not visible and not structurally specified. Graph databases are also used in this field as an alternative to other database technologies. This study first describes what counts as a suspicious transaction and the types of suspicious transactions, and outlines the characteristics of suspicious transaction detection and prevention systems. It then explains the properties of graph databases and how they differ from other database technologies.
Examples of graph database applications currently used in industry are given, and Neo4j, the application used in this study, is described in detail. A credit card fraud dataset shared on Kaggle, synthetically generated from a sample of a bank's credit card transactions, is used and described; before loading into the Neo4j model, the amount field was converted into interval-based amount groups to aid analysis. A model is proposed for this dataset, and the data were imported into Neo4j according to this model. Finally, sample queries were run on Neo4j using the proposed model. Analysis of the query outputs shows that an additional metric and rule can be offered for rule writing in rule-based systems.

In the financial world, both the methods and the number of suspicious transactions are rising continuously. Identity theft, check fraud, ATM fraud, online banking fraud, online shopping fraud, and credit card fraud all contribute to this rise. This study focuses specifically on credit card fraud. Credit card fraud covers fraudulent activity in credit card transactions that relies on unauthorized or deceptive methods, including theft of credit card information, unauthorized creation of credit cards, use of second-hand credit cards, and social engineering. As the number of credit card transactions grows every day, many banks and organizations develop, integrate, and continue to improve systems that detect and prevent fraud. Systems built to counter suspicious transactions fall into two main types: detection systems and prevention systems. Detection systems focus on identifying past and ongoing suspicious transactions. Prevention systems also aim to detect such transactions, but additionally block them in advance. Detection systems thus serve as a security layer, whereas prevention systems act as a firewall that both identifies and proactively stops suspicious activity. Although methods based on machine learning, analytics, data mining, behavior analysis, and artificial intelligence have recently been proposed for suspicious transaction prevention, many banks and organizations still prefer rule-based systems.
In rule-based fraud detection and prevention systems, incidents are analyzed and authorized personnel write rules according to whether a transaction is considered suspicious. Transactions can then be blocked on the basis of these predefined rules. In addition, post-analysis can reveal previously unnoticed methods, and the rules can be enriched accordingly. The study also presents an example model of a rule-based suspicious transaction prevention system. Such systems use relational databases, document-based databases, and, more recently, graph databases as their database technology. Because graph databases focus on relationships, they offer an advantage in revealing otherwise unseen networks during suspicious transaction analysis. In relational databases, entities are stored in tables as rows and columns, and results are obtained at query time by combining tables with 'join' operations. Graph databases instead store entities as nodes that are physically connected by edges, so no join is needed at query time: the desired relationships already exist as stored connections between nodes. Because of this structure, the study notes, data size does not pose a problem for graph databases, which makes them effective on large datasets. Consequently, they have come to be preferred in social networks, behavior analysis, and suspicious transaction detection and prevention systems. One of the most popular graph database applications is Neo4j, whose user-friendly interface helps users run analyses and interpret the results. Neo4j applications use the Cypher query language, developed by the same team. Designed specifically for graph databases, Cypher allows complex relationships to be queried with shorter and more readable queries.
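The contrast between join-based lookup and edge traversal described above can be illustrated with a toy in-memory sketch in Python. This is only an illustration of the idea, not how Neo4j or any relational engine is implemented, and all identifiers and data below are hypothetical.

```python
from collections import defaultdict

# Toy data: (customer, merchant) pairs. All identifiers are made up.
transactions = [
    ("C1", "M1"),
    ("C2", "M1"),
    ("C2", "M2"),
    ("C3", "M2"),
]


def customers_sharing_merchant_join(rows, customer):
    """Relational style: self-join the transaction rows on merchant."""
    return sorted({
        other
        for cust, merch in rows if cust == customer
        for other, other_merch in rows
        if other_merch == merch and other != customer
    })


def build_adjacency(rows):
    """Graph style: record each node's neighbours once, at load time."""
    adjacency = defaultdict(set)
    for cust, merch in rows:
        adjacency[cust].add(merch)
        adjacency[merch].add(cust)
    return adjacency


def customers_sharing_merchant_graph(adjacency, customer):
    """Follow edges customer -> merchant -> customer without rescanning rows."""
    return sorted({
        other
        for merch in adjacency[customer]
        for other in adjacency[merch]
        if other != customer
    })


adjacency = build_adjacency(transactions)
```

Both functions answer the same question (which customers used a merchant that a given customer also used), but the graph version only visits the neighbours of the starting node instead of scanning every row for each match.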
This study uses the BankSim dataset, obtained from Kaggle, which contains credit card transaction data. BankSim is generated synthetically from fraud models of a bank's customer payments and is shared for academic research. For data security, the dataset contains no real personal information. It includes approximately 600,000 records, of which 7,200 are suspicious transactions. Its fields are: step, customer, customer age, customer gender, customer zipcode, merchant, merchant zipcode, transaction category, transaction amount, and a flag indicating whether the transaction is suspicious. Given the volume of data, the checks performed here do not require the exact transaction amount. Instead, dividing transaction amounts into well-defined groups makes the model and the query outputs more meaningful. However, the transaction amounts in the dataset are concentrated within a specific range. When asymmetric, skewed data are grouped, the aim is to reduce this skewness by choosing the groups with suitable mathematical formulas. For the BankSim dataset, Doane's formula and the Jenks natural breaks method indicated that the amounts should be divided into 21 groups. The group intervals were then chosen so that transactions are distributed across the groups in a balanced way that minimizes skewness, even if the groups are not of equal size. These amount groups were added to the original BankSim dataset as a new column called "Amount Group." Based on this dataset, a model was proposed for use in graph databases. In the model, merchants (workplaces), customers, transactions, categories, and amount groups are represented as separate nodes. Relationships between these nodes are also specified so that the model conveys the dataset accurately.
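As a rough illustration of the grouping step, the sketch below computes a bin count with Doane's formula, k = 1 + log2(n) + log2(1 + |g1| / sigma_g1), where g1 is the sample skewness and sigma_g1 = sqrt(6(n - 2) / ((n + 1)(n + 3))). Rounding the result up is an assumption made here, the function names are hypothetical, and the break points passed to `amount_group` are placeholders rather than the thesis's actual intervals; in the study the intervals come from the Jenks natural breaks method, which is not implemented here.

```python
import math
from bisect import bisect_right


def doane_bin_count(values):
    """Suggest a number of bins for skewed data using Doane's formula."""
    n = len(values)
    if n < 3:
        raise ValueError("Doane's formula needs at least 3 values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 1  # all values identical: a single bin is enough
    g1 = m3 / m2 ** 1.5  # moment-based sample skewness
    sigma_g1 = math.sqrt(6 * (n - 2) / ((n + 1) * (n + 3)))
    k = 1 + math.log2(n) + math.log2(1 + abs(g1) / sigma_g1)
    return math.ceil(k)  # assumption: round up to a whole number of bins


def amount_group(amount, break_points):
    """Map an amount to a 1-based group index.

    break_points must be sorted ascending; group i covers
    [break_points[i-2], break_points[i-1]) and the last group is open-ended.
    """
    return bisect_right(break_points, amount) + 1


# Placeholder break points for illustration only, not the thesis's values.
example_breaks = [10.0, 50.0, 200.0]
```

With the placeholder break points, `amount_group(35.0, example_breaks)` falls in the second group, and any amount of 200 or more falls in the fourth, open-ended group. A more skewed sample yields a larger bin count from `doane_bin_count` than a symmetric sample of the same size.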
The remaining fields in the dataset are represented directly as properties of the nodes in the model. After these steps, the dataset was imported from CSV format into the Neo4j Desktop application using the Cypher language. Indexes were first defined for the merchant (workplace), customer, category, and amount group nodes, since queries would run against these nodes and the indexes improve performance. The data nodes and relationship definitions were then imported with Cypher according to the proposed model. The code blocks for these steps are given in the thesis. Once the dataset had been loaded into the database in line with the proposed model, various queries were executed in Neo4j. The query results were analyzed for their potential use as rules in credit card fraud prevention systems, and rule functions based on the outputs were presented. In conclusion, a graph database model built on the BankSim dataset has been proposed in Neo4j for examining suspicious transactions in rule-based credit card fraud detection and prevention systems. The Cypher commands executed in Neo4j produced sample outputs for transaction review in a rule-based system, and new rules were suggested on the basis of the analysis results. The aim is to demonstrate the method and benefits of using graph databases as an alternative to relational databases in such systems.
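The abstract does not state which metric or rules the thesis derives, so the following Python sketch is purely hypothetical. It assumes a query over the merchant, transaction, and amount group nodes has already returned historical transactions as dictionaries, computes the share of suspicious transactions for each (merchant, amount group) pair, and flags new transactions whose pair exceeds a threshold. The field names, the metric, and the 0.5 threshold are all assumptions made for illustration.

```python
from collections import Counter


def fraud_ratio_by_pair(history):
    """Share of suspicious transactions per (merchant, amount group) pair."""
    totals = Counter()
    frauds = Counter()
    for tx in history:
        key = (tx["merchant"], tx["amount_group"])
        totals[key] += 1
        if tx["fraud"]:
            frauds[key] += 1
    return {key: frauds[key] / totals[key] for key in totals}


def should_review(tx, ratios, threshold=0.5):
    """Hypothetical rule: flag a transaction whose pair has a high fraud share.

    Pairs never seen in the history default to a ratio of 0.0.
    """
    key = (tx["merchant"], tx["amount_group"])
    return ratios.get(key, 0.0) >= threshold


# Made-up history standing in for the output of a graph query.
history = [
    {"merchant": "M1", "amount_group": 3, "fraud": True},
    {"merchant": "M1", "amount_group": 3, "fraud": True},
    {"merchant": "M1", "amount_group": 3, "fraud": False},
    {"merchant": "M2", "amount_group": 1, "fraud": False},
]
ratios = fraud_ratio_by_pair(history)
```

In a graph model of the kind proposed, the per-pair counts would more naturally come from a single traversal query than from scanning a list, but the rule logic applied to the result would look similar.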

Description:

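The full text has been made openly accessible pursuant to the "Law on Amending the Higher Education Law and Certain Laws and Decree-Laws" (Yükseköğretim Kanunu İle Bazı Kanun Ve Kanun Hükmünde Kararnamelerde Değişiklik Yapılması Hakkında Kanun), published in Official Gazette No. 30352 dated 06.03.2018, and the "Directive on the Collection, Organization, and Opening to Access of Graduate Theses in Electronic Form" (Lisansüstü Tezlerin Elektronik Ortamda Toplanması, Düzenlenmesi ve Erişime Açılmasına İlişkin Yönerge) dated 18.06.2018.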
