dc.description.abstract |
In the financial world, both the number of suspicious transactions and the variety of methods behind them are continuously increasing. Methods such as identity theft, check fraud, ATM fraud, online banking fraud, online shopping fraud, and credit card fraud contribute to this rise. This study focuses specifically on credit card fraud. Credit card fraud covers credit card transactions carried out through unauthorized or deceptive means. These activities include the theft of credit card information, the unauthorized creation of credit cards, the use of second-hand credit cards, and social engineering methods. As the number of credit card transactions grows every day, many banks and organizations are developing, integrating, and continually improving systems to detect and prevent fraud. Systems developed to counteract suspicious transactions fall into two main types: suspicious transaction detection systems and suspicious transaction prevention systems. Detection systems focus on identifying both past and ongoing suspicious transactions. Prevention systems, on the other hand, aim not only to detect these transactions but also to block them in advance. While detection systems serve as a security layer, prevention systems function like a firewall, both identifying and proactively thwarting suspicious activity. Although various methods, such as machine learning, analytics, data mining, behavior analysis, and artificial intelligence, have recently been proposed for suspicious transaction prevention systems, many banks and organizations still prefer rule-based systems. 
In rule-based fraud detection and prevention systems, incidents are analyzed and authorized personnel write rules that determine whether a transaction is considered suspicious. Transactions can then be blocked according to these predefined rules. In addition, post-analysis can reveal previously unnoticed methods, and the rule set can be enriched accordingly. The study also presents an example model of a rule-based suspicious transaction prevention system. As their database technology, these systems use relational databases, document-based databases, and the recently popular graph databases. Because graph databases focus on relationships, they offer an advantage in revealing hidden networks in suspicious transaction analyses. In relational databases, entities are stored in tables as rows and columns, and results are obtained by combining tables with 'join' operations at query time. In graph databases, by contrast, entities are stored as nodes that are physically connected by edges, so no join operation is needed at query time: the desired relationships are already stored as physical connections between nodes. Because a query follows these stored relationships instead of joining whole tables, data size is much less of an obstacle, which makes graph databases effective in handling large datasets. Consequently, they have come to be preferred in social networks, behavior analysis, and suspicious transaction detection and prevention systems. One of the most popular graph databases is Neo4j. Its user-friendly interface helps users run analyses and interpret the results. Neo4j applications use the Cypher query language, which was developed by the same team. Designed specifically for graph databases, Cypher allows complex relationships to be queried with shorter and more readable queries. 
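The contrast between join-based lookups and relationship traversal described above can be illustrated with a minimal Python sketch. It is not Neo4j or Cypher code, and the toy records, dictionary names, and function names are hypothetical, introduced only for illustration. The first function scans a flat transaction table, as a relational query would; the second follows adjacency sets that were built once, which is roughly how a graph store answers a two-hop question.

```python
from collections import defaultdict

# Hypothetical toy records (customer, merchant, is_fraud); not taken from BankSim.
transactions = [
    ("C1", "M1", False),
    ("C2", "M1", True),
    ("C3", "M2", False),
    ("C1", "M2", False),
]


def merchants_table_scan(customer):
    """Relational-style lookup: scan every row of the table for matches."""
    return sorted({m for c, m, _ in transactions if c == customer})


# Graph-style storage: each node keeps direct references to its neighbours.
customer_to_merchants = defaultdict(set)
merchant_to_customers = defaultdict(set)
for c, m, _ in transactions:
    customer_to_merchants[c].add(m)
    merchant_to_customers[m].add(c)


def co_customers(customer):
    """Two-hop traversal: customer -> merchants -> other customers."""
    found = set()
    for m in customer_to_merchants[customer]:
        found |= merchant_to_customers[m]
    found.discard(customer)  # exclude the starting customer
    return sorted(found)


print(merchants_table_scan("C1"))
print(co_customers("C1"))
```

In the traversal version, the work done depends on how many relationships the starting node has, not on the total number of rows, which is the property the abstract attributes to graph databases.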
This study uses the BankSim dataset, obtained from Kaggle, which contains credit card transaction data. BankSim is generated synthetically from fraud models of customer payments at a bank and is shared for academic research. To ensure data security, it contains no real personal information. The dataset includes approximately 600,000 records, of which 7,200 are suspicious transactions. Its fields are: step, customer, customer age, customer gender, customer zipcode, merchant, merchant zipcode, transaction category, transaction amount, and a flag indicating whether the transaction is suspicious. Given the large volume of data, the checks performed here do not require the exact transaction amount. Instead, dividing transaction amounts into well-defined groups makes the model and the query outputs more meaningful. However, the transaction amounts in the dataset are concentrated within a specific range. When asymmetric, skewed data are grouped, the aim is to minimize this skewness by applying mathematical formulas. For the BankSim dataset, Doane's Rule and the Jenks Natural Breaks method indicated that the amounts should be divided into 21 groups. The group intervals were then set so that transactions are distributed across the groups in a way that minimizes skewness, even if the groups do not contain equal numbers of transactions. These amount groups were added to the original BankSim dataset as a new column called "Amount Group." Based on this dataset, a model is proposed for use in graph databases. In the model, merchants (workplaces), customers, transactions, categories, and amount groups are represented as separate nodes. Relationships between these nodes are also defined so that the model accurately represents the dataset. 
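The amount-grouping step described above can be sketched in Python. Doane's Rule estimates a bin count for skewed data as k = 1 + log2(n) + log2(1 + |g1| / sigma_g1), where g1 is the sample skewness and sigma_g1 = sqrt(6(n - 2) / ((n + 1)(n + 3))). The sketch below is a minimal illustration under stated assumptions, not the study's implementation: rounding up with `ceil`, the function names, and the example break values are all assumptions, and the Jenks Natural Breaks computation that would produce real break points is not shown.

```python
import bisect
import math


def doane_bin_count(values):
    """Estimate a bin count with Doane's Rule (rounded up; rounding is an assumption)."""
    n = len(values)
    if n < 3:
        raise ValueError("Doane's Rule needs at least 3 observations")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        return 1  # all values identical: one group is enough
    g1 = m3 / m2 ** 1.5  # sample skewness
    sigma_g1 = math.sqrt(6 * (n - 2) / ((n + 1) * (n + 3)))
    k = 1 + math.log2(n) + math.log2(1 + abs(g1) / sigma_g1)
    return math.ceil(k)


def amount_group(amount, upper_bounds):
    """Return the index of the group whose upper bound first covers the amount.

    upper_bounds must be sorted; amounts above the last bound fall into
    one extra, open-ended group.
    """
    return bisect.bisect_left(upper_bounds, amount)


# Hypothetical upper bounds for illustration only; not the study's intervals.
example_bounds = [10, 50, 200]
print(doane_bin_count([1, 1, 1, 1, 1, 1, 1, 100]))
print(amount_group(35.0, example_bounds))
```

The skewness term is what distinguishes Doane's Rule from Sturges' formula: for symmetric data it vanishes, and for heavily skewed data such as transaction amounts it adds extra groups.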
Other fields in the dataset are represented directly as properties of the nodes in the model. After these operations, the dataset, in CSV format, was imported into the Neo4j Desktop application using the Cypher language. Indexes were first defined on the merchant (workplace), customer, category, and amount group nodes, since queries would be run against these nodes, to ensure better performance. Subsequently, the data nodes and relationship definitions were imported with Cypher according to the proposed model. The code blocks for these steps are given in detail in the study. Once the dataset had been transferred to the database in line with the proposed model, various queries were executed in Neo4j. The query results were analyzed for their potential use as rules in credit card fraud prevention systems, and rule functions were presented based on the outputs. In conclusion, a graph database model built in Neo4j is proposed for examining suspicious transactions in rule-based credit card fraud detection and prevention systems. The model uses the BankSim dataset. The Cypher commands executed in Neo4j generated sample outputs for transaction reviews in the rule-based system. Additionally, new rules are suggested based on the analysis results. The aim is to demonstrate the method and benefits of using graph databases as an alternative to relational databases in such systems. |
|