Abstract:
According to the World Health Organization, the number of smokers reached 1.3 billion last year, and more than 8 million people died from smoking-related diseases. These numbers grow every year, and several countries have responded by banning smoking in open areas. Nevertheless, the number of smokers is not decreasing. Today, most smokers start at a young age because they see people smoking around them, in films, and in online videos. The main goal of this study is therefore to create a safe and educational digital experience using computer vision and deep learning techniques, reducing the negative impact of online videos on children and improving the accuracy and reliability of smoking-related object recognition. This will improve media content analysis and support anti-smoking campaigns. The rise of digital media and the internet has presented new challenges for tobacco control. Children and adolescents are frequently exposed to content that glorifies smoking, which can influence their attitudes towards tobacco use. There is a critical need for technologies that can automatically detect and analyze smoking-related media content to reduce these effects. The study examines modern computer vision technologies that automate the recognition of smoking-related objects in images and videos. Solutions such as automatic content filtering and age-appropriate recommendations are being developed to address problems such as exposure to harmful video content and excessive screen time. An important aspect of the work is the development and training of YOLOv5, YOLOv8 and YOLOv9 models for the cigarette detection task. YOLO is a popular object detection algorithm known for its high accuracy and speed.
YOLOv5 is an open-source object detection model that offers significant improvements in speed and accuracy over its predecessors. It is highly optimized for real-time detection, making it suitable for applications that require quick response times. Compared to previous versions, YOLOv8 is faster and more accurate and requires fewer parameters to achieve its performance. It uses novel techniques in feature extraction and bounding box regression to achieve higher detection accuracy. YOLOv9, published in February 2024, is one of the latest versions of the YOLO algorithm. It is distinguished from previous versions by its use of the Generalized Efficient Layer Aggregation Network (GELAN) and Programmable Gradient Information (PGI). This project creates and tunes a neural network that can accurately and quickly detect cigarettes in images under various lighting conditions and backgrounds. Cigarette recognition accuracy is critical for building reliable monitoring systems that can be used in real-world environments. We collected two datasets, one for the cigarette detection task and one for the ashtray detection task, containing 1200 images of people smoking and 368 images of ashtrays. Each image is annotated with bounding boxes that mark the location of the object of interest, such as the cigarette. The dataset covers a variety of scenes, lighting conditions and poses of people smoking, as well as backgrounds such as streets, cafes and houses, making it representative of different cigarette detection scenarios. First, models of different sizes of YOLOv5, YOLOv8 and YOLOv9 were trained. In this comparison, the YOLOv5 models achieved higher accuracy than YOLOv8 and YOLOv9, with a best result of 89%.
For YOLOv8, the highest accuracy was 87%, and for YOLOv9 it was 88%. We then trained the YOLOv5, YOLOv8 and YOLOv9 models on the prepared dataset with hyperparameter optimization to fine-tune their performance. Models were evaluated on accuracy, precision, recall and mAP. Because hyperparameter optimization plays a critical role in the performance of deep learning models, the hyperparameters of the three models were tuned with several optimizers: SGD, Adam and AdamW. The SGD optimizer combined with the YOLOv9 model gave the best results. Plots of the hyperparameter setting distributions showed the superiority of this combination in accuracy and convergence speed, making it the preferred choice for our applications. With SGD, the performance of YOLOv5 remained unchanged at 89%, while that of YOLOv8 and YOLOv9 increased from 87% and 88%, respectively, to 93%. Although YOLOv8 and YOLOv9 have the same mAP(50) value after optimization with SGD, YOLOv9 outperforms YOLOv8 by 2% in the mAP(50-95) metric. An important aspect of our study was the inclusion of ashtray detection. An ashtray is an important indicator of smoking, and recognizing it significantly increases the accuracy of detecting smoking incidents. We developed approaches to ashtray detection that were successfully integrated into our models, improving the overall results and making the system more versatile.
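The mAP(50) and mAP(50-95) metrics reported above both rest on the intersection over union (IoU) between a predicted and an annotated bounding box. The following is a minimal sketch of that building block, not the evaluation code used in the study: the box format `(x_min, y_min, x_max, y_max)` and the helper names `iou` and `map_iou_thresholds` are assumptions made for illustration. It shows why a model can match another at mAP(50) yet differ at mAP(50-95): the second metric also counts how well boxes hold up at stricter IoU thresholds.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def map_iou_thresholds() -> List[float]:
    """IoU thresholds averaged by mAP(50-95): 0.50 to 0.95 in steps of 0.05."""
    return [round(0.50 + 0.05 * i, 2) for i in range(10)]


# Hypothetical annotated cigarette box and a slightly shifted prediction.
truth: Box = (0.0, 0.0, 10.0, 10.0)
pred: Box = (2.0, 0.0, 12.0, 10.0)

score = iou(pred, truth)
# A prediction counts as a true positive at a threshold when IoU >= threshold.
hits = [t for t in map_iou_thresholds() if score >= t]
print(f"IoU = {score:.3f}; counted as a match at thresholds {hits}")
```

In this example the prediction is a match at the single threshold used by mAP(50) but fails the stricter thresholds from 0.70 upward, so it would lower mAP(50-95) without affecting mAP(50).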
In addition, a special algorithm for detecting cigarettes and ashtrays was developed and tested, reaching an accuracy of 96%. It successfully recognized cigarettes and ashtrays, minimized the number of false positives, and outperformed the initial results of the YOLOv5, YOLOv8 and YOLOv9 models. The results of our study show that state-of-the-art deep learning models combined with hyperparameter optimization can significantly increase the efficiency and accuracy of cigarette and ashtray detection in images. The developed models and custom algorithm provide effective tools for monitoring, analyzing and preventing the harmful effects of tobacco on public health, and thereby contribute to the fight against tobacco use. Integrating these technologies into digital media platforms can play a crucial role in reducing the exposure of vulnerable populations, particularly children, to smoking-related content. The promising results of this study pave the way for further research and development in this field, with the potential to make a significant impact on public health and safety.