Abstract:
As web software grows more complex, defect detection and prevention have become crucial processes in the software industry. Over the past decades, defect prediction research has reported encouraging results for reducing software product costs. Despite these promising results, such research has rarely been applied to web-based systems using clustering algorithms. An appropriate use of clustering in defect prediction may make it easier to estimate defects in web page source code. One widely used clustering algorithm is k-means, whose derived versions, such as k-means++, perform well on large data sets. Here, we present a new defect clustering method using k-means++ for web page source code. According to the experimental results, almost half of the defects are detected in the middle of web pages. k-means++ is significantly better than the other four clustering algorithms on three criteria across four data sets. We also tested our method with four classifiers, and the results show that, after clustering, Linear Discriminant Analysis is, in general, better than the other three classifiers. © 2015 Elsevier Ltd. All rights reserved.
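To make the clustering step concrete, the following is a minimal sketch of k-means++ seeding followed by standard k-means (Lloyd) iterations, written with the Python standard library only. It is not the authors' implementation. The feature used here (the relative position of a defect within a page, from 0.0 at the top to 1.0 at the bottom), the sample values, the choice of k = 3, and the function names `sq_dist`, `kmeanspp_seeds`, and `kmeans` are all assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_seeds(points, k, rng):
    """k-means++ seeding: the first center is drawn uniformly; each later
    center is drawn with probability proportional to the squared distance
    from a point to its nearest already-chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def kmeans(points, k, iters=50, seed=0):
    """Lloyd iterations started from k-means++ seeds."""
    rng = random.Random(seed)
    centers = kmeanspp_seeds(points, k, rng)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                # Move the center to the mean of its members.
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

# Hypothetical data: relative positions of defects within a page.
points = [(0.05,), (0.10,), (0.12,), (0.45,), (0.50,),
          (0.52,), (0.55,), (0.90,), (0.95,)]
centers, labels = kmeans(points, 3)
print(centers, labels)
```

The seeding step spreads the initial centers apart, which is the property that distinguishes k-means++ from plain k-means with uniformly random initialization. The result can still be a local optimum, so the grouping depends on the seed.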