Performance evaluation of decision tree classification algorithms using fraud datasets

Eddie Bouy B. Palad, Mary Jane F. Burden, Christian Ray Dela Torre, Rachelle Bea C. Uy

Abstract


Text mining is one way of extracting knowledge and uncovering hidden relationships among data using artificial intelligence methods. Previous research has highlighted the advantages of different techniques; however, the lack of literature focusing on cybercrimes suggests that data mining is underutilized in facilitating cybercrime investigations in the Philippines. As a continuation of a prior study, this study therefore classifies computer fraud or online scam data drawn from police incident reports and narratives of scam victims. The dataset consists of unstructured text totaling 49,822 words, mainly in Filipino. Five decision tree algorithms, namely J48, Hoeffding Tree, Decision Stump, REPTree, and Random Forest, were employed and compared in terms of their performance and prediction accuracy. The results show that J48 achieves the highest accuracy and the lowest error rate among the classifiers compared. The results were validated by police investigators, who likewise preferred J48 as a potential tool for cybercrime investigations. This indicates the importance of text mining to cybercrime investigation in the country. Future work can use different and more inclusive cybercrime datasets and other classification techniques in Weka or any other data mining tool.


Keywords


Classification; Cybercrime; Decision trees; Online scam; Text mining



DOI: https://doi.org/10.11591/eei.v9i6.2630




