FEATURE EXTRACTION AND CLASSIFICATION FOR SPAM DETECTION USING SUPERVISED ML
Keywords:
Spam Detection, Supervised Machine Learning, Feature Extraction, Text Classification, Naïve Bayes, Support Vector Machine (SVM), TF-IDF, Bag-of-Words, Email Filtering, Data Mining
Abstract
The objective of this research is to improve the effectiveness and precision of spam detection in digital communication systems by applying supervised machine learning to feature extraction and classification. This paper processes large email and text-message datasets for feature extraction using preprocessing techniques such as tokenization, stop-word removal, stemming, and vectorization. Several supervised machine learning techniques, including Naïve Bayes, Support Vector Machine (SVM), Decision Tree, Random Forest, and Logistic Regression, are compared on their ability to distinguish spam from legitimate messages. Model performance is measured with accuracy, precision, recall, and F1-score. The paper stresses the importance of effective feature extraction methods such as TF-IDF and Bag-of-Words for improving classification accuracy and reducing false positives. The proposed method supports trustworthy and secure communication through an adaptive spam detection framework that adjusts to changes in spam trends.
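The pipeline outlined in the abstract can be illustrated with a minimal sketch, assuming a toy dataset and a tiny stop-word list that are invented here and do not come from the paper. It uses only the Python standard library and covers tokenization, stop-word removal, TF-IDF weighting, a multinomial Naïve Bayes classifier over Bag-of-Words counts, and precision, recall, and F1-score. Stemming and the other classifiers named in the abstract are omitted for brevity, and all function names (`tokenize`, `tfidf`, `train_nb`, `predict_nb`, `precision_recall_f1`) are hypothetical, not the authors' implementation.

```python
import math
import re
from collections import Counter

# Hypothetical toy data, invented for illustration; not from the paper.
TRAIN = [
    ("win a free prize now", "spam"),
    ("free cash offer click now", "spam"),
    ("claim your free prize today", "spam"),
    ("meeting moved to monday morning", "ham"),
    ("please review the attached report", "ham"),
    ("lunch on monday with the team", "ham"),
]
# Tiny illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "the", "to", "on", "your", "with", "please"}


def tokenize(text):
    """Lowercase, split on non-letters, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def train_nb(samples):
    """Collect per-class word counts and document counts (Bag-of-Words)."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in samples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def predict_nb(model, text):
    """Pick the class with the highest log-posterior, with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


def precision_recall_f1(y_true, y_pred, positive="spam"):
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    model = train_nb(TRAIN)
    for message in ["free prize now", "monday meeting report"]:
        print(message, "->", predict_nb(model, message))
```

In this sketch a term that occurs in many training messages, such as "free", receives a lower TF-IDF weight than a term confined to one message, such as "win". That is the property that makes TF-IDF more discriminative than raw Bag-of-Words counts. A real evaluation would use a held-out test set and a full stop-word list rather than the six toy messages above.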
License
Copyright (c) 2026 Advanced Research & Development Journal

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
All articles published in the Journal of Engineering Excellence (JEE) are licensed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0).
Under this license, authors retain full copyright of their work while granting permission for anyone to read, download, copy, distribute, print, search, or link to the full texts of the articles, or use them for any other lawful purpose, without asking prior permission from the publisher or author — provided that the original work is properly cited.
This open-access license ensures maximum dissemination and impact of the published research by allowing free and immediate access to scholarly work.
For more details, please refer to the official license page:
https://creativecommons.org/licenses/by/4.0/
