Effect of Data Normalization on the Performance of Classification Algorithms

Muhammad Rehan Abbas, Malik Sajjad Ahmed Nadeem and Syed Rizwan Abbas

Data normalization is an elementary preprocessing step applied to data before it is fed to a machine learning classifier. We conducted an empirical study aimed at improving the performance of machine learning classifiers by applying data normalization during the learning phase of the classification algorithms. Three data normalization techniques, Decimal Scaling (DS), Min-Max (MM), and Z-Score (ZS), were selected along with five machine learning classifiers: Support Vector Machine with linear kernel (SVML), Support Vector Machine with radial kernel (SVMR), Linear Discriminant Analysis (LDA), Random Forest (RF), and K-Nearest Neighbors (kNN). The investigation was carried out on five publicly available clinical cancer datasets. Classifier performance was evaluated using three measures: prediction accuracy, Mean Squared Error (MSE), and Improved Squared Error (ISE). The learning algorithms were then compared after each normalization technique had been applied.

Keywords: Transformation, classification, data normalization, SVM, LDA, kNN, Random Forest
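
As an illustration of the three normalization techniques named in the abstract, the following is a minimal sketch using their common textbook definitions, applied to a single numeric feature. It is not the authors' implementation. The function names, the use of the population standard deviation for Z-Score, and the handling of constant features are assumptions made for this example.

```python
import statistics


def decimal_scaling(values):
    """Decimal Scaling (DS): divide by 10**j, where j is the smallest
    non-negative integer such that every scaled |value| is below 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10**j >= 1:
        j += 1
    return [v / 10**j for v in values]


def min_max(values):
    """Min-Max (MM): rescale linearly so the feature spans [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature is mapped to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Z-Score (ZS): subtract the mean and divide by the standard deviation."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation; the sample version is also common.
    std = statistics.pstdev(values)
    if std == 0:
        # Assumption: a constant feature is mapped to all zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    feature = [120.0, -450.0, 980.0, 35.0]  # made-up values, not from the study
    print("DS:", decimal_scaling(feature))
    print("MM:", min_max(feature))
    print("ZS:", z_score(feature))
```

In a classification experiment, the scaling parameters (the exponent j, the minimum and maximum, or the mean and standard deviation) would normally be computed from the training data and then reused unchanged on the test data.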