Journal of Systems Integration, Vol 5, No 1 (2014)

Dealing with Missing Values in Data

Jiri Kaiser

Abstract


Many existing industrial and research data sets contain missing values for various reasons, such as manual data entry procedures, equipment errors, and incorrect measurements. The problems associated with missing values are loss of efficiency, complications in handling and analyzing the data, and bias resulting from differences between missing and complete data. An important factor in selecting an approach to missing values is the missing data mechanism. There are various strategies for dealing with missing values. Some analytical methods have their own way of handling missing values. Reducing the data set is another option. Finally, the missing values problem can be handled by imputation. This paper presents simple imputation methods, such as using the most common value, the mean or median, and the closest fit approach, as well as methods based on data mining algorithms such as k-nearest neighbor, neural networks, and association rules. It discusses their usability and illustrates issues with their applicability on examples.
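
As an illustration of the simplest imputation methods named in the abstract (most common value, mean, median), the following is a minimal Python sketch. It is not taken from the paper: the function name `impute`, the strategy labels, the use of `None` to mark a missing value, and the sample data are all assumptions made for this example.

```python
from collections import Counter
from statistics import mean, median


def impute(values, strategy="mean"):
    """Return a copy of `values` with None entries replaced by one fill value.

    Hypothetical helper for illustration only; None marks a missing value.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")

    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    elif strategy == "most_common":
        # Suitable for categorical attributes; ties go to the first value seen.
        fill = Counter(observed).most_common(1)[0][0]
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    return [fill if v is None else v for v in values]


# Made-up sample data: one numeric and one categorical attribute.
ages = [20, None, 30, 100]
colors = ["red", None, "blue", "red"]

print(impute(ages, "mean"))
print(impute(ages, "median"))
print(impute(colors, "most_common"))
```

In this sample the outlier 100 pulls the mean well above the median, so the two strategies fill the same gap with noticeably different values. This is one reason the choice of imputation method matters.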

DOI: http://dx.doi.org/10.20470/jsi.v5i1.178

ISSN: 1804-2724

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 Czech Republic License.