Data Cleaning

Data cleaning, also known as data munging or data wrangling, refers to the steps taken to organize one or more datasets so that the data are more consistent and easier for a computer to read. This may include standardizing entries, correcting misspellings, removing duplicates, modifying column headers, changing column formats, and/or converting file formats.
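
As a rough illustration of a few of these steps, the sketch below uses only Python's standard csv module to standardize column headers, trim and normalize entries, and remove duplicate rows. The sample data and the function names (clean_rows, to_csv) are made up for this example, and the title-case rule is only one possible way to standardize entries; a real dataset will likely need its own rules.

```python
import csv
import io

# Hypothetical raw data: inconsistent headers, stray whitespace,
# mixed capitalization, and a duplicate row.
RAW = """First Name , CITY
alice,  new york
Bob,Boston
alice,New York
"""


def clean_rows(text):
    """Return cleaned rows as a list of dicts with standardized headers."""
    reader = csv.reader(io.StringIO(text))
    # Standardize column headers: trim, lowercase, spaces to underscores.
    headers = [h.strip().lower().replace(" ", "_") for h in next(reader)]
    seen = set()
    cleaned = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Standardize entries: trim whitespace and use title case.
        values = tuple(v.strip().title() for v in row)
        if values in seen:
            continue  # drop rows that duplicate an earlier cleaned row
        seen.add(values)
        cleaned.append(dict(zip(headers, values)))
    return cleaned


def to_csv(rows):
    """Write the cleaned rows back out as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(clean_rows(RAW)))
```

Note that duplicates are detected only after the entries have been standardized, so "alice,  new york" and "alice,New York" are treated as the same record. Comparing the raw text instead would miss that match.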

Numerous software tools and techniques are available for data cleaning:

  • Free or low-cost text editors can be used; search online to find one that works with your operating system.
  • Microsoft Excel can also be useful, especially if you take advantage of its built-in text functions.
  • One especially handy tool is OpenRefine.

Data services librarians can help with data cleaning tasks. Contact us to make an appointment.