Dirty Data

Dirty data, also known as rogue data,[1] is inaccurate, incomplete or erroneous data, especially in a computer system or database.[2]

Dirty data can contain such mistakes as spelling or punctuation errors, incorrect data associated with a field, incomplete or outdated data, or even data that has been duplicated in the database. It can be cleaned through a process known as data cleansing.[3]
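
As a rough illustration of what a cleansing pass might look like, the sketch below handles three of the problems listed above: stray whitespace and inconsistent capitalization, incomplete records, and duplicates. The `name` and `email` fields, the `REQUIRED_FIELDS` tuple, and the `clean_records` function are hypothetical names chosen for this example, and the rules shown are assumptions rather than a standard procedure.

```python
from typing import Dict, Iterable, List

# Hypothetical schema, used only for this illustration.
REQUIRED_FIELDS = ("name", "email")


def clean_records(records: Iterable[Dict]) -> List[Dict]:
    """Return cleaned copies of records, skipping incomplete and duplicate rows."""
    cleaned = []
    seen = set()
    for record in records:
        # Normalize text: trim the ends and collapse internal runs of whitespace.
        # Missing values (None) become empty strings.
        row = {
            key: " ".join(str(value).split()) if value is not None else ""
            for key, value in record.items()
        }
        # Assumption: e-mail addresses are compared case-insensitively.
        if "email" in row:
            row["email"] = row["email"].lower()
        # Incomplete data: drop rows with an empty or absent required field.
        if any(not row.get(field) for field in REQUIRED_FIELDS):
            continue
        # Duplicated data: keep only the first row for each (name, email) pair.
        key = (row["name"].lower(), row["email"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


raw = [
    {"name": "  Ada  Lovelace ", "email": "ADA@example.com"},
    {"name": "Ada Lovelace", "email": "ada@example.com"},  # duplicate once normalized
    {"name": "Alan Turing", "email": None},                # incomplete
    {"name": "Grace Hopper", "email": " grace@example.com "},
]
result = clean_records(raw)
```

Here `result` holds two records, one for Ada Lovelace and one for Grace Hopper. Real cleansing pipelines usually also validate values against reference data and flag outdated entries, which this sketch does not attempt.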

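According to Gary T. Marx, Professor Emeritus at MIT, data can be classified into four types, depending on whether they are secretive or nonsecretive and whether they are discrediting or nondiscrediting:[4]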


  • Nonsecretive and nondiscrediting data:
    • Routinely available information.
  • Secretive and nondiscrediting data:
    • Strategic and fraternal secrets, privacy.
  • Nonsecretive and discrediting data:
    • Sanction immunity.
    • Normative dissensus.
    • Selective dissensus.
    • Making good on a threat for credibility.
    • Discovered dirty data.
  • Secretive and discrediting data:
    • Hidden and dirty data.

References


  1. Spotless version 12 out now
  2. Margaret Chu (2004), "What Are Dirty Data?", Blissful Data, p. 71 et seq., ISBN 9780814407806
  3. Wu, S. (2013), "A review on coarse warranty data and analysis", Reliability Engineering & System Safety, 114: 1-11, doi:10.1016/j.ress.2012.12.021
  4. "Notes on the discovery, collection, and assessment of hidden and". web.mit.edu. Retrieved .

  This article uses material from the Wikipedia page available here. It is released under the Creative Commons Attribution-Share-Alike License 3.0.


