Categorization is the process in which ideas and objects are recognized, differentiated, and understood. Categorization implies that objects are grouped into categories, usually for some specific purpose. Ideally, a category illuminates a relationship between the subjects and objects of knowledge. Categorization is fundamental in language, prediction, inference, decision making and in all kinds of environmental interaction. It is indicated that categorization plays a major role in computer programming.
There are many categorization theories and techniques. In a broader historical view, however, three general approaches to categorization may be identified:
Classical categorization first appears in the context of Western Philosophy in the work of Plato, who, in his Statesman dialogue, introduces the approach of grouping objects based on their similar properties. This approach was further explored and systematized by Aristotle in his Categories treatise, where he analyzes the differences between classes and objects. Aristotle also applied intensively the classical categorization scheme in his approach to the classification of living beings (which uses the technique of applying successive narrowing questions such as "Is it an animal or vegetable?", "How many feet does it have?", "Does it have fur or feathers?", "Can it fly?"...), establishing this way the basis for natural taxonomy.
The classical Aristotelian view claims that categories are discrete entities characterized by a set of properties which are shared by their members. In analytic philosophy, these properties are assumed to establish the conditions which are both necessary and sufficient conditions to capture meaning.
According to the classical view, categories should be clearly defined, mutually exclusive and collectively exhaustive. This way, any entity of the given classification universe belongs unequivocally to one, and only one, of the proposed categories.
Conceptual clustering is a modern variation of the classical approach, and derives from attempts to explain how knowledge is represented. In this approach, classes (clusters or entities) are generated by first formulating their conceptual descriptions and then classifying the entities according to the descriptions.
Conceptual clustering developed mainly during the 1980s, as a machine paradigm for unsupervised learning. It is distinguished from ordinary data clustering by generating a concept description for each generated category.
Categorization tasks in which category labels are provided to the learner for certain objects are referred to as supervised classification, supervised learning, or concept learning. Categorization tasks in which no labels are supplied are referred to as unsupervised classification, unsupervised learning, or data clustering. The task of supervised classification involves extracting information from the labeled examples that allows accurate prediction of class labels of future examples. This may involve the abstraction of a rule or concept relating observed object features to category labels, or it may not involve abstraction (e.g., exemplar models). The task of clustering involves recognizing inherent structure in a data set and grouping objects together by similarity into classes. It is thus a process of generating a classification structure.
Conceptual clustering is closely related to fuzzy set theory, in which objects may belong to one or more groups, in varying degrees of fitness.
Since the research by Eleanor Rosch and George Lakoff in the 1970s, categorization can also be viewed as the process of grouping things based on prototypes--the idea of necessary and sufficient conditions is almost never met in categories of naturally occurring things. It has also been suggested that categorization based on prototypes is the basis for human development, and that this learning relies on learning about the world via embodiment.
Systems of categories are not objectively "out there" in the world but are rooted in people's experience. Conceptual categories are not identical for different cultures, or indeed, for every individual in the same culture.
Categories form part of a hierarchical structure when applied to such subjects as taxonomy in biological classification: higher level: life-form level, middle level: generic or genus level, and lower level: the species level. These can be distinguished by certain traits that put an item in its distinctive category. But even these can be arbitrary and are subject to revision.
Categories at the middle level are perceptually and conceptually the more salient. The generic level of a category tends to elicit the most responses and richest images and seems to be the psychologically basic level. Typical taxonomies in zoology for example exhibit categorization at the embodied level, with similarities leading to formulation of "higher" categories, and differences leading to differentiation within categories.
Miscategorization can be a logical fallacy in which diverse and dissimilar objects, concepts, entities, etc. are grouped together based upon illogical common denominators, or common denominators that virtually any concept, object or entity have in common. A common way miscategorization occurs is through an over-categorization of concepts, objects or entities, and then miscategorization based upon characters that virtually all things have in common.