An example of classical multidimensional scaling applied to voting patterns in the
United States House of Representatives. Each red dot represents one Republican member of the House, and each blue dot one Democrat.
Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset. It refers to a set of related ordination techniques used in information visualization, in particular to display the information contained in a distance matrix. It is a form of nonlinear dimensionality reduction. An MDS algorithm aims to place each object in Ndimensional space such that the betweenobject distances are preserved as well as possible. Each object is then assigned coordinates in each of the N dimensions. The number of dimensions of an MDS plot N can exceed 2 and is specified a priori. Choosing N=2 optimizes the object locations for a twodimensional scatterplot.^{[1]}
Types
MDS algorithms fall into a taxonomy, depending on the meaning of the input matrix:
Classical multidimensional scaling
It is also known as Principal Coordinates Analysis (PCoA), Torgerson Scaling or TorgersonGower scaling. It takes an input matrix giving dissimilarities between pairs of items and outputs a coordinate matrix whose configuration minimizes a loss function called strain:^{[1]} For example, given the aerial distances between many cities in a matrix ${\textstyle D=[d_{ij}]}$, where ${\textstyle d_{ij}}$ is the distance between the coordinates of ${\textstyle i^{th}}$ and ${\textstyle j^{th}}$ city, given by ${\textstyle d_{ij}={\sqrt {(x_{i}x_{j})^{2}+(y_{i}y_{j})^{2}}}}$, you want to find the coordinates of the cities. This problem is addressed in classical MDS.
 General forms of loss functions called Stress in distance MDS and Strain in classical MDS. The strain is given by:

${\textstyle Strain_{D}(x_{1},x_{2},...,x_{N})={\Biggl (}{\frac {\sum _{i,j}{\bigl (}b_{ij}\langle x_{i},x_{j}\rangle {\bigr )}^{2}}{\sum _{i,j}b_{ij}^{2}}}{\Biggr )}^{1/2}}$, where $b_{ij}$ are the terms of the matrix $B$ defined on step 2 of the following algorithm.
 Steps of a Classical MDS algorithm:
 Classical MDS uses the fact that the coordinate matrix can be derived by eigenvalue decomposition from ${\textstyle B=XX'}$. And the matrix ${\textstyle B}$ can be computed from proximity matrix ${\textstyle D}$ by using double centering.^{[2]}
 Set up the squared proximity matrix ${\textstyle D^{(2)}=[d_{ij}^{2}]}$
 Apply double centering: ${\textstyle B={\frac {1}{2}}JD^{(2)}J}$ using the centering matrix ${\textstyle J=I{\frac {1}{n}}11'}$, where ${\textstyle n}$ is the number of objects.
 Determine the ${\textstyle m}$ largest eigenvalues ${\textstyle \lambda _{1},\lambda _{2},...,\lambda _{m}}$ and corresponding eigenvectors ${\textstyle e_{1},e_{2},...,e_{m}}$ of ${\textstyle B}$ (where ${\textstyle m}$ is the number of dimensions desired for the output).
 Now, ${\textstyle X=E_{m}\Lambda _{m}^{1/2}}$ , where ${\textstyle E_{m}}$ is the matrix of ${\textstyle m}$ eigenvectors and ${\textstyle \Lambda _{m}}$ is the diagonal matrix of ${\textstyle m}$ eigenvalues of ${\textstyle B}$.
 Classical MDS assumes Euclidean distances. So this is not applicable for direct dissimilarity ratings.
Metric multidimensional scaling
It is a superset of classical MDS that generalizes the optimization procedure to a variety of loss functions and input matrices of known distances with weights and so on. A useful loss function in this context is called stress, which is often minimized using a procedure called stress majorization. Metric MDS minimizes the cost function called "Stress" which is a residual sum of squares:
$Stress_{D}(x_{1},x_{2},...,x_{N})={\Biggl (}\sum _{i\neq j=1,...,N}{\bigl (}d_{ij}\x_{i}x_{j}\{\bigr )}^{2}{\Biggr )}^{1/2}$
: or, $Stress_{D}(x_{1},x_{2},...,x_{N})={\Biggl (}{\frac {\sum _{i,j}{\bigl (}d_{ij}\x_{i}x_{j}\{\bigr )}^{2}}{\sum _{i,j}d_{ij}^{2}}}{\Biggr )}^{1/2}$
 Metric scaling uses a power transformation with a usercontrolled exponent ${\textstyle p}$: ${\textstyle d_{ij}^{p}}$ and ${\textstyle d_{ij}^{2p}}$ for distance. In classical scaling ${\textstyle p=1}$. Nonmetric scaling is defined by the use of isotonic regression to nonparametrically estimate a transformation of the dissimilarities.
Nonmetric multidimensional scaling
In contrast to metric MDS, nonmetric MDS finds both a nonparametric monotonic relationship between the dissimilarities in the itemitem matrix and the Euclidean distances between items, and the location of each item in the lowdimensional space. The relationship is typically found using isotonic regression: let ${\textstyle x}$ denote the vector of proximities, ${\textstyle f(x)}$ a monotonic transformation of ${\textstyle x}$, and ${\textstyle d}$ the point distances; then coordinates have to be found, that minimize the socalled stress,

$Stress={\sqrt {\frac {\sum {\bigl (}f(x)d{\bigr )}^{2}}{\sum d^{2}}}}$
 A few variants of this cost function exist. MDS programs automatically minimize stress in order to obtain the MDS solution.
 The core of a nonmetric MDS algorithm is a twofold optimization process. First the optimal monotonic transformation of the proximities has to be found. Secondly, the points of a configuration have to be optimally arranged, so that their distances match the scaled proximities as closely as possible. The basic steps in a nonmetric MDS algorithm are:
 Find a random configuration of points, e. g. by sampling from a normal distribution.
 Calculate the distances d between the points.
 Find the optimal monotonic transformation of the proximities, in order to obtain optimally scaled data ${\textstyle f(x)}$.
 Minimize the stress between the optimally scaled data and the distances by finding a new configuration of points.
 Compare the stress to some criterion. If the stress is small enough then exit the algorithm else return to 2.

 Louis Guttman's smallest space analysis (SSA) is an example of a nonmetric MDS procedure.
Generalized multidimensional scaling
An extension of metric multidimensional scaling, in which the target space is an arbitrary smooth nonEuclidean space. In cases where the dissimilarities are distances on a surface and the target space is another surface, GMDS allows finding the minimumdistortion embedding of one surface into another.^{[3]}
Details
The data to be analyzed is a collection of $I$ objects (colors, faces, stocks, . . .) on which a distance function is defined,
 $\delta _{i,j}:=$ distance between $i$th and $j$th objects.
These distances are the entries of the dissimilarity matrix
 $\Delta :={\begin{pmatrix}\delta _{1,1}&\delta _{1,2}&\cdots &\delta _{1,I}\\\delta _{2,1}&\delta _{2,2}&\cdots &\delta _{2,I}\\\vdots &\vdots &&\vdots \\\delta _{I,1}&\delta _{I,2}&\cdots &\delta _{I,I}\end{pmatrix}}.$
The goal of MDS is, given $\Delta$, to find $I$ vectors $x_{1},\ldots ,x_{I}\in \mathbb {R} ^{N}$ such that
 $\x_{i}x_{j}\\approx \delta _{i,j}$ for all $i,j\in {1,\dots ,I}$,
where $\\cdot \$ is a vector norm. In classical MDS, this norm is the Euclidean distance, but, in a broader sense, it may be a metric or arbitrary distance function.^{[4]}
In other words, MDS attempts to find an embedding from the $I$ objects into $\mathbb {R} ^{N}$ such that distances are preserved. If the dimension $N$ is chosen to be 2 or 3, we may plot the vectors $x_{i}$ to obtain a visualization of the similarities between the $I$ objects. Note that the vectors $x_{i}$ are not unique: With the Euclidean distance, they may be arbitrarily translated, rotated, and reflected, since these transformations do not change the pairwise distances $\x_{i}x_{j}\$.
(Note: The symbol $\mathbb {R}$ indicates the set of real numbers, and the notation $\mathbb {R} ^{N}$ refers to the Cartesian product of $N$ copies of $\mathbb {R}$, which is an $N$dimensional vector space over the field of the real numbers.)
There are various approaches to determining the vectors $x_{i}$. Usually, MDS is formulated as an optimization problem, where $(x_{1},\ldots ,x_{I})$ is found as a minimizer of some cost function, for example,
 $\min _{x_{1},\ldots ,x_{I}}\sum _{i<j}(\x_{i}x_{j}\\delta _{i,j})^{2}.\,$
A solution may then be found by numerical optimization techniques. For some particularly chosen cost functions, minimizers can be stated analytically in terms of matrix eigendecompositions.^{[]}
Procedure
There are several steps in conducting MDS research:
 Formulating the problem  What variables do you want to compare? How many variables do you want to compare? What purpose is the study to be used for?
 Obtaining input data  For example, : Respondents are asked a series of questions. For each product pair, they are asked to rate similarity (usually on a 7point Likert scale from very similar to very dissimilar). The first question could be for Coke/Pepsi for example, the next for Coke/Hires rootbeer, the next for Pepsi/Dr Pepper, the next for Dr Pepper/Hires rootbeer, etc. The number of questions is a function of the number of brands and can be calculated as $Q=N(N1)/2$ where Q is the number of questions and N is the number of brands. This approach is referred to as the "Perception data : direct approach". There are two other approaches. There is the "Perception data : derived approach" in which products are decomposed into attributes that are rated on a semantic differential scale. The other is the "Preference data approach" in which respondents are asked their preference rather than similarity.
 Running the MDS statistical program  Software for running the procedure is available in many statistical software packages. Often there is a choice between Metric MDS (which deals with interval or ratio level data), and Nonmetric MDS^{[5]} (which deals with ordinal data).
 Decide number of dimensions  The researcher must decide on the number of dimensions they want the computer to create. The more dimensions, the better the statistical fit, but the more difficult it is to interpret the results.
 Mapping the results and defining the dimensions  The statistical program (or a related module) will map the results. The map will plot each product (usually in twodimensional space). The proximity of products to each other indicate either how similar they are or how preferred they are, depending on which approach was used. How the dimensions of the embedding actually correspond to dimensions of system behavior, however, are not necessarily obvious. Here, a subjective judgment about the correspondence can be made (see perceptual mapping).
 Test the results for reliability and validity  Compute Rsquared to determine what proportion of variance of the scaled data can be accounted for by the MDS procedure. An Rsquare of 0.6 is considered the minimum acceptable level.^{[]} An Rsquare of 0.8 is considered good for metric scaling and .9 is considered good for nonmetric scaling. Other possible tests are Kruskal's Stress, split data tests, data stability tests (i.e., eliminating one brand), and testretest reliability.
 Report the results comprehensively  Along with the mapping, at least distance measure (e.g., Sorenson index, Jaccard index) and reliability (e.g., stress value) should be given. It is also very advisable to give the algorithm (e.g., Kruskal, Mather), which is often defined by the program used (sometimes replacing the algorithm report), if you have given a start configuration or had a random choice, the number of runs, the assessment of dimensionality, the Monte Carlo method results, the number of iterations, the assessment of stability, and the proportional variance of each axis (rsquare).
Implementations
See also
References
 ^ ^{a} ^{b} Borg,, I.; Groenen, P. (2005). Modern Multidimensional Scaling: theory and applications (2nd ed.). New York: SpringerVerlag. pp. 207212. ISBN 0387948457.
 ^ Wickelmaier, Florian. "An introduction to MDS." Sound Quality Research Unit, Aalborg University, Denmark (2003): 46
 ^ Bronstein AM, Bronstein MM, Kimmel R (January 2006). "Generalized multidimensional scaling: a framework for isometryinvariant partial surface matching". Proc. Natl. Acad. Sci. U.S.A. 103 (5): 116872. PMC 1360551 . PMID 16432211. doi:10.1073/pnas.0508601103.
 ^ Kruskal, J. B., and Wish, M. (1978), Multidimensional Scaling, Sage University Paper series on Quantitative Application in the Social Sciences, 07011. Beverly Hills and London: Sage Publications.
 ^ Kruskal, J. B. (1964). "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis". Psychometrika. 29 (1): 127. doi:10.1007/BF02289565.
Bibliography
 Cox, T.F.; Cox, M.A.A. (2001). Multidimensional Scaling. Chapman and Hall.
 Coxon, Anthony P.M. (1982). The User's Guide to Multidimensional Scaling. With special reference to the MDS(X) library of Computer Programs. London: Heinemann Educational Books.
 Green, P. (January 1975). "Marketing applications of MDS: Assessment and outlook". Journal of Marketing. 39 (1): 2431. doi:10.2307/1250799.
 McCune, B. & Grace, J.B. (2002). Analysis of Ecological Communities. Oregon, Gleneden Beach: MjM Software Design. ISBN 0972129006.
 Young, Forrest W. (1987). Multidimensional scaling: History, theory, and applications. Lawrence Erlbaum Associates. ISBN 9780898596632.
 Torgerson, Warren S. (1958). Theory & Methods of Scaling. New York: Wiley. ISBN 0898747228.
External links