Introduction
The goal is to prove that the median of a set of numbers minimizes the sum of absolute deviations, that is, for a given set of numbers
x(1), x(2), ..., x(n), the value of c that minimizes
the sum S(c) = Σ |x(i) - c| is the median of the set.
Step-by-Step Explanation
1. Definition of Absolute Deviation
For a set of numbers x(1), x(2), ..., x(n), the absolute deviation from a point c is defined as
|x(i) - c| for each x(i). The sum of absolute deviations is the sum of these individual deviations:
S(c) = Σ |x(i) - c|
2. Behavior of the Function S(c)
The absolute deviation function |x(i) - c| is piecewise linear, with a slope of -1 when c is less than x(i)
and a slope of +1 when c is greater than x(i). Therefore, the sum of absolute deviations S(c) is a piecewise
linear function as well, with changes in slope occurring at the values x(1), x(2), ..., x(n).
3. Slope Changes and Minimization
When c is less than the median, there are more terms in the sum that decrease as c increases (slope = -1). When c
is greater than the median, there are more terms that increase as c increases (slope = +1). Therefore, the sum of absolute deviations
decreases as c approaches the median from either side.
4. Why the Median Minimizes the Sum
The median is defined as the value that splits the data into two equal halves. At the median, the number of terms that increase and decrease as
c changes are balanced. Therefore, the sum of absolute deviations is minimized when c equals the median, as the slopes
of the function to the left and right of the median change symmetrically.
5. Formal Proof Using Derivatives
To formally prove this, consider the derivative of S(c) with respect to c:
dS(c)/dc = Σ sign(x(i) - c)
The derivative is the sum of the signs of the differences x(i) - c. When c is less than the median,
more terms contribute a negative sign, so the derivative is negative, indicating that S(c) decreases.
When c is greater than the median, more terms contribute a positive sign, so the derivative is positive,
indicating that S(c) increases. At the median, the number of positive and negative terms is balanced,
so the derivative is zero and the function reaches a minimum.
Conclusion
The sum of absolute deviations is minimized at the median because it is the point where the number of terms increasing and
decreasing is balanced. This balance makes the median the optimal value of c for minimizing the total deviation.
Conceptual Approaches to Defining "Location" or "Central Tendency" Statistics
The concept of central tendency (or location) can be understood as the point or value that attempts to represent the entire distribution of data in some meaningful way. There are various ways to define such a "location" statistic, each with its own underlying assumptions and applications. The generalization of these concepts can lead to an infinite number of different definitions.
1. Common Definitions of Central Tendency
- Mean (Arithmetic Average): The sum of all data points divided by the number of points. It minimizes the sum of squared deviations from the data.
- Median: The middle value when data points are arranged in order. It minimizes the sum of absolute deviations.
- Mode: The most frequent value in a dataset. It represents the value with the highest probability or frequency.
- Midrange: The average of the minimum and maximum values in the dataset. It gives a rough estimate of central tendency based on extremes.
2. Weighted Averages
The idea of central tendency can be generalized by assigning different weights to the data points. For example, a weighted mean gives more importance to certain values. This generalization allows for the inclusion of domain knowledge or external factors when determining the central point.
3. Geometric and Harmonic Means
The geometric mean is used when data points are multiplicatively related (e.g., rates of growth), while the harmonic mean is used in cases involving rates or ratios. These means provide alternatives to the arithmetic mean when the underlying relationships between data points require different treatments.
4. Percentiles and Quantiles
Central tendency can also be generalized to any quantile, not just the median (which is the 50th percentile). Other quantiles, such as the 25th percentile (first quartile) or 75th percentile (third quartile), may also be used to describe the central location of a distribution, depending on the context.
5. Generalized Means
The family of generalized means includes the arithmetic, geometric, and harmonic means as special cases. A generalized mean is defined as:
M_p(x_1, x_2, ..., x_n) = \left( \frac{1}{n} \sum_{i=1}^{n} x_i^p \right)^{1/p}
where p can take any real value. For p = 1, the generalized mean becomes the arithmetic mean;
for p = 0, it becomes the geometric mean; and for p = -1, it becomes the harmonic mean.
By changing the value of p, we can define an infinite number of different means.
6. Median of Absolute Deviations and Other Robust Statistics
Some statistics aim to be robust against outliers, such as the median of absolute deviations (MAD), which is a robust measure of variability. Other robust methods can be designed to provide alternative central tendency measures that are less sensitive to extreme values.
7. Central Tendency Based on Optimization
The general concept of central tendency can be framed as an optimization problem, where the "best" center minimizes a given loss function. Depending on the chosen loss function (e.g., squared deviations, absolute deviations, or something else), different central tendency measures can be derived. This approach leads to the possibility of defining an infinite number of central tendency measures by varying the loss function.