Why the Median Minimizes the Sum of Absolute Deviations

Introduction

The goal is to prove that the median of a set of numbers minimizes the sum of absolute deviations, that is, for a given set of numbers x(1), x(2), ..., x(n), the value of c that minimizes the sum S(c) = Σ |x(i) - c| is the median of the set.

Step-by-Step Explanation

1. Definition of Absolute Deviation

For a set of numbers x(1), x(2), ..., x(n), the absolute deviation from a point c is defined as |x(i) - c| for each x(i). The sum of absolute deviations is the sum of these individual deviations:

        S(c) = Σ |x(i) - c|
    

2. Behavior of the Function S(c)

The absolute deviation function |x(i) - c| is piecewise linear, with a slope of -1 when c is less than x(i) and a slope of +1 when c is greater than x(i). Therefore, the sum of absolute deviations S(c) is a piecewise linear function as well, with changes in slope occurring at the values x(1), x(2), ..., x(n).

3. Slope Changes and Minimization

When c is less than the median, there are more terms in the sum that decrease as c increases (slope = -1). When c is greater than the median, there are more terms that increase as c increases (slope = +1). Therefore, the sum of absolute deviations decreases as c approaches the median from either side.

4. Why the Median Minimizes the Sum

The median is defined as the value that splits the data into two equal halves. At the median, the number of terms that increase and decrease as c changes are balanced. Therefore, the sum of absolute deviations is minimized when c equals the median, as the slopes of the function to the left and right of the median change symmetrically.

5. Formal Proof Using Derivatives

To formally prove this, consider the derivative of S(c) with respect to c:

        dS(c)/dc = Σ sign(x(i) - c)
    

The derivative is the sum of the signs of the differences x(i) - c. When c is less than the median, more terms contribute a negative sign, so the derivative is negative, indicating that S(c) decreases. When c is greater than the median, more terms contribute a positive sign, so the derivative is positive, indicating that S(c) increases. At the median, the number of positive and negative terms is balanced, so the derivative is zero and the function reaches a minimum.

Conclusion

The sum of absolute deviations is minimized at the median because it is the point where the number of terms increasing and decreasing is balanced. This balance makes the median the optimal value of c for minimizing the total deviation.

Conceptual Approaches to Defining "Location" or "Central Tendency" Statistics

The concept of central tendency (or location) can be understood as the point or value that attempts to represent the entire distribution of data in some meaningful way. There are various ways to define such a "location" statistic, each with its own underlying assumptions and applications. The generalization of these concepts can lead to an infinite number of different definitions.

1. Common Definitions of Central Tendency

2. Weighted Averages

The idea of central tendency can be generalized by assigning different weights to the data points. For example, a weighted mean gives more importance to certain values. This generalization allows for the inclusion of domain knowledge or external factors when determining the central point.

3. Geometric and Harmonic Means

The geometric mean is used when data points are multiplicatively related (e.g., rates of growth), while the harmonic mean is used in cases involving rates or ratios. These means provide alternatives to the arithmetic mean when the underlying relationships between data points require different treatments.

4. Percentiles and Quantiles

Central tendency can also be generalized to any quantile, not just the median (which is the 50th percentile). Other quantiles, such as the 25th percentile (first quartile) or 75th percentile (third quartile), may also be used to describe the central location of a distribution, depending on the context.

5. Generalized Means

The family of generalized means includes the arithmetic, geometric, and harmonic means as special cases. A generalized mean is defined as:

        M_p(x_1, x_2, ..., x_n) = \left( \frac{1}{n} \sum_{i=1}^{n} x_i^p \right)^{1/p}
    

where p can take any real value. For p = 1, the generalized mean becomes the arithmetic mean; for p = 0, it becomes the geometric mean; and for p = -1, it becomes the harmonic mean. By changing the value of p, we can define an infinite number of different means.

6. Median of Absolute Deviations and Other Robust Statistics

Some statistics aim to be robust against outliers, such as the median of absolute deviations (MAD), which is a robust measure of variability. Other robust methods can be designed to provide alternative central tendency measures that are less sensitive to extreme values.

7. Central Tendency Based on Optimization

The general concept of central tendency can be framed as an optimization problem, where the "best" center minimizes a given loss function. Depending on the chosen loss function (e.g., squared deviations, absolute deviations, or something else), different central tendency measures can be derived. This approach leads to the possibility of defining an infinite number of central tendency measures by varying the loss function.

Exercise

Assignment:

Refine your SDE simulator to simulate a continuous time process where we can have an attack (indicated with a jump of +1) at any time with a constant rate of attack. To create the approximation of time continuity subdivide your reference temporal window into numerous intervals of vanishing size dt = 1/n and to each infinitesimal interval assign a probability of a +1 "jump" (attack success) equal to Lambda * dt, where Lambda is a simulation parameter, having the meaning of expected total number of attacks in the reference period.

You can find the code for this exercise here, and the online result can be accessed here.