Sampling and Lebesgue–Stieltjes Integration

1. General Concept of Sampling Mean and Variance

In probability and statistics, the sampling mean and sampling variance (more commonly called the sample mean and sample variance) summarize the center and spread of a set of observations drawn from a population.

Sampling Mean: The sampling mean is the average of a set of observations. It is calculated as: \[ \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \] where \( x_i \) represents individual sample values and \( n \) is the sample size.
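Sampling Variance: The sampling variance measures the spread of the observations about the sampling mean. It is calculated as: \[ s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \] where \( \bar{x} \) is the sampling mean. Dividing by \( n-1 \) rather than \( n \) (Bessel's correction) makes \( s^2 \) an unbiased estimator of the population variance \( \sigma^2 \) for independent observations.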
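
The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `sample_mean` and `sample_variance` and the data values are made up for the example, and the standard library's `statistics` module is used only as a cross-check.

```python
import statistics


def sample_mean(xs):
    """Average of the observations: (1/n) * sum(x_i)."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Spread about the mean with the n - 1 divisor."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (n - 1)


# Hypothetical data, chosen so the arithmetic is easy to follow by hand.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]

print(sample_mean(data))          # mean of the six values
print(sample_variance(data))      # squared deviations sum to 24, divided by 5
print(statistics.variance(data))  # library value, also uses n - 1
```

For this data the deviations from the mean 5 are -3, -1, -1, 0, 2, 3, so the sum of squares is 24 and \( s^2 = 24/5 \).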

Key Features of Their Distributions:

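The mean of the sampling distribution of the mean equals the population mean \( \mu \), that is, \( \mathbb{E}[\bar{x}] = \mu \).
For independent observations from a population with finite variance \( \sigma^2 \), the variance of the sampling distribution of the mean is \( \sigma^2 / n \), so it decreases in inverse proportion to the sample size \( n \).
The Central Limit Theorem ensures that, for independent and identically distributed observations with finite variance, the distribution of the standardized sampling mean \( \sqrt{n} (\bar{x} - \mu) / \sigma \) approaches the standard normal distribution as \( n \) grows large, whatever the shape of the population distribution.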
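
These properties can be checked empirically with a small simulation. The sketch below assumes an Exponential(1) population (mean 1, variance 1, strongly skewed), which is a choice made for this example only. The helper name `simulate_sample_means`, the seed, and the trial counts are likewise illustrative.

```python
import random
import statistics


def simulate_sample_means(n, trials, rng):
    """Draw `trials` samples of size n from Exponential(1); return their means."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers

for n in (5, 50):
    means = simulate_sample_means(n, 2000, rng)
    # The average of the sample means should be near the population mean 1,
    # and their variance should be near sigma^2 / n = 1 / n.
    print(n, statistics.fmean(means), statistics.variance(means), 1.0 / n)
```

A histogram of `means` for the larger \( n \) would also look far more symmetric than the exponential population itself, which is the Central Limit Theorem at work.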

2. General Idea of Lebesgue–Stieltjes Integration

The Lebesgue–Stieltjes integral generalizes both the Riemann–Stieltjes integral and the ordinary Lebesgue integral, and is particularly useful in probability theory and measure theory.

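The integral is defined with respect to a function \( F \) that is non-decreasing and right-continuous (more generally, of bounded variation). Such an \( F \) induces a measure \( \mu_F \) determined by \( \mu_F((a, b]) = F(b) - F(a) \), and for a measurable function \( f \) the Lebesgue–Stieltjes integral \[ \int_a^b f(x) \, dF(x) \] is the Lebesgue integral of \( f \) with respect to \( \mu_F \). When \( F \) is a cumulative distribution function (CDF), \( \mu_F \) is the corresponding probability measure.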

Applications to Probability Theory

The Lebesgue–Stieltjes integral is used to define expectations of random variables. For example: \[ \mathbb{E}[X] = \int_{-\infty}^\infty x \, dF(x) \] where \( F(x) \) is the CDF of \( X \).
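It handles discrete, continuous, and mixed distributions with one formula: where \( F \) jumps, the integral contributes the value of the integrand times the jump size, and where \( F \) has a density \( f \), it reduces to \( \int x f(x) \, dx \).
It gives a single notation for expectations of functions of \( X \), \( \mathbb{E}[g(X)] = \int g(x) \, dF(x) \), without a separate case for each type of distribution.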
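
To make the treatment of mixed distributions concrete, the sketch below approximates \( \int x \, dF(x) \) by a Stieltjes sum \( \sum_i x_i \, (F(x_i) - F(x_{i-1})) \). The distribution is hypothetical and chosen only for illustration: 60% of the probability is spread uniformly over \([0, 1)\) and 40% is a point mass at \( x = 1 \). The function names are likewise assumptions of the example.

```python
def mixed_cdf(x):
    """CDF of a hypothetical mixed distribution:
    mass 0.6 spread uniformly on [0, 1) plus a point mass 0.4 at x = 1."""
    if x < 0.0:
        return 0.0
    if x < 1.0:
        return 0.6 * x
    return 1.0


def stieltjes_expectation(cdf, a, b, n):
    """Right-endpoint Stieltjes sum for the integral of x dF(x) over (a, b].

    Any point mass located exactly at `a` is not counted by this sum.
    """
    total = 0.0
    prev_F = cdf(a)
    for i in range(1, n + 1):
        x = a + (b - a) * i / n
        F = cdf(x)
        total += x * (F - prev_F)  # value times probability of (x_prev, x]
        prev_F = F
    return total


approx = stieltjes_expectation(mixed_cdf, 0.0, 1.0, 10_000)
exact = 0.6 * 0.5 + 0.4 * 1.0  # continuous part plus the atom at 1
print(approx, exact)
```

The same sum handles the continuous part and the jump without any special casing: the last subinterval simply picks up the jump of size 0.4 in \( F \) at \( x = 1 \).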

Applications to Measure Theory

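Measure theory provides a rigorous foundation for probability; within it, the Lebesgue–Stieltjes integral treats summation and ordinary integration as special cases of integration with respect to a measure.
Taking \( F(x) = x \) recovers Lebesgue measure and the ordinary Lebesgue integral, and limit theorems such as the Dominated Convergence Theorem apply to Lebesgue–Stieltjes integrals.
It handles functions for which classical Riemann integration fails, such as the indicator function of the rational numbers, which is discontinuous everywhere and not Riemann integrable, yet has Lebesgue integral 0.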

3. Numerical Comparison of Lebesgue and Riemann Integrals

To understand the difference between the Lebesgue and Riemann integrals, consider the task of finding the expected value \( \mathbb{E}[X] \) of a random variable \( X \) with the following probability density function (PDF):

\[ f(x) = \begin{cases} 3x^2 & \text{if } 0 \leq x \leq 1 \\ 0 & \text{otherwise.} \end{cases} \]

Problem:

Compute \( \mathbb{E}[X] \) using both Riemann and Lebesgue approaches.

Solution:

Using Riemann Integral:

The Riemann approach divides the domain \([0, 1]\) into small intervals \( [x_i, x_{i+1}] \), approximates the integrand \( x f(x) \) on each interval, and takes the limit of the resulting sums as the intervals shrink. Because \( X \) has density \( f \), that limit is: \[ \mathbb{E}[X] = \int_0^1 x f(x) \, dx = \int_0^1 x \cdot 3x^2 \, dx = \int_0^1 3x^3 \, dx. \] Computing this integral: \[ \int_0^1 3x^3 \, dx = 3 \cdot \frac{x^4}{4} \Big|_0^1 = 3 \cdot \frac{1}{4} = \frac{3}{4}. \]
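
The domain-partitioning idea can be checked numerically with a midpoint Riemann sum. This is a small sketch, not part of the derivation. The names `pdf` and `riemann_expectation` and the number of subintervals are choices made for the example.

```python
def pdf(x):
    """Density from the problem statement: 3x^2 on [0, 1], zero elsewhere."""
    return 3.0 * x * x if 0.0 <= x <= 1.0 else 0.0


def riemann_expectation(density, a, b, n):
    """Midpoint Riemann sum for the integral of x * density(x) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        mid = a + (i + 0.5) * h        # midpoint of the i-th subinterval
        total += mid * density(mid) * h  # height times width
    return total


for n in (10, 100, 1000):
    print(n, riemann_expectation(pdf, 0.0, 1.0, n))
```

As \( n \) grows, the printed values approach the exact answer \( 3/4 \).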

Using Lebesgue Integral:

The Lebesgue approach integrates with respect to the probability measure of \( X \) rather than with respect to length on the domain. On the real line this is written as a Lebesgue–Stieltjes integral against the CDF: \[ \mathbb{E}[X] = \int_0^1 x \, dF(x), \] where \( F(x) \) is the cumulative distribution function (CDF) of \( X \). First, compute \( F(x) \) for \( 0 \leq x \leq 1 \): \[ F(x) = \int_0^x 3t^2 \, dt = \left[ t^3 \right]_0^x = x^3. \] Now compute \( \mathbb{E}[X] \): \[ \mathbb{E}[X] = \int_0^1 x \, d(x^3). \] Using integration by parts: \[ \int_0^1 x \, d(x^3) = \Big[x \cdot x^3 \Big]_0^1 - \int_0^1 x^3 \, dx = \Big[1^4 - 0^4\Big] - \int_0^1 x^3 \, dx. \] The second term is: \[ \int_0^1 x^3 \, dx = \frac{x^4}{4} \Big|_0^1 = \frac{1}{4}. \] Thus: \[ \mathbb{E}[X] = 1 - \frac{1}{4} = \frac{3}{4}. \] As a check, \( F \) is differentiable here, so \( d(x^3) = 3x^2 \, dx \) and the integral reduces to \( \int_0^1 3x^3 \, dx \), the same expression obtained from the density.
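
The integral against the CDF can also be approximated directly, without ever using the density. The sketch below weights each midpoint by the probability increment \( F(x_i) - F(x_{i-1}) \). The function names and the grid size are assumptions of this example.

```python
def cdf(x):
    """CDF derived above: F(x) = x^3 on [0, 1]."""
    if x < 0.0:
        return 0.0
    if x > 1.0:
        return 1.0
    return x ** 3


def stieltjes_midpoint(F, a, b, n):
    """Stieltjes sum for the integral of x dF(x) over [a, b], midpoint tags."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        left = a + i * h
        right = a + (i + 1) * h
        mid = 0.5 * (left + right)
        total += mid * (F(right) - F(left))  # value times probability mass
    return total


print(stieltjes_midpoint(cdf, 0.0, 1.0, 1000))
```

Each term is a value of \( x \) multiplied by the probability that \( X \) falls in a small interval around it, which is the discrete analogue of \( x \, dF(x) \).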

Conclusion:

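Both methods yield the same result, \( \mathbb{E}[X] = \frac{3}{4} \). The Riemann integral sums over small intervals of the domain, each weighted by its length, while the Lebesgue–Stieltjes integral weights each value of \( x \) by the probability mass \( dF(x) \) that the distribution assigns near it. For a smooth density such as this one the two coincide, but the second form remains valid when \( F \) has jumps or no density at all, which is why it is preferred for discontinuous or more complex distributions.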
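
One way to see partitioning by values rather than by domain is the tail formula for a nonnegative random variable, \( \mathbb{E}[X] = \int_0^\infty P(X > t) \, dt \), which slices the range of \( X \) into levels \( t \) and measures how much probability lies above each level. The sketch below applies it to this example, where \( P(X > t) = 1 - t^3 \) on \([0, 1]\). It is offered as an illustration under that assumption, and the function names are hypothetical.

```python
def survival(t):
    """P(X > t) for the example distribution with CDF F(x) = x^3 on [0, 1]."""
    if t < 0.0:
        return 1.0
    if t < 1.0:
        return 1.0 - t ** 3
    return 0.0


def tail_expectation(surv, upper, n):
    """Midpoint sum for the integral of P(X > t) dt over [0, upper]."""
    h = upper / n
    return sum(surv((i + 0.5) * h) for i in range(n)) * h


# X never exceeds 1, so integrating the tail over [0, 1] is enough.
print(tail_expectation(survival, 1.0, 1000))
```

The exact value is \( \int_0^1 (1 - t^3) \, dt = 1 - \frac{1}{4} = \frac{3}{4} \), in agreement with both computations above.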