In statistics, the population refers to the complete set of all items, individuals, events, or observations that are of interest in a particular study. It can be large (e.g., all people living in a country) or small (e.g., all students in a class). The population can be finite or infinite, depending on the scope of the study.
A statistical unit is the individual element or entity in a population or sample that is observed, measured, or counted in a study. Data points are collected from these units, so the units must be clearly defined to avoid ambiguity.
A distribution in statistics describes how the values of a variable are spread across a dataset. It gives the frequency or likelihood of each value or range of values. There are different types of distributions (e.g., normal distribution, binomial distribution), but they all describe how data points are arranged within a dataset.
Frequency refers to the number of times a particular value or group of values occurs within a dataset. There are three key types of frequency that are often used:
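As a minimal sketch, assuming the three types meant are absolute frequency (raw counts), relative frequency (counts divided by the total number of observations), and percentage frequency (relative frequency times 100), they could be computed as follows. The function name `frequency_table` and the sample data are hypothetical, chosen only for illustration.

```python
from collections import Counter


def frequency_table(values):
    """Map each distinct value to (absolute, relative, percentage) frequency."""
    counts = Counter(values)      # absolute frequency of each distinct value
    total = len(values)           # number of observations
    table = {}
    for value, absolute in sorted(counts.items()):
        relative = absolute / total
        table[value] = (absolute, relative, relative * 100)
    return table


# Hypothetical sample: grades observed for eight students.
grades = ["A", "B", "A", "C", "B", "A", "A", "B"]
for value, (absolute, relative, percentage) in frequency_table(grades).items():
    print(f"{value}: absolute={absolute}, relative={relative:.3f}, percentage={percentage:.1f}%")
```

The relative frequencies always sum to 1 (up to rounding), which is a quick sanity check on any frequency table.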
The arithmetic average, also known as the mean, is one of the most commonly used measures of central tendency in statistics. It is computed as the sum of all values in a dataset divided by the number of values.
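The definition translates directly into code. The sketch below is only an illustration; the function name `arithmetic_mean` and the sample values are hypothetical.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Hypothetical sample: 2 + 4 + 4 + 5 + 10 = 25, divided by 5 values.
print(arithmetic_mean([2, 4, 4, 5, 10]))
```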
When calculating the arithmetic mean (or any other statistic) on computers, floating-point representation can introduce rounding errors, because most real numbers cannot be stored exactly in a finite number of bits.
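A small experiment makes the problem visible. The sketch below adds 0.1 ten times with an explicit loop and compares the result with `math.fsum`, a standard-library function that tracks the lost low-order bits. The explicit loop is deliberate: it shows plain left-to-right accumulation, regardless of how a built-in summation routine might be implemented.

```python
import math

values = [0.1] * 10

# Plain left-to-right accumulation: every addition is rounded.
naive_total = 0.0
for v in values:
    naive_total += v

# math.fsum avoids the loss of precision from the intermediate roundings.
accurate_total = math.fsum(values)

print(naive_total == 1.0)      # the rounding errors have accumulated
print(accurate_total == 1.0)
print(naive_total - accurate_total)
```

The difference is tiny here, but it grows with the number of values and with the spread of their magnitudes.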
To address computational issues like precision loss, algorithms such as those described by Donald Knuth offer more numerically stable solutions. A related technique is Kahan summation (or compensated summation), introduced by William Kahan, which improves the accuracy of summing floating-point numbers.
The idea is to track small errors introduced during the summation process and compensate for them, thereby reducing the effect of floating-point precision issues. The algorithm works as follows:
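Steps:
1. Initialize the running sum and a compensation term c to zero.
2. For each value x, compute the corrected value y = x - c.
3. Compute the tentative sum t = sum + y; the low-order digits of y may be lost in this addition.
4. Recover the lost part as c = (t - sum) - y.
5. Set sum = t and continue with the next value.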
By carefully managing the addition of small differences, this algorithm helps to minimize the errors that accumulate when adding a large number of values, especially in datasets with values of different magnitudes.
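The compensated summation described above can be sketched in a few lines. This is a minimal illustration rather than a tuned implementation; the names `kahan_sum` and `naive_sum` and the test data are hypothetical.

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation, for comparison."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Kahan (compensated) summation."""
    total = 0.0
    compensation = 0.0                    # running estimate of the lost low-order part
    for x in values:
        y = x - compensation              # apply the correction from the previous step
        t = total + y                     # low-order digits of y may be lost here
        compensation = (t - total) - y    # recover what was lost (with sign)
        total = t
    return total


# Hypothetical data: one large value followed by many values too small
# to change it when added one at a time.
data = [1.0] + [1e-16] * 10_000

print(naive_sum(data))    # the small terms are lost
print(kahan_sum(data))    # the small terms are accumulated
print(math.fsum(data))    # reference value from the standard library
```

Each `1e-16` is smaller than half the spacing between floating-point numbers near 1.0, so the naive loop never moves away from 1.0, while the compensated version carries the lost amounts forward until they are large enough to register.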
Assignment:
We have n servers and m attackers. Each attacker has probability p of penetrating each server. Make a graphical representation of each attacker’s progress (the line stays flat if the attacker fails to penetrate a server and jumps up by 1 if the attacker succeeds), and try different values of n, m, and p. At time n, we want the complete distribution of how many attackers reached each level. (Draw the distribution histogram vertically at the end of the chart, so that each rectangle representing the attackers’ frequency is placed at the corresponding number of penetrations (or “successes”) they achieved.)
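One possible way to approach the assignment is sketched below. It assumes that every attacker attempts each of the n servers once, in order, and that attempts are independent with the same probability p; these assumptions, along with the function names `simulate_attacks`, `final_distribution`, and `print_histogram`, are illustrative and not taken from the exercise code. To stay self-contained it prints a text histogram instead of drawing a chart; the returned paths can be passed to any plotting library.

```python
import random
from collections import Counter


def simulate_attacks(n, m, p, seed=None):
    """Return m paths; path[t] is the number of servers penetrated after t attempts."""
    rng = random.Random(seed)
    paths = []
    for _ in range(m):
        level = 0
        path = [level]                 # every attacker starts at level 0
        for _ in range(n):
            if rng.random() < p:       # success: the line jumps up by 1
                level += 1
            path.append(level)         # failure: the line stays flat
        paths.append(path)
    return paths


def final_distribution(paths):
    """Count how many attackers ended at each level (number of successes)."""
    return Counter(path[-1] for path in paths)


def print_histogram(distribution, n):
    """Text histogram with levels on the vertical axis, highest level first."""
    for level in range(n, -1, -1):
        print(f"{level:3d} | {'#' * distribution.get(level, 0)}")


paths = simulate_attacks(n=10, m=50, p=0.3, seed=42)
print_histogram(final_distribution(paths), n=10)
```

Listing the levels from top to bottom mirrors the layout requested in the assignment, where the histogram sits vertically at the end of the chart and each bar lines up with the level it counts.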
You can find the code for the exercise here, while the online result can be accessed here.