THE ESSENCE OF STATISTICS


Also understanding its application in artificial intelligence

If mathematics is the science of measurement, then statistics is the mathematics of data: it is the science of measuring, i.e. quantifying, different aspects of data. Data are the main resource in understanding a random process or a seemingly random process, i.e. a process with one or more factors that, for practical or other reasons, cannot be accounted for.

NOTE: A process is identified as a unique process based on common factors underlying its results, outcomes or samples. Even a so-called “pure” random process is identified as a unique process based on the common factor or set of factors (known or unknown) that lead to a constant probability for any outcome.

Furthermore, when we observe the distribution of samples from a process as frequencies of occurrence across time, space or some other metric, a convergence in the patterns of occurrence suggests common factors that may be generalised (though not yet with certainty). This is because, in a random process, we expect the effect of one or more of the unaccounted-for variables to keep changing across samples; convergence to a pattern therefore indicates that there are constant, universal factors (universal to all possible samples of the process) that cause these aggregate similarities despite the constant changes (at least in the long run) in many other factors. We can try to generalise this pattern by omitting deviations from averages over time to arrive at theoretical distributions: mathematical objects that are essentially abstractions of the observed distribution(s) of a random process or a class of random processes.
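To make the idea of convergence concrete, here is a minimal sketch in Python. It assumes a simulated fair six-sided die as the random process and the uniform distribution (probability 1/6 per face) as its theoretical distribution. The die, the seed and the function names are illustrative choices, not anything prescribed above.

```python
import random
from collections import Counter

def empirical_frequencies(samples, outcomes):
    """Relative frequency of each possible outcome in the samples."""
    counts = Counter(samples)
    n = len(samples)
    return {o: counts[o] / n for o in outcomes}

def max_deviation(freqs, theoretical):
    """Largest absolute gap between observed and theoretical frequencies."""
    return max(abs(freqs[o] - theoretical[o]) for o in theoretical)

# Assumed process: a fair die, simulated with a fixed seed for repeatability.
rng = random.Random(0)
outcomes = range(1, 7)
theoretical = {o: 1 / 6 for o in outcomes}
rolls = [rng.randint(1, 6) for _ in range(100_000)]

# Compare the observed distribution with the theoretical one as samples accumulate.
deviations = {}
for n in (100, 1_000, 10_000, 100_000):
    freqs = empirical_frequencies(rolls[:n], outcomes)
    deviations[n] = max_deviation(freqs, theoretical)
    print(f"n={n:>7}: max deviation from 1/6 = {deviations[n]:.4f}")
```

Individual rolls keep changing, yet the aggregate frequencies typically settle near 1/6 as the sample grows. Note that the sketch only shows the pattern; it does not by itself explain which constant factors (here, the symmetry of the die) produce it.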

However, statistics does not by itself grant any certainty to a process of induction; by its nature, statistics can only indicate promising leads. This is because statistics measures data, but data as such are particulars; to move from conclusions on particulars to conclusions on universals requires conceptual integration, i.e. the process of identifying and integrating the concepts and previous generalisations underlying a process. This is beyond the scope of statistics as such.

Nevertheless, knowing where to look for answers and knowing how and why the data offer promise are invaluable in the search for knowledge. Since statistics quantifies and systematises such a search, it is a vital tool in automating learning. This points to the relevance of statistics and statistical principles in artificial intelligence (AI). AI problems, in the broadest terms, are search problems, i.e. problems requiring an automated search that starts from limited information (or even no information). In environments that are harder to generalise and thus require more experience to learn from, statistics enables us to streamline the learning process, boiling it down to a (potentially) much smaller number of steps than any brute-force approach would need, or even to a single step in some cases.
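As a rough illustration of the last point, the sketch below contrasts a brute-force search with a single-step statistical estimate. The setting is an assumption chosen for simplicity: noisy data generated from a line through the origin with a hypothetical slope of 2.5, where the goal is to recover the slope. The brute-force search scores thousands of candidate slopes; the least-squares formula for a line through the origin gives the answer in one calculation.

```python
import random

# Assumed data: y = true_slope * x plus Gaussian noise (all values hypothetical).
rng = random.Random(1)
true_slope = 2.5
xs = [i / 10 for i in range(1, 101)]
ys = [true_slope * x + rng.gauss(0, 0.5) for x in xs]

def squared_error(slope):
    """Sum of squared residuals for a candidate slope."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys))

# Brute force: score every candidate slope on a grid from 0 to 5 in steps of 0.001.
candidates = [i / 1000 for i in range(5001)]
brute_slope = min(candidates, key=squared_error)

# Single step: closed-form least-squares slope for a line through the origin.
closed_slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

print(f"brute force ({len(candidates)} evaluations): {brute_slope:.3f}")
print(f"closed form (1 evaluation): {closed_slope:.3f}")
```

Both approaches land on essentially the same slope, but the closed form gets there without searching. This holds only because the structure of the problem (a linear model with squared error) is known in advance; it is one case among many, not a general recipe.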