The 8 most important statistical developments of the past 50 years: Columbia University professor's paper lists the statistical ideas driving the AI revolution

Silent mathematics

Source: tandfonline

[Guide] The progress of artificial intelligence is usually attributed to rapid advances in computing power. Recently, a Columbia University professor published a paper describing the lesser-known statistical ideas of the past 50 years that lie behind it.

Although deep learning and artificial intelligence have become household names, the statistical breakthroughs driving this revolution are far less well known.

In a recent paper, Andrew Gelman, a professor of statistics at Columbia University, and Aki Vehtari, a professor of computer science at Aalto University in Finland, detailed the most important statistical ideas of the past 50 years.

The authors group these statistical ideas into 8 categories:

  • Counterfactual causal inference
  • Bootstrapping and simulation-based inference
  • Overparameterized models and regularization
  • Bayesian multilevel models
  • Generic computation algorithms
  • Adaptive decision analysis
  • Robust inference
  • Exploratory data analysis

1. Counterfactual causal inference

Causal identification is possible under assumptions, and these assumptions can be stated rigorously and addressed in various ways through design and analysis.

Different fields have developed different methods of causal inference. In econometrics, the focus has been on structural models and their implications for average treatment effects; in epidemiology, on inference from observational data.

Because causal identification is recognized as a core task of cognition, it should be a computable problem that can be formalized mathematically. Path analysis and causal discovery can be framed in terms of potential outcomes, and vice versa.
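
The potential-outcomes idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' method: the constant treatment effect, the normal outcome distribution, and the function name simulate_experiment are all hypothetical. Each unit has two potential outcomes, only one of which is ever observed, and randomization lets a simple difference in means recover the average treatment effect.

```python
import random
from statistics import mean

def simulate_experiment(n=10_000, true_effect=2.0, seed=0):
    """Simulate potential outcomes and a randomized experiment (illustrative only)."""
    rng = random.Random(seed)
    # y0: outcome if untreated; y1: outcome if treated.
    # A constant additive effect is an assumption made for simplicity.
    y0 = [rng.gauss(10.0, 3.0) for _ in range(n)]
    y1 = [y + true_effect for y in y0]
    # Randomized assignment: each unit is treated with probability 0.5.
    treated = [rng.random() < 0.5 for _ in range(n)]
    # Only one potential outcome per unit is observed; the other is counterfactual.
    observed = [y1[i] if treated[i] else y0[i] for i in range(n)]
    true_ate = mean(y1[i] - y0[i] for i in range(n))
    treated_mean = mean(o for o, t in zip(observed, treated) if t)
    control_mean = mean(o for o, t in zip(observed, treated) if not t)
    return true_ate, treated_mean - control_mean

true_ate, estimated_ate = simulate_experiment()
print(f"true ATE = {true_ate:.3f}, difference in means = {estimated_ate:.3f}")
```

The true effect is computable here only because the simulation generates both potential outcomes; with real data the unobserved one has to be handled through design and assumptions.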

2. Bootstrapping and simulation-based inference

One trend in statistics over the past 50 years has been the substitution of computation for mathematical analysis, a shift that began even before the rise of “big data” analysis.

The bootstrap defines an estimator and applies it to a set of randomly resampled datasets. The estimate is treated as an approximate sufficient statistic of the data, and the bootstrap distribution as an approximation to the sampling distribution of the data.

Because the bootstrap is general and computationally simple, it can be applied in settings where traditional analytical approximations are unavailable, which has made it highly influential.
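
As a minimal sketch of the nonparametric bootstrap, the following Python code estimates the standard error of a sample median, a quantity with no simple exact formula. The function name bootstrap_se, the exponential test data, and the number of resamples are illustrative assumptions.

```python
import random
from statistics import median, stdev

def bootstrap_se(data, estimator, n_boot=2000, seed=0):
    """Bootstrap standard error of `estimator`, plus the raw replicates."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Resample n points from the data with replacement.
        resample = rng.choices(data, k=n)
        replicates.append(estimator(resample))
    return stdev(replicates), replicates

# Hypothetical skewed data: 200 draws from an exponential distribution.
data_rng = random.Random(1)
sample = [data_rng.expovariate(1.0) for _ in range(200)]

se_median, replicates = bootstrap_se(sample, median)
print(f"sample median = {median(sample):.3f}, bootstrap SE = {se_median:.3f}")
```

Any other estimator (a trimmed mean, a ratio, a regression coefficient) can be passed in place of median without changing the resampling code, which is the generality the text refers to.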

In a permutation test, resampled datasets are generated by randomly shuffling the target values to break the (possible) dependence between the predictor variables and the target.
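
A permutation test along these lines might look as follows. This is a sketch under assumptions: the test statistic (absolute sample covariance), the simulated data, and the function names are chosen for illustration and do not come from the paper.

```python
import random
from statistics import mean

def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for dependence between x and y."""
    rng = random.Random(seed)
    observed = abs(covariance(x, y))
    y_shuffled = list(y)
    count = 0
    for _ in range(n_perm):
        # Shuffling the target breaks any dependence on the predictor.
        rng.shuffle(y_shuffled)
        if abs(covariance(x, y_shuffled)) >= observed:
            count += 1
    # The +1 terms count the observed arrangement as one permutation.
    return (count + 1) / (n_perm + 1)

# Hypothetical data in which the target really depends on the predictor.
data_rng = random.Random(2)
x = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
y_dependent = [xi + data_rng.gauss(0.0, 1.0) for xi in x]

p_dependent = permutation_p_value(x, y_dependent)
print(f"permutation p-value = {p_dependent:.4f}")
```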

Parametric bootstrapping, prior and posterior predictive checks, and simulation-based calibration all create replicated datasets from a model rather than resampling directly from the data.
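
To contrast this with resampling the data directly, here is a small parametric bootstrap sketch. It assumes an exponential model for hypothetical waiting times; the model choice, the data, and the function name are illustrative. Replicated datasets are drawn from the fitted model, and the estimator is recomputed on each one.

```python
import random
from statistics import mean, stdev

def parametric_bootstrap_rate(data, n_rep=1000, seed=0):
    """Standard error of an exponential rate estimate via parametric bootstrap."""
    rng = random.Random(seed)
    n = len(data)
    # Maximum likelihood estimate of the rate under the assumed exponential model.
    rate_hat = 1.0 / mean(data)
    replicated_rates = []
    for _ in range(n_rep):
        # Simulate a replicated dataset from the fitted model, not from the data.
        replicated = [rng.expovariate(rate_hat) for _ in range(n)]
        replicated_rates.append(1.0 / mean(replicated))
    return rate_hat, stdev(replicated_rates)

# Hypothetical data: 100 waiting times with true rate 0.5.
data_rng = random.Random(4)
waiting_times = [data_rng.expovariate(0.5) for _ in range(100)]

rate_hat, rate_se = parametric_bootstrap_rate(waiting_times)
print(f"estimated rate = {rate_hat:.3f}, parametric bootstrap SE = {rate_se:.3f}")
```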

When analyzing complex models or algorithms, simulation experiments that sample from a known data-generating mechanism are often used to supplement or replace mathematical theory.

3. Overparameterized models and regularization

A major change in statistics has been the fitting of models with large numbers of parameters, using some regularization procedure to obtain stable estimates and good predictions.

The aim is to gain the flexibility of nonparametric or highly parameterized methods while avoiding overfitting. Regularization can be implemented as a penalty function on the parameters or on the predicted curve.
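
As one concrete instance of a penalty on the parameters, the sketch below fits ridge regression (a squared-norm penalty) by coordinate descent in plain Python. Ridge is chosen here only as a simple example; the data, the penalty values, and the function name are assumptions made for illustration.

```python
import random

def ridge_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimize sum_i (y_i - x_i . beta)^2 + lam * sum_j beta_j^2."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta, since beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Inner product of column j with the residual that excludes beta_j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n))
            # Exact one-dimensional minimizer; lam shrinks it toward zero.
            new_beta_j = rho / (col_sq[j] + lam)
            delta = new_beta_j - beta[j]
            for i in range(n):
                residual[i] -= X[i][j] * delta
            beta[j] = new_beta_j
    return beta

# Hypothetical data: 40 observations, 5 predictors, two of them irrelevant.
rng = random.Random(3)
true_beta = [3.0, -2.0, 0.0, 0.0, 1.0]
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(40)]
y = [sum(b * v for b, v in zip(true_beta, row)) + rng.gauss(0.0, 1.0) for row in X]

beta_unpenalized = ridge_coordinate_descent(X, y, lam=0.0)
beta_ridge = ridge_coordinate_descent(X, y, lam=50.0)
print("no penalty:", [round(b, 2) for b in beta_unpenalized])
print("lam = 50:  ", [round(b, 2) for b in beta_ridge])
```

Increasing lam pulls every coefficient toward zero, trading a little bias for more stable estimates.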

Early examples of such richly parameterized models include Markov random fields, splines and Gaussian processes, followed by classification and regression trees, neural networks, wavelet shrinkage, alternatives to least squares such as the lasso, and support vector machines.

Bayesian nonparametric priors on infinite-dimensional families of probability models have also seen great progress. A feature of these models is that they expand as the sample size increases, and their parameters do not always have a direct interpretation but are instead part of a larger predictive system.

4. Bayesian multilevel models

Multilevel or hierarchical models have parameters that vary by group, allowing the model to adapt to cluster sampling, longitudinal studies, time-series cross-sectional data, meta-analysis, and other structured settings.

Multilevel models can be seen as Bayesian because they include probability distributions for unknown latent characteristics or varying parameters. Conversely, Bayesian models have a multilevel structure, with distributions for the data given the parameters and for the parameters given the hyperparameters.
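
The effect of this structure can be illustrated with a simple normal hierarchical model, in which each group estimate is pulled toward a common mean by an amount that depends on how noisy it is. The sketch below treats the between-group standard deviation tau as known, which is a simplifying assumption; the group estimates and standard errors are made-up numbers, and the function name is hypothetical.

```python
def partial_pooling(y, sigma, tau):
    """Partially pooled group means for y_j ~ N(theta_j, sigma_j^2),
    theta_j ~ N(mu, tau^2), with tau treated as known (an assumption)."""
    # Estimate mu by a precision-weighted average of the group estimates.
    weights = [1.0 / (s ** 2 + tau ** 2) for s in sigma]
    mu_hat = sum(w * yj for w, yj in zip(weights, y)) / sum(weights)
    theta_hat = []
    for yj, s in zip(y, sigma):
        precision_data = 1.0 / s ** 2
        precision_prior = 1.0 / tau ** 2
        # Conditional posterior mean: a compromise between y_j and mu_hat.
        theta_hat.append(
            (precision_data * yj + precision_prior * mu_hat)
            / (precision_data + precision_prior)
        )
    return mu_hat, theta_hat

# Made-up group estimates and their standard errors, for illustration only.
group_estimates = [12.0, 3.0, -4.0, 8.0, 20.0]
standard_errors = [4.0, 6.0, 10.0, 3.0, 12.0]

mu_hat, theta_hat = partial_pooling(group_estimates, standard_errors, tau=5.0)
for raw, pooled in zip(group_estimates, theta_hat):
    print(f"raw = {raw:6.2f}  partially pooled = {pooled:6.2f}")
```

Groups with large standard errors are shrunk most strongly toward the overall mean; a full Bayesian analysis would also put a distribution on tau rather than fixing it.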

Similarly, Bayesian inference serves not only as a way to combine prior information with data, but also as a way to account for uncertainty in inference and decision-making.

5. Generic computation algorithms

Innovative statistical algorithms have been developed in the context of the structure of statistical problems. The EM algorithm, Gibbs sampling, particle filters, variational inference, and expectation propagation each exploit the conditional independence structure of statistical models in different ways.

The Metropolis-Hastings algorithm and Hamiltonian Monte Carlo were less directly motivated by statistical concerns. They have been adapted to statistical computing in much the same way that optimization algorithms were adopted earlier to compute least squares and maximum likelihood estimates.
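
For readers unfamiliar with these samplers, a random-walk Metropolis algorithm (the symmetric-proposal special case of Metropolis-Hastings) can be sketched as follows. The normal target with mean 3 and standard deviation 2, the step size, and the burn-in length are all illustrative assumptions.

```python
import math
import random
from statistics import mean, stdev

def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal centered at the current state.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples

def log_target(x):
    """Unnormalized log density of a Normal(3, 2^2) target (illustrative)."""
    return -0.5 * ((x - 3.0) / 2.0) ** 2

draws = metropolis(log_target, start=0.0, n_samples=20_000, step=2.5)
kept = draws[2_000:]  # discard an initial burn-in portion
print(f"sample mean = {mean(kept):.2f}, sample sd = {stdev(kept):.2f}")
```

Only an unnormalized density is needed, which is why the same code can be pointed at a posterior distribution whose normalizing constant is unknown.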

The method known as approximate Bayesian computation obtains posterior inference by simulating from the generative model rather than evaluating the likelihood function. It is useful when the analytic form of the likelihood is intractable or very costly to compute.
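
The simplest version of this idea is rejection sampling, sketched below for a toy problem: inferring a success probability from a hypothetical 14 successes in 20 trials under a uniform prior. The data, the tolerance, and the function name are assumptions for illustration. The likelihood is never evaluated; parameter draws are kept only when the data they simulate match the observed data.

```python
import random
from statistics import mean

def abc_rejection(k_observed, n_trials, n_draws=20_000, tolerance=0, seed=0):
    """Rejection ABC for a binomial success probability with a Uniform(0, 1) prior."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        # 1. Draw a candidate parameter from the prior.
        theta = rng.random()
        # 2. Simulate a dataset from the generative model.
        k_simulated = sum(rng.random() < theta for _ in range(n_trials))
        # 3. Keep the candidate if the simulated summary is close to the observed one.
        if abs(k_simulated - k_observed) <= tolerance:
            accepted.append(theta)
    return accepted

# Hypothetical observation: 14 successes in 20 trials.
posterior_draws = abc_rejection(k_observed=14, n_trials=20)
posterior_mean = mean(posterior_draws)
print(f"accepted {len(posterior_draws)} draws, posterior mean = {posterior_mean:.3f}")
```

In this toy case the exact posterior is Beta(15, 7) with mean 15/22, so the approximation can be checked; the method earns its keep in problems where no such closed form exists.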

6. Adaptive decision analysis

The development of adaptive decision analysis can be traced through utility maximization, error-rate control, and empirical Bayes analysis, and on to Bayesian decision theory and false discovery rate analysis.

Some important developments in statistical decision analysis involve Bayesian optimization and reinforcement learning, which are related to a revival of experimental design for A/B testing.
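
One simple example of an adaptive approach to A/B testing is Thompson sampling, sketched below for two variants with binary outcomes. The paper summary does not name this technique; it is used here only as an illustration, and the conversion rates, number of rounds, and function name are hypothetical.

```python
import random

def thompson_sampling(true_rates, n_rounds=2000, seed=0):
    """Thompson sampling for Bernoulli arms with Beta(1, 1) priors."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_rounds):
        # Draw one plausible rate per arm from its current Beta posterior.
        draws = [rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(k)]
        # Play the arm whose sampled rate is highest.
        arm = draws.index(max(draws))
        # Observe a simulated outcome and update that arm's posterior counts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]

# Hypothetical conversion rates for variants A and B (unknown to the algorithm).
pulls = thompson_sampling([0.3, 0.6])
print(f"variant A shown {pulls[0]} times, variant B shown {pulls[1]} times")
```

Unlike a fixed 50/50 split, the allocation adapts as evidence accumulates, so most traffic ends up on the better-performing variant.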

Growth in computing power has made it possible to use richly parameterized models such as Gaussian processes and neural networks as priors over functions and to perform reinforcement learning at large scale, for example to create AI that controls robots, generates text, and plays games such as Go.

Much of this work has been done outside of statistics, with methods such as non-negative matrix factorization, nonlinear dimensionality reduction, generative adversarial networks, and autoencoders, all of which are unsupervised learning methods for finding structures and decompositions.

7. Robust inference

The idea of robustness is central to modern statistics: models can still be useful even when their assumptions are incorrect.

An important part of statistical theory is the development of methods that work well when these assumptions are violated.
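
A classic small illustration is the contrast between the mean and the median when a few observations violate the assumed model. The contamination scheme below (5 gross outliers added to 95 standard normal draws) is a made-up example.

```python
import random
from statistics import mean, median

# Hypothetical data: 95 well-behaved observations centered at 0 ...
rng = random.Random(5)
clean = [rng.gauss(0.0, 1.0) for _ in range(95)]
# ... plus 5 gross outliers that violate the normal model.
outliers = [50.0] * 5
contaminated = clean + outliers

mean_estimate = mean(contaminated)      # pulled strongly toward the outliers
median_estimate = median(contaminated)  # barely affected by them

print(f"mean = {mean_estimate:.2f}, median = {median_estimate:.2f}")
```

The median stays close to the center of the bulk of the data, while five bad points out of a hundred are enough to move the mean by more than two standard deviations of the clean data.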

Generally speaking, the main influence of robustness on statistical research lies not in the development of specific methods but in the idea of evaluating statistical procedures in settings where the data-generating process does not fall within the class of fitted probability models.

Researchers' concerns about robustness are relevant to the densely parameterized models that are characteristic of modern statistics, with implications for model evaluation more generally.

8. Exploratory data analysis

Exploratory data analysis emphasizes the limitations of asymptotic theory and the corresponding benefits of open-ended exploration and communication. This fits a view of statistical modeling that is focused more on discovery than on the testing of fixed hypotheses.

Advances in computing enable practitioners to build large and complex models quickly, creating a need for statistical graphics that help in understanding the relationships between data, fitted models, and predictions.

Summary

Because the demands of modeling inevitably grow with computing power, the value of analytical summaries and approximations grows as well.

At the same time, statistical theory can help understand the working principles of statistical methods, and mathematical logic can inspire new models and methods for data analysis.

The authors believe that these methods have opened up new ways of thinking about statistics and new methods of data analysis.

The counterfactual framework places causal inference within a statistical or predictive framework in which causal estimands can be precisely defined and expressed in terms of unobserved data in a statistical model, linking it to ideas in survey sampling and missing-data imputation.

The bootstrap opens the door to a form of implicit nonparametric modeling. It can be used for bias correction and variance estimation in complex surveys, experimental designs, and other data structures where analytical calculations are not possible.

Overparameterized models and regularization formalize and generalize the existing practice of limiting a model's size based on the ability to estimate its parameters from the data, which is related to cross-validation and information criteria. Regularization allows users to include more predictors in a model with less concern about overfitting.

Multilevel models formalize the “empirical Bayes” technique of estimating a prior distribution from data, allowing such methods to be used with greater computational and inferential stability across a much wider class of problems.

Generic computation algorithms enable applied practitioners to quickly fit advanced models for causal inference, multilevel analysis, reinforcement learning, and many other areas, giving core ideas in statistics and machine learning a broader impact.

Adaptive decision analysis links the engineering problem of optimal control with the field of statistical learning, going far beyond classical experimental design.

Robust inference frames questions of inferential stability in a way that allows formal evaluation and modeling of different procedures, addressing otherwise vague concerns about outliers and model misspecification. Its ideas have also informed nonparametric estimation.

Exploratory data analysis pushes graphical techniques and discovery into the mainstream of statistical practice, where these tools are used to better understand and diagnose problems in the new, complex classes of probability models being fit to data.

About the author

Andrew Gelman is a professor of statistics at Columbia University. He has received the Outstanding Statistical Application Award from the American Statistical Association and the award for outstanding contributions by a person under the age of 40 from the Committee of Presidents of Statistical Societies (COPSS).
