Applications of Bootstrapping
A basic introduction to bootstrapping and real-world applications.
Introduction
Bootstrapping is a resampling technique used to estimate the distribution of a statistic from independent observations. It differs from traditional statistical theory in that it relies on modern computing power. In this article, we review the evolution of bootstrapping, how it works, and examples of how it has been put into practice to help solve complex real-world problems.
The Evolution of Bootstrapping
Bootstrapping emerged in the 20th century as an extension of jackknife resampling. The jackknife was developed in 1949 by Maurice Quenouille and expanded in 1959 by John Tukey. It was named the “jackknife” because, while it was a useful tool, it was not ideal, much like the folding knives that many men carried around during the Cold War (LaFontaine, 2021).
The jackknife is a non-parametric method that aims to reduce the bias of an estimator or approximate an unknown standard error. Given a sample of size n, it estimates the bias and variance of a statistic of interest by computing n leave-one-out statistics, each from a subsample of size (n − 1) that excludes one observation. We first calculate θ_hat from the original sample, then calculate θ_i_hat for each subsample in which observation x_i is excluded, for i = 1, 2, …, n.
To test the null hypothesis that the statistic is centered at some point (Efron, 1979), we take the average of these leave-one-out statistics to locate the center of the sampling distribution (LaFontaine, 2021).
The potential of the jackknife to justify inferences made on parameters gained traction in the statistical community (LaFontaine, 2021). Rupert Miller, a Stanford University professor, wrote multiple papers on jackknifing. He evidently influenced his Ph.D. student, the American statistician Bradley Efron, who officially introduced bootstrapping in 1977 (Wasserman & Bockenholt, 1989). Like the jackknife, bootstrapping is based on the concept of using samples to create a sampling distribution (LaFontaine, 2021). While bootstrapping had advantages (such as randomization), it took time to gain popularity. But it was hard to ignore.
The goal of sampling is to accurately represent the population we want to make inferences about. Once we have samples, we can make inferences. The accuracy of those inferences depends on these assumptions:
- The sampling distribution of the estimated parameter is normal
- The estimated standard error of the parameter is close to the true standard error of its sampling distribution
- The estimated parameter has little bias (LaFontaine, 2021)
For some parameters (like the mean), we can verify these assumptions relatively easily using the Central Limit Theorem and other properties of the parameter. For other parameters (like the median), verifying these assumptions is harder: some parameters lack a theorem establishing the normality of their sampling distribution, or their bias cannot be established. Since bootstrapping makes few to no assumptions about the distribution of the underlying population, it is a powerful tool that lets us approximate a sampling distribution for statistical inference even when the assumptions above are not met.
While many statistical tools require knowledge of the underlying data distribution, bootstrapping does not.
So What Exactly is Bootstrapping?
Bootstrapping is a statistical technique that uses random sampling with replacement to estimate the sampling distribution of a given statistic.
Before exploring further, let's review some sampling concepts.
- Sampling: selecting a subset of items from a given set of data (the population) to estimate a characteristic of the population as a whole. Sampling error describes how well a random sample actually represents the population
- Sampling with Replacement: once we draw an item from the data, we put it back before drawing again (a quick code sketch follows this list)
- Random Sampling: each item is chosen by chance, so every draw is independent of the others
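A quick sketch of these ideas with NumPy (the toy population and sample sizes are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
population = [10, 20, 30, 40, 50]

# Sampling without replacement: each item can be drawn at most once
no_replacement = rng.choice(population, size=3, replace=False)

# Sampling with replacement: each item is "put back" before the next draw,
# so the same value can appear more than once and every draw is independent
with_replacement = rng.choice(population, size=3, replace=True)

print(no_replacement, with_replacement)
```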
Why is Sampling in Bootstrap Useful?
If you don't think the bootstrap method is important at first, you're not alone. In the early 1980s, bootstrapping was not widely trusted by statisticians. They were not only uncomfortable assuming a sample could represent a population, but also reluctant because of the tedious computation required by hand (LaFontaine, 2021). The software of the day simply could not support a technique that combined statistical inference with computing power.
Luckily, a lack of software capability is no longer an issue. So, what's the point of creating bootstrap samples?
Let's say your company is performing poorly, and you suspect this is related to employee happiness. You want to ask your employees to rate their happiness on a scale of 1–10.
If you're a small start-up of 15 people, that seems doable. But what if you're a large corporation with thousands of employees? No one has time to interview that many people. Instead of speaking to each employee, we can collect the happiness scores of a random sample of n employees. If we do this B times, we obtain the happiness scores of n × B employees and can use them as an estimate of “happiness” across the company.
This is the basic idea of Bootstrap Sampling!
Breaking Down the Bootstrap Method
Recapping, the basic idea of bootstrapping is that given some sample data of size n, we repeatedly draw independent samples of size n with replacement, calculate the estimator of the parameter θ on each resample, and use the resampled estimates to make inferences about the population (Yen, 2019).
Let's look at each step in more detail.
Step One: Original Sample of Size n
Gather an original sample of size n. Let x_1, x_2, …, x_n represent independent random variables.
Step Two: B Bootstrap Samples, Each of Size n
Create a bootstrap sample of size n by randomly and independently drawing elements from the original sample with replacement, so that each element has probability 1/n of being drawn on every draw. We do this B times to create B bootstrap samples. Let a bootstrap sample be represented by X* = [X_1*, X_2*, …, X_n*].
Step Three: Estimator, θ
We can estimate our population parameter with many point estimators (the “best estimate”). Point estimators include the sample mean, median, and variance, as well as Bayesian and maximum-likelihood estimators. For each bootstrap sample X*, we calculate the bootstrapped statistic θ_hat* (Wasserman & Bockenholt, 1989). By applying the same statistic to each bootstrap sample, we create a sampling distribution that helps us understand the shape and center of our data (LaFontaine, 2021).
We can evaluate the accuracy of an estimator to get a better gauge of the accuracy of our inference (Yen, 2019). How do we know our estimator is accurate? To help evaluate it, we can calculate the standard deviation of the bootstrapped statistics to get the standard error of the statistic (Wasserman & Bockenholt, 1989). The standard error tells us how far our sample estimate is likely to be from the true value (Yen, 2019). We estimate the standard error of θ_hat as

SE_boot(θ_hat) = sqrt( (1/(B − 1)) × Σ_{b=1}^{B} (θ_b* − θ_bar*)² ), with θ_bar* = (1/B) × Σ_{b=1}^{B} θ_b*,

where θ_1*, …, θ_B* are calculated from the B bootstrap samples of size n. In general, the larger B is, the more accurate our approximation becomes (NTU).
Step Four: Inference
In this step we draw further inferences from our sampling distribution. For example, we can estimate the standard error of θ_hat or construct a confidence interval for θ. We build the confidence interval around the original sample statistic and use the bootstrapped sampling distribution to justify the accuracy of the interval (LaFontaine, 2021).
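Putting the four steps together, here is a minimal sketch in Python. The happiness scores, the choice of the mean as the estimator, B = 2,000, and the 95% percentile interval are all illustrative assumptions, not a prescribed recipe:

```python
import numpy as np

rng = np.random.default_rng(1)

# Step one: an original sample of size n (made-up happiness scores on a 1-10 scale)
scores = np.array([7, 4, 8, 6, 9, 5, 7, 6, 8, 3, 7, 5, 6, 8, 7])
n = len(scores)

# Steps two and three: B bootstrap samples of size n, and the estimator on each
B = 2000
boot_means = np.empty(B)
for b in range(B):
    resample = rng.choice(scores, size=n, replace=True)  # draw with replacement
    boot_means[b] = resample.mean()                      # bootstrapped statistic

# Step four: inference from the bootstrap sampling distribution
std_error = boot_means.std(ddof=1)                        # bootstrap standard error
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])  # 95% percentile interval

print(f"Sample mean: {scores.mean():.2f}")
print(f"Bootstrap standard error: {std_error:.3f}")
print(f"95% percentile interval: ({ci_low:.2f}, {ci_high:.2f})")
```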
Let's take a look at a bootstrap example and its notation. We have X_1, X_2, …, X_n independent and identically distributed samples with some distribution F from a k-dimensional population. We want to estimate the sampling distribution

H_n(x) = P{ R_n(X_1, …, X_n; F) ≤ x },

where R_n is a real-valued functional of the data and F, constructed from T_n, the statistic of interest.

We represent each bootstrap sample as X_1*, X_2*, …, X_n*, drawn from F_n, the empirical distribution, with R_n* and T_n* defined from the bootstrap samples in the same way that R_n and T_n are defined from the original samples. The empirical distribution is built from our original samples (X_1, X_2, …, X_n) by placing mass 1/n at each X_i. Thus, we represent F_n as

F_n(x) = (1/n) × Σ_{i=1}^{n} I(X_i ≤ x),

where −∞ < x < ∞. We define the bootstrap estimator of H_n as

H_BOOT(x) = P*{ R_n(X_1*, …, X_n*; F_n) ≤ x },

where P* denotes probability computed under resampling from F_n.
This example produces a bootstrap statistic that assesses the error of the primary statistical results (for example, standard errors or confidence intervals). It is an example of the non-parametric bootstrap, since the bootstrap samples are generated from F_n.
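As a concrete (and purely illustrative) choice of R_n, take R_n = sqrt(n) × (sample mean − population mean). A minimal sketch of approximating H_BOOT by Monte Carlo, with made-up exponential data, might look like this:

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up original sample from a clearly non-normal distribution
n = 40
x = rng.exponential(scale=2.0, size=n)
x_bar = x.mean()

# Approximate H_BOOT(t) = P*{ sqrt(n) * (mean(X*) - mean(X)) <= t }
B = 5000
r_star = np.empty(B)
for b in range(B):
    x_star = rng.choice(x, size=n, replace=True)      # resample from F_n
    r_star[b] = np.sqrt(n) * (x_star.mean() - x_bar)  # bootstrap version of R_n

def H_boot(t):
    """Monte Carlo estimate of the bootstrap distribution at t."""
    return np.mean(r_star <= t)

print(H_boot(-1.0), H_boot(0.0), H_boot(1.0))
```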
It’s important to note that there are different types of bootstrapping. To list a few,
- Nonparametric bootstrapping makes no assumptions about how your observations are distributed; with small samples, this can lead to biased estimates
- Parametric bootstrapping is when you resample from a known distribution family, but use the sample to estimate its parameters (see the sketch after this list)
- Bayesian bootstrapping is when new data are created by re-weighting the original data; the posterior distribution of a parameter is simulated rather than the sampling distribution of a statistic estimating that parameter. To clarify, the “posterior” probability is the probability after we incorporate outside information. For example, two individuals are playing wiffle ball and we want to predict who is more likely to hit a home run. Before knowing anything about them, we might assume their probabilities are similar (the prior probability). After finding out that person 1 is a professional baseball player, we would assume person 1 is more likely to hit a home run (the posterior probability)
- Smooth bootstrapping is when a small amount of random noise centered at zero is added to each resampled observation
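To make the parametric flavor concrete, here is a minimal sketch. The normal model and the median as the statistic of interest are illustrative assumptions; the key difference from the non-parametric version is that resamples come from the fitted distribution rather than from the data themselves:

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up observed data
x = rng.normal(loc=5.0, scale=2.0, size=30)

# Parametric bootstrap: assume a normal family and estimate its parameters
mu_hat, sigma_hat = x.mean(), x.std(ddof=1)

B, n = 2000, len(x)
boot_medians = np.empty(B)
for b in range(B):
    # Resample from the fitted N(mu_hat, sigma_hat), not from the observations
    x_star = rng.normal(loc=mu_hat, scale=sigma_hat, size=n)
    boot_medians[b] = np.median(x_star)

print("Bootstrap standard error of the median:", boot_medians.std(ddof=1))
```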
A Very Simple Python Example of Bootstrap
Say you have a fair die with 6 sides. On your first roll, the probability of rolling a 3 is 1/6. On your second roll, the probability of rolling a 3 is still 1/6. This is the idea behind sampling with replacement: each draw leaves the probabilities unchanged. If we want to see the outcome of rolling a die 1,000 times, we're better off using Python than tediously rolling a die 1,000 times and straining our wrists.
We first generate a list, die_options, which simply holds the values of each side of a fair die.
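A minimal version of that list, consistent with the description, looks like this:

```python
# Each side of a fair six-sided die
die_options = [1, 2, 3, 4, 5, 6]
```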
Next, we want to create B bootstrap samples, each of size n. We can use NumPy's random.choice function, which returns randomly selected elements from a given sequence or list. Since we want more than one random element, we specify the length of each bootstrap sample with size=15. Below, we append 1,000 bootstrap samples to bootstrap_samples.
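One way to write this step (the seed is an added choice for reproducibility; the names follow the description above):

```python
import numpy as np

np.random.seed(42)  # reproducibility; this seed is an arbitrary choice

B = 1000  # number of bootstrap samples

bootstrap_samples = []
for _ in range(B):
    # Draw 15 values from die_options (previous snippet) with replacement
    sample = np.random.choice(die_options, size=15, replace=True)
    bootstrap_samples.append(sample)
```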
Next, we calculate the estimator θ, which in this case will be the mean. For each bootstrap sample, we find the mean and add it to mean_list.
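Continuing the sketch:

```python
# Mean of each bootstrap sample (our estimator, θ)
mean_list = [np.mean(sample) for sample in bootstrap_samples]
```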
Now let’s look at the distribution.
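A simple way to visualize it (the number of bins is an arbitrary choice):

```python
import matplotlib.pyplot as plt

plt.hist(mean_list, bins=30, edgecolor="black")
plt.xlabel("Bootstrap sample mean")
plt.ylabel("Frequency")
plt.title("Distribution of bootstrapped means")
plt.show()
```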
The distribution is what we expect; the Central Limit Theorem says that the sum (or average) of many independent random variables tends toward a normal distribution, even when the underlying data are not normally distributed.
To verify the accuracy of our results, we can then compare the true mean with our bootstrapped mean.
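Continuing the sketch, the true mean of a fair die is 3.5, and the average of the bootstrapped means should sit close to it:

```python
true_mean = np.mean(die_options)       # 3.5 for a fair six-sided die
bootstrapped_mean = np.mean(mean_list)

print(f"True mean: {true_mean}")
print(f"Bootstrapped mean: {bootstrapped_mean:.3f}")
```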
Of course, bootstrapping isn't just helpful for understanding happiness scores and rolling dice. The bootstrap method is a powerful tool with applications in machine learning and important real-world scenarios.
A Real-World Bootstrap Example: The Mind and Body
Psychophysiological research often relies on data analytic techniques such as correlation analysis and analysis of variance. In this section, we take a closer look at bootstrapping and the correlation coefficient (let's call it r).
We want to understand the relationship between two continuous variables. Let n be the size of a random sample from the population, and let x1 = [x11, x12, …, x1n] and x2 = [x21, x22, …, x2n] be our variables. We can use the sample correlation coefficient r to quantify the relationship:

r = Σ_{i=1}^{n} (x1i − x1_bar)(x2i − x2_bar) / sqrt( Σ_{i=1}^{n} (x1i − x1_bar)² × Σ_{i=1}^{n} (x2i − x2_bar)² ),

where x1_bar and x2_bar are the sample means. The issue is that the sampling distribution of r is often complicated, and the usual remedy, transforming r (typically with Fisher's z-transformation), only approximates well for large samples (Wasserman & Bockenholt, 1989).
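As a sketch of bootstrapping r directly (the paired data below are made up; resampling whole index pairs keeps each (x1, x2) observation together):

```python
import numpy as np

rng = np.random.default_rng(4)

# Made-up paired observations
n = 25
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(scale=0.8, size=n)

r_obs = np.corrcoef(x1, x2)[0, 1]  # observed sample correlation

# Bootstrap r by resampling index pairs with replacement
B = 2000
r_star = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)
    r_star[b] = np.corrcoef(x1[idx], x2[idx])[0, 1]

print(f"Observed r: {r_obs:.3f}")
print(f"Bootstrap standard error of r: {r_star.std(ddof=1):.3f}")
print("95% percentile interval:", np.percentile(r_star, [2.5, 97.5]))
```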
“Brain Detector,” a study by Farwell and Donchin (1986), examined the use of the P300 component of the ERP (event-related brain potential) to detect human deception. The P300 component is a brain response elicited when a person recognizes a meaningful stimulus. Farwell and Donchin hypothesized that the P300 could help determine whether a suspect has knowledge of a criminal event (Wasserman & Bockenholt, 1989).
In the study, subjects were presented three types of stimuli under guilty and innocent conditions. In the guilty condition, they were shown probe stimuli (words related to the specific crime), irrelevant stimuli (words unrelated to the specific crime), and target stimuli (stimuli the subject was asked to count). In the innocent condition, subjects were presented stimuli about a crime unknown to them (Wasserman & Bockenholt, 1989). The researchers predicted that the target stimuli would elicit a P300, that the irrelevant stimuli would not, and that the probe stimuli would elicit a P300 only when the subject had guilty knowledge. Each subject was shown 36 unique stimuli (6 probes, 6 targets, 24 irrelevant) 4 times. Their EEG was recorded, and each subject received ERP waveforms under the innocent and guilty conditions.
If the probe and target waveforms were similar, it was concluded that the subject had knowledge of the crime. If the probe and irrelevant waveforms were similar, it was concluded that the subject did not. The correlation coefficient r was used to measure the similarity between waveforms. A significantly positive difference between the probe-target and probe-irrelevant correlations indicates knowledge, whereas a significantly negative difference indicates lack of knowledge (Wasserman & Bockenholt, 1989).
“Bootstrapping: Applications to Psychophysiology,” by Stanley Wasserman and Ulf Bockenholt, introduced the bootstrap to an audience of psychophysiologists using data from the “Brain Detector” study. To bootstrap this problem, 10 waveforms per category (probe, irrelevant, target) were chosen at random with replacement, and the waveforms in each category were averaged. The correlations between the averaged waveforms were then computed, comparing the probe with the target and with the irrelevant averages separately. This process was repeated 1,000 times to obtain the bootstrap estimate of the standard deviation of the correlation differences and various percentiles of the bootstrap distributions.
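A rough sketch of that resampling scheme, using synthetic stand-in “waveforms” rather than the study's ERP data (the waveform shapes, counts, and noise level are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)

T = 100  # time points per synthetic waveform
t = np.linspace(0, 1, T)
p300_like = np.exp(-((t - 0.3) ** 2) / 0.005)  # a bump standing in for a P300

def make_waveforms(k, signal):
    # k noisy copies of a base signal: crude stand-ins for single-trial ERPs
    return signal + rng.normal(scale=0.5, size=(k, T))

probe = make_waveforms(24, p300_like)        # for a "guilty" subject, probes resemble targets
target = make_waveforms(24, p300_like)
irrelevant = make_waveforms(96, np.zeros(T))

B = 1000
diffs = np.empty(B)
for b in range(B):
    # Draw 10 waveforms per category with replacement and average each category
    p_avg = probe[rng.integers(0, len(probe), size=10)].mean(axis=0)
    t_avg = target[rng.integers(0, len(target), size=10)].mean(axis=0)
    i_avg = irrelevant[rng.integers(0, len(irrelevant), size=10)].mean(axis=0)
    # Difference between the probe-target and probe-irrelevant correlations
    diffs[b] = np.corrcoef(p_avg, t_avg)[0, 1] - np.corrcoef(p_avg, i_avg)[0, 1]

print("Bootstrap SD of the correlation difference:", diffs.std(ddof=1))
print("Bootstrap percentiles (2.5, 50, 97.5):", np.percentile(diffs, [2.5, 50, 97.5]))
```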
In the end, this bootstrap analysis found that the observations were correlated over time, and that there was little doubt the subjects had crime-relevant knowledge. The authors also concluded that a straightforward bootstrap method was best suited to this far-from-trivial analytical problem (Wasserman & Bockenholt, 1989).
Conclusion
Bootstrapping is a powerful computer-based tool for drawing statistical inferences without relying on many assumptions. It is widely applied in statistical inference tasks such as regression modeling and confidence interval construction, as well as in machine learning. The use of bootstrapping has greatly improved research opportunities.
The original non-parametric bootstrap method has changed the field of statistics. It has expanded research opportunities by allowing researchers to work with smaller samples and to measure the uncertainty of estimates and inferences more accurately. Bootstrapping is now a common and popular tool among researchers, who can draw inferences from data that would not have been possible otherwise.
Resources:
Efron, B. (1979). "Bootstrap Methods: Another Look at the Jackknife." The Annals of Statistics, Vol. 7, No. 1, 1–26.
LaFontaine, Denise (2021). "The History of Bootstrapping: Tracing the Development of Resampling with Replacement." The Mathematics Enthusiast, Vol. 18, No. 1, Article 8. https://scholarworks.umt.edu/cgi/viewcontent.cgi?article=1515&context=tme
Malato, Gianluca (2019). "The Bootstrap: The Swiss Army Knife of Any Data Scientist." https://medium.com/data-science-reporter/the-bootstrap-the-swiss-army-knife-of-any-data-scientist-acd6e592be13
NTU, "Bootstrap Method" (lecture notes). http://www.math.ntu.edu.tw/~hchen/teaching/LargeSample/notes/notebootstrap.pdf
Wasserman, Stanley & Bockenholt, Ulf (1989). "Bootstrapping: Applications to Psychophysiology." Psychophysiology, Vol. 26, Issue 2, 208–221.
Yen, Lorna (2019). "An Introduction to the Bootstrap Method." https://towardsdatascience.com/an-introduction-to-the-bootstrap-method-58bcb51b4d60