Central Limit Theorem and Sampling Distribution


Sample Mean as a Random Variable

Suppose that you have a population of size \(N\), and you are randomly sampling \(n\) units from it. Before the sample is actually drawn, the outcome of each sampling unit \(X_i\) (\(i = 1,2,...,n\)) is unknown and is determined by the random sampling process. Thus, \(X_i\) is a random variable, with possible values ranging from the minimum to the maximum value in the population.

Now, let's say that you collected \(n\) sample data points from a random sampling and denote them by \(x_1, x_2, ..., x_n\). The collected data points at hand are the realizations of \(n\) independently and identically distributed (hereinafter i.i.d) random variables: \(X_1 = x_1, X_2 = x_2, ..., X_n = x_n\). 

Using these realized sample values, we calculate the sample mean as:

\(\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i\)

It's important to note that the sample mean \(\bar{x}\) is itself a realization of another random variable, denoted as \(\bar{X}\). If you take another set of samples from the same population in the same, independent manner, you will obtain a different set of values \(x_1, x_2, ..., x_n\), resulting in a sample mean \(\bar{x}\) that differs from the first.

Put simply, the sample mean \(\bar{X}\) itself is a random variable rather than a fixed number. The variability of \(\bar{X}\) arises from the random sampling process of \(X_i\). Thus, if you independently repeat the sampling process multiple times, you will observe a range of different sample means.

For example, consider a scenario where you are sampling from a population whose distribution looks like the one below:

Seoul, South Korea, operates a public bike rental system with more than 800 stations scattered across the city. This system aims to improve urban mobility and convenience for residents and visitors alike. To predict demand and ensure a stable supply of rental bikes, the city has been recording the number of bike rentals, along with weather and calendar information, on an hourly basis. The histogram above describes the distribution of 8,465 historical records of the number of rented bikes.

Now, let's assume that this data set represents the population of our interest[1]. From these 8,465 population data points, we can draw a random sample of 5 values as follows:

/* Randomly select 5 observations out of 8,465 */
PROC SURVEYSELECT DATA=BikeSharing METHOD=SRS N=5 SEED=1123 OUT=Sample_1;
TITLE "Bike Rental Sample";
RUN;

PROC PRINT DATA=Sample_1;
TITLE "List of selected data points";
VAR RentedBikes;
RUN;

PROC MEANS DATA=Sample_1; TITLE;
VAR RentedBikes;
RUN;

The SURVEYSELECT procedure selects 5 observations from the population using the simple random sampling (SRS) method. The MEANS procedure then calculates the sample mean of these 5 selected values, yielding a result of 476.4.

This sample mean of 476.4 is just one realization of the random variable \(\bar{X}\); if you draw another sample with a different seed, you will obtain a different result:

PROC SURVEYSELECT DATA=BikeSharing METHOD=SRS N=5 SEED=2234 OUT=Sample_2;
TITLE "Bike Rental Sample";
RUN;

PROC PRINT DATA=Sample_2;
TITLE "List of selected data points";
VAR RentedBikes;
RUN;

PROC MEANS DATA=Sample_2;
TITLE;
VAR RentedBikes;
RUN;

After changing the seed, we obtained another sample mean of 1,022. You can repeat the sampling process as many times as you like, each time obtaining a new realization of \(\bar{X}\). Using SAS, let's draw 1,000 different samples:

%MACRO seed_generator;
/* Draw a random integer seed; RANUNI(0) is itself seeded from the system clock */
DATA _NULL_;
X = RANUNI(0);
Seed = CEIL(X * (2**31 - 1)); /* SEED= expects a positive integer */
CALL SYMPUT('sampling_seed', Seed);
RUN;
%MEND seed_generator;

%MACRO srs_sampling(sample_size, sampling_seed);
PROC SURVEYSELECT DATA=BikeSharing METHOD=SRS N=&sample_size SEED=&sampling_seed OUT=SelectedSample NOPRINT;
RUN;
%MEND srs_sampling;

%MACRO repeat_sampling(num_samples, n);
/* Initialize an empty data set that will collect the sample means */
DATA SamplingDist;
ATTRIB _FREQ_ LABEL="Sample Size" RentedBikes LABEL="Sample Mean";
STOP;
RUN;
%DO i=1 %TO &num_samples;
%seed_generator;
%srs_sampling(&n, &sampling_seed);
PROC MEANS DATA=SelectedSample NOPRINT;
VAR RentedBikes;
OUTPUT OUT=SampleMean(WHERE=(_STAT_='MEAN'));
RUN;
/* Append the new sample mean to the running collection */
DATA SamplingDist;
SET SamplingDist SampleMean;
DROP _STAT_ _TYPE_;
RUN;
%END;
%MEND repeat_sampling;

%repeat_sampling(1000, 5);

The program above repeats the same sampling process 1,000 times. Each time, it generates and assigns a new seed for the sampling, ensuring that each of the 1,000 samples is drawn independently of the previous ones.

Using the 1,000 sample mean values obtained from the macro above, we can calculate their average (average of the averages). Let's calculate this through the MEANS procedure and compare it to the population mean:

PROC MEANS DATA=BikeSharing MEAN;
TITLE "Population Mean";
VAR RentedBikes;
RUN;

PROC MEANS DATA=SamplingDist MEAN;
TITLE "Average of 1,000 Sample Means";
VAR RentedBikes;
RUN;

Observe that the average of the 1,000 sample means is fairly close to the actual value of the population mean.

Sampling Distribution of \(\bar{X}\)

Expected Value of Sample Means

In fact, the expected value of the sample mean \(\bar{X}\) is equal to the population mean \(\mu\). Let \(X\) be a random variable with a population mean of \(\mu\), and let \(\bar{X}_n\) denote the sample mean based on \(n\) observations.

\(\begin{aligned} E(\bar{X}_n) &= E(\frac{1}{n}\sum_{i=1}^nX_i) = \frac{1}{n}E(\sum_{i=1}^nX_i) \\ & = \frac{1}{n}E(X_1+X_2+...+X_n) \\ & = \frac{1}{n}[E(X_1)+E(X_2)+...+E(X_n)] \\ & = \frac{1}{n} n \mu = \mu \end{aligned}\)

By the linearity of expectation, \(E(\frac{1}{n}\sum_{i=1}^n X_i) = \frac{1}{n} E(\sum_{i=1}^n X_i)\) and \(E(X_1 + X_2 + ... + X_n) = E(X_1) + E(X_2) + ... + E(X_n)\). Thus, the expected value of \(\bar{X}_n\) is indeed \(\mu\), which means the sample mean is an unbiased estimator of the population mean.
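For instance, with \(n = 2\), the derivation reduces to a one-line check:

\(E(\bar{X}_2) = E\begin{pmatrix}\frac{X_1 + X_2}{2}\end{pmatrix} = \frac{E(X_1) + E(X_2)}{2} = \frac{\mu + \mu}{2} = \mu\)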

Standard Errors

Let \(X\) be a random variable with a population variance of \(\sigma^2\), and let \(\bar{X}_n\) denote the sample mean based on \(n\) observations. With this information, we can determine the variance of the sample mean \(\bar{X}_n\) as below:

\(\begin{aligned} Var(\bar{X}) & = Var(\frac{1}{n}(X_1 + X_2 + ... + X_n))\\ & = \frac{1}{n^2} Var(X_1 + X_2 + ... + X_n)\\ & = \frac{1}{n^2}(Var(X_1) + Var(X_2) + ... + Var(X_n))\\ & = \frac{1}{n^2} \times nVar(X) = \frac{\sigma^2}{n}\end{aligned}\)

Since \(X_i \stackrel{i.i.d}{\sim} F\), where \(F\) is the CDF of the unknown population distribution, the \(X_i\) are mutually independent, so \(Var(X_1 + X_2 + ... + X_n) = Var(X_1) + Var(X_2) + ... + Var(X_n)\).

This shows how \(Var(\bar{X})\) relates to \(\sigma^2\) and \(n\): as the sample size increases, the sample means cluster more tightly together (lower variance). This aligns with the intuitive understanding of the variability of \(\bar{X}\); each outcome of \(\bar{X}\) will be more tightly clustered around \(\mu\) as \(n\) approaches the population size \(N\), because larger samples better represent the entire population.

If you take the square root of \(Var(\bar{X})\), you obtain the std. dev of the sample means, commonly referred to as the standard error, \(\sigma/\sqrt{n}\). 
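To see this numerically, here is a minimal check, assuming the SamplingDist data set produced by the %repeat_sampling(1000, 5) call above is still available: the std. dev of the 1,000 sample means should land near \(\sigma/\sqrt{5}\), i.e., the population std. dev shrunk by a factor of about 2.24.

/* Population std. dev (sigma) */
PROC MEANS DATA=BikeSharing STD;
TITLE "Population Std. Dev";
VAR RentedBikes;
RUN;

/* Empirical standard error: std. dev of the 1,000 sample means (n = 5) */
PROC MEANS DATA=SamplingDist STD;
TITLE "Empirical Standard Error (n = 5)";
VAR RentedBikes;
RUN;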

Normal Distribution

Before we proceed to the central limit theorem, let's briefly review the normal distribution, also known as the Gaussian distribution, named after the German mathematician Johann Carl Friedrich Gauss. It is characterized by its well-known bell-shaped probability density function:

\(f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp{\begin{pmatrix}-\frac{(x-\mu)^2}{2\sigma^2}\end{pmatrix}}\)

and cumulative distribution function of:

\(F_X(x) = \int_{-\infty}^x f_X(t)\,dt = \int_{-\infty}^x \frac{1}{\sqrt{2\pi\sigma^2}} \exp{\begin{pmatrix}-\frac{(t-\mu)^2}{2\sigma^2}\end{pmatrix}}dt\)

Specifically, when a random variable \(X \sim N(\mu, \sigma^2)\), we can standardize it by shifting it to be centered at 0 and scaling it to have a standard deviation of 1. The transformed random variable \(Z = \frac{X - \mu}{\sigma}\) follows the standard normal distribution \(N(0,\;1)\).

  • The PDF of \(Z\), denoted as \(\phi(z)\), is represented by \(f_Z(z) = \frac{1}{\sqrt{2\pi}}\exp\begin{pmatrix}-\frac{z^2}{2}\end{pmatrix}\).
  • The CDF of \(Z\), denoted as \(\Phi(z)\), is given by \(F_Z(z) = \int_{-\infty}^z \phi(t) dt\).
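As a quick illustration, SAS's built-in PROBNORM and PROBIT functions evaluate \(\Phi(z)\) and its inverse, so standardized values can be checked without a normal table; a minimal sketch:

/* Evaluate the standard normal CDF and its inverse */
DATA _NULL_;
z = 1.96;
p = PROBNORM(z);   /* Phi(1.96), approximately 0.975 */
q = PROBIT(0.975); /* inverse CDF, approximately 1.96 */
PUT p= q=;
RUN;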

The moment generating function (MGF) is a mathematical function used in probability and statistics to characterize a probability distribution. It encodes information about the distribution's moments (mean, variance, skewness, etc.) in a single function. The MGF of the standard normal distribution is:

\(\begin{aligned}M_Z(t) &= E\begin{pmatrix}e^{tZ}\end{pmatrix}\\ &= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-z^2/2} e^{zt} dz \\ &=e^{t^2/2}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-(z-t)^2/2}dz \\ &=e^{t^2/2}\end{aligned}\)

The third equality follows from completing the square, \(zt - \frac{z^2}{2} = \frac{t^2}{2} - \frac{(z-t)^2}{2}\); the remaining integrand is the PDF of \(N(t,\;1)\), which integrates to 1.

One notable property of the MGF is that if the MGF of a random variable \(X\) equals the MGF of another random variable \(Y\), then the two random variables have the same distribution. In other words, the MGF uniquely determines the distribution of a random variable.

Central Limit Theorem

The concept of the sampling distribution of \(\bar{X}\) is directly related to the central limit theorem. In probability theory, the central limit theorem (CLT) states that for random variables \(X_1, X_2, ..., X_n\) that are independent and identically distributed with \(E(X_i)=\mu\) and \(Var(X_i) = \sigma^2\) for \(i = 1, 2, ..., n\), \(\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \stackrel{d}{\rightarrow} N(0,\;1)\) as \(n \rightarrow \infty\); informally, \(\bar{X}_n\) is approximately \(N(\mu,\;\sigma^2/n)\) for large \(n\). In other words, regardless of the distribution of the original population, the distribution of the sample mean approaches a normal distribution as the sample size \(n\) increases[2]. You can prove this theorem by showing that the MGF of the standardized sample mean converges to that of the standard normal distribution[3], as \(n \rightarrow \infty\).

Let \(X\) be a random variable from an unknown distribution with mean \(\mu\) and std. dev \(\sigma\), and let \(\bar{X}_n\) denote the sample mean of \(X\) based on \(n\) observations. As discussed earlier, the random variable \(\bar{X}_n\) has an expected value of \(\mu\) and a std. dev of \(\frac{\sigma}{\sqrt{n}}\). Using this, let's first standardize \(\bar{X}_n\):

\(\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}\). 

Then, the MGF of the standardized sample mean is:

\(\begin{aligned} M_{\frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma}}(t) & = E\begin{pmatrix} \exp\begin{pmatrix}t\frac{\sqrt{n}(\bar{X}-\mu)}{\sigma}\end{pmatrix}\end{pmatrix}\\ & = E\begin{pmatrix} \exp\begin{pmatrix}t\frac{\sqrt{n}\begin{pmatrix}\frac{\sum_i^n X_i}{n}-\frac{n\mu}{n}\end{pmatrix}}{\sigma}\end{pmatrix}\end{pmatrix}\\ & = E\begin{pmatrix} \exp\begin{pmatrix}\frac{\sum_i^n (X_i-\mu)}{\sigma}\frac{\sqrt{n}}{n}t\end{pmatrix}\end{pmatrix}\\ & = E\begin{pmatrix} \exp\begin{pmatrix}\frac{(X_1 - \mu) + (X_2 - \mu) + ... + (X_n - \mu)}{\sigma}\frac{\sqrt{n}}{n}t\end{pmatrix}\end{pmatrix}\\& = E\begin{pmatrix} \exp\begin{pmatrix}\frac{(X_1 - \mu)}{\sigma}\frac{\sqrt{n}}{n}t\end{pmatrix} \exp\begin{pmatrix}\frac{(X_2 - \mu)}{\sigma}\frac{\sqrt{n}}{n}t\end{pmatrix} \cdots \exp\begin{pmatrix}\frac{(X_n - \mu)}{\sigma}\frac{\sqrt{n}}{n}t\end{pmatrix} \end{pmatrix}\\ & = E\begin{pmatrix} \exp\begin{pmatrix}\frac{(X_1 - \mu)}{\sigma}\frac{t}{\sqrt{n}}\end{pmatrix} \exp\begin{pmatrix}\frac{(X_2 - \mu)}{\sigma}\frac{t}{\sqrt{n}}\end{pmatrix} \cdots \exp\begin{pmatrix}\frac{(X_n - \mu)}{\sigma}\frac{t}{\sqrt{n}}\end{pmatrix} \end{pmatrix}\end{aligned}\)

Since each sampling unit \(X_i\) is independent of the others, the expectation of the product factors into the product of expectations:

\(\begin{aligned} \Rightarrow & E\begin{pmatrix} \exp\begin{pmatrix}\frac{(X_1 - \mu)}{\sigma}\frac{t}{\sqrt{n}}\end{pmatrix} \exp\begin{pmatrix}\frac{(X_2 - \mu)}{\sigma}\frac{t}{\sqrt{n}}\end{pmatrix} \cdots \exp\begin{pmatrix}\frac{(X_n - \mu)}{\sigma}\frac{t}{\sqrt{n}}\end{pmatrix} \end{pmatrix} \\ &= E\begin{pmatrix}\exp\begin{pmatrix}\frac{X_1 - \mu}{\sigma}\frac{t}{\sqrt{n}}\end{pmatrix}\end{pmatrix}E\begin{pmatrix}\exp\begin{pmatrix}\frac{X_2 - \mu}{\sigma}\frac{t}{\sqrt{n}}\end{pmatrix}\end{pmatrix} \cdots E\begin{pmatrix}\exp\begin{pmatrix}\frac{X_n - \mu}{\sigma}\frac{t}{\sqrt{n}}\end{pmatrix}\end{pmatrix}\end{aligned}\)

For \(i = 1, 2, ..., n\), the random variables \(X_i\) are identically distributed, so each factor has the same value. Denoting a generic \(X_i\) as \(X\), we obtain:

\(\Rightarrow M_{\frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma}}(t) = E\begin{pmatrix}\exp\begin{pmatrix} \frac{X-\mu}{\sigma}\frac{t}{\sqrt{n}}\end{pmatrix}\end{pmatrix}^n\)

Recall that the MGF of a random variable \(X\) is defined as \(M_X(t) = E(e^{tX})\). Thus, the MGF of the standardized sample mean \(\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}\) is:

\(\begin{aligned}\Rightarrow M_{\frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma}}(t) & = E\begin{pmatrix}\exp\begin{pmatrix}\frac{X-\mu}{\sigma}\frac{t}{\sqrt{n}}\end{pmatrix}\end{pmatrix}^n \\ & = \begin{pmatrix}M_{\frac{X-\mu}{\sigma}}(\frac{t}{\sqrt{n}})\end{pmatrix}^n\end{aligned}\)

This reveals the relationship between the MGF of the standardized sample mean and that of the standardized \(X\): the former is the \(n\)-th power of the latter, evaluated at \(t/\sqrt{n}\).
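As a sanity check, suppose \(X\) itself were normal, so that \(M_{\frac{X-\mu}{\sigma}}(t) = e^{t^2/2}\). Then the relationship above yields the standard normal MGF exactly, for every \(n\), consistent with footnote [2]:

\(\begin{pmatrix}M_{\frac{X-\mu}{\sigma}}\begin{pmatrix}\frac{t}{\sqrt{n}}\end{pmatrix}\end{pmatrix}^n = \begin{pmatrix}e^{t^2/(2n)}\end{pmatrix}^n = e^{t^2/2}\)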

Now, let \(n \rightarrow \infty\):

\(\begin{aligned}\Rightarrow \lim_{n \rightarrow \infty} M_{\frac{\sqrt{n}(\bar{X}-\mu)}{\sigma}}(t) & = \lim_{n \rightarrow \infty} \begin{pmatrix}M_{\frac{X-\mu}{\sigma}}(\frac{t}{\sqrt{n}})\end{pmatrix}^n \\ & = \exp\begin{pmatrix}\lim_{n \rightarrow \infty} n \ast \log M_{\frac{X-\mu}{\sigma}}(\frac{t}{\sqrt{n}})\end{pmatrix}\end{aligned}\)

Let \(h = \frac{1}{\sqrt{n}}\). Since \(h \rightarrow 0\) as \(n \rightarrow \infty \), we obtain:

\(\begin{aligned}\lim_{n \rightarrow \infty} M_{\frac{\sqrt{n}(\bar{X}-\mu)}{\sigma}}(t) & = \exp\begin{pmatrix}\lim_{n \rightarrow \infty} n \ast \log M_{\frac{X - \mu}{\sigma}}(\frac{t}{\sqrt{n}}) \end{pmatrix}\\ & = \exp\begin{pmatrix}\lim_{h \rightarrow 0}\frac{1}{h^2} \ast \log M_{\frac{X - \mu}{\sigma}}(t \ast h)\end{pmatrix}\\ & = \exp\begin{pmatrix}\lim_{h \rightarrow 0} \frac{\log M_{\frac{X - \mu}{\sigma}}(t \ast h)}{h^2}\end{pmatrix}\end{aligned}\)

Notice that \(\lim_{h \rightarrow 0}M_{\frac{X - \mu}{\sigma}}(t \ast h) = \lim_{h \rightarrow 0} E\begin{pmatrix}e^{th\frac{X-\mu}{\sigma}}\end{pmatrix} = E(e^0) = 1\). Thus, \(\lim_{h \rightarrow 0} \log M_{\frac{X - \mu}{\sigma}}(t \ast h) = \log 1 = 0\). Also, \(\lim_{h \rightarrow 0} h^2 = 0\). Both the numerator and denominator are defined on an open interval including 0 and are differentiable on that interval, so we can apply L'Hospital's rule to the \(\frac{0}{0}\) form.

\(\begin{aligned}\exp \begin{pmatrix}\lim_{h \rightarrow 0} \frac{\log M_{\frac{X - \mu}{\sigma}}(t \ast h)}{h^2}\end{pmatrix} & = \exp \begin{pmatrix}\lim_{h \rightarrow 0} \frac{t \ast M'_{\frac{X - \mu}{\sigma}}(t \ast h)}{2h \ast M_{\frac{X - \mu}{\sigma}}(t \ast h)}\end{pmatrix} \\ & = \exp \begin{pmatrix}\frac{t}{2}\ast\lim_{h \rightarrow 0}\frac{M'_{\frac{X - \mu}{\sigma}}(t \ast h) - 0}{h}\end{pmatrix}\\ & = \exp\begin{pmatrix}\frac{t}{2} \ast \lim_{h \rightarrow 0}\frac{M'_{\frac{X - \mu}{\sigma}}(t \ast h) -M'_{\frac{X - \mu}{\sigma}}(t \ast 0)}{h}\end{pmatrix}\end{aligned}\)

Here the second equality uses \(\lim_{h \rightarrow 0} M_{\frac{X - \mu}{\sigma}}(t \ast h) = 1\), and the third uses \(M'_{\frac{X - \mu}{\sigma}}(0) = E\begin{pmatrix}\frac{X-\mu}{\sigma}\end{pmatrix} = 0\).

Observe that the limit inside the \(\exp\) is the derivative of \(M'_{\frac{X - \mu}{\sigma}}(t\ast h)\) with respect to \(h\) at \(h = 0\), which by the chain rule equals \(t \ast M^{''}_{\frac{X - \mu}{\sigma}}(0)\). Thus:

\(\begin{aligned} \Rightarrow \exp\begin{pmatrix}\frac{t}{2} \ast \lim_{h \rightarrow 0}\frac{M'_{\frac{X - \mu}{\sigma}}(t \ast h) -M'_{\frac{X - \mu}{\sigma}}(t \ast 0)}{h}\end{pmatrix} & = \exp\begin{pmatrix}\frac{t}{2} \ast t \ast M^{''}_{\frac{X - \mu}{\sigma}}(0) \end{pmatrix} \\ & = \exp\begin{pmatrix}\frac{t^2}{2} \ast M^{''}_{\frac{X - \mu}{\sigma}}(0) \end{pmatrix}\end{aligned}\)

In the equation, by the definition of MGF and the properties of expectation and variance:

\(\begin{aligned}M^{''}_{\frac{X-\mu}{\sigma}}(0) & = E\begin{pmatrix}\begin{pmatrix}\frac{X-\mu}{\sigma}\end{pmatrix}^2\end{pmatrix}\\ & = Var\begin{pmatrix}\frac{X-\mu}{\sigma}\end{pmatrix} + \begin{pmatrix}E\begin{pmatrix}\frac{X-\mu}{\sigma}\end{pmatrix}\end{pmatrix}^2\\ & = \frac{1}{\sigma^2}Var(X) + \begin{pmatrix}\frac{E(X) - \mu}{\sigma}\end{pmatrix}^2\end{aligned}\)

Since \(Var(X) = \sigma^2 \) and \(E(X) = \mu \), \(M^{''}_{\frac{X-\mu}{\sigma}}(0) = 1 + 0^2 = 1\). Therefore:

\(\lim_{n \rightarrow \infty} M_{\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma}}(t) = \exp\begin{pmatrix}\frac{t^2}{2}\end{pmatrix}\)

We see that the MGF of the standardized sample mean \(\frac{\sqrt{n}(\bar{X} - \mu)}{\sigma}\) converges to that of the standard normal distribution as \(n \rightarrow \infty \). Therefore, we can conclude that \(\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \stackrel{d}{\rightarrow} N(0,\;1)\); that is, \(\bar{X}_n\) is approximately \(N(\mu,\;\sigma^2/n)\) for large \(n\). The main point here is that we made no assumptions about the distribution of \(X\); all we assumed was that it follows an unknown distribution centered at \(\mu\) with a spread of \(\sigma\)!

Now, using SAS, let's see how the CLT applies. In the following code, the first SGPLOT procedure draws a histogram of the population data (the number of rented bikes, stored in the variable RentedBikes). Then, the macro %repeat_sampling is invoked to draw 1,000 samples of size 10 from this population and calculate their sample means. Finally, the second SGPLOT procedure plots the distribution of the sample means derived from these 1,000 samples.

Notably, despite the original population distribution of rented bikes being far from bell-shaped, the resulting distribution of 1,000 sample means closely approximates a normal distribution. This observation underscores the CLT: the distribution of sample means tends toward normality, regardless of the underlying population distribution.

/* Distribution of Population */
PROC SGPLOT DATA=BikeSharing;
TITLE "Distribution of X";
HISTOGRAM RentedBikes / SCALE=COUNT;
RUN;

%repeat_sampling(1000, 10);

/* Distribution of Sample Means */
PROC SGPLOT DATA=SamplingDist;
TITLE "Distribution of X Bar";
HISTOGRAM RentedBikes / SCALE=COUNT;
DENSITY RentedBikes / TYPE=NORMAL;
XAXIS LABEL = "Sample Means";
RUN;

As discussed earlier, \(\bar{X}\) is approximately normally distributed with a mean equal to \(\mu\) and a std. dev equal to \(\sigma/\sqrt{n}\). Consequently, as the sample size \(n\) increases, the sampling distribution of \(\bar{X}\) has a smaller variance, meaning \(\bar{X}\) is more likely to fall close to the true population mean \(\mu\). This makes intuitive sense because a larger sample provides a more complete picture of the population; if \(n = N\), the variability of \(\bar{X}\) is zero, because a sample of size equal to the population size \(N\) always yields the constant \(\mu\).

Let's check if this is true:

/* Standard error diminishes as n gets larger */
%MACRO hist_sampling_dist(n);
%repeat_sampling(1000, &n);
PROC SGPLOT DATA=SamplingDist;
TITLE "Sampling Distribution of n = &n";
HISTOGRAM RentedBikes / SCALE=COUNT;
DENSITY RentedBikes / TYPE=NORMAL;
DENSITY RentedBikes / TYPE=KERNEL;
XAXIS LABEL = "Sample Means" MIN=0 MAX=1500;
RUN;
%MEND;

%hist_sampling_dist(30);
%hist_sampling_dist(50);
%hist_sampling_dist(100);

In the histograms above, we can observe that the variability of \(\bar{X}\) diminishes as the sample size \(n\) gets larger.
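To put numbers on this shrinkage, here is a minimal companion sketch that reuses the %repeat_sampling macro above (the macro name se_check is an illustrative choice); each reported std. dev should track \(\sigma/\sqrt{n}\):

%MACRO se_check(n);
%repeat_sampling(1000, &n);
/* Std. dev of the 1,000 sample means = empirical standard error */
PROC MEANS DATA=SamplingDist STD;
TITLE "Empirical Standard Error for n = &n";
VAR RentedBikes;
RUN;
%MEND se_check;

%se_check(30);
%se_check(50);
%se_check(100);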


[1] For the problem of determining the average number of bike rentals, technically, this data set cannot fully represent our population of interest. While it includes historical records of bike rentals, our interest extends beyond past records; ideally, the population should include future bike rentals as well, which are inherently unknowable. However, for the purpose of discussing the sampling distribution of means, let's assume this data set represents our population of interest for now.  
[2] If the population you sample from is normally distributed, the sample mean will be exactly normally distributed, regardless of the sample size. On the other hand, even if the population is not normally distributed, given a large enough sample size, the CLT ensures that the sample means will approximate a normal distribution. While there is no strict definition of a "sufficiently large" sample size \(n\), empirically, sample sizes of 30 or more are often considered adequate for the CLT approximation to hold.

[3] This proof only applies to distributions for which the MGF exists. However, the CLT remains applicable even when the population has a distribution whose MGF does not exist. To prove the general case, one uses the characteristic function \(\mathbb{E}(e^{itX})\), which exists for every distribution.
