07.06.2020

Build a confidence interval online. Confidence interval


The confidence interval came to us from the field of statistics. This is a defined range that serves to estimate an unknown parameter with a high degree of reliability. The easiest way to explain this is with an example.

Suppose you need to study some random variable, for example, the time a server takes to respond to a client request. Each time a user types in the address of a particular site, the server responds at a different speed, so the response time under study is random. The confidence interval lets you determine the boundaries of this parameter, after which you can assert that, with a probability of 95%, the server response time will lie in the calculated range.

Or suppose you need to find out how many people know about a company's trademark. Once the confidence interval is calculated, it will be possible to say, for example, that with a 95% probability the share of consumers who know about it lies in the range from 27% to 34%.

Closely related to this term is the confidence level. It represents the probability that the desired parameter falls within the confidence interval, and it determines how wide the range will be. The larger the confidence level, the wider the confidence interval becomes, and vice versa. Usually it is set to 90%, 95% or 99%; 95% is the most popular value.

The width of the interval is also affected by the variance of the observations, and its definition is based on the assumption that the feature under study obeys a normal distribution, also known as the Gaussian law. According to it, a distribution of a continuous random variable whose probability density has the Gaussian (bell-shaped) form is called normal. If the assumption of a normal distribution turns out to be wrong, the estimate may turn out to be wrong as well.

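First, let's figure out how to calculate the confidence interval for the mathematical expectation. Two cases are possible here: the variance (the degree of spread of a random variable) may or may not be known. If it is known, the confidence interval is calculated using the following formula:

$\bar{x} - t \cdot \sigma / \sqrt{n} \le \alpha \le \bar{x} + t \cdot \sigma / \sqrt{n}$, where

α is the estimated parameter (the true mean value of the feature),

$\bar{x}$ is the sample mean and n is the sample size,

t is a parameter taken from the table of the Laplace function (the standard normal distribution),

σ is the square root of the variance.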

If the variance is unknown, it can be calculated when all the values of the feature are known. The following formula is used:

$\sigma^2 = \overline{x^2} - (\bar{x})^2$, where

$\overline{x^2}$ is the average of the squared values of the feature under study,

$(\bar{x})^2$ is the square of the average value of this feature.
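The identity above can be checked numerically. The following is a minimal sketch; the data values are invented purely for illustration, and `variance_shortcut` is a hypothetical helper name, not something from the text.

```python
from statistics import fmean, pvariance


def variance_shortcut(values):
    """Variance as (mean of the squares) minus (square of the mean)."""
    mean_of_squares = fmean(v * v for v in values)
    square_of_mean = fmean(values) ** 2
    return mean_of_squares - square_of_mean


# Hypothetical measurements, invented for illustration only.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(variance_shortcut(sample))  # shortcut formula
print(pvariance(sample))          # library value for comparison
```

Both lines should print the same number, because `statistics.pvariance` computes the same quantity (the variance with divisor n).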

The formula by which the confidence interval is calculated changes slightly in this case:

$\bar{x} - t \cdot s / \sqrt{n} \le \alpha \le \bar{x} + t \cdot s / \sqrt{n}$, where

$\bar{x}$ is the sample mean,

α is the estimated parameter (the true mean value of the feature),

t is a parameter found from the Student's distribution table, t = t(γ; n-1),

$\sqrt{n}$ is the square root of the total sample size,

s is the square root of the sample variance.

Consider an example. Assume that, based on the results of 7 measurements, the mean of the trait under study was found to be 30 and the sample variance 36. We need to find, with a probability of 99%, a confidence interval that contains the true value of the measured parameter.

First, let's determine t: t = t(0.99; 7-1) = 3.71. The standard deviation is the square root of the variance: s = sqrt(36) = 6. Using the formula above, we get:

$\bar{x} - t \cdot s / \sqrt{n} \le \alpha \le \bar{x} + t \cdot s / \sqrt{n}$

$30 - 3.71 \cdot 6 / \sqrt{7} \le \alpha \le 30 + 3.71 \cdot 6 / \sqrt{7}$

$21.587 \le \alpha \le 38.413$
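The arithmetic of this example can be reproduced with a few lines of Python. This is only a sketch: `mean_interval` is a hypothetical helper name, and the critical value 3.71 is taken from the text rather than computed.

```python
from math import sqrt


def mean_interval(mean, std_dev, n, critical_value):
    """Return (lower, upper) as mean -/+ critical_value * std_dev / sqrt(n)."""
    half_width = critical_value * std_dev / sqrt(n)
    return mean - half_width, mean + half_width


s = sqrt(36)  # the standard deviation is the square root of the variance
low, high = mean_interval(mean=30, std_dev=s, n=7, critical_value=3.71)
print(round(low, 3), round(high, 3))  # 21.587 38.413
```

Note that the standard deviation (6), not the variance (36), goes into the formula.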

The confidence interval for the variance is calculated both in the case of a known mean and when there is no data on the mathematical expectation, and only the value of the unbiased point estimate of the variance is known. We will not give here the formulas for its calculation, since they are quite complex and, if desired, they can always be found on the net.

We only note that it is convenient to determine the confidence interval using Excel or an online service designed for this purpose.

Probabilities that are recognized as sufficient to judge the population parameters confidently on the basis of sample characteristics are called confidence probabilities.

Usually the values 0.95, 0.99 and 0.999 are chosen as confidence probabilities (they are usually expressed as percentages: 95%, 99%, 99.9%). The higher the stakes, the higher the confidence level that is chosen: 99% or 99.9%.

A confidence level of 0.95 (95%) is considered sufficient in scientific research in the field of physical culture and sports.

The interval in which the arithmetic mean of the general population lies with a given confidence probability is called the confidence interval.

The significance level is a small number α that gives the probability that the estimated value lies outside the confidence interval. In accordance with the confidence probabilities: α1 = 1 - 0.95 = 0.05; α2 = 1 - 0.99 = 0.01, and so on.

Confidence interval for the mean (expectation) a of a normal distribution:

$\bar{x} - t_\gamma \cdot s / \sqrt{n} < a < \bar{x} + t_\gamma \cdot s / \sqrt{n}$,

where γ is the reliability (confidence probability) of the estimate; $\bar{x}$ is the sample mean; s is the corrected standard deviation; n is the sample size; $t_\gamma$ is the value determined from the Student's distribution table (see Appendix, Table 1) for the given n and γ.

To find the boundaries of the confidence interval of the mean value of the general population, it is necessary:

1. Calculate $\bar{x}$ and s.

2. Set the confidence probability (reliability) of the estimate, γ = 0.95 (95%), or the significance level, α = 0.05 (5%).

3. From the table of Student's t-distribution (Appendix, Table 1), find the boundary value $t_\gamma$.

Since the t-distribution is symmetric about zero, it is sufficient to know only the positive value of t. For example, if the sample size is n = 16, then the number of degrees of freedom (df) of the t-distribution is df = 16 - 1 = 15. According to Table 1 of the Appendix, $t_{0.05} = 2.13$.

4. Find the boundaries of the confidence interval for α = 0.05 and n = 16.

Confidence limits: $\bar{x} \pm 2.13 \cdot s / \sqrt{16}$

For large sample sizes (n ≥ 30), Student's t-distribution approaches the normal distribution. Therefore, the confidence interval for the mean a for n ≥ 30 can be written as follows:

$\bar{x} - u \cdot s / \sqrt{n} < a < \bar{x} + u \cdot s / \sqrt{n}$,

where u are the percentage points of the standard normal distribution.

For the standard confidence probabilities (95%, 99%, 99.9%) and the corresponding significance levels α, the values of u are given in Table 8.

Table 8

Values of u for standard significance levels α

α        u
0.05     1.96
0.01     2.58
0.001    3.29
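The percentage points u do not have to be looked up: they are quantiles of the standard normal distribution at 1 - α/2. A minimal sketch using only the Python standard library (`normal_critical_value` is a hypothetical helper name):

```python
from statistics import NormalDist


def normal_critical_value(alpha):
    """Two-sided critical value u of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(normal_critical_value(alpha), 2))
```

Rounded to two decimals, this gives 1.96, 2.58 and 3.29 for the three significance levels.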

Based on the data of example 1, we determine the boundaries of the 95% confidence interval (α = 0.05) for the average result of a standing vertical jump. In our example, the sample size is n = 65, so the recommendations for a large sample size can be used to determine the boundaries of the confidence interval.

A confidence interval for the mathematical expectation is an interval, calculated from the data, that contains the mathematical expectation of the general population with a known probability. The natural estimate of the mathematical expectation is the arithmetic mean of the observed values, so in what follows we will use the terms "average" and "average value". In problems on calculating the confidence interval, the answer most often required has the form "The confidence interval of the average [value in a specific problem] is from [lower value] to [higher value]". With the help of the confidence interval, it is possible to estimate not only average values but also the share of a particular feature in the general population. Mean values, variance, standard deviation and standard error, through which we will come to new definitions and formulas, are covered in the lesson Sample and Population Characteristics.

Point and interval estimates of the mean

If the mean value of the general population is estimated by a single number (a point), then the specific mean calculated from a sample of observations is taken as the estimate of the unknown population mean. The sample mean is a random variable, so it generally does not coincide with the population mean. Therefore, when reporting the sample mean, the sampling error must be reported at the same time. The standard error, which is expressed in the same units as the mean, is used as the measure of sampling error. Therefore, the following notation is often used: $\bar{x} \pm s_{\bar{x}}$, where $s_{\bar{x}}$ is the standard error of the mean.

If the estimate of the mean needs to be associated with a certain probability, the population parameter of interest must be estimated not by a single number but by an interval. A confidence interval is an interval in which the value of the estimated population parameter lies with a certain probability P. The confidence interval that contains the population mean μ with probability P = 1 - α is calculated as follows:

$\bar{x} - z_{\alpha/2} \cdot \sigma / \sqrt{n} \le \mu \le \bar{x} + z_{\alpha/2} \cdot \sigma / \sqrt{n}$,

where $z_{\alpha/2}$ is the critical value of the standard normal distribution for the significance level α = 1 - P, which can be found in the appendix to almost any book on statistics.

In practice, the population mean and variance are not known, so the population variance is replaced by the sample variance, and the population mean by the sample mean. Thus, in most cases the confidence interval is calculated as follows:

$\bar{x} - z_{\alpha/2} \cdot s / \sqrt{n} \le \mu \le \bar{x} + z_{\alpha/2} \cdot s / \sqrt{n}$.

The confidence interval formula can be used to estimate the population mean if

  • the standard deviation of the general population is known;
  • or the standard deviation of the population is not known, but the sample size is greater than 30.

The sample mean is an unbiased estimate of the population mean. The sample variance, in turn, is not an unbiased estimate of the population variance. To obtain an unbiased estimate of the population variance, the sample size n in the sample variance formula should be replaced with n - 1.
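The difference between the two divisors is easy to see in code. In the sketch below the data are invented for illustration; `statistics.pvariance` divides by n and `statistics.variance` divides by n - 1.

```python
from statistics import pvariance, variance

# Hypothetical observations, invented for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

biased = pvariance(data)   # divisor n
unbiased = variance(data)  # divisor n - 1

print(biased, unbiased)
# The two estimates differ exactly by the factor n / (n - 1).
print(abs(unbiased - biased * n / (n - 1)) < 1e-12)
```

For large n the factor n / (n - 1) is close to 1, so the correction matters mostly for small samples.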

Example 1. Information collected from 100 randomly selected cafes in a certain city shows that the average number of employees in them is 10.5, with a standard deviation of 4.6. Determine the 95% confidence interval for the number of cafe employees.

$10.5 \pm 1.96 \cdot 4.6 / \sqrt{100} = 10.5 \pm 0.9$,

where 1.96 is the critical value of the standard normal distribution for the significance level α = 0.05.

Thus, the 95% confidence interval for the average number of cafe employees is from 9.6 to 11.4.
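Example 1 can be reproduced in Python as follows. This is a sketch under the large-sample assumption used in the text; `z_interval` is a hypothetical helper name, and the critical value is computed with the standard library instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, std_dev, n, confidence=0.95):
    """Normal-approximation confidence interval for a mean."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    half_width = z * std_dev / sqrt(n)
    return mean - half_width, mean + half_width


# Cafe example: n = 100, mean 10.5, standard deviation 4.6.
low, high = z_interval(mean=10.5, std_dev=4.6, n=100)
print(round(low, 1), round(high, 1))  # 9.6 11.4
```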

Example 2 For a random sample from a general population of 64 observations, the following total values ​​were calculated:

sum of values ​​in observations ,

sum of squared deviations of values ​​from the mean .

Calculate the 95% confidence interval for the expected value.

calculate the standard deviation:

,

calculate the average value:

.

Substitute the values ​​in the expression for the confidence interval:

where $z_{\alpha/2} = 1.96$ is the critical value of the standard normal distribution for the significance level α = 0.05.

We get:

Thus, the 95% confidence interval for the mathematical expectation of this sample ranged from 7.484 to 11.266.

Example 3. For a random sample of 100 observations from a general population, a mean of 15.2 and a standard deviation of 3.2 were calculated. Calculate the 95% confidence interval for the expected value, and then the 99% confidence interval. If the sample size and its variation remain the same but the confidence level increases, will the confidence interval narrow or widen?

We substitute these values into the expression for the confidence interval:

$15.2 \pm 1.96 \cdot 3.2 / \sqrt{100} = 15.2 \pm 0.63$,

where 1.96 is the critical value of the standard normal distribution for the significance level α = 0.05.

Thus, the 95% confidence interval for the mean is from 14.57 to 15.83.

Again, we substitute these values into the expression for the confidence interval:

$15.2 \pm 2.576 \cdot 3.2 / \sqrt{100} = 15.2 \pm 0.82$,

where 2.576 is the critical value of the standard normal distribution for the significance level α = 0.01.

Thus, the 99% confidence interval for the mean is from 14.38 to 16.02.

As you can see, as the confidence level increases, the critical value of the standard normal distribution also increases; therefore, the start and end points of the interval lie further from the mean, and the confidence interval for the mathematical expectation widens.
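The effect of the confidence level on the width can be sketched with the numbers of Example 3 (n = 100, standard deviation 3.2). `half_width` is a hypothetical helper name; the critical values come from the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist


def half_width(std_dev, n, confidence):
    """Half-width of the normal-approximation interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * std_dev / sqrt(n)


hw95 = half_width(3.2, 100, 0.95)
hw99 = half_width(3.2, 100, 0.99)
print(round(hw95, 2), round(hw99, 2))  # 0.63 0.82
```

With the same sample, the 99% interval is wider than the 95% interval because its critical value is larger.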

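Point and interval estimates of a proportion

The share of some feature in the sample can be interpreted as a point estimate of the share p of the same feature in the general population. If this value needs to be associated with a probability, then the confidence interval for the proportion p of the feature in the general population should be calculated with probability P = 1 - α:

$\hat{p} - z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p}) / n} \le p \le \hat{p} + z_{\alpha/2} \sqrt{\hat{p}(1 - \hat{p}) / n}$,

where $\hat{p}$ is the share of the feature in the sample and n is the sample size.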

Example 4. In a certain city, two candidates, A and B, are running for mayor. 200 residents of the city were polled at random; 46% answered that they would vote for candidate A, 26% for candidate B, and 28% did not know whom they would vote for. Determine the 95% confidence interval for the proportion of city residents who support candidate A.
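The text does not show the solution to Example 4, so the following is only a sketch of one common approach: the normal-approximation interval for a proportion, p ± z·sqrt(p(1 - p)/n). The function name `proportion_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(p_hat, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# 46% of 200 polled residents support candidate A.
low, high = proportion_interval(p_hat=0.46, n=200)
print(round(low, 3), round(high, 3))  # 0.391 0.529
```

Under this approximation, the support for candidate A in the whole city lies roughly between 39% and 53% at the 95% confidence level.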

Confidence intervals are one of the types of interval estimates used in statistics; they are calculated for a given level of significance. They allow us to state that the true value of an unknown statistical parameter of the general population lies in the obtained range of values with a probability determined by the chosen level of statistical significance.

Normal distribution

When the variance (σ²) of the population is known, a z-score can be used to calculate the confidence limits (the boundary points of the confidence interval). Compared with the t-distribution, the z-score gives a narrower confidence interval, since it is based on the normal distribution and does not have to account for the extra uncertainty of estimating the standard deviation (σ) from the sample.

Formula

To determine the boundary points of the confidence interval when the standard deviation of the population is known, the following formula is used (minus for the lower limit, plus for the upper limit):

$L = \bar{X} \mp Z_{\alpha/2} \cdot \sigma / \sqrt{n}$

Example

Assume that the sample size is 25 observations, the sample mean is 15, and the population standard deviation is 8. For a significance level of α = 5%, the Z-score is $Z_{\alpha/2} = 1.96$. In this case, the lower and upper limits of the confidence interval will be

$L = 15 - 1.96 \cdot 8 / \sqrt{25} = 11.864$
$L = 15 + 1.96 \cdot 8 / \sqrt{25} = 18.136$

Thus, we can state that with a probability of 95% the mathematical expectation of the general population will fall in the range from 11.864 to 18.136.

Methods for narrowing the confidence interval

Let's say the range is too wide for the purposes of our study. There are two ways to narrow the confidence interval.

  1. Lower the confidence level, that is, raise the significance level α.
  2. Increase the sample size.

Raising the significance level to α = 10% (a 90% confidence level), we get a Z-score of $Z_{\alpha/2} = 1.64$. In this case, the lower and upper limits of the interval will be

$L = 15 - 1.64 \cdot 8 / \sqrt{25} = 12.376$
$L = 15 + 1.64 \cdot 8 / \sqrt{25} = 17.624$

And the confidence interval itself can be written as [12.376; 17.624].

In this case, we can state that with a probability of 90% the mathematical expectation of the general population lies in the range from 12.376 to 17.624.

If we want to keep the significance level α unchanged, the only alternative is to increase the sample size. Increasing it to 144 observations, we obtain the following confidence limits

$L = 15 - 1.96 \cdot 8 / \sqrt{144} = 13.693$
$L = 15 + 1.96 \cdot 8 / \sqrt{144} = 16.307$

The confidence interval itself will look like this: [13.693; 16.307].

Thus, narrowing the confidence interval without lowering the confidence level is possible only by increasing the sample size. If the sample size cannot be increased, the confidence interval can be narrowed only by lowering the confidence level (raising α).
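Both ways of narrowing the interval can be compared side by side. The sketch below reuses the numbers of the example (σ = 8) and computes the critical values exactly, so the 90% half-width comes out slightly larger than with the rounded table value 1.64 used in the text. `z_half_width` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def z_half_width(sigma, n, alpha):
    """Half-width Z_{alpha/2} * sigma / sqrt(n) of the interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / sqrt(n)


baseline = z_half_width(sigma=8, n=25, alpha=0.05)           # 95%, n = 25
lower_confidence = z_half_width(sigma=8, n=25, alpha=0.10)   # 90%, n = 25
larger_sample = z_half_width(sigma=8, n=144, alpha=0.05)     # 95%, n = 144

print(round(baseline, 3), round(lower_confidence, 3), round(larger_sample, 3))
```

The half-width shrinks in both cases, but only the larger sample does so without giving up confidence.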

Building a confidence interval when the population standard deviation is unknown

If the population standard deviation is not known and has to be estimated from the sample, the t-distribution is used to construct the confidence interval. This technique is more conservative, which shows up as wider confidence intervals compared with the technique based on the Z-score. Note that the t-distribution still assumes that the data are approximately normally distributed.

Formula

The following formula is used to calculate the lower and upper limits of the confidence interval based on the t-distribution (minus for the lower limit, plus for the upper limit):

$L = \bar{X} \mp t_\alpha \cdot s / \sqrt{n}$

where s is the sample standard deviation.

Student's distribution, or the t-distribution, depends on only one parameter: the number of degrees of freedom, which is equal to the number of observations in the sample minus one (n - 1). The critical value of Student's t for a given number of degrees of freedom and level of statistical significance α can be found in lookup tables.

Example

Assume that the sample size is 25 individual values, the sample mean is 50, and the sample standard deviation is 28. We need to construct a confidence interval for the level of statistical significance α = 5%.

In our case, the number of degrees of freedom is 24 (25-1), so the corresponding tabular value of Student's t for the significance level α = 5% is 2.064. Therefore, the lower and upper bounds of the confidence interval will be

$L = 50 - 2.064 \cdot 28 / \sqrt{25} = 38.442$
$L = 50 + 2.064 \cdot 28 / \sqrt{25} = 61.558$

And the interval itself can be written as [38.442; 61.558].

Thus, we can state that with a probability of 95% the mathematical expectation of the general population lies in the range from 38.442 to 61.558.

As with the Z-score, a confidence interval based on the t-distribution can be narrowed either by lowering the confidence level or by increasing the sample size.

Lowering the confidence level from 95% to 90% in our example, we get the corresponding tabular value of Student's t, 1.711.

$L = 50 - 1.711 \cdot 28 / \sqrt{25} = 40.418$
$L = 50 + 1.711 \cdot 28 / \sqrt{25} = 59.582$

In this case, we can say that with a probability of 90% the mathematical expectation of the general population lies in the range from 40.418 to 59.582.

If we do not want to lower the confidence level, the only alternative is to increase the sample size. Let's say that it is 64 individual observations rather than 25 as in the initial condition of the example. The tabular value of Student's t for 63 degrees of freedom (64-1) and the significance level α = 5% is 1.998.

$L = 50 - 1.998 \cdot 28 / \sqrt{64} = 43.007$
$L = 50 + 1.998 \cdot 28 / \sqrt{64} = 56.993$

This allows us to assert that with a probability of 95% the mathematical expectation of the general population lies in the range from 43.007 to 56.993.
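The three t-based calculations above can be reproduced in one loop. This is a sketch: the Python standard library has no Student's t quantile function, so the critical values are copied from the text as table lookups, and `T_TABLE` and `t_interval` are hypothetical names.

```python
from math import sqrt

# Critical values of Student's t copied from the text:
# (degrees of freedom, significance level alpha) -> t
T_TABLE = {
    (24, 0.05): 2.064,
    (24, 0.10): 1.711,
    (63, 0.05): 1.998,
}


def t_interval(mean, s, n, alpha):
    """Interval mean -/+ t * s / sqrt(n), with t looked up for df = n - 1."""
    t = T_TABLE[(n - 1, alpha)]
    half_width = t * s / sqrt(n)
    return mean - half_width, mean + half_width


for n, alpha in ((25, 0.05), (25, 0.10), (64, 0.05)):
    low, high = t_interval(mean=50, s=28, n=n, alpha=alpha)
    print(n, alpha, round(low, 3), round(high, 3))
```

The printed limits should match the three pairs computed in the text: 38.442 and 61.558, 40.418 and 59.582, 43.007 and 56.993.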

Large Samples

Large samples are samples with more than 100 individual observations. By the central limit theorem, the mean of such a sample is approximately normally distributed even if the distribution of the population is not normal. In addition, for such samples the z-score and the t-distribution give approximately the same results when constructing confidence intervals. Thus, for large samples it is acceptable to use a z-score for a normal distribution instead of a t-distribution.

Summing up

One of the methods for solving statistical problems is the calculation of the confidence interval. It is used as a preferred alternative to point estimation when the sample size is small. The process of calculating the confidence interval is rather complicated, but Excel's tools make it somewhat simpler. Let's find out how this is done in practice.

This method is used in the interval estimation of various statistical quantities. The main task of this calculation is to get rid of the uncertainties of the point estimate.

In Excel, there are two main options for this calculation: when the variance is known and when it is unknown. In the first case, the CONFIDENCE.NORM function is used, and in the second, CONFIDENCE.T.

Method 1: CONFIDENCE.NORM function

The CONFIDENCE.NORM operator, which belongs to the statistical group of functions, first appeared in Excel 2010. Earlier versions of the program use its counterpart, CONFIDENCE. The task of this operator is to calculate a confidence interval with a normal distribution for the population mean.

Its syntax is as follows:

CONFIDENCE.NORM(alpha, standard_dev, size)

"Alpha" is an argument indicating the significance level that is used to calculate the confidence level. The confidence level is equal to the following expression:

(1-"Alpha")*100

"Standard_dev" is an argument whose meaning is clear from its name: it is the standard deviation of the population for the data range, which is assumed to be known.

"Size" is an argument that specifies the size of the sample.

All arguments of this operator are required.

The CONFIDENCE function has exactly the same arguments and capabilities as the previous one. Its syntax is:

CONFIDENCE(alpha, standard_dev, size)

As you can see, the only difference is the name of the operator. This function has been retained in Excel 2010 and newer versions, for compatibility reasons, in the special "Compatibility" category. In Excel 2007 and earlier, it is in the main group of statistical operators.

The confidence interval boundary is determined using a formula of the following form:

X+(-)CONFIDENCE.NORM

where X is the sample mean, which lies in the middle of the interval.

Now let's look at how to calculate the confidence interval using a specific example. 12 tests were carried out, producing different results, which are listed in the table. This is our population. The standard deviation is 8. We need to calculate the confidence interval at the 97% confidence level.

  1. Select the cell where the result of the calculation will be displayed. Click the "Insert Function" button.
  2. The Function Wizard appears. Go to the "Statistical" category and highlight the name "CONFIDENCE.NORM". Then click the OK button.
  3. The arguments window opens. Its fields correspond to the names of the arguments.
    Set the cursor in the first field, "Alpha". Here we should specify the significance level. As we remember, our confidence level is 97%. The significance level is calculated from it as follows:

    (100-confidence level)/100

    That is, substituting the value, we get (100-97)/100.

    By simple calculation, we find that the "Alpha" argument equals 0.03. Enter this value in the field.

    As you know, the standard deviation is equal to 8. Therefore, just enter that number in the "Standard_dev" field.

    In the "Size" field, you need to enter the number of tests performed. As we remember, there are 12 of them. But in order to automate the formula and not edit it every time a new test is performed, let's set this value not as an ordinary number but with the COUNT operator. So, we set the cursor in the "Size" field and then click the triangle located to the left of the formula bar.

    A list of recently used functions appears. If you have used the COUNT operator recently, it should be on this list. In this case, you just need to click its name. Otherwise, go to the item "More functions...".

  4. The already familiar Function Wizard appears. Go to the "Statistical" group again, select the name "COUNT" there and click the OK button.
  5. The arguments window for this operator appears. This function counts the number of cells in the specified range that contain numeric values. Its syntax is the following:

    COUNT(value1, value2,…)

    The "Value" argument group is a reference to the range in which you want to count the cells filled with numeric data. In total, there can be up to 255 such arguments, but in our case we need only one.

    Set the cursor in the "Value1" field and, holding down the left mouse button, select the range on the sheet that contains our population. Its address will then be displayed in the field. Click the OK button.

  6. After that, the application performs the calculation and displays the result in the cell that contains the formula. In our particular case, the formula turned out like this:

    =CONFIDENCE.NORM(0.03,8,COUNT(B2:B13))

    The overall result of the calculation was 5.011609.

  7. But that is not all. As we remember, the boundaries of the confidence interval are calculated by adding the result of CONFIDENCE.NORM to the sample mean and subtracting it from the sample mean. This gives the right and left boundaries of the confidence interval, respectively. The sample mean itself can be calculated using the AVERAGE operator.

    This operator calculates the arithmetic mean of the selected range of numbers. It has the following rather simple syntax:

    AVERAGE(number1, number2,…)

    The "Number" argument can be either a single numeric value or a reference to cells or even entire ranges that contain numbers.

    So, select the cell in which the average value will be displayed and click the "Insert Function" button.

  8. The Function Wizard opens. Go to the "Statistical" category again and select the name "AVERAGE" from the list. As always, click the OK button.
  9. The arguments window opens. Set the cursor in the "Number1" field and, with the left mouse button pressed, select the entire range of values. After the coordinates are displayed in the field, click the OK button.
  10. After that, AVERAGE outputs the result of the calculation to the cell.
  11. We calculate the right boundary of the confidence interval. To do this, select a separate cell, type the "=" sign and add together the contents of the cells that hold the results of the AVERAGE and CONFIDENCE.NORM functions. To perform the calculation, press Enter. In our case, the result of the calculation is 6.953276.

  12. In the same way, we calculate the left boundary of the confidence interval, only this time we subtract the result of CONFIDENCE.NORM from the result of AVERAGE. For our example, the result of the calculation is -3.06994.

  13. We tried to describe all the steps for calculating the confidence interval in detail, so we went through each formula separately. But you can combine all the actions in one formula. The calculation of the right boundary of the confidence interval can be written as follows:

    =AVERAGE(B2:B13)+CONFIDENCE.NORM(0.03,8,COUNT(B2:B13))

  14. A similar calculation of the left boundary looks like this:

    =AVERAGE(B2:B13)-CONFIDENCE.NORM(0.03,8,COUNT(B2:B13))
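For readers who want to check the Excel result outside Excel, here is a minimal Python sketch of the same quantity. It assumes that CONFIDENCE.NORM returns the margin z·σ/√n, where z is the standard normal quantile at 1 - alpha/2; `confidence_norm` is a hypothetical name chosen to mirror the Excel function, not an existing library call.

```python
from math import sqrt
from statistics import NormalDist


def confidence_norm(alpha, standard_dev, size):
    """Margin of error z * standard_dev / sqrt(size), z at 1 - alpha/2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * standard_dev / sqrt(size)


margin = confidence_norm(alpha=0.03, standard_dev=8, size=12)
print(round(margin, 4))  # close to the 5.011609 reported above
```

The interval boundaries are then the sample mean plus and minus `margin`, just as in steps 13 and 14.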

Method 2: CONFIDENCE.T function

In addition, Excel has another function related to the calculation of the confidence interval: CONFIDENCE.T. It has been available only since Excel 2010. This operator calculates the confidence interval for the population mean using Student's t-distribution. It is very convenient to use when the variance and, accordingly, the standard deviation are unknown. The operator syntax is:

CONFIDENCE.T(alpha,standard_dev,size)

As you can see, the names of the arguments in this case remained unchanged.

Let's see how to calculate the boundaries of the confidence interval with an unknown standard deviation, using the same population that we considered in the previous method. As last time, we take a confidence level of 97%.

  1. Select the cell in which the calculation will be made. Click the "Insert Function" button.
  2. In the Function Wizard that opens, go to the "Statistical" category. Choose the name "CONFIDENCE.T". Click the OK button.
  3. The arguments window for the specified operator opens.

    In the "Alpha" field, given that the confidence level is 97%, we enter the number 0.03. We will not dwell a second time on how this parameter is calculated.

    After that, set the cursor in the "Standard_dev" field. This time this value is unknown to us and needs to be calculated. This is done using a special function, STDEV.S. To call the window of this operator, click the triangle to the left of the formula bar. If the desired name is not in the list that opens, go to the item "More functions...".

  4. The Function Wizard starts. Go to the "Statistical" category and select the name "STDEV.S". Then click the OK button.
  5. The arguments window opens. The task of the STDEV.S operator is to determine the standard deviation of a sample. Its syntax looks like this:

    STDEV.S(number1,number2,…)

    It is easy to guess that the "Number" argument is the address of a sample element. If the sample is placed in a single array, you can use just one argument and give a reference to this range.

    Set the cursor in the "Number1" field and, as always, holding down the left mouse button, select the population. After the coordinates appear in the field, do not rush to press the OK button, because the result will be incorrect. First we need to return to the arguments window of the CONFIDENCE.T operator to fill in the last argument. To do this, click the appropriate name in the formula bar.

  6. The arguments window of the already familiar function opens again. Set the cursor in the "Size" field. Again, click the triangle already familiar to us to go to the choice of operators. As you understand, we need the name "COUNT". Since we used this function in the calculations in the previous method, it is present in this list, so just click it. If you do not find it, follow the algorithm described in the first method.
  7. Once in the COUNT arguments window, put the cursor in the "Value1" field and, with the mouse button held down, select the population. Then click the OK button.
  8. After that, the program performs the calculation and displays the value of the confidence interval.
  9. To determine the boundaries, we again need to calculate the sample mean. But since the calculation with the AVERAGE formula is the same as in the previous method, and even the result has not changed, we will not dwell on it a second time.
  10. Adding the results of AVERAGE and CONFIDENCE.T, we obtain the right boundary of the confidence interval.
  11. Subtracting the result of CONFIDENCE.T from the result of AVERAGE, we obtain the left boundary of the confidence interval.
  12. If the calculation is written as one formula, the calculation of the right boundary in our case looks like this:

    =AVERAGE(B2:B13)+CONFIDENCE.T(0.03,STDEV.S(B2:B13),COUNT(B2:B13))

  13. Accordingly, the formula for calculating the left boundary looks like this:

    =AVERAGE(B2:B13)-CONFIDENCE.T(0.03,STDEV.S(B2:B13),COUNT(B2:B13))

As you can see, Excel's tools make it much easier to calculate the confidence interval and its boundaries. Separate operators are used for this purpose depending on whether the variance of the population is known or unknown.


2023
newmagazineroom.ru - Accounting statements. UNVD. Salary and personnel. Currency operations. Payment of taxes. VAT. Insurance premiums