Confidence interval calculation formula physics. Confidence intervals for frequencies and proportions

Confidence interval

Confidence interval: a term used in mathematical statistics for interval (as opposed to point) estimation of statistical parameters, which is preferable when the sample size is small. A confidence interval is an interval that covers an unknown parameter with a given reliability.

The method of confidence intervals was developed by the American statistician Jerzy Neyman, based on the ideas of the English statistician Ronald Fisher.

Definition

The confidence interval for the parameter $\theta$ of the distribution of a random variable $X$ with confidence level $100p\%$, generated by the sample $(x_1, \dots, x_n)$, is the interval with boundaries $l(x_1, \dots, x_n)$ and $u(x_1, \dots, x_n)$, which are realizations of random variables $L(X_1, \dots, X_n)$ and $U(X_1, \dots, X_n)$ such that

$$P(L \le \theta \le U) = p.$$

The boundary points of the confidence interval are called confidence limits.

An intuitive interpretation of the confidence interval would be: if p is large (say 0.95 or 0.99), then the confidence interval almost certainly contains the true value of θ.

Another interpretation of the concept of a confidence interval: it can be considered as an interval of parameter values θ compatible with experimental data and not contradicting them.
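
The coverage property in the definition can be checked by simulation. The following is a minimal sketch, not part of the original text: it assumes a normal population with a known standard deviation, uses the z-based interval for the mean, and picks arbitrary illustrative values for the true mean, sample size and number of trials.

```python
import random
from statistics import NormalDist, mean


def z_interval(sample, sigma, confidence=0.95):
    """Two-sided interval for the mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / len(sample) ** 0.5
    centre = mean(sample)
    return centre - half_width, centre + half_width


def coverage(theta=10.0, sigma=2.0, n=25, trials=2000, confidence=0.95, seed=0):
    """Fraction of simulated samples whose interval contains the true mean theta."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        lower, upper = z_interval(sample, sigma, confidence)
        hits += lower <= theta <= upper
    return hits / trials


print(coverage())  # expected to be close to 0.95
```

The parameter θ stays fixed throughout; it is the interval boundaries that change from sample to sample, and about a share p of them cover θ.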

Examples

  • Confidence interval for the mathematical expectation of a normal sample;
  • Confidence interval for normal sample variance.

Bayesian confidence interval

In Bayesian statistics there is a definition of a confidence interval that is similar but differs in some key details. Here the estimated parameter itself is considered a random variable with some given prior distribution (in the simplest case, uniform), and the sample is fixed (in classical statistics it is exactly the opposite). A Bayesian confidence interval is an interval that covers the parameter value with posterior probability p:

$$P(L \le \theta \le U \mid x_1, \dots, x_n) = p.$$

In general, classical and Bayesian confidence intervals are different. In the English-language literature, the Bayesian interval is usually called a credible interval, and the classical one a confidence interval.

Sources

Wikimedia Foundation. 2010.


"Katren-Style" continues the publication of Konstantin Kravchik's series on medical statistics. The two previous articles in the series explained other statistical concepts.

Konstantin Kravchik

Mathematician-analyst. Specialist in the field of statistical research in medicine and humanities

Moscow

Very often in articles on clinical studies you can find a mysterious phrase: “confidence interval” (95 % CI). For example, an article might say: “To assess the significance of differences, Student’s t-test was used, and the 95 % confidence interval was calculated.”

What is the value of the “95 % confidence interval” and why calculate it?

What is a confidence interval? It is the range within which the true population mean lies. Are there “untrue” means? In a sense, yes. As explained earlier in this series, it is impossible to measure the parameter of interest in the entire population, so researchers are content with a limited sample. In this sample (for example, of body weights) there is one mean value (a certain weight), by which we judge the mean value in the entire population. However, the mean weight in a sample (especially a small one) is unlikely to coincide with the mean weight in the general population. Therefore, it is more correct to calculate and use a range of plausible values for the population mean.

For example, imagine that the 95% confidence interval (95% CI) for hemoglobin is 110 to 122 g/L. This is usually read as a 95% chance that the true mean hemoglobin value in the population lies between 110 and 122 g/L; strictly speaking, it means that intervals constructed this way cover the true mean in 95% of repeated samples. In other words, we do not know the mean hemoglobin value in the population, but we can indicate, with 95% confidence, a range of values for it.

Confidence intervals are particularly relevant for differences in means between groups, or effect sizes as they are called.

Let's say we compared the effectiveness of two iron preparations: one that has been on the market for a long time and one that has just been registered. After the course of therapy, we assessed the hemoglobin concentration in the studied groups of patients, and the statistical program calculated that the difference between the average values ​​of the two groups was, with a 95 % probability, in the range from 1.72 to 14.36 g/l (Table 1).

Table 1. Test for independent samples
(groups are compared by hemoglobin level)

This should be interpreted as follows: in the general population, patients who take the new drug will have hemoglobin that is higher on average by 1.72 to 14.36 g/l than those who take the already known drug.

In other words, in the general population, the difference in average hemoglobin values ​​between groups is within these limits with a 95% probability. It will be up to the researcher to judge whether this is a lot or a little. The point of all this is that we are not working with one average value, but with a range of values, therefore, we more reliably estimate the difference in a parameter between groups.

In statistical packages, the researcher can narrow or widen the confidence interval at their own discretion. By lowering the confidence level, we narrow the range of means. For example, at 90 % CI the range of means (or of the difference in means) will be narrower than at 95 %.

Conversely, increasing the confidence level to 99 % widens the range of values. When comparing groups, the lower limit of the CI may cross the zero mark. For example, if we raised the confidence level to 99 %, the boundaries of the interval might range from -1 to 16 g/l. This means that a difference in means of zero (M = 0) for the characteristic being studied is compatible with the data.

Using a confidence interval, you can test statistical hypotheses. If the confidence interval for the difference includes zero, then the null hypothesis, which assumes that the groups do not differ on the parameter being studied, cannot be rejected at the corresponding significance level. The example is described above, where we widened the interval to 99 %: a zero difference between the groups could no longer be ruled out.
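
The zero-crossing check is easy to express in code. The sketch below is only an illustration: the function name `interval_contains_zero` is made up for this example, and the two intervals are the 95 % and 99 % ones quoted in the text for the difference in mean hemoglobin.

```python
def interval_contains_zero(lower, upper):
    """True if a zero difference lies inside the confidence interval."""
    return lower <= 0 <= upper


# Intervals for the difference in mean hemoglobin (g/l) quoted in the text.
intervals = {
    "95% CI": (1.72, 14.36),
    "99% CI": (-1.0, 16.0),
}

for label, (lower, upper) in intervals.items():
    if interval_contains_zero(lower, upper):
        verdict = "zero is inside: the null hypothesis is not rejected"
    else:
        verdict = "zero is outside: the null hypothesis is rejected"
    print(f"{label} [{lower}; {upper}] -> {verdict}")
```

An interval that contains zero does not prove that the groups are identical; it only says that the data do not rule out a zero difference at that confidence level.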

95% confidence interval of the difference in hemoglobin, (g/l)


The figure shows the 95% confidence interval for the difference in mean hemoglobin values between the two groups. The line passes through the zero mark, so a difference of zero between the means is among the plausible values, and the null hypothesis that the groups do not differ cannot be rejected. The range of the difference between groups is from -2 to 5 g/L. This means that mean hemoglobin may be anywhere from 2 g/L lower to 5 g/L higher.

The confidence interval is a very important indicator. Thanks to it, you can see whether a statistically significant difference between groups reflects a meaningful difference in means or is merely the product of a large sample, since with a large sample the chances of detecting even a tiny difference are greater than with a small one.

In practice it might look like this. We took a sample of 1000 people, measured hemoglobin levels and found that the confidence interval for the difference in means ranged from 1.2 to 1.5 g/l. The level of statistical significance in this case p

We see that the hemoglobin concentration increased, but almost imperceptibly, therefore, statistical significance appeared precisely due to the sample size.

Confidence intervals can be calculated not only for means, but also for proportions (and risk ratios). For example, we are interested in the confidence interval of the proportions of patients who achieved remission while taking a developed drug. Let us assume that the 95 % CI for the proportions, i.e., for the proportion of such patients, lies in the range of 0.60–0.80. Thus, we can say that our medicine has a therapeutic effect in 60 to 80 % of cases.
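
As an illustration of how such an interval for a proportion might be obtained, here is a minimal sketch using the simple normal-approximation (Wald) interval. The text does not say which method or sample size was used, so both are assumptions: the figures of 70 remissions among 100 patients are hypothetical, chosen only because they give an interval close to 0.60-0.80.

```python
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * (p_hat * (1 - p_hat) / n) ** 0.5
    # Clip to [0, 1], since a proportion cannot leave that range.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Hypothetical data: 70 of 100 patients achieved remission.
lower, upper = proportion_interval(70, 100)
print(f"95% CI for the proportion: {lower:.2f} to {upper:.2f}")
```

The Wald interval is the simplest choice but behaves poorly for small samples or proportions near 0 or 1; other intervals (for example Wilson's) are often preferred in those cases.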

Confidence intervals are one of the types of interval estimates used in statistics; they are calculated for a given significance level α. They allow us to state that the true value of an unknown population parameter lies within the obtained range of values with a confidence of 1 − α, which is determined by the selected significance level.

Normal distribution

When the variance (σ²) of the population is known, the z-score can be used to calculate the confidence limits (the end points of the confidence interval). Compared to the t-distribution, the z-score gives a narrower confidence interval, because the population standard deviation (σ) is known rather than estimated and the z-score is based on the normal distribution.

Formula

To determine the boundary points of the confidence interval when the standard deviation of the population is known, the following formula is used

$$L = \bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \qquad U = \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Example

Assume that the sample size is 25 observations, the sample expected value is 15, and the population standard deviation is 8. For a significance level of α=5%, the Z-score is Z α/2 =1.96. In this case, the lower and upper limits of the confidence interval will be

$$L = 15 - 1.96\cdot\frac{8}{\sqrt{25}} = 11.864$$

$$U = 15 + 1.96\cdot\frac{8}{\sqrt{25}} = 18.136$$

Thus, we can say that with a 95% probability the mathematical expectation of the population will fall in the range from 11.864 to 18.136.
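
The same calculation can be sketched in Python. This is an illustrative helper rather than a reference implementation: the function name is an assumption, and it uses the exact normal quantile from the standard library instead of the rounded table value 1.96, so the result agrees with the hand calculation only to about two decimal places.

```python
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, alpha=0.05):
    """Interval for the mean when the population standard deviation is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    margin = z * sigma / n ** 0.5
    return sample_mean - margin, sample_mean + margin


# Values from the example: n = 25, sample mean 15, sigma = 8, alpha = 5%.
lower, upper = z_confidence_interval(15, 8, 25)
print(round(lower, 2), round(upper, 2))  # 11.86 18.14
```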

Methods for narrowing the confidence interval

Let us assume that the range is too wide for the purposes of our study. There are two ways to reduce the range of the confidence interval.

  1. Lower the confidence level, that is, raise the significance level α.
  2. Increase sample size.

Lowering the confidence level to 90% (that is, raising the significance level to α=10%), we obtain a Z-score of $Z_{\alpha/2}=1.64$. In this case, the lower and upper boundaries of the interval will be

$$L = 15 - 1.64\cdot\frac{8}{\sqrt{25}} = 12.376$$

$$U = 15 + 1.64\cdot\frac{8}{\sqrt{25}} = 17.624$$

And the confidence interval itself can be written in the form

[12.376; 17.624]

In this case, we can say that with 90% confidence the mathematical expectation of the population lies within the range [12.376; 17.624].

If we do not want to lower the confidence level, then the only alternative is to increase the sample size. Increasing it to 144 observations, we get the following confidence limits

$$L = 15 - 1.96\cdot\frac{8}{\sqrt{144}} = 13.693$$

$$U = 15 + 1.96\cdot\frac{8}{\sqrt{144}} = 16.307$$

The confidence interval itself will have the following form

[13.693; 16.307]

Thus, narrowing the confidence interval without lowering the confidence level is only possible by increasing the sample size. If increasing the sample size is not possible, then the confidence interval can be narrowed only by lowering the confidence level (raising α).
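
The effect of both options on the width of the interval can be compared directly. The sketch below is an assumption-laden illustration: the helper `interval_width` is not from the original text, and it uses exact normal quantiles, so the widths differ slightly from those obtained with the rounded values 1.96 and 1.64.

```python
from statistics import NormalDist


def interval_width(sigma, n, confidence):
    """Full width (upper minus lower) of the z-based interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / n ** 0.5


baseline = interval_width(8, 25, 0.95)          # original example
lower_confidence = interval_width(8, 25, 0.90)  # option 1: 90% instead of 95%
larger_sample = interval_width(8, 144, 0.95)    # option 2: n = 144 instead of 25

print(f"95%, n=25:  {baseline:.3f}")
print(f"90%, n=25:  {lower_confidence:.3f}")
print(f"95%, n=144: {larger_sample:.3f}")
```

Because the width is proportional to 1/√n, halving the interval requires roughly four times as many observations.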

Constructing a confidence interval for a distribution other than normal

If the standard deviation of the population is not known and has to be estimated from the sample, the t-distribution is used to construct the confidence interval. The same approach is commonly applied when the population cannot be assumed to be exactly normal, although strictly the t-interval is only approximate in that case. This technique is more conservative than the one based on the Z-score, which is reflected in wider confidence intervals.

Formula

To calculate the lower and upper limits of the confidence interval based on the t-distribution, use the following formulas

$$L = \bar{X} - t_{\alpha}\frac{s}{\sqrt{n}}, \qquad U = \bar{X} + t_{\alpha}\frac{s}{\sqrt{n}}$$

where s is the sample standard deviation.

The Student distribution, or t-distribution, depends on only one parameter, the number of degrees of freedom, which is one less than the number of observations in the sample (n − 1). The value of Student's t for a given number of degrees of freedom and level of statistical significance α can be found in reference tables.

Example

Assume that the sample size is 25 individual values, the sample expected value is 50, and the sample standard deviation is 28. It is necessary to construct a confidence interval for the level of statistical significance α=5%.

In our case, the number of degrees of freedom is 24 (25-1), therefore the corresponding table value of Student’s t-test for the level of statistical significance α=5% is 2.064. Therefore, the lower and upper limits of the confidence interval will be

$$L = 50 - 2.064\cdot\frac{28}{\sqrt{25}} = 38.442$$

$$U = 50 + 2.064\cdot\frac{28}{\sqrt{25}} = 61.558$$

And the interval itself can be written in the form

[38.442; 61.558]

Thus, we can say that with 95% confidence the mathematical expectation of the population lies in the range [38.442; 61.558].

With the t-distribution, too, the confidence interval can be narrowed either by lowering the confidence level or by increasing the sample size.

Lowering the confidence level from 95% to 90% in the conditions of our example, we obtain the corresponding table value of Student's t of 1.711.

$$L = 50 - 1.711\cdot\frac{28}{\sqrt{25}} = 40.418$$

$$U = 50 + 1.711\cdot\frac{28}{\sqrt{25}} = 59.582$$

In this case, we can say that with 90% confidence the mathematical expectation of the population lies in the range [40.418; 59.582].

If we do not want to lower the confidence level, then the only alternative is to increase the sample size. Let's say that it is 64 individual observations, and not 25 as in the original condition of the example. The table value of Student's t for 63 degrees of freedom (64-1) and the level of statistical significance α=5% is 1.998.

$$L = 50 - 1.998\cdot\frac{28}{\sqrt{64}} = 43.007$$

$$U = 50 + 1.998\cdot\frac{28}{\sqrt{64}} = 56.993$$

This allows us to say that with 95% confidence the mathematical expectation of the population lies in the range [43.007; 56.993].
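
The three t-based calculations can be reproduced with a few lines of Python. This is a sketch under one stated assumption: the critical values of Student's t are passed in as the table values quoted in the text (2.064, 1.711 and 1.998) rather than computed, because the Python standard library has no t-distribution quantile function. The function name is illustrative.

```python
def t_confidence_interval(sample_mean, sample_sd, n, t_critical):
    """Interval for the mean from a tabulated Student's t critical value."""
    margin = t_critical * sample_sd / n ** 0.5
    return sample_mean - margin, sample_mean + margin


# (label, sample size, table value of t) as quoted in the text.
scenarios = [
    ("95%, n=25", 25, 2.064),
    ("90%, n=25", 25, 1.711),
    ("95%, n=64", 64, 1.998),
]

for label, n, t_critical in scenarios:
    lower, upper = t_confidence_interval(50, 28, n, t_critical)
    print(f"{label}: [{lower:.3f}; {upper:.3f}]")
```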

Large samples

Large samples are samples in which the number of individual observations exceeds 100. By the central limit theorem, the mean of a large sample tends to be approximately normally distributed, even if the distribution of the population is not normal. In addition, for such samples the z-score and the t-distribution give approximately the same results when constructing confidence intervals. Thus, for large samples it is acceptable to use the z-score of the normal distribution instead of the t-distribution.

Let's sum it up

The confidence interval comes to us from the field of statistics. This is a certain range that serves to estimate an unknown parameter with a high degree of reliability. The easiest way to explain this is with an example.

Suppose you need to study some random variable, for example, the server's response time to a client request. Every time a user types the address of a specific website, the server responds at a different speed. Thus, the response time under study is random. The confidence interval allows us to determine bounds for this parameter, and then we can say that with 95% probability the mean server response time lies in the range we calculated.

Or you need to find out how many people are aware of a company's trademark. Once the confidence interval is calculated, it will be possible to say, for example, that with 95% probability the share of consumers aware of it lies in the range from 27% to 34%.

Closely related to this term is the confidence probability (confidence level). It represents the probability that the desired parameter is included in the confidence interval. How large our desired range will be depends on this value. The larger this value, the wider the confidence interval becomes, and vice versa. Typically it is set to 90%, 95% or 99%. The value 95% is the most popular.

The width of the interval is also influenced by the variance of the observations, and its definition is based on the assumption that the characteristic under study obeys the normal distribution, also known as Gauss's law. This is a distribution of the probabilities of a continuous random variable that is described by a bell-shaped probability density. If the assumption of a normal distribution is incorrect, then the estimate may be incorrect.

First, let's figure out how to calculate the confidence interval for the mathematical expectation. There are two possible cases here. The variance (the degree of spread of a random variable) may or may not be known. If it is known, then the confidence interval is calculated using the following formula:

$$\bar{x} - t\frac{\sigma}{\sqrt{n}} \le \alpha \le \bar{x} + t\frac{\sigma}{\sqrt{n}},$$ where

α is the characteristic being estimated,

t is the parameter taken from the table of the Laplace function (standard normal distribution),

σ is the square root of the variance.

If the variance is unknown, it can be calculated, provided we know all the values of the characteristic of interest. The following formula is used for this:

$$\sigma^2 = \overline{x^2} - (\bar{x})^2,$$ where

$\overline{x^2}$ is the mean of the squares of the studied characteristic,

$(\bar{x})^2$ is the square of the mean of this characteristic.
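
The "mean of squares minus square of the mean" formula can be sketched as follows. The data set is hypothetical and chosen only so that the arithmetic is easy to follow; the function name is likewise an assumption.

```python
def variance_from_means(values):
    """Variance computed as mean(x^2) - (mean(x))^2 (divides by n, not n - 1)."""
    n = len(values)
    mean_of_values = sum(values) / n
    mean_of_squares = sum(v * v for v in values) / n
    return mean_of_squares - mean_of_values ** 2


# Hypothetical measurements: mean = 5, mean of squares = 29.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_from_means(data))  # 4.0
```

Note that this formula divides by n and therefore gives the biased variance estimate; the unbiased sample variance, which is normally paired with Student's t, divides by n − 1 and is slightly larger.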

The formula by which the confidence interval is calculated changes slightly in this case:

$$\bar{x} - t\frac{s}{\sqrt{n}} \le \alpha \le \bar{x} + t\frac{s}{\sqrt{n}},$$ where

$\bar{x}$ is the sample mean,

α is the characteristic being estimated,

t is a parameter found from the Student distribution table, t = t(γ; n-1),

$\sqrt{n}$ is the square root of the total sample size,

s is the square root of the sample variance.

Consider this example. Suppose that, based on the results of 7 measurements, the mean of the studied characteristic was found to be 30 and the sample variance to be 36. It is necessary to find, with a probability of 99%, a confidence interval that contains the true value of the measured parameter.

First, let's determine what t is equal to: t = t(0.99; 7-1) = 3.71. The sample standard deviation is s = √36 = 6. Using the above formula, we get:

$$\bar{x} - t\frac{s}{\sqrt{n}} \le \alpha \le \bar{x} + t\frac{s}{\sqrt{n}}$$

$$30 - 3.71\cdot\frac{6}{\sqrt{7}} \le \alpha \le 30 + 3.71\cdot\frac{6}{\sqrt{7}}$$

$$21.587 \le \alpha \le 38.413$$
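
The arithmetic of this example can be checked with a short script. Note that the formula calls for the standard deviation, s = √36 = 6, rather than the variance itself; it is this value that reproduces the bounds 21.587 and 38.413. The function name is illustrative, and the critical value 3.71 is taken from the text rather than computed.

```python
from math import sqrt


def mean_interval(sample_mean, sample_variance, n, t_critical):
    """Interval for the mean, given the sample variance and a tabulated t value."""
    s = sqrt(sample_variance)  # the formula needs the standard deviation
    margin = t_critical * s / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Values from the example: 7 measurements, mean 30, variance 36, t = 3.71.
lower, upper = mean_interval(30, 36, 7, 3.71)
print(f"{lower:.3f} <= a <= {upper:.3f}")  # 21.587 <= a <= 38.413
```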

The confidence interval for the variance is calculated both in the case of a known mean and when there is no data on the mathematical expectation, and only the value of the point unbiased estimate of the variance is known. We will not give formulas for calculating it here, since they are quite complex and, if desired, can always be found on the Internet.

Let us only note that it is convenient to determine the confidence interval using Excel or an online confidence interval calculator.

Any sample gives only an approximate idea of the general population, and all sample statistics (mean, mode, variance...) are approximations, or estimates, of the general parameters, which in most cases cannot be calculated because the general population is inaccessible (Figure 20).

Figure 20. Sampling error

But you can specify the interval in which, with a certain degree of probability, the true (general) value of the statistical characteristic lies. This interval is called the confidence interval (CI).

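So the general mean lies, with a probability of 95%, within

from $\bar{x} - t\cdot\frac{s}{\sqrt{n}}$ to $\bar{x} + t\cdot\frac{s}{\sqrt{n}}$, (20)

where t is the table value of Student's t for α = 0.05 and f = n − 1 degrees of freedom, and $s/\sqrt{n}$ is the standard error of the mean.

A 99% CI can also be found; in this case t is selected for α = 0.01.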

What is the practical significance of a confidence interval?

    A wide confidence interval indicates that the sample mean does not accurately reflect the population mean. This is usually due to an insufficient sample size, or to its heterogeneity, i.e. large dispersion. Both give a larger error of the mean and, accordingly, a wider CI. And this is the basis for returning to the research planning stage.

    The upper and lower limits of the CI make it possible to judge whether the results will be clinically significant.

Let us dwell in some detail on the question of the statistical and clinical significance of the results of the study of group properties. Let us remember that the task of statistics is to detect at least some differences in general populations based on sample data. The challenge for clinicians is to detect differences (not just any) that will aid diagnosis or treatment. And statistical conclusions are not always the basis for clinical conclusions. Thus, a statistically significant decrease in hemoglobin by 3 g/l is not a cause for concern. And, conversely, if some problem in the human body is not widespread at the level of the entire population, this is not a reason not to deal with this problem.

Let's look at an example of this situation.

Researchers wondered whether boys who have suffered from some kind of infectious disease lag behind their peers in growth. For this purpose, a sample study was conducted in which 10 boys who had suffered from this disease took part. The results are presented in Table 23.

Table 23. Results of statistical processing (columns: average, lower limit and upper limit of the 95% CI, standards in cm)

From these calculations it follows that the sample average height of 10-year-old boys who have suffered from some infectious disease is close to normal (132.5 cm). However, the lower limit of the 95% confidence interval (126.6 cm) indicates that a true average height corresponding to “short stature” cannot be ruled out, i.e. these children may be stunted.

In this example, the results of the confidence interval calculations are clinically significant.
