Introduction – Statistics and Probability

The branch of mathematics concerned with the laws governing random events, including the collection, analysis, interpretation, and display of numerical data.

Statistics and Probability:

This science can help us understand our past and make predictions about the future. Using statistics, we can analyze data in different fields to monitor changing patterns, then use this analysis to draw conclusions and make forecasts.

Real-life Application of Statistics and Probability

Winning or losing a lottery is one of the most interesting examples of probability. In a typical lottery game, each player chooses six distinct numbers from a particular range. If all six numbers on a ticket match those on the winning lottery ticket, the ticket holder wins the jackpot, regardless of the order of the numbers.

Probability helps in choosing the insurance plan that suits you and your family best. For example, if you are an active smoker, your chances of developing lung disease are higher. So, instead of first choosing an insurance scheme for your vehicle or house, you may choose health insurance, because your chances of getting sick are higher. Similarly, many people nowadays insure their mobile phones because they know the chances of a phone being damaged or lost are high.

Many political analysts use probability to predict election results. For example, they may predict that a certain political party will come into power based on the results of exit polls.


ARE YOU READY?

Take On The Challenge!

"Facts are stubborn, but statistics are more pliable."

Mark Twain

Sub-Topics

Random Variables

A random variable is a variable whose value is unknown or a function that assigns values to each of an experiment’s outcomes. A random variable can be either discrete (having specific values) or continuous (any value in a continuous range).
Discrete
A discrete random variable is one which may take on only a countable number of distinct values, such as 0, 1, 2, 3, 4, … Discrete random variables are usually (but not necessarily) counts. If a random variable can take only a finite number of distinct values, then it must be discrete. Examples of discrete random variables include the number of children in a family, the Friday night attendance at a cinema, the number of patients in a doctor's surgery, and the number of defective light bulbs in a box of ten.
Continuous
A continuous random variable is one which takes an infinite number of possible values. Continuous random variables are usually measurements. Examples include height, weight, the amount of sugar in an orange, the time required to run a mile. A continuous random variable is not defined at specific values.
When do we use a continuous random variable? A continuous random variable is one whose data can take infinitely many values. For example, a random variable measuring the time taken for something to be done is continuous, since there are infinitely many possible times that can be taken.
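To make the distinction concrete, here is a minimal Python sketch; the coin-flip count and waiting time are illustrative assumptions, not examples from the text above:

```python
import random

random.seed(0)

# Discrete random variable: number of heads in 3 coin flips
# (it can only take the countable values 0, 1, 2, 3).
heads = sum(random.choice([0, 1]) for _ in range(3))

# Continuous random variable: a waiting time drawn uniformly
# from the interval [0, 5) minutes (uncountably many values).
wait_time = random.uniform(0, 5)

print(heads in {0, 1, 2, 3})   # True: discrete, countable support
print(0 <= wait_time < 5)      # True: continuous, any value in the range
```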



Values of a Random Variable

A random variable is a variable that takes specific values with specific probabilities. It can be thought of as a variable whose value depends on the outcome of an uncertain event. We usually denote random variables by capital letters near the end of the alphabet, e.g., X, Y, Z.


Probability Distribution for a Discrete Random Variable and its Properties

The probability distribution of a discrete random variable X is a listing of each possible value x taken by X along with the probability P(x) that X takes that value in one trial of the experiment.
The function f(x) = P(X = x), for each x within the range of X, is called the probability distribution of X. It is often called the probability mass function of the discrete random variable X.
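As a sketch of this definition, consider the probability mass function of a fair six-sided die (an assumed example, not one from the text):

```python
from fractions import Fraction

# Probability mass function of X = result of one fair die roll:
# f(x) = P(X = x) = 1/6 for x in {1, ..., 6}, and 0 otherwise.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def f(x):
    return pmf.get(x, Fraction(0))

# The probabilities are nonnegative and sum to 1, the two defining
# properties of a discrete probability distribution.
total = sum(pmf.values())
print(total)        # 1
print(f(3), f(7))   # 1/6 0
```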


Probabilities Corresponding to a Given Random Variable

The probability distribution for a random variable describes how the probabilities are distributed over the values of the random variable. For a discrete random variable, x, the probability distribution is defined by a probability mass function, denoted by f(x).


The Mean and Variance of a Discrete Random Variable

For a discrete random variable X, the variance of X is obtained as follows: var(X) = ∑(x − μ)² pX(x), where the sum is taken over all values of x for which pX(x) > 0. The variance of X is thus the weighted average of the squared deviations from the mean μ, where the weights are given by the probability function pX(x) of X.
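These formulas can be checked numerically; the fair-die distribution below is an illustrative assumption:

```python
from fractions import Fraction

# Distribution of X = one fair die roll: p(x) = 1/6 for x = 1..6.
dist = {x: Fraction(1, 6) for x in range(1, 7)}

# Mean: mu = sum of x * p(x).
mu = sum(x * p for x, p in dist.items())

# Variance: var(X) = sum of (x - mu)^2 * p(x), i.e. the
# probability-weighted average squared deviation from mu.
var = sum((x - mu) ** 2 * p for x, p in dist.items())

print(mu)    # 7/2
print(var)   # 35/12
```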


Normal Random Variables

A standard normal random variable is a normally distributed random variable with mean μ=0 and standard deviation σ=1. It will always be denoted by the letter Z.


Regions Under the Normal Curve Corresponding to Different Standard Normal Values

The area of the region under the standard normal curve between two standard normal values equals the probability that Z falls between them, and the total area under the curve is 1.


Conversion of Normal Random Variable to a Standard Normal Variable and Vice Versa

The standard normal distribution (z-distribution) is a normal distribution with a mean of 0 and a standard deviation of 1. Any value x from a normal distribution can be converted to a standard normal value z with the formula z = (x − μ) / σ; conversely, x = zσ + μ.
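A short sketch of the conversion in both directions, with illustrative values for the mean and standard deviation:

```python
# Convert x = 85 from a normal distribution with mu = 70, sigma = 10
# to a z-score, then convert back (the numbers are illustrative).
mu, sigma = 70, 10
x = 85

z = (x - mu) / sigma       # standardize
x_back = z * sigma + mu    # invert: x = z * sigma + mu

print(z)        # 1.5
print(x_back)   # 85.0
```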


Probabilities and Percentiles Using the Standard Normal Table

In statistics, a percentile describes how a score compares to other scores from the same set. While there is no universal definition of a percentile, it is commonly expressed as the percentage of values in a data set that fall below a given value. The standard normal distribution table is a compilation of areas from the standard normal distribution, more commonly known as the bell curve; it gives the area of the region under the curve to the left of a given z-score, which represents a probability of occurrence in a given population.
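Python's standard library can stand in for the printed table; this sketch looks up the area to the left of a z-score and, in reverse, the z-score for a given percentile:

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)   # the standard normal distribution

# Area to the left of z = 1.96, i.e. the table entry for P(Z < 1.96).
p = Z.cdf(1.96)

# Going the other way: the 97.5th percentile of Z is about 1.96.
z_975 = Z.inv_cdf(0.975)

print(round(p, 4))      # 0.975
print(round(z_975, 2))  # 1.96
```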


Parameter and Statistics

A parameter is a measure that describes an entire population, while a statistic is a measure that describes a sample drawn from that population. Statistics computed from samples are used to estimate and understand the parameters of populations.


Sampling Distributions of Statistics

A sampling distribution is the probability distribution of a statistic obtained from a large number of samples drawn from a specific population. It describes the range of different outcomes that could possibly occur for a statistic of that population, together with their frequencies.


The Mean and Variance of the Sampling Distribution of the Sample Mean

 The mean of the sampling distribution of the mean is the mean of the population from which the scores were sampled. Therefore, if a population has a mean μ, then the mean of the sampling distribution of the mean is also μ. The symbol μM is used to refer to the mean of the sampling distribution of the mean.

Formula: μM = μ. The variance of the sampling distribution of the mean is σM² = σ² / n, so its standard deviation (the standard error) is σM = σ / √n.
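A small simulation can illustrate that the mean of the sampling distribution matches the population mean; the die-roll population and sample size below are illustrative choices:

```python
import random
from statistics import mean

random.seed(1)

# Population: die values 1..6, so the population mean mu is 3.5.
population = [1, 2, 3, 4, 5, 6]
mu = mean(population)

# Draw many samples of size n and record each sample mean.
n, trials = 10, 20000
sample_means = [mean(random.choices(population, k=n)) for _ in range(trials)]

# The mean of the sampling distribution estimates mu_M = mu.
mu_M = mean(sample_means)
print(abs(mu_M - mu) < 0.05)   # True: very close to 3.5
```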


Central Limit Theorem

The central limit theorem (CLT) states that the distribution of sample means approximates a normal distribution as the sample size gets larger, regardless of the population’s distribution.
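A quick simulation sketch of the theorem, using a deliberately skewed, made-up population:

```python
import random
from statistics import mean, stdev

random.seed(2)

# A clearly non-normal (highly skewed) population with mean 10:
# 90% of the values are 0 and 10% are 100.
population = [0] * 90 + [100] * 10

n, trials = 50, 5000
sample_means = [mean(random.choices(population, k=n)) for _ in range(trials)]

# By the CLT the sample means cluster symmetrically around the
# population mean even though the population itself is skewed.
center = mean(sample_means)
spread = stdev(sample_means)
within_2sd = sum(abs(m - center) <= 2 * spread for m in sample_means) / trials

print(abs(center - 10) < 1)   # True
print(within_2sd > 0.9)       # True: roughly the normal 95% rule
```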


Sampling Distribution of the Sample Mean Using the Central Limit Theorem

By the central limit theorem, for a large sample size n the sampling distribution of the sample mean is approximately normal with mean μ and standard deviation σ / √n, regardless of the shape of the population distribution.


The T-Distribution

The t-distribution describes the standardized distances of sample means to the population mean when the population standard deviation is not known, and the observations come from a normally distributed population.


Identifying Percentiles Using the T-table

The t-distribution table values are critical values of the t-distribution. The column headers are the t-distribution probabilities (alpha), and the row names are the degrees of freedom (df). The Student's t-table gives the probability that the absolute t-value with a given number of degrees of freedom lies above the tabulated value.


Length of a Confidence Interval

The definition of the length of a confidence interval is perhaps obvious, but let's formally define it anyway. If a confidence interval for a parameter is L < θ < U, then the length of the interval is simply the difference of the two endpoints, U − L.
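As a worked sketch, the code below builds a 95% z-interval from assumed values and confirms that its length is U − L, twice the margin of error:

```python
from statistics import NormalDist
import math

# A hypothetical 95% z-interval for a mean: xbar +/- z * sigma / sqrt(n).
# All numbers here are illustrative assumptions.
xbar, sigma, n = 50.0, 12.0, 36
z = NormalDist().inv_cdf(0.975)   # ~1.96 for 95% confidence

margin = z * sigma / math.sqrt(n)
L_end, U_end = xbar - margin, xbar + margin

# Length of the interval = U - L = twice the margin of error.
length = U_end - L_end
print(round(length, 2))   # 7.84
```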


Sample Size Using the Length of the Interval

A confidence interval for a mean with known standard deviation σ has length 2zσ / √n, where z is the critical value for the chosen confidence level. Solving this for n gives the sample size needed to achieve a desired interval length L: n = (2zσ / L)². Since a sample size must be a whole number, round the result up.
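Since a z-interval for a mean has length 2zσ / √n, solving for n gives the required sample size; the σ and target length below are illustrative assumptions:

```python
from statistics import NormalDist
import math

# Solve length = 2 * z * sigma / sqrt(n) for n, then round up.
sigma = 15.0          # assumed population standard deviation
target_length = 4.0   # desired total interval length
z = NormalDist().inv_cdf(0.975)   # 95% confidence

n = math.ceil((2 * z * sigma / target_length) ** 2)
print(n)   # 217
```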


(a) null hypothesis; (b) alternative hypothesis; (c) level of significance; (d) rejection region; and (e) types of errors in hypothesis testing

In statistics, a hypothesis is defined as a formal statement that gives an explanation of the relationship between two or more variables of a specified population.


Null and Alternative Hypotheses on a Population Mean

The null hypothesis states that the population mean equals a specified value, H0: μ = μ0. The alternative hypothesis is the competing claim that μ is less than, greater than, or not equal to μ0.

Z-Test


When the population variance is known, the z-test is used. The test statistic is assumed to have a normal distribution, and nuisance parameters such as the standard deviation should be known for an accurate z-test to be performed.

Formula: z = (x̄ − μ0) / (σ / √n), where μ0 is the hypothesized mean, σ the known population standard deviation, and n the sample size.
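A worked z-test sketch; the hypothesized mean, σ, sample mean, and n are all made up for illustration:

```python
from statistics import NormalDist
import math

# One-sample, two-sided z-test with sigma known.
mu0, sigma = 100.0, 15.0   # hypothesized mean, known population sd
xbar, n = 104.0, 36        # observed sample mean and sample size

z = (xbar - mu0) / (sigma / math.sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value

print(round(z, 2))      # 1.6
print(p_value < 0.05)   # False: fail to reject at alpha = 0.05
```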


T-Test


When the population variance is unknown, however, the t-test is used. It is a type of inferential statistic used to determine whether there is a significant difference between the means of two groups, which may be related in certain features. It is mostly used when the data sets would follow a normal distribution and may have unknown variances.


Formula: t = (x̄ − μ0) / (s / √n), with n − 1 degrees of freedom, where s is the sample standard deviation.
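A sketch of the t statistic on a small made-up sample; no critical-value lookup is attempted here, since the standard library has no t-distribution:

```python
import math
from statistics import mean, stdev

# One-sample t statistic with sigma unknown: t = (xbar - mu0) / (s / sqrt(n)).
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]   # illustrative data
mu0 = 12.0                                      # hypothesized mean

n = len(sample)
xbar = mean(sample)
s = stdev(sample)   # sample standard deviation (n - 1 denominator)
t = (xbar - mu0) / (s / math.sqrt(n))

# t would be compared against a t critical value with n - 1 degrees
# of freedom from a t-table.
print(n - 1)   # 5
```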


Using the Central Limit Theorem


It is important to understand when to use the central limit theorem. If you are asked to find the probability of a mean, use the CLT for means. If you are asked to find the probability of a sum or total, use the CLT for sums. The same applies to percentiles for means and sums.

Formula: for means, z = (x̄ − μ) / (σ / √n); for sums, z = (Σx − nμ) / (σ√n).


Rejection Region


After the test statistic in a significance test is realized (that is, a posteriori) and compared to the rejection region, the test decision is either correct or in error; there is no "probability" of correctness. Within this limitation of classical statistical testing, however, p-values are a frequently used method for quantifying the "weight of evidence" against the working hypothesis.


Unknown Variance


If the variance is unknown, the t statistic is used in place of the z statistic.

Formula: t = (x̄ − μ0) / (s / √n)


Known Variance

If the population variance is known, the z statistic is used. Formula: z = (x̄ − μ0) / (σ / √n)


Hypothesis Testing for Population Mean with Known and Unknown Population Standard Deviation

Hypothesis tests are used to make decisions or judgments about the value of a parameter, such as the population mean. There are two approaches to conducting a hypothesis test: the critical value approach and the P-value approach.


Rejection Value and Rejection Region

A rejection region, also known as the critical region, is the set of values of the test statistic for which the null hypothesis is rejected: if the observed test statistic falls in the critical region, we reject the null hypothesis and accept the alternative hypothesis.

The rejection (critical) value at a given significance level can be thought of as a cut-off point: if a test statistic on one side of the critical value results in accepting the null hypothesis, a test statistic on the other side results in rejecting it.


Hypothesis Test for a Population Proportion

In setting up this test, keep in mind that the hypotheses are claims about the population proportion, p. The null hypothesis states that the proportion equals a specific value, p0. The alternative hypothesis is the competing claim that the parameter is less than, greater than, or not equal to p0.
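A sketch of the usual test statistic for a proportion, z = (p̂ − p0) / √(p0(1 − p0)/n), with illustrative counts:

```python
from statistics import NormalDist
import math

# z-test for a population proportion; the figures are made up.
p0 = 0.50        # hypothesized proportion
x, n = 58, 100   # observed successes and sample size
p_hat = x / n

# The standard error uses p0, the value under the null hypothesis.
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided

print(round(z, 1))      # 1.6
print(p_value < 0.05)   # False: fail to reject at alpha = 0.05
```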


Population Proportion

Population proportion is the fraction of a population or group that shares a given characteristic. Its value is usually estimated using unbiased sample statistics from observational studies or research in which the items or people involved are not under the control of the researchers. The population proportion can be found with the formula p = x / n, where p refers to the population proportion, x stands for the count of successes (the number of items you are interested in), and n refers to the total number of items in the population.


Test Statistic

A test statistic is a statistic used in statistical hypothesis testing. It quantifies the characteristics and behavior of the observed data, measuring how far the data depart from what the null hypothesis predicts, and thus determines whether the data support the stated hypothesis.


Independent and Dependent Variables

Variables are used in mathematical modeling, statistical modeling, and the experimental sciences. There are two types of variables: independent and dependent. Independent variables stand on their own and are not influenced by the other variables present. Dependent variables, in contrast, depend on other variables: they are studied in relation to the main variables and change in response to the factors applied.


The Value of the Dependent Variable Given the Value of the Independent Variable

The values of the variables are commonly graphed, with the independent variable on the x-axis and the dependent variable on the y-axis. In statistics, Pearson's correlation coefficient is a measure used to determine the strength and direction of the correlation between two variables; it also identifies the influence of one variable on the other, which is illustrated using lines on the graph. The value of the dependent variable can then be determined from the value of the independent variable using its corresponding equivalent on the graph.



Bivariate Data

In statistics, bivariate data is used to perceive the relationship between two variables: the strength of their correlation, how they differ from each other, and how one variable affects or influences the other.

For example, if you are studying a group of students to find the relationship between their ages and their average science scores, you have two variables present, an independent variable (age) and a dependent variable (science score).

If instead you are studying just one variable, such as the science scores of the specified students, then you have univariate data.


Scatter Plot

A scatter plot is a type of graph used to illustrate the relationship between variables so that the differences and connections between them are easy to see. The data are plotted as points on a grid with a horizontal x-axis and a vertical y-axis, revealing the behavior of the given data.


Pearson’s Correlation Coefficient

Pearson's correlation coefficient is computed with the raw-score formula r = [NΣxy − (Σx)(Σy)] / √([NΣx² − (Σx)²][NΣy² − (Σy)²]). In the formula, r refers to the correlation coefficient, N to the number of pairs of scores, Σxy to the sum of the products of paired scores, Σx to the sum of x scores, Σy to the sum of y scores, Σx² to the sum of squared x scores, and Σy² to the sum of squared y scores.
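The raw-score formula can be checked directly in Python; the paired scores below are made up for illustration:

```python
import math

# Pearson's r via the raw-score formula on a small data set.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
N = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

r = (N * sum_xy - sum_x * sum_y) / math.sqrt(
    (N * sum_x2 - sum_x ** 2) * (N * sum_y2 - sum_y ** 2)
)
print(round(r, 2))   # 0.77
```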


Regression Slope Intercept

The regression intercept b0 is the value of y predicted when x equals 0, and it is used in linear regression. You can use the formula b0 = ȳ − b1x̄, where b0 is the y-intercept, b1 is the slope, and x̄ and ȳ are the means of the x and y values.
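A sketch computing the slope and intercept by least squares on made-up data that lies exactly on a line:

```python
from statistics import mean

# Least-squares slope b1 and intercept b0 = ybar - b1 * xbar.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]   # exactly y = 2x + 1, so the fit is exact

xbar, ybar = mean(x), mean(y)
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum(
    (a - xbar) ** 2 for a in x
)
b0 = ybar - b1 * xbar

print(b1, b0)   # 2.0 1.0
```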
