
Linear regression hypothesis testing: Concepts, Examples

Simple linear regression model

In machine learning, linear regression is a predictive modeling technique that lets us build a model to predict a continuous response variable as a function of a linear combination of explanatory (predictor) variables. While training linear regression models, we rely on hypothesis testing to determine whether the relationships between the response and the predictor variables are statistically significant. Two types of hypothesis tests are performed on a linear regression model: T-tests and F-tests. In other words, two kinds of statistics are used to assess whether a linear regression model relating the response and predictor variables holds: t-statistics and f-statistics. As data scientists, it is of utmost importance to determine whether linear regression is the correct choice of model for a particular problem, and this can be done by performing hypothesis testing on the regression's response and predictor variables. It is often found that these concepts are not very clear to many data scientists. In this blog post, we will discuss linear regression and the hypothesis testing related to t-statistics and f-statistics. We will also provide examples to help illustrate how these concepts work.


What are linear regression models?

A linear regression model can be defined as the function approximation that represents a continuous response variable as a function of one or more predictor variables. While building a linear regression model, the goal is to identify a linear equation that best predicts or models the relationship between the response or dependent variable and one or more predictor or independent variables.

There are two different kinds of linear regression models. They are as follows:

  • Simple or Univariate linear regression models : These are linear regression models used to build a linear relationship between one response or dependent variable and one predictor or independent variable. The form of the equation that represents a simple linear regression model is Y = mX + b, where m is the coefficient of the predictor variable and b is the bias. When considering the linear regression line, m represents the slope and b represents the intercept.
  • Multiple or Multi-variate linear regression models : These are linear regression models that are used to build a linear relationship between one response or dependent variable and more than one predictor or independent variable. The form of the equation that represents a multiple linear regression model is Y=b0+b1X1+ b2X2 + … + bnXn, where bi represents the coefficients of the ith predictor variable. In this type of linear regression model, each predictor variable has its own coefficient that is used to calculate the predicted value of the response variable.

While training a linear regression model, the goal is to determine the coefficients that result in the best-fitted linear regression line. The learning algorithm used to find the most appropriate coefficients is known as least squares regression . In the least-squares regression method, the coefficients are calculated using the least-squares error function. The main objective of this method is to minimize the sum of squared residuals between the actual and predicted response values. The sum of squared residuals is also called the residual sum of squares (RSS). The outcome of executing the least-squares regression method is a set of coefficients that minimize the linear regression cost function .

The residual e of the ith observation is represented as follows, where [latex]Y_i[/latex] is the actual value of the response variable for the ith observation and [latex]\hat{Y_i}[/latex] is the predicted value for the ith observation.

[latex]e_i = Y_i - \hat{Y_i}[/latex]

The residual sum of squares can be represented as the following:

[latex]RSS = e_1^2 + e_2^2 + e_3^2 + … + e_n^2[/latex]

The least-squares method represents the algorithm that minimizes the above term, RSS.
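To make this concrete, here is a minimal R sketch on made-up toy data (the x and y values are purely illustrative): lm() finds the least-squares coefficients, and the residual sum of squares can be recomputed by hand from the fitted values.

# Hypothetical toy data, for illustration only
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)

fit <- lm(y ~ x)          # least-squares fit of Y = mX + b
coef(fit)                 # estimated intercept (b) and slope (m)

e   <- y - fitted(fit)    # residuals e_i = Y_i - Yhat_i
RSS <- sum(e^2)           # residual sum of squares minimized by lm()
RSS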

Once the coefficients are determined, can it be claimed that these coefficients are the most appropriate ones for linear regression? The answer is no. After all, the coefficients are only estimates, and thus there will be standard errors associated with each of them. Recall that the standard error is used to calculate the confidence interval within which the population parameter is expected to lie. In other words, it represents the error of estimating a population parameter based on the sample data. The value of the standard error is calculated as the standard deviation of the sample divided by the square root of the sample size. The formula below represents the standard error of a mean.

[latex]SE(\mu) = \frac{\sigma}{\sqrt{N}}[/latex]
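As a tiny numerical illustration (with a made-up sample), the same quantity can be computed in R as:

x <- c(4.1, 5.3, 4.8, 5.0, 4.6, 5.2)    # hypothetical sample
se_mean <- sd(x) / sqrt(length(x))      # standard error of the sample mean
se_mean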

Thus, without analyzing aspects such as the standard error associated with each coefficient, it cannot be claimed that the estimated coefficients are the most suitable ones. This is where hypothesis testing is needed. Before we get into why we need hypothesis testing with the linear regression model, let's briefly review what hypothesis testing is.

Train a Multiple Linear Regression Model using R

Before getting into the hypothesis testing concepts in relation to the linear regression model, let's train a multivariate (multiple) linear regression model and print the summary output of the model, which will be referred to in the next section.

The data used for creating the multiple linear regression model is BostonHousing, which can be loaded in RStudio by installing the mlbench package. The code is shown below:

install.packages("mlbench")
library(mlbench)
data("BostonHousing")

Once the data is loaded, the code shown below can be used to create the linear regression model.

attach(BostonHousing)
BostonHousing.lm <- lm(log(medv) ~ crim + chas + rad + lstat)
summary(BostonHousing.lm)

Executing the above commands will result in the creation of a linear regression model with log(medv) as the response variable and crim, chas, rad, and lstat as the predictor variables. The following are the details of the response and predictor variables:

  • log(medv) : Log of the median value of owner-occupied homes in USD 1000’s
  • crim : Per capita crime rate by town
  • chas : Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
  • rad : Index of accessibility to radial highways
  • lstat : Percentage of the lower status of the population

The following is the output of the summary command, which prints the details of the model, including the hypothesis testing details for the coefficients (t-statistics) and for the model as a whole (f-statistics).

[Figure: summary output of the linear regression model in R, showing the coefficient estimates with their t-statistics and p-values, and the overall F-statistic]

Hypothesis tests & Linear Regression Models

Hypothesis testing is a statistical procedure used to test a claim or assumption about the underlying distribution of a population based on sample data. Here are the key steps of doing hypothesis tests with linear regression models:

  • Hypothesis formulation for T-tests: In the case of linear regression, the claim is that there exists a relationship between the response and a predictor variable, and the claim is represented by a non-zero value of the coefficient of that predictor variable in the linear equation or regression model. This is formulated as the alternate hypothesis. Thus, the null hypothesis states that there is no relationship between the response and the predictor variable, i.e., the coefficient of that predictor variable is equal to zero (0). So, if the linear regression model is Y = a0 + a1x1 + a2x2 + a3x3, then the null hypothesis for each test states that a1 = 0, a2 = 0, a3 = 0, etc. For each predictor variable, an individual hypothesis test is done to determine whether the relationship between the response and that particular predictor variable is statistically significant, based on the sample data used for training the model. Thus, if there are, say, 5 features, there will be five hypothesis tests, each with its own null and alternate hypothesis.
  • Hypothesis formulation for F-test : In addition, there is a hypothesis test done around the claim that there is a linear regression model representing the response variable and all the predictor variables. The null hypothesis is that the linear regression model does not exist . This essentially means that the value of all the coefficients is equal to zero. So, if the linear regression model is Y = a0 + a1x1 + a2x2 + a3x3, then the null hypothesis states that a1 = a2 = a3 = 0.
  • F-statistics for testing hypothesis for linear regression model : The F-test is used to test the null hypothesis that a linear regression model does not exist, i.e., that there is no linear relationship between the response variable y and the predictor variables x1, x2, x3, x4 and x5. The null hypothesis can also be written as a1 = a2 = a3 = a4 = a5 = 0, where ai is the coefficient of xi. The F-statistic is calculated as a function of the sum of squared residuals for the restricted regression (a linear regression model with only the intercept or bias, i.e., all coefficient values set to zero) and the sum of squared residuals for the unrestricted regression (the full linear regression model). In the above figure, note the value of the f-statistic as 15.66 against the degrees of freedom as 5 and 194.
  • Evaluate t-statistics against the critical value/region : After calculating the value of the t-statistic for each coefficient, it is time to make a decision about whether to reject or fail to reject the null hypothesis. In order for this decision to be made, one needs to set a significance level, also known as the alpha level. A significance level of 0.05 is usually used. If the value of the t-statistic falls in the critical region, the null hypothesis is rejected. Equivalently, if the p-value comes out to be less than 0.05, the null hypothesis is rejected (see the R sketch after this list for where these values appear in the model summary).
  • Evaluate f-statistics against the critical value/region : The value of the F-statistic and its p-value are evaluated for testing the null hypothesis that the linear regression model representing the response and predictor variables does not exist. If the value of the f-statistic is greater than the critical value at the 0.05 level of significance, the null hypothesis is rejected. This means that a linear model exists with at least one non-zero coefficient.
  • Draw conclusions : The final step of hypothesis testing is to draw a conclusion by interpreting the results in terms of the original claim or hypothesis. If the null hypothesis for a predictor variable is rejected, it means that the relationship between the response and that predictor variable is statistically significant based on the evidence, i.e., the sample data used for training the model. Similarly, if the f-statistic value lies in the critical region and the p-value is less than the alpha value (usually set as 0.05), one can say that there exists a linear regression model.
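As a concrete illustration of where these quantities appear, below is a minimal R sketch for the BostonHousing model trained earlier (the data= argument and the conventional 0.05 cutoff are the only choices added here).

library(mlbench)
data("BostonHousing")

model <- lm(log(medv) ~ crim + chas + rad + lstat, data = BostonHousing)
s <- summary(model)

# T-tests: one row per coefficient (Estimate, Std. Error, t value, Pr(>|t|))
s$coefficients
s$coefficients[, "Pr(>|t|)"] < 0.05   # TRUE where the null "coefficient = 0" is rejected

# F-test: overall F-statistic with its degrees of freedom, and its p-value
s$fstatistic
pf(s$fstatistic["value"], s$fstatistic["numdf"], s$fstatistic["dendf"],
   lower.tail = FALSE)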

Why hypothesis tests for linear regression models?

The reasons why we need to do hypothesis tests in the case of a linear regression model are the following:

  • By creating the model, we are making a new claim about the relationship between the response or dependent variable and one or more predictor or independent variables. In order to justify the claim, one or more tests are needed. These tests can be termed an act of testing the claim (or new truth), in other words, hypothesis tests.
  • One kind of test is required to test the relationship between the response and each of the predictor variables (hence, T-tests).
  • Another kind of test is required to test the linear regression model representation as a whole. This is called the F-test.

While training linear regression models, hypothesis testing is done to determine whether the relationship between the response and each of the predictor variables is statistically significant or otherwise. The coefficient related to each of the predictor variables is determined first. Then, individual hypothesis tests are done to determine whether the relationship between the response and that particular predictor variable is statistically significant based on the sample data used for training the model. If the null hypothesis for a predictor variable fails to be rejected, it means that no statistically significant relationship between the response and that particular predictor variable could be established. T-statistics are used for performing the hypothesis testing because the standard deviation of the sampling distribution is unknown. The value of the t-statistic is compared with the critical value from the t-distribution table in order to make a decision about whether to reject the null hypothesis regarding the relationship between the response and the predictor variable. If the value falls in the critical region, the null hypothesis is rejected, which means that there is a statistically significant relationship between the response and that predictor variable. In addition to the T-tests, an F-test is performed to test the null hypothesis that the linear regression model does not exist, i.e., that the value of all the coefficients is zero (0). Learn more about linear regression and the t-test in this blog – Linear regression t-test: formula, example.


Linear regression - Hypothesis testing

by Marco Taboga , PhD

This lecture discusses how to perform tests of hypotheses about the coefficients of a linear regression model estimated by ordinary least squares (OLS).

Table of contents

  • Normal vs non-normal model
  • The linear regression model
  • Matrix notation
  • Tests of hypothesis in the normal linear regression model
  • Test of a restriction on a single coefficient (t test)
  • Test of a set of linear restrictions (F test)
  • Tests based on maximum likelihood procedures (Wald, Lagrange multiplier, likelihood ratio)
  • Tests of hypothesis when the OLS estimator is asymptotically normal
  • Test of a restriction on a single coefficient (z test)
  • Test of a set of linear restrictions (Chi-square test)
  • Learn more about regression analysis

The lecture is divided in two parts:

in the first part, we discuss hypothesis testing in the normal linear regression model , in which the OLS estimator of the coefficients has a normal distribution conditional on the matrix of regressors;

in the second part, we show how to carry out hypothesis tests in linear regression analyses where the hypothesis of normality holds only in large samples (i.e., the OLS estimator can be proved to be asymptotically normal).

How to choose which test to carry out after estimating a linear regression model.

We also denote:

We now explain how to derive tests about the coefficients of the normal linear regression model.

It can be proved (see the lecture about the normal linear regression model ) that the assumption of conditional normality implies that:

How the acceptance region is determined depends not only on the desired size of the test , but also on whether the test is two-tailed (deviations in either direction count as evidence against the null) or one-tailed (only one of the two directions, i.e., either smaller or larger, is possible).

For more details on how to determine the acceptance region, see the glossary entry on critical values .


The F test is one-tailed .

A critical value in the right tail of the F distribution is chosen so as to achieve the desired size of the test.

Then, the null hypothesis is rejected if the F statistic is larger than the critical value.

In this section we explain how to perform hypothesis tests about the coefficients of a linear regression model when the OLS estimator is asymptotically normal.

As we have shown in the lecture on the properties of the OLS estimator , in several cases (i.e., under different sets of assumptions) it can be proved that:

These two properties are used to derive the asymptotic distribution of the test statistics used in hypothesis testing.

The test can be either one-tailed or two-tailed . The same comments made for the t-test apply here.


Like the F test, the Chi-square test is usually one-tailed .

The desired size of the test is achieved by appropriately choosing a critical value in the right tail of the Chi-square distribution.

The null is rejected if the Chi-square statistic is larger than the critical value.

Want to learn more about regression analysis? Here are some suggestions:

  • R squared of a linear regression
  • Gauss-Markov theorem
  • Generalized Least Squares
  • Multicollinearity
  • Dummy variables
  • Selection of linear regression models
  • Partitioned regression
  • Ridge regression


Teach yourself statistics

Hypothesis Test for Regression Slope

This lesson describes how to conduct a hypothesis test to determine whether there is a significant linear relationship between an independent variable X and a dependent variable Y .

The test focuses on the slope of the regression line

Y = Β₀ + Β₁X

where Β₀ is a constant, Β₁ is the slope (also called the regression coefficient), X is the value of the independent variable, and Y is the value of the dependent variable.

If we find that the slope of the regression line is significantly different from zero, we will conclude that there is a significant relationship between the independent and dependent variables.

Test Requirements

The approach described in this lesson is valid whenever the standard requirements for simple linear regression are met.

  • The dependent variable Y has a linear relationship to the independent variable X .
  • For each value of X, the probability distribution of Y has the same standard deviation σ.
  • The Y values are independent.
  • The Y values are roughly normally distributed (i.e., symmetric and unimodal ). A little skewness is ok if the sample size is large.

The test procedure consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses

If there is a significant linear relationship between the independent variable X and the dependent variable Y , the slope will not equal zero.

H₀: Β₁ = 0

Hₐ: Β₁ ≠ 0

The null hypothesis states that the slope is equal to zero, and the alternative hypothesis states that the slope is not equal to zero.

Formulate an Analysis Plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. The plan should specify the following elements.

  • Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value between 0 and 1 can be used.
  • Test method. Use a linear regression t-test (described in the next section) to determine whether the slope of the regression line differs significantly from zero.

Analyze Sample Data

Using sample data, find the standard error of the slope, the slope of the regression line, the degrees of freedom, the test statistic, and the P-value associated with the test statistic. The approach described in this section is illustrated in the sample problem at the end of this lesson.

Predictor   Coef   SE Coef   T      P
Constant    76     30        2.53   0.01
X           35     20        1.75   0.04

  • Standard error. The standard error of the slope (s_b1) can be computed as SE = s_b1 = sqrt[ Σ(yᵢ - ŷᵢ)² / (n - 2) ] / sqrt[ Σ(xᵢ - x̄)² ]

  • Slope. Like the standard error, the slope of the regression line will be provided by most statistics software packages. In the hypothetical output above, the slope is equal to 35.

  • Test statistic. The test statistic is a t statistic computed as t = b₁ / SE, with DF = n - 2 degrees of freedom.

  • P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the test statistic is a t statistic, use the t Distribution Calculator to assess the probability associated with the test statistic. Use the degrees of freedom computed above.

Interpret Results

If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level , and rejecting the null hypothesis when the P-value is less than the significance level.

Test Your Understanding

The local utility company surveys 101 randomly selected customers. For each survey participant, the company collects the following: annual electric bill (in dollars) and home size (in square feet). Output from a regression analysis appears below.

Annual bill = 0.55 * Home size + 15

Predictor   Coef   SE Coef   T      P
Constant    15     3         5.0    0.00
Home size   0.55   0.24      2.29   0.01

Is there a significant linear relationship between annual bill and home size? Use a 0.05 level of significance.

The solution to this problem takes four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We work through those steps below:

H₀: The slope of the regression line is equal to zero.

Hₐ: The slope of the regression line is not equal to zero.

  • Formulate an analysis plan . For this analysis, the significance level is 0.05. Using sample data, we will conduct a linear regression t-test to determine whether the slope of the regression line differs significantly from zero.

We get the slope (b₁) and the standard error (SE) from the regression output.

b₁ = 0.55       SE = 0.24

We compute the degrees of freedom and the t statistic, using the following equations.

DF = n - 2 = 101 - 2 = 99

t = b₁/SE = 0.55/0.24 = 2.29

where DF is the degrees of freedom, n is the number of observations in the sample, b₁ is the slope of the regression line, and SE is the standard error of the slope.

  • Interpret results . Since the P-value (0.0242) is less than the significance level (0.05), we reject the null hypothesis and conclude that there is a significant linear relationship between annual bill and home size.
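The arithmetic in this problem can be reproduced in R; the short sketch below simply plugs in the slope, standard error, and sample size from the output above and recovers the two-tailed P-value of about 0.024.

b1 <- 0.55                  # slope from the regression output
SE <- 0.24                  # standard error of the slope
n  <- 101                   # surveyed customers

DF <- n - 2                 # degrees of freedom = 99
t  <- b1 / SE               # t statistic, about 2.29
p  <- 2 * pt(-abs(t), DF)   # two-tailed P-value, about 0.024
c(t = t, df = DF, p = p)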


Hypothesis Testing On Linear Regression

Ankita Banerji


Nerd For Tech

When we build a multiple linear regression model, we may have a few potential predictor/independent variables. Therefore, it is extremely important to select the variables that are really significant and influence the experiment strongly. To get the optimal model, we could try all the possible combinations of independent variables and see which model fits best, but this method is time-consuming and infeasible. Hence, we need another method to get a decent model. We can do the same either by manual feature elimination or by using an automated approach (RFE, regularization, etc.).

In manual feature elimination, we can:

  • Build a model with all the features,
  • Drop the features that are least helpful in prediction (high p-value),
  • Drop the features that are redundant (using correlations and VIF),
  • Rebuild the model and repeat.

It is generally recommended that we follow a balanced approach, i.e., use a combination of automated (coarse tuning) and manual (fine tuning) selection in order to get an optimal model. In this blog we will discuss the second step of manual feature elimination, i.e., dropping the features that are least helpful in prediction (insignificant features).

The first question that arises is: ‘What do we mean by a significant variable?’ Let us understand it in simple linear regression first.

When we fit a straight line through the data, we get two parameters i.e., the intercept (β₀) and the slope (β₁).

Now, β₀ is not of much importance right now, but there are a few aspects around β₁ which need to be checked and verified. Suppose we have a dataset for which the scatter plot looks like the following:

When we run a linear regression on this dataset in Python, Python will fit a line on the data which looks like the following:

We can clearly see that the data is randomly scattered and doesn’t seem to follow a linear trend. Python will anyway fit a line through the data using the least-squares method. We can see that the fitted line is of no use in this case. Hence, every time we perform linear regression, we need to test whether the fitted line is a significant one or not (in other terms, test whether β₁ is significant or not). We will use hypothesis testing on β₁ for the same.

Steps to Perform Hypothesis testing:

  • Set the Hypothesis
  • Set the Significance Level, Criteria for a decision
  • Compute the test statistics
  • Make a decision

Step 1: We start by saying that β₁ is not significant, i.e., there is no relationship between x and y, therefore slope β₁ = 0.

Step 2: Typically, we set the Significance level at 10%, 5%, or 1%.

Step 3: After formulating the null and alternate hypotheses, the next steps to follow in order to make a decision using the p-value method are as follows:

  1. Calculate the value of the t-score for the mean on the distribution: t = (x̄ − μ) / (s / √n), where μ is the population mean and s is the sample standard deviation, which when divided by √n is also known as the standard error.

  2. Calculate the p-value from the cumulative probability for the given t-score using the t-table.

  3. Make the decision on the basis of the p-value with respect to the given value of the significance level.

Step 4: Making Decision

If the p-value < 0.05, we can reject the null hypothesis.

If the p-value > 0.05, we fail to reject the null hypothesis.

If we fail to reject the null hypothesis, it would mean there is no evidence that β₁ differs from zero (in other words, β₁ is insignificant) and the variable is of no use in the model. Similarly, if we reject the null hypothesis, it would mean that β₁ is not zero and the fitted line is a significant one.

NOTE: The above steps are performed by Python automatically.

Similarly, in multiple linear regression, we will perform the same steps as in simple linear regression, except that the null and alternate hypotheses will be different. For the multiple regression model, the null hypothesis is that all the coefficients are zero (β₁ = β₂ = … = βₙ = 0), and the alternate hypothesis is that at least one βᵢ is not zero.

Example in Python

Let us take a housing dataset which contains the prices of properties in the Delhi region. We wish to use this data to optimise the sale prices of the properties based on important factors such as area, bedrooms, parking, etc.

Top five rows of dataset look something like this:

After preparing, cleaning and analysing the data, we build a linear regression model using all the variables (fit a regression line through the data using statsmodels).

We get the following output:

Looking at the p-values (P>|t|), some of the variables, like bedrooms and semi-furnished, aren’t really significant (p > 0.05). We could simply drop the variable with the highest non-significant p-value.

Conclusion: Generally we use two main parameters to judge the insignificant variables, the p-values and the VIFs (variance inflation factor).


AnalystPrep

Hypothesis Tests and Confidence Intervals in Multiple Regression


After completing this reading you should be able to:

  • Construct, apply, and interpret hypothesis tests and confidence intervals for a single coefficient in a multiple regression.
  • Construct, apply, and interpret joint hypothesis tests and confidence intervals for multiple coefficients in a multiple regression.
  • Interpret the \(F\)-statistic.
  • Interpret tests of a single restriction involving multiple coefficients.
  • Interpret confidence sets for multiple coefficients.
  • Identify examples of omitted variable bias in multiple regressions.
  • Interpret the \({ R }^{ 2 }\) and adjusted \({ R }^{ 2 }\) in a multiple regression.

Hypothesis Tests and Confidence Intervals for a Single Coefficient

This section is about the calculation of the standard error, hypothesis testing, and confidence interval construction for a single regression coefficient in a multiple regression equation.

Introduction

In a previous chapter, we looked at simple linear regression, where we deal with just one regressor (independent variable). The response (dependent variable) is assumed to be affected by just one independent variable. Multiple regression, on the other hand, simultaneously considers the influence of multiple explanatory variables on a response variable Y. We may want to establish the confidence interval for one of the independent variables. We may want to evaluate whether any particular independent variable has a significant effect on the dependent variable. Finally, we may also want to establish whether the independent variables as a group have a significant effect on the dependent variable. In this chapter, we delve into the ways all this can be achieved.

Hypothesis Tests for a single coefficient

Suppose that we are testing the hypothesis that the true coefficient \({ \beta }_{ j }\) on the \(j\)th regressor takes on some specific value \({ \beta }_{ j,0 }\). Let the alternative hypothesis be two-sided. Therefore, the following is the mathematical expression of the two hypotheses:

$$ { H }_{ 0 }:{ \beta }_{ j }={ \beta }_{ j,0 }\quad vs.\quad { H }_{ 1 }:{ \beta }_{ j }\neq { \beta }_{ j,0 } $$

This expression represents the two-sided alternative. The following are the steps to follow while testing the null hypothesis:

  • Computing the coefficient’s standard error.

  • Computing the \(t\)-statistic, i.e., the difference between the estimated and hypothesized coefficient values divided by the standard error:

$$ { t }^{ act }=\frac { { \hat { \beta } }_{ j }-{ \beta }_{ j,0 } }{ SE\left( { \hat { \beta } }_{ j } \right) } $$

  • Computing the \(p\)-value:

$$ p\text{-value}=2\Phi \left( -|{ t }^{ act }| \right) $$

  • Also, the \(t\)-statistic can be compared to the critical value corresponding to the significance level that is desired for the test.

Confidence Intervals for a Single Coefficient

The confidence interval for a regression coefficient in multiple regression is calculated and interpreted the same way as it is in simple linear regression. 

$$ { \hat { \beta } }_{ j }\pm { t }_{ c }\times SE\left( { \hat { \beta } }_{ j } \right) $$

The \(t\)-statistic has n − k − 1 degrees of freedom, where k = the number of independent variables.

Suppose that an interval contains the true value of \({ \beta }_{ j }\) with a probability of 95%. This is simply the 95% two-sided confidence interval for \({ \beta }_{ j }\). The implication here is that intervals constructed in this way contain the true value of \({ \beta }_{ j }\) in 95% of repeated random samples.

Alternatively, the 95% two-sided confidence interval for \({ \beta }_{ j }\) is the set of values that cannot be rejected when a two-sided hypothesis test at the 5% level is applied. Therefore, with a large sample size:

$$ 95\%\quad confidence\quad interval\quad for\quad { \beta }_{ j }=\left[ { \hat { \beta } }_{ j }-1.96SE\left( { \hat { \beta } }_{ j } \right) ,{ \hat { \beta } }_{ j }+1.96SE\left( { \hat { \beta } }_{ j } \right) \right] $$
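As a quick numerical illustration (the coefficient estimate and standard error below are hypothetical placeholders, not values from the text), this large-sample interval can be computed in R as:

beta_hat <- 0.8     # hypothetical coefficient estimate
se_beta  <- 0.15    # hypothetical standard error
c(lower = beta_hat - 1.96 * se_beta,
  upper = beta_hat + 1.96 * se_beta)   # large-sample 95% confidence interval
# For a fitted lm model, confint(model) gives the exact t-based intervals.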

Tests of Joint Hypotheses

In this section, we consider the formulation of the joint hypotheses on multiple regression coefficients. We will further study the application of an \(F\)-statistic in their testing.

Hypotheses Testing on Two or More Coefficients

Joint null hypothesis.

In multiple regression, we cannot test the null hypothesis that all slope coefficients are equal to zero based on t-tests that each individual slope coefficient equals zero. Why? Individual t-tests do not account for the effects of interactions among the independent variables.

For this reason, we conduct the F-test, which uses the F-statistic. The F-test tests the null hypothesis that all of the slope coefficients in the multiple regression model are jointly equal to 0, i.e., \({ H }_{ 0 }:{ \beta }_{ 1 }={ \beta }_{ 2 }=\dots ={ \beta }_{ k }=0\), against the alternative that at least one slope coefficient is not equal to 0.

\(F\)-Statistic

The F-statistic, which is always a one-tailed test , is calculated as:

$$ F=\frac { ESS/k }{ SSR/\left( n-k-1 \right) } $$

where ESS is the explained sum of squares, SSR is the sum of squared residuals, k is the number of slope coefficients, and n is the number of observations.

To determine whether at least one of the coefficients is statistically significant, the calculated F-statistic is compared with the one-tailed critical F-value, at the appropriate level of significance.

Decision rule:

Reject \({ H }_{ 0 }\) if the calculated F-statistic exceeds the one-tailed critical F-value at the chosen level of significance; otherwise, do not reject.

Rejection of the null hypothesis at a stated level of significance indicates that at least one of the coefficients is significantly different than zero, i.e, at least one of the independent variables in the regression model makes a significant contribution to the dependent variable.

An analyst runs a regression of monthly value-stock returns on four independent variables over 48 months.

The total sum of squares for the regression is 360, and the sum of squared errors is 120.

Test the null hypothesis at the 5% significance level (95% confidence) that all the four independent variables are equal to zero.

\({ H }_{ 0 }:{ \beta }_{ 1 }=0,{ \beta }_{ 2 }=0,\dots ,{ \beta }_{ 4 }=0 \)

\({ H }_{ 1 }:{ \beta }_{ j }\neq 0\) (at least one \({ \beta }_{ j }\) is not equal to zero, j = 1, 2, …, k)

ESS = TSS – SSR = 360 – 120 = 240

The calculated test statistic = (ESS/k)/(SSR/(n-k-1))

=(240/4)/(120/43) = 21.5

\({ F }_{ 43 }^{ 4 }\) is approximately 2.59 at the 5% significance level.

Decision: Reject H 0 .

Conclusion: at least one of the 4 independents is significantly different than zero.
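For readers who want to check the arithmetic, the example can be reproduced in R as follows; the critical value is computed with qf() rather than read from a table, so it may differ slightly from a printed table value.

TSS <- 360; SSR <- 120
k   <- 4                                  # number of slope coefficients
n   <- 48                                 # monthly observations

ESS    <- TSS - SSR                       # 240
F_stat <- (ESS / k) / (SSR / (n - k - 1)) # 21.5
F_crit <- qf(0.95, df1 = k, df2 = n - k - 1)  # critical value at the 5% level
F_stat > F_crit                           # TRUE: reject the joint null hypothesis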

Omitted Variable Bias in Multiple Regression

This is the bias in the OLS estimator that arises when at least one included regressor is correlated with an omitted variable. The following conditions must be satisfied for omitted variable bias to occur:

  • There must be a correlation between at least one of the included regressors and the omitted variable.
  • The dependent variable \(Y\) must be determined by the omitted variable.

Practical Interpretation of the \({ R }^{ 2 }\) and the adjusted \({ R }^{ 2 }\), \({ \bar { R } }^{ 2 }\)

To determine the accuracy within which the OLS regression line fits the data, we apply the coefficient of determination and the regression’s standard error . 

The coefficient of determination, represented by \({ R }^{ 2 }\), is a measure of the “goodness of fit” of the regression. It is interpreted as the percentage of variation in the dependent variable explained by the independent variables:

$$ { R }^{ 2 }=\frac { ESS }{ TSS } =1-\frac { SSR }{ TSS } $$

\({ R }^{ 2 }\) is not a reliable indicator of the explanatory power of a multiple regression model. Why? \({ R }^{ 2 }\) almost always increases as new independent variables are added to the model, even if the marginal contribution of the new variable is not statistically significant. Thus, a high \({ R }^{ 2 }\) may reflect the impact of a large set of independent variables rather than how well the set explains the dependent variable. This problem is solved by the use of the adjusted \({ R }^{ 2 }\) (extensively covered in chapter 8).

The following are factors to watch out for when applying the \({ R }^{ 2 }\) or the \({ \bar { R } }^{ 2 }\):

  • An added variable doesn’t have to be statistically significant just because the \({ R }^{ 2 }\) or the \({ \bar { R } }^{ 2 }\) has increased.
  • It is not always true that the regressors are a true cause of the dependent variable, just because there is a high \({ R }^{ 2 }\) or \({ \bar { R } }^{ 2 }\).
  • It is not necessary that there is no omitted variable bias just because we have a high \({ R }^{ 2 }\) or \({ \bar { R } }^{ 2 }\).
  • It is not necessarily true that we have the most appropriate set of regressors just because we have a high \({ R }^{ 2 }\) or \({ \bar { R } }^{ 2 }\).
  • It is not necessarily true that we have an inappropriate set of regressors just because we have a low \({ R }^{ 2 }\) or \({ \bar { R } }^{ 2 }\).

An economist tests the hypothesis that GDP growth in a certain country can be explained by interest rates and inflation.

Using some 30 observations, the analyst formulates the following regression equation:

$$ \text{GDP growth} = { \hat { \beta } }_{ 0 } + { \hat { \beta } }_{ 1 }\text{Interest} + { \hat { \beta } }_{ 2 }\text{Inflation} $$

Regression estimates are as follows:

 

                   Coefficient   Standard Error
Intercept          0.10          0.5%
Interest rates     0.20          0.05
Inflation          0.15          0.03

Is the coefficient for interest rates significant at 5%?

  • Since the test statistic < t-critical, we accept H₀; the interest rate coefficient is not significant at the 5% level.
  • Since the test statistic > t-critical, we reject H₀; the interest rate coefficient is not significant at the 5% level.
  • Since the test statistic > t-critical, we reject H₀; the interest rate coefficient is significant at the 5% level.
  • Since the test statistic < t-critical, we accept H₁; the interest rate coefficient is significant at the 5% level.

The correct answer is  C .

We have GDP growth = 0.10 + 0.20(Int) + 0.15(Inf)

Hypothesis:

$$ { H }_{ 0 }:{ \hat { \beta } }_{ 1 } = 0 \quad vs \quad { H }_{ 1 }:{ \hat { \beta } }_{ 1 }≠0 $$

The test statistic is:

$$ t = \left( \frac { 0.20 - 0 }{ 0.05 } \right) = 4 $$

The critical value is t(α/2, n−k−1) = t(0.025, 27) = 2.052 (which can be found in the t-table).


Conclusion : The interest rate coefficient is significant at the 5% level.
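The same numbers can be verified in R (the critical value comes from qt() rather than a printed t-table):

t_stat <- (0.20 - 0) / 0.05           # = 4
t_crit <- qt(0.975, df = 30 - 2 - 1)  # two-sided 5% critical value with 27 df (about 2.05)
t_stat > t_crit                       # TRUE, so H0 is rejected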



T-test and Hypothesis Testing (Explained Simply)

Understand the concept and find how to avoid typical mistakes.

Artem Dementyev


Towards Data Science

Student’s t-tests are commonly used in inferential statistics for testing a hypothesis on the basis of a difference between sample means. However, people often misinterpret the results of t-tests, which leads to false research findings and a lack of reproducibility of studies. This problem exists not only among students. Even instructors and “serious” researchers fall into the same trap. To prove my words, I can link this article , but there are others.

Another problem is that I’ve often seen and heard complaints from some students that their teachers don’t explain the concept of t-tests sufficiently. Instead, they focus on calculations and interpretation of the results. Nowadays, scientists use computers to calculate t-statistic automatically, so there is no reason to drill the usage of formulas and t-distribution tables, except for the purpose of understanding how it works . As for interpretation, there is nothing wrong with it, although without comprehension of the concept it may look like blindly following the rules. Actually, it is. Do you remember?

“Absolute t-value is greater than t-critical, so the null hypothesis is rejected and the alternate hypothesis is accepted”.

If you are familiar with this statement and still have problems with understanding it, most likely, you’ve been unfortunate to get the same training. These problems with intuition can lead to problems with decision-making while testing hypotheses. So, besides knowing what values to paste into the formula and how to use t-tests, it is necessary to know when to use it, why to use it, and the meaning of all that stuff.

This article is intended to explain two concepts: t-test and hypothesis testing. At first, I wanted to explain only t-tests. Later, I decided to include hypothesis testing because these ideas are so closely related that it would be difficult to tell about one thing while losing sight of another. Eventually, you will see that t-test is not only an abstract idea but has good common sense.

Be prepared, this article is pretty long. Take a look at the article outline below to not get lost.

Article outline:

  • Hypothesis testing
  • T-test definition and formula explanation
  • Choosing the level of significance
  • T-distribution and p-value

Meet David! He is a high school student and he has started to study statistics recently.

David wants to figure out whether his schoolmates from class A got better quarter grades in mathematics than those from class B. There is a 5-point grading system at school, where 5 is the best score. Students have no access to other students' grades because teachers keep their data confidential and there are approximately 30 students in both classes.

David cannot ask all the students about their grades because it is weird and not all the students are happy to tell about their grades. If he asks just his friends from both classes, the results will be biased. Why? Because we tend to make friends with people with similar interests. So, it is very likely that friends of David have more or less similar scores.

So, David decided to take a sample of 6 random students from each class and asked them about their math quarter grades. He got the following results:

It seems that students from class B outperform students from class A. But David did not ask other people! Maybe if he asked all the students, he could get the reverse result. Who knows? So, here is the problem and it needs to be solved scientifically.

To check whether the result was not likely to occur randomly or by chance, David can use the approach called hypothesis testing . A hypothesis is a claim or assumption that we want to check. The approach is very similar to a court trial process, where a judge should decide whether an accused person is guilty or not. There are two types of hypotheses:

  • Null hypothesis (H₀) — the hypothesis that we have by default, or the accepted fact. Usually, it means the absence of an effect. By analogy with the trial process, it is “presumption of innocence” — a legal principle that every person accused of any crime is considered innocent until proven guilty.
  • Alternative hypothesis (H₁) — the hypothesis that we want to test. In other words, the alternative hypothesis will be accepted only if we gather enough evidence to claim that the effect exists.

The null hypothesis and alternative hypothesis are always mathematically opposite. The possible outcomes of hypothesis testing:

  • Reject the null hypothesis —a person is found guilty.
  • Fail to reject the null hypothesis — the accused is acquitted.

David decided to state hypotheses in the following way:

  • H₀ — There is no difference in the grade means of those students in class A and those from class B.
  • H₁ — There is a difference in the grade means of those students in class A and those from class B.

Now, David needs to gather enough evidence to show that students in two classes have different academic performances. But, what can he consider as “evidence”?

T-test definition, formula explanation, and assumptions.

The T-test is the test, which allows us to analyze one or two sample means, depending on the type of t-test. Yes, the t-test has several types:

  • One-sample t-test — compare the mean of one group against the specified mean generated from a population. For example, a manufacturer of mobile phones promises that one of their models has a battery that supports about 25 hours of video playback on average. To find out if the manufacturer is right, a researcher can sample 15 phones, measure the battery life and get an average of 23 hours. Then, he can use a t-test to determine whether this difference is received not just by chance.
  • Paired sample t-test — compares the means of two measurements taken from the same individuals, objects, or related units. For instance, students passed an additional course for math and it would be interesting to find whether their results became better after course completion. It is possible to take a sample from the same group and use the paired t-test.
  • Independent two-sample t-test — used to analyze the mean comparison of two independent groups. Like two groups of students. Does it remind you of something?

Exactly. David wants to use the independent two-sample t-test to check if there is a real difference between the grade means in A and B classes, or if he got such results by chance. Two groups are independent because students who study in class A cannot study in class B and reverse. And the question is how David can use such a test?

We have the following formula of t-statistic for our case, where the sample size of both groups is equal:

The formula looks pretty complicated. However, it can be presented in another way:

Basically, t-statistic is a signal-to-noise ratio . When we assume that the difference between the two groups is real, we don’t expect that their means are exactly the same. Therefore, the greater the difference in the means, the more we are confident that the populations are not the same. However, if the data is too scattered (with high variance), then the means may have been a result of randomness and we got ones by chance. Especially, when we have a small sample size, like 3–5 observations.

Why is that? Take for example the salary of people living in two big Russian cities — Moscow and St. Petersburg.

There is a very high variance because the salary ranges from approximately $100 up to millions of dollars. So, if you decided to find whether the difference in means between the two cities exists, you may take a sample of 10 people and ask about their salaries. I know, it is very unlikely that you’ll face some millionaire on a street and I know, it is a bit strange to compare average salaries instead of median salaries. Nevertheless, if you took the sample correctly, you may find that the salary of people is highly scattered in both cities. For instance, in St. Petersburg, the mean is $7000 and the standard deviation is $990, in Moscow — $8000 is the mean and $1150 standard deviation. In such a situation, you can’t be confident whether the difference in means is statistically significant. That’s because you asked only 10 people and the variance of salary is high, hence you could get such results just by chance.

Thus, the concept of t-statistic is just a signal-to-noise ratio. With less variance, more sample data, and a bigger mean difference, we are more sure that this difference is real.

I could take an even closer look at the formula of t-statistic, but for the purpose of clarity, I won’t. If you want, you can read the proof here . Knowing the idea of the t-test would be enough for effective usage.
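Since the formula images from the original post are not reproduced in this copy, here is a small R sketch of the idea on hypothetical grade samples (the gradesA and gradesB vectors are made up). It uses the standard equal-sample-size form of the independent two-sample t-statistic, t = (mean₁ − mean₂) / √((s₁² + s₂²)/n), which is assumed to match the formula the author shows, and checks the manual result against t.test().

# Hypothetical samples of 6 grades from each class (illustrative values only)
gradesA <- c(3, 4, 4, 3, 5, 4)
gradesB <- c(4, 4, 5, 4, 5, 3)

n      <- length(gradesA)                    # equal sample sizes
signal <- mean(gradesA) - mean(gradesB)      # difference in sample means
noise  <- sqrt((var(gradesA) + var(gradesB)) / n)
t_manual <- signal / noise
t_manual

# Built-in equivalent (equal-variance independent two-sample t-test)
t.test(gradesA, gradesB, var.equal = TRUE)$statistic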

Let’s also cover some assumptions regarding the t-test . There are 5 main assumptions listed below:

  • The data is collected from a representative, randomly selected portion of the total population. This is necessary to generalize our findings to our target population (in the case of David — to all students in two classes).
  • Data should follow a continuous or discrete scale of measurement. We can consider grades as an example of discrete data.
  • Means should follow the normal distribution, as well as the population. Not sample data, as some people may think, but means and population. This needs a more detailed explanation, which I give in the section about t-distributions.
  • (for independent t-test) Independence of the observations. Each subject should belong to only one group. There is no relationship between the observations in each group. Otherwise, use the paired t-test .
  • (for an independent t-test with equal variance) Homogeneity of variances. Homogeneous, or equal, variance exists when the standard deviations of samples are approximately equal. It is possible to test for variance equality using F-test or Levene test. Otherwise, we should use Welch’s t-test.

So, the t-statistic is the evidence that David needs to gather in order to claim that the difference in the means of the two groups of students is not taking place by chance. If there is enough evidence, then David can reject the null hypothesis. The question is how much evidence is enough?

David needs to determine whether a result he has got is likely due to chance or to some factor of interest. He can find the t-statistic as the evidence, but how much risk is David willing to take of making a wrong decision? This risk can be represented as the level of significance (α).

The significance level is the desired probability of rejecting the null hypothesis when it is true . For instance, if a researcher selects α=0.05, it means that he is willing to take a 5% risk of falsely rejecting the null hypothesis. Or, in other words, to take the 5% risk of conviction of an innocent. Statisticians often choose α=0.05, while α=0.01 and α=0.1 are also widely used. However, this choice is only a convention, based on R. Fisher’s argument that a 1/20 chance represents an unusual sampling occurrence. This arbitrary threshold was established in the 1920s when a sample size of more than 100 was rarely used.

We don’t want to set the level of significance mindlessly. But what approach we should use to choose this value? Well, describing such an approach in detail is a topic for another article because there are a lot of things to talk about. Still, I’m going to give a quick explanation of the factors to consider while choosing an optimal level of significance. According to J. Kim (2021), these factors include:

  • losses from incorrect decisions;
  • the researcher’s prior belief for the H₀ and H₁ ;
  • the power of the test;
  • substantive importance of the relationship being tested.

By saying “the researcher should consider losses from incorrect decisions”, it is meant that the researcher has to figure out whether Type I error is more important than Type II error, or reverse.

Type I error means rejecting the null hypothesis when it’s actually true .

Type II error occurs when a statistician fails to reject a null hypothesis that is actually false .

Notice that Type I error has almost the same definition as the level of significance (α). The difference is that Type I error is the actual error, while the level of significance represents the desired risk of committing such error. The risk of committing Type II error is represented by the β sign and 1-β stands for the power of the test. In other words, the power is the probability that the test correctly rejects the null hypothesis . It is also called as “true positive rate”.

There may be cases when a Type I error is more important than a Type II error, and the reverse is also true. Take A/B testing as an example. A researcher wants to test two versions of a page on a website. After running the t-test one incorrectly concludes that version B is better than version A. As a consequence, the website starts to lose conversions. Another case is testing for pregnancy. Suppose, there are two tests available. Test 1 has a 5% chance of Type I error and a 20% chance of Type II error. Test 2 has a 20% chance of Type I error and 5% of Type II error. In this case, a doctor would prefer using Test 2 because misdiagnosing a pregnant patient (Type II error) can be dangerous for the patient and her baby.

The second thing that needs to be considered is the researcher’s prior belief in two hypotheses. The word “prior” means that a researcher has a personal assumption on the probability of H₀ relative to H₁ before looking at one’s data. However, the assumption should not be arbitrary or irrational just because it is “personal”. It needs to be based on good argumentation. For example, the judgment can preferably be informed by previous data and experiences. Let’s say that some researcher has invented a drug, which can cure cancer. There had been many researchers before him with similar “inventions”, whose attempts had failed. That is, the researcher believes that the probability of H₁ (i. e. the drug can cure cancer) is highly unlikely and is about 0.001. In another case, if a statistician a priori believes that H₀ and H₁ are equally likely, then the probability for both hypotheses will be 0.5.

The third factor is substantive importance or the effect size. It accounts for the question of how big the effect size is of the relationship being tested. When there is a big sample size, the t-test often shows the evidence in favor of the alternative hypothesis, although the difference between the means is negligible. While testing on small sample sizes, the t-test can suggest that H₀ should not be rejected, despite a large effect. That’s why it is recommended to set a higher level of significance for small sample sizes and a lower level for large sample sizes.

While reading all this, you may think: “OK, I understand that the level of significance is the desired risk of falsely rejecting the null hypothesis. Then, why not set this value as small as possible in order to get the strongest possible evidence? So, if I conduct a study, I can always set α around 0.00001 (or less) and get valid results”.

There is a reason why we shouldn’t set α as small as possible. Partially, we’ve already talked about it when presenting the concept of substantive importance — on small sample sizes we can miss a large effect if α is too small. But the answer is hidden in the fourth factor that we haven’t discussed yet. And it is the power.

There is a relationship between the level of significance and the power. These values depend on each other. Making decisions on them is like deciding where to spend money or how to spend free time. There are benefits in one area and there are losses in another area. The relationship between α and β is represented in a very simple diagram below. Note that β is the probability of Type II error, not power (power is 1-β).

As you see, there is a trade-off between α and β. The optimal value of α can be chosen after estimating the value of β. It can be done in one of the following two ways:

  • using the assumption of normality
  • using bootstrapping

The second method is preferred for calculating the power because there are many cases where the assumption of normality fails or is unjustifiable. The bootstrapping approach doesn't rely on this assumption and takes full account of sampling variability, which is why it is widely used in practice.

So, how do we use bootstrapping to calculate the power?

In David's case, there are three steps (a short R sketch follows the list):

  • Generate independent samples from class A and class B;
  • Perform the test, comparing class A to class B, and record whether the null hypothesis was rejected;
  • Repeat steps 1–2 many times and find the rejection rate — this is the estimated power.
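A minimal sketch of this procedure in R might look like the following. The grade vectors classA and classB are hypothetical stand-ins for David's observed samples, not the author's actual data:

```r
# A sketch of the bootstrap power estimate (not the author's original code).
# classA and classB are hypothetical stand-ins for David's observed quarter grades.
set.seed(42)
classA <- c(5, 4, 3, 5, 4, 4, 3, 5, 4, 5)
classB <- c(4, 5, 4, 3, 5, 5, 4, 4, 5, 3)

estimate_power <- function(a, b, alpha, n_boot = 2000) {
  rejections <- replicate(n_boot, {
    a_star <- sample(a, replace = TRUE)        # step 1: resample each class independently
    b_star <- sample(b, replace = TRUE)
    t.test(a_star, b_star)$p.value <= alpha    # step 2: run the t-test, record rejection
  })
  mean(rejections)                             # step 3: the rejection rate is the estimated power
}

estimate_power(classA, classB, alpha = 0.05)
```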

Calculating the power is only one step in the calculation of expected losses.

The optimal value of α can be chosen in 3 steps:

  • Choose a grid of α ∈ (0,1)
  • For each value of α, calculate β (using the 3-step process described above) and expected loss by the formula above
  • Find the value of α that minimizes expected loss

Let's get back to David. He wants to set the desired risk of falsely rejecting H₀. To do this correctly, David considers the four factors that we've already discussed. First, he thinks that Type I and Type II errors are equally important. Second, David believes that students in the two classes do not have the same grades; that is, he gives more weight to his alternative hypothesis (P=0.4 for H₀, 1-P=0.6 for H₁). Third, because the sample size is small, David decides to raise α well above 0.05 so as not to miss a possibly substantial effect. The last thing he needs to do is estimate the power. For that, it is necessary to choose a grid of possible values of α and, for each α, carry out multiple t-tests. For now, David knows that the null hypothesis should be rejected if the p-value is less than or equal to the level of significance; otherwise, one fails to reject the null hypothesis. In the following section I explain the meaning of the p-value, but let's leave this for now.

The whole process of calculating the optimal level of significance can be expressed in R.
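The author's original script is not reproduced here; the sketch below only illustrates the idea. It reuses the hypothetical classA, classB, and estimate_power() from the sketch above, assumes equal losses for Type I and Type II errors, and uses David's prior weights of 0.4 for H₀ and 0.6 for H₁:

```r
# Choosing alpha by minimizing expected loss (illustrative sketch only).
p_h0 <- 0.4   # prior weight on the null hypothesis
p_h1 <- 0.6   # prior weight on the alternative hypothesis

alpha_grid <- seq(0.05, 0.95, by = 0.05)
expected_loss <- sapply(alpha_grid, function(a) {
  beta <- 1 - estimate_power(classA, classB, alpha = a)  # estimated Type II error rate
  p_h0 * a + p_h1 * beta                                 # expected loss with unit losses
})

alpha_grid[which.min(expected_loss)]   # the level of significance with the smallest loss
```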

David found that α = 0.8 is the optimal value. Notice how far it is from the conventional level of 0.05.

So, David set the level of significance equal to 0.8. Now, he can calculate the t-statistic.
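As a sketch of the calculation (again using the hypothetical classA and classB vectors), the Welch two-sample t-statistic is simply the difference in means divided by its standard error:

```r
# t-statistic: signal (difference in means) over noise (its standard error)
t_stat <- (mean(classA) - mean(classB)) /
  sqrt(var(classA) / length(classA) + var(classB) / length(classB))
t_stat
# equivalently: t.test(classA, classB)$statistic
```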

After the calculation, he figured out that the t-statistic is -0.2863. Why is this value negative? Because we observe a negative effect: in this sample, students from class B perform better in math, though David supposed that students from class A are better. The other thing we found is that the signal is only about 28.6% of the noise; it almost gets lost. Perhaps the difference in the means is explained by variance. But how big should the t-statistic be to reject the null hypothesis?

That's where the t-distribution comes in. It connects the level of significance and the t-statistic so that we can compare the proof boundary and the proof itself. The idea of the t-distribution is not as hard as one might think. Consider the example of comparing the mean SAT scores of two cities. We know that in both cities SAT scores follow the normal distribution and the means are equal, i.e., the null hypothesis is true. Note that the SAT scores from both cities represent two populations, not samples.

From this point, we can start to develop our logic. We decided to emulate the actions of a person who wants to compare the means of the two cities but has no information about the populations. Of course, one would take samples from each distribution. Let's say the sample size is 10. The following R code generates the SAT distributions, takes samples from both, and calculates the t-statistic.
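The original code is not shown here; a sketch of such a simulation, with an assumed SAT mean of 1050 and standard deviation of 200, could look like this:

```r
# Two "populations" with identical distributions, so H0 is true by construction.
set.seed(1)
city_a <- rnorm(100000, mean = 1050, sd = 200)
city_b <- rnorm(100000, mean = 1050, sd = 200)

sample_a <- sample(city_a, 10)   # a sample of 10 scores from each city
sample_b <- sample(city_b, 10)
t.test(sample_a, sample_b)$statistic
```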

We got a t-statistic equal to 1.09. It shows some signal, which is strange because we know that H₀ is true and the t-value should be equal to zero. Why is that? Because we got unlucky with our samples. It would be interesting to know how the t-statistic would change if we repeated the sampling 70 thousand times. Let's do it.
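A sketch of that repetition, reusing the simulated city_a and city_b populations from above:

```r
# Draw two samples of 10 and compute the t-statistic, 70,000 times.
t_values <- replicate(70000, {
  t.test(sample(city_a, 10), sample(city_b, 10))$statistic
})
```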

Well, we've got a huge list of t-values. Let's plot them.
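For example:

```r
# Histogram of the simulated t-values with the theoretical t-distribution overlaid
# (about 18 degrees of freedom for two samples of ten with similar variances).
hist(t_values, breaks = 100, freq = FALSE,
     main = "t-statistics when H0 is true", xlab = "t")
curve(dt(x, df = 18), add = TRUE, lwd = 2)
```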

That's it. Now we have a distribution of the t-statistic that is very similar to Student's t-distribution. The t-distribution looks like the normal distribution but has heavier tails. It also looks different depending on the sample size, and with more observations it approximates the normal distribution. The t-distribution can be interpreted as follows. There is a high chance of getting a t-value equal to zero when taking samples. That makes sense: when the null hypothesis is true, the t-value should be near zero because there is no signal. But the further away a t-value is from zero, the less likely we are to get it. For instance, it is very unlikely to get t=6. But a question arises here: how likely or unlikely is it to get a certain t-value?

The probability of getting a t-value at least as extreme as the one actually observed, under the assumption that the null hypothesis is correct, is called the p-value. In the figure below, the probability of observing t >= 1.5 corresponds to the red area under the curve.

A very small p-value means that getting such a result would be very unlikely if the null hypothesis were true. The concept of the p-value helps us make decisions regarding H₀ and H₁. The t-statistic shows the proportion between the signal and the noise, the p-value tells us how often we could observe such a proportion if H₀ were true, and the level of significance acts as a decision boundary. By analogy with a court trial, a p-value of 0.01 is somewhat similar to the statement: "If this man is innocent, there is a 1% probability that he would behave like this (change testimony, hide evidence) or even more strangely." The jury can determine whether the evidence is sufficient by comparing the p-value with some standard of evidence (the level of significance). Thus, if α = 0.05 and the p-value is 0.01, the jury can deliver a "guilty" verdict.

Several notes need to be made. First, there is a common misinterpretation of the p-value, when people say that "the p-value is the probability that H₀ is true". The p-value doesn't tell us anything about the probability of H₀ or H₁; it only assumes that the null hypothesis is true. Consider the example where David took a sample of students in both classes who got only 5's. The t-statistic would obviously be 0 because there is no observed difference in the means. In this case, the p-value would be equal to 1, but does that mean the null hypothesis is true "for certain"? No, not at all! It rather means that David did the sampling incorrectly, choosing only the students who are "good" at math, or that he was extremely unfortunate to get a sample like this. Second, the t-distribution was not actually derived by bootstrapping (as I did for educational purposes). In the times of William Gosset, there were no computers, so the t-distribution was derived mathematically. I decided not to dive deep into the math; otherwise, it would be hard to claim that the t-test is "explained simply". Third, because the t-statistic has to follow the t-distribution, the t-test requires normality of the population. However, the population does not need to have a "perfect" normal distribution, otherwise the usage of the t-test would be too limited. There may be some skewness or other "imperfections" in the population distribution as long as these "imperfections" still allow us to make valid conclusions.

Finally, the critical region (the red area in figure 8) doesn't have to take only one side. If there is a possibility that the effect (the mean difference) can be positive or negative, it is better to use a two-tailed t-test. The two-tailed t-test can detect the effect from both directions. For David, it is appropriate to use a two-tailed t-test because there is a possibility that students from class A perform better in math (positive mean difference, positive t-value) as well as a possibility that students from class B have better grades (negative mean difference, negative t-value). The one-tailed t-test can be appropriate when the consequences of missing an effect in the untested direction are negligible, or when the effect can exist in only one direction.

David has calculated a p-value.

It equals 0.7805.
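As a sketch of the calculation, a two-tailed p-value is obtained from the t-statistic and the degrees of freedom. The 18 degrees of freedom below are an assumed value for illustration; t.test() reports the exact figure automatically:

```r
t_observed <- -0.2863
2 * pt(-abs(t_observed), df = 18)   # two-tailed p-value
# or simply: t.test(classA, classB)$p.value
```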

Because the p-value (0.7805) is below the chosen α = 0.8, David has to reject the null hypothesis.

That's it. The t-test is done. David can now say with some degree of confidence that the difference in the means didn't occur by chance. But David still has doubts about whether his results are valid. Perhaps the problem is connected with the level of significance: David allowed himself to falsely reject the null hypothesis with a probability of 80%. On the other hand, if the level of significance had been set lower, there would have been a higher chance of erroneously failing to reject the null hypothesis.

Well, that's the nature of statistics: we never know for certain. Maybe David could be more confident in the results if he collected more data. Who knows what the result of the t-test would be then?

Suppose we are a head teacher who has access to students' grades, including grades from class A and class B. We can figure out whether David was right or wrong. Here are the actual results:

Indeed, students from class A did better in math than those from class B. There is a difference between the means, but it is pretty small. Therefore, the alternative hypothesis is true. Let's calculate the true β (the true α we cannot calculate, because the null hypothesis is false and it is therefore impossible to falsely reject it). For our α = 0.8, we found that the true β = 0.184. Comparing this value to our estimate of β = 0.14, we can say that the bootstrapping approach worked pretty well. Nevertheless, we underestimated the probability of a Type II error.

What is the lesson to learn from this information?

Again, don't be too confident when you're doing statistics. You shouldn't rely on t-tests exclusively when other scientific methods are available; your logic and intuition matter. There is another thing to point out. David's goal was to find out whether students from class A get better quarter grades than those from class B. Suppose that David conducted a rigorous study and figured out the right answer. But do the results have practical significance? Probably not. What can he do with these results? Yes, students in class A got better quarter grades. But does that mean students in class A are better at math than students from class B? It is impossible to answer this question using data from only one quarter. Perhaps it would be useful to gather information from other periods and conduct a time-series analysis. But still, using only observational data it is extremely difficult, if not impossible, to establish a causal relationship. So here is another lesson: do not draw conclusions about causality from relationships observed with statistical methods such as the t-test or regression.

If you want to take a look at David’s dataset and R code, you can download all of that using this link . A full dataset of students’ grades is also available in the archive. All the datasets were created by me.

Finally, if you have questions, comments, or criticism, feel free to write in the comments section. We all learn from each other.

Thank you for reading!

  • Colquhoun, D. (2017). The reproducibility of research and the misinterpretation of p-values. Royal Society Open Science, 4, 171085. https://doi.org/10.1098/rsos.171085
  • Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. European Journal of Epidemiology, 31(4), 337–350. https://doi.org/10.1007/s10654-016-0149-3
  • Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124
  • Kim, J. H., & Choi, I. (2021). Choosing the level of significance: A decision-theoretic approach. Abacus, 57, 27–71. https://doi.org/10.1111/abac.12172

Written by Artem Dementyev, aspiring data scientist and student at HSE University in St. Petersburg, Russia.


Interpreting Z-Scores of Linear Regression Coefficients


I am quite new to hypothesis testing and I want to confirm:

A Z-score $|Z_k|>2$ is said to be significant at the $5\%$ level. Does this mean that $P(\beta_k≠0)=5\%$?

The ESL book stated that "A large (absolute) Z-Score $Z_k$ will lead to the rejection of the null hypothesis $\beta_k=0$ ". Why? Shouldn't a large Z-Score imply that $P(\beta_k≠0)$ is much higher given that it is the area to the left or right of the boundary formed by $Z_k$ ?

Doesn't the statement of the book in (2) contradict the claim in (1)?



The Z-score is a measure of how extreme the observed regression coefficient is under the hypothetical scenario that the true regression coefficient is equal to 0. A large Z score means that the observed regression coefficient is extreme, and therefore unlikely, in this hypothetical scenario. Getting such an extreme coefficient under this scenario makes one doubt the validity of that scenario. That is hypothesis testing, with this hypothetical scenario often called the "null hypothesis".

How do we decide what Z-score counts as too extreme? Under the hypothetical scenario that the true regression coefficient is equal to 0, statisticians have worked out how likely a given Z-score is (using the normal distribution curve). Z-scores greater than 2 (in absolute value) only occur about 5% of the time when the true regression coefficient is equal to 0. If we actually witness an event that occurs only 5% of the time in some hypothetical scenario, we say that result is incompatible with the assumptions of that scenario.
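A quick numerical check of these tail probabilities in R (a sketch added for illustration):

```r
2 * (1 - pnorm(2))      # about 0.0455: chance of |Z| > 2 when the true coefficient is 0
2 * (1 - pnorm(1.96))   # about 0.05: 1.96 is the exact two-sided 5% critical value
qnorm(0.995)            # about 2.58: the critical value for a two-sided 1% test
```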

For example, if you were wondering whether a coin was fair (i.e., 50-50 heads/tails) and you flipped it 6 times and got the same face each time, such an event would occur about 3% of the time if the coin were actually fair. Such an unusual result under the hypothetical scenario that the coin is fair makes us doubt that the coin is fair, so we reject the assumptions of that scenario and conclude the coin must not be fair.

So, observing a Z-score greater than 2 would be rare under the assumption that the true regression coefficient is equal to 0. Therefore, we reject this assumption (the null hypothesis) and claim that the true regression coefficient is different from 0.

With this in mind, let's answer your questions directly.

No. What this means is that a Z-score greater than 2 (in absolute value) occurs less than 5% of the time under the assumption that the true regression coefficient is equal to 0. This would be a very unusual event under that assumption, which makes us doubt the assumption. Observations that make us doubt our assumption are described as "significant". If you wanted a different standard for what counts as "too extreme", e.g., you only doubt the assumption if you observe an event that would occur just 1% of the time were the assumption true, you would judge your observed Z-scores against a different critical value (for 1%, that would be 2.58).

A large Z score leads to rejection of the null hypothesis because large Z scores are very unusual when the null hypothesis is true, making us doubt the null hypothesis. The area under the normal distribution curve that is more "extreme" than a Z score of 2 is quite small, representing less than 5% of the total area under the curve, indicating a very unusual result under the assumption that the true regression coefficient is equal to 0.

These statements do not contradict. They are describing the same methodology of hypothesis testing.




Title: Hypothesis Testing in High-Dimensional Regression Under the Gaussian Random Design Model: Asymptotic Theory

Abstract: We consider linear regression in the high-dimensional regime where the number of observations $n$ is smaller than the number of parameters $p$. A very successful approach in this setting uses $\ell_1$-penalized least squares (a.k.a. the Lasso) to search for a subset of $s_0 < n$ parameters that best explain the data, while setting the other parameters to zero. Considerable amount of work has been devoted to characterizing the estimation and model selection problems within this approach. In this paper we consider instead the fundamental, but far less understood, question of statistical significance. More precisely, we address the problem of computing p-values for single regression coefficients. On one hand, we develop a general upper bound on the minimax power of tests with a given significance level. On the other, we prove that this upper bound is (nearly) achievable through a practical procedure in the case of random design matrices with independent entries. Our approach is based on a debiasing of the Lasso estimator. The analysis builds on a rigorous characterization of the asymptotic distribution of the Lasso estimator and its debiased version. Our result holds for optimal sample size, i.e., when $n$ is at least on the order of $s_0 \log(p/s_0)$. We generalize our approach to random design matrices with i.i.d. Gaussian rows $x_i\sim N(0,\Sigma)$. In this case we prove that a similar distributional characterization (termed "standard distributional limit") holds for $n$ much larger than $s_0(\log p)^2$. Finally, we show that for optimal sample size, $n$ being at least of order $s_0 \log(p/s_0)$, the standard distributional limit for general Gaussian designs can be derived from the replica heuristics in statistical physics.
Comments: 63 pages, 10 figures, 11 tables, Section 5 and Theorem 4.5 are added. Other modifications to improve presentation
Subjects: Methodology (stat.ME); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)
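The core idea of the paper, debiasing the Lasso estimate so that individual coefficients admit p-values, can be illustrated with a small simulation. The sketch below is not the authors' procedure in full; it assumes a design with i.i.d. N(0,1) entries (so the decorrelating matrix can be taken as the identity), uses the glmnet package, and relies on a rough residual-based noise estimate:

```r
library(glmnet)
set.seed(1)
n <- 200; p <- 400; s0 <- 10
X <- matrix(rnorm(n * p), n, p)
beta <- c(rep(1.5, s0), rep(0, p - s0))        # s0 truly nonzero coefficients
y <- as.numeric(X %*% beta + rnorm(n))

fit <- glmnet(X, y, intercept = FALSE, standardize = FALSE)
lambda <- 2 * sqrt(log(p) / n)                 # a typical theoretical choice (assumption)
b_hat <- as.numeric(coef(fit, s = lambda))[-1] # drop the intercept entry

# Debiased estimator: b_u = b_hat + (1/n) * M %*% t(X) %*% residuals, with M = I here
resid <- y - as.numeric(X %*% b_hat)
b_u <- b_hat + as.numeric(t(X) %*% resid) / n

sigma_hat <- sqrt(sum(resid^2) / (n - sum(b_hat != 0)))  # rough noise estimate
z <- sqrt(n) * b_u / sigma_hat                 # approximately N(0,1) when beta_j = 0
p_values <- 2 * pnorm(-abs(z))
round(p_values[1:15], 4)                       # the first s0 = 10 entries should be tiny
```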



Handling Complex Regression Analysis and Hypothesis Testing Assignments

Mason Reynolds


The guide covers grasping the assignment, organizing and preparing data, conducting regression analysis, interpreting data and evaluating models, and practical tips for analysis.

Statistics assignments that involve regression analysis and hypothesis testing can be intricate and challenging. However, with the right approach and a solid understanding of the concepts, students can handle these tasks proficiently. This blog outlines a structured method for addressing assignments related to regression analysis, hypothesis testing, and data interpretation. By following these steps, students can not only gain clarity on their assignments but also develop their statistical skills and tackle similar problems with confidence and accuracy.

The first step in tackling any statistics assignment is to thoroughly read and comprehend the problem statement. It's crucial to identify the dependent and independent variables and understand the relationships you're expected to explore. For instance, if you are tasked with investigating the link between campaign expenditures on television advertisements and voter turnout, recognize which variable is dependent (voter turnout) and which is independent (campaign expenditures). This initial understanding will guide your entire analysis.


Once you've identified the variables, clarify the specific tasks you're required to complete. Whether you need to perform regression analysis, hypothesis testing, or interpret data, having a clear grasp of the problem will streamline your approach and ensure that you address each component effectively.

After understanding the problem, the next step is to collect and organize the data. This may involve compiling data from tables, graphs, or datasets provided in the assignment. For example, if you have data on sales revenue and advertising expenditure, organize this information in a structured format that will facilitate analysis.

Ensure that the data is clean and accurately reflects the variables you are studying. Proper organization of data not only simplifies the analysis process but also helps in visualizing the information effectively. Use tables or spreadsheets to sort the data, making it easier to perform subsequent calculations and interpretations.

Regression analysis is a fundamental aspect of many statistics assignments. Here's a breakdown of how to approach this task (a short R sketch follows the list):

  • Calculate Regression Parameters: Begin by calculating the regression parameters, such as the slope and intercept of the regression line. These parameters will help you understand the relationship between the dependent and independent variables. For example, if analyzing the effect of campaign expenditures on voter turnout, determine how changes in expenditures are likely to affect voter turnout.
  • Create Scatter Plots: Visualize the relationship between the variables by creating a scatter plot. Place the dependent variable on the Y-axis and the independent variable on the X-axis. Overlay the regression line to assess how well it fits the data. A well-fitting line indicates a strong relationship between the variables.
  • Perform Hypothesis Testing: Test the null hypothesis to determine if there is a significant relationship between the variables. For instance, you might test whether the slope of the regression line is zero, which would indicate no effect. Conduct appropriate statistical tests and interpret the results to understand the significance of the relationship.
  • Calculate Confidence Intervals: Determine confidence intervals for the regression parameters to assess their precision. For example, a 95% confidence interval for the slope provides a range within which the true slope is likely to fall. This interval helps gauge the reliability of your estimates.
  • Sketch Confidence Bands: If required, draw confidence bands on your scatter plot. These bands represent the range within which future data points are expected to fall. Use these bands to estimate values and interpret their implications in the context of your problem.
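As an illustration only, the steps above can be carried out in R roughly as follows; the campaign-expenditure and voter-turnout numbers are made up for the example:

```r
# Hypothetical data: campaign expenditure (independent) and voter turnout (dependent)
set.seed(3)
expenditure <- runif(30, 10, 100)
turnout     <- 35 + 0.2 * expenditure + rnorm(30, 0, 4)

fit <- lm(turnout ~ expenditure)
summary(fit)                  # slope, intercept, and the t-test of H0: slope = 0
confint(fit, level = 0.95)    # 95% confidence intervals for the parameters

plot(expenditure, turnout)    # scatter plot with the fitted regression line
abline(fit)

# Confidence band for the mean response over a grid of expenditure values
grid <- data.frame(expenditure = seq(10, 100, length.out = 50))
band <- predict(fit, newdata = grid, interval = "confidence")
lines(grid$expenditure, band[, "lwr"], lty = 2)
lines(grid$expenditure, band[, "upr"], lty = 2)
```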

For assignments involving more complex models or multiple variables, consider the following steps (a brief sketch follows the list):

  • Construct ANOVA Tables: When dealing with multiple independent variables, create ANOVA tables to evaluate the significance of the overall regression model. ANOVA helps partition the total variation into components explained by the model and those due to random error.
  • Calculate R-squared Values: Determine the R-squared value to understand how well the model explains the variability in the dependent variable. This statistic provides insight into the proportion of the total variation accounted for by the model.
  • Compare Models: When comparing different models, use statistical tests to assess which model fits the data best. This may involve comparing nested models or evaluating the inclusion of additional variables. Choose the model that most accurately represents the relationship between the variables.
  • Ensure Accuracy: Double-check all calculations and ensure that data is accurately entered and processed. Mistakes in data handling can lead to incorrect conclusions.
  • Use Statistical Software: Leverage statistical software to perform complex calculations and generate plots. Software tools can simplify the process and reduce the likelihood of errors.
  • Interpret Results in Context: Always interpret your results within the context of the problem. Understand what the statistical outputs mean in real-world terms and how they address the research question.
  • Review and Revise: After completing your analysis, review your work to ensure that all parts of the assignment are addressed. Revise any sections as needed to improve clarity and accuracy.
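A brief sketch of these checks, extending the made-up example above with a second illustrative predictor:

```r
# Add a hypothetical second predictor and compare nested models
ad_spend <- runif(30, 5, 50)
turnout2 <- turnout + 0.1 * ad_spend + rnorm(30, 0, 2)

model_1 <- lm(turnout2 ~ expenditure)
model_2 <- lm(turnout2 ~ expenditure + ad_spend)

anova(model_2)                 # ANOVA table partitioning the variation
summary(model_2)$r.squared     # proportion of variation explained by the model
anova(model_1, model_2)        # F-test comparing the nested models
```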

Tackling regression analysis and hypothesis testing assignments requires a systematic approach that includes understanding the problem, organizing data, performing regression analysis, and interpreting results. By following these steps, students can effectively handle similar assignments and develop a strong grasp of statistical methods.

Practicing these techniques will not only help you in completing your assignments but also enhance your overall statistical skills. With a clear approach and attention to detail, you'll be well-equipped to tackle any statistics problem that comes your way.


Fanrong Zhao and Baoxue Zhang (2024). A U-Statistic for Testing the Lack of Dependence in Functional Partially Linear Regression Model. Mathematics, published 21 August 2024. https://doi.org/10.3390/math12162588


A comprehensive comparison of goodness-of-fit tests for logistic regression models


Huiling Liu, Xinmin Li, Feifei Chen, Wolfgang Härdle & Hua Liang

We introduce a projection-based test for assessing logistic regression models using the empirical residual marked empirical process and suggest a model-based bootstrap procedure to calculate critical values. We comprehensively compare this test and Stute and Zhu’s test with several commonly used goodness-of-fit (GoF) tests: the Hosmer–Lemeshow test, modified Hosmer–Lemeshow test, Osius–Rojek test, and Stukel test for logistic regression models in terms of type I error control and power performance in small (n = 50), moderate (n = 100), and large (n = 500) sample sizes. We assess the power performance for two commonly encountered situations: nonlinear and interaction departures from the null hypothesis. All tests except the modified Hosmer–Lemeshow test and Osius–Rojek test have the correct size in all sample sizes. The power performance of the projection-based test consistently outperforms its competitors. We apply these tests to analyze an AIDS dataset and a cancer dataset. For the former, all tests except the projection-based test do not reject a simple linear function in the logit, which has been illustrated to be deficient in the literature. For the latter dataset, the Hosmer–Lemeshow test, modified Hosmer–Lemeshow test, and Osius–Rojek test fail to detect the quadratic form in the logit, which was detected by the Stukel test, Stute and Zhu’s test, and the projection-based test.
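As a minimal illustration of one of the tests compared above (not code from the paper), the Hosmer–Lemeshow test can be run in R with, for example, the ResourceSelection package; the paper's reference list points to other implementations such as LogisticDx, rms, and MKmisc:

```r
library(ResourceSelection)
set.seed(11)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 0.8 * x))   # data generated from a correct logistic model

fit <- glm(y ~ x, family = binomial)
hoslem.test(y, fitted(fit), g = 10)         # a large p-value is expected: no lack of fit
```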


Data availability

No datasets were generated or analysed during the current study.

Chen, K., Hu, I., Ying, Z.: Strong consistency of maximum quasi-likelihood estimators in generalized linear models with fixed and adaptive designs. Ann. Stat. 27 (4), 1155–1163 (1999)


Dardis, C.: LogisticDx: diagnostic tests and plots for logistic regression models. R package version 0.3 (2022)

Dikta, G., Kvesic, M., Schmidt, C.: Bootstrap approximations in model checks for binary data. J. Am. Stat. Assoc. 101 , 521–530 (2006)

Ekanem, I.A., Parkin, D.M.: Five year cancer incidence in Calabar, Nigeria (2009–2013). Cancer Epidemiol. 42 , 167–172 (2016)


Escanciano, J.C.: A consistent diagnostic test for regression models using projections. Economet. Theor. 22 , 1030–1051 (2006)

Härdle, W., Mammen, E., Müller, M.: Testing parametric versus semiparametric modeling in generalized linear models. J. Am. Stat. Assoc. 93 , 1461–1474 (1998)


Harrell, F.E.: rms: Regression modeling strategies. R package version 6.3-0 (2022)

Hosmer, D.W., Hjort, N.L.: Goodness-of-fit processes for logistic regression: simulation results. Stat. Med. 21 (18), 2723–2738 (2002)

Hosmer, D.W., Lemesbow, S.: Goodness of fit tests for the multiple logistic regression model. Commun Stat Theory Methods 9 , 1043–1069 (1980)

Hosmer, D.W., Hosmer, T., Le Cessie, S., Lemeshow, S.: A comparison of goodness-of-fit tests for the logistic regression model. Stat. Med. 16 (9), 965–980 (1997)

Hosmer, D., Lemeshow, S., Sturdivant, R.: Applied Logistic Regression. Wiley Series in Probability and Statistics, Wiley, New York (2013)


Jones, L.K.: On a conjecture of Huber concerning the convergence of projection pursuit regression. Ann. Stat. 15 , 880–882 (1987)

Kohl, M.: MKmisc: miscellaneous functions from M. Kohl. R package version 1.8 (2021)

Kosorok, M.R.: Introduction to Empirical Processes and Semiparametric Inference, vol. 61. Springer, New York (2008)

Lee, S.-M., Tran, P.-L., Li, C.-S.: Goodness-of-fit tests for a logistic regression model with missing covariates. Stat. Methods Med. Res. 31 , 1031–1050 (2022)

Lindsey, J.K.: Applying Generalized Linear Models. Springer, Berlin (2000)

McCullagh, P., Nelder, J.A.: Generalized Linear Models, vol. 37. Chapman and Hall (1989)

Nelder, J.A., Wedderburn, R.W.M.: Generalized linear models. J. R. Stat. Soc. Ser. A 135 , 370–384 (1972)

Oguntunde, P.E., Adejumo, A.O., Okagbue, H.I.: Breast cancer patients in Nigeria: data exploration approach. Data Brief 15 , 47 (2017)

Osius, G., Rojek, D.: Normal goodness-of-fit tests for multinomial models with large degrees of freedom. J. Am. Stat. Assoc. 87 (420), 1145–1152 (1992)

Rady, E.-H.A., Abonazel, M.R., Metawe’e, M.H.: A comparison study of goodness of fit tests of logistic regression in R: simulation and application to breast cancer data. Appl. Math. Sci. 7 , 50–59 (2021)


Stukel, T.A.: Generalized logistic models. J. Am. Stat. Assoc. 83 (402), 426–431 (1988)

Stute, W., Zhu, L.-X.: Model checks for generalized linear models. Scand. J. Stat. Theory Appl. 29 , 535–545 (2002)

van der Vaart, A.W., Wellner, J.A.: Weak Convergence and Empirical Processes. Springer (1996)

van Heel, M., Dikta, G., Braekers, R.: Bootstrap based goodness-of-fit tests for binary multivariate regression models. J. Korean Stat. Soc. 51 (1), 308–335 (2022)

Yin, C., Zhao, L., Wei, C.: Asymptotic normality and strong consistency of maximum quasi-likelihood estimates in generalized linear models. Sci. China Ser. A Math. 49 , 145–157 (2006)



Liu, H., Li, X., Chen, F. et al. A comprehensive comparison of goodness-of-fit tests for logistic regression models. Stat Comput 34 , 175 (2024). https://doi.org/10.1007/s11222-024-10487-5


Competence over confidence: uncovering lower self-efficacy for women residents during central venous catheterization training

Haroula Tzamaras, Elizabeth Sinz, Michael Yang, Phillip Ng, Jason Moore & Scarlett Miller

BMC Medical Education, volume 24, article number 923 (2024)


While women make up over 50% of students enrolled in medical school, disparities in self-efficacy of medical skills between men and women have been observed throughout medical education. This difference is significant because low self-efficacy can impact learning, achievement, and performance, and thus create gender-confidence gaps. Simulation-based training (SBT) employs assessments of self-efficacy; however, the Dunning-Kruger effect in self-assessment posits that trainees often struggle to recognize their skill level. Additionally, the impact of gender on self-efficacy during SBT has not been widely studied. The objective of this study was to determine whether the gender-confidence gap and the Dunning-Kruger effect exist in SBT for central venous catheterization (CVC) on the dynamic haptic robotic trainer (DHRT), using comparisons of self-efficacy and performance.

173 surgical residents (61 women, 112 men) underwent training on the DHRT system over two years. Before and after using the DHRT, residents completed a 14-item Central Line Self-Efficacy survey (CLSE). During training on the DHRT, the CVC performance metrics of number of insertion attempts, backwall puncture, and successful venipuncture were also collected. The pre- and post-training CLSE scores, DHRT performance, and their relationship were compared between men and women.

General estimating equation results indicated that women residents were significantly more likely to report lower self-efficacy for 9 of the 14 CLSE items (p < .0035). Mann-Whitney U and Fisher's exact tests showed there were no performance differences between men and women for successfully accessing the vein on the DHRT. Regression models relating performance and self-efficacy found no correlation for either gender.

Conclusions

These results indicate that despite receiving the same SBT and performing at the same level, the gender-confidence gap exists in CVC SBT, and the Dunning-Kruger effect may also be evident.


Despite the steady increase in the percentage of women enrolled in medical school in the United States, from 27.9% in 2000 [ 1 ] to 54.6% in 2023 [ 2 ], researchers have identified a gender-confidence gap in medical training [ 3 , 4 ]. This gender-confidence gap manifests as disparities in self-confidence where women underestimate and undervalue themselves compared to men [ 5 ]. This is worrisome because self-efficacy, or task- or goal-specific confidence [ 6 , 7 ], has been shown to be vital in challenging environments like medicine, due to its relationship with an individual's motivation to engage in tasks and to persevere when faced with training challenges [ 8 ]. Self-efficacy and self-confidence are often used interchangeably, with self-efficacy measures being used to understand confidence for specific tasks and procedures.

It is important to note that higher self-efficacy does not always correlate with improved accuracy and performance [ 9 ]. For example, research has shown that people with lower skill levels often overestimate their abilities, while people with higher skill levels underestimate theirs, a phenomenon referred to as the Dunning-Kruger effect [ 10 ]. This effect has been found throughout medical training, from medical students [ 11 ], to medical residency [ 12 ], to attending physicians [ 13 ], where underperformers often rate themselves higher than their actual skill levels while high performers often rate themselves lower. Understanding the Dunning-Kruger effect is pertinent due to the clinical consequences of this phenomenon. For example, medical residents with less knowledge may appear overconfident and knowledgeable, but this may not truly reflect their actual competency for patient interaction [ 13 ], which can impact physician judgement and lead to misdiagnoses [ 14 ]. In addition, this can lead to mismatches between new physicians and their superiors in the understanding of new physicians' skill levels [ 15 ].

Recent research has indicated that there may be gender effects in the Dunning-Kruger Effect [ 16 ]. For example, women medical students often report lower confidence than men in bedside procedures [ 17 ], and in self-rated performance on surgical clerkship [ 18 ] regardless of actual performance. Similarly, women in surgical residency often rate their knowledge on patient care [ 19 ] and general competency [ 20 ] lower than men despite no gender-based performance differences [ 21 ]. While it has been shown that women have disproportionately lower confidence than men, it is less known how different types of training may impact this.

One area that incorporates self-assessments of confidence is simulation-based training (SBT). SBT uses imitations of real procedures and environments [ 22 , 23 ], or simulators, which can be full- or part-body manikins, part-task trainers, or other forms, to allow physicians to practice before working with live patients [ 24 , 25 ]. SBT has been widely integrated throughout medical education due to its low-risk, hands-on practice [ 26 , 27 ] and often includes approaches to mastery-based learning including checklists and other forms of assessment [ 28 , 29 ]. Measures of confidence, including self-efficacy, are sometimes used in SBT to provide an indication of trainee learning [ 8 , 30 ], though self-assessments in medical education are often reinforced with other skills assessments [ 31 ]. Self-efficacy is an important construct in SBT due to its relationship with learning [ 6 , 30 ] and achievement [ 32 , 33 ]. For example, in medical training, participation in SBT has been shown to significantly increase trainee self-efficacy for acute skills [ 34 ], emergency room preparedness [ 35 ], intercostal drain insertion [ 36 ], and central venous catheterization (CVC) [ 37 ]. However, the gender-confidence gap has been shown to manifest in SBT [ 38 ], with one study showing that women had lower self-efficacy for obstetric emergencies after SBT despite the fact that there were no gendered performance differences [ 39 ]. Few studies exist that explore the relationship between gender, confidence, and simulation training. Exploring these effects is important due to the relationship between confidence and competence [ 40 ] in medicine and how they often change together.

One procedure that is useful for exploring the impact of the gender-confidence gap in SBT in medical residency is central venous catheterization (CVC). CVC is a complex, high-volume [ 41 ] medical procedure with up to a 15% error rate [ 42 ], with complications ranging from infectious to mechanical. CVC is a bi-manual task that requires a physician to use both the dominant and nondominant hands to manage an ultrasound probe and needle and insert a catheter into a central vein for critical medication delivery to the bloodstream [ 41 ]. A physician who has performed fewer than 50 catheterizations is twice as likely to cause complications [ 41 ], indicating a need to better understand CVC training.

CVC is typically taught using SBT [ 43 ] including one of the newest SBT systems, the Dynamic Haptic Robotic Trainer (DHRT) [ 44 ], see Fig.  1 . The DHRT uses haptic robotic simulation and mock ultrasound to train residents on CVC needle insertion for multiple patient anatomies, because in live patients the location and depth of the internal jugular vein can vary [ 45 ]. The DHRT is as effective as manikin training for CVC in skill and self-efficacy gains [ 46 , 47 ], and is more beneficial than manikin training due to its objective scoring and real-time feedback.

Figure 1. The DHRT trainer used in residency training

In light of this previous work, the objective of this study was to compare self-efficacy and performance between men and women residents to answer the following research questions (RQs):

RQ1: Is there a gender-confidence gap in CVC SBT pre- or post-training?

RQ2: Are there gender-based performance differences in CVC SBT for technical skills at the end of training?

RQ3: Does the Dunning-Kruger effect exist in CVC SBT post-training?

For RQ1, we hypothesized that women residents would have lower CVC self-efficacy than men residents both pre- and post-training, based on previous work that indicated that women in graduate and post-graduate medical training rate themselves lower than men in perceived clinical skills [ 20 ], performance [ 18 ], and confidence [ 39 ]. For RQ2, we hypothesized that there would be no significant differences in CVC SBT performance between genders, based on prior work that found that men and women do not differ in performance for clinical knowledge or technical skills at the residency level [ 20 , 39 ]. For RQ3, we hypothesized that there would be no significant relationship between these variables for either gender, thus supporting the existence of the Dunning-Kruger effect in CVC SBT, based on prior literature that found that medical residents' ability to accurately self-assess skills was weak [ 12 ] and that medical trainees are often unaware of their actual skill level [ 10 ].

At the start of the study, and prior to training, residents consented to participate in this research by providing informed consent through an online platform. Next, participants completed an online central line training that consisted of a demographic survey, a pre-online-training knowledge assessment, eight interactive video modules that focused on CVC content and the steps of the procedure, and a post-online-training knowledge assessment; see [ 48 ] and the appendix for a detailed description of the online training protocol.

After completing the online training, residents attended an in-person training session. At the start of the session, residents completed a pre-training central line self-efficacy (CLSE) survey. Next, each resident completed a set of trials on the DHRT. In the 2021 cohort, residents at both medical centers performed a total of six trials on the DHRT regardless of performance. In 2022, the system was updated so that the number of trials each resident completed was based on previous performance. To complete training, residents in 2022 had to complete two successful venipunctures on the DHRT after a mandatory training trial, defined as vessel access with minimal insertion attempts and no serious error (e.g., arterial puncture). Thus, under the 2022 procedure, each resident performed a minimum of 3 and a maximum of 6 insertions. Additionally, surgical residents (June_M1) at medical center 1 in 2021, and all residents at medical center 1 in 2022, received additional hands-on procedural practice covering the steps of CVC in greater depth than provided on the DHRT system. For this extra procedural training, the DHRT was extended so residents had a full CVC kit and interactive feedback on the steps of the procedure beyond the needle insertion covered on the DHRT. Finally, residents completed a post-training CLSE survey. All residents received the same training regardless of participation in the study. Non-study residents still filled out the CLSE survey to preserve the flow of the training and so that they could visualize their own changes in self-efficacy; however, their results were destroyed after training and not used for analysis. See Fig. 2 for the complete procedural flow.

Figure 2. Procedural flow for medical residents in CVC training

In order to answer our research questions, the following metrics were computed.

Performance metrics

The DHRT measures performance on each trial based on previous research [ 44 , 49 ]. For the current study, the performance on the last trial was used as this was the Verification of Proficiency test. The performance variables of interest for the current study were number of insertion attempts, backwall puncture, and successful venipuncture without arterial puncture. These metrics are defined below.

Insertion attempts

Insertion attempts were computed by the system as the number of insertions it took to achieve access to the vein. For example, if the trainee pierced the needle into the DHRT, then removed the needle fully and re-inserted it to readjust, two insertion attempts were counted. Limiting insertion attempts is important to reduce the likelihood of infectious complications associated with multiple needle sticks [ 50 ].

Backwall puncture

A backwall puncture was computed every time a resident inserted the needle into the vein but also punctured the back side of the vessel. Avoiding backwall puncture is necessary to limit the risk of accidental arterial puncture and decrease the risk of treatment complexity caused by mechanical complications [ 41 ].

Successful venipuncture

A successful trial was computed when a resident accessed the vein without puncturing the carotid artery or through the backwall of the vein. Puncturing the carotid artery can lead to serious complications like stroke and death [ 41 ] and potentially the insertion of the catheter into the wrong vessel and as such needs to be avoided [ 51 ].

Central-line self-efficacy (CLSE)

A five-point, 14-item Likert-scale CVC self-efficacy survey, referred to as the Central-Line Self-Efficacy (CLSE) survey and developed in prior work [ 39 ], was used to assess resident confidence in the procedure. On the CLSE, residents rated their belief in their own ability, where one represented not at all confident and five represented extremely confident. The first ten items on the CLSE survey focused on specific steps of the procedure, such as “modifying the needle trajectory”, while the last four questions related to broader aspects of the procedure, such as “conducting the entire procedure on a simulator”. The full CLSE survey can be found here; note that in later trainings the CLSE was updated to 19 items, so only the first 14 items are relevant to the current study.
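To make the scale handling concrete, below is a minimal Python sketch of how internal reliability and an aggregate score could be computed for a 14-item, five-point instrument like the CLSE. The data frame, column names, and random responses are purely illustrative assumptions, not the study’s data.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of Likert items (rows = respondents, columns = items)."""
    items = items.dropna()
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical wide-format responses: one row per resident, 14 items rated 1-5.
rng = np.random.default_rng(0)
clse = pd.DataFrame(rng.integers(1, 6, size=(173, 14)),
                    columns=[f"item_{i}" for i in range(1, 15)])

alpha = cronbach_alpha(clse)            # internal reliability of the 14 items
clse["clse_mean"] = clse.mean(axis=1)   # aggregate score, used once alpha justifies averaging
print(f"Cronbach's alpha = {alpha:.3f}")
```

With random responses the alpha will be near zero; on the study’s data the authors report 0.952, which is what justified averaging the items into a single score.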

Statistical analysis

To assess gender-confidence gaps in CVC SBT (RQ1), a generalized estimating equation (GEE) was computed with gender, self-efficacy type (pre- or post-training), and their interaction as the independent variables and the 14 CLSE questions as the dependent variables. To account for any potential effect of the additional procedural training in 2022 on self-efficacy, training year was also included as a variable. A GEE was used to extend the standard generalized linear regression model and account for the repeated measures of the pre- and post-test. All assumptions were met for the GEE. To assess gender-based performance gaps (RQ2), a Mann-Whitney U test was conducted for the continuous variable, insertion attempts. Fisher’s exact test was conducted for the dichotomous performance variables, backwall puncture and successful venipuncture. All assumptions were met for both of these analyses. Finally, to assess the Dunning-Kruger effect (RQ3), regression analyses were conducted to determine whether there was a correlation between self-efficacy and performance. Prior to this, the internal reliability of the CLSE was verified (Cronbach’s alpha = 0.952), justifying the aggregation of the 14 items on the CLSE into one average score. For each regression analysis, the performance metric was the response variable and post-training self-efficacy, gender, and their interaction were the predictor variables. Linear regression was conducted for the continuous variable, insertion attempts, and binary logistic regression was run for the two dichotomous variables, backwall puncture and successful venipuncture. The analysis was conducted with the entire dataset to determine the significance of the interaction term, and then the dataset was split and a follow-up analysis was run within each gender to determine whether one gender had a stronger significant relationship than the other. Assumptions were checked; outliers were found for all three variables, determined to be true outliers not due to measurement error, and retained in the analysis. All other assumptions were met for all regression models. A post hoc power calculation based on the effect size from the chi-square statistic used in the GEE found that statistical power ranged from 0.694 to 0.975.
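As a rough illustration of how such a repeated-measures model could be specified in Python with statsmodels, the sketch below fits a GEE with gender, self-efficacy type (pre/post), their interaction, and training year as predictors of a single CLSE item. The file name, column names, Gaussian family, and exchangeable working correlation are illustrative assumptions rather than details reported by the authors.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per resident x time point (pre/post) for one CLSE item.
# Assumed columns: resident_id, gender ("man"/"woman"), phase ("pre"/"post"), year, rating (1-5).
df = pd.read_csv("clse_long.csv")  # placeholder path

# A GEE extends the generalized linear model to repeated measures by specifying a working
# correlation structure over observations that share the same resident_id.
model = smf.gee(
    "rating ~ C(gender) * C(phase) + C(year)",
    groups="resident_id",
    data=df,
    cov_struct=sm.cov_struct.Exchangeable(),
    family=sm.families.Gaussian(),
)
result = model.fit()
print(result.summary())
print(result.wald_test_terms())  # term-level Wald tests, analogous to the Wald chi-square values reported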

Participants

One hundred and seventy-three residents (61 women, 112 men) from two residency cohorts (72 in 2021 and 101 in 2022) and two medical centers (103 at medical center 1, 70 at medical center 2) were recruited from the new-resident bootcamp over a span of two summers, with trainings running from June through September; see Table 1 for the participant breakdown. The sample size used in the study was a convenience sample dictated by the number of residents in training and willing to participate each year. While the bootcamp was mandatory for all residents, participation in this research was voluntary and only residents who provided informed consent were included in this study. There were no major demographic differences between the two training years.

RQ1: Is there a gender-confidence gap in CVC SBT pre- or post- training?

A Bonferroni adjustment was applied to account for repeated measures on the CLSE survey [ 52 ], resulting in a family-wise error rate (α) of 0.0035. GEE results indicated that gender was a significant predictor, with women ranking lower for 9 of the 14 variables (see Fig. 3 for mean values), including using tactile feedback during placement (Wald χ² = 18.814, p < .001), using tactile feedback to identify the vessel (Wald χ² = 20.045, p < .001), advancing the introducer needle (Wald χ² = 11.053, p < .001), modifying the needle trajectory (Wald χ² = 12.492, p < .001), identifying the needle in location (Wald χ² = 8.733, p = .003), using tactile feedback to guide the needle (Wald χ² = 14.216, p < .001), placing the needle in one attempt (Wald χ² = 17.888, p < .001), placing the needle in multiple attempts (Wald χ² = 9.314, p = .002), and conducting the entire procedure without mistakes (Wald χ² = 9.975, p = .002), aligning with our hypothesis. Parameter estimates for the nine CLSE items where gender was a significant predictor indicated that a resident who identified as a woman was more likely to rate themselves lower than their men counterparts, see Table 1. Additionally, the interaction between self-efficacy type (pre or post) and gender was significant for conducting the entire procedure without mistakes (Wald χ² = 12.350, p < .001), meaning that the impact of gender on this variable varied based on the test condition, though gender was not a significant predictor for this variable (p = .004). Positive parameter values [0.987 (0.2822), p < .001] for women indicate that identifying as a woman impacted pre-CLSE more than post-CLSE for this variable, though both were lower than for men. Training year did not have a significant impact on any of the 14 variables. See Table 1 for the full significant results. For all 14 CLSE questions, there were significant increases from pre- to post-test for both genders (p < .001), aligning with our hypothesis. These results indicate that the gender-confidence gap is evident in CVC training both before and after exposure to SBT.
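The adjusted thresholds used here and in the performance analyses below follow directly from dividing a nominal α, assumed here to be 0.05, by the number of comparisons:

```python
# Bonferroni correction: divide the nominal alpha by the number of comparisons.
alpha = 0.05
print(round(alpha / 14, 4))  # 14 CLSE items -> 0.0036 (reported as 0.0035 above)
print(round(alpha / 3, 3))   # 3 performance variables -> 0.017 (used below)
```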

Figure 3. Self-efficacy means of men and women post-training for significant values

A Bonferroni adjustment was applied to account for repeat testing of the three performance variables [ 52 ], resulting in a family-wise error rate (α) of 0.017. For backwall puncture, 95.5% of men and 95.1% of women avoided backwall puncture, with Fisher’s exact test finding no statistically significant difference (p = 1.00). For successful venipuncture, 84.8% of men and 90.1% of women successfully accessed the vein, with Fisher’s exact test finding no significant difference (p = .360). Finally, for the number of insertion attempts, a Mann-Whitney U test found no significant difference (U = 271.94, z = 0.507, p = .612) between men (M = 1.78, SD = 1.324) and women (M = 1.75, SD = 1.633). These results support our hypothesis that no gender differences in performance exist in CVC SBT post-training.
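Comparisons of this kind can be run with SciPy; the sketch below uses a hypothetical 2×2 table and placeholder insertion-attempt values. The counts are only approximate back-calculations from the percentages quoted above, not the study’s raw data.

```python
import numpy as np
from scipy import stats

# Approximate 2x2 table for backwall puncture (rows: men, women; columns: avoided, punctured),
# back-calculated from 95.5% of 112 men and 95.1% of 61 women avoiding a puncture.
table = np.array([[107, 5],
                  [58, 3]])
_, p_fisher = stats.fisher_exact(table)

# Mann-Whitney U test for the continuous metric, insertion attempts, by gender (placeholder values).
attempts_men = np.array([1, 2, 1, 1, 3, 1, 2, 1])
attempts_women = np.array([1, 1, 2, 1, 1, 4, 1])
u_stat, p_mwu = stats.mannwhitneyu(attempts_men, attempts_women, alternative="two-sided")

print(f"Fisher exact p = {p_fisher:.3f}; Mann-Whitney U = {u_stat:.1f}, p = {p_mwu:.3f}")
```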

A Bonferroni adjustment was applied to account for repeat testing of the three performance variables [ 52 ], resulting in a family-wise error rate (α) of 0.017. For insertion attempts, the linear regression model was unable to significantly predict performance for the whole population based on gender, self-efficacy, and their interaction, F(3, 169) = 2.719, p = .046. When divided by gender, the linear regression models for insertion attempts were unable to significantly predict performance based on self-efficacy for men, F(1, 109) = 4.214, p = .042, or for women, F(1, 53) = 3.308, p = .074. For backwall puncture, the binary logistic regression model was not significant for the whole population, χ²(3) = 0.720, p = .869. When divided by gender, the binary logistic regression models for backwall puncture were not significant for men, χ²(1) = 0.570, p = .450, or for women, χ²(1) = 0.604, p = .437. For successful insertion, the binary logistic regression model was not significant for the whole population, χ²(3) = 4.306, p = .230. When divided by gender, the binary logistic regression models for successful insertion were not significant for men, χ²(1) = 1.349, p = .245, or for women, χ²(1) = 2.473, p = .116. The finding that no models were able to significantly predict performance from the aggregated post-training CLSE supports our hypothesis that neither men nor women would be able to accurately assess their performance based on confidence. These results indicate that the Dunning-Kruger effect may exist for both genders in CVC SBT.
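A sketch of how such models could be specified with statsmodels formulas is shown below; the file name and column names (clse_post, insertion_attempts, backwall_puncture, successful_venipuncture) are assumptions for illustration, not the authors’ variable names.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical per-resident data: aggregated post-training CLSE score, gender, and the three
# Verification of Proficiency outcomes (attempts as a count, the other two coded 0/1).
df = pd.read_csv("vop_outcomes.csv")  # placeholder path

# Continuous outcome: ordinary least squares with the self-efficacy x gender interaction.
ols_fit = smf.ols("insertion_attempts ~ clse_post * C(gender)", data=df).fit()
print(ols_fit.summary())

# Dichotomous outcomes: binary logistic regression with the same predictors.
logit_backwall = smf.logit("backwall_puncture ~ clse_post * C(gender)", data=df).fit()
logit_success = smf.logit("successful_venipuncture ~ clse_post * C(gender)", data=df).fit()

# Follow-up within-gender models, mirroring the split analysis described above.
for gender, subset in df.groupby("gender"):
    fit_g = smf.ols("insertion_attempts ~ clse_post", data=subset).fit()
    print(gender, f"F = {fit_g.fvalue:.3f}, p = {fit_g.f_pvalue:.3f}")
```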

The objective of this study was to compare self-efficacy and DHRT performance between men and women residents to assess for the gender confidence gap and the Dunning-Kruger effect. The main findings were that the gender confidence gap was evident for nine of the variables on the CLSE, there were no significant differences in CVC SBT performance between men and women, and initial evidence of the Dunning-Kruger effect was found.

These results on self-efficacy support previous literature in obstetrics [ 39 ] and general and plastic surgery [ 20 ] that found that women had lower self-efficacy than men in training despite there being no performance differences [ 16 , 17 ]. Of the variables for which women had lower self-efficacy, three were related to using tactile feedback, five were related to using the needle, and one was related to overall procedural confidence. Importantly, despite having lower self-efficacy for items related to the use of the needle, there were no actual differences in the ability to achieve successful venipuncture, avoid backwall puncture, or reduce insertion attempts on the DHRT, making these lower self-efficacy ratings unfounded and aligning with previous studies [ 20 , 39 , 54 ].

Previous literature has indicated the existence of the Dunning-Kruger effect in medical training for decades [ 10 ], positing that new medical trainees are unable to accurately assess their performance [ 11 ] regardless of gender [ 14 , 34 ]. Our results align with previous literature in this area, finding that neither men nor women were able to accurately assess their performance. Regardless of these inaccuracies, women still rated themselves lower than men on the majority of items on the post-training CLSE, suggesting a potential difference in self-rating between genders. To fully explain this finding, a follow-up study should be conducted with a larger, more balanced sample. Both the Dunning-Kruger effect and the gender-confidence gap are important in medical education because they can impact a physician’s performance in clinical practice [ 14 , 20 ]. As such, this study adds to the literature indicating that these phenomena may be occurring early in training.

This study focused specifically on self-efficacy and performance in CVC SBT based on training with the DHRT, but the results are reflective of a greater problem with the gender-confidence gap in residency training. Understanding the gender-confidence gap in medicine, and why women underestimate their abilities more than men do, is pertinent because of the impact that low self-efficacy can have on learning, achievement, persistence [ 55 ], well-being, and burnout [ 56 ]. Leaving confidence discrepancies unaddressed could lead to increased challenges as women progress in their fields [ 56 , 57 ], and to more women leaving the field [ 58 ].

Some organizations, including the American Medical Association (AMA) [ 60 ], have started to highlight resources or create task forces that physicians can utilize to help address gender disparities in medicine [ 59 ]. More studies should also be conducted exploring the gender-confidence gap in simulation training for other procedures to better understand how it impacts medical education and mastery-based learning. The field of medical education would benefit greatly from lessening the gender-confidence gap for trainees, given the relationships between confidence and competence in the medical profession [ 40 ].

There are several limitations of this study that must be addressed. One limitation is that we did not evaluate interactions between gender and race/ethnicity due to the limited sample size by race/ethnicity. Another limitation is that the dataset lacked adequate representation of genders other than men and women, so we were only able to study gender as binary. Future work should explore larger sample sizes with more demographic representation to analyze self-efficacy on a larger scale. In addition, this study contained data from only two U.S. medical centers that integrated the DHRT training; as such, the generalizability of the findings across training systems and institutions still needs to be established. Another limitation of this work is the duality of the Dunning-Kruger effect: it is impossible to know from this study whether women rated themselves lower than men on self-efficacy because they were truly less confident, or because they were learning more and were more aware of where their skills were lacking. For the same reason, we also cannot tell from this study whether men overestimated their abilities. To address this, future work should include a longitudinal study that follows residents’ progression of learning throughout training to fully understand if and how these phenomena impact clinical skill transfer and patient care. Finally, the system flow of the DHRT changed between training years, modifying how many trials each person needed to complete, which may have contributed to changes in self-efficacy between years. This should also be explored in future work.

While medical education has reached gender parity, the gender-confidence gap and the Dunning-Kruger effect are still found to impact self-efficacy at the residency level in SBT. We found that women were significantly more likely to have lower self-efficacy for nine of the 14 CLSE survey items, that there were no performance differences between men and women on the DHRT, and that there was no correlation between performance and self-efficacy. These results indicate an increased need to evaluate gender differences and the Dunning-Kruger effect in resident SBT. Future work should be conducted to further evaluate these findings.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

  • NIH: National Institutes of Health
  • CVC: Central Venous Catheterization
  • US-IJCVC: Ultrasound Guided Central Venous Catheterization
  • DHRT: Dynamic Haptic Robotic Trainer
  • SBT: Simulation-Based Training
  • CLSE: Central Line Self-Efficacy Survey

Campbell KE, Mccammon HJ. Elizabeth Blackwell’s heirs: women as physicians in the United States, 1880–1920. Work Occup. 2005;32(3):290–318.

AAMC. Total U.S. Medical School Enrollment by Race/Ethnicity (Alone) and Gender, 2019–2024. 2023.

Zern N, Shalhub S, Wood D, Calhoun K. Association of sex with perceived career barriers among surgeons. J Bone Jt Surg - Am Vol. 2019;154(12):1155–7.

Colbert-Getz JM, Fleishman C, Jung J, Shilkofski N. How do gender and anxiety affect students’ self-assessment and actual performance on a high-stakes clinical skills examination? Acad Med. 2013;88(1):44–8.

Kay K, Shipman C. The confidence gap. The Atlantic [Internet]. 2014;56–66. http://www.washingtonpost.com/wp-dyn/content/article/2009/07/10/AR2009071002358.html .

Artino AR. Academic self-efficacy: from educational theory to instructional practice. Perspect Med Educ. 2012;1(2):76–85.

Morony S, Kleitman S, Lee YP, Stankov L. Predicting achievement: Confidence vs self-efficacy, anxiety, and self-concept in Confucian and European countries. Int J Educ Res [Internet]. 2013;58:79–96. https://doi.org/10.1016/j.ijer.2012.11.002 .

Klassen RM, Klassen JRL. Self-efficacy beliefs of medical students: a critical review. Perspect Med Educ. 2018;7(2):76–82.

Zwaan L, Hautz WE. Bridging the gap between uncertainty, confidence and diagnostic accuracy: calibration is key. BMJ Qual Saf. 2019;28(5):352–5.

Kruger J, Dunning D. Unskilled and unaware of it: how difficulties in recognizing one’s own incompetence lead to inflated self-assessments. J Pers Soc Psychol. 1999;77(6):1121–34.

Sawdon M, Finn G. The unskilled and unaware effect is linear in a real-world setting. J Anat. 2014;224(3):279–85.

Gude T, Finset A, Anvik T, Bærheim A, Fasmer OB, Grimstad H, et al. Do medical students and young physicians assess reliably their self-efficacy regarding communication skills? A prospective study from end of medical school until end of internship. BMC Med Educ. 2017;17(1):1–7.

Rahmani M. Medical trainees and the Dunning–Kruger Effect: when they don’t know what they don’t know. J Grad Med Educ. 2020;34:532–4.

Rubin A, Froustis E. How the Dunning-Kruger Effect impairs Professional Judgement in High-risk professions. J Student Res. 2023;12(4):1–10.

Prozesky DR, Molwantwa MC, Nkomazana O, Kebaetse MB. Intern preparedness for the CanMEDS roles and the Dunning-Kruger effect: a survey. BMC Med Educ. 2019;19(1):1–13.

Vajapey SP, Weber KL, Samora JB. Confidence gap between men and women in Medicine: a systematic review. Educ Pract Manag. 2020;31(5):494–502.

Barr J, Graffeo CS. Procedural Experience and Confidence Among Graduating Medical Students. J Surg Educ [Internet]. 2016;73(3):466–73. https://doi.org/10.1016/j.jsurg.2015.11.014 .

Lind DS, Rekkas S, Bui V, Lam T, Beierle E, Copeland EM. Competency-based student self-assessment on a surgery rotation. J Surg Res. 2002;105(1):31–4.

Kwasny L, Shebrain S, Munene G, Sawyer R. Is there a gender bias in milestones evaluations in general surgery residency training? Am J Surg [Internet]. 2021;221(3):505–8. https://doi.org/10.1016/j.amjsurg.2020.12.020 .

Minter RM, Gruppen LD, Napolitano KS, Gauger PG. Gender differences in the self-assessment of surgical residents. Am J Surg. 2005;189(6):647–50.

Watson RS, Borgert AJ, Heron CTO, Kallies KJ, Sidwell RA, Mellinger JD et al. A Multicenter Prospective Comparison of the Accreditation Council for Graduate Medical Education Milestones: Clinical Competency Committee vs. Resident. J Surg Educ [Internet]. 2017;74(6):e8–14. https://doi.org/10.1016/j.jsurg.2017.06.009 .

Vozenilek J, Huff JS, Reznek M, Gordon JA. See one, do one, teach one: Advanced technology in medical education. Acad Emerg Med. 2004;11(11):1149–54.

So HY, Chen PP, Wong GKC, Chan TTN. Simulation in medical education. J R Coll Physicians Edinb. 2019;49(1):52–7.

Kunkler K. The role of medical simulation: an overview. Int J Med Robot Comput Assist Surg. 2006;2(April):203–10.

Cooper JB, Taqueti VR. A brief history of the development of mannequin simulators for clinical education and training. Postgrad Med J. 2004;84(997):563–70.

Pinar G. An Educational Revolution and innovative technologies: the role of Simulation. Creat Educ. 2020;11(11):2218–32.

Okuda Y, Bond W, Bonfante G, McLaughlin S, Spillane L, Wang E, et al. National growth in simulation training within emergency medicine residency programs, 2003–2008. Acad Emerg Med. 2008;15(11):1113–6.

Barsuk JH, McGaghie WC, Cohen ER, Balachandran JS, Wayne DB. Use of simulation-based mastery learning to improve the quality of central venous catheter placement in a medical intensive care unit. J Hosp Med. 2009;4(7):397–403.

Kattan E, De La Fuente R, Putz F, Vera M, Corvetto M, Inzunza O, et al. Simulation-based mastery learning of bronchoscopy-guided percutaneous dilatational tracheostomy: competency acquisition and skills transfer to a cadaveric model. Simul Healthc. 2021;16(3):157–62.

Nayar SK, Musto L, Baruah G, Fernandes R, Bharathan R. Self-Assessment of Surgical Skills: A Systematic Review. J Surg Educ [Internet]. 2020;77(2):348–61. https://doi.org/10.1016/j.jsurg.2019.09.016 .

Downing SM, Yudkowsk R. Assessment in health professions education. Assessment in Health professions Education. Routledge; 2009. pp. 1–317.

Hayat AA, Shateri K, Amini M, Shokrpour N. Relationships between academic self-efficacy, learning-related emotions, and metacognitive learning strategies with academic performance in medical students: a structural equation model. BMC Med Educ. 2020;20(1):1–11.

Talsma K, Schüz B, Schwarzer R, Norris K. I believe, therefore I achieve (and vice versa): A meta-analytic cross-lagged panel analysis of self-efficacy and academic performance. Learn Individ Differ [Internet]. 2018;61(April 2017):136–50. https://doi.org/10.1016/j.lindif.2017.11.015 .

Paskins Z, Peile E. Final year medical students’ views on simulation-based teaching: a comparison with the best evidence medical education systematic review. Med Teach. 2010;32(7):569–77.

Stroben F, Schröder T, Dannenberg KA, Thomas A, Exadaktylos A, Hautz WE. A simulated night shift in the emergency room increases students ’ self-efficacy independent of role taking over during simulation. BMC Med Educ [Internet]. 2016;1–7. https://doi.org/10.1186/s12909-016-0699-9 .

Kerins J, McCully E, Stirling SA, Smith SE, Tiernan J, Tallentire VR. The impact of simulation-based mastery learning, booster session timing and clinical exposure on confidence in intercostal drain insertion: a survey of internal medicine trainees in Scotland. BMC Med Educ [Internet]. 2022;22(1):1–9. https://doi.org/10.1186/s12909-022-03654-7 .

Schwab K, Friedman J, Lazarus M, Williams J. Preparing residents for Emergent Vascular Access: the comparative effectiveness of central venous and Intraosseous Catheter Simulation-based training. Int J Crit Care Emerg Med. 2019;5(1):1–8.

Amacher SA, Schumacher C, Legeret C, Tschan F, Semmer NK, Marsch S, et al. Influence of gender on the performance of cardiopulmonary rescue teams: a randomized, prospective Simulator Study. Crit Care Med. 2017;45(7):1184–91.

Fritz J, Montoya A, Lamadrid-figueroa H, Flores-pimentel D, Walker D, Treviño-siller S, et al. Training in obstetric and neonatal emergencies in Mexico: effect on knowledge and self-efficacy by gender, age, shift, and profession. BMC Med Educ. 2020;20(97):1–10.

Gottlieb M, Chan TM, Zaver F, Ellaway R. Confidence-competence alignment and the role of self-confidence in medical education: a conceptual review. Med Educ. 2022;56(1):37–47.

McGee DC, Gould MK. Preventing complications of central venous catheterization. N Engl J Med. 2003;348(12):1123–33.

Kusminsky RE. Complications of central venous catheterization. J Am Coll Surg. 2007;204(4):681–96.

Soffler MI, Hayes MM, Smith C. Central venous catheterization training: current perspectives on the role of simulation. Adv Med Educ Pract [Internet]. 2018;9–395. https://doi.org/10.2147/AMEP.S142605 .

Pepley DF, Gordon AB, Yovanoff MA, Mirkin KA, Miller SR, Han DC et al. Training Surgical Residents With a Haptic Robotic Central Venous Catheterization Simulator. J Surg Educ [Internet]. 2017;74(6):1066–73. https://doi.org/10.1016/j.jsurg.2017.06.003 .

Parmar S, Parikh S, Mehta H. Anatomical variations of the internal jugular vein in relation to carotid artery: an ultrasound study. Int J Med Sci Public Heal. 2013;2(2):223.

Yovanoff M, Pepley D, Mirkin K, Moore J, Han D, Miller S. Improving medical education: simulating changes in patient anatomy using dynamic haptic feedback. Proc Hum Factors Ergon Soc. 2016;603–7.

Yovanoff MA, Chen HE, Pepley DF, Mirkin KA, Han DC, Moore JZ et al. Investigating the Effect of Simulator Functional Fidelity and Personalized Feedback on Central Venous Catheterization Training. J Surg Educ [Internet]. 2018;75(5):1410–21. https://doi.org/10.1016/j.jsurg.2018.02.018 .

Gonzalez-Vargas JM, Tzamaras HM, Martinez J, Brown DC, Moore JZ, Han DC et al. Going the (social) distance: Comparing the effectiveness of online versus in-person Internal Jugular Central Venous Catheterization procedural training. Am J Surg [Internet]. 2022;224(3):903–7. https://doi.org/10.1016/j.amjsurg.2021.12.006 .

Gonzalez-vargas JM, Brown DC, Moore JZ, Han DC, Sinz EH, Sonntag CC, et al. OBJECTIVE ASSESSMENT METRICS FOR CENTRAL LINE SIMULATORS: AN EXPLORATION OF CAUSAL FACTORS. Proc Hum Factors Ergon Soc Annu Meet. 2020;64(1):2008–12.

IJsselmuiden CB, Faden RR. Complications and failures of subclavian-vein catheterization. N Engl J Med. 1992;331(26):1735–8.

De Cassai A, Geraldini F, Pasin L, Boscolo A, Zarantonello F, Tocco M et al. Safety in training for ultrasound guided internal jugular vein CVC placement: a propensity score analysis. BMC Anesthesiol [Internet]. 2021;21(1):1–6. https://doi.org/10.1186/s12871-021-01460-0 .

Schober P, Thomas V. Adjustments for multiple testing in Medical Research. Anesth Anaglesia. 2020;130(1):2020.

Gann M, Sardi A. Improved results using ultrasound guidance for central venous access. Am Surg. 2003;69(12):1104–7.

Reder SR, Rohou A, Keric N, Beiser KU, Othman AE, Abello Mercado MA et al. Gender differences in self-assessed performance and stress level during training of basic interventional radiology maneuvers. Eur Radiol [Internet]. 2023;(0123456789). https://doi.org/10.1007/s00330-023-09993-3 .

Nabavi RT. Bandura ’ s Social Learning Theory & Social Cognitive Learning Theory. Theor Dev Psychol Title. 2014;(January 2012):24.

Milam LA, Cohen GL, Mueller C, Salles A. The Relationship Between Self-Efficacy and Well-Being Among Surgical Residents. J Surg Educ [Internet]. 2019;76(2):321–8. https://doi.org/10.1016/j.jsurg.2018.07.028 .

Stephens EH, Heisler CA, Temkin SM, Miller P. The current status of women in surgery: how to affect the future. JAMA Surg. 2020;155(9):876–85.

Moak TN, Cress PE, Tenenbaum M, Casas LA. The leaky pipeline of women in plastic surgery: embracing diversity to close the gender disparity gap. Aesthetic Surg J. 2020;40(11):1241–8.

McKinley SK, Wang LJ, Gartland RM, Westfal ML, Costantino CL, Schwartz D, et al. Yes, I’m the doctor: one Department’s Approach to assessing and addressing gender-based discrimination in the Modern Medical Training Era. Acad Med. 2019;94(11):1691–8.

Advancing Gender Equity in Medicine: Resources for Physicians [Internet], American Medical Association. 2022. https://www.ama-assn.org/delivering-care/health-equity/advancing-gender-equity-medicine-resources-physicians .

Ishizuka M, Nagata H, Takagi K, Kubota K. Right internal jugular vein is recommended for central venous catheterization. J Investig Surg. 2010;23(2):110–4.

Hodzic S, Golic D, Smajic J, Sijercic S, Umihanic S, Umihanic S. Complications related to insertion and use of central venous catheters (CVC). Med Arch (Sarajevo Bosnia Herzegovina). 2014;68(5):300–3.

Yovanoff M, Pepley D, Mirkin K, Moore J, Han D, Miller S. Personalized learning in medical education: Designing a user interface for a dynamic haptic robotic trainer for central venous catheterization. Proc Hum Factors Ergon Soc. 2017:615–9.

Blaivas M, Adhikari S. An unseen danger: frequency of posterior vessel wall penetration by needles during attempts to place internal jugular vein central catheters using ultrasound guidance. Crit Care Med. 2009;37(8):2345–9.

Graham A, Ozment C, Tegtmeyer K, Lai S, Braner D. Central Venous Catheterization [Internet]. 2007. https://doi.org/10.1056/NEJMvcm055053 .

Acknowledgements

Not applicable.

Funding

This work was supported by the National Institutes of Health (NIH) under Award Number R01HL127316. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Author information

Authors and affiliations.

Penn State, Department of Industrial Engineering, State College, 307 Engineering Design and Innovation Building, University Park, PA, 16801, USA

Haroula Tzamaras & Scarlett Miller

WVU Critical Care and Trauma Institute, Morgantown, WV, USA

Elizabeth Sinz

General Internal Medicine Cedars-Sinai, Los Angeles, CA, USA

Michael Yang & Phillip Ng

Penn State Department of Mechanical Engineering, State College, University Park, PA, USA

Jason Moore

Contributions

HT performed study design, data collection, and analysis, and wrote this manuscript. LS provided study design support, data collection support, and manuscript editing. MY provided data collection support. PN provided data collection support. JM provided study design support and data collection support. SM acted as the PI and provided study design support, data collection support, and manuscript editing.

Corresponding author

Correspondence to Scarlett Miller .

Ethics declarations

Ethics approval and consent.

All experimental protocols used in this study were approved by the Institutional Review Board (IRB) at Penn State University. All participants in this study were over 18 and provided informed consent, as per the IRB-approved protocol. The participants elected to participate in this study as part of their required residency training. Before giving consent, residents were informed that the purpose of the study was to “compare the learning gains on central venous catheterization (CVC) insertion procedures of first year residents” and that the data would be used to “guide the development of better training systems for central venous catheterization.” The decision to use the existing data to analyze self-efficacy differences between men and women was made post hoc.

Consent for publication

Competing interests.

Two of the authors (Drs. Miller and Moore) are listed as inventors for the DHRT system on a patent (United States Patent No. US 11,373,553 B2, approved June 28, 2022). Additionally, Drs. Miller and Moore own equity in Medulate, which has an interest in this project. Dr. Miller and Moore’s ownership in this company has been reviewed by the University’s Individual Conflict of Interest Committee and is currently being managed by the University.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: CVC Details

CVC is most commonly conducted with ultrasound guidance into the right internal jugular vein (US-IJCVC) [ 53 , 61 ] and requires a series of bi-manual steps to complete the procedure. These steps include manipulating an ultrasound probe in one hand while inserting a needle into the internal jugular vein and avoiding other anatomy such as the carotid artery [ 62 ]. Once the vein is accessed, a catheter can be inserted; however, most training for CVC focuses only on the initial needle insertion.

Appendix B: DHRT details

Each time the trainee uses the system, they are presented with a graphical user interface (GUI) with personalized feedback on their performance, including number of insertion attempts and where to improve if insertion was not successful [ 63 ]. The focus of training on the DHRT is achieving successful venipuncture by inserting the needle into the vein in one attempt without puncturing through the backwall of the vein. The DHRT aims to reduce the likelihood of mechanical complications that are often caused by human error and training deficits [ 41 ] such as puncturing the vein backwall [ 64 ] or puncturing the carotid artery [ 41 , 51 ].

Appendix C: description of online training

The eight online video modules trained residents on: (1) an introduction to CVC, (2) an overview of CVC steps as defined by the New England Journal of Medicine [ 65 ], (3) an overview of the benefits and risks of each access site for CVC, (4) best practices for using CVC equipment, (5) rapid central vein assessment with ultrasound, (6) mechanical procedures for troubleshooting, (7) complication types and how to identify them, and (8) monitoring the patient and removing the catheter. To pass the online training, residents needed to receive a post-training assessment score of 80% or higher; multiple attempts were allowed.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ .

About this article

Cite this article.

Tzamaras, H., Sinz, E., Yang, M. et al. Competence over confidence: uncovering lower self-efficacy for women residents during central venous catheterization training. BMC Med Educ 24 , 923 (2024). https://doi.org/10.1186/s12909-024-05747-x

Received : 22 March 2024

Accepted : 04 July 2024

Published : 26 August 2024

DOI : https://doi.org/10.1186/s12909-024-05747-x

Keywords

  • Gender-confidence gap
  • Medical simulation
  • Central venous catheterization

BMC Medical Education

ISSN: 1472-6920


Influence of Self-Identity and Social Identity on Farmers’ Willingness for Cultivated Land Quality Protection

1. Introduction
2. Theoretical Analysis and Research Hypotheses
2.1. Self-Identity and Farmers’ WCQP
2.1.1. Impact of Cognitive Identity on Farmers’ WCQP
2.1.2. Impact of Emotional Identity on Farmers’ WCQP
2.1.3. Impact of Behavioral Identity on Farmers’ WCQP
2.2. Social Identity and Farmers’ WCQP
2.3. Self-Identity, Social Identity, and Farmers’ WCQP
3. Materials and Methods
3.1. Data Source
3.2. Variable Selection
3.2.1. Dependent Variable
3.2.2. Core Independent Variables
3.2.3. Control Variables
3.3. Research Methods
3.3.1. Structural Equation Modeling (SEM)
3.3.2. Instrumental Variable Method
4.1. Descriptive Statistics of the Sample
4.2. Reliability and Validity Tests
4.3. Model Fit Tests and Estimation Results
4.4. Endogenous Treatment
4.4.1. Instrumental Variable Estimation Results
4.4.2. Propensity Score Matching (PSM)
4.5. Robustness Checks
4.6. Further Analysis of the Relationship between Emotional Identity, Social Identity, and Farmers’ WCQP
5. Discussion
6. Conclusions and Policy Recommendations
Author Contributions
Data Availability Statement
Acknowledgments
Conflicts of Interest


Variable (Abbreviation)Question (Units)Scale/Definition
Dependent variable
WCQPWCQP1—Are you willing to replace chemical fertilizers with organic fertilizers?1 = Very Unwilling,
2 = Unwilling,
3 = Neutral,
4 = Willing,
5 = Very Willing
WCQP2—Are you willing to reduce the use of pesticides?
Core independent variable
Cognitive identityCog1—You are a true rural person.1 = Strongly Disagree,
2 = Disagree,
3 = Neutral,
4 = Agree,
5 = Strongly Agree
Cog2—You fully accept the characteristics of rural people.
Cog3—You believe your rural identity deserves respect.
Emotional identityEmo1—You do not feel lonely in the countryside.
Emo2—Compared to urban identities, you do not feel inferior.
Emo3—You enjoy the lifestyle of rural people.
Behavior identityBeh1—You enjoy interacting with rural people.
Beh2—You like to participate in rural community activities.
Social identitySoc1—You feel a sense of belonging to your village.
Soc2—You strongly identify with your village.
Soc3—You are proud to be a member of your village.
Control variable
GenderMale or Female.0 = Male and 1 = Female
AgeYour age (years).Actual age
EduEducation level.1= No schooling,
2 = Elementary school,
3 = Junior high school,
4 = High school,
5 = College or above
HlthHealth condition.1 = Very Poor,
2 = Poor,
3 = Average,
4 = Good,
5 = Very Good
ALYYears engaged in agricultural production (years).Years engaged in agricultural production
CLAYour family’s grain crop cultivation area (mu).Area of grain crops cultivated
CLQHow many plots of cultivated land does your family own (number of plots).Number of cultivated land plots
IncomeYour family’s total income in 2022 (CNY 10,000).Actual family income in 2022
HouseholdsNumber of people in your household (people).Actual number of family members
ALFNumber of family members engaged in agricultural labor (people).Actual number of labor force members
VariablesIndicatorMeanStd. Dev.MinMax
Dependent Variable
WCQP-3.7521.00415
-WCQP13.5001.18415
-WCQP24.0051.15715
Core Independent Variables
CID-4.2080.84115
-Cog14.4690.92615
-Cog24.3710.93015
-Cog33.7841.25715
EID-3.4231.20115
-Emo13.3391.45015
-Emo23.4671.37815
-Emo33.4621.41416
BID-4.1310.98615
-Beh14.2120.99915
-Beh24.0501.09315
SID-4.1211.01315
-Soc14.1851.06615
-Soc24.1321.06215
-Soc34.0481.11615
Control Variables
Gender-0.5150.50001
Age-58.78814.8501287
Edu-2.6511.16015
Hlth-3.5241.24415
ALY-39.30817.471078
CLA-5.0213.738025
CLQ-2.9591.972018
Income-5.0194.696035
Households-4.7771.956114
ALF-2.0751.234010
VariablesFactor LoadingCronbach’s AlphaCRAVECIDEIDBIDSIDWCQP
CID0.8440.7490.8580.6710.819
0.911
0.687
EID0.8300.8070.8860.7220.2970.850
0.846
0.872
BID0.9570.8740.940 0.8860.2670.2210.942
0.925
SID0.9220.9300.9550.8760.4020.1890.1680.936
0.953
0.933
WCQP0.8740.6390.8470.7340.4130.4570.2580.3200.857
0.839
Index; Model Value; Recommended Value; Acceptance
SRMR; 0.043; <0.08 good fit, <0.1 reasonable fit; Good
d_ULS; 0.508; below 0.95; Reasonable
d_G; 0.290; below 0.95; Reasonable
NFI; 0.813; >0.9 good fit, >0.8 reasonable fit; Reasonable
VariablesFirst-Stage RegressionSecond-Stage Regression
Model 1Model 2Model 3Model 4Model 5
CID- 0.260 ***
- (0.057)
EID (subsidize policy) 0.134 * 2.111 ***
(0.067) (0.271)
BID - −0.445 ***
- (0.083)
SID (resettlement policy) 0.200 ***0.685 ***
(0.069)(0.220)
Control variablesYESYESYESYESYES
First-stage Chi 0.1906.379 **1.34125.343 ***
First-stage Cragg–Donald Wald F statistic37.551109.92063.846148.807
Second-stage R 0.341
VariablesCIDEIDBIDSID
Coef.Robust Std. Err.Coef.Robust Std. Err.Coef.Robust Std. Err.Coef.Robust Std. Err.
Gender0.466 **0.2080.0030.208−0.654 ***0.2120.1590.206
Age−0.0140.016−0.037 **0.016−0.0130.017−0.035 **0.016
Edu0.0580.1080.333 ***0.1020.237 **0.1060.175 *0.106
Hlth0.0260.089−0.1250.0890.0920.0870.0290.086
ALY0.035 **0.0140.032 **0.0130.034 **0.0150.038 ***0.013
CLA0.0250.0300.0090.0290.067 **0.032−0.0150.029
CLQ0.0220.060−0.0410.0500.0160.053−0.0420.051
Income0.082 ***0.0300.046 **0.0230.0180.0220.0310.023
Households0.0010.0610.0410.060−0.0020.0560.0700.057
ALF−0.0310.085−0.1220.088−0.0830.089−0.0760.088
Prob > chi 0.0000.0020.0010.065
VariableMatching MethodAverage Treatment EffectStandard Deviationt-Value
CID1:3 nearest neighbor matching0.501 ***0.1114.99
Caliper match0.493 ***0.1235.28
Nuclear matching0.555 ***0.1005.57
ATT mean0.516--
EID1:3 nearest neighbor matching0.685 ***0.1115.67
Caliper match0.737 ***0.0987.32
Nuclear matching0.760 ***0.0948.09
ATT mean0.727--
BID1:3 nearest neighbor matching0.805 ***0.1166.61
Caliper match0.740 ***0.1007.05
Nuclear matching0.766 ***0.0987.72
ATT mean0.770--
SID1:3 nearest neighbor matching0.309 **0.1182.72
Caliper match0.102 ***0.1033.42
Nuclear matching0.357 **0.1012.34
ATT mean0.256--
GroupMatching StageMatching MethodPseudo-R LRMean BiasMed. BiasB
CIDBefore matching0.05432.45 ***30.532.354.6
After matching1:3 nearest neighbor matching0.0010.462.81.36.0
Caliper match0.0021.195.14.210.0
Nuclear matching0.0010.933.73.08.8
EIDBefore matching0.03722.36 ***18.719.145.1
After matching1:3 nearest neighbor matching0.0021.295.75.710.8
Caliper match0.0010.815.35.58.6
Nuclear matching0.0010.403.53.86.0
BIDBefore matching0.05130.72 ***22.428.953.7
After matching1:3 nearest neighbor matching0.0010.813.13.48.9
Caliper match0.0000.211.71.54.6
Nuclear matching0.0010.773.23.48.8
SIDBefore matching0.01810.87 **9.37.632.3
After matching1:3 nearest neighbor matching0.0011.006.16.78.7
Caliper match0.0000.152.93.33.5
Nuclear matching0.0010.441.00.75.8
VariablesModel 6Model 7
Coef.Robust Std. Err.Coef.Robust Std. Err.
CID0.085 *0.0500.270 ***0.059
EID0.333 ***0.0410.309 ***0.041
BID0.089 **0.0420.098 **0.041
SID0.227 ***0.0510.140 ***0.049
CID×SID0.131 *0.0690.087 **0.044
EID×SID−0.129 ***0.044−0.191 ***0.036
BID×SID−0.0410.0660.0150.044
Gender0.080.0430.0270.108
Age−0.0150.1030.0070.041
Edu−0.0160.0480.0720.048
Hlth0.0570.0440.088 **0.044
ALY0.0990.0990.0820.104
CLA0.0620.0420.0460.042
CLQ−0.088 **0.040−0.108 ***0.038
Income−0.0890.046−0.0700.049
Households0.0660.0490.0650.047
ALF−0.0240.048−0.0390.043
VariablesIIIIIIIV
Coef.Robust Std. Err.Coef.Robust Std. Err.Coef.Robust Std. Err.Coef.Robust Std. Err.
CID0.0200.116−0.0110.1770.168 *0.0870.267 ***0.099
EID0.1350.1050.956 ***0.162−0.0600.0980.250 ***0.085
BID0.171 *0.098−0.0140.090−0.0250.0810.155 *0.085
SID0.256 **0.114−0.0290.1020.477 ***0.0520.0520.090
EID×SID−0.314 **0.1510.1580.174−0.1010.1390.288 ***0.079
Control variablesYESYESYESYES

COMMENTS

  1. 12.2.1: Hypothesis Test for Linear Regression

    The two test statistic formulas are algebraically equal; however, they are written differently and use a different parameter in the hypotheses. The formula for the t-test statistic is t = b1 / √(MSE / SSxx). Use the t-distribution with degrees of freedom equal to n − p − 1.
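
A quick numerical check of that formula (a minimal R sketch on simulated data, not taken from the cited lesson): compute the slope's t-statistic from MSE and SSxx by hand and compare it with the value reported by lm().

```r
# Sketch on simulated data: slope t-statistic from t = b1 / sqrt(MSE / SSxx),
# cross-checked against the coefficient table from summary(lm(...)).
set.seed(42)
x <- rnorm(50)
y <- 2 + 1.5 * x + rnorm(50)

fit  <- lm(y ~ x)
b1   <- unname(coef(fit)["x"])                     # estimated slope
mse  <- sum(residuals(fit)^2) / (length(y) - 2)    # n - p - 1 = n - 2 here
ssxx <- sum((x - mean(x))^2)

t_manual  <- b1 / sqrt(mse / ssxx)
t_summary <- coef(summary(fit))["x", "t value"]
c(manual = t_manual, from_summary = t_summary)     # the two agree
```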

  2. Linear regression hypothesis testing: Concepts, Examples

    Learn how to perform hypothesis testing for linear regression models using t-statistics and f-statistics. See how to train a multiple linear regression model using R and interpret the hypothesis testing results.
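
A minimal sketch of that workflow in R, using simulated data rather than the article's example: fit a multiple linear regression with lm(), then read the per-coefficient t-statistics and the overall F-statistic from summary().

```r
# Sketch: multiple linear regression on simulated data; summary() reports a
# t value and Pr(>|t|) for each coefficient and the overall F-statistic.
set.seed(1)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 0.8 * x1 - 0.5 * x2 + rnorm(n)   # x3 has no true effect

fit <- lm(y ~ x1 + x2 + x3)
summary(fit)             # coefficient t-tests plus the overall F-test
summary(fit)$fstatistic  # F value with its numerator and denominator df
```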

  3. Linear regression

    Learn how to perform tests of hypotheses about the coefficients of a linear regression model estimated by OLS. Compare different methods for normal and non-normal models, and see examples and proofs.

  4. Hypothesis Test for Regression Slope

    Hypothesis Test for Regression Slope. This lesson describes how to conduct a hypothesis test to determine whether there is a significant linear relationship between an independent variable X and a dependent variable Y. The test focuses on the slope of the regression line Y = Β0 + Β1X, where Β0 is a constant, Β1 is the slope (also called the regression coefficient), and X is the value of the independent variable.

  5. Understanding the Null Hypothesis for Linear Regression

    Simple linear regression uses the following null and alternative hypotheses: H0: β1 = 0 and HA: β1 ≠ 0. The null hypothesis states that the coefficient β1 is equal to zero; in other words, there is no statistically significant relationship between the predictor variable, x, and the response variable, y.

  6. PDF Lecture 5 Hypothesis Testing in Multiple Linear Regression

    As in simple linear regression, under the null hypothesis t0 = β̂j / se(β̂j) follows a t distribution with n − p − 1 degrees of freedom. We reject H0 if |t0| > t(n − p − 1, 1 − α/2). This is a partial test because β̂j depends on all of the other predictors xi, i ≠ j, that are in the model. Thus, this is a test of the contribution of xj given the other predictors in the model.
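
A small R sketch of that partial test on simulated data, comparing |t0| with the critical value t(n − p − 1, 1 − α/2) from qt(); the variable names below are illustrative only.

```r
# Sketch: partial t-test for one coefficient in a multiple regression.
set.seed(7)
n  <- 80
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 2 + 0.6 * x1 + rnorm(n)            # x2 adds nothing once x1 is in the model

fit    <- lm(y ~ x1 + x2)
est    <- coef(summary(fit))["x2", "Estimate"]
se     <- coef(summary(fit))["x2", "Std. Error"]
t0     <- est / se
p      <- 2                               # number of predictors
alpha  <- 0.05
t_crit <- qt(1 - alpha / 2, df = n - p - 1)

abs(t0) > t_crit   # TRUE would mean rejecting H0: the coefficient on x2 is 0
```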

  7. How to Test the Significance of a Regression Slope

    Step 1. State the hypotheses: the null hypothesis (H0): B1 = 0 and the alternative hypothesis (Ha): B1 ≠ 0. Step 2. Determine a significance level to use. Since we constructed a 95% confidence interval in the previous example, we will use the equivalent approach here and choose a .05 level of significance. Step 3 ...
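
To illustrate that equivalence on simulated data (a sketch, not the cited example): the 95% confidence interval for the slope excludes zero exactly when the two-sided p-value is below .05.

```r
# Sketch: the 95% CI for the slope and the two-sided test at alpha = .05 agree.
set.seed(3)
x <- rnorm(40)
y <- 1 + 0.4 * x + rnorm(40)

fit <- lm(y ~ x)
confint(fit, "x", level = 0.95)               # 95% CI for the slope
coef(summary(fit))["x", "Pr(>|t|)"] < 0.05    # matches whether the CI excludes 0
```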

  8. Hypothesis testing in linear regression part 1

    This video explains how hypothesis testing works in practice, using a particular example. Check out https://ben-lambert.com/econometrics-course-problem-sets-...

  9. PDF Chapter 6 Hypothesis Testing

    Calculating SSR: the Sum of Squares Regression (SSR) is the sum of the squared differences between the prediction for each observation and the mean of y. The Total Sum of Squares (SST) is the sum of SSR and SSE; mathematically, SST = SSR + SSE.
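
A short R sketch (simulated data) that computes SSR, SSE, and SST by hand, checks the identity SST = SSR + SSE, and rebuilds the overall F-statistic from those sums of squares.

```r
# Sketch: sums of squares by hand and the F-statistic they imply.
set.seed(9)
x <- rnorm(60)
y <- 3 + 0.7 * x + rnorm(60)

fit  <- lm(y ~ x)
yhat <- fitted(fit)
ssr  <- sum((yhat - mean(y))^2)   # regression sum of squares
sse  <- sum((y - yhat)^2)         # residual (error) sum of squares
sst  <- sum((y - mean(y))^2)      # total sum of squares
all.equal(sst, ssr + sse)         # TRUE

p <- 1                             # one predictor
f_manual <- (ssr / p) / (sse / (length(y) - p - 1))
summary(fit)$fstatistic["value"]   # matches f_manual
```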

  10. PDF Lecture 9: Linear Regression

    Regression. Technique used for the modeling and analysis of numerical data. Exploits the relationship between two or more variables so that we can gain information about one of them through knowing values of the other. Regression can be used for prediction, estimation, hypothesis testing, and modeling causal relationships.

  11. Linear regression

    See all my videos at https://www.tilestats.com/ In this video, we will see how we can use hypothesis testing in linear regression to, for example, test if the...

  12. Hypothesis Testing On Linear Regression

    Steps to Perform Hypothesis testing: Step 1: We start by saying that β₁ is not significant, i.e., there is no relationship between x and y, therefore slope β₁ = 0. Step 2: Typically, we set ...

  13. Hypothesis Testing in Regression Analysis

    Hypothesis Testing in Regression Analysis. Hypothesis testing is used to confirm if the estimated regression coefficients bear any statistical significance. Either the confidence interval approach or the t-test approach can be used in hypothesis testing. In this section, we will explore the t-test approach.

  14. Regression Analysis

    Hypothesis Testing: Regression analysis provides a statistical framework for hypothesis testing. Researchers can test the significance of individual coefficients, assess the overall model fit, and determine if the relationship between variables is statistically significant. This allows for rigorous analysis and validation of research hypotheses.

  15. Hypothesis Tests and Confidence Intervals in Multiple Regression

    Confidence Intervals for a Single Coefficient. The confidence interval for a regression coefficient in multiple regression is calculated and interpreted the same way as it is in simple linear regression. The t-statistic has n - k - 1 degrees of freedom where k = the number of independent variables. Supposing that an interval contains the true value of ...

  16. Understanding the t-Test in Linear Regression

    Whenever we perform linear regression, we want to know if there is a statistically significant relationship between the predictor variable and the response variable. We test for significance by performing a t-test for the regression slope. We use the following null and alternative hypotheses for this t-test: H0: β1 = 0 (the slope is equal to zero) ...
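
For instance, the slope's estimate, t-statistic, and two-sided p-value can be pulled straight from the coefficient table; a minimal sketch on simulated data:

```r
# Sketch: extract just the slope row of the coefficient table.
set.seed(11)
x <- rnorm(30)
y <- 5 - 0.9 * x + rnorm(30)

fit <- lm(y ~ x)
coef(summary(fit))["x", c("Estimate", "t value", "Pr(>|t|)")]
```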

  17. Hypothesis Test for Simple Linear Regession

    Organized by textbook: https://learncheme.com/ The spreadsheet can be found at https://learncheme.com/student-resources/excel-files/ Made by faculty at the ...

  18. T-test and Hypothesis Testing (Explained Simply)

    Student's t-tests are commonly used in inferential statistics for testing a hypothesis on the basis of a difference between sample means. However, people often misinterpret the results of t-tests, which leads to false research findings and a lack of reproducibility of studies.

  19. Understanding t-test for linear regression

    With linear regression we basically get the same thing. In vector form, β̂ ∼ N(β, σ²(XᵀX)⁻¹). Let S²j = [(XᵀX)⁻¹]jj and assume the predictors X are non-random. If we knew σ², we'd have (β̂j − 0) / (σ Sj) ∼ N(0, 1) under the null H0: βj = 0, so we'd actually have a Z test. But once we estimate σ², we end up with a χ² random variable ...
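
A rough numerical check of that variance formula (a sketch on simulated data): rebuild the coefficient standard errors from σ̂²(XᵀX)⁻¹ and compare them with the values summary() reports.

```r
# Sketch: standard errors from sigma^2 * (X'X)^{-1}, checked against summary().
set.seed(5)
n  <- 70
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)
X   <- model.matrix(fit)                            # includes the intercept column
sigma2_hat <- sum(residuals(fit)^2) / (n - ncol(X)) # estimate of sigma^2
se_manual  <- sqrt(diag(sigma2_hat * solve(t(X) %*% X)))
cbind(manual = se_manual, from_summary = coef(summary(fit))[, "Std. Error"])
```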

  20. PDF 09

    F Test for the Slope. The t test can be used to test if the regression slope is equal to any constant, not just zero. If you wish only to determine if there is a statistically significant linear relationship between X and Y regardless of slope, an alternative is to use an F test to determine if the differences between the predicted Y values and the measured Y values are significantly ...
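
A concrete consequence in simple linear regression, shown here as a sketch on simulated data, is that the overall F-statistic equals the square of the slope's t-statistic.

```r
# Sketch: in simple linear regression, F from anova() equals t^2 from summary().
set.seed(21)
x <- rnorm(45)
y <- 0.5 + 1.2 * x + rnorm(45)

fit     <- lm(y ~ x)
t_slope <- coef(summary(fit))["x", "t value"]
f_stat  <- anova(fit)["x", "F value"]
c(t_squared = t_slope^2, F = f_stat)   # numerically identical
```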

  21. hypothesis testing

    A large Z score means that the observed regression coefficient is extreme, and therefore unlikely, in this hypothetical scenario. Getting such an extreme coefficient under this scenario makes one doubt the validity of that scenario. That is hypothesis testing, with this hypothetical scenario often called the "null hypothesis".

  22. Hypothesis Testing in High-Dimensional Regression under the Gaussian

    Title: Hypothesis Testing in High-Dimensional Regression under the Gaussian Random Design Model: Asymptotic Theory. Authors: Adel Javanmard, Andrea Montanari.

  23. Tackling Complex Regression Analysis and Hypothesis Testing Tasks

    Place the dependent variable on the Y-axis and the independent variable on the X-axis. Overlay the regression line to assess how well it fits the data. A well-fitting line indicates a strong relationship between the variables. Perform Hypothesis Testing: Test the null hypothesis to determine if there is a significant relationship between the ...
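
A minimal base-R sketch of that visual check, with simulated data standing in for a real dataset.

```r
# Sketch: scatter plot with the fitted regression line overlaid, then the
# p-value for the formal test of the slope.
set.seed(13)
x <- rnorm(50)
y <- 2 + 0.8 * x + rnorm(50)

fit <- lm(y ~ x)
plot(x, y, xlab = "Independent variable (x)", ylab = "Dependent variable (y)")
abline(fit, col = "blue")                      # fitted regression line
summary(fit)$coefficients["x", "Pr(>|t|)"]     # slope p-value
```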

  24. A U-Statistic for Testing the Lack of Dependence in Functional

    The functional partially linear regression model comprises a functional linear part and a non-parametric part. Testing the linear relationship between the response and the functional predictor is of fundamental importance. In cases where functional data cannot be approximated with a few principal components, we develop a second-order U-statistic using a pseudo-estimate for the unknown non ...

  25. The Four Assumptions of Linear Regression

    However, before we conduct linear regression, we must first make sure that four assumptions are met: 1. Linear relationship: There exists a linear relationship between the independent variable, x, and the dependent variable, y. 2. Independence: The residuals are independent. In particular, there is no correlation between consecutive residuals ...
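
A rough sketch of how those assumptions are commonly eyeballed with base-R residual diagnostics (simulated data; formal tests also exist but are not shown here).

```r
# Sketch: residuals vs fitted (linearity, constant variance), a normal QQ plot
# (normality), and residuals in observation order (an informal independence check).
set.seed(17)
x <- rnorm(60)
y <- 1 + 0.6 * x + rnorm(60)

fit <- lm(y ~ x)
par(mfrow = c(1, 3))
plot(fitted(fit), residuals(fit), main = "Residuals vs fitted"); abline(h = 0, lty = 2)
qqnorm(residuals(fit)); qqline(residuals(fit))
plot(residuals(fit), type = "b", main = "Residuals in order")
par(mfrow = c(1, 1))
```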

  26. A comprehensive comparison of goodness-of-fit tests for logistic

    We introduce a projection-based test for assessing logistic regression models using the empirical residual marked empirical process and suggest a model-based bootstrap procedure to calculate critical values. We comprehensively compare this test and Stute and Zhu's test with several commonly used goodness-of-fit (GoF) tests: the Hosmer-Lemeshow test, modified Hosmer-Lemeshow test, Osius ...

  27. Competence over confidence: uncovering lower self-efficacy for women

    Fisher's exact test was conducted for the dichotomous performance variables, backwall puncture and successful venipuncture. All assumptions were met for both of these analyses. Finally, to assess the Dunning-Kruger effect (RQ3), regression analyses were conducted to determine if there was a correlation between self-efficacy and performance.

  28. Land

    Exploring farmers' willingness for cultivated land quality protection (WCQP) is crucial for preserving land quality. The existing sociopsychological research often examines farmers' WCQP from a single perspective—either self-identity or social identity—overlooking the structural relationship between the two. This oversight hinders the development of synergistic policies for cultivated ...