DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Last call! Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • Simultaneous, Multiple Proportion Comparisons Using Marascuilo Procedure
  • The Cypress Edge: Next-Level Testing Strategies for React Developers
  • How Doris and Hive Work Together to Maximize Data Analysis Efficiency
  • Teradata Performance and Skew Prevention Tips

Trending

  • Comprehensive Guide to Property-Based Testing in Go: Principles and Implementation
  • Blue Skies Ahead: An AI Case Study on LLM Use for a Graph Theory Related Application
  • Concourse CI/CD Pipeline: Webhook Triggers
  • Medallion Architecture: Why You Need It and How To Implement It With ClickHouse
  1. DZone
  2. Data Engineering
  3. Data
  4. Key Terms to Know: Regression Analysis

Key Terms to Know: Regression Analysis

When trying to decipher the results of a regression analysis, you must understand the lingo, as well. Learn some of the most common terms used in regression analysis.

By 
Sunil Kappal user avatar
Sunil Kappal
·
Feb. 28, 18 · Opinion
Likes (4)
Comment
Save
Tweet
Share
20.6K Views

Join the DZone community and get the full member experience.

Join For Free

When trying to decipher the results of a regression analysis, it is mandatory to understand the lingo, as well. This article will introduce you to some of the most common terminologies that are used in regression analysis.

It might be a "duh" scenario for a veteran analyst when it comes to these terms and terminologies. However, I strongly feel that there will be many newbies digging for this information on the world wide web.

So, here is a comprehensive list of regression analysis terms:

Estimator: A formula or algorithm for generating estimates of parameters, given relevant data.

Bias: An estimate is unbiased if its expectation equals the value of the parameter being estimated; otherwise it is biased.

Efficiency: An estimator A is more efficient than an estimator B if A has a smaller sampling variance — that is, if the particular values generated by A are more tightly clustered around their expectation.

Consistency: An estimator is consistent if the estimates it produces converge on the true parameter value as the sample size increases without limit. Consider an estimator that produces estimates θ^ of some parameter θ, and let ^ denote a small number. If the estimator is consistent, we can make the probability as close to 1.0 as we like or as small as we like by drawing a sufficiently large sample. Note that a biased estimator may nonetheless be consistent if the bias tends to zero in the limit. Conversely, an unbiased estimator may be inconsistent if its sampling variance fails to shrink appropriately as the sample size increases.

Standard error of the Regression (SER): An estimate of the standard deviation of the error term in a regression model.

R-squared: A standardized measure of the goodness of fit for a regression model.

Standard error of regression coefficient: An estimate of the standard deviation of the sampling distribution for the coefficient in question.

P-value: The probability, supposing the null hypothesis to be true, of drawing sample data that are as adverse to the null as the data actually drawn, or more so. When a small p-value is found, the two possibilities are that we happened to draw a low-probability unrepresentative sample or that the null hypothesis is in fact false.

Significance level: For a hypothesis test, this is the smallest p-value for which we will not reject the null hypothesis. If we choose a significance level of 1%, we're saying that we'll reject the null if and only if the p-value for the test is less than 0.01. The significance level is also the probability of making a type 1 error (that is, rejecting a true null hypothesis).

T-test: The t-test (or z-test, which is the same thing asymptotically) is a common test for the null hypothesis that a particular regression parameter, βi, has some specific value (commonly zero, but generically βH0).

F-test: A common procedure for jointly testing a set of linear restrictions on a regression model.

Multicollinearity: A situation where there is a high degree of correlation among the independent variables in a regression model — or, more generally, where some of the Xs are close to being linear combinations of other Xs. Symptoms include large standard errors and the inability to produce precise parameter estimates. This is not a serious problem if one is primarily interested in forecasting; it is a problem is one is trying to estimate causal influences.

Omitted variable bias: Bias in the estimation of regression parameters that arises when a relevant independent variable is omitted from a model and the omitted variable is correlated with one or more of the included variables.

Log variables: A common transformation that permits the estimation of a nonlinear model using OLS to substitute the natural log of a variable for the level of that variable. This can be done for the dependent variable and/or one or more independent variables. A key point to remember about logs is that for small changes, the change in the log of a variable is a good approximation to the proportional change in the variable itself. For example, if log(y) changes by 0.04, y changes by about 4%.

Quadratic terms: Another common transformation. When both xi and x 2 i are included as regressors, it is important to remember that the estimated effect of xi on y is given by the derivative of the regression equation with respect to xi. If the coefficient on xi is β and the coefficient on x 2 i is γ, the derivative is β + 2γ xi.

Interaction terms: Pairwise products of the "original" independent variables. The inclusion of interaction terms in a regression allows for the possibility that the degree to which xi affects y depends on the value of some other variable x j. In other words, x j modulates the effect of xi on y. For example, the effect of experience on wages (xi) might depend on the gender (x j) of the worker.

Hypothesis (drama) Data (computing) Testing Interaction Deviation (statistics) Efficiency (statistics) clustered Distribution (differential geometry)

Published at DZone with permission of Sunil Kappal, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Simultaneous, Multiple Proportion Comparisons Using Marascuilo Procedure
  • The Cypress Edge: Next-Level Testing Strategies for React Developers
  • How Doris and Hive Work Together to Maximize Data Analysis Efficiency
  • Teradata Performance and Skew Prevention Tips

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends: