DZone

Predictive Analysis Using Linear Regression With SAS

In SAS, PROC REG can be used for linear regression to find the relationship between variables. Linear regression is one of the most widely used predictive techniques.

By Jitendra Bafna (DZone Core) · Mar. 01, 17 · Opinion

Linear regression models the relationship between a scalar dependent variable and one or more independent (explanatory) variables. In the simple case, it finds the straight line that best fits the data points, known as the regression line. It is one of the most widely used predictive techniques.


For example, if you want to predict a person's weight from their height, weight is the dependent variable, because it is the value being predicted, and height is the independent variable.

In SAS, PROC REG is used for linear regression to find the relationship between two variables.

Syntax

PROC REG DATA = dataset;
MODEL var1 = var2;
RUN;
QUIT;
  • dataset is the name of the dataset.

  • var1 is the dependent variable and var2 is the independent variable; both must be variables in the dataset.

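In the output, check the P value and the R-squared value:

  • If the R-squared value is greater than 0.7, the model is generally considered a good fit. This is a rule of thumb; what counts as a good R-squared depends on the field.

  • If the P value is greater than 0.05, we fail to reject the null hypothesis (H0) that the variable's coefficient is zero. Otherwise, we reject H0 in favor of the alternative hypothesis that the variable has a significant effect.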
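
For reference, R-squared compares the residual sum of squares (SSE) with the total sum of squares (SST) of the dependent variable, and each coefficient's P value comes from a t statistic: the estimate divided by its standard error. Under H0 (β_j = 0) this statistic follows a t distribution with n − k − 1 degrees of freedom, where n is the number of observations and k is the number of independent variables. These are the standard textbook definitions, stated here only as a sketch of what PROC REG reports:

```latex
R^2 = 1 - \frac{SSE}{SST}
    = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
t_j = \frac{\hat{\beta}_j}{\operatorname{se}(\hat{\beta}_j)}, \quad j = 0, 1, \dots, k
```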

Simple Linear Regression

Modeling and establishing the relationship between one dependent variable and one independent variable is known as Simple Linear Regression.

y = β0 + β1x1 + ϵ

  • x1 is the independent variable.

  • y is the dependent variable.

  • β0 is the intercept (constant term).

  • β1 is the regression coefficient.

  • ϵ is the error term.
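
PROC REG estimates β0 and β1 by ordinary least squares, choosing the values that minimize the sum of squared residuals. Writing x_i and y_i for the i-th observation of x1 and y, and x̄ and ȳ for their sample means, the estimates have a closed form:

```latex
\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```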

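/* Sample data: weight is the dependent variable, height the independent variable */
Data Person_Data ;
input weight height;
datalines;
30 130
40 140
45 145
50 160
55 170
60 172
;
run;

/* Fit weight on height; ALPHA=0.05 gives 95% limits */
proc reg data=work.person_data alpha=0.05 plots(only)=(diagnostics residuals observedbypredicted);
model weight=height;
/* Save predicted values (p_), 95% prediction limits for individual values (lcl_, ucl_), and studentized residuals (r) */
output out=WORK.Reg_stats p=p_ lcl=lcl_ ucl=ucl_ rstudent = r ;
run;
quit;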
  • weight is the dependent variable.

  • height is the independent variable.
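
Regression and correlation are closely related: with a single independent variable, R-squared equals the square of Pearson's correlation coefficient r between the two variables. A minimal sketch of that cross-check with PROC CORR, assuming the same six observations (the data step is repeated so the block runs on its own; WORK.CORR_OUT and WORK.R_SQUARED are illustrative names):

```sas
/* Same sample data as the regression example */
data work.person_data;
   input weight height;
   datalines;
30 130
40 140
45 145
50 160
55 170
60 172
;
run;

/* OUTP= writes the Pearson correlation matrix to a data set */
proc corr data=work.person_data pearson outp=work.corr_out;
   var weight height;
run;

/* Pick the weight row of the correlation matrix and square r(weight, height) */
data work.r_squared;
   set work.corr_out;
   where _type_ = 'CORR' and upcase(_name_) = 'WEIGHT';
   r = height;
   r_sq = r**2;
   keep r r_sq;
run;

proc print data=work.r_squared noobs;
run;
```

The r_sq value should agree with the R-squared that PROC REG reports for the same model.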

Output

  • The R-squared value is 0.9541 (95.41%), which is greater than 0.7 (70%), so the model is a good fit.

  • The P value is 0.0008, which is less than 0.05, so height is a significant variable in the model.
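
The R-squared and the fitted line can also be checked by hand from the six observations, taking x = height and y = weight. For one independent variable, R-squared equals S_xy² / (S_xx S_yy), where S_xx, S_yy, and S_xy are the centered sums of squares and cross-products. The sums below are hand-computed from the data listed above, not copied from the PROC REG output:

```latex
\begin{aligned}
\bar{x} &= \tfrac{917}{6} \approx 152.83, \qquad \bar{y} = \tfrac{280}{6} \approx 46.67 \\
S_{xx} &= \textstyle\sum x_i^2 - n\bar{x}^2 = 141609 - \tfrac{917^2}{6} = \tfrac{8765}{6} \approx 1460.83 \\
S_{yy} &= \textstyle\sum y_i^2 - n\bar{y}^2 = 13650 - \tfrac{280^2}{6} = \tfrac{3500}{6} \approx 583.33 \\
S_{xy} &= \textstyle\sum x_i y_i - n\bar{x}\bar{y} = 43695 - \tfrac{917 \cdot 280}{6} = \tfrac{5410}{6} \approx 901.67 \\
R^2 &= \frac{S_{xy}^2}{S_{xx} S_{yy}} = \frac{5410^2}{8765 \cdot 3500} \approx 0.9541 \\
\hat{\beta}_1 &= \frac{S_{xy}}{S_{xx}} = \frac{5410}{8765} \approx 0.6172, \qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \approx -47.67
\end{aligned}
```

So the fitted line is approximately weight = -47.67 + 0.6172 × height.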

  • The value of r (the studentized residual, requested with RSTUDENT=) is used to check whether any observation is an outlier. An observation whose r value is greater than 2 or less than -2 is treated as an outlier; observations with -2 < r < 2 are not.

In this case, no observation falls outside the -2 to 2 range. In the output data set, ucl_ and lcl_ are the upper and lower limits of the 95% prediction interval for each observation's individual value.
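
To list any such observations programmatically instead of reading the table, you can filter the OUTPUT data set on |r| > 2. A minimal sketch, repeating the data step and PROC REG so it runs on its own; NOPRINT only suppresses the displayed output, and WORK.OUTLIERS is an illustrative name:

```sas
/* Same sample data as the regression example */
data work.person_data;
   input weight height;
   datalines;
30 130
40 140
45 145
50 160
55 170
60 172
;
run;

/* Fit the model and save studentized residuals as r */
proc reg data=work.person_data noprint;
   model weight=height;
   output out=work.reg_stats p=p_ lcl=lcl_ ucl=ucl_ rstudent=r;
run;
quit;

/* Subsetting IF: keep only observations outside the -2 < r < 2 band */
data work.outliers;
   set work.reg_stats;
   if abs(r) > 2;
run;

proc print data=work.outliers noobs;
   var weight height p_ r;
run;
```

With this data, WORK.OUTLIERS comes out empty, which is consistent with the observation above that no point falls in the outlier range.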

Final Conclusion

Height is a significant variable and explains about 95% of the variation in a person's weight.
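
The fitted model can also be used for prediction. PROC REG excludes rows whose dependent variable is missing from the fit but still writes predicted values for them to the OUTPUT data set, so scoring a new person only requires appending a row with a missing weight. A minimal sketch, assuming a hypothetical new person with a height of 165 (the data set names WORK.NEW_PERSON, WORK.PERSON_SCORE, and WORK.PERSON_PRED are also illustrative):

```sas
/* Same sample data as the regression example */
data work.person_data;
   input weight height;
   datalines;
30 130
40 140
45 145
50 160
55 170
60 172
;
run;

/* Hypothetical new person: height 165, weight unknown (a period reads as missing) */
data work.new_person;
   input weight height;
   datalines;
. 165
;
run;

/* Append the new row to the training data */
data work.person_score;
   set work.person_data work.new_person;
run;

/* The row with missing weight is not used in the fit but still gets p_, lcl_, ucl_ */
proc reg data=work.person_score noprint;
   model weight=height;
   output out=work.person_pred p=p_ lcl=lcl_ ucl=ucl_;
run;
quit;

/* Show only the scored row: predicted weight and 95% prediction limits */
proc print data=work.person_pred noobs;
   where weight is missing;
   var height p_ lcl_ ucl_;
run;
```

With the estimates from this data (intercept about -47.67, slope about 0.6172), the predicted weight for a height of 165 is roughly 54.2. Leaving the new weight missing, rather than guessing a value, keeps the new row from influencing the fitted coefficients.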

Multiple Linear Regression

Modeling and establishing the relationship between one dependent variable and two or more independent variables is known as Multiple Linear Regression.

y = β0 + β1x1 + β2x2 + β3x3 + ϵ

  • x1, x2, and x3 are independent variables.

  • y is the dependent variable.

  • β0 is the intercept (constant term).

  • β1, β2, and β3 are regression coefficients.

  • ϵ is the error term.
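
With several independent variables, PROC REG still uses ordinary least squares. Stacking the observations into a design matrix X (a column of ones for β0 followed by one column each for x1, x2, and x3) and the values of the dependent variable into a vector y, the estimates are, provided XᵀX is invertible:

```latex
\hat{\boldsymbol{\beta}} =
\begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\beta}_3 \end{pmatrix}
= \left( X^{\top} X \right)^{-1} X^{\top} \mathbf{y}
```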

/* Sample data: sales_price is the dependent variable; the other three are independent variables */
Data realstate_data ;
input sales_price no_of_bedroom no_of_flats no_of_garrage;
datalines;
300000 1 10 2
400000 1 10 3
600000 2 5 2
800000 3 3 2
1000000 4 3 2
;
run;

/* Fit sales_price on all three independent variables */
proc reg data=work.realstate_data alpha=0.05 plots(only)=(diagnostics residuals observedbypredicted);
model sales_price = no_of_bedroom no_of_flats no_of_garrage;
output out=WORK.realstate_data_stats p=p_ lcl=lcl_ ucl=ucl_ rstudent = r ;
run;
quit;
  • sales_price is the dependent variable.

  • no_of_bedroom, no_of_flats, and no_of_garrage are independent variables.

Output

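The value of R-squared is 0.9990 (99.90%), which is greater than 0.7 (70%), so by the rule of thumb above the model appears to be a good fit.

The P value for every independent variable is greater than 0.05, so none of the variables is significant in this model. With only five observations and three independent variables, the model has just one residual degree of freedom (5 - 3 - 1), so a near-perfect R-squared is expected and says little about how well the model would predict new data.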
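
PROC REG also reports an adjusted R-squared (Adj R-Sq), which corrects R-squared for the number of independent variables k relative to the number of observations n. As a rough hand check using the rounded R-squared above, with n = 5 and k = 3:

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
          = 1 - (1 - 0.9990)\,\frac{5 - 1}{5 - 3 - 1}
          = 1 - 0.0010 \times 4
          = 0.9960
```

Because R-squared is rounded here, the value PROC REG prints may differ slightly in the last digit.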

I hope this article helps you understand how to implement linear regression with SAS.
