Quick and Dirty Way to Fit Regression Models Using (Only) SQL.

SQL programmers hardly fit any ML models.
Someone else will do it unless they have either Python or R knowledge. While Python and scikit-learn are often my go-to tools for Machine Learning, it is worth noting that SQL can also do some quick and dirty model fitting.
Regression models are a common one almost everyone needs. I remember using it in high school physics for the first time.
In such situations, if you have your data in a Postgres table, you don't have to leave your Sql environment to fit such trivial models.
3 SQL Optimization Techniques That Can Instantly Boost Query Speed
Here's how we do this.
Regression modeling in SQL
Postgres has built-in utilities to work with regression models. You don't have to install or activate any special modules.
We can easily fit linear regression models quickly and make predictions using them.
A linear regression model is about finding the equation of a line that generalizes the dataset. Thus, we only need to find the line's intercept and slope.
The regr_slope
and regr_intercept
functions help us with this task.
Let's suppose we have a table with the rainfall and temperature columns. And we need to predict the missing values in the temperature column using the rainfall information.
Here's a SELECT statement that retrieves the rainfall and temperature values from the weather table, where missing temperature values are filled with predictions using the regr_slope
and regr_intercept
functions:
SELECT
rainfall,
CASE
WHEN temperature IS NULL
THEN regr_slope(temperature, rainfall) * rainfall
+ regr_intercept(temperature, rainfall)
ELSE temperature
END AS temperature
FROM
weather;
We use a CASE
statement to check if the temperature value is missing (i.e., NULL). If it is missing, we use the regr_slope
and regr_intercept
Functions to predict the temperature value based on the corresponding rainfall value. Otherwise, we use the original temperature value.
Persisting the predictions
If I want to fill in the missing values in the table permanently, I can use a slightly modified version of the above code. You can create a materialized view for the predictions or insert the forecasts into a different table.
-- Populate the table with predicted temperature values
INSERT INTO predicted_temperature (rainfall, temperature)
SELECT
t1.rainfall,
CASE
WHEN t1.temperature IS NULL
THEN regr_slope(t2.temperature, t2.rainfall) * t1.rainfall
+ regr_intercept(t2.temperature, t2.rainfall)
ELSE t1.temperature
END AS temperature
FROM
weather t1
LEFT JOIN weather t2 ON t1.rainfall = t2.rainfall
WHERE
t1.temperature IS NULL;
And here's the same but creating a materialized view. Materialized views are simply queries saved in the database along with their results from the latest run.
CREATE MATERIALIZED VIEW predicted_temperature_mv AS
SELECT
rainfall,
CASE
WHEN temperature IS NULL
THEN regr_slope(temperature, rainfall) * rainfall
+ regr_intercept(temperature, rainfall)
ELSE temperature
END AS temperature
FROM
weather;
Limitations of fitting regression models in SQL
Anyone who worked on regression models before could attest that the above examples still need to be completed. Fitting a correct model is more complex than what we've discussed here.
Especially in real-life situations, we should consider more than one independent variable to predict the dependent one. Sometimes we have to use polynomial order in fitting the regression models rather than the linear one. We may also need to use them both.
Postgres's regression utilities aren't capable of handling such complex modeling. We can only build a linea regressor with one independent variable.
Conclusion
Linear regression models are probably the most used ones for predicting continuous data. Data scientists often use it as a starting point for more complex ML modeling.
Although we need the support of programming languages such as Python for more sophisticated machine-learning tasks, simple tasks like linear regressions can be done inside SQL itself.
I hope this little technique discussed in this post will help you in your daily work.
Thanks for reading, friend! If you enjoyed my article, let's keep in touch on LinkedIn, Twitter, and Medium.
Not a Medium member yet? Please use this link to become a member because, at no extra cost for you, I earn a small commission for referring you.