
Choosing Regressors for Scenario-Based Forecasting

Introduction

This post focuses on how to select external data, specifically which regressors to include in a scenario-based forecast.

There are a few different types of external data we can use in forecasting: covariates, which are series whose future values are not known; static features, which are single values that describe your target; and future regressors, for which you have both past and future values that line up with your target series’ history and forecast horizon. We’ll focus on future regressors here because they are central to scenario-based forecasts.
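The difference between a covariate and a future regressor comes down to date coverage, which can be checked mechanically. The sketch below is a minimal illustration in plain Python; the series names (`marketing_spend`, `web_traffic`), the monthly frequency, and the helper functions are all hypothetical and are not taken from any particular tool.

```python
from datetime import date

def month_starts(start: date, periods: int) -> list:
    """Return `periods` consecutive month-start dates beginning at `start`."""
    out = []
    year, month = start.year, start.month
    for _ in range(periods):
        out.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return out

def covers_history_and_horizon(series: dict, history: list, horizon: list) -> bool:
    """True when the series has a value for every historical and every future period."""
    return all(d in series for d in history + horizon)

history = month_starts(date(2023, 1, 1), 12)  # periods with observed target values
horizon = month_starts(date(2024, 1, 1), 3)   # periods we want to forecast

# Hypothetical series: one planned through the horizon, one that stops with the history.
marketing_spend = {d: 100.0 for d in history + horizon}
web_traffic = {d: 5000.0 for d in history}

usable_as_future_regressor = covers_history_and_horizon(marketing_spend, history, horizon)
covariate_only = not covers_history_and_horizon(web_traffic, history, horizon)
```

Under these assumptions, `marketing_spend` qualifies as a future regressor, while `web_traffic` can only serve as a covariate because it has no values in the forecast horizon.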

Scenario-based forecasting is all about measuring and planning for the effect that some external variable, or set of variables, will have on a target series. This is a powerful forecasting technique because it allows human intuition to enter the forecast directly. In traditional precision forecasting, the levers a data scientist can use to adjust a forecast are often limited to model choice and the corresponding parameters. While these are important for producing a well-fit (or properly generalized) model, every well-fit model will likely produce very similar results, because these forecasts are generated solely from the historical patterns of the target series.

Scenario-based forecasting allows us to move beyond this limited approach and gives us more ways to apply intuition to the forecast. Without it, we are limited to forecasting the trends already recognized in the data. In many cases that is all that is needed, but when the future is expected to behave differently because of external factors, we use scenario-based forecasting.

Getting started

In a previous post, we made the point that regressors should be seen as driving factors of the target variable, not as its cause. This is a nuanced point, but it is fundamental to our understanding of how to choose a set of regressors for a scenario-based forecast.

We make a distinction between regressors that have an effect on a forecast variable and regressors that are helpful to a model used in forecasting. If we look at the statistical significance of the regressors, we must be aware that any procedure that selects regressors first will invalidate the assumptions behind the p-values. What we are interested in here is selecting predictors that are helpful when a model is used for forecasting; the methods discussed here would not hold if you wanted to study the effect of any predictor on the target variable.
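One common way to judge whether a regressor is helpful for forecasting, without leaning on p-values, is to compare errors on a holdout period with and without it. The sketch below assumes made-up data, a simple linear fit, and a training-mean baseline; it illustrates the idea and is not the selection method of any specific tool.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up example data: candidate regressor (spend) and target (units sold).
spend = [10, 12, 9, 15, 14, 11, 16, 13, 18, 17, 12, 19]
units = [52, 61, 47, 77, 70, 57, 83, 66, 91, 88, 60, 97]

split = 8  # first 8 periods for fitting, last 4 held out
intercept, slope = fit_line(spend[:split], units[:split])

# Forecast the holdout with the regressor, and with a baseline that ignores it.
with_regressor = [intercept + slope * s for s in spend[split:]]
baseline = [sum(units[:split]) / split] * (len(units) - split)

mae_with = mae(units[split:], with_regressor)
mae_without = mae(units[split:], baseline)
keep_regressor = mae_with < mae_without
```

Note that the holdout forecast uses the regressor's actual values for those periods, which mirrors the future-regressor setting where those values are known or assumed in advance.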

Selection process

We focus heavily on the regressor selection process because it really becomes the driver of the forecasting process. One could easily select regressors that move the forecast in any direction they like, so making an informed decision on which regressors to use is critically important.

Let’s look at a few ways of breaking down different types of regressors. This will give us ideas about where to look for the external data we’ll use.

Upstream relations

These are series that feed into the target time series as an upstream component.

Y = f(x), where Y is your target, x is your regressor, and f is some potentially unknown function.

  • Target: total revenue, upstream regressor: number of units sold. This is the flipped version of downstream relations. The upstream regressor is an input into the equation that leads us to our target variable.
  • Target: price of oil, upstream regressor: # of barrels exported from Canada
  • Target: number of units sold, upstream regressor: $ spent on marketing
  • Target: price of an airline ticket from location L1 to location L2, upstream regressor: price of oil

This kind of regressor can be helpful when we only have a partial understanding of all the inputs that go into a target variable. If we know every input to that function, it makes more sense to simply calculate the target outright.
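To make the partial-understanding case concrete, the sketch below estimates an unknown f from history and then applies it to two scenarios for the upstream regressor. It assumes, purely for illustration, that f is proportional (revenue is roughly a constant rate times units sold); the data and scenario names are made up.

```python
# Made-up history: upstream regressor (units sold) and target (total revenue).
units_sold = [100, 120, 90, 150, 130]
revenue = [2100.0, 2450.0, 1900.0, 3050.0, 2700.0]

# Least-squares slope through the origin: the rate that best maps units to revenue.
rate = sum(u * r for u, r in zip(units_sold, revenue)) / sum(u * u for u in units_sold)

# Two hypothetical scenarios for the regressor over a three-period horizon.
scenarios = {
    "flat": [130, 130, 130],
    "growth": [140, 150, 160],
}

# Each scenario for the regressor produces its own forecast of the target.
forecasts = {name: [rate * u for u in path] for name, path in scenarios.items()}
```

The point of the sketch is that the forecast of the target changes only through the scenario chosen for the regressor, which is where the analyst's intuition enters.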

Downstream relations

These are series that are a downstream product of the target time series.

Y = f(x), where Y is your regressor, x is your target, and f is some potentially unknown function.

  • Target: number of units sold, downstream regressor: total revenue. Total revenue is a function of the number of units sold. If we have a better understanding of revenue projections, perhaps because scenario-based forecasting (SBF) has already been done on that variable, we have the ability to “pull” the forecasted number of units sold into place.
  • Target: price of oil, downstream regressor: price of gasoline

These examples are often less intuitive, because an analyst usually has a similar, if not weaker, intuition about the future values of a downstream regressor than about the target itself. Nonetheless, if downstream regressors can be forecasted with a similar process, their effect can be valuable.

Complementary & substitutional relations

These are series that are correlated with the target series. However, the analyst must be careful not to blindly assume that correlation equals causation. See https://www.tylervigen.com/spurious-correlations if you’re wondering what this means.

Z = ax + by, where x is your target, y is your regressor, and a and b are unknown but related coefficients. Z is unknown and not of interest in itself; it is fixed at any given point in time.
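One way to read this relation is to look at how a change in the regressor maps to a change in the target. The short derivation below assumes that Z, a, and b stay the same between the two points being compared, which is our simplifying assumption rather than something the relation guarantees.

```latex
\begin{align*}
Z &= a x + b y \\
0 &= a\,\Delta x + b\,\Delta y \qquad (\text{$Z$, $a$, $b$ held fixed}) \\
\Delta x &= -\frac{b}{a}\,\Delta y \qquad (a \neq 0)
\end{align*}
```

Under that assumption, the sign of -b/a tells us whether the target and regressor move in the same direction or in opposite directions.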

  • Target: number of units sold, complementary regressor: number of complement units sold
  • Target: price of gold, complementary regressor: inflation rate in the US

We expect that in most cases the target and complementary/substitutional regressors do not directly depend on each other. Unlike upstream and downstream regressors, which directly influence the target by pushing from upstream or pulling from downstream, regressors of this kind are related to the target through some kind of proxy, such as supply and demand.

Working with Chronos

Chronos has the ability to add external time series to your precision forecast experiments. These additional series are best used to help the forecasting algorithms fit the target variable data and also to inform the model about a future scenario.

The scenario-creation tool is a powerful interface that lets the user upload their own data. This data is likely to be as current as your target variable, and with the addition of custom linear components the user can then design the shape of the scenario into the future. As you design the scenario, make sure it extends into the future as far as the forecast horizon of your target series.
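The idea of a linear component that carries a regressor out to the forecast horizon can be sketched in a few lines of plain Python. This is not Chronos code or its API; the function `extend_linearly`, the scenario names, and the data are hypothetical and only illustrate the shape of such a scenario.

```python
def extend_linearly(history, horizon, slope_per_step):
    """Continue a series from its last observed value with a constant change per step."""
    last = history[-1]
    return [last + slope_per_step * step for step in range(1, horizon + 1)]

# Made-up regressor history and the forecast horizon of the target series.
observed_regressor = [100.0, 104.0, 103.0, 108.0]
forecast_horizon = 6

# Three hypothetical scenarios that differ only in the slope of the linear component.
scenario = {
    "baseline": extend_linearly(observed_regressor, forecast_horizon, 0.0),
    "ramp_up": extend_linearly(observed_regressor, forecast_horizon, 2.5),
    "ramp_down": extend_linearly(observed_regressor, forecast_horizon, -2.5),
}

# Every scenario must reach as far as the target's forecast horizon.
covers_horizon = all(len(path) == forecast_horizon for path in scenario.values())
```

Checking the length of each scenario against the horizon before forecasting catches the case where a regressor stops short of the periods being forecast.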


* This content was originally published on Nousot.com. Nousot and Lovelytics merged in April 2025.
