Sales Forecast Regression Exercise

INSEAD MBA Core Course UDJ

Exercise

A marketing manager wishes to forecast corporate sales for the coming year and to understand better the factors that influence them. The marketing manager wants the forecast for at least four reasons:

(1) to supply her with estimates needed as her part in the corporate planning activity,

(2) to giver her an idea of the kind of staffing requirements she will have in sales and sales service to handle to company's increased sales,

(3) to help in planning budget allocations for advertising, dealer discounts, and so on, and

(4) to help her make better policy decisions concerning price, advertising, and product development expenditures.

With this initial identification of the problem, the marketing manager sits down with the sales manager and others in her marketing organization to determine the factors that might affect the company's sales. They come up with the following model:

The manager believes that sales depend upon: personal disposable income, dealers' allowances prices, product development expenditures, capital investments, advertising, sales expenses, total industry advertising, and some random effects.

The marketing manager recognizes that some of these factors will have a more important effect than others on the company's sales; others may turn out to be unimportant. Since any one of them, however, may have an important impact, data are gathered on all of them at this early stage in the process.

The Data

This file SALES includes semiannual data, covering the period from 1972 through 1991, on nine variables (eight independent variables and the dependent variable SALES). A list of the nine variables is given below:

Personal Disposable Income in Millions of Dollars. (PDI)
Dealers' Allowances in Thousands of Dollars. (DEALS)
Price in Dollars. (PRICE)
Product Development in Thousands of Dollars. (R&D)
Capital Investments in Thousands of Dollars. (INVEST)
Advertising in Thousands of Dollars. (ADVERTIS)
Sales Expenses in Thousands of Dollars. (EXPENSE)
Total Industry Advertising in Thousands of Dollars. (TOTINDAD)
Company Sales in Thousands of Dollars. (SALES)

Find the most appropriate regression equation for SALES. How do you interpret this regression results you found?

The Analysis

Let's first see some summary statistics.

Summary Statistics
	PDI	DEALS	PRICE	R.D	INVEST	ADVERTIS	EXPENSE	TOTINDAD	SALES
min	268.000	104.000	56.200	7.740	-49.570	12.460	154.050	59.610	4033.130
25%	358.000	120.000	63.133	12.995	9.473	40.025	205.725	136.660	4767.525
median	415.000	130.000	69.075	19.655	60.310	65.790	224.250	242.995	5103.850
mean	406.100	129.125	69.372	19.587	57.508	60.334	232.727	238.441	5078.957
75%	455.250	138.000	75.455	24.907	113.860	83.208	266.335	315.253	5376.128
max	554.000	152.000	82.330	35.740	148.450	96.070	328.250	428.980	6180.060
std	70.765	12.845	7.821	7.880	61.538	26.087	43.924	110.407	513.181

This is a histogram of our dependent variable SALES:

plot of chunk unnamed-chunk-2

How has our dependent variable NA changed over time? Here is an interactive chart of the SALES time series.

Place the mouse on the datapoints on the plot to see daily values, and zoom using click-and-drag with the mouse in the smaller graph below:

This gives us a first rough idea about how the variable evolved.

Let's now see the correlation between all our variables:

	PDI	DEALS	PRICE	R.D	INVEST	ADVERTIS	EXPENSE	TOTINDAD	SALES
PDI	1.00	-0.05198	0.5818	0.1628	0.1066	0.2057	0.9017	-0.05611	0.7301
DEALS	-0.05198	1.00	0.0442	0.008086	-0.09995	-0.1096	-0.0409	-0.1528	0.01248
PRICE	0.5818	0.0442	1.00	0.4371	-0.08938	0.2764	0.6387	-0.2135	0.2949
R.D	0.1628	0.008086	0.4371	1.00	0.1673	0.1269	0.3628	-0.1295	0.04157
INVEST	0.1066	-0.09995	-0.08938	0.1673	1.00	0.238	0.1956	-0.1008	0.4239
ADVERTIS	0.2057	-0.1096	0.2764	0.1269	0.238	1.00	0.144	-0.1986	0.5306
EXPENSE	0.9017	-0.0409	0.6387	0.3628	0.1956	0.144	1.00	-0.04053	0.6675
TOTINDAD	-0.05611	-0.1528	-0.2135	-0.1295	-0.1008	-0.1986	-0.04053	1.00	-0.1802
SALES	0.7301	0.01248	0.2949	0.04157	0.4239	0.5306	0.6675	-0.1802	1.00

and maybe a plot between our dependent variable and, say, the first independent variable PDI:

plot of chunk unnamed-chunk-5

One can generate more of these plots (e.g. using the web applicatin for this case).

Regression Model

Let's try this regression model:

SALES = b1*PDI + b2*DEALS + b3*PRICE + b4*R.D + b5*INVEST + b6*ADVERTIS + b7*EXPENSE + b8*TOTINDAD

Let's see now the regression output for our choice of dependent and independent variables:

Regression Analysis - Dependent variableSALES
	Estimate	Std. Error	t value	Pr(> \|t\|)
(Intercept)	3027.6336	572.3599	5.29	0.0000
PDI	3.3723	1.4137	2.39	0.0234
DEALS	4.6953	3.0103	1.56	0.1290
PRICE	-18.1112	7.4927	-2.42	0.0217
R.D	-9.9033	6.0021	-1.65	0.1090
INVEST	1.6895	0.7157	2.36	0.0247
ADVERTIS	8.2907	1.6445	5.04	0.0000
EXPENSE	4.4434	2.5076	1.77	0.0862
TOTINDAD	-0.4427	0.3646	-1.21	0.2339

Residual standard error: 234.6363 on 31 degrees of freedom
Multiple R-squared: 0.8338 , Adjusted R-squared: 0.791
F-statistic: 19.4449 on 8 and 31
p-value: 4.4365 × 10^-10

Residual Analysis

Let's look now at the residuals. Here are some plots and statistics.

Let's see first the residuals.

Of course we can make various plots of the residuals to “spot” (visually for now) and heteroskedasticity or autocorrelation. For example this is the plot of the residuals against the dependent variable SALES:

plot of chunk unnamed-chunk-9

or the histogram of the residuals:

plot of chunk unnamed-chunk-10