INSEAD MBA Core Course UDJ
A marketing manager wishes to forecast corporate sales for the coming year and to understand better the factors that influence them. The marketing manager wants the forecast for at least four reasons:
(1) to supply her with estimates needed as her part in the corporate planning activity,
(2) to giver her an idea of the kind of staffing requirements she will have in sales and sales service to handle to company's increased sales,
(3) to help in planning budget allocations for advertising, dealer discounts, and so on, and
(4) to help her make better policy decisions concerning price, advertising, and product development expenditures.
With this initial identification of the problem, the marketing manager sits down with the sales manager and others in her marketing organization to determine the factors that might affect the company's sales. They come up with the following model:
The manager believes that sales depend upon: personal disposable income, dealers' allowances prices, product development expenditures, capital investments, advertising, sales expenses, total industry advertising, and some random effects.
The marketing manager recognizes that some of these factors will have a more important effect than others on the company's sales; others may turn out to be unimportant. Since any one of them, however, may have an important impact, data are gathered on all of them at this early stage in the process.
This file SALES includes semiannual data, covering the period from 1972 through 1991, on nine variables (eight independent variables and the dependent variable SALES). A list of the nine variables is given below:
Find the most appropriate regression equation for SALES. How do you interpret this regression results you found?
Let's first see some summary statistics.
PDI | DEALS | PRICE | R.D | INVEST | ADVERTIS | EXPENSE | TOTINDAD | SALES | |
---|---|---|---|---|---|---|---|---|---|
min | 268.000 | 104.000 | 56.200 | 7.740 | -49.570 | 12.460 | 154.050 | 59.610 | 4033.130 |
25% | 358.000 | 120.000 | 63.133 | 12.995 | 9.473 | 40.025 | 205.725 | 136.660 | 4767.525 |
median | 415.000 | 130.000 | 69.075 | 19.655 | 60.310 | 65.790 | 224.250 | 242.995 | 5103.850 |
mean | 406.100 | 129.125 | 69.372 | 19.587 | 57.508 | 60.334 | 232.727 | 238.441 | 5078.957 |
75% | 455.250 | 138.000 | 75.455 | 24.907 | 113.860 | 83.208 | 266.335 | 315.253 | 5376.128 |
max | 554.000 | 152.000 | 82.330 | 35.740 | 148.450 | 96.070 | 328.250 | 428.980 | 6180.060 |
std | 70.765 | 12.845 | 7.821 | 7.880 | 61.538 | 26.087 | 43.924 | 110.407 | 513.181 |
This is a histogram of our dependent variable SALES:
How has our dependent variable NA changed over time? Here is an interactive chart of the SALES time series.
Place the mouse on the datapoints on the plot to see daily values, and zoom using click-and-drag with the mouse in the smaller graph below:
This gives us a first rough idea about how the variable evolved.
Let's now see the correlation between all our variables:
PDI | DEALS | PRICE | R.D | INVEST | ADVERTIS | EXPENSE | TOTINDAD | SALES | |
---|---|---|---|---|---|---|---|---|---|
PDI | 1.00 | -0.05198 | 0.5818 | 0.1628 | 0.1066 | 0.2057 | 0.9017 | -0.05611 | 0.7301 |
DEALS | -0.05198 | 1.00 | 0.0442 | 0.008086 | -0.09995 | -0.1096 | -0.0409 | -0.1528 | 0.01248 |
PRICE | 0.5818 | 0.0442 | 1.00 | 0.4371 | -0.08938 | 0.2764 | 0.6387 | -0.2135 | 0.2949 |
R.D | 0.1628 | 0.008086 | 0.4371 | 1.00 | 0.1673 | 0.1269 | 0.3628 | -0.1295 | 0.04157 |
INVEST | 0.1066 | -0.09995 | -0.08938 | 0.1673 | 1.00 | 0.238 | 0.1956 | -0.1008 | 0.4239 |
ADVERTIS | 0.2057 | -0.1096 | 0.2764 | 0.1269 | 0.238 | 1.00 | 0.144 | -0.1986 | 0.5306 |
EXPENSE | 0.9017 | -0.0409 | 0.6387 | 0.3628 | 0.1956 | 0.144 | 1.00 | -0.04053 | 0.6675 |
TOTINDAD | -0.05611 | -0.1528 | -0.2135 | -0.1295 | -0.1008 | -0.1986 | -0.04053 | 1.00 | -0.1802 |
SALES | 0.7301 | 0.01248 | 0.2949 | 0.04157 | 0.4239 | 0.5306 | 0.6675 | -0.1802 | 1.00 |
and maybe a plot between our dependent variable and, say, the first independent variable PDI:
One can generate more of these plots (e.g. using the web applicatin for this case).
Let's try this regression model:
SALES = b1*PDI + b2*DEALS + b3*PRICE + b4*R.D + b5*INVEST + b6*ADVERTIS + b7*EXPENSE + b8*TOTINDAD
Let's see now the regression output for our choice of dependent and independent variables:
Estimate | Std. Error | t value | Pr(> |t|) | |
---|---|---|---|---|
(Intercept) | 3027.6336 | 572.3599 | 5.29 | 0.0000 |
PDI | 3.3723 | 1.4137 | 2.39 | 0.0234 |
DEALS | 4.6953 | 3.0103 | 1.56 | 0.1290 |
PRICE | -18.1112 | 7.4927 | -2.42 | 0.0217 |
R.D | -9.9033 | 6.0021 | -1.65 | 0.1090 |
INVEST | 1.6895 | 0.7157 | 2.36 | 0.0247 |
ADVERTIS | 8.2907 | 1.6445 | 5.04 | 0.0000 |
EXPENSE | 4.4434 | 2.5076 | 1.77 | 0.0862 |
TOTINDAD | -0.4427 | 0.3646 | -1.21 | 0.2339 |
Residual standard error: 234.6363 on 31 degrees of freedom
Multiple R-squared: 0.8338 , Adjusted R-squared: 0.791
F-statistic: 19.4449 on 8 and 31
p-value: 4.4365 × 10-10
Residual Analysis
Let's look now at the residuals. Here are some plots and statistics.
Let's see first the residuals.
Of course we can make various plots of the residuals to “spot” (visually for now) and heteroskedasticity or autocorrelation. For example this is the plot of the residuals against the dependent variable SALES:
or the histogram of the residuals: