Thursday, September 20, 2007

Using Linear Regression Functions to Classify Customers
By Mauricio Cotes
This article explains a way to classify the customers in three categories based on their sales. This technique uses MDX Linear Regression Functions. The categories are defined based on the following rules:
  • Growing Customers: Customers which have been increasing their sales in the last six months. The sales increment should be greater than 5%.
  • Shrinking Customers: Customers which have been decreasing their sales in the last six months. The sales decrement should be less than 5%.
  • Regular Customers: The remaining customers, it means customers that don’t grow or shrink.

In this article, the samples are using the Adventure Works DW Analysis Services Database. For more information about this sample database see SQL Server Books on Line.

First we explain the basic linear regression concepts used to determine whether a customer is growing or not, and to determine if the growing rate exceeds the 5% threshold for the growing customer or the shrinking rate is less than the 5%. Then there are some basic concepts regarding the Time dimension and finally we apply the concepts to the solution.

Linear Regression Concepts

Given the linear regression equation: y = ax + b

For more information about the use of linear regression functions see Mosha Pasumansky’s article: Using Linear Regression MDX functions for forecasting at http://www.sqljunkies.com/WebLog/mosha/archive/2004/12/21/5689.aspx

Our purpose is to use the sales values for each customer over the lasts six months, obtain the Slope from a linear regression and determine if a given customer is growing or shrinking. Thus, if the slope is greater than zero the customer is growing, and if it is les than zero is shrinking. However, based on the business rules, we need to introduce a 5% tolerance for the growing and shrinking rates.

Considering sales in the current period of time (CurrentSales), and comparing it to sales in the previous period (PreviousSales), we can calculate the variation rate like this:

VariationRate = (CurrentSales / PreviousSales) - 1

Now as shown in the following chart
The linear regression equation gives us a measurement of sales for each period of time. Thus X0 is the first period, X1 the second, and so on. Then we can calculate the VariationRate in terms of the y = ax + b equation like:Because we are using a linear regression, we can suppose that the VariationRate remains the same if we use any pair of contiguous values in the X axis. So we can use X0 and X1 values and assume that X0 = 0 and X1 = 1. Replacing these values in the VariationRate equation we have:
Then
Going back to our table of functions, we can use LineRegSlope and LineRegIntercept to calculate the VariationRate and determine which customers grow more than 5% or shrink less than 5%.
The Time Dimension
The Date.Calendar hierarchy is used to navigate across the Date dimension levels. This hierarchy is important because to determine if a customer is growing we have to evaluate its sales over the last six months from the current period of time. Thus, if we take into account year 2003, we should go to the last period at Month level, which is December 2003, and use a linear regression to evaluate the last six months. The same applies to levels between Year and Month.

The MDX expression to obtain the last period at Month level is:
ClosingPeriod([Date].[Calendar].[Month],
[Date].[Calendar].CurrentMember)
This expression is the starting point, now we need to obtain a set of the last six periods (months in this case):
MEMBER Measures.NumberOfPeriods AS 6

SET [LastPeriodsSet] AS
LastPeriods(Measures.NumberOfPeriods,
ClosingPeriod([Date].[Calendar].[Month],
[Date].[Calendar].CurrentMember)
)
Note that the ClosingPeriod expression is used as an argument for the LastPeriods function, and also note that we are using a measure called NumerOfPeriods to generalize the number of periods to analyze. This will allow us to change it anytime.

The Solution

Now we can use the linear regression functions together with the sales over the date.calendar hierarchy and solve the problem.
Here is the use of the slope function.
MEMBER Measures.Slope AS
LinRegSlope([LastPeriodsSet],
[Measures].[Sales Amount],
Rank([Date].[Calendar].CurrentMember,
[Date].[Calendar].CurrentMember.Level.Members)
)
Notice the use of the LineRegSlope function which uses the LastPeriodsSet calculated before. This set is calculated with the Sales Amount Measure. In a similar manner we use the LineRegIntercept as follows:
MEMBER Measures.Intercept AS
LinRegIntercept([LastPeriodsSet],
[Measures].[Sales Amount],
Rank([Date].[Calendar].CurrentMember,
[Date].[Calendar].CurrentMember.Level.Members)
)
Once we have the main elements we can calculate the VariationRate as follows:

MEMBER Measures.VariationRate AS
Measures.Slope / Measures.Intercept

Finally we are ready to create three sets of customers: Shrinking customers, Growing and Regular. This is done using the slope sign, which is negative for shrinking, positive for growing, and if the variation is not as big as the threshold the customers are considered regular. The following code creates these sets:

MEMBER Measures.Threshold as 0.05

SET [ShrinkingCustomers]
AS FILTER([Customer].[Customer].[All Customers].CHILDREN,
Measures.Slope < 0
AND Measures.VariationRate < - Measures.Threshold)

SET [GrowingCustomers]
AS FILTER([Customer].[Customer].[All Customers].CHILDREN,
Measures.Slope > 0
AND Measures.VariationRate > Measures.Threshold)

SET [RegularCustomers]
AS EXCEPT([Customer].[Customer].[All Customers].CHILDREN,
[ShrinkingCustomers] + [GrowingCustomers])
To test the solution for each set, use the following code
SELECT
[Measures].[Sales Amount] ON COLUMNS,
{[ShrinkingCustomers]} ON ROWS
FROM [Adventure Works]
The Rows axis specifies the Set used in each case. Instead of using ShrinkingCustomers, the other sets can also be used.
--------------------------
WITH

MEMBER Measures.NumberOfPeriods AS 6

SET [LastPeriodsSet] AS
LastPeriods(Measures.NumberOfPeriods,
ClosingPeriod([Date].[Calendar].[Month],
[Date].[Calendar].CurrentMember)
)

MEMBER Measures.Slope AS
LinRegSlope([LastPeriodsSet],
[Measures].[Sales Amount],
Rank([Date].[Calendar].CurrentMember, [Date].[Calendar].CurrentMember.Level.Members)
)

MEMBER Measures.Intercept AS
LinRegIntercept([LastPeriodsSet],
[Measures].[Sales Amount],
Rank([Date].[Calendar].CurrentMember, [Date].[Calendar].CurrentMember.Level.Members)
)

MEMBER Measures.Threshold as 0.0005

MEMBER Measures.VariationRate AS
Measures.Slope / Measures.Intercept

SET [ShrinkingCustomers]
AS FILTER([Customer].[Customer].[All Customers].CHILDREN,
Measures.Slope < 0
AND Measures.VariationRate < - Measures.Threshold)

SET [GrowingCustomers]
AS FILTER([Customer].[Customer].[All Customers].CHILDREN,
Measures.Slope > 0
AND Measures.VariationRate > Measures.Threshold)

SET [RegularCustomers]
AS EXCEPT([Customer].[Customer].[All Customers].CHILDREN,
[ShrinkingCustomers] + [GrowingCustomers])

SELECT
{[Measures].[Sales Amount],
Measures.Slope,
Measures.VariationRate} ON COLUMNS,
{[GrowingCustomers]} ON ROWS
FROM [Adventure Works]