Analysis
The problem with parametric studies
15/10/09 07:13
A few days ago I was in a meeting reviewing another team’s fairly complex piece of analysis. It was a fairly obvious question, but someone asked ‘how do you know if your result is correct?’ It’s probably the most fundamental question you can ever ask when analysing a problem, and in this case the answer was a common one: ‘we undertook a lot of parametric studies’.
There is a major issue with this answer. Just because you have done parametric studies with your model, it doesn’t mean that any of the answers the model is giving are correct. Parametric studies explore how the outputs respond to the inputs you chose to vary; assuming that they also cover the variability and inaccuracy in the model itself is not enough.
Let’s consider someone working in marketing at Cuke Cola who is trying to work out how much to spend on advertising. To do this they build themselves a nice little spreadsheet model. The basic assumption in the model is that for every dollar spent on advertising, each person who sees the advert spends an extra 0.01 cents on Cuke Cola on average. So, provided 10,000 people see the advert, we get back all the money spent on advertising (10,000 x 0.01 cents = $1 for each dollar spent). That all makes sense so far: we break even at 10,000 viewers and make a return on the advertising with any more than that.
We can even do parametric studies on this model, looking at how the break-even point varies as the income from each person who sees the advert varies. The problem is that the model is massively flawed. If we assume that 20,000 people will see the advert and we spend 100 dollars on advertising, we will get 100 x 0.01 cents x 20,000 = $200. The net profit will therefore be $100. But if we double the amount we spend on advertising, the net profit also doubles, and so on. So this model predicts that we should spend every penny we have on advertising, because we will always double our money. This is clearly flawed for many reasons, but fundamentally, however good our advertising is, there is only so much Cuke Cola that a person can drink in a day. The model needs some sort of law of diminishing returns built in, and no matter how many parametric studies we do, its absence will not show up in the results.
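The failure can be reproduced in a few lines. The sketch below is illustrative only: `linear_profit` follows the model as described above, while `capped_profit` and its `max_extra_per_viewer` cap of 5 cents are hypothetical stand-ins for a diminishing-returns term, not anything taken from a real spreadsheet.

```python
def linear_profit(spend, viewers, uplift_per_dollar=0.0001):
    """Net profit in dollars under the linear model.

    uplift_per_dollar is the extra spend per viewer, in dollars, for each
    advertising dollar (0.01 cents = $0.0001).
    """
    return spend * uplift_per_dollar * viewers - spend


def capped_profit(spend, viewers, uplift_per_dollar=0.0001,
                  max_extra_per_viewer=0.05):
    """Hypothetical variant: extra spend per viewer cannot exceed a cap."""
    extra_per_viewer = min(spend * uplift_per_dollar, max_extra_per_viewer)
    return extra_per_viewer * viewers - spend


viewers = 20_000
# Parametric study: vary the uplift, then vary the advertising spend.
for uplift in (0.00008, 0.0001, 0.00012):
    for spend in (100, 1_000, 10_000):
        print(uplift, spend,
              round(linear_profit(spend, viewers, uplift), 2),
              round(capped_profit(spend, viewers, uplift), 2))
```

Whatever value of the uplift is tried, the linear profit simply scales with the spend, so the parametric study never reveals the missing saturation. Only changing the structure of the model does.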
It is not just the inputs to our models, but the models themselves that can be flawed, no matter how much time we spend creating them and how sophisticated they are. Simple use of parametric studies never guarantees that you are getting good results from your model.
Monte Carlo analysis in Excel
30/11/08 19:48
Monte-Carlo analysis is one of those tools that I have always found useful to keep in my pocket as an engineer. I like to find an elegant solution to problems, limiting the number of variables and the amount of analysis that has to be done. Sometimes, however, a brute force approach cannot be avoided, and Monte-Carlo analysis is a very effective way of solving a complex risk or probability problem.
Because I don't use it much, specialist tools such as @risk are not really an option for me. The basic version of Excel is, however, more than capable of performing Monte-Carlo analysis with relative simplicity.
The first stage is to build your model with the usual input and output parameters, using the same format as you normally would. Once this is done, create a new worksheet for the input parameters that you want to use in the Monte-Carlo analysis. To generate the random numbers for the analysis you will need a RAND() function for each input parameter you want to consider. You can then use Excel's built-in distribution functions to convert each random number into a value for that input, or you can create your own - I will consider some of the different distribution functions in later posts.
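For readers who think more easily in code than in worksheet formulas, here is a minimal sketch of the same idea outside Excel: a uniform random number is passed through an inverse distribution function, much as a worksheet formula such as NORMINV(RAND(), mean, sd) would do. The parameter names `load` and `strength` and their distributions are hypothetical, chosen only for illustration.

```python
import random
from statistics import NormalDist


def uniform_draw(rng):
    """Return a uniform random number strictly between 0 and 1."""
    u = rng.random()
    while u == 0.0:  # the inverse CDF is undefined at exactly 0
        u = rng.random()
    return u


def sample_normal(rng, mean, std_dev):
    """Map a uniform draw to a normal value via the inverse CDF."""
    return NormalDist(mean, std_dev).inv_cdf(uniform_draw(rng))


def sample_uniform_range(rng, low, high):
    """Map a uniform draw linearly onto the range [low, high)."""
    return low + (high - low) * rng.random()


rng = random.Random()
# Hypothetical input parameters for one Monte-Carlo step.
load = sample_normal(rng, mean=50.0, std_dev=5.0)
strength = sample_uniform_range(rng, low=80.0, high=120.0)
print(load, strength)
```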
If you have done this correctly, the values on this worksheet should now change every time you amend your spreadsheet. Now link the input values on your model worksheet to the values on the inputs worksheet. The entire model should now change each time any value on the spreadsheet is adjusted. At this point it is worth taking time to check how your model is performing. Just pressing the delete key on an empty cell should run one Monte-Carlo step. It is worth doing this a few times, because Monte-Carlo analyses often test the limits of a spreadsheet with combinations of very high and very low values.
It is possible to run a Monte-Carlo analysis manually with the spreadsheet in this form. The real power of this solution, however, comes when it is automated, and to do that we can make use of a macro.
First create a new worksheet to hold the outputs from the model. Now create a new macro. Typical code for the macro is something like:
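Sub MonteCarlo()
    Dim Counter As Long
    Dim Output1 As Variant, Output2 As Variant

    ' Stop Excel recalculating on every cell write while the loop runs
    Application.Calculation = xlCalculationManual

    For Counter = 1 To 500
        ' Recalculate once: each RAND() draws a new value (one Monte-Carlo step)
        Application.Calculate

        ' Read the two model outputs for this step
        Output1 = Worksheets("Model").Range("L3").Value
        Output2 = Worksheets("Model").Range("H17").Value

        ' Store the step number and both outputs in columns 2 to 4 of Results
        With Worksheets("Results")
            .Cells(Counter, 2).Value = Counter
            .Cells(Counter, 3).Value = Output1
            .Cells(Counter, 4).Value = Output2
        End With
    Next Counter

    ' Restore automatic recalculation
    Application.Calculation = xlCalculationAutomatic
End Sub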
This macro runs the Monte-Carlo analysis 500 times. At each step the results are taken from cells L3 and H17 of the worksheet Model and written to columns 3 and 4 of the worksheet Results, with the step number in column 2.
Post-processing of the output data should be done in a separate workbook. If it is not, re-running the Monte-Carlo analysis will take much longer, because the post-processing will be recalculated at every step of the analysis.
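The overall shape of the analysis (draw inputs, run the model, record the outputs, and only then summarise) can also be sketched outside Excel. This is a rough analogue and not a translation of the macro: the `model` function, its two outputs and the input distributions are all hypothetical.

```python
import random
from statistics import mean, quantiles


def model(load, strength):
    """Hypothetical stand-in for the spreadsheet model: returns two outputs."""
    margin = strength - load
    utilisation = load / strength
    return margin, utilisation


def run_monte_carlo(steps=500, rng=None):
    """Run the model `steps` times, returning rows of (step, output1, output2)."""
    if rng is None:
        rng = random.Random()
    results = []
    for step in range(1, steps + 1):
        # Fresh random inputs for each step, like a worksheet recalculation.
        load = rng.gauss(50.0, 5.0)
        strength = rng.uniform(80.0, 120.0)
        output1, output2 = model(load, strength)
        results.append((step, output1, output2))
    return results


results = run_monte_carlo()
# Post-processing happens once, after the loop, not at every step.
margins = [row[1] for row in results]
print("mean margin:", mean(margins))
print("5th percentile margin:", quantiles(margins, n=20)[0])
```

Keeping the summary statistics outside the loop mirrors the advice above: the expensive post-processing is done once on the stored results, not repeated at every step.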
Seeding random numbers
22/11/08 13:52
Statistical modelling is part of the standard engineer's tool set. The desktop computer makes running scenarios with multiple inputs an easy task for all engineers. Most of these tools use random numbers to generate varying outputs. The mechanics of generating random numbers is a mathematical art in itself, but there are some key features that need to be understood to use statistical methods effectively.
One of these key features is the seed. Random numbers are generated using algorithms that take an input value and generate a sequence of numbers that are apparently random. The input value is called the seed, and the same seed always produces the same sequence. Most programs use the time when the first random number is generated as the seed; this ensures that a different sequence is generated each time a program is run.
Overriding the use of the time as the seed does, however, have a number of benefits:
● It allows scenarios to be re-run, which is particularly important where scenarios are used to generate the design in a safety-critical situation.
● It allows scenarios to be partially re-run. Using the same input parameters and the same seed, the scenario starts in exactly the same way; re-setting the seed part way through then changes the random number sequence from that point on. This allows a bifurcation in the model, with a different outcome from part way through the run.
● It gives repeatability to models, a key component of any analysis that is quality audited. If you are having your analysis checked and it doesn't have built-in auditing features such as date-stamped reports, your checker must be capable of recreating results identical to yours. The only way of doing this in a statistical model such as UDEC is to set the seed of the model prior to the runs so that the checker can recreate exactly the same model.
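The first two benefits are easy to demonstrate. The following is a minimal sketch; the function names and the seed values 1234 and 99 are arbitrary choices for illustration.

```python
import random


def run_scenario(seed, steps=5):
    """Return a sequence of pseudo-random draws for a given seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(steps)]


def run_with_branch(seed, branch_seed, branch_at=3, steps=5):
    """Follow the seeded run up to `branch_at`, then reseed to branch."""
    rng = random.Random(seed)
    draws = [rng.random() for _ in range(branch_at)]
    rng.seed(branch_seed)  # reset the sequence part way through
    draws += [rng.random() for _ in range(steps - branch_at)]
    return draws


original = run_scenario(seed=1234)
repeat = run_scenario(seed=1234)  # same seed, so identical draws
branched = run_with_branch(seed=1234, branch_seed=99)

print(original == repeat)            # full re-run reproduces the scenario
print(branched[:3] == original[:3])  # partial re-run starts the same way
print(branched[3:] == original[3:])  # and then follows a different sequence
```

The first two comparisons hold by construction. The third shows the bifurcation: after the reseed the draws come from a different sequence.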
So with all that in mind, here are a few ways of setting random number seeds in your models:
Excel
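With Excel the easiest way to generate random numbers with a seed is to use the Random Number Generation tool in the Analysis ToolPak, which has a field for the seed. I usually find that when I'm working with Excel and random numbers I'm having to use Visual Basic. For me, that means setting the seed programmatically using the Randomize statement. To repeat a sequence, call Rnd with a negative argument immediately before calling Randomize with a numeric seed.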
Apple Numbers 1.0
Bad news here: I don't know how to set the seed in Numbers. It doesn't really surprise me, given the lack of scripting associated with Numbers. I remain hopeful of many improvements in the next version of Numbers, and this is definitely one I would like to see.
Cocoa
In Cocoa just use the srandom() function to seed the random() generator. There are plenty of examples of this out there.
AppleScript
In AppleScript, just set the seed when you first generate a random number, using set firstRandom to random number from 1 to 100 with seed seedValue. I've been using this with no problems in an application that I am currently writing in AppleScript Studio. One thing to be aware of, though, is the some item of construct. It doesn't seem to respond to seeding so well, so taking a random value from a list using a directly generated random number as the index seems to work better.
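The workaround of drawing the index yourself is not specific to AppleScript. As a rough illustration in another language, the sketch below picks list items using an index taken directly from a seeded generator; the seed value and the list contents are hypothetical.

```python
import random


def pick_item(items, rng):
    """Pick an item by drawing the index explicitly from the seeded generator."""
    index = rng.randrange(len(items))
    return items[index]


rng = random.Random(2008)  # hypothetical seed value
colours = ["red", "green", "blue", "yellow"]
print([pick_item(colours, rng) for _ in range(5)])
```

Because every selection goes through the same seeded generator, re-running with the same seed gives the same sequence of picks.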