How are you using the Essbase Outline Extractor?

The venerable Tim Tow and his crew have recently released a next-generation version of the Essbase Outline Extractor. As many of you know, this is a tool that has been around for years and is used to ransack a Hyperion cube's outline and generate a text file. I have talked to people for years who absolutely swear by this tool and attest to it saving their lives.

And yet, I've never used this tool in a professional context. I mean, I've used the tool to see that it works and otherwise experimented with it, but I've never had an occasion where I had to use it to unlock some of the outline data on my Hyperion servers.

Generally speaking, when I have been in situations where I needed the data that was in an Essbase outline, there has been some upstream system that had the data I really wanted and could be used to build the dimension I needed. So from where I sit (which is apparently an architectural ivory tower, but I digress), the use case for the outline extractor is that the metadata is trapped in the outline itself.

Business metadata being trapped in the outline isn’t inherently bad. However, I suspect that the propensity to manage the system this way has a strong correlation to the slow and steady migration of Hyperion system management from the laissez-faire finance department to the fortress of the IT department. In other words, finance users had the Hyperion server at their knees and could manage it by shooting from the hip, wild west style – meanwhile, IT wants forms filled out in triplicate in order to dole out EAS access. 

A question for my Hyperion readers and enthusiasts

That all being said, I am curious to hear about your real-world, practical usage of the tool, and about the context you are using it in, such as:

  • Is it part of the normal automation process?
  • Did you use it to import the data into some other system that would then be used to update the outline (EIS, Studio, etc.)?
  • Did you just need the data for something else?
  • Was ODI available? If it was, why didn't you use it?

I don't need any particulars unless you care to and are able to share them. I am just very curious about the context in which you use or have used this tool. And for the record, there's nothing wrong or indicative of a bad environment about having used this tool; it's just that I haven't personally had a reason to, owing to having other options available. Please leave comments or email me, thanks!

Some performance observations from changing an ASO hierarchy from Dynamic to Stored

There are numerous ASO cubes among my flock.  Usually the choice to use ASO was not arrived at lightly — it was/is for very specific technical reasons.  Typically, the main reason I have for using ASO is to get the fast data loads and the ability to load oodles of noodles… I mean data.  Yes, oodles of data.  The downsides (with version 7.x) are that I'm giving up calc scripts, incremental loading (although this has been somewhat addressed in later versions), native Dynamic Time Series, and some flexibility with my hierarchies, and I have to have just one database per application (you… uhh… were doing that already, right?).  Also, due to the sparsity of much of the data, trying to use BSO would result in a very unwieldy cube in this particular instance.

I have a set of four cubes that are all very similar, except for different Measures dimensions.  The Measures dimensions range from 10,000 to 40,000 members.  This isn't huge, but in conjunction with the sizes of the other dimensions, there is an incredible "maximum possible blocks" potential (sidenote for EAS: one of the most worthless pieces of information to know about your cube.  Really, why?).  The performance of these cubes is generally pretty acceptable (considering the amount of data), but occasionally user Excel queries (especially those involving attribute dimensions) really pound the server and take a while to come back.  So I started looking into ways to squeeze out a little more performance.

Due to the nature of the aggregations in the cubes, they all have Dynamic hierarchies in the Accounts/Measures dimension.  This is due to using the minus (-) operator, some label-only stuff, and shared members, all of which ASO is very particular about, especially in this version.  All of the other dimensions have Stored hierarchies or are set to Multiple Hierarchies Enabled (such as the MDX calcs in the Time dimension to get me some Year-To-Date members).
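To make that concrete, here is roughly what one of those Year-To-Date members might look like as an ASO MDX member formula.  The member and month names are hypothetical, and the formula has to live in a dynamic part of the dimension, since members in stored hierarchies can't carry formulas in ASO:

    /* hypothetical [YTD Mar] member formula in the Time dimension */
    [Jan] + [Feb] + [Mar]

    /* or, equivalently, summing the member range */
    Sum ({[Jan]:[Mar]})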

Actually, it turns out that all but one of these cubes have Measures dimensions that make it prohibitively difficult to set Measures to Stored instead of Dynamic.  So, even though I would need to spin off a separate EIS metaoutline in order to build the dimension differently (these cubes are all generated from the same metaoutline but with different filters), it might be worth it if I can get some better performance on retrieves to this one cube — particularly when the queries start to put some of the attribute dimensions or other MDX calcs into play.

What I needed was some method for testing the performance of some typical retrieves against the two variants of the cube.  I set up one cube as normal, loaded it up with data, and materialized a gig worth of aggregations.  Prior to this I had also copied the cube within EAS, made the tweak to Measures to change it from Dynamic to Stored, loaded the same data, and materialized a gig of aggregations there as well.  At this point I had two cubes with identical data, one with a Dynamic Measures hierarchy (10,000 or so members) and one with a Stored hierarchy.  Time to compare.
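For the record, the setup boiled down to something like the following MaxL.  All of the names here are hypothetical, the Dynamic-to-Stored change itself was made in the outline via EAS rather than in MaxL, and the total_size stopping value is a ratio of the pre-aggregation size, so the figure that lands you near a gig of views will vary from cube to cube:

    login admin password on essbserver;

    /* alternative to copying the cube within EAS */
    create database Sales.MeasStor as Sales.MeasDyn;

    /* load the same data to the copy */
    import database Sales.MeasStor data
        from data_file 'sales.txt'
        using server rules_file 'ldsales'
        on error write to 'sales_errors.txt';

    /* materialize aggregate views until the size threshold is hit */
    execute aggregate process on database Sales.MeasStor
        stopping when total_size exceeds 1.5;

    logout;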

I cooked up some report scripts, MaxL scripts, and some batch files.  The batch file reads a configuration file that specifies which database to hit and which report to run.  It then runs the report against the database, records a timestamp before and after the run, and dumps it all to a text file.  It's not an exact science, but in theory it'll give me somewhat of an idea as to whether making the hierarchy Stored is going to help my users' retrieval operations.  A simplified sketch of the harness follows, and after that, without further ado, the results.
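The batch side looks roughly like this; the file names, database names, and log format are placeholders rather than the actual scripts:

    @echo off
    setlocal enabledelayedexpansion
    rem config.txt holds one test per line:  <app.db>  <report script name>
    echo Starting new process at %DATE% %TIME% >> timings.txt
    for /f "tokens=1,2" %%A in (config.txt) do (
        echo start-%%B-%%A !TIME! >> timings.txt
        essmsh run_report.msh %%A %%B
        echo finish-%%B-%%A !TIME! >> timings.txt
    )

The run_report.msh piece just logs in and runs the given report script against the given database, something like:

    /* run_report.msh -- $1 is the app.db, $2 is the server report script name */
    login admin password on essbserver;
    export database $1 using server report_file $2 to data_file 'report_output.txt';
    logout;

From there it is just a matter of diffing consecutive timestamps to get the durations.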

Starting new process at Tue 01/20/2009 10:11:08.35

Report                Start     Finish    Duration   Winner
report_01 (Dynamic)   11:10.0   11:13.4   00:03.4
report_01 (Stored)    11:14.6   11:21.4   00:06.8    Dynamic
report_02 (Dynamic)   11:22.6   11:51.9   00:29.3
report_02 (Stored)    11:53.0   12:00.0   00:07.0    Stored
report_03 (Dynamic)   12:01.3   12:02.2   00:00.9
report_03 (Stored)    12:03.9   12:42.1   00:38.2    Dynamic
report_04 (Dynamic)   12:43.6   12:50.2   00:06.6
report_04 (Stored)    12:51.3   14:26.4   01:35.1    Dynamic
report_05 (Dynamic)   14:36.3   15:18.3   00:42.0
report_05 (Stored)    15:19.6   17:32.0   02:12.4    Dynamic

Starting new process at Tue 01/20/2009 10:30:55.65

Report                Start     Finish    Duration   Winner
report_01 (Dynamic)   30:57.5   30:59.9   00:02.4
report_01 (Stored)    31:01.0   31:05.8   00:04.7    Dynamic
report_02 (Dynamic)   31:07.7   31:40.8   00:33.1
report_02 (Stored)    31:42.5   31:46.1   00:03.5    Stored
report_03 (Dynamic)   31:50.4   31:51.0   00:00.6
report_03 (Stored)    31:52.4   31:52.8   00:00.3    Tie
report_04 (Dynamic)   31:54.0   32:06.3   00:12.3
report_04 (Stored)    32:12.1   32:51.4   00:39.3    Dynamic
report_05 (Dynamic)   32:55.5   33:38.1   00:42.6
report_05 (Stored)    33:39.7   36:42.5   03:02.8    Dynamic

So, interestingly enough, the Dynamic hierarchy comes out on top, at least for most of the tests I wrote.  There is one test, though (report_02), where the Stored hierarchy completely smokes the Dynamic one.  I wrote these report scripts somewhat randomly, so I definitely need to do some more testing, but in the meantime I feel better about sticking with a Dynamic hierarchy.  Since the ASO aggregation method for these cubes is simply to process aggregations until the database size is a certain multiple of its original size, one of the next steps I could look at for query optimization would be to enable query tracking, seed the query statistics by running some reports, and then use those stats to design the aggregations.  In any case, I'm glad I am looking at some actual data rather than just blindly implementing a change and hoping for the best.
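For reference, the query-tracking route would look something like this in MaxL (the database name is hypothetical, and I have not actually run this against these particular cubes yet):

    /* start collecting query statistics */
    alter database Sales.MeasDyn enable query_tracking;

    /* ...run a representative batch of reports / Excel retrieves here... */

    /* drop the existing views and rebuild them from the tracked queries */
    alter database Sales.MeasDyn clear aggregates;
    execute aggregate process on database Sales.MeasDyn
        stopping when total_size exceeds 1.5
        based on query_data;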

This isn't to say that Dynamic is necessarily better than Stored or vice versa; however, I ran this very limited set of tests numerous times and got essentially the same results.  At the very least, this goes to show that there isn't really a silver bullet for optimization and that experimentation is always a good way to go (except on your production servers, of course).  I am curious, however, to go back and look at report_02 to see what it is about that particular report that is apparently so conducive to Stored hierarchies.

Optimizing Essbase Automation Jobs (for fun and profit!) – Part 1

I freely submit that I am a complete geek.  But it's okay, because I have come to accept and embrace that inner geek.  Anyway… many organizations run jobs to perform certain tasks at certain intervals.  Your job as a good Essbase/Hyperion administrator is to be lazy — that is, make the computers do the boring stuff while you get to do the fun stuff, like thinking.  What we end up doing is setting up batch and script files that do, in a particular sequence, all of the things we might otherwise do by hand.  So instead of logging in, copying a data file, clearing a cube, running a calc, loading some data, running another calc, and all that other stuff, we do this automatically.
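In MaxL terms, the hand-work being replaced is usually nothing more exotic than a sequence like the one below (the app, database, file, and calc names are all made up for illustration):

    login admin password on essbserver;
    spool on to 'nightly_load.log';

    /* clear, load, calc -- the same steps you would click through by hand */
    alter database Finance.Plan reset data;
    import database Finance.Plan data
        from data_file 'weekly_sales.txt'
        using server rules_file 'ldsales'
        on error write to 'load_errors.txt';
    execute calculation Finance.Plan.aggall;

    spool off;
    logout;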

After a while, you get a feel for how long a certain job takes.  The job that loads last week's sales might take about an hour to run.  The job to completely restate the database for the entire fiscal year might take six hours to run.  This probably also becomes painfully obvious when it turns out that the job didn't run correctly and now you need to run it again (and keep those users happy).  I am all about jobs running as fast as they can.  This is useful on a few different levels.  First and probably foremost, the job finishes sooner, which is nice, especially for those times when the sooner you have data, the better.  It's also nice when you need to re-run things.  It can also open up some new possibilities to use the tool in ways you didn't think or know you could — such as getting live updates of data during the business week.

Typically there are a few big time-killers in an automated process.  These are the data loads, calculations, and reports that are run.  There are also some others, such as calling out to a SQL server to do some stuff, restructuring outlines, pulling tons of data off the WAN, and all that.  You can always go through your database logs to see how long a particular calculation took, but this is a bit tedious.  Besides, if you're like me, then you want to look in EVERY single nook and cranny for places to save a little time.  To borrow a saying from the race car world, instead of finding one place to shave off 100 pounds, we can find 100 places to shave off one pound.  In practice, of course, there are some places you can't cut anything from, but some of them you can cut down quite a bit.

The first, and probably most important, step is to simply understand where you are spending your processing time.  This gives you a much better window into the job than the handful of durations you can dig out of the normal Essbase logs.  There are numerous ways you can do this, but if you know me at all by now, you know that I like to do things on the cheap and make them dead simple.  That being said, the following posts will be based around Windows batch file automation, but the concepts are easily portable to any other platform.
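As a preview of what I mean, here is a stripped-down sketch of the idea; the step names, script names, and output format are placeholders, and the real mechanism gets its own post:

    @echo off
    setlocal
    set PROFILE=profile.txt

    rem append "step,date,time" before each chunk of work; durations are
    rem worked out later (in Excel) by diffing consecutive timestamps
    call :mark step01_clear_db
    essmsh clear_db.msh

    call :mark step02_load_data
    essmsh load_data.msh

    call :mark step03_run_calcs
    essmsh run_calcs.msh

    call :mark done
    goto :eof

    :mark
    echo %~1,%DATE%,%TIME% >> %PROFILE%
    goto :eof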

Now take a look at the graph below:

[Chart: Total Seconds to Run Essbase Automation Job]

What you're looking at is a graph created in Excel, based on data in a text file that was generated by a profiling process.  The steps are as granular as we want them to be.  In other words, if we simply want to clump together a bunch of jobs that all run really fast, we can do that.  Similarly, if we want to break down a single job that seems to take a relatively long amount of time, we can do that too.  Each segment in the stacked bar chart is based on the number of seconds that the particular step took.  We can see that in the original (pre-optimization) scenario, the orangey-brown step takes up a pretty considerable amount of the overall processing time.  After some tweaking, we were able to dramatically slash the amount of time it took to run the whole job.  And, in the third bar, after yet more tweaking, we were able to cut the overall processing time down quite a bit again.

The important thing here is that we are now starting to build an understanding of exactly where we are spending our time — and we can therefore prioritize where we want to attempt to reduce that time.  My next post will detail a sample mechanism for getting this kind of raw data into a file, and the posts after that will start to dig into where and how time was shaved off due to better outline, system, and other settings and changes.  Happy profiling!