Speed up ASO SQL data loads by using multiple rules files

Just another quick post today about possibly speeding up data loads to an ASO database when loading from SQL. I got on a quick call with a former colleague who was looking to gain a little more performance on their load process for a massive ASO database, and the first thing that jumped out at me was that I recalled you can do parallel loads with some native MaxL syntax.

Here’s a quick example of the syntax:


import database $APPLICATION.$DATABASE data
connect as $SQL_USER identified by $SQL_PW
using multiple rules_file $RULE1, $RULE2, $RULE3, $RULE4, $RULE5
to load_buffer_block starting with buffer_id 100 on error write to "errors.txt";

Basically, you provide multiple rules files (configured for your SQL datasource of course). The rules files are likely to be the same as each other but I suppose it’s possible you might want to partition the data in some logical way to try and speed things up even more.

For example, let’s say that in the code above, we are loading five years of data from a relational database. We might then scope each rules file to one particular year by doing the following things (a rough sketch of the SQL follows the list):

  • Set the year in the data header
  • Remove that column from the list of SELECT columns
  • Put a filter/predicate in the WHERE clause on the query
  • Bonus points for using substitution variables in both the header definition and the where clause
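
As a rough sketch, the SQL behind the first of those rules files might look something like the following (the table and column names are made up, and &LoadYear1 is a hypothetical substitution variable holding the first year; the year itself would be assigned in that rule’s data load header rather than selected as a column):

-- the year comes from the header definition in the rules file, so it is not selected here;
-- the predicate restricts this rule to a single year's slice of the table
SELECT location, account, period, amount
FROM fact_gl
WHERE fiscal_year = '&LoadYear1'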

Performance in this particular use case went up substantially. It’s my understanding that data loads that were taking an hour are now cut down to 17 minutes. Your mileage may vary, of course.

Let’s Not Forget About Hybrid BSO

That said, I think this can be an effective strategy for squeezing performance out of ASO cubes that need a smaller load window when you don’t want to go changing a lot of the internals in play. If you’re doing new development, then I strongly, strongly recommend using hybrid BSO (or rather, BSO, and making sure the cube is configured properly so as to get the hybrid BSO performance benefits). I have been seeing hybrid BSO cubes absolutely killing it in performance, thanks to their ability to leverage ASO technology for aggregations and the massive calculation improvements that come from the smaller block sizes and indexes you get when so many members are dynamic calc. Plus, you of course get all of the classic/rich/awesome BSO functionality out of the box, like dynamic time series, expense tagging, time balance, and more. These were never very strong areas for ASO and often required a lot of non-optimal workarounds to make users happy.

Performance nuances with MaxL data imports: local vs. server

Some time ago, I reviewed and revamped the MaxL automation for a client. One of the major performance gains I got was actually pretty simple to implement but resulted in a huge performance improvement.

Did you know that the MaxL import data command can be told whether the file to load is a local data file or a server data file? Check out the MaxL reference here for a quick refresher. See that bold “local” after from? That’s the default, meaning if we omit the keyword altogether, then the MaxL interpreter just assumes it’s a local file.

Imagine that you have an Essbase server, and then a separate machine with the MaxL interpreter. This could be your local workstation or a dedicated automation server. Let’s say that there is a text file on your workstation at C:/Essbase/data.txt. You would craft a MaxL import command to import the local data file named C:/Essbase/data.txt. That’s because the file is local to the MaxL interpreter.
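
In MaxL terms, that local load looks something like this (Sample.Basic, the rules file name, and the error file are just placeholders):

import database Sample.Basic data
     from local text data_file 'C:/Essbase/data.txt'
     using server rules_file 'DataLoad'
     on error write to 'errors.txt';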

Now imagine that the file we want to load is actually on the server itself and we have a drive mapped (such as the Y: drive) from our workstation to the server. We can still import the data file as a local file, but this time it’s Y:/data.txt (Assume that the drive is mapped directly to the folder containing the file).

In this scenario, MaxL reads the file over the network from the server to the client, then uploads that data back to the server. This data flow is represented on the left side of this diagram:

MaxL data loads: server vs. local

You might be thinking, “But wait, the file is on the server, shouldn’t it be faster?” Well, no. But there’s hope. Now consider server file loading. In this case we use the server keyword on the import statement and specify the name of the file to load. Note that the file location is based on the database being loaded to. If you’re loading to Sample Basic, then Essbase will look in the ../app/Sample/Basic folder for the file. If you don’t want to put files in the database folder, you can actually cheat a little bit and specify a path such as ..\..\data.txt and load the file from a relative path. In this case, by specifying the ..\..\, Essbase will go up two folders (to the \app folder) and look for the file there. You can fudge the paths a little, but the key is this: Essbase reads the file from its own file system, so the MaxL client never incurs the performance penalty of two full trips of the data. This is depicted on the right side of the diagram: the MaxL client issues a data load command to the server, which then reads the file directly, and we don’t pay for shipping the data across the network and back.
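
Switching that same load over to a server-side file is mostly a matter of the server keyword and a file name the Essbase server can resolve relative to the database folder (the names below are placeholders again, and the relative-path trick described above works in the file name as well):

import database Sample.Basic data
     from server text data_file 'data.txt'
     using server rules_file 'DataLoad'
     on error write to 'errors.txt';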

In my case the automation had been written to load a file that was already on the server (in the \app folder), so I just changed the import to be a server-style import and immediately cut the data import time dramatically.

I wouldn’t be surprised if this “anti-pattern” is being used in other places – so take a look at your automation. Let me know if you find this in your environment and are able to get a performance boost!

 

Beta testers wanted for Hyperion Unused Member Analyzer tool

I have been working on a tool called HUMA – Hyperion Unused Member Analyzer. The idea for it came out of some side discussions at Kscope a couple of months ago. The idea is simple: Wouldn’t it be nice if there was an easy way to determine if any members are unused in a given cube?

Given a server, database, and credentials, HUMA will connect to a cube, analyze its stored members, generate the list of member combinations to check, then iterate over it, analyzing the resulting data grids for the presence of data. If there are members with no data in them, they are shown to the user running the program. To increase performance, HUMA orders the grids and the sequences of members within the sub-grids so that they are aligned to the dense/sparse structure of the cube, so that it can pound on the same hot blocks before moving on to grids with different sparse permutations.

On a pretty gutless VM of mine with Essbase running in 1GB of RAM, a standard Sample/Basic cube can be ransacked for data in about three seconds. Also, given the way the tool works, it’s not necessary to do a full export of a cube or anything since the analysis is based on the data that is queried and immediately discarded. So far it seems to work pretty well.

The goal is for HUMA to be another tool in the toolbox for Hyperion/Essbase admins who want to analyze their environment and act on possible improvements. This goes hand in hand with my research and efforts to find out what we all do when we dive into a new system as part of a health check hit list. Removing unused members from a BSO database can yield improvements (particularly in the dense dimensions).

In any case, version 1.0 of the tool is basically ready to go and I’d love to have a few people test it out and let me know of any issues!

Hyperion Health Check Hit List

I am asking for your Hyperion wisdom again, oh beloved readers! In particular, I am soliciting information from you consultanty types and those of you who otherwise hop into a lot of different Hyperion systems.

Oftentimes a client needs help with speeding up an Essbase/Hyperion process/server/cube that has become unwieldy and slow. So you take a look at things. When you happen to hop in to an environment and assess its health, what do you look for, from a Hyperion point of view? For example, on BSO cubes I go right for the stats and check out the block density and average cluster ratio. From there I can go in any number of directions, looking at the overall outline, automation, cache settings, and so forth. So I have this already:

  1. Check block density and other cube stats
  2. Review outline for any red flags
  3. Check size of index cache with respect to the size of the index itself
  4. Take a look at outline for things that can be removed/deleted/dynamic calc, etc
  5. Ensure logs are not huge
  6. Look for XCP files, if any
  7. And a few others

I am really, really curious if you have something you look for, particularly if it’s something you might dive into with EAS. I’m working on something interesting and your feedback is very appreciated! It can be anything at all: checking the server, checking the app or the cube, checking the file system, calc scripts, business rules, automation, and so on. Thanks!

Hyperion Essbase wish list: Import a compressed file

I thought this up while attending Dan Pressman’s Kscope presentation How ASO Works and How to Design for Performance, a presentation that definitely appealed to my inner Hyperion geek. Dan did a crazy deep dive on performance tuning with particular respect to loading ASO. He had some pretty bangin hardware to play with too.

Long story short (and many of us have known this for a while): there are ways to format your Essbase load files so that they load faster. Basically what you are trying to do is make things easier on Essbase: stream in less data, don’t repeat things you don’t need to repeat, don’t thrash blocks in and out of memory, and so on. That’s all well and good.

The advent and proliferation of SSDs in the enterprise has done wonderful things for Hyperion performance by  eliminating a lot of the performance quirks with rotational media and penalties from fragmentation. But at the end of the day we are still looking for ways to pump ever-increasing amounts of information into our cubes even faster than we were the day before.

For instances where we are loading a file that resides on the same machine as the Hyperion apps/cubes or even across the network, I wonder what, if any, performance benefits are to be had if we had the ability to import a zip file?

Zip files can get awesome compression on text files. They can also have their uncompressed contents streamed. In other words, it’s not necessary to extract the contents of a zip file before you can read the contents (starting at the beginning). In theory, if one achieved moderate to decent compression on their zip file and handed that to Essbase (say with a specialized import data MaxL command), it would be saving time on the disk-read aspect of the data load, at the expense of some additional CPU usage. Many Essbase load operations are disk I/O bound anyway so this seems like a reasonable tradeoff to make.

As an additional benefit or elaboration on the concept, perhaps multiple text files could be placed into the same zip file, perhaps with a “load manifest” or options on the load command, and Essbase would attempt to parallelize the data load to the extent it can. This would likely be an add-on feature once the basic support is in place. In all you would need to augment the data load process with a zip file reader routine (this would be an off-the-shelf library that is quite common), a couple new MaxL import data variants, and an augmentation to the Java API. I suppose you could leave the MaxL command alone and just program the interpreter to look for a .zip extension and treat it accordingly, but it seems like it’d be the better choice to specifically indicate the data load is from a compressed file.

Of course, if you’re loading just from SQL this whole thing wouldn’t apply to you. Loading data files may seem low-tech but it’s incredibly common and often times I prefer it as I have an exact text file to tie back to, if need be, versus a possibly changing SQL data store (but that’s a conversation for a different blog post). This feature would cater to the performance nuts out there – and if Kscope is any indication, there are plenty. I’d be curious to hear anyone’s thoughts on this.

SSD Upgrade (MOAR SPACE!) – for laptops and Essbase servers

Samsung 840 Pro 512GB SSD Upgrade for my MacBook Pro

This isn’t ostensibly Hyperion related, but I’m going to find a way to tie it back to Hyperion and Essbase. My main machine these days is a MacBook Pro. I love this thing. It’s not even the newest retina model (soon…) but it’s still a beast. Originally it came with a 128GB SSD. I quickly outgrew it – mostly owing to the VMs I store locally. I bumped up to a 256GB Samsung 830 SSD and enjoyed breathing room for awhile longer. Recently, though, I have even outgrown this, owing to various VMs and other files I need on a day to day basis.

I could run my VMs and store files on an external drive, of course. It wouldn’t be too hard to just plug in a USB drive. Now, I’m not lazy, but it is just one more thing to deal with (I’m sure performance would be reasonable, but SATA beats USB). This machine has Thunderbolt, but again, that’s just one more external thing to plug in, and the Thunderbolt enclosures for drives are pretty expensive (i.e., they make more sense for an array of drives rather than just one drive). I could even replace the optical bay (a DVD-RW drive) with a hard drive. This is very tempting – to pop in a 1 TB laptop drive in place of the optical drive – but I’m not ready to go there just yet. After some deliberation, I decided to bump up this baby to a 512 GB Samsung 840 Pro SSD. It’s a little pricey, but I’m not ready to commit to a new laptop yet; it gets me the space I need right now, doesn’t entail worrying about an external drive, and it is fast. I cloned my existing drive over, so this isn’t even a new install of the OS or anything. This thing just screams.

How does this relate to Essbase? Well, let me tell you.

Disk performance affects almost all major aspects of Essbase solutions: retrieval times, calc times, data load times, and more. We [database geeks] spend countless hours optimizing solutions or designing them around performance issues. Almost every client I go to with an existing solution has a performance issue somewhere.

Now, there is definitely an art and a science to getting those dense/sparse configurations just right, optimizing load rules, calcs, and so forth. I have spent countless hours investigating, researching, and testing these settings – understanding them, talking about them, presenting on them, and most importantly, comprehensively optimizing solutions to run faster.

That all being said (and this is hardly a new or insightful thought), SSD works extremely well with Essbase. SSD speeds up Essbase for the same reasons that an SSD speeds up using a laptop or computer.

Even so, faster hard drives and faster hardware should never be used to try to paper over a fundamental design or architecture problem. SSDs are very affordable now and will continue to get more affordable. So, to sort of complement and add a tiny bit of my own insight, such as it is, to the notion of bringing in SSDs for your organization, think about it from a simple business math perspective: A is the cost of new hard drives, B is the benefit of the increased performance, C is the cost of someone (you or a consultant) performance tuning your system, and lastly D is the benefit of that tuning.

Now, quantifying B and D is subjective. The value in your system running processes faster can be based on the aggregate improved query response times for users plus some sort of benefit from being able to load up numbers faster (or perhaps more importantly, reload numbers when things don’t tie out). Let’s be very, very optimistic and say that the benefit of D can be equal to the benefit of B. In other words, I’m going to say that you can tune an Essbase solution so well on rotational media (traditional hard drives) that it rivals SSD performance. This is, I think, being quite generous, but let’s go with it. The cost to achieve this performance benefit from tuning  (C), particularly when done by a reputable consultant, can very easily exceed the cost of the SSDs, A. Obviously there are many other factors to take into account and this is a gross simplification. But the beauty of going to SSDs is that you leave the option to do deeper performance tuning on the table.

So anyway, think about it. A lot of us just have to deal with the environment we have and be thankful we even have what we do, but in my opinion, this paradigm shift in storage is an absolute no brainer from a time and money standpoint.

Release of cubedata 1.0.0

Continuing in the same spirit as the release of Jessub, I am happy to announce the release of another open source tool meant to benefit Hyperion Essbase administrators: cubedata. cubedata is a simple tool that makes it easy to generate a text file that can be loaded to a cube. Well, of course, there’s nothing too special about this. The real purpose of the tool is to be able to generate huge text files based on the permutations of data that you specify. For example, let’s look at a simple data definition:

dimensions=Time,Scenario,Location,Departments
members.Time=P01,P02
members.Scenario=Actual,Budget
members.Location=Lo.806,Lo.808,Lo.822
members.Departments=Dt.01,Dt.02,Dt.03,Dt.04

So we just have a really simple definition in a configuration file. We run cubedata and tell it to use this file to generate some data for us. Out comes 48 rows of data: 2 time periods x 2 scenarios x 3 locations x 4 departments = 48 combinations. The generated data file looks like this:

P01,Actual,Lo.806,Dt.01,911.85
P01,Actual,Lo.806,Dt.02,887.100
P01,Actual,Lo.806,Dt.03,251.49
P01,Actual,Lo.806,Dt.04,115.64
P01,Actual,Lo.808,Dt.01,197.60
P01,Actual,Lo.808,Dt.02,704.71
P01,Actual,Lo.808,Dt.03,512.76
.. more rows ..

The configuration file lets you specify a few other options such as the column delimiter (default is comma), the numerical range of fact values to generate, and a few other things such as the “load factor” (what percentage of data combinations will have data).

cubedata, like Jessub, is licensed under the Apache Software License 2.0, a very permissive license that basically says you can do whatever you want to the code. The project is shared at GitHub in one of my public repositories.

I haven’t done extensive testing on the program, but it does do a reasonable job of telling you if the configuration is incomplete or otherwise incorrect. I have tested it with quite a few dimensions and members and was able to generate a file with many millions of records quite easily. I don’t see any reason why it wouldn’t support generating absolutely massive amounts of data. It’s programmed to iterate over the dataset rather than try to keep it all in memory at once, meaning that there shouldn’t be any memory issues with regard to generating massive data sets.

So, there you have it. Another simple tool that might make developing and testing a little easier for you, particularly if you hate generating dummy data by hand and/or you don’t have a system to source data from that is ready or convenient.

As always, please feel free to let me know any suggestions or comments you may have and I will be happy to look into improving the program. If you end up downloading the code and making tweaks, please share them back if they would be useful to more people.

A comprehensive Essbase automation optimization story, or: Shaving off every last second

I’ve been wanting to write this post for awhile.  Like, for months.  Some time ago I took an in-depth look at profiling the performance (duration-wise) of an automated system I have.  The system works great, it’s worked great, I don’t need to touch it, it just does its thing.  This is the funny thing about systems that live in organizations: some of the rather inglorious ones are doing a lot of work, and do it reliably.  Why reinvent the wheel and risk screwing things up?

Well, one reason is that you need to squeeze some more performance out of that system.  The results of my profiling efforts showed that the process was taking about an hour and a half to run.  So, considering that the processing of the data is an hour behind the source system (the source system drops off text files periodically for Essbase), and it takes an hour and a half to run, the data in the cubes is some two and a half hours behind the real world numbers in the original system.

So, first, a little background on what kind of processing we have here.  The gist of how this particular system works is that a mainframe manages all of the data, and certain events trigger updated files to get delivered to the Essbase server.  These files are delivered at predictable intervals, and automation jobs are scheduled accordingly (about an hour after the text files are scheduled to be delivered, just to be on the safe side).  The data that comes in is typical financial data — a location, a time period, a year, an account, and an amount.

Pretty simple, right?  Well, there are about twenty cubes that are financial in nature that are modeled off of this data.  The interesting thing is that these cubes represent certain areas or financial pages on the company’s chart of accounts.  Many of the pages are structurally similar, and thus grouped together.  But the pages can be wildly different from each other.  For this reason, it was decided to put them in numerous cubes and avoid senseless inter-dimensional irrelevance.  This keeps the contents of the cubes focused and performance a little better, at the expense of having to manage more cubes (users and admins alike).  But, this is just one of those art versus science trade-offs in life.

Since these cubes add a “Departments” dimension that is not present in the source data, it is necessary to massage the data a bit and come up with a department before we can load the raw financial data to a sub-cube.  Furthermore, not all the cubes take the same accounts so we need some way to sort that out as well.  Therefore, one of the cubes in this process is a “staging” database where all of the data is loaded in, then report scripts are run against certain cross sections of the data, which kick out smaller data files (with the department added in) that are then loaded to the other subsequent cubes.  The staging database is not for use by users — they can’t even see it.  The staging database, in this case, also tweaks the data in one other way — it loads locations in at a low level and then aggregates them into higher level locations.  In order to accomplish this, the staging database has every single account in it, and lots of shared members (accounts can be on multiple pages/databases).

That being said, the database is highly sparse.  For all of these reasons, generating reports out of the staging database can take quite a bit of time, and this is indicated as the very wide brown bar on the performance chart linked above.  In fact, the vast majority of the total processing time is just generating reports out of this extremely sparse database.  This makes sense because if you think about what Essbase has to do to generate the reports, it’s running through a huge section of the database trying to get what it wants, and frankly, who knows if the reports are even set up in a way that’s conducive to the structure of the database.

Of course, I could go through and play with settings and how things are configured and shave a few seconds or minutes off here and there, but that really doesn’t change the fact that I’m still spending a ton of my time on a task that, quite simply, isn’t very conducive to Essbase.  In fact, this is the kind of thing that a relational database is awesome at.  I decided, first things first, let’s see if we can massage the data in SQL Server, and load up the databases from that (and get equivalent results).

The SQL Server staging database is relatively simple.  There is now a table that will have the data loaded to it (this is the data that was loaded to the staging cube with a load rule).  The other tables include a list of financial pages (these are the departments on the sub-cubes, so to speak), a table for the list of databases (you’ll see why in a minute), a table for linking the departments/pages to a particular database, a table linking accounts to departments (pages), a page-roll table to manage the hierarchy of how pages aggregate to “bigger” pages, and a location/recap table that links the different small locations to their bigger parent locations (this is the equivalent of the location summing from the original staging database).

With all these tables in place, it’s time to add a few special views that will make life easier for us (and our load rules):

SELECT
     D.database_name
     , P.physical_page
     , A.account
     , K.div
     , K.yr
     , K.pd
     , K.amt
FROM
     account_dept AS A INNER JOIN
     jac_sum_vw AS K ON A.account = K.account INNER JOIN
     page AS P ON A.G_id = P.G_id INNER JOIN
     page_database AS G ON A.G_id = G.G_id INNER JOIN
     dbase AS D ON G.database_id = D.database_id

Obviously I didn’t provide the schema for the underlying tables, but this still should give you a decent idea of what’s going on.  There’s another view that sits on top of this one that takes care of the summing of locations for us, but it’s structurally similar to this one, so no surprises there.  What I end up with here is a view that has columns for the database name, the department, the account, location, year, period, and the amount.  So not only is this perfect for a fact table down the road when I convert this to EIS, I can also use the exact same view for all of my databases, and a very similar load rule in each database that simply references a different database.
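
For what it’s worth, a minimal sketch of that second, location-summing view might be shaped something like the following (I’m calling the first view essbase_load_vw and the location/recap table loc_recap purely for illustration; the real names and columns will differ):

SELECT
     V.database_name
     , V.physical_page
     , V.account
     , R.parent_div AS div
     , V.yr
     , V.pd
     , SUM(V.amt) AS amt
FROM
     essbase_load_vw AS V INNER JOIN
     loc_recap AS R ON V.div = R.child_div
GROUP BY
     V.database_name, V.physical_page, V.account, R.parent_div, V.yr, V.pd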

This all took a day or two to setup, but after I started getting data how I wanted, I got pretty stoked that I was well on my way to boosting performance of this system.  One of the nice things about the way it’s setup in the relational database is also that it’s very flexible — extra databases, departments, locations, and anything else can be added to one central place without too much trouble.

Interestingly enough, this entire system is one that’s been in production for years but the “test” copy of things was sort of forgotten about and completely out of sync with production.  Since I was getting closer to the point where I was ready to load some cubes, I needed to sync test to prod (oddly enough) so I could do all testing without hosing things up.  Basically I just copied the apps from the production server to the test server with EAS, copied the existing automation folder, changed some server names and passwords (which were all hard-coded, brilliant…), and was pretty much good to go.   On a side note, I took the opportunity to rewrite the automation in modular, test-ready form (basically this involved cleaning up the old hard-coded paths and making it so I could sync test to prod much easier).

My next step was to design a load rule to load data up, to make sure that the structure of the tables and views was sufficient for loading data.  I did a normal SQL load rule that pulled data from the view I setup.  The load rule included some text replacements to adapt my version of departments to the actual alias name in the database, but otherwise didn’t need anything major to get working.  I loaded one period of data, calculated, saw that I was off a bit, researched some things, tweaked the load rule, recalculated, and so on, until finally the numbers were tying out with what I was seeing.  Not bad, not bad at all.

After this basic load rule was working correctly, I started gutting the automation system to clean things up a bit more and use the new load rule (with the old calcs).  Once I felt good about the basic layout of things, I got to the point where I was ready to copy this out to the other 18 or however many cubes.  Then I put some hooks in the automation to create a log file for me, so I can track performance.

For performance profiling, I used a technique that I’ve been using for awhile and have been quite happy with.  I have a small batch file in the automation folder called essprof.bat that has this in it:

@For /f "tokens=2-4 delims=/ " %%a in ('date /t') do @set FDATE=%%c-%%a-%%b
@echo %2,%FDATE% %TIME% >> %1

Basically when you call this file from another batch file, you tell it what file you want to append the data to, what the step should be named, and it takes care of the rest.  Here is how I might call it from the main automation script:

CALL essprof.bat %PROFFILE% START-PROCESS

The PROFFILE variable comes from a central configuration file and points to a location in the logging folder for this particular automation set.  For this particular line, I would get output like the following:

START-PROCESS,2009-07-31  8:57:09.18

The For loop in the batch file parses the DOS/shell date command a little bit, and together with the TIME variable, I get a nice timestamp — one that, incidentally, can be read by Excel very easily.

So what’s this all for?  As I did in the original performance chart, I want to profile every single step of the automation so I can see where I’m spending my time.  Once I filled out the rest of the load rules and ran the process a few times to work out the kinks, I now had a good tool for analyzing my performance.  Here is what one of the initial runs looked like:

start-db02-process 	10:54 AM
finish-db02-process 	10:54 AM
start-db03-process 	10:54 AM
finish-db03-process 	10:57 AM
start-db03GP-process 	10:57 AM
finish-db03GP-process 	10:58 AM
start-db04-process 	10:58 AM
finish-db04-process 	11:13 AM
start-db05-process 	11:13 AM
finish-db05-process 	11:13 AM
start-db06-process 	11:13 AM
finish-db06-process 	11:13 AM
start-db07A-process 	11:13 AM
finish-db07A-process 	11:13 AM
start-db07D-process 	11:13 AM
finish-db07D-process 	11:14 AM
start-db07L-process 	11:14 AM
finish-db07L-process 	11:14 AM
start-db07M-process 	11:14 AM
finish-db07M-process 	11:14 AM
start-db07T-process 	11:14 AM
finish-db07T-process 	11:14 AM
start-db08-process 	11:14 AM
finish-db08-process 	11:14 AM
start-db09-process 	11:14 AM
finish-db09-process 	11:14 AM
start-db10-process 	11:14 AM
finish-db10-process 	11:14 AM
start-db11-process 	11:14 AM
finish-db11-process 	11:14 AM
start-db12-process 	11:14 AM

And how do we look?  We’re chunking through most of the databases in about 20 minutes.  Not bad!  This is a HUGE improvement over the hour and a half processing time that we had earlier.  But, you know what, I want this to run just a little bit faster.  I told myself that if I was going to go to the trouble of redoing the entire, perfectly working automation system in a totally new format, then I wanted to get this thing down to FIVE minutes.

Where to start?

Well, first of all, I don’t really like these dense and sparse settings.  Scenario and time are dense and everything else is sparse.  For historical reasons, the Scenario dimension is kind of weird in that it contains the different years, plus a couple budget members (always for the current year), so it’s kind of like a hybrid time/scenario dimension.  But, since we’ll generally be loading in data all for the same year and period at once (keep in mind, this is a period-based process we’re improving), I see no need to have Scenario be dense. Having only one dense dimension (time) isn’t quite the direction I want to go in, so I actually decided to make location a dense dimension.  Making departments dense would significantly increase my inter-dimensional irrelevance so Location seems like a sensible choice — especially given the size of this dimension, which is some 20 members or so.  After testing out the new dense/sparse settings  on a couple of databases and re-running the automation, I was happy to see that not only did the DB stats look good, but the new setting was helping improve performance as well.  So I went ahead and made the same change to all of the other databases, then re-ran the automation to see what kind of performance I was getting.

End to end process time was now down to 12 minutes — looking better!  But I know there’s some more performance out there.  I went back to the load rule and reworked it so that it was “aligned” to the dense/sparse settings.  That is, I set it so the order of the columns is all the sparse dimensions first, then the dense dimensions.  The reason for this is that I want Essbase to load all the data to a single data block that it can, and try to minimize the number of times that the data block is loaded to memory.
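
As a hypothetical illustration (reusing the column names from the staging view), the idea is simply to put the sparse dimensions first, the dense dimensions and the amount last, and to sort on the sparse columns so that all of the rows belonging to a given block arrive together:

-- sparse columns (page/department, account, year) first and in the sort order;
-- dense columns (period, location) and the amount at the end
SELECT physical_page, account, yr, pd, div, amt
FROM essbase_load_vw
WHERE database_name = 'db04'
ORDER BY physical_page, account, yr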

Before going too much further I added some more logging to the automation so I could see exactly when the database process started, when it ran a clearing calc script, loaded data, and calculated again.

Now I was down to about 8 minutes… getting better.  As it turns out, I am using the default calculation for these databases, which is a CALC ALL, so that is a pretty ripe area for improvement.  Since I know I’m loading data all to one time period and year, and I don’t have any “fancy” calcs in the database, I can rewrite this to fix on the current year and period, aggregate the measures and departments dimensions, and calc on the location dimension.  By fancy, I’m referring to instances where a simple aggregation as per the outline isn’t sufficient — however, in this case, just aggregating works fine.  I rewrote the FIX, tested it, and rolled it out to all the cubes.  Total end to end load time was now at about four minutes.
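
Conceptually, the rewritten calc ended up shaped something like this (a sketch only — the member and substitution variable names are invented, and your dimension names will differ):

/* scope the calc to the slice that was just loaded, aggregate the sparse
   dimensions, and calc the dense Location dimension */
FIX (&CurrYear, &CurrPeriod)
    AGG ("Departments", "Measures");
    CALC DIM ("Location");
ENDFIX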

But, this four minute figure was cheating a little since it didn’t include the load time to SQL, so I added in some of the pre-Essbase steps such as copying files, clearing out the SQL table, and loading in the data.  Fortunately, all of these steps only added about 15 seconds or so.

I decided to see if using a more aggressive threads setting to load data would yield a performance gain — and it did.  I tweaked essbase.cfg to explicitly use more threads for data loading (I started with four threads), and this got total process time down to just under three minutes (2:56).
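
If memory serves, the settings involved are DLSINGLETHREADPERSTAGE and DLTHREADSWRITE in essbase.cfg (double-check the technical reference for your version, and note that the server needs a restart to pick up the change). A minimal sketch of what I started with:

DLSINGLETHREADPERSTAGE FALSE
DLTHREADSWRITE 4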

As an aside, doing in three minutes what used to take 90 would be a perfectly reasonable place to stop, especially considering that my original goal was to get to five minutes.

But this is personal now.

Me versus the server.

I want to get this thing down to a minute, and I know I still have some optimizations left on the table that I’m not using.  I’ve got pride on the line and it’s time to shave every last second I can out of this thing.

Let’s shave a few more seconds off…

Some of the databases don’t actually have a departments dimension, but I was bringing in a “dummy” department just so my load rules are all the same — but why bring in a column of data I don’t need?   Let’s tweak those load rules to skip that column (as in, don’t even bother to bring it in from SQL) on databases that don’t have the departments dimension.  So I tweaked the load rules and got the whole process down to 1:51.

Many of these load rules are using text replacements to conform the incoming department to something that is in the outline… well, what if I just setup an alternate alias table so I don’t even have to worry about the text replacements?  It stands to reason, from an Essbase data load perspective, that it’s got to cycle through the list of rows on the text replace list and check it against the incoming data, so if I can save it the trouble of doing that, it just might go faster.  And indeed it does: 1:39 (one minute, thirty nine seconds).  I don’t normally advocate just junking up the outline with extra alias tables, but it turns out that I already had an extra one in there that was only being used for a different dimension, so I added my “convenience” aliases to that.  I’m getting close to that minute mark, but of course now I’m just shaving tiny bits off the overall time.

At this point, many of the steps in the 90-step profiling process are taking between 0 and 1 seconds (which is kind of funny, that a single step starts, processes, and finishes in .3 seconds or something), but several of them stand out and take a bit longer.  What else?

I tried playing with Zlib compression, thinking that maybe if I could shrink the amount of data on disk, I could read it faster into memory.  In all cases this seemed to hurt performance a bit so I went back to bitmap.  I have used Zlib compression quite successfully before, but there’s probably an overhead hit I’m taking for using it on relatively “small” database — in this case I need to get in and get out as fast as I can and bitmap seems to allow me to do that just a little faster, so no Zlib this time around (but I still love you Zlib, no hard feelings okay?).

Okay, what ELSE?  I decided to bump the threads on load to 8 to see if I got more of a boost — I did.  Total load time is now at 1:25.

The SQL data load using bcp (the bulk load command line program) takes up a good 10 seconds or so, and I was wondering if I could boost this a bit.  bcp lets you tweak the size of the packets, number of rows per batch, and lock the table you’re loading to.  So I tweaked the command line to use these options, and killed another few seconds off the process — down to 1:21.
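
For reference, the sorts of bcp switches I’m talking about are the packet size (-a), the rows per batch (-b), and a table lock hint (-h). A hedged example along those lines — the server, credentials, table, and values are all placeholders:

bcp StagingDB.dbo.fact_load in data.txt -c -t"," -S %SQLSERVER% -U %SQLUSER% -P %SQLPW% -a 16384 -b 50000 -h "TABLOCK"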

Now what?  It turns out that my Location dimension is relatively flat, with just two aggregated members in it.  I decided that I don’t really need to calculate these and setting them as dynamic would be feasible.  Since location is dense this has the added benefit of removing two members from my stored data block, or about 10 percent in this case.  I am definitely approaching that area where I am possibly trading query performance just for a little bit faster load, but right now I don’t think I’m trading away any noticeable performance on the user side (these databases are relatively small).

Just reducing the data blocks got me down to 1:13 or so (closer!), but since there are now no aggregating members that are stored in the Location dimension, I don’t even need to calculate this dimension at all — so I took the CALC DIMs out of all the calc scripts and got my calc time down further to about 1:07.

Where oh where can I find just 7 more seconds to shave off?  Well, some of these databases also have a flat department structure, so I can definitely shave a few seconds and save myself the trouble of having to aggregate the departments dimension by also going to a dynamic calc on the top level member.  So I did this where I could, and tweaked the calcs accordingly and now the automation is down to about 1:02.

And I am seriously running out of ideas as to where I can shave just a COUPLE of more seconds off.  Come on Scotty, I need more power!

Well, keeping in mind that this is a period process that runs numerous times during closing week, part of this process is to clear out the data for the current period and load in the newer data.  Which means that every database has a “clear” calc script somewhere.  I decided to take a look at these and see if I could squeeze a tiny bit of performance out.

I wanted to use CLEARBLOCK because I’ve used that for good speedups before, but it’s not really effective as a speedup here because I have a dense time dimension and don’t want to get rid of everything in it (and I think CLEARBLOCK just reverts to a normal CLEARDATA if it can’t burn the whole block).  However, there were still some opportunities to tweak the clearing calc script a little so that it was more conducive to the new dense and sparse settings.   And sure enough, I was able to shave .1 or .2 seconds off of the clear calc on several databases, giving me a total process time of……. 59.7 seconds.   Made it, woot!
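
For what it’s worth, the general shape those clear scripts ended up with was something like this (member and variable names invented, as usual):

/* clear just the current period for the current year, leaving the rest of
   the dense Time dimension in each block alone */
FIX (&CurrYear)
    CLEARDATA &CurrPeriod;
ENDFIX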

Along the way I tried several other tweaks here and there but none of them gave me a noticeable performance gain.  One change I also made, which seems to deliver sporadic speed improvements, is to resize the index caches to accommodate the entire index file.  Many of the indexes are less than 8 megabytes already, so they’re fine in that regard; some of them aren’t, so I sized them accordingly.  I’d like to believe that keeping the index in memory is contributing to better performance overall, but I just haven’t really been able to profile it very well.

After all that, in ideal circumstances, I got this whole, previously hour-and-a-half job down to a minute.  That’s not bad.  That’s not bad at all.  Sadly, my typical process time is probably going to be a minute or two or longer as I adjust the automation to include more safety checks and improve robustness as necessary.  But this is still a vast improvement.  The faster turnaround time will allow my users to have more accurate data, and will allow me to turn around the databases faster if something goes wrong or I need to change something and update everything.  In fact, this came in extremely useful today while I was researching some weird variances against the financial statements.  I was able to make a change, reprocess the databases in a minute or so, and see if my change fixed things.  Normally I’d have to wait an hour or more to do that. I may have optimized more than I needed to (because I have an almost unhealthy obsession with performance profiling and optimization), but I think it was worth it.  The automation runs fast.

So, my advice to you: always look at the big picture, don’t be afraid to use a different technology, get metrics and refer to them religiously (instead of “hmm, I think it’s faster this way”), try to think like Essbase — and help it do its job faster — and experiment, experiment, experiment.

An Essbase Optimization Story

I got a call the other day from one of my fellow Essbase developers. One of the cubes in their budgeting system was taking a long time to finish calculations, whereas just days before the calculation time was just fine.  I have an almost unhealthy obsession with cube performance and optimization, so I jumped in to take a look.

The first thing I looked at (although not generally the first thing I look at) was the average cluster ratio.  The average cluster ratio is roughly analogous to the fragmentation of your hard drive: data gets added and removed, and the performance of accessing that data becomes increasingly suboptimal.  Although the configuration of the sparse and dense dimensions in a BSO cube in theory dictates where data gets placed in the corresponding data files, quite often, for performance reasons, the data files will just grow to accommodate the new data instead of putting it where it “should” go.  A cluster ratio of 1.00 is optimal.  In this case, the database with performance issues was reporting a significantly lower cluster ratio.  So the first order of business was to export all of the data in the cube to a raw export file, clear the cube, then reimport.

This helped performance a little bit, but not much.  Something else needed to be done.  I checked out the index and page files in the cube directory to discover that the index files were almost four gigabytes, with about 20 gigabytes of page files.  This size of data isn’t inherently difficult to work with, but it does warrant a look into the dense and sparse dimension settings,  the calc script that is having issues, and trying to understand the way in which the database data is being used.

The database is a six-dimension cube with the following dimensions: Scenario, Years, Time, Location, Customer, Commodities.  Time and Scenario are marked as dense.  I can live with Time being dense, especially since it’s just periods rolling to quarters rolling to a fiscal year (with non-Level 0 members tagged as Dynamic Calc), but the dense Scenario dimension didn’t quite jibe with me.  In this particular database, it turns out that the reason for the block explosion and subsequent poor calc performance is that users are loading in tons of data to the Forecast member in Scenario.  Scenario also has four other members.  In addition to the cube being set to create blocks on equations, the size of the page files was blowing out with all of the data being sent in.

The first thing that jumps out at me is that even though all of this data is being loaded to the Forecast member, we’re taking a hit on storage because space in every data block is being allocated for the other four members in Scenario, which are very sparingly utilized.  Surely there is a better candidate for a dense dimension?  You betcha.  Given the sparse nature of the database, most of the other dimensions aren’t really good candidates for dense because of all the interdimensional irrelevance that they would incur; however, the Years dimension is a good candidate because the Forecast data is being loaded in for a particular year and all periods.  Perfect.

Given that there are not a significant number of calc, report, and load scripts in this database, it is relatively safe to change the dimensional configuration here.  This isn’t always necessarily the case: changing dense/sparse settings could very easily deoptimize the performance of calc, load, and report scripts, not to mention adversely affect numerous other aspects of the database.

And of course, this was all tested on the test server before loading to production.  The process was fairly straightforward.  I cleared the database again, and loaded the export file.  Since export files are sensitive to the dense/sparse settings in the database, I did this so I could restructure the database with the new settings and have Essbase do the work for me instead of having to load the data from somewhere else.  I also made sure to do this before doing a calculation on the database so there would be less data to restructure.  After the data loaded in, I changed the dense and sparse settings to set Years to dense and Scenario to sparse.  After the restructure, database statistics were looking fairly good, but I didn’t want to get my hopes up.

Clearly one of the issues with this database was that the calculation script that was taking too long had to span 20 gigabytes of page files in order to do what it needed to do.  Generally my databases are set to bitmap encoding as the compression scheme (because in most cases it is “good enough”), but I decided to go with zlib compression instead.  Although technically speaking zlib is more computationally intensive to compress and uncompress, because our servers’ CPUs aren’t generally getting hammered, I decided that I would rather hit the CPUs a little harder if it means I can read the data off the disk a little faster (even though it will still uncompress in memory to the same amount as bitmap encoding or any other compression scheme).  On top of that, the stronger compression means less data is written to the disk in the first place.
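
If you want to try the same experiment, the compression scheme can be flipped with a one-line MaxL statement along these lines (Sample.Basic is standing in for the real database, and it’s worth confirming the exact grammar in the MaxL reference for your version — the change only affects blocks written after it takes effect):

alter database Sample.Basic set compression zlib;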

I then had to do some tweaking to the calc script to align it with the new dense and sparse settings.  This was fairly easy.  As with most calc script optimization, this involved changing the FIX statements and CALC DIM statements so that I was fixing on sparse dimensions and doing as much calculating within the dense data block as possible.  Remember, you are always trying to help Essbase do its job and give it hints as to how it can do things more optimally.  In this case, we are trying to minimize the amount of data that must be scanned through.  By calculating within the dense blocks, we are doing all of the work in one place that we can, then moving on to the next area.  For good measure, I also added a SET command to utilize the server’s higher calc lock block setting (see your technical reference for more details).
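
The SET command in question is SET LOCKBLOCK. In skeleton form (member names invented, and the real script obviously does more), the reworked calc looked roughly like this:

/* use the server's high calc lock block setting, and fix on the sparse
   Scenario slice so the aggregation only touches the blocks it needs */
SET LOCKBLOCK HIGH;
FIX ("Forecast")
    AGG ("Location", "Customer", "Commodities");
ENDFIX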

I fired off the calculation and checked out the results.  The calculation’s first pass took about 20 minutes (a lot of blocks were being created the first time around).  After this pass was done, I checked out the index and page files.  The index file seemed to be holding steady at about 300 megabytes — quite an improvement from four gigs!  The page file was now about 700 megabytes.  Given the new index size, I decided to set the index cache to 512 megabytes.  Although this will eat 512 megabytes directly out of memory on the server, we have decided to allocate more memory for this app until forecasts are done, then we can pull back on the amount of memory it gets.  But for now, with enough memory to hold the entire index at one time, calculations and other operations will be much faster.
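
Assuming MaxL is handy, that cache bump is a one-liner (Sample.Basic again standing in for the real database; the new index cache size takes effect the next time the database is started):

alter database Sample.Basic set index_cache_size 512mb;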

At the time of this writing, the database calculation that was the original problem is still holding steady at about 10 minutes to calculate.  The page and index files have grown a bit but still appear to be holding steady, and the database statistics indicate a pretty good block density as well.  Not a bad bit of optimization — a twentyfold decrease in size on disk, and a calculation that comes back in minutes rather than never.

Like I said, I love performance optimization, and as I’ve said before, Essbase is a bit of an art and a science.  The art part is comprehensively understanding how things work and how they are meant to work, then using that information to make the right technical tradeoffs, then putting the changes into the system, keeping in mind that you are simply trying to give “hints” to the system to help it do its job a little better or faster.

Essbase Performance Optimization: it’s not just the calc script

Here’s a quick post that is a bit of a precursor to some of my more in-depth performance analysis articles that will be coming out in the future.  One of my automation systems takes a bit over an hour to run.  There are a lot of people I know that need to squeeze performance out of their systems and immediately look to their calc scripts.  Yes, calc time can be a large part of your downtime, as can data loads, reports, and other activities.  But I always stress that it is useful and important to understand your systems in their entirety.

As part of looking at the bigger picture, I put together the following graph showing each step and how long it takes in this system that takes around an hour.  It’s not hard to tell that the majority of the time that it takes to run this job (the brownish bar that takes about an hour) is in one task!  And what is that task?  It’s a bunch of report scripts running on a staging database.  This is clearly an obvious place for me to look at ways of saving time.

Duration of Steps for an Essbase Automation Process

The staging database is a rather clever cube that is essentially used to scrub, aggregate, and associate raw account-level data to some more meaningful dimensional combinations for all of the other databases.  Data comes in, it’s calculated, and it outputs a bunch of report scripts.  Fundamentally, the reason that this approach takes so much time is that there are two highly sparse dimension combinations with tens of thousands of members each, and the report script writer has to go through a ton of on-disk data in order to figure out what to write.  I could spend some time trying to optimize this process; in fact, I could probably play with some settings and get at least a 20% improvement right now.

But, this is one of those times where it pays to stand back and look at what we’re trying to accomplish.  As it turns out, I actually have all of the infrastructure I need to accomplish this task, but it’s in a SQL database.  And, the task that is being performed is actually much more conducive to the way that a relational database works.  I’m still putting the finishing touches on this process, but it’s mostly complete as of right now, and the performance is amazing.  I can pump through the same amount of data in mere minutes now, with no loss of functionality.

My specific goal is to get this process that takes an hour or longer, to run in less than five minutes.  I chose this instead of “as fast as possible” because I wanted something concrete and attainable.  (My secret goal, just for kicks, is to get this all to run in under a minute).  Once the automation for the SQL staging is all in place, I will be going through all of the individual databases and tweaking any and all settings in order to shave their downtime as well.

Historically, not a lot of effort has gone into extensive profiling on these cubes, so as nerdy as it sounds, I’m actually very interested to see where else I can shave a few seconds off.  At first this will undoubtedly involve using more write threads in the dataload, rewriting the calc scripts to tighten them up from just their current CALC ALL, aligning the order of the data fields and rows with the dense/sparse-ness of the outlines and the outline order, choosing better cache settings that are customized for the size of the index and page files, and perhaps looking at benefits of zlib compression (theoretically more CPU time to compress/decompress, however, generally the CPUs on these servers are not slammed very hard, so if I can get the size of the physical page files down, I may be able to read it into memory faster).

So remember — you spend a lot of time doing calculations, but that might not always be where the low-hanging fruit is.  I cannot stress enough the importance of understanding where you spend your time, and using that as a basis for helping Essbase do its job faster.