Essbase & Hyperion Papercuts

I am a software developer who regards the output of the software development process as skin to the woodworker making a fine jewelry box, the master chef perfecting a plate, or the musician crafting a song. I believe solutions should be elegant, robust, deceivingly simple, and polished. I take the same approach to code with my various open source Essbase related Java projects, GitHub projects, and perhaps more importantly, the critical filter through which I see and use Hyperion, Essbase, and related software.

To that end and just for my own amusement, I will be henceforth be writing about various “Essbase and Hyperion papercuts” that I see and perceive. This is in similar spirit to the Ubuntu Papercuts project that occurred some time ago. The idea was that in the aggregate, a lot of little issues start to become troublesome and lead to a worse user experience. As with many pieces of software, Essbase in general has this problem. It’s an awesome piece of software, but then has all these little warts and quirks. It’s like NASA puts a man on the moon (this is the scientific equivalent of Essbase at its heart) and then when stepping out of the lander on the surface of the moon, trips on a faulty stair leading to the surface (this is like any number of little Essbase quirks I run into).

That being said, if there is some small, seemingly trivial and inconsequential Essbase/Hyperion (or Planning, or FDM, or ODI, or…) issue that just really grinds your gears, let me know about it! This blog gets a few hits from Oracle headquarters, so maybe, just maybe, someone will see it and we can all make the world’s best EPM software just a little bit cooler.


My ODTUG Kscope13 presentation: Practical Essbase Web Services

It has been a few years since I last presented at Kscope, but I am back this year! I will be presenting on “Practical Essbase Web Services” – this will be my take on the new web services features from recent Essbase versions, as well as drawing on my experience developing mobile solutions, developing Essbase middle tiers with the Java API, and other approaches to extracting data from Essbase. For those of you in C# shops or wanting to get at Essbase data from your other favorite languages (I’m looking at you, PHP, Python, and Clojure), this should be a fun overview of your options. I’ll look forward to seeing you there – and if you are interested in the presentation but aren’t going to ODTUG’s Kscope, let me know!

Install Essbase Java API (jar) files as a local Maven artifact

Maven is a comprehensive build system for Java projects. A lot of people, including myself, have a love/hate relationship with Maven. The reasons for this relationship can be discussed at another time. In any case, used judiciously, it can make managing dependencies in Java projects much easier than handling them by hand.

Eclipse has pretty good Maven integration. It’s possible to setup a new project and browse for dependencies and add them automatically to your project. Everything just works. I develop quite a few Java applications that rely on the Essbase Java API, so I have imported the Essbase jar files to my local repository (since they are not available from a central public repository) to make development a breeze.

Here’s how you can do the same. First, you need to go get your Essbase jar file. These are installed on the Hyperion server. You might have to search around a little bit since the directories seem to change from release to release, but in the case of this stock Hyperion 9.3.1 server (with Hyperion installed in C:\Hyperion) they can be found at C:\Hyperion\AnalyticProviderServices\lib.

Here’s what the directory looks like on one of my machines:

Hyperion Java lib folder screenshot

Hyperion Java lib folder screenshot

Right now we’re just interested in the ess_japi.jar file. We’re going to import this in to our local machine’s Maven repository. This assumes you have Maven installed locally, of course. If not it’s pretty straightforward. Just Google around and all will be clear.

Maven is very particular about the versions of everything. It allows you to store multiple versions of files. This means that our single repository can store the files for Essbase 9.3.1,,, and so on, all next to each other. Since we’re importing this resource manually we are going to tell it the version. First though, let’s rename this local file to something more consistent with Maven naming conventions. Let’s rename it from ess_japi.jar to essbase-japi-9.3.1.jar (since this is a file from a 9.3.1 server). Change it accordingly for other versions. If this were then we would make it essbase-japi- Note that Maven “prefers” a versioning scheme of major.minor.revision but not all software (particularly Essbase) adheres to this, so we’ll do our best.

So now we have essbase-japi-9.3.1.jar. A simple command line will import this. From a command prompt in the same folder as the jar file, execute this command:

mvn install:install-file -Dfile=essbase-japi-9.3.1.jar -DgroupId=com.essbase -DartifactId=essbase-japi -Dversion=9.3.1 -Dpackaging=jar

Each -D indicates a parameter we are filling out: the name of the file, a Maven group ID (which we’ll decide to make com.essbase), what the name of the artifact itself should be (essbase-japi), the version, and lastly that it is a jar file. You’d think Maven could infer some of this for us but we only have to do this once in a blue moon so it’s not so bad. Maven will copy the file to the local repository. To make it visible from Eclipse you will likely have to rebuild your Maven repository index which is no big deal.

Essbase Jar import success

Success importing Essbase jar file

Now when we are specifying the dependencies for our projects from Eclipse, we can easily browse it by name and add it in to our Maven POM file:

Eclipse Select Essbase Jar Dependency

Eclipse Select Essbase Jar Dependency

Now we’re good to go. We can easily include this artifact in future projects quickly and easily. This is particularly useful if you happen to download the source code for some of my Essbase-related open source projects, which as of late rely on Maven for dependency management.

The Essbase Book Review to End All Reviews: Developing Essbase Applications

I received my copy of Developing Essbase Applications – Advanced Techniques for Finance and IT Professionals some time ago and spent every spare minute I could find eagerly devouring the book. I was eagerly anticipating the arrival of the book to my doorstep, hoping and wanting it to be the Essbase book to end all Essbase books. While it has its flaws, it is a must-have for the dedicated Essbase practitioner’s bookshelf. As I always say in these reviews, there are precious few Essbase books out there, so anything that helps the cause is welcome as far as I am concerned.

Developing Essbase Applications sports an impressive and diverse array of contributors. Most of the names are easily recognizable to anyone that has taken a break from writing a calc script to seek out an Essbase blog article or Google for some help on the Essbase forums. Seriously, check out this who’s who list of Essbase folks that helped create this thing: Cameron Lackpour, Dave Anderson, Joe Aultman, John Booth, Gary Crisci, Natalie Delemar, Dave Farnsworth, Michael Nader, Dan Pressman, Robb Salzmann, Tim Tow, Jake Turrell, and Angela Wilcox.

While this is probably the greatest strength of the book, it also inevitably contributes to my complaints about the book, which is not so much that it is a book, but more, a collection of books. This divide and conquer approach is probably the only realistic way to get so many people together to create such a thing, and the necessity of doing so is perhaps a reflection of the increasing breadth and scope of the Essbase ecosystem itself.

So for those of you looking to start on page 1 and work your way front to back (as I am want to do with a good programming book), you can do that, but it’s not necessary since the chapters don’t really build on each other. They just really do feel like a dozen books bound together, each with their own table of contents and style. The voice of each chapter author really shines through. In this regard, the book can be thought of as more of a reference, and indeed, some chapters are so packed full of information that there’s no reasonable way to absorb it all in one reading.

That all being said, what about the actual content of the book? It’s impressive, if slightly disjointed. In order, the chapters cover the following: Essbase infrastructure, tackling bad data, Essbase Studio, BSO, converting BSO to ASO, MDX, ASO and performance, Essbase Java API, system automation with Groovy, Advanced Smart View, and how to successfully manage an Essbase system.

Some chapters are more useful than others, and some chapters are definitely “stronger” than others. I can easily see anyone reading the book and being asked to rank their chapters in order of preference or usefulness coming up with completely different rankings from anyone else.

The way in which chapters convey their information to the reader also varies significantly from chapter to chapter. There are various degrees of difficulty that the reader has to go through to extract the useful information from the chapters of the book. This thing is fill with gold and diamonds but its up to the reader, in no small part, to extract it, as it is rarely handed over on a silver platter.

For example, starting things off is John Booth’s chapter about Essbase infrastructure. Clearly, John is a very smart guy witha lot of experience, a recent Oracle ACE who graciously donates his time in the form of numerous forum posts and even creating Amazon EC2 images of Essbase servers. However, his chapter , but I found that his chapter  reads more like an animated brain dump of Essbase infrastructure. I imagine having a beer with John and saying “Tell me everything you know about infrastructure” and Chapter 1 is the result of that conversation. At the end of the day, I learned a lot, but as far as book form is concerned, would have appreciated more structure and progression to the manner in which the information was doled out.

Cameron picks things up in Chapter 2 which is ostensibly about slaying bad data in Essbase, a ubiquitous problem for any database administrator. While this is ostensibly the focus of the chapter, after a few pages, Cameron takes an immediate left turn into all things ODI and covers that. While also useful and insightful, as pedantic as it sounds, I found myself wishing the chapter had just been made to focus on ODI itself.

The “voice” in each chapter also varies, and in many instances while a chapter ostensibly wants to explain some concept to you, seems to talk as if you already know the concept. Going along again with my “you have to mine the information” analogy, I would say it’s also the responsibility of the reader to have been a somewhat astute observer of various acronyms and terms in the Essbase ecosystem even if they aren’t familiar with them beyond being able to simply unwind the acronym.

The book covers the Essbase Java API and brings in Mr. Essbase Java API himself, Tim Tow to help out. As an also experienced Essbase Java API guy myself, I certainly read this chapter very enthusiastically, and Tim does not disappoint. That being said, however, in my experience I haven’t found a lot of overlap between the Essbase administrator and Java programmers (much to my chagrin). So while I welcome the content, in some ways I question its usefulness to the average Essbase administrator. Of course, I could be mistaken and for the least part, exposing more people to the Java API in the hopes that they will pick it up and do something cool with it (as I have tried to do myself with various Essbase-related open source projects) is nevertheless extremely welcome.

The rest of the chapters round out topics from MDX to Groovy. Gary Crisci is the quintessential MDX practitioner, having presented on it and posted many an Essbase forum comment on it and does a nice job, as always, of conveying information on a topic that can be challenging. I have looked at Groovy in the past but haven’t made the jump just yet (too busy with Clojure and Go, I guess) but was found it to be an enjoyable read for what surely must be an uncommon, if esoteric, way to automate Essbase.

Lastly, on a purely aesthetic and non-functional note, I would like to say that I would have enjoyed more and better visuals in the book, in fact, the formatting of the book leaves something to be desired. For a tech book I would have loved to see something more modern than Times New Roman (for example, the formatting in O’Reilly and Apress books is second to none). Many of the tables and figures are laid out so as to be unhelpful and awkwardly cram information into a weird presentation, and a perennial pet peeve of mine is inconsistent framing, cropping, and styling of screenshots. Of course, this is all pretty predantic stuff and has more to do with my almost closet obsession with fonts and data visualization.

All that being said, all of the little things that I didn’t like about the book or thought could be better are  easily and handily overshadowed by all of the things that I did like about the book. This thing is jam-packed full of content and, perhaps more importantly, content that is based on the vast experience of some of the top Essbase people in the industry. These aren’t people pretending to know their stuff or trying to fool you, this is people at the top of their game in the top of their field that are sharing some top shelf knowledge with the rest of us. You just have to dig a little to get at it. At $69.99 list price on Amazon, this book is a little bit pricey for the average tech book. But you know what? Even if you only got one or two useful things out of it, it would absolutely be worth the price. This is an absolutely essential book for anyone that is serious about using Essbase and being a successful administrator or consultant. On a classic scale from one to five stars, I easily give this four stars.

Book review: Oracle Essbase 11 Development Cookbook

I was recently given the opportunity to review another Essbase book from Packt: Oracle Essbase 11 Development Cookbook by Jose Ruiz. Overall I would say I am pleased with the book. It covers a lot of ground and a lot of disparate tools, many of which are scantily documented elsewhere.

Before I really get into the review, I must say that I have never been a big fan of the approach that technology cookbooks take. I’m also not a huge fan of having a book for a specific version of software. Of course, in order for the cookbook approach to work you don’t have a choice but to tie to a version of software. This is because the recipes are sequential and very explicit — as with cooking  a recipe in real life — and rely on the exact version of the software in order for the detailed steps of the recipe to work. I’ve grown up with software, and am a cross between a visual and a kinesthetic learner, so my preference is to have concepts and goals explained to me, then to go exploring on my own. To this end, I find technology/recipe books to be tedious as they laboriously lay out the steps: click this, then click that, enter this text in, and 15 steps later you have a result.

So, my personal preference for book styles aside, this book largely succeeds for what it is: specific, methodical ways to perform a certain task. You won’t get a lot of explanation on why you might do something a certain way. In this regard, the book is useful as a complement to your Essbase literature rather that the place you would go to understand why you might want to accomplish some task.

Okay, now that I have beat up on that horse enough.

As I said, I enjoyed the breadth of content in the book. There are detailed recipes for setting up your relational data store to load a cube with EIS and Essbase Studio, building load rules and loading data to BSO/ASO cubes, writing calc scripts, working with Star Analytics, using EAS, HFR, writing MaxL scripts, and provisioning security. It even covers working with the revered Outline Extractor tool.

All of this content was really nice to see in book form. One of the upsides to the recipe format book is that it won’t spend a lot of time laboring over what a cube is and your first steps retrieving data with Excel. In fact, the book even says it’s not for beginners. It just jumps right in. I think this book can be a very handy reference for someone that needs something a little more guided than the technical reference (and less heavy).

On my arbitrary rating system, I would give this book a four out of five star rating. And again, that’s me trying to be fair to the book even though I’m not in love with this format, but it largely accomplishes what it sets out to do. I’d say it’s a great addition to the pragmatic Essbase developer’s library, but certainly not the only book in it.

Book review: Oracle Essbase 9 Implementation Guide

There’s a new Essbase book out, and I was fortunate enough to receive a copy to review.   Glenn Schwartzberg already beat me to the punch with his review, which I would say is fairly thorough as well as fair.  There are precious few Essbase books out there, so the more the merrier, I say.

The book starts out with the basics, covering some of the history of Essbase, and explaining multi-dimensionality, then walks you through the install process.  Obviously, based on the title, you can tell that this book is for System 9, whereas the current version is 11 (or “eleven dot one dot something dot ahh screw it” as I like to say).  While this means that the specific installation instructions are different, from a conceptual standpoint, everything else is quite similar.  In fact, the core of Essbase has changed precious little over the years, so even people in shops running Essbase 7 or 6 could benefit as well.

After installation, a tour of Essbase through EAS is given, using a fictitious automobile company and its databases to illustrate the concepts.  It’s kind of nice to see a change from the usual The Beverage Company that we’re all so familiar with.  “Write what you know,” they say.  As one of the authors works for Ford, it’s not surprising that the examples are somewhat of a thinly veiled obfuscation of some real-world databases, which is reminiscent of my own efforts to scrub examples from a certain large grocery company.

The rest of the book proceeds with a methodical tour of various Essbase components, including outlines, dimension-build load rules, loading data, calc scripts, using the Excel add-in, report scripts, automation, ASO considerations, and SmartView.

As the book is geared more for the OLAP novice, I did not personally get a lot out of the book.  I am still waiting for a book to come out along the lines of “Effective Essbase for those with a slightly unhealthy obsession with analytics” or something similar.  There are some technical details in the book that seem a bit off, but may be the author’s attempt to gloss over some arcane specifics for the newbie. So if you aren’t familiar with Essbase, or are just familiar with the basics, this could be a very helpful book for you. As I mentioned, there are precious few Essbase books out there, so I welcome any and all attempts to distribute knowledge from the OLAP grandmasters to those that would take the time to read it.

You can find this book online at Packt’s website, where you can find a sample chapter as well.

A comprehensive Essbase automation optimization story, or: Shaving off every last second

I’ve been wanting to write this post for awhile.  Like, for months.  Some time ago I took an in-depth look at profiling the performance (duration-wise) of an automated system I have.  The system works great, it’s worked great, I don’t need to touch it, it just does its thing.  This is the funny thing about systems that live in organizations: some of the rather inglorious ones are doing a lot of work, and do it reliably.  Why reinvent the wheel and risk screwing things up?

Well, one reason is that you need to squeeze some more performance out of that system.  The results of my profiling efforts showed that the process was taking about an hour and a half to run.  So, considering that the processing of the data is an hour behind the source system (the source system drops off text files periodically for Essbase), and it takes an hour and a half to run, the data in the cubes is some two and a half hours behind the real world numbers in the original system.

So, first, a little background on what kind of processing we have here.  The gist of how this particular system works is that a mainframe manages all of the data, and certain events trigger updated files to get delivered to the Essbase server.  These files are delivered at predictable intervals, and automation jobs are scheduled accordingly (about an hour after the text files are scheduled to be delivered, just to be on the safe side).  The data that comes in is typical financial data — a location, a time period, a year, an account, and an amount.

Pretty simple, right?  Well, there are about twenty cubes that are financial in nature that are modeled off of this data.  The interesting thing is that these cubes represent certain areas or financial pages on the company’s chart of accounts.  Many of the pages are structurally similar, and thus grouped together.  But the pages can be wildly different from each other.  For this reason, it was decided to put them in numerous cubes and avoid senseless inter-dimensional irrelevance.  This keeps the contents of the cubes focused and performance a little better, at the expense of having to manage more cubes (users and admins alike).  But, this is just one of those art versus science trade-offs in life.

Since these cubes add a “Departments” dimension that is not present in the source data, it is necessary to massage the data a bit and come up with a department before we can load the raw financial data to a sub-cube.  Furthermore, not all the cubes take the same accounts so we need some way to sort that out as well.  Therefore, one of the cubes in this process is a “staging” database where all of the data is loaded in, then report scripts are run against certain cross sections of the data, which kick out smaller data files (with the department added in) that are then loaded to the other subsequent cubes.  The staging database is not for use by users — they can’t even see it.  The staging database, in this case, also tweaks the data in one other way — it loads locations in at a low level and then aggregates them into higher level locations.  In order to accomplish this, the staging database has every single account in it, and lots of shared members (accounts can be on multiple pages/databases).

That being said, the database is s highly sparse.  For all of these reasons, generating reports out of the staging database can take quite a bit of time, and this is indicated as the very wide brown bar on the performance chart linked above.  In fact, the vast majority of the total processing time is just generating reports out of this extremely sparse database.  This makes sense because if you think about what Essbase has to do to generate the reports, it’s running through a huge section of database trying to get what it wants, and frankly, who knows if the reports are even setup in a way that’s conducive to the structure of the database.

Of course, I could go through and play with settings and how things are configured and shave a few seconds or minutes off here and there, but that really doesn’t change the fact that I’m still spending a ton of my time on a task, that quite simply, isn’t very conducive to Essbase.  In fact, this is the kind of thing that a relational database is awesome at.  I decided, first things first, let’s see if we can massage the data in SQL Server, and load up the databases from that (and get equivalent results).

The SQL Server staging database is relatively simple.  There is now a table that will have the data loaded to it (this is the data that was loaded to the staging cube with a load rule).  The other tables include a list of financial pages (these are the departments on the sub-cubes, so to speak), a table for the list of databases (you’ll see why in a minute), a table for linking the departments/pages to a particular database, a table linking accounts to departments (pages), a page-roll table to manage the hierarchy of how pages aggregate to “bigger” pages, and a location/recap table that links the different small locations to their bigger parent locations (this is the equivalent of the location summing from the original staging database).

With all these tables in place, it’s time to add a few special views that will make life easier for us (and our load rules):

     , P.physical_page
     , A.account
     , K.div
     , K.yr
     , K.pd
     , K.amt
     account_dept AS A INNER JOIN
     jac_sum_vw AS K ON A.account = K.account INNER JOIN
     page P ON A.G_id = P.G_id INNER JOIN
     page_database AS G ON A.G_id = G.G_id INNER JOIN
     dbase AS D ON G.database_id = db.database_id

Obviously I didn’t provide the schema for the underlying tables, but this still should give you a decent idea of what’s going on.  There’s another view that sits on top of this one that takes care of the summing of locations for us, but it’s structurally similar to this one, so no surprises there.  What I end up with here is a view that has columns for the database name, the department, the account, location, year, period, and the amount.  So not only is this perfect for a fact table down the road when I convert this to EIS, I can also use the exact same view for all of my databases, and a very similar load rule in each database that simply references a different database.

This all took a day or two to setup, but after I started getting data how I wanted, I got pretty stoked that I was well on my way to boosting performance of this system.  One of the nice things about the way it’s setup in the relational database is also that it’s very flexible — extra databases, departments, locations, and anything else can be added to one central place without too much trouble.

Interestingly enough, this entire system is one that’s been in production for years but the “test” copy of things was sort of forgotten about and completely out of sync with production.  Since I was getting closer to the point where I was ready to load some cubes, I needed to sync test to prod (oddly enough) so I could do all testing without hosing things up.  Basically I just copied the apps from the production server to the test server with EAS, copied the existing automation folder, changed some server names and passwords (which were all hard-coded, brilliant…), and was pretty much good to go.   On a side note, I took the opportunity to rewrite the automation in modular, test-ready form (basically this involved cleaning up the old hard-coded paths and making it so I could sync test to prod much easier).

My next step was to design a load rule to load data up, to make sure that the structure of the tables and views was sufficient for loading data.  I did a normal SQL load rule that pulled data from the view I setup.  The load rule included some text replacements to adapt my version of departments to the actual alias name in the database, but otherwise didn’t need anything major to get working.  I loaded one period of data, calculated, saw that I was off a bit, researched some things, tweaked the load rule, recalculated, and so on, until finally the numbers were tying out with what I was seeing.  Not bad, not bad at all.

After this basic load rule was working correctly, I started gutting the automation system to clean things up a bit more and use the new load rule (with the old calcs).  After I felt good about the basic layout of things and got to the point where I was ready to copy this out to the other 18 or however many cubes.  Then I put some hooks in the automation to create a log file for me, so I can track performance.

For performance profiling, I used a technique that I’ve been using for awhile and have been quite happy with.  I have a small batch file in the automation folder called essprof.bat that has this in it:

@For /f "tokens=2-4 delims=/ " %%a in ('date /t') do @set FDATE=%%c-%%a-%%b
@echo %2,%FDATE% %TIME% >> %1

Basically when you call this file from another batch file, you tell it what file you want to append the data to, what the step should be named, and it takes care of the rest.  Here is how I might call it from the main automation script:


The PROFFILE variable comes from a central configuration file and points to a location in the logging folder for this particular automation set.  For this particular line, I would get output like the following:

START-PROCESS,2009-07-31  8:57:09.18

The For loop in the batch file parses the DOS/shell date command a little bit, and together with the TIME variable, I get a nice timestamp — one that, incidentally, can be read by Excel very easily.

So what’s this all for?  As I did in the original performance chart, I want to profile every single step of the automation so I can see where I’m spending my time.  Once I filled out the rest of the load rules and ran the process a few times to work out the kinks, I now had a good tool for analyzing my performance.  Here is what one of the initial runs looked like:

start-db02-process 	10:54 AM
finish-db02-process 	10:54 AM
start-db03-process 	10:54 AM
finish-db03-process 	10:57 AM
start-db03GP-process 	10:57 AM
finish-db03GP-process 	10:58 AM
start-db04-process 	10:58 AM
finish-db04-process 	11:13 AM
start-db05-process 	11:13 AM
finish-db05-process 	11:13 AM
start-db06-process 	11:13 AM
finish-db06-process 	11:13 AM
start-db07A-process 	11:13 AM
finish-db07A-process 	11:13 AM
start-db07D-process 	11:13 AM
finish-db07D-process 	11:14 AM
start-db07L-process 	11:14 AM
finish-db07L-process 	11:14 AM
start-db07M-process 	11:14 AM
finish-db07M-process 	11:14 AM
start-db07T-process 	11:14 AM
finish-db07T-process 	11:14 AM
start-db08-process 	11:14 AM
finish-db08-process 	11:14 AM
start-db09-process 	11:14 AM
finish-db09-process 	11:14 AM
start-db10-process 	11:14 AM
finish-db10-process 	11:14 AM
start-db11-process 	11:14 AM
finish-db11-process 	11:14 AM
start-db12-process 	11:14 AM

And how do we look?  We’re chunking through most of the databases in about 20 minutes.  Not bad!  This is a HUGE improvement over the hour and a half processing time that we had earlier.  But, you know what, I want this to run just a little bit faster.  I told myself that if I was going to go to the trouble of redoing the entire, perfectly working automation system in a totally new format, then I wanted to get this thing down to FIVE minutes.

Where to start?

Well, first of all, I don’t really like these dense and sparse settings.  Scenario and time are dense and everything else is sparse.  For historical reasons, the Scenario dimension is kind of weird in that it contains the different years, plus a couple budget members (always for the current year), so it’s kind of like a hybrid time/scenario dimension.  But, since we’ll generally be loading in data all for the same year and period at once (keep in mind, this is a period-based process we’re improving), I see no need to have Scenario be dense. Having only one dense dimension (time) isn’t quite the direction I want to go in, so I actually decided to make location a dense dimension.  Making departments dense would significantly increase my inter-dimensional irrelevance so Location seems like a sensible choice — especially given the size of this dimension, which is some 20 members or so.  After testing out the new dense/sparse settings  on a couple of databases and re-running the automation, I was happy to see that not only did the DB stats look good, but the new setting was helping improve performance as well.  So I went ahead and made the same change to all of the other databases, then re-ran the automation to see what kind of performance I was getting.

End to end process time was now down to 12 minutes — looking better!  But I know there’s some more performance out there.  I went back to the load rule and reworked it so that it was “aligned” to the dense/sparse settings.  That is, I set it so the order of the columns is all the sparse dimensions first, then the dense dimensions.  The reason for this is that I want Essbase to load all the data to a single data block that it can, and try to minimize the number of times that the data block is loaded to memory.

Before going too much further I added some more logging to the automation so I could see exactly when the database process started, when it ran a clearing calc script, loaded data, and calculated again.

Now I was down to about 8 minutes… getting better.  As it turns out, I am using the default calculation for these databases, which is a CALC ALL, so that is a pretty ripe area for improvement.  Since I know I’m loading data all to one time period and year, and I don’t have any “fancy” calcs in the database, I can rewrite this to fix on the current year and period, aggregate the measures and departments dimensions, and calc on the location dimension.  By fancy, I’m referring to instances were a simple aggregation as per the outline isn’t sufficient — however, in this case, just aggregating works fine.  I rewrote the FIX, tested it, and rolled it out to all the cubes.  Total end to end load time was now at about four minutes.

But, this four minute figure was cheating a little since it didn’t include the load time to SQL, so I added in some of the pre-Essbase steps such as copying files, clearing out the SQL table, and loading in the data.  Fortunately, all of these steps only added about 15 seconds or so.

I decided to see if using a more aggressive threads setting to load data would yield a performance gain — and it did.  I tweaked essbase.cfg to explicitly use more threads for data loading (I started with four threads), and this got total process time down to just under three minutes (2:56).

As an aside, doing in three minutes what used to take 90 would be a perfectly reasonable place to stop, especially considering that my original goal was to get to five minutes.

But this is personal now.

Me versus the server.

I want to get this thing down to a minute, and I know I still have some optimizations left on the table that I’m not using.  I’ve got pride on the line and it’s time to shave every last second I can out of this thing.

Let’s shave a few more seconds off…

Some of the databases don’t actually have a departments dimension but I’m bringing a “dummy” department just so my load rules are all the same — but why bring in a column of data I don’t need?   Let’s tweak that load rule to skip that column (as in, don’t even bother to bring it in from SQL) on databases that don’t have the departments dimension.  So I tweaked the load rule got the whole process down to 1:51.

Many of these load rules are using text replacements to conform the incoming department to something that is in the outline… well, what if I just setup an alternate alias table so I don’t even have to worry about the text replacements?  It stands to reason, from an Essbase data load perspective, that it’s got to cycle through the list of rows on the text replace list and check it against the incoming data, so if I can save it the trouble of doing that, it just might go faster.  And indeed it does: 1:39 (one minute, thirty nine seconds).  I don’t normally advocate just junking up the outline with extra alias tables, but it turns out that I already had an extra one in there that was only being used for a different dimension, so I added my “convenience” aliases to that.  I’m getting close to that minute mark, but of course now I’m just shaving tiny bits off the overall time.

At this point, many of the steps in the 90-step profiling process are taking between 0 and 1 seconds (which is kind of funny, that a single step starts, processes, and finishes in .3 seconds or something), but several of them stand out and take a bit longer.  What else?

I tried playing with Zlib compression, thinking that maybe if I could shrink the amount of data on disk, I could read it faster into memory.  In all cases this seemed to hurt performance a bit so I went back to bitmap.  I have used Zlib compression quite successfully before, but there’s probably an overhead hit I’m taking for using it on relatively “small” database — in this case I need to get in and get out as fast as I can and bitmap seems to allow me to do that just a little faster, so no Zlib this time around (but I still love you Zlib, no hard feelings okay?).

Okay, what ELSE?  I decided to bump the threads on load to 8 to see if I got more of a boost — I did.  Total load time is now at 1:25.

The SQL data load using bcp (the bulk load command line program) takes up a good 10 seconds or so, and I was wondering if I could boost this a bit.  bcp lets you tweak the size of the packets, number of rows per batch, and lock the table you’re loading to.  So I tweaked the command line to use these options, and killed another few seconds off the process — down to 1:21.

Now what?  It turns out that my Location dimension is relatively flat, with just two aggregated members in it.  I decided that I don’t really need to calculate these and setting them as dynamic would be feasible.  Since location is dense this has the added benefit of removing two members from my stored data block, or about 10 percent in this case.  I am definitely approaching that area where I am possibly trading query performance just for a little bit faster load, but right now I don’t think I’m trading away any noticeable performance on the user side (these databases are relatively small).

Just reducing the data blocks got me down to 1:13 or so (closer!), but since there are now no aggregating members that are stored in the Location dimension, I don’t even need to calculate this dimension at all — so I took the CALC DIMs out of all the calc scripts and got my calc time down further to about 1:07.

Where oh where can I find just 7 more seconds to shave off?  Well, some of these databases also have a flat department structure, so I can definitely shave a few seconds and save myself the trouble of having to aggregate the departments dimension by also going to a dynamic calc on the top level member.  So I did this where I could, and tweaked the calcs accordingly and now the automation is down to about 1:02.

And I am seriously running out of ideas as to where I can shave just a COUPLE of more seconds off.  Come on Scotty, I need more power!

Well, keeping in mind that this is a period process that runs numerous times during closing week, part of this process is to clear out the data for the current period and load in the newer data.  Which means that every database has a “clear” calc script somewhere.  I decided to take a look at these and see if I could squeeze a tiny bit of performance out.

I wanted to use CLEARBLOCK because I’ve used that for good speedups before, but it’s not really effective as a speedup here because I have a dense time dimension and don’t want to get rid of everything in it (and I think CLEARBLOCK just reverts to a normal CLEARDATA if it can’t burn the whole block).  However, there were still some opportunities to tweak the clearing calc script a little so that it was more conducive to the new dense and sparse settings.   And sure enough, I was able to shave .1 or .2 seconds off of the clear calc on several databases, giving me a total process time of……. 59.7 seconds.   Made it, woot!

Along the way I tried several other tweaks here and there but none of them gave me a noticeable performance gain.  One change I also made but seems to be delivering sporadic speed improvements is to resize the index caches to accommodate the entire index file.  Many of the indexes are less than 8 megabytes already so they’re fine in that regard, some of them aren’t so I sized them accordingly.  I’d like to believe that keeping the index in memory is contributing to better performance overall but I just haven’t really been able to profile very well.

After all that, in ideal circumstances, I got this whole, previously hour-and-a-half job down to a minute.  That’s not bad.  That’s not bad at all.  Sadly, my typical process time is probably going to be a minute or two or longer as I adjust the automation to include more safety checks and improve robustness as necessary.  But this is still a vast improvement.  The faster turnaround time will allow my users to have more accurate data, and will allow me to turn around the databases faster if something goes wrong or I need to change something and update everything.  In fact, this came in extremely useful today while I was researching some weird variances against the financial statements.  I was able to make a change, reprocess the databases in a minute or so, and see if my change fixed things.  Normally I’d have to wait an hour or more to do that. I may have optimized more than I needed to (because I have an almost unhealthy obsession with performance profiling and optimization), but I think it was worth it.  The automation runs fast.

So, my advice to you: always look at the big picture, don’t be afraid to use a different technology, get metrics and refer to them religiously (instead of “hmm, I think it’s faster this way), try to think like Essbase — and help it do its job faster, and experiment, experiment, experiment. Continue Reading…

The %COMPUTERNAME%/MaxL trick for syncing test to production

There’s an automation trick I’ve been using for awhile that I like.  Having talked about automation and such with other people at ODTUG this year, it seems that several people are using this technique or a variant of it on their own systems.

Basically, the idea is that you want to be able to sync your production automation scripts from your test server as easily as possible.  The number one cause of scripts not working after copying them from test to production is because they have some sort of hard-coded path, filename, server name, user id, password, or other value that simply doesn’t work on the server you are dropping your scripts onto.

Therefore, you want to try and write your automation scripts as generically as possible, and use variables to handle anything that is different between test and production.  As an added bonus for making the sync from test to prod just a little bit easier, why not dynamically choose the proper configuration file?

Assuming you are running on Windows (the same concept will work on other platforms with some tweaks for your local scripting environment), one way to handle it is like this: your main script is main.bat (or whatever).  One of the environment variables on a windows server is the COMPUTERNAME variable.  Let’s say that your test server is essbase30 and your production server is essbase10.  On the production server, COMPUTERNAME is essbase10.

Knowing that we can set environment variables in a batch file that will be available in MaxL, we could setup a file called essbase30.bat that has all of our settings for when the script runs on that server.  For example, the contents of essbase30.bat might be this:

SET ESSPW=password

From main.bat, we could then do this:

cd /d %~dp0
essmsh cleardb.msh

Assuming that the two batch files and the cleardb.msh are in the same folder, cleardb.msh could contain the following MaxL:

login $ESSUSR identified by $ESSPW on $ESSSERVER;
alter database Sample.Basic reset data;

Now for a little explanation.  Note that in essbase30.bat I am explicitly setting the name of the server.  We could assume that this is localhost or the COMPUTERNAME but why not set it here so that if we want to run the script against a remote server, we could do that as well (note that if we did run it remotely, we’d have to change the name of our batch file to match the name of the server running the script).  In general, more flexibility is a good thing (don’t go overboard though).  The first line of main.bat (the cd command) is simply a command to change the current directory to the directory containing the script.  This is handy if our script is launched from some other location — and using this technique we don’t have to hard-code a particular path.

Then we use a call command in the batch file to run the batch file named %COMPUTERNAME%.bat, where %COMPUTERNAME% will be replaced with the name of the computer running the automation, which is in this case essbase30.  The batch file will run, all of the SET commands inside it will associate values to those environment variables, and control flow will return to the calling batch file, which then calls essmsh to run cleardb.msh (note that essmsh should be in your current PATH for this to work).  The file cleardb.msh then runs and can “see” the environment variables that we set in essbase30.bat.

If we want to, we can set variables for folder names, login names, SQL connection names/passwords, application names, and database names.  Using this technique can make your MaxL scripts fairly portable and more easily reusable.

In order to get this to work on the production server, we could just create another batch file called essbase10.bat that has the same contents as essbase30.bat but with different user names and passwords or values that are necessary for that server.

For all you advanced batch file scripters out there, it may be necessary to use a setlocal command so the variables in the batch file don’t stomp on something you need that is already an environment variable.  As you can see, I’m a big fan of the %COMPUTERNAME% technique, however, there are a few things to watch out for:

  • You might be putting passwords in clear text in a batch file.  You can alleviate some of this by using MaxL encryption, although I haven’t actually done this myself.  The folder with these files on my servers already have filesystem level security that prevent access, and for the time being, this has been deemed good enough.
  • It’s difficult with this technique to run automation for test and prod from the same server (say, some central scheduling server).  It’s not too difficult to address this, though.
  • If you run the automation from a remote server instead of the Essbase server itself, you may end up with lots of different config files — in which case, you might want to adjust this technique a bit.

As for deploying your test files to your production server, if you’ve set everything up in a generic way, then simply changing the variables in the configuration file should allow you to run the exact same script on each server.  Therefore, your deploy method could literally be as simple as copying the folder from the test server to the production server.  I’m actually using Subversion to version control the automation systems now, but a simple file copy also works just as well

ODTUG 2009 Recap

I started writing this recap at the airport in San Jose, waiting for my flight back to Seattle, but the article got a bit longer than I thought so I’m just not finishing it up here on Monday.  As expected, the conference itself was great, the food was even better, and it was fun meeting all the people.  One of the things that the ODTUG organizers pointed out was this is one of the very few conferences that has grown in attendance this year.  It’s very easy to believe that due to economic conditions,  companies are cutting back on travel dollars — and I think it’s a testament to the quality of content [and food] of ODTUG that even with these conditions, the attendance has grown.

I flew in Saturday evening.  Due to scheduling issues, I wasn’t able to get in earlier and help pull “evil ice plant” (an invasive plant species) from one of the local nature areas, but it’s my understanding that the other ODTUG attendees pulled an absolutely ginormous amount of the stuff, so kudos to my colleagues for that.

The conference-proper started on Sunday morning and as expected there were lots of important people talking about roadmaps and future plans and synergy and all of that fun stuff.  That’s all well and good, but it can be a little difficult for me to get jazzed up about such things because they are still months away from being released, and on top of that, it would be even longer until I could get a hold of it, understand it, and put it to use, not to mention the fact that new releases of software (particularly in the Hyperion world) tend to be a little rough around the edges (to be charitable).  The general consensus, though, seems to be that Oracle is dumping a lot more money on R&D, development, and polishing things up, so one is hopeful that new releases earn a better reputation than their predecessors.

One of the more entertaining aspects of Sunday was Toufic Wakim giving a presentation on the state of Smart View, which was not only funny because of Toufic’s energetic and humorous delivery style, but also because during the presentation, the audience was quite vocal about the shortcomings of the product and why they can’t roll it out to their users.  As has happened before, much love was expressed for the power and simplicity of the classic Excel add-in.  Toufic seemed very interested in collecting and recording people’s feedback so that future releases of Smart View will be better.

It was hard to go more than a few minutes during the conference without hearing about Twitter and tweets and everything else along those lines.  Twitter was extensively embraced.  I’m not really much of a Twitter guy so I didn’t really follow the conference in that way.  It did, however,  seem like the usage and mentioning of Twitter died down significantly as the conference went on, which I can’t really say was a bad thing.  If nothing else, it allowed for more bandwidth to the conference and hotel rooms, which was grossly inadequate, to say the least (with thousands of geeks pounding the network, transfer speeds were dismal).  Apparently the checklist for next year’s ODTUG involves verifying that the hotel can accommodate all of the net traffic that is sure to occur at such a conference.

Among the conferences I attended on Monday was Ed Roske doing one of his calc script presentations, Matt Millella’s Top 5 Essbase CDFs, and “Building Sustainability Dashboards with Oracle EPM.” All were interesting and the CDF presentation really hit home since I’ve been using CDFs lately, and have a toolchain setup to roll new functionality into Jar files and deploy it.

Monday was also the day that I gave my presentation on Dodeca during the vendor slot.  Tim Tow and I had a good sized audience while I showed how Dodeca has been a good fit for my company.  It was the first presentation I’ve given (ever), so although I’d say it generally went okay, now that I have a couple presentations under my belt I am feeling more comfortable and thinking of ways to improve them.  I suspect that as with most things, the more you do it the better you get.  So for anyone out there that hasn’t given a presentation and has been thinking about it but couldn’t quite pull the trigger, I whole heartedly encourage you to throw your name in the hat and give it a shot.

On Tuesday I checked out Ruddy Zucca and Steve Liebermensch talking about ASO.  I had seen Steve’s presentation on ASO last year as well.  I had gone home from ODTUG New Orleans with a list of things to try out to make some of my ASO cubes a little snappier, and I am happy to say that I am going home from this ODTUG as well with a few more ideas on how to squeeze a little more performance out of my ASO cubes — and improve the Essbase reporting experience for my users.

As an aside, the week of the conference was “closing week” at work, meaning that the fiscal period is over and mountains of data need to be loaded up to various databases.  Normally taking time off during closing week is almost impossible, but the automation systems are pretty dialed in so we thought it’d be okay.  Although things largely went off without a hitch, there were a couple of small issues so I had to skip a few sessions to VPN to work and check in on things.  In most cases, re-running a job seemed to fix things up.  So I’m very happy to say that I have just about automated myself out of a job.  Oh wait…

On Wednesday I checked out Glenn Schwartzberg talking about “Little Used Features in Essbase,” which was pretty interesting.  He talked about Triggers and all sorts of miscellaneous things that don’t quite warrant their own session but combined together made for an interesting presentation.  He didn’t quite get to Data Mining, which could warrant a session all on it’s own.  Oddly enough, it’s now my understanding that the data mining features of Essbase were supposed to be removed from the latest release (presumably because this functionality will be provided elsewhere in the Oracle stack).  It came as a surprise to at least one Oracle employee that the functionality in question was still around. (Oops!)

Later in the day I presented on Optimizing Loads and Builds with Kevin Corlett.  Kevin was an awesome co-presenter and had a ton of  great content to add to the presentation.  As with my Dodeca presentation, I felt pretty good about the content and I’m confident that if I presented again I could make it even better.  The original abstract I wanted to present on was about not just loads and builds but about taking an entire automation process and squeezing every last drop of performance out of it.  I currently have a job that takes about an hour and a half to run and one of my projects this summer is to rebuild it to get the processing time down a bit.  I am fairly confident that I can get this below 5 minutes.  So with any luck, next year I’ll be able to present on this.

After my presentation I checked out Rohit Amaranth’s presentation about Essbase in the Clouds.  This was a very interesting topic.   The idea is that instead of buying or leasing your own hardware, you put Essbase in the cloud.   “In the cloud” is a bit of a buzzword right now, so Rohit took a moment to explain what it was and what it wasn’t.  My big takeaway is that the cloud computing model treats computational resources as elastic and a platform to build on, and this can significantly alter the economics of utilzing computing capacity.  Instead of having to go through the process of acquiring or leasing hardware that you are “stuck” with, why not rent out the service and pay for what you use.  Although there are still some licensing issues to sort out with Oracle, I am extremely interesting in seeing how this model could be used to setup test servers.

The next thing I did on Wednesday was sit on the Optimization Roundtable with Ed Roske, Steve Liebermensch, and Cameron Lackpour.  Glenn Schwartzberg was the moderator and went around the room letting people ask their optimization questions.  The content of the questions tended to focus largely on hardware issues.  Although I find hardware interesting and necessary to provide a service to my users, I don’t spend a lot of time dwelling on it since for the most part ‘It Just Works’ and performance issues [for me] tend to be dealt with on the software side.  So, sadly, I didn’t have much of a chance to show off much Essbase knowledge, although Steve wound up fielding a bunch of ASO questions and providing some insight on things as a current Oracle employee.

Finally, Thursday was a light day filled with “Best of” sessions from earlier in the week.  There was a bit of a mixup for the first session so a bunch of us that had gathered for that wound up having a bit of an adhoc session discussing various Essbase topics.  In a lot of ways this was actually one of the more fun sessions.  After that I went to Cameron’s session on mastering Essbase with MaxL automation.  Although he didn’t think I’d pick much out of it, I actually did.  For example, I now know what MaxL stands for (probably) and saw a couple of neat things on encrypting MaxL files.  Plus, as an added bonus, I got to heckle him during his presentation, and that alone made it worth it.

Last, but certainly not least, I went to Angela Wilcox’s session on automation with Perl scripting.  This topic is interesting to me for a number of reasons, one, I used to be quite the Perl programmer back in the day (although sadly I can barely do a Hello World program anymore), and two, getting up an running with Perl for Essbase can be a formidable challenge — especially due to the scant documentation provided by Hyperion/Oracle.  The presentation was quite interesting and it looks like her group has done some nice things to simplify their automation.  I am definitely going to take a look at where I could simplify some of my systems with some Perl scripts.

So, that was the conference in a nutshell for me.  For all of you that couldn’t attend, it’s my understanding that the ODTUG website will have the slides up within 90 days so you’ll be able to check out the content then.  If you’d like my slides from the Data Load Optimization presentation, just shoot me an email and I’d be happy to send them your way.

An Essbase Optimization Story

I got a call the other day from one of my fellow Essbase developers. One of the cubes in their budgeting system was taking a long time to finish calculations, whereas just days before the calculation time was just fine.  I have an almost unhealthy obsession with cube performance and optimization, so I jumped in to take a look.

The first thing I looked at (although not generally the first thing I look at) was the average cluster ratio.  The average cluster ratio is roughly analogous to the fragmentation of your hard drive: data gets added, removed, and the performance with regard to accessing that data is increasingly suboptimal.  Although the configuration of the sparse and dense dimensions in a BSO cube in theory dictates where data gets placed in the corresponding data files, quite often, for performance reasons, the data files will just grow to accomodate the new data instead of putting it where it “should” go.  A cluster ratio of 1.00 is optimal.  In this case, the database with performance issues was reporting a significantly lower cluster ratio.  So the first order of business was to export all of the data in the cube to a raw export file, clear the cube, then reimport.

This helped performance a little bit, but not much.  Something else needed to be done.  I checked out the index and page files in the cube directory to discover that the index files were almost four gigabytes, with about 20 gigabytes of page files.  This size of data isn’t inherently difficult to work with, but it does warrant a look into the dense and sparse dimension settings,  the calc script that is having issues, and trying to understand the way in which the database data is being used.

The database is a six dimension cube with the following dimensions: Scenario, Years, Time, Location, Customer, Commodities.  Time and Scenario are marked as dense.  I can live with Time being dense, especially since it’s just periods rolling to quarters rolling to a fiscal year (with non-Level 0 members tagged as Dynamic Calc), but the dense Scenario dimension didn’t quit jive with me.  In this particular database, it turns out that the reason for the block explosion and subsequent poor calc performance is that users and loading in tons of data to the Forecast member in Scenario.  Scenario also has four other dense members.  In addition to the cube being to to create blocks new blocks on equations, the size of the page files was blowing out with all of the data being sent in.

The first thing that jumps out at me is that even though all of this data is being loaded to the Forecast member, we’re taking a hit on storage because the dense data blocks are being allocated for the other four members in Scenario that are very sparingly utilized.  Surely there is a better candidate for a dense dimension?  You betcha.  Given the sparse nature of the database, most of the other dimensions aren’t really good candidates for dense because of all the interdimensional irrelevance that it would incur, however, the Years dimension is a good candidate because the Forecast data being loaded in is being loaded in for a particular year and all periods.  Perfect.

Given that there are not a significant number of calc, report, and load scripts in this database, it is relatively safe to change the dimensional configuration here.  This isn’t always necessarily the case and changing dense/sparse settings could very easily deoptimize the performace of calc, load, and report scripts, not to mention adversely affecting other numerous aspects of the database.

And of course, this was all tested on the test server before loading to production.  The process was fairly straightforward.  I cleared the database again, and loaded the export file.  Since export files are sensitive to the dense/sparse settings in the database, I did this so I could restructure the database with the new settings and have Essbase do the work for me instead of having to load the data from somewhere else.  I also made sure to do this before doing a calculation on the database so there would be less data to restructure.  After the data loaded in, I changed the dense and sparse settings to set Years to dense and Scenario to sparse.  After the restructure, database statistics were looking fairly good, but I didn’t want to get my hopes up.

Clearly one of the issues with this database was that the calculation script that was taking too long had to span through 20 gigabytes of page files in order to do what it needed to do.  Generally my databases are set to bitmap encoding as the compression scheme (because in most cases it is “good enough”), but I decided to go with zlib compression instead.  Although technically speaking, zlib is more computationally intensive to compress and uncompress, because our servers’ CPUs aren’t generally getting hammered, I decided that I would rather hit the CPUs a little harder if it meant that I could read the data off the disk a little faster (even though it will still uncompress in memory to the same amount as bitmap encoding or any other compression scheme).  Conversely, the stronger compression will mean less data is also written to the disk.

I then had to do some tweaking to the calc script to align it with the next dense and sparse settings.  This was fairly easy.  As with most calc script optimization, this involved changing the FIX statements and CALC DIM statements so that I was fixing on sparse dimensions and doing as much calculating within the dense data block as possible.  Remember, you are always trying to help Essbase do its job and give it hints as to how it can do things more optimally.  In this case, we are trying to minimize the amount of data that must be scanned through.  By calculating within the dense blocks, we are doing all of the work in one place that we can, then moving on to the next area.  For good measure, I also added a SET command to utilize the server’s higher calc lock block setting (see your technical reference for more details).

I fired off the calculation and checked out the results.  The calculation’s first pass took about 20 minutes (a lot of blocks were being created the first time around).  After this pass was done, I checked out the index and page files.   The index file seemed to be holding steady at about 300 megabytes — quite an improvement from four gigs!  The page file was now about 700 megabytes.  Given this new page file size, I decided to set the index cache to 512 megabytes.  Although this will eat 512 megabytes directly out of memory on the server, we have decided to allocate more memory for this app until forecasts are done, then we can pull back on the amount of memory it gets.  But for now, with enough memory to hold the entire index at one time, calculations and other operations will be much faster.

At the time of this writing, the database calculation that was the original problem is still holding steady at about 10 minutes to calculate.  The page and index files have grown a bit but still appear to be holding steady, and the database statistics indicate a pretty good block density as well.  Not a bad bit of optimization — a twentyfold decrease in size on disk, and a calculation that comes back in minutes rather than never.

Like I said, I love performance optimization, and as I’ve said before, Essbase is a bit of an art and a science.  The art part is comprehensively understanding how things work and how they are meant to work, then using that information to make the right technical tradeoffs, then putting the changes into the system, keeping in mind that you are simply trying to give “hints” to the system to help it do its job a little better or faster.