June 27, 2013

Thoughts on my Kscope Session ‘Practical Essbase Web Services’

My session went alright. I asked the audience to please be gentle on account of my first time presenting the deck and the material. They were quite gentle. I’d like to give a big huge shout out to my favoritest Essbase peeps in the world on Team Kroger for all coming and cheering me on.

My big takeaway from the audience was that a demo would have been helpful. Duly noted. I didn’t think there’d be time but the session actually came in a bit under time so there’s definitely a way to retool it and include a demo with real code and a real working example. So if I run this deck again I’ll be sure to incorporate that.

Many people were also very thankful for my apparent candor about the technology. I start off with a high-level overview of the available technologies for practically getting data out of of Essbase and come to the conclusion that while Essbase Web Services have a use case, it’s not one that I will personally be likely to use. This is largely due to my large investment in the infrastructure for Saxbi Server, which is based on the Essbase Java API.

One of my more recent thoughts about EWS is that it’s probably something that is largely intended to be used internally by Oracle and was something they knew they could clean up a bit and toss over the fence for other people to leverage. As I mentioned in the presentation, I hit some bumps in my research and will hope to see them cleaned up in future releases. Now that I have names with faces of a few more Oracle folks from the conference I’ll be looking forward to providing actual feedback to them instead of just ~~bitching~~ writing about it on my blog.

Anyway, as promised, here is the presentation for those of you that want to skim it. The bullets might be too high-level to be incredibly insightful but if you have any questions please feel free to mail me! I am happy to help and for extensive development needs I am available to consult through my firm. I have some related code I will clean up and get into a GitHun repository as time permits.

June 21, 2013

Essbase/EAS feature request: Metadata and description fields on Essbase objects

As a programmer, I find it imperative to document code as part of the coding process itself. I have learned to treat the construction of good code as a process that includes the documentation as part of the code. One of the things I love about ODI is the ability to add a description to many objects such as interfaces. Even within a single interface you can document individual elements. This is incredibly useful to provide context around why something is designed the way it is designed, to remind yourself of something, or for the next person to use/edit the code (which could be many years in the future as well as after you have moved on).

Essbase is no different. One of the most important things to document and that quite frequently get good documentation are calc scripts. We have the ability to write documentation in them, as much or as little as we want (hopefully we write as much is as needed and no more). We can add documentation to report scripts (of course, while still useful these seem to have fallen quite out of favor, given all of the tools that can move Essbase data around). We can add comments to individual members in an outline. We can add comments to our batch files and MaxL scripts. We can add notes to databases (did you know that? You can set a note on a database… I have used these quite successfully but they seem to be a quite seldom used feature despite how useful they can be, or were at least).

I wish we had more though. I’d like to be able to at least have a Description field that I could fill in for applications, databases, load rules, outlines, calc scripts (beyond the inline documentation), and any other object. I could see a lot of uses for this. Making notes on temporary databases or other temporary files, explaining the purpose of a particular load rule, quirks in an outline (notes to a future admin), and so on. Of course, it’s possible to document this in a separate document, but as we all know, these get stale and go out of date. Furthermore, they frequently neglect to document everything. So new features or files pop up and they are not included in the documentation.

Come to think of it, more metadata than even just Description would be useful: last person to edit, create time, edit time, basically all the usual stuff. As a bonus feature, the ability to associate some arbitrary files with a server or application would be nice so that we could upload a Word doc or PDF or something to its associated cube/server and have it available as a quick reference.

Anyone else think enhanced metadata and associated functionality would be useful?

June 19, 2013

Essbase & Hyperion Papercuts

I am a software developer who regards the output of the software development process as skin to the woodworker making a fine jewelry box, the master chef perfecting a plate, or the musician crafting a song. I believe solutions should be elegant, robust, deceivingly simple, and polished. I take the same approach to code with my various open source Essbase related Java projects, GitHub projects, and perhaps more importantly, the critical filter through which I see and use Hyperion, Essbase, and related software.

To that end and just for my own amusement, I will be henceforth be writing about various “Essbase and Hyperion papercuts” that I see and perceive. This is in similar spirit to the Ubuntu Papercuts project that occurred some time ago. The idea was that in the aggregate, a lot of little issues start to become troublesome and lead to a worse user experience. As with many pieces of software, Essbase in general has this problem. It’s an awesome piece of software, but then has all these little warts and quirks. It’s like NASA puts a man on the moon (this is the scientific equivalent of Essbase at its heart) and then when stepping out of the lander on the surface of the moon, trips on a faulty stair leading to the surface (this is like any number of little Essbase quirks I run into).

That being said, if there is some small, seemingly trivial and inconsequential Essbase/Hyperion (or Planning, or FDM, or ODI, or…) issue that just really grinds your gears, let me know about it! This blog gets a few hits from Oracle headquarters, so maybe, just maybe, someone will see it and we can all make the world’s best EPM software just a little bit cooler.

June 5, 2013

My ODTUG Kscope13 presentation: Practical Essbase Web Services

It has been a few years since I last presented at Kscope, but I am back this year! I will be presenting on “Practical Essbase Web Services” – this will be my take on the new web services features from recent Essbase versions, as well as drawing on my experience developing mobile solutions, developing Essbase middle tiers with the Java API, and other approaches to extracting data from Essbase. For those of you in C# shops or wanting to get at Essbase data from your other favorite languages (I’m looking at you, PHP, Python, and Clojure), this should be a fun overview of your options. I’ll look forward to seeing you there – and if you are interested in the presentation but aren’t going to ODTUG’s Kscope, let me know!

January 3, 2013

Install Essbase Java API (jar) files as a local Maven artifact

Maven is a comprehensive build system for Java projects. A lot of people, including myself, have a love/hate relationship with Maven. The reasons for this relationship can be discussed at another time. In any case, used judiciously, it can make managing dependencies in Java projects much easier than handling them by hand.

Eclipse has pretty good Maven integration. It’s possible to setup a new project and browse for dependencies and add them automatically to your project. Everything just works. I develop quite a few Java applications that rely on the Essbase Java API, so I have imported the Essbase jar files to my local repository (since they are not available from a central public repository) to make development a breeze.

Here’s how you can do the same. First, you need to go get your Essbase jar file. These are installed on the Hyperion server. You might have to search around a little bit since the directories seem to change from release to release, but in the case of this stock Hyperion 9.3.1 server (with Hyperion installed in C:\Hyperion) they can be found at C:\Hyperion\AnalyticProviderServices\lib.

Here’s what the directory looks like on one of my machines:

Hyperion Java lib folder screenshot

Right now we’re just interested in the ess_japi.jar file. We’re going to import this in to our local machine’s Maven repository. This assumes you have Maven installed locally, of course. If not it’s pretty straightforward. Just Google around and all will be clear.

Maven is very particular about the versions of everything. It allows you to store multiple versions of files. This means that our single repository can store the files for Essbase 9.3.1, 11.1.1.1.0, 11.1.1.3, and so on, all next to each other. Since we’re importing this resource manually we are going to tell it the version. First though, let’s rename this local file to something more consistent with Maven naming conventions. Let’s rename it from ess_japi.jar to essbase-japi-9.3.1.jar (since this is a file from a 9.3.1 server). Change it accordingly for other versions. If this were 11.1.1.3 then we would make it essbase-japi-11.1.1.3.jar. Note that Maven “prefers” a versioning scheme of major.minor.revision but not all software (particularly Essbase) adheres to this, so we’ll do our best.

So now we have essbase-japi-9.3.1.jar. A simple command line will import this. From a command prompt in the same folder as the jar file, execute this command:

mvn install:install-file -Dfile=essbase-japi-9.3.1.jar -DgroupId=com.essbase -DartifactId=essbase-japi -Dversion=9.3.1 -Dpackaging=jar

Each -D indicates a parameter we are filling out: the name of the file, a Maven group ID (which we’ll decide to make com.essbase), what the name of the artifact itself should be (essbase-japi), the version, and lastly that it is a jar file. You’d think Maven could infer some of this for us but we only have to do this once in a blue moon so it’s not so bad. Maven will copy the file to the local repository. To make it visible from Eclipse you will likely have to rebuild your Maven repository index which is no big deal.

Success importing Essbase jar file

Now when we are specifying the dependencies for our projects from Eclipse, we can easily browse it by name and add it in to our Maven POM file:

Eclipse Select Essbase Jar Dependency

Now we’re good to go. We can easily include this artifact in future projects quickly and easily. This is particularly useful if you happen to download the source code for some of my Essbase-related open source projects, which as of late rely on Maven for dependency management.

August 23, 2012

The Essbase Book Review to End All Reviews: Developing Essbase Applications

I received my copy of Developing Essbase Applications – Advanced Techniques for Finance and IT Professionals some time ago and spent every spare minute I could find eagerly devouring the book. I was eagerly anticipating the arrival of the book to my doorstep, hoping and wanting it to be the Essbase book to end all Essbase books. While it has its flaws, it is a must-have for the dedicated Essbase practitioner’s bookshelf. As I always say in these reviews, there are precious few Essbase books out there, so anything that helps the cause is welcome as far as I am concerned.

Developing Essbase Applications sports an impressive and diverse array of contributors. Most of the names are easily recognizable to anyone that has taken a break from writing a calc script to seek out an Essbase blog article or Google for some help on the Essbase forums. Seriously, check out this who’s who list of Essbase folks that helped create this thing: Cameron Lackpour, Dave Anderson, Joe Aultman, John Booth, Gary Crisci, Natalie Delemar, Dave Farnsworth, Michael Nader, Dan Pressman, Robb Salzmann, Tim Tow, Jake Turrell, and Angela Wilcox.

While this is probably the greatest strength of the book, it also inevitably contributes to my complaints about the book, which is not so much that it is a book, but more, a collection of books. This divide and conquer approach is probably the only realistic way to get so many people together to create such a thing, and the necessity of doing so is perhaps a reflection of the increasing breadth and scope of the Essbase ecosystem itself.

So for those of you looking to start on page 1 and work your way front to back (as I am want to do with a good programming book), you can do that, but it’s not necessary since the chapters don’t really build on each other. They just really do feel like a dozen books bound together, each with their own table of contents and style. The voice of each chapter author really shines through. In this regard, the book can be thought of as more of a reference, and indeed, some chapters are so packed full of information that there’s no reasonable way to absorb it all in one reading.

That all being said, what about the actual content of the book? It’s impressive, if slightly disjointed. In order, the chapters cover the following: Essbase infrastructure, tackling bad data, Essbase Studio, BSO, converting BSO to ASO, MDX, ASO and performance, Essbase Java API, system automation with Groovy, Advanced Smart View, and how to successfully manage an Essbase system.

Some chapters are more useful than others, and some chapters are definitely “stronger” than others. I can easily see anyone reading the book and being asked to rank their chapters in order of preference or usefulness coming up with completely different rankings from anyone else.

The way in which chapters convey their information to the reader also varies significantly from chapter to chapter. There are various degrees of difficulty that the reader has to go through to extract the useful information from the chapters of the book. This thing is fill with gold and diamonds but its up to the reader, in no small part, to extract it, as it is rarely handed over on a silver platter.

For example, starting things off is John Booth’s chapter about Essbase infrastructure. Clearly, John is a very smart guy witha lot of experience, a recent Oracle ACE who graciously donates his time in the form of numerous forum posts and even creating Amazon EC2 images of Essbase servers. However, his chapter , but I found that his chapter reads more like an animated brain dump of Essbase infrastructure. I imagine having a beer with John and saying “Tell me everything you know about infrastructure” and Chapter 1 is the result of that conversation. At the end of the day, I learned a lot, but as far as book form is concerned, would have appreciated more structure and progression to the manner in which the information was doled out.

Cameron picks things up in Chapter 2 which is ostensibly about slaying bad data in Essbase, a ubiquitous problem for any database administrator. While this is ostensibly the focus of the chapter, after a few pages, Cameron takes an immediate left turn into all things ODI and covers that. While also useful and insightful, as pedantic as it sounds, I found myself wishing the chapter had just been made to focus on ODI itself.

The “voice” in each chapter also varies, and in many instances while a chapter ostensibly wants to explain some concept to you, seems to talk as if you already know the concept. Going along again with my “you have to mine the information” analogy, I would say it’s also the responsibility of the reader to have been a somewhat astute observer of various acronyms and terms in the Essbase ecosystem even if they aren’t familiar with them beyond being able to simply unwind the acronym.

The book covers the Essbase Java API and brings in Mr. Essbase Java API himself, Tim Tow to help out. As an also experienced Essbase Java API guy myself, I certainly read this chapter very enthusiastically, and Tim does not disappoint. That being said, however, in my experience I haven’t found a lot of overlap between the Essbase administrator and Java programmers (much to my chagrin). So while I welcome the content, in some ways I question its usefulness to the average Essbase administrator. Of course, I could be mistaken and for the least part, exposing more people to the Java API in the hopes that they will pick it up and do something cool with it (as I have tried to do myself with various Essbase-related open source projects) is nevertheless extremely welcome.

The rest of the chapters round out topics from MDX to Groovy. Gary Crisci is the quintessential MDX practitioner, having presented on it and posted many an Essbase forum comment on it and does a nice job, as always, of conveying information on a topic that can be challenging. I have looked at Groovy in the past but haven’t made the jump just yet (too busy with Clojure and Go, I guess) but was found it to be an enjoyable read for what surely must be an uncommon, if esoteric, way to automate Essbase.

Lastly, on a purely aesthetic and non-functional note, I would like to say that I would have enjoyed more and better visuals in the book, in fact, the formatting of the book leaves something to be desired. For a tech book I would have loved to see something more modern than Times New Roman (for example, the formatting in O’Reilly and Apress books is second to none). Many of the tables and figures are laid out so as to be unhelpful and awkwardly cram information into a weird presentation, and a perennial pet peeve of mine is inconsistent framing, cropping, and styling of screenshots. Of course, this is all pretty predantic stuff and has more to do with my almost closet obsession with fonts and data visualization.

All that being said, all of the little things that I didn’t like about the book or thought could be better are easily and handily overshadowed by all of the things that I did like about the book. This thing is jam-packed full of content and, perhaps more importantly, content that is based on the vast experience of some of the top Essbase people in the industry. These aren’t people pretending to know their stuff or trying to fool you, this is people at the top of their game in the top of their field that are sharing some top shelf knowledge with the rest of us. You just have to dig a little to get at it. At $69.99 list price on Amazon, this book is a little bit pricey for the average tech book. But you know what? Even if you only got one or two useful things out of it, it would absolutely be worth the price. This is an absolutely essential book for anyone that is serious about using Essbase and being a successful administrator or consultant. On a classic scale from one to five stars, I easily give this four stars.

March 17, 2012

Book review: Oracle Essbase 11 Development Cookbook

I was recently given the opportunity to review another Essbase book from Packt: Oracle Essbase 11 Development Cookbook by Jose Ruiz. Overall I would say I am pleased with the book. It covers a lot of ground and a lot of disparate tools, many of which are scantily documented elsewhere.

Before I really get into the review, I must say that I have never been a big fan of the approach that technology cookbooks take. I’m also not a huge fan of having a book for a specific version of software. Of course, in order for the cookbook approach to work you don’t have a choice but to tie to a version of software. This is because the recipes are sequential and very explicit — as with cooking a recipe in real life — and rely on the exact version of the software in order for the detailed steps of the recipe to work. I’ve grown up with software, and am a cross between a visual and a kinesthetic learner, so my preference is to have concepts and goals explained to me, then to go exploring on my own. To this end, I find technology/recipe books to be tedious as they laboriously lay out the steps: click this, then click that, enter this text in, and 15 steps later you have a result.

So, my personal preference for book styles aside, this book largely succeeds for what it is: specific, methodical ways to perform a certain task. You won’t get a lot of explanation on why you might do something a certain way. In this regard, the book is useful as a complement to your Essbase literature rather that the place you would go to understand why you might want to accomplish some task.

Okay, now that I have beat up on that horse enough.

As I said, I enjoyed the breadth of content in the book. There are detailed recipes for setting up your relational data store to load a cube with EIS and Essbase Studio, building load rules and loading data to BSO/ASO cubes, writing calc scripts, working with Star Analytics, using EAS, HFR, writing MaxL scripts, and provisioning security. It even covers working with the revered Outline Extractor tool.

All of this content was really nice to see in book form. One of the upsides to the recipe format book is that it won’t spend a lot of time laboring over what a cube is and your first steps retrieving data with Excel. In fact, the book even says it’s not for beginners. It just jumps right in. I think this book can be a very handy reference for someone that needs something a little more guided than the technical reference (and less heavy).

On my arbitrary rating system, I would give this book a four out of five star rating. And again, that’s me trying to be fair to the book even though I’m not in love with this format, but it largely accomplishes what it sets out to do. I’d say it’s a great addition to the pragmatic Essbase developer’s library, but certainly not the only book in it.

November 15, 2009

Book review: Oracle Essbase 9 Implementation Guide

There’s a new Essbase book out, and I was fortunate enough to receive a copy to review. Glenn Schwartzberg already beat me to the punch with his review, which I would say is fairly thorough as well as fair. There are precious few Essbase books out there, so the more the merrier, I say.

The book starts out with the basics, covering some of the history of Essbase, and explaining multi-dimensionality, then walks you through the install process. Obviously, based on the title, you can tell that this book is for System 9, whereas the current version is 11 (or “eleven dot one dot something dot ahh screw it” as I like to say). While this means that the specific installation instructions are different, from a conceptual standpoint, everything else is quite similar. In fact, the core of Essbase has changed precious little over the years, so even people in shops running Essbase 7 or 6 could benefit as well.

After installation, a tour of Essbase through EAS is given, using a fictitious automobile company and its databases to illustrate the concepts. It’s kind of nice to see a change from the usual The Beverage Company that we’re all so familiar with. “Write what you know,” they say. As one of the authors works for Ford, it’s not surprising that the examples are somewhat of a thinly veiled obfuscation of some real-world databases, which is reminiscent of my own efforts to scrub examples from a certain large grocery company.

The rest of the book proceeds with a methodical tour of various Essbase components, including outlines, dimension-build load rules, loading data, calc scripts, using the Excel add-in, report scripts, automation, ASO considerations, and SmartView.

As the book is geared more for the OLAP novice, I did not personally get a lot out of the book. I am still waiting for a book to come out along the lines of “Effective Essbase for those with a slightly unhealthy obsession with analytics” or something similar. There are some technical details in the book that seem a bit off, but may be the author’s attempt to gloss over some arcane specifics for the newbie. So if you aren’t familiar with Essbase, or are just familiar with the basics, this could be a very helpful book for you. As I mentioned, there are precious few Essbase books out there, so I welcome any and all attempts to distribute knowledge from the OLAP grandmasters to those that would take the time to read it.

You can find this book online at Packt’s website, where you can find a sample chapter as well.

August 5, 2009

A comprehensive Essbase automation optimization story, or: Shaving off every last second

I’ve been wanting to write this post for awhile. Like, for months. Some time ago I took an in-depth look at profiling the performance (duration-wise) of an automated system I have. The system works great, it’s worked great, I don’t need to touch it, it just does its thing. This is the funny thing about systems that live in organizations: some of the rather inglorious ones are doing a lot of work, and do it reliably. Why reinvent the wheel and risk screwing things up?

Well, one reason is that you need to squeeze some more performance out of that system. The results of my profiling efforts showed that the process was taking about an hour and a half to run. So, considering that the processing of the data is an hour behind the source system (the source system drops off text files periodically for Essbase), and it takes an hour and a half to run, the data in the cubes is some two and a half hours behind the real world numbers in the original system.

So, first, a little background on what kind of processing we have here. The gist of how this particular system works is that a mainframe manages all of the data, and certain events trigger updated files to get delivered to the Essbase server. These files are delivered at predictable intervals, and automation jobs are scheduled accordingly (about an hour after the text files are scheduled to be delivered, just to be on the safe side). The data that comes in is typical financial data — a location, a time period, a year, an account, and an amount.

Pretty simple, right? Well, there are about twenty cubes that are financial in nature that are modeled off of this data. The interesting thing is that these cubes represent certain areas or financial pages on the company’s chart of accounts. Many of the pages are structurally similar, and thus grouped together. But the pages can be wildly different from each other. For this reason, it was decided to put them in numerous cubes and avoid senseless inter-dimensional irrelevance. This keeps the contents of the cubes focused and performance a little better, at the expense of having to manage more cubes (users and admins alike). But, this is just one of those art versus science trade-offs in life.

Since these cubes add a “Departments” dimension that is not present in the source data, it is necessary to massage the data a bit and come up with a department before we can load the raw financial data to a sub-cube. Furthermore, not all the cubes take the same accounts so we need some way to sort that out as well. Therefore, one of the cubes in this process is a “staging” database where all of the data is loaded in, then report scripts are run against certain cross sections of the data, which kick out smaller data files (with the department added in) that are then loaded to the other subsequent cubes. The staging database is not for use by users — they can’t even see it. The staging database, in this case, also tweaks the data in one other way — it loads locations in at a low level and then aggregates them into higher level locations. In order to accomplish this, the staging database has every single account in it, and lots of shared members (accounts can be on multiple pages/databases).

That being said, the database is s highly sparse. For all of these reasons, generating reports out of the staging database can take quite a bit of time, and this is indicated as the very wide brown bar on the performance chart linked above. In fact, the vast majority of the total processing time is just generating reports out of this extremely sparse database. This makes sense because if you think about what Essbase has to do to generate the reports, it’s running through a huge section of database trying to get what it wants, and frankly, who knows if the reports are even setup in a way that’s conducive to the structure of the database.

Of course, I could go through and play with settings and how things are configured and shave a few seconds or minutes off here and there, but that really doesn’t change the fact that I’m still spending a ton of my time on a task, that quite simply, isn’t very conducive to Essbase. In fact, this is the kind of thing that a relational database is awesome at. I decided, first things first, let’s see if we can massage the data in SQL Server, and load up the databases from that (and get equivalent results).

The SQL Server staging database is relatively simple. There is now a table that will have the data loaded to it (this is the data that was loaded to the staging cube with a load rule). The other tables include a list of financial pages (these are the departments on the sub-cubes, so to speak), a table for the list of databases (you’ll see why in a minute), a table for linking the departments/pages to a particular database, a table linking accounts to departments (pages), a page-roll table to manage the hierarchy of how pages aggregate to “bigger” pages, and a location/recap table that links the different small locations to their bigger parent locations (this is the equivalent of the location summing from the original staging database).

With all these tables in place, it’s time to add a few special views that will make life easier for us (and our load rules):

SELECT
     D.database_name
     , P.physical_page
     , A.account
     , K.div
     , K.yr
     , K.pd
     , K.amt
FROM
     account_dept AS A INNER JOIN
     jac_sum_vw AS K ON A.account = K.account INNER JOIN
     page P ON A.G_id = P.G_id INNER JOIN
     page_database AS G ON A.G_id = G.G_id INNER JOIN
     dbase AS D ON G.database_id = db.database_id

Obviously I didn’t provide the schema for the underlying tables, but this still should give you a decent idea of what’s going on. There’s another view that sits on top of this one that takes care of the summing of locations for us, but it’s structurally similar to this one, so no surprises there. What I end up with here is a view that has columns for the database name, the department, the account, location, year, period, and the amount. So not only is this perfect for a fact table down the road when I convert this to EIS, I can also use the exact same view for all of my databases, and a very similar load rule in each database that simply references a different database.

This all took a day or two to setup, but after I started getting data how I wanted, I got pretty stoked that I was well on my way to boosting performance of this system. One of the nice things about the way it’s setup in the relational database is also that it’s very flexible — extra databases, departments, locations, and anything else can be added to one central place without too much trouble.

Interestingly enough, this entire system is one that’s been in production for years but the “test” copy of things was sort of forgotten about and completely out of sync with production. Since I was getting closer to the point where I was ready to load some cubes, I needed to sync test to prod (oddly enough) so I could do all testing without hosing things up. Basically I just copied the apps from the production server to the test server with EAS, copied the existing automation folder, changed some server names and passwords (which were all hard-coded, brilliant…), and was pretty much good to go. On a side note, I took the opportunity to rewrite the automation in modular, test-ready form (basically this involved cleaning up the old hard-coded paths and making it so I could sync test to prod much easier).

My next step was to design a load rule to load data up, to make sure that the structure of the tables and views was sufficient for loading data. I did a normal SQL load rule that pulled data from the view I setup. The load rule included some text replacements to adapt my version of departments to the actual alias name in the database, but otherwise didn’t need anything major to get working. I loaded one period of data, calculated, saw that I was off a bit, researched some things, tweaked the load rule, recalculated, and so on, until finally the numbers were tying out with what I was seeing. Not bad, not bad at all.

After this basic load rule was working correctly, I started gutting the automation system to clean things up a bit more and use the new load rule (with the old calcs). After I felt good about the basic layout of things and got to the point where I was ready to copy this out to the other 18 or however many cubes. Then I put some hooks in the automation to create a log file for me, so I can track performance.

For performance profiling, I used a technique that I’ve been using for awhile and have been quite happy with. I have a small batch file in the automation folder called essprof.bat that has this in it:

@For /f "tokens=2-4 delims=/ " %%a in ('date /t') do @set FDATE=%%c-%%a-%%b
@echo %2,%FDATE% %TIME% >> %1

Basically when you call this file from another batch file, you tell it what file you want to append the data to, what the step should be named, and it takes care of the rest. Here is how I might call it from the main automation script:

CALL essprof.bat %PROFFILE% START-PROCESS

The PROFFILE variable comes from a central configuration file and points to a location in the logging folder for this particular automation set. For this particular line, I would get output like the following:

START-PROCESS,2009-07-31  8:57:09.18

The For loop in the batch file parses the DOS/shell date command a little bit, and together with the TIME variable, I get a nice timestamp — one that, incidentally, can be read by Excel very easily.

So what’s this all for? As I did in the original performance chart, I want to profile every single step of the automation so I can see where I’m spending my time. Once I filled out the rest of the load rules and ran the process a few times to work out the kinks, I now had a good tool for analyzing my performance. Here is what one of the initial runs looked like:

start-db02-process 	10:54 AM
finish-db02-process 	10:54 AM
start-db03-process 	10:54 AM
finish-db03-process 	10:57 AM
start-db03GP-process 	10:57 AM
finish-db03GP-process 	10:58 AM
start-db04-process 	10:58 AM
finish-db04-process 	11:13 AM
start-db05-process 	11:13 AM
finish-db05-process 	11:13 AM
start-db06-process 	11:13 AM
finish-db06-process 	11:13 AM
start-db07A-process 	11:13 AM
finish-db07A-process 	11:13 AM
start-db07D-process 	11:13 AM
finish-db07D-process 	11:14 AM
start-db07L-process 	11:14 AM
finish-db07L-process 	11:14 AM
start-db07M-process 	11:14 AM
finish-db07M-process 	11:14 AM
start-db07T-process 	11:14 AM
finish-db07T-process 	11:14 AM
start-db08-process 	11:14 AM
finish-db08-process 	11:14 AM
start-db09-process 	11:14 AM
finish-db09-process 	11:14 AM
start-db10-process 	11:14 AM
finish-db10-process 	11:14 AM
start-db11-process 	11:14 AM
finish-db11-process 	11:14 AM
start-db12-process 	11:14 AM

And how do we look? We’re chunking through most of the databases in about 20 minutes. Not bad! This is a HUGE improvement over the hour and a half processing time that we had earlier. But, you know what, I want this to run just a little bit faster. I told myself that if I was going to go to the trouble of redoing the entire, perfectly working automation system in a totally new format, then I wanted to get this thing down to FIVE minutes.

Where to start?

Well, first of all, I don’t really like these dense and sparse settings. Scenario and time are dense and everything else is sparse. For historical reasons, the Scenario dimension is kind of weird in that it contains the different years, plus a couple budget members (always for the current year), so it’s kind of like a hybrid time/scenario dimension. But, since we’ll generally be loading in data all for the same year and period at once (keep in mind, this is a period-based process we’re improving), I see no need to have Scenario be dense. Having only one dense dimension (time) isn’t quite the direction I want to go in, so I actually decided to make location a dense dimension. Making departments dense would significantly increase my inter-dimensional irrelevance so Location seems like a sensible choice — especially given the size of this dimension, which is some 20 members or so. After testing out the new dense/sparse settings on a couple of databases and re-running the automation, I was happy to see that not only did the DB stats look good, but the new setting was helping improve performance as well. So I went ahead and made the same change to all of the other databases, then re-ran the automation to see what kind of performance I was getting.

End to end process time was now down to 12 minutes — looking better! But I know there’s some more performance out there. I went back to the load rule and reworked it so that it was “aligned” to the dense/sparse settings. That is, I set it so the order of the columns is all the sparse dimensions first, then the dense dimensions. The reason for this is that I want Essbase to load all the data to a single data block that it can, and try to minimize the number of times that the data block is loaded to memory.

Before going too much further I added some more logging to the automation so I could see exactly when the database process started, when it ran a clearing calc script, loaded data, and calculated again.

Now I was down to about 8 minutes… getting better. As it turns out, I am using the default calculation for these databases, which is a CALC ALL, so that is a pretty ripe area for improvement. Since I know I’m loading data all to one time period and year, and I don’t have any “fancy” calcs in the database, I can rewrite this to fix on the current year and period, aggregate the measures and departments dimensions, and calc on the location dimension. By fancy, I’m referring to instances were a simple aggregation as per the outline isn’t sufficient — however, in this case, just aggregating works fine. I rewrote the FIX, tested it, and rolled it out to all the cubes. Total end to end load time was now at about four minutes.

But, this four minute figure was cheating a little since it didn’t include the load time to SQL, so I added in some of the pre-Essbase steps such as copying files, clearing out the SQL table, and loading in the data. Fortunately, all of these steps only added about 15 seconds or so.

I decided to see if using a more aggressive threads setting to load data would yield a performance gain — and it did. I tweaked essbase.cfg to explicitly use more threads for data loading (I started with four threads), and this got total process time down to just under three minutes (2:56).

As an aside, doing in three minutes what used to take 90 would be a perfectly reasonable place to stop, especially considering that my original goal was to get to five minutes.

But this is personal now.

Me versus the server.

I want to get this thing down to a minute, and I know I still have some optimizations left on the table that I’m not using. I’ve got pride on the line and it’s time to shave every last second I can out of this thing.

Let’s shave a few more seconds off…

Some of the databases don’t actually have a departments dimension but I’m bringing a “dummy” department just so my load rules are all the same — but why bring in a column of data I don’t need? Let’s tweak that load rule to skip that column (as in, don’t even bother to bring it in from SQL) on databases that don’t have the departments dimension. So I tweaked the load rule got the whole process down to 1:51.

Many of these load rules are using text replacements to conform the incoming department to something that is in the outline… well, what if I just setup an alternate alias table so I don’t even have to worry about the text replacements? It stands to reason, from an Essbase data load perspective, that it’s got to cycle through the list of rows on the text replace list and check it against the incoming data, so if I can save it the trouble of doing that, it just might go faster. And indeed it does: 1:39 (one minute, thirty nine seconds). I don’t normally advocate just junking up the outline with extra alias tables, but it turns out that I already had an extra one in there that was only being used for a different dimension, so I added my “convenience” aliases to that. I’m getting close to that minute mark, but of course now I’m just shaving tiny bits off the overall time.

At this point, many of the steps in the 90-step profiling process are taking between 0 and 1 seconds (which is kind of funny, that a single step starts, processes, and finishes in .3 seconds or something), but several of them stand out and take a bit longer. What else?

I tried playing with Zlib compression, thinking that maybe if I could shrink the amount of data on disk, I could read it faster into memory. In all cases this seemed to hurt performance a bit so I went back to bitmap. I have used Zlib compression quite successfully before, but there’s probably an overhead hit I’m taking for using it on relatively “small” database — in this case I need to get in and get out as fast as I can and bitmap seems to allow me to do that just a little faster, so no Zlib this time around (but I still love you Zlib, no hard feelings okay?).

Okay, what ELSE? I decided to bump the threads on load to 8 to see if I got more of a boost — I did. Total load time is now at 1:25.

The SQL data load using bcp (the bulk load command line program) takes up a good 10 seconds or so, and I was wondering if I could boost this a bit. bcp lets you tweak the size of the packets, number of rows per batch, and lock the table you’re loading to. So I tweaked the command line to use these options, and killed another few seconds off the process — down to 1:21.

Now what? It turns out that my Location dimension is relatively flat, with just two aggregated members in it. I decided that I don’t really need to calculate these and setting them as dynamic would be feasible. Since location is dense this has the added benefit of removing two members from my stored data block, or about 10 percent in this case. I am definitely approaching that area where I am possibly trading query performance just for a little bit faster load, but right now I don’t think I’m trading away any noticeable performance on the user side (these databases are relatively small).

Just reducing the data blocks got me down to 1:13 or so (closer!), but since there are now no aggregating members that are stored in the Location dimension, I don’t even need to calculate this dimension at all — so I took the CALC DIMs out of all the calc scripts and got my calc time down further to about 1:07.

Where oh where can I find just 7 more seconds to shave off? Well, some of these databases also have a flat department structure, so I can definitely shave a few seconds and save myself the trouble of having to aggregate the departments dimension by also going to a dynamic calc on the top level member. So I did this where I could, and tweaked the calcs accordingly and now the automation is down to about 1:02.

And I am seriously running out of ideas as to where I can shave just a COUPLE of more seconds off. Come on Scotty, I need more power!

Well, keeping in mind that this is a period process that runs numerous times during closing week, part of this process is to clear out the data for the current period and load in the newer data. Which means that every database has a “clear” calc script somewhere. I decided to take a look at these and see if I could squeeze a tiny bit of performance out.

I wanted to use CLEARBLOCK because I’ve used that for good speedups before, but it’s not really effective as a speedup here because I have a dense time dimension and don’t want to get rid of everything in it (and I think CLEARBLOCK just reverts to a normal CLEARDATA if it can’t burn the whole block). However, there were still some opportunities to tweak the clearing calc script a little so that it was more conducive to the new dense and sparse settings. And sure enough, I was able to shave .1 or .2 seconds off of the clear calc on several databases, giving me a total process time of……. 59.7 seconds. Made it, woot!

Along the way I tried several other tweaks here and there but none of them gave me a noticeable performance gain. One change I also made but seems to be delivering sporadic speed improvements is to resize the index caches to accommodate the entire index file. Many of the indexes are less than 8 megabytes already so they’re fine in that regard, some of them aren’t so I sized them accordingly. I’d like to believe that keeping the index in memory is contributing to better performance overall but I just haven’t really been able to profile very well.

After all that, in ideal circumstances, I got this whole, previously hour-and-a-half job down to a minute. That’s not bad. That’s not bad at all. Sadly, my typical process time is probably going to be a minute or two or longer as I adjust the automation to include more safety checks and improve robustness as necessary. But this is still a vast improvement. The faster turnaround time will allow my users to have more accurate data, and will allow me to turn around the databases faster if something goes wrong or I need to change something and update everything. In fact, this came in extremely useful today while I was researching some weird variances against the financial statements. I was able to make a change, reprocess the databases in a minute or so, and see if my change fixed things. Normally I’d have to wait an hour or more to do that. I may have optimized more than I needed to (because I have an almost unhealthy obsession with performance profiling and optimization), but I think it was worth it. The automation runs fast.

So, my advice to you: always look at the big picture, don’t be afraid to use a different technology, get metrics and refer to them religiously (instead of “hmm, I think it’s faster this way), try to think like Essbase — and help it do its job faster, and experiment, experiment, experiment. Continue Reading…

June 30, 2009

The %COMPUTERNAME%/MaxL trick for syncing test to production

There’s an automation trick I’ve been using for awhile that I like. Having talked about automation and such with other people at ODTUG this year, it seems that several people are using this technique or a variant of it on their own systems.

Basically, the idea is that you want to be able to sync your production automation scripts from your test server as easily as possible. The number one cause of scripts not working after copying them from test to production is because they have some sort of hard-coded path, filename, server name, user id, password, or other value that simply doesn’t work on the server you are dropping your scripts onto.

Therefore, you want to try and write your automation scripts as generically as possible, and use variables to handle anything that is different between test and production. As an added bonus for making the sync from test to prod just a little bit easier, why not dynamically choose the proper configuration file?

Assuming you are running on Windows (the same concept will work on other platforms with some tweaks for your local scripting environment), one way to handle it is like this: your main script is main.bat (or whatever). One of the environment variables on a windows server is the COMPUTERNAME variable. Let’s say that your test server is essbase30 and your production server is essbase10. On the production server, COMPUTERNAME is essbase10.

Knowing that we can set environment variables in a batch file that will be available in MaxL, we could setup a file called essbase30.bat that has all of our settings for when the script runs on that server. For example, the contents of essbase30.bat might be this:

SET ESSUSR=admin
SET ESSPW=password
SET ESSSERVER=essbase30

From main.bat, we could then do this:

cd /d %~dp0
call %COMPUTERNAME%.bat
essmsh cleardb.msh

Assuming that the two batch files and the cleardb.msh are in the same folder, cleardb.msh could contain the following MaxL:

login $ESSUSR identified by $ESSPW on $ESSSERVER;
alter database Sample.Basic reset data;
logout;
exit;

Now for a little explanation. Note that in essbase30.bat I am explicitly setting the name of the server. We could assume that this is localhost or the COMPUTERNAME but why not set it here so that if we want to run the script against a remote server, we could do that as well (note that if we did run it remotely, we’d have to change the name of our batch file to match the name of the server running the script). In general, more flexibility is a good thing (don’t go overboard though). The first line of main.bat (the cd command) is simply a command to change the current directory to the directory containing the script. This is handy if our script is launched from some other location — and using this technique we don’t have to hard-code a particular path.

Then we use a call command in the batch file to run the batch file named %COMPUTERNAME%.bat, where %COMPUTERNAME% will be replaced with the name of the computer running the automation, which is in this case essbase30. The batch file will run, all of the SET commands inside it will associate values to those environment variables, and control flow will return to the calling batch file, which then calls essmsh to run cleardb.msh (note that essmsh should be in your current PATH for this to work). The file cleardb.msh then runs and can “see” the environment variables that we set in essbase30.bat.

If we want to, we can set variables for folder names, login names, SQL connection names/passwords, application names, and database names. Using this technique can make your MaxL scripts fairly portable and more easily reusable.

In order to get this to work on the production server, we could just create another batch file called essbase10.bat that has the same contents as essbase30.bat but with different user names and passwords or values that are necessary for that server.

For all you advanced batch file scripters out there, it may be necessary to use a setlocal command so the variables in the batch file don’t stomp on something you need that is already an environment variable. As you can see, I’m a big fan of the %COMPUTERNAME% technique, however, there are a few things to watch out for:

You might be putting passwords in clear text in a batch file. You can alleviate some of this by using MaxL encryption, although I haven’t actually done this myself. The folder with these files on my servers already have filesystem level security that prevent access, and for the time being, this has been deemed good enough.
It’s difficult with this technique to run automation for test and prod from the same server (say, some central scheduling server). It’s not too difficult to address this, though.
If you run the automation from a remote server instead of the Essbase server itself, you may end up with lots of different config files — in which case, you might want to adjust this technique a bit.

As for deploying your test files to your production server, if you’ve set everything up in a generic way, then simply changing the variables in the configuration file should allow you to run the exact same script on each server. Therefore, your deploy method could literally be as simple as copying the folder from the test server to the production server. I’m actually using Subversion to version control the automation systems now, but a simple file copy also works just as well

jason's hyperion blog

essbase from the trenches

Tag Archives: essbase