January 27, 2009

Remote server automation with MaxL

Did you know that you don’t have to run your MaxL automation on the Essbase server itself? Of course, there is nothing wrong with running your Essbase automation on the server: network delays are less of a concern, it’s one less server to worry about, and in many ways, it’s just simpler. But perhaps you have a bunch of functionality you want to leave on a Windows server and have it run against your shiny new AIX server, or you just want all of the automation on one machine. In either case, it’s not too difficult to set up; you just have to know what to look out for.

If you’re used to writing MaxL automation that runs on the server, there are a few things you need to look out for in order to make your automation more location-agnostic. It is possible to specify the locations of rules, reports, and data files all using either a server-context or a client-context. For example, your original automation may have referred to absolute file paths that are only valid if you are on the server. If the automation is running on a different machine then it’s likely that those paths are no longer valid. You can generally adjust the syntax to explicitly refer to files that are local versus files that are remote.

The following example is similar in content to an earlier example I showed dealing with converting an ESSCMD automation system to MaxL. This particular piece of automation will run just as happily on a client, workstation, or remote server (one that has the MaxL interpreter, essmsh, installed, of course). Keep in mind, however, that if we do run this script on our workstation, the entries highlighted in red refer to paths/files on the server, and the text highlighted in green refers to things that are relevant to the client executing the script. So, here is the script:

/* conf includes SET commands for the user, password, server
   logpath, and errorpath */

msh "conf.msh";

/* Transfer.Data is a "dummy" application on the server that is useful
   to be able to address text files within a App dot Database context 

   Note that I have included the ../../ prefix because with version 7.1.x of
   Essbase even though prefixing the file name with a directory separator is
   supposed to indicate that the path is an app/database path, I can't get it
   to work, but using ../../ seems to work (even on a Windows server)

 */

set DATAFOLDER = "../../Transfer/Data";

login $ESSUSER identified by $ESSPW on $ESSSERVER;

/* different files for the spool and errors */

spool stdout on to "$LOGPATH/spool.stdout.PL.RefreshOutline.txt";
spool stderr on to "$LOGPATH/spool.stderr.PL.RefreshOutline.txt";

/* update P&L database 

   Note that we are using 3 different files to update the dimensions all at once
   and that suppress verification is on the first two. This is roughly analogous
   to the old BEGININCBUILD-style commands from EssCmd

*/

import database PL.PL dimensions

    from server text data_file "$DATAFOLDER/DeptAccounts.txt"
    using server rules_file 'DeptAcct' suppress verification,

    from server text data_file "$DATAFOLDER/DeptAccountAliases.txt"
    using server rules_file 'DeptActA' suppress verification,

    from server text data_file "$DATAFOLDER/DeptAccountsShared.txt"
    using server rules_file 'DeptShar'

    preserve all data
    on error write to "$ERRORPATH/dim.PL.txt";

/* clean up */

spool off;

logout;
exit;

This is a script that updates dimensions on a fictitious “PL” app/cube. We are using simple dimension build load rules to update the dimensions. Following line by line, you can see the first thing we do is run the “conf.msh” file. This is merely a file with common configuration settings in it that are declared similarly to the following “set” line. Next, we set our own helper variable called DATAFOLDER. While not strictly necessary, I find that it makes the script more flexible and cleans things up visually. Note that although it appears we are using a file path (“../../Transfer/Data”) this actually refers to a location on the server, specifically, it is the app/Transfer/Data path in our Hyperion folder (where Transfer is the name of an application and Data is the name of a database in that application). This is a common trick we use in order to have both a file location as well as a way to refer to files in an Essbase app/db way.
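To make the idea of a shared configuration file a little more concrete, here is a minimal Python sketch that renders conf.msh-style contents for one environment. The variable names are the ones the script above references; the values and the render_conf helper are hypothetical, and the set NAME = "value"; form simply mirrors the DATAFOLDER line in the script. Treat it as an illustration of the idea, not as the actual conf.msh.

```python
# Hypothetical helper: renders a conf.msh-style file for one environment.
# Variable names match the script above; every value is a placeholder.

def render_conf(settings):
    """Return MaxL 'set' lines, one per setting, in the order given."""
    lines = []
    for name, value in settings.items():
        lines.append('set {} = "{}";'.format(name, value))
    return "\n".join(lines) + "\n"


client_settings = {
    "ESSUSER": "essbaseuser",
    "ESSPW": "essbasepw",
    "ESSSERVER": "essbaseserver",
    "LOGPATH": "logs",       # resolved on the machine running essmsh
    "ERRORPATH": "errors",   # resolved on the machine running essmsh
}

conf_text = render_conf(client_settings)
print(conf_text)
```

Keeping one such settings block per environment (test, production, workstation) is one way to move the same automation between machines without touching the main script.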

Next, we log in to the Essbase server. Again, this just refers to variables that are defined in the conf.msh file. We set our output locations for the spool command. Here is our first real difference when it comes to running the automation on the server versus running somewhere else. These locations are relevant to the system executing the automation, not the Essbase server.

Now on to the import command. Note that although we are using three different rules files and three different input files for those rules files, we can do all the work in one import command. Also note that the spacing and spanning of the command over multiple lines makes it easier for us humans to read — and the MaxL interpreter doesn’t really care one way or another. The first file we are loading in is DeptAccounts.txt, using the rules file DeptAcct.

In other words, here is the English translation of the command: “Having already logged in to Essbase server $ESSSERVER with the given credentials, update the dimensions in the database called PL (in the Application PL), using the rules file named DeptAcct (which is also located in the database PL), and use it to parse the data in the DeptAccounts.txt file (which is located in the Transfer/Data folder). Also, suppress verification of the outline for the moment.”

The next two sections of the command do basically the same thing; however, we omit the “suppress verification” on the last one so that the server will now validate all the changes to the outline. Lastly, we want to preserve all of the data currently in the cube, and send all rejected data (records that could not be used to update the dimensions) to the dim.PL.txt file (which is located on the machine executing this script, in the $ERRORPATH folder).

So, as you can see, it’s actually pretty simple to run automation on one system and have it take action on another. Also, some careful usage of MaxL variables, spacing, and comments can make a world of difference in keeping things readable. One of the things I really like about MaxL over ESSCMD is that you don’t need a magic decoder ring to understand what the script is trying to do — so help yourself and your colleagues out by putting that extra readability to good use.

January 20, 2009

Essbase Performance Optimization: it’s not just the calc script

Here’s a quick post that is a bit of a precursor to some of my more in-depth performance analysis articles that will be coming out in the future. One of my automation systems takes a bit over an hour to run. There are a lot of people I know that need to squeeze performance out of their systems and immediately look to their calc scripts. Yes, calc time can be a large part of your downtime, as can data loads, reports, and other activities. But I always stress that it is useful and important to understand your systems in their entirety.

As part of looking at the bigger picture, I put together the following graph showing each step and how long it takes in this system that takes around an hour. It’s not hard to tell that the majority of the time that it takes to run this job (the brownish bar that takes about an hour) is in one task! And what is that task? It’s a bunch of report scripts running on a staging database. This is clearly an obvious place for me to look at ways of saving time.

Duration of Steps for an Essbase Automation Process
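If you want to build the same kind of breakdown for your own job, a few lines of code are enough. The following Python sketch ranks steps by their share of the total run time; the step names and minute values are made-up placeholders, not the measurements behind the graph.

```python
# Minimal sketch: rank automation steps by how much of the total run they use.
# The step names and minute values are made-up placeholders, not measurements.

def rank_steps(durations):
    """Return (step, minutes, share_of_total) tuples, longest step first."""
    total = sum(durations.values())
    ranked = sorted(durations.items(), key=lambda item: item[1], reverse=True)
    return [(step, minutes, minutes / total) for step, minutes in ranked]


sample_durations = {
    "load staging data": 4.0,
    "calculate staging cube": 6.0,
    "run staging report scripts": 60.0,
    "load downstream cubes": 10.0,
}

ranked_steps = rank_steps(sample_durations)
for step, minutes, share in ranked_steps:
    print("{:<28} {:>6.1f} min {:>6.1%}".format(step, minutes, share))
```

The durations themselves could come from timestamps in the spool or scheduler logs; how you collect them depends on your setup.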

The staging database is a rather clever cube that is essentially used to scrub, aggregate, and associate raw account level data to some more meaningful dimensional combinations for all of the other databases. Data comes in, it’s calculated, and it outputs a bunch of report scripts. Fundamentally, the reason that this approach takes so much time is that there are two highly sparse dimension combinations with tens of thousands of members each, and the report script writer has to go through a ton of on-disk data in order to figure out what to write. I could spend some time trying to optimize this process; in fact, I could probably play with some settings and get at least a 20% improvement right now.

But, this is one of those times where it pays to stand back and look at what we’re trying to accomplish. As it turns out, I actually have all of the infrastructure I need to accomplish this task, but it’s in a SQL database. And, the task that is being performed is actually much more conducive to the way that a relational database works. I’m still putting the finishing touches on this process, but it’s mostly complete as of right now, and the performance is amazing. I can pump through the same amount of data in mere minutes now, with no loss of functionality.

My specific goal is to get this process that takes an hour or longer, to run in less than five minutes. I chose this instead of “as fast as possible” because I wanted something concrete and attainable. (My secret goal, just for kicks, is to get this all to run in under a minute). Once the automation for the SQL staging is all in place, I will be going through all of the individual databases and tweaking any and all settings in order to shave their downtime as well.

Historically, not a lot of effort has gone into extensive profiling on these cubes, so as nerdy as it sounds, I’m actually very interested to see where else I can shave a few seconds off. At first this will undoubtedly involve using more write threads in the dataload, rewriting the calc scripts to tighten them up from just their current CALC ALL, aligning the order of the data fields and rows with the dense/sparse-ness of the outlines and the outline order, choosing better cache settings that are customized for the size of the index and page files, and perhaps looking at benefits of zlib compression (theoretically more CPU time to compress/decompress, however, generally the CPUs on these servers are not slammed very hard, so if I can get the size of the physical page files down, I may be able to read it into memory faster).

So remember: you spend a lot of time doing calculations, but that might not always be where the low-hanging fruit is. I cannot stress enough the importance of understanding where you spend your time, and using that as a basis for helping Essbase do its job faster.

January 6, 2009

MaxL Essbase automation patterns: moving data from one cube to another

A very common task for Essbase automation is to move data from one cube to another. There are a number of reasons you may want or need to do this. One, you may have a cube that has detailed data and another cube with higher level data, and you want to move the sums or other calculations from one to the other. You may accept budget inputs in one cube but need to push them over to another cube. You may need to move data from a “current year” cube to a “prior year” cube (a data export or cube copy may be more appropriate, but that’s another topic). In any case, there are many reasons.

For the purposes of our discussion, the Source cube is the cube with the data already in it, and the Target cube is the cube that is to be loaded with data from the source cube. There is a simple automation strategy at the heart of all these tasks:

1. Calculate the source cube (if needed)
2. Run a Report script on the source cube, outputting to a file
3. Load the output from the report script to the target cube with a load rule
4. Calculate the target cube

This can be done by hand, of course (through EAS), or you can do what the rest of us lazy cube monkeys do, and automate it. First of all, let’s take a look at a hypothetical setup:

We will have an application/database called Source.Foo which represents our source cube. It will have dimensions and members as follows:

Location: North, East, South, West
Time: January, February, …, November, December
Measures: Sales, LaborHours, LaborWages

As you can see, this is a very simple outline. For the sake of simplicity I have not included any rollups, like having “Q1/1st Quarter” for January, February, and March. For our purposes, the target cube, Target.Bar, has an outline as follows:

Scenario: Actual, Budget, Forecast
Time: January, February, …, November, December
Measures: Sales, LaborHours, LaborWages

These outlines are similar but different. This cube has a Scenario dimension with Actual, Budget, and Forecast (whereas in the source cube, since it is for budgeting only, everything is assumed to be Budget). Also note that Target.Bar does not have a Location dimension, instead, this cube only concerns itself with totals for all regions. Looking back at our original thoughts on automation, in order for us to move the data from Source.Foo to Target.Bar, we need to calculate it (to roll-up all of the data for the Locations), run a report script that will output the data how we need it for Target.Bar, use a load rule on Target.Bar to load the data, and then calculate Target.Bar. Of course, business needs will affect the exact implementation of this operation, such as the timing, the calculation to use, and other complexities that may arise. You may actually have two cubes that don’t have a lot in common (dimensionally speaking), in which case, your load rule might need to really jump through some hoops.
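To see what the calc, report script, and load rule accomplish between them, it can help to picture the reshaping outside of Essbase. The Python sketch below totals the regions, drops the Location dimension, and tags each total with the Budget scenario. The figures are hypothetical and the code only illustrates the shape of the transformation; in the real process Essbase does this work.

```python
# Sketch of the reshaping: total the regions, drop Location, add Scenario.
# All figures below are hypothetical.
from collections import defaultdict

source_rows = [
    # (Location, Time, Measure, value)
    ("North", "January", "Sales", 100.0),
    ("East", "January", "Sales", 50.0),
    ("South", "January", "Sales", 25.0),
    ("West", "January", "Sales", 25.0),
    ("North", "January", "LaborHours", 10.0),
    ("West", "January", "LaborHours", 5.5),
]


def to_target_rows(rows, scenario="Budget"):
    """Sum values across Location and tag each total with a Scenario."""
    totals = defaultdict(float)
    for _location, time, measure, value in rows:
        totals[(time, measure)] += value
    return [(scenario, time, measure, total)
            for (time, measure), total in sorted(totals.items())]


target_rows = to_target_rows(source_rows)
for row in target_rows:
    print(row)
```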

We’ll keep this example really simple though. We’ll also assume that the automation is being run from a Windows server, so we have a batch file to kick things off:

cd /d %~dp0
essmsh ExportAndLoadBudgetData.msh

I use cd /d %~dp0 on some of my systems as a shortcut to switch to the batch file’s own directory (%~dp0 expands to the drive and path of the running batch file), since the particular automation tool installed does not set the working directory to the folder containing the file. Then we invoke the MaxL shell (essmsh, which is in the PATH) and run ExportAndLoadBudgetData.msh. I enjoy giving my automation files unnecessarily long filenames. It makes me feel smarter.

As you may have seen from an earlier post, I like to modularize my MaxL scripts to hide/centralize configuration settings, but again, for the sake of simplicity, this example will forgo that. Here is what ExportAndLoadBudgetData.msh could look like:

/* Copies data from the Budget cube (Source.Foo) to the Budget Scenario
   of Target.Bar */

/* your very standard login sequence here */
login AdminUser identified by AdminPw on EssbaseServer;

/* at this point you may want to turn spooling on (omitted here) */

/* disable connections to the application -- this is optional */
alter application Source disable connects;

/* PrepExp is a Calc script that lives in Source.Foo and for the purposes
   of this example, all it does is makes sure that the aggregations that are
   to be exported in the following report script are ready. This may not be
   necessary and it may be as simple as a CALC ALL; */

execute calculation Source.Foo.PrepExp;

/* Budget is the name of the report script that runs on Source.Foo and outputs a
   text file that is to be read by Target.Bar's LoadBud rules file */

export database Source.Foo
    using report_file 'Budget'
    to data_file 'foo.txt';

/* enable connections, if they were disabled above */
alter application Source enable connects;

/* again, technically this is optional but you'll probably want it */
alter application Target disable connects;

/* this may not be necessary but the purpose of the script is to clear out
   the budget data, under the assumption that we are completely reloading the
   data that is contained in the report script output */

execute calculation Target.Bar.ClearBud;

/* now we import the data from the foo.txt file created earlier. Errors
   (rejected records) will be sent to errors.txt */

import database Target.Bar data
    from data_file 'foo.txt'
    using rules_file 'LoadBud'
    on error write to 'errors.txt';

/* calculate the new data (may not be necessary depending on what the input
   format is, but in this example it's necessary) */

execute calculation Target.Bar.CalcAll;

/* enable connections if disabled earlier */
alter application Target enable connects;

/* boilerplate cleanup. Turn off spooling if turned on earlier */

logout;
exit;

At this point, if we don’t have them already, we would need to go design the aggregation calc script for Source.Foo (PrepExp.csc), the report script for Source.Foo (Budget.rep), the clearing calc script on Target.Bar (ClearBud.csc), the load rule on Target.Bar (LoadBud.rul), and the final rollup calc script (CalcAll.csc). Some of these may be omitted if they are not necessary for the particular process (you may opt to use the default calc script, may not need some of the aggregations, etc.).

For our purposes we will just say that the PrepExp and CalcAll calc scripts are just a CALC ALL or the default calc. You may want a “tighter” calc script, that is, you may want to design the calc script to run faster by way of helping Essbase understand what you need to calculate and in what order.

What does the report script look like? We just need something to take the data in the cube and dump it to a raw text file.

<ROW ("Time", "Measures")

{ROWREPEAT}
{SUPHEADING}
{SUPMISSINGROWS}
{SUPZEROROWS}
{SUPCOMMAS}
{NOINDENTGEN}
{SUPFEED}
{DECIMAL 2}

<DIMBOTTOM "Time"
<DIMBOTTOM "Measures"
"Location"
!

Most of the commands here should be pretty self-explanatory. If the syntax looks a little different than you’re used to, it’s probably because you can also jam all of the tokens in one line if you want, like {ROWREPEAT SUPHEADING}, but historically I’ve had them one to a line. If there were more dimensions that we needed to represent, we’d put them on the <ROW line. As per the DBAG, we know that the various tokens in between {}’s format the data somehow: we suppress headings, missing rows, and rows that are zero (although there are certainly cases where you might want to carry zeros over), turn off indentation, and give numbers two decimal places (instead of some long scientific notation). Also, I have opted to repeat row headings (just like you can repeat row headings in Excel) for the sake of simplicity; however, as another optimization tip, this isn’t necessary either. It just makes our lives easier in terms of viewing the text file and loading it to a SQL database or such.

As I mentioned earlier, we didn’t have rollups such as different quarters in our Time dimension. That’s why we’re able to get away with using <DIMBOTTOM, but if we wanted just the Level 0 members (the months, in this case), we could use the appropriate report script command. Lastly, from the Location dimension we are taking just the Location member itself, the parent of the different regions (whereas <DIMBOTTOM “Time” tells Essbase to give us all the members at the bottom of the Time dimension, simply specifying a member or members from the dimension will give us those members). “Location” will not actually be written in the output of the report script because we don’t need it: the outline of Target.Bar does not have a Location dimension, since it’s implied that it represents all locations.

The output of the report script will look similar to the following:

January Sales 234.53
January LaborHours 35.23
February Sales 532.35

From here it is a simple matter of designing the load rule to parse the text file. In this case, the rule file is part of Target.Bar and is called LoadBud. If we’ve designed the report script ahead of time and run it to get some output, we can then go design the load rule. When the load rule is done, we should be able to run the script (and schedule it in our job scheduling software) to carry out the task in a consistent and automated manner.
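Before designing the load rule, it can be worth sanity-checking the report output with a throwaway script. The Python sketch below parses rows shaped like the sample output into (time, measure, value) tuples. It assumes whitespace-delimited rows with single-word member names, which holds for the sample but may not hold for real output (for example, member names containing spaces).

```python
# Sketch: parse report-script output rows into (time, measure, value) tuples.
# Assumes whitespace-delimited rows with single-word member names,
# as in the sample output above.

def parse_report_line(line):
    """Split one output row; the last field is the numeric value."""
    fields = line.split()
    if len(fields) != 3:
        raise ValueError("expected 3 fields, got {}: {!r}".format(len(fields), line))
    time, measure, value = fields
    return time, measure, float(value)


sample_output = """January Sales 234.53
January LaborHours 35.23
February Sales 532.35
"""

records = [parse_report_line(line)
           for line in sample_output.splitlines() if line.strip()]
print(records)
```

A row that does not have exactly three fields raises an error, which is usually the first hint that the report script formatting needs another look.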

As an advanced topic, there are several performance considerations that can come into play here. I already alluded to the fact that we may want to tighten up the calc scripts in order to make things faster. In small cubes this may not be worth the effort (and often isn’t), but as we have more and more data, designing the calc properly (and basing it off of good dense/sparse choices) is critical. Similarly, the performance of the report script is also subject to the dense/sparse settings, the order of the output, and other configuration settings in the app and database. In general, what you are always trying to do (performance-wise) is to help the Essbase engine do its job better, and you do this by making the tasks you want to perform more conducive to the way that Essbase processes data. In other words, the more closely you can align your data processing to the under-the-hood mechanisms of how Essbase stores and manipulates your data, the better off you’ll be. Lastly, the load rule on the Target database, and the dense/sparse configurations of the Target database, will impact the data load performance. You probably will not be able to optimize everything all at once. It’s a balancing act, since a good setting for a report script may result in a suboptimal calculation process. But don’t let this scare you: try to just get it to work first and then go in and understand where the bottlenecks may be.

As always, check the DBAG for more information, it has lots of good stuff in it. And of course, try experimenting on your own, it’s fun, and the harder you have to work for knowledge, the more likely you are to retain it. Good luck out there!

December 26, 2008

A quick and dirty substitution variable updater

There are a lot of different ways to update your substitution variables. You can tweak them with EAS by hand, or use one of several different methods to automate it. Here is one method that I have been using that seems to hit a relative sweet spot in terms of flexibility, reusability, and effectiveness.

First of all, why substitution variables? They come in handy because you can leave your Calc and Report scripts alone, and just change the substitution variable to the current day/week/month/year and fire off the job. You can also use them in load rules. You would do this if you only want to load in data for a particular year or period, or records that are newer than a certain date, or something similar.

The majority of my substitution variables seem to revolve around different time periods. Sometimes the level of granularity is just one period or quarter (and the year of the current period, if in a separate Years dimension), and sometimes it’s deeper (daily, hourly, and so on).

Sure, we could change the variables ourselves, manually, but where’s the fun in that? People that know me know that I have a tendency to automate anything I can, although I still try to have respect for what we have come to know as “keeping an appropriate level of human intervention” in the system. That being said, I find that automating updates to timing variables is almost always a win.

Many organizations have a fiscal calendar that is quite different than a typical (“Gregorian”) calendar with the months January through December. Not only can the fiscal calendar be quite different, it can have some weird quirks too. For example, periods may have only four weeks one year but have five weeks in other years, and on top of that, there is some arcane logic used to calculate which is which (well, it’s not really arcane, it just seems that way). The point is, though, that we don’t necessarily have the functionality on-hand that converts a calendar date into a fiscal calendar date.

One approach to this problem would be to simply create a data file (or table in a relational database, or even an Excel sheet) that maps a specific calendar date to its equivalent fiscal date counterparts. This is kind of the “brute-force” approach, but it works, and it’s simple. You just have to make sure that someone remembers to update the file from year to year.

For example, for the purposes of the date “December 22, 2008” in a cube with separate years, time, and weekday dimensions, I need to know three things: the fiscal year (probably 2008), the fiscal period (we’ll say Period 12 for the sake of simplicity), and the day of the week (day “2”). Of course, this can be very different across different companies and organizations. Monday might be the first day of the week or something. If days are included in the Time dimension, we don’t really need a separate variable here. So, the concepts are the same but the implementation will look different (as with everything in Essbase, right?).
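The brute-force lookup can be sketched in a few lines of Python. The single mapping row below is just the illustrative example from this post, not a real fiscal calendar, and the “P12” spelling of the period is an assumption made for the sketch.

```python
# Brute-force fiscal lookup. The single mapping row is the illustrative
# example from the text, not a real fiscal calendar.
import datetime

FISCAL_LOOKUP = {
    # calendar date: (fiscal year, fiscal period, day of week)
    datetime.date(2008, 12, 22): ("2008", "P12", "2"),
}


def fiscal_values(calendar_date):
    """Return (year, period, weekday) or fail loudly if the table is stale."""
    try:
        return FISCAL_LOOKUP[calendar_date]
    except KeyError:
        raise KeyError("no fiscal mapping for {}; update the lookup table"
                       .format(calendar_date.isoformat()))


year, period, weekday = fiscal_values(datetime.date(2008, 12, 22))
print(year, period, weekday)
```

Failing loudly on a missing date is deliberate: it is the cheapest reminder that someone forgot to extend the table for the new year.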

I want something a bit “cleaner,” though. And by cleaner, I mean that I want something algorithmic to convert one date to another, not just a look-up table. Check with the Java folks in your company, if you’re lucky then they may already have a fiscal calendar class that does this for you. Or it might be Visual Basic, or C++, or something else. But, if someone else did the hard work already, then by all means, don’t reinvent the wheel.

Here is where the approaches to updating variables start to differ. You could do the whole thing in Java, updating variables with the Java API. You could have a fancy XML configuration file that is interpreted and tells the system what variables to create, where to put them, and so on. In keeping with the KISS philosophy, though, I’m going to leave the business logic separate from the variable update mechanism. Meaning this: in this case I will use just enough program code to generate the variables, then output them to a space-delimited file. I will then have a separate process that reads the file and updates the Essbase server. One of the other common approaches here would be to simply output MaxL or ESSCMD script itself, then run the file. This works great too, however, I like having “vanilla” files that I can load in to other programs if needed (or, say, use in a SQL Server DTS/SSIS job).

At the end of the day, I’ve generated a text file with contents like this:

App1 Db1 CurrentYear 2008
App1 Db1 CurrentPeriod P10
App1 Db1 CurrentWeek Week4
App2 Db1 CurrentFoo Q1

Pretty simple, right? Note that this simplified approach is only good for setting variables with a specific App/database. It needs to be modified a little to set global substitution variables (but I’m sure you are enterprising enough to figure this out — check the tech ref for the appropriate MaxL command).
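Whatever language generates the values, the output step itself is tiny. Here is a minimal Python sketch of it, using the sample rows above; the write_subvars helper is hypothetical, and the check for spaces and commas is an assumption on my part to protect any downstream reader that splits fields on those characters.

```python
# Sketch of the "generate the variables" step: write one
# "App Db Variable Value" row per substitution variable.
import io


def write_subvars(rows, out):
    """Write space-delimited rows to a file-like object."""
    for app, db, name, value in rows:
        fields = (app, db, name, value)
        if any(" " in field or "," in field for field in fields):
            raise ValueError(
                "fields must not contain spaces or commas: {!r}".format(fields))
        out.write(" ".join(fields) + "\n")


rows = [
    ("App1", "Db1", "CurrentYear", "2008"),
    ("App1", "Db1", "CurrentPeriod", "P10"),
    ("App1", "Db1", "CurrentWeek", "Week4"),
    ("App2", "Db1", "CurrentFoo", "Q1"),
]

buffer = io.StringIO()   # swap for open("subvar.conf", "w") in a real job
write_subvars(rows, buffer)
print(buffer.getvalue(), end="")
```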

At this point we could set up a MaxL script that takes variables on the command line and uses them in its commands to update the corresponding substitution variable, but there is also another way to do this: we can stuff the MaxL statement into our invocation of the MaxL shell itself. In a Windows batch file, this whole process looks like this:

SET SERVER=essbaseserver
SET USER=essbaseuser
SET PW=essbasepw

REM generates subvar.conf file
REM this is your call to the Java/VB/C/whatever program that
REM updates the variable file
subvarprogram.exe

REM this isn't strictly needed but it makes me feel better
sleep 2

REM This is batch code to read subvar.conf's 4 fields and pipe
REM them into a MaxL session
REM NOTE: this is ONE line of code but may show as multiple in
REM your browser!

FOR /f "eol=; tokens=1,2,3,4 delims=, " %%i in (subvar.conf) do echo alter database %%i.%%j set variable %%k %%l; | essmsh -s %SERVER% -l %USER% %PW% -i

REM You would use the below statement for the first time you need
REM to initialize the variables, but you will use the above statement
REM for updates to them (you can also just create the variables in
REM EAS)

REM FOR /f "eol=; tokens=1,2,3,4 delims=, " %%i in (subvar.conf) do echo alter database %%i.%%j add variable %%k; | essmsh -s %SERVER% -l %USER% %PW% -i
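If batch syntax is not your thing, the same row-to-statement translation can be sketched in Python. This only builds the MaxL text from rows in the App, Db, Variable, Value layout of subvar.conf; it does not call essmsh, and the statement shape simply mirrors the one echoed by the batch file. The maxl_for_row helper is hypothetical.

```python
# Sketch: turn subvar.conf rows into MaxL statements, mirroring the
# statement shape echoed by the batch file. Nothing is sent to Essbase.

def maxl_for_row(line):
    """Convert 'App Db Variable Value' into an 'alter database' statement."""
    app, db, name, value = line.replace(",", " ").split()
    return "alter database {}.{} set variable {} {};".format(app, db, name, value)


conf_lines = [
    "App1 Db1 CurrentYear 2008",
    "App2 Db1 CurrentFoo Q1",
]

statements = [maxl_for_row(line) for line in conf_lines]
for statement in statements:
    print(statement)
```

The resulting statements could be written to a .msh file or piped to the MaxL shell, whichever fits the rest of the job.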

Always remember: there’s more than one way to do it. And always be mindful of keeping things simple, but not too simple. Happy holidays, y’all.

December 22, 2008

MaxL tricks and strategies on upgrading a legacy automation system from ESSCMD

The Old

In many companies, there is a lot of code lying around that is, for lack of a better word, “old.” In the case of Essbase-related functionality, this may mean that there are some automation systems with several ESSCMD scripts lying around. You could rewrite them in MaxL, but where’s the payoff? There is nothing inherently bad with old code; in fact, you can often argue a strong case to keep it: it tends to reflect many years of tweaks and refinements, is well understood, and generally “just works,” and even when it doesn’t, you have a pretty good idea where it tends to break.

Rewrite it?

That being said, there are some compelling reasons to do an upgrade. The MaxL interpreter brings a lot to the table that I find incredibly useful. The existing ESSCMD automation system in question (essentially a collection of batch files, all the ESSCMD scripts with the .aut extension, and some text files) is all hard-coded to particular paths. Due to using absolute paths with UNC names, and for some other historical reasons, there only exists a production copy of the code (there was perhaps a test version at some point, but due to all of the hard-coded things, the deployment method consisted of doing a massive search and replace operation in a text editor). Because the system is very mature, stable, and well-understood, it has essentially been “grandfathered” in as a production system (it’s kind of like a “black box” that just works).

The Existing System

The current system performs several different functions across its discrete job files. There are jobs to update outlines, process new period data, perform a historical rebuild of all cubes (this is currently a six hour job and in the future I will show you how to get it down to a small fraction of its original time), and some glue jobs that scurry data between some different cubes and systems. The databases in this system are set up such that there are about a dozen very similar cubes. They are modeled on a series of financial pages, but due to differences in the way some of the pages work, it was decided years ago that the best way to model cubes on the pages was to split them up into different sets of cubes, rather than one giant cube. This design decision has paid off in many ways. One, it keeps the cubes cleaner and more intuitive; interdimensional irrelevance is also kept to a minimum. Strategic dense/sparse settings and other outline tricks like dynamic calcs in the Time dimension rollups also keep things pretty tight.

Additionally, since the databases are used during the closing period, not just after (for reporting purposes), new processes can go through pretty quickly and update the cubes to essentially keep them real-time with how the accounting allocations are being worked out. Keeping the cubes small allows for a lot less down-time (although realistically speaking, even in the middle of a calc, read-access is still pretty reliable).

So, first things first. Since there currently are no test copies of these “legacy” cubes, we need to get these set up on the test server. This presents a somewhat ironic development step: using EAS to copy the apps from the production server to the development server. These cubes are not spun up from EIS metaoutlines, and there is very little compelling business reason to convert them to EIS just for the sake of converting them, so this seems to be the most sensible approach.

Although the outlines are in sync right now between production and development because I just copied them, the purpose of one of the main ESSCMD jobs is to update the outlines on a period basis, so this seems like a good place to start. The purpose of the outline update process is basically to sync the Measures dimension to the latest version of the company’s internal cross-reference. The other dimensions are essentially static, and only need to be updated rarely (e.g., to add a new year member). The cross-reference is like a master list of which accounts are on which pages and how they aggregate.

On a side note, the cross-reference is part of a larger internal accounting system. What it lacks in flexibility, it probably more than makes up for with reliability and a solid ROI. One of the most recognized benefits that Essbase brings to the table in this context is a completely new and useful way of analyzing existing data (not to mention write-back functionality for budgeting and forecasting) that didn’t exist. Although Business Objects exists within the company too, it is not recognized as being nearly as useful to the internal financial customers as Essbase is. I think part of this stems from the fact that BO seems to be pitched more to the IT crowd within the organization, and as such, serves mostly as a tool to let them distribute data in some fashion, and call it a day. Essbase really shines, particularly because it is aligned with the Finance team, and it is customized (by finance team members) to function as a finance tool, versus just shuttling gobs of data from the mainframe to the user.

The cross-reference is parsed out in an Access database in order to massage the data into various text files that will serve as the basis of dimension build load rules for all the cubes. I know, I know, I’m not a huge Access fan either, but again, the system has been around forever, It Just Works, and I see no compelling reason to upgrade this process to, say, SQL Server. Because of how many cubes there are, different aliases, different rollups, and all sorts of fun stuff, there are dozens of text files that are used to sync up the outlines. This has resulted in some pretty gnarly looking ESSCMD scripts. They also use the BEGININCBUILD and ENDINCBUILD ESSCMD statements, which basically means that the cmd2mxl.exe converter is useless to us. But no worries: we want to make some more improvements besides just doing a straight code conversion.

In a nutshell, the current automation script logs in (with nice hard-coded server path, user name, and password), outputs to a fixed location, logs in to each database in sequence, and has a bunch of INCBUILDDIM statements. ESSCMD, she’s an old girl, faithful, useful, but just not elegant. You need a cheatsheet to figure out what the invocation parameters all mean. I’ll spare you the agony of seeing what the old code looks like.

Goals

Here are my goals for the conversion:

Convert to MaxL. As I mentioned, MaxL brings a lot of nice things to the table that ESSCMD doesn’t provide, which will enable some of the other goals here.
Get databases up and running completely in test — remember: the code isn’t bad because it’s old or “legacy code,” it’s just “bad” because we can’t test it.
Be able to use the same scripts in test as in production. The ability to update the code in one place, test it, then reliably deploy it via a file-copy operation (as opposed to hand-editing the “production” version) is very useful (also made easier because of MaxL).
Strategically use variables to simplify the code and make it directory-agnostic. This will allow us to easily adapt the code to new systems later, for example if we want to consolidate to a different server (even one on a different operating system).
And as a tertiary goal: Start using a version control system to manage the automation system. This topic warrants an article all to itself, which I fully intend to write in the future. In the meantime, if you don’t currently use some type of VCS, all you need to know about the implications of this is that we will have a central repository of the automation code, which can be checked in and checked out. In the future we’ll be able to look at the revision history of the code. We can also use the repository to deploy code to the production server. This means that I will be “checking out” the code to my workstation to do development, and I’m also going to be running the code from my workstation with a local copy of the MaxL interpreter. This development methodology is made possible in part because in this case, my workstation is Windows, and so are the Essbase servers.

For mostly historical reasons the old code has been operated and developed on the analytic server itself, and there are various aspects of the way the code has been developed that mean you can’t run it from a remote server. As such, there are various semantic client/server inconsistencies in the current code (e.g. in some cases we are referring to a load rule by its App/DB context, and in some cases we are using an absolute file path). Ensuring that the automation works from a remote workstation will mean that these inconsistencies are cleaned up, and if we choose to move the automation to a separate server in the future, it will be much easier.

First Steps

So, with all that out of the way, let’s dig in to this conversion! For the time being we’ll just assume that the versioning system back-end is taken care of, and we’ll be putting all of our automation files in one folder. The top of our new MaxL file (RefreshOutlines.msh) looks like this:

msh "conf.msh";
msh "$SERVERSETTINGS";

What is going on here? We’re using some of MaxL’s features right away. Since there will be overlap in many of these automation jobs, we’re going to put a bunch of common variables in one file. These can be things like folder paths, app/database names, and other things. One of those variables is the $SERVERSETTINGS variable, which we configure within conf.msh to point to the server-specific MaxL configuration file. This is one method that allows us to centralize certain passwords and folder paths (like where to put error files, where to put spool files, where to put dataload error files, and so on). Configuring things this way gives us a lot of flexibility, and further, we only really need to change conf.msh in order to move things around; everything else builds on top of the core settings.
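
As a rough sketch of how the two files might fit together: the variable names below are the ones the script itself uses, but the settings file name and every value are placeholders made up for illustration, so adjust them to your own setup.

```maxl
/* ---- conf.msh (sketch) ---- */

/* Hypothetical file name. Changing this one line points the whole
   automation system at a different server's settings. */
set SERVERSETTINGS = "settings.test.msh";

/* ---- settings.test.msh (sketch; all values are placeholders) ---- */

/* Login details used by the login statement */
set ESSUSER = "adminuser";
set ESSPW = "adminuserpw";
set ESSSERVER = "testserver";

/* Where spool files and dimension build error files get written */
set LOGPATH = "logs";
set ERRORPATH = "errors";
```

With a layout like this, a production copy of the settings file would carry the same five variables with different values, and none of the job scripts would need to change.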

Next we’ll set a process-specific configuration variable, which is a folder path. This defines the source folder for all of the dimension build input files.

SET SRCPATH = "../../Transfer/Data";

Next, we’ll log in:

login $ESSUSER identified by $ESSPW on $ESSSERVER;

These variables are found in the $SERVERSETTINGS file. Again, this file has the admin user and password in it. If we needed more granularity (i.e., a special ID just for the databases in question instead of running all automation as the god-user), we could put that in our conf.msh file. As it is, there aren’t any issues on this server with using a master ID for the automation.

spool stdout on to "$LOGPATH/spool.stdout.RefreshOutlines.txt";
spool stderr on to "$LOGPATH/spool.stderr.RefreshOutlines.txt";

Now we use the spooling feature of MaxL to divert standard output and error output to two different places. This is useful to split out because if the error output file has a size greater than zero, it’s a good indicator that we should take a look and see if something isn’t going as we intended. Notice how we are using a configurable LOGPATH directory. This is the “global” logpath, but if we wanted it somewhere else we could have just configured it that way in the “local” configuration file.
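
One gap worth noting is that the login runs before spooling is turned on, so a failed login would never show up in the stderr file. The MaxL Shell's iferror and define label statements are one way to catch that inside the script itself. This is a minimal sketch of that idea rather than part of the original job, and the label name is arbitrary:

```maxl
login $ESSUSER identified by $ESSPW on $ESSSERVER;

/* If the login statement failed, skip ahead to the label below. */
iferror 'LoginFailed';

spool stdout on to "$LOGPATH/spool.stdout.RefreshOutlines.txt";
spool stderr on to "$LOGPATH/spool.stderr.RefreshOutlines.txt";

/* ... the actual work of the job goes here ... */

spool off;
logout;
exit;

/* Only reached through the iferror jump above. */
define label 'LoginFailed';
exit;
```

Checking the size of the stderr spool file after the run still covers everything that happens once spooling has started.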

Now we are ready for the actual “work” in this file. With dimension builds, this is one of the areas where ESSCMD and MaxL do things a bit differently. Rather than block everything out with begin/end build sections, we can jam all the dimension builds into one statement. This particular example has been modified from the original in order to hide the real names and to simplify it a little, but the concept is the same. The nice thing about just converting the automation system (and not trying to fix other things that aren’t broken — like moving to an RDBMS and EIS) is that we get to keep all the same source files and the same build rules.

import database Foo.Bar dimensions

    from server text data_file "$SRCPATH/tblAcctDeptsNon00.txt"
    using server rules_file 'DeptBld' suppress verification,

    from server text data_file "$SRCPATH/tblDept00Accts.txt"
    using server rules_file 'DeptBld'

    preserve all data
    on error write to "$ERRORPATH/JWJ.dim.Foo.Bar.txt";

In the actual implementation, the import database blocks go on for about another dozen databases. Finally, we finish up the MaxL file with some pretty boilerplate stuff:

spool off;
logout;
exit;

Note that we are referring to the source text data file in the server context. Although you are supposed to be able to use App/database naming for this, it seems that on 7.1.x, even if you start the filename with a file separator, it still just looks in the folder of the current database. I have all of the data files in one place, so I was able to work around this by just changing the SRCPATH variable to go up two folders from the current database, then back down into the Transfer\Data folder.

The Transfer\Data folder is under the Essbase app\ folder. It’s sort of a nexus folder where external processes can dump files because they have read/write access to the folder, but it’s also the name of a dummy Essbase application (Transfer) and database (Data), so we can refer to it and load data from it from an Essbase-naming perspective. It’s a pretty common Essbase trick.

We are also referring to the rules files from a server context. The output files (the spool and error files) are written to a local location, relative to wherever the script is run. This all means that we can run the automation from some remote computer (for testing/development purposes), and we can run it on the server itself. It’s nice to sort of “program ahead” for options we may want to explore in the future.
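
To make the server/local distinction concrete, here is a sketch of the second build from the import statement rewritten to read its data file from the client side. The local path is a made-up example; the rules file still comes from the server, and the error file is still written on the machine running the script.

```maxl
/* Sketch: same build as before, but the data file is read from the
   machine running essmsh instead of from the Essbase server.
   The local path below is hypothetical. */
import database Foo.Bar dimensions

    from local text data_file "C:/automation/data/tblDept00Accts.txt"
    using server rules_file 'DeptBld'

    preserve all data
    on error write to "$ERRORPATH/JWJ.dim.Foo.Bar.txt";
```

The trade-off is that a local data file only exists on one machine, so a script written this way is tied to wherever that file lives. Keeping the source files in the server context is what lets the same script run unchanged from the server or from a workstation.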

For the sake of completeness, when we go to turn this into a job on the server, we’ll just use a simple batch file that will look like this:

cd /d %~dp0
essmsh RefreshOutlines.msh

The particular job scheduling software on this server does not run the job in the current folder of the job, so we use cd /d %~dp0 as a Windows batch trick to change folders (and drives if necessary) to the folder of the current file (that’s what %~dp0 expands out to). Then we run the job (the folder containing essmsh is in our PATH so we can run this like any other command).

All Done

This was one of the trickier files to convert (although I have just shown a small section of the overall script). Converting the other jobs is a little more straightforward (since this is the only one with dimension build stuff in it), but we’ll employ many of the same concepts with regard to the batch file and the general setup of the MaxL file.

How did we do with our goals? Well, we converted the file to MaxL, so that was a good start. We copied the databases over to the test server, which was pretty simple in this case. Can we use the same scripts in test/dev and production? Yes. The server-specific configuration files handle any folder/username/password differences between the servers, and the automation doesn’t care (it just loads the settings from whatever file we tell it), so I’d say we addressed this just fine. We used MaxL variables to clean things up and simplify, which was a pretty nice cleanup over the original ESSCMD scripts. And lastly, although I didn’t really delve into it here, this was all developed on a workstation (my laptop) and checked in to a Subversion repository. Further, the automation all runs just fine from a remote client. If we ever need to move some folders around, change servers, or make some other sort of change, we can probably adapt and test pretty quickly.

All in all, I’d say it was a pretty good effort today. Happy holidays, y’all.

December 15, 2008

Automate that old cube archive process!

Your server may have dozens or even hundreds of cubes on it. A common strategy with a large and slowly changing Measures dimension (or some other dimension like Product) is to spin off a copy of the cube after a certain time period, typically the fiscal year end. There are a number of different reasons that you might do this. First, the cube may simply focus on Current Year and Prior Year, or a fixed number of years and scenarios such that the cube becomes too unwieldy when you start adding more. Second, if you need to be able to go back and pull a report so that it looks exactly how it did in a certain fiscal year, then you may need to spin off the cube. Depending on how many cubes you end up spinning off for each fiscal year, it may be necessary to go and clean them up at some point, but you might still want to keep them around, just in case. You can do this by hand by stopping the app, zipping up the app folder and all its contents, and deleting the app from within EAS.

Here is an example of a batch file you could use on Windows. This relies on the free 7-Zip package being installed somewhere. The nice thing about this approach is that while it uses MaxL, it doesn’t actually have any MaxL files; it just injects the MaxL command via the command-line. Edit the variables for your setup, and you’re on your way. It’s not pretty but it’s nice if you have to go clean up a bunch of apps! Happy cubing! Jason. [download zip of the following batch file]

@echo off

SET USER=adminuser
SET PW=adminuserpw
SET SERVER=essbaseserver
SET APPPATH=D:\Essbase\App
SET ZIP="7zp\App\7-Zip\7z.exe"

@echo.
@echo -------------------------------------------
@echo This is the cube archiver utility...
@echo.
@echo Looking for App %1 ...

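REM Bail out if no app name was given; otherwise the check below
REM would match the App folder itself.
IF "%~1"=="" GOTO NoApp
IF NOT EXIST "%APPPATH%\%~1" GOTO NoApp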

@echo.
@echo I found it at %APPPATH%\%1 ...
@echo.
@echo Attempting to stop the app...

REM essmsh -l %USER% -p %PW% -s %SERVER% StopCube.msh %1

echo alter system unload application %1; | essmsh -s %SERVER% -l %USER% %PW% -i

@echo Archiving the app ...

%ZIP% a -tzip EssApp_%1.zip %APPPATH%\%1

echo.

choice /M "Okay to delete app %1"

IF ERRORLEVEL 2 GOTO Done

echo alter application %1 enable startup; | essmsh -s %SERVER% -l %USER% %PW% -i
echo drop application %1 cascade force; | essmsh -s %SERVER% -l %USER% %PW% -i

GOTO Done

:NoApp

@echo I could not find that app at %APPPATH%\%1 !!!

:Done
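
The commented-out REM line in the batch file hints at the alternative of keeping the statement in a MaxL file and passing the application name as an argument. That file isn't shown here, so the following is only a guess at what a StopCube.msh along those lines could contain, using the MaxL Shell's positional parameter $1:

```maxl
/* StopCube.msh (hypothetical contents).
   $1 is the first argument after the script name on the essmsh
   command line, i.e. the name of the application to stop.
   Assumes the login is handled by the essmsh command-line options. */
alter system unload application $1;
logout;
exit;
```

Piping the statement in, as the batch file does, avoids having to ship an extra file alongside it, which is the point of the approach above.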

jason's hyperion blog

essbase from the trenches
