Vess + ODI to extract Essbase metadata

Well, apparently it’s Friday Fun Day over here in Seattle. I was going to head up to the mountains for the weekend but plans changed. So how about a little frankendriver Vess update with some ODI goodness thrown in?

Vess has some really interesting data integration possibilities that I’ve mentioned before, one of which is being able to drop it into Oracle Data Integrator and use it as you would any other JDBC driver. I hadn’t really tested this out though, until yesterday. It turns out that it works surprisingly well.

Continue Reading…

Manually injecting rows to an ODI journal table

Earlier this year I did an absolute whirlwind crash course on ODI journalization. In that article I talked about the layout of tables that ODI uses to track changes in particular tables (for simple journalization with one table). You may recall that changed (inserted, updated, deleted) rows are tracked in this table. This is done by way of some metadata as well as the primary key of the monitored table.

Typically after your processing is done, the entries from the ODI journal table that were processed will be deleted. Occasionally, however, you may need to manually fix the ODI journal table. For me this sometimes involves “manually injecting” rows to the journal table. You could achieve this in a couple of ways. This way we can force some rows to process that otherwise wouldn’t be processed because ODI doesn’t consider them to be new or updated.

The first way is to “touch” the rows in the monitored table in some way so that the trigger installed on that table gets activated and inserts rows to the ODI journal table. You could perform some sort of dummy update on a row. This can work but isn’t always desirable.

Another technique and the one I’ll talk about in this post is to manually add rows to the journal table that contain the primary key of the row to be processed. The trick, then, is what to fill in for the other values of the columns?

Recall that the general format of a table is the subscriber name, a consumed flag, the date it went into the table, whether it’s an insert or a delete, and the primary key of the row being journalized.

As an example, assuming the journal table is called J$TEST_CODE_RESULT (for a real table of TEST_CODE_RESULT), and also assuming a primary key name of TEST_CODE_RESULT_ID (that in this case is an integer), we can add a row to the journal table like this:

INSERT INTO TEST_CODES.J$TEST_CODE_RESULT VALUES ('TEST_CODE_PROCESSING', 0, 'I', sysdate, 12345);

The first value is the JRN_SUBSCRIBER. This corresponds to the name of the subscriber setup in the ODI journalizing and is also the same as the filter you use in the interface to get rows for just your particular subscriber. Next is the JRN_CONSUMED field which we set to 0 (during processing, ODI will change this to 1 to indicate it’s working on that data), the character ‘I’ (upper case i) to indicate the row was an insert or update, the current time (this assumes an Oracle database), and finally, the ID of the row we need to inject.

You can of course also insert as many rows as you want such as by writing an INSERT that gets values from a SELECT. Every now and then if you have a lot of rows you need to reprocess you might find yourself with a table that has the IDs, and you can just insert them into this table and they’ll get picked up on the next run.

 

More fun with Vess, the Essbase JDBC driver

All I can say about Vess is that it’s still… interesting. This is definitely one of the more complex things I’ve ever worked on, as well as one of the more elegant. I did a lot of plumbing on how the driver works which is now enabling so interesting and rapid progress.

Here are some interesting things that you can do as of right now:

  1. Query list of applications and metadata
  2. Query list of cubes and metadata
  3. Query list of sessions (same as Sessions in EAS)
  4. Load data without a load rule by using a SQL INSERT statement
  5. List users
  6. List substitution variables

Some of the really fun things are actually the most subtle. Wherever possible Vess tries to supply metadata for a modeled table that matches up with Essbase semantics or naming limits. For example, substitution variables are in a table where you have the variable name, value, application, and database. Each of these columns are normal String columns but they have character limits that match the maximum Essbase length (320, 256, 30, and 30, respectively, taking into account Unicode).

Similarly for member names on the data load table. Also wherever possible, a primary key or nullability constraint is set on a table where it also makes sense. For example, substitution variables are assumed to be non-null, as well as member names for an Essbase fact table, cube names, and more.

I have been using a free generic JDBC tool called Squirrel SQL to help facilitate testing. Squirrel supports any database for which you can supply a generic JDBC driver. Here are a few screenshots from testing out Vess when connected to Sample/Basic (list of cubes on server):

View list of cubes and metadata with Vess

List of active sessions on server:

Viewing list of active sessions on Essbase server using Vess

Some of the tables also support DELETE semantics, for example, deleting a row from the session table is equivalent to killing the session, as with substitution variables.

Lastly, all tables that are modeled in Vess are part of the SQL metadata store, so that tools that go in to ask for metadata (such as Squirrel) get it back. This is interesting/notable because theoretically you could already use Vess with ODI to perform data loads and some other things. I haven’t tried this yet, but theoretically I could add the drivers, reverse the metadata, and use one of the tables in an interface. So it’d be fairly trivial to update a substitution variable directly from SQL, perform a data load, and you could even kick off a calc script simply by invoking a procedure.

In short, things are exciting in the Vess world. While many of you want to load data to Essbase cubes without a load rule, just as many are interested in pulling database metadata (things you normally dump with MaxL and scrape out), and this tool might just fill that need in a clean way. More to come later, I have some presentations to get ready for Kscope15 (although sadly none on this little technological flight of fancy…)

ODI Automation: Don’t Call Us, We’ll Call You

I have developed numerous jobs with Oracle Data Integrator. I have developed new jobs that connect systems together, and I have developed jobs in ODI that replace existing jobs. There are plenty of reasons for redeveloping a solution in ODI that I have talked about before. Generally you can significantly clean things up – sometimes a little, sometimes a lot.

Sometimes a very particular aspect (such as a certain section of script code) of an existing batch job will be targeted for replacement with something developed in ODI. Let’s say that there is an existing batch file that performs a series of six distinct steps (not counting boilerplate code like setting some variables and echoing out logging statements and such). For example, the steps might be this:

  1. Copy a file from one place to another (such as from an “incoming” folder to a known folder to another location)
  2. Run a custom program to process the text file in some way, generating a new file
  3. Use a command-line SQL program to run a particular line of SQL, to truncate a particular table
  4. Run a custom program (or a BULK INSERT) to load the text file to a SQL table
  5. Use command-line program to run a line of SQL such as an UPDATE on the newly loaded data (UPDATE STAGING SET YEAR = '2015' WHERE YEAR IS NULL or something)
  6. Run a MaxL script that clears a cube, imports data via a load rule, and calculates the database

The Perfect Solution

Now, sitting from my architectural ODI ivory tower, I would make the case that this should all be done in an ODI job with an interface. For example, loading a text file to Microsoft SQL Server, defaulting some of the years data, and loading the data to a cube could all more or less be done in a single interface (BULK INSERT LKM, Hyperion Data Load IKM, use appropriate load rule, etc.).

The Intermediate Solution

But let’s say that just the data load and nothing else is to be performed by ODI because we are revamping some of the job, but not the whole job, for some reason (risk, time, money, incrementalism). This means that we are going to call an ODI job from an existing batch file. We will use our trusty friend startscen.bat in order to accomplish this. startscen.bat is good for running scenarios. We pass in the name of the scenario, the ID, and the context so that we can very specifically ID the exact job for the ODI server to run.

So now we’ll keep our existing batch file and most of its contents, but in place of part of the previous code, we’ll have a call to startscen.bat and then resume any necessary processing after that. So technically, we’ve improved the process a bit (hopefully) by using ODI for part of it, but we haven’t really done much to significantly improve things. This is now where I get to the point of the article: I say go all in with your ODI automation.

Think of this intermediate solution like this, in terms of script code:

  1. Copy file from incoming folder
  2. Run custom program to transform file
  3. Call startscen.bat (invoke ODI scenario, load SQL data, load to Essbase)

The benefits of this incremental approach should be obvious: we’ve ostensibly cleaned up part of the job. Here’s the thing though: I think we’ve actually added some undesirable technical debt by doing it this way.

Yes, the job technically works, but think about how this batch file is positioned from an administration point of view now: we have a batch file, some configuration settings for that batch file (server names, encryption keys, folder paths, etc.), the batch file is likely outputting data or otherwise logging things to a text file, then we’re calling the ODI job itself, thereby creating an entry in the ODI Operator (which is a good thing, but it will only show part of the the overall job).

More often than not when I am redeveloping an automation process for someone, the batch files have no error control and inconsistent amounts of logging. The logging that is done is often times never looked at. Having easy visibility into our automation processes, however, is a very good thing. We can achieve this with ODI. While incrementalism has its merits, I say go all in with the ODI automation.

Go all in

What I mean by this is that instead of sticking with the batch file, instead of the batch file calling the automation steps and then calling an ODI step, use a package in ODI to call the necessary steps, your interfaces, and any other processing you need.

You get a lot of benefits from this approach. The ODI Operator will log each step so that you know how long each one took. You can get consistent and easy error handling/branching in an ODI package. You can easily log all of the output. You can schedule the job from the ODI scheduler. ODI has package steps that can be used to replace kludgey code in a batch file. For example, there is an ODI “wait for file” step that waits around for a file to show up. You can copy files around, pull files from FTP, and more – if you need to just call an arbitrary program or script you can do that with an ODI OS Command. Everything you need is there, and more.

You don’t even have to use the ODI scheduler. If you are already invested in another scheduler (Windows Scheduler, cron jobs, etc.) and want to keep everything in one place, then just call the ODI scenario from your scheduler, but that scenario will have been generated from a package that has all of the steps in it (instead of your batch file calling those).

This new world of pure ODI jobs is worth the extra effort to get to: you have status and metrics for all of your jobs (without tons of boilerplate logging code), you can manage your entire job in one place, and its easier to follow best (better) practices with development. While batch files and various scripts have served us well over the years, more and more I find myself abstracting away from the file system when it comes to automation and given the capabilities of ODI. It’s worth it.

Kscope15 Presentation Preview: ODI Workhorses

The other day I mentioned my goals for attendees for my upcoming ODTUG presentation on Drillbridge. Today I’m going to talk about my goals for my presentation on Oracle Data Integrator (ODI).

Over the last few years I have presented on ODI a handful of times. My main presentation on it has been highlighting a success story with it, where ODI was used to clean up and facilitate a lot of automation and ETL jobs for a health services company that [as you can imagine] has tons of data flying around everywhere. This previous presentation was more of a high-level affair, where I talked very generically about what the benefits of ODI were over the previous solution. Wanting to add a little more technical meat to the presentation, I appended what started off as a small section at the end where I take a look at just how ODI works under the covers.

While the “business” or high-level part of the presentation was all well and good, I found myself getting really excited to explain just how awesome the workings of the details of ODI ETL jobs were, and what started out as a 10-minute flight of fancy into the lower depths of something technical has now been promoted, as it were, to full on presentation.

In other words, I am going to spend an entire presentation literally tearing apart a single ODI interface, explaining why the steps are sequences the way they are, how to build idiomatic ODI interfaces, the affect of various options (journalization, query hints, delete all vs. truncate, etc.), update strategies, and more. I’m also going to marry up what is ostensibly an IT tool with various concepts from computer science, including the notion of idempotence (and why it’s a good thing).

With any luck, the attendee coming to this presentation will have a new or expanded understanding of how interfaces/mappings work, feel comfortable with modifying knowledge modules, and learn a trick or two when it comes to debugging ODI jobs. This will be a nuts and bolts, technical deep dive. While it’s ostensibly an advanced content presentation, I believe that people with even only a cursory familiarity should benefit from it as well. If you haven’t worked with ODI at all but are curious (and intrepid!) about what it can do, I think you’ll also benefit. So to all of my pure-Hyperion colleagues that haven’t dipped their toes in the ODI pool just yet, this is a great chance to hop on a different track and expand your horizons – I hope to see you there!

Vess: a Virtual Essbase JDBC driver

Normally I finish some programming and then throw it over the wall to you guinea pigs enthusiastic blog readers before I say anything about it (no vaporware here, nosiree!), but I thought I’d share something I’ve been playing with as a proof of concept. It’s sort of inspired by some of the ODI/Essbase integration I’ve been working with lately.

The Essbase/Hyperion knowledge modules in ODI are sort of different from typical knowledge modules because in a lot of ways they cram a round peg into a square hole. So you kind of get this relational facade over an Essbase cube in terms of loading and integrating data/metadata. Instead of relying on a normal JDBC driver, the Hyperion KMs more or less delegate out to some custom Java code that wraps the Essbase Java API. This isn’t a bad thing, it’s just how it works.

Unrelated to ODI, there are many situations where we are jumping through hoops to get data of some kind – not just cube data and metadata – out of Essbase. For example, you run a MaxL script to dump stats from a cube to a text file. Then you run some regular expression voodoo on it to get to the data you actually want.

Side note: parsing MaxL text output is surely amusing to the boffins at Oracle that designed it. I don’t think many people know this, but MaxL communicates with the Essbase server using a hacked up JDBC driver (it’s ingenious really). So when we parse the plusses and minuses and other ASCII crap off of some MaxL output, effectively what is happening is that a real dataset is coming back from the Essbase server, the MaxL interpreter is applying the extra work of prettying it up with some text art (the only thing missing is lime green letters from the 80’s), it gets dumped to a text file, and then what happens in so many places is that the text is reparsed into data.

To some extent the MaxL Perl and Essbasepy modules can be used to get MaxL data in a cleaner way. The Java API can definitely be used. Of course, I have no problem myself jumping in and coding up a Essbase Java API routine to pull some data down. But for people that aren’t as comfortable, stripping some info out of MaxL output offers a clear path with less resistance, so I can’t say I blame them.

So we have all of these instances where we can get data out of Essbase (not just actual cube data, but metrics from the server) using EAS, using MaxL, dumping MaxL to a text file, the C API, Java API, and so on. But it seems like a lot of these approaches have undesirable combinations of effort/quality/reward.

Let’s talk about JDBC for a moment. JDBC is the Java Database Connectivity model. Java is one of the most popular programming languages on the planet (perhaps not the sexiest language, but it’s getting better), and JDBC is the tried and true model for connecting to databases. Every popular database in the world has a JDBC driver – you can grab a generic SQL tool (I’ve been using the simple RazorSQL as of late), pop in the appropriate JDBC driver (such as for MySQL, Microsoft SQL Server, Oracle, DB2, whatever), connect, and start issuing SELECT statements to your heart’s content.

So, this all brings me to something interesting I’ve been playing with for awhile: Vess – a virtual Essbase JDBC driver. Vess is a JDBC-compliant driver that can be dropped into any generic SQL tool that supports JDBC drivers (RazorSQL) as well as used out of the box with ODI. Vess offers a JDBC-compliant URL for connecting to Essbase, such as the following:

jdbc:essbase://epm11123.corp.saxifrages.com:9000/Sample.Basic;mode=embedded

Vess then presents several tables, views, and stored procedures. For example, consider substitution variables. These are modeled in a table where the column types are String, String, String, String (the VARCHAR database type). So you can do this:

SELECT * FROM SYS.SUBSTITUION_VARIABLES

And you get back a dataset kind of like this:

Sample, Basic, CurrMonth, Jan

(It’s not actually text with commas, it’s a normal dataset). There are similar tables with other system/cube information in them, such as database stats, connected users (the Sessions tab in EAS), and more. There are views for dimensions/members/aliases/UDAs. There are stored procedures for calling calc scripts (e.g. sp_exec_calc(‘CalcAll’)). MDX queries can also be crammed into a normal JDBC ResultSet with a little bit of imagination.

So, why bother?

Using a JDBC driver would let us easily (and cleanly) get to Essbase data and metadata using any generic database tool, as well as being able to drop the driver right in to ODI and use it as part of a normal interface. Whereas ODI uses Jython to call custom Java classes, this could potentially simplify the layers a little bit and offer an interesting alternative. This could also potentially be used in places where MaxL output is being parsed out via black magic, obviate the need for writing custom Essbase JAPI programs, and some use cases with the outline extractor.

Status

As of right now, the framework of the driver is created already and you can drop it into a SQL tool such as the aforementioned RazorSQL and run a command like SELECT name FROM SYS.SUBSTITUTION_VARIABLES WHERE APP = 'Sample' and get the proper results back. Over time (as time permits) I will be adding a few more gizmos. If there are any intrepid Java developers out there that would like to play with this, let me know and I can make this an open source project on GitHub that could be refined over time.

Where is the Essbase Java API method to build dimensions from a SQL source?

I was writing a custom ODI Open Tool the other day so that I could update an Essbase dimension as a custom package step. Normally this would be better suited as a normal update metadata interface, but in this case it seems like a good solution to be able to call an existing load rule without having to worry about the models and topologies.

In the Essbase Java API, most data load and dimension build operations are initiated from the cube object (IEssCube). I’m already familiar with the various loadData() methods that have several variations (variants of providing a rules file name, local/remote file, SQL username/password, error file name, etc), but hadn’t tried to start a dimension build from a SQL datasource from the Java API before.

Interestingly, there is no method in the IEssCube interface for directly performing a dimension build from a SQL datasource. But obviously we know it’s possible – and generally speaking, there’s a pretty clear mapping from EAS functionality to the underlying Java API.

So I was curious, what the hell does ODI do when you’re performing a dimension build (metadata update) from a datasource? Well, after a bit of digging through the ODI internals, I found it: ODI pulls the data from SQL, dumps it to a temporary local text file (a temporary file starting with the name EssData and ending with the .txt extension), then calls one of the normal data load methods on what is just a normal text file.

So, the solution in my case (to implement the custom package step) is just to dump a ResultSet to a text file (no big deal), then call a normal method. It’s a bit of a small workaround, but it works just fine. I thought it was pretty interesting that there wasn’t a built-in method to do this.

Running ODI Studio on OS X

I usually run ODI Studio in a Windows VM or just in an RDP session on a Windows server. But, I run a Mac (a lovely MacBook Pro) and am embarking on something of a local development project, and decided to run ODI Studio natively.

I’m hardly the first one to the party. In fact, I found some great instructions that got me 95% of the way up and running over on a blog article from 2011. I couldn’t quite get ODI to run, however. My default Java JDK is 1.8, but the version of ODI Studio I was trying to run is 11.1.1.7 (yeah yeah, it’s older, I’ll get 12c up and running soon enough…). In short, this version of ODI Studio wants a 1.6 JDK.

So the trick was how to leave my system default JDK at 1.8 but then force ODI to use the 1.6 JDK that’s also installed (just not default).

Normally you can use a SetJavaHome option in a conf file, but this didn’t seem to change anything. I think the ODI launcher tool (which is based on a generic Oracle IDE launcher system) just kept quickly seeing that I have a JAVA_HOME set to a 1.8 JDK, so it was just running with that and not trying anything else. So I dug through the files and found that the launcher will respect a special Java home variable named OIDE_JAVA_HOME, and use that first. So without making any other change, this following can be used on a Mac to force the use of a 1.6 JDK when there is a higher version installed as the default:

export OIDE_JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home

Then just run ODI as normal:

./odi.sh

After that, ODI Studio launched like a champ with no error messages:

ODI Studio running on Mac OS XI’m not sure how many people will find this useful (not many), but if this post is like many of my others, I’ll be stuck on this in a couple of years, Google around for a solution, then find my own article and be amazed that I forgot that I solved it…

 

 

ODI journal table performance tweak with an index

The other day I provided a crash course on simple journalization in ODI with an Oracle database. I wanted to set the stage for this performance tweak that I put in awhile ago on a few tables.

I mentioned that ODI stores journalized rows in a special table it creates that tracks the subscriber, processing status, and date of each row. The possible performance issue has to do with what ODI does during the processing of rows out of the table. If you have two different systems (subscribers) for a given table, when an interface runs, it is going to process data for just one subscriber at a time (usually). During processing ODI needs to mark these rows that it’s working with by changing the JRN_CONSUMED column to 1 for each matching subscriber.

In other words, if our interface processing data out of the CUSTOMERS table (by way of the journal table J$CUSTOMERS) is for subscriber ‘MARKETING’, then at some point in the ODI interface there will be a step like this:

UPDATE J$CUSTOMERS SET JRN_CONSUMED = '1' WHERE JRN_SUBSCRIBER = 'MARKETING'

If there are a lot of rows in the table, and particularly if there are a lot of rows in this table AND there are multiple subscribers, then the WHERE clause on this UPDATE statement can be a bottleneck. This is because the JRN_SUBSCRIBER field is not indexed, meaning that Oracle has to do a full table scan to find every row with a matching value. On a few interfaces I had at a client, this step was taking an inordinate amount of time. So I added an index on the JRN_SUBSCRIBER column (and eventually added one for the combination of JRN_SUBSCRIBER and JRN_CONSUMED), and performance increased dramatically, as Oracle was now able to avoid a full table scan.

The first time I made this change was over a year ago and I haven’t seen any adverse affects or otherwise had any problems, and it dropped processing time for this one step from over 30 minutes to well under a minute.

If you have large, multi-subscriber journal tables being worked on by performance-critical interfaces, you might want to open up Operator and see how long they are taking and if a carefully placed index could benefit them.

 

ODI journalization crash course

I do seem to love me some ODI journalization. It has been a really great feature that I have implemented countless times to great effect. For those not familiar, journalization in ODI is where you tell ODI that you want to record what rows get updated and inserted in a particular table, such that when you load data out of that table, you can choose to work with just those updated rows.

This is useful for many obvious reasons: only processing data that’s new or changed is usually faster than processing a whole table. When you are dealing with millions and billions of rows, it’s a necessity. Journalization is also nice for instances where it replaces processing data based on time.

For example, I have reworked many systems where the data to be processed was sysdate - 1 or some other time period, and the automation runs daily. This can work. But what if something goes wrong and you don’t catch it for a day? Well, you go in and tweak things and run a one-off job to fix it. Or you have other coping mechanisms. What if data from a month ago is updated? Many times just processing anything that changed is the effective strategy.

It’s how journalization works under the hood, though, that is the focus of this post (for performance reasons). When you journalize a table (at least for the simple Oracle implementation), ODI installs a trigger on that table so that updates and inserts are recorded somewhere.

In this case, that somewhere is your work schema. If you were to journalize a table called CUSTOMERS and the primary key of customers was CUSTOMER_ID, then you would end up with a journal table called J$CUSTOMERS that has the following columns:

  • JRN_SUBSCRIBER
  • JRN_CONSUMED
  • JRN_FLAG
  • JRN_DATE
  • CUSTOMER_ID

The structure will always be the four JRN_ columns and the primary key (which can be multiple columns but is often just one) of the table being recorded. This simple table records everything you need to know about the new and updated rows.

Here’s a quick crash course in these columns:

JRN_SUBSCRIBER: You supply a text name indicating what the consumer of the data will be. For example, you may have two systems that independently want to process new and updated customers, so you might have two subscribers: MARKETING and CRM, for example. Rather than creating two separate journal tables, we just have a column in here to differentiate the columns.

JRN_CONSUMED: This defaults to 0 and will be 0 until processing data out of the journal occurs. This column provides a really elegant solution to making sure you don’t lose data when processing journalized data. When an ODI interface runs to process data out of a journal, it sets the JRN_CONSUMED column for rows to process to 1. This does a couple of things. One, any rows that are added to the journal after a job starts do not get lost or otherwise suffer from some timing issue. They will just get picked up on the next processing job. Two, if the ODI interface fails for some reason, before it completes successfully, and you need to re-run the job, you won’t have lost the journalized rows: they are only dropped when the job successfully completes (it’s one of the last cleanup steps in the job).

JRN_FLAG: Indicates whether the row was inserted or updated (I or U), although in practice you will notice that the value is always ‘I’, indicating an inserted row (even if it was actually just an update). This is because later on in the job ODI will mark rows that are actually updates with a U and then handle the processing accordingly

JRN_DATE: The date (time) that the row was recorded into the journal.

With all of this infrastructure in place (plus a couple of views that ODI puts in for you), journalization is ready to go. You design your interface as normal, and if you want to process just the new/updated rows, all you have to do is check a single checkbox on the Mapping tab (Use Journalized Data). ODI handles the rest for you.

This is a really killer feature of ODI and can cleanup countless home-brew solutions that are fragile, complex, and code-intensive.

That being said, the original purpose of this post was to actually talk about a performance issue with ODI journals, but it was necessary to provide some context with an overview of journalization first. With that out of the way, check back tomorrow for a tweak you can make to your journal tables that might speed things up if you are processing large amounts of data!