A little while ago I presented a success story using Oracle Data Integrator. The webinar goes over ODI functionality at a high level, giving a quick tour of the benefits of the major functional areas and how they were used to greatly improve and build out the ETL and automation at a high-profile client. If you are interested in using ODI in your organization, or aren't sure (but want to know more!), then I strongly encourage you to check it out. ODI has become one of my specialization areas and I have been doing a tremendous amount of work with it, so I look forward to becoming an expert in this technology and sharing that expertise with others.
Note that this webinar, while technical, is not a deep dive into implementation details. I am planning a follow-up webinar that comprehensively covers using ODI as an ETL/automation solution and implementing it with best practices. That will be the webinar to look for if you are more technical and really want to see the ins and outs of how things work and how they can be architected for robustness, maintainability, and efficiency.
I received an email today from someone looking to speed up their data loads, specifically their ASO data loads that seem to be taking too long. This is, of course, an important topic for many Essbase cube wranglers. I would be willing to bet that many people spend a lot of time optimizing calcs while neglecting to profile their data loads. You might be surprised just how long your automation spends doing a data load. That being said, there are several different scenarios you may find yourself in and different places to optimize. Of course there are more items that could go on this list, but here are the big ones off the top of my head. In other words, if I were tasked with improving my data load speed, here's what I would look at:
- Do you need to load the data in the first place? I know this seems a bit rudimentary, but if you are loading data that you don't even need (and can help it), then don't waste your time on it.
- Use the fastest connection. Architecturally you may simply have to load data one way or another (from a text file, from a SQL server, off the SAN, etc.), but, hands down, the fastest data loads I've seen (short of trying to load from a RAM disk, although I'm dying to try it) are from separate physical hard drives attached to the Essbase server. With a good RAID setup the performance is still quite good if you are loading text files from the Essbase server. If you are loading your records from an RDBMS across the WAN, you might be killing your performance due to network bottlenecks. Another option is to put the RDBMS on the same box as Essbase. I know many people do this, but personally I am not a huge fan of this option. My Essbase servers tend to have plenty to do without having SQL software on the box to worry about. Additionally, we license SQL by the CPU, and since my Essbase servers are all quad-proc and the SQL servers are getting by just fine with dual-proc, the cost to license it for two more CPUs is quite significant, particularly when all of the other optimization methods are essentially 'free'.
- If loading from a SQL RDBMS: your bottleneck may very well be the network speed here. If you can, make sure your Essbase server is on the same LAN as the SQL server, with the fastest possible connections (Gigabit or better). If you are loading from a SQL table and using the WHERE clause in your load rule, or you are loading from a SQL view, make sure you have good indexes set up in SQL. This can make a HUGE difference if you are loading just the records from Period 08 and you have an index, versus making the SQL server scan the entire table. If performance is extremely critical, see if you can pre-stage text files on the Essbase server (like tip #2). If loading to BSO, order the rows to match your dense/sparse settings and the order of the outline (basically, you are trying to give Essbase a hand and load up a whole dense data block in as few passes as possible).
- Do as little work as possible in the load rule. The cost of doing text replacements, column swaps, accept/reject rules, and all that stuff can really add up. If you can do this elsewhere then do it there (e.g., in a SQL view or having your ETL software prep it for you in the format you need).
- Tweak your settings. The Essbase.cfg has some black magic stuff in it. Try the DLTHREADSWRITE parameter (check the DBAG for details) to see if you can throw some threads at the problem. Watch your performance on the server — slamming all the CPUs may cause performance for other users to decrease.
- (BSO) Sort the data. Try to give Essbase a hand. If your records are sorted such that Essbase is looking at the fewest blocks at a time, and reading the items in the same order as the outline, you'll reduce the punishment to Essbase and help it load records faster.
- Outline optimization. This one, of course, applies to just about everything you do in Essbase. Smaller data blocks are your friends: if you are committed to your dense/sparse settings, see if you can lighten up the dense blocks with some strategic dynamic calc and label-only members. For instance, my Time dimension is usually Time -> Quarters -> Periods, and nine times out of ten, when Time is dense, the Quarters and Time members are dynamic calc instead of stored. Of course, there may be numerous other reasons that the dense/sparse settings are what they are (calc performance, retrieve performance, etc.), so don't go making changes without understanding them.
- Load and Swap. You may find it useful to load up a cube separate from the production cube, then use MaxL to drop it on top of the production cube (all on the same server, of course). This way you do all of the hard work in one and, when it's ready, you can just pop it in place. I think this works better in theory than in practice, at least for me. I initially tried this with some very large ASO cubes, and although the performance wasn't terrible, at least with version 7.1.x of Essbase, the swap process (the MaxL was a "create or replace" command) was not very graceful: it would bring the server to its knees during the swap. I eventually dropped this method in favor of using all the other points to optimize and make the downtime even smaller.
- Use your own load rules instead of EIS. Out of convenience, and particularly on small cubes, I will load data with EIS. However, you are going to see better performance when you use your own load rules and optimize them. Besides, you probably already did it this way since you are concerned about performance, and this is probably in an automation script anyway.
- Reduce the size of the data members. Any time your bottleneck is the speed of the transmission of information, try to cut it down (particularly for SQL loads across the LAN). For example, don't have the field be "Period 01" if you can use "P01" instead. Use "08" instead of "Yr2008". Try to balance this with how much work your load rule is doing (tip #4).
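For reference, the thread settings mentioned above go in essbase.cfg. The form below (application, database, thread count) is from memory, and "Sample Basic" is a placeholder, so verify the exact syntax for your version in the DBAG before relying on it:

```
DLTHREADSPREPARE Sample Basic 4
DLTHREADSWRITE   Sample Basic 4
```

Restart the Essbase service after changing the file, and keep an eye on overall CPU usage as you raise the thread counts.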
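The load-and-swap approach I described looks roughly like the MaxL below. The application/database names, file names, and credentials are placeholders, and the syntax is from memory, so check the MaxL reference for your release before using it:

```
login admin identified by 'password' on 'essbase-server';

/* do the heavy lifting in a staging copy of the cube */
import database Stage.Sales data
    from server data_file 'sales.txt'
    using server rules_file 'ldSales'
    on error abort;

/* drop it on top of production -- the step that was ungraceful in 7.1.x */
create or replace database Prod.Sales as Stage.Sales;

logout;
```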
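To make the SQL indexing tip concrete, here is a quick sketch using SQLite (any RDBMS behaves similarly). The table and column names (FACT_DATA, PERIOD) are made up for illustration; the point is that an index on the column your load rule filters on lets the database seek straight to the Period 08 rows instead of scanning the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical fact table feeding the Essbase load rule.
cur.execute("CREATE TABLE FACT_DATA (PERIOD TEXT, ACCOUNT TEXT, AMOUNT REAL)")
rows = [("P%02d" % (m % 12 + 1), "A%03d" % m, float(m)) for m in range(1200)]
cur.executemany("INSERT INTO FACT_DATA VALUES (?, ?, ?)", rows)

# Without this index, the WHERE clause in the load rule's SQL forces a
# full table scan; with it, the database seeks directly to the P08 rows.
cur.execute("CREATE INDEX IX_FACT_PERIOD ON FACT_DATA (PERIOD)")

cur.execute("SELECT COUNT(*) FROM FACT_DATA WHERE PERIOD = 'P08'")
print(cur.fetchone()[0])  # 100
```

On a real fact table with millions of rows, checking the query plan (EXPLAIN in most databases) is the quickest way to confirm the index is actually being used.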
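The BSO sorting tip can be sketched in a few lines. The field layout and the dense/sparse assignments here are assumptions for illustration: Market and Product are sparse (so each unique pair is one data block) and Period is dense, meaning we want all rows for one block to arrive together, with the dense members in outline order.

```python
# Hypothetical load-file records: (Market, Product, Period, amount).
records = [
    ("East", "Cola", "P02", 110.0),
    ("West", "Cola", "P01", 210.0),
    ("East", "Cola", "P01", 100.0),
    ("West", "Cola", "P02", 220.0),
]

# Assumed outline order of the dense Period members.
period_order = {"P%02d" % i: i for i in range(1, 13)}

# Sort by the sparse combination first (one block per pair), then by the
# dense member in outline order, so Essbase fills each block in one pass.
records.sort(key=lambda r: (r[0], r[1], period_order[r[2]]))

for rec in records:
    print(rec)
```

In practice you would do this sort in SQL (ORDER BY) or in your ETL tool rather than in a script, but the ordering principle is the same.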
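The savings from shrinking field values are easy to quantify. A tiny sketch, with a made-up mapping table and row layout:

```python
# Hypothetical mapping from verbose source values to short member names.
short_names = {"Period 01": "P01", "Yr2008": "08"}

row = ["Period 01", "Yr2008", "Sales", "1234.56"]
slim = [short_names.get(field, field) for field in row]

# Bytes per tab-delimited row, before and after shortening.
before = len("\t".join(row))
after = len("\t".join(slim))
print(slim, before - after)  # 10 bytes saved on this row
```

Ten bytes per row sounds trivial, but across tens of millions of rows moving over a network connection it adds up quickly.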
As always, experiment! Try different combinations to see what works and what doesn't. Remember that for squeezing out that extra bit of performance, you are trying to help Essbase do its job better. Always remember that dimensions can be, and probably are (on cubes you didn't build), set up the way they are for a reason; i.e., you might have a worse load time, but the benefit is faster calc or retrieve time. If you have your own tips, please let me know!