Release of cubedata 1.0.0 / jason's hyperion blog

Continuing in the same spirit as the release of Jessub, I am happy to announce the release of another open source tool meant to benefit Hyperion Essbase administrators: cubedata. cubedata is a simple tool that makes it easy to generate a text file that can be loaded into a cube. Of course, there’s nothing too special about that by itself. The real purpose of the tool is to generate huge text files covering every combination of the dimension members that you specify. For example, let’s look at a simple data definition:

dimensions=Time,Scenario,Location,Departments
members.Time=P01,P02
members.Scenario=Actual,Budget
members.Location=Lo.806,Lo.808,Lo.822
members.Departments=Dt.01,Dt.02,Dt.03,Dt.04

So we just have a really simple definition in a configuration file. We run cubedata and tell it to use this file to generate some data for us. Out come 48 rows of data: 2 time periods x 2 scenarios x 3 locations x 4 departments = 48 combinations. The generated data file looks like this:

P01,Actual,Lo.806,Dt.01,911.85
P01,Actual,Lo.806,Dt.02,887.10
P01,Actual,Lo.806,Dt.03,251.49
P01,Actual,Lo.806,Dt.04,115.64
P01,Actual,Lo.808,Dt.01,197.60
P01,Actual,Lo.808,Dt.02,704.71
P01,Actual,Lo.808,Dt.03,512.76
.. more rows ..
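
To make the arithmetic concrete, here is a minimal Python sketch of the same idea. It is not cubedata's own code: the `parse_definition` helper and the variable names are hypothetical, and it only assumes the key=value layout of the definition shown earlier. It counts the combinations and lists them in the same order as the sample rows.

```python
from itertools import product
from math import prod

# Illustration only: this mirrors the definition from the post, but the
# parsing logic is an assumption, not how cubedata itself reads the file.
DEFINITION = """\
dimensions=Time,Scenario,Location,Departments
members.Time=P01,P02
members.Scenario=Actual,Budget
members.Location=Lo.806,Lo.808,Lo.822
members.Departments=Dt.01,Dt.02,Dt.03,Dt.04
"""

def parse_definition(text):
    """Return an ordered list of (dimension, [members]) pairs."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = [v.strip() for v in value.split(",")]
    # Keep the dimensions in the order given by the "dimensions" key.
    return [(dim, props["members." + dim]) for dim in props["dimensions"]]

dims = parse_definition(DEFINITION)
member_lists = [members for _, members in dims]

# The row count is the product of the member counts: 2 * 2 * 3 * 4 = 48.
expected_rows = prod(len(members) for members in member_lists)

# itertools.product varies the last dimension fastest, which matches the
# ordering of the sample rows (departments change first, then locations).
combinations = list(product(*member_lists))

print(expected_rows, len(combinations))
print(",".join(combinations[0]))
```

Only the member combinations are reproduced here; the fact values in the sample are random, so they are left out of this sketch.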

The configuration file lets you specify a few other options, such as the column delimiter (default is comma), the numerical range of fact values to generate, and the “load factor” (what percentage of data combinations will have data).
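
As a rough illustration of how options like these could interact, here is a small Python sketch. The function and every parameter name in it (`delimiter`, `low`, `high`, `load_factor`, `seed`) are made up for this example and are not cubedata's actual configuration keys. It also assumes that the load factor is applied as an independent random check per combination, which may differ from what the tool really does.

```python
import random
from itertools import product

def generate_rows(member_lists, delimiter=",", low=0.0, high=1000.0,
                  load_factor=1.0, seed=None):
    """Yield delimited rows; all parameter names here are hypothetical."""
    rng = random.Random(seed)
    for combo in product(*member_lists):
        # Assumed behavior: keep each combination with probability
        # load_factor, so roughly that fraction of combinations gets data.
        if rng.random() >= load_factor:
            continue
        value = rng.uniform(low, high)
        yield delimiter.join(combo) + delimiter + f"{value:.2f}"

members = [["P01", "P02"], ["Actual", "Budget"]]

# Every combination, pipe-delimited instead of comma-delimited.
full = list(generate_rows(members, delimiter="|", seed=1))

# Ten two-member dimensions give 1024 combinations; about a quarter survive.
sparse = list(generate_rows(members * 5, load_factor=0.25, seed=1))

print(len(full), len(sparse))
```

With a load factor of 1.0 every combination is written, and with 0.0 none are, so the option gives a simple way to mimic sparse cubes.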

cubedata, like Jessub, is licensed under the Apache License 2.0, a very permissive license that basically says you can do whatever you want with the code. The project is shared on GitHub in one of my public repositories.

I haven’t done extensive testing on the program, but it does do a reasonable job of telling you if the configuration is incomplete or otherwise incorrect. I have tested it with quite a few dimensions and members and was able to generate a file with many millions of records quite easily. I don’t see any reason why it wouldn’t support generating absolutely massive amounts of data. It’s programmed to iterate over the dataset one row at a time rather than keep it all in memory at once, so there shouldn’t be any memory issues when generating massive data sets.
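
The iterate-rather-than-hold-in-memory approach can be sketched in a few lines of Python. This is an assumption about the general technique, not a port of cubedata: `write_rows`, the dimension names, and the 0 to 1000 value range are all invented for the example. An in-memory `io.StringIO` stands in for the output so the sketch runs anywhere; in practice you would pass a file opened for writing.

```python
import io
import random
from itertools import product

def write_rows(member_lists, out, seed=None):
    """Write one row per combination to `out`; return the number written."""
    rng = random.Random(seed)
    count = 0
    # product() hands back one tuple at a time, so only the member lists and
    # the current row are held in memory, never the whole dataset.
    for combo in product(*member_lists):
        out.write(",".join(combo) + f",{rng.uniform(0, 1000):.2f}\n")
        count += 1
    return count

# Six hypothetical dimensions with five members each: 5**6 = 15625 rows.
big = [[f"D{d}.{m:02d}" for m in range(5)] for d in range(6)]

buffer = io.StringIO()
written = write_rows(big, buffer, seed=0)
print(written)
```

Because each row is written as soon as it is produced, adding dimensions or members makes the run longer and the file bigger, but the memory needed by the loop itself stays about the same.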

So, there you have it. Another simple tool that might make developing and testing a little easier for you, particularly if you hate generating dummy data by hand and/or you don’t have a ready or convenient system to source data from.

As always, please feel free to let me know any suggestions or comments you may have, and I will be happy to look into improving the program. If you end up downloading the code and making tweaks, please share them back if they would be useful to more people.