Glitch with a non-printing Unicode character in member name

This is an Essbase bug, kind of. I’ve been working on a project lately that uses the relatively new MaxL Essbase outline export command (yes, Pete, I called it relatively new again, even though according to you it’s not… well, it’s relatively NEW TO ME SO THAT’S WHAT MATTERS… :-). Anyway, I wrote a quick XML parser for the output, using Java.

The nice thing about the parser is that it uses a Java API called JAXB (Java Architecture for XML Binding). It’s a really nice way of modeling the contents of an XML file using Java classes, so that you don’t have to write your own parsing code, which is tedious and error prone. There are reasons you might use either approach, but overall I have been very happy over the last few years with the ability to write XML readers in Java with JAXB.

Curiously, I came across an outline export that would cause the parser to throw an exception. The Java stack trace indicated that an illegal character (0x1f – that’s not the character itself, rather, its Unicode code point) was at fault. Specifically, character 0x1f is the “unit separator” control character. In a nutshell you might say that while most of us are used to writing things with letters and numbers and things like tabs, spaces, and newlines, there happen to be all of these other weird characters that have arcane uses or historical reasons for existing. It’s such a prevalent issue (or at least, can be) that many advanced text editors have various commands to “show invisibles” or non-printing characters. One such tool that many of us Essbase boffins are adept with is Notepad++ – a veritable Swiss army knife of a text editor.

Nicely enough, the Java stack trace indicated that the problem in the XML was with parsing a “name” attribute on a <Member> tag – in other words, an Essbase member name in the source outline contained an invisible character. As it turns out, this particular character is illegal in XML 1.0: of the control characters below 0x20, only tab, line feed, and carriage return are allowed. So while Essbase happily generates invalid XML during the export, when I try to import it with Java, I get the exception. But how to find the offending member? I mean, how do you do a text search for an invisible character (seriously, this is like some “what is the sound of one hand clapping” kind of stuff).
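For anyone who would rather check an export before handing it to a parser, here is a minimal sketch of the idea in Java. It is not the parser from my project; the class and method names (IllegalXmlCharFinder, isLegalXml10, findIllegal) are made up for illustration, and the sample <Member> line is invented. The range check follows the character ranges that XML 1.0 allows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Essbase or JAXB.
class IllegalXmlCharFinder {

    // True if the code point is allowed by the XML 1.0 "Char" production:
    // tab, line feed, carriage return, and the ranges listed below.
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Walks the text code point by code point and records the line,
    // column, and code point of every character XML 1.0 does not allow.
    static List<String> findIllegal(String text) {
        List<String> hits = new ArrayList<>();
        int line = 1;
        int col = 1;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isLegalXml10(cp)) {
                hits.add(String.format("line %d, column %d: U+%04X", line, col, cp));
            }
            if (cp == '\n') {
                line++;
                col = 1;
            } else {
                col++;
            }
            i += Character.charCount(cp);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Invented sample: the second member name hides a unit separator.
        String sample = "<Member name=\"East\"/>\n<Member name=\"We\u001Fst\"/>";
        findIllegal(sample).forEach(System.out::println);
    }
}
```

In practice you would read the export file into a string first and pass that in; the line and column in each hit tell you where to look in the editor.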

In Notepad++ you can search for a regular expression. So I turned on Show Invisibles, pulled up the Find dialog, checked the “Use Regular Expressions” option, then typed in [\x1f], which is the regex code to tell Notepad++ to search for this little bastard of a character. Sure enough, there was exactly one in the output file, which surely snuck in from an otherwise innocuous copy and paste to EAS some time ago. I fixed the member name in EAS, reran the export, reprocessed with the parser, and all was well again in the universe.
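The same search can be sketched in Java with java.util.regex, which accepts the same [\x1f] character class. This is only an illustration: the class name UnitSeparatorSearch and the method offendingNames are hypothetical, the name="..." pattern is an assumption about how the attribute appears in the export, and the "[US]" marker is just my way of making the invisible character show up in the output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class UnitSeparatorSearch {

    // The same character class typed into the Notepad++ Find dialog.
    private static final Pattern UNIT_SEPARATOR = Pattern.compile("[\\x1f]");

    // Assumed shape of the attribute: name="..." with no embedded quotes.
    private static final Pattern NAME_ATTR = Pattern.compile("\\bname=\"([^\"]*)\"");

    // Returns every name attribute value that contains a unit separator,
    // with the invisible character replaced by a visible "[US]" marker.
    static List<String> offendingNames(String xml) {
        List<String> names = new ArrayList<>();
        Matcher attr = NAME_ATTR.matcher(xml);
        while (attr.find()) {
            String name = attr.group(1);
            if (UNIT_SEPARATOR.matcher(name).find()) {
                names.add(UNIT_SEPARATOR.matcher(name).replaceAll("[US]"));
            }
        }
        return names;
    }
}
```

Printing the marked-up name makes it easy to see where in the member name the character sits before fixing it in EAS.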
