Commenting on Agile Enterprise Data Modeling
I’ve been practicing agile BI for over twenty years, and am very happy to see this article by Information Management Magazine’s Steve Hoberman: “Is Agile Enterprise Data Modeling an Oxymoron?”
The gist of the article is that agile practices have great value in enterprise data modeling, a sentiment I heartily agree with. This posting recaps my comment on Hoberman’s article.
Agile BI, which includes agile data modeling, deliberately sets out to deliver business value early, often and continuously in the form of meaningful, timely, high quality analytics. Delivering the information business decision makers require when they need it is the whole point of BI. Nothing else matters. Not building elegant data models. Not buying, installing and configuring hugely expensive enterprise BI software. Not building systems that can answer any question that might ever be asked.
Far too many BI initiatives fall into the trap of trying to deliver the universal analytical appliance as their first achievement. As the maxim puts it: “You can’t start with everything; if you try you’ll never deliver anything.”
While I agree with the major thrust of Hoberman’s article, it perpetuates a couple of misinterpretations of agility.
First up: “A majority of projects driven by agility, however, lack a big-picture focus and often strive to deliver small slices of functionality within tight time frames, at times redoing and revamping prior work.” While it seems reasonable, even wise, it’s naive and there are a couple of corrections required to this statement.
Redoing and revamping prior work is an essential element in any agile undertaking. In “normal” agile software development this is called refactoring and is a good thing. Agile BI is no different. Refactoring is always part of the process – agile projects surface and embrace it, make it part of the normal course of events; big-BI projects hide it under the covers where it pollutes everything, adds tremendous friction, and invisibly hinders progress.
There’s a hint in the statement that agile projects are biased towards lacking “a big-picture focus” (yes, I know it’s qualified, but the smell of allegation lingers). This is a misrepresentation of the true nature of agility. Agility requires a professional awareness of the large scale structures of the diverse elements in the mix. A project that doesn’t do this may wear the clothes of agility, but doesn’t possess its soul. Just throwing functional bits out into the world without regard for the rich complexity of the environment will inevitably lead to a terrible mess. An organization that creates such a mess is unlikely to get it cleaned up, and some poor schlub is usually left trying to continue to work with it while getting blamed for the inability to produce results.
There’s also an argument to be made against using a pre-existing framework to guide agile BI. While it’s incumbent upon BI professionals to recognize the conceptual architectures (business and technical) forming the environment, the paradigm that a pre-formed architecture is necessarily a valuable asset fails to recognize the costs, burdens, and downstream ill effects that result from the principle of simply trying to “fill in the details.” Experience shows that the best architectures are emergent, resulting from the continual refactoring of the developing systems while adhering to suitable and appropriate architectural principles. In this approach, there’s always an intentional architecture that fully supports the real needs of the system, is flexible and adaptable (agility at work!), and can be grown and refined as necessary while never imposing costs not directly related to delivering specific business value.
How Big BI Fails To Deliver Business Value
Business Intelligence, the noun, is the information examined by a business person for the purpose of understanding their business, and used as the basis of their business decisions.
Business Intelligence, the verb, is the practice of providing the business person with the Business Intelligence they require.
Business Intelligence is conceptually based in a valuable proposition: the delivery of actionable, timely, high quality information to business decision makers that provides the data-based evidence that they can use in making business decisions.
Timely, high quality information is extremely valuable. It’s also somewhat perishable. Deriving the maximum business value from Business Intelligence is the expressed value proposition of all BI projects. Or it should be.
Enterprise Business Intelligence usually means the entire complex of tools, technologies, infrastructure, data sources and sinks, designs, implementations and personnel involved in collecting business data, combining it into collective stores, and creating analyses that are accessed by the business people.
Enterprise BI projects are very strongly biased towards complexity. They’re based on the paradigm of using large complex products and technologies to design and build out data management infrastructures that underpin the development of analytics – reports, dashboards, charts, graphs, etc. that are made available for consumption.
Big BI is what happens when Enterprise Business Intelligence grows bigger than is absolutely necessary to deliver the BI value proposition. Unfortunately, Enterprise BI has an almost invariant tendency to mutate into Big BI, and the nature of Big BI almost invariably impedes the realization of the BI business value proposition.
Big BI is by its very nature overly large, complex and complicated. There are many moving parts, complicated and expensive products, tools and technologies that need to be installed, configured, fed, and cared for. All before any information actually gets delivered to the business decision makers.
Big BI is today’s Data Processing. In the old days of mainframes, COBOL, batch jobs, terminals, and line printers business people who wanted reports had to make the supplicant journey to their Data Processing department and ask (or beg) for a report to get created, and scheduled, and delivered. Data Processing became synonymous with “we can do that (maybe) but it’ll take a really long time, if it happens at all.” Today’s Big BI is similarly slow-moving and hard to get anything out of; it’s the nature of the beast.
There are multiple reasons for this sad state of affairs. New BI programs need to acquire the requisite personnel, infrastructure, tools, and technologies. All of which need to be installed and operational, which can take a very long time. Data needs to be analyzed, information models need to be created, reporting databases designed, ETL transformations designed and implemented, reporting tool semantic layers (e.g. Business Objects Universes, Cognos Frameworks) need to be constructed, reports need to be created, and, finally, the reports made available.
Only then do the business decision makers (remember them?) get anything out of the entire process.
This entire process can take a very long time. All too frequently it takes months. As much because of the friction inherent in coordinating the many moving parts and involved parties, each with their own bailiwick and gateways to protect, as with the inherent complexity of the tools and process.
The tragic part in many Big BI projects is that the reports delivered to the business people usually fall far short of delivering the business value they should provide.
The reports are out of date, or they’re incomplete, or they’re no longer relevant, or they are poorly designed and executed, or they’re simply wrong because the report production team “did something” without having a clear and unambiguous understanding of the information needs of the business decision maker.
In far too many instances there’s a vast gulf between the business information needs and the reports that get developed and delivered. This situation occurs because there’s too much distance, in too many different dimensions, between the business and the report creators.
I’ve put together a diagram of the structure and processes of a typical Big BI project. It’s worth studying to see how far apart the two ends of the spectrum are. At one end is the business person, with their business data that they need to analyze. At the far end, at the end of all of the technology, separated by multiple barriers from the business, are the report generators.
A typical scenario in this environment is for somebody (with any luck a competent business analyst, usually a project manager, all too frequently a technical resource) to be tasked with interviewing the business and writing up some report specs, perhaps some wireframes, maybe Use Cases (which aren’t a good tool for capturing Business Intelligence analytical requirements).
These preliminary specs are then used as the inputs into the entire BI implementation project. Which then goes about its business of creating and implementing all of the technological infrastructure necessary to crank out some analyses of the data.
The likelihood of this approach achieving anything near the real business value obtainable from the data is extremely low. There’s simply too much distance, and too many barriers, between the business decision makers and the analyses of their data. Big BI has become too big, too complex, with too much mass and inertia, all of which get between the business and the insights into their data that are essential to making high quality business decisions.
There is, however, a bright horizon.
Business Intelligence need not be Big BI. Even in those circumstances where Enterprise BI installations are required, and there are good reasons for them, they need not be the monolithic voracious all-consuming resource gobblers they’ve become. Done properly, Enterprise BI can be agile, nimble, and highly responsive to the ever-evolving business needs for information.
Future postings will explain how this can be your Enterprise BI reality.
High Level Enterprise BI Project Activities Diagrams
This PDF document–Data Warehouse Typical Project–maps the normal set of high level activities involved in Enterprise BI projects.
Typical Enterprise BI projects are complex complicated affairs that have difficulty delivering Business Intelligence quickly due to multiple factors:
- there are many complex discrete interconnected activities which have stringent analytical interconnections and dependencies
- there is generally a lack of analytical expertise brought to bear on the data, meta-data, and meta-meta-data that needs to be transported and communicated between the various parties
- high quality requirements are extremely difficult to achieve, primarily because there’s an enormous amount of setup work that needs to occur before an Enterprise BI tool can be connected to real data and be used to develop preliminary analytics
- absent live reports, any end-user analytical requirements are usually low quality, low fidelity best guesses made in the vacuum of the real feedback essential for arriving at truly useful Business Intelligence
This PDF–Data Warehouse – Tableau Augmented Project Processes–highlights those areas where Tableau can be profitably employed to dramatically improve the velocity and quality of the Business Intelligence delivered to business decision makers, and in significantly streamlining the entire process stream by introducing the practices of data analysis into the Enterprise BI project activities.
In practice, this approach has been shown to provide business value early and often, and result in better Enterprise BI outcomes much sooner and at lower cost. This leaves more resources available to continue expanding the scope, sophistication, velocity and quality of the Business Intelligence provided, and therefore delivers much higher business value.
Common Problems Saving Tableau Packaged Workbooks
Tableau’s packaged workbooks are tremendously useful. Bundling data with the workbook allows anyone to peruse the data using the Workbook without having access to the original source data. I use them frequently in large BI projects as a way of providing Reports to end users, analyses of data all along the project process chain, even in providing the database schema to downstream technical teams when the “normal” processes take too long.
Packaged Workbooks can be opened with the Desktop Application or the Tableau Reader. Published to the Tableau Server they’re available just like normal Workbooks.
Creating a Packaged Workbook is really pretty straightforward: create an extract of the data (for every data source used in the Workbook); save or export the Workbook in its packaged form.
There are a couple of reasonably common circumstances I’ve run into again recently; this post covers them.
Problem—SQL Parsing error creating the extract
I’ve seen this more than once: when Tableau tries to create an extract it fails with a fairly obscure error along the lines of “Data format string terminated prematurely”, which seems to indicate that there’s been a problem parsing a date value using whatever internal format it’s employing. There are no calculated fields or data calculations, so it’s really puzzling and Tableau doesn’t really provide any diagnostics.
There’s also the matter that this problem doesn’t surface until the extract is under preparation, implying that it’s not involved with any of the fields being referenced in the Worksheets, which leads us to the
Solution—Hide the unused fields and try to create the extract
Almost too easy, isn’t it? Hiding the unused fields also reduces the size of the extract, which in some cases makes a big difference. On the other hand, the unused fields aren’t available for use in the extract, and therefore in the Packaged Workbook; this isn’t a problem for Tableau Reader users, but limits those Desktop Application and Server users who otherwise could extend the Workbook’s analytics.
Problem—creating the Packaged Workbook generates an “unconnectable data source” message
[insert message here]
Solution—find and close any Data Connections that aren’t being used
Orphaned Data Connections can have a number of causes, but usually it’s because the last Worksheet using the Data Connection gets deleted or pointed to another Data Connection.
Finding and closing unused Data Connections from within the Workbook can be a bit of a hunting expedition–this will be the topic of another post. But very soon the Tableau Inventory will identify orphaned Data Connections.
Inventory Your Tableau Workbooks (or…)
“Are You Using That Field?”
It always happens: you’ve put together a nice set of Workbooks, produced a bunch of really valuable analytics, and now you need to figure out what’s where so you can:
- accommodate the inevitable changes to the database
- enumerate the reports
- identify the calculated fields, and their calculations
- identify what Dashboards and Worksheets are in what Workbooks
- the relationships between Dashboards and Worksheets
- so on and so forth and such like
You COULD manually browse through the Workbooks, Dashboards, and Worksheets and dutifully record everything.
(good luck with that, and with keeping up with changes)
Or you could automate the process by processing the Workbooks and teasing out the information about Dashboards, Worksheets, Rows, Columns, Filters, Fields, etc. into data that Tableau can read and then prepare a Tableau workbook that provides the essential information.
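As a rough illustration of the automated route, here is a minimal Java sketch that pulls names out of a workbook's XML. It is not the Tableau Inventory application's code. It assumes that a .twb file is XML and that Worksheets and Dashboards appear as worksheet and dashboard elements carrying a name attribute; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: not part of Tableau or of the Tableau Inventory.
final class TwbInventorySketch {

    /** Returns the "name" attribute of every element with the given tag. */
    static List<String> namesOf(String twbXml, String tag) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(twbXml)));

            // Collect matching elements anywhere in the document.
            NodeList nodes = doc.getElementsByTagName(tag);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element element = (Element) nodes.item(i);
                if (element.hasAttribute("name")) {
                    names.add(element.getAttribute("name"));
                }
            }
            return names;
        } catch (Exception ex) {
            // Wrap parser errors so callers need no checked-exception handling.
            throw new IllegalStateException("Could not parse workbook XML", ex);
        }
    }
}
```

Calling namesOf(xml, "worksheet") and namesOf(xml, "dashboard") on each workbook's text would give the raw lists that could then be written out as data for Tableau to read.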
Or you could use the Tableau Inventory application that I’ve built to do the inventorying for you, and the TableauReportsInventory.twb to see the inventory.
There’s a Tableau Reports Inventory – Sample Workbooks PDF attached to this post with the output of TableauReportsInventory.twb connected to the inventory of the Tableau Sample Workbooks attached to this post. (I can’t attach Tableau Workbooks)
If you think it’s useful I’d sure like to hear about it.
I’m preparing to release the Tableau Inventory as an Open Source project, and welcome anyone who wants to participate.
I can be reached at Chris@Gerrard.net – please put “Tableau Inventory” in the subject line.
Or comment here.
Few v Forrester
I’ve been pondering a kerfuffle that erupted when Stephen Few criticized a blog by Forrester’s Boris Evelson here. Stephen took Boris to task for presenting a list of the desirable characteristics of data visualization products. A melee erupted.
I’ve been thinking about this, and why I’m troubled by the nature of the conversation.
First off, I wholeheartedly agree with Stephen’s overall analysis of the contents of Boris’ Forrester blog in terms of the blog’s value in assessing the data visualization abilities of various products.
I applaud Stephen for pointing out the abrogation of professional responsibility by Boris and Forrester in publishing and promoting analyses which are in the main actively damaging in that they propagate misconceptions about their nominal topic area.
It’s distressing that many people have been critical of Stephen for providing the extremely valuable service of pointing out that the Emperor has no clothes. That many of these criticisms have come from Big-BI vendors is no surprise, as they can be expected to attack anyone who exposes their flaws and points out their shortcomings. It’s disheartening to see the same sentiments parroted by those who are actually suffering from these very same problems with the various products, but this simply attests to the success of the Big-BI vendors, and their promoters, in establishing the framing of the discussion about the BI environment, of which data visualization is (in their view) a small part.
The big fly in the BI soup is that BI has become framed in the media and in the minds of most people solely in terms of very large, complicated, complex, expensive, and difficult to install and get operational data warehouse-based commercial products.
BI has become the fiefdom of large software companies whose motivation is to sell larger and more expensive products.
They are aided in this by organizations like Forrester and Gartner whose revenue is derived from providing analyses of the products in the areas they review. It’s in their interest to collude, even if only through harmonic reinforcement, with the large BI vendors in promulgating the idea that Big-BI products -are- the way BI is done. The larger, more complex, and more expensive the products, the more “value” the analysts’ products (reports, quadrants, capability analyses, etc.) appear to have and the more revenue they can derive from them.
On one level, it’s understandable that Boris’ blog enumerated the feature set that’s been created as the desirable characteristics of commercial BI tools; these are the features that the Big-BI vendors have been promoting, are surfaced as Good Things in their products, and are therefore necessarily going to be prominent in the lists of features provided in Vendor representations, and by the majority of non-expert IT people who are passive market followers.
On another, more meaningful level, and this is where I think Stephen rightly criticizes Boris and Forrester, passing off bad information (and Boris’ list of features really does qualify here) as informed, expert analysis and advice really does come up short of professional standards, and does real harm in that it continues to reinforce harmful ideas that limit the effectiveness of delivering high quality information to people who need to make decisions.
I’ve been working in BI for twenty five years. Early in my career I was lucky enough to work for one of the companies that pioneered the field of BI. Our product was a specialized reporting technology that let us essentially sidestep the entrenched Data Processing environments then controlling things and get information into the heads and minds of business decision makers, often in hours instead of the weeks and months it took the DP shops to bring their big machinery to bear on even simple reporting requests.
BI has become the modern DP. The environment is ruled by the commercial interests of big technology companies.
Data visualization is, in this environment, a small backwater of little interest to the Big-BI-invested parties.
The products that provide high quality visualization capabilities are the early mammals in this world. They are more nimble, agile, and provide tremendous value that Big-BI tools do not.
To sum up, the crux of the larger issue here is whether one considers the delivery of high quality information or the installation of complex, expensive big machinery to be the point of BI. If the former, use the good tools and observe the principle “All BI is Local”; your users and clients will appreciate the value you deliver. If the latter, spend a lot of time and money while delivering little if any information to the business decision makers; your users and clients will be impatient and frustrated.
Better yet, use the best modern tools where they provide their real value, and use the Big-BI tools where they contribute to improving the delivery of information, not impede it.
Design your Bullet Graphs with The Bullet Grapherator
The Bullet Grapherator now has an integrated Designer with which you can design and render your Bullet Graphs in PNG, JPEG, and SVG.
Using the Designer’s Renderings tab, you can also create the Bullet Graphs in a directory of your choosing, which makes them readily available for referencing in your dashboards.
These Bullet Graphs were created during the Designer session shown above:
Unfortunately, WordPress doesn’t allow me to upload the SVG file.
Bullet Graphs may also be saved as templates. These templates are intended to be used when the business data changes—simply:
- load the appropriate template;
- provide the new business data (usually the performance and comparative measures, less frequently the qualitative ranges’ values and the upper and lower limits of the quantitative scale);
- provide the location, name and formats of the files to create;
- generate the Bullet Graphs
How to get it (it’s simple and easy):
The Bullet Grapherator’s site is here.
Download it from here.
Getting Started instructions on using it are here.
Instructions on using The Designer are here.
I’m interested in hearing your observations, reactions, hints, suggestions, rants, comments, or anything else.
The Bullet Grapherator is online.
After noodling around with Bullet Graphs, and being frustrated at the lack of readily available, easy to use, highly effective tools for creating them, I started an Open Source project to create just such a tool.
Announcing the Bullet Grapherator, a Java project for creating dynamic data-driven Bullet Graphs that are faithful to Stephen Few’s design.
As of this posting, the Grapherator is online and fully capable of generating SVG Bullet Graphs. It’s in its first release, and there are plenty of enhancements to make, most obviously in generating other media formats – jpeg, png, PDF, etc.
Here’s the canonical Bullet Graph created by the Grapherator:
Here’s the Java code that created it:
BulletGraphSVG bGraph = new BulletGraphSVG();
bGraph.setLabel("Revenue 2005 YTD", "(U.S. $ in thousands)");
float[] ranges = {200,250};
bGraph.setRanges(ranges);
bGraph.setLimit(300);
bGraph.setPerformance(270);
bGraph.setComparison(265);
bGraph.setScale(6);
String result = bGraph.getSVG();
I invite anyone who’s interested to please take a look and let me know what you think. Suggestions, enhancement requests, criticisms, and all other forms of civil feedback are welcome.
Bullet Graph Design: Scale Geometry and Body Alignment
The previous post explored the geometry of the Bullet Graph Body. This post explores the geometry of the Bullet Graph Scale and the alignment of the Scale’s elements with the Body and the Body’s elements.
See Bullet graph design: nomenclature for the nomenclature used in describing the Graph Body, Text Label, and Quantitative Scale.
This is the example Bullet Graph used in this exercise, copied from Stephen Few’s Bullet Graph Specification, without the Text Label:

For the purposes of this article the Scale’s elements have been rendered somewhat oversized; for example, although the Scale Labels are identified as Arial 6pt text, they are relatively large compared to the graph Body. This makes it easier to illustrate the alignment of the Scale’s elements with each other, and with the Body’s elements. Once the relationships have been rationalized and an API for Bullet Graph creation is developed it will be easy to configure the various dimensions, colors, and other properties of the graph to provide enough flexibility to render high quality presentations in different contexts.

As shown here, the graph’s Body and Scale are abutted, joining seamlessly at their common edges.

The individual Scale marks are laid out evenly, spanning the full length of the Body, with projections on the baseline and terminus of the body to accommodate the multiple requirements for: 1) aligning the midlines of the Ticks and Labels; 2) aligning the leftmost Scale element’s Tick with the Body’s baseline; and 3) aligning the rightmost Scale element’s Tick with the Body’s terminus. These are shown below.

Each point on the Scale is identified by a Tick mark and a text Label. The Tick is used to precisely denote the point on the graph corresponding to the Scale point’s value. As the Tick has a definite width (it’s really a rectangle), it’s essential to be specific in determining how to align it with the Graph Body in order to accurately and precisely denote the correct value. Although seemingly simple, there are subtleties involved, as shown in the examples below.

The leftmost Scale point’s Tick mark is left-aligned with the baseline of the Graph Body. This is different from the other Scale Tick marks, but is a visual convention that works well. As always, the Label is center-aligned with the Tick; this will in almost all cases cause the Label’s bounding box to extend to the left of the baseline of the Body. Handling text has particular considerations; look for a future article on handling the various text elements found in Bullet Graphs.

The Scale’s midpoints’ Tick marks are right-aligned with the corresponding value, scaled to match the length of the Graph Body. The importance of this alignment is revealed when the Tick’s value corresponds to the value of one of the Qualitative Bars, in which case they are right-aligned on the value. The basic principle here is that the leading edge of the element corresponds to the value being visually coded.
The Label is center-aligned to the Tick.

The rightmost Scale point’s Tick mark is right-aligned with the point’s value, scaled to the Graph Body. This is the same convention as for the midpoints.
The Label is center-aligned to the Tick; this will cause the label to extend to the right of the Graph Body, in almost all cases.
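The alignment rules above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the Scale points are evenly spaced starting at the baseline, the Body length and Tick width are passed in by the caller, and the class and method names are hypothetical rather than taken from any existing API.

```java
// Hypothetical helper illustrating the Tick and Label alignment rules.
final class ScaleTickSketch {

    /**
     * Returns the left edge (x) of each Tick for evenly spaced Scale points.
     * The leftmost Tick is left-aligned with the Body's baseline; every other
     * Tick is right-aligned on its value, so its leading edge marks the value.
     */
    static double[] tickLeftEdges(int pointCount, double bodyLength, double tickWidth) {
        if (pointCount < 2) {
            throw new IllegalArgumentException("A Scale needs at least two points");
        }
        double[] leftEdges = new double[pointCount];
        for (int i = 0; i < pointCount; i++) {
            // Position of the Scale point's value along the Body.
            double valueX = bodyLength * i / (pointCount - 1);
            leftEdges[i] = (i == 0) ? 0.0 : valueX - tickWidth;
        }
        return leftEdges;
    }

    /** The Label is center-aligned to the Tick, whatever the Tick's alignment. */
    static double labelCenter(double tickLeftEdge, double tickWidth) {
        return tickLeftEdge + tickWidth / 2.0;
    }
}
```

For example, five Scale points on a Body 200 units long with a Tick 1 unit wide give Tick left edges of 0, 49, 99, 149 and 199; the rightmost Label is then centered at 199.5, which is why it usually extends past the Body's terminus.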
Bullet graph design: Body Geometry
Examining the geometry—the dimensions of and relationships between—the various elements of the body and quantitative scale.
See Bullet graph design: nomenclature for the nomenclature used in describing the Graph Body, Text Label, and Quantitative Scale.

This is a copy of the first Bullet Graph Stephen Few presents in the Bullet Graph specification, without the Text Label.
This study is an exercise in determining the geometry, dimensions, and alignment of the various elements of the Graph in order to assist the creation of a detailed design specification for other bullet graphs that conform to this example and can be created programmatically.
For the purposes of this article (describing the geometry and relative alignments of the graph’s components) the Scale’s elements have been rendered somewhat oversized.

This diagram has exploded the different bars of the graph to reveal their layering. (Well, actually, it slipped the bars down, but “exploded” is a much more dynamic term.)
The top-down ordering shown matters in that each bar hides everything beneath it in the final rendering, making it much easier to build the final image.
The * indicates that there may be more than one middle bar, i.e. when there are more than three Qualitative Ranges.
In this diagram the Comparative Measure is placed on top of the Performance Measure; the specification isn’t clear on this, and as long as both are colored 100% black and have these dimensions and positions it seems not to matter. (While this may not matter in normal circumstances, it may become significant in some edge cases. Look for further studies on this and related topics.)
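The layering can be sketched as follows. SVG paints elements in document order, so emitting the bars bottom-first gives the stacking described here. This is a minimal sketch, not the Grapherator's code: the class names are hypothetical, and the geometry and fill colors in exampleStack are placeholders chosen only to show the order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the bar layering; only the ordering matters here.
final class BulletLayersSketch {

    /** One bar of the graph: an id for readability plus its rectangle. */
    static final class Bar {
        final String id;
        final String fill;
        final double x, y, width, height;

        Bar(String id, double x, double y, double width, double height, String fill) {
            this.id = id;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
            this.fill = fill;
        }

        String toSvg() {
            return "<rect id=\"" + id + "\" x=\"" + x + "\" y=\"" + y
                    + "\" width=\"" + width + "\" height=\"" + height
                    + "\" fill=\"" + fill + "\"/>";
        }
    }

    /** Emits the bars in list order; later bars are painted over earlier ones. */
    static String render(List<Bar> bottomToTop) {
        StringBuilder svg = new StringBuilder("<svg xmlns=\"http://www.w3.org/2000/svg\">\n");
        for (Bar bar : bottomToTop) {
            svg.append("  ").append(bar.toSvg()).append('\n');
        }
        return svg.append("</svg>\n").toString();
    }

    /** Placeholder stack: qualitative bars first, then the two measures. */
    static List<Bar> exampleStack() {
        List<Bar> bars = new ArrayList<>();
        bars.add(new Bar("range-bottom", 0, 0, 200, 12, "#dddddd")); // widest, drawn first
        bars.add(new Bar("range-middle", 0, 0, 167, 12, "#bbbbbb")); // may repeat
        bars.add(new Bar("range-top", 0, 0, 133, 12, "#999999"));
        bars.add(new Bar("performance", 0, 4, 180, 4, "#000000"));
        bars.add(new Bar("comparative", 175, 1, 2, 10, "#000000")); // drawn last, on top
        return bars;
    }
}
```

Keeping the order in a plain list makes it easy to add more middle bars, or to swap the two measures when studying the edge cases mentioned above.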

As illustrated in this diagram the baseline dimensions of the graph have been normalized to 200 units wide and 12 units high. (Inkscape assumes a unit of pixels if none is specified, and this works well for our purposes.)
The dimensions of 200×12 provide a good approximation of the dimensions of the Canonical Bullet Graph, and have the advantage of providing easy-to-work-with integer values for most of our calculations employed in programmatically constructing bullet graphs. The 200 units width is easy to scale for both absolute and percentage scales, and the 12 units height makes it easy to center the Comparative and Performance measures along the horizontal axis for a variety of the possible values that their heights may end up being (12 is easily divided into 1/6s, 1/4s, 1/3s, 1/2).
It’s assumed that the bottom Qualitative Range Bar will span the full 200 units width, and that the other graph elements will be scaled proportionally to it.

The Performance Measure bar is defined to be one-third of the height of the bar group, or 4 units. When rendered on top of the bar group this results in the Performance Measure occupying the center third of the bar group’s height, which provides for good visual discrimination between the elements.
The Performance Measure is left-aligned with the graph bar group and its width is proportional to its fraction of the maximum scale value of the graph, normalized to the 200 units of the width of the graph bar group.
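As a worked example of this calculation, the following Java sketch computes the Performance Measure's rectangle from its value and the maximum scale value. It assumes the scale starts at zero and uses the normalized 200×12 bar group; the class and method names are hypothetical.

```java
// Hypothetical helper: geometry of the Performance Measure bar.
final class PerformanceMeasureSketch {

    static final double GROUP_WIDTH = 200.0;  // normalized bar group width
    static final double GROUP_HEIGHT = 12.0;  // normalized bar group height

    /** Returns {x, y, width, height} for the Performance Measure. */
    static double[] rect(double value, double scaleMax) {
        double height = GROUP_HEIGHT / 3.0;            // one-third of the group: 4 units
        double y = (GROUP_HEIGHT - height) / 2.0;      // centered vertically: 4 units down
        double width = value * GROUP_WIDTH / scaleMax; // proportional to the scale maximum
        return new double[] {0.0, y, width, height};   // left-aligned at x = 0
    }
}
```

With the canonical graph's values (performance 270, limit 300) this gives a bar 180 units wide and 4 units high, starting at x = 0 and y = 4.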

The Comparative Measure is defined to be a rectangle 2 units wide and 10 units high.
When rendered horizontally aligned atop the graph’s Qualitative Range bars there is a 1 unit gap above and below the Comparative Measure, aiding the viewer’s visual location of its position. When rendered atop the Performance Measure the Comparative Measure extends 3/4 of the way across the visual gap between the outer edge of the PM and the edge of the bar group.
The CM’s y coordinate is calculated to achieve its horizontal alignment with the graph bar group and the Performance Measure.
The CM’s x coordinate is calculated to be proportionate to its fraction of the maximum scale value of the graph, normalized to the 200 units of the width of the graph bar group, minus the 2 units of the CM’s width. This ensures that the leading edge of the CM is exactly geometrically aligned with its value.
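The same arithmetic can be written out as a short Java sketch. It assumes a scale starting at zero and the normalized 200×12 bar group; the class and method names are hypothetical and are not part of the Grapherator's API.

```java
// Hypothetical helper: geometry of the Comparative Measure marker.
final class ComparativeMeasureSketch {

    static final double GROUP_WIDTH = 200.0;  // normalized bar group width
    static final double GROUP_HEIGHT = 12.0;  // normalized bar group height
    static final double CM_WIDTH = 2.0;
    static final double CM_HEIGHT = 10.0;

    /** Returns {x, y, width, height} for the Comparative Measure. */
    static double[] rect(double value, double scaleMax) {
        // Centered on the bar group: a 1 unit gap remains above and below.
        double y = (GROUP_HEIGHT - CM_HEIGHT) / 2.0;
        // Shift left by the CM's own width so its leading edge sits on the value.
        double x = value * GROUP_WIDTH / scaleMax - CM_WIDTH;
        return new double[] {x, y, CM_WIDTH, CM_HEIGHT};
    }
}
```

For the canonical graph (comparison 265, limit 300) the value falls at about 176.67 units, so the CM starts at about 174.67 and its leading edge lands exactly on the value.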

