Great article on "Search" from back in '08 in Forbes: "Why
Google Isn't Enough", by Dan Woods. He's talking about "Enterprise
Search": why in-house Google-style search is really hard and often
Here's the cool quote.
search systems also index and navigate information that may reside in
databases, content management systems and other structured or
semi-structured repositories. The contents may include not only text
documents, but also spreadsheets, presentations, XML documents and so
on. Even text documents may include some amount of structure, perhaps
stored in an XML format.
Everyone thinks (hopes)
that the mere presence of data is sufficient. The fact that it's
structured doesn't seem to influence their hopes.
complication is simple -- and harsh. Many enterprise databases are
really bad. Really, really epically bad. So bad as to be
incomprehensible to a search engine.
many spreadsheets or reports "stand alone" as tidy, complete, usable
create a budget for a project. It seems clear enough. Then the
project director wants to know if the labor costs are "burdened or
unburdened". So the column labeled "cost" has to be further qualified.
And "burdened" costs need to be detailed as to which -- exact --
overheads are included.
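
To make the "burdened or unburdened" point concrete, here's a minimal sketch of a cost that carries its own qualification. The class name, the overhead names and the rates are all made up for illustration; nothing here comes from a real budget.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class LaborCost:
    """A labor cost that states what the number means (hypothetical model)."""

    base: float  # unburdened cost
    # Overhead name -> rate as a fraction of base. Listing each overhead
    # by name answers "which -- exact -- overheads are included?"
    overhead_rates: Dict[str, float] = field(default_factory=dict)

    @property
    def burdened(self) -> float:
        """Base cost plus every named overhead."""
        return self.base * (1 + sum(self.overhead_rates.values()))


# Illustrative numbers only.
cost = LaborCost(base=1000.0, overhead_rates={"benefits": 0.25, "facilities": 0.25})
print(cost.base, cost.burdened, sorted(cost.overhead_rates))
```

A bare column labeled "cost" holds one of these two numbers and doesn't say which. That's the part neither a reader nor a search engine can recover.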
So a search engine
might find your spreadsheet. If a person can't interpret the data,
neither can a search engine.
You can build a clever star schema
from source data. But what you find is that your sources have nuanced
definitions. Each field isn't directly mappable because it includes one
or more subtleties.
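
Here's a rough sketch of what "not directly mappable" looks like in practice. The source system names ("orders", "invoices"), the column name and the lookup table are assumptions invented for this example; the point is only that the qualifier lives outside the data and has to be written down by a person.

```python
from enum import Enum
from typing import Dict, NamedTuple, Tuple


class AddressKind(Enum):
    MAILING = "mailing"
    SHIPPING = "shipping"
    BILLING = "billing"


class QualifiedAddress(NamedTuple):
    kind: AddressKind
    text: str


# Hypothetical sources: each one names its column "address",
# but each means something different by it.
ADDRESS_MEANING: Dict[Tuple[str, str], AddressKind] = {
    ("orders", "address"): AddressKind.SHIPPING,
    ("invoices", "address"): AddressKind.BILLING,
}


def qualify_address(source: str, column: str, text: str) -> QualifiedAddress:
    """Attach the documented meaning to a raw value, or refuse to guess."""
    key = (source, column)
    if key not in ADDRESS_MEANING:
        raise ValueError(f"no documented meaning for {source}.{column}")
    return QualifiedAddress(ADDRESS_MEANING[key], text)


print(qualify_address("orders", "address", "12 Main St").kind.value)
```

A source with no entry in the table raises an error instead of being loaded as a generic "address". Guessing is how the subtleties get lost.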
Customer name and address.
Seems simple enough. But... is that mailing address or shipping
address or billing address? Phone number. Seems simple. Fax, Voice,
Mobile, Land-line, corporate switch-board, direct? Sigh. So much
Of course the users "just want search".
Sadly, they've created data so subtle and
nuanced that they can't have search.