Data. Data. Data.
December 29th, 2008 by Jeremy Thomas
Something I learned while working with the Information Management group at BearingPoint down in Australia continues to resonate with me at my “Web 2.0-ish” job in San Diego, CA: data integrity is king, but it is bloody hard to maintain. Consider a data warehouse, where information from many source systems is consolidated, often for reporting purposes. Data warehouses can be used to answer the question “how many customers do I have?”, or more specifically, “how many residential customers do I have?”. Seems simple enough.
But data, dare I say “truth”, is federated. And each member of the federation has its own vernacular.
For example, the residential loan processing system might call a customer a “customer”, while the commercial loan processing system calls a customer a “client”. At the core these are the same entities, with “residential” or “commercial” being a modifier (as an adjective is to a noun). So a data warehousing solution would apply its central vernacular to these entities, allowing the question “how many customers do I have?” to be answered even though the answer is informed by two sources of truth.
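To make the idea of a central vernacular concrete, here is a minimal Python sketch. The source system names, the mapping table `CENTRAL_VERNACULAR` and the `conform` helper are all hypothetical; a real warehouse would do this in its ETL layer and would also need to de-duplicate customers who appear in both systems, which this sketch ignores.

```python
from collections import Counter

# Hypothetical rows from two source systems, each with its own vernacular.
residential_rows = [{"entity": "customer", "id": "R-001"}, {"entity": "customer", "id": "R-002"}]
commercial_rows = [{"entity": "client", "id": "C-001"}]

# Hypothetical central vernacular: (source system, local term) maps to a
# canonical entity plus a modifier, the way an adjective modifies a noun.
CENTRAL_VERNACULAR = {
    ("residential_loans", "customer"): ("customer", "residential"),
    ("commercial_loans", "client"): ("customer", "commercial"),
}

def conform(source, rows):
    """Translate source rows into the warehouse's central vernacular."""
    conformed = []
    for row in rows:
        entity, segment = CENTRAL_VERNACULAR[(source, row["entity"])]
        conformed.append({"entity": entity, "segment": segment, "source_id": row["id"]})
    return conformed

warehouse = conform("residential_loans", residential_rows) + conform("commercial_loans", commercial_rows)

# "How many customers do I have?" and "how many residential customers do I have?"
total_customers = sum(1 for r in warehouse if r["entity"] == "customer")
by_segment = Counter(r["segment"] for r in warehouse if r["entity"] == "customer")
print(total_customers, by_segment["residential"])  # 3 2
```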
Data transformation and categorization work moderately well when an organization has control over its data sources (and has, therefore, a limited number of vernaculars). But consider the La Jolla, CA, page on Yelp, http://www.yelp.com/la-jolla-ca, which claims that La Jolla has 1028 restaurants worth reviewing. Most of this data is user-submitted. And how does a user classify Starbucks? “Food”? “Restaurants”? And what about subcategories? “Coffee and Tea”? “Desserts”? Some users might choose some of these categories, while others might choose all of them. And it’s consistency that lies at the heart of the issue of maintaining data integrity. A user should have access to all restaurants when browsing by “Restaurants”.
If information is consistently categorized, even incorrectly, we can get accurate answers to our queries. But if it’s inconsistently categorized our answers will not be comprehensive.
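A small Python sketch of why consistency matters, assuming a hypothetical set of listings and a hypothetical one-level category hierarchy (`PARENT`): browsing by exact category misses listings whose users never picked “Restaurants”, while applying one uniform scheme to every listing brings them back.

```python
# Hypothetical user-submitted listings: each user picked categories differently.
listings = {
    "Starbucks": {"Coffee and Tea"},
    "Taco Shop": {"Restaurants", "Mexican"},
    "Gelato Bar": {"Food", "Desserts"},
    "Bistro": {"Restaurants"},
}

# Hypothetical uniform scheme: each subcategory rolls up to one parent.
PARENT = {"Coffee and Tea": "Restaurants", "Mexican": "Restaurants", "Desserts": "Restaurants"}

def browse(category, listings):
    """Return listings tagged with exactly this category, sorted by name."""
    return sorted(name for name, cats in listings.items() if category in cats)

def normalize(cats):
    """Apply the uniform scheme by adding each category's parent (one level only)."""
    return cats | {PARENT[c] for c in cats if c in PARENT}

naive = browse("Restaurants", listings)
consistent = browse("Restaurants", {name: normalize(cats) for name, cats in listings.items()})
print(naive)       # ['Bistro', 'Taco Shop']
print(consistent)  # ['Bistro', 'Gelato Bar', 'Starbucks', 'Taco Shop']
```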
So how, then, do websites like yelp.com deliver meaningful, consistently categorized results when they’re reliant on crowdsourcing? Are there really only 1028 review-worthy restaurants in La Jolla? And what of those restaurants that are mistakenly subcategorized as “Turkish” when they’re actually “Lebanese”?
Manual labor is the answer.
I suspect sites like Yelp.com leverage services like Amazon Mechanical Turk to comb through thousands of user-submitted records and apply a more uniform categorization scheme. This is why data integrity is bloody hard to maintain: there is so much manual labor involved. I question the sustainability of such a model, especially as a site grows and gathers more data.
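I don't know how Yelp actually does this. One common pattern for crowdsourced relabeling, assumed here purely for illustration, is to have several workers label each record and keep a label only when a clear majority agrees. The `majority_label` helper, its threshold and the sample labels are hypothetical.

```python
from collections import Counter

def majority_label(labels, min_agreement=0.5):
    """Return the most common label if its vote share exceeds min_agreement, else None."""
    if not labels:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes / len(labels) > min_agreement else None

# Hypothetical labels from three and two workers respectively.
worker_labels = {
    "Restaurant A": ["Turkish", "Turkish", "Lebanese"],
    "Restaurant B": ["Lebanese", "Turkish"],
}
resolved = {name: majority_label(labels) for name, labels in worker_labels.items()}
print(resolved)  # {'Restaurant A': 'Turkish', 'Restaurant B': None}
```

Records that come back as `None` still need a human decision, and every record now costs several workers' time, which is the sustainability problem in a nutshell.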
But what I can say is that it is more important for data to be correctly categorized than for it to be mostly correctly categorized. If Yelp users search for “Automotive” listings and are shown beauty salons, they will leave. Data integrity is king.
newthinking.bearingpoint.com
September 8th, 2008 by Jeremy Thomas
My former employer, BearingPoint, has recently launched newthinking.bearingpoint.com, a WordPress-powered blog seemingly open to all employees. This is a bold move, as consulting companies typically guard their intellectual property with an iron fist. But BearingPoint has been a leader when it comes to transparency. MIKE2, BearingPoint’s information management methodology, launched in 2005 and is “open source”, meaning it’s free for all to consume and contribute to, even competitors. The value of doing this is that BearingPoint capitalizes on the IM market, taking business from rivals who would otherwise charge for the information that is free on MIKE2. And while the methodology is open, IM methodologies are complex to implement, so clients will be quick to select BearingPoint as their implementation vendor.
Kudos to Nate and Jay, who must have played a huge role in getting this new blog rolled out. And check out this post from my buddy Sean (who’s getting married next month). Sean is an up-and-coming Enterprise 2.0 star at BearingPoint. I’m glad to see the new school is starting to have an impact on an otherwise traditional organization.
Update: It looks like Paul Dunay, Global Director of Integrated Marketing at BearingPoint, is the man responsible for newthinking.bearingpoint.com.
E2.0 Stagnation
June 23rd, 2008 by Jeremy Thomas
We seem to have done a good job of defining the enterprise knowledge management problem and how Enterprise 2.0 wants to fix it. Knowledge is locked in people’s PCs and file shares, is hard to find, and is underutilized. Not only that, corporations fail to efficiently tap into their human resources and facilitate the creation of weak ties between employees. I think everybody gets it now.
So why is Andrew McAfee still talking about why email sucks? Haven’t we heard this story time and time again? Why don’t we talk more about how Enterprise 2.0 has helped companies, about how it’s had the dramatic impact that we predicted two years ago? Maybe it’s because it’s not happening, or maybe it’s because the doers are quietly doing and have no time to blog about it.
With that, I’m super stoked about TechCrunch’s new enterprise software-focused blog, TechCrunchIT. TechCrunch has been the de facto leader in all things Web 2.0. Maybe they’ll bring some fresh thinking to the Enterprise 2.0 space.
Why Enterprise Search Could be so Much More than Search
November 22nd, 2007 by Jeremy Thomas

Enterprise Search (the first “S” in SLATES) has long been heralded as the mechanism companies can use as a gateway to discover knowledge assets buried across the organization. I’ve discussed this topic a few times on this blog. Most enterprise search solutions integrate (at the API level) with line-of-business and reporting systems, meaning users can benefit from these systems without having to actually access them (try searching for “GOOG” on google.com to see how this works). Users who may not have known these systems existed now benefit from them.
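A rough Python sketch of that federated pattern: one query fans out to a connector for each back-end system and the results are merged. The connectors and their in-memory data are hypothetical stand-ins; commercial products expose their own connector APIs, which are not shown here.

```python
from typing import Callable, Dict, List

# A connector takes a query string and returns result records (hypothetical interface).
Connector = Callable[[str], List[Dict[str, str]]]

def file_share_connector(query: str) -> List[Dict[str, str]]:
    # Hypothetical documents on a file share: title -> body text.
    docs = {"Q3 sales review.doc": "sales forecast for Q3", "Onboarding.ppt": "new hire onboarding"}
    return [{"source": "file share", "title": t} for t, body in docs.items() if query.lower() in body.lower()]

def crm_connector(query: str) -> List[Dict[str, str]]:
    # Hypothetical line-of-business (CRM) records.
    accounts = ["Acme sales account", "Globex support account"]
    return [{"source": "CRM", "title": a} for a in accounts if query.lower() in a.lower()]

def federated_search(query: str, connectors: List[Connector]) -> List[Dict[str, str]]:
    """Send the query to every connector and merge the results in connector order."""
    results: List[Dict[str, str]] = []
    for connector in connectors:
        results.extend(connector(query))
    return results

hits = federated_search("sales", [file_share_connector, crm_connector])
for hit in hits:
    print(hit["source"], "->", hit["title"])
```

The user never opens the CRM, yet its records show up next to documents, which is the benefit described above.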
But what of the other useful statistics enterprise search solutions can offer? Below I cover a few ways in which search solutions can enrich the Enterprise 2.0 and knowledge management experience.
Trends
What’s hot? What are people within the organization interested in? What documents are viewed the most? These types of statistics help showcase the collective intelligence of the enterprise and provide valuable insight into what information assets are deemed valuable, or at least interesting, by the knowledge worker base. I can see an enterprise-ready application like Google Trends (see graph above) being used to analyze and provide this kind of business intelligence.
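Much of this could be derived from logs a search engine already keeps. A minimal Python sketch, assuming a hypothetical query log of (week, query) pairs and a hypothetical list of document views: `trend` produces a Google Trends-style weekly series for a term, and `Counter.most_common` ranks the most viewed documents.

```python
from collections import Counter, defaultdict

# Hypothetical search log: (ISO week, query) pairs captured by the search engine.
query_log = [
    ("2007-W44", "web 2.0"), ("2007-W44", "metadata management"),
    ("2007-W45", "web 2.0"), ("2007-W45", "web 2.0"),
    ("2007-W46", "web 2.0"), ("2007-W46", "web 2.0"), ("2007-W46", "web 2.0"),
]
# Hypothetical document-view log: one entry per view.
view_log = ["Wiki pilot.ppt", "Wiki pilot.ppt", "Metadata strategy.doc"]

def trend(term, log):
    """Count how often `term` was searched each week (weeks with no searches are omitted)."""
    weekly = defaultdict(int)
    for week, query in log:
        if query == term:
            weekly[week] += 1
    return dict(sorted(weekly.items()))

print(trend("web 2.0", query_log))  # {'2007-W44': 1, '2007-W45': 2, '2007-W46': 3}
print(Counter(view_log).most_common(1))  # [('Wiki pilot.ppt', 2)]
```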
Correlation to Taxonomies
I discussed automated content tagging a few months ago, and search engines are certainly optimized to do this. They associate keywords (tags) with documents, and this inherently creates relationships between documents. So, from a knowledge management perspective, I can see tremendous value in enterprise search solutions providing business intelligence on the richness of information assets related to a corporate taxonomy (with each element in the taxonomy treated as a keyword by the search engine). With such a solution, an organization could automatically determine, for example, that its Information Management group has 1,745 knowledge assets across 7 business units and 6 countries pertaining to “metadata management”.
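Assuming the index exposes each asset's keywords along with hypothetical `unit` and `country` attributes, the coverage summary reduces to a filter and a few distinct counts. The sample index and the `taxonomy_coverage` helper are illustrative only.

```python
# Hypothetical search index entries: keywords plus business unit and country metadata.
index = [
    {"id": 1, "keywords": {"metadata management", "data quality"}, "unit": "Financial Services", "country": "Australia"},
    {"id": 2, "keywords": {"metadata management"}, "unit": "Public Sector", "country": "USA"},
    {"id": 3, "keywords": {"web 2.0"}, "unit": "Public Sector", "country": "USA"},
    {"id": 4, "keywords": {"metadata management"}, "unit": "Public Sector", "country": "Germany"},
]

def taxonomy_coverage(term, index):
    """Count assets tagged with a taxonomy term and the distinct units and countries they span."""
    hits = [doc for doc in index if term in doc["keywords"]]
    return {
        "assets": len(hits),
        "units": len({doc["unit"] for doc in hits}),
        "countries": len({doc["country"] for doc in hits}),
    }

print(taxonomy_coverage("metadata management", index))
# {'assets': 3, 'units': 2, 'countries': 3}
```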
Information Asset Age
Enterprise search solutions also store information about when a document was created or last modified. Combined with the correlation capability, this can be valuable information for a knowledge manager. For example, if an inquiry into information assets related to “web 2.0” revealed that 75% of those assets were more than one year old, he’d know the organization’s knowledge about web 2.0 needs updating.
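The age check can be sketched the same way in Python. The `share_older_than` helper, the one-year cutoff and the sample dates are assumptions for illustration; a real solution would read last-modified dates from the search index.

```python
from datetime import date, timedelta

def share_older_than(modified_dates, today, days=365):
    """Fraction of assets last modified more than `days` days before `today`."""
    if not modified_dates:
        return 0.0
    cutoff = today - timedelta(days=days)
    stale = sum(1 for d in modified_dates if d < cutoff)
    return stale / len(modified_dates)

# Hypothetical last-modified dates for assets matching "web 2.0".
web20_assets = [date(2005, 3, 1), date(2006, 1, 15), date(2006, 6, 30), date(2007, 10, 2)]
print(share_older_than(web20_assets, today=date(2007, 11, 22)))  # 0.75
```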
Conclusion
Enterprise search vendors should take a serious look at packaging business intelligence capabilities like these into their solutions. Search engines have a wealth of information not only about information assets but also about user patterns. Why not expose this information?