<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Social Glass &#187; data</title>
	<atom:link href="http://www.socialglass.com/tags/data/feed" rel="self" type="application/rss+xml" />
	<link>http://www.socialglass.com</link>
	<description>All Things Relevant to a Technologist</description>
	<lastBuildDate>Tue, 25 May 2010 16:22:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Speaking at the Business of APIs Conference</title>
		<link>http://www.socialglass.com/speaking-at-the-business-of-apis-conference</link>
		<comments>http://www.socialglass.com/speaking-at-the-business-of-apis-conference#comments</comments>
		<pubDate>Fri, 13 Nov 2009 05:46:59 +0000</pubDate>
		<dc:creator>Jeremy Thomas</dc:creator>
				<category><![CDATA[active.com]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[web 2.0]]></category>

		<guid isPermaLink="false">http://www.socialglass.com/speaking-at-the-business-of-apis-conference</guid>
		<description><![CDATA[I&#8217;m happy to announce that I&#8217;m one of the featured speakers at the Business of APIs Conference in NYC on 16 November.  I&#8217;ve been leading the charge to open our data at Active.com, and we&#8217;ve started a slow rollout of our API.  I&#8217;ll be talking about the journey we&#8217;ve taken to get to where we [...]]]></description>
			<content:encoded><![CDATA[<p><img align="left" title="logo_apiconference.png" id="image261" alt="logo_apiconference.png" src="http://www.socialglass.com/wp-content/uploads/2009/11/logo_apiconference.png" />I&#8217;m happy to announce that I&#8217;m one of the featured speakers at the <a href="http://apiconference.com" onclick="pageTracker._trackPageview('/outgoing/apiconference.com?referer=');">Business of APIs Conference</a> in NYC on 16 November.  I&#8217;ve been leading the charge to open our data at Active.com, and we&#8217;ve started a <a href="http://developer.active.com/docs" onclick="pageTracker._trackPageview('/outgoing/developer.active.com/docs?referer=');">slow rollout of our API</a>.  I&#8217;ll be talking about the journey we&#8217;ve taken to get to where we are today with our API.  We&#8217;ve still got a long way to go.</p>
<p>If you&#8217;re in NYC on Monday and are interested in APIs, come by and check it out!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.socialglass.com/speaking-at-the-business-of-apis-conference/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data. Data. Data.</title>
		<link>http://www.socialglass.com/data-data-data</link>
		<comments>http://www.socialglass.com/data-data-data#comments</comments>
		<pubDate>Tue, 30 Dec 2008 05:24:13 +0000</pubDate>
		<dc:creator>Jeremy Thomas</dc:creator>
				<category><![CDATA[data]]></category>
		<category><![CDATA[knowledge management]]></category>
		<category><![CDATA[web 2.0]]></category>

		<guid isPermaLink="false">http://www.socialglass.com/archives/242</guid>
		<description><![CDATA[Something I learned while working with the Information Management group at BearingPoint down in Australia continues to resonate for me at my &#8220;Web 2.0-ish&#8221; job in San Diego, CA.  Data integrity is king but is bloody hard to maintain.  Consider a datawarehouse, where information about information is stored, often for reporting purposes.  Datawarehouses can be [...]]]></description>
			<content:encoded><![CDATA[<p>Something I learned while working with the <a href="http://openmethodology.org" onclick="pageTracker._trackPageview('/outgoing/openmethodology.org?referer=');">Information Management</a> group at BearingPoint down in Australia continues to resonate for me at my &#8220;Web 2.0-ish&#8221; job in San Diego, CA.  Data integrity is king but is bloody hard to maintain.  Consider a datawarehouse, where information about information is stored, often for reporting purposes.  Datawarehouses can be used to answer the question &#8220;how many customers do I have?&#8221;, or more specifically, &#8220;how many <em>residential</em> customers do I have?&#8221;.  Seems simple enough.</p>
<p>But data, dare I say &#8220;truth&#8221;, is federated.  And each member of the federation has its own vernacular.</p>
<p>For example, the residential loan processing system might call a customer a &#8220;customer&#8221;, while the commercial loan processing system calls a customer a &#8220;client&#8221;.  At the core these are the same entities, with &#8220;residential&#8221; or &#8220;commercial&#8221; being a modifier (as an adjective is to a noun).  So a datawarehousing solution would apply its central vernacular to these entities, allowing the question &#8220;how many customers do I have?&#8221; to be answered even though the answer is informed by two sources of truth.</p>
<p><img align="left" title="yelp-categories.gif" id="image243" alt="yelp-categories.gif" src="http://www.socialglass.com/wp-content/uploads/2008/12/yelp-categories.gif" />Data transformation and categorization work moderately well when an organization has control over its data sources (and has, therefore, a limited number of vernaculars).  But consider the La Jolla, CA, page on Yelp, <a href="http://www.yelp.com/la-jolla-ca" onclick="pageTracker._trackPageview('/outgoing/www.yelp.com/la-jolla-ca?referer=');">http://www.yelp.com/la-jolla-ca</a>, which claims that La Jolla has 1028 restaurants worth reviewing.  Most of this data is user-submitted.  And how does a user classify Starbucks?  &#8220;Food&#8221;?  &#8220;Restaurants&#8221;?  And what about subcategories?  &#8220;Coffee and Tea&#8221;? &#8220;Desserts&#8221;?  Some users might choose to use some of these categories, while others might use all.  And it&#8217;s consistency that lies at the heart of the issue of maintaining data integrity.  A user should have access to all restaurants when browsing by &#8220;Restaurants&#8221;.<br />
If information is consistently categorized, even incorrectly, we can get accurate answers to our queries.  But if it&#8217;s inconsistently categorized, our answers will not be comprehensive.</p>
<p>So how, then, do websites like yelp.com deliver meaningful, consistently categorized results when they&#8217;re reliant on <a href="http://en.wikipedia.org/wiki/Crowdsourcing" onclick="pageTracker._trackPageview('/outgoing/en.wikipedia.org/wiki/Crowdsourcing?referer=');">crowdsourcing</a>?  Are there really only 1028 review-worthy restaurants in La Jolla? And what of those restaurants that are mistakenly subcategorized as &#8220;Turkish&#8221; when they&#8217;re actually &#8220;Lebanese&#8221;?</p>
<p>Manual labor is the answer.</p>
<p>I suspect sites like Yelp.com leverage services like <a href="https://www.mturk.com/mturk/welcome" onclick="pageTracker._trackPageview('/outgoing/www.mturk.com/mturk/welcome?referer=');">Mechanical Turk</a> to comb through the thousands of user-submitted records and apply a more uniform categorization scheme.  And this is why data integrity is bloody hard to maintain: there is so much manual labor involved.  I question the sustainability of such a model, especially as a site grows and gathers more data.</p>
<p>But what I can say is that it is more important for data to be correctly categorized than for it to be mostly correctly categorized.  If users on Yelp search for &#8220;Automotive&#8221; assets and are shown beauty salons, they will leave. Data integrity is king.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.socialglass.com/data-data-data/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
