According to the WIPO classification scheme for goods and services, all of human endeavour can be grouped into just 45 classes! This Nice Classification (NCL) scheme is a very blunt tool for organizing data.
The Lewis project team indexed all of the goods and services data for each trademark into a Big Data platform (Hadoop, HBase, Mahout, Solr) that executed machine learning algorithms to extract key concepts and statistically important phrases from the raw data submitted, and related them to one another.
Lewis shows that machines can pull connections out of our existing data, in this case by using a k-means algorithm.
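To give a feel for what k-means does with text, here is a minimal sketch in plain Python. It is not the Lewis implementation, which ran on Mahout over the full trademark data set. The sample descriptions, the function names (`vectorize`, `kmeans`), the simple term-count vectors, and the farthest-point seeding are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical goods-and-services descriptions, invented for this sketch.
descriptions = [
    "kentucky straight bourbon whiskey",
    "bourbon whiskey aged in kentucky",
    "single malt scotch whisky",
    "blended scotch whisky",
    "irish whiskey triple distilled",
    "irish whiskey pot still",
]


def vectorize(docs):
    """Turn each description into a unit-length term-count vector."""
    vocab = sorted({term for doc in docs for term in doc.lower().split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vec = [float(counts[term]) for term in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Cluster vectors into k groups and return one label per vector."""
    # Deterministic seeding (an assumption of this sketch): start with the
    # first vector, then repeatedly add the vector farthest from all seeds.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )

    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


labels = kmeans(vectorize(descriptions), k=3)
for label, text in sorted(zip(labels, descriptions)):
    print(label, text)
```

On this toy input the descriptions that share vocabulary end up with the same label, which is the same effect, at a much smaller scale, as the groupings described here. A production pipeline would typically weight terms (for example with TF-IDF) and choose the number of clusters more carefully.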
One of the most useful visualizations was for the concept "whiskey". Whiskey, unlike many goods, has very specific categorizations that people readily embrace. For example, Bourbon is closely tied to Kentucky, and no one will confuse a Scotch with a Canadian whiskey. Pulling from the raw data, we are able to see that whiskey falls into 9 main groupings: Bourbon, Scotch, Blended, Irish, Canadian, Corn Whiskey, Whiskey Gin, Straight Whiskey, and whiskey-based liqueurs.
If you look at the brands, they make sense as well:
These are great classifications, and yet according to WIPO, all whiskeys are the same!