Tidy Text Mining Beer Reviews

0x04 Tidy Text Mining Beer Reviews

What better article to feature in a bots and beer newsletter than one that actually has to do with artificial intelligence (okay, text mining) and craft beer? We decided to give this article a little love, and let it be the only article featured. Bill and I will both give our thoughts, and some beer picks. Enjoy! --Michael Szul

I'm leaving tomorrow for the Microsoft MVP Summit! Hopefully I'll see some of you there. It's all under NDA, so I won't be able to share anything, but it should give me a good idea of the roadmap for chatbots and machine learning moving forward--at least from Microsoft's perspective. --MS

The taxonomy of beer tasting is far more complex than I ever could have imagined. What surprises me the most is that BeerAdvocate.com neither supports open data nor offers some kind of API of its own. Allowing this layer of interaction between its audience of beer aficionados and technology seems like nothing but a positive. --Bill Ahern

BeerAdvocate is a great site, but as Bill mentioned, not having an API is a big negative, in my opinion. In addition, the fact that they made Pavlik take down the data set she put together makes me not want to use the site much anymore. I understand their need to protect their intellectual property, but most of these reviews are user-submitted. They have an opportunity to develop a great craft beer data set, but would rather keep their site (and all their data) closed.

In any event, Pavlik has a nice write-up that uses term frequency analysis to show which terms users reach for most when describing certain types of beer. It seems that blueberry is the most popular term (and flavor) for a fruit beer, while American Pale Ales have a distinct grapefruit taste.
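For anyone who wants to play with the idea, here is a minimal Python sketch of per-style term frequency. It is not Pavlik's code (her analysis is in R), and the reviews, the stop-word list, and the function names are all invented for illustration.

```python
import re
from collections import Counter, defaultdict

# Invented toy reviews for illustration only; NOT real BeerAdvocate data.
REVIEWS = [
    ("Fruit Beer", "Sweet blueberry aroma, blueberry up front, light body."),
    ("Fruit Beer", "Tart fruit with blueberry and a hint of wheat."),
    ("American Pale Ale", "Grapefruit and pine hops, crisp grapefruit finish."),
    ("American Pale Ale", "Citrus, grapefruit peel, light malt."),
]

# A tiny, hand-picked stop-word list (an assumption; real lists are far longer).
STOP_WORDS = {"a", "and", "of", "the", "with", "up", "front"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def term_frequencies(reviews):
    """Count how often each term appears across all reviews of each style."""
    counts = defaultdict(Counter)
    for style, text in reviews:
        counts[style].update(tokenize(text))
    return counts


def top_terms(counts, style, n=3):
    """Return the n most common terms for a style with their relative frequency."""
    total = sum(counts[style].values())
    return [(term, count / total) for term, count in counts[style].most_common(n)]


counts = term_frequencies(REVIEWS)
for style in counts:
    print(style, top_terms(counts, style, n=1))
```

With this toy data, "blueberry" tops the fruit beers and "grapefruit" tops the pale ales, but only because the invented reviews were written that way.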

Using these term frequencies, Pavlik then builds a nice correlation diagram to show which beers are most similar to each other in the opinions (reviews) of the site's users. She then does some nice hierarchical clustering to group the beers into specific categories.
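To make the correlation and clustering steps a bit more concrete, here is a rough Python sketch. It assumes Pearson correlation between term-frequency vectors and average-linkage clustering on a distance of 1 minus the correlation. Those choices, the frequencies, and the function names are my own assumptions and not necessarily what Pavlik used.

```python
from itertools import combinations
from math import sqrt

# Invented relative term frequencies per style; illustration only.
STYLE_TERMS = {
    "Stout": {"coffee": 0.4, "chocolate": 0.4, "roasted": 0.2},
    "Porter": {"coffee": 0.3, "chocolate": 0.3, "roasted": 0.3, "caramel": 0.1},
    "American IPA": {"grapefruit": 0.4, "pine": 0.4, "citrus": 0.2},
    "American Pale Ale": {"grapefruit": 0.5, "citrus": 0.3, "caramel": 0.2},
}


def pearson(a, b, vocab):
    """Pearson correlation of two term-frequency dicts over a shared vocabulary."""
    xs = [a.get(t, 0.0) for t in vocab]
    ys = [b.get(t, 0.0) for t in vocab]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(style_terms):
    """Map each unordered pair of styles to the correlation of their term use."""
    vocab = sorted({t for terms in style_terms.values() for t in terms})
    return {
        frozenset((s1, s2)): pearson(style_terms[s1], style_terms[s2], vocab)
        for s1, s2 in combinations(style_terms, 2)
    }


def cluster(style_terms, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    corr = correlation_matrix(style_terms)
    clusters = [frozenset([s]) for s in style_terms]

    def distance(c1, c2):
        # Average linkage: mean of (1 - correlation) over all cross-cluster pairs.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(1 - corr[frozenset((a, b))] for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: distance(*p))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


for group in cluster(STYLE_TERMS, n_clusters=2):
    print(sorted(group))
```

On these made-up numbers the dark beers end up in one group and the hoppy ones in the other. Stopping at a fixed number of clusters is a simplification; a full dendrogram would record every merge along the way.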

Pavlik then takes a shot at predicting the beer style of unclassified reviews using k-nearest neighbors on the words. Ultimately there were beers that were easy to predict (stouts) and those that weren't (Gose), but this was a great thought experiment, and you can take a look at the R source code on GitHub. --MS
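If you'd like a feel for how k-nearest neighbors on words works without opening R, here is a small Python sketch. It assumes bag-of-words vectors compared by cosine similarity with a simple majority vote. The training reviews, stop words, and function names are invented, and Pavlik's R implementation may well differ.

```python
import re
from collections import Counter
from math import sqrt

# Invented labeled reviews for illustration only; NOT real BeerAdvocate data.
TRAIN = [
    ("Stout", "roasted coffee and dark chocolate, thick and creamy"),
    ("Stout", "coffee, chocolate, roasted malt, heavy body"),
    ("Stout", "dark roasted malt with espresso and cocoa"),
    ("Gose", "tart and salty with coriander and lemon"),
    ("Gose", "sour lemon, sea salt, light wheat"),
    ("Gose", "salty, tart, refreshing citrus"),
]

# A tiny, hand-picked stop-word list (an assumption).
STOP_WORDS = {"and", "with"}


def vectorize(text):
    """Turn a review into a bag-of-words count vector."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict_style(text, train, k=3):
    """Label a review by majority vote among its k most similar training reviews."""
    query = vectorize(text)
    ranked = sorted(
        train, key=lambda row: cosine(query, vectorize(row[1])), reverse=True
    )
    votes = Counter(style for style, _ in ranked[:k])
    return votes.most_common(1)[0][0]


print(predict_style("rich chocolate and roasted coffee notes", TRAIN))
print(predict_style("tart, salty lemon", TRAIN))
```

The toy data makes both styles look easy. With real reviews, a style whose vocabulary overlaps heavily with its neighbors would be much harder to pin down.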

Weyerbacher Sunday Morning Stout

12.7% ABV. A delicious bourbon barrel coffee stout, a little bit on the bitter side, but one that should sit very well with fans of Founders Breakfast Stout. --BA

Hardywood Raspberry Stout

9.2% ABV. Originally a part of Hardywood's Virginia Reserve series, this one makes it into their regular rotation around wintertime. A chocolate stout with local raspberries, it pours with a medium-thickness head that has a reddish tinge to it. The raspberry flavor slightly cuts the traditional bitterness of chocolate and stout beer. It's highly drinkable, and they recently went to a smaller bottle size, so buy two! I'm usually one to try new things at the craft beer store, but when I see this in stock, I'll buy it a few weeks in a row because it's a favorite. --MS

See you all on the other side of this trip! --MS