Andrew Graham of Mashable wrote a wonderful article this week, Why Wall Street is Betting Big on Social Media Data. In it, he outlines the potential benefits and risks of culling data from Social Networks.
Uses of what’s alternatively known as Computational Linguistics or Natural Language Processing are popping up in Finance and Marketing, and they have tremendous potential to change the Service Industry. Yet to most people the process is as mysterious as a Santería ceremony.
I believe that anything, however complex, is worth trying to understand when it’s apt to have such a global impact. Here I’ll give a quick, high-level look at the steps in the Computational Linguistics pipeline. All links point to the chapters describing how to do these steps in the CL torah, Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper.
Step One: Get the text from the internet. This takes a couple of steps. First you’ve got to copy pages from the internet, which are written in natural language but wrapped up in HTML, the stuff that makes web sites look like more than scrolls of typewriter paper. So you strip that off. Then you’ve got a series of words. One series of words. You can’t do much with that, so you cut it up into a list with one item per word. Processing Raw Text describes this process in such terms that a non-Computer Science major could learn to do it in a weekend.
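To make Step One concrete, here is a minimal sketch using only Python’s standard library. It is an illustration under stated assumptions, not the book’s own code: the class TextExtractor, the helpers strip_html and tokenize, and the sample page are all made up for this example, and the simple word pattern is far cruder than a real tokenizer.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps the text between tags, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a script or style element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def strip_html(html):
    """Return the visible text of an HTML string as one long string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tokenize(text):
    """Cut one long string into a list of words and punctuation marks."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)


# A made-up page standing in for something copied from the internet.
page = "<html><body><h1>Markets</h1><p>Traders don't ignore tweets.</p></body></html>"
tokens = tokenize(strip_html(page))
print(tokens)  # ['Markets', 'Traders', "don't", 'ignore', 'tweets', '.']
```

The point is only the shape of the process: markup in, one string out, then a list you can count and loop over.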
Step Two: Turn a list of words into words whose functions are known. What here is a noun? What’s an adjective? Where are the prepositions? This seems like an impossible task (imagine, for example, reading content from 50,000 Twitter feeds and manually tagging the sentences). Yet with Categorizing and Tagging Words, one could seriously learn this process in a matter of hours (assuming they’d already read the previous chapters).
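For a feel of what tagging means in code, here is a toy rule-based tagger in plain Python. It guesses a word’s function from its spelling alone. The patterns, the tag function, and the sample sentence are assumptions made up for this sketch; real taggers, such as those in NLTK, are typically trained on hand-annotated text and are far more accurate.

```python
import re

# (pattern, tag) pairs tried in order; the first match wins.
# Tag names follow the common Penn Treebank abbreviations.
PATTERNS = [
    (r"^(the|a|an)$", "DT"),                  # determiners
    (r"^(in|on|of|at|by|with|from)$", "IN"),  # a few prepositions
    (r"^-?\d+(\.\d+)?$", "CD"),               # numbers
    (r".*ing$", "VBG"),                       # gerunds
    (r".*ed$", "VBD"),                        # past-tense verbs
    (r".*ly$", "RB"),                         # adverbs
    (r".*(ous|ful|able|ive)$", "JJ"),         # adjectives
    (r".*s$", "NNS"),                         # plural nouns
    (r".*", "NN"),                            # default: singular noun
]
COMPILED = [(re.compile(p, re.IGNORECASE), label) for p, label in PATTERNS]


def tag(tokens):
    """Pair each token with the tag of the first pattern it matches."""
    tagged = []
    for token in tokens:
        for pattern, label in COMPILED:
            if pattern.match(token):
                tagged.append((token, label))
                break
    return tagged


sentence = ["The", "traders", "quickly", "scanned", "50000", "tweets", "on", "Monday"]
for word, label in tag(sentence):
    print(word, label)
```

Spelling rules like these break easily ("sing" is not a gerund, "bed" is not a verb), which is exactly why the trained approaches matter.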
Afterward one learns more complex (and more fruitful) activities. Learning to Classify Text covers using statistical tools to classify texts for sentiment analysis, a superpowerful idea for Finance and Marketing. Also useful, for grabbing facts from Social Networks and determining relations between entities, is the chapter Extracting Information from Text, which lays out how to use logical and pattern-matching techniques like Regular Expressions and Chunking.
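As a rough illustration of the statistical side, the sketch below is a tiny Naive Bayes sentiment classifier written from scratch in plain Python. Naive Bayes is one common choice for this kind of job, not necessarily the one a given firm uses; the functions train and classify and the four training sentences are invented for this example, and a real system would learn from thousands of labeled texts.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count how often each word appears under each label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def classify(words, model):
    """Return the label with the highest log-probability for these words."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        # Start from how common the label is overall.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in words:
            if word in vocab:  # ignore words never seen in training
                # Add-one smoothing so an unseen pairing is unlikely, not impossible.
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data, for illustration only.
training = [
    ("great earnings strong growth".split(), "pos"),
    ("love this stock".split(), "pos"),
    ("terrible quarter weak sales".split(), "neg"),
    ("hate this stock".split(), "neg"),
]
model = train(training)
print(classify("strong growth".split(), model))  # pos
print(classify("weak quarter".split(), model))   # neg
```

The classifier never understands a sentence; it only learns which words tend to show up alongside which label, and that turns out to be enough to be useful.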
Computational Linguistics is extremely useful for the Marketing, Finance and Service professionals of the future, and the surprisingly good news is that you don’t have to be a computer genius to learn it.