Text Analytics

Text Analytics is an area worth investigating if you’re interested in applying your hard-earned knowledge of English vocabulary and grammar in a realistic context. For the uninitiated, there are some interesting web pages that cover the basics and introduce the tools and techniques of the trade. If you want to delve into the nitty-gritty details, you can find online courses and plenty of code libraries to help you start analyzing documents and web pages.

A good way to start learning about Text Analytics is to read the two-part introduction by Seth Grimes. The first part is concerned with Business Intelligence and the second part with Information Retrieval and Analysis. You should certainly try out one of the tools he mentions for the linguistic analysis of web pages: spider.

If you are interested in the technology behind this area, you can improve your English while learning to program in Python by taking Adam Parrish’s course. Python is an excellent language for Text Analytics and is easy to learn. The course attracted attention because he teaches creative writing through programming (see the RWW article). His course, Reading and Writing Electronic Text, also happens to be an excellent introduction to the basics of Text Analytics.

In previous posts I have already mentioned some online resources for text analytics. One of my favorites is hosted by fivefilters.org. There you can create your own newspaper from an RSS feed or OPML file and use their online tool to extract the collocations from the text you are reading.
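If you are curious about what extracting collocations involves, here is a minimal Python sketch. It is not the method used by the online tool mentioned above. It simply counts pairs of adjacent words and keeps the ones that recur, which is the crudest possible approximation of a collocation. The function names `tokenize` and `top_bigrams` and the sample text are made up for this illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_bigrams(text, n=3, min_count=2):
    """Return up to n adjacent word pairs that occur at least min_count times.

    Note: pairs are counted across sentence boundaries, which a real
    collocation extractor would normally avoid.
    """
    words = tokenize(text)
    pairs = Counter(zip(words, words[1:]))
    frequent = [(pair, count) for pair, count in pairs.most_common()
                if count >= min_count]
    return frequent[:n]


# Made-up sample text, used only to demonstrate the function.
sample = (
    "Text analytics is fun. Text analytics is useful. "
    "Python makes text analytics easy."
)

for (first, second), count in top_bigrams(sample):
    print(first, second, count)
```

Raw frequency favors pairs of very common words, so dedicated libraries usually rank candidate pairs with a statistical association measure instead of a plain count.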

Finally, the software I developed for generating English language tests from Wikipedia articles is based on text analysis software written in Python. Examples of the tests produced by this software can be found on my other blog: WikiTests.