Provides readers with the methods, algorithms, and means to perform text mining tasks.
This book is devoted to the fundamentals of text mining using Perl, an open-source programming tool that is freely available via the Internet (www.perl.org). It covers mining ideas from several perspectives--statistics, data mining, linguistics, and information retrieval--and provides readers with the means to successfully complete text mining tasks on their own.
The book begins with an introduction to regular expressions (a text pattern methodology) and quantitative text summaries, both of which are fundamental tools for analyzing text. Then, it builds upon this foundation to explore:
* Probability and texts, including the bag-of-words model
* Information retrieval techniques such as the TF-IDF similarity measure
* Concordance lines and corpus linguistics
* Multivariate techniques such as correlation, principal components analysis, and clustering
* Perl modules, German, and permutation tests
Each chapter is devoted to a single key topic, and the author carefully and thoughtfully introduces mathematical concepts as they arise, allowing readers to learn as they go without having to refer to additional books. The inclusion of numerous exercises and worked-out examples further complements the book's student-friendly format.
Practical Text Mining with Perl is ideal as a textbook for undergraduate and graduate courses in text mining and as a reference for a variety of professionals who are interested in extracting information from text documents.
Roger Bilisoly
"Practical Text Mining with Perl is an excellent book for readers at a variety of different programming skill levels ... Bilisoly's book would serve as a good text for an introductory text mining course, and could be supplemented with lecture notes for Web mining or data mining courses." (Journal of Statistical Software, January 2009)