A Programmer’s Guide to Data Mining

The Ancient Art of the Numerati

Summary

This free, openly licensed online textbook gives programmers a hands-on introduction to fundamental data mining techniques. Across 8 chapters, it covers recommendation systems, collaborative filtering, classification, naive Bayes, and clustering algorithms in a thorough but approachable way.

Readers learn both the concepts behind each technique and how to implement it, working through interactive Python coding exercises that apply the techniques to real-world datasets. The goal is to give readers analytical skills they can put to practical use.

Detailed Overview

Fundamental concepts such as distance measures, correlation, cross-validation, and supervised and unsupervised methods are explained step by step in accessible language. This lets readers experiment with the ideas while building a solid conceptual foundation.
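As a rough illustration of the distance and correlation measures mentioned above, here is a minimal Python sketch that compares two users' ratings. The function names (`manhattan`, `euclidean`, `pearson`) and the ratings for the hypothetical users `alice` and `bob` are invented for this example, and the code is not taken from the book. Each measure considers only the items both users have rated.

```python
from math import sqrt

# Hypothetical ratings (item -> score); not data from the book.
alice = {"Album A": 4.0, "Album B": 1.0, "Album C": 5.0}
bob = {"Album A": 3.0, "Album B": 2.0, "Album C": 4.0, "Album D": 5.0}


def manhattan(a, b):
    """Sum of absolute rating differences over items both users rated."""
    return sum(abs(a[k] - b[k]) for k in a.keys() & b.keys())


def euclidean(a, b):
    """Straight-line distance over items both users rated."""
    return sqrt(sum((a[k] - b[k]) ** 2 for k in a.keys() & b.keys()))


def pearson(a, b):
    """Pearson correlation over shared items; returns 0.0 when it is undefined."""
    common = sorted(a.keys() & b.keys())
    n = len(common)
    if n == 0:
        return 0.0
    xs = [a[k] for k in common]
    ys = [b[k] for k in common]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant rating vector has no defined correlation
    return cov / (sd_x * sd_y)


print(manhattan(alice, bob))  # 3.0
print(euclidean(alice, bob))
print(round(pearson(alice, bob), 3))
```

Smaller distances mean two users agree more. Pearson correlation instead ranges from -1 to 1, and because it subtracts each user's mean rating, it is less affected by one user simply rating everything higher than another.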

Later chapters cover hierarchical and k-means clustering, with worked examples that cluster images and documents. Methods for evaluating models, and why evaluation matters, are also discussed.
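To make the k-means idea concrete, here is a small sketch of the standard algorithm (Lloyd's iterations): assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until no assignment changes. The `kmeans` function, the sample points, and the choice to seed the centroids with the first `k` points are all assumptions made for illustration; the book's own implementation may seed centroids and normalize data differently. The code needs Python 3.8+ for `math.dist`.

```python
import math


def kmeans(points, k, max_iters=100):
    """Cluster equal-length tuples into k groups with Lloyd's algorithm.

    Assumption: the first k points serve as initial centroids (a simple,
    deterministic seeding; random or k-means++ seeding is more common).
    """
    centroids = list(points[:k])
    assignments = None
    for _ in range(max_iters):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the centroids are stable
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical 2-D data with two obvious groups.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
print(centroids)
```

Because k-means depends on the starting centroids, practical implementations usually run it several times with different seeds (or use k-means++ seeding) and keep the result with the lowest within-cluster error.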

The freely available code and example implementations are meant to lower the barrier to understanding and applying these techniques. Readers come away able to prototype recommendation engines, analytic tools, and more.

Although the book is a tutorial, its combined emphasis on theory and hands-on practice builds an intuitive grasp of the underlying mathematics.

Citation and License

Zacharski, R. (2015). A Programmer’s Guide to Data Mining. http://guidetodatamining.com/. Available under the CC BY-NC 4.0 license, which can be viewed here: https://creativecommons.org/licenses/by-nc/4.0/