Text and Data Mining Tools and Tutorials

Text and Data Mining Tools

Awesome Machine Learning

Awesome Machine Learning is a curated list of machine learning frameworks, libraries, and software for advanced users.

DHbox

DHbox is a Docker environment for digital humanities computational work. It comes pre-equipped with IPython, RStudio, Omeka, and NLTK.

DiRT Directory

The DiRT (Digital Research Tools) Directory is a registry of digital research tools for scholarly use.

TAPoR

The Text Analysis Portal for Research, TAPoR, is a gateway to tools used in text analysis and retrieval.

Text and Data Mining Tutorials

Command Line Crash Course

Check out The Command Line Crash Course if you need to brush up on using the command line.

Programming Historian

The Programming Historian is an online, open-access, peer-reviewed suite of tutorials aimed at helping humanists learn a wide range of digital tools, techniques, and workflows to facilitate their research. Its lessons cover topics including Python, the Zotero API, data manipulation, topic modeling, and GIS, and may be useful to humanists and non-humanists alike.