3rd Brown Bag DH-Workshop: Text Mining Tools (8 June 2015)

Original notes document on etherpad: http://etherpad.mpiwg-berlin.mpg.de:9001/p/bb_textminingtools

Meeting notes for the MPIWG DH workshop #3 "Text Mining Tools"


Participants

  • Robert Casties
  • Donatella Germanese
  • Lino Camprubí
  • Cesare Pastorino
  • Elena Aronova
  • Klaus Thoden
  • Shih-Pei Chen
  • ...


Text material

  • PDF Journals (Donatella)
  • Newton Project texts
  • History of MPG archival texts
  • IGY data entry handbooks


Common analysis questions

  • Term frequencies
  • (editable) stop word lists
  • configurable tokenisation (for Chinese)
  • configurable stemming/normalization
  • Word cloud
  • co-occurrence
  • plots over time (using Excel?)
  • topic modeling (interpretation of topics?)
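
As a rough illustration of the first few items above (term frequencies, an editable stop word list, configurable tokenisation, co-occurrence), here is a minimal Python sketch using only the standard library. It was not part of the workshop. The stop word list, the function names, and the sample sentence are all made up for illustration, and the co-occurrence window is just one possible definition.

```python
import re
from collections import Counter

# Hypothetical, editable stop word list (an assumption, not a workshop list).
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is"}


def tokenize(text, pattern=r"\w+"):
    """Lowercase the text and extract tokens with a configurable regex."""
    return re.findall(pattern, text.lower())


def term_frequencies(text, stop_words=STOP_WORDS, pattern=r"\w+"):
    """Count how often each token occurs, ignoring stop words."""
    tokens = [t for t in tokenize(text, pattern) if t not in stop_words]
    return Counter(tokens)


def cooccurrences(text, stop_words=STOP_WORDS, window=3):
    """Count unordered pairs of distinct terms that fall into the same
    sliding window. A window of 3 means each token is paired with the
    next 2 tokens that remain after stop word removal."""
    tokens = [t for t in tokenize(text) if t not in stop_words]
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1:i + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


if __name__ == "__main__":
    sample = "The history of science is the history of ideas."
    print(term_frequencies(sample).most_common())
    print(cooccurrences(sample).most_common())
```

The `pattern` argument is where tokenisation becomes configurable. For Chinese text, for example, passing `pattern=r"\w"` would split the text into single characters. This is only a crude stand-in for real word segmentation.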



(list from https://it-dev.mpiwg-berlin.mpg.de/tracs/IGY/wiki/Other%20text%20analysis%20tools)


Collation tools

Next steps

Voyant Tools has been installed at the MPIWG: https://voyant.mpiwg-berlin.mpg.de

Documentation is here: http://docs.voyant-tools.org/

There will be a hands-on follow-up workshop using Voyant on Monday, June 15th, 11:00-12:00 in R265!

There will be a second hands-on follow-up workshop using IPython on Tuesday, July 7th, 14:00-16:00 in R265!