PHD COURSE

title: TEXT ANALYTICS - A SHORTCUT TO LINGUISTIC EVIDENCE
place: KUA Søndre Campus, Denmark
time: October 22-23, 09:00-16:00 (both days)
instructors: Claus Povlsen (CST|KU) & Kristoffer L. Nielbo (datakuben|SDU)
contact: cpovlsen@hum.ku.dk, kln@cas.au.dk

DESCRIPTION

The recent explosion in digitized and digital text media is rapidly changing the evidential basis for the humanities. While the humanities used to be the principal scientific consumers of text-based data, the majority of text analysis is now performed by 'machines' outside traditional humanistic domains.
Text analytics applies automated, data-intensive techniques to extract useful knowledge from large collections of linguistic data. In this PhD course, participants will gain experience with two major machine learning paradigms, supervised and unsupervised learning, in order to answer research questions fundamental to the humanities: Can we classify texts by genre, period, and status? How do surface structures reveal latent semantic properties?
The workshop consists of a series of hands-on Python tutorials combined with explanations and illustrative use cases. Programming experience is not required, but participants should prepare by installing Python and completing three introductory tutorials available online.

KEYWORDS

TEXT ANALYTICS, TEXT DATA MINING, DIGITAL HUMANITIES, HUMANITIES COMPUTING, CULTURE ANALYTICS

PROGRAM

DAY 1: Thematic Analysis and Unsupervised Learning

Time         Content              Instructor
09:00-10:00  Text Analytics/ML    KLN
10:00-11:00  Topic Modeling       KLN
11:00-12:00  Preparation          KLN
12:00-13:00  Lunch
13:00-14:00  Training             KLN
14:00-15:00  Application          KLN
15:00-16:00  Free Play            KLN

DAY 2: Document Classification and Supervised Learning

Time         Content                   Instructor
09:00-10:00  Text Analytics/buffer     KLN
10:00-11:00  Document Classification   KLN
11:00-12:00  Representation            KLN
12:00-13:00  Lunch
13:00-14:00  Validation                KLN
14:00-15:00  Optimization              KLN
15:00-16:00  Free Play                 KLN

CURRICULUM

LITERATURE

Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84.

Broadwell, P., Mimno, D., & Tangherlini, T. R. (2017). The Tell-Tale Hat: Surfacing the Uncertainty in Folklore Classification. Journal of Culture Analytics. DOI: 10.22148/16.012.

Brücher, H., Knolmayer, G., & Mittermayer, M. A. (2002). Document classification methods for organizing explicit knowledge. Institut für Wirtschaftsinformatik der Universität Bern.

Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17(3), 37.

Jänicke, S., Franzini, G., Cheema, M. F., & Scheuermann, G. (2015). On Close and Distant Reading in Digital Humanities: A Survey and Future Challenges. In Eurographics Conference on Visualization (EuroVis). DOI: 10.2312/eurovisstar.20151113.

Jockers, M. L., & Mimno, D. (2013). Significant themes in 19th-century literature. Poetics, 41(6), 750–769.

Radovanović, M., & Ivanović, M. (2008). Text mining: Approaches and applications. Novi Sad J. Math., 38(3), 227–234.

Tangherlini, T. R., & Leonard, P. (2013). Trawling in the Sea of the Great Unread: Sub-corpus topic modeling and Humanities research. Poetics, 41(6), 725–749.

Underwood, T. (2016). The Life Cycles of Genres. Journal of Culture Analytics. DOI: 10.22148/16.005.

PREPARATION

  1. Install the Anaconda distribution of Python.
  2. Work through episodes 1-3 of Software Carpentry's Python lesson.
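After installing Anaconda, one way to confirm that Python runs and that common packages are present is a short check script like the one below. It is a minimal sketch, not part of the course material: the package list (numpy, pandas, sklearn, matplotlib) is an assumption about typical text-analytics tooling, not a statement of what the course uses, and `check_packages` is a hypothetical helper name.

```python
# Minimal installation check (a sketch; the package list is an assumption,
# not taken from the course materials).
import importlib.util
import sys

# Hypothetical list of packages often used for text analytics in Python.
PACKAGES = ["numpy", "pandas", "sklearn", "matplotlib"]


def check_packages(names):
    """Map each top-level package name to True if Python can locate it."""
    # find_spec returns None (rather than raising) for a missing top-level module,
    # so the check does not import the packages themselves.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_packages(PACKAGES).items():
        print(f"{name:12} {'found' if found else 'MISSING'}")
```

Save it under any filename and run it with `python` from a terminal (or the Anaconda Prompt on Windows); a line marked MISSING points to a package that is not installed in the active environment.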

CREDITS

1.2 ECTS for course attendance, reading and preparation