-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathread_me
More file actions
17 lines (9 loc) · 1.31 KB
/
read_me
File metadata and controls
17 lines (9 loc) · 1.31 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
Vaudeville Regex + Topic Model Readme
This project aims to take vaudeville touring data form Variety Magazine using a regular expression, organize this data, and then analyze it using topic modeling.
This task is broken down into three programs:
The first “Vaudeville_Regex_Defense_2.2.” reads through text files and splits them into chunks based on a regular expression searching through a list of town names.
Each of these chunks is then analyzed using a regular expression built to identify touring records. The regular expression used is intentionally capacious—incorrect results are later eliminated based on Vaudeville naming conventions.
The program than produces a json file with the following characteristics – date (of the paper) town (town used to chunk) (venue) and (performers)
The second program (Vaudeville_Dictionary_List_Maker_For_Topic_Modeling) takes this list, and transforms it into a flat ontology of dictionaries, each of which corresponds to a venue with a list of performers. –
Dictionary key structure = “date,” “city_name,” “venue_name,” “artist”. The final is a list of performers.
The third program (Touring_Topic_Model_1.py) takes this list, and does topic modelling on them using an LDA model, using every venue/performance as a date, and every performer as a word.