w266_final_project_MediQA2023
This repository hosts the materials for the research project titled "Advancing Clinical NLP with RoBERTa & BART", which focuses on the classification and summarization of medical dialogues using advanced NLP techniques.
- 01.Paper_Submitted: Contains the final paper submitted for the project.
- 02.Shortlist_Notebooks_for_Paper: A curated collection of Jupyter notebooks integral to the development of the final paper.
- 03.Notebooks_by_Step: Contains primary notebooks for each step of the analysis (the Archive folder is not uploaded).
- 10.Tracking_Metrics: CSV files generated to track the performance of different models.
- 11.Source_Data: The raw data used in the project.
- 13.Instructor_Embeddings_Dialogue: Evaluating Instructor Embeddings and their impact on the dialogue data.
A note on the naming convention for all notebooks: I generally follow a date-step-topic-version structure, with the date in yyyymmdd format. This makes the progression of the analysis easier to follow visually, especially when a project spans more than one calendar year.
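As a rough illustration of why the yyyymmdd prefix helps, the sketch below parses a notebook filename and shows that a plain alphabetical sort is also a chronological sort, even across a year boundary. The underscore separator, the `v<number>` version suffix, the example filenames, and the names `NAME_PATTERN`, `NotebookName`, and `parse_notebook_name` are all assumptions made for illustration; the actual filenames in this repository may differ.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional

# Hypothetical pattern: the README does not specify the separator or the
# version format, so "_" and "v<digits>" are assumed here.
NAME_PATTERN = re.compile(
    r"^(?P<date>\d{8})_(?P<step>[^_]+)_(?P<topic>.+)_(?P<version>v\d+)\.ipynb$"
)


class NotebookName(NamedTuple):
    day: date
    step: str
    topic: str
    version: str


def parse_notebook_name(filename: str) -> Optional[NotebookName]:
    """Split a filename into date, step, topic, and version, or return None."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    try:
        # Reject strings that look like dates but are not valid calendar days.
        day = datetime.strptime(match["date"], "%Y%m%d").date()
    except ValueError:
        return None
    return NotebookName(day, match["step"], match["topic"], match["version"])


# Made-up filenames that straddle a calendar-year boundary.
names = [
    "20240105_03_summarization_v1.ipynb",
    "20231120_01_classification_v2.ipynb",
]

# Because the date is zero-padded and year-first, sorting the raw strings
# already yields chronological order.
ordered = sorted(names)
```

If the real filenames use a different separator or omit the version, only `NAME_PATTERN` would need adjusting; the sorting property depends only on the yyyymmdd prefix.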
This project is a comprehensive effort to push the boundaries of clinical NLP by focusing on medical dialogue classification and summarization. Aligned with the MediQA2023-Chat challenge, it demonstrates how to achieve competitive NLP results under constraints such as limited computational resources and stringent data privacy requirements.
To delve into this project:
- Clone the repository to your local machine.
- Explore the directories for notebooks, data, and the final paper.
- The notebooks follow a systematic naming convention (date, step, topic, version) for ease of access and understanding.
- Requirements: Python 3.x, Jupyter Notebook, and the relevant libraries as outlined in the notebooks (e.g., pandas, NumPy, SBERT).
Public models are available on Hugging Face at https://huggingface.co/zibajoon
GitHub repo for the MEDIQA-Chat-2023 challenge: https://github.com/abachaa/MEDIQA-Chat-2023
This research project is managed by Gaurav Narasimhan. Contributions, suggestions, and discussions are warmly welcomed.
For any queries or suggestions, feel free to email me at gaurav.narasimhan@berkeley.edu
A heartfelt thank you to all the collaborators and contributors for their invaluable insights and expertise that have significantly contributed to this research.