gaurav8936/w266_final_project_MediQA2023

Advancing Clinical NLP with RoBERTa & BART

From Classification to Summarization: Unveiling New Insights

w266_final_project_MediQA2023

Description

This repository hosts the materials for the research project titled "Advancing Clinical NLP with RoBERTa & BART", which focuses on the classification and summarization of medical dialogues using advanced NLP techniques.

Repository Structure

  • 01.Paper_Submitted: Contains the final paper submitted for the project.
  • 02.Shortlist_Notebooks_for_Paper: A curated collection of Jupyter notebooks integral to the development of the final paper.
  • 03.Notebooks_by_Step: Contains primary notebooks for each step of the analysis (the Archive folder is not uploaded).
  • 10.Tracking_Metrics: CSV files generated to track the performance of different models.
  • 11.Source_Data: The raw data used in the project.
  • 13.Instructor_Embeddings_Dialogue: Evaluating Instructor Embeddings and their impact on the dialogue data.

A note on the naming convention for all notebooks: I generally follow a date-step-topic-version structure, with the date in yyyymmdd format. This makes the progression of the analysis visually easier to follow, especially when a project transitions from one calendar year to the next.
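As an illustration of why the yyyymmdd prefix helps, here is a minimal sketch that parses and sorts notebook names. The exact file-name layout is not spelled out in this README, so the underscore separator, the `v<number>` version suffix, the sample file names, and the helper names `parse_notebook_name` and `sort_notebooks` are all assumptions made for the example.

```python
import re
from datetime import datetime
from typing import List, NamedTuple, Optional


class NotebookName(NamedTuple):
    date: datetime
    step: str
    topic: str
    version: int


# Assumed layout (hypothetical): <yyyymmdd>_<step>_<topic>_v<version>.ipynb
PATTERN = re.compile(r"^(\d{8})_([^_]+)_(.+)_v(\d+)\.ipynb$")


def parse_notebook_name(filename: str) -> Optional[NotebookName]:
    """Return the parsed parts, or None if the name does not fit the assumed layout."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    date_str, step, topic, version = match.groups()
    try:
        date = datetime.strptime(date_str, "%Y%m%d")
    except ValueError:
        # Eight digits that are not a real calendar date.
        return None
    return NotebookName(date, step, topic, int(version))


def sort_notebooks(filenames: List[str]) -> List[str]:
    """Sort matching notebooks by date, then step, then version; skip other files."""
    parsed = [(parse_notebook_name(name), name) for name in filenames]
    valid = [(info, name) for info, name in parsed if info is not None]
    valid.sort(key=lambda pair: (pair[0].date, pair[0].step, pair[0].version))
    return [name for _, name in valid]
```

Because the year comes first in the prefix, a notebook dated in December sorts ahead of one dated the following January, which is the year-transition case mentioned above.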

Project Overview

This project focuses on medical dialogue classification and summarization in clinical NLP. Aligned with the MediQA2023-Chat challenge, it demonstrates how to achieve competitive NLP results under constraints such as limited computational resources and stringent data privacy requirements.

Getting Started

To delve into this project:

  1. Clone the repository to your local machine.
  2. Explore the directories for notebooks, data, and the final paper.
  3. The notebooks follow a systematic naming convention (date, step, topic) for ease of access and understanding.
  4. Requirements: Python 3.x, Jupyter Notebook, and the relevant libraries as outlined in the notebooks (e.g., pandas, NumPy, SBert).
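Before opening the notebooks, it can help to confirm that the example libraries named above are importable. The sketch below is only a convenience check, not part of the repository. It assumes that "SBert" refers to the `sentence_transformers` package; the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names, and individual notebooks may need additional libraries.

```python
from importlib.util import find_spec
from typing import Dict, List

# Display name -> import name. The SBert mapping is an assumption.
REQUIRED: Dict[str, str] = {
    "pandas": "pandas",
    "NumPy": "numpy",
    "SBert": "sentence_transformers",
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the display names of packages that cannot be found in this environment."""
    return [label for label, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are available.")
```

The check uses `find_spec` rather than importing each package, so it does not pay the cost of loading large libraries just to confirm they are installed.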

Models

Public models on HuggingFace at https://huggingface.co/zibajoon

Primary References

GitHub repo: https://github.com/abachaa/MEDIQA-Chat-2023

Contributions

This research project is managed by Gaurav Narasimhan. Contributions, suggestions, and discussions are warmly welcomed.

Contact

For any queries or suggestions, feel free to email me at gaurav.narasimhan@berkeley.edu

Acknowledgments

A heartfelt thank you to all the collaborators and contributors for their invaluable insights and expertise that have significantly contributed to this research.
