GitHub - kh-a17/semantic-parser

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
README		README
amazon_products_dataset.csv		amazon_products_dataset.csv
data_clean.py		data_clean.py
data_preparation.py		data_preparation.py
llm_based_semantic_output.csv		llm_based_semantic_output.csv
llm_based_structured_json.py		llm_based_structured_json.py
llm_generated_queries_mistral.csv		llm_generated_queries_mistral.csv
llm_output_with_ground_truth.csv		llm_output_with_ground_truth.csv
llm_parser.py		llm_parser.py
regex_parser.py		regex_parser.py
spacy_parser.py		spacy_parser.py
train_model.py		train_model.py

Repository files navigation

## Name - Khushi Sumit Agrawal - KXA230070

## Steps to run

1. python data_python.py
    This creates the (amazon_products_dataset.csv) file

2. Need to login to hugging face by running the following:
    from huggingface_hub import login
    login(<hugging-face-token>) 

3. python data_preparation.py
    This creates the (llm_generated_queries_mistral.csv) file with LLM generated queries.
    <need to run this on kaggle server by setting the accelerator as GPU T4x2 in the notebook settings>

4. python llm_based_structured_json.py
    This creates (llm_based_semantic_output.csv) file with LLM generated structured json
    <need to run this on kaggle server by setting the accelerator as GPU T4x2 in the notebook settings>

5. I created (llm_output_with_ground_truth.csv) file by taking the generated_query and output column from the (llm_based_semantic_output.csv) file and added the column of ground truth for comparison

6. python llm_parser.py
    Prints the scores by comparing ground truth with LLM generated structured json

7. python regex_parser.py
    Prints the scores by comparing ground truth with regex generated structured json

8. python spacy_parser.py
    Prints the scores by comparing ground truth with spacy generated structured json

9. python train_model.py
    We would run this step if we had more data in our dataset

10. we can now use selenium to automate the actions