Problem Statement

Indix deals with product data. Most of our data comes from the web: we collect information from ecommerce portals, parse it, and add it to our index. One of the challenges we face with product data is identifying the brand a particular product belongs to.

This edition of the hackathon exposes you to the challenges in this space. You are given a product dataset that contains just 3 fields: product_title, brand_id and category_id (in that order). The task is to identify the brand_id using the other features (product_title and category_id). You can treat this as a standard classification problem and predict the label (brand_id) for a given input record. The test set has 2 fields: product_title and category_id.
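To make the task concrete, here is a minimal baseline sketch. It is not the approach used by the scripts in this repository. It assumes a simple token-voting scheme: each word in a title votes for the brands it has appeared with in the same category. The function names (`tokenize`, `train`, `predict`) and the sample records are hypothetical.

```python
from collections import Counter, defaultdict

def tokenize(title):
    """Lowercase a product title and split it on whitespace."""
    return title.lower().split()

def train(records):
    """Build vote tables from (product_title, brand_id, category_id) rows."""
    votes = defaultdict(Counter)     # (category_id, token) -> brand counts
    fallback = defaultdict(Counter)  # category_id -> brand counts
    overall = Counter()              # brand counts over the whole training set
    for title, brand_id, category_id in records:
        for token in set(tokenize(title)):
            votes[(category_id, token)][brand_id] += 1
        fallback[category_id][brand_id] += 1
        overall[brand_id] += 1
    return votes, fallback, overall

def predict(model, title, category_id):
    """Return the brand_id with the highest token-vote score."""
    votes, fallback, overall = model
    scores = Counter()
    for token in set(tokenize(title)):
        counts = votes.get((category_id, token))
        if counts:
            total = sum(counts.values())
            # Tokens seen with a single brand contribute more than shared ones.
            for brand_id, n in counts.items():
                scores[brand_id] += n / total
    if scores:
        return scores.most_common(1)[0][0]
    # No known token: fall back to the most frequent brand in the category,
    # then to the most frequent brand overall.
    if fallback.get(category_id):
        return fallback[category_id].most_common(1)[0][0]
    return overall.most_common(1)[0][0] if overall else None

# Hypothetical sample data in the training-file column order.
sample = [
    ("Sony Bravia 40 inch TV", "b1", "c1"),
    ("Sony wireless headphones", "b1", "c1"),
    ("Samsung Galaxy phone", "b2", "c1"),
    ("Samsung 55 inch TV", "b2", "c1"),
]
model = train(sample)
print(predict(model, "Sony soundbar", "c1"))
```

A real submission would likely need better tokenization and a stronger model; the sketch only illustrates the input and output shape of the problem.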

We will use accuracy to evaluate your classifier's performance.
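The exact scoring formula is not spelled out here. Assuming the usual definition of accuracy (the fraction of records whose predicted brand_id matches the true brand_id), it can be sketched as follows; the function name `accuracy` is hypothetical.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        return 0.0  # assumption: an empty evaluation set scores zero
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)

# Three of four predictions match the true labels.
print(accuracy(["b1", "b2", "b3", "b4"], ["b1", "b2", "x", "b4"]))
```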

Tech Dependencies

Anaconda (Python distribution)
R with RStudio (only for oddCategoryFinder.R)

How to Execute the training and testing

brand_classifier_combined.py is the main script to execute.
Place both input files, named as below, in the same folder as the script:
Training file: classification_train.tsv
Testing file: classification_blind_set_corrected.tsv
Run python brand_classifier_combined.py
The file oddCategoryFinder.R can be run in RStudio to identify the best fit for each category; its results are meant to be used in the rules for featuring.
Once the script has run, output files are created in the same folder in .txt format, with names of the form output_<timestamp>.txt
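The file handling described above can be sketched as follows. This is an illustration, not code from brand_classifier_combined.py. It assumes the TSV files have no header row, that the output holds one predicted brand_id per line, and that the timestamp uses a `YYYYMMDD_HHMMSS` format (the real format is not documented here). The helper names `read_tsv` and `write_predictions` are hypothetical.

```python
import csv
import time
from pathlib import Path

def read_tsv(path, expected_fields):
    """Read a tab-separated file, keeping only rows with the expected field count."""
    rows = []
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quote characters inside product titles as plain text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) == expected_fields:  # malformed rows are skipped
                rows.append(tuple(row))
    return rows

def write_predictions(predictions, folder="."):
    """Write one brand_id per line to output_<timestamp>.txt and return its path."""
    name = "output_{}.txt".format(time.strftime("%Y%m%d_%H%M%S"))
    path = Path(folder) / name
    with open(path, "w", encoding="utf-8") as handle:
        for brand_id in predictions:
            handle.write("{}\n".format(brand_id))
    return path

if __name__ == "__main__":
    train_file = Path("classification_train.tsv")
    test_file = Path("classification_blind_set_corrected.tsv")
    if train_file.exists() and test_file.exists():
        train_rows = read_tsv(train_file, 3)  # product_title, brand_id, category_id
        test_rows = read_tsv(test_file, 2)    # product_title, category_id
        print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

Skipping malformed rows keeps the sketch short; a stricter loader might instead report them, since a dropped test row would shift the output lines out of step with the input.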

Folders and files (19 commits)

.Rhistory
README.md
brand.m
brand_classifier.m
brand_classifier.py
brand_classifier_combined.py
brand_classifier_rulebased.py
brand_classifier_rulebased_test.py
brand_classify.m
classifier.m
classifier_brand.m
classify.m
classify_brand.m
count.m
count.mat
hash.m
license.txt
oddCategoryFinder.R
output.mat
output.txt
result.m
result.mat
strnearest.m

raman22feb1988/ProductScouts

Folders and files

Latest commit

History

Repository files navigation

Problem Statement

Tech Dependencies

How to Execute the training and testing

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages