I need guidance in building a multi-class text classifier using ML, NLP, and scikit-learn. The target label has five classes.
The data consists of emails received from customers that need to be assigned a priority based on historical data.
There are around 40,000 samples in the training set.
I need to evaluate where the existing model is failing and also pinpoint any drawbacks in the way the data has been pre-processed.
I do not need end-to-end code implementations; you should be able to understand the problem statement well and guide me until good recall and precision are achieved in all five classes.
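To make the per-class goal concrete, here is a minimal sketch of how precision and recall could be computed for each class separately. The labels `P1` to `P5` and the tiny example lists are hypothetical stand-ins, not the real data.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for every label seen in either list."""
    true_pos = Counter()
    pred_count = Counter(y_pred)   # how often each label was predicted
    true_count = Counter(y_true)   # how often each label actually occurs
    for t, p in zip(y_true, y_pred):
        if t == p:
            true_pos[t] += 1
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # Guard against division by zero for labels never predicted or never present.
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_pos[label] / true_count[label] if true_count[label] else 0.0
        report[label] = (precision, recall)
    return report


# Hypothetical priority labels; 5 of 8 predictions are correct overall.
y_true = ["P1", "P1", "P2", "P3", "P3", "P3", "P4", "P5"]
y_pred = ["P1", "P2", "P2", "P3", "P3", "P1", "P4", "P4"]
report = per_class_precision_recall(y_true, y_pred)
for label, (precision, recall) in report.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy example the overall accuracy looks moderate, yet `P5` is never predicted, which is the kind of failure a single accuracy figure can hide.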
Which algorithm should I use? (I have already implemented a couple of algorithms with TF-IDF and BoW features; a linear SVM trained with SGD and hinge loss performs best.)
The overall test accuracy is 60% as of now; I need help increasing it to at least 80%.
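For reference, the current best setup probably looks something like the sketch below: a TF-IDF vectorizer feeding scikit-learn's `SGDClassifier` with hinge loss. The example emails, the labels, and the vectorizer settings (`ngram_range`, `sublinear_tf`) are assumptions for illustration, not the actual configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

# Hypothetical stand-in data; the real set has about 40,000 emails and five classes.
emails = [
    "Server is down, production outage, please respond immediately",
    "Urgent: payment failed and customers cannot check out",
    "Question about changing my billing address",
    "How do I export my invoices to PDF?",
    "Thanks for the quick help last week",
    "Newsletter feedback: really liked the new layout",
]
labels = ["high", "high", "medium", "medium", "low", "low"]

# Keeping vectorizer and classifier in one Pipeline ensures the same
# pre-processing is applied at training time and at prediction time.
model = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2), sublinear_tf=True)),
    ("svm", SGDClassifier(loss="hinge", random_state=0)),  # hinge loss = linear SVM
])
model.fit(emails, labels)
predictions = model.predict(emails)
print(list(predictions))
```

Wrapping both steps in a `Pipeline` also makes it straightforward to cross-validate without leaking test vocabulary into the TF-IDF fit.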
Check if the data is pre-processed correctly.
How best to handle data imbalance, and which features, if any, should be created based on the problem statement?
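One common starting point for imbalance is to weight classes inversely to their frequency. The sketch below computes such weights by hand using the formula `n_samples / (n_classes * class_count)`, which is the heuristic scikit-learn applies for `class_weight="balanced"`. The class counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical imbalanced label distribution across five priority classes.
labels = ["P1"] * 2 + ["P2"] * 3 + ["P3"] * 10 + ["P4"] * 15 + ["P5"] * 20
weights = balanced_class_weights(labels)
for label in sorted(weights):
    print(f"{label}: {weights[label]:.3f}")
```

Rare classes receive weights above 1 and frequent classes below 1. A dictionary like this can be passed as `class_weight` to `SGDClassifier`, or `class_weight="balanced"` can be used directly; whether this is enough for the real data would need to be checked against per-class recall.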