Multi-label Classification of Coding Questions Using BERT and LSTM
For CSC413 - Neural Networks and Deep Learning, which I took this winter semester, I formed a team with my roommates Ethan and Thomas, and we chose this title as the focus for our final project. In this post, I will briefly introduce the project and its results.
Links to resources
We put all the resources and intermediate results in our GitHub repo, and you can directly access the report here. Please see the abstract below for the motivation and a very brief introduction to our work.
The core of the project is to use two models - BERT and LSTM - for the multi-label classification task; we compared their performance and discussed the results.
Data Collection & Preprocessing
While we used data from Kaggle, the raw dataset contains dozens of columns of irrelevant information, so we cleared them out and kept only the title, description, and tags. After that, we recorded all tags that appeared and turned each question's tags into a 0-1 (multi-hot) encoding for the multi-label classification task - a necessary step, since the network needs a fixed-length numeric target vector rather than raw tag strings.
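For illustration, here is a minimal sketch of that multi-hot encoding step, assuming the cleaned data sits in a pandas DataFrame with a `tags` column (the column names and example rows are made up, not actual dataset entries):

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical cleaned rows; the real dataset keeps title, description, tags.
df = pd.DataFrame({
    "title": ["Two Sum", "Reverse Linked List"],
    "description": ["Given an array of integers ...", "Reverse a singly linked list ..."],
    "tags": [["array", "hash-table"], ["linked-list", "recursion"]],
})

# Collect every tag that appears and turn each question's tag list
# into a fixed-length 0-1 vector, one column per tag.
mlb = MultiLabelBinarizer()
labels = mlb.fit_transform(df["tags"])  # shape: (n_questions, n_tags)
print(mlb.classes_)  # ['array' 'hash-table' 'linked-list' 'recursion']
print(labels)        # [[1 1 0 0], [0 0 1 1]]
```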
Data Augmentation
Since there’re only 2000+ questions in the dataset, we decided to use data augmentation to increase the size of the dataset - mainly by translating engligh description into other languages and then translate back to english. We used google translate api for this task:
Model Architecture
Here is the architecture of our BERT model:
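As a rough code sketch of a model of this kind, Hugging Face `transformers` supports multi-label fine-tuning directly; the checkpoint name, tag count, and threshold below are placeholder assumptions, not necessarily our exact configuration:

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

NUM_TAGS = 20  # hypothetical number of tags kept after preprocessing

# Pretrained BERT encoder with a fresh classification head; the multi-label
# problem type makes the model train with sigmoid + BCE instead of softmax.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=NUM_TAGS,
    problem_type="multi_label_classification",
)

inputs = tokenizer("Reverse a singly linked list.", return_tensors="pt",
                   truncation=True, padding=True)
logits = model(**inputs).logits   # shape: (1, NUM_TAGS)
probs = torch.sigmoid(logits)     # independent per-tag probabilities
predicted = (probs > 0.5).int()   # 0-1 vector of predicted tags
```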
Here is the architecture of our LSTM model:
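In code terms, an LSTM baseline of this general shape looks like the sketch below; the embedding size, hidden size, and bidirectionality are illustrative assumptions rather than our exact hyperparameters:

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Embedding -> bidirectional LSTM -> linear head over all tags."""

    def __init__(self, vocab_size: int, num_tags: int,
                 embed_dim: int = 128, hidden_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)                 # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(x)                # h_n: (2, batch, hidden_dim)
        h = torch.cat([h_n[0], h_n[1]], dim=-1)   # concat both directions
        return self.head(h)                       # raw logits, one per tag

# Multi-label training pairs the logits with a per-tag binary loss.
model = LSTMClassifier(vocab_size=30_000, num_tags=20)
loss_fn = nn.BCEWithLogitsLoss()
```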
Results
As we expected, the results show that the BERT model outperforms the LSTM model in terms of accuracy and F1 score. We also did some analysis on the predictions and found that the LSTM model tends to predict the same labels for all questions, while the BERT model's predictions vary from question to question, which is a good sign for the BERT model.
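For concreteness, here is a small sketch of how accuracy and F1 can be computed over multi-hot predictions with scikit-learn; the arrays below are made up for illustration, not our actual results:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical ground truth and predictions: 3 questions, 4 tags.
y_true = np.array([[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]])

# Subset accuracy: a question counts only if every tag matches exactly.
print(accuracy_score(y_true, y_pred))             # 1/3 exact matches here
# Micro-averaged F1 gives per-tag credit, which is gentler for multi-label.
print(f1_score(y_true, y_pred, average="micro"))
```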
Here are the results:
While this is not a very complicated project, it really helped me practice building neural networks using transfer learning and fine-tuning on top of mature architectures like BERT, which is a core competency of the CSC413 course.