Insincere questions are defined as questions that are unethical and have a disparaging tone. Rather than seeking a helpful, constructive answer or solution, they are intended to make an inappropriate statement or comment. Here, insincere questions are classified using machine learning and Transformers. The dataset comes from Quora, a question-and-answer forum, and contains questions asked by its users. Several models are trained: Naive Bayes and Logistic Regression represent traditional machine learning methods, while a Convolutional Neural Network (CNN) and the BERT language model represent more advanced methods.
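The traditional baselines work on bag-of-words statistics. The following is a minimal, pure-Python sketch of a multinomial Naive Bayes classifier with Laplace smoothing, included only to illustrate the idea. It is not the notebook's implementation. The class name `MultinomialNB` here is a hypothetical stand-in, and the toy questions and labels (1 = insincere, 0 = sincere) are invented examples rather than Quora data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters, digits and apostrophes (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class MultinomialNB:
    """Hypothetical sketch: multinomial Naive Bayes over word counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        # Per-class word counts accumulated over all training questions.
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Shared vocabulary across classes, used for smoothing.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        # Log prior = log(fraction of training questions in each class).
        class_docs = Counter(labels)
        self.log_prior = {c: math.log(class_docs[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_score(self, text, c):
        v = len(self.vocab)
        score = self.log_prior[c]
        for tok in tokenize(text):
            if tok not in self.vocab:
                continue  # words never seen in training are ignored
            count = self.word_counts[c][tok]
            score += math.log((count + self.alpha) / (self.totals[c] + self.alpha * v))
        return score

    def predict(self, text):
        """Return the class with the highest log posterior score."""
        return max(self.classes, key=lambda c: self._log_score(text, c))


# Invented toy data: 0 = sincere, 1 = insincere.
train_texts = [
    "How do I learn Python for data science?",
    "What is the best way to prepare for an exam?",
    "Why are people from that city so stupid?",
    "Why do idiots like that keep posting here?",
]
train_labels = [0, 0, 1, 1]

clf = MultinomialNB().fit(train_texts, train_labels)
print(clf.predict("What is the best way to learn data science?"))  # → 0
print(clf.predict("Why are people so stupid?"))  # → 1
```

Logistic Regression would typically be trained on the same kind of bag-of-words or TF-IDF features. Naive Bayes is shown here because it needs only word counts and fits in a few lines.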
Exploratory Data Analysis (EDA) is performed to gain insights that guide the methodology, and the questions are then preprocessed using various NLP techniques. The Stanford GloVe embedding is then used with increased vocabulary coverage to examine how that coverage affects model performance.
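Vocabulary coverage can be measured as the fraction of unique words, and of all token occurrences, that have a vector in the embedding. The sketch below shows one common way to compute this and how simple cleaning can raise it. The helper names (`build_vocab`, `check_coverage`, `clean`), the small contraction map and the toy embedding dictionary are all hypothetical. In practice the dictionary would map each word to its vector, loaded from a Stanford GloVe file.

```python
import re
from collections import Counter


def build_vocab(sentences):
    """Count how often each whitespace-separated token occurs across all questions."""
    vocab = Counter()
    for sentence in sentences:
        vocab.update(sentence.split())
    return vocab


def check_coverage(vocab, embeddings):
    """Return (share of unique words covered, share of all tokens covered, most frequent OOV words)."""
    known_unique = 0
    known_tokens = 0
    oov = Counter()
    for word, count in vocab.items():
        if word in embeddings:
            known_unique += 1
            known_tokens += count
        else:
            oov[word] = count
    return known_unique / len(vocab), known_tokens / sum(vocab.values()), oov.most_common(10)


# Hypothetical subset of a contraction map; real ones are much larger.
CONTRACTIONS = {"can't": "cannot", "don't": "do not", "what's": "what is"}


def clean(text):
    """Lowercase, expand known contractions and split punctuation into separate tokens."""
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    return re.sub(r"([?!.,])", r" \1 ", text)


# Toy stand-in for GloVe: dummy one-element vectors instead of 300-d ones.
embeddings = {w: [0.0] for w in
              ["what", "is", "the", "best", "way", "to", "learn",
               "python", "?", "why", "cannot", "i"]}

questions = ["What's the best way to learn Python?", "Why can't I learn Python?"]

raw_unique, raw_tokens, raw_oov = check_coverage(build_vocab(questions), embeddings)
clean_unique, clean_tokens, clean_oov = check_coverage(
    build_vocab([clean(q) for q in questions]), embeddings)

print(raw_unique, raw_tokens)      # → 0.5 0.5
print(clean_unique, clean_tokens)  # → 1.0 1.0
```

Listing the most frequent out-of-vocabulary words (`raw_oov` above) shows which cleaning rules are worth adding next, such as tokens with attached punctuation or unexpanded contractions.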
The details are given below:
Dataset - Kaggle Quora Dataset
Published Research Paper - Insincere Questions Classification Using CNN with Increased Vocabulary Coverage of GloVe Embedding
Kaggle Notebook - Notebook