mortfer/tweets-analyzer

App for analyzing tweets through topic modeling and sentiment analysis.

This repository contains the code of my Master’s Degree Final Project. This project aims to provide a pipeline (from extracting tweets to designing an interactive app) to explore a group of tweets with visual widgets and machine learning techniques such as topic modeling and sentiment analysis.

The pipeline consists of the following steps:

  • Extract tweets from Twitter with snscrape
  • Preprocess tweets and their metadata with well-known libraries such as pandas and spaCy
  • Compute tweet embeddings with sentence transformers. These contextual embeddings improve topic modeling compared to classical techniques like LDA and also allow us to build a simple logistic regression for sentiment classification
  • Train a sentiment analysis model on labeled datasets
  • Cluster tweets and assign them topics with contextualized topic modeling
  • Build an interactive app with Streamlit and Plotly
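
To make the idea of "a simple logistic regression" over embeddings concrete, here is a minimal sketch in pure Python. It is not the code from this repository: the two-dimensional vectors are toy stand-ins for real sentence-transformer embeddings, and the names `train_logreg` and `predict` are hypothetical. In practice a library implementation would normally be used instead.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example (probability minus label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradient over the batch and take one step.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 (positive) or 0 (negative) for one embedding vector."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0

# Toy stand-ins for tweet embeddings (assumption: real ones have
# hundreds of dimensions and come from a sentence-transformer model).
X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
y = [1, 1, 0, 0]  # 1 = positive sentiment, 0 = negative sentiment
w, b = train_logreg(X, y)
```

Once trained, `predict(w, b, embedding)` labels a new tweet from its embedding alone, which is why the same embeddings can serve both topic modeling and sentiment classification.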

In this project I use two different groups of tweets: tweets from @IbaiLlanos and Spanish tweets containing the keyword 'netflix'.
The app, in Spanish, is deployed at https://tweets-visualizer.streamlit.app/

Repository structure

  • app contains the scripts to deploy the Streamlit app on Heroku
  • data contains all the data needed throughout the process, from raw data extracted with snscrape to embeddings and datasets with sentiment labels
  • dev contains scripts for local development: preprocessing, creating embeddings, training the sentiment model, and topic modeling

Using the code

The notebooks for sentiment classification in dev/sentiment_model are run once to train a model, which is then reused for every group of tweets. A GPU is highly recommended when computing embeddings and creating topics.
To analyze a new group of tweets:

  • First, from the data/raw_data folder, extract tweets with snscrape. Example used:
    snscrape --jsonl --progress twitter-search "from:IbaiLlanos -filter:replies AND -filter:quote" > IbaiLlanos.json
  • Run main.py in dev/ for preprocessing, computing embeddings, inferring sentiment, and saving results
    python main.py --data_name IbaiLlanos
  • Run main_topics.py in dev/ to create topics. Results are saved in data/topic_data
    python main_topics.py --data_name IbaiLlanos
  • Finally, choose which data and script to use in app/app.py and run it
    streamlit run app.py
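
Note that the file written by `snscrape --jsonl` holds one JSON object per line (JSON Lines), even though the example names it `IbaiLlanos.json`, so `json.load` on the whole file would fail. The following is a minimal sketch of reading such a file; `load_tweets` is a hypothetical helper, not part of this repository, and the field names `rawContent` and `content` are assumptions about the snscrape output that may vary between versions.

```python
import json

def load_tweets(lines):
    """Parse snscrape --jsonl output: one JSON object per non-empty line."""
    tweets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: the tweet text is stored under "rawContent" or,
        # in older snscrape versions, under "content".
        text = record.get("rawContent") or record.get("content") or ""
        tweets.append({
            "id": record.get("id"),
            "date": record.get("date"),
            "text": text,
        })
    return tweets

# Made-up sample lines standing in for a real snscrape output file.
sample = [
    '{"id": 1, "date": "2022-01-01T10:00:00+00:00", "content": "hola"}',
    '',
    '{"id": 2, "date": "2022-01-02T11:00:00+00:00", "rawContent": "netflix"}',
]
tweets = load_tweets(sample)
```

With a real file, the same function accepts an open file handle, for example `with open("IbaiLlanos.json", encoding="utf-8") as f: tweets = load_tweets(f)`.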
