App for analyzing tweets through topic modeling and sentiment analysis.

This repository contains the code for my Master's degree final project. The project provides a pipeline, from extracting tweets to building an interactive app, for exploring a group of tweets with visual widgets and machine learning techniques such as topic modeling and sentiment analysis.

The pipeline consists of the following steps:

Extract tweets from Twitter with snscrape
Preprocess tweets and their metadata with well-known libraries such as pandas and spaCy
Compute tweet embeddings with sentence transformers. These contextual embeddings improve topic modeling compared to classical techniques like LDA, and they also make it possible to train a simple logistic regression for sentiment classification
Train a sentiment analysis model on labeled datasets
Cluster tweets and assign them topics with contextualized topic modeling
Build an interactive app with Streamlit and Plotly
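
The idea behind the sentiment step, a simple logistic regression on top of fixed-length tweet embeddings, can be sketched in plain Python. This is only an illustration under stated assumptions: the three-dimensional vectors are made-up stand-ins for real sentence-transformer embeddings, and the function names (train_logreg, predict_label) are hypothetical rather than taken from this repository, which may use a library implementation instead.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit binary logistic regression with batch gradient descent.

    X is a list of equal-length feature vectors (here: toy "embeddings"),
    y is a list of 0/1 labels. Returns (weights, bias).
    """
    n_samples = len(X)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log-loss w.r.t. the score
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n_samples for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b


def predict_label(w, b, x) -> int:
    """Return 1 (positive) if the predicted probability is at least 0.5."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(score) >= 0.5 else 0


# Toy data (assumption): real embeddings have hundreds of dimensions.
toy_embeddings = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.1],
    [0.7, 0.2, 0.3],
    [0.1, 0.9, 0.8],
    [0.2, 0.8, 0.7],
    [0.3, 0.7, 0.9],
]
toy_labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative

weights, bias = train_logreg(toy_embeddings, toy_labels)
predictions = [predict_label(weights, bias, x) for x in toy_embeddings]
```

Because the embeddings already encode the meaning of each tweet, a linear classifier like this one is often enough for the sentiment step, and it is cheap to retrain when new labeled data is added.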

In this project I use two different groups of tweets: tweets from @IbaiLlanos and Spanish tweets containing the keyword 'netflix'.
The app, in Spanish, is deployed at https://tweets-visualizer.streamlit.app/

Repository structure

app contains the scripts to deploy the Streamlit app on Heroku
data contains all the data needed along the process, from the raw data extracted with snscrape to the embeddings and the datasets with sentiment labels
dev contains the scripts for local development: preprocessing, creating embeddings, training the sentiment model and topic modeling

Using the code

The notebooks for sentiment classification in dev/sentiment_model are run once to train a model, which is then reused for every group of tweets. A GPU is highly recommended when computing embeddings and creating topics.
To analyze a new group of tweets:

First, from the data/raw_data folder, extract the tweets with snscrape. Example used:
snscrape --jsonl --progress twitter-search "from:IbaiLlanos -filter:replies AND -filter:quote" > IbaiLlanos.json
Run main.py in dev/ to preprocess the tweets, compute embeddings, infer sentiment and save the results
python main.py --data_name IbaiLlanos
Run main_topics.py in dev/ to create the topics. Results are saved in data/topic_data
python main_topics.py --data_name IbaiLlanos
Finally, choose which data and script to use in app/app.py and run it
streamlit run app.py
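
As a quick sanity check on the first step, it can help to inspect the file snscrape wrote before running the rest of the pipeline. With --jsonl the output holds one JSON object per line, so the standard library is enough to read it. The sketch below is not part of this repository: the helper names are hypothetical, and the field names "rawContent" and "content" are assumptions about snscrape's output that may differ between versions.

```python
import json


def load_tweets(path):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    tweets = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                tweets.append(json.loads(line))
    return tweets


def tweet_text(tweet):
    """Return the tweet text, or "" if no known text field is present.

    Assumption: the text is stored under "rawContent" or "content".
    """
    return tweet.get("rawContent") or tweet.get("content") or ""


def summarize(tweets):
    """Count the tweets and compute the mean text length in characters."""
    texts = [tweet_text(t) for t in tweets]
    non_empty = [t for t in texts if t]
    mean_length = (
        sum(len(t) for t in non_empty) / len(non_empty) if non_empty else 0.0
    )
    return {
        "n_tweets": len(tweets),
        "n_with_text": len(non_empty),
        "mean_length": mean_length,
    }
```

For the example above, calling summarize(load_tweets("IbaiLlanos.json")) from data/raw_data reports how many tweets were extracted and how many of them carry text. If n_with_text is 0, the text field is probably named differently in the installed snscrape version.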
