Big Data essentials: Hadoop, MapReduce, Spark. Explore tutorials and demos in Jupyter notebooks—most are self-contained and live, ready to run with a click.
Updated Mar 8, 2026 - Jupyter Notebook
The largest collection of publicly accessible Progressive Web Apps*
This project uses decision tree learning as the main algorithm for building the model. Because of the large volume of flight data, the project is implemented with MRJob, PySpark, and Spark's MLlib, and the implementations are then compared on performance and accuracy.
Movie rating prediction application
Practice tasks in Python programming language using Hadoop, MRJob, PySpark for Big Data Analytics.
Information Retrieval (Recuperació de la Informació), 2023-24 course, EPSEVG
Project that performs dictionary-based sentiment analysis, implemented with MrJob following a map-reduce model. It can be run locally or in HDFS environments (such as Hadoop or AWS).
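As a rough illustration of how dictionary-based sentiment analysis maps onto a map-reduce model, here is a minimal plain-Python sketch. It does not use MrJob and is not taken from the repository: the `LEXICON` scores, the `mapper`/`reducer` names, and the `run` driver are all hypothetical, chosen only to show the shape of the computation.

```python
from collections import defaultdict

# Hypothetical sentiment dictionary (word -> score); a real project would load
# a much larger lexicon from a file.
LEXICON = {"good": 1, "great": 2, "bad": -1, "terrible": -2}


def mapper(doc_id, text):
    """Map phase: emit (doc_id, score) for each token found in the lexicon."""
    for token in text.lower().split():
        score = LEXICON.get(token.strip(".,!?;:"))
        if score is not None:
            yield doc_id, score


def reducer(doc_id, scores):
    """Reduce phase: sum the scores of one document and attach a label."""
    total = sum(scores)
    label = "positive" if total > 0 else "negative" if total < 0 else "neutral"
    yield doc_id, (total, label)


def run(docs):
    """Local stand-in for the shuffle step: group mapper output by key.

    Documents with no lexicon words emit nothing and are absent from the result.
    """
    grouped = defaultdict(list)
    for doc_id, text in docs.items():
        for key, score in mapper(doc_id, text):
            grouped[key].append(score)
    return dict(pair for key, scores in grouped.items() for pair in reducer(key, scores))
```

Calling `run({"a": "Great movie, good cast!"})` groups the two matched scores under key `"a"` before reducing them to a single total and label.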
Distributed word frequency analysis on 5,000 HuffPost news headlines using Apache Hadoop MapReduce and mrjob. Single-node cluster on Docker with HDFS and YARN configured from scratch. Top 50 keywords extracted via a 2-step MapReduce pipeline with NLTK stopword filtering.
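A 2-step pipeline of this kind can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the repository's code: it runs in-process without Hadoop or mrjob, the small `STOPWORDS` set stands in for NLTK's stopword list, and all function names are hypothetical. Step 1 counts words; step 2 routes every (count, word) pair to a single reducer that keeps the top N.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for NLTK's English stopword list.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "on", "for"}
WORD_RE = re.compile(r"[a-z']+")


def map_words(headline):
    """Step 1 mapper: emit (word, 1) for every non-stopword token."""
    for word in WORD_RE.findall(headline.lower()):
        if word not in STOPWORDS:
            yield word, 1


def reduce_counts(word, counts):
    """Step 1 reducer: sum counts; the None key sends all pairs to one reducer."""
    yield None, (sum(counts), word)


def reduce_top_n(_, count_word_pairs, n=50):
    """Step 2 reducer: sort by descending count (ties by word) and keep n."""
    ranked = sorted(count_word_pairs, key=lambda pair: (-pair[0], pair[1]))
    for count, word in ranked[:n]:
        yield word, count


def shuffle(pairs):
    """Local stand-in for the shuffle phase: group values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def top_keywords(headlines, n=50):
    """Run both steps in sequence over an iterable of headline strings."""
    step1 = shuffle(pair for h in headlines for pair in map_words(h))
    step2 = shuffle(pair for w, c in step1.items() for pair in reduce_counts(w, c))
    return [kv for key, vals in step2.items() for kv in reduce_top_n(key, vals, n)]
```

Funnelling everything through one key in step 2 is simple but serializes the final sort, which is acceptable here because step 1 has already collapsed the input to one pair per distinct word.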
Samples related to data engineering, e.g. Spark, Embulk, and Airflow.