GitHub - mtimaN/map-reduce-inverted-index

Tema1a

Author: Matei Mantu 332CA

Parallel Computations of an Inverted Index using Map-Reduce

Short description

The program parses text from a given list of files and outputs a reverted index for each word found and the ids of the files where that word is found. The output is written in files named as .txt where is the starting character of the words found inside the file.

Implementation details

The main function parses the list of file names and adds each file in a queue together with their id. It then creates a queue with the alphabet's characters. The threads are then created, each thread being designated as either a mapper or a reducer based on id.

Mappers

Mappers take tasks from the queue of file names and parse each word. They remove any letter not part of the english alphabet using isalpha() and turns all letters to lowercase.

In the end each mapper thread will return a "partial" map of type std::map<std::string, std::set<int>> - this way the entries are sorted by first letter initially and the indices are stored in increasing order. The "partial" map is inserted into a vector used by the reducers.

Reducers

The reducers receive a vector of maps from the mapper threads and a queue of the alphabet. Each letter represents a "task": iterate through the maps aggregate the data in order to create <ch>.txt. This approach is faster than merging all maps into one before writing to disk and is valid according to the forums.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
checker		checker
src		src
Dockerfile		Dockerfile
README.md		README.md
docker-compose.yml		docker-compose.yml
run_with_docker.sh		run_with_docker.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Tema1a

Author: Matei Mantu 332CA

Parallel Computations of an Inverted Index using Map-Reduce

Short description

Implementation details

Mappers

Reducers

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Tema1a

Author: Matei Mantu 332CA

Parallel Computations of an Inverted Index using Map-Reduce

Short description

Implementation details

Mappers

Reducers

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages