Norstop is a lightweight, zero-dependency Python library designed to remove Norwegian stopwords from text with high speed and accuracy.
Unlike generic NLP libraries, Norstop handles Norwegian-specific nuances like:
- Inflections: Handles har, hadde, hatt and min, mi, mine.
- Dialects: Includes common Nynorsk and Bokmål variations.
- Punctuation: Smart stripping ensures words like «Hei,» are processed correctly without destroying sentence structure.
- Speed: Uses optimized set lookups (O(1)) and avoids slow regex compilation.
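To make the punctuation and set-lookup points concrete, here is a minimal sketch of the general technique. It is not Norstop's actual implementation: `DEMO_STOPWORDS`, `EDGE_CHARS`, and `demo_remove_stopwords` are hypothetical names invented for this illustration, and the tiny word set stands in for the real list.

```python
import string

# Hypothetical mini stopword set for illustration only;
# the real list ships with the library.
DEMO_STOPWORDS = frozenset(
    {"jeg", "er", "en", "som", "å", "det", "han", "har", "hadde", "hatt"}
)

# Characters stripped from the edges of each token before the lookup
# (an assumption for this sketch, including the Norwegian quote marks).
EDGE_CHARS = string.punctuation + "«»"


def demo_remove_stopwords(text: str) -> str:
    """Drop tokens whose punctuation-stripped, lowercased core is a stopword."""
    kept = []
    for token in text.split():
        core = token.strip(EDGE_CHARS).lower()
        # Membership test on a frozenset is a hash lookup, O(1) on average.
        if core not in DEMO_STOPWORDS:
            kept.append(token)  # keep the original token, punctuation included
    return " ".join(kept)


print(demo_remove_stopwords("Jeg er en gutt som liker å kode."))
# Output: gutt liker kode.
```

Because only the lookup key is stripped and lowercased, the punctuation attached to kept words survives, and no regular expression is compiled at any point.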
First, create a virtual environment in a folder of your choice.
python -m venv venv
Then activate it:
# Windows
venv\Scripts\activate
# macOS / Linux
source venv/bin/activate
Then install the library directly from GitHub using pip. There is no need to download files manually.
pip install git+https://github.com/vidito/norstop.git
You can also reinstall it using:
pip install --force-reinstall git+https://github.com/vidito/norstop.git
After installing the library, create a Python file such as main.py with the following:
from norstop import remove_stopwords
# Basic example
text = "Jeg er en gutt som liker å kode."
clean_text = remove_stopwords(text)
print(clean_text)
# Output: "gutt liker kode."
# Handling punctuation and quotes
quote = "Han sa: «Det er viktig»."
clean_quote = remove_stopwords(quote)
print(clean_quote)
# Output: "sa: viktig»."
Then run it in the terminal:
python main.py
Norstop is built for speed:
- No Dependencies: Pure Python, no heavy frameworks like NLTK or SpaCy.
- Frozenset Lookups: Checking whether a word is a stopword is a hash lookup that takes constant time on average.
- Memory Efficient: Stopwords are defined in a Python module, so they load from compiled bytecode instead of being parsed from text files at runtime.
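The effect of set-based lookups can be checked with a rough micro-benchmark. The sketch below is not part of Norstop; the generated words, the collection size, and the variable names are arbitrary assumptions chosen only to compare a list scan with a frozenset lookup.

```python
import timeit

# 2000 made-up words; the size is arbitrary and chosen only for the demo.
words = [f"ord{i}" for i in range(2000)]
as_list = list(words)
as_frozenset = frozenset(words)

# The last element is the worst case for a linear list scan.
probe = "ord1999"

list_time = timeit.timeit(lambda: probe in as_list, number=2000)
set_time = timeit.timeit(lambda: probe in as_frozenset, number=2000)

print(f"list: {list_time:.4f}s  frozenset: {set_time:.4f}s")
```

Exact timings depend on the machine, but the list lookup should grow with the number of words while the frozenset lookup stays roughly flat.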
Thank you for helping improve norstop!
Follow these simple steps to add new Norwegian stopwords.
Go to the project page:
https://github.com/Vidito/norstop.git
Click Fork to create your own copy.
Clone your fork to your computer:
git clone <your-fork-url>
cd norstop
Navigate to:
src/norstop/const.py
Inside, you’ll find the NORWEGIAN_STOPWORDS set.
- Add your new word to the set.
- Keep the list alphabetical.
- Add common inflections (for example, for verbs: present, past, participle).
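Before committing, the guidelines above can be checked with a small script. This is a sketch under stated assumptions, not a tool that ships with the project: `check_stopword_entries` is a hypothetical helper, and it expects you to pass the entries as a list in the order they appear in the file.

```python
def check_stopword_entries(words: list[str]) -> list[str]:
    """Return human-readable problems found in an ordered list of entries."""
    problems = []
    seen = set()
    for word in words:
        # A set literal silently swallows duplicates, so flag them here.
        if word in seen:
            problems.append(f"duplicate: {word}")
        seen.add(word)
    # Plain code-point ordering; this does not follow Norwegian collation
    # for æ, ø, and å, so treat results for those letters with care.
    if words != sorted(words):
        problems.append("entries are not in alphabetical order")
    return problems


candidate = ["hadde", "har", "hatt", "min", "mi", "mine"]
print(check_stopword_entries(candidate))
# Output: ['entries are not in alphabetical order']
```

Here "min" is listed before "mi", so the order check fails; swapping the two entries makes the function return an empty list.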
If the project has tests or examples, run them to ensure everything still works.
git add src/norstop/const.py
git commit -m "Added (a number) new Norwegian stopwords"
git push origin main
Then visit your fork on GitHub and click New Pull Request.
To run the tests locally:
# Install in editable mode
pip install -e .
# Run the test suite
python -m unittest discover tests