This project implements a Bloom Filter for fast membership testing with a target False Positive Rate (FPR). It extends the classical Bloom Filter design by adding Mutual Information (MI) based feature selection and Conditional Entropy to measure uncertainty after a Bloom Filter match.
✅ Standard Bloom Filter (bit array + k hash functions)
✅ Double Hashing using mmh3 (efficient hashing)
✅ Target FPR design using optimal formulas for m and k
✅ Feature Extraction (tokenization for text inputs)
✅ Mutual Information for feature selection
✅ Bayesian Posterior Membership Probability
✅ Conditional Entropy for uncertainty quantification
✅ Multi-query evaluation + plots
✅ Results and plots saved into results/ folder
- No false negatives: If Bloom Filter returns
False, the item is definitely not present. - Possible false positives: If Bloom Filter returns
True, the item may or may not be present.
Bloom_Filter_Project/
│
├── bloom_filter.py # Bloom Filter core implementation
├── features.py # Feature extraction + Mutual Information computation
├── inference.py # Posterior probability + Conditional entropy
├── evaluation.py # Multiple query evaluation functions
├── plots.py # Plotting functions + saving plots
├── main.py # Runs the full pipeline
│
├── results/ # Auto-generated results
│ ├── query_results.csv
│ ├── summary.txt
│ └── plots/
│ ├── posterior_per_query.png
│ ├── entropy_per_query.png
│ └── trustworthiness_curve.png
│
├── requirements.txt # Project dependencies
└── README.md