A C program that compresses and decompresses files using Huffman coding. Plain-text files are compressed directly, while PDF and Word documents are first converted to text. An interactive command-line interface handles reading the input and writing the compressed and decompressed output files.
- Huffman Coding: Efficient lossless compression using Huffman trees.
- PDF/Word Support: Converts PDF and Word files to text before compression (requires `pdftotext` and `libreoffice`).
- Custom File Header: Stores Huffman codes and padding info for accurate decompression.
- Interactive CLI: Choose between compress, decompress, or both.
- Cross-Platform: The core is written in standard C; document conversion depends on external tools and a Unix-like environment.
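The Huffman coding and padding features listed above can be illustrated with a minimal sketch. It is not the program's actual code: the `Node` layout and the function names `huffman_code_lengths` and `padding_bits` are hypothetical, and the sketch finds the two lightest nodes with a simple linear scan rather than a priority queue. It computes only the code length per byte value and the number of padding bits needed to end the encoded stream on a byte boundary.

```c
#include <assert.h>
#include <stddef.h>

#define SYMBOLS 256

/* Hypothetical node layout for this sketch; the program's own struct may differ. */
typedef struct {
    unsigned long weight;
    int parent;  /* index of the parent node, -1 while unmerged (and for the root) */
    int active;  /* 1 while the node is still waiting to be merged */
} Node;

/* Returns the index of the lightest active node and deactivates it, or -1 if none. */
static int take_min(Node *nodes, int count) {
    int best = -1;
    for (int i = 0; i < count; i++) {
        if (nodes[i].active && (best < 0 || nodes[i].weight < nodes[best].weight))
            best = i;
    }
    if (best >= 0) nodes[best].active = 0;
    return best;
}

/* Fills lengths[s] with the code length of byte s (0 if s never occurs).
 * Returns the number of distinct symbols. */
int huffman_code_lengths(const unsigned long freq[SYMBOLS], int lengths[SYMBOLS]) {
    Node nodes[2 * SYMBOLS];
    int leaf_of[SYMBOLS];
    int count = 0;

    for (int s = 0; s < SYMBOLS; s++) {
        lengths[s] = 0;
        leaf_of[s] = -1;
        if (freq[s] > 0) {
            nodes[count].weight = freq[s];
            nodes[count].parent = -1;
            nodes[count].active = 1;
            leaf_of[s] = count++;
        }
    }
    int leaves = count;

    /* Repeatedly merge the two lightest nodes until one root remains. */
    for (int merges = 1; merges < leaves; merges++) {
        int a = take_min(nodes, count);
        int b = take_min(nodes, count);
        assert(a >= 0 && b >= 0);
        nodes[count].weight = nodes[a].weight + nodes[b].weight;
        nodes[count].parent = -1;
        nodes[count].active = 1;
        nodes[a].parent = count;
        nodes[b].parent = count;
        count++;
    }

    /* A leaf's depth in the tree is its code length. */
    for (int s = 0; s < SYMBOLS; s++) {
        if (leaf_of[s] < 0) continue;
        for (int n = leaf_of[s]; nodes[n].parent >= 0; n = nodes[n].parent)
            lengths[s]++;
        if (leaves == 1) lengths[s] = 1;  /* a lone symbol still needs one bit */
    }
    return leaves;
}

/* Number of filler bits needed so the encoded stream ends on a byte boundary. */
int padding_bits(const unsigned long freq[SYMBOLS], const int lengths[SYMBOLS]) {
    unsigned long total = 0;
    for (int s = 0; s < SYMBOLS; s++)
        total += freq[s] * (unsigned long)lengths[s];
    return (int)((8 - total % 8) % 8);
}
```

A decoder needs the padding count because the last byte of the compressed data may contain filler bits that must not be decoded as symbols, which is presumably why the file header records it.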
- GCC or any C compiler
- For PDF support: `pdftotext`
- For Word support: `libreoffice` (for `.doc`/`.docx` to text conversion)
- Unix-like environment (for `popen` and shell commands)
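The dependency on `popen` and `pdftotext` can be sketched as follows. This is an assumption about how the conversion might be wired up, not the program's actual code: the helper names `has_extension`, `build_pdf_command`, and `pdf_to_text` are hypothetical. The sketch relies on `pdftotext` writing to standard output when the output file argument is `-`, and it refuses paths containing a single quote so the quoted shell command cannot be broken.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes popen/pclose under strict C modes */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if name ends with ext (for example ".pdf"), else 0. Case-sensitive. */
int has_extension(const char *name, const char *ext) {
    size_t n = strlen(name), e = strlen(ext);
    return n >= e && strcmp(name + n - e, ext) == 0;
}

/* Builds "pdftotext '<path>' -" into cmd.
 * Returns 0 on success, -1 if the path is unsafe to quote or does not fit. */
int build_pdf_command(const char *path, char *cmd, size_t size) {
    assert(path != NULL && cmd != NULL);
    if (strchr(path, '\'') != NULL) return -1;
    int written = snprintf(cmd, size, "pdftotext '%s' -", path);
    return (written < 0 || (size_t)written >= size) ? -1 : 0;
}

/* Runs pdftotext through a pipe and copies the extracted text to out.
 * Returns the number of bytes copied, or -1 on failure. */
long pdf_to_text(const char *path, FILE *out) {
    char cmd[1024];
    char buf[4096];
    long total = 0;
    size_t n;

    if (build_pdf_command(path, cmd, sizeof cmd) != 0) return -1;
    FILE *pipe = popen(cmd, "r");
    if (pipe == NULL) return -1;
    while ((n = fread(buf, 1, sizeof buf, pipe)) > 0) {
        fwrite(buf, 1, n, out);
        total += (long)n;
    }
    /* A non-zero exit status usually means pdftotext is missing or failed. */
    return pclose(pipe) == 0 ? total : -1;
}
```

Calling `pdf_to_text("sample.pdf", stdout)` would print the extracted text, provided `pdftotext` is installed and on the PATH.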
- Compile the program:
  `gcc -o huffman_compress hf\ pdf.c`
- Run the program:
  `./huffman_compress`
- Follow the prompts:
  - Choose to compress, decompress, or both.
  - Enter the file name as prompted.
- Output:
  - Compressed files will be named `<input>_compressed`
  - Decompressed files will be named `<input>_decompressed`
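The output naming convention above can be sketched in a few lines. The helper `make_output_name` is hypothetical, and the sketch assumes the suffix is appended to the input name exactly as typed; whether the program strips the extension first is not stated here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "<input><suffix>" into out.
 * Returns 0 on success, -1 if the result does not fit in size bytes. */
int make_output_name(const char *input, const char *suffix, char *out, size_t size) {
    assert(input != NULL && suffix != NULL && out != NULL);
    int written = snprintf(out, size, "%s%s", input, suffix);
    return (written < 0 || (size_t)written >= size) ? -1 : 0;
}
```

Under that assumption, `make_output_name("sample.txt", "_compressed", name, sizeof name)` leaves `sample.txt_compressed` in `name`. Checking the return value avoids silently truncating long file names.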
$ ./huffman_compress
Choose an option:
1. Compress and Decompress
2. Compress Only
3. Decompress Only
Enter your choice (1/2/3): 1
Enter the input file name: sample.pdf
Compression and decompression completed successfully!

- For PDF/Word files, the program converts them to text before compression.
- Ensure `pdftotext` and `libreoffice` are installed and available in your system's PATH.
- The program is designed for educational purposes and may require adaptation for large files or production use.
This project is open-source and available under the MIT License.
Author: Praveen K