BitTorrent Parser Rust

A Rust parser for BitTorrent metainfo (.torrent) files, which are encoded in the Bencode format. The project demonstrates parser combinators with the nom library and includes detailed educational comments explaining Rust concepts.

Features

  • Complete Bencode format parser supporting all data types:
    • Integers (i42e)
    • Strings (4:spam)
    • Lists (l4:spami42ee)
    • Dictionaries (d4:spami42ee)
  • Extracts comprehensive torrent file metadata
  • Supports both single-file and multi-file torrents
  • Detailed error reporting with hex dump of unparsed data
  • Educational code comments with Q&A format
  • Comprehensive test suite

Technologies

  • Rust (Edition 2024)
  • nom (8.0.0) - Parser combinator library
  • humantime (2.2.0) - Human-readable time formatting

Installation

Prerequisites

  • Rust 1.85 or higher (the minimum toolchain that supports Edition 2024)
  • Cargo (comes with Rust)

Building from Source

# Clone the repository
git clone git@github.com:gvpaleev/BitTorrentParserRust.git
cd BitTorrentParserRust

# Build the project
cargo build --release

# Run the application
cargo run --release

Usage

Place a .torrent file named ubuntu.torrent in the project root directory and run:

cargo run

Example Output

File size: 245503 bytes
First 50 bytes: [100, 56, 58, 97, 110, 110, 111, 117, 110, 99, ...]

Parsed .torrent file successfully!

=== TORRENT FILE INFO ===
Announce URL: https://torrent.ubuntu.com/announce
Creation date: 2023-10-12T12:00:00Z (timestamp: 1697112000)
Comment: Ubuntu CD releases.ubuntu.com
Created by: mktorrent 1.1

=== FILE INFO ===
Name: ubuntu-23.10-desktop-amd64.iso
Single file
File size: 5234567890 bytes (4992.07 MB)
Piece length: 262144 bytes (256.00 KB)
Number of pieces: 19969
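The size and piece figures in output like this are tied together by simple arithmetic. The sketch below shows one way to derive them; the function names are hypothetical and not taken from the project. It relies on the fact that the `pieces` field of a metainfo file is a concatenation of 20-byte SHA-1 hashes, one per piece.

```rust
/// Length of one SHA-1 hash inside the `pieces` field.
const SHA1_LEN: usize = 20;

/// Piece count derived from the raw `pieces` field.
/// Returns None if the field is not a whole number of hashes.
fn pieces_from_hashes(pieces_field: &[u8]) -> Option<usize> {
    if pieces_field.len() % SHA1_LEN == 0 {
        Some(pieces_field.len() / SHA1_LEN)
    } else {
        None
    }
}

/// Piece count expected from the payload size (rounds up, because
/// the last piece may be shorter than `piece_len`).
fn expected_piece_count(total_len: u64, piece_len: u64) -> Option<u64> {
    if piece_len == 0 {
        return None;
    }
    Some(total_len / piece_len + u64::from(total_len % piece_len != 0))
}

/// Formats a byte count together with a scaled value, e.g. KB or MB
/// with 1024-based units (an assumption based on "256.00 KB" above).
fn format_size(bytes: u64, unit: u64, label: &str) -> String {
    format!("{bytes} bytes ({:.2} {label})", bytes as f64 / unit as f64)
}
```

Comparing `pieces_from_hashes` with `expected_piece_count` is a cheap consistency check on a parsed torrent.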

Project Structure

BitTorrentParserRust/
├── src/
│   └── main.rs          # Main parser implementation
├── Cargo.toml           # Project dependencies
├── Cargo.lock           # Dependency lock file
├── ubuntu.torrent       # Sample torrent file
├── ubuntu.dec           # Decoded torrent data
├── launch.json          # VSCode debug configuration
└── README.md            # This file

Code Architecture

Core Components

BencodeValue Enum

Represents all possible Bencode data types:

  • String(Vec<u8>) - Byte strings
  • Integer(i64) - Signed integers
  • List(Vec<BencodeValue>) - Lists of values
  • Dictionary(HashMap<Vec<u8>, BencodeValue>) - Key-value maps

Parser Functions

  • parse_integer() - Parses integers in format i<number>e
  • parse_string() - Parses strings in format <length>:<data>
  • parse_list() - Parses lists in format l<values>e
  • parse_dictionary() - Parses dictionaries in format d<key><value>...e
  • parse_bencode_value() - Main dispatcher using alt combinator
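To illustrate how these functions fit together, here is a minimal sketch that uses only the standard library. It is not the project's code: the real parser is built on nom, whereas this version dispatches on the first byte by hand, and the signatures and error type (`String`) are assumptions made for the example.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum BencodeValue {
    String(Vec<u8>),
    Integer(i64),
    List(Vec<BencodeValue>),
    Dictionary(HashMap<Vec<u8>, BencodeValue>),
}

/// Parsed value plus the input that remains after it.
type ParseResult<'a, T> = Result<(T, &'a [u8]), String>;

/// i<number>e (lenient: does not reject leading zeros or "-0").
fn parse_integer(input: &[u8]) -> ParseResult<'_, i64> {
    let rest = input.strip_prefix(b"i").ok_or("expected 'i'")?;
    let end = rest.iter().position(|&b| b == b'e').ok_or("missing 'e'")?;
    let text = std::str::from_utf8(&rest[..end]).map_err(|e| e.to_string())?;
    let value = text.parse::<i64>().map_err(|e| e.to_string())?;
    Ok((value, &rest[end + 1..]))
}

/// <length>:<data>, where data may be arbitrary bytes.
fn parse_string(input: &[u8]) -> ParseResult<'_, Vec<u8>> {
    let colon = input.iter().position(|&b| b == b':').ok_or("missing ':'")?;
    let len_text = std::str::from_utf8(&input[..colon]).map_err(|e| e.to_string())?;
    let len = len_text.parse::<usize>().map_err(|e| e.to_string())?;
    let body = &input[colon + 1..];
    if body.len() < len {
        return Err("string shorter than declared length".to_string());
    }
    Ok((body[..len].to_vec(), &body[len..]))
}

/// l<values>e
fn parse_list(input: &[u8]) -> ParseResult<'_, Vec<BencodeValue>> {
    let mut rest = input.strip_prefix(b"l").ok_or("expected 'l'")?;
    let mut items = Vec::new();
    while rest.first() != Some(&b'e') {
        let (item, next) = parse_bencode_value(rest)?;
        items.push(item);
        rest = next;
    }
    Ok((items, &rest[1..])) // skip the closing 'e'
}

/// d<key><value>...e, where every key is a byte string.
fn parse_dictionary(input: &[u8]) -> ParseResult<'_, HashMap<Vec<u8>, BencodeValue>> {
    let mut rest = input.strip_prefix(b"d").ok_or("expected 'd'")?;
    let mut map = HashMap::new();
    while rest.first() != Some(&b'e') {
        let (key, after_key) = parse_string(rest)?;
        let (value, after_value) = parse_bencode_value(after_key)?;
        map.insert(key, value);
        rest = after_value;
    }
    Ok((map, &rest[1..])) // skip the closing 'e'
}

/// Dispatcher: the first byte decides which parser applies.
fn parse_bencode_value(input: &[u8]) -> ParseResult<'_, BencodeValue> {
    match input.first() {
        Some(b'i') => parse_integer(input).map(|(v, r)| (BencodeValue::Integer(v), r)),
        Some(b'l') => parse_list(input).map(|(v, r)| (BencodeValue::List(v), r)),
        Some(b'd') => parse_dictionary(input).map(|(v, r)| (BencodeValue::Dictionary(v), r)),
        Some(b'0'..=b'9') => parse_string(input).map(|(v, r)| (BencodeValue::String(v), r)),
        Some(other) => Err(format!("unexpected byte 0x{other:02x}")),
        None => Err("unexpected end of input".to_string()),
    }
}
```

Each function returns the unconsumed remainder alongside the value, which mirrors the (remaining input, output) convention of parser combinators and lets the list and dictionary parsers loop until they see the closing `e`.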

Helper Methods

  • get_string() - Extracts byte string
  • get_string_utf8() - Converts to UTF-8 string
  • get_integer() - Extracts integer value
  • get_dict() - Extracts dictionary
  • get_list() - Extracts list
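As a rough sketch, the enum and its accessors could look like the following. The variant shapes come from the list above; the derives and the `Option` return types are assumptions, since the README does not specify them.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum BencodeValue {
    String(Vec<u8>),
    Integer(i64),
    List(Vec<BencodeValue>),
    Dictionary(HashMap<Vec<u8>, BencodeValue>),
}

impl BencodeValue {
    /// Raw bytes if this value is a byte string.
    pub fn get_string(&self) -> Option<&[u8]> {
        match self {
            BencodeValue::String(bytes) => Some(bytes.as_slice()),
            _ => None,
        }
    }

    /// Text if this value is a byte string holding valid UTF-8.
    pub fn get_string_utf8(&self) -> Option<&str> {
        self.get_string().and_then(|b| std::str::from_utf8(b).ok())
    }

    /// The number if this value is an integer.
    pub fn get_integer(&self) -> Option<i64> {
        match self {
            BencodeValue::Integer(n) => Some(*n),
            _ => None,
        }
    }

    /// The map if this value is a dictionary.
    pub fn get_dict(&self) -> Option<&HashMap<Vec<u8>, BencodeValue>> {
        match self {
            BencodeValue::Dictionary(map) => Some(map),
            _ => None,
        }
    }

    /// The items if this value is a list.
    pub fn get_list(&self) -> Option<&[BencodeValue]> {
        match self {
            BencodeValue::List(items) => Some(items.as_slice()),
            _ => None,
        }
    }
}
```

Returning `Option` lets callers chain lookups such as `root.get_dict()?.get(&b"info"[..])?.get_dict()` without panicking when a field is missing or has an unexpected type.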

Bencode Format Specification

Bencode is a simple binary serialization format used by BitTorrent:

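| Type       | Format            | Example      | Result       |
|------------|-------------------|--------------|--------------|
| Integer    | i<number>e        | i42e         | 42           |
| String     | <length>:<data>   | 4:spam       | "spam"       |
| List       | l<values>e        | l4:spami42ee | ["spam", 42] |
| Dictionary | d<key><value>...e | d4:spami42ee | {"spam": 42} |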

Rules

  • Dictionary keys must be strings
  • Dictionary keys must appear in sorted order, compared as raw byte strings
  • Strings can contain arbitrary binary data
  • Integers can be negative, but leading zeros (i03e) and negative zero (i-0e) are invalid
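Two of these rules are easy to check mechanically. The helpers below are a hypothetical sketch, not functions from the project: one tests whether dictionary keys are in ascending raw-byte order, the other whether the text between `i` and `e` is a canonical integer (the BitTorrent specification rejects leading zeros and `-0`).

```rust
/// True if the keys are strictly ascending when compared byte by byte.
/// Strict ordering also rejects duplicate keys.
fn keys_are_sorted(keys: &[&[u8]]) -> bool {
    keys.windows(2).all(|pair| pair[0] < pair[1])
}

/// True if `token` (the text between 'i' and 'e') is canonical:
/// an optional '-', then digits, with no leading zeros and no "-0".
fn is_canonical_integer(token: &str) -> bool {
    let digits = token.strip_prefix('-').unwrap_or(token);
    let negative = digits.len() != token.len();
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return false;
    }
    if digits == "0" {
        return !negative; // "0" is fine, "-0" is not
    }
    !digits.starts_with('0')
}
```

Rust compares byte slices lexicographically, so `pair[0] < pair[1]` matches the ordering Bencode requires.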

Testing

Run the test suite:

cargo test

Test Coverage

  • Integer parsing (positive and negative)
  • String parsing
  • List parsing
  • Dictionary parsing
  • Edge cases and error handling

Educational Features

The code includes extensive educational comments in a Q&A format:

  • Explains Rust concepts (ownership, borrowing, traits)
  • Describes nom parser combinators
  • Details BitTorrent protocol specifics
  • Clarifies design decisions

Perfect for learning:

  • Parser combinators with nom
  • Binary format parsing
  • Rust error handling
  • BitTorrent protocol internals

Torrent File Information Extracted

  • Announce URL - Primary tracker URL
  • Announce List - Backup tracker tiers
  • Creation Date - Unix timestamp with human-readable format
  • Comment - Torrent description
  • Created By - Client software used
  • File Name - Name of file or directory
  • Piece Length - Size of each piece in bytes
  • Number of Pieces - Total piece count
  • File Size - Total size with MB conversion
  • File List - For multi-file torrents (first 10 files shown)
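The difference between single-file and multi-file torrents mainly affects how the total size and file list are computed. The sketch below models this with simplified, hypothetical types (`FileEntry`, `Layout`) rather than the project's actual structures; in the metainfo format a single-file torrent carries a `length` key, while a multi-file torrent carries a `files` list whose entries each have a `length` and a `path`.

```rust
/// One entry of the `files` list in a multi-file torrent (simplified).
struct FileEntry {
    path: Vec<String>, // path components, e.g. ["docs", "readme.txt"]
    length: u64,
}

/// The two layouts an `info` dictionary can describe.
enum Layout {
    SingleFile { length: u64 },
    MultiFile { files: Vec<FileEntry> },
}

/// Total payload size in bytes.
fn total_size(layout: &Layout) -> u64 {
    match layout {
        Layout::SingleFile { length } => *length,
        Layout::MultiFile { files } => files.iter().map(|f| f.length).sum(),
    }
}

/// Display lines for the file list, truncated to `max_shown` entries.
fn list_files(layout: &Layout, max_shown: usize) -> Vec<String> {
    match layout {
        Layout::SingleFile { .. } => vec!["Single file".to_string()],
        Layout::MultiFile { files } => {
            let mut lines: Vec<String> = files
                .iter()
                .take(max_shown)
                .map(|f| format!("{} ({} bytes)", f.path.join("/"), f.length))
                .collect();
            if files.len() > max_shown {
                lines.push(format!("... and {} more files", files.len() - max_shown));
            }
            lines
        }
    }
}
```

Calling `list_files(&layout, 10)` reproduces the "first 10 files shown" behavior described above.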

Error Handling

The parser provides detailed error messages:

  • Shows unparsed data in hex format
  • Displays ASCII representation when possible
  • Indicates exact position of parse failures
  • Handles incomplete input gracefully
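A hex dump with an ASCII column and a failure offset can be produced with a few lines of standard-library code. This is an illustrative sketch; the function names and the exact layout (16 bytes per row) are assumptions, not the project's output format.

```rust
/// Byte offset at which parsing stopped, given the original input
/// and the unparsed remainder returned by the parser.
fn failure_offset(original: &[u8], remaining: &[u8]) -> usize {
    original.len().saturating_sub(remaining.len())
}

/// Renders up to `max_bytes` of `data` as rows of 16 bytes:
/// offset, hex values, then printable ASCII ('.' for anything else).
fn hex_dump(data: &[u8], max_bytes: usize) -> String {
    let shown = &data[..data.len().min(max_bytes)];
    let mut out = String::new();
    for (row, chunk) in shown.chunks(16).enumerate() {
        let hex: Vec<String> = chunk.iter().map(|b| format!("{b:02x}")).collect();
        let ascii: String = chunk
            .iter()
            .map(|&b| if b.is_ascii_graphic() || b == b' ' { b as char } else { '.' })
            .collect();
        // 47 = 16 two-digit bytes plus 15 separating spaces.
        out.push_str(&format!("{:08x}  {:<47}  |{}|\n", row * 16, hex.join(" "), ascii));
    }
    out
}
```

When a parse fails, printing `failure_offset(input, rest)` followed by `hex_dump(rest, 64)` shows both where the problem is and what the offending bytes look like.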

Performance Considerations

  • Zero-copy parsing where possible
  • Efficient use of Rust's ownership system
  • HashMap for average-case O(1) dictionary lookups
  • Minimal allocations during parsing
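To show what zero-copy means here: a parser can return slices that borrow from the input buffer instead of copying bytes into a new `Vec<u8>`. The function below is a hypothetical sketch of that idea for byte strings; since `BencodeValue::String` owns a `Vec<u8>`, the project presumably copies at that point, so treat this as an illustration rather than a description of its internals.

```rust
/// Parses `<length>:<data>` and returns (data, rest), both borrowed
/// from `input`. No bytes are copied and nothing is allocated.
fn borrow_string(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let colon = input.iter().position(|&b| b == b':')?;
    let len = std::str::from_utf8(&input[..colon])
        .ok()?
        .parse::<usize>()
        .ok()?;
    let body = &input[colon + 1..];
    if body.len() < len {
        return None; // declared length exceeds the available input
    }
    Some(body.split_at(len))
}
```

The returned slices share the lifetime of `input`, so the compiler guarantees they cannot outlive the buffer they point into.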

Contributing

Contributions are welcome! Areas for improvement:

  • Support for additional torrent extensions
  • Bencode encoder implementation
  • Streaming parser for large files
  • CLI argument parsing for custom file paths
  • Additional output formats (JSON, YAML)

License

This project is provided as-is for educational purposes.

Author

Created as an educational project demonstrating Rust parser implementation and BitTorrent protocol understanding.