Requires Rust basics (ownership, structs, enums, Result/Option, basic iterators).
Building Block: Creating a professional CLI interface
- Concepts:
  - What makes a good CLI? (help text, error messages, exit codes)
  - Using `clap` for argument parsing (derive API)
  - Input validation and user-friendly errors
  - Reading files vs stdin (see the sketch after this list)
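For "files vs stdin", a common CLI convention is to treat a lone `-` argument as "read from standard input". A minimal sketch of that idea (the `open_input` helper name is illustrative, not part of the demo below):

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

// Convention: "-" means "read from stdin"; anything else is a file path.
// Returning Box<dyn BufRead> lets the caller process both sources the same
// way, line by line.
fn open_input(path: &Path) -> io::Result<Box<dyn BufRead>> {
    if path.as_os_str() == "-" {
        Ok(Box::new(BufReader::new(io::stdin())))
    } else {
        Ok(Box::new(BufReader::new(File::open(path)?)))
    }
}
```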
- Demo Code: Basic CLI skeleton with `clap`

```rust
use clap::Parser;
use std::path::PathBuf;

/// A log analyzer that extracts insights from log files
#[derive(Parser, Debug)]
#[command(name = "loglyzer")]
#[command(version = "1.0")]
#[command(about = "Analyze log files and extract patterns", long_about = None)]
struct Cli {
    /// Path to the log file to analyze
    #[arg(value_name = "FILE")]
    input: PathBuf,

    /// Output format
    #[arg(short, long, value_enum, default_value = "text")]
    format: OutputFormat,

    /// Show only errors (ERROR level logs)
    #[arg(short, long)]
    errors_only: bool,

    /// Verbose output
    #[arg(short, long)]
    verbose: bool,
}

#[derive(Debug, Clone, clap::ValueEnum)]
enum OutputFormat {
    Text,
    Json,
    Csv,
}

fn main() {
    let cli = Cli::parse();
    if cli.verbose {
        println!("Analyzing file: {:?}", cli.input);
        println!("Output format: {:?}", cli.format);
    }
    println!("Hello from loglyzer!");
}
```

Add to `Cargo.toml`:

```toml
[dependencies]
clap = { version = "4.5.51", features = ["derive"] }
```

- Exercise:
  - Run the program with `--help` and observe the generated help text
  - Add a `--top` argument that accepts a number (e.g., `--top 10` for top 10 results)
  - Add a `--search` argument to filter logs containing specific text (one possible shape for both arguments is sketched after this list)
  - Test with invalid arguments and observe error messages
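If you want to check your argument shapes against something, here is one possibility for `--top` and `--search` (a sketch only; the struct and field names are illustrative, and in practice you would merge the fields into the demo's `Cli` struct):

```rust
use clap::Parser;

/// Sketch: the two new arguments shown in isolation.
#[derive(Parser, Debug)]
struct ExtraArgs {
    /// Show only the top N results, e.g. `--top 10`
    #[arg(short, long, value_name = "N")]
    top: Option<usize>,

    /// Keep only log lines containing this text
    #[arg(short, long, value_name = "TEXT")]
    search: Option<String>,
}

fn main() {
    let args = ExtraArgs::parse();
    println!("top = {:?}, search = {:?}", args.top, args.search);
}
```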
Building Block: Reading and parsing log data
- Concepts:
  - Efficient file reading with `BufReader`
  - Line-by-line processing for large files
  - Pattern matching with regex
  - Parsing structured log formats (common patterns: timestamp, level, message)
- Demo Code: Parse log entries
```rust
use regex::Regex;
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::Path;

#[derive(Debug, Clone)]
struct LogEntry {
    timestamp: String,
    level: LogLevel,
    message: String,
}

#[derive(Debug, Clone, PartialEq)]
enum LogLevel {
    Info,
    Warning,
    Error,
    Debug,
}

impl LogLevel {
    fn from_str(s: &str) -> Option<Self> {
        match s.to_uppercase().as_str() {
            "INFO" => Some(LogLevel::Info),
            "WARN" | "WARNING" => Some(LogLevel::Warning),
            "ERROR" => Some(LogLevel::Error),
            "DEBUG" => Some(LogLevel::Debug),
            _ => None,
        }
    }
}

fn parse_log_line(line: &str) -> Option<LogEntry> {
    // Example log format: "2024-01-15 10:30:45 [INFO] Application started"
    // NOTE: compiling the regex on every call is deliberately naive here;
    // the Part 4 exercise caches the compiled regex.
    let re = Regex::new(r"^(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\s+\[(\w+)\]\s+(.+)$")
        .unwrap();
    re.captures(line).and_then(|caps| {
        Some(LogEntry {
            timestamp: caps.get(1)?.as_str().to_string(),
            level: LogLevel::from_str(caps.get(2)?.as_str())?,
            message: caps.get(3)?.as_str().to_string(),
        })
    })
}

fn read_logs(path: &Path) -> Result<Vec<LogEntry>, std::io::Error> {
    let file = File::open(path)?;
    let reader = BufReader::new(file);
    let mut entries = Vec::new();
    for line in reader.lines() {
        let line = line?;
        if let Some(entry) = parse_log_line(&line) {
            entries.push(entry);
        }
    }
    Ok(entries)
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let entries = read_logs(Path::new("sample.log"))?;
    println!("Parsed {} log entries", entries.len());

    // Count by level
    let error_count = entries.iter()
        .filter(|e| e.level == LogLevel::Error)
        .count();
    println!("Errors: {}", error_count);
    Ok(())
}
```

Add to `Cargo.toml`:

```toml
regex = "1.12.2"
```

Sample log file (`sample.log`):

```text
2024-01-15 10:30:45 [INFO] Application started
2024-01-15 10:30:46 [DEBUG] Loading configuration from config.yml
2024-01-15 10:30:47 [INFO] Database connection established
2024-01-15 10:31:02 [WARNING] High memory usage detected: 85%
2024-01-15 10:31:15 [ERROR] Failed to connect to API: timeout
2024-01-15 10:31:16 [INFO] Retrying API connection...
2024-01-15 10:31:18 [INFO] API connection successful
2024-01-15 10:32:00 [ERROR] Database query failed: syntax error
2024-01-15 10:32:01 [WARNING] Cache miss for key: user_1234
2024-01-15 10:33:00 [INFO] Processing completed successfully
```
- Exercise:
  - Integrate `read_logs()` with your CLI from Part 1
  - Implement the `--errors-only` flag to filter ERROR level logs
  - Implement the `--search` flag to filter logs containing specific text
  - Add error handling for file not found and display a user-friendly message
  - Count and display the total number of each log level (INFO, WARNING, ERROR, DEBUG); one counting approach is sketched after this list
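A minimal sketch of the per-level count (it assumes the `LogEntry`/`LogLevel` types from the demo above; the helper name is illustrative):

```rust
use std::collections::HashMap;

// Count how many entries exist for each level, keyed by the level's Debug
// name ("Info", "Warning", ...). Assumes LogEntry/LogLevel from the demo.
fn count_by_level(entries: &[LogEntry]) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for entry in entries {
        *counts.entry(format!("{:?}", entry.level)).or_insert(0) += 1;
    }
    counts
}
```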
Building Block: Presenting data in multiple formats
- Concepts:
  - Serialization with `serde`
  - Creating formatted tables with `prettytable-rs`
  - Writing CSV output
  - Pattern analysis (most common errors, time-based trends)
- Demo Code: Multi-format output
```rust
use prettytable::{Cell, Row, Table};
use serde::Serialize;
use std::collections::HashMap;

#[derive(Debug, Serialize)]
struct LogStats {
    total_entries: usize,
    by_level: HashMap<String, usize>,
    top_errors: Vec<ErrorFrequency>,
}

#[derive(Debug, Serialize)]
struct ErrorFrequency {
    message: String,
    count: usize,
}

fn analyze_logs(entries: &[LogEntry]) -> LogStats {
    let mut by_level = HashMap::new();
    let mut error_messages: HashMap<String, usize> = HashMap::new();
    for entry in entries {
        let level_name = format!("{:?}", entry.level);
        *by_level.entry(level_name).or_insert(0) += 1;
        if entry.level == LogLevel::Error {
            *error_messages.entry(entry.message.clone()).or_insert(0) += 1;
        }
    }
    let mut top_errors: Vec<_> = error_messages
        .into_iter()
        .map(|(message, count)| ErrorFrequency { message, count })
        .collect();
    top_errors.sort_by(|a, b| b.count.cmp(&a.count));
    top_errors.truncate(5); // Top 5 errors
    LogStats {
        total_entries: entries.len(),
        by_level,
        top_errors,
    }
}

fn output_text(stats: &LogStats) {
    println!("\n Log Analysis Results");
    println!("========================\n");
    println!("Total entries: {}\n", stats.total_entries);
    println!("Breakdown by level:");
    let mut table = Table::new();
    table.add_row(Row::new(vec![
        Cell::new("Level"),
        Cell::new("Count"),
        Cell::new("Percentage"),
    ]));
    for (level, count) in &stats.by_level {
        let percentage = (*count as f64 / stats.total_entries as f64) * 100.0;
        table.add_row(Row::new(vec![
            Cell::new(level),
            Cell::new(&count.to_string()),
            Cell::new(&format!("{:.1}%", percentage)),
        ]));
    }
    table.printstd();

    if !stats.top_errors.is_empty() {
        println!("\nTop errors:");
        let mut error_table = Table::new();
        error_table.add_row(Row::new(vec![
            Cell::new("Error Message"),
            Cell::new("Occurrences"),
        ]));
        for error in &stats.top_errors {
            error_table.add_row(Row::new(vec![
                Cell::new(&error.message),
                Cell::new(&error.count.to_string()),
            ]));
        }
        error_table.printstd();
    }
}

fn output_json(stats: &LogStats) {
    let json = serde_json::to_string_pretty(stats).unwrap();
    println!("{}", json);
}

fn output_csv(stats: &LogStats) {
    // Only the by_level counts for now; the exercise extends this.
    println!("level,count");
    for (level, count) in &stats.by_level {
        println!("{},{}", level, count);
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cli = Cli::parse();
    let entries = read_logs(&cli.input)?;
    let filtered_entries: Vec<_> = entries.into_iter()
        .filter(|e| !cli.errors_only || e.level == LogLevel::Error)
        .collect();
    let stats = analyze_logs(&filtered_entries);
    match cli.format {
        OutputFormat::Text => output_text(&stats),
        OutputFormat::Json => output_json(&stats),
        OutputFormat::Csv => output_csv(&stats),
    }
    Ok(())
}
```
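Part 3's demo pulls in three more crates. Add to `Cargo.toml` (version numbers are indicative; any recent release should work):

```toml
serde = { version = "1", features = ["derive"] }
serde_json = "1"
prettytable-rs = "0.10"
```

Note that the `prettytable-rs` crate is imported in code as `prettytable`.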
- Exercise:
  - Implement the `--top N` flag to show the top N errors instead of the hardcoded 5
  - Add time-based analysis: group errors by hour of day (one approach is sketched after this list)
  - In CSV output, include all statistics (not just `by_level`)
  - Add a `--output FILE` flag to write results to a file instead of stdout
  - Color-code the table output (red for ERROR, yellow for WARNING) using the `colored` crate
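For the hour-of-day grouping, the fixed-width `YYYY-MM-DD HH:MM:SS` timestamp makes string slicing safe. A sketch (assumes the `LogEntry`/`LogLevel` types from Part 2; `errors_by_hour` is a hypothetical helper):

```rust
use std::collections::HashMap;

// Group ERROR entries by hour of day. In "YYYY-MM-DD HH:MM:SS" the hour sits
// at byte offsets 11..13; get() returns None for malformed timestamps.
fn errors_by_hour(entries: &[LogEntry]) -> HashMap<String, usize> {
    let mut by_hour = HashMap::new();
    for entry in entries.iter().filter(|e| e.level == LogLevel::Error) {
        if let Some(hour) = entry.timestamp.get(11..13) {
            *by_hour.entry(hour.to_string()).or_insert(0) += 1;
        }
    }
    by_hour
}
```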
Building Block: Scaling to large files
- Concepts:
  - Why parallel processing? (multi-core utilization)
  - Using `rayon` for data parallelism
  - When to parallelize (large files only)
  - Benchmarking performance improvements
- Demo Code: Parallel log processing
```rust
use rayon::prelude::*;
use std::time::Instant;

fn read_logs_parallel(path: &Path) -> Result<Vec<LogEntry>, std::io::Error> {
    let file = File::open(path)?;
    let reader = BufReader::new(file);
    // Read all lines first
    let lines: Vec<_> = reader.lines().collect::<Result<_, _>>()?;
    // Parse in parallel
    let entries: Vec<_> = lines
        .par_iter()
        .filter_map(|line| parse_log_line(line))
        .collect();
    Ok(entries)
}

fn analyze_logs_parallel(entries: &[LogEntry]) -> LogStats {
    use std::sync::Mutex;
    // NOTE: a shared Mutex makes the parallel loop correct but serializes
    // every update; see the fold/reduce sketch after this listing.
    let by_level = Mutex::new(HashMap::new());
    let error_messages = Mutex::new(HashMap::new());
    entries.par_iter().for_each(|entry| {
        let level_name = format!("{:?}", entry.level);
        *by_level.lock().unwrap()
            .entry(level_name)
            .or_insert(0) += 1;
        if entry.level == LogLevel::Error {
            *error_messages.lock().unwrap()
                .entry(entry.message.clone())
                .or_insert(0) += 1;
        }
    });
    let by_level = by_level.into_inner().unwrap();
    let error_messages = error_messages.into_inner().unwrap();
    let mut top_errors: Vec<_> = error_messages
        .into_iter()
        .map(|(message, count)| ErrorFrequency { message, count })
        .collect();
    top_errors.sort_by(|a, b| b.count.cmp(&a.count));
    top_errors.truncate(5);
    LogStats {
        total_entries: entries.len(),
        by_level,
        top_errors,
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cli = Cli::parse();
    let start = Instant::now();

    // Use parallel version for files larger than 10MB
    let file_size = std::fs::metadata(&cli.input)?.len();
    let entries = if file_size > 10_000_000 {
        if cli.verbose {
            println!("Using parallel processing for large file...");
        }
        read_logs_parallel(&cli.input)?
    } else {
        read_logs(&cli.input)?
    };
    let parse_time = start.elapsed();

    let filtered_entries: Vec<_> = entries.into_iter()
        .filter(|e| !cli.errors_only || e.level == LogLevel::Error)
        .collect();
    let stats = analyze_logs_parallel(&filtered_entries);
    let total_time = start.elapsed();

    match cli.format {
        OutputFormat::Text => output_text(&stats),
        OutputFormat::Json => output_json(&stats),
        OutputFormat::Csv => output_csv(&stats),
    }
    if cli.verbose {
        eprintln!("\n⏱️ Performance:");
        eprintln!("  Parse time: {:?}", parse_time);
        eprintln!("  Total time: {:?}", total_time);
    }
    Ok(())
}
```
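The `Mutex`-based aggregation above is correct but contended: every worker takes the same lock for every entry. A lock-free alternative is rayon's `fold`/`reduce`, which gives each worker its own private map and merges the maps at the end. A sketch (assumes the `LogEntry`/`LogLevel` types from Part 2 and a `rayon` dependency in `Cargo.toml`, e.g. `rayon = "1"`, version indicative; `count_levels_lockfree` is a hypothetical helper):

```rust
use rayon::prelude::*;
use std::collections::HashMap;

// Each rayon worker folds entries into its own HashMap (no locking);
// reduce() then merges the per-worker maps pairwise.
fn count_levels_lockfree(entries: &[LogEntry]) -> HashMap<String, usize> {
    entries
        .par_iter()
        .fold(HashMap::new, |mut map, entry| {
            *map.entry(format!("{:?}", entry.level)).or_insert(0) += 1;
            map
        })
        .reduce(HashMap::new, |mut left, right| {
            for (level, count) in right {
                *left.entry(level).or_insert(0) += count;
            }
            left
        })
}
```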
- Exercise:
  - Create a large test file (generate 100k log entries programmatically)
  - Benchmark sequential vs parallel processing
  - Add a `--parallel` flag to force parallel processing regardless of file size
  - Optimize the regex compilation (hint: use `lazy_static` or `once_cell`); a standard-library variant is sketched after this list
  - Add a progress indicator for large files using the `indicatif` crate
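For the regex-caching exercise, the hint names `lazy_static`/`once_cell`, but a dependency-free variant uses the standard library's `OnceLock` (a sketch; `log_line_re` is a hypothetical helper that `parse_log_line` would call instead of `Regex::new`):

```rust
use regex::Regex;
use std::sync::OnceLock;

// Compile the pattern once, on first use. std's OnceLock (stable since
// Rust 1.70) covers this case without lazy_static or once_cell.
fn log_line_re() -> &'static Regex {
    static RE: OnceLock<Regex> = OnceLock::new();
    RE.get_or_init(|| {
        Regex::new(r"^(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2})\s+\[(\w+)\]\s+(.+)$")
            .expect("log-line regex is valid")
    })
}
```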
Students will have a complete, production-ready CLI tool with:
- Professional CLI: Help text, argument validation, multiple output formats
- Efficient parsing: Regex-based log parsing, handles large files
- Rich analysis: Statistics by log level, top errors, pattern detection
- Multiple output formats: Text (with tables), JSON, CSV
- Performance optimization: Parallel processing for large files
- Production quality: Error handling, exit codes, verbose mode
Example usage:

```sh
# Basic analysis
./loglyzer application.log

# Show only errors in JSON format
./loglyzer --errors-only --format json app.log

# Get top 10 most common errors
./loglyzer --top 10 --errors-only app.log

# Search for specific pattern
./loglyzer --search "database" --format csv app.log > db_errors.csv

# Verbose mode with timing
./loglyzer --verbose large.log

# Help
./loglyzer --help
```

Example output for the `sample.log` above:

```text
 Log Analysis Results
========================

Total entries: 10

Breakdown by level:
+----------+-------+------------+
| Level    | Count | Percentage |
+----------+-------+------------+
| Info     | 5     | 50.0%      |
| Debug    | 1     | 10.0%      |
| Warning  | 2     | 20.0%      |
| Error    | 2     | 20.0%      |
+----------+-------+------------+

Top errors:
+--------------------------------------+-------------+
| Error Message                        | Occurrences |
+--------------------------------------+-------------+
| Failed to connect to API: timeout    | 1           |
| Database query failed: syntax error  | 1           |
+--------------------------------------+-------------+
```
Ideas for extending the tool:

- Real-time monitoring: Use a `--follow` flag (like `tail -f`) to watch logs in real-time
- Advanced patterns: Extract IP addresses, URLs, and error codes using regex groups
- Time-based filtering: `--since "2024-01-15 10:00"` and `--until` flags
- Multiple file support: Accept glob patterns (e.g., `*.log`)
- Web interface: Add a `--serve` flag to launch a web dashboard showing results
- Export formats: Add HTML output with charts
- Configuration file: Support `.loglyzer.toml` for default settings
- Plugin system: Allow custom log format parsers
Key concepts covered:

- CLI Design: Good UX, helpful errors, comprehensive help text
- File I/O: Efficient reading with `BufReader`, line-by-line processing
- Pattern Matching: Regex for structured data extraction
- Data Structures: Choosing the right collections (`HashMap`, `Vec`)
- Serialization: `serde` for multiple output formats
- Performance: When and how to parallelize with `rayon`
- Error Handling: Result types, proper error messages, exit codes