
LLM Data Processer

A Python library for seamlessly working with Large Language Models (LLMs) from multiple providers. Easily integrate Hugging Face models, Google Gemini, OpenAI, and more with a unified interface. Perfect for data analysis, chat applications, and AI-powered workflows.

Python 3.8+ | MIT License | Documentation

πŸ“– Documentation | πŸš€ Quick Start | πŸ“š Examples

✨ Features

  • πŸ€– Multi-Provider Support: Hugging Face Inference API, Google Gemini 2.5, OpenAI (extensible)
  • πŸ’¬ Interactive Chat Widget: Built-in Jupyter notebook UI for chat interactions
  • πŸ“Š Data Integration: Attach pandas DataFrames and query your data with LLMs
  • πŸ“„ PDF Processing: Built-in utility to extract and analyze PDF documents
  • πŸ” Structured Information Extraction: InfoExtractor class for extracting structured data from text using custom schemas with retry logic
  • πŸ“ Guideline System: Add custom guidelines to steer model behavior
  • 🎨 History Management: Automatic conversation history tracking
  • πŸ”§ Easy Configuration: Simple initialization with sensible defaults
  • πŸ“¦ Pip Installable: Install as a package or use directly

πŸ“‹ Table of Contents

  • ✨ Features
  • πŸš€ Installation
  • βš™οΈ Configuration
  • 🎯 Quick Start
  • πŸ“š Usage Examples
  • πŸ“– API Documentation
  • πŸ› οΈ Advanced Configuration
  • πŸ“‚ Project Structure
  • πŸ› Troubleshooting
  • 🀝 Contributing
  • πŸ“ License
  • πŸ—ΊοΈ Roadmap

πŸš€ Installation

Option 1: Install from Source (Recommended for Development)

  1. Clone the repository:

    git clone https://github.com/BartonChenTW/LLM-data-processer.git
    cd LLM-data-processer
  2. Create and activate a virtual environment:

    Windows PowerShell:

    python -m venv .venv
    .\.venv\Scripts\Activate.ps1

    Windows CMD:

    python -m venv .venv
    .venv\Scripts\activate.bat

    Linux/Mac:

    python3 -m venv .venv
    source .venv/bin/activate
  3. Install dependencies:

    pip install --upgrade pip
    pip install -r requirements.txt
  4. Install PyTorch (for local transformers models):

    # CPU only (lightweight)
    pip install torch --index-url https://download.pytorch.org/whl/cpu
    
    # OR with CUDA GPU support
    pip install torch

Option 2: Install as Package

pip install -e .

βš™οΈ Configuration

Set up API Keys

Create a .env file or set environment variables:

# For Hugging Face models
export HF_TOKEN="your_huggingface_token_here"

# For Google Gemini
export GEMINI_API_KEY="your_gemini_api_key_here"

# For OpenAI (if using)
export OPENAI_API_KEY="your_openai_api_key_here"
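Before initializing a helper, it can be worth confirming that the keys are actually visible to Python. A minimal stdlib check (the `missing_keys` helper is illustrative, not part of the library):

```python
import os

def missing_keys(env, required=("HF_TOKEN", "GEMINI_API_KEY")):
    """Return the names of required API keys absent from the environment mapping."""
    return [name for name in required if not env.get(name)]

# Check the real environment; an empty list means everything is set.
print(missing_keys(os.environ))
```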

Get API Keys:

  • Hugging Face: https://huggingface.co/settings/tokens
  • Google Gemini: https://aistudio.google.com/apikey
  • OpenAI: https://platform.openai.com/api-keys

🎯 Quick Start

Basic Usage with Hugging Face

from llm_helper import AIHelper

# Initialize with Llama or Mistral
ai = AIHelper(model_name='Llama-3.1')

# Simple question
response = ai.ask("What is machine learning?")
print(response)

Using Google Gemini

from llm_helper.ai_helper import AIHelper_Google

# Initialize Gemini with Google Search
ai = AIHelper_Google()

# Ask with web grounding
response = ai.ask("What are the latest AI trends in 2025?")
print(response)

Interactive Chat in Jupyter

from llm_helper import AIHelper

ai = AIHelper(model_name='Llama-3.1')

# Launch interactive widget
ai.chat_widget()

πŸ“š Usage Examples

Example 1: Data Analysis with DataFrames

import pandas as pd
from llm_helper import AIHelper

# Create AI helper
ai = AIHelper(model_name='Llama-3.1')

# Load your data
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000]
})

# Attach data to AI context
ai.attach_data(df)

# Query your data
ai.ask("Who has the highest salary?")
ai.ask("What is the average age?")

Example 2: Custom Guidelines

ai = AIHelper(model_name='Llama-3.1')

# Add behavior guidelines
ai.add_guideline("Always respond in bullet points")
ai.add_guideline("Keep responses under 100 words")
ai.add_guideline("Focus on practical actionable advice")

# Ask with guidelines applied
ai.ask("How do I learn Python?", with_guideline=True)
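Under the hood, guidelines are presumably prepended to the prompt as an instruction block. A toy sketch of that composition (an assumption about the mechanism, not the library's exact prompt format):

```python
def compose_prompt(prompt, guidelines):
    """Prepend guidelines to the user prompt as a simple instruction block."""
    if not guidelines:
        return prompt
    rules = "\n".join(f"- {g}" for g in guidelines)
    return f"Follow these guidelines:\n{rules}\n\nUser question: {prompt}"

print(compose_prompt("How do I learn Python?", ["Always respond in bullet points"]))
```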

Example 3: Conversation History

ai = AIHelper(model_name='Llama-3.1')

# Have a conversation
ai.ask("What is Python?")
ai.ask("What are its main uses?")  # Builds on previous context
ai.ask("Compare it to JavaScript")  # Maintains conversation flow

# View history
print(ai.chat_history)

Example 4: Structured Information Extraction

from llm_helper import InfoExtractor

# Initialize extractor with Gemini
extractor = InfoExtractor(api_provider='google', model='gemini-2.5-flash')

# Define data schema
schema = {
    'tech_type': 'StorageTechnology',
    'fields': {
        'name': {'field_type': 'str', 'description': 'Technology name'},
        'description': {'field_type': 'str', 'description': 'Brief description'},
        'advantages': {'field_type': 'List[str]', 'description': 'Key advantages'},
        'use_cases': {'field_type': 'List[str]', 'description': 'Common use cases'}
    }
}

# Set up extraction
extractor.load_data_schema(schema)
extractor.load_prompt_templates(base_prompts, fix_prompts)
extractor.load_info_source("PostgreSQL", info_text)

# Extract structured data with auto-retry on parsing errors
result = extractor.extract_tech_info(max_retries=3)
print(result.name, result.description)
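The auto-retry behavior can be pictured as a loop that feeds the parse error back into the next prompt. A simplified, self-contained sketch (hypothetical helper, not the library's actual implementation):

```python
import json

def extract_with_retry(call_model, parse=json.loads, max_retries=3):
    """Call an LLM, re-prompting with the parse error when the output is malformed."""
    hint = ""
    last_error = None
    for _ in range(max_retries):
        raw = call_model(hint)
        try:
            return parse(raw)
        except (ValueError, KeyError) as exc:  # json.JSONDecodeError subclasses ValueError
            last_error = exc
            hint = f"Previous output failed to parse: {exc}. Return valid JSON only."
    raise ValueError(f"giving up after {max_retries} attempts: {last_error}")

# Fake model that returns broken JSON once, then a valid document.
outputs = iter(['{"name": broken', '{"name": "PostgreSQL"}'])
result = extract_with_retry(lambda hint: next(outputs))
print(result)  # {'name': 'PostgreSQL'}
```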

πŸ“– API Documentation

AIHelper (Hugging Face)

AIHelper(
    model_name: str = 'Mistral-7B',
    display_response: bool = True
)

Available Models:

  • 'Llama-3.1' - Meta Llama 3.1 8B Instruct
  • 'Mistral-7B' - Mistral 7B Instruct v0.2

Methods:

ask()

ask(
    prompt: str,
    display_response: bool = None,
    with_guideline: bool = True,
    with_data: bool = True,
    with_history: bool = True
) -> str

Generate a response from the LLM.

Parameters:

  • prompt: Your question or instruction
  • display_response: Whether to display output (None falls back to the instance-level setting)
  • with_guideline: Include custom guidelines in context
  • with_data: Include attached data in context
  • with_history: Include conversation history

add_guideline()

add_guideline(guideline: str)

Add a custom guideline to influence model behavior.

attach_data()

attach_data(data: pd.DataFrame)

Attach a pandas DataFrame to the AI context for querying.
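One plausible way a DataFrame can be folded into the prompt is by serializing a preview as plain text. An illustrative sketch of that idea (assumed behavior; the library's actual serialization may differ):

```python
import pandas as pd

def dataframe_context(df, max_rows=20):
    """Render a DataFrame preview as plain text for inclusion in an LLM prompt."""
    preview = df.head(max_rows).to_string(index=False)
    return f"Attached data ({len(df)} rows, columns: {list(df.columns)}):\n{preview}"

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Salary": [50000, 60000]})
print(dataframe_context(df))
```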

chat_widget()

chat_widget()

Launch an interactive chat interface in Jupyter notebooks.

AIHelper_Google (Google Gemini)

AIHelper_Google(
    model: str = 'gemini-2.5-flash',
    display_response: bool = True
)

Methods:

ask()

ask(
    prompt: str,
    display_response: bool = None
) -> str

Generate a response using Google Gemini with Google Search grounding.

πŸ› οΈ Advanced Configuration

Custom Temperature & Max Tokens

Edit llm_helper/ai_helper.py:

config = {
    'max_tokens': 2000,    # Adjust response length
    'temperature': 0.7,     # 0.0 = deterministic, 1.0 = creative
}

Add New Models

llm_models = {
    'Llama-3.1': 'meta-llama/Llama-3.1-8B-Instruct',
    'Mistral-7B': 'mistralai/Mistral-7B-Instruct-v0.2',
    'YourModel': 'your-org/your-model-name'  # Add custom model
}
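When a short name is looked up in this mapping, a friendly error beats a bare KeyError. A small sketch of how resolution might work (the `resolve_model` helper is hypothetical, not part of the library):

```python
llm_models = {
    'Llama-3.1': 'meta-llama/Llama-3.1-8B-Instruct',
    'Mistral-7B': 'mistralai/Mistral-7B-Instruct-v0.2',
}

def resolve_model(name, models=llm_models):
    """Map a short model alias to its Hugging Face repo id, with a clear error."""
    try:
        return models[name]
    except KeyError:
        raise ValueError(f"Unknown model {name!r}; available: {sorted(models)}") from None

print(resolve_model('Llama-3.1'))  # meta-llama/Llama-3.1-8B-Instruct
```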

πŸ“‚ Project Structure

LLM-data-processer/
β”œβ”€β”€ llm_helper/
β”‚   β”œβ”€β”€ __init__.py          # Package initialization
β”‚   └── ai_helper.py         # Core AI helper classes
β”œβ”€β”€ notebook/
β”‚   └── llm_chat.ipynb       # Example chat notebook
β”œβ”€β”€ examples/
β”‚   β”œβ”€β”€ basic_usage.py       # Simple examples
β”‚   β”œβ”€β”€ data_analysis.py     # DataFrame integration
β”‚   └── custom_guidelines.py # Guideline examples
β”œβ”€β”€ .env.example             # API key template
β”œβ”€β”€ .gitignore               # Git ignore rules
β”œβ”€β”€ requirements.txt         # Python dependencies
β”œβ”€β”€ setup.py                 # Package installation
β”œβ”€β”€ CHANGELOG.md             # Version history
β”œβ”€β”€ CONTRIBUTING.md          # Contribution guide
β”œβ”€β”€ LICENSE                  # MIT License
└── README.md                # This file

πŸ› Troubleshooting

ModuleNotFoundError: transformers

pip install transformers torch

PyTorch/TensorFlow Warning

Install PyTorch for local model support:

pip install torch --index-url https://download.pytorch.org/whl/cpu

API Authentication Errors

Ensure your API keys are set:

Linux/Mac:

echo $HF_TOKEN        # Should show your token
echo $GEMINI_API_KEY  # Should show your key

Windows PowerShell:

echo $env:HF_TOKEN
echo $env:GEMINI_API_KEY

Notebook Kernel Issues

  1. Select the correct kernel in VS Code (.venv interpreter)
  2. Restart the kernel: Kernel β†’ Restart
  3. Re-run imports

🀝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Hugging Face for the Inference API
  • Google for Gemini API
  • The open-source AI community

πŸ“§ Contact

πŸ—ΊοΈ Roadmap

  • Add streaming response support
  • Support for more LLM providers (Anthropic, Cohere)
  • Enhanced data visualization
  • Model fine-tuning utilities
  • Export conversation history
  • Multi-language support

Star ⭐ this repository if you find it helpful!
