# LLM-data-processer

A Python library for working seamlessly with Large Language Models (LLMs) from multiple providers. Easily integrate Hugging Face models, Google Gemini, OpenAI, and more through a unified interface. Perfect for data analysis, chat applications, and AI-powered workflows.

Documentation | Quick Start | Examples
## Features

- Multi-Provider Support: Hugging Face Inference API, Google Gemini 2.5, OpenAI (extensible)
- Interactive Chat Widget: Built-in Jupyter notebook UI for chat interactions
- Data Integration: Attach pandas DataFrames and query your data with LLMs
- PDF Processing: Built-in utility to extract and analyze PDF documents
- Structured Information Extraction: `InfoExtractor` class for extracting structured data from text using custom schemas, with retry logic
- Guideline System: Add custom guidelines to steer model behavior
- History Management: Automatic conversation-history tracking
- Easy Configuration: Simple initialization with sensible defaults
- Pip Installable: Install as a package or use directly
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/BartonChenTW/LLM-data-processer.git
   cd LLM-data-processer
   ```

2. Create and activate a virtual environment:

   Windows PowerShell:

   ```powershell
   python -m venv .venv
   .\.venv\Scripts\Activate.ps1
   ```

   Windows CMD:

   ```bat
   python -m venv .venv
   .venv\Scripts\activate.bat
   ```

   Linux/Mac:

   ```bash
   python3 -m venv .venv
   source .venv/bin/activate
   ```

3. Install dependencies:

   ```bash
   pip install --upgrade pip
   pip install -r requirements.txt
   ```

4. Install PyTorch (for local transformers models):

   ```bash
   # CPU only (lightweight)
   pip install torch --index-url https://download.pytorch.org/whl/cpu

   # OR with CUDA GPU support
   pip install torch
   ```
Optionally, install the package in editable mode:

```bash
pip install -e .
```

## API Keys

Create a `.env` file or set environment variables:

```bash
# For Hugging Face models
export HF_TOKEN="your_huggingface_token_here"

# For Google Gemini
export GEMINI_API_KEY="your_gemini_api_key_here"

# For OpenAI (if using)
export OPENAI_API_KEY="your_openai_api_key_here"
```

Get API keys:
- Hugging Face: https://huggingface.co/settings/tokens
- Google Gemini: https://aistudio.google.com/app/apikey
- OpenAI: https://platform.openai.com/api-keys
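To fail fast when a key is missing, you can check the environment before constructing a helper. The `missing_keys` function below is an illustrative snippet, not part of this library:

```python
import os

def missing_keys(required=("HF_TOKEN",)):
    """Return the names of any required API-key variables that are unset."""
    return [name for name in required if not os.getenv(name)]

# Warn early instead of failing inside a provider call
missing = missing_keys(("HF_TOKEN", "GEMINI_API_KEY"))
if missing:
    print(f"Set these environment variables before running: {missing}")
```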
## Quick Start

### Basic Usage (Hugging Face)

```python
from llm_helper import AIHelper

# Initialize with Llama or Mistral
ai = AIHelper(model_name='Llama-3.1')

# Simple question
response = ai.ask("What is machine learning?")
print(response)
```

### Google Gemini with Search Grounding

```python
from llm_helper.ai_helper import AIHelper_Google

# Initialize Gemini with Google Search
ai = AIHelper_Google()

# Ask with web grounding
response = ai.ask("What are the latest AI trends in 2025?")
print(response)
```

### Interactive Chat Widget

```python
from llm_helper import AIHelper

ai = AIHelper(model_name='Llama-3.1')

# Launch interactive widget
ai.chat_widget()
```

### Data Analysis with DataFrames

```python
import pandas as pd
from llm_helper import AIHelper

# Create AI helper
ai = AIHelper(model_name='Llama-3.1')

# Load your data
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000]
})

# Attach data to AI context
ai.attach_data(df)

# Query your data
ai.ask("Who has the highest salary?")
ai.ask("What is the average age?")
```

### Custom Guidelines

```python
ai = AIHelper(model_name='Llama-3.1')

# Add behavior guidelines
ai.add_guideline("Always respond in bullet points")
ai.add_guideline("Keep responses under 100 words")
ai.add_guideline("Focus on practical, actionable advice")

# Ask with guidelines applied
ai.ask("How do I learn Python?", with_guideline=True)
```

### Conversation History

```python
ai = AIHelper(model_name='Llama-3.1')

# Have a conversation
ai.ask("What is Python?")
ai.ask("What are its main uses?")   # Builds on previous context
ai.ask("Compare it to JavaScript")  # Maintains conversation flow

# View history
print(ai.chat_history)
```

### Structured Information Extraction

```python
from llm_helper import InfoExtractor

# Initialize extractor with Gemini
extractor = InfoExtractor(api_provider='google', model='gemini-2.5-flash')

# Define data schema
schema = {
    'tech_type': 'StorageTechnology',
    'fields': {
        'name': {'field_type': 'str', 'description': 'Technology name'},
        'description': {'field_type': 'str', 'description': 'Brief description'},
        'advantages': {'field_type': 'List[str]', 'description': 'Key advantages'},
        'use_cases': {'field_type': 'List[str]', 'description': 'Common use cases'}
    }
}

# Set up extraction
extractor.load_data_schema(schema)
extractor.load_prompt_templates(base_prompts, fix_prompts)
extractor.load_info_source("PostgreSQL", info_text)

# Extract structured data with auto-retry on parsing errors
result = extractor.extract_tech_info(max_retries=3)
print(result.name, result.description)
```

## API Reference

### AIHelper

```python
AIHelper(
    model_name: str = 'Mistral-7B',
    display_response: bool = True
)
```

Available models:

- `'Llama-3.1'` - Meta Llama 3.1 8B Instruct
- `'Mistral-7B'` - Mistral 7B Instruct v0.2

Methods:

```python
ask(
    prompt: str,
    display_response: bool = None,
    with_guideline: bool = True,
    with_data: bool = True,
    with_history: bool = True
) -> str
```

Generate a response from the LLM.

Parameters:

- `prompt`: Your question or instruction
- `display_response`: Whether to display output (default: `True`)
- `with_guideline`: Include custom guidelines in context
- `with_data`: Include attached data in context
- `with_history`: Include conversation history

```python
add_guideline(guideline: str)
```

Add a custom guideline to influence model behavior.

```python
attach_data(data: pd.DataFrame)
```

Attach a pandas DataFrame to the AI context for querying.

```python
chat_widget()
```

Launch an interactive chat interface in Jupyter notebooks.

### AIHelper_Google

```python
AIHelper_Google(
    model: str = 'gemini-2.5-flash',
    display_response: bool = True
)
```

Methods:

```python
ask(
    prompt: str,
    display_response: bool = None
) -> str
```

Generate a response using Google Gemini with Google Search grounding.
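The `with_guideline`, `with_data`, and `with_history` flags all feed into the prompt the model finally sees. The sketch below is a hypothetical, standalone illustration of that kind of composition; the real assembly lives in `llm_helper/ai_helper.py` and may differ:

```python
# Hypothetical sketch of how ask()'s context flags could combine into one
# prompt string. Not the library's actual implementation.
def build_prompt(prompt, guidelines=(), data_csv=None, history=(),
                 with_guideline=True, with_data=True, with_history=True):
    parts = []
    if with_guideline and guidelines:
        parts.append("Guidelines:\n" + "\n".join(f"- {g}" for g in guidelines))
    if with_data and data_csv:
        parts.append("Data (CSV):\n" + data_csv)
    if with_history and history:
        parts.append("Conversation so far:\n" +
                     "\n".join(f"{role}: {text}" for role, text in history))
    parts.append(f"User: {prompt}")
    return "\n\n".join(parts)

print(build_prompt("Who earns most?",
                   guidelines=["Answer in one sentence"],
                   data_csv="Name,Salary\nAlice,50000\nBob,60000"))
```

Setting a flag to `False` simply drops that section from the assembled prompt, which is why `ask(..., with_history=False)` behaves like a fresh, context-free question.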
## Configuration

Edit `llm_helper/ai_helper.py`:

```python
config = {
    'max_tokens': 2000,   # Adjust response length
    'temperature': 0.7,   # 0.0 = deterministic, 1.0 = creative
}

llm_models = {
    'Llama-3.1': 'meta-llama/Llama-3.1-8B-Instruct',
    'Mistral-7B': 'mistralai/Mistral-7B-Instruct-v0.2',
    'YourModel': 'your-org/your-model-name'  # Add a custom model
}
```

## Project Structure

```
LLM-data-processer/
├── llm_helper/
│   ├── __init__.py           # Package initialization
│   └── ai_helper.py          # Core AI helper classes
├── notebook/
│   └── llm_chat.ipynb        # Example chat notebook
├── examples/
│   ├── basic_usage.py        # Simple examples
│   ├── data_analysis.py      # DataFrame integration
│   └── custom_guidelines.py  # Guideline examples
├── .env.example              # API key template
├── .gitignore                # Git ignore rules
├── requirements.txt          # Python dependencies
├── setup.py                  # Package installation
├── CHANGELOG.md              # Version history
├── CONTRIBUTING.md           # Contribution guide
├── LICENSE                   # MIT License
└── README.md                 # This file
```
## Troubleshooting

If packages are missing:

```bash
pip install transformers torch
```

Install PyTorch for local model support:

```bash
pip install torch --index-url https://download.pytorch.org/whl/cpu
```

Ensure your API keys are set:

```bash
echo $HF_TOKEN        # Should show your token
echo $GEMINI_API_KEY  # Should show your key
```

If imports fail in Jupyter:

- Select the correct kernel in VS Code (the `.venv` interpreter)
- Restart the kernel: Kernel → Restart
- Re-run imports
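When imports still fail in a notebook, the kernel is usually not using the `.venv` interpreter. A quick, library-agnostic way to see what the active kernel can actually find, using only the standard library:

```python
import importlib.util

def check_import(module_name):
    """Return True if the active interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None

# Modules this project relies on; adjust the list as needed
for mod in ("transformers", "torch", "llm_helper"):
    status = "ok" if check_import(mod) else "MISSING - wrong kernel or venv?"
    print(f"{mod}: {status}")
```

If a module shows as missing here but `pip list` in your terminal shows it installed, the notebook and terminal are pointing at different Python environments.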
## Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- Hugging Face for the Inference API
- Google for the Gemini API
- The open-source AI community

## Contact

- GitHub: @BartonChenTW
- Issues: GitHub Issues
## Roadmap

- Add streaming response support
- Support for more LLM providers (Anthropic, Cohere)
- Enhanced data visualization
- Model fine-tuning utilities
- Export conversation history
- Multi-language support
Star ⭐ this repository if you find it helpful!