Demo Video

Agenerative

Agenerative is a RAG-based chatbot capable of code generation and execution, with inference optimized by TensorRT-LLM. It exposes a REST API compatible with the OpenAI API, so it integrates seamlessly with AutoGen using any local LLM. Retrieval-augmented generation (RAG) is handled by the llama_index and Facebook AI Similarity Search (FAISS) libraries. The RAG component runs inside a Flask app that AutoGen agents can query dynamically for rich context from vectorized documents, and agents can combine user-defined Python tools with the RAG tool to access external resources. AutoGen Studio provides a no-code UI for easy setup and access. Because both RAG and inference run locally, Agenerative has no API costs or rate limits and keeps your data private, offering a secure way to chat with your data.
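To illustrate how AutoGen connects to a local OpenAI-compatible endpoint, a `config_list` entry can point at the local server instead of the OpenAI cloud. This is a minimal sketch; the host, port, and model name below are placeholder assumptions, not values taken from this repository:

```python
# Build an OpenAI-style llm_config for AutoGen that targets a local
# TensorRT-LLM server. The URL and model name are hypothetical; use
# whatever your local REST API actually serves.
def local_llm_config(model="mistral-7b-int4",
                     base_url="http://localhost:5000/v1"):
    return {
        "config_list": [
            {
                "model": model,
                "base_url": base_url,  # local OpenAI-compatible endpoint
                "api_key": "NULL",     # no real key needed for a local server
            }
        ],
        "temperature": 0.2,
    }

cfg = local_llm_config()
```

With AutoGen installed, a dict like this could be passed as `llm_config` when constructing an agent such as `AssistantAgent`.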

Installation:

Building TRT Engine

Follow these steps to build your TRT engine:

Download models and quantized weights

Mistral-7B Chat Int4

In the TensorRT-LLM repository:

  • For the Mistral engine, navigate to the examples\llama directory and run the following script:

```
python build.py --model_dir <path to mistral model directory> --quant_ckpt_path <path to mistral_tp1_rank0.npz file> --dtype float16 --use_gpt_attention_plugin float16 --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4_awq --per_group --enable_context_fmha --max_batch_size 1 --max_input_len 3500 --max_output_len 1024 --output_dir <TRT engine folder>
```

How to Run:

  • Add any documents (.pdf, .txt, etc.) you want available as context to the dataset folder in trt-llm-rag-windows for vectorization.
  • In the terminal, run .\run.bat and input the name of your conda installation directory.
  • Generated embeddings are stored in the storage-default directory. To add new documents, delete this folder and restart the application so the embeddings are regenerated.
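The steps above can be sketched as a small check run before starting the app: reuse the persisted embeddings if storage-default exists, otherwise rebuild. This is an illustrative helper, not code from this repository; the directory name mirrors the one mentioned above:

```python
import os
import shutil

def prepare_embeddings(storage_dir="storage-default", force_rebuild=False):
    """Decide whether the vector index must be (re)built.

    Returns True if embeddings should be generated from the dataset
    folder, or False if the persisted index in storage_dir can be
    loaded as-is. Passing force_rebuild=True deletes the stored
    embeddings first (the same effect as deleting storage-default
    by hand before restarting the application).
    """
    if force_rebuild and os.path.isdir(storage_dir):
        shutil.rmtree(storage_dir)
    return not os.path.isdir(storage_dir)
```

In the real app, a True result would trigger vectorizing the dataset folder and persisting the new index back into storage-default.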

About

RAG chatbot connected to AutoGen with TensorRT-LLM-optimized inference.
