Retrieval-augmented generation (RAG) enables LLMs to better answer questions by utilizing external documents. Most RAG tools however split documents linearly by tokens or lines, thus ignoring real-world structure. This causes hallucinations, context loss, and token waste at retrieval.
POMA AI solves this by preserving the structural tree of your documents when chunking them, enabling context-preserving retrieval, so every answer comes with context, not confusion.
Use POMA AI's structural chunking inside your RAG pipeline; integrate it with LlamaIndex, LangChain, Haystack, Weaviate, Pinecone, etc. POMA AI works with both vector search and keyword/fulltext search backends.
- Structure-preserving chunking (headings, lists, articles, etc.)
- LLM-friendly data extraction for precise retrieval
- Up to 90% token savings in prompt context for structured docs
- Plug-in to any RAG pipeline
- Supported input types: .pdf, .md, .html, .txt, and many more:
['ai', 'bmp', 'csv', 'djvu', 'doc', 'docx', 'dotx', 'dwf', 'dwfx', 'dwg', 'dxf', 'eps', 'epub', 'gif', 'heic', 'heif', 'htm', 'html', 'ico', 'jpeg', 'jpg', 'key', 'md', 'mdi', 'mobi', 'numbers', 'odc', 'odf', 'odp', 'ods', 'odt', 'oxps', 'pages', 'pdf', 'png', 'pot', 'potx', 'pps', 'ppsx', 'ppt', 'pptx', 'prn', 'ps', 'psd', 'pub', 'rtf', 'svg', 'tif', 'tiff', 'txt', 'vsd', 'vsdx', 'webp', 'xls', 'xlsb', 'xlsx', 'xltx', 'xml', 'xps']
- Installation
- Example Integrations
- Why POMA AI?
- How POMA AI Works
- Real-World Performance Example
- FAQ
- Licensing
pip install poma
Important
Requires Python 3.10+
Requires POMA_API_KEY as an environment variable (sign up for free and get it here).
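A minimal sketch for checking the required environment variables before running anything. The helper name `missing_keys` is hypothetical and not part of the poma package; it only inspects the environment.

```python
import os
from typing import Mapping


def missing_keys(required: list[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


# POMA_API_KEY is needed for chunking; add OPENAI_API_KEY to the list
# when running the integration examples that use OpenAI embeddings.
missing = missing_keys(["POMA_API_KEY"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
```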
The poma client then offers three methods:
- Use start_chunk_file() to start the chunking process.
- With get_chunk_result() you can download the result after it has finished processing.
- And create_cheatsheets() is used at retrieval time (locally, no API).
See How POMA AI Works for more details on the workflow.
Warning
Please do NOT send any sensitive and/or personal information to POMA AI endpoints without a signed contract & DPA!
We provide three example implementations to help you get started with POMA AI:
- Standalone implementation (basic POMA AI workflow with simple keyword-based retrieval)
- Integration with LangChain
- Integration with LlamaIndex
| Module | What it does | Install | License | Link |
|---|---|---|---|---|
| poma (core) | Build depth-aware chunks & chunksets | pip install poma | MPL-2.0 | pypi |
| LangChain | Drop-in classes for LangChain | poma[langchain] | MPL-2.0 | github |
| LlamaIndex | Drop-in classes for LlamaIndex | poma[llamaindex] | MPL-2.0 | github |
| Qdrant | Qdrant vector store support | poma[qdrant] | MPL-2.0 | github |
| All integrations | LangChain + LlamaIndex + Qdrant + examples | poma[all] | MPL-2.0 | |
Install only what you need:
pip install 'poma[langchain]' # LangChain integration
pip install 'poma[llamaindex]' # LlamaIndex integration
pip install 'poma[qdrant]' # Qdrant vector store
pip install 'poma[all]' # All of the above (e.g. to run all examples)
Note
The integration examples use OpenAI embeddings. Make sure to set your OPENAI_API_KEY as an environment variable.
Standalone Implementation - example.py
A complete, self-contained implementation that demonstrates the POMA AI workflow. It uses a keyword-based approach for simplicity, avoiding the need to set up a vector database.
Use your CLI with these ingest and retrieve commands (from inside the examples/poma directory):
cd examples/poma
Ingest a document to create structured chunks and chunksets, which are stored locally.
python example.py ingest ../Coffee.txt
Retrieve with a query to find relevant information; returns one cheatsheet per "affected" document.
python example.py retrieve "finland"
Swap the simple keyword search with your vector/full-text DB, and you have a minimal RAG loop. See examples/langchain/example_langchain.py and examples/llamaindex/example_llamaindex.py for full integrations.
Note
In POMA AI, the units you embed are chunksets — structure-preserving contexts, NOT isolated chunks.
LangChain Integration - example_langchain.py
Integrate POMA AI with LangChain’s retrieval and QA components.
- Uses PomaFileLoader, PomaChunksetSplitter, PomaCheatsheetRetrieverLC from poma.integrations.langchain and POMA AI's API to chunk text.
- Stores chunks and chunksets in LangChain's Document metadata for later retrieval.
- FAISS vector search with OpenAI embeddings. Make sure to set your OPENAI_API_KEY as an environment variable.
- QA chain using LangChain's LCEL
- Custom cheatsheet retriever for context-aware retrieval
LlamaIndex Integration - example_llamaindex.py
Use POMA AI with LlamaIndex’s document processing and query engine.
- Uses PomaFileReader, PomaChunksetNodeParser, PomaCheatsheetRetrieverLI from poma.integrations.llamaindex and POMA AI's API to chunk text.
- Stores chunks and chunksets in LlamaIndex Node metadata for later retrieval.
- VectorStoreIndex (implemented with FAISS) and OpenAI embeddings. Make sure to set your OPENAI_API_KEY as an environment variable.
- Uses LlamaIndex as_query_engine on top of the retriever
- Custom cheatsheet retriever for context-aware retrieval
Retrieval-augmented generation (RAG) enables LLMs to better answer questions by utilizing external documents. But if you feed LLMs linear, structureless chunks you get:
- Orphaned headings (a title with no details)
- Fragmented lists (missing key info)
- Chapter–Article Disconnection (context lost)
- Bloated prompts (wasted tokens)
- Hallucinated or incomplete answers
Linear chunking splits docs by tokens or lines — ignoring real-world structure. Tools like LlamaIndex default to this, but linear chunking fails for anything hierarchical: laws, manuals, policies, contracts, technical docs.
- Isolated Headings → incomplete information
  Chunk A (retrieved): “Article 26. Personalized License Plate Fees”
  Chunk B (missing): “The fees vary by character count and composition.”
  Impact: Incomplete answers, confusion about fees, potential legal/financial misunderstandings.
- Fragmented Lists → partial information
  Chunk A (retrieved): “a) 2 letters and 3 digits: 300 euros; b) 3 letters and 2 digits: 500 euros; c) 4 letters and 1 digit: 1,000 euros; d) 5 letters: 3,000 euros;”
  Chunk B (missing): “e) Less than 5 characters: 6,000 euros”
  Impact: Missing premium fees; compliance failures or financial errors.
- Chapter–Article Disconnection → ambiguity and misattribution
  Chunk A (retrieved): “Chapter 5. Reservation Fee for Personalized License Plates”
  Chunk B (missing): “Article 21. Tax Quota … fixed amount of 40.74 euros.”
  Impact: Misattribution across chapters; incorrect legal interpretations.
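The failure modes above can be reproduced with a few lines of Python. This is a minimal sketch of a naive line-based chunker, not the behavior of any specific library; the sample text is stitched together from the fragments quoted above and may not match the layout of the real law.

```python
def split_by_lines(text: str, lines_per_chunk: int) -> list[str]:
    """Naive linear chunker: group every N lines, ignoring structure."""
    lines = text.splitlines()
    return [
        "\n".join(lines[i : i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]


def keyword_search(chunks: list[str], query: str) -> list[str]:
    """Return every chunk that contains the query (case-insensitive)."""
    return [c for c in chunks if query.lower() in c.lower()]


document = "\n".join([
    "Article 26. Personalized License Plate Fees",
    "The fees vary by character count and composition.",
    "a) 2 letters and 3 digits: 300 euros;",
    "b) 3 letters and 2 digits: 500 euros;",
    "c) 4 letters and 1 digit: 1,000 euros;",
    "d) 5 letters: 3,000 euros;",
    "e) Less than 5 characters: 6,000 euros",
])

chunks = split_by_lines(document, lines_per_chunk=3)

# The heading chunk carries only the first list item ...
heading_hits = keyword_search(chunks, "Article 26")
# ... and the premium fee ends up in a chunk without its heading.
premium_hits = keyword_search(chunks, "6,000")
```

Neither hit is usable on its own: the heading chunk lacks most fees, and the premium-fee chunk no longer says which article it belongs to.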
Avoiding chunking in the middle of sentences is a no-brainer, but how do you deal with really long (for example legal) paragraphs that are longer than your chunk limit?
Including neighboring chunks seems to be the method of choice for most chunkers, but limit/target based chunking with overlap
→ doubles the information that needs embedding
→ bloats prompts with irrelevant information
→ consumes valuable token context
and still misses structural boundaries.
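To make the cost of overlap concrete, here is a minimal sketch of a sliding-window chunker over a token list. The window size and overlap are illustrative values, not defaults of any particular tool.

```python
def sliding_window(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split tokens into fixed-size windows that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start : start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end
    return windows


tokens = [f"t{i}" for i in range(1000)]
windows = sliding_window(tokens, size=200, overlap=100)

embedded_tokens = sum(len(w) for w in windows)
print(len(windows), embedded_tokens, embedded_tokens / len(tokens))
# prints: 9 1800 1.8
```

With 50% overlap, a 1,000-token document turns into 1,800 tokens to embed, and the window boundaries still fall wherever the counter says, not where the document's structure does.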
Other proposed solutions use auto-summarization or guessed relations, relying on heuristics rather than true document structure, to create additional "context" information for chunks, thus
→ losing accuracy (through abstractions)
→ and risking hallucinations.
Rather than slicing blindly or extracting structure from messy documents using brittle heuristics, POMA AI re-generates documents by using powerful generative intelligence and creating structural coherence inside these documents.
+----------------+ \ +----------------+
| (unstructured) | ----\ | POMA SDK |
| documents | ----/ | (API client) |
+--------+-------+ / +--------+-------+
start_chunk_file()
+
get_chunk_result()
|
v
(chunks[], chunksets[])
| |
v v
Vector/Keyword <---- Index chunksets in your DB, also store chunks
…
search/retrieve ----> Retrieve relevant chunksets (context trees)
|
v
Get all chunk_IDs referenced
in the retrieved chunksets
+
Get content for these chunk_IDs
|
v
+----------------+
| POMA SDK |
| (local) |
+--------+-------+
create_cheatsheets()
|
v
Use cheatsheet(s) in LLM prompt
POMA AI converts documents into structurally aware chunks and lossless chunksets. Chunksets can then be embedded and later used to create cheatsheets, a compact representation of the retrieved information, optimized for LLM consumption. This approach ensures full structure preservation, enabling accurate retrieval and context assembly.
You can save the .poma-file, which is essentially a .zip archive containing all associated assets.
To enable this, set the download_dir and/or filename parameters when calling get_chunk_result (see examples).
Behavior:
- If both download_dir and filename are provided, the file is saved exactly as specified.
- If only download_dir is provided, the filename is generated automatically.
- If only filename is provided, the file is saved in the current working directory.
SDK:
json = client.start_chunk_file(src_path)
result = client.get_chunk_result(job_id_from_json)
chunks, chunksets = result["chunks"], result["chunksets"]
Input: your documents (supported types)
Process:
- Text is analyzed and structural relationships between sentences / text units are identified
- Each sentence / text unit is assigned a depth in the hierarchy
Output: Short, granular, context aware chunks with assigned depth
chunks[{'chunk_index': 0, 'content': 'some text', 'depth': 0}, ...]
We recommend storing the chunks separately in a relational database for faster and safer retrieval.
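One possible way to do this with Python's built-in sqlite3 module is sketched below. The table layout and the helper names `store_chunks` and `load_chunks` are assumptions for illustration; only the chunk fields (chunk_index, content, depth) come from the output shown above, and file_id is whatever identifier you use per document.

```python
import sqlite3


def store_chunks(conn: sqlite3.Connection, file_id: str, chunks: list[dict]) -> None:
    """Persist depth-annotated chunks, keyed by (file_id, chunk_index)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS chunks (
            file_id     TEXT    NOT NULL,
            chunk_index INTEGER NOT NULL,
            depth       INTEGER NOT NULL,
            content     TEXT    NOT NULL,
            PRIMARY KEY (file_id, chunk_index)
        )
        """
    )
    conn.executemany(
        "INSERT OR REPLACE INTO chunks (file_id, chunk_index, depth, content) "
        "VALUES (?, ?, ?, ?)",
        [(file_id, c["chunk_index"], c["depth"], c["content"]) for c in chunks],
    )
    conn.commit()


def load_chunks(conn: sqlite3.Connection, file_id: str, indices: list[int]) -> list[dict]:
    """Fetch the requested chunks of one document in document order."""
    if not indices:
        return []
    placeholders = ", ".join("?" for _ in indices)
    rows = conn.execute(
        f"SELECT chunk_index, content, depth FROM chunks "
        f"WHERE file_id = ? AND chunk_index IN ({placeholders}) "
        f"ORDER BY chunk_index",
        [file_id, *indices],
    ).fetchall()
    return [{"chunk_index": i, "content": c, "depth": d} for i, c, d in rows]


conn = sqlite3.connect(":memory:")  # use a file path for persistent storage
store_chunks(conn, "doc_1", [
    {"chunk_index": 0, "content": "some text", "depth": 0},
    {"chunk_index": 1, "content": "more text", "depth": 1},
    {"chunk_index": 2, "content": "even more text", "depth": 1},
])
print(load_chunks(conn, "doc_1", [2, 0]))
```

Keying on (file_id, chunk_index) keeps lookups at retrieval time to a single indexed query per document.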
Input: chunks with depth information
Process:
- Chunks are grouped into semantic units
- Complete root-to-leaf paths are created
- Parent–child relationships are preserved; full hierarchical context is maintained
Output: chunksets containing complete contextual paths
[{'chunkset_index': 0, 'chunks': [0, 1, 2, 3, 4], 'contents': 'combined chunk texts (to embed)'}, ...]
Embed these and store them for later retrieval.
First of all, a chunkset is a "set of chunks", a sequence of single sentences or chunks (usually one sentence is one chunk).
Secondly, a chunkset is a complete root-to-leaf path for every "leaf chunk" in a document, for example: title → chapter → section → clause, with the clause being the "leaf" and the title being the "root".
Thus chunksets preserve the complete hierarchical context for every chunk in a document - from document root to specific details. This ensures:
- Headings are never separated from their content
- Lists remain intact with all items
- Hierarchical relationships between sections are preserved
- Context is never lost during retrieval
So chunksets are also meaningful parts of text, enabling accurate retrieval and context assembly.
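To make the root-to-leaf idea concrete, the sketch below derives one path per leaf from a list of depth-annotated chunks. It is a simplified illustration, not POMA AI's actual algorithm: the real chunksets are produced by the API and may group several related leaves into one chunkset. The function name `root_to_leaf_paths` is hypothetical.

```python
def root_to_leaf_paths(chunks: list[dict]) -> list[list[int]]:
    """Return one list of chunk indices (root to leaf) per leaf chunk."""
    paths = []
    stack: list[tuple[int, int]] = []  # (depth, chunk_index) of current ancestors
    for pos, chunk in enumerate(chunks):
        depth = chunk["depth"]
        # Drop everything that is not an ancestor of the current chunk.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        stack.append((depth, chunk["chunk_index"]))
        # A chunk is a leaf if the next chunk is not deeper than it.
        next_depth = chunks[pos + 1]["depth"] if pos + 1 < len(chunks) else -1
        if next_depth <= depth:
            paths.append([index for _, index in stack])
    return paths


chunks = [
    {"chunk_index": 0, "content": "Title", "depth": 0},
    {"chunk_index": 1, "content": "Chapter 1", "depth": 1},
    {"chunk_index": 2, "content": "Section 1.1", "depth": 2},
    {"chunk_index": 3, "content": "Clause a", "depth": 3},
    {"chunk_index": 4, "content": "Clause b", "depth": 3},
    {"chunk_index": 5, "content": "Chapter 2", "depth": 1},
    {"chunk_index": 6, "content": "Clause c", "depth": 2},
]

paths = root_to_leaf_paths(chunks)
print(paths)
# prints: [[0, 1, 2, 3], [0, 1, 2, 4], [0, 5, 6]]
```

Every path starts at the title, so a retrieved clause always arrives together with the chapter and section it belongs to.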
Note
When comparing traditional chunks with POMA AI's chunking result, chunksets are the correct counterpart.
POMA AI's chunks are very short and used solely to make up the root-to-leaf paths we call chunksets.
Chunksets are the fundamental unit of storage and retrieval in POMA AI.
Use your vector or full-text search to retrieve query-relevant chunksets (these may come from different documents). Also collect all chunks referenced in the chunks field of those chunksets.
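A minimal sketch of this step, using a toy keyword score in place of a real vector or full-text backend. The helper names, the scoring, and the placeholder contents are assumptions for illustration; the chunkset fields follow the shapes shown in this document, with file_id identifying the source document.

```python
def retrieve_chunksets(chunksets: list[dict], query: str, top_k: int = 3) -> list[dict]:
    """Rank chunksets by how often the query terms occur in their contents."""
    terms = query.lower().split()
    scored = []
    for chunkset in chunksets:
        text = chunkset["contents"].lower()
        score = sum(text.count(term) for term in terms)
        if score > 0:
            scored.append((score, chunkset))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunkset for _, chunkset in scored[:top_k]]


def chunk_ids_by_file(chunksets: list[dict]) -> dict[str, list[int]]:
    """Collect the chunk IDs referenced by the chunksets, per document."""
    ids: dict[str, set[int]] = {}
    for chunkset in chunksets:
        ids.setdefault(chunkset["file_id"], set()).update(chunkset["chunks"])
    return {file_id: sorted(found) for file_id, found in ids.items()}


# Placeholder contents, not taken from a real document.
chunksets = [
    {"chunkset_index": 0, "file_id": "doc_1", "chunks": [0, 1, 2],
     "contents": "Coffee Consumption by country Finland"},
    {"chunkset_index": 1, "file_id": "doc_1", "chunks": [0, 1, 3],
     "contents": "Coffee Consumption by country Norway"},
    {"chunkset_index": 0, "file_id": "doc_2", "chunks": [0, 4],
     "contents": "Travel notes Finland Helsinki Finland"},
]

hits = retrieve_chunksets(chunksets, "finland")
needed = chunk_ids_by_file(hits)
print(needed)
```

Chunk IDs are grouped per document because chunk indices are only unique within one file.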
Input: relevant chunksets (complete root-to-leaf paths) and all necessary chunks (single sentences with depth information)
[{'chunkset_index': 0, 'chunks': [0, 1, 2, 3, 4], ...}, ...]
Process:
- Overlapping content is deduplicated while preserving structural relationships (per document)
- Parents, children, and adjacent chunks are added as needed to ensure structural continuity
- All information is formatted hierarchically
Output: The final LLM context information ready for efficient use in any RAG pipeline. We call them cheatsheets. Use them as the input for LlamaIndex, LangChain, or custom retrieval engines — in place of their default flat chunkers and retrievers.
If all chunksets necessary to answer a query originate from the same document, only a single cheatsheet is produced; otherwise you get as many cheatsheets as there are documents involved.
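The sketch below illustrates the idea behind cheatsheet assembly: merge the chunk IDs of all retrieved chunksets per document, drop duplicates, indent by depth, and mark gaps with an ellipsis. It is a simplified stand-in, not the SDK's implementation; in particular it does not add extra parents, children, or adjacent chunks. The function name `build_cheatsheets` and the input layout are assumptions.

```python
def build_cheatsheets(
    chunksets: list[dict],
    chunks_by_file: dict[str, dict[int, dict]],
) -> dict[str, str]:
    """Return one indented text block per document involved."""
    ids_by_file: dict[str, set[int]] = {}
    for chunkset in chunksets:
        ids_by_file.setdefault(chunkset["file_id"], set()).update(chunkset["chunks"])

    sheets = {}
    for file_id, ids in ids_by_file.items():
        lines = []
        previous = None
        for index in sorted(ids):
            if previous is not None and index != previous + 1:
                lines.append("[…]")  # chunks between the two were omitted
            chunk = chunks_by_file[file_id][index]
            lines.append("  " * chunk["depth"] + chunk["content"])
            previous = index
        sheets[file_id] = "\n".join(lines)
    return sheets


chunks_by_file = {
    "doc_1": {
        0: {"content": "Title", "depth": 0},
        1: {"content": "Chapter 1", "depth": 1},
        2: {"content": "Clause a", "depth": 2},
        3: {"content": "Clause b", "depth": 2},
        4: {"content": "Clause c", "depth": 2},
    }
}
retrieved = [
    {"chunkset_index": 0, "file_id": "doc_1", "chunks": [0, 1, 2]},
    {"chunkset_index": 2, "file_id": "doc_1", "chunks": [0, 1, 4]},
]

sheets = build_cheatsheets(retrieved, chunks_by_file)
print(sheets["doc_1"])
```

The shared ancestors (Title, Chapter 1) appear once even though both chunksets reference them, which is where the token savings come from.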
SDK:
cheatsheets = client.create_cheatsheets(relevant_chunksets, all_necessary_chunks)
Cheatsheets provide the LLM with precisely the context it needs to answer queries accurately, without wasting tokens on redundant information.
A cheatsheet is a compact representation of the retrieved information. It comprises several relevant chunksets, deduplicated and optimized for LLM consumption.
We call them cheatsheets because they are compact compilations of the most important points on a topic, like the ones none of us used during a test or exam…
Cheatsheet characteristics:
- Single coherent context block
- Hierarchical relationships preserved
- Logical, structured organization
- LLM-friendly ellipses ([…]) indicate omitted content
- Deduplicated content to minimize token usage
- One cheatsheet per document involved
Example input:
retrieved_chunksets = [
    {"chunkset_index": 0, "chunks": [0, 10, 16, 17], "file_id": "doc_1"},
    {"chunkset_index": 1, "chunks": [0, 4, 5, 6], "file_id": "doc_2"}
]
Example output (conceptual, IDs only for brevity):
[0, 4, 5, 6, 10, 16, 17]
POMA AI significantly outperforms traditional chunking in token efficiency and retrieval accuracy. While a dedicated benchmark repo is pending, real-world comparisons show substantial improvements.
To illustrate with a (very) niche example: a legal-document query about Andorra’s personalized license-plate law (a notoriously tough document for RAGs) needed 1,542 tokens of retrieved context with traditional RAG, versus 337 tokens with POMA (a reduction of about 78%), with zero information loss.
This efficiency enables energy and cost savings and/or more context within token limits.
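For reference, the reduction in the example above follows directly from the two token counts. A small sketch of the arithmetic (the function name is illustrative):

```python
def token_reduction(baseline_tokens: int, optimized_tokens: int) -> float:
    """Fraction of context tokens saved relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 1 - optimized_tokens / baseline_tokens


reduction = token_reduction(1542, 337)
print(f"{reduction:.1%}")
# prints: 78.1%
```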
We’ll start building out this FAQ as soon as we receive the first real questions from users.
If you have a question, suggestion, or found something unclear in our readme, please reach out to us:
sdk@poma-ai.com
Your feedback will help us expand this section into a valuable reference for everyone.
Usage of the POMA AI API & ecosystem under MPL-2.0.