5 changes: 5 additions & 0 deletions .github/workflows/deployer_1.1.0-beta.yml
@@ -33,6 +33,11 @@ jobs:
NEXT_PUBLIC_PADDLE_PRO_PRICE_ID=${{ secrets.NEXT_PUBLIC_PADDLE_PRO_PRICE_ID }}
NEXT_PUBLIC_PADDLE_SUCCESSURL=${{ secrets.NEXT_PUBLIC_PADDLE_SUCCESSURL }}
NEXT_PUBLIC_PAYMENT=${{ secrets.NEXT_PUBLIC_PAYMENT }}
REDIS_DB_ADDRESS=localhost
REDIS_DB_PORT=6379
REDIS_DB_PASSWORD=
REDIS_DB_DATABASE=0
OPENAI_KEY="my key"
EOF

- name: Run Tests
136 changes: 17 additions & 119 deletions README.md
@@ -1,139 +1,34 @@
# Dcup: The Open-Source RAG-as-a-Service Platform

<h1 align="center">
<a target="_blank" href="https://dcup.dev"><img align="center" style="width:80%;" src="https://github.com/user-attachments/assets/be7cfdc3-35f2-4886-8616-ebf0bc16be1e"> </a>
</h1>


# 📖 **Open Source, Now and Forever**
📖 **Open Source, Now and Forever**

<div align="center">
<a target="_blank" href="https://dcup.dev"><img align="center" style="max-width:300px;" src="https://github.com/user-attachments/assets/1b00557b-e672-480b-b2e1-2dcc6fa5641e"> </a>
</div>

<br>

Dcup is a fully open-source, self-hostable RAG (Retrieval-Augmented Generation) pipeline designed to seamlessly connect your application to your users' data with pre-built integrations and advanced AI capabilities.

## 🚀 Connected RAG

With Dcup Connect, you can easily link your application to data sources like Google Drive, with more integrations coming soon.

## 🌐 Future-Ready
🚀 Dcup is your go-to solution for building and scaling Retrieval-Augmented Generation (RAG) systems. Whether you’re a developer looking to integrate AI-driven search capabilities or a team wanting to harness data for smarter retrieval, Dcup is fully open-source, self-hostable, and built with scalability in mind.

Advanced features such as LLM re-ranking, summary indexing, entity extraction, and hybrid search using OpenAI embeddings and Qdrant vector storage make Dcup the perfect platform for scalable, intelligent retrieval.
## ✨ Key Features
- **Fully Open Source & Self-Hostable:** Maintain control over your data and infrastructure.
- **Connected RAG:** Easy-to-use integrations with Google Drive, Dropbox, AWS, and direct file uploads.
- **Advanced Search Capabilities:** LLM re-ranking, summary indexing, hybrid search using OpenAI embeddings and Qdrant vector storage.
- **Intuitive Retrieval API:** Seamlessly query and refine your data with optional re-ranking.
- **Developer-Centric:** Clear documentation, easy-to-use APIs, and a modular architecture.

## 🛠️ Built for Developers

Dcup provides easy-to-use APIs that get you started in minutes.

# ⚡️ How Dcup Works
## ⚡️ How Dcup Works

### 1️⃣ Ingest

The first step in the RAG pipeline is data ingestion. Dcup offers simple APIs for uploading files or directly connecting to popular sources like Google Drive. With automatic syncing, your data stays up-to-date effortlessly, handling PDFs and more.
Start by ingesting your data. Dcup offers simple APIs for uploading files or directly connecting to popular sources like Google Drive, Dropbox, and AWS. Your data stays up-to-date automatically with effortless syncing.

### 2️⃣ Chunk and Index

Next, Dcup automatically chunks and embeds your data into vectors using OpenAI embeddings. These vectors are stored in a highly scalable Qdrant vector database. Out of the box, Dcup supports vector indexing, summary indexing, and keyword indexing for enhanced retrieval.
Once ingested, Dcup automatically chunks your data and embeds it into vectors using OpenAI embeddings. The vectors are stored in a scalable Qdrant vector database, with vector, summary, and keyword indexing for enhanced retrieval.

### 3️⃣ Retrieve
The final step is retrieval. With the Dcup Retrieval API, you can query your data and refine results. Features like re-ranking, summary index, entity extraction, flexible filtering, and hybrid search (semantic + keyword) ensure high precision and relevant results for your AI applications.

The final step is to use the Dcup Retrieval API to get relevant chunks for your semantic search queries. Built-in features like re-ranking, summary index, entity extraction, flexible filtering, and hybrid semantic and keyword search ensure highly accurate and relevant results for your AI applications.
#### Retrieval API Documentation
##### Endpoint
```bash
POST /api/retrievals
```
#### Description
This API endpoint allows you to retrieve relevant chunks from your indexed documents based on a search query. The process involves expanding your query, generating embeddings, and using Qdrant to search for matching chunks. Optionally, the results can be re-ranked using cosine similarity.

##### Request Body Parameters
- **query (string, required)**: The search query. Must be at least 2 characters long.
- **top_chunk (number, optional)**: The number of top results to return. Default is 5.
- **filter (object, optional)**: A filter object to narrow down results.
- **rerank (boolean, optional)**: Set to true to enable re-ranking of results based on similarity. Defaults to false.
- **min_score_threshold (number, optional)**: Minimum score threshold for filtering results.
#### Example Request
```json
{
  "query": "example search query",
  "top_chunk": 5,
  "filter": {
    "field": "value"
  },
  "rerank": true,
  "min_score_threshold": 0.5
}
```
#### Note:
Include an `Authorization` header with your API key in the format `Authorization: Bearer YOUR_API_KEY`.
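
As a minimal sketch of how a client might assemble this call, the helper below builds the URL and `fetch` options from the documented body parameters and the Bearer header. The names `RetrievalRequest` and `buildRetrievalCall`, the `baseUrl` argument, and the `Content-Type` header are assumptions made for illustration, not part of the documented API.

```typescript
// Hypothetical request shape mirroring the documented body parameters.
interface RetrievalRequest {
  query: string;
  top_chunk?: number;
  filter?: Record<string, unknown>;
  rerank?: boolean;
  min_score_threshold?: number;
}

// Builds the URL and fetch options for POST /api/retrievals.
// `baseUrl` and `apiKey` are placeholders supplied by the caller.
function buildRetrievalCall(baseUrl: string, apiKey: string, body: RetrievalRequest) {
  // The documentation states that the query must be at least 2 characters long.
  if (body.query.length < 2) {
    throw new Error("query must be at least 2 characters long");
  }
  return {
    // Strip trailing slashes so the path is not doubled.
    url: `${baseUrl.replace(/\/+$/, "")}/api/retrievals`,
    init: {
      method: "POST",
      headers: {
        // Assumption: the endpoint expects a JSON body.
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(body),
    },
  };
}
```

The returned `url` and `init` can be passed straight to `fetch(url, init)`; keeping the builder separate from the network call makes the validation easy to test offline.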

#### How It Works
- **Query Expansion & Embedding:** The API expands your query and generates embeddings using OpenAI.
- **Search & Filter:** Qdrant searches the indexed vectors. You can use a filter to refine the search.
- **Re-ranking (Optional):** If enabled, the API generates a hypothetical answer, calculates its embedding, and re-ranks the chunks by cosine similarity.
- **Response:** The API returns the top matching chunks, each containing metadata like document name, page number, content, and score.
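
The re-ranking step can be illustrated with a small sketch: given the embedding of the hypothetical answer, each chunk is scored by cosine similarity and the list is sorted in descending order. This is not Dcup's actual implementation; `EmbeddedChunk`, `cosineSimilarity`, and `rerankByAnswer` are hypothetical names, and the embeddings are assumed to be plain number arrays of equal length.

```typescript
// Hypothetical chunk shape: an id plus its embedding vector.
interface EmbeddedChunk {
  id: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Scores every chunk against the hypothetical-answer embedding
// and returns the chunks ordered from most to least similar.
function rerankByAnswer(chunks: EmbeddedChunk[], answerEmbedding: number[]) {
  return chunks
    .map((chunk) => ({
      id: chunk.id,
      score: cosineSimilarity(chunk.embedding, answerEmbedding),
    }))
    .sort((x, y) => y.score - x.score);
}
```

Because cosine similarity ignores vector magnitude, the ordering depends only on direction, which is why it is a common choice for comparing embeddings.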

#### Response Format
A successful response returns a JSON object with a key `scored_chunks` that contains an array of matching chunks. Each chunk includes:

- `id`: Identifier of the chunk.
- `document_name`: Name of the source document.
- `page_number`: Page number (if applicable).
- `chunk_number`: Chunk identifier.
- `source`: Data source.
- `title`: Title of the document/chunk.
- `summary`: Summary (if available).
- `content`: The chunk's content.
- `type`: The type/category of the chunk.
- `metadata`: Additional metadata.
- `score`: Matching score.
#### Example Response
```json
{
  "scored_chunks": [
    {
      "id": "chunk_1",
      "document_name": "Document A",
      "page_number": 1,
      "chunk_number": 2,
      "source": "Google Drive",
      "title": "Introduction",
      "summary": "Overview of the topic",
      "content": "Lorem ipsum dolor sit amet...",
      "type": "text",
      "metadata": {},
      "score": 0.87
    }
  ]
}
```
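
For typed clients, the response can be modeled roughly as follows. The field types are inferred from the example response and are therefore assumptions (for instance, `page_number` and `summary` are treated as optional because the field list marks them "if applicable" and "if available"); `ScoredChunk`, `RetrievalResponse`, and `summarizeChunks` are hypothetical names.

```typescript
// Assumed types, inferred from the example response rather than a published schema.
interface ScoredChunk {
  id: string;
  document_name: string;
  page_number?: number;
  chunk_number: number;
  source: string;
  title: string;
  summary?: string;
  content: string;
  type: string;
  metadata: Record<string, unknown>;
  score: number;
}

interface RetrievalResponse {
  scored_chunks: ScoredChunk[];
}

// Keeps chunks at or above `minScore`, best first, and renders one line per chunk.
function summarizeChunks(response: RetrievalResponse, minScore: number): string[] {
  return response.scored_chunks
    .filter((chunk) => chunk.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .map((chunk) => `${chunk.document_name} (score ${chunk.score.toFixed(2)}): ${chunk.content}`);
}
```

Client-side filtering like this is only a convenience; passing `min_score_threshold` in the request lets the server do the same filtering before the response is sent.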
#### Error Handling
- **400 Bad Request:** If the request body fails validation, you'll receive details about the validation errors.
- **401 Unauthorized:** If the Authorization header is missing or invalid.
- **403 Forbidden:** If the API key is not associated with a valid user.
- **500 Internal Server Error:** If an unexpected error occurs.
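
A client might translate these status codes into readable messages along the following lines. This is only a sketch: `describeRetrievalError` is a hypothetical helper, and the message wording paraphrases the list above rather than quoting actual server output.

```typescript
// Maps the documented status codes to short, human-readable explanations.
function describeRetrievalError(status: number): string {
  switch (status) {
    case 400:
      return "Bad Request: the request body failed validation";
    case 401:
      return "Unauthorized: the Authorization header is missing or invalid";
    case 403:
      return "Forbidden: the API key is not associated with a valid user";
    case 500:
      return "Internal Server Error: an unexpected error occurred";
    default:
      // Any status not covered by the documentation is reported as-is.
      return `Unexpected status ${status}`;
  }
}
```

A typical use is to check `response.ok` after calling the endpoint and throw `new Error(describeRetrievalError(response.status))` when it is false.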

### 🌟 Key Features

- ✅ Pre-built Google Drive integration (more integrations coming soon)
- ✅ OpenAI-powered embeddings
- ✅ Qdrant vector storage
- ✅ Automatic chunking and indexing
- ✅ Advanced retrieval with re-ranking and hybrid search
- ✅ Easy-to-use APIs for fast implementation
- ✅ Scalable and open-source
## 📄 Documentation
For more details about Dcup's features, API endpoints, and usage, see the documentation at [dcup/docs](https://dcup.dev/docs).

## 🛠️ Quick Start Guide
### Self-host Dcup using docker compose
@@ -143,6 +38,9 @@ If an unexpected error occurs.
```bash
docker compose -f docker-compose.prod.yml --env-file .env up
```
## 🌍 Cloud Version
If you prefer a hosted solution, try the cloud version of Dcup at [app.Dcup](https://dcup.dev). No setup is required: just sign up, connect your data, and start querying.

## 💻 For Developers
Dcup is designed to be modular and flexible, allowing developers to build custom RAG pipelines. Because the architecture is open source, you can contribute, customize, and scale as needed.

23 changes: 5 additions & 18 deletions app/terms_of_service_and_privacy_policy/page.tsx
@@ -18,14 +18,10 @@ const TermsPage: React.FC = () => {
<section className="mb-8">
<h2 className="mb-4 text-2xl font-semibold text-gray-700">1. Introduction</h2>
<p className="mb-4 text-gray-600">
Welcome to Dcup Cloud! We provide a powerful, cloud-based service that helps you transform
unstructured documents into clean, structured JSON. By using Dcup Cloud, you agree to the following
Terms of Service and acknowledge our Privacy Policy.
Welcome to Dcup, the open-source Retrieval-Augmented Generation (RAG) platform.
</p>
<p className="text-gray-600">
Your privacy and data security are our top priorities. We do not store your uploaded files or URLs.
The only data we temporarily retain is the extracted result for caching purposes, which is automatically
deleted after 24 hours.
Your privacy and data security are our top priorities.
</p>
</section>

@@ -43,9 +39,6 @@ const TermsPage: React.FC = () => {
<li className="flex items-center text-gray-600">
<Check className="mr-2 h-5 w-5 text-green-500" /> Use the service only for lawful purposes.
</li>
<li className="flex items-center text-gray-600">
<Check className="mr-2 h-5 w-5 text-green-500" /> Not attempt to exploit, reverse-engineer, or overload the system.
</li>
<li className="flex items-center text-gray-600">
<Check className="mr-2 h-5 w-5 text-green-500" /> Respect API rate limits and fair usage policies.
</li>
@@ -113,15 +106,12 @@ const TermsPage: React.FC = () => {
<X className="mr-2 h-5 w-5 text-red-500" /> We do <strong>NOT</strong> store your uploaded files, documents, or URLs.
</li>
<li className="flex items-center text-gray-600">
<X className="mr-2 h-5 w-5 text-red-500" /> We do <strong>NOT</strong> permanently retain any input data after processing.
</li>
<X className="mr-2 h-5 w-5 text-red-500" /> We do <strong>NOT</strong> collect, store, or share any personally identifiable information (PII) through our website or open-source platform.
</li>
<li className="flex items-center text-gray-600">
<X className="mr-2 h-5 w-5 text-red-500" /> We do <strong>NOT</strong> access or use your data beyond the requested transformation process.
</li>
</ul>
<p className="mt-3 text-gray-600">
Once processing is complete, the raw files are immediately discarded. The structured results are cached for 24 hours for efficiency and then permanently deleted.
</p>
</div>

{/* 3.3 Security Measures */}
@@ -134,9 +124,6 @@ const TermsPage: React.FC = () => {
<li className="flex items-center text-gray-600">
<Lock className="mr-2 h-5 w-5 text-blue-500" /> Secure API authentication via Bearer Token (API Key).
</li>
<li className="flex items-center text-gray-600">
<Lock className="mr-2 h-5 w-5 text-blue-500" /> Automatic deletion of cached data after 24 hours.
</li>
</ul>
</div>

@@ -170,7 +157,7 @@ const TermsPage: React.FC = () => {
</h2>
<p className="text-gray-600">
Due to the nature of our cloud-based services, all purchases and subscriptions are <strong>non-refundable</strong>.
Since our platform provides instant access to API-based processing and document transformation, we cannot offer refunds
Since our platform provides instant access to API-based processing, we cannot offer refunds
once a subscription or transaction is completed.
</p>
<p className="mt-3 text-gray-600">