From 1da5b3a8dc0bef49c5181ea84750586c17b685d4 Mon Sep 17 00:00:00 2001
From: Ali Amer <76897266+aliamerj@users.noreply.github.com>
Date: Tue, 22 Apr 2025 17:32:26 +0300
Subject: [PATCH 1/3] Merge pull request #60 from aliamerj/deploy/docker
fix workflows
---
.github/workflows/deployer_1.1.0-beta.yml | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/.github/workflows/deployer_1.1.0-beta.yml b/.github/workflows/deployer_1.1.0-beta.yml
index 0ebeb90..40545e1 100644
--- a/.github/workflows/deployer_1.1.0-beta.yml
+++ b/.github/workflows/deployer_1.1.0-beta.yml
@@ -33,6 +33,11 @@ jobs:
NEXT_PUBLIC_PADDLE_PRO_PRICE_ID=${{ secrets.NEXT_PUBLIC_PADDLE_PRO_PRICE_ID }}
NEXT_PUBLIC_PADDLE_SUCCESSURL=${{ secrets.NEXT_PUBLIC_PADDLE_SUCCESSURL }}
NEXT_PUBLIC_PAYMENT=${{ secrets.NEXT_PUBLIC_PAYMENT }}
+ REDIS_DB_ADDRESS=localhost
+ REDIS_DB_PORT=6379
+ REDIS_DB_PASSWORD=
+ REDIS_DB_DATABASE=0
+ OPENAI_KEY="my key"
EOF
- name: Run Tests
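The first patch writes Redis connection settings into the generated `.env`, including an intentionally empty `REDIS_DB_PASSWORD`. Below is a minimal TypeScript sketch of how an app might read them, assuming the values reach the process as ordinary environment variables. The `RedisConfig` type and `loadRedisConfig` helper are hypothetical and are not taken from the Dcup codebase.

```typescript
// Hypothetical shape for the Redis settings written by the workflow.
interface RedisConfig {
  host: string;
  port: number;
  password?: string; // undefined when REDIS_DB_PASSWORD is empty
  db: number;
}

// Hypothetical loader: reads the REDIS_DB_* variables from an env map and
// falls back to the same values the workflow writes when one is missing or empty.
function loadRedisConfig(env: Record<string, string | undefined>): RedisConfig {
  const port = Number(env.REDIS_DB_PORT || "6379");
  const db = Number(env.REDIS_DB_DATABASE || "0");
  if (!Number.isInteger(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid REDIS_DB_PORT: ${env.REDIS_DB_PORT}`);
  }
  if (!Number.isInteger(db) || db < 0) {
    throw new Error(`Invalid REDIS_DB_DATABASE: ${env.REDIS_DB_DATABASE}`);
  }
  const password = env.REDIS_DB_PASSWORD;
  return {
    host: env.REDIS_DB_ADDRESS || "localhost",
    port,
    // An empty string (as in the workflow) is treated as "no password".
    password: password ? password : undefined,
    db,
  };
}
```

In a Node app this would be called as `loadRedisConfig(process.env)`. Normalizing the empty password to `undefined` avoids sending an empty `AUTH` to a Redis instance that has no password configured.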
From 30b9403cd6f49dc924005b62aaec94dda84ce19d Mon Sep 17 00:00:00 2001
From: Ali Amer <76897266+aliamerj@users.noreply.github.com>
Date: Tue, 22 Apr 2025 18:34:00 +0300
Subject: [PATCH 2/3] Update README.md
Signed-off-by: Ali Amer <76897266+aliamerj@users.noreply.github.com>
---
README.md | 136 +++++++-----------------------------------------------
1 file changed, 17 insertions(+), 119 deletions(-)
diff --git a/README.md b/README.md
index a7d760e..c0edc13 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,5 @@
# Dcup: The Open-Source RAG-as-a-Service Platform
-
-
@@ -13,127 +7,28 @@
-Dcup is a fully open-source, self-hostable RAG (Retrieval-Augmented Generation) pipeline designed to seamlessly connect your application to your users' data with pre-built integrations and advanced AI capabilities.
-
-## Connected RAG
-
-With Dcup Connect, you can easily link your application to data sources like Google Drive, with more integrations coming soon.
-
-## Future-Ready
+Dcup is your go-to solution for building and scaling Retrieval-Augmented Generation (RAG) systems. Whether you're a developer integrating AI-driven search or a team harnessing your data for smarter retrieval, Dcup is fully open-source, self-hostable, and built to scale.
-Advanced features such as LLM re-ranking, summary indexing, entity extraction, and hybrid search using OpenAI embeddings and Qdrant vector storage make Dcup the perfect platform for scalable, intelligent retrieval.
+## ✨ Key Features
+- **Fully Open Source & Self-Hostable:** Maintain control over your data and infrastructure.
+- **Connected RAG:** Easy-to-use integrations with Google Drive, Dropbox, AWS, and direct file uploads.
+- **Advanced Search Capabilities:** LLM re-ranking, summary indexing, hybrid search using OpenAI embeddings and Qdrant vector storage.
+- **Intuitive Retrieval API:** Seamlessly query and refine your data with optional re-ranking.
+- **Developer-Centric:** Clear documentation, easy-to-use APIs, and a modular architecture.
-## 🛠️ Built for Developers
-
-Dcup provides easy-to-use APIs that get you started in minutes.
-
-# ⚡️ How Dcup Works
+## ⚡️ How Dcup Works
### 1️⃣ Ingest
-
-The first step in the RAG pipeline is data ingestion. Dcup offers simple APIs for uploading files or directly connecting to popular sources like Google Drive. With automatic syncing, your data stays up-to-date effortlessly, handling PDFs and more.
+Start by ingesting your data. Dcup offers simple APIs for uploading files or directly connecting to popular sources like Google Drive, Dropbox, and AWS. Automatic syncing keeps your data up to date.
### 2️⃣ Chunk and Index
-
-Next, Dcup automatically chunks and embeds your data into vectors using OpenAI embeddings. These vectors are stored in a highly scalable Qdrant vector database. Out of the box, Dcup supports vector indexing, summary indexing, and keyword indexing for enhanced retrieval.
+Once your data is ingested, Dcup automatically chunks it and embeds it into vectors using OpenAI embeddings. The vectors are stored in a scalable Qdrant vector database and indexed with vector, summary, and keyword indexing for enhanced retrieval.
### 3️⃣ Retrieve
+The final step is retrieval. With the Dcup Retrieval API, you can query your data and refine the results. Features like re-ranking, summary indexing, entity extraction, flexible filtering, and hybrid search (semantic + keyword) ensure precise, relevant results for your AI applications.
-The final step is to use the Dcup Retrieval API to get relevant chunks for your semantic search queries. Built-in features like re-ranking, summary index, entity extraction, flexible filtering, and hybrid semantic and keyword search ensure highly accurate and relevant results for your AI applications.
-#### Retrieval API Documentation
-##### Endpoint
-```bash
-POST /api/retrievals
-```
-#### Description
-This API endpoint allows you to retrieve relevant chunks from your indexed documents based on a search query. The process involves expanding your query, generating embeddings, and using Qdrant to search for matching chunks. Optionally, the results can be re-ranked using cosine similarity.
-
-##### Request Body Parameters
-- **query (string, required)**: The search query. Must be at least 2 characters long.
-- **top_chunk (number, optional)**: The number of top results to return. Default is 5.
-- **filter (object, optional)**: A filter object to narrow down results.
-- **rerank (boolean, optional)**: Set to true to enable re-ranking of results based on similarity. Defaults to false.
-- **min_score_threshold (number, optional)**: Minimum score threshold for filtering results.
-#### Example Request
-```json
-{
- "query": "example search query",
- "top_chunk": 5,
- "filter": {
- "field": "value"
- },
- "rerank": true,
- "min_score_threshold": 0.5
-}
-```
-#### Note:
-Include an Authorization header with your API key in the format:
-Authorization: Bearer YOUR_API_KEY
-
-#### How It Works
-- Query Expansion & Embedding:
-The API expands your query and generates embeddings using OpenAI.
-- Search & Filter:
-Qdrant searches the indexed vectors. You can use a filter to refine the search.
-- Re-ranking (Optional):
-If enabled, the API generates a hypothetical answer, calculates its embedding, and re-ranks the chunks by cosine similarity.
-- Response:
-The API returns the top matching chunks, each containing metadata like document name, page number, content, and score.
-
-#### Response Format
-A successful response returns a JSON object with a key scored_chunks that contains an array of matching chunks. Each chunk includes:
-
-- id: Identifier of the chunk.
-- document_name: Name of the source document.
-- page_number: Page number (if applicable).
-- chunk_number: Chunk identifier.
-- source: Data source.
-- title: Title of the document/chunk.
-- summary: Summary (if available).
-- content: The chunk's content.
-- type: The type/category of the chunk.
-- metadata: Additional metadata.
-- score: Matching score.
-#### Example Response
-```json
-{
- "scored_chunks": [
- {
- "id": "chunk_1",
- "document_name": "Document A",
- "page_number": 1,
- "chunk_number": 2,
- "source": "Google Drive",
- "title": "Introduction",
- "summary": "Overview of the topic",
- "content": "Lorem ipsum dolor sit amet...",
- "type": "text",
- "metadata": {},
- "score": 0.87
- }
- // ...more chunks
- ]
-}
-```
-#### Error Handling
-- 400 Bad Request:
-If the request body fails validation, you'll receive details about the validation errors.
-- 401 Unauthorized:
-If the Authorization header is missing or invalid.
-- 403 Forbidden:
-If the API key is not associated with a valid user.
-- 500 Internal Server Error:
-If an unexpected error occurs.
-
-### Key Features
-
-- ✅ Pre-built Google Drive integration (more integrations coming soon)
-- ✅ OpenAI-powered embeddings
-- ✅ Qdrant vector storage
-- ✅ Automatic chunking and indexing
-- ✅ Advanced retrieval with re-ranking and hybrid search
-- ✅ Easy-to-use APIs for fast implementation
-- ✅ Scalable and open-source
+## Documentation
+For more details about Dcup's features, API endpoints, and usage, see the documentation at [dcup/docs](https://dcup.dev/docs).
## 🛠️ Quick Start Guide
### Self-host Dcup using docker compose
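The retrieval API documentation removed in this hunk describes `POST /api/retrievals`, a `Authorization: Bearer YOUR_API_KEY` header, and a few request-body rules (`query` of at least 2 characters, `top_chunk` defaulting to 5, `rerank` defaulting to false). Below is a minimal client-side sketch of those rules. The `RetrievalRequest` and `PreparedRequest` types, the `buildRetrievalRequest` helper, and the `baseUrl` parameter are hypothetical, and the live API may have changed since; https://dcup.dev/docs is the authoritative reference.

```typescript
// Request body fields as listed in the removed README documentation.
interface RetrievalRequest {
  query: string;
  top_chunk?: number;
  filter?: Record<string, unknown>;
  rerank?: boolean;
  min_score_threshold?: number;
}

// Hypothetical container for a request; `init` can be passed to fetch(url, init).
interface PreparedRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

// Hypothetical helper: enforces the documented minimum query length,
// applies the documented defaults, and adds the Bearer token header.
function buildRetrievalRequest(
  baseUrl: string,
  apiKey: string,
  req: RetrievalRequest,
): PreparedRequest {
  if (req.query.length < 2) {
    throw new Error("query must be at least 2 characters long");
  }
  const body: RetrievalRequest = {
    ...req,
    top_chunk: req.top_chunk ?? 5,
    rerank: req.rerank ?? false,
  };
  return {
    // Strip trailing slashes so "https://host/" and "https://host" behave the same.
    url: `${baseUrl.replace(/\/+$/, "")}/api/retrievals`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(body),
    },
  };
}
```

The sketch only builds the request. The caller would send it with `fetch(url, init)` and handle the 400/401/403/500 responses listed in the removed error-handling section.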
@@ -143,6 +38,9 @@ If an unexpected error occurs.
```bash
docker compose -f docker-compose.prod.yml --env-file .env up
```
+## Cloud Version
+If you prefer a hosted solution, try the cloud version of Dcup at [app.Dcup](https://dcup.dev). No setup required: just sign up, connect your data, and start querying.
+
## 💻 For Developers
Dcup is designed to be modular and flexible, allowing developers to build custom RAG pipelines effortlessly. Thanks to its open-source architecture, you can contribute, customize, and scale as needed.
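The retrieval docs removed in this patch describe the optional re-ranking step: the API generates a hypothetical answer, embeds it, and re-ranks the chunks by cosine similarity. Below is a minimal sketch of only the re-ordering part, as one might write it in a custom pipeline. It assumes the embeddings are already available as plain number arrays. The trimmed `ScoredChunk` shape and the `cosineSimilarity`/`rerankByCosine` names are hypothetical. Generating the hypothetical answer and calling the embedding model are out of scope, and this is not presented as Dcup's actual implementation.

```typescript
// Trimmed version of a chunk from the documented `scored_chunks` response.
interface ScoredChunk {
  id: string;
  content: string;
  score: number;
}

// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Replaces each chunk's score with its cosine similarity to the
// hypothetical-answer embedding, sorts descending, and keeps the top `topK`.
function rerankByCosine(
  chunks: { chunk: ScoredChunk; embedding: number[] }[],
  answerEmbedding: number[],
  topK: number,
): ScoredChunk[] {
  return chunks
    .map(({ chunk, embedding }) => ({
      ...chunk,
      score: cosineSimilarity(embedding, answerEmbedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Overwriting `score` keeps the output in the same shape as the documented response. A pipeline that needs the original vector-search score as well could store the similarity in a separate field instead.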
From a25b10637afee87f465f4d2bdae1884fbb39114c Mon Sep 17 00:00:00 2001
From: aliamerj
Date: Tue, 22 Apr 2025 19:42:43 +0300
Subject: [PATCH 3/3] update terms_of_service_and_privacy_policy
---
.../page.tsx | 23 ++++---------------
1 file changed, 5 insertions(+), 18 deletions(-)
diff --git a/app/terms_of_service_and_privacy_policy/page.tsx b/app/terms_of_service_and_privacy_policy/page.tsx
index 5af3250..2d372f6 100644
--- a/app/terms_of_service_and_privacy_policy/page.tsx
+++ b/app/terms_of_service_and_privacy_policy/page.tsx
@@ -18,14 +18,10 @@ const TermsPage: React.FC = () => {
1. Introduction
- Welcome to Dcup Cloud! We provide a powerful, cloud-based service that helps you transform
- unstructured documents into clean, structured JSON. By using Dcup Cloud, you agree to the following
- Terms of Service and acknowledge our Privacy Policy.
+ Welcome to Dcup, the open-source Retrieval-Augmented Generation (RAG) platform.
- Your privacy and data security are our top priorities. We do not store your uploaded files or URLs.
- The only data we temporarily retain is the extracted result for caching purposes, which is automatically
- deleted after 24 hours.
+ Your privacy and data security are our top priorities.
@@ -43,9 +39,6 @@ const TermsPage: React.FC = () => {
Use the service only for lawful purposes.
-
- Not attempt to exploit, reverse-engineer, or overload the system.
-
Respect API rate limits and fair usage policies.
@@ -113,15 +106,12 @@ const TermsPage: React.FC = () => {
We do NOT store your uploaded files, documents, or URLs.
- We do NOT permanently retain any input data after processing.
-
+ We do NOT collect, store, or share any personally identifiable information (PII) through our website or open-source platform.
+
We do NOT access or use your data beyond the requested transformation process.
-
- Once processing is complete, the raw files are immediately discarded. The structured results are cached for 24 hours for efficiency and then permanently deleted.
-
{/* 3.3 Security Measures */}
@@ -134,9 +124,6 @@ const TermsPage: React.FC = () => {
Due to the nature of our cloud-based services, all purchases and subscriptions are non-refundable.
- Since our platform provides instant access to API-based processing and document transformation, we cannot offer refunds
+ Since our platform provides instant access to API-based processing, we cannot offer refunds
once a subscription or transaction is completed.