A verifiable ESG reporting platform that connects investors and suppliers through a token-gated upload system. Suppliers submit sustainability CSV reports which are analyzed by a Gemini-powered AI agent and permanently pinned to IPFS.
- Overview
- Tech Stack
- System Architecture
- Design Decisions
- Getting Started
- User Flows
- Project Structure
- Database Schema
- API Reference
- AI Analysis Pipeline
- Authentication and Authorization
- Data Integrity and IPFS
- Frontend Architecture
- Scripts
Aletheia solves the trust problem in ESG reporting. Investors generate one-time upload links and share them with suppliers. Suppliers upload their sustainability data through that link, triggering automatic AI analysis and immutable on-chain storage. Every report is SHA-256 hashed, pinned to IPFS via Lighthouse, and analyzed against GRI (Global Reporting Initiative) standards.
The core problem this addresses: ESG reports are often self-reported, unverified, and difficult to audit. Aletheia makes reports tamper-evident by hashing each file before storage and permanently anchoring it to IPFS, while the AI layer ensures consistent evaluation against recognized sustainability frameworks.
| Category | Technology |
|---|---|
| Framework | Next.js 16 (App Router) |
| Language | TypeScript |
| Styling | Tailwind CSS v4, shadcn/ui |
| Authentication | Better Auth (Google OAuth 2.0) |
| Database | Neon (serverless Postgres) |
| ORM | Drizzle ORM |
| AI | Google Gemini 2.5 Flash via LangChain |
| IPFS | Lighthouse Web3 SDK |
| State | Zustand (persisted via localStorage) |
| Charts | Chart.js, react-chartjs-2 |
| Animation | Framer Motion, GSAP, Lenis |
The system is a monolithic Next.js application that combines the frontend, backend API routes, and AI agent into a single deployable unit. External services (Neon, Google OAuth, Gemini, Lighthouse) are accessed through server-side API routes only — no credentials are exposed to the client.
Client (Browser)
|
| HTTPS
v
Next.js App (Vercel / Node)
|
|-- /app React Server and Client Components
|-- /app/api Route Handlers (server-side only)
|
|-- Better Auth <--> Neon (auth tables: user, session, account)
|-- Drizzle ORM <--> Neon (app tables: suppliers, investors, upload_links, csv_uploads)
|-- LangChain <--> Google Gemini 2.5 Flash
|-- Lighthouse <--> IPFS Network
This is the most complex flow in the system. The following describes what happens when a supplier submits a file:
Browser
|
| POST /api/upload/[token] (multipart/form-data)
v
Route Handler: app/api/upload/[token]/route.ts
|
| 1. Verify session via Better Auth (auth.api.getSession)
| 2. Validate token exists in upload_links and is status=pending
| 3. Validate file type (.csv), size (<= 10MB)
| 4. Compute SHA-256 hash of file buffer
| 5. Parse CSV text into structured JSON array
| 6. Insert row into csv_uploads (parsedData, fileName, fileHash, fileSize)
| 7. Mark upload_links.status = "used", set uploadId FK
|
| Parallel async operations:
|
| [A] Lighthouse SDK
| upload(buffer, apiKey) --> IPFS Network
| <-- { cid, url }
| UPDATE csv_uploads SET ipfsCid, ipfsUrl
|
| [B] runAutoAnalysis(csvContext)
| ChatGoogleGenerativeAI.withStructuredOutput(ZodSchema)
| --> Google Gemini API
| <-- { summary, esgScore, charts, merits, improvements, griCompliance, recommendations }
| UPDATE csv_uploads SET analysisResult
|
v
Response: { success, upload, autoAnalysis }
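Step 5 of the pipeline above (parsing the CSV text into a structured JSON array) can be sketched as follows. `parseCsv` is a hypothetical helper, not the actual `csv-loader.ts` implementation, and it assumes simple comma-separated values without quoted fields:

```typescript
// Minimal CSV-to-JSON sketch: first line is the header row, every
// subsequent line becomes an object keyed by those headers.
function parseCsv(text: string): Record<string, string>[] {
  const lines = text.trim().split(/\r?\n/).filter((l) => l.length > 0);
  if (lines.length < 2) return []; // need a header row plus at least one data row
  const headers = lines[0].split(",").map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(",").map((c) => c.trim());
    const row: Record<string, string> = {};
    headers.forEach((h, i) => (row[h] = cells[i] ?? ""));
    return row;
  });
}
```

The resulting array is what gets stored in the `parsed_data` jsonb column and later reconstructed for the chat agent.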
Browser
|
| POST /api/chat { message, pdfUploadId, threadId }
v
Route Handler: app/api/chat/route.ts
|
| 1. Verify session
| 2. Fetch csv_uploads row by pdfUploadId
| 3. Reconstruct CSV string from parsedData (jsonb)
| 4. createCsvAgent(csvContext)
| --> ChatGoogleGenerativeAI (gemini-2.5-flash, temp 0.7)
| --> System prompt injected with full CSV data
| --> Tools: generate_chart, analyze_sustainability_report, generate_overview_charts
|
| 5. agent.invoke({ messages: [HumanMessage(message)] }, { configurable: { thread_id } })
| --> Google Gemini API (multi-turn, tool-calling)
| Agent may call tools (ReAct loop):
| - generate_chart --> returns ChartConfig JSON
| - analyze_report --> returns ReportAnalysis JSON
|
| 6. Extract final message content + any tool results
v
Response: { message, chartConfig?, reportAnalysis?, threadId }
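The request contract above can be illustrated with a small client-side helper. `buildChatRequest` is a hypothetical name; the real `ChatInterface` component may structure the call differently:

```typescript
// Shape of the POST /api/chat request body as described in the flow above.
interface ChatRequestBody {
  message: string;
  pdfUploadId: string;
  threadId: string;
}

// Build the URL and fetch options for a chat turn.
function buildChatRequest(body: ChatRequestBody): {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
} {
  return {
    url: "/api/chat",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

// Usage inside a component:
//   const { url, init } = buildChatRequest({ message, pdfUploadId, threadId });
//   const res = await fetch(url, init);
```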
Browser: /sign-in
|
| User selects role (investor | supplier)
| User clicks "Continue with Google"
|
| signIn.social({ provider: "google", callbackURL: "/[role]/dashboard?role=[role]" })
|
v
Google OAuth Consent Screen
|
v
/api/auth/[...all] (Better Auth handler)
|
| - Exchange OAuth code for tokens
| - Upsert user row in `user` table
| - Create session, set signed cookie (better-auth.session_token)
|
v
Redirect to callbackURL: /investor/dashboard?role=investor
|
v
Dashboard page (client)
|
| useEffect reads ?role= param
| POST /api/role { role: "investor" }
| --> Upsert into investors table (onConflictDoNothing)
| router.replace(...) strips ?role= from URL
Every request passes through middleware.ts before reaching a page or API route. The middleware performs a lightweight check using the session cookie (no database round-trip):
Request arrives
|
| pathname.startsWith("/api/") --> allow (API routes self-authenticate)
| sessionCookie present?
| YES + pathname = "/sign-in" --> redirect "/"
| NO + pathname not public --> redirect "/sign-in?callbackUrl=[pathname]"
| otherwise --> allow
Public routes: / (landing page)
Auth routes: /sign-in
All other routes require authentication.
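The decision table above can be modeled as a pure function. This is a sketch only; the real `middleware.ts` operates on `NextRequest` and returns `NextResponse` redirects:

```typescript
type Decision = { action: "allow" } | { action: "redirect"; to: string };

function decide(pathname: string, hasSessionCookie: boolean): Decision {
  // API routes self-authenticate inside each handler
  if (pathname.startsWith("/api/")) return { action: "allow" };
  const isPublic = pathname === "/" || pathname === "/sign-in";
  // A signed-in user visiting the sign-in page is bounced home
  if (hasSessionCookie && pathname === "/sign-in") return { action: "redirect", to: "/" };
  // Unauthenticated access to a protected route redirects to sign-in
  if (!hasSessionCookie && !isPublic) {
    return { action: "redirect", to: `/sign-in?callbackUrl=${encodeURIComponent(pathname)}` };
  }
  return { action: "allow" };
}
```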
Upload links are single-use tokens (48-char hex, 192 bits of entropy). Once a supplier uploads a file, the link status changes to used and cannot be submitted again. This is enforced at the API level by checking status = "pending" before accepting any upload. This design prevents:
- Suppliers overwriting or contradicting their own submitted data
- Investors being unable to determine the canonical version of a report
- Any mutation of data after the investor has begun analysis
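Token generation as described (24 random bytes via `crypto.randomBytes`, rendered as 48 hex characters) can be sketched with Node's built-in crypto module; `generateUploadToken` is an illustrative name:

```typescript
import { randomBytes } from "node:crypto";

// 24 random bytes -> 48 hex characters -> 192 bits of entropy.
function generateUploadToken(): string {
  return randomBytes(24).toString("hex");
}
```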
Every csv_uploads row is inserted with uploadLocked = true. This column exists as a structural signal: no update path in the codebase sets this to false. Combined with the link status check, the system guarantees that once a CSV is submitted it is never replaced or deleted through the application.
Storing CSVs only in Postgres would make the data mutable by anyone with database access. By pinning to IPFS via Lighthouse:
- The CID (content identifier) is computed from the file content itself. If the file changes, the CID changes. Storing the CID alongside the file is therefore a tamper-detection mechanism.
- The IPFS URL gives investors an external, auditable reference to the exact file submitted.
- Even if the application database were compromised, the IPFS pins remain accessible independently.
The file is hashed on the server before Lighthouse receives it. This serves a different purpose from the IPFS CID: it allows the hash to be stored in Postgres and compared against a file independently of IPFS availability. Investors or auditors can download the IPFS file, hash it locally, and compare against fileHash to verify authenticity.
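The hashing and verification steps can be sketched as follows; function names are illustrative, but the digest algorithm matches the text:

```typescript
import { createHash } from "node:crypto";

// Server-side hashing step: SHA-256 hex digest of the raw file buffer.
function sha256Hex(buffer: Buffer): string {
  return createHash("sha256").update(buffer).digest("hex");
}

// What an auditor would do: hash the downloaded IPFS file locally and
// compare against the fileHash value stored in Postgres.
function verifyFile(buffer: Buffer, storedFileHash: string): boolean {
  return sha256Hex(buffer) === storedFileHash;
}
```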
Rather than prompting Gemini to return free-form text and parsing it, runAutoAnalysis() uses LangChain's withStructuredOutput() with a Zod schema. This enforces a guaranteed JSON structure at the model level. The tradeoff is slightly higher latency, but eliminates parsing failures and the need for retry logic when the model returns malformed JSON.
The GET /api/links endpoint determines the caller's role before returning data:
- If the caller's email appears in the `investors` table, it returns links where `investorUserId` matches the session.
- If the caller appears in the `suppliers` table, it returns links where `supplierUserId` matches the session.
This means a supplier cannot see links they were not assigned to, and an investor cannot see other investors' links. The role check is done via a database query because Better Auth's user table does not store application-level roles.
Each upload session generates a UUID as a threadId. This is passed with every chat request and used as the LangChain thread_id configurable. The agent retains conversation history within a thread, allowing follow-up questions to reference previous answers within the same analysis session.
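A minimal sketch of the per-session thread identifier, assuming Node's built-in `crypto.randomUUID`; `newThreadId` is an illustrative name:

```typescript
import { randomUUID } from "node:crypto";

// One UUID per upload session, reused as the LangChain thread_id for
// every chat request in that session.
function newThreadId(): string {
  return randomUUID();
}

// e.g. agent.invoke({ messages }, { configurable: { thread_id: newThreadId() } })
```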
After a successful CSV upload on the generic /upload page, the upload ID, thread ID, analysis result, and parsed CSV data are stored in Zustand with localStorage persistence. This allows the user to refresh the page without losing the analysis view. The store is reset explicitly via the "Reset" button.
- Node.js 18 or later
- pnpm
- A Neon database instance
- Google OAuth credentials (Client ID and Secret)
- Google Generative AI API key
- Lighthouse API key
```bash
git clone <repository-url>
cd aletheia
pnpm install
```

Create a `.env` file in the root directory:
```env
# Application
NEXT_PUBLIC_APP_URL=http://localhost:3000

# Database
DATABASE_URL=your_neon_connection_string

# Google OAuth (Better Auth)
GOOGLE_CLIENT_ID=your_google_client_id
GOOGLE_CLIENT_SECRET=your_google_client_secret

# Better Auth
BETTER_AUTH_SECRET=your_secret_key

# Google Generative AI
GOOGLE_GENERATIVE_AI_API_KEY=your_gemini_api_key

# Lighthouse (IPFS)
LIGHTHOUSE_API_KEY=your_lighthouse_api_key
```

For Google OAuth, set the authorized redirect URI in the Google Cloud Console to:
http://localhost:3000/api/auth/callback/google
Push the schema to your Neon database:
```bash
pnpm db:push
```

To seed initial data:

```bash
pnpm db:seed
```

To open Drizzle Studio for database inspection:

```bash
pnpm db:studio
```

Start the development server:

```bash
pnpm dev
```

The application will be available at http://localhost:3000.
1. Navigate to `/sign-in`, select the Investor role, and authenticate with Google.
2. On the dashboard at `/investor/dashboard`, click "Create Upload Link" to generate a one-time token.
3. Copy the link URL and share it with a supplier out of band (email, Slack, etc.).
4. Once the supplier uploads their report, the link status changes to "Uploaded".
5. Click "View Report" on any completed link to open the full analysis at `/upload/[token]`.
6. The report view includes the AI-generated ESG score, charts, GRI compliance breakdown, and an interactive chat interface.
1. Navigate to `/sign-in`, select the Supplier role, and authenticate with Google.
2. Open the link received from an investor (e.g., `https://app.com/upload/abc123...`).
3. Select a `.csv` file from your local machine (max 10MB).
4. Click "Upload and Analyze" — the system validates, hashes, stores, pins to IPFS, and runs AI analysis.
5. The report view opens immediately with the full analysis result.
6. The dashboard at `/supplier/dashboard` shows all assigned links and their submission status.
.
├── app/
│ ├── page.tsx # Landing page (public)
│ ├── layout.tsx # Root layout with font and global providers
│ ├── globals.css # Global styles and CSS variables
│ ├── sign-in/page.tsx # Authentication page with role selection
│ ├── investor/
│ │ ├── profile/page.tsx # Investor profile (identity, account details)
│ │ └── dashboard/page.tsx # Upload link creation and management table
│ ├── supplier/
│ │ ├── profile/page.tsx # Supplier profile (same structure, indigo accent)
│ │ └── dashboard/page.tsx # Submitted report history
│ ├── upload/
│ │ ├── layout.tsx # Upload section layout wrapper
│ │ ├── page.tsx # Generic upload interface (direct access)
│ │ └── [token]/
│ │ ├── layout.tsx # Token upload layout
│ │ └── page.tsx # Token-gated supplier upload page
│ └── api/
│ ├── auth/[...all]/ # Better Auth catch-all handler
│ ├── role/route.ts # POST: register role, GET: lookup role
│ ├── links/
│ │ ├── route.ts # POST: create link, GET: list links
│ │ └── [token]/route.ts # GET: fetch single link + associated report
│ ├── upload/
│ │ ├── route.ts # POST: direct CSV upload (no token)
│ │ └── [token]/route.ts # POST: token-gated upload + IPFS + AI analysis
│ └── chat/route.ts # POST: invoke AI agent with user message
│
├── core/
│ ├── db/
│ │ ├── index.ts # Drizzle database instance (Neon HTTP driver)
│ │ ├── schema.ts # Application table definitions
│ │ └── seed.ts # Database seed script
│ ├── agent/
│ │ └── pdf-agent.ts # createCsvAgent, invokeCsvAgent, runAutoAnalysis
│ └── tool/
│ ├── chart-generator.ts # LangChain tool: generate_chart
│ ├── auto-chart-generator.ts # LangChain tool: generate_overview_charts
│ ├── report-analyzer.ts # LangChain tool: analyze_sustainability_report
│ └── csv-loader.ts # CSV text-to-JSON parser utility
│
├── components/
│ ├── AnalysisDashboard.tsx # ESG score ring, charts, merits, GRI compliance
│ ├── ChatInterface.tsx # Scrollable chat panel with message history
│ ├── CsvPreview.tsx # Paginated, scrollable raw CSV data table
│ ├── UploadForm.tsx # Drag-and-drop CSV upload form (generic)
│ ├── ConditionalLayout.tsx # Suppresses landing-page Navbar/Footer on app routes
│ ├── ChartRenderer.tsx # Renders Chart.js configs returned by the agent
│ ├── landingpage/
│ │ ├── Navbar.tsx
│ │ ├── HeroScroll.tsx
│ │ ├── Features.tsx
│ │ ├── HowItWorks.tsx
│ │ ├── ValueCreated.tsx
│ │ ├── Footer.tsx
│ │ ├── ScrollProgress.tsx
│ │ └── LenisProvider.tsx # Lenis smooth scroll context
│ └── ui/ # shadcn/ui: Avatar, Badge, Button, Card, Input, Separator
│
├── lib/
│ ├── auth.ts # Better Auth server config (providers, session, adapter)
│ ├── auth-client.ts # createAuthClient export (useSession, signIn, signOut)
│ ├── db/auth-schema.ts # Better Auth managed tables
│ └── utils.ts # cn() helper (clsx + tailwind-merge)
│
├── store/
│ └── uploadStore.ts # Zustand store: pdfUploadId, threadId, autoAnalysis, csvData
│
├── types/
│ └── index.ts # Shared interfaces: CsvUploadResponse, AutoAnalysisResult,
│ # ChatMessage, ChartConfig, AgentResponse, Supplier, Investor
│
├── middleware.ts # Cookie-based route protection
├── drizzle.config.ts # Drizzle Kit configuration
├── migrate.ts # Standalone schema migration script
└── migrate-schema.ts # Standalone Better Auth schema migration
The application uses two separate sets of tables, each managed by its own Drizzle configuration.
suppliers
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | uuid | PK, default | Auto-generated primary key |
| name | text | NOT NULL | Full name sourced from Google profile |
| email | text | UNIQUE | Used as the role identity key |
| created_at | timestamp | DEFAULT NOW() | Account creation timestamp |
investors
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | uuid | PK, default | Auto-generated primary key |
| email | text | UNIQUE | Used as the role identity key |
upload_links
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | uuid | PK, default | Auto-generated primary key |
| token | text | UNIQUE | 48-char hex random token (24 bytes via crypto.randomBytes) |
| investor_user_id | text | NOT NULL | References Better Auth user.id |
| supplier_user_id | text | | Set when a supplier uses the link |
| status | text | DEFAULT pending | `pending` or `used` |
| upload_id | uuid | FK csv_uploads.id | Set on successful upload |
| created_at | timestamp | DEFAULT NOW() | Link creation timestamp |
csv_uploads
| Column | Type | Constraints | Description |
|---|---|---|---|
| id | uuid | PK, default | Auto-generated primary key |
| supplier_id | uuid | FK suppliers | Domain-level supplier reference |
| investor_id | uuid | FK investors | Domain-level investor reference |
| investor_user_id | text | | Auth-level investor user ID |
| supplier_user_id | text | | Auth-level supplier user ID |
| link_token | text | | The token used to submit this upload |
| parsed_data | jsonb | NOT NULL | Full CSV as JSON array of row objects |
| file_name | text | NOT NULL | Original filename as submitted |
| file_hash | text | NOT NULL | SHA-256 hex digest of the raw file buffer |
| file_size | integer | NOT NULL | Size in bytes |
| ipfs_cid | text | | Lighthouse IPFS content identifier |
| ipfs_url | text | | Public IPFS gateway URL for the pinned file |
| analysis_result | jsonb | | Full AutoAnalysisResult object from Gemini |
| upload_locked | boolean | DEFAULT true | Structural immutability flag |
| uploaded_at | timestamp | DEFAULT NOW() | Upload timestamp |
| anomaly_score | integer | DEFAULT 0 | Reserved field for future anomaly detection |
Managed by Better Auth. Not modified by application code.
| Table | Purpose |
|---|---|
| `user` | Core user identity (id, name, email, emailVerified, image) |
| `session` | Active sessions (token, expiresAt, userId) |
| `account` | OAuth account links (providerId, accessToken, etc.) |
| `verification` | Email verification tokens (unused in current config) |
user (Better Auth)
|
|-- investors (email match, email is the FK bridge)
| |-- upload_links (investorUserId = user.id)
| |-- csv_uploads (via uploadId)
|
|-- suppliers (email match)
|-- upload_links (supplierUserId = user.id)
|-- csv_uploads (supplierId FK)
Registers the authenticated user into the `investors` or `suppliers` table.
Request body:
```
{ "role": "investor" | "supplier" }
```

Response:

```json
{ "success": true, "role": "investor" }
```

Uses `onConflictDoNothing` on the email column so repeated calls (e.g., repeated sign-ins) are idempotent.
Returns the current user's registered role.
Response:
```
{ "success": true, "role": "investor" | "supplier" | null }
```

Creates a new one-time upload link. Investor-only by convention (no server-side role check; the link is owned by whoever calls it).
Response:
```
{
  "success": true,
  "link": { "id", "token", "uploadUrl", "status", "createdAt" }
}
```

Returns all links associated with the current user. The server determines the caller's role by checking the `investors` table.
- Investors receive links where `investorUserId` matches their session.
- Suppliers receive links where `supplierUserId` matches their session.
- Used links include a `report` object with file metadata.
Returns a single link by token, including the full report and analysis result if uploaded.
Accepts a multipart form upload from a supplier. Expects a `.csv` file attached as `file`. Performs the full pipeline: validate, hash, parse, store, pin to IPFS, run AI analysis.
Response:
```
{
  "success": true,
  "upload": { "id", "fileName", "fileHash", "fileSize", "uploadedAt" },
  "autoAnalysis": { "summary", "esgScore", "charts", "analysis" }
}
```

Invokes the LangChain agent with a user message in the context of a specific upload.
Request body:
```json
{ "message": "string", "pdfUploadId": "string", "threadId": "string" }
```

Response:
```
{
  "message": "string",
  "chartConfig": { ... },
  "reportAnalysis": { ... },
  "threadId": "string"
}
```

When a file is submitted via `POST /api/upload/[token]`, `runAutoAnalysis()` is called immediately. It uses `ChatGoogleGenerativeAI.withStructuredOutput()` bound to a Zod schema, which instructs Gemini to return a strictly typed JSON object rather than free-form text.
The Zod schema enforces:
- `summary` — string, 1–3 sentence executive summary
- `esgScore` — integer 0–100
- `charts` — array of 2–3 chart definitions (type, title, labels, datasets, insight)
- `merits` — 3–6 strings identifying reporting strengths
- `improvements` — 3–6 strings identifying weaknesses
- `griCompliance.score` — integer 0–100
- `griCompliance.coveredStandards` — GRI standard codes found in the data
- `griCompliance.missingStandards` — GRI standard codes absent from the data
- `recommendations` — 3–5 actionable improvement strings
If the analysis fails (model error, timeout, malformed content), the upload is not rolled back. A fallback AutoAnalysisResult with zero scores and empty arrays is returned so the upload still succeeds and the interface does not break.
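The failure path might look like this sketch; `fallbackAnalysis` and `safeAnalyze` are hypothetical names, and the exact fallback object in the codebase may differ:

```typescript
interface GriCompliance {
  score: number;
  coveredStandards: string[];
  missingStandards: string[];
}

interface AutoAnalysisResult {
  summary: string;
  esgScore: number;
  charts: unknown[];
  merits: string[];
  improvements: string[];
  griCompliance: GriCompliance;
  recommendations: string[];
}

// Zeroed result returned when the model errors out, so the upload
// itself is never rolled back and the UI has a renderable shape.
function fallbackAnalysis(): AutoAnalysisResult {
  return {
    summary: "Automatic analysis was unavailable for this upload.",
    esgScore: 0,
    charts: [],
    merits: [],
    improvements: [],
    griCompliance: { score: 0, coveredStandards: [], missingStandards: [] },
    recommendations: [],
  };
}

// Wrap the model call so an analysis failure never fails the upload.
async function safeAnalyze(
  run: () => Promise<AutoAnalysisResult>
): Promise<AutoAnalysisResult> {
  try {
    return await run();
  } catch {
    return fallbackAnalysis();
  }
}
```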
| Standard | Topic |
|---|---|
| GRI 302 | Energy consumption |
| GRI 303 | Water withdrawal |
| GRI 305 | GHG emissions (Scope 1/2/3) |
| GRI 306 | Waste generation |
| GRI 401 | Employment |
| GRI 403 | Occupational safety |
| GRI 404 | Training and education |
| GRI 405 | Diversity and inclusion |
The chat agent is built with LangChain's createAgent() using the ReAct (Reasoning + Acting) pattern. The full CSV data is embedded in the system prompt for every request, giving the model complete context without vector search or retrieval.
Agent tools:
| Tool Name | Input | Output |
|---|---|---|
| `generate_chart` | Chart type, title, labels, dataset values | Chart.js config JSON |
| `generate_overview_charts` | Automatic based on CSV context | Array of Chart.js configs |
| `analyze_sustainability_report` | CSV context | GRI compliance analysis |
The threadId is passed as a LangGraph configurable parameter, enabling multi-turn conversation memory within a session.
Better Auth manages sessions using signed HTTP-only cookies. Sessions expire after 7 days and are refreshed if they are older than 24 hours. The session cookie name is better-auth.session_token (or __Secure-better-auth.session_token in production with HTTPS).
Aletheia uses a bespoke role system rather than a column on the auth user table. On first sign-in, the selected role is registered by upserting a row into either investors or suppliers with the user's email. Role lookup is done by querying these tables.
This design means:
- A user with a given Google account can only be registered as one role (first registration wins due to the email unique constraint and `onConflictDoNothing`).
- Switching roles requires manual database intervention.
- Role detection is done on the server inside each API route that needs it, not in middleware, which avoids a database call on every request.
middleware.ts uses a cookie presence check (not a database call) to determine authentication status. This is a lightweight guard to redirect unauthenticated users. Actual authorization (checking session validity and user permissions) is done inside each API route handler by calling auth.api.getSession({ headers: request.headers }).
Once a CSV is submitted:
- The `upload_links.status` is set to `used`. No further uploads are accepted for that token.
- The `csv_uploads.uploadLocked` column is set to `true` on insertion and never updated.
- No delete routes exist for CSV records in the public API.
The SHA-256 hash stored in csv_uploads.fileHash allows independent verification:
```bash
sha256sum report.csv
# Compare output to the fileHash value in the database
```

Files pinned via Lighthouse remain on IPFS as long as the Lighthouse account maintains the pin. The IPFS CID is content-addressed, meaning any change to the file produces a completely different CID, making substitution detectable.
ConditionalLayout.tsx wraps the root layout and checks the current pathname. On routes under /upload, /investor, /supplier, and /sign-in, it suppresses the landing-page Navbar and Footer. This avoids duplicating layout logic across every app route while keeping the landing page visually distinct.
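The pathname check can be modeled as a pure function. This is a sketch; the prefixes come from the text above, but the actual component's matching logic may differ:

```typescript
// Routes on which the landing-page Navbar and Footer are suppressed.
const APP_PREFIXES = ["/upload", "/investor", "/supplier", "/sign-in"];

// True when the landing-page chrome should render for this pathname.
function showLandingChrome(pathname: string): boolean {
  return !APP_PREFIXES.some((p) => pathname === p || pathname.startsWith(p + "/"));
}
```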
| Scope | Tool | Persistence |
|---|---|---|
| Server session data | Better Auth / React | Cookie |
| Upload analysis state | Zustand | localStorage |
| Local UI state | React useState | None |
The Zustand store (uploadStore.ts) holds the state needed after a successful upload: the upload ID (for chat), the thread ID, the auto-analysis result, and the parsed CSV structure. All values are persisted to localStorage so a page refresh does not lose the active analysis session.
- `AnalysisDashboard` is a pure presentational component that receives an `AutoAnalysisResult` and renders the ESG score, charts via `ChartRenderer`, and the analysis breakdown.
- `ChatInterface` manages its own local message state and makes fetch calls to `/api/chat`. It receives `pdfUploadId` and `threadId` as props.
- `CsvPreview` receives pre-parsed `headers` and `rows` arrays and handles its own pagination state.
- Lenis provides smooth native scroll behavior and is configured in `LenisProvider.tsx` as a React context wrapping the landing page.
- GSAP handles timeline-based scroll animations on the landing page sections.
- Framer Motion handles page-level entrance animations and transitions on dashboard and profile pages.
| Script | Description |
|---|---|
| `pnpm dev` | Start development server with hot reload |
| `pnpm build` | Build the production bundle |
| `pnpm start` | Start the production server |
| `pnpm lint` | Run ESLint across the codebase |
| `pnpm db:generate` | Generate Drizzle migration files from schema changes |
| `pnpm db:push` | Push schema directly to the database (no migrations) |
| `pnpm db:studio` | Open Drizzle Studio browser UI for database inspection |
| `pnpm db:seed` | Run the seed script to populate initial data |