diff --git a/README.md b/README.md
index 049253c..8a1ccc9 100644
--- a/README.md
+++ b/README.md
@@ -105,3 +105,20 @@ You can also pass any native ClickHouse format name directly.
 |------|-------------|
 | `--help` | Show help text |
 | `--version` | Print version |
+
+## Agent Skill
+
+chcli includes an [Agent Skill](https://agentskills.io) so AI coding agents (Claude Code, Cursor, etc.) can query ClickHouse databases on your behalf.
+
+Install the skill:
+
+```bash
+npx skills add obsessiondb/chcli
+```
+
+This gives your agent the ability to:
+
+- Run SQL queries against ClickHouse using `chcli`
+- Explore database schemas (`SHOW TABLES`, `DESCRIBE TABLE`)
+- Extract data in machine-readable formats (JSON, CSV)
+- Follow best practices like using `LIMIT` and structured output formats
diff --git a/skills/clickhouse-query/SKILL.md b/skills/clickhouse-query/SKILL.md
new file mode 100644
index 0000000..ac6025a
--- /dev/null
+++ b/skills/clickhouse-query/SKILL.md
@@ -0,0 +1,147 @@
+---
+name: clickhouse-query
+description: Query ClickHouse databases using the chcli CLI tool. Use when the user wants to run SQL queries against ClickHouse, explore database schemas, inspect tables, or extract data from ClickHouse.
+metadata:
+  author: obsessiondb
+  version: "1.0"
+compatibility: Requires bun or node (for bunx/npx). Needs network access to a ClickHouse instance.
+allowed-tools: Bash(bunx chcli:*) Bash(npx chcli:*) Bash(chcli:*) Read Write
+---
+
+# chcli — ClickHouse CLI
+
+chcli is a lightweight ClickHouse command-line client. Use it to run SQL queries, explore schemas, and extract data from ClickHouse databases.
+
+## Running chcli
+
+Prefer `bunx` if Bun is available, otherwise use `npx`:
+
+```bash
+bunx chcli -q "SELECT 1"
+npx chcli -q "SELECT 1"
+```
+
+Or install globally:
+
+```bash
+bun install -g chcli
+chcli -q "SELECT 1"
+```
+
+## Connection
+
+Set connection details via environment variables (preferred for agent use) or CLI flags. CLI flags override env vars.
+
+| Flag | Env Var | Default |
+|------|---------|---------|
+| `--host` | `CLICKHOUSE_HOST` | `localhost` |
+| `--port` | `CLICKHOUSE_PORT` | `8123` |
+| `-u, --user` | `CLICKHOUSE_USER` | `default` |
+| `--password` | `CLICKHOUSE_PASSWORD` | *(empty)* |
+| `-d, --database` | `CLICKHOUSE_DATABASE` | `default` |
+| `-s, --secure` | `CLICKHOUSE_SECURE` | `false` |
+
+For agent workflows, prefer setting env vars in a `.env` file (Bun loads `.env` automatically) so every invocation uses the same connection without repeating flags.
+
+See `references/connection.md` for detailed connection examples.
+
+## Query Patterns
+
+**Inline query** (most common for agents):
+
+```bash
+bunx chcli -q "SELECT count() FROM events"
+```
+
+**From a SQL file:**
+
+```bash
+bunx chcli -f query.sql
+```
+
+**Via stdin pipe:**
+
+```bash
+echo "SELECT 1" | bunx chcli
+```
+
+## Output Formats
+
+**Always use `-F json` or `-F csv` when the output will be parsed by an agent.** The default format depends on context (`pretty` in an interactive terminal, `tsv` when output is piped), and `pretty` in particular is meant for human display and is difficult to parse programmatically.
+
+```bash
+# JSON — best for structured parsing
+bunx chcli -q "SELECT * FROM events LIMIT 5" -F json
+
+# CSV — good for tabular data
+bunx chcli -q "SELECT * FROM events LIMIT 5" -F csv
+
+# JSONL (one JSON object per line) — good for streaming/large results
+bunx chcli -q "SELECT * FROM events LIMIT 100" -F jsonl
+```
+
+Available format aliases: `json`, `jsonl`/`ndjson`, `jsoncompact`, `csv`, `tsv`, `pretty`, `vertical`, `markdown`, `sql`. Any native ClickHouse format name also works.
+
+See `references/formats.md` for the full format reference.
+
+## Common Workflows
+
+### Schema Discovery
+
+```bash
+# List all databases
+bunx chcli -q "SHOW DATABASES" -F json
+
+# List tables in current database
+bunx chcli -q "SHOW TABLES" -F json
+
+# List tables in a specific database
+bunx chcli -q "SHOW TABLES FROM analytics" -F json
+
+# Describe table schema
+bunx chcli -q "DESCRIBE TABLE events" -F json
+
+# Show CREATE TABLE statement
+bunx chcli -q "SHOW CREATE TABLE events"
+```
+
+### Data Exploration
+
+```bash
+# Row count
+bunx chcli -q "SELECT count() FROM events" -F json
+
+# Sample rows
+bunx chcli -q "SELECT * FROM events LIMIT 10" -F json
+
+# Column statistics
+bunx chcli -q "SELECT uniq(user_id), min(created_at), max(created_at) FROM events" -F json
+```
+
+### Data Extraction
+
+```bash
+# Extract to CSV file
+bunx chcli -q "SELECT * FROM events WHERE date = '2024-01-01'" -F csv > export.csv
+
+# Extract as JSON
+bunx chcli -q "SELECT * FROM events LIMIT 1000" -F json > export.json
+```
+
+## Additional Flags
+
+| Flag | Description |
+|------|-------------|
+| `-t, --time` | Print execution time to stderr |
+| `-v, --verbose` | Print query metadata (format, elapsed time) to stderr |
+| `--help` | Show help text |
+| `--version` | Print version |
+
+## Best Practices for Agents
+
+1. **Always specify `-F json` or `-F csv`** — never rely on the default format, which varies by TTY context.
+2. **Always use `LIMIT`** on SELECT queries unless you know the table is small. ClickHouse tables can contain billions of rows.
+3. **Start with schema discovery** — run `SHOW TABLES` and `DESCRIBE TABLE` before querying unfamiliar databases.
+4. **Use `-t` for timing** — helps gauge whether queries are efficient.
+5. **Prefer env vars for connection** — set them once in `.env` rather than repeating flags on every command.
+6. **Use `count()` first** — before extracting data, check how many rows match to avoid overwhelming output.
diff --git a/skills/clickhouse-query/references/connection.md b/skills/clickhouse-query/references/connection.md
new file mode 100644
index 0000000..3ec739c
--- /dev/null
+++ b/skills/clickhouse-query/references/connection.md
@@ -0,0 +1,94 @@
+# Connection Configuration Reference
+
+chcli connects to ClickHouse over HTTP(S). Connection details can be set via environment variables or CLI flags.
+
+## Precedence
+
+CLI flags take precedence over environment variables. If neither is set, the default value is used.
+
+```
+CLI flag > Environment variable > Default value
+```
+
+## Configuration Options
+
+| Flag | Env Var | Default | Description |
+|------|---------|---------|-------------|
+| `--host` | `CLICKHOUSE_HOST` | `localhost` | ClickHouse server hostname or IP |
+| `--port` | `CLICKHOUSE_PORT` | `8123` | HTTP interface port |
+| `-u, --user` | `CLICKHOUSE_USER` | `default` | Authentication username |
+| `--password` | `CLICKHOUSE_PASSWORD` | *(empty)* | Authentication password |
+| `-d, --database` | `CLICKHOUSE_DATABASE` | `default` | Default database for queries |
+| `-s, --secure` | `CLICKHOUSE_SECURE` | `false` | Use HTTPS instead of HTTP |
+
+## Connection URL
+
+chcli constructs the connection URL as:
+
+```
+{protocol}://{host}:{port}
+```
+
+Where `protocol` is `https` if `--secure` is set or `CLICKHOUSE_SECURE=true`, otherwise `http`.
+
+## Examples
+
+### Local Development (defaults)
+
+No configuration needed — connects to `http://localhost:8123` with user `default`:
+
+```bash
+bunx chcli -q "SELECT 1"
+```
+
+### Remote Instance via CLI Flags
+
+```bash
+bunx chcli \
+  --host ch.example.com \
+  --port 8443 \
+  --secure \
+  --user admin \
+  --password secret \
+  -d analytics \
+  -q "SELECT count() FROM events"
+```
+
+### Remote Instance via Environment Variables
+
+Create a `.env` file (Bun loads it automatically):
+
+```env
+CLICKHOUSE_HOST=ch.example.com
+CLICKHOUSE_PORT=8443
+CLICKHOUSE_SECURE=true
+CLICKHOUSE_USER=admin
+CLICKHOUSE_PASSWORD=secret
+CLICKHOUSE_DATABASE=analytics
+```
+
+Then run queries without connection flags:
+
+```bash
+bunx chcli -q "SELECT count() FROM events"
+```
+
+### ClickHouse Cloud
+
+ClickHouse Cloud uses HTTPS on port 8443:
+
+```env
+CLICKHOUSE_HOST=abc123.us-east-1.aws.clickhouse.cloud
+CLICKHOUSE_PORT=8443
+CLICKHOUSE_SECURE=true
+CLICKHOUSE_USER=default
+CLICKHOUSE_PASSWORD=your-password
+```
+
+### Mixed (Env Vars + Flag Override)
+
+Set base connection in `.env`, override database per-query:
+
+```bash
+bunx chcli -d other_db -q "SHOW TABLES"
+```
diff --git a/skills/clickhouse-query/references/formats.md b/skills/clickhouse-query/references/formats.md
new file mode 100644
index 0000000..f60278e
--- /dev/null
+++ b/skills/clickhouse-query/references/formats.md
@@ -0,0 +1,51 @@
+# Output Formats Reference
+
+chcli supports format aliases that map to ClickHouse native format names. Use the `-F` / `--format` flag to specify the output format.
+
+## Default Behavior
+
+- **TTY (interactive terminal):** `pretty` (PrettyCompactMonoBlock) — human-readable tables
+- **Piped/redirected output:** `tsv` (TabSeparatedWithNames) — machine-friendly tab-separated values
+
+## Format Alias Table
+
+| Alias | ClickHouse Format | Description |
+|-------|-------------------|-------------|
+| `json` | JSON | Full JSON object with `data` array, column metadata, row count, and statistics |
+| `jsonl` | JSONEachRow | One JSON object per line (newline-delimited JSON) |
+| `ndjson` | JSONEachRow | Alias for `jsonl` |
+| `jsoncompact` | JSONCompact | JSON with column names separate from row data (arrays instead of objects) |
+| `csv` | CSVWithNames | Comma-separated values with a header row |
+| `tsv` | TabSeparatedWithNames | Tab-separated values with a header row |
+| `pretty` | PrettyCompactMonoBlock | Human-readable bordered table |
+| `vertical` | Vertical | Each column on its own line (useful for wide rows) |
+| `markdown` | Markdown | Markdown-formatted table |
+| `sql` | SQLInsert | Output as SQL INSERT statements |
+
+Any format name not in this table is passed directly to ClickHouse, so you can use any native ClickHouse format (e.g., `Parquet`, `Arrow`, `Avro`).
+
+## Choosing a Format
+
+### For Agent/Programmatic Use
+
+- **`json`** — Best for structured parsing. Returns a complete JSON object:
+  ```json
+  {
+    "meta": [{"name": "count()", "type": "UInt64"}],
+    "data": [{"count()": "42"}],
+    "rows": 1,
+    "statistics": {"elapsed": 0.001, "rows_read": 100}
+  }
+  ```
+- **`jsonl`** — Best for large result sets or streaming. One JSON object per line:
+  ```
+  {"id": 1, "name": "Alice"}
+  {"id": 2, "name": "Bob"}
+  ```
+- **`csv`** — Good for tabular data and import/export workflows.
+
+### For Human Display
+
+- **`pretty`** — Default in terminals. Bordered table layout.
+- **`vertical`** — Useful when rows have many columns. Each row displayed vertically.
+- **`markdown`** — Useful for embedding query results in documentation.
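
The alias table in `formats.md` amounts to a simple lookup with a pass-through fallback, plus a TTY-dependent default. As an illustration only, here is a hypothetical Bash function, `resolve_format`, that mirrors the documented mapping. It is not part of chcli and not its actual implementation; the function name and its second `is_tty` argument are assumptions made for this sketch, and alias matching here is exact and case-sensitive, which chcli may or may not be.

```bash
# Hypothetical sketch mirroring the alias table in formats.md.
# NOT chcli's source code; chcli's real lookup may differ.
#
# resolve_format NAME [IS_TTY]
#   NAME    value passed to -F/--format (empty = no -F given)
#   IS_TTY  1 if stdout is an interactive terminal, else 0 (assumed argument)
resolve_format() {
  local name="$1" is_tty="${2:-0}"

  # No -F given: use the documented default for the output context.
  if [ -z "$name" ]; then
    if [ "$is_tty" = "1" ]; then name="pretty"; else name="tsv"; fi
  fi

  case "$name" in
    json)         echo "JSON" ;;
    jsonl|ndjson) echo "JSONEachRow" ;;
    jsoncompact)  echo "JSONCompact" ;;
    csv)          echo "CSVWithNames" ;;
    tsv)          echo "TabSeparatedWithNames" ;;
    pretty)       echo "PrettyCompactMonoBlock" ;;
    vertical)     echo "Vertical" ;;
    markdown)     echo "Markdown" ;;
    sql)          echo "SQLInsert" ;;
    # Any other name is passed through unchanged as a native ClickHouse format.
    *)            echo "$name" ;;
  esac
}

# Example: check the TTY in the caller's own context, then resolve.
if [ -t 1 ]; then is_tty=1; else is_tty=0; fi
resolve_format "" "$is_tty"   # no -F given: result depends on the TTY
resolve_format ndjson         # prints JSONEachRow
```

For example, `resolve_format Parquet` prints `Parquet` unchanged. The TTY status is passed in explicitly because `[ -t 1 ]` evaluated inside a command substitution such as `$(resolve_format "")` always sees a pipe, so a function that tested its own stdout would report `tsv` even when run from a terminal.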