Skip to content

Security: jaywyawhare/GigaVector

Security

docs/security.md

GigaVector Security Guide

Comprehensive security best practices and guidelines for GigaVector deployments.

Table of Contents

  1. Security Overview
  2. API Key Management
  3. Data Protection
  4. Network Security
  5. Access Control
  6. Secure Coding Practices
  7. Vulnerability Management
  8. Compliance Considerations
  9. Security Checklist

Security Overview

GigaVector handles sensitive data including:

  • User conversations and memories
  • API keys for LLM services
  • Vector embeddings (may contain PII)
  • Metadata with potentially sensitive information

This guide provides best practices for securing GigaVector deployments.


API Key Management

Secure Storage

Never:

  • Hardcode API keys in source code
  • Commit keys to version control
  • Log API keys in application logs
  • Transmit keys over unencrypted channels

Best Practices:

Environment Variables

# Use .env file (not committed to git)
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

# Or use systemd environment file
# /etc/gigavector/environment
OPENAI_API_KEY=sk-...

Secret Management Services

AWS Secrets Manager:

// Retrieve from AWS Secrets Manager
char *api_key = get_secret_from_aws("gigavector/openai-key");

HashiCorp Vault:

// Retrieve from Vault
char *api_key = get_secret_from_vault("secret/data/gigavector/openai");

Kubernetes Secrets:

apiVersion: v1
kind: Secret
metadata:
  name: gigavector-secrets
type: Opaque
stringData:
  openai-api-key: sk-...

Key Rotation

Schedule:

  • Rotate keys every 90 days
  • Rotate immediately if compromised
  • Use key versioning for zero-downtime rotation

Implementation:

// Support multiple keys during rotation
GV_LLMConfig config = {
    .api_key = get_current_api_key(),  // Try primary
    // Fallback to secondary if primary fails
};

Key Validation

GigaVector validates API key formats:

// Automatic validation on creation
GV_LLM *llm = gv_llm_create(&config);
if (llm == NULL) {
    // Key format invalid or other error
    const char *error = gv_llm_get_last_error(llm);
    log_security_event("Invalid API key format");
}

Data Protection

Encryption at Rest

Filesystem Encryption:

# Use LUKS for Linux
cryptsetup luksFormat /dev/sdb1
cryptsetup luksOpen /dev/sdb1 encrypted_volume

# Use BitLocker for Windows
manage-bde -on C: -RecoveryPassword

Application-Level Encryption:

For highly sensitive data, encrypt before storage:

// Encrypt sensitive metadata before storing
char *encrypted_metadata = encrypt_aes256(metadata, encryption_key);
gv_db_add_vector_with_metadata(db, vector, dim, "encrypted_data", encrypted_metadata);

Encryption in Transit

TLS/SSL Requirements:

  • Minimum TLS 1.2
  • Prefer TLS 1.3
  • Use strong cipher suites
  • Validate certificates

LLM API Calls:

GigaVector uses HTTPS for all external API calls:

  • OpenAI: https://api.openai.com (TLS 1.2+)
  • Anthropic: https://api.anthropic.com (TLS 1.2+)
  • Google: https://generativelanguage.googleapis.com (TLS 1.2+)

Internal Communications:

// Use TLS for internal API
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 1L);
curl_easy_setopt(curl, CURLOPT_SSL_VERIFYHOST, 2L);
curl_easy_setopt(curl, CURLOPT_CAINFO, "/etc/ssl/certs/ca-certificates.crt");

Memory Protection

Secure Memory Clearing:

GigaVector securely clears API keys from memory:

// API keys are cleared using secure_memclear()
void gv_llm_destroy(GV_LLM *llm) {
    if (llm->config.api_key) {
        secure_memclear(llm->config.api_key, strlen(llm->config.api_key));
        free(llm->config.api_key);
    }
}

Memory Locking:

For sensitive data, consider locking pages in memory:

#include <sys/mman.h>

// Lock sensitive memory pages
mlock(sensitive_data, data_size);
// ... use data ...
munlock(sensitive_data, data_size);

Data Sanitization

Input Validation:

// Validate and sanitize user input
int validate_conversation(const char *conversation) {
    if (conversation == NULL) return 0;
    
    size_t len = strlen(conversation);
    if (len > MAX_CONVERSATION_LENGTH) return 0;
    
    // Check for injection attempts
    if (strstr(conversation, "<script>") != NULL) return 0;
    
    return 1;
}

Output Encoding:

// Escape user data in outputs
char *escaped = json_escape_string(user_input);
printf("{\"data\": \"%s\"}", escaped);
free(escaped);

Network Security

Firewall Configuration

Restrict Access:

# Allow only necessary ports
ufw allow 22/tcp    # SSH
ufw allow 443/tcp   # HTTPS
ufw deny all

Application Firewall:

// Whitelist allowed IPs
int is_allowed_ip(const char *client_ip) {
    const char *allowed_ips[] = {"10.0.0.0/8", "192.168.1.0/24", NULL};
    return check_ip_whitelist(client_ip, allowed_ips);
}

VPN and Private Networks

Use VPN for:

  • Internal service communications
  • Database access
  • Administrative operations

Private Networks:

  • Deploy GigaVector on private subnets
  • Use load balancers for public access
  • Implement network segmentation

DDoS Protection

Rate Limiting:

// Implement rate limiting
typedef struct {
    time_t window_start;
    int request_count;
    int max_requests;
} RateLimiter;

int check_rate_limit(RateLimiter *limiter) {
    time_t now = time(NULL);
    if (now - limiter->window_start > 60) {
        limiter->window_start = now;
        limiter->request_count = 0;
    }
    if (limiter->request_count >= limiter->max_requests) {
        return 0;  // Rate limit exceeded
    }
    limiter->request_count++;
    return 1;
}

Connection Limits:

// Limit concurrent connections
sem_t connection_semaphore;
sem_init(&connection_semaphore, 0, MAX_CONCURRENT_CONNECTIONS);

sem_wait(&connection_semaphore);
// Process request
sem_post(&connection_semaphore);

Access Control

Authentication

API Key Authentication:

// Validate API keys
int authenticate_request(const char *provided_key) {
    const char *valid_key = get_api_key_from_secure_storage();
    if (valid_key == NULL || strcmp(provided_key, valid_key) != 0) {
        log_security_event("Invalid API key attempt");
        return 0;
    }
    return 1;
}

Token-Based Authentication:

// JWT token validation
int validate_jwt_token(const char *token) {
    // Verify signature
    // Check expiration
    // Validate claims
    return jwt_verify(token, public_key);
}

Authorization

Role-Based Access Control (RBAC):

typedef enum {
    ROLE_READ_ONLY,
    ROLE_READ_WRITE,
    ROLE_ADMIN
} UserRole;

int check_permission(UserRole role, const char *operation) {
    if (role == ROLE_ADMIN) return 1;
    if (role == ROLE_READ_WRITE && strcmp(operation, "write") == 0) return 1;
    if (strcmp(operation, "read") == 0) return 1;
    return 0;
}

Resource-Level Permissions:

// Check if user can access specific database
int can_access_database(const char *user_id, const char *db_id) {
    // Check user's database permissions
    return check_user_permission(user_id, db_id);
}

Audit Logging

Security Events:

void log_security_event(const char *event_type, const char *details) {
    time_t now = time(NULL);
    char timestamp[64];
    strftime(timestamp, sizeof(timestamp), "%Y-%m-%d %H:%M:%S", localtime(&now));
    
    FILE *log = fopen("/var/log/gigavector/security.log", "a");
    fprintf(log, "[%s] [SECURITY] %s: %s\n", timestamp, event_type, details);
    fclose(log);
}

// Log authentication failures
if (!authenticate_request(api_key)) {
    log_security_event("AUTH_FAILURE", "Invalid API key");
    return 401;
}

Secure Coding Practices

Input Validation

Always validate:

  • Vector dimensions
  • Metadata keys and values
  • Conversation lengths
  • API parameters
int validate_vector_input(const float *data, size_t dimension) {
    if (data == NULL) return 0;
    if (dimension == 0 || dimension > MAX_DIMENSION) return 0;
    
    // Check for NaN or Inf
    for (size_t i = 0; i < dimension; i++) {
        if (!isfinite(data[i])) return 0;
    }
    return 1;
}

Buffer Overflow Prevention

GigaVector uses safe string functions:

// Use snprintf instead of sprintf
char buffer[256];
snprintf(buffer, sizeof(buffer), "format: %s", user_input);

// Check return values
int written = snprintf(buffer, size, format, ...);
if (written < 0 || (size_t)written >= size) {
    // Handle truncation
}

Memory Safety

Always:

  • Check malloc return values
  • Free allocated memory
  • Use valgrind to detect leaks
  • Enable sanitizers in development
void *ptr = malloc(size);
if (ptr == NULL) {
    // Handle allocation failure
    return NULL;
}
// ... use ptr ...
free(ptr);
ptr = NULL;  // Prevent use-after-free

Error Handling

Never expose sensitive information in errors:

// Bad: Exposes internal details
fprintf(stderr, "Database error: %s\n", internal_error_message);

// Good: Generic error message
fprintf(stderr, "Database operation failed\n");
log_internal_error(internal_error_message);  // Log internally

Vulnerability Management

Dependency Management

Regular Updates:

  • Update libcurl regularly
  • Monitor CVE databases
  • Use automated dependency scanning
# Check for outdated packages
apt list --upgradable

# Update libcurl
apt-get update && apt-get upgrade libcurl4-openssl-dev

Security Scanning

Static Analysis:

# Use cppcheck
cppcheck --enable=all src/

# Use clang static analyzer
scan-build make

Dynamic Analysis:

# Use AddressSanitizer
make CFLAGS="-fsanitize=address" test

# Use Valgrind
valgrind --leak-check=full ./test_program

Security Advisories

Monitor:

  • GitHub Security Advisories
  • CVE databases
  • Security mailing lists

Response Process:

  1. Assess severity
  2. Test patches
  3. Deploy fixes
  4. Communicate to users

Compliance Considerations

GDPR Compliance

Right to Erasure:

// Implement data deletion
int delete_user_data(GV_Database *db, const char *user_id) {
    // Find all vectors for user
    // Delete vectors and metadata
    // Log deletion
    return 0;
}

Data Portability:

// Export user data
int export_user_data(GV_Database *db, const char *user_id, FILE *output) {
    // Export all user's vectors and metadata
    // Format as JSON
    return 0;
}

HIPAA Considerations

For healthcare data:

  • Use encryption at rest and in transit
  • Implement strict access controls
  • Maintain audit logs
  • Use BAA with cloud providers

SOC 2 Requirements

Security Controls:

  • Access controls
  • Encryption
  • Monitoring and logging
  • Incident response procedures

Security Checklist

Pre-Deployment

  • API keys stored securely (not in code)
  • TLS/SSL configured for all connections
  • Firewall rules configured
  • Access controls implemented
  • Input validation in place
  • Error handling doesn't expose sensitive data
  • Logging configured (no sensitive data)
  • Backup encryption enabled
  • Security scanning completed
  • Dependencies updated

Ongoing

  • Regular security audits
  • Monitor security logs
  • Update dependencies monthly
  • Rotate API keys quarterly
  • Review access permissions quarterly
  • Test backup recovery monthly
  • Review and update security policies

Incident Response

  • Incident response plan documented
  • Security team contacts identified
  • Escalation procedures defined
  • Communication plan ready
  • Forensics capabilities available

Security Contacts

For security issues:

  • Email: security@gigavector.example (replace with actual)
  • GitHub: Open a private security advisory
  • Response Time: Within 24 hours for critical issues

Additional Resources


Remember: Security is an ongoing process, not a one-time setup. Regularly review and update your security practices.

There aren’t any published security advisories