paazmaya.fi

The Website of Juga Paazmaya | Stories about web development, hardware prototyping, and education

Vector Database Provider Comparison

This post compares vector database providers for the docling-japanese-books project, focusing on storage requirements, costs, and capabilities for Japanese document processing.

All pricing information for each service provider was gathered from the linked sources in November 2025.

Provider Positioning Overview

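[Figure: Vector Database Provider Positioning. Quadrant chart with a cost axis (Low Cost to High Cost) and a scale axis (Small Scale to Large Scale). Quadrants: Enterprise Scale / Premium, Enterprise Scale / Budget, Starter / Budget, Starter / Premium. Providers plotted: Local Milvus, Weaviate, Pinecone, LanceDB Cloud, Chroma Cloud, Qdrant Cloud, Zilliz Cloud.]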

Quadrant Explanation:

  • Q1 (Enterprise Scale / Premium): High-performance, feature-rich solutions for large deployments
  • Q2 (Enterprise Scale / Budget): Cost-effective solutions that can handle large scale
  • Q3 (Starter / Budget): Affordable options for small to medium projects
  • Q4 (Starter / Premium): High-cost solutions better suited for smaller, specialized use cases

Storage Requirements Analysis

Project Configuration

  • Target Content: 80-page Japanese books
  • Chunking Strategy: Late Chunking with ~400 characters per chunk
  • Overlap: 10% between chunks for context preservation

Table 1. Storage Calculations per Book

Content Type | Chars/Page | Chunks/Book | Storage/Book | Books in 5GB
Dense Classical | 600 | 132 | 0.6 MB | ~7,900
Mixed Content | 450 | 99 | 0.5 MB | ~10,600
Modern Layout | 300 | 66 | 0.3 MB | ~15,900

Storage Breakdown per Chunk:

  • Vector embedding: 4KB (1024 dimensions × 4 bytes float32)
  • Metadata: ~1KB (text content + document info)
  • Total per chunk: ~5KB
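
The figures in Table 1 can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming that the 10% overlap raises the chunk count by 10%, that "5GB" means 5 GiB, and that every chunk costs a flat 5 KiB. The function and constant names are illustrative and are not taken from the docling-japanese-books code.

```python
PAGES_PER_BOOK = 80
CHARS_PER_CHUNK = 400
OVERLAP = 0.10                   # assumption: overlap adds 10% more chunks
VECTOR_BYTES = 1024 * 4          # 1024 dimensions x 4 bytes (float32)
METADATA_BYTES = 1024            # ~1KB of text content and document info
BYTES_PER_CHUNK = VECTOR_BYTES + METADATA_BYTES


def chunks_per_book(chars_per_page: int) -> int:
    """Number of chunks for one 80-page book, including overlap."""
    base_chunks = PAGES_PER_BOOK * chars_per_page / CHARS_PER_CHUNK
    return round(base_chunks * (1 + OVERLAP))


def storage_bytes_per_book(chars_per_page: int) -> int:
    """Vector plus metadata storage for one book, in bytes."""
    return chunks_per_book(chars_per_page) * BYTES_PER_CHUNK


def books_in(capacity_gib: int, chars_per_page: int) -> int:
    """How many whole books fit into the given capacity (GiB)."""
    return int(capacity_gib * 1024**3 // storage_bytes_per_book(chars_per_page))


for label, chars in [("Dense Classical", 600),
                     ("Mixed Content", 450),
                     ("Modern Layout", 300)]:
    megabytes = storage_bytes_per_book(chars) / 1024**2
    print(f"{label}: {chunks_per_book(chars)} chunks, "
          f"{megabytes:.1f} MB, {books_in(5, chars):,} books in 5GB")
```

Under these assumptions the script gives 7,943, 10,591, and 15,887 books for the three content types, which the table rounds to ~7,900, ~10,600, and ~15,900.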

Vector Database Provider Comparison

1. Zilliz Cloud

Sources: Zilliz Cloud Pricing, Zilliz Free Trials

Free Plan (Permanent):

  • Storage: 5GB free forever (enough for 1M 768-dim vectors)
  • Capacity: 7,000-15,000 books (80 pages each)
  • Compute: 2.5M vCUs per month
  • Collections: Up to 5 collections
  • Features: Full Serverless cluster features, Milvus compatibility, Direct Milvus client support
  • No credit card required to start

Free Trial (Additional):

  • Credits: $100 free credits for 30 days (with work email)
  • Access: Serverless and Dedicated plans during trial
  • Extended: Credits expire in 1 year if payment method added
  • After Trial: Organization frozen, clusters moved to recycle bin (30-day recovery)

Paid Tiers:

  • Starter: $0.10/GB/month + compute
  • Standard: Volume discounts, dedicated resources
  • Enterprise: Custom pricing, SLA guarantees

Pros:

  • Native Milvus compatibility (no code changes needed)
  • Excellent free tier for development/research (5GB permanent)
  • Global CDN and edge locations
  • Built-in backup and disaster recovery
  • Advanced security features
  • Full Serverless cluster features included
  • Enterprise-grade infrastructure

Cons:

  • Newer service (less market presence than some competitors)
  • Pricing can scale up quickly for large datasets
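
The Milvus compatibility listed above means that, in principle, the same `pymilvus` client code can point at either a local Milvus Lite file or a Zilliz Cloud endpoint. The following is a rough sketch of that switch, assuming the connection is configured through environment variables. The variable names `MILVUS_URI` and `MILVUS_TOKEN` and the local file name are hypothetical and not taken from the project.

```python
import os
from typing import Mapping


def milvus_connection_kwargs(env: Mapping[str, str]) -> dict:
    """Build keyword arguments for pymilvus.MilvusClient from settings."""
    uri = env.get("MILVUS_URI")  # hypothetical variable name
    if not uri:
        # No remote endpoint configured: use a local Milvus Lite database file.
        return {"uri": "./docling_japanese_books.db"}  # hypothetical file name
    kwargs = {"uri": uri}
    token = env.get("MILVUS_TOKEN")  # hypothetical variable name
    if token:
        # Zilliz Cloud endpoints authenticate with an API key passed as token.
        kwargs["token"] = token
    return kwargs


def connect():
    """Create a client for whichever backend the environment selects."""
    # Imported lazily so the helper above works without pymilvus installed.
    from pymilvus import MilvusClient
    return MilvusClient(**milvus_connection_kwargs(os.environ))
```

With no variables set, `connect()` opens the local file. Setting both variables to the endpoint and API key of a Zilliz Cloud cluster would redirect the same code to the managed service.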

2. Pinecone

Sources: Pinecone Pricing, Pinecone Documentation

Free Tier (Starter Plan - Permanent):

  • Storage: Up to 2GB free forever
  • Capacity: ~4,000 books (80 pages each)
  • Write Units: Up to 2M/month
  • Read Units: Up to 1M/month
  • Index Types: Dense and Sparse supported
  • Indexes: Up to 5 indexes
  • Namespaces: 100 per index
  • Embedding Models: All available models included (5M tokens/month for select models)
  • Console Metrics: Included
  • Users: Up to 2 users per organization
  • Limitations: Single region only (AWS us-east-1), no backups or restore, community support only (no SLA), no SAML SSO/service accounts/API key RBAC, no private endpoints or audit logs

Free Trial (Standard Plan):

  • Credits: $300 credit for 3 weeks
  • Access: All Standard plan features during trial
  • After Trial: $50/month minimum + pay-as-you-go

Paid Tiers:

  • Standard: $50/month minimum + pay-as-you-go
  • Enterprise: Custom pricing with enhanced support and compliance features

Current Pricing (Oct 2025):

  • Write Units: $0.4/1M writes
  • Read Units: $0.5/1M reads
  • Storage: Varies by region (~$0.096-0.128/GB/month)
  • Additional features: SAML SSO, API key RBAC, object storage import

Pros:

  • Mature platform with excellent performance
  • Generous 2GB free tier (vs previous 100MB)
  • Simple API and comprehensive documentation
  • Built-in monitoring and metrics
  • Strong ecosystem and integrations
  • Multi-cloud availability (AWS, GCP, Azure)

Cons:

  • Free tier limited to single region (us-east-1)
  • $50/month minimum after trial period
  • Can become expensive for high-volume applications
  • Vendor lock-in (proprietary API)

3. Weaviate Cloud

Sources: Weaviate Pricing, Serverless Cloud

Free Trial (Sandbox - 14 Days Only):

  • Duration: 14 days only (NOT permanent)
  • Features: Full core DB toolkit during trial (hybrid search BM25 + vector, dynamic index, compression and multi-tenancy, RBAC baseline security)
  • Support: Community support via Slack and Forum
  • After Trial: Must upgrade to paid plan ($45/month minimum)
  • No permanent free tier available

Paid Tiers (Serverless Cloud):

  • Flex: $45/month minimum (pay-as-you-go, shared cloud)
  • Plus: $280/month minimum (prepaid, shared or dedicated)
  • Premium: Custom pricing (dedicated infrastructure, white-glove support)

Current Pricing (Nov 2025):

For 10,000 books (1.02 billion vector dimensions):

  • Flex: ~$760/month (estimated, usage-based)
  • Plus: ~$280/month base + usage
  • Premium: Custom pricing (enterprise)

Pros:

  • Built-in hybrid search (BM25 + vector) out of the box
  • GraphQL interface for complex queries
  • Strong semantic capabilities and AI integrations
  • Multiple SLA tiers with 24/7 support options
  • Good open-source version available
  • Quantization options (PQ, BQ) for cost optimization

Cons:

  • No persistent free tier (14-day sandbox only)
  • Expensive for large document collections (an estimated $122+/month for 10,000 books, against a $45/month plan minimum)
  • More complex setup than Milvus/Pinecone
  • Higher learning curve for GraphQL queries

4. Qdrant Cloud

Sources: Qdrant Pricing, Qdrant Calculator

Free Tier (Permanent):

  • Storage: 1GB free forever cluster
  • Capacity: ~2,600 books (80 pages each)
  • No credit card required to start
  • Features: Fully managed with all core features (central cluster management, multiple cloud providers AWS/GCP/Azure and regions, horizontal & vertical scaling, monitoring/logging/alerting, high availability and auto-healing, backup & disaster recovery, zero-downtime upgrades, unlimited users, standard support and uptime SLAs)
  • Performance: Rust-based, very fast

Paid Tiers:

  • Hybrid Cloud: $0.014/hour = $10.08/month base cluster + scaling costs
  • Private Cloud: Custom pricing (enterprise)
  • Marketplace: Available on AWS, GCP, and Azure marketplaces

Current Pricing (Oct 2025):

For 10,000 books (~4GB storage):

  • Base cluster: $10.08/month (720 hours × $0.014)
  • Storage scaling: ~$1.50/month (estimated 3GB overage × $0.50/GB)
  • Total estimated: $11.58/month for 10K books
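
The estimate above is a base cluster fee plus a storage overage. As a small sketch, assuming 720 billable hours per month, 1GB of included storage, and the $0.50/GB overage rate, which is this post's estimate rather than a published Qdrant price:

```python
HOURS_PER_MONTH = 720


def qdrant_monthly_cost(storage_gb: float,
                        hourly_rate: float = 0.014,
                        included_storage_gb: float = 1.0,
                        overage_per_gb: float = 0.50) -> float:
    """Estimated monthly cost in USD: base cluster plus storage overage."""
    base = HOURS_PER_MONTH * hourly_rate
    overage_gb = max(0.0, storage_gb - included_storage_gb)
    return round(base + overage_gb * overage_per_gb, 2)


print(qdrant_monthly_cost(4))  # 10,000 books at ~4GB
```

For 4GB this reproduces the $11.58 figure. The parameters are exposed so that the rates can be swapped once real usage numbers are available.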

Pros:

  • Excellent performance (Rust implementation)
  • Permanent 1GB free tier that includes the full managed feature set (high availability, backups, monitoring)
  • Advanced filtering and search capabilities
  • Open source with strong community
  • Multi-cloud deployment options
  • No vendor lock-in (can self-host)
  • Built-in quantization and optimization features

Cons:

  • Smaller managed cloud ecosystem compared to Pinecone/Weaviate
  • Would require API migration from Milvus
  • Scaling costs can vary significantly by configuration
  • Less documentation for complex enterprise scenarios

5. Chroma Cloud

Sources: Chroma Pricing, Chroma Documentation

Free Tier (Starter Plan):

  • Credits: $5 free credits + $0/month base
  • Usage-Based: Pay only for what you use after credits
  • Capacity: ~1,000-2,000 books with free credits
  • Features: Vector, full-text, metadata search, Apache 2.0 open source
  • Team: Up to 10 databases, 10 team members

Pricing Model (Usage-Based):

  • Writes: $2.50 per GiB written
  • Storage: $0.33 per GiB/month stored
  • Queries: $0.0075 per TiB queried + $0.09 per GiB returned
  • Example: $79/month for 1M docs written, 6M docs stored, 10M queries

Paid Tiers:

  • Team: $250/month + usage, $100 credits included, up to 100 databases
  • Enterprise: Custom pricing, unlimited databases, single tenant clusters

Current Pricing (Nov 2025):

For 10,000 books (~5GB storage, moderate usage):

  • Storage: $1.65/month (5GB × $0.33)
  • Writes: ~$12.50 (assuming 5GB written once)
  • Queries: ~$8-15/month (moderate query volume)
  • Total estimated: $20-30/month
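
In this estimate the write charge is paid once, when the collection is ingested, while storage and queries recur every month. The sketch below separates the two using the rates listed under the pricing model. The query volumes in the example call (8 TiB scanned, 100 GiB returned) are hypothetical placeholders, not measurements from the project.

```python
def chroma_costs(stored_gib: float, written_gib: float,
                 queried_tib: float, returned_gib: float) -> tuple[float, float]:
    """Return (one_time_usd, recurring_monthly_usd) for Chroma Cloud usage."""
    one_time = written_gib * 2.50            # $2.50 per GiB written
    monthly = (stored_gib * 0.33             # $0.33 per GiB/month stored
               + queried_tib * 0.0075        # $0.0075 per TiB queried
               + returned_gib * 0.09)        # $0.09 per GiB returned
    return round(one_time, 2), round(monthly, 2)


# 5 GiB stored and written once; query volumes are hypothetical.
ingest_cost, monthly_cost = chroma_costs(5, 5, 8, 100)
print(f"one-time ingest: ${ingest_cost:.2f}, recurring: ${monthly_cost:.2f}/month")
```

With these placeholder volumes the ingest costs $12.50 once and the recurring part is $10.71 per month, so the first month lands inside the $20-30 range and later months fall below it.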

Pros:

  • Excellent open source foundation (Apache 2.0)
  • Multi-modal search (vector, full-text, metadata, regex)
  • Strong developer ecosystem and integrations
  • Competitive pricing for storage and queries
  • Easy local development with seamless cloud migration
  • Built-in collaboration features and dashboard
  • Strong community support (21k+ GitHub stars)

Cons:

  • Newer cloud service (less mature than Pinecone/Weaviate)
  • Would require API migration from Milvus
  • Usage-based pricing can be unpredictable for high-volume applications
  • Limited enterprise features compared to established providers

6. LanceDB Cloud

Sources: LanceDB Pricing, LanceDB Cloud Docs

Free Credits (One-Time Only):

  • Credits: $100 one-time free credits
  • Not Permanent: Credits-based, no permanent free tier
  • Usage-Based: Pay only for what you use (writes, queries, storage)
  • Capacity: ~5,000-8,000 books with free credits
  • Features: Serverless, multimodal storage, enterprise security
  • Migration: Seamless from LanceDB OSS (just change connection URL)

Pricing Model (Usage-Based):

  • Writes: $6.20 per 1M vectors written
  • Queries: $15.26 per 1M queries
  • Storage: $2.05 per GB/month
  • Total Example: $23.50/month for 1M writes + 1M queries + moderate storage

Current Pricing (Nov 2025):

For 10,000 books (~5GB storage, moderate usage):

  • Storage: $10.25/month (5GB × $2.05)
  • Writes: $6.20 (assuming 1M vectors/month)
  • Queries: $15.26 (assuming 1M queries/month)
  • Total estimated: $31.71/month
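
The same total can be derived from the three usage rates. This is a minimal sketch that assumes the rates listed under the pricing model and the stated volumes of 1M writes and 1M queries per month; the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanceDBRates:
    """Usage rates in USD, as listed in this post (November 2025)."""
    per_million_writes: float = 6.20
    per_million_queries: float = 15.26
    per_gb_month: float = 2.05


def lancedb_monthly_cost(storage_gb: float, million_writes: float,
                         million_queries: float,
                         rates: LanceDBRates = LanceDBRates()) -> float:
    """Estimated monthly cost in USD for the given storage and usage."""
    total = (storage_gb * rates.per_gb_month
             + million_writes * rates.per_million_writes
             + million_queries * rates.per_million_queries)
    return round(total, 2)


print(lancedb_monthly_cost(storage_gb=5, million_writes=1, million_queries=1))
```

For 5GB, 1M writes, and 1M queries this yields $31.71. Nearly half of that is query cost, which is why high traffic dominates the bill under this model.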

Pros:

  • True serverless with automatic scaling to zero
  • Multimodal storage (vectors, text, images together)
  • Lance columnar format for high performance
  • Enterprise security (SOC2, HIPAA compliance)
  • Open source compatibility (easy migration)
  • Pay-per-use model (cost-effective for variable workloads)
  • Built-in observability and monitoring
  • $100 free credits for getting started

Cons:

  • Newer service (public beta, GA coming soon)
  • Usage-based pricing can be unpredictable for high-volume applications
  • Query costs can add up with high traffic ($15.26/1M queries)
  • Would require API migration from Milvus
  • Limited documentation compared to established providers
  • No permanent free tier (credits-based only)

7. Local Milvus Lite

Costs:

  • Storage: Limited only by local disk space
  • Capacity: “Unlimited” books (hardware dependent)

Pros:

  • Complete control and privacy
  • No usage-based costs
  • Full feature access
  • Ideal for development and testing

Cons:

  • No automatic scaling
  • Manual backup and maintenance
  • Single point of failure
  • No built-in redundancy

Free Tier Detailed Comparison

This section compares the free tier offerings side by side to help you choose an option for getting started at no cost.

Table 1: Permanent Free Tiers

Provider | Storage | Capacity (Books) | Key Features | Limitations | Best For
Zilliz Cloud | 5GB | 7,000-15,000 | Full Serverless features, 2.5M vCUs/mo, 5 collections | None (full features) | Research, development
Pinecone | 2GB | ~4,000 | 2M writes/mo, 1M reads/mo, 5 indexes, all embeddings | Single region (us-east-1), no backups | Small projects, prototypes
Qdrant | 1GB | ~2,600 | Full managed features, multi-cloud, HA, backups | Smaller storage capacity | Cost-sensitive projects
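
To see which permanent free tier can hold a given collection, the storage limits can be checked against the per-book estimate. This sketch assumes the Mixed Content figure from the storage calculations (99 chunks at ~5KB each) and considers storage only. Monthly read, write, and compute quotas are not modelled, and the names are illustrative.

```python
FREE_TIER_GIB = {"Zilliz Cloud": 5, "Pinecone": 2, "Qdrant": 1}
BYTES_PER_BOOK = 99 * 5 * 1024   # Mixed Content: 99 chunks x ~5KB


def free_tiers_that_fit(books: int,
                        bytes_per_book: int = BYTES_PER_BOOK) -> list[str]:
    """Names of permanent free tiers whose storage limit covers the books."""
    needed_bytes = books * bytes_per_book
    return [name for name, gib in FREE_TIER_GIB.items()
            if needed_bytes <= gib * 1024**3]


for count in (1_000, 3_000, 8_000):
    print(count, free_tiers_that_fit(count))
```

Under these assumptions 1,000 books fit all three tiers, 3,000 books fit Zilliz Cloud and Pinecone, and 8,000 books fit only Zilliz Cloud.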

Table 2: Trial Credits and Limited-Time Free Tiers

Provider | Credits/Duration | After Trial | Features During Trial | Best For
Chroma | $5 one-time | Pay-as-you-go | Multi-modal search, open source | Open source testing
LanceDB | $100 one-time | Pay-as-you-go | Full serverless, multimodal | Initial exploration
Weaviate | 14 days | $45/month minimum | Full features, hybrid search | Short-term testing
Zilliz | $100 (30 days) | Free plan remains | Dedicated/Serverless plans (bonus) | Advanced testing

Table 3: Monthly and Annual Cost Projections for 10,000 Books (≈ 5GB)

Provider | Monthly Cost | Annual Cost | Notes
Zilliz Cloud | $0 → $50 | $0 → $600 | 5GB free tier → Standard
Qdrant | $0 → $12 | $0 → $144 | 1GB free tier, $10/month base
Chroma | $0 → $25 | $0 → $300 | $5 free credits → usage-based
LanceDB | $0 → $32 | $0 → $384 | $100 free credits → usage-based
Pinecone | $50-75 | $600-900 | 2GB free, $50/month minimum
Weaviate | $122+ | $1,467+ | No free tier, $45/month minimum
Local Milvus | $20-50 | $240-600 | Hardware/hosting only

Conclusion

For the docling-japanese-books project processing 80-page Japanese books:

  1. Best Free Option: Zilliz Cloud (5GB permanent = 7,000-15,000 books FREE forever)
  2. Best Free Alternative: Pinecone (2GB permanent = ~4,000 books)
  3. Best for Development: Local Milvus Lite (unlimited, no cloud provider costs)