This post compares vector database providers for the docling-japanese-books project, focusing on storage requirements, costs, and capabilities for Japanese document processing.
All pricing information for each service provider was gathered from the linked sources in November 2025.
Provider Positioning Overview
Quadrant Explanation:
- Q1 (Enterprise Scale / Premium): High-performance, feature-rich solutions for large deployments
- Q2 (Enterprise Scale / Budget): Cost-effective solutions that can handle large scale
- Q3 (Starter / Budget): Affordable options for small to medium projects
- Q4 (Starter / Premium): High-cost solutions better suited for smaller, specialized use cases
Storage Requirements Analysis
Project Configuration
- Target Content: 80-page Japanese books
- Chunking Strategy: Late Chunking with ~400 characters per chunk
- Overlap: 10% between chunks for context preservation
Table 1. Storage Calculations per Book
| Content Type | Chars/Page | Chunks/Book | Storage/Book | Books in 5GB |
|---|---|---|---|---|
| Dense Classical | 600 | 132 chunks | 0.6 MB | ~7,900 |
| Mixed Content | 450 | 99 chunks | 0.5 MB | ~10,600 |
| Modern Layout | 300 | 66 chunks | 0.3 MB | ~15,900 |
Storage Breakdown per Chunk:
- Vector embedding: 4KB (1024 dimensions × 4 bytes float32)
- Metadata: ~1KB (text content + document info)
- Total per chunk: ~5KB
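The figures in the storage table can be reproduced with a few lines of arithmetic. The sketch below assumes that the 10% overlap simply adds 10% more chunks and that 1 GB = 1024 × 1024 KB; the function names are illustrative and not part of the project code.

```python
# Rough storage model for one 80-page book, using the figures above.
PAGES_PER_BOOK = 80
CHUNK_CHARS = 400      # ~400 characters per chunk
OVERLAP = 0.10         # 10% overlap, modeled as 10% extra chunks (assumption)
KB_PER_CHUNK = 5       # 4 KB float32 vector (1024 dims) + ~1 KB metadata


def chunks_per_book(chars_per_page: int) -> int:
    """Chunks needed for one book, including overlap."""
    base_chunks = chars_per_page * PAGES_PER_BOOK / CHUNK_CHARS
    return round(base_chunks * (1 + OVERLAP))


def storage_kb_per_book(chars_per_page: int) -> int:
    """Approximate storage per book in KB."""
    return chunks_per_book(chars_per_page) * KB_PER_CHUNK


def books_that_fit(storage_gb: int, chars_per_page: int) -> int:
    """Whole books that fit in the given storage (1 GB = 1024 * 1024 KB)."""
    return storage_gb * 1024 * 1024 // storage_kb_per_book(chars_per_page)


for label, chars in [("Dense Classical", 600), ("Mixed Content", 450), ("Modern Layout", 300)]:
    print(label, chunks_per_book(chars), storage_kb_per_book(chars), books_that_fit(5, chars))
```

Swapping in a different storage allowance (for example 2 or 1 instead of 5) gives a quick capacity estimate for the smaller free tiers discussed below.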
Vector Database Provider Comparison
1. Zilliz Cloud
Sources: Zilliz Cloud Pricing, Zilliz Free Trials
Free Plan (Permanent):
- Storage: 5GB free forever (enough for 1M 768-dim vectors)
- Capacity: 7,000-15,000 books (80 pages each)
- Compute: 2.5M vCUs per month
- Collections: Up to 5 collections
- Features: Full Serverless cluster features, Milvus compatibility, Direct Milvus client support
- No credit card required to start
Free Trial (Additional):
- Credits: $100 free credits for 30 days (with work email)
- Access: Serverless and Dedicated plans during trial
- Extended: Credits stay valid for 1 year if a payment method is added
- After Trial: Organization frozen, clusters moved to recycle bin (30-day recovery)
Paid Tiers:
- Starter: $0.10/GB/month + compute
- Standard: Volume discounts, dedicated resources
- Enterprise: Custom pricing, SLA guarantees
Pros:
- Native Milvus compatibility (no code changes needed)
- Excellent free tier for development/research (5GB permanent)
- Global CDN and edge locations
- Built-in backup and disaster recovery
- Advanced security features
- Full Serverless cluster features included
- Enterprise-grade infrastructure
Cons:
- Newer service (less market presence than some competitors)
- Pricing can scale up quickly for large datasets
2. Pinecone
Sources: Pinecone Pricing, Pinecone Documentation
Free Tier (Starter Plan - Permanent):
- Storage: Up to 2GB free forever
- Capacity: ~4,000 books (80 pages each)
- Write Units: Up to 2M/month
- Read Units: Up to 1M/month
- Index Types: Dense and Sparse supported
- Indexes: Up to 5 indexes
- Namespaces: 100 per index
- Embedding Models: All available models included (5M tokens/month for select models)
- Console Metrics: Included
- Users: Up to 2 users per organization
- Limitations: Single region only (AWS us-east-1), no backups or restore, community support only (no SLA), no SAML SSO/service accounts/API key RBAC, no private endpoints or audit logs
Free Trial (Standard Plan):
- Credits: $300 credit for 3 weeks
- Access: All Standard plan features during trial
- After Trial: $50/month minimum + pay-as-you-go
Paid Tiers:
- Standard: $50/month minimum + pay-as-you-go
- Enterprise: Custom pricing with enhanced support and compliance features
Current Pricing (Oct 2025):
- Write Units: $0.4/1M writes
- Read Units: $0.5/1M reads
- Storage: Varies by region (~$0.096-0.128/GB/month)
- Additional features: SAML SSO, API key RBAC, object storage import
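To see how the $50/month minimum interacts with the unit prices above, here is a minimal sketch. It assumes the minimum acts as a floor on the monthly bill (usage below $50 is still billed at $50), which should be checked against the Pinecone pricing page; the workload numbers in the example are hypothetical.

```python
def pinecone_standard_bill(storage_gb, write_units_m, read_units_m,
                           storage_rate=0.128, write_rate=0.4, read_rate=0.5,
                           monthly_minimum=50.0):
    """Return (billed, raw_usage) in USD for one month.

    Rates are the ones listed above (per GB/month and per 1M units).
    Treating the minimum as a floor is an assumption, not confirmed billing logic.
    """
    raw_usage = (storage_gb * storage_rate
                 + write_units_m * write_rate
                 + read_units_m * read_rate)
    return max(monthly_minimum, raw_usage), raw_usage


# Hypothetical workload: 5 GB stored, 2M write units, 10M read units per month.
billed, usage = pinecone_standard_bill(5, 2, 10)
print(f"usage ${usage:.2f}, billed ${billed:.2f}")
```

Under these assumptions a 10,000-book collection stays well below the minimum, so the minimum itself dominates the monthly cost.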
Pros:
- Mature platform with excellent performance
- Generous 2GB free tier (vs previous 100MB)
- Simple API and comprehensive documentation
- Built-in monitoring and metrics
- Strong ecosystem and integrations
- Multi-cloud availability (AWS, GCP, Azure)
Cons:
- Free tier limited to single region (us-east-1)
- $50/month minimum after trial period
- Can become expensive for high-volume applications
- Vendor lock-in (proprietary API)
3. Weaviate Cloud
Sources: Weaviate Pricing, Serverless Cloud
Free Trial (Sandbox - 14 Days Only):
- Duration: 14 days only (NOT permanent)
- Features: Full core DB toolkit during trial (hybrid search BM25 + vector, dynamic index, compression and multi-tenancy, RBAC baseline security)
- Support: Community support via Slack and Forum
- After Trial: Must upgrade to paid plan ($45/month minimum)
- No permanent free tier available
Paid Tiers (Serverless Cloud):
- Flex: $45/month minimum (pay-as-you-go, shared cloud)
- Plus: $280/month minimum (prepaid, shared or dedicated)
- Premium: Custom pricing (dedicated infrastructure, white-glove support)
Current Pricing (Nov 2025):
For 10,000 books (1.02 billion vector dimensions):
- Flex: ~$760/month (estimated, usage-based)
- Plus: ~$280/month base + usage
- Premium: Custom pricing (enterprise)
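The dimension count quoted above can be sanity-checked as follows. This sketch assumes about 100 chunks per book (a round figure close to the 99 chunks of the Mixed Content profile) and 1024-dimensional embeddings; the function name is illustrative.

```python
def stored_dimensions(books: int, chunks_per_book: int, dims: int = 1024) -> int:
    """Total number of vector dimensions stored across all books."""
    return books * chunks_per_book * dims


total = stored_dimensions(10_000, 100)
print(f"{total / 1e9:.2f} billion dimensions")
```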
Pros:
- Built-in hybrid search (BM25 + vector) out of the box
- GraphQL interface for complex queries
- Strong semantic capabilities and AI integrations
- Multiple SLA tiers with 24/7 support options
- Good open-source version available
- Quantization options (PQ, BQ) for cost optimization
Cons:
- No persistent free tier (14-day sandbox only)
- Expensive for large document collections (an estimated $122+/month for 10,000 books, well above the $45/month minimum)
- More complex setup than Milvus/Pinecone
- Higher learning curve for GraphQL queries
4. Qdrant Cloud
Sources: Qdrant Pricing, Qdrant Calculator
Free Tier (Permanent):
- Storage: 1GB free forever cluster
- Capacity: ~2,600 books (80 pages each)
- No credit card required to start
- Features: Fully managed with all core features (central cluster management, multiple cloud providers AWS/GCP/Azure and regions, horizontal & vertical scaling, monitoring/logging/alerting, high availability and auto-healing, backup & disaster recovery, zero-downtime upgrades, unlimited users, standard support and uptime SLAs)
- Performance: Rust-based, very fast
Paid Tiers:
- Hybrid Cloud: $0.014/hour = $10.08/month base cluster + scaling costs
- Private Cloud: Custom pricing (enterprise)
- Marketplace: Available on AWS, GCP, and Azure marketplaces
Current Pricing (Oct 2025):
For 10,000 books (~4GB storage):
- Base cluster: $10.08/month (720 hours × $0.014)
- Storage scaling: ~$1.50/month (estimated 3GB overage × $0.50/GB)
- Total estimated: $11.58/month for 10K books
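The Qdrant estimate above is simple enough to express as a small function. This is a sketch of the same arithmetic, assuming 720 billable hours per month, 1 GB of storage included in the base cluster, and the estimated $0.50/GB overage rate; actual billing depends on the cluster configuration.

```python
HOURS_PER_MONTH = 720


def qdrant_monthly_cost(storage_gb, hourly_rate=0.014,
                        included_gb=1, overage_per_gb=0.50):
    """Estimated monthly cost in USD: base cluster plus storage overage."""
    base = hourly_rate * HOURS_PER_MONTH
    overage = max(0, storage_gb - included_gb) * overage_per_gb
    return round(base + overage, 2)


print(qdrant_monthly_cost(4))   # 10,000 books at ~4 GB
```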
Pros:
- Excellent performance (Rust implementation)
- Feature-rich permanent free tier (1GB, with the full managed feature set)
- Advanced filtering and search capabilities
- Open source with strong community
- Multi-cloud deployment options
- No vendor lock-in (can self-host)
- Built-in quantization and optimization features
Cons:
- Smaller managed cloud ecosystem compared to Pinecone/Weaviate
- Would require API migration from Milvus
- Scaling costs can vary significantly by configuration
- Less documentation for complex enterprise scenarios
5. Chroma Cloud
Sources: Chroma Pricing, Chroma Documentation
Free Tier (Starter Plan):
- Credits: $5 free credits + $0/month base
- Usage-Based: Pay only for what you use after credits
- Capacity: ~1,000-2,000 books with free credits
- Features: Vector, full-text, metadata search, Apache 2.0 open source
- Team: Up to 10 databases, 10 team members
Pricing Model (Usage-Based):
- Writes: $2.50 per GiB written
- Storage: $0.33 per GiB/month stored
- Queries: $0.0075 per TiB queried + $0.09 per GiB returned
- Example: $79/month for 1M docs written, 6M docs stored, 10M queries
Paid Tiers:
- Team: $250/month + usage, $100 credits included, up to 100 databases
- Enterprise: Custom pricing, unlimited databases, single tenant clusters
Current Pricing (Nov 2025):
For 10,000 books (~5GB storage, moderate usage):
- Storage: $1.65/month (5GB × $0.33)
- Writes: ~$12.50 (assuming 5GB written once)
- Queries: ~$8-15/month (moderate query volume)
- Total estimated: $20-30 in the first month (including the one-time write cost), then roughly $10-17/month
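Because Chroma bills writes once but storage and queries every month, it helps to separate the two. The sketch below uses the rates listed above and takes the monthly query cost as an input, since the $8-15 figure is itself an estimate; the function name is illustrative.

```python
def chroma_costs(stored_gib, written_gib, monthly_query_cost,
                 storage_rate=0.33, write_rate=2.50):
    """Return (first_month, recurring_month) estimates in USD."""
    storage = stored_gib * storage_rate          # billed every month
    one_time_writes = written_gib * write_rate   # billed when data is ingested
    recurring = storage + monthly_query_cost
    return round(recurring + one_time_writes, 2), round(recurring, 2)


for query_cost in (8, 15):   # low and high ends of the query estimate
    first, recurring = chroma_costs(5, 5, query_cost)
    print(f"queries ${query_cost}: first month ${first}, then ${recurring}/month")
```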
Pros:
- Excellent open source foundation (Apache 2.0)
- Multi-modal search (vector, full-text, metadata, regex)
- Strong developer ecosystem and integrations
- Competitive pricing for storage and queries
- Easy local development with seamless cloud migration
- Built-in collaboration features and dashboard
- Strong community support (21k+ GitHub stars)
Cons:
- Newer cloud service (less mature than Pinecone/Weaviate)
- Would require API migration from Milvus
- Usage-based pricing can be unpredictable for high-volume applications
- Limited enterprise features compared to established providers
6. LanceDB Cloud
Sources: LanceDB Pricing, LanceDB Cloud Docs
Free Credits (One-Time Only):
- Credits: $100 one-time free credits
- Not Permanent: Credits-based, no permanent free tier
- Usage-Based: Pay only for what you use (writes, queries, storage)
- Capacity: ~5,000-8,000 books with free credits
- Features: Serverless, multimodal storage, enterprise security
- Migration: Seamless from LanceDB OSS (just change connection URL)
Pricing Model (Usage-Based):
- Writes: $6.20 per 1M vectors written
- Queries: $15.26 per 1M queries
- Storage: $2.05 per GB/month
- Total Example: $23.50/month for 1M writes + 1M queries + moderate storage
Current Pricing (Nov 2025):
For 10,000 books (~5GB storage, moderate usage):
- Storage: $10.25/month (5GB × $2.05)
- Writes: $6.20 (assuming 1M vectors/month)
- Queries: $15.26 (assuming 1M queries/month)
- Total estimated: $31.71/month
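The same estimate as a small function, which also shows how long the $100 of free credits would last at this usage level. The rates are the ones listed above; the workload (1M writes and 1M queries per month) is the same assumption used in the estimate.

```python
def lancedb_monthly_cost(storage_gb, writes_m, queries_m,
                         storage_rate=2.05, write_rate=6.20, query_rate=15.26):
    """Estimated monthly cost in USD (writes and queries in millions)."""
    return round(storage_gb * storage_rate
                 + writes_m * write_rate
                 + queries_m * query_rate, 2)


monthly = lancedb_monthly_cost(5, 1, 1)
months_of_credit = 100 / monthly   # how far the one-time $100 credit goes
print(f"${monthly}/month, credits last about {months_of_credit:.1f} months")
```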
Pros:
- True serverless with automatic scaling to zero
- Multimodal storage (vectors, text, images together)
- Lance columnar format for high performance
- Enterprise security (SOC2, HIPAA compliance)
- Open source compatibility (easy migration)
- Pay-per-use model (cost-effective for variable workloads)
- Built-in observability and monitoring
- $100 free credits for getting started
Cons:
- Newer service (public beta, GA coming soon)
- Usage-based pricing can be unpredictable for high-volume applications
- Query costs can add up with high traffic ($15.26/1M queries)
- Would require API migration from Milvus
- Limited documentation compared to established providers
- No permanent free tier (credits-based only)
7. Local Milvus Lite
Costs:
- Storage: Limited only by local disk space
- Capacity: “Unlimited” books (hardware dependent)
Pros:
- Complete control and privacy
- No usage-based costs
- Full feature access
- Ideal for development and testing
Cons:
- No automatic scaling
- Manual backup and maintenance
- Single point of failure
- No built-in redundancy
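Since Zilliz Cloud is Milvus-compatible, one way to move between local development and the cloud is to change only the connection settings. The sketch below builds the keyword arguments for `pymilvus.MilvusClient`; the environment variable names (`ZILLIZ_URI`, `ZILLIZ_TOKEN`, `MILVUS_LITE_PATH`) and the default file name are hypothetical choices for illustration, not project settings.

```python
import os


def milvus_client_kwargs(env=None):
    """Pick connection settings for pymilvus.MilvusClient.

    If ZILLIZ_URI is set, connect to the cloud cluster with a token;
    otherwise fall back to a local Milvus Lite database file.
    All variable names here are hypothetical.
    """
    env = os.environ if env is None else env
    uri = env.get("ZILLIZ_URI")
    if uri:
        return {"uri": uri, "token": env.get("ZILLIZ_TOKEN", "")}
    return {"uri": env.get("MILVUS_LITE_PATH", "./japanese_books.db")}


# Usage (requires `pip install pymilvus`):
#   from pymilvus import MilvusClient
#   client = MilvusClient(**milvus_client_kwargs())
```

Keeping the choice in environment variables means the same ingestion code can run against Milvus Lite during development and a managed cluster later.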
Free Tier Detailed Comparison
This section provides a comprehensive analysis of free tier offerings to help you choose the best option for getting started without cost.
Table 1: Permanent Free Tiers
| Provider | Storage | Capacity (Books) | Key Features | Limitations | Best For |
|---|---|---|---|---|---|
| Zilliz Cloud | 5GB | 7,000-15,000 | Full Serverless features, 2.5M vCUs/mo, 5 collections | None (full features) | Research, development |
| Pinecone | 2GB | ~4,000 | 2M writes/mo, 1M reads/mo, 5 indexes, all embeddings | Single region (us-east-1), no backups | Small projects, prototypes |
| Qdrant | 1GB | ~2,600 | Full managed features, multi-cloud, HA, backups | Smaller storage capacity | Cost-sensitive projects |
Table 2: Trial Credits and Limited-Time Free Tiers
| Provider | Credits/Duration | After Trial | Features During Trial | Best For |
|---|---|---|---|---|
| Chroma | $5 one-time | Pay-as-you-go | Multi-modal search, open source | Open source testing |
| LanceDB | $100 one-time | Pay-as-you-go | Full serverless, multimodal | Initial exploration |
| Weaviate | 14 days | $45/month minimum | Full features, hybrid search | Short-term testing |
| Zilliz | $100 (30 days) | Free plan remains | Dedicated/Serverless plans (bonus) | Advanced testing |
Table 3: Monthly and Annual Cost Projections for 10,000 Books (≈ 5GB)
| Provider | Monthly Cost | Annual Cost | Notes |
|---|---|---|---|
| Zilliz Cloud | $0 → $50 | $0 → $600 | 5GB free tier → Standard |
| Qdrant | $0 → $12 | $0 → $144 | 1GB free tier, $10/month base |
| Chroma | $0 → $25 | $0 → $300 | $5 free credits → usage-based |
| LanceDB | $0 → $32 | $0 → $384 | $100 free credits → usage-based |
| Pinecone | $50-75 | $600-900 | 2GB free, $50/month minimum |
| Weaviate | $122+ | $1,464+ | No free tier, $45/month minimum |
| Local Milvus | $20-50 | $240-600 | Hardware/hosting only |
Conclusion
For the docling-japanese-books project processing 80-page Japanese books:
- Best Free Option: Zilliz Cloud (5GB permanent = 7,000-15,000 books FREE forever)
- Best Free Alternative: Pinecone (2GB permanent = ~4,000 books)
- Best Development: Local Milvus Lite (unlimited, no cloud provider costs)
