The Problem with Sonamu Search API
You're building a knowledge base app with Sonamu. A keyword search for "TypeScript framework" gives these results:
- "Sonamu is a TypeScript framework" - Found
- "Sonamu is a Node.js API library" - Not found (different keywords)
- "Sonamu is a TS framework" - Not found (abbreviation)
- "TypeScript framwork" - Not found (typo)
Keyword search has clear limitations:
- Doesn't handle synonyms
- Fails when expressions differ
- Vulnerable to typos
- Fails when meaning is the same but words differ
Semantic Search is Needed
"TypeScript framework" and "Node.js API library" are semantically similar, even though the keywords are different. How can computers understand this meaning? With embeddings.
What are Embeddings?
Embeddings convert text into high-dimensional numerical arrays (vectors). Semantically similar texts are placed close together in vector space. Key points:
- Converting to numbers enables distance calculation
- Close vectors = semantically similar texts
- Distant vectors = semantically different texts
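To make "close" and "distant" concrete, here is a minimal sketch of one common way to compare two vectors, cosine similarity. The source does not say which metric Sonamu uses, so the metric is an assumption, and the three-dimensional vectors are made-up toy values (real embeddings have around a thousand dimensions or more).

```typescript
// Cosine similarity between two equal-length vectors:
// 1 means same direction, 0 means orthogonal, -1 means opposite direction.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine similarity is undefined for a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors, invented for illustration only (not real model output).
const typescriptFramework = [0.9, 0.8, 0.1];
const nodeApiLibrary = [0.8, 0.9, 0.2];
const cookingRecipe = [0.1, 0.2, 0.9];

// The related pair scores much higher than the unrelated pair.
const similarPair = cosineSimilarity(typescriptFramework, nodeApiLibrary);
const differentPair = cosineSimilarity(typescriptFramework, cookingRecipe);
```

With real embeddings the same function applies unchanged; only the vector length grows.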
Flow in Sonamu
The Embedding Class
Sonamu provides an Embedding class to easily create embeddings.
Which Provider Should You Choose?
Voyage AI vs OpenAI
Sonamu supports two embedding providers:
| Item | Voyage AI | OpenAI |
|---|---|---|
| Korean performance | Excellent | Good |
| English performance | Excellent | Excellent |
| Dimensions | 1024 | 1536 |
| Max tokens | 32,000 | 8,191 |
| Batch size | 128 | 100 |
| Asymmetric embeddings | Yes (document/query distinction) | No |
| Sonamu recommendation | Highly recommended | Recommended |
Selection Criteria for Sonamu Projects
Recommend Voyage AI:
- Korean services (excellent Korean performance)
- Long document processing (32,000 tokens)
- Search accuracy matters (asymmetric embeddings)
Recommend OpenAI:
- Global services (balanced multilingual support)
- Already using OpenAI API
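The comparison table and selection criteria can be captured in code. The sketch below is illustrative only: the provider keys "voyage" and "openai", the ProjectProfile shape, and the chooseProvider helper are hypothetical names, not part of Sonamu. The numeric limits are copied from the table above.

```typescript
type EmbeddingProvider = "voyage" | "openai";

interface ProviderSpec {
  dimensions: number;
  maxTokens: number;
  batchSize: number;
  asymmetric: boolean;
}

// Values copied from the comparison table.
const PROVIDER_SPECS: Record<EmbeddingProvider, ProviderSpec> = {
  voyage: { dimensions: 1024, maxTokens: 32000, batchSize: 128, asymmetric: true },
  openai: { dimensions: 1536, maxTokens: 8191, batchSize: 100, asymmetric: false },
};

// Hypothetical description of a project; field names are illustrative.
interface ProjectProfile {
  primarilyKorean: boolean;
  longestDocumentTokens: number;
  alreadyUsesOpenAI: boolean;
}

function chooseProvider(profile: ProjectProfile): EmbeddingProvider {
  // Documents beyond OpenAI's token limit only fit Voyage AI without splitting.
  if (profile.longestDocumentTokens > PROVIDER_SPECS.openai.maxTokens) return "voyage";
  // Korean-heavy content favors Voyage AI, even if OpenAI is already in use.
  if (profile.primarilyKorean) return "voyage";
  // Reusing an existing OpenAI account avoids managing a second API key.
  if (profile.alreadyUsesOpenAI) return "openai";
  // Otherwise fall back to the option the table marks "Highly recommended".
  return "voyage";
}
```

The order of the checks is a design choice in this sketch: hard limits (document length) come first, quality preferences second, and convenience last.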
Environment Setup
1. Install Packages
2. Configure API Keys
- Voyage AI: https://www.voyageai.com/
- OpenAI: https://platform.openai.com/
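Once you have a key, a startup check can fail fast when it is missing. The sketch below assumes the keys live in environment variables named VOYAGE_API_KEY and OPENAI_API_KEY; those names are an assumption here, so check which names your Sonamu configuration actually reads. The helpers are hypothetical and take the environment as a parameter so they stay easy to test.

```typescript
type EmbeddingProvider = "voyage" | "openai";
type Env = Record<string, string | undefined>;

// Assumed variable names, not confirmed by the Sonamu docs.
const KEY_NAMES: Record<EmbeddingProvider, string> = {
  voyage: "VOYAGE_API_KEY",
  openai: "OPENAI_API_KEY",
};

// Lists the providers that have a non-empty key configured.
function availableProviders(env: Env): EmbeddingProvider[] {
  const providers: EmbeddingProvider[] = ["voyage", "openai"];
  return providers.filter((p) => (env[KEY_NAMES[p]] ?? "").trim() !== "");
}

// Returns the key for a provider, or throws with the missing variable name.
function requireApiKey(env: Env, provider: EmbeddingProvider): string {
  const value = (env[KEY_NAMES[provider]] ?? "").trim();
  if (value === "") {
    throw new Error(`Missing ${KEY_NAMES[provider]} for provider "${provider}"`);
  }
  return value;
}
```

In a Node.js process you would pass process.env, for example requireApiKey(process.env, "voyage"), during application startup.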
Using in Sonamu Model
Generating Embeddings When Saving Documents
- User uploads document via POST /documents
- Sonamu API generates embedding via Voyage AI
- PostgreSQL stores text + embedding together
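The three steps above can be sketched as follows. Nothing here is Sonamu's actual model code: EmbedFn, DocumentRow, toVectorLiteral, and buildDocumentRow are hypothetical names, and the sketch assumes the embedding column uses the pgvector extension, which accepts a vector written as a text literal such as '[0.1,0.2,0.3]'. The source only says that PostgreSQL stores the text and the embedding together.

```typescript
// Hypothetical embedding function; in a real app this would call the provider.
type EmbedFn = (text: string, inputType: "document" | "query") => Promise<number[]>;

// Hypothetical row shape for a documents table.
interface DocumentRow {
  title: string;
  content: string;
  embedding: string; // pgvector text literal (assumes a pgvector column)
}

// Formats a number array as a pgvector text literal, e.g. "[0.1,0.2,3]".
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length === 0) {
    throw new Error("Embedding must not be empty");
  }
  if (embedding.some((x) => !Number.isFinite(x))) {
    throw new Error("Embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}

// Steps 2 and 3 of the flow: embed the text, then build the row to insert.
async function buildDocumentRow(
  title: string,
  content: string,
  embed: EmbedFn,
): Promise<DocumentRow> {
  // Stored text is embedded as a "document", not as a "query".
  const embedding = await embed(`${title}\n\n${content}`, "document");
  return { title, content, embedding: toVectorLiteral(embedding) };
}
```

Passing the embedding function in as a parameter keeps the row-building logic testable without a network call.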
Search API (to be implemented later)
Asymmetric Embeddings (Voyage AI)
Voyage AI distinguishes between document and query embeddings.
Why the Distinction?
Document:
- Long text
- Detailed information
- Storage purpose
Query:
- Short text
- Search terms
- Search purpose
Usage in Sonamu
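Sonamu's own call signature is not reproduced here. As a sketch of what the distinction looks like at the provider level, the hypothetical helper below builds a request for Voyage AI's REST embeddings endpoint, where the input_type field carries "document" or "query". The model name "voyage-3" is an assumption chosen for illustration; use whichever model your project is configured for.

```typescript
type InputType = "document" | "query";

interface VoyageEmbeddingRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assumed model name for illustration.
const VOYAGE_MODEL = "voyage-3";

// Builds (but does not send) a Voyage AI embeddings request.
function buildVoyageRequest(
  texts: string[],
  inputType: InputType,
  apiKey: string,
): VoyageEmbeddingRequest {
  return {
    url: "https://api.voyageai.com/v1/embeddings",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      input: texts,
      model: VOYAGE_MODEL,
      input_type: inputType, // the only field that differs between the two cases
    }),
  };
}

// When saving: embed the stored text as a document.
const storeRequest = buildVoyageRequest(
  ["Sonamu is a TypeScript framework"],
  "document",
  "test-key",
);

// When searching: embed the user's search terms as a query.
const searchRequest = buildVoyageRequest(["Node.js API library"], "query", "test-key");
```

The resulting object can be handed to any HTTP client. The point of the sketch is that the same text pipeline is used for both cases and only input_type changes.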
Batch Processing
When processing multiple documents at once:
- Voyage AI: 128 at a time
- OpenAI: 100 at a time
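If you call a provider directly, these limits mean a large list has to be split before sending. The following is a minimal sketch under that assumption; chunk, embedInBatches, and the embedBatch callback are hypothetical names, and Sonamu may already handle the splitting for you.

```typescript
type EmbeddingProvider = "voyage" | "openai";

// Batch limits from the list above.
const BATCH_SIZE: Record<EmbeddingProvider, number> = { voyage: 128, openai: 100 };

// Splits an array into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("Chunk size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Embeds texts batch by batch, preserving input order in the result.
async function embedInBatches(
  texts: string[],
  provider: EmbeddingProvider,
  embedBatch: (batch: string[]) => Promise<number[][]>,
): Promise<number[][]> {
  const results: number[][] = [];
  for (const batch of chunk(texts, BATCH_SIZE[provider])) {
    const vectors = await embedBatch(batch);
    if (vectors.length !== batch.length) {
      throw new Error("Provider returned a different number of vectors than inputs");
    }
    results.push(...vectors);
  }
  return results;
}
```

Batches are sent sequentially here to keep the sketch simple. Running them in parallel is possible but depends on the provider's rate limits.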