When building web apps with Sonamu, you’ll implement features like these:
  • Knowledge base: “Find similar documents”
  • E-commerce: “Products similar to this one”
  • Content: “Related articles recommendation”
  • Customer support: “Find similar questions”
Traditional keyword search (LIKE '%keyword%') has limitations:
  • Searching “TypeScript framework” won’t find “Node.js API library”
  • Vulnerable to typos (“typescript” vs “tyepscript”)
  • Doesn’t handle synonyms (“framework” vs “library”)
What you need is semantic search, and vector search is how it is implemented.

Why pgvector?

To implement vector search, you need a database that can store and search vectors.

Options

| Method | Pros | Cons | Sonamu Recommendation |
| --- | --- | --- | --- |
| pgvector | Uses existing PostgreSQL, no additional infrastructure, JOINs with existing data | Lower performance than dedicated vector DBs | Highly recommended |
| Pinecone | Optimized for vector search, managed service | Additional cost, separate sync needed | Low |
| Elasticsearch | Powerful search features | Heavy, complex setup | Medium |
| Weaviate/Milvus | Dedicated vector DB | Separate infrastructure, learning curve | Low |
1. You’re already using PostgreSQL
// sonamu.config.ts
export default defineConfig({
  database: {
    client: "pg",
    connection: { /* ... */ }
  }
});
Sonamu is PostgreSQL + Knex based, so there is no need to add a separate database for vector search.

2. Can be used with existing data
-- JOIN with existing data
SELECT
  d.id, d.title, d.category,
  1 - (d.embedding <=> ?) AS similarity
FROM documents d
JOIN categories c ON d.category_id = c.id
WHERE c.active = true
ORDER BY similarity DESC;
You can mix vector search with regular SQL; a separate vector DB would require data synchronization.

3. Simple infrastructure
  • Pinecone: Separate API, cost, sync
  • pgvector: Just install extension, no additional cost
4. Natural integration with Sonamu Model
class DocumentModelClass extends BaseModel {
  @api({ httpMethod: 'POST' })
  async search(query: string) {
    // Write vector search SQL with Puri
    const results = await this.getPuri().raw(`...`);
    return results.rows;
  }
}
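The JOIN example in point 2 and the Model stub above both leave the actual query wiring elided. As a minimal sketch (the `searchSimilar` name and the `documents`/`categories` tables are illustrative, and the client is assumed to behave like Knex's `raw()`), a query vector can be serialized into pgvector's text format and bound through a placeholder:

```typescript
// Minimal shape of the raw-SQL client we assume (Knex-like).
type RawClient = {
  raw(sql: string, bindings: unknown[]): Promise<{ rows: any[] }>;
};

// pgvector accepts vectors as text literals like "[0.1,0.2,...]";
// the `pgvector` npm package also ships helpers for this.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

// Hypothetical search over the `documents` table from the JOIN example.
async function searchSimilar(db: RawClient, queryEmbedding: number[]) {
  const { rows } = await db.raw(
    `SELECT d.id, d.title, d.category,
            1 - (d.embedding <=> ?::vector) AS similarity
     FROM documents d
     JOIN categories c ON d.category_id = c.id
     WHERE c.active = true
     ORDER BY similarity DESC
     LIMIT 10`,
    [toVectorLiteral(queryEmbedding)]
  );
  return rows;
}
```

The explicit `?::vector` cast makes the binding unambiguous when the parameter arrives as text.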

What is pgvector?

pgvector is an extension that allows PostgreSQL to store and search vector (embedding) data. Key features:
  • vector(N) data type (N-dimensional vector)
  • Distance operators: <=> (cosine distance), <-> (L2 distance), <#> (negative inner product)
  • Approximate-nearest-neighbor indexes (IVFFlat, HNSW)
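To make the `<=>` operator concrete: it returns the cosine distance between two vectors, so the "similarity" computed in the queries in this guide is simply 1 minus that distance. A pure-TypeScript illustration of the same math:

```typescript
// What pgvector's `<=>` computes: cosine distance = 1 - (a·b) / (|a||b|).
// The queries in this guide turn it back into similarity via `1 - distance`.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

cosineDistance([1, 0], [1, 0]); // 0 → same direction (maximally similar)
cosineDistance([1, 0], [0, 1]); // 1 → orthogonal (unrelated)
```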

Required Package Installation

pnpm add pgvector voyageai
Packages:
  • pgvector: PostgreSQL pgvector type support (use with Knex)
  • voyageai: Voyage AI embeddings (recommended for Korean)
  • @ai-sdk/openai: OpenAI embeddings (optional; install separately if you use OpenAI)

PostgreSQL Extension Installation

Installation by Environment

# PostgreSQL development packages (match your PostgreSQL major version)
sudo apt-get install postgresql-server-dev-14

# Build and install pgvector
git clone --branch v0.5.1 https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install

Enable Extension

Connect to PostgreSQL and enable the extension:
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify installation
SELECT * FROM pg_extension WHERE extname = 'vector';

-- Check version (0.5.1 or higher recommended)
SELECT extversion FROM pg_extension WHERE extname = 'vector';

Applying to Sonamu Project

1. Environment Variables

# PostgreSQL (you probably already have this)
DATABASE_URL=postgresql://user:password@localhost:5432/mydb

# Embedding API (for later use)
VOYAGE_API_KEY=pa-...
# or
OPENAI_API_KEY=sk-...

2. Verify Sonamu Config

// sonamu.config.ts
import { defineConfig } from "sonamu";

export default defineConfig({
  database: {
    name: "myapp",
    defaultOptions: {
      client: "pg",
      connection: {
        host: "localhost",
        port: 5432,
        user: "postgres",
        password: "postgres",
        database: "myapp",
      },
    },
  },
});
Sonamu already uses PostgreSQL. No additional configuration needed.

3. Create Table with Knex Migration

Create a vector table using Sonamu’s Migration:
// migrations/20240101000000_add_vector_search.ts
import type { Knex } from "knex";

export async function up(knex: Knex): Promise<void> {
  // 1. Enable pgvector extension
  await knex.raw('CREATE EXTENSION IF NOT EXISTS vector');

  // 2. Add embedding column
  await knex.schema.table('documents', (table) => {
    // Voyage AI uses 1024 dimensions
    table.specificType('embedding', 'vector(1024)');
  });

  // 3. Index comes later (after data accumulates)
  // await knex.raw(`
  //   CREATE INDEX ON documents
  //   USING hnsw (embedding vector_cosine_ops)
  // `);
}

export async function down(knex: Knex): Promise<void> {
  await knex.schema.table('documents', (table) => {
    table.dropColumn('embedding');
  });
}
Run:
pnpm sonamu migrate:latest

4. Creating a Vector Table from Scratch

If creating a new table:
// migrations/20240101000001_create_knowledge_base.ts
import type { Knex } from "knex";

export async function up(knex: Knex): Promise<void> {
  await knex.raw('CREATE EXTENSION IF NOT EXISTS vector');

  await knex.schema.createTable('knowledge_base', (table) => {
    table.increments('id').primary();
    table.text('title').notNullable();
    table.text('content').notNullable();
    table.string('category', 50);

    // Vector column
    table.specificType('embedding', 'vector(1024)');

    table.timestamps(true, true);

    // Regular index
    table.index('category');
  });
}

export async function down(knex: Knex): Promise<void> {
  await knex.schema.dropTableIfExists('knowledge_base');
}

Understanding Vector Dimensions

Different embedding models have different vector dimensions:
import { Embedding } from "sonamu/vector";

// Voyage AI: 1024 dimensions
const voyageDim = Embedding.getDimensions('voyage');
console.log(voyageDim);  // 1024

// OpenAI: 1536 dimensions
const openaiDim = Embedding.getDimensions('openai');
console.log(openaiDim);  // 1536
Match dimensions when creating tables:
-- When using Voyage AI
CREATE TABLE docs (
  embedding vector(1024)
);

-- When using OpenAI
CREATE TABLE docs (
  embedding vector(1536)
);
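A mismatched embedding only fails with a database error at insert time, so it can help to fail fast in application code. A hypothetical guard (the function name and the provider-to-dimension map are our own, mirroring the tables above):

```typescript
// Dimension per embedding provider, matching the vector(N) columns above.
const EXPECTED_DIMENSIONS = { voyage: 1024, openai: 1536 } as const;

// Hypothetical guard: throw before issuing an INSERT whose embedding
// length does not match the table's vector(N) column.
function assertDimensions(
  embedding: number[],
  provider: keyof typeof EXPECTED_DIMENSIONS
): void {
  const expected = EXPECTED_DIMENSIONS[provider];
  if (embedding.length !== expected) {
    throw new Error(
      `Embedding has ${embedding.length} dimensions, but vector(${expected}) expects ${expected}`
    );
  }
}
```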

Index - Create Later

Important: Create the index after sufficient data has accumulated.

Why Later?

// Bad order
await knex.raw('CREATE INDEX ...');  // Index first
await DocumentModel.saveOne({ embedding });  // Data later

// Good order
await DocumentModel.saveOne({ embedding });  // Data first (100+ entries)
await knex.raw('CREATE INDEX ...');  // Index later
IVFFlat computes its cluster centers from the rows present at build time, so an index built on an empty table performs poorly; building HNSW after loading data is also generally faster. After 100+ entries:
// migrations/20240101000002_add_vector_index.ts
export async function up(knex: Knex): Promise<void> {
  await knex.raw(`
    CREATE INDEX idx_docs_embedding
    ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64)
  `);
}
Parameters:
  • m = 16: maximum connections per node (the default; usually fine)
  • ef_construction = 64: size of the candidate list used while building the index (the default)

IVFFlat Index (Faster Build)

If HNSW index builds are too slow, IVFFlat builds faster at some cost in recall:
CREATE INDEX idx_docs_embedding
ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
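Choosing `lists` depends on table size: the pgvector README suggests roughly rows/1000 for up to about a million rows and sqrt(rows) beyond that. A small helper sketching that heuristic (the floor of 10 for tiny tables is our own addition):

```typescript
// Heuristic for IVFFlat's `lists` parameter, per the pgvector README:
// rows / 1000 up to ~1M rows, sqrt(rows) beyond that.
function suggestLists(rowCount: number): number {
  if (rowCount <= 1_000_000) {
    return Math.max(10, Math.round(rowCount / 1000)); // floor of 10 for small tables
  }
  return Math.round(Math.sqrt(rowCount));
}

suggestLists(100_000);   // 100
suggestLists(4_000_000); // 2000
```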

Practical Scenario

Scenario: Building a Knowledge Base

You’re building an internal knowledge base with Sonamu.

Step 1: Table Design
CREATE TABLE knowledge_base (
  id SERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  category VARCHAR(50),
  embedding vector(1024),  -- Voyage AI
  created_at TIMESTAMP DEFAULT NOW()
);
Step 2: Data Entry (later)
// Save after generating embeddings (see embeddings.mdx)
await KnowledgeBaseModel.saveOne({
  title: "Getting Started with Sonamu",
  content: "...",
  category: "documentation",
  embedding: [...],  // array of 1024 numbers
});
Step 3: Create Index (after 100+ data entries)
CREATE INDEX ON knowledge_base
USING hnsw (embedding vector_cosine_ops);
Step 4: Search API (later)
class KnowledgeBaseModelClass extends BaseModel {
  @api({ httpMethod: 'POST' })
  async search(query: string) {
    // See vector-search.mdx
    const results = await this.getPuri().raw(`...`);
    return results.rows;
  }
}
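The stub defers the real SQL to vector-search.mdx, but two details are worth sketching now: filtering out rows whose `embedding` is still NULL (see Cautions), and ordering by the raw `<=>` distance so an HNSW index can drive the sort. Everything here is illustrative, including the assumption that `raw()` takes a Knex-style `(sql, bindings)` pair:

```typescript
// Hypothetical body for the search stub above. Assumes the query
// embedding was generated elsewhere (see embeddings.mdx).
const SEARCH_SQL = `
  SELECT id, title, category,
         1 - (embedding <=> ?::vector) AS similarity
  FROM knowledge_base
  WHERE embedding IS NOT NULL
  ORDER BY embedding <=> ?::vector
  LIMIT ?`;

function buildBindings(queryEmbedding: number[], limit: number): unknown[] {
  const literal = `[${queryEmbedding.join(",")}]`; // pgvector text format
  // The same literal is bound twice: once for the similarity expression,
  // once for the ORDER BY, which lets the HNSW index drive the sort.
  return [literal, literal, limit];
}

buildBindings([0.1, 0.2], 5); // ["[0.1,0.2]", "[0.1,0.2]", 5]
```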

Cautions

Cautions when setting up pgvector:
  1. Dimension match: Table and embedding model dimensions must be the same
    -- Voyage AI (1024)
    CREATE TABLE docs (embedding vector(1024));
    
  2. Index comes later: Create after 100+ data entries
    // 1. Data first
    await saveDocuments();
    
    // 2. Index later
    await createIndex();
    
  3. Allow NULL: May not be able to create embeddings for all documents immediately
    -- Allow NULL (can update later)
    embedding vector(1024)
    
  4. Manage with Migration: Use Migration instead of direct SQL
    pnpm sonamu migrate:latest
    
  5. Extension version: 0.5.1 or higher recommended
    SELECT extversion FROM pg_extension WHERE extname = 'vector';
    

Next Steps

pgvector installation is complete. Now it’s time to generate embeddings and implement search.