When building web apps with Sonamu, you’ll implement features like these:
  • Knowledge base: “Find similar documents”
  • E-commerce: “Products similar to this one”
  • Content: “Related articles recommendation”
  • Customer support: “Find similar questions”
Traditional keyword search (LIKE '%keyword%') has limitations:
  • Searching “TypeScript framework” won’t find “Node.js API library”
  • Vulnerable to typos (“typescript” vs “tyepscript”)
  • Doesn’t handle synonyms (“framework” vs “library”)
What you need is semantic search, and vector search is how it is implemented.

Why pgvector?

To implement vector search, you need a database that can store and search vectors.

Options

| Method | Pros | Cons | Sonamu Recommendation |
| --- | --- | --- | --- |
| pgvector | Uses existing PostgreSQL, no additional infrastructure, JOINs with existing data | Lower performance than dedicated vector DBs | Highly recommended |
| Pinecone | Optimized for vector search, managed service | Additional cost, separate sync needed | Low |
| Elasticsearch | Powerful search features | Heavy, complex setup | Medium |
| Weaviate/Milvus | Dedicated vector DB | Separate infrastructure, learning curve | Low |
1. You’re already using PostgreSQL
// sonamu.config.ts
export default defineConfig({
  database: {
    client: "pg",
    connection: { /* ... */ }
  }
});
Sonamu is PostgreSQL + Knex based, so there is no need to add a separate database for vector search.

2. Can be used with existing data
-- JOIN with existing data
SELECT
  d.id, d.title, d.category,
  1 - (d.embedding <=> ?) AS similarity
FROM documents d
JOIN categories c ON d.category_id = c.id
WHERE c.active = true
ORDER BY similarity DESC;
You can mix vector search with regular SQL; a separate vector DB would require data synchronization.

3. Simple infrastructure
  • Pinecone: Separate API, cost, sync
  • pgvector: Just install extension, no additional cost
4. Natural integration with Sonamu Model
class DocumentModelClass extends BaseModel {
  @api({ httpMethod: 'POST' })
  async search(query: string) {
    // Write vector search SQL with Puri
    const results = await this.getPuri().raw(`...`);
    return results.rows;
  }
}
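The JOIN example in point 2 and the Model stub above both leave the actual query wiring elided. As a minimal sketch (the `searchSimilar` name and the `documents`/`categories` tables are illustrative, and the client is assumed to behave like Knex's `raw()`), a query vector can be serialized into pgvector's text format and bound through a placeholder:

```typescript
// Minimal shape of the raw-SQL client we assume (Knex-like).
type RawClient = {
  raw(sql: string, bindings: unknown[]): Promise<{ rows: any[] }>;
};

// pgvector accepts vectors as text literals like "[0.1,0.2,...]";
// the `pgvector` npm package also ships helpers for this.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

// Hypothetical search over the `documents` table from the JOIN example.
async function searchSimilar(db: RawClient, queryEmbedding: number[]) {
  const { rows } = await db.raw(
    `SELECT d.id, d.title, d.category,
            1 - (d.embedding <=> ?::vector) AS similarity
     FROM documents d
     JOIN categories c ON d.category_id = c.id
     WHERE c.active = true
     ORDER BY similarity DESC
     LIMIT 10`,
    [toVectorLiteral(queryEmbedding)]
  );
  return rows;
}
```

The explicit `?::vector` cast makes the binding unambiguous when the parameter arrives as text.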

What is pgvector?

pgvector is an extension that allows PostgreSQL to store and search vector (embedding) data. Key features:
  • vector(N) data type (N-dimensional vector)
  • Distance operators: <=> (cosine distance), <-> (L2 distance), <#> (negative inner product)
  • Approximate-nearest-neighbor indexes (IVFFlat, HNSW)
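To make the `<=>` operator concrete: it returns the cosine distance between two vectors, so the "similarity" computed in the queries in this guide is simply 1 minus that distance. A pure-TypeScript illustration of the same math:

```typescript
// What pgvector's `<=>` computes: cosine distance = 1 - (a·b) / (|a||b|).
// The queries in this guide turn it back into similarity via `1 - distance`.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

cosineDistance([1, 0], [1, 0]); // 0 → same direction (maximally similar)
cosineDistance([1, 0], [0, 1]); // 1 → orthogonal (unrelated)
```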

Required Package Installation

pnpm add pgvector voyageai
Packages:
  • pgvector: PostgreSQL pgvector type support (use with Knex)
  • voyageai: Voyage AI embeddings (recommended for Korean)
  • @ai-sdk/openai: OpenAI embeddings (optional; install separately if you use OpenAI)

PostgreSQL Extension Installation

Installation by Environment

# PostgreSQL development packages (match your PostgreSQL major version)
sudo apt-get install postgresql-server-dev-14

# Build and install pgvector
git clone --branch v0.5.1 https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install

Enable Extension

Connect to PostgreSQL and enable the extension:
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify installation
SELECT * FROM pg_extension WHERE extname = 'vector';

-- Check version (0.5.1 or higher recommended)
SELECT extversion FROM pg_extension WHERE extname = 'vector';

Applying to Sonamu Project

1. Environment Variables

# PostgreSQL (you probably already have this)
DATABASE_URL=postgresql://user:password@localhost:5432/mydb

# Embedding API (for later use)
VOYAGE_API_KEY=pa-...
# or
OPENAI_API_KEY=sk-...

2. Verify Sonamu Config

// sonamu.config.ts
import { defineConfig } from "sonamu";

export default defineConfig({
  database: {
    name: "myapp",
    defaultOptions: {
      client: "pg",
      connection: {
        host: "localhost",
        port: 5432,
        user: "postgres",
        password: "postgres",
        database: "myapp",
      },
    },
  },
});
Sonamu already uses PostgreSQL. No additional configuration needed.

3. Create Table with Knex Migration

Create a vector table using Sonamu’s Migration:
// migrations/20240101000000_add_vector_search.ts
import type { Knex } from "knex";

export async function up(knex: Knex): Promise<void> {
  // 1. Enable pgvector extension
  await knex.raw('CREATE EXTENSION IF NOT EXISTS vector');

  // 2. Add embedding column
  await knex.schema.table('documents', (table) => {
    // Voyage AI uses 1024 dimensions
    table.specificType('embedding', 'vector(1024)');
  });

  // 3. Index comes later (after data accumulates)
  // await knex.raw(`
  //   CREATE INDEX ON documents
  //   USING hnsw (embedding vector_cosine_ops)
  // `);
}

export async function down(knex: Knex): Promise<void> {
  await knex.schema.table('documents', (table) => {
    table.dropColumn('embedding');
  });
}
Run:
pnpm sonamu migrate:latest

4. Creating a Vector Table from Scratch

If creating a new table:
// migrations/20240101000001_create_knowledge_base.ts
import type { Knex } from "knex";

export async function up(knex: Knex): Promise<void> {
  await knex.raw('CREATE EXTENSION IF NOT EXISTS vector');

  await knex.schema.createTable('knowledge_base', (table) => {
    table.increments('id').primary();
    table.text('title').notNullable();
    table.text('content').notNullable();
    table.string('category', 50);

    // Vector column
    table.specificType('embedding', 'vector(1024)');

    table.timestamps(true, true);

    // Regular index
    table.index('category');
  });
}

export async function down(knex: Knex): Promise<void> {
  await knex.schema.dropTableIfExists('knowledge_base');
}

Understanding Vector Dimensions

Different embedding models have different vector dimensions:
import { Embedding } from "sonamu/vector";

// Voyage AI: 1024 dimensions
const voyageDim = Embedding.getDimensions('voyage');
console.log(voyageDim);  // 1024

// OpenAI: 1536 dimensions
const openaiDim = Embedding.getDimensions('openai');
console.log(openaiDim);  // 1536
Match dimensions when creating tables:
-- When using Voyage AI
CREATE TABLE docs (
  embedding vector(1024)
);

-- When using OpenAI
CREATE TABLE docs (
  embedding vector(1536)
);
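A mismatched embedding only fails with a database error at insert time, so it can help to fail fast in application code. A hypothetical guard (the function name and the provider-to-dimension map are our own, mirroring the tables above):

```typescript
// Dimension per embedding provider, matching the vector(N) columns above.
const EXPECTED_DIMENSIONS = { voyage: 1024, openai: 1536 } as const;

// Hypothetical guard: throw before issuing an INSERT whose embedding
// length does not match the table's vector(N) column.
function assertDimensions(
  embedding: number[],
  provider: keyof typeof EXPECTED_DIMENSIONS
): void {
  const expected = EXPECTED_DIMENSIONS[provider];
  if (embedding.length !== expected) {
    throw new Error(
      `Embedding has ${embedding.length} dimensions, but vector(${expected}) expects ${expected}`
    );
  }
}
```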

Index - Create Later

Important: Create the index after sufficient data has accumulated.

Why Later?

// Bad order
await knex.raw('CREATE INDEX ...');  // Index first
await DocumentModel.saveOne({ embedding });  // Data later

// Good order
await DocumentModel.saveOne({ embedding });  // Data first (100+ entries)
await knex.raw('CREATE INDEX ...');  // Index later
IVFFlat computes its cluster centers from the rows present at build time, so an index built on an empty table performs poorly; building HNSW after loading data is also generally faster. After 100+ entries:
// migrations/20240101000002_add_vector_index.ts
export async function up(knex: Knex): Promise<void> {
  await knex.raw(`
    CREATE INDEX idx_docs_embedding
    ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64)
  `);
}
Parameters:
  • m = 16: maximum connections per node (the default; usually fine)
  • ef_construction = 64: size of the candidate list used while building the index (the default)

IVFFlat Index (Faster Build)

If HNSW index builds are too slow, IVFFlat builds faster at some cost in recall:
CREATE INDEX idx_docs_embedding
ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
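Choosing `lists` depends on table size: the pgvector README suggests roughly rows/1000 for up to about a million rows and sqrt(rows) beyond that. A small helper sketching that heuristic (the floor of 10 for tiny tables is our own addition):

```typescript
// Heuristic for IVFFlat's `lists` parameter, per the pgvector README:
// rows / 1000 up to ~1M rows, sqrt(rows) beyond that.
function suggestLists(rowCount: number): number {
  if (rowCount <= 1_000_000) {
    return Math.max(10, Math.round(rowCount / 1000)); // floor of 10 for small tables
  }
  return Math.round(Math.sqrt(rowCount));
}

suggestLists(100_000);   // 100
suggestLists(4_000_000); // 2000
```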

Practical Scenario

Scenario: Building a Knowledge Base

You’re building an internal knowledge base with Sonamu.

Step 1: Table Design
CREATE TABLE knowledge_base (
  id SERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  category VARCHAR(50),
  embedding vector(1024),  -- Voyage AI
  created_at TIMESTAMP DEFAULT NOW()
);
Step 2: Data Entry (later)
// Save after generating embeddings (see embeddings.mdx)
await KnowledgeBaseModel.saveOne({
  title: "Getting Started with Sonamu",
  content: "...",
  category: "documentation",
  embedding: [...],  // array of 1024 numbers
});
Step 3: Create Index (after 100+ data entries)
CREATE INDEX ON knowledge_base
USING hnsw (embedding vector_cosine_ops);
Step 4: Search API (later)
class KnowledgeBaseModelClass extends BaseModel {
  @api({ httpMethod: 'POST' })
  async search(query: string) {
    // See vector-search.mdx
    const results = await this.getPuri().raw(`...`);
    return results.rows;
  }
}
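The stub defers the real SQL to vector-search.mdx, but two details are worth sketching now: filtering out rows whose `embedding` is still NULL (see Cautions), and ordering by the raw `<=>` distance so an HNSW index can drive the sort. Everything here is illustrative, including the assumption that `raw()` takes a Knex-style `(sql, bindings)` pair:

```typescript
// Hypothetical body for the search stub above. Assumes the query
// embedding was generated elsewhere (see embeddings.mdx).
const SEARCH_SQL = `
  SELECT id, title, category,
         1 - (embedding <=> ?::vector) AS similarity
  FROM knowledge_base
  WHERE embedding IS NOT NULL
  ORDER BY embedding <=> ?::vector
  LIMIT ?`;

function buildBindings(queryEmbedding: number[], limit: number): unknown[] {
  const literal = `[${queryEmbedding.join(",")}]`; // pgvector text format
  // The same literal is bound twice: once for the similarity expression,
  // once for the ORDER BY, which lets the HNSW index drive the sort.
  return [literal, literal, limit];
}

buildBindings([0.1, 0.2], 5); // ["[0.1,0.2]", "[0.1,0.2]", 5]
```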

Cautions

Cautions when setting up pgvector:
  1. Dimension match: Table and embedding model dimensions must be the same
    -- Voyage AI (1024)
    CREATE TABLE docs (embedding vector(1024));
    
  2. Index comes later: Create after 100+ data entries
    // 1. Data first
    await saveDocuments();
    
    // 2. Index later
    await createIndex();
    
  3. Allow NULL: May not be able to create embeddings for all documents immediately
    -- Allow NULL (can update later)
    embedding vector(1024)
    
  4. Manage with Migration: Use Migration instead of direct SQL
    pnpm sonamu migrate:latest
    
  5. Extension version: 0.5.1 or higher recommended
    SELECT extversion FROM pg_extension WHERE extname = 'vector';
    

Next Steps

pgvector installation is complete. Now it’s time to generate embeddings and implement search.