Why Sonamu needs vector search

When you build web apps with Sonamu, you eventually need features like these:
  • Knowledge base: “Find similar documents”
  • Commerce: “Products similar to this one”
  • Content: “Recommended related posts”
  • Customer support: “Find similar questions”
Traditional keyword search (LIKE '%keyword%') has clear limits:
  • ❌ A search for “TypeScript framework” will not find “Node.js API library”
  • ❌ Fragile against typos (a misspelled “TypeScript” will not match the correct spelling)
  • ❌ No synonym handling (“framework” vs “library”)
What is needed is search based on meaning, and vector search provides that.
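The first limitation is easy to reproduce without a database. The sketch below imitates LIKE '%keyword%' with a case-insensitive substring test; the `likeMatch` helper and the sample titles are made up for illustration.

```typescript
// Hypothetical helper: mimics SQL `LIKE '%keyword%'` (case-insensitive substring match).
function likeMatch(text: string, keyword: string): boolean {
  return text.toLowerCase().includes(keyword.toLowerCase());
}

// Made-up sample titles, not real data.
const titles: string[] = [
  "Node.js API library",
  "TypeScript framework comparison",
];

// Only the title containing the literal phrase matches; the semantically
// related "Node.js API library" is missed.
const hits = titles.filter((t) => likeMatch(t, "TypeScript framework"));
console.log(hits);
```

Substring matching has no notion of meaning, so related titles that share no literal text with the query are never returned.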

Why pgvector?

To implement vector search, you need a database that can store and search vectors.

Options

| Approach | Pros | Cons | Sonamu recommendation |
|---|---|---|---|
| pgvector | Uses PostgreSQL as-is, no extra infrastructure, can JOIN with existing data | Lower performance than dedicated vector DBs | ⭐⭐⭐⭐⭐ |
| Pinecone | Optimized for vector search, managed service | Additional cost, separate synchronization required | ⭐⭐ |
| Elasticsearch | Powerful search features | Heavyweight, complex configuration | ⭐⭐⭐ |
| Weaviate/Milvus | Dedicated vector DB | Separate infrastructure, learning curve | ⭐⭐ |

Why we recommend pgvector for Sonamu projects

1. You are already using PostgreSQL
// sonamu.config.ts
export default defineConfig({
  database: {
    client: "pg",
    connection: { /* ... */ }
  }
});
Sonamu is built on PostgreSQL + Knex, so there is no need to add a separate database for vector search.

2. You can use it alongside your existing data
-- JOIN with existing data
SELECT 
  d.id, d.title, d.category,
  1 - (d.embedding <=> ?) AS similarity
FROM documents d
JOIN categories c ON d.category_id = c.id
WHERE c.active = true
ORDER BY similarity DESC;
Vector search and ordinary SQL can be mixed in a single query. With a separate vector DB, you would have to keep the data synchronized.

3. The infrastructure stays simple
  • Pinecone: separate API, cost, synchronization
  • pgvector: install an extension, no additional cost

4. It integrates naturally with Sonamu Models
class DocumentModelClass extends BaseModel {
  @api({ httpMethod: 'POST' })
  async search(query: string) {
    // Write the vector search SQL with Puri
    const results = await this.getPuri().raw(`...`);
    return results.rows;
  }
}

What is pgvector?

pgvector is a PostgreSQL extension for storing and searching vector (embedding) data. Key features:
  • vector(N) data type (an N-dimensional vector)
  • Distance operators: <=> (cosine distance), <-> (L2 distance), <#> (negative inner product)
  • Indexes (IVFFlat, HNSW)
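To make the three operators concrete, here is a minimal sketch of what each one computes, written as plain TypeScript functions. The function names are illustrative only; in a real query PostgreSQL evaluates the operators itself.

```typescript
// Plain TypeScript equivalents of what the three pgvector operators compute.
// These helpers are illustrative; in practice the database evaluates the operators.

function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// <-> : Euclidean (L2) distance
function l2Distance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  return Math.sqrt(a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0));
}

// <#> : negative inner product (negated so that smaller means more similar)
function negativeInnerProduct(a: number[], b: number[]): number {
  return -dot(a, b);
}

// <=> : cosine distance = 1 - cosine similarity
function cosineDistance(a: number[], b: number[]): number {
  return 1 - dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

const a = [1, 0];
const b = [0, 1];
console.log(cosineDistance(a, a)); // 0
console.log(cosineDistance(a, b)); // 1
console.log(l2Distance([0, 0], [3, 4])); // 5
console.log(negativeInnerProduct([1, 2], [3, 4])); // -11
```

All three are distances, so smaller values mean closer vectors. This is why a query that wants a similarity score writes 1 - (embedding <=> ?) rather than using the operator result directly.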

Installing required packages

pnpm add pgvector voyageai
Packages:
  • pgvector: pgvector type support for PostgreSQL (used with Knex)
  • voyageai: Voyage AI embeddings (recommended for Korean)
  • @ai-sdk/openai: OpenAI embeddings (optional; not included in the command above)

Installing the PostgreSQL extension

Installation by environment

# PostgreSQL development packages (Debian/Ubuntu, PostgreSQL 14)
sudo apt-get install postgresql-server-dev-14

# Build and install pgvector
git clone --branch v0.5.1 https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install

Enabling the extension

Connect to PostgreSQL and enable the extension:
-- Enable the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify the installation
SELECT * FROM pg_extension WHERE extname = 'vector';

-- Check the version (0.5.1 or later recommended)
SELECT extversion FROM pg_extension WHERE extname = 'vector';
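If you want to enforce the recommended minimum at application startup, note that version strings do not compare correctly as plain strings ("0.10.0" sorts before "0.5.1"). A minimal sketch of a numeric comparison follows; `isVersionAtLeast` is a hypothetical helper, not part of Sonamu or pgvector.

```typescript
// Hypothetical helper: true when `actual` is at least `minimum`,
// comparing dot-separated numeric parts ("0.5.1", "0.7.0", ...).
function isVersionAtLeast(actual: string, minimum: string): boolean {
  const a = actual.split(".").map(Number);
  const m = minimum.split(".").map(Number);
  const len = Math.max(a.length, m.length);
  for (let i = 0; i < len; i++) {
    const x = a[i] ?? 0; // missing parts count as 0
    const y = m[i] ?? 0;
    if (x !== y) return x > y;
  }
  return true; // equal versions
}

console.log(isVersionAtLeast("0.10.0", "0.5.1")); // true
console.log(isVersionAtLeast("0.4.4", "0.5.1")); // false
```

Feed it the version string the database reports for the extension and fail fast if the check returns false.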

Applying it to a Sonamu project

1. Set environment variables

# PostgreSQL (probably already set)
DATABASE_URL=postgresql://user:password@localhost:5432/mydb

# Embedding API (used later)
VOYAGE_API_KEY=pa-...
# or
OPENAI_API_KEY=sk-...
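Since either embedding key may be present, it can help to resolve the provider once at startup. The following is a minimal sketch under that assumption; `resolveEmbeddingEnv` and its preference for Voyage AI when both keys exist are illustrative choices, not Sonamu behavior.

```typescript
type EmbeddingProvider = "voyage" | "openai";

interface EmbeddingEnv {
  provider: EmbeddingProvider;
  apiKey: string;
}

// Hypothetical helper: picks the embedding provider from environment variables,
// preferring Voyage AI when both keys are present.
function resolveEmbeddingEnv(env: Record<string, string | undefined>): EmbeddingEnv {
  if (!env.DATABASE_URL) {
    throw new Error("DATABASE_URL is not set");
  }
  if (env.VOYAGE_API_KEY) {
    return { provider: "voyage", apiKey: env.VOYAGE_API_KEY };
  }
  if (env.OPENAI_API_KEY) {
    return { provider: "openai", apiKey: env.OPENAI_API_KEY };
  }
  throw new Error("Set VOYAGE_API_KEY or OPENAI_API_KEY");
}
```

In a Node.js process you would call it as resolveEmbeddingEnv(process.env), so a missing key surfaces immediately instead of at the first embedding request.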

2. Check the Sonamu config

// sonamu.config.ts
import { defineConfig } from "sonamu";

export default defineConfig({
  database: {
    name: "myapp",
    defaultOptions: {
      client: "pg",
      connection: {
        host: "localhost",
        port: 5432,
        user: "postgres",
        password: "postgres",
        database: "myapp",
      },
    },
  },
});
Sonamu already uses PostgreSQL, so no additional configuration is needed.

3. Create the table with a Knex migration

Use Sonamu's migrations to add a vector column to an existing table:
// migrations/20240101000000_add_vector_search.ts
import type { Knex } from "knex";

export async function up(knex: Knex): Promise<void> {
  // 1. Enable the pgvector extension
  await knex.raw('CREATE EXTENSION IF NOT EXISTS vector');
  
  // 2. Add the embedding column
  await knex.schema.table('documents', (table) => {
    // Voyage AI embeddings are 1024-dimensional
    table.specificType('embedding', 'vector(1024)');
  });
  
  // 3. Create the index later (once data has accumulated)
  // await knex.raw(`
  //   CREATE INDEX ON documents 
  //   USING hnsw (embedding vector_cosine_ops)
  // `);
}

export async function down(knex: Knex): Promise<void> {
  await knex.schema.table('documents', (table) => {
    table.dropColumn('embedding');
  });
}
Run:
pnpm sonamu migrate:latest

4. Creating a vector table from scratch

If you are creating a new table:
// migrations/20240101000001_create_knowledge_base.ts
import type { Knex } from "knex";

export async function up(knex: Knex): Promise<void> {
  await knex.raw('CREATE EXTENSION IF NOT EXISTS vector');
  
  await knex.schema.createTable('knowledge_base', (table) => {
    table.increments('id').primary();
    table.text('title').notNullable();
    table.text('content').notNullable();
    table.string('category', 50);
    
    // Vector column
    table.specificType('embedding', 'vector(1024)');
    
    table.timestamps(true, true);
    
    // Regular index
    table.index('category');
  });
}

export async function down(knex: Knex): Promise<void> {
  await knex.schema.dropTableIfExists('knowledge_base');
}

Understanding vector dimensions

Each embedding model produces vectors of a different dimension:
import { Embedding } from "sonamu/vector";

// Voyage AI: 1024 dimensions
const voyageDim = Embedding.getDimensions('voyage');
console.log(voyageDim);  // 1024

// OpenAI: 1536 dimensions
const openaiDim = Embedding.getDimensions('openai');
console.log(openaiDim);  // 1536
The dimension declared on the table column must match the model:
-- When using Voyage AI
CREATE TABLE docs (
  embedding vector(1024)
);

-- When using OpenAI
CREATE TABLE docs (
  embedding vector(1536)
);
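A dimension mismatch only surfaces as a database error at insert time, so it can be worth checking before the query is sent. The sketch below validates the length and formats the array in pgvector's text representation; `toVectorLiteral` is a hypothetical helper, and the dimension table simply restates the numbers above.

```typescript
// Dimensions taken from the text above; adjust if you use a different model.
const DIMENSIONS = { voyage: 1024, openai: 1536 } as const;
type Provider = keyof typeof DIMENSIONS;

// Hypothetical helper: checks the embedding length and formats it in
// pgvector's text representation, e.g. "[0.1,0.2,0.3]".
function toVectorLiteral(embedding: number[], provider: Provider): string {
  const expected = DIMENSIONS[provider];
  if (embedding.length !== expected) {
    throw new Error(
      `Expected ${expected} dimensions for ${provider}, got ${embedding.length}`
    );
  }
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("Embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}
```

The returned string can be passed as a query binding and cast with ::vector on the SQL side.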

Indexes: create them later

Important: create the index only after enough data has accumulated.

Why later?

// ❌ Bad order
await knex.raw('CREATE INDEX ...');  // index first
await DocumentModel.saveOne({ embedding });  // data afterwards

// ✅ Good order
await DocumentModel.saveOne({ embedding });  // data first (100+ rows)
await knex.raw('CREATE INDEX ...');  // index afterwards
An index built on an empty table is not tuned to your data. IVFFlat in particular derives its cluster centers from the rows present at build time, and loading the initial data first also makes the index build faster.

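HNSW index (recommended)

Once at least 100 rows have accumulated:
// migrations/20240101000002_add_vector_index.ts
import type { Knex } from "knex";

export async function up(knex: Knex): Promise<void> {
  await knex.raw(`
    CREATE INDEX idx_docs_embedding 
    ON documents 
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64)
  `);
}

export async function down(knex: Knex): Promise<void> {
  await knex.raw('DROP INDEX IF EXISTS idx_docs_embedding');
}
Parameters:
  • m = 16: maximum number of connections per node (the default; fine for most cases)
  • ef_construction = 64: size of the candidate list explored while building the graph (the default)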
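If you expect to tune these two parameters, generating the statement from one place keeps the migration readable. This is a sketch only: `buildHnswIndexSql` is a hypothetical helper, and the check that ef_construction is at least 2 * m reflects my understanding of pgvector's validation, so treat it as an assumption.

```typescript
interface HnswOptions {
  table: string;
  column: string;
  indexName: string;
  m?: number;              // pgvector default: 16
  efConstruction?: number; // pgvector default: 64
}

// Hypothetical helper: builds the CREATE INDEX statement for a cosine-distance HNSW index.
function buildHnswIndexSql(opts: HnswOptions): string {
  const m = opts.m ?? 16;
  const efConstruction = opts.efConstruction ?? 64;
  // Assumption: pgvector expects ef_construction >= 2 * m; check early for a clearer error.
  if (efConstruction < 2 * m) {
    throw new Error("ef_construction should be at least 2 * m");
  }
  // Identifiers are interpolated directly, so only pass trusted, hard-coded names.
  return [
    `CREATE INDEX ${opts.indexName}`,
    `ON ${opts.table}`,
    `USING hnsw (${opts.column} vector_cosine_ops)`,
    `WITH (m = ${m}, ef_construction = ${efConstruction})`,
  ].join("\n");
}

console.log(
  buildHnswIndexSql({ table: "documents", column: "embedding", indexName: "idx_docs_embedding" })
);
```

Larger values of either parameter generally improve recall at the cost of slower builds and more memory, which is why the defaults are a reasonable starting point.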

IVFFlat index (faster to build)

If building an HNSW index is too slow:
CREATE INDEX idx_docs_embedding 
ON documents 
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
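The value lists = 100 suits a table of roughly 100,000 rows. The pgvector README suggests rows / 1000 as a starting point for up to 1M rows and sqrt(rows) beyond that, with sqrt(lists) as a starting point for the query-time ivfflat.probes setting. A minimal sketch of that rule of thumb (the helper names are made up):

```typescript
// Rule of thumb from the pgvector README:
// rows / 1000 for up to 1M rows, sqrt(rows) beyond that.
function suggestIvfflatLists(rowCount: number): number {
  if (!Number.isInteger(rowCount) || rowCount < 0) {
    throw new Error("rowCount must be a non-negative integer");
  }
  const lists = rowCount <= 1_000_000 ? rowCount / 1000 : Math.sqrt(rowCount);
  return Math.max(1, Math.round(lists)); // at least one list
}

// Starting point for the query-time `ivfflat.probes` setting: sqrt(lists).
function suggestIvfflatProbes(lists: number): number {
  return Math.max(1, Math.round(Math.sqrt(lists)));
}

console.log(suggestIvfflatLists(100_000)); // 100
console.log(suggestIvfflatProbes(100)); // 10
```

These are starting points rather than guarantees; measure recall and latency on your own data before settling on values.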

Practical scenario

Scenario: building a knowledge base

Suppose you are building an internal knowledge base with Sonamu.

Step 1: Design the table
CREATE TABLE knowledge_base (
  id SERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  content TEXT NOT NULL,
  category VARCHAR(50),
  embedding vector(1024),  -- Voyage AI
  created_at TIMESTAMP DEFAULT NOW()
);

Step 2: Insert data (later)
// Generate the embedding, then save (see embeddings.mdx)
await KnowledgeBaseModel.saveOne({
  title: "Getting started with Sonamu",
  content: "...",
  category: "documentation",
  embedding: [...],  // array of 1024 numbers
});

Step 3: Create the index (once at least 100 rows have accumulated)
CREATE INDEX ON knowledge_base 
USING hnsw (embedding vector_cosine_ops);

Step 4: Search API (later)
class KnowledgeBaseModelClass extends BaseModel {
  @api({ httpMethod: 'POST' })
  async search(query: string) {
    // See vector-search.mdx
    const results = await this.getPuri().raw(`...`);
    return results.rows;
  }
}
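As a preview of what the elided SQL might look like, here is a minimal sketch that builds a cosine-similarity query with Knex-style ? bindings. `buildSearchQuery` is a hypothetical helper, and whether getPuri().raw accepts bindings the same way Knex's raw does is an assumption; vector-search.mdx remains the reference.

```typescript
interface VectorSearchQuery {
  sql: string;
  bindings: (string | number)[];
}

// Hypothetical helper: builds a cosine-similarity search over knowledge_base.
// `queryEmbedding` is assumed to come from the same model used for the stored rows.
function buildSearchQuery(queryEmbedding: number[], limit = 10): VectorSearchQuery {
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new Error("limit must be a positive integer");
  }
  const vector = `[${queryEmbedding.join(",")}]`; // pgvector text format
  const sql = `
    SELECT id, title, category,
           1 - (embedding <=> ?::vector) AS similarity
    FROM knowledge_base
    WHERE embedding IS NOT NULL
    ORDER BY embedding <=> ?::vector
    LIMIT ?
  `;
  // The vector is bound twice: once for the score, once for the ordering.
  return { sql, bindings: [vector, vector, limit] };
}

const q = buildSearchQuery([0.1, 0.2], 5);
console.log(q.bindings);
```

Ordering by the distance expression in ascending order with a LIMIT, rather than by the computed similarity column, is the query shape pgvector's indexes are designed to accelerate.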

Cautions

Things to watch for when setting up pgvector:
  1. Matching dimensions: the table column and the embedding model must use the same dimension
    -- Voyage AI (1024)
    CREATE TABLE docs (embedding vector(1024));
    
  2. Index later: create it after there are at least 100 rows
    // 1. Data first
    await saveDocuments();
    
    // 2. Index afterwards
    await createIndex();
    
  3. Allow NULL: you may not be able to generate an embedding for every document right away
    -- Nullable (can be filled in later)
    embedding vector(1024)
    
  4. Manage with migrations: prefer migrations over running SQL by hand
    pnpm sonamu migrate:latest
    
  5. Extension version: 0.5.1 or later recommended
    SELECT extversion FROM pg_extension WHERE extname = 'vector';
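Because the embedding column is nullable, rows without an embedding need to be backfilled at some point. One simple approach, sketched below under the assumption that the embedding API accepts several texts per request, is to split the pending rows into fixed-size batches; `toBatches` is a hypothetical helper and the ids are made up.

```typescript
// Hypothetical helper: splits rows that still lack an embedding into fixed-size
// batches, so each batch can be sent to the embedding API and written back.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Made-up ids standing in for: SELECT id FROM docs WHERE embedding IS NULL
const pendingIds = [1, 2, 3, 4, 5];
console.log(toBatches(pendingIds, 2)); // [[1, 2], [3, 4], [5]]
```

Running the backfill before creating the index keeps this step consistent with the "data first, index later" rule in item 2.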
    

Next steps

pgvector is now installed. The next step is to generate embeddings and implement search.