Matryoshka Representation Learning

  • 2025-04-08 (modified: 2025-04-14)
  • Alias: MRL

Like Russian Matryoshka dolls, smaller-dimensional embeddings are nested inside the larger-dimensional embedding. A single model can therefore produce embeddings at multiple dimensions (e.g., 64, 128, 256, 512) at once, simply by taking prefixes of the full vector.
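
A minimal training sketch of the idea (assuming PyTorch; the linear encoder, 768-dim input, ten-class heads, and unweighted loss sum are hypothetical stand-ins, not the paper's exact setup): the Matryoshka objective applies the same task loss to nested prefixes of one embedding, so every prefix length becomes a usable embedding on its own.

import torch
import torch.nn as nn
import torch.nn.functional as F

NESTED_DIMS = [64, 128, 256, 512]  # example nested prefix lengths

class MatryoshkaClassifier(nn.Module):
    def __init__(self, input_dim=768, embed_dim=512, num_classes=10):
        super().__init__()
        # Stand-in encoder; in practice any backbone (ResNet, BERT, ...) goes here.
        self.encoder = nn.Linear(input_dim, embed_dim)
        # One task head per nested prefix length.
        self.heads = nn.ModuleDict(
            {str(d): nn.Linear(d, num_classes) for d in NESTED_DIMS}
        )

    def forward(self, x):
        z = self.encoder(x)  # full 512-dim embedding
        # Each head only sees the first d dimensions of the same vector.
        return {d: self.heads[str(d)](z[:, :d]) for d in NESTED_DIMS}

def matryoshka_loss(logits_per_dim, labels):
    # Sum of the same task loss at every nested prefix length.
    return sum(F.cross_entropy(logits, labels) for logits in logits_per_dim.values())

# Toy usage with random data.
model = MatryoshkaClassifier()
x = torch.randn(8, 768)
y = torch.randint(0, 10, (8,))
loss = matryoshka_loss(model(x), y)
loss.backward()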

Articles

Models

  • Gemini's text embedding model gemini-embedding-exp-03-07 (see the sketch below)
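
A hedged sketch of requesting a reduced-dimensionality embedding from this model, assuming the google-genai Python SDK and its EmbedContentConfig output_dimensionality option (the input text and dimension are arbitrary; verify names against the current API docs):

# Sketch only: assumes the google-genai SDK's embed_content call and that
# gemini-embedding-exp-03-07 honors output_dimensionality (MRL-style truncation).
from google import genai
from google.genai import types

client = genai.Client()  # assumes the API key is read from the environment

result = client.models.embed_content(
    model="gemini-embedding-exp-03-07",
    contents="Testing 123",
    config=types.EmbedContentConfig(output_dimensionality=256),
)

print(len(result.embeddings[0].values))  # expected: 256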

Normalization

In general, using the dimensions parameter when creating the embedding is the suggested approach. In certain cases, you may need to change the embedding dimension after you generate it. When you truncate the dimension manually, make sure to re-normalize the truncated embedding, as shown below.[1]

from openai import OpenAI
import numpy as np

client = OpenAI()

def normalize_l2(x):
    # L2-normalize a single vector or a batch of row vectors,
    # leaving zero vectors unchanged.
    x = np.array(x)
    if x.ndim == 1:
        norm = np.linalg.norm(x)
        if norm == 0:
            return x
        return x / norm
    else:
        norm = np.linalg.norm(x, 2, axis=1, keepdims=True)
        return np.where(norm == 0, x, x / norm)


response = client.embeddings.create(
    model="text-embedding-3-small", input="Testing 123", encoding_format="float"
)

# Keep only the first 256 dimensions, then restore unit length.
cut_dim = response.data[0].embedding[:256]
norm_dim = normalize_l2(cut_dim)

print(norm_dim)
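
One reason to re-normalize: unit-length vectors let a plain dot product serve as cosine similarity. A small follow-up check, reusing normalize_l2 and norm_dim from the snippet above (the second vector is a random stand-in, not a real API response):

# Quick check: after renormalization the truncated embedding has unit length,
# so the dot product of two such vectors equals their cosine similarity.
other_dim = normalize_l2(np.random.rand(1536)[:256])  # hypothetical second embedding

print(np.linalg.norm(norm_dim))            # ~1.0 after renormalization
print(float(np.dot(norm_dim, other_dim)))  # dot product == cosine similarity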

Footnotes

  1. platform.openai.com/docs/guides/embeddings
