Matryoshka Representation Learning

  • 2025-04-08 (modified: 2025-04-14)
  • Alias: MRL

Like Russian Matryoshka dolls, smaller-dimensional embeddings are nested inside the larger-dimensional embedding. A single model can therefore produce embeddings at multiple dimensions (e.g., 64, 128, 256, 512) at once, simply by taking prefixes of the full vector.
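
A minimal training sketch of the idea (assuming PyTorch; the linear encoder, 768-dim input, ten-class heads, and unweighted loss sum are hypothetical stand-ins, not the paper's exact setup): the Matryoshka objective applies the same task loss to nested prefixes of one embedding, so every prefix length becomes a usable embedding on its own.

import torch
import torch.nn as nn
import torch.nn.functional as F

NESTED_DIMS = [64, 128, 256, 512]  # example nested prefix lengths

class MatryoshkaClassifier(nn.Module):
    def __init__(self, input_dim=768, embed_dim=512, num_classes=10):
        super().__init__()
        # Stand-in encoder; in practice any backbone (ResNet, BERT, ...) goes here.
        self.encoder = nn.Linear(input_dim, embed_dim)
        # One task head per nested prefix length.
        self.heads = nn.ModuleDict(
            {str(d): nn.Linear(d, num_classes) for d in NESTED_DIMS}
        )

    def forward(self, x):
        z = self.encoder(x)  # full 512-dim embedding
        # Each head only sees the first d dimensions of the same vector.
        return {d: self.heads[str(d)](z[:, :d]) for d in NESTED_DIMS}

def matryoshka_loss(logits_per_dim, labels):
    # Sum of the same task loss at every nested prefix length.
    return sum(F.cross_entropy(logits, labels) for logits in logits_per_dim.values())

# Toy usage with random data.
model = MatryoshkaClassifier()
x = torch.randn(8, 768)
y = torch.randint(0, 10, (8,))
loss = matryoshka_loss(model(x), y)
loss.backward()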

Articles

Models

  • Gemini's text embedding model gemini-embedding-exp-03-07 (see the sketch below)
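
A hedged sketch of requesting a reduced-dimensionality embedding from this model, assuming the google-genai Python SDK and its EmbedContentConfig output_dimensionality option (the input text and dimension are arbitrary; verify names against the current API docs):

# Sketch only: assumes the google-genai SDK's embed_content call and that
# gemini-embedding-exp-03-07 honors output_dimensionality (MRL-style truncation).
from google import genai
from google.genai import types

client = genai.Client()  # assumes the API key is read from the environment

result = client.models.embed_content(
    model="gemini-embedding-exp-03-07",
    contents="Testing 123",
    config=types.EmbedContentConfig(output_dimensionality=256),
)

print(len(result.embeddings[0].values))  # expected: 256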

Normalization

In general, using the dimensions parameter when creating the embedding is the suggested approach. In certain cases, you may need to change the embedding dimension after you generate it. When you truncate the dimension manually, make sure to re-normalize the truncated embedding, as shown below.[1]

from openai import OpenAI
import numpy as np

client = OpenAI()

def normalize_l2(x):
    # L2-normalize a single vector or a batch of row vectors,
    # leaving zero vectors unchanged.
    x = np.array(x)
    if x.ndim == 1:
        norm = np.linalg.norm(x)
        if norm == 0:
            return x
        return x / norm
    else:
        norm = np.linalg.norm(x, 2, axis=1, keepdims=True)
        return np.where(norm == 0, x, x / norm)


response = client.embeddings.create(
    model="text-embedding-3-small", input="Testing 123", encoding_format="float"
)

# Keep only the first 256 dimensions, then restore unit length.
cut_dim = response.data[0].embedding[:256]
norm_dim = normalize_l2(cut_dim)

print(norm_dim)
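
One reason to re-normalize: unit-length vectors let a plain dot product serve as cosine similarity. A small follow-up check, reusing normalize_l2 and norm_dim from the snippet above (the second vector is a random stand-in, not a real API response):

# Quick check: after renormalization the truncated embedding has unit length,
# so the dot product of two such vectors equals their cosine similarity.
other_dim = normalize_l2(np.random.rand(1536)[:256])  # hypothetical second embedding

print(np.linalg.norm(norm_dim))            # ~1.0 after renormalization
print(float(np.dot(norm_dim, other_dim)))  # dot product == cosine similarity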

Footnotes

  1. platform.openai.com/docs/guides/embeddings
