Are there two identical embedding tensors, even though embeddings are shared?

#15
by graefics - opened

The SmolLM models have tied embeddings (config.tie_word_embeddings = True), i.e. input and output embeddings are shared.

However, the loaded model contains two identical embedding tensors, lm_head.weight and model.embed_tokens.weight. This seems to defeat the purpose of sharing (a.k.a. tying) the embeddings. Any ideas?

Clicking the arrow on the right-hand side (first screenshot below) shows a summary of the parameters that does not include the lm_head tensor; see the second screenshot below (model.norm.weight is the last tensor shown):
[Screenshot 1: model page with the parameter-summary arrow]

[Screenshot 2: parameter summary ending at model.norm.weight, with no lm_head.weight entry]
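
Since that summary only reflects what is stored in the checkpoint, one way to double-check is to list the tensor names in the serialized weights directly. A minimal sketch, assuming the repo ships a single model.safetensors file (the filename is an assumption):

from huggingface_hub import hf_hub_download
from safetensors import safe_open

# Download the checkpoint file and list the tensors it actually contains.
# If the embeddings are tied, 'lm_head.weight' may not be stored at all.
path = hf_hub_download('HuggingFaceTB/SmolLM-135M', 'model.safetensors')
with safe_open(path, framework='pt') as f:
    for name in f.keys():
        print(name)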

Here is code showing that the model contains two identical tensors for input and output embeddings:

from transformers import AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained('HuggingFaceTB/SmolLM-135M')
# Compare the values of the two weight tensors
print(torch.equal(model.lm_head.weight, model.model.embed_tokens.weight))

The code above prints True, because the two tensors have identical values.
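
Note that torch.equal only compares values. A minimal sketch (same model as above) to check whether the two attributes actually refer to the same underlying storage, which is what weight tying is supposed to give you:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained('HuggingFaceTB/SmolLM-135M')

lm_head_w = model.lm_head.weight
embed_w = model.model.embed_tokens.weight

# If the weights are tied, both checks should print True:
# the two attributes are one Parameter sharing one memory buffer,
# not two separate copies.
print(lm_head_w is embed_w)
print(lm_head_w.data_ptr() == embed_w.data_ptr())

If these print True, the model holds one shared tensor that merely shows up under two names; if they print False, the weights really are duplicated in memory.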

The issue might be in modeling_llama.py, which doesn't seem to fully support shared embeddings. Specifically, line 1210:

            logits = self.lm_head(hidden_states[:, -num_logits_to_keep:, :]).float()

Contrast this, for example, with OpenELM's code, which seems to fully support both tied and untied embeddings (line 878):

        if self.lm_head is None:
            # shared
            logits = F.linear(hidden_states, weight=self.transformer.token_embeddings.weight)
        else:
            logits = self.lm_head(hidden_states)
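
For what it's worth, if the weights are genuinely tied, the two code paths should compute the same logits, since lm_head.weight and the embedding matrix are then one and the same tensor. A minimal sketch of that equivalence (dummy hidden states, sizes taken from the loaded SmolLM config):

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained('HuggingFaceTB/SmolLM-135M')
model.eval()

# Dummy hidden state of shape (batch=1, seq=1, hidden_size).
hidden = torch.randn(1, 1, model.config.hidden_size)

# LLaMA-style: project through the lm_head module ...
logits_lm_head = model.lm_head(hidden)
# ... OpenELM-style: project directly through the embedding matrix.
logits_embed = F.linear(hidden, model.model.embed_tokens.weight)

# With tied weights these are the same computation, so the results match.
print(torch.equal(logits_lm_head, logits_embed))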
