---
base_model: intfloat/e5-large-v2
datasets:
  - sentence-transformers/all-nli
language:
  - en
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
  - pearson_manhattan
  - spearman_manhattan
  - pearson_euclidean
  - spearman_euclidean
  - pearson_dot
  - spearman_dot
  - pearson_max
  - spearman_max
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:10000
  - loss:SoftmaxLoss
widget:
  - source_sentence: >-
      A man selling donuts to a customer during a world exhibition event held in
      the city of Angeles
    sentences:
      - The man is doing tricks.
      - A woman drinks her coffee in a small cafe.
      - The building is made of logs.
  - source_sentence: A group of people prepare hot air balloons for takeoff.
    sentences:
      - There are hot air balloons on the ground and air.
      - A man is in an art museum.
      - People watch another person do a trick.
  - source_sentence: Three workers are trimming down trees.
    sentences:
      - The goalie is sleeping at home.
      - There are three workers
      - The girl has brown hair.
  - source_sentence: >-
      Two brown-haired men wearing short-sleeved shirts and shorts are climbing
      stairs.
    sentences:
      - The men have blonde hair.
      - A bicyclist passes an esthetically beautiful building on a sunny day
      - Two men are dancing.
  - source_sentence: A man is sitting in on the side of the street with brass pots.
    sentences:
      - a younger boy looks at his father
      - Children are at the beach.
      - a man does not have brass pots
model-index:
  - name: SentenceTransformer based on intfloat/e5-large-v2
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev
          type: sts-dev
        metrics:
          - type: pearson_cosine
            value: 0.25153764364319275
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.3291921844406249
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.2966881773862295
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.32789142408327193
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.29957914563527244
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.3291921844406249
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.2515376443724997
            name: Pearson Dot
          - type: spearman_dot
            value: 0.3291921844406249
            name: Spearman Dot
          - type: pearson_max
            value: 0.29957914563527244
            name: Pearson Max
          - type: spearman_max
            value: 0.3291921844406249
            name: Spearman Max
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.27914347241714155
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.30504478158921217
            name: Spearman Cosine
          - type: pearson_manhattan
            value: 0.3034422953603654
            name: Pearson Manhattan
          - type: spearman_manhattan
            value: 0.30482947439377617
            name: Spearman Manhattan
          - type: pearson_euclidean
            value: 0.30503064655519824
            name: Pearson Euclidean
          - type: spearman_euclidean
            value: 0.30504478158921217
            name: Spearman Euclidean
          - type: pearson_dot
            value: 0.2791434684526028
            name: Pearson Dot
          - type: spearman_dot
            value: 0.30504478158921217
            name: Spearman Dot
          - type: pearson_max
            value: 0.30503064655519824
            name: Pearson Max
          - type: spearman_max
            value: 0.30504478158921217
            name: Spearman Max
---

# SentenceTransformer based on intfloat/e5-large-v2

This is a [sentence-transformers](https://www.sbert.net) model finetuned from [intfloat/e5-large-v2](https://huggingface.co/intfloat/e5-large-v2) on the [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli) dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description

  • Model Type: Sentence Transformer
  • Base model: [intfloat/e5-large-v2](https://huggingface.co/intfloat/e5-large-v2)
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli)
  • Language: en

### Model Sources

  • Documentation: [Sentence Transformers Documentation](https://sbert.net)
  • Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
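
To make the stack concrete, here is a minimal, illustrative sketch of the same computation in plain `transformers`: contextual token embeddings from the BERT encoder, attention-mask-aware mean pooling (`pooling_mode_mean_tokens`), then L2 normalization. It assumes the transformer weights and tokenizer sit at the repo root, as they do for standard Sentence Transformers checkpoints; the `SentenceTransformer` API shown under Usage below does all of this for you.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("hongming/e5-large-v2-nli-v1")
encoder = AutoModel.from_pretrained("hongming/e5-large-v2-nli-v1")

batch = tokenizer(
    ["A group of people prepare hot air balloons for takeoff."],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)

with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 1024)

# (1) Pooling: mean over non-padding tokens only
mask = batch["attention_mask"].unsqueeze(-1).float()
pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# (2) Normalize: project onto the unit sphere (L2 norm = 1)
sentence_embedding = F.normalize(pooled, p=2, dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 1024])
```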

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("hongming/e5-large-v2-nli-v1")
# Run inference
sentences = [
    'A man is sitting in on the side of the street with brass pots.',
    'a man does not have brass pots',
    'Children are at the beach.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
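
One practical consequence of the final `Normalize()` layer: embeddings come out with unit L2 norm, so the dot product of two embeddings equals their cosine similarity, which is why the `*_dot` and `*_cosine` metrics reported below are essentially identical. A quick check:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("hongming/e5-large-v2-nli-v1")
embeddings = model.encode(["Three workers are trimming down trees.",
                           "There are three workers"])

# Each embedding has (approximately) unit L2 norm thanks to Normalize(),
# so dot product and cosine similarity coincide for this model.
print(np.linalg.norm(embeddings, axis=1))  # ~ [1. 1.]
print(embeddings[0] @ embeddings[1])       # dot product == cosine similarity here
```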

## Evaluation

### Metrics

#### Semantic Similarity

Dataset: `sts-dev`

| Metric             | Value  |
|:-------------------|:-------|
| pearson_cosine     | 0.2515 |
| spearman_cosine    | 0.3292 |
| pearson_manhattan  | 0.2967 |
| spearman_manhattan | 0.3279 |
| pearson_euclidean  | 0.2996 |
| spearman_euclidean | 0.3292 |
| pearson_dot        | 0.2515 |
| spearman_dot       | 0.3292 |
| pearson_max        | 0.2996 |
| spearman_max       | 0.3292 |

#### Semantic Similarity

Dataset: `sts-test`

| Metric             | Value  |
|:-------------------|:-------|
| pearson_cosine     | 0.2791 |
| spearman_cosine    | 0.3050 |
| pearson_manhattan  | 0.3034 |
| spearman_manhattan | 0.3048 |
| pearson_euclidean  | 0.3050 |
| spearman_euclidean | 0.3050 |
| pearson_dot        | 0.2791 |
| spearman_dot       | 0.3050 |
| pearson_max        | 0.3050 |
| spearman_max       | 0.3050 |
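
The card does not ship the evaluation script. Assuming `sts-dev` and `sts-test` refer to the validation and test splits of the STS Benchmark, e.g. the `sentence-transformers/stsb` dataset (an assumption on my part, not stated above), the tables could be reproduced along these lines with the library's `EmbeddingSimilarityEvaluator`:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("hongming/e5-large-v2-nli-v1")

# "sentence-transformers/stsb" is an assumed source for sts-dev;
# its `score` column is already normalized to [0, 1].
stsb = load_dataset("sentence-transformers/stsb", split="validation")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=stsb["sentence1"],
    sentences2=stsb["sentence2"],
    scores=stsb["score"],
    name="sts-dev",
)
print(evaluator(model))  # Pearson/Spearman correlations per similarity function
```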

## Training Details

### Training Dataset

#### sentence-transformers/all-nli

  • Dataset: sentence-transformers/all-nli at d482672
  • Size: 10,000 training samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:

| | premise | hypothesis | label |
|:--------|:--------------------------------------------------|:-------------------------------------------------|:-----------------------------------|
| type    | string                                            | string                                           | int                                |
| details | min: 6 tokens, mean: 17.38 tokens, max: 52 tokens | min: 4 tokens, mean: 10.7 tokens, max: 31 tokens | 0: ~33.40%, 1: ~33.30%, 2: ~33.30% |

  • Samples:

| premise                                                | hypothesis                                         | label |
|:-------------------------------------------------------|:----------------------------------------------------|:------|
| A person on a horse jumps over a broken down airplane. | A person is training his horse for a competition.  | 1     |
| A person on a horse jumps over a broken down airplane. | A person is at a diner, ordering an omelette.      | 2     |
| A person on a horse jumps over a broken down airplane. | A person is outdoors, on a horse.                  | 0     |

  • Loss: SoftmaxLoss
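
SoftmaxLoss is the classification objective from the Sentence-BERT paper cited at the bottom of this card: the premise and hypothesis are embedded independently as u and v, and a linear classifier over the concatenation (u, v, |u - v|) is trained to predict the three NLI labels. A minimal construction (see the full training sketch under Training Hyperparameters below):

```python
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("intfloat/e5-large-v2")

# By default the classifier input is (u, v, |u - v|), i.e. 3 x 1024 features.
loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),  # 1024
    num_labels=3,  # 0: entailment, 1: neutral, 2: contradiction
)
```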

### Evaluation Dataset

#### sentence-transformers/all-nli

  • Dataset: sentence-transformers/all-nli at d482672
  • Size: 1,000 evaluation samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:

| | premise | hypothesis | label |
|:--------|:--------------------------------------------------|:--------------------------------------------------|:-----------------------------------|
| type    | string                                            | string                                            | int                                |
| details | min: 6 tokens, mean: 18.44 tokens, max: 57 tokens | min: 5 tokens, mean: 10.57 tokens, max: 25 tokens | 0: ~33.10%, 1: ~33.30%, 2: ~33.60% |

  • Samples:

| premise                                               | hypothesis                                                                             | label |
|:-------------------------------------------------------|:-----------------------------------------------------------------------------------------|:------|
| Two women are embracing while holding to go packages. | The sisters are hugging goodbye while holding to go packages after just eating lunch. | 1     |
| Two women are embracing while holding to go packages. | Two woman are holding packages.                                                        | 0     |
| Two women are embracing while holding to go packages. | The men are fighting outside a deli.                                                   | 2     |

  • Loss: SoftmaxLoss

### Training Hyperparameters

#### Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • fp16: True
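
The exact training script is not part of this card. Below is a sketch of how a run with these non-default hyperparameters could look in Sentence Transformers 3.x using `SentenceTransformerTrainer`; which 10,000 training / 1,000 evaluation samples were used is not stated, so the `select(...)` subsets are placeholders.

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("intfloat/e5-large-v2")

# pair-class config -> columns: premise, hypothesis, label
all_nli = load_dataset("sentence-transformers/all-nli", "pair-class", revision="d482672")
train_dataset = all_nli["train"].select(range(10_000))  # placeholder subset
eval_dataset = all_nli["dev"].select(range(1_000))      # placeholder subset

loss = losses.SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,
)

args = SentenceTransformerTrainingArguments(
    output_dir="e5-large-v2-nli-v1",
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()
```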

#### All Hyperparameters

<details><summary>Click to expand</summary>

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

</details>

### Training Logs

| Epoch | Step | Training Loss | Validation Loss | sts-dev_spearman_cosine | sts-test_spearman_cosine |
|:-----:|:----:|:-------------:|:---------------:|:-----------------------:|:------------------------:|
| 0     | 0    | -             | -               | 0.8888                  | -                        |
| 0.16  | 100  | 1.0934        | 1.0656          | 0.5733                  | -                        |
| 0.32  | 200  | 1.0461        | 1.0245          | 0.3466                  | -                        |
| 0.48  | 300  | 1.037         | 1.0152          | 0.3391                  | -                        |
| 0.64  | 400  | 1.0013        | 0.9931          | 0.3333                  | -                        |
| 0.8   | 500  | 1.0014        | 0.9871          | 0.3825                  | -                        |
| 0.96  | 600  | 0.9827        | 0.9705          | 0.3292                  | -                        |
| 1.0   | 625  | -             | -               | -                       | 0.3050                   |

### Framework Versions

  • Python: 3.8.13
  • Sentence Transformers: 3.1.0.dev0
  • Transformers: 4.43.3
  • PyTorch: 2.1.2
  • Accelerate: 0.33.0
  • Datasets: 2.16.1
  • Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers and SoftmaxLoss

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```