Model Card

StyleDistance is a style embedding model that aims to embed texts with similar writing styles close together and texts with different styles far apart, regardless of content. You may find this model useful for stylistic analysis of text, clustering, authorship identification and verification tasks, and automatic style transfer evaluation.

Training Data and Variants of StyleDistance

StyleDistance was contrastively trained on SynthSTEL, a synthetically generated dataset of positive and negative examples of 40 style features being used in text. By using this synthetic dataset, StyleDistance achieves stronger content-independence than other style embedding models currently available. This particular model was trained purely on synthetic data. For a version trained on a combination of the synthetic dataset and a real dataset built from Reddit authorship data, see this other version of StyleDistance.
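To make "contrastively trained" concrete, here is a minimal sketch of a generic triplet-style contrastive objective over embeddings. This is an illustration only: the actual loss, margin, and distance used to train StyleDistance are not stated in this card, and the names `triplet_loss` and `margin` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors
    return dot / (norm_a * norm_b)

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Generic triplet hinge loss on cosine distance (assumed, not the card's objective).

    The loss is zero once the positive (same style feature) is closer to the
    anchor than the negative (feature absent) by at least `margin`.
    """
    d_pos = 1.0 - cosine_similarity(anchor, positive)
    d_neg = 1.0 - cosine_similarity(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-d vectors standing in for real embeddings.
well_separated = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
badly_separated = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

In the toy example, the first triplet already satisfies the margin and incurs no loss, while the second (positive and negative swapped) is penalized.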

Example Usage

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer('StyleDistance/styledistance_synthetic_only')  # Load model

# Anchor text written in a texting style ("h8", "2")
anchor = model.encode("Did you hear about the Wales wing? He'll h8 2 withdraw due 2 injuries from future competitions.")
# First candidate: different content, similar style. Second candidate: same content, different style.
others = model.encode(["We're raising funds 2 improve our school's storage facilities and add new playground equipment!", "Did you hear about the Wales wing? He'll hate to withdraw due to injuries from future competitions."])
# Cosine similarity between the anchor and each candidate
print(cos_sim(anchor, others))
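For the authorship verification and style-matching uses mentioned above, one possible approach is to rank candidates by cosine similarity and apply a cutoff. The sketch below works on plain lists of floats, so vectors returned by `model.encode(...)` can be passed in as well. The helper names `rank_by_style` and `same_style` and the `threshold` of 0.8 are hypothetical; a usable threshold would have to be tuned on labeled pairs.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for zero vectors
    return dot / (norm_a * norm_b)

def rank_by_style(query, candidates):
    """Return (index, similarity) pairs, most stylistically similar first."""
    scored = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def same_style(a, b, threshold=0.8):
    """Hypothetical verification rule: similar enough means 'same style'."""
    return cosine_similarity(a, b) >= threshold

# Toy 2-d vectors standing in for real style embeddings.
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [0.9, 0.1]]
ranking = rank_by_style(query, candidates)
```

With real embeddings, `query` and `candidates` would be the outputs of `model.encode` for the anchor text and the candidate texts.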

This model was trained on a synthetic dataset using DataDreamer 🤖💤. The synthetic dataset card and model card can be found here. The training arguments can be found here.

Model size: 125M params (Safetensors, tensor type BF16)