Nicolay Rusnachenko

nicolay-r

AI & ML interests

Information Retrieval・Medical Multimodal NLP (🖼+📝) Research Fellow @BU_Research・software developer http://arekit.io・PhD in NLP

Organizations

None yet

nicolay-r's activity

posted an update about 2 hours ago
view post
Post
62
📢 Seriously, we can't go with Big5 or other unstructured descriptions to cover a diverse, large cast of characters 👨‍👩‍👦‍👦 across many books 📚. Instead, factorization + open-psychometrics antonyms extracted from dialogues is the key 🔑 to automatic character profiling that relies purely on book content 📖 (a toy sketch of the antonym idea follows the links below). With that, I'm delighted to share with you 🙌 more on this topic in a YouTube video:

https://youtu.be/UQQsXfZyjjc

🔑 From the video you will find out:
✅ How to process books 📖 for personality extraction
✅ How to impute personalities 👨‍👩‍👦‍👦 and a character network for deep learning 🤖
✅ How to evaluate 📊 the advances / experiment findings 🧪

Additional materials:
🌟 Github: https://github.com/nicolay-r/book-persona-retriever
📜 Paper: https://www.dropbox.com/scl/fi/0c2axh97hadolwphgu7it/rusnachenko2024personality.pdf?rlkey=g2yyzv01th2rjt4o1oky0q8zc&st=omssztha&dl=1
📙 Google Colab experiments: https://colab.research.google.com/github/nicolay-r/deep-book-processing/blob/master/parlai_gutenberg_experiments.ipynb
🦜 Task: https://github.com/nicolay-r/parlai_bookchar_task/tree/master
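To make the antonym idea concrete, here is a minimal toy sketch in Python. The trait axes and cue-word lists below are hypothetical illustrations of open-psychometrics antonym pairs, not the actual lexicon shipped with book-persona-retriever; the real workflow adds factorization and much more:

```python
# Toy sketch of antonym-based character profiling from dialogue utterances.
# TRAIT_AXES is a HYPOTHETICAL stand-in for an open-psychometrics antonym
# lexicon; it is not the lexicon used in book-persona-retriever.
import re
from collections import Counter

TRAIT_AXES = {
    # axis: (cue words for the left pole, cue words for the right pole)
    "shy-confident": ({"blush", "stammer", "hesitate"}, {"declare", "boast", "insist"}),
    "cruel-kind":    ({"sneer", "mock", "threaten"},    {"comfort", "thank", "soothe"}),
}

def profile_character(utterances):
    """Score a character in [-1, 1] per trait axis from its dialogue utterances."""
    tokens = Counter(w for u in utterances for w in re.findall(r"[a-z']+", u.lower()))
    scores = {}
    for axis, (left, right) in TRAIT_AXES.items():
        l = sum(tokens[w] for w in left)
        r = sum(tokens[w] for w in right)
        scores[axis] = (r - l) / (l + r) if l + r else 0.0
    return scores

print(profile_character(["I... I hesitate to say it,", "Thank you, truly."]))
# {'shy-confident': -1.0, 'cruel-kind': 1.0}
```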
posted an update 2 days ago
view post
Post
523
📢 It is less meaningful to prompt an LLM directly for opinion mining. Instead, the Three-hop (💡aspect + 🤔opinion + 🧠reason) Chain-of-Thought reasoning concept represents a decent solution for reasoning about sentiment.
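As a tiny illustration of the concept, here is a sketch of the three hops chained over any LLM completion callable; the prompt wording is my paraphrase of the three-hop scheme, not the framework's exact templates:

```python
# Sketch of three-hop CoT for targeted sentiment. `ask` is any callable that
# sends a prompt to an LLM and returns its text reply; the prompts are an
# assumed paraphrase of the scheme, not the exact framework templates.
def three_hop_sentiment(ask, text, target):
    # Hop 1 (aspect): which aspect of the target does the sentence mention?
    aspect = ask(f"Given the sentence '{text}', which aspect of '{target}' is mentioned?")
    # Hop 2 (opinion): what opinion is implied towards that aspect?
    opinion = ask(f"Given the sentence '{text}' and the aspect '{aspect}', "
                  f"what is the implied opinion towards it?")
    # Hop 3 (reason -> polarity): conclude the sentiment from the chain so far.
    return ask(f"Given the sentence '{text}', the aspect '{aspect}' and the opinion "
               f"'{opinion}', what is the sentiment polarity towards '{target}': "
               f"positive, negative, or neutral?")
```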

After a series of related posts here on Hugging Face, I am happy to invite you 🙌 to my talk @ NLPSummit2024.
I am going to take part in Healthcare Day 2 (25th of September); the calendar details and 🖇 link to the event are below 👇

🎤 Event: https://www.nlpsummit.org/nlp-summit-2024/
📅 Calendar event: https://calendar.app.google/f7AUhNHuTw5JtPs36
⏲ Time: 25th of September @ 2:00 PM ET – 2:30 PM ET

📊 Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark
🧠 Framework: https://github.com/nicolay-r/Reasoning-for-Sentiment-Analysis-Framework
📝 Paper: Large Language Models in Targeted Sentiment Analysis (2404.12342)
posted an update 13 days ago
view post
Post
360
📢 The Three-hop (💡aspect + 🤔opinion + 🧠reason) Chain-of-Thought concept + LLM represents a decent approach to reasoning about the emotions of participants in textual dialogues.
Delighted to share a tutorial video that will make you aware of:
✅ The proper application of LLMs to implicit IR
✅ Ways of aligning different information types (causes and states) within the same LLM
✅ How to launch an LLM in Google Colab that is capable of character Emotion Extraction in dialogues 🧪

🎥: https://www.youtube.com/watch?v=vRVDQa7vfkU

Project: https://github.com/nicolay-r/THOR-ECAC
Paper: https://aclanthology.org/2024.semeval-1.4/
Model card: nicolay-r/flan-t5-emotion-cause-thor-base
posted an update 3 months ago
view post
Post
491
📢 Delighted to share the most recent and valuable contributions to the book-related NLP domain 💎 To push forward a deeper understanding of characters 👨‍👩‍👧‍👦 from literary novels themselves 📖 by machine learning models 🤖, I am releasing the most accessible version, v1.0, of the related workflow, adapted for ParlAI 🦜 agents

🌟 https://github.com/nicolay-r/book-persona-retriever/tree/v1.0

Feel free to follow / share / comment in order to advance the related direction!
posted an update 3 months ago
view post
Post
710
📢 I've tested google/gemma-2-9b-it on Target Sentiment Analysis (TSA), in zero-shot learning mode, on the RuSentNE-2023 dataset with texts translated into English (🇺🇸); a minimal sketch of the setup follows the links below.

🔎 Findings: The key contribution of the most recent Gemma-2 release is reasoning alignment between different languages. This is basically the first model in the under-10B category that shows equal results on English and non-English texts. On texts in English it performs similarly to LLaMA-3-8B / Mistral-7B

Model: google/gemma-2-9b-it
Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark
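For reference, a minimal zero-shot TSA query via transformers; the prompt here is an assumed simplification, not the exact benchmark template, and Gemma-2 needs a recent transformers release:

```python
# Minimal zero-shot TSA sketch; the prompt is an assumed simplification of
# the benchmark template. Requires recent `transformers` (Gemma-2 support)
# and `accelerate` for device_map="auto".
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-2-9b-it", device_map="auto")

def tsa_zero_shot(sentence, entity):
    prompt = (f"What is the attitude of the sentence '{sentence}' towards '{entity}'? "
              f"Answer with one word: positive, negative, or neutral.")
    out = pipe([{"role": "user", "content": prompt}], max_new_tokens=8)
    # With chat-style input the pipeline returns the whole conversation;
    # the last message is the model's reply.
    return out[0]["generated_text"][-1]["content"].strip()
```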
posted an update 3 months ago
view post
Post
870
📢 I've tested the most recent google/gemma-2-9b-it in Sentiment Analysis, and the obtained results left me shocked! 🤯 It ended up becoming the king 👑, showcasing top-1 results across all models and categories in Target Sentiment Analysis (TSA) on non-English texts (🇷🇺).

That's impressive to say the least: it surpassed all the other models benchmarked before within the categories of 100B and below by F1(PN), and nearly touched GPT-4 by F1(PN0). The Google research team did a great job! 👏

Model: google/gemma-2-9b-it
Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark

posted an update 3 months ago
view post
Post
677
So far I've implemented a more accurate 👌 assessment of LLM reasoning capabilities in Target Sentiment Analysis (zero-shot mode). With that, the recalculated tables of the related benchmark 📊 also have a better separation into categories, with the following 🏆 top 🏆 performing models:

🟩 1. Proprietary models (🏆 GPT-4 🇺🇸 / GPT-3.5-0613 🇷🇺)
🟥 2. Open and < 100B (🏆 LLaMA-3-70B)
🟧 3. Open and < 10B (🏆 LLaMA-3-8B-Instruct 🇺🇸 / Qwen-2-7B-Instruct 🇷🇺)
🟨 4. Open and < 1B (🏆 Flan-T5-large 🇺🇸 / Qwen2-0.5B-Instruct 🇷🇺)

Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark
posted an update 3 months ago
view post
Post
449
With the most recent workshop on Semantic Evaluation as part of NAACL-2024, this year I was delighted to contribute experiments 🧪 on Chain-of-Thought fine-tuning concepts to push forward LLM reasoning capabilities in:

🧪 1. Reading Comprehension of Numerals in texts 🇨🇳
⭐ https://github.com/GavinZhao19/SemEval24-NumAnalysis-CN
🔒 https://huggingface.co./GavinZhao23/NumAnalysis-Chatglm3-6B

🧪 2. Extracting Emotion-Causes using Reasoning Revision (RR)
⭐ https://github.com/nicolay-r/THOR-ECAC
🔓 nicolay-r/flan-t5-emotion-cause-thor-base
nicolay-r at SemEval-2024 Task 3: Using Flan-T5 for Reasoning Emotion Cause in Conversations with Chain-of-Thought on Emotion States (2404.03361)

🔑 In short, there are three major takeaways:
✅ 1. The scale of the backbone LLM for SFT matters (>1.1B is preferable)
✅ 2. The language of the input data matters for LLM reasoning capabilities: translating the data into English and picking an English-based LLM is crucial in most cases!
✅ 3. CoT and RR take more time ⏳ for inference and fine-tuning, proportionally to the number of steps in the chain / the number of revisions in reasoning 🧠
posted an update 3 months ago
view post
Post
685
📊 Lovely to share unique findings on the reasoning capabilities 🧠 of Qwen2-7B 🇨🇳 in Target Sentiment Analysis (TSA), for original texts (🇷🇺) and their versions translated into English (🇺🇸), in zero-shot-learning mode.
Since the last update on Qwen1.5, I have to say:
☑️ 1. Qwen2-7B is the first model on my list that reasons 🔥 better 🔥 in Russian than in English; it strongly surpasses other 7B LLMs and LLaMA3-70B by correctly distributing sentiment cases (F1(PN) metric).
☑️ 2. Surprisingly, Qwen2-7B significantly underperformed its "earlier bro" Qwen1.5-7B on texts in English. The key problem is that ~17% of the answers contain mixed label entries, so automatic and accurate assessment of such cases is difficult. Therefore, I believe this is more about the particular evaluation rather than something wrong with the model in the TSA domain.

What's next? I have to check out Qwen/Qwen2-72B-Instruct 🧪 If you know the best hosting for inference, please let me know 🙏

Model: Qwen/Qwen2-7B-Instruct
replied to their post 3 months ago
posted an update 3 months ago
view post
Post
870
📊 Just measured the reasoning capabilities 🧠 of Qwen1.5-7B 🇨🇳 in Target Sentiment Analysis (TSA), both for original texts (🇷🇺) and texts translated into English (🇺🇸), in zero-shot-learning mode. Here is what I've noticed:
☑️ 1. Huge gap 📈 with the smaller Qwen1.5 and Qwen2 versions (1.8B and 1.5B). Qwen1.5-7B strongly outperforms its "smaller bros", so this is a case where model scale matters.
☑️ 2. On English texts (🇺🇸), Qwen1.5-7B behaves similarly to, but slightly underperforms 📉, the latest 7B alternatives ... even including Phi-3-small (3.4B)
☑️ 3. On texts in (🇷🇺) there is a certain underperformance 📉 gap behind the latest 7B alternatives: F1=34.1, while other 7B models start from 40.23.

In terms of responses, on non-English texts (🇷🇺) the model answers strictly and behaves similarly to FlanT5.
Curious about the improvements in Qwen2-7B 🔥

Model: Qwen/Qwen1.5-7B-Chat
Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark
Dataset: https://github.com/dialogue-evaluation/RuSentNE-evaluation
Related paper: Large Language Models in Targeted Sentiment Analysis (2404.12342)
Collection: https://huggingface.co./collections/nicolay-r/sentiment-analysis-665ba391e0eba729021ea101
  • 2 replies
posted an update 3 months ago
view post
Post
2439
📢 Surprisingly, there are so many works on imputing personalities into LLMs and vice versa. However, for literary novels 📚 there is a gap in mining those personalities from the book itself. With that, I am happy to release a workflow that 🔥 solely 🔥 relies on book content 📖 for personality extraction:
https://github.com/nicolay-r/book-persona-retriever

💡 The downstream goal of this workflow is to enhance character understanding ... and not just through their mentions in books, but through their personalities (⛏ retrieved with the given lexicon from the 📖 itself)

The closest studies, such as PERSONA-CHAT (arXiv:1801.07243v5), BookEmbeddingEval (2022.findings-acl.81.pdf), ALOHA-Chatbot (arXiv:1910.08293v4), Meet your favorite Character (arXiv:2204.10825), and PRODIGy (arXiv:2311.05195v1), were so valuable 💎! 👏

Curious about the existence of a fine-tuned LLM for detecting personalities in text passages on the Hugging Face hub 🤗 If you are aware of one that could potentially be embedded into the system for further advances, please feel free to recommend it 🙌
posted an update 4 months ago
view post
Post
1685
📢 Delighted to share personal findings 🔎 on the reasoning capabilities 🧠 of the East Asian LLM Qwen1.5 🇨🇳 in Target Sentiment Analysis (TSA). Starting with one of the smallest versions, Qwen1.5-1.8B-Chat, on the original Eastern European texts (🇷🇺) and their translated versions (🇺🇸) in a zero-shot-learning setup, the key takeaways of these experiments were as follows:

✅ 1. The model is capable of reasoning in Eastern European languages (🇷🇺) (remember, it is 1.8B); switching to Qwen2 results in a strong improvement, with results that surpass LLaMA2-70B-chat (more on the difference below).
✅ 2. Despite the size of 1.8B, reasoning in English shows a significant underperformance gap (F1=~34%) behind the closest Flan-T5-XL (2.8B), which showcases F1=43%.

💡 The most intriguing fact about Qwen1.5-1.8B-Chat:
it generates new Russian words I've never seen before, e.g. "нСрСтСнСвая" ("negativitively"), and imputes entries in Chinese. The reason for such low results is that the model was not able to follow the input instruction and instead shares all the opinions for each class. All of that has been improved, though, in Qwen2-1.5B.

Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark
Model: Qwen/Qwen1.5-1.8B-Chat
Dataset: https://github.com/dialogue-evaluation/RuSentNE-evaluation
Related paper: Large Language Models in Targeted Sentiment Analysis (2404.12342)
Collection: https://huggingface.co./collections/nicolay-r/sentiment-analysis-665ba391e0eba729021ea101
posted an update 4 months ago
view post
Post
652
📢 The Chain-of-Thought (CoT)-tuned 🔥 FlanT5-base (248M) model for Emotion State and Emotion-Cause Extraction, built as part of the ECAC-2024 competition, is now available.
💡 The main reasons for making it publicly available are as follows:
✅ 1. It is one of the CoT-based attempts in this field, so I promote these studies by making initial steps 👣 and attempts at assessing LLM reasoning capabilities
✅ 2. This model showcases top-3 🥉 results in the ECAC-2024 competition https://codalab.lisn.upsaclay.fr/competitions/16141#results
✅ 3. Easy Colab for framework-less launch and experiments 🧪
https://colab.research.google.com/github/nicolay-r/THOR-ECAC/blob/master/SemEval_2024_Task_3_FlanT5_Finetuned_Model_Usage.ipynb

You may find more on the model card, while the fine-tuning concept is showcased in the figure below; a minimal usage sketch also follows the links. It is worth adding that more robust performance has been seen with larger-scale models (large and xl), so there is huge potential there

Model: nicolay-r/flan-t5-emotion-cause-thor-base
Related paper: nicolay-r at SemEval-2024 Task 3: Using Flan-T5 for Reasoning Emotion Cause in Conversations with Chain-of-Thought on Emotion States (2404.03361)
Collection: nicolay-r/emotions-extraction-665ba47a20dee2925d607a40
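The usage sketch mentioned above, under assumptions: the prompt wording here is mine, so check the model card / Colab for the exact templates used during CoT fine-tuning:

```python
# Minimal inference sketch for the released checkpoint. The prompt wording is
# an ASSUMPTION; see the model card for the exact THOR prompt templates.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

name = "nicolay-r/flan-t5-emotion-cause-thor-base"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

prompt = "What emotion state does the utterance 'I finally got the job!' convey?"
ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```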
replied to their post 4 months ago
view reply

Well, your choice then would be an LLM, where DialoGPT is not quite suitable for this category. Considering your Mac's resources, I would say you have to go with something of a similar scale to DialoGPT and as recent as possible. So the model I mentioned (https://huggingface.co./Qwen/Qwen2-0.5B) would be a perfect choice to start with and try out

replied to their post 4 months ago
replied to their post 4 months ago
view reply

I believe that switching from the microsoft/DialoGPT-small (176M) model to an 8B-sized model, which is ~45 times larger, results in far longer inference. Especially if you launch that inference in CPU mode.

posted an update 4 months ago
view post
Post
1648
📢 Releasing the Chain-of-Thought (CoT)-tuned 🔥 FlanT5-xl (3B) for Target Sentiment Analysis (TSA) on English texts.
💡 The main reasons for adopting this model or its smaller versions (large and base) are as follows:
✅ 1. Reasoning in sentiment analysis in zero-shot-learning mode significantly underperforms the fine-tuned FlanT5.
✅ 2. This model showcases top-1 🏆 results in the RuSentNE-2023 competition: https://codalab.lisn.upsaclay.fr/competitions/9538
✅ 3. Easy Colab for framework-less launch and experiments 🧪 https://colab.research.google.com/github/nicolay-r/Reasoning-for-Sentiment-Analysis-Framework/blob/main/FlanT5_Finetuned_Model_Usage.ipynb

You may find more on the model card, while the fine-tuning statistics per model size are shown in the attachment; a minimal usage sketch follows the links below.

Model: nicolay-r/flan-t5-tsa-thor-xl
Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark
Dataset: https://github.com/dialogue-evaluation/RuSentNE-evaluation
Related paper: Large Language Models in Targeted Sentiment Analysis (2404.12342)
Collection: https://huggingface.co./collections/nicolay-r/sentiment-analysis-665ba391e0eba729021ea101
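And the minimal usage sketch referenced above; the query format is an assumed simplification, and the Colab above shows the exact templates:

```python
# Minimal sketch with the released TSA checkpoint; the query wording is an
# assumed simplification of the actual fine-tuning templates.
from transformers import pipeline

pipe = pipeline("text2text-generation", model="nicolay-r/flan-t5-tsa-thor-xl")

query = ("What is the attitude of the sentence "
         "'The new contract helped Acme recover.' towards 'Acme'?")
print(pipe(query, max_new_tokens=16)[0]["generated_text"])
```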
posted an update 4 months ago
view post
Post
2114
The application of Phi-3-small-8k-instruct to reasoning in Target Sentiment Analysis (TSA), in zero-shot-learning mode. Compared with the other 7B vendors, the key takeaways are as follows:
✅ 1. At the moment this model is at the top 🎉 of the 7B-sized versions for texts translated into English (🇺🇸), surpassing Mistral-7B-v0.3 and LLaMA-3-8B 🔥 (Figure 1)
✅ 2. It remains similar to the 7B alternatives on original non-English texts (🇷🇺); however, it shows more confidence in sentiment presence than the other 7B alternatives (check out the F1(PN0) results in Figure 2)

In comparison with its mini (3B) brother Phi-3-mini, the small (7B) version showcases a huge step forward in reasoning capabilities 🔥

Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark
Model: microsoft/Phi-3-small-8k-instruct
Dataset: https://github.com/dialogue-evaluation/RuSentNE-evaluation
Related paper: Large Language Models in Targeted Sentiment Analysis (2404.12342)
Collection: https://huggingface.co./collections/nicolay-r/sentiment-analysis-665ba391e0eba729021ea101
  • 6 replies
posted an update 4 months ago
view post
Post
2409
📢 The most recent Mistral-7B-Instruct-v0.3 release showcases more robust advances in zero-shot mode on Target Sentiment Analysis.
🧪 We experiment with the original texts (🇷🇺) and their version translated into English (🇺🇸).
💡 The key takeaways on what to expect from this model are as follows:
✔️ 1. On texts translated into English (🇺🇸), it surpasses LLaMA-3 and nearly touches the MOE Mixtral 8x7B versions, being quite precise across all the classes by F1(PN)
✔️ 2. On original texts (🇷🇺) it slightly surpasses LLaMA-3 by F1(PN) while being less tolerant of neutral (F1(PN0)). Larger versions (Mixtral) are still the preferable choice for reasoning 🧠 on non-English texts.
✔️ 3. You can clearly see the difference between the 7B version and the MOE (Figure 3) by F1(PN0)
Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark
Model: mistralai/Mistral-7B-Instruct-v0.3
Dataset: https://github.com/dialogue-evaluation/RuSentNE-evaluation
Related paper: Large Language Models in Targeted Sentiment Analysis (2404.12342)
Collection: https://huggingface.co./collections/nicolay-r/sentiment-analysis-665ba391e0eba729021ea101
replied to their post 4 months ago
posted an update 4 months ago
view post
Post
1669
📢 Impressed with the performance of microsoft/Phi-3-mini-4k-instruct (3B) reasoning 🧠 in zero-shot-learning (ZSL) mode on the Target Sentiment Analysis (TSA) problem.
💡 There are three major takeaways from this experiment 🧪, and they are as follows:
✅ 1. Phi-3 slightly outperforms Mistral-7B (official Mistral API, v0.1 or v0.2) on texts written in English
✅ 2. It performs similarly to LLaMA-3-8B-Instruct on texts translated into English 🔥
☑️ 3. Reasoning in a non-English language (🇷🇺) is pretty decent but underperforms similar 7B-sized models.

This is a huge step forward since the release of Phi-2, especially because the predecessor (microsoft/phi-2) was not capable of reasoning on non-English texts (🇷🇺) at all!

Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark
Model: microsoft/Phi-3-mini-4k-instruct
Dataset: https://github.com/dialogue-evaluation/RuSentNE-evaluation
Related paper: Large Language Models in Targeted Sentiment Analysis (2404.12342)
Collection: https://huggingface.co./collections/nicolay-r/sentiment-analysis-665ba391e0eba729021ea101
  • 3 replies
posted an update 4 months ago
view post
Post
2183
The most recent LLaMA-3-70B-Instruct showcases beast-level performance in zero-shot-learning mode on Target Sentiment Analysis (TSA) 🔥🚀 In particular, we experiment with sentence-level analysis, on sentences fetched from the Wiki articles from which the RuSentNE-2023 dataset was formed.

The key takeaways from LLaMA-3-70B performance on original (🇷🇺) texts and their English translations are as follows:
1. It outperforms ChatGPT-4 and all its predecessors on non-English texts (🇷🇺)
2. It surpasses all ChatGPT-3.5 versions / performs nearly as well as ChatGPT-4 on English texts 🥳

Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark
Model: meta-llama/Meta-Llama-3-70B-Instruct
Dataset: https://github.com/dialogue-evaluation/RuSentNE-evaluation
Related paper: Large Language Models in Targeted Sentiment Analysis (2404.12342)
Collection: https://huggingface.co./collections/nicolay-r/sentiment-analysis-665ba391e0eba729021ea101