Nicolay Rusnachenko

nicolay-r

AI & ML interests

Information Retrieval・Medical Multimodal NLP (🖼+📝) Research Fellow @BU_Research・software developer http://arekit.io・PhD in NLP

Organizations

None yet

nicolay-r's activity

posted an update about 2 hours ago
view post
Post
62
📢 Seriously, we can't go with Big5 or other unstructured descriptions to cover a diverse, large cast of characters 👨‍👩‍👦‍👦 across many books 📚. Instead, factorization + open-psychometrics antonyms extracted from dialogues is the key 🔑 to automatic character profiling that relies purely on book content 📖 (a toy sketch of the antonym idea follows the links below). With that, I'm delighted to share with you 🙌 more on this topic in a YouTube video:

https://youtu.be/UQQsXfZyjjc

🔑 From the video you will find out:
✅ How to process books 📖 for personality extraction
✅ How to impute personalities 👨‍👩‍👦‍👦 and a character network for deep learning 🤖
✅ How to evaluate 📊 the advances / experiment findings 🧪

Additional materials:
🌟 Github: https://github.com/nicolay-r/book-persona-retriever
📜 Paper: https://www.dropbox.com/scl/fi/0c2axh97hadolwphgu7it/rusnachenko2024personality.pdf?rlkey=g2yyzv01th2rjt4o1oky0q8zc&st=omssztha&dl=1
📙 Google Colab experiments: https://colab.research.google.com/github/nicolay-r/deep-book-processing/blob/master/parlai_gutenberg_experiments.ipynb
🦜 Task: https://github.com/nicolay-r/parlai_bookchar_task/tree/master
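To make the antonym idea concrete, here is a minimal toy sketch in Python. The trait axes and cue-word lists below are hypothetical illustrations of open-psychometrics antonym pairs, not the actual lexicon shipped with book-persona-retriever; the real workflow adds factorization and much more:

```python
# Toy sketch of antonym-based character profiling from dialogue utterances.
# TRAIT_AXES is a HYPOTHETICAL stand-in for an open-psychometrics antonym
# lexicon; it is not the lexicon used in book-persona-retriever.
import re
from collections import Counter

TRAIT_AXES = {
    # axis: (cue words for the left pole, cue words for the right pole)
    "shy-confident": ({"blush", "stammer", "hesitate"}, {"declare", "boast", "insist"}),
    "cruel-kind":    ({"sneer", "mock", "threaten"},    {"comfort", "thank", "soothe"}),
}

def profile_character(utterances):
    """Score a character in [-1, 1] per trait axis from its dialogue utterances."""
    tokens = Counter(w for u in utterances for w in re.findall(r"[a-z']+", u.lower()))
    scores = {}
    for axis, (left, right) in TRAIT_AXES.items():
        l = sum(tokens[w] for w in left)
        r = sum(tokens[w] for w in right)
        scores[axis] = (r - l) / (l + r) if l + r else 0.0
    return scores

print(profile_character(["I... I hesitate to say it,", "Thank you, truly."]))
# {'shy-confident': -1.0, 'cruel-kind': 1.0}
```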
posted an update 2 days ago
view post
Post
523
📢 It is less meaningful to prompt an LLM directly for opinion mining. Instead, the Three-hop (💡aspect + 🤔opinion + 🧠reason) Chain-of-Thought reasoning concept represents a decent solution for reasoning about sentiment.
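As a tiny illustration of the concept, here is a sketch of the three hops chained over any LLM completion callable; the prompt wording is my paraphrase of the three-hop scheme, not the framework's exact templates:

```python
# Sketch of three-hop CoT for targeted sentiment. `ask` is any callable that
# sends a prompt to an LLM and returns its text reply; the prompts are an
# assumed paraphrase of the scheme, not the exact framework templates.
def three_hop_sentiment(ask, text, target):
    # Hop 1 (aspect): which aspect of the target does the sentence mention?
    aspect = ask(f"Given the sentence '{text}', which aspect of '{target}' is mentioned?")
    # Hop 2 (opinion): what opinion is implied towards that aspect?
    opinion = ask(f"Given the sentence '{text}' and the aspect '{aspect}', "
                  f"what is the implied opinion towards it?")
    # Hop 3 (reason -> polarity): conclude the sentiment from the chain so far.
    return ask(f"Given the sentence '{text}', the aspect '{aspect}' and the opinion "
               f"'{opinion}', what is the sentiment polarity towards '{target}': "
               f"positive, negative, or neutral?")
```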

After a series of related posts here on Hugging Face, I am happy to invite you 🙌 to my talk @ NLPSummit2024.
I am going to take part in Healthcare Day 2 (25th of September); the calendar details and 🖇 link to the event are below 👇

🎤 Event: https://www.nlpsummit.org/nlp-summit-2024/
📅 Calendar event: https://calendar.app.google/f7AUhNHuTw5JtPs36
⏲ Time: 25th of September @ 2:00 PM ET – 2:30 PM ET

📊 Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark
🧠 Framework: https://github.com/nicolay-r/Reasoning-for-Sentiment-Analysis-Framework
📝 Paper: Large Language Models in Targeted Sentiment Analysis (2404.12342)
posted an update 13 days ago
view post
Post
360
📢 The Three-hop (💡aspect + 🤔opinion + 🧠reason) Chain-of-Thought concept + LLM represents a decent approach to reasoning about the emotions of participants in textual dialogues.
Delighted to share a tutorial video that will make you aware of:
✅ The proper application of LLMs to implicit IR
✅ Ways of aligning different information types (causes and states) within the same LLM
✅ How to launch an LLM in Google Colab that is capable of character Emotion Extraction in dialogues 🧪

🎥: https://www.youtube.com/watch?v=vRVDQa7vfkU

Project: https://github.com/nicolay-r/THOR-ECAC
Paper: https://aclanthology.org/2024.semeval-1.4/
Model card: nicolay-r/flan-t5-emotion-cause-thor-base
posted an update 3 months ago
view post
Post
491
📢 Delighted to share the most recent and valuable contributions to the book-related NLP domain 💎 To push forward a deeper understanding of characters 👨‍👩‍👧‍👦 from literary novels themselves 📖 by machine learning models 🤖, I am releasing the most accessible version, v1.0, of the related workflow, adapted for ParlAI 🦜 agents

🌟 https://github.com/nicolay-r/book-persona-retriever/tree/v1.0

Feel free to follow / share / comment in order to advance the related direction!
posted an update 3 months ago
view post
Post
710
📢 I've tested google/gemma-2-9b-it on Target Sentiment Analysis (TSA), in zero-shot learning mode, on the RuSentNE-2023 dataset with texts translated into English (🇺🇸); a minimal sketch of the setup follows the links below.

🔎 Findings: The key contribution of the most recent Gemma-2 release is reasoning alignment between different languages. This is basically the first model in the under-10B category that shows equal results on English and non-English texts. On texts in English it performs similarly to LLaMA-3-8B / Mistral-7B

Model: google/gemma-2-9b-it
Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark
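For reference, a minimal zero-shot TSA query via transformers; the prompt here is an assumed simplification, not the exact benchmark template, and Gemma-2 needs a recent transformers release:

```python
# Minimal zero-shot TSA sketch; the prompt is an assumed simplification of
# the benchmark template. Requires recent `transformers` (Gemma-2 support)
# and `accelerate` for device_map="auto".
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-2-9b-it", device_map="auto")

def tsa_zero_shot(sentence, entity):
    prompt = (f"What is the attitude of the sentence '{sentence}' towards '{entity}'? "
              f"Answer with one word: positive, negative, or neutral.")
    out = pipe([{"role": "user", "content": prompt}], max_new_tokens=8)
    # With chat-style input the pipeline returns the whole conversation;
    # the last message is the model's reply.
    return out[0]["generated_text"][-1]["content"].strip()
```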
posted an update 3 months ago
view post
Post
870
📢 I've tested the most recent google/gemma-2-9b-it in Sentiment Analysis, and the obtained results left me shocked! 🤯 It ended up becoming the king 👑, showcasing top-1 results across all models and categories in Target Sentiment Analysis (TSA) on non-English texts (🇷🇺).

That's impressive to say the least: it surpassed all the other models benchmarked before within the categories of 100B and below by F1(PN), and nearly touched GPT-4 by F1(PN0). The Google research team did a great job! 👏

Model: google/gemma-2-9b-it
Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark

posted an update 3 months ago
view post
Post
677
So far I've implemented a more accurate 👌 assessment of LLM reasoning capabilities in Target Sentiment Analysis (zero-shot mode). With that, the recalculated tables of the related benchmark 📊 also have a better separation into categories, with the following 🏆 top 🏆 performing models:

🟩 1. Proprietary models (🏆 GPT-4 🇺🇸 / GPT-3.5-0613 🇷🇺)
🟥 2. Open and < 100B (🏆 LLaMA-3-70B)
🟧 3. Open and < 10B (🏆 LLaMA-3-8B-Instruct 🇺🇸 / Qwen-2-7B-Instruct 🇷🇺)
🟨 4. Open and < 1B (🏆 Flan-T5-large 🇺🇸 / Qwen2-0.5B-Instruct 🇷🇺)

Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark
posted an update 3 months ago
view post
Post
449
With the most recent workshop on Semantic Evaluation as part of NAACL-2024, this year I was delighted to contribute experiments 🧪 on Chain-of-Thought fine-tuning concepts to push forward LLM reasoning capabilities in:

🧪 1. Reading Comprehension of Numerals in texts 🇨🇳
⭐ https://github.com/GavinZhao19/SemEval24-NumAnalysis-CN
🔒 https://huggingface.co./GavinZhao23/NumAnalysis-Chatglm3-6B

🧪 2. Extracting Emotion-Causes using Reasoning Revision (RR)
⭐ https://github.com/nicolay-r/THOR-ECAC
🔓 nicolay-r/flan-t5-emotion-cause-thor-base
nicolay-r at SemEval-2024 Task 3: Using Flan-T5 for Reasoning Emotion Cause in Conversations with Chain-of-Thought on Emotion States (2404.03361)

🔑 In short, there are three major takeaways:
✅ 1. The scale of the backbone LLM for SFT matters (>1.1B is preferable)
✅ 2. The language of the input data matters for LLM reasoning capabilities: translating the data into English and picking an English-based LLM is crucial in most cases!
✅ 3. CoT and RR take more time ⏳ for inference and fine-tuning, proportionally to the number of steps in the chain / the number of revisions in reasoning 🧠
posted an update 3 months ago
view post
Post
685
📊 Lovely to share unique findings on the reasoning capabilities 🧠 of Qwen2-7B 🇨🇳 in Target Sentiment Analysis (TSA), for original texts (🇷🇺) and their versions translated into English (🇺🇸), in zero-shot-learning mode.
Since the last update on Qwen1.5, I have to say:
☑️ 1. Qwen2-7B is the first model on my list that reasons 🔥 better 🔥 in Russian than in English; it strongly surpasses other 7B LLMs and LLaMA3-70B by correctly distributing sentiment cases (F1(PN) metric).
☑️ 2. Surprisingly, Qwen2-7B significantly underperformed its "earlier bro" Qwen1.5-7B on texts in English. The key problem is that ~17% of the answers contain mixed label entries, so automatic and accurate assessment of such cases is difficult. Therefore, I believe this is more about the particular evaluation rather than something wrong with the model in the TSA domain.

What's next? I have to check out Qwen/Qwen2-72B-Instruct 🧪 If you know the best hosting for inference, please let me know 🙏

Model: Qwen/Qwen2-7B-Instruct
replied to their post 3 months ago
posted an update 3 months ago
view post
Post
870
📊 Just measured the reasoning capabilities 🧠 of Qwen1.5-7B 🇨🇳 in Target Sentiment Analysis (TSA), both for original texts (🇷🇺) and texts translated into English (🇺🇸), in zero-shot-learning mode. Here is what I've noticed:
☑️ 1. Huge gap 📈 with the smaller Qwen1.5 and Qwen2 versions (1.8B and 1.5B). Qwen1.5-7B strongly outperforms its "smaller bros", so this is a case where model scale matters.
☑️ 2. On English texts (🇺🇸), Qwen1.5-7B behaves similarly to, but slightly underperforms 📉, the latest 7B alternatives ... even including Phi-3-small (3.4B)
☑️ 3. On texts in (🇷🇺) there is a certain underperformance 📉 gap behind the latest 7B alternatives: F1=34.1, while other 7B models start from 40.23.

In terms of responses, on non-English texts (🇷🇺) the model answers strictly and behaves similarly to FlanT5.
Curious about the improvements in Qwen2-7B 🔥

Model: Qwen/Qwen1.5-7B-Chat
Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark
Dataset: https://github.com/dialogue-evaluation/RuSentNE-evaluation
Related paper: Large Language Models in Targeted Sentiment Analysis (2404.12342)
Collection: https://huggingface.co./collections/nicolay-r/sentiment-analysis-665ba391e0eba729021ea101
  • 2 replies
posted an update 3 months ago
view post
Post
2439
📢 Surprisingly, there are so many works on imputing personalities into LLMs and vice versa. However, for literary novels 📚 there is a gap in mining those personalities from the book itself. With that, I am happy to release a workflow that 🔥 solely 🔥 relies on book content 📖 for personality extraction:
https://github.com/nicolay-r/book-persona-retriever

💡 The downstream goal of this workflow is to enhance character understanding ... and not just through their mentions in books, but through their personalities (⛏ retrieved with the given lexicon from the 📖 itself)

The closest studies, such as PERSONA-CHAT (arXiv:1801.07243v5), BookEmbeddingEval (2022.findings-acl.81.pdf), ALOHA-Chatbot (arXiv:1910.08293v4), Meet your favorite Character (arXiv:2204.10825), and PRODIGy (arXiv:2311.05195v1), were so valuable 💎! 👏

Curious about the existence of a fine-tuned LLM for detecting personalities in text passages on the Hugging Face hub 🤗 If you are aware of one that could potentially be embedded into the system for further advances, please feel free to recommend it 🙌
posted an update 4 months ago
view post
Post
1685
📢 Delighted to share personal findings 🔎 on the reasoning capabilities 🧠 of the East Asian LLM Qwen1.5 🇨🇳 in Target Sentiment Analysis (TSA). Starting with one of the smallest versions, Qwen1.5-1.8B-Chat, on the original Eastern European texts (🇷🇺) and their translated versions (🇺🇸) in a zero-shot-learning setup, the key takeaways of these experiments were as follows:

✅ 1. The model is capable of reasoning in Eastern European languages (🇷🇺) (remember, it is 1.8B); switching to Qwen2 results in a strong improvement, with results that surpass LLaMA2-70B-chat (more on the difference below).
✅ 2. Despite the size of 1.8B, reasoning in English shows a significant underperformance gap (F1=~34%) behind the closest Flan-T5-XL (2.8B), which showcases F1=43%.

💡 The most intriguing fact about Qwen1.5-1.8B-Chat:
it generates new Russian words I've never seen before, e.g. "нСрСтСнСвая" ("negativitively"), and imputes entries in Chinese. The reason for such low results is that the model was not able to follow the input instruction and instead shares all the opinions for each class. All of that has been improved, though, in Qwen2-1.5B.

Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark
Model: Qwen/Qwen1.5-1.8B-Chat
Dataset: https://github.com/dialogue-evaluation/RuSentNE-evaluation
Related paper: Large Language Models in Targeted Sentiment Analysis (2404.12342)
Collection: https://huggingface.co./collections/nicolay-r/sentiment-analysis-665ba391e0eba729021ea101
posted an update 4 months ago
view post
Post
652
📢 The Chain-of-Thought (CoT)-tuned 🔥 FlanT5-base (248M) model for Emotion State and Emotion-Cause Extraction, built as part of the ECAC-2024 competition, is now available.
💡 The main reasons for making it publicly available are as follows:
✅ 1. It is one of the CoT-based attempts in this field, so I promote these studies by making initial steps 👣 and attempts at assessing LLM reasoning capabilities
✅ 2. This model showcases top-3 🥉 results in the ECAC-2024 competition https://codalab.lisn.upsaclay.fr/competitions/16141#results
✅ 3. Easy Colab for framework-less launch and experiments 🧪
https://colab.research.google.com/github/nicolay-r/THOR-ECAC/blob/master/SemEval_2024_Task_3_FlanT5_Finetuned_Model_Usage.ipynb

You may find more on the model card, while the fine-tuning concept is showcased in the figure below; a minimal usage sketch also follows the links. It is worth adding that more robust performance has been seen with larger-scale models (large and xl), so there is huge potential there

Model: nicolay-r/flan-t5-emotion-cause-thor-base
Related paper: nicolay-r at SemEval-2024 Task 3: Using Flan-T5 for Reasoning Emotion Cause in Conversations with Chain-of-Thought on Emotion States (2404.03361)
Collection: nicolay-r/emotions-extraction-665ba47a20dee2925d607a40
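The usage sketch mentioned above, under assumptions: the prompt wording here is mine, so check the model card / Colab for the exact templates used during CoT fine-tuning:

```python
# Minimal inference sketch for the released checkpoint. The prompt wording is
# an ASSUMPTION; see the model card for the exact THOR prompt templates.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

name = "nicolay-r/flan-t5-emotion-cause-thor-base"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

prompt = "What emotion state does the utterance 'I finally got the job!' convey?"
ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=32)
print(tok.decode(out[0], skip_special_tokens=True))
```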
replied to their post 4 months ago
view reply

Well, your choice then would be an LLM, where DialoGPT is not quite suitable for this category. Considering your Mac's resources, I would say you have to go with something of a similar scale to DialoGPT and as recent as possible. So the model I mentioned (https://huggingface.co./Qwen/Qwen2-0.5B) would be a perfect choice to start with and try out

replied to their post 4 months ago
replied to their post 4 months ago
view reply

I believe that switching from the microsoft/DialoGPT-small (176M) model to an 8B-sized model, which is ~45 times larger, results in far longer inference. Especially if you launch that inference in CPU mode.

posted an update 4 months ago
view post
Post
1648
📢 Releasing the Chain-of-Thought (CoT)-tuned 🔥 FlanT5-xl (3B) for Target Sentiment Analysis (TSA) on English texts.
💡 The main reasons for adopting this model or its smaller versions (large and base) are as follows:
✅ 1. Reasoning in sentiment analysis in zero-shot-learning mode significantly underperforms the fine-tuned FlanT5.
✅ 2. This model showcases top-1 🏆 results in the RuSentNE-2023 competition: https://codalab.lisn.upsaclay.fr/competitions/9538
✅ 3. Easy Colab for framework-less launch and experiments 🧪 https://colab.research.google.com/github/nicolay-r/Reasoning-for-Sentiment-Analysis-Framework/blob/main/FlanT5_Finetuned_Model_Usage.ipynb

You may find more on the model card, while the fine-tuning statistics per model size are shown in the attachment; a minimal usage sketch follows the links below.

Model: nicolay-r/flan-t5-tsa-thor-xl
Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark
Dataset: https://github.com/dialogue-evaluation/RuSentNE-evaluation
Related paper: Large Language Models in Targeted Sentiment Analysis (2404.12342)
Collection: https://huggingface.co./collections/nicolay-r/sentiment-analysis-665ba391e0eba729021ea101
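And the minimal usage sketch referenced above; the query format is an assumed simplification, and the Colab above shows the exact templates:

```python
# Minimal sketch with the released TSA checkpoint; the query wording is an
# assumed simplification of the actual fine-tuning templates.
from transformers import pipeline

pipe = pipeline("text2text-generation", model="nicolay-r/flan-t5-tsa-thor-xl")

query = ("What is the attitude of the sentence "
         "'The new contract helped Acme recover.' towards 'Acme'?")
print(pipe(query, max_new_tokens=16)[0]["generated_text"])
```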
posted an update 4 months ago
view post
Post
2114
The application of Phi-3-small-8k-instruct to reasoning in Target Sentiment Analysis (TSA), in zero-shot-learning mode. Compared with the other 7B vendors, the key takeaways are as follows:
✅ 1. At the moment this model is at the top 🎉 of the 7B-sized versions for texts translated into English (🇺🇸), surpassing Mistral-7B-v0.3 and LLaMA-3-8B 🔥 (Figure 1)
✅ 2. It remains similar to the 7B alternatives on original non-English texts (🇷🇺); however, it shows more confidence in sentiment presence than the other 7B alternatives (check out the F1(PN0) results in Figure 2)

In comparison with its mini (3B) brother Phi-3-mini, the small (7B) version showcases a huge step forward in reasoning capabilities 🔥

Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark
Model: microsoft/Phi-3-small-8k-instruct
Dataset: https://github.com/dialogue-evaluation/RuSentNE-evaluation
Related paper: Large Language Models in Targeted Sentiment Analysis (2404.12342)
Collection: https://huggingface.co./collections/nicolay-r/sentiment-analysis-665ba391e0eba729021ea101
  • 6 replies
posted an update 4 months ago
view post
Post
2409
📢 The most recent Mistral-7B-Instruct-v0.3 release showcases more robust advances in zero-shot mode on Target Sentiment Analysis.
🧪 We experiment with the original texts (🇷🇺) and their version translated into English (🇺🇸).
💡 The key takeaways on what to expect from this model are as follows:
✔️ 1. On texts translated into English (🇺🇸), it surpasses LLaMA-3 and nearly touches the MOE Mixtral 8x7B versions, being quite precise across all the classes by F1(PN)
✔️ 2. On original texts (🇷🇺) it slightly surpasses LLaMA-3 by F1(PN) while being less tolerant of neutral (F1(PN0)). Larger versions (Mixtral) are still the preferable choice for reasoning 🧠 on non-English texts.
✔️ 3. You can clearly see the difference between the 7B version and the MOE (Figure 3) by F1(PN0)
Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark
Model: mistralai/Mistral-7B-Instruct-v0.3
Dataset: https://github.com/dialogue-evaluation/RuSentNE-evaluation
Related paper: Large Language Models in Targeted Sentiment Analysis (2404.12342)
Collection: https://huggingface.co./collections/nicolay-r/sentiment-analysis-665ba391e0eba729021ea101
replied to their post 4 months ago
posted an update 4 months ago
view post
Post
1669
📢 Impressed with the performance of microsoft/Phi-3-mini-4k-instruct (3B) reasoning 🧠 in zero-shot-learning (ZSL) mode on the Target Sentiment Analysis (TSA) problem.
💡 There are three major takeaways from this experiment 🧪, and they are as follows:
✅ 1. Phi-3 slightly outperforms Mistral-7B (official Mistral API, v0.1 or v0.2) on texts written in English
✅ 2. It performs similarly to LLaMA-3-8B-Instruct on texts translated into English 🔥
☑️ 3. Reasoning in a non-English language (🇷🇺) is pretty decent but underperforms similar 7B-sized models.

This is a huge step forward since the release of Phi-2, especially because the predecessor (microsoft/phi-2) was not capable of reasoning on non-English texts (🇷🇺) at all!

Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark
Model: microsoft/Phi-3-mini-4k-instruct
Dataset: https://github.com/dialogue-evaluation/RuSentNE-evaluation
Related paper: Large Language Models in Targeted Sentiment Analysis (2404.12342)
Collection: https://huggingface.co./collections/nicolay-r/sentiment-analysis-665ba391e0eba729021ea101
  • 3 replies
posted an update 4 months ago
view post
Post
2183
The most recent LLaMA-3-70B-Instruct showcases beast-level performance in zero-shot-learning mode on Target Sentiment Analysis (TSA) 🔥🚀 In particular, we experiment with sentence-level analysis, on sentences fetched from the Wiki articles from which the RuSentNE-2023 dataset was formed.

The key takeaways from LLaMA-3-70B performance on original (🇷🇺) texts and their English translations are as follows:
1. It outperforms ChatGPT-4 and all its predecessors on non-English texts (🇷🇺)
2. It surpasses all ChatGPT-3.5 versions / performs nearly as well as ChatGPT-4 on English texts 🥳

Benchmark: https://github.com/nicolay-r/RuSentNE-LLM-Benchmark
Model: meta-llama/Meta-Llama-3-70B-Instruct
Dataset: https://github.com/dialogue-evaluation/RuSentNE-evaluation
Related paper: Large Language Models in Targeted Sentiment Analysis (2404.12342)
Collection: https://huggingface.co./collections/nicolay-r/sentiment-analysis-665ba391e0eba729021ea101