singhsidhukuldeep posted an update May 27
Remember stacking in ensemble ML? πŸ€”

What happens if you do the reverse of that but with LLMs? 🀯

Basically, an MoE created by merging multiple existing models (instead of being pre-trained as one, like Mixtral)? 🧠

Frankenstein MoE! (not an official name) πŸ§Ÿβ€β™‚οΈ

That's the new Kraken architecture! πŸ™

It uses a sequence classification model to route inputs to the most suitable language model based on the input's characteristics. 🚦

Yup, multiple full-fledged LLMs are loaded into memory, and then a classification layer decides who gets to generate an output! 🎰
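A minimal sketch of that routing idea, not the actual Kraken code; the model names below are placeholders and the routing labels are assumptions:

```python
# Sketch of classifier-based routing over several full LLMs (Kraken-style idea).
# All model IDs and labels here are hypothetical placeholders.
from transformers import pipeline

# A sequence classification model acts as the router.
router = pipeline("text-classification", model="router-model-placeholder")

# Each "expert" is a full, independently loaded LLM kept in memory.
experts = {
    "code": pipeline("text-generation", model="code-expert-placeholder"),
    "math": pipeline("text-generation", model="math-expert-placeholder"),
    "chat": pipeline("text-generation", model="chat-expert-placeholder"),
}

def generate(prompt: str) -> str:
    # The router's predicted label decides which LLM handles this prompt.
    label = router(prompt)[0]["label"]
    expert = experts.get(label, experts["chat"])  # fall back to a default expert
    return expert(prompt, max_new_tokens=128)[0]["generated_text"]

print(generate("Write a Python function that reverses a string."))
```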

Tell me you have too many GPUs without telling me you have too many GPUs! πŸ–₯️πŸ”₯

Jokes aside, this is extremely fascinating research, but I don't understand why this couldn't just be a single big model with multiple LoRA adapters that are selected on the fly. 🤷‍♂️
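For comparison, a rough sketch of that alternative using the peft library's adapter switching; the base model and adapter paths are hypothetical, and the routing rule is assumed to come from the same kind of classifier Kraken uses:

```python
# Sketch: one base model with multiple LoRA adapters switched at inference time.
# All model and adapter IDs are hypothetical placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "base-model-placeholder"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)

# Load the first adapter, then attach the others under distinct names.
model = PeftModel.from_pretrained(base, "lora-code-placeholder", adapter_name="code")
model.load_adapter("lora-math-placeholder", adapter_name="math")
model.load_adapter("lora-chat-placeholder", adapter_name="chat")

def generate(prompt: str, adapter: str) -> str:
    # Switching adapters is cheap compared to holding several full LLMs in memory.
    model.set_adapter(adapter)
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output[0], skip_special_tokens=True)

print(generate("Explain the quadratic formula.", adapter="math"))
```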

Model: cognitivecomputations/Kraken
Github: https://github.com/cognitivecomputations/kraken

Maybe @alexsherstinsky can answer the question about the multiple LoRA adapters :)

·

@elsatch Thank you for tagging me in this conversation! I think that while the approach with LoRA adapters would, in my opinion, rely on a different technique, the results could indeed be favorable. Here are some recent papers that point to the high effectiveness of LoRA as a specific PEFT method across a variety of examples and application domains:

Happy to discuss! Thanks again!