Some interesting findings in this paper:
- They consider o1 a Large Reasoning Model (LRM) with a different architecture from SOTA LLMs.
- Creative justifications: "It is almost as if o1 has gone from hallucinating to gaslighting!" So true; I also noticed it can "hallucinate" its chain of thought lol.
- Accuracy/cost tradeoffs: o1 provides high accuracy, but at significant computational and monetary cost due to hidden "reasoning tokens."

Paper: https://www.arxiv.org/abs/2409.13373
nanoGPT with Sigmoid Self-Attention. I couldn't resist, had to give it a try :)
Some observations (training on an M2): compared to softmax, SSA was ~5-10% faster in training with similar final loss, slightly less coherent text generation, marginally higher perplexity, and lower memory usage.
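For anyone curious what the swap looks like, here's a minimal sketch assuming the change lands in nanoGPT's CausalSelfAttention; the -log(n) bias follows the sigmoid-attention paper (arXiv:2409.04431), and the function name and bias_n parameter are mine:

```python
import math
import torch

def sigmoid_attention(q, k, v, bias_n=None):
    """Causal sigmoid attention: softmax(QK^T/sqrt(d)) -> sigmoid(QK^T/sqrt(d) + b).

    q, k, v: (B, nh, T, hs) tensors, shaped as in nanoGPT's CausalSelfAttention.
    b = -log(n) keeps the output scale close to softmax's at init
    (per arXiv:2409.04431); bias_n defaults to the sequence length T.
    """
    T, hs = q.size(-2), q.size(-1)
    b = -math.log(bias_n if bias_n is not None else T)
    att = (q @ k.transpose(-2, -1)) / math.sqrt(hs)
    mask = torch.tril(torch.ones(T, T, dtype=torch.bool, device=q.device))
    att = att.masked_fill(~mask, float('-inf'))  # sigmoid(-inf) = 0, so causal masking still works
    att = torch.sigmoid(att + b)                 # elementwise, no row normalization like softmax
    return att @ v
```

In nanoGPT this would replace the F.softmax(att, dim=-1) path. Since sigmoid doesn't normalize across keys, each attention weight is independent of the others; whether that explains the slight coherence dip above, I can't say.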