Jaward Sesay

Jaward

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Posts 50

Post
Some interesting findings in this paper:
- They consider o1 a Large Reasoning Model (LRM) with a different architecture from SOTA LLMs.
- Creative justifications: "It is almost as if o1 has gone from hallucinating to gaslighting!" This is so true; I've also noticed it can "hallucinate" its chain of thought lol.
- Accuracy/cost tradeoffs: o1 delivers high accuracy, but at significant computational and monetary cost due to the hidden "reasoning tokens."
Paper: https://www.arxiv.org/abs/2409.13373
Post
nanoGPT with Sigmoid Self-Attention
I couldn’t resist, had to give it a try :)

Some observations on an M2:
Compared to softmax attention, SSA trained ~5-10% faster with similar final loss values and lower memory usage, but gave marginally higher perplexity and slightly less coherent text generation.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/sigmoid_attn.ipynb
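For context, here's a minimal sketch of the core change (not the notebook's exact code): replace the softmax over attention scores with an elementwise sigmoid, plus a -log(seq_len) bias on the logits as suggested in the sigmoid-attention paper. The function name, shapes, and bias handling here are illustrative assumptions, not the notebook's API.

```python
import math
import torch

def sigmoid_attention(q, k, v, causal=True):
    """Sigmoid self-attention sketch (illustrative, not the notebook's code).

    q, k, v: (batch, heads, seq_len, head_dim)
    Each query-key score is squashed independently with a sigmoid,
    so attention rows no longer sum to 1 as they do with softmax.
    """
    seq_len, head_dim = q.shape[-2], q.shape[-1]
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(head_dim)
    if causal:
        # Mask out future positions; sigmoid(-inf) evaluates to 0.
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device),
            diagonal=1,
        )
        scores = scores.masked_fill(mask, float("-inf"))
    # -log(seq_len) bias keeps output magnitudes near the softmax baseline
    # early in training (assumption, following the paper's recommendation).
    attn = torch.sigmoid(scores - math.log(seq_len))
    return attn @ v
```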
