Bits, Bytes & Life

Exploring LLMs, edge AI, building projects, reading papers… and the joys of life outside the lab. Some stories come straight from my wife’s pen, polished by me.

Disclaimer: This is just a collection of notes – some parts were drafted with a bit of AI help and then cleaned up. Much of it is mainly for my own reference, so things might be rough or incomplete in places.

RL Notes: Hugging Face RL Course
HFRL Unit 1 Summary: Reinforcement learning is a method where an agent learns by interacting with its environment, using trial and error and feedback from rewards.
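
A minimal sketch of the agent–environment loop that summary describes, assuming the gymnasium library and its CartPole-v1 task; the “agent” here just samples random actions, so it illustrates the interaction and reward feedback, not the learning itself.

```python
# Agent-environment loop: the agent acts, the environment returns an
# observation and a reward, and the episode resets when it ends.
import gymnasium as gym

env = gym.make("CartPole-v1")
observation, info = env.reset(seed=42)

total_reward = 0.0
for _ in range(200):
    action = env.action_space.sample()       # random policy stands in for the agent
    observation, reward, terminated, truncated, info = env.step(action)
    total_reward += reward                   # reward is the feedback signal
    if terminated or truncated:              # episode over: start a new one
        observation, info = env.reset()

env.close()
print(f"Reward collected over the rollout: {total_reward}")
```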
Attention-Free Transformer: Escaping the Quadratic Bottleneck
How AFT rethinks attention to achieve linear complexity while preserving performance. The Transformer architecture revolutionized AI, but its self-attention mechanism carries a fundamental limitation: $\mathcal{O}(n^2)$ complexity in the sequence length, which makes processing long sequences computationally prohibitive. The Attention-Free Transformer (AFT) emerges as an elegant solution that maintains strong performance while achieving linear complexity.
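
To make the complexity contrast concrete, here is a toy numpy sketch (my own simplification, not the paper’s code) comparing standard dot-product attention, which materializes an $n \times n$ score matrix, with the AFT-simple variant that drops the learned position biases and pools keys and values once for all positions.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # (n, n) score matrix -> O(n^2) time and memory
    return softmax(scores, axis=-1) @ V

def aft_simple(Q, K, V):
    weights = softmax(K, axis=0)            # normalize each key dimension over positions
    context = (weights * V).sum(axis=0)     # single pooled (d,) summary -> O(n * d)
    return 1.0 / (1.0 + np.exp(-Q)) * context  # sigmoid-gated queries reuse the same summary

n, d = 6, 4
Q, K, V = np.random.default_rng(0).normal(size=(3, n, d))
print(standard_attention(Q, K, V).shape, aft_simple(Q, K, V).shape)  # (6, 4) (6, 4)
```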
Demystifying Transformers: Attention, Multi-Head Magic, and the Math Behind the Revolution
From single-head to multi-head attention: understanding the architectural breakthrough that changed AI forever. The Transformer architecture, introduced in the seminal “Attention Is All You Need” paper, revolutionized natural language processing by replacing recurrent networks with a purely attention-based approach. At its heart lies the self-attention mechanism: a powerful way for models to understand relationships between all words in a sequence simultaneously.
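
As a quick reference for the math behind that excerpt, here is a hedged numpy sketch of scaled dot-product attention, $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(QK^\top / \sqrt{d_k})V$, plus a toy multi-head split; the shapes are illustrative, and a real layer also has learned projections $W_Q, W_K, W_V, W_O$.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(d_k)  # every query scored against every key
    return softmax(scores) @ V                           # weighted sum of values

def multi_head_self_attention(X, num_heads):
    n, d_model = X.shape
    d_head = d_model // num_heads
    heads = X.reshape(n, num_heads, d_head).transpose(1, 0, 2)  # split model dim into heads
    out = scaled_dot_product_attention(heads, heads, heads)     # self-attention per head
    return out.transpose(1, 0, 2).reshape(n, d_model)           # concatenate heads back

X = np.random.default_rng(1).normal(size=(5, 8))
print(multi_head_self_attention(X, num_heads=2).shape)  # (5, 8)
```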
Demystifying RNNs: A Deep Dive into Dimensions and Parameters
Understanding what really happens inside Recurrent Neural Networks. When learning about Recurrent Neural Networks (RNNs), many tutorials focus on the high-level concept of “memory” but gloss over the practical details of how they actually work. As someone who struggled with these details, I want to share the insights that finally made RNNs click for me.
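
To pin down the “dimensions and parameters” part, here is a minimal sketch of a single vanilla (Elman) RNN step, $h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$, with illustrative sizes of my own choosing rather than the post’s example.

```python
import numpy as np

input_size, hidden_size = 10, 32

W_xh = np.zeros((hidden_size, input_size))   # projects the current input into hidden space
W_hh = np.zeros((hidden_size, hidden_size))  # carries the previous hidden state forward
b_h  = np.zeros(hidden_size)                 # hidden bias

def rnn_step(x_t, h_prev):
    """One time step: combine the new input with the carried 'memory'."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

n_params = W_xh.size + W_hh.size + b_h.size
print(n_params)  # 10*32 + 32*32 + 32 = 1376 parameters, independent of sequence length
```

The parameter count depends only on the input and hidden sizes, not on how long the sequence is, since the same three weight tensors are reused at every time step.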