Welcome to my blog!

Online Softmax: A Comprehensive Guide

softmax

First, we need to understand the multi-head attention mechanism. In the Transformer model, the attention mechanism is used to capture the relationship between all the…

Scaling Transformer Models

Transformer

Optimization

LLMs

Inference

Language models are probabilistic models that assign probabilities to sequences of words. In other words, language models assign the probability of generating a next…

Odds Ratio Preference Optimization (ORPO)

Reinforcement Learning

ORPO

RLHF

LLMs

PPO

In this blog post, we will discuss the reference-model-free, monolithic odds ratio preference optimization (ORPO) algorithm proposed in the paper ORPO: Monolithic Preference…

Proximal Policy Optimization (PPO)

Reinforcement Learning

PGO

RLHF

LLMs

PPO

In my previous blog post, we discussed Policy Gradient Optimization, where we derived the expression for the gradient of the objective function w.r.t. the policy…

Policy Gradient Optimization

Reinforcement Learning

PGO

RLHF

LLMs

This blog is a continuation of my previous blog, Introduction to Reinforcement Learning. In this blog, we will discuss the concept of the Policy Gradient algorithm in the…

Introduction to Reinforcement Learning

Reinforcement Learning

Deep Learning

RLHF

I am going to write a series of posts on Reinforcement Learning. This post is the first post in the series. In this post, I will introduce the basic concepts of…

Comprehensive Understanding of Mistral Model

Mixture of Experts

Mistral

LLMs

The attention mechanism is a key component in Transformer models. It allows the model to focus on different parts of the input sequence and derive the relationship between…

Mixture of Experts in Mistral

Mixture of Experts

Mistral

LLMs

Mixture of Experts (MoE) is a neural network that divides the list of Modules into specialized experts, each responsible for processing specific tokens or aspects of the…

Model Sharding

Model Sharding

Mistral

LLMs

Model Sharding is a technique used to distribute the model parameters, gradients, and optimizer states across multiple GPUs. In this technique, the model is divided into…

Understanding KV Cache

llama

KV cache

LLMs

In this article, we will discuss the Key-Value cache. We will start with the introduction of the Key-Value cache, then we will discuss the problem, solution, limitations…

Grouped Query Attention (GQA)

llama

grouped query attention

LLMs

In this article, we will discuss the Grouped Query Attention. We will start with the introduction of the Grouped Query Attention, then we will discuss the limitations of…

LLaMA: Open and Efficient LLM Notes

llama

In this article, I will share the notes and concepts I learned while reading the papers and discussing them with Umar. The ultimate goal that I have on my…

Git & GitHub

git

github

version control

Mainline Development (“Always Be Integrating”).

Self-Attention & Transformer

machine learning

word vectors

word embeddings

transformers

deep learning

The necessities for a self-attention model are as follows:

Word Vectors

machine learning

word vectors

word embeddings

transformers

deep learning

Word vectors are also called word embeddings or neural word representations because the words are represented in a high-dimensional vector space and they…

Data Engineering Fundamentals

machine learning

data engineering

User input data can be text, images, videos, uploaded files, etc. It requires more heavy-duty checking and processing. User input data tends to require fast processing as…

Data Fundamentals

machine learning

data preparation

Outliers are examples that look dissimilar to the majority of examples from the dataset. Dissimilarity is measured by some distance metric, such as Euclidean distance. Deleti…

Machine Learning

deep learning

machine learning

Machine learning can be defined as the process of solving a practical problem by collecting a dataset, and algorithmically training a statistical model based on that dataset.

Pattern Recognition & ML

machine learning

pattern recognition

The field of pattern recognition is concerned with the automatic discovery of regularities in data through the use of computer algorithms and with the use of these…

Convolutional Neural Networks Architectures

convolutional neural networks

The five CNN architectures that have been pre-trained on the ImageNet dataset and are available in the Keras library are listed below:

Fundamentals of CNNs

machine learning

convolutional neural network

Neural networks are the building blocks of deep learning systems. A system is called a neural network if it contains a labeled, directed graph structure where each node in…

Fundamentals of Image Classification

computer vision

image classification

1. Image Classification is the task of using computer vision and machine learning algorithms to extract meaning from an image. It is the task of assigning a label to an…

Journey of 66DaysOfData in Natural Language Processing

natural language processing

Day1 of 66DaysOfData! - Natural Language Processing: Natural Language Processing is a field of Linguistics, Computer Science, and Artificial Intelligence concerned with the…