Welcome to my blog!
  • Thinam Tamang
Categories
All (23)
Deep Learning (1)
Inference (1)
KV cache (1)
LLMs (9)
Mistral (3)
Mixture of Experts (2)
Model Sharding (1)
ORPO (1)
Optimization (1)
PGO (2)
PPO (2)
RLHF (4)
Reinforcement Learning (4)
Transformer (1)
computer vision (1)
convolutional neural network (1)
convolutional neural networks (1)
data engineering (1)
data preparation (1)
deep learning (3)
git (1)
github (1)
grouped query attention (1)
image classification (1)
llama (3)
machine learning (7)
natural language processing (1)
pattern recognition (1)
softmax (1)
transformers (2)
version control (1)
word embeddings (2)
word vectors (2)

Online Softmax: A Comprehensive Guide

softmax
First, we need to understand the multi-head attention mechanism. In the Transformer model, the attention mechanism is used to capture the relationship between all the…
Jan 29, 2025
Thinam Tamang

Scaling Transformer Models

Transformer
Optimization
LLMs
Inference
Language models are probabilistic models that assign a probability to a sequence of words. In other words, language models assign the probability of generating a next…
May 24, 2024
Thinam Tamang

Odds Ratio Preference Optimization (ORPO)

Reinforcement Learning
ORPO
RLHF
LLMs
PPO
In this blog post, we will discuss the reference-model-free, monolithic odds ratio preference optimization algorithm (ORPO) proposed in the paper ORPO: Monolithic Preference…
May 3, 2024
Thinam Tamang

Proximal Policy Optimization (PPO)

Reinforcement Learning
PGO
RLHF
LLMs
PPO
In my previous blog post, we discussed Policy Gradient Optimization, where we derived the expression for the gradient of the objective function with respect to the policy…
Apr 20, 2024
Thinam Tamang

Policy Gradient Optimization

Reinforcement Learning
PGO
RLHF
LLMs
This blog post is a continuation of my previous post, Introduction to Reinforcement Learning. In this post, we will discuss the concept of the Policy Gradient algorithm in the…
Apr 14, 2024
Thinam Tamang

Introduction to Reinforcement Learning

Reinforcement Learning
Deep Learning
RLHF
I am going to write a series of posts on Reinforcement Learning. This post is the first post in the series. In this post, I will introduce the basic concepts of…
Mar 31, 2024
Thinam Tamang

Comprehensive Understanding of Mistral Model

Mixture of Experts
Mistral
LLMs
Attention mechanism is a key component in Transformer models. It allows the model to focus on different parts of the input sequence and derive the relationship between…
Mar 9, 2024
Thinam Tamang

Mixture of Experts in Mistral

Mixture of Experts
Mistral
LLMs
Mixture of Experts (MoE) is a neural network architecture that divides its modules into specialized experts, each responsible for processing specific tokens or aspects of the…
Mar 2, 2024
Thinam Tamang

Model Sharding

Model Sharding
Mistral
LLMs
Model Sharding is a technique used to distribute the model parameters, gradients, and optimizer states across multiple GPUs. In this technique, the model is divided into…
Feb 23, 2024
Thinam Tamang

Understanding KV Cache

llama
KV cache
LLMs
In this article, we will discuss the Key-Value cache. We will start with the introduction of the Key-Value cache, then we will discuss the problem, solution, limitations…
Feb 10, 2024
Thinam Tamang

Grouped Query Attention (GQA)

llama
grouped query attention
LLMs
In this article, we will discuss the Grouped Query Attention. We will start with the introduction of the Grouped Query Attention, then we will discuss the limitations of…
Feb 9, 2024
Thinam Tamang

LLaMA: Open and Efficient LLM Notes

llama
In this article, I will share the notes and concepts that I have learned while reading the papers and discussing them with Umar. The ultimate goal that I have on my…
Jan 20, 2024
Thinam Tamang

Git & GitHub

git
github
version control
Mainline Development (“Always Be Integrating”).
May 14, 2023
Thinam Tamang

Self-Attention & Transformer

machine learning
word vectors
word embeddings
transformers
deep learning
The necessities for a self-attention model are as follows:
Oct 23, 2022
Thinam Tamang

Word Vectors

machine learning
word vectors
word embeddings
transformers
deep learning
Word vectors are also called word embeddings or neural word representations because words are represented in a high-dimensional vector space and they…
Oct 15, 2022
Thinam Tamang

Data Engineering Fundamentals

machine learning
data engineering
User input data can be text, images, videos, uploaded files, etc. It requires more heavy-duty checking and processing. User input data tends to require fast processing as…
Sep 4, 2022
Thinam Tamang

Data Fundamentals

machine learning
data preparation
Outliers are examples that look dissimilar to the majority of examples from the dataset. Dissimilarity is measured by some distance metric, such as Euclidean distance. Deleti…
Apr 3, 2022
Thinam Tamang

Machine Learning

deep learning
machine learning
Machine learning can be defined as the process of solving a practical problem by collecting a dataset, and algorithmically training a statistical model based on that dataset.
Mar 27, 2022
Thinam Tamang

Pattern Recognition & ML

machine learning
pattern recognition
The field of pattern recognition is concerned with the automatic discovery of regularities in data through the use of computer algorithms and with the use of these…
Feb 10, 2022
Thinam Tamang

Convolutional Neural Networks Architectures

convolutional neural networks
The five CNN architectures that have been pre-trained on the ImageNet dataset and are available in the Keras library are listed below:
Jan 22, 2022
Thinam Tamang

Fundamentals of CNNs

machine learning
convolutional neural network
Neural networks are the building blocks of deep learning systems. A system is called a neural network if it contains a labeled, directed graph structure where each node in…
Dec 31, 2021
Thinam Tamang

Fundamentals of Image Classification

computer vision
image classification
1. Image Classification is the task of using computer vision and machine learning algorithms to extract meaning from an image. It is the task of assigning a label to an…
Dec 6, 2021
Thinam Tamang

Journey of 66DaysOfData in Natural Language Processing

natural language processing
Day1 of 66DaysOfData! - Natural Language Processing: Natural Language Processing is a field of Linguistics, Computer Science, and Artificial Intelligence concerned with the…
Oct 15, 2021
Thinam Tamang