Santosh Sawant

Articles

Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization
2 min
llm
agents
Tencent’s modular framework for building, running, and evaluating autonomous agents, with automated tool generation and hybrid policy optimization.
Jan 5, 2026

Memory As Action (MemAct): Autonomous Context Curation For Long-Horizon Agentic Tasks
2 min
llm
agents
A framework that treats context curation as learnable memory-editing actions, letting an agent manage its own working memory for long-horizon tasks.
Oct 15, 2025

Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
2 min
llm
efficiency
TII’s family of parallel hybrid-head models that combine transformer attention with Mamba SSMs for faster, leaner, state-of-the-art performance.
Jul 31, 2025

BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs
2 min
llm
efficiency
Native 4-bit activation quantization for 1-bit LLMs via an online Hadamard transformation that tames activation outliers.
Apr 28, 2025

Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
2 min
llm
research paper
Reinforcement Learning from Human Feedback (RLHF) is crucial for aligning LLMs with human preferences. While recent research has focused on algorithmic improvements, the…
Mar 31, 2025

DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
2 min
llm
research paper
The rapid advancement of large language models (LLMs) has increased the need for guardrail models to ensure responsible use, particularly in detecting unsafe and illegal…
Feb 10, 2025

DeepSeek-R1: Incentivizing Reasoning Capability in Large Language Models via Reinforcement Learning
3 min
llm
research paper
A typical training process for LLMs consists of three phases: (1) Pre-training: In this stage, LLMs are pre-trained on vast amounts of text and code to learn general-purpose…
Jan 28, 2025

Mind Evolution: Evolving Deeper LLM Thinking
2 min
llm
research paper
Google recently released Mind Evolution, an evolutionary search strategy for scaling inference-time compute in large language models. It uses a language model to…
Jan 21, 2025

MiniMax-01: Scaling Foundation Models with Lightning Attention
2 min
llm
research paper
Long-context LLMs have recently been pivotal to advancing generative AI in various fields. Researchers have now introduced the MiniMax-01 series long-context…
Jan 16, 2025

Training Large Language Models to Reason in a Continuous Latent Space
2 min
llm
research paper
Large language models (LLMs) are restricted to reason in the “language space”, where they typically express the reasoning process with a chain-of-thought (CoT) to solve a…
Dec 10, 2024

LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
2 min
llm
research paper
Recent advancements in text-to-video (T2V) generative models have shown impressive capabilities. However, these models are still inadequate in aligning synthesized videos…
Dec 9, 2024

PaliGemma 2: A Family of Versatile VLMs for Transfer
2 min
llm
research paper
Google has released the PaliGemma 2 family of models, an upgrade of the PaliGemma open Vision-Language Model (VLM) based on the Gemma 2 family of language models. PaliGemma…
Dec 5, 2024

OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
2 min
llm
research paper
Retrieval-augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external knowledge to reduce hallucinations and incorporate up-to-date information…
Dec 4, 2024

Efficient Track Anything
2 min
llm
research paper
Segment Anything Model 2 (SAM 2) has emerged as a powerful tool for video object segmentation and tracking anything. Key components of SAM 2 that drive the impressive video…
Dec 3, 2024

LongKey: Keyphrase Extraction for Long Documents
2 min
llm
research paper
In an era of information overload, manually annotating the vast and growing corpus of documents and scholarly papers is increasingly impractical. Automated keyphrase…
Nov 29, 2024

VisualLens: Personalization through Visual History
2 min
llm
research paper
Recent Large Language Models (LLMs) can support contexts up to millions of tokens in length. However, processing such long sequences with LLMs requires substantial…
Nov 27, 2024

VisualLens: Personalization through Visual History
2 min
llm
research paper
Imagine a personal assistant observing what you do in your daily life. When you ask for recommendations on anything from restaurants and activities to movies, books, and…
Nov 26, 2024

A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection
2 min
llm
research paper
Large Language Models (LLMs) are prone to off-topic misuse, where users may prompt these models to perform tasks beyond their intended scope. Current guardrails, which often…
Nov 25, 2024

IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
2 min
llm
research paper
Recently, the ability to follow complex instructions with multiple constraints is gaining increasing attention as LLMs are deployed in sophisticated real-world applications.…
Nov 15, 2024

SEALONG: Large Language Models Can Self-Improve in Long-context Reasoning
2 min
llm
research paper
Large language models (LLMs) have achieved substantial progress in processing long contexts but still struggle with long-context reasoning. Existing approaches typically…
Nov 14, 2024

JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
2 min
llm
research paper
Recently there has been a growing trend of developing sophisticated LLMs specialized in both image comprehension and text-to-image generation. This is achieved…
Nov 13, 2024

NEKO: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts
2 min
llm
research paper
The challenge in building a general-purpose post-recognition error corrector, which is required to evaluate your fine-tuned models on a custom dataset, is how to train a…
Nov 12, 2024

BitNet a4.8: 4-bit Activations for 1-bit LLMs
2 min
llm
research paper
Recent research on the 1-bit Large Language Models (LLMs), such as BitNet b1.58, presents a promising direction for reducing the inference cost of LLMs while maintaining…
Nov 11, 2024

StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-Time Hybrid Information Structurization
2 min
llm
research paper
Retrieval-augmented generation (RAG) is a key means to effectively enhance large language models (LLMs) in many knowledge-based tasks. However, existing RAG methods struggle…
Oct 14, 2024

Backtracking Improves Generation Safety
2 min
llm
research paper
LLMs have a fundamental limitation almost by definition: there is no taking back tokens that have been generated, even when they are clearly problematic. In the context of…
Oct 3, 2024

RULER: A Model-Agnostic Method to Control Generated Length for Large Language Models
2 min
llm
research paper
The instruction-following ability of large language models enables humans to interact with AI agents in a natural way. However, when required to generate responses of a…
Oct 1, 2024

Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking
2 min
llm
research paper
Recently, LLM-judge benchmarks such as MT-Bench, AlpacaEval, and Arena-Hard-Auto have become a go-to tool to simultaneously automate evaluation of LLMs while also aligning…
Sep 27, 2024

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
2 min
llm
research paper
Large Language Models (LLMs) have demonstrated remarkable effectiveness across a diverse range of tasks. However, LLMs are usually distinguished by their massive parameter…
Sep 27, 2024

HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
2 min
llm
research paper
Large Language Models (LLMs) have revolutionized software engineering (SE), demonstrating remarkable capabilities in various coding tasks. While recent efforts have produced…
Sep 26, 2024

Making Text Embedders Few-Shot Learners
2 min
llm
research paper
LLM-based embedding models have demonstrated remarkable improvements in in-domain accuracy and generalization, particularly when trained using supervised learning approaches…
Sep 25, 2024

Introducing Contextual Retrieval
2 min
llm
research paper
In traditional RAG, documents are typically split into smaller chunks for efficient retrieval. While this approach works well for many applications, it can lead to problems…
Sep 23, 2024

Training Language Models to Self-Correct via Reinforcement Learning
2 min
llm
research paper
Self-correction is a highly desirable capability of large language models (LLMs), yet it has consistently been found to be largely ineffective in modern LLMs. Existing…
Sep 20, 2024

Training Language Models to Self-Correct via Reinforcement Learning
2 min
llm
research paper
Recently, jina.ai released jina-embeddings-v3, a novel text embedding model with 570 million parameters that achieves state-of-the-art performance on multilingual data and…
Sep 19, 2024

Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
2 min
llm
research paper
Modern information retrieval (IR) models generally match queries to passages based on a single semantic similarity score. This can make the search experience confusing for…
Sep 18, 2024

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
2 min
llm
research paper
Transformer-based large language models (LLMs) have become increasingly important in various domains. However, the quadratic time complexity of the attention operation poses a…
Sep 17, 2024

Self-Harmonized Chain of Thought
2 min
llm
research paper
Chain-of-thought (CoT) prompting reveals that large language models are capable of performing complex reasoning via intermediate steps. CoT methods in large language models…
Sep 16, 2024

OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs
2 min
llm
research paper
Despite the recent advancements in Large Language Models (LLMs), which have significantly enhanced the generative capabilities for various NLP tasks, LLMs still face…
Sep 13, 2024

Agent Workflow Memory (AWM)
2 min
llm
research paper
Recently, LLM-based agents have shown promise for real-world tasks like web navigation, but they still struggle with complex, long-term tasks. Unlike these models, humans…
Sep 12, 2024

MemoRAG: Moving Towards Next-Gen RAG via Memory-Inspired Knowledge Discovery
2 min
llm
research paper
Retrieval-Augmented Generation (RAG) leverages retrieval tools to access external databases, thereby enhancing the generation quality of large language models (LLMs) through…
Sep 11, 2024

GraphRAG auto-tuning provides rapid adaptation to new domains
2 min
llm
research paper
GraphRAG uses large language models (LLMs), guided by a set of domain-specific prompts, to create a comprehensive knowledge graph that details entities and their…
Sep 10, 2024

mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
2 min
llm
research paper
Multimodal Large Language Models (MLLMs) have achieved promising OCR-free document understanding performance by increasing the supported resolution of document images.…
Sep 9, 2024

Generative Verifiers: Reward Modeling as Next-Token Prediction
2 min
llm
research paper
While large language models (LLMs) demonstrate remarkable capabilities, they often confidently make logical and factual mistakes, which can invalidate the entire solution. A…
Aug 30, 2024

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
2 min
llm
research paper
The ability to accurately interpret complex visual information is a crucial topic of multimodal large language models (MLLMs). Recent work indicates that enhanced visual…
Aug 28, 2024

Efficient Detection of Toxic Prompts in Large Language Models
2 min
llm
research paper
Large language models (LLMs) like ChatGPT and Gemini have significantly advanced natural language processing. However, these models can be exploited by malicious individuals…
Aug 27, 2024

LLM Pruning and Distillation in Practice: The Minitron Approach
2 min
llm
research paper
Over the past few years, significant advancements have blossomed in the two key pillars of multimodal intelligence: understanding and generation. Recent works have tried to…
Aug 26, 2024

Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search
2 min
llm
research paper
Recent studies have demonstrated how Large Language Models (LLMs) can be utilized to learn skills for improved decision-making in interactive environments. However, learning…
Aug 23, 2024

LLM Pruning and Distillation in Practice: The Minitron Approach
2 min
llm
research paper
Training multiple multi-billion parameter large language models from scratch is extremely time-, data- and resource-intensive. However, recent work has demonstrated the…
Aug 22, 2024

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
2 min
llm
research paper
Multi-modal generative models need to be able to perceive, process, and produce both discrete elements (such as text or code) and continuous elements (e.g. image, audio, and…
Aug 21, 2024

BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts
2 min
llm
research paper
The Mixture of Experts (MoE) framework has become a popular architecture for large language models due to its superior performance over dense models. However, training MoEs…
Aug 20, 2024

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
2 min
llm
research paper
Large Multimodal Models (LMMs) have attracted significant attention with their potential applications and emergent capabilities. However, recent works have demonstrated that…
Aug 19, 2024

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
2 min
llm
research paper
Recent advancements in large language models have significantly influenced mathematical reasoning and theorem proving in artificial intelligence. Despite notable progress in…
Aug 16, 2024

rStar: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
2 min
llm
research paper
Despite their success, large language models face significant challenges in complex reasoning tasks. Although fine-tuning is shown to be an effective way to improve…
Aug 13, 2024

PAD: Prioritize Alignment in Dataset Distillation
1 min
llm
research paper
Dataset Distillation aims to compress a large dataset into a significantly more compact, synthetic one without compromising the performance of the trained models. To achieve…
Aug 12, 2024

CODEXGRAPH: Bridging Large Language Models and Code Repositories via Code Graph Databases
2 min
llm
research paper
Large Language Models (LLMs) excel in stand-alone code tasks like HumanEval and MBPP, but struggle with handling entire code repositories. Current solutions rely on…
Aug 9, 2024

Synthesizing Text-to-SQL Data from Weak and Strong LLMs
1 min
llm
research paper
Text-to-SQL has been one of the most sought-after use cases in AI application development, especially with closed-source LLMs such as GPT-4. However, the adoption of closed-source LLMs…
Aug 8, 2024

RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation
2 min
llm
research paper
Implementing Retrieval-Augmented Generation (RAG) systems is inherently complex, requiring deep understanding of data, use cases, and intricate design decisions.…
Aug 6, 2024

ReLiK: Retrieve and LinK, Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget
2 min
llm
research paper
Extracting structured information from unstructured text lies at the core of many Gen AI problems such as Information Retrieval, Knowledge Graph Construction, Knowledge…
Aug 5, 2024

LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
2 min
llm
research paper
Standard prompt-based LLM inference has two sequential stages: prefilling and decoding. During the prefilling stage, the model computes and saves the KV cache of each token…
Aug 2, 2024

DiT-MoE: Scaling Diffusion Transformers to 16 Billion Parameters
2 min
llm
research paper
Recently, diffusion models (DiT) have emerged as powerful deep generative models in various domains, such as image, video and 3D objects. However, training and serving such…
Aug 1, 2024

DDK: Distilling Domain Knowledge for Efficient Large Language Models
2 min
llm
research paper
Despite the advances of large language models (LLMs) in various applications, they still face significant challenges to wider adoption due to high computational and storage…
Jul 31, 2024

Chain of Diagnosis (CoD): Towards an Interpretable Medical Agent
2 min
llm
research paper
The field of medical diagnosis has undergone a significant transformation with the advent of large language models (LLMs), yet the challenges of interpretability within…
Jul 26, 2024

LAMBDA: A Large Model Based Data Agent
2 min
llm
research paper
Large Language Models (LLMs) have been instrumental in pushing innovation across multiple domains. However, despite these advancements, the current LLM paradigm encounters…
Jul 26, 2024

VILA2: VILA Augmented VILA
2 min
llm
research paper
Visual language models (VLMs) have rapidly progressed, driven by the success of large language models (LLMs). However, data curation for VLMs remains under-explored.…
Jul 25, 2024

The Llama 3 Herd of Models
2 min
llm
research paper
The Llama 3.1 release marked a big milestone for LLM researchers and the open source AI community. Meta engineers trained Llama 3.1 on NVIDIA H100 Tensor Core GPUs. They…
Jul 24, 2024

BOND: Aligning LLMs with Best-of-N Distillation
2 min
llm
research paper
State-of-the-art large language models (LLMs) such as Gemini and GPT-4 are generally trained in three stages. First, LLMs are pre-trained on large corpora of knowledge using…
Jul 23, 2024

Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
2 min
llm
research paper
Recently, considerable research has gone into reducing the high computational cost and memory footprint of LLMs, especially during the inference stage. Sparsity is…
Jul 22, 2024

Beyond KV Caching: Shared Attention for Efficient LLMs
2 min
llm
research paper
The efficiency of large language models (LLMs) remains a critical challenge, particularly in contexts where computational resources are limited. Traditional attention…
Jul 19, 2024

E5-V: Universal Embeddings with Multimodal Large Language Models
2 min
llm
research paper
With the development of Multimodal Large Language Models (MLLMs), there is an increasing need for embedding models to represent multimodal inputs. Although CLIP shows…
Jul 17, 2024

NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?
2 min
llm
research paper
The capability of LLMs to process long texts is particularly crucial across various domains. Considering the critical role of LLMs in handling long texts, numerous…
Jul 17, 2024

SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
1 min
llm
research paper
Parameter-efficient transfer learning (PETL) is widely used for domain adaptation of large pre-trained models to specific downstream tasks, greatly reducing trainable…
Jul 16, 2024

AgentInstruct: Toward Generative Teaching with Agentic Flows
2 min
llm
research paper
Synthetic data is becoming increasingly important for accelerating the development of language models, both large and small. Despite several successful use cases…
Jul 15, 2024

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
2 min
llm
research paper
FlashAttention (and FlashAttention-2) pioneered an approach to speed up attention on GPUs by minimizing memory reads/writes, and is used across various libraries to accelerate…
Jul 12, 2024

Metron: Holistic Performance Evaluation Framework for LLM Inference Systems
2 min
llm
research paper
Serving large language models (LLMs) in production can incur substantial costs, which has prompted recent advances in inference system optimizations. Today, these systems…
Jul 11, 2024

Composable Interventions for Language Models
2 min
llm
research paper
Language models (LMs) exhibit striking capabilities on various important tasks, but despite such high performance, LM-generated content is usually prone to be…
Jul 10, 2024

Associative Recurrent Memory Transformer
1 min
llm
research paper
Long-sequence LLMs are among the more challenging models to work with, as memory plays a crucial role in processing extremely long contexts and utilizing remote past information.…
Jul 9, 2024

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
2 min
llm
research paper
Key-value (KV) caching plays an essential role in accelerating decoding for transformer-based autoregressive large language models (LLMs). However, the amount of memory…
Jul 8, 2024

Adam-mini: Use Fewer Learning Rates To Gain More
2 min
llm
research paper
Adam(W) has become the de-facto optimizer for training large language models (LLMs). Despite its superior performance, Adam is expensive to use. Specifically, Adam requires…
Jul 5, 2024

Searching for Best Practices in Retrieval-Augmented Generation
4 min
llm
research paper
Retrieval-augmented generation (RAG) techniques have proven to be effective in enhancing LLM response quality, particularly in specialized domains. While many RAG…
Jul 4, 2024

MInference: a Million-token inference on a single A100 machine
2 min
llm
research paper
The computational challenges of LLM inference remain a significant barrier to their widespread deployment, especially as context lengths continue to increase. Existing…
Jul 3, 2024

MIRAI: Evaluating LLM Agents for Event Forecasting
2 min
llm
research paper
Recent advancements in Large Language Models (LLMs) have enabled LLM agents to autonomously collect world information, over which to conduct reasoning to solve complex…
Jul 2, 2024

AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation
2 min
llm
research paper
Retrieval-Augmented Generation (RAG) has emerged as a prominent framework for building ML/AI solutions with LLMs. Additional modules such as query rewriting, prompt…
Jul 1, 2024

Meta Large Language Model Compiler: Foundation Models of Compiler Optimization
2 min
llm
research paper
Large Language Models (LLMs) have demonstrated remarkable capabilities across a variety of software engineering and coding tasks. However, their application in the domain of…
Jun 28, 2024

Instruction Pre-Training: Language Models are Supervised Multi Task Learners
2 min
llm
research paper
Unsupervised multitask pre-training has been the critical method behind the recent success of language models (LMs). However, supervised multitask learning still holds…
Jun 26, 2024

LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
2 min
llm
research paper
In the traditional RAG framework, the basic retrieval units are normally short but the retriever needs to scan over a massive amount of units to find the relevant piece.…
Jun 24, 2024

Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities
2 min
llm
research paper
Large language models have shown promising results in arithmetic and symbolic reasoning by expressing intermediate reasoning in text as a chain of thought, yet struggle to…
Jun 20, 2024

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
2 min
llm
research paper
Almost all of today’s LLMs are designed as monolithic architectures. These models rely extensively on large-scale data to embed generalized language capabilities…
Jun 20, 2024

THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation
2 min
llm
research paper
Software agents have emerged as promising tools for addressing complex software engineering tasks. However, existing works oversimplify software development workflows by…
Jun 19, 2024

THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation
2 min
llm
research paper
Nowadays, large language models (LLMs) with large context windows are capable of processing lengthy dialogue histories during prolonged interaction with users without…
Jun 18, 2024

Ad Auctions for LLMs via Retrieval Augmented Generation
2 min
llm
research paper
Large language models (LLMs) have been making headway in various domains and now also in the field of computational advertising. Now with the integration of ads into the…
Jun 17, 2024

Improving Alignment and Robustness with Circuit Breakers
2 min
llm
research paper
Large language models (LLMs) have been instrumental in pushing the boundaries of various real-world applications, most of which are associated with long-sequence inputs, such…
Jun 14, 2024

Improving Alignment and Robustness with Circuit Breakers
2 min
llm
research paper
The landscape of artificial intelligence (AI) has long been marred by the persistent threat of adversarial attacks, particularly those targeting neural networks. The rise of…
Jun 13, 2024

TEXTGRAD: Automatic “Differentiation” via Text
2 min
llm
research paper
There is an emerging paradigm shift in how AI systems are built these days. The new generation of AI applications are increasingly compound systems involving multiple…
Jun 12, 2024

HUSKY: A Unified, Open-Source Language Agent for Multi-Step Reasoning
2 min
llm
research paper
Recent advances in the capabilities of large language models (LLMs) have led to the development of language agents to address complex, multi-step tasks. However, most…
Jun 11, 2024

Mixture-of-Agents Enhances Large Language Model Capabilities
2 min
llm
research paper
Recent advances in large language models (LLMs) demonstrate substantial capabilities in natural language understanding and generation tasks. However, despite the plethora of…
Jun 10, 2024

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
2 min
llm
research paper
Recently, various prompting methods such as CoT, ToT and GoT have been instrumental in improving reasoning performance of LLMs. All these methods can be broadly divided into…
Jun 7, 2024

Block Transformer: Global-to-Local Language Modeling for Fast Inference
2 min
llm
research paper
Generating tokens with transformer-based autoregressive language models (LMs) is costly due to the self-attention mechanism that attends to all previous tokens. To apply…
Jun 6, 2024

Show, Don’t Tell: Aligning Language Models with Demonstrated Feedback
2 min
llm
research paper
Aligning Large Language Models (LLMs) with human values and preferences is essential for making them helpful and safe. However, alignment can be challenging, especially for…
Jun 5, 2024

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
2 min
llm
research paper
In the age of large-scale language models, benchmarks like the Massive Multitask Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI can…
Jun 4, 2024

Contextual Position Encoding: Learning to Count What’s Important
2 min
llm
research paper
The attention mechanism is a critical component of Large Language Models (LLMs) that allows tokens in a sequence to interact with each other, but the attention mechanism…
Jun 3, 2024

Similarity is Not All You Need: Endowing Retrieval-Augmented Generation with Multi-layered Thoughts
2 min
llm
research paper
Retrieval-augmented generation (RAG) has been pivotal in pushing LLM use cases in knowledge management systems. Nevertheless, existing retrieval-augmented generation…
May 31, 2024

Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
2 min
llm
research paper
Large language models (LLMs) often hallucinate and lack the ability to provide attribution for their generations. Semi-parametric LMs, such as kNN-LM, approach these…
May 30, 2024

VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections
2 min
llm
research paper
LLM training and fine-tuning are still far too compute- and memory-intensive. Several techniques have been proposed to reduce these memory requirements, such as…
May 29, 2024

Zamba: A Compact 7B SSM Hybrid Model
2 min
llm
research paper
Recently, state-of-the-art Transformer-SSM hybrid architectures have been a driving force in open-source LLMs. In line with this trend, researchers from Zyphra have launched…
May 28, 2024

Layer-Condensed KV Cache for Efficient Inference of Large Language Models
2 min
llm
research paper
The key-value (KV) cache is one of the most significant parts of any transformer-based LLM and takes over 30% of GPU memory during deployment. Hence the KV cache plays a…
May 20, 2024

Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model
2 min
llm
research paper
Recently, the performance of small-scale visual language models has come on par with that of their larger-scale counterparts. Models such as LLaVAPhi [47], which combines the open-source…
May 16, 2024

SUTRA: Scalable Multilingual Language Model Architecture
2 min
llm
research paper
Recent advancements in Large Language Models (LLMs) have predominantly focused on a limited set of data-rich languages, with training datasets being notably skewed towards…
May 15, 2024

Linearizing Large Language Models
2 min
llm
research paper
Over the last few years, Transformers have displaced Recurrent Neural Networks (RNNs) in sequence modeling tasks, owing to their highly parallel training efficiency and…
May 14, 2024

From Local to Global: A Graph RAG Approach to Query-Focused Summarization
2 min
llm
research paper
The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions…
May 13, 2024

Is Flash Attention Stable?
2 min
llm
research paper
Given the size and complexity of workloads, training Large Language Models (LLMs) often takes months, across hundreds or thousands of GPUs. For example, LLaMA2’s…
May 10, 2024

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
2 min
llm
research paper
Optimizing the operational cost and compute requirements of LLMs is one of the most sought-after topics for researchers. Accelerated solutions deploy on mobile, edge devices or commodity…
May 9, 2024

Better & Faster Large Language Models via Multi-token Prediction
2 min
llm
research paper
Large language models such as GPT and Llama are trained with a next-token prediction loss. However, despite the recent wave of impressive achievements in LLMs…
May 8, 2024

NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
1 min
llm
research paper
Aligning Large Language Models (LLMs) with human values and preferences is essential for making them helpful and safe. However, building efficient tools to perform alignment…
May 7, 2024

PROMETHEUS 2: An Open Source Language Model Specialized in Evaluating Other Language Models
2 min
llm
research paper
Model-based evaluation with proprietary LMs such as GPT-4 has emerged as a scalable solution for assessing LM-generated text. However, concerns related to transparency…
May 3, 2024

Octopus v4: Graph of language models
2 min
llm
research paper
LLMs have been effective in a wide range of applications, yet the most sophisticated models are often proprietary (GPT-4, Gemini) and considerably costlier than open source…
May 2, 2024

Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
2 min
llm
research paper
Evaluating language models is a challenging task: not only is it difficult to find meaningful data to test the models, but evaluating the correctness of a generated response…
Apr 30, 2024

Make Your LLM Fully Utilize the Context
2 min
llm
research paper
These days the training context windows of many contemporary LLMs have been expanded to tens of thousands of tokens, thereby enabling these models to process extensive…
Apr 29, 2024

CodecLM: Aligning Language Models with Tailored Synthetic Data
2 min
llm
research paper
Recent progress in instruction-tuned LLMs highlights the critical role of high-quality data in enhancing LLMs’ instruction-following capabilities. However, acquiring such…
Apr 25, 2024

LLM-R2 : A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency
2 min
llm
research paper
Recently, DB query rewriting using LLMs has been one of the sought-after use cases. The aim of query rewrite is to output a new query equivalent to the original SQL query, while…
Apr 24, 2024

TransformerFAM: Feedback attention is working memory
2 min
llm
research paper
While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs. One of the widely used…
Apr 19, 2024

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
1 min
llm
research paper
Recently Google has released RecurrentGemma, an open language model which uses Google’s novel Griffin architecture. Griffin combines linear RNN with local attention to…
Apr 18, 2024

MEGALODON: Efficient LLM Pretraining and Inference with Unlimited Context Length
2 min
llm
research paper
The Transformer architecture is the backbone of any production LLM, but despite its remarkable capabilities, it faces challenges with quadratic computational complexity and…
Apr 17, 2024

Trust Region Direct Preference Optimization (TR-DPO) : Learn Your Reference Model for Real Good Alignment
2 min
llm
research paper
Aligning large language models with human preferences (RLHF) has become increasingly important to ensure safety and overall usefulness of the model. Traditionally, the…
Apr 16, 2024

RHO-1: Not All Tokens Are What You Need
2 min
llm
research paper
High-quality training datasets are crucial for boosting LLM performance. Various data filtering techniques such as heuristics and classifiers are being utilized to select such…
Apr 15, 2024

Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
2 min
llm
research paper
Fine-tuning LLMs using Reinforcement Learning from Human Feedback (RLHF) has always been a preferred way of making LLMs more useful by aligning them with human values or…
Apr 9, 2024

Stream of Search (SoS): Learning to Search in Language
2 min
llm
research paper
Transformer-based auto-regressive models such as GPT have shown remarkable performance in generative tasks but struggle when it comes to complex decision-making and…
Apr 8, 2024

ReFT: Representation Finetuning for Language Models
2 min
llm
research paper
Parameter-efficient finetuning (PEFT) methods have been instrumental in the rapid adoption of fine-tuned, domain-specific LLMs. PEFTs not only reduced memory usage and time…
Apr 5, 2024

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
2 min
llm
research paper
The transformer FLOPs equation, or FLOPs-per-token, is one of the key attributes in determining the computation budget for any transformer-based LLM. Usually in language models…
Apr 4, 2024

sDPO: Don’t Use Your Data All at Once
2 min
llm
research paper
As the development of large language models (LLMs) progresses, aligning them with human preferences has become increasingly important to ensure safety and usefulness of the…
Apr 3, 2024

Gecko: Versatile Text Embeddings Distilled from Large Language Models
2 min
llm
research paper
Recent advances in text embedding models have been instrumental for various downstream tasks including document retrieval, sentence similarity, classification, and…
Apr 2, 2024

Jamba: A Hybrid Transformer-Mamba Language Model
2 min
llm
research paper
Finally, the first production-grade commercially available Mamba-based model delivering best-in-class quality and performance is here. Introducing Jamba, a novel…
Apr 1, 2024

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
2 min
llm
research paper
LLM-empowered multi-modality inputs are becoming an essential part of Vision Language Models (VLMs) such as LLaVA and Otter. However, despite these advancements, a…
Mar 28, 2024

RigorLLM: Resilient Guardrails for large language models against undesired content
2 min
llm
research paper
Large language models (LLMs) have demonstrated impressive capabilities in NLG and different downstream tasks. However, the potential of LLMs to produce biased or harmful…
Mar 27, 2024

DRAGIN: Dynamic Retrieval Augmented Generation based on the Real-time Information Needs of Large Language Models
2 min
llm
research paper
Traditional methods of RAG typically rely on single-round retrieval, using the LLM’s initial input to retrieve relevant information from external corpora. While this method…
Mar 26, 2024

SiMBA: Simplified Mamba-based Architecture for Vision and Multivariate Time series
2 min
llm
research paper
Recently, Structured State Space models (SSMs) such as Mamba have been pitched as an alternative to Transformer-based models, especially when it comes to increasing efficiency and…
Mar 25, 2024

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
2 min
llm
research paper
Vision language models (VLMs) like GPT-4, LLaMA-Adapter, and LLaVA have been instrumental in augmenting LLMs with visual understanding capabilities. VLMs serve as…
Mar 22, 2024

Evolutionary Optimization of Model Merging Recipes
2 min
llm
research paper
Model merging offers a novel approach to leverage the strengths of multiple pre-trained models. It allows us to combine task-specific models, each potentially fine-tuned for…
Mar 21, 2024

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
2 min
llm
research paper
Structure information is critical for understanding the semantics of text-rich images, such as documents, tables, and charts. Most existing Multimodal Large Language…
Mar 20, 2024

PERL: Parameter Efficient Reinforcement Learning from Human Feedback
2 min
llm
research paper
Reinforcement Learning from Human Feedback (RLHF) is one of the most popular methods to align Pretrained Large Language Models (LLMs) with human preferences. It involves…
Mar 19, 2024

RAFT: Adapting Language Model to Domain Specific RAG
2 min
llm
research paper
Adapting LLMs to specialized domains, which is essential to many emerging applications, usually takes two paths: in-context learning through Retrieval-Augmented…
Mar 18, 2024

USER-LLM: Efficient LLM Contextualization with User Embeddings
2 min
llm
research paper
Large language models (LLMs) have revolutionized the field of user modeling and personalization due to their ability to learn and adapt from massive amounts of textual data.…
Mar 15, 2024

RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation
2 min
llm
research paper
Factual correctness has been one of the growing concerns around LLMs’ reasoning capabilities. This issue becomes more significant when it comes to zero-shot CoT (Chain of…
Mar 14, 2024

MoAI: Mixture of All Intelligence for Large Language and Vision Models
2 min
llm
research paper
Following the success of the instruction-tuned LLMs, several visual instruction tuning datasets have been meticulously curated to enhance zero-shot vision language (VL)…
Mar 13, 2024

VideoMamba: State Space Model for Efficient Video Understanding
2 min
llm
research paper
Mastering spatiotemporal representation is one of the key areas in any video understanding task. However, there are usually two challenges associated with it: (1) the large…
Mar 12, 2024

Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
2 min
llm
research paper
Adapter-based fine-tuning methods, such as LoRA, are key to making large language models disruptive in various domain specific applications. LoRA introduces a limited number…
Mar 11, 2024

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
2 min
llm
research paper
Training Large Language Models (LLMs) is challenging due to memory constraints from weight and optimizer size. Low-rank adaptation (LoRA) addresses this by adding trainable…
Mar 8, 2024

InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
2 min
llm
research paper

Mar 7, 2024

Design2Code: How Far Are We From Automating Front-End Engineering?
2 min
llm
research paper
Recent releases of advanced multimodal LLMs such as GPT-4V and Gemini Pro have led to breakthroughs in visual understanding and code generation. This has opened up…
Mar 6, 2024

DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model
2 min
llm
research paper
In recent years, text-to-image (T2I) generation models such as DreamBooth and BLIP-Diffusion have rapidly evolved, generating intricate and highly detailed images that often…
Mar 5, 2024

VisionLLaMA : A Unified LLaMA Interface for Vision Tasks
2 min
llm
research paper
Large language models, especially the LLaMA family, have aroused great interest in the research community for multimodal model applications, where many methods heavily…
Mar 4, 2024

Beyond Language Models: Byte Models are Digital World Simulators
2 min
llm
research paper
Bytes are the foundation of all digital data, devices, and software, from computer processors to operating systems in everyday electronics. Therefore, training models for…
Mar 1, 2024

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
2 min
llm
research paper
Large Language Models (LLMs) have demonstrated remarkable performance in a wide range of natural language processing tasks, but their increasing size has posed challenges…
Feb 29, 2024

ChunkLlama : Training-Free Long-Context Scaling of Large Language Models
2 min
llm
research paper
The ability to comprehend and process long-context information is essential for large language models (LLMs) to cater to a wide range of applications effectively. Finetuning…
Feb 28, 2024

MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
1 min
llm
research paper
MobiLlama is another Small Language Model (SLM) for resource-constrained devices. MobiLlama is an SLM design that initiates from a larger model and applies a careful…
Feb 27, 2024

ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
2 min
llm
research paper
Self-attention, one of the critical components in LLMs, performs poorly during inference since it performs intensive memory operations on key/value tensors of context…
Feb 26, 2024

TinyLLaVA: A Framework of Small-scale Large Multimodal Models
2 min
llm
research paper
Large language models (LLMs) with large model size can greatly improve task performance but demand expensive computational resources for training. To address this, the LLM…
Feb 23, 2024

The FinBen: An Holistic Financial Benchmark for Large Language Models
2 min
llm
research paper
Recent studies have shown the great potential of advanced LLMs such as GPT-4 on financial text analysis and prediction tasks in the financial domain. While their potential…
Feb 22, 2024

GRIT : Generative Representational Instruction Tuning
2 min
llm
research paper
All text-based language problems can be reduced to either generation or embedding. Creating a single general model that performs such a wide range of tasks has been a…
Feb 16, 2024

Aespa: Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers
2 min
llm
research paper
With the increasing complexity of generative AI models, post-training quantization (PTQ) has emerged as a promising solution for deploying hyper-scale models on edge devices…
Feb 15, 2024

Graph Mamba: Towards Learning on Graphs with State Space Models
2 min
llm
research paper
Graph Transformers (GTs) have shown promising potential in graph representation learning. GTs, however, have quadratic computational cost, lack inductive biases on graph…
Feb 14, 2024

Fiddler: CPU-GPU Orchestration for Fast Local Inference of MoE Models
2 min
llm
research paper
Large Language Models (LLMs) based on Mixture-of-Experts (MoE) architectures are showing remarkable performance on various tasks. By activating a subset of experts inside…
Feb 13, 2024

PHATGOOSE: Learning to Route Among Specialized Experts for Zero-Shot Generalization
2 min
llm
research paper
The availability of Hugging Face PEFT modules has made it cheap and easy to modularly adapt a given pre-trained model to a specific task or domain. In the meantime, extremely…
Feb 12, 2024

Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
2 min
llm
research paper
General-purpose LLMs like LLaMA and GPT-4 have demonstrated remarkable proficiency in understanding and generating natural language. However, their capabilities wane in…
Feb 9, 2024

Hydragen: High-Throughput LLM Inference with Shared Prefixes
2 min
llm
research paper
Transformer-based large language models (LLMs) such as OpenAI’s GPT-3.5 and GPT-4 are now deployed to hundreds of millions of users. LLM inference in such scenarios commonly…
Feb 8, 2024

MambaFormer: Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
2 min
llm
research paper
State-space models (SSMs), such as Mamba, have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and…
Feb 7, 2024

BlackMamba: Mixture of Experts for State-Space Models
2 min
llm
research paper
State-space models (SSMs) have recently demonstrated competitive performance to transformers at large-scale language modeling benchmarks while achieving linear time and…
Feb 6, 2024

Repeat After Me: Transformers are Better than State Space Models at Copying
4 min
llm
research paper
Feb 5, 2024

Re3val: Reinforced and Reranked Generative Retrieval
2 min
llm
research paper
The primary objective of retrieval models is to enhance the accuracy of answers by selecting the most relevant documents retrieved for a given query, ensuring models have…
Feb 2, 2024

FIND: INterface for Foundation models’ embeDDings
2 min
llm
research paper
Foundation models across the vision and language domains, such as GPT-4, DALL-E 3, SAM, and LLaMA, have demonstrated significant advancements in addressing open-ended…
Feb 1, 2024

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
2 min
llm
research paper
Vision Language Models (VLMs), such as OpenAI’s GPT-4, Flamingo, BLIP-2 and LLaVA have demonstrated significant advancements in addressing open-ended visual…
Jan 31, 2024

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
2 min
llm
research paper
For Large Vision-Language Models (LVLMs), scaling the model can effectively improve performance. However, expanding model parameters significantly increases the training and…
Jan 30, 2024

EAGLE: Extrapolation Algorithm for Greater Language-model Efficiency
2 min
llm
research paper
Auto-regressive decoding has become the de facto standard for large language models (LLMs). This process generates output tokens one at a time, which makes the generation by…
Jan 29, 2024

MambaByte: Token-free Selective State Space Model
2 min
llm
research paper
In December 2023, the paper “Mamba: Linear-Time Sequence Modeling with Selective State Spaces” was released, and with it began the whole discussion about Mamba (SSM) being a viable…
Jan 25, 2024

Instruction-Tune Llama2 with TRL
7 min
hugging face
llm
model building
This blog post is an extended guide on instruction-tuning Llama 2 from Meta AI. The idea of the blog post is to focus on creating the instruction dataset, which we can then…
Jan 25, 2024

Towards Conversational Diagnostic AI
2 min
llm
research paper
With the Med-PaLM series of LLMs, Google is one of the few companies that can claim expertise in building medical-domain-specific LLMs. The latest addition has been AMIE…
Jan 24, 2024

ChatQA: Building GPT-4 Level Conversational QA Models
1 min
llm
research paper
With all open-source LLMs trying to outperform GPT-4, one may wonder which one has truly been successful in conversational QA, one of the elementary use cases of LLMs.
Jan 23, 2024

How to Fine-Tune LLMs with TRL
11 min
hugging face
llm
model building
Large Language Models or LLMs have seen a lot of progress in the last year. We went from no ChatGPT competitor to a whole zoo of LLMs, including Meta AI’s Llama 2, Mistral’s …
Jan 23, 2024

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
2 min
llm
research paper
 
Jan 22, 2024

Merge Model using Mergekit
8 min
tools
llm
model building
Model merging is a technique that combines two or more LLMs into a single model. It’s a relatively new and experimental method to create new models for cheap (no GPU…
Jan 22, 2024

Tuning Language Models by Proxy
1 min
llm
research paper
These days, the capabilities of large pretrained LLMs can be significantly enhanced for specific domains or tasks of interest with additional fine-tuning. However, tuning these…
Jan 19, 2024

DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
1 min
llm
research paper
Recently, Microsoft DeepSpeed launched the DeepSpeed-FastGen LLM serving framework, which offers up to 2.3x higher effective throughput compared to state-of-the-art systems like…
Jan 18, 2024

Self-Evaluation Improves Selective Generation in Large Language Models
2 min
llm
research paper
The trustworthiness of LLM output is one of the important considerations for the safe deployment of LLMs in production. One straightforward way to assess it is by measuring…
Jan 17, 2024

Self-RAG: Learning to Retrieve, Generate and Critique through Self-Reflections
1 min
llm
research paper
Self-RAG is a new framework to train an arbitrary LM to learn to retrieve, generate, and critique to enhance the factuality and quality of generations, without hurting the…
Jan 16, 2024

Reciprocal Rank Fusion (RRF) with LambdaMART: Context Tuning for Retrieval Augmented Generation (RAG)
1 min
llm
research paper
RAG typically consists of three primary components: Tool Retrieval, Plan Generation, and Execution. Existing RAG methodologies rely heavily on semantic search for tool…
Jan 15, 2024

Chain of Thought (CoT): The Impact of Reasoning Step Length on Large Language Models
2 min
llm
research paper
If you are doing prompt engineering for LLMs then you might have come across Chain of Thought (CoT) prompting, which is significant in improving the reasoning abilities of…
Jan 12, 2024

Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
2 min
llm
research paper
Introducing DistAttention, a distributed attention algorithm, and DistKV-LLM, a distributed LLM serving system, to improve the performance and resource management of…
Jan 11, 2024

Soaring from 4K to 400K: Extending LLM’s Context with Activation Beacon
1 min
llm
research paper
Activation Beacon is a plug-and-play module for large language models that allows them to process longer contexts with a limited context window, while preserving their…
Jan 10, 2024

Improving Text Embeddings with Large Language Models using fine-tuned Mistral-7B LLM
1 min
llm
research paper
Check out a groundbreaking paper on improving text embeddings with large language models (LLMs) like GPT-4! The authors propose generating synthetic training data for text…
Jan 9, 2024

DocLLM: A Layout-Aware Generative Language Model for Multimodal Document Understanding
1 min
llm
research paper
Introducing DocLLM, a groundbreaking generative language model that can understand visually rich documents without the need for expensive image encoders. DocLLM uses a…
Jan 8, 2024

Self-Play Fine-Tuning (SPIN): Converts Weak Language Models to Strong Language Models
1 min
llm
research paper
Self-Play Fine-Tuning (SPIN) is a new fine-tuning method to improve large language models (LLMs) without needing additional human-annotated data.
Jan 5, 2024

A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models
1 min
llm
research paper
The paper provides a comprehensive taxonomy categorizing over 32 techniques for mitigating hallucinations in large language models (LLMs). It groups the techniques into…
Jan 4, 2024

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
1 min
llm
research paper
With only four lines of code modification, the proposed method can effortlessly extend existing LLMs’ context window without any fine-tuning. This work elicits LLMs’…
Jan 3, 2024

Mamba-Chat: A Chat LLM based on State Space Models
1 min
llm
research paper
Mamba-Chat is the first chat language model based on a state-space model architecture, not a transformer.
Jan 2, 2024

KwaiAgents: Generalized Information-seeking Agent System with LLMs - 2 Open-source models fine tuned for agent systems! Better than GPT-3.5 turbo as an agent!
2 min
llm
research paper
Driven by curiosity, humans have continually sought to explore and understand the world around them, leading to the invention of various tools to satiate this…
Jan 1, 2024
     

    © 2025 Santosh Sawant · LLM Architect