IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization
llm
research paper
Recently, the ability to follow complex instructions with multiple constraints is gaining increasing attention as LLMs are deployed in sophisticated real-world applications.…
Nov 15, 2024
Santosh Sawant
SEALONG: Large Language Models Can Self-Improve in Long-context Reasoning
llm
research paper
Large language models (LLMs) have achieved substantial progress in processing long contexts but still struggle with long-context reasoning. Existing approaches typically…
Nov 14, 2024
Santosh Sawant
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
llm
research paper
Recently there has been a growing trend of developing sophisticated LLMs specialized in both image comprehension and text-to-image generation. This is achieved…
Nov 13, 2024
Santosh Sawant
NEKO: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts
llm
research paper
The challenge in building a general-purpose post-recognition error corrector, which is required to evaluate your fine-tuned models on a custom dataset, is how to train a…
Nov 12, 2024
Santosh Sawant
BitNet a4.8: 4-bit Activations for 1-bit LLMs
llm
research paper
Recent research on 1-bit Large Language Models (LLMs), such as BitNet b1.58, presents a promising direction for reducing the inference cost of LLMs while maintaining…
Nov 11, 2024
Santosh Sawant
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-Time Hybrid Information Structurization
llm
research paper
Retrieval-augmented generation (RAG) is a key means to effectively enhance large language models (LLMs) in many knowledge-based tasks. However, existing RAG methods struggle…
Oct 14, 2024
Santosh Sawant
Backtracking Improves Generation Safety
llm
research paper
LLMs have a fundamental limitation almost by definition: there is no taking back tokens that have been generated, even when they are clearly problematic. In the context of…
Oct 3, 2024
Santosh Sawant
RULER: A Model-Agnostic Method to Control Generated Length for Large Language Models
llm
research paper
The instruction-following ability of large language models enables humans to interact with AI agents in a natural way. However, when required to generate responses of a…
Oct 1, 2024
Santosh Sawant
Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking
llm
research paper
Recently, LLM-judge benchmarks such as MT-Bench, Alpaca Eval, and Arena-Hard-Auto have been a go-to tool to simultaneously automate evaluation of LLMs while also aligning…
Sep 27, 2024
Santosh Sawant
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models
llm
research paper
Large Language Models (LLMs) have demonstrated remarkable effectiveness across a diverse range of tasks. However, LLMs are usually distinguished by their massive parameter…
Sep 27, 2024
Santosh Sawant
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
llm
research paper
Large Language Models (LLMs) have revolutionized software engineering (SE), demonstrating remarkable capabilities in various coding tasks. While recent efforts have produced…
Sep 26, 2024
Santosh Sawant
Making Text Embedders Few-Shot Learners
llm
research paper
LLM-based embedding models have demonstrated remarkable improvements in in-domain accuracy and generalization, particularly when trained using supervised learning approaches…
Sep 25, 2024
Santosh Sawant
Introducing Contextual Retrieval
llm
research paper
In traditional RAG, documents are typically split into smaller chunks for efficient retrieval. While this approach works well for many applications, it can lead to problems…
Sep 23, 2024
Santosh Sawant
Training Language Models to Self-Correct via Reinforcement Learning
llm
research paper
Self-correction is a highly desirable capability of large language models (LLMs), yet it has consistently been found to be largely ineffective in modern LLMs. Existing…
Sep 20, 2024
Santosh Sawant
jina-embeddings-v3: Multilingual Embeddings With Task LoRA
llm
research paper
Recently, jina.ai released jina-embeddings-v3, a novel text embedding model with 570 million parameters that achieves state-of-the-art performance on multilingual data and…
Sep 19, 2024
Santosh Sawant
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
llm
research paper
Modern information retrieval (IR) models generally match queries to passages based on a single semantic similarity score. This can make the search experience confusing for…
Sep 18, 2024
Santosh Sawant
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
llm
research paper
Transformer-based large language models (LLMs) have become increasingly important in various domains. However, the quadratic time complexity of the attention operation poses a…
Sep 17, 2024
Santosh Sawant
Self-Harmonized Chain of Thought
llm
research paper
Chain-of-thought (CoT) prompting reveals that large language models are capable of performing complex reasoning via intermediate steps. CoT methods in large language models…
Sep 16, 2024
Santosh Sawant
OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs
llm
research paper
Despite the recent advancements in Large Language Models (LLMs), which have significantly enhanced the generative capabilities for various NLP tasks, LLMs still face…
Sep 13, 2024
Santosh Sawant
Agent Workflow Memory (AWM)
llm
research paper
Recently, LLM-based agents have shown promise for real-world tasks like web navigation, but they still struggle with complex, long-term tasks. Unlike these models, humans…
Sep 12, 2024
Santosh Sawant
MemoRAG: Moving Towards Next-Gen RAG via Memory-Inspired Knowledge Discovery
llm
research paper
Retrieval-Augmented Generation (RAG) leverages retrieval tools to access external databases, thereby enhancing the generation quality of large language models (LLMs) through…
Sep 11, 2024
Santosh Sawant
GraphRAG auto-tuning provides rapid adaptation to new domains
llm
research paper
GraphRAG uses large language models (LLMs), guided by a set of domain-specific prompts, to create a comprehensive knowledge graph that details entities and their…
Sep 10, 2024
Santosh Sawant
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding
llm
research paper
Multimodal Large Language Models (MLLMs) have achieved promising OCR-free document understanding performance by increasing the supported resolution of document images.…
Sep 9, 2024
Santosh Sawant
Generative Verifiers: Reward Modeling as Next-Token Prediction
llm
research paper
While large language models (LLMs) demonstrate remarkable capabilities, they often confidently make logical and factual mistakes, which can invalidate the entire solution. A…
Aug 30, 2024
Santosh Sawant
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
llm
research paper
The ability to accurately interpret complex visual information is a crucial topic of multimodal large language models (MLLMs). Recent work indicates that enhanced visual…
Aug 28, 2024
Santosh Sawant
Efficient Detection of Toxic Prompts in Large Language Models
llm
research paper
Large language models (LLMs) like ChatGPT and Gemini have significantly advanced natural language processing. However, these models can be exploited by malicious individuals…
Aug 27, 2024
Santosh Sawant
LLM Pruning and Distillation in Practice: The Minitron Approach
llm
research paper
Over the past few years, significant advancements have blossomed in the two key pillars of multimodal intelligence: understanding and generation. Recent works have tried to…
Aug 26, 2024
Santosh Sawant
Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search
llm
research paper
Recent studies have demonstrated how Large Language Models (LLMs) can be utilized to learn skills for improved decision-making in interactive environments. However, learning…
Aug 23, 2024
Santosh Sawant
LLM Pruning and Distillation in Practice: The Minitron Approach
llm
research paper
Training multiple multi-billion parameter large language models from scratch is extremely time-, data- and resource-intensive. However, recent work has demonstrated the…
Aug 22, 2024
Santosh Sawant
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
llm
research paper
Multi-modal generative models need to be able to perceive, process, and produce both discrete elements (such as text or code) and continuous elements (e.g. image, audio, and…
Aug 21, 2024
Santosh Sawant
BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts
llm
research paper
The Mixture of Experts (MoE) framework has become a popular architecture for large language models due to its superior performance over dense models. However, training MoEs…
Aug 20, 2024
Santosh Sawant
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
llm
research paper
Large Multimodal Models (LMMs) have attracted significant attention with their potential applications and emergent capabilities. However, recent works have demonstrated that…
Aug 19, 2024
Santosh Sawant
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
llm
research paper
Recent advancements in large language models have significantly influenced mathematical reasoning and theorem proving in artificial intelligence. Despite notable progress in…
Aug 16, 2024
Santosh Sawant
rStar: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
llm
research paper
Despite their success, large language models face significant challenges in complex reasoning tasks. Although fine-tuning is shown to be an effective way to improve…
Aug 13, 2024
Santosh Sawant
PAD: Prioritize Alignment in Dataset Distillation
llm
research paper
Dataset Distillation aims to compress a large dataset into a significantly more compact, synthetic one without compromising the performance of the trained models. To achieve…
Aug 12, 2024
Santosh Sawant
CODEXGRAPH: Bridging Large Language Models and Code Repositories via Code Graph Databases
llm
research paper
Large Language Models (LLMs) excel in stand-alone code tasks like HumanEval and MBPP, but struggle with handling entire code repositories. Current solutions rely on…
Aug 9, 2024
Santosh Sawant
Synthesizing Text-to-SQL Data from Weak and Strong LLMs
llm
research paper
Text-to-SQL has been one of the most sought-after use cases in AI application development, especially with closed-source LLMs such as GPT-4. However, the adoption of closed-source LLMs…
Aug 8, 2024
Santosh Sawant
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation
llm
research paper
Implementing Retrieval-Augmented Generation (RAG) systems is inherently complex, requiring deep understanding of data, use cases, and intricate design decisions.…
Aug 6, 2024
Santosh Sawant
ReLiK: Retrieve and LinK, Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget
llm
research paper
Extracting structured information from unstructured text lies at the core of many Gen AI problems such as Information Retrieval, Knowledge Graph Construction, Knowledge…
Aug 5, 2024
Santosh Sawant
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
llm
research paper
Standard prompt-based LLM inference has two sequential stages: prefilling and decoding. During the prefilling stage, the model computes and saves the KV cache of each token…
Aug 2, 2024
Santosh Sawant
DiT-MoE: Scaling Diffusion Transformers to 16 Billion Parameters
llm
research paper
Recently, diffusion models (DiT) have emerged as powerful deep generative models in various domains, such as image, video and 3D objects. However, training and serving such…
Aug 1, 2024
Santosh Sawant
DDK: Distilling Domain Knowledge for Efficient Large Language Models
llm
research paper
Despite the advances of large language models (LLMs) in various applications, they still face significant challenges to wider adoption due to high computational and storage…
Jul 31, 2024
Santosh Sawant
Chain of Diagnosis (CoD): Towards an Interpretable Medical Agent
llm
research paper
The field of medical diagnosis has undergone a significant transformation with the advent of large language models (LLMs), yet the challenges of interpretability within…
Jul 26, 2024
Santosh Sawant
LAMBDA: A Large Model Based Data Agent
llm
research paper
Large Language Models (LLMs) have been instrumental in pushing innovation across multiple domains. However, despite these advancements, the current LLM paradigm encounters…
Jul 26, 2024
Santosh Sawant
VILA2: VILA Augmented VILA
llm
research paper
Visual language models (VLMs) have rapidly progressed, driven by the success of large language models (LLMs). However data curation of VLMs still remains under-explored.…
Jul 25, 2024
Santosh Sawant
The Llama 3 Herd of Models
llm
research paper
The Llama 3.1 release marked a big milestone for LLM researchers and the open source AI community. Meta engineers trained Llama 3.1 on NVIDIA H100 Tensor Core GPUs. They…
Jul 24, 2024
Santosh Sawant
BOND: Aligning LLMs with Best-of-N Distillation
llm
research paper
State-of-the-art large language models (LLMs) such as Gemini and GPT-4 are generally trained in three stages. First, LLMs are pre-trained on large corpora of knowledge using…
Jul 23, 2024
Santosh Sawant
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
llm
research paper
Recently, considerable research work has gone toward reducing the high computational cost and memory footprint of LLMs, especially during the inference stage. Sparsity is…
Jul 22, 2024
Santosh Sawant
Beyond KV Caching: Shared Attention for Efficient LLMs
llm
research paper
The efficiency of large language models (LLMs) remains a critical challenge, particularly in contexts where computational resources are limited. Traditional attention…
Jul 19, 2024
Santosh Sawant
E5-V: Universal Embeddings with Multimodal Large Language Models
llm
research paper
With the development of Multimodal Large Language Models (MLLMs), there is an increasing need for embedding models to represent multimodal inputs. Although CLIP shows…
Jul 17, 2024
Santosh Sawant
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?
llm
research paper
The capability of LLMs to process long texts is particularly crucial across various domains. Considering the critical role of LLMs in handling long texts, numerous…
Jul 17, 2024
Santosh Sawant
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
llm
research paper
Parameter-efficient transfer learning (PETL) is widely used for domain adaptation of large pre-trained models to specific downstream tasks, greatly reducing trainable…
Jul 16, 2024
Santosh Sawant
AgentInstruct: Toward Generative Teaching with Agentic Flows
llm
research paper
Synthetic data is becoming increasingly important for accelerating the development of language models, both large and small. Despite several successful use cases…
Jul 15, 2024
Santosh Sawant
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
llm
research paper
FlashAttention (and FlashAttention-2) pioneered an approach to speed up attention on GPUs by minimizing memory reads/writes, and is used across various libraries to accelerate…
Jul 12, 2024
Santosh Sawant
Metron: Holistic Performance Evaluation Framework for LLM Inference Systems
llm
research paper
Serving large language models (LLMs) in production can incur substantial costs, which has prompted recent advances in inference system optimizations. Today, these systems…
Jul 11, 2024
Santosh Sawant
Composable Interventions for Language Models
llm
research paper
Language models (LMs) exhibit striking capabilities on various important tasks, but despite such high performance, LM-generated content is usually prone to be…
Jul 10, 2024
Santosh Sawant
Associative Recurrent Memory Transformer
llm
research paper
Long-sequence LLMs are among the more challenging models to work with, as memory plays a crucial role in processing extremely long contexts and utilizing remote past information.…
Jul 9, 2024
Santosh Sawant
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
llm
research paper
Key-value (KV) caching plays an essential role in accelerating decoding for transformer-based autoregressive large language models (LLMs). However, the amount of memory…
Jul 8, 2024
Santosh Sawant
Adam-mini: Use Fewer Learning Rates To Gain More
llm
research paper
Adam(W) has become the de-facto optimizer for training large language models (LLMs). Despite its superior performance, Adam is expensive to use. Specifically, Adam requires…
Jul 5, 2024
Santosh Sawant
Searching for Best Practices in Retrieval-Augmented Generation
llm
research paper
Retrieval-augmented generation (RAG) techniques have proven to be effective in enhancing LLM response quality, particularly in specialized domains. While many RAG…
Jul 4, 2024
Santosh Sawant
MInference: Million-Token Inference on a Single A100 Machine
llm
research paper
The computational challenges of LLM inference remain a significant barrier to their widespread deployment, especially as context lengths continue to increase. Existing…
Jul 3, 2024
Santosh Sawant
MIRAI: Evaluating LLM Agents for Event Forecasting
llm
research paper
Recent advancements in Large Language Models (LLMs) have enabled LLM agents to autonomously collect world information, over which to conduct reasoning to solve complex…
Jul 2, 2024
Santosh Sawant
AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation
llm
research paper
Retrieval-Augmented Generation (RAG) has emerged as a prominent framework for building ML/AI solutions with LLMs. Additional modules such as query rewriting, prompt…
Jul 1, 2024
Santosh Sawant
Meta Large Language Model Compiler: Foundation Models of Compiler Optimization
llm
research paper
Large Language Models (LLMs) have demonstrated remarkable capabilities across a variety of software engineering and coding tasks. However, their application in the domain of…
Jun 28, 2024
Santosh Sawant
Instruction Pre-Training: Language Models are Supervised Multi Task Learners
llm
research paper
Unsupervised multitask pre-training has been the critical method behind the recent success of language models (LMs). However, supervised multitask learning still holds…
Jun 26, 2024
Santosh Sawant
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs
llm
research paper
In the traditional RAG framework, the basic retrieval units are normally short, but the retriever needs to scan over a massive number of units to find the relevant piece.…
Jun 24, 2024
Santosh Sawant
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities
llm
research paper
Large language models have shown promising results in arithmetic and symbolic reasoning by expressing intermediate reasoning in text as a chain of thought, yet struggle to…
Jun 20, 2024
Santosh Sawant
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
llm
research paper
Almost all of today’s LLMs are designed as monolithic architectures. These models rely extensively on large-scale data to embed generalized language capabilities…
Jun 20, 2024
Santosh Sawant
THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation
llm
research paper
Software agents have emerged as promising tools for addressing complex software engineering tasks. However, existing works oversimplify software development workflows by…
Jun 19, 2024
Santosh Sawant
THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation
llm
research paper
Nowadays, large language models (LLMs) with large context windows are capable of processing lengthy dialogue histories during prolonged interaction with users without…
Jun 18, 2024
Santosh Sawant
Ad Auctions for LLMs via Retrieval Augmented Generation
llm
research paper
Large language models (LLMs) have been making headway in various domains and now also in the field of computational advertising. Now with the integration of ads into the…
Jun 17, 2024
Santosh Sawant
Improving Alignment and Robustness with Circuit Breakers
llm
research paper
Large language models (LLMs) have been instrumental in pushing the boundaries of various real-world applications, most of which involve long-sequence inputs, such…
Jun 14, 2024
Santosh Sawant
Improving Alignment and Robustness with Circuit Breakers
llm
research paper
The landscape of artificial intelligence (AI) has long been marred by the persistent threat of adversarial attacks, particularly those targeting neural networks. The rise of…
Jun 13, 2024
Santosh Sawant
TEXTGRAD: Automatic “Differentiation” via Text
llm
research paper
There is an emerging paradigm shift in how AI systems are built these days. The new generation of AI applications are increasingly compound systems involving multiple…
Jun 12, 2024
Santosh Sawant
HUSKY: A Unified, Open-Source Language Agent for Multi-Step Reasoning
llm
research paper
Recent advances in the capabilities of large language models (LLMs) have led to the development of language agents to address complex, multi-step tasks. However, most…
Jun 11, 2024
Santosh Sawant
Mixture-of-Agents Enhances Large Language Model Capabilities
llm
research paper
Recent advances in large language models (LLMs) demonstrate substantial capabilities in natural language understanding and generation tasks. However, despite the plethora of…
Jun 10, 2024
Santosh Sawant
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
llm
research paper
Recently, various prompting methods such as CoT, ToT and GoT have been instrumental in improving reasoning performance of LLMs. All these methods can be broadly divided into…
Jun 7, 2024
Santosh Sawant
Block Transformer: Global-to-Local Language Modeling for Fast Inference
llm
research paper
Generating tokens with transformer-based autoregressive language models (LMs) is costly due to the self-attention mechanism that attends to all previous tokens. To apply…
Jun 6, 2024
Santosh Sawant
Show, Don’t Tell: Aligning Language Models with Demonstrated Feedback
llm
research paper
Aligning Large Language Models (LLMs) with human values and preferences is essential for making them helpful and safe. However, alignment can be challenging, especially for…
Jun 5, 2024
Santosh Sawant
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
llm
research paper
In the age of large-scale language models, benchmarks like the Massive Multitask Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI can…
Jun 4, 2024
Santosh Sawant
Contextual Position Encoding: Learning to Count What’s Important
llm
research paper
The attention mechanism is a critical component of Large Language Models (LLMs) that allows tokens in a sequence to interact with each other, but the attention mechanism…
Jun 3, 2024
Santosh Sawant
Similarity is Not All You Need: Endowing Retrieval-Augmented Generation with Multi-layered Thoughts
llm
research paper
Retrieval-augmented generation (RAG) has been pivotal in pushing LLM use cases in knowledge management systems. Nevertheless, existing retrieval-augmented generation…
May 31, 2024
Santosh Sawant
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
llm
research paper
Large language models (LLMs) often hallucinate and lack the ability to provide attribution for their generations. Semi-parametric LMs, such as kNN-LM, approach these…
May 30, 2024
Santosh Sawant
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections
llm
research paper
LLM training and fine-tuning are still far too compute- and memory-intensive. Several techniques have been proposed to reduce these memory requirements, such as…
May 29, 2024
Santosh Sawant
Zamba: A Compact 7B SSM Hybrid Model
llm
research paper
Recently, state-of-the-art Transformer-SSM hybrid architectures have been a driving force in open-source LLMs. In line with this trend, researchers from Zyphra have launched…
May 28, 2024
Santosh Sawant
Layer-Condensed KV Cache for Efficient Inference of Large Language Models
llm
research paper
The key-value (KV) cache is one of the most significant parts of any transformer-based LLM and takes over 30% of GPU memory during deployment. Hence the KV cache plays a…
May 20, 2024
Santosh Sawant
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model
llm
research paper
Recently, the performance of small-scale visual language models has come on par with that of their larger-scale counterparts. Models such as LLaVA-Phi [47], which combines the open-source…
May 16, 2024
Santosh Sawant
SUTRA: Scalable Multilingual Language Model Architecture
llm
research paper
Recent advancements in Large Language Models (LLMs) have predominantly focused on a limited set of data-rich languages, with training datasets being notably skewed towards…
May 15, 2024
Santosh Sawant
Linearizing Large Language Models
llm
research paper
Over the last few years, Transformers have displaced Recurrent Neural Networks (RNNs) in sequence modeling tasks, owing to their highly parallel training efficiency and…
May 14, 2024
Santosh Sawant
From Local to Global: A Graph RAG Approach to Query-Focused Summarization
llm
research paper
The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions…
May 13, 2024
Santosh Sawant
Is Flash Attention Stable?
llm
research paper
Given the size and complexity of workloads, training Large Language Models (LLMs) often takes months, across hundreds or thousands of GPUs. For example, LLaMA2’s…
May 10, 2024
Santosh Sawant
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
llm
research paper
Optimizing the operational cost and compute requirements of LLMs is one of the most sought-after topics for researchers. Accelerated solutions deploy on mobile, edge devices or commodity…
May 9, 2024
Santosh Sawant
Better & Faster Large Language Models via Multi-token Prediction
llm
research paper
Large language models such as GPT and Llama are trained with a next-token prediction loss. However, despite the recent wave of impressive achievements in LLMs…
May 8, 2024
Santosh Sawant
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
llm
research paper
Aligning Large Language Models (LLMs) with human values and preferences is essential for making them helpful and safe. However, building efficient tools to perform alignment…
May 7, 2024
Santosh Sawant
PROMETHEUS 2: An Open Source Language Model Specialized in Evaluating Other Language Models
llm
research paper
Model-based evaluation with proprietary LMs such as GPT-4 has emerged as a scalable solution for assessing LM-generated text. However, concerns related to transparency…
May 3, 2024
Santosh Sawant
Octopus v4: Graph of language models
llm
research paper
LLMs have been effective in a wide range of applications, yet the most sophisticated models are often proprietary (GPT-4, Gemini) and considerably costlier than open source…
May 2, 2024
Santosh Sawant
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
llm
research paper
Evaluating language models is a challenging task: not only is it difficult to find meaningful data to test the models, but evaluating the correctness of a generated response…
Apr 30, 2024
Santosh Sawant
Make Your LLM Fully Utilize the Context
llm
research paper
These days the training context windows of many contemporary LLMs have been expanded to tens of thousands of tokens, thereby enabling these models to process extensive…
Apr 29, 2024
Santosh Sawant
CodecLM: Aligning Language Models with Tailored Synthetic Data
llm
research paper
Recent progress in instruction-tuned LLMs highlights the critical role of high-quality data in enhancing LLMs’ instruction-following capabilities. However, acquiring such…
Apr 25, 2024
Santosh Sawant
LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency
llm
research paper
Recently, DB query rewriting using LLMs has been one of the most sought-after use cases. The aim of query rewriting is to output a new query equivalent to the original SQL query, while…
Apr 24, 2024
Santosh Sawant
TransformerFAM: Feedback attention is working memory
llm
research paper
While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs. One of the widely used…
Apr 19, 2024
Santosh Sawant
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
llm
research paper
Recently Google has released RecurrentGemma, an open language model which uses Google’s novel Griffin architecture. Griffin combines linear RNN with local attention to…
Apr 18, 2024
Santosh Sawant
MEGALODON: Efficient LLM Pretraining and Inference with Unlimited Context Length
llm
research paper
The Transformer architecture is the backbone of production LLMs, but despite its remarkable capabilities, it faces challenges with quadratic computational complexity and…
Apr 17, 2024
Santosh Sawant
Trust Region Direct Preference Optimization (TR-DPO): Learn Your Reference Model for Real Good Alignment
llm
research paper
Aligning large language models with human preferences (RLHF) has become increasingly important to ensure safety and overall usefulness of the model. Traditionally, the…
Apr 16, 2024
Santosh Sawant
RHO-1: Not All Tokens Are What You Need
llm
research paper
High-quality training datasets are crucial to boosting LLM performance. Various data filtering techniques such as heuristics and classifiers are being utilized to select such…
Apr 15, 2024
Santosh Sawant
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
llm
research paper
Fine-tuning LLMs using Reinforcement Learning from Human Feedback (RLHF) has always been a preferred way of making LLMs more useful by aligning them with human values or…
Apr 9, 2024
Santosh Sawant
Stream of Search (SoS): Learning to Search in Language
llm
research paper
Transformer-based auto-regressive models such as GPT have shown remarkable performance in generative tasks but struggle when it comes to complex decision-making and…
Apr 8, 2024
Santosh Sawant
ReFT: Representation Finetuning for Language Models
llm
research paper
Parameter-efficient finetuning (PEFT) methods have been instrumental in the rapid adoption of fine-tuned, domain-specific LLMs. PEFT methods not only reduced memory usage and time…
Apr 5, 2024
Santosh Sawant
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
llm
research paper
The Transformer FLOPs equation, or FLOPs-per-token, is one of the key attributes in determining the compute budget for any transformer-based LLM. Usually in language models…
Apr 4, 2024
Santosh Sawant
sDPO: Don’t Use Your Data All at Once
llm
research paper
As the development of large language models (LLMs) progresses, aligning them with human preferences has become increasingly important to ensure safety and usefulness of the…
Apr 3, 2024
Santosh Sawant
Gecko: Versatile Text Embeddings Distilled from Large Language Models
llm
research paper
Recent advancements in text embedding models have been instrumental for various downstream tasks including document retrieval, sentence similarity, classification, and…
Apr 2, 2024
Santosh Sawant
Jamba: A Hybrid Transformer-Mamba Language Model
llm
research paper
Finally, the first production-grade commercially available Mamba-based model delivering best-in-class quality and performance is here. Introducing Jamba, a novel…
Apr 1, 2024
Santosh Sawant
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
llm
research paper
LLM empowered multi-modality inputs are becoming an essential part of Vision Language Models (VLMs) such as LLaVA and Otter. However, despite these advancements, a…
Mar 28, 2024
Santosh Sawant
RigorLLM: Resilient Guardrails for large language models against undesired content
llm
research paper
Large language models (LLMs) have demonstrated impressive capabilities in NLG and different downstream tasks. However, the potential of LLMs to produce biased or harmful…
Mar 27, 2024
Santosh Sawant
DRAGIN: Dynamic Retrieval Augmented Generation based on the Real-time Information Needs of Large Language Models
llm
research paper
Traditional methods of RAG typically rely on single-round retrieval, using the LLM’s initial input to retrieve relevant information from external corpora. While this method…
Mar 26, 2024
Santosh Sawant
SiMBA: Simplified Mamba-based Architecture for Vision and Multivariate Time series
llm
research paper
Recently, Structured State Space models (SSM) such as Mamba have been pitched as an alternative to Transformer-based models, especially when it comes to increasing efficiency and…
Mar 25, 2024
Santosh Sawant
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
llm
research paper
Vision language models (VLMs) like GPT-4, LLaMA-Adapter, and LLaVA have been instrumental in augmenting LLMs with visual understanding capabilities. VLMs serve as…
Mar 22, 2024
Santosh Sawant
Evolutionary Optimization of Model Merging Recipes
llm
research paper
Model merging offers a novel approach to leverage the strengths of multiple pre-trained models. It allows us to combine task-specific models, each potentially fine-tuned for…
Mar 21, 2024
Santosh Sawant
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
llm
research paper
Structure information is critical for understanding the semantics of text-rich images, such as documents, tables, and charts. Most existing Multimodal Large Language…
Mar 20, 2024
Santosh Sawant
PERL: Parameter Efficient Reinforcement Learning from Human Feedback
llm
research paper
Reinforcement Learning from Human Feedback (RLHF) is one of the most popular methods to align Pretrained Large Language Models (LLMs) with human preferences. It involves…
Mar 19, 2024
Santosh Sawant
RAFT: Adapting Language Model to Domain Specific RAG
llm
research paper
Adapting LLMs to specialized domains, which is essential to many emerging applications, usually takes two paths: in-context learning through Retrieval-Augmented…
Mar 18, 2024
Santosh Sawant
USER-LLM: Efficient LLM Contextualization with User Embeddings
llm
research paper
Large language models (LLMs) have revolutionized the field of user modeling and personalization due to their ability to learn and adapt from massive amounts of textual data.…
Mar 15, 2024
Santosh Sawant
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation
llm
research paper
Factual correctness has been one of the growing concerns around LLMs’ reasoning capabilities. This issue becomes more significant when it comes to zero-shot CoT (Chain of…
Mar 14, 2024
Santosh Sawant
MoAI: Mixture of All Intelligence for Large Language and Vision Models
llm
research paper
Following the success of the instruction-tuned LLMs, several visual instruction tuning datasets have been meticulously curated to enhance zero-shot vision language (VL)…
Mar 13, 2024
Santosh Sawant
VideoMamba: State Space Model for Efficient Video Understanding
llm
research paper
Mastering spatiotemporal representation is one of the key areas in any video understanding task. However, there are usually two challenges associated with it: (1) the large…
Mar 12, 2024
Santosh Sawant
Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models
llm
research paper
Adapter-based fine-tuning methods, such as LoRA, are key to making large language models disruptive in various domain specific applications. LoRA introduces a limited number…
Mar 11, 2024
Santosh Sawant
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
llm
research paper
Training Large Language Models (LLMs) is challenging due to memory constraints from weight and optimizer size. Low-rank adaptation (LoRA) addresses this by adding trainable…
Mar 8, 2024
Santosh Sawant
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
llm
research paper
Mar 7, 2024
Santosh Sawant
Design2Code: How Far Are We From Automating Front-End Engineering?
llm
research paper
Recent releases of advanced multimodal LLMs such as GPT-4V and Gemini Pro have led to breakthroughs in visual understanding and code generation. This has opened up…
Mar 6, 2024
Santosh Sawant
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model
llm
research paper
In recent years, text-to-image (T2I) generation models such as DreamBooth and BLIP-Diffusion have rapidly evolved, generating intricate and highly detailed images that often…
Mar 5, 2024
Santosh Sawant
VisionLLaMA : A Unified LLaMA Interface for Vision Tasks
llm
research paper
Large language models, especially the LLaMA family of models, have aroused great interest in the research community for multimodal model applications, where many methods heavily…
Mar 4, 2024
Santosh Sawant
Beyond Language Models: Byte Models are Digital World Simulators
llm
research paper
Bytes are the foundation of all digital data, devices, and software, from computer processors to operating systems in everyday electronics. Therefore, training models for…
Mar 1, 2024
Santosh Sawant
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
llm
research paper
Large Language Models (LLMs) have demonstrated remarkable performance in a wide range of natural language processing tasks, but their increasing size has posed challenges…
Feb 29, 2024
Santosh Sawant
ChunkLlama : Training-Free Long-Context Scaling of Large Language Models
llm
research paper
The ability to comprehend and process long-context information is essential for large language models (LLMs) to cater to a wide range of applications effectively. Finetuning…
Feb 28, 2024
Santosh Sawant
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT
llm
research paper
MobiLlama is another Small Language Model (SLM) for resource-constrained devices. MobiLlama is an SLM design that initiates from a larger model and applies a careful…
Feb 27, 2024
Santosh Sawant
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
llm
research paper
Self-attention, one of the critical components in LLMs, performs poorly during inference since it carries out intensive memory operations on key/value tensors of context…
Feb 26, 2024
Santosh Sawant
TinyLLaVA: A Framework of Small-scale Large Multimodal Models
llm
research paper
Large language models (LLMs) with large model size can greatly improve task performance but demand expensive computational resources for training. To address this, the LLM…
Feb 23, 2024
Santosh Sawant
The FinBen: An Holistic Financial Benchmark for Large Language Models
llm
research paper
Recent studies have shown the great potential of advanced LLMs such as GPT-4 on financial text analysis and prediction tasks in the financial domain. While their potential…
Feb 22, 2024
Santosh Sawant
GRIT : Generative Representational Instruction Tuning
llm
research paper
All text-based language problems can be reduced to either generation or embedding. Creating a single general model that performs such a wide range of tasks has been a…
Feb 16, 2024
Santosh Sawant
Aespa: Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers
llm
research paper
With the increasing complexity of generative AI models, post-training quantization (PTQ) has emerged as a promising solution for deploying hyper-scale models on edge devices…
Feb 15, 2024
Santosh Sawant
Graph Mamba: Towards Learning on Graphs with State Space Models
llm
research paper
Graph Transformers (GTs) have shown promising potential in graph representation learning. GTs, however, have quadratic computational cost, lack inductive biases on graph…
Feb 14, 2024
Santosh Sawant
Fiddler: CPU-GPU Orchestration for Fast Local Inference of MoE Models
llm
research paper
Large Language Models (LLMs) based on Mixture-of-Experts (MoE) architectures are showing remarkable performance on various tasks. By activating a subset of experts inside…
Feb 13, 2024
Santosh Sawant
PHATGOOSE: Learning to Route Among Specialized Experts for Zero-Shot Generalization
llm
research paper
The availability of Hugging Face PEFT modules has made it cheap and easy to modularly adapt a given pre-trained model to a specific task or domain. In the meantime, extremely…
Feb 12, 2024
Santosh Sawant
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
llm
research paper
General-purpose LLMs like LLaMA and GPT-4 have demonstrated remarkable proficiency in understanding and generating natural language. However, their capabilities wane in…
Feb 9, 2024
Santosh Sawant
Hydragen: High-Throughput LLM Inference with Shared Prefixes
llm
research paper
Transformer-based large language models (LLMs) such as OpenAI GPT3.5 and GPT4 are now deployed to hundreds of millions of users. LLM inference in such scenarios commonly…
Feb 8, 2024
Santosh Sawant
MambaFormer: Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
llm
research paper
State-space models (SSMs), such as Mamba, have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and…
Feb 7, 2024
Santosh Sawant
BlackMamba: Mixture of Experts for State-Space Models
llm
research paper
State-space models (SSMs) have recently demonstrated competitive performance to transformers at large-scale language modeling benchmarks while achieving linear time and…
Feb 6, 2024
Santosh Sawant
Repeat After Me: Transformers are Better than State Space Models at Copying
llm
research paper
Feb 5, 2024
Santosh Sawant
Re3val: Reinforced and Reranked Generative Retrieval
llm
research paper
The primary objective of retrieval models is to enhance the accuracy of answers by selecting the most relevant documents retrieved for a given query, ensuring models have…
Feb 2, 2024
Santosh Sawant
FIND: INterface for Foundation models’ embeDDings
llm
research paper
Foundation models across the vision and language domains, such as GPT4, DALLE-3, SAM and LLaMA, have demonstrated significant advancements in addressing open-ended…
Feb 1, 2024
Santosh Sawant
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
llm
research paper
Vision Language Models (VLMs), such as OpenAI’s GPT-4, Flamingo, BLIP-2 and LLaVA have demonstrated significant advancements in addressing open-ended visual…
Jan 31, 2024
Santosh Sawant
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
llm
research paper
For Large Vision-Language Models (LVLMs), scaling the model can effectively improve performance. However, expanding model parameters significantly increases the training and…
Jan 30, 2024
Santosh Sawant
EAGLE: Extrapolation Algorithm for Greater Language-model Efficiency
llm
research paper
Auto-regressive decoding has become the de facto standard for large language models (LLMs). This process generates output tokens one at a time, which makes the generation by…
Jan 29, 2024
Santosh Sawant
MambaByte: Token-free Selective State Space Model
llm
research paper
In December 2023, the “Mamba: Linear-Time Sequence Modeling with Selective State Spaces” paper was released, and with it came the whole discussion about Mamba (SSM) being a viable…
Jan 25, 2024
Santosh Sawant
Instruction-Tune Llama2 with TRL
hugging face
llm
model building
This blog post is an extended guide on instruction-tuning Llama 2 from Meta AI. The idea of the blog post is to focus on creating the instruction dataset, which we can then…
Jan 25, 2024
Santosh Sawant
Towards Conversational Diagnostic AI
llm
research paper
With the Med-PaLM series of LLMs, Google is one of the few companies that can claim expertise in building medical domain-specific LLMs. The latest addition has been AMIE…
Jan 24, 2024
Santosh Sawant
ChatQA: Building GPT-4 Level Conversational QA Models
llm
research paper
With all open-source LLMs trying to outperform GPT-4, one may wonder which one has truly been successful in conversational QA, one of the elementary use cases of LLMs.
Jan 23, 2024
Santosh Sawant
How to Fine-Tune LLMs with TRL
hugging face
llm
model building
Large Language Models or LLMs have seen a lot of progress in the last year. We went from no ChatGPT competitor to a whole zoo of LLMs, including Meta AI’s Llama 2, Mistral’s …
Jan 23, 2024
Santosh Sawant
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
llm
research paper
Jan 22, 2024
Santosh Sawant
Merge Model using Mergekit
tools
llm
model building
Model merging is a technique that combines two or more LLMs into a single model. It’s a relatively new and experimental method to create new models for cheap (no GPU…
Jan 22, 2024
Santosh Sawant
Tuning Language Models by Proxy
llm
research paper
These days, the capabilities of large pretrained LLMs can be significantly enhanced for specific domains or tasks of interest using additional fine-tuning. However, tuning these…
Jan 19, 2024
Santosh Sawant
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
llm
research paper
Recently, Microsoft DeepSpeed launched the DeepSpeed-FastGen LLM serving framework, which offers up to 2.3x higher effective throughput compared to state-of-the-art systems like…
Jan 18, 2024
Santosh Sawant
Self-Evaluation Improves Selective Generation in Large Language Models
llm
research paper
The trustworthiness of LLM output is one of the important considerations for the safe deployment of LLMs in production. One straightforward way to assess it is by measuring…
Jan 17, 2024
Santosh Sawant
Self-RAG: Learning to Retrieve, Generate and Critique through Self-Reflections
llm
research paper
Self-RAG is a new framework to train an arbitrary LM to learn to retrieve, generate, and critique to enhance the factuality and quality of generations, without hurting the…
Jan 16, 2024
Santosh Sawant
Reciprocal Rank Fusion (RRF) with LambdaMART: Context Tuning for Retrieval Augmented Generation (RAG)
llm
research paper
RAG typically consists of three primary components: Tool Retrieval, Plan Generation, and Execution. Existing RAG methodologies rely heavily on semantic search for tool…
Jan 15, 2024
Santosh Sawant
Chain of Thought (CoT): The Impact of Reasoning Step Length on Large Language Models
llm
research paper
If you are doing prompt engineering for LLMs then you might have come across Chain of Thought (CoT) prompting, which is significant in improving the reasoning abilities of…
Jan 12, 2024
Santosh Sawant
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
llm
research paper
Introducing DistAttention, a distributed attention algorithm, and DistKV-LLM, a distributed LLM serving system, to improve the performance and resource management of…
Jan 11, 2024
Santosh Sawant
Soaring from 4K to 400K: Extending LLM’s Context with Activation Beacon
llm
research paper
Activation Beacon is a plug-and-play module for large language models that allows them to process longer contexts with a limited context window, while preserving their…
Jan 10, 2024
Santosh Sawant
Improving Text Embeddings with Large Language Models using fine-tuned Mistral-7B LLM
llm
research paper
Check out a groundbreaking paper on improving text embeddings with large language models (LLMs) like GPT-4! The authors propose generating synthetic training data for text…
Jan 9, 2024
Santosh Sawant
DOCLLM: A Layout-Aware Generative Language Model for Multimodal Document Understanding
llm
research paper
Introducing DocLLM, a groundbreaking generative language model that can understand visually rich documents without the need for expensive image encoders. DocLLM uses a…
Jan 8, 2024
Santosh Sawant
Self-Play Fine-Tuning (SPIN): Converts Weak Language Models to Strong Language Models
llm
research paper
Self-Play Fine-Tuning (SPIN) is a new fine-tuning method to improve large language models (LLMs) without needing additional human-annotated data.
Jan 5, 2024
Santosh Sawant
A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models
llm
research paper
The paper provides a comprehensive taxonomy categorizing over 32 techniques for mitigating hallucinations in large language models (LLMs). It groups the techniques into…
Jan 4, 2024
Santosh Sawant
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
llm
research paper
With only four lines of code modification, the proposed method can effortlessly extend existing LLMs’ context window without any fine-tuning. This work elicits LLMs’…
Jan 3, 2024
Santosh Sawant
Mamba-Chat: A Chat LLM based on State Space Models
llm
research paper
Mamba-Chat is the first chat language model based on a state-space model architecture, not a transformer.
Jan 2, 2024
Santosh Sawant
KwaiAgents: Generalized Information-seeking Agent System with LLMs - 2 Open-source models fine tuned for agent systems! Better than GPT-3.5 turbo as an agent!
llm
research paper
Driven by curiosity, humans have continually sought to explore and understand the world around them, leading to the invention of various tools to satiate this…
Jan 1, 2024
Santosh Sawant