
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs

llm
research paper
Recent pioneering work such as BitNet b1.58 demonstrated that 1.58-bit LLMs can match full-precision performance while drastically reducing inference costs (latency, memory…
Apr 28, 2025
Santosh Sawant

Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback

llm
research paper
Reinforcement Learning from Human Feedback (RLHF) is crucial for aligning LLMs with human preferences. While recent research has focused on algorithmic improvements, the…
Mar 31, 2025
Santosh Sawant

DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails

llm
research paper
The rapid advancement of large language models (LLMs) has increased the need for guardrail models to ensure responsible use, particularly in detecting unsafe and illegal…
Feb 10, 2025
Santosh Sawant

DeepSeek-R1: Incentivizing Reasoning Capability in Large Language Models via Reinforcement Learning

llm
research paper
A typical training process for LLMs consists of three phases: (1) Pre-training: In this stage, LLMs are pre-trained on vast amounts of text and code to learn general-purpose…
Jan 28, 2025
Santosh Sawant

Mind Evolution: Evolving Deeper LLM Thinking

llm
research paper
Google recently released Mind Evolution, an evolutionary search strategy for scaling inference-time compute in large language models, which uses a language model to…
Jan 21, 2025
Santosh Sawant

MiniMax-01: Scaling Foundation Models with Lightning Attention

llm
research paper
Recently, long-context LLMs have been pivotal to the further advancement of generative AI in various fields. Now researchers have introduced the MiniMax-01 series long context…
Jan 16, 2025
Santosh Sawant

Training Large Language Models to Reason in a Continuous Latent Space

llm
research paper
Large language models (LLMs) are restricted to reason in the “language space”, where they typically express the reasoning process with a chain-of-thought (CoT) to solve a…
Dec 10, 2024
Santosh Sawant

LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment

llm
research paper
Recent advancements in text-to-video (T2V) generative models have shown impressive capabilities. However, these models are still inadequate in aligning synthesized videos…
Dec 9, 2024
Santosh Sawant

PaliGemma 2: A Family of Versatile VLMs for Transfer

llm
research paper
Google has released the PaliGemma 2 family of models, an upgrade of the PaliGemma open Vision-Language Model (VLM) based on the Gemma 2 family of language models. PaliGemma…
Dec 5, 2024
Santosh Sawant

OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation

llm
research paper
Retrieval-augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external knowledge to reduce hallucinations and incorporate up-to-date information…
Dec 4, 2024
Santosh Sawant

Efficient Track Anything

llm
research paper
Segment Anything Model 2 (SAM 2) has emerged as a powerful tool for video object segmentation and tracking anything. Key components of SAM 2 that drive the impressive video…
Dec 3, 2024
Santosh Sawant

LongKey: Keyphrase Extraction for Long Documents

llm
research paper
In an era of information overload, manually annotating the vast and growing corpus of documents and scholarly papers is increasingly impractical. Automated keyphrase…
Nov 29, 2024
Santosh Sawant

VisualLens: Personalization through Visual History

llm
research paper
Recent Large Language Models (LLMs) can support contexts up to millions of tokens in length. However, processing such long sequences with LLMs requires substantial…
Nov 27, 2024
Santosh Sawant

VisualLens: Personalization through Visual History

llm
research paper
Imagine a personal assistant observing what you do in your daily life. When you ask for recommendations on anything from restaurants and activities to movies, books, and…
Nov 26, 2024
Santosh Sawant

A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection

llm
research paper
Large Language Models (LLMs) are prone to off-topic misuse, where users may prompt these models to perform tasks beyond their intended scope. Current guardrails, which often…
Nov 25, 2024
Santosh Sawant

IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization

llm
research paper
Recently, the ability to follow complex instructions with multiple constraints has been gaining increasing attention as LLMs are deployed in sophisticated real-world applications.…
Nov 15, 2024
Santosh Sawant

SEALONG: Large Language Models Can Self-Improve in Long-context Reasoning

llm
research paper
Large language models (LLMs) have achieved substantial progress in processing long contexts but still struggle with long-context reasoning. Existing approaches typically…
Nov 14, 2024
Santosh Sawant

JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

llm
research paper
Recently there has been a growing trend of developing sophisticated LLMs specialized in both image comprehension and text-to-image generation. This is achieved…
Nov 13, 2024
Santosh Sawant

NEKO: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts

llm
research paper
The challenge in building a general-purpose post-recognition error corrector, which is required to evaluate your fine-tuned models on a custom dataset, is how to train a…
Nov 12, 2024
Santosh Sawant

BitNet a4.8: 4-bit Activations for 1-bit LLMs

llm
research paper
Recent research on the 1-bit Large Language Models (LLMs), such as BitNet b1.58, presents a promising direction for reducing the inference cost of LLMs while maintaining…
Nov 11, 2024
Santosh Sawant

StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-Time Hybrid Information Structurization

llm
research paper
Retrieval-augmented generation (RAG) is a key means to effectively enhance large language models (LLMs) in many knowledge-based tasks. However, existing RAG methods struggle…
Oct 14, 2024
Santosh Sawant

Backtracking Improves Generation Safety

llm
research paper
LLMs have a fundamental limitation almost by definition: there is no taking back tokens that have been generated, even when they are clearly problematic. In the context of…
Oct 3, 2024
Santosh Sawant

RULER: A Model-Agnostic Method to Control Generated Length for Large Language Models

llm
research paper
The instruction-following ability of large language models enables humans to interact with AI agents in a natural way. However, when required to generate responses of a…
Oct 1, 2024
Santosh Sawant

Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking

llm
research paper
Recently, LLM-judge benchmarks such as MT-Bench, Alpaca Eval, and Arena-Hard-Auto have become a go-to tool to simultaneously automate evaluation of LLMs while also aligning…
Sep 27, 2024
Santosh Sawant

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models

llm
research paper
Large Language Models (LLMs) have demonstrated remarkable effectiveness across a diverse range of tasks. However, LLMs are usually distinguished by their massive parameter…
Sep 27, 2024
Santosh Sawant

HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale

llm
research paper
Large Language Models (LLMs) have revolutionized software engineering (SE), demonstrating remarkable capabilities in various coding tasks. While recent efforts have produced…
Sep 26, 2024
Santosh Sawant

Making Text Embedders Few-Shot Learners

llm
research paper
LLM-based embedding models have demonstrated remarkable improvements in in-domain accuracy and generalization, particularly when trained using supervised learning approaches…
Sep 25, 2024
Santosh Sawant

Introducing Contextual Retrieval

llm
research paper
In traditional RAG, documents are typically split into smaller chunks for efficient retrieval. While this approach works well for many applications, it can lead to problems…
Sep 23, 2024
Santosh Sawant

Training Language Models to Self-Correct via Reinforcement Learning

llm
research paper
Self-correction is a highly desirable capability of large language models (LLMs), yet it has consistently been found to be largely ineffective in modern LLMs. Existing…
Sep 20, 2024
Santosh Sawant

Training Language Models to Self-Correct via Reinforcement Learning

llm
research paper
Recently, jina.ai released jina-embeddings-v3, a novel text embedding model with 570 million parameters that achieves state-of-the-art performance on multilingual data and…
Sep 19, 2024
Santosh Sawant

Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models

llm
research paper
Modern information retrieval (IR) models generally match queries to passages based on a single semantic similarity score. This can make the search experience confusing for…
Sep 18, 2024
Santosh Sawant

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

llm
research paper
Transformer-based Large Language Models (LLMs) have become increasingly important in various domains. However, the quadratic time complexity of the attention operation poses a…
Sep 17, 2024
Santosh Sawant

Self-Harmonized Chain of Thought

llm
research paper
Chain-of-thought (CoT) prompting reveals that large language models are capable of performing complex reasoning via intermediate steps. CoT methods in large language models…
Sep 16, 2024
Santosh Sawant

OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs

llm
research paper
Despite the recent advancements in Large Language Models (LLMs), which have significantly enhanced the generative capabilities for various NLP tasks, LLMs still face…
Sep 13, 2024
Santosh Sawant

Agent Workflow Memory (AWM)

llm
research paper
Recently, LLM-based agents have shown promise for real-world tasks like web navigation, but they still struggle with complex, long-term tasks. Unlike these models, humans…
Sep 12, 2024
Santosh Sawant

MemoRAG: Moving Towards Next-Gen RAG via Memory-Inspired Knowledge Discovery

llm
research paper
Retrieval-Augmented Generation (RAG) leverages retrieval tools to access external databases, thereby enhancing the generation quality of large language models (LLMs) through…
Sep 11, 2024
Santosh Sawant

GraphRAG auto-tuning provides rapid adaptation to new domains

llm
research paper
GraphRAG uses large language models (LLMs), guided by a set of domain-specific prompts, to create a comprehensive knowledge graph that details entities and their…
Sep 10, 2024
Santosh Sawant

mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding

llm
research paper
Multimodal Large Language Models (MLLMs) have achieved promising OCR-free document understanding performance by increasing the supported resolution of document images.…
Sep 9, 2024
Santosh Sawant

Generative Verifiers: Reward Modeling as Next-Token Prediction

llm
research paper
While large language models (LLMs) demonstrate remarkable capabilities, they often confidently make logical and factual mistakes, which can invalidate the entire solution. A…
Aug 30, 2024
Santosh Sawant

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

llm
research paper
The ability to accurately interpret complex visual information is a crucial topic of multimodal large language models (MLLMs). Recent work indicates that enhanced visual…
Aug 28, 2024
Santosh Sawant

Efficient Detection of Toxic Prompts in Large Language Models

llm
research paper
Large language models (LLMs) like ChatGPT and Gemini have significantly advanced natural language processing. However, these models can be exploited by malicious individuals…
Aug 27, 2024
Santosh Sawant

LLM Pruning and Distillation in Practice: The Minitron Approach

llm
research paper
Over the past few years, significant advancements have blossomed in the two key pillars of multimodal intelligence: understanding and generation. Recent works have tried to…
Aug 26, 2024
Santosh Sawant

Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search

llm
research paper
Recent studies have demonstrated how Large Language Models (LLMs) can be utilized to learn skills for improved decision-making in interactive environments. However, learning…
Aug 23, 2024
Santosh Sawant

LLM Pruning and Distillation in Practice: The Minitron Approach

llm
research paper
Training multiple multi-billion parameter large language models from scratch is extremely time-, data- and resource-intensive. However, recent work has demonstrated the…
Aug 22, 2024
Santosh Sawant

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

llm
research paper
Multi-modal generative models need to be able to perceive, process, and produce both discrete elements (such as text or code) and continuous elements (e.g. image, audio, and…
Aug 21, 2024
Santosh Sawant

BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts

llm
research paper
The Mixture of Experts (MoE) framework has become a popular architecture for large language models due to its superior performance over dense models. However, training MoEs…
Aug 20, 2024
Santosh Sawant

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

llm
research paper
Large Multimodal Models (LMMs) have attracted significant attention with their potential applications and emergent capabilities. However, recent works have demonstrated that…
Aug 19, 2024
Santosh Sawant

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

llm
research paper
Recent advancements in large language models have significantly influenced mathematical reasoning and theorem proving in artificial intelligence. Despite notable progress in…
Aug 16, 2024
Santosh Sawant

rStar: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

llm
research paper
Despite their success, large language models face significant challenges in complex reasoning tasks. Although fine-tuning is shown to be an effective way to improve…
Aug 13, 2024
Santosh Sawant

PAD: Prioritize Alignment in Dataset Distillation

llm
research paper
Dataset Distillation aims to compress a large dataset into a significantly more compact, synthetic one without compromising the performance of the trained models. To achieve…
Aug 12, 2024
Santosh Sawant

CODEXGRAPH: Bridging Large Language Models and Code Repositories via Code Graph Databases

llm
research paper
Large Language Models (LLMs) excel in stand-alone code tasks like HumanEval and MBPP, but struggle with handling entire code repositories. Current solutions rely on…
Aug 9, 2024
Santosh Sawant

Synthesizing Text-to-SQL Data from Weak and Strong LLMs

llm
research paper
Text-to-SQL has been one of the most sought-after use cases in AI application development, especially with closed-source LLMs such as GPT-4. However, the adoption of closed-source LLMs…
Aug 8, 2024
Santosh Sawant

RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation

llm
research paper
Implementing Retrieval-Augmented Generation (RAG) systems is inherently complex, requiring deep understanding of data, use cases, and intricate design decisions.…
Aug 6, 2024
Santosh Sawant

ReLiK: Retrieve and LinK, Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget

llm
research paper
Extracting structured information from unstructured text lies at the core of many Gen AI problems such as Information Retrieval, Knowledge Graph Construction, Knowledge…
Aug 5, 2024
Santosh Sawant

LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

llm
research paper
Standard prompt-based LLM inference has two sequential stages: prefilling and decoding. During the prefilling stage, the model computes and saves the KV cache of each token…
Aug 2, 2024
Santosh Sawant

DiT-MoE: Scaling Diffusion Transformers to 16 Billion Parameters

llm
research paper
Recently, diffusion models (DiT) have emerged as powerful deep generative models in various domains, such as image, video and 3D objects. However, training and serving such…
Aug 1, 2024
Santosh Sawant

DDK: Distilling Domain Knowledge for Efficient Large Language Models

llm
research paper
Despite the advances of large language models (LLMs) in various applications, they still face significant challenges to wider adoption due to high computational and storage…
Jul 31, 2024
Santosh Sawant

Chain of Diagnosis (CoD): Towards an Interpretable Medical Agent

llm
research paper
The field of medical diagnosis has undergone a significant transformation with the advent of large language models (LLMs), yet the challenges of interpretability within…
Jul 26, 2024
Santosh Sawant

LAMBDA: A Large Model Based Data Agent

llm
research paper
Large Language Models (LLMs) have been instrumental in pushing innovation across multiple domains. However, despite these advancements, the current LLM paradigm encounters…
Jul 26, 2024
Santosh Sawant

VILA2: VILA Augmented VILA

llm
research paper
Visual language models (VLMs) have rapidly progressed, driven by the success of large language models (LLMs). However, data curation for VLMs remains under-explored.…
Jul 25, 2024
Santosh Sawant

The Llama 3 Herd of Models

llm
research paper
The Llama 3.1 release marked a big milestone for LLM researchers and the open source AI community. Meta engineers trained Llama 3.1 on NVIDIA H100 Tensor Core GPUs. They…
Jul 24, 2024
Santosh Sawant

BOND: Aligning LLMs with Best-of-N Distillation

llm
research paper
State-of-the-art large language models (LLMs) such as Gemini and GPT-4 are generally trained in three stages. First, LLMs are pre-trained on large corpora of knowledge using…
Jul 23, 2024
Santosh Sawant

Q-Sparse: All Large Language Models can be Fully Sparsely-Activated

llm
research paper
Recently, considerable research has gone into reducing the high computational cost and memory footprint of LLMs, especially during the inference stage. Sparsity is…
Jul 22, 2024
Santosh Sawant

Beyond KV Caching: Shared Attention for Efficient LLMs

llm
research paper
The efficiency of large language models (LLMs) remains a critical challenge, particularly in contexts where computational resources are limited. Traditional attention…
Jul 19, 2024
Santosh Sawant

E5-V: Universal Embeddings with Multimodal Large Language Models

llm
research paper
With the development of Multimodal Large Language Models (MLLMs), there is an increasing need for embedding models to represent multimodal inputs. Although CLIP shows…
Jul 17, 2024
Santosh Sawant

NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

llm
research paper
The capability of LLMs to process long texts is particularly crucial across various domains. Considering the critical role of LLMs in handling long texts, numerous…
Jul 17, 2024
Santosh Sawant

SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning

llm
research paper
Parameter-efficient transfer learning (PETL) is widely used for domain adaptation of large pre-trained models to specific downstream tasks, greatly reducing trainable…
Jul 16, 2024
Santosh Sawant

AgentInstruct: Toward Generative Teaching with Agentic Flows

llm
research paper
Synthetic data is becoming increasingly important for accelerating the development of language models, both large and small. Despite several successful use cases…
Jul 15, 2024
Santosh Sawant

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

llm
research paper
FlashAttention (and FlashAttention-2) pioneered an approach to speed up attention on GPUs by minimizing memory reads/writes, and is used across various libs to accelerate…
Jul 12, 2024
Santosh Sawant

Metron: Holistic Performance Evaluation Framework for LLM Inference Systems

llm
research paper
Serving large language models (LLMs) in production can incur substantial costs, which has prompted recent advances in inference system optimizations. Today, these systems…
Jul 11, 2024
Santosh Sawant

Composable Interventions for Language Models

llm
research paper
Language models (LMs) exhibit striking capabilities on various important tasks, but despite such high performance, LM-generated content is usually prone to be…
Jul 10, 2024
Santosh Sawant

Associative Recurrent Memory Transformer

llm
research paper
Long-sequence LLMs are among the more challenging models to work with, as memory plays a crucial role in processing extremely long contexts and utilizing remote past information.…
Jul 9, 2024
Santosh Sawant

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

llm
research paper
Key-value (KV) caching plays an essential role in accelerating decoding for transformer-based autoregressive large language models (LLMs). However, the amount of memory…
Jul 8, 2024
Santosh Sawant

Adam-mini: Use Fewer Learning Rates To Gain More

llm
research paper
Adam(W) has become the de-facto optimizer for training large language models (LLMs). Despite its superior performance, Adam is expensive to use. Specifically, Adam requires…
Jul 5, 2024
Santosh Sawant

Searching for Best Practices in Retrieval-Augmented Generation

llm
research paper
Retrieval-augmented generation (RAG) techniques have proven to be effective in enhancing LLM response quality, particularly in specialized domains. While many RAG…
Jul 4, 2024
Santosh Sawant

MInference: a Million-token inference on a single A100 machine

llm
research paper
The computational challenges of LLM inference remain a significant barrier to their widespread deployment, especially as context lengths continue to increase. Existing…
Jul 3, 2024
Santosh Sawant

MIRAI: Evaluating LLM Agents for Event Forecasting

llm
research paper
Recent advancements in Large Language Models (LLMs) have enabled LLM agents to autonomously collect world information, over which to conduct reasoning to solve complex…
Jul 2, 2024
Santosh Sawant

AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation

llm
research paper
Retrieval-Augmented Generation (RAG) has emerged as a prominent framework for building ML/AI solutions with LLMs. Additional modules such as query rewriting, prompt…
Jul 1, 2024
Santosh Sawant

Meta Large Language Model Compiler: Foundation Models of Compiler Optimization

llm
research paper
Large Language Models (LLMs) have demonstrated remarkable capabilities across a variety of software engineering and coding tasks. However, their application in the domain of…
Jun 28, 2024
Santosh Sawant

Instruction Pre-Training: Language Models are Supervised Multi Task Learners

llm
research paper
Unsupervised multitask pre-training has been the critical method behind the recent success of language models (LMs). However, supervised multitask learning still holds…
Jun 26, 2024
Santosh Sawant

LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs

llm
research paper
In the traditional RAG framework, the basic retrieval units are normally short but the retriever needs to scan over a massive amount of units to find the relevant piece.…
Jun 24, 2024
Santosh Sawant

Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities

llm
research paper
Large language models have shown promising results in arithmetic and symbolic reasoning by expressing intermediate reasoning in text as a chain of thought, yet struggle to…
Jun 20, 2024
Santosh Sawant

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

llm
research paper
Almost all of today’s LLMs are designed as monolithic architectures; these models rely extensively on large-scale data to embed generalized language capabilities…
Jun 20, 2024
Santosh Sawant

THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation

llm
research paper
Software agents have emerged as promising tools for addressing complex software engineering tasks. However, existing works oversimplify software development workflows by…
Jun 19, 2024
Santosh Sawant

THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation

llm
research paper
Nowadays, large language models (LLMs) with large context windows are capable of processing lengthy dialogue histories during prolonged interaction with users without…
Jun 18, 2024
Santosh Sawant

Ad Auctions for LLMs via Retrieval Augmented Generation

llm
research paper
Large language models (LLMs) have been making headway in various domains and now also in the field of computational advertising. Now with the integration of ads into the…
Jun 17, 2024
Santosh Sawant

Improving Alignment and Robustness with Circuit Breakers

llm
research paper
Large language models (LLMs) have been instrumental in pushing the boundaries of various real-world applications, most of which are associated with long-sequence inputs, such…
Jun 14, 2024
Santosh Sawant

Improving Alignment and Robustness with Circuit Breakers

llm
research paper
The landscape of artificial intelligence (AI) has long been marred by the persistent threat of adversarial attacks, particularly those targeting neural networks. The rise of…
Jun 13, 2024
Santosh Sawant

TEXTGRAD: Automatic “Differentiation” via Text

llm
research paper
There is an emerging paradigm shift in how AI systems are built these days. The new generation of AI applications are increasingly compound systems involving multiple…
Jun 12, 2024
Santosh Sawant

HUSKY: A Unified, Open-Source Language Agent for Multi-Step Reasoning

llm
research paper
Recent advances in the capabilities of large language models (LLMs) have led to the development of language agents to address complex, multi-step tasks. However, most…
Jun 11, 2024
Santosh Sawant

Mixture-of-Agents: Enhances Large Language Model Capabilities

llm
research paper
Recent advances in large language models (LLMs) demonstrate substantial capabilities in natural language understanding and generation tasks. However, despite the plethora of…
Jun 10, 2024
Santosh Sawant

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

llm
research paper
Recently, various prompting methods such as CoT, ToT and GoT have been instrumental in improving reasoning performance of LLMs. All these methods can be broadly divided into…
Jun 7, 2024
Santosh Sawant

Block Transformer: Global-to-Local Language Modeling for Fast Inference

llm
research paper
Generating tokens with transformer-based autoregressive language models (LMs) is costly due to the self-attention mechanism that attends to all previous tokens. To apply…
Jun 6, 2024
Santosh Sawant

Show, Don’t Tell: Aligning Language Models with Demonstrated Feedback

llm
research paper
Aligning Large Language Models (LLMs) with human values and preferences is essential for making them helpful and safe. However, alignment can be challenging, especially for…
Jun 5, 2024
Santosh Sawant

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

llm
research paper
In the age of large-scale language models, benchmarks like the Massive Multitask Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI can…
Jun 4, 2024
Santosh Sawant

Contextual Position Encoding: Learning to Count What’s Important

llm
research paper
The attention mechanism is a critical component of Large Language Models (LLMs) that allows tokens in a sequence to interact with each other, but the attention mechanism…
Jun 3, 2024
Santosh Sawant

Similarity is Not All You Need: Endowing Retrieval-Augmented Generation with Multi-layered Thoughts

llm
research paper
Retrieval-augmented generation (RAG) has been pivotal in pushing LLM use cases in knowledge management systems. Nevertheless, existing retrieval-augmented generation…
May 31, 2024
Santosh Sawant

Nearest Neighbor Speculative Decoding for LLM Generation and Attribution

llm
research paper
Large language models (LLMs) often hallucinate and lack the ability to provide attribution for their generations. Semi-parametric LMs, such as kNN-LM, approach these…
May 30, 2024
Santosh Sawant

VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections

llm
research paper
LLM training and fine-tuning are still far too compute- and memory-intensive. Several techniques have been proposed to reduce these memory requirements, such as…
May 29, 2024
Santosh Sawant

Zamba: A Compact 7B SSM Hybrid Model

llm
research paper
Recently, state-of-the-art Transformer-SSM hybrid architectures have been a driving force in open-source LLMs. In line with this trend, researchers from Zyphra have launched…
May 28, 2024
Santosh Sawant

Layer-Condensed KV Cache for Efficient Inference of Large Language Models

llm
research paper
The key-value (KV) cache is one of the most significant parts of any transformer-based LLM and takes over 30% of the GPU memory during deployment. Hence KV cache plays a…
May 20, 2024
Santosh Sawant

Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model

llm
research paper
Recently, the performance of small-scale visual language models has come on par with that of their larger-scale counterparts. Models such as LLaVAPhi [47], which combines the open source…
May 16, 2024
Santosh Sawant

SUTRA: Scalable Multilingual Language Model Architecture

llm
research paper
Recent advancements in Large Language Models (LLMs) have predominantly focused on a limited set of data-rich languages, with training datasets being notably skewed towards…
May 15, 2024
Santosh Sawant

Linearizing Large Language Models

llm
research paper
Over the last few years, Transformers have displaced Recurrent Neural Networks (RNNs) in sequence modeling tasks, owing to their highly parallel training efficiency and…
May 14, 2024
Santosh Sawant

From Local to Global: A Graph RAG Approach to Query-Focused Summarization

llm
research paper
The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions…
May 13, 2024
Santosh Sawant

Is Flash Attention Stable?

llm
research paper
Given the size and complexity of workloads, training Large Language Models (LLMs) often takes months, across hundreds or thousands of GPUs. For example, LLaMA2’s…
May 10, 2024
Santosh Sawant

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding

llm
research paper
Optimizing the operational cost and compute requirements of LLMs is one of the most sought-after topics for researchers. Accelerated solutions deploy on mobile, edge devices or commodity…
May 9, 2024
Santosh Sawant

Better & Faster Large Language Models via Multi-token Prediction

llm
research paper
Large language models such as GPT and Llama are trained with a next-token prediction loss. However, despite the recent wave of impressive achievements in LLMs…
May 8, 2024
Santosh Sawant

NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment

llm
research paper
Aligning Large Language Models (LLMs) with human values and preferences is essential for making them helpful and safe. However, building efficient tools to perform alignment…
May 7, 2024
Santosh Sawant

PROMETHEUS 2: An Open Source Language Model Specialized in Evaluating Other Language Models

llm
research paper
Model-based evaluation using proprietary LMs such as GPT-4 has emerged as a scalable solution for assessing LM-generated text. However, concerns related to transparency…
May 3, 2024
Santosh Sawant

Octopus v4: Graph of language models

llm
research paper
LLMs have been effective in a wide range of applications, yet the most sophisticated models are often proprietary (GPT-4, Gemini) and considerably costlier than open source…
May 2, 2024
Santosh Sawant

Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models

llm
research paper
Evaluating language models is a challenging task: not only is it difficult to find meaningful data to test the models, but evaluating the correctness of a generated response…
Apr 30, 2024
Santosh Sawant

Make Your LLM Fully Utilize the Context

llm
research paper
These days the training context windows of many contemporary LLMs have been expanded to tens of thousands of tokens, thereby enabling these models to process extensive…
Apr 29, 2024
Santosh Sawant

CodecLM: Aligning Language Models with Tailored Synthetic Data

llm
research paper
Recent progress in instruction-tuned LLMs highlights the critical role of high-quality data in enhancing LLMs’ instruction-following capabilities. However, acquiring such…
Apr 25, 2024
Santosh Sawant

LLM-R2 : A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency

llm
research paper
Recently, DB query rewriting using LLMs has been one of the most sought-after use cases. The aim of query rewrite is to output a new query equivalent to the original SQL query, while…
Apr 24, 2024
Santosh Sawant

TransformerFAM: Feedback attention is working memory

llm
research paper
While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs. One of the widely used…
Apr 19, 2024
Santosh Sawant

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

llm
research paper
Recently Google has released RecurrentGemma, an open language model which uses Google’s novel Griffin architecture. Griffin combines linear RNN with local attention to…
Apr 18, 2024
Santosh Sawant

MEGALODON: Efficient LLM Pretraining and Inference with Unlimited Context Length

llm
research paper
The Transformer architecture is the backbone of production LLMs, but despite its remarkable capabilities, it faces challenges with quadratic computational complexity and…
Apr 17, 2024
Santosh Sawant

Trust Region Direct Preference Optimization (TR-DPO) : Learn Your Reference Model for Real Good Alignment

llm
research paper
Aligning large language models with human preferences (RLHF) has become increasingly important to ensure safety and overall usefulness of the model. Traditionally, the…
Apr 16, 2024
Santosh Sawant

RHO-1: Not All Tokens Are What You Need

llm
research paper
High-quality training datasets are crucial for boosting LLM performance. Various data filtering techniques such as heuristics and classifiers are being utilized to select such…
Apr 15, 2024
Santosh Sawant

Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

llm
research paper
Fine-tuning LLMs using Reinforcement Learning from Human Feedback (RLHF) has always been a preferred way of making LLMs more useful by aligning them with human values or…
Apr 9, 2024
Santosh Sawant

Stream of Search (SoS): Learning to Search in Language

llm
research paper
Transformer-based auto-regressive models such as GPT have shown remarkable performance in generative tasks but struggle when it comes to complex decision-making and…
Apr 8, 2024
Santosh Sawant

ReFT: Representation Finetuning for Language Models

llm
research paper
Parameter-efficient finetuning (PEFT) methods have been instrumental in rapid adoption of fine tuned domain specific LLMs. PEFTs not only reduced memory usage and time…
Apr 5, 2024
Santosh Sawant

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

llm
research paper
The Transformer FLOPs equation, or FLOPs-per-token, is one of the key attributes in determining the computation budget for any transformer-based LLM. Usually in language models…
Apr 4, 2024
Santosh Sawant

sDPO: Don’t Use Your Data All at Once

llm
research paper
As development of large language models (LLM) progresses, aligning them with human preferences has become increasingly important to ensure safety and usefulness of the…
Apr 3, 2024
Santosh Sawant

Gecko: Versatile Text Embeddings Distilled from Large Language Models

llm
research paper
Recent advances in text embedding models have been instrumental for various downstream tasks including document retrieval, sentence similarity, classification, and…
Apr 2, 2024
Santosh Sawant

Jamba: A Hybrid Transformer-Mamba Language Model

llm
research paper
Finally, the first production-grade commercially available Mamba-based model delivering best-in-class quality and performance is here. Introducing Jamba, a novel…
Apr 1, 2024
Santosh Sawant

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

llm
research paper
LLM empowered multi-modality inputs are becoming an essential part of Vision Language Models (VLMs) such as LLaVA and Otter. However, despite these advancements, a…
Mar 28, 2024
Santosh Sawant

RigorLLM: Resilient Guardrails for large language models against undesired content

llm
research paper
Large language models (LLMs) have demonstrated impressive capabilities in NLG and different downstream tasks. However, the potential of LLMs to produce biased or harmful…
Mar 27, 2024
Santosh Sawant

DRAGIN: Dynamic Retrieval Augmented Generation based on the Real-time Information Needs of Large Language Models

llm
research paper
Traditional methods of RAG typically rely on single-round retrieval, using the LLM’s initial input to retrieve relevant information from external corpora. While this method…
Mar 26, 2024
Santosh Sawant

SiMBA: Simplified Mamba-based Architecture for Vision and Multivariate Time series

llm
research paper
Recently, Structured State Space models (SSMs) such as Mamba have been pitched as an alternative to Transformer-based models, especially when it comes to increasing efficiency and…
Mar 25, 2024
Santosh Sawant

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

llm
research paper
Vision language models (VLMs) like GPT-4, LLaMA-Adapter, and LLaVA have been instrumental in augmenting LLMs with visual understanding capabilities. VLMs serve as…
Mar 22, 2024
Santosh Sawant

Evolutionary Optimization of Model Merging Recipes

llm
research paper
Model merging offers a novel approach to leverage the strengths of multiple pre-trained models. It allows us to combine task-specific models, each potentially fine-tuned for…
Mar 21, 2024
Santosh Sawant

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

llm
research paper
Structure information is critical for understanding the semantics of text-rich images, such as documents, tables, and charts. Most existing Multimodal Large Language…
Mar 20, 2024
Santosh Sawant

PERL: Parameter Efficient Reinforcement Learning from Human Feedback

llm
research paper
Reinforcement Learning from Human Feedback (RLHF) is one of the most popular methods to align Pretrained Large Language Models (LLMs) with human preferences. It involves…
Mar 19, 2024
Santosh Sawant

RAFT: Adapting Language Model to Domain Specific RAG

llm
research paper
Adapting LLMs to specialized domains, which is essential to many emerging applications, usually takes two paths: in-context learning through Retrieval-Augmented…
Mar 18, 2024
Santosh Sawant

USER-LLM: Efficient LLM Contextualization with User Embeddings

llm
research paper
Large language models (LLMs) have revolutionized the field of user modeling and personalization due to their ability to learn and adapt from massive amounts of textual data.…
Mar 15, 2024
Santosh Sawant

RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation

llm
research paper
Factual correctness has been one of the growing concerns around LLMs' reasoning capabilities. This issue becomes more significant when it comes to zero-shot CoT (Chain of…
Mar 14, 2024
Santosh Sawant

MoAI: Mixture of All Intelligence for Large Language and Vision Models

llm
research paper
Following the success of the instruction-tuned LLMs, several visual instruction tuning datasets have been meticulously curated to enhance zero-shot vision language (VL)…
Mar 13, 2024
Santosh Sawant

VideoMamba: State Space Model for Efficient Video Understanding

llm
research paper
Mastering spatiotemporal representation is one of the key areas in any video understanding task. However, there are usually two challenges associated with it: (1) the large…
Mar 12, 2024
Santosh Sawant

Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models

llm
research paper
Adapter-based fine-tuning methods, such as LoRA, are key to making large language models disruptive in various domain specific applications. LoRA introduces a limited number…
Mar 11, 2024
Santosh Sawant

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

llm
research paper
Training Large Language Models (LLMs) is challenging due to memory constraints from weight and optimizer size. Low-rank adaptation (LoRA) addresses this by adding trainable…
Mar 8, 2024
Santosh Sawant

InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

llm
research paper

Mar 7, 2024
Santosh Sawant

Design2Code: How Far Are We From Automating Front-End Engineering?

llm
research paper
Recent releases of advanced multimodal LLMs such as GPT-4V and Gemini Pro have led to breakthroughs in visual understanding and code generation. This has opened up…
Mar 6, 2024
Santosh Sawant

DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model

llm
research paper
In recent years, text-to-image (T2I) generation models such as DreamBooth and BLIP-Diffusion have rapidly evolved, generating intricate and highly detailed images that often…
Mar 5, 2024
Santosh Sawant

VisionLLaMA : A Unified LLaMA Interface for Vision Tasks

llm
research paper
Large language models, especially the LLaMA family of models, have aroused great interest in the research community for multimodal applications, where many methods heavily…
Mar 4, 2024
Santosh Sawant

Beyond Language Models: Byte Models are Digital World Simulators

llm
research paper
Bytes are the foundation of all digital data, devices, and software, from computer processors to operating systems in everyday electronics. Therefore, training models for…
Mar 1, 2024
Santosh Sawant

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

llm
research paper
Large Language Models (LLMs) have demonstrated remarkable performance in a wide range of natural language processing tasks, but their increasing size has posed challenges…
Feb 29, 2024
Santosh Sawant

ChunkLlama : Training-Free Long-Context Scaling of Large Language Models

llm
research paper
The ability to comprehend and process long-context information is essential for large language models (LLMs) to cater to a wide range of applications effectively. Finetuning…
Feb 28, 2024
Santosh Sawant

MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

llm
research paper
MobiLlama is another Small Language Model (SLM) for resource-constrained devices. MobiLlama is an SLM design that initiates from a larger model and applies a careful…
Feb 27, 2024
Santosh Sawant

ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition

llm
research paper
Self-attention, one of the critical components in LLMs, performs poorly during inference since it performs intensive memory operations on key/value tensors of context…
Feb 26, 2024
Santosh Sawant

TinyLLaVA: A Framework of Small-scale Large Multimodal Models

llm
research paper
Large language models (LLMs) with large model size can greatly improve task performance but demand expensive computational resources for training. To address this, the LLM…
Feb 23, 2024
Santosh Sawant

The FinBen: An Holistic Financial Benchmark for Large Language Models

llm
research paper
Recent studies have shown the great potential of advanced LLMs such as GPT-4 on financial text analysis and prediction tasks in the financial domain. While their potential…
Feb 22, 2024
Santosh Sawant

GRIT : Generative Representational Instruction Tuning

llm
research paper
All text-based language problems can be reduced to either generation or embedding. Creating a single general model that performs such a wide range of tasks has been a…
Feb 16, 2024
Santosh Sawant

Aespa: Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers

llm
research paper
With the increasing complexity of generative AI models, post-training quantization (PTQ) has emerged as a promising solution for deploying hyper-scale models on edge devices…
Feb 15, 2024
Santosh Sawant

Graph Mamba: Towards Learning on Graphs with State Space Models

llm
research paper
Graph Transformers (GTs) have shown promising potential in graph representation learning. GTs, however, have quadratic computational cost, lack inductive biases on graph…
Feb 14, 2024
Santosh Sawant

Fiddler: CPU-GPU Orchestration for Fast Local Inference of MoE Models

llm
research paper
Large Language Models (LLMs) based on Mixture-of-Experts (MoE) architectures are showing remarkable performance on various tasks. By activating a subset of experts inside…
Feb 13, 2024
Santosh Sawant

PHATGOOSE: Learning to Route Among Specialized Experts for Zero-Shot Generalization

llm
research paper
The availability of Hugging Face PEFT modules has made it cheap and easy to modularly adapt a given pre-trained model to a specific task or domain. In the meantime, extremely…
Feb 12, 2024
Santosh Sawant

Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains

llm
research paper
General-purpose LLMs like LLaMA and GPT-4 have demonstrated remarkable proficiency in understanding and generating natural language. However, their capabilities wane in…
Feb 9, 2024
Santosh Sawant

Hydragen: High-Throughput LLM Inference with Shared Prefixes

llm
research paper
Transformer-based large language models (LLMs) such as OpenAI GPT3.5 and GPT4 are now deployed to hundreds of millions of users. LLM inference in such scenarios commonly…
Feb 8, 2024
Santosh Sawant

MambaFormer: Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

llm
research paper
State-space models (SSMs), such as Mamba, have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and…
Feb 7, 2024
Santosh Sawant

BlackMamba: Mixture of Experts for State-Space Models

llm
research paper
State-space models (SSMs) have recently demonstrated competitive performance to transformers at large-scale language modeling benchmarks while achieving linear time and…
Feb 6, 2024
Santosh Sawant

Repeat After Me: Transformers are Better than State Space Models at Copying

llm
research paper
Feb 5, 2024
Santosh Sawant

Re3val: Reinforced and Reranked Generative Retrieval

llm
research paper
The primary objective of retrieval models is to enhance the accuracy of answers by selecting the most relevant documents retrieved for a given query, ensuring models have…
Feb 2, 2024
Santosh Sawant

FIND: INterface for Foundation models’ embeDDings

llm
research paper
Foundation models across the vision and language domains, such as GPT-4, DALL-E 3, SAM and LLaMA, have demonstrated significant advancements in addressing open-ended…
Feb 1, 2024
Santosh Sawant

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

llm
research paper
Vision Language Models (VLMs), such as OpenAI’s GPT-4, Flamingo, BLIP-2 and LLaVA have demonstrated significant advancements in addressing open-ended visual…
Jan 31, 2024
Santosh Sawant

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

llm
research paper
For Large Vision-Language Models (LVLMs), scaling the model can effectively improve performance. However, expanding model parameters significantly increases the training and…
Jan 30, 2024
Santosh Sawant

EAGLE: Extrapolation Algorithm for Greater Language-model Efficiency

llm
research paper
Auto-regressive decoding has become the de facto standard for large language models (LLMs). This process generates output tokens one at a time, which makes the generation by…
Jan 29, 2024
Santosh Sawant

MambaByte: Token-free Selective State Space Model

llm
research paper
In December 2023, the “Mamba: Linear-Time Sequence Modeling with Selective State Spaces” paper was released, and with it came the whole discussion about Mamba (SSM) being a viable…
Jan 25, 2024
Santosh Sawant

Instruction-Tune Llama2 with TRL

hugging face
llm
model building
This blog post is an extended guide on instruction-tuning Llama 2 from Meta AI. The idea of the blog post is to focus on creating the instruction dataset, which we can then…
Jan 25, 2024
Santosh Sawant

Towards Conversational Diagnostic AI

llm
research paper
With the Med-PaLM series of LLMs, Google is one of the few companies that can claim expertise in building medical domain specific LLMs. The latest addition has been AMIE…
Jan 24, 2024
Santosh Sawant

ChatQA: Building GPT-4 Level Conversational QA Models

llm
research paper
With all open-source LLMs trying to outperform GPT-4, one may wonder which one has truly been successful in conversational QA, one of the elementary use cases of LLMs.
Jan 23, 2024
Santosh Sawant

How to Fine-Tune LLMs with TRL

hugging face
llm
model building
Large Language Models or LLMs have seen a lot of progress in the last year. We went from no ChatGPT competitor to a whole zoo of LLMs, including Meta AI’s Llama 2, Mistrals …
Jan 23, 2024
Santosh Sawant

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

llm
research paper
 
Jan 22, 2024
Santosh Sawant

Merge Model using Mergekit

tools
llm
model building
Model merging is a technique that combines two or more LLMs into a single model. It’s a relatively new and experimental method to create new models for cheap (no GPU…
Jan 22, 2024
Santosh Sawant

Tuning Language Models by Proxy

llm
research paper
These days, the capabilities of large pretrained LLMs can be significantly enhanced for specific domains or tasks of interest through additional fine-tuning. However, tuning these…
Jan 19, 2024
Santosh Sawant

DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference

llm
research paper
Recently Microsoft DeepSpeed launched DeepSpeed-FastGen LLM serving framework, which offers up to 2.3x higher effective throughput compared to state-of-the-art systems like…
Jan 18, 2024
Santosh Sawant

Self-Evaluation Improves Selective Generation in Large Language Models

llm
research paper
The trustworthiness of LLM output is an important consideration for the safe deployment of LLMs in production. One straightforward way to assess it is by measuring…
Jan 17, 2024
Santosh Sawant

Self-RAG: Learning to Retrieve, Generate and Critique through Self-Reflections

llm
research paper
Self-RAG is a new framework to train an arbitrary LM to learn to retrieve, generate, and critique to enhance the factuality and quality of generations, without hurting the…
Jan 16, 2024
Santosh Sawant

Reciprocal Rank Fusion (RRF) with LambdaMART: Context Tuning for Retrieval Augmented Generation (RAG)

llm
research paper
RAG typically consists of three primary components: Tool Retrieval, Plan Generation, and Execution. Existing RAG methodologies rely heavily on semantic search for tool…
Jan 15, 2024
Santosh Sawant

Chain of Thought (CoT): The Impact of Reasoning Step Length on Large Language Models

llm
research paper
If you are doing prompt engineering for LLMs then you might have come across Chain of Thought (CoT) prompting, which is significant in improving the reasoning abilities of…
Jan 12, 2024
Santosh Sawant

Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache

llm
research paper
Introducing DistAttention, a distributed attention algorithm, and DistKV-LLM, a distributed LLM serving system, to improve the performance and resource management of…
Jan 11, 2024
Santosh Sawant

Soaring from 4K to 400K: Extending LLM’s Context with Activation Beacon

llm
research paper
Activation Beacon is a plug-and-play module for large language models that allows them to process longer contexts with a limited context window, while preserving their…
Jan 10, 2024
Santosh Sawant

Improving Text Embeddings with Large Language Models using fine-tuned Mistral-7B LLM

llm
research paper
Check out a groundbreaking paper on improving text embeddings with large language models (LLMs) like GPT-4! The authors propose generating synthetic training data for text…
Jan 9, 2024
Santosh Sawant

DocLLM: A Layout-Aware Generative Language Model for Multimodal Document Understanding

llm
research paper
Introducing DocLLM, a groundbreaking generative language model that can understand visually rich documents without the need for expensive image encoders. DocLLM uses a…
Jan 8, 2024
Santosh Sawant

Self-Play Fine-Tuning (SPIN): Converts Weak Language Models to Strong Language Models

llm
research paper
Self-Play Fine-Tuning (SPIN) is a new fine-tuning method to improve large language models (LLMs) without needing additional human-annotated data.
Jan 5, 2024
Santosh Sawant

A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models

llm
research paper
The paper provides a comprehensive taxonomy categorizing over 32 techniques for mitigating hallucinations in large language models (LLMs). It groups the techniques into…
Jan 4, 2024
Santosh Sawant

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

llm
research paper
With only four lines of code modification, the proposed method can effortlessly extend existing LLMs’ context window without any fine-tuning. This work elicits LLMs’…
Jan 3, 2024
Santosh Sawant

Mamba-Chat: A Chat LLM based on State Space Models

llm
research paper
Mamba-Chat is the first chat language model based on a state-space model architecture, not a transformer.
Jan 2, 2024
Santosh Sawant

KwaiAgents: Generalized Information-seeking Agent System with LLMs - 2 Open-source models fine tuned for agent systems! Better than GPT-3.5 turbo as an agent!

llm
research paper
Driven by curiosity, humans have continually sought to explore and understand the world around them, leading to the invention of various tools to satiate this…
Jan 1, 2024
Santosh Sawant