articles

Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance

llm

research paper

Hybrid state-space models (SSMs) like Jamba, Samba, Zamba, and Hymba combine the strengths of two different architectures. They merge attention mechanisms, which are great…

BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs

llm

research paper

Recently, pioneering work such as BitNet b1.58 has demonstrated that 1.58-bit LLMs can match full-precision performance while drastically reducing inference costs (latency, memory…
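The "1.58-bit" figure comes from ternary weights: each weight takes one of three values, and log2(3) ≈ 1.58 bits. Below is a minimal sketch of absmean ternary quantization as described in the BitNet b1.58 paper. It assumes a single per-tensor scale and uses plain Python lists in place of tensors. The weight values are made up for illustration.

```python
import math

def absmean_ternary(weights, eps=1e-5):
    """Quantize a flat list of weights to {-1, 0, +1} with one per-tensor scale.

    Sketch of the absmean recipe: divide by the mean absolute weight,
    then round and clip to the ternary range [-1, 1].
    """
    gamma = sum(abs(w) for w in weights) / len(weights)  # per-tensor scale
    q = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return q, gamma

# Illustrative weights, not taken from any real model
weights = [0.42, -0.07, 0.9, -1.3, 0.01, 0.3]
q, gamma = absmean_ternary(weights)
print(q)                                       # ternary codes
print([gamma * v for v in q])                  # approximate dequantized weights
print(f"bits per weight: {math.log2(3):.2f}")  # log2(3) ~ 1.58
```

Python's `round` rounds halves to even. This only matters for weights that sit exactly at ±0.5 times the scale.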

Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback

llm

research paper

Reinforcement Learning from Human Feedback (RLHF) is crucial for aligning LLMs with human preferences. While recent research has focused on algorithmic improvements, the…

DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails

llm

research paper

The rapid advancement of large language models (LLMs) has increased the need for guardrail models to ensure responsible use, particularly in detecting unsafe and illegal…

DeepSeek-R1: Incentivizing Reasoning Capability in Large Language Models via Reinforcement Learning

llm

research paper

A typical training process for LLMs consists of three phases: (1) Pre-training: In this stage, LLMs are pre-trained on vast amounts of text and code to learn general-purpose…

Mind Evolution: Evolving Deeper LLM Thinking

llm

research paper

Recently, Google released Mind Evolution, an evolutionary search strategy for scaling inference-time compute in large language models. Mind Evolution uses a language model to…

MiniMax-01: Scaling Foundation Models with Lightning Attention

llm

research paper

Recently, long-context LLMs have been pivotal to the further advancement of generative AI in various fields. Now researchers have introduced the MiniMax-01 series of long-context…

Training Large Language Models to Reason in a Continuous Latent Space

llm

research paper

Large language models (LLMs) are restricted to reasoning in the “language space”, where they typically express the reasoning process with a chain of thought (CoT) to solve a…

LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment

llm

research paper

Recent advancements in text-to-video (T2V) generative models have shown impressive capabilities. However, these models are still inadequate in aligning synthesized videos…

PaliGemma 2: A Family of Versatile VLMs for Transfer

llm

research paper

Google has released the PaliGemma 2 family of models, an upgrade of the PaliGemma open Vision-Language Model (VLM) based on the Gemma 2 family of language models. PaliGemma…

OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation

llm

research paper

Retrieval-augmented Generation (RAG) enhances Large Language Models (LLMs) by integrating external knowledge to reduce hallucinations and incorporate up-to-date information…

Efficient Track Anything

llm

research paper

Segment Anything Model 2 (SAM 2) has emerged as a powerful tool for video object segmentation and tracking anything. Key components of SAM 2 that drive the impressive video…

LongKey: Keyphrase Extraction for Long Documents

llm

research paper

In an era of information overload, manually annotating the vast and growing corpus of documents and scholarly papers is increasingly impractical. Automated keyphrase…

VisualLens: Personalization through Visual History

llm

research paper

Recent Large Language Models (LLMs) can support contexts up to millions of tokens in length. However, processing such long sequences with LLMs requires substantial…

VisualLens: Personalization through Visual History

llm

research paper

Imagine a personal assistant observing what you do in your daily life. When you ask for recommendations on anything from restaurants and activities to movies, books, and…

A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection

llm

research paper

Large Language Models (LLMs) are prone to off-topic misuse, where users may prompt these models to perform tasks beyond their intended scope. Current guardrails, which often…

IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization

llm

research paper

Recently, the ability to follow complex instructions with multiple constraints is gaining increasing attention as LLMs are deployed in sophisticated real-world applications.…

SEALONG: Large Language Models Can Self-Improve in Long-context Reasoning

llm

research paper

Large language models (LLMs) have achieved substantial progress in processing long contexts but still struggle with long-context reasoning. Existing approaches typically…

JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

llm

research paper

Recently, there has been a growing trend of developing sophisticated LLMs specialized in both image comprehension and text-to-image generation. This is achieved…

NEKO: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts

llm

research paper

The challenge in building a general-purpose post-recognition error corrector, which is required to evaluate your fine-tuned models on a custom dataset, is how to train a…

BitNet a4.8: 4-bit Activations for 1-bit LLMs

llm

research paper

Recent research on 1-bit Large Language Models (LLMs), such as BitNet b1.58, presents a promising direction for reducing the inference cost of LLMs while maintaining…

StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-Time Hybrid Information Structurization

llm

research paper

Retrieval-augmented generation (RAG) is a key means to effectively enhance large language models (LLMs) in many knowledge-based tasks. However, existing RAG methods struggle…

Backtracking Improves Generation Safety

llm

research paper

LLMs have a fundamental limitation almost by definition: there is no taking back tokens that have been generated, even when they are clearly problematic. In the context of…

RULER: A Model-Agnostic Method to Control Generated Length for Large Language Models

llm

research paper

The instruction-following ability of large language models enables humans to interact with AI agents in a natural way. However, when required to generate responses of a…

Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking

llm

research paper

Recently, LLM-judge benchmarks such as MT-Bench, AlpacaEval, and Arena-Hard-Auto have become a go-to tool to simultaneously automate evaluation of LLMs while also aligning…

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models

llm

research paper

Large Language Models (LLMs) have demonstrated remarkable effectiveness across a diverse range of tasks. However, LLMs are usually distinguished by their massive parameter…

HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale

llm

research paper

Large Language Models (LLMs) have revolutionized software engineering (SE), demonstrating remarkable capabilities in various coding tasks. While recent efforts have produced…

Making Text Embedders Few-Shot Learners

llm

research paper

LLM-based embedding models have demonstrated remarkable improvements in in-domain accuracy and generalization, particularly when trained using supervised learning approaches…

Introducing Contextual Retrieval

llm

research paper

In traditional RAG, documents are typically split into smaller chunks for efficient retrieval. While this approach works well for many applications, it can lead to problems…
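A naive chunker makes the problem easy to see, because each chunk loses the document-level context around it. The sketch below is minimal and assumes whitespace tokenization and fixed-size overlapping windows. `contextualize` is a hypothetical helper. It prepends a hand-written context string, standing in for the model-generated, chunk-specific context that contextual retrieval adds before indexing.

```python
def chunk_words(text, chunk_size=8, overlap=2):
    """Split text into overlapping windows of whitespace-delimited words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # last window reached the end
            break
    return chunks

def contextualize(chunk, context):
    """Hypothetical helper: prepend document-level context to a chunk.

    In contextual retrieval an LLM writes this context per chunk; here it is
    just a fixed string supplied by the caller.
    """
    return f"{context}\n{chunk}"

# Toy document of 20 placeholder words
doc = " ".join(f"w{i}" for i in range(20))
chunks = chunk_words(doc, chunk_size=8, overlap=2)
indexed = [contextualize(c, "Source: hypothetical Q2 filing, revenue section") for c in chunks]
for entry in indexed:
    print(entry)
```

Without the prepended line, a chunk such as `w12 ... w19` carries no hint of which document or section it came from. That missing context is what hurts retrieval.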

Training Language Models to Self-Correct via Reinforcement Learning

llm

research paper

Self-correction is a highly desirable capability of large language models (LLMs), yet it has consistently been found to be largely ineffective in modern LLMs. Existing…

Training Language Models to Self-Correct via Reinforcement Learning

llm

research paper

Recently, jina.ai released jina-embeddings-v3, a novel text embedding model with 570 million parameters that achieves state-of-the-art performance on multilingual data and…

Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models

llm

research paper

Modern information retrieval (IR) models generally match queries to passages based on a single semantic similarity score. This can make the search experience confusing for…

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

llm

research paper

Transformer-based Large Language Models (LLMs) have become increasingly important in various domains. However, the quadratic time complexity of the attention operation poses a…

Self-Harmonized Chain of Thought

llm

research paper

Chain-of-thought (CoT) prompting reveals that large language models are capable of performing complex reasoning via intermediate steps. CoT methods in large language models…

OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs

llm

research paper

Despite the recent advancements in Large Language Models (LLMs), which have significantly enhanced the generative capabilities for various NLP tasks, LLMs still face…

Agent Workflow Memory (AWM)

llm

research paper

Recently, LLM-based agents have shown promise for real-world tasks like web navigation, but they still struggle with complex, long-term tasks. Unlike these models, humans…

MemoRAG: Moving Towards Next-Gen RAG via Memory-Inspired Knowledge Discovery

llm

research paper

Retrieval-Augmented Generation (RAG) leverages retrieval tools to access external databases, thereby enhancing the generation quality of large language models (LLMs) through…

GraphRAG auto-tuning provides rapid adaptation to new domains

llm

research paper

GraphRAG uses large language models (LLMs), guided by a set of domain-specific prompts, to create a comprehensive knowledge graph that details entities and their…

mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding

llm

research paper

Multimodal Large Language Models (MLLMs) have achieved promising OCR-free document understanding performance by increasing the supported resolution of document images.…

Generative Verifiers: Reward Modeling as Next-Token Prediction

llm

research paper

While large language models (LLMs) demonstrate remarkable capabilities, they often confidently make logical and factual mistakes, which can invalidate the entire solution. A…

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

llm

research paper

The ability to accurately interpret complex visual information is a crucial topic of multimodal large language models (MLLMs). Recent work indicates that enhanced visual…

Efficient Detection of Toxic Prompts in Large Language Models

llm

research paper

Large language models (LLMs) like ChatGPT and Gemini have significantly advanced natural language processing. However, these models can be exploited by malicious individuals…

LLM Pruning and Distillation in Practice: The Minitron Approach

llm

research paper

Over the past few years, significant advancements have blossomed in the two key pillars of multimodal intelligence: understanding and generation. Recent works have tried to…

Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search

llm

research paper

Recent studies have demonstrated how Large Language Models (LLMs) can be utilized to learn skills for improved decision-making in interactive environments. However, learning…

LLM Pruning and Distillation in Practice: The Minitron Approach

llm

research paper

Training multiple multi-billion parameter large language models from scratch is extremely time-, data- and resource-intensive. However, recent work has demonstrated the…

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

llm

research paper

Multi-modal generative models need to be able to perceive, process, and produce both discrete elements (such as text or code) and continuous elements (e.g. image, audio, and…

BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts

llm

research paper

The Mixture of Experts (MoE) framework has become a popular architecture for large language models due to its superior performance over dense models. However, training MoEs…

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

llm

research paper

Large Multimodal Models (LMMs) have attracted significant attention with their potential applications and emergent capabilities. However, recent works have demonstrated that…

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

llm

research paper

Recent advancements in large language models have significantly influenced mathematical reasoning and theorem proving in artificial intelligence. Despite notable progress in…

rStar: Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

llm

research paper

Despite their success, large language models face significant challenges in complex reasoning tasks. Although fine-tuning has been shown to be an effective way to improve…

PAD: Prioritize Alignment in Dataset Distillation

llm

research paper

Dataset Distillation aims to compress a large dataset into a significantly more compact, synthetic one without compromising the performance of the trained models. To achieve…

CODEXGRAPH: Bridging Large Language Models and Code Repositories via Code Graph Databases

llm

research paper

Large Language Models (LLMs) excel in stand-alone code tasks like HumanEval and MBPP, but struggle with handling entire code repositories. Current solutions rely on…

Synthesizing Text-to-SQL Data from Weak and Strong LLMs

llm

research paper

Text-to-SQL has been one of the standout use cases in AI application development, especially with closed-source LLMs such as GPT-4. However, the adoption of closed-source LLMs…

RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation

llm

research paper

Implementing Retrieval-Augmented Generation (RAG) systems is inherently complex, requiring deep understanding of data, use cases, and intricate design decisions.…

ReLiK: Retrieve and LinK, Fast and Accurate Entity Linking and Relation Extraction on an Academic Budget

llm

research paper

Extracting structured information from unstructured text lies at the core of many Gen AI problems such as Information Retrieval, Knowledge Graph Construction, Knowledge…

LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

llm

research paper

Standard prompt-based LLM inference has two sequential stages: prefilling and decoding. During the prefilling stage, the model computes and saves the KV cache of each token…
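Below is a minimal sketch of the two stages with a single attention head. It assumes toy identity K/V projections and a made-up `embed` function in place of a trained model. The decoded token id is also fixed rather than sampled. This is the standard baseline: LazyLLM's contribution, per its title, is to prune tokens dynamically so that KV does not have to be computed for every prompt token, and this sketch does not do that.

```python
import math

D = 4  # toy head dimension

def embed(token_id):
    """Hypothetical deterministic embedding; a real model uses learned weights."""
    return [math.sin(token_id * (i + 1)) for i in range(D)]

def attend(q, keys, values):
    """Single-head scaled dot-product attention over cached keys/values."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(D)]

def prefill(prompt_ids):
    """Prefilling: compute and store K and V for every prompt token."""
    cache = {"k": [], "v": []}
    for t in prompt_ids:
        x = embed(t)
        cache["k"].append(x)  # toy: identity K projection
        cache["v"].append(x)  # toy: identity V projection
    return cache

def decode_step(cache, token_id):
    """Decoding: append the new token's K/V, then attend over the whole cache."""
    x = embed(token_id)
    cache["k"].append(x)
    cache["v"].append(x)
    return attend(x, cache["k"], cache["v"])

cache = prefill([3, 1, 4, 1, 5])
out = decode_step(cache, 9)  # token id fixed here; normally sampled from the model
print(len(cache["k"]))       # five prompt tokens plus one decoded token
```

Each decoding step reuses all cached keys and values instead of recomputing them. The cost of prefilling therefore grows with prompt length, and that is the cost token pruning aims to cut.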

DiT-MoE: Scaling Diffusion Transformers to 16 Billion Parameters

llm

research paper

Recently, diffusion transformers (DiT) have emerged as powerful deep generative models in various domains, such as images, video and 3D objects. However, training and serving such…

DDK: Distilling Domain Knowledge for Efficient Large Language Models

llm

research paper

Despite the advances of large language models (LLMs) in various applications, they still face significant challenges to wider adoption due to high computational and storage…

Chain of Diagnosis (CoD): Towards an Interpretable Medical Agent

llm

research paper

The field of medical diagnosis has undergone a significant transformation with the advent of large language models (LLMs), yet the challenges of interpretability within…

LAMBDA: A Large Model Based Data Agent

llm

research paper

Large Language Models (LLMs) have been instrumental in pushing innovation across multiple domains. However, despite these advancements, the current LLM paradigm encounters…

VILA2: VILA Augmented VILA

llm

research paper

Visual language models (VLMs) have rapidly progressed, driven by the success of large language models (LLMs). However, data curation for VLMs remains under-explored.…

The Llama 3 Herd of Models

llm

research paper

The Llama 3.1 release marked a big milestone for LLM researchers and the open source AI community. Meta engineers trained Llama 3.1 on NVIDIA H100 Tensor Core GPUs. They…

BOND: Aligning LLMs with Best-of-N Distillation

llm

research paper

State-of-the-art large language models (LLMs) such as Gemini and GPT-4 are generally trained in three stages. First, LLMs are pre-trained on large corpora of knowledge using…

Q-Sparse: All Large Language Models can be Fully Sparsely-Activated

llm

research paper

Recently, considerable research has gone toward reducing the high computational cost and memory footprint of LLMs, especially during the inference stage. Sparsity is…

Beyond KV Caching: Shared Attention for Efficient LLMs

llm

research paper

The efficiency of large language models (LLMs) remains a critical challenge, particularly in contexts where computational resources are limited. Traditional attention…

E5-V: Universal Embeddings with Multimodal Large Language Models

llm

research paper

With the development of Multimodal Large Language Models (MLLMs), there is an increasing need for embedding models to represent multimodal inputs. Although CLIP shows…

NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

llm

research paper

The capability of LLMs to process long texts is particularly crucial across various domains. Considering the critical role of LLMs in handling long texts, numerous…

SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning

llm

research paper

Parameter-efficient transfer learning (PETL) is widely used for domain adaptation of large pre-trained models to specific downstream tasks, greatly reducing trainable…

AgentInstruct: Toward Generative Teaching with Agentic Flows

llm

research paper

Synthetic data is becoming increasingly important for accelerating the development of language models, both large and small. Despite several successful use cases…

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

llm

research paper

FlashAttention (and FlashAttention-2) pioneered an approach to speed up attention on GPUs by minimizing memory reads/writes, and is used across various libraries to accelerate…
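Much of the reduction in memory traffic comes from computing the softmax incrementally over tiles, so the full row of attention scores never has to be written to slow memory. Below is a minimal sketch of that online-softmax rescaling for a single query row in plain Python, checked against a naive implementation. The block size and inputs are arbitrary. Real kernels fuse this into GPU tiles held in on-chip memory.

```python
import math

def naive_attention_row(scores, values):
    """Reference: full softmax over all scores, then weighted sum of values."""
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, values)) / z for j in range(len(values[0]))]

def online_attention_row(scores, values, block=2):
    """Process scores block by block, keeping running max, normalizer and output."""
    dim = len(values[0])
    m = -math.inf      # running max of scores seen so far
    z = 0.0            # running softmax denominator, relative to m
    acc = [0.0] * dim  # running unnormalized output, relative to m
    for start in range(0, len(scores), block):
        s_blk = scores[start:start + block]
        v_blk = values[start:start + block]
        m_new = max(m, max(s_blk))
        scale = math.exp(m - m_new)  # rescale old statistics to the new max
        z = z * scale + sum(math.exp(s - m_new) for s in s_blk)
        acc = [a * scale + sum(math.exp(s - m_new) * v[j] for s, v in zip(s_blk, v_blk))
               for j, a in enumerate(acc)]
        m = m_new
    return [a / z for a in acc]

scores = [0.5, 2.0, -1.0, 3.0, 0.0]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [1.0, -1.0], [0.5, 0.5]]
print(naive_attention_row(scores, values))
print(online_attention_row(scores, values, block=2))
```

Both functions return the same vector. The online version only ever needs one block of scores at a time plus three small running quantities.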

Metron: Holistic Performance Evaluation Framework for LLM Inference Systems

llm

research paper

Serving large language models (LLMs) in production can incur substantial costs, which has prompted recent advances in inference system optimizations. Today, these systems…

Composable Interventions for Language Models

llm

research paper

Language models (LMs) exhibit striking capabilities on various important tasks, but despite such high performance, LM-generated content is often prone to being…

Associative Recurrent Memory Transformer

llm

research paper

Long-sequence LLMs are among the most challenging models to work with, as memory plays a crucial role in processing extremely long contexts and utilizing remote past information.…

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

llm

research paper

Key-value (KV) caching plays an essential role in accelerating decoding for transformer-based autoregressive large language models (LLMs). However, the amount of memory…
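The memory cost of the KV cache can be estimated with a simple formula. Below is a minimal sketch that assumes a standard multi-head layout and fp16 storage (2 bytes per element). `layers_per_kv` is a hypothetical knob for modelling cross-layer sharing, not a parameter defined in the paper. The model dimensions are illustrative.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1,
                   bytes_per_elem=2, layers_per_kv=1):
    """Estimate KV cache size in bytes.

    The leading factor 2 counts keys and values. `layers_per_kv` (hypothetical)
    models cross-layer sharing: with 2, each K/V pair is reused by two adjacent
    layers, so only ceil(n_layers / 2) layers store a cache.
    """
    kv_layers = -(-n_layers // layers_per_kv)  # ceiling division
    return 2 * kv_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Illustrative configuration: 32 layers, 32 KV heads of size 128, 4096-token context
full = kv_cache_bytes(32, 32, 128, 4096)
shared = kv_cache_bytes(32, 32, 128, 4096, layers_per_kv=2)
print(f"no sharing:       {full / 2**30:.1f} GiB")
print(f"shared by 2 layers: {shared / 2**30:.1f} GiB")
```

The cache grows linearly with sequence length and batch size. Any scheme that lets layers share K/V divides it by the sharing factor, at least in this simplified accounting.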

Adam-mini: Use Fewer Learning Rates To Gain More

llm

research paper

Adam(W) has become the de-facto optimizer for training large language models (LLMs). Despite its superior performance, Adam is expensive to use. Specifically, Adam requires…
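The expense comes from the optimizer state, because Adam keeps a first-moment and a second-moment estimate for every parameter. Below is a minimal sketch of one Adam update on plain lists, without AdamW's decoupled weight decay, using illustrative values. As we understand it, Adam-mini's change is to use fewer distinct learning rates (fewer second-moment values, shared across parameter blocks). That change is not shown here.

```python
import math

def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on flat lists; m and v are per-parameter state buffers."""
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        m[i] = beta1 * m[i] + (1 - beta1) * g      # first moment
        v[i] = beta2 * v[i] + (1 - beta2) * g * g  # second moment: one effective lr per parameter
        m_hat = m[i] / (1 - beta1 ** t)            # bias correction
        v_hat = v[i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params

params = [1.0, -2.0, 0.5]
grads = [0.1, -3.0, 0.0]
m = [0.0] * len(params)
v = [0.0] * len(params)
params = adam_step(params, grads, m, v, t=1)
# Two extra state values per parameter: optimizer state is twice the model size.
print(len(m) + len(v), "state values for", len(params), "parameters")
print(params)
```

On the first step the bias-corrected update is about `lr * sign(g)` for each nonzero gradient. This is why every parameter moves by roughly 1e-3 regardless of the gradient's magnitude.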

Searching for Best Practices in Retrieval-Augmented Generation

llm

research paper

Retrieval-augmented generation (RAG) techniques have proven to be effective in enhancing LLM response quality, particularly in specialized domains. While many RAG…

MInference: a Million-token inference on a single A100 machine

llm

research paper

The computational challenges of LLM inference remain a significant barrier to their widespread deployment, especially as context lengths continue to increase. Existing…

MIRAI: Evaluating LLM Agents for Event Forecasting

llm

research paper

Recent advancements in Large Language Models (LLMs) have enabled LLM agents to autonomously collect world information, over which to conduct reasoning to solve complex…

AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation

llm

research paper

Retrieval-Augmented Generation (RAG) has emerged as a prominent framework for building ML/AI solutions with LLMs. Additional modules such as query rewriting, prompt…

Meta Large Language Model Compiler: Foundation Models of Compiler Optimization

llm

research paper

Large Language Models (LLMs) have demonstrated remarkable capabilities across a variety of software engineering and coding tasks. However, their application in the domain of…

Instruction Pre-Training: Language Models are Supervised Multi Task Learners

llm

research paper

Unsupervised multitask pre-training has been the critical method behind the recent success of language models (LMs). However, supervised multitask learning still holds…

LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs

llm

research paper

In the traditional RAG framework, the basic retrieval units are normally short but the retriever needs to scan over a massive amount of units to find the relevant piece.…

Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities

llm

research paper

Large language models have shown promising results in arithmetic and symbolic reasoning by expressing intermediate reasoning in text as a chain of thought, yet struggle to…

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

llm

research paper

Almost all of today’s LLMs are designed as monolithic architectures; these models rely extensively on large-scale data to embed generalized language capabilities…

THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation

llm

research paper

Software agents have emerged as promising tools for addressing complex software engineering tasks. However, existing works oversimplify software development workflows by…

THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation

llm

research paper

Nowadays, large language models (LLMs) with large context windows are capable of processing lengthy dialogue histories during prolonged interaction with users without…

Ad Auctions for LLMs via Retrieval Augmented Generation

llm

research paper

Large language models (LLMs) have been making headway in various domains and now also in the field of computational advertising. Now with the integration of ads into the…

Improving Alignment and Robustness with Circuit Breakers

llm

research paper

Large language models (LLMs) have been instrumental in pushing the boundaries of various real-world applications, mostly those associated with long-sequence inputs, such…

Improving Alignment and Robustness with Circuit Breakers

llm

research paper

The landscape of artificial intelligence (AI) has long been marred by the persistent threat of adversarial attacks, particularly those targeting neural networks. The rise of…

TEXTGRAD: Automatic “Differentiation” via Text

llm

research paper

There is an emerging paradigm shift in how AI systems are built these days. The new generation of AI applications are increasingly compound systems involving multiple…

HUSKY: A Unified, Open-Source Language Agent for Multi-Step Reasoning

llm

research paper

Recent advances in the capabilities of large language models (LLMs) have led to the development of language agents to address complex, multi-step tasks. However, most…

Mixture-of-Agents Enhances Large Language Model Capabilities

llm

research paper

Recent advances in large language models (LLMs) demonstrate substantial capabilities in natural language understanding and generation tasks. However, despite the plethora of…

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

llm

research paper

Recently, various prompting methods such as CoT, ToT and GoT have been instrumental in improving reasoning performance of LLMs. All these methods can be broadly divided into…

Block Transformer: Global-to-Local Language Modeling for Fast Inference

llm

research paper

Generating tokens with transformer-based autoregressive language models (LMs) is costly due to the self-attention mechanism that attends to all previous tokens. To apply…

Show, Don’t Tell: Aligning Language Models with Demonstrated Feedback

llm

research paper

Aligning Large Language Models (LLMs) with human values and preferences is essential for making them helpful and safe. However, alignment can be challenging, especially for…

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

llm

research paper

In the age of large-scale language models, benchmarks like the Massive Multitask Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI can…

Contextual Position Encoding: Learning to Count What’s Important

llm

research paper

The attention mechanism is a critical component of Large Language Models (LLMs) that allows tokens in a sequence to interact with each other, but the attention mechanism…

Similarity is Not All You Need: Endowing Retrieval-Augmented Generation with Multi-layered Thoughts

llm

research paper

Retrieval-augmented generation (RAG) has been pivotal in pushing LLM use cases in knowledge management systems. Nevertheless, existing retrieval-augmented generation…

Nearest Neighbor Speculative Decoding for LLM Generation and Attribution

llm

research paper

Large language models (LLMs) often hallucinate and lack the ability to provide attribution for their generations. Semi-parametric LMs, such as kNN-LM, approach these…

VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections

llm

research paper

LLM training and fine-tuning remain far too computationally and memory intensive. Several techniques have been proposed to reduce these memory requirements, such as…

Zamba: A Compact 7B SSM Hybrid Model

llm

research paper

Recently, state-of-the-art Transformer-SSM hybrid architectures have been a driving force in open-source LLMs. In line with this trend, researchers from Zyphra have launched…

Layer-Condensed KV Cache for Efficient Inference of Large Language Models

llm

research paper

The key-value (KV) cache is one of the most significant parts of any transformer-based LLM and takes over 30% of the GPU memory during deployment. Hence, the KV cache plays a…

Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model

llm

research paper

Recently, small-scale visual language models have come on par with their larger-scale counterparts in performance. Models such as LLaVAPhi [47], which combines the open source…

SUTRA: Scalable Multilingual language model architecture

llm

research paper

Recent advancements in Large Language Models (LLMs) have predominantly focused on a limited set of data-rich languages, with training datasets being notably skewed towards…

Linearizing Large Language Models

llm

research paper

Over the last few years, Transformers have displaced Recurrent Neural Networks (RNNs) in sequence modeling tasks, owing to their highly parallel training efficiency and…

From Local to Global: A Graph RAG Approach to Query-Focused Summarization

llm

research paper

The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions…

Is Flash Attention Stable?

llm

research paper

Given the size and complexity of workloads, training Large Language Models (LLMs) often takes months, across hundreds or thousands of GPUs. For example, LLaMA2’s…

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding

llm

research paper

Optimizing LLMs’ operational cost and computational requirements is one of the most sought-after topics for researchers. Accelerated solutions deployed on mobile devices, edge devices or commodity…

Better & Faster Large Language Models via Multi-token Prediction

llm

research paper

Large language models such as GPT and Llama are trained with a next-token prediction loss. However, despite the recent wave of impressive achievements in LLMs…

NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment

llm

research paper

Aligning Large Language Models (LLMs) with human values and preferences is essential for making them helpful and safe. However, building efficient tools to perform alignment…

PROMETHEUS 2: An Open Source Language Model Specialized in Evaluating Other Language Models

llm

research paper

Model-based evaluation with proprietary LMs such as GPT-4 has emerged as a scalable solution for assessing LM-generated text. However, concerns related to transparency…

Octopus v4: Graph of language models

llm

research paper

LLMs have been effective in a wide range of applications, yet the most sophisticated models are often proprietary (GPT-4, Gemini) and considerably more costly than open-source…

Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models

llm

research paper

Evaluating language models is a challenging task: not only is it difficult to find meaningful data to test the models, but evaluating the correctness of a generated response…

Make Your LLM Fully Utilize the Context

llm

research paper

These days the training context windows of many contemporary LLMs have been expanded to tens of thousands of tokens, thereby enabling these models to process extensive…

CodecLM: Aligning Language Models with Tailored Synthetic Data

llm

research paper

Recent progress in instruction-tuned LLMs highlights the critical role of high-quality data in enhancing LLMs’ instruction-following capabilities. However, acquiring such…

LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency

llm

research paper

Recently, DB query rewriting using LLMs has been one of the sought-after use cases. The aim of query rewriting is to output a new query equivalent to the original SQL query, while…

TransformerFAM: Feedback attention is working memory

llm

research paper

While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs. One of the widely used…

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

llm

research paper

Recently, Google released RecurrentGemma, an open language model that uses Google’s novel Griffin architecture. Griffin combines linear recurrences with local attention to…

MEGALODON: Efficient LLM Pretraining and Inference with Unlimited Context Length

llm

research paper

The Transformer architecture is the backbone of most production LLMs, but despite its remarkable capabilities, it faces challenges with quadratic computational complexity and…

Trust Region Direct Preference Optimization (TR-DPO): Learn Your Reference Model for Real Good Alignment

llm

research paper

Aligning large language models with human preferences (RLHF) has become increasingly important to ensure safety and overall usefulness of the model. Traditionally, the…

RHO-1: Not All Tokens Are What You Need

llm

research paper

High-quality training datasets are crucial for boosting LLM performance. Various data filtering techniques such as heuristics and classifiers are being utilized to select such…

Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

llm

research paper

Fine-tuning LLMs using Reinforcement Learning from Human Feedback (RLHF) has always been a preferred way of making LLMs more useful by aligning them with human values or…

Stream of Search (SoS): Learning to Search in Language

llm

research paper

Transformer-based auto-regressive models such as GPT have shown remarkable performance in generative tasks but struggle when it comes to complex decision-making and…

ReFT: Representation Finetuning for Language Models

llm

research paper

Parameter-efficient fine-tuning (PEFT) methods have been instrumental in the rapid adoption of fine-tuned, domain-specific LLMs. PEFT methods not only reduce memory usage and time…

Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

llm

research paper

The transformer FLOPs equation, or FLOPs per token, is one of the key quantities in determining the compute budget for any transformer-based LLM. Usually in language models…

sDPO: Don’t Use Your Data All at Once

llm

research paper

As the development of large language models (LLMs) progresses, aligning them with human preferences has become increasingly important to ensure the safety and usefulness of the…

Gecko: Versatile Text Embeddings Distilled from Large Language Models

llm

research paper

Recent advances in text embedding models have been instrumental for various downstream tasks, including document retrieval, sentence similarity, classification, and…

Jamba: A Hybrid Transformer-Mamba Language Model

llm

research paper

Finally, the first production-grade commercially available Mamba-based model delivering best-in-class quality and performance is here. Introducing Jamba, a novel…

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

llm

research paper

LLM-empowered multimodal inputs are becoming an essential part of Vision Language Models (VLMs) such as LLaVA and Otter. However, despite these advancements, a…

RigorLLM: Resilient Guardrails for large language models against undesired content

llm

research paper

Large language models (LLMs) have demonstrated impressive capabilities in NLG and different downstream tasks. However, the potential of LLMs to produce biased or harmful…

DRAGIN: Dynamic Retrieval Augmented Generation based on the Real-time Information Needs of Large Language Models

llm

research paper

Traditional methods of RAG typically rely on single-round retrieval, using the LLM’s initial input to retrieve relevant information from external corpora. While this method…

SiMBA: Simplified Mamba-based Architecture for Vision and Multivariate Time series

llm

research paper

Recently, Structured State Space Models (SSMs) such as Mamba have been pitched as an alternative to Transformer-based models, especially when it comes to increasing efficiency and…

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference

llm

research paper

Vision language models (VLMs) like GPT-4, LLaMA-Adapter, and LLaVA have been instrumental in augmenting LLMs with visual understanding capabilities. VLMs serve as…

Evolutionary Optimization of Model Merging Recipes

llm

research paper

Model merging offers a novel approach to leverage the strengths of multiple pre-trained models. It allows us to combine task-specific models, each potentially fine-tuned for…

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

llm

research paper

Structure information is critical for understanding the semantics of text-rich images, such as documents, tables, and charts. Most existing Multimodal Large Language…

PERL: Parameter Efficient Reinforcement Learning from Human Feedback

llm

research paper

Reinforcement Learning from Human Feedback (RLHF) is one of the most popular methods to align pretrained Large Language Models (LLMs) with human preferences. It involves…

RAFT: Adapting Language Model to Domain Specific RAG

llm

research paper

Adapting LLMs to specialized domains, which is essential to many emerging applications, usually takes two paths: in-context learning through Retrieval-Augmented…

USER-LLM: Efficient LLM Contextualization with User Embeddings

llm

research paper

Large language models (LLMs) have revolutionized the field of user modeling and personalization due to their ability to learn and adapt from massive amounts of textual data.…

RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation

llm

research paper

Factual correctness has been one of the growing concerns around LLMs’ reasoning capabilities. This issue becomes more significant when it comes to zero-shot CoT (Chain of…

MoAI: Mixture of All Intelligence for Large Language and Vision Models

llm

research paper

Following the success of the instruction-tuned LLMs, several visual instruction tuning datasets have been meticulously curated to enhance zero-shot vision language (VL)…

VideoMamba: State Space Model for Efficient Video Understanding

llm

research paper

Mastering spatiotemporal representation is one of the key areas in any video understanding task. However, there are usually two challenges associated with it: (1) the large…

Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models

llm

research paper

Adapter-based fine-tuning methods, such as LoRA, are key to making large language models disruptive in various domain-specific applications. LoRA introduces a limited number…

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

llm

research paper

Training Large Language Models (LLMs) is challenging due to memory constraints from weight and optimizer size. Low-rank adaptation (LoRA) addresses this by adding trainable…

InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding

llm

research paper

Design2Code: How Far Are We From Automating Front-End Engineering?

llm

research paper

Recent releases of advanced multimodal LLMs such as GPT-4V and Gemini Pro have led to breakthroughs in visual understanding and code generation. This has opened up…

DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model

llm

research paper

In recent years, text-to-image (T2I) generation models such as DreamBooth and BLIP-Diffusion have rapidly evolved, generating intricate and highly detailed images that often…

VisionLLaMA : A Unified LLaMA Interface for Vision Tasks

llm

research paper

Large language models, especially the LLaMA family of models, have aroused great interest in the research community for multimodal applications, where many methods heavily…

Beyond Language Models: Byte Models are Digital World Simulators

llm

research paper

Bytes are the foundation of all digital data, devices, and software, from computer processors to operating systems in everyday electronics. Therefore, training models for…

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

llm

research paper

Large Language Models (LLMs) have demonstrated remarkable performance in a wide range of natural language processing tasks, but their increasing size has posed challenges…

ChunkLlama : Training-Free Long-Context Scaling of Large Language Models

llm

research paper

The ability to comprehend and process long-context information is essential for large language models (LLMs) to cater to a wide range of applications effectively. Finetuning…

MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

llm

research paper

MobiLlama is another Small Language Model (SLM) for resource-constrained devices. MobiLlama is an SLM design that starts from a larger model and applies a careful…

ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition

llm

research paper

Self-attention, one of the critical components of LLMs, performs poorly during inference because it performs intensive memory operations on key/value tensors of context…

TinyLLaVA: A Framework of Small-scale Large Multimodal Models

llm

research paper

Large language models (LLMs) with large model size can greatly improve task performance but demand expensive computational resources for training. To address this, the LLM…

The FinBen: An Holistic Financial Benchmark for Large Language Models

llm

research paper

Recent studies have shown the great potential of advanced LLMs such as GPT-4 on financial text analysis and prediction tasks in the financial domain. While their potential…

GRIT : Generative Representational Instruction Tuning

llm

research paper

All text-based language problems can be reduced to either generation or embedding. Creating a single general model that performs such a wide range of tasks has been a…

Aespa: Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers

llm

research paper

With the increasing complexity of generative AI models, post-training quantization (PTQ) has emerged as a promising solution for deploying hyper-scale models on edge devices…

Graph Mamba: Towards Learning on Graphs with State Space Models

llm

research paper

Graph Transformers (GTs) have shown promising potential in graph representation learning. GTs, however, have quadratic computational cost, lack inductive biases on graph…

Fiddler: CPU-GPU Orchestration for Fast Local Inference of MoE Models

llm

research paper

Large Language Models (LLMs) based on Mixture-of-Experts (MoE) architectures are showing remarkable performance on various tasks. By activating a subset of experts inside…

PHATGOOSE: Learning to Route Among Specialized Experts for Zero-Shot Generalization

llm

research paper

The availability of Hugging Face PEFT modules has made it cheap and easy to modularly adapt a given pre-trained model to a specific task or domain. In the meantime, extremely…

Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains

llm

research paper

General-purpose LLMs like LLaMA and GPT-4 have demonstrated remarkable proficiency in understanding and generating natural language. However, their capabilities wane in…

Hydragen: High-Throughput LLM Inference with Shared Prefixes

llm

research paper

Transformer-based large language models (LLMs) such as OpenAI’s GPT-3.5 and GPT-4 are now deployed to hundreds of millions of users. LLM inference in such scenarios commonly…

MambaFormer: Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks

llm

research paper

State-space models (SSMs), such as Mamba, have been proposed as alternatives to Transformer networks in language modeling, by incorporating gating, convolutions, and…

BlackMamba: Mixture of Experts for State-Space Models

llm

research paper

State-space models (SSMs) have recently demonstrated competitive performance to transformers at large-scale language modeling benchmarks while achieving linear time and…

Repeat After Me: Transformers are Better than State Space Models at Copying

llm

research paper

Re3val: Reinforced and Reranked Generative Retrieval

llm

research paper

The primary objective of retrieval models is to enhance the accuracy of answers by selecting the most relevant documents retrieved for a given query, ensuring models have…

FIND: INterface for Foundation models’ embeDDings

llm

research paper

Foundation models across the vision and language domains, such as GPT-4, DALL-E 3, SAM, and LLaMA, have demonstrated significant advancements in addressing open-ended…

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

llm

research paper

Vision Language Models (VLMs), such as OpenAI’s GPT-4, Flamingo, BLIP-2 and LLaVA have demonstrated significant advancements in addressing open-ended visual…

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

llm

research paper

For Large Vision-Language Models (LVLMs), scaling the model can effectively improve performance. However, expanding model parameters significantly increases the training and…

EAGLE: Extrapolation Algorithm for Greater Language-model Efficiency

llm

research paper

Auto-regressive decoding has become the de facto standard for large language models (LLMs). This process generates output tokens one at a time, which makes the generation by…

MambaByte: Token-free Selective State Space Model

llm

research paper

In December 2023, the paper “Mamba: Linear-Time Sequence Modeling with Selective State Spaces” was released, and with it came the whole discussion about Mamba (SSM) being a viable…

Instruction-Tune Llama2 with TRL

hugging face

llm

model building

This blog post is an extended guide on instruction-tuning Llama 2 from Meta AI. The idea of the blog post is to focus on creating the instruction dataset, which we can then…

Towards Conversational Diagnostic AI

llm

research paper

With the Med-PaLM series of LLMs, Google is one of the few companies that can claim expertise in building medical domain-specific LLMs. The latest addition has been AMIE…

ChatQA: Building GPT-4 Level Conversational QA Models

llm

research paper

With so many open-source LLMs trying to outperform GPT-4, one may wonder which one has truly succeeded in conversational QA, one of the elementary use cases of LLMs.

How to Fine-Tune LLMs with TRL

hugging face

llm

model building

Large Language Models, or LLMs, have seen a lot of progress in the last year. We went from no ChatGPT competitor to a whole zoo of LLMs, including Meta AI’s Llama 2, Mistrals …

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

llm

research paper

Merge Model using Mergekit

tools

llm

model building

Model merging is a technique that combines two or more LLMs into a single model. It’s a relatively new and experimental method to create new models for cheap (no GPU…

Tuning Language Models by Proxy

llm

research paper

These days, the capabilities of large pretrained LLMs can be significantly enhanced for specific domains or tasks of interest through additional fine-tuning. However, tuning these…

DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference

llm

research paper

Recently Microsoft DeepSpeed launched DeepSpeed-FastGen LLM serving framework, which offers up to 2.3x higher effective throughput compared to state-of-the-art systems like…

Self-Evaluation Improves Selective Generation in Large Language Models

llm

research paper

The trustworthiness of LLM output is an important consideration for the safe deployment of LLMs in production. One straightforward way to assess it is by measuring…

Self-RAG: Learning to Retrieve, Generate and Critique through Self-Reflections

llm

research paper

Self-RAG is a new framework to train an arbitrary LM to learn to retrieve, generate, and critique to enhance the factuality and quality of generations, without hurting the…

Reciprocal Rank Fusion (RRF) with LambdaMART: Context Tuning for Retrieval Augmented Generation (RAG)

llm

research paper

RAG typically consists of three primary components: Tool Retrieval, Plan Generation, and Execution. Existing RAG methodologies rely heavily on semantic search for tool…

Chain of Thought (CoT): The Impact of Reasoning Step Length on Large Language Models

llm

research paper

If you are doing prompt engineering for LLMs then you might have come across Chain of Thought (CoT) prompting, which is significant in improving the reasoning abilities of…

Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache

llm

research paper

Introducing DistAttention, a distributed attention algorithm, and DistKV-LLM, a distributed LLM serving system, to improve the performance and resource management of…

Soaring from 4K to 400K: Extending LLM’s Context with Activation Beacon

llm

research paper

Activation Beacon is a plug-and-play module for large language models that allows them to process longer contexts with a limited context window, while preserving their…

Improving Text Embeddings with Large Language Models using fine-tuned Mistral-7B LLM

llm

research paper

Check out a groundbreaking paper on improving text embeddings with large language models (LLMs) like GPT-4! The authors propose generating synthetic training data for text…

DocLLM: A Layout-Aware Generative Language Model for Multimodal Document Understanding

llm

research paper

Introducing DocLLM, a groundbreaking generative language model that can understand visually rich documents without the need for expensive image encoders. DocLLM uses a…

Self-Play Fine-Tuning (SPIN): Converts Weak Language Models to Strong Language Models

llm

research paper

Self-Play Fine-Tuning (SPIN) is a new fine-tuning method to improve large language models (LLMs) without needing additional human-annotated data.

A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models

llm

research paper

The paper provides a comprehensive taxonomy categorizing over 32 techniques for mitigating hallucinations in large language models (LLMs). It groups the techniques into…

LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

llm

research paper

With only four lines of code modification, the proposed method can effortlessly extend existing LLMs’ context window without any fine-tuning. This work elicits LLMs’…

Mamba-Chat: A Chat LLM based on State Space Models

llm

research paper

Mamba-Chat is the first chat language model based on a state-space model architecture, not a transformer.

KwaiAgents: Generalized Information-seeking Agent System with LLMs - 2 Open-source models fine tuned for agent systems! Better than GPT-3.5 turbo as an agent!

llm

research paper

Driven by curiosity, humans have continually sought to explore and understand the world around them, leading to the invention of various tools to satiate this…