Tag

Rag

500 articles archived under #rag

arXiv — NLP / Computation & Language research 17d ago

Decoupled Mixture-of-Experts for Parametric Knowledge Injection

arXiv:2606.14243v1 Announce Type: new Abstract: Knowledge injection aims to equip large language models (LLMs) with external, domain-specific, or time-sensitive knowledge. Existing approaches typically face a trade-off between flexibility and integration: retrieval-augmented…

33
arXiv — NLP / Computation & Language research 17d ago

ScoreGate: Adaptive Chunk Selection for Retrieval-Augmented Generation via Dual-Score Statistical Fusion

arXiv:2606.14269v1 Announce Type: cross Abstract: Fixed-cardinality retrieval injects a constant top-K chunks into the generator regardless of query complexity, causing over-retrieval for narrow queries and under-retrieval for compositional ones. We describe ScoreGate, a…

11
arXiv — NLP / Computation & Language research 17d ago

UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities

arXiv:2504.20734v5 Announce Type: replace Abstract: Retrieval-Augmented Generation (RAG) has shown substantial promise in improving factual accuracy by grounding model responses with external knowledge relevant to queries. However, most existing approaches are limited to a…

37
arXiv — NLP / Computation & Language research 17d ago

Sentinel: Decoding Context Utilization via Attention Probing for Efficient LLM Context Compression

arXiv:2505.23277v3 Announce Type: replace Abstract: Retrieval-augmented generation (RAG) often suffers from long and noisy retrieved contexts. Existing context compression methods typically rely on heuristic relevance estimation or supervised compression models rather than on…

29
arXiv — NLP / Computation & Language research 17d ago

Pragmatic Inference for Moral Reasoning Acquisition: Generalization via Metapragmatic Links

arXiv:2509.24102v5 Announce Type: replace Abstract: While moral reasoning has emerged as a promising research direction for large language models (LLMs), achieving robust generalization remains a critical challenge. This challenge arises from the gap between what is said and…

27
arXiv — NLP / Computation & Language research 17d ago

Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2

arXiv:2512.22671v3 Announce Type: replace Abstract: Structured width pruning of GLU-MLP layers in Llama-3.2 models, guided by the Peak-to-Peak Magnitude (PPM) criterion, reveals a systematic dichotomy in how reducing the expansion ratio affects different model capabilities.…

22
arXiv — NLP / Computation & Language research 17d ago

C2-Faith: Benchmarking LLM Judges for Causal and Coverage Faithfulness in Chain-of-Thought Reasoning

arXiv:2603.05167v2 Announce Type: replace Abstract: Large language models (LLMs) are increasingly used as judges of chain-of-thought (CoT) reasoning, yet it remains unclear whether they can reliably assess process faithfulness rather than merely answer plausibility. We introduce…

20
arXiv — NLP / Computation & Language research 17d ago

ClaimFlow: Tracing the Evolution of Scientific Claims in NLP

arXiv:2603.16073v2 Announce Type: replace Abstract: Scientific papers advance claims that later work supports, extends, or sometimes refutes. Yet existing methods for citation and claim analysis capture only fragments of this dialogue. In this work, we make these…

38
Vercel — AI dev-tools 17d ago

Increased Blob store limit for Hobby users

Hobby users can now create up to 100 Blob stores, up from 5. This gives teams more flexibility to organize data by project, environment, or region as applications grow. Storage, operations, and transfer limits still apply. Learn more in the Blob documentation.

21
r/LocalLLaMA community 17d ago

Gemma 12b less than 10 watts 6.5pp 1.3tg

Google pixel 10 pro Termux Llamacpp version: 9639 (ef8268fee) $ ./llama.cpp/build_vulkan/bin/llama-cli -m storage/downloads/gemma-4-12b-it-UD-Q3_K_XL.gguf --model-draft storage/downloads/mtp-gemma-4-12b-it.gguf --temp 1.0 --top-p 0.95 --top-k 64 --spec-type draft-mtp…

5
r/MachineLearning community 18d ago

I’m building a free bilingual machine-learning notebook course — looking for feedback on structure and coverage [R]

Hi everyone, I’m building an open-source machine-learning tutorial repository in Jupyter Notebook format: https://github.com/mohammadijoo/Machine_Learning_Tutorials The course is bilingual: English and Persian/Farsi versions are organized in parallel. The goal is to make a…

18
r/LocalLLaMA community 19d ago

I don’t know who needs to hear this but 128GB BD-R XL M-DISC is SOTA for consumer-available archival optical storage (for backing up your models)

If you’re trying to download and preserve your local LLMs in case of future availability issues due to AI-related politics, your best bet is either 128gb or 100gb Blu-Ray optical disks, more specifically BD-R XL M-DISC standard format which are archival-grade and built to last…

21
r/LocalLLaMA community 19d ago

3090 died, good night sweet prince

Feelsbadman.jpeg Once you've tasted 4x GPUs and almost BF16 models with BF16 KV cache you can't go back 😞. AND IT'S THE WEEKEND OH MAN. submitted by /u/fragment_me

32
Vercel — AI dev-tools 19d ago

Workflow SDK now runs natively in Nitro v3

Workflow SDK's native Nitro v3 integration is now in beta. Steps run inside the same bundled runtime as the rest of your app, instead of a separate bundle. Nitro's useStorage() and other server-side APIs work directly inside "use step" functions. The Nitro dev server also…

26
TechCrunch — AI news-outlet 19d ago

SpaceX IPO: Live updates on everything you need to know

TechCrunch has followed SpaceX's start, struggles, and successes from the early days. And we're here for what happens next too. This package of SpaceX IPO coverage includes who stands to win (and maybe some who won't), pre-IPO deals, and what's tucked inside its S-1 registration…

4
NVIDIA Developer Blog official-blog 20d ago

Deploy Long-Context Reasoning and Agentic Workflows with MiniMax M3 on NVIDIA Accelerated Infrastructure

As enterprise AI adoption scales, developers are increasingly forced to stitch together fragmented pipelines—separate models for text, vision, and...

25
TechCrunch — AI news-outlet 20d ago

SpaceX IPO: Everything you need to know

TechCrunch has followed SpaceX's start, struggles, and successes from the early days. And we're here for what happens next too. This package of SpaceX IPO coverage includes who stands to win (and maybe some who won't), pre-IPO deals, and what's tucked inside its S-1 registration…

32
r/LocalLLaMA community 20d ago

We should heavily discourage and moderate cloud API (deepseek api, GLM api, etc.) topics and discussion. This is LOCAL first.

I’m just some fucking guy. This is just some fucking opinion. I’ve seen tons of stealth marketing or related topics on this subreddit about how great or how easy it is to use some random subscription api. Why the fuck are we allowing people to so casually talk about how much…

31
Hugging Face Daily Papers research 20d ago

Leveraging Morphology for Historical Script Metrological Analysis

Abstract A transformer-based architecture with prototype learning enables scalable paleographic measurements from historical documents using only line-level transcriptions, demonstrating its effectiveness on a 160-page codex with minimal training data requirements. Generated by…

37
arXiv — NLP / Computation & Language research 20d ago

Rigel: Reverse-Engineering the Metal 4.1 Tensor Compute Path on the Apple M4 Max GPU

arXiv:2606.12765v1 Announce Type: new Abstract: Apple's Metal 4.1 exposes a tensor compute path: the Metal Performance Primitives (MPP) matmul2d operation over cooperative_tensor fragments, whose interface is documented but whose hardware behavior is deliberately hidden. The…

18
arXiv — NLP / Computation & Language research 20d ago

How Fine-Grained Should a RAG Benchmark Be? A Hierarchical Framework for Synthetic Question Generation

arXiv:2606.12789v1 Announce Type: new Abstract: Evaluating retrieval-augmented generation (RAG) systems requires benchmarks that capture diverse question characteristics, yet practitioners lack empirical guidance on which dimensions to vary and at what granularity. We present…

22
arXiv — NLP / Computation & Language research 20d ago

SafeLLM: Extraction as a Hallucination-Resistant Alternative to Rewriting in Safety-Critical Settings

arXiv:2606.12897v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used to access organisational documentation, including standard operating procedures (SOPs), HR policies and institutional guidelines. However, retrieval-augmented generation (RAG)…

29
arXiv — NLP / Computation & Language research 20d ago

X-MADAM-RAG: Diagnosing and Handling Chinese-English Evidence Conflict in Retrieval-Augmented Generation

arXiv:2606.12903v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) systems may receive evidence that is not merely noisy but mutually contradictory. This issue becomes particularly salient in multilingual settings, where retrieved Chinese and English evidence…

8
arXiv — NLP / Computation & Language research 20d ago

HyPE: Category-Aware Hypergraph Encoding with Persistent Edge Embeddings for Persona-Grounded Dialogue

arXiv:2606.13142v1 Announce Type: new Abstract: Persona-grounded dialogue systems aim to produce responses consistent with a speaker's persona, yet existing methods treat personas as a flat set of sentences and fail to model the high-order relations among persona…

25
arXiv — NLP / Computation & Language research 20d ago

SICI: A Semantic-Pragmatic Complexity Index Reveals Regime Shifts in LLM Stance Detection

arXiv:2606.13189v1 Announce Type: new Abstract: Prompt-based LLMs are increasingly used for stance detection, but harder examples are not always repaired by clearer instructions, reasoning prompts, retrieval, or debate. We introduce SICI (Stance Inference Complexity Index), a…

10
arXiv — NLP / Computation & Language research 20d ago

PolyAlign: Conditional Human-Distribution Alignment

arXiv:2606.13227v1 Announce Type: new Abstract: Post-training methods such as supervised fine-tuning (SFT) and preference optimization typically align language models toward a single global assistant behavior. While effective for improving average helpfulness, this can suppress…

29
arXiv — NLP / Computation & Language research 20d ago

Leveraging Audio-LLMs to Filter Speech-to-Speech Training Data

arXiv:2606.13507v1 Announce Type: new Abstract: Large-scale mined corpora provide abundant training data for end-to-end speech-to-speech translation (S2ST) but may contain noise, misalignment, and semantic errors. Filtering noisy data is crucial to maintain robust speech…

30
arXiv — NLP / Computation & Language research 20d ago

When Does Mixing Help? Analyzing Query Embedding Interpolation in Multilingual Dense Retrieval

arXiv:2606.13537v1 Announce Type: new Abstract: While mixed-language querying is ubiquitous in multilingual communities, the sensitivity of dense retrievers to such queries remains poorly understood. We present a ratio-controlled study on mMARCO that systematically evaluates…

11
arXiv — NLP / Computation & Language research 20d ago

SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation

arXiv:2606.13647v1 Announce Type: new Abstract: We introduce SkMTEB, the first comprehensive MTEB-style text embedding benchmark for Slovak, a low-resource West Slavic language, comprising 31 datasets across 7 task types -- nearly 4× the depth of existing multilingual…

25
arXiv — NLP / Computation & Language research 20d ago

Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning

arXiv:2606.13680v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex reasoning…

11
arXiv — NLP / Computation & Language research 20d ago

Identifiability Without Gaussianity: Symbolic World Models and Near-Infinite Temporal Consistency

arXiv:2606.12471v1 Announce Type: cross Abstract: Klindt, LeCun, and Balestriero (arXiv:2605.26379) proved that Joint-Embedding Predictive Architectures (JEPAs) achieve linear identifiability, the linear recovery of the world's true latent variables, if and only if the world's…

28
arXiv — NLP / Computation & Language research 20d ago

PersonaDrive: Human-Style Retrieval-Augmented VLA Agents for Closed-Loop Driving Simulation

arXiv:2606.12616v1 Announce Type: cross Abstract: Closed-loop driving simulators typically populate their environments with non-ego traffic agents that behave largely the same way, produced either by rule-based traffic managers or by learned models trained toward a single…

16
arXiv — NLP / Computation & Language research 20d ago

MiniPIC: Flexible Position-Independent Caching in <100LOC

arXiv:2606.13126v1 Announce Type: cross Abstract: Retrieval-augmented and agentic workloads repeatedly prefill recurring predictable structured inputs (which we call "spans") such as documents and code files. Yet, prefix caching in engines such as vLLM cannot reuse their KV…

12
arXiv — NLP / Computation & Language research 20d ago

ComAct: Reframing Professional Software Manipulation via COM-as-Action Paradigm

arXiv:2606.13239v1 Announce Type: cross Abstract: Existing computer-use agents remain fundamentally limited in professional software manipulation: GUI-based agents suffer from fragile visual grounding and long-horizon error accumulation, while API-based approaches struggle with…

34
arXiv — NLP / Computation & Language research 20d ago

TimeLens: On-Device Artifact Recognition with Retrieval-Augmented Question Answering for the Grand Egyptian Museum

arXiv:2606.13267v1 Announce Type: cross Abstract: TimeLens is an AI-powered bilingual mobile guide for the Grand Egyptian Museum (GEM). Pointing a phone at an exhibit, a visitor sees the artifact recognized in real time and can ask follow-up questions answered in English or…

37
arXiv — NLP / Computation & Language research 20d ago

Uncertainty-Aware Hybrid Retrieval for Long-Document RAG

arXiv:2606.13550v1 Announce Type: cross Abstract: Retrieval augmented generation (RAG) depends critically on the quality and granularity of retrieved evidence. Large retrieval units preserve context but often introduce irrelevant content, which can dilute answer bearing evidence…

38
Hugging Face Daily Papers research 20d ago

N-GRPO: Embedding-Level Neighbor Mixing for Enhanced Policy Optimization

Abstract N-GRPO, a novel exploration strategy within GRPO framework, enhances mathematical reasoning in large language models through semantic neighbor mixing that maintains semantic consistency while injecting diversity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct The success…

27
Hugging Face Daily Papers research 21d ago

Time-Series Foundation Model Embeddings for Remaining Useful Life Estimation

Abstract A lightweight approach combining a frozen pretrained time-series foundation model with a simple regression head achieves superior RUL prediction performance compared to various baseline methods on industrial sensor data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

15
arXiv — Machine Learning research 21d ago

RoVE: Rotary Value Embeddings Attention for Relative Position-dependent Value Pathways

arXiv:2606.11275v1 Announce Type: new Abstract: Rotary Position Embeddings (RoPE) make attention scores position-relative but leave the value pathway position-blind: the message sent by a value token is the same regardless of its distance from the query. We propose RoVE, a…

10
arXiv — Machine Learning research 21d ago

RePAIR: Predictive Self-Supervised Representation Learning in Chess

arXiv:2606.11860v1 Announce Type: new Abstract: In this paper, we introduce Representation Prediction via Autoencoding using Iterative Refinement (RePAIR) - a novel self-supervised representation learning architecture that synthesizes Masked Autoencoders (MAE), Joint Embedding…

15
arXiv — Machine Learning research 21d ago

Time-Series Foundation Model Embeddings for Remaining Useful Life Estimation

arXiv:2606.11990v1 Announce Type: new Abstract: Remaining Useful Life (RUL) prediction is essential for industrial predictive maintenance, yet many learning-based approaches rely on extensive feature engineering or large labeled datasets to train task-specific sequence models.…

20
arXiv — Machine Learning research 21d ago

Bootstrapped Monitoring: Leveraging Transparent Reasoning to Oversee Stronger AI Agents

arXiv:2606.11998v1 Announce Type: new Abstract: Trusted monitoring is a cornerstone of AI control. However, as frontier models grow more capable, the increasing capabilities gap between trusted and untrusted models may render trusted models unreliable monitors. We introduce…

30
arXiv — Machine Learning research 21d ago

nD-RoPE: A Generalized RoPE for n-Dimensional Position Embedding

arXiv:2606.12146v1 Announce Type: new Abstract: Rotary Position Embedding (RoPE) is widely adopted in Transformer models, yet its extension to high-dimensional domains lacks a unified theoretical formulation. Most existing approaches either apply rotations independently along…

8
arXiv — Machine Learning research 21d ago

Fourier Features Let Agents Learn High Precision Policies with Imitation Learning

arXiv:2606.12334v1 Announce Type: new Abstract: High-precision robotic manipulation requires fine-grained spatial reasoning that is often difficult to achieve with RGB-only policies due to depth ambiguity and perspective scale issues. Policies that leverage 3D information…

14
arXiv — NLP / Computation & Language research 21d ago

The Structural Attention Tax: How Retrieval Format Hijacks In-Context Learning Independent of Content

arXiv:2606.11198v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) systems inject external knowledge to improve LLM outputs, yet the format of injected content -- distinct from its semantic relevance -- can independently distort the model's attention…

6
arXiv — NLP / Computation & Language research 21d ago

NightFeats @ MMU-RAGent NeurIPS 2025: A Context-Optimized Multi-Agent RAG System for the Text-to-Text Track

arXiv:2606.11199v1 Announce Type: new Abstract: We present NightFeats, a structured multi-agent retrieval-augmented generation (RAG) system submitted to the MMU-RAGent competition at NeurIPS 2025, where it was awarded Best Dynamic Evaluation in the text-to-text track. Rather…

24
arXiv — NLP / Computation & Language research 21d ago

EverydayGPT: Confidence-Gated Routing for Efficient and Safe Hybrid GPT-RAG Conversational QA

arXiv:2606.11212v1 Announce Type: new Abstract: Standard Retrieval-Augmented Generation (RAG) pipelines route every query through retrieval and generation unconditionally, incurring unnecessary computation and propagating low-quality context to the generator. We introduce…

12
arXiv — NLP / Computation & Language research 21d ago

Energy-Efficient On-Device RAG on a Mobile NPU: System Design and Benchmark on Snapdragon X Elite

arXiv:2606.11257v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) pipelines are compute-intensive, combining embedding, retrieval, reranking, and large language model (LLM) generation. Running them entirely on-device benefits privacy, latency, and offline use,…

35
arXiv — NLP / Computation & Language research 21d ago

When More Documents Hurt RAG: Mitigating Vector Search Dilution with Domain-Scoped, Model-Agnostic Retrieval

arXiv:2606.11350v1 Announce Type: new Abstract: Retrieval-augmented generation degrades when scaled to large, heterogeneous document collections, where dense similarity loses discriminative power, and top-k retrieval increasingly returns semantically similar but contextually…

13
arXiv — NLP / Computation & Language research 21d ago

When Probing Accuracy Saturates, Fragility Resolves: A Complementary Metric for LLM Pre-Training Analysis

arXiv:2606.11375v1 Announce Type: new Abstract: Standard linear probing declares a property "encoded" when a classifier on hidden states achieves high accuracy. The protocol works well on a snapshot but breaks across pre-training: probe accuracy saturates within the first few…

17
