Tag: #rag
500 articles archived under #rag

arXiv — NLP / Computation & Language (research) · 17d ago
Decoupled Mixture-of-Experts for Parametric Knowledge Injection
arXiv:2606.14243v1 · Announce Type: new
Abstract: Knowledge injection aims to equip large language models (LLMs) with external, domain-specific, or time-sensitive knowledge. Existing approaches typically face a trade-off between flexibility and integration: retrieval-augmented…

arXiv — NLP / Computation & Language (research) · 17d ago
ScoreGate: Adaptive Chunk Selection for Retrieval-Augmented Generation via Dual-Score Statistical Fusion
arXiv:2606.14269v1 · Announce Type: cross
Abstract: Fixed-cardinality retrieval injects a constant top-K chunks into the generator regardless of query complexity, causing over-retrieval for narrow queries and under-retrieval for compositional ones. We describe ScoreGate, a…

arXiv — NLP / Computation & Language (research) · 17d ago
UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities
arXiv:2504.20734v5 · Announce Type: replace
Abstract: Retrieval-Augmented Generation (RAG) has shown substantial promise in improving factual accuracy by grounding model responses with external knowledge relevant to queries. However, most existing approaches are limited to a…

arXiv — NLP / Computation & Language (research) · 17d ago
Sentinel: Decoding Context Utilization via Attention Probing for Efficient LLM Context Compression
arXiv:2505.23277v3 · Announce Type: replace
Abstract: Retrieval-augmented generation (RAG) often suffers from long and noisy retrieved contexts. Existing context compression methods typically rely on heuristic relevance estimation or supervised compression models rather than on…

arXiv — NLP / Computation & Language (research) · 17d ago
Pragmatic Inference for Moral Reasoning Acquisition: Generalization via Metapragmatic Links
arXiv:2509.24102v5 · Announce Type: replace
Abstract: While moral reasoning has emerged as a promising research direction for large language models (LLMs), achieving robust generalization remains a critical challenge. This challenge arises from the gap between what is said and…

arXiv — NLP / Computation & Language (research) · 17d ago
Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2
arXiv:2512.22671v3 · Announce Type: replace
Abstract: Structured width pruning of GLU-MLP layers in Llama-3.2 models, guided by the Peak-to-Peak Magnitude (PPM) criterion, reveals a systematic dichotomy in how reducing the expansion ratio affects different model capabilities.…

arXiv — NLP / Computation & Language (research) · 17d ago
C2-Faith: Benchmarking LLM Judges for Causal and Coverage Faithfulness in Chain-of-Thought Reasoning
arXiv:2603.05167v2 · Announce Type: replace
Abstract: Large language models (LLMs) are increasingly used as judges of chain-of-thought (CoT) reasoning, yet it remains unclear whether they can reliably assess process faithfulness rather than merely answer plausibility. We introduce…

arXiv — NLP / Computation & Language (research) · 17d ago
ClaimFlow: Tracing the Evolution of Scientific Claims in NLP
arXiv:2603.16073v2 · Announce Type: replace
Abstract: Scientific papers advance claims that later work supports, extends, or sometimes refutes. Yet existing methods for citation and claim analysis capture only fragments of this dialogue. In this work, we make these…

Vercel — AI (dev-tools) · 17d ago
Increased Blob store limit for Hobby users
Hobby users can now create up to 100 Blob stores, up from 5. This gives teams more flexibility to organize data by project, environment, or region as applications grow. Storage, operations, and transfer limits still apply. Learn more in the Blob documentation.

r/LocalLLaMA (community) · 17d ago
Gemma 12b less than 10 watts 6.5pp 1.3tg Google pixel 10 pro Termux
Llamacpp version: 9639 (ef8268fee)
$ ./llama.cpp/build_vulkan/bin/llama-cli -m storage/downloads/gemma-4-12b-it-UD-Q3_K_XL.gguf --model-draft storage/downloads/mtp-gemma-4-12b-it.gguf --temp 1.0 --top-p 0.95 --top-k 64 --spec-type draft-mtp…

r/MachineLearning (community) · 18d ago
I’m building a free bilingual machine-learning notebook course — looking for feedback on structure and coverage [R]
Hi everyone, I’m building an open-source machine-learning tutorial repository in Jupyter Notebook format: https://github.com/mohammadijoo/Machine_Learning_Tutorials The course is bilingual: English and Persian/Farsi versions are organized in parallel. The goal is to make a…

r/LocalLLaMA (community) · 19d ago
I don’t know who needs to hear this but 128GB BD-R XL M-DISC is SOTA for consumer-available archival optical storage (for backing up your models)
If you’re trying to download and preserve your local LLMs in case of future availability issues due to AI-related politics, your best bet is either 128gb or 100gb Blu-Ray optical disks, more specifically BD-R XL M-DISC standard format which are archival-grade and built to last…

r/LocalLLaMA (community) · 19d ago
3090 died, good night sweet prince
Feelsbadman.jpeg Once you've tasted 4x GPUs and almost BF16 models with BF16 KV cache you can't go back 😞. AND IT'S THE WEEKEND OH MAN.
Submitted by /u/fragment_me

Vercel — AI (dev-tools) · 19d ago
Workflow SDK now runs natively in Nitro v3
Workflow SDK's native Nitro v3 integration is now in beta. Steps run inside the same bundled runtime as the rest of your app, instead of a separate bundle. Nitro's useStorage() and other server-side APIs work directly inside "use step" functions. The Nitro dev server also…

TechCrunch — AI (news-outlet) · 19d ago
SpaceX IPO: Live updates on everything you need to know
TechCrunch has followed SpaceX's start, struggles, and successes from the early days. And we're here for what happens next too. This package of SpaceX IPO coverage includes who stands to win (and maybe some who won't), pre-IPO deals, and what's tucked inside its S-1 registration…

NVIDIA Developer Blog (official-blog) · 20d ago
Deploy Long-Context Reasoning and Agentic Workflows with MiniMax M3 on NVIDIA Accelerated Infrastructure
As enterprise AI adoption scales, developers are increasingly forced to stitch together fragmented pipelines—separate models for text, vision, and...

TechCrunch — AI (news-outlet) · 20d ago
SpaceX IPO: Everything you need to know
TechCrunch has followed SpaceX's start, struggles, and successes from the early days. And we're here for what happens next too. This package of SpaceX IPO coverage includes who stands to win (and maybe some who won't), pre-IPO deals, and what's tucked inside its S-1 registration…

r/LocalLLaMA (community) · 20d ago
We should heavily discourage and moderate cloud API (deepseek api, GLM api, etc.) topics and discussion. This is LOCAL first.
I’m just some fucking guy. This is just some fucking opinion. I’ve seen tons of stealth marketing or related topics on this subreddit about how great or how easy it is to use some random subscription api. Why the fuck are we allowing people to so casually talk about how much…

Hugging Face Daily Papers (research) · 20d ago
Leveraging Morphology for Historical Script Metrological Analysis
Abstract: A transformer-based architecture with prototype learning enables scalable paleographic measurements from historical documents using only line-level transcriptions, demonstrating its effectiveness on a 160-page codex with minimal training data requirements. Generated by…

arXiv — NLP / Computation & Language (research) · 20d ago
Rigel: Reverse-Engineering the Metal 4.1 Tensor Compute Path on the Apple M4 Max GPU
arXiv:2606.12765v1 · Announce Type: new
Abstract: Apple's Metal 4.1 exposes a tensor compute path: the Metal Performance Primitives (MPP) matmul2d operation over cooperative_tensor fragments, whose interface is documented but whose hardware behavior is deliberately hidden. The…

arXiv — NLP / Computation & Language (research) · 20d ago
How Fine-Grained Should a RAG Benchmark Be? A Hierarchical Framework for Synthetic Question Generation
arXiv:2606.12789v1 · Announce Type: new
Abstract: Evaluating retrieval-augmented generation (RAG) systems requires benchmarks that capture diverse question characteristics, yet practitioners lack empirical guidance on which dimensions to vary and at what granularity. We present…

arXiv — NLP / Computation & Language (research) · 20d ago
SafeLLM: Extraction as a Hallucination-Resistant Alternative to Rewriting in Safety-Critical Settings
arXiv:2606.12897v1 · Announce Type: new
Abstract: Large language models (LLMs) are increasingly used to access organisational documentation, including standard operating procedures (SOPs), HR policies and institutional guidelines. However, retrieval-augmented generation (RAG)…

arXiv — NLP / Computation & Language (research) · 20d ago
X-MADAM-RAG: Diagnosing and Handling Chinese-English Evidence Conflict in Retrieval-Augmented Generation
arXiv:2606.12903v1 · Announce Type: new
Abstract: Retrieval-augmented generation (RAG) systems may receive evidence that is not merely noisy but mutually contradictory. This issue becomes particularly salient in multilingual settings, where retrieved Chinese and English evidence…

arXiv — NLP / Computation & Language (research) · 20d ago
HyPE: Category-Aware Hypergraph Encoding with Persistent Edge Embeddings for Persona-Grounded Dialogue
arXiv:2606.13142v1 · Announce Type: new
Abstract: Persona-grounded dialogue systems aim to produce responses consistent with a speaker's persona, yet existing methods treat personas as a flat set of sentences and fail to model the high-order relations among persona…

arXiv — NLP / Computation & Language (research) · 20d ago
SICI: A Semantic-Pragmatic Complexity Index Reveals Regime Shifts in LLM Stance Detection
arXiv:2606.13189v1 · Announce Type: new
Abstract: Prompt-based LLMs are increasingly used for stance detection, but harder examples are not always repaired by clearer instructions, reasoning prompts, retrieval, or debate. We introduce SICI (Stance Inference Complexity Index), a…

arXiv — NLP / Computation & Language (research) · 20d ago
PolyAlign: Conditional Human-Distribution Alignment
arXiv:2606.13227v1 · Announce Type: new
Abstract: Post-training methods such as supervised fine-tuning (SFT) and preference optimization typically align language models toward a single global assistant behavior. While effective for improving average helpfulness, this can suppress…

arXiv — NLP / Computation & Language (research) · 20d ago
Leveraging Audio-LLMs to Filter Speech-to-Speech Training Data
arXiv:2606.13507v1 · Announce Type: new
Abstract: Large-scale mined corpora provide abundant training data for end-to-end speech-to-speech translation (S2ST) but may contain noise, misalignment, and semantic errors. Filtering noisy data is crucial to maintain robust speech…

arXiv — NLP / Computation & Language (research) · 20d ago
When Does Mixing Help? Analyzing Query Embedding Interpolation in Multilingual Dense Retrieval
arXiv:2606.13537v1 · Announce Type: new
Abstract: While mixed-language querying is ubiquitous in multilingual communities, the sensitivity of dense retrievers to such queries remains poorly understood. We present a ratio-controlled study on mMARCO that systematically evaluates…

arXiv — NLP / Computation & Language (research) · 20d ago
SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation
arXiv:2606.13647v1 · Announce Type: new
Abstract: We introduce SkMTEB, the first comprehensive MTEB-style text embedding benchmark for Slovak, a low-resource West Slavic language, comprising 31 datasets across 7 task types -- nearly 4× the depth of existing multilingual…

arXiv — NLP / Computation & Language (research) · 20d ago
Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning
arXiv:2606.13680v1 · Announce Type: new
Abstract: Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex reasoning…

arXiv — NLP / Computation & Language (research) · 20d ago
Identifiability Without Gaussianity: Symbolic World Models and Near-Infinite Temporal Consistency
arXiv:2606.12471v1 · Announce Type: cross
Abstract: Klindt, LeCun, and Balestriero (arXiv:2605.26379) proved that Joint-Embedding Predictive Architectures (JEPAs) achieve linear identifiability, the linear recovery of the world's true latent variables, if and only if the world's…

arXiv — NLP / Computation & Language (research) · 20d ago
PersonaDrive: Human-Style Retrieval-Augmented VLA Agents for Closed-Loop Driving Simulation
arXiv:2606.12616v1 · Announce Type: cross
Abstract: Closed-loop driving simulators typically populate their environments with non-ego traffic agents that behave largely the same way, produced either by rule-based traffic managers or by learned models trained toward a single…

arXiv — NLP / Computation & Language (research) · 20d ago
MiniPIC: Flexible Position-Independent Caching in <100LOC
arXiv:2606.13126v1 · Announce Type: cross
Abstract: Retrieval-augmented and agentic workloads repeatedly prefill recurring predictable structured inputs (which we call "spans") such as documents and code files. Yet, prefix caching in engines such as vLLM cannot reuse their KV…

arXiv — NLP / Computation & Language (research) · 20d ago
ComAct: Reframing Professional Software Manipulation via COM-as-Action Paradigm
arXiv:2606.13239v1 · Announce Type: cross
Abstract: Existing computer-use agents remain fundamentally limited in professional software manipulation: GUI-based agents suffer from fragile visual grounding and long-horizon error accumulation, while API-based approaches struggle with…

arXiv — NLP / Computation & Language (research) · 20d ago
TimeLens: On-Device Artifact Recognition with Retrieval-Augmented Question Answering for the Grand Egyptian Museum
arXiv:2606.13267v1 · Announce Type: cross
Abstract: TimeLens is an AI-powered bilingual mobile guide for the Grand Egyptian Museum (GEM). Pointing a phone at an exhibit, a visitor sees the artifact recognized in real time and can ask follow-up questions answered in English or…

arXiv — NLP / Computation & Language (research) · 20d ago
Uncertainty-Aware Hybrid Retrieval for Long-Document RAG
arXiv:2606.13550v1 · Announce Type: cross
Abstract: Retrieval augmented generation (RAG) depends critically on the quality and granularity of retrieved evidence. Large retrieval units preserve context but often introduce irrelevant content, which can dilute answer bearing evidence…

Hugging Face Daily Papers (research) · 20d ago
N-GRPO: Embedding-Level Neighbor Mixing for Enhanced Policy Optimization
Abstract: N-GRPO, a novel exploration strategy within GRPO framework, enhances mathematical reasoning in large language models through semantic neighbor mixing that maintains semantic consistency while injecting diversity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct
The success…

Hugging Face Daily Papers (research) · 21d ago
Time-Series Foundation Model Embeddings for Remaining Useful Life Estimation
Abstract: A lightweight approach combining a frozen pretrained time-series foundation model with a simple regression head achieves superior RUL prediction performance compared to various baseline methods on industrial sensor data. Generated by Qwen/Qwen2.5-Coder-32B-Instruct…

arXiv — Machine Learning (research) · 21d ago
RoVE: Rotary Value Embeddings Attention for Relative Position-dependent Value Pathways
arXiv:2606.11275v1 · Announce Type: new
Abstract: Rotary Position Embeddings (RoPE) make attention scores position-relative but leave the value pathway position-blind: the message sent by a value token is the same regardless of its distance from the query. We propose RoVE, a…

arXiv — Machine Learning (research) · 21d ago
RePAIR: Predictive Self-Supervised Representation Learning in Chess
arXiv:2606.11860v1 · Announce Type: new
Abstract: In this paper, we introduce Representation Prediction via Autoencoding using Iterative Refinement (RePAIR) - a novel self-supervised representation learning architecture that synthesizes Masked Autoencoders (MAE), Joint Embedding…

arXiv — Machine Learning (research) · 21d ago
Time-Series Foundation Model Embeddings for Remaining Useful Life Estimation
arXiv:2606.11990v1 · Announce Type: new
Abstract: Remaining Useful Life (RUL) prediction is essential for industrial predictive maintenance, yet many learning-based approaches rely on extensive feature engineering or large labeled datasets to train task-specific sequence models.…

arXiv — Machine Learning (research) · 21d ago
Bootstrapped Monitoring: Leveraging Transparent Reasoning to Oversee Stronger AI Agents
arXiv:2606.11998v1 · Announce Type: new
Abstract: Trusted monitoring is a cornerstone of AI control. However, as frontier models grow more capable, the increasing capabilities gap between trusted and untrusted models may render trusted models unreliable monitors. We introduce…

arXiv — Machine Learning (research) · 21d ago
nD-RoPE: A Generalized RoPE for n-Dimensional Position Embedding
arXiv:2606.12146v1 · Announce Type: new
Abstract: Rotary Position Embedding (RoPE) is widely adopted in Transformer models, yet its extension to high-dimensional domains lacks a unified theoretical formulation. Most existing approaches either apply rotations independently along…

arXiv — Machine Learning (research) · 21d ago
Fourier Features Let Agents Learn High Precision Policies with Imitation Learning
arXiv:2606.12334v1 · Announce Type: new
Abstract: High-precision robotic manipulation requires fine-grained spatial reasoning that is often difficult to achieve with RGB-only policies due to depth ambiguity and perspective scale issues. Policies that leverage 3D information…

arXiv — NLP / Computation & Language (research) · 21d ago
The Structural Attention Tax: How Retrieval Format Hijacks In-Context Learning Independent of Content
arXiv:2606.11198v1 · Announce Type: new
Abstract: Retrieval-augmented generation (RAG) systems inject external knowledge to improve LLM outputs, yet the format of injected content -- distinct from its semantic relevance -- can independently distort the model's attention…

arXiv — NLP / Computation & Language (research) · 21d ago
NightFeats @ MMU-RAGent NeurIPS 2025: A Context-Optimized Multi-Agent RAG System for the Text-to-Text Track
arXiv:2606.11199v1 · Announce Type: new
Abstract: We present NightFeats, a structured multi-agent retrieval-augmented generation (RAG) system submitted to the MMU-RAGent competition at NeurIPS 2025, where it was awarded Best Dynamic Evaluation in the text-to-text track. Rather…

arXiv — NLP / Computation & Language (research) · 21d ago
EverydayGPT: Confidence-Gated Routing for Efficient and Safe Hybrid GPT-RAG Conversational QA
arXiv:2606.11212v1 · Announce Type: new
Abstract: Standard Retrieval-Augmented Generation (RAG) pipelines route every query through retrieval and generation unconditionally, incurring unnecessary computation and propagating low-quality context to the generator. We introduce…

arXiv — NLP / Computation & Language (research) · 21d ago
Energy-Efficient On-Device RAG on a Mobile NPU: System Design and Benchmark on Snapdragon X Elite
arXiv:2606.11257v1 · Announce Type: new
Abstract: Retrieval-Augmented Generation (RAG) pipelines are compute-intensive, combining embedding, retrieval, reranking, and large language model (LLM) generation. Running them entirely on-device benefits privacy, latency, and offline use,…

arXiv — NLP / Computation & Language (research) · 21d ago
When More Documents Hurt RAG: Mitigating Vector Search Dilution with Domain-Scoped, Model-Agnostic Retrieval
arXiv:2606.11350v1 · Announce Type: new
Abstract: Retrieval-augmented generation degrades when scaled to large, heterogeneous document collections, where dense similarity loses discriminative power, and top-k retrieval increasingly returns semantically similar but contextually…

arXiv — NLP / Computation & Language (research) · 21d ago
When Probing Accuracy Saturates, Fragility Resolves: A Complementary Metric for LLM Pre-Training Analysis
arXiv:2606.11375v1 · Announce Type: new
Abstract: Standard linear probing declares a property "encoded" when a classifier on hidden states achieves high accuracy. The protocol works well on a snapshot but breaks across pre-training: probe accuracy saturates within the first few…