arXiv — NLP / Computation & Language
-
arXiv — NLP / Computation & Language research 1d ago
LuxEmo: Expressive Text-to-Speech Corpus for Luxembourgish
arXiv:2606.31947v1 Announce Type: new Abstract: State-of-the-art speech datasets predominantly focus on widely spoken languages, often overlooking low-resource languages such as Luxembourgish, which remain underrepresented in speech technology research. In this work, we…
-
DigitalCoach: Communication and Grounding Gaps in Human and Agentic Computer Use Coaching
arXiv:2606.31980v1 Announce Type: new Abstract: Agents are increasingly capable of automating software tasks, but can they teach humans how to use software themselves? We introduce DigitalCoach, a multimodal dataset of 72 human expert-novice computer use coaching sessions…
-
Scalable Behaviour Cloning on Browser Using via Skill Distillation
arXiv:2606.32014v1 Announce Type: new Abstract: Internet users collectively perform an enormous range of skilled work through web browsers, from software development and document editing to search, forms, and enterprise workflows, making human browsing a highly scalable but…
-
Generative Skill Composition for LLM Agents
arXiv:2606.32025v1 Announce Type: new Abstract: Recent LLM agents benefit from skills for solving complex tasks. Skills encapsulate modular packages of procedural knowledge and instructions for performing specialized tasks, such as setting up a sandboxed environment, running a…
-
When LLMs Read Tables Carelessly: Measuring and Reducing Data Referencing Errors
arXiv:2606.32029v1 Announce Type: new Abstract: While large language models (LLMs) perform well on table tasks, they still make data referencing errors (DREs), i.e., incorrectly citing or omitting table values, despite understanding the table structure. Beyond final-answer…
-
Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs
arXiv:2606.32032v1 Announce Type: new Abstract: Metacognition is a critical component of intelligence that describes the ability to monitor and regulate one's own cognitive processes. Yet LLMs exhibit systemic deficiencies in key metacognitive faculties: they hallucinate with…
-
Introspective Coupling: Self-Explanation Training Tracks Behavioral Change Despite Fixed Supervision
arXiv:2606.32038v1 Announce Type: new Abstract: When does training language models (LMs) to generate explanations of their predictions yield faithful introspection, rather than superficial imitation? We study LMs trained to explain which features of their inputs influenced their…
-
ASR-Agnostic Multimodal Spectrotemporal Modeling for Early Dementia Detection
arXiv:2606.30646v1 Announce Type: cross Abstract: Speech recruits the same executive, attentional, and working memory processes underlying instrumental activities of daily living, or IADLs, providing a non-invasive proxy for cognitive assessment. Yet most speech-based dementia…
-
Emergent Culture in Minimal LLM Systems
arXiv:2606.30668v1 Announce Type: cross Abstract: What happens when LLM agents operate with no context outside a turn, minimal prompting, and simple tools? Inspired by swarm engineering, we give collectives of three agents the ability to send messages and manipulate a shared…
-
ViTL: Temporal Logic-Guided Zero-Shot Natural Language Navigation via Vision-Language Models
arXiv:2606.30696v1 Announce Type: cross Abstract: Enabling robots to follow natural language commands to complete zero-shot long-horizon tasks remains challenging. It requires extracting implicit temporal and logical constraints from natural language commands and executing…
-
From Search to Synthesis: Training LLMs as Zero-Shot Workflow Generators
arXiv:2606.30704v1 Announce Type: cross Abstract: Large language models (LLMs) excel across a wide range of tasks, yet their instance-specific solutions often lack the structural consistency needed for reliable deployment. Workflows that encode recurring algorithmic patterns at…
-
Revocable Learned State via Process Sidecars
arXiv:2606.30788v1 Announce Type: cross Abstract: Language models are often adapted in stages: a public skill phase, a private memory phase, and a later safety phase that learns to refuse outputs tied to the remembered entities. Revoking the memory after the safety phase is not…
-
Information Terra: A Narrative-Anchored Semantic-First Projection of Document Embeddings
arXiv:2606.30824v1 Announce Type: cross Abstract: We introduce Information Terra, a narrative-anchored semantic-first projection that places a document corpus on an Earth-like globe whose poles are two user-chosen endpoint documents and whose prime meridian is the great-circle…
-
When Does Learning to Stop Help? A Cost-Aware Study of Early Exits in Reasoning Models
arXiv:2606.30852v1 Announce Type: cross Abstract: Reasoning models spend different amounts of useful computation across instances, but it remains unclear when a learned stopping rule improves over simple confidence or convergence thresholds. We study this question with…
-
Beyond Compilation: Evaluating Faithful Natural-Language-to-Lean Statement Formalization
arXiv:2606.31002v1 Announce Type: cross Abstract: Theorem-proving benchmarks evaluate proof search against fixed formal statements, but natural-language-to-Lean formalization must generate the formal statement itself. In this setting, compilation is only a validity check: a Lean…
-
ADAPT: Attention Dynamics Alignment with Preference Tuning for Faithful MLLMs
arXiv:2606.31054v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) are critically hampered by hallucination, generating content inconsistent with the provided image. In this paper, we identify an internal signature of hallucination: progressive…
-
Usage frequency and application variety of research methods in library and information science: Continuous investigation from 1991 to 2021
arXiv:2606.31081v1 Announce Type: cross Abstract: The present study analyzed over 26,000 research articles published between 1991 and 2021 in twenty-one major LIS (Library and Information Science) journals, using the machine learning (ML) approach to categorize the research…
-
UniSAE: Unified Speech Attribute Editing on Speaker, Emotion and Low-Level Content via Discrete Phonetic Posteriorgram Modelling
arXiv:2606.31128v1 Announce Type: cross Abstract: Speech editing aims to modify specific portions of an utterance while preserving the remaining speech. Existing approaches primarily focus on word-level content modification and typically treat content, speaker, and emotion…
-
PruneGround: Plug-and-play Spatial Pruning for 3D Visual Grounding
arXiv:2606.31148v1 Announce Type: cross Abstract: 3D Visual Grounding (3DVG) aims to localize target objects in 3D scenes given natural language descriptions. Existing approaches typically perform reasoning over the entire scene, leading to ambiguous predictions and high…
-
ComplianceGate: Classifier-Gated Multi-Tier LLM Routing for Inference in Regulated Industries
arXiv:2606.31163v1 Announce Type: cross Abstract: Large language models deployed in regulated industries operate under two constraints: compliance enforcement and cost efficiency. Personally identifiable information (PII) in user queries can reach model endpoints before the…
-
HealthAgentBench: A Unified Benchmark Suite of Realistic Agentic Healthcare Environments for Challenging Frontier AI Agents
arXiv:2606.31179v1 Announce Type: cross Abstract: As AI agents become increasingly capable of complex, long-horizon reasoning, rigorous and holistic evaluation is essential for measuring progress toward real-world healthcare applications. We introduce HealthAgentBench, a suite…
-
Learning from Failure: Inference-Time Self-Improvement for Computer-Use Agents
arXiv:2606.31270v1 Announce Type: cross Abstract: Computer-use agents, which leverage multimodal large language models (MLLMs) to operate computers and complete tasks, have attracted significant attention for their utility and versatility. A major challenge in developing these…
-
The Decomposition Is the Fingerprint: Per-Component Identity for Agent Skills
arXiv:2606.31272v1 Announce Type: cross Abstract: AI agents increasingly acquire and execute skills at runtime: bundles of prompt instructions, executable code, and tool declarations fetched from marketplaces and other agents. Governing them needs a stable notion of skill…
-
Calibrating the Evaluator: Does Probability Calibration Mitigate Preference Coupling in LLM Agent Feedback Loops?
arXiv:2606.31371v1 Announce Type: cross Abstract: When large language model (LLM) agents adapt their behavior through evaluator feedback, systematic evaluator biases propagate into the agent's learned strategy distribution - a phenomenon termed evaluator preference coupling.…
-
Visual Semantic Entropy: Do Vision Language Models Recognize Visual Ambiguity?
arXiv:2606.31407v1 Announce Type: cross Abstract: Vision-language models can produce confident answers on visually ambiguous inputs, resulting in biased predictions. Common entropy-based methods, such as Semantic Entropy (SE), rely on output diversity. Yet our analysis shows…
-
CDR-Bench: Evaluating Faithful Execution of Compositional, Order-Sensitive Data Refinement Recipes
arXiv:2606.31435v1 Announce Type: cross Abstract: Data refinement involves executing multi-step recipes over evolving text states, where both composition and execution order of processing operators determine the outcome. While existing benchmarks either isolate text editing or…
-
Fork-Think with Confidence
arXiv:2606.31484v1 Announce Type: cross Abstract: Parallel thinking has enjoyed great success for boosting LLM performance on reasoning tasks without the need for any re-training. However, existing methods follow a think-first-then-decide paradigm, i.e., they first sample…
-
Falsification, Not Exposure: An Internally Preregistered Placebo-Controlled Decomposition of Self-Repair Feedback in Frozen Small Code Models
arXiv:2606.31511v1 Announce Type: cross Abstract: In deployment settings where retraining is infeasible, small frozen code models are routinely asked to repair a failed program after seeing their own failing output, usually treated as a retry mechanism. From a Popperian view, a…
-
RaBitQCache: Rotated Binary Quantization for KVCache in Long Context LLM Inference
arXiv:2606.31519v1 Announce Type: cross Abstract: Long-context Large Language Model inference is severely bottlenecked by the massive Key-Value (KV) cache, yet existing sparse attention methods often suffer from static fixed-budget (Top-k) retrieval or rely on proxy scores that…
-
Modality-Driven Search with Holistic Trace Judging for ARC-AGI-2
arXiv:2606.31543v1 Announce Type: cross Abstract: Large language models can produce fluent, internally coherent reasoning traces for abstract reasoning tasks while still being confidently wrong - making selection among candidates, not just generation, the central challenge. I…
-
ShopX: A Foundation Model for Intent-to-Item Fulfillment in Agentic Shopping
arXiv:2606.31693v1 Announce Type: cross Abstract: The wave of AI-native applications is moving shopping beyond page- and feed-based browsing toward intent-driven experiences orchestrated by LLM agents. A common design wraps an LLM around existing search and recommendation…
-
RCT: A Robot-Collected Touch-Vision-Language Dataset for Tactile Generalization
arXiv:2606.31694v1 Announce Type: cross Abstract: For robots manipulating open-world objects, tactile representations must generalize to unseen materials. We introduce RCT (Robotic Contact Tactile), a robot-collected touch-vision-language dataset with 29,279 tactile frames from…
-
Bridging the Gap Between Latent and Explicit Reasoning with Looped Transformers
arXiv:2606.31779v1 Announce Type: cross Abstract: Language models typically reason via explicit chain-of-thought (CoT), generating intermediate steps token-by-token. Latent CoT offers an alternative: it performs multi-step reasoning in the model's hidden states, replacing…
-
SpikeLogBERT: Energy-Efficient Log Parsing Using Spiking Transformer Networks
arXiv:2606.31781v1 Announce Type: cross Abstract: Log parsing is a fundamental step in automated log analysis, transforming raw system logs into structured event templates for downstream tasks such as anomaly detection and system monitoring. Existing log parsing methods range…
-
Review Residuals: Update-Conditioned Residual Gating for Transformers
arXiv:2606.31859v1 Announce Type: cross Abstract: Residual connections add every sublayer's proposed update with a fixed coefficient of one; the network never evaluates whether an update is reliable before committing it. Drawing on the human-factors principle of independent…
-
Signed-Permutation Coordinate Transport for RMSNorm Transformers
arXiv:2606.31963v1 Announce Type: cross Abstract: Modern LLM workflows move coordinate-indexed objects across checkpoints: steering vectors, sparse autoencoders, top-$k$ neuron sets, attribution lists, and merge alignments. This is only well posed after fixing the model's…
-
MECoBench: A Systematic Study of Multimodal Agent Collaboration in Embodied Environments
arXiv:2606.31966v1 Announce Type: cross Abstract: Recent multimodal large language models (MLLMs) have strong potential as embodied agents, but their ability to collaborate in visually grounded environments remains underexplored. To address this gap, we introduce MECoBench, a…
-
SemRF: A Semantic Reference Frame for Residual-Stream Dynamics in Language Models
arXiv:2606.32022v1 Announce Type: cross Abstract: Residual-stream analysis asks how language-model computation evolves across depth, but intermediate decoding requires comparable readout coordinates across layers. If embedding anchors and unembedding readout disagree on the…
-
QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents
arXiv:2606.32034v1 Announce Type: cross Abstract: LLM agents increasingly act over long horizons, where a single trajectory can contain hundreds or thousands of actions. In these settings, outcome-only rewards provide too sparse guidance, failing to inform the model about the…
-
Learning by Surprise: Adaptive Mitigation of Model Collapse in Large Language Models
arXiv:2410.12341v4 Announce Type: replace Abstract: As AI-generated content increasingly populates the web, generative AI models are at growing risk of being trained on their own outputs, a process known as AI autophagy. This feedback loop has been shown to induce model…
-
Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection
arXiv:2502.15845v2 Announce Type: replace Abstract: Large Language Models (LLMs) often hallucinate, limiting their reliability in sensitive applications. In black-box settings, several self-consistency-based techniques have been proposed for hallucination detection. We…
-
SAGE: A Search-AuGmented Evaluation of Large Language Models on Free-Form QA
arXiv:2504.07385v3 Announce Type: replace Abstract: As Large Language Models (LLMs) become increasingly used for question-answering (QA), relying on static, pre-annotated references for evaluation poses significant challenges in cost, scalability, and completeness. Meanwhile,…
-
From Multimodal Perception to Strategic Reasoning: A Survey on AI-Generated Game Commentary
arXiv:2506.17294v3 Announce Type: replace Abstract: The advent of artificial intelligence has propelled AI-Generated Game Commentary (AI-GGC) into a rapidly expanding research area, offering advantages such as scalable availability and personalized narration. However, existing…
-
The Bidirectional Process Reward Model
arXiv:2508.01682v3 Announce Type: replace Abstract: Process Reward Models (PRMs), which assign fine-grained scores to intermediate reasoning steps within a solution trajectory, have emerged as a promising approach to enhance the reasoning quality of Large Language Models (LLMs).…
-
Rethinking On-policy Optimization for Query Augmentation
arXiv:2510.17139v3 Announce Type: replace Abstract: Recent advances in large language models (LLMs) have led to a surge of interest in query augmentation for information retrieval (IR). Two main approaches have emerged. The first prompts LLMs to generate answers or…
-
Distilling the Essence: Efficient Reasoning Distillation via Sequence Truncation
arXiv:2512.21002v3 Announce Type: replace Abstract: Distilling the capabilities from a large reasoning model (LRM) to a smaller student model often involves training on substantial amounts of reasoning data. However, knowledge distillation (KD) over lengthy sequences with prompt…
-
InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training
arXiv:2601.04126v3 Announce Type: replace Abstract: GUI agents that interact with graphical interfaces on behalf of users represent a promising direction for practical AI assistants. However, training such agents is hindered by the scarcity of suitable environments. We present…
-
What If We Allocate Test-Time Compute Adaptively?
arXiv:2602.01070v5 Announce Type: replace Abstract: Test-time compute scaling allocates inference computation uniformly, uses fixed sampling strategies, and applies verification only for reranking. In contrast, we propose a verifier-guided adaptive framework treating reasoning…
-
FairJudge: An Adaptive, Debiased, and Consistent LLM-as-a-Judge
arXiv:2602.06625v2 Announce Type: replace Abstract: Existing LLM-as-a-Judge systems suffer from three fundamental limitations: limited adaptivity to task- and domain-specific evaluation criteria, systematic biases driven by non-semantic cues such as position, length, format, and…
-
Beyond Scalar Rewards: Dense Feedback for LLM Policy Synthesis in Sequential Social Dilemmas
arXiv:2603.19453v3 Announce Type: replace Abstract: We propose an LLM harness that generates code-based policy functions for multi-agent environments, evaluates them with self-play, and refines them using feedback from previous iterations. Following the recent line of work in…