Research papers · 500 articles archived under #paper
arXiv — NLP / Computation & Language research 1d ago From Propositional to Perceptual Asymmetry: Extending Frictive Policy Optimization to Asymmetric Partial Information Dialogue arXiv:2606.30973v1 Announce Type: new Abstract: Frictive Policy Optimization (FPO; Pustejovsky et al., 2025) treats friction in collaborative dialogue -- misalignment, misunderstanding, repair -- as an epistemic signal essential to common-ground construction, rather than noise… 18 arXiv — NLP / Computation & Language research 1d ago Measuring Judgment Quality in Natural-Language Explanations: Evidence from Forecasting Tournaments arXiv:2606.30987v1 Announce Type: new Abstract: Decision-makers routinely rely on expert judgments accompanied by written explanations, yet explanation quality is difficult to measure at scale. Forecasting tournaments offer a natural testing ground: probabilistic judgments are… 6 arXiv — NLP / Computation & Language research 1d ago Wait, am I Being Fair? Characterizing Deductive Stereotyping and Mitigating It with Fair-GCG arXiv:2606.30989v1 Announce Type: new Abstract: Warning: This paper contains several toxic and offensive statements. While reasoning generally improves fairness in recent large language models (LLMs), failures persist. In this work, we identify a failure mode, deductive… 5 arXiv — NLP / Computation & Language research 1d ago CORTEX: Token-Level Hallucination Detection in RAG via Comparative Internal Representations arXiv:2606.31033v1 Announce Type: new Abstract: In this paper, we propose CORTEX, a token-level hallucination detection method for Retrieval-Augmented Generation (RAG). In long-form RAG outputs, hallucinations often arise in localized spans rather than throughout an entire… 20 arXiv — NLP / Computation & Language research 1d ago Truth or Sophistry? 
LoFa: A Benchmark for LLM Robustness Against Logical Fallacies arXiv:2606.31039v1 Announce Type: new Abstract: Large Language Models (LLMs) exhibit strong semantic capabilities, yet their resilience to manipulative linguistic patterns such as logical fallacies remains underexplored. Prior work has primarily examined whether LLMs can… 12 arXiv — NLP / Computation & Language research 1d ago A Semantic-Layer-Mediated Agent for Natural Language to SQL over Heterogeneous Enterprise Databases arXiv:2606.31041v1 Announce Type: new Abstract: Natural language-to-SQL (NL2SQL) over real-world enterprise databases remains significantly more challenging than on academic benchmarks. Enterprise schemas often contain hundreds of physical tables with cryptic column names,… 12 arXiv — NLP / Computation & Language research 1d ago Reference-Based Prosody and Rhythm Evaluation for Spoken Dialogue Systems arXiv:2606.31055v1 Announce Type: new Abstract: Speech-to-speech (S2S) AI agents are advancing rapidly, yet evaluation lacks interpretable speech-native measures for conversational prosody and rhythm. Because $F_0$, speaking rate, articulation rate, and pausing shift with… 7 arXiv — NLP / Computation & Language research 1d ago Exploring the relationship between team institutional composition and novelty in academic papers based on fine-grained knowledge entities arXiv:2606.31058v1 Announce Type: new Abstract: The composition of author teams is an important factor influencing the novelty of academic papers. However, existing studies have paid limited attention to the role of institutional composition, and most novelty measures remain at… 22 arXiv — NLP / Computation & Language research 1d ago Building a Multimodal Dataset of Academic Paper for Keyword Extraction arXiv:2606.31069v1 Announce Type: new Abstract: Up to this point, keyword extraction task typically relies solely on textual data. 
Neglecting visual details and audio features from image and audio modalities leads to deficiencies in information richness and overlooks potential… 14 arXiv — NLP / Computation & Language research 1d ago Triospect: A Three-Dimensional Framework for Robust Statistical AI-Generated Text Detection Against Diverse Attacks arXiv:2606.31074v1 Announce Type: new Abstract: Existing AI-generated text detectors are vulnerable to attacks that manipulate textual characteristics. In this study, we propose a novel Triospect Detection Framework by using additional perspectives of content (core ideas) and… 37 arXiv — NLP / Computation & Language research 1d ago When Reranking Hurts: Uncertainty-Based Gating for Few-Shot Reranking arXiv:2606.31087v1 Announce Type: new Abstract: Few-shot selection typically assumes that reranking retrieved examples always improves performance. We challenge this view by identifying that the expensive reranking step can in fact degrade performance. Instead, we propose… 4 arXiv — NLP / Computation & Language research 1d ago What Counts as an Error? Dual-Reference Benchmarking for Atypical ASR arXiv:2606.31112v1 Announce Type: new Abstract: ASR systems have been often reported to underperform on atypical speech. 
An often conflated compounding factor is the existence of two valid transcription references: verbatim (actual produced speech, including… 31 arXiv — NLP / Computation & Language research 1d ago SeKV: Resolution-Adaptive KV Cache with Hierarchical Semantic Memory for Long-Context LLM Inference arXiv:2606.31145v1 Announce Type: new Abstract: Large language models increasingly operate over long contexts, where the KV cache becomes a dominant memory bottleneck: its size grows linearly with sequence length and must be retained throughout decoding, making full GPU caching… 11 arXiv — NLP / Computation & Language research 1d ago TAG-DLM: Diffusion Language Models for Text-Attributed Graph Learning arXiv:2606.31166v1 Announce Type: new Abstract: Text-attributed graphs (TAGs), where each node carries a natural language description, require models to jointly reason over text and graph topology. Existing approaches often handle the two modalities separately: graph neural… 8 arXiv — NLP / Computation & Language research 1d ago Gated Multi-Graph Fusion via Graph Attention Networks for Alzheimer's Disease Detection arXiv:2606.31186v1 Announce Type: new Abstract: Spontaneous speech is a vital non-invasive biomarker for Alzheimer's Disease (AD), yet many systems overlook non-linear structural disruptions and clinical heterogeneity in pathological language. We propose a Multi-View Gated Graph… 31 arXiv — NLP / Computation & Language research 1d ago Can LLMs Imagine Moral Alternatives Beyond Binary Dilemmas? arXiv:2606.31213v1 Announce Type: new Abstract: As large language models (LLMs) are increasingly deployed as moral advisors and agents, they need to address dilemmas between two competing values. 
However, existing research on LLMs with moral dilemmas overlooks a central aspect… 11 arXiv — NLP / Computation & Language research 1d ago Probing Stylistic Appropriation using Large Language Models: An Evaluation Framework for Copyright Infringement under EU Law arXiv:2606.31250v1 Announce Type: new Abstract: Large language models (LLM) trained on web-scale corpora generate output that may infringe copyright, yet existing technical safeguards focus narrowly on verbatim memorisation. EU copyright doctrine applies a broader standards:… 36 arXiv — NLP / Computation & Language research 1d ago When the Database Fails: Prompting LLM Dialogue Agents for Safe Recovery in Task-Oriented Dialogue arXiv:2606.31307v1 Announce Type: new Abstract: Large language models used in task-oriented dialogue often produce fluent but unsafe responses when backend database calls fail, return empty results, or surface mismatched information, inventing venues, confirmations, or booking… 16 arXiv — NLP / Computation & Language research 1d ago LOPA: Enhancing Spoken Language Assessment via Latent Ordinal Prototype Alignment arXiv:2606.31310v1 Announce Type: new Abstract: Fueled by increasing model scale and multimodal inputs, Multimodal Large Language Models (MLLMs) have emerged as a promising paradigm for Spoken Language Assessment (SLA). While effective, this paradigm often overlooks the… 9 arXiv — NLP / Computation & Language research 1d ago BlockPilot: Instance-Adaptive Policy Learning for Diffusion-based Speculative Decoding arXiv:2606.31315v1 Announce Type: new Abstract: Speculative decoding accelerates inference by using a lightweight draft model to generate candidate tokens in parallel, and are then verified by the target model, enabling lossless acceleration. 
Recently, diffusion-based… 20 arXiv — NLP / Computation & Language research 1d ago Linguistic Bias Mitigation for Spoofing Detection via Gradient Reversal and A Variational Information Bottleneck arXiv:2606.31411v1 Announce Type: new Abstract: Rapid advancements in generative speech technology have compromised the reliability of voice biometrics. While current spoofing detectors excel when assessed under in-domain conditions, generalisation to out-of-domain settings is… 4 arXiv — NLP / Computation & Language research 1d ago Clinically Structured Rank-Gated LoRA for Cross-Benchmark Medical Question Answering arXiv:2606.31432v1 Announce Type: new Abstract: Medical multiple-choice question answering requires parameter-efficient adaptation across heterogeneous knowledge domains and reasoning operations. A medication question, a diagnostic decision, a public-health item, and a… 33 arXiv — NLP / Computation & Language research 1d ago Revising RVL-CDIP: Quantifying Errors and Test-Train Overlap arXiv:2606.31446v1 Announce Type: new Abstract: RVL-CDIP is a popular dataset for benchmarking document classifiers. However, the dataset contains ample amounts of label errors as well as non-trivial amounts of test-train overlap, both of which may impact model performance… 25 arXiv — NLP / Computation & Language research 1d ago Team MKC at CLPsych 2026: Capturing and Characterizing Mental Health Changes through Social Media Timeline Dynamics arXiv:2606.31464v1 Announce Type: new Abstract: Recent advances in Large Language Models (LLMs) have motivated their adoption across a wide range of domains, including Artificial Intelligence (AI) for mental health. 
Given the growing prevalence of mental health disorders… 19 arXiv — NLP / Computation & Language research 1d ago Building an ASR Solution for Training and Assessing Children's Reading arXiv:2606.31508v1 Announce Type: new Abstract: Automatic speech recognition for children's reading remains underdeveloped for most African languages, including Bambara, despite its potential value for reproducible literacy assessment. We present an open-source system for… 30 arXiv — NLP / Computation & Language research 1d ago FinPersona-Bench: A Benchmark for Longitudinal Psychometric Stability of Autonomous Financial Agents arXiv:2606.31522v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed as autonomous financial agents initialized with explicit behavioral mandates such as "preserve capital" or "avoid speculative bets" that are meant to govern every decision… 19 arXiv — NLP / Computation & Language research 1d ago AutoTrainess: Teaching Language Models to Improve Language Models Autonomously arXiv:2606.31551v1 Announce Type: new Abstract: Training language models (LMs) remains a highly human-intensive process, even as frontier language model agents become increasingly capable at software engineering and other long-horizon tasks. 
A central challenge is that… 37 arXiv — NLP / Computation & Language research 1d ago Robust Text Watermarking for Large Language Models via Dual Semantic Embeddings arXiv:2606.31602v1 Announce Type: new Abstract: This work presents Dual-Embedding Watermarking (DEW), a semantic watermarking scheme for large language models (LLMs) that leverages contextual and token-level embeddings to enhance robustness against paraphrasing and translation.… 8 arXiv — NLP / Computation & Language research 1d ago CLExEval: A Human-in-the-Loop Framework for Qualitative Evaluation of LLM Clinical Reasoning arXiv:2606.31608v1 Announce Type: new Abstract: Large Language Models (LLMs) achieve strong results on many medical benchmarks, but their clinical reasoning remains difficult to evaluate reliably. A central risk is an evaluation illusion: fluent and well-structured explanations… 37 arXiv — NLP / Computation & Language research 1d ago Tone-Conditioned Curriculum Learning for Low-Resource Bantu Speech Recognition arXiv:2606.31642v1 Announce Type: new Abstract: Southern Bantu languages are spoken by over 80 million people, yet current foundation ASR models still produce zero-shot WER above 100%, which limits practical use in education and public services. We addressed this gap with a tone… 18 arXiv — NLP / Computation & Language research 1d ago Moral Safety in LLMs: Exposing Performative Compliance with Puzzled Cues arXiv:2606.31644v1 Announce Type: new Abstract: As large language models take on morally consequential roles in healthcare, legal, and hiring contexts, we need to examine whether their ethical behaviors are genuine or superficial. 
We show that current fairness evaluations… 5 arXiv — NLP / Computation & Language research 1d ago Overview of the TalentCLEF 2026: Skill and Job Title Intelligence for Human Capital Management arXiv:2606.31692v1 Announce Type: new Abstract: This paper presents an overview of the second edition of the TalentCLEF challenge, organized as a Lab at the Conference and Labs of the Evaluation Forum (CLEF) 2026. TalentCLEF is an initiative aimed at advancing Natural Language… 19 arXiv — NLP / Computation & Language research 1d ago Cross-lingual Relation Extraction with Large Language Models: Zero-Shot, Few-Shot, and Fine-Tuned Evaluation on Romanian arXiv:2606.31718v1 Announce Type: new Abstract: Relation extraction (RE) for low-resource languages is typically constrained by the lack of annotated corpora. We investigate the feasibility of cross-lingual RE for Romanian by combining automatic dataset translation with large… 38 arXiv — NLP / Computation & Language research 1d ago Seeing Is Not Sharing: Some Vision-Language Models Overestimate Common Ground in Asymmetric Dialogue arXiv:2606.31719v1 Announce Type: new Abstract: In collaborative dialogue, shared perception does not guarantee shared interpretation. Mutual understanding must be established through interaction. We investigate whether vision-language models (VLMs) can distinguish what could be… 22 arXiv — NLP / Computation & Language research 1d ago Adapting Foundation ASR Models to Dysarthric Speech: A Case Study arXiv:2606.31722v1 Announce Type: new Abstract: Automatic speech recognition (ASR) systems often perform poorly in dysarthric speech, limiting their usefulness to affected speakers in everyday communication. 
This paper presents a personalized ASR system for a dysarthric speaker,… 11 arXiv — NLP / Computation & Language research 1d ago STEB: Style Text Embedding Benchmark arXiv:2606.31741v1 Announce Type: new Abstract: While semantic embeddings are rigorously evaluated on the Massive Text Embedding Benchmark, the evaluation of style embeddings remains fragmented, with each work relying on their own set of tasks and datasets. To bridge this gap,… 27 arXiv — NLP / Computation & Language research 1d ago CHERRY: Compressed Hierarchical Experts with Recurrent Representational Yield arXiv:2606.31796v1 Announce Type: new Abstract: We study three complementary techniques for training compute-efficient language models. (1) Selective supervision and per-token efficiency. Selective Ground Truth Token Training (SGT) concentrates supervision on the ~15% of output… 14 arXiv — NLP / Computation & Language research 1d ago Explicit Fuzzy Logic in the Feed-Forward Layer: Self-Forgetting Quantifiers Discover Legible Grammatical-Licensing Detectors arXiv:2606.31845v1 Announce Type: new Abstract: A transformer's feed-forward (FFN) sublayer materializes the distinctions attention gathers, yet gives no account of what it computes. In a parameter-neutral replacement, each hidden unit is an explicit fuzzy set operation on… 35 arXiv — NLP / Computation & Language research 1d ago Theory of Mind and Persuasion Beyond Conversation: Assessing the Capacity of LLMs to Induce Belief States via Planning and Action arXiv:2606.31916v1 Announce Type: new Abstract: Theory of Mind (ToM) benchmarks for Large Language Models (LLMs) typically rely on passive question-answering formats, but the deployment of LLMs in increasingly agentic and autonomous forms demands new evaluations. 
In this paper… 25 arXiv — NLP / Computation & Language research 1d ago LuxEmo: Expressive Text-to-Speech Corpus for Luxembourgish arXiv:2606.31947v1 Announce Type: new Abstract: State-of-the-art speech datasets predominantly focus on widely spoken languages, often overlooking low-resource languages such as Luxembourgish, which remain underrepresented in speech technology research. In this work, we… 25 arXiv — NLP / Computation & Language research 1d ago DigitalCoach: Communication and Grounding Gaps in Human and Agentic Computer Use Coaching arXiv:2606.31980v1 Announce Type: new Abstract: Agents are increasingly capable of automating software tasks, but can they teach humans how to use software themselves? We introduce DigitalCoach, a multimodal dataset of 72 human expert-novice computer use coaching sessions… 36 arXiv — NLP / Computation & Language research 1d ago Scalable Behaviour Cloning on Browser Using via Skill Distillation arXiv:2606.32014v1 Announce Type: new Abstract: Internet users collectively perform an enormous range of skilled work through web browsers, from software development and document editing to search, forms, and enterprise workflows, making human browsing a highly scalable but… 16 arXiv — NLP / Computation & Language research 1d ago Generative Skill Composition for LLM Agents arXiv:2606.32025v1 Announce Type: new Abstract: Recent LLM agents benefit from skills for solving complex tasks. Skills encapsulate modular packages of procedural knowledge and instructions for performing specialized tasks, such as setting up a sandboxed environment, running a… 34 arXiv — NLP / Computation & Language research 1d ago When LLMs Read Tables Carelessly: Measuring and Reducing Data Referencing Errors arXiv:2606.32029v1 Announce Type: new Abstract: While large language models (LLMs) perform well on table tasks, they still make data referencing errors (DREs), i.e., incorrectly citing or omitting table values, despite understanding the table structure. 
Beyond final-answer… 29 arXiv — NLP / Computation & Language research 1d ago Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs arXiv:2606.32032v1 Announce Type: new Abstract: Metacognition is a critical component of intelligence that describes the ability to monitor and regulate one's own cognitive processes. Yet LLMs exhibit systemic deficiencies in key metacognitive faculties: they hallucinate with… 34 arXiv — NLP / Computation & Language research 1d ago Introspective Coupling: Self-Explanation Training Tracks Behavioral Change Despite Fixed Supervision arXiv:2606.32038v1 Announce Type: new Abstract: When does training language models (LMs) to generate explanations of their predictions yield faithful introspection, rather than superficial imitation? We study LMs trained to explain which features of their inputs influenced their… 30 arXiv — NLP / Computation & Language research 1d ago ASR-Agnostic Multimodal Spectrotemporal Modeling for Early Dementia Detection arXiv:2606.30646v1 Announce Type: cross Abstract: Speech recruits the same executive, attentional, and working memory processes underlying instrumental activities of daily living, or IADLs, providing a non-invasive proxy for cognitive assessment. Yet most speech-based dementia… 18 arXiv — NLP / Computation & Language research 1d ago Emergent Culture in Minimal LLM Systems arXiv:2606.30668v1 Announce Type: cross Abstract: What happens when LLM agents operate with no context outside a turn, minimal prompting, and simple tools? 
Inspired by swarm engineering, we give collectives of three agents the ability to send messages and manipulate a shared… 22 arXiv — NLP / Computation & Language research 1d ago Information Terra: A Narrative-Anchored Semantic-First Projection of Document Embeddings arXiv:2606.30824v1 Announce Type: cross Abstract: We introduce Information Terra, a narrative-anchored semantic-first projection that places a document corpus on an Earth-like globe whose poles are two user-chosen endpoint documents and whose prime meridian is the great-circle… 28 arXiv — NLP / Computation & Language research 1d ago When Does Learning to Stop Help? A Cost-Aware Study of Early Exits in Reasoning Models arXiv:2606.30852v1 Announce Type: cross Abstract: Reasoning models spend different amounts of useful computation across instances, but it remains unclear when a learned stopping rule improves over simple confidence or convergence thresholds. We study this question with… 11