Research papers

500 articles archived under #paper

arXiv — NLP / Computation & Language research 1d ago

From Propositional to Perceptual Asymmetry: Extending Frictive Policy Optimization to Asymmetric Partial Information Dialogue

arXiv:2606.30973v1 Announce Type: new Abstract: Frictive Policy Optimization (FPO; Pustejovsky et al., 2025) treats friction in collaborative dialogue -- misalignment, misunderstanding, repair -- as an epistemic signal essential to common-ground construction, rather than noise…

18
arXiv — NLP / Computation & Language research 1d ago

Measuring Judgment Quality in Natural-Language Explanations: Evidence from Forecasting Tournaments

arXiv:2606.30987v1 Announce Type: new Abstract: Decision-makers routinely rely on expert judgments accompanied by written explanations, yet explanation quality is difficult to measure at scale. Forecasting tournaments offer a natural testing ground: probabilistic judgments are…

6
arXiv — NLP / Computation & Language research 1d ago

Wait, am I Being Fair? Characterizing Deductive Stereotyping and Mitigating It with Fair-GCG

arXiv:2606.30989v1 Announce Type: new Abstract: Warning: This paper contains several toxic and offensive statements. While reasoning generally improves fairness in recent large language models (LLMs), failures persist. In this work, we identify a failure mode, deductive…

5
arXiv — NLP / Computation & Language research 1d ago

CORTEX: Token-Level Hallucination Detection in RAG via Comparative Internal Representations

arXiv:2606.31033v1 Announce Type: new Abstract: In this paper, we propose CORTEX, a token-level hallucination detection method for Retrieval-Augmented Generation (RAG). In long-form RAG outputs, hallucinations often arise in localized spans rather than throughout an entire…

20
arXiv — NLP / Computation & Language research 1d ago

Truth or Sophistry? LoFa: A Benchmark for LLM Robustness Against Logical Fallacies

arXiv:2606.31039v1 Announce Type: new Abstract: Large Language Models (LLMs) exhibit strong semantic capabilities, yet their resilience to manipulative linguistic patterns such as logical fallacies remains underexplored. Prior work has primarily examined whether LLMs can…

12
arXiv — NLP / Computation & Language research 1d ago

A Semantic-Layer-Mediated Agent for Natural Language to SQL over Heterogeneous Enterprise Databases

arXiv:2606.31041v1 Announce Type: new Abstract: Natural language-to-SQL (NL2SQL) over real-world enterprise databases remains significantly more challenging than on academic benchmarks. Enterprise schemas often contain hundreds of physical tables with cryptic column names,…

12
arXiv — NLP / Computation & Language research 1d ago

Reference-Based Prosody and Rhythm Evaluation for Spoken Dialogue Systems

arXiv:2606.31055v1 Announce Type: new Abstract: Speech-to-speech (S2S) AI agents are advancing rapidly, yet evaluation lacks interpretable speech-native measures for conversational prosody and rhythm. Because $F_0$, speaking rate, articulation rate, and pausing shift with…

7
arXiv — NLP / Computation & Language research 1d ago

Exploring the relationship between team institutional composition and novelty in academic papers based on fine-grained knowledge entities

arXiv:2606.31058v1 Announce Type: new Abstract: The composition of author teams is an important factor influencing the novelty of academic papers. However, existing studies have paid limited attention to the role of institutional composition, and most novelty measures remain at…

22
arXiv — NLP / Computation & Language research 1d ago

Building a Multimodal Dataset of Academic Paper for Keyword Extraction

arXiv:2606.31069v1 Announce Type: new Abstract: Up to this point, keyword extraction task typically relies solely on textual data. Neglecting visual details and audio features from image and audio modalities leads to deficiencies in information richness and overlooks potential…

14
arXiv — NLP / Computation & Language research 1d ago

Triospect: A Three-Dimensional Framework for Robust Statistical AI-Generated Text Detection Against Diverse Attacks

arXiv:2606.31074v1 Announce Type: new Abstract: Existing AI-generated text detectors are vulnerable to attacks that manipulate textual characteristics. In this study, we propose a novel Triospect Detection Framework by using additional perspectives of content (core ideas) and…

37
arXiv — NLP / Computation & Language research 1d ago

When Reranking Hurts: Uncertainty-Based Gating for Few-Shot Reranking

arXiv:2606.31087v1 Announce Type: new Abstract: Few-shot selection typically assumes that reranking retrieved examples always improves performance. We challenge this view by identifying that the expensive reranking step can in fact degrade performance. Instead, we propose…

4
arXiv — NLP / Computation & Language research 1d ago

What Counts as an Error? Dual-Reference Benchmarking for Atypical ASR

arXiv:2606.31112v1 Announce Type: new Abstract: ASR systems have been often reported to underperform on atypical speech. An often conflated compounding factor is the existence of two valid transcription references: verbatim (actual produced speech, including…

31
arXiv — NLP / Computation & Language research 1d ago

SeKV: Resolution-Adaptive KV Cache with Hierarchical Semantic Memory for Long-Context LLM Inference

arXiv:2606.31145v1 Announce Type: new Abstract: Large language models increasingly operate over long contexts, where the KV cache becomes a dominant memory bottleneck: its size grows linearly with sequence length and must be retained throughout decoding, making full GPU caching…

11
arXiv — NLP / Computation & Language research 1d ago

TAG-DLM: Diffusion Language Models for Text-Attributed Graph Learning

arXiv:2606.31166v1 Announce Type: new Abstract: Text-attributed graphs (TAGs), where each node carries a natural language description, require models to jointly reason over text and graph topology. Existing approaches often handle the two modalities separately: graph neural…

8
arXiv — NLP / Computation & Language research 1d ago

Gated Multi-Graph Fusion via Graph Attention Networks for Alzheimer's Disease Detection

arXiv:2606.31186v1 Announce Type: new Abstract: Spontaneous speech is a vital non-invasive biomarker for Alzheimer's Disease (AD), yet many systems overlook non-linear structural disruptions and clinical heterogeneity in pathological language. We propose a Multi-View Gated Graph…

31
arXiv — NLP / Computation & Language research 1d ago

Can LLMs Imagine Moral Alternatives Beyond Binary Dilemmas?

arXiv:2606.31213v1 Announce Type: new Abstract: As large language models (LLMs) are increasingly deployed as moral advisors and agents, they need to address dilemmas between two competing values. However, existing research on LLMs with moral dilemmas overlooks a central aspect…

11
arXiv — NLP / Computation & Language research 1d ago

Probing Stylistic Appropriation using Large Language Models: An Evaluation Framework for Copyright Infringement under EU Law

arXiv:2606.31250v1 Announce Type: new Abstract: Large language models (LLM) trained on web-scale corpora generate output that may infringe copyright, yet existing technical safeguards focus narrowly on verbatim memorisation. EU copyright doctrine applies a broader standards:…

36
arXiv — NLP / Computation & Language research 1d ago

When the Database Fails: Prompting LLM Dialogue Agents for Safe Recovery in Task-Oriented Dialogue

arXiv:2606.31307v1 Announce Type: new Abstract: Large language models used in task-oriented dialogue often produce fluent but unsafe responses when backend database calls fail, return empty results, or surface mismatched information, inventing venues, confirmations, or booking…

16
arXiv — NLP / Computation & Language research 1d ago

LOPA: Enhancing Spoken Language Assessment via Latent Ordinal Prototype Alignment

arXiv:2606.31310v1 Announce Type: new Abstract: Fueled by increasing model scale and multimodal inputs, Multimodal Large Language Models (MLLMs) have emerged as a promising paradigm for Spoken Language Assessment (SLA). While effective, this paradigm often overlooks the…

9
arXiv — NLP / Computation & Language research 1d ago

BlockPilot: Instance-Adaptive Policy Learning for Diffusion-based Speculative Decoding

arXiv:2606.31315v1 Announce Type: new Abstract: Speculative decoding accelerates inference by using a lightweight draft model to generate candidate tokens in parallel, and are then verified by the target model, enabling lossless acceleration. Recently, diffusion-based…

20
arXiv — NLP / Computation & Language research 1d ago

Linguistic Bias Mitigation for Spoofing Detection via Gradient Reversal and A Variational Information Bottleneck

arXiv:2606.31411v1 Announce Type: new Abstract: Rapid advancements in generative speech technology have compromised the reliability of voice biometrics. While current spoofing detectors excel when assessed under in-domain conditions, generalisation to out-of-domain settings is…

4
arXiv — NLP / Computation & Language research 1d ago

Clinically Structured Rank-Gated LoRA for Cross-Benchmark Medical Question Answering

arXiv:2606.31432v1 Announce Type: new Abstract: Medical multiple-choice question answering requires parameter-efficient adaptation across heterogeneous knowledge domains and reasoning operations. A medication question, a diagnostic decision, a public-health item, and a…

33
arXiv — NLP / Computation & Language research 1d ago

Revising RVL-CDIP: Quantifying Errors and Test-Train Overlap

arXiv:2606.31446v1 Announce Type: new Abstract: RVL-CDIP is a popular dataset for benchmarking document classifiers. However, the dataset contains ample amounts of label errors as well as non-trivial amounts of test-train overlap, both of which may impact model performance…

25
arXiv — NLP / Computation & Language research 1d ago

Team MKC at CLPsych 2026: Capturing and Characterizing Mental Health Changes through Social Media Timeline Dynamics

arXiv:2606.31464v1 Announce Type: new Abstract: Recent advances in Large Language Models (LLMs) have motivated their adoption across a wide range of domains, including Artificial Intelligence (AI) for mental health. Given the growing prevalence of mental health disorders…

19
arXiv — NLP / Computation & Language research 1d ago

Building an ASR Solution for Training and Assessing Children's Reading

arXiv:2606.31508v1 Announce Type: new Abstract: Automatic speech recognition for children's reading remains underdeveloped for most African languages, including Bambara, despite its potential value for reproducible literacy assessment. We present an open-source system for…

30
arXiv — NLP / Computation & Language research 1d ago

FinPersona-Bench: A Benchmark for Longitudinal Psychometric Stability of Autonomous Financial Agents

arXiv:2606.31522v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed as autonomous financial agents initialized with explicit behavioral mandates such as "preserve capital" or "avoid speculative bets" that are meant to govern every decision…

19
arXiv — NLP / Computation & Language research 1d ago

AutoTrainess: Teaching Language Models to Improve Language Models Autonomously

arXiv:2606.31551v1 Announce Type: new Abstract: Training language models (LMs) remains a highly human-intensive process, even as frontier language model agents become increasingly capable at software engineering and other long-horizon tasks. A central challenge is that…

37
arXiv — NLP / Computation & Language research 1d ago

Robust Text Watermarking for Large Language Models via Dual Semantic Embeddings

arXiv:2606.31602v1 Announce Type: new Abstract: This work presents Dual-Embedding Watermarking (DEW), a semantic watermarking scheme for large language models (LLMs) that leverages contextual and token-level embeddings to enhance robustness against paraphrasing and translation.…

8
arXiv — NLP / Computation & Language research 1d ago

CLExEval: A Human-in-the-Loop Framework for Qualitative Evaluation of LLM Clinical Reasoning

arXiv:2606.31608v1 Announce Type: new Abstract: Large Language Models (LLMs) achieve strong results on many medical benchmarks, but their clinical reasoning remains difficult to evaluate reliably. A central risk is an evaluation illusion: fluent and well-structured explanations…

37
arXiv — NLP / Computation & Language research 1d ago

Tone-Conditioned Curriculum Learning for Low-Resource Bantu Speech Recognition

arXiv:2606.31642v1 Announce Type: new Abstract: Southern Bantu languages are spoken by over 80 million people, yet current foundation ASR models still produce zero-shot WER above 100%, which limits practical use in education and public services. We addressed this gap with a tone…

18
arXiv — NLP / Computation & Language research 1d ago

Moral Safety in LLMs: Exposing Performative Compliance with Puzzled Cues

arXiv:2606.31644v1 Announce Type: new Abstract: As large language models take on morally consequential roles in healthcare, legal, and hiring contexts, we need to examine whether their ethical behaviors are genuine or superficial. We show that current fairness evaluations…

5
arXiv — NLP / Computation & Language research 1d ago

Overview of the TalentCLEF 2026: Skill and Job Title Intelligence for Human Capital Management

arXiv:2606.31692v1 Announce Type: new Abstract: This paper presents an overview of the second edition of the TalentCLEF challenge, organized as a Lab at the Conference and Labs of the Evaluation Forum (CLEF) 2026. TalentCLEF is an initiative aimed at advancing Natural Language…

19
arXiv — NLP / Computation & Language research 1d ago

Cross-lingual Relation Extraction with Large Language Models: Zero-Shot, Few-Shot, and Fine-Tuned Evaluation on Romanian

arXiv:2606.31718v1 Announce Type: new Abstract: Relation extraction (RE) for low-resource languages is typically constrained by the lack of annotated corpora. We investigate the feasibility of cross-lingual RE for Romanian by combining automatic dataset translation with large…

38
arXiv — NLP / Computation & Language research 1d ago

Seeing Is Not Sharing: Some Vision-Language Models Overestimate Common Ground in Asymmetric Dialogue

arXiv:2606.31719v1 Announce Type: new Abstract: In collaborative dialogue, shared perception does not guarantee shared interpretation. Mutual understanding must be established through interaction. We investigate whether vision-language models (VLMs) can distinguish what could be…

22
arXiv — NLP / Computation & Language research 1d ago

Adapting Foundation ASR Models to Dysarthric Speech: A Case Study

arXiv:2606.31722v1 Announce Type: new Abstract: Automatic speech recognition (ASR) systems often perform poorly in dysarthric speech, limiting their usefulness to affected speakers in everyday communication. This paper presents a personalized ASR system for a dysarthric speaker,…

11
arXiv — NLP / Computation & Language research 1d ago

STEB: Style Text Embedding Benchmark

arXiv:2606.31741v1 Announce Type: new Abstract: While semantic embeddings are rigorously evaluated on the Massive Text Embedding Benchmark, the evaluation of style embeddings remains fragmented, with each work relying on their own set of tasks and datasets. To bridge this gap,…

27
arXiv — NLP / Computation & Language research 1d ago

CHERRY: Compressed Hierarchical Experts with Recurrent Representational Yield

arXiv:2606.31796v1 Announce Type: new Abstract: We study three complementary techniques for training compute-efficient language models. (1) Selective supervision and per-token efficiency. Selective Ground Truth Token Training (SGT) concentrates supervision on the ~15% of output…

14
arXiv — NLP / Computation & Language research 1d ago

Explicit Fuzzy Logic in the Feed-Forward Layer: Self-Forgetting Quantifiers Discover Legible Grammatical-Licensing Detectors

arXiv:2606.31845v1 Announce Type: new Abstract: A transformer's feed-forward (FFN) sublayer materializes the distinctions attention gathers, yet gives no account of what it computes. In a parameter-neutral replacement, each hidden unit is an explicit fuzzy set operation on…

35
arXiv — NLP / Computation & Language research 1d ago

Theory of Mind and Persuasion Beyond Conversation: Assessing the Capacity of LLMs to Induce Belief States via Planning and Action

arXiv:2606.31916v1 Announce Type: new Abstract: Theory of Mind (ToM) benchmarks for Large Language Models (LLMs) typically rely on passive question-answering formats, but the deployment of LLMs in increasingly agentic and autonomous forms demands new evaluations. In this paper…

25
arXiv — NLP / Computation & Language research 1d ago

LuxEmo: Expressive Text-to-Speech Corpus for Luxembourgish

arXiv:2606.31947v1 Announce Type: new Abstract: State-of-the-art speech datasets predominantly focus on widely spoken languages, often overlooking low-resource languages such as Luxembourgish, which remain underrepresented in speech technology research. In this work, we…

25
arXiv — NLP / Computation & Language research 1d ago

DigitalCoach: Communication and Grounding Gaps in Human and Agentic Computer Use Coaching

arXiv:2606.31980v1 Announce Type: new Abstract: Agents are increasingly capable of automating software tasks, but can they teach humans how to use software themselves? We introduce DigitalCoach, a multimodal dataset of 72 human expert-novice computer use coaching sessions…

36
arXiv — NLP / Computation & Language research 1d ago

Scalable Behaviour Cloning on Browser Using via Skill Distillation

arXiv:2606.32014v1 Announce Type: new Abstract: Internet users collectively perform an enormous range of skilled work through web browsers, from software development and document editing to search, forms, and enterprise workflows, making human browsing a highly scalable but…

16
arXiv — NLP / Computation & Language research 1d ago

Generative Skill Composition for LLM Agents

arXiv:2606.32025v1 Announce Type: new Abstract: Recent LLM agents benefit from skills for solving complex tasks. Skills encapsulate modular packages of procedural knowledge and instructions for performing specialized tasks, such as setting up a sandboxed environment, running a…

34
arXiv — NLP / Computation & Language research 1d ago

When LLMs Read Tables Carelessly: Measuring and Reducing Data Referencing Errors

arXiv:2606.32029v1 Announce Type: new Abstract: While large language models (LLMs) perform well on table tasks, they still make data referencing errors (DREs), i.e., incorrectly citing or omitting table values, despite understanding the table structure. Beyond final-answer…

29
arXiv — NLP / Computation & Language research 1d ago

Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs

arXiv:2606.32032v1 Announce Type: new Abstract: Metacognition is a critical component of intelligence that describes the ability to monitor and regulate one's own cognitive processes. Yet LLMs exhibit systemic deficiencies in key metacognitive faculties: they hallucinate with…

34
arXiv — NLP / Computation & Language research 1d ago

Introspective Coupling: Self-Explanation Training Tracks Behavioral Change Despite Fixed Supervision

arXiv:2606.32038v1 Announce Type: new Abstract: When does training language models (LMs) to generate explanations of their predictions yield faithful introspection, rather than superficial imitation? We study LMs trained to explain which features of their inputs influenced their…

30
arXiv — NLP / Computation & Language research 1d ago

ASR-Agnostic Multimodal Spectrotemporal Modeling for Early Dementia Detection

arXiv:2606.30646v1 Announce Type: cross Abstract: Speech recruits the same executive, attentional, and working memory processes underlying instrumental activities of daily living, or IADLs, providing a non-invasive proxy for cognitive assessment. Yet most speech-based dementia…

18
arXiv — NLP / Computation & Language research 1d ago

Emergent Culture in Minimal LLM Systems

arXiv:2606.30668v1 Announce Type: cross Abstract: What happens when LLM agents operate with no context outside a turn, minimal prompting, and simple tools? Inspired by swarm engineering, we give collectives of three agents the ability to send messages and manipulate a shared…

22
arXiv — NLP / Computation & Language research 1d ago

Information Terra: A Narrative-Anchored Semantic-First Projection of Document Embeddings

arXiv:2606.30824v1 Announce Type: cross Abstract: We introduce Information Terra, a narrative-anchored semantic-first projection that places a document corpus on an Earth-like globe whose poles are two user-chosen endpoint documents and whose prime meridian is the great-circle…

28
arXiv — NLP / Computation & Language research 1d ago

When Does Learning to Stop Help? A Cost-Aware Study of Early Exits in Reasoning Models

arXiv:2606.30852v1 Announce Type: cross Abstract: Reasoning models spend different amounts of useful computation across instances, but it remains unclear when a learned stopping rule improves over simple confidence or convergence thresholds. We study this question with…

11
