arXiv — NLP / Computation & Language
500 articles archived
arXiv — NLP / Computation & Language research 1d ago
A Single Rewrite Suffices: Empirical Lessons from Production Skill Description Optimization
arXiv:2606.30775v1 Announce Type: new Abstract: Enterprise AI agents route user queries to specialized skills by matching queries against natural language skill descriptions. When two skills share overlapping descriptions, the routing LLM misroutes queries, a failure we term…
Indi-RomCoM: Code-Mixed Benchmark for Evaluating LLMs on Romanized Indic-English Instructions
arXiv:2606.30790v1 Announce Type: new Abstract: Romanized Code Mixing (RCM), where bilingual speakers fluidly blend local languages with English in Roman script, has emerged as the dominant form of communication across multilingual communities. While Large Language Models (LLMs)…
Using AI Agents to Automate Black-Box Audits of Personalization Algorithms at Scale
arXiv:2606.30801v1 Announce Type: new Abstract: Personalization algorithms determine what content users encounter on online platforms. Auditing these systems is difficult because independent auditors have only black-box access to the algorithms, while personalization depends on…
When Calibration Rankings Reverse: Accuracy-Controlled Evaluation for Fair Comparison of LLMs
arXiv:2606.30814v1 Announce Type: new Abstract: Calibration evaluates whether a model's confidence aligns with its empirical accuracy. Existing studies often compare the calibration of different large language models using global calibration metrics such as Expected Calibration…
When transformers learn "impossible" languages, what do they learn?
arXiv:2606.30815v1 Announce Type: new Abstract: Recent work suggests that transformer language models show a bias towards human languages over unnatural ("impossible") languages argued to be unacquirable by humans. However, this literature has largely based these claims on…
Test-Time Verification for Text-to-SQL via Outcome Reward Models
arXiv:2606.30851v1 Announce Type: new Abstract: Improving the reliability of large language models (LLMs) at inference time is a central challenge in structured reasoning tasks such as Text-to-SQL. Common test-time inference strategies, including Best-of-N sampling and Majority…
Multilingual Polarization Detection Using Transformer-Based Models with Class Weighting and Threshold Tuning
arXiv:2606.30857v1 Announce Type: new Abstract: This paper describes our submission to SemEval-2026 Task 9 on detecting multilingual, multicultural, and multievent online polarization. We address all three subtasks: binary polarization detection, polarization type…
Training Therapeutic Judges and Multi-Agent Systems for Human-Aligned Mental Health Support
arXiv:2606.30887v1 Announce Type: new Abstract: Large language models show promise for mental health support, yet therapeutic quality improves only when evaluation functions as an actionable control signal rather than a passive metric. We introduce a framework that formulates…
Beyond Clean Text: Evaluating Encoder and Decoder Robustness for Bangla Event Detection in Noisy Text
arXiv:2606.30914v1 Announce Type: new Abstract: Event detection (ED) systems are typically evaluated on clean, curated text, leaving their robustness to real-world noise largely unexplored, particularly for low-resource languages such as Bangla. We introduce a generalized Bangla…
Bridging Scientific Heritage: An Arabic–Russian Parallel Corpus and LLM Benchmark for Sustainable Knowledge Transfer
arXiv:2606.30943v1 Announce Type: new Abstract: Russian and Arabic are among the major languages of scientific communication. Language barriers impede the exchange of research results between these communities, which affects international collaboration and the progress of…
Linguistic Distancing on Social Media: Indicators of Emotion Regulation Across Age Groups
arXiv:2606.30957v1 Announce Type: new Abstract: Managing our emotional responses to events is key to emotional well-being, a process referred to as emotion regulation in psychology. Previous work has established that the degree to which we distance events is a type of emotion…
From Propositional to Perceptual Asymmetry: Extending Frictive Policy Optimization to Asymmetric Partial Information Dialogue
arXiv:2606.30973v1 Announce Type: new Abstract: Frictive Policy Optimization (FPO; Pustejovsky et al., 2025) treats friction in collaborative dialogue (misalignment, misunderstanding, repair) as an epistemic signal essential to common-ground construction, rather than noise…
Measuring Judgment Quality in Natural-Language Explanations: Evidence from Forecasting Tournaments
arXiv:2606.30987v1 Announce Type: new Abstract: Decision-makers routinely rely on expert judgments accompanied by written explanations, yet explanation quality is difficult to measure at scale. Forecasting tournaments offer a natural testing ground: probabilistic judgments are…
Wait, am I Being Fair? Characterizing Deductive Stereotyping and Mitigating It with Fair-GCG
arXiv:2606.30989v1 Announce Type: new Abstract: Warning: This paper contains several toxic and offensive statements. While reasoning generally improves fairness in recent large language models (LLMs), failures persist. In this work, we identify a failure mode, deductive…
CORTEX: Token-Level Hallucination Detection in RAG via Comparative Internal Representations
arXiv:2606.31033v1 Announce Type: new Abstract: In this paper, we propose CORTEX, a token-level hallucination detection method for Retrieval-Augmented Generation (RAG). In long-form RAG outputs, hallucinations often arise in localized spans rather than throughout an entire…
Truth or Sophistry? LoFa: A Benchmark for LLM Robustness Against Logical Fallacies
arXiv:2606.31039v1 Announce Type: new Abstract: Large Language Models (LLMs) exhibit strong semantic capabilities, yet their resilience to manipulative linguistic patterns such as logical fallacies remains underexplored. Prior work has primarily examined whether LLMs can…
A Semantic-Layer-Mediated Agent for Natural Language to SQL over Heterogeneous Enterprise Databases
arXiv:2606.31041v1 Announce Type: new Abstract: Natural language-to-SQL (NL2SQL) over real-world enterprise databases remains significantly more challenging than on academic benchmarks. Enterprise schemas often contain hundreds of physical tables with cryptic column names,…
Reference-Based Prosody and Rhythm Evaluation for Spoken Dialogue Systems
arXiv:2606.31055v1 Announce Type: new Abstract: Speech-to-speech (S2S) AI agents are advancing rapidly, yet evaluation lacks interpretable speech-native measures for conversational prosody and rhythm. Because $F_0$, speaking rate, articulation rate, and pausing shift with…
Exploring the relationship between team institutional composition and novelty in academic papers based on fine-grained knowledge entities
arXiv:2606.31058v1 Announce Type: new Abstract: The composition of author teams is an important factor influencing the novelty of academic papers. However, existing studies have paid limited attention to the role of institutional composition, and most novelty measures remain at…
Building a Multimodal Dataset of Academic Paper for Keyword Extraction
arXiv:2606.31069v1 Announce Type: new Abstract: To date, the keyword extraction task has typically relied solely on textual data. Neglecting visual details and audio features from image and audio modalities leads to deficiencies in information richness and overlooks potential…
Triospect: A Three-Dimensional Framework for Robust Statistical AI-Generated Text Detection Against Diverse Attacks
arXiv:2606.31074v1 Announce Type: new Abstract: Existing AI-generated text detectors are vulnerable to attacks that manipulate textual characteristics. In this study, we propose a novel Triospect Detection Framework by using additional perspectives of content (core ideas) and…
When Reranking Hurts: Uncertainty-Based Gating for Few-Shot Reranking
arXiv:2606.31087v1 Announce Type: new Abstract: Few-shot selection typically assumes that reranking retrieved examples always improves performance. We challenge this view by showing that the expensive reranking step can in fact degrade performance. Instead, we propose…
What Counts as an Error? Dual-Reference Benchmarking for Atypical ASR
arXiv:2606.31112v1 Announce Type: new Abstract: ASR systems have often been reported to underperform on atypical speech. An often-conflated compounding factor is the existence of two valid transcription references: verbatim (actual produced speech, including…
SeKV: Resolution-Adaptive KV Cache with Hierarchical Semantic Memory for Long-Context LLM Inference
arXiv:2606.31145v1 Announce Type: new Abstract: Large language models increasingly operate over long contexts, where the KV cache becomes a dominant memory bottleneck: its size grows linearly with sequence length and must be retained throughout decoding, making full GPU caching…
TAG-DLM: Diffusion Language Models for Text-Attributed Graph Learning
arXiv:2606.31166v1 Announce Type: new Abstract: Text-attributed graphs (TAGs), where each node carries a natural language description, require models to jointly reason over text and graph topology. Existing approaches often handle the two modalities separately: graph neural…
Gated Multi-Graph Fusion via Graph Attention Networks for Alzheimer's Disease Detection
arXiv:2606.31186v1 Announce Type: new Abstract: Spontaneous speech is a vital non-invasive biomarker for Alzheimer's Disease (AD), yet many systems overlook non-linear structural disruptions and clinical heterogeneity in pathological language. We propose a Multi-View Gated Graph…
Can LLMs Imagine Moral Alternatives Beyond Binary Dilemmas?
arXiv:2606.31213v1 Announce Type: new Abstract: As large language models (LLMs) are increasingly deployed as moral advisors and agents, they need to address dilemmas between two competing values. However, existing research on LLMs with moral dilemmas overlooks a central aspect…
Probing Stylistic Appropriation using Large Language Models: An Evaluation Framework for Copyright Infringement under EU Law
arXiv:2606.31250v1 Announce Type: new Abstract: Large language models (LLMs) trained on web-scale corpora generate output that may infringe copyright, yet existing technical safeguards focus narrowly on verbatim memorisation. EU copyright doctrine applies a broader standard:…
When the Database Fails: Prompting LLM Dialogue Agents for Safe Recovery in Task-Oriented Dialogue
arXiv:2606.31307v1 Announce Type: new Abstract: Large language models used in task-oriented dialogue often produce fluent but unsafe responses when backend database calls fail, return empty results, or surface mismatched information, inventing venues, confirmations, or booking…
LOPA: Enhancing Spoken Language Assessment via Latent Ordinal Prototype Alignment
arXiv:2606.31310v1 Announce Type: new Abstract: Fueled by increasing model scale and multimodal inputs, Multimodal Large Language Models (MLLMs) have emerged as a promising paradigm for Spoken Language Assessment (SLA). While effective, this paradigm often overlooks the…
BlockPilot: Instance-Adaptive Policy Learning for Diffusion-based Speculative Decoding
arXiv:2606.31315v1 Announce Type: new Abstract: Speculative decoding accelerates inference by using a lightweight draft model to generate candidate tokens in parallel, which are then verified by the target model, enabling lossless acceleration. Recently, diffusion-based…
Linguistic Bias Mitigation for Spoofing Detection via Gradient Reversal and A Variational Information Bottleneck
arXiv:2606.31411v1 Announce Type: new Abstract: Rapid advancements in generative speech technology have compromised the reliability of voice biometrics. While current spoofing detectors excel when assessed under in-domain conditions, generalisation to out-of-domain settings is…
Clinically Structured Rank-Gated LoRA for Cross-Benchmark Medical Question Answering
arXiv:2606.31432v1 Announce Type: new Abstract: Medical multiple-choice question answering requires parameter-efficient adaptation across heterogeneous knowledge domains and reasoning operations. A medication question, a diagnostic decision, a public-health item, and a…
Revising RVL-CDIP: Quantifying Errors and Test-Train Overlap
arXiv:2606.31446v1 Announce Type: new Abstract: RVL-CDIP is a popular dataset for benchmarking document classifiers. However, the dataset contains a large number of label errors as well as a non-trivial amount of test-train overlap, both of which may impact model performance…
Team MKC at CLPsych 2026: Capturing and Characterizing Mental Health Changes through Social Media Timeline Dynamics
arXiv:2606.31464v1 Announce Type: new Abstract: Recent advances in Large Language Models (LLMs) have motivated their adoption across a wide range of domains, including Artificial Intelligence (AI) for mental health. Given the growing prevalence of mental health disorders…
Building an ASR Solution for Training and Assessing Children's Reading
arXiv:2606.31508v1 Announce Type: new Abstract: Automatic speech recognition for children's reading remains underdeveloped for most African languages, including Bambara, despite its potential value for reproducible literacy assessment. We present an open-source system for…
FinPersona-Bench: A Benchmark for Longitudinal Psychometric Stability of Autonomous Financial Agents
arXiv:2606.31522v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed as autonomous financial agents initialized with explicit behavioral mandates such as "preserve capital" or "avoid speculative bets" that are meant to govern every decision…
AutoTrainess: Teaching Language Models to Improve Language Models Autonomously
arXiv:2606.31551v1 Announce Type: new Abstract: Training language models (LMs) remains a highly human-intensive process, even as frontier language model agents become increasingly capable at software engineering and other long-horizon tasks. A central challenge is that…
Robust Text Watermarking for Large Language Models via Dual Semantic Embeddings
arXiv:2606.31602v1 Announce Type: new Abstract: This work presents Dual-Embedding Watermarking (DEW), a semantic watermarking scheme for large language models (LLMs) that leverages contextual and token-level embeddings to enhance robustness against paraphrasing and translation.…
CLExEval: A Human-in-the-Loop Framework for Qualitative Evaluation of LLM Clinical Reasoning
arXiv:2606.31608v1 Announce Type: new Abstract: Large Language Models (LLMs) achieve strong results on many medical benchmarks, but their clinical reasoning remains difficult to evaluate reliably. A central risk is an evaluation illusion: fluent and well-structured explanations…
Tone-Conditioned Curriculum Learning for Low-Resource Bantu Speech Recognition
arXiv:2606.31642v1 Announce Type: new Abstract: Southern Bantu languages are spoken by over 80 million people, yet current foundation ASR models still produce zero-shot WER above 100%, which limits practical use in education and public services. We addressed this gap with a tone…
Moral Safety in LLMs: Exposing Performative Compliance with Puzzled Cues
arXiv:2606.31644v1 Announce Type: new Abstract: As large language models take on morally consequential roles in healthcare, legal, and hiring contexts, we need to examine whether their ethical behaviors are genuine or superficial. We show that current fairness evaluations…
Overview of the TalentCLEF 2026: Skill and Job Title Intelligence for Human Capital Management
arXiv:2606.31692v1 Announce Type: new Abstract: This paper presents an overview of the second edition of the TalentCLEF challenge, organized as a Lab at the Conference and Labs of the Evaluation Forum (CLEF) 2026. TalentCLEF is an initiative aimed at advancing Natural Language…
Cross-lingual Relation Extraction with Large Language Models: Zero-Shot, Few-Shot, and Fine-Tuned Evaluation on Romanian
arXiv:2606.31718v1 Announce Type: new Abstract: Relation extraction (RE) for low-resource languages is typically constrained by the lack of annotated corpora. We investigate the feasibility of cross-lingual RE for Romanian by combining automatic dataset translation with large…
Seeing Is Not Sharing: Some Vision-Language Models Overestimate Common Ground in Asymmetric Dialogue
arXiv:2606.31719v1 Announce Type: new Abstract: In collaborative dialogue, shared perception does not guarantee shared interpretation. Mutual understanding must be established through interaction. We investigate whether vision-language models (VLMs) can distinguish what could be…
Adapting Foundation ASR Models to Dysarthric Speech: A Case Study
arXiv:2606.31722v1 Announce Type: new Abstract: Automatic speech recognition (ASR) systems often perform poorly on dysarthric speech, limiting their usefulness to affected speakers in everyday communication. This paper presents a personalized ASR system for a dysarthric speaker,…
STEB: Style Text Embedding Benchmark
arXiv:2606.31741v1 Announce Type: new Abstract: While semantic embeddings are rigorously evaluated on the Massive Text Embedding Benchmark, the evaluation of style embeddings remains fragmented, with each work relying on its own set of tasks and datasets. To bridge this gap,…
CHERRY: Compressed Hierarchical Experts with Recurrent Representational Yield
arXiv:2606.31796v1 Announce Type: new Abstract: We study three complementary techniques for training compute-efficient language models. (1) Selective supervision and per-token efficiency. Selective Ground Truth Token Training (SGT) concentrates supervision on the ~15% of output…
Explicit Fuzzy Logic in the Feed-Forward Layer: Self-Forgetting Quantifiers Discover Legible Grammatical-Licensing Detectors
arXiv:2606.31845v1 Announce Type: new Abstract: A transformer's feed-forward (FFN) sublayer materializes the distinctions attention gathers, yet gives no account of what it computes. In a parameter-neutral replacement, each hidden unit is an explicit fuzzy set operation on…
Theory of Mind and Persuasion Beyond Conversation: Assessing the Capacity of LLMs to Induce Belief States via Planning and Action
arXiv:2606.31916v1 Announce Type: new Abstract: Theory of Mind (ToM) benchmarks for Large Language Models (LLMs) typically rely on passive question-answering formats, but the deployment of LLMs in increasingly agentic and autonomous forms demands new evaluations. In this paper…