arXiv — NLP / Computation & Language
-
arXiv — NLP / Computation & Language research 6h ago
Evidence-Supported Credit Risk Report Generation Using News-Centric Financial Knowledge Graphs
arXiv:2607.01023v1 Announce Type: new Abstract: Financial markets evolve in response to real-world events reported in news, yet these drivers often remain implicit in text. To better explain market dynamics, event-market relations must be explicitly modeled through factual,…
27 -
arXiv — NLP / Computation & Language research 6h ago
Behavior-Adaptive Conversational Agents: Toward a Fluid Personality Framework
arXiv:2607.01034v1 Announce Type: new Abstract: Large language model (LLM)-based conversational agents (CAs) are now ubiquitous, creating new opportunities for AI-mediated behavior change. Their capacity to project nuanced personalities and adopt diverse metaphorical roles…
38 -
arXiv — NLP / Computation & Language research 6h ago
Conversable Complexity: Agentic LLM Collectives as Interpretable Substrates
arXiv:2607.01047v1 Announce Type: new Abstract: Complexity and interpretability rarely coincide: systems rich enough for complex behaviours to emerge are usually too opaque to question, while transparent ones are too simple for anything complex to emerge. A single large language…
33 -
arXiv — NLP / Computation & Language research 6h ago
Message Passing Enables Efficient Reasoning
arXiv:2607.01077v1 Announce Type: new Abstract: While inference-time scaling has improved the reasoning abilities of large language models (LLMs), the need to generate long chains-of-thought (CoTs) is a computational bottleneck. Thus, in contrast to sequential scaling methods…
37 -
arXiv — NLP / Computation & Language research 6h ago
Clinician-Level Agreement Without Clinical Caution: LLM Evaluator Limits in Medical AI Benchmarking
arXiv:2607.01103v1 Announce Type: new Abstract: Open-response evaluation provides stronger clinical validity than multiple-choice benchmarks but creates a scoring bottleneck that motivates automated LLM-as-a-Judge approaches. Whether such evaluators replicate clinical calibration…
12 -
arXiv — NLP / Computation & Language research 6h ago
Towards Developing a Multimodal Chat Assistant for University Stakeholders: RAG-based Approach
arXiv:2607.01115v1 Announce Type: new Abstract: University stakeholders often face difficulties in accessing timely and reliable information, especially in developing countries, where there are very few intelligent support systems. Existing rule-based chatbots are unable to…
15 -
arXiv — NLP / Computation & Language research 6h ago
$\text{Log}_\text{b}$Quant: Quantizing Language Models in Logarithmic Space
arXiv:2607.01127v1 Announce Type: new Abstract: Quantization has become an invaluable tool to reduce memory requirements and improve inference speed of modern language models, in particular to make them available for consumer setups and edge devices. While previous work has primarily…
17 -
arXiv — NLP / Computation & Language research 6h ago
AGC-Bench: Measuring Artificial General Creativity
arXiv:2607.01152v1 Announce Type: new Abstract: Creativity research has debated whether creativity is domain-specific (e.g., visual, writing, science), and if it is psychometrically separable from general intelligence. Both questions now apply to LLMs, but a unified benchmark of…
10 -
arXiv — NLP / Computation & Language research 6h ago
Adversarial Pragmatics for AI Safety Evaluation: A Benchmark for Instruction Conflict, Embedded Commands, and Policy Ambiguity
arXiv:2607.01153v1 Announce Type: new Abstract: Safety evaluations for language models increasingly depend on judgments about ambiguous natural-language behaviour: whether a model has followed an instruction, refused appropriately, complied with a policy, resisted an embedded…
14 -
arXiv — NLP / Computation & Language research 6h ago
Distill to Detect: Exposing Stealth Biases in LLMs through Cartridge Distillation
arXiv:2607.01208v1 Announce Type: new Abstract: Language models deployed in high-stakes roles can potentially favor certain entities, brands, or viewpoints, steering user decisions at scale. Such preferential biases can be introduced by any actor in the model's supply chain and…
15 -
arXiv — NLP / Computation & Language research 6h ago
The State-Prediction Separation Hypothesis
arXiv:2607.01218v1 Announce Type: new Abstract: Transformers use the same forward computation stream to both predict the next token and store useful state for future token predictions. We formulate the state-prediction separation hypothesis: disentangling the two roles…
32 -
arXiv — NLP / Computation & Language research 6h ago
Measuring the Gap Between Human and LLM Research Ideas
arXiv:2607.01233v1 Announce Type: new Abstract: LLMs are increasingly used to brainstorm research ideas, but existing evaluations mostly judge individual ideas by novelty, feasibility, or expert preference. We instead ask: how far are current LLM-generated ideas from human…
8 -
arXiv — NLP / Computation & Language research 6h ago
Surfacing Variations to Calibrate Perceived Reliability of MLLM-generated Image Descriptions
arXiv:2507.15692v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) provide new opportunities for blind and low vision (BLV) people to access visual information in their daily lives. However, these models often produce errors that are difficult to detect…
22 -
arXiv — NLP / Computation & Language research 6h ago
Prompt Optimization for User Simulation in Conversational Recommender Systems: A Multi-Objective Framework
arXiv:2607.00010v1 Announce Type: cross Abstract: Conversational recommender systems (CRSs) are a core component of next-generation intelligent recommender systems because they enable users to actively elicit preferences, clarify intentions, and adapt recommendations in real…
4 -
arXiv — NLP / Computation & Language research 6h ago
Learning User-Aware Recall: Personalized Retrieval in Long-Term Conversational Memory
arXiv:2607.00017v1 Announce Type: cross Abstract: Long-term conversational agents are expected to remember past interactions, but memory is useful only when the right evidence is recalled for the right user. Existing memory-augmented LLM agents have made progress in building…
30 -
arXiv — NLP / Computation & Language research 6h ago
Destination-Labeled Self-Looping Systems with Dwell: Intrinsic Characterization, Realization Cost, and Recognition
arXiv:2607.00044v1 Announce Type: cross Abstract: We study a finite-state symbolic controller for systems in which the admissible visible transitions are fixed in advance and each visible state carries a minimum dwell requirement. The resulting model, which we call a…
32 -
arXiv — NLP / Computation & Language research 6h ago
CogTax: A Four-Level Cognitive Taxonomy for Command-Line Computing Education
arXiv:2607.00140v1 Announce Type: cross Abstract: As computing education expands beyond traditional programming into operational domains such as systems administration and command-line environments, existing pedagogical frameworks struggle to capture a dimension that is critical…
12 -
arXiv — NLP / Computation & Language research 6h ago
GRPO, Dr. GRPO, and DAPO Are Three Operations on One Number: The Group-Standard-Deviation Identity
arXiv:2607.00152v1 Announce Type: cross Abstract: Three of the most popular methods for training language models to reason look like three different tricks. They are not. All three adjust a single number: standard deviation, reflecting how much a prompt's sampled answers…
38 -
arXiv — NLP / Computation & Language research 6h ago
From Signals to Structure: How Memory Architecture Drives Language Emergence in LLM Agents
arXiv:2607.00233v1 Announce Type: cross Abstract: How do two agents invent a shared language from scratch? In a Lewis signaling game, a sender and receiver must coordinate on a code using only their interaction history. We study five memory architectures across varying channel…
26 -
arXiv — NLP / Computation & Language research 6h ago
Testing Frontier Large Language Models' Physics Literacy in Parallel Physical Worlds
arXiv:2607.00276v1 Announce Type: cross Abstract: Current large-language-model (LLM) physics benchmarks are usually scored by answer accuracy, which cannot distinguish genuine reasoning from recall of familiar problem patterns and reveals little about where a model's reasoning…
10 -
arXiv — NLP / Computation & Language research 6h ago
An LLM-Based Framework for Intent-Driven Network Topology Design
arXiv:2607.00292v1 Announce Type: cross Abstract: Designing deployable and resilient network topologies from natural language requirements remains a challenging problem in network automation. This work investigates the ability of Large Language Models (LLMs) to generate…
35 -
arXiv — NLP / Computation & Language research 6h ago
Rosetta: Composable Native Multimodal Pretraining
arXiv:2607.00293v1 Announce Type: cross Abstract: Achieving true artificial general intelligence requires foundation models capable of integrating new modalities without forgetting prior knowledge. However, accommodating continuous generative objectives alongside discrete…
15 -
arXiv — NLP / Computation & Language research 6h ago
EPC: A Standardized Protocol for Measuring Evaluator Preference Dynamics in LLM Agent Systems
arXiv:2607.00297v1 Announce Type: cross Abstract: When LLM agents use evaluator feedback to adapt their behavior in closed loops, evaluator biases propagate through the agent's strategy distribution -- a phenomenon known as evaluator preference coupling. Prior work has…
37 -
arXiv — NLP / Computation & Language research 6h ago
Mapping the Evaluation Frontier: An Empirical Survey of the Bias-Reliability Tradeoff Across Eleven Evaluator-Agent Conditions
arXiv:2607.00304v1 Announce Type: cross Abstract: The bias-reliability tradeoff conjectures that LLM evaluation systems are constrained in (gamma, H, CV) space, where evaluator coupling (gamma), strategy diversity (H), and small-sample measurement reliability (CV(N)) cannot be…
7 -
arXiv — NLP / Computation & Language research 6h ago
A Text-Steerable Instrument for Sketching Procedural Soundscapes via Language Models
arXiv:2607.00309v1 Announce Type: cross Abstract: We present a real-time musical interface that converts natural-language scene descriptions into evolving procedural soundscapes. A performer types a prompt such as "warm jazz cafe at midnight" and steers it through direct…
19 -
arXiv — NLP / Computation & Language research 6h ago
Watermarking for Proprietary Dataset Protection
arXiv:2607.00325v1 Announce Type: cross Abstract: A growing body of literature suggests that training data membership inference problems are fundamentally hard tasks in modern language modeling settings. We argue that output watermarking techniques are the right gadget to make…
8 -
arXiv — NLP / Computation & Language research 6h ago
Learning to Compose: Revisiting Proxy Task Design for Zero-Shot Composed Image Retrieval
arXiv:2607.00374v1 Announce Type: cross Abstract: Composed Image Retrieval (CIR) retrieves a target image from a reference image and a textual modification. While supervised CIR relies on costly triplets, Zero-Shot CIR (ZS-CIR) alleviates this reliance through proxy tasks…
5 -
arXiv — NLP / Computation & Language research 6h ago
When Classic Cache Policies Fail: Learning-Augmented Replacement for Semantic Retrieval Buffers
arXiv:2607.00394v1 Announce Type: cross Abstract: LLM agents increasingly rely on retrieval buffers to store and reuse past experience, yet the cache management policies governing these buffers remain largely ad-hoc. We formalize this as an online semantic cache replacement…
22 -
arXiv — NLP / Computation & Language research 6h ago
NeuroCogMap Reveals Cognitive Organization of Large Language Models
arXiv:2607.00397v1 Announce Type: cross Abstract: Understanding how complex cognitive functions are organized within artificial systems is central to interpreting large language models (LLMs) and relating them to biological cognition. Yet although LLMs exhibit broad…
24 -
arXiv — NLP / Computation & Language research 6h ago
MolSafeEval: A Benchmark for Uncovering Safety Risks in AI-Generated Molecules
arXiv:2607.00464v1 Announce Type: cross Abstract: Current molecular generation benchmarks emphasize task complexity, molecule novelty, and property alignment; they largely overlook a critical concern: the potential safety risks of AI-generated molecules. In practice, many…
22 -
arXiv — NLP / Computation & Language research 6h ago
StochasT: Learning with Stochastic Turn Depth for Visual Instruction Tuning
arXiv:2607.00465v1 Announce Type: cross Abstract: Large Vision-Language Models (LVLMs) rely extensively on Visual Instruction Tuning (VIT) to elicit their multimodal reasoning capabilities. However, we find a discrepancy: VIT often packs multiple language tasks about the same…
8 -
arXiv — NLP / Computation & Language research 6h ago
MindEdit-Bench: Benchmarking Object-Level Counterfactual Spatial Reasoning in VLMs from In-the-Wild Photos
arXiv:2607.00491v1 Announce Type: cross Abstract: Benchmarks for vision-language models (VLMs) mostly test observational spatial reasoning: models describe relations already visible in the input. Existing what-if tasks typically vary the observer while keeping the scene fixed.…
21 -
arXiv — NLP / Computation & Language research 6h ago
Self-Evolving Agents with Anytime-Valid Certificates
arXiv:2607.00871v1 Announce Type: cross Abstract: Self-evolving agents violate the assumption behind most learning-theoretic guarantees: the data, evaluator, components, and hypothesis space are produced by the policy being updated. We present SEA, an architecture that…
27 -
arXiv — NLP / Computation & Language research 6h ago
Graph-Native Reinforcement Learning Enables Traceable Scientific Hypothesis Generation through Conceptual Recombination
arXiv:2607.00924v1 Announce Type: cross Abstract: Accelerating materials discovery requires AI systems that can generate scientifically valid hypotheses through multi-step, domain-grounded reasoning. Standard large language models often produce fluent but weakly traceable…
36 -
arXiv — NLP / Computation & Language research 6h ago
Agentic generation of verifiable rules for deterministic, self-expanding reaction classification
arXiv:2607.01061v1 Announce Type: cross Abstract: Computer-assisted synthesis planning breaks target molecules into accessible precursors using large libraries of reaction rules that assign each transformation a deterministic, interpretable label. But chemistry is long-tailed,…
17 -
arXiv — NLP / Computation & Language research 6h ago
CausalMix: Data Mixture as Causal Inference for Language Model Training
arXiv:2607.01104v1 Announce Type: cross Abstract: In Large Language Model (LLM) training, data mixing plays a pivotal role in determining model performance. Recent methods optimize mixture weights via proxy models, but they rely on the assumption of static data distributions. As…
31 -
arXiv — NLP / Computation & Language research 6h ago
Disentangling Speaker and Language Effects in Cross-Lingual Speaker Verification for Iberian Languages
arXiv:2607.01161v1 Announce Type: cross Abstract: Cross-lingual speaker verification (SV) systems typically exhibit performance degradation when enrollment and test utterances are spoken in different languages. However, standard evaluation protocols confound language mismatch…
16 -
arXiv — NLP / Computation & Language research 6h ago
QuasiMoTTo: Quasi-Monte Carlo Test-Time Scaling
arXiv:2607.01179v1 Announce Type: cross Abstract: Scaling inference compute, by generating many parallel attempts per problem, is a costly but reliable lever for improving language model capabilities. By default these attempts are generated independently, wasting inference…
36 -
arXiv — NLP / Computation & Language research 6h ago
Right in the Right Way: LM Training with Verifiable Rewards and Human Demonstrations
arXiv:2607.01181v1 Announce Type: cross Abstract: RL with verifiable rewards (RLVR) has emerged as a powerful paradigm for training LMs on tasks with well-defined success metrics, such as code generation and mathematical reasoning. However, current RLVR methods optimize only…
25 -
arXiv — NLP / Computation & Language research 6h ago
Theoria: Rewrite-Acceptability Verification over Informal Reasoning States
arXiv:2607.01223v1 Announce Type: cross Abstract: When should an AI system's answer be trusted? Formal proof assistants offer certainty but cannot reach most of the problem distribution; scalar LLM judges offer coverage but produce opaque scores that cannot be audited after the…
18 -
arXiv — NLP / Computation & Language research 6h ago
AutoMem: Automated Learning of Memory as a Cognitive Skill
arXiv:2607.01224v1 Announce Type: cross Abstract: Memory expertise is a learned skill: knowing what to encode, when to retrieve, and how to organize knowledge--a capacity known in cognitive science as metamemory. We bring this perspective to LLMs by treating memory management as…
35 -
arXiv — NLP / Computation & Language research 6h ago
Is One Layer Enough? Training A Single Transformer Layer Can Match Full-Parameter RL Training
arXiv:2607.01232v1 Announce Type: cross Abstract: Reinforcement learning (RL) has become a central component of post-training large language models (LLMs), yet little is understood about how RL adaptation is distributed across transformer layers. Existing approaches typically…
7 -
arXiv — NLP / Computation & Language research 6h ago
Verbosity Tradeoffs and the Impact of Scale on the Faithfulness of LLM Self-Explanations
arXiv:2503.13445v3 Announce Type: replace Abstract: When asked to explain their decisions, LLMs can often give explanations which sound plausible to humans. But are these explanations faithful, i.e. do they convey the factors actually responsible for the decision? In this work,…
4 -
arXiv — NLP / Computation & Language research 6h ago
GPTKB v1.5: A Massive Knowledge Base for Exploring Factual LLM Knowledge
arXiv:2507.05740v2 Announce Type: replace Abstract: Language models are powerful artifacts, yet their factual knowledge is still poorly understood, and inaccessible to ad-hoc browsing and scalable statistical analysis. This demonstration introduces GPTKB v1.5, a densely…
30 -
arXiv — NLP / Computation & Language research 6h ago
Toward Cybersecurity-Expert Small Language Models
arXiv:2510.14113v2 Announce Type: replace Abstract: Large language models (LLMs) are transforming everyday applications, yet deployment in cybersecurity lags due to a lack of high-quality, domain-specific models and training datasets. To address this gap, we present CyberPal…
30 -
arXiv — NLP / Computation & Language research 6h ago
LuxIT: A Luxembourgish Instruction Tuning Dataset from Monolingual Seed Data
arXiv:2510.24434v3 Announce Type: replace Abstract: The effectiveness of instruction-tuned Large Language Models (LLMs) is often limited in low-resource linguistic settings due to a lack of high-quality training data. We introduce LuxIT, a novel, monolingual instruction tuning…
10 -
arXiv — NLP / Computation & Language research 6h ago
OpenReward: Learning to Reward Long-form Agentic Tasks via Reinforcement Learning
arXiv:2510.24636v3 Announce Type: replace Abstract: Reward models (RMs) have become essential for aligning large language models (LLMs), serving as scalable proxies for human evaluation in both training and inference. However, existing RMs struggle on knowledge-intensive and…
34 -
arXiv — NLP / Computation & Language research 6h ago
Reasoning Up the Instruction Ladder for Controllable Language Models
arXiv:2511.04694v5 Announce Type: replace Abstract: As large language model (LLM) based systems take on high-stakes roles in real-world decision-making, they must reconcile competing instructions from multiple sources within a single prompt context. Enforcing an instruction…
17 -
arXiv — NLP / Computation & Language research 6h ago
Thinking While Speaking: Inference-Time Knowledge Transfer for Responsive and Intelligent Conversational Voice Agents
arXiv:2511.07397v3 Announce Type: replace Abstract: Voice agents face a fundamental tension: the reasoning, retrieval, and tool use that make foundation models capable are iterative and slow, while conversational interaction demands responses on a millisecond timescale. Smaller,…
22 -
arXiv — NLP / Computation & Language research 6h ago
Graded strength of comparative illusions is explained by Bayesian inference
arXiv:2511.14642v2 Announce Type: replace Abstract: Like visual processing, language processing is susceptible to illusions in which people systematically misperceive stimuli. In one such case--the comparative illusion (CI), e.g., More students have been to Russia than I…
33