Research papers
500 articles archived under #paper

arXiv — NLP / Computation & Language research 8h ago
Conversable Complexity: Agentic LLM Collectives as Interpretable Substrates
arXiv:2607.01047v1 Announce Type: new
Abstract: Complexity and interpretability rarely coincide: systems rich enough for complex behaviours to emerge are usually too opaque to question, while transparent ones are too simple for anything complex to emerge. A single large language…
33

arXiv — NLP / Computation & Language research 8h ago
Message Passing Enables Efficient Reasoning
arXiv:2607.01077v1 Announce Type: new
Abstract: While inference-time scaling has improved the reasoning abilities of large language models (LLMs), the need to generate long chains-of-thought (CoTs) is a computational bottleneck. Thus, in contrast to sequential scaling methods…
37

arXiv — NLP / Computation & Language research 8h ago
Clinician-Level Agreement Without Clinical Caution: LLM Evaluator Limits in Medical AI Benchmarking
arXiv:2607.01103v1 Announce Type: new
Abstract: Open-response evaluation provides stronger clinical validity than multiple-choice benchmarks but creates a scoring bottleneck that motivates automated LLM-as-a-Judge approaches. Whether such evaluators replicate clinical calibration…
12

arXiv — NLP / Computation & Language research 8h ago
Towards Developing a Multimodal Chat Assistant for University Stakeholders: RAG-based Approach
arXiv:2607.01115v1 Announce Type: new
Abstract: University stakeholders often face difficulties in accessing timely and reliable information, especially in developing countries, where there are very few intelligent support systems.
Existing rule-based chatbots are unable to…
15

arXiv — NLP / Computation & Language research 8h ago
$\text{Log}_\text{b}$Quant: Quantizing Language Models in Logarithmic Space
arXiv:2607.01127v1 Announce Type: new
Abstract: Quantization has become an invaluable tool to reduce memory requirements and inference speed of modern language models, in particular to make them available for consumer setups and edge devices. While previous work has primarily…
17

arXiv — NLP / Computation & Language research 8h ago
AGC-Bench: Measuring Artificial General Creativity
arXiv:2607.01152v1 Announce Type: new
Abstract: Creativity research has debated whether creativity is domain-specific (e.g., visual, writing, science), and if it is psychometrically separable from general intelligence. Both questions now apply to LLMs, but a unified benchmark of…
10

arXiv — NLP / Computation & Language research 8h ago
Adversarial Pragmatics for AI Safety Evaluation: A Benchmark for Instruction Conflict, Embedded Commands, and Policy Ambiguity
arXiv:2607.01153v1 Announce Type: new
Abstract: Safety evaluations for language models increasingly depend on judgments about ambiguous natural-language behaviour: whether a model has followed an instruction, refused appropriately, complied with a policy, resisted an embedded…
14

arXiv — NLP / Computation & Language research 8h ago
Distill to Detect: Exposing Stealth Biases in LLMs through Cartridge Distillation
arXiv:2607.01208v1 Announce Type: new
Abstract: Language models deployed in high-stakes roles can potentially favor certain entities, brands, or viewpoints, steering user decisions at scale.
Such preferential biases can be introduced by any actor in the model's supply chain and…
15

arXiv — NLP / Computation & Language research 8h ago
The State-Prediction Separation Hypothesis
arXiv:2607.01218v1 Announce Type: new
Abstract: Transformers use the same forward computation stream to both predict the next token and store useful state for future token predictions. We formulate the \emph{state-prediction separation hypothesis}: disentangling the two roles…
32

arXiv — NLP / Computation & Language research 8h ago
Measuring the Gap Between Human and LLM Research Ideas
arXiv:2607.01233v1 Announce Type: new
Abstract: LLMs are increasingly used to brainstorm research ideas, but existing evaluations mostly judge individual ideas by novelty, feasibility, or expert preference. We instead ask: how far are current LLM-generated ideas from human…
8

arXiv — NLP / Computation & Language research 8h ago
Surfacing Variations to Calibrate Perceived Reliability of MLLM-generated Image Descriptions
arXiv:2507.15692v1 Announce Type: cross
Abstract: Multimodal large language models (MLLMs) provide new opportunities for blind and low vision (BLV) people to access visual information in their daily lives.
However, these models often produce errors that are difficult to detect…
22

arXiv — NLP / Computation & Language research 8h ago
Prompt Optimization for User Simulation in Conversational Recommender Systems: A Multi-Objective Framework
arXiv:2607.00010v1 Announce Type: cross
Abstract: Conversational recommender systems (CRSs) are a core component of next-generation intelligent recommender systems because they enable users to actively elicit preferences, clarify intentions, and adapt recommendations in real…
4

arXiv — NLP / Computation & Language research 8h ago
Learning User-Aware Recall: Personalized Retrieval in Long-Term Conversational Memory
arXiv:2607.00017v1 Announce Type: cross
Abstract: Long-term conversational agents are expected to remember past interactions, but memory is useful only when the right evidence is recalled for the right user. Existing memory-augmented LLM agents have made progress in building…
30

arXiv — NLP / Computation & Language research 8h ago
Destination-Labeled Self-Looping Systems with Dwell: Intrinsic Characterization, Realization Cost, and Recognition
arXiv:2607.00044v1 Announce Type: cross
Abstract: We study a finite-state symbolic controller for systems in which the admissible visible transitions are fixed in advance and each visible state carries a minimum dwell requirement. The resulting model, which we call a…
32

arXiv — NLP / Computation & Language research 8h ago
From Signals to Structure: How Memory Architecture Drives Language Emergence in LLM Agents
arXiv:2607.00233v1 Announce Type: cross
Abstract: How do two agents invent a shared language from scratch? In a Lewis signaling game, a sender and receiver must coordinate on a code using only their interaction history.
We study five memory architectures across varying channel…
26

arXiv — NLP / Computation & Language research 8h ago
An LLM-Based Framework for Intent-Driven Network Topology Design
arXiv:2607.00292v1 Announce Type: cross
Abstract: Designing deployable and resilient network topologies from natural language requirements remains a challenging problem in network automation. This work investigates the ability of Large Language Models (LLMs) to generate…
35

arXiv — NLP / Computation & Language research 8h ago
Rosetta: Composable Native Multimodal Pretraining
arXiv:2607.00293v1 Announce Type: cross
Abstract: Achieving true artificial general intelligence requires foundation models capable of integrating new modalities without forgetting prior knowledge. However, accommodating continuous generative objectives alongside discrete…
15

arXiv — NLP / Computation & Language research 8h ago
A Text-Steerable Instrument for Sketching Procedural Soundscapes via Language Models
arXiv:2607.00309v1 Announce Type: cross
Abstract: We present a real-time musical interface that converts natural-language scene descriptions into evolving procedural soundscapes. A performer types a prompt such as "warm jazz cafe at midnight" and steers it through direct…
19

arXiv — NLP / Computation & Language research 8h ago
Learning to Compose: Revisiting Proxy Task Design for Zero-Shot Composed Image Retrieval
arXiv:2607.00374v1 Announce Type: cross
Abstract: Composed Image Retrieval (CIR) retrieves a target image from a reference image and a textual modification.
While supervised CIR relies on costly triplets, Zero-Shot CIR (ZS-CIR) alleviates this reliance through proxy tasks…
5

arXiv — NLP / Computation & Language research 8h ago
When Classic Cache Policies Fail: Learning-Augmented Replacement for Semantic Retrieval Buffers
arXiv:2607.00394v1 Announce Type: cross
Abstract: LLM agents increasingly rely on retrieval buffers to store and reuse past experience, yet the cache management policies governing these buffers remain largely ad-hoc. We formalize this as an online semantic cache replacement…
22

arXiv — NLP / Computation & Language research 8h ago
NeuroCogMap Reveals Cognitive Organization of Large Language Models
arXiv:2607.00397v1 Announce Type: cross
Abstract: Understanding how complex cognitive functions are organized within artificial systems is central to interpreting large language models (LLMs) and relating them to biological cognition. Yet although LLMs exhibit broad…
24

arXiv — NLP / Computation & Language research 8h ago
StochasT: Learning with Stochastic Turn Depth for Visual Instruction Tuning
arXiv:2607.00465v1 Announce Type: cross
Abstract: Large Vision-Language Models (LVLMs) rely extensively on Visual Instruction Tuning (VIT) to elicit their multimodal reasoning capabilities. However, we find a discrepancy: VIT often packs multiple language tasks about the same…
8

arXiv — NLP / Computation & Language research 8h ago
MindEdit-Bench: Benchmarking Object-Level Counterfactual Spatial Reasoning in VLMs from In-the-Wild Photos
arXiv:2607.00491v1 Announce Type: cross
Abstract: Benchmarks for vision-language models (VLMs) mostly test observational spatial reasoning: models describe relations already visible in the input.
Existing what-if tasks typically vary the observer while keeping the scene fixed.…
21

arXiv — NLP / Computation & Language research 8h ago
Self-Evolving Agents with Anytime-Valid Certificates
arXiv:2607.00871v1 Announce Type: cross
Abstract: Self-evolving agents violate the assumption behind most learning-theoretic guarantees: the data, evaluator, components, and hypothesis space are produced by the policy being updated. We present \textbf{SEA}, an architecture that…
27

arXiv — NLP / Computation & Language research 8h ago
Graph-Native Reinforcement Learning Enables Traceable Scientific Hypothesis Generation through Conceptual Recombination
arXiv:2607.00924v1 Announce Type: cross
Abstract: Accelerating materials discovery requires AI systems that can generate scientifically valid hypotheses through multi-step, domain-grounded reasoning. Standard large language models often produce fluent but weakly traceable…
36

arXiv — NLP / Computation & Language research 8h ago
Agentic generation of verifiable rules for deterministic, self-expanding reaction classification
arXiv:2607.01061v1 Announce Type: cross
Abstract: Computer-assisted synthesis planning breaks target molecules into accessible precursors using large libraries of reaction rules that assign each transformation a deterministic, interpretable label. But chemistry is long-tailed,…
17

arXiv — NLP / Computation & Language research 8h ago
Disentangling Speaker and Language Effects in Cross-Lingual Speaker Verification for Iberian Languages
arXiv:2607.01161v1 Announce Type: cross
Abstract: Cross-lingual speaker verification (SV) systems typically exhibit performance degradation when enrollment and test utterances are spoken in different languages.
However, standard evaluation protocols confound language mismatch…
16

arXiv — NLP / Computation & Language research 8h ago
Theoria: Rewrite-Acceptability Verification over Informal Reasoning States
arXiv:2607.01223v1 Announce Type: cross
Abstract: When should an AI system's answer be trusted? Formal proof assistants offer certainty but cannot reach most of the problem distribution; scalar LLM judges offer coverage but produce opaque scores that cannot be audited after the…
18

arXiv — NLP / Computation & Language research 8h ago
AutoMem: Automated Learning of Memory as a Cognitive Skill
arXiv:2607.01224v1 Announce Type: cross
Abstract: Memory expertise is a learned skill: knowing what to encode, when to retrieve, and how to organize knowledge--a capacity known in cognitive science as metamemory. We bring this perspective to LLMs by treating memory management as…
35

arXiv — NLP / Computation & Language research 8h ago
Verbosity Tradeoffs and the Impact of Scale on the Faithfulness of LLM Self-Explanations
arXiv:2503.13445v3 Announce Type: replace
Abstract: When asked to explain their decisions, LLMs can often give explanations which sound plausible to humans. But are these explanations faithful, i.e. do they convey the factors actually responsible for the decision? In this work,…
4

arXiv — NLP / Computation & Language research 8h ago
GPTKB v1.5: A Massive Knowledge Base for Exploring Factual LLM Knowledge
arXiv:2507.05740v2 Announce Type: replace
Abstract: Language models are powerful artifacts, yet their factual knowledge is still poorly understood, and inaccessible to ad-hoc browsing and scalable statistical analysis.
This demonstration introduces GPTKB v1.5, a densely…
30

arXiv — NLP / Computation & Language research 8h ago
Toward Cybersecurity-Expert Small Language Models
arXiv:2510.14113v2 Announce Type: replace
Abstract: Large language models (LLMs) are transforming everyday applications, yet deployment in cybersecurity lags due to a lack of high-quality, domain-specific models and training datasets. To address this gap, we present CyberPal…
30

arXiv — NLP / Computation & Language research 8h ago
LuxIT: A Luxembourgish Instruction Tuning Dataset from Monolingual Seed Data
arXiv:2510.24434v3 Announce Type: replace
Abstract: The effectiveness of instruction-tuned Large Language Models (LLMs) is often limited in low-resource linguistic settings due to a lack of high-quality training data. We introduce LuxIT, a novel, monolingual instruction tuning…
10

arXiv — NLP / Computation & Language research 8h ago
OpenReward: Learning to Reward Long-form Agentic Tasks via Reinforcement Learning
arXiv:2510.24636v3 Announce Type: replace
Abstract: Reward models (RMs) have become essential for aligning large language models (LLMs), serving as scalable proxies for human evaluation in both training and inference. However, existing RMs struggle on knowledge-intensive and…
34

arXiv — NLP / Computation & Language research 8h ago
Reasoning Up the Instruction Ladder for Controllable Language Models
arXiv:2511.04694v5 Announce Type: replace
Abstract: As large language model (LLM) based systems take on high-stakes roles in real-world decision-making, they must reconcile competing instructions from multiple sources within a single prompt context.
Enforcing an instruction…
17

arXiv — NLP / Computation & Language research 8h ago
Thinking While Speaking: Inference-Time Knowledge Transfer for Responsive and Intelligent Conversational Voice Agents
arXiv:2511.07397v3 Announce Type: replace
Abstract: Voice agents face a fundamental tension: the reasoning, retrieval, and tool use that make foundation models capable are iterative and slow, while conversational interaction demands responses on a millisecond timescale. Smaller,…
22

arXiv — NLP / Computation & Language research 8h ago
Graded strength of comparative illusions is explained by Bayesian inference
arXiv:2511.14642v2 Announce Type: replace
Abstract: Like visual processing, language processing is susceptible to illusions in which people systematically misperceive stimuli. In one such case--the comparative illusion (CI), e.g., More students have been to Russia than I…
33

r/MachineLearning community 1d ago
On July 1, 2026, arXiv will spin out from Cornell University, its home for the past 25 years, to become an independent nonprofit organization. Major funding support from Simons Foundation and Schmidt Sciences. Ditching the red for their website.
[N] arXiv’s next chapter: Updates on our spin out from Cornell University: https://blog.arxiv.org/2026/06/30/arxivs-next-chapter/
submitted by /u/Nunki08
12

arXiv — Machine Learning research 1d ago
Joint discovery of governing partial differential equations from multi-source datasets by competitive optimization
arXiv:2606.30699v1 Announce Type: new
Abstract: Discovering governing equations directly from observational data is a key step towards interpretable scientific machine learning.
Current data-driven approaches typically operate on a single dataset, inherently limiting their…
38

arXiv — Machine Learning research 1d ago
Accelerometry-Derived Digital Biomarkers for Cardiometabolic Risk: A Population-Representative Tabular Benchmark with Uncertainty Quantification
arXiv:2606.30702v1 Announce Type: new
Abstract: Structured tabular data dominates clinical medicine, yet existing benchmarks fail to reflect real-world properties like complex survey sampling, demographic oversampling, and subgroup fairness. We introduce the NHANES Accelerometry…
31

arXiv — NLP / Computation & Language research 1d ago
From Search to Synthesis: Training LLMs as Zero-Shot Workflow Generators
arXiv:2606.30704v1 Announce Type: cross
Abstract: Large language models (LLMs) excel across a wide range of tasks, yet their instance-specific solutions often lack the structural consistency needed for reliable deployment. Workflows that encode recurring algorithmic patterns at…
13

arXiv — Machine Learning research 1d ago
Why Do Few-Step Text Latents Fail When Image Latents Work? Non-Commitment at Sharp Categorical Readouts
arXiv:2606.30705v1 Announce Type: new
Abstract: Deterministic few-step generation succeeds on continuous image latents but collapses to incoherent text on continuous text latents, and we show the cause is geometric rather than a training or scaling deficiency: a smooth,…
30

arXiv — Machine Learning research 1d ago
Hierarchical Global Attention (HGA)
arXiv:2606.30709v1 Announce Type: new
Abstract: Hierarchical Global Attention (HGA) is a drop-in replacement for dense causal attention in pretrained long-context transformers.
HGA preserves the original checkpoint parameters: the pretrained $W_Q$, $W_K$, $W_V$, and $W_O$…
23

arXiv — Machine Learning research 1d ago
ReactionAtlas: Ab origine exploration of chemical reaction networks with machine learning
arXiv:2606.30778v1 Announce Type: new
Abstract: Mapping a chemical reaction network, the graph of minima and transition states (TS) and the elementary reactions connecting them, is the natural language of chemistry, from catalysis to combustion to the origin of life.…
5

arXiv — NLP / Computation & Language research 1d ago
Revocable Learned State via Process Sidecars
arXiv:2606.30788v1 Announce Type: cross
Abstract: Language models are often adapted in stages: a public skill phase, a private memory phase, and a later safety phase that learns to refuse outputs tied to the remembered entities. Revoking the memory after the safety phase is not…
17

arXiv — Machine Learning research 1d ago
Predictable GRPO: A Closed-Form Model of Training Dynamics
arXiv:2606.30789v1 Announce Type: new
Abstract: Group Relative Policy Optimization (GRPO) has become a standard tool for improving the reasoning ability of large language models, yet its training dynamics are still described empirically: reward trajectories are fit with…
16

arXiv — Machine Learning research 1d ago
Gradient Smoothing: Coupling Layer-wise Updates for Improved Optimization
arXiv:2606.30813v1 Announce Type: new
Abstract: Deep neural networks with repeated architectural blocks, such as transformers, often exhibit structured relationships across layers that emerge during training.
Motivated by this observation, we introduce \emph{Depth-wise Gradient…
25

arXiv — Machine Learning research 1d ago
Mind the Residual Gap: Probabilistic Downscaling under Real-World Bias
arXiv:2606.30821v1 Announce Type: new
Abstract: Probabilistic downscaling is the task of modeling the conditional distribution of high-resolution fields given coarse inputs, and is a central challenge to atmospheric science, climate modeling, and other multiscale physical…
21

arXiv — Machine Learning research 1d ago
Partition-Guided Distance Saliency: Bridging Decision and Objective Spaces in Many-Objective Optimization
arXiv:2606.30836v1 Announce Type: new
Abstract: Explainability in Many-Objective Optimization (MaO) is currently hindered by the escalating complexity of the Pareto front, which renders the relationship between high-dimensional decision variables and objective outcomes…
16

arXiv — Machine Learning research 1d ago
A Stationary-Distribution Theory for Triplet-Based Plateau Search in Random Forest Ensemble-Size Selection
arXiv:2606.30837v1 Announce Type: new
Abstract: The number of trees is a central computational parameter in Random Forests: increasing it reduces finite-ensemble variability but increases training and prediction cost. Plateau-based tuning adapts this parameter through local…
18