Research papers
500 articles archived under #paper

arXiv — NLP / Computation & Language research 1d ago Beyond Compilation: Evaluating Faithful Natural-Language-to-Lean Statement Formalization arXiv:2606.31002v1 Announce Type: cross Abstract: Theorem-proving benchmarks evaluate proof search against fixed formal statements, but natural-language-to-Lean formalization must generate the formal statement itself. In this setting, compilation is only a validity check: a Lean… 35
arXiv — NLP / Computation & Language research 1d ago ADAPT: Attention Dynamics Alignment with Preference Tuning for Faithful MLLMs arXiv:2606.31054v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) are critically hampered by hallucination, generating content inconsistent with the provided image. In this paper, we identify an internal signature of hallucination: progressive… 37
arXiv — NLP / Computation & Language research 1d ago Usage frequency and application variety of research methods in library and information science: Continuous investigation from 1991 to 2021 arXiv:2606.31081v1 Announce Type: cross Abstract: The present study analyzed over 26,000 research articles published between 1991 and 2021 in twenty-one major LIS (Library and Information Science) journals, using the machine learning (ML) approach to categorize the research… 5
arXiv — NLP / Computation & Language research 1d ago UniSAE: Unified Speech Attribute Editing on Speaker, Emotion and Low-Level Content via Discrete Phonetic Posteriorgram Modelling arXiv:2606.31128v1 Announce Type: cross Abstract: Speech editing aims to modify specific portions of an utterance while preserving the remaining speech.
Existing approaches primarily focus on word-level content modification and typically treat content, speaker, and emotion… 30
arXiv — NLP / Computation & Language research 1d ago PruneGround: Plug-and-play Spatial Pruning for 3D Visual Grounding arXiv:2606.31148v1 Announce Type: cross Abstract: 3D Visual Grounding (3DVG) aims to localize target objects in 3D scenes given natural language descriptions. Existing approaches typically perform reasoning over the entire scene, leading to ambiguous predictions and high… 17
arXiv — NLP / Computation & Language research 1d ago HealthAgentBench: A Unified Benchmark Suite of Realistic Agentic Healthcare Environments for Challenging Frontier AI Agents arXiv:2606.31179v1 Announce Type: cross Abstract: As AI agents become increasingly capable of complex, long-horizon reasoning, rigorous and holistic evaluation is essential for measuring progress toward real-world healthcare applications. We introduce HealthAgentBench, a suite… 29
arXiv — NLP / Computation & Language research 1d ago Learning from Failure: Inference-Time Self-Improvement for Computer-Use Agents arXiv:2606.31270v1 Announce Type: cross Abstract: Computer-use agents, which leverage multimodal large language models (MLLMs) to operate computers and complete tasks, have attracted significant attention for their utility and versatility. A major challenge in developing these… 20
arXiv — NLP / Computation & Language research 1d ago The Decomposition Is the Fingerprint: Per-Component Identity for Agent Skills arXiv:2606.31272v1 Announce Type: cross Abstract: AI agents increasingly acquire and execute skills at runtime: bundles of prompt instructions, executable code, and tool declarations fetched from marketplaces and other agents. Governing them needs a stable notion of skill… 16
arXiv — NLP / Computation & Language research 1d ago Visual Semantic Entropy: Do Vision Language Models Recognize Visual Ambiguity?
arXiv:2606.31407v1 Announce Type: cross Abstract: Vision-language models can produce confident answers on visually ambiguous inputs, resulting in biased predictions. Common entropy-based methods, such as Semantic Entropy (SE), rely on output diversity. Yet our analysis shows… 15
arXiv — NLP / Computation & Language research 1d ago CDR-Bench: Evaluating Faithful Execution of Compositional, Order-Sensitive Data Refinement Recipes arXiv:2606.31435v1 Announce Type: cross Abstract: Data refinement involves executing multi-step recipes over evolving text states, where both composition and execution order of processing operators determine the outcome. While existing benchmarks either isolate text editing or… 38
arXiv — NLP / Computation & Language research 1d ago Falsification, Not Exposure: An Internally Preregistered Placebo-Controlled Decomposition of Self-Repair Feedback in Frozen Small Code Models arXiv:2606.31511v1 Announce Type: cross Abstract: In deployment settings where retraining is infeasible, small frozen code models are routinely asked to repair a failed program after seeing their own failing output, usually treated as a retry mechanism. From a Popperian view, a… 9
arXiv — NLP / Computation & Language research 1d ago Modality-Driven Search with Holistic Trace Judging for ARC-AGI-2 arXiv:2606.31543v1 Announce Type: cross Abstract: Large language models can produce fluent, internally coherent reasoning traces for abstract reasoning tasks while still being confidently wrong - making selection among candidates, not just generation, the central challenge. I… 4
arXiv — NLP / Computation & Language research 1d ago ShopX: A Foundation Model for Intent-to-Item Fulfillment in Agentic Shopping arXiv:2606.31693v1 Announce Type: cross Abstract: The wave of AI-native applications is moving shopping beyond page- and feed-based browsing toward intent-driven experiences orchestrated by LLM agents.
A common design wraps an LLM around existing search and recommendation… 38
arXiv — NLP / Computation & Language research 1d ago RCT: A Robot-Collected Touch-Vision-Language Dataset for Tactile Generalization arXiv:2606.31694v1 Announce Type: cross Abstract: For robots manipulating open-world objects, tactile representations must generalize to unseen materials. We introduce RCT (Robotic Contact Tactile), a robot-collected touch-vision-language dataset with 29,279 tactile frames from… 18
arXiv — NLP / Computation & Language research 1d ago SpikeLogBERT: Energy-Efficient Log Parsing Using Spiking Transformer Networks arXiv:2606.31781v1 Announce Type: cross Abstract: Log parsing is a fundamental step in automated log analysis, transforming raw system logs into structured event templates for downstream tasks such as anomaly detection and system monitoring. Existing log parsing methods range… 17
arXiv — NLP / Computation & Language research 1d ago MECoBench: A Systematic Study of Multimodal Agent Collaboration in Embodied Environments arXiv:2606.31966v1 Announce Type: cross Abstract: Recent multimodal large language models (MLLMs) have strong potential as embodied agents, but their ability to collaborate in visually grounded environments remains underexplored. To address this gap, we introduce MECoBench, a… 4
arXiv — NLP / Computation & Language research 1d ago Learning by Surprise: Adaptive Mitigation of Model Collapse in Large Language Models arXiv:2410.12341v4 Announce Type: replace Abstract: As AI-generated content increasingly populates the web, generative AI models are at growing risk of being trained on their own outputs, a process known as AI autophagy.
This feedback loop has been shown to induce model… 16
arXiv — NLP / Computation & Language research 1d ago Verify when Uncertain: Beyond Self-Consistency in Black Box Hallucination Detection arXiv:2502.15845v2 Announce Type: replace Abstract: Large Language Models (LLMs) often hallucinate, limiting their reliability in sensitive applications. In black-box settings, several self-consistency-based techniques have been proposed for hallucination detection. We… 29
arXiv — NLP / Computation & Language research 1d ago SAGE: A Search-AuGmented Evaluation of Large Language Models on Free-Form QA arXiv:2504.07385v3 Announce Type: replace Abstract: As Large Language Models (LLMs) become increasingly used for question-answering (QA), relying on static, pre-annotated references for evaluation poses significant challenges in cost, scalability, and completeness. Meanwhile,… 26
arXiv — NLP / Computation & Language research 1d ago From Multimodal Perception to Strategic Reasoning: A Survey on AI-Generated Game Commentary arXiv:2506.17294v3 Announce Type: replace Abstract: The advent of artificial intelligence has propelled AI-Generated Game Commentary (AI-GGC) into a rapidly expanding research area, offering advantages such as scalable availability and personalized narration. However, existing… 17
arXiv — NLP / Computation & Language research 1d ago The Bidirectional Process Reward Model arXiv:2508.01682v3 Announce Type: replace Abstract: Process Reward Models (PRMs), which assign fine-grained scores to intermediate reasoning steps within a solution trajectory, have emerged as a promising approach to enhance the reasoning quality of Large Language Models (LLMs).… 5
arXiv — NLP / Computation & Language research 1d ago Rethinking On-policy Optimization for Query Augmentation arXiv:2510.17139v3 Announce Type: replace Abstract: Recent advances in large language models (LLMs) have led to a surge of interest in query augmentation for information retrieval (IR).
Two main approaches have emerged. The first prompts LLMs to generate answers or… 28
arXiv — NLP / Computation & Language research 1d ago Distilling the Essence: Efficient Reasoning Distillation via Sequence Truncation arXiv:2512.21002v3 Announce Type: replace Abstract: Distilling the capabilities from a large reasoning model (LRM) to a smaller student model often involves training on substantial amounts of reasoning data. However, knowledge distillation (KD) over lengthy sequences with prompt… 28
arXiv — NLP / Computation & Language research 1d ago InfiniteWeb: Scalable Web Environment Synthesis for GUI Agent Training arXiv:2601.04126v3 Announce Type: replace Abstract: GUI agents that interact with graphical interfaces on behalf of users represent a promising direction for practical AI assistants. However, training such agents is hindered by the scarcity of suitable environments. We present… 29
arXiv — NLP / Computation & Language research 1d ago What If We Allocate Test-Time Compute Adaptively? arXiv:2602.01070v5 Announce Type: replace Abstract: Test-time compute scaling allocates inference computation uniformly, uses fixed sampling strategies, and applies verification only for reranking.
In contrast, we propose a verifier-guided adaptive framework treating reasoning… 30
arXiv — NLP / Computation & Language research 1d ago FairJudge: An Adaptive, Debiased, and Consistent LLM-as-a-Judge arXiv:2602.06625v2 Announce Type: replace Abstract: Existing LLM-as-a-Judge systems suffer from three fundamental limitations: limited adaptivity to task- and domain-specific evaluation criteria, systematic biases driven by non-semantic cues such as position, length, format, and… 7
arXiv — NLP / Computation & Language research 1d ago Beyond Scalar Rewards: Dense Feedback for LLM Policy Synthesis in Sequential Social Dilemmas arXiv:2603.19453v3 Announce Type: replace Abstract: We propose an LLM harness that generates code-based policy functions for multi-agent environments, evaluates them with self-play, and refines them using feedback from previous iterations. Following the recent line of work in… 28
Hacker News — AI on Front Page community 1d ago ArXiv's Next Chapter Article URL: https://blog.arxiv.org/2026/06/30/arxivs-next-chapter/ Comments URL: https://news.ycombinator.com/item?id=48741748 Points: 200 # Comments: 59 12
Hugging Face Daily Papers research 2d ago DreamForge-World 0.1 Preview: A Low-Compute Real-Time Controllable World Model Abstract DreamForge-World 0.1 Preview adapts a video generation architecture with a residual action pathway to enable real-time interactive world simulation on consumer hardware with low computational requirements. Generated by Qwen/Qwen2.5-Coder-32B-Instruct We present… 18
r/LocalLLaMA community 2d ago PageStorm: A Model Built for Creative Book Writing Over a year ago, we set out to build a single-turn full-book writing model. Half a year ago, we published our LongPage Dataset for book scale creative writing. Today, we are announcing our first model: PageStorm Research Preview.
Paper: https://arxiv.org/abs/2605.17064 Models:… 9
Hugging Face Daily Papers research 2d ago TheoremGraph: Bridging Formal and Informal Mathematics Abstract A unified mathematical dependency graph connects informal and formal mathematics through semantic embedding and automated extraction from arXiv papers and Lean projects. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Mathematical knowledge is organized around statements… 32
r/LocalLLaMA community 2d ago InternScience/Agents-A1 · Hugging Face Unbelievable benchmarks for a 35B MoE, somebody verify. Here is tech report btw: https://arxiv.org/pdf/2606.30616 submitted by /u/mlon_eusk-_- 23
arXiv — Machine Learning research 2d ago Can AI Draw Science? A Benchmark for Evaluating Scientific Figure Generation by Text-to-Image and Multimodal Models arXiv:2606.28406v1 Announce Type: new Abstract: Text-to-image and multimodal generative models are increasingly used to produce scientific figures such as mechanism diagrams, experimental-design schematics, conceptual frameworks, and graphical abstracts. Yet existing… 36
arXiv — Machine Learning research 2d ago On the Necessity of a Liquid Substrate for Mesh Intelligence arXiv:2606.28413v1 Announce Type: new Abstract: A mesh of sovereign agents has no center: no shared clock, no shared model, and no coordinator to gather data or retrain. Its competence rests on each agent folding the projections its peers emit into a single internal state,… 8
arXiv — Machine Learning research 2d ago Position: RL Researchers Need to Distinguish Between Solving Simulators and Using Simulators as a Proxy arXiv:2606.28433v1 Announce Type: new Abstract: One goal in reinforcement learning (RL) research is to understand general-purpose sequential decision-making, using benchmark simulators as a proxy for learning in deployment settings.
When running experiments, however, the goal of… 5
arXiv — Machine Learning research 2d ago Learning to Distributedly Estimate under Partially Known Dynamics: A Covariance-Agnostic Neural Kalman Consensus Filter arXiv:2606.28441v1 Announce Type: new Abstract: Online latent state estimation constitutes a fundamental challenge within the artificial intelligence field, serving as a foundational tool for diverse applications, including sequential decision making, anomaly and change-point… 21
arXiv — Machine Learning research 2d ago S-GAI: Spectral Geometry-Aware Initialization for Sigmoidal MLPs -- From Dataset Geometry to Network Weights arXiv:2606.28444v1 Announce Type: new Abstract: Classical universal approximation theorems establish the expressive power of sigmoidal multilayer perceptrons, but they do not prescribe how initial weights should encode the geometry of a data distribution. We propose S-GAI, a… 31
arXiv — Machine Learning research 2d ago scKDGM: KAN-guided Dynamic Graph Masked Learning for Single-Cell RNA-seq Clustering arXiv:2606.28459v1 Announce Type: new Abstract: Single-cell RNA sequencing (scRNA-seq) clustering is essential for identifying cell types, but high dimensionality, sparsity, dropout, and technical noise hinder robust expression representation and cell graph construction.… 27
arXiv — Machine Learning research 2d ago Counterfactual Residual Data Augmentation for Regression arXiv:2606.28460v1 Announce Type: new Abstract: Data-driven modeling in real-world regression tasks often suffers from limited training samples, high collection costs, and noisy observations. Inspired by the impact of data augmentation in vision and language, we propose a novel… 21
arXiv — Machine Learning research 2d ago Singular Learning and Occam's Razor in Deep Monomial Networks arXiv:2606.28464v1 Announce Type: new Abstract: In the optimization of neural networks, gradient dynamics are influenced by critical points that arise from the model's architecture.
These critical points occur where the Jacobian of the model's parametrization is rank-deficient,… 11
arXiv — Machine Learning research 2d ago An Agentic AI Pipeline for Appliance-Level Energy Anomaly Detection and LLM-Driven Recommendations arXiv:2606.28467v1 Announce Type: new Abstract: Appliance-level energy monitoring in office buildings produces noisy alerts that non-expert facility managers struggle to use. This paper proposes an end-to-end agentic pipeline that combines deep time-series forecasting,… 11
arXiv — Machine Learning research 2d ago Modelling Emotional Memory in Children with Tensor Networks arXiv:2606.28470v1 Announce Type: new Abstract: We demonstrate how emotional valence influences the order-dependent structure of children's recognition memory: correct recall of a sequence of emotionally-valenced toys depended not just on the valence of a given toy itself, but… 7
arXiv — Machine Learning research 2d ago A Trainable-by-Parts Operator Learning Framework: Bridging DeepONet and Karhunen-Loeve Expansions for Large-Scale Applications arXiv:2606.28519v1 Announce Type: new Abstract: Training operator-learning models for large-scale problems governed by partial differential equations (PDEs) is challenging due to the curse of dimensionality, memory constraints, and limited training data. These challenges arise… 38
arXiv — Machine Learning research 2d ago A Gravitational Interpretation of Fine-Tuning Reversion arXiv:2606.28525v1 Announce Type: new Abstract: Fine-tuning on harmless data can partially undo behaviors acquired earlier in training.
Safety can erode under benign post-alignment updates, unlearned capabilities can re-emerge, latent traits can transfer through apparently… 27
arXiv — Machine Learning research 2d ago NIVA: A Multimodal Foundation Model for Actionable Earth System Intelligence arXiv:2606.28546v1 Announce Type: new Abstract: Recent advances in AI-driven weather and climate modeling have improved forecast skill while reducing computational cost. However, existing data-driven approaches are limited in their ability to model coupled Earth system dynamics,… 9
arXiv — Machine Learning research 2d ago Improving Coherence in Hierarchical Time Series Forecasting using Structured Temporal Fusion arXiv:2606.28553v1 Announce Type: new Abstract: In many real-world applications, such as retail sales, energy usage, and supply chain planning, forecasting is performed across hierarchical structures. These structures often represent aggregations (e.g., products to categories to… 29
arXiv — Machine Learning research 2d ago Geometric Measurements of the Axiom of Choice in Neural Proof Embeddings arXiv:2606.28572v1 Announce Type: new Abstract: The axiom of choice has divided the foundations of mathematics for over a century, but the distinction between classical and constructive proofs has remained a philosophical and methodological one. We use Lean 4's kernel-level… 8
arXiv — Machine Learning research 2d ago Replica Symmetry Breaking and Algorithmic Thresholds in Empirical Risk Minimization under Multi-Index Model arXiv:2606.28573v1 Announce Type: new Abstract: Modern machine learning models are trained by optimizing high-dimensional non-convex empirical risk functions.
Such cost functions can have a multitude of local optima and yet, gradient-based optimization appears to converge to… 10
arXiv — Machine Learning research 2d ago What LLMs explain is not what they believe: Evaluating explanation sufficiency under models' own input beliefs arXiv:2606.28615v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed in high-stakes domains, where free-text explanations such as chain-of-thought and post-hoc rationales are used to justify model outputs. Yet it remains unclear whether these… 31
arXiv — Machine Learning research 2d ago Randomized Exploration for Linear Bandits via Absolute Perturbations arXiv:2606.28616v1 Announce Type: new Abstract: In stochastic linear bandits, the canonical Upper Confidence Bound (UCB) algorithm admits a simple frequentist regret analysis but can be computationally demanding, while Thompson Sampling (TS) is computationally attractive yet… 27