Research papers · 500 articles archived under #paper arXiv — NLP / Computation & Language research 6h ago Persona Without Substrate: Regime-Dependence and the LLM Individuation Problem arXiv:2607.00006v1 Announce Type: new Abstract: Beckmann & Butlin's (2026) ontological framework for the LLM individuation problem inherits an unargued cross-regime co-reference assumption from the persona-vectors literature: that the same direction picks out the same content… 21 arXiv — NLP / Computation & Language research 6h ago Controllable Narrative Rendering for Enhanced Assisted Writing arXiv:2607.00009v1 Announce Type: new Abstract: Despite the remarkable proficiency of large language models (LLMs) in basic writing assistance, their utility in creative writing is fundamentally hindered by a persistent binary failure. This issue manifests as an oscillation… 13 arXiv — NLP / Computation & Language research 6h ago Benchmarking Frontier LLMs on Arabic Cultural and Sociolinguistic Knowledge: A Cross-Evaluation Framework with Human SME Ground Truth arXiv:2607.00139v1 Announce Type: new Abstract: The cost of human expert evaluation is a principal bottleneck to deploying language models in specialized, high-stakes domains. This is particularly acute for Arabic sociolinguistic knowledge: credible grading requires not only… 20 arXiv — NLP / Computation & Language research 6h ago Hate Speech Detection in Turkish and Arabic Languages: A Comprehensive Study arXiv:2607.00143v1 Announce Type: new Abstract: Online hate speech has been linked to a global rise in violence against minorities, including incidents such as mass shootings, lynchings, and ethnic cleansing. 
Societies grappling with this issue, particularly when hate speech… 6 arXiv — NLP / Computation & Language research 6h ago Readable but Not Controllable: Neuron-Level Evidence for Medical LLM Hallucination arXiv:2607.00158v1 Announce Type: new Abstract: Hallucination remains one of the central obstacles to deploying medical LLMs. Yet, even when hallucination can be detected, it is still unclear whether the internal representations associated with it can be used for control rather… 33 arXiv — NLP / Computation & Language research 6h ago Identifying and Resolving Pitfalls of Knowledge-Based VQA Benchmarks: Auditing, Repairing, and Augmenting arXiv:2607.00159v1 Announce Type: new Abstract: Knowledge-Based Visual Question Answering (KB-VQA) aims to evaluate whether Visual Language Models (VLMs) can retrieve, ground, and reason over external structured knowledge beyond visual evidence. In practice, answer accuracy is… 30 arXiv — NLP / Computation & Language research 6h ago ALEE: Any-Language Evaluation of Embeddings via English-Centric Minimal Pairs arXiv:2607.00171v1 Announce Type: new Abstract: Text embeddings are standard for semantic similarity tasks, yet their evaluation remains an open challenge. Current benchmarks are static, cover only a limited set of languages, are often domain-specific, susceptible to… 4 arXiv — NLP / Computation & Language research 6h ago Structural Pattern Mining in Inka Khipus: Unsupervised Clustering, Provenance Classification, and a Computational Validation of the Santa Valley Match arXiv:2607.00185v1 Announce Type: new Abstract: Khipus--knotted cord devices--were the primary recording medium of the Inka Empire (c. 1400-1532 CE), yet their system remains undeciphered. 
We present a reproducible machine-learning pipeline applied to the Open Khipu Repository… 29 arXiv — NLP / Computation & Language research 6h ago LV-ROVER: Multi-Stream Tesseract Voting for Maltese Paragraph OCR arXiv:2607.00250v1 Announce Type: new Abstract: Maltese has decent text corpora and pretrained language models, but, like many languages outside the handful with large OCR benchmarks, only a single known real labelled PDF corpus for OCR training, 57 page, far below what… 25 arXiv — NLP / Computation & Language research 6h ago SEFORA: Student Essays with Feedback Corpus and LLM Feedback Evaluation Framework arXiv:2607.00274v1 Announce Type: new Abstract: Effective writing feedback is among the strongest drivers of student learning, yet producing it at scale is labor-intensive. LLMs offer a natural path to scaling writing support, but two gaps stand in the way: few public corpora… 10 arXiv — NLP / Computation & Language research 6h ago TRACE: State-Aware Query Processing over Temporal Evidence Graphs for Conversational Data arXiv:2607.00339v1 Announce Type: new Abstract: Conversational data is increasingly used as a persistent source of user state for long-running assistants and AI agents. However, querying this data remains challenging because conversations naturally evolve: plans are revised,… 8 arXiv — NLP / Computation & Language research 6h ago DiscoLoop: Looping Discrete Embeddings and Continuous Hidden States for Multi-hop Reasoning arXiv:2607.00341v1 Announce Type: new Abstract: Large language models achieve strong performance on many reasoning tasks when allowed to externalize intermediate steps as Chain-of-Thought (CoT). 
However, many questions require the model to internalize the multi-step reasoning… 32 arXiv — NLP / Computation & Language research 6h ago Beyond Perplexity: A Behavioral Evaluation Framework for Deployment-Memory Claims in LLM Test-Time Training arXiv:2607.00368v1 Announce Type: new Abstract: Large language model test-time training (TTT) is often evaluated through local proxy metrics: models are updated on recent tokens, retrieved context, target-domain data, or verifiable task attempts, and then judged by perplexity,… 12 arXiv — NLP / Computation & Language research 6h ago A Mechanistic View of Authority Hierarchy in LLM Sycophancy arXiv:2607.00415v1 Announce Type: new Abstract: Authority bias poses a critical safety concern in language models: models systematically prioritize social cues from authority figures over factual consistency, swaying their answers based on source credibility rather than… 17 arXiv — NLP / Computation & Language research 6h ago Speech Playground: An Interactive Tool for Speech Analysis and Comparison arXiv:2607.00418v1 Announce Type: new Abstract: This paper presents Speech Playground, an interactive speech visualization and comparison tool. While existing tools such as Praat are excellent, it can be cumbersome to integrate them with modern deep learning representations and… 26 arXiv — NLP / Computation & Language research 6h ago Selective Test-Time Debiasing for CLIP via Reward Gating arXiv:2607.00423v1 Announce Type: new Abstract: Vision language models (VLMs) demonstrate strong zero-shot performance, but often perpetuate social stereotypes in person-centric queries, yielding skewed demographic distributions. Current debiasing methods apply uniform bias… 22 arXiv — NLP / Computation & Language research 6h ago Understanding Why Language Models Hallucinate: Testing Reasoning Against Priors arXiv:2607.00447v1 Announce Type: new Abstract: Large language models often produce hallucinated answers that violate prompt-level constraints. 
A key diagnostic question is whether these failures reflect missing knowledge, or whether the model has the relevant information but… 10 arXiv — NLP / Computation & Language research 6h ago Know When to Stop: Segment-Level Credit Assignment for Reducing Overthinking arXiv:2607.00482v1 Announce Type: new Abstract: Reasoning language models frequently overthink: generating extended chains of behaviors such as hedging, approach abandonment, and self contradiction that consume tokens without improving answers. We show that these behaviors are… 4 arXiv — NLP / Computation & Language research 6h ago Efficient Multilingual Reasoning Transfer via Progressive Code-Switching arXiv:2607.00485v1 Announce Type: new Abstract: Large reasoning models (LRMs) have achieved strong reasoning capabilities in English, yet their performance degrades significantly when required to reason in other languages. A natural solution is to transfer the model's English… 9 arXiv — NLP / Computation & Language research 6h ago BaseRT: Best-in-Class LLM Inference on Apple Silicon via Native Metal arXiv:2607.00501v1 Announce Type: new Abstract: We present BaseRT, a native Metal inference runtime for large language models (LLMs) on Apple Silicon, and report the highest inference throughput on this hardware to date. Existing runtimes, including llama.cpp and MLX-based… 22 arXiv — NLP / Computation & Language research 6h ago A Task-State Representation for Long-Horizon Mobile GUI Agents arXiv:2607.00502v1 Announce Type: new Abstract: While long-horizon mobile GUI agents typically rely on thought-action-observation loops, they struggle to separate persistent task states from transient screen observations. 
As execution histories grow, this entanglement imposes a… 11 arXiv — NLP / Computation & Language research 6h ago Dual-Confidence Contrastive Decoding for Retrieval-Augmented Generation arXiv:2607.00570v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) increasingly requires models to answer questions from multiple retrieved documents, where only some sources are relevant and the retrieved bundle may contain stale, noisy, or conflicting… 35 arXiv — NLP / Computation & Language research 6h ago Safe Alone, Unsafe Together: Safeguarding Against Implicit Toxicity When Benign Images Combine arXiv:2607.00576v1 Announce Type: new Abstract: Multi-image content has become an increasingly prevalent form of visual communication in social media, giving rise to a new safety issue, multi-image implicit toxicity (MIIT), where each image appears benign in isolation, but… 15 arXiv — NLP / Computation & Language research 6h ago Low Perplexity is Repetition: A One-Dimensional Self-Conditioning Attractor in Continuous Diffusion LMs arXiv:2607.00588v1 Announce Type: new Abstract: Continuous diffusion language models such as ELF report record-low generative perplexity (Gen-PPL). We find a catch: these models repeat far more than human text, and Gen-PPL rewards rather than penalizes that repetition, so its… 12 arXiv — NLP / Computation & Language research 6h ago Multi-Turn Agentic Scientific Literature Search via Workflow Induction arXiv:2607.00597v1 Announce Type: new Abstract: Scientific literature search often requires more than retrieving papers from a single query: users' intents are underspecified, preference-dependent, and evolve through interaction. 
Existing search agents typically rely on fixed… 26 arXiv — NLP / Computation & Language research 6h ago "Don't Say It!": Constraints, Compliance, and Communication when Language Models Play Taboo arXiv:2607.00601v1 Announce Type: new Abstract: The game of Taboo requires describing a target word without using a set of forbidden words, so that other players can guess it. This deceptively simple task combines strict lexical constraints with the need for communicatively… 12 arXiv — NLP / Computation & Language research 6h ago Auditing Forgetting in Limited Memory Language Models arXiv:2607.00605v1 Announce Type: new Abstract: Limited Memory Language Models (LMLMs) externalize factual knowledge to a database to enable deletion-based unlearning without retraining. Existing evaluations measure post-deletion correctness in aggregate and cannot tell whether… 5 arXiv — NLP / Computation & Language research 6h ago Faithful by Definition: Emotion Analysis via Natural Semantic Metalanguage Explications arXiv:2607.00661v1 Announce Type: new Abstract: Explanations for emotion classifiers are usually produced post hoc, with no guarantee that they reflect the computation behind the label. We present an explication interface for event-based emotion analysis. A parser maps the input… 9 arXiv — NLP / Computation & Language research 6h ago YOMI-Bench: A Benchmark for Evaluating Kanji Reading and Phonological Understanding of LLMs for Japanese arXiv:2607.00664v1 Announce Type: new Abstract: We propose YOMI-Bench, a benchmark for evaluating kanji reading and phonological understanding of large language models (LLMs) for Japanese. 
In Japanese, a single kanji character often has multiple possible readings, making it… 8 arXiv — NLP / Computation & Language research 6h ago Self-conditioned Flow Map Language Models via Fixed-point Flows arXiv:2607.00714v1 Announce Type: new Abstract: Self-conditioning is a core technique that enhances continuous flow-based language models, where the model learns to denoise generated text by conditioning on its own denoising estimate. While empirically successful, its… 13 arXiv — NLP / Computation & Language research 6h ago MSQA: A Natively Sourced Multilingual and Multicultural SimpleQA Benchmark arXiv:2607.00724v1 Announce Type: new Abstract: Multilingual fluency often invites a stronger assumption: a model that can speak a user's language must also understand the culture encoded by that language. We call this the Illusion of Cultural Alignment. To test this assumption… 8 arXiv — NLP / Computation & Language research 6h ago What Survives Into Context: A Diagnostic for Budget-Constrained Multi-Hop RAG and When Submodular Evidence Packing Improves It arXiv:2607.00725v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) under a fixed reader-context budget forces a selection problem: of the evidence retrieved, only a fraction can be shown to the reader. We argue that document recall -- the standard retrieval… 27 arXiv — NLP / Computation & Language research 6h ago MetaHOPE: A Metaphor-Oriented Evaluation Framework for Analysing MT and LLM Translation Errors arXiv:2607.00848v1 Announce Type: new Abstract: In this opinion paper, we propose MetaHOPE, an error severity-aware annotation framework for evaluating metaphor translations. 
Metaphors present challenges for machine translation (MT) and natural language understanding and… 21 arXiv — NLP / Computation & Language research 6h ago The Course of News Events: A Comparison of Bottom-Up and Top-Down Approaches for Collecting Text-Based Data about Disasters arXiv:2607.00849v1 Announce Type: new Abstract: News articles are an important source of information on disaster impacts and adaptation. A key methodological challenge in socio-environmental studies is how to select a representative data sample. Two approaches are common:… 35 arXiv — NLP / Computation & Language research 6h ago Recovering Input Text from Hidden States: Study of Gradient-Based Inversion of Decoder-Only Language Models arXiv:2607.00852v1 Announce Type: new Abstract: This work studies the hidden-state inversion problem: recovering the original input token sequence of a decoder-only language model from its last-layer hidden states. Rather than treating inversion as a one-shot reconstruction, we… 34 arXiv — NLP / Computation & Language research 6h ago CAT: Confidence-Adaptive Thinking for Efficient Reasoning of Large Reasoning Models arXiv:2607.00862v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) have achieved remarkable success on complex tasks by leveraging long chain-of-thought (CoT) trajectories, yet they frequently exhibit overthinking on simple queries, resulting in significant token… 8 arXiv — NLP / Computation & Language research 6h ago Dynamic Bidirectional Pattern Memory: A Production-Scale Empirical Characterisation of Inference-Time Gating in Clinical NLP arXiv:2607.00870v1 Announce Type: new Abstract: We study inference-time pattern-memory gating in a production-scale clinical natural language processing (NLP) pipeline. 
The pipeline pairs a generator (Llama-3.3 70B) proposing extractions with a verifier (MMed-Llama-3.1 70B)… 33 arXiv — NLP / Computation & Language research 6h ago How Ethos and Pathos Appeals Resonate in Reader Interpretations of Social Media Messages arXiv:2607.00873v1 Announce Type: new Abstract: Rhetorical strategies and their influence on audiences are often studied through social media posts and comments. However, this focus overlooks the universal audience, which is the majority of readers who remain silent and do not… 6 arXiv — NLP / Computation & Language research 6h ago MultiSynt/MT: Trillion-Token Multi-Parallel Pre-Training Data Translated Across 36 Languages arXiv:2607.00890v1 Announce Type: new Abstract: Open web-scale pre-training corpora remain concentrated in English, limiting multilingual LLM development. We introduce MultiSynt/MT, an open synthetic parallel corpus with approximately 4.8 trillion target-language tokens across… 12 arXiv — NLP / Computation & Language research 6h ago Beyond Document Grounding: Span-Level Hallucination Detection over Code, Tool Output, and Documents arXiv:2607.00895v1 Announce Type: new Abstract: Hallucination detection for retrieval-augmented generation (RAG) is usually evaluated on natural-language document evidence. However, grounded generation systems increasingly rely on structured inputs: source code, developer-tool… 14 arXiv — NLP / Computation & Language research 6h ago From Personas to Plot: Character-Grounded Multi-Agent Story Generation for Long-Form Narratives arXiv:2607.00918v1 Announce Type: new Abstract: Although large language models (LLMs) have demonstrated impressive creative fiction generation, they struggle to maintain narrative consistency and coherent plot lines in long-form stories. 
In this work, we introduce a unified… 30 arXiv — NLP / Computation & Language research 6h ago Persona Non Grata: LLM Persona-Driven Generations in MCQA are Unstable in Distinct Dimensions arXiv:2607.00937v1 Announce Type: new Abstract: Persona-driven generations (PDGs) have seen prolific use in research and industry applications, where a large language model (LLM) takes on a 'persona' while completing some task. While persona expressed through free-form text… 19 arXiv — NLP / Computation & Language research 6h ago Quantifying the Affective Gap: A Zero-Shot Evaluation of LLMs on Fine-Grained Emotion Taxonomies arXiv:2607.00968v1 Announce Type: new Abstract: Emotion recognition in natural language is a foundational challenge in affective computing, with critical implications for human-computer interaction, mental health support, and conversational AI. This paper presents a rigorous,… 20 arXiv — NLP / Computation & Language research 6h ago Svarna: An Open Corpus Workbench for Modern Greek arXiv:2607.00970v1 Announce Type: new Abstract: This paper introduces Svarna, a free, open-source, web-based corpus workbench for modern Greek. Svarna integrates five databases covering various registers, institutional, literary, dialectal, social media, and historical, to… 29 arXiv — NLP / Computation & Language research 6h ago KnowledgeDebugger -- an Exploration Tool for Knowledge Localization and Editing in Transformers arXiv:2607.01000v1 Announce Type: new Abstract: Recent research has increasingly focused on understanding how Transformers store and process knowledge, as well as how this knowledge can be edited. 
Research work in this area is often conducted in two phases: first, phenomena are… 8 arXiv — NLP / Computation & Language research 6h ago Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads arXiv:2607.01002v1 Announce Type: new Abstract: In long-context use, large language models frequently synthesize answers from the meaning of a relevant context span rather than literally copy-pasting them. Identifying which attention heads perform this synthesis matters for… 5 arXiv — NLP / Computation & Language research 6h ago Understanding Large Language Models arXiv:2607.01006v1 Announce Type: new Abstract: Large Language Models (LLMs) represent one of the most significant advances in AI and natural language processing in recent years. Still, many pressing questions about their mechanisms, capabilities, and relationship to human… 28 arXiv — NLP / Computation & Language research 6h ago Reading Order Inference for Complex Document Layouts arXiv:2607.01018v1 Announce Type: new Abstract: Reading order inference remains a critical bottleneck in the digitization of complex historical manuscripts, where pages contain multiple spatially interleaved reading streams, the canonical example being the Glossa Ordinaria… 4 arXiv — NLP / Computation & Language research 6h ago Evidence-Supported Credit Risk Report Generation Using News-Centric Financial Knowledge Graphs arXiv:2607.01023v1 Announce Type: new Abstract: Financial markets evolve in response to real-world events reported in news, yet these drivers often remain implicit in text. To better explain market dynamics, event-market relations must be explicitly modeled through factual,… 27 arXiv — NLP / Computation & Language research 6h ago Behavior-Adaptive Conversational Agents: Toward a Fluid Personality Framework arXiv:2607.01034v1 Announce Type: new Abstract: Large language model (LLM)-based conversational agents (CAs) are now ubiquitous, creating new opportunities for AI-mediated behavior change. 
Their capacity to project nuanced personalities and adopt diverse metaphorical roles… 38