
Research papers

500 articles archived under #paper

arXiv — NLP / Computation & Language research 6h ago

Persona Without Substrate: Regime-Dependence and the LLM Individuation Problem

arXiv:2607.00006v1 Announce Type: new Abstract: Beckmann & Butlin's (2026) ontological framework for the LLM individuation problem inherits an unargued cross-regime co-reference assumption from the persona-vectors literature: that the same direction picks out the same content…

arXiv — NLP / Computation & Language research 6h ago

Controllable Narrative Rendering for Enhanced Assisted Writing

arXiv:2607.00009v1 Announce Type: new Abstract: Despite the remarkable proficiency of large language models (LLMs) in basic writing assistance, their utility in creative writing is fundamentally hindered by a persistent binary failure. This issue manifests as an oscillation…

arXiv — NLP / Computation & Language research 6h ago

Benchmarking Frontier LLMs on Arabic Cultural and Sociolinguistic Knowledge: A Cross-Evaluation Framework with Human SME Ground Truth

arXiv:2607.00139v1 Announce Type: new Abstract: The cost of human expert evaluation is a principal bottleneck to deploying language models in specialized, high-stakes domains. This is particularly acute for Arabic sociolinguistic knowledge: credible grading requires not only…

arXiv — NLP / Computation & Language research 6h ago

Hate Speech Detection in Turkish and Arabic Languages: A Comprehensive Study

arXiv:2607.00143v1 Announce Type: new Abstract: Online hate speech has been linked to a global rise in violence against minorities, including incidents such as mass shootings, lynchings, and ethnic cleansing. Societies grappling with this issue, particularly when hate speech…

arXiv — NLP / Computation & Language research 6h ago

Readable but Not Controllable: Neuron-Level Evidence for Medical LLM Hallucination

arXiv:2607.00158v1 Announce Type: new Abstract: Hallucination remains one of the central obstacles to deploying medical LLMs. Yet, even when hallucination can be detected, it is still unclear whether the internal representations associated with it can be used for control rather…

arXiv — NLP / Computation & Language research 6h ago

Identifying and Resolving Pitfalls of Knowledge-Based VQA Benchmarks: Auditing, Repairing, and Augmenting

arXiv:2607.00159v1 Announce Type: new Abstract: Knowledge-Based Visual Question Answering (KB-VQA) aims to evaluate whether Visual Language Models (VLMs) can retrieve, ground, and reason over external structured knowledge beyond visual evidence. In practice, answer accuracy is…

arXiv — NLP / Computation & Language research 6h ago

ALEE: Any-Language Evaluation of Embeddings via English-Centric Minimal Pairs

arXiv:2607.00171v1 Announce Type: new Abstract: Text embeddings are standard for semantic similarity tasks, yet their evaluation remains an open challenge. Current benchmarks are static, cover only a limited set of languages, are often domain-specific, susceptible to…

arXiv — NLP / Computation & Language research 6h ago

Structural Pattern Mining in Inka Khipus: Unsupervised Clustering, Provenance Classification, and a Computational Validation of the Santa Valley Match

arXiv:2607.00185v1 Announce Type: new Abstract: Khipus (knotted cord devices) were the primary recording medium of the Inka Empire (c. 1400-1532 CE), yet their system remains undeciphered. We present a reproducible machine-learning pipeline applied to the Open Khipu Repository…

arXiv — NLP / Computation & Language research 6h ago

LV-ROVER: Multi-Stream Tesseract Voting for Maltese Paragraph OCR

arXiv:2607.00250v1 Announce Type: new Abstract: Maltese has decent text corpora and pretrained language models but, like many languages outside the handful with large OCR benchmarks, only a single known real labelled PDF corpus for OCR training (57 pages), far below what…

arXiv — NLP / Computation & Language research 6h ago

SEFORA: Student Essays with Feedback Corpus and LLM Feedback Evaluation Framework

arXiv:2607.00274v1 Announce Type: new Abstract: Effective writing feedback is among the strongest drivers of student learning, yet producing it at scale is labor-intensive. LLMs offer a natural path to scaling writing support, but two gaps stand in the way: few public corpora…

arXiv — NLP / Computation & Language research 6h ago

TRACE: State-Aware Query Processing over Temporal Evidence Graphs for Conversational Data

arXiv:2607.00339v1 Announce Type: new Abstract: Conversational data is increasingly used as a persistent source of user state for long-running assistants and AI agents. However, querying this data remains challenging because conversations naturally evolve: plans are revised,…

arXiv — NLP / Computation & Language research 6h ago

DiscoLoop: Looping Discrete Embeddings and Continuous Hidden States for Multi-hop Reasoning

arXiv:2607.00341v1 Announce Type: new Abstract: Large language models achieve strong performance on many reasoning tasks when allowed to externalize intermediate steps as Chain-of-Thought (CoT). However, many questions require the model to internalize the multi-step reasoning…

arXiv — NLP / Computation & Language research 6h ago

Beyond Perplexity: A Behavioral Evaluation Framework for Deployment-Memory Claims in LLM Test-Time Training

arXiv:2607.00368v1 Announce Type: new Abstract: Large language model test-time training (TTT) is often evaluated through local proxy metrics: models are updated on recent tokens, retrieved context, target-domain data, or verifiable task attempts, and then judged by perplexity,…

arXiv — NLP / Computation & Language research 6h ago

A Mechanistic View of Authority Hierarchy in LLM Sycophancy

arXiv:2607.00415v1 Announce Type: new Abstract: Authority bias poses a critical safety concern in language models: models systematically prioritize social cues from authority figures over factual consistency, swaying their answers based on source credibility rather than…

arXiv — NLP / Computation & Language research 6h ago

Speech Playground: An Interactive Tool for Speech Analysis and Comparison

arXiv:2607.00418v1 Announce Type: new Abstract: This paper presents Speech Playground, an interactive speech visualization and comparison tool. While existing tools such as Praat are excellent, it can be cumbersome to integrate them with modern deep learning representations and…

arXiv — NLP / Computation & Language research 6h ago

Selective Test-Time Debiasing for CLIP via Reward Gating

arXiv:2607.00423v1 Announce Type: new Abstract: Vision language models (VLMs) demonstrate strong zero-shot performance, but often perpetuate social stereotypes in person-centric queries, yielding skewed demographic distributions. Current debiasing methods apply uniform bias…

arXiv — NLP / Computation & Language research 6h ago

Understanding Why Language Models Hallucinate: Testing Reasoning Against Priors

arXiv:2607.00447v1 Announce Type: new Abstract: Large language models often produce hallucinated answers that violate prompt-level constraints. A key diagnostic question is whether these failures reflect missing knowledge, or whether the model has the relevant information but…

arXiv — NLP / Computation & Language research 6h ago

Know When to Stop: Segment-Level Credit Assignment for Reducing Overthinking

arXiv:2607.00482v1 Announce Type: new Abstract: Reasoning language models frequently overthink: generating extended chains of behaviors such as hedging, approach abandonment, and self-contradiction that consume tokens without improving answers. We show that these behaviors are…

arXiv — NLP / Computation & Language research 6h ago

Efficient Multilingual Reasoning Transfer via Progressive Code-Switching

arXiv:2607.00485v1 Announce Type: new Abstract: Large reasoning models (LRMs) have achieved strong reasoning capabilities in English, yet their performance degrades significantly when required to reason in other languages. A natural solution is to transfer the model's English…

arXiv — NLP / Computation & Language research 6h ago

BaseRT: Best-in-Class LLM Inference on Apple Silicon via Native Metal

arXiv:2607.00501v1 Announce Type: new Abstract: We present BaseRT, a native Metal inference runtime for large language models (LLMs) on Apple Silicon, and report the highest inference throughput on this hardware to date. Existing runtimes, including llama.cpp and MLX-based…

arXiv — NLP / Computation & Language research 6h ago

A Task-State Representation for Long-Horizon Mobile GUI Agents

arXiv:2607.00502v1 Announce Type: new Abstract: While long-horizon mobile GUI agents typically rely on thought-action-observation loops, they struggle to separate persistent task states from transient screen observations. As execution histories grow, this entanglement imposes a…

arXiv — NLP / Computation & Language research 6h ago

Dual-Confidence Contrastive Decoding for Retrieval-Augmented Generation

arXiv:2607.00570v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) increasingly requires models to answer questions from multiple retrieved documents, where only some sources are relevant and the retrieved bundle may contain stale, noisy, or conflicting…

arXiv — NLP / Computation & Language research 6h ago

Safe Alone, Unsafe Together: Safeguarding Against Implicit Toxicity When Benign Images Combine

arXiv:2607.00576v1 Announce Type: new Abstract: Multi-image content has become an increasingly prevalent form of visual communication in social media, giving rise to a new safety issue, multi-image implicit toxicity (MIIT), where each image appears benign in isolation, but…

arXiv — NLP / Computation & Language research 6h ago

Low Perplexity is Repetition: A One-Dimensional Self-Conditioning Attractor in Continuous Diffusion LMs

arXiv:2607.00588v1 Announce Type: new Abstract: Continuous diffusion language models such as ELF report record-low generative perplexity (Gen-PPL). We find a catch: these models repeat far more than human text, and Gen-PPL rewards rather than penalizes that repetition, so its…

arXiv — NLP / Computation & Language research 6h ago

Multi-Turn Agentic Scientific Literature Search via Workflow Induction

arXiv:2607.00597v1 Announce Type: new Abstract: Scientific literature search often requires more than retrieving papers from a single query: users' intents are underspecified, preference-dependent, and evolve through interaction. Existing search agents typically rely on fixed…

arXiv — NLP / Computation & Language research 6h ago

"Don't Say It!": Constraints, Compliance, and Communication when Language Models Play Taboo

arXiv:2607.00601v1 Announce Type: new Abstract: The game of Taboo requires describing a target word without using a set of forbidden words, so that other players can guess it. This deceptively simple task combines strict lexical constraints with the need for communicatively…

arXiv — NLP / Computation & Language research 6h ago

Auditing Forgetting in Limited Memory Language Models

arXiv:2607.00605v1 Announce Type: new Abstract: Limited Memory Language Models (LMLMs) externalize factual knowledge to a database to enable deletion-based unlearning without retraining. Existing evaluations measure post-deletion correctness in aggregate and cannot tell whether…

arXiv — NLP / Computation & Language research 6h ago

Faithful by Definition: Emotion Analysis via Natural Semantic Metalanguage Explications

arXiv:2607.00661v1 Announce Type: new Abstract: Explanations for emotion classifiers are usually produced post hoc, with no guarantee that they reflect the computation behind the label. We present an explication interface for event-based emotion analysis. A parser maps the input…

arXiv — NLP / Computation & Language research 6h ago

YOMI-Bench: A Benchmark for Evaluating Kanji Reading and Phonological Understanding of LLMs for Japanese

arXiv:2607.00664v1 Announce Type: new Abstract: We propose YOMI-Bench, a benchmark for evaluating kanji reading and phonological understanding of large language models (LLMs) for Japanese. In Japanese, a single kanji character often has multiple possible readings, making it…

arXiv — NLP / Computation & Language research 6h ago

Self-conditioned Flow Map Language Models via Fixed-point Flows

arXiv:2607.00714v1 Announce Type: new Abstract: Self-conditioning is a core technique that enhances continuous flow-based language models, where the model learns to denoise generated text by conditioning on its own denoising estimate. While empirically successful, its…

arXiv — NLP / Computation & Language research 6h ago

MSQA: A Natively Sourced Multilingual and Multicultural SimpleQA Benchmark

arXiv:2607.00724v1 Announce Type: new Abstract: Multilingual fluency often invites a stronger assumption: a model that can speak a user's language must also understand the culture encoded by that language. We call this the Illusion of Cultural Alignment. To test this assumption…

arXiv — NLP / Computation & Language research 6h ago

What Survives Into Context: A Diagnostic for Budget-Constrained Multi-Hop RAG and When Submodular Evidence Packing Improves It

arXiv:2607.00725v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) under a fixed reader-context budget forces a selection problem: of the evidence retrieved, only a fraction can be shown to the reader. We argue that document recall -- the standard retrieval…

arXiv — NLP / Computation & Language research 6h ago

MetaHOPE: A Metaphor-Oriented Evaluation Framework for Analysing MT and LLM Translation Errors

arXiv:2607.00848v1 Announce Type: new Abstract: In this opinion paper, we propose MetaHOPE, an error severity-aware annotation framework for evaluating metaphor translations. Metaphors present challenges for machine translation (MT) and natural language understanding and…

arXiv — NLP / Computation & Language research 6h ago

The Course of News Events: A Comparison of Bottom-Up and Top-Down Approaches for Collecting Text-Based Data about Disasters

arXiv:2607.00849v1 Announce Type: new Abstract: News articles are an important source of information on disaster impacts and adaptation. A key methodological challenge in socio-environmental studies is how to select a representative data sample. Two approaches are common:…

arXiv — NLP / Computation & Language research 6h ago

Recovering Input Text from Hidden States: Study of Gradient-Based Inversion of Decoder-Only Language Models

arXiv:2607.00852v1 Announce Type: new Abstract: This work studies the hidden-state inversion problem: recovering the original input token sequence of a decoder-only language model from its last-layer hidden states. Rather than treating inversion as a one-shot reconstruction, we…

arXiv — NLP / Computation & Language research 6h ago

CAT: Confidence-Adaptive Thinking for Efficient Reasoning of Large Reasoning Models

arXiv:2607.00862v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) have achieved remarkable success on complex tasks by leveraging long chain-of-thought (CoT) trajectories, yet they frequently exhibit overthinking on simple queries, resulting in significant token…

arXiv — NLP / Computation & Language research 6h ago

Dynamic Bidirectional Pattern Memory: A Production-Scale Empirical Characterisation of Inference-Time Gating in Clinical NLP

arXiv:2607.00870v1 Announce Type: new Abstract: We study inference-time pattern-memory gating in a production-scale clinical natural language processing (NLP) pipeline. The pipeline pairs a generator (Llama-3.3 70B) proposing extractions with a verifier (MMed-Llama-3.1 70B)…

arXiv — NLP / Computation & Language research 6h ago

How Ethos and Pathos Appeals Resonate in Reader Interpretations of Social Media Messages

arXiv:2607.00873v1 Announce Type: new Abstract: Rhetorical strategies and their influence on audiences are often studied through social media posts and comments. However, this focus overlooks the universal audience, which is the majority of readers who remain silent and do not…

arXiv — NLP / Computation & Language research 6h ago

MultiSynt/MT: Trillion-Token Multi-Parallel Pre-Training Data Translated Across 36 Languages

arXiv:2607.00890v1 Announce Type: new Abstract: Open web-scale pre-training corpora remain concentrated in English, limiting multilingual LLM development. We introduce MultiSynt/MT, an open synthetic parallel corpus with approximately 4.8 trillion target-language tokens across…

arXiv — NLP / Computation & Language research 6h ago

Beyond Document Grounding: Span-Level Hallucination Detection over Code, Tool Output, and Documents

arXiv:2607.00895v1 Announce Type: new Abstract: Hallucination detection for retrieval-augmented generation (RAG) is usually evaluated on natural-language document evidence. However, grounded generation systems increasingly rely on structured inputs: source code, developer-tool…

arXiv — NLP / Computation & Language research 6h ago

From Personas to Plot: Character-Grounded Multi-Agent Story Generation for Long-Form Narratives

arXiv:2607.00918v1 Announce Type: new Abstract: Although large language models (LLMs) have demonstrated impressive creative fiction generation, they struggle to maintain narrative consistency and coherent plot lines in long-form stories. In this work, we introduce a unified…

arXiv — NLP / Computation & Language research 6h ago

Persona Non Grata: LLM Persona-Driven Generations in MCQA are Unstable in Distinct Dimensions

arXiv:2607.00937v1 Announce Type: new Abstract: Persona-driven generations (PDGs) have seen prolific use in research and industry applications, where a large language model (LLM) takes on a 'persona' while completing some task. While persona expressed through free-form text…

arXiv — NLP / Computation & Language research 6h ago

Quantifying the Affective Gap: A Zero-Shot Evaluation of LLMs on Fine-Grained Emotion Taxonomies

arXiv:2607.00968v1 Announce Type: new Abstract: Emotion recognition in natural language is a foundational challenge in affective computing, with critical implications for human-computer interaction, mental health support, and conversational AI. This paper presents a rigorous,…

arXiv — NLP / Computation & Language research 6h ago

Svarna: An Open Corpus Workbench for Modern Greek

arXiv:2607.00970v1 Announce Type: new Abstract: This paper introduces Svarna, a free, open-source, web-based corpus workbench for Modern Greek. Svarna integrates five databases covering various registers (institutional, literary, dialectal, social media, and historical) to…

arXiv — NLP / Computation & Language research 6h ago

KnowledgeDebugger -- an Exploration Tool for Knowledge Localization and Editing in Transformers

arXiv:2607.01000v1 Announce Type: new Abstract: Recent research has increasingly focused on understanding how Transformers store and process knowledge, as well as how this knowledge can be edited. Research work in this area is often conducted in two phases: first, phenomena are…

arXiv — NLP / Computation & Language research 6h ago

Logit-Contribution Scoring Identifies Non-Literal Retrieval Heads

arXiv:2607.01002v1 Announce Type: new Abstract: In long-context use, large language models frequently synthesize answers from the meaning of a relevant context span rather than literally copy-pasting them. Identifying which attention heads perform this synthesis matters for…

arXiv — NLP / Computation & Language research 6h ago

Understanding Large Language Models

arXiv:2607.01006v1 Announce Type: new Abstract: Large Language Models (LLMs) represent one of the most significant advances in AI and natural language processing in recent years. Still, many pressing questions about their mechanisms, capabilities, and relationship to human…

arXiv — NLP / Computation & Language research 6h ago

Reading Order Inference for Complex Document Layouts

arXiv:2607.01018v1 Announce Type: new Abstract: Reading order inference remains a critical bottleneck in the digitization of complex historical manuscripts, where pages contain multiple spatially interleaved reading streams, the canonical example being the Glossa Ordinaria…

arXiv — NLP / Computation & Language research 6h ago

Evidence-Supported Credit Risk Report Generation Using News-Centric Financial Knowledge Graphs

arXiv:2607.01023v1 Announce Type: new Abstract: Financial markets evolve in response to real-world events reported in news, yet these drivers often remain implicit in text. To better explain market dynamics, event-market relations must be explicitly modeled through factual,…

arXiv — NLP / Computation & Language research 6h ago

Behavior-Adaptive Conversational Agents: Toward a Fluid Personality Framework

arXiv:2607.01034v1 Announce Type: new Abstract: Large language model (LLM)-based conversational agents (CAs) are now ubiquitous, creating new opportunities for AI-mediated behavior change. Their capacity to project nuanced personalities and adopt diverse metaphorical roles…
