2026-01-09

Multivector Reranking in the Era of Strong First-Stage Retrievers
SimuAgent An LLM-Based Simulink Modeling Assistant Enhanced with Reinforcement Learning
RelayLLM Efficient Reasoning via Collaborative Decoding
Token-Level LLM Collaboration via FusionRoute
Semantically Orthogonal Framework for Citation Classification Disentangling Intent and Content
Multi-Disciplinary Dataset Discovery from Citation-Verified Literature Contexts
Driving on Registers
Milestones over Outcome Unlocking Geometric Reasoning with Sub-Goal Verifiable Reward
Challenges and Research Directions for Large Language Model Inference Hardware
ArcAligner Adaptive Recursive Aligner for Compressed Context Embeddings in RAG
SparseLaneSTP Leveraging Spatio-Temporal Priors with Sparse Transformers for 3D Lane Detection
Asynchronous Secure Federated Learning with Byzantine aggregators
Distributed Online Convex Optimization with Efficient Communication Improved Algorithm and Lower bounds
SCALER Synthetic Scalable Adaptive Learning Environment for Reasoning
AgentOCR Reimagining Agent History via Optical Self-Compression
AT$^2$PO Agentic Turn-based Policy Optimization via Tree Search
When Single-Agent with Skills Replace Multi-Agent Systems and When They Fail
GPU-Accelerated INT8 Quantization for KV Cache Compression in Large Language Models
Beyond Monolithic Architectures A Multi-Agent Search and Knowledge Optimization Framework for Agentic Search
ResMAS Resilience Optimization in LLM-based Multi-agent Systems
LAMB LLM-based Audio Captioning with Modality Gap Bridging via Cauchy-Schwarz Divergence
LLMs-Integrated Automatic Hate Speech Recognition Using Controllable Text Generation Models
Succeeding at Scale Automated Multi-Retriever Fusion and Query-Side Adaptation for Multi-Tenant Search
All Changes May Have Invariant Principles Improving Ever-Shifting Harmful Meme Detection via Design Concept Reproduction
Reasoning Over Space Enabling Geographic Reasoning for LLM-Based Generative Next POI Recommendation
Discrete Fourier Transform-based Point Cloud Compression for Efficient SLAM in Featureless Terrain
Self-MedRAG a Self-Reflective Hybrid Retrieval-Augmented Generation Framework for Reliable Medical Question Answering
LinguaGame A Linguistically Grounded Game-Theoretic Paradigm for Multi-Agent Dialogue Generation
Invisible Walls Privacy-Preserving ISAC Empowered by Reconfigurable Intelligent Surfaces
Convergence Rates for Learning Pseudo-Differential Operators
Re-Rankers as Relevance Judges
SpectraFormer an Attention-Based Raman Unmixing Tool for Accessing the Graphene Buffer-Layer Signature on SiC
Rate or Fate? RLV $^\varepsilon$ R Reinforcement Learning with Verifiable Noisy Rewards
LLM-Guided Lifecycle-Aware Clustering of Multi-Turn Customer Support Conversations
Phasor Agents Oscillatory Graphs with Three-Factor Plasticity and Sleep-Staged Learning
PackCache A Training-Free Acceleration Method for Unified Autoregressive Video Generation via Compact KV-Cache
RIGOURATE Quantifying Scientific Exaggeration with Evidence-Aligned Claim Evaluation
SCAR-GS Spatial Context Attention for Residuals in Progressive Gaussian Splatting
Stable Language Guidance for Vision-Language-Action Models
ResTok Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation
FocusUI Efficient UI Grounding via Position-Preserving Visual Token Selection
Decide Then Retrieve A Training-Free Framework with Uncertainty-Guided Triggering and Dual-Path Retrieval
A Privacy-Preserving Localization Scheme with Node Selection in Mobile Networks
Feature-Aware One-Shot Federated Learning via Hierarchical Token Sequences
Evaluating Small Decoder-Only Language Models for Grammar Correction and Text Simplification
When Numbers Start Talking Implicit Numerical Coordination Among LLM-Based Agents
Where meaning lives Layer-wise accessibility of psycholinguistic features in encoder and decoder language models
Do LLMs Really Memorize Personally Identifiable Information? Revisiting PII Leakage with a Cue-Controlled Memorization Framework
Improving Compactness and Reducing Ambiguity of CFIRE Rule-Based Explanations
EDCO Dynamic Curriculum Orchestration for Domain-specific Large Language Model Fine-tuning
Visual Merit or Linguistic Crutch? A Close Look at DeepSeek-OCR
How Does the Thinking Step Influence Model Safety? An Entropy-based Safety Reminder for LRMs
LLM-MC-Affect LLM-Based Monte Carlo Modeling of Affective Trajectories and Latent Ambiguity for Interpersonal Dynamic Insight
Architecting Agentic Communities using Design Patterns
Safety-Utility Conflicts Are Not Global Surgical Alignment via Head-Level Diagnosis
Inhibitory Attacks on Backdoor-based Fingerprinting for Large Language Models
PhysicsFormer An Efficient and Fast Attention-Based Physics Informed Neural Network for Solving Incompressible Navier Stokes Equations
Jailbreaking LLMs & VLMs Mechanisms, Evaluation, and Unified Defense
DiffCoT Diffusion-styled Chain-of-Thought Reasoning in LLMs
Layer-Order Inversion Rethinking Latent Multi-Hop Reasoning in Large Language Models
IntroLM Introspective Language Models via Prefilling-Time Self-Evaluation
Beyond Perplexity A Lightweight Benchmark for Knowledge Retention in Supervised Fine-Tuning
From Bits to Chips An LLM-based Hardware-Aware Quantization Agent for Streamlined Deployment of LLMs

Multivector Reranking in the Era of Strong First-Stage Retrievers

Authors: Silvio Martinico, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini

2026-01-08

http://arxiv.org/abs/2601.05200v1

Learned multivector representations power modern search systems with strong retrieval effectiveness, but their real-world use is limited by the high cost of exhaustive token-level retrieval. Therefore, most systems adopt a gather-and-refine strategy, where a lightweight gather phase selects candidates for full scoring. However, this approach requires expensive searches over large token-level indexes and often misses the documents that would rank highest under full similarity. In this paper, we reproduce several state-of-the-art multivector retrieval methods on two publicly available datasets, providing a clear picture of the current multivector retrieval field and observing the inefficiency of token-level gathering. Building on top of that, we show that replacing the token-level gather phase with a single-vector document retriever -- specifically, a learned sparse retriever (LSR) -- produces a smaller and more semantically coherent candidate set. This recasts the gather-and-refine pipeline into the well-established two-stage retrieval architecture. As retrieval latency decreases, query encoding with two neural encoders becomes the dominant computational bottleneck. To mitigate this, we integrate recent inference-free LSR methods, demonstrating that they preserve the retrieval effectiveness of the dual-encoder pipeline while substantially reducing query encoding time. Finally, we investigate multiple reranking configurations that balance efficiency, memory, and effectiveness, and we introduce two optimization techniques that prune low-quality candidates early. Empirical results show that these techniques improve retrieval efficiency by up to $1.8\times$ with no loss in quality. Overall, our two-stage approach achieves over $24\times$ speedup over the state-of-the-art multivector retrieval systems, while maintaining comparable or superior retrieval quality.
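The two-stage architecture described in this abstract can be illustrated with a minimal sketch. It assumes the first-stage scores are already computed by some single-vector retriever and uses the standard MaxSim late-interaction score for reranking; the function names and the `k_candidates`/`k_final` parameters are hypothetical and not taken from the paper.

```python
import numpy as np

def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: each query token takes its best-matching
    document token (max dot product), and the maxima are summed."""
    sim = query_vecs @ doc_vecs.T  # shape: (n_query_tokens, n_doc_tokens)
    return float(sim.max(axis=1).sum())

def two_stage_search(query_vecs, first_stage_scores, doc_token_vecs,
                     k_candidates, k_final):
    # Stage 1 (gather): keep the top-k documents under the cheap
    # single-vector score (assumed to be precomputed by the caller).
    candidates = np.argsort(-first_stage_scores)[:k_candidates]
    # Stage 2 (refine): rerank only those candidates with full MaxSim.
    reranked = sorted(
        candidates,
        key=lambda d: maxsim_score(query_vecs, doc_token_vecs[d]),
        reverse=True,
    )
    return [int(d) for d in reranked[:k_final]]
```

The point of the design is that the expensive token-level scoring touches only `k_candidates` documents rather than a token-level index over the whole collection.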

SimuAgent An LLM-Based Simulink Modeling Assistant Enhanced with Reinforcement Learning

Authors: Yanchang Liang, Xiaowei Zhao

2026-01-08

http://arxiv.org/abs/2601.05187v1

Large language models (LLMs) have revolutionized text-based code automation, but their potential in graph-oriented engineering workflows remains under-explored. We introduce SimuAgent, an LLM-powered modeling and simulation agent tailored for Simulink. SimuAgent replaces verbose XML with a concise, dictionary-style Python representation, dramatically cutting token counts, improving interpretability, and enabling fast, in-process simulation. A lightweight plan-execute architecture, trained in two stages, equips the agent with both low-level tool skills and high-level design reasoning. To tackle sparse rewards in long-horizon tasks, we propose Reflection-GRPO (ReGRPO), which augments Group Relative Policy Optimization (GRPO) with self-reflection traces that supply rich intermediate feedback, accelerating convergence and boosting robustness. Experiments on SimuBench, our newly released benchmark comprising 5300 multi-domain modeling tasks, show that a Qwen2.5-7B model fine-tuned with SimuAgent converges faster and achieves higher modeling accuracy than standard RL baselines, and even surpasses GPT-4o when evaluated with few-shot prompting on the same benchmark. Ablations confirm that the two-stage curriculum and abstract-reconstruct data augmentation further enhance generalization. SimuAgent trains and runs entirely on-premise with modest hardware, delivering a privacy-preserving, cost-effective solution for industrial model-driven engineering. SimuAgent bridges the gap between LLMs and graphical modeling environments, offering a practical solution for AI-assisted engineering design in industrial settings.

RelayLLM Efficient Reasoning via Collaborative Decoding

Authors: Chengsong Huang, Tong Zheng, Langlin Huang, Jinyuan Li, Haolin Liu, Jiaxin Huang

2026-01-08

http://arxiv.org/abs/2601.05167v1

Using Large Language Models (LLMs) for complex reasoning is often hindered by high computational costs and latency, while resource-efficient Small Language Models (SLMs) typically lack the necessary reasoning capacity. Existing collaborative approaches, such as cascading or routing, operate at a coarse granularity by offloading entire queries to LLMs, resulting in significant computational waste when the SLM is capable of handling the majority of reasoning steps. To address this, we propose RelayLLM, a novel framework for efficient reasoning via token-level collaborative decoding. Unlike routers, RelayLLM empowers the SLM to act as an active controller that dynamically invokes the LLM only for critical tokens via a special command, effectively "relaying" the generation process. We introduce a two-stage training framework, including warm-up and Group Relative Policy Optimization (GRPO), to teach the model to balance independence with strategic help-seeking. Empirical results across six benchmarks demonstrate that RelayLLM achieves an average accuracy of 49.52%, effectively bridging the performance gap between the two models. Notably, this is achieved by invoking the LLM for only 1.07% of the total generated tokens, offering a 98.2% cost reduction compared to performance-matched random routers.

Token-Level LLM Collaboration via FusionRoute

Authors: Nuoya Xiong, Yuhang Zhou, Hanqing Zeng, Zhaorun Chen, Furong Huang, Shuchao Bi, Lizhu Zhang, Zhuokai Zhao

2026-01-08

http://arxiv.org/abs/2601.05106v1

Large language models (LLMs) exhibit strengths across diverse domains. However, achieving strong performance across these domains with a single general-purpose model typically requires scaling to sizes that are prohibitively expensive to train and deploy. On the other hand, while smaller domain-specialized models are much more efficient, they struggle to generalize beyond their training distributions. To address this dilemma, we propose FusionRoute, a robust and effective token-level multi-LLM collaboration framework in which a lightweight router simultaneously (i) selects the most suitable expert at each decoding step and (ii) contributes a complementary logit that refines or corrects the selected expert's next-token distribution via logit addition. Unlike existing token-level collaboration methods that rely solely on fixed expert outputs, we provide a theoretical analysis showing that pure expert-only routing is fundamentally limited: unless strong global coverage assumptions hold, it cannot in general realize the optimal decoding policy. By augmenting expert selection with a trainable complementary generator, FusionRoute expands the effective policy class and enables recovery of optimal value functions under mild conditions. Empirically, across both Llama-3 and Gemma-2 families and diverse benchmarks spanning mathematical reasoning, code generation, and instruction following, FusionRoute outperforms both sequence- and token-level collaboration, model merging, and direct fine-tuning, while remaining competitive with domain experts on their respective tasks.
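The two router outputs named in this abstract, expert selection and a complementary logit combined by logit addition, can be sketched for a single step as follows. This is an illustrative sketch only: the array shapes, the argmax selection rule, and all function names are assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max())
    return e / e.sum()

def fused_next_token_distribution(expert_logits, router_scores,
                                  complementary_logits):
    """expert_logits: (n_experts, vocab) next-token logits per expert.
    router_scores: (n_experts,) router preference over experts (assumed).
    complementary_logits: (vocab,) correction produced by the router."""
    # (i) Select the expert the router prefers at this step.
    chosen = int(np.argmax(router_scores))
    # (ii) Refine the chosen expert's logits by simple logit addition.
    fused = expert_logits[chosen] + complementary_logits
    return chosen, softmax(fused)
```

With a zero complementary logit this reduces to pure expert-only routing; a non-zero correction lets the fused distribution differ from what any single expert would produce.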

Semantically Orthogonal Framework for Citation Classification Disentangling Intent and Content

Authors: Changxu Duan, Zhiyin Tan

2026-01-08

http://arxiv.org/abs/2601.05103v1

Understanding the role of citations is essential for research assessment and citation-aware digital libraries. However, existing citation classification frameworks often conflate citation intent (why a work is cited) with cited content type (what part is cited), limiting their effectiveness in automatic classification due to a dilemma between fine-grained type distinctions and practical classification reliability. We introduce SOFT, a Semantically Orthogonal Framework with Two dimensions that explicitly separates citation intent from cited content type, drawing inspiration from semantic role theory. We systematically re-annotate the ACL-ARC dataset using SOFT and release a cross-disciplinary test set sampled from ACT2. Evaluation with both zero-shot and fine-tuned Large Language Models (LLMs) demonstrates that SOFT enables higher agreement between human annotators and LLMs, and supports stronger classification performance and robust cross-domain generalization compared to ACL-ARC and SciCite annotation frameworks. These results confirm SOFT's value as a clear, reusable annotation standard, improving clarity, consistency, and generalizability for digital libraries and scholarly infrastructures. All code and data are publicly available on GitHub: https://github.com/zhiyintan/SOFT.

Multi-Disciplinary Dataset Discovery from Citation-Verified Literature Contexts

Authors: Zhiyin Tan, Changxu Duan

2026-01-08

http://arxiv.org/abs/2601.05099v1

Identifying suitable datasets for a research question remains challenging because existing dataset search engines rely heavily on metadata quality and keyword matching, which often fail to capture the semantic intent of scientific investigation. We introduce a literature-driven framework that discovers datasets from citation contexts in scientific papers, enabling retrieval grounded in actual research use rather than metadata availability. Our approach combines large-scale citation-context extraction, schema-guided dataset recognition with Large Language Models, and provenance-preserving entity resolution. We evaluate the system on eight survey-derived computer science queries and find that it achieves substantially higher recall than Google Dataset Search and DataCite Commons, with normalized recall averaging 47.47% and reaching a highest value of 81.82%. Beyond recovering gold-standard datasets, the method also surfaces additional datasets not documented in the surveys. Expert assessments across five top-level Fields of Science indicate that a substantial portion of the additional datasets are considered high utility, and some are regarded as novel for the specific topics chosen by the experts. These findings establish citation-context mining as an effective and generalizable paradigm for dataset discovery, particularly in settings where datasets lack sufficient or reliable metadata. To support reproducibility and future extensions, we release our code, evaluation datasets, and results on GitHub (https://github.com/Fireblossom/citation-context-dataset-discovery).

Driving on Registers

Authors: Ellington Kirby, Alexandre Boulch, Yihong Xu, Yuan Yin, Gilles Puy, Éloi Zablocki, Andrei Bursuc, Spyros Gidaris, Renaud Marlet, Florent Bartoccioni, Anh-Quan Cao, Nermin Samet, Tuan-Hung VU, Matthieu Cord

2026-01-08

http://arxiv.org/abs/2601.05083v1

We present DrivoR, a simple and efficient transformer-based architecture for end-to-end autonomous driving. Our approach builds on pretrained Vision Transformers (ViTs) and introduces camera-aware register tokens that compress multi-camera features into a compact scene representation, significantly reducing downstream computation without sacrificing accuracy. These tokens drive two lightweight decoders that generate and then score candidate trajectories. The scoring decoder learns to mimic an oracle and predicts interpretable sub-scores representing aspects such as safety, comfort, and efficiency, enabling behavior-conditioned driving at inference. Despite its minimal design, DrivoR outperforms or matches strong contemporary baselines across NAVSIM-v1, NAVSIM-v2, and the photorealistic closed-loop HUGSIM benchmark. Our results show that a pure-transformer architecture, combined with targeted token compression, is sufficient for accurate, efficient, and adaptive end-to-end driving. Code and checkpoints will be made available via the project page.

Milestones over Outcome Unlocking Geometric Reasoning with Sub-Goal Verifiable Reward

Authors: Jianlong Chen, Daocheng Fu, Shengze Xu, Jiawei Chen, Yuan Feng, Yue Yang, Junchi Yan, Hongyuan Zha, Renqiu Xia

2026-01-08

http://arxiv.org/abs/2601.05073v1

Multimodal Large Language Models (MLLMs) struggle with complex geometric reasoning, largely because "black box" outcome-based supervision fails to distinguish between lucky guesses and rigorous deduction. To address this, we introduce a paradigm shift towards subgoal-level evaluation and learning. We first construct GeoGoal, a benchmark synthesized via a rigorous formal verification data engine, which converts abstract proofs into verifiable numeric subgoals. This structure reveals a critical divergence between reasoning quality and outcome accuracy. Leveraging this, we propose the Sub-Goal Verifiable Reward (SGVR) framework, which replaces sparse signals with dense rewards based on the Skeleton Rate. Experiments demonstrate that SGVR not only enhances geometric performance (+9.7%) but also generalizes strongly, transferring gains to general math (+8.0%) and other general reasoning tasks (+2.8%).

Challenges and Research Directions for Large Language Model Inference Hardware

Authors: Xiaoyu Ma, David Patterson

2026-01-08

http://arxiv.org/abs/2601.05047v1

Large Language Model (LLM) inference is hard. The autoregressive Decode phase of the underlying Transformer model makes LLM inference fundamentally different from training. Exacerbated by recent AI trends, the primary challenges are memory and interconnect rather than compute. To address these challenges, we highlight four architecture research opportunities: High Bandwidth Flash for 10X memory capacity with HBM-like bandwidth; Processing-Near-Memory and 3D memory-logic stacking for high memory bandwidth; and low-latency interconnect to speed up communication. While our focus is datacenter AI, we also review their applicability for mobile devices.

ArcAligner Adaptive Recursive Aligner for Compressed Context Embeddings in RAG

Authors: Jianbo Li, Yi Jiang, Sendong Zhao, Bairui Hu, Haochun Wang, Bing Qin

2026-01-08

http://arxiv.org/abs/2601.05038v1

Retrieval-Augmented Generation (RAG) helps LLMs stay accurate, but feeding long documents into a prompt makes the model slow and expensive. This has motivated context compression, ranging from token pruning and summarization to embedding-based compression. While researchers have tried "compressing" these documents into smaller summaries or mathematical embeddings, there is a catch: the more you compress the data, the more the LLM struggles to understand it. To address this challenge, we propose ArcAligner (Adaptive recursive context Aligner), a lightweight module integrated into the language model layers to help the model better utilize highly compressed context representations for downstream generation. It uses an adaptive "gating" system that only adds extra processing power when the information is complex, keeping the system fast. Across knowledge-intensive QA benchmarks, ArcAligner consistently beats baselines at comparable compression rates, especially on multi-hop and long-tail settings. The source code is publicly available.

SparseLaneSTP Leveraging Spatio-Temporal Priors with Sparse Transformers for 3D Lane Detection

Authors: Maximilian Pittner, Joel Janai, Mario Faigle, Alexandru Paul Condurache

2026-01-08

http://arxiv.org/abs/2601.04968v1

3D lane detection has emerged as a critical challenge in autonomous driving, encompassing identification and localization of lane markings and the 3D road surface. Conventional 3D methods detect lanes from dense bird's-eye-view (BEV) features, though erroneous transformations often result in a poor feature representation misaligned with the true 3D road surface. While recent sparse lane detectors have surpassed dense BEV approaches, they completely disregard valuable lane-specific priors. Furthermore, existing methods fail to utilize historic lane observations, which have the potential to resolve ambiguities in situations of poor visibility. To address these challenges, we present SparseLaneSTP, a novel method that integrates both geometric properties of the lane structure and temporal information into a sparse lane transformer. It introduces a new lane-specific spatio-temporal attention mechanism, a continuous lane representation tailored for sparse architectures, as well as temporal regularization. Identifying weaknesses of existing 3D lane datasets, we also introduce a precise and consistent 3D lane dataset using a simple yet effective auto-labeling strategy. Our experimental section proves the benefits of our contributions and demonstrates state-of-the-art performance across all detection and error metrics on existing 3D lane detection benchmarks as well as on our novel dataset.

Asynchronous Secure Federated Learning with Byzantine aggregators

Authors: Antonella Del Pozzo, Achille Desreumaux, Mathieu Gestin, Alexandre Rapetti, Sara Tucci-Piergiovanni

2026-01-08

http://arxiv.org/abs/2601.04930v1

Privacy-preserving federated averaging is a central approach for protecting client privacy in federated learning. In this paper, we study this problem in an asynchronous communications setting with malicious aggregators. We propose a new solution to provide federated averaging in this model while protecting the client's data privacy through secure aggregation and differential privacy. Our solution maintains the same performance as the state of the art across all metrics. The main contributions of this paper are threefold. First, unlike existing single- or multi-server solutions, we consider malicious aggregation servers that may manipulate the model to leak clients' data or halt computation. To tolerate this threat, we replicate the aggregators, allowing a fraction of them to be corrupted. Second, we propose a new privacy-preservation protocol for asynchronous models with Byzantine aggregators. In this protocol, clients mask their values and add Gaussian noise to their models. In contrast with previous works, we use the replicated servers to unmask the models, while ensuring the liveness of training even if aggregators misbehave. Third, the asynchronous model introduces new challenges not present in existing approaches. In such a setting, faster clients may contribute more frequently, potentially reducing their privacy and biasing the training. To address this, we introduce an inclusion mechanism that ensures uniform client participation and balanced privacy budgets. Interestingly, the solution presented in this paper does not rely on agreement between aggregators. Thus, we circumvent the known impossibility of consensus in asynchronous settings where processes might crash. Additionally, this feature increases availability, as a consensus-based algorithm only progresses in periods of low latency.
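The idea that clients mask their values and add Gaussian noise can be illustrated with generic pairwise additive masking, a classic secure-aggregation technique. This is not the paper's protocol: it omits the replicated Byzantine aggregators, asynchrony, and server-side unmasking, uses real-valued masks instead of finite-field arithmetic, and draws all masks from one generator for brevity. All names are hypothetical.

```python
import numpy as np

def masked_updates(updates, noise_std, seed=0):
    """Return (noisy, masked) client updates.
    noisy: each update plus Gaussian noise (differential-privacy step).
    masked: noisy updates plus pairwise masks that cancel in the sum."""
    rng = np.random.default_rng(seed)
    n = len(updates)
    dim = updates[0].shape[0]
    noisy = [u + rng.normal(0.0, noise_std, size=dim) for u in updates]
    masked = [v.copy() for v in noisy]
    for i in range(n):
        for j in range(i + 1, n):
            # Mask shared by clients i and j: i adds it, j subtracts it,
            # so it vanishes once all masked updates are summed.
            pair_mask = rng.normal(size=dim)
            masked[i] += pair_mask
            masked[j] -= pair_mask
    return noisy, masked
```

An aggregator that only ever sees `masked` learns the sum of the noisy updates but not any individual client's contribution.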

Distributed Online Convex Optimization with Efficient Communication Improved Algorithm and Lower bounds

Authors: Sifan Yang, Wenhao Yang, Wei Jiang, Lijun Zhang

2026-01-08

http://arxiv.org/abs/2601.04907v1

We investigate distributed online convex optimization with compressed communication, where $n$ learners connected by a network collaboratively minimize a sequence of global loss functions using only local information and compressed data from neighbors. Prior work has established regret bounds of $O(\max\{ω^{-2}ρ^{-4}n^{1/2},ω^{-4}ρ^{-8}\}n\sqrt{T})$ and $O(\max\{ω^{-2}ρ^{-4}n^{1/2},ω^{-4}ρ^{-8}\}n\ln{T})$ for convex and strongly convex functions, respectively, where $ω\in(0,1]$ is the compression quality factor ($ω=1$ means no compression) and $ρ<1$ is the spectral gap of the communication matrix. However, these regret bounds suffer from a quadratic or even quartic dependence on $ω^{-1}$. Moreover, the super-linear dependence on $n$ is also undesirable. To overcome these limitations, we propose a novel algorithm that achieves improved regret bounds of $\tilde{O}(ω^{-1/2}ρ^{-1}n\sqrt{T})$ and $\tilde{O}(ω^{-1}ρ^{-2}n\ln{T})$ for convex and strongly convex functions, respectively. The primary idea is to design a two-level blocking update framework incorporating two novel ingredients: an online gossip strategy and an error compensation scheme, which collaborate to achieve a better consensus among learners. Furthermore, we establish the first lower bounds for this problem, justifying the optimality of our results with respect to both $ω$ and $T$. Additionally, we consider the bandit feedback scenario, and extend our method with the classic gradient estimators to enhance existing regret bounds.

SCALER Synthetic Scalable Adaptive Learning Environment for Reasoning

Authors: Caijun Xu, Changyi Xiao, Zhongyuan Peng, Xinrun Wang, Yixin Cao

2026-01-08

http://arxiv.org/abs/2601.04809v1

Reinforcement learning (RL) offers a principled way to enhance the reasoning capabilities of large language models, yet its effectiveness hinges on training signals that remain informative as models evolve. In practice, RL progress often slows when task difficulty becomes poorly aligned with model capability, or when training is dominated by a narrow set of recurring problem patterns. To jointly address these issues, we propose SCALER (Synthetic sCalable Adaptive Learning Environment for Reasoning), a framework that sustains effective learning signals through adaptive environment design. SCALER introduces a scalable synthesis pipeline that converts real-world programming problems into verifiable reasoning environments with controllable difficulty and unbounded instance generation, enabling RL training beyond finite datasets while preserving strong correctness guarantees. Building on this, SCALER further employs an adaptive multi-environment RL strategy that dynamically adjusts instance difficulty and curates the active set of environments to track the model's capability frontier and maintain distributional diversity. This co-adaptation prevents reward sparsity, mitigates overfitting to narrow task patterns, and supports sustained improvement throughout training. Extensive experiments show that SCALER consistently outperforms dataset-based RL baselines across diverse reasoning benchmarks and exhibits more stable, long-horizon training dynamics.

AgentOCR Reimagining Agent History via Optical Self-Compression

Authors: Lang Feng, Fuchao Yang, Feng Chen, Xin Cheng, Haiyang Xu, Zhenglin Wan, Ming Yan, Bo An

2026-01-08

http://arxiv.org/abs/2601.04786v1

Recent advances in large language models (LLMs) enable agentic systems trained with reinforcement learning (RL) over multi-turn interaction trajectories, but practical deployment is bottlenecked by rapidly growing textual histories that inflate token budgets and memory usage. We introduce AgentOCR, a framework that exploits the superior information density of visual tokens by representing the accumulated observation-action history as a compact rendered image. To make multi-turn rollouts scalable, AgentOCR proposes segment optical caching. By decomposing history into hashable segments and maintaining a visual cache, this mechanism eliminates redundant re-rendering. Beyond fixed rendering, AgentOCR introduces agentic self-compression, where the agent actively emits a compression rate and is trained with a compression-aware reward to adaptively balance task success and token efficiency. We conduct extensive experiments on challenging agentic benchmarks, ALFWorld and search-based QA. Remarkably, results demonstrate that AgentOCR preserves over 95% of text-based agent performance while substantially reducing token consumption (>50%), yielding consistent token and memory efficiency. Our further analysis validates a 20x rendering speedup from segment optical caching and the effective strategic balancing of self-compression.
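Segment optical caching, as described in this abstract, decomposes the history into hashable segments and reuses earlier renders. A minimal sketch of that caching pattern follows; the class name, the SHA-256 keying, and the pluggable `render_fn` (a stand-in for an actual text-to-image renderer) are all assumptions made for illustration.

```python
import hashlib

class SegmentRenderCache:
    """Caches the rendered form of each history segment by content hash."""

    def __init__(self, render_fn):
        self._render_fn = render_fn  # hypothetical: segment text -> rendered tile
        self._cache = {}
        self.render_calls = 0        # counts actual (non-cached) renders

    def render_history(self, segments):
        rendered = []
        for seg in segments:
            key = hashlib.sha256(seg.encode("utf-8")).hexdigest()
            if key not in self._cache:
                # Only segments not seen before are rendered.
                self._cache[key] = self._render_fn(seg)
                self.render_calls += 1
            rendered.append(self._cache[key])
        return rendered
```

Because a multi-turn history mostly grows by appending, each new turn triggers a render only for the newly added segments.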

AT$^2$PO Agentic Turn-based Policy Optimization via Tree Search

Authors: Zefang Zong, Dingwei Chen, Yang Li, Qi Yi, Bo Zhou, Chengming Li, Bo Qian, Peng Chen, Jie Jiang

2026-01-08

http://arxiv.org/abs/2601.04767v1

LLM agents have emerged as powerful systems for tackling multi-turn tasks by interleaving internal reasoning and external tool interactions. Agentic Reinforcement Learning has recently drawn significant research attention as a critical post-training paradigm to further refine these capabilities. In this paper, we present AT$^2$PO (Agentic Turn-based Policy Optimization via Tree Search), a unified framework for multi-turn agentic RL that addresses three core challenges: limited exploration diversity, sparse credit assignment, and misaligned policy optimization. AT$^2$PO introduces a turn-level tree structure that jointly enables Entropy-Guided Tree Expansion for strategic exploration and Turn-wise Credit Assignment for fine-grained reward propagation from sparse outcomes. Complementing this, we propose Agentic Turn-based Policy Optimization, a turn-level learning objective that aligns policy updates with the natural decision granularity of agentic interactions. ATPO is orthogonal to tree search and can be readily integrated into any multi-turn RL pipeline. Experiments across seven benchmarks demonstrate consistent improvements over the state-of-the-art baseline by up to 1.84 percentage points on average, with ablation studies validating the effectiveness of each component. Our code is available at https://github.com/zzfoutofspace/ATPO.

When Single-Agent with Skills Replace Multi-Agent Systems and When They Fail

Authors: Xiaoxiao Li

2026-01-08

http://arxiv.org/abs/2601.04748v1

Multi-agent AI systems have proven effective for complex reasoning. These systems are composed of specialized agents, which collaborate through explicit communication, but incur substantial computational overhead. A natural question arises: can we achieve similar modularity benefits with a single agent that selects from a library of skills? We explore this question by viewing skills as internalized agent behaviors. From this perspective, a multi-agent system can be compiled into an equivalent single-agent system, trading inter-agent communication for skill selection. Our preliminary experiments suggest this approach can substantially reduce token usage and latency while maintaining competitive accuracy on reasoning benchmarks. However, this efficiency raises a deeper question that has received little attention: how does skill selection scale as libraries grow? Drawing on principles from cognitive science, we propose that LLM skill selection exhibits bounded capacity analogous to human decision-making. We investigate the scaling behavior of skill selection and observe a striking pattern. Rather than degrading gradually, selection accuracy remains stable up to a critical library size, then drops sharply, indicating a phase transition reminiscent of capacity limits in human cognition. Furthermore, we find evidence that semantic confusability among similar skills, rather than library size alone, plays a central role in this degradation. This perspective suggests that hierarchical organization, which has long helped humans manage complex choices, may similarly benefit AI systems. Our initial results with hierarchical routing support this hypothesis. This work opens new questions about the fundamental limits of semantic-based skill selection in LLMs and offers a cognitive-grounded framework and practical guidelines for designing scalable skill-based agents.

GPU-Accelerated INT8 Quantization for KV Cache Compression in Large Language Models

Authors: Maanas Taneja, Purab Shingvi

2026-01-08

http://arxiv.org/abs/2601.04719v1

The key-value (KV) cache in large language models presents a significant memory bottleneck during inference, growing linearly with sequence length and often exceeding the memory footprint of model weights themselves. We implement and evaluate GPU-accelerated INT8 quantization for KV cache compression, achieving 4 $\times$ memory reduction with minimal accuracy degradation. We develop four CUDA kernel variants -- naive, tiled, coarsened, and vectorized -- and benchmark them across realistic workload sizes up to 1 billion elements. Our vectorized kernel achieves up to 1,694 $\times$ speedup over CPU baselines while maintaining reconstruction error below 0.004 and attention score error below 0.1 even for 8K-dimensional heads. These results demonstrate that INT8 quantization provides a practical approach for reducing memory pressure in LLM inference with negligible computational overhead (6--58ms) and minimal impact on downstream model behavior.
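
To make the INT8 idea concrete, here is a minimal CPU-side sketch of symmetric per-tensor quantization in plain Python. The abstract does not say which scheme the CUDA kernels use, so the symmetric mapping to [-127, 127], the single per-tensor scale, and the names `quantize_int8` and `dequantize_int8` are all assumptions made for illustration. Storing one byte per element instead of a 32-bit float is what would give a 4x reduction.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # Assumed scheme: one scale for the whole tensor (not taken from the paper).
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from the INT8 codes."""
    return [c * scale for c in codes]


# Hypothetical slice of a key or value vector.
keys = [0.12, -0.5, 0.33, 0.9, -0.77]
codes, scale = quantize_int8(keys)
restored = dequantize_int8(codes, scale)

# Rounding to the nearest code bounds the per-element error by half a step.
max_error = max(abs(a - b) for a, b in zip(keys, restored))
```

In this scheme the worst-case reconstruction error is half the quantization step, `scale / 2`, so tensors with a small dynamic range are reconstructed more precisely.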

Beyond Monolithic Architectures A Multi-Agent Search and Knowledge Optimization Framework for Agentic Search

Authors: Yiqun Chen, Lingyong Yan, Zixuan Yang, Erhan Zhang, Jiashu Zhao, Shuaiqiang Wang, Dawei Yin, Jiaxin Mao

2026-01-08

http://arxiv.org/abs/2601.04703v1

Agentic search has emerged as a promising paradigm for complex information seeking by enabling Large Language Models (LLMs) to interleave reasoning with tool use. However, prevailing systems rely on monolithic agents that suffer from structural bottlenecks, including unconstrained reasoning outputs that inflate trajectories, outcome-level rewards that complicate credit assignment, and stochastic search noise that destabilizes learning. To address these challenges, we propose M-ASK (Multi-Agent Search and Knowledge), a framework that explicitly decouples agentic search into two complementary roles: Search Behavior Agents, which plan and execute search actions, and Knowledge Management Agents, which aggregate, filter, and maintain a compact internal context. This decomposition allows each agent to focus on a well-defined subtask and reduces interference between search and context construction. Furthermore, to enable stable coordination, M-ASK employs turn-level rewards to provide granular supervision for both search decisions and knowledge updates. Experiments on multi-hop QA benchmarks demonstrate that M-ASK outperforms strong baselines, achieving not only superior answer accuracy but also significantly more stable training dynamics. The source code for M-ASK is available at https://github.com/chenyiqun/M-ASK.

ResMAS Resilience Optimization in LLM-based Multi-agent Systems

Authors: Zhilun Zhou, Zihan Liu, Jiahe Liu, Qingyu Shao, Yihan Wang, Kun Shao, Depeng Jin, Fengli Xu

2026-01-08

http://arxiv.org/abs/2601.04694v1

Large Language Model-based Multi-Agent Systems (LLM-based MAS), where multiple agents collaborate to solve complex tasks, have shown impressive performance in many areas. However, MAS are typically distributed across different devices or environments, making them vulnerable to perturbations such as agent failures. While existing works have studied adversarial attacks and corresponding defense strategies, they mainly focus on reactively detecting and mitigating attacks after they occur rather than proactively designing inherently resilient systems. In this work, we study the resilience of LLM-based MAS under perturbations and find that both the topology and prompt design significantly influence system resilience. Motivated by these findings, we propose ResMAS: a two-stage framework for enhancing MAS resilience. First, we train a reward model to predict the MAS's resilience, based on which we train a topology generator to automatically design resilient topology for specific tasks through reinforcement learning. Second, we introduce a topology-aware prompt optimization method that refines each agent's prompt based on its connections and interactions with other agents. Extensive experiments across a range of tasks show that our approach substantially improves MAS resilience under various constraints. Moreover, our framework demonstrates strong generalization ability to new tasks and models, highlighting its potential for building resilient MASs.

LAMB LLM-based Audio Captioning with Modality Gap Bridging via Cauchy-Schwarz Divergence

Authors: Hyeongkeun Lee, Jongmin Choi, KiHyun Nam, Joon Son Chung

2026-01-08

http://arxiv.org/abs/2601.04658v1

Automated Audio Captioning aims to describe the semantic content of input audio. Recent works have employed large language models (LLMs) as a text decoder to leverage their reasoning capabilities. However, prior approaches that project audio features into the embedding space without considering cross-modal alignment fail to fully utilize these capabilities. To address this, we propose LAMB, an LLM-based audio captioning framework that bridges the modality gap between audio embeddings and the text embedding space. LAMB incorporates a Cross-Modal Aligner that minimizes Cauchy-Schwarz divergence while maximizing mutual information, yielding tighter alignment between audio and text at both global and token levels. We further design a Two-Stream Adapter that extracts semantically enriched audio embeddings, thereby delivering richer information to the Cross-Modal Aligner. Finally, leveraging the aligned audio embeddings, a proposed Token Guide directly computes scores within the text embedding space to steer the output logits of generated captions. Experimental results confirm that our framework strengthens the reasoning capabilities of the decoder, achieving state-of-the-art performance on AudioCaps.
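
For readers unfamiliar with the Cauchy-Schwarz divergence, the sketch below computes one common form of it for two discrete distributions, D(p, q) = -log( sum(p*q) / sqrt(sum(p^2) * sum(q^2)) ). How LAMB estimates the divergence between audio and text embeddings is not described in the abstract; the discrete setting and the function name `cauchy_schwarz_divergence` are assumptions for illustration only. Some references square the ratio inside the logarithm, which doubles the value.

```python
import math


def cauchy_schwarz_divergence(p, q):
    """Cauchy-Schwarz divergence between two discrete distributions."""
    cross = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if cross == 0:
        # Disjoint supports: the divergence is unbounded.
        return math.inf
    return -math.log(cross / (norm_p * norm_q))


# Hypothetical distributions over three bins.
p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

same = cauchy_schwarz_divergence(p, p)       # approximately zero
different = cauchy_schwarz_divergence(p, q)  # strictly positive
```

By the Cauchy-Schwarz inequality the ratio inside the logarithm never exceeds 1, so the divergence is non-negative and vanishes only when the two distributions coincide. It is also symmetric in its arguments.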

LLMs-Integrated Automatic Hate Speech Recognition Using Controllable Text Generation Models

Authors: Ryutaro Oshima, Yuya Hosoda, Youji Iiguni

2026-01-08

http://arxiv.org/abs/2601.04654v1

This paper proposes an automatic speech recognition (ASR) model for hate speech using large language models (LLMs). The proposed method integrates the encoder of the ASR model with the decoder of the LLMs, enabling simultaneous transcription and censorship tasks to prevent the exposure of harmful content. Instruction tuning of the LLM to mask hate-related words with specific tokens requires an annotated hate speech dataset, which is limited. We generate text samples using an LLM with the Chain-of-Thought (CoT) prompting technique guided by cultural context and examples and then convert them into speech samples using a text-to-speech (TTS) system. However, some of them contain non-hate speech samples with hate-related words, which degrades the censorship performance. This paper keeps only the samples that text classification models correctly label as hate content. By adjusting the threshold for the number of correct answer models, we can control the level of hate in the generated dataset, allowing us to train the LLMs through curriculum learning in a gradual manner. Experimental results show that the proposed method achieves a masking accuracy of 58.6% for hate-related words, surpassing previous baselines. We also confirm that the curriculum training contributes to the efficiency of both transcription and censorship tasks.

Succeeding at Scale Automated Multi-Retriever Fusion and Query-Side Adaptation for Multi-Tenant Search

Authors: Prateek Jain, Shabari S Nair, Ritesh Goru, Prakhar Agarwal, Ajay Yadav, Yoga Sri Varshan Varadharajan, Constantine Caramanis

2026-01-08

http://arxiv.org/abs/2601.04646v1

Large-scale multi-tenant retrieval systems amass vast user query logs yet critically lack the curated relevance labels required for effective domain adaptation. This "dark data" problem is exacerbated by the operational cost of model updates: jointly fine-tuning query and document encoders requires re-indexing the entire corpus, which is prohibitive in multi-tenant environments with thousands of isolated indices. To address these dual challenges, we introduce DevRev Search, a passage retrieval benchmark for technical customer support constructed through a fully automatic pipeline. We employ a fusion-based candidate generation strategy, pooling results from diverse sparse and dense retrievers, and utilize an LLM-as-a-Judge to perform rigorous consistency filtering and relevance assignment. We further propose a practical Index-Preserving Adaptation strategy: by fine-tuning only the query encoder via Low-Rank Adaptation (LoRA), we achieve competitive performance improvements while keeping the document index frozen. Our experiments on DevRev Search and SciFact demonstrate that targeting specific layers in the query encoder yields optimal quality-efficiency trade-offs, offering a scalable path for personalized enterprise search.

All Changes May Have Invariant Principles Improving Ever-Shifting Harmful Meme Detection via Design Concept Reproduction

Authors: Ziyou Jiang, Mingyang Li, Junjie Wang, Yuekai Huang, Jie Huang, Zhiyuan Chang, Zhaoyang Li, Qing Wang

2026-01-08

http://arxiv.org/abs/2601.04567v1

Harmful memes in Internet communities are ever-shifting and difficult to analyze due to their type-shifting and temporal-evolving nature. Although these memes are shifting, we find that different memes may share invariant principles, i.e., the underlying design concept of malicious users, which can help us analyze why these memes are harmful. In this paper, we propose RepMD, an ever-shifting harmful meme detection method based on design concept reproduction. We first refer to the attack tree to define the Design Concept Graph (DCG), which describes steps that people may take to design a harmful meme. Then, we derive the DCG from historical memes with design step reproduction and graph . Finally, we use the DCG to guide the Multimodal Large Language Model (MLLM) to detect harmful memes. The evaluation results show that RepMD achieves the highest accuracy with 81.1% and has slight accuracy decreases when generalized to type-shifting and temporal-evolving memes. Human evaluation shows that RepMD can improve the efficiency of human discovery on harmful memes, with 15 $\sim$ 30 seconds per meme.

Reasoning Over Space Enabling Geographic Reasoning for LLM-Based Generative Next POI Recommendation

Authors: Dongyi Lv, Qiuyu Ding, Heng-Da Xu, Zhaoxu Sun, Zhi Wang, Feng Xiong, Mu Xu

2026-01-08

http://arxiv.org/abs/2601.04562v1

Generative recommendation with large language models (LLMs) reframes prediction as sequence generation, yet existing LLM-based recommenders remain limited in leveraging geographic signals that are crucial in mobility and local-services scenarios. Here, we present Reasoning Over Space (ROS), a framework that utilizes geography as a vital decision variable within the reasoning process. ROS introduces a Hierarchical Spatial Semantic ID (SID) that discretizes coarse-to-fine locality and POI semantics into compositional tokens, and endows LLMs with a three-stage Mobility Chain-of-Thought (CoT) paradigm that models user personality, constructs an intent-aligned candidate space, and performs locality-informed . We further align the model with real-world geography via spatial-guided Reinforcement Learning (RL). Experiments on three widely used location-based social network (LBSN) datasets show that ROS achieves over 10% relative gains in hit rate over the strongest LLM-based baselines and improves cross-city transfer, despite using a smaller backbone model.

Discrete Fourier Transform-based Point Cloud Compression for Efficient SLAM in Featureless Terrain

Authors: Riku Suzuki, Ayumi Umemura, Shreya Santra, Kentaro Uno, Kazuya Yoshida

2026-01-08

http://arxiv.org/abs/2601.04551v1

Simultaneous Localization and Mapping (SLAM) is an essential technology for the efficiency and reliability of unmanned robotic exploration missions. While the onboard computational capability and bandwidth are critically limited, the point cloud data handled by SLAM is large in size, attracting attention to data compression methods. To address such a problem, in this paper, we propose a new method for compressing point cloud maps by exploiting the Discrete Fourier Transform (DFT). The proposed technique converts the Digital Elevation Model (DEM) to the frequency-domain 2D image and omits its high-frequency components, focusing on the exploration of gradual terrains such as planets and deserts. Unlike terrains with detailed structures such as artificial environments, high-frequency components contribute little to the representation of gradual terrains. Thus, this method is effective in compressing data size without significant degradation of the point cloud. We evaluated the method in terms of compression rate and accuracy using camera sequences of two terrains with different elevation profiles.
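
The idea of dropping high-frequency components can be illustrated with a small sketch. The paper works on a 2D DEM; to keep the example short, the code below uses a 1D elevation profile and a naive O(n^2) DFT from the standard library, both of which are simplifying assumptions. The profile values and the helper names `dft`, `inverse_dft` and `low_pass` are hypothetical.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse transform; returns the real part of the reconstruction."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def low_pass(spectrum, keep):
    """Zero every coefficient whose frequency index exceeds `keep`."""
    n = len(spectrum)
    # min(k, n - k) treats index k and its conjugate partner n - k alike.
    return [c if min(k, n - k) <= keep else 0j for k, c in enumerate(spectrum)]


# Hypothetical gradual terrain: an offset plus two low-frequency undulations.
n = 64
profile = [
    2.0 + math.sin(2 * math.pi * t / n) + 0.5 * math.cos(2 * math.pi * 3 * t / n)
    for t in range(n)
]

compressed = low_pass(dft(profile), keep=4)
restored = inverse_dft(compressed)
max_error = max(abs(a - b) for a, b in zip(profile, restored))
```

With `keep=4`, only 9 of the 64 coefficients are retained, and because this synthetic profile contains no higher frequencies the reconstruction error is at the level of floating-point round-off. A terrain with sharp structures would lose detail under the same cut-off.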

Self-MedRAG a Self-Reflective Hybrid Retrieval-Augmented Generation Framework for Reliable Medical Question Answering

Authors: Jessica Ryan, Alexander I. Gumilang, Robert Wiliam, Derwin Suhartono

2026-01-08

http://arxiv.org/abs/2601.04531v1

Large Language Models (LLMs) have demonstrated significant potential in medical Question Answering (QA), yet they remain prone to hallucinations and ungrounded reasoning, limiting their reliability in high-stakes clinical scenarios. While Retrieval-Augmented Generation (RAG) mitigates these issues by incorporating external knowledge, conventional single-shot retrieval often fails to resolve complex biomedical queries requiring multi-step inference. To address this, we propose Self-MedRAG, a self-reflective hybrid framework designed to mimic the iterative hypothesis-verification process of clinical reasoning. Self-MedRAG integrates a hybrid retrieval strategy, combining sparse (BM25) and dense (Contriever) retrievers via Reciprocal Rank Fusion (RRF) to maximize evidence coverage. It employs a generator to produce answers with supporting rationales, which are then assessed by a lightweight self-reflection module using Natural Language Inference (NLI) or LLM-based verification. If the rationale lacks sufficient evidentiary support, the system autonomously reformulates the query and iterates to refine the context. We evaluated Self-MedRAG on the MedQA and PubMedQA benchmarks. The results demonstrate that our hybrid retrieval approach significantly outperforms single-retriever baselines. Furthermore, the inclusion of the self-reflective loop yielded substantial gains, increasing accuracy on MedQA from 80.00% to 83.33% and on PubMedQA from 69.10% to 79.82%. These findings confirm that integrating hybrid retrieval with iterative, evidence-based self-reflection effectively reduces unsupported claims and enhances the clinical reliability of LLM-based systems.
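
Reciprocal Rank Fusion itself is simple enough to sketch in a few lines: each document receives the sum of 1 / (k + rank) over the rankings in which it appears. The constant k = 60 is a widely used default and is an assumption here, since the abstract does not report the value used; the document IDs and the function name `reciprocal_rank_fusion` are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending fused score; break ties by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a sparse and a dense retriever for one query.
bm25_ranking = ["d3", "d1", "d2"]
dense_ranking = ["d1", "d4", "d3"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
# fused == ["d1", "d3", "d4", "d2"]
```

Documents retrieved by both systems (`d1`, `d3`) rise to the top even though neither retriever's raw scores are compared, which is why RRF is convenient for mixing retrievers whose score scales differ.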

LinguaGame A Linguistically Grounded Game-Theoretic Paradigm for Multi-Agent Dialogue Generation

Authors: Yuxiao Ye, Yiming Zhang, Yiran Ma, Huiyuan Xie, Huining Zhu, Zhiyuan Liu

2026-01-08

http://arxiv.org/abs/2601.04516v1

Large Language Models (LLMs) have enabled Multi-Agent Systems (MASs) where agents interact through natural language to solve complex tasks or simulate multi-party dialogues. Recent work on LLM-based MASs has mainly focused on architecture design, such as role assignment and workflow orchestration. In contrast, this paper targets the interaction process itself, aiming to improve agents' communication efficiency by helping them convey their intended meaning more effectively through language. To this end, we propose LinguaGame, a linguistically-grounded game-theoretic paradigm for multi-agent dialogue generation. Our approach models dialogue as a signalling game over communicative intents and strategies, solved with a training-free equilibrium approximation algorithm for inference-time decision adjustment. Unlike prior game-theoretic MASs, whose game designs are often tightly coupled with task-specific objectives, our framework relies on linguistically informed reasoning with minimal task-specific coupling. Specifically, it treats dialogue as intentional and strategic communication, requiring agents to infer what others aim to achieve (intents) and how they pursue those goals (strategies). We evaluate our framework in simulated courtroom proceedings and debates, with human expert assessments showing significant gains in communication efficiency.

Invisible Walls Privacy-Preserving ISAC Empowered by Reconfigurable Intelligent Surfaces

Authors: Yinghui He, Long Fan, Lei Xie, Dusit Niyato, Chau Yuen, Jun Luo

2026-01-08

http://arxiv.org/abs/2601.04488v1

The environmental and target-related information inherently carried in wireless signals, such as channel state information (CSI), has brought increasing attention to integrated sensing and communication (ISAC). However, it also raises pressing concerns about privacy leakage through eavesdropping. While existing efforts have attempted to mitigate this issue, they either fail to account for the needs of legitimate communication and sensing users or rely on hardware with high complexity and cost. To overcome these limitations, we propose PrivISAC, a plug-and-play, low-cost solution that leverages RIS to protect user privacy while preserving ISAC performance. At the core of PrivISAC is a novel strategy in which each RIS row is assigned two distinct beamforming vectors, from which we deliberately construct a limited set of RIS configurations. During operation, exactly one configuration is randomly activated at each time slot to introduce additional perturbations, effectively masking sensitive sensing information from unauthorized eavesdroppers. To jointly ensure privacy protection and communication performance, we design the two vectors such that their responses remain nearly identical in the communication direction, thereby preserving stable, high-throughput transmission, while exhibiting pronounced differences in the sensing direction, which introduces sufficient perturbations to thwart eavesdroppers. Additionally, to enable legitimate sensing under such randomized configurations, we introduce a time-domain masking and demasking method that allows the authorized receiver to associate each CSI sample with its underlying configuration and eliminate configuration-induced discrepancies, thereby recovering valid CSI. We implement PrivISAC on commodity wireless devices, and experimental results show that PrivISAC provides strong privacy protection while preserving high-quality legitimate ISAC.

Convergence Rates for Learning Pseudo-Differential Operators

Authors: Jiaheng Chen, Daniel Sanz-Alonso

2026-01-08

http://arxiv.org/abs/2601.04473v1

This paper establishes convergence rates for learning elliptic pseudo-differential operators, a fundamental operator class in partial differential equations and mathematical physics. In a wavelet-Galerkin framework, we formulate learning over this class as a structured infinite-dimensional regression problem with multiscale sparsity. Building on this structure, we propose a sparse, data- and computation-efficient estimator, which leverages a novel matrix compression scheme tailored to the learning task and a nested-support strategy to balance approximation and estimation errors. In addition to obtaining convergence rates for the estimator, we show that the learned operator induces an efficient and stable Galerkin solver whose numerical error matches its statistical accuracy. Our results therefore contribute to bringing together operator learning, data-driven solvers, and wavelet methods in scientific computing.

Re-Rankers as Relevance Judges

Authors: Chuan Meng, Jiqun Liu, Mohammad Aliannejadi, Fengran Mo, Jeff Dalton, Maarten de Rijke

2026-01-08

http://arxiv.org/abs/2601.04455v1

Using large language models (LLMs) to predict relevance judgments has shown promising results. Most studies treat this task as a distinct research line, e.g., focusing on prompt design for predicting relevance labels given a query and passage. However, predicting relevance judgments is essentially a form of relevance prediction, a problem extensively studied in tasks such as re-ranking. Despite this potential overlap, little research has explored reusing or adapting established re-ranking methods to predict relevance judgments, leading to potential resource waste and redundant development. To bridge this gap, we reproduce re-rankers in a re-ranker-as-relevance-judge setup. We design two adaptation strategies: (i) using binary tokens (e.g., "true" and "false") generated by a re-ranker as direct judgments, and (ii) converting continuous re-ranking scores into binary labels via thresholding. We perform extensive experiments on TREC-DL 2019 to 2023 with 8 re-rankers from 3 families, ranging from 220M to 32B, and analyse the evaluation bias exhibited by re-ranker-based judges. Results show that re-ranker-based relevance judges, under both strategies, can outperform UMBRELA, a state-of-the-art LLM-based relevance judge, in around 40% to 50% of the cases; they also exhibit strong self-preference towards their own and same-family re-rankers, as well as cross-family bias.
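
Strategy (ii), turning continuous re-ranking scores into binary judgments, can be sketched as follows. The scores, the reference labels, the threshold of 0.5 and the helper names are all hypothetical; the abstract does not say how the threshold is chosen or which agreement measure is reported.

```python
def scores_to_labels(scores, threshold):
    """Label a passage relevant (1) when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def agreement(predicted, reference):
    """Fraction of passages on which two sets of binary labels agree."""
    matches = sum(1 for p, r in zip(predicted, reference) if p == r)
    return matches / len(reference)


# Hypothetical re-ranker scores and human judgments for four passages.
reranker_scores = [0.91, 0.15, 0.62, 0.40]
human_labels = [1, 0, 1, 1]

predicted = scores_to_labels(reranker_scores, threshold=0.5)
accuracy = agreement(predicted, human_labels)
# predicted == [1, 0, 1, 0], accuracy == 0.75
```

In practice the threshold would presumably be tuned on held-out judgments, since re-rankers from different families produce scores on different scales.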

SpectraFormer an Attention-Based Raman Unmixing Tool for Accessing the Graphene Buffer-Layer Signature on SiC

Authors: Dmitriy Poteryayev, Pietro Novelli, Annalisa Coriolano, Riccardo Dettori, Valentina Tozzini, Fabio Beltram, Massimiliano Pontil, Antonio Rossi, Stiven Forti, Camilla Coletti

2026-01-07

http://arxiv.org/abs/2601.04445v1

Raman spectroscopy is a key tool for graphene characterization, yet its application to graphene grown on silicon carbide (SiC) is strongly limited by the intense and variable second-order Raman response of the substrate. This limitation is critical for buffer layer graphene, a semiconducting interfacial phase, whose vibrational signatures are overlapped with the SiC background and challenging to access reliably using conventional reference-based subtraction, due to strong spatial and experimental variability of the substrate signal. Here we present SpectraFormer, a transformer-based deep learning model that reconstructs the SiC Raman substrate contribution directly from post-growth partially masked spectroscopic data without relying on explicit reference measurements. By learning global correlations across the entire Raman shift range, the model captures the statistical structure of the SiC background and enables accurate reconstruction of its contribution in mixed spectra. Subtraction of the reconstructed substrate signal reveals weak vibrational features associated with ZLG that are inaccessible through conventional analysis methods. The extracted spectra are validated by ab initio vibrational calculations, allowing assignment of the resolved features to specific modes and confirming their physical consistency. By leveraging a state-of-the-art attention-based deep learning architecture, this approach establishes a robust, reference-free framework for Raman analysis of graphene on SiC and provides a foundation, compatible with real-time data acquisition, for its integration into automated, closed-loop AI-assisted growth optimization.

Rate or Fate? RLV $^\varepsilon$ R Reinforcement Learning with Verifiable Noisy Rewards

Authors: Ali Rad, Khashayar Filom, Darioush Keivan, Peyman Mohajerin Esfahani, Ehsan Kamalinejad

2026-01-07

http://arxiv.org/abs/2601.04411v1

Reinforcement learning with verifiable rewards (RLVR) is a simple but powerful paradigm for training LLMs: sample a completion, verify it, and update. In practice, however, the verifier is almost never clean--unit tests probe only limited corner cases; human and synthetic labels are imperfect; and LLM judges (e.g., RLAIF) are noisy and can be exploited--and this problem worsens on harder domains (especially coding) where tests are sparse and increasingly model-generated. We ask a pragmatic question: does the verification noise merely slow down the learning (rate), or can it flip the outcome (fate)? To address this, we develop an analytically tractable multi-armed bandit view of RLVR dynamics, instantiated with GRPO and validated in controlled experiments. Modeling false positives and false negatives and grouping completions into recurring reasoning modes yields a replicator-style (natural-selection) flow on the probability simplex. The dynamics decouple into within-correct-mode competition and a one-dimensional evolution for the mass on incorrect modes, whose drift is determined solely by Youden's index J=TPR-FPR. This yields a sharp phase transition: when J>0, the incorrect mass is driven toward extinction (learning); when J=0, the process is neutral; and when J<0, incorrect modes amplify until they dominate (anti-learning and collapse). In the learning regime J>0, noise primarily rescales convergence time ("rate, not fate"). Experiments on verifiable programming tasks under synthetic noise reproduce the predicted J=0 boundary. Beyond noise, the framework offers a general lens for analyzing RLVR stability, convergence, and algorithmic interventions.
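
The role of Youden's index can be made concrete with a small sketch. If a binary verifier accepts a correct completion with probability TPR and an incorrect one with probability FPR, and acceptance yields reward 1, then the expected reward gap between correct and incorrect completions is exactly J = TPR - FPR. The code below only classifies the three regimes named in the abstract; it does not reproduce the paper's dynamics, and the function names are hypothetical.

```python
def youden_index(tpr, fpr):
    """Youden's index J = TPR - FPR for a noisy binary verifier."""
    return tpr - fpr


def predicted_regime(tpr, fpr, tol=1e-12):
    """Map the sign of J to the regime described in the abstract."""
    j = youden_index(tpr, fpr)
    if j > tol:
        return "learning"        # incorrect mass shrinks
    if j < -tol:
        return "anti-learning"   # incorrect modes amplify
    return "neutral"             # no drift either way


# A verifier that accepts 90% of correct and 20% of incorrect completions.
regime = predicted_regime(tpr=0.9, fpr=0.2)
# regime == "learning"
```

A verifier no better than chance (TPR equal to FPR) sits on the J = 0 boundary, and one that accepts incorrect answers more often than correct ones falls on the anti-learning side.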

LLM-Guided Lifecycle-Aware Clustering of Multi-Turn Customer Support Conversations

Authors: Priyaranjan Pattnayak, Sanchari Chowdhuri, Amit Agarwal, Hitesh Laxmichand Patel

2026-01-07

http://arxiv.org/abs/2601.04388v1

Clustering customer chat data is vital for cloud providers handling multi-service queries. Traditional methods struggle with overlapping concerns and create broad, static clusters that degrade over time. Reclustering disrupts continuity, making issue tracking difficult. We propose an adaptive system that segments multi-turn chats into service-specific concerns and incrementally refines clusters as new issues arise. Cluster quality is tracked via the Davies-Bouldin Index (DBI) and Silhouette Scores, with LLM-based splitting applied only to degraded clusters. Our method improves Silhouette Scores by over 100% and reduces DBI by 65.6% compared to baselines, enabling scalable, real-time analytics without full reclustering.

Phasor Agents Oscillatory Graphs with Three-Factor Plasticity and Sleep-Staged Learning

Authors: Rodja Trappe

2026-01-07

http://arxiv.org/abs/2601.04362v1

Phasor Agents are dynamical systems whose internal state is a Phasor Graph: a weighted graph of coupled Stuart-Landau oscillators. A Stuart-Landau oscillator is a minimal stable "rhythm generator" (the normal form near a Hopf bifurcation); each oscillator is treated as an abstract computational unit (inspired by, but not claiming to model, biological oscillatory populations). In this interpretation, oscillator phase tracks relative timing (coherence), while amplitude tracks local gain or activity. Relative phase structure serves as a representational medium; coupling weights are learned via three-factor local plasticity - eligibility traces gated by global modulators and oscillation-timed write windows - without backpropagation. A central challenge in oscillatory substrates is stability: online weight updates can drive the network into unwanted regimes (e.g., global synchrony), collapsing representational diversity. We therefore separate wake tagging from offline consolidation, inspired by synaptic tagging-and-capture and sleep-stage dynamics: deep-sleep-like gated capture commits tagged changes safely, while REM-like replay reconstructs and perturbs experience for planning. A staged experiment suite validates each mechanism with ablations and falsifiers: eligibility traces preserve credit under delayed modulation; compression-progress signals pass timestamp-shuffle controls; phase-coherent retrieval reaches 4x diffusive baselines under noise; wake/sleep separation expands stable learning by 67 percent under matched weight-norm budgets; REM replay improves maze success rate by +45.5 percentage points; and a Tolman-style latent-learning signature - immediate competence and detour advantage after unrewarded exploration, consistent with an internal model - emerges from replay (Tolman, 1948). The codebase and all artifacts are open-source.
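
For intuition about the building block, the sketch below integrates a single uncoupled Stuart-Landau oscillator, dz/dt = (mu + i*omega) z - |z|^2 z, with a forward Euler step. For mu > 0 the amplitude settles on a limit cycle of radius sqrt(mu) while the phase advances at rate omega. The parameter values, the Euler scheme and the function name `simulate_stuart_landau` are assumptions for illustration and are not taken from the paper's codebase.

```python
def simulate_stuart_landau(mu=1.0, omega=1.0, z0=0.1 + 0.0j, dt=0.01, steps=5000):
    """Integrate dz/dt = (mu + i*omega) z - |z|^2 z with forward Euler."""
    z = z0
    for _ in range(steps):
        dz = (mu + 1j * omega) * z - (abs(z) ** 2) * z
        z = z + dt * dz
    return z


final_state = simulate_stuart_landau()

# The amplitude approaches sqrt(mu) = 1, up to a small Euler discretization bias.
amplitude = abs(final_state)
```

Starting from a small perturbation, the state spirals outward and locks onto the limit cycle, which is the sense in which the oscillator is a stable rhythm generator: amplitude is self-regulating while phase remains free to carry timing information.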

PackCache A Training-Free Acceleration Method for Unified Autoregressive Video Generation via Compact KV-Cache

Authors: Kunyang Li, Mubarak Shah, Yuzhang Shang

2026-01-07

http://arxiv.org/abs/2601.04359v1

A unified autoregressive model is a Transformer-based framework that addresses diverse multimodal tasks (e.g., text, image, video) as a single sequence modeling problem under a shared token space. Such models rely on the KV-cache mechanism to reduce attention computation from O(T^2) to O(T); however, KV-cache size grows linearly with the number of generated tokens, and it rapidly becomes the dominant bottleneck limiting inference efficiency and generative length. Unified autoregressive video generation inherits this limitation. Our analysis reveals that KV-cache tokens exhibit distinct spatiotemporal properties: (i) text and conditioning-image tokens act as persistent semantic anchors that consistently receive high attention, and (ii) attention to previous frames naturally decays with temporal distance. Leveraging these observations, we introduce PackCache, a training-free KV-cache management method that dynamically compacts the KV cache through three coordinated mechanisms: condition anchoring that preserves semantic references, cross-frame decay modeling that allocates budget according to temporal distance, and spatially preserving position embedding that maintains coherent 3D structure under removal. In terms of efficiency, PackCache accelerates end-to-end generation by 1.7-2.2x on 48-frame long sequences, showcasing its strong potential for enabling longer-sequence video generation. Notably, on the final four frames (the portion most impacted by the progressively expanding KV-cache and thus the most expensive segment of the clip), PackCache delivers a 2.6x and 3.7x speedup on A40 and H200, respectively, for 48-frame videos.

RIGOURATE Quantifying Scientific Exaggeration with Evidence-Aligned Claim Evaluation

Authors: Joseph James, Chenghao Xiao, Yucheng Li, Nafise Sadat Moosavi, Chenghua Lin

2026-01-07

http://arxiv.org/abs/2601.04350v1

Scientific rigour tends to be sidelined in favour of bold statements, leading authors to overstate claims beyond what their results support. We present RIGOURATE, a two-stage multimodal framework that retrieves supporting evidence from a paper's body and assigns each claim an overstatement score. The framework consists of a dataset of over 10K claim-evidence sets from ICLR and NeurIPS papers, annotated using eight LLMs, with overstatement scores calibrated using peer-review comments and validated through human evaluation. It employs a fine-tuned reranker for evidence retrieval and a fine-tuned model to predict overstatement scores with justification. Compared to strong baselines, RIGOURATE enables improved evidence retrieval and overstatement detection. Overall, our work operationalises evidential proportionality and supports clearer, more transparent scientific communication.

SCAR-GS Spatial Context Attention for Residuals in Progressive Gaussian Splatting

Authors: Diego Revilla, Pooja Suresh, Anand Bhojan, Ooi Wei Tsang

2026-01-07

http://arxiv.org/abs/2601.04348v1

Recent advances in 3D Gaussian Splatting have allowed for real-time, high-fidelity novel view synthesis. Nonetheless, these models have significant storage requirements for large and medium-sized scenes, hindering their deployment over cloud and streaming services. Some of the most recent progressive techniques for these models rely on progressive masking and scalar quantization techniques to reduce the bitrate of Gaussian attributes using spatial context models. While effective, scalar quantization may not optimally capture the correlations of high-dimensional feature vectors, which can potentially limit the rate-distortion performance. In this work, we introduce a novel progressive codec for 3D Gaussian Splatting that replaces traditional methods with a more powerful Residual Vector Quantization approach to compress the primitive features. Our key contribution is an auto-regressive entropy model, guided by a multi-resolution hash grid, that accurately predicts the conditional probability of each successive transmitted index, allowing for coarse and refinement layers to be compressed with high efficiency.

Stable Language Guidance for Vision-Language-Action Models

Authors: Zhihao Zhan, Yuhao Chen, Jiaying Zhou, Qinhan Lv, Hao Liu, Keze Wang, Liang Lin, Guangrun Wang

2026-01-07

http://arxiv.org/abs/2601.04052v1

Vision-Language-Action (VLA) models have demonstrated impressive capabilities in generalized robotic control; however, they remain notoriously brittle to linguistic perturbations. We identify a critical "modality collapse" phenomenon where strong visual priors overwhelm linguistic signals, causing agents to overfit to specific instruction phrasings while ignoring the underlying semantic intent. To address this, we propose Residual Semantic Steering (RSS), a probabilistic framework that disentangles physical affordance from semantic execution. RSS introduces two theoretical innovations: (1) Monte Carlo Syntactic Integration, which approximates the true semantic posterior via dense, LLM-driven distributional expansion, and (2) Residual Affordance Steering, a dual-stream mechanism that explicitly isolates the causal influence of language by subtracting the visual affordance prior. Theoretical analysis suggests that RSS effectively maximizes the mutual information between action and intent while suppressing visual distractors. Empirical results across diverse manipulation benchmarks demonstrate that RSS achieves state-of-the-art robustness, maintaining performance even under adversarial linguistic perturbations.

ResTok Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation

Authors: Xu Zhang, Cheng Da, Huan Yang, Kun Gai, Ming Lu, Zhan Ma

2026-01-07

http://arxiv.org/abs/2601.03955v1

Existing 1D visual tokenizers for autoregressive (AR) generation largely follow the design principles of language modeling, as they are built directly upon transformers whose priors originate in language, yielding single-hierarchy latent tokens and treating visual data as flat sequential token streams. However, this language-like formulation overlooks key properties of vision, particularly the hierarchical and residual network designs that have long been essential for convergence and efficiency in visual models. To bring "vision" back to vision, we propose the Residual Tokenizer (ResTok), a 1D visual tokenizer that builds hierarchical residuals for both image tokens and latent tokens. The hierarchical representations obtained through progressive merging enable cross-level feature fusion at each layer, substantially enhancing representational capacity. Meanwhile, the semantic residuals between hierarchies prevent information overlap, yielding more concentrated latent distributions that are easier for AR modeling. Cross-level bindings consequently emerge without any explicit constraints. To accelerate the generation process, we further introduce a hierarchical AR generator that substantially reduces sampling steps by predicting an entire level of latent tokens at once rather than generating them strictly token-by-token. Extensive experiments demonstrate that restoring hierarchical residual priors in visual tokenization significantly improves AR image generation, achieving a gFID of 2.34 on ImageNet-256 with only 9 sampling steps. Code is available at https://github.com/Kwai-Kolors/ResTok.

FocusUI Efficient UI Grounding via Position-Preserving Visual Token Selection

Authors: Mingyu Ouyang, Kevin Qinghong Lin, Mike Zheng Shou, Hwee Tou Ng

2026-01-07

http://arxiv.org/abs/2601.03928v1

Vision-Language Models (VLMs) have shown remarkable performance in User Interface (UI) grounding tasks, driven by their ability to process increasingly high-resolution screenshots. However, screenshots are tokenized into thousands of visual tokens (e.g., about 4700 for 2K resolution), incurring significant computational overhead and diluting attention. In contrast, humans typically focus on regions of interest when interacting with UI. In this work, we pioneer the task of efficient UI grounding. Guided by practical analysis of the task's characteristics and challenges, we propose FocusUI, an efficient UI grounding framework that selects patches most relevant to the instruction while preserving positional continuity for precise grounding. FocusUI addresses two key challenges: (1) Eliminating redundant tokens in visual encoding. We construct patch-level supervision by fusing an instruction-conditioned score with a rule-based UI-graph score that down-weights large homogeneous regions to select distinct and instruction-relevant visual tokens. (2) Preserving positional continuity during visual token selection. We find that general visual token pruning methods suffer from severe accuracy degradation on UI grounding tasks due to broken positional information. We introduce a novel PosPad strategy, which compresses each contiguous sequence of dropped visual tokens into a single special marker placed at the sequence's last index to preserve positional continuity. Comprehensive experiments on four grounding benchmarks demonstrate that FocusUI surpasses GUI-specific baselines. On the ScreenSpot-Pro benchmark, FocusUI-7B achieves a performance improvement of 3.7% over GUI-Actor-7B. Even with only 30% visual token retention, FocusUI-7B drops by only 3.2% while achieving up to 1.44x faster inference and 17% lower peak GPU memory.
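The PosPad step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the marker string `PAD`, the function name `pospad`, and the representation of tokens as (position, token) pairs are hypothetical, and the sketch works on a flat 1D token list rather than on real image patches.

```python
from typing import List, Sequence, Tuple

PAD = "<pos_pad>"  # hypothetical marker name, not taken from the paper


def pospad(tokens: Sequence[str], keep: Sequence[bool]) -> List[Tuple[int, str]]:
    """Drop tokens where keep[i] is False, preserving original positions.

    Each contiguous run of dropped tokens is replaced by a single PAD marker
    that carries the original index of the last token in that run.
    """
    if len(tokens) != len(keep):
        raise ValueError("tokens and keep must have the same length")
    out: List[Tuple[int, str]] = []
    i, n = 0, len(tokens)
    while i < n:
        if keep[i]:
            out.append((i, tokens[i]))
            i += 1
        else:
            # Extend j to the last index of this dropped run.
            j = i
            while j + 1 < n and not keep[j + 1]:
                j += 1
            out.append((j, PAD))
            i = j + 1
    return out
```

For example, with tokens `a b c d e f g` and only `a`, `d`, and `f` kept, the run `b c` collapses to one marker at index 2, so `d` still sits at position 3 and the position sequence has no unexplained jumps.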

Decide Then Retrieve A Training-Free Framework with Uncertainty-Guided Triggering and Dual-Path Retrieval

Authors: Wang Chen, Guanqiang Qi, Weikang Li, Yang Li, Deguo Xia, Jizhou Huang

2026-01-07

http://arxiv.org/abs/2601.03908v1

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge, but existing approaches indiscriminately trigger retrieval and rely on single-path evidence construction, often introducing noise and limiting performance gains. In this work, we propose Decide Then Retrieve (DTR), a training-free framework that adaptively determines when retrieval is necessary and how external information should be selected. DTR leverages generation uncertainty to guide retrieval triggering and introduces a dual-path retrieval mechanism with adaptive information selection to better handle sparse and ambiguous queries. Extensive experiments across five open-domain QA benchmarks, multiple model scales, and different retrievers demonstrate that DTR consistently improves EM and F1 over standard RAG and strong retrieval-enhanced baselines, while reducing unnecessary retrievals. The code and data used in this paper are available at https://github.com/ChenWangHKU/DTR.
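One simple way to picture uncertainty-guided retrieval triggering is to threshold an uncertainty score computed from a draft answer. The sketch below is an assumption, not DTR's actual rule: it uses the mean negative log-likelihood of the draft's tokens as the uncertainty measure, and both the function names and the default threshold are hypothetical.

```python
from typing import Sequence


def mean_nll(token_logprobs: Sequence[float]) -> float:
    """Mean negative log-likelihood of the generated tokens (higher = less sure)."""
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    return -sum(token_logprobs) / len(token_logprobs)


def should_retrieve(token_logprobs: Sequence[float], threshold: float = 1.0) -> bool:
    """Trigger retrieval only when the draft answer looks uncertain.

    `threshold` is an illustrative value; in practice it would be tuned.
    """
    return mean_nll(token_logprobs) > threshold
```

A confident draft (log-probabilities near zero) skips retrieval, while a draft with low token probabilities triggers it, which is how a scheme like this could reduce unnecessary retrievals.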

A Privacy-Preserving Localization Scheme with Node Selection in Mobile Networks

Authors: Liangbo Xie, Mude Cai, Xiaolong Yang, Mu Zhou, Jiacheng Wang, Dusit Niyato

2026-01-07

http://arxiv.org/abs/2601.04280v1

Localization in mobile networks has been widely applied in many scenarios. However, an entity responsible for location estimation exposes both the target and anchors to potential location leakage at any time, creating serious security risks. Although existing studies have proposed privacy-preserving localization algorithms, they still face challenges of insufficient positioning accuracy and excessive overhead. In this article, we propose a privacy-preserving localization scheme, named PPLZN. PPLZN protects the location privacy of both the target and anchor nodes in crowdsourced localization. Simulation results validate the effectiveness of PPLZN: it achieves accurate position estimation without location leakage and outperforms state-of-the-art approaches in both positioning accuracy and overhead. In addition, PPLZN significantly reduces computational and communication overhead in large-scale deployments, making it well suited for practical privacy-preserving localization in resource-constrained networks.

Feature-Aware One-Shot Federated Learning via Hierarchical Token Sequences

Authors: Shudong Liu, Hanwen Zhang, Xiuling Wang, Yuesheng Zhu, Guibo Luo

2026-01-07

http://arxiv.org/abs/2601.03882v1

One-shot federated learning (OSFL) reduces the cost and privacy risks of iterative federated learning by constructing a global model with a single round of communication. However, most existing methods struggle to achieve robust performance on real-world domains such as medical imaging, or are inefficient when handling non-IID (non-independent and identically distributed) data. To address these limitations, we introduce FALCON, a framework that enhances the effectiveness of OSFL over non-IID image data. The core idea of FALCON is to integrate feature-aware hierarchical token sequence generation and knowledge distillation into OSFL. First, each client leverages a pretrained visual encoder with hierarchical scale encoding to compress images into hierarchical token sequences, which capture multi-scale semantics. Second, a multi-scale autoregressive generator is used to model the distribution of these token sequences and generate synthetic sequences. Third, clients upload the synthetic sequences along with the local classifier trained on the real token sequences to the server. Finally, the server incorporates knowledge distillation into global training to reduce reliance on precise distribution modeling. Experiments on medical and natural image datasets validate the effectiveness of FALCON in diverse non-IID scenarios, outperforming the best OSFL baselines by 9.58% in average accuracy.

Evaluating Small Decoder-Only Language Models for Grammar Correction and Text Simplification

Authors: Anthony Lamelas

2026-01-07

http://arxiv.org/abs/2601.03874v1

Large language models (LLMs) have become extremely popular recently due to their ability to achieve strong performance on a variety of tasks, such as text generation and rewriting, but their size and computation cost make them difficult to access, deploy, and secure in many settings. This paper investigates whether small, decoder-only language models (SLMs) can provide an efficient alternative for the tasks of grammar correction and text simplification. The experiments in this paper focus on testing small language models out of the box, fine-tuned, and run sequentially on the JFLEG and ASSET datasets using established metrics. The results show that while SLMs may learn certain behaviors well, their performance remains below strong baselines and current LLMs. The results also show that SLMs struggle with retaining meaning and with hallucinations. These findings suggest that despite their efficiency advantages, current SLMs are not yet competitive enough with modern LLMs for rewriting, and further advances in training are required for SLMs to close the performance gap between them and today's LLMs.

When Numbers Start Talking Implicit Numerical Coordination Among LLM-Based Agents

Authors: Alessio Buscemi, Daniele Proverbio, Alessandro Di Stefano, The Anh Han, German Castignani, Pietro Liò

2026-01-07

http://arxiv.org/abs/2601.03846v1

LLM-based agents increasingly operate in multi-agent environments where strategic interaction and coordination are required. While existing work has largely focused on individual agents or on interacting agents sharing explicit communication, less is known about how interacting agents coordinate implicitly. In particular, agents may engage in covert communication, relying on indirect or non-linguistic signals embedded in their actions rather than on explicit messages. This paper presents a game-theoretic study of covert communication in LLM-driven multi-agent systems. We analyse interactions across four canonical game-theoretic settings under different communication regimes, including explicit, restricted, and absent communication. Considering heterogeneous agent personalities and both one-shot and repeated games, we characterise when covert signals emerge and how they shape coordination and strategic outcomes.

Where meaning lives Layer-wise accessibility of psycholinguistic features in encoder and decoder language models

Authors: Taisiia Tikhomirova, Dirk U. Wulff

2026-01-07

http://arxiv.org/abs/2601.03798v1

Understanding where language models encode psychologically meaningful aspects of meaning is essential for both theory and practice. We conduct a systematic layer-wise probing study of 58 psycholinguistic features across 10 transformer models, spanning encoder-only and decoder-only architectures, and compare three embedding extraction methods. We find that apparent localization of meaning is strongly method-dependent: contextualized embeddings yield higher feature-specific selectivity and different layer-wise profiles than isolated embeddings. Across models and methods, final-layer representations are rarely optimal for recovering psycholinguistic information with linear probes. Despite these differences, models exhibit a shared depth ordering of meaning dimensions, with lexical properties peaking earlier and experiential and affective dimensions peaking later. Together, these results show that where meaning "lives" in transformer models reflects an interaction between methodological choices and architectural constraints.

Do LLMs Really Memorize Personally Identifiable Information? Revisiting PII Leakage with a Cue-Controlled Memorization Framework

Authors: Xiaoyu Luo, Yiyi Chen, Qiongxiu Li, Johannes Bjerva

2026-01-07

http://arxiv.org/abs/2601.03791v1

Large Language Models (LLMs) have been reported to "leak" Personally Identifiable Information (PII), with successful PII reconstruction often interpreted as evidence of memorization. We propose a principled revision of memorization evaluation for LLMs, arguing that PII leakage should be evaluated under low lexical cue conditions, where target PII cannot be reconstructed through prompt-induced generalization or pattern completion. We formalize Cue-Resistant Memorization (CRM) as a cue-controlled evaluation framework and a necessary condition for valid memorization evaluation, explicitly conditioning on prompt-target cues. Using CRM, we conduct a large-scale multilingual re-evaluation of PII leakage across 32 languages and multiple memorization paradigms. Revisiting reconstruction-based settings, including verbatim prefix-suffix completion and associative reconstruction, we find that their apparent effectiveness is driven primarily by direct surface-form cues rather than by true memorization. When such cues are controlled for, reconstruction success diminishes substantially. We further examine cue-free generation and membership inference, both of which exhibit extremely low true positive rates. Overall, our results suggest that previously reported PII leakage is better explained by cue-driven behavior than by genuine memorization, highlighting the importance of cue-controlled evaluation for reliably quantifying privacy-relevant memorization in LLMs.

Improving Compactness and Reducing Ambiguity of CFIRE Rule-Based Explanations

Authors: Sebastian Müller, Tobias Schneider, Ruben Kemna, Vanessa Toborek

2026-01-07

http://arxiv.org/abs/2601.03776v1

Models trained on tabular data are widely used in sensitive domains, increasing the demand for explanation methods to meet transparency needs. CFIRE is a recent algorithm in this domain that constructs compact surrogate rule models from local explanations. While effective, CFIRE may assign rules associated with different classes to the same sample, introducing ambiguity. We investigate this ambiguity and propose a post-hoc pruning strategy that removes rules with low contribution or conflicting coverage, yielding smaller and less ambiguous models while preserving fidelity. Experiments across multiple datasets confirm these improvements with minimal impact on predictive performance.

EDCO Dynamic Curriculum Orchestration for Domain-specific Large Language Model Fine-tuning

Authors: Jing-Cheng Pang, Liu Sun, Chang Zhou, Xian Tang, Haichuan Ma, Kun Jiang, Jianlong Wang, Kai Zhang, Sijie Wu, Haoran Cai, Chenwei Wu, Xubin Li, Xin Chen

2026-01-07

http://arxiv.org/abs/2601.03725v1

Domain-specific large language models (LLMs), typically developed by fine-tuning a pre-trained general-purpose LLM on specialized datasets, represent a significant advancement in applied AI. A common strategy in LLM fine-tuning is curriculum learning, which pre-orders training samples based on metrics like difficulty to improve learning efficiency compared to a random sampling strategy. However, most existing methods for LLM fine-tuning rely on a static curriculum, designed prior to training, which lacks adaptability to the model's evolving needs during fine-tuning. To address this, we propose EDCO, a novel framework based on two key concepts: inference entropy and dynamic curriculum orchestration. Inspired by recent findings that maintaining high answer entropy benefits long-term reasoning gains, EDCO prioritizes samples with high inference entropy in a continuously adapted curriculum. EDCO integrates three core components: an efficient entropy estimator that uses prefix tokens to approximate full-sequence entropy, an entropy-based curriculum generator that selects data points with the highest inference entropy, and an LLM trainer that optimizes the model on the selected curriculum. In comprehensive experiments across three domains, including medicine and law, EDCO outperforms traditional curriculum strategies for fine-tuning Qwen3-4B and Llama3.2-3B models under supervised and reinforcement learning settings. Furthermore, the proposed efficient entropy estimation reduces computational time by 83.5% while maintaining high accuracy.
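The estimator-plus-generator loop described above can be illustrated with a small sketch. It assumes, hypothetically, that each sample comes with its per-step next-token probability distributions and that "inference entropy" is approximated as the mean Shannon entropy over the first few steps; the abstract does not give the exact formula, and all function names here are invented for illustration.

```python
import math
from typing import Dict, List, Sequence

Dist = Sequence[float]  # one next-token probability distribution


def token_entropy(probs: Dist) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def prefix_entropy(step_dists: Sequence[Dist], prefix_len: int) -> float:
    """Approximate sequence entropy by averaging over the first prefix_len steps."""
    prefix = step_dists[:prefix_len]
    if not prefix:
        raise ValueError("need at least one decoding step")
    return sum(token_entropy(d) for d in prefix) / len(prefix)


def select_curriculum(
    samples: Dict[str, Sequence[Dist]], prefix_len: int, k: int
) -> List[str]:
    """Return the ids of the k samples with the highest prefix entropy."""
    ranked = sorted(
        samples,
        key=lambda sid: prefix_entropy(samples[sid], prefix_len),
        reverse=True,
    )
    return ranked[:k]
```

In a dynamic curriculum, a selection step like this would be rerun as the model changes during training, so the set of high-entropy samples is refreshed rather than fixed in advance. Scoring only a prefix is what keeps that repeated step cheap.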

Visual Merit or Linguistic Crutch? A Close Look at DeepSeek-OCR

Authors: Yunhao Liang, Ruixuan Ying, Bo Li, Hong Li, Kai Yan, Qingwen Li, Min Yang, Okamoto Satoshi, Zhe Cui, Shiwen Ni

2026-01-07

http://arxiv.org/abs/2601.03714v2

DeepSeek-OCR utilizes an optical 2D mapping approach to achieve high-ratio vision-text compression, claiming to decode text tokens exceeding ten times the input visual tokens. While this suggests a promising solution for the LLM long-context bottleneck, we investigate a critical question: "Visual merit or linguistic crutch - which drives DeepSeek-OCR's performance?" By employing sentence-level and word-level semantic corruption, we isolate the model's intrinsic OCR capabilities from its language priors. Results demonstrate that without linguistic support, DeepSeek-OCR's performance plummets from approximately 90% to 20%. Comparative benchmarking against 13 baseline models reveals that traditional pipeline OCR methods exhibit significantly higher robustness to such semantic perturbations than end-to-end methods. Furthermore, we find that lower visual token counts correlate with increased reliance on priors, exacerbating hallucination risks. Context stress testing also reveals a total model collapse around 10,000 text tokens, suggesting that current optical compression techniques may paradoxically aggravate the long-context bottleneck. This study empirically defines DeepSeek-OCR's capability boundaries and offers essential insights for future optimizations of the vision-text compression paradigm. We release all data, results and scripts used in this study at https://github.com/dududuck00/DeepSeekOCR.

How Does the Thinking Step Influence Model Safety? An Entropy-based Safety Reminder for LRMs

Authors: Su-Hyeon Kim, Hyundong Jin, Yejin Lee, Yo-Sub Han

2026-01-07

http://arxiv.org/abs/2601.03662v1

Large Reasoning Models (LRMs) achieve remarkable success through explicit thinking steps, yet the thinking steps introduce a novel risk by potentially amplifying unsafe behaviors. Despite this vulnerability, conventional defense mechanisms remain ineffective as they overlook the unique reasoning dynamics of LRMs. In this work, we find that the emergence of safe-reminding phrases within thinking steps plays a pivotal role in ensuring LRM safety. Motivated by this finding, we propose SafeRemind, a decoding-time defense method that dynamically injects safe-reminding phrases into thinking steps. By leveraging entropy triggers to intervene at decision-locking points, SafeRemind redirects potentially harmful trajectories toward safer outcomes without requiring any parameter updates. Extensive evaluations across five LRMs and six benchmarks demonstrate that SafeRemind substantially enhances safety, achieving improvements of up to 45.5%p while preserving core reasoning utility.

LLM-MC-Affect LLM-Based Monte Carlo Modeling of Affective Trajectories and Latent Ambiguity for Interpersonal Dynamic Insight

Authors: Yu-Zheng Lin, Bono Po-Jen Shih, John Paul Martin Encinas, Elizabeth Victoria Abraham Achom, Karan Himanshu Patel, Jesus Horacio Pacheco, Sicong Shao, Jyotikrishna Dass, Soheil Salehi, Pratik Satam

2026-01-07

http://arxiv.org/abs/2601.03645v1

Emotional coordination is a core property of human interaction that shapes how relational meaning is constructed in real time. While text-based affect inference has become increasingly feasible, prior approaches often treat sentiment as a deterministic point estimate for individual speakers, failing to capture the inherent subjectivity, latent ambiguity, and sequential coupling found in mutual exchanges. We introduce LLM-MC-Affect, a probabilistic framework that characterizes emotion not as a static label, but as a continuous latent probability distribution defined over an affective space. By leveraging stochastic and Monte Carlo estimation, the methodology approximates these distributions to derive high-fidelity sentiment trajectories that explicitly quantify both central affective tendencies and perceptual ambiguity. These trajectories enable a structured analysis of interpersonal coupling through sequential cross-correlation and slope-based indicators, identifying leading or lagging influences between interlocutors. To validate the interpretive capacity of this approach, we utilize teacher-student instructional dialogues as a representative case study, where our quantitative indicators successfully distill high-level interaction insights such as effective scaffolding. This work establishes a scalable and deployable pathway for understanding interpersonal dynamics, offering a generalizable solution that extends beyond education to broader social and behavioral research.

Architecting Agentic Communities using Design Patterns

Authors: Zoran Milosevic, Fethi Rabhi

2026-01-07

http://arxiv.org/abs/2601.03624v1

The rapid evolution of Large Language Models (LLMs) and subsequent Agentic AI technologies requires systematic architectural guidance for building sophisticated, production-grade systems. This paper presents an approach for architecting such systems using design patterns derived from enterprise distributed systems standards, formal methods, and industry practice. We classify these patterns into three tiers: LLM Agents (task-specific automation), Agentic AI (adaptive goal-seekers), and Agentic Communities (organizational frameworks where AI agents and human participants coordinate through formal roles, protocols, and governance structures). We focus on Agentic Communities - coordination frameworks encompassing LLM Agents, Agentic AI entities, and humans - most relevant for enterprise and industrial applications. Drawing on established coordination principles from distributed systems, we ground these patterns in a formal framework that specifies collaboration agreements where AI agents and humans fill roles within governed ecosystems. This approach provides both practical guidance and formal verification capabilities, enabling expression of organizational, legal, and ethical rules through accountability mechanisms that ensure operational and verifiable governance of inter-agent communication, negotiation, and intent modeling. We validate this framework through a clinical trial matching case study. Our goal is to provide actionable guidance to practitioners while maintaining the formal rigor essential for enterprise deployment in dynamic, multi-agent ecosystems.

Safety-Utility Conflicts Are Not Global Surgical Alignment via Head-Level Diagnosis

Authors: Wang Cai, Yilin Wen, Jinchang Hou, Du Su, Guoqiu Wang, Zhonghou Lv, Chenfu Bao, Yunfang Wu

2026-01-07

http://arxiv.org/abs/2601.04262v1

Safety alignment in Large Language Models (LLMs) inherently presents a multi-objective optimization conflict, often accompanied by an unintended degradation of general capabilities. Existing mitigation strategies typically rely on global gradient geometry to resolve these conflicts, yet they overlook Modular Heterogeneity within Transformers, specifically that the functional sensitivity and degree of conflict vary substantially across different attention heads. Such global approaches impose uniform update rules across all parameters, often resulting in suboptimal trade-offs by indiscriminately updating utility-sensitive heads that exhibit intense gradient conflicts. To address this limitation, we propose Conflict-Aware Sparse Tuning (CAST), a framework that integrates head-level diagnosis with sparse fine-tuning. CAST first constructs a pre-alignment conflict map by synthesizing Optimization Conflict and Functional Sensitivity, which then guides the selective update of parameters. Experiments reveal that alignment conflicts in LLMs are not uniformly distributed. We find that the drop in general capabilities mainly comes from updating a small group of "high-conflict" heads. By simply skipping these heads during training, we significantly reduce this loss without compromising safety, offering an interpretable and parameter-efficient approach to improving the safety-utility trade-off.

Inhibitory Attacks on Backdoor-based Fingerprinting for Large Language Models

Authors: Hang Fu, Wanli Peng, Yinghan Zhou, Jiaxuan Wu, Juan Wen, Yiming Xue

2026-01-07

http://arxiv.org/abs/2601.04261v1

The widespread adoption of Large Language Models (LLMs) in commercial and research settings has intensified the need for robust intellectual property protection. Backdoor-based fingerprinting has emerged as a promising solution for this challenge. In practical applications, the low-cost multi-model collaborative technique known as LLM ensemble combines diverse LLMs to leverage their complementary strengths, garnering significant attention and practical adoption. Unfortunately, the vulnerability of existing fingerprinting in the LLM ensemble scenario is unexplored. To comprehensively assess the robustness of fingerprinting, we propose two novel fingerprinting attack methods: token filter attack (TFA) and sentence verification attack (SVA). TFA selects the next token from a unified set of tokens created by the token filter mechanism at each step. SVA filters out fingerprint responses through a sentence verification mechanism based on perplexity and voting. Experimentally, the proposed methods effectively inhibit the fingerprint response while maintaining ensemble performance. Compared with state-of-the-art attack methods, the proposed methods achieve better performance. The findings necessitate enhanced robustness in fingerprinting.

PhysicsFormer An Efficient and Fast Attention-Based Physics Informed Neural Network for Solving Incompressible Navier Stokes Equations

Authors: Biswanath Barman, Debdeep Chatterjee, Rajendra K. Ray

2026-01-07

http://arxiv.org/abs/2601.03613v1

Traditional experimental and numerical approaches for fluid dynamics problems often suffer from high computational cost, mesh sensitivity, and limited capability in capturing complex physical behaviors. Moreover, conventional physics-informed neural networks (PINNs) frequently struggle in chaotic and highly unsteady flow regimes. In this work, we propose PhysicsFormer, a fast and efficient transformer-based physics-informed framework that incorporates multi-head encoder-decoder cross-attention. Unlike multilayer perceptron-based PINNs, PhysicsFormer operates on sequential representations constructed from spatio-temporal data, enabling effective learning of long-range temporal dependencies and improved propagation of initial condition information. A data-embedding strategy is employed to convert spatio-temporal points into pseudo-sequences, while a dynamics-weighted loss function replaces the standard PINNs formulation. Owing to its parallel learning structure, PhysicsFormer demonstrates superior computational efficiency compared to existing transformer-based approaches. The framework is validated on Burgers' equation and flow reconstruction governed by the Navier-Stokes equations, achieving mean squared errors on the order of $10^{-6}$. In addition, an inverse problem involving parameter identification in the two-dimensional incompressible Navier-Stokes equations is investigated. For clean data, PhysicsFormer achieves zero identification error for both $λ_1$ and $λ_2$; under $1\%$ Gaussian noise, the errors are $0.07\%$ for $λ_1$ and $0\%$ for $λ_2$. These results demonstrate that PhysicsFormer provides a reliable and computationally efficient surrogate modeling framework for time-dependent fluid flow problems.

Jailbreaking LLMs & VLMs Mechanisms, Evaluation, and Unified Defense

Authors: Zejian Chen, Chaozhuo Li, Chao Li, Xi Zhang, Litian Zhang, Yiming He

2026-01-07

http://arxiv.org/abs/2601.03594v1

This paper provides a systematic survey of jailbreak attacks and defenses on Large Language Models (LLMs) and Vision-Language Models (VLMs), emphasizing that jailbreak vulnerabilities stem from structural factors such as incomplete training data, linguistic ambiguity, and generative uncertainty. It further differentiates between hallucinations and jailbreaks in terms of intent and triggering mechanisms. We propose a three-dimensional survey framework: (1) Attack dimension, including template/encoding-based, in-context learning manipulation, reinforcement/adversarial learning, LLM-assisted and fine-tuned attacks, as well as prompt- and image-level perturbations and agent-based transfer in VLMs; (2) Defense dimension, encompassing prompt-level obfuscation, output evaluation, and model-level alignment or fine-tuning; and (3) Evaluation dimension, covering metrics such as Attack Success Rate (ASR), toxicity score, query/time cost, and multimodal Clean Accuracy and Attribute Success Rate. Compared with prior works, this survey spans the full spectrum from text-only to multimodal settings, consolidating shared mechanisms and proposing unified defense principles: variant-consistency and gradient-sensitivity detection at the perception layer, safety-aware decoding and output review at the generation layer, and adversarially augmented preference alignment at the parameter layer. Additionally, we summarize existing multimodal safety benchmarks and discuss future directions, including automated red teaming, cross-modal collaborative defense, and standardized evaluation.

DiffCoT Diffusion-styled Chain-of-Thought Reasoning in LLMs

Authors: Shidong Cao, Hongzhan Lin, Yuxuan Gu, Ziyang Luo, Jing Ma

2026-01-07

http://arxiv.org/abs/2601.03559v1

Chain-of-Thought (CoT) reasoning improves multi-step mathematical problem solving in large language models but remains vulnerable to exposure bias and error accumulation, as early mistakes propagate irreversibly through autoregressive decoding. In this work, we propose DiffCoT, a diffusion-styled CoT framework that reformulates CoT reasoning as an iterative denoising process. DiffCoT integrates diffusion principles at the reasoning-step level via a sliding-window mechanism, enabling unified generation and retrospective correction of intermediate steps while preserving token-level autoregression. To maintain causal consistency, we further introduce a causal diffusion noise schedule that respects the temporal structure of reasoning chains. Extensive experiments on three multi-step CoT reasoning benchmarks across diverse model backbones demonstrate that DiffCoT consistently outperforms existing CoT preference optimization methods, yielding improved robustness and error-correction capability in CoT reasoning.

Layer-Order Inversion Rethinking Latent Multi-Hop Reasoning in Large Language Models

Authors: Xukai Liu, Ye Liu, Jipeng Zhang, Yanghai Zhang, Kai Zhang, Qi Liu

2026-01-07

http://arxiv.org/abs/2601.03542v1

Large language models (LLMs) perform well on multi-hop reasoning, yet how they internally compose multiple facts remains unclear. Recent work proposes the hop-aligned circuit hypothesis, suggesting that bridge entities are computed sequentially across layers before later-hop answers. Through systematic analyses on real-world multi-hop queries, we show that this hop-aligned assumption does not generalize: later-hop answer entities can become decodable earlier than bridge entities, a phenomenon we call layer-order inversion, which strengthens with total hops. To explain this behavior, we propose a probabilistic recall-and-extract framework that models multi-hop reasoning as broad probabilistic recall in shallow MLP layers followed by selective extraction in deeper attention layers. This framework is empirically validated through systematic probing analyses, reinterpreting prior layer-wise evidence, explaining chain-of-thought gains, and providing a mechanistic diagnosis of multi-hop failures despite correct single-hop knowledge. Code is available at https://github.com/laquabe/Layer-Order-Inversion.

IntroLM Introspective Language Models via Prefilling-Time Self-Evaluation

Authors: Hossein Hosseini Kasnavieh, Gholamreza Haffari, Chris Leckie, Adel N. Toosi

2026-01-07

http://arxiv.org/abs/2601.03511v1

A major challenge for the operation of large language models (LLMs) is how to predict whether a specific LLM will produce sufficiently high-quality output for a given query. Existing approaches rely on external classifiers, most commonly BERT-based models, which suffer from limited context windows, constrained representational capacity, and additional computational overhead. We propose IntroLM, a method that enables causal language models to predict their own output quality during the prefilling phase without affecting generation using introspective tokens. By introducing token-conditional LoRA that activates only for the introspective token, the model learns to predict the output quality for a given query while preserving the original backbone behavior and avoiding external evaluators. On question answering benchmarks, IntroLM applied to Qwen3 8B achieves a ROC AUC of 90 percent for success prediction, outperforming a DeBERTa classifier by 14 percent. When integrated into multi-model routing systems, IntroLM achieves superior cost-performance tradeoffs, reducing latency by up to 33 percent and large model usage by up to 50 percent at matched reliability.
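To make the routing use case concrete, here is a minimal sketch of threshold-based routing on predicted success probabilities. It is an illustration under stated assumptions, not the routing policy from the paper: the `Model` class, the `cost` field, the `route` function, and the 0.8 threshold are all hypothetical, and the success probabilities are assumed to come from some self-evaluation step.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Model:
    name: str
    cost: float  # hypothetical relative cost per query


def route(success_probs: Dict[str, float], models: List[Model], threshold: float = 0.8) -> str:
    """Pick the cheapest model whose predicted success probability clears the threshold.

    Falls back to the most expensive model when no candidate clears it.
    """
    ranked = sorted(models, key=lambda m: m.cost)
    for model in ranked:
        if success_probs[model.name] >= threshold:
            return model.name
    return ranked[-1].name
```

For example, with a cheap "small" model and a costly "large" model, a query whose predicted success is 0.9 on "small" stays on "small", while one predicted at 0.4 is sent to "large".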

Beyond Perplexity A Lightweight Benchmark for Knowledge Retention in Supervised Fine-Tuning

Authors: Soheil Zibakhsh Shabgahi, Pedram Aghazadeh, Farinaz Koushanfar

2026-01-07

http://arxiv.org/abs/2601.03505v1

Supervised Fine-Tuning (SFT) is a standard approach for injecting domain knowledge into Large Language Models (LLMs). However, relying on validation perplexity to monitor training is often insufficient, as it confounds stylistic mimicry with genuine factual internalization. To address this, we introduce the Knowledge Retention (KR) Test, a lightweight, corpus-grounded evaluation framework designed to distinguish factual learning from linguistics. KR-Test utilizes automatically generated contrastive examples to measure likelihood preferences for correct versus incorrect continuations, requiring no instruction tuning or generative decoding. We validate the framework's integrity through a "blind vs. oracle" baseline analysis. Furthermore, we demonstrate the diagnostic capabilities of KR-Test by analyzing the training dynamics of Low-Rank Adaptation (LoRA). By exposing the fine-grained dissociation between linguistic convergence and knowledge retention, KR-Test enhances the interpretability of fine-tuning dynamics.
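The idea of measuring likelihood preferences on contrastive pairs can be sketched as follows. This is a generic illustration, not the KR-Test implementation: `preference_accuracy` and `toy_score` are hypothetical names, the scorer is a stand-in for a model's log-likelihood of a continuation given a context, and ties are counted as failures by assumption.

```python
from typing import Callable, Iterable, Tuple

# (context, correct continuation, incorrect continuation)
Example = Tuple[str, str, str]


def preference_accuracy(examples: Iterable[Example], score: Callable[[str, str], float]) -> float:
    """Fraction of pairs where the correct continuation scores strictly higher."""
    examples = list(examples)
    if not examples:
        raise ValueError("need at least one contrastive example")
    wins = sum(1 for ctx, good, bad in examples if score(ctx, good) > score(ctx, bad))
    return wins / len(examples)


# Toy stand-in for a model: "known" facts get a higher log-likelihood.
KNOWN = {("The capital of France is", " Paris")}


def toy_score(context: str, continuation: str) -> float:
    return 0.0 if (context, continuation) in KNOWN else -5.0
```

Because only scores of fixed continuations are compared, no text generation is needed; in practice the scorer would sum token log-probabilities from the model under evaluation.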

From Bits to Chips An LLM-based Hardware-Aware Quantization Agent for Streamlined Deployment of LLMs

Authors: Kaiyuan Deng, Hangyu Zheng, Minghai Qing, Kunxiong Zhu, Gen Li, Yang Xiao, Lan Emily Zhang, Linke Guo, Bo Hui, Yanzhi Wang, Geng Yuan, Gagan Agrawal, Wei Niu, Xiaolong Ma

2026-01-07

http://arxiv.org/abs/2601.03484v1

Deploying models, especially large language models (LLMs), is becoming increasingly attractive to a broader user base, including those without specialized expertise. However, due to the resource constraints of certain hardware, maintaining high accuracy with larger models while meeting the hardware requirements remains a significant challenge. Model quantization helps mitigate memory and compute bottlenecks, yet the added complexities of tuning and deploying quantized models further exacerbate these challenges, making the process unfriendly to most users. We introduce the Hardware-Aware Quantization Agent (HAQA), an automated framework that leverages LLMs to streamline the entire quantization and deployment process by enabling efficient hyperparameter tuning and hardware configuration, thereby simultaneously improving deployment quality and ease of use for a broad range of users. Our results demonstrate up to a 2.3x speedup in inference, along with increased throughput and improved accuracy compared to unoptimized models on Llama. Additionally, HAQA is designed to implement adaptive quantization strategies across diverse hardware platforms, as it automatically finds optimal settings even when they appear counterintuitive, thereby reducing extensive manual effort and demonstrating superior adaptability. Code will be released.
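For readers unfamiliar with what quantization trades off, the following is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is a generic textbook scheme chosen for illustration; the abstract does not say which schemes HAQA selects, and the function names here are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid division by zero
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate floats; the error per value is at most about scale / 2."""
    return [q * scale for q in quantized]
```

The scale (and, in richer schemes, the bit width and granularity) is the kind of setting that has to be tuned against accuracy and hardware limits, which is the tuning burden the abstract refers to.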