2026-01-09
Table of Contents
- Multivector Reranking in the Era of Strong First-Stage Retrievers
- SimuAgent An LLM-Based Simulink Modeling Assistant Enhanced with Reinforcement Learning
- RelayLLM Efficient Reasoning via Collaborative Decoding
- Token-Level LLM Collaboration via FusionRoute
- Semantically Orthogonal Framework for Citation Classification Disentangling Intent and Content
- Multi-Disciplinary Dataset Discovery from Citation-Verified Literature Contexts
- Driving on Registers
- Milestones over Outcome Unlocking Geometric Reasoning with Sub-Goal Verifiable Reward
- Challenges and Research Directions for Large Language Model Inference Hardware
- ArcAligner Adaptive Recursive Aligner for Compressed Context Embeddings in RAG
- SparseLaneSTP Leveraging Spatio-Temporal Priors with Sparse Transformers for 3D Lane Detection
- Asynchronous Secure Federated Learning with Byzantine aggregators
- Distributed Online Convex Optimization with Efficient Communication Improved Algorithm and Lower bounds
- SCALERSynthetic Scalable Adaptive Learning Environment for Reasoning
- AgentOCR Reimagining Agent History via Optical Self-Compression
- ATPO Agentic Turn-based Policy Optimization via Tree Search
- When Single-Agent with Skills Replace Multi-Agent Systems and When They Fail
- GPU-Accelerated INT8 Quantization for KV Cache Compression in Large Language Models
- Beyond Monolithic Architectures A Multi-Agent Search and Knowledge Optimization Framework for Agentic Search
- ResMAS Resilience Optimization in LLM-based Multi-agent Systems
- LAMB LLM-based Audio Captioning with Modality Gap Bridging via Cauchy-Schwarz Divergence
- LLMs-Integrated Automatic Hate Speech Recognition Using Controllable Text Generation Models
- Succeeding at Scale Automated Multi-Retriever Fusion and Query-Side Adaptation for Multi-Tenant Search
- All Changes May Have Invariant Principles Improving Ever-Shifting Harmful Meme Detection via Design Concept Reproduction
- Reasoning Over Space Enabling Geographic Reasoning for LLM-Based Generative Next POI Recommendation
- Discrete Fourier Transform-based Point Cloud Compression for Efficient SLAM in Featureless Terrain
- Self-MedRAG a Self-Reflective Hybrid Retrieval-Augmented Generation Framework for Reliable Medical Question Answering
- LinguaGame A Linguistically Grounded Game-Theoretic Paradigm for Multi-Agent Dialogue Generation
- Invisible Walls Privacy-Preserving ISAC Empowered by Reconfigurable Intelligent Surfaces
- Convergence Rates for Learning Pseudo-Differential Operators
- Re-Rankers as Relevance Judges
- SpectraFormer an Attention-Based Raman Unmixing Tool for Accessing the Graphene Buffer-Layer Signature on SiC
- Rate or Fate? RLVR Reinforcement Learning with Verifiable Noisy Rewards
- LLM-Guided Lifecycle-Aware Clustering of Multi-Turn Customer Support Conversations
- Phasor Agents Oscillatory Graphs with Three-Factor Plasticity and Sleep-Staged Learning
- PackCache A Training-Free Acceleration Method for Unified Autoregressive Video Generation via Compact KV-Cache
- RIGOURATE Quantifying Scientific Exaggeration with Evidence-Aligned Claim Evaluation
- SCAR-GS Spatial Context Attention for Residuals in Progressive Gaussian Splatting
- Stable Language Guidance for Vision-Language-Action Models
- ResTok Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation
- FocusUI Efficient UI Grounding via Position-Preserving Visual Token Selection
- Decide Then Retrieve A Training-Free Framework with Uncertainty-Guided Triggering and Dual-Path Retrieval
- A Privacy-Preserving Localization Scheme with Node Selection in Mobile Networks
- Feature-Aware One-Shot Federated Learning via Hierarchical Token Sequences
- Evaluating Small Decoder-Only Language Models for Grammar Correction and Text Simplification
- When Numbers Start Talking Implicit Numerical Coordination Among LLM-Based Agents
- Where meaning lives Layer-wise accessibility of psycholinguistic features in encoder and decoder language models
- Do LLMs Really Memorize Personally Identifiable Information? Revisiting PII Leakage with a Cue-Controlled Memorization Framework
- Improving Compactness and Reducing Ambiguity of CFIRE Rule-Based Explanations
- EDCO Dynamic Curriculum Orchestration for Domain-specific Large Language Model Fine-tuning
- Visual Merit or Linguistic Crutch? A Close Look at DeepSeek-OCR
- How Does the Thinking Step Influence Model Safety? An Entropy-based Safety Reminder for LRMs
- LLM-MC-Affect LLM-Based Monte Carlo Modeling of Affective Trajectories and Latent Ambiguity for Interpersonal Dynamic Insight
- Architecting Agentic Communities using Design Patterns
- Safety-Utility Conflicts Are Not Global Surgical Alignment via Head-Level Diagnosis
- Inhibitory Attacks on Backdoor-based Fingerprinting for Large Language Models
- PhysicsFormer An Efficient and Fast Attention-Based Physics Informed Neural Network for Solving Incompressible Navier Stokes Equations
- Jailbreaking LLMs & VLMs Mechanisms, Evaluation, and Unified Defense
- DiffCoT Diffusion-styled Chain-of-Thought Reasoning in LLMs
- Layer-Order Inversion Rethinking Latent Multi-Hop Reasoning in Large Language Models
- IntroLM Introspective Language Models via Prefilling-Time Self-Evaluation
- Beyond Perplexity A Lightweight Benchmark for Knowledge Retention in Supervised Fine-Tuning
- From Bits to Chips An LLM-based Hardware-Aware Quantization Agent for Streamlined Deployment of LLMs
Multivector Reranking in the Era of Strong First-Stage Retrievers
Authors: Silvio Martinico, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
2026-01-08
Learned multivector representations power modern search systems with strong retrieval effectiveness, but their real-world use is limited by the high cost of exhaustive token-level retrieval. Therefore, most systems adopt a \emph{gather-and-refine} strategy, where a lightweight gather phase selects candidates for full scoring. However, this approach requires expensive searches over large token-level indexes and often misses the documents that would rank highest under full similarity. In this paper, we reproduce several state-of-the-art multivector retrieval methods on two publicly available datasets, providing a clear picture of the current multivector retrieval field and ob the inefficiency of token-level gathering. Building on top of that, we show that replacing the token-level gather phase with a single-vector document retriever -- specifically, a learned
retriever (LSR) -- produces a smaller and more semantically coherent candidate set. This recasts the gather-and-refine pipeline into the well-established two-stage retrieval architecture. As retrieval latency decreases, query encoding with two neural encoders becomes the dominant computational bottleneck. To mitigate this, we integrate recent inference-free LSR methods, demonstrating that they preserve the retrieval effectiveness of the dual-encoder pipeline while substantially reducing query encoding time. Finally, we investigate multiple reranking configurations that balance efficiency, memory, and effectiveness, and we introduce two optimization techniques that prune low-quality candidates early. Empirical results show that these techniques improve retrieval efficiency by up to 1.8 with no loss in quality. Overall, our two-stage approach achieves over speedup over the state-of-the-art multivector retrieval systems, while maintaining comparable or superior retrieval quality.
SimuAgent An LLM-Based Simulink Modeling Assistant Enhanced with Reinforcement Learning
Authors: Yanchang Liang, Xiaowei Zhao
2026-01-08
Large language models (s) have revolutionized text-based code automation, but their potential in graph-oriented engineering workflows remains under-explored. We introduce SimuAgent, an
-powered modeling and simulation agent tailored for Simulink. SimuAgent replaces verbose XML with a concise, dictionary-style Python representation, dramatically cutting token counts, improving interpretability, and enabling fast, in-process simulation. A lightweight plan-execute architecture, trained in two stages, equips the agent with both low-level tool skills and high-level design reasoning. To tackle
rewards in long-horizon tasks, we propose Reflection-GRPO (ReGRPO), which augments Group Relative Policy Optimization (GRPO) with self-reflection traces that supply rich intermediate feedback, accelerating convergence and boosting robustness. Experiments on SimuBench, our newly released benchmark comprising 5300 multi-domain modeling tasks, show that a Qwen2.5-7B model fine-tuned with SimuAgent converges faster and achieves higher modeling accuracy than standard RL baselines, and even surpasses GPT-4o when evaluated with few-shot prompting on the same benchmark. Ablations confirm that the two-stage curriculum and abstract-reconstruct data augmentation further enhance generalization. SimuAgent trains and runs entirely on-premise with modest hardware, delivering a privacy-pre
, cost-effective solution for industrial model-driven engineering. SimuAgent bridges the gap between
s and graphical modeling environments, offering a practical solution for AI-assisted engineering design in industrial settings.
RelayLLM Efficient Reasoning via Collaborative Decoding
Authors: Chengsong Huang, Tong Zheng, Langlin Huang, Jinyuan Li, Haolin Liu, Jiaxin Huang
2026-01-08
Large Language Models (s) for complex reasoning is often hindered by high computational costs and latency, while resource-efficient Small Language Models (SLMs) typically lack the necessary reasoning capacity. Existing collaborative approaches, such as cascading or routing, operate at a coarse granularity by offloading entire queries to
s, resulting in significant computational waste when the SLM is capable of handling the majority of reasoning steps. To address this, we propose Relay
, a novel framework for efficient reasoning via token-level collaborative
. Unlike routers, Relay
empowers the SLM to act as an active controller that dynamically invokes the
only for critical tokens via a special command, effectively "relaying" the generation process. We introduce a two-stage training framework, including warm-up and Group Relative Policy Optimization (GRPO) to teach the model to balance independence with strategic help-seeking. Empirical results across six benchmarks demonstrate that Relay
achieves an average accuracy of 49.52%, effectively bridging the performance gap between the two models. Notably, this is achieved by invoking the
for only 1.07% of the total generated tokens, offering a 98.2% cost reduction compared to performance-matched random routers.
Token-Level LLM Collaboration via FusionRoute
Authors: Nuoya Xiong, Yuhang Zhou, Hanqing Zeng, Zhaorun Chen, Furong Huang, Shuchao Bi, Lizhu Zhang, Zhuokai Zhao
2026-01-08
Large language models (s) exhibit strengths across diverse domains. However, achieving strong performance across these domains with a single general-purpose model typically requires scaling to sizes that are prohibitively expensive to train and deploy. On the other hand, while smaller domain-specialized models are much more efficient, they struggle to generalize beyond their training distributions. To address this dilemma, we propose FusionRoute, a robust and effective token-level multi-
collaboration framework in which a lightweight router simultaneously (i) selects the most suitable expert at each
step and (ii) contributes a complementary logit that refines or corrects the selected expert's next-token distribution via logit addition. Unlike existing token-level collaboration methods that rely solely on fixed expert outputs, we provide a theoretical analysis showing that pure expert-only routing is fundamentally limited: unless strong global coverage assumptions hold, it cannot in general realize the optimal
policy. By augmenting expert selection with a trainable complementary generator, FusionRoute expands the effective policy class and enables recovery of optimal value functions under mild conditions. Empirically, across both Llama-3 and Gemma-2 families and diverse benchmarks spanning mathematical reasoning, code generation, and instruction following, FusionRoute outperforms both sequence- and token-level collaboration, model merging, and direct fine-tuning, while remaining competitive with domain experts on their respective tasks.
Semantically Orthogonal Framework for Citation Classification Disentangling Intent and Content
Authors: Changxu Duan, Zhiyin Tan
2026-01-08
Understanding the role of citations is essential for research assessment and citation-aware digital libraries. However, existing citation classification frameworks often conflate citation intent (why a work is cited) with cited content type (what part is cited), limiting their effectiveness in auto classification due to a dilemma between fine-grained type distinctions and practical classification reliability. We introduce SOFT, a Semantically Orthogonal Framework with Two dimensions that explicitly separates citation intent from cited content type, drawing inspiration from semantic role theory. We systematically re-annotate the ACL-ARC dataset using SOFT and release a cross-disciplinary test set sampled from ACT2. Evaluation with both zero-shot and fine-tuned Large Language Models demonstrates that SOFT enables higher agreement between human annotators and s, and supports stronger classification performance and robust cross-domain generalization compared to ACL-ARC and SciCite annotation frameworks. These results confirm SOFT's value as a clear, reusable annotation standard, improving clarity, consistency, and generalizability for digital libraries and scholarly
infrastructures. All code and data are publicly available on GitHub https://github.com/zhiyintan/SOFT.
Multi-Disciplinary Dataset Discovery from Citation-Verified Literature Contexts
Authors: Zhiyin Tan, Changxu Duan
2026-01-08
Identifying suitable datasets for a research question remains challenging because existing dataset search engines rely heavily on metadata quality and keyword , which often fail to capture the semantic intent of scientific investigation. We introduce a literature-driven framework that discovers datasets from citation contexts in scientific papers, enabling retrieval grounded in actual research use rather than metadata availability. Our approach combines large-scale citation-context extraction, schema-guided dataset recognition with Large Language Models, and provenance-pre
entity resolution. We evaluate the system on eight survey-derived computer science queries and find that it achieves substantially higher recall than Google Dataset Search and DataCite Commons, with normalized recall ranging from an average of 47.47% to a highest value of 81.82%. Beyond recovering gold-standard datasets, the method also surfaces additional datasets not documented in the surveys. Expert assessments across five top-level Fields of Science indicate that a substantial portion of the additional datasets are considered high utility, and some are regarded as novel for the specific topics chosen by the experts. These findings establish citation-context mining as an effective and generalizable paradigm for dataset discovery, particularly in settings where datasets lack sufficient or reliable metadata. To support reproducibility and future extensions, we release our code, evaluation datasets, and results on GitHub (https://github.com/Fireblossom/citation-context-dataset-discovery).
Driving on Registers
Authors: Ellington Kirby, Alexandre Boulch, Yihong Xu, Yuan Yin, Gilles Puy, Éloi Zablocki, Andrei Bursuc, Spyros Gidaris, Renaud Marlet, Florent Bartoccioni, Anh-Quan Cao, Nermin Samet, Tuan-Hung VU, Matthieu Cord
2026-01-08
We present DrivoR, a simple and efficient -based architecture for end-to-end autonomous driving. Our approach builds on pretrained Vision Transformers (ViTs) and introduces camera-aware register tokens that compress multi-camera features into a compact scene representation, significantly reducing downstream computation without sacrificing accuracy. These tokens drive two lightweight
rs that generate and then score candidate trajectories. The scoring
r learns to mimic an oracle and predicts interpretable sub-scores representing aspects such as safety, comfort, and efficiency, enabling behavior-conditioned driving at inference. Despite its minimal design, DrivoR outperforms or matches strong contemporary baselines across NAVSIM-v1, NAVSIM-v2, and the photorealistic closed-loop HUGSIM benchmark. Our results show that a pure-
architecture, combined with targeted token
, is sufficient for accurate, efficient, and adaptive end-to-end driving. Code and checkpoints will be made available via the project page.
Milestones over Outcome Unlocking Geometric Reasoning with Sub-Goal Verifiable Reward
Authors: Jianlong Chen, Daocheng Fu, Shengze Xu, Jiawei Chen, Yuan Feng, Yue Yang, Junchi Yan, Hongyuan Zha, Renqiu Xia
2026-01-08
Multimodal Large Language Models (Ms) struggle with complex geometric reasoning, largely because "black box" outcome-based supervision fails to distinguish between lucky guesses and rigorous deduction. To address this, we introduce a paradigm shift towards subgoal-level evaluation and learning. We first construct GeoGoal, a benchmark synthesized via a rigorous formal verification data engine, which converts abstract proofs into verifiable numeric subgoals. This structure reveals a critical divergence between reasoning quality and outcome accuracy. Leveraging this, we propose the Sub-Goal Verifiable Reward (SGVR) framework, which replaces
signals with dense rewards based on the Skeleton Rate. Experiments demonstrate that SGVR not only enhances geometric performance (+9.7%) but also exhibits strong generalization, transferring gains to general math (+8.0%) and other general reasoning tasks (+2.8%), demonstrating broad applicability across diverse domains.
Challenges and Research Directions for Large Language Model Inference Hardware
Authors: Xiaoyu Ma, David Patterson
2026-01-08
Large Language Model () inference is hard. The autoregressive Decode phase of the underlying Transformer model makes
inference fundamentally different from training. Exacerbated by recent AI trends, the primary challenges are memory and interconnect rather than compute. To address these challenges, we highlight four architecture research opportunities: High Bandwidth Flash for 10X memory capacity with HBM-like bandwidth; Processing-Near-Memory and 3D memory-logic stacking for high memory bandwidth; and low-latency interconnect to speedup
. While our focus is datacenter AI, we also review their applicability for mobile devices.
ArcAligner Adaptive Recursive Aligner for Compressed Context Embeddings in RAG
Authors: Jianbo Li, Yi Jiang, Sendong Zhao, Bairui Hu, Haochun Wang, Bing Qin
2026-01-08
Retrieval-Augmented Generation (RAG) helps s stay accurate, but feeding long documents into a prompt makes the model slow and expensive. This has motivated context
, ranging from token
and summarization to embedding-based
. While researchers have tried ''compressing'' these documents into smaller summaries or mathematical embeddings, there is a catch: the more you compress the data, the more the
struggles to understand it. To address this challenge, we propose ArcAligner (Adaptive recursive context Aligner), a lightweight module integrated into the language model layers to help the model better utilize highly compressed context representations for downstream generation. It uses an adaptive ''gating'' system that only adds extra processing power when the information is complex, keeping the system fast. Across knowledge-intensive QA benchmarks, ArcAligner consistently beats
baselines at comparable
rates, especially on multi-hop and long-tail settings. The source code is publicly available.
SparseLaneSTP Leveraging Spatio-Temporal Priors with Sparse Transformers for 3D Lane Detection
Authors: Maximilian Pittner, Joel Janai, Mario Faigle, Alexandru Paul Condurache
2026-01-08
3D lane detection has emerged as a critical challenge in autonomous driving, encompassing identification and localization of lane markings and the 3D road surface. Conventional 3D methods detect lanes from dense birds-eye-viewed (BEV) features, though erroneous transformations often result in a poor feature representation misaligned with the true 3D road surface. While recent lane detectors have surpassed dense BEV approaches, they completely disregard valuable lane-specific priors. Furthermore, existing methods fail to utilize historic lane observations, which yield the potential to resolve ambiguities in situations of poor visibility. To address these challenges, we present SparseLaneSTP, a novel method that integrates both geometric properties of the lane structure and temporal information into a
lane
. It introduces a new lane-specific spatio-temporal attention mechanism, a continuous lane representation tailored for
architectures as well as temporal regularization. Identifying weaknesses of existing 3D lane datasets, we also introduce a precise and consistent 3D lane dataset using a simple yet effective auto-labeling strategy. Our experimental section proves the benefits of our contributions and demonstrates state-of-the-art performance across all detection and error metrics on existing 3D lane detection benchmarks as well as on our novel dataset.
Asynchronous Secure Federated Learning with Byzantine aggregators
Authors: Antonella Del Pozzo, Achille Desreumaux, Mathieu Gestin, Alexandre Rapetti, Sara Tucci-Piergiovanni
2026-01-08
Privacy-pre federated averaging is a central approach for protecting client privacy in federated learning. In this paper, we study this problem in an asynchronous
s setting with malicious aggregators. We propose a new solution to provide federated averaging in this model while protecting the client's data privacy through secure aggregation and differential privacy. Our solution maintains the same performance as the state of the art across all metrics. The main contributions of this paper are threefold. First, unlike existing single- or multi-server solutions, we consider malicious aggregation servers that may manipulate the model to leak clients' data or halt computation. To tolerate this threat, we replicate the aggregators, allowing a fraction of them to be corrupted. Second, we propose a new privacy preservation protocol for protocols in asynchronous
models with Byzantine aggregators. In this protocol, clients mask their values and add Gaussian noise to their models. In contrast with previous works, we use the replicated servers to unmask the models, while ensuring the liveness of training even if aggregators misbehave. Third, the asynchronous
model introduces new challenges not present in existing approaches. In such a setting, faster clients may contribute more frequently, potentially reducing their privacy and biasing the training. To address this, we introduce an inclusion mechanism that ensures uniform client participation and balanced privacy budgets. Interestingly, the solution presented in this paper does not rely on agreement between aggregators. Thus, we circumvent the known impossibility of consensus in asynchronous settings where processes might crash. Additionally, this feature increases availability, as a consensus-based algorithm only progresses in periods of low latency.
Distributed Online Convex Optimization with Efficient Communication Improved Algorithm and Lower bounds
Authors: Sifan Yang, Wenhao Yang, Wei Jiang, Lijun Zhang
2026-01-08
We investigate distributed online convex optimization with compressed , where learners connected by a network collaboratively minimize a sequence of global loss functions using only local information and compressed data from neighbors. Prior work has established regret bounds of and for convex and strongly convex functions, respectively, where is the
quality factor ( means no
) and is the spectral gap of the
matrix. However, these regret bounds suffer from a \emph{quadratic} or even \emph{quartic} dependence on . Moreover, the \emph{super-linear} dependence on is also undesirable. To overcome these limitations, we propose a novel algorithm that achieves improved regret bounds of and for convex and strongly convex functions, respectively. The primary idea is to design a \emph{two-level blocking update framework} incorporating two novel ingredients: an online gossip strategy and an error compensation scheme, which collaborate to \emph{achieve a better consensus} among learners. Furthermore, we establish the first lower bounds for this problem, justifying the optimality of our results with respect to both and . Additionally, we consider the bandit feedback scenario, and extend our method with the classic gradient estimators to enhance existing regret bounds.
SCALERSynthetic Scalable Adaptive Learning Environment for Reasoning
Authors: Caijun Xu, Changyi Xiao, Zhongyuan Peng, Xinrun Wang, Yixin Cao
2026-01-08
Reinforcement learning (RL) offers a principled way to enhance the reasoning capabilities of large language models, yet its effectiveness hinges on training signals that remain informative as models evolve. In practice, RL progress often slows when task difficulty becomes poorly aligned with model capability, or when training is dominated by a narrow set of recurring problem patterns. To jointly address these issues, we propose SCALER (Synthetic sCalable Adaptive Learning Environment for Reasoning), a framework that sustains effective learning signals through adaptive environment design. SCALER introduces a scalable synthesis pipeline that converts real-world programming problems into verifiable reasoning environments with controllable difficulty and unbounded instance generation, enabling RL training beyond finite datasets while pre strong correctness guarantees. Building on this, SCALER further employs an adaptive multi-environment RL strategy that dynamically adjusts instance difficulty and curates the active set of environments to track the model's capability frontier and maintain distributional diversity. This co-adaptation prevents reward
, mitigates overfitting to narrow task patterns, and supports sustained improvement throughout training. Extensive experiments show that SCALER consistently outperforms dataset-based RL baselines across diverse reasoning benchmarks and exhibits more stable, long-horizon training dynamics.
AgentOCR Reimagining Agent History via Optical Self-Compression
Authors: Lang Feng, Fuchao Yang, Feng Chen, Xin Cheng, Haiyang Xu, Zhenglin Wan, Ming Yan, Bo An
2026-01-08
Recent advances in large language models (s) enable agentic systems trained with reinforcement learning (RL) over multi-turn interaction trajectories, but practical deployment is bottlenecked by rapidly growing textual histories that inflate token budgets and memory usage. We introduce AgentOCR, a framework that exploits the superior information density of visual tokens by representing the accumulated observation-action history as a compact rendered image. To make multi-turn rollouts scalable, AgentOCR proposes segment optical caching. By decomposing history into hashable segments and maintaining a visual
, this mechanism eliminates redundant re-rendering. Beyond fixed rendering, AgentOCR introduces agentic self-
, where the agent actively emits a
rate and is trained with
-aware reward to adaptively balance task success and token efficiency. We conduct extensive experiments on challenging agentic benchmarks, ALFWorld and search-based QA. Remarkably, results demonstrate that AgentOCR preserves over 95\% of text-based agent performance while substantially reducing token consumption (>50\%), yielding consistent token and memory efficiency. Our further analysis validates a 20x rendering speedup from segment optical caching and the effective strategic balancing of self-
.
ATPO Agentic Turn-based Policy Optimization via Tree Search
Authors: Zefang Zong, Dingwei Chen, Yang Li, Qi Yi, Bo Zhou, Chengming Li, Bo Qian, Peng Chen, Jie Jiang
2026-01-08
agents have emerged as powerful systems for tackling multi-turn tasks by interleaving internal reasoning and external tool interactions. Agentic Reinforcement Learning has recently drawn significant research attention as a critical post-training paradigm to further refine these capabilities. In this paper, we present ATPO (Agentic Turn-based Policy Optimization via Tree Search), a unified framework for multi-turn agentic RL that addresses three core challenges: limited exploration diversity,
credit assignment, and misaligned policy optimization. ATPO introduces a turn-level tree structure that jointly enables Entropy-Guided Tree Expansion for strategic exploration and Turn-wise Credit Assignment for fine-grained reward propagation from
outcomes. Complementing this, we propose Agentic Turn-based Policy Optimization, a turn-level learning objective that aligns policy updates with the natural decision granularity of agentic interactions. ATPO is orthogonal to tree search and can be readily integrated into any multi-turn RL pipeline. Experiments across seven benchmarks demonstrate consistent improvements over the state-of-the-art baseline by up to 1.84 percentage points in average, with ablation studies validating the effectiveness of each component. Our code is available at https://github.com/zzfoutofspace/ATPO.
When Single-Agent with Skills Replace Multi-Agent Systems and When They Fail
Authors: Xiaoxiao Li
2026-01-08
Multi-agent AI systems have proven effective for complex reasoning. These systems are compounded by specialized agents, which collaborate through explicit , but incur substantial computational overhead. A natural question arises: can we achieve similar modularity benefits with a single agent that selects from a library of skills? We explore this question by viewing skills as internalized agent behaviors. From this perspective, a multi-agent system can be compiled into an equivalent single-agent system, trading inter-agent
for skill selection. Our preliminary experiments suggest this approach can substantially reduce token usage and latency while maintaining competitive accuracy on reasoning benchmarks. However, this efficiency raises a deeper question that has received little attention: how does skill selection scale as libraries grow?
Drawing on principles from cognitive science, we propose that
skill selection exhibits bounded capacity analogous to human decision-making. We investigate the scaling behavior of skill selection and observe a striking pattern. Rather than degrading gradually, selection accuracy remains stable up to a critical library size, then drops sharply, indicating a phase transition reminiscent of capacity limits in human cognition. Furthermore, we find evidence that semantic confusability among similar skills, rather than library size alone, plays a central role in this degradation. This perspective suggests that hierarchical organization, which has long helped humans manage complex choices, may similarly benefit AI systems. Our initial results with hierarchical routing support this hypothesis. This work opens new questions about the fundamental limits of semantic-based skill selection in
s and offers a cognitive-grounded framework and practical guidelines for designing scalable skill-based agents.
GPU-Accelerated INT8 Quantization for KV Cache Compression in Large Language Models
Authors: Maanas Taneja, Purab Shingvi
2026-01-08
The key-value ()
in large language models presents a significant memory bottleneck during inference, growing linearly with sequence length and often exceeding the memory footprint of model weights themselves. We implement and evaluate GPU-accelerated INT8
for
, achieving 4 memory reduction with minimal accuracy degradation. We develop four CUDA kernel variants -- naive, tiled, coarsened, and vectorized -- and benchmark them across realistic workload sizes up to 1 billion elements. Our vectorized kernel achieves up to 1,694 speedup over CPU baselines while maintaining reconstruction error below 0.004 and attention score error below 0.1 even for 8K-dimensional heads. These results demonstrate that INT8
provides a practical approach for reducing memory pressure in
inference with negligible computational overhead (6--58ms) and minimal impact on downstream model behavior
Beyond Monolithic Architectures A Multi-Agent Search and Knowledge Optimization Framework for Agentic Search
Authors: Yiqun Chen, Lingyong Yan, Zixuan Yang, Erhan Zhang, Jiashu Zhao, Shuaiqiang Wang, Dawei Yin, Jiaxin Mao
2026-01-08
Agentic search has emerged as a promising paradigm for complex information seeking by enabling Large Language Models (s) to interleave reasoning with tool use. However, prevailing systems rely on monolithic agents that suffer from structural bottlenecks, including unconstrained reasoning outputs that inflate trajectories,
outcome-level rewards that complicate credit assignment, and stochastic search noise that destabilizes learning. To address these challenges, we propose \textbf{M-ASK} (Multi-Agent Search and Knowledge), a framework that explicitly decouples agentic search into two complementary roles: Search Behavior Agents, which plan and execute search actions, and Knowledge Management Agents, which aggregate, filter, and maintain a compact internal context. This decomposition allows each agent to focus on a well-defined subtask and reduces interference between search and context construction. Furthermore, to enable stable coordination, M-ASK employs turn-level rewards to provide granular supervision for both search decisions and knowledge updates. Experiments on multi-hop QA benchmarks demonstrate that M-ASK outperforms strong baselines, achieving not only superior answer accuracy but also significantly more stable training dynamics.\footnote{The source code for M-ASK is available at https://github.com/chenyiqun/M-ASK.}
ResMAS Resilience Optimization in LLM-based Multi-agent Systems
Authors: Zhilun Zhou, Zihan Liu, Jiahe Liu, Qingyu Shao, Yihan Wang, Kun Shao, Depeng Jin, Fengli Xu
2026-01-08
Large Language Model-based Multi-Agent Systems (-based MAS), where multiple
agents collaborate to solve complex tasks, have shown impressive performance in many areas. However, MAS are typically distributed across different devices or environments, making them vulnerable to perturbations such as agent failures. While existing works have studied the adversarial attacks and corresponding defense strategies, they mainly focus on reactively detecting and mitigating attacks after they occur rather than proactively designing inherently resilient systems. In this work, we study the resilience of
-based MAS under perturbations and find that both the
topology and prompt design significantly influence system resilience. Motivated by these findings, we propose ResMAS: a two-stage framework for enhancing MAS resilience. First, we train a reward model to predict the MAS's resilience, based on which we train a topology generator to automatically design resilient topology for specific tasks through reinforcement learning. Second, we introduce a topology-aware prompt optimization method that refines each agent's prompt based on its connections and interactions with other agents. Extensive experiments across a range of tasks show that our approach substantially improves MAS resilience under various constraints. Moreover, our framework demonstrates strong generalization ability to new tasks and models, highlighting its potential for building resilient MASs.
LAMB LLM-based Audio Captioning with Modality Gap Bridging via Cauchy-Schwarz Divergence
Authors: Hyeongkeun Lee, Jongmin Choi, KiHyun Nam, Joon Son Chung
2026-01-08
Automated Audio Captioning aims to describe the semantic content of input audio. Recent works have employed large language models (s) as a text
r to leverage their reasoning capabilities. However, prior approaches that project audio features into the
embedding space without considering cross-modal alignment fail to fully utilize these capabilities. To address this, we propose LAMB, an
-based audio captioning framework that bridges the modality gap between audio embeddings and the
text embedding space. LAMB incorporates a Cross-Modal Aligner that minimizes Cauchy-Schwarz divergence while maximizing mutual information, yielding tighter alignment between audio and text at both global and token levels. We further design a Two-Stream Adapter that extracts semantically enriched audio embeddings, thereby delivering richer information to the Cross-Modal Aligner. Finally, leveraging the aligned audio embeddings, a proposed Token Guide directly computes scores within the
text embedding space to steer the output logits of generated captions. Experimental results confirm that our framework strengthens the reasoning capabilities of the
r, achieving state-of-the-art performance on AudioCaps.
LLMs-Integrated Automatic Hate Speech Recognition Using Controllable Text Generation Models
Authors: Ryutaro Oshima, Yuya Hosoda, Youji Iiguni
2026-01-08
This paper proposes an automatic speech recognition (ASR) model for hate speech using large language models (s). The proposed method integrates the encoder of the ASR model with the
r of the
s, enabling simultaneous transcription and censorship tasks to prevent the exposure of harmful content. Instruction tuning of the
to mask hate-related words with specific tokens requires an annotated hate speech dataset, which is limited. We generate text samples using an
with the Chain-of-Thought (CoT) prompting technique guided by cultural context and examples and then convert them into speech samples using a text-to-speech (TTS) system. However, some of them contain non-hate speech samples with hate-related words, which degrades the censorship performance. This paper filters the samples which text classification models correctly label as hate content. By adjusting the threshold for the number of correct answer models, we can control the level of hate in the generated dataset, allowing us to train the
s through curriculum learning in a gradual manner. Experimental results show that the proposed method achieves a masking accuracy of 58.6\% for hate-related words, surpassing previous baselines. We also confirm that the curriculum training contributes to the efficiency of both transcription and censorship tasks.
Succeeding at Scale Automated Multi-Retriever Fusion and Query-Side Adaptation for Multi-Tenant Search
Authors: Prateek Jain, Shabari S Nair, Ritesh Goru, Prakhar Agarwal, Ajay Yadav, Yoga Sri Varshan Varadharajan, Constantine Caramanis
2026-01-08
Large-scale multi-tenant retrieval systems amass vast user query logs yet critically lack the curated relevance labels required for effective domain adaptation. This "dark data" problem is exacerbated by the operational cost of model updates: jointly fine-tuning query and document encoders requires re-indexing the entire corpus, which is prohibitive in multi-tenant environments with thousands of isolated indices. To address these dual challenges, we introduce \textbf{DevRev Search}, a passage retrieval benchmark for technical customer support constructed through a fully automatic pipeline. We employ a \textbf{fusion-based candidate generation} strategy, pooling results from diverse and dense retrievers, and utilize an
-as-a-Judge to perform rigorous \textbf{consistency filtering} and relevance assignment. We further propose a practical \textbf{Index-Pre
Adaptation} strategy: by fine-tuning only the query encoder via Low-Rank Adaptation (LoRA), we achieve competitive performance improvements while keeping the document index frozen. Our experiments on DevRev Search and SciFact demonstrate that targeting specific
layers in the query encoder yields optimal quality-efficiency trade-offs, offering a scalable path for personalized enterprise search.
All Changes May Have Invariant Principles Improving Ever-Shifting Harmful Meme Detection via Design Concept Reproduction
Authors: Ziyou Jiang, Mingyang Li, Junjie Wang, Yuekai Huang, Jie Huang, Zhiyuan Chang, Zhaoyang Li, Qing Wang
2026-01-08
Harmful memes are ever-shifting in the Internet communities, which are difficult to analyze due to their type-shifting and temporal-evolving nature. Although these memes are shifting, we find that different memes may share invariant principles, i.e., the underlying design concept of malicious users, which can help us analyze why these memes are harmful. In this paper, we propose RepMD, an ever-shifting harmful meme detection method based on the design concept reproduction. We first refer to the attack tree to define the Design Concept Graph (DCG), which describes steps that people may take to design a harmful meme. Then, we derive the DCG from historical memes with design step reproduction and graph . Finally, we use DCG to guide the Multimodal Large Language Model (M
) to detect harmful memes. The evaluation results show that RepMD achieves the highest accuracy with 81.1% and has slight accuracy decreases when generalized to type-shifting and temporal-evolving memes. Human evaluation shows that RepMD can improve the efficiency of human discovery on harmful memes, with 1530 seconds per meme.
Reasoning Over Space Enabling Geographic Reasoning for LLM-Based Generative Next POI Recommendation
Authors: Dongyi Lv, Qiuyu Ding, Heng-Da Xu, Zhaoxu Sun, Zhi Wang, Feng Xiong, Mu Xu
2026-01-08
Generative recommendation with large language models (s) reframes prediction as sequence generation, yet existing
-based recommenders remain limited in leveraging geographic signals that are crucial in mobility and local-services scenarios. Here, we present Reasoning Over Space (ROS), a framework that utilizes geography as a vital decision variable within the reasoning process. ROS introduces a Hierarchical Spatial Semantic ID (SID) that discretizes coarse-to-fine locality and POI semantics into compositional tokens, and endows
with a three-stage Mobility Chain-of-Thought (CoT) paradigm that models user personality, constructs an intent-aligned candidate space, and performs locality informed
. We further align the model with real world geography via spatial-guided Reinforcement Learning (RL). Experiments on three widely used location-based social network (LBSN) datasets show that ROS achieves over 10% relative gains in hit rate over strongest
-based baselines and improves cross-city transfer, despite using a smaller backbone model.
Discrete Fourier Transform-based Point Cloud Compression for Efficient SLAM in Featureless Terrain
Authors: Riku Suzuki, Ayumi Umemura, Shreya Santra, Kentaro Uno, Kazuya Yoshida
2026-01-08
Simultaneous Localization and Mapping (SLAM) is an essential technology for the efficiency and reliability of unmanned robotic exploration missions. While the onboard computational capability and bandwidth are critically limited, the point cloud data handled by SLAM is large in size, attracting attention to data
methods. To address such a problem, in this paper, we propose a new method for compressing point cloud maps by exploiting the Discrete Fourier Transform (DFT). The proposed technique converts the Digital Elevation Model (DEM) to the frequency-domain 2D image and omits its high-frequency components, focusing on the exploration of gradual terrains such as planets and deserts. Unlike terrains with detailed structures such as artificial environments, high-frequency components contribute little to the representation of gradual terrains. Thus, this method is effective in compressing data size without significant degradation of the point cloud. We evaluated the method in terms of
rate and accuracy using camera sequences of two terrains with different elevation profiles.
Self-MedRAG a Self-Reflective Hybrid Retrieval-Augmented Generation Framework for Reliable Medical Question Answering
Authors: Jessica Ryan, Alexander I. Gumilang, Robert Wiliam, Derwin Suhartono
2026-01-08
Large Language Models (s) have demonstrated significant potential in medical Question Answering (QA), yet they remain prone to hallucinations and ungrounded reasoning, limiting their reliability in high-stakes clinical scenarios. While Retrieval-Augmented Generation (RAG) mitigates these issues by incorporating external knowledge, conventional single-shot retrieval often fails to resolve complex biomedical queries requiring multi-step inference. To address this, we propose Self-MedRAG, a self-reflective hybrid framework designed to mimic the iterative hypothesis-verification process of clinical reasoning. Self-MedRAG integrates a hybrid retrieval strategy, combining
(BM25) and dense (Contriever) retrievers via Reciprocal Rank Fusion (RRF) to maximize evidence coverage. It employs a generator to produce answers with supporting rationales, which are then assessed by a lightweight self-reflection module using Natural Language Inference (NLI) or
-based verification. If the rationale lacks sufficient evidentiary support, the system autonomously reformulates the query and iterates to refine the context. We evaluated Self-MedRAG on the MedQA and PubMedQA benchmarks. The results demonstrate that our hybrid retrieval approach significantly outperforms single-retriever baselines. Furthermore, the inclusion of the self-reflective loop yielded substantial gains, increasing accuracy on MedQA from 80.00% to 83.33% and on PubMedQA from 69.10% to 79.82%. These findings confirm that integrating hybrid retrieval with iterative, evidence-based self-reflection effectively reduces unsupported claims and enhances the clinical reliability of
-based systems.
LinguaGame A Linguistically Grounded Game-Theoretic Paradigm for Multi-Agent Dialogue Generation
Authors: Yuxiao Ye, Yiming Zhang, Yiran Ma, Huiyuan Xie, Huining Zhu, Zhiyuan Liu
2026-01-08
Large Language Models (s) have enabled Multi-Agent Systems (MASs) where agents interact through natural language to solve complex tasks or simulate multi-party dialogues. Recent work on
-based MASs has mainly focused on architecture design, such as role assignment and workflow orchestration. In contrast, this paper targets the interaction process itself, aiming to improve agents'
efficiency by helping them convey their intended meaning more effectively through language. To this end, we propose LinguaGame, a linguistically-grounded game-theoretic paradigm for multi-agent dialogue generation. Our approach models dialogue as a signalling game over communicative intents and strategies, solved with a training-free equilibrium approximation algorithm for inference-time decision adjustment. Unlike prior game-theoretic MASs, whose game designs are often tightly coupled with task-specific objectives, our framework relies on linguistically informed reasoning with minimal task-specific coupling. Specifically, it treats dialogue as intentional and strategic
, requiring agents to infer what others aim to achieve (intents) and how they pursue those goals (strategies). We evaluate our framework in simulated courtroom proceedings and debates, with human expert assessments showing significant gains in
efficiency.
Invisible Walls Privacy-Preserving ISAC Empowered by Reconfigurable Intelligent Surfaces
Authors: Yinghui He, Long Fan, Lei Xie, Dusit Niyato, Chau Yuen, Jun Luo
2026-01-08
The environmental and target-related information inherently carried in wireless signals, such as channel state information (CSI), has brought increasing attention to integrated sensing and (ISAC). However, it also raises pressing concerns about privacy leakage through eavesdropping. While existing efforts have attempted to mitigate this issue, they either fail to account for the needs of legitimate
and sensing users or rely on hardware with high complexity and cost. To overcome these limitations, we propose PrivISAC, a plug-and-play, low-cost solution that leverages RIS to protect user privacy while pre
ISAC performance. At the core of PrivISAC is a novel strategy in which each RIS row is assigned two distinct beamforming vectors, from which we deliberately construct a limited set of RIS configurations. During operation, exactly one configuration is randomly activated at each time slot to introduce additional perturbations, effectively masking sensitive sensing information from unauthorized eavesdroppers. To jointly ensure privacy protection and
performance, we design the two vectors such that their responses remain nearly identical in the
direction, thereby pre
stable, high-throughput transmission, while exhibiting pronounced differences in the sensing direction, which introduces sufficient perturbations to thwart eavesdroppers. Additionally, to enable legitimate sensing under such randomized configurations, we introduce a time-domain masking and demasking method that allows the authorized receiver to associate each CSI sample with its underlying configuration and eliminate configuration-induced discrepancies, thereby recovering valid CSI. We implement PrivISAC on commodity wireless devices and experiment results show that PrivISAC provides strong privacy protection while pre
high-quality legitimate ISAC.
Convergence Rates for Learning Pseudo-Differential Operators
Authors: Jiaheng Chen, Daniel Sanz-Alonso
2026-01-08
This paper establishes convergence rates for learning elliptic pseudo-differential operators, a fundamental operator class in partial differential equations and mathematical physics. In a wavelet-Galerkin framework, we formulate learning over this class as a structured infinite-dimensional regression problem with multiscale . Building on this structure, we propose a
, data- and computation-efficient estimator, which leverages a novel matrix
scheme tailored to the learning task and a nested-support strategy to balance approximation and estimation errors. In addition to obtaining convergence rates for the estimator, we show that the learned operator induces an efficient and stable Galerkin solver whose numerical error matches its statistical accuracy. Our results therefore contribute to bringing together operator learning, data-driven solvers, and wavelet methods in scientific computing.
Re-Rankers as Relevance Judges
Authors: Chuan Meng, Jiqun Liu, Mohammad Aliannejadi, Fengran Mo, Jeff Dalton, Maarten de Rijke
2026-01-08
Using large language models (s) to predict relevance judgments has shown promising results. Most studies treat this task as a distinct research line, e.g., focusing on prompt design for predicting relevance labels given a query and passage. However, predicting relevance judgments is essentially a form of relevance prediction, a problem extensively studied in tasks such as re-ranking. Despite this potential
, little research has explored reusing or adapting established re-ranking methods to predict relevance judgments, leading to potential resource waste and redundant development. To bridge this gap, we reproduce re-rankers in a re-ranker-as-relevance-judge setup. We design two adaptation strategies: (i) using binary tokens (e.g., "true" and "false") generated by a re-ranker as direct judgments, and (ii) converting continuous re-ranking scores into binary labels via thresholding. We perform extensive experiments on TREC-DL 2019 to 2023 with 8 re-rankers from 3 families, ranging from 220M to 32B, and analyse the evaluation bias exhibited by re-ranker-based judges. Results show that re-ranker-based relevance judges, under both strategies, can outperform UMBRELA, a state-of-the-art
-based relevance judge, in around 40% to 50% of the cases; they also exhibit strong self-preference towards their own and same-family re-rankers, as well as cross-family bias.
SpectraFormer an Attention-Based Raman Unmixing Tool for Accessing the Graphene Buffer-Layer Signature on SiC
Authors: Dmitriy Poteryayev, Pietro Novelli, Annalisa Coriolano, Riccardo Dettori, Valentina Tozzini, Fabio Beltram, Massimiliano Pontil, Antonio Rossi, Stiven Forti, Camilla Coletti
2026-01-07
Raman spectroscopy is a key tool for graphene characterization, yet its application to graphene grown on silicon carbide (SiC) is strongly limited by the intense and variable second-order Raman response of the substrate. This limitation is critical for buffer layer graphene, a semiconducting interfacial phase, whose vibrational signatures are ped with the SiC background and challenging to be reliably accessed using conventional reference-based subtraction, due to strong spatial and experimental variability of the substrate signal. Here we present SpectraFormer, a
-based deep learning model that reconstructs the SiC Raman substrate contribution directly from post-growth partially masked spectroscopic data without relying on explicit reference measurements. By learning global correlations across the entire Raman shift range, the model captures the statistical structure of the SiC background and enables accurate reconstruction of its contribution in mixed spectra. Subtraction of the reconstructed substrate signal reveals weak vibrational features associated with ZLG that are inaccessible through conventional analysis methods. The extracted spectra are validated by ab initio vibrational calculations, allowing assignment of the resolved features to specific modes and confirming their physical consistency. By leveraging a state-of-the-art attention-based deep learning architecture, this approach establishes a robust, reference-free framework for Raman analysis of graphene on SiC and provides a foundation, compatible with real-time data acquisition, to its integration into automated, closed-loop AI-assisted growth optimization.
Rate or Fate? RLVR Reinforcement Learning with Verifiable Noisy Rewards
Authors: Ali Rad, Khashayar Filom, Darioush Keivan, Peyman Mohajerin Esfahani, Ehsan Kamalinejad
2026-01-07
Reinforcement learning with verifiable rewards (RLVR) is a simple but powerful paradigm for training s: sample a completion, verify it, and update. In practice, however, the verifier is almost never clean--unit tests probe only limited corner cases; human and synthetic labels are imperfect; and
judges (e.g., RLAIF) are noisy and can be exploited--and this problem worsens on harder domains (especially coding) where tests are
and increasingly model-generated. We ask a pragmatic question: does the verification noise merely slow down the learning (rate), or can it flip the outcome (fate)?
To address this, we develop an analytically tractable multi-armed bandit view of RLVR dynamics, instantiated with GRPO and validated in controlled experiments. Modeling false positives and false negatives and grouping completions into recurring reasoning modes yields a replicator-style (natural-selection) flow on the probability simplex. The dynamics decouples into within-correct-mode competition and a one-dimensional evolution for the mass on incorrect modes, whose drift is determined solely by Youden's index J=TPR-FPR. This yields a sharp phase transition: when J>0, the incorrect mass is driven toward extinction (learning); when J=0, the process is neutral; and when J<0, incorrect modes amplify until they dominate (anti-learning and collapse). In the learning regime J>0, noise primarily rescales convergence time ("rate, not fate"). Experiments on verifiable programming tasks under synthetic noise reproduce the predicted J=0 boundary. Beyond noise, the framework offers a general lens for analyzing RLVR stability, convergence, and algorithmic interventions.
LLM-Guided Lifecycle-Aware Clustering of Multi-Turn Customer Support Conversations
Authors: Priyaranjan Pattnayak, Sanchari Chowdhuri, Amit Agarwal, Hitesh Laxmichand Patel
2026-01-07
Clustering customer chat data is vital for cloud providers handling multi service queries. Traditional methods struggle with ping concerns and create broad, static clusters that degrade over time. Reclustering disrupts continuity, making issue tracking difficult. We propose an adaptive system that segments multi turn chats into service specific concerns and incrementally refines clusters as new issues arise. Cluster quality is tracked via DaviesBouldin Index and Silhouette Scores, with
based splitting applied only to degraded clusters. Our method improves Silhouette Scores by over 100\% and reduces DBI by 65.6\% compared to baselines, enabling scalable, real time analytics without full reclustering.
Phasor Agents Oscillatory Graphs with Three-Factor Plasticity and Sleep-Staged Learning
Authors: Rodja Trappe
2026-01-07
Phasor Agents are dynamical systems whose internal state is a Phasor Graph: a weighted graph of coupled Stuart-Landau oscillators. A Stuart-Landau oscillator is a minimal stable "rhythm generator" (the normal form near a Hopf bifurcation); each oscillator is treated as an abstract computational unit (inspired by, but not claiming to model, biological oscillatory populations). In this interpretation, oscillator phase tracks relative timing (coherence), while amplitude tracks local gain or activity. Relative phase structure serves as a representational medium; coupling weights are learned via three-factor local plasticity - eligibility traces gated by global modulators and oscillation-timed write windows - without backpropagation.
A central challenge in oscillatory substrates is stability: online weight updates can drive the network into unwanted regimes (e.g., global synchrony), collapsing representational diversity. We therefore separate wake tagging from offline consolidation, inspired by synaptic tagging-and-capture and sleep-stage dynamics: deep-sleep-like gated capture commits tagged changes safely, while REM-like replay reconstructs and perturbs experience for planning.
A staged experiment suite validates each mechanism with ablations and falsifiers: eligibility traces preserve credit under delayed modulation;
-progress signals pass timestamp-shuffle controls; phase-coherent retrieval reaches 4x diffusive baselines under noise; wake/sleep separation expands stable learning by 67 percent under matched weight-norm budgets; REM replay improves maze success rate by +45.5 percentage points; and a Tolman-style latent-learning signature - immediate competence and detour advantage after unrewarded exploration, consistent with an internal model - emerges from replay (Tolman, 1948).
The codebase and all artifacts are open-source.
PackCache A Training-Free Acceleration Method for Unified Autoregressive Video Generation via Compact KV-Cache
Authors: Kunyang Li, Mubarak Shah, Yuzhang Shang
2026-01-07
A unified autoregressive model is a Transformer-based framework that addresses diverse multimodal tasks (e.g., text, image, video) as a single sequence modeling problem under a shared token space. Such models rely on the -
mechanism to reduce attention computation from O(T^2) to O(T); however,
-
size grows linearly with the number of generated tokens, and it rapidly becomes the dominant bottleneck limiting inference efficiency and generative length. Unified autoregressive video generation inherits this limitation. Our analysis reveals that
-
tokens exhibit distinct spatiotemporal properties: (i) text and conditioning-image tokens act as persistent semantic anchors that consistently receive high attention, and (ii) attention to previous frames naturally decays with temporal distance. Leveraging these observations, we introduce PackCache, a training-free
-
management method that dynamically compacts the
through three coordinated mechanisms: condition anchoring that preserves semantic references, cross-frame decay modeling that allocates
budget according to temporal distance, and spatially pre
position embedding that maintains coherent 3D structure under
removal. In terms of efficiency, PackCache accelerates end-to-end generation by 1.7-2.2x on 48-frame long sequences, showcasing its strong potential for enabling longer-sequence video generation. Notably, the final four frames - the portion most impacted by the progressively expanding
-
and thus the most expensive segment of the clip - PackCache delivers a 2.6x and 3.7x
on A40 and H200, respectively, for 48-frame videos.
RIGOURATE Quantifying Scientific Exaggeration with Evidence-Aligned Claim Evaluation
Authors: Joseph James, Chenghao Xiao, Yucheng Li, Nafise Sadat Moosavi, Chenghua Lin
2026-01-07
Scientific rigour tends to be sidelined in favour of bold statements, leading authors to overstate claims beyond what their results support. We present RIGOURATE, a two-stage multimodal framework that retrieves supporting evidence from a paper's body and assigns each claim an overstatement score. The framework consists of a dataset of over 10K claim-evidence sets from ICLR and NeurIPS papers, annotated using eight s, with overstatement scores calibrated using peer-review comments and validated through human evaluation. It employes a fine-tuned reranker for evidence retrieval and a fine-tuned model to predict overstatement scores with justification. Compared to strong baselines, RIGOURATE enables improved evidence retrieval and overstatement detection. Overall, our work operationalises evidential proportionality and supports clearer, more transparent scientific
.
SCAR-GS Spatial Context Attention for Residuals in Progressive Gaussian Splatting
Authors: Diego Revilla, Pooja Suresh, Anand Bhojan, Ooi Wei Tsang
2026-01-07
Recent advances in 3D Gaussian Splatting have allowed for real-time, high-fidelity novel view synthesis. Nonetheless, these models have significant storage requirements for large and medium-sized scenes, hindering their deployment over cloud and streaming services. Some of the most recent progressive techniques for these models rely on progressive masking and scalar
techniques to reduce the bitrate of Gaussian attributes using spatial context models. While effective, scalar
may not optimally capture the correlations of high-dimensional feature vectors, which can potentially limit the rate-distortion performance.
In this work, we introduce a novel progressive codec for 3D Gaussian Splatting that replaces traditional methods with a more powerful Residual Vector Quantization approach to compress the primitive features. Our key contribution is an auto-regressive entropy model, guided by a multi-resolution hash grid, that accurately predicts the conditional probability of each successive transmitted index, allowing for coarse and refinement layers to be compressed with high efficiency.
Stable Language Guidance for Vision-Language-Action Models
Authors: Zhihao Zhan, Yuhao Chen, Jiaying Zhou, Qinhan Lv, Hao Liu, Keze Wang, Liang Lin, Guangrun Wang
2026-01-07
Vision-Language-Action (VLA) models have demonstrated impressive capabilities in generalized robotic control; however, they remain notoriously brittle to linguistic perturbations. We identify a critical ``modality collapse'' phenomenon where strong visual priors overwhelm linguistic signals, causing agents to overfit to specific instruction phrasings while ignoring the underlying semantic intent. To address this, we propose \textbf{Residual Semantic Steering (RSS)}, a probabilistic framework that disentangles physical affordance from semantic execution. RSS introduces two theoretical innovations: (1) \textbf{Monte Carlo Syntactic Integration}, which approximates the true semantic posterior via dense,
-driven distributional expansion, and (2) \textbf{Residual Affordance Steering}, a dual-stream
mechanism that explicitly isolates the causal influence of language by subtracting the visual affordance prior. Theoretical analysis suggests that RSS effectively maximizes the mutual information between action and intent while suppressing visual distractors. Empirical results across diverse manipulation benchmarks demonstrate that RSS achieves state-of-the-art robustness, maintaining performance even under adversarial linguistic perturbations.
ResTok Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation
Authors: Xu Zhang, Cheng Da, Huan Yang, Kun Gai, Ming Lu, Zhan Ma
2026-01-07
Existing 1D visual tokenizers for autoregressive (AR) generation largely follow the design principles of language modeling, as they are built directly upon s whose priors originate in language, yielding single-hierarchy latent tokens and treating visual data as flat sequential token streams. However, this language-like formulation overlooks key properties of vision, particularly the hierarchical and residual network designs that have long been essential for convergence and efficiency in visual models. To bring "vision" back to vision, we propose the Residual Tokenizer (ResTok), a 1D visual tokenizer that builds hierarchical residuals for both image tokens and latent tokens. The hierarchical representations obtained through progressively merging enable cross-level feature fusion at each layer, substantially enhancing representational capacity. Meanwhile, the semantic residuals between hierarchies prevent information
, yielding more concentrated latent distributions that are easier for AR modeling. Cross-level bindings consequently emerge without any explicit constraints. To accelerate the generation process, we further introduce a hierarchical AR generator that substantially reduces sampling steps by predicting an entire level of latent tokens at once rather than generating them strictly token-by-token. Extensive experiments demonstrate that restoring hierarchical residual priors in visual tokenization significantly improves AR image generation, achieving a gFID of 2.34 on ImageNet-256 with only 9 sampling steps. Code is available at https://github.com/Kwai-Kolors/ResTok.
FocusUI Efficient UI Grounding via Position-Preserving Visual Token Selection
Authors: Mingyu Ouyang, Kevin Qinghong Lin, Mike Zheng Shou, Hwee Tou Ng
2026-01-07
Vision-Language Models (VLMs) have shown remarkable performance in User Interface (UI) grounding tasks, driven by their ability to process increasingly high-resolution screenshots. However, screenshots are tokenized into thousands of visual tokens (e.g., about 4700 for 2K resolution), incurring significant computational overhead and diluting attention. In contrast, humans typically focus on regions of interest when interacting with UI. In this work, we pioneer the task of efficient UI grounding. Guided by practical analysis of the task's characteristics and challenges, we propose FocusUI, an efficient UI grounding framework that selects patches most relevant to the instruction while pre positional continuity for precise grounding. FocusUI addresses two key challenges: (1) Eliminating redundant tokens in visual encoding. We construct patch-level supervision by fusing an instruction-conditioned score with a rule-based UI-graph score that down-weights large homogeneous regions to select distinct and instruction-relevant visual tokens. (2) Pre
positional continuity during visual token selection. We find that general visual token
methods suffer from severe accuracy degradation on UI grounding tasks due to broken positional information. We introduce a novel PosPad strategy, which compresses each contiguous sequence of dropped visual tokens into a single special marker placed at the sequence's last index to preserve positional continuity. Comprehensive experiments on four grounding benchmarks demonstrate that FocusUI surpasses GUI-specific baselines. On the ScreenSpot-Pro benchmark, FocusUI-7B achieves a performance improvement of 3.7% over GUI-Actor-7B. Even with only 30% visual token retention, FocusUI-7B drops by only 3.2% while achieving up to 1.44x faster inference and 17% lower peak GPU memory.
Decide Then Retrieve A Training-Free Framework with Uncertainty-Guided Triggering and Dual-Path Retrieval
Authors: Wang Chen, Guanqiang Qi, Weikang Li, Yang Li, Deguo Xia, Jizhou Huang
2026-01-07
Retrieval-augmented generation (RAG) enhances large language models (s) by incorporating external knowledge, but existing approaches indiscriminately trigger retrieval and rely on single-path evidence construction, often introducing noise and limiting performance gains. In this work, we propose Decide Then Retrieve (DTR), a training-free framework that adaptively determines when retrieval is necessary and how external information should be selected. DTR leverages generation uncertainty to guide retrieval triggering and introduces a dual-path retrieval mechanism with adaptive information selection to better handle
and ambiguous queries. Extensive experiments across five open-domain QA benchmarks, multiple model scales, and different retrievers demonstrate that DTR consistently improves EM and F1 over standard RAG and strong retrieval-enhanced baselines, while reducing unnecessary retrievals. The code and data used in this paper are available at https://github.com/ChenWangHKU/DTR.
A Privacy-Preserving Localization Scheme with Node Selection in Mobile Networks
Authors: Liangbo Xie, Mude Cai, Xiaolong Yang, Mu Zhou, Jiacheng Wang, Dusit Niyato
2026-01-07
Localization in mobile networks has been widely applied in many scenarios. However, an entity responsible for location estimation exposes both the target and anchors to potential location leakage at any time, creating serious security risks. Although existing studies have proposed privacy-pre localization algorithms, they still face challenges of insufficient positioning accuracy and excessive
overhead. In this article, we propose a privacy-pre
localization scheme, named PPLZN. PPLZN protects protects the location privacy of both the target and anchor nodes in crowdsourced localization. Simulation results validate the effectiveness of PPLZN. Evidently, it can achieve accurate position estimation without location leakage and outperform state-of-the-art approaches in both positioning accuracy and
overhead. In addition, PPLZN significantly reduces computational and
overhead in large-scale deployments, making it well-fitted for practical privacy-pre
localization in resource-constrained networks.
Feature-Aware One-Shot Federated Learning via Hierarchical Token Sequences
Authors: Shudong Liu, Hanwen Zhang, Xiuling Wang, Yuesheng Zhu, Guibo Luo
2026-01-07
One-shot federated learning (OSFL) reduces the cost and privacy risks of iterative federated learning by constructing a global model with a single round of
. However, most existing methods struggle to achieve robust performance on real-world domains such as medical imaging, or are inefficient when handling non-IID (Independent and Identically Distributed) data. To address these limitations, we introduce FALCON, a framework that enhances the effectiveness of OSFL over non-IID image data. The core idea of FALCON is to leverage the feature-aware hierarchical token sequences generation and knowledge distillation into OSFL. First, each client leverages a pretrained visual encoder with hierarchical scale encoding to compress images into hierarchical token sequences, which capture multi-scale semantics. Second, a multi-scale autoregressive
generator is used to model the distribution of these token sequences and generate the synthetic sequences. Third, clients upload the synthetic sequences along with the local classifier trained on the real token sequences to the server. Finally, the server incorporates knowledge distillation into global training to reduce reliance on precise distribution modeling. Experiments on medical and natural image datasets validate the effectiveness of FALCON in diverse non-IID scenarios, outperforming the best OSFL baselines by 9.58% in average accuracy.
Evaluating Small Decoder-Only Language Models for Grammar Correction and Text Simplification
Authors: Anthony Lamelas
2026-01-07
Large language models have become extremely popular recently due to their ability to achieve strong performance on a variety of tasks, such as text generation and rewriting, but their size and computation cost make them difficult to access, deploy, and secure in many settings. This paper investigates whether small, r-only language models can provide an efficient alternative for the tasks of grammar correction and text simplification. The experiments in this paper focus on testing small language models out of the box, fine-tuned, and run sequentially on the JFLEG and ASSET datasets using established metrics. The results show that while SLMs may learn certain behaviors well, their performance remains below strong baselines and current
s. The results also show that SLMs struggle with retaining meaning and hallucinations. These findings suggest that despite their efficiency advantages, current SLMs are not yet competitive enough with modern
s for rewriting, and further advances in training are required for SLMs to close the performance gap between them and today's
s.
When Numbers Start Talking Implicit Numerical Coordination Among LLM-Based Agents
Authors: Alessio Buscemi, Daniele Proverbio, Alessandro Di Stefano, The Anh Han, German Castignani, Pietro Liò
2026-01-07
s-based agents increasingly operate in multi-agent environments where strategic interaction and coordination are required. While existing work has largely focused on individual agents or on interacting agents sharing explicit
, less is known about how interacting agents coordinate implicitly. In particular, agents may engage in covert
, relying on indirect or non-linguistic signals embedded in their actions rather than on explicit messages. This paper presents a game-theoretic study of covert
in
-driven multi-agent systems. We analyse interactions across four canonical game-theoretic settings under different
regimes, including explicit, restricted, and absent
. Considering heterogeneous agent personalities and both one-shot and repeated games, we characterise when covert signals emerge and how they shape coordination and strategic outcomes.
Where meaning lives Layer-wise accessibility of psycholinguistic features in encoder and decoder language models
Authors: Taisiia Tikhomirova, Dirk U. Wulff
2026-01-07
Understanding where language models encode psychologically meaningful aspects of meaning is essential for both theory and practice. We conduct a systematic layer-wise probing study of 58 psycholinguistic features across 10
models, spanning encoder-only and
r-only architectures, and compare three embedding extraction methods. We find that apparent localization of meaning is strongly method-dependent: contextualized embeddings yield higher feature-specific selectivity and different layer-wise profiles than isolated embeddings. Across models and methods, final-layer representations are rarely optimal for recovering psycholinguistic information with linear probes. Despite these differences, models exhibit a shared depth ordering of meaning dimensions, with lexical properties peaking earlier and experiential and affective dimensions peaking later. Together, these results show that where meaning "lives" in
models reflects an interaction between methodological choices and architectural constraints.
Do LLMs Really Memorize Personally Identifiable Information? Revisiting PII Leakage with a Cue-Controlled Memorization Framework
Authors: Xiaoyu Luo, Yiyi Chen, Qiongxiu Li, Johannes Bjerva
2026-01-07
Large Language Models (s) have been reported to "leak" Personally Identifiable Information (PII), with successful PII reconstruction often interpreted as evidence of memorization. We propose a principled revision of memorization evaluation for
s, arguing that PII leakage should be evaluated under low lexical cue conditions, where target PII cannot be reconstructed through prompt-induced generalization or pattern completion. We formalize Cue-Resistant Memorization (CRM) as a cue-controlled evaluation framework and a necessary condition for valid memorization evaluation, explicitly conditioning on prompt-target
cues. Using CRM, we conduct a large-scale multilingual re-evaluation of PII leakage across 32 languages and multiple memorization paradigms. Revisiting reconstruction-based settings, including verbatim prefix-suffix completion and associative reconstruction, we find that their apparent effectiveness is driven primarily by direct surface-form cues rather than by true memorization. When such cues are controlled for, reconstruction success diminishes substantially. We further examine cue-free generation and membership inference, both of which exhibit extremely low true positive rates. Overall, our results suggest that previously reported PII leakage is better explained by cue-driven behavior than by genuine memorization, highlighting the importance of cue-controlled evaluation for reliably quantifying privacy-relevant memorization in
s.
Improving Compactness and Reducing Ambiguity of CFIRE Rule-Based Explanations
Authors: Sebastian Müller, Tobias Schneider, Ruben Kemna, Vanessa Toborek
2026-01-07
Models trained on tabular data are widely used in sensitive domains, increasing the demand for explanation methods to meet transparency needs. CFIRE is a recent algorithm in this domain that constructs compact surrogate rule models from local explanations. While effective, CFIRE may assign rules associated with different classes to the same sample, introducing ambiguity. We investigate this ambiguity and propose a post-hoc strategy that removes rules with low contribution or conflicting coverage, yielding smaller and less ambiguous models while pre
fidelity. Experiments across multiple datasets confirm these improvements with minimal impact on predictive performance.
EDCO Dynamic Curriculum Orchestration for Domain-specific Large Language Model Fine-tuning
Authors: Jing-Cheng Pang, Liu Sun, Chang Zhou, Xian Tang, Haichuan Ma, Kun Jiang, Jianlong Wang, Kai Zhang, Sijie Wu, Haoran Cai, Chenwei Wu, Xubin Li, Xin Chen
2026-01-07
Domain-specific large language models (s), typically developed by fine-tuning a pre-trained general-purpose
on specialized datasets, represent a significant advancement in applied AI. A common strategy in
fine-tuning is curriculum learning, which pre-orders training samples based on metrics like difficulty to improve learning efficiency compared to a random sampling strategy. However, most existing methods for
fine-tuning rely on a static curriculum, designed prior to training, which lacks adaptability to the model's evolving needs during fine-tuning. To address this, we propose EDCO, a novel framework based on two key concepts: inference entropy and dynamic curriculum orchestration. Inspired by recent findings that maintaining high answer entropy benefits long-term reasoning gains, EDCO prioritizes samples with high inference entropy in a continuously adapted curriculum. EDCO integrates three core components: an efficient entropy estimator that uses prefix tokens to approximate full-sequence entropy, an entropy-based curriculum generator that selects data points with the highest inference entropy, and an
trainer that optimizes the model on the selected curriculum. Comprehensive experiments in
, medicine and law domains, EDCO outperforms traditional curriculum strategies for fine-tuning Qwen3-4B and Llama3.2-3B models under supervised and reinforcement learning settings. Furthermore, the proposed efficient entropy estimation reduces computational time by 83.5% while maintaining high accuracy.
Visual Merit or Linguistic Crutch? A Close Look at DeepSeek-OCR
Authors: Yunhao Liang, Ruixuan Ying, Bo Li, Hong Li, Kai Yan, Qingwen Li, Min Yang, Okamoto Satoshi, Zhe Cui, Shiwen Ni
2026-01-07
DeepSeek-OCR utilizes an optical 2D mapping approach to achieve high-ratio vision-text , claiming to
text tokens exceeding ten times the input visual tokens. While this suggests a promising solution for the
long-context bottleneck, we investigate a critical question: "Visual merit or linguistic crutch - which drives DeepSeek-OCR's performance?" By employing sentence-level and word-level semantic corruption, we isolate the model's intrinsic OCR capabilities from its language priors. Results demonstrate that without linguistic support, DeepSeek-OCR's performance plummets from approximately 90% to 20%. Comparative benchmarking against 13 baseline models reveals that traditional pipeline OCR methods exhibit significantly higher robustness to such semantic perturbations than end-to-end methods. Furthermore, we find that lower visual token counts correlate with increased reliance on priors, exacerbating hallucination risks. Context stress testing also reveals a total model collapse around 10,000 text tokens, suggesting that current optical
techniques may paradoxically aggravate the long-context bottleneck. This study empirically defines DeepSeek-OCR's capability boundaries and offers essential insights for future optimizations of the vision-text
paradigm. We release all data, results and scripts used in this study at https://github.com/dududuck00/DeepSeekOCR.
How Does the Thinking Step Influence Model Safety? An Entropy-based Safety Reminder for LRMs
Authors: Su-Hyeon Kim, Hyundong Jin, Yejin Lee, Yo-Sub Han
2026-01-07
Large Reasoning Models (LRMs) achieve remarkable success through explicit thinking steps, yet the thinking steps introduce a novel risk by potentially amplifying unsafe behaviors. Despite this vulnerability, conventional defense mechanisms remain ineffective as they overlook the unique reasoning dynamics of LRMs. In this work, we find that the emergence of safe-reminding phrases within thinking steps plays a pivotal role in ensuring LRM safety. Motivated by this finding, we propose SafeRemind, a -time defense method that dynamically injects safe-reminding phrases into thinking steps. By leveraging entropy triggers to intervene at decision-locking points, SafeRemind redirects potentially harmful trajectories toward safer outcomes without requiring any parameter updates. Extensive evaluations across five LRMs and six benchmarks demonstrate that SafeRemind substantially enhances safety, achieving improvements of up to 45.5%p while pre
core reasoning utility.
LLM-MC-Affect LLM-Based Monte Carlo Modeling of Affective Trajectories and Latent Ambiguity for Interpersonal Dynamic Insight
Authors: Yu-Zheng Lin, Bono Po-Jen Shih, John Paul Martin Encinas, Elizabeth Victoria Abraham Achom, Karan Himanshu Patel, Jesus Horacio Pacheco, Sicong Shao, Jyotikrishna Dass, Soheil Salehi, Pratik Satam
2026-01-07
Emotional coordination is a core property of human interaction that shapes how relational meaning is constructed in real time. While text-based affect inference has become increasingly feasible, prior approaches often treat sentiment as a deterministic point estimate for individual speakers, failing to capture the inherent subjectivity, latent ambiguity, and sequential coupling found in mutual exchanges. We introduce -MC-Affect, a probabilistic framework that characterizes emotion not as a static label, but as a continuous latent probability distribution defined over an affective space. By leveraging stochastic
and Monte Carlo estimation, the methodology approximates these distributions to derive high-fidelity sentiment trajectories that explicitly quantify both central affective tendencies and perceptual ambiguity. These trajectories enable a structured analysis of interpersonal coupling through sequential cross-correlation and slope-based indicators, identifying leading or lagging influences between interlocutors. To validate the interpretive capacity of this approach, we utilize teacher-student instructional dialogues as a representative case study, where our quantitative indicators successfully distill high-level interaction insights such as effective scaffolding. This work establishes a scalable and deployable pathway for understanding interpersonal dynamics, offering a generalizable solution that extends beyond education to broader social and behavioral research.
Architecting Agentic Communities using Design Patterns
Authors: Zoran Milosevic, Fethi Rabhi
2026-01-07
The rapid evolution of Large Language Models () and subsequent Agentic AI technologies requires systematic architectural guidance for building sophisticated, production-grade systems. This paper presents an approach for architecting such systems using design patterns derived from enterprise distributed systems standards, formal methods, and industry practice. We classify these patterns into three tiers:
Agents (task-specific automation), Agentic AI (adaptive goal-seekers), and Agentic Communities (organizational frameworks where AI agents and human participants coordinate through formal roles, protocols, and governance structures). We focus on Agentic Communities - coordination frameworks encompassing
Agents, Agentic AI entities, and humans - most relevant for enterprise and industrial applications. Drawing on established coordination principles from distributed systems, we ground these patterns in a formal framework that specifies collaboration agreements where AI agents and humans fill roles within governed ecosystems. This approach provides both practical guidance and formal verification capabilities, enabling expression of organizational, legal, and ethical rules through accountability mechanisms that ensure operational and verifiable governance of inter-agent
, negotiation, and intent modeling. We validate this framework through a clinical trial matching case study. Our goal is to provide actionable guidance to practitioners while maintaining the formal rigor essential for enterprise deployment in dynamic, multi-agent ecosystems.
Safety-Utility Conflicts Are Not Global Surgical Alignment via Head-Level Diagnosis
Authors: Wang Cai, Yilin Wen, Jinchang Hou, Du Su, Guoqiu Wang, Zhonghou Lv, Chenfu Bao, Yunfang Wu
2026-01-07
Safety alignment in Large Language Models (s) inherently presents a multi-objective optimization conflict, often accompanied by an unintended degradation of general capabilities. Existing mitigation strategies typically rely on global gradient geometry to resolve these conflicts, yet they overlook Modular Heterogeneity within Transformers, specifically that the functional sensitivity and degree of conflict vary substantially across different attention heads. Such global approaches impose uniform update rules across all parameters, often resulting in suboptimal trade-offs by indiscriminately updating utility sensitive heads that exhibit intense gradient conflicts. To address this limitation, we propose Conflict-Aware Sparse Tuning (CAST), a framework that integrates head-level diagnosis with
fine-tuning. CAST first constructs a pre-alignment conflict map by synthesizing Optimization Conflict and Functional Sensitivity, which then guides the selective update of parameters. Experiments reveal that alignment conflicts in
s are not uniformly distributed. We find that the drop in general capabilities mainly comes from updating a small group of ``high-conflict'' heads. By simply skipping these heads during training, we significantly reduce this loss without compromising safety, offering an interpretable and parameter-efficient approach to improving the safety-utility trade-off.
Inhibitory Attacks on Backdoor-based Fingerprinting for Large Language Models
Authors: Hang Fu, Wanli Peng, Yinghan Zhou, Jiaxuan Wu, Juan Wen, Yiming Xue
2026-01-07
The widespread adoption of Large Language Model () in commercial and research settings has intensified the need for robust intellectual property protection. Backdoor-based
fingerprinting has emerged as a promising solution for this challenge. In practical application, the low-cost multi-model collaborative technique,
ensemble, combines diverse
s to leverage their complementary strengths, garnering significant attention and practical adoption. Unfortunately, the vulnerability of existing
fingerprinting for the ensemble scenario is unexplored. In order to comprehensively assess the robustness of
fingerprinting, in this paper, we propose two novel fingerprinting attack methods: token filter attack (TFA) and sentence verification attack (SVA). The TFA gets the next token from a unified set of tokens created by the token filter mechanism at each
step. The SVA filters out fingerprint responses through a sentence verification mechanism based on perplexity and voting. Experimentally, the proposed methods effectively inhibit the fingerprint response while maintaining ensemble performance. Compared with state-of-the-art attack methods, the proposed method can achieve better performance. The findings necessitate enhanced robustness in
fingerprinting.
PhysicsFormer An Efficient and Fast Attention-Based Physics Informed Neural Network for Solving Incompressible Navier Stokes Equations
Authors: Biswanath Barman, Debdeep Chatterjee, Rajendra K. Ray
2026-01-07
Traditional experimental and numerical approaches for fluid dynamics problems often suffer from high computational cost, mesh sensitivity, and limited capability in capturing complex physical behaviors. Moreover, conventional physics-informed neural networks (PINNs) frequently struggle in chaotic and highly unsteady flow regimes. In this work, we propose \textit{PhysicsFormer}, a fast and efficient -based physics-informed framework that incorporates multi-head encoder-
r cross-attention. Unlike multilayer perceptron-based PINNs, PhysicsFormer operates on sequential representations constructed from spatio-temporal data, enabling effective learning of long-range temporal dependencies and improved propagation of initial condition information. A data-embedding strategy is employed to convert spatio-temporal points into pseudo-sequences, while a dynamics-weighted loss function replaces the standard PINNs formulation. Owing to its parallel learning structure, PhysicsFormer demonstrates superior computational efficiency compared to existing
-based approaches. The framework is validated on Burgers' equation and flow reconstruction governed by the Navier-Stokes equations, achieving mean squared errors on the order of . In addition, an inverse problem involving parameter identification in the two-dimensional incompressible Navier-Stokes equations is investigated. For clean data, PhysicsFormer achieves zero identification error for both and ; under Gaussian noise, the errors are for and for . These results demonstrate that PhysicsFormer provides a reliable and computationally efficient surrogate modeling framework for time-dependent fluid flow problems.
Jailbreaking LLMs & VLMs Mechanisms, Evaluation, and Unified Defense
Authors: Zejian Chen, Chaozhuo Li, Chao Li, Xi Zhang, Litian Zhang, Yiming He
2026-01-07
This paper provides a systematic survey of jailbreak attacks and defenses on Large Language Models (s) and Vision-Language Models (VLMs), emphasizing that jailbreak vulnerabilities stem from structural factors such as incomplete training data, linguistic ambiguity, and generative uncertainty. It further differentiates between hallucinations and jailbreaks in terms of intent and triggering mechanisms. We propose a three-dimensional survey framework: (1) Attack dimension-including template/encoding-based, in-context learning manipulation, reinforcement/adversarial learning,
-assisted and fine-tuned attacks, as well as prompt- and image-level perturbations and agent-based transfer in VLMs; (2) Defense dimension-encompassing prompt-level obfuscation, output evaluation, and model-level alignment or fine-tuning; and (3) Evaluation dimension-covering metrics such as Attack Success Rate (ASR), toxicity score, query/time cost, and multimodal Clean Accuracy and Attribute Success Rate. Compared with prior works, this survey spans the full spectrum from text-only to multimodal settings, consolidating shared mechanisms and proposing unified defense principles: variant-consistency and gradient-sensitivity detection at the perception layer, safety-aware
and output review at the generation layer, and adversarially augmented preference alignment at the parameter layer. Additionally, we summarize existing multimodal safety benchmarks and discuss future directions, including automated red teaming, cross-modal collaborative defense, and standardized evaluation.
DiffCoT Diffusion-styled Chain-of-Thought Reasoning in LLMs
Authors: Shidong Cao, Hongzhan Lin, Yuxuan Gu, Ziyang Luo, Jing Ma
2026-01-07
Chain-of-Thought (CoT) reasoning improves multi-step mathematical problem solving in large language models but remains vulnerable to exposure bias and error accumulation, as early mistakes propagate irreversibly through autoregressive . In this work, we propose DiffCoT, a diffusion-styled CoT framework that reformulates CoT reasoning as an iterative denoising process. DiffCoT integrates diffusion principles at the reasoning-step level via a sliding-window mechanism, enabling unified generation and retrospective correction of intermediate steps while pre
token-level autoregression. To maintain causal consistency, we further introduce a causal diffusion noise schedule that respects the temporal structure of reasoning chains. Extensive experiments on three multi-step CoT reasoning benchmarks across diverse model backbones demonstrate that DiffCoT consistently outperforms existing CoT preference optimization methods, yielding improved robustness and error-correction capability in CoT reasoning.
Layer-Order Inversion Rethinking Latent Multi-Hop Reasoning in Large Language Models
Authors: Xukai Liu, Ye Liu, Jipeng Zhang, Yanghai Zhang, Kai Zhang, Qi Liu
2026-01-07
Large language models (s) perform well on multi-hop reasoning, yet how they internally compose multiple facts remains unclear. Recent work proposes \emph{hop-aligned circuit hypothesis}, suggesting that bridge entities are computed sequentially across layers before later-hop answers. Through systematic analyses on real-world multi-hop queries, we show that this hop-aligned assumption does not generalize: later-hop answer entities can become decodable earlier than bridge entities, a phenomenon we call \emph{layer-order inversion}, which strengthens with total hops. To explain this behavior, we propose a \emph{probabilistic recall-and-extract} framework that models multi-hop reasoning as broad probabilistic recall in shallow MLP layers followed by selective extraction in deeper attention layers. This framework is empirically validated through systematic probing analyses, reinterpreting prior layer-wise
evidence, explaining chain-of-thought gains, and providing a mechanistic diagnosis of multi-hop failures despite correct single-hop knowledge. Code is available at https://github.com/laquabe/Layer-Order-Inversion.
IntroLM Introspective Language Models via Prefilling-Time Self-Evaluation
Authors: Hossein Hosseini Kasnavieh, Gholamreza Haffari, Chris Leckie, Adel N. Toosi
2026-01-07
A major challenge for the operation of large language models (s) is how to predict whether a specific
will produce sufficiently high-quality output for a given query. Existing approaches rely on external classifiers, most commonly BERT based models, which suffer from limited context windows, constrained representational capacity, and additional computational overhead. We propose IntroLM, a method that enables causal language models to predict their own output quality during the
ing phase without affecting generation using introspective tokens. By introducing token conditional LoRA that activates only for the introspective token, the model learns to predict the output quality for a given query while pre
the original backbone behavior and avoiding external evaluators. On question answering benchmarks, IntroLM applied to Qwen3 8B achieves a ROC AUC of 90 precent for success prediction, outperforming a DeBERTa classifier by 14 precent. When integrated into multi model routing systems, IntroLM achieves superior cost performance tradeoffs, reducing latency by up to 33 precent and large model usage by up to 50 precent at matched reliability.
Beyond Perplexity A Lightweight Benchmark for Knowledge Retention in Supervised Fine-Tuning
Authors: Soheil Zibakhsh Shabgahi, Pedram Aghazadeh, Farinaz Koushanfar
2026-01-07
Supervised Fine-Tuning (SFT) is a standard approach for injecting domain knowledge into Large Language Models (s). However, relying on validation perplexity to monitor training is often insufficient, as it confounds stylistic mimicry with genuine factual internalization. To address this, we introduce the Knowledge Retention (KR) Test , a lightweight, corpus-grounded evaluation framework designed to distinguish factual learning from linguistics. KR-Test utilizes automatically generated contrastive examples to measure likelihood preferences for correct versus incorrect continuations, requiring no instruction tuning or generative
. We validate the framework's integrity through a "blind vs. oracle" baseline analysis. Furthermore, we demonstrate the diagnostic capabilities of KR-Test by analyzing the training dynamics of Low-Rank Adaptation (LoRA). By exposing the fine-grained dissociation between linguistic convergence and knowledge retention, KR-Test enhances the interpretability of fine-tuning dynamics.
From Bits to Chips An LLM-based Hardware-Aware Quantization Agent for Streamlined Deployment of LLMs
Authors: Kaiyuan Deng, Hangyu Zheng, Minghai Qing, Kunxiong Zhu, Gen Li, Yang Xiao, Lan Emily Zhang, Linke Guo, Bo Hui, Yanzhi Wang, Geng Yuan, Gagan Agrawal, Wei Niu, Xiaolong Ma
2026-01-07
Deploying models, especially large language models (s), is becoming increasingly attractive to a broader user base, including those without specialized expertise. However, due to the resource constraints of certain hardware, maintaining high accuracy with larger model while meeting the hardware requirements remains a significant challenge. Model
technique helps mitigate memory and compute bottlenecks, yet the added complexities of tuning and deploying
d models further exacerbates these challenges, making the process unfriendly to most of the users. We introduce the Hardware-Aware Quantization Agent (HAQA), an automated framework that leverages
s to streamline the entire
and deployment process by enabling efficient hyperparameter tuning and hardware configuration, thereby simultaneously improving deployment quality and ease of use for a broad range of users. Our results demonstrate up to a 2.3x speedup in inference, along with increased throughput and improved accuracy compared to unoptimized models on Llama. Additionally, HAQA is designed to implement adaptive
strategies across diverse hardware platforms, as it automatically finds optimal settings even when they appear counterintuitive, thereby reducing extensive manual effort and demonstrating superior adaptability. Code will be released.