2025-12-12

Table of Contents

Beyond the Black Box: Identifiable Interpretation and Control in Generative Models via Causal Minimality

Authors: Lingjing Kong, Shaoan Xie, Guangyi Chen, Yuewen Sun, Xiangchen Song, Eric P. Xing, Kun Zhang

2025-12-11

http://arxiv.org/abs/2512.10720v1

Deep generative models, while revolutionizing fields like image and text generation, largely operate as opaque black boxes, hindering human understanding, control, and alignment. While methods like sparse autoencoders (SAEs) show remarkable empirical success, they often lack theoretical guarantees, risking subjective insights. Our primary objective is to establish a principled foundation for interpretable generative models. We demonstrate that the principle of causal minimality -- favoring the simplest causal explanation -- can endow the latent representations of diffusion vision models and autoregressive language models with clear causal interpretation and robust, component-wise identifiable control. We introduce a novel theoretical framework for hierarchical selection models, in which higher-level concepts emerge from the constrained composition of lower-level variables, better capturing the complex dependencies in data generation. Under theoretically derived minimality conditions (manifesting as sparsity constraints), we show that learned representations can be equivalent to the true latent variables of the data-generating process. Empirically, applying these constraints to leading generative models allows us to extract their innate hierarchical concept graphs, offering fresh insights into their internal knowledge organization. Furthermore, these causally grounded concepts serve as levers for fine-grained model steering, paving the way for transparent, reliable systems.

K-Track: Kalman-Enhanced Tracking for Accelerating Deep Point Trackers on Edge Devices

Authors: Bishoy Galoaa, Pau Closas, Sarah Ostadabbas

2025-12-11

http://arxiv.org/abs/2512.10628v1

Point tracking in video sequences is a foundational capability for real-world computer vision applications, including robotics, autonomous systems, augmented reality, and video analysis. While recent deep learning-based trackers achieve state-of-the-art accuracy on challenging benchmarks, their reliance on per-frame GPU inference poses a major barrier to deployment on resource-constrained edge devices, where compute, power, and connectivity are limited. We introduce K-Track (Kalman-enhanced Tracking), a general-purpose, tracker-agnostic framework designed to bridge this deployment gap. K-Track reduces inference cost by combining sparse deep-learning keyframe updates with lightweight Kalman filtering for intermediate-frame prediction, using principled Bayesian uncertainty propagation to maintain temporal coherence. This hybrid strategy enables a 5-10X speedup while retaining over 85% of the original trackers' accuracy. We evaluate K-Track across multiple state-of-the-art point trackers and demonstrate real-time performance on edge platforms such as the NVIDIA Jetson Nano and RTX Titan. By preserving accuracy while dramatically lowering computational requirements, K-Track provides a practical path toward deploying high-quality point tracking in real-world, resource-limited settings, closing the gap between modern tracking algorithms and deployable vision systems.
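
The keyframe-plus-Kalman loop described above can be sketched in a few lines. The constant-velocity motion model, noise values, and keyframe interval below are illustrative assumptions for one coordinate of one tracked point, not K-Track's actual configuration, and the "deep tracker" is stubbed by a measurement callback.

```python
class Kalman1D:
    """Constant-velocity Kalman filter for one coordinate of a tracked point."""
    def __init__(self, x0, q=1e-4, r=1e-2):
        self.x, self.v = float(x0), 0.0          # position and velocity estimate
        self.P = [[1.0, 0.0], [0.0, 1.0]]        # state covariance
        self.q, self.r = q, r                    # process / measurement noise

    def predict(self, dt=1.0):
        # x' = x + v*dt; P' = F P F^T + Q for F = [[1, dt], [0, 1]]
        p00, p01 = self.P[0]
        p10, p11 = self.P[1]
        self.x += self.v * dt
        self.P = [[p00 + dt * (p01 + p10) + dt * dt * p11 + self.q, p01 + dt * p11],
                  [p10 + dt * p11, p11 + self.q]]
        return self.x

    def update(self, z):
        # Measurement is position only: H = [1, 0]
        p00, p01 = self.P[0]
        p10, p11 = self.P[1]
        s = p00 + self.r                 # innovation variance
        k0, k1 = p00 / s, p10 / s        # Kalman gain
        y = z - self.x                   # innovation
        self.x += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]

def track(measure, n_frames, keyframe_every=5):
    """Run the (expensive) deep tracker only on keyframes; Kalman-predict between."""
    kf = Kalman1D(measure(0))
    out = []
    for t in range(n_frames):
        x = kf.predict()
        if t % keyframe_every == 0:      # deep-tracker inference on keyframes only
            kf.update(measure(t))
            x = kf.x
        out.append(x)
    return out
```

With a point moving at a constant 2 px/frame, the filter locks onto the velocity after the first few keyframes, so only one deep inference per five frames is needed.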

ESS: An Offload-Centric Latent-Cache Management Architecture for DeepSeek-V3.2-Exp

Authors: Xinhang Chen, Chao Zhang, Jiahuan He, Wei Liu, Jianming Zhang, Wenlong Zhou, Xiao Li, Pai Zeng, Shiyong Li, Yuanpan Qian, Dong Li, Zhaogeng Li

2025-12-11

http://arxiv.org/abs/2512.10576v1

DeepSeek-V3.2-Exp introduces a sparse attention mechanism that significantly reduces inference latency in long-context scenarios. Although overall throughput has improved greatly, the Decode stage of PD-disaggregated serving remains a major bottleneck. This bottleneck primarily stems from the conflict between the linear growth of the Latent-Cache with sequence length and the limited GPU memory capacity, which constrains the feasible batch size and thereby suppresses Decode-stage throughput. To address this challenge, we propose ESS (Extended Sparse Server), an offload-centric system design tailored for DeepSeek-V3.2-Exp. ESS selectively offloads the Latent-Cache to CPU memory while keeping latency-critical components on the GPU. By freeing up GPU memory, ESS effectively decouples batch-size scaling from GPU memory constraints. This design significantly improves Decode-stage throughput, thereby reducing deployment costs in real-world settings. Our high-fidelity simulations show that ESS delivers a 69.4% throughput improvement at 32K context length and up to 123% at 128K, demonstrating its effectiveness for large-context inference workloads. These results highlight ESS as a practical and scalable solution for long-context LLM serving.
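
A back-of-the-envelope sketch of why offloading the Latent-Cache enlarges the feasible decode batch: the batch is bounded by how much cache fits beside the weights on the GPU. All numbers below (GPU capacity, weight footprint, per-token cache bytes, offload fraction) are made-up illustrative values, not DeepSeek-V3.2-Exp's real ones.

```python
def max_batch(gpu_gb, weights_gb, cache_bytes_per_token, context_len,
              offload_fraction=0.0):
    """Largest batch whose GPU-resident cache still fits beside the weights."""
    free = (gpu_gb - weights_gb) * 1024**3            # bytes left for cache
    per_seq = cache_bytes_per_token * context_len * (1.0 - offload_fraction)
    return int(free // per_seq) if per_seq > 0 else float("inf")

# Toy numbers: 80 GB GPU, 40 GB of weights, 576 bytes of latent cache per token.
no_offload = max_batch(80, 40, 576, 32_768)                       # all cache on GPU
with_offload = max_batch(80, 40, 576, 32_768, offload_fraction=0.9)
```

Offloading 90% of the cache to CPU memory raises the feasible batch roughly tenfold in this toy model, which is the lever ESS uses to lift Decode-stage throughput.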

Complete Structural Analysis of q-Heisenberg Algebras: Homology, Rigidity, Automorphisms, and Deformations

Authors: Mohammad H. M Rashid

2025-12-11

http://arxiv.org/abs/2512.10567v1

This paper establishes several fundamental structural properties of the q-Heisenberg algebra, a quantum deformation of the classical Heisenberg algebra. We first prove that when q is not a root of unity, the global homological dimension of the algebra is finite, while it becomes infinite when q is a root of unity. We then demonstrate the rigidity of its iterated Ore extension structure, showing that any such presentation is essentially unique up to permutation and scaling of variables. The graded automorphism group is completely determined. Furthermore, the algebra is shown to possess a universal deformation property as the canonical PBW-preserving deformation of the classical Heisenberg algebra. We compute its Hilbert series, confirming polynomial growth, and establish that its Gelfand--Kirillov dimension coincides with its classical Krull dimension. These results are extended to a generalized multi-parameter version, and illustrated through detailed examples and applications in representation theory and deformation theory.

Causal Reasoning Favors Encoders: On the Limits of Decoder-Only Models

Authors: Amartya Roy, Elamparithy M, Kripabandhu Ghosh, Ponnurangam Kumaraguru, Adrian de Wynter

2025-12-11

http://arxiv.org/abs/2512.10561v1

In-context learning (ICL) underpins recent advances in large language models (LLMs), although its role and performance in causal reasoning remain unclear. Causal reasoning demands multi-hop composition and strict conjunctive control, and reliance on spurious lexical relations in the input can produce misleading results. We hypothesize that, owing to their ability to project the input into a latent space, encoder and encoder-decoder architectures are better suited to such multi-hop conjunctive reasoning than decoder-only models. To test this, we compare fine-tuned versions of all the aforementioned architectures with zero- and few-shot ICL in both natural-language and non-natural-language scenarios. We find that ICL alone is insufficient for reliable causal reasoning, often over-focusing on irrelevant input features. In particular, decoder-only models are noticeably brittle to distributional shifts, while fine-tuned encoder and encoder-decoder models generalize more robustly across our tests, including the non-natural-language split. Both architectures are only matched or surpassed by decoder-only architectures at large scales. We conclude by noting that, for cost-effective, short-horizon, robust causal reasoning, encoder or encoder-decoder architectures with targeted fine-tuning are preferable.

Unlocking the Address Book: Dissecting the Sparse Semantic Structure of LLM Key-Value Caches via Sparse Autoencoders

Authors: Qingsen Ma, Dianyun Wang, Jiaming Lyu, Yaoye Wang, Lechen Ning, Sujie Zhu, Zhenbo Xu, Liuyu Xiang, Huining Li, Huijia Wu, Zhaofeng He

2025-12-11

http://arxiv.org/abs/2512.10547v1

The Key-Value (KV) cache is the primary memory bottleneck in long-context Large Language Models, yet it is typically treated as an opaque numerical tensor. In this work, we propose \textbf{STA-Attention}, a framework that utilizes Top-K Sparse Autoencoders (SAEs) to decompose the KV cache into interpretable ``semantic atoms.'' Unlike standard $L_1$-regularized SAEs, our Top-K approach eliminates shrinkage bias, preserving the precise dot-product geometry required for attention. Our analysis uncovers a fundamental \textbf{Key-Value Asymmetry}: while Key vectors serve as highly sparse routers dominated by a ``Semantic Elbow,'' deep Value vectors carry dense content payloads requiring a larger budget. Based on this structure, we introduce a Dual-Budget Strategy that selectively preserves the most informative semantic components while filtering representational noise. Experiments on Yi-6B, Mistral-7B, Qwen2.5-32B, and others show that our semantic reconstructions maintain perplexity and zero-shot performance comparable to the original models, effectively bridging the gap between mechanistic interpretability and faithful attention modeling.
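
The Top-K SAE mechanism the abstract relies on can be sketched as a plain forward pass: keep only the K largest latent pre-activations and reconstruct from those, with no L1 shrinkage on the surviving codes. The weights below are a toy dictionary, not a trained SAE, and the function names are my own.

```python
def topk_sae_forward(x, W_enc, b_enc, W_dec, k):
    """Encode x, zero all but the top-k pre-activations, then decode."""
    n_latent = len(W_enc)
    # pre-activations: z_i = <W_enc[i], x> + b_enc[i]
    z = [sum(w * xi for w, xi in zip(W_enc[i], x)) + b_enc[i]
         for i in range(n_latent)]
    # Top-K selection instead of an L1 penalty: no shrinkage of surviving codes.
    keep = sorted(range(n_latent), key=lambda i: z[i], reverse=True)[:k]
    z_sparse = [z[i] if i in keep else 0.0 for i in range(n_latent)]
    # reconstruction: x_hat_j = sum_i z_sparse_i * W_dec[i][j]
    d = len(x)
    x_hat = [sum(z_sparse[i] * W_dec[i][j] for i in range(n_latent))
             for j in range(d)]
    return z_sparse, x_hat

# Toy dictionary: 4 latents over 3 dimensions; atom 3 is a "composite" direction.
W = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
codes, recon = topk_sae_forward([3, 1, 0], W, [0, 0, 0, 0], W, k=1)
```

With k=1 only the single strongest atom survives, which is the sense in which the decomposition yields a small set of active "semantic atoms" per vector.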

Error-Propagation-Free Learned Video Compression With Dual-Domain Progressive Temporal Alignment

Authors: Han Li, Shaohui Li, Wenrui Dai, Chenglin Li, Xinlong Pan, Haipeng Wang, Junni Zou, Hongkai Xiong

2025-12-11

http://arxiv.org/abs/2512.10450v1

Existing frameworks for learned video compression suffer from a dilemma between inaccurate temporal alignment and error propagation in motion estimation and compensation (ME/MC). The separate-transform framework employs distinct transforms for intra-frame and inter-frame coding to yield impressive rate-distortion (R-D) performance but causes evident error propagation, while the unified-transform framework eliminates error propagation via shared transforms but is inferior in ME/MC in shared latent domains. To address this limitation, in this paper we propose a novel unified-transform framework with dual-domain progressive temporal alignment and a quality-conditioned mixture-of-experts (QCMoE) to enable quality-consistent and error-propagation-free streaming for learned video compression. Specifically, we propose dual-domain progressive temporal alignment for ME/MC that leverages coarse pixel-domain alignment and refined latent-domain alignment to significantly enhance temporal context modeling in a coarse-to-fine fashion. The coarse pixel-domain alignment efficiently handles simple motion patterns with optical flow estimated from a single reference frame, while the refined latent-domain alignment develops a Flow-Guided Deformable Transformer (FGDT) over latents from multiple reference frames to achieve long-term motion refinement (LTMR) for complex motion patterns. Furthermore, we design a QCMoE module for continuous bit-rate adaptation that dynamically assigns different experts to adjust quantization steps per pixel based on target quality and content, rather than relying on a single quantization step. QCMoE allows continuous and consistent rate control with appealing R-D performance. Experimental results show that the proposed method achieves competitive R-D performance compared with the state of the art, while successfully eliminating error propagation.

Clustered Federated Learning with Hierarchical Knowledge Distillation

Authors: Sabtain Ahmad, Meerzhan Kanatbekova, Ivona Brandic, Atakan Aral

2025-12-11

http://arxiv.org/abs/2512.10443v1

Clustered Federated Learning (CFL) has emerged as a powerful approach for addressing data heterogeneity and ensuring privacy in large distributed IoT environments. By clustering clients and training cluster-specific models, CFL enables personalized models tailored to groups of heterogeneous clients. However, conventional CFL approaches suffer from fragmented learning, training independent global models for each cluster, and fail to take advantage of collective cluster insights. This paper advocates a shift to hierarchical CFL, allowing bi-level aggregation to train cluster-specific models at the edge and a unified global model at the cloud. This shift improves training efficiency yet may introduce new challenges. To this end, we propose CFLHKD, a novel personalization scheme for integrating hierarchical cluster knowledge into CFL. Built upon multi-teacher knowledge distillation, CFLHKD enables inter-cluster knowledge sharing while preserving cluster-specific personalization. CFLHKD adopts bi-level aggregation to bridge the gap between local and global learning. Extensive evaluations on standard benchmark datasets demonstrate that CFLHKD outperforms representative baselines in cluster-specific and global model accuracy, achieving performance improvements of 3.32-7.57%.
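
One common way to set up the multi-teacher distillation the abstract builds on is to have the student match the average of the teachers' softened output distributions. This is a minimal sketch of that target computation only; CFLHKD's exact loss, weighting, and temperature are not specified here, and the logits below are toy values.

```python
import math

def softmax(logits, temperature=1.0):
    """Softened class distribution from raw logits."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def multi_teacher_target(teacher_logits, temperature=2.0):
    """Element-wise mean of the teachers' softened class distributions,
    used as the distillation target for the student."""
    dists = [softmax(l, temperature) for l in teacher_logits]
    n = len(dists)
    return [sum(d[c] for d in dists) / n for c in range(len(dists[0]))]

# Two cluster "teachers" with opposite preferences cancel into a flat target.
target = multi_teacher_target([[0.0, 1.0], [1.0, 0.0]], temperature=1.0)
```

Averaging the distributions (rather than the logits) keeps the target a valid probability vector regardless of how many cluster teachers contribute.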

T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground

Authors: Dmitrii Stoianov, Danil Taranets, Olga Tsymboi, Ramil Latypov, Almaz Dautov, Vladislav Kruglikov, Nikita Surkov, German Abramov, Pavel Gein, Dmitry Abulkhanov, Mikhail Gashkov, Viktor Zelenkovskiy, Artem Batalov, Aleksandr Medvedev, Anatolii Potapov

2025-12-11

http://arxiv.org/abs/2512.10430v1

We introduce T-pro 2.0, an open-weight Russian LLM for hybrid reasoning and efficient inference. The model supports direct answering and reasoning-trace generation, using a Cyrillic-dense tokenizer and an adapted EAGLE speculative-decoding pipeline to reduce latency. To enable reproducible and extensible research, we release the model weights, the T-Wix 500k instruction corpus, the T-Math reasoning benchmark, and the EAGLE weights on Hugging Face. These resources allow users to study Russian-language reasoning and to extend or adapt both the model and the inference pipeline. A public web demo exposes reasoning and non-reasoning modes and illustrates the speedups achieved by our inference stack across domains. T-pro 2.0 thus serves as an accessible open system for building and evaluating efficient, practical Russian LLM applications.

Sliding Window Attention Adaptation

Authors: Yijiong Yu, Jiale Liu, Qingyun Wu, Huazheng Wang, Ji Pei

2025-12-11

http://arxiv.org/abs/2512.10411v1

The self-attention mechanism in Transformer-based Large Language Models (LLMs) scales quadratically with input length, making long-context inference expensive. Sliding window attention (SWA) reduces this cost to linear complexity, but naively enabling complete SWA at inference time for models pretrained with full attention (FA) causes severe long-context performance degradation due to the training-inference mismatch. This makes us wonder: can FA-pretrained LLMs be well adapted to SWA without pretraining? We investigate this by proposing Sliding Window Attention Adaptation (SWAA), a set of practical recipes that combines five methods for better adaptation: (1) applying SWA only during decoding; (2) preserving "sink" tokens; (3) interleaving FA/SWA layers; (4) chain-of-thought (CoT) prompting; and (5) fine-tuning. Our experiments show that SWA adaptation is feasible but non-trivial: no single method suffices, yet specific synergistic combinations effectively recover the original long-context performance. We further analyze the performance-efficiency trade-offs of different SWAA configurations and provide recommended recipes for diverse scenarios. Our code is available at https://github.com/yuyijiong/sliding-window-attention-adaptation
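
The attention pattern implied by recipes (1)-(2) can be sketched as a boolean mask: causal, restricted to a recent window, but with a few always-visible "sink" tokens at the start of the sequence. The window size and sink count below are illustrative choices, not the paper's recommended configuration.

```python
def swa_mask(seq_len, window, n_sink=2):
    """mask[q][k] is True when query position q may attend to key position k."""
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q               # decoder-style causality
            in_window = q - k < window    # only the most recent `window` tokens
            is_sink = k < n_sink          # sink tokens stay visible forever
            row.append(causal and (in_window or is_sink))
        mask.append(row)
    return mask
```

Each row has at most `window + n_sink` True entries, which is what turns the quadratic attention cost into a linear one as the sequence grows.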

RoboNeuron: A Modular Framework Linking Foundation Models and ROS for Embodied AI

Authors: Weifan Guan, Huasen Xi, Chenxiao Zhang, Aosheng Li, Qinghao Hu, Jian Cheng

2025-12-11

http://arxiv.org/abs/2512.10394v1

Current embodied AI systems face severe engineering impediments, primarily characterized by poor cross-scenario adaptability, rigid inter-module coupling, and fragmented inference pipelines. To overcome these limitations, we propose RoboNeuron, a universal deployment framework for embodied intelligence. RoboNeuron is the first framework to deeply integrate the cognitive capabilities of Large Language Models (LLMs) and Vision-Language-Action (VLA) models with the real-time execution backbone of the Robot Operating System (ROS). We utilize the Model Context Protocol (MCP) as a semantic bridge, enabling the LLM to dynamically orchestrate underlying robotic tools. The framework establishes a highly modular architecture that strictly decouples sensing, reasoning, and control by leveraging ROS's unified communication interfaces. Crucially, we introduce an automated tool that translates ROS messages into callable MCP functions, significantly streamlining development. RoboNeuron significantly enhances cross-scenario adaptability and component flexibility, while establishing a systematic platform for horizontal performance benchmarking and laying a robust foundation for scalable real-world embodied applications.

The Best of the Two Worlds: Harmonizing Semantic and Hash IDs for Sequential Recommendation

Authors: Ziwei Liu, Yejing Wang, Qidong Liu, Zijian Zhang, Chong Chen, Wei Huang, Xiangyu Zhao

2025-12-11

http://arxiv.org/abs/2512.10388v1

Conventional Sequential Recommender Systems (SRS) typically assign unique Hash IDs (HIDs) to construct item embeddings. These HID embeddings effectively learn collaborative information from historical user-item interactions, but this makes them vulnerable in settings where most items are rarely consumed (the long-tail problem). Recent methods that incorporate auxiliary information often suffer from noisy collaborative sharing caused by co-occurrence signals, or from semantic homogeneity caused by flat dense embeddings. Semantic IDs (SIDs), with their capacity for code sharing and multi-granular semantic modeling, provide a promising alternative. However, the collaborative-overwhelming phenomenon hinders the further development of SID-based methods: code-sharing mechanisms commonly compromise the uniqueness of identifiers required for modeling head items, creating a performance seesaw between head and tail items. To address this dilemma, we propose \textbf{H2Rec}, a novel framework that harmonizes SIDs and HIDs. Specifically, we devise a dual-branch modeling architecture that enables the model to capture the multi-granular semantics within SIDs while preserving the unique collaborative identity of HIDs. Furthermore, we introduce a dual-level alignment strategy that bridges the two representations, facilitating knowledge transfer and supporting robust preference modeling. Extensive experiments on three real-world datasets show that H2Rec effectively balances recommendation quality for both head and tail items while surpassing existing baselines. The implementation code can be found online at https://github.com/ziwliu8/H2Rec.

Adaptive Dual-Weighted Gravitational Point Cloud Denoising Method

Authors: Ge Zhang, Chunyang Wang, Bo Xiao, Xuelian Liu, Bin Liu

2025-12-11

http://arxiv.org/abs/2512.10386v1

High-quality point cloud data is a critical foundation for tasks such as autonomous driving and 3D reconstruction. However, LiDAR-based point cloud acquisition is often affected by various disturbances, resulting in a large number of noise points that degrade the accuracy of subsequent point cloud object detection and recognition. Moreover, existing point cloud denoising methods typically sacrifice computational efficiency in pursuit of higher denoising accuracy or, conversely, improve processing speed at the expense of preserving object boundaries and fine structural details, making it difficult to simultaneously achieve high denoising accuracy, strong edge preservation, and real-time performance. To address these limitations, this paper proposes an adaptive dual-weight gravitational point cloud denoising method. First, an octree is employed to partition the global point cloud spatially, enabling parallel processing. Then, within each leaf node, adaptive voxel-based occupancy statistics and k-nearest neighbor (kNN) density estimation are applied to rapidly remove clearly isolated and low-density noise points, thereby reducing the effective candidate set. Finally, a gravitational scoring function that combines density weights with adaptive distance weights is constructed to finely distinguish noise points from object points. Experiments conducted on the Stanford 3D Scanning Repository, the Canadian Adverse Driving Conditions (CADC) dataset, and in-house FMCW LiDAR point clouds acquired in our laboratory demonstrate that, compared with existing methods, the proposed approach achieves consistent improvements in F1, PSNR, and Chamfer Distance (CD) across various noise conditions while reducing the single-frame processing time, thereby validating its high accuracy, robustness, and real-time performance in multi-noise scenarios.
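
The kNN density-screening step described above can be sketched directly: points whose k-nearest-neighbor distances are large (low local density) are flagged as noise. The median-based threshold rule here is a simple illustrative choice, not the paper's gravitational score, and the brute-force neighbor search stands in for the octree partitioning.

```python
import math

def knn_density_filter(points, k=2, factor=2.0):
    """Keep points whose mean kNN distance is below factor * global median."""
    def mean_knn(i):
        # Distances from point i to all other points, smallest k averaged.
        d = sorted(math.dist(points[i], p)
                   for j, p in enumerate(points) if j != i)
        return sum(d[:k]) / k

    scores = [mean_knn(i) for i in range(len(points))]
    med = sorted(scores)[len(scores) // 2]          # robust density reference
    return [p for p, s in zip(points, scores) if s <= factor * med]

# A tight cluster near the origin plus one isolated outlier.
pts = [(0, 0, 0), (0.1, 0, 0), (0, 0.1, 0), (0.1, 0.1, 0), (5, 5, 5)]
kept = knn_density_filter(pts, k=2)
```

Because the threshold adapts to the median neighbor distance, the same rule works across point clouds of different sampling densities, which is the spirit of the "adaptive" statistics in the pipeline above.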

EchoingPixels: Cross-Modal Adaptive Token Reduction for Efficient Audio-Visual LLMs

Authors: Chao Gong, Depeng Wang, Zhipeng Wei, Ya Guo, Huijia Zhu, Jingjing Chen

2025-12-11

http://arxiv.org/abs/2512.10324v1

Audio-Visual Large Language Models (AV-LLMs) face prohibitive computational overhead from massive numbers of audio and video tokens. Token reduction, while extensively explored for video-only LLMs, is insufficient for the audio-visual domain, as these unimodal methods cannot leverage audio-visual cross-modal synergies. Furthermore, the distinct and dynamic information densities of audio and video render static per-modality budgets suboptimal. How to perform token reduction on a joint audio-visual stream thus remains an unaddressed bottleneck. To fill this gap, we introduce EchoingPixels, a framework inspired by the coexistence and interaction of visuals and sound in real-world scenes. The core of our framework is the Cross-Modal Semantic Sieve (CS2), a module enabling early audio-visual interaction. Instead of compressing modalities independently, CS2 co-attends to the joint multimodal stream and reduces tokens from the entire combined pool of audio-visual tokens rather than using fixed budgets per modality. This single-pool approach allows it to adaptively allocate the token budget across both modalities and dynamically identify salient tokens in concert. To ensure this aggressive reduction preserves vital temporal modeling capability, we co-design a Synchronization-Augmented RoPE (Sync-RoPE) to maintain critical temporal relationships among the sparsely selected tokens. Extensive experiments demonstrate that EchoingPixels achieves performance comparable to strong baselines using only 5-20% of the original tokens, with a 2-3x speedup and memory reduction.
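
The single-pool budgeting idea can be sketched as ranking audio and video tokens by one shared saliency score and keeping a global budget, so the split between modalities falls out of the data rather than being fixed in advance. The scores below are toy numbers standing in for CS2's learned saliency.

```python
def joint_token_select(audio_scores, video_scores, budget):
    """Keep the `budget` highest-scoring tokens from the combined pool."""
    pool = ([("audio", i, s) for i, s in enumerate(audio_scores)] +
            [("video", i, s) for i, s in enumerate(video_scores)])
    pool.sort(key=lambda t: t[2], reverse=True)   # one shared ranking
    kept = pool[:budget]
    return sorted(kept)                            # (modality, index, score)

# Budget of 3 over 2 audio + 3 video tokens: the split is decided by saliency.
picked = joint_token_select([0.9, 0.1], [0.5, 0.8, 0.2], budget=3)
```

Here video happens to win two of the three slots; with a different clip the same budget could tilt toward audio, which is exactly what a fixed per-modality budget cannot do.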

Efficient-VLN: A Training-Efficient Vision-Language Navigation Model

Authors: Duo Zheng, Shijia Huang, Yanyang Li, Liwei Wang

2025-12-11

http://arxiv.org/abs/2512.10310v1

Multimodal large language models (MLLMs) have shown promising potential in Vision-Language Navigation (VLN). However, their practical development is severely hindered by substantial training overhead. We recognize two key issues that contribute to this overhead: (1) the quadratic computational burden of processing long-horizon historical observations as massive token sequences, and (2) the exploration-efficiency trade-off in DAgger, i.e., the data-aggregation process of collecting agent-explored trajectories. While more exploration yields effective error-recovery trajectories for handling test-time distribution shifts, it comes at the cost of longer trajectories for both training and inference. To address these challenges, we propose Efficient-VLN, a training-efficient VLN model. Specifically, to mitigate the token-processing burden, we design two efficient memory mechanisms: a progressive memory that dynamically allocates more tokens to recent observations, and a learnable recursive memory that utilizes the key-value cache of learnable tokens as the memory state. Moreover, we introduce a dynamic mixed policy to balance the exploration-efficiency trade-off. Extensive experiments show that Efficient-VLN achieves state-of-the-art performance on R2R-CE (64.2% SR) and RxR-CE (67.0% SR). Critically, our model consumes merely 282 H800 GPU hours, a dramatic reduction in training overhead compared to state-of-the-art methods.

InfoCom: Kilobyte-Scale Communication-Efficient Collaborative Perception with Information Bottleneck

Authors: Quanmin Wei, Penglin Dai, Wei Li, Bingyi Liu, Xiao Wu

2025-12-11

http://arxiv.org/abs/2512.10305v1

Precise environmental perception is critical for the reliability of autonomous driving systems. While collaborative perception mitigates the limitations of single-agent perception through information sharing, it encounters a fundamental communication-performance trade-off. Existing communication-efficient approaches typically assume MB-level data transmission per collaboration, which may fail under practical network constraints. To address these issues, we propose InfoCom, an information-aware framework establishing a pioneering theoretical foundation for communication-efficient collaborative perception via extended Information Bottleneck principles. Departing from mainstream feature manipulation, InfoCom introduces a novel information-purification paradigm that theoretically optimizes the extraction of minimal sufficient task-critical information under Information Bottleneck constraints. Its core innovations include: i) an Information-Aware Encoding that condenses features into minimal messages while preserving perception-relevant information; ii) a Sparse Mask Generation that identifies spatial cues at negligible communication cost; and iii) a Multi-Scale Decoding that progressively recovers perceptual information through mask-guided mechanisms rather than simple feature reconstruction. Comprehensive experiments across multiple datasets demonstrate that InfoCom achieves near-lossless perception while reducing communication overhead from megabyte to kilobyte scale, a 440-fold and 90-fold reduction per agent compared with Where2comm and ERMVP, respectively.

MotionEdit: Benchmarking and Learning Motion-Centric Image Editing

Authors: Yixin Wan, Lei Ke, Wenhao Yu, Kai-Wei Chang, Dong Yu

2025-12-11

http://arxiv.org/abs/2512.10284v1

We introduce MotionEdit, a novel dataset for motion-centric image editing: the task of modifying subject actions and interactions while preserving identity, structure, and physical plausibility. Unlike existing image editing datasets that focus on static appearance changes or contain only sparse, low-quality motion edits, MotionEdit provides high-fidelity image pairs depicting realistic motion transformations extracted and verified from continuous videos. This new task is not only scientifically challenging but also practically significant, powering downstream applications such as frame-controlled video synthesis and animation. To evaluate model performance on the novel task, we introduce MotionEdit-Bench, a benchmark that challenges models on motion-centric edits and measures model performance with generative, discriminative, and preference-based metrics. Benchmark results reveal that motion editing remains highly challenging for existing state-of-the-art diffusion-based editing models. To address this gap, we propose MotionNFT (Motion-guided Negative-aware Fine Tuning), a post-training framework that computes motion alignment rewards based on how well the motion flow between input and model-edited images matches the ground-truth motion, guiding models toward accurate motion transformations. Extensive experiments on FLUX.1 Kontext and Qwen-Image-Edit show that MotionNFT consistently improves editing quality and motion fidelity of both base models on the motion editing task without sacrificing general editing ability, demonstrating its effectiveness.

Long-LRM++: Preserving Fine Details in Feed-Forward Wide-Coverage Reconstruction

Authors: Chen Ziwen, Hao Tan, Peng Wang, Zexiang Xu, Li Fuxin

2025-12-11

http://arxiv.org/abs/2512.10267v1

Recent advances in generalizable Gaussian splatting (GS) have enabled feed-forward reconstruction of scenes from tens of input views. Long-LRM notably scales this paradigm to 32 input images, achieving 360° scene-level reconstruction in a single forward pass. However, directly predicting millions of Gaussian parameters at once remains highly error-sensitive: small inaccuracies in positions or other attributes lead to noticeable blurring, particularly in fine structures such as text. In parallel, implicit-representation methods such as LVSM and LaCT have demonstrated significantly higher rendering fidelity by compressing scene information into model weights rather than explicit Gaussians, decoding RGB frames with a full transformer or TTT backbone. However, this computationally intensive decoding process for every rendered frame makes real-time rendering infeasible. These observations raise key questions: is the deep, sequential decoding process necessary? Can we retain the benefits of implicit representations while enabling real-time performance? We address these questions with Long-LRM++, a model that adopts a semi-explicit scene representation combined with a lightweight decoder. Long-LRM++ matches the rendering quality of LaCT on DL3DV while achieving real-time 14 FPS rendering on an A100 GPU, overcoming the speed limitations of prior implicit methods. Our design also scales to 64 input views at the same resolution, demonstrating strong generalization to longer inputs. Additionally, Long-LRM++ delivers superior novel-view depth prediction on ScanNetv2 compared to direct depth rendering from Gaussians. Extensive ablation studies validate the effectiveness of each component in the proposed framework.

SemanticBBV: A Semantic Signature for Cross-Program Knowledge Reuse in Microarchitecture Simulation

Authors: Zhenguo Liu, Chengao Shi, Chen Ding, Jiang Xu

2025-12-11

http://arxiv.org/abs/2512.10231v1

For decades, sampling-based techniques have been the de facto standard for accelerating microarchitecture simulation, with the Basic Block Vector (BBV) serving as the cornerstone program representation. Yet the BBV's fundamental limitations (order-dependent IDs that prevent cross-program knowledge reuse, and a lack of semantic content predictive of hardware performance) have left a massive potential for optimization untapped. To address these gaps, we introduce SemanticBBV, a novel two-stage framework that generates robust, performance-aware signatures for cross-program simulation reuse. First, a lightweight RWKV-based semantic encoder transforms assembly basic blocks into rich Basic Block Embeddings (BBEs), capturing deep functional semantics. Second, an order-invariant Set Transformer aggregates these BBEs, weighted by execution frequency, into a final signature. Crucially, this stage is co-trained with a dual objective: a triplet loss for signature distinctiveness and a Cycles Per Instruction (CPI) regression task, directly imbuing the signature with performance sensitivity. Our evaluation demonstrates that SemanticBBV not only matches traditional BBVs in single-program accuracy but also enables unprecedented cross-program analysis. By simulating just 14 universal program points, we estimate the performance of ten SPEC CPU benchmarks with 86.3% average accuracy, achieving a 7143x simulation speedup. Furthermore, the signature adapts well to new microarchitectures with minimal fine-tuning.
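
The classic BBV representation the paragraph critiques can be sketched as per-interval execution-frequency vectors, with block IDs assigned in first-seen order. This is a generic illustration of the scheme, not SemanticBBV's encoder; note how the IDs depend on execution order, which is exactly why two programs' BBVs are not comparable.

```python
def basic_block_vector(trace, interval):
    """Split an executed-block trace into fixed-length intervals and return
    one {block_id: count} frequency vector per interval."""
    ids = {}                       # block -> order-dependent integer ID
    vectors = []
    for start in range(0, len(trace), interval):
        counts = {}
        for block in trace[start:start + interval]:
            # IDs are assigned on first appearance: the portability problem.
            bid = ids.setdefault(block, len(ids))
            counts[bid] = counts.get(bid, 0) + 1
        vectors.append(counts)
    return vectors, ids

# Toy trace of executed basic blocks, two intervals of three executions each.
vecs, ids = basic_block_vector(["A", "B", "A", "C", "A", "B"], interval=3)
```

Running the same code after an unrelated function would shuffle every ID, so the resulting vectors carry no meaning across programs; SemanticBBV's content-derived embeddings are designed to remove that dependence.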

An Efficient Graph-Transformer Operator for Learning Physical Dynamics with Manifolds Embedding

Authors: Pengwei Liu, Xingyu Ren, Pengkai Wang, Hangjie Yuan, Zhongkai Hao, Guanyu Chen, Chao Xu, Dong Ni, Shengze Cai

2025-12-11

http://arxiv.org/abs/2512.10227v1

Accurate and efficient physical simulations are essential in science and engineering, yet traditional numerical solvers face significant challenges in computational cost when handling simulations across dynamic scenarios involving complex geometries, varying boundary/initial conditions, and diverse physical parameters. While deep learning offers promising alternatives, existing methods often struggle with flexibility and generalization, particularly on unstructured meshes, which significantly limits their practical applicability. To address these challenges, we propose PhysGTO, an efficient Graph-Transformer Operator for learning physical dynamics through explicit manifold embeddings in both physical and latent spaces. In the physical space, the proposed Unified Graph Embedding module aligns node-level conditions and constructs sparse yet structure-preserving graph connectivity to process heterogeneous inputs. In the latent space, PhysGTO integrates a lightweight flux-oriented message-passing scheme with projection-inspired attention to capture local and global dependencies, facilitating multilevel interactions among complex physical correlations. This design ensures linear complexity relative to the number of mesh points, reducing both the number of trainable parameters and the computational cost in floating-point operations (FLOPs), thereby allowing efficient inference in real-time applications. We introduce a comprehensive benchmark spanning eleven datasets, covering problems with unstructured meshes, transient flow dynamics, and large-scale 3D geometries. PhysGTO consistently achieves state-of-the-art accuracy while significantly reducing computational costs, demonstrating superior flexibility, scalability, and generalization in a wide range of simulation tasks.

Federated Domain Generalization with Latent Space Inversion

Authors: Ragja Palakkadavath, Hung Le, Thanh Nguyen-Tang, Svetha Venkatesh, Sunil Gupta

2025-12-11

http://arxiv.org/abs/2512.10224v1

Federated domain generalization (FedDG) addresses distribution shifts among clients in a federated learning framework. FedDG methods aggregate the parameters of locally trained client models to form a global model that generalizes to unseen clients while preserving data privacy. While improving the generalization capability of the global model, many existing approaches in FedDG jeopardize privacy by sharing statistics of client data between themselves. Our solution addresses this problem by contributing new ways to perform local client training and model aggregation. To improve local client training, we enforce (domain) invariance across local models with the help of a novel technique, \textbf{latent space inversion}, which enables better client privacy. When clients are not \emph{i.i.d}, aggregating their local models may discard certain local adaptations. To overcome this, we propose an \textbf{important weight} aggregation strategy to prioritize parameters that significantly influence predictions of local models during aggregation. Our extensive experiments show that our approach achieves superior results over state-of-the-art methods with less communication overhead.
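
The important-weight aggregation idea can be sketched as a per-parameter weighted average, where each client contributes more to the coordinates it deems important. This is a minimal sketch under our own assumptions; the paper's actual importance measure is not given in the abstract, and `important_weight_aggregate` is a hypothetical name.

```python
import numpy as np

def important_weight_aggregate(params, importances, eps=1e-12):
    """Aggregate client parameter vectors with per-parameter importance
    weights, so parameters that strongly influence a client's predictions
    dominate the average for that coordinate. (Illustrative sketch.)"""
    params = np.stack(params)            # (n_clients, n_params)
    imp = np.stack(importances)          # (n_clients, n_params), non-negative
    weights = imp / (imp.sum(axis=0, keepdims=True) + eps)
    return (weights * params).sum(axis=0)

# Toy case: client 0 considers parameter 0 important, client 1 parameter 1.
p = [np.array([1.0, 5.0]), np.array([3.0, 9.0])]
imp = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(important_weight_aggregate(p, imp))  # [1. 9.]
```

Plain FedAvg would return [2. 7.] here; the importance weighting instead keeps each client's locally adapted parameter intact.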

Does SWE-Bench-Verified Test Agent Ability or Model Memory?

Authors: Thanosan Prathifkumar, Noble Saji Mathews, Meiyappan Nagappan

2025-12-11

http://arxiv.org/abs/2512.10218v1

SWE-Bench-Verified, a dataset comprising 500 issues, serves as a de facto benchmark for evaluating various large language models (LLMs) on their ability to resolve GitHub issues. But this benchmark may overlap with model training data. If that is true, scores may reflect training recall, not issue-solving skill. To study this, we test two Claude models that frequently appear in top-performing agents submitted to the benchmark. We ask them to find relevant files using only issue text, and then issue text plus file paths. We then run the same setup on BeetleBox and SWE-rebench. Despite both benchmarks involving popular open-source Python projects, models performed 3 times better on SWE-Bench-Verified. They were also 6 times better at finding edited files, without any additional context about the projects themselves. This gap suggests the models may have seen many SWE-Bench-Verified tasks during training. As a result, scores on this benchmark may not reflect an agent's ability to handle real software issues, yet it continues to be used in ways that can misrepresent progress and lead to choices that favour agents that use certain models over strong agent design. Our setup tests the localization step with minimal context to the extent that the task should be logically impossible to solve. Our results show the risk of relying on older popular benchmarks and support the shift toward newer datasets built with contamination in mind.
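
The file-localization probe can be summarized as a simple hit-rate metric: for each issue, does the model's predicted file list intersect the files actually edited in the fix? This is a sketch of the evaluation idea only; the paper's exact metric, prompts, and file lists are assumptions here.

```python
def localization_hit_rate(predictions, gold_edits):
    """Fraction of issues for which at least one predicted file matches a
    file actually edited in the fix. (Sketch of the contamination probe;
    not the paper's exact protocol.)"""
    hits = sum(
        1 for pred, gold in zip(predictions, gold_edits)
        if set(pred) & set(gold)
    )
    return hits / len(gold_edits)

# Hypothetical issues: the model names files from issue text alone.
preds = [["src/auth.py"], ["docs/index.md"], ["core/utils.py", "setup.py"]]
gold = [["src/auth.py"], ["src/api.py"], ["core/utils.py"]]
print(localization_hit_rate(preds, gold))  # 2/3
```

A large gap in this rate between SWE-Bench-Verified and held-out benchmarks, with identical prompting, is the paper's signal of memorization rather than localization skill.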

Semantic-Aware Confidence Calibration for Automated Audio Captioning

Authors: Lucas Dunker, Sai Akshay Menta, Snigdha Mohana Addepalli, Venkata Krishna Rayalu Garapati

2025-12-11

http://arxiv.org/abs/2512.10170v1

Automated audio captioning models frequently produce overconfident predictions regardless of semantic accuracy, limiting their reliability in deployment. This deficiency stems from two factors: evaluation metrics based on n-gram overlap that fail to capture semantic correctness, and the absence of calibrated confidence estimation. We present a framework that addresses both limitations by integrating confidence prediction into audio captioning and redefining correctness through semantic similarity. Our approach augments a Whisper-based audio captioning model with a learned confidence prediction head that estimates uncertainty from decoder hidden states. We employ CLAP audio-text embeddings and sentence-transformer similarities (FENSE) to define semantic correctness, enabling Expected Calibration Error (ECE) computation that reflects true caption quality rather than surface-level text overlap. Experiments on Clotho v2 demonstrate that confidence-guided beam search with semantic evaluation achieves dramatically improved calibration (CLAP-based ECE of 0.071) compared to greedy decoding baselines (ECE of 0.488), while simultaneously improving caption quality across standard metrics. Our results establish that semantic similarity provides a more meaningful foundation for confidence calibration in audio captioning than traditional n-gram metrics.
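
The semantic ECE idea can be sketched directly: bin captions by predicted confidence, call a caption "correct" when its audio-text similarity clears a threshold, and sum the confidence-accuracy gaps. The threshold and binning below are illustrative choices of ours, not the paper's exact protocol, and the similarity scores stand in for real CLAP outputs.

```python
import numpy as np

def semantic_ece(confidences, similarities, threshold=0.5, n_bins=10):
    """Expected Calibration Error where 'correct' means the caption's
    semantic similarity (e.g. a CLAP audio-text score) clears a threshold,
    instead of an n-gram match. (Sketch; parameters are assumptions.)"""
    conf = np.asarray(confidences)
    correct = np.asarray(similarities) >= threshold
    ece, edges = 0.0, np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi) if lo > 0 else (conf >= lo) & (conf <= hi)
        if mask.any():
            gap = abs(conf[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap
    return ece

# Toy case: slightly underconfident high bin, overconfident low bin.
conf = [0.9, 0.9, 0.1, 0.1]
sims = [0.8, 0.8, 0.2, 0.2]   # first two "correct", last two not
print(round(semantic_ece(conf, sims), 3))  # 0.1
```

Swapping the similarity-based `correct` for an exact n-gram match is what makes conventional ECE misleading for captioning: a semantically right caption with different wording gets scored as wrong.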

PARAN Persona-Augmented Review ANswering system on Food Delivery Review Dataset

Authors: Moonsoo Park, Jeongseok Yun, Bohyung Kim

2025-12-10

http://arxiv.org/abs/2512.10148v1

Personalized review response generation presents a significant challenge in domains where user information is limited, such as food delivery platforms. While large language models (LLMs) offer powerful text generation capabilities, they often produce generic responses when lacking contextual user data, reducing engagement and effectiveness. In this work, we propose a two-stage prompting framework that infers both explicit (e.g., user-stated preferences) and implicit (e.g., demographic or stylistic cues) personas directly from short review texts. These inferred persona attributes are then incorporated into the response generation prompt to produce user-tailored replies. To encourage diverse yet faithful generations, we adjust decoding temperature during inference. We evaluate our method using a real-world dataset collected from a Korean food delivery app, and assess its impact on precision, diversity, and semantic consistency. Our findings highlight the effectiveness of persona-augmented prompting in enhancing the relevance and personalization of automated responses without requiring model fine-tuning.

Interpretable Embeddings with Sparse Autoencoders A Data Analysis Toolkit

Authors: Nick Jiang, Xiaoqing Sun, Lisa Dunlap, Lewis Smith, Neel Nanda

2025-12-10

http://arxiv.org/abs/2512.10092v1

Analyzing large-scale text corpora is a core challenge in machine learning, crucial for tasks like identifying undesirable model behaviors or biases in training data. Current methods often rely on costly LLM-based techniques (e.g. annotating dataset differences) or dense embedding models (e.g. for clustering), which lack control over the properties of interest. We propose using sparse autoencoders (SAEs) to create SAE embeddings: representations whose dimensions map to interpretable concepts. Through four data analysis tasks, we show that SAE embeddings are more cost-effective and reliable than LLMs and more controllable than dense embeddings. Using the large hypothesis space of SAEs, we can uncover insights such as (1) semantic differences between datasets and (2) unexpected concept correlations in documents. For instance, by comparing model responses, we find that Grok-4 clarifies ambiguities more often than nine other frontier models. Relative to LLMs, SAE embeddings uncover bigger differences at 2-8x lower cost and identify biases more reliably. Additionally, SAE embeddings are controllable: by filtering concepts, we can (3) cluster documents along axes of interest and (4) outperform dense embeddings on property-based retrieval. Using SAE embeddings, we study model behavior with two case studies: investigating how OpenAI model behavior has changed over time and finding "trigger" phrases learned by Tulu-3 (Lambert et al., 2024) from its training data. These results position SAEs as a versatile tool for unstructured data analysis and highlight the neglected importance of interpreting models through their data.
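
The two core operations, encoding a dense embedding into a sparse concept vector and filtering it down to chosen concept dimensions, can be sketched as follows. This is a minimal ReLU-encoder sketch with random weights standing in for a trained SAE; real SAEs are trained with a reconstruction-plus-sparsity objective not shown here, and all names are our own.

```python
import numpy as np

def sae_embed(x, w_enc, b_enc):
    """Encode dense embeddings into sparse, non-negative feature vectors
    whose dimensions are meant to align with interpretable concepts.
    (Minimal encoder sketch; training objective omitted.)"""
    return np.maximum(0.0, x @ w_enc + b_enc)

def filter_concepts(z, keep):
    """Controllability: zero out all feature dimensions except `keep`,
    e.g. to cluster or retrieve documents along chosen axes."""
    out = np.zeros_like(z)
    out[..., keep] = z[..., keep]
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 32))                  # 4 dense document embeddings
w = rng.normal(size=(32, 128)) * 0.1          # hypothetical trained weights
b = -0.5 * np.ones(128)                       # negative bias encourages sparsity
z = sae_embed(x, w, b)
print(z.shape, float((z > 0).mean()))         # mostly-zero feature matrix
zf = filter_concepts(z, keep=[3, 17])         # keep only two concept axes
```

Because each active dimension names a concept, operations like "cluster only along topic axes" reduce to index selection, which is the controllability advantage over dense embeddings.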

Parallel Decoder Transformer Model-Internal Parallel Decoding with Speculative Invariance via Note Conditioning

Authors: Logan Robbins

2025-12-10

http://arxiv.org/abs/2512.10054v1

Autoregressive decoding in Large Language Models (LLMs) is inherently sequential, creating a latency bottleneck that scales linearly with output length. While ``Decomposition-and-Fill'' methods like Skeleton-of-Thought attempt to parallelize generation via external orchestration, they suffer from \textit{coherence drift} due to the lack of cross-stream communication. In this work, we introduce the \textbf{Parallel Decoder Transformer (PDT)}, a parameter-efficient architecture that embeds coordination primitives directly into the inference process of a frozen pre-trained model. Instead of retraining the base model, PDT injects lightweight \textit{Speculative Note Conditioning (SNC)} adapters that allow parallel decoding streams to synchronize via a shared, dynamic latent space. We formulate coordination as a \textit{speculative consensus} problem, where sibling streams broadcast semantic ``notes'' to a global bus, gated by a learned verification head. We validate our approach on a 50,000-step curriculum using a frozen 20B-parameter backbone. Our results demonstrate that PDT achieves effective self-correction, reaching \textbf{77.8\% precision} in coverage prediction and recovering approximate serial semantics without modifying the trunk weights. This establishes PDT as a scalable, efficient alternative to full model fine-tuning for structured parallel generation.

Adaptive Nonparametric Estimation via Kernel Transport on Group Orbits Oracle Inequalities and Minimax Rates

Authors: Jocelyn Nembe

2025-12-10

http://arxiv.org/abs/2512.10049v1

We develop a unified framework for nonparametric functional estimation based on kernel transport along orbits of discrete group actions, which we term \emph{Twin Spaces}. Given a base kernel and a group acting isometrically on the input space , we construct a hierarchy of transported kernels and a penalized model selection scheme satisfying a Kraft inequality. Our main contributions are threefold: (i) we establish non-asymptotic oracle inequalities for the penalized twin-kernel estimator with explicit constants; (ii) we introduce novel twin-regularity classes that capture smoothness along group orbits and prove that our estimator adapts to these classes; (iii) we show that the framework recovers classical minimax-optimal rates in the Euclidean setting while enabling improved rates when the target function exhibits orbital structure. The effective dimension governing the rates is characterized in terms of the quotient , where is the subgroup preserving the base operation. Connections to wavelet methods, geometric quantization, and adaptive computation are discussed.
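
A standard way to build a kernel from a discrete group action, which we assume matches the transport construction sketched in the abstract, is to average the base kernel over the orbit of one argument; with an isometric action this yields a group-invariant positive-definite kernel. The code below is a sketch under that assumption, with a cyclic-shift group acting on R^4.

```python
import numpy as np

def base_kernel(x, y, gamma=1.0):
    # Gaussian RBF base kernel
    return np.exp(-gamma * np.sum((x - y) ** 2))

def transported_kernel(x, y, group_actions, gamma=1.0):
    """Average the base kernel over all group elements applied to x.
    For an isometric action this produces a G-invariant PSD kernel.
    (Sketch of the construction idea; the paper's hierarchy of
    transported kernels and its penalties are richer than this.)"""
    return np.mean([base_kernel(g(x), y, gamma) for g in group_actions])

# Cyclic coordinate shifts act isometrically on R^4 (they permute entries).
shifts = [lambda v, s=s: np.roll(v, s) for s in range(4)]
x = np.array([1.0, 0.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0, 0.0])
k1 = transported_kernel(x, y, shifts)
k2 = transported_kernel(np.roll(x, 1), y, shifts)
print(np.isclose(k1, k2))  # True: the kernel is constant along the orbit of x
```

The invariance check is the point: an estimator built on this kernel automatically exploits smoothness along group orbits, which is the structure the twin-regularity classes formalize.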

SCOPE Language Models as One-Time Teacher for Hierarchical Planning in Text Environments

Authors: Haoye Lu, Pavan Seshadri, Kaheer Suleman

2025-12-10

http://arxiv.org/abs/2512.09897v1

Long-term planning in complex, text-based environments presents significant challenges due to open-ended action spaces, ambiguous observations, and sparse feedback. Recent research suggests that large language models (LLMs) encode rich semantic knowledge about the world, which can be valuable for guiding agents in high-level reasoning and planning across both embodied and purely textual settings. However, existing approaches often depend heavily on querying LLMs during training and inference, making them computationally expensive and difficult to deploy efficiently. In addition, these methods typically employ a pretrained, unaltered LLM whose parameters remain fixed throughout training, providing no opportunity for adaptation to the target task. To address these limitations, we introduce SCOPE (Subgoal-COnditioned Pretraining for Efficient planning), a one-shot hierarchical planner that leverages LLM-generated subgoals only at initialization to pretrain a lightweight student model. Unlike prior approaches that distill LLM knowledge by repeatedly prompting the model to adaptively generate subgoals during training, our method derives subgoals directly from example trajectories. This design removes the need for repeated LLM queries, significantly improving efficiency, though at the cost of reduced explainability and potentially suboptimal subgoals. Despite their suboptimality, our results on the TextCraft environment show that LLM-generated subgoals can still serve as a strong starting point for hierarchical goal decomposition in text-based planning tasks. Compared to the LLM-based hierarchical agent ADaPT (Prasad et al., 2024), which achieves a 0.52 success rate, our method reaches 0.56 and reduces inference time from 164.4 seconds to just 3.0 seconds.

FlipLLM Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning

Authors: Khurram Khalil, Khaza Anuarul Hoque

2025-12-10

http://arxiv.org/abs/2512.09872v1

Generative Artificial Intelligence models, such as Large Language Models (LLMs) and Large Vision Models (VLMs), exhibit state-of-the-art performance but remain vulnerable to hardware-based threats, specifically bit-flip attacks (BFAs). Existing BFA discovery methods lack generalizability and struggle to scale, often failing to analyze the vast parameter space and complex interdependencies of modern foundation models in a reasonable time. This paper proposes FlipLLM, a reinforcement learning (RL) architecture-agnostic framework that formulates BFA discovery as a sequential decision-making problem. FlipLLM combines sensitivity-guided layer pruning with Q-learning to efficiently identify minimal, high-impact bit sets that can induce catastrophic failure. We demonstrate the effectiveness and generalizability of FlipLLM by applying it to a diverse set of models, including prominent text-only LLMs (GPT-2 Large, LLaMA 3.1 8B, and DeepSeek-V2 7B), VLMs such as LLaVA 1.6, and datasets such as MMLU, MMLU-Pro, VQAv2, and TextVQA. Our results show that FlipLLM can identify critical bits that are vulnerable to BFAs up to 2.5x faster than SOTA methods. We demonstrate that flipping the FlipLLM-identified bits plummets the accuracy of LLaMA 3.1 8B from 69.9% to ~0.2%, and LLaVA's VQA score from 78% to almost 0%, by flipping as few as 5 and 7 bits, respectively. Further analysis reveals that applying standard hardware protection mechanisms, such as ECC SECDED, to the FlipLLM-identified bit locations completely mitigates the BFA impact, demonstrating the practical value of our framework in guiding hardware-level defenses. FlipLLM offers the first scalable and adaptive methodology for exploring the BFA vulnerability of both language and multimodal foundation models, paving the way for comprehensive hardware-security evaluation.
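
The sequential-decision framing can be illustrated with a toy tabular Q-learner that picks which bit to flip next and is rewarded by a (simulated) accuracy drop. Everything here is an illustrative assumption: FlipLLM's state, action, and reward design on real models is far richer, and the sensitivity vector below is synthetic.

```python
import numpy as np

def q_learn_bit_targets(sensitivity, n_select=2, episodes=600,
                        alpha=0.3, eps=0.3, seed=0):
    """Tabular Q-learning sketch of BFA discovery: sequentially pick bits
    to flip, rewarded by a toy accuracy-drop signal, to find a minimal
    high-impact set. State = how many bits have been chosen so far."""
    rng = np.random.default_rng(seed)
    n_bits = len(sensitivity)
    q = np.zeros((n_select, n_bits))
    for _ in range(episodes):
        chosen = []
        for step in range(n_select):
            # epsilon-greedy action: which bit to flip at this step
            a = rng.integers(n_bits) if rng.random() < eps else int(q[step].argmax())
            reward = 0.0 if a in chosen else sensitivity[a]   # no credit for repeats
            q[step, a] += alpha * (reward - q[step, a])        # 1-step update
            chosen.append(a)
    return sorted(int(q[s].argmax()) for s in range(n_select))

# Toy "model": bits 5 and 9 are the catastrophic ones.
sens = np.full(12, 0.01)
sens[5], sens[9] = 0.9, 0.7
print(q_learn_bit_targets(sens))  # [5, 9]
```

The repeat penalty is what pushes the learner toward a *minimal* set: re-flipping an already-chosen bit earns nothing, so the greedy policy at each step targets a distinct high-impact bit.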

Mitigating Social Bias in English and Urdu Language Models Using PRM-Guided Candidate Selection and Sequential Refinement

Authors: Muneeb Ur Raheem Khan

2025-12-10

http://arxiv.org/abs/2512.09854v1

Large language models (LLMs) increasingly mediate human communication, decision support, content creation, and information retrieval. Despite impressive fluency, these systems frequently produce biased or stereotypical content, especially when prompted with socially sensitive language. A growing body of research has demonstrated that such biases disproportionately affect low-resource languages, where training data is limited and culturally unrepresentative. This paper presents a comprehensive study of inference-time bias mitigation, a strategy that avoids retraining or fine-tuning and instead operates directly on model outputs. Building on preference-ranking models (PRMs), we introduce a unified evaluation framework comparing three methods: (1) baseline single-word generation, (2) PRM-Select best-of-N sampling, and (3) PRM-Sequential refinement guided by PRM critiques. We evaluate these techniques across 200 English prompts and their Urdu counterparts, designed to reflect socio-cultural contexts relevant to gender, ethnicity, religion, nationality, disability, profession, age, and socioeconomic categories. Using GPT-3.5 as a candidate generator and GPT-4o-mini as a PRM-based bias and utility scorer, we provide an extensive quantitative analysis of bias reduction, utility preservation, and cross-lingual disparities. Our findings show: (a) substantial gains over the baseline for both languages; (b) consistently lower fairness scores for Urdu across all methods, highlighting structural inequities in multilingual LLM training; and (c) distinct improvement trajectories between PRM-Select and PRM-Sequential. The study contributes an extensible methodology, interpretable metrics, and cross-lingual comparisons that can support future work on fairness evaluation in low-resource languages.
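
PRM-Select best-of-N reduces to: sample N candidates, score each with the preference-ranking model, keep the best. The sketch below mocks the PRM with a trivial callable; in the paper the scorer is GPT-4o-mini with a bias/utility rubric, and `mock_prm` is purely our stand-in.

```python
def prm_select(candidates, prm_score):
    """Best-of-N selection: return the candidate the preference-ranking
    model (PRM) scores highest. (Sketch of PRM-Select; the real PRM is
    an LLM judge, mocked here.)"""
    return max(candidates, key=prm_score)

def mock_prm(text):
    """Hypothetical PRM: penalize an overgeneralizing word as a crude
    proxy for stereotyped phrasing."""
    score = 1.0
    if "always" in text:
        score -= 0.5
    return score

cands = ["Engineers always work late.", "Work habits vary by individual."]
print(prm_select(cands, mock_prm))  # Work habits vary by individual.
```

PRM-Sequential differs in that the PRM's critique is fed back to the generator for another revision round rather than only ranking a fixed pool, which explains the distinct improvement trajectories the paper reports.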

RIFT A Scalable Methodology for LLM Accelerator Fault Assessment using Reinforcement Learning

Authors: Khurram Khalil, Muhammad Mahad Khaliq, Khaza Anuarul Hoque

2025-12-10

http://arxiv.org/abs/2512.09829v1

The massive scale of modern AI accelerators presents critical challenges to traditional fault assessment methodologies, which face prohibitive computational costs and provide poor coverage of critical failure modes. This paper introduces RIFT (Reinforcement Learning-guided Intelligent Fault Targeting), a scalable framework that automates the discovery of minimal, high-impact fault scenarios for efficient design-time fault assessment. RIFT transforms the complex search for worst-case faults into a sequential decision-making problem, combining hybrid sensitivity analysis for search space pruning with reinforcement learning to intelligently generate minimal, high-impact test suites. Evaluated on billion-parameter Large Language Model (LLM) workloads using NVIDIA A100 GPUs, RIFT achieves a \textbf{2.2x} fault assessment speedup over evolutionary methods and reduces the required test vector volume by over \textbf{99\%} compared to random fault injection, all while achieving \textbf{superior fault coverage}. The proposed framework also provides actionable data to enable intelligent hardware protection strategies, demonstrating that RIFT-guided selective error correction code provides a \textbf{12.8x} improvement in \textbf{cost-effectiveness} (coverage per unit area) compared to uniform triple modular redundancy protection. RIFT automatically generates UVM-compliant verification artifacts, ensuring its findings are directly actionable and integrable into commercial RTL verification workflows.

Kinematics of Distant Milky Way Halo RR Lyrae Stars out to 160 kpc

Authors: Yuting Feng, Puragra Guhathakurta, Eric W. Peng, Emily C. Cunningham, Patrick Côté, Laura Ferrarese, Stephen D. J. Gwyn

2025-12-10

http://arxiv.org/abs/2512.09795v1

We present a kinematical study of the outer halo (r_GC approximately 60 to 160 kpc) of the Milky Way based on spectroscopy of 55 RR Lyrae stars obtained with the ESI instrument on the Keck II telescope. Our spectroscopic targets were selected from three photometric surveys: NGVS, DES, and Pan-STARRS1. We derive center-of-mass radial velocities with uncertainties of 6 to 35 km s^-1. The halo velocity dispersion measured from our sample is 70 plus/minus 7 km s^-1. The velocity field shows a possible dipole-like structure, with redshifted northern and blueshifted southern hemispheres. Fitting a Milky Way - Large Magellanic Cloud dipole perturbation model yields a weak or marginal dipole signal with amplitude -30 (+16, -20) km s^-1 and apex direction (l, b) = (-38.2 (+42.4, -31.5), -41.3 (+27.9, -23.8)) deg, along with a bulk radial velocity of -16 plus/minus 11 km s^-1. Although limited by sky coverage and sample size, our results are consistent with the presence of LMC-induced disequilibrium in the distant halo beyond 100 kpc. In addition to the 55 RR Lyrae stars, our spectroscopy reveals that 10 additional photometrically selected RR Lyrae candidates are actually quasar or blazar contaminants, highlighting the need for caution regarding such contaminants in sparsely sampled photometric surveys. Our study demonstrates that single-epoch spectroscopy of RR Lyrae stars is a viable method for probing the kinematics of the outer halo, and future surveys such as Rubin LSST and DESI-II have the potential to significantly advance this effort.
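
A dipole-plus-bulk velocity model of the kind fitted here can be written as v = A cos(theta) + v_bulk, where theta is the angular distance from a star to the dipole apex. This functional form and the spherical-distance formula are our assumptions about the model implied by the abstract; the fitted amplitude, apex, and bulk values below are just the quoted best-fit numbers used for illustration.

```python
import numpy as np

def dipole_model(l, b, apex_l, apex_b, amp, v_bulk):
    """Line-of-sight velocity under a dipole perturbation plus a bulk
    term: v = amp * cos(theta) + v_bulk, with theta the angular distance
    between the star (l, b) and the dipole apex. (Model-form sketch.)"""
    l, b, al, ab = np.radians([l, b, apex_l, apex_b])
    cos_theta = (np.sin(b) * np.sin(ab)
                 + np.cos(b) * np.cos(ab) * np.cos(l - al))
    return amp * cos_theta + v_bulk

# At the apex the full amplitude applies; 90 deg away only the bulk term remains.
print(dipole_model(-38.2, -41.3, -38.2, -41.3, -30.0, -16.0))           # -46.0
print(round(dipole_model(-38.2, 48.7, -38.2, -41.3, -30.0, -16.0), 6))  # -16.0
```

In a fit, (apex_l, apex_b, amp, v_bulk) would be free parameters optimized against the 55 measured radial velocities; the weak amplitude relative to its uncertainty is what makes the quoted signal marginal.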

Circuits, Features, and Heuristics in Molecular Transformers

Authors: Kristof Varadi, Mark Marosi, Peter Antal

2025-12-10

http://arxiv.org/abs/2512.09757v1

Transformers generate valid and diverse chemical structures, but little is known about the mechanisms that enable these models to capture the rules of molecular representation. We present a mechanistic analysis of autoregressive transformers trained on drug-like small molecules to reveal the computational structure underlying their capabilities across multiple levels of abstraction. We identify computational patterns consistent with low-level syntactic parsing and more abstract chemical validity constraints. Using sparse autoencoders (SAEs), we extract feature dictionaries associated with chemically relevant activation patterns. We validate our findings on downstream tasks and find that mechanistic insights can translate to predictive performance in various practical settings.

BAMBO Construct Ability and Efficiency LLM Pareto Set via Bayesian Adaptive Multi-objective Block-wise Optimization

Authors: Kesheng Chen, Wenjian Luo, Zhenqian Zhu, Yamin Hu, Yiya Xi

2025-12-10

http://arxiv.org/abs/2512.09972v1

Constructing a Pareto set is pivotal for navigating the capability-efficiency trade-offs in Large Language Models (LLMs); however, existing merging techniques remain inadequate for this task. Coarse-grained, model-level methods yield only a sparse set of suboptimal solutions, while fine-grained, layer-wise approaches suffer from the "curse of dimensionality," rendering the search space computationally intractable. To resolve this dichotomy, we propose BAMBO (Bayesian Adaptive Multi-objective Block-wise Optimization), a novel framework that automatically constructs the LLM Pareto set. BAMBO renders the search tractable by introducing a Hybrid Optimal Block Partitioning strategy. Formulated as a 1D clustering problem, this strategy leverages a dynamic programming approach to optimally balance intra-block homogeneity and inter-block information distribution, thereby dramatically reducing dimensionality without sacrificing critical granularity. The entire process is automated within an evolutionary loop driven by the q-Expected Hypervolume Improvement (qEHVI) acquisition function. Experiments demonstrate that BAMBO discovers a superior and more comprehensive Pareto frontier than baselines, enabling agile model selection tailored to diverse operational constraints. Code is available at: https://github.com/xin8coder/BAMBO.
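
The "1D clustering via dynamic programming" step can be sketched as optimal contiguous partitioning of per-layer scores into k blocks minimizing within-block squared deviation. This captures only the intra-block homogeneity term; BAMBO's actual objective also balances inter-block information distribution, and the layer scores below are synthetic.

```python
import numpy as np

def optimal_blocks(values, k):
    """Partition an ordered sequence of per-layer scores into k contiguous
    blocks minimizing total within-block SSE, via dynamic programming --
    the 1D clustering view of block partitioning. (Homogeneity-only sketch.)"""
    v = np.asarray(values, dtype=float)
    n = len(v)
    pre = np.concatenate([[0.0], v.cumsum()])
    pre2 = np.concatenate([[0.0], (v * v).cumsum()])

    def cost(i, j):  # sum of squared deviations of v[i:j] from its mean
        s, s2, m = pre[j] - pre[i], pre2[j] - pre2[i], j - i
        return s2 - s * s / m

    dp = np.full((k + 1, n + 1), np.inf)
    cut = np.zeros((k + 1, n + 1), dtype=int)
    dp[0, 0] = 0.0
    for b in range(1, k + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):
                c = dp[b - 1, i] + cost(i, j)
                if c < dp[b, j]:
                    dp[b, j], cut[b, j] = c, i
    bounds, j = [], n          # recover block boundaries by backtracking
    for b in range(k, 0, -1):
        i = cut[b, j]
        bounds.append((i, j))
        j = i
    return bounds[::-1]

# Layers with three clear score regimes -> three blocks.
scores = [0.1, 0.12, 0.11, 0.9, 0.95, 0.5, 0.52]
print(optimal_blocks(scores, 3))  # [(0, 3), (3, 5), (5, 7)]
```

With layers grouped into k blocks, the evolutionary qEHVI loop searches over k merge coefficients instead of one per layer, which is the dimensionality reduction the abstract describes.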

Mixture of Lookup Key-Value Experts

Authors: Zongcheng Wang

2025-12-10

http://arxiv.org/abs/2512.09723v1

Recent research has developed several LLM architectures suitable for inference on end-user devices, such as the Mixture of Lookup Experts (MoLE)~\parencite{jie_mixture_2025}. A key feature of MoLE is that each token id is associated with a dedicated group of experts. For a given input, only the experts corresponding to the input token id will be activated. Since the communication overhead of loading this small number of activated experts into RAM during inference is negligible, expert parameters can be offloaded to storage, making MoLE suitable for resource-constrained devices. However, MoLE's context-independent expert selection mechanism, based solely on input ids, may limit model performance. To address this, we propose the \textbf{M}ixture \textbf{o}f \textbf{L}ookup \textbf{K}ey-\textbf{V}alue Experts (\textbf{MoLKV}) model. In MoLKV, each expert is structured as a key-value pair. For a given input, the input-derived query interacts with the cached key-value experts from the current sequence, generating a context-aware expert output. This context-aware mechanism alleviates the limitation of MoLE, and experimental results demonstrate that MoLKV achieves significantly lower validation loss in small-scale evaluations.
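
The context-aware expert output can be sketched as attention of the input-derived query over the key-value experts gathered for the current sequence. Dimensions, scaling, and all names below are our assumptions for illustration; the abstract describes the mechanism only at this level.

```python
import numpy as np

def molkv_expert_output(query, expert_keys, expert_values):
    """Context-aware expert output: the input-derived query attends over
    the key-value experts cached for the current sequence, instead of a
    lookup fixed by token id alone. (Minimal mechanism sketch.)"""
    d = query.shape[-1]
    scores = expert_keys @ query / np.sqrt(d)   # one score per cached expert
    w = np.exp(scores - scores.max())
    w /= w.sum()                                # softmax over experts
    return w @ expert_values                    # weighted mix of expert values

rng = np.random.default_rng(0)
n_cached, d = 6, 8              # experts looked up for the sequence's tokens
q = rng.normal(size=d)
K = rng.normal(size=(n_cached, d))
V = rng.normal(size=(n_cached, d))
out = molkv_expert_output(q, K, V)
print(out.shape)  # (8,)
```

The storage story is unchanged from MoLE: only the few experts indexed by the sequence's token ids are loaded from storage, but their contribution is now reweighted by context through the query.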

Unconsciously Forget Mitigating Memorization Without Knowing What is being Memorized

Authors: Er Jin, Yang Zhang, Yongli Mou, Yanfei Dong, Stefan Decker, Kenji Kawaguchi, Johannes Stegmaier

2025-12-10

http://arxiv.org/abs/2512.09687v1

Recent advances in generative models have demonstrated an exceptional ability to produce highly realistic images. However, previous studies show that generated images often resemble the training data, and this problem becomes more severe as the model size increases. Memorizing training data can lead to legal challenges, including copyright infringement, violations of portrait rights, and trademark violations. Existing approaches to mitigating memorization mainly focus on manipulating the denoising sampling process to steer image embeddings away from the memorized embedding space or employ unlearning methods that require training on datasets containing specific sets of memorized concepts. However, existing methods often incur substantial computational overhead during sampling, or focus narrowly on removing one or more groups of target concepts, imposing a significant limitation on their scalability. To understand and mitigate these problems, our work, UniForget, offers a new perspective on understanding the root cause of memorization. Our work demonstrates that specific parts of the model are responsible for copyrighted content generation. By applying model pruning, we can effectively suppress the probability of generating copyrighted content without targeting specific concepts while preserving the general generative capabilities of the model. Additionally, we show that our approach is both orthogonal and complementary to existing unlearning methods, thereby highlighting its potential to improve current unlearning and de-memorization techniques.

d-TreeRPO Towards More Reliable Policy Optimization for Diffusion Language Models

Authors: Leyi Pan, Shuchang Tao, Yunpeng Zhai, Zheyu Fu, Liancheng Fang, Minghua He, Lingzhe Zhang, Zhaoyang Liu, Bolin Ding, Aiwei Liu, Lijie Wen

2025-12-10

http://arxiv.org/abs/2512.09675v1

Reliable reinforcement learning (RL) for diffusion large language models (dLLMs) requires both accurate advantage estimation and precise estimation of prediction probabilities. Existing RL methods for dLLMs fall short in both aspects: they rely on coarse or unverifiable reward signals, and they estimate prediction probabilities without accounting for the bias relative to the true, unbiased expected prediction probability that properly integrates over all possible decoding orders. To mitigate these issues, we propose \emph{d}-TreeRPO, a reliable RL framework for dLLMs that leverages tree-structured rollouts and bottom-up advantage computation based on verifiable outcome rewards to provide fine-grained and verifiable step-wise reward signals. When estimating the conditional transition probability from a parent node to a child node, we theoretically analyze the estimation error between the unbiased expected prediction probability and the estimate obtained via a single forward pass, and find that higher prediction confidence leads to lower estimation error. Guided by this analysis, we introduce a time-scheduled self-distillation loss during training that enhances prediction confidence in later training stages, thereby enabling more accurate probability estimation and improved convergence. Experiments show that \emph{d}-TreeRPO outperforms existing baselines and achieves significant gains on multiple reasoning benchmarks, including +86.2 on Sudoku, +51.6 on Countdown, +4.5 on GSM8K, and +5.3 on Math500. Ablation studies and computational cost analyses further demonstrate the effectiveness and practicality of our design choices.
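
The tree-structured rollout with bottom-up value computation can be sketched as: leaves carry verifiable outcome rewards, each internal node's value is the mean of its children, and a child's step-wise advantage is its value minus its parent's. This is a generic tree-backup sketch under our assumptions; d-TreeRPO's exact advantage estimator may differ.

```python
def backup_values(tree, rewards):
    """Bottom-up value computation on a rollout tree: leaf values are
    verified outcome rewards, internal values are child means, and a
    child's advantage is value(child) - value(parent). (Sketch.)"""
    values = dict(rewards)                        # leaf -> reward
    def value(node):
        if node in values:
            return values[node]
        v = sum(value(c) for c in tree[node]) / len(tree[node])
        values[node] = v
        return v
    value("root")
    advantages = {
        c: values[c] - values[p]
        for p, kids in tree.items() for c in kids
    }
    return values, advantages

# Tiny rollout tree: root branches into two partial generations.
tree = {"root": ["a", "b"], "a": ["a1", "a2"]}
rewards = {"a1": 1.0, "a2": 0.0, "b": 0.0}        # verified outcomes at leaves
vals, adv = backup_values(tree, rewards)
print(vals["root"], adv["a"])  # 0.25 0.25
```

Because every advantage traces back to a verifiable leaf reward, each intermediate generation step gets a fine-grained credit signal rather than a single sequence-level reward.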

RIS-Assisted Coordinated Multi-Point ISAC for Low-Altitude Sensing Coverage

Authors: Ying Zhang, Zeqi Hao, Tingting Zhang

2025-12-10

http://arxiv.org/abs/2512.09625v1

The low-altitude economy (LAE) has emerged and developed in various fields, which has gained considerable interest. To ensure the security of LAE, it is essential to establish a proper sensing coverage scheme for monitoring the unauthorized targets. Introducing integrated sensing and communication (ISAC) into cellular networks is a promising solution that enables coordinated multiple base stations (BSs) to significantly enhance sensing performance and extend coverage. Meanwhile, deploying a reconfigurable intelligent surface (RIS) can mitigate signal blockages between BSs and low-altitude targets in urban areas. Therefore, this paper focuses on the low-altitude sensing coverage problem in RIS-assisted coordinated multi-point ISAC networks, where a RIS is employed to enable multiple BSs to sense a prescribed region while serving multiple communication users. A joint beamforming and phase shifts design is proposed to minimize the total transmit power while guaranteeing sensing signal-to-noise ratio and communication spectral efficiency. To tackle this non-convex optimization problem, an efficient algorithm is proposed by using the alternating optimization and semi-definite relaxation techniques. Numerical results demonstrate the superiority of our proposed scheme over the baseline schemes.

System Report for CCL25-Eval Task 10 Prompt-Driven Large Language Model Merge for Fine-Grained Chinese Hate Speech Detection

Authors: Binglin Wu, Jiaxiu Zou, Xianneng Li

2025-12-10

http://arxiv.org/abs/2512.09563v1

The proliferation of hate speech on Chinese social media poses urgent societal risks, yet traditional systems struggle to decode context-dependent rhetorical strategies and evolving slang. To bridge this gap, we propose a novel three-stage LLM-based framework: Prompt Engineering, Supervised Fine-tuning, and LLM Merging. First, context-aware prompts are designed to guide LLMs in extracting implicit hate patterns. Next, task-specific features are integrated during supervised fine-tuning to enhance domain adaptation. Finally, merging fine-tuned LLMs improves robustness against out-of-distribution cases. Evaluations on the STATE-ToxiCN benchmark validate the framework's effectiveness, demonstrating superior performance over baseline methods in detecting fine-grained hate speech.

High-throughput characterization of snap-through stability boundaries of bistable beams in a programmable rotating platform

Authors: Eduardo Gutierrez-Prieto, Gilad Yakir, Pedro M. Reis

2025-12-10

http://arxiv.org/abs/2512.09544v1

We introduce a high-throughput platform that enables simultaneous, parallel testing of six bistable beams via programmable motion of a rotating disk. By prescribing harmonic angular dynamics, the platform explores the phase space of angular velocity and acceleration, producing continuously varying centrifugal and Euler force fields that act as tunable body forces in our specimens. Image processing extracts beam kinematics with sub-pixel accuracy, enabling precise identification of snap-through events. By testing six beams in parallel, the platform allows systematic variation of beam thickness, pre-compression, tilt angle, and clamp orientations across 65 distinct configurations, generating 23,400 individual experiments. We construct stability boundaries and quantitatively parameterize them as parabolic functions, characterized by a vertical offset and a curvature parameter. Tilt angle provides the most robust mechanism for tuning the curvature parameter, while beam thickness and pre-compression modulate vertical offset. Modal decomposition analysis reveals that antisymmetric clamp configurations can trigger mode switching, in which competing geometric and inertial effects drive transitions through different deformation pathways. Our work establishes a scalable experimental framework for high-throughput characterization of dynamic nonlinear instabilities in mechanics. The complete experimental dataset is made publicly available to support data-driven design and machine learning models for nonlinear mechanics with applications to bistability-based metamaterials, mechanical memory, and electronics-free sensing systems.
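
Parameterizing a stability boundary by a vertical offset and a curvature parameter amounts to a least-squares parabola fit. The sketch below fits boundary = c * omega^2 + v0 to synthetic data; the variable names and the choice of angular velocity as the abscissa are our assumptions about the parameterization, which the abstract describes only qualitatively.

```python
import numpy as np

def fit_parabolic_boundary(omega, boundary):
    """Fit a snap-through stability boundary as a parabola:
    boundary(omega) = c * omega**2 + v0, returning the curvature
    parameter c and vertical offset v0 -- the two quantities used to
    parameterize boundaries. (Least-squares sketch; names assumed.)"""
    A = np.stack([np.asarray(omega) ** 2, np.ones(len(omega))], axis=1)
    (c, v0), *_ = np.linalg.lstsq(A, np.asarray(boundary), rcond=None)
    return c, v0

# Synthetic boundary with curvature 0.5 and offset 2.0.
om = np.linspace(-3, 3, 25)
bd = 0.5 * om ** 2 + 2.0
c, v0 = fit_parabolic_boundary(om, bd)
print(round(c, 6), round(v0, 6))  # 0.5 2.0
```

With each of the 65 configurations reduced to a (c, v0) pair, statements like "tilt angle tunes curvature while thickness and pre-compression shift the offset" become comparisons of fitted coefficients across configurations.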

Masked Registration and Autoencoding of CT Images for Predictive Tibia Reconstruction

Authors: Hongyou Zhou, Cederic Aßmann, Alaa Bejaoui, Heiko Tzschätzsch, Mark Heyland, Julian Zierke, Niklas Tuttle, Sebastian Hölzl, Timo Auer, David A. Back, Marc Toussaint

2025-12-10

http://arxiv.org/abs/2512.09525v1

Surgical planning for complex tibial fractures can be challenging for surgeons, as the 3D structure of the desired eventual bone alignment may be difficult to imagine. To assist in such planning, we address the challenge of predicting a patient-specific reconstruction target from a CT of the fractured tibia. Our approach combines neural registration and autoencoder models. Specifically, we first train a modified spatial transformer network (STN) to register a raw CT to a standardized coordinate system of a jointly trained tibia prototype. Subsequently, various autoencoder (AE) architectures are trained to model healthy tibial variations. Both the STN and AE models are further designed to be robust to masked input, allowing us to apply them to fractured CTs and arrive at a prediction of the patient-specific healthy bone in standard coordinates. Our contributions include: i) a 3D-adapted STN for global spatial registration, ii) a comparative analysis of AEs for bone CT modeling, and iii) the extension of both to handle masked inputs for predictive generation of healthy bone structures. Project page: https://github.com/HongyouZhou/repair
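As a toy illustration of the masked-input principle (not the authors' STN or AE architectures), a linear PCA "autoencoder" fitted on complete samples can predict the masked portion of a new sample by solving for its latent code on the visible coordinates only. The data and dimensions below are invented.

```python
import numpy as np

# "Healthy tibiae" are vectors on a 2-D linear subspace of R^16. A PCA model
# fitted on complete data fills in the masked half of a new sample from its
# visible half, mirroring the masked-input robustness described above.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 16))                  # latent-to-observation map
train = rng.normal(size=(500, 2)) @ basis         # complete training samples

U, _, _ = np.linalg.svd(train.T @ train)
P = U[:, :2]                                      # learned 16x2 "decoder"

x = (rng.normal(size=(1, 2)) @ basis).ravel()     # new complete sample
visible = np.arange(8)                            # entries 8..15 are "fractured"
code, *_ = np.linalg.lstsq(P[visible], x[visible], rcond=None)
x_hat = P @ code                                  # full predicted sample
```

Because the sample lies exactly on the learned subspace, the masked entries are recovered essentially perfectly; real bone CTs of course require the nonlinear AEs the paper compares.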

Advancing LLM-Based Security Automation with Customized Group Relative Policy Optimization for Zero-Touch Networks

Authors: Xinye Cao, Yihan Lin, Guoshun Nan, Qinchuan Zhou, Yuhang Luo, Yurui Gao, Zeliang Zhang, Haolang Lu, Qimei Cui, Yanzhao Hou, Xiaofeng Tao, Tony Q. S. Quek

2025-12-10

http://arxiv.org/abs/2512.09485v1

Zero-Touch Networks (ZTNs) represent a transformative paradigm toward fully automated and intelligent network management, providing the scalability and adaptability required for the complexity of sixth-generation (6G) networks. However, the distributed architecture, high openness, and deep heterogeneity of 6G networks expand the attack surface and pose unprecedented security challenges. To address this, security automation aims to enable intelligent security management across dynamic and complex environments, serving as a key capability for securing 6G ZTNs. Despite its promise, implementing security automation in 6G ZTNs presents two primary challenges: 1) automating the lifecycle from security strategy generation to validation and update under real-world, parallel, and adversarial conditions, and 2) adapting security strategies to evolving threats and dynamic environments. This motivates us to propose SecLoop and SA-GRPO. SecLoop constitutes the first fully automated framework that integrates large language models (LLMs) across the entire lifecycle of security strategy generation, orchestration, response, and feedback, enabling intelligent and adaptive defenses in dynamic network environments, thus tackling the first challenge. Furthermore, we propose SA-GRPO, a novel security-aware group relative policy optimization algorithm that iteratively refines security strategies by contrasting group feedback collected from parallel SecLoop executions, thereby addressing the second challenge. Extensive real-world experiments on five benchmarks, including 11 MITRE ATT&CK processes and over 20 types of attacks, demonstrate the superiority of the proposed SecLoop and SA-GRPO. We will release our platform to the community, facilitating the advancement of security automation toward next-generation networks.
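The core of group relative policy optimization, which SA-GRPO customizes, is a critic-free advantage computed by normalizing each rollout's reward against its group's statistics. A minimal sketch (SA-GRPO's security-aware extensions are not reproduced here):

```python
# Group-relative advantages as used in GRPO-style training: each rollout's
# reward is standardized against its own group's mean and standard deviation,
# so no learned value critic is needed.
def group_relative_advantages(rewards, eps=1e-8):
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical feedback scores from four parallel SecLoop-style executions
adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

The best and worst executions receive symmetric positive and negative advantages, which is what lets the policy contrast strategies across the parallel group.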

An Efficient Interaction Human-AI Synergy System Bridging Visual Awareness and Large Language Model for Intensive Care Units

Authors: Yibowen Zhao, Yiming Cao, Zhiqi Shen, Juan Du, Yonghui Xu, Lizhen Cui, Cyril Leung

2025-12-10

http://arxiv.org/abs/2512.09473v1

Intensive Care Units (ICUs) are critical environments characterized by high-stakes monitoring and complex data management. However, current practices often rely on manual data transcription and fragmented information systems, introducing potential risks to patient safety and operational efficiency. To address these issues, we propose a human-AI synergy system based on a cloud-edge-end architecture, which integrates visual-aware data extraction and semantic interaction mechanisms. Specifically, a visual-aware edge module non-invasively captures real-time physiological data from bedside monitors, reducing manual entry errors. To improve accessibility to fragmented data sources, a semantic interaction module, powered by a Large Language Model (LLM), enables physicians to perform efficient and intuitive voice-based queries over structured patient data. The hierarchical cloud-edge-end deployment ensures low-latency inference and scalable system performance. Our system reduces the cognitive burden on ICU nurses and physicians and demonstrates promising potential for broader applications in intelligent healthcare systems.

LiePrune: Lie Group and Quantum Geometric Dual Representation for One-Shot Structured Pruning of Quantum Neural Networks

Authors: Haijian Shao, Bowen Yang, Wei Liu, Xing Deng, Yingtao Jiang

2025-12-10

http://arxiv.org/abs/2512.09469v1

Quantum neural networks (QNNs) and parameterized quantum circuits (PQCs) are key building blocks for near-term quantum machine learning. However, their scalability is constrained by excessive parameters, barren plateaus, and hardware limitations. We propose LiePrune, the first mathematically grounded one-shot structured pruning framework for QNNs that leverages Lie group structure and quantum geometric information. Each gate is jointly represented in a Lie group--Lie algebra dual space and a quantum geometric feature space, enabling principled redundancy detection and aggressive pruning. Experiments on quantum classification (MNIST, FashionMNIST), quantum generative modeling (Bars-and-Stripes), and quantum chemistry (LiH VQE) show that LiePrune achieves substantial parameter reduction with negligible or even improved task performance, while providing provable guarantees on redundancy detection, functional approximation, and computational complexity.

Cytoplasmic Strings Analysis in Human Embryo Time-Lapse Videos using Deep Learning Framework

Authors: Anabia Sohail, Mohamad Alansari, Ahmed Abughali, Asmaa Chehab, Abdelfatah Ahmed, Divya Velayudhan, Sajid Javed, Hasan Al Marzouqi, Ameena Saad Al-Sumaiti, Junaid Kashir, Naoufel Werghi

2025-12-10

http://arxiv.org/abs/2512.09461v1

Infertility is a major global health issue, and while in-vitro fertilization (IVF) has improved treatment outcomes, embryo selection remains a critical bottleneck. Time-lapse imaging (TLI) enables continuous, non-invasive monitoring of embryo development, yet most automated assessment methods rely solely on conventional morphokinetic features and overlook emerging biomarkers. Cytoplasmic strings (CS), thin filamentous structures connecting the inner cell mass and trophectoderm in expanded blastocysts, have been associated with faster blastocyst formation, higher blastocyst grades, and improved viability. However, CS assessment currently depends on manual visual inspection, which is labor-intensive, subjective, and severely hampered by the subtle visual appearance of CS. In this work, we present, to the best of our knowledge, the first computational framework for CS analysis in human IVF embryos. We first design a human-in-the-loop annotation pipeline to curate a biologically validated CS dataset from TLI videos, comprising 13,568 frames with highly imbalanced CS-positive instances. Building on this dataset, we propose a two-stage deep learning framework that (i) classifies CS presence at the frame level and (ii) localizes CS regions in positive cases. To address severe imbalance and feature uncertainty, we introduce the Novel Uncertainty-aware Contractive Embedding (NUCE) loss, which couples confidence-aware reweighting with an embedding contraction term to form compact, well-separated class clusters. NUCE consistently improves F1-score across five backbones, while RF-DETR-based localization achieves state-of-the-art (SOTA) detection performance for thin, low-contrast CS structures. The source code will be made publicly available at: https://github.com/HamadYA/CS_Detection.
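The two ingredients attributed to the NUCE loss can be sketched as follows; the focal-style reweighting and the choice of `gamma` and `lam` are our assumptions, not the paper's exact formulation.

```python
import numpy as np

# Sketch of the two NUCE ingredients named in the abstract: (i) a
# confidence-aware reweighting of the per-sample loss (focal-style
# down-weighting of easy samples -- our assumption) and (ii) an embedding
# contraction term pulling each embedding toward its class centroid.
def nuce_like_loss(probs, labels, embeds, gamma=2.0, lam=0.1):
    p_true = probs[np.arange(len(labels)), labels]
    ce = -np.log(p_true + 1e-12)
    weighted = ((1.0 - p_true) ** gamma) * ce      # confidence-aware reweighting
    contract = 0.0
    for c in np.unique(labels):                    # per-class contraction
        e = embeds[labels == c]
        contract += float(np.mean(np.sum((e - e.mean(axis=0)) ** 2, axis=1)))
    return float(weighted.mean()) + lam * contract
```

Spreading a class's embeddings apart raises the contraction term while leaving the classification term unchanged, which is the pressure toward compact, well-separated clusters described above.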

CourtPressGER: A German Court Decision to Press Release Summarization Dataset

Authors: Sebastian Nagl, Mohamed Elganayni, Melanie Pospisil, Matthias Grabmair

2025-12-10

http://arxiv.org/abs/2512.09434v1

Official court press releases from Germany's highest courts present and explain judicial rulings to the public as well as to expert audiences. Prior NLP efforts emphasize technical headnotes, ignoring citizen-oriented summarization needs. We introduce CourtPressGER, a 6.4k dataset of triples: rulings, human-drafted press releases, and synthetic prompts for LLMs to generate comparable releases. This benchmark trains and evaluates LLMs in generating accurate, readable summaries from long judicial texts. We benchmark small and large LLMs using reference-based metrics, factual-consistency checks, LLM-as-judge evaluation, and expert ranking. Large LLMs produce high-quality drafts with minimal hierarchical performance loss; smaller models require hierarchical setups for long judgments. Initial benchmarks show varying model performance, with human-drafted releases ranking highest.

FUSER: Feed-Forward MUltiview 3D Registration Transformer and SE(3) Diffusion Refinement

Authors: Haobo Jiang, Jin Xie, Jian Yang, Liang Yu, Jianmin Zheng

2025-12-10

http://arxiv.org/abs/2512.09373v1

Registration of multiview point clouds conventionally relies on extensive pairwise matching to build a pose graph for global synchronization, which is computationally expensive and inherently ill-posed without holistic geometric constraints. This paper proposes FUSER, the first feed-forward multiview registration transformer that jointly processes all scans in a unified, compact latent space to directly predict global poses without any pairwise estimation. To maintain tractability, FUSER encodes each scan into low-resolution superpoint features via a sparse 3D CNN that preserves absolute translation cues, and performs efficient intra- and inter-scan reasoning through a Geometric Alternating Attention module. In particular, we transfer 2D attention priors from off-the-shelf foundation models to enhance 3D feature interaction and geometric consistency. Building upon FUSER, we further introduce FUSER-DF, an SE(3) diffusion refinement framework that corrects FUSER's estimates via denoising in the joint SE(3) space. FUSER acts as a surrogate multiview registration model to construct the denoiser, and a prior-conditioned SE(3) variational lower bound is derived for denoising supervision. Extensive experiments on 3DMatch, ScanNet and ARKitScenes demonstrate that our approach achieves superior registration accuracy and outstanding computational efficiency.

Are Hypervectors Enough? Single-Call LLM Reasoning over Knowledge Graphs

Authors: Yezi Liu, William Youngwoo Chung, Hanning Chen, Calvin Yeung, Mohsen Imani

2025-12-10

http://arxiv.org/abs/2512.09369v1

Recent advances in large language models (LLMs) have enabled strong reasoning over both structured and unstructured knowledge. When grounded on knowledge graphs (KGs), however, prevailing pipelines rely on heavy neural encoders to embed and score symbolic paths, or on repeated LLM calls to rank candidates, leading to high latency, GPU cost, and opaque decisions that hinder faithful, scalable deployment. We propose PathHD, a lightweight and encoder-free KG reasoning framework that replaces neural path scoring with hyperdimensional computing (HDC) and uses only a single LLM call per query. PathHD encodes relation paths into block-diagonal GHRR hypervectors, ranks candidates with blockwise cosine similarity and Top-K selection, and then performs a one-shot LLM adjudication to produce the final answer together with cited supporting paths. Technically, PathHD is built on three ingredients: (i) an order-aware, non-commutative binding operator for path composition, (ii) a calibrated similarity for robust hypervector-based retrieval, and (iii) a one-shot adjudication step that preserves interpretability while eliminating per-path LLM scoring. On WebQSP, CWQ, and the GrailQA split, PathHD (i) attains comparable or better Hits@1 than strong neural baselines while using one LLM call per query; (ii) substantially reduces end-to-end latency and GPU memory thanks to encoder-free retrieval; and (iii) delivers faithful, path-grounded rationales that improve error diagnosis and controllability. These results indicate that carefully designed HDC representations provide a practical substrate for efficient KG-LLM reasoning, offering a favorable accuracy-efficiency-interpretability trade-off.
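The order-aware, non-commutative binding idea can be illustrated with a simplified permute-then-multiply operator standing in for the paper's block-diagonal GHRR binding (a substitution on our part): identical paths match exactly, while reversing the relation order yields a nearly orthogonal hypervector.

```python
import numpy as np

# Simplified order-aware path encoding. Rolling the left operand before
# elementwise multiplication breaks commutativity, so the relation order
# of a path is preserved in the encoding. Relation names are hypothetical.
rng = np.random.default_rng(0)
DIM = 4096
relations = {r: rng.choice([-1.0, 1.0], DIM) for r in ("born_in", "capital_of")}

def bind(a, b):
    return np.roll(a, 1) * b                      # non-commutative binding

def encode_path(path):
    hv = relations[path[0]]
    for r in path[1:]:
        hv = bind(hv, relations[r])
    return hv

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

p1 = encode_path(["born_in", "capital_of"])
p2 = encode_path(["capital_of", "born_in"])       # same relations, reversed order
```

Candidate ranking then reduces to cosine comparisons between such hypervectors, which is what makes the retrieval stage encoder-free.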

Infinitesimal containment and sparse factors of iid

Authors: Mikołaj Frączyk

2025-12-10

http://arxiv.org/abs/2512.09301v1

We introduce infinitesimal weak containment for measure-preserving actions of a countable group Γ: an action is infinitesimally contained in another if the statistics of the action of Γ on small-measure subsets of the first can be approximated inside the second. We show that the Bernoulli shift is infinitesimally contained in the left-regular action of Γ. For exact groups, this implies that sparse factor-of-iid subsets are approximately hyperfinite. We use this to quantify a theorem of Chifan--Ioana on measured subrelations of the Bernoulli shift of an exact group. For the proof of infinitesimal containment we define entropy support maps, which take a small subset and assign weights to the coordinates above every point of it, according to how "important" they are for the structure of the set.

GoodSpeed: Optimizing Fair Goodput with Adaptive Speculative Decoding in Distributed Edge Inference

Authors: Phuong Tran, Tzu-Hao Liu, Long Tan Le, Tung-Anh Nguyen, Van Quan La, Eason Yu, Han Shu, Choong Seon Hong, Nguyen H. Tran

2025-12-10

http://arxiv.org/abs/2512.09963v1

Large language models (LLMs) have revolutionized natural language processing, yet their high computational demands pose significant challenges for real-time inference, especially in multi-user serving and resource-constrained environments. Speculative decoding has emerged as a promising technique to accelerate LLM inference by using lightweight draft models to generate candidate tokens, which are subsequently verified by a larger, more accurate model. However, ensuring both high goodput (the effective rate of accepted tokens) and fairness across multiple draft servers cooperating with a central verification server remains an open challenge. This paper introduces GOODSPEED, a novel distributed inference framework that optimizes goodput through adaptive speculative decoding. GOODSPEED employs a central verification server that coordinates a set of heterogeneous draft servers, each running a small language model to generate speculative tokens. To manage resource allocation effectively, GOODSPEED incorporates a gradient scheduling algorithm that dynamically assigns token verification tasks, maximizing a logarithmic utility function to ensure proportional fairness across servers. By processing speculative outputs from all draft servers in parallel, the framework enables efficient collaboration between the verification server and distributed draft generators, improving both latency and throughput. Through rigorous fluid sample path analysis, we show that GOODSPEED converges to the optimal goodput allocation in steady-state conditions and maintains near-optimal performance with provably bounded error under dynamic workloads. These results demonstrate that GOODSPEED provides a scalable, fair, and efficient solution for multi-server speculative decoding in distributed LLM inference systems.
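Maximizing a logarithmic utility under a single verification-budget constraint has a closed-form proportionally fair optimum. The sketch below (weights and budget are illustrative assumptions, not the paper's scheduler) checks that optimum against an equal split.

```python
import math

# Proportional-fairness sketch: choose verification-budget shares x_i that
# maximize the weighted log utility sum_i w_i * log(x_i) subject to
# sum_i x_i = B. With one linear budget constraint the optimum is
# x_i = B * w_i / sum(w). Weights (e.g. per-server token acceptance rates)
# and the budget are hypothetical.
def proportional_fair_split(weights, budget):
    total = sum(weights)
    return [budget * w / total for w in weights]

def log_utility(weights, alloc):
    return sum(w * math.log(x) for w, x in zip(weights, alloc))

weights = [0.6, 0.3, 0.1]          # hypothetical goodput weights per draft server
alloc = proportional_fair_split(weights, budget=100.0)
```

Any other feasible split, including the equal one, attains strictly lower log utility here, which is the sense in which the logarithmic objective enforces proportional fairness.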

Efficient MoE Serving in the Memory-Bound Regime: Balance Activated Experts, Not Tokens

Authors: Yanpeng Yu, Haiyue Ma, Krish Agarwal, Nicolai Oswald, Qijing Huang, Hugo Linsenmaier, Chunhui Mei, Ritchie Zhao, Ritika Borkar, Bita Darvish Rouhani, David Nellans, Ronny Krashinsky, Anurag Khandelwal

2025-12-10

http://arxiv.org/abs/2512.09277v1

Expert Parallelism (EP) permits Mixture of Experts (MoE) models to scale beyond a single GPU. To address load imbalance across GPUs in EP, existing approaches aim to balance the number of tokens each GPU processes. Surprisingly, we find that this objective degrades performance rather than improving it when processing is memory-bound - a common occurrence in MoE serving, especially in the decode phase. Our analysis reveals that balancing the number of tokens processed per GPU increases the number of activated experts, exacerbating memory pressure in the memory-bound regime. We propose Minimum Expert Token ROuting (METRO), a novel token-routing algorithm for high-performance expert-parallel MoE serving in the memory-bound regime that balances the number of activated experts per GPU rather than token counts. METRO achieves near-optimal routing quality with minimal computational overhead by jointly optimizing algorithmic efficiency and leveraging the GPU's parallel processing power. To guarantee routing quality, METRO also employs a novel allGather scheme to gather global top-k knowledge, which has minimal overhead compared to conventional allToAll. Our evaluation of METRO against EPLB on both real systems (vLLM over 8 A100 GPUs) and a proprietary simulator (8-16 B200 GPUs) shows that METRO reduces decode latency by 11-22% and improves total token throughput by 3-21% for Qwen3 and DeepSeek-V3 serving, where prefill and decode phases are co-deployed. In addition, by trading latency headroom for throughput, METRO improves decode throughput by up to 4.11x over EPLB at a fixed latency SLO.
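The intuition that token balancing activates more experts can be shown with a toy count of distinct activated expert instances per GPU. This is an illustration of the phenomenon, not the METRO algorithm itself.

```python
from collections import defaultdict

# Toy setup: in the memory-bound decode regime, a GPU's cost is driven by how
# many *distinct* expert instances it activates (whose weights it must read),
# not by its token count. Expert 0 is replicated on GPUs 0 and 1; expert 1
# lives on GPU 0 and expert 2 on GPU 1.
def activated_instances(assignments):
    """assignments: (gpu, expert_id) pairs, one per routed token."""
    active = defaultdict(set)
    for gpu, expert in assignments:
        active[gpu].add(expert)
    return sum(len(s) for s in active.values())

# Token-balanced routing: expert 0's tokens are split across both replicas.
token_balanced = [(0, 0)] * 4 + [(1, 0)] * 4 + [(0, 1)] * 2 + [(1, 2)] * 2
# Expert-balanced routing: expert 0's tokens go to a single replica.
expert_balanced = [(0, 0)] * 8 + [(0, 1)] * 2 + [(1, 2)] * 2
```

Both routings serve the same 12 token slots, but the token-balanced one touches four expert instances while the expert-balanced one touches three, i.e. less weight traffic when memory bandwidth is the bottleneck.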

Training-free Context-adaptive Attention for Efficient Long Context Modeling

Authors: Zeng You, Yaofo Chen, Shuhai Zhang, Zhijie Qiu, Tingyu Wu, Yingjian Li, Yaowei Wang, Mingkui Tan

2025-12-10

http://arxiv.org/abs/2512.09238v1

Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing tasks. These capabilities stem primarily from the self-attention mechanism, which enables modeling of long-range dependencies. However, the quadratic complexity of self-attention with respect to sequence length poses significant computational and memory challenges, especially as sequence length extends to extremes. While various sparse attention and KV cache compression methods have been proposed to improve efficiency, they often suffer from limitations such as reliance on fixed patterns, inability to handle both the prefilling and decoding stages, or the requirement for additional training. In this paper, we propose Training-free Context-adaptive Attention (TCA-Attention), a training-free sparse attention mechanism that selectively attends to only the informative tokens for efficient long-context inference. Our method consists of two lightweight phases: i) an offline calibration phase that determines head-specific token budgets via a single forward pass, and ii) an online token selection phase that adaptively retains core context tokens using a lightweight redundancy metric. TCA-Attention provides a unified solution that accelerates both prefilling and decoding while reducing the KV cache memory footprint, without requiring parameter updates or architectural changes. Theoretical analysis shows that our approach maintains bounded approximation error. Extensive experiments demonstrate that TCA-Attention achieves a 2.8x speedup and reduces the KV cache by 61% at 128K context length while maintaining performance comparable to full attention across various benchmarks, offering a practical plug-and-play solution for efficient long-context inference.
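An online token-selection phase driven by a lightweight redundancy metric might look like the following sketch, where each key vector is scored by its maximum cosine similarity to the others and the most redundant tokens are dropped to meet a head budget. The metric and budget here are assumptions, not the paper's exact procedure.

```python
import numpy as np

# Budgeted token selection via a redundancy score: a token whose key vector
# is nearly duplicated elsewhere in the context contributes little extra
# information and is dropped first.
def select_tokens(keys, budget):
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    sim = k @ k.T
    np.fill_diagonal(sim, -np.inf)
    redundancy = sim.max(axis=1)              # high = duplicated information
    keep = np.argsort(redundancy)[:budget]    # retain least-redundant tokens
    return np.sort(keep)

rng = np.random.default_rng(0)
keys = rng.normal(size=(6, 8))
keys[5] = keys[0]                             # token 5 duplicates token 0
kept = select_tokens(keys, budget=4)
```

The duplicated pair scores redundancy 1.0 and is evicted first, shrinking the per-head KV cache to the calibrated budget.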

Efficient Feature Compression for Machines with Global Statistics Preservation

Authors: Md Eimran Hossain Eimon, Hyomin Choi, Fabien Racapé, Mateen Ulhaq, Velibor Adzic, Hari Kalva, Borko Furht

2025-12-10

http://arxiv.org/abs/2512.09235v1

The split-inference paradigm divides an artificial intelligence (AI) model into two parts, necessitating the transfer of intermediate feature data between the two halves. Here, effective compression of the feature data becomes vital. In this paper, we employ Z-score normalization to efficiently recover the compressed feature data at the decoder side. To examine the efficacy of our method, the proposed method is integrated into the latest Feature Coding for Machines (FCM) codec standard under development by the Moving Picture Experts Group (MPEG). Our method supersedes the existing scaling method used by the current standard under development, both reducing the overhead bits and improving the end-task accuracy. To further reduce the overhead in certain circumstances, we also propose a simplified method. Experiments show that our proposed method yields a 17.09% reduction in bitrate on average across different tasks, and up to 65.69% for object tracking, without sacrificing task accuracy.
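The Z-score idea can be sketched in a few lines: transmit normalized features plus two global statistics, and de-normalize at the decoder. The toy uniform quantizer below stands in for the FCM codec; the tensor shape and statistics are invented.

```python
import numpy as np

# Sender normalizes features with global (mu, sigma); the decoder restores
# the original scale from those two scalars after decompression.
rng = np.random.default_rng(0)
features = rng.normal(loc=3.0, scale=2.0, size=(16, 32)).astype(np.float32)

mu, sigma = float(features.mean()), float(features.std())
z = (features - mu) / sigma                  # sender: Z-score normalization
q = np.round(z * 32) / 32                    # "codec": coarse uniform quantizer
restored = q * sigma + mu                    # decoder: de-normalization
err = float(np.abs(restored - features).max())
```

Only `mu` and `sigma` travel as side information, while the normalized tensor occupies a predictable range that the quantizer can exploit.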

Tensor-Compressed and Fully-Quantized Training of Neural PDE Solvers

Authors: Jinming Lu, Jiayi Tian, Yequan Zhao, Hai Li, Zheng Zhang

2025-12-10

http://arxiv.org/abs/2512.09202v1

Physics-Informed Neural Networks (PINNs) have emerged as a promising paradigm for solving partial differential equations (PDEs) by embedding physical laws into neural network training objectives. However, their deployment on resource-constrained platforms is hindered by substantial computational and memory overhead, primarily stemming from higher-order automatic differentiation, intensive tensor operations, and reliance on full-precision arithmetic. To address these challenges, we present a framework that enables scalable and energy-efficient PINN training on edge devices. This framework integrates fully quantized training, Stein's estimator (SE)-based residual loss computation, and tensor-train (TT) decomposition for weight compression. It contributes three key innovations: (1) a mixed-precision training method that uses a square-block quantization (SBQ) format to eliminate data duplication during backpropagation; (2) a difference-based quantization scheme for the Stein's estimator that mitigates underflow; and (3) a partial-reconstruction scheme (PRS) for TT-layers that reduces quantization-error accumulation. We further design PINTA, a precision-scalable hardware accelerator, to fully exploit the performance of the framework. Experiments on the 2-D Poisson, 20-D Hamilton-Jacobi-Bellman (HJB), and 100-D Heat equations demonstrate that the proposed framework achieves accuracy comparable to or better than full-precision, uncompressed baselines while delivering 5.5x to 83.5x speedups and 159.6x to 2324.1x energy savings. This work enables real-time PDE solving on edge devices and paves the way for energy-efficient scientific computing at scale.
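A per-tile quantizer over square blocks, loosely in the spirit of the square-block format mentioned above, can be sketched as follows; the tile size, bit-width, and rounding scheme are our assumptions, not the paper's exact specification.

```python
import numpy as np

# Per-tile quantization: each BxB tile shares a single scale derived from its
# own maximum magnitude, and values are rounded to signed integer codes.
def quantize_blocks(w, B=4, bits=8):
    qmax = 2 ** (bits - 1) - 1                # e.g. 127 for 8-bit codes
    out = np.empty_like(w)
    for i in range(0, w.shape[0], B):
        for j in range(0, w.shape[1], B):
            tile = w[i:i + B, j:j + B]
            scale = np.abs(tile).max() / qmax
            if scale == 0.0:
                scale = 1.0                   # all-zero tile stays zero
            out[i:i + B, j:j + B] = np.round(tile / scale) * scale
    return out

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
wq = quantize_blocks(w)
max_err = float(np.abs(wq - w).max())
```

Scoping the scale to small tiles keeps outliers from inflating the quantization step for the whole tensor, which is the usual motivation for block-wise formats in low-precision training.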

Understanding the Failure Modes of Transformers through the Lens of Graph Neural Networks

Authors: Hunjae Lee

2025-12-09

http://arxiv.org/abs/2512.09182v1

Transformers, and more specifically decoder-only transformers, dominate modern LLM architectures. While they have been shown to work exceptionally well, they are not without issues, exhibiting surprising failure modes and predictably asymmetric performance degradation. This article is a study of many of these observed failure modes of transformers through the lens of graph neural network (GNN) theory. We first make the case that much of deep learning, including transformers, is about learnable information mixing and propagation. This makes the study of model failure modes a study of bottlenecks in information propagation, which naturally leads to GNN theory, where there is already a rich literature on information propagation bottlenecks and theoretical failure modes of models. We then make the case that many issues faced by GNNs are also experienced by transformers. In addition, we analyze how the causal nature of decoder-only transformers creates interesting geometric properties in information propagation, resulting in predictable and potentially devastating failure modes. Finally, we observe that existing solutions in transformer research tend to be ad-hoc and driven by intuition rather than grounded theoretical motivation. As such, we unify many such solutions under a more theoretical perspective, providing insight into why they work, what problem they actually solve, and how they can be further improved to target specific failure modes of transformers. Overall, this article is an attempt to bridge the gap between observed failure modes in transformers and the general lack of theoretical understanding of them in this space.

Freely controllable single-optical-frequency comb for highly sensitive cavity ring-down spectroscopy

Authors: Norihiko Nishizawa, Shotaro Kitajima, Ningwu Liu, Ryohei Terabayashi, Daiki Hashimoto, Hisashi Abe, Hideki Tomita

2025-12-09

http://arxiv.org/abs/2512.09159v1

Direct comb spectroscopy is a useful tool for obtaining highly accurate spectroscopic information. However, as the number of comb modes is very large and the optical energy is dispersed over them, the optical energy per comb mode is ultrasmall, limiting the sensitivity of highly sensitive spectroscopy. If we can concentrate the optical energy into only the comb modes that overlap with the absorption spectra, we can demonstrate drastic improvements in measurement sensitivity. In this study, we developed a freely controllable optical frequency comb source based on the spectral peaking phenomenon. The comb modes overlapping the CH4 absorption spectra were transformed into background-suppressed spectral peaks at the nonlinear loop mirror using a CH4 gas cell. Coherence-preserving power scaling of the generated comb was demonstrated using a fiber Raman amplifier. Subsequently, only the single comb mode was filtered using a newly developed spectral filter with ultrahigh resolution. The maximum optical power of a single comb mode was estimated to be more than 10 mW. The ring-down decay signal from the high-finesse optical cavity was measured using a single selected mode of the generated controllable comb. As a demonstration, the 2v_3 bands of the CH4 absorption spectra were accurately measured by comb-mode-resolved cavity ring-down spectroscopy (CRDS) with high sensitivity, up to 4.2 x 10^(-11) cm^(-1). This sensitivity is two orders of magnitude higher than that of previously reported comb-based CRDS. The residual was only 0.29%, indicating the high accuracy of the proposed spectrometer for molecular spectral analysis. This approach can be extended to other wavelength ranges and is useful for highly sensitive, high-resolution, comb-resolved spectroscopy.

When Quantum Federated Learning Meets Blockchain in 6G Networks

Authors: Dinh C. Nguyen, Md Bokhtiar Al Zami, Ratun Rahman, Shaba Shaon, Tuy Tan Nguyen, Fatemeh Afghah

2025-12-09

http://arxiv.org/abs/2512.09958v1

Quantum federated learning (QFL) is emerging as a key enabler for intelligent, secure, and privacy-preserving model training in next-generation 6G networks. By leveraging the computational advantages of quantum devices, QFL offers significant improvements in learning efficiency and resilience against quantum-era threats. However, future 6G environments are expected to be highly dynamic, decentralized, and data-intensive, which necessitates moving beyond traditional centralized federated learning frameworks. To meet this demand, blockchain technology provides a decentralized, tamper-resistant infrastructure capable of enabling trustless collaboration among distributed quantum edge devices. This paper presents QFLchain, a novel framework that integrates QFL with blockchain to support scalable and secure 6G intelligence. In this work, we investigate four key pillars of QFLchain in the 6G context: (i) communication and consensus overhead, (ii) scalability and storage overhead, (iii) energy inefficiency, and (iv) security vulnerability. A case study is also presented, demonstrating the potential advantages of QFLchain, based on simulation, over state-of-the-art approaches in terms of training performance.