2025-12-19

Plausibility as Failure How LLMs and Humans Co-Construct Epistemic Error
TreeNet A Light Weight Model for Low Bitrate Image Compression
Yuan-TecSwin A text conditioned Diffusion model with Swin-transformer blocks
StageVAR Stage-Aware Acceleration for Visual Autoregressive Models
Efficient CPU-GPU Collaborative Inference for MoE-based LLMs on Memory-Limited Systems
SoK Reviewing Two Decades of Security, Privacy, Accessibility, and Usability Studies on Internet of Things for Older Adults
Kascade A Practical Sparse Attention Method for Long-Context LLM Inference
GMODiff One-Step Gain Map Refinement with Diffusion Priors for HDR Reconstruction
Feature-Selective Representation Misdirection for Machine Unlearning
Ein Typenrad auf der Überholspur Die Kult-Schreibmaschine "Erika" trifft KI
CKA-Guided Modular Quantization Beyond Bit-Width to Algorithmic Diversity
Fast Collaborative Inference via Distributed Speculative Decoding
AlignMerge - Alignment-Preserving Large Language Model Merging via Fisher-Guided Geometric Constraints
Trustworthy and Controllable Professional Knowledge Utilization in Large Language Models with TEE-GPU Execution
LoPA Scaling dLLM Inference via Lookahead Parallel Decoding
SegGraph Leveraging Graphs of SAM Segments for Few-Shot 3D Part Segmentation
LLM4Perf Large Language Models Are Effective Samplers for Multi-Objective Performance Modeling
MultiPath Transfer Engine Breaking GPU and Host-Memory Bandwidth Bottlenecks in LLM Services
Enhancing Line Density Plots with Outlier Control and Bin-based Illumination
Hierarchical Neural Surfaces for 3D Mesh Compression
AIE4ML An End-to-End Framework for Compiling Neural Networks for the Next Generation of AMD AI Engines
SALVE Sparse Autoencoder-Latent Vector Editing for Mechanistic Control of Neural Networks
Dynamic Rebatching for Efficient Early-Exit Inference with DREX
Multi-Modal Semantic Communication
The longest known tails of ram-pressure stripped star-forming galaxies are caused by an ICM shock in Abell 1367
VTCBench Can Vision-Language Models Understand Long Context with Vision-Text Compression?
IC-Effect Precise and Efficient Video Effects Editing via In-Context Learning
Note on bulk viscosity as an alternative to dark energy
Reducing Pilots in Channel Estimation With Predictive Foundation Models
CTkvr KV Cache Retrieval for Long-Context LLMs via Centroid then Token Indexing
Attention in Motion Secure Platooning via Transformer-based Misbehavior Detection
GenAI-enabled Residual Motion Estimation for Energy-Efficient Semantic Video Communication
Randomized orthogonalization and Krylov subspace methods principles and algorithms
Three-Dimensional Radio Localization A Channel Charting-Based Approach
Emotion Recognition in Signers
Adversarial versification in portuguese as a jailbreak operator in LLMs
LLMQ Efficient Lower-Precision Pretraining for Consumer GPUs
Keep the Core Adversarial Priors for Significance-Preserving Brain MRI Segmentation
Defect Tolerance and Local Structural Response to 3d Transition-Metal Substitution in CsPbI3
Distillation-Guided Structural Transfer for Continual Learning Beyond Sparse Distributed Memory
Audio-Visual Cross-Modal Compression for Generative Face Video Coding
The Moralization Corpus Frame-Based Annotation and Analysis of Moralizing Speech Acts across Diverse Text Genres
Magnetised turbulent plasmas as high-energy particle accelerators
DEER Draft with Diffusion, Verify with Autoregressive Models
Beyond Majority Voting Towards Fine-grained and More Reliable Reward Signal for Test-Time Reinforcement Learning
Tracking spatial temporal details in ultrasound long video via wavelet analysis and memory bank

Plausibility as Failure How LLMs and Humans Co-Construct Epistemic Error

Authors: Claudia Vale Oliveira, Nelson Zagalo, Filipe Silva, Anabela Brandao, Syeda Faryal Hussain Khurrum, Joaquim Santos

2025-12-18

http://arxiv.org/abs/2512.16750v1

Large language models (LLMs) are increasingly used as epistemic partners in everyday reasoning, yet their errors remain predominantly analyzed through predictive metrics rather than through their interpretive effects on human judgment. This study examines how different forms of epistemic failure emerge, are masked, and are tolerated in human-AI interaction, where failure is understood as a relational breakdown shaped by model-generated plausibility and human interpretive judgment. We conducted a three-round, multi-LLM evaluation using interdisciplinary tasks and progressively differentiated assessment frameworks to observe how evaluators interpret model responses across linguistic, epistemic, and credibility dimensions. Our findings show that errors shift from predictive to hermeneutic forms, where linguistic fluency, structural coherence, and superficially plausible citations conceal deeper distortions of meaning. Evaluators frequently conflated criteria such as correctness, relevance, bias, groundedness, and consistency, indicating that human judgment collapses analytical distinctions into intuitive heuristics shaped by form and fluency. Across rounds, we observed a systematic verification burden and cognitive drift. As tasks became denser, evaluators increasingly relied on surface cues, allowing erroneous yet well-formed answers to pass as credible. These results suggest that error is not solely a property of model behavior but a co-constructed outcome of generative plausibility and human interpretive shortcuts. Understanding AI epistemic failure therefore requires reframing evaluation as a relational interpretive process, where the boundary between system failure and human miscalibration becomes porous. The study provides implications for assessment, digital literacy, and the design of trustworthy human AI .

TreeNet A Light Weight Model for Low Bitrate Image Compression

Authors: Mahadev Prasad Panda, Purnachandra Rao Makkena, Srivatsa Prativadibhayankaram, Siegfried Fößel, André Kaup

2025-12-18

http://arxiv.org/abs/2512.16743v1

Reducing computational complexity remains a critical challenge for the widespread adoption of learning-based image compression techniques. In this work, we propose TreeNet, a novel low-complexity image compression model that leverages a binary tree-structured encoder-decoder architecture to achieve efficient representation and reconstruction. We employ an attentional feature fusion mechanism to effectively integrate features from multiple branches. We evaluate TreeNet on three widely used benchmark datasets and compare its performance against competing methods including JPEG AI, a recent standard in learning-based image compression. At low bitrates, TreeNet achieves an average improvement of 4.83% in BD-rate over JPEG AI, while reducing model complexity by 87.82%. Furthermore, we conduct extensive ablation studies to investigate the influence of various latent representations within TreeNet, offering deeper insights into the factors contributing to reconstruction.

Yuan-TecSwin A text conditioned Diffusion model with Swin-transformer blocks

Authors: Shaohua Wu, Tong Yu, Shenling Wang, Xudong Zhao

2025-12-18

http://arxiv.org/abs/2512.16586v1

Diffusion models have shown remarkable capacity in image synthesis based on their U-shaped architecture and convolutional neural networks (CNNs) as basic blocks. The locality of the convolution operation in CNNs may limit the model's ability to understand long-range semantic information. To address this issue, we propose Yuan-TecSwin, a text-conditioned diffusion model with Swin-transformer blocks. The Swin-transformer blocks take the place of CNN blocks in the encoder and decoder, to improve the non-local modeling ability in feature extraction and image restoration. The text-image alignment is improved with a well-chosen text encoder, effective utilization of text embedding, and careful design in the incorporation of the text condition. Using an adapted time step to search in different diffusion stages, inference performance is further improved by 10%. Yuan-TecSwin achieves a state-of-the-art FID score of 1.37 on the ImageNet generation benchmark, without any additional models at different denoising stages. In a side-by-side comparison, we find it difficult for human interviewees to tell the model-generated images from the human-painted ones.

StageVAR Stage-Aware Acceleration for Visual Autoregressive Models

Authors: Senmao Li, Kai Wang, Salman Khan, Fahad Shahbaz Khan, Jian Yang, Yaxing Wang

2025-12-18

http://arxiv.org/abs/2512.16483v1

Visual Autoregressive (VAR) modeling departs from the next-token prediction paradigm of traditional Autoregressive (AR) models through next-scale prediction, enabling high-quality image generation. However, the VAR paradigm suffers from sharply increased computational complexity and running time at large-scale steps. Although existing methods reduce runtime for large-scale steps, they rely on manual step selection and overlook the varying importance of different stages in the generation process. To address this challenge, we present StageVAR, a systematic study and stage-aware framework for VAR models. Our analysis shows that early steps are critical for preserving semantic and structural consistency and should remain intact, while later steps mainly refine details and can be pruned or approximated for acceleration. Building on these insights, StageVAR introduces a plug-and-play strategy that exploits semantic irrelevance and low-rank properties in late-stage computations, without requiring additional training. Our proposed StageVAR achieves up to 3.4x speedup with only a 0.01 drop on GenEval and a 0.26 decrease on DPG, consistently outperforming existing baselines. These results highlight stage-aware design as a powerful principle for efficient visual autoregressive image generation.

Efficient CPU-GPU Collaborative Inference for MoE-based LLMs on Memory-Limited Systems

Authors: En-Ming Huang, Li-Shang Lin, Chun-Yi Lee

2025-12-18

http://arxiv.org/abs/2512.16473v1

Large Language Models (LLMs) have achieved impressive results across various tasks, yet their high computational demands pose deployment challenges, especially on consumer-grade hardware. Mixture of Experts (MoE) models provide an efficient solution through selective activation of parameter subsets, which reduces computation requirements. Despite this efficiency, state-of-the-art MoE models still require substantial memory beyond typical consumer GPU capacities. Traditional offloading methods that transfer model weights between CPU and GPU introduce latency, limiting inference performance. This paper presents a novel CPU-GPU collaborative inference framework that incorporates an expert caching mechanism on the GPU to reduce data transfer requirements and enable faster inference through cache hits. Computations are offloaded to the CPU for efficient cache miss handling, which benefits from CPU multithreading optimizations. The evaluations of our framework demonstrate performance improvements and highlight the potential of CPU-GPU collaboration to maximize hardware utilization for single-request inference scenarios on consumer-grade systems. The implementation of our framework is available at https://github.com/elsa-lab/MoE-CPU-GPU-Collaborative-Inference.
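
The caching idea in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the class name `ExpertCache`, the LRU eviction policy, and the rule that a missed expert is computed on the CPU and then admitted to the GPU cache are all hypothetical.

```python
from collections import OrderedDict


class ExpertCache:
    """Toy model of a GPU-resident expert cache (LRU policy is an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity          # number of experts that fit on the GPU
        self.resident = OrderedDict()     # expert_id -> True, ordered by recency
        self.hits = 0
        self.misses = 0

    def route(self, expert_id):
        """Return where the expert's computation runs for this request."""
        if expert_id in self.resident:
            # Cache hit: weights are already on the GPU.
            self.resident.move_to_end(expert_id)
            self.hits += 1
            return "gpu"
        # Cache miss: compute on the CPU now instead of waiting for a transfer,
        # then admit the expert so later requests can hit (hypothetical policy).
        self.misses += 1
        self.resident[expert_id] = True
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)  # evict least recently used
        return "cpu"


cache = ExpertCache(capacity=2)
placements = [cache.route(e) for e in [0, 1, 0, 2, 1]]
print(placements, cache.hits, cache.misses)
```

The point of the sketch is only the control flow: hits avoid weight transfers entirely, while misses fall back to CPU execution rather than stalling on a CPU-to-GPU copy.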

SoK Reviewing Two Decades of Security, Privacy, Accessibility, and Usability Studies on Internet of Things for Older Adults

Authors: Suleiman Saka, Sanchari Das

2025-12-18

http://arxiv.org/abs/2512.16394v1

The Internet of Things (IoT) has the potential to enhance older adults' independence and quality of life, but it also exposes them to security, privacy, accessibility, and usability (SPAU) risks. We conducted a systematic review of 44 peer-reviewed studies published between 2004 and 2024 using a five-phase screening pipeline. From each study, we extracted data on study design, IoT type, SPAU measures, and identified research gaps. We introduce the SPAU-IoT Framework, which comprises 27 criteria across four dimensions: security (e.g., resilience to cyber threats, secure authentication, encrypted communication, secure-by-default settings, and guardianship features), privacy (e.g., data minimization, explicit consent, and privacy-preserving analytics), accessibility (e.g., compliance with ADA/WCAG standards and assistive-technology compatibility), and usability (e.g., guided interaction, integrated assistance, and progressive learning). Applying this framework revealed that more than 70% of studies implemented authentication and encryption mechanisms, whereas fewer than 50% addressed accessibility or usability concerns. We further developed a threat model that maps IoT assets, networks, and backend servers to exploit vectors such as phishing, caregiver exploitation, and weak-password attacks, explicitly accounting for age-related vulnerabilities including cognitive decline and sensory impairment. Our results expose a systemic lack of integrated SPAU approaches in existing IoT research and translate these gaps into actionable, standards-aligned design guidelines for IoT systems designed for older adults.

Kascade A Practical Sparse Attention Method for Long-Context LLM Inference

Authors: Dhruv Deshmukh, Saurabh Goyal, Nipun Kwatra, Ramachandran Ramjee

2025-12-18

http://arxiv.org/abs/2512.16391v1

Attention is the dominant source of latency during long-context LLM inference, an increasingly popular workload with reasoning models and RAG. We propose Kascade, a training-free sparse attention method that leverages known observations such as 1) post-softmax attention is intrinsically sparse, and 2) the identity of high-weight keys is stable across nearby layers. Kascade computes exact Top-k indices in a small set of anchor layers, then reuses those indices in intermediate reuse layers. The anchor layers are selected algorithmically, via a dynamic-programming objective that maximizes cross-layer similarity over a development set, allowing easy deployment across models. The method incorporates efficient implementation constraints (e.g., tile-level operations) across both prefill and decode attention. The Top-k selection and reuse in Kascade is head-aware, and we show in our experiments that this is critical for high accuracy. Kascade achieves up to 4.1x speedup in decode attention and 2.2x speedup in prefill attention over a FlashAttention-3 baseline on H100 GPUs while closely matching dense attention accuracy on long-context benchmarks such as LongBench and AIME-24.
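
The anchor/reuse mechanism described above can be sketched for a single head and a single query per layer. This is a simplified illustration under stated assumptions: the function names are hypothetical, the anchor set is fixed by hand rather than chosen by the dynamic-programming objective, and there is no tiling or head-aware logic.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def topk_indices(scores, k):
    """Indices of the k largest scores, returned in ascending index order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


def attend(q, keys, values, idx):
    """Scaled dot-product attention restricted to the key positions in idx."""
    scale = 1.0 / math.sqrt(len(q))
    w = softmax([dot(q, keys[i]) * scale for i in idx])
    dim = len(values[0])
    return [sum(w[j] * values[i][c] for j, i in enumerate(idx)) for c in range(dim)]


def run_layers(layers, anchors, k):
    """Anchor layers compute exact Top-k indices; other layers reuse the latest ones."""
    idx, used, outs = None, [], []
    for li, (q, keys, values) in enumerate(layers):
        if li in anchors or idx is None:
            idx = topk_indices([dot(q, key) for key in keys], k)
        used.append(idx)
        outs.append(attend(q, keys, values, idx))
    return outs, used


keys = [[2.0, 0.0], [0.0, 1.0], [1.5, 0.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
layers = [([1.0, 0.0], keys, values), ([0.0, 1.0], keys, values)]
outs, used = run_layers(layers, anchors={0}, k=2)
print(used)
```

In this toy input the second layer would pick different keys if it were an anchor, which shows the accuracy risk that the paper's similarity-driven anchor selection is meant to control.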

GMODiff One-Step Gain Map Refinement with Diffusion Priors for HDR Reconstruction

Authors: Tao Hu, Weiyu Zhou, Yanjie Tu, Peng Wu, Wei Dong, Qingsen Yan, Yanning Zhang

2025-12-18

http://arxiv.org/abs/2512.16357v1

Pre-trained Latent Diffusion Models (LDMs) have recently shown strong perceptual priors for low-level vision tasks, making them a promising direction for multi-exposure High Dynamic Range (HDR) reconstruction. However, directly applying LDMs to HDR remains challenging due to: (1) limited dynamic-range representation caused by 8-bit latent , (2) high inference cost from multi-step denoising, and (3) content hallucination inherent to their generative nature. To address these challenges, we introduce GMODiff, a gain map-driven one-step diffusion framework for multi-exposure HDR reconstruction. Instead of reconstructing full HDR content, we reformulate HDR reconstruction as a conditionally guided Gain Map (GM) estimation task, where the GM encodes the extended dynamic range while retaining the same bit depth as LDR images. We initialize the denoising process from an informative regression-based estimate rather than pure noise, enabling the model to generate high-quality GMs in a single denoising step. Furthermore, recognizing that regression-based models excel in content fidelity while LDMs favor perceptual quality, we leverage regression priors to guide both the denoising process and latent of the LDM, suppressing hallucinations while preserving structural accuracy. Extensive experiments demonstrate that our GMODiff performs favorably against several state-of-the-art methods and is 100x faster than previous LDM-based methods.

Feature-Selective Representation Misdirection for Machine Unlearning

Authors: Taozhao Chen, Linghan Huang, Kim-Kwang Raymond Choo, Huaming Chen

2025-12-18

http://arxiv.org/abs/2512.16297v1

As large language models (LLMs) are increasingly adopted in safety-critical and regulated sectors, the retention of sensitive or prohibited knowledge introduces escalating risks, ranging from privacy leakage to regulatory non-compliance to potential misuse. Recent studies suggest that machine unlearning can help ensure deployed models comply with evolving legal, safety, and governance requirements. However, current unlearning techniques assume clean separation between forget and retain datasets, which is challenging in operational settings characterized by highly entangled distributions. In such scenarios, perturbation-based methods often degrade general model utility or fail to ensure safety. To address this, we propose Selective Representation Misdirection for Unlearning (SRMU), a novel, principled activation-editing framework that enforces feature-aware and directionally controlled perturbations. Unlike indiscriminate perturbations of model weights, SRMU employs a structured misdirection vector with an activation importance map. This allows SRMU to selectively suppress harmful representations while preserving utility on benign ones. Experiments are conducted on the widely used WMDP benchmark across low- and high-entanglement configurations. Empirical results reveal that SRMU delivers state-of-the-art unlearning performance with minimal utility losses, and remains effective under 20-30% overlap, where existing baselines collapse. SRMU provides a robust foundation for safety-driven model governance, privacy compliance, and controlled knowledge removal in emerging LLM-based applications. We release the replication package at https://figshare.com/s/d5931192a8824de26aff.

Ein Typenrad auf der Überholspur Die Kult-Schreibmaschine "Erika" trifft KI

Authors: Karola Köpferl, Albrecht Kurze

2025-12-18

http://arxiv.org/abs/2512.16293v1

In the 15th century, printing revolutionized the dissemination of information. Innovations such as typewriters and computers have increased the speed and volume of information flows over time. More recent developments in large language models such as ChatGPT enable text to be generated in a matter of seconds. However, many people do not understand how this works and what the long-term implications are. That is why we have "hacked" an old typewriter so that users can interact with an LLM chatbot, which over 1,200 participants have now been able to experience. It helps participants understand the possibilities and limitations of AI. It gives us researchers insights into participants' concepts of AI as well as their expectations and concerns. It raises questions about these technological developments and stimulates discussions about the social impact of the intensification and acceleration of information and communication flows.

CKA-Guided Modular Quantization Beyond Bit-Width to Algorithmic Diversity

Authors: Jinhao Zhang, Yunquan Zhang, Daning Chen

2025-12-18

http://arxiv.org/abs/2512.16282v1

Current mainstream post-training quantization methods for large language models typically apply a uniform quantization strategy across all network layers, overlooking the substantial differences in algorithmic suitability among layers. To address this limitation, we propose CKA Guided Modular Quantization, a fine-tuning-free, plug-and-play framework for algorithmically heterogeneous quantization. Our method independently evaluates multiple PTQ algorithms on each layer and employs Linear Centered Kernel Alignment (CKA) as a metric to automatically select the optimal quantization strategy per layer. The individually optimized strategies are then integrated to construct a hybrid quantized model. Experiments demonstrate that our approach consistently outperforms both uniform quantization baselines and state-of-the-art mixed-precision methods across mainstream LLMs including LLaMA and Qwen, in terms of perplexity (PPL) and downstream task performance.
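
A minimal sketch of per-layer selection with linear CKA follows. The linear CKA formula itself is standard: for column-centered activation matrices X and Y, CKA = ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F). What is compared is an assumption here (full-precision layer outputs versus the outputs of each quantized candidate), and the candidate names `algo_a` and `algo_b` are hypothetical placeholders, not PTQ algorithms named by the paper.

```python
import math


def center(mat):
    """Subtract the column mean from every column (rows are samples)."""
    n, cols = len(mat), len(mat[0])
    means = [sum(row[j] for row in mat) / n for j in range(cols)]
    return [[row[j] - means[j] for j in range(cols)] for row in mat]


def cross_fro_sq(a, b):
    """Squared Frobenius norm of A^T B."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            s = sum(a[r][i] * b[r][j] for r in range(len(a)))
            total += s * s
    return total


def linear_cka(x, y):
    xc, yc = center(x), center(y)
    return cross_fro_sq(xc, yc) / math.sqrt(cross_fro_sq(xc, xc) * cross_fro_sq(yc, yc))


def select_strategy(reference, candidates):
    """Pick the candidate whose layer output is most similar to the reference."""
    return max(candidates, key=lambda name: linear_cka(reference, candidates[name]))


# Toy activations for one layer: 4 samples, 2 features.
full_precision = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
candidates = {
    "algo_a": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]],  # no distortion
    "algo_b": [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0], [1.0, 2.0]],  # heavy distortion
}
print(select_strategy(full_precision, candidates))
```

Running the selection independently for every layer and stitching the winners together would give the kind of hybrid model the abstract describes.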

Fast Collaborative Inference via Distributed Speculative Decoding

Authors: Ce Zheng, Ke Zhang, Sun Chen, Wenqi Zhang, Qiong Liu, Angesom Ataklity Tesfay

2025-12-18

http://arxiv.org/abs/2512.16273v1

Speculative decoding accelerates large language model (LLM) inference by allowing a small draft model to predict multiple future tokens for verification by a larger target model. In AI-native radio access networks (AI-RAN), this enables device-edge collaborative inference but introduces significant uplink overhead, as existing distributed speculative decoding schemes transmit full vocabulary logits at every step. We propose a sparsify-then-sample strategy, Truncated Sparse Logits Transmission (TSLT), which transmits only the logits and indices of a truncated candidate set. We provide theoretical guarantees showing that the acceptance rate is preserved under TSLT. TSLT is further extended to the multi-candidate case, where multiple draft candidates per step increase acceptance probability. Experiments show that TSLT significantly reduces uplink communication while maintaining end-to-end inference latency and model quality, demonstrating its effectiveness for scalable, communication-efficient distributed inference in future AI-RAN systems.
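
The sparsify-then-sample idea can be sketched as follows. This is an illustrative reading of the abstract, not the paper's protocol: the function names, the top-k truncation rule, and the use of the standard speculative-sampling acceptance ratio min(1, p/q) against the renormalized truncated draft distribution are all assumptions.

```python
import math
import random


def truncate_logits(logits, k):
    """Device side: keep only the k largest logits and their vocabulary indices."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return idx, [logits[i] for i in idx]


def sparse_softmax(idx, vals):
    """Renormalize over the truncated candidate set only."""
    m = max(vals)
    es = [math.exp(v - m) for v in vals]
    s = sum(es)
    return {i: e / s for i, e in zip(idx, es)}


def draft_step(logits, k, rng):
    """Sparsify first, then sample the draft token from the truncated distribution."""
    idx, vals = truncate_logits(logits, k)
    q = sparse_softmax(idx, vals)
    token = rng.choices(list(q), weights=list(q.values()))[0]
    return token, (idx, vals)  # the payload is all that goes over the uplink


def accept_prob(token, payload, target_probs):
    """Edge side: rebuild q from the payload and apply the usual acceptance test."""
    q = sparse_softmax(*payload)
    return min(1.0, target_probs[token] / q[token])


rng = random.Random(0)
draft_logits = [2.0, 0.5, 1.0, -1.0, 0.0]          # toy vocabulary of 5 tokens
token, payload = draft_step(draft_logits, k=2, rng=rng)
target_probs = [0.5, 0.1, 0.3, 0.05, 0.05]         # toy target-model distribution
print(token, payload, accept_prob(token, payload, target_probs))
```

Because the token is drawn from the same truncated distribution that the edge reconstructs, the payload shrinks from one value per vocabulary entry to k index-value pairs per step.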

AlignMerge - Alignment-Preserving Large Language Model Merging via Fisher-Guided Geometric Constraints

Authors: Aniruddha Roy, Jyoti Patel, Aman Chadha, Vinija Jain, Amitava Das

2025-12-18

http://arxiv.org/abs/2512.16245v1

Merging large language models (LLMs) is a practical way to compose capabilities from multiple fine-tuned checkpoints without retraining. Yet standard schemes (linear weight soups, task vectors, and Fisher-weighted averaging) can preserve loss while quietly destroying alignment. We argue that merging is not a numerical trick but a geometry-constrained operation around an already-aligned anchor: fusion must be steered to respect safety geometry, not validated post hoc. We introduce AlignMerge, a geometry-aware merging framework that makes alignment an explicit invariant. In a local Fisher chart around an instruction-tuned base, we estimate an alignment subspace with projector P_A and optimize: L_AlignMerge = L_geo + lambda_align * L_align + lambda_bud * L_bud, where L_geo keeps the merge close to its experts in Fisher-Rao geometry, L_align penalizes motion along alignment-sensitive directions, and L_bud enforces a soft alignment budget. As the alignment functional we use the decoding-invariant Alignment Quality Index (AQI), a latent-space criterion that captures how cleanly aligned and misaligned behaviors separate in representation space. Across five model families (LLaMA-3 8B, Mistral 7B, Qwen 2, Phi-3.5, Gemma 2), merging safety anchors with task experts, AlignMerge improves alignment metrics (AQI, toxicity, LLM-judge alignment) while matching or exceeding the best expert on instruction-following, reasoning, and helpfulness. It also exhibits smaller alignment-subspace drift and fewer budget violations than Fisher soups, TIES, SafeMerge, and MergeAlign. These results make alignment-preserving merging a first-class design goal and suggest a path to geometry-aware composition of future foundation models.

Trustworthy and Controllable Professional Knowledge Utilization in Large Language Models with TEE-GPU Execution

Authors: Yifeng Cai, Zhida An, Yuhan Meng, Houqian Liu, Pengli Wang, Yao Guo, Ding Li

2025-12-18

http://arxiv.org/abs/2512.16238v1

Future improvements in large language model (LLM) services increasingly hinge on access to high-value professional knowledge rather than more generic web data. However, the data providers of this knowledge face a skewed tradeoff between income and risk: they receive little share of downstream value yet retain copyright and privacy liability, making them reluctant to contribute their assets to LLM services. Existing techniques do not offer a trustworthy and controllable way to use professional knowledge, because they keep providers in the dark and combine knowledge parameters with the underlying LLM backbone. In this paper, we present PKUS, the Professional Knowledge Utilization System, which treats professional knowledge as a first-class, separable artifact. PKUS keeps the backbone model on GPUs and encodes each provider's contribution as a compact adapter that executes only inside an attested Trusted Execution Environment (TEE). A hardware-rooted lifecycle protocol, adapter , multi-provider aggregation, and split-execution scheduling together make this design practical at serving time. On SST-2, MNLI, and SQuAD with GPT-2 Large and Llama-3.2-1B, PKUS preserves model utility, matching the accuracy and F1 of full fine-tuning and plain LoRA, while achieving the lowest per-request latency with 8.1-11.9x speedup over CPU-only TEE inference and naive CPU-GPU co-execution.

LoPA Scaling dLLM Inference via Lookahead Parallel Decoding

Authors: Chenkai Xu, Yijie Jin, Jiajun Li, Yi Tu, Guoping Long, Dandan Tu, Tianqi Hou, Junchi Yan, Zhijie Deng

2025-12-18

http://arxiv.org/abs/2512.16229v1

Diffusion Large Language Models (dLLMs) have demonstrated significant potential for high-speed inference. However, current confidence-driven decoding strategies are constrained by limited parallelism, typically achieving only 1-3 tokens per forward pass (TPF). In this work, we identify that the degree of parallelism during dLLM inference is highly sensitive to the Token Filling Order (TFO). We then introduce Lookahead PArallel Decoding (LoPA), a training-free, plug-and-play algorithm that identifies a superior TFO and hence accelerates inference. LoPA concurrently explores distinct candidate TFOs via parallel branches, and selects the one with the highest potential for future parallelism based on branch confidence. We apply LoPA to the state-of-the-art D2F model and observe a substantial enhancement in decoding efficiency. Notably, LoPA increases the TPF of D2F-Dream to 10.1 on GSM8K while maintaining performance superior to the Dream baseline. Furthermore, to facilitate this unprecedented degree of parallelism, we develop a specialized multi-device inference system featuring Branch Parallelism (BP), which achieves a single-sample throughput of 1073.9 tokens per second under multi-GPU deployment. The code is available at https://github.com/zhijie-group/LoPA.
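
The branch-selection step can be sketched in a few lines. The scoring rule below is an assumption chosen for illustration (count the still-masked positions whose confidence clears a threshold, and break ties by total confidence); the abstract only says that selection is based on branch confidence, and the branch names and numbers are hypothetical.

```python
def branch_score(confidences, threshold):
    """Number of masked positions confident enough to fill in the next pass."""
    return sum(1 for c in confidences if c >= threshold)


def select_branch(branches, threshold=0.9):
    """Pick the candidate filling order that promises the most parallel fills."""
    return max(
        branches,
        key=lambda name: (branch_score(branches[name], threshold), sum(branches[name])),
    )


# Each branch maps to the model's confidences over its remaining masked positions
# after tentatively applying that candidate token filling order.
branches = {
    "tfo_left_to_right": [0.95, 0.40, 0.35, 0.30],
    "tfo_confident_first": [0.97, 0.93, 0.91, 0.20],
    "tfo_block_jump": [0.92, 0.91, 0.50, 0.45],
}
print(select_branch(branches))
```

A higher score means more tokens can be committed in the following forward pass, which is the quantity that tokens per forward pass (TPF) measures.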

SegGraph Leveraging Graphs of SAM Segments for Few-Shot 3D Part Segmentation

Authors: Yueyang Hu, Haiyong Jiang, Haoxuan Song, Jun Xiao, Hao Pan

2025-12-18

http://arxiv.org/abs/2512.16143v1

This work presents a novel framework for few-shot 3D part segmentation. Recent advances have demonstrated the significant potential of 2D foundation models for low-shot 3D part segmentation. However, how to effectively aggregate 2D knowledge from foundation models into 3D remains an open problem. Existing methods either ignore geometric structures for 3D feature learning or neglect the high-quality grouping clues from SAM, leading to under-segmentation and inconsistent part labels. We devise a novel SAM segment graph-based propagation method, named SegGraph, to explicitly learn geometric features encoded within SAM's segmentation masks. Our method encodes geometric features by modeling mutual overlap and adjacency between segments while preserving intra-segment semantic consistency. We construct a segment graph, conceptually similar to an atlas, where nodes represent segments and edges capture their spatial relationships (overlap/adjacency). Each node adaptively modulates 2D foundation model features, which are then propagated via a graph neural network to learn global geometric structures. To enforce intra-segment semantic consistency, we map segment features to 3D points with a novel view-direction-weighted fusion attenuating contributions from low-quality segments. Extensive experiments on PartNet-E demonstrate that our method outperforms all competing baselines by at least 6.9 percent mIoU. Further analysis reveals that SegGraph achieves particularly strong performance on small components and part boundaries, demonstrating its superior geometric understanding. The code is available at: https://github.com/YueyangHu2000/SegGraph.

LLM4Perf Large Language Models Are Effective Samplers for Multi-Objective Performance Modeling

Authors: Xin Wang, Zhenhao Li, Zishuo Ding

2025-12-18

http://arxiv.org/abs/2512.16070v1

The performance of modern software systems is critically dependent on their complex configuration options. Building accurate performance models to navigate this vast space requires effective sampling strategies, yet existing methods often struggle with multi-objective optimization and cannot leverage semantic information from documentation. The recent success of Large Language Models (LLMs) motivates the central question of this work: Can LLMs serve as effective samplers for multi-objective performance modeling? To explore this, we present a comprehensive empirical study investigating the capabilities and characteristics of LLM-driven sampling. We design and implement LLM4Perf, a feedback-based framework, and use it to systematically evaluate the LLM-guided sampling process across four highly configurable, real-world systems. Our study reveals that the LLM-guided approach outperforms traditional baselines in most cases. Quantitatively, LLM4Perf achieves the best performance in nearly 68.8% (77 out of 112) of all evaluation scenarios, demonstrating its superior effectiveness. We find this effectiveness stems from the LLM's dual capabilities of configuration space and feedback-driven strategy refinement. The effectiveness of this is further validated by the fact that it also improves the performance of the baseline methods in nearly 91.5% (410 out of 448) of cases. Furthermore, we show how the LLM choices for each component and hyperparameters within LLM4Perf affect its effectiveness. Overall, this paper provides strong evidence for the effectiveness of LLMs in performance engineering and offers concrete insights into the mechanisms that drive their success.

MultiPath Transfer Engine Breaking GPU and Host-Memory Bandwidth Bottlenecks in LLM Services

Authors: Lingfeng Tang, Daoping Zhang, Junjie Chen, Peihao Huang, Feng Jin, Chengguang Xu, Yuxin Chen, Feiqiang Sun, Guo Chen

2025-12-18

http://arxiv.org/abs/2512.16056v1

The limited bandwidth of PCIe has emerged as the critical bottleneck for large language model (LLM) serving performance, affecting operations such as prefix cache fetching and model switching. Although intra-server multipath data transfer between GPU and host memory is theoretically possible, heterogeneous protocols such as PCIe and NVLink currently limit the bandwidth between host memory and GPUs to that of a single PCIe link. This limitation results in underutilized intra-server bandwidth. To address this issue, we propose Multipath Memory Access (MMA), a scheme that, to the best of our knowledge, is the first to enable efficient multipath data transfer between GPU and host memory. MMA supports seamless deployment via dynamic library injection, enabling LLM applications to benefit from MMA without requiring any code modification. In our testbed, MMA significantly improves the data transfer bandwidth between the GPU and memory, achieving a peak bandwidth of 245 GB/s, a 4.62x speedup over the native single-path bandwidth. End-to-end evaluations demonstrate that MMA reduces the time-to-first-token (TTFT) for LLM serving by 1.14x to 2.38x and decreases model-switching latency in vLLM's sleep mode by 1.12x to 2.48x.

Enhancing Line Density Plots with Outlier Control and Bin-based Illumination

Authors: Yumeng Xue, Bin Chen, Patrick Paetzold, Yunhai Wang, Christophe Hurter, Oliver Deussen

2025-12-17

http://arxiv.org/abs/2512.16017v1

Density plots effectively summarize large numbers of points, which would otherwise lead to severe overplotting in, for example, a scatter plot. However, when applied to line-based datasets, such as trajectories or time series, density plots alone are insufficient, as they disrupt path continuity, obscuring smooth trends and rare anomalies. We propose a bin-based illumination model that decouples structure from density to enhance flow and reveal outliers while preserving the original colormap. We introduce a bin-based outlierness metric to rank trajectories. Guided by this ranking, we construct a structural normal map and apply locally-adaptive lighting in the luminance channel to highlight chosen patterns -- from dominant trends to atypical paths -- with acceptable color distortion. Our interactive method enables analysts to prioritize main trends, focus on outliers, or strike a balance between the two. We demonstrate our method on several real-world datasets, showing it reveals details missed by simpler alternatives, achieves significantly lower CIEDE2000 color distortion than standard shading, and supports interactive updates for up to 10,000 lines.

Hierarchical Neural Surfaces for 3D Mesh Compression

Authors: Sai Karthikey Pentapati, Gregoire Phillips, Alan Bovik

2025-12-17

http://arxiv.org/abs/2512.15985v1

Implicit Neural Representations (INRs) have been demonstrated to achieve state-of-the-art compression of a broad range of modalities such as images, videos, 3D surfaces, and audio. Most studies have focused on building neural counterparts of traditional implicit representations of 3D geometries, such as signed distance functions. However, the triangle mesh-based representation of geometry remains the most widely used representation in the industry, while building INRs capable of generating them has been sparsely studied. In this paper, we present a method for building compact INRs of zero-genus 3D manifolds. Our method relies on creating a spherical parameterization of a given 3D mesh - mapping the surface of a mesh to that of a unit sphere - then constructing an INR that encodes the displacement vector field defined continuously on its surface that regenerates the original shape. The compactness of our representation can be attributed to its hierarchical structure, wherein it first recovers the coarse structure of the encoded surface before adding high-frequency details to it. Once the INR is computed, 3D meshes of arbitrary resolution/connectivity can be decoded from it. The decoding can be performed in real time while achieving a state-of-the-art trade-off between reconstruction quality and the size of the compressed representations.

AIE4ML An End-to-End Framework for Compiling Neural Networks for the Next Generation of AMD AI Engines

Authors: Dimitrios Danopoulos, Enrico Lupi, Chang Sun, Sebastian Dittmeier, Michael Kagan, Vladimir Loncar, Maurizio Pierini

2025-12-17

http://arxiv.org/abs/2512.15946v1

Efficient AI inference on AMD's Versal AI Engine (AIE) is challenging due to tightly coupled VLIW execution, explicit datapaths, and local memory management. Prior work focused on first-generation AIE kernel optimizations, without tackling full neural network execution across the 2D array. In this work, we present AIE4ML, the first comprehensive framework for converting AI models automatically into optimized firmware targeting the AIE-ML generation devices, with forward compatibility for the newer AIE-MLv2 architecture. At the single-kernel level, we attain performance close to the architectural peak. At the graph and system levels, we provide a structured parallelization method that can scale across the 2D AIE-ML fabric and exploit its dedicated memory tiles to stay entirely on-chip throughout the model execution. As a demonstration, we designed a generalized and highly efficient linear-layer implementation with intrinsic support for fused bias addition and ReLU activation. Because our framework requires generating multi-layer implementations, our approach systematically derives deterministic, compact, and topology-optimized placements tailored to the physical 2D grid of the device through a novel graph placement and search algorithm. Finally, the framework seamlessly accepts quantized models imported from high-level tools such as hls4ml or PyTorch while preserving bit-exactness. In layer scaling benchmarks, we achieve up to 98.6% efficiency relative to the single-kernel baseline, utilizing 296 of 304 AIE tiles (97.4%) of the device with entirely on-chip data movement. With evaluations across real-world model topologies, we demonstrate that AIE4ML delivers GPU-class throughput under microsecond latency constraints, making it a practical companion for ultra-low-latency environments such as trigger systems in particle physics experiments.

SALVE Sparse Autoencoder-Latent Vector Editing for Mechanistic Control of Neural Networks

Authors: Vegard Flovik

2025-12-17

http://arxiv.org/abs/2512.15938v1

Deep neural networks achieve impressive performance but remain difficult to interpret and control. We present SALVE (Sparse Autoencoder-Latent Vector Editing), a unified "discover, validate, and control" framework that bridges mechanistic interpretability and model editing. Using an $\ell_1$-regularized autoencoder, we learn a sparse, model-native feature basis without supervision. We validate these features with Grad-FAM, a feature-level saliency mapping method that visually grounds latent features in input data. Leveraging the autoencoder's structure, we perform precise and permanent weight-space interventions, enabling continuous modulation of both class-defining and cross-class features. We further derive a critical suppression threshold, $\alpha_{crit}$, quantifying each class's reliance on its dominant feature, supporting fine-grained robustness diagnostics. Our approach is validated on both convolutional (ResNet-18) and transformer-based (ViT-B/16) models, demonstrating consistent, interpretable control over their behavior. This work contributes a principled methodology for turning feature discovery into actionable model edits, advancing the development of transparent and controllable AI systems.

Dynamic Rebatching for Efficient Early-Exit Inference with DREX

Authors: Xuting Liu, Daniel Alexander, Siva Kesava Reddy Kakarla, Behnaz Arzani, Vincent Liu

2025-12-17

http://arxiv.org/abs/2512.15705v1

Early-Exit (EE) is a Large Language Model (LLM) architecture that accelerates inference by allowing easier tokens to be generated using only a subset of the model's layers. However, traditional batching frameworks are ill-suited for EE LLMs, as not all requests in a batch may be ready to exit at the same time. Existing solutions either force a uniform decision on the batch, which overlooks EE opportunities, or degrade output quality by forcing premature exits. We propose Dynamic Rebatching, a solution where we dynamically reorganize the batch at each early-exit point. Requests that meet the exit criteria are immediately processed, while those that continue are held in a buffer, re-grouped into a new batch, and forwarded to deeper layers. We introduce DREX, an early-exit inference system that implements Dynamic Rebatching with two key optimizations: 1) a copy-free rebatching buffer that avoids physical data movement, and 2) an EE and SLA-aware scheduler that analytically predicts whether a given rebatching operation will be profitable. DREX also efficiently handles the missing KV cache from skipped layers using memory-efficient state-copying. Our evaluation shows that DREX improves throughput by 2-12% compared to baseline approaches while maintaining output quality. Crucially, DREX completely eliminates involuntary exits, providing a key guarantee for preserving the output quality intended by the EE model.
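
The rebatching step described above can be illustrated with a minimal Python sketch. It shows only the logical split at one exit point: requests whose exit score clears a threshold leave the batch, and the rest are buffered and regrouped. The `Request` class, the `confidence` field, the threshold rule, and the `rebatch` function are hypothetical names chosen for illustration; the abstract does not specify the exit criterion, and the copy-free buffer and profitability-predicting scheduler are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Request:
    req_id: int
    confidence: float  # hypothetical exit score at the current exit point


def rebatch(batch, buffer, threshold, batch_size):
    """Split a batch at an early-exit point.

    Returns (exited, next_batch, remaining_buffer):
    - exited: requests that meet the (assumed) exit criterion
    - next_batch: up to batch_size buffered requests to send to deeper layers
    - remaining_buffer: requests still waiting to be regrouped
    """
    exited = [r for r in batch if r.confidence >= threshold]
    # Requests that must continue are appended behind already-waiting ones.
    waiting = buffer + [r for r in batch if r.confidence < threshold]
    next_batch = waiting[:batch_size]
    remaining_buffer = waiting[batch_size:]
    return exited, next_batch, remaining_buffer
```

No request is forced to exit in this sketch: a request either meets the criterion or continues to deeper layers, which mirrors the "no involuntary exits" property the abstract emphasizes.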

Multi-Modal Semantic Communication

Authors: Matin Mortaheb, Erciyes Karakaya, Sennur Ulukus

2025-12-17

http://arxiv.org/abs/2512.15691v1

Semantic communication aims to transmit information most relevant to a task rather than raw data, offering significant gains in efficiency for applications such as telepresence, augmented reality, and remote sensing. Recent transformer-based approaches have used self-attention maps to identify informative regions within images, but they often struggle in complex scenes with multiple objects, where self-attention lacks explicit task guidance. To address this, we propose a novel Multi-Modal Semantic Communication framework that integrates text-based user queries to guide the information extraction process. Our proposed system employs a cross-modal attention mechanism that fuses visual features with language embeddings to produce soft relevance scores over the visual data. Based on these scores and the instantaneous channel bandwidth, we use an algorithm to transmit image patches at adaptive resolutions using independently trained encoder-decoder pairs, with total bitrate matching the channel capacity. At the receiver, the patches are reconstructed and combined to preserve task-critical information. This flexible and goal-driven design enables efficient semantic communication in complex and bandwidth-constrained environments.
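
The abstract does not say which algorithm assigns resolutions to patches, so the following Python sketch is only one plausible greedy scheme under stated assumptions: every patch is first sent at the lowest resolution, then patches are visited in decreasing relevance and each receives the highest resolution level the remaining bit budget allows. The function name `allocate_levels`, the per-level cost list, and the greedy rule are all illustrative assumptions, not the authors' method.

```python
def allocate_levels(scores, level_costs, budget):
    """Assign a resolution level to each patch under a total bit budget.

    scores: relevance score per patch (higher means more task-relevant)
    level_costs: bits needed per patch at each level, in increasing order
    budget: total bits available (stand-in for instantaneous channel capacity)
    Returns (levels, spent).
    """
    n = len(scores)
    levels = [0] * n
    spent = level_costs[0] * n  # every patch is sent at least at level 0
    if spent > budget:
        raise ValueError("budget cannot cover the lowest resolution for every patch")
    # Visit patches from most to least relevant.
    for i in sorted(range(n), key=lambda j: scores[j], reverse=True):
        # Try the highest level first and fall back to lower ones.
        for lvl in range(len(level_costs) - 1, 0, -1):
            extra = level_costs[lvl] - level_costs[0]
            if spent + extra <= budget:
                levels[i] = lvl
                spent += extra
                break
    return levels, spent
```

For example, with scores `[0.9, 0.1, 0.5]`, level costs `[1, 3, 6]`, and a budget of 10, the most relevant patch gets the top level, the middle one gets the intermediate level, and the least relevant stays at the lowest level, using the full budget.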

The longest known tails of ram-pressure stripped star-forming galaxies are caused by an ICM shock in Abell 1367

Authors: H. W. Edler, M. Hoeft, S. Bhagat, A. Basu, A. Drabent, K. Rajpurohit, M. Sun, F. de Gasperin, A. Botteon, M. Brüggen, A. Ignesti, I. D. Roberts, R. van Weeren

2025-12-17

http://arxiv.org/abs/2512.15660v1

The environment plays an important role in shaping the evolution of cluster galaxies through mechanisms such as ram pressure stripping (RPS), whose effect may be enhanced in merging clusters. We investigate a complex of three galaxies, UGC 6697, CGCG 097-073 and CGCG 097-079, that are currently undergoing extreme RPS, as evident from their multi-wavelength-detected tails. The galaxies are members of the nearby ($d=92$ Mpc) merging cluster Abell 1367 and are located in proximity to an intracluster medium (ICM) shock that is traced by X-ray observations and the presence of a radio relic. We analyze LOFAR and MeerKAT observations at frequencies of 54, 144, 817 and 1270 MHz to perform a detailed spectral analysis of the tails. We find that all three tails are significantly more extended than in previous radio studies, with lengths $\geq70$ kpc. For UGC 6697, we detected a tail of 300 kpc, making it the longest known RPS tail of a star-forming galaxy at any wavelength. The length and spectral variations of the tail cannot be explained purely by the spectral aging of stripped cosmic rays. We construct a model of the tail that includes compression and re-acceleration due to the encounter with the nearby ICM shock, which can plausibly account for the extreme RPS as well as the length and spectral variation of the tail. We further discover a radio plume at the leading edge of UGC 6697 that connects to a narrow filament. These sources exhibit extremely steep ($\alpha\approx-1.7$) and highly curved spectra. We speculate that this emission arises from cosmic rays re-energized by UGC 6697's rapid infall which propagate along magnetic filaments in the cluster center. Our findings represent direct evidence of a cluster merger shock impacting the evolution of member galaxies. Furthermore, we report the first tentative detection of particle acceleration at the leading edge of an infalling galaxy.

VTCBench Can Vision-Language Models Understand Long Context with Vision-Text Compression?

Authors: Hongbo Zhao, Meng Wang, Fei Zhu, Wenzhuo Liu, Bolin Ni, Fanhu Zeng, Gaofeng Meng, Zhaoxiang Zhang

2025-12-17

http://arxiv.org/abs/2512.15649v1

The computational and memory overheads associated with expanding the context window of LLMs severely limit their scalability. A noteworthy solution is vision-text compression (VTC), exemplified by frameworks like DeepSeek-OCR and Glyph, which convert long texts into dense 2D visual representations, thereby achieving token compression ratios of 3x-20x. However, the impact of this high information density on the core long-context capabilities of vision-language models (VLMs) remains under-investigated. To address this gap, we introduce the first benchmark for VTC and systematically assess the performance of VLMs across three long-context understanding settings: VTC-Retrieval, which evaluates the model's ability to retrieve and aggregate information; VTC-Reasoning, which requires models to infer latent associations to locate facts with minimal lexical overlap; and VTC-Memory, which measures comprehensive question answering within long-term dialogue memory. Furthermore, we establish the VTCBench-Wild to simulate diverse input scenarios. We comprehensively evaluate leading open-source and proprietary models on our benchmarks. The results indicate that, despite being able to decode textual information (e.g., OCR) well, most VLMs exhibit a surprisingly poor long-context understanding ability with VTC-compressed information, failing to capture long associations or dependencies in the context. This study provides a deep understanding of VTC and serves as a foundation for designing more efficient and scalable VLMs.

IC-Effect Precise and Efficient Video Effects Editing via In-Context Learning

Authors: Yuanhang Li, Yiren Song, Junzhe Bai, Xinran Liang, Hu Yang, Libiao Jin, Qi Mao

2025-12-17

http://arxiv.org/abs/2512.15635v1

We propose IC-Effect, an instruction-guided, DiT-based framework for few-shot video VFX editing that synthesizes complex effects (e.g., flames, particles and cartoon characters) while strictly preserving spatial and temporal consistency. Video VFX editing is highly challenging because injected effects must blend seamlessly with the background, the background must remain entirely unchanged, and effect patterns must be learned efficiently from limited paired data. However, existing video editing models fail to satisfy these requirements. IC-Effect leverages the source video as clean contextual conditions, exploiting the contextual learning capability of DiT models to achieve precise background preservation and natural effect injection. A two-stage training strategy, consisting of general editing adaptation followed by effect-specific learning via Effect-LoRA, ensures strong instruction following and robust effect modeling. To further improve efficiency, we introduce spatiotemporal tokenization, enabling high fidelity with substantially reduced computation. We also release a paired VFX editing dataset spanning 15 high-quality visual styles. Extensive experiments show that IC-Effect delivers high-quality, controllable, and temporally consistent VFX editing, opening new possibilities for video creation.

Note on bulk viscosity as an alternative to dark energy

Authors: P. P. Avelino, A. R. Gomes, D. A. Tamayo

2025-12-17

http://arxiv.org/abs/2512.15633v1

Bulk viscosity, which characterizes the irreversible dissipative resistance of a fluid to volume changes, has been proposed as a potential mechanism for explaining both early- and late-time accelerated expansion of the Universe. In this work, we investigate two distinct physical scenarios for the origin of bulk viscosity: (1) nonminimal interactions between two fluids, and (2) elastic collisions in an ideal gas. In both cases, we demonstrate that while the associated energy-momentum exchange can significantly influence fluid dynamics, overall energy-momentum conservation precludes such exchange from having any direct gravitational effect in the context of General Relativity. In case (1), we show that the standard bulk viscous energy-momentum tensor can be obtained for the two-fluid system only at the cost of the violation of all classical energy conditions: null, weak, dominant, and strong. In case (2), we consider a single fluid composed of point particles undergoing instantaneous, energy- and momentum-conserving collisions, and find that the proper pressure remains strictly non-negative, with the equation-of-state parameter confined to the interval $[0,1/3]$. In both scenarios, achieving a sufficiently negative effective pressure to drive cosmic acceleration requires assumptions that compromise the physical viability of the model. Our results highlight some of the key physical challenges involved in modeling dark energy through bulk viscous effects.

Reducing Pilots in Channel Estimation With Predictive Foundation Models

Authors: Xingyu Zhou, Le Liang, Hao Ye, Jing Zhang, Chao-Kai Wen, Shi Jin

2025-12-17

http://arxiv.org/abs/2512.15562v1

Accurate channel state information (CSI) acquisition is essential for modern wireless systems, but it becomes increasingly difficult under large antenna arrays, strict pilot overhead constraints, and diverse deployment environments. Existing artificial intelligence-based solutions often lack robustness and fail to generalize across scenarios. To address this limitation, this paper introduces a predictive-foundation-model-based channel estimation framework that enables accurate, low-overhead, and generalizable CSI acquisition. The proposed framework employs a predictive foundation model trained on large-scale cross-domain CSI data to extract universal channel representations and provide predictive priors with strong cross-scenario transferability. A pilot processing network based on a vision transformer architecture is further designed to capture spatial, temporal, and frequency correlations from pilot observations. An efficient fusion mechanism integrates predictive priors with real-time measurements, enabling reliable CSI reconstruction even under sparse or noisy conditions. Extensive evaluations across diverse configurations demonstrate that the proposed estimator significantly outperforms both classical and data-driven baselines in accuracy, robustness, and generalization capability.

CTkvr KV Cache Retrieval for Long-Context LLMs via Centroid then Token Indexing

Authors: Kuan Lu, Shuhang Lin, Sai Wu, Yichen Yao, Junhan Yang, Huan Li, Wei Chu, Xu Yinghui, Yuan Qi, Gang Chen

2025-12-17

http://arxiv.org/abs/2512.15550v1

Large language models (LLMs) are increasingly applied in long-context scenarios such as multi-turn conversations. However, long contexts pose significant challenges for inference efficiency, including high memory overhead from the Key-Value (KV) cache and increased latency due to excessive memory accesses. Recent methods for dynamic KV selection struggle with trade-offs: block-level indexing degrades accuracy by retrieving irrelevant entries, while token-level indexing incurs high latency from inefficient retrieval mechanisms. In this paper, we propose CTkvr, a novel centroid-then-token retrieval scheme that addresses these limitations. CTkvr leverages a key observation: query vectors adjacent in position exhibit high similarity after Rotary Position Embedding (RoPE) and share most of their top-k entries. Based on this insight, CTkvr employs a two-stage retrieval strategy: lightweight centroids are precomputed during prefilling for centroid-grained indexing, followed by token-level refinement for precise retrieval. This approach balances retrieval efficiency and accuracy. To further enhance performance, we implement an optimized system for indexing construction and search using CPU-GPU co-execution. Experimentally, CTkvr achieves superior performance across multiple benchmarks with less than 1% accuracy degradation. Meanwhile, CTkvr delivers 3x and 4x throughput speedups on Llama-3-8B and Yi-9B at 96K context length across diverse GPU hardware.
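
A generic coarse-to-fine retrieval of the kind described above can be sketched in pure Python. This is an illustration under assumptions, not the paper's indexing scheme: the abstract does not say how centroids are formed, so here each centroid is simply the mean of a contiguous group of key vectors, and relevance is a plain dot product. The names `build_centroids` and `retrieve` are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_centroids(keys, group_size):
    """Group contiguous keys (an assumed grouping) and average each group."""
    groups = [list(range(i, min(i + group_size, len(keys))))
              for i in range(0, len(keys), group_size)]
    dim = len(keys[0])
    centroids = [[sum(keys[i][d] for i in g) / len(g) for d in range(dim)]
                 for g in groups]
    return groups, centroids


def retrieve(query, keys, groups, centroids, top_groups, top_k):
    """Stage 1: rank groups by centroid score. Stage 2: rank tokens inside them."""
    ranked = sorted(range(len(groups)),
                    key=lambda g: dot(query, centroids[g]), reverse=True)
    candidates = [i for g in ranked[:top_groups] for i in groups[g]]
    candidates.sort(key=lambda i: dot(query, keys[i]), reverse=True)
    return sorted(candidates[:top_k])  # token indices, in positional order
```

The coarse stage scores only one vector per group, and the fine stage scores only the tokens in the selected groups, which is the efficiency versus accuracy balance the abstract refers to.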

Attention in Motion Secure Platooning via Transformer-based Misbehavior Detection

Authors: Konstantinos Kalogiannis, Ahmed Mohamed Hussain, Hexu Li, Panos Papadimitratos

2025-12-17

http://arxiv.org/abs/2512.15503v1

Vehicular platooning promises transformative improvements in transportation efficiency and safety through the coordination of multi-vehicle formations enabled by Vehicle-to-Everything (V2X) communication. However, the distributed nature of platoon coordination creates security vulnerabilities, allowing authenticated vehicles to inject falsified kinematic data, compromise operational stability, and pose a threat to passenger safety. Traditional misbehaviour detection approaches, which rely on plausibility checks and statistical methods, suffer from high False Positive (FP) rates and cannot capture the complex temporal dependencies inherent in multi-vehicle coordination dynamics. We present Attention In Motion (AIMformer), a transformer-based framework specifically tailored for real-time misbehaviour detection in vehicular platoons with edge deployment capabilities. AIMformer leverages multi-head self-attention mechanisms to simultaneously capture intra-vehicle temporal dynamics and inter-vehicle spatial correlations. It incorporates global positional encoding with vehicle-specific temporal offsets to handle join/exit maneuvers. We propose a Precision-Focused Binary Cross-Entropy (BCE) loss function that penalizes FPs to meet the requirements of safety-critical vehicular systems. Extensive evaluation across 4 platoon controllers, multiple attack vectors, and diverse mobility scenarios demonstrates superior performance ( $\geq$ 0.93) compared to state-of-the-art baseline architectures. A comprehensive deployment analysis utilizing TensorFlow Lite (TFLite), Open Neural Network Exchange (ONNX), and TensorRT achieves sub-millisecond inference latency, making it suitable for real-time operation on resource-constrained edge platforms. This validates that AIMformer is viable for both in-vehicle and roadside infrastructure deployment.
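
The abstract does not give the form of the precision-focused loss. One common way to make a binary cross-entropy loss penalize false positives more heavily is to up-weight the negative-class term, and the Python sketch below shows that variant only as an assumption. The function name `precision_focused_bce` and the `fp_weight` parameter are hypothetical and are not taken from the paper.

```python
import math


def precision_focused_bce(y_true, y_prob, fp_weight=2.0, eps=1e-7):
    """Binary cross-entropy with an extra weight on the negative-class term.

    y_true: labels, 1 = misbehaving, 0 = benign
    y_prob: predicted probability of the positive class
    fp_weight: assumed multiplier; values above 1 make confident predictions
               on benign samples (false positives) cost more than misses
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + fp_weight * (1 - y) * math.log(1.0 - p))
    return total / len(y_true)
```

With `fp_weight=1.0` this reduces to the standard mean binary cross-entropy, so the weight acts as a single knob that trades recall for precision.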

GenAI-enabled Residual Motion Estimation for Energy-Efficient Semantic Video Communication

Authors: Shavbo Salehi, Pedro Enrique Iturria-Rivera, Medhat Elsayed, Majid Bavand, Yigit Ozcan, Melike Erol-Kantarci

2025-12-17

http://arxiv.org/abs/2512.15481v1

Semantic communication addresses the limitations of the Shannon paradigm by focusing on transmitting meaning rather than exact representations, thereby reducing unnecessary resource consumption. This is particularly beneficial for video, which dominates network traffic and demands high bandwidth and power, making semantic approaches ideal for conserving resources while maintaining quality. In this paper, we propose a Predictability-aware and Entropy-adaptive Neural Motion Estimation (PENME) method to address challenges related to high latency, high bitrate, and power consumption in video transmission. PENME makes per-frame decisions to select a residual motion extraction model (convolutional neural network, vision transformer, or optical flow) using a five-step policy based on motion strength, global motion consistency, peak sharpness, heterogeneity, and residual error. The residual motions are then transmitted to the receiver, where the frames are reconstructed via motion-compensated updates. Next, a selective diffusion-based refinement, the Latent Consistency Model (LCM-4), is applied on frames that trigger refinement due to low predictability or large residuals, while predictable frames skip refinement. PENME also allocates radio resource blocks with awareness of residual motion and channel state, reducing power consumption and bandwidth usage while maintaining high semantic similarity. Our simulation results on the Vimeo90K dataset demonstrate that the proposed PENME method handles various types of video, outperforming traditional communication, hybrid, and adaptive bitrate semantic communication techniques, achieving 40% lower latency, 90% less transmitted data, and 35% higher throughput. For semantic communication metrics, PENME improves PSNR by about 40%, increases MS-SSIM by roughly 19%, and reduces LPIPS by nearly 35%, compared with the baseline methods.

Randomized orthogonalization and Krylov subspace methods principles and algorithms

Authors: Jean-Guillaume de Damas, Laura Grigori, Igor Simunec, Edouard Timsit

2025-12-17

http://arxiv.org/abs/2512.15455v1

We present an overview of randomized orthogonalization techniques that construct a well-conditioned basis whose sketch is orthonormal. Randomized orthogonalization has recently emerged as a powerful paradigm for reducing the computational and communication cost of state-of-the-art orthogonalization procedures on parallel architectures, while preserving, and in some cases improving, their numerical stability. This approach can be employed within Krylov subspace methods to mitigate the cost of orthogonalization, yielding a randomized Arnoldi relation. We review the main variants of the randomized Gram--Schmidt and Householder QR algorithms, and discuss their application to Krylov methods for the solution of large-scale linear algebra problems, such as linear systems of equations, eigenvalue problems, the evaluation of matrix functions, and matrix equations.
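
The idea of "a basis whose sketch is orthonormal" can be made concrete with a small pure-Python sketch of a randomized Gram--Schmidt step. It is a simplified illustration under assumptions: the sketching matrix is a dense scaled Gaussian, and projection coefficients are computed as inner products with the sketched basis, whereas practical variants typically solve a small least-squares problem for stability. The function name and parameters are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(M, v):
    return [dot(row, v) for row in M]


def randomized_gram_schmidt(vectors, sketch_dim, seed=0):
    """Return (Q, S): basis vectors Q and their sketches S, with S orthonormal.

    Orthogonality is enforced only in the low-dimensional sketch space, so
    inner products cost O(sketch_dim) instead of O(n) per pair.
    Requires sketch_dim >= len(vectors).
    """
    rng = random.Random(seed)
    n = len(vectors[0])
    # Assumed sketching operator: Gaussian entries scaled by 1/sqrt(sketch_dim).
    theta = [[rng.gauss(0.0, 1.0) / math.sqrt(sketch_dim) for _ in range(n)]
             for _ in range(sketch_dim)]
    Q, S = [], []
    for w in vectors:
        p = matvec(theta, w)                 # sketch of the new vector
        coeffs = [dot(s, p) for s in S]      # projections onto sketched basis
        q = list(w)
        for c, qj in zip(coeffs, Q):
            q = [qi - c * qji for qi, qji in zip(q, qj)]
        s = matvec(theta, q)                 # sketch of the updated vector
        norm = math.sqrt(dot(s, s))          # normalize in the sketch norm
        Q.append([x / norm for x in q])
        S.append([x / norm for x in s])
    return Q, S
```

The vectors in `Q` are generally not exactly orthonormal in the full space; they are well conditioned with high probability when the sketch dimension is large enough, which is the property the overview builds on.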

Three-Dimensional Radio Localization A Channel Charting-Based Approach

Authors: Phillip Stephan, Florian Euchner, Stephan ten Brink

2025-12-17

http://arxiv.org/abs/2512.15399v1

Channel charting creates a low-dimensional representation of the radio environment in a self-supervised manner using manifold learning. Preserving relative spatial distances in the latent space, channel charting is well suited to support user localization. While prior work on channel charting has mainly focused on two-dimensional scenarios, real-world environments are inherently three-dimensional. In this work, we investigate two distinct three-dimensional indoor localization scenarios using simulated, but realistic ray tracing-based datasets: a factory hall with a three-dimensional spatial distribution of datapoints, and a multistory building where each floor exhibits a two-dimensional datapoint distribution. For the first scenario, we apply the concept of augmented channel charting, which combines classical localization and channel charting, to a three-dimensional setting. For the second scenario, we introduce multistory channel charting, a two-stage approach consisting of floor classification via clustering followed by the training of a dedicated expert neural network for channel charting on each individual floor, thereby enhancing the channel charting performance. In addition, we propose a novel feature engineering method designed to extract features from the beamspace channel state information that are suitable for localization.

Emotion Recognition in Signers

Authors: Kotaro Funakoshi, Yaoxiong Zhu

2025-12-17

http://arxiv.org/abs/2512.15376v1

Recognition of signers' emotions suffers from one theoretical challenge and one practical challenge, namely, the overlap between grammatical and affective facial expressions and the scarcity of data for model training. This paper addresses these two challenges in a cross-lingual setting using our eJSL dataset, a new benchmark dataset for emotion recognition in Japanese Sign Language signers, and BOBSL, a large British Sign Language dataset with subtitles. In eJSL, two signers expressed 78 distinct utterances with each of seven different emotional states, resulting in 1,092 video clips. We empirically demonstrate that 1) textual emotion recognition in spoken language mitigates data scarcity in sign language, 2) temporal segment selection has a significant impact, and 3) incorporating hand motion enhances emotion recognition in signers. Finally, we establish a stronger baseline than spoken language LLMs.

Adversarial versification in portuguese as a jailbreak operator in LLMs

Authors: Joao Queiroz

2025-12-17

http://arxiv.org/abs/2512.15353v1

Recent evidence shows that the versification of prompts constitutes a highly effective adversarial mechanism against aligned LLMs. The study 'Adversarial poetry as a universal single-turn jailbreak mechanism in large language models' demonstrates that instructions routinely refused in prose become executable when rewritten as verse, producing up to 18x more safety failures in benchmarks derived from MLCommons AILuminate. Manually written poems reach approximately 62% ASR, and automated versions 43%, with some models surpassing 90% success in single-turn interactions. The effect is structural: systems trained with RLHF, constitutional AI, and hybrid pipelines exhibit consistent degradation under minimal semiotic formal variation. Versification displaces the prompt into sparsely supervised latent regions, revealing guardrails that are excessively dependent on surface patterns. This dissociation between apparent robustness and real vulnerability exposes deep limitations in current alignment regimes. The absence of evaluations in Portuguese, a language with high morphosyntactic complexity, a rich metric-prosodic tradition, and over 250 million speakers, constitutes a critical gap. Experimental protocols must parameterise scansion, metre, and prosodic variation to test vulnerabilities specific to Lusophone patterns, which are currently ignored.

LLMQ Efficient Lower-Precision Pretraining for Consumer GPUs

Authors: Erik Schultheis, Dan Alistarh

2025-12-17

http://arxiv.org/abs/2512.15306v1

We present LLMQ, an end-to-end CUDA/C++ implementation for medium-sized language-model training, e.g. 3B to 32B parameters, on affordable, commodity GPUs. These devices are characterized by low memory availability and slow communication compared to datacentre-grade GPUs. Consequently, we showcase a range of optimizations that target these bottlenecks, including activation checkpointing, offloading, and copy-engine based collectives. LLMQ is able to train or fine-tune a 7B model on a single 16GB mid-range gaming card, or a 32B model on a workstation equipped with 4 RTX 4090s. This is achieved while executing a standard 8-bit training pipeline, without additional algorithmic approximations, and maintaining FLOP utilization of around 50%. The efficiency of LLMQ rivals that of production-scale systems on much more expensive cloud-grade GPUs.

Keep the Core Adversarial Priors for Significance-Preserving Brain MRI Segmentation

Authors: Feifei Zhang, Zhenhong Jia, Sensen Song, Fei Shi, Aoxue Chen, Dayong Ren

2025-12-17

http://arxiv.org/abs/2512.15811v1

Medical image segmentation is constrained by sparse pathological annotations. Existing augmentation strategies, from conventional transforms to random masking for self-supervision, are feature-agnostic: they often corrupt critical diagnostic semantics or fail to prioritize essential features. We introduce "Keep the Core," a novel data-centric paradigm that uses adversarial priors to guide both augmentation and masking in a significance-preserving manner. Our approach uses SAGE (Sparse Adversarial Gated Estimator), an offline module identifying minimal tokens whose micro-perturbation flips segmentation boundaries. SAGE forges the Token Importance Map $W$ by solving an adversarial optimization problem to maximally degrade performance, while an $\ell_1$ penalty encourages a compact set of sensitive tokens. The online KEEP (Key-region Enhancement & Preservation) module uses $W$ for a two-pronged augmentation strategy: (1) Semantic-Preserving Augmentation: High-importance tokens are augmented, but their original pixel values are strictly restored. (2) Guided-Masking Augmentation: Low-importance tokens are selectively masked for an MAE-style reconstruction, forcing the model to learn robust representations from preserved critical features. "Keep the Core" is backbone-agnostic with no inference overhead. Extensive experiments show SAGE's structured priors and KEEP's region-selective mechanism are highly complementary, achieving state-of-the-art segmentation robustness and generalization on 2D medical datasets.

Defect Tolerance and Local Structural Response to 3d Transition-Metal Substitution in CsPbI3

Authors: Misbah Shaheen, Sheharyar Pervez

2025-12-17

http://arxiv.org/abs/2512.15280v1

We present a systematic first-principles study of substitutional 3d transition-metal (TM) defects in CsPbI3 using the spin-polarized GGA+U framework. TM incorporation is generally energetically favorable and induces lattice distortions that are strongly localized around the defect site, preserving the overall structural integrity of the host. Analysis of defect formation energies and electronic structure shows that, with the exception of Sc and Ti, CsPbI3 exhibits a strong resistance to deep trap formation. Most TM substitutions instead introduce resonant states that hybridize with the band edges, consistent with the defect-tolerant nature of the material. While these states can modify the band gap, they do not generate isolated mid-gap traps. The observed distortions arise from strain-driven Van Vleck modes governed by ionic-radius mismatch, electronegativity differences, and TM-I orbital overlap, with amplitudes that decay rapidly away from the defect. Spin-polarized calculations reveal significant TM-induced spin polarization on the ligands and, in some cases, on neighboring Pb atoms, reflecting variations in covalency and hybridization across the 3d series. Together, these results establish a unified picture in which local structural response, electronic hybridization, and spin polarization jointly control the stability and electronic impact of TM defects in CsPbI3, identifying dopants that are electronically benign or detrimental.

Distillation-Guided Structural Transfer for Continual Learning Beyond Sparse Distributed Memory

Authors: Huiyan Xue, Xuming Ran, Yaxin Li, Qi Xu, Enhui Li, Yi Xu, Qiang Zhang

2025-12-17

http://arxiv.org/abs/2512.15267v1

Sparse neural systems are gaining traction for efficient continual learning due to their modularity and low interference. Architectures such as Sparse Distributed Memory Multi-Layer Perceptrons (SDMLP) construct task-specific subnetworks via Top-K activation and have shown resilience against catastrophic forgetting. However, their rigid modularity limits cross-task knowledge reuse and leads to performance degradation under high sparsity. We propose Selective Subnetwork Distillation (SSD), a structurally guided continual learning framework that treats distillation not as a regularizer but as a topology-aligned information conduit. SSD identifies neurons with high activation frequency and selectively distills knowledge within previous Top-K subnetworks and output logits, without requiring replay or task labels. This enables structural realignment while preserving modularity. Experiments on Split CIFAR-10, CIFAR-100, and MNIST demonstrate that SSD improves accuracy, retention, and representation coverage, offering a structurally grounded solution for continual learning.

Authors: Youmin Xu, Mengxi Guo, Shijie Zhao, Weiqi Li, Junlin Li, Li Zhang, Jian Zhang

2025-12-17

http://arxiv.org/abs/2512.15262v1

Generative face video coding (GFVC) is vital for modern applications like video conferencing, yet existing methods primarily focus on video motion while neglecting the significant bitrate contribution of audio. Despite the well-established correlation between audio and lip movements, this cross-modal coherence has not been systematically exploited for compression. To address this, we propose an Audio-Visual Cross-Modal Compression (AVCC) framework that jointly compresses audio and video streams. Our framework extracts motion information from video and tokenizes audio features, then aligns them through a unified audio-video diffusion process. This allows synchronized reconstruction of both modalities from a shared representation. In extremely low-rate scenarios, AVCC can even reconstruct one modality from the other. Experiments show that AVCC significantly outperforms the Versatile Video Coding (VVC) standard and state-of-the-art GFVC schemes in rate-distortion performance, paving the way for more efficient multimodal systems.

The Moralization Corpus Frame-Based Annotation and Analysis of Moralizing Speech Acts across Diverse Text Genres

Authors: Maria Becker, Mirko Sommer, Lars Tapken, Yi Wan Teh, Bruno Brocai

2025-12-17

http://arxiv.org/abs/2512.15248v1

Moralizations - arguments that invoke moral values to justify demands or positions - are an as-yet underexplored form of persuasive communication. We present the Moralization Corpus, a novel multi-genre dataset designed to analyze how moral values are strategically used in argumentative discourse. Moralizations are pragmatically complex and often implicit, posing significant challenges for both human annotators and NLP systems. We develop a frame-based annotation scheme that captures the constitutive elements of moralizations - moral values, demands, and discourse protagonists - and apply it to a diverse set of German texts, including political debates, news articles, and online discussions. The corpus enables fine-grained analysis of moralizing language across communicative formats and domains. We further evaluate several large language models (LLMs) under varied prompting conditions on the tasks of moralization detection and moralization component extraction, and compare their outputs to human annotations in order to investigate the challenges of automatic and manual analysis of moralizations. Results show that detailed prompt instructions have a greater effect than few-shot or explanation-based prompting, and that moralization remains a highly subjective and context-sensitive task. We release all data, annotation guidelines, and code to foster future interdisciplinary research on moral discourse and moral reasoning in NLP.

Magnetised turbulent plasmas as high-energy particle accelerators

Authors: M. Lemoine

2025-12-17

http://arxiv.org/abs/2512.15239v1

This proceedings paper reports on the theoretical modelling of particle acceleration in magnetised turbulent plasmas. It briefly reviews some recent findings obtained from fully kinetic numerical simulations of large-amplitude, semi to fully relativistic turbulence. The paper then argues that these findings can be understood within the framework of a ``generalised Fermi'' picture of stochastic acceleration, which it summarises. The dominant contributions to acceleration appear to arise from particle interactions with sharp, dynamic bends of the magnetic field lines and regions of velocity compression. Interestingly, the acceleration rate is spatially inhomogeneous and its probability distribution follows a broken power law extending up to large values. This makes relativistic, large-amplitude turbulence an extreme particle accelerator. Some implications for particle transport and the shape of the particle energy spectrum in the presence of radiative losses and over long timescales are also discussed.

DEER Draft with Diffusion, Verify with Autoregressive Models

Authors: Zicong Cheng, Guo-Wei Yang, Jia Li, Zhijie Deng, Meng-Hao Guo, Shi-Min Hu

2025-12-17

http://arxiv.org/abs/2512.15176v1

Efficiency, a critical practical challenge for LLM-driven agentic and reasoning systems, is increasingly constrained by the inherent latency of autoregressive (AR) decoding. Speculative decoding mitigates this cost through a draft-verify scheme, yet existing approaches rely on AR draft models (a.k.a. drafters), which introduce two fundamental issues: (1) step-wise uncertainty accumulation leads to a progressive collapse of trust between the target model and the drafter, and (2) the decoding of AR drafters is inherently sequential. Together, these factors cause limited speedups. In this paper, we show that a diffusion large language model (dLLM) drafter can naturally overcome these issues through its fundamentally different probabilistic modeling and efficient parallel decoding strategy. Building on this insight, we introduce DEER, an efficient speculative decoding framework that drafts with diffusion and verifies with AR models. To enable high-quality drafting, DEER employs a two-stage training pipeline to align the dLLM-based drafters with the target AR model, and further adopts single-step decoding to generate long draft segments. Experiments show DEER reaches draft acceptance lengths of up to 32 tokens, far surpassing the 10 tokens achieved by EAGLE-3. Moreover, on HumanEval with Qwen3-30B-A3B, DEER attains a 5.54x speedup, while EAGLE-3 achieves only 2.41x. Code, model, demo, etc. will be available at https://czc726.github.io/DEER/
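
The draft-verify scheme mentioned in the abstract above can be sketched generically as follows. This is a simplified greedy variant written for illustration, not DEER's algorithm: `verify_draft` and `toy_target` are hypothetical names, the drafter is replaced by a fixed list of proposed tokens, and the target model is queried once per position for clarity, whereas a real system would score all draft positions in one parallel forward pass.

```python
from typing import Callable, List, Sequence


def verify_draft(
    prefix: Sequence[int],
    draft: Sequence[int],
    target_next: Callable[[Sequence[int]], int],
) -> List[int]:
    """Greedy verification: accept the longest draft prefix the target agrees with.

    Returns the accepted draft tokens followed by one token chosen by the
    target itself (a correction at the first mismatch, or a bonus token
    when the whole draft is accepted).
    """
    context = list(prefix)
    accepted: List[int] = []
    for token in draft:
        expected = target_next(context)
        if token != expected:
            accepted.append(expected)  # replace the rejected token and stop
            return accepted
        accepted.append(token)
        context.append(token)
    accepted.append(target_next(context))  # whole draft accepted: bonus token
    return accepted


def toy_target(context: Sequence[int]) -> int:
    """Stand-in for a target model: the next token is (last + 1) mod 10."""
    return (context[-1] + 1) % 10


partial = verify_draft([1, 2], [3, 4, 9, 6], toy_target)  # mismatch at the third draft token
full = verify_draft([0], [1, 2], toy_target)              # every draft token accepted
```

The number of draft tokens accepted per call (two in both toy calls, before the target's own token is appended) is the acceptance length that the abstract reports as a measure of drafter quality.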

Beyond Majority Voting Towards Fine-grained and More Reliable Reward Signal for Test-Time Reinforcement Learning

Authors: Weiqin Wang, Yile Wang, Kehao Chen, Hui Huang

2025-12-17

http://arxiv.org/abs/2512.15146v2

Test-time reinforcement learning mitigates the reliance on annotated data by using majority voting results as pseudo-labels, emerging as a complementary direction to reinforcement learning with verifiable rewards (RLVR) for improving the reasoning ability of large language models (LLMs). However, this voting strategy often induces confirmation bias and suffers from sparse rewards, limiting the overall performance. In this work, we propose subgroup-specific step-wise confidence-weighted pseudo-label estimation (SCOPE), a framework integrating model confidence and dynamic subgroup partitioning to address these issues. Specifically, SCOPE integrates the proposed step-wise confidence into pseudo-label deduction, prioritizing high-quality reasoning paths over simple frequency counts. Furthermore, it dynamically partitions the pool of candidate outputs into independent subgroups by balancing reasoning quality against exploration diversity. By deriving local consensus via repeated sampling for each subgroup, SCOPE provides diverse supervision targets to encourage broader exploration. We conduct experiments across various models and benchmarks, and the results show that SCOPE consistently outperforms recent baselines. Notably, SCOPE achieves relative improvements of 13.1% on the challenging AIME 2025 and 8.1% on AMC. The code is released at https://github.com/szu-tera/SCOPE.
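
To make the contrast between frequency-based and confidence-weighted pseudo-labels concrete, here is a small sketch. It is not SCOPE itself: scoring a reasoning path by the mean of its step confidences is an assumption chosen for simplicity, subgroup partitioning is omitted, and the candidate data are invented.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Candidate = Tuple[str, Sequence[float]]  # (final answer, per-step confidences)


def majority_vote(candidates: Sequence[Candidate]) -> str:
    """Pseudo-label = most frequent final answer."""
    answers: List[str] = [answer for answer, _ in candidates]
    return Counter(answers).most_common(1)[0][0]


def confidence_weighted_vote(candidates: Sequence[Candidate]) -> str:
    """Pseudo-label = answer with the largest summed path confidence.

    Path confidence is the mean step confidence (an assumption for this sketch).
    """
    scores: Dict[str, float] = defaultdict(float)
    for answer, step_conf in candidates:
        scores[answer] += sum(step_conf) / len(step_conf)
    return max(scores, key=lambda a: scores[a])


# Two low-confidence paths agree on "42"; one high-confidence path says "17".
candidates = [
    ("42", [0.3, 0.4]),
    ("42", [0.2, 0.3]),
    ("17", [0.9, 0.95]),
]
by_majority = majority_vote(candidates)
by_confidence = confidence_weighted_vote(candidates)
```

In this toy pool the two rules disagree: plain counting picks "42", while weighting by path confidence picks "17", which is the kind of case where a confidence-aware pseudo-label can counter confirmation bias.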

Tracking spatial temporal details in ultrasound long video via wavelet analysis and memory bank

Authors: Chenxiao Zhang, Runshi Zhang, Junchen Wang

2025-12-17

http://arxiv.org/abs/2512.15066v1

Medical ultrasound videos are widely used for medical inspections, disease diagnosis and surgical planning. High-fidelity lesion area and target organ segmentation constitutes a key component of the computer-assisted surgery workflow. The low contrast levels and noisy backgrounds of ultrasound videos cause missegmentation of organ boundaries, which may lead to small object losses and increased boundary segmentation errors. Object tracking in long videos also remains a significant research challenge. To overcome these challenges, we propose a memory bank-based wavelet filtering and fusion network, which adopts an encoder-decoder structure to effectively extract fine-grained spatial features and integrate high-frequency (HF) information. Specifically, memory-based wavelet convolution is presented to simultaneously capture category and detailed information and utilize adjacent information in the encoder. Cascaded wavelet compression is used to fuse multiscale frequency-domain features and expand the receptive field within each convolutional layer. A long short-term memory bank using cross-attention and memory mechanisms is designed to track objects in long videos. To fully utilize the boundary-sensitive HF details of feature maps, an HF-aware feature fusion module is designed via adaptive wavelet filters in the decoder. In extensive benchmark tests on four ultrasound video datasets (two thyroid nodule datasets, a thyroid gland dataset, and a heart dataset), our method demonstrates marked improvements in segmentation metrics over state-of-the-art methods. In particular, our method can more accurately segment small thyroid nodules, demonstrating its effectiveness for cases involving small ultrasound objects in long videos. The code is available at https://github.com/XiAooZ/MWNet.