2025-09-19

Table of Contents

LNE-Blocking An Efficient Framework for Contamination Mitigation Evaluation on Large Language Models

Authors: Ruijie Hou, Yueyang Jiao, Hanxu Hu, Yingming Li, Wai Lam, Huajian Zhang, Hongyuan Lu

2025-09-18

http://arxiv.org/abs/2509.15218v1

The problem of data contamination is now almost inevitable during the development of large language models (LLMs), with training data commonly incorporating evaluation benchmarks, even unintentionally. This makes it hard to benchmark LLMs fairly. Instead of constructing contamination-free datasets, which is quite hard, we propose a novel framework, LNE-Blocking, to restore model performance prior to contamination on potentially leaked datasets. Our framework consists of two components: contamination detection and a disruption operation. For each prompt, the framework first uses the contamination detection method, LNE, to assess the extent of contamination in the model. Based on this, it adjusts the intensity of the disruption operation, Blocking, to elicit non-memorized responses from the model. Our framework is the first to efficiently restore the model's greedy decoding performance. It performs strongly on multiple datasets with potential leakage risks, and it consistently achieves stable recovery results across different models and varying levels of data contamination. We release the code at https://github.com/RuijieH/LNE-Blocking to facilitate research.

Beyond Surface Alignment Rebuilding LLMs Safety Mechanism via Probabilistically Ablating Refusal Direction

Authors: Yuanbo Xie, Yingjie Zhang, Tianyun Liu, Duohe Ma, Tingwen Liu

2025-09-18

http://arxiv.org/abs/2509.15202v1

Jailbreak attacks pose persistent threats to large language models (LLMs). Current safety alignment methods have attempted to address these issues, but they suffer from two significant limitations: insufficient safety alignment depth and non-robust internal defense mechanisms. These limitations make them vulnerable to adversarial attacks such as prefilling and refusal direction manipulation. We introduce DeepRefusal, a robust safety alignment framework that overcomes these issues. DeepRefusal forces the model to dynamically rebuild its refusal mechanisms from jailbreak states. This is achieved by probabilistically ablating the refusal direction across layers and token depths during fine-tuning. Our method not only defends against prefilling and refusal direction attacks but also demonstrates strong resilience against other unseen jailbreak strategies. Extensive evaluations on four open-source LLM families and six representative attacks show that DeepRefusal reduces attack success rates by approximately 95%, while maintaining model capabilities with minimal performance degradation.
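As a rough illustration of the ablation step mentioned in this abstract, the sketch below removes the component of a hidden-state vector along a "refusal direction" with some probability. It is a minimal sketch under stated assumptions: the function names, the probability `p`, and the toy vectors are hypothetical, and the abstract does not say how DeepRefusal estimates the direction or schedules ablation across layers and token depths.

```python
import math
import random


def ablate_direction(hidden, direction):
    """Remove the component of `hidden` that lies along `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    coeff = sum(h * u for h, u in zip(hidden, unit))  # projection length
    return [h - coeff * u for h, u in zip(hidden, unit)]


def maybe_ablate(hidden, direction, p, rng):
    """Ablate with probability `p`; otherwise return an unchanged copy.

    `p` is a hypothetical knob standing in for whatever schedule the
    actual method uses.
    """
    if rng.random() < p:
        return ablate_direction(hidden, direction)
    return list(hidden)


rng = random.Random(0)
hidden = [1.0, 2.0, 3.0]        # toy hidden state (assumed values)
direction = [0.0, 0.0, 2.0]     # toy refusal direction (assumed values)
ablated = ablate_direction(hidden, direction)
```

After ablation the vector has no component left along the chosen direction, which is the property the fine-tuning procedure would then have to compensate for.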

MaRVIn A Cross-Layer Mixed-Precision RISC-V Framework for DNN Inference, from ISA Extension to Hardware Acceleration

Authors: Giorgos Armeniakos, Alexis Maras, Sotirios Xydis, Dimitrios Soudris

2025-09-18

http://arxiv.org/abs/2509.15187v1

The evolution of quantization and mixed-precision techniques has unlocked new possibilities for enhancing the speed and energy efficiency of NNs. Several recent studies indicate that adapting precision levels across different parameters can maintain accuracy comparable to full-precision models while significantly reducing computational demands. However, existing embedded microprocessors lack sufficient architectural support for efficiently executing mixed-precision NNs, both in terms of ISA extensions and hardware design, resulting in inefficiencies such as excessive data packing/unpacking and underutilized arithmetic units. In this work, we propose novel ISA extensions and a micro-architecture implementation specifically designed to optimize mixed-precision execution, enabling energy-efficient deep learning inference on RISC-V architectures. We introduce MaRVIn, a cross-layer hardware-software co-design framework that enhances power efficiency and performance through a combination of hardware improvements, mixed-precision quantization, ISA-level optimizations, and cycle-accurate emulation. At the hardware level, we enhance the ALU with configurable mixed-precision arithmetic (2, 4, 8 bits) for weights/activations and employ multi-pumping to reduce execution latency while implementing soft SIMD for efficient 2-bit ops. At the software level, we integrate a pruning-aware fine-tuning method to optimize model compression and a greedy-based DSE approach to efficiently search for Pareto-optimal mixed-quantized models. Additionally, we incorporate voltage scaling to boost the power efficiency of our system. Our experimental evaluation over widely used DNNs and datasets, such as CIFAR10 and ImageNet, demonstrates that our framework can achieve, on average, a 17.6x speedup for less than 1% accuracy loss and outperforms the ISA-agnostic state-of-the-art RISC-V cores, delivering up to 1.8 TOPs/W.
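To make the bit widths in this abstract concrete, the sketch below quantizes a few weights symmetrically to 8 and 2 bits and packs the 2-bit codes into a single word, which is the kind of packing/unpacking the abstract says costs cycles without hardware support. This is a generic illustration, not MaRVIn's ISA or quantizer: the helper names, the symmetric scheme, and the sample weights are all assumptions.

```python
def quantize(values, bits):
    """Symmetric uniform quantization to signed `bits`-bit integer codes."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(qmax, max(qmin, round(v / scale))) for v in values]
    return codes, scale


def pack(codes, bits):
    """Pack signed codes into one integer word, lowest lane first."""
    mask = (1 << bits) - 1
    word = 0
    for lane, code in enumerate(codes):
        word |= (code & mask) << (lane * bits)  # two's-complement lane
    return word


def unpack(word, bits, count):
    """Recover `count` signed codes from a packed word."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    out = []
    for lane in range(count):
        raw = (word >> (lane * bits)) & mask
        out.append(raw - (1 << bits) if raw & sign_bit else raw)
    return out


weights = [0.6, -1.0, 0.2, 1.0]           # toy weights (assumed values)
codes8, scale8 = quantize(weights, 8)     # one code per byte
codes2, scale2 = quantize(weights, 2)     # four codes fit in one byte
word = pack(codes2, 2)
```

Four 2-bit weights occupy one byte here, against four bytes at 8 bits, at the cost of a much coarser grid of representable values.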

A1 Asynchronous Test-Time Scaling via Conformal Prediction

Authors: Jing Xiong, Qiujiang Chen, Fanghua Ye, Zhongwei Wan, Chuanyang Zheng, Chenyang Zhao, Hui Shen, Alexander Hanbo Li, Chaofan Tao, Haochen Tan, Haoli Bai, Lifeng Shang, Lingpeng Kong, Ngai Wong

2025-09-18

http://arxiv.org/abs/2509.15148v1

Large language models (LLMs) benefit from test-time scaling, but existing methods face significant challenges, including severe synchronization overhead, memory bottlenecks, and latency, especially during speculative decoding with long reasoning chains. We introduce A1 (Asynchronous Test-Time Scaling), a statistically guaranteed adaptive inference framework that addresses these challenges. A1 refines arithmetic intensity to identify synchronization as the dominant bottleneck, proposes an online calibration strategy to enable asynchronous inference, and designs a three-stage rejection sampling pipeline that supports both sequential and parallel scaling. Through experiments on the MATH, AMC23, AIME24, and AIME25 datasets, across various draft-target model families, we demonstrate that A1 achieves a remarkable 56.7x speedup in test-time scaling and a 4.14x improvement in throughput, all while maintaining accurate rejection-rate control, reducing latency and memory overhead, and incurring no accuracy loss compared to using target model scaling alone. These results position A1 as an efficient and principled solution for scalable LLM inference. We have released the code at https://github.com/menik1126/asynchronous-test-time-scaling.
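The statistical guarantee in this abstract rests on conformal prediction. The sketch below shows the standard split-conformal threshold computation on a held-out calibration set; it is not A1's online calibration strategy, which the abstract does not detail. The function name, the scores, and the miscoverage level `alpha` are assumptions chosen for illustration.

```python
import math


def conformal_threshold(scores, alpha):
    """Split-conformal quantile of calibration nonconformity scores.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, or
    infinity when the calibration set is too small for that rank.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")
    return sorted(scores)[rank - 1]


# Hypothetical nonconformity scores from a calibration set.
calibration = [0.1, 0.4, 0.8, 0.2, 0.6, 0.5, 0.3]
tau = conformal_threshold(calibration, alpha=0.25)


def accept(score):
    """Accept a candidate whose score does not exceed the threshold."""
    return score <= tau
```

Under exchangeability of calibration and test scores, a fresh score exceeds `tau` with probability at most `alpha`, which is the kind of rejection-rate control the abstract refers to.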

Adaptive LoRA Experts Allocation and Selection for Federated Fine-Tuning

Authors: Lei Wang, Jieming Bian, Letian Zhang, Jie Xu

2025-09-18

http://arxiv.org/abs/2509.15087v1

Large Language Models (LLMs) have demonstrated impressive capabilities across various tasks, but fine-tuning them for domain-specific applications often requires substantial domain-specific data that may be distributed across multiple organizations. Federated Learning (FL) offers a privacy-preserving solution, but faces challenges with computational constraints when applied to LLMs. Low-Rank Adaptation (LoRA) has emerged as a parameter-efficient fine-tuning approach, though a single LoRA module often struggles with heterogeneous data across diverse domains. This paper addresses two critical challenges in federated LoRA fine-tuning: (1) determining the optimal number and allocation of LoRA experts across heterogeneous clients, and (2) enabling clients to selectively utilize these experts based on their specific data characteristics. We propose FedLEASE (Federated adaptive LoRA Expert Allocation and SElection), a novel framework that adaptively clusters clients based on representation similarity to allocate and train domain-specific LoRA experts. It also introduces an adaptive top- Mixture-of-Experts mechanism that allows each client to select the optimal number of utilized experts. Our extensive experiments on diverse benchmark datasets demonstrate that FedLEASE significantly outperforms existing federated fine-tuning approaches in heterogeneous client settings while maintaining communication efficiency.

Communication Efficient Split Learning of ViTs with Attention-based Double Compression

Authors: Federico Alvetreti, Jary Pomponi, Paolo Di Lorenzo, Simone Scardapane

2025-09-18

http://arxiv.org/abs/2509.15058v1

This paper proposes a novel communication-efficient Split Learning (SL) framework, named Attention-based Double Compression (ADC), which reduces the communication overhead required for transmitting intermediate Vision Transformer activations during the SL training process. ADC incorporates two parallel compression strategies. The first one merges samples' activations that are similar, based on the average attention score calculated in the last client layer; this strategy is class-agnostic, meaning that it can also merge samples having different classes, without losing generalization ability or degrading final results. The second strategy follows the first and discards the least meaningful tokens, further reducing the communication cost. Combining these strategies not only reduces the data sent during the forward pass but also naturally compresses the gradients, allowing the whole model to be trained without additional tuning or approximations of the gradients. Simulation results demonstrate that Attention-based Double Compression outperforms state-of-the-art SL frameworks by significantly reducing communication overheads while maintaining high accuracy.
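The second strategy in this abstract, discarding the least meaningful tokens, can be pictured with a simple top-k selection on per-token attention scores. The sketch below is a generic illustration, not ADC's actual procedure: the function name, the `keep` budget, and the toy activations and scores are assumptions, and the sample-merging strategy is not shown.

```python
def keep_top_tokens(tokens, attention_scores, keep):
    """Keep the `keep` tokens with the highest attention scores.

    Returns the surviving tokens in their original order together
    with their indices, so the sender can transmit fewer activations.
    """
    ranked = sorted(
        range(len(tokens)),
        key=lambda i: attention_scores[i],
        reverse=True,
    )
    kept = sorted(ranked[:keep])  # restore original token order
    return [tokens[i] for i in kept], kept


# Toy activations (one short vector per token) and average attention scores.
tokens = [[0.1, 0.2], [0.9, 0.8], [0.4, 0.4], [0.7, 0.1]]
scores = [0.05, 0.50, 0.15, 0.30]
kept_tokens, kept_idx = keep_top_tokens(tokens, scores, keep=2)
ratio = len(kept_tokens) / len(tokens)  # fraction of activations sent
```

Because only the kept tokens take part in the forward pass beyond the split point, only their gradients need to flow back, which matches the abstract's remark that gradients are compressed as a side effect.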

Value-Guided KV Compression for LLMs via Approximated CUR Decomposition

Authors: Ayan Sengupta, Siddhant Chaudhary, Tanmoy Chakraborty

2025-09-18

http://arxiv.org/abs/2509.15038v1

Key-value (KV) cache compression has emerged as a critical technique for reducing the memory and latency overhead of autoregressive language models during inference. Prior approaches predominantly rely on query-key attention scores to rank and evict cached tokens, assuming that attention intensity correlates with semantic importance. However, this heuristic overlooks the contribution of value vectors, which directly influence the attention output. In this paper, we propose CurDKV, a novel, value-centric KV compression method that selects keys and values based on leverage scores computed from CUR matrix decomposition. Our approach approximates the dominant subspace of the attention output, ensuring that the retained tokens best preserve the model's predictive behavior. Theoretically, we show that attention score approximation does not guarantee output preservation, and demonstrate that CUR-based selection minimizes end-to-end attention reconstruction loss. Empirically, CurDKV achieves up to 9.6% higher accuracy than state-of-the-art methods like SnapKV and ChunkKV under aggressive compression budgets on LLaMA and Mistral, while maintaining compatibility with FlashAttention and Grouped Query Attention. In addition to improved accuracy, CurDKV reduces generation latency by up to 40% at high compression, offering a practical speed-accuracy tradeoff.

FAWN A MultiEncoder Fusion-Attention Wave Network for Integrated Sensing and Communication Indoor Scene Inference

Authors: Carlos Barroso-Fernández, Alejandro Calvillo-Fernandez, Antonio de la Oliva, Carlos J. Bernardos

2025-09-18

http://arxiv.org/abs/2509.14968v1

The upcoming generations of wireless technologies promise an era where everything is interconnected and intelligent. As the need for intelligence grows, networks must learn to better understand the physical world. However, deploying dedicated hardware to perceive the environment is not always feasible, mainly due to cost and/or complexity. Integrated Sensing and Communication (ISAC) has made a step forward in addressing this challenge. Within ISAC, passive sensing emerges as a cost-effective solution that reuses wireless communications to sense the environment, without interfering with existing communications. Nevertheless, the majority of current solutions are limited to one technology (mostly Wi-Fi or 5G), constraining the maximum accuracy reachable. As different technologies work with different spectrums, we see a necessity in integrating more than one technology to augment the coverage area. Hence, we take advantage of ISAC passive sensing to present FAWN, a MultiEncoder Fusion-Attention Wave Network for ISAC indoor scene inference. FAWN is based on the original transformer architecture and fuses information from Wi-Fi and 5G, making the network capable of understanding the physical world without interfering with the current communication. To test our solution, we have built a prototype and integrated it in a real scenario. Results show errors below 0.6 m about 84% of the time.

Sentinel Agents for Secure and Trustworthy Agentic AI in Multi-Agent Systems

Authors: Diego Gosmar, Deborah A. Dahl

2025-09-18

http://arxiv.org/abs/2509.14956v1

This paper proposes a novel architectural framework aimed at enhancing security and reliability in multi-agent systems (MAS). A central component of this framework is a network of Sentinel Agents, functioning as a distributed security layer that integrates techniques such as semantic analysis via large language models (LLMs), behavioral analytics, retrieval-augmented verification, and cross-agent anomaly detection. Such agents can potentially oversee inter-agent communications, identify potential threats, enforce privacy and access controls, and maintain comprehensive audit records. Complementary to the idea of Sentinel Agents is the use of a Coordinator Agent. The Coordinator Agent supervises policy implementation and manages agent participation. In addition, the Coordinator also ingests alerts from Sentinel Agents. Based on these alerts, it can adapt policies, isolate or quarantine misbehaving agents, and contain threats to maintain the integrity of the MAS ecosystem. This dual-layered security approach, combining the continuous monitoring of Sentinel Agents with the governance functions of Coordinator Agents, supports dynamic and adaptive defense mechanisms against a range of threats, including prompt injection, collusive agent behavior, hallucinations generated by LLMs, privacy breaches, and coordinated multi-agent attacks. In addition to the architectural design, we present a simulation study where 162 synthetic attacks of different families (prompt injection, hallucination, and data exfiltration) were injected into a multi-agent conversational environment. The Sentinel Agents successfully detected the attack attempts, confirming the practical feasibility of the proposed monitoring approach. The framework also offers enhanced system observability, supports regulatory compliance, and enables policy evolution over time.

A Comparative Analysis of Transformer Models in Social Bot Detection

Authors: Rohan Veit, Michael Lones

2025-09-18

http://arxiv.org/abs/2509.14936v1

Social media has become a key medium of communication in today's society. This realisation has led to many parties employing artificial users (or bots) to mislead others into believing untruths or acting in a manner beneficial to such parties. Sophisticated text generation tools, such as large language models, have further exacerbated this issue. This paper aims to compare the effectiveness of bot detection models based on encoder and decoder transformers. Pipelines are developed to evaluate the performance of these classifiers, revealing that encoder-based classifiers demonstrate greater accuracy and robustness. However, decoder-based models showed greater adaptability through task-specific alignment, suggesting more potential for generalisation across different use cases in addition to superior observa. These findings contribute to the ongoing effort to prevent digital environments being manipulated while protecting the integrity of online discussion.

Leveraging Reinforcement Learning, Genetic Algorithms and Transformers for background determination in particle physics

Authors: Guillermo Hijano Mendizabal, Davide Lancierini, Alex Marshall, Andrea Mauri, Patrick Haworth Owen, Mitesh Patel, Konstantinos Petridis, Shah Rukh Qasim, Nicola Serra, William Sutcliffe, Hanae Tilquin

2025-09-18

http://arxiv.org/abs/2509.14894v1

Experimental studies of beauty hadron decays face significant challenges due to a wide range of backgrounds arising from the numerous possible decay channels with similar final states. For a particular signal decay, the process for ascertaining the most relevant background processes necessitates a detailed analysis of final state particles, potential misidentifications, and kinematic overlaps, which, due to computational limitations, is restricted to the simulation of only the most relevant backgrounds. Moreover, this process typically relies on the physicist's intuition and expertise, as no systematic method exists. This paper has two primary goals. First, from a particle physics perspective, we present a novel approach that utilises Reinforcement Learning (RL) to overcome the aforementioned challenges by systematically determining the critical backgrounds affecting beauty hadron decay measurements. While beauty hadron physics serves as the case study in this work, the proposed strategy is broadly adaptable to other types of particle physics measurements. Second, from a Machine Learning perspective, we introduce a novel algorithm which exploits the synergy between RL and Genetic Algorithms (GAs) for environments with highly sparse rewards and a large trajectory space. This strategy leverages GAs to efficiently explore the trajectory space and identify successful trajectories, which are used to guide the RL agent's training. Our method also incorporates a transformer architecture for the RL agent to handle token sequences representing decays.

Llama-Mimi Speech Language Models with Interleaved Semantic and Acoustic Tokens

Authors: Issa Sugiura, Shuhei Kurita, Yusuke Oda, Ryuichiro Higashinaka

2025-09-18

http://arxiv.org/abs/2509.14882v1

We propose Llama-Mimi, a speech language model that uses a unified tokenizer and a single Transformer decoder to jointly model sequences of interleaved semantic and acoustic tokens. Comprehensive evaluation shows that Llama-Mimi achieves state-of-the-art performance in acoustic consistency and possesses the ability to preserve speaker identity. Our analysis further demonstrates that increasing the number of quantizers improves acoustic fidelity but degrades linguistic performance, highlighting the inherent challenge of maintaining long-term coherence. We additionally introduce an LLM-as-a-Judge-based evaluation to assess the spoken content quality of generated outputs. Our models, code, and speech samples are publicly available.

From Hype to Insight Rethinking Large Language Model Integration in Visual Speech Recognition

Authors: Rishabh Jain, Naomi Harte

2025-09-18

http://arxiv.org/abs/2509.14880v1

Advances in self-supervised encoders have improved Visual Speech Recognition (VSR). Recent approaches integrating these encoders with LLM decoders improve transcription accuracy; however, it remains unclear whether these gains stem from visual understanding or stronger language modeling. In this work, we systematically evaluate LLM decoders by freezing or selectively updating the visual encoder, scaling decoder size, comparing adaptation strategies and architectures, and varying training data across LRS2, LRS3, and their combination. Evaluation on LRS2, LRS3, and WildVSR shows that scaling and adaptation yield limited improvements, while combining datasets enhances generalization. Semantic analysis reveals that gains arise primarily from lexical rather than semantic processing. Our Llama-2-13B model trained on the combined set achieves 24.7% WER on LRS3 and 47.0% on WildVSR, establishing SOTA among models trained without additional supervision. Our findings indicate LLM decoders refine contextual reasoning rather than visual features, emphasizing the need for stronger visual encoders to drive meaningful progress.

Studying SNR-MC interactions as galactic PeVatrons in the era of CTAO and ASTRI Mini-Array

Authors: Alan Sunny, Martina Cardillo, Antonio Tutone

2025-09-18

http://arxiv.org/abs/2509.14867v1

Supernova remnants (SNRs) are widely recognized as key accelerators of Galactic cosmic rays (CRs), supported by the detection of the characteristic pion bump in the gamma-ray spectra of several SNRs. However, the recent observation of ultra-high-energy (UHE, greater than 100 TeV) gamma-rays by LHAASO from sources such as the W51 region challenges standard models, which predict CR acceleration up to PeV energies only during the early (~ 100 year) phase of SNR evolution. Given the older age of known SNRs, alternative mechanisms - such as the interaction of runaway CRs with nearby molecular clouds (MCs) - have been proposed to explain the persistent UHE emission. In this study, we focus on the W51 complex, particularly the W51C-B region, as a promising site for investigating SNR-MC interactions. Simulated observations with the CTAO and the ASTRI Mini-Array are presented to demonstrate their crucial role in bridging the energy gap between Fermi-LAT and LHAASO, especially in the 0.3-100 TeV range. Their improved angular resolution will also help disentangle emission components from the interaction zone and nearby sources. Our theoretical modelling suggests that accelerated particles at the shock can account for the radio and GeV data, while UHE emission could be best explained by the combined contribution from both acceleration and adiabatic compression of cloud material at the SNR-MC interface.

Hint hierarchical inter-frame correlation for one-shot point cloud sequence compression

Authors: Yuchen Gao, Qi Zhang

2025-09-18

http://arxiv.org/abs/2509.14859v1

Deep learning has demonstrated strong capability in compressing point clouds. Within this area, entropy modeling for lossless compression is widely investigated. However, most methods rely solely on parent or sibling contexts and level-wise autoregression, which suffers from decoding latency on the order of 10 to 100 seconds. We propose HINT, a method that integrates temporal and spatial correlation for sequential point cloud compression. Specifically, it first uses a two-stage temporal feature extraction: (i) a parent-level existence map and (ii) a child-level neighborhood lookup in the previous frame. These cues are fused with the spatial features via elementwise addition and encoded with a group-wise strategy. Experimental results show that HINT achieves encoding and decoding times of 105 ms and 140 ms, respectively, equivalent to 49.6x and 21.6x acceleration in comparison with G-PCC, while achieving a bit rate reduction of up to 43.6% and consistently outperforming the strong spatial-only baseline (RENO).
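The timing figures in this abstract can be cross-checked with a short calculation. The sketch below assumes that each reported speedup is the ratio of G-PCC time to HINT time; the abstract does not state the G-PCC timings directly, so the baseline values computed here are implied, not reported.

```python
# Values reported in the abstract.
hint_encode_ms = 105.0
hint_decode_ms = 140.0
encode_speedup = 49.6
decode_speedup = 21.6

# Implied G-PCC timings, assuming speedup = baseline_time / hint_time.
gpcc_encode_s = hint_encode_ms * encode_speedup / 1000.0
gpcc_decode_s = hint_decode_ms * decode_speedup / 1000.0

print(f"implied G-PCC encode time: {gpcc_encode_s:.3f} s")
print(f"implied G-PCC decode time: {gpcc_decode_s:.3f} s")
```

Under that assumption the implied baseline takes roughly 5.2 s to encode and 3.0 s to decode a frame.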

MELA-TTS Joint transformer-diffusion model with representation alignment for speech synthesis

Authors: Keyu An, Zhiyu Zhang, Changfeng Gao, Yabin Li, Zhendong Peng, Haoxu Wang, Zhihao Du, Han Zhao, Zhifu Gao, Xiangang Li

2025-09-18

http://arxiv.org/abs/2509.14784v1

This work introduces MELA-TTS, a novel joint transformer-diffusion framework for end-to-end text-to-speech synthesis. By autoregressively generating continuous mel-spectrogram frames from linguistic and speaker conditions, our architecture eliminates the need for speech tokenization and multi-stage processing pipelines. To address the inherent difficulties of modeling continuous features, we propose a representation alignment module that aligns output representations of the transformer decoder with semantic embeddings from a pretrained ASR encoder during training. This mechanism not only speeds up training convergence, but also enhances cross-modal coherence between the textual and acoustic domains. Comprehensive experiments demonstrate that MELA-TTS achieves state-of-the-art performance across multiple evaluation metrics while maintaining robust zero-shot voice cloning capabilities, in both offline and streaming synthesis modes. Our results establish a new benchmark for continuous feature generation approaches in TTS, offering a compelling alternative to discrete-token-based paradigms.

LEAP LLM Inference on Scalable PIM-NoC Architecture with Balanced Dataflow and Fine-Grained Parallelism

Authors: Yimin Wang, Yue Jiet Chong, Xuanyao Fong

2025-09-18

http://arxiv.org/abs/2509.14781v1

Large language model (LLM) inference has become a prevalent demand in daily life and industry. The large tensor sizes and computing complexity of LLMs pose challenges for memory, computing, and the databus. This paper proposes a computation/memory/communication co-designed non-von Neumann accelerator, termed LEAP, which aggregates processing-in-memory (PIM) and a computational network-on-chip (NoC). The matrix multiplications in LLMs are assigned to PIM or NoC based on the data dynamicity to maximize data locality. Model partition and mapping are optimized by heuristic design space exploration. Dedicated fine-grained parallelism and tiling techniques enable high-throughput dataflow across the distributed resources in PIM and NoC. The architecture is evaluated on Llama 1B/8B/13B models and shows a 2.55x throughput (tokens/sec) improvement and a 71.94x energy efficiency (tokens/Joule) boost compared to the A100 GPU.

Subjective Evaluation of Low Distortion Coded Light Fields with View Synthesis

Authors: Daniela Saraiva, Joao Prazeres, Manuela Pereira, Antonio M. G. Pinheiro

2025-09-18

http://arxiv.org/abs/2509.14761v1

Light field technology is a powerful imaging method that captures both the intensity and direction of light rays in a scene, enabling the reconstruction of 3D information and supporting a range of unique applications. However, light fields produce vast amounts of data, making efficient compression essential for their practical use. View synthesis plays a key role in light field technology by enabling the generation of new views, yet its interaction with compression has not been fully explored. In this work, a subjective analysis of the effect of view synthesis on light field compression is conducted. To achieve this, a sparsely sampled light field is created by dropping views from an original light field. Both light fields are then encoded using JPEG Pleno and VVC. View synthesis is then applied to the compressed sampled light field to reconstruct the same number of views as the original. The subjective evaluation follows the proposed JPEG AIC-3 test methodology designed to assess the quality of high-fidelity compressed images. This test consists of two test stimuli displayed side-by-side, each alternating between an original and a coded view, creating a flicker effect on both sides. The user must choose which side has the stronger flicker and, therefore, the lower quality. Using these subjective results, a selection of metrics is validated.

Characterization of supersonic boundary layers of adiabatic and isothermal curved surfaces with shock interactions

Authors: Gabriel Y. R. Hamada, William R. Wolf, Hugo F. S. Lui, Carlos Junqueira-Junior

2025-09-18

http://arxiv.org/abs/2509.14756v1

Boundary layers of adiabatic and isothermal curved walls are investigated for a supersonic turbine cascade, including the effects of shock-boundary layer interactions (SBLIs). Wall-resolved large eddy simulations (LES) are performed for a linear cascade of blades with an inlet Mach number of and Reynolds number based on the axial chord . The wall to inlet temperature ratio of the isothermal case is , representing a cooled wall. An assessment of the effects of pressure gradient, thermal boundary conditions and SBLIs is presented in terms of the downstream variation of mean flow quantities such as density, temperature, and momentum profiles. The different thermal boundary conditions affect the density and temperature profiles along the boundary layer, where cooling increases the density of the gas near the wall, and reduces its temperature and viscosity. Both of these effects make the momentum profiles fuller and, hence, the boundary layer of the isothermal case is less prone to separate than that of the adiabatic wall. The mean density profiles are also affected by pressure gradients induced by the convex and concave curvatures of the blade, which lead to expansion and compression of the flow, respectively. The analysis of separate terms from the momentum balance equation explains the behavior of various physical mechanisms in the inner and outer regions of the supersonic boundary layers. The importance of mean flow advection, compressibility, and Reynolds stresses is presented in terms of flow acceleration and deceleration. The impact of the SBLIs on the momentum balance mechanisms is also investigated, showing that a combination of compressions and expansions impacts the boundary layers by redirecting the flow toward the wall due to the shock formations.