2025-12-26

Fast SAM2 with Text-Driven Token Pruning
Parallel Token Prediction for Language Models
An Allele-Centric Pan-Graph-Matrix Representation for Scalable Pangenome Analysis
Surgical Scene Segmentation using a Spike-Driven Video Transformer with Real-Time Potential
ACD Direct Conditional Control for Video Diffusion Models via Attention Supervision
ReaSeq Unleashing World Knowledge via Reasoning for Sequential Modeling
Three-Family Supersymmetric Pati-Salam Flux Models from Rigid D-Branes
GateBreaker Gate-Guided Attacks on Mixture-of-Expert LLMs
Mesh-Attention A New Communication-Efficient Distributed Attention with Improved Data Locality
SACodec Asymmetric Quantization with Semantic Anchoring for Low-Bitrate High-Fidelity Neural Speech Codecs
Quantile Rendering Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting
RevFFN Memory-Efficient Full-Parameter Fine-Tuning of Mixture-of-Experts LLMs with Reversible Blocks
From GNNs to Symbolic Surrogates via Kolmogorov-Arnold Networks for Delay Prediction
NeRV360 Neural Representation for 360-Degree Videos with a Viewport Decoder
Measuring Mechanistic Independence Can Bias Be Removed Without Erasing Demographics?
Real-World Adversarial Attacks on RF-Based Drone Detectors
MoE-DiffuSeq Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
Fail Fast, Win Big Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs
FlashVLM Text-Guided Visual Token Selection for Large Multimodal Models
Viterbi State Selection for Discrete Pinching Antenna Systems
SmartSplat Feature-Smart Gaussians for Scalable Compression of Ultra-High-Resolution Images
Branch Learning in MRI More Data, More Models, More Training
Can LLMs Solve My Grandma's Riddle? Evaluating Multilingual Large Language Models on Reasoning Traditional Bangla Tricky Riddles
Unified Multimodal Brain Decoding via Cross-Subject Soft-ROI Fusion
Memory as Resonance A Biomimetic Architecture for Infinite Context Memory on Ergodic Phonetic Manifolds
Designing Spatial Architectures for Sparse Attention STAR Accelerator via Cross-Stage Tiling
Generative Latent Coding for Ultra-Low Bitrate Image Compression
milliMamba Specular-Aware Human Pose Estimation via Dual mmWave Radar with Multi-Frame Mamba Fusion
HEART-VIT Hessian-Guided Efficient Dynamic Attention and Token Pruning in Vision Transformer
UMAMI Unifying Masked Autoregressive Models and Deterministic Rendering for View Synthesis
Neural Compression of 360-Degree Equirectangular Videos using Quality Parameter Adaptation
Spatio-Temporal Graphs Beyond Grids Benchmark for Maritime Anomaly Detection
Progressive Learned Image Compression for Machine Perception
FastMPS Revisit Data Parallel in Large-scale Matrix Product State Sampling
Scaling Reinforcement Learning for Content Moderation with Large Language Models
VALLR-Pin Dual-Decoding Visual Speech Recognition for Mandarin with Pinyin-Guided LLM Refinement
CoLaS Copula-Seeded Sparse Local Graphs with Tunable Assortativity, Persistent Clustering, and a Degree-Tail Dichotomy
Spatio-Temporal Graph Neural Networks for Dairy Farm Sustainability Forecasting and Counterfactual Policy Analysis
Interpolative Decoding Exploring the Spectrum of Personality Traits in LLMs
Gate-Based Microwave Quantum Repeater Via Grid-State Encoding
Gaussian Variational Inference with Non-Gaussian Factors for State Estimation A UWB Localization Case Study
Power-Scalable Generation of High-Order Optical Vortices Via Coherent Beam Combining
PhysMaster Building an Autonomous AI Physicist for Theoretical and Computational Physics Research
RAPID-LLM Resilience-Aware Performance analysis of Infrastructure for Distributed LLM Training and Inference
Possibilistic Inferential Models for Post-Selection Inference in High-Dimensional Linear Regression
Event Extraction in Large Language Model
Lightweight Intrusion Detection in IoT via SHAP-Guided Feature Pruning and Knowledge-Distilled Kronecker Networks
A Mathematical Framework for Misinformation Propagation in Complex Networks Topology-Dependent Distortion and Control
Sensitivity-Aware Mixed-Precision Quantization for ReRAM-based Computing-in-Memory
D2Pruner Debiased Importance and Structural Diversity for MLLM Token Pruning
Real-Time Streamable Generative Speech Restoration with Flow Matching
From Retrieval to Reasoning A Framework for Cyber Threat Intelligence NER with Explicit and Adaptive Instructions

Fast SAM2 with Text-Driven Token Pruning

Authors: Avilasha Mandal, Chaoning Zhang, Fachrina Dewi Puspitasari, Xudong Wang, Jiaquan Zhang, Caiyan Qin, Guoqing Wang, Yang Yang, Heng Tao Shen

2025-12-24

http://arxiv.org/abs/2512.21333v1

Segment Anything Model 2 (SAM2), a vision foundation model, has significantly advanced prompt-driven video object segmentation, yet its practical deployment remains limited by the high computational and memory cost of processing dense visual tokens across time. SAM2 pipelines typically propagate all visual tokens produced by the image encoder through downstream temporal reasoning modules, regardless of their relevance to the target object, resulting in reduced scalability due to quadratic memory attention overhead. In this work, we introduce a text-guided token pruning framework that improves inference efficiency by selectively reducing token density prior to temporal propagation, without modifying the underlying segmentation architecture. Operating after visual encoding and before memory-based propagation, our method ranks tokens using a lightweight routing mechanism that integrates local visual context, semantic relevance derived from object-centric textual descriptions (either user-provided or automatically generated), and uncertainty cues that help preserve ambiguous or boundary-critical regions. By retaining only the most informative tokens for downstream processing, the proposed approach reduces redundant computation while maintaining segmentation fidelity. Extensive experiments across multiple challenging video segmentation benchmarks demonstrate that post-encoder token pruning provides a practical and effective pathway to efficient, prompt-aware video segmentation, achieving up to 42.50 percent faster inference and 37.41 percent lower GPU memory usage compared to the unpruned baseline SAM2, while preserving competitive J and F performance. These results highlight the potential of early token selection to improve the scalability of transformer-based video segmentation systems for real-time and resource-constrained applications.
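
As a rough illustration of post-encoder token selection, the sketch below scores each token with a weighted sum of three cues (visual context, text relevance, uncertainty) and keeps the top fraction. The function name `select_tokens`, the weights, and the keep ratio are hypothetical; the abstract does not say how the router combines its signals, so this is a minimal sketch under those assumptions rather than the authors' method.

```python
def select_tokens(visual, semantic, uncertainty, keep_ratio=0.5,
                  weights=(1.0, 1.0, 1.0)):
    """Return the sorted indices of the tokens to keep.

    visual, semantic, uncertainty: per-token scores (equal-length lists).
    keep_ratio and weights are illustrative assumptions, not paper values.
    """
    if not (len(visual) == len(semantic) == len(uncertainty)):
        raise ValueError("score lists must have the same length")
    w_v, w_s, w_u = weights
    # Combined importance: a plain weighted sum of the three cues.
    scores = [w_v * v + w_s * s + w_u * u
              for v, s, u in zip(visual, semantic, uncertainty)]
    k = max(1, round(keep_ratio * len(scores)))
    # Rank by descending score, keep the top k, then restore token order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


kept = select_tokens(
    visual=[0.1, 0.9, 0.2, 0.4],
    semantic=[0.0, 0.8, 0.1, 0.7],
    uncertainty=[0.0, 0.1, 0.9, 0.2],
)
print(kept)
```

Only the kept indices would be passed on to the memory-based propagation stage; the pruned tokens never enter the quadratic attention.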

Parallel Token Prediction for Language Models

Authors: Felix Draxler, Justus Will, Farrin Marouf Sofian, Theofanis Karaletsos, Sameer Singh, Stephan Mandt

2025-12-24

http://arxiv.org/abs/2512.21323v1

We propose Parallel Token Prediction (PTP), a universal framework for parallel sequence generation in language models. PTP jointly predicts multiple dependent tokens in a single call by incorporating the sampling procedure into the model. This reduces the latency bottleneck of autoregressive decoding, and avoids the restrictive independence assumptions common in existing multi-token prediction methods. We prove that PTP can represent arbitrary autoregressive sequence distributions. PTP is trained either by distilling an existing model or through inverse autoregressive training without a teacher. Experimentally, we achieve state-of-the-art speculative decoding performance on Vicuna-7B by accepting over four tokens per step on Spec-Bench. The universality of our framework indicates that parallel generation of long sequences is feasible without loss of modeling power.

An Allele-Centric Pan-Graph-Matrix Representation for Scalable Pangenome Analysis

Authors: Roberto Garrone

2025-12-24

http://arxiv.org/abs/2512.21320v1

Population-scale pangenome analysis increasingly requires representations that unify single-nucleotide and structural variation while remaining scalable across large cohorts. Existing formats are typically sequence-centric, path-centric, or sample-centric, and often obscure population structure or fail to exploit carrier sparsity. We introduce the H1 pan-graph-matrix, an allele-centric representation that encodes exact haplotype membership using adaptive per-allele compression. By treating alleles as first-class objects and selecting optimal encodings based on carrier distribution, H1 achieves near-optimal storage across both common and rare variants. We further introduce H2, a path-centric dual representation derived from the same underlying allele-haplotype incidence information that restores explicit haplotype ordering while remaining exactly equivalent in information content. Using real human genome data, we show that this representation yields substantial compression gains, particularly for structural variants, while remaining equivalent in information content to pangenome graphs. H1 provides a unified, population-aware foundation for scalable pangenome analysis and downstream applications such as rare-variant interpretation and drug discovery.

Surgical Scene Segmentation using a Spike-Driven Video Transformer with Real-Time Potential

Authors: Shihao Zou, Jingjing Li, Wei Ji, Jincai Huang, Kai Wang, Guo Dan, Weixin Si, Yi Pan

2025-12-24

http://arxiv.org/abs/2512.21284v1

Modern surgical systems increasingly rely on intelligent scene understanding to provide timely situational awareness for enhanced intra-operative safety. Within this pipeline, surgical scene segmentation plays a central role in accurately perceiving operative events. Although recent deep learning models, particularly large-scale foundation models, achieve remarkable segmentation accuracy, their substantial computational demands and power consumption hinder real-time deployment in resource-constrained surgical environments. To address this limitation, we explore emerging spiking neural networks (SNNs) as a promising paradigm for highly efficient surgical intelligence. However, their performance is still constrained by the scarcity of labeled surgical data and the inherently sparse nature of surgical video representations. To this end, we propose SpikeSurgSeg, the first spike-driven video Transformer framework tailored for surgical scene segmentation with real-time potential on non-GPU platforms. To address the limited availability of surgical annotations, we introduce a surgical-scene masked autoencoding pretraining strategy for SNNs that enables robust spatiotemporal representation learning via layer-wise tube masking. Building on this pretrained backbone, we further adopt a lightweight spike-driven segmentation head that produces temporally consistent predictions while preserving the low-latency characteristics of SNNs. Extensive experiments on EndoVis18 and our in-house SurgBleed dataset demonstrate that SpikeSurgSeg achieves mIoU comparable to SOTA ANN-based models while reducing inference latency by at least $8\times$. Notably, it delivers over $20\times$ acceleration relative to most foundation-model baselines, underscoring its potential for time-critical surgical scene segmentation.

ACD Direct Conditional Control for Video Diffusion Models via Attention Supervision

Authors: Weiqi Li, Zehao Zhang, Liang Lin, Guangrun Wang

2025-12-24

http://arxiv.org/abs/2512.21268v1

Controllability is a fundamental requirement in video synthesis, where accurate alignment with conditioning signals is essential. Existing classifier-free guidance methods typically achieve conditioning indirectly by modeling the joint distribution of data and conditions, which often results in limited controllability over the specified conditions. Classifier-based guidance enforces conditions through an external classifier, but the model may exploit this mechanism to raise the classifier score without genuinely satisfying the intended condition, resulting in adversarial artifacts and limited effective controllability. In this paper, we propose Attention-Conditional Diffusion (ACD), a novel framework for direct conditional control in video diffusion models via attention supervision. By aligning the model's attention maps with external control signals, ACD achieves better controllability. To support this, we introduce a 3D-aware object layout as an efficient conditioning signal, along with a dedicated Layout ControlNet and an automated annotation pipeline for scalable layout integration. Extensive experiments on benchmark video generation datasets demonstrate that ACD delivers superior alignment with conditioning inputs while preserving temporal coherence and visual fidelity, establishing an effective paradigm for conditional video synthesis.

ReaSeq Unleashing World Knowledge via Reasoning for Sequential Modeling

Authors: Chuan Wang, Gaoming Yang, Han Wu, Jiakai Tang, Jiahao Yu, Jian Wu, Jianwu Hu, Junjun Zheng, Shuwen Xiao, Yeqiu Yang, Yuning Jiang, Ahjol Nurlanbek, Binbin Cao, Bo Zheng, Fangmei Zhu, Gaoming Zhou, Huimin Yi, Huiping Chu, Jin Huang, Jinzhe Shan, Kenan Cui, Longbin Li, Silu Zhou, Wen Chen, Xia Ming, Xiang Gao, Xin Yao, Xingyu Wen, Yan Zhang, Yiwen Hu, Yulin Wang, Ziheng Bao, Zongyuan Wu

2025-12-24

http://arxiv.org/abs/2512.21257v1

Industrial recommender systems face two fundamental limitations under the log-driven paradigm: (1) knowledge poverty in ID-based item representations that causes brittle interest modeling under data sparsity, and (2) systemic blindness to beyond-log user interests that constrains model performance within platform boundaries. These limitations stem from an over-reliance on shallow interaction statistics and close-looped feedback while neglecting the rich world knowledge about product semantics and cross-domain behavioral patterns that Large Language Models have learned from vast corpora. To address these challenges, we introduce ReaSeq, a reasoning-enhanced framework that leverages world knowledge in Large Language Models to address both limitations through explicit and implicit reasoning. Specifically, ReaSeq employs explicit Chain-of-Thought reasoning via multi-agent collaboration to distill structured product knowledge into semantically enriched item representations, and latent reasoning via Diffusion Large Language Models to infer plausible beyond-log behaviors. Deployed on Taobao's ranking system serving hundreds of millions of users, ReaSeq achieves substantial gains: >6.0% in IPV and CTR, >2.9% in Orders, and >2.5% in GMV, validating the effectiveness of world-knowledge-enhanced reasoning over purely log-driven approaches.

Three-Family Supersymmetric Pati-Salam Flux Models from Rigid D-Branes

Authors: Adeel Mansha, Mudassar Sabir, Tianjun Li, Luyang Wang

2025-12-24

http://arxiv.org/abs/2512.21141v1

Intersecting D-brane model building often suffers from unstabilized open-string moduli, leading to unwanted massless adjoint scalars. In our previous work arXiv:2505.03664, this issue was resolved by employing rigid D6-branes on the $\mathbb{T}^6/(\mathbb{Z}_2 \times \mathbb{Z}_2^\prime)$ orientifold with discrete torsion, where fractional cycles eliminate all adjoint scalars. In this paper, we construct new three-family flux models in the Type IIB setup on $\mathbb{T}^6/(\mathbb{Z}_2 \times \mathbb{Z}_2)$, T-dual to the Type IIA rigid D6-brane construction with discrete torsion, by introducing the quantized background $G_3$ flux that stabilizes the closed-string complex structure moduli and axio-dilaton. The resulting Pati-Salam gauge symmetry can be spontaneously broken down to the Standard Model via a supersymmetry-preserving Higgs mechanism. All the consistency conditions, including $\mathcal{N}=1$ supersymmetry, RR tadpole cancellation, and K-theory constraints, are satisfied. We present the complete particle spectra for these models and discuss how exotic states dynamically decouple through strong dynamics in the hidden sector.

GateBreaker Gate-Guided Attacks on Mixture-of-Expert LLMs

Authors: Lichao Wu, Sasha Behrouzi, Mohamadreza Rostami, Stjepan Picek, Ahmad-Reza Sadeghi

2025-12-24

http://arxiv.org/abs/2512.21008v1

Mixture-of-Experts (MoE) architectures have advanced the scaling of Large Language Models (LLMs) by activating only a subset of parameters per input, enabling state-of-the-art performance with reduced computational cost. As these models are increasingly deployed in critical domains, understanding and strengthening their alignment mechanisms is essential to prevent harmful outputs. However, existing safety research has focused almost exclusively on dense architectures, leaving the unique safety properties of MoEs largely unexamined. The modular, sparsely-activated design of MoEs suggests that safety mechanisms may operate differently than in dense models, raising questions about their robustness. In this paper, we present GateBreaker, the first training-free, lightweight, and architecture-agnostic attack framework that compromises the safety alignment of modern MoE LLMs at inference time. GateBreaker operates in three stages: (i) gate-level profiling, which identifies safety experts disproportionately routed on harmful inputs, (ii) expert-level localization, which localizes the safety structure within safety experts, and (iii) targeted safety removal, which disables the identified safety structure to compromise the safety alignment. Our study shows that MoE safety concentrates within a small subset of neurons coordinated by sparse routing. Selective disabling of these neurons, approximately 3% of neurons in the targeted expert layers, significantly increases the averaged attack success rate (ASR) from 7.4% to 64.9% against the eight latest aligned MoE LLMs with limited utility degradation. These safety neurons transfer across models within the same family, raising ASR from 17.9% to 67.7% with one-shot transfer attack. Furthermore, GateBreaker generalizes to five MoE vision language models (VLMs) with 60.9% ASR on unsafe image inputs.

Mesh-Attention A New Communication-Efficient Distributed Attention with Improved Data Locality

Authors: Sirui Chen, Jingji Chen, Siqi Zhu, Ziheng Jiang, Yanghua Peng, Xuehai Qian

2025-12-24

http://arxiv.org/abs/2512.20968v1

Distributed attention is a fundamental problem for scaling the context window of Large Language Models (LLMs). The state-of-the-art method, Ring-Attention, suffers from scalability limitations due to its excessive communication traffic. This paper proposes a new distributed attention algorithm, Mesh-Attention, by rethinking the design space of distributed attention with a new matrix-based model. Our method assigns a two-dimensional tile -- rather than one-dimensional row or column -- of computation blocks to each GPU to achieve higher efficiency through a lower communication-computation (CommCom) ratio. The general approach covers Ring-Attention as a special case, and allows the tuning of the CommCom ratio with different tile shapes. Importantly, we propose a greedy algorithm that can efficiently search the scheduling space within the tile with restrictions that ensure efficient communication among GPUs. The theoretical analysis shows that Mesh-Attention leads to a much lower communication complexity and exhibits good scalability compared to other current algorithms. Our extensive experiment results show that Mesh-Attention can achieve up to 3.4x speedup (2.9x on average) and reduce the communication volume by up to 85.4% (79.0% on average) on 256 GPUs. Our scalability results further demonstrate that Mesh-Attention sustains superior performance as the system scales, substantially reducing communication overhead in large-scale deployments. The results convincingly confirm the advantage of Mesh-Attention.
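
To make the tile-shape argument concrete, the sketch below uses a deliberately simplified counting model: a GPU that owns a tile of `rows` by `cols` computation blocks must hold `rows` query blocks and `cols` key/value blocks, and performs `rows * cols` block computations. This model, the equal-size blocks, and the function names are assumptions for illustration; the paper's matrix-based model and scheduling restrictions are not reproduced here.

```python
def commcom_ratio(tile_rows, tile_cols):
    """Input blocks needed (Q rows + KV columns) per block of attention computed."""
    if tile_rows <= 0 or tile_cols <= 0:
        raise ValueError("tile dimensions must be positive")
    return (tile_rows + tile_cols) / (tile_rows * tile_cols)


def tile_shapes(blocks_per_gpu):
    """All (rows, cols) tiles that cover exactly blocks_per_gpu computation blocks."""
    return [(r, blocks_per_gpu // r)
            for r in range(1, blocks_per_gpu + 1)
            if blocks_per_gpu % r == 0]


shapes = tile_shapes(16)
best = min(shapes, key=lambda s: commcom_ratio(*s))

ring_like = commcom_ratio(1, 16)  # one-dimensional row of blocks
square = commcom_ratio(4, 4)      # two-dimensional tile, same amount of compute

print(shapes)
print(best, ring_like, square)
```

Under this toy model, a one-dimensional row of 16 blocks needs 17 input blocks for 16 computations, while a 4 by 4 tile needs only 8, which is the intuition behind preferring two-dimensional tiles.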

SACodec Asymmetric Quantization with Semantic Anchoring for Low-Bitrate High-Fidelity Neural Speech Codecs

Authors: Zhongren Dong, Bin Wang, Jing Han, Haotian Guo, Xiaojun Mo, Yimin Cao, Zixing Zhang

2025-12-24

http://arxiv.org/abs/2512.20944v1

Neural Speech Codecs face a fundamental trade-off at low bitrates: preserving acoustic fidelity often compromises semantic richness. To address this, we introduce SACodec, a novel codec built upon an asymmetric dual-quantizer that employs our proposed Semantic Anchoring mechanism. This design strategically decouples the quantization of Semantic and Acoustic details. The semantic anchoring is achieved via a lightweight projector that aligns acoustic features with a frozen, large-scale mHuBERT codebook, injecting linguistic priors while guaranteeing full codebook utilization. Sequentially, for acoustic details, a residual activation module with SimVQ enables a single-layer quantizer (acoustic path) to faithfully recover fine-grained information. At just 1.5 kbps, SACodec establishes a new state of the art by excelling in both fidelity and semantics: subjective listening tests confirm that its reconstruction quality is perceptually highly comparable to ground-truth audio, while its tokens demonstrate substantially improved semantic richness in downstream tasks.

Quantile Rendering Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting

Authors: Yoonwoo Jeong, Cheng Sun, Frank Wang, Minsu Cho, Jaesung Choe

2025-12-24

http://arxiv.org/abs/2512.20927v1

Recent advancements in computer vision have successfully extended Open-vocabulary segmentation (OVS) to the 3D domain by leveraging 3D Gaussian Splatting (3D-GS). Despite this progress, efficiently rendering the high-dimensional features required for open-vocabulary queries poses a significant challenge. Existing methods employ codebooks or feature compression, causing information loss and thereby degrading segmentation quality. To address this limitation, we introduce Quantile Rendering (Q-Render), a novel rendering strategy for 3D Gaussians that efficiently handles high-dimensional features while maintaining high fidelity. Unlike conventional volume rendering, which densely samples all 3D Gaussians intersecting each ray, Q-Render sparsely samples only those with dominant influence along the ray. By integrating Q-Render into a generalizable 3D neural network, we also propose Gaussian Splatting Network (GS-Net), which predicts Gaussian features in a generalizable manner. Extensive experiments on ScanNet and LeRF demonstrate that our framework outperforms state-of-the-art methods, while enabling real-time rendering with an approximately 43.7x speedup on 512-D feature maps. Code will be made publicly available.

RevFFN Memory-Efficient Full-Parameter Fine-Tuning of Mixture-of-Experts LLMs with Reversible Blocks

Authors: Ningyuan Liu, Jing Yang, Kaitong Cai, Keze Wang

2025-12-24

http://arxiv.org/abs/2512.20920v1

Full-parameter fine-tuning is a key technique for adapting large language models (LLMs) to downstream tasks, but it incurs substantial memory overhead due to the need to store extensive intermediate activations for backpropagation. This bottleneck makes full fine-tuning of contemporary large-scale LLMs challenging in practice. Existing distributed training frameworks such as DeepSpeed alleviate this issue using techniques like ZeRO and FSDP, which rely on multi-GPU memory or CPU offloading, but often require additional hardware resources and reduce training speed. We introduce RevFFN, a memory-efficient fine-tuning paradigm for mixture-of-experts (MoE) LLMs. RevFFN employs carefully designed reversible Transformer blocks that allow reconstruction of layer input activations from outputs during backpropagation, eliminating the need to store most intermediate activations in memory. While preserving the expressive capacity of MoE architectures, this approach significantly reduces peak memory consumption for full-parameter fine-tuning. As a result, RevFFN enables efficient full fine-tuning on a single consumer-grade or server-grade GPU.
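
The idea of reconstructing a layer's inputs from its outputs can be illustrated with the additive coupling used in reversible residual networks. The abstract does not describe RevFFN's block design, so the coupling below, and the toy functions `f` and `g` standing in for sub-layers, are assumptions chosen only to show why activations need not be stored.

```python
import math


def rev_forward(x1, x2, f, g):
    """Additive coupling: the input is split into two halves, x1 and x2."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2


def rev_inverse(y1, y2, f, g):
    """Recover (x1, x2) from (y1, y2) without having stored them."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2


# Toy stand-ins for sub-layers (for example attention and an expert FFN).
f = lambda v: 0.5 * v + 1.0
g = lambda v: v * v

x1, x2 = 1.5, -2.0
y1, y2 = rev_forward(x1, x2, f, g)
r1, r2 = rev_inverse(y1, y2, f, g)
print((y1, y2), (r1, r2))
```

Because the inverse needs only the outputs and a recomputation of `f` and `g`, a backward pass can rebuild each block's inputs on the fly, trading extra compute for lower peak memory.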

From GNNs to Symbolic Surrogates via Kolmogorov-Arnold Networks for Delay Prediction

Authors: Sami Marouani, Kamal Singh, Baptiste Jeudy, Amaury Habrard

2025-12-24

http://arxiv.org/abs/2512.20885v1

Accurate prediction of flow delay is essential for optimizing and managing modern networks. We investigate three levels of modeling for this task. First, we implement a heterogeneous GNN with attention-based message passing, establishing a strong neural baseline. Second, we propose FlowKANet in which Kolmogorov-Arnold Networks replace standard MLP layers, reducing trainable parameters while maintaining competitive predictive performance. FlowKANet integrates KAMP-Attn (Kolmogorov-Arnold Message Passing with Attention), embedding KAN operators directly into message-passing and attention computation. Finally, we distill the model into symbolic surrogate models using block-wise regression, producing closed-form equations that eliminate trainable weights while preserving graph-structured dependencies. The results show that KAN layers provide a favorable trade-off between efficiency and accuracy and that symbolic surrogates emphasize the potential for lightweight deployment and enhanced transparency.

NeRV360 Neural Representation for 360-Degree Videos with a Viewport Decoder

Authors: Daichi Arai, Kyohei Unno, Yasuko Sugito, Yuichi Kusakabe

2025-12-24

http://arxiv.org/abs/2512.20871v1

Implicit neural representations for videos (NeRV) have shown strong potential for video compression. However, applying NeRV to high-resolution 360-degree videos causes high memory usage and slow decoding, making real-time applications impractical. We propose NeRV360, an end-to-end framework that decodes only the user-selected viewport instead of reconstructing the entire panoramic frame. Unlike conventional pipelines, NeRV360 integrates viewport extraction into decoding and introduces a spatial-temporal affine transform module for conditional decoding based on viewpoint and time. Experiments on 6K-resolution videos show that NeRV360 achieves a 7-fold reduction in memory consumption and a 2.5-fold increase in decoding speed compared to HNeRV, a representative prior work, while delivering better image quality in terms of objective metrics.

Measuring Mechanistic Independence Can Bias Be Removed Without Erasing Demographics?

Authors: Zhengyang Shan, Aaron Mueller

2025-12-23

http://arxiv.org/abs/2512.20796v1

We investigate how independent demographic bias mechanisms are from general demographic recognition in language models. Using a multi-task evaluation setup where demographics are associated with names, professions, and education levels, we measure whether models can be debiased while preserving demographic detection capabilities. We compare attribution-based and correlation-based methods for locating bias features. We find that targeted sparse autoencoder feature ablations in Gemma-2-9B reduce bias without degrading recognition performance: attribution-based ablations mitigate race and gender profession stereotypes while preserving name recognition accuracy, whereas correlation-based ablations are more effective for education bias. Qualitative analysis further reveals that removing attribution features in education tasks induces "prior collapse", thus increasing overall bias. This highlights the need for dimension-specific interventions. Overall, our results show that demographic bias arises from task-specific mechanisms rather than absolute demographic markers, and that mechanistic inference-time interventions can enable surgical debiasing without compromising core model capabilities.

Real-World Adversarial Attacks on RF-Based Drone Detectors

Authors: Omer Gazit, Yael Itzhakev, Yuval Elovici, Asaf Shabtai

2025-12-23

http://arxiv.org/abs/2512.20712v1

Radio frequency (RF) based systems are increasingly used to detect drones by analyzing their RF signal patterns, converting them into spectrogram images which are processed by object detection models. Existing RF attacks against image-based models alter digital features, making over-the-air (OTA) implementation difficult due to the challenge of converting digital perturbations to transmittable waveforms that may introduce synchronization errors and interference, and encounter hardware limitations. We present the first physical attack on RF image-based drone detectors, optimizing class-specific universal complex baseband (I/Q) perturbation waveforms that are transmitted alongside legitimate communications. We evaluated the attack using RF recordings and OTA experiments with four types of drones. Our results show that modest, structured I/Q perturbations are compatible with standard RF chains and reliably reduce target drone detection while preserving detection of legitimate drones.

MoE-DiffuSeq Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts

Authors: Alexandros Christoforos, Chadbourne Davis

2025-12-23

http://arxiv.org/abs/2512.20604v1

We present MoE-DiffuSeq, a mixture-of-experts-based framework for enhancing diffusion models in long-document generation. Existing diffusion-based text generation models, such as DiffuSeq, suffer from high computational cost and memory overhead when applied to extended sequences. To address these challenges, MoE-DiffuSeq integrates sparse attention with a mixture-of-experts architecture, enabling efficient and scalable long-sequence modeling. Our approach introduces a customized sparse attention mechanism designed to reduce computational complexity while preserving text quality and coherence. In addition, we incorporate a soft absorbing state within the diffusion process to accelerate sequence reconstruction and improve generation precision. Extensive experiments demonstrate that MoE-DiffuSeq significantly improves training efficiency and sampling speed compared to existing diffusion models. These advantages are particularly effective for long-document scenarios, including scientific article generation, code repository modeling, and long-form dialogue generation. Benchmark results further show that MoE-DiffuSeq improves efficiency, speed, accuracy, and expressiveness, advancing the practical applicability of diffusion models for high-quality long-form text generation.

Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits

Authors: Amirhosein Ghasemabadi, Di Niu

2025-12-23

http://arxiv.org/abs/2512.20578v1

Large language models (LLMs) generate fluent and complex outputs but often fail to recognize their own mistakes and hallucinations. Existing approaches typically rely on external judges, multi-sample consistency, or text-based self-critique, which incur additional compute or correlate weakly with true correctness. We ask: can LLMs predict their own failures by inspecting internal states during inference? We introduce Gnosis, a lightweight self-awareness mechanism that enables frozen LLMs to perform intrinsic self-verification by decoding signals from hidden states and attention patterns. Gnosis passively observes internal traces, compresses them into fixed-budget descriptors, and predicts correctness with negligible inference cost, adding only ~5M parameters and operating independently of sequence length. Across math reasoning, open-domain question answering, and academic knowledge benchmarks, and over frozen backbones ranging from 1.7B to 20B parameters, Gnosis consistently outperforms strong internal baselines and large external judges in both accuracy and calibration. Moreover, it generalizes zero-shot to partial generations, enabling early detection of failing trajectories and compute-aware control. These results show that reliable correctness cues are intrinsic to the generation process and can be extracted efficiently without external supervision.

Fail Fast, Win Big Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs

Authors: Rui Pan, Zhuofu Chen, Ravi Netravali

2025-12-23

http://arxiv.org/abs/2512.20573v1

Diffusion Large Language Models (dLLMs) offer fast, parallel token generation, but their standalone use is plagued by an inherent efficiency-quality tradeoff. We show that, if carefully applied, the attributes of dLLMs can actually be a strength for drafters in speculative decoding with autoregressive (AR) verifiers. Our core insight is that a dLLM's speed from parallel decoding drastically lowers the risk of costly rejections, providing a practical mechanism to effectively realize the (elusive) lengthy drafts that lead to large speedups with speculative decoding. We present FailFast, a dLLM-based speculative decoding framework that realizes this approach by dynamically adapting its speculation length. It "fails fast" by spending minimal compute in hard-to-speculate regions to shrink speculation latency and "wins big" by aggressively extending draft lengths in easier regions to reduce verification latency (in many cases, speculating and accepting 70 tokens at a time!). Without any fine-tuning, FailFast delivers lossless acceleration of AR LLMs and achieves up to $4.9\times$ speedup over vanilla decoding, $1.7\times$ over the best naive dLLM drafter, and $1.4\times$ over EAGLE-3 across diverse models and workloads. We open-source FailFast at https://github.com/ruipeterpan/failfast.
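
The interplay between draft length and lossless verification can be sketched with greedy speculative decoding. Everything below is a toy under stated assumptions: `target_next` and `drafter` are stand-in callables rather than real models, verification is written as a sequential loop (real systems verify a draft in one parallel pass), and the doubling rule in `next_draft_len` is a hypothetical policy, not FailFast's actual one.

```python
def verify(draft, target_next, prefix):
    """Accept the longest draft prefix the target agrees with, plus one target token."""
    out = list(prefix)
    accepted = 0
    for tok in draft:
        expected = target_next(out)
        if tok != expected:
            out.append(expected)  # replace the first wrong token and stop
            return out, accepted
        out.append(tok)
        accepted += 1
    out.append(target_next(out))  # whole draft accepted: bonus target token
    return out, accepted


def next_draft_len(current, accepted, lo=2, hi=64):
    """Hypothetical policy: grow after full acceptance, shrink after a rejection."""
    return min(hi, current * 2) if accepted == current else lo


def target_next(seq):
    """Toy deterministic 'target model'."""
    return (len(seq) * 3) % 7


def drafter(seq, k):
    """Toy drafter that is wrong at every position with index 4 mod 5."""
    cur, out = list(seq), []
    for _ in range(k):
        tok = target_next(cur)
        if len(cur) % 5 == 4:
            tok = (tok + 1) % 7
        out.append(tok)
        cur.append(tok)
    return out


def generate(n_tokens, k=2):
    seq = []
    while len(seq) < n_tokens:
        draft = drafter(seq, k)
        seq, accepted = verify(draft, target_next, seq)
        k = next_draft_len(len(draft), accepted)
    return seq[:n_tokens]


print(generate(20))
```

Whatever the drafter proposes, the output equals what the target alone would produce, which is what makes the speedup lossless; the draft-length policy only changes how many target calls are needed.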

FlashVLM Text-Guided Visual Token Selection for Large Multimodal Models

Authors: Kaitong Cai, Jusheng Zhang, Jing Yang, Yijia Fan, Pengtao Xie, Jian Wang, Keze Wang

2025-12-23

http://arxiv.org/abs/2512.20561v1

Large vision-language models (VLMs) typically process hundreds or thousands of visual tokens per image or video frame, incurring quadratic attention cost and substantial redundancy. Existing token reduction methods often ignore the textual query or rely on deep attention maps, whose instability under aggressive pruning leads to degraded semantic alignment. We propose FlashVLM, a text-guided visual token selection framework that dynamically adapts visual inputs to the query. Instead of relying on noisy attention weights, FlashVLM computes an explicit cross-modal similarity between projected image tokens and normalized text embeddings in the language model space. This extrinsic relevance is fused with intrinsic visual saliency using log-domain weighting and temperature-controlled sharpening. In addition, a diversity-preserving partition retains a minimal yet representative set of background tokens to maintain global context. Under identical token budgets and evaluation protocols, FlashVLM achieves beyond-lossless compression, slightly surpassing the unpruned baseline while pruning up to 77.8 percent of visual tokens on LLaVA 1.5, and maintaining 92.8 percent accuracy even under 94.4 percent compression. Extensive experiments on 14 image and video benchmarks demonstrate that FlashVLM delivers state-of-the-art efficiency-performance trade-offs while maintaining strong robustness and generalization across mainstream VLMs.
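The scoring step described above (text-to-token similarity, log-domain fusion with saliency, temperature sharpening, then a top-k budget) can be sketched as follows. This is an illustrative approximation, not FlashVLM's implementation: the fusion weight, temperature, clamping of negative similarities, and the function names are assumptions, and the background-token partition step is omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def select_tokens(image_tokens, text_embedding, saliency, budget,
                  weight=0.7, temperature=0.5):
    """Return (kept token indices, sharpened scores).

    weight and temperature are hypothetical hyperparameters.
    """
    eps = 1e-6
    logits = []
    for token, sal in zip(image_tokens, saliency):
        # Extrinsic relevance: similarity to the text query, clamped at 0.
        relevance = max(cosine(token, text_embedding), 0.0)
        # Log-domain weighted fusion (a weighted geometric mean).
        fused = weight * math.log(relevance + eps) \
            + (1.0 - weight) * math.log(sal + eps)
        # Temperature < 1 sharpens the distribution.
        logits.append(fused / temperature)
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    scores = [e / total for e in exps]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget]), scores


# Four 2-D "projected image tokens" and one text embedding (toy values).
image_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
text_embedding = [1.0, 0.0]
saliency = [0.5, 0.9, 0.5, 0.9]
kept, scores = select_tokens(image_tokens, text_embedding, saliency, budget=2)
```

In this toy input the two highly salient tokens are irrelevant to the query, so fusing in the log domain pushes their scores toward zero and the query-aligned tokens are kept instead.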

Viterbi State Selection for Discrete Pinching Antenna Systems

Authors: Victoria E. Galanopoulou, Thrassos K. Oikonomou, Odysseas G. Karagiannidis, Sotiris A. Tegos, Panagiotis D. Diamantoulakis

2025-12-23

http://arxiv.org/abs/2512.20389v1

Pinching antennas enable dynamic control of electromagnetic wave propagation through reconfigurable radiating structures, but selecting an optimal subset of antennas remains a combinatorial problem with exponential complexity. This letter considers antenna subset selection for a waveguide-fed pinching antenna array serving ground users under a time-division access scheme. The achievable rate depends on the coherent superposition of the effective complex channel gains and is therefore highly sensitive to the relative phase alignment of the activated antennas. To address the prohibitive complexity of exhaustive search, we propose a Viterbi state selection (VSS) algorithm that exploits the phase structure of the combined received signal. The trellis state is defined by a quantized representation of the phase of the accumulated complex gain, and a Viterbi-based survivor rule is used to prune dominated antenna subsets across stages. Numerical results demonstrate that the proposed method achieves the same antenna selection and rate as exhaustive search, while reducing the computational complexity from exponential to polynomial in the number of available antennas.
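A rough sketch of the trellis idea follows: each stage decides whether to activate one antenna, the state is the quantized phase of the accumulated gain, and only the largest-magnitude survivor per state is kept. The objective used here (magnitude of the summed gains, with no power normalization across active antennas), the number of phase bins, and all function names are simplifying assumptions; the letter's exact metric and survivor rule may differ, and this toy makes no claim to match exhaustive search in general.

```python
import cmath
import math
from itertools import combinations

def phase_bin(z, n_bins):
    """Quantize the phase of z into one of n_bins equal sectors."""
    angle = cmath.phase(z) % (2 * math.pi)
    return int(angle / (2 * math.pi) * n_bins) % n_bins

def viterbi_state_selection(gains, n_bins=16):
    """Return (accumulated gain, chosen antenna indices) of the best survivor."""
    survivors = {0: (0j, ())}          # state -> (gain so far, subset so far)
    for idx, g in enumerate(gains):
        candidates = {}
        for acc, chosen in survivors.values():
            # Two branches per survivor: skip or activate antenna idx.
            for new_acc, new_chosen in ((acc, chosen),
                                        (acc + g, chosen + (idx,))):
                state = phase_bin(new_acc, n_bins)
                best = candidates.get(state)
                if best is None or abs(new_acc) > abs(best[0]):
                    candidates[state] = (new_acc, new_chosen)
        survivors = candidates
    return max(survivors.values(), key=lambda s: abs(s[0]))

def exhaustive_search(gains):
    """Reference: try every non-empty subset (exponential cost)."""
    best = (0j, ())
    for r in range(1, len(gains) + 1):
        for subset in combinations(range(len(gains)), r):
            acc = sum(gains[i] for i in subset)
            if abs(acc) > abs(best[0]):
                best = (acc, subset)
    return best


# Toy unit-magnitude channel gains with spread-out phases.
gains = [cmath.rect(1.0, 0.3 * i * i) for i in range(8)]
vss_gain, vss_subset = viterbi_state_selection(gains)
ref_gain, ref_subset = exhaustive_search(gains)
```

With a fixed number of phase bins, the trellis keeps at most `n_bins` survivors per stage, so the work grows linearly in the number of antennas instead of doubling with each one.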

SmartSplat Feature-Smart Gaussians for Scalable Compression of Ultra-High-Resolution Images

Authors: Linfei Li, Lin Zhang, Zhong Wang, Ying Shen

2025-12-23

http://arxiv.org/abs/2512.20377v1

Recent advances in generative AI have accelerated the production of ultra-high-resolution visual content, posing significant challenges for efficient compression and real-time decoding on end-user devices. Inspired by 3D Gaussian Splatting, recent 2D Gaussian image models improve representation efficiency, yet existing methods struggle to balance compression ratio and reconstruction fidelity in ultra-high-resolution scenarios. To address this issue, we propose SmartSplat, a highly adaptive and feature-aware GS-based image compression framework that supports arbitrary image resolutions and compression ratios. SmartSplat leverages image-aware features such as gradients and color variances, introducing a Gradient-Color Guided Variational Sampling strategy together with an Exclusion-based Uniform Sampling scheme to improve the non-overlapping coverage of Gaussian primitives in pixel space. In addition, we propose a Scale-Adaptive Gaussian Color Sampling method to enhance color initialization across scales. Through joint optimization of spatial layout, scale, and color initialization, SmartSplat efficiently captures both local structures and global textures using a limited number of Gaussians, achieving high reconstruction quality under strong compression. Extensive experiments on DIV8K and a newly constructed 16K dataset demonstrate that SmartSplat consistently outperforms state-of-the-art methods at comparable compression ratios and exceeds their compression limits, showing strong scalability and practical applicability. The code is publicly available at https://github.com/lif314/SmartSplat.

Branch Learning in MRI More Data, More Models, More Training

Authors: Yuyang Li, Yipin Deng, Zijian Zhou, Peng Hu

2025-12-23

http://arxiv.org/abs/2512.20330v1

We investigated two complementary strategies for multicontrast cardiac MR reconstruction: physics-consistent data-space augmentation (DualSpaceCMR) and parameter-efficient capacity scaling via VQPrompt and Moero. DualSpaceCMR couples image-level transforms with k-space noise and motion simulations while preserving forward-model consistency. VQPrompt adds a lightweight bottleneck prompt; Moero embeds a mixture of experts within a deep unrolled network with histogram-based routing. In the multivendor, multisite CMRxRecon25 benchmark, we evaluate few-shot and out-of-distribution generalization. On small datasets, k-space motion-plus-noise improves reconstruction; on the large benchmark it degrades performance, revealing sensitivity to augmentation ratio and schedule. VQPrompt produces modest and consistent gains with negligible memory overhead. Moero continues to improve after early plateaus and maintains baseline-like few-shot and out-of-distribution behavior despite mild overfitting, but routing lowers PyTorch throughput and makes wall-clock time the main bottleneck. These results motivate scale-aware augmentation and suggest prompt-based capacity scaling as a practical path, while efficiency improvements are crucial for expert models.

Can LLMs Solve My Grandma's Riddle? Evaluating Multilingual Large Language Models on Reasoning Traditional Bangla Tricky Riddles

Authors: Nurul Labib Sayeedi, Md. Faiyaz Abdullah Sayeedi, Khushnur Binte Jahangir, Swakkhar Shatabda, Sarah Masud Preum

2025-12-23

http://arxiv.org/abs/2512.20324v1

Large Language Models (LLMs) show impressive performance on many NLP benchmarks, yet their ability to reason in figurative, culturally grounded, and low-resource settings remains underexplored. We address this gap for Bangla by introducing BanglaRiddleEval, a benchmark of 1,244 traditional Bangla riddles instantiated across four tasks (4,976 riddle-task artifacts in total). Using an LLM-based pipeline, we generate Chain-of-Thought explanations, semantically coherent distractors, and fine-grained ambiguity annotations, and evaluate a diverse suite of open-source and closed-source models under different prompting strategies. Models achieve moderate semantic overlap on generative QA but low correctness, MCQ accuracy peaks at only about 56% versus an 83% human baseline, and ambiguity resolution ranges from roughly 26% to 68%, with high-quality explanations confined to the strongest models. These results show that current LLMs capture some cues needed for Bangla riddle reasoning but remain far from human-level performance, establishing BanglaRiddleEval as a challenging new benchmark for low-resource figurative reasoning. All data, code, and evaluation scripts are available on GitHub: https://github.com/Labib1610/BanglaRiddleEval.

Unified Multimodal Brain Decoding via Cross-Subject Soft-ROI Fusion

Authors: Xuanyu Hu

2025-12-23

http://arxiv.org/abs/2512.20249v1

Multimodal brain decoding aims to reconstruct semantic information that is consistent with visual stimuli from brain activity signals such as fMRI, and then generate readable natural language descriptions. However, multimodal brain decoding still faces key challenges in cross-subject generalization and interpretability. We propose a BrainROI model and achieve leading-level results in brain-captioning evaluation on the NSD dataset. Under the cross-subject setting, compared with recent state-of-the-art methods and representative baselines, metrics such as BLEU-4 and CIDEr show clear improvements. First, to address the heterogeneity of functional brain topology across subjects, we design a new fMRI encoder. We use multi-atlas soft functional parcellations (soft-ROI) as a shared space. We extend the discrete ROI Concatenation strategy in MIND to a voxel-wise gated fusion mechanism (Voxel-gate). We also ensure consistent ROI mapping through global label alignment, which enhances cross-subject transferability. Second, to overcome the limitations of manual and black-box prompting methods in stability and transparency, we introduce an interpretable prompt optimization process. In a small-sample closed loop, we use a locally deployed Qwen model to iteratively generate and select human-readable prompts. This process improves the stability of prompt design and preserves an auditable optimization trajectory. Finally, we impose parameterized decoding constraints during inference to further improve the stability and quality of the generated descriptions.

Memory as Resonance A Biomimetic Architecture for Infinite Context Memory on Ergodic Phonetic Manifolds

Authors: Tarik Houichime, Abdelghani Souhar, Younes El Amrani

2025-12-23

http://arxiv.org/abs/2512.20245v1

The memory of contemporary Large Language Models is bound by a physical paradox: as they learn, they fill up. The linear accumulation (O(N)) of Key-Value states treats context as a warehouse of static artifacts, eventually forcing a destructive choice between amnesia and latency. We challenge this discrete orthodoxy, proposing that long-term memory is not the storage of items, but the persistence of a trajectory. We introduce Phonetic Trajectory Memory (PTM), a neuro-symbolic architecture that encodes language not as a sequence of tensors, but as a continuous path on an ergodic manifold governed by irrational rotation matrices. By decoupling the navigation (an invariant O(1) geometric signal) from the reconstruction (a probabilistic generative act), PTM achieves a compression magnitude of greater than 3,000x relative to dense caches. We demonstrate that retrieval becomes a process of resonance: the phonetic trace stabilizes the model against hallucination via a "Signal Consensus" mechanism, securing up to approximately 92% factual accuracy. While this aggressive abstraction alters generative texture, it unlocks immediate access latency (approximately 34 ms) independent of depth. Our results suggest that infinite context does not require infinite silicon; it requires treating memory not as data to be stored, but as a reconstructive process acting on a conserved, undying physical signal.

Designing Spatial Architectures for Sparse Attention STAR Accelerator via Cross-Stage Tiling

Authors: Huizheng Wang, Taiquan Wei, Hongbin Wang, Zichuan Wang, Xinru Tang, Zhiheng Yue, Shaojun Wei, Yang Hu, Shouyi Yin

2025-12-23

http://arxiv.org/abs/2512.20198v2

Large language models (LLMs) rely on self-attention for contextual understanding, demanding high-throughput inference and large-scale token parallelism (LTPP). Existing dynamic sparsity accelerators falter under LTPP scenarios due to stage-isolated optimizations. Revisiting the end-to-end flow, we identify an overlooked opportunity: cross-stage coordination can substantially reduce redundant computation and memory access. We propose STAR, a cross-stage compute- and memory-efficient algorithm-hardware co-design tailored for Transformer inference under LTPP. STAR introduces a leading-zero-based sparsity prediction using log-domain add-only operations to minimize prediction overhead. It further employs distributed sorting and a sorted updating FlashAttention mechanism, guided by a coordinated tiling strategy that enables fine-grained stage interaction for improved memory efficiency and latency. These optimizations are supported by a dedicated STAR accelerator architecture, achieving up to 9.2$\times$ speedup and 71.2$\times$ energy efficiency over A100, and surpassing SOTA accelerators by up to 16.1$\times$ energy and 27.1$\times$ area efficiency gains. Further, we deploy STAR onto a multi-core spatial architecture, optimizing dataflow and execution orchestration for ultra-long sequence processing. Architectural evaluation shows that, compared to the baseline design, Spatial-STAR achieves a 20.1$\times$ throughput improvement.

Generative Latent Coding for Ultra-Low Bitrate Image Compression

Authors: Zhaoyang Jia, Jiahao Li, Bin Li, Houqiang Li, Yan Lu

2025-12-23

http://arxiv.org/abs/2512.20194v1

Most existing image compression approaches perform transform coding in the pixel space to reduce its spatial redundancy. However, they encounter difficulties in achieving both high realism and high fidelity at low bitrate, as the pixel-space distortion may not align with human perception. To address this issue, we introduce a Generative Latent Coding (GLC) architecture, which performs transform coding in the latent space of a generative vector-quantized variational auto-encoder (VQ-VAE), instead of in the pixel space. The generative latent space is characterized by greater sparsity, richer semantics, and better alignment with human perception, rendering it advantageous for achieving high-realism and high-fidelity compression. Additionally, we introduce a categorical hyper module to reduce the bit cost of hyper-information, and a code-prediction-based supervision to enhance the semantic consistency. Experiments demonstrate that our GLC maintains high visual quality with less than 0.04 bpp on natural images and less than 0.01 bpp on facial images. On the CLIC2020 test set, we achieve the same FID as MS-ILLM with 45% fewer bits. Furthermore, the powerful generative latent space enables various applications built on our GLC pipeline, such as image restoration and style transfer. The code is available at https://github.com/jzyustc/GLC.

milliMamba Specular-Aware Human Pose Estimation via Dual mmWave Radar with Multi-Frame Mamba Fusion

Authors: Niraj Prakash Kini, Shiau-Rung Tsai, Guan-Hsun Lin, Wen-Hsiao Peng, Ching-Wen Ma, Jenq-Neng Hwang

2025-12-23

http://arxiv.org/abs/2512.20128v1

Millimeter-wave radar offers a privacy-preserving and lighting-invariant alternative to RGB sensors for the Human Pose Estimation (HPE) task. However, the radar signals are often sparse due to specular reflection, making the extraction of robust features from radar signals highly challenging. To address this, we present milliMamba, a radar-based 2D human pose estimation framework that jointly models spatio-temporal dependencies across both the feature extraction and decoding stages. Specifically, given the high dimensionality of radar inputs, we adopt a Cross-View Fusion Mamba encoder to efficiently extract spatio-temporal features from longer sequences with linear complexity. A Spatio-Temporal-Cross Attention decoder then predicts joint coordinates across multiple frames. Together, this spatio-temporal modeling pipeline enables the model to leverage contextual cues from neighboring frames and joints to infer missing joints caused by specular reflections. To reinforce motion smoothness, we incorporate a velocity loss alongside the standard keypoint loss during training. Experiments on the TransHuPR and HuPR datasets demonstrate that our method achieves significant performance improvements, exceeding the baselines by 11.0 AP and 14.6 AP, respectively, while maintaining reasonable complexity. Code: https://github.com/NYCU-MAPL/milliMamba

HEART-VIT Hessian-Guided Efficient Dynamic Attention and Token Pruning in Vision Transformer

Authors: Mohammad Helal Uddin, Liam Seymour, Sabur Baidya

2025-12-23

http://arxiv.org/abs/2512.20120v1

Vision Transformers (ViTs) deliver state-of-the-art accuracy, but their quadratic attention cost and redundant computations severely hinder deployment on latency- and resource-constrained platforms. Existing approaches treat either tokens or heads in isolation, relying on heuristics or first-order signals, which often sacrifice accuracy or fail to generalize across inputs. We introduce HEART-ViT, a Hessian-guided efficient dynamic attention and token pruning framework for vision transformers, which to the best of our knowledge is the first unified, second-order, input-adaptive framework for ViT optimization. HEART-ViT estimates curvature-weighted sensitivities of both tokens and attention heads using efficient Hessian-vector products, enabling principled pruning decisions under explicit loss budgets. This dual-view sensitivity reveals an important structural insight: token pruning dominates computational savings, while head pruning provides fine-grained redundancy removal, and their combination achieves a superior trade-off. On ImageNet-100 and ImageNet-1K with ViT-B/16 and DeiT-B/16, HEART-ViT achieves up to 49.4 percent FLOPs reduction, 36 percent lower latency, and 46 percent higher throughput, while consistently matching or even surpassing baseline accuracy after fine-tuning, for example 4.7 percent recovery at 40 percent token pruning. Beyond theoretical benchmarks, we deploy HEART-ViT on different edge devices such as AGX Orin, demonstrating that our reductions in FLOPs and latency translate directly into real-world gains in inference speed and energy efficiency. HEART-ViT bridges the gap between theory and practice, delivering the first unified, curvature-driven framework that is both accuracy-preserving and edge-efficient.
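For readers unfamiliar with curvature-weighted sensitivities, the generic second-order recipe is sketched below: the loss change from zeroing one component is estimated as g·δ + ½ δᵀHδ, with Hδ obtained from a Hessian-vector product instead of an explicit Hessian. This is a textbook Taylor estimate on a toy quadratic loss, not HEART-ViT's scoring rule, which the abstract does not spell out. The finite-difference product and the names `toy_grad`, `hvp`, and `removal_sensitivity` are assumptions.

```python
def toy_grad(theta):
    """Gradient of the toy loss L = 0.5 * theta^T A theta, i.e. A @ theta."""
    a = [[2.0, 1.0], [1.0, 3.0]]
    return [sum(a[i][j] * theta[j] for j in range(2)) for i in range(2)]

def hvp(grad_fn, theta, v, eps=1e-4):
    """Hessian-vector product via central differences of the gradient."""
    plus = grad_fn([t + eps * x for t, x in zip(theta, v)])
    minus = grad_fn([t - eps * x for t, x in zip(theta, v)])
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]

def removal_sensitivity(grad_fn, theta, index):
    """Second-order estimate of the loss change when theta[index] is zeroed."""
    delta = [-t if i == index else 0.0 for i, t in enumerate(theta)]
    g = grad_fn(theta)
    h_delta = hvp(grad_fn, theta, delta)
    first_order = sum(gi * di for gi, di in zip(g, delta))
    second_order = 0.5 * sum(di * hi for di, hi in zip(delta, h_delta))
    return first_order + second_order


theta = [1.0, 2.0]
sensitivities = [removal_sensitivity(toy_grad, theta, i) for i in range(2)]
```

For this quadratic loss the estimate is exact up to floating-point error: zeroing the first component changes the loss by -3 and zeroing the second by -8. In a real network, automatic differentiation would typically replace the finite-difference step.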

UMAMI Unifying Masked Autoregressive Models and Deterministic Rendering for View Synthesis

Authors: Thanh-Tung Le, Tuan Pham, Tung Nguyen, Deying Kong, Xiaohui Xie, Stephan Mandt

2025-12-23

http://arxiv.org/abs/2512.20107v1

Novel view synthesis (NVS) seeks to render photorealistic, 3D-consistent images of a scene from unseen camera poses given only a set of posed views. Existing deterministic networks render observed regions quickly but blur unobserved areas, whereas stochastic diffusion-based methods hallucinate plausible content yet incur heavy training- and inference-time costs. In this paper, we propose a hybrid framework that unifies the strengths of both paradigms. A bidirectional transformer encodes multi-view image tokens and Plücker-ray embeddings, producing a shared latent representation. Two lightweight heads then act on this representation: (i) a feed-forward regression head that renders pixels where geometry is well constrained, and (ii) a masked autoregressive diffusion head that completes occluded or unseen regions. The entire model is trained end-to-end with joint photometric and diffusion losses, without handcrafted 3D inductive biases, enabling scalability across diverse scenes. Experiments demonstrate that our method attains state-of-the-art image quality while reducing rendering time by an order of magnitude compared with fully generative baselines.

Neural Compression of 360-Degree Equirectangular Videos using Quality Parameter Adaptation

Authors: Daichi Arai, Yuichi Kondo, Kyohei Unno, Yasuko Sugito, Yuichi Kusakabe

2025-12-23

http://arxiv.org/abs/2512.20093v1

This study proposes a practical approach for compressing 360-degree equirectangular videos using pretrained neural video compression (NVC) models. Without requiring additional training or changes in the model architectures, the proposed method extends parameter adaptation techniques from traditional video codecs to NVC, utilizing the spatially varying sampling density in equirectangular projections. We introduce latitude-based adaptive quality parameters through rate-distortion optimization for NVC. The proposed method utilizes vector bank interpolation for latent modulation, enabling flexible adaptation with arbitrary quality parameters and mitigating the limitations caused by rounding errors in the adaptive parameters. Experimental results demonstrate that applying this method to the DCVC-RT framework yields BD-Rate savings of 5.2% in terms of the weighted spherical peak signal-to-noise ratio for JVET class S1 test sequences, with only a 0.3% increase in processing time.
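The spatially varying sampling density mentioned above can be illustrated with the per-row weights used by weighted spherical PSNR for equirectangular frames, where row j of an N-row image gets weight cos((j + 0.5 - N/2)·π/N). The offset rule in `latitude_qp_offsets` is only a hypothetical stand-in: the abstract says the adaptive quality parameters come from rate-distortion optimization and does not give a formula, so the `-scale * log2(weight)` mapping and the `scale` value are assumptions.

```python
import math

def ws_weights(height):
    """Per-row WS-PSNR weights for an equirectangular frame of given height."""
    return [math.cos((j + 0.5 - height / 2) * math.pi / height)
            for j in range(height)]

def latitude_qp_offsets(height, scale=3.0):
    """Hypothetical per-row quality offsets: coarser quality toward the poles.

    Rows near the poles are oversampled by the projection, so they get a
    larger offset; rows near the equator get an offset close to zero.
    """
    return [round(-scale * math.log2(w)) for w in ws_weights(height)]


weights = ws_weights(4)
offsets = latitude_qp_offsets(4)
```

For a four-row toy frame the weights are roughly 0.38, 0.92, 0.92, 0.38, which shows why distortion near the poles counts for less in the spherical metric and why spending fewer bits there can reduce the bitrate at equal weighted quality.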

Spatio-Temporal Graphs Beyond Grids Benchmark for Maritime Anomaly Detection

Authors: Jeehong Kim, Youngseok Hwang, Minchan Kim, Sungho Bae, Hyunwoo Park

2025-12-23

http://arxiv.org/abs/2512.20086v1

Spatio-temporal graph neural networks (ST-GNNs) have achieved notable success in structured domains such as road traffic and public transportation, where spatial entities can be naturally represented as fixed nodes. In contrast, many real-world systems including maritime traffic lack such fixed anchors, making the construction of spatio-temporal graphs a fundamental challenge. Anomaly detection in these non-grid environments is particularly difficult due to the absence of canonical reference points, the sparsity and irregularity of trajectories, and the fact that anomalies may manifest at multiple granularities. In this work, we introduce a novel benchmark dataset for anomaly detection in the maritime domain, extending the Open Maritime Traffic Analysis Dataset (OMTAD) into a benchmark tailored for graph-based anomaly detection. Our dataset enables systematic evaluation across three different granularities: node-level, edge-level, and graph-level anomalies. We plan to employ two specialized LLM-based agents, Trajectory Synthesizer and Anomaly Injector, to construct richer interaction contexts and generate semantically meaningful anomalies. We expect this benchmark to promote reproducibility and to foster methodological advances in anomaly detection for non-grid spatio-temporal systems.

Progressive Learned Image Compression for Machine Perception

Authors: Jungwoo Kim, Jun-Hyuk Kim, Jong-Seok Lee

2025-12-23

http://arxiv.org/abs/2512.20070v1

Recent advances in learned image codecs have been extended from human perception toward machine perception. However, progressive image compression with fine granular scalability (FGS), which enables decoding a single bitstream at multiple quality levels, remains unexplored for machine-oriented codecs. In this work, we propose a novel progressive learned image codec for machine perception, PICM-Net, based on trit-plane coding. By analyzing the difference between human- and machine-oriented rate-distortion priorities, we systematically examine the latent prioritization strategies in terms of machine-oriented codecs. To further enhance real-world adaptability, we design an adaptive decoding controller, which dynamically determines the necessary decoding level during inference time to maintain the desired confidence of downstream machine prediction. Extensive experiments demonstrate that our approach enables efficient and adaptive progressive transmission while maintaining high performance in the downstream classification task, establishing a new paradigm for machine-aware progressive image compression.
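As background on why trit-plane coding yields fine granular scalability, the sketch below splits an integer into base-3 digits (most significant first) and reconstructs it from only the first few planes, so each extra plane refines the estimate. It shows the ternary decomposition only. PICM-Net's entropy coding, latent prioritization, and decoding controller are not modeled, and the midpoint fill for missing planes and the function names are assumptions.

```python
def to_trit_planes(value, n_planes):
    """Base-3 digits of a non-negative integer < 3**n_planes, MSB first."""
    return [(value // 3 ** p) % 3 for p in reversed(range(n_planes))]

def reconstruct(trits_received, n_planes):
    """Estimate the value from the first k planes.

    The received trits pin the value to an interval of width
    3**(n_planes - k); the midpoint of that interval is returned.
    """
    k = len(trits_received)
    low = sum(t * 3 ** (n_planes - 1 - i)
              for i, t in enumerate(trits_received))
    width = 3 ** (n_planes - k)
    return low + (width - 1) / 2


planes = to_trit_planes(17, 3)
estimates = [reconstruct(planes[:k], 3) for k in range(1, 4)]
```

For the value 17 with three planes, the successive estimates are 13, 16, and 17, so truncating the bitstream after any plane still gives a usable, progressively sharper reconstruction.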

FastMPS Revisit Data Parallel in Large-scale Matrix Product State Sampling

Authors: Yaojian Chen, Si-Qiu Gong, Lin Gan, Yanfei Liu, An Yang, Yinuo Wang, Chao-yang Lu, Guangwen Yang

2025-12-23

http://arxiv.org/abs/2512.20064v1

Matrix Product State (MPS) is a versatile tensor network representation widely applied in quantum physics, quantum chemistry, and machine learning. MPS sampling serves as a critical fundamental operation in these fields. As the problems become more complex, the scale of MPS is rapidly increasing. Traditional data parallelism is limited by memory and heavy I/O in large-scale MPS. Model parallelism that can handle large-scale MPS imposes rigid process bindings and lacks scalability. This work proposes Fast-MPS, a multi-level parallel framework for scalable MPS sampling. Our design combines data parallelism across samples with tensor parallelism along bond dimensions. We eliminate memory and I/O pressure through compression and overlapping, and revive data parallelism in large-scale MPS sampling. We evaluate our approach on Gaussian Boson Sampling, a representative and demanding application. Fast-MPS achieves over 10x speedup compared to existing simulators, scales to thousands of processes, and enables simulations with 8,176 sites and bond dimension chi = 10^4, significantly outperforming the state of the art. Fast-MPS has demonstrated great potential in high-performance tensor network applications.

Scaling Reinforcement Learning for Content Moderation with Large Language Models

Authors: Hamed Firooz, Rui Liu, Yuchen Lu, Zhenyu Hou, Fangzhou Xiong, Xiaoyang Zhang, Changshu Jian, Zhicheng Zhu, Jiayuan Ma, Jacob Tao, Chaitali Gupta, Xiaochang Peng, Shike Mei, Hang Cui, Yang Qin, Shuo Tang, Jason Gaedtke, Arpit Mittal

2025-12-23

http://arxiv.org/abs/2512.20061v1

Content moderation at scale remains one of the most pressing challenges in today's digital ecosystem, where billions of user- and AI-generated artifacts must be continuously evaluated for policy violations. Although recent advances in large language models (LLMs) have demonstrated strong potential for policy-grounded moderation, the practical challenges of training these systems to achieve expert-level accuracy in real-world settings remain largely unexplored, particularly in regimes characterized by label sparsity, evolving policy definitions, and the need for nuanced reasoning beyond shallow pattern matching. In this work, we present a comprehensive empirical investigation of scaling reinforcement learning (RL) for content classification, systematically evaluating multiple RL training recipes and reward-shaping strategies (including verifiable rewards and LLM-as-judge frameworks) to transform general-purpose language models into specialized, policy-aligned classifiers across three real-world content moderation tasks. Our findings provide actionable insights for industrial-scale moderation systems, demonstrating that RL exhibits sigmoid-like scaling behavior in which performance improves smoothly with increased training data, rollouts, and optimization steps before gradually saturating. Moreover, we show that RL substantially improves performance on tasks requiring complex policy-grounded reasoning while achieving up to 100x higher data efficiency than supervised fine-tuning, making it particularly effective in domains where expert annotations are scarce or costly.

Authors: Chang Sun, Dongliang Xie, Bo Qin, Hong Yang

2025-12-23

http://arxiv.org/abs/2512.20032v1

Visual Speech Recognition aims to transcribe spoken words from silent lip-motion videos. This task is particularly challenging for Mandarin, as visemes are highly ambiguous and homophones are prevalent. We propose VALLR-Pin, a novel two-stage framework that extends the recent VALLR architecture from English to Mandarin. First, a shared video encoder feeds into dual decoders, which jointly predict both Chinese character sequences and their standard Pinyin romanization. The multi-task learning of character and phonetic outputs fosters robust visual-semantic representations. During inference, the text decoder generates multiple candidate transcripts. We construct a prompt by concatenating the Pinyin output with these candidate Chinese sequences and feed it to a large language model (LLM) to resolve ambiguities and refine the transcription. This provides the LLM with explicit phonetic context to correct homophone-induced errors. Finally, we fine-tune the LLM on synthetic noisy examples: we generate imperfect Pinyin-text pairs from intermediate VALLR-Pin checkpoints using the training data, creating instruction-response pairs for error correction. This endows the LLM with awareness of our model's specific error patterns. In summary, VALLR-Pin synergizes visual features with phonetic and linguistic context to improve Mandarin lip-reading performance.

CoLaS Copula-Seeded Sparse Local Graphs with Tunable Assortativity, Persistent Clustering, and a Degree-Tail Dichotomy

Authors: Marios Papamichalis, Regina Ruane

2025-12-23

http://arxiv.org/abs/2512.20019v1

Empirical networks are typically sparse yet display pronounced degree variation, persistent transitivity, and systematic degree mixing. Most generators control at most two of these features, and assortativity is often achieved by degree-preserving rewiring, which obscures the mechanism-parameter link. We introduce CoLaS (copula-seeded local latent-space graphs), a modular latent-variable model that separates marginal specifications from dependence. Each node has a popularity variable governing degree heterogeneity and a latent geometric location governing locality. A low-dimensional copula couples popularity and location, providing an interpretable dependence parameter that tunes degree mixing while leaving the chosen marginals unchanged. Under shrinking-range locality, edges are conditionally independent, the graph remains sparse, and clustering does not vanish. We develop sparse-limit theory for degrees, transitivity, and assortativity. Degrees converge to mixed-Poisson limits and we establish a degree-tail dichotomy: with fixed-range local kernels, degree tails are necessarily light, even under heavy-tailed popularity. To recover power-law degrees without sacrificing sparsity or locality, we propose CoLaS-HT, a minimal tail-inheriting extension in which effective connection ranges grow with popularity. Finally, under an identifiability condition, we provide a consistent one-graph calibration method based on jointly matching transitivity and assortativity.

Spatio-Temporal Graph Neural Networks for Dairy Farm Sustainability Forecasting and Counterfactual Policy Analysis

Authors: Surya Jayakumar, Kieran Sullivan, John McLaughlin, Christine O'Meara, Indrakshi Dey

2025-12-23

http://arxiv.org/abs/2512.19970v1

This study introduces a novel data-driven framework and the first-ever county-scale application of Spatio-Temporal Graph Neural Networks (STGNN) to forecast composite sustainability indices from herd-level operational records. The methodology employs a novel, end-to-end pipeline utilizing a Variational Autoencoder (VAE) to augment Irish Cattle Breeding Federation (ICBF) datasets, preserving joint distributions while mitigating sparsity. A first-ever pillar-based scoring formulation is derived via Principal Component Analysis, identifying Reproductive Efficiency, Genetic Management, Herd Health, and Herd Management, to construct weighted composite indices. These indices are modelled using a novel STGNN architecture that explicitly encodes geographic dependencies and non-linear temporal dynamics to generate multi-year forecasts for 2026-2030.

Interpolative Decoding Exploring the Spectrum of Personality Traits in LLMs

Authors: Eric Yeh, John Cadigan, Ran Chen, Dick Crouch, Melinda Gervasio, Dayne Freitag

2025-12-23

http://arxiv.org/abs/2512.19937v1

Recent research has explored using very large language models (LLMs) as proxies for humans in tasks such as simulation, surveys, and studies. While LLMs do not possess a human psychology, they often can emulate human behaviors with sufficiently high fidelity to drive simulations to test human behavioral hypotheses, exhibiting more nuance and range than the rule-based agents often employed in behavioral economics. One key area of interest is the effect of personality on decision making, but the requirement that a prompt must be created for every tested personality profile introduces experimental overhead and degrades replicability. To address this issue, we leverage interpolative decoding, representing each dimension of personality as a pair of opposed prompts and employing an interpolation parameter to simulate behavior along the dimension. We show that interpolative decoding reliably modulates scores along each of the Big Five dimensions. We then show how interpolative decoding causes LLMs to mimic human decision-making behavior in economic games, replicating results from human psychological research. Finally, we present preliminary results of our efforts to "twin" individual human players in a collaborative game through systematic search for points in interpolation space that cause the system to replicate actions taken by the human subject.
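One plausible reading of "a pair of opposed prompts plus an interpolation parameter" is to blend the next-token logits obtained under the two prompts before sampling, as in the toy sketch below. The abstract does not specify the blending rule, so linear interpolation in logit space, the parameter name `alpha`, and the made-up logit values are all assumptions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def interpolated_distribution(logits_low, logits_high, alpha):
    """Blend next-token logits from two opposed prompts.

    alpha = 0 reproduces the "low trait" prompt, alpha = 1 the "high trait"
    prompt; values in between move smoothly along the trait dimension.
    """
    mixed = [(1.0 - alpha) * lo + alpha * hi
             for lo, hi in zip(logits_low, logits_high)]
    return softmax(mixed)


# Hypothetical logits over a 3-token vocabulary under the two prompts.
logits_introvert = [2.0, 0.5, -1.0]
logits_extravert = [-1.0, 0.5, 2.0]
sweep = [interpolated_distribution(logits_introvert, logits_extravert, a)
         for a in (0.0, 0.5, 1.0)]
```

A single scalar per personality dimension then replaces a hand-written prompt per profile, which is the replicability benefit the abstract points to.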

Gate-Based Microwave Quantum Repeater Via Grid-State Encoding

Authors: Hany Khalifa, Matti Silveri

2025-12-22

http://arxiv.org/abs/2512.19896v1

In autonomous quantum error correction, the lifetime of a logical bosonic qubit can be extended beyond its physical constituents without feedback measurements. Leveraging autonomous error correction, we propose a second-generation gate-based microwave quantum repeater (GBMQR) with encoded bosonic grid states. Each repeater station comprises a transmon and two bosonic resonators: one resonator as a stationary quantum memory utilizing autonomous error correction, and the other as an information bus for entanglement generation. Entanglement is generated sequentially through the successful absorption of a microwave photon wavepacket. This method enables deterministic entanglement generation, in contrast to a probabilistic mixing of two heralding signals on a balanced beamsplitter. Furthermore, our GBMQR employs an all-bosonic entanglement swapping Bell-state measurement. This is implemented via a bosonic controlled-Z gate and two separate X-basis projective homodyne measurements on the stationary stored codewords. Our approach circumvents mode-mismatch losses associated with routing and interfering of heralding modes on a beamsplitter, and confines losses to those arising from stationary storage. We evaluate the performance of the proposed quantum repeater by calculating its secret key rate under realistic lab environments. Moreover, we explicitly demonstrate that at a stationary damping rate of $\kappa^{-1}_{\text{damp}} = 40$ ms, the GBMQR can achieve entanglement generation and swapping success probabilities of approximately $0.75$ and $0.58$, respectively, surpassing the hallmark success probability of $1/2$ set by ideal linear beamsplitter-based Bell-state measurements. The proposed device can be implemented using currently available superconducting microwave technology and is suited for secure chip-to-chip and distributed quantum computing.

Gaussian Variational Inference with Non-Gaussian Factors for State Estimation A UWB Localization Case Study

Authors: Andrew Stirling, Mykola Lukashchuk, Dmitry Bagaev, Wouter Kouw, James R. Forbes

2025-12-22

http://arxiv.org/abs/2512.19855v1

This letter extends the exactly sparse Gaussian variational inference (ESGVI) algorithm for state estimation in two complementary directions. First, ESGVI is generalized to operate on matrix Lie groups, enabling the estimation of states with orientation components while respecting the underlying group structure. Second, factors are introduced to accommodate heavy-tailed and skewed noise distributions, as commonly encountered in ultra-wideband (UWB) localization due to non-line-of-sight (NLOS) and multipath effects. Both extensions are shown to integrate naturally within the ESGVI framework while preserving its sparse and derivative-free structure. The proposed approach is validated in a UWB localization experiment with NLOS-rich measurements, demonstrating improved accuracy and comparable consistency. Finally, a Python implementation within a factor-graph-based estimation framework is made open-source (https://github.com/decargroup/gvi_ws) to support broader research use.

Power-Scalable Generation of High-Order Optical Vortices Via Coherent Beam Combining

Authors: Hossein Fathi, Rafael F. Barros, Regina Gumenyuk

2025-12-22

http://arxiv.org/abs/2512.19815v1

Structured light beams, such as optical vortices carrying orbital angular momentum, are essential for applications ranging from low-power optical s to high-intensity laser-matter interactions. However, scaling their power and energy while preserving complex phase and spatial structures remains a fundamental challenge. In this work, we demonstrate coherent beam combining as a versatile and scalable method for generating high-power structured beams without limitations on topological charge or spatial structure, while maintaining exceptionally high modal purity. We experimentally implement coherent beam combining for optical vortex beams with topological charges l = 1, 5, and 8, achieving a combined average power of 100 W and a peak power of 100 kW, with combining efficiencies of 95.0%, 93.9%, and 91.2%, respectively. Off-axis digital holography confirms that the phase and intensity profiles of the combined beams retain high modal purity, even at high topological charges. These results establish coherent beam combining as an effective route to high modal purity structured light at high power levels, unlocking new opportunities for advanced photonics and high-intensity light-matter interaction studies.

PhysMaster Building an Autonomous AI Physicist for Theoretical and Computational Physics Research

Authors: Tingjia Miao, Jiawen Dai, Jingkun Liu, Jinxin Tan, Muhua Zhang, Wenkai Jin, Yuwen Du, Tian Jin, Xianghe Pang, Zexi Liu, Tu Guo, Zhengliang Zhang, Yunjie Huang, Shuo Chen, Rui Ye, Yuzhi Zhang, Linfeng Zhang, Kun Chen, Wei Wang, Weinan E, Siheng Chen

2025-12-22

http://arxiv.org/abs/2512.19799v1

Advances in LLMs have produced agents with knowledge and operational capabilities comparable to human scientists, suggesting potential to assist, accelerate, and automate research. However, existing studies mainly evaluate such systems on well-defined benchmarks or general tasks like literature retrieval, limiting their end-to-end problem-solving ability in open scientific scenarios. This is particularly true in physics, which is abstract, mathematically intensive, and requires integrating analytical reasoning with code-based computation. To address this, we propose PhysMaster, an LLM-based agent functioning as an autonomous theoretical and computational physicist. PhysMaster couples abstract reasoning with numerical computation and leverages LANDAU, the Layered Academic Data Universe, which preserves retrieved literature, curated prior knowledge, and validated methodological traces, enhancing decision reliability and stability. It also employs an adaptive exploration strategy balancing efficiency and open-ended exploration, enabling robust performance in ultra-long-horizon tasks. We evaluate PhysMaster on problems ranging from high-energy theory and condensed matter theory to astrophysics, including: (i) acceleration, compressing labor-intensive research from months to hours; (ii) automation, autonomously executing hypothesis-driven loops; and (iii) autonomous discovery, independently exploring open problems.

RAPID-LLM Resilience-Aware Performance analysis of Infrastructure for Distributed LLM Training and Inference

Authors: George Karfakis, Faraz Tahmasebi, Binglu Chen, Lime Yao, Saptarshi Mitra, Tianyue Pan, Hyoukjun Kwon, Puneet Gupta

2025-12-22

http://arxiv.org/abs/2512.19606v1

RAPID-LLM is a unified performance modeling framework for large language model (LLM) training and inference on GPU clusters. It couples a DeepFlow-based frontend that generates hardware-aware, operator-level Chakra execution traces from an abstract specification (model shape, batch/sequence settings, training vs. inference, and hybrid parallelism choices) with an extended Astra-Sim backend that executes those traces on explicit multi-dimensional network topologies with congestion-aware routing and support for degraded and faulty links. The frontend assigns per-operator latency using a tile-based model that accounts for SM under-utilization and multi-level memory traffic (SRAM/L2/HBM), and prunes memory-infeasible configurations using an activation-liveness traversal under recomputation, parallelism, and ZeRO/FSDP sharding policies. Across A100-based validation cases, RAPID-LLM predicts Llama inference step latency and GPT-scale training time per batch within 10.4% relative to published measurements, and matches ns-3 packet-level results within 8% on representative workloads. Case studies demonstrate how RAPID-LLM enables fast, exhaustive sweeps over hybrid-parallel configurations, quantifies sensitivity to soft link faults under realistic routing and congestion, and evaluates hypothetical GPU design variants including HBM bandwidth throttling effects.

Possibilistic Inferential Models for Post-Selection Inference in High-Dimensional Linear Regression

Authors: Yaohui Lin

2025-12-22

http://arxiv.org/abs/2512.19588v1

Valid uncertainty quantification after model selection remains challenging in high-dimensional linear regression, especially within the possibilistic inferential model (PIM) framework. We develop possibilistic inferential models for post-selection inference based on a regularized split possibilistic construction (RSPIM) that combines generic high-dimensional selectors with PIM validification through sample splitting. A first subsample is used to select a model; ordinary least-squares refits on an independent inference subsample yield classical t/F pivots, which are then turned into consonant plausibility contours. In Gaussian linear models this leads to coordinatewise intervals with exact finite-sample strong validity conditional on the split and selected model, uniformly over all selectors that use only the selection data. We further analyze RSPIM in a $p \gg n$ regime under high-level screening conditions, develop orthogonalized and bootstrap-based extensions for low-dimensional targets with high-dimensional nuisance, and study a maxitive multi-split aggregation that stabilizes inference across random splits while preserving strong validity. Simulations and a riboflavin gene-expression example show that calibrated RSPIM intervals are well behaved under both Gaussian and heteroskedastic errors and are competitive with state-of-the-art post-selection methods, while plausibility contours provide transparent diagnostics of post-selection uncertainty.

Event Extraction in Large Language Model

Authors: Bobo Li, Xudong Han, Jiang Liu, Yuzhe Ding, Liqiang Jing, Zhaoqi Zhang, Jinheng Li, Xinya Du, Fei Li, Meishan Zhang, Min Zhang, Aixin Sun, Philip S. Yu, Hao Fei

2025-12-22

http://arxiv.org/abs/2512.19537v1

Large language models (LLMs) and multimodal LLMs are changing event extraction (EE): prompting and generation can often produce structured outputs in zero-shot or few-shot settings. Yet LLM-based pipelines face deployment gaps, including hallucinations under weak constraints, fragile temporal and causal linking over long contexts and across documents, and limited long-horizon knowledge management within a bounded context window. We argue that EE should be viewed as a system component that provides a cognitive scaffold for LLM-centered solutions. Event schemas and slot constraints create interfaces for grounding and verification; event-centric structures act as controlled intermediate representations for stepwise reasoning; event links support relation-aware retrieval with graph-based RAG; and event stores offer updatable episodic and agent memory beyond the context window. This survey covers EE in text and multimodal settings, organizing tasks and taxonomy, tracing method evolution from rule-based and neural models to instruction-driven and generative frameworks, and summarizing formulations, strategies, architectures, representations, datasets, and evaluation. We also review cross-lingual, low-resource, and domain-specific settings, and highlight open challenges and future directions for reliable event-centric systems. Finally, we outline open challenges and future directions that are central to the LLM era, aiming to evolve EE from static extraction into a structurally reliable, agent-ready perception and memory layer for open-world systems.

Lightweight Intrusion Detection in IoT via SHAP-Guided Feature Pruning and Knowledge-Distilled Kronecker Networks

Authors: Hafsa Benaddi, Mohammed Jouhari, Nouha Laamech, Anas Motii, Khalil Ibrahimi

2025-12-22

http://arxiv.org/abs/2512.19488v1

The widespread deployment of Internet of Things (IoT) devices requires intrusion detection systems (IDS) with high accuracy while operating under strict resource constraints. Conventional deep learning IDS are often too large and computationally intensive for edge deployment. We propose a lightweight IDS that combines SHAP-guided feature pruning with knowledge-distilled Kronecker networks. A high-capacity teacher model identifies the most relevant features through SHAP explanations, and a compressed student leverages Kronecker-structured layers to minimize parameters while preserving discriminative inputs. Knowledge distillation transfers softened decision boundaries from teacher to student, improving generalization under . Experiments on the TON_IoT dataset show that the student is nearly three orders of magnitude smaller than the teacher yet sustains macro-F1 above 0.986 with millisecond-level inference latency. The results demonstrate that explainability-driven and structured can jointly enable scalable, low-latency, and energy-efficient IDS for heterogeneous IoT environments.
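To illustrate why Kronecker-structured layers cut parameters so sharply, the following sketch applies a weight matrix W = A ⊗ B without ever forming W. It is a generic illustration of the Kronecker factorization, not the paper's architecture; the factor shapes (8x16 each) and the function name are hypothetical choices.

```python
import numpy as np


def kronecker_linear(x, A, B):
    """Compute kron(A, B) @ x without materializing the Kronecker product.

    A has shape (m1, n1), B has shape (m2, n2), x has length n1 * n2.
    Uses the row-major identity kron(A, B) @ vec(X) = vec(A @ X @ B.T),
    where X is x reshaped to (n1, n2).
    """
    m1, n1 = A.shape
    m2, n2 = B.shape
    X = x.reshape(n1, n2)
    return (A @ X @ B.T).reshape(m1 * m2)


rng = np.random.default_rng(0)
A = rng.standard_normal((8, 16))   # hypothetical factor shapes
B = rng.standard_normal((8, 16))
x = rng.standard_normal(16 * 16)

y_fast = kronecker_linear(x, A, B)
y_dense = np.kron(A, B) @ x        # reference: explicit 64 x 256 matrix

dense_params = np.kron(A, B).size  # parameters of the equivalent dense layer
kron_params = A.size + B.size      # parameters actually stored
print(dense_params, kron_params, np.allclose(y_fast, y_dense))
```

With these shapes the dense layer would hold 16384 weights while the two factors hold 256, a 64-fold reduction from a single layer; the overall ratio reported in the abstract also depends on feature pruning and the teacher's size.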

A Mathematical Framework for Misinformation Propagation in Complex Networks Topology-Dependent Distortion and Control

Authors: Saikat Sur, Rohitashwa Chattopadhyay, Jens Christian Claussen, Archan Mukhopadhyay

2025-12-22

http://arxiv.org/abs/2512.19465v2

Misinformation is pervasive in natural, biological, social, and engineered systems, yet its quantitative characterization remains challenging. We develop a general mathematical framework for quantifying information distortion in distributed systems by modeling how local transmission errors accumulate along network geodesics and reshape each agent's perceived global state. Through a drift-fluctuation decomposition of pathwise binomial noise, we derive closed-form expressions for node-level perception distributions and show that directional bias induces only a uniform shift in the mean, preserving the fluctuation structure. Applying the framework to canonical graph ensembles, we uncover strong topological signatures of misinformation: Erdős-Rényi random graphs exhibit a double-peaked distortion profile driven by connectivity transitions and geodesic-length fluctuations, scale-free networks suppress misinformation through hub-mediated integration, and optimally rewired small-world networks achieve comparable suppression by balancing clustering with short paths. A direct comparison across regular lattices, Erdős-Rényi random graphs, Watts-Strogatz small-world networks, and Barabási-Albert scale-free networks reveals a connectivity-dependent crossover. In the extremely sparse regime, scale-free and Erdős-Rényi networks behave similarly. At intermediate , Watts-Strogatz small-world networks exhibit the lowest misinformation. In contrast, Barabási-Albert scale-free networks maintain low misinformation in sparse and dense regimes, while regular lattices produce the highest distortion across connectivities. We additionally show how constraints, structural organization, and connection costs delineate regimes of minimal misinformation.

Sensitivity-Aware Mixed-Precision Quantization for ReRAM-based Computing-in-Memory

Authors: Guan-Cheng Chen, Chieh-Lin Tsai, Pei-Hsuan Tsai, Yuan-Hao Chang

2025-12-22

http://arxiv.org/abs/2512.19445v1

Compute-In-Memory (CIM) systems, particularly those utilizing ReRAM and memristive technologies, offer a promising path toward energy-efficient neural network computation. However, conventional and techniques often fail to fully optimize performance and efficiency in these architectures. In this work, we present a structured method that combines sensitivity analysis with mixed-precision strategies to enhance weight storage and computational performance on ReRAM-based CIM systems. Our approach improves ReRAM Crossbar utilization, significantly reducing power consumption, latency, and computational load, while maintaining high accuracy. Experimental results show 86.33% accuracy at 70% , alongside a 40% reduction in power consumption, demonstrating the method's effectiveness for power-constrained applications.

D2Pruner Debiased Importance and Structural Diversity for MLLM Token Pruning

Authors: Evelyn Zhang, Fufu Yu, Aoqi Wu, Zichen Wen, Ke Yan, Shouhong Ding, Biqing Qi, Linfeng Zhang

2025-12-22

http://arxiv.org/abs/2512.19443v1

Processing long visual token sequences poses a significant computational burden on Multimodal Large Language Models (MLLMs). While token pruning offers a path to , we find that current methods, while adequate for general understanding, catastrophically fail on fine-grained localization tasks. We attribute this failure to the inherent flaws of the two prevailing strategies: importance-based methods suffer from a strong positional bias, an inherent model artifact that distracts from semantic content, while diversity-based methods exhibit structural blindness, disregarding the user's prompt and spatial redundancy. To address this, we introduce D2Pruner, a framework that rectifies these issues by uniquely combining debiased importance with a structural mechanism. Our method first secures a core set of the most critical tokens as pivots based on a debiased attention score. It then performs a Maximal Independent Set (MIS) selection on the remaining tokens, which are modeled on a hybrid graph where edges signify spatial proximity and semantic similarity. This process iteratively preserves the most important and available token while removing its neighbors, ensuring that the supplementary tokens are chosen to maximize importance and diversity. Extensive experiments demonstrate that D2Pruner has exceptional efficiency and fidelity. Applied to LLaVA-1.5-7B for general understanding tasks, it reduces FLOPs by 74.2% while retaining 99.2% of its original performance. Furthermore, in challenging localization benchmarks with InternVL-2.5-8B, it maintains 85.7% performance at a 90% token reduction rate, marking a significant advancement with up to 63.53% improvement over existing methods.
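The MIS step described above (keep the most important available token, then drop its neighbors) can be sketched as a greedy loop. This is an illustrative sketch only: how the hybrid graph is built and how attention scores are debiased is not specified in the abstract, so the `scores` list, the `neighbors` adjacency map, the `budget` parameter, and the function name are all hypothetical inputs, and the earlier pivot-selection stage is omitted.

```python
def greedy_mis_select(scores, neighbors, budget):
    """Greedy independent-set selection over token indices.

    scores: importance score per token (assumed already debiased).
    neighbors: dict mapping a token index to the set of adjacent indices
        in the (hypothetical) spatial/semantic graph.
    budget: maximum number of tokens to keep.
    Returns kept indices in the order they were selected.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    removed = set()
    kept = []
    for i in order:
        if len(kept) >= budget:
            break
        if i in removed:
            continue  # a higher-scoring neighbor was already kept
        kept.append(i)
        removed.add(i)
        removed.update(neighbors.get(i, ()))  # suppress its neighbors
    return kept


# Toy graph with edges 0-1, 1-2 and 3-4.
scores = [0.9, 0.8, 0.3, 0.7, 0.1]
neighbors = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}}
print(greedy_mis_select(scores, neighbors, budget=5))  # [0, 3, 2]
```

Token 1 has the second-highest score but is dropped because it neighbors token 0, which is how the selection trades raw importance for diversity.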

Real-Time Streamable Generative Speech Restoration with Flow Matching

Authors: Simon Welker, Bunlong Lay, Maris Hillemann, Tal Peer, Timo Gerkmann

2025-12-22

http://arxiv.org/abs/2512.19442v1

Diffusion-based generative models have greatly impacted the speech processing field in recent years, exhibiting high speech naturalness and spawning a new research direction. Their application in real time is, however, still lagging behind due to their computation-heavy nature involving multiple calls of large DNNs. Here, we present Stream.FM, a frame-causal flow-based generative model with an algorithmic latency of 32 milliseconds (ms) and a total latency of 48 ms, paving the way for generative speech processing in real time. We propose a buffered streaming inference scheme and an optimized DNN architecture, show how learned few-step numerical solvers can boost output quality at a fixed compute budget, explore model weight to find favorable points along a compute/quality tradeoff, and contribute a model variant with 24 ms total latency for the speech enhancement task. Our work looks beyond theoretical latencies, showing that high-quality streaming generative speech processing can be realized on consumer GPUs available today. Stream.FM can solve a variety of speech processing tasks in a streaming fashion: speech enhancement, dereverberation, codec post-filtering, bandwidth extension, STFT phase retrieval, and Mel vocoding. As we verify through comprehensive evaluations and a MUSHRA listening test, Stream.FM establishes a state-of-the-art for generative streaming speech restoration, exhibits only a reasonable reduction in quality compared to a non-streaming variant, and outperforms our recent work (Diffusion Buffer) on generative streaming speech enhancement while operating at a lower latency.

From Retrieval to Reasoning A Framework for Cyber Threat Intelligence NER with Explicit and Adaptive Instructions

Authors: Jiaren Peng, Hongda Sun, Xuan Tian, Cheng Huang, Zeqing Li, Rui Yan

2025-12-22

http://arxiv.org/abs/2512.19414v1

The automation of Cyber Threat Intelligence (CTI) relies heavily on Named Entity Recognition (NER) to extract critical entities from unstructured text. Currently, Large Language Models (LLMs) primarily address this task through retrieval-based In-Context Learning (ICL). This paper analyzes this mainstream paradigm, revealing a fundamental flaw: its success stems not from global semantic similarity but largely from the incidental overlap of entity types within retrieved examples. This exposes the limitations of relying on unreliable implicit induction. To address this, we propose TTPrompt, a framework shifting from implicit induction to explicit instruction. TTPrompt maps the core concepts of CTI's Tactics, Techniques, and Procedures (TTPs) into an instruction hierarchy: formulating task definitions as Tactics, guiding strategies as Techniques, and annotation guidelines as Procedures. Furthermore, to handle the adaptability challenge of static guidelines, we introduce Feedback-driven Instruction Refinement (FIR). FIR enables LLMs to self-refine guidelines by learning from errors on minimal labeled data, adapting to distinct annotation dialects. Experiments on five CTI NER benchmarks demonstrate that TTPrompt consistently surpasses retrieval-based baselines. Notably, with refinement on just 1% of training data, it rivals models fine-tuned on the full dataset. For instance, on LADDER, its Micro F1 of 71.96% approaches the fine-tuned baseline, and on the complex CTINexus, its Macro F1 exceeds the fine-tuned ACLM model by 10.91%.