2026-01-16


Detecting Winning Arguments with Large Language Models and Persuasion Strategies

Authors: Tiziano Labruna, Arkadiusz Modzelewski, Giorgio Satta, Giovanni Da San Martino

2026-01-15

http://arxiv.org/abs/2601.10660v1

Detecting persuasion in argumentative text is a challenging task with important implications for understanding human communication. This work investigates the role of persuasion strategies - such as Attack on reputation, Distraction, and Manipulative wording - in determining the persuasiveness of a text. We conduct experiments on three annotated argument datasets: Winning Arguments (built from the Change My View subreddit), Anthropic/Persuasion, and Persuasion for Good. Our approach leverages large language models (LLMs) with a Multi-Strategy Persuasion Scoring approach that guides reasoning over six persuasion strategies. Results show that strategy-guided reasoning improves the prediction of persuasiveness. To better understand the influence of content, we organize the Winning Arguments dataset into broad discussion topics and analyze performance across them. We publicly release this topic-annotated version of the dataset to facilitate future research. Overall, our methodology demonstrates the value of structured, strategy-aware prompting for enhancing interpretability and robustness in argument quality assessment.

PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution

Authors: Minghao Yan, Bo Peng, Benjamin Coleman, Ziqi Chen, Zhouhang Xie, Zhankui He, Noveen Sachdeva, Isabella Ye, Weili Wang, Chi Wang, Ed H. Chi, Wang-Cheng Kang, Derek Zhiyuan Cheng, Beidou Wang

2026-01-15

http://arxiv.org/abs/2601.10657v1

Large Language Models (LLMs) have emerged as powerful operators for evolutionary search, yet the design of efficient search scaffolds remains ad hoc. While promising, current LLM-in-the-loop systems lack a systematic approach to managing the evolutionary process. We identify three distinct failure modes: Context Pollution, where experiment history biases future candidate generation; Mode Collapse, where agents stagnate in local minima due to poor exploration-exploitation balance; and Weak Collaboration, where rigid crossover strategies fail to leverage parallel search trajectories effectively. To address these challenges, we introduce Progress-Aware Consistent Evolution (PACEvolve), a framework designed to robustly govern the agent's context and search dynamics. PACEvolve combines hierarchical context management (HCM) to address context pollution; momentum-based backtracking (MBB) to escape local minima; and a self-adaptive sampling policy that unifies backtracking and crossover for dynamic search coordination (CE), allowing agents to balance internal refinement with cross-trajectory collaboration. We demonstrate that PACEvolve provides a systematic path to consistent, long-horizon self-improvement, achieving state-of-the-art results on LLM-SR and KernelBench, while discovering solutions surpassing the record on Modded NanoGPT.

Supergravity with Lagrange Multiplier Fields in 2 + 1 Dimensions

Authors: D. G. C. McKeon, F. T. Brandt, J. Frenkel, S. Martins-Filho

2026-01-15

http://arxiv.org/abs/2601.10593v1

We examine the first-order Einstein-Cartan (EC) action in 2+1 dimensions, including a cosmological term and its supersymmetric extension. In this setting the spin connection can be expressed as an axial vector, yielding an action that is bilinear in the quantum fields and allows quantization without background fields. We identify the complete set of first-class constraints and derive the associated gauge transformations, which differ from the standard diffeomorphism and local Lorentz invariances. Using the closed gauge algebra, we construct the Faddeev-Popov-Nielsen path integral and show how a Lagrange multiplier field can be introduced to remove higher-loop contributions while preserving unitarity and gauge invariance.

Defending Large Language Models Against Jailbreak Attacks via In-Decoding Safety-Awareness Probing

Authors: Yinzhi Zhao, Ming Wang, Shi Feng, Xiaocui Yang, Daling Wang, Yifei Zhang

2026-01-15

http://arxiv.org/abs/2601.10543v1

Large language models (LLMs) have achieved impressive performance across natural language tasks and are increasingly deployed in real-world applications. Despite extensive safety alignment efforts, recent studies show that such alignment is often shallow and remains vulnerable to jailbreak attacks. Existing defense mechanisms, including decoding-based constraints and post-hoc content detectors, struggle against sophisticated jailbreaks, often failing to achieve robust detection or excessively degrading model utility. In this work, we examine the decoding process of LLMs and make a key observation: even when successfully jailbroken, models internally exhibit latent safety-related signals during generation. However, these signals are overridden by the model's drive for fluent continuation, preventing timely self-correction or refusal. Building on this observation, we propose a simple yet effective approach that explicitly surfaces and leverages these latent safety signals for early detection of unsafe content during decoding. Experiments across diverse jailbreak attacks demonstrate that our approach significantly enhances safety, while maintaining low over-refusal rates on benign inputs and preserving response quality. Our results suggest that activating intrinsic safety-awareness during decoding offers a promising and complementary direction for defending against jailbreak attacks. Code is available at: https://github.com/zyz13590/SafeProbing.
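The probing idea described above fits in a very small sketch: a linear probe scores each decoding step's hidden state, and generation is halted once the latent safety signal crosses a threshold. The probe weights, dimensions, threshold, and synthetic hidden states below are illustrative stand-ins, not the paper's trained probe.

```python
import numpy as np

rng = np.random.default_rng(0)

def safety_score(hidden_state, w, b):
    """Logistic probe: probability that the current decoding step is unsafe."""
    z = hidden_state @ w + b
    return 1.0 / (1.0 + np.exp(-z))

def decode_with_probe(hidden_states, w, b, threshold=0.8):
    """Scan per-step hidden states; return the step at which the probe fires."""
    for step, h in enumerate(hidden_states):
        if safety_score(h, w, b) > threshold:
            return step  # the model would refuse / self-correct here
    return None  # no unsafe signal surfaced

d = 16
w = rng.normal(size=d)          # hypothetical trained probe direction
b = 0.0
states = 0.01 * rng.normal(size=(10, d))  # benign steps: weak signal
states[7] = 10.0 * w                      # one step strongly aligned with the probe
```

In a real deployment the probe would be trained on labeled safe/unsafe activations and evaluated against over-refusal on benign inputs, as the abstract emphasizes.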

Communication-Efficient Federated Learning by Exploiting Spatio-Temporal Correlations of Gradients

Authors: Shenlong Zheng, Zhen Zhang, Yuhui Deng, Geyong Min, Lin Cui

2026-01-15

http://arxiv.org/abs/2601.10491v1

Communication overhead is a critical challenge in federated learning, particularly in bandwidth-constrained networks. Although many methods have been proposed to reduce communication overhead, most focus solely on compressing individual gradients, overlooking the temporal correlations among them. Prior studies have shown that gradients exhibit spatial correlations, typically reflected in low-rank structures. Through empirical analysis, we further observe a strong temporal correlation between client gradients across adjacent rounds. Based on these observations, we propose GradESTC, a gradient compression technique that exploits both spatial and temporal gradient correlations. GradESTC exploits spatial correlations to decompose each full gradient into a compact set of basis vectors and corresponding combination coefficients. By exploiting temporal correlations, only a small portion of the basis vectors need to be dynamically updated in each round. GradESTC significantly reduces communication overhead by transmitting lightweight combination coefficients and a limited number of updated basis vectors instead of the full gradients. Extensive experiments show that, upon reaching a target accuracy level near convergence, GradESTC reduces uplink communication by an average of 39.79% compared to the strongest baseline, while maintaining comparable convergence speed and final accuracy to uncompressed FedAvg. By effectively leveraging spatio-temporal gradient structures, GradESTC offers a practical and scalable solution for communication-efficient federated learning.
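The decomposition described above can be sketched with a rank-r SVD: spatially, a gradient matrix factors into basis vectors and combination coefficients; temporally, only a few basis columns would be refreshed per round. This is an illustrative sketch under arbitrary shapes and refresh count, not the authors' exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def decompose(grad, rank):
    """Spatial step: grad ~= basis @ coeffs via a rank-r truncated SVD."""
    U, s, Vt = np.linalg.svd(grad, full_matrices=False)
    basis = U[:, :rank]                  # basis vectors (d x r)
    coeffs = s[:rank, None] * Vt[:rank]  # combination coefficients (r x n)
    return basis, coeffs

# Synthetic near-low-rank gradient, mimicking the spatial correlation claim.
d, n, r = 64, 32, 4
grad = rng.normal(size=(d, r)) @ rng.normal(size=(r, n)) + 0.01 * rng.normal(size=(d, n))

basis, coeffs = decompose(grad, rank=r)
rel_err = np.linalg.norm(grad - basis @ coeffs) / np.linalg.norm(grad)

# Temporal step: if the basis drifts slowly across rounds, upload only k
# refreshed basis columns plus the coefficients, not the full d x n gradient.
k = 1
uplink_full = d * n
uplink_estc = d * k + r * n
```

The uplink comparison is the crux: coefficients cost r*n values and a refreshed column costs d, both far below the d*n of an uncompressed gradient.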

Energy-Efficient Probabilistic Semantic Communication Over Visible Light Networks With Rate Splitting

Authors: Zhouxiang Zhao, Zhaohui Yang, Mingzhe Chen, Chen Zhu, Xin Tong, Zhaoyang Zhang

2026-01-15

http://arxiv.org/abs/2601.10452v1

Visible light communication (VLC) is emerging as a key technology for future wireless communication systems due to its unique physical-layer advantages over traditional radio-frequency (RF)-based systems. However, its integration with higher-layer techniques, such as semantic communication, remains underexplored. This paper investigates the energy efficiency maximization problem in a resource-constrained VLC-based probabilistic semantic communication (PSCom) system. In the considered model, light-emitting diode (LED) transmitters perform semantic compression to reduce data size, which incurs additional computation overhead. The compressed semantic information is transmitted to the users for semantic inference using a shared knowledge base that requires periodic updates to ensure synchronization. In the PSCom system, the knowledge base is represented by probabilistic graphs. To enable simultaneous transmission of both knowledge and information data, rate splitting multiple access (RSMA) is employed. The optimization problem focuses on maximizing energy efficiency by jointly optimizing transmit beamforming, direct current (DC) bias, common rate allocation, and the semantic compression ratio, while accounting for both communication and computation costs. To solve this problem, an alternating optimization algorithm based on successive convex approximation (SCA) and the Dinkelbach method is developed. Simulation results demonstrate the effectiveness of the proposed approach.

Placement Delivery Array for Cache-Aided MIMO Systems

Authors: Yifei Huang, Kai Wan, Minquan Cheng, Jinyan Wang, Giuseppe Caire

2026-01-15

http://arxiv.org/abs/2601.10422v1

We consider a cache-aided multiple-input multiple-output (MIMO) network, where a multi-antenna server with a library of equal-size files communicates with multiple users, each equipped with multiple antennas and a cache of limited size, over a wireless interference channel. Each user requests an arbitrary file from the library. The goal is to design coded caching schemes that simultaneously achieve the maximum sum degrees of freedom (sum-DoF) and low subpacketization. In this paper, we first introduce a unified combinatorial structure, termed the MIMO placement delivery array (MIMO-PDA), which characterizes uncoded placement and one-shot zero-forcing delivery. By analyzing the combinatorial properties of MIMO-PDAs, we derive a sum-DoF upper bound that coincides with the optimal DoF characterization in prior work by Tehrani et al. Based on this upper bound, we present two novel constructions of MIMO-PDAs that achieve the maximum sum-DoF. The first construction achieves linear subpacketization under stringent parameter constraints, while the second achieves ordered exponential subpacketization under substantially milder constraints. Theoretical analysis and numerical comparisons demonstrate that the second construction exponentially reduces subpacketization compared to existing schemes while preserving the maximum sum-DoF.

An effective interactive brain cytoarchitectonic parcellation framework using pretrained foundation model

Authors: Shiqi Zhang, Fang Xu, Pengcheng Zhou

2026-01-15

http://arxiv.org/abs/2601.10412v1

Cytoarchitectonic mapping provides anatomically grounded parcellations of brain structure and forms a foundation for integrative, multi-modal neuroscience analyses. These parcellations are defined based on the shape, density, and spatial arrangement of neuronal cell bodies observed in histological imaging. Recent works have demonstrated the potential of using deep learning models toward fully automatic segmentation of cytoarchitectonic areas in large-scale datasets, but performance is mainly constrained by the scarcity of training labels and the variability of staining and imaging conditions. To address these challenges, we propose an interactive cytoarchitectonic parcellation framework that leverages the strong transferability of the DINOv3 vision foundation model. Our framework combines (i) multi-layer DINOv3 feature fusion, (ii) a lightweight segmentation decoder, and (iii) real-time user-guided training from sparse scribbles. This design enables rapid human-in-the-loop refinement while maintaining high segmentation accuracy. Compared with training an nnU-Net from scratch, transfer learning with DINOv3 yields markedly improved performance. We also show that features extracted by DINOv3 exhibit clear anatomical correspondence and demonstrate the method's practical utility for brain region segmentation using sparse labels. These results highlight the potential of foundation-model-driven interactive segmentation for scalable and efficient cytoarchitectonic mapping.

TF3-RO-50M: Training Compact Romanian Language Models from Scratch on Synthetic Moral Microfiction

Authors: Mihai Dan Nadas, Laura Diosan, Andreea Tomescu, Andrei Piscoran

2026-01-15

http://arxiv.org/abs/2601.10410v1

Recent advances in synthetic data generation have shown that compact language models can be trained effectively when the underlying corpus is structurally controlled and linguistically coherent. However, for morphologically rich and computationally under-resourced languages such as Romanian, there is still no openly documented, end-to-end pipeline that unifies tokenizer design, preprocessing, pretraining, fine-tuning, evaluation, and large-scale synthetic data generation in a reproducible framework. Building on TF1, a three-million-story English fable dataset, and TF2, which extends TF1 through high-quality Romanian translations, we introduce TF3-RO, a Romanian-centric language modeling pipeline spanning tokenizer training, from-scratch model development, and Romanian-native dataset generation. TF3-RO constructs Romanian-specific BPE and Unigram tokenizers from a linguistically informed corpus to mitigate token inflation induced by Romanian morphology. Using long-sequence packed training, we pretrain a 51.65M-parameter LLaMA-style Transformer entirely from scratch. The model is subsequently optimized through fine-tuning, structured pruning, and logit-based knowledge distillation, yielding a compact 26.45M-parameter student model with tied embeddings and strong deployment characteristics. Using this distilled model, TF3-RO generates three million Romanian-native synthetic fables via a controlled combinatorial prompting framework. Across all stages, the pipeline integrates a comprehensive evaluation suite combining intrinsic metrics, Romanian agreement probes, entity coherence, rule-based grammar checking, and LLM-based assessment. TF3-RO provides a reproducible and linguistically grounded framework for training compact Romanian language models and producing large-scale synthetic narrative corpora.

Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering

Authors: Xinyu Zhu, Yuzhu Cai, Zexi Liu, Bingyang Zheng, Cheng Wang, Rui Ye, Jiaao Chen, Hanrui Wang, Wei-Chen Wang, Yuzhi Zhang, Linfeng Zhang, Weinan E, Di Jin, Siheng Chen

2026-01-15

http://arxiv.org/abs/2601.10402v1

The advancement of artificial intelligence toward agentic science is currently bottlenecked by the challenge of ultra-long-horizon autonomy: the ability to sustain strategic coherence and iterative correction over experimental cycles spanning days or weeks. While Large Language Models (LLMs) have demonstrated prowess in short-horizon reasoning, they are easily overwhelmed by execution details in the high-dimensional, delayed-feedback environments of real-world research, failing to consolidate experimental feedback into coherent long-term guidance. Here, we present ML-Master 2.0, an autonomous agent that masters ultra-long-horizon machine learning engineering (MLE), a representative microcosm of scientific discovery. By reframing context management as a process of cognitive accumulation, our approach introduces Hierarchical Cognitive Caching (HCC), a multi-tiered architecture inspired by computer systems that enables the structural differentiation of experience over time. By dynamically distilling transient execution traces into stable knowledge and cross-task wisdom, HCC allows agents to decouple immediate execution from long-term experimental strategy, effectively overcoming the scaling limits of static context windows. In evaluations on OpenAI's MLE-Bench under 24-hour budgets, ML-Master 2.0 achieves a state-of-the-art medal rate of 56.44%. Our findings demonstrate that ultra-long-horizon autonomy provides a scalable blueprint for AI capable of autonomous exploration beyond human-precedent complexities.
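The tiered distillation idea behind HCC can be caricatured in a few lines: transient execution traces are promoted into a stable "knowledge" tier once they recur, so the working context stays small. The tier names, promotion rule, and example findings below are hypothetical, not the paper's mechanism.

```python
from collections import Counter

class CognitiveCache:
    """Toy two-tier memory: raw traces below, distilled knowledge above."""

    def __init__(self, promote_after=3):
        self.traces = Counter()   # transient tier: raw per-experiment observations
        self.knowledge = set()    # stable tier: findings that keep recurring
        self.promote_after = promote_after

    def record(self, finding):
        self.traces[finding] += 1
        if self.traces[finding] >= self.promote_after:
            self.knowledge.add(finding)  # distill into long-term guidance

    def context(self, recent_k=2):
        """Working context = all stable knowledge + only the top recent traces."""
        recent = [f for f, _ in self.traces.most_common(recent_k)]
        return sorted(self.knowledge), recent

cache = CognitiveCache()
for obs in ["lr too high", "lr too high", "oom at bs=64", "lr too high"]:
    cache.record(obs)
```

The point is the decoupling: the agent's prompt need not carry every raw trace, only the distilled tier plus a short recency window.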

LatentRefusal: Latent-Signal Refusal for Unanswerable Text-to-SQL Queries

Authors: Xuancheng Ren, Shijing Hu, Zhihui Lu, Jiangqi Huang, Qiang Duan

2026-01-15

http://arxiv.org/abs/2601.10398v1

In LLM-based text-to-SQL systems, unanswerable and underspecified user queries may generate not only incorrect text but also executable programs that yield misleading results or violate safety constraints, posing a major barrier to safe deployment. Existing refusal strategies for such queries either rely on output-level instruction following, which is brittle due to model hallucinations, or estimate output uncertainty, which adds complexity and overhead. To address this challenge, we formalize safe refusal in text-to-SQL systems as an answerability-gating problem and propose LatentRefusal, a latent-signal refusal mechanism that predicts query answerability from the intermediate hidden activations of a large language model. We introduce the Tri-Residual Gated Encoder, a lightweight probing architecture, to suppress schema noise and amplify subtle, localized cues of question-schema mismatch that indicate unanswerability. Extensive empirical evaluations across diverse ambiguous and unanswerable settings, together with ablation studies and interpretability analyses, demonstrate the effectiveness of the proposed approach and show that LatentRefusal provides an attachable and efficient safety layer for text-to-SQL systems. Across four benchmarks, LatentRefusal improves average F1 to 88.5 percent on both backbones while adding approximately 2 milliseconds of probe overhead.

Online identification of nonlinear time-varying systems with uncertain information

Authors: He Ren, Gaowei Yan, Hang Liu, Lifeng Cao, Zhijun Zhao, Gang Dang

2026-01-15

http://arxiv.org/abs/2601.10379v1

Digital twins (DTs), serving as core enablers for real-time monitoring and predictive maintenance of complex cyber-physical systems, impose critical requirements on their virtual models: high predictive accuracy, strong interpretability, and online adaptive capability. However, existing techniques struggle to meet these demands simultaneously: Bayesian methods excel in uncertainty quantification but lack model interpretability, while interpretable symbolic identification methods (e.g., SINDy) are constrained by their offline, batch-processing nature, which makes real-time updates challenging. To bridge this semantic and computational gap, this paper proposes a novel Bayesian Regression-based Symbolic Learning (BRSL) framework. The framework formulates online symbolic discovery as a unified probabilistic state-space model. By incorporating sparsity-promoting horseshoe priors, model selection is transformed into a Bayesian inference task, enabling simultaneous system identification and uncertainty quantification. Furthermore, we derive an online recursive algorithm with a forgetting factor and establish precise recursive conditions that guarantee the well-posedness of the posterior distribution. These conditions also function as real-time monitors of data utility, enhancing algorithmic robustness. Additionally, a rigorous convergence analysis is provided, demonstrating the convergence of parameter estimates under persistent excitation conditions. Case studies validate the effectiveness of the proposed framework in achieving interpretable, probabilistic prediction and online learning.
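The "online recursive algorithm with a forgetting factor" can be illustrated with plain recursive least squares, which is equivalent to recursive Bayesian linear regression under a Gaussian prior; the horseshoe prior and the symbolic function library from the abstract are omitted in this sketch, and the system below is a made-up two-parameter example.

```python
import numpy as np

def rls_update(theta, P, x, y, lam=0.98):
    """One recursive Bayesian update; lam < 1 discounts old data,
    letting the estimate track time-varying systems."""
    Px = P @ x
    gain = Px / (lam + x @ Px)
    theta = theta + gain * (y - x @ theta)   # innovation-driven correction
    P = (P - np.outer(gain, Px)) / lam       # posterior covariance update
    return theta, P

rng = np.random.default_rng(2)
true_theta = np.array([1.5, -0.7])  # hypothetical system coefficients
theta = np.zeros(2)
P = np.eye(2) * 100.0               # broad prior covariance

for _ in range(500):                # stream of (regressor, measurement) pairs
    x = rng.normal(size=2)
    y = x @ true_theta + 0.01 * rng.normal()
    theta, P = rls_update(theta, P, x, y)
```

The random regressors here also satisfy the persistent-excitation condition the abstract mentions, which is what makes the estimate converge.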

Global Context Compression with Interleaved Vision-Text Transformation

Authors: Dian Jiao, Jiaxin Duan, Shuai Zhao, Jiabing Leng, Yiran Zhang, Feng Huang

2026-01-15

http://arxiv.org/abs/2601.10378v1

Recent achievements of vision-language models in end-to-end OCR point to a new avenue for low-loss compression of textual information. This motivates earlier works that render the Transformer's input into images for encoding, which effectively reduces the number of tokens through visual encoding, thereby alleviating the quadratically increasing Attention computations. However, this partial compression fails to save computational or memory costs at token-by-token inference. In this paper, we investigate global context compression, which saves tokens at both the encoding and inference stages. Consequently, we propose VIST2, a novel Transformer that interleaves input text chunks alongside their visual encoding, while depending exclusively on visual tokens in the pre-context to predict the next text token distribution. Around this idea, we render text chunks into sketch images and train VIST2 in multiple stages, starting from curriculum-scheduled pretraining for optical language modeling, followed by modal-interleaved instruction tuning. We conduct extensive experiments using VIST2 families scaled from 0.6B to 8B to explore the training recipe and hyperparameters. With a 4x compression ratio, the resulting models demonstrate significant superiority over baselines on long writing tasks, achieving, on average, a 3x speedup in first-token generation, 77% reduction in memory usage, and 74% reduction in FLOPS. Our codes and datasets will be made public to support further studies.

Towards Efficient Low-rate Image Compression with Frequency-aware Diffusion Prior Refinement

Authors: Yichong Xia, Yimin Zhou, Jinpeng Wang, Bin Chen

2026-01-15

http://arxiv.org/abs/2601.10373v1

Recent advancements in diffusion-based generative priors have enabled visually plausible image compression at extremely low bit rates. However, existing approaches suffer from slow sampling processes and suboptimal bit allocation due to fragmented training paradigms. In this work, we propose Accelerated Diffusion-based Image Compression via Consistency Prior Refinement (DiffCR), a novel compression framework for efficient and high-fidelity image reconstruction. At the heart of DiffCR is a Frequency-aware Skip Estimation (FaSE) module that refines the epsilon-prediction prior from a pre-trained latent diffusion model and aligns it with compressed latents at different timesteps via Frequency Decoupling Attention (FDA). Furthermore, a lightweight consistency estimator enables fast two-step decoding by preserving the semantic trajectory of diffusion sampling. Without updating the backbone diffusion model, DiffCR achieves substantial bitrate savings (27.2% BD-rate in LPIPS and 65.1% BD-rate in PSNR) and a significant speed-up compared to SOTA diffusion-based compression baselines.

Joint Bayesian inference of Earth's magnetic field and core surface flow on millennial timescales

Authors: Andreas Nilsson, Neil Suttie, Marie Troyano, Nicolas Gillet, Julien Aubert, Anders Irbäck

2026-01-15

http://arxiv.org/abs/2601.10344v1

Understanding Earth's core dynamics over millennial timescales requires models that jointly describe the evolution of the geomagnetic field and core surface flow, while accommodating the sparse, irregular, and uncertain nature of archaeomagnetic and palaeomagnetic data. We present a new Bayesian core field and core flow modelling framework that utilises archaeo/palaeomagnetic data directly, combining a reduced stochastic representation of core surface dynamics derived from numerical geodynamo statistics with a probabilistic treatment of observational and chronological uncertainties. A key innovation is an efficient discrete marginalisation of age uncertainties, which avoids the convergence difficulties associated with co-estimating ages in high-dimensional Hamiltonian Monte Carlo inversions. The framework aims to reconstruct the coupled evolution of the geomagnetic field and core surface flow over the past 9000 years while preserving dynamical correlations implied by the prior geodynamo time series. Tests using synthetic data generated from an Earth-like geodynamo demonstrate that the method reliably recovers large-scale geomagnetic field variations and key aspects of core dynamics, including long-term westward drift and the evolution of planetary-scale eccentric gyres. These results show that, when combined with physically informed priors, archaeo/palaeomagnetic data can constrain millennial-scale core flow, paving the way for reconstructions based on real data.

Evidence-Augmented Policy Optimization with Reward Co-Evolution for Long-Context Reasoning

Authors: Xin Guan, Zijian Li, Shen Huang, Pengjun Xie, Jingren Zhou, Jiuxin Cao

2026-01-15

http://arxiv.org/abs/2601.10306v1

While Reinforcement Learning (RL) has advanced LLM reasoning, applying it to long-context scenarios is hindered by the sparsity of outcome rewards. This limitation fails to penalize ungrounded "lucky guesses," leaving the critical process of needle-in-a-haystack evidence retrieval largely unsupervised. To address this, we propose EAPO (Evidence-Augmented Policy Optimization). We first establish the Evidence-Augmented Reasoning paradigm, validating via Tree-Structured Evidence Sampling that precise evidence extraction is the decisive bottleneck for long-context reasoning. Guided by this insight, EAPO introduces a specialized RL algorithm in which a reward model computes a Group-Relative Evidence Reward, providing dense process supervision to explicitly improve evidence quality. To sustain accurate supervision throughout training, we further incorporate an Adaptive Reward-Policy Co-Evolution mechanism. This mechanism iteratively refines the reward model using outcome-consistent rollouts, sharpening its discriminative capability to ensure precise process guidance. Comprehensive evaluations across eight benchmarks demonstrate that EAPO significantly enhances long-context reasoning performance compared to SOTA baselines.

In-Context Source and Channel Coding

Authors: Ziqiong Wang, Tianqi Ren, Rongpeng Li, Zhifeng Zhao, Honggang Zhang

2026-01-15

http://arxiv.org/abs/2601.10267v1

Separate Source-Channel Coding (SSCC) remains attractive for text transmission due to its modularity and compatibility with mature entropy coders and powerful channel codes. However, SSCC often suffers from a pronounced cliff effect in low Signal-to-Noise Ratio (SNR) regimes, where residual bit errors after channel decoding can catastrophically break lossless source decoding, especially for Arithmetic Coding (AC) driven by Large Language Models (LLMs). This paper proposes a receiver-side In-Context Decoding (ICD) framework that enhances SSCC robustness without modifying the transmitter. ICD leverages an Error Correction Code Transformer (ECCT) to obtain bit-wise reliability for the decoded information bits. Based on the context-consistent bitstream, ICD constructs a confidence-ranked candidate pool via reliability-guided bit flipping, samples a compact yet diverse subset of candidates, and applies an LLM-based arithmetic decoder to obtain both reconstructions and sequence-level log-likelihoods. A reliability-likelihood fusion rule then selects the final output. We further provide theoretical guarantees on the stability and convergence of the proposed sampling procedure. Extensive experiments over Additive White Gaussian Noise (AWGN) and Rayleigh fading channels demonstrate consistent gains compared with conventional SSCC baselines and representative Joint Source-Channel Coding (JSCC) schemes.
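Reliability-guided bit flipping, the step that builds the candidate pool, is easy to sketch: flip the least-reliable bits first, alone and in small combinations. This is an illustration of the general idea, not the paper's exact ECCT pipeline; the bit values, reliabilities, and the cap of four flip positions are arbitrary.

```python
import itertools

def candidate_pool(bits, reliabilities, max_flips=2, focus=4):
    """Build a confidence-ranked candidate pool: the unmodified bitstring first,
    then candidates flipping 1..max_flips of the `focus` least-reliable bits."""
    order = sorted(range(len(bits)), key=lambda i: reliabilities[i])
    pool = [tuple(bits)]
    for r in range(1, max_flips + 1):
        for combo in itertools.combinations(order[:focus], r):
            cand = list(bits)
            for i in combo:
                cand[i] ^= 1  # flip a suspect bit
            pool.append(tuple(cand))
    return pool

bits = [1, 0, 1, 1, 0, 0]
rel = [0.9, 0.1, 0.8, 0.95, 0.2, 0.85]  # bits 1 and 4 are the least reliable
pool = candidate_pool(bits, rel)
```

Each candidate would then be re-decoded by the arithmetic decoder, and the fusion rule would weigh its channel reliability against the LLM's sequence log-likelihood.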

STEAMROLLER: A Multi-Agent System for Inclusive Automatic Speech Recognition for People who Stutter

Authors: Ziqi Xu, Yi Liu, Yuekang Li, Ling Shi, Kailong Wang, Yongxin Zhao

2026-01-15

http://arxiv.org/abs/2601.10223v1

People who stutter (PWS) face systemic exclusion in today's voice-driven society, where access to voice assistants, authentication systems, and remote work tools increasingly depends on fluent speech. Current automatic speech recognition (ASR) systems, trained predominantly on fluent speech, fail to serve millions of PWS worldwide. We present STEAMROLLER, a real-time system that transforms stuttered speech into fluent output through a novel multi-stage, multi-agent AI pipeline. Our approach addresses three critical technical challenges: (1) the difficulty of direct speech-to-speech conversion for disfluent input, (2) semantic distortions introduced during ASR transcription of stuttered speech, and (3) latency constraints for real-time communication. STEAMROLLER employs a three-stage architecture comprising ASR transcription, multi-agent text repair, and speech synthesis, where our core innovation lies in a collaborative multi-agent framework that iteratively refines transcripts while preserving semantic intent. Experiments on the FluencyBank dataset and a user study demonstrate clear word error rate (WER) reductions and strong user satisfaction. Beyond immediate accessibility benefits, fine-tuning ASR on STEAMROLLER-repaired speech yields additional WER improvements, creating a pathway toward inclusive AI ecosystems.

LOOKAT: Lookup-Optimized Key-Attention for Memory-Efficient Transformers

Authors: Aryan Karmore

2026-01-15

http://arxiv.org/abs/2601.10155v1

Compressing the key-value (KV) cache is a required step to deploy large language models on edge devices. Current quantization methods compress storage but fail to reduce bandwidth, as attention calculation requires dequantizing keys from INT4/INT8 to FP16 before use. We observe that attention scoring is mathematically equivalent to inner-product similarity search, so quantization techniques from vector databases can be applied to compress key vectors better. We propose LOOKAT, which applies product quantization and asymmetric distance computation to the attention mechanism by decomposing key vectors into subspaces, learning codebooks, and computing attention scores via lookup tables. This transforms attention from memory-bound to compute-bound. LOOKAT achieves 64x compression at 95.7% output fidelity and 32x compression at 95.0% fidelity when tested on GPT-2, and requires no architecture changes or training while maintaining high rank correlation. Theoretical analysis characterizes how the rank correlation degrades, with guarantees validated across sequence lengths up to 1024 tokens.
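Product quantization with asymmetric distance computation (ADC) is compact enough to sketch end to end: split each key vector into subspaces, store one codeword index per subspace, and score a query by summing table lookups instead of full dot products. The toy dimensions below are arbitrary, and the "codebooks" are just reused key sub-vectors standing in for learned k-means centroids.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_keys, n_sub, n_codes = 8, 32, 2, 16
sub = d // n_sub

keys = rng.normal(size=(n_keys, d))
# Stand-in codebooks: the first 16 keys' sub-vectors serve as centroids.
codebooks = np.stack([keys[:n_codes, s*sub:(s+1)*sub] for s in range(n_sub)])

# Encode every key as one codeword index per subspace (nearest centroid).
codes = np.stack([
    np.argmin(((keys[:, s*sub:(s+1)*sub][:, None] - codebooks[s][None])**2).sum(-1), axis=1)
    for s in range(n_sub)
], axis=1)

def pq_scores(query):
    """ADC: precompute query-centroid dot products once per subspace,
    then score every key by table lookup (no dequantization of keys)."""
    tables = [codebooks[s] @ query[s*sub:(s+1)*sub] for s in range(n_sub)]
    return sum(tables[s][codes[:, s]] for s in range(n_sub))

q = rng.normal(size=d)
approx = pq_scores(q)   # PQ/ADC attention scores
exact = keys @ q        # exact inner-product scores
```

The bandwidth saving comes from storing only `codes` (two small integers per key here) instead of the FP16 key vectors; the lookup tables are rebuilt per query.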

TopoDIM: One-shot Topology Generation of Diverse Interaction Modes for Multi-Agent Systems

Authors: Rui Sun, Jie Ding, Chenghua Gong, Tianjun Gu, Yihang Jiang, Juyuan Zhang, Liming Pan, Linyuan Lü

2026-01-15

http://arxiv.org/abs/2601.10120v1

Optimizing communication topology in LLM-based multi-agent systems is critical for enabling collective intelligence. Existing methods mainly rely on spatio-temporal interaction paradigms, where the sequential execution of multi-round dialogues incurs high latency and computation. Motivated by recent insights that evaluation and debate mechanisms can improve problem-solving in multi-agent systems, we propose TopoDIM, a framework for one-shot Topology generation with Diverse Interaction Modes. Designed for decentralized execution to enhance adaptability and privacy, TopoDIM enables agents to autonomously construct heterogeneous communication topologies without iterative coordination, achieving token efficiency and improved task performance. Experiments demonstrate that TopoDIM reduces total token consumption by 46.41% while improving average performance by 1.50% over state-of-the-art methods. Moreover, the framework exhibits strong adaptability in organizing communication among heterogeneous agents. Code is available at: https://anonymous.4open.science/r/TopoDIM-8D35/

Sparse-RL: Breaking the Memory Wall in LLM Reinforcement Learning via Stable Sparse Rollouts

Authors: Sijia Luo, Xiaokang Zhang, Yuxuan Hu, Bohan Zhang, Ke Wang, Jinbo Su, Mengshu Sun, Lei Liang, Jing Zhang

2026-01-15

http://arxiv.org/abs/2601.10079v1

Reinforcement Learning (RL) has become essential for eliciting complex reasoning capabilities in Large Language Models (LLMs). However, the substantial memory overhead of storing Key-Value (KV) caches during long-horizon rollouts acts as a critical bottleneck, often prohibiting efficient training on limited hardware. While existing KV cache sparsification techniques offer a remedy for inference, directly applying them to RL training induces a severe policy mismatch, leading to catastrophic performance collapse. To address this, we introduce Sparse-RL, which enables stable RL training under sparse rollouts. We show that the instability arises from a fundamental policy mismatch among the dense old policy, the sparse sampler policy, and the learner policy. To mitigate this issue, Sparse-RL incorporates Sparsity-Aware Rejection Sampling and Importance-based Reweighting to correct the off-policy bias introduced by sparsity-induced information loss. Experimental results show that Sparse-RL reduces rollout overhead compared to dense baselines while preserving performance. Furthermore, Sparse-RL inherently implements sparsity-aware training, significantly enhancing model robustness during sparse inference deployment.
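Importance-based reweighting, the second correction mechanism named above, has a standard one-line core: weight each sample's advantage by the ratio of the learner policy's likelihood to the sampler policy's, clipped for stability. The sketch below is the generic off-policy correction, not the authors' exact loss; the clip bound and example numbers are arbitrary.

```python
import numpy as np

def reweighted_objective(logp_learner, logp_sampler, advantages, clip=5.0):
    """Importance-weighted policy objective: correct for rollouts generated
    by a (sparse) sampler policy that differs from the learner policy."""
    w = np.exp(logp_learner - logp_sampler)  # per-token importance ratio
    w = np.clip(w, 1.0 / clip, clip)         # guard against exploding ratios
    return float((w * advantages).mean())

adv = np.array([1.0, -0.5, 2.0])
logp = np.log(np.array([0.3, 0.2, 0.5]))
same = reweighted_objective(logp, logp, adv)  # on-policy: every ratio is 1
```

When sampler and learner agree, the objective reduces to the plain advantage mean; when sparsity distorts the sampler's likelihoods, the ratios (up to the clip) rescale each sample's contribution accordingly.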

Privacy Enhanced PEFT Tensor Train Decomposition Improves Privacy Utility Tradeoffs under DP-SGD

Authors: Pradip Kunwar, Minh Vu, Maanak Gupta, Manish Bhattarai

2026-01-15

http://arxiv.org/abs/2601.10045v1

Fine-tuning large language models on sensitive data poses significant privacy risks, as membership inference attacks can reveal whether individual records were used during training. While Differential Privacy (DP) provides formal protection, applying DP to conventional Parameter-Efficient Fine-Tuning (PEFT) methods such as Low-Rank Adaptation (LoRA) often incurs substantial utility loss. In this work, we show that a more structurally constrained PEFT architecture, Tensor Train Low-Rank Adaptation (TTLoRA), can improve the privacy-utility tradeoff by shrinking the effective parameter space while preserving expressivity. To this end, we develop TTLoRA-DP, a differentially private training framework for TTLoRA. Specifically, we extend the ghost clipping algorithm to Tensor Train cores via cached contraction states, enabling efficient Differentially Private Stochastic Gradient Descent (DP-SGD) with exact per-example gradient norm computation without materializing full per-example gradients. Experiments on GPT-2 fine-tuning over the Enron and Penn Treebank datasets show that TTLoRA-DP consistently strengthens privacy protection relative to LoRA-DP while maintaining comparable or better downstream utility. Moreover, TTLoRA exhibits lower membership leakage even without DP training, using substantially smaller adapters and requiring on average 7.6x fewer parameters than LoRA. Overall, our results demonstrate that TTLoRA offers a practical path to improving the privacy-utility tradeoff in parameter-efficient language model adaptation.
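
The DP-SGD core that TTLoRA-DP builds on can be sketched generically. This is plain per-example clipping with Gaussian noise, not the paper's ghost-clipping extension (which avoids materializing per-example gradients); all names are illustrative:

```python
import math
import random

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_mult=1.0,
                rng=random.Random(0)):
    """Generic DP-SGD aggregation: clip each per-example gradient to
    clip_norm in L2, sum, add Gaussian noise scaled by noise_mult,
    then average over the batch."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / max(norm, 1e-12))  # clip, never amplify
        for i, x in enumerate(g):
            total[i] += x * scale
    sigma = noise_mult * clip_norm
    return [(t + rng.gauss(0.0, sigma)) / len(per_example_grads) for t in total]
```

Ghost clipping computes the same per-example norms from intermediate activations instead, which is what the paper extends to Tensor Train cores.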

Towards Native Intelligence 6G-LLM Trained with Reinforcement Learning from NDT Feedback

Authors: Zhuoran Xiao, Tao Tao, Chenhui Ye, Yunbo Hu, Yijia Feng, Tianyu Jiao, Liyu Cai

2026-01-15

http://arxiv.org/abs/2601.09992v1

Owing to its comprehensive understanding of upper-layer application requirements and the capabilities of practical communication systems, the 6G-LLM (6G domain large language model) offers a promising pathway toward realizing network-native intelligence. Serving as the system orchestrator, the 6G-LLM drives a paradigm shift that fundamentally departs from existing rule-based approaches, which primarily rely on modular, experience-driven optimization. By contrast, the 6G-LLM substantially enhances network flexibility and adaptability. Nevertheless, current efforts to construct 6G-LLMs are constrained by their reliance on large-scale, meticulously curated, human-authored corpora, which are impractical to obtain in real-world scenarios. Moreover, purely offline-trained models lack the capacity for continual self-improvement, limiting their ability to adapt to the highly dynamic requirements of wireless communication environments. To overcome these limitations, we propose a novel training paradigm termed RLDTF (Reinforcement Learning from Digital Twin Feedback) for 6G-LLMs. This framework leverages network digital twins to generate reward signals based on orchestration outcomes, while employing reinforcement learning to dynamically guide the model toward optimal decision-making. Furthermore, we introduce a weighted token mechanism to improve output accuracy. Comprehensive experimental results demonstrate that our proposed framework significantly outperforms state-of-the-art baselines in orchestration accuracy and solution optimality.

Learning to Decode in Parallel Self-Coordinating Neural Network for Real-Time Quantum Error Correction

Authors: Kai Zhang, Zhengzhong Yi, Shaojun Guo, Linghang Kong, Situ Wang, Xiaoyu Zhan, Tan He, Weiping Lin, Tao Jiang, Dongxin Gao, Yiming Zhang, Fangming Liu, Fang Zhang, Zhengfeng Ji, Fusheng Chen, Jianxin Chen

2026-01-14

http://arxiv.org/abs/2601.09921v1

Fast, reliable decoders are pivotal components for enabling fault-tolerant quantum computation (FTQC). Neural network decoders like AlphaQubit have demonstrated potential, achieving higher accuracy than traditional human-designed decoding algorithms. However, existing implementations of neural network decoders lack the parallelism required to decode the syndrome stream generated by a superconducting logical qubit in real time. Moreover, integrating AlphaQubit with sliding-window-based parallel decoding schemes presents non-trivial challenges: AlphaQubit is trained solely to output a single bit corresponding to the global logical correction for an entire memory experiment, rather than local physical corrections that can be easily integrated. We address this issue by training a recurrent, transformer-based neural network specifically tailored for parallel window decoding. While it still outputs a single bit, we derive training labels from a consistent set of local corrections and train on various types of decoding windows simultaneously. This approach enables the network to self-coordinate across neighboring windows, facilitating high-accuracy parallel decoding of arbitrarily long memory experiments. As a result, we overcome the throughput bottleneck that previously precluded the use of AlphaQubit-type decoders in FTQC. Our work presents the first scalable, neural-network-based parallel decoding framework that simultaneously achieves SOTA accuracy and the stringent throughput required for real-time quantum error correction. Using an end-to-end experimental workflow, we benchmark our decoder on the Zuchongzhi 3.2 superconducting quantum processor on surface codes with distances up to 7, demonstrating its superior accuracy. Moreover, we demonstrate that, using our approach, a single TPU v6e is capable of decoding surface codes with distances up to 25 within 1 us per decoding round.

Advancing Model Refinement Muon-Optimized Distillation and Quantization for LLM Deployment

Authors: Jacob Sander, Brian Jalaian, Venkat R. Dasari

2026-01-14

http://arxiv.org/abs/2601.09865v1

Large Language Models (LLMs) enable advanced natural language processing but face deployment challenges on resource-constrained edge devices due to high computational, memory, and energy demands. Optimizing these models requires addressing three key challenges: acquiring task-specific data, fine-tuning for performance, and compressing models to accelerate inference while reducing resource demands. We propose an integrated framework combining GPTQ-based quantization, low-rank adaptation (LoRA), and a specialized data distillation process to significantly reduce model size and complexity while preserving or enhancing task-specific performance. By leveraging data distillation, knowledge distillation via Kullback-Leibler divergence, Bayesian hyperparameter optimization, and the Muon optimizer, our pipeline achieves up to 2x memory reduction (e.g., reducing a 6GB model to 3GB) and enables efficient inference for specialized tasks. Empirical results demonstrate superior performance on standard LLM benchmarks compared to GPTQ quantization alone, with the Muon optimizer notably enhancing fine-tuned models' resistance to accuracy decay during quantization.
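
The knowledge-distillation component via Kullback-Leibler divergence can be illustrated with the standard temperature-scaled loss; this is a generic sketch of that objective, not the paper's exact pipeline:

```python
import math

def softmax(logits, T=1.0):
    """Numerically stable temperature-scaled softmax."""
    m = max(logits)
    exps = [math.exp((x - m) / T) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_distill_loss(teacher_logits, student_logits, T=2.0):
    """Temperature-scaled KL(teacher || student) over one token's
    distribution, scaled by T^2 as in classic distillation."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return T * T * kl
```

The loss is zero when student and teacher agree and strictly positive otherwise, so minimizing it pulls the compressed student toward the teacher's soft targets.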

MedRedFlag Investigating how LLMs Redirect Misconceptions in Real-World Health Communication

Authors: Sraavya Sambara, Yuan Pu, Ayman Ali, Vishala Mishra, Lionel Wong, Monica Agrawal

2026-01-14

http://arxiv.org/abs/2601.09853v1

Real-world health questions from patients often unintentionally embed false assumptions or premises. In such cases, safe medical communication typically involves redirection: addressing the implicit misconception and then responding to the underlying patient context, rather than the original question. While large language models (LLMs) are increasingly being used by lay users for medical advice, they have not yet been tested for this crucial competency. Therefore, in this work, we investigate how LLMs react to false premises embedded within real-world health questions. We develop a semi-automated pipeline to curate MedRedFlag, a dataset of 1100+ questions sourced from Reddit that require redirection. We then systematically compare responses from state-of-the-art LLMs to those from clinicians. Our analysis reveals that LLMs often fail to redirect problematic questions, even when the problematic premise is detected, and provide answers that could lead to suboptimal medical decision making. Our benchmark and results reveal a novel and substantial gap in how LLMs perform under the conditions of real-world health communication, highlighting critical safety concerns for patient-facing medical AI systems. Code and dataset are available at https://github.com/srsambara-1/MedRedFlag.

LLM-Based Agentic Systems for Software Engineering Challenges and Opportunities

Authors: Yongjian Tang, Thomas Runkler

2026-01-14

http://arxiv.org/abs/2601.09822v1

Despite recent advancements in Large Language Models (LLMs), complex Software Engineering (SE) tasks require more collaborative and specialized approaches. This concept paper systematically reviews the emerging paradigm of LLM-based multi-agent systems, examining their applications across the Software Development Life Cycle (SDLC), from requirements engineering and code generation to static code checking, testing, and debugging. We delve into a wide range of topics such as language model selection, SE evaluation benchmarks, state-of-the-art agentic frameworks, and communication protocols. Furthermore, we identify key challenges and outline future research opportunities, with a focus on multi-agent orchestration, human-agent coordination, computational cost optimization, and effective data collection. This work aims to provide researchers and practitioners with valuable insights into the current landscape of agentic systems within the software engineering domain.

ShortCoder Knowledge-Augmented Syntax Optimization for Token-Efficient Code Generation

Authors: Sicong Liu, Yanxian Huang, Mingwei Liu, Jiachi Chen, Ensheng Shi, Yuchi Ma, Hongyu Zhang, Yin Zhang, Yanlin Wang

2026-01-14

http://arxiv.org/abs/2601.09703v1

Code generation tasks aim to automate the conversion of user requirements into executable code, significantly reducing manual development effort and enhancing software productivity. The emergence of large language models (LLMs) has significantly advanced code generation, though their efficiency is still limited by certain inherent architectural constraints. Each token generation necessitates a complete inference pass, requiring persistent retention of contextual information in memory and escalating resource consumption. While existing research prioritizes inference-phase optimizations such as prompt compression and model quantization, the generation phase remains underexplored. To tackle these challenges, we propose a knowledge-infused framework named ShortCoder, which optimizes code generation efficiency while preserving semantic equivalence and readability. In particular, we introduce: (1) ten syntax-level simplification rules for Python, derived from AST-preserving transformations, achieving 18.1% token reduction without functional compromise; (2) a hybrid data synthesis pipeline integrating rule-based rewriting with LLM-guided refinement, producing ShorterCodeBench, a corpus of validated tuples of original and simplified code with semantic consistency; (3) a fine-tuning strategy that injects conciseness awareness into the base LLMs. Extensive experimental results demonstrate that ShortCoder consistently outperforms state-of-the-art methods on HumanEval, achieving an improvement of 18.1%-37.8% in generation efficiency over previous methods while ensuring code generation performance.
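
A syntax-level, semantics-preserving rewrite of the kind described can be sketched with Python's ast module. This particular rule (range(0, n) -> range(n)) is an illustrative stand-in and not necessarily one of the paper's ten rules:

```python
import ast

class RangeSimplifier(ast.NodeTransformer):
    """Illustrative token-saving rewrite: range(0, n) -> range(n).
    The transformation is AST-based, so the output is guaranteed to
    parse and is semantically equivalent for the builtin range."""
    def visit_Call(self, node):
        self.generic_visit(node)
        if (isinstance(node.func, ast.Name) and node.func.id == "range"
                and len(node.args) == 2
                and isinstance(node.args[0], ast.Constant)
                and node.args[0].value == 0):
            node.args = node.args[1:]  # drop the redundant start=0
        return node

def simplify(src):
    """Parse, rewrite, and re-emit source (requires Python 3.9+ for
    ast.unparse)."""
    return ast.unparse(RangeSimplifier().visit(ast.parse(src)))
```

Because the rewrite operates on the AST rather than on text, it cannot touch string literals or shadowed names it does not match, which is the safety property such rules rely on.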

Empathy Applicability Modeling for General Health Queries

Authors: Shan Randhawa, Agha Ali Raza, Kentaro Toyama, Julie Hui, Mustafa Naseem

2026-01-14

http://arxiv.org/abs/2601.09696v1

LLMs are increasingly being integrated into clinical workflows, yet they often lack clinical empathy, an essential aspect of effective doctor-patient communication. Existing NLP frameworks focus on reactively labeling empathy in doctors' responses but offer limited support for anticipatory modeling of empathy needs, especially in general health queries. We introduce the Empathy Applicability Framework (EAF), a theory-driven approach that classifies patient queries in terms of the applicability of emotional reactions and interpretations, based on clinical, contextual, and linguistic cues. We release a benchmark of real patient queries, dual-annotated by humans and GPT-4o. In the subset with human consensus, we also observe substantial human-GPT alignment. To validate EAF, we train classifiers on human-labeled and GPT-only annotations to predict empathy applicability, achieving strong performance and outperforming heuristic and zero-shot LLM baselines. Error analysis highlights persistent challenges: implicit distress, clinical-severity ambiguity, and contextual hardship, underscoring the need for multi-annotator modeling, clinician-in-the-loop calibration, and culturally diverse annotation. EAF provides a framework for identifying empathy needs before response generation, establishes a benchmark for anticipatory empathy modeling, and enables support for empathetic communication in asynchronous healthcare.

LLMs can Compress LLMs Adaptive Pruning by Agents

Authors: Sai Varun Kodathala, Rakesh Vunnam

2026-01-14

http://arxiv.org/abs/2601.09694v1

As Large Language Models (LLMs) continue to scale, post-training pruning has emerged as a promising approach to reduce computational costs while preserving performance. Existing methods such as SparseGPT and Wanda achieve high sparsity through layer-wise weight reconstruction or activation-aware magnitude pruning, but rely on uniform or hand-crafted heuristics to determine per-layer pruning ratios. Moreover, recent work has shown that pruned LLMs suffer from severe factual knowledge degradation, with structured pruning methods experiencing near-total collapse in factual question-answering capabilities. We introduce agent-guided pruning, where a foundation model acts as an adaptive pruning agent to intelligently select which layers to prune at each iteration while preserving critical knowledge pathways. Our method constructs layer-wise sensitivity profiles by combining Wanda-inspired weight-activation metrics with gradient importance scores, normalized as z-scores for model-agnostic comparison. These statistics are processed by an LLM agent equipped with self-reflection capabilities, enabling it to learn from previous pruning outcomes and iteratively refine its strategy. A checkpoint rollback mechanism maintains model quality by reverting when perplexity degradation exceeds a threshold. We evaluate our approach on Qwen3 models (4B and 8B parameters) at approximately 45% sparsity, demonstrating substantial improvements over structured pruning baselines: 56% relative improvement in MMLU accuracy, 19x better factual knowledge retention on FreebaseQA, and 69% lower perplexity degradation. Notably, our framework requires no retraining, operates in a model-agnostic manner, and exhibits effective self-correction with only 2-4 rollbacks across 21-40 iterations, demonstrating that foundation models can effectively guide the compression of other foundation models.
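
The z-score-normalized sensitivity profile can be sketched as below; the blend weight alpha and the function names are hypothetical, chosen only to show how two metrics on different scales become comparable:

```python
import math

def zscores(xs):
    """Standardize a list of per-layer scores to zero mean, unit
    variance, making metrics comparable across models."""
    mu = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs)) or 1.0
    return [(x - mu) / sd for x in xs]

def sensitivity_profile(weight_act_scores, grad_scores, alpha=0.5):
    """Per-layer sensitivity: z-normalize a weight-activation metric
    and a gradient-importance metric, then blend them (alpha is an
    illustrative mixing weight)."""
    za, zg = zscores(weight_act_scores), zscores(grad_scores)
    return [alpha * a + (1 - alpha) * g for a, g in zip(za, zg)]
```

An agent consuming this profile would propose pruning the layers with the lowest blended sensitivity, then observe perplexity and roll back if degradation crosses the threshold.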

Parallaxes, Proper Motions, and Near-Infrared Photometry for 173 L and T Dwarfs From The US Naval Observatory Infrared Astrometry Program

Authors: Frederick J. Vrba, Adam C. Schneider, Jeffrey A. Munn, Arne A. Henden, Christain B. Luginbuhl, Conard C. Dahn, Harry H. Guetter, Blaise J. Canzian, Trudy M. Tilleman, Scott E. Dahm, Stephen J. Williams, Justice E. Bruursema, J. Davy Kirkpatrick, Adam J. Burgasser

2026-01-14

http://arxiv.org/abs/2601.09671v1

We present near-infrared parallax and proper motion astrometry for 74 L-dwarfs and 99 T-dwarfs, as single objects or in binary systems, obtained with the ASTROCAM astrometric imager on the USNO, Flagstaff Station 1.55-m telescope over two observing periods. For all 173 objects the median number of observational epochs was 62 with a median time frame of 5.25 years, resulting in median uncertainties of sigma(pi) = 1.51 mas, sigma(mu) = 1.02 mas yr^-1, and sigma(v_tan) = 1.01 km s^-1. Our observations provide the first parallax/proper motion results for 16 objects and the highest precision parallaxes/proper motions for an additional 116 objects. A serendipitous overlap of 40 objects with Gaia DR3 astrometry allows direct comparison and confirmation of our results, along with an investigation of the effects of resolved binarity on astrometric results. We also provide a uniform set of J-, H-, and K-band photometry in the UKIRT/MKO system, most of it from new observations. We use these results to examine special-interest populations included in this study, consisting of binaries, wide companions, young objects, subdwarfs, and brown dwarf spectral standards.

Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning

Authors: Zhiyuan Hu, Yunhai Hu, Juncheng Liu, Shuyue Stella Li, Yucheng Wang, Zhen Xu, See-Kiong Ng, Anh Tuan Luu, Xinxing Xu, Bryan Hooi, Cynthia Breazeal, Hae Won Park

2026-01-14

http://arxiv.org/abs/2601.09667v2

Multi-agent systems have evolved into practical LLM-driven collaborators for many applications, gaining robustness from diversity and cross-checking. However, multi-agent RL (MARL) training is resource-intensive and unstable: co-adapting teammates induce non-stationarity, and rewards are often sparse and high-variance. Therefore, we introduce Multi-Agent Test-Time Reinforcement Learning (MATTRL), a framework that injects structured textual experience into multi-agent deliberation at inference time. MATTRL forms a multi-expert team of specialists for multi-turn discussions, retrieves and integrates test-time experiences, and reaches consensus for final decision-making. We also study credit assignment for constructing a turn-level experience pool, then reinjecting it into the dialogue. Across challenging benchmarks in medicine, math, and education, MATTRL improves accuracy by an average of 3.67% over a multi-agent baseline, and by 8.67% over comparable single-agent baselines. Ablation studies examine different credit-assignment schemes and provide a detailed comparison of how they affect training outcomes. MATTRL offers a stable, effective, and efficient path to distribution-shift-robust multi-agent reasoning without tuning.

OpenVoxel Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding

Authors: Sheng-Yu Huang, Jaesung Choe, Yu-Chiang Frank Wang, Cheng Sun

2026-01-14

http://arxiv.org/abs/2601.09575v1

We propose OpenVoxel, a training-free algorithm for grouping and captioning sparse voxels for open-vocabulary 3D scene understanding tasks. Given a sparse voxel rasterization (SVR) model obtained from multi-view images of a 3D scene, OpenVoxel produces meaningful groups that describe the different objects in the scene. Also, by leveraging powerful Vision Language Models (VLMs) and Multi-modal Large Language Models (MLLMs), OpenVoxel builds an informative scene map by captioning each group, enabling further 3D scene understanding tasks such as open-vocabulary segmentation (OVS) or referring expression segmentation (RES). Unlike previous methods, our method is training-free and does not introduce embeddings from a CLIP/BERT text encoder. Instead, we directly perform text-to-text search using MLLMs. Through extensive experiments, our method demonstrates superior performance compared to recent studies, particularly in complex referring expression segmentation (RES) tasks. The code will be open-sourced.

Benchmarking Post-Training Quantization of Large Language Models under Microscaling Floating Point Formats

Authors: Manyi Zhang, Ji-Fu Li, Zhongao Sun, Haoli Bai, Hui-Ling Zhen, Zhenhua Dong, Xianzhi Yu

2026-01-14

http://arxiv.org/abs/2601.09555v1

Microscaling Floating-Point (MXFP) has emerged as a promising low-precision format for large language models (LLMs). Despite various post-training quantization (PTQ) algorithms being proposed, they mostly focus on integer quantization, while their applicability and behavior under MXFP formats remain largely unexplored. To address this gap, this work conducts a systematic investigation of PTQ under MXFP formats, encompassing over 7 PTQ algorithms, 15 evaluation benchmarks, and 3 model families. The key findings include: 1) MXFP8 consistently achieves near-lossless performance, while MXFP4 introduces substantial accuracy degradation and remains challenging; 2) PTQ effectiveness under MXFP depends strongly on format compatibility, with some algorithmic paradigms being consistently more effective than others; 3) PTQ performance exhibits highly consistent trends across model families and modalities; in particular, quantization sensitivity is dominated by the language model rather than the vision encoder in multimodal LLMs; 4) the quantization scaling factor is a critical error source in MXFP4, and a simple pre-scale optimization strategy can significantly mitigate its impact. Together, these results provide practical guidance on adapting existing PTQ methods to MXFP quantization.
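
The shared-scale error source named in finding 4 can be illustrated with a simplified block quantizer. An integer grid stands in for the real FP4/FP8 element format, while the power-of-two shared scale mirrors the microscaling convention; this is a pedagogical sketch, not the MX specification:

```python
import math

def mx_quantize(block, bits=4):
    """Simplified microscaling quantize-dequantize round trip: one
    shared power-of-two scale per block, elements rounded to a
    symmetric (2^(bits-1) - 1)-level grid. The single shared scale is
    the dominant error source: one outlier inflates it and coarsens
    every other element in the block."""
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(x) for x in block) or 1.0
    # Smallest power-of-two scale such that amax/scale fits in [-qmax, qmax].
    scale = 2.0 ** math.ceil(math.log2(amax / qmax))
    q = [max(-qmax, min(qmax, round(x / scale))) for x in block]
    return [qi * scale for qi in q]
```

Values that are exact multiples of the chosen scale survive the round trip unchanged; everything else absorbs up to half a step of error, which grows with the block's largest magnitude.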

Private LLM Inference on Consumer Blackwell GPUs A Practical Guide for Cost-Effective Local Deployment in SMEs

Authors: Jonathan Knoop, Hendrik Holtmann

2026-01-14

http://arxiv.org/abs/2601.09527v1

SMEs increasingly seek alternatives to cloud LLM APIs, which raise data privacy concerns. Dedicated cloud GPU instances offer improved privacy but with limited guarantees and ongoing costs, while professional on-premise hardware (A100, H100) remains prohibitively expensive. We present a systematic evaluation of NVIDIA's Blackwell consumer GPUs (RTX 5060 Ti, 5070 Ti, 5090) for production LLM inference, benchmarking four open-weight models (Qwen3-8B, Gemma3-12B, Gemma3-27B, GPT-OSS-20B) across 79 configurations spanning quantization formats (BF16, W4A16, NVFP4, MXFP4), context lengths (8k-64k), and three workloads: RAG, multi-LoRA agentic serving, and high-concurrency APIs. The RTX 5090 delivers 3.5-4.6x higher throughput than the 5060 Ti with 21x lower latency for RAG, but budget GPUs achieve the highest throughput-per-dollar for API workloads with sub-second latency. NVFP4 quantization provides 1.6x throughput over BF16 with 41% energy reduction and only 2-4% quality loss. Self-hosted inference costs $0.001-0.04 per million tokens (electricity only), which is 40-200x cheaper than budget-tier cloud APIs, with hardware breaking even in under four months at moderate volume (30M tokens/day). Our results show that consumer GPUs can reliably replace cloud inference for most SME workloads, except latency-critical long-context RAG, where high-end GPUs remain essential. We provide deployment guidance and release all benchmark data for reproducible SME-scale deployments.
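
The break-even claim is simple arithmetic; the hardware price below is illustrative, and the per-token rates are picked at the 40x end of the cost gap the abstract reports:

```python
def breakeven_days(hw_cost, tokens_per_day, cloud_price_per_mtok,
                   self_price_per_mtok):
    """Days until self-hosted hardware pays for itself, given daily
    token volume and per-million-token prices."""
    saving_per_day = tokens_per_day / 1e6 * (cloud_price_per_mtok
                                             - self_price_per_mtok)
    return hw_cost / saving_per_day

# e.g. a hypothetical $2000 GPU at 30M tokens/day,
# $0.80/Mtok cloud vs $0.02/Mtok self-hosted (40x cheaper)
days = breakeven_days(2000, 30e6, 0.80, 0.02)  # roughly 85 days
```

At that volume and price gap the card amortizes in under three months, comfortably inside the under-four-months figure quoted above.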

Engineering Compressed Matrix Multiplication with the Fast Walsh-Hadamard Transform

Authors: Joel Andersson, Matti Karppa

2026-01-14

http://arxiv.org/abs/2601.09477v1

We present an implementation of Pagh's compressed matrix multiplication algorithm, a randomized algorithm that constructs sketches of matrices to compute an unbiased estimate of their product. By leveraging fast polynomial multiplication via the FFT, the algorithm achieves high performance when the product matrix is sparse or contains only a small number of entries with magnitudes significantly larger than the rest. We show empirically that the algorithm is practical and can outperform state-of-the-art DGEMM implementations when the product matrix has few nonzero entries or is otherwise dominated by a small subset of elements with large magnitude. As a minor theoretical contribution, we replace the FFT with the Fast Walsh-Hadamard Transform (FWHT) in sketched multiplication, preserving all correctness and variance guarantees of the original algorithm. Experiments with our carefully engineered multithreaded CPU implementation for dense double-precision matrices on 64-core CPU nodes, across a range of synthetic benchmarks exhibiting variable sparsity patterns, show that the FWHT variant is up to 4 times faster than the FFT-based version. Under favorable sparsity and magnitude patterns in the product matrix, our FWHT-based implementation achieves a speedup of up to 40x over DGEMM from Intel MKL, with low probability of error in the estimates. Our implementation is released as free software and comes with NumPy-compatible Python bindings.
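
The Fast Walsh-Hadamard Transform at the heart of the variant fits in a few lines; this is the textbook iterative butterfly on a power-of-two-length sequence, not the authors' engineered multithreaded implementation:

```python
def fwht(a):
    """Fast Walsh-Hadamard Transform (length must be a power of two).
    The transform is its own inverse up to a factor of len(a):
    fwht(fwht(x)) == [len(x) * v for v in x]."""
    a = list(a)  # work on a copy
    h = 1
    while h < len(a):
        for i in range(0, len(a), h * 2):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y  # butterfly step
        h *= 2
    return a
```

Unlike the FFT it needs no complex arithmetic or twiddle factors, only additions and subtractions, which is the source of the reported speedup.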

Analysis of the Maximum Prediction Gain of Short-Term Prediction on Sustained Speech

Authors: Reemt Hinrichs, Muhamad Fadli Damara, Stephan Preihs, Jörn Ostermann

2026-01-14

http://arxiv.org/abs/2601.09461v1

Signal prediction is widely used in, e.g., economic forecasting, echo cancellation, and data compression, particularly in predictive coding of speech and music. Predictive coding algorithms reduce the bit-rate required for data transmission or storage by signal prediction. The prediction gain is a classic measure in applied signal coding of the quality of a predictor, as it links the mean-squared prediction error to the signal-to-quantization-noise ratio of predictive coders. To evaluate predictor models, knowledge of the maximum achievable prediction gain, independent of any predictor model, is desirable. In this manuscript, Nadaraya-Watson kernel regression (NWKR) and an information-theoretic upper bound are applied to analyze the upper bound of the prediction gain on a newly recorded dataset of sustained speech/phonemes. It was found that for unvoiced speech a linear predictor always achieves the maximum prediction gain to within at most 0.3 dB. On voiced speech, the optimal one-tap predictor was found to be linear, but starting with two taps, the maximum achievable prediction gain was found to be about 2 dB to 6 dB above that of the linear predictor. Significant differences between speakers/subjects were observed. The dataset as well as the code can be obtained for research purposes upon request.
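
The prediction gain itself is just a variance ratio in dB; a minimal sketch, assuming zero-mean signals:

```python
import math

def prediction_gain_db(signal, residual):
    """Prediction gain in dB: power of the input signal over the power
    of the prediction error (zero means assumed for brevity)."""
    ps = sum(x * x for x in signal) / len(signal)
    pe = sum(e * e for e in residual) / len(residual)
    return 10.0 * math.log10(ps / pe)
```

A predictor that shrinks the error power by a factor of 100 yields a 20 dB gain, so the 2-6 dB headroom reported for voiced speech corresponds to a further 1.6x-4x reduction in error power beyond the linear predictor.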

SC-MAS Constructing Cost-Efficient Multi-Agent Systems with Edge-Level Heterogeneous Collaboration

Authors: Di Zhao, Longhui Ma, Siwei Wang, Miao Wang, Yi Kong

2026-01-14

http://arxiv.org/abs/2601.09434v1

Large Language Model (LLM)-based Multi-Agent Systems (MAS) enhance complex problem solving through multi-agent collaboration, but often incur substantially higher costs than single-agent systems. Recent MAS routing methods aim to balance performance and overhead by dynamically selecting agent roles and language models. However, these approaches typically rely on a homogeneous collaboration mode, where all agents follow the same interaction pattern, limiting collaboration flexibility across different roles. Motivated by Social Capital Theory, which emphasizes that different roles benefit from distinct forms of collaboration, we propose SC-MAS, a framework for constructing heterogeneous and cost-efficient multi-agent systems. SC-MAS models MAS as directed graphs, where edges explicitly represent pairwise collaboration strategies, allowing different agent pairs to interact through tailored communication patterns. Given an input query, a unified controller progressively constructs an executable MAS by selecting task-relevant agent roles, assigning edge-level collaboration strategies, and allocating appropriate LLM backbones to individual agents. Experiments on multiple benchmarks demonstrate the effectiveness of SC-MAS. In particular, SC-MAS improves accuracy by 3.35% on MMLU while reducing inference cost by 15.38%, and achieves a 3.53% accuracy gain with a 12.13% cost reduction on MBPP. These results validate the feasibility of SC-MAS and highlight the effectiveness of heterogeneous collaboration in multi-agent systems.

Spectral Complex Autoencoder Pruning A Fidelity-Guided Criterion for Extreme Structured Channel Compression

Authors: Wei Liu, Xing Deng, Haijian Shao, Yingtao Jiang

2026-01-14

http://arxiv.org/abs/2601.09352v1

We propose Spectral Complex Autoencoder Pruning (SCAP), a reconstruction-based criterion that measures functional redundancy at the level of individual output channels. For each convolutional layer, we construct a complex interaction field by pairing the full multi-channel input activation as the real part with a single output-channel activation (spatially aligned and broadcast across input channels) as the imaginary part. We transform this complex field to the frequency domain and train a low-capacity autoencoder to reconstruct normalized spectra. Channels whose spectra are reconstructed with high fidelity are interpreted as lying close to a low-dimensional manifold captured by the autoencoder and are therefore more compressible; conversely, channels with low fidelity are retained as they encode information that cannot be compactly represented by the learned manifold. This yields an importance score (optionally fused with the filter L1 norm) that supports simple threshold-based pruning and produces a structurally consistent pruned network. On VGG16 trained on CIFAR-10, at a fixed threshold of 0.6, we obtain 90.11% FLOP reduction and 96.30% parameter reduction with an absolute Top-1 accuracy drop of 1.67% from a 93.44% baseline after fine-tuning, demonstrating that spectral reconstruction fidelity of complex interaction fields is an effective proxy for channel-level redundancy under aggressive pruning.
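
The final scoring step can be sketched as a fusion of per-channel reconstruction error (high error = hard to compress = keep) with the filter L1 norm. The min-max normalization and the fusion weight beta are illustrative choices, not the paper's exact recipe:

```python
def channel_importance(recon_errors, l1_norms, beta=0.5):
    """Fuse spectral reconstruction error with filter L1 norm into one
    importance score per channel; channels scoring below a threshold
    are pruned. Both inputs are min-max normalized so they are on a
    common scale before blending."""
    def norm(xs):
        lo, hi = min(xs), max(xs)
        span = (hi - lo) or 1.0
        return [(x - lo) / span for x in xs]
    e, n = norm(recon_errors), norm(l1_norms)
    return [beta * ei + (1 - beta) * ni for ei, ni in zip(e, n)]
```

Thresholding this score (e.g. at 0.6, as in the VGG16 experiment) directly selects a structurally consistent set of channels to remove.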

See More, Store Less Memory-Efficient Resolution for Video Moment Retrieval

Authors: Mingyu Jeon, Sungjin Han, Jinkwon Hwang, Minchol Kwon, Jonghee Kim, Junyeong Kim

2026-01-14

http://arxiv.org/abs/2601.09350v1

Recent advances in Multimodal Large Language Models (MLLMs) have improved image recognition and reasoning, but video-related tasks remain challenging due to memory constraints from dense frame processing. Existing Video Moment Retrieval (VMR) methodologies rely on key frame sampling, risking potential information loss, especially in lengthy videos. We propose SMORE (See MORE, store less), a framework that enhances memory efficiency while maintaining high information resolution. SMORE (1) uses query-guided captions to encode semantics aligned with user intent, (2) applies query-aware importance modulation to highlight relevant segments, and (3) adaptively compresses frames to preserve key content while reducing redundancy. This enables efficient video understanding without exceeding memory budgets. Experimental validation reveals that SMORE achieves state-of-the-art performance on the QVHighlights, Charades-STA, and ActivityNet-Captions benchmarks.

Range-Doppler-Acceleration Estimation for Fast-Moving and Accelerating Targets

Authors: Nadav Neuberger, Simon Kollecker, Martin Kaeske

2026-01-14

http://arxiv.org/abs/2601.09317v1

A central aspect of every pulsed radar signal processor is the target's Range-Doppler estimation within a Coherent Processing Interval. Conventional methods typically rely on simplifying assumptions, such as linear target motion, narrowband operation, or constant velocity, to enable fast computation. However, these assumptions break down in scenarios involving quadratic range-time behavior, high radial velocities or accelerations, or wideband signals, leading to undesired effects such as intra-pulse Doppler shift/stretch and target migration across Range-Doppler cells. This paper presents a generalized, waveform-independent Range-Doppler estimation approach that compensates for these effects while maintaining minimal Signal-to-Noise-Ratio loss and practical computational efficiency. The performance limits of the proposed method are analyzed and expressed through a unified metric that depends on both scene and system parameters. A comparison with other approaches is presented, showing their estimation bias and performance degradation.

Multi-Modal LLM based Image Captioning in ICT Bridging the Gap Between General and Industry Domain

Authors: Lianying Chao, Haoran Cai, Xubin Li, Kai Zhang, Sijie Wu, Rui Xu

2026-01-14

http://arxiv.org/abs/2601.09298v1

In the information and communications technology (ICT) industry, training a domain-specific large language model (LLM) or constructing a retrieval-augmented generation system requires a substantial amount of high-value domain knowledge. However, this knowledge is hidden not only in the textual modality but also in the image modality. Traditional methods can parse text from domain documents but don't have image captioning ability. Multi-modal LLMs (MLLMs) can understand images, but they do not have sufficient domain knowledge. To address the above issues, this paper proposes a multi-stage progressive training strategy to train a Domain-specific Image Captioning Model (DICModel) for ICT, and constructs a standard evaluation system to validate the performance of DICModel. Specifically, this work first synthesizes about 7K image-text pairs by combining the Mermaid tool and LLMs, which are used for the first-stage supervised fine-tuning (SFT) of DICModel. Then, ICT-domain experts manually annotate about 2K image-text pairs for the second-stage SFT of DICModel. Finally, experts and LLMs jointly synthesize about 1.5K visual question answering examples for the instruction-based SFT. Experimental results indicate that our DICModel with only 7B parameters performs better than other state-of-the-art models with 32B parameters. Compared to the SOTA models with 7B and 32B parameters, our DICModel increases the BLEU metric by approximately 56.8% and 20.8%, respectively. On objective questions constructed by ICT domain experts, our DICModel outperforms Qwen2.5-VL 32B by 1% in accuracy. In summary, this work can efficiently and accurately extract logical text from images, which is expected to promote the development of multimodal models in the ICT domain.

Cluster Workload Allocation Semantic Soft Affinity Using Natural Language Processing

Authors: Leszek Sliwko, Jolanta Mizeria-Pietraszko

2026-01-14

http://arxiv.org/abs/2601.09282v1

Cluster workload allocation often requires complex configurations, creating a usability gap. This paper introduces a semantic, intent-driven scheduling paradigm for cluster systems using Natural Language Processing. The system employs a Large Language Model (LLM) integrated via a Kubernetes scheduler extender to interpret natural-language allocation hint annotations for soft affinity preferences. A prototype featuring a cluster state cache and an intent analyzer (using AWS Bedrock) was developed. Empirical evaluation demonstrated high LLM parsing accuracy (>95% Subset Accuracy on an evaluation ground-truth dataset) for top-tier models like Amazon Nova Pro/Premier and Mistral Pixtral Large, significantly outperforming a baseline engine. Scheduling quality tests across six scenarios showed the prototype achieved superior or equivalent placement compared to standard Kubernetes configurations, particularly excelling in complex and quantitative scenarios and in handling conflicting soft preferences. The results validate using LLMs for accessible scheduling but highlight limitations such as synchronous LLM latency, suggesting asynchronous processing for production readiness. This work confirms the viability of semantic soft affinity for simplifying workload orchestration.
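
As a contrast to the LLM-based intent analyzer, the kind of keyword baseline such a system would be compared against can be sketched as a mapping from free-text hints to Kubernetes-style weighted soft-affinity terms (labels, weights, and the function name are invented for illustration):

```python
KNOWN_LABELS = {  # hypothetical node labels, for illustration only
    "gpu": "accelerator=gpu",
    "ssd": "disktype=ssd",
    "arm": "arch=arm64",
}

def parse_affinity_hint(hint):
    """Keyword baseline: map a free-text hint to weighted soft-affinity terms."""
    low = hint.lower()
    terms = []
    for keyword, selector in KNOWN_LABELS.items():
        if keyword in low:
            # Soft preference: intensity words raise the weight
            weight = 100 if "strongly" in low else 50
            terms.append({"preference": selector, "weight": weight})
    return terms
```

A real baseline engine would be more elaborate, but this illustrates why LLM parsing outperforms keyword matching on paraphrased or quantitative hints.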

STaR Sensitive Trajectory Regulation for Unlearning in Large Reasoning Models

Authors: Jingjing Zhou, Gaoxiang Cong, Li Su, Liang Li

2026-01-14

http://arxiv.org/abs/2601.09281v1

Large Reasoning Models (LRMs) have advanced automated multi-step reasoning, but their ability to generate complex Chain-of-Thought (CoT) trajectories introduces severe privacy risks, as sensitive information may be deeply embedded throughout the reasoning process. Existing Large Language Model (LLM) unlearning approaches, which typically focus on modifying only final answers, are insufficient for LRMs, as they fail to remove sensitive content from intermediate steps, leading to persistent privacy leakage and degraded security. To address these challenges, we propose Sensitive Trajectory Regulation (STaR), a parameter-free, inference-time unlearning framework that achieves robust privacy protection throughout the reasoning process. Specifically, we first identify sensitive content via semantic-aware detection. Then, we inject global safety constraints through a secure prompt prefix. Next, we perform trajectory-aware suppression to dynamically block sensitive content across the entire reasoning chain. Finally, we apply token-level adaptive filtering to prevent both exact and paraphrased sensitive tokens during generation. Furthermore, to overcome the inadequacies of existing evaluation protocols, we introduce two metrics: Multi-Decoding Consistency Assessment (MCS), which measures the consistency of unlearning across diverse decoding strategies, and Multi-Granularity Membership Inference Attack (MIA) Evaluation, which quantifies privacy protection at both answer and reasoning-chain levels. Experiments on the R-TOFU benchmark demonstrate that STaR achieves comprehensive and stable unlearning with minimal utility loss, setting a new standard for privacy-preserving reasoning in LRMs.
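
The token-level filtering step can be illustrated with a toy blocklist covering both exact and paraphrased sensitive tokens; the real system operates on model vocabularies during decoding, and all names here are hypothetical:

```python
def filter_tokens(candidates, sensitive, paraphrases):
    """Drop candidate tokens matching sensitive items exactly or via a
    paraphrase table (toy stand-in for token-level adaptive filtering)."""
    blocked = {s.lower() for s in sensitive}
    for s in sensitive:
        # Extend the blocklist with known paraphrases of each sensitive item
        blocked.update(p.lower() for p in paraphrases.get(s, []))
    return [t for t in candidates if t.lower() not in blocked]
```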

Coordinated Pandemic Control with Large Language Model Agents as Policymaking Assistants

Authors: Ziyi Shi, Xusen Guo, Hongliang Lu, Mingxing Peng, Haotian Wang, Zheng Zhu, Zhenning Li, Yuxuan Liang, Xinhu Zheng, Hai Yang

2026-01-14

http://arxiv.org/abs/2601.09264v1

Effective pandemic control requires timely and coordinated policymaking across administrative regions that are intrinsically interdependent. However, human-driven responses are often fragmented and reactive, with policies formulated in isolation and adjusted only after outbreaks escalate, undermining proactive intervention and global pandemic mitigation. To address this challenge, we propose a large language model (LLM) multi-agent policymaking framework that supports coordinated and proactive pandemic control across regions. Within our framework, each administrative region is assigned an LLM agent as an AI policymaking assistant. The agent reasons over region-specific epidemiological dynamics while communicating with other agents to account for cross-regional interdependencies. By integrating real-world data, a pandemic evolution simulator, and structured inter-agent communication, our framework enables agents to jointly explore counterfactual intervention scenarios and synthesize coordinated policy decisions through a closed-loop simulation process. We validate the proposed framework using state-level COVID-19 data from the United States between April and December 2020, together with real-world mobility records and observed policy interventions. Compared with real-world pandemic outcomes, our approach reduces cumulative infections and deaths by up to 63.7% and 40.1%, respectively, at the individual state level, and by 39.0% and 27.0%, respectively, when aggregated across states. These results demonstrate that LLM multi-agent systems can enable more effective pandemic control with coordinated policymaking.
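
A pandemic-evolution simulator at the core of such a closed loop can be as simple as a discrete-time SIR step, with an agent's intervention modulating the transmission rate. This is a generic sketch under standard SIR assumptions, not the paper's simulator:

```python
def sir_step(state, beta, gamma, dt=1.0):
    """One forward-Euler step of a basic SIR model; population is conserved."""
    s, i, r = state
    n = s + i + r
    new_infections = beta * s * i / n * dt
    new_recoveries = gamma * i * dt
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)

def apply_policy(beta, stringency):
    """An agent's intervention scales transmission down by its stringency
    (stringency in [0, 1]; a hypothetical interface, for illustration)."""
    return beta * (1.0 - stringency)
```

In a closed loop, each region's agent would read the simulated state, choose a stringency, and the coupled simulator would advance all regions together.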

BrainSegNet A Novel Framework for Whole-Brain MRI Parcellation Enhanced by Large Models

Authors: Yucheng Li, Xiaofan Wang, Junyi Wang, Yijie Li, Xi Zhu, Mubai Du, Dian Sheng, Wei Zhang, Fan Zhang

2026-01-14

http://arxiv.org/abs/2601.09263v1

Whole-brain parcellation from MRI is a critical yet challenging task due to the complexity of subdividing the brain into numerous small, irregularly shaped regions. Traditionally, template-registration methods were used, but recent advances have shifted to deep learning for faster workflows. While large models like the Segment Anything Model (SAM) offer transferable feature representations, they are not tailored for the high precision required in brain parcellation. To address this, we propose BrainSegNet, a novel framework that adapts SAM for accurate whole-brain parcellation into 95 regions. We enhance SAM by integrating U-Net skip connections and specialized modules into its encoder and decoder, enabling fine-grained anatomical precision. Key components include a hybrid encoder combining U-Net skip connections with SAM's transformer blocks, a multi-scale attention decoder with pyramid pooling for varying-sized structures, and a boundary refinement module to sharpen edges. Experimental results on the Human Connectome Project (HCP) dataset demonstrate that BrainSegNet outperforms several state-of-the-art methods, achieving higher accuracy and robustness in complex, multi-label parcellation.

A Theoretical Framework for Rate-Distortion Limits in Learned Image Compression

Authors: Changshuo Wang, Zijian Liang, Kai Niu, Ping Zhang

2026-01-14

http://arxiv.org/abs/2601.09254v1

We present a novel systematic theoretical framework to analyze the rate-distortion (R-D) limits of learned image compression. While recent neural codecs have achieved remarkable empirical results, their distance from the information-theoretic limit remains unclear. Our work addresses this gap by decomposing the R-D performance loss into three key components: variance estimation, quantization strategy, and context modeling. First, we derive the optimal latent variance as the second moment under a Gaussian assumption, providing a principled alternative to hyperprior-based estimation. Second, we quantify the gap between uniform quantization and the Gaussian test channel derived from the reverse water-filling theorem. Third, we extend our framework to include context modeling, and demonstrate that accurate mean prediction yields substantial entropy reduction. Unlike prior R-D estimators, our method provides a structurally interpretable perspective that aligns with real codec modules and enables fine-grained analysis. Through joint simulation and end-to-end training, we derive a tight and actionable approximation of the theoretical R-D limits, offering new insights into the design of more efficient learned compression systems.
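
The quantization gap has a classical closed form in the high-rate scalar case: an entropy-coded uniform quantizer on a Gaussian source spends roughly 0.255 bits more than the Shannon rate-distortion function. A sketch of both quantities for a 1-D Gaussian (high-rate approximation; this is textbook material, not the paper's exact derivation):

```python
import math

def gaussian_rd(sigma2, d):
    """Shannon rate-distortion function of a 1-D Gaussian source (bits)."""
    return 0.5 * math.log2(sigma2 / d) if d < sigma2 else 0.0

def uniform_quantizer_rate(sigma2, d):
    """High-rate entropy of a uniform scalar quantizer achieving MSE d,
    using D = step**2 / 12 and H ~= h(X) - log2(step)."""
    step = math.sqrt(12.0 * d)
    diff_entropy = 0.5 * math.log2(2.0 * math.pi * math.e * sigma2)
    return diff_entropy - math.log2(step)
```

The difference `uniform_quantizer_rate(s, d) - gaussian_rd(s, d)` is constant at 0.5·log2(πe/6) ≈ 0.2546 bits, the uniform-quantization penalty that such a decomposition makes explicit.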

DSA-Tokenizer Disentangled Semantic-Acoustic Tokenization via Flow Matching-based Hierarchical Fusion

Authors: Hanlin Zhang, Daxin Tan, Dehua Tao, Xiao Chen, Haochen Tan, Yunhe Li, Yuchen Cao, Jianping Wang, Linqi Song

2026-01-14

http://arxiv.org/abs/2601.09239v2

Speech tokenizers serve as the cornerstone of discrete Speech Large Language Models (Speech LLMs). Existing tokenizers either prioritize semantic encoding, fuse semantic content with acoustic style inseparably, or achieve incomplete semantic-acoustic disentanglement. To achieve better disentanglement, we propose DSA-Tokenizer, which explicitly disentangles speech into discrete semantic and acoustic tokens via distinct optimization constraints. Specifically, semantic tokens are supervised by ASR to capture linguistic content, while acoustic tokens focus on mel-spectrogram restoration to encode style. To eliminate rigid length constraints between the two sequences, we introduce a hierarchical Flow-Matching decoder that further improves speech generation quality. Furthermore, we employ a joint reconstruction-recombination training strategy to enforce this separation. DSA-Tokenizer enables high-fidelity reconstruction and flexible recombination through robust disentanglement, facilitating controllable generation in Speech LLMs. Our analysis highlights disentangled tokenization as a pivotal paradigm for future speech modeling. Audio samples are available at https://anonymous.4open.science/w/DSA_Tokenizer_demo/. The code and model will be made publicly available after the paper has been accepted.

Sparsifying Large Language Models via Dual Taylor Expansion and Attention Distribution Awareness

Authors: Lang Xiong, Ning Liu, Ao Ren, Yuheng Bai, Haining Fang, BinYan Zhang, Zhe Jiang, Yujuan Tan, Duo Liu

2026-01-14

http://arxiv.org/abs/2601.09176v1

Large language models (LLMs) face significant deployment challenges due to their massive computational demands. While sparsification offers a promising compression solution, existing methods suffer from two critical limitations: (1) they neglect activation distribution shifts between calibration data and test data, resulting in inaccurate error estimations; (2) they overlook the long-tail distribution characteristics of activations in the attention module. To address these limitations, this paper proposes a novel sparsification method. First, we propose a dual Taylor expansion-based method that jointly models weight and activation perturbations for precise error estimation, leading to precise sparsity mask selection and weight updating and facilitating error minimization during sparsification. Second, we propose an attention-aware dynamic update strategy that preserves the long-tail attention pattern by jointly minimizing the KL divergence of attention distributions and the reconstruction error. Extensive experiments show that our method consistently outperforms SOTA methods across various LLMs (e.g., OPT-125M, LLaMA2/3, and Qwen3). Moreover, the dynamic attention update mechanism also generalizes well to ViT-based vision models like DeiT, achieving superior accuracy on ImageNet-1K.
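
A first-order Taylor saliency |w·g| is the usual starting point for this family of methods; the paper's dual expansion over weights and activations is more elaborate, but the mask-selection step it feeds can be sketched as follows (illustrative only, not the paper's criterion):

```python
def taylor_importance(weights, grads):
    """First-order Taylor saliency |w * g| per weight."""
    return [abs(w * g) for w, g in zip(weights, grads)]

def prune_mask(weights, grads, keep_ratio):
    """Keep the top-scoring fraction of weights; True = keep, False = zero out."""
    scores = taylor_importance(weights, grads)
    k = max(1, int(len(weights) * keep_ratio))
    # Threshold at the k-th largest score
    threshold = sorted(scores, reverse=True)[k - 1]
    return [s >= threshold for s in scores]
```

Note how the score depends on the gradient, and hence on the calibration data; this is exactly where the calibration/test distribution shift the authors address enters.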

Data-Driven Exploration and Insights into Temperature-Dependent Phonons in Inorganic Materials

Authors: Huiju Lee, Zhi Li, Jiangang He, Yi Xia

2026-01-14

http://arxiv.org/abs/2601.09123v1

Phonons, quantized vibrations of the atomic lattice, are fundamental to understanding thermal transport, structural stability, and phase behavior in crystalline solids. Despite advances in computational materials science, most predictions of vibrational properties in large materials databases rely on the harmonic approximation and overlook crucial temperature-dependent anharmonic effects. Here, we present a scalable computational framework that combines machine learning interatomic potentials, anharmonic lattice dynamics, and high-throughput calculations to investigate temperature-dependent phonons across thousands of materials. By fine-tuning the universal M3GNet interatomic potential using high-quality phonon data, we improve phonon prediction accuracy by a factor of four while preserving computational efficiency. Integrating this refined model into a high-throughput implementation of the stochastic self-consistent harmonic approximation, we compute temperature-dependent phonons for 4,669 inorganic compounds. Our analysis identifies systematic elemental and structural trends governing anharmonic phonon renormalization, with particularly strong manifestations in alkali metals, perovskite-derived frameworks, and related systems. Machine learning models trained on this dataset identify key atomic-scale features driving strong anharmonicity, including weak bonding, large atomic radii, and specific coordination motifs. First-principles validation confirms that anharmonic effects can dramatically alter lattice thermal conductivity by factors of two to four in some materials. This work establishes a robust and efficient data-driven approach for predicting finite-temperature phonon behavior, offering new pathways for the design and discovery of materials with tailored thermal and vibrational properties.

AviationLMM A Large Multimodal Foundation Model for Civil Aviation

Authors: Wenbin Li, Jingling Wu, Xiaoyong Lin, Jing Chen, Cong Chen

2026-01-14

http://arxiv.org/abs/2601.09105v1

Civil aviation is a cornerstone of global transportation and commerce, and ensuring its safety, efficiency and customer satisfaction is paramount. Yet conventional Artificial Intelligence (AI) solutions in aviation remain siloed and narrow, focusing on isolated tasks or single modalities. They struggle to integrate heterogeneous data such as voice communications, radar tracks, sensor streams and textual reports, which limits situational awareness, adaptability, and real-time decision support. This paper introduces the vision of AviationLMM, a Large Multimodal foundation Model for civil aviation, designed to unify the heterogeneous data streams of civil aviation and enable understanding, reasoning, generation and agentic applications. We first identify the gaps between existing AI solutions and requirements. Second, we describe the model architecture, which ingests multimodal inputs such as air-ground voice, surveillance, on-board telemetry, video and structured texts, performs cross-modal alignment and fusion, and produces flexible outputs ranging from situation summaries and risk alerts to predictive diagnostics and multimodal incident reconstructions. To fully realize this vision, we identify key research opportunities, including data acquisition, alignment and fusion, pretraining, reasoning, trustworthiness, privacy, robustness to missing modalities, and synthetic scenario generation. By articulating the design and challenges of AviationLMM, we aim to boost progress on civil aviation foundation models and catalyze coordinated research efforts toward an integrated, trustworthy and privacy-preserving aviation AI ecosystem.

Hidden States as Early Signals Step-level Trace Evaluation and Pruning for Efficient Test-Time Scaling

Authors: Zhixiang Liang, Beichen Huang, Zheng Wang, Minjia Zhang

2026-01-14

http://arxiv.org/abs/2601.09093v1

Large Language Models (LLMs) can enhance reasoning capabilities through test-time scaling by generating multiple traces. However, the combination of lengthy reasoning traces with multiple sampling introduces substantial computation and high end-to-end latency. Prior work on accelerating this process has relied on similarity-based or confidence-based pruning, but these signals do not reliably indicate trace quality. To address these limitations, we propose STEP: Step-level Trace Evaluation and Pruning, a novel pruning framework that evaluates reasoning steps using hidden states and dynamically prunes unpromising traces during generation. We train a lightweight step scorer to estimate trace quality, and design a GPU memory-aware pruning strategy that triggers pruning as GPU memory becomes saturated by the KV cache, reducing end-to-end latency. Experiments across challenging reasoning benchmarks demonstrate that STEP reduces end-to-end inference latency by 45%-70% on average compared to self-consistency while also improving reasoning accuracy. Our code is released at: https://github.com/Supercomputing-System-AI-Lab/STEP
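
The memory-pressure pruning step can be sketched as a top-k filter over per-trace scores; this is a generic sketch (the paper's scorer operates on hidden states, and the trigger is tied to actual GPU memory occupancy):

```python
def prune_traces(traces, scores, budget):
    """Keep the `budget` highest-scoring traces, preserving original order."""
    ranked = sorted(range(len(traces)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:budget])
    return [t for i, t in enumerate(traces) if i in keep]
```

In the framework described above, `budget` would shrink as memory pressure rises, so low-quality traces are evicted before they finish generating.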

Exploring Reliable Spatiotemporal Dependencies for Efficient Visual Tracking

Authors: Junze Shi, Yang Yu, Jian Shi, Haibo Luo

2026-01-14

http://arxiv.org/abs/2601.09078v1

Recent advances in transformer-based lightweight object tracking have established new standards across benchmarks, leveraging the global receptive field and powerful feature extraction capabilities of attention mechanisms. Despite these achievements, existing methods universally employ sparse sampling during training--utilizing only one template and one search image per sequence--which fails to comprehensively explore spatiotemporal information in videos. This limitation constrains performance and causes the gap between lightweight and high-performance trackers. To bridge this divide while maintaining real-time efficiency, we propose STDTrack, a framework that pioneers the integration of reliable spatiotemporal dependencies into lightweight trackers. Our approach implements dense video sampling to maximize spatiotemporal information utilization. We introduce a temporally propagating spatiotemporal token to guide per-frame feature extraction. To ensure comprehensive target state representation, we design the Multi-frame Information Fusion Module (MFIFM), which augments current dependencies using historical context. The MFIFM operates on features stored in our constructed Spatiotemporal Token Maintainer (STM), where a quality-based update mechanism ensures information reliability. Considering the scale variation among tracking targets, we develop a multi-scale prediction head to dynamically adapt to objects of different sizes. Extensive experiments demonstrate state-of-the-art results across six benchmarks. Notably, on GOT-10k, STDTrack rivals certain high-performance non-real-time trackers (e.g., MixFormer) while operating at 192 FPS (GPU) and 41 FPS (CPU).
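
A quality-based update mechanism of the kind described for the STM can be sketched as a gate that adopts a new frame's token only when its quality score clears a floor and beats the stored token's score (threshold value and names are illustrative, not from the paper):

```python
def update_token(stored_token, stored_q, cand_token, cand_q, threshold=0.5):
    """Adopt the candidate token only if its quality clears a minimum floor
    and beats the stored token's quality; otherwise keep the stored one."""
    if cand_q >= threshold and cand_q > stored_q:
        return cand_token, cand_q
    return stored_token, stored_q
```

Gating on quality keeps occluded or blurred frames from corrupting the maintained spatiotemporal state.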

Depth-Wise Representation Development Under Blockwise Self-Supervised Learning for Video Vision Transformers

Authors: Jonas Römer, Timo Dickscheid

2026-01-14

http://arxiv.org/abs/2601.09040v1

End-to-end backpropagation couples all layers through a global error signal, enabling coordinated learning but requiring long-range credit assignment. Motivated by recent progress in blockwise self-supervised learning (BWSSL), we ask whether masked video transformers can be trained without end-to-end backpropagation. Applying BWSSL to masked video modeling remains relatively underexplored and must handle spatiotemporal context and long-range temporal structure. More broadly, analyses that compare BWSSL and end-to-end training in terms of learning dynamics and depth-wise representation development remain scarce. We apply blockwise learning to a masked-autoencoding video vision transformer by partitioning the encoder into blocks, each of which is optimized with a local masked reconstruction loss. Across model sizes and partition granularities, training converges and yields representations close to matched end-to-end baselines under linear-probe and retrieval proxies. To compare intermediate representations, we analyze depth-wise decodability, inter-block similarity, and patch-level diagnostics. Blockwise training exposes higher-level structure earlier, while later blocks saturate and operate in a more geometry-preserving regime. It can also induce token-level shifts consistent with stronger early mixing that pooled metrics can miss. These findings point to late-block saturation and interface formation as contributors to the remaining gap.

Layer-Parallel Training for Transformers

Authors: Shuai Jiang, Marc Salvado, Eric C. Cyr, Alena Kopaničáková, Rolf Krause, Jacob B. Schroder

2026-01-13

http://arxiv.org/abs/2601.09026v1

We present a new training methodology for transformers using a multilevel, layer-parallel approach. Through a neural ODE formulation of transformers, our application of a multilevel parallel-in-time algorithm for the forward and backpropagation phases of training achieves parallelism over the layer dimension. This dramatically enhances parallel scalability as the network depth increases, which is particularly useful for increasingly large foundation models. However, achieving this introduces errors that cause systematic bias in the gradients, which in turn reduces convergence closer to the minima. We develop an algorithm to detect this critical transition and either switch to serial training or systematically increase the accuracy of layer-parallel training. Results, including BERT, GPT2, ViT, and machine translation architectures, demonstrate parallel speedup as well as accuracy commensurate with serial pre-training, while fine-tuning is unaffected.

Universal Latent Homeomorphic Manifolds Cross-Domain Representation Learning via Homeomorphism Verification

Authors: Tong Wu, Tayab Uddin Wara, Daniel Hernandez, Sidong Lei

2026-01-13

http://arxiv.org/abs/2601.09025v1

We present the Universal Latent Homeomorphic Manifold (ULHM), a framework that unifies semantic representations (e.g., human descriptions, diagnostic labels) and observation-driven machine representations (e.g., pixel intensities, sensor readings) into a single latent structure. Despite originating from fundamentally different pathways, both modalities capture the same underlying reality. We establish homeomorphism, a continuous bijection preserving topological structure, as the mathematical criterion for determining when latent manifolds induced by different semantic-observation pairs can be rigorously unified. This criterion provides theoretical guarantees for three critical applications: (1) semantic-guided image recovery from incomplete observations, (2) cross-domain transfer learning with verified structural compatibility, and (3) zero-shot compositional learning via valid transfer from semantic to observation space. Our framework learns continuous manifold-to-manifold transformations through conditional variational inference, avoiding brittle point-to-point mappings. We develop practical verification algorithms, including trust, continuity, and Wasserstein distance metrics, that empirically validate homeomorphic structure from finite samples. Experiments demonstrate: (1) masked image recovery from 5% of CelebA pixels and MNIST digit reconstruction at multiple masking levels, (2) cross-domain classifier transfer achieving 86.73% accuracy from MNIST to Fashion-MNIST without retraining, and (3) zero-shot classification on unseen classes achieving 89.47% on MNIST, 84.70% on Fashion-MNIST, and 78.76% on CIFAR-10. Critically, the homeomorphism criterion correctly rejects incompatible datasets, preventing invalid unification and providing a feasible path toward principled decomposition of general foundation models into verified domain-specific components.
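
Of the verification metrics listed, the empirical 1-D Wasserstein distance is the simplest to state: for equal-size samples it reduces to the mean absolute difference of sorted values. A sketch (the paper's verification operates on latent-manifold samples; this is only the underlying metric):

```python
def wasserstein_1d(xs, ys):
    """Empirical Wasserstein-1 distance between equal-size 1-D samples:
    mean absolute difference of the sorted values (the optimal coupling
    in 1-D is the monotone rearrangement)."""
    assert len(xs) == len(ys), "equal sample sizes assumed"
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
```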

Synthetic Data for Veterinary EHR De-identification Benefits, Limits, and Safety Trade-offs Under Fixed Compute

Authors: David Brundage

2026-01-13

http://arxiv.org/abs/2601.09756v1

Veterinary electronic health records (vEHRs) contain privacy-sensitive identifiers that limit secondary use. While PetEVAL provides a benchmark for veterinary de-identification, the domain remains low-resource. This study evaluates whether large language model (LLM)-generated synthetic narratives improve de-identification safety under distinct training regimes, emphasizing (i) synthetic augmentation and (ii) fixed-budget substitution. We conducted a controlled simulation using a PetEVAL-derived corpus (3,750 holdout/1,249 train). We generated 10,382 synthetic notes using a privacy-preserving "template-only" regime where identifiers were removed prior to LLM prompting. Three language-model backbones (PetBERT, VetBERT, Bio_ClinicalBERT) were trained under varying mixtures. Evaluation prioritized document-level leakage rate (the fraction of documents with at least one missed identifier) as the primary safety outcome. Results show that under fixed-sample substitution, replacing real notes with synthetic ones monotonically increased leakage, indicating synthetic data cannot safely replace real supervision. Under compute-matched training, moderate synthetic mixing matched real-only performance, but high synthetic dominance degraded utility. Conversely, epoch-scaled augmentation improved performance: PetBERT span-level F1 increased from 0.831 to 0.850 +/- 0.014, and leakage decreased from 6.32% to 4.02% +/- 0.19%. However, these gains largely reflect increased training exposure rather than intrinsic synthetic data quality. Corpus diagnostics revealed systematic synthetic-real mismatches in note length and label distribution that align with persistent leakage. We conclude that synthetic augmentation is effective for expanding exposure but is complementary, not substitutive, for safety-critical veterinary de-identification.
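
The primary safety outcome, document-level leakage rate, is straightforward to compute from per-document lists of missed identifier spans (the input format here is hypothetical; the metric itself is as defined in the abstract):

```python
def document_leakage_rate(missed_spans_per_doc):
    """Fraction of documents with at least one missed identifier span.
    Input: one list of missed spans per document (empty list = fully
    de-identified document)."""
    leaked = sum(1 for spans in missed_spans_per_doc if spans)
    return leaked / len(missed_spans_per_doc)
```

Unlike token-level F1, this metric counts a document as unsafe if even a single identifier slips through, which is why it is the stricter, safety-first choice.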