Cosmic Feed

Frontier Research Intelligence

37 papers found
AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents

Sergio Hernández-Gutiérrez, Matteo Merler, Ilze Amanda Auzina et al.

LLM agents increasingly act over long horizons, where a single trajectory can contain hundreds or thousands of actions. In these settings, outcome-only rewards provide too sparse guidance, failing to inform the model about the goodness of intermediate actions. Dense supervision methods aim to solve this problem by scoring intermediate steps, from intrinsic confidence to self-distillation and embedding similarities. However, it is common practice to evaluate them by measuring the downstream performance of a training pipeline that integrates them. This is expensive, conflates supervision quality with training engineering confounders, and renders different methodological families requiring distinct training setups incomparable. As a result, dense supervision methods are rarely benchmarked on common ground. We introduce QVal, a training-free testbed for directly evaluating dense supervision signals. Given a state-action pair, QVal measures how well a method's score is Q-aligned: whether it orders actions according to the Q-values of a strong reference-policy. This lets us compare signals before any training run and separate signal quality from other engineering choices. We instantiate QVal as QVal-v1.0, benchmarking 21 dense supervision methods across four diverse environments and seven methodological families, with over 1.2K evaluation experiments across six open-weight model backbones. We find that simple prompting baselines consistently outperform recent dense supervision methods from the literature, and that performance clusters strongly by family. These findings hold across model sizes, environments, and observation modalities. QVal is designed to be easily extensible to new environments and methods, enabling researchers to iterate on dense supervision methods before any training run.
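
As a rough illustration of what "Q-aligned" could mean in practice, the sketch below scores a dense supervision signal by how often it orders pairs of candidate actions the same way as reference Q-values. The abstract does not state the exact metric QVal uses, so pairwise concordance, the function name `pairwise_q_alignment`, and all numbers are assumptions made for illustration only.

```python
from itertools import combinations


def pairwise_q_alignment(scores, q_values):
    """Fraction of action pairs ordered the same way by `scores` and `q_values`.

    Assumption: pairwise concordance stands in for the benchmark's metric,
    which the abstract does not specify. Pairs tied under the reference
    Q-values are skipped because the reference expresses no preference.
    """
    concordant = 0
    comparable = 0
    for i, j in combinations(range(len(scores)), 2):
        q_diff = q_values[i] - q_values[j]
        if q_diff == 0:
            continue
        comparable += 1
        if (scores[i] - scores[j]) * q_diff > 0:
            concordant += 1
    return concordant / comparable if comparable else float("nan")


# Hypothetical example: three candidate actions in one state.
reference_q = [0.9, 0.4, 0.1]      # Q-values from a strong reference policy
method_scores = [2.0, 3.0, -1.0]   # scores from some dense supervision method
alignment = pairwise_q_alignment(method_scores, reference_q)
print(f"pairwise Q-alignment: {alignment:.3f}")
```

Because the computation needs only scores and reference Q-values, it can be run before any training, which is the property the abstract emphasizes.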

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs

Gabrielle Kaili-May Liu, Avi Caciularu, Gal Yona et al.

Metacognition is a critical component of intelligence that describes the ability to monitor and regulate one's own cognitive processes. Yet LLMs exhibit systemic deficiencies in key metacognitive faculties: they hallucinate with high confidence, fail to recognize knowledge boundaries, and misrepresent their internal uncertainty, undermining trustworthiness and reliability. Since monitoring task performance and adapting behavior accordingly are central to metacognition, we posit that models capable of accurately judging their own performance are better positioned to improve it. We operationalize this idea via two novel mechanisms: reinforcement learning with metacognitive feedback (RLMF), a paradigm to refine completion rankings during preference optimization based on the quality of a model's self-judgments of performance, and metacognitive data selection, which uses similar self-judgments to identify high-value training examples, outperforming naive active learning. We apply these innovations to the problem of faithful calibration (FC), a task that is itself fundamentally metacognitive: the goal is to align expressed with intrinsic uncertainty, difficult even for frontier LLMs. We adopt a two-stage, decoupled approach, first using these methods to calibrate the faithfulness of models' self-reported confidence scores, then mapping to natural, context-adaptable linguistic uncertainty via targeted output editing. Extensive experiments show RLMF achieves generalizable, state-of-the-art FC on diverse tasks while preserving accuracy. Further, RLMF surpasses standard RL by up to 63% while enhancing models' ability to assess and express their own capability limits. This positions RLMF as a promising paradigm to enhance LLM metacognition toward improved abilities and alignment, and suggests metacognitive performance as an effective RL signal to overcome limits of prior intrinsic feedback methods.

Quantum Technology | arXiv | 2026-06-30 | Skeptical (25)

Simulation of Two-qubit Gate Variability and Fidelity of Spin Qubits Built on Nanosheet Technology

Trung Nguyen, Sarah Dweik, Hiu Yung Wong

Silicon spin qubits are promising for large-scale quantum-computer integration because they can fully leverage the well-developed semiconductor infrastructure. However, the low fidelity of two-qubit entanglement gates remains a key barrier to large-scale integration. Recent simulations of silicon spin-qubit two-qubit gates have been performed on silicon-on-insulator (SOI) platforms, while nanosheet-based charge-qubit work has been limited to single-qubit operation using a two-dimensional Schrödinger approximation. In this work, we study silicon spin-qubit double quantum dots built on nanosheet technology using the Quantum Technology Computer-Aided Design (QTCAD) simulation suite to run three-dimensional Poisson and Schrödinger solvers, followed by a many-body solver to extract exchange interactions. We evaluate the exchange energy sensitivity to process and bias variations and then use QuTiP to solve the master equation for a two-qubit gate. The results show that millivolt-level bias variations at the plunger and middle barrier gates can reduce the gate fidelity below 99%, a common threshold target for many fault-tolerant quantum-computing algorithms. Gate-referred 1/f charge-noise effects are also analyzed through the resulting coherence time.

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

DVG-WM: Disentangled Video Generation Enables Efficient Embodied World Model for Robotic Manipulation

Ziyu Shan, Zhenyu Wu, Xiaofeng Wang et al.

Video-based embodied world models provide an appealing substrate for robotic manipulation by predicting future states, yet current approaches remain limited by a fundamental entanglement: accurately modeling dynamics typically requires low-level temporal reasoning, while producing high-resolution frames demands expansive visual synthesis according to high-level semantics. This entanglement results in slow inference speed for iterative planning or too coarse predictions to retain contact-rich details. To solve this dilemma, we present Disentangled Video Generation World Model (DVG-WM), an efficient framework that explicitly decomposes world modeling into dynamics learning and visual synthesis. Conditioned on an initial observation and a language instruction, our model first generates a plausible sequence of intermediate visual states to preview the physical interaction and refines them to obtain high-fidelity videos. Furthermore, an efficient cascading mechanism is proposed, where DVG-WM uses flow matching to directly map the dynamics to video latents, and introduces a latent degradation mechanism to regenerate contact-rich details. Experiments on LIBERO and real-world platforms demonstrate improved video quality with up to 3.97 times acceleration, validating that disentangled video generation can be an efficient embodied world model for robotic manipulation.

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

Freeform Preference Learning for Robotic Manipulation

Marcel Torne, Anubha Mahajan, Abhijnya Bhat et al.

Reward design remains a central bottleneck for autonomous robot policy improvement, especially in long-horizon manipulation tasks where sparse success labels provide too little signal and binary preferences collapse many competing notions of quality into one ambiguous signal. We introduce Freeform Preference Learning (FPL), a method for learning robot policies from freeform human preferences. Rather than asking annotators which of two trajectories is better overall, FPL lets them define natural-language preference axes, such as speed, safety, quality of placement, or carefulness, and provide pairwise preferences along each axis. These annotations are used to learn a language-conditioned reward model that maps a trajectory and preference label to an axis-specific reward. We use this model to train a reward-conditioned policy that optimizes across the multiple human-specified dimensions. Across four real-world and two simulated long-horizon manipulation tasks, FPL improves over sparse-reward and binary-preference methods by 38 percentage points. Beyond improved performance, FPL learns dense progress signals without explicit subtask segmentation, shows compositionality of behavior not present in the data, and allows users to steer the policy towards different behaviors at test time without retraining. Blog post with videos available at https://freeform-pl.github.io/fpl.website/

Space & Physics | arXiv | 2026-06-30 | Skeptical (25)

FLORA: A deep learning approach to predict forest attributes from heterogeneous LiDAR data

Emilie Vautier, Clément Mallet, Cédric Vega

Forest attributes are essential for national-scale resource monitoring. Airborne LiDAR metrics are among the auxiliary variables most strongly correlated with forest attributes used in National Forest Inventory (NFI) estimates. However, producing wall-to-wall predictions remains challenging when LiDAR data are acquired under heterogeneous conditions. As national LiDAR programs expand across Europe, variability in sensors, flight parameters, seasons, and scan angles limits the robustness of existing models, which are often calibrated for local conditions. We present FLORA (Forest LiDAR Octree Regression with Auxiliary Data), a deep learning framework that predicts six forest attributes (dominant height, total volume, deciduous volume, coniferous volume, basal area, and stem density) from heterogeneous LiDAR point clouds. FLORA combines an octree-based backbone with ecological and spatiotemporal auxiliary variables through a late-fusion gating mechanism. Models are trained and evaluated on 32,052 National Forest Inventory plots across mainland France using data from the French LiDAR HD program. A single model trained on both leaf-on and leaf-off acquisitions outperforms season-specific models and improves cross-season robustness. Auxiliary variables provide modest overall gains but contribute more strongly to species-specific volume prediction. FLORA achieves an rRMSE of about 12.3% (R² = 0.88) for dominant height and 39% (R² = 0.74) for total volume, providing a robust baseline for large-scale forest attribute estimation from heterogeneous national LiDAR programs.

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

SemRF: A Semantic Reference Frame for Residual-Stream Dynamics in Language Models

Jian Gu, Aldeida Aleti, Chunyang Chen et al.

Residual-stream analysis asks how language-model computation evolves across depth, but intermediate decoding requires comparable readout coordinates across layers. If embedding anchors and unembedding readout disagree on the chosen span, apparent motion may reflect measurement drift rather than computation. We introduce Semantic Reference Frames (SemRF), an anchor-based formalism separating semantic measurement from residual dynamics. A SemRF fixes anchors and measures states against them. Pseudo-inverse tying gives exact synchronization; under restricted bi-invertibility, SemRF yields stable semantic-basis coordinates, distortion bounds, and near-identity changes. With the frame fixed, residual computation becomes a depthwise semantic trajectory. The anchors induce a semantic Voronoi diagram: distance, or evidence such as logits, assigns each layer to a coarse cell, while coordinates retain within-cell motion and margins. We define layerwise steps, contribution profiles, and imbalance diagnostics, then use the Voronoi trace to define a margin-relaxed tube. The canonical trace is the minimum-action path inside this tube; when nonempty with positive quadratic weight, it is unique and obeys a discrete spline equation away from active constraints. Excess action controls step, curvature, and profile mismatch. Low curvature implies piecewise-linear compressibility and local knowledge density: lower trace complexity means fewer semantic knots. Through the parameter-to-trajectory map, this gives a conditional link to parameter efficiency: among admissible settings fitting data, lower-action and lower-complexity traces use fewer semantic degrees of freedom. The guarantees require controlled interface error and small projection residual under explicit tube constraints.

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

Automated Background Swapping for Robustness against Spurious Backgrounds

Cesar Roder, Kajetan Schweighofer

Classifiers based on Deep Neural Networks exhibit strong performance across domains, yet can fail catastrophically if they rely on spurious correlations, i.e., features that are predictive of the target label in the training data but are not causally linked and thus fail to generalize. For the vision domain, many such spurious correlations manifest themselves within the background of the image, where only the foreground is predictive of the class label. In this paper, we introduce Automated Background Swapping (AutoBackSwap) to reduce the reliance of classifiers on such spurious backgrounds. AutoBackSwap uses a secondary network to disentangle the foreground and background, followed by infilling to synthesize complete backgrounds, and finally combines different foregrounds and inpainted backgrounds to augment the training data. We find that patch-wise labeling of just a few hundred samples suffices to train the secondary network and automatically augment the full training dataset on challenging image classification tasks. In contrast to many previous methods, AutoBackSwap proves very effective even if there is not a single sample in the training data breaking the spurious correlation. Across a range of image classification tasks with spurious backgrounds, AutoBackSwap consistently outperforms prior methods.

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

TRIAGE: Role-Typed Credit Assignment for Agentic Reinforcement Learning

Yuanda Xu, Zhengze Zhou, Hejian Sang et al.

Agentic reinforcement learning requires assigning credit to environment-facing actions such as searches, clicks, edits, navigation commands, and object interactions. Standard GRPO uses the final verifier outcome as a uniform advantage over all action tokens. This outcome signal is useful but structurally incomplete: it punishes useful exploration in failed rollouts and reinforces redundant or regressive actions in successful rollouts. We propose TRIAGE, a role-typed credit assignment framework that adds a semantic role axis to outcome credit. A structured judge classifies each segment as decisive progress, useful exploration, no-progress infrastructure, or regression, and a fixed role-conditioned rule maps these labels to bounded segment-level process rewards. This keeps verifier outcomes as the source of optimization direction while correcting the two main blind spots of outcome-only credit. We further show that role-conditioned credit is the optimal segment-level correction expressible from role labels alone -- a projection of the per-segment advantage residual onto the role variable -- so that the fixed role constants reduce advantage estimation error whenever the judge is reliable, and we connect this to lower-variance policy gradients. Across ALFWorld, Search-QA, and WebShop, TRIAGE improves success rates over GRPO for two policy models and outperforms both a scalar judge-derived process reward and an outcome-supervised shared-backbone value baseline. Ablations show that the gain comes from role typing rather than merely adding dense rewards: reliable detection of regression inside successful trajectories is the dominant contributor, while exploration credit provides a consistent secondary gain; on completed ALFWorld and WebShop rollouts, TRIAGE also reduces environment-facing turns by an additional 10.4% and 14.8% relative to GRPO.
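
The role-conditioned rule described above can be pictured with a small sketch: each segment keeps the rollout's outcome advantage and receives a bounded, role-dependent correction. The abstract gives neither the constants nor how the correction is combined with the outcome signal, so the values in `ROLE_REWARD`, the additive combination, and the name `segment_credit` are hypothetical.

```python
# Hypothetical role constants; the abstract only says they are fixed and bounded.
ROLE_REWARD = {
    "decisive_progress": 0.5,
    "useful_exploration": 0.2,
    "no_progress_infrastructure": 0.0,
    "regression": -0.5,
}


def segment_credit(outcome_advantage, roles, weight=1.0):
    """Per-segment credit = shared outcome advantage + role-conditioned term.

    Assumption: the two signals are added. The outcome advantage is identical
    for all segments of a rollout (as in outcome-only GRPO); only the role
    term varies from segment to segment.
    """
    return [outcome_advantage + weight * ROLE_REWARD[role] for role in roles]


# A successful rollout containing a regressive segment: the regression is
# credited less than the decisive step even though the rollout succeeded.
success = segment_credit(1.0, ["useful_exploration", "regression", "decisive_progress"])

# A failed rollout: useful exploration is penalized less than regression.
failure = segment_credit(-1.0, ["useful_exploration", "regression"])

print(success)
print(failure)
```

With bounded role terms smaller in magnitude than the outcome advantage, the sign of each segment's credit still follows the verifier outcome, which mirrors the abstract's statement that outcomes remain the source of optimization direction.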

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

FedLAB: Traceable Semantic Codebooks for Federated Multimodal Graph Foundation Learning

Zekai Chen, Kairui Yang, Xuaner Chen et al.

Multimodal graph foundation models aim to learn reusable knowledge from graphs enriched with text, images, attributes, and relational topology, thereby supporting diverse graph-centric and modality-centric tasks. In practice, however, such multimodal graphs are often distributed across decentralized clients, where raw contents and local structures cannot be centrally shared due to privacy constraints. This motivates federated multimodal graph foundation learning, which requires not only transferable representation learning but also intrinsic semantic traceability under strict data isolation. Existing methods usually exchange or store knowledge through parameters, prototypes, embeddings, or compact codebooks, which support optimization and transfer but do not explicitly expose how modality evidence, node semantics, and topology context jointly support predictions. To bridge this gap, we propose FedLAB, a traceable semantic codebook framework that organizes multimodal graph knowledge into typed hierarchical codebooks for modality evidence, node semantics, and topology context. FedLAB further refines these trace units through federated semantic barycenter pre-training while keeping raw multimodal contents and graph structures local. Extensive experiments on 10 benchmarks and 6 downstream tasks show that FedLAB improves over state-of-the-art baselines by up to 7.53%, while preserving a native semantic trace interface.

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

CoMet: Context and Multiplicity Decomposition for Multimodal Uncertainty Estimation

Sanghyuk Chun, William Yang, Amaya Dharmasiri et al.

Uncertainty estimation has been a long-standing challenge in AI models; it amounts to "knowing what you don't know," and metacognition is notoriously difficult even for humans (cf. the Dunning-Kruger effect). Although it is still far from solved even in simpler classification systems, tackling it in multimodal large language models (MLLMs) is becoming increasingly important. Within MLLMs, uncertainty can stem from any of the diverse sources as well as from their relationships, and further can stem from the unbounded answers in the open-ended setting. To tackle the issues, we propose CoMet, an MLLM uncertainty estimation method by decomposing uncertainty into a context-specific term and a multiplicity-specific term. The former captures ambiguity induced by the given context (e.g., task or prompt), while the latter captures how many plausible answers determined by the context remain compatible with the given input. We train a lightweight post-hoc uncertainty module to estimate these quantities, which enables efficient uncertainty estimation without autoregressive answer generation or repeated sampling. Experiments on various open-ended multimodal benchmarks, hallucination detection, and multiple-choice visual question answering benchmarks show that CoMet consistently improves uncertainty estimation over existing baselines while remaining efficient in practice. Code is available at https://github.com/princetonvisualai/comet_uncertainty

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

Human-as-Humanoid: Enabling Zero-Shot Humanoid Learning from Ego-Exo Human Videos with Human-Aligned Embodiments

Xiaopeng Lin, Ruoqi Yang, Shijie Lian et al.

Vision-language-action (VLA) models across robot embodiments require high-quality observation-action supervision to learn deployable action distributions, yet scaling such robot data remains difficult, especially for high-DoF humanoids. Teleoperation provides controller-aligned supervision, while human egocentric videos capture diverse bimanual manipulation but do not directly provide executable robot actions. We introduce Human-as-Humanoid, a human-to-humanoid supervision framework that enables near-real-time human-centric action generation, making human demonstrations usable for high-DoF humanoid VLA training by jointly aligning the robot embodiment, the sensing setup, and the action-label interface. Built on PrimeU, a human-aligned 60-DoF upper-body humanoid, Human-as-Humanoid uses synchronized ego-exo videos to pair deployment-aligned egocentric observations with exocentric motion recovery, retargets the recovered human motion through staged Inverse Kinematics (IK) into controller-aligned 60-DoF action chunks, and trains the VLA model with Forward Kinematics (FK)-aware supervision to preserve wrist and fingertip task-space geometry. This converts large-scale human demonstrations from visual observations into executable observation-action supervision for the target humanoid. Experiments validate the conversion chain at the motion-recovery, robot-action-space, and real-robot deployment levels. Human-as-Humanoid yields a 4.8x to 7.2x raw demonstration-throughput gain over humanoid teleoperation in our data-collection analysis, and on several downstream tasks, policies post-trained only with the converted human labels generalize to real-robot deployment without target-task robot demonstrations. The official project website is available at https://zgc-embodyai.github.io/Human-as-Humanoid.

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

Surrogate Fidelity: When Can Open LLMs Explain Closed Ones?

Philippe Chlenski, Zachariah Carmichael, Ayush Warikoo et al.

Mechanistic interpretability (MI) requires full access to model internals, yet the APIs for most widely deployed language models at best expose log-probabilities over output tokens. This creates a surrogate problem: when do measurements made on open models allow us to make claims about a closed model? We evaluate surrogate fidelity at the prediction, attribution, and representation levels. For binary classification tasks, log-odds provide an API-compatible scalar readout of the model's representation space, and leave-one-out attributions provide insight into model behavior. Across eleven models spanning four families (Llama, Qwen, GPT, and Gemini), we find that prediction fidelity substantially overstates attribution fidelity: models that agree on what the answer is often disagree on why. We document an access-validity inversion: white-box signals like attention patterns and perturbation magnitudes are highly stable across models but only weakly predictive of causal attributions, which black-box input ablations capture by design. Mechanistic insight does not automatically transfer to closed targets, and prediction-level agreement is insufficient to warrant such transfer. Code and results are available at https://github.com/facebookresearch/surrogate.
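
The two API-compatible quantities named above, log-odds and leave-one-out attributions, can be sketched for a binary classifier that exposes only a probability. Here `toy_prob` is a hypothetical stand-in for a real model or API call, and its word weights are invented for illustration; nothing below reproduces the paper's code.

```python
import math


def log_odds(p):
    """Log-odds of the positive class given its probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def leave_one_out(tokens, prob_fn):
    """Attribution of each token: drop in log-odds when that token is removed."""
    base = log_odds(prob_fn(tokens))
    return [
        base - log_odds(prob_fn(tokens[:i] + tokens[i + 1:]))
        for i in range(len(tokens))
    ]


def toy_prob(tokens):
    """Hypothetical model: logistic function of summed per-word weights."""
    weights = {"great": 2.0, "boring": -1.5}  # invented for illustration
    z = sum(weights.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))


tokens = ["a", "great", "boring", "film"]
for token, score in zip(tokens, leave_one_out(tokens, toy_prob)):
    print(f"{token:>7}: {score:+.2f}")
```

Running the same procedure against an open and a closed model and comparing the resulting attribution vectors is one way to probe attribution-level agreement separately from agreement on the predicted label.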

Quantum Technology | arXiv | 2026-06-30 | Skeptical (25)

Efficient entanglement of three remote single-atom quantum-network nodes

Matthias Seubert, Leonardo Ruscio, Tobias Frank et al.

Entanglement distributed over a set of individually addressable qubit nodes is the enabling resource for a plethora of applications ranging from tests of quantum physics to secure and modular quantum information networks. Entanglement between two memory qubits has been realized on various platforms, but extension to more nodes remains rare and formidably challenging. The principal bottleneck is the efficiency of the light-matter interfaces connecting the qubit nodes to their communication channels. Here, we efficiently generate, distribute and store a three-qubit entangled state across three independent laboratories containing single atoms coupled to optical resonators. We sequentially entangle the atoms pairwise, two by heralded photonic entanglement swapping and two by heralded state transfer. We reach a three-qubit entanglement fidelity of 77(1)% and an entanglement lifetime above 200 µs. The observed qubit correlations violate Mermin's inequality while closing the detection loophole. Our three-qubit entanglement-generation efficiency is 0.16%. This unprecedented efficiency of our scheme establishes a clear route towards multi-node quantum networks.

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

PolicyGuard: From Organizational Policies to Neuro-Symbolic Compliance Review Engines

Sameer Malik, Ayush Singh, Amar Prakash Azad

Policy-grounded document review requires determining whether a target document complies with organization-specific policies, guidelines, or playbooks. While large language models can assist with policy interpretation and document analysis, end-to-end prompting leaves the applied policy logic implicit, making compliance decisions difficult to inspect, update, and test. We present PolicyGuard, a neuro-symbolic framework for policy-grounded document compliance review. PolicyGuard converts organizational policy guidance into an executable review engine consisting of typed relational logic rules and atom-level extraction questions. During review, LLMs answer these local questions using retrieved document evidence, and a symbolic evaluator applies the formal rules to detect non-compliance. We instantiate and evaluate PolicyGuard on company-specific NDA compliance review, where contract clauses must be checked against organization-specific negotiation policies. By separating policy formalization, local document interpretation, and symbolic compliance evaluation, PolicyGuard makes document review more explicit, maintainable, and systematically testable.

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

Self-Study Reconsidered: The Hidden Fragility of Learning from Self-Generated QA

Ekaterina Alimaskina, Denis Shveykin, Gleb Molodtsov et al.

Language models are increasingly taught from synthetic question-answer (QA) supervision: a model generates questions about a document, answers them from the same text, and the resulting pairs are used to fine-tune, distill, or compress knowledge into another model. We show that this generation step is not neutral preprocessing. It is an implicit policy that both selects which evidence becomes training signal and decides how that evidence is answered, and it is fragile at both stages. When choosing what to ask, generators do not scan a document uniformly. Coverage saturates early and concentrates on salient spans, diverse prompts converge on the same regions, and what looks question-worthy is driven by local presentation. As a result, salient artifacts such as poorly cleaned markup can hijack question generation across model families and scales. When answering, the model that produces the supervision tends to obey instruction-like passages embedded in the text. This compliance depends on the intent and surface form of the passage rather than its strictness, and is worst under task conflict, where larger models comply more often. These failure modes arise from choices made during QA generation, so they can be reduced without changing the training loop. Tying each question to a fixed target reduces biased selection, and filtering instruction-like spans before answering lowers mean injection compliance from 88% to 13% in our evaluation while retaining nearly all clean text.

Bio-Engineering | arXiv | 2026-06-30 | Skeptical (25)

GR2 Technical Report

Yufei Li, Zaiwei Zhang, Mingfu Liang et al.

Industrial recommendation systems serve billions of users through a multi-stage funnel -- retrieval, early-stage ranking, and re-ranking -- where the final re-ranking step disproportionately shapes user engagement and downstream performance, particularly for carousel and grid display formats. Despite growing enthusiasm for Large Language Models (LLMs) in recommendation, three gaps hinder industrial adoption: (1) most efforts target retrieval and ranking, leaving re-ranking -- the stage closest to the final user experience -- largely underexplored; (2) LLMs are typically deployed zero-shot or via supervised fine-tuning, underutilizing the reasoning capabilities unlocked by reinforcement learning (RL) on verifiable rewards; (3) deployed catalogs index billions of items with non-semantic identifiers that lie outside any base-LLM vocabulary. We present GR2 (Generative Reasoning Re-Ranker), an end-to-end framework that combines (i) mid-training on semantic IDs produced by a tokenizer with >=99% uniqueness, (ii) reasoning-trace distilled from a stronger teacher via targeted prompting and rejection sampling, and (iii) RL with verifiable rewards purpose-built for re-ranking. To make GR2 resource-viable, we further (iv) introduce a context compressor that amortizes training cost, On-Policy Distillation (OPD) as a scalable alternative to SFT -- which we find collapses at industrial scale -- and reasoning distillation for low-latency serving. GR2 delivers +18.7% R@1, +7.1% R@3, and +9.6% N@3 over legacy baselines on industrial-scale traffic. We further find that reward design is critical in re-ranking: LLMs often hack rewards by preserving the incoming order or exploiting position bias, motivating conditional verifiable rewards as essential industrial components.

Quantum Technology | arXiv | 2026-06-30 | Skeptical (25)

Spatially Coupled MacKay-Neal/Hsu-Anastasopoulos CSS Codes Achieve the Quantum-Erasure Hashing Bound by Seeded BP Decoding

Kenta Kasai

In classical sparse-graph coding, spatial coupling is a mechanism by which belief-propagation (BP) decoding attains the maximum-a-posteriori (MAP) or area-threshold performance of the uncoupled system. Since MacKay-Neal/Hsu-Anastasopoulos (MN/HA) punctured sparse ensembles achieve capacity under MAP decoding, it is natural to ask whether spatially coupled MN/HA-type Calderbank-Shor-Steane (CSS) codes can reach the hashing bound on the quantum erasure channel under seeded BP decoding. We answer this question at the density evolution (DE) level for hard-erasure CSS decoding. On an erased coordinate, the two binary Pauli components remain unresolved, equivalently the erased qubit is represented by the four Pauli possibilities. We first define the CSS ensemble through sparse punctured matrices and the corresponding dense parity-check matrices. For fixed finite Z-side, X-side, and check degrees, we then derive a five-message uncoupled DE recursion, decompose it into Z-side and X-side constituent systems, and define the two constituent potentials. Applying the coupled-vector potential method to the two constituents separately proves that seeded BP decoding on the resulting finite-degree factor graphs reaches the smaller of the Z-side degree ratio and the X-side complementary degree ratio. In the X/Z equal-rate specialization, where the Z-side and X-side constituent design rates are equal, this BP threshold is the hashing-bound channel parameter determined by the design rate. Thus the paper gives a DE-level proof that seeded BP decoding with finite-degree factor graphs achieves the hashing bound for the X/Z equal-rate family. Finite-length BP concentration, block-error convergence, and a finite-code realization of the ideal DE seed are separate questions.

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

Radial Suppression Accelerates Algorithmic Generalization: A Geometric Analysis of Delayed Generalization

Srijan Tiwari, Aditya Chauhan, Manjot Singh

Why do neural networks memorize algorithmic training data long before they generalize? We present a geometric case study demonstrating that, on tasks where generalization requires discovering structured low-dimensional circuits, the memorization-generalization delay is driven by radial inflation of hidden representations under cross-entropy optimization. We formalize a radial-angular decomposition of activation-space dynamics and derive three testable propositions: (i) that penalizing radial inflation induces anisotropic, data-dependent weight regularization; (ii) that it suppresses radial gradient energy below the isotropic random baseline, forcing predominantly angular updates; and (iii) that it biases convergence toward flatter minima. To empirically validate these propositions, we study a single-hyperparameter norm penalty that softly constrains activations to a sqrt(d)-radius hypersphere. On modular arithmetic, this penalty accelerates grokking up to 6x across MLPs and Transformers, and halves training steps for a 10M-parameter nanoGPT on 3-digit addition.
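
Illustrative sketch (not from the paper): one way to read "a norm penalty that softly constrains activations to a sqrt(d)-radius hypersphere" is a squared deviation of the activation norm from sqrt(d). The squared-deviation form, the function names `radial_penalty` and `radial_penalty_grad`, and the coefficient `lam` are assumptions made here for illustration; the abstract does not give the exact formula.

```python
import math

def radial_penalty(h, lam=0.1):
    """Hypothetical penalty lam * (||h||_2 - sqrt(d))^2 for one activation vector h."""
    d = len(h)
    norm = math.sqrt(sum(x * x for x in h))
    return lam * (norm - math.sqrt(d)) ** 2

def radial_penalty_grad(h, lam=0.1):
    """Gradient of the penalty w.r.t. h. It is parallel to h, i.e. purely radial."""
    d = len(h)
    norm = math.sqrt(sum(x * x for x in h))
    if norm == 0.0:
        return [0.0] * d  # gradient is undefined at the origin; return zeros
    scale = 2.0 * lam * (norm - math.sqrt(d)) / norm
    return [scale * x for x in h]

on_sphere = [1.0, 1.0, 1.0, 1.0]  # norm 2 = sqrt(4): no penalty
inflated = [3.0, 3.0, 3.0, 3.0]   # norm 6: penalized, gradient points outward

print(radial_penalty(on_sphere))
print(radial_penalty(inflated))
print(radial_penalty_grad(inflated))
```

Under this assumed form, a gradient-descent step on the penalty only shrinks or grows the norm of `h` and leaves its direction unchanged, which is consistent with the abstract's description of suppressing radial inflation.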

Space & Physics | arXiv | 2026-06-30 | Skeptical (25)

Electromagnetic radiation from a point-like charge in a weak gravitational wave: a Shapiro-delay-motivated approach

Vladimir Epp, Konstantin Osetrin, Taya But

We investigate the field of a point-like electric charge freely falling in a gravitational wave. In the presence of a gravitational wave, the initially static Coulomb field of the charge becomes time-dependent and generates corresponding radiation. The gravitational wave is treated as a weak perturbation of the Minkowski metric. The electromagnetic four-potential of the charge is sought as a solution to Maxwell's equations in the gravitational wave metric, to first order in perturbation theory. The potentials of the point charge are found in quadratures throughout the space. To regularize the potentials, an approach motivated by the Shapiro effect for the time delay of radiation in a gravitational field is used. The potentials of the charge in the far zone are calculated explicitly for a monochromatic, arbitrarily polarized gravitational wave. The angular distribution of the electromagnetic radiation induced by the gravitational wave is obtained.

Quantum Technology | arXiv | 2026-06-30 | Skeptical (25)

Quantum Information as a New Lens for Precision Neutrino Physics

Khushboo Dixit, Ritam Kundu, Papia Panda et al.

We present a quantum-information-theoretic study of three-flavor neutrino oscillations in long-baseline experiments by mapping flavor states to qubit-like representations and quantifying quantum correlations through total concurrence. The local minima of this entanglement measure identify energy regions where the flavor state is closest to separability, enabling cleaner extraction of oscillation parameters. We explain how these local minima offer opportunities for precision measurements and provide insight into the accurate determination of neutrino oscillation parameters. We then propose a strategy to improve parameter extraction by aligning the benchmark oscillation regions of NO$ν$A and T2K with the minimum entanglement achievable in each experiment. This shifts the concurrence minima toward higher-event-count energy regions, leading to tighter constraints and reducing the tension arising from their different energy regimes. For normal ordering, we obtain $(0.581^{+0.0136}_{-0.0150},\,195^{+38}_{-32}\,^\circ)$ in the $(\sin^2θ_{23},δ_{\rm CP})$ plane and $(0.580^{+0.0140}_{-0.0153},\,2.515^{+0.0344}_{-0.0344}\times10^{-3}\,\mathrm{eV}^2)$ in the $(\sin^2θ_{23},Δm^2_{31})$ plane, yielding improved joint constraints. Using GLoBES simulations together with real data, we assess how local minima of quantum correlations influence leptonic CP-violation sensitivity, $θ_{23}$ octant-degeneracy resolution, and mass-ordering determination. Our results show that minimizing entanglement can significantly affect these key sensitivities, highlighting quantum information measures as complementary probes of neutrino flavor oscillations and offering new insight into the role of quantum correlations in precision neutrino physics.

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

OopsieVerse: A Safety Benchmark with Damage-Aware Simulation for Robot Manipulation

Arnav Balaji, Arpit Bahety, Sriniket Ambatipudi et al.

While robotic manipulation capabilities have advanced rapidly, physical safety remains a major barrier to deploying household robots: task success is insufficient if the robot damages itself or its surroundings. Simulation offers a harm-free alternative to costly and dangerous real-world training and evaluation, yet existing simulators lack general mechanisms to detect, quantify, and represent damage. To address this gap, we introduce OOPSIEVERSE, a unified simulation framework and benchmark for damage-aware household manipulation. OOPSIEVERSE provides damage as an explicit, physically-grounded, and task-agnostic signal by converting sources such as contact forces, temperature changes, and liquid interactions into corresponding mechanical, thermal, or fluid damage. OOPSIEVERSE comprises two core elements: (1) DAMAGESIM, a simulator-agnostic framework for detecting and quantifying damage during navigation and manipulation, and (2) a suite of household tasks designed to evaluate common damage modes and distinguish between task completion and safe execution. We demonstrate the generality of our framework by instantiating DAMAGESIM in two simulators with different physics backends, OmniGibson (Nvidia Omniverse) and RoboCasa (MuJoCo). We further showcase the utility of OOPSIEVERSE across multiple use cases, including (1) guiding safer demonstration collection via real-time damage feedback, (2) learning safer manipulation policies through damage-conditioned imitation learning and reinforcement learning, (3) benchmarking the safety of state-of-the-art Vision Language Action policies, and (4) improving real-world safety of sim-to-real transferred policies. Together, our results highlight the potential of OOPSIEVERSE as an open-source foundation for systematic, scalable research on safe robot manipulation. For code and more information, please refer to https://robin-lab.cs.utexas.edu/oopsieverse/

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

Amplifying Membership Signal Through Chained Regeneration

Wojciech Łapacz, Stanisław Pawlak

The tendency of large generative models to memorize training data makes sample verification critical for privacy auditing and copyright enforcement. Current membership (MIA) and dataset inference (DI) attacks often rely on one-shot generations, which yield weak signals and limited sensitivity across modalities. Inspired by Model Autophagy Disorder (MAD), we introduce MADreMIA, a model-agnostic framework that enhances white-, gray-, and black-box MIA and DI. Rather than relying on shadow model training -- often infeasible for large generative models -- our framework facilitates scalable inference by leveraging inherent signals through iterative trajectories. This process utilizes chained generations across diverse modalities, where each output serves as the subsequent input, to improve membership evidence at low FPR. We demonstrate that memorized training samples exhibit significantly higher coherence and slower degradation during iterative regeneration than non-member generations. Our results show that MADreMIA provides richer signals across diverse model families and modalities; we present comprehensive evaluations for IARs, diffusion, and language models, alongside preliminary results demonstrating its potential for audio models.

Space & Physics | arXiv | 2026-06-30 | Skeptical (25)

Joint inference of weak lensing convergence map and cosmology with diffusion models

Benjamin Remy, Chihway Chang, Rebecca Willett

We present a method for joint inference of cosmological parameters and convergence maps from weak lensing observations, targeting the full posterior conditioned on the observed shear field. Our approach uses implicit inference with diffusion models, learning the joint distribution from simulations, without the need to have an explicit and differentiable forward model for gradient-based MCMC sampling. We introduce a transformer-based architecture that operates in pixel space and treats cosmological parameters as additional tokens in a unified sequence, enabling efficient multimodal processing within a single network. At inference time, the trained model generates posterior samples of joint convergence maps and cosmological parameters conditioned on observed noisy shear fields. We demonstrate the method on simulated weak lensing data generated from log-normal fields in a $w$CDM cosmology. The model accurately reconstructs convergence maps and recovers cosmological posteriors that agree with traditional MCMC, while remaining well calibrated across the prior, with a MIRA calibration score of $0.635 \pm 0.017$ on the joint posterior (where $0.667$ is optimal). The inferred fields reproduce the correct two-point statistics as well as non-Gaussian statistics such as the one-point distribution. This work establishes diffusion-based implicit inference as a viable route toward full field-level cosmological analyses, paving the way for applications to more realistic, non-differentiable simulators.

Bio-Engineering | arXiv | 2026-06-30 | Skeptical (25)

LUNA: Learning Universal 3D Human Animation Beyond Skinning

Peng Li, Rawal Khirodkar, Junxuan Li et al.

Creating photorealistic, animatable 3D human avatars from monocular images still largely depends on Linear Blend Skinning (LBS) and parametric body models, which constrain expressivity and often introduce artifacts due to imperfect fitting. We propose LUNA, an LBS-free universal neural animation model that directly maps multiple 2D controls like images, keypoints, sketches, and unseen characters into 3D Gaussian deformations, bypassing explicit body fitting. At its core, a transformer-based motion regressor disentangles global rigid motion from fine-grained local dynamics to capture both coherent movement and subtle non-rigid effects. To resolve the inherent ambiguity of 2D-to-3D lifting while scaling beyond fitted datasets, we introduce hybrid supervision that distills soft structural priors from an LBS teacher and a loss that supports training on both limited fitted data and large in-the-wild unlabeled videos. Extensive experiments show LUNA achieves competitive visual fidelity compared to LBS-based approaches, while delivering realistic human motion and zero-shot cross-identity generalization across diverse driving modalities. To the best of our knowledge, LUNA is the first end-to-end 3D animatable model that supports implicit 2D driving.

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

Evaluation of Population Initialization Methods for Genetic Programming-based Symbolic Regression

Lukas Kammerer, Gabriel Kronberger, Deaglan J. Bartlett et al.

We analyze the effect of optimizing the initial population of genetic programming (GP) for symbolic regression (SR) on the accuracy and complexity of solutions. We compare three well-established random initialization methods as well as initialization with small optimized solutions from exhaustive symbolic regression (ESR) using a GP/SR implementation which is based on the multi-objective evolutionary algorithm NSGA-II. We compare the final Pareto fronts found with each initialization method on twelve synthetic problems of varying complexity and one real-world dataset. We find no significant differences in accuracy or model complexity among the initialization methods. The initial advantage of initialization with ESR disappears after only a few generations. Our results show that, given similar diversity in the initial population, the effect of the initialization method in GP-based symbolic regression on the final Pareto front is negligible.

Space & Physics | arXiv | 2026-06-30 | Skeptical (25)

Constraining dark energy with complementary probes of large-scale structure

Neel Shah, Kazuya Koyama, Johannes Noller et al.

To observationally pin down the nature of dark energy, it is essential to consistently model cosmological perturbations in the presence of dark energy alongside the background expansion and constrain this joint theory space with a large array of complementary probes. Here, we achieve this by constraining a model in the Effective Field Theory of Dark Energy (EFTofDE) framework by supplementing probes of the expansion history with several probes of large-scale structure: redshift space distortions (RSD) from DESI DR1, $3\times2$pt measurements from DES Y3, and the Integrated Sachs-Wolfe effect from cross-correlating CMB temperature anisotropies with galaxy number counts or CMB lensing. We demonstrate the complementarity of different probes, which leads to strong improvements in constraints on DE perturbations. For our most constraining dataset combination that supplements CMB+BAO+SNe probes with DESI DR1 RSD, DES Y3 $3\times2$pt and ISW cross-correlations between CMB temperature and galaxy counts, we find an improvement in the Figure of Merit (FoM) for the DE perturbation parameters $\{c_B, c_M\}$ by a factor of 2.69. We show the phenomenological implications of these constraints by mapping them to the present-day values of the phenomenological functions $\{μ(z), Σ(z)\}$, where we see an FoM improvement by a factor of 3.37. We find a significant interdependence between the posteriors of $\{w_0, w_a\}$ and $\{c_B, c_M\}$, caused by the theoretical prior imposed by the gradient stability condition within the EFTofDE framework. Finally, we compute the significance of deviation from $Λ$CDM for the EFTofDE model when constrained with CMB+BAO+SNe datasets, finding it to be at 2.9$σ$. This significance is nontrivially similar to the significance for the $w_0w_a$CDM model for the same dataset combination, which we find to be 3.1$σ$.

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

TreeAgent: A Generalizable Multi-Agent Framework for Automated Bias Labeling in Forestry via Compiled Expert Rules and Vision-Language Models

Shiyi Chen, Nicholas Saban, Collin Hargreaves et al.

Human-labeled data are widely used as reference annotations in ML, despite known variability across annotators in many expert-driven domains. In addition, expert annotation is slow, inconsistent, and remains a major bottleneck for scaling tasks like tree height bias classification in forestry remote sensing. We propose a multi-agent system (MAS) that orchestrates expert decision trees with Vision-Language Models (VLMs), treating the decision tree as a structural prior while VLMs perform localized semantic perception at individual nodes, with multi-agent voting to mitigate VLM stochasticity. We formalize a Decoupled Declarative Decision (D3) Framework that enables zero-modification generalization across diverse expert-defined decision structures. On a tree bias classification testbed, our framework outperforms supervised ML baselines and reduces the amount of expert labeling effort required. These results suggest that agentic orchestration of VLMs with expert priors can reproduce expert-defined labeling procedures at substantially lower annotation cost while maintaining interpretability.

Space & Physics | arXiv | 2026-06-30 | Skeptical (25)

Reheating in No-Scale Models of Inflation

Ignatios Antoniadis, John Ellis, Dimitri V. Nanopoulos et al.

Analogously to the suppression of inflaton decays into conformally-coupled scalar fields in the original Starobinsky $R + R^2$ model of inflation, inflaton decays to Standard Model fields are also suppressed in minimal no-scale models of inflation with field space curvature $\mathcal{R} = 2/3$. We study how this suppression can be avoided in generalized no-scale inflationary models. These include models in which the field space curvature $\mathcal{R} = 2/(3α)$ with $α\ne 1$ as exemplified by models derived from string theory, as well as models with non-minimal gauge kinetic terms and anomaly-induced couplings. We analyze direct and anomaly-induced inflaton couplings to gauge bosons and gauginos and demonstrate the Kähler-frame invariance of the physical gauge coupling. We determine the resulting reheating temperatures and the corresponding predictions in the $(n_s,r)$ plane. Finally, we consider an $R^3$ deformation of Starobinsky supergravity, which modifies the inflaton and stabilizer sectors but does not, by itself, generate new tree-level inflaton couplings to visible matter fields.

Quantum Technology | arXiv | 2026-06-30 | Skeptical (25)

The contact temperature of arbitrary quantum states

Alain Joye, Marco Merkli

An intuitive scheme to assign a temperature to an arbitrary state of a quantum system is to investigate the heat flow resulting from the coupling to a thermometer. We introduce a simple model of a universal thermometer with the following property. When it is prepared in a Gibbs equilibrium state at inverse temperature $β\in\mathbb R$ and brought into thermal contact with a system in any state, the heat flow between the system and thermometer vanishes for a unique value of $β$. We call this value the contact temperature $β_{\rm op}\in\mathbb R$ of the system state. The thermometer is universal in that it yields a unique contact temperature for arbitrary states of finite dimensional quantum systems.

Bio-Engineering | arXiv | 2026-06-30 | Skeptical (25)

MECoBench: A Systematic Study of Multimodal Agent Collaboration in Embodied Environments

Qingyun Liu, Jiwen Zhang, Jingyi Hu et al.

Recent multimodal large language models (MLLMs) have strong potential as embodied agents, but their ability to collaborate in visually grounded environments remains underexplored. To address this gap, we introduce MECoBench, a multimodal embodied cooperation benchmark with an evaluation platform spanning diverse real-world tasks, two cooperation structures, and three collaboration modes. Through extensive experiments across various MLLMs, we summarize three key findings: (i) Collaboration generally improves embodied task completion, but its benefits depend on balancing collaborative gains against coordination complexity. (ii) Communication is essential to collaboration gains, while the best collaboration mode depends on team size and model capability. (iii) Moreover, collaboration improves robustness under noisy priors and exploration conditions. Generally, MECoBench provides a systematic testbed for understanding the mechanisms and limits of multimodal embodied collaboration. Code and dataset are available at https://github.com/q-i-n-g/MECoBench.

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

Signed-Permutation Coordinate Transport for RMSNorm Transformers

John Sweeney

Modern LLM workflows move coordinate-indexed objects across checkpoints: steering vectors, sparse autoencoders, top-$k$ neuron sets, attribution lists, and merge alignments. This is only well posed after fixing the model's residual-stream gauge, which we show is architecture-dependent: LayerNorm residual charts have permutation gauge $S_d$ (up to a global sign flip), while RMSNorm charts with generic per-channel gain have signed-permutation gauge $B_d = S_d \ltimes \{\pm 1\}^d$. Permutation-only alignment is therefore symmetry-incomplete for RMSNorm models. We introduce sign-marginalized Hungarian matching and prove a sharp failure mode: with decorrelated coordinates, raw signed-correlation matching has a structural permutation-accuracy ceiling at the positive-sign fraction of the true gauge, which sign-marginalization removes. We then make coordinate-preserving transport, not function-level merging, the primary object: composing saved-checkpoint local $B_d$ gauges along same-base fine-tuning trajectories recovers 91.1% of cross-run coordinates at 1500 steps versus 60.3% for endpoint matching, and the gain is not explained by merely routing through the base. The recovered gauge transfers tools that permutation-only alignment breaks: TinyLlama SAE reconstruction has NMSE 0.004 under $B_d$ versus 1.08 under $S_d$; Qwen sentiment steering preserves 95.8% of its effect versus 17.2%; refusal steering reverses sign under $S_d$; coordinate-preserving merges behave the same way. The same covariance governs stateful training: signed transport of AdamW state preserves the resumed trajectory, while permutation-only state follows a different one from a functionally identical checkpoint. Finally, gauge-sweep audits show index-level interpretability claims are reproducible only relative to an explicit gauge.
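
Illustrative sketch (not the paper's implementation): the toy example below contrasts raw signed-correlation matching with a sign-marginalized variant on coordinates related by a signed permutation. Two things are assumptions made here: "sign-marginalized" is read as scoring matches by absolute correlation, and a brute-force search over permutations stands in for the Hungarian algorithm, which is only practical for tiny `d`. The names `corr` and `match` are hypothetical.

```python
from itertools import permutations
import math

def corr(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def match(A, B, marginalize_sign=True):
    """Match coordinate i of A to coordinate perm[i] of B.

    With marginalize_sign=True the score is |correlation| (assumed reading of
    sign-marginalization); otherwise the raw signed correlation is used.
    Brute force over permutations replaces Hungarian matching for small d.
    """
    d = len(A)
    C = [[corr(A[i], B[j]) for j in range(d)] for i in range(d)]
    score = abs if marginalize_sign else (lambda c: c)
    best = max(permutations(range(d)),
               key=lambda p: sum(score(C[i][p[i]]) for i in range(d)))
    signs = [1 if C[i][best[i]] >= 0 else -1 for i in range(d)]
    return list(best), signs

# Three decorrelated coordinates observed over eight samples.
A = [
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
    [1, 1, 1, 1, -1, -1, -1, -1],
]
# B is A under a signed permutation: B[true_perm[i]] = true_signs[i] * A[i].
true_perm, true_signs = [2, 0, 1], [-1, 1, -1]
B = [None] * 3
for i in range(3):
    B[true_perm[i]] = [true_signs[i] * x for x in A[i]]

print(match(A, B))                              # recovers true_perm and true_signs
print(match(A, B, marginalize_sign=False)[0])   # only the positive-sign coordinate is right
```

In this toy case one of three true signs is positive, and raw signed matching gets exactly that one coordinate right, which mirrors the ceiling described in the abstract. For realistic `d`, an assignment solver would replace the permutation search.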

Bio-Engineering | arXiv | 2026-06-30 | Skeptical (25)

Introspective Coupling: Self-Explanation Training Tracks Behavioral Change Despite Fixed Supervision

Zifan Carl Guo, Laura Ruis, Jacob Andreas et al.

When does training language models (LMs) to generate explanations of their predictions yield faithful introspection, rather than superficial imitation? We study LMs trained to explain which features of their inputs influenced their behavior, using models' counterfactual behavior on modified inputs as supervision. Surprisingly, we find that LMs trained on fixed counterfactual explanations derived from earlier checkpoints of themselves, or even from behaviorally similar models in different families, frequently produce explanations more faithful to their own current behaviors than to those of their training targets. This "introspective" coupling between LM explanations and behaviors occurs when training explanations remain sufficiently correlated with current behaviors over the course of training, even as behaviors themselves shift. We also show that introspective coupling tracks behavior shifts: when explanation training is provided concurrently with other post-training objectives, explanations track those shifts without requiring updated supervision. This phenomenon appears in multiple tasks, including sycophancy and refusal, and is robust to label noise. Overall, our results show that even fixed datasets of counterfactual explanations can provide scalable and generalizable post-training signal for introspection.

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

AdaJEPA: An Adaptive Latent World Model

Ying Wang, Oumayma Bounou, Yann LeCun et al.

Latent world models enable planning from high-dimensional observations by predicting future states in a compact latent space. However, these models are typically kept frozen at test time: when their predictions become inaccurate, planning can fail, especially under test-time distribution shift. To address this, we propose AdaJEPA, an adaptive latent world model that performs test-time adaptation within the closed loop of model predictive control (MPC). After training, AdaJEPA plans and executes the first action chunk, uses the observed next-state transition as a self-supervised adaptation signal, and replans with the updated model. This closed-loop update continuously recalibrates the world model without additional expert demonstrations. Across a range of goal-reaching tasks, AdaJEPA substantially improves planning success with as few as one gradient step per MPC replanning step.

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

AxDafny: Agentic Verified Code Generation in Dafny

Benjamin Breen, Austin Letson, Borja Requena Pozo et al.

We study agentic code generation in Dafny, where a model must generate both executable code and the proof artifacts for verification. We present AxDafny, a verifier-guided repair framework that iteratively generates implementations, invariants, assertions, and termination arguments. We also introduce LiveCodeBench-Pro-Dafny (LCB-Pro-Dafny), a benchmark of 250 competition-style programming problems translated into Dafny with formal specifications and a verifier-based evaluation harness. On LCB-Pro-Dafny, AxDafny substantially improves verification success over baseline GPT-5.5 performance. On DafnyBench, AxDafny achieves 92.7% verification success, outperforming the strongest previously reported proof-hint baseline by 6.5 percentage points. Lastly, we show that verification success and runtime test performance measure different aspects of generated code.

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

Adapting Generalist Robot Policies with Semantic Reinforcement Learning

Jagdeep Singh Bhatia, Andrew Wagenmaker, William Chen et al.

Generalist robot policies learn a diverse repertoire of behaviors from large-scale pretraining. In principle, this makes them excellent priors for downstream adaptation via reinforcement learning (RL). In practice, however, standard RL methods leveraging this prior optimize directly over robot actions, requiring the base policy's action distribution to be close to that of a performant policy from the start. This assumption breaks down for complex or long-horizon tasks that fall outside the pretraining distribution. Our key insight is that, for sufficiently expressive generalist policies, language prompts are an effective alternative space for learning to solve such tasks: modulating language inputs elicits skills already within the policy's repertoire, which can be composed to solve tasks beyond its zero-shot capabilities. We propose Semantic Action Reinforcement Learning (SARL), which learns to optimize this prompt space through online interaction, treating the generalist policy as a controllable skill prior. Importantly, leveraging pretrained skills rather than learning new ones from scratch yields structured, semantically meaningful exploration and highly efficient online improvement, and learning to modulate prompts through experience grounds them in induced real-world behaviors for robust task-solving. Across real-world settings and simulated benchmarks, we show SARL unlocks fundamentally new capabilities -- adapting VLA behavior to solve complex, long-horizon tasks -- and significantly outperforms existing approaches for improving robot behavior in deployment.

Quantum Technology | arXiv | 2026-06-30 | Skeptical (25)

An efficient Pauli decomposition algorithm for structured matrices

Daniel J. Spencer, Kishor Bharti, Alexey V. Gorshkov

Decomposing classical matrices into linear combinations of Pauli strings is a major bottleneck for end-to-end implementations of near-term quantum algorithms. In this work, we consider a promise version of this Pauli decomposition problem in which the matrix is guaranteed to have support on only $k = \mathsf{poly}(n)$ Pauli strings and is given through classical sparse query access. Existing Pauli decomposition algorithms are designed for the generic, dense problem and do not inherently take advantage of this promised sparsity, so these approaches take time that is exponential in $n$. We present a randomized classical algorithm that does take advantage of this sparsity and recovers the exact Pauli decomposition with success probability at least $1 - δ$, for any $δ$. Under the stated access model, the algorithm executes with query and runtime complexity that is polynomial in $n$, $k$, and $\log(1/δ)$. These results show that, even though finding the Pauli decomposition is exponentially hard for general matrices, it becomes efficiently solvable for matrices that are known to be sparse in the Pauli basis, a regime that is relevant to near-term quantum algorithms operating on structured classical input.
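
Illustrative sketch (not the paper's algorithm): for context, this is the generic dense decomposition that the abstract contrasts against. It computes every coefficient as c_P = Tr(P A) / 2^n by looping over all 4^n Pauli strings, so its cost is exponential in n regardless of how sparse the answer is. The helper names `kron`, `pauli_string`, and `dense_pauli_decomposition` are hypothetical.

```python
from itertools import product

PAULIS = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}

def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def pauli_string(label):
    """Matrix of a Pauli string such as 'XI' (leftmost factor first)."""
    m = [[1]]
    for ch in label:
        m = kron(m, PAULIS[ch])
    return m

def dense_pauli_decomposition(A, tol=1e-12):
    """Return {label: coefficient} with A = sum_P c_P * P, visiting all 4^n strings."""
    dim = len(A)
    n = dim.bit_length() - 1  # dim is assumed to be a power of two
    coeffs = {}
    for label in map("".join, product("IXYZ", repeat=n)):
        P = pauli_string(label)
        # Tr(P A) / 2^n; Pauli strings are Hermitian, so no conjugation is needed.
        c = sum(P[i][j] * A[j][i] for i in range(dim) for j in range(dim)) / dim
        if abs(c) > tol:
            coeffs[label] = c
    return coeffs

# A = 0.5 * ZZ + 2 * XI on two qubits, written out explicitly.
A = [
    [0.5, 0.0, 2.0, 0.0],
    [0.0, -0.5, 0.0, 2.0],
    [2.0, 0.0, -0.5, 0.0],
    [0.0, 2.0, 0.0, 0.5],
]
print(dense_pauli_decomposition(A))
```

Here the matrix has support on only two of the sixteen two-qubit Pauli strings, yet the loop still visits all sixteen. The promise setting in the abstract targets exactly this gap between the size of the answer and the cost of the dense search.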
