Cosmic Feed

Frontier Research Intelligence

Research area

AI & Cognition

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

SemRF: A Semantic Reference Frame for Residual-Stream Dynamics in Language Models

Jian Gu, Aldeida Aleti, Chunyang Chen et al.

Residual-stream analysis asks how language-model computation evolves across depth, but intermediate decoding requires comparable readout coordinates across layers. If embedding anchors and unembedding readout disagree on the chosen span, apparent motion may reflect measurement drift rather than computation. We introduce Semantic Reference Frames (SemRF), an anchor-based formalism separating semantic measurement from residual dynamics. A SemRF fixes anchors and measures states against them. Pseudo-inverse tying gives exact synchronization; under restricted bi-invertibility, SemRF yields stable semantic-basis coordinates, distortion bounds, and near-identity changes. With the frame fixed, residual computation becomes a depthwise semantic trajectory. The anchors induce a semantic Voronoi diagram: distance, or evidence such as logits, assigns each layer to a coarse cell, while coordinates retain within-cell motion and margins. We define layerwise steps, contribution profiles, and imbalance diagnostics, then use the Voronoi trace to define a margin-relaxed tube. The canonical trace is the minimum-action path inside this tube; when nonempty with positive quadratic weight, it is unique and obeys a discrete spline equation away from active constraints. Excess action controls step, curvature, and profile mismatch. Low curvature implies piecewise-linear compressibility and local knowledge density: lower trace complexity means fewer semantic knots. Through the parameter-to-trajectory map, this gives a conditional link to parameter efficiency: among admissible settings fitting data, lower-action and lower-complexity traces use fewer semantic degrees of freedom. The guarantees require controlled interface error and small projection residual under explicit tube constraints.

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

Evaluation of Population Initialization Methods for Genetic Programming-based Symbolic Regression

Lukas Kammerer, Gabriel Kronberger, Deaglan J. Bartlett et al.

We analyze the effect of optimizing the initial population of genetic programming (GP) for symbolic regression (SR) on the accuracy and complexity of solutions. We compare three well-established random initialization methods as well as initialization with small optimized solutions from exhaustive symbolic regression (ESR) using a GP/SR implementation which is based on the multi-objective evolutionary algorithm NSGA-II. We compare the final Pareto fronts found with each initialization method on twelve synthetic problems of varying complexity and one real-world dataset. We find no significant differences in accuracy or model complexity among the initialization methods. The initial advantage of initialization with ESR disappears after only a few generations. Our results show that, given similar diversity in the initial population, the effect of the initialization method in GP-based symbolic regression on the final Pareto front is negligible.
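As a reading aid for the "final Pareto front" comparison, here is a minimal sketch of non-dominated filtering over (error, complexity) pairs, both minimized. The function name `pareto_front` and the sample values are illustrative assumptions, not taken from the paper, and this is plain non-dominated filtering rather than the full NSGA-II algorithm.

```python
from typing import List, Tuple

def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the non-dominated (error, complexity) pairs; both objectives are minimized."""
    front = []
    for p in points:
        # p is dominated if some other point is no worse in both objectives.
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    # Deduplicate identical points and sort by error for readability.
    return sorted(set(front))

# Hypothetical (error, complexity) values for five candidate models.
models = [(0.50, 3), (0.20, 7), (0.25, 9), (0.05, 15), (0.50, 5)]
front = pareto_front(models)
```

Two initialization methods could then be compared by running the search from each and inspecting the resulting fronts side by side.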

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents

Sergio Hernández-Gutiérrez, Matteo Merler, Ilze Amanda Auzina et al.

LLM agents increasingly act over long horizons, where a single trajectory can contain hundreds or thousands of actions. In these settings, outcome-only rewards provide guidance that is too sparse, failing to inform the model about the goodness of intermediate actions. Dense supervision methods aim to solve this problem by scoring intermediate steps, from intrinsic confidence to self-distillation and embedding similarities. However, it is common practice to evaluate them by measuring the downstream performance of a training pipeline that integrates them. This is expensive, conflates supervision quality with training engineering confounders, and renders different methodological families requiring distinct training setups incomparable. As a result, dense supervision methods are rarely benchmarked on common ground. We introduce QVal, a training-free testbed for directly evaluating dense supervision signals. Given a state-action pair, QVal measures how well a method's score is Q-aligned: whether it orders actions according to the Q-values of a strong reference policy. This lets us compare signals before any training run and separate signal quality from other engineering choices. We instantiate QVal as QVal-v1.0, benchmarking 21 dense supervision methods across four diverse environments and seven methodological families, with over 1.2K evaluation experiments across six open-weight model backbones. We find that simple prompting baselines consistently outperform recent dense supervision methods from the literature, and that performance clusters strongly by family. These findings hold across model sizes, environments, and observation modalities. QVal is designed to be easily extensible to new environments and methods, enabling researchers to iterate on dense supervision methods before any training run.
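The abstract does not say how Q-alignment is scored. As one plausible reading of "orders actions according to the Q-values", the sketch below computes the fraction of action pairs in a single state that a method's score ranks the same way as the reference Q-values. The function name `pairwise_q_alignment` and the toy numbers are assumptions for illustration, not QVal's actual metric.

```python
from itertools import combinations
from typing import Sequence

def pairwise_q_alignment(scores: Sequence[float], q_values: Sequence[float]) -> float:
    """Fraction of action pairs with distinct Q-values that the scores order like Q."""
    if len(scores) != len(q_values):
        raise ValueError("scores and q_values must have the same length")
    agree = 0
    total = 0
    for i, j in combinations(range(len(scores)), 2):
        dq = q_values[i] - q_values[j]
        if dq == 0:
            continue  # the reference policy is indifferent; skip the pair
        total += 1
        ds = scores[i] - scores[j]
        if ds * dq > 0:  # same sign: the score ranks the pair like Q does
            agree += 1
    return agree / total if total else float("nan")

# Hypothetical reference Q-values for three candidate actions in one state.
q_ref = [1.0, 0.2, 0.6]
aligned = pairwise_q_alignment([0.9, 0.1, 0.5], q_ref)
partial = pairwise_q_alignment([0.9, 0.5, 0.1], q_ref)
```

A score tied on a pair counts as a disagreement here, which is a design choice; other tie conventions are equally defensible.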

AI & Cognition | arXiv | 2026-06-30 | Skeptical (25)

Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs

Gabrielle Kaili-May Liu, Avi Caciularu, Gal Yona et al.

Metacognition is a critical component of intelligence that describes the ability to monitor and regulate one's own cognitive processes. Yet LLMs exhibit systemic deficiencies in key metacognitive faculties: they hallucinate with high confidence, fail to recognize knowledge boundaries, and misrepresent their internal uncertainty, undermining trustworthiness and reliability. Since monitoring task performance and adapting behavior accordingly are central to metacognition, we posit that models capable of accurately judging their own performance are better positioned to improve it. We operationalize this idea via two novel mechanisms: reinforcement learning with metacognitive feedback (RLMF), a paradigm to refine completion rankings during preference optimization based on the quality of a model's self-judgments of performance, and metacognitive data selection, which uses similar self-judgments to identify high-value training examples, outperforming naive active learning. We apply these innovations to the problem of faithful calibration (FC), a task that is itself fundamentally metacognitive: the goal is to align expressed with intrinsic uncertainty, difficult even for frontier LLMs. We adopt a two-stage, decoupled approach, first using these methods to calibrate the faithfulness of models' self-reported confidence scores, then mapping to natural, context-adaptable linguistic uncertainty via targeted output editing. Extensive experiments show RLMF achieves generalizable, state-of-the-art FC on diverse tasks while preserving accuracy. Further, RLMF surpasses standard RL by up to 63% while enhancing models' ability to assess and express their own capability limits. This positions RLMF as a promising paradigm to enhance LLM metacognition toward improved abilities and alignment, and suggests metacognitive performance as an effective RL signal to overcome limits of prior intrinsic feedback methods.

Research area

Quantum Technology

Quantum Technology | arXiv | 2026-06-30 | Skeptical (25)

Simulation of Two-qubit Gate Variability and Fidelity of Spin Qubits Built on Nanosheet Technology

Trung Nguyen, Sarah Dweik, Hiu Yung Wong

Silicon spin qubits are promising for large-scale quantum-computer integration because they can fully leverage the well-developed semiconductor infrastructure. However, the low fidelity of two-qubit entanglement gates remains a key barrier to large-scale integration. Recent simulations of silicon spin-qubit two-qubit gates have been performed on silicon-on-insulator (SOI) platforms, while nanosheet-based charge-qubit work has been limited to single-qubit operation using a two-dimensional Schrödinger approximation. In this work, we study silicon spin-qubit double quantum dots built on nanosheet technology using the Quantum Technology Computer-Aided Design (QTCAD) simulation suite to run three-dimensional Poisson and Schrödinger solvers, followed by a many-body solver to extract exchange interactions. We evaluate the exchange energy sensitivity to process and bias variations and then use QuTiP to solve the master equation for a two-qubit gate. The results show that millivolt-level bias variations at the plunger and middle barrier gates can reduce the gate fidelity below 99%, a common threshold target for many fault-tolerant quantum-computing algorithms. Gate-referred 1/f charge-noise effects are also analyzed through the resulting coherence time.

Quantum Technology | arXiv | 2026-06-30 | Skeptical (25)

Efficient entanglement of three remote single-atom quantum-network nodes

Matthias Seubert, Leonardo Ruscio, Tobias Frank et al.

Entanglement distributed over a set of individually addressable qubit nodes is the enabling resource for a plethora of applications ranging from tests of quantum physics to secure and modular quantum information networks. Entanglement between two memory qubits has been realized on various platforms, but extension to more nodes remains rare and formidably challenging. The principal bottleneck is the efficiency of the light-matter interfaces connecting the qubit nodes to their communication channels. Here, we efficiently generate, distribute and store a three-qubit entangled state across three independent laboratories containing single atoms coupled to optical resonators. We sequentially entangle the atoms pairwise, two by heralded photonic entanglement swapping and two by heralded state transfer. We reach a three-qubit entanglement fidelity of 77(1)% and an entanglement lifetime above 200 µs. The observed qubit correlations violate Mermin's inequality while closing the detection loophole. Our three-qubit entanglement-generation efficiency is 0.16%. This unprecedented efficiency of our scheme establishes a clear route towards multi-node quantum networks.
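To make the Mermin-inequality statement concrete, the sketch below evaluates the standard three-qubit Mermin operator M = XXX - XYY - YXY - YYX on an ideal GHZ state. Local hidden-variable models bound |⟨M⟩| by 2, while quantum mechanics allows up to 4. The choice of a GHZ state and of this particular operator form are assumptions for illustration; the abstract does not specify the state or measurement settings used in the experiment.

```python
import numpy as np

# Pauli matrices.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)

def kron3(a, b, c):
    """Tensor product of three single-qubit operators."""
    return np.kron(np.kron(a, b), c)

# Mermin operator for three qubits.
M = kron3(X, X, X) - kron3(X, Y, Y) - kron3(Y, X, Y) - kron3(Y, Y, X)

def mermin_value(rho):
    """Expectation value Tr(rho M) for a three-qubit density matrix."""
    return float(np.real(np.trace(rho @ M)))

# Ideal GHZ state (|000> + |111>) / sqrt(2) as a density matrix.
ghz = np.zeros(8, dtype=complex)
ghz[0] = ghz[7] = 1 / np.sqrt(2)
rho_ghz = np.outer(ghz, ghz.conj())

# Maximally mixed state, which carries no correlations.
rho_mixed = np.eye(8, dtype=complex) / 8

ghz_value = mermin_value(rho_ghz)
mixed_value = mermin_value(rho_mixed)
```

The ideal GHZ state reaches the quantum maximum, well above the classical bound of 2, whereas the maximally mixed state gives zero.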

Quantum Technology | arXiv | 2026-06-30 | Skeptical (25)

Spatially Coupled MacKay-Neal/Hsu-Anastasopoulos CSS Codes Achieve the Quantum-Erasure Hashing Bound by Seeded BP Decoding

Kenta Kasai

In classical sparse-graph coding, spatial coupling is a mechanism by which belief-propagation (BP) decoding attains the maximum-a-posteriori (MAP) or area-threshold performance of the uncoupled system. Since MacKay-Neal/Hsu-Anastasopoulos (MN/HA) punctured sparse ensembles achieve capacity under MAP decoding, it is natural to ask whether spatially coupled MN/HA-type Calderbank-Shor-Steane (CSS) codes can reach the hashing bound on the quantum erasure channel under seeded BP decoding. We answer this question at the density evolution (DE) level for hard-erasure CSS decoding. On an erased coordinate, the two binary Pauli components remain unresolved, equivalently the erased qubit is represented by the four Pauli possibilities. We first define the CSS ensemble through sparse punctured matrices and the corresponding dense parity-check matrices. For fixed finite Z-side, X-side, and check degrees, we then derive a five-message uncoupled DE recursion, decompose it into Z-side and X-side constituent systems, and define the two constituent potentials. Applying the coupled-vector potential method to the two constituents separately proves that seeded BP decoding on the resulting finite-degree factor graphs reaches the smaller of the Z-side degree ratio and the X-side complementary degree ratio. In the X/Z equal-rate specialization, where the Z-side and X-side constituent design rates are equal, this BP threshold is the hashing-bound channel parameter determined by the design rate. Thus the paper gives a DE-level proof that seeded BP decoding with finite-degree factor graphs achieves the hashing bound for the X/Z equal-rate family. Finite-length BP concentration, block-error convergence, and a finite-code realization of the ideal DE seed are separate questions.

Quantum Technology | arXiv | 2026-06-30 | Skeptical (25)

Quantum Information as a New Lens for Precision Neutrino Physics

Khushboo Dixit, Ritam Kundu, Papia Panda et al.

We present a quantum-information-theoretic study of three-flavor neutrino oscillations in long-baseline experiments by mapping flavor states to qubit-like representations and quantifying quantum correlations through total concurrence. The local minima of this entanglement measure identify energy regions where the flavor state is closest to separability, enabling cleaner extraction of oscillation parameters. We explain how these local minima offer opportunities for precision measurements and provide insight into the accurate determination of neutrino oscillation parameters. We then propose a strategy to improve parameter extraction by aligning the benchmark oscillation regions of NO$ν$A and T2K with the minimum entanglement achievable in each experiment. This shifts the concurrence minima toward higher-event-count energy regions, leading to tighter constraints and reducing the tension arising from their different energy regimes. For normal ordering, we obtain $(0.581^{+0.0136}_{-0.0150},\,195^{+38}_{-32}\,^\circ)$ in the $(\sin^2θ_{23},δ_{\rm CP})$ plane and $(0.580^{+0.0140}_{-0.0153},\,2.515^{+0.0344}_{-0.0344}\times10^{-3}\,\mathrm{eV}^2)$ in the $(\sin^2θ_{23},Δm^2_{31})$ plane, yielding improved joint constraints. Using GLoBES simulations together with real data, we assess how local minima of quantum correlations influence leptonic CP-violation sensitivity, $θ_{23}$ octant-degeneracy resolution, and mass-ordering determination. Our results show that minimizing entanglement can significantly affect these key sensitivities, highlighting quantum information measures as complementary probes of neutrino flavor oscillations and offering new insight into the role of quantum correlations in precision neutrino physics.

Research area

Space & Physics

Space & Physics | arXiv | 2026-06-30 | Skeptical (25)

FLORA: A deep learning approach to predict forest attributes from heterogeneous LiDAR data

Emilie Vautier, Clément Mallet, Cédric Vega

Forest attributes are essential for national-scale resource monitoring. Airborne LiDAR metrics are among the auxiliary variables most strongly correlated with forest attributes used in National Forest Inventory (NFI) estimates. However, producing wall-to-wall predictions remains challenging when LiDAR data are acquired under heterogeneous conditions. As national LiDAR programs expand across Europe, variability in sensors, flight parameters, seasons, and scan angles limits the robustness of existing models, which are often calibrated for local conditions. We present FLORA (Forest LiDAR Octree Regression with Auxiliary Data), a deep learning framework that predicts six forest attributes (dominant height, total volume, deciduous volume, coniferous volume, basal area, and stem density) from heterogeneous LiDAR point clouds. FLORA combines an octree-based backbone with ecological and spatiotemporal auxiliary variables through a late-fusion gating mechanism. Models are trained and evaluated on 32,052 National Forest Inventory plots across mainland France using data from the French LiDAR HD program. A single model trained on both leaf-on and leaf-off acquisitions outperforms season-specific models and improves cross-season robustness. Auxiliary variables provide modest overall gains but contribute more strongly to species-specific volume prediction. FLORA achieves an rRMSE of about 12.3% (R² = 0.88) for dominant height and 39% (R² = 0.74) for total volume, providing a robust baseline for large-scale forest attribute estimation from heterogeneous national LiDAR programs.
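For readers who want to reproduce the two reported scores on their own predictions, here is a minimal sketch under the common definitions: rRMSE as RMSE divided by the mean observed value, and R² as one minus the ratio of residual to total sum of squares. The abstract does not state which normalization FLORA uses, so both definitions, the function names, and the sample values are assumptions.

```python
import numpy as np

def rrmse_percent(y_true, y_pred):
    """Relative RMSE in percent: RMSE divided by the mean of the observations."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    return 100.0 * rmse / np.mean(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical dominant heights in metres for three plots.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 30.0]
height_rrmse = rrmse_percent(observed, predicted)
height_r2 = r_squared(observed, predicted)
```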

Space & Physics | arXiv | 2026-06-30 | Skeptical (25)

Electromagnetic radiation from a point-like charge in a weak gravitational wave: a Shapiro-delay-motivated approach

Vladimir Epp, Konstantin Osetrin, Taya But

We investigate the field of a point-like electric charge freely falling in a gravitational wave. In the presence of a gravitational wave, the initially static Coulomb field of the charge becomes time-dependent and generates corresponding radiation. The gravitational wave is treated as a weak perturbation of the Minkowski metric. The electromagnetic four-potential of the charge is sought as a solution to Maxwell's equations in the gravitational wave metric, to first order in perturbation theory. The potentials of the point charge are found in quadratures throughout the space. To regularize the potentials, an approach motivated by the Shapiro effect for the time delay of radiation in a gravitational field is used. The potentials of the charge in the far zone are calculated explicitly for a monochromatic, arbitrarily polarized gravitational wave. The angular distribution of the electromagnetic radiation induced by the gravitational wave is obtained.

Space & Physics | arXiv | 2026-06-30 | Skeptical (25)

Joint inference of weak lensing convergence map and cosmology with diffusion models

Benjamin Remy, Chihway Chang, Rebecca Willett

We present a method for joint inference of cosmological parameters and convergence maps from weak lensing observations, targeting the full posterior conditioned on the observed shear field. Our approach uses implicit inference with diffusion models, learning the joint distribution from simulations, without the need to have an explicit and differentiable forward model for gradient-based MCMC sampling. We introduce a transformer-based architecture that operates in pixel space and treats cosmological parameters as additional tokens in a unified sequence, enabling efficient multimodal processing within a single network. At inference time, the trained model generates posterior samples of joint convergence maps and cosmological parameters conditioned on observed noisy shear fields. We demonstrate the method on simulated weak lensing data generated from log-normal fields in a $w$CDM cosmology. The model accurately reconstructs convergence maps and recovers cosmological posteriors that agree with traditional MCMC, while remaining well calibrated across the prior, with a MIRA calibration score of $0.635 \pm 0.017$ on the joint posterior (where $0.667$ is optimal). The inferred fields reproduce the correct two-point statistics as well as non-Gaussian statistics such as the one-point distribution. This work establishes diffusion-based implicit inference as a viable route toward full field-level cosmological analyses, paving the way for applications to more realistic, non-differentiable simulators.

Space & Physics | arXiv | 2026-06-30 | Skeptical (25)

Constraining dark energy with complementary probes of large-scale structure

Neel Shah, Kazuya Koyama, Johannes Noller et al.

To observationally pin down the nature of dark energy, it is essential to consistently model cosmological perturbations in the presence of dark energy alongside the background expansion and constrain this joint theory space with a large array of complementary probes. Here, we achieve this by constraining a model in the Effective Field Theory of Dark Energy (EFTofDE) framework by supplementing probes of the expansion history with several probes of large-scale structure: redshift space distortions (RSD) from DESI DR1, $3\times2$pt measurements from DES Y3, and the Integrated Sachs-Wolfe effect from cross-correlating CMB temperature anisotropies with galaxy number counts or CMB lensing. We demonstrate the complementarity of different probes which leads to strong improvements on constraints on DE perturbations. For our most constraining dataset combination that supplements CMB+BAO+SNe probes with DESI DR1 RSD, DES Y3 $3\times2$pt and ISW cross-correlations between CMB temperature and galaxy counts, we find an improvement in the Figure of Merit (FoM) for the DE perturbation parameters $\{c_B, c_M\}$ by a factor of 2.69. We show the phenomenological implications of these constraints by mapping them to the present-day values of the phenomenological functions $\{μ(z), Σ(z)\}$, where we see an FoM improvement by a factor of 3.37. We find a significant interdependence between the posteriors of $\{w_0, w_a\}$ and $\{c_B, c_M\}$, caused by the theoretical prior imposed by the gradient stability condition within the EFTofDE framework. Finally, we compute the significance of deviation from $Λ$CDM for the EFTofDE model when constrained with CMB+BAO+SNe datasets, finding it to be at 2.9$σ$. This significance is nontrivially similar to the significance for the $w_0w_a$CDM model for the same dataset combination which we find to be 3.1$σ$.
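The abstract does not define its Figure of Merit. A widely used convention for a two-parameter posterior takes the FoM as the inverse square root of the determinant of the parameter covariance matrix, so that a larger FoM means a smaller error ellipse. The sketch below uses that convention as an assumption; the function name and the covariance values are hypothetical.

```python
import numpy as np

def figure_of_merit(cov):
    """FoM = 1 / sqrt(det C) for a parameter covariance matrix C (assumed convention)."""
    cov = np.asarray(cov, dtype=float)
    return 1.0 / np.sqrt(np.linalg.det(cov))

# Hypothetical 2x2 covariances for a parameter pair such as (c_B, c_M).
cov_baseline = np.array([[4.0, 0.0], [0.0, 1.0]])   # weaker constraints
cov_combined = np.array([[1.0, 0.0], [0.0, 1.0]])   # after adding more probes

improvement = figure_of_merit(cov_combined) / figure_of_merit(cov_baseline)
```

Under this convention an improvement factor is simply the ratio of the two FoM values, so any overall normalization constant cancels.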

Research area

Bio-Engineering

Bio-Engineering | arXiv | 2026-06-30 | Skeptical (25)

GR2 Technical Report

Yufei Li, Zaiwei Zhang, Mingfu Liang et al.

Industrial recommendation systems serve billions of users through a multi-stage funnel -- retrieval, early-stage ranking, and re-ranking -- where the final re-ranking step disproportionately shapes user engagement and downstream performance, particularly for carousel and grid display formats. Despite growing enthusiasm for Large Language Models (LLMs) in recommendation, three gaps hinder industrial adoption: (1) most efforts target retrieval and ranking, leaving re-ranking -- the stage closest to the final user experience -- largely underexplored; (2) LLMs are typically deployed zero-shot or via supervised fine-tuning, underutilizing the reasoning capabilities unlocked by reinforcement learning (RL) on verifiable rewards; (3) deployed catalogs index billions of items with non-semantic identifiers that lie outside any base-LLM vocabulary. We present GR2 (Generative Reasoning Re-Ranker), an end-to-end framework that combines (i) mid-training on semantic IDs produced by a tokenizer with ≥99% uniqueness, (ii) reasoning traces distilled from a stronger teacher via targeted prompting and rejection sampling, and (iii) RL with verifiable rewards purpose-built for re-ranking. To make GR2 resource-viable, we further (iv) introduce a context compressor that amortizes training cost, On-Policy Distillation (OPD) as a scalable alternative to SFT -- which we find collapses at industrial scale -- and reasoning distillation for low-latency serving. GR2 delivers +18.7% R@1, +7.1% R@3, and +9.6% N@3 over legacy baselines on industrial-scale traffic. We further find that reward design is critical in re-ranking: LLMs often hack rewards by preserving the incoming order or exploiting position bias, motivating conditional verifiable rewards as essential industrial components.
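The abstract reports gains in R@1, R@3, and N@3 without expanding the abbreviations. Reading them as recall@k and NDCG@k, which is an assumption, a minimal sketch with binary relevance looks like the following. Function names and the example items are hypothetical.

```python
import math
from typing import Sequence, Set

def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of relevant items that appear in the top-k of the ranked list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """NDCG@k with binary relevance and a log2 position discount."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    # Ideal ranking places all relevant items first.
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg else 0.0

# A re-ranked slate where the single relevant item lands in second place.
slate = ["b", "a", "c"]
clicked = {"a"}
r1 = recall_at_k(slate, clicked, 1)
r3 = recall_at_k(slate, clicked, 3)
n3 = ndcg_at_k(slate, clicked, 3)
```

Moving the relevant item from second to first place would raise both R@1 and N@3 to 1.0, which is the kind of change a re-ranker is rewarded for.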

Bio-Engineering | arXiv | 2026-06-30 | Skeptical (25)

Introspective Coupling: Self-Explanation Training Tracks Behavioral Change Despite Fixed Supervision

Zifan Carl Guo, Laura Ruis, Jacob Andreas et al.

When does training language models (LMs) to generate explanations of their predictions yield faithful introspection, rather than superficial imitation? We study LMs trained to explain which features of their inputs influenced their behavior, using models' counterfactual behavior on modified inputs as supervision. Surprisingly, we find that LMs trained on fixed counterfactual explanations derived from earlier checkpoints of themselves, or even from behaviorally similar models in different families, frequently produce explanations more faithful to their own current behaviors than to those of their training targets. This "introspective" coupling between LM explanations and behaviors occurs when training explanations remain sufficiently correlated with current behaviors over the course of training, even as behaviors themselves shift. We also show that introspective coupling tracks behavior shifts: when explanation training is provided concurrently with other post-training objectives, explanations track those shifts without requiring updated supervision. This phenomenon appears in multiple tasks, including sycophancy and refusal, and is robust to label noise. Overall, our results show that even fixed datasets of counterfactual explanations can provide scalable and generalizable post-training signal for introspection.

Bio-Engineering | arXiv | 2026-06-30 | Skeptical (25)

LUNA: Learning Universal 3D Human Animation Beyond Skinning

Peng Li, Rawal Khirodkar, Junxuan Li et al.

Creating photorealistic, animatable 3D human avatars from monocular images still largely depends on Linear Blend Skinning (LBS) and parametric body models, which constrain expressivity and often introduce artifacts due to imperfect fitting. We propose LUNA, an LBS-free universal neural animation model that directly maps multiple 2D controls like images, keypoints, sketches, and unseen characters into 3D Gaussian deformations, bypassing explicit body fitting. At its core, a transformer-based motion regressor disentangles global rigid motion from fine-grained local dynamics to capture both coherent movement and subtle non-rigid effects. To resolve the inherent ambiguity of 2D-to-3D lifting while scaling beyond fitted datasets, we introduce hybrid supervision that distills soft structural priors from an LBS teacher and a loss that supports training on both limited fitted data and large in-the-wild unlabeled videos. Extensive experiments show LUNA achieves competitive visual fidelity compared to LBS-based approaches, while delivering realistic human motion and zero-shot cross-identity generalization across diverse driving modalities. To the best of our knowledge, LUNA is the first end-to-end 3D animatable model that supports implicit 2D driving.

Bio-Engineering | arXiv | 2026-06-30 | Skeptical (25)

MECoBench: A Systematic Study of Multimodal Agent Collaboration in Embodied Environments

Qingyun Liu, Jiwen Zhang, Jingyi Hu et al.

Recent multimodal large language models (MLLMs) have strong potential as embodied agents, but their ability to collaborate in visually grounded environments remains underexplored. To address this gap, we introduce MECoBench, a multimodal embodied cooperation benchmark with an evaluation platform spanning diverse real-world tasks, two cooperation structures, and three collaboration modes. Through extensive experiments across various MLLMs, we summarize three key findings: (i) Collaboration generally improves embodied task completion, but its benefits depend on balancing collaborative gains against coordination complexity. (ii) Communication is essential to collaboration gains, while the best collaboration mode depends on team size and model capability. (iii) Moreover, collaboration improves robustness under noisy priors and exploration conditions. Generally, MECoBench provides a systematic testbed for understanding the mechanisms and limits of multimodal embodied collaboration. Code and dataset are available at https://github.com/q-i-n-g/MECoBench.