Signed-Permutation Coordinate Transport for RMSNorm Transformers

John Sweeney

Modern LLM workflows move coordinate-indexed objects across checkpoints: steering vectors, sparse autoencoders, top-$k$ neuron sets, attribution lists, and merge alignments. This is only well posed after fixing the model's residual-stream gauge, which we show is architecture-dependent: LayerNorm residual charts have permutation gauge $S_d$ (up to a global sign flip), while RMSNorm charts with generic per-channel gain have signed-permutation gauge $B_d = S_d \ltimes \{\pm 1\}^d$. Permutation-only alignment is therefore symmetry-incomplete for RMSNorm models. We introduce sign-marginalized Hungarian matching and prove a sharp failure mode: with decorrelated coordinates, raw signed-correlation matching has a structural permutation-accuracy ceiling at the positive-sign fraction of the true gauge, which sign-marginalization removes. We then make coordinate-preserving transport, not function-level merging, the primary object: composing saved-checkpoint local $B_d$ gauges along same-base fine-tuning trajectories recovers 91.1% of cross-run coordinates at 1500 steps versus 60.3% for endpoint matching, and the gain is not explained by merely routing through the base. The recovered gauge transfers tools that permutation-only alignment breaks: TinyLlama SAE reconstruction has NMSE 0.004 under $B_d$ versus 1.08 under $S_d$; Qwen sentiment steering preserves 95.8% of its effect versus 17.2%; refusal steering reverses sign under $S_d$; coordinate-preserving merges behave the same way. The same covariance governs stateful training: signed transport of AdamW state preserves the resumed trajectory, while permutation-only state follows a different one from a functionally identical checkpoint. Finally, gauge-sweep audits show index-level interpretability claims are reproducible only relative to an explicit gauge.
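The accuracy-ceiling claim can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes that "sign-marginalized" matching can be approximated by scoring each candidate pair with the absolute value of its correlation, and it uses an exhaustive search over permutations as a stand-in for the Hungarian algorithm, which is only feasible at tiny dimension. The gauge `true_perm`, `true_sign` and the idealized correlation matrix `corr` are hypothetical.

```python
import itertools
import random


def best_assignment(score):
    """Exhaustive max-weight assignment (stand-in for Hungarian at tiny d)."""
    d = len(score)
    return max(
        itertools.permutations(range(d)),
        key=lambda p: sum(score[i][p[i]] for i in range(d)),
    )


def match(corr, sign_marginalized):
    """Match coordinate i of run A to perm[i] of run B, then read off signs."""
    # Assumption: marginalizing over the sign is approximated by |correlation|.
    score = [[abs(c) if sign_marginalized else c for c in row] for row in corr]
    perm = list(best_assignment(score))
    signs = [1 if corr[i][perm[i]] >= 0 else -1 for i in range(len(corr))]
    return perm, signs


def accuracy(perm, true_perm):
    """Fraction of coordinates assigned to their true partner."""
    return sum(p == q for p, q in zip(perm, true_perm)) / len(true_perm)


# Hypothetical ground-truth gauge: half of the coordinates are sign-flipped.
true_perm = [2, 0, 5, 1, 4, 3]
true_sign = [1, -1, 1, -1, -1, 1]
d = len(true_perm)

# Idealized decorrelated case: one strong signed entry per row plus small noise.
rng = random.Random(0)
corr = [
    [(0.9 * true_sign[i] if j == true_perm[i] else 0.0) + rng.uniform(-0.05, 0.05)
     for j in range(d)]
    for i in range(d)
]

raw_perm, _ = match(corr, sign_marginalized=False)
marg_perm, marg_sign = match(corr, sign_marginalized=True)

positive_fraction = true_sign.count(1) / d
print("positive-sign fraction:", positive_fraction)
print("raw accuracy:", accuracy(raw_perm, true_perm))
print("sign-marginalized accuracy:", accuracy(marg_perm, true_perm))
```

In this toy setting a sign-flipped coordinate has a strongly negative correlation with its true partner, so raw signed matching actively avoids the correct pair, while the absolute-value score treats both signs alike. At realistic widths an O(d^3) solver such as `scipy.optimize.linear_sum_assignment` would replace the exhaustive search.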

Open Source

Research Brief

Correctly aligning internal coordinate representations in RMSNorm Transformers requires accounting for signed permutations (a 'signed-permutation gauge'), which is critical for robust interpretability, steerability, and stable fine-tuning.

This paper addresses a fundamental problem in understanding and manipulating large language models (LLMs): how coordinate-indexed internal components such as 'steering vectors' or 'neuron sets' can be reliably tracked and transferred across different versions or checkpoints of a model. The paper shows that the symmetry group used to align these components (the 'gauge') depends on the model's normalization layer. LayerNorm models have a simpler 'permutation gauge' (up to a global sign flip), whereas RMSNorm models with generic per-channel gain, which are common in modern LLMs, have a larger 'signed-permutation gauge' that covers both reordering and per-coordinate sign flips. Ignoring these sign flips causes significant failures when transferring sparse autoencoders, sentiment steering vectors, and even AdamW optimizer states. The paper introduces 'sign-marginalized Hungarian matching' to recover the signed gauge, which substantially improves the preservation of coordinate-indexed tools across model checkpoints and fine-tuning trajectories and supports more reliable interpretability and control.
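The difference between the two gauges can be checked numerically. The following sketch uses simplified, bias-free versions of both normalizations and an arbitrary hypothetical signed permutation; it tests whether normalizing after the coordinate change (with the gain vector permuted accordingly) gives the same result as applying the coordinate change after normalizing.

```python
import math


def rmsnorm(x, gain):
    """RMSNorm with per-channel gain (no epsilon, for clarity)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return [g * v / rms for g, v in zip(gain, x)]


def layernorm(x, gain):
    """LayerNorm with per-channel gain and no bias (a simplification)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [g * (v - mean) / math.sqrt(var) for g, v in zip(gain, x)]


def act(perm, signs, x):
    """Signed permutation: output[i] = signs[i] * x[perm[i]]."""
    return [s * x[p] for p, s in zip(perm, signs)]


def equivariant(norm, perm, signs, x, gain):
    """True if norm commutes with the signed permutation (gain permuted along)."""
    permuted_gain = [gain[p] for p in perm]
    lhs = norm(act(perm, signs, x), permuted_gain)
    rhs = act(perm, signs, norm(x, gain))
    return all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(lhs, rhs))


# Hypothetical activation, gain, and gauge element.
x = [0.5, -1.2, 2.0, 0.7]
gain = [1.0, 0.7, 1.3, 2.1]
perm = [2, 0, 3, 1]
mixed_signs = [1, -1, -1, 1]

print("RMSNorm, mixed signs:   ", equivariant(rmsnorm, perm, mixed_signs, x, gain))
print("LayerNorm, mixed signs: ", equivariant(layernorm, perm, mixed_signs, x, gain))
print("LayerNorm, no flips:    ", equivariant(layernorm, perm, [1] * 4, x, gain))
print("LayerNorm, global flip: ", equivariant(layernorm, perm, [-1] * 4, x, gain))
```

RMSNorm only divides by the root mean square, which per-coordinate sign flips leave unchanged. LayerNorm also subtracts the mean, which survives a permutation or a global sign flip but not independent per-coordinate flips.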

Potential Applications

Reliable LLM Interpretability and Steerability: Enabling consistent transfer of steering vectors, attribution lists, and neuron sets, making LLMs more controllable and their internal workings more understandable across different model versions.
Robust Model Merging and Fine-tuning: Ensuring that merging different LLMs or continuing fine-tuning from a checkpoint preserves the intended internal structure and learned functionalities, rather than breaking them due to misaligned coordinates.
Stable Optimizer State Transfer: Preserving the training trajectory when resuming training from checkpoints by correctly transporting AdamW optimizer states, crucial for efficient and consistent model development.
Reproducible AI Research: Establishing a framework for index-level interpretability claims to be reproducible, making research findings more robust and verifiable across different experimental setups.
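To make the transport idea behind these applications concrete, here is a minimal sketch of how signed-permutation gauges compose along a chain of checkpoints and how a steering vector is carried through them. The representation of a gauge element as a `(perm, signs)` pair, the three local gauges, and the vector `v` are all hypothetical; how the paper estimates local gauges is not described in the text above.

```python
import math
from functools import reduce


def apply(g, x):
    """Act with g = (perm, signs): output[i] = signs[i] * x[perm[i]]."""
    perm, signs = g
    return [s * x[p] for p, s in zip(perm, signs)]


def compose(g1, g2):
    """Gauge equivalent to applying g1 first, then g2."""
    (p1, s1), (p2, s2) = g1, g2
    perm = [p1[j] for j in p2]
    signs = [s2[i] * s1[p2[i]] for i in range(len(p2))]
    return perm, signs


def inverse(g):
    """Gauge that undoes g."""
    perm, signs = g
    inv_perm, inv_signs = [0] * len(perm), [0] * len(perm)
    for i, p in enumerate(perm):
        inv_perm[p], inv_signs[p] = i, signs[i]
    return inv_perm, inv_signs


def cosine(a, b):
    dot = sum(u * w for u, w in zip(a, b))
    return dot / math.sqrt(sum(u * u for u in a) * sum(w * w for w in b))


# Hypothetical local gauges between consecutive saved checkpoints.
local = [
    ([1, 0, 2, 3], [1, -1, 1, 1]),
    ([0, 2, 1, 3], [1, 1, -1, 1]),
    ([3, 1, 2, 0], [-1, 1, 1, 1]),
]
total = reduce(compose, local)

v = [0.4, -1.0, 0.25, 2.0]          # hypothetical steering vector
signed_transport = apply(total, v)
perm_only_transport = apply((total[0], [1] * len(v)), v)

print("composed gauge:", total)
print("cosine(perm-only, signed):", cosine(perm_only_transport, signed_transport))
```

Dropping the signs leaves every entry in the right position but flips some of them, so the permutation-only vector can point partly or even mostly against the correctly transported one, depending on how much of its norm sits on sign-flipped coordinates.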

Paper Trustworthiness Index

43 / 100

Medium Skepticism

Skeptical / Unreviewed

This paper is a preprint or otherwise lacks formal peer review. It is part of the research pipeline, but its claims should be treated with caution.

Verified AI Assessment: This credibility analysis was generated by Gemini 2.5 Flash analyzing the full paper text, references, and metadata.

Core Pillars Breakdown

Author & Institutional Track Record

5 / 25

Beyond the author's name, the provided text gives no information about affiliations or funding, so the author's track record cannot be evaluated from it.

Technical Rigor & Methodology

28 / 30

The abstract signals high technical rigor: it identifies an architecture-dependent gauge problem, proposes a precise mathematical remedy (the signed-permutation gauge $B_d$), states that a sharp failure mode is proved, and reports strong quantitative results across multiple tasks (coordinate recovery, SAE reconstruction, steering, AdamW state) in support of its claims.

Reproducibility & Openness

5 / 25

The abstract does not contain any information about the availability of code, datasets, or model weights, which are crucial for independent reproducibility of the reported results.

Community Vetting & Peer Review

5 / 20

The abstract does not specify whether the paper has undergone peer review, been accepted to a major conference, or published in a journal, making its current community vetting status unclear.

Detailed Evidence Assessment

Verified Evidence & Citations

RMSNorm models require a signed-permutation gauge ($B_d$).

“RMSNorm charts with generic per-channel gain have signed-permutation gauge $B_d = S_d \ltimes \{\pm 1\}^d$.”

Permutation-only alignment for RMSNorm models is incomplete.

“Permutation-only alignment is therefore symmetry-incomplete for RMSNorm models.”

Raw signed-correlation matching has a structural permutation-accuracy ceiling.

“raw signed-correlation matching has a structural permutation-accuracy ceiling at the positive-sign fraction of the true gauge, which sign-marginalization removes.”

Coordinate-preserving transport recovers significantly more cross-run coordinates.

“composing saved-checkpoint local $B_d$ gauges along same-base fine-tuning trajectories recovers 91.1% of cross-run coordinates at 1500 steps versus 60.3% for endpoint matching”

TinyLlama SAE reconstruction is significantly better under $B_d$ than $S_d$.

“TinyLlama SAE reconstruction has NMSE 0.004 under $B_d$ versus 1.08 under $S_d$”

Qwen sentiment steering effect is preserved much better under $B_d$.

“Qwen sentiment steering preserves 95.8% of its effect versus 17.2%”

Refusal steering reverses sign under $S_d$.

“refusal steering reverses sign under $S_d$”

Signed transport of AdamW state preserves the resumed trajectory.

“signed transport of AdamW state preserves the resumed trajectory, while permutation-only state follows a different one from a functionally identical checkpoint.”
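The optimizer-state claim follows from how AdamW's two moment buffers respond to a coordinate change. The sketch below assumes (this is not stated in the text above) that signed transport means moving parameters, gradients, and the first moment with the full signed permutation, while the second moment, being an average of squared gradients, is only permuted. The update rule is a plain-Python rendering of one decoupled-weight-decay Adam step, and every numeric value is hypothetical.

```python
import math


def adamw_step(theta, grad, m, v, t, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """One AdamW step on flat lists; returns (theta, m, v) after step t."""
    new_theta, new_m, new_v = [], [], []
    for th, g, mi, vi in zip(theta, grad, m, v):
        mi = b1 * mi + (1 - b1) * g            # first moment: sign-sensitive
        vi = b2 * vi + (1 - b2) * g * g        # second moment: sign-invariant
        m_hat = mi / (1 - b1 ** t)
        v_hat = vi / (1 - b2 ** t)
        th = th - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * th)
        new_theta.append(th)
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v


def signed(g, x):
    """Full signed permutation: output[i] = signs[i] * x[perm[i]]."""
    perm, signs = g
    return [s * x[p] for p, s in zip(perm, signs)]


def unsigned(g, x):
    """Permutation part only."""
    return [x[p] for p in g[0]]


def close(a, b):
    return all(math.isclose(u, w, abs_tol=1e-12) for u, w in zip(a, b))


# Hypothetical saved state at step t, and a hypothetical gauge to the new chart.
theta, grad = [0.3, -0.8, 1.5], [0.2, -0.1, 0.4]
m, v, t = [0.05, -0.02, 0.1], [0.01, 0.004, 0.02], 10
gauge = ([2, 0, 1], [-1, 1, -1])

theta_ref, _, _ = adamw_step(theta, grad, m, v, t)
target = signed(gauge, theta_ref)  # reference trajectory seen in the new chart

theta_signed, _, _ = adamw_step(
    signed(gauge, theta), signed(gauge, grad), signed(gauge, m), unsigned(gauge, v), t
)
theta_perm_only, _, _ = adamw_step(
    signed(gauge, theta), signed(gauge, grad), unsigned(gauge, m), unsigned(gauge, v), t
)

print("signed state follows reference:   ", close(theta_signed, target))
print("perm-only state follows reference:", close(theta_perm_only, target))
```

With permutation-only transport the weights still define the same function, but on sign-flipped coordinates the stored momentum opposes the transported gradient, so the first resumed step already departs from the reference trajectory.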

Uncertainties & Omissions

• Omission: Author affiliations and institutional backing are not provided.

• Omission: No links or mentions of public code repositories, datasets, or model weights.

• Omission: Lack of information regarding peer-review status, publication venue, or conference acceptance.

• Uncertainty: The precise conditions under which 'decorrelated coordinates' lead to the sharp failure mode are not fully detailed.

• Uncertainty: The generalizability of 'same-base fine-tuning trajectories' beyond the reported 1500 steps or specific LLMs might need further exploration.

• Uncertainty: The exact methodology and scope of 'gauge-sweep audits' for interpretability claims are not elaborated.