Add example 06: spectral analysis with chirp signal inference#42

Open
cweniger wants to merge 29 commits into main from feat/06_example

Conversation

@cweniger
Owner

@cweniger cweniger commented Feb 19, 2026

Summary

  • Adds examples/06_spectral_analysis demonstrating falcon + fuge integration for multi-harmonic chirp signal parameter inference
  • Signal model: chirping waveform parameterized by (f0, chirp_mass, harmonic_decay) with additive Gaussian noise
  • Graph: theta → y (clean signal) + n (noise) → x (observed = y + n) → tokens (STFT)
  • Two config variants:
    • config.yml: tokenized pipeline with TransformerEmbedding (via fuge)
    • config2.yml: scaffolded SVD embedding on raw signals
  • Both use falcon.estimators.Gaussian posterior with uniform priors set to ±10σ CRB at N=100k
  • Includes standalone Fisher/CRB estimation script, prior sample visualization, and posterior comparison plotting
  • Local chirp.py signal generator (JAX autodiff-compatible) replaces external dependency for waveform generation

Test plan

  • Run cd examples/06_spectral_analysis && python data/generate_obs.py
  • Run falcon launch --run-dir outputs/run to verify training converges
  • Run falcon sample prior --run-dir outputs/run and python make_prior_samples_plot.py
  • Run python make_fisher_estimate.py --obs data/obs.npz for both configs
  • Run python make_plots.py outputs/run to compare posterior with Fisher/CRB bounds

🤖 Generated with Claude Code

@codecov

codecov bot commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 0% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 9.54%. Comparing base (a524e25) to head (73332ed).

Files with missing lines          Patch %   Lines
falcon/estimators/gaussian.py     0.00%     9 Missing ⚠️
falcon/core/deployed_graph.py     0.00%     4 Missing ⚠️
falcon/estimators/base.py         0.00%     3 Missing ⚠️
falcon/embeddings/norms.py        0.00%     1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##            main     #42      +/-   ##
========================================
- Coverage   9.54%   9.54%   -0.01%     
========================================
  Files         32      32              
  Lines       3854    3857       +3     
========================================
  Hits         368     368              
- Misses      3486    3489       +3     
Flag Coverage Δ
unit 9.54% <0.00%> (-0.01%) ⬇️


validation_samples: 256 # Size of validation split (for early stopping)
simulate_count: 128 # Number of new samples drawn per simulation step
simulate_when_full: true # Continue simulating after reaching max samples
simulate_interval: 10 # Simulate every N training epochs
Owner Author

I thought this interval was measured in seconds, not epochs. Check this in the code and correct the comment if necessary.

TRUE_CHIRP_MASS = 1.0
TRUE_HARMONIC_DECAY = 1.5

signal = emri_signal(
Owner Author

The entire thing should not be framed around EMRIs, but around general spectral signals. Don't mention EMRI here or anywhere else, including the name of the pull request. It is confusing for non-GW experts. Also, CHIRP_MASS should be replaced; HARMONIC_DECAY is fine.

@cweniger cweniger changed the base branch from main to refactor/mixed-precision February 22, 2026 19:38
cweniger and others added 13 commits February 22, 2026 23:26
Demonstrates falcon + fuge library integration for EMRI gravitational
wave parameter inference using a Gaussian posterior estimator and a
nested embedding pipeline (ToneTokenizer → ToneTokenEmbedding →
TransformerEmbedding) configured declaratively via YAML.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Without normalization, raw frequency bin indices (0-512) dominated
the embedding features, preventing the transformer from learning.
Now lazily calls compute_normalization() to produce zero-mean,
unit-variance features before passing to the transformer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ray sets CUDA_VISIBLE_DEVICES="" for actors without GPU allocation,
causing JAX to crash. Signal generation runs fine on CPU.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use GPU when available (e.g. when Ray allocates one), fall back to
CPU only when CUDA_VISIBLE_DEVICES is empty (Ray no-GPU workers).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
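
The device-selection rule in the commit above can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper name `choose_jax_platform` is hypothetical, and pinning the platform through the `JAX_PLATFORMS` environment variable is an assumption about how the fallback might be wired up.

```python
import os


def choose_jax_platform(env):
    """Return "cpu" when CUDA_VISIBLE_DEVICES is set but empty, else None.

    None means "leave JAX's default device selection alone", so a GPU is
    used whenever one is visible to the process.
    (Hypothetical helper; the PR may implement this differently.)
    """
    if env.get("CUDA_VISIBLE_DEVICES") == "":
        return "cpu"
    return None


platform = choose_jax_platform(os.environ)
if platform is not None:
    # Assumption: the variable is set before `import jax` so that it takes effect.
    os.environ["JAX_PLATFORMS"] = platform
```

An unset variable and an empty one are treated differently on purpose: only the empty string signals a Ray worker without a GPU allocation.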
Computes the full 3-parameter Fisher information matrix using JAX
autodiff on the EMRI signal model, then overlays the Cramér-Rao
Gaussian on a corner plot alongside falcon posterior samples.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
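
To make the Fisher/CRB step concrete, here is a small sketch under stated assumptions. It uses a toy two-parameter sinusoid rather than the three-parameter signal model from the PR, central finite differences rather than JAX autodiff, and white Gaussian noise of known standard deviation. All function names are illustrative. The last line shows how ±10σ prior bounds could be derived from the CRB.

```python
import math


def signal(theta, n_samples):
    """Toy stand-in for the signal model: A * sin(2*pi*f*t)."""
    amp, freq = theta
    return [amp * math.sin(2.0 * math.pi * freq * t) for t in range(n_samples)]


def jacobian_fd(theta, n_samples, rel_step=1e-6):
    """Central finite-difference derivatives dh/dtheta_i (one list per parameter)."""
    cols = []
    for i in range(len(theta)):
        step = rel_step * max(abs(theta[i]), 1e-12)
        up, dn = list(theta), list(theta)
        up[i] += step
        dn[i] -= step
        h_up, h_dn = signal(up, n_samples), signal(dn, n_samples)
        cols.append([(a - b) / (2.0 * step) for a, b in zip(h_up, h_dn)])
    return cols


def fisher_matrix(theta, n_samples, noise_sigma):
    """F_ij = sum_t (dh/dtheta_i)(dh/dtheta_j) / sigma^2 for white Gaussian noise."""
    jac = jacobian_fd(theta, n_samples)
    k = len(theta)
    return [[sum(a * b for a, b in zip(jac[i], jac[j])) / noise_sigma**2
             for j in range(k)] for i in range(k)]


def crb_sigmas_2x2(fisher):
    """Cramér-Rao standard deviations: sqrt of the diagonal of F^-1 (2x2 only)."""
    (a, b), (c, d) = fisher
    det = a * d - b * c
    return [math.sqrt(d / det), math.sqrt(a / det)]


theta_true = [1.0, 0.05]  # illustrative values, not those used in the PR
F = fisher_matrix(theta_true, n_samples=1000, noise_sigma=1.0)
sigmas = crb_sigmas_2x2(F)
prior_bounds = [(t - 10 * s, t + 10 * s) for t, s in zip(theta_true, sigmas)]
```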
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The ToneTokenizer STFT requires ~16GB per 1000 signals. Without
chunking, all posterior samples go through the embedding at once.
chunk_size: 64 processes them in manageable batches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
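
The chunking idea from the commit above is roughly the following. The function names and the stand-in embedding are hypothetical; only the `chunk_size` value of 64 comes from the commit message.

```python
def iter_chunks(items, chunk_size):
    """Yield consecutive slices of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def embed_in_chunks(signals, embed_fn, chunk_size=64):
    """Apply embed_fn one chunk at a time and concatenate the results.

    Peak memory then scales with chunk_size instead of len(signals).
    """
    out = []
    for chunk in iter_chunks(signals, chunk_size):
        out.extend(embed_fn(chunk))
    return out


# Stand-in "embedding": one number per signal.
toy_signals = [[float(i)] * 4 for i in range(150)]
embedded = embed_in_chunks(toy_signals, lambda batch: [sum(s) for s in batch])
```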
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…imation script

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…okens caching config

Expand the inference graph to include deterministic ancestor nodes reachable
via evidence references (BFS expansion). This enables graph structures like
theta -> x -> tokens where tokens is a cacheable intermediate without its own
estimator. Add fallback to _simulate for nodes without estimators during
posterior/proposal sampling.

Rename internal graph attributes for symmetry: parents_dict -> forward_deps,
sorted_node_names -> forward_order, sorted_inference_node_names -> backward_order.
Add backward_deps with merged dependency dict for the inference direction.

Add config2.yml with separate tokens node for STFT caching, Tokenizer class
in model.py for float64 signal processing, and force JAX to CPU mode
unconditionally since EMRI signal generation only needs CPU.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
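
One possible reading of the BFS expansion described above is sketched below. The dictionaries `evidence` and `parents` and the function name are hypothetical and do not mirror falcon's internal attributes. Whether observed nodes such as x are included in the expansion is also an assumption.

```python
from collections import deque


def expand_inference_nodes(estimator_nodes, evidence, parents):
    """Breadth-first walk from the evidence references of estimator nodes.

    Returns the nodes without estimators that are reached, in discovery
    order. The walk stops at nodes that already have an estimator.
    """
    seen = set(estimator_nodes)
    queue = deque(ref for node in estimator_nodes
                  for ref in evidence.get(node, []))
    added = []
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        added.append(node)
        queue.extend(parents.get(node, []))
    return added


# theta -> x -> tokens, with theta inferred from tokens.
extra = expand_inference_nodes(
    estimator_nodes=["theta"],
    evidence={"theta": ["tokens"]},
    parents={"x": ["theta"], "tokens": ["x"]},
)
```

In this toy graph the expansion picks up `tokens` and then `x`, and stops at `theta` because it already has an estimator.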
Embedding pipeline and posterior no longer inherit float64 from numpy
theta. Neural network layers (TransformerEmbedding, MLP) run in float32
for tensor-core speed; parameter-space operations (covariance, sampling)
stay float64 for precision. TokenEmbed normalization replaced with
explicit RunningNorm pipeline layer that supports EMA across SBI rounds.

- Rename LazyOnlineNorm → RunningNorm with 3D reduce_dims and output_dtype
- GaussianPosterior: override to() to protect float64 buffers, cast MLP
  output to float64 before de-whitening
- base.py: remove dtype forcing, cast conditions to float32 for embeddings
- TokenEmbed: remove one-shot z-score, use _embed() directly
- Configs: insert RunningNorm(output_dtype=float32) between TokenEmbed
  and TransformerEmbedding

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
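
For readers unfamiliar with running normalization, here is a plain-Python sketch of the idea behind an EMA-based norm layer. It is not the `RunningNorm` class from this PR, which is described as supporting `reduce_dims` and `output_dtype` on tensors. The class name, the `momentum` parameter, and the scalar-feature simplification are all assumptions.

```python
import math


class RunningNormSketch:
    """Track a running mean/variance with an exponential moving average."""

    def __init__(self, momentum=0.1, eps=1e-8):
        self.momentum = momentum
        self.eps = eps
        self.mean = None
        self.var = None

    def update(self, batch):
        """Blend the statistics of a new batch into the running estimates."""
        n = len(batch)
        batch_mean = sum(batch) / n
        batch_var = sum((v - batch_mean) ** 2 for v in batch) / n
        if self.mean is None:
            # The first batch initializes the statistics directly.
            self.mean, self.var = batch_mean, batch_var
        else:
            m = self.momentum
            self.mean = (1 - m) * self.mean + m * batch_mean
            self.var = (1 - m) * self.var + m * batch_var

    def __call__(self, batch):
        """Return zero-mean, unit-variance features under the running stats."""
        scale = math.sqrt(self.var + self.eps)
        return [(v - self.mean) / scale for v in batch]


norm = RunningNormSketch()
bins = [0.0, 128.0, 256.0, 384.0, 512.0]  # e.g. raw frequency-bin indices
norm.update(bins)
z = norm(bins)
```

Unlike a one-shot z-score, the statistics keep adapting as later batches arrive, which is what lets the normalization follow a changing proposal distribution.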
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cweniger cweniger changed the base branch from refactor/mixed-precision to main February 22, 2026 22:26
Owner Author

@cweniger cweniger left a comment

All good!

cweniger and others added 7 commits February 22, 2026 23:33
During proposal re-simulation, observation-based values for deterministic
intermediate nodes (e.g. tokens) were overwriting freshly re-simulated
values, producing inconsistent training triples. Filter both condition_refs
and the final merge step to only propagate latent node (estimator) values
from the proposal, ensuring intermediates are correctly re-derived from
the forward simulation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Shows noise-free power spectra evolving as training progresses, with
true signal as reference. Bottom panels show 2D parameter scatter
accumulating over time with CRB error ellipses.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Instead of hardcoding N, t_c, A0, n_harmonics, noise_sigma, read them
from the saved config.yml in the run directory.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace CPU Python loop with vectorized JAX vmap + JIT on GPU (9x faster)
- Move noise generation to JAX (avoids slow np.random.randn at scale)
- Remove hardcoded seq_len from config (now lazy in TransformerEmbedding)
- Scale to N=1M bins with k=20240 STFT windows
- Widen priors and tune training hyperparameters

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…N=100k

Replace fuge.emri dependency with local chirp.py signal generator.
Restructure graph to Signal+Noise+Data decomposition for both configs.
Set uniform priors to ±10σ CRB at N=100k for all parameters.
Fix GPU allocation (0.2 per node) to avoid deadlock on single-GPU machines.
Add Fisher/CRB estimation script and prior sample visualization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cweniger cweniger changed the title from "Add example 06: spectral analysis with fuge EMRI embeddings" to "Add example 06: spectral analysis with chirp signal inference" Mar 3, 2026
cweniger and others added 9 commits March 4, 2026 20:08
Output buffers (_output_mean, _output_std, _residual_cov, _residual_eigvals,
_residual_eigvecs) were initialized as float32 by default. For parameters
like f0 ~ 2.75e-3, the float32 ULP (3.28e-10) is comparable to the CRB
(4.89e-10), leaving only ~30 distinct representable values in the prior
range — making learning impossible.

Fix: always initialize output buffers as float64. Cast sample() output to
conditions' dtype and log_prob() output to theta's dtype. Override to()
to preserve float64 buffers when moving to device.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
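
The precision argument above can be checked with the standard library alone. The sketch below computes the float32 spacing at f0 = 2.75e-3 by stepping to the next representable value, then counts how many float32 values fit in a ±10σ window. The window width is an assumption taken from the PR summary, and the helper name is illustrative. For this f0 the spacing evaluates to 2^-32, about 2.33e-10, which is the same order as the quoted CRB.

```python
import struct


def float32_ulp(x):
    """Gap between x (rounded to float32) and the next larger float32.

    Valid for positive finite x: incrementing the bit pattern of a positive
    float32 gives the next representable value.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    current = struct.unpack("<f", struct.pack("<I", bits))[0]
    following = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return following - current


f0 = 2.75e-3          # parameter scale quoted in the commit message
crb_sigma = 4.89e-10  # CRB quoted in the commit message
ulp = float32_ulp(f0)

# Assumption: the prior spans +/- 10 sigma around the true value.
n_representable = 2 * 10 * crb_sigma / ulp
```

A window that holds only a few dozen float32 values cannot resolve the posterior, which is the motivation for keeping these buffers in float64.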
When training with zooming (simulate_when_full), the Gaussian estimator
trains on proposal data, biasing the residual covariance toward the
proposal distribution rather than the true posterior. Apply analytical
gamma correction: gamma_correct = (1+gamma)/gamma, which inverts the
tempering used for proposal generation.

Also:
- Fix completion message to show best val loss instead of last
- Add N=256 configs (config3: no embedding, config4: SVD embedding)
  with normal priors (5σ CRB) for A/B testing
- Add obs_256.npz and obs_1M.npz observation files
- Update SVDEmbedding for StreamingPCA register_buffer fix (numel>0)
- Add make_plots2.py (Fisher comparison) and summarize_posterior.py
- Update make_spectrum_animation.py to use local chirp module

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r design doc

- SpectralTokenEmbed: multi-scale sin/cos encoding for frequencies,
  log-scaled amplitude, sin/cos phase boundary encoding, explicit time
  coordinate. Replaces TokenEmbed for richer frequency discrimination.
- config.yml: updated to use SpectralTokenEmbed with RunningNorm,
  normal priors, larger batch/network, proposal-based training (k=10000)
- PhaseFormer.md: design document for dual-stream hierarchical transformer
  with structural coherent phase accumulation via complex multiplication
  gated by frequency-matched attention. Includes Debye-Waller damping
  concept for uncertainty-aware spectral modes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
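
As a rough illustration of what a multi-scale sin/cos encoding looks like, the sketch below maps a scalar to sine and cosine pairs at geometrically spaced scales. It is a generic construction, not the `SpectralTokenEmbed` implementation. The function name, the number of scales, and the base are assumptions.

```python
import math


def multiscale_sincos(value, n_scales=4, base=2.0):
    """Encode a scalar as [sin, cos] pairs at geometrically spaced scales.

    Coarse scales capture the overall magnitude; fine scales separate
    values that differ only slightly.
    """
    feats = []
    for k in range(n_scales):
        angle = 2.0 * math.pi * value * base**k
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats


features = multiscale_sincos(0.3)
```

Each pair lies on the unit circle, so the features are bounded regardless of the input scale.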
StreamingPCA.forward() now returns zeros before initialization,
removing the need for manual component checks in the caller.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Matches the double precision used by SVDEmbedding downstream.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds config_combined.yml for dual-stream embedding (SVD on raw signal
+ spectral tokens through transformer). Adds Concat helper module and
detaches batch_mean in RunningNorm to prevent gradient flow through
normalization statistics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>