Add example 06: spectral analysis with chirp signal inference by cweniger · Pull Request #42 · cweniger/falcon

cweniger · 2026-02-19T10:12:00Z

Summary

- Adds examples/06_spectral_analysis, demonstrating falcon + fuge integration for multi-harmonic chirp signal parameter inference
- Signal model: chirping waveform parameterized by (f0, chirp_mass, harmonic_decay) with additive Gaussian noise
- Graph: theta → y (clean signal) + n (noise) → x (observed = y + n) → tokens (STFT)
- Two config variants:
  - config.yml: tokenized pipeline with TransformerEmbedding (via fuge)
  - config2.yml: scaffolded SVD embedding on raw signals
- Both use a falcon.estimators.Gaussian posterior with uniform priors set to ±10σ CRB at N=100k
- Includes a standalone Fisher/CRB estimation script, prior sample visualization, and posterior comparison plotting
- Local chirp.py signal generator (JAX autodiff-compatible) replaces the external dependency for waveform generation
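
For readers unfamiliar with the graph described in the summary, the following is a minimal NumPy sketch of the theta → y + n → x → tokens chain. It is not the chirp.py model from this PR: the waveform, the `chirp_rate` parameter (a neutral stand-in for the second parameter), the function names, and all default values are assumptions chosen for illustration.

```python
import numpy as np

def chirp_signal(theta, n_samples=1024, n_harmonics=3, amplitude=1.0):
    """Toy multi-harmonic chirp (hypothetical; NOT the PR's chirp.py).

    theta = (f0, chirp_rate, harmonic_decay):
      f0              start frequency in cycles per sample
      chirp_rate      linear frequency drift per sample (assumed form)
      harmonic_decay  harmonic k has amplitude k ** -harmonic_decay
    """
    f0, chirp_rate, harmonic_decay = theta
    t = np.arange(n_samples, dtype=np.float64)
    # Phase of the fundamental: 2*pi times the integral of f0 + chirp_rate * t.
    phase = 2.0 * np.pi * (f0 * t + 0.5 * chirp_rate * t**2)
    y = np.zeros(n_samples)
    for k in range(1, n_harmonics + 1):
        y += amplitude * k ** (-harmonic_decay) * np.sin(k * phase)
    return y

def simulate(theta, noise_sigma=0.5, seed=0):
    """Return clean signal y, Gaussian noise n, and observation x = y + n."""
    rng = np.random.default_rng(seed)
    y = chirp_signal(theta)
    n = noise_sigma * rng.standard_normal(y.shape)
    return y, n, y + n

def stft_tokens(x, window=128, hop=64):
    """Magnitude spectrogram from Hann-windowed, overlapping frames."""
    starts = range(0, len(x) - window + 1, hop)
    frames = np.stack([x[s:s + window] for s in starts])
    return np.abs(np.fft.rfft(frames * np.hanning(window), axis=1))

theta = (0.05, 2e-5, 1.5)          # illustrative values only
y, n, x = simulate(theta)
tokens = stft_tokens(x)            # one row per time frame
```

Keeping y and n as separate nodes means the noise-free signal stays available for diagnostics, while only x (or tokens derived from it) is used as evidence.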

Test plan

- Run cd examples/06_spectral_analysis && python data/generate_obs.py
- Run falcon launch --run-dir outputs/run to verify training converges
- Run falcon sample prior --run-dir outputs/run and python make_prior_samples_plot.py
- Run python make_fisher_estimate.py --obs data/obs.npz for both configs
- Run python make_plots.py outputs/run to compare posterior with Fisher/CRB bounds

🤖 Generated with Claude Code

codecov · 2026-02-19T10:15:03Z

Codecov Report

❌ Patch coverage is 0% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 9.54%. Comparing base (a524e25) to head (73332ed).

Files with missing lines	Patch %	Lines
falcon/estimators/gaussian.py	0.00%	9 Missing ⚠️
falcon/core/deployed_graph.py	0.00%	4 Missing ⚠️
falcon/estimators/base.py	0.00%	3 Missing ⚠️
falcon/embeddings/norms.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff            @@
##            main     #42      +/-   ##
========================================
- Coverage   9.54%   9.54%   -0.01%     
========================================
  Files         32      32              
  Lines       3854    3857       +3     
========================================
  Hits         368     368              
- Misses      3486    3489       +3

Flag	Coverage Δ
unit	`9.54% <0.00%> (-0.01%)`	⬇️


cweniger · 2026-02-22T09:54:33Z

examples/01_minimal/config.yml

+  validation_samples: 256               # Size of validation split (for early stopping)
+  simulate_count: 128                   # Number of new samples drawn per simulation step
+  simulate_when_full: true              # Continue simulating after reaching max samples
+  simulate_interval: 10                 # Simulate every N training epochs


I thought this interval is in seconds, not epochs. Check this in the code and correct the comment if necessary.

cweniger · 2026-02-22T10:48:41Z

examples/06_spectral_analysis/data/generate_obs.py

+TRUE_CHIRP_MASS = 1.0
+TRUE_HARMONIC_DECAY = 1.5
+
+signal = emri_signal(


The entire thing should not be framed around EMRIs, but around general spectral signals. Don't mention EMRI here or anywhere, including in the name of the pull request; it is confusing for non-GW experts. Also, CHIRP_MASS should be replaced; HARMONIC_DECAY is fine.

Demonstrates falcon + fuge library integration for EMRI gravitational wave parameter inference using a Gaussian posterior estimator and a nested embedding pipeline (ToneTokenizer → ToneTokenEmbedding → TransformerEmbedding) configured declaratively via YAML. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Without normalization, raw frequency bin indices (0-512) dominated the embedding features, preventing the transformer from learning. Now lazily calls compute_normalization() to produce zero-mean, unit-variance features before passing to the transformer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
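
The scale mismatch described in this commit can be illustrated with a plain z-score. This is a minimal sketch, not the code behind compute_normalization(): the `zscore` helper and the synthetic two-column feature layout are assumptions.

```python
import numpy as np

def zscore(features, eps=1e-8):
    """Standardize each feature column to zero mean and unit variance."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)

rng = np.random.default_rng(0)
# Column 0: raw frequency-bin indices in 0..512; column 1: values of order 0.1.
bin_index = rng.integers(0, 513, size=1000).astype(np.float64)
amplitude = rng.normal(0.0, 0.1, size=1000)
features = np.stack([bin_index, amplitude], axis=1)

normalized = zscore(features)      # both columns now have comparable scale
```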

Ray sets CUDA_VISIBLE_DEVICES="" for actors without GPU allocation, causing JAX to crash. Signal generation runs fine on CPU. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Use GPU when available (e.g. when Ray allocates one), fall back to CPU only when CUDA_VISIBLE_DEVICES is empty (Ray no-GPU workers). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
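
The fallback rule in this commit can be sketched as below. The helper name `pick_jax_platform` is hypothetical, and selecting the backend through the `JAX_PLATFORMS` environment variable is an assumption about the mechanism; the PR's actual code is not shown here.

```python
import os

def pick_jax_platform(environ):
    """Return "cpu" when CUDA_VISIBLE_DEVICES is set but empty, else None.

    An empty string is what the commit message reports for Ray actors
    without a GPU allocation; an unset variable leaves the default backend.
    """
    if environ.get("CUDA_VISIBLE_DEVICES") == "":
        return "cpu"
    return None

platform = pick_jax_platform(os.environ)
if platform is not None:
    # Must be set before `import jax` for the choice to take effect.
    os.environ["JAX_PLATFORMS"] = platform
```

Taking the environment as an argument keeps the decision testable without touching the real process environment.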

Computes the full 3-parameter Fisher information matrix using JAX autodiff on the EMRI signal model, then overlays the Cramér-Rao Gaussian on a corner plot alongside falcon posterior samples. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
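
As a rough illustration of the Fisher/CRB computation, the sketch below uses central finite differences in place of JAX autodiff and a two-parameter sinusoid in place of the PR's three-parameter signal model. It assumes white Gaussian noise, for which the Fisher matrix is J^T J / sigma^2 with J the Jacobian of the signal. All names and values are hypothetical.

```python
import numpy as np

def model(theta, n_samples=512):
    """Toy signal: a single sinusoid with amplitude and frequency."""
    amplitude, freq = theta
    t = np.arange(n_samples, dtype=np.float64)
    return amplitude * np.sin(2.0 * np.pi * freq * t)

def fisher_matrix(model_fn, theta, noise_sigma, rel_step=1e-6):
    """Fisher matrix for white Gaussian noise via central differences."""
    theta = np.asarray(theta, dtype=np.float64)
    grads = []
    for i in range(theta.size):
        step = rel_step * max(abs(theta[i]), 1e-12)
        dtheta = np.zeros_like(theta)
        dtheta[i] = step
        grads.append(
            (model_fn(theta + dtheta) - model_fn(theta - dtheta)) / (2.0 * step)
        )
    jac = np.stack(grads, axis=1)              # shape (n_samples, n_params)
    return jac.T @ jac / noise_sigma**2

theta_true = np.array([1.0, 0.05])             # illustrative values only
fisher = fisher_matrix(model, theta_true, noise_sigma=0.5)
crb_cov = np.linalg.inv(fisher)                # Cramér-Rao covariance bound
crb_sigma = np.sqrt(np.diag(crb_cov))

# Uniform prior bounds at +/-10 sigma CRB, as in the PR summary.
prior_low = theta_true - 10.0 * crb_sigma
prior_high = theta_true + 10.0 * crb_sigma
```

The CRB covariance is what would be drawn as the reference Gaussian on a corner plot next to the posterior samples.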

The ToneTokenizer STFT requires ~16GB per 1000 signals. Without chunking, all posterior samples go through the embedding at once. chunk_size: 64 processes them in manageable batches. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
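
The chunking idea can be sketched generically as follows. `apply_in_chunks` and `toy_embedding` are hypothetical stand-ins; only the chunk size of 64 comes from the commit message.

```python
import numpy as np

def apply_in_chunks(fn, inputs, chunk_size=64):
    """Apply fn to consecutive slices of inputs and concatenate the results."""
    outputs = [
        fn(inputs[start:start + chunk_size])
        for start in range(0, len(inputs), chunk_size)
    ]
    return np.concatenate(outputs, axis=0)

def toy_embedding(batch):
    # Stand-in for an expensive embedding: per-signal magnitude spectrum.
    return np.abs(np.fft.rfft(batch, axis=1))

signals = np.random.default_rng(0).standard_normal((200, 256))
embedded = apply_in_chunks(toy_embedding, signals, chunk_size=64)
```

Peak memory then scales with the chunk size rather than with the total number of posterior samples, at the cost of a few extra calls.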

…imation script Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…okens caching config

Expand the inference graph to include deterministic ancestor nodes reachable via evidence references (BFS expansion). This enables graph structures like theta -> x -> tokens where tokens is a cacheable intermediate without its own estimator. Add fallback to _simulate for nodes without estimators during posterior/proposal sampling.

Rename internal graph attributes for symmetry: parents_dict -> forward_deps, sorted_node_names -> forward_order, sorted_inference_node_names -> backward_order. Add backward_deps with merged dependency dict for the inference direction.

Add config2.yml with separate tokens node for STFT caching, Tokenizer class in model.py for float64 signal processing, and force JAX to CPU mode unconditionally since EMRI signal generation only needs CPU.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
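
One way to picture the BFS expansion is sketched below. This is an assumption about the general idea, not falcon's implementation: the function name and signature are hypothetical, and only the `forward_deps` name and the theta -> x -> tokens example come from the commit message.

```python
from collections import deque

def expand_inference_nodes(forward_deps, estimator_nodes, evidence_refs):
    """Collect deterministic ancestors of the evidence nodes via BFS.

    forward_deps:    node -> list of parent nodes in the forward model
    estimator_nodes: nodes that have their own estimator (latents)
    evidence_refs:   nodes that the estimators condition on
    """
    expanded = []
    seen = set(estimator_nodes)        # never expand through latent nodes
    queue = deque(evidence_refs)
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        expanded.append(node)
        # Walk towards the roots of the forward graph.
        queue.extend(forward_deps.get(node, []))
    return expanded

forward_deps = {"theta": [], "x": ["theta"], "tokens": ["x"]}
extra = expand_inference_nodes(forward_deps, {"theta"}, ["tokens"])
```

In this toy graph the expansion picks up tokens and its deterministic parent x, and stops at theta because that node has an estimator.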

Embedding pipeline and posterior no longer inherit float64 from numpy theta. Neural network layers (TransformerEmbedding, MLP) run in float32 for tensor-core speed; parameter-space operations (covariance, sampling) stay float64 for precision. TokenEmbed normalization replaced with explicit RunningNorm pipeline layer that supports EMA across SBI rounds.

- Rename LazyOnlineNorm → RunningNorm with 3D reduce_dims and output_dtype
- GaussianPosterior: override to() to protect float64 buffers, cast MLP output to float64 before de-whitening
- base.py: remove dtype forcing, cast conditions to float32 for embeddings
- TokenEmbed: remove one-shot z-score, use _embed() directly
- Configs: insert RunningNorm(output_dtype=float32) between TokenEmbed and TransformerEmbedding

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cweniger

All good!

During proposal re-simulation, observation-based values for deterministic intermediate nodes (e.g. tokens) were overwriting freshly re-simulated values, producing inconsistent training triples. Filter both condition_refs and the final merge step to only propagate latent node (estimator) values from the proposal, ensuring intermediates are correctly re-derived from the forward simulation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Shows noise-free power spectra evolving as training progresses, with true signal as reference. Bottom panels show 2D parameter scatter accumulating over time with CRB error ellipses. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Instead of hardcoding N, t_c, A0, n_harmonics, noise_sigma, read them from the saved config.yml in the run directory. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Replace CPU Python loop with vectorized JAX vmap + JIT on GPU (9x faster)
- Move noise generation to JAX (avoids slow np.random.randn at scale)
- Remove hardcoded seq_len from config (now lazy in TransformerEmbedding)
- Scale to N=1M bins with k=20240 STFT windows
- Widen priors and tune training hyperparameters

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…N=100k

Replace fuge.emri dependency with local chirp.py signal generator. Restructure graph to Signal+Noise+Data decomposition for both configs. Set uniform priors to ±10σ CRB at N=100k for all parameters. Fix GPU allocation (0.2 per node) to avoid deadlock on single-GPU machines. Add Fisher/CRB estimation script and prior sample visualization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Output buffers (_output_mean, _output_std, _residual_cov, _residual_eigvals, _residual_eigvecs) were initialized as float32 by default. For parameters like f0 ~ 2.75e-3, the float32 ULP (3.28e-10) is comparable to the CRB (4.89e-10), leaving only ~30 distinct representable values in the prior range — making learning impossible. Fix: always initialize output buffers as float64. Cast sample() output to conditions' dtype and log_prob() output to theta's dtype. Override to() to preserve float64 buffers when moving to device. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
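
The resolution argument can be checked directly with NumPy's `np.spacing`. This is a small sketch, assuming a ±10σ prior range around f0; the exact count of representable values depends on that assumed width and on where f0 falls between powers of two, so it need not match the commit message's figure exactly.

```python
import numpy as np

f0 = 2.75e-3
crb_sigma = 4.89e-10                         # CRB quoted in the commit message

ulp32 = float(np.spacing(np.float32(f0)))    # gap to the next float32 value
ulp64 = float(np.spacing(np.float64(f0)))    # gap to the next float64 value

prior_width = 20.0 * crb_sigma               # assumed +/-10 sigma prior range
levels32 = prior_width / ulp32
levels64 = prior_width / ulp64

print(f"float32: ulp={ulp32:.3e}, about {levels32:.0f} values across the prior")
print(f"float64: ulp={ulp64:.3e}, about {levels64:.2e} values across the prior")
```

In float32 only a few tens of distinct values fit inside the prior range, while float64 leaves many orders of magnitude of headroom, which is the motivation for keeping the output buffers in float64.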

When training with zooming (simulate_when_full), the Gaussian estimator trains on proposal data, biasing the residual covariance toward the proposal distribution rather than the true posterior. Apply analytical gamma correction: gamma_correct = (1+gamma)/gamma, which inverts the tempering used for proposal generation.

Also:
- Fix completion message to show best val loss instead of last
- Add N=256 configs (config3: no embedding, config4: SVD embedding) with normal priors (5σ CRB) for A/B testing
- Add obs_256.npz and obs_1M.npz observation files
- Update SVDEmbedding for StreamingPCA register_buffer fix (numel>0)
- Add make_plots2.py (Fisher comparison) and summarize_posterior.py
- Update make_spectrum_animation.py to use local chirp module

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…r design doc

- SpectralTokenEmbed: multi-scale sin/cos encoding for frequencies, log-scaled amplitude, sin/cos phase boundary encoding, explicit time coordinate. Replaces TokenEmbed for richer frequency discrimination.
- config.yml: updated to use SpectralTokenEmbed with RunningNorm, normal priors, larger batch/network, proposal-based training (k=10000)
- PhaseFormer.md: design document for dual-stream hierarchical transformer with structural coherent phase accumulation via complex multiplication gated by frequency-matched attention. Includes Debye-Waller damping concept for uncertainty-aware spectral modes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

StreamingPCA.forward() now returns zeros before initialization, removing the need for manual component checks in the caller. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Matches the double precision used by SVDEmbedding downstream. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Adds config_combined.yml for dual-stream embedding (SVD on raw signal + spectral tokens through transformer). Adds Concat helper module and detaches batch_mean in RunningNorm to prevent gradient flow through normalization statistics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cweniger force-pushed the feat/06_example branch from c72362b to 4dfc737 Compare February 22, 2026 19:38

cweniger changed the base branch from main to refactor/mixed-precision February 22, 2026 19:38

cweniger force-pushed the feat/06_example branch from 4dfc737 to 7544d4e Compare February 22, 2026 22:06

cweniger and others added 13 commits February 22, 2026 23:26

Widen prior ranges and increase buffer/batch sizes for spectral example

23a8033

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Force JAX CPU backend for EMRI simulator in Ray workers

f48d1a4

Ray sets CUDA_VISIBLE_DEVICES="" for actors without GPU allocation, causing JAX to crash. Signal generation runs fine on CPU. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Only fall back to JAX CPU when no GPU is visible

d82d39a

Use GPU when available (e.g. when Ray allocates one), fall back to CPU only when CUDA_VISIBLE_DEVICES is empty (Ray no-GPU workers). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Default make_plots.py to outputs/latest

793514f

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Tune spectral analysis buffer and sampling parameters

9a16bd4

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Detach STFT output to avoid wasteful backward pass and add spectra an…

68f64d8

…imation script Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add dim parameter to RunningNorm in example 06 configs

12fda24

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cweniger force-pushed the feat/06_example branch from 7544d4e to 12fda24 Compare February 22, 2026 22:26

cweniger changed the base branch from refactor/mixed-precision to main February 22, 2026 22:26

cweniger and others added 7 commits February 22, 2026 23:33

Trigger CI

01ade92

update config.yml

0120c1e

Add spectrum animation script for example 06

09d565c

Shows noise-free power spectra evolving as training progresses, with true signal as reference. Bottom panels show 2D parameter scatter accumulating over time with CRB error ellipses. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Read signal parameters from config in plotting scripts

2e976c8

Instead of hardcoding N, t_c, A0, n_harmonics, noise_sigma, read them from the saved config.yml in the run directory. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cweniger changed the title ~~Add example 06: spectral analysis with fuge EMRI embeddings~~ Add example 06: spectral analysis with chirp signal inference Mar 3, 2026

cweniger and others added 9 commits March 4, 2026 20:08

Updates config2.yml

6b17c3e

Simplify SVDEmbedding forward using fuge StreamingPCA zero-fallback

9dedaa7

StreamingPCA.forward() now returns zeros before initialization, removing the need for manual component checks in the caller. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Use float64 for Signal and Noise simulator outputs

679399f

Matches the double precision used by SVDEmbedding downstream. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add FIXME for missing condition refs in _execute_graph output

b71326f

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Merge branch 'main' into feat/06_example

73332ed

cweniger wants to merge 29 commits into main from feat/06_example
