Add example 06: spectral analysis with chirp signal inference #42
Conversation
Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage.

Additional details and impacted files

```
@@ Coverage Diff @@
##            main    #42      +/-   ##
========================================
- Coverage   9.54%   9.54%   -0.01%
========================================
  Files         32     32
  Lines       3854   3857       +3
========================================
  Hits         368    368
- Misses      3486   3489       +3
```
examples/01_minimal/config.yml
```yaml
validation_samples: 256   # Size of validation split (for early stopping)
simulate_count: 128       # Number of new samples drawn per simulation step
simulate_when_full: true  # Continue simulating after reaching max samples
simulate_interval: 10     # Simulate every N training epochs
```
I thought this was about seconds, not epochs. Please check the code and correct the comment if necessary.
```python
TRUE_CHIRP_MASS = 1.0
TRUE_HARMONIC_DECAY = 1.5

signal = emri_signal(
```
The entire thing should not be framed around EMRIs, but around general spectral signals. Don't mention EMRI here or anywhere else, including the pull request title; it is confusing for non-GW experts. Also, CHIRP_MASS should be replaced; HARMONIC_DECAY is fine.
c72362b to 4dfc737 (force-pushed)
4dfc737 to 7544d4e (force-pushed)
Demonstrates falcon + fuge library integration for EMRI gravitational wave parameter inference using a Gaussian posterior estimator and a nested embedding pipeline (ToneTokenizer → ToneTokenEmbedding → TransformerEmbedding), configured declaratively via YAML.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Without normalization, raw frequency bin indices (0-512) dominated the embedding features, preventing the transformer from learning. Now lazily calls compute_normalization() to produce zero-mean, unit-variance features before passing to the transformer.
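The normalization fix can be sketched in plain numpy; `compute_normalization` is the name from the commit message, but this standalone version is an assumption about its behavior, not the actual implementation:

```python
import numpy as np

def compute_normalization(features, eps=1e-8):
    # Hypothetical sketch of the commit's fix: z-score each feature column
    # so raw frequency-bin indices (0..512) no longer dwarf the other features.
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)

# column 0: large bin indices, column 1: tiny amplitudes
tokens = np.stack([np.arange(512.0), np.random.randn(512) * 1e-3], axis=1)
normed = compute_normalization(tokens)
# both columns are now approximately zero-mean, unit-variance
```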
Ray sets CUDA_VISIBLE_DEVICES="" for actors without GPU allocation, causing JAX to crash. Signal generation runs fine on CPU.
Use GPU when available (e.g. when Ray allocates one), fall back to CPU only when CUDA_VISIBLE_DEVICES is empty (Ray no-GPU workers).
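A minimal sketch of that fallback logic (the function name is hypothetical; the env-var behavior is as described in the two commits above):

```python
import os

def pick_jax_platform():
    # Ray sets CUDA_VISIBLE_DEVICES="" for actors without a GPU slot, which
    # crashes JAX's CUDA backend. Fall back to CPU only in that exact case;
    # otherwise use the GPU Ray allocated (or whatever is available).
    if os.environ.get("CUDA_VISIBLE_DEVICES") == "":
        return "cpu"
    return "gpu"

# Must be decided before `import jax`, e.g.:
# os.environ.setdefault("JAX_PLATFORMS", pick_jax_platform())
```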
Computes the full 3-parameter Fisher information matrix using JAX autodiff on the EMRI signal model, then overlays the Cramér-Rao Gaussian on a corner plot alongside falcon posterior samples.
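The Fisher/CRB computation can be illustrated on a toy 3-parameter signal model. The PR uses JAX autodiff on the actual chirp model; this self-contained sketch substitutes central finite differences, and all names here are hypothetical:

```python
import numpy as np

def signal_model(theta, t):
    # Toy 3-parameter chirp stand-in: amplitude, base frequency, frequency drift.
    a, f0, fdot = theta
    return a * np.sin(2 * np.pi * (f0 + 0.5 * fdot * t) * t)

def fisher_matrix(theta, t, sigma, eps=1e-6):
    # F_ij = (1/sigma^2) * sum_t (dh/dtheta_i)(dh/dtheta_j) for white noise.
    # Jacobian via central differences (the PR uses JAX autodiff instead).
    J = np.empty((t.size, len(theta)))
    for i in range(len(theta)):
        up, dn = np.array(theta, float), np.array(theta, float)
        up[i] += eps
        dn[i] -= eps
        J[:, i] = (signal_model(up, t) - signal_model(dn, t)) / (2 * eps)
    return (J.T @ J) / sigma**2

t = np.linspace(0.0, 1.0, 2048)
F = fisher_matrix([1.0, 10.0, 2.0], t, sigma=0.1)
crb_cov = np.linalg.inv(F)  # Cramér-Rao covariance for the corner-plot overlay
```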
The ToneTokenizer STFT requires ~16GB per 1000 signals. Without chunking, all posterior samples go through the embedding at once. chunk_size: 64 processes them in manageable batches.
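The chunking amounts to the following pattern (`embed_in_chunks` is a hypothetical helper; the real pipeline drives the ToneTokenizer STFT through it):

```python
import numpy as np

def embed_in_chunks(signals, embed_fn, chunk_size=64):
    # Run a memory-hungry embedding over chunk_size signals at a time
    # instead of materializing the whole batch at once.
    outs = [embed_fn(signals[i:i + chunk_size])
            for i in range(0, len(signals), chunk_size)]
    return np.concatenate(outs, axis=0)

# toy "embedding": mean power per signal
signals = np.random.randn(200, 128)
chunked = embed_in_chunks(signals, lambda x: (x ** 2).mean(axis=1, keepdims=True))
```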
…imation script
…okens caching config

Expand the inference graph to include deterministic ancestor nodes reachable via evidence references (BFS expansion). This enables graph structures like theta -> x -> tokens, where tokens is a cacheable intermediate without its own estimator. Add a fallback to _simulate for nodes without estimators during posterior/proposal sampling.

Rename internal graph attributes for symmetry: parents_dict -> forward_deps, sorted_node_names -> forward_order, sorted_inference_node_names -> backward_order. Add backward_deps with a merged dependency dict for the inference direction.

Add config2.yml with a separate tokens node for STFT caching, a Tokenizer class in model.py for float64 signal processing, and force JAX to CPU mode unconditionally, since EMRI signal generation only needs CPU.
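The BFS expansion described in this commit can be sketched like so (`forward_deps` maps node → parents, matching the renamed attribute; the function name is hypothetical):

```python
from collections import deque

def expand_with_ancestors(targets, forward_deps):
    # Starting from the inference targets, pull in every ancestor reachable
    # through dependency edges, so deterministic intermediates like `tokens`
    # (which have no estimator of their own) still enter the inference graph.
    seen = set(targets)
    queue = deque(targets)
    while queue:
        for parent in forward_deps.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

# theta -> x -> tokens, plus a noise node feeding x
deps = {"tokens": ["x"], "x": ["theta", "n"], "theta": [], "n": []}
nodes = expand_with_ancestors(["tokens"], deps)
```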
Embedding pipeline and posterior no longer inherit float64 from numpy theta. Neural network layers (TransformerEmbedding, MLP) run in float32 for tensor-core speed; parameter-space operations (covariance, sampling) stay float64 for precision. TokenEmbed normalization is replaced with an explicit RunningNorm pipeline layer that supports EMA across SBI rounds.

- Rename LazyOnlineNorm → RunningNorm with 3D reduce_dims and output_dtype
- GaussianPosterior: override to() to protect float64 buffers; cast MLP output to float64 before de-whitening
- base.py: remove dtype forcing; cast conditions to float32 for embeddings
- TokenEmbed: remove one-shot z-score, use _embed() directly
- Configs: insert RunningNorm(output_dtype=float32) between TokenEmbed and TransformerEmbedding
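The float32/float64 split can be illustrated with a whiten → net → de-whiten round trip. This is a hypothetical standalone helper, not the real GaussianPosterior/RunningNorm code:

```python
import numpy as np

def mixed_precision_forward(theta64, net32):
    # Whitening statistics stay float64; only the O(1)-scaled activations
    # pass through the float32 network; the output is cast back to float64
    # before de-whitening, so parameter-space precision is preserved.
    mean = theta64.mean(axis=0)
    std = theta64.std(axis=0) + 1e-15
    z32 = ((theta64 - mean) / std).astype(np.float32)
    out64 = net32(z32).astype(np.float64)
    return out64 * std + mean

# f0-like parameter: mean ~2.75e-3 with a spread far below the float32 ULP
rng = np.random.default_rng(0)
theta = rng.normal(2.75e-3, 5e-10, size=(64, 1))
recon = mixed_precision_forward(theta, lambda z: z)  # identity "network"
```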
7544d4e to 12fda24 (force-pushed)
During proposal re-simulation, observation-based values for deterministic intermediate nodes (e.g. tokens) were overwriting freshly re-simulated values, producing inconsistent training triples. Filter both condition_refs and the final merge step to only propagate latent node (estimator) values from the proposal, ensuring intermediates are correctly re-derived from the forward simulation.
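The fix boils down to filtering the merge by node kind. A hypothetical sketch, where `latent_nodes` are the nodes that have estimators:

```python
def merge_proposal(proposal, resimulated, latent_nodes):
    # Deterministic intermediates (e.g. tokens) must come from the fresh
    # forward simulation; only latent (estimator) values may be taken from
    # the proposal, keeping training triples internally consistent.
    merged = dict(resimulated)
    merged.update({k: v for k, v in proposal.items() if k in latent_nodes})
    return merged

proposal = {"theta": 0.1, "tokens": "stale"}  # tokens derived from the observation
resim = {"tokens": "fresh", "x": 1.0}         # re-derived intermediates
merged = merge_proposal(proposal, resim, latent_nodes={"theta"})
```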
Shows noise-free power spectra evolving as training progresses, with the true signal as reference. Bottom panels show 2D parameter scatter accumulating over time with CRB error ellipses.
Instead of hardcoding N, t_c, A0, n_harmonics, noise_sigma, read them from the saved config.yml in the run directory.
- Replace CPU Python loop with vectorized JAX vmap + JIT on GPU (9x faster)
- Move noise generation to JAX (avoids slow np.random.randn at scale)
- Remove hardcoded seq_len from config (now lazy in TransformerEmbedding)
- Scale to N=1M bins with k=20240 STFT windows
- Widen priors and tune training hyperparameters
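The vectorization step is the usual loop-to-batch rewrite. The PR uses jax.vmap + jit; the same idea in plain numpy broadcasting, on a toy model with hypothetical names, looks like:

```python
import numpy as np

def chirp_one(theta, t):
    # per-sample version: one (amplitude, f0, fdot) triple at a time
    a, f0, fdot = theta
    return a * np.sin(2 * np.pi * (f0 + 0.5 * fdot * t) * t)

def chirp_batch(thetas, t):
    # batched version: broadcast (B, 1) parameter columns against (N,) times,
    # replacing the slow per-sample Python loop with one array expression
    a, f0, fdot = (thetas[:, i:i + 1] for i in range(3))
    return a * np.sin(2 * np.pi * (f0 + 0.5 * fdot * t) * t)  # (B, N)

rng = np.random.default_rng(0)
thetas = rng.uniform([0.5, 5.0, 0.0], [1.5, 15.0, 4.0], size=(8, 3))
t = np.linspace(0.0, 1.0, 1024)
batch = chirp_batch(thetas, t)
```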
…N=100k

Replace fuge.emri dependency with local chirp.py signal generator. Restructure graph to Signal+Noise+Data decomposition for both configs. Set uniform priors to ±10σ CRB at N=100k for all parameters. Fix GPU allocation (0.2 per node) to avoid deadlock on single-GPU machines. Add Fisher/CRB estimation script and prior sample visualization.
Output buffers (_output_mean, _output_std, _residual_cov, _residual_eigvals, _residual_eigvecs) were initialized as float32 by default. For parameters like f0 ~ 2.75e-3, the float32 ULP (3.28e-10) is comparable to the CRB (4.89e-10), leaving only ~30 distinct representable values in the prior range, making learning impossible.

Fix: always initialize output buffers as float64. Cast sample() output to the conditions' dtype and log_prob() output to theta's dtype. Override to() to preserve float64 buffers when moving to device.
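The precision argument is easy to check numerically with np.spacing, which gives the ULP at a value (the CRB figure below is the one quoted in the commit message):

```python
import numpy as np

f0 = 2.75e-3
ulp32 = np.spacing(np.float32(f0))  # gap to the next representable float32
ulp64 = np.spacing(np.float64(f0))  # same for float64
crb = 4.89e-10                      # Cramér-Rao bound quoted above

# float32 spacing is on the same order as the CRB, so only a handful of
# distinct parameter values exist inside the posterior width; float64
# spacing is roughly nine orders of magnitude finer.
```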
When training with zooming (simulate_when_full), the Gaussian estimator trains on proposal data, biasing the residual covariance toward the proposal distribution rather than the true posterior. Apply the analytical gamma correction gamma_correct = (1+gamma)/gamma, which inverts the tempering used for proposal generation.

Also:
- Fix completion message to show best val loss instead of last
- Add N=256 configs (config3: no embedding, config4: SVD embedding) with normal priors (5σ CRB) for A/B testing
- Add obs_256.npz and obs_1M.npz observation files
- Update SVDEmbedding for StreamingPCA register_buffer fix (numel > 0)
- Add make_plots2.py (Fisher comparison) and summarize_posterior.py
- Update make_spectrum_animation.py to use the local chirp module
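Applied to the residual covariance, the correction is a single scale factor. This sketch assumes, as the commit message suggests, that the factor multiplies the covariance; the real call site may differ:

```python
import numpy as np

def gamma_correct_cov(residual_cov, gamma):
    # Training on gamma-tempered proposal data shrinks the fitted residual
    # covariance; multiplying by (1 + gamma) / gamma inverts that tempering.
    return residual_cov * (1.0 + gamma) / gamma

cov = np.diag([4.0, 1.0])
corrected = gamma_correct_cov(cov, gamma=0.5)  # scale factor 3.0
```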
…r design doc

- SpectralTokenEmbed: multi-scale sin/cos encoding for frequencies, log-scaled amplitude, sin/cos phase boundary encoding, and an explicit time coordinate. Replaces TokenEmbed for richer frequency discrimination.
- config.yml: updated to use SpectralTokenEmbed with RunningNorm, normal priors, a larger batch/network, and proposal-based training (k=10000)
- PhaseFormer.md: design document for a dual-stream hierarchical transformer with structural coherent phase accumulation via complex multiplication, gated by frequency-matched attention. Includes a Debye-Waller damping concept for uncertainty-aware spectral modes.
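A plausible sketch of the SpectralTokenEmbed feature layout; the exact layer differs, and the feature order and scale count here are assumptions:

```python
import numpy as np

def spectral_token_features(freq, amp, phase, n_scales=4):
    # Per-token features: log-scaled amplitude, sin/cos phase encoding,
    # and multi-scale sin/cos encodings of (normalized) frequency so a
    # transformer can discriminate nearby frequencies at several resolutions.
    feats = [np.log1p(amp), np.sin(phase), np.cos(phase)]
    for s in range(n_scales):
        w = (2.0 ** s) * 2.0 * np.pi
        feats.append(np.sin(w * freq))
        feats.append(np.cos(w * freq))
    return np.stack(feats, axis=-1)  # shape (T, 3 + 2 * n_scales)

freq = np.linspace(0.0, 0.5, 16)  # normalized STFT bin frequencies
x = spectral_token_features(freq, np.abs(np.random.randn(16)), np.zeros(16))
```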
StreamingPCA.forward() now returns zeros before initialization, removing the need for manual component checks in the caller. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Matches the double precision used by SVDEmbedding downstream. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds config_combined.yml for dual-stream embedding (SVD on raw signal + spectral tokens through transformer). Adds Concat helper module and detaches batch_mean in RunningNorm to prevent gradient flow through normalization statistics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- `examples/06_spectral_analysis` demonstrating falcon + fuge integration for multi-harmonic chirp signal parameter inference
- Graph: theta → y (clean signal), + n (noise) → x (observed = y + n) → tokens (STFT)
- `config.yml`: tokenized pipeline with `TransformerEmbedding` (via fuge)
- `config2.yml`: scaffolded SVD embedding on raw signals
- `falcon.estimators.Gaussian` posterior with uniform priors set to ±10σ CRB at N=100k
- `chirp.py` signal generator (JAX autodiff-compatible) replaces the external dependency for waveform generation

Test plan

- `cd examples/06_spectral_analysis && python data/generate_obs.py`
- `falcon launch --run-dir outputs/run` to verify training converges
- `falcon sample prior --run-dir outputs/run` and `python make_prior_samples_plot.py`
- `python make_fisher_estimate.py --obs data/obs.npz` for both configs
- `python make_plots.py outputs/run` to compare the posterior with Fisher/CRB bounds

🤖 Generated with Claude Code
cd examples/06_spectral_analysis && python data/generate_obs.pyfalcon launch --run-dir outputs/runto verify training convergesfalcon sample prior --run-dir outputs/runandpython make_prior_samples_plot.pypython make_fisher_estimate.py --obs data/obs.npzfor both configspython make_plots.py outputs/runto compare posterior with Fisher/CRB bounds🤖 Generated with Claude Code