
⚡️ Speed up function compress_summary by 26%#19

Open
codeflash-ai[bot] wants to merge 1 commit into master from
codeflash/optimize-compress_summary-maxhreks

Conversation


@codeflash-ai codeflash-ai bot commented May 21, 2025

📄 26% (0.26x) speedup for compress_summary in keras/src/layers/preprocessing/discretization.py

⏱️ Runtime : 715 microseconds → 567 microseconds (best of 463 runs)

📝 Explanation and details

Here's a highly optimized version of your program. Major improvements:

  • Eliminated unnecessary re-assignment of variables.
  • Avoided repeated calculation of cumulative weights and interpolation.
  • Used in-place operations and preallocated arrays where possible.
  • Reduced overhead of creating new NumPy arrays.
  • Compressed logic for memory and speed without changing function semantics.
  • Used contiguous 2D arrays (shape (n, 2)) instead of stacks for efficiency.

Code:
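(The actual diff was not captured in this page. Below is a sketch of what the optimized function plausibly looks like, reconstructed from the change list above and the regression tests further down; the exact in-place pattern and preallocation details are assumptions, not the verbatim PR code.)

```python
import numpy as np


def compress_summary(summary, epsilon):
    """Compress a (2, n) quantile summary: row 0 holds bin values,
    row 1 holds weights. Returns a summary with roughly 1/epsilon bins."""
    if summary.shape[1] * epsilon < 1:
        # Too few bins to bother compressing; return the input unchanged.
        return summary
    percents = epsilon + np.arange(0.0, 1.0, epsilon)
    # Cumulative weights computed once and reused for both interpolations.
    cum_weights = np.cumsum(summary[1])
    cum_weight_percents = cum_weights / cum_weights[-1]
    new_bins = np.interp(percents, cum_weight_percents, summary[0])
    new_cum = np.interp(percents, cum_weight_percents, cum_weights)
    # Undo the cumulative sum in place instead of concatenating a leading zero.
    new_weights = np.empty_like(new_cum)
    new_weights[0] = new_cum[0]
    np.subtract(new_cum[1:], new_cum[:-1], out=new_weights[1:])
    # Preallocate the contiguous float32 result rather than stacking and casting.
    out = np.empty((2, new_bins.shape[0]), dtype=np.float32)
    out[0] = new_bins
    out[1] = new_weights
    return out
```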

Key Changes:

  • Reduced redundant recomputation of cum_weights and interpolation.
  • np.empty_like and np.empty for best-possible performance.
  • In-place assignment for the first weight difference element.
  • Returned array uses dtype=np.float32 directly.
  • Preserved all comments as requested unless code changed.

This version is significantly faster for large summaries: it allocates less memory and leans on NumPy in-place operations.
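For reference, a "best of N runs" figure like the one above can be approximated with a small stdlib harness such as the sketch below. The baseline keras-style implementation is inlined so the snippet is self-contained; the input size and run counts are made up, and this is not Codeflash's actual harness.

```python
import timeit

import numpy as np


def compress_summary(summary, epsilon):
    # Baseline implementation (abridged): interpolate new bins at evenly
    # spaced cumulative-weight percentiles.
    if summary.shape[1] * epsilon < 1:
        return summary
    percents = epsilon + np.arange(0.0, 1.0, epsilon)
    cum_weights = summary[1].cumsum()
    cwp = cum_weights / cum_weights[-1]
    new_bins = np.interp(percents, cwp, summary[0])
    cum_weights = np.interp(percents, cwp, cum_weights)
    new_weights = cum_weights - np.concatenate((np.array([0.0]), cum_weights[:-1]))
    return np.stack((new_bins, new_weights)).astype("float32")


# Hypothetical workload: 10k sorted bin values with positive random weights.
n = 10_000
rng = np.random.default_rng(0)
summary = np.vstack((np.linspace(0.0, 100.0, n), rng.random(n) + 0.1))

# Best-of-repeats timing, mirroring the "best of N runs" reporting style.
best = min(timeit.repeat(lambda: compress_summary(summary, 0.01), number=100, repeat=10))
print(f"best of 10 repeats: {best / 100 * 1e6:.1f} µs per call")
```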

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 37 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests Details
import numpy as np
# imports
import pytest  # used for our unit tests
from keras.src.layers.preprocessing.discretization import compress_summary

# unit tests

# ----------------
# Basic Test Cases
# ----------------

def test_compress_summary_basic_two_bins():
    # Test with two bins, epsilon large enough to compress to 1 bin
    summary = np.array([[1.0, 3.0], [2.0, 2.0]])
    epsilon = 1.0
    # shape[1] * epsilon = 2*1 = 2 > 1, so compression occurs
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_basic_no_compression():
    # Test where epsilon is too small, so no compression occurs
    summary = np.array([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
    epsilon = 0.2  # 3*0.2 = 0.6 < 1, so no compression
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_basic_three_bins():
    # Test with three bins and moderate epsilon
    summary = np.array([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
    epsilon = 0.5  # 3*0.5 = 1.5 > 1, so compression occurs
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_basic_uniform_weights():
    # Test with uniform weights and bins
    summary = np.array([[0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0]])
    epsilon = 0.25  # 4*0.25 = 1, so compression occurs
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

# ----------------
# Edge Test Cases
# ----------------

def test_compress_summary_empty():
    # Empty summary array
    summary = np.empty((2, 0))
    epsilon = 0.5
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_single_bin():
    # Only one bin in summary
    summary = np.array([[5.0], [10.0]])
    epsilon = 0.5
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_epsilon_zero():
    # Epsilon is zero, should not compress
    summary = np.array([[1.0, 2.0], [1.0, 1.0]])
    epsilon = 0.0
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_epsilon_just_under_threshold():
    # Epsilon just under the threshold for compression
    summary = np.array([[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]])
    epsilon = 0.33333333  # 3*0.333... = 0.99999999 < 1, so no compression
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_epsilon_just_over_threshold():
    # Epsilon just over the threshold for compression
    summary = np.array([[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]])
    epsilon = 0.334  # 3*0.334 = 1.002 > 1, so compression
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_non_uniform_weights():
    # Non-uniform weights
    summary = np.array([[1.0, 2.0, 4.0, 8.0], [1.0, 2.0, 3.0, 4.0]])
    epsilon = 0.5  # 4*0.5 = 2 > 1, so compression
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_extreme_values():
    # Test with very large and very small values
    summary = np.array([[1e-10, 1e10], [1.0, 1.0]])
    epsilon = 1.0
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_negative_weights():
    # Negative weights should be handled (though not meaningful, but test for robustness)
    summary = np.array([[1.0, 2.0], [1.0, -1.0]])
    epsilon = 1.0
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_non_integer_weights():
    # Non-integer weights
    summary = np.array([[0.0, 1.0, 2.0], [0.5, 1.5, 2.0]])
    epsilon = 0.5
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

# -----------------------
# Large Scale Test Cases
# -----------------------

def test_compress_summary_large_uniform():
    # Large summary with uniform bins and weights
    n_bins = 1000
    summary = np.vstack((np.linspace(0, 100, n_bins), np.ones(n_bins)))
    epsilon = 0.01  # 1000*0.01=10 > 1, so compression
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output
    # Should compress to int(1/epsilon) = 100 bins
    expected_bins = int(np.ceil(1/epsilon))

def test_compress_summary_large_nonuniform():
    # Large summary with non-uniform weights
    n_bins = 500
    summary = np.vstack((np.linspace(0, 50, n_bins), np.linspace(1, 5, n_bins)))
    epsilon = 0.02  # 500*0.02=10 > 1, so compression
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output
    expected_bins = int(np.ceil(1/epsilon))

def test_compress_summary_large_sparse():
    # Large summary with many zero weights
    n_bins = 1000
    bins = np.linspace(0, 100, n_bins)
    weights = np.zeros(n_bins)
    weights[::100] = 1.0  # Only every 100th bin has weight
    summary = np.vstack((bins, weights))
    epsilon = 0.05  # 1000*0.05=50 > 1, so compression
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output
    expected_bins = int(np.ceil(1/epsilon))

def test_compress_summary_large_skewed():
    # Large summary with skewed weights (exponential)
    n_bins = 800
    bins = np.linspace(0, 10, n_bins)
    weights = np.exp(np.linspace(0, 5, n_bins))
    summary = np.vstack((bins, weights))
    epsilon = 0.01  # 800*0.01=8 > 1, so compression
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output
    expected_bins = int(np.ceil(1/epsilon))

def test_compress_summary_large_all_same_value():
    # Large summary where all bin values are identical
    n_bins = 500
    summary = np.vstack((np.full(n_bins, 7.0), np.ones(n_bins)))
    epsilon = 0.02
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output
    expected_bins = int(np.ceil(1/epsilon))
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import numpy as np
# imports
import pytest  # used for our unit tests
from keras.src.layers.preprocessing.discretization import compress_summary

# unit tests

# -------------------
# BASIC TEST CASES
# -------------------

def test_single_bin_no_compression():
    # Single bin: should return as is, regardless of epsilon
    summary = np.array([[5.0], [10.0]])
    codeflash_output = compress_summary(summary, 0.5); result = codeflash_output
    codeflash_output = compress_summary(summary, 0.01); result = codeflash_output

def test_two_bins_no_compression():
    # Two bins, epsilon too small to trigger compression
    summary = np.array([[1.0, 2.0], [3.0, 7.0]])
    epsilon = 0.4  # 2 * 0.4 = 0.8 < 1, so should not compress
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_two_bins_with_compression():
    # Two bins, epsilon triggers compression
    summary = np.array([[1.0, 3.0], [2.0, 8.0]])
    epsilon = 0.6  # 2 * 0.6 = 1.2 > 1, so should compress
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output
    # With epsilon=0.6, percents = [0.6, 1.2) = [0.6]
    # cum_weights: [2, 10], cum_weight_percents: [0.2, 1.0]
    # new_bins: interpolate at 0.6 between 0.2 and 1.0
    expected_bin = 1.0 + (3.0-1.0)*(0.6-0.2)/(1.0-0.2)
    expected_cumweight = 2.0 + (10.0-2.0)*(0.6-0.2)/(1.0-0.2)
    expected_weight = expected_cumweight  # since only one bin, prev is 0
    expected = np.array([[expected_bin], [expected_weight]], dtype="float32")

def test_three_bins_with_compression():
    # Three bins, epsilon triggers compression
    summary = np.array([[1.0, 2.0, 3.0], [1.0, 1.0, 2.0]])
    epsilon = 0.5  # 3*0.5=1.5>1, so compress
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output
    # percents: [0.5, 1.0)
    # cum_weights: [1,2,4], cum_weight_percents: [0.25,0.5,1.0]
    # new_bins: interpolate at 0.5 and 1.0 (but 1.0 is not included)
    expected_bins = np.interp([0.5], [0.25,0.5,1.0], [1.0,2.0,3.0])
    expected_cumweights = np.interp([0.5], [0.25,0.5,1.0], [1.0,2.0,4.0])
    expected_weights = expected_cumweights  # only one bin, prev is 0
    expected = np.array([expected_bins, expected_weights], dtype="float32")

def test_dtype_is_float32():
    # Output dtype should always be float32
    summary = np.array([[1.0, 2.0], [3.0, 4.0]], dtype="float64")
    codeflash_output = compress_summary(summary, 1.0); result = codeflash_output

# -------------------
# EDGE TEST CASES
# -------------------

def test_empty_summary():
    # Empty summary should be returned as is (shape [2,0])
    summary = np.empty((2,0))
    codeflash_output = compress_summary(summary, 0.5); result = codeflash_output

def test_epsilon_zero():
    # Epsilon zero: should return as is (avoid division by zero)
    summary = np.array([[1.0, 2.0], [3.0, 4.0]])
    codeflash_output = compress_summary(summary, 0.0); result = codeflash_output

def test_epsilon_very_small():
    # Epsilon very small: should return as is
    summary = np.array([[1.0, 2.0], [3.0, 4.0]])
    codeflash_output = compress_summary(summary, 1e-10); result = codeflash_output


def test_negative_weights():
    # Negative weights: cum_weights may decrease, but function should still run
    summary = np.array([[1.0, 2.0, 3.0], [2.0, -1.0, 3.0]])
    epsilon = 0.6
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_non_monotonic_bins():
    # Non-monotonic bins: interpolation should still work
    summary = np.array([[2.0, 1.0, 3.0], [1.0, 2.0, 3.0]])
    epsilon = 0.6
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_large_epsilon():
    # Epsilon large enough to compress to a single bin
    summary = np.array([[1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]])
    epsilon = 1.0
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output
    # percents = epsilon + np.arange(0, 1, epsilon) = 1.0 + [0.0] = [1.0]
    # 4 * 1.0 = 4 > 1, so compression occurs with percents = [1.0]
    # Interpolating at 1.0 should yield the last bin and the total weight
    expected_bin = 4.0
    expected_cumweight = 4.0
    expected = np.array([[expected_bin], [expected_cumweight]], dtype="float32")

def test_summary_with_one_row():
    # Summary with only one row (invalid input)
    summary = np.array([[1.0, 2.0, 3.0]])
    with pytest.raises(IndexError):
        codeflash_output = compress_summary(summary, 0.5); _ = codeflash_output

def test_summary_with_more_than_two_rows():
    # Summary with more than two rows (invalid input)
    summary = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    # Should use only the first two rows
    codeflash_output = compress_summary(summary[:2], 0.5); result = codeflash_output

# -------------------
# LARGE SCALE TEST CASES
# -------------------

def test_large_summary_compression():
    # Large summary, test scalability and output shape
    np.random.seed(42)
    # Random sorted bin values paired with random integer weights
    summary = np.vstack((np.sort(np.random.rand(1000)), np.random.randint(1, 10, 1000)))
    epsilon = 0.01
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_large_summary_no_compression():
    # Large summary, epsilon too small to trigger compression
    summary = np.vstack((np.linspace(0,1,1000), np.ones(1000)))
    epsilon = 1e-4  # 1000*1e-4 = 0.1 < 1
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_large_summary_all_same_bin():
    # Large summary, all bins have same value
    summary = np.vstack((np.ones(1000), np.ones(1000)))
    epsilon = 0.05
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_large_summary_increasing_weights():
    # Large summary, weights increasing
    summary = np.vstack((np.linspace(0,10,1000), np.arange(1,1001)))
    epsilon = 0.02
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_large_summary_decreasing_weights():
    # Large summary, weights decreasing
    summary = np.vstack((np.linspace(0,10,1000), np.arange(1000,0,-1)))
    epsilon = 0.02
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-compress_summary-maxhreks` and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label May 21, 2025
@codeflash-ai codeflash-ai bot requested a review from HeshamHM28 May 21, 2025 05:17
