
⚡️ Speed up function compress_summary by 26%#19

Open
codeflash-ai[bot] wants to merge 1 commit into master from
codeflash/optimize-compress_summary-maxhreks

Conversation


@codeflash-ai codeflash-ai bot commented May 21, 2025

📄 26% (0.26x) speedup for compress_summary in keras/src/layers/preprocessing/discretization.py

⏱️ Runtime : 715 microseconds → 567 microseconds (best of 463 runs)

📝 Explanation and details

Here's a highly optimized version of your program. Major improvements:

  • Eliminated unnecessary re-assignment of variables.
  • Avoided repeated calculation of cumulative weights and interpolation.
  • Used in-place operations and preallocated arrays where possible.
  • Reduced overhead of creating new NumPy arrays.
  • Compressed logic for memory and speed without changing function semantics.
  • Used contiguous 2D arrays (shape (n, 2)) instead of stacks for efficiency.

Code:
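(The actual diff was not captured in this page. Below is a sketch of what the optimized function plausibly looks like, reconstructed from the change list above and the regression tests further down; the exact in-place pattern and preallocation details are assumptions, not the verbatim PR code.)

```python
import numpy as np


def compress_summary(summary, epsilon):
    """Compress a (2, n) quantile summary: row 0 holds bin values,
    row 1 holds weights. Returns a summary with roughly 1/epsilon bins."""
    if summary.shape[1] * epsilon < 1:
        # Too few bins to bother compressing; return the input unchanged.
        return summary
    percents = epsilon + np.arange(0.0, 1.0, epsilon)
    # Cumulative weights computed once and reused for both interpolations.
    cum_weights = np.cumsum(summary[1])
    cum_weight_percents = cum_weights / cum_weights[-1]
    new_bins = np.interp(percents, cum_weight_percents, summary[0])
    new_cum = np.interp(percents, cum_weight_percents, cum_weights)
    # Undo the cumulative sum in place instead of concatenating a leading zero.
    new_weights = np.empty_like(new_cum)
    new_weights[0] = new_cum[0]
    np.subtract(new_cum[1:], new_cum[:-1], out=new_weights[1:])
    # Preallocate the contiguous float32 result rather than stacking and casting.
    out = np.empty((2, new_bins.shape[0]), dtype=np.float32)
    out[0] = new_bins
    out[1] = new_weights
    return out
```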

Key Changes:

  • Reduced redundant recomputation of cum_weights and interpolation.
  • np.empty_like and np.empty for best-possible performance.
  • In-place assignment for the first weight difference element.
  • Returned array uses dtype=np.float32 directly.
  • Preserved all comments as requested unless code changed.

This version is significantly faster for large summaries: it allocates less memory and leans on NumPy in-place operations.
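For reference, a "best of N runs" figure like the one above can be approximated with a small stdlib harness such as the sketch below. The baseline keras-style implementation is inlined so the snippet is self-contained; the input size and run counts are made up, and this is not Codeflash's actual harness.

```python
import timeit

import numpy as np


def compress_summary(summary, epsilon):
    # Baseline implementation (abridged): interpolate new bins at evenly
    # spaced cumulative-weight percentiles.
    if summary.shape[1] * epsilon < 1:
        return summary
    percents = epsilon + np.arange(0.0, 1.0, epsilon)
    cum_weights = summary[1].cumsum()
    cwp = cum_weights / cum_weights[-1]
    new_bins = np.interp(percents, cwp, summary[0])
    cum_weights = np.interp(percents, cwp, cum_weights)
    new_weights = cum_weights - np.concatenate((np.array([0.0]), cum_weights[:-1]))
    return np.stack((new_bins, new_weights)).astype("float32")


# Hypothetical workload: 10k sorted bin values with positive random weights.
n = 10_000
rng = np.random.default_rng(0)
summary = np.vstack((np.linspace(0.0, 100.0, n), rng.random(n) + 0.1))

# Best-of-repeats timing, mirroring the "best of N runs" reporting style.
best = min(timeit.repeat(lambda: compress_summary(summary, 0.01), number=100, repeat=10))
print(f"best of 10 repeats: {best / 100 * 1e6:.1f} µs per call")
```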

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 37 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests Details
import numpy as np
# imports
import pytest  # used for our unit tests
from keras.src.layers.preprocessing.discretization import compress_summary

# unit tests

# ----------------
# Basic Test Cases
# ----------------

def test_compress_summary_basic_two_bins():
    # Test with two bins, epsilon large enough to compress to 1 bin
    summary = np.array([[1.0, 3.0], [2.0, 2.0]])
    epsilon = 1.0
    # shape[1] * epsilon = 2*1 = 2 > 1, so compression occurs
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_basic_no_compression():
    # Test where epsilon is too small, so no compression occurs
    summary = np.array([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
    epsilon = 0.2  # 3*0.2 = 0.6 < 1, so no compression
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_basic_three_bins():
    # Test with three bins and moderate epsilon
    summary = np.array([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
    epsilon = 0.5  # 3*0.5 = 1.5 > 1, so compression occurs
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_basic_uniform_weights():
    # Test with uniform weights and bins
    summary = np.array([[0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0]])
    epsilon = 0.25  # 4*0.25 = 1, so compression occurs
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

# ----------------
# Edge Test Cases
# ----------------

def test_compress_summary_empty():
    # Empty summary array
    summary = np.empty((2, 0))
    epsilon = 0.5
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_single_bin():
    # Only one bin in summary
    summary = np.array([[5.0], [10.0]])
    epsilon = 0.5
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_epsilon_zero():
    # Epsilon is zero, should not compress
    summary = np.array([[1.0, 2.0], [1.0, 1.0]])
    epsilon = 0.0
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_epsilon_just_under_threshold():
    # Epsilon just under the threshold for compression
    summary = np.array([[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]])
    epsilon = 0.33333333  # 3*0.333... = 0.99999999 < 1, so no compression
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_epsilon_just_over_threshold():
    # Epsilon just over the threshold for compression
    summary = np.array([[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]])
    epsilon = 0.334  # 3*0.334 = 1.002 > 1, so compression
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_non_uniform_weights():
    # Non-uniform weights
    summary = np.array([[1.0, 2.0, 4.0, 8.0], [1.0, 2.0, 3.0, 4.0]])
    epsilon = 0.5  # 4*0.5 = 2 > 1, so compression
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_extreme_values():
    # Test with very large and very small values
    summary = np.array([[1e-10, 1e10], [1.0, 1.0]])
    epsilon = 1.0
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_negative_weights():
    # Negative weights should be handled (though not meaningful, but test for robustness)
    summary = np.array([[1.0, 2.0], [1.0, -1.0]])
    epsilon = 1.0
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_compress_summary_non_integer_weights():
    # Non-integer weights
    summary = np.array([[0.0, 1.0, 2.0], [0.5, 1.5, 2.0]])
    epsilon = 0.5
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

# -----------------------
# Large Scale Test Cases
# -----------------------

def test_compress_summary_large_uniform():
    # Large summary with uniform bins and weights
    n_bins = 1000
    summary = np.vstack((np.linspace(0, 100, n_bins), np.ones(n_bins)))
    epsilon = 0.01  # 1000*0.01=10 > 1, so compression
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output
    # Should compress to int(1/epsilon) = 100 bins
    expected_bins = int(np.ceil(1/epsilon))

def test_compress_summary_large_nonuniform():
    # Large summary with non-uniform weights
    n_bins = 500
    summary = np.vstack((np.linspace(0, 50, n_bins), np.linspace(1, 5, n_bins)))
    epsilon = 0.02  # 500*0.02=10 > 1, so compression
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output
    expected_bins = int(np.ceil(1/epsilon))

def test_compress_summary_large_sparse():
    # Large summary with many zero weights
    n_bins = 1000
    bins = np.linspace(0, 100, n_bins)
    weights = np.zeros(n_bins)
    weights[::100] = 1.0  # Only every 100th bin has weight
    summary = np.vstack((bins, weights))
    epsilon = 0.05  # 1000*0.05=50 > 1, so compression
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output
    expected_bins = int(np.ceil(1/epsilon))

def test_compress_summary_large_skewed():
    # Large summary with skewed weights (exponential)
    n_bins = 800
    bins = np.linspace(0, 10, n_bins)
    weights = np.exp(np.linspace(0, 5, n_bins))
    summary = np.vstack((bins, weights))
    epsilon = 0.01  # 800*0.01=8 > 1, so compression
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output
    expected_bins = int(np.ceil(1/epsilon))

def test_compress_summary_large_all_same_value():
    # Large summary where all bin values are identical
    n_bins = 500
    summary = np.vstack((np.full(n_bins, 7.0), np.ones(n_bins)))
    epsilon = 0.02
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output
    expected_bins = int(np.ceil(1/epsilon))
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

import numpy as np
# imports
import pytest  # used for our unit tests
from keras.src.layers.preprocessing.discretization import compress_summary

# unit tests

# -------------------
# BASIC TEST CASES
# -------------------

def test_single_bin_no_compression():
    # Single bin: should return as is, regardless of epsilon
    summary = np.array([[5.0], [10.0]])
    codeflash_output = compress_summary(summary, 0.5); result = codeflash_output
    codeflash_output = compress_summary(summary, 0.01); result = codeflash_output

def test_two_bins_no_compression():
    # Two bins, epsilon too small to trigger compression
    summary = np.array([[1.0, 2.0], [3.0, 7.0]])
    epsilon = 0.4  # 2 * 0.4 = 0.8 < 1, so should not compress
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_two_bins_with_compression():
    # Two bins, epsilon triggers compression
    summary = np.array([[1.0, 3.0], [2.0, 8.0]])
    epsilon = 0.6  # 2 * 0.6 = 1.2 > 1, so should compress
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output
    # With epsilon=0.6, percents = [0.6, 1.2) = [0.6]
    # cum_weights: [2, 10], cum_weight_percents: [0.2, 1.0]
    # new_bins: interpolate at 0.6 between 0.2 and 1.0
    expected_bin = 1.0 + (3.0-1.0)*(0.6-0.2)/(1.0-0.2)
    expected_cumweight = 2.0 + (10.0-2.0)*(0.6-0.2)/(1.0-0.2)
    expected_weight = expected_cumweight  # since only one bin, prev is 0
    expected = np.array([[expected_bin], [expected_weight]], dtype="float32")

def test_three_bins_with_compression():
    # Three bins, epsilon triggers compression
    summary = np.array([[1.0, 2.0, 3.0], [1.0, 1.0, 2.0]])
    epsilon = 0.5  # 3*0.5=1.5>1, so compress
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output
    # percents: [0.5, 1.0)
    # cum_weights: [1,2,4], cum_weight_percents: [0.25,0.5,1.0]
    # new_bins: interpolate at 0.5 and 1.0 (but 1.0 is not included)
    expected_bins = np.interp([0.5], [0.25,0.5,1.0], [1.0,2.0,3.0])
    expected_cumweights = np.interp([0.5], [0.25,0.5,1.0], [1.0,2.0,4.0])
    expected_weights = expected_cumweights  # only one bin, prev is 0
    expected = np.array([expected_bins, expected_weights], dtype="float32")

def test_dtype_is_float32():
    # Output dtype should always be float32
    summary = np.array([[1.0, 2.0], [3.0, 4.0]], dtype="float64")
    codeflash_output = compress_summary(summary, 1.0); result = codeflash_output

# -------------------
# EDGE TEST CASES
# -------------------

def test_empty_summary():
    # Empty summary should be returned as is (shape [2,0])
    summary = np.empty((2,0))
    codeflash_output = compress_summary(summary, 0.5); result = codeflash_output

def test_epsilon_zero():
    # Epsilon zero: should return as is (avoid division by zero)
    summary = np.array([[1.0, 2.0], [3.0, 4.0]])
    codeflash_output = compress_summary(summary, 0.0); result = codeflash_output

def test_epsilon_very_small():
    # Epsilon very small: should return as is
    summary = np.array([[1.0, 2.0], [3.0, 4.0]])
    codeflash_output = compress_summary(summary, 1e-10); result = codeflash_output


def test_negative_weights():
    # Negative weights: cum_weights may decrease, but function should still run
    summary = np.array([[1.0, 2.0, 3.0], [2.0, -1.0, 3.0]])
    epsilon = 0.6
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_non_monotonic_bins():
    # Non-monotonic bins: interpolation should still work
    summary = np.array([[2.0, 1.0, 3.0], [1.0, 2.0, 3.0]])
    epsilon = 0.6
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_large_epsilon():
    # Epsilon large enough to compress to a single bin
    summary = np.array([[1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]])
    epsilon = 1.0
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output
    # percents = epsilon + np.arange(0, 1, epsilon) = 1.0 + [0.0] = [1.0]
    # 4 * 1.0 = 4 > 1, so compression occurs with percents = [1.0]
    # Interpolating at 1.0 should yield the last bin and the total weight
    expected_bin = 4.0
    expected_cumweight = 4.0
    expected = np.array([[expected_bin], [expected_cumweight]], dtype="float32")

def test_summary_with_one_row():
    # Summary with only one row (invalid input)
    summary = np.array([[1.0, 2.0, 3.0]])
    with pytest.raises(IndexError):
        codeflash_output = compress_summary(summary, 0.5); _ = codeflash_output

def test_summary_with_more_than_two_rows():
    # Summary with more than two rows (invalid input)
    summary = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    # Should use only the first two rows
    codeflash_output = compress_summary(summary[:2], 0.5); result = codeflash_output

# -------------------
# LARGE SCALE TEST CASES
# -------------------

def test_large_summary_compression():
    # Large summary, test scalability and output shape
    np.random.seed(42)
    # Random sorted bin values paired with random integer weights
    summary = np.vstack((np.sort(np.random.rand(1000)), np.random.randint(1, 10, 1000)))
    epsilon = 0.01
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_large_summary_no_compression():
    # Large summary, epsilon too small to trigger compression
    summary = np.vstack((np.linspace(0,1,1000), np.ones(1000)))
    epsilon = 1e-4  # 1000*1e-4 = 0.1 < 1
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_large_summary_all_same_bin():
    # Large summary, all bins have same value
    summary = np.vstack((np.ones(1000), np.ones(1000)))
    epsilon = 0.05
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_large_summary_increasing_weights():
    # Large summary, weights increasing
    summary = np.vstack((np.linspace(0,10,1000), np.arange(1,1001)))
    epsilon = 0.02
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output

def test_large_summary_decreasing_weights():
    # Large summary, weights decreasing
    summary = np.vstack((np.linspace(0,10,1000), np.arange(1000,0,-1)))
    epsilon = 0.02
    codeflash_output = compress_summary(summary, epsilon); result = codeflash_output
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run `git checkout codeflash/optimize-compress_summary-maxhreks` and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label May 21, 2025
@codeflash-ai codeflash-ai bot requested a review from HeshamHM28 May 21, 2025 05:17
