nimble@programming.dev 2 points 4 days ago

Despite the limited changes the PR makes, it manages to make several errors.

According to benchmarks in issue #31130:

  • With broadcast: np.column_stack → 36.47 µs, np.vstack().T → 27.67 µs (24% faster)
  • Without broadcast: np.column_stack → 20.63 µs, np.vstack().T → 13.18 µs (36% faster)

This fails to calculate the speed-up correctly (+32% and +57%) and instead reports the reduction in time (-24% and -36%). Also, those figures are just regurgitated from the original issue.
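To make the difference between the two percentages concrete, here is a quick sketch of the arithmetic using the timings quoted above. The function names are my own, purely for illustration.

```python
def speedup_pct(old_us, new_us):
    """How much faster the new code is, relative to the new time."""
    return (old_us / new_us - 1) * 100


def time_reduction_pct(old_us, new_us):
    """How much less time the new code takes, relative to the old time."""
    return (1 - new_us / old_us) * 100


# Timings in microseconds, copied from the PR description quoted above.
timings = {
    "with broadcast": (36.47, 27.67),
    "without broadcast": (20.63, 13.18),
}

for label, (old_us, new_us) in timings.items():
    print(
        f"{label}: {speedup_pct(old_us, new_us):.0f}% faster, "
        f"{time_reduction_pct(old_us, new_us):.0f}% less time"
    )
```

The two numbers answer different questions, so labelling a 24% time reduction as "24% faster" understates the actual speed-up.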

The improvement comes from np.vstack().T doing contiguous memory copies and returning a view, whereas np.column_stack has to interleave elements in memory.

Again, this is information regurgitated from the original issue.
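For anyone who wants to check the claim themselves, here is a minimal sketch, assuming 1-D inputs of equal length (the arrays `x` and `y` are made up for illustration). It only shows that the two calls give equal values with different memory layouts; it does not reproduce the benchmark.

```python
import numpy as np

# Two illustrative 1-D arrays of equal length.
x = np.arange(5.0)
y = x ** 2

stacked = np.column_stack([x, y])   # builds a fresh (5, 2) array
transposed = np.vstack([x, y]).T    # builds a (2, 5) array, then transposes it

# For 1-D inputs both approaches produce the same values and shape.
assert stacked.shape == transposed.shape == (5, 2)
assert np.array_equal(stacked, transposed)

# The memory layout differs: the transpose is a view onto the vstack result.
print("column_stack C-contiguous:", stacked.flags["C_CONTIGUOUS"])
print("vstack().T   C-contiguous:", transposed.flags["C_CONTIGUOUS"])
print("vstack().T is a view:     ", transposed.base is not None)
```

Note that the equivalence holds for 1-D inputs only. With 2-D inputs, `np.column_stack` stacks the arrays side by side, and `np.vstack().T` gives a different result, so the replacement is not safe everywhere.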

Changes

  • Modified 3 files
  • Replaced 3 occurrences of np.column_stack with np.vstack().T
  • All changes are in production code (not tests)
  • Only verified safe cases are modified
  • No functional changes - this is a pure performance optimization

The PR actually changes 4 files, not 3.