
Posted by todsacerdoti 10/23/2024

A DSL for peephole transformation rules of integer operations in the PyPy JIT(pypy.org)
108 points | 11 comments
fweimer 10/23/2024|
GCC does this in match.pd. Here's the add_zero example, I think:

    /* Simplifications of operations with one constant operand and
       simplifications to constants or single values.  */
    
    (for op (plus pointer_plus minus bit_ior bit_xor)
      (simplify
        (op @0 integer_zerop)
        (non_lvalue @0)))
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/match.pd

The floating point case can be more complicated because of signed zeros and signaling NaNs, see fold_real_zero_addition_p.
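To see why the floating-point case needs extra care, here is a small sketch (plain Python, just illustrating the IEEE signed-zero issue that fold_real_zero_addition_p has to guard against): rewriting x + 0.0 to x is not sound, because it would change the sign of zero when x is -0.0, while x + (-0.0) is safe under round-to-nearest.

    from math import copysign

    # x + 0.0 -> x is NOT a valid float rewrite: it flips the sign of zero.
    x = -0.0
    assert copysign(1.0, x + 0.0) == 1.0   # -0.0 + 0.0 rounds to +0.0
    assert copysign(1.0, x) == -1.0        # but x itself is -0.0

    # x + (-0.0) -> x preserves the sign of zero (round-to-nearest):
    for x in (0.0, -0.0, 1.5, -1.5):
        assert copysign(1.0, x + -0.0) == copysign(1.0, x)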

westurner 10/23/2024||
math.fma fused multiply add is in Python 3.13. Are there already rules to transform expressions to math.fma?

And does Z3 verification indicate differences in output due to minimizing float-rounding error?

https://docs.python.org/3/library/math.html#math.fma :

> math.fma(x, y, z)

> Fused multiply-add operation. Return (x * y) + z, computed as though with infinite precision and range followed by a single round to the float format. This operation often provides better accuracy than the direct expression (x * y) + z.

> This function follows the specification of the fusedMultiplyAdd operation described in the IEEE 754 standard.
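The single-rounding semantics can be emulated on older Pythons with exact rationals (a sketch; math.fma itself needs 3.13+). The example inputs below are chosen so that the product's low bit is lost to double rounding in the naive expression but survives a fused multiply-add:

    from fractions import Fraction

    def fma_emulated(x, y, z):
        # Compute x*y + z exactly with rationals, then round once to float,
        # matching the single-rounding semantics math.fma specifies.
        return float(Fraction(x) * Fraction(y) + Fraction(z))

    x = y = 1.0 + 2.0**-27
    z = -(1.0 + 2.0**-26)
    assert (x * y) + z == 0.0                 # naive: the 2**-54 tail is lost
    assert fma_emulated(x, y, z) == 2.0**-54  # fused: single rounding keeps it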

cfbolztereick 10/23/2024|
I only support integer operations in the DSL so far.

(But yes, turning the Python expression x*y+z into an fma call would not be a legal optimization for the JIT anyway. And Z3 would rightfully complain. The results must be bitwise identical before and after optimization.)
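The bitwise-identity requirement can be brute-forced at a small bit width instead of asking an SMT solver (a sketch with invented rules, not the DSL's actual syntax; Z3 proves the same property for full 64-bit integers symbolically):

    WIDTH = 8
    MASK = (1 << WIDTH) - 1

    def verify(lhs, rhs):
        # Return a counterexample (x, y), or None if both sides are
        # bitwise identical for every input.
        for x in range(1 << WIDTH):
            for y in range(1 << WIDTH):
                if lhs(x, y) & MASK != rhs(x, y) & MASK:
                    return (x, y)
        return None

    assert verify(lambda x, y: x + 0, lambda x, y: x) is None          # legal
    assert verify(lambda x, y: x - y, lambda x, y: y - x) is not None  # illegal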

westurner 10/25/2024||
Does Z3 distinguish between x*y+z and z+y*x, in terms of floating point output?

math.fma() would be a different function call with the same or similar output.

(deal-solver is a tool for verifying formal implementations in Python with Z3.)

hoten 10/23/2024||
The before/after delta is very satisfying and clearly easier to work with. Very nice.

How does the DSL handle side effects? I'm assuming any term here could be a compound expression, so what part of the system prevents "0*callfn()" from reducing to "0"?

bastawhiz 10/23/2024|
I believe this operates on values in something akin to registers, not expression trees. By the time you get to a place where one of these rules is relevant, the side effect has already taken place. This just prevents the return value of such a function call from being operated on, like in your example.
cfbolztereick 10/24/2024||
Yeah, that's exactly right. It's operating on an SSA-based intermediate representation. So if an operation gets removed, the arguments are still being computed. The earlier operations that produce those arguments can also be removed by dead code elimination (which runs later), but indeed only if they have no side effects.

I suppose the notation of the rule heads makes it look like there actually are expression trees, which is maybe sort of confusing.
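A toy version of why "0*callfn()" is safe on SSA (invented instruction format, not PyPy's actual IR): the peephole only replaces uses of the multiply's result, and the call instruction stays in the list; DCE then drops pure unused ops but keeps anything with side effects.

    ops = [
        ("v1", "call", "callfn"),     # has side effects
        ("v2", "int_mul", 0, "v1"),   # peephole target: 0 * x -> 0
        ("v3", "int_add", "v2", 7),
    ]

    # Peephole: drop the multiply, replace uses of v2 with the constant 0.
    replaced = {"v2": 0}
    ops = [op for op in ops if op[0] not in replaced]
    ops = [tuple(replaced.get(a, a) if isinstance(a, str) else a for a in op)
           for op in ops]

    # DCE: suppose v3 is returned, so it is live; drop pure ops whose
    # results are unused, but keep calls (they may have side effects).
    used = {"v3"}
    used |= {a for op in ops for a in op[2:] if isinstance(a, str)}
    ops = [op for op in ops if op[1] == "call" or op[0] in used]

    assert ("v1", "call", "callfn") in ops     # the call survives
    assert ops[-1] == ("v3", "int_add", 0, 7)  # the multiply is folded away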

JonChesterfield 10/24/2024||
A cost model would give a proof of termination - require the output of each transform to be cheaper than the input. Confluence is less obvious.
mattalex 10/24/2024|
Once you have strong normalization you can just check local confluence and use Newman's lemma to get full confluence. That should be pretty easy: just build all n^2 pairs of rules and run them to termination (which you have proven before). If those pairs are confluent, so is the full rewriting scheme.
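A toy illustration of the idea (invented rule set and term shapes): each rule strictly shrinks the term, so normalization terminates, and we can check empirically that the normal form does not depend on which applicable redex fires first.

    def step(t):
        # Yield every single-step rewrite of t, at the root or in a subterm.
        if not isinstance(t, tuple):
            return
        op, a, b = t
        if op == "add" and b == 0: yield a
        if op == "add" and a == 0: yield b
        if op == "mul" and 0 in (a, b): yield 0
        if op == "mul" and b == 1: yield a
        for a2 in step(a): yield (op, a2, b)
        for b2 in step(b): yield (op, a, b2)

    def normalize(t, pick=0):
        # 'pick' chooses which applicable redex fires first at each step.
        while True:
            steps = list(step(t))
            if not steps:
                return t
            t = steps[pick % len(steps)]

    # x*0 + (0+y) normalizes to y no matter which rule order we use.
    t = ("add", ("mul", "x", 0), ("add", 0, "y"))
    assert all(normalize(t, p) == "y" for p in range(4))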
JonChesterfield 10/25/2024||
That is a new one to me. Tracked the reference back to https://www.jstor.org/stable/1968867 which looks excellent. Thank you!
mjcohen 10/23/2024||
Interesting to see which optimizations are never used.