Feedback #3
The description says you collect feedback, but doesn't specify how it should be provided, so I should open an issue ... I guess?
First, nice job @zdevito! torchdim looks very promising; in particular, indexing looks very friendly.
Unforeseen axes
Curious how you plan to implement operations that introduce a new axis, like boolean indexing or bincount / unique / set-like operations.
One possible way would be to return a new axis object along with the result, but this has issues:
```python
x1, axis1 = bincount(x)
x2, axis2 = bincount(x)
x1 + x2  # do they have different axes or the same axis?
```

This can be solved by adding one more argument or by allowing manual 'coalescing' of axes.
Concatenation / chunking of named axis
Again, curious about your thoughts here.
Multi-axes
Cases where a single function should deal with tensors of several possible dimensionalities are frequent.
Potentially you can leave those problems to positional axes, but I'd recommend exploring the direction of multi-axes:
```python
(Q[b, qaxes, [head, c]] * K[b, kaxes, [head, c]]).sum(c).order(b, *qaxes, *kaxes, head)
# * not allowed in indexing
(Q.index(b, *qaxes, [head, c]) * K.index(b, *kaxes, [head, c])).sum(c).order(b, *qaxes, *kaxes, head)
```

This has a very 'pythonic' look; under the hood, iterating over a multi-axis would yield a single helper object, which would designate the position among the other axes.
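The 'iterating yields helper objects' idea can be sketched with a toy class. The names `Dim` and `MultiDim` are illustrative, not torchdim's API:

```python
class Dim:
    """Hypothetical first-class dimension."""
    def __init__(self, name):
        self.name = name

class MultiDim:
    """Hypothetical multi-axis: a pack of axes of dynamic length.
    Unpacking it (*qaxes) yields per-slot helper objects that
    designate their position among the pack's axes."""
    def __init__(self, name, n):
        self.dims = [Dim(f"{name}{i}") for i in range(n)]
    def __iter__(self):
        return iter(self.dims)

qaxes = MultiDim("q", 2)
print([d.name for d in qaxes])  # each element stands for one slot of the pack
```

A function written against `*qaxes` would then work unchanged for any dimensionality of the pack.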
Delayed computations
It is a super-clever trick to delay multiplication until a possible summation follows, but making it a single operation is more predictable:
```python
x = a * b
result = (x * c).sum(i, j)  # here einsum-ification probably happens
x + 1  # user actually expected that one to be materialized
```

Just placing that in a function does not look worse to my eye, but I'm open to other opinions.
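Such an explicit `sum_product` can be sketched over plain arrays with `np.einsum`. The signature is illustrative; with first-class dims the `axes` labels would be carried by the dims themselves rather than passed in:

```python
import numpy as np

def sum_product(sum_axes, *tensors, axes):
    """Illustrative sum_product: tensors[k] is labeled by the axis names
    in axes[k]; names listed in sum_axes are contracted, all others kept."""
    kept = []
    for labels in axes:
        for name in labels:
            if name not in sum_axes and name not in kept:
                kept.append(name)
    spec = ",".join("".join(labels) for labels in axes) + "->" + "".join(kept)
    return np.einsum(spec, *tensors)

a = np.ones((2, 3))
b = np.ones((2, 3))
c = np.ones((2, 3))
r = sum_product(["i", "j"], a, b, c, axes=[["i", "j"]] * 3)
# r == 6.0: the elementwise product a*b*c summed over both i and j
```

The point is that the contraction happens at one explicit call site, so nothing is left lazily unmaterialized.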
```python
sum_product([i, j], a, b, c)
```

Calling functions
```python
batch, inputs, hidden, classes = dims(4)
print(loss(w1[inputs, hidden], w2[hidden, classes], images[batch, inputs], labels[batch]))
```
Can you provide a more complete example here? It is unclear how the loss function can take a matmul of images and w1, because it needs to sum over the hidden variable, but it was not passed to the function.
More broadly, there should be some contract for how the callee interprets its inputs (from this example it seems it deals only with non-named axes, and the behavior of named axes is left to the calling function, but maybe I misunderstand). More examples would be very helpful here.
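For concreteness, here is one possible reading of that contract, written over plain arrays: the callee works purely positionally, and all the summations it needs happen inside via matmuls. The two-layer form and all shapes are my assumptions, not the author's:

```python
import numpy as np

def loss(w1, w2, images, labels):
    """One guess at the elided loss body, purely positional.
    Assumed (hypothetical) shapes: w1 (inputs, hidden), w2 (hidden, classes),
    images (batch, inputs), labels (batch,)."""
    hidden = np.maximum(images @ w1, 0)   # (batch, hidden): matmul sums over inputs
    logits = hidden @ w2                  # (batch, classes): matmul sums over hidden
    logits = logits - logits.max(axis=1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

w1 = np.zeros((4, 3))
w2 = np.zeros((3, 2))
images = np.ones((2, 4))
labels = np.array([0, 1])
# zero weights give uniform logits, so the loss is log(num_classes) = log 2
print(loss(w1, w2, images, labels))
```

Under this reading the named axes only matter to the caller; the open question above is whether that is actually the intended contract.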
Interaction with deep learning blocks
Can you explain how DL operations (e.g. convolution) would handle named dimensions (and whether they would)?
Add Dims context manager
```python
with dims(6) as (h2, w2, c, b, h, w):
    <computations>
```
The suggestion may sound a bit strange, but here is the rationale:
If you don't have an axis object, you can't manipulate it, so the whole tensor becomes non-manipulable.
I expect users would commonly return created tensors without order-ing them first, and then deal with downstream problems (since those will look like scalars to outer code, they will not error out in most operations, and users will then chase the skipped order).
Exiting the context manager should deallocate all tensors that use axis objects created within it => more efficient memory management almost for free, plus in a large number of cases you can point the user to the problem immediately.
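A minimal sketch of what such a context manager could look like, with a hypothetical `Dim` and an `alive` flag standing in for actual deallocation:

```python
from contextlib import contextmanager

class Dim:
    """Hypothetical first-class dimension; `alive` tracks validity."""
    def __init__(self, name):
        self.name = name
        self.alive = True

@contextmanager
def dims(n):
    """Hypothetical dims context manager: creates n axis objects and
    invalidates them on exit, so tensors that escaped without order-ing
    can be caught (and their storage reclaimed) immediately."""
    ds = [Dim(f"d{i}") for i in range(n)]
    try:
        yield ds
    finally:
        for d in ds:
            d.alive = False  # tensors bound to d could be deallocated here

with dims(2) as (h, w):
    assert h.alive and w.alive
assert not h.alive and not w.alive  # escaped axes are detectably dead
```

Any operation on a tensor whose axes are no longer `alive` could then raise immediately at the escape point, rather than failing far downstream.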
Using Better Terminology
'Flattening and Splitting Dims'. Neither term is suitable in this context. Yes, those are torch ops, but they become inappropriate as you move from discussing old-style ops to operations that are focused on axes. For instance, the phrase 'flatten the dimensions' does not make sense, as dimensions/axes are already flat.
Einops uses the terminology 'composition and decomposition of axes', because 1) it is obvious that when you compose you get fewer axes, 2) it hints that the original content is preserved, 3) wording: decomposition reverses composition, even kids know that (compare that to flatten vs split dims), 4) you can refer to a 'composed axis' and 'composing axes', which is helpful in discussing code. Let's use this better terminology.
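The terminology can be illustrated even with plain reshape, which is what composition and decomposition lower to:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)        # axes: h, w
composed = x.reshape(6)               # compose h and w -> one axis, fewer axes
decomposed = composed.reshape(2, 3)   # decomposition reverses composition
assert (decomposed == x).all()        # the original content is preserved
```

Composing 'h' and 'w' gives one composed axis; decomposing it by the same sizes recovers the original tensor exactly, which is the round-trip property the wording is meant to convey.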