This was a hot topic in 2019-2020, and I thought I'd read an article like this back then from NYU and another from the National University of Singapore, because I and many others were independently working on graph problems using Transformers at the time [0,1].
Turns out this article _is_ the one from NYU, just on arXiv now: