‘Attention is All You Need’ creators look beyond Transformers for AI at Nvidia GTC: ‘The world needs something better’


Seven of the eight authors of the landmark ‘Attention is All You Need’ paper, which introduced Transformers, gathered for the first time as a group for a chat with Nvidia CEO Jensen Huang in a packed ballroom at the GTC conference today.

They included Noam Shazeer, co-founder and CEO of Character.ai; Aidan Gomez, co-founder and CEO of Cohere; Ashish Vaswani, co-founder and CEO of Essential AI; Llion Jones, co-founder and CTO of Sakana AI; Illia Polosukhin, co-founder of NEAR Protocol; Jakob Uszkoreit, co-founder and CEO of Inceptive; and Lukasz Kaiser, member of the technical staff at OpenAI. Niki Parmar, co-founder of Essential AI, was unable to attend.

In 2017, the eight-person team at Google Brain struck gold with Transformers — a neural network NLP breakthrough that captured the context and meaning of words more accurately than its predecessors: the recurrent neural network and the long short-term memory network. The Transformer architecture became the underpinning of LLMs like GPT-4 and ChatGPT, as well as non-language applications including OpenAI’s Codex and DeepMind’s AlphaFold.

‘The world needs something better than Transformers’

But now, the creators of Transformers are looking beyond what they built — to what’s next for AI models. Cohere’s Gomez said that at this point “the world needs something better than Transformers,” adding that “I think all of us here hope it gets succeeded by something that will carry us to [a] new plateau of performance.” He went on to ask the rest of the group: “What do you see comes next? That’s the exciting step because I think [what is there now] is too similar to the thing that was there six, seven years ago.”


In a discussion with VentureBeat after the panel, Gomez expanded on his panel comments, saying that “it would be really sad if [Transformers] is the best we can do,” adding that he had thought so since the day after the team submitted the “Attention is All You Need” paper. “I want to see it replaced with something else 10 times better, because that means everyone gets access to models that are 10 times better.”

He pointed out that there are many inefficiencies on the memory side of Transformers and many architectural components of the Transformer that have stayed the same since the very beginning and should be “re-explored, reconsidered.” For example, a very long context, he explained, becomes expensive and unscalable. In addition, “the parameterization is maybe unnecessarily large, we could compress it down much more, we could share weights much more often — that could bring things down by an order of magnitude.”

‘You have to be clearly, obviously better’

That said, while the rest of the paper’s authors would likely agree, Gomez admitted there are “varying degrees of when that will happen. And maybe convictions vary if it will happen. But everyone wants a better — like, we’re all scientists at heart — and that just means we want to see progress.”

During the panel, however, Sakana’s Jones pointed out that in order for the AI industry to move to the next thing after Transformers — whatever that may be — “you don’t just have to be better — you have to be clearly, obviously better…so [right now] it’s stuck on the original model, despite the fact that probably technically it’s not the most powerful thing to have right now.”

Gomez agreed, telling VentureBeat that the Transformer became so popular not just because it was a good model and architecture, but that people got excited about it — you need both, he said. “If you miss either of those two things, you can’t move the community,” he explained. “So in order to catalyze the momentum to shift from an architecture to another one, you really need to put something in front of them that excites people.”
