Grok-2 arrives with image generations — is the world ready?


As anticipated from updates and new settings in the mobile app for Elon Musk’s social network X, a new large language model (LLM) called Grok-2 from X’s sister company xAI landed last night, and it’s a doozy.

Integrated within X itself and available through the Premium ($7 USD/month) and Premium+ ($14/month with no ads) subscription tiers, Grok-2 comes, fittingly, in two model sizes: Grok-2 and Grok-2 mini. Grok-2 offers state-of-the-art performance across a wide range of tasks including chat, coding, reasoning, and vision-based applications, while Grok-2 mini is a smaller, faster version optimized for efficiency and suited to simpler text-based prompts that need quicker responses.

Grok-2 not only boasts image generation capabilities, based on a partnership with Black Forest Labs and its new and surprisingly photorealistic open source diffusion model Flux.1, but it also shockingly outperforms models from leading rivals including OpenAI (GPT-4o), Anthropic (Claude 3.5 Sonnet), and even Google (Gemini 1.5 Pro) on leading third-party benchmark tests.

A new, surprising leader across multiple benchmarks

Promotional screenshot of chart comparing Grok-2 mini and Grok-2 performance to other leading frontier LLMs from rival firms. Credit: xAI

Specifically, Grok-2 and Grok-2 mini outperform all other models on the GPQA, MMLU, MMLU-Pro, MATH, HumanEval, MMMU, MathVista, and DocVQA benchmarks.

Even the LMSYS Chatbot Arena, where many companies covertly test their AI models under alternate names ahead of release (including xAI, which initially entered Grok-2 as “sus-column-r”), congratulated xAI on the milestone.

Woah, another exciting update from Chatbot Arena❤️
The results for @xAI’s sus-column-r (Grok 2 early version) are now public!
With over 12,000 community votes, sus-column-r has secured the #3 spot on the overall leaderboard, even matching GPT-4o! It excels in Coding (#2),… https://t.co/gqSWSwYN0z pic.twitter.com/j9UYDBYNt4
— lmsys.org (@lmsysorg) August 14, 2024

As AI influencer and University of Pennsylvania Wharton School of Business professor Ethan Mollick observed on X, “There are now five GPT-4 class models: GPT-4o, Claude 3.5, Gemini 1.5, Llama 3.1, and now Grok 2.”

There are now five GPT-4 class models: GPT-4o, Claude 3.5, Gemini 1.5, Llama 3.1, and now Grok 2.
All of the labs are saying there is room left for continued giant improvements, but we haven’t seen any models truly leap above GPT-4… yet. https://t.co/wA1XmmhasB
— Ethan Mollick (@emollick) August 14, 2024

Musk congratulated his “hardworking xAI team!” on X, the similarly named social network.

Image generations steal the show

Even though Grok-2 boasts leading performance on benchmarks covering math, writing, code, and other tasks, the marquee feature capturing the most attention from the start is its integration with Black Forest Labs’ Flux.1 image generation model.

Before the release of Grok-2, Flux.1 had already been making waves in AI circles, and AI art circles in particular, over the last few weeks. People discovered that the open source model could produce strikingly photorealistic generations, convincing enough to pass for familiar scenes such as a speaker at a TED talk, and that they could adapt it with low-rank adaptation (LoRA) to generate their own likeness in different situations.

I think we’re about to see another wave of AI avatars thanks to Flux LoRA training
Huge step up in quality from the SD 1.5 + Dreambooth days
Check out the colab (and other options) below to train your own personalized models https://t.co/dLtWTm4FBj pic.twitter.com/k80YK0TR9p
— Bilawal Sidhu (@bilawalsidhu) August 13, 2024

A version of Flux.1 is now integrated directly into Grok-2, much as OpenAI integrated its image generation model DALL-E 3 into ChatGPT, so users can simply type a text prompt to the chatbot and ask it to make images on command. Users testing this capability in Grok-2 are finding it notably permissive: it generates controversial, compromising images even of public figures such as U.S. presidential candidates Kamala Harris and Donald Trump.

Other leading image generators, including Midjourney, DALL-E 3, and Microsoft Designer, prohibit this type of content, especially in the wake of the controversy earlier this year over unauthorized explicit deepfakes of popular musician Taylor Swift (made by prompt engineering around the Designer restrictions). It is therefore notable that Grok-2 is bucking that trend and allowing more freedom, along with more potential risk. However, that is in keeping with Musk’s stated “free speech” ethos for X.

Yet users are raising concerns about what the capability means for the proliferation of deepfakes and misinformation across the web.

Grok 2 is super exciting, but I don’t think people have caught on about what the accessibility of this image generation means.
With no tech know how at all, you can use it in app for $8 and make anything with basic language.
Yes, we’ve had MJ and Flux, but this is the first to… pic.twitter.com/ZiYzMPIHoI
— Omiron — e/acc (@Omiron33) August 14, 2024

As user @Omiron33 put it: “Yes, we’ve had MJ and Flux, but this is the first to make it usable and quick. Advertising, Propaganda and everything good or bad that comes with that just happened (IMO, the good outweighs the bad).”

About The Author

Maria Howard
