Google Gemini: Everything you need to know about the new generative AI platform | TechCrunch

Google is trying to make waves with Gemini, its flagship suite of generative AI models, apps and services.

So what is Gemini? How can you use it? And how does it stack up against the competition?

To make it easy to keep up with the latest Gemini developments, we've put together this handy guide, which we'll keep updating as news about new Gemini models, features and Google's Gemini plans is released.

What is Gemini?

Gemini is Google's long-promised, next-generation GenAI model family, developed by Google's AI research labs DeepMind and Google Research. It comes in three flavors:

  • Gemini Ultra, the highest-performing Gemini model.
  • Gemini Pro, a “lite” Gemini model.
  • Gemini Nano, a smaller “distilled” model that runs on mobile devices such as the Pixel 8 Pro.

All of the Gemini models were trained to be “natively multimodal”: in other words, able to work with and use more than just words. They were pre-trained and fine-tuned on a variety of audio, images and videos, a large set of codebases and text in different languages.

This sets Gemini apart from models such as Google's own LaMDA, which was trained exclusively on text data. LaMDA can't understand or generate anything other than text (e.g., articles, email drafts), but that isn't the case with Gemini models.

What is the difference between Gemini Apps and Gemini Models?

Image credit: Google

Google, proving once again that it lacks a knack for branding, didn't make it clear from the start that the Gemini models are separate and distinct from the Gemini apps on web and mobile (formerly Bard). The Gemini apps are simply an interface through which some Gemini models can be accessed; think of them as a client for Google's GenAI.

Incidentally, the Gemini apps and models are also completely independent of Imagen 2, Google's text-to-image model, which is available in some of the company's dev tools and environments.

What can Gemini do?

Because Gemini models are multimodal, they can in theory perform many multimodal tasks, from transcribing speech to captioning photos and videos to producing artwork. Some of these capabilities have yet to reach the product stage (more on that later), and Google promises all of them, and more, sometime in the not-too-distant future.

Of course, it's a bit difficult to take the company at its word.

Google seriously under-delivered with the original Bard launch. And more recently it ruffled feathers with a video purporting to showcase Gemini's capabilities that turned out to be heavily doctored and more or less aspirational.

Still, assuming Google is being more or less truthful in its claims, here's what the various tiers of Gemini will be able to do once they reach their full potential:

Gemini Ultra

Google says that Gemini Ultra, thanks to its versatility, can be used to help with things like physics homework, solving problems step by step on a worksheet and spotting potential errors in already filled-in answers.

Gemini Ultra can also be applied to tasks like identifying scientific papers relevant to a particular problem, Google says, extracting information from those papers and “updating” a chart from one of them by creating the formulas needed to recreate the chart with the latest data.

Gemini Ultra technically supports image generation, as mentioned earlier. But this capability has yet to make its way into a productized version of the model, perhaps because the mechanism is more complex than the way apps like ChatGPT generate images. Rather than feeding prompts to an image generator (e.g., DALL-E 3, in the case of ChatGPT), Gemini creates images “natively,” without any intermediate steps.

Gemini Ultra is available as an API through Vertex AI, Google's fully managed AI developer platform, and AI Studio, Google's web-based tool for app and platform developers. It also powers Gemini apps – but not for free. Access to Gemini Ultra through what Google calls Gemini Advanced requires subscribing to the Google One AI Premium plan, which costs $20 per month.

The AI Premium plan also connects Gemini to your broader Google Workspace account: think emails in Gmail, documents in Docs, presentations in Sheets and recordings from Google Meet. This is useful for, say, summarizing emails or having Gemini capture notes during a video call.

Gemini Pro

Google says Gemini Pro is an improvement over LaMDA in its reasoning, planning and understanding capabilities.

An independent study by researchers at Carnegie Mellon and BerriAI found that an early version of Gemini Pro was indeed better than OpenAI's GPT-3.5 at handling longer and more complex reasoning chains. But the study also found that, like all large language models, this version of Gemini Pro particularly struggled with math problems involving many digits, and users found examples of incorrect reasoning and obvious mistakes.

Google promised remedies, though, and the first arrived in the form of Gemini 1.5 Pro.

Designed to be a drop-in replacement, Gemini 1.5 Pro is improved over its predecessor in many areas, perhaps most notably in the amount of data it can process. Gemini 1.5 Pro can handle ~700,000 words, or ~30,000 lines of code, which is 35 times the amount that Gemini 1.0 Pro can handle. And, because the model is multimodal, it's not limited to text. Gemini 1.5 Pro can analyze up to 11 hours of audio or an hour of video in different languages, albeit slowly (for example, it takes 30 seconds to a minute to find a scene in an hour-long video).

Gemini 1.5 Pro entered public preview on Vertex AI in April.

An additional endpoint, Gemini Pro Vision, can process text and imagery, including photos and video, and output text along the lines of OpenAI's GPT-4 with Vision model.


Using Gemini Pro in Vertex AI. Image credit: Gemini

Within Vertex AI, developers can tailor Gemini Pro to specific contexts and use cases using a fine-tuning or “grounding” process. Gemini Pro can also be connected to external, third-party APIs to perform specific actions.

In AI Studio, there is a workflow for creating structured chat prompts using Gemini Pro. Developers have access to both the Gemini Pro and Gemini Pro Vision endpoints, and they can adjust the model temperature to control the creative range of the output, provide examples to guide tone and style, and tune the safety settings.
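To make the temperature setting a little more concrete, here is a minimal sketch of how temperature is commonly applied when a language model picks its next word. It illustrates the general technique (temperature-scaled softmax sampling), not Gemini's actual implementation, which Google has not published in this detail. The candidate words and their scores are made up for the example.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng):
    """Pick one token at random, weighted by the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical candidates for the next word and their raw scores (assumptions).
TOKENS = ["blue", "grey", "electric"]
LOGITS = [2.0, 1.0, 0.1]

if __name__ == "__main__":
    for temperature in (0.2, 1.0, 2.0):
        probs = softmax_with_temperature(LOGITS, temperature)
        print(temperature, [round(p, 3) for p in probs])
    print(sample_token(TOKENS, LOGITS, 1.0, random.Random(0)))
```

At a low temperature the top-scoring word takes almost all of the probability, so the output is more predictable. At a higher temperature the probabilities flatten out and less likely words are picked more often, which is the wider "creative range" described above.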

Gemini Nano

Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and it's efficient enough to run directly on (some) phones rather than sending work to a server somewhere. So far, it powers a few features on the Pixel 8 Pro, Pixel 8 and Samsung Galaxy S24, including Summarize in Recorder and Smart Reply in Gboard.

The Recorder app, which lets users press a button to record and transcribe audio, includes a Gemini-powered summary of your recorded conversations, interviews, presentations and other snippets. Users get these summaries even if they don't have a signal or Wi-Fi connection available—and in a nod to privacy, no data leaves their phone in the process.

Gemini Nano is also in Gboard, Google's keyboard app. There, it powers a feature called Smart Reply, which helps suggest the next thing you might want to say while chatting in a messaging app. Google says the feature initially works only with WhatsApp, but will come to more apps over time.

And in the Google Messages app on supported devices, Nano enables Magic Compose, which can craft messages in styles like “spirited,” “formal” and “lyrical.”

Is Gemini better than OpenAI's GPT-4?

Google has several times touted Gemini's lead on benchmarks, claiming that Gemini Ultra exceeds current state-of-the-art results on “30 of the 32 widely used academic benchmarks used in large language model research and development.” The company says Gemini 1.5 Pro, meanwhile, is more capable than Gemini Ultra in some scenarios, at tasks like summarizing, brainstorming and writing content. This will likely change with the release of the next Ultra model.

But leaving aside the question of whether benchmarks really indicate a better model, the scores Google points to appear to be only marginally better than those of OpenAI's corresponding models. And, as mentioned earlier, some early impressions have not been good, with users and academics pointing out that the older version of Gemini Pro tends to get the basics wrong, struggles with translation and makes poor coding suggestions.

How much does Gemini cost?

Gemini 1.5 Pro is free to use in Gemini apps and, for now, AI Studio and Vertex AI.

Once Gemini 1.5 Pro is out of preview in Vertex, however, input to the model will cost $0.0025 per character while output will cost $0.00005 per character. Vertex customers pay per 1,000 characters (about 140 to 250 words) and, in the case of models like Gemini Pro Vision, per image ($0.0025).

Let's assume that a 500-word article consists of 2,000 characters. Summarizing this article with Gemini 1.5 Pro will cost $5. Meanwhile, creating an article of similar length will cost $0.1.
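The arithmetic behind those two figures can be checked with a short sketch. It uses only the per-character rates and the 2,000-character assumption quoted above. The function and variable names are ours, and the summarizing figure counts only the input characters, as the example in the text does.

```python
from decimal import Decimal

# Rates quoted above for Gemini 1.5 Pro once it leaves preview (USD per character).
INPUT_RATE = Decimal("0.0025")
OUTPUT_RATE = Decimal("0.00005")

# The article's assumption: a 500-word piece is about 2,000 characters.
ARTICLE_CHARS = 2000


def cost(characters: int, rate: Decimal) -> Decimal:
    """Return the cost in USD of processing `characters` at `rate` per character."""
    return rate * characters


# Summarizing sends the article in as input; generating produces it as output.
summarize_cost = cost(ARTICLE_CHARS, INPUT_RATE)
generate_cost = cost(ARTICLE_CHARS, OUTPUT_RATE)

print(f"Summarize: ${summarize_cost:.2f}")
print(f"Generate:  ${generate_cost:.2f}")
```

Decimal is used instead of floats so that small per-character rates multiply out exactly.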

Ultra prices are yet to be announced.

Where can you try Gemini?

Gemini Pro

The easiest place to experience Gemini Pro is in the Gemini apps. Pro and Ultra are answering questions in a range of languages.

Gemini Pro and Ultra are also accessible in preview in Vertex AI via an API. The API is currently free to use “within limits” and supports certain regions, including Europe, as well as features such as chat functionality and filtering.

Elsewhere, Gemini Pro and Ultra can be found in AI Studio. Using the service, developers can iterate on prompts and Gemini-based chatbots and then get API keys to use them in their own apps, or export the code to a more fully featured IDE.

Code Assist (formerly Duet AI for Developers), Google's suite of AI-powered assistance tools for code completion and generation, is using Gemini models. Developers can perform “bulk” changes across codebases, for example updating cross-file dependencies and reviewing large sections of code.

Google has brought Gemini models to its dev tools for Chrome and the Firebase mobile dev platform, and to its database creation and management tools. It has also launched new security products underpinned by Gemini, such as Gemini in Threat Intelligence, a component of Google's Mandiant cybersecurity platform that can analyze large chunks of potentially malicious code and lets users search in natural language for signs of ongoing threats or compromises.
