Google on Wednesday unveiled its next-generation and multimodal AI (artificial intelligence) model, Gemini to take on its rival Microsoft-backed OpenAIโs GPT-4. The company calls Gemini the โmost capable, flexible, and general AI modelโ that it has ever built.
Developed by a team of researchers from Google’s now-merged AI divisions DeepMind and Google Brain, this new large language model (LLM) can generalize and seamlessly understand, operate across, and combine different types of information, including text, code, audio, image, and video.
Gemini 1.0, the companyโs first version, comes in three different sizes: Pro, Ultra, and Nano. While Gemini Ultra is made “for highly complex tasks”, Gemini Pro offers scaling “across a wide range of tasks” and Gemini Nano is the companyโs most efficient model “for on-device tasks.”
“Weโre taking the next step on our journey (as an AI first company) with Gemini, our most capable and general model yet, with state-of-the art performance across many leading benchmarks,” Google CEO Sundar Pichai said in a foreword to the blog post about the announcement.
“Our first version, Gemini 1.0, is optimized for different sizes: Ultra, Pro, and Nano. These are the first models of the Gemini era and the first realization of the vision we had when we formed Google DeepMind earlier this year.”
According to Google, Gemini Ultraโs performance outperformed current โstate-of-the-artโ models, including ChatGPTโs most powerful model, GPT-4, on 30 of the 32 widely-used academic benchmarks used in LLM research and development.
With a score of 90.0% on the huge multitask language understanding (MMLU), Gemini Ultra is the first model to outperform human experts (89.8%) as well as GPT-4 (86.4%), which uses a combination of 57 subjects such as math, physics, history, law, medicine, and ethics for testing both world knowledge and problem-solving abilities, the company added.
In addition, Gemini can โunderstand, explain and generate high-quality code in the worldโs most popular programming languages, like Python, Java, C++, and Go. Its ability to work across languages and reason about complex information makes it one of the leading foundation models for coding in the world.โ
Starting Wednesday, Gemini Pro is available for free right now to anyone with a Google account through theย Google Bard service. It is available in English in more than 170 countries, including the U.S., with the services expected to expand to different modalities and support new languages and locations in the near future.
Further, Gemini Nano is now available on the Pixel 8 Pro smartphone and is likely to roll out to other Pixel models very soon. Lastly, the most powerful model, Ultra, which is being tested externally, is not expected to be released publicly before early 2024.