Gemini Nano, Gemini Pro, and Gemini Ultra are Google's attempt to play catch-up. Let's dig in a little deeper and see if Google can really get back in the AI game.

Google Gemini is a family of AI models, like OpenAI's GPT. The major difference: while Gemini can understand and generate text like other LLMs, it can also natively understand, operate on, and combine other kinds of information, like images, audio, videos, and code. All three versions are multimodal, which means that in addition to text, they can understand and work with images, audio, videos, and code. For example, you can give it a prompt like "what's going on in this picture?" and attach an image, and it will describe the image and respond to further prompts asking for more complex information.

Because we've now entered the corporate competition era of AI, most companies are keeping pretty quiet on the specifics of how their models work and differ. Still, Google has confirmed that the Gemini models use a transformer architecture and rely on strategies like pretraining and fine-tuning, much as other LLMs like GPT-4 do. The main difference between Gemini and a typical LLM is that it's also trained on images, audio, and videos at the same time it's being trained on text; they aren't the result of a separate model bolted on at the end. In theory, this should mean it understands things in a more intuitive manner.
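To make the "attach an image to a prompt" idea concrete, here's a minimal sketch of what a text-plus-image request to Gemini could look like. It only builds the JSON payload in the shape Google documents for its `generateContent` REST endpoint (a `contents` list of `parts` mixing `text` and `inline_data`); treat the exact field names and any model name as assumptions to double-check against the current Gemini API docs. No request is actually sent.

```python
import base64
import json

def build_gemini_request(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a generateContent-style payload mixing a text part and an image part.

    Field names follow Google's published REST format for the Gemini API,
    but verify them against the current documentation before relying on this.
    """
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            # Images travel inline as base64-encoded bytes.
                            "data": base64.b64encode(image_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }

# Example: the prompt from the article plus placeholder image bytes.
payload = build_gemini_request("What's going on in this picture?", b"\x89PNG-placeholder")
print(json.dumps(payload, indent=2))
```

Actually sending it would be an HTTP POST of this payload to the chosen model's `generateContent` endpoint with your API key; the model's description of the image comes back as text parts in the response candidates.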