02 AI Model Landscape

 AI Model Landscape


1. Foundation Models

  • Large-scale pre-trained models that can be adapted for many tasks.
  • Can be unimodal (text, image, audio) or multimodal (text + image + audio).
  • Examples: GPT-4, DALL·E, PaLM, LLaMA, Stable Diffusion

2. Generative AI (subset of foundation models)

  • AI that creates new content rather than just analyzing it.
  • Can include text, image, video, audio, and 3D.
  • Subcategories:

Type

Description

Example Models

Text

Generates human-like language

GPT-4, Claude 3, LLaMA 2, Falcon

Image

Creates images from prompts

DALL·E, MidJourney, Stable Diffusion

Video

Generates videos or animations

Runway Gen-2, Synthesia

Audio / Music

Creates speech, sound, or music

MusicLM, AudioGen, VALL-E

3D / Simulation

Generates 3D models or environments

Point-E, Kaedim3D

All LLMs that generate text fall under Generative AI.
⚠️ Not all Generative AI are LLMs (image/video/audio generators aren’t).


3. Large Language Models (LLMs) (subset of text-based Generative AI)

  • Specifically trained on text data to understand, summarize, translate, answer questions, and generate text.
  • Can also perform code generation, reasoning, and conversation.
  • Examples:
    • Proprietary: GPT-4, GPT-4-turbo, Claude 3, Gemini 1.5
    • Open source: LLaMA 2, Falcon 7B, Mistral 7B, RedPajama

Special Notes:

  • LLMs can be instruction-tuned (follow user instructions better) or multimodal (accept text + images).
  • LLMs often serve as the core engine behind AI assistants like ChatGPT or Claude.

4. Other AI Models

  • Discriminative AI / Predictive AI: Focus on classifying or predicting rather than generating.
    • Example: BERT (for understanding text), ResNet (for images), XGBoost (structured data).
  • Reinforcement Learning Models: Learn via trial-and-error feedback.
    • Example: AlphaGo, AlphaZero

Comments

Popular posts from this blog

19 Google ADK Tutorial

15 Agent AI vs Agentic AI

16 Build a Free Chat App on Google Colab using RAG