Introduction to AI (3). The Journey of AI: From Discriminative to Generative

From Discriminative to Generative AI
Artificial Intelligence (AI) has undergone a fascinating transformation, evolving from solving specific discriminative tasks (Discriminative AI) to unleashing creativity (Generative AI).
This journey, from analyzing to creating, feels natural. Just as humans learn to discriminate or predict before they create (a child recognizes shapes before drawing them), AI’s path mirrors this developmental arc. Let’s explore the journey by looking at each of these two types of AI:
- Discriminative AI focuses on classifying and predicting outcomes. For example, a discriminative model might be trained to classify objects in an image or to predict whether a customer is likely to purchase a product. This type of AI excels at tasks like fraud detection, medical diagnosis, and image recognition.
- Generative AI (also known as GenAI) focuses on creating new content, such as text, images, speech, music, and video. This type of AI has become increasingly popular in recent years with the advent of large models, especially Large Language Models (LLMs). These models act as creative partners, drawing on massive training datasets to generate human-like text, translate languages, write stories, and answer complex questions.
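As a rough illustration of the contrast above, the following minimal sketch trains two toy models on the same four made-up messages: a perceptron that assigns a label (the discriminative side) and a bigram table that produces new word sequences (the generative side). The data, labels, and function names are hypothetical and chosen only for illustration; real systems use far larger models and datasets.

```python
import random
from collections import defaultdict

# Toy labeled messages (hypothetical, for illustration only).
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]


def train_perceptron(examples, epochs=10):
    """Discriminative side: learn one weight per word that separates the labels."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, label in examples:
            target = 1 if label == "spam" else -1
            score = sum(weights[word] for word in text.split())
            if target * score <= 0:  # wrong or undecided: nudge the weights
                for word in text.split():
                    weights[word] += target
    return weights


def classify(weights, text):
    """Map an input to a label; unseen words contribute nothing."""
    score = sum(weights.get(word, 0.0) for word in text.split())
    return "spam" if score > 0 else "ham"


def train_generator(examples):
    """Generative side: record which words follow which (a bigram table)."""
    following = defaultdict(list)
    for text, _ in examples:
        words = text.split()
        for current, nxt in zip(words, words[1:]):
            following[current].append(nxt)
    return following


def generate(following, start, max_words=6, seed=0):
    """Produce a new word sequence by repeatedly sampling a plausible next word."""
    rng = random.Random(seed)
    words = [start]
    while len(words) < max_words:
        options = following.get(words[-1], [])
        if not options:
            break
        words.append(rng.choice(options))
    return " ".join(words)


weights = train_perceptron(TRAINING)
print(classify(weights, "free prize"))                # a label for an existing input
print(generate(train_generator(TRAINING), "free"))    # a newly produced sequence
```

The classifier only ever answers with one of the labels it was trained on, while the generator emits text that did not have to appear verbatim in the training data. That difference in output is the distinction this section draws.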
Discriminative AI: Predicting & Optimizing
Discriminative AI focuses on understanding the relationships between inputs and outputs, aiming to distinguish, predict, or classify based on learned patterns. It operates within the realm of recognition, optimization, and decision-making. Within this broader category, we can further break down its capabilities into Predictive AI and Prescriptive AI, each serving distinct but complementary purposes.
Predictive AI: Forecasting the Future
Predictive AI aims to anticipate what is likely to happen, that is, to forecast future outcomes based on historical data. Examples include:
- Weather forecasting: Predicting tomorrow’s temperature based on past weather patterns.
- Customer behavior analysis: Predicting which customers are likely to churn.
- Risk assessment: Forecasting the likelihood of loan defaults in financial institutions.
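To make the idea of forecasting from historical data concrete, here is a minimal sketch that fits a straight line to a short series of past temperatures and extrapolates one step ahead. The temperature values and function names are hypothetical, and a linear trend is only an assumption made for the example; real weather, churn, or credit-risk models use many more inputs and more capable methods.

```python
def fit_line(values):
    """Ordinary least squares for y = slope * x + intercept, with x = 0, 1, 2, ..."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast_next(values):
    """Predict the value at the next time step by extending the fitted line."""
    slope, intercept = fit_line(values)
    return slope * len(values) + intercept


# Hypothetical daily high temperatures in degrees Celsius.
past_temperatures = [18.0, 19.0, 20.0, 21.0, 22.0]
print(forecast_next(past_temperatures))  # 23.0 for this perfectly linear toy series
```

The same pattern (learn from past observations, then apply the learned relationship to an unseen case) underlies the churn and loan-default examples, only with a classifier in place of the line.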
Prescriptive AI: Optimizing for Action
Prescriptive AI takes Predictive AI a step further by not only forecasting outcomes but also providing actionable recommendations to achieve the best possible results. It combines prediction with optimization to guide decision-making.
It tries to answer “what should be done” to achieve the desired outcome. Examples of this type are:
- Streaming Services (e.g., Netflix, Spotify): They don't just predict what you might like; they build a personalized playlist or “watch next” queue to keep you engaged.
- Smart Navigation Apps (e.g., Google Maps, Waze): When you enter a destination, the app not only predicts how long it will take to get there (predictive AI) but also suggests the fastest or least congested route (prescriptive AI) based on real-time traffic, road closures, and user preferences (e.g., avoiding toll roads), giving you actionable advice on which way to go.
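The navigation example can be sketched in a few lines, assuming a very simplified setting: each route has a distance, an expected speed that stands in for a real-time traffic prediction, and a toll flag. The predictive step estimates travel time, and the prescriptive step picks the best route that respects the user's preferences. The `Route` class, the sample routes, and the function names are hypothetical; this is not how any particular navigation app is implemented.

```python
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    distance_km: float
    expected_speed_kmh: float  # stands in for a real-time traffic prediction
    has_toll: bool


def predict_minutes(route):
    """Predictive step: estimate travel time from distance and expected speed."""
    return route.distance_km / route.expected_speed_kmh * 60


def recommend(routes, avoid_tolls=False):
    """Prescriptive step: choose the fastest route that satisfies the preferences."""
    candidates = [r for r in routes if not (avoid_tolls and r.has_toll)]
    if not candidates:
        raise ValueError("no route satisfies the given preferences")
    return min(candidates, key=predict_minutes)


# Hypothetical routes to the same destination.
ROUTES = [
    Route("highway", 30.0, 90.0, has_toll=True),     # about 20 minutes
    Route("city", 18.0, 36.0, has_toll=False),       # about 30 minutes
    Route("ring road", 25.0, 60.0, has_toll=False),  # about 25 minutes
]

print(recommend(ROUTES).name)
print(recommend(ROUTES, avoid_tolls=True).name)
```

With no preferences the sketch recommends the highway; with toll roads excluded it falls back to the ring road. The prediction alone does not change between the two calls, only the recommended action does.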
Generative AI: Expanding Creativity
Generative AI has advanced remarkably since November 2022 (when ChatGPT was launched), expanding its capabilities across multiple domains. A first high-level classification of generative AI is based on the types of input a model accepts and the types of output it can generate.
Single Modality Models
Single modality models work with one type of input and produce an output within the same modality. This means they are specialized for a particular type of data, such as text, images, audio, or code. Examples of this type of model are:
Text-Based Models
They are designed to understand and generate human language in textual form, and they excel at processing sequences of words and understanding the relationships between them. Their main capabilities include:
Natural Language Processing (NLP), which covers a wide range of tasks that involve understanding and manipulating human language. These models are usually based on Large Language Models (LLMs).
Examples of key applications
Application | Description |
---|---|
Creative Writing | Generating stories, poetry, or scripts. |
Document Summarization | Condensing lengthy legal or technical documents into concise summaries. |
Language Translation | Providing nuanced translations between languages. |
Grammar Correction and Style Adjustment | Improving text quality by correcting grammar, simplifying language, or adjusting tone. |
Paraphrasing | Rewriting text while preserving its meaning. |
Examples of text-based models
Model Type | Examples |
---|---|
Large Language Models | GPT-4 (OpenAI), Gemini (Google), Claude (Anthropic) |
Code Generation Models | GitHub Copilot, Codex |
Text classification and information extraction: These models analyze text to categorize it (topic classification), extract key information from it (for example, names, using named entity recognition or NER), or determine its sentiment (sentiment analysis).
Code generation (txt2code): These models also generate code from natural language instructions, which requires understanding both programming concepts and languages and human natural language.
Vision-Based Models
They are designed to interpret and process visual information: they learn to identify patterns, features, and objects within images. They can also create new images or modify existing ones, often based on specific criteria or styles. They are used for tasks like generating realistic images, creating artistic styles, creating 3D models, or enhancing image resolution.
Key applications
Application | Description |
---|---|
Image Recognition | Object detection, classification, and scene understanding. |
Image Generation | Creating realistic or artistic images from scratch based on specific styles or prompts. |
Image Enhancement | Improving image resolution or quality. |
3D Model Creation | Generating 3D models from textual descriptions or 2D images. |
Examples of vision-based models
Model Type | Examples |
---|---|
Image Recognition Models | ResNet, YOLO (You Only Look Once), Vision Transformers (ViT). |
Generative Vision Models | DALL·E 3 (OpenAI), Stable Diffusion, Midjourney. |
3D Model Generators | NeRF (Neural Radiance Fields). |
Audio-Based Models
They specialize in processing sound and speech data, analyzing the characteristics of sound waves to understand and manipulate audio.
Application | Description |
---|---|
Speech Recognition (Speech-to-Text) | Converts spoken words into written text for transcription tools or voice assistants. |
Audio-to-Audio Transformation | Enhances audio by reducing noise or adding effects. |
Music Generation | Composes music based on user input (e.g., OpenAI’s MuseNet or Google’s MusicLM). |
Multimodal Models
Generative AI models such as those from OpenAI, Anthropic (Claude), and Google (Gemini) now integrate multiple data types in a single model, known as a multimodal model. They combine elements from multiple modalities (e.g., vision and language). For instance, a model that can analyze an image and generate a text description of it exemplifies this category.
The field of AI is constantly evolving, and new model types and applications are emerging regularly.
Comparison of Single Modality and Multimodal Models
Feature | Single Modality Models | Multimodal Models |
---|---|---|
Input Types | One type of input at a time | Multiple types of inputs simultaneously |
Output Types | Same as input modality | Can span across different modalities |
Complexity | Specialized but limited to one domain | Versatile but computationally intensive |
Examples | GPT-4 for text; YOLO for vision | CLIP for vision-language; Gemini by Google |
Summary of Modalities and Transformations
Below is a quick reference table of common AI transformations across modalities:
Input → Output | Description | Examples |
---|---|---|
Text-to-Text (txt2txt) | Question answering, translation, summarization, paraphrasing, and similar tasks | GPT-4 (OpenAI) |
Text-to-Image (txt2img) | Generate images from text descriptions | DALL·E 3 (OpenAI) |
Text-to-Video (txt2vid) | Create videos from text prompts | Runway Gen-2 |
Text-to-Code (txt2code) | Generate code snippets | GitHub Copilot |
Text-to-Speech (TTS) | Convert text to spoken audio | ElevenLabs |
Speech-to-Text (STT) | Transcribe speech into text [also called Automatic Speech Recognition (ASR)] | Whisper |
Image-to-Text (img2txt) | Generate descriptions or captions for images | BLIP (Salesforce) |
Image-to-Image (img2img) | Image transformations such as super-resolution, style transfer, and inpainting | Leonardo AI |
Audio-to-Audio (aud2aud) | Enhance or transform audio | Adobe Enhance |
Video-to-Audio (vid2aud) | Models that analyze video and generate matching audio | Soundify |
Text-to-Music (txt2music) | Convert text instructions and descriptions into music | MusicFX (Google) |
Music-to-Music (music2music) | Style transfer for music | MusicFX (Google) |
Text-to-Augmented/Virtual Reality (txt2AR/VR) | Create AR/VR environments from text prompts | |
AI Transforming Industries
Generative and Discriminative AI are revolutionizing industries by blending creativity with analytical precision. Below is a breakdown of their applications across key sectors.
Healthcare Revolution
AI Type | Application |
---|---|
Generative AI | - Acts as a digital researcher, designing drug molecules or creating personalized treatment scenarios. |
Discriminative AI | - Medical image analysis for disease detection (cancer, fractures, retinal conditions) - Patient risk assessment based on health records and symptoms - ECG analysis for heart rhythm abnormalities - Genetic sequence screening for mutations and disorders |
Financial & Business Intelligence
AI Type | Application |
---|---|
Generative AI | - Creates synthetic datasets for market analysis while protecting data privacy. - Crafts personalized shopping experiences through tailored product recommendations. |
Discriminative AI | - Monitors transactions for fraud detection and predicts market trends with remarkable accuracy. - Customer credit risk evaluation for lending decisions - Customer churn prediction and retention analysis - Quality control in manufacturing processes - Customer feedback and sentiment categorization - Predicts purchasing behaviors to enable hyper-targeted marketing strategies. |
Creative Industries Reimagined
AI Type | Application |
---|---|
Generative AI | - Collaborates with artists to generate unique music, art, and stories, amplifying creative processes. |
Discriminative AI | - Provides audience insights by analyzing sentiment and engagement to refine artistic expressions. |
Manufacturing Efficiency
AI Type | Application |
---|---|
Generative AI | - Designs innovative product prototypes that challenge traditional engineering constraints. |
Discriminative AI | - Ensures consistent quality by predicting equipment failures during production processes. |
Transportation & Logistics
AI Type | Application |
---|---|
Generative AI | . |
Discriminative AI | - Traffic pattern analysis and accident prediction - Predictive maintenance through equipment sensor data - Automated license plate recognition - Driver behavior monitoring and safety assessment |
Conclusion
By combining the imaginative power of Generative AI with the analytical capabilities of Discriminative AI, industries are experiencing transformative advancements. From healthcare diagnostics to creative collaborations, these technologies are reshaping how we live, work, and innovate. Now that you know all of the possibilities of Generative and Discriminative AI, what ideas do you have to create something new or apply it in your work or business?