Beginner Level

Introduction to AI (3). The Journey of AI: From Discriminative to Generative

AI Xplorers Team

27 Dec 2024 — 6 min read

From Discriminative to Generative AI

Artificial Intelligence (AI) has undergone a fascinating transformation, evolving from solving specific discriminative tasks (Discriminative AI) to unleashing creativity (Generative AI).

This journey, from analyzing to creating, feels natural. Just as humans learn to discriminate or predict before they create (a child first recognizes shapes before drawing them), AI’s path mirrors this developmental arc. Let’s explore this journey by looking at the two types of AI:

Discriminative AI focuses on classifying and predicting outcomes. For example, a discriminative AI model might be trained to classify objects in an image, or to predict whether a customer is likely to purchase a product. This type of AI excels in tasks like fraud detection, medical diagnosis, and image recognition.
Generative AI (also known as GenAI) focuses on creating new content, such as text, images, speech, music, and video. This type of AI has become increasingly popular in recent years with the advent of large models, especially Large Language Models (LLMs). These latest models act as creative partners, leveraging massive datasets to generate human-like text, translate languages, write stories, and answer complex questions.

Generative AI creates, imagines, and simulates, while Discriminative AI analyzes, predicts, and optimizes.
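
To make the contrast concrete, here is a minimal Python sketch, not a real AI system. It uses a tiny made-up dataset (DATA) and hypothetical helper names (train_classifier, classify, train_bigrams, generate). The same sentences are used in two ways: once to assign a label to new input (the discriminative use described above) and once to produce a new word sequence (the generative use). Real models learn far richer patterns, but the difference in what comes out, a label versus new content, is the same.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy data: (sentence, label) pairs invented for illustration.
DATA = [
    ("the cat chased the mouse", "pets"),
    ("the dog chased the cat", "pets"),
    ("the rain fell on the town", "weather"),
    ("the wind blew on the hill", "weather"),
]

def train_classifier(data):
    """Count how often each word appears under each label."""
    counts = defaultdict(Counter)
    for sentence, label in data:
        counts[label].update(sentence.split())
    return counts

def classify(counts, sentence):
    """Discriminative use: map an input to one of the known labels."""
    words = sentence.split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(scores, key=scores.get)

def train_bigrams(data):
    """Record which words follow which in the training sentences."""
    follows = defaultdict(list)
    for sentence, _ in data:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            follows[a].append(b)
    return follows

def generate(follows, start, max_words, rng):
    """Generative use: produce a new word sequence from learned patterns."""
    words = [start]
    # Stop at max_words or when the last word has no known successor.
    while len(words) < max_words and follows.get(words[-1]):
        words.append(rng.choice(follows[words[-1]]))
    return " ".join(words)

counts = train_classifier(DATA)
label = classify(counts, "the dog chased the mouse")

follows = train_bigrams(DATA)
sentence = generate(follows, "the", 6, random.Random(0))

print(label)     # a label chosen from the known categories
print(sentence)  # a new sequence assembled from learned word pairs
```

With this toy data, classify labels the sentence as "pets", while generate returns a short sequence in which every word follows the previous one somewhere in the training sentences. Which sequence you get depends on the random seed.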

Discriminative AI: Predicting & Optimizing

Discriminative AI focuses on understanding the relationships between inputs and outputs, aiming to distinguish, predict, or classify based on learned patterns. It operates within the realm of recognition, optimization, and decision-making. Within this broader category, we can further break down its capabilities into Predictive AI and Prescriptive AI, each serving distinct but complementary purposes.

Predictive AI: Forecasting the Future

This type aims to anticipate what is likely to happen, i.e. forecast future outcomes based on historical data. Examples of this type are:

Weather forecasting: Predicting tomorrow’s temperature based on past weather patterns.
Customer behavior analysis: Predicting which customers are likely to churn.
Risk assessment: Forecasting the likelihood of loan defaults in financial institutions.
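
As a rough illustration of forecasting from historical data, the sketch below fits a straight line to a few past daily temperatures with ordinary least squares and extends it one day ahead. The temperature values and the names fit_line and past_temps are invented for this example. Real weather forecasting relies on far more data and far more sophisticated models.

```python
def fit_line(ys):
    """Least-squares fit of y = slope * x + intercept, with x = 0, 1, ..., n-1."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical historical data: temperature (°C) on the last five days.
past_temps = [14.0, 15.0, 16.0, 17.0, 18.0]

slope, intercept = fit_line(past_temps)

# "Tomorrow" is the next position on the x axis after the observed days.
tomorrow = slope * len(past_temps) + intercept
print(tomorrow)  # 19.0 for this perfectly linear toy data
```

The pattern is the same for churn or loan defaults: learn a relationship from past observations, then apply it to a case whose outcome is not yet known.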

Prescriptive AI: Optimizing for Action

Prescriptive AI takes Predictive AI a step further by not only forecasting outcomes but also providing actionable recommendations to achieve the best possible results. It combines prediction with optimization to guide decision-making.

It tries to answer “what should be done” to achieve the desired outcome. Examples of this type are:

Streaming Services (e.g., Netflix, Spotify): They don't just predict what you might like; they create a personalized playlist or “watch next” queue to keep you engaged.
Smart Navigation Apps (e.g., Google Maps, Waze): When you enter a destination, the app not only predicts how long it will take to get there (predictive AI) but also suggests the fastest or least congested route (prescriptive AI) based on real-time traffic, road closures, and user preferences (e.g., avoiding toll roads), giving you actionable advice on which way to go.
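
The navigation example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any real navigation app works: the Route fields, the traffic numbers, and the functions predict_minutes and recommend are all hypothetical. The predictive step estimates a travel time for each route, and the prescriptive step turns those estimates into a recommendation that respects the user's preferences.

```python
from dataclasses import dataclass

@dataclass
class Route:
    name: str
    base_minutes: float    # travel time with no traffic
    traffic_factor: float  # 1.0 = free flow, higher = more congestion
    has_tolls: bool

def predict_minutes(route):
    """Predictive step: estimate travel time under current traffic."""
    return route.base_minutes * route.traffic_factor

def recommend(routes, avoid_tolls=False):
    """Prescriptive step: choose the allowed route with the lowest predicted time."""
    allowed = [r for r in routes if not (avoid_tolls and r.has_tolls)]
    if not allowed:
        raise ValueError("no route satisfies the preferences")
    return min(allowed, key=predict_minutes)

# Hypothetical routes and traffic conditions.
routes = [
    Route("highway", base_minutes=20, traffic_factor=1.5, has_tolls=True),
    Route("city", base_minutes=25, traffic_factor=1.4, has_tolls=False),
    Route("scenic", base_minutes=40, traffic_factor=1.0, has_tolls=False),
]

best = recommend(routes)
best_no_tolls = recommend(routes, avoid_tolls=True)

print(best.name)
print(best_no_tolls.name)
```

With these made-up numbers the recommendation is the highway by default and the city route once toll roads are excluded, which shows how the same predictions can lead to different advice depending on the user's constraints.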

Generative AI: Expanding Creativity

Generative AI has advanced remarkably since November 2022 (when ChatGPT was launched), expanding its capabilities across multiple domains. A first high-level classification of generative AI is based on the types of input a model accepts and the types of output it produces.

Single Modality Models

Single modality models work with one type of input and produce an output within the same modality. This means they are specialized for a particular type of data, like text, images, audio, or code. Examples of this type of model are:

Text-Based Models

They are designed to understand and generate human language in textual form, and they excel at processing sequences of words and understanding the relationships between them. These models cover the following capabilities:

Natural Language Processing (NLP), which includes a wide range of tasks that involve understanding and manipulating human language. These models are usually based on Large Language Models (LLMs).

Examples of key applications

Application	Description
Creative Writing	Generating stories, poetry, or scripts.
Document Summarization	Condensing lengthy legal or technical documents into concise summaries.
Language Translation	Providing nuanced translations between languages.
Grammar Correction and Style Adjustment	Improving text quality by correcting grammar, simplifying language, or adjusting tone.
Paraphrasing	Rewriting text while preserving its meaning.

Examples of text-based models

Model Type	Examples
Large Language Models	GPT-4 (OpenAI), Gemini (Google), Claude (Anthropic)
Code Generation Models	GitHub Copilot, Codex

Text classification and information extraction, which analyze text to categorize it (topic classification), extract key information from it (such as names, using named entity recognition or NER), or determine its sentiment (sentiment analysis).

Code generation (txt2code): These models are also used to generate code from natural language instructions, which requires understanding both programming concepts and languages, and human natural language.

Vision-Based Models

These models are designed to interpret and process visual information from images. They learn to identify patterns, features, and objects within images, or to create new images and modify existing ones, often based on specific criteria or styles. They are used for tasks like generating realistic images, creating artistic styles, creating 3D models, or enhancing image resolution.

Key applications

Application	Description
Image Recognition	Object detection, classification, and scene understanding.
Image Generation	Creating realistic or artistic images from scratch based on specific styles or prompts.
Image Enhancement	Improving image resolution or quality.
3D Model Creation	Generating 3D models from textual descriptions or 2D images.

Examples of vision-based models

Model Type	Examples
Image Recognition Models	ResNet, YOLO (You Only Look Once), Vision Transformers (ViT)
Generative Vision Models	DALL·E 3 (OpenAI), Stable Diffusion, Midjourney
3D Model Generators	NeRF (Neural Radiance Fields)

Audio-Based Models

These models specialize in processing sound and speech data. They analyze the characteristics of sound waves to understand and manipulate audio.

Application	Description
Speech Recognition (Speech-to-Text)	Converts spoken words into written text for transcription tools or voice assistants.
Audio-to-Audio Transformation	Enhances audio by reducing noise or adding effects.
Music Generation	Composes music based on user input (e.g., OpenAI’s MuseNet or Google’s MusicLM).

Multimodal Models

Today, Generative AI models such as those from OpenAI, Anthropic (Claude), and Google (Gemini) integrate multiple data types in a single system, known as a Multimodal Model. They combine elements from multiple modalities (e.g., vision and language). For instance, a model that analyzes an image and generates a text description of it belongs to this category.

The field of AI is constantly evolving, and new model types and applications are emerging regularly.

Comparison of Single Modality and Multimodal Models

Feature	Single Modality Models	Multimodal Models
Input Types	One type of input at a time	Multiple types of inputs simultaneously
Output Types	Same as input modality	Can span across different modalities
Complexity	Specialized but limited to one domain	Versatile but computationally intensive
Examples	GPT-4 for text; YOLO for vision	CLIP for vision-language; Gemini by Google

Summary of Modalities and Transformations

Below is a quick reference table of common AI transformations across modalities:

Input → Output	Description	Examples
Text-to-Text (txt2txt)	Question answering, translation, summarization, paraphrasing, etc.	GPT-4 (OpenAI)
Text-to-Image (txt2img)	Generate images from text descriptions	DALL·E 3 (OpenAI)
Text-to-Video (txt2vid)	Create videos from text prompts	Runway Gen-2
Text-to-Code (txt2code)	Generate code snippets	GitHub Copilot
Text-to-Speech (TTS)	Convert text to spoken audio	ElevenLabs
Speech-to-Text (STT)	Transcribe speech into text [also called Automatic Speech Recognition (ASR)]	Whisper
Image-to-Text (img2txt)	Generate descriptions or captions for images	BLIP (Salesforce)
Image-to-Image (img2img)	Image transformations such as super-resolution, style transfer, and inpainting	Leonardo AI
Audio-to-Audio (aud2aud)	Enhance or transform audio	Adobe Enhance
Video-to-Audio (vid2aud)	Models that analyze video and generate matching audio	Soundify
Text-to-Music (txt2music)	Convert text instructions and descriptions into music	MusicFX (Google)
Music-to-Music (music2music)	Style transfer for music	MusicFX (Google)
Text-to-Augmented/Virtual Reality (txt2AR/VR)	Creates AR/VR environments from textual prompts.

The boundaries between these AI categories (Generative and Discriminative AI, single-modality and multimodal models) continue to blur as AI technologies evolve, with emerging models demonstrating increasingly complex and integrated capabilities.

AI Transforming Industries

Generative and Discriminative AI are revolutionizing industries by blending creativity with analytical precision. Below is a breakdown of their applications across key sectors.

Healthcare Revolution

AI Type	Application
Generative AI	- Acts as a digital researcher, designing drug molecules or creating personalized treatment scenarios.
Discriminative AI	- Medical image analysis for disease detection (cancer, fractures, retinal conditions) - Patient risk assessment based on health records and symptoms - ECG analysis for heart rhythm abnormalities - Genetic sequence screening for mutations and disorders

Financial & Business Intelligence

AI Type	Application
Generative AI	- Creates synthetic datasets for market analysis while protecting data privacy. - Crafts personalized shopping experiences through tailored product recommendations.
Discriminative AI	- Monitors transactions for fraud detection and predicts market trends with remarkable accuracy. - Customer credit risk evaluation for lending decisions - Customer churn prediction and retention analysis - Quality control in manufacturing processes - Customer feedback and sentiment categorization - Predicts purchasing behaviors to enable hyper-targeted marketing strategies.

Creative Industries Reimagined

AI Type	Application
Generative AI	- Collaborates with artists to generate unique music, art, and stories, amplifying creative processes.
Discriminative AI	- Provides audience insights by analyzing sentiment and engagement to refine artistic expressions.

Manufacturing Efficiency

AI Type	Application
Generative AI	- Designs innovative product prototypes that challenge traditional engineering constraints.
Discriminative AI	- Ensures consistent quality by predicting equipment failures during production processes.

Transportation & Logistics

AI Type	Application
Generative AI	.
Discriminative AI	- Traffic pattern analysis and accident prediction - Predictive maintenance through equipment sensor data - Automated license plate recognition - Driver behavior monitoring and safety assessment

Conclusion

By combining the imaginative power of Generative AI with the analytical capabilities of Discriminative AI, industries are experiencing transformative advancements. From healthcare diagnostics to creative collaborations, these technologies are reshaping how we live, work, and innovate. Now that you have seen what Generative and Discriminative AI can do, what ideas do you have for creating something new or applying them in your work or business?