DP-800 Study Guide
Developing AI-Enabled Database Solutions
Exam DP-800 at a Glance
Understand what this certification is, who it's for, and how the exam is structured before diving into content.
What is DP-800?
Exam DP-800 (Developing AI-Enabled Database Solutions) is a Microsoft Associate-level certification exam that validates expertise in integrating AI directly within SQL-based data solutions across the Microsoft SQL platform family: Azure SQL Database, SQL Server 2025, and SQL in Microsoft Fabric.
The exam focuses on the new generation of SQL AI capabilities: native vector types, in-database embedding generation, semantic search, and retrieval-augmented generation (RAG), all executed using T-SQL without migrating data to a separate vector database.
Target Audience
- Database developers integrating AI into SQL-based applications
- Data engineers building AI-augmented pipelines on Azure SQL or Microsoft Fabric
- Solution architects designing secure, scalable AI-enabled database solutions
- Professionals experienced with T-SQL, GitHub-based CI/CD, and foundational AI/ML concepts
Exam Structure
| Skill Area | Topics | Approx. Weight |
|---|---|---|
| 1. AI-Ready Database Design | Schema design, T-SQL for AI, data types | ~20% |
| 2. Vectors & Embeddings | VECTOR type, DiskANN index, VECTOR_DISTANCE | ~25% |
| 3. RAG Workflows | Retrieval-Augmented Generation, LLM grounding | ~20% |
| 4. T-SQL AI Functions | AI_GENERATE_EMBEDDINGS, external models, OpenAI | ~20% |
| 5. Security, Scaling & DevOps | Encryption, RBAC, CI/CD, Data API Builder | ~15% |
Recommended Prerequisites
- Solid T-SQL knowledge (queries, stored procedures, indexes, performance tuning)
- Familiarity with Azure SQL Database or SQL Server
- Basic understanding of AI/ML concepts: embeddings, vectors, LLMs
- Experience with GitHub and CI/CD pipelines
- Knowledge of database security best practices
The exam includes scenario-based questions requiring you to choose the most appropriate T-SQL approach for AI tasks. Focus on understanding when and why to use each feature, not just syntax. Practice with the Microsoft Learn labs before sitting the exam.
Designing and Developing AI-Ready Database Solutions
Learn how to design schemas, choose appropriate data types, and write advanced T-SQL that efficiently supports AI workloads.
1.1 Schema Design for AI Workloads
AI-ready schemas differ from traditional schemas: they must efficiently store high-dimensional vector data, support partial-text search, and handle semi-structured (JSON) responses from LLMs.
Key Design Principles
- Store embeddings close to source data: add a VECTOR column to the same table as the text being represented, avoiding expensive joins at query time.
- Use JSON columns for LLM metadata: store prompt/response logs, token counts, and model metadata as NVARCHAR(MAX) with JSON validation.
- Chunk large documents: split long text into overlapping chunks (e.g., 512 tokens) and store each chunk as a separate row with its embedding and a reference to the parent document ID.
- Choose appropriate vector dimensions: match the dimension to your embedding model (e.g., 1536 for OpenAI text-embedding-ada-002, 3072 for text-embedding-3-large).
Example: AI-Ready Document Table
-- Store document chunks with their embeddings
CREATE TABLE dbo.DocumentChunks (
ChunkId INT IDENTITY PRIMARY KEY,
DocumentId INT NOT NULL,
ChunkIndex INT NOT NULL, -- order within the document
ChunkText NVARCHAR(MAX) NOT NULL,
TokenCount INT NOT NULL,
Embedding VECTOR(1536), -- OpenAI ada-002 dimension
CreatedAt DATETIME2 DEFAULT SYSUTCDATETIME(),
CONSTRAINT fk_doc FOREIGN KEY (DocumentId)
REFERENCES dbo.Documents(DocumentId)
);
-- Index for fast retrieval by document
CREATE INDEX ix_chunk_doc ON dbo.DocumentChunks (DocumentId, ChunkIndex);
Chunking Strategy
Chunk size affects retrieval quality. Common strategies include:
- Fixed-size chunks with overlap (e.g., 512 tokens with 10-20% overlap): simple, predictable, and easy to batch.
- Sentence or paragraph boundaries: preserves semantic units at the cost of variable chunk sizes.
- Recursive splitting: split on sections first, then paragraphs, then sentences, until each piece fits the size budget.
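The fixed-size-with-overlap strategy can be sketched in a few lines of Python. This is a minimal illustration; `chunk_tokens` and the 512/64 sizes are hypothetical choices, not values mandated by the exam or the platform.

```python
def chunk_tokens(tokens, size=512, overlap=64):
    """Split a token sequence into fixed-size chunks with overlap."""
    step = size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already covers the tail
    return chunks

tokens = list(range(1200))  # stand-in for real token IDs
chunks = chunk_tokens(tokens)
print(len(chunks))   # 3 windows cover 1200 tokens
print(chunks[1][0])  # 448: the second window starts at 512 - 64
```

Each chunk would become one row in dbo.DocumentChunks, with ChunkIndex preserving the original order.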
1.2 Semi-Structured Data and JSON in T-SQL
AI applications frequently receive JSON from LLM APIs. SQL Server provides first-class JSON support for parsing, querying, and storing JSON data.
-- Store LLM interaction logs as JSON
CREATE TABLE dbo.LLMInteractions (
InteractionId UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY,
SessionId NVARCHAR(128) NOT NULL,
RequestPayload NVARCHAR(MAX),
ResponsePayload NVARCHAR(MAX),
ModelName NVARCHAR(128),
TokensUsed INT,
DurationMs INT,
CreatedAt DATETIME2 DEFAULT SYSUTCDATETIME(),
CONSTRAINT chk_request_json CHECK (ISJSON(RequestPayload) = 1),
CONSTRAINT chk_response_json CHECK (ISJSON(ResponsePayload) = 1)
);
-- Query JSON fields with JSON_VALUE and JSON_QUERY
SELECT
InteractionId,
JSON_VALUE(ResponsePayload, '$.choices[0].message.content') AS Answer,
JSON_VALUE(ResponsePayload, '$.usage.total_tokens') AS TotalTokens
FROM dbo.LLMInteractions
WHERE ModelName = 'gpt-4o';
-- Aggregate with OPENJSON for multi-value arrays
SELECT msg.role, msg.content
FROM dbo.LLMInteractions
CROSS APPLY OPENJSON(RequestPayload, '$.messages')
WITH (role NVARCHAR(50) '$.role', content NVARCHAR(MAX) '$.content') AS msg
WHERE InteractionId = '...';
1.3 Advanced T-SQL Techniques for AI Workloads
Batch Processing with MERGE
When generating embeddings in bulk, use MERGE to insert new chunks and update changed ones atomically:
-- @stagingChunks: a previously declared table variable (or TVP) of freshly chunked rows
MERGE dbo.DocumentChunks AS target
USING @stagingChunks AS source
ON target.DocumentId = source.DocumentId
AND target.ChunkIndex = source.ChunkIndex
WHEN MATCHED AND target.ChunkText != source.ChunkText THEN
UPDATE SET
ChunkText = source.ChunkText,
Embedding = NULL, -- mark for re-embedding
CreatedAt = SYSUTCDATETIME()
WHEN NOT MATCHED BY TARGET THEN
INSERT (DocumentId, ChunkIndex, ChunkText, TokenCount)
VALUES (source.DocumentId, source.ChunkIndex, source.ChunkText, source.TokenCount);
Filtered Indexes for Unembedded Rows
-- Efficiently find rows that still need embedding
CREATE INDEX ix_needs_embedding
ON dbo.DocumentChunks (ChunkId)
WHERE Embedding IS NULL;
Filtered indexes on WHERE Embedding IS NULL make finding unembedded rows cheap, avoiding full table scans during batch embedding pipelines.
Vector Data and Embeddings in SQL
Master the VECTOR data type, vector indexing with DiskANN, and semantic similarity search, all natively within SQL Server 2025 and Azure SQL.
2.1 The VECTOR Data Type
SQL Server 2025 and Azure SQL introduce the VECTOR native data type, enabling storage of high-dimensional floating-point arrays
(embeddings) as a first-class database column. Internally stored as compact binary, it presents as a JSON array for input/output.
Syntax
-- Declare a vector column: VECTOR(dimensions [, precision])
-- Default precision is float32; float16 available in preview
embedding VECTOR(1536) -- float32 (default)
embedding VECTOR(1536, float16) -- half-precision (more compact)
-- Supported dimension range: 1 to 1998
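The storage impact of the precision choice is easy to estimate. A back-of-envelope sketch, assuming 4 bytes per float32 component and 2 per float16 and ignoring any per-value storage overhead:

```python
# Approximate in-row storage for a VECTOR column; assumes 4 bytes per
# float32 component and 2 per float16 (per-value overhead ignored).
def vector_bytes(dimensions, precision="float32"):
    per_component = {"float32": 4, "float16": 2}[precision]
    return dimensions * per_component

print(vector_bytes(1536))              # 6144 bytes per embedding
print(vector_bytes(1536, "float16"))   # 3072 bytes: half the footprint
print(vector_bytes(1536) * 1_000_000)  # roughly 6 GB for a million float32 rows
```

Halving precision halves both storage and I/O per vector, which is why float16 matters for large corpora.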
Creating Tables with Vectors
CREATE TABLE dbo.Products (
ProductId INT IDENTITY PRIMARY KEY,
Name NVARCHAR(255) NOT NULL,
Description NVARCHAR(MAX),
Category NVARCHAR(100),
Embedding VECTOR(1536) -- text-embedding-ada-002
);
Inserting Vector Data
Vectors are inserted as JSON arrays (the database converts them internally):
-- Direct insert from JSON array
INSERT INTO dbo.Products (Name, Description, Embedding)
VALUES (
'Surface Pro 11',
'A lightweight 2-in-1 laptop for professionals.',
'[0.0234, -0.1127, 0.0891, ...]' -- 1536 float values as JSON array
);
-- From a stored procedure that calls Azure OpenAI
DECLARE @emb VECTOR(1536);
EXEC dbo.GetEmbedding @text = N'Surface Pro 11 laptop', @embedding = @emb OUTPUT;
UPDATE dbo.Products
SET Embedding = @emb
WHERE ProductId = 1;
2.2 Vector Distance Functions
SQL Server 2025 provides VECTOR_DISTANCE() to compute similarity between two vectors:
| Metric | Syntax | Range | Use Case |
|---|---|---|---|
| Cosine | 'cosine' | 0 (identical) to 2 (opposite) | Best for NLP / text similarity. Ignores magnitude. |
| Euclidean | 'euclidean' | 0 to ∞ | Geometric distance. Good when magnitude matters. |
| Dot Product | 'dot' | -∞ to ∞ (higher = more similar) | Fast; useful when vectors are L2-normalized. |
| Negative Dot | 'neg_dot' | -∞ to ∞ | Inverted for ORDER BY (lower = more similar). |
-- Semantic similarity search: top-5 most similar products
DECLARE @queryEmb VECTOR(1536);
EXEC dbo.GetEmbedding @text = N'portable ultrabook for developers', @embedding = @queryEmb OUTPUT;
SELECT TOP (5)
ProductId,
Name,
VECTOR_DISTANCE('cosine', Embedding, @queryEmb) AS Distance
FROM dbo.Products
WHERE Embedding IS NOT NULL
ORDER BY Distance ASC; -- lower cosine distance = more similar
When your embeddings are L2-normalized (unit length), cosine distance and dot product are equivalent for ranking: cosine distance equals 1 minus the dot product. Azure OpenAI embedding models return normalized vectors by default, so dot product search is slightly faster.
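The equivalence is easy to verify numerically. A small stdlib-only Python check (`normalize`, `dot`, and `cosine_distance` are illustrative helpers, not SQL functions):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale v to unit (L2) length."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def cosine_distance(a, b):
    return 1 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = normalize([0.2, -0.5, 0.8])
b = normalize([0.1, -0.4, 0.9])
# On unit vectors the denominator is 1, so cosine distance collapses
# to 1 - dot(a, b): both metrics rank results identically.
assert abs(cosine_distance(a, b) - (1 - dot(a, b))) < 1e-12
print(round(cosine_distance(a, b), 6))
```

Because only the ranking matters for top-K retrieval, either metric returns the same result set on normalized embeddings.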
2.3 Vector Indexes (DiskANN)
Without an index, similarity search requires computing the distance to every row, an O(n) brute-force scan. SQL Server 2025 introduces DiskANN (Disk-based Approximate Nearest Neighbor) indexes, which provide sub-linear search time at the cost of approximate (not exact) results.
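For reference, the exact scan that the index approximates can be sketched in Python (`brute_force_top_k` is an illustrative helper, not a SQL function). DiskANN returns roughly this ordering without touching every row:

```python
import math

def cosine_distance(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1 - num / den

def brute_force_top_k(query, rows, k):
    """Exact k-NN: score every row, sort, take k. Cost grows linearly with n."""
    scored = [(cosine_distance(query, emb), row_id) for row_id, emb in rows]
    return sorted(scored)[:k]

rows = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (3, [0.9, 0.1])]
top = brute_force_top_k([1.0, 0.0], rows, k=2)
print([row_id for _, row_id in top])  # [1, 3]: the two closest rows
```

An ANN index trades a small chance of missing a true neighbor for avoiding this full scan on millions of rows.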
Creating a Vector Index
-- Create a DiskANN vector index for cosine similarity
CREATE VECTOR INDEX vix_products_embedding
ON dbo.Products (Embedding)
WITH (
METRIC = 'cosine', -- distance metric to optimize for
TYPE = 'DiskANN' -- only supported type currently
);
Using VECTOR_SEARCH for Index-Accelerated Search
-- Using the VECTOR_SEARCH function (leverages the DiskANN index)
DECLARE @queryEmb VECTOR(1536);
EXEC dbo.GetEmbedding @text = N'portable laptop for travel', @embedding = @queryEmb OUTPUT;
SELECT p.ProductId, p.Name, vs.distance
FROM VECTOR_SEARCH(
TABLE = dbo.Products AS p,
COLUMN = Embedding,
SIMILAR_TO = @queryEmb,
METRIC = 'cosine',
TOP_N = 10
) AS vs
ORDER BY vs.distance; -- columns of alias p are available directly; no extra join needed
DiskANN produces approximate nearest neighbors. For most RAG and recommendation use cases this is acceptable. For applications requiring exact results (e.g., duplicate detection), use brute-force VECTOR_DISTANCE() without the index.
DiskANN Index Parameters
- METRIC: cosine, euclidean, or dot. Must match the metric your search queries use.
- TYPE: only DiskANN is supported, a graph-based ANN algorithm from Microsoft Research.
- Monitor index health with sys.dm_db_index_physical_stats.
2.4 Vector Operations and Utility Functions
-- VECTOR_NORM: compute the L2 norm of a vector
SELECT VECTOR_NORM(Embedding, 'norm2') AS EuclideanNorm
FROM dbo.Products WHERE ProductId = 1;
-- VECTOR_NORMALIZE: return unit-length vector
SELECT VECTOR_NORMALIZE(Embedding) AS NormalizedEmb
FROM dbo.Products WHERE ProductId = 1;
-- VECTOR_DIMENSIONS: get the dimension count
SELECT VECTOR_DIMENSIONS(Embedding) AS Dims FROM dbo.Products WHERE ProductId = 1;
-- Cast between float32 and float16
SELECT CAST(Embedding AS VECTOR(1536, float16)) AS CompactEmb
FROM dbo.Products WHERE ProductId = 1;
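The math behind these utility functions is straightforward. A pure-Python sketch of the norm and normalize operations (illustrative helpers mirroring the T-SQL built-ins, not the built-ins themselves):

```python
import math

def vector_norm(v):
    """L2 (Euclidean) norm: the value VECTOR_NORM(v, 'norm2') computes."""
    return math.sqrt(sum(x * x for x in v))

def vector_normalize(v):
    """Unit-length copy of v: the value VECTOR_NORMALIZE returns."""
    n = vector_norm(v)
    return [x / n for x in v]

v = [3.0, 4.0]
print(vector_norm(v))       # 5.0
print(vector_normalize(v))  # [0.6, 0.8]
print(len(v))               # dimension count, like VECTOR_DIMENSIONS
```

Normalizing embeddings once at write time is what enables the faster dot-product search described in Section 2.2.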
Implementing RAG Workflows
Build Retrieval-Augmented Generation pipelines that ground LLM outputs with your SQL data, reducing hallucinations and enabling domain-specific AI applications.
3.1 What is RAG?
Retrieval-Augmented Generation (RAG) is an architectural pattern that enhances LLM responses by fetching relevant context from an external data source before generating an answer. Instead of relying solely on the model's pre-trained knowledge, RAG supplies the LLM with up-to-date, domain-specific data retrieved from your database.
3.2 RAG Architecture in Azure SQL
The typical Azure SQL RAG pipeline consists of two phases:
Phase 1: Indexing (Offline)
- Load source documents (PDFs, articles, product descriptions) into dbo.Documents.
- Chunk documents into manageable segments (512-1024 tokens with overlap).
- Generate embeddings for each chunk via Azure OpenAI (text-embedding-3-small or text-embedding-ada-002).
- Store chunks + embeddings in dbo.DocumentChunks with the VECTOR column.
- Build a DiskANN index on the embedding column for fast retrieval.
Phase 2: Querying (Online)
- User submits a natural language question.
- Generate a query embedding for the user's question using the same embedding model.
- Execute vector similarity search in Azure SQL to retrieve top-K relevant chunks.
- Concatenate retrieved chunks into a context prompt.
- Send user question + context to Azure OpenAI GPT model and return the answer.
3.3 Complete T-SQL RAG Implementation
-- Step 1: Create the knowledge base tables
CREATE TABLE dbo.KBDocuments (
DocId INT IDENTITY PRIMARY KEY,
Title NVARCHAR(500) NOT NULL,
Source NVARCHAR(500),
CreatedAt DATETIME2 DEFAULT SYSUTCDATETIME()
);
CREATE TABLE dbo.KBChunks (
ChunkId INT IDENTITY CONSTRAINT PK_KBChunks PRIMARY KEY,
DocId INT NOT NULL REFERENCES dbo.KBDocuments(DocId),
ChunkIndex INT NOT NULL, -- order within the parent document
ChunkText NVARCHAR(MAX) NOT NULL,
Embedding VECTOR(1536),
CONSTRAINT uq_chunk UNIQUE (DocId, ChunkIndex)
);
-- Vector index for fast retrieval
CREATE VECTOR INDEX vix_kb_embedding ON dbo.KBChunks (Embedding)
WITH (METRIC = 'cosine', TYPE = 'DiskANN');
-- Step 2: Stored procedure to retrieve relevant context
CREATE OR ALTER PROCEDURE dbo.RAGRetrieve
@Question NVARCHAR(MAX),
@TopK INT = 5,
@Context NVARCHAR(MAX) OUTPUT
AS
BEGIN
SET NOCOUNT ON;
-- Generate embedding for the question (calls Azure OpenAI via EXTERNAL MODEL)
DECLARE @queryEmb VECTOR(1536);
SET @queryEmb = AI_GENERATE_EMBEDDINGS(@Question
USING MODEL AzureOpenAIEmbeddings);
-- Retrieve top-K similar chunks
SELECT TOP (@TopK)
c.ChunkText,
d.Title,
VECTOR_DISTANCE('cosine', c.Embedding, @queryEmb) AS Score
INTO #relevant
FROM dbo.KBChunks c
JOIN dbo.KBDocuments d ON d.DocId = c.DocId
WHERE c.Embedding IS NOT NULL
ORDER BY Score ASC;
-- Build context string
SELECT @Context = STRING_AGG(
N'[Source: ' + Title + N']' + CHAR(10) + ChunkText,
CHAR(10) + CHAR(10)
)
FROM #relevant;
END;
-- Step 3: End-to-end RAG query stored procedure
CREATE OR ALTER PROCEDURE dbo.AskQuestion
@Question NVARCHAR(MAX),
@Answer NVARCHAR(MAX) OUTPUT
AS
BEGIN
SET NOCOUNT ON;
-- Retrieve grounding context
DECLARE @context NVARCHAR(MAX);
EXEC dbo.RAGRetrieve @Question = @Question, @Context = @context OUTPUT;
-- Build system + user prompt
DECLARE @prompt NVARCHAR(MAX) = N'You are a helpful assistant. '
+ N'Answer the user question ONLY using the provided context. '
+ N'If the context does not contain the answer, say "I don''t know."'
+ CHAR(10) + CHAR(10)
+ N'CONTEXT:' + CHAR(10) + @context;
-- Call Azure OpenAI GPT model
DECLARE @request NVARCHAR(MAX) = JSON_OBJECT(
'messages': JSON_ARRAY(
JSON_OBJECT('role': 'system', 'content': @prompt),
JSON_OBJECT('role': 'user', 'content': @Question)
),
'max_tokens': 1000
);
DECLARE @response NVARCHAR(MAX);
EXEC sp_invoke_external_rest_endpoint
@url = N'https://YOUR_OAI.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01',
@method = 'POST',
@payload = @request,
@response = @response OUTPUT;
-- Return only the assistant's message text, not the raw HTTP response
SET @Answer = JSON_VALUE(@response, '$.result.choices[0].message.content');
END;
3.4 Hybrid Search (Vector + Full-Text)
For best retrieval quality, combine vector similarity search with traditional keyword (full-text) search. This is called hybrid search:
-- Enable Full-Text Search on chunk text
-- (KEY INDEX must name the table's primary key index; here assumed PK_KBChunks)
CREATE FULLTEXT CATALOG ft_kb AS DEFAULT;
CREATE FULLTEXT INDEX ON dbo.KBChunks (ChunkText) KEY INDEX PK_KBChunks;
-- Hybrid search: fuse the full-text keyword rank with the cosine vector rank
DECLARE @Question NVARCHAR(1000) = N'...'; -- the user's search text
DECLARE @queryEmb VECTOR(1536);
SET @queryEmb = AI_GENERATE_EMBEDDINGS(@Question USING MODEL AzureOpenAIEmbeddings);
SELECT TOP 10
c.ChunkId,
c.ChunkText,
-- Reciprocal Rank Fusion of keyword and vector scores
(1.0 / (60 + ROW_NUMBER() OVER (ORDER BY ft.RANK DESC))) +
(1.0 / (60 + ROW_NUMBER() OVER (ORDER BY VECTOR_DISTANCE('cosine', c.Embedding, @queryEmb)))) AS RRFScore
FROM dbo.KBChunks c
INNER JOIN CONTAINSTABLE(dbo.KBChunks, ChunkText, @Question) AS ft
ON c.ChunkId = ft.[KEY]
ORDER BY RRFScore DESC;
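The RRF formula used above, 1/(k + rank) summed over each ranking with k = 60, can be sketched in isolation (Python, with hypothetical chunk IDs):

```python
def rrf(keyword_ranking, vector_ranking, k=60):
    """Reciprocal Rank Fusion: sum 1/(k + rank) across both rankings.
    Each ranking lists chunk IDs best-first; k=60 is the conventional constant."""
    scores = {}
    for ranking in (keyword_ranking, vector_ranking):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword = ["c3", "c1", "c7"]  # hypothetical full-text order, best first
vector = ["c1", "c5", "c3"]   # hypothetical vector order, best first
print(rrf(keyword, vector))   # c1 wins: ranked well by both retrievers
```

Because RRF uses only ranks, it needs no score normalization between the two retrievers, which is why it is a popular fusion choice.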
Hybrid search outperforms pure vector search for queries that contain specific product codes, names, or acronyms: exact keywords that semantic embeddings may not capture well. Use hybrid search as your default RAG retrieval strategy.
T-SQL AI Functions and Azure OpenAI Integration
Use built-in AI functions, External Models, and REST integrations to call LLMs and embedding services directly from T-SQL.
4.1 External Models
SQL Server 2025 introduces External Models β a mechanism to register AI model endpoints (Azure OpenAI, local ONNX models, etc.) and invoke them from T-SQL using standard functions.
-- Register an Azure OpenAI embedding model
CREATE EXTERNAL MODEL AzureOpenAIEmbeddings
WITH (
LOCATION = 'https://YOUR_RESOURCE.openai.azure.com',
API_TYPE = 'azure_openai',
API_KEY_SECRET = 'AZURE_OPENAI_KEY',
DEPLOYMENT = 'text-embedding-ada-002',
TASK = 'text-embedding'
);
-- Register a chat completion model
CREATE EXTERNAL MODEL GPT4oChat
WITH (
LOCATION = 'https://YOUR_RESOURCE.openai.azure.com',
API_TYPE = 'azure_openai',
API_KEY_SECRET = 'AZURE_OPENAI_KEY',
DEPLOYMENT = 'gpt-4o',
TASK = 'chat-completion'
);
API keys are stored securely using database secrets. Never hard-code API keys in plain T-SQL. Use CREATE DATABASE SCOPED CREDENTIAL or Azure Managed Identity where possible.
4.2 AI_GENERATE_EMBEDDINGS
Generate embeddings from text using a registered external model:
-- Generate embedding for a single string
SELECT AI_GENERATE_EMBEDDINGS(
N'machine learning and artificial intelligence'
USING MODEL AzureOpenAIEmbeddings
) AS Embedding;
-- Generate and store embeddings in bulk
UPDATE dbo.DocumentChunks
SET Embedding = AI_GENERATE_EMBEDDINGS(ChunkText USING MODEL AzureOpenAIEmbeddings)
WHERE Embedding IS NULL;
4.3 AI_GENERATE_TEXT (Chat Completions)
-- Generate text using a chat model
SELECT AI_GENERATE_TEXT(
'Summarize this product review in one sentence: ' + ReviewText
USING MODEL GPT4oChat
) AS Summary
FROM dbo.ProductReviews
WHERE ReviewId = 42;
-- Batch summarization with CROSS APPLY
SELECT
r.ReviewId,
r.ProductId,
g.GeneratedSummary
FROM dbo.ProductReviews r
CROSS APPLY (
SELECT AI_GENERATE_TEXT(
N'Classify the sentiment (Positive/Negative/Neutral) and summarize: '
+ r.ReviewText
USING MODEL GPT4oChat
) AS GeneratedSummary
) g
WHERE r.CreatedAt > DATEADD(day, -7, SYSUTCDATETIME());
4.4 Calling Azure OpenAI via sp_invoke_external_rest_endpoint
For more control over the HTTP request (custom headers, retry logic), use sp_invoke_external_rest_endpoint:
The example below uses a placeholder API key for illustration. In production, always authenticate via Azure Managed Identity (see Section 5.1). Also verify the latest Azure OpenAI API version at aka.ms/oai/docs.
DECLARE @payload NVARCHAR(MAX) = JSON_OBJECT(
'model' : 'gpt-4o',
'messages' : JSON_ARRAY(
JSON_OBJECT('role': 'system',
'content': 'You are an expert SQL tutor.'),
JSON_OBJECT('role': 'user',
'content': 'Explain what a vector index is in SQL Server 2025.')
),
'max_tokens' : 500,
'temperature': 0.2
);
DECLARE @response NVARCHAR(MAX);
EXEC sp_invoke_external_rest_endpoint
@url = N'https://YOUR_RESOURCE.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-02-01', -- verify latest API version at aka.ms/oai/docs
@method = 'POST',
@headers = '{"Content-Type":"application/json","api-key":"YOUR_API_KEY"}', -- In production use Managed Identity; never hardcode keys
@payload = @payload,
@response = @response OUTPUT;
-- Parse the JSON response
SELECT JSON_VALUE(@response, '$.result.choices[0].message.content') AS Answer;
4.5 Common AI Task Patterns
Sentiment Analysis
CREATE OR ALTER FUNCTION dbo.GetSentiment(@text NVARCHAR(MAX))
RETURNS NVARCHAR(20) AS
BEGIN
DECLARE @result NVARCHAR(MAX);
SET @result = AI_GENERATE_TEXT(
N'Respond with only one word: Positive, Negative, or Neutral. '
+ N'Sentiment of: ' + @text
USING MODEL GPT4oChat
);
RETURN LTRIM(RTRIM(@result));
END;
Entity Extraction
-- Extract named entities as JSON from support tickets
SELECT
TicketId,
AI_GENERATE_TEXT(
N'Extract a JSON object with keys "product", "issue", "priority" from: '
+ TicketText
USING MODEL GPT4oChat
) AS ExtractedEntities
FROM dbo.SupportTickets
WHERE Status = 'Open';
Translation
-- Translate product descriptions to English
UPDATE dbo.Products
SET DescriptionEN = AI_GENERATE_TEXT(
N'Translate to English: ' + Description
USING MODEL GPT4oChat
)
WHERE Language != 'en' AND DescriptionEN IS NULL;
Know the difference between AI_GENERATE_EMBEDDINGS (returns a VECTOR for similarity search) and AI_GENERATE_TEXT (returns NVARCHAR from a chat/completion model). Also understand when to use sp_invoke_external_rest_endpoint vs built-in AI functions.
Security, Scaling, and DevOps
Design AI-enabled database solutions that are secure, compliant, scalable, and deployed with CI/CD best practices.
5.1 Securing AI-Enabled Solutions
Managed Identity for External Models
Prefer Azure Managed Identity over API keys for authenticating Azure SQL to Azure OpenAI. This eliminates secret rotation overhead and follows the principle of least privilege.
-- Register external model using Managed Identity
CREATE EXTERNAL MODEL AzureOpenAIEmbeddings
WITH (
LOCATION = 'https://YOUR_RESOURCE.openai.azure.com',
API_TYPE = 'azure_openai',
AUTHENTICATION = 'managed_identity', -- No API key needed
DEPLOYMENT = 'text-embedding-ada-002',
TASK = 'text-embedding'
);
Row-Level Security for Multi-Tenant AI
In multi-tenant applications, ensure users only retrieve embeddings from their own data:
-- Create security policy using RLS
CREATE SCHEMA Security;
GO
CREATE FUNCTION Security.RLS_TenantFilter(@TenantId INT)
RETURNS TABLE WITH SCHEMABINDING AS
RETURN SELECT 1 AS Result
WHERE @TenantId = CAST(SESSION_CONTEXT(N'TenantId') AS INT);
GO
CREATE SECURITY POLICY TenantIsolation
ADD FILTER PREDICATE Security.RLS_TenantFilter(TenantId)
ON dbo.KBChunks,
ADD BLOCK PREDICATE Security.RLS_TenantFilter(TenantId)
ON dbo.KBChunks AFTER INSERT;
-- Set tenant context for each session
EXEC sp_set_session_context @key = N'TenantId', @value = 42;
Encrypting Sensitive AI Outputs
-- Use Always Encrypted for sensitive LLM outputs (PII in responses)
-- Define column encryption in the table DDL
CREATE TABLE dbo.SensitiveResponses (
ResponseId INT IDENTITY PRIMARY KEY,
UserId INT,
UserQuery NVARCHAR(MAX),
LLMResponse NVARCHAR(MAX)
ENCRYPTED WITH (
COLUMN_ENCRYPTION_KEY = CEK_AI_Outputs,
ENCRYPTION_TYPE = Randomized,
ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
)
);
Auditing AI Interactions
-- Log all LLM calls with user context for audit trails
CREATE TABLE dbo.AIAuditLog (
LogId BIGINT IDENTITY PRIMARY KEY,
UserId NVARCHAR(128) DEFAULT SUSER_SNAME(),
AppName NVARCHAR(256) DEFAULT APP_NAME(),
Operation NVARCHAR(50),
InputHash VARBINARY(32), -- SHA2 of input (avoid storing PII)
TokensUsed INT,
ModelName NVARCHAR(128),
CreatedAt DATETIME2 DEFAULT SYSUTCDATETIME()
);
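The InputHash column can be populated application-side before logging. A minimal Python sketch (`input_hash` is an illustrative helper; SHA-256 yields exactly the 32 bytes that VARBINARY(32) expects):

```python
import hashlib

def input_hash(prompt_text):
    """SHA-256 digest of the prompt: 32 bytes, matching VARBINARY(32).
    Lets audits correlate repeated prompts without storing the PII itself."""
    return hashlib.sha256(prompt_text.encode("utf-8")).digest()

digest = input_hash("What is my account balance?")
print(len(digest))        # 32 bytes
print(digest.hex()[:16])  # hex prefix, convenient for display
```

Hashing rather than storing the prompt keeps the audit trail useful (duplicate detection, volume analysis) without retaining sensitive input.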
5.2 Performance and Scaling
Batch Embedding Generation
-- Process embeddings in configurable batches to avoid timeouts
CREATE OR ALTER PROCEDURE dbo.BatchGenerateEmbeddings
@BatchSize INT = 100
AS
BEGIN
SET NOCOUNT ON;
DECLARE @processed INT = 0;
WHILE EXISTS (SELECT 1 FROM dbo.KBChunks WHERE Embedding IS NULL)
BEGIN
UPDATE TOP (@BatchSize) dbo.KBChunks
SET Embedding = AI_GENERATE_EMBEDDINGS(ChunkText USING MODEL AzureOpenAIEmbeddings)
WHERE Embedding IS NULL;
SET @processed += @@ROWCOUNT;
-- Yield to avoid blocking; adjust delay as needed
WAITFOR DELAY '00:00:01';
END;
PRINT CAST(@processed AS VARCHAR) + ' rows embedded.';
END;
Monitoring Vector Search Performance
-- Monitor vector index health
SELECT
OBJECT_NAME(i.object_id) AS TableName,
i.name AS IndexName,
p.index_type_desc,
p.avg_fragmentation_in_percent,
p.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') p
JOIN sys.indexes i ON i.object_id = p.object_id AND i.index_id = p.index_id
WHERE i.name LIKE 'vix_%';
5.3 CI/CD for AI-Enabled Databases
GitHub Actions Workflow for SQL Deployment
# .github/workflows/deploy-sql.yml
name: Deploy AI Database Changes
on:
push:
branches: [main]
paths: ['sql/**']
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run SQL migrations
uses: azure/sql-action@v2
with:
connection-string: ${{ secrets.AZURE_SQL_CONNECTION_STRING }}
path: './sql/migrations/*.sql'
action: 'deploy'
- name: Run schema validation
uses: azure/sql-action@v2
with:
connection-string: ${{ secrets.AZURE_SQL_CONNECTION_STRING }}
path: './sql/tests/validate_schema.sql'
action: 'run'
- name: Rebuild vector indexes if schema changed
uses: azure/sql-action@v2
with:
connection-string: ${{ secrets.AZURE_SQL_CONNECTION_STRING }}
path: './sql/maintenance/rebuild_vector_indexes.sql'
action: 'run'
Database Project (dacpac) Best Practices
- Use SQL Server Database Projects (SSDT / SQL Projects extension for VS Code) to version-control schema objects, including EXTERNAL MODEL definitions.
- Store the CREATE EXTERNAL MODEL DDL in source control without secrets; inject secrets via deployment parameters or Key Vault references.
- Use pre-deployment and post-deployment scripts for data migrations and embedding backfills.
- Test schema changes against a staging Azure SQL database before deploying to production.
Enable Microsoft Defender for Azure SQL to detect anomalous query patterns, including potential prompt injection attacks via SQL queries that route untrusted user input to LLMs. Always sanitize and parameterize user inputs before passing them to AI functions.
Data API Builder
Expose SQL data as REST and GraphQL APIs for AI-driven applications without writing custom middleware.
6.1 What is Data API Builder?
Data API Builder (DAB) is a Microsoft open-source tool that automatically generates REST and GraphQL APIs on top of existing Azure SQL, SQL Server, MySQL, PostgreSQL, and Cosmos DB databases. It allows AI applications (LangChain agents, custom GPT actions, etc.) to query SQL data via standard HTTP APIs without requiring application-layer middleware.
6.2 Configuration
// dab-config.json
{
"$schema": "https://dataapibuilder.azureedge.net/schemas/v1.2.0/dab.draft.schema.json",
"data-source": {
"database-type": "mssql",
"connection-string": "@env('AZURE_SQL_CONNECTION_STRING')"
},
"runtime": {
"rest": { "enabled": true, "path": "/api" },
"graphql": { "enabled": true, "path": "/graphql" }
},
"entities": {
"Product": {
"source": "dbo.Products",
"permissions": [{ "role": "anonymous", "actions": ["read"] }]
},
"SearchProducts": {
"source": {
"object": "dbo.SemanticProductSearch",
"type": "stored-procedure",
"parameters": { "query": "" }
},
"rest": { "methods": ["GET"] },
"permissions": [{ "role": "authenticated", "actions": ["execute"] }]
}
}
}
6.3 Exposing Vector Search via Custom Actions
-- Create a stored procedure DAB can expose as a REST endpoint
CREATE OR ALTER PROCEDURE dbo.SemanticProductSearch
@query NVARCHAR(MAX),
@topK INT = 5
AS
BEGIN
SET NOCOUNT ON;
DECLARE @emb VECTOR(1536);
SET @emb = AI_GENERATE_EMBEDDINGS(@query USING MODEL AzureOpenAIEmbeddings);
SELECT TOP (@topK)
ProductId,
Name,
Description,
VECTOR_DISTANCE('cosine', Embedding, @emb) AS Score
FROM dbo.Products
WHERE Embedding IS NOT NULL
ORDER BY Score ASC;
END;
After mapping this stored procedure in dab-config.json, it becomes available as:
GET /api/SearchProducts?query=portable+laptop&topK=5
6.4 Authentication and Authorization
// Role-based access in dab-config.json
"entities": {
"KBChunks": {
"source": "dbo.KBChunks",
"permissions": [
{ "role": "anonymous", "actions": [] },
{ "role": "reader", "actions": ["read"] },
{ "role": "contributor", "actions": ["create", "read", "update"] },
{ "role": "admin", "actions": ["*"] }
]
}
}
Data API Builder endpoints can be registered as OpenAPI actions in Microsoft Copilot Studio or as Custom GPT actions in OpenAI, enabling conversational AI agents to query your SQL database with natural language, backed by vector search and your real data.