<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Ryan Giggs</title>
    <description>The latest articles on Forem by Ryan Giggs (@derrickryangiggs).</description>
    <link>https://forem.com/derrickryangiggs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2544697%2F1f321a52-47b1-4034-9cc0-fba8e0b8273c.jpeg</url>
      <title>Forem: Ryan Giggs</title>
      <link>https://forem.com/derrickryangiggs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/derrickryangiggs"/>
    <language>en</language>
    <item>
      <title>Building the Sovereign Debt Observatory: An End-to-End ELT Pipeline on World Bank Debt Data for Low and Middle-Income Countries</title>
      <dc:creator>Ryan Giggs</dc:creator>
      <pubDate>Thu, 09 Apr 2026 10:02:52 +0000</pubDate>
      <link>https://forem.com/derrickryangiggs/building-the-sovereign-debt-observatory-an-end-to-end-elt-pipeline-on-world-bank-debt-data-for-low-4988</link>
      <guid>https://forem.com/derrickryangiggs/building-the-sovereign-debt-observatory-an-end-to-end-elt-pipeline-on-world-bank-debt-data-for-low-4988</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Global sovereign debt is one of the most consequential datasets in existence. It shapes foreign policy, determines credit ratings, drives IMF bailout decisions, and affects the daily lives of billions of people in developing countries. The World Bank publishes this data openly — 130+ countries, 27 years of history, updated quarterly — yet there is no ready-made analytical layer on top of it.&lt;/p&gt;

&lt;p&gt;If you want to answer a question like "which African countries have the highest ratio of private nonguaranteed debt to total external debt, and how has that changed since 2010?", you have to manually download Excel files from multiple World Bank portals, clean inconsistent column names, handle missing values, and stitch everything together in a spreadsheet. Every time the data updates, you do it again.&lt;/p&gt;

&lt;p&gt;That is the problem this project solves.&lt;/p&gt;

&lt;p&gt;The Sovereign Debt Observatory is an end-to-end ELT pipeline that ingests World Bank external debt data, lands it in a cloud data lake, transforms it in BigQuery using dbt Cloud, orchestrates everything quarterly with Apache Airflow, and surfaces the answers in a Looker Studio dashboard.&lt;/p&gt;

&lt;p&gt;This article walks through every layer of the pipeline — the architecture decisions, the technical challenges, and how I solved them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Five Questions This Pipeline Answers
&lt;/h2&gt;

&lt;p&gt;Before writing a single line of code, I defined the analytical questions the pipeline needed to answer. This kept every decision grounded in purpose rather than technology for its own sake.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;How is gross external debt distributed across public, publicly guaranteed, private nonguaranteed, and multilateral sectors per country?&lt;/li&gt;
&lt;li&gt;Which countries carry the highest short-term external debt exposure and how has that changed since 2010?&lt;/li&gt;
&lt;li&gt;What share of external debt is foreign-currency denominated and where is that ratio worsening?&lt;/li&gt;
&lt;li&gt;How has regional external debt stock evolved from 1998 to 2025 across Africa, Latin America, East Asia, South Asia, Europe and Central Asia, and the Middle East?&lt;/li&gt;
&lt;li&gt;Which countries face the heaviest debt service pressure relative to their total debt position?&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;The pipeline follows a modern ELT pattern — extract and load first, transform inside the warehouse.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;World Bank API (IDS source 2)
        |
        v
PySpark job (Docker container)
        |
        v
Google Cloud Storage — raw Parquet, partitioned by extracted_date
        |
        v
BigQuery external tables (raw dataset)
        |
        v
dbt Cloud — staging views + mart tables
        |
        v
Looker Studio dashboard
        |
Orchestrated by Apache Airflow on Docker Compose
Infrastructure provisioned by Terraform
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why ELT and not ETL?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In ETL, transformation happens before loading — your Spark job does the cleaning, aggregation, and business logic before writing to the warehouse. In ELT, raw data lands untransformed and the warehouse does all the heavy lifting.&lt;/p&gt;

&lt;p&gt;For this project, ELT is the right choice for three reasons. First, raw data lands in GCS untransformed, so if analytical requirements change I can write a new dbt model against the same raw files without re-running the ingestion layer (for as long as those files are retained under the bucket's lifecycle rule). Second, BigQuery is optimized for analytical SQL transformations at scale and is far better at this than PySpark running in a Docker container on a local machine. Third, dbt gives me version-controlled, tested, documented transformations that are readable by anyone with SQL knowledge.&lt;/p&gt;

&lt;p&gt;PySpark's job in this pipeline is purely mechanical: hit the API, paginate, write Parquet. No business logic. No aggregations. Pure extract and load.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Sources
&lt;/h2&gt;

&lt;h3&gt;
  
  
  International Debt Statistics (IDS) — World Bank source 2
&lt;/h3&gt;

&lt;p&gt;The IDS database is the flagship World Bank debt dataset. It covers external debt stocks and flows for low- and middle-income countries, with annual data going back to 1998. I access it through the &lt;code&gt;wbgapi&lt;/code&gt; Python library, which wraps the World Bank Indicators API v2.&lt;/p&gt;

&lt;p&gt;I ingest nine series:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Series Code&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;DT.DOD.DECT.CD&lt;/td&gt;
&lt;td&gt;Total external debt stocks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DT.DOD.DLXF.CD&lt;/td&gt;
&lt;td&gt;Long-term external debt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DT.DOD.DPNG.CD&lt;/td&gt;
&lt;td&gt;Private nonguaranteed debt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DT.DOD.MIBR.CD&lt;/td&gt;
&lt;td&gt;PPG IBRD loans&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DT.DOD.DPPG.CD&lt;/td&gt;
&lt;td&gt;Public and publicly guaranteed debt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DT.DOD.DIMF.CD&lt;/td&gt;
&lt;td&gt;IMF credit&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DT.DOD.PVLX.CD&lt;/td&gt;
&lt;td&gt;Present value of external debt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DT.DOD.MWBG.CD&lt;/td&gt;
&lt;td&gt;IBRD loans and IDA credits&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DT.DOD.MIDA.CD&lt;/td&gt;
&lt;td&gt;PPG IDA loans&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;One important lesson from building this: the World Bank has multiple API source databases, and not all of them support the standard &lt;code&gt;wbgapi&lt;/code&gt; query format. Sources 22 (QEDS SDDS), 23 (QEDS GDDS), and 54 (JEDH) all returned JSON decode errors when I queried them programmatically, which suggests they are served from a separate DataBank backend. In my testing, only source 2 (IDS) reliably supported the Indicators API. This cost me several hours of debugging.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quarterly External Debt Statistics SDDS (QEDS)
&lt;/h3&gt;

&lt;p&gt;QEDS provides quarterly debt payment schedule data broken down by sector and maturity. Unlike IDS, QEDS does not support programmatic API access in the standard format. The World Bank provides bulk Excel downloads for each supplementary table instead.&lt;/p&gt;

&lt;p&gt;I download five Excel files directly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Table 1.5 — Net external debt position by sector&lt;/li&gt;
&lt;li&gt;Table 3 — Debt service payment schedule by sector&lt;/li&gt;
&lt;li&gt;Table 3.2 — Debt service by sector and instrument&lt;/li&gt;
&lt;li&gt;Table 2.1 — Foreign currency and domestic currency debt&lt;/li&gt;
&lt;li&gt;Table 1.6 — Reconciliation of positions and flows&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Infrastructure as Code with Terraform
&lt;/h2&gt;

&lt;p&gt;Every GCP resource in this project is provisioned by Terraform. The core resources are a GCS bucket for the data lake and three BigQuery datasets: raw, staging, and mart. The excerpt below shows the bucket and the raw dataset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_storage_bucket"&lt;/span&gt; &lt;span class="s2"&gt;"data_lake"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;gcs_bucket_name&lt;/span&gt;
  &lt;span class="nx"&gt;location&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;region&lt;/span&gt;
  &lt;span class="nx"&gt;force_destroy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

  &lt;span class="nx"&gt;lifecycle_rule&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;action&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;type&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Delete"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nx"&gt;condition&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;age&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;versioning&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"google_bigquery_dataset"&lt;/span&gt; &lt;span class="s2"&gt;"raw"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;dataset_id&lt;/span&gt;                 &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"raw"&lt;/span&gt;
  &lt;span class="nx"&gt;location&lt;/span&gt;                   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;
  &lt;span class="nx"&gt;delete_contents_on_destroy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 90-day lifecycle rule on the bucket automatically deletes objects older than 90 days, so stale extraction partitions do not accumulate and storage costs stay near zero. The entire GCP footprint for this project costs less than $0.05 per month: BigQuery's free tier covers 1 TB of query processing and 10 GB of storage each month, which is far more than this dataset requires.&lt;/p&gt;

&lt;p&gt;A &lt;code&gt;setup_gcp.sh&lt;/code&gt; script handles the one-time bootstrap — creating the GCP project, enabling APIs, creating the service account, and granting IAM roles. The billing account ID is passed as an environment variable so it never appears in version-controlled files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;BILLING_ACCOUNT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-billing-id bash scripts/setup_gcp.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Ingestion Layer — PySpark on Docker
&lt;/h2&gt;

&lt;h3&gt;
  
  
  JEDH / IDS Ingestion
&lt;/h3&gt;

&lt;p&gt;The IDS ingestion script uses &lt;code&gt;wbgapi&lt;/code&gt; to fetch all nine series across all available countries from 1998 to 2025. The API returns data in wide format — one row per country per series, with year columns as separate fields. PySpark writes this to GCS as Snappy-compressed Parquet, partitioned by extraction date.&lt;/p&gt;

&lt;p&gt;One critical issue I hit: the World Bank API returns year columns as bare integers (&lt;code&gt;1998&lt;/code&gt;, &lt;code&gt;1999&lt;/code&gt;, etc.). BigQuery rejects column names that start with numbers. The fix was to prefix all year columns with &lt;code&gt;year_&lt;/code&gt; before writing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;year_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isdigit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;combined&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This produces clean column names like &lt;code&gt;year_1998&lt;/code&gt;, &lt;code&gt;year_1999&lt;/code&gt; that BigQuery accepts without complaint.&lt;/p&gt;

&lt;h3&gt;
  
  
  QEDS Ingestion
&lt;/h3&gt;

&lt;p&gt;The QEDS ingestion downloads five Excel files from World Bank DataBank, reads all sheets from each file using &lt;code&gt;pandas.read_excel&lt;/code&gt;, and concatenates them into a single DataFrame.&lt;/p&gt;

&lt;p&gt;The Excel files have messy column names — spaces, brackets, special characters. A &lt;code&gt;clean_column_name&lt;/code&gt; function normalizes everything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;clean_column_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\s*\[.*?\]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# remove [YR2021Q4] suffixes
&lt;/span&gt;    &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[^a-zA-Z0-9_]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;_+&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;isdigit&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;q_&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;  &lt;span class="c1"&gt;# prefix quarter columns: q_2021q4
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
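&lt;p&gt;To make the normalization rules concrete, here is a small self-contained run of the same function. The three sample headers are hypothetical; they are not taken from the actual QEDS files and were chosen only to exercise each step.&lt;/p&gt;

```python
import re


def clean_column_name(col):
    """Normalize a spreadsheet header into a BigQuery-friendly column name."""
    col = str(col)
    col = re.sub(r"\s*\[.*?\]", "", col)      # drop bracketed suffixes such as [YR2021Q4]
    col = col.strip()
    col = re.sub(r"[^a-zA-Z0-9_]", "_", col)  # every other character becomes an underscore
    col = re.sub(r"_+", "_", col)             # collapse runs of underscores
    col = col.strip("_")
    if col and col[0].isdigit():
        col = "q_" + col                      # names must not start with a digit
    return col.lower()


# Hypothetical headers, chosen only to exercise each branch.
samples = ["Country Name", "Series Code (USD)", "2021Q4 [YR2021Q4]"]
cleaned = [clean_column_name(name) for name in samples]
print(cleaned)
```

&lt;p&gt;One thing to watch for with this kind of normalization: two different headers can collapse to the same name (for example, "Country Name" and "Country-Name" both become &lt;code&gt;country_name&lt;/code&gt;), so it may be worth checking the cleaned list for duplicates before writing Parquet.&lt;/p&gt;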



&lt;p&gt;I also hit an OOM error trying to write the QEDS data through PySpark — 76,835 rows with 90+ columns across 214 sheets was too much for the JVM heap in the Docker container. The fix was to bypass Spark entirely for QEDS and write directly to GCS using the &lt;code&gt;google-cloud-storage&lt;/code&gt; Python client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
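&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.cloud&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;

&lt;span class="nb"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;BytesIO&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;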
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_parquet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pyarrow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seek&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GCS_BUCKET&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;blob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_from_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/octet-stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No JVM, no Spark executor, no OOM. The lesson: use the right tool for the job. PySpark is excellent for large distributed datasets. For a 76K-row DataFrame built from Excel files, plain pandas and the GCS Python client are simpler and more reliable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Docker Image
&lt;/h3&gt;

&lt;p&gt;The ingestion image is built on &lt;code&gt;eclipse-temurin:17-jdk-jammy&lt;/code&gt; rather than a plain Python image. PySpark requires Java, and the &lt;code&gt;python:3.11-slim&lt;/code&gt; base image uses Debian Trixie, which does not carry &lt;code&gt;openjdk-17-jdk&lt;/code&gt; in its default repositories. The Temurin image ships Java 17 out of the box, a version that Spark 3.5.1 supports.&lt;/p&gt;

&lt;p&gt;The GCS connector JAR is downloaded at build time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;RUN &lt;/span&gt;curl &lt;span class="nt"&gt;--progress-bar&lt;/span&gt; &lt;span class="nt"&gt;-L&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-hadoop3-latest.jar &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SPARK_HOME&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/jars/gcs-connector-hadoop3-latest.jar
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Credentials are never baked into the image. They are mounted at runtime as a volume:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; /path/to/key.json:/app/credentials/key.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nv"&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/app/credentials/key.json &lt;span class="se"&gt;\&lt;/span&gt;
  sovereign-debt-ingestion:v1 python3 extract_jedh.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Orchestration with Apache Airflow
&lt;/h2&gt;

&lt;p&gt;Airflow runs on Docker Compose using the official &lt;code&gt;apache/airflow:2.9.2&lt;/code&gt; image. The stack includes a Celery executor with Redis as the message broker and Postgres as the metadata database.&lt;/p&gt;

&lt;p&gt;The DAG runs quarterly — on the 1st of January, April, July, and October at 06:00 UTC:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;schedule_interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0 6 1 1,4,7,10 *&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
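&lt;p&gt;As a sanity check on the cron fields (minute, hour, day of month, month, day of week), the sketch below expands the expression into the four trigger times for a given year. It is an illustration only: the helper &lt;code&gt;quarterly_run_times&lt;/code&gt; is hypothetical, handles just the simple shape used here, and is not how Airflow parses schedules internally.&lt;/p&gt;

```python
from datetime import datetime, timezone

CRON = "0 6 1 1,4,7,10 *"  # the expression from the DAG above


def quarterly_run_times(cron, year):
    """Expand a cron expression with fixed minute, hour, and day into datetimes.

    Handles only the simple shape used here: single integers for minute, hour,
    and day of month, a comma-separated month list, and "*" for day of week.
    """
    minute, hour, day, months, day_of_week = cron.split()
    if day_of_week != "*":
        raise ValueError("this sketch only supports '*' for day of week")
    return [
        datetime(year, int(month), int(day), int(hour), int(minute), tzinfo=timezone.utc)
        for month in months.split(",")
    ]


for run in quarterly_run_times(CRON, 2026):
    print(run.isoformat())
```

&lt;p&gt;These are wall-clock trigger times. With a cron schedule, Airflow labels each run with the start of the interval it covers, so the run that fires on 1 April carries a logical date of 1 January.&lt;/p&gt;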



&lt;p&gt;Task flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;extract_load_jedh&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;extract_load_qeds&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both tasks use the &lt;code&gt;DockerOperator&lt;/code&gt; to spin up the ingestion container on the host machine, which means Airflow itself does not need PySpark, Java, or any data dependencies — it just tells Docker to run the job.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Docker socket problem&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;DockerOperator&lt;/code&gt; communicates with the host Docker daemon through the Unix socket at &lt;code&gt;/var/run/docker.sock&lt;/code&gt;. This requires two things: the socket must be mounted into the Airflow worker container, and the worker must have permission to use it.&lt;/p&gt;

&lt;p&gt;The socket mount goes in &lt;code&gt;docker-compose.yml&lt;/code&gt; under the common volumes section:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/var/run/docker.sock:/var/run/docker.sock&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The permission fix requires adding the Docker group ID to the Airflow worker. On my machine the Docker group ID is 984:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;group_add&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;984"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without the group add, the worker can see the socket but gets &lt;code&gt;PermissionError(13, 'Permission denied')&lt;/code&gt;. This is a common gotcha when Airflow drives the host Docker daemon through a mounted socket (often loosely called Docker-in-Docker), and it took several debugging sessions to resolve.&lt;/p&gt;
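&lt;p&gt;The group ID is host-specific, so 984 will probably not be the right value elsewhere. One way to look it up is to read the group that owns the socket file. The sketch below does that with the standard library; the helper name &lt;code&gt;socket_group_id&lt;/code&gt; is my own, and it assumes the socket lives at the default path.&lt;/p&gt;

```python
import os


def socket_group_id(path="/var/run/docker.sock"):
    """Return the numeric group ID that owns the given path."""
    return os.stat(path).st_gid


if __name__ == "__main__":
    try:
        print(socket_group_id())
    except OSError as exc:
        # Raised when the socket is missing or its directory is not accessible.
        print(f"could not inspect the Docker socket: {exc}")
```

&lt;p&gt;Run this on the host, not inside the Airflow container, and put the number it prints into &lt;code&gt;group_add&lt;/code&gt;.&lt;/p&gt;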

&lt;h2&gt;
  
  
  Transformation with dbt Cloud
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Staging Layer
&lt;/h3&gt;

&lt;p&gt;The staging models are materialized as views: no storage cost and no refresh step, just a SQL lens over the raw external tables that is evaluated at query time.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;stg_jedh&lt;/code&gt; does the heavy lifting: it unpivots the wide year-column format into long format using BigQuery's &lt;code&gt;UNPIVOT&lt;/code&gt; operator:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;unpivot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nb"&gt;year&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;year_1998&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;year_1999&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;year_2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...,&lt;/span&gt; &lt;span class="n"&gt;year_2025&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then extracts the year integer from the column name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'year_'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;int64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;year&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And adds a human-readable series description via a CASE statement, turning &lt;code&gt;DT.DOD.DECT.CD&lt;/code&gt; into &lt;code&gt;Total external debt stocks&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The result is a clean long-format table: one row per country per series per year.&lt;/p&gt;
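&lt;p&gt;For readers less familiar with unpivoting, the reshaping can be sketched in plain Python. This is an illustration of the wide-to-long idea, not the dbt model itself; the sample row and its numbers are made up, and skipping missing years mirrors what I understand to be &lt;code&gt;UNPIVOT&lt;/code&gt;'s default handling of NULLs.&lt;/p&gt;

```python
def unpivot_years(row, id_columns=("country_code", "series_code"), prefix="year_"):
    """Turn one wide row into a list of long rows, one per non-null year column."""
    ids = {name: row[name] for name in id_columns}
    long_rows = []
    for column, value in row.items():
        if not column.startswith(prefix) or value is None:
            continue  # skip id columns and missing years
        long_rows.append({**ids, "year": int(column[len(prefix):]), "value": value})
    return long_rows


# Hypothetical wide row; the numbers are made up for illustration.
wide = {
    "country_code": "KEN",
    "series_code": "DT.DOD.DECT.CD",
    "year_1998": 1.0,
    "year_1999": None,
    "year_2000": 3.0,
}
for long_row in unpivot_years(wide):
    print(long_row)
```

&lt;p&gt;The single wide row becomes two long rows (1998 and 2000), each carrying the country and series identifiers alongside an integer year.&lt;/p&gt;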

&lt;p&gt;&lt;code&gt;stg_qeds&lt;/code&gt; is simpler — it selects the clean columns, uses &lt;code&gt;SAFE_CAST&lt;/code&gt; to convert quarter values to float64, and filters out null countries and series codes. &lt;code&gt;SAFE_CAST&lt;/code&gt; is preferable to &lt;code&gt;CAST&lt;/code&gt; here because it returns NULL on failure rather than throwing an error, which is the right behaviour for messy Excel data.&lt;/p&gt;
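&lt;p&gt;The difference between the two casts is easiest to see with a rough Python analogue. The helper &lt;code&gt;safe_cast_float&lt;/code&gt; below is hypothetical and only imitates the return-NULL-on-failure behaviour; the sample cell values are invented.&lt;/p&gt;

```python
def safe_cast_float(value):
    """Return value as a float, or None when it cannot be converted."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


# Hypothetical cell contents of the kind a spreadsheet export might contain.
cells = ["1234.5", 42, "..", "", None]
print([safe_cast_float(cell) for cell in cells])
```

&lt;p&gt;A strict cast would stop at the first unparseable cell; the safe version keeps the row and leaves the value empty, so one bad cell does not fail the whole model.&lt;/p&gt;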

&lt;h3&gt;
  
  
  Mart Layer
&lt;/h3&gt;

&lt;p&gt;The mart models are materialized as tables with partitioning and clustering for query efficiency.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;mart_debt_stocks&lt;/code&gt; is partitioned by year and clustered by &lt;code&gt;country_code&lt;/code&gt; and &lt;code&gt;series_code&lt;/code&gt;. It enriches the staging data with YoY percentage change and debt as a percentage of total external debt per country per year — both computed using window functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;lag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;debt_value_usd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;over&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;country_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;series_code&lt;/span&gt;
    &lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;year&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;prev_year_value&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="n"&gt;safe_divide&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;debt_value_usd&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;lag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;debt_value_usd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;over&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;country_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;series_code&lt;/span&gt;
        &lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;year&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;lag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;debt_value_usd&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;over&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;country_code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;series_code&lt;/span&gt;
        &lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;year&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;yoy_change_pct&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
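
&lt;p&gt;For readers less familiar with window functions, the same calculation can be sketched in plain Python. This is only an illustration of what the SQL above computes, not code from the project: the function name &lt;code&gt;yoy_change_pct&lt;/code&gt; and the sample figures are made up, and returning &lt;code&gt;None&lt;/code&gt; stands in for the NULL that &lt;code&gt;lag&lt;/code&gt; produces on the first row and that &lt;code&gt;safe_divide&lt;/code&gt; produces when the previous value is zero.&lt;/p&gt;

```python
def yoy_change_pct(values_by_year):
    """Return {year: percent change vs. the previous row}, ordered by year.

    Mirrors lag() over (order by year) for one country/series partition.
    The previous row is the prior entry in year order, so a gap in the
    data compares against the last available year, as lag() would.
    """
    result = {}
    prev_value = None  # lag() yields NULL for the first row
    for year in sorted(values_by_year):
        value = values_by_year[year]
        if prev_value is None or value is None or prev_value == 0:
            result[year] = None  # safe_divide returns NULL, not an error
        else:
            result[year] = (value - prev_value) / prev_value * 100
        prev_value = value
    return result


# Hypothetical debt stock for one country and one series, in USD.
sample = {2019: 100.0, 2020: 110.0, 2021: 99.0}
print(yoy_change_pct(sample))
```

&lt;p&gt;The first year in each partition has no previous row, so its change is undefined rather than zero. Keeping it as NULL stops the dashboard from plotting a misleading flat start.&lt;/p&gt;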



&lt;p&gt;&lt;code&gt;mart_regional_debt&lt;/code&gt; assigns countries to six World Bank regions using a CASE statement on ISO3 country codes, then aggregates total, average, max, and min debt stocks per region per series per year. This powers the regional trajectory time series on the dashboard.&lt;/p&gt;
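
&lt;p&gt;The shape of that aggregation can be sketched in Python. Everything here is illustrative: the lookup covers only four countries, &lt;code&gt;aggregate_by_region&lt;/code&gt; is a hypothetical helper, the series code is just an example, and the values are invented. The actual model does this in SQL with a CASE statement.&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical, partial ISO3-to-region lookup; the real model covers every
# country in the dataset with a CASE statement.
REGION_BY_ISO3 = {
    "KEN": "Sub-Saharan Africa",
    "NGA": "Sub-Saharan Africa",
    "IND": "South Asia",
    "BGD": "South Asia",
}


def aggregate_by_region(rows):
    """Group (country_code, series_code, year, value) rows by region."""
    groups = defaultdict(list)
    for country_code, series_code, year, value in rows:
        region = REGION_BY_ISO3.get(country_code, "Unclassified")
        groups[(region, series_code, year)].append(value)
    return {
        key: {
            "total": sum(values),
            "average": sum(values) / len(values),
            "max": max(values),
            "min": min(values),
        }
        for key, values in groups.items()
    }


# Made-up values, for illustration only.
rows = [
    ("KEN", "DT.DOD.DECT.CD", 2020, 30.0),
    ("NGA", "DT.DOD.DECT.CD", 2020, 70.0),
    ("IND", "DT.DOD.DECT.CD", 2020, 500.0),
]
stats = aggregate_by_region(rows)
print(stats[("Sub-Saharan Africa", "DT.DOD.DECT.CD", 2020)])
```

&lt;p&gt;Sending unmapped codes to an explicit "Unclassified" bucket, rather than dropping them, makes gaps in the mapping visible in the output.&lt;/p&gt;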

&lt;p&gt;&lt;code&gt;mart_debt_service&lt;/code&gt; computes total annual debt payments and average quarterly payments per country from the QEDS payment schedule data, enabling debt service pressure analysis.&lt;/p&gt;

&lt;h3&gt;
  
  
  dbt Tests
&lt;/h3&gt;

&lt;p&gt;All models have &lt;code&gt;not_null&lt;/code&gt; tests on primary dimension columns. Running &lt;code&gt;dbt test&lt;/code&gt; after every model change ensures data quality is enforced at the transformation layer rather than discovered downstream in the dashboard.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Dashboard
&lt;/h2&gt;

&lt;p&gt;The Looker Studio dashboard has two pages, both connected directly to the BigQuery mart tables.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Page 1 — Global Debt Overview&lt;/strong&gt; answers Q4 at a glance. A time series chart using &lt;code&gt;mart_regional_debt&lt;/code&gt; shows six regional debt trajectories from 1998 to 2024. Africa's trajectory is notably steeper post-2010. A bar chart shows the top 20 countries by total external debt for the selected year. A scorecard shows total global external debt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Page 2 — Country Deep-Dive&lt;/strong&gt; answers Q1, Q2, and Q5. A stacked bar chart shows debt composition by sector over time for any selected country. A line chart shows short-term vulnerability trends. A table sorted by total 2021 payments shows which countries face the most acute debt service pressure.&lt;/p&gt;

&lt;p&gt;Live dashboard: &lt;a href="https://lookerstudio.google.com/reporting/7fc18e9e-a5c6-4616-b920-b5b4bddf2264" rel="noopener noreferrer"&gt;https://lookerstudio.google.com/reporting/7fc18e9e-a5c6-4616-b920-b5b4bddf2264&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcndjtztlmuh5xrijntk0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcndjtztlmuh5xrijntk0.png" alt=" " width="800" height="220"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft3tp1p9cm44v81acuofi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft3tp1p9cm44v81acuofi.png" alt=" " width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4e46s1e6kfyq7mzy12h5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4e46s1e6kfyq7mzy12h5.png" alt=" " width="800" height="349"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frxg2xfudqw09l22wczwr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frxg2xfudqw09l22wczwr.png" alt=" " width="800" height="253"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Technical Lessons
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Know which World Bank API sources support programmatic access.&lt;/strong&gt; Only source 2 (IDS) reliably supports the Indicators API v2 via &lt;code&gt;wbgapi&lt;/code&gt;. Sources 22, 23, and 54 use a different DataBank backend and return JSON decode errors. For QEDS data, use the bulk Excel downloads instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. BigQuery rejects column names starting with numbers.&lt;/strong&gt; Prefix them before writing Parquet. A simple list comprehension handles this: &lt;code&gt;f"year_{col}" if str(col).isdigit() else col&lt;/code&gt;.&lt;/p&gt;
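
&lt;p&gt;Wrapped in a function, the renaming step might look like the sketch below. The helper name &lt;code&gt;bigquery_safe_columns&lt;/code&gt; and the sample header are assumptions for illustration; only the conditional expression comes from the pipeline.&lt;/p&gt;

```python
def bigquery_safe_columns(columns):
    """Prefix purely numeric column names (such as years) with 'year_'."""
    return [f"year_{col}" if str(col).isdigit() else col for col in columns]


# Hypothetical wide-format header: one column per year.
columns = ["country_code", "series_code", 1998, "1999", "2024"]
print(bigquery_safe_columns(columns))
```

&lt;p&gt;With pandas, the result can be assigned back with &lt;code&gt;df.columns = bigquery_safe_columns(df.columns)&lt;/code&gt; before writing Parquet. Note that &lt;code&gt;isdigit()&lt;/code&gt; only catches names made up entirely of digits. A header that starts with a digit but also contains other characters would need a broader check.&lt;/p&gt;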

&lt;p&gt;&lt;strong&gt;3. Use the right tool for the data size.&lt;/strong&gt; PySpark is excellent for large datasets but overkill for a 76K-row Excel file. The GCS Python client with pandas and PyArrow is simpler, faster, and doesn't OOM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Docker-in-Docker with Airflow requires explicit socket mounting and group permissions.&lt;/strong&gt; Mount &lt;code&gt;/var/run/docker.sock&lt;/code&gt; and add the Docker group ID to the worker container. Without the group add, you get a silent permission error.&lt;/p&gt;
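
&lt;p&gt;The group ID differs from host to host, so it helps to read it from the socket rather than guess. A minimal sketch, assuming the socket sits at the default path; &lt;code&gt;socket_group_id&lt;/code&gt; is a hypothetical helper, not part of the project:&lt;/p&gt;

```python
import os


def socket_group_id(path="/var/run/docker.sock"):
    """Return the numeric group ID that owns the given file or socket."""
    return os.stat(path).st_gid


if os.path.exists("/var/run/docker.sock"):
    # This is the value to add to the worker's supplementary groups.
    print("Docker socket group ID:", socket_group_id())
else:
    print("No Docker socket found at the default path")
```

&lt;p&gt;In a Docker Compose setup, the printed number is what would typically go under the worker service's &lt;code&gt;group_add&lt;/code&gt; key.&lt;/p&gt;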

&lt;p&gt;&lt;strong&gt;5. ELT separates concerns cleanly.&lt;/strong&gt; When the analytical questions evolved during development, I only needed to update dbt models — never the ingestion layer. This separation is the most valuable architectural decision in the project.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Would Do Differently
&lt;/h2&gt;

&lt;p&gt;If I were building this again, I would use the World Bank DataBank bulk download API to get historical QEDS time series data instead of point-in-time Excel files. The current QEDS data only has a handful of quarters because the Excel files are snapshots of the latest publication. A proper time series would require downloading and archiving each quarterly release.&lt;/p&gt;

&lt;p&gt;I would also add a &lt;code&gt;load_to_bigquery&lt;/code&gt; step that loads Parquet directly into native BigQuery tables rather than using external tables. External tables work well but they require manually updating the source URI list each time a new partition is added. A native table with partitioning handles this automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The Sovereign Debt Observatory took the World Bank's raw debt data from scattered Excel files and API endpoints to a fully automated, tested, and documented analytical pipeline. Every component is reproducible — Terraform provisions the infrastructure, Docker packages the ingestion environment, Airflow schedules the runs, and dbt documents and tests the transformations.&lt;/p&gt;

&lt;p&gt;The full source code is on GitHub: &lt;a href="https://github.com/Derrick-Ryan-Giggs/sovereign-debt-observatory" rel="noopener noreferrer"&gt;https://github.com/Derrick-Ryan-Giggs/sovereign-debt-observatory&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you have questions about any part of the implementation, drop them in the comments. I am happy to go deeper on any layer.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Ryan Derrick Giggs is a data engineering practitioner and technical writer based in Nairobi, Kenya. He is currently completing the DataTalksClub Data Engineering Zoomcamp 2026.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;LinkedIn: &lt;a href="https://linkedin.com/in/ryan-giggs-a19330265" rel="noopener noreferrer"&gt;https://linkedin.com/in/ryan-giggs-a19330265&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;em&gt;GitHub: &lt;a href="https://github.com/Derrick-Ryan-Giggs" rel="noopener noreferrer"&gt;https://github.com/Derrick-Ryan-Giggs&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>gcp</category>
      <category>dbt</category>
      <category>worldbank</category>
    </item>
    <item>
      <title>Exadata AI Storage: A New Era of AI-Powered Database Infrastructure</title>
      <dc:creator>Ryan Giggs</dc:creator>
      <pubDate>Mon, 06 Apr 2026 07:41:04 +0000</pubDate>
      <link>https://forem.com/derrickryangiggs/exadata-ai-storage-a-new-era-of-ai-powered-database-infrastructure-12mb</link>
      <guid>https://forem.com/derrickryangiggs/exadata-ai-storage-a-new-era-of-ai-powered-database-infrastructure-12mb</guid>
      <description>&lt;p&gt;Oracle's Exadata platform has always been synonymous with extreme database performance. But with the release of &lt;strong&gt;Exadata System Software 24ai and 25ai&lt;/strong&gt;, alongside the debut of &lt;strong&gt;Oracle Exadata X11M&lt;/strong&gt; in January 2025, Oracle has taken a decisive step into the AI era. Exadata is no longer just a high-performance OLTP and analytics machine — it is now purpose-built to accelerate AI vector search, in-database machine learning, and mixed enterprise workloads, all on a single converged platform.&lt;/p&gt;

&lt;p&gt;This post breaks down the key new features across software, infrastructure, high availability, monitoring, and security — and explains why they matter.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Exadata AI Storage?
&lt;/h2&gt;

&lt;p&gt;At its core, Exadata AI Storage refers to Oracle's strategy of pushing intelligence deeper into the storage layer. Rather than offloading AI computation to separate, purpose-built vector databases or GPUs, Oracle brings AI operations — particularly &lt;strong&gt;AI Vector Search&lt;/strong&gt; — directly to the storage servers themselves. This means vector index builds, similarity searches, and distance calculations happen closer to where data lives, dramatically reducing the data movement that kills performance in distributed architectures.&lt;/p&gt;

&lt;p&gt;The result? Key vector search operations running up to &lt;strong&gt;30x faster&lt;/strong&gt; with Exadata System Software 24ai, and further accelerated on X11M with in-memory vector index (HNSW) scans running &lt;strong&gt;up to 43% faster&lt;/strong&gt; on database servers and &lt;strong&gt;up to 55% faster&lt;/strong&gt; on storage servers compared to the previous generation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key New Software Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. AI Smart Scan
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AI Smart Scan&lt;/strong&gt; is an extension of Exadata's legendary Smart Scan technology, now purpose-built for AI workloads. It offloads compute-intensive &lt;strong&gt;AI Vector Search&lt;/strong&gt; operations — including vector index builds and similarity queries — directly to Exadata's intelligent storage servers. This eliminates the need to ship raw data up to the database tier for processing.&lt;/p&gt;

&lt;p&gt;Critically, AI Smart Scan enables thousands of concurrent AI vector searches in multi-user environments. This is a significant differentiator for enterprise RAG (Retrieval Augmented Generation) pipelines and AI applications that need to serve many users simultaneously, not just batch processes.&lt;/p&gt;

&lt;p&gt;With the latest release (Exadata System Software 25ai / 25.1), &lt;strong&gt;Adaptive Top-K Filtering&lt;/strong&gt; further extends this: each storage server maintains a running Top-K result set, reducing data returned to the database servers by up to &lt;strong&gt;4.7x&lt;/strong&gt;. Similarly, &lt;code&gt;VECTOR_DISTANCE()&lt;/code&gt; calculations are now projected from storage, delivering up to &lt;strong&gt;4.6x faster&lt;/strong&gt; distance-based queries.&lt;/p&gt;
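
&lt;p&gt;The general idea behind per-server Top-K filtering can be shown with a toy Python sketch. This is a simplified illustration of the concept, not Oracle's implementation: the two "storage servers" are plain dictionaries, the embeddings are invented, and &lt;code&gt;local_top_k&lt;/code&gt; and &lt;code&gt;global_top_k&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
import heapq
import math


def local_top_k(query, vectors, k):
    """Return the k closest (distance, vector_id) pairs held by one server."""
    scored = ((math.dist(query, vec), vec_id) for vec_id, vec in vectors.items())
    return heapq.nsmallest(k, scored)


def global_top_k(query, servers, k):
    """Merge per-server candidates into the overall k nearest neighbours."""
    candidates = []
    for vectors in servers:
        # Each server ships at most k rows, however many vectors it stores.
        candidates.extend(local_top_k(query, vectors, k))
    return heapq.nsmallest(k, candidates)


# Made-up 2-D embeddings spread across two simulated storage servers.
servers = [
    {"a": (0.0, 0.0), "b": (5.0, 5.0), "c": (1.0, 0.0)},
    {"d": (0.0, 2.0), "e": (9.0, 9.0)},
]
print(global_top_k((0.0, 0.0), servers, k=2))
```

&lt;p&gt;Because every true top-K neighbour must also be in the top K of the server that holds it, the merge step loses nothing, while the data crossing the network is bounded by K rows per server instead of the full scan result.&lt;/p&gt;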

&lt;h3&gt;
  
  
  2. Exadata RDMA Memory (XRMEM)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;XRMEM&lt;/strong&gt; replaces the persistent memory (Intel Optane PMem) that earlier generations used, adapting to changes in the memory vendor landscape. It is built on DDR5 DRAM and accessed via &lt;strong&gt;RDMA (Remote Direct Memory Access) over Converged Ethernet (RoCE)&lt;/strong&gt;, bypassing the OS, I/O, and network software stacks entirely.&lt;/p&gt;

&lt;p&gt;The practical benefit: ultra-low read latency as low as &lt;strong&gt;14 microseconds&lt;/strong&gt; — a 21% improvement over prior generations — with scan throughput of up to &lt;strong&gt;500 GB/s&lt;/strong&gt; from XRMEM alone. Each Exadata X11M Extreme Flash Storage Server contains &lt;strong&gt;1.25 TB of XRMEM&lt;/strong&gt;, which sits as an acceleration tier in front of the Smart Flash Cache.&lt;/p&gt;

&lt;p&gt;XRMEM is particularly impactful for OLTP workloads that require sub-20-microsecond response times, and it now also accelerates AI vector index reads transparently.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. On-Storage Processing
&lt;/h3&gt;

&lt;p&gt;This is the foundational principle that makes all the above possible. Exadata storage servers are not passive disk arrays — they are intelligent compute nodes in their own right. SQL filtering, column projection, decompression, encryption/decryption, bloom filters, and now AI vector operations all execute &lt;strong&gt;on the storage servers&lt;/strong&gt;, not the database servers.&lt;/p&gt;

&lt;p&gt;This dramatically reduces the volume of data sent over the internal network and processed by database-tier CPUs. For analytics workloads, this "push-down" processing model is why Exadata consistently delivers 10–100x better throughput than general-purpose storage.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. In-Memory Columnar Speeds for JSON Queries
&lt;/h3&gt;

&lt;p&gt;Exadata's &lt;strong&gt;In-Memory Columnar Cache&lt;/strong&gt; on storage servers (also called Columnar Cache) stores a columnar representation of row-oriented data directly in flash cache. When queries access this data, they get the performance of columnar analytics without requiring data to be reformatted or migrated.&lt;/p&gt;

&lt;p&gt;With &lt;strong&gt;Oracle Database 23ai&lt;/strong&gt; — which is required to unlock the full Exadata Exascale feature set — JSON documents stored natively benefit from this columnar acceleration. Oracle Database 23ai's &lt;strong&gt;JSON Relational Duality&lt;/strong&gt; views, which expose the same data as both JSON and relational tables simultaneously, can be queried at columnar memory speeds on Exadata, collapsing the performance gap between document and relational workloads.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Transparent Cross-Tier Scan
&lt;/h3&gt;

&lt;p&gt;Exadata's multi-tier storage hierarchy — XRMEM → Smart Flash Cache → disk — is managed &lt;strong&gt;automatically and transparently&lt;/strong&gt;. When a Smart Scan or AI Smart Scan runs, Exadata intelligently sources data from whichever tier contains it, combining reads from memory, flash, and disk in parallel.&lt;/p&gt;

&lt;p&gt;This means administrators and developers never need to manually partition hot vs. cold data or tune tier placement. The system continuously tracks access patterns and moves data to the appropriate tier based on usage, keeping the hottest data in XRMEM and the next hottest in flash. The database never sees these tiers explicitly — it simply issues a SQL or vector query and receives results.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Caching Enhancements
&lt;/h3&gt;

&lt;p&gt;Several caching improvements ship with the latest Exadata System Software releases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automatic KEEP Object Load into Exadata Flash Cache&lt;/strong&gt;: Objects tagged with the &lt;code&gt;KEEP&lt;/code&gt; storage clause in Oracle Database are automatically and proactively loaded into Exadata Smart Flash Cache — even before they are first accessed — ensuring zero cold-start latency for critical tables and indexes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write Back Flash Cache&lt;/strong&gt;: Database block writes are cached in flash, eliminating disk I/O bottlenecks for large OLTP and batch workloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cell-to-Cell Rebalance preserving XRMEM and Flash Cache&lt;/strong&gt;: When data rebalances across storage servers (due to maintenance or failure), both the XRMEM and flash cache contents are also rebalanced, preserving performance levels rather than causing a cold-cache performance dip.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  7. Columnar Smart Scan at Memory Speed
&lt;/h3&gt;

&lt;p&gt;When Oracle Database In-Memory is enabled, Exadata automatically stores data in &lt;strong&gt;columnar format within Flash Cache and XRMEM&lt;/strong&gt; if it will improve query performance. This brings columnar analytics performance — historically associated only with in-database memory (DRAM on database servers) — to storage-resident data, enabling analytics at memory speeds even when datasets exceed what fits in database-server DRAM.&lt;/p&gt;

&lt;p&gt;A single Exadata X11M rack can deliver up to &lt;strong&gt;100 GB/s of flash throughput&lt;/strong&gt; and &lt;strong&gt;500 GB/s from XRMEM&lt;/strong&gt; for the hottest data, far exceeding what traditional storage arrays can achieve even with flash added.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. Exadata Cache Observability
&lt;/h3&gt;

&lt;p&gt;Exadata System Software &lt;strong&gt;24.1 introduced &lt;code&gt;ecstat&lt;/code&gt;&lt;/strong&gt; (Exadata Storage Cache Statistics), a real-time utility that provides per-storage-server statistics on Smart Flash Cache usage, XRMEM hits, and I/O performance. This was a long-standing gap — DBAs previously had to rely on AWR snapshots to understand cache behavior.&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;Exadata System Software 25.2&lt;/strong&gt;, this was extended with &lt;strong&gt;&lt;code&gt;CellSQLStat&lt;/code&gt;&lt;/strong&gt;, which provides real-time, per-storage-server insights into active Smart Scan operations: CPU and memory usage, Storage Index and Columnar Cache I/O savings, flash and XRMEM hit rates, scan rates, and more. Both &lt;code&gt;ecstat&lt;/code&gt; and &lt;code&gt;CellSQLStat&lt;/code&gt; data are automatically included in ExaWatcher collections, making them available for historical analysis.&lt;/p&gt;

&lt;h2&gt;
  
  
  Infrastructure Improvements
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Increased Number of Virtual Machines
&lt;/h3&gt;

&lt;p&gt;With &lt;strong&gt;Exadata Exascale&lt;/strong&gt; (the new intelligent storage architecture introduced in 2024 and available on X11M), the limit on Virtual Machine clusters per database server increases dramatically. Traditional Exadata with ASM supported 4, 8, or 12 VMs per database server. Exascale raises this ceiling to &lt;strong&gt;50 VMs per database server&lt;/strong&gt;, enabling far greater consolidation of Oracle Database workloads on a single Exadata system without sacrificing isolation or performance.&lt;/p&gt;

&lt;p&gt;Exascale also centralizes VM storage in the shared Exascale storage pool rather than on individual database servers, increasing flexibility and simplifying management.&lt;/p&gt;

&lt;h3&gt;
  
  
  Secure Boot for KVM Virtual Machines
&lt;/h3&gt;

&lt;p&gt;Exadata System Software now supports &lt;strong&gt;Secure Boot for KVM guest VMs&lt;/strong&gt;, ensuring that only cryptographically signed and trusted OS images can boot on Exadata database servers. This closes a significant attack vector in virtualized deployments and aligns Exadata's on-premises security posture with cloud-native security standards. It complements existing features like Trusted Partitions for Oracle Linux Virtualization.&lt;/p&gt;

&lt;h2&gt;
  
  
  High Availability and Network Resilience
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Improved RoCE Network Resilience: ExaPortMon
&lt;/h3&gt;

&lt;p&gt;Every Exadata database and storage server connects to the internal network via &lt;strong&gt;dual 100 GbE RoCE ports&lt;/strong&gt; for an aggregate of 200 Gbps. If a RoCE leaf switch port becomes stalled — appearing online but unable to pass traffic — it can cause cluster instability without triggering a clean failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ExaPortMon&lt;/strong&gt; (introduced in Exadata System Software 24ai) solves this. It continuously monitors both RoCE ports on every server. When it detects a stalled port, it &lt;strong&gt;automatically migrates the IP address to the healthy port&lt;/strong&gt;, keeping network traffic flowing and preventing outages. When the stalled port recovers, ExaPortMon automatically returns the IP address to its original port. No manual intervention required.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enhanced RoCE Network Security: Exadata Secure RDMA Fabric Isolation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Exadata Secure RDMA Fabric Isolation (Secure Fabric)&lt;/strong&gt; provides strict network isolation between VM clusters sharing the same physical Exadata infrastructure. It prevents database servers in one VM cluster from communicating with those in another over the RoCE fabric, eliminating lateral movement risk in consolidated and multi-tenant deployments.&lt;/p&gt;

&lt;p&gt;Starting with Exadata System Software 25.1, &lt;strong&gt;Secure Fabric is automatically selected by default&lt;/strong&gt; for all new on-premises deployments with X8M and newer hardware — bringing on-premises deployments into alignment with cloud deployments, which have always used Secure Fabric.&lt;/p&gt;

&lt;h2&gt;
  
  
  Monitoring and Management
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AWR &amp;amp; SQL Monitor Enhancements
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Oracle Automatic Workload Repository (AWR)&lt;/strong&gt; on Exadata is enhanced with &lt;strong&gt;Exadata-specific storage server metrics&lt;/strong&gt; alongside standard Oracle wait event data. AWR now collects and reports on XRMEM, Flash Cache, and HDD device performance, enabling DBAs to correlate database wait events with storage-tier behavior in a single report.&lt;/p&gt;

&lt;p&gt;SQL Monitor is similarly enhanced, providing end-to-end visibility into query execution that includes storage offload statistics, Smart Scan I/O savings, and flash cache hit rates — all tied to the specific SQL statement being analyzed.&lt;/p&gt;

&lt;h3&gt;
  
  
  JSON API for Management Server
&lt;/h3&gt;

&lt;p&gt;Exadata's Management Server (MS) now exposes a &lt;strong&gt;JSON REST API&lt;/strong&gt;, enabling programmatic access to Exadata management and monitoring functions. This is a significant modernization of Exadata's management interface, making it easier to integrate Exadata health metrics, alerts, and configuration into modern observability stacks (Grafana, custom dashboards, CI/CD pipelines) without relying solely on traditional CLI tools like &lt;code&gt;cellcli&lt;/code&gt; or Enterprise Manager.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security Enhancements
&lt;/h2&gt;

&lt;h3&gt;
  
  
  SNMP v3 Security
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Exadata System Software 24.1&lt;/strong&gt; introduced mandatory SNMP security improvements across all storage servers and database servers. The key changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SNMP v3 is now the recommended standard&lt;/strong&gt;, supporting SHA-256, SHA-384, and SHA-512 authentication protocols for strong authentication and encryption.&lt;/li&gt;
&lt;li&gt;All SNMP subscriber definitions now &lt;strong&gt;require the connection type to be explicitly specified&lt;/strong&gt; — administrators can no longer leave it ambiguous.&lt;/li&gt;
&lt;li&gt;SNMP v1 remains available but triggers an explicit warning discouraging its use.&lt;/li&gt;
&lt;li&gt;Default community strings like &lt;code&gt;public&lt;/code&gt; and &lt;code&gt;private&lt;/code&gt; are &lt;strong&gt;actively discouraged by the system&lt;/strong&gt;, prompting administrators to set strong, unique values.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This tightens Exadata's management plane security, closing a common vulnerability in enterprise infrastructure where SNMP v1 with default community strings is still widely used.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This All Matters: The Convergence Thesis
&lt;/h2&gt;

&lt;p&gt;The strategic bet Oracle is making with Exadata AI Storage is one of &lt;strong&gt;convergence over fragmentation&lt;/strong&gt;. The enterprise AI market has seen an explosion of purpose-built vector databases (Pinecone, Weaviate, Qdrant, Milvus, Chroma), and many organizations are building RAG pipelines that shuffle data between separate systems: a relational database for operational data, a vector database for embeddings, an object store for documents.&lt;/p&gt;

&lt;p&gt;Exadata offers a fundamentally different architecture: &lt;strong&gt;bring the AI to the data, not the data to the AI&lt;/strong&gt;. With Oracle Database 23ai (now succeeded by Oracle AI Database 26ai as the long-term support release), relational queries, vector similarity search, JSON document queries, graph traversals, and full-text search all run in a single converged engine, executed as optimized SQL on Exadata hardware. Advanced AI features, including AI Vector Search, are included at no additional charge.&lt;/p&gt;

&lt;p&gt;And with &lt;strong&gt;Exadata Exascale&lt;/strong&gt; reducing the entry cost for Exadata Database Service by up to 95% and enabling organizations to start with as little as 300 GB of storage, the platform is no longer exclusively for Fortune 500 database estates. It is increasingly accessible to organizations of any size that need to build AI applications on enterprise-grade, governed, transactionally consistent data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary Table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Key Feature&lt;/th&gt;
&lt;th&gt;Benefit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AI Workloads&lt;/td&gt;
&lt;td&gt;AI Smart Scan + Adaptive Top-K&lt;/td&gt;
&lt;td&gt;Up to 30x faster vector search; 4.7x less data to DB servers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;XRMEM (DDR5 + RDMA)&lt;/td&gt;
&lt;td&gt;14µs read latency; 500 GB/s scan throughput&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Caching&lt;/td&gt;
&lt;td&gt;Auto KEEP Load, Write Back, Columnar Cache&lt;/td&gt;
&lt;td&gt;Zero cold-start for critical objects; analytics at memory speed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;ecstat + CellSQLStat&lt;/td&gt;
&lt;td&gt;Real-time per-cell Smart Scan and cache monitoring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;td&gt;Exascale VM limit increase&lt;/td&gt;
&lt;td&gt;Up to 50 VMs per DB server (up from 12)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;Secure Fabric default on-prem&lt;/td&gt;
&lt;td&gt;Automatic lateral-movement isolation for VM clusters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network HA&lt;/td&gt;
&lt;td&gt;ExaPortMon&lt;/td&gt;
&lt;td&gt;Auto-failover between RoCE ports; no manual intervention&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security&lt;/td&gt;
&lt;td&gt;SNMP v3 hardening&lt;/td&gt;
&lt;td&gt;SHA-2 authentication (up to SHA-512); discourages default community strings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Management&lt;/td&gt;
&lt;td&gt;JSON API for Management Server&lt;/td&gt;
&lt;td&gt;Programmatic integration with modern observability stacks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Exadata AI Storage represents Oracle's clearest articulation yet of its "converged data" strategy: a single platform that handles OLTP, analytics, and AI workloads without requiring organizations to build and manage a fragmented ecosystem of specialized tools. With Exadata System Software 25ai, the X11M generation, and the Exascale architecture now generally available across OCI, multicloud (AWS, Azure, Google Cloud), and on-premises, there has never been a better time to evaluate what Exadata can do for your AI application stack.&lt;/p&gt;

&lt;p&gt;The numbers speak for themselves — but the architecture is the real story.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Have you worked with Exadata AI Storage or Oracle Database 23ai/26ai AI Vector Search? Share your experience in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>oracle</category>
      <category>exadata</category>
      <category>databaseengineering</category>
      <category>aiinfrastructure</category>
    </item>
    <item>
      <title>Tech Ecosystem Observatory: How I Built a Cloud-Native Data Pipeline to Track Global Tech Layoffs vs YC Startup Activity</title>
      <dc:creator>Ryan Giggs</dc:creator>
      <pubDate>Mon, 30 Mar 2026 07:44:40 +0000</pubDate>
      <link>https://forem.com/derrickryangiggs/tech-ecosystem-observatory-how-i-built-a-cloud-native-data-pipeline-to-track-global-tech-layoffs-5gbe</link>
      <guid>https://forem.com/derrickryangiggs/tech-ecosystem-observatory-how-i-built-a-cloud-native-data-pipeline-to-track-global-tech-layoffs-5gbe</guid>
      <description>&lt;p&gt;Just completed my Data Engineering Zoomcamp 2026 capstone project: the Tech Ecosystem Observatory.&lt;/p&gt;

&lt;p&gt;Built a full cloud-native batch data pipeline from scratch that answers: which industries are shedding the most jobs, and how does that correlate with YC startup activity?&lt;/p&gt;

&lt;p&gt;Here's what the pipeline looks like end to end:&lt;/p&gt;

&lt;p&gt;✅ Terraform — provisioned GCS bucket and BigQuery datasets as infrastructure as code&lt;/p&gt;

&lt;p&gt;✅ Docker — containerized all ingestion scripts into a portable image&lt;/p&gt;

&lt;p&gt;✅ Kestra — orchestrated a 4-task batch DAG running weekly every Monday 6AM UTC&lt;/p&gt;

&lt;p&gt;✅ Google Cloud Storage — raw JSONL data lake storing layoffs and YC company data&lt;/p&gt;

&lt;p&gt;✅ BigQuery — partitioned by date (monthly) and clustered by industry/country for optimized queries&lt;/p&gt;

&lt;p&gt;✅ dbt Cloud — built staging views and mart tables (mart_monthly_layoffs + mart_tech_ecosystem)&lt;/p&gt;

&lt;p&gt;✅ Looker Studio — 2-page interactive dashboard with layoff trends, geo maps, ecosystem stress ratios&lt;/p&gt;

&lt;p&gt;📊 Data: 4,317 layoff events (2023–2024) + 5,690 YC-backed companies&lt;/p&gt;

&lt;p&gt;🔗 Live dashboard: &lt;a href="https://lookerstudio.google.com/reporting/b1620cae-97cb-4911-82b8-dd0c46ee8acb" rel="noopener noreferrer"&gt;https://lookerstudio.google.com/reporting/b1620cae-97cb-4911-82b8-dd0c46ee8acb&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;💻 GitHub: &lt;a href="https://github.com/Derrick-Ryan-Giggs/tech-ecosystem-observatory" rel="noopener noreferrer"&gt;https://github.com/Derrick-Ryan-Giggs/tech-ecosystem-observatory&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Huge thanks to @DataTalksClub and Alexey Grigorev for building and maintaining this incredible free course. If you're serious about data engineering, this is where you start 👇&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/DataTalksClub/data-engineering-zoomcamp/" rel="noopener noreferrer"&gt;https://github.com/DataTalksClub/data-engineering-zoomcamp/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Who else is building data pipelines? Drop your projects below 👇&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>gcp</category>
      <category>bigquery</category>
      <category>dbt</category>
    </item>
    <item>
      <title>Streaming with PyFlink and Redpanda</title>
      <dc:creator>Ryan Giggs</dc:creator>
      <pubDate>Mon, 16 Mar 2026 13:34:19 +0000</pubDate>
      <link>https://forem.com/derrickryangiggs/streaming-with-pyflink-and-redpanda-4n4c</link>
      <guid>https://forem.com/derrickryangiggs/streaming-with-pyflink-and-redpanda-4n4c</guid>
      <description>&lt;p&gt;Week 7 of Data Engineering Zoomcamp by @DataTalksClub complete&lt;/p&gt;

&lt;p&gt;Just finished Module 7 - Streaming with PyFlink. Learned how to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set up Redpanda as a Kafka replacement&lt;/li&gt;
&lt;li&gt;Build Kafka producers and consumers in Python&lt;/li&gt;
&lt;li&gt;Create tumbling and session windows in Flink&lt;/li&gt;
&lt;li&gt;Analyze real-time taxi trip data with stream processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's my homework solution: &lt;a href="https://github.com/Derrick-Ryan-Giggs/Streaming-Homework" rel="noopener noreferrer"&gt;https://github.com/Derrick-Ryan-Giggs/Streaming-Homework&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can sign up here: &lt;a href="https://github.com/DataTalksClub/data-engineering-zoomcamp/" rel="noopener noreferrer"&gt;https://github.com/DataTalksClub/data-engineering-zoomcamp/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>apacheflink</category>
      <category>kafka</category>
      <category>python</category>
    </item>
    <item>
      <title>Oracle Database 23ai: Vector Similarity Search - Exact, Approximate, and Multi-Vector Strategies</title>
      <dc:creator>Ryan Giggs</dc:creator>
      <pubDate>Mon, 16 Mar 2026 08:57:31 +0000</pubDate>
      <link>https://forem.com/derrickryangiggs/oracle-database-23ai-vector-similarity-search-exact-approximate-and-multi-vector-strategies-3fci</link>
      <guid>https://forem.com/derrickryangiggs/oracle-database-23ai-vector-similarity-search-exact-approximate-and-multi-vector-strategies-3fci</guid>
      <description>&lt;p&gt;Oracle Database 23ai's AI Vector Search provides multiple strategies for finding similar vectors, each with different trade-offs between accuracy, speed, and resource usage. Understanding when to use exact search, approximate search, or multi-vector search—and knowing the essential vector functions—is crucial for building high-performance semantic search applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Similarity Search Types
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Exact Similarity Search (Flat Search)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Exact similarity search calculates a query vector's distance to all other vectors&lt;/strong&gt;. It's also called flat search or exhaustive search because every vector in the dataset is compared.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gives the most accurate results&lt;/li&gt;
&lt;li&gt;Perfect search quality (100% recall)&lt;/li&gt;
&lt;li&gt;Query time grows linearly with the number of vectors, which can become significant as the dataset grows&lt;/li&gt;
&lt;li&gt;No indexes required&lt;/li&gt;
&lt;li&gt;Suitable for small to medium datasets (thousands to hundreds of thousands of vectors)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;SQL Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
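
&lt;p&gt;To make the "compare against every vector" idea concrete, here is a minimal pure-Python sketch of a flat search. It is an illustration only, not how Oracle implements the query; the &lt;code&gt;products&lt;/code&gt; list and the helper names are hypothetical.&lt;/p&gt;

```python
import math

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity; smaller means more similar.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def exact_search(query, rows, k):
    # rows: list of (product_id, embedding) pairs; every row is scored.
    scored = [(cosine_distance(emb, query), pid) for pid, emb in rows]
    scored.sort()  # ascending distance, like ORDER BY ... FETCH FIRST k ROWS ONLY
    return [pid for _, pid in scored[:k]]

# Hypothetical toy data, not from the article.
products = [
    (1, [1.0, 0.0]),
    (2, [0.0, 1.0]),
    (3, [0.9, 0.1]),
    (4, [-1.0, 0.0]),
]
top2 = exact_search([1.0, 0.0], products, 2)
print(top2)  # [1, 3]
```

&lt;p&gt;Because every row is scored, the cost grows in step with the table size, which is why this approach suits smaller datasets.&lt;/p&gt;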



&lt;p&gt;&lt;strong&gt;When to Use:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small datasets where performance is acceptable&lt;/li&gt;
&lt;li&gt;When perfect accuracy is required&lt;/li&gt;
&lt;li&gt;Development and testing phases&lt;/li&gt;
&lt;li&gt;Benchmarking approximate search results&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Approximate Similarity Search
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Approximate similarity search uses vector indexes&lt;/strong&gt; to dramatically speed up searches with minimal accuracy loss. Instead of checking every vector, it leverages specialized data structures to narrow the search space.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Requirements:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You must enable the vector pool in the SGA&lt;/strong&gt; before creating HNSW indexes (in-memory neighbor graph indexes). Do this by setting &lt;code&gt;vector_memory_size&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;SYSTEM&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;vector_memory_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;800&lt;/span&gt;&lt;span class="n"&gt;M&lt;/span&gt; &lt;span class="k"&gt;SCOPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SPFILE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Restart required&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Can be more efficient but less accurate than exact search&lt;/li&gt;
&lt;li&gt;Uses target accuracy setting (typically 90-99%)&lt;/li&gt;
&lt;li&gt;Requires vector indexes (HNSW or IVF)&lt;/li&gt;
&lt;li&gt;Scales to millions or billions of vectors&lt;/li&gt;
&lt;li&gt;Typical accuracy: 95%+ with proper configuration&lt;/li&gt;
&lt;/ul&gt;
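
&lt;p&gt;One common way to read an accuracy figure is as recall: the share of the true top-k neighbors that the approximate search also returned. The sketch below assumes that interpretation; the ID lists are made up for illustration.&lt;/p&gt;

```python
def recall_at_k(exact_ids, approx_ids):
    # Fraction of the true top-k that the approximate search also returned.
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact)

# Hypothetical results: the approximate search missed one true neighbor.
exact_top10 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
approx_top10 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 42]
accuracy = recall_at_k(exact_top10, approx_top10)
print(accuracy)  # 0.9
```

&lt;p&gt;Running an exact search on a sample of queries and comparing it this way is a simple check on whether an index meets its target.&lt;/p&gt;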

&lt;p&gt;&lt;strong&gt;SQL Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Using FETCH APPROXIMATE&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;product_name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="n"&gt;APPROXIMATE&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Performance Comparison:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;According to Oracle benchmarks, exact search on 50,000 vectors took 1.50 seconds versus 0.47 seconds with HNSW index—over 3x faster with the same top-10 results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Vector Index Types
&lt;/h3&gt;

&lt;p&gt;Oracle AI Vector Search supports two main types of vector indexes:&lt;/p&gt;

&lt;h4&gt;
  
  
  HNSW (Hierarchical Navigable Small World)
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;In-Memory Neighbor Graph&lt;/strong&gt; vector index that creates a navigable graph structure for ultra-fast similarity search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating HNSW Index:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;products_hnsw_idx&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ORGANIZATION&lt;/span&gt; &lt;span class="n"&gt;NEIGHBOR&lt;/span&gt; &lt;span class="n"&gt;GRAPH&lt;/span&gt;
&lt;span class="n"&gt;DISTANCE&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;TARGET&lt;/span&gt; &lt;span class="n"&gt;ACCURACY&lt;/span&gt; &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fully built in-memory in the vector pool&lt;/li&gt;
&lt;li&gt;Extremely fast queries (sub-100ms)&lt;/li&gt;
&lt;li&gt;Requires substantial memory&lt;/li&gt;
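&lt;li&gt;DML on the indexed table was not supported in the initial 23ai release; check whether your release update lifts this restriction&lt;/li&gt;
&lt;li&gt;Not available in RAC environments in the initial release; support likewise depends on your release update&lt;/li&gt;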
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Memory Calculation:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Formula: &lt;code&gt;1.3 times number_of_vectors times number_of_dimensions times dimension_byte_size&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Example for 1M vectors, 768 dimensions, FLOAT32 (4 bytes):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1.3 × 1,000,000 × 768 × 4 = 3.99 GB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
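
&lt;p&gt;The same arithmetic as a small Python helper, in case you want to try other sizes. The function name is hypothetical, and the 1.3 factor is taken from the formula above.&lt;/p&gt;

```python
def hnsw_memory_bytes(num_vectors, dimensions, bytes_per_dimension):
    # 1.3 x vectors x dimensions x bytes per dimension, done in integer
    # arithmetic (13 / 10) to avoid floating-point rounding.
    return 13 * num_vectors * dimensions * bytes_per_dimension // 10

estimate = hnsw_memory_bytes(1_000_000, 768, 4)
print(estimate)                    # 3993600000 bytes
print(round(estimate / 1e9, 2))    # 3.99 (decimal gigabytes)
print(round(estimate / 2**30, 2))  # 3.72 (binary gibibytes)
```

&lt;p&gt;By this estimate, the 800M vector pool from the earlier &lt;code&gt;ALTER SYSTEM&lt;/code&gt; example would be too small for an index of this size.&lt;/p&gt;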



&lt;h4&gt;
  
  
  IVF (Inverted File Flat)
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Neighbor Partition&lt;/strong&gt; vector index built on disk with blocks cached in the buffer cache.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating IVF Index:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;products_ivf_idx&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ORGANIZATION&lt;/span&gt; &lt;span class="n"&gt;NEIGHBOR&lt;/span&gt; &lt;span class="n"&gt;PARTITIONS&lt;/span&gt;
&lt;span class="n"&gt;DISTANCE&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;TARGET&lt;/span&gt; &lt;span class="n"&gt;ACCURACY&lt;/span&gt; &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Disk-based, blocks cached in buffer cache&lt;/li&gt;
&lt;li&gt;More scalable for very large datasets&lt;/li&gt;
&lt;li&gt;Supports DML operations (may require periodic rebuild)&lt;/li&gt;
&lt;li&gt;Works in RAC environments&lt;/li&gt;
&lt;li&gt;Slightly slower than HNSW but still very fast&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How IVF Works:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The index partitions vectors into clusters based on similarity. By default, the number of partitions equals the square root of the dataset size. During search, only relevant clusters are examined.&lt;/p&gt;
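
&lt;p&gt;The partition-then-probe idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: the centroids are hand-picked (a real index would learn them by clustering), and all function names are hypothetical.&lt;/p&gt;

```python
import math

def sq_dist(a, b):
    # Squared Euclidean distance; enough for ranking purposes.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def default_partition_count(num_vectors):
    # Square root of the dataset size, as described above.
    return math.isqrt(num_vectors)

def build_partitions(vectors, centroids):
    # Assign each vector index to its nearest centroid.
    partitions = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
        partitions[nearest].append(i)
    return partitions

def ivf_search(query, vectors, centroids, partitions, k, probes):
    # Probe only the partitions whose centroids are closest to the query.
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in order[:probes] for i in partitions[c]]
    candidates.sort(key=lambda i: sq_dist(query, vectors[i]))
    return candidates[:k]

# Hypothetical toy data with two hand-picked centroids.
vectors = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
centroids = [[0.0, 0.0], [5.0, 5.0]]
partitions = build_partitions(vectors, centroids)
result = ivf_search([0.1, 0.1], vectors, centroids, partitions, k=2, probes=1)
print(default_partition_count(1_000_000))  # 1000
print(partitions)                          # {0: [0, 1], 1: [2, 3]}
print(result)                              # [0, 1]
```

&lt;p&gt;Probing fewer partitions is faster but can miss neighbors that sit just across a partition boundary, which is where the accuracy trade-off comes from.&lt;/p&gt;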

&lt;h3&gt;
  
  
  3. Multi-Vector Similarity Search
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Multi-vector similarity search is usually used for multi-document search&lt;/strong&gt; where documents are split into chunks, and chunks are embedded individually into vectors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Long documents split into paragraphs or sections&lt;/li&gt;
&lt;li&gt;Product catalogs with multiple descriptions&lt;/li&gt;
&lt;li&gt;Research papers chunked for semantic search&lt;/li&gt;
&lt;li&gt;Multi-modal data (text + images)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How It Works:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Documents are split into chunks&lt;/li&gt;
&lt;li&gt;Each chunk is embedded individually into a separate vector&lt;/li&gt;
&lt;li&gt;Chunks are stored with partition keys linking them to parent documents&lt;/li&gt;
&lt;li&gt;Search retrieves top chunks across all documents&lt;/li&gt;
&lt;li&gt;Results can be aggregated by document&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Uses partitions&lt;/strong&gt; to organize chunks belonging to the same document.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table Structure Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;document_chunks&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;doc_id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;chunk_id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;chunk_text&lt;/span&gt; &lt;span class="k"&gt;CLOB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;RANGE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p1&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p2&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p3&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;MAXVALUE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Multi-Vector Search Query:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Find top chunks across all documents&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;chunk_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;chunk_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;document_chunks&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="n"&gt;APPROXIMATE&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Aggregating by Document:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Get top documents based on best chunk match&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;doc_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;MIN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;best_score&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;document_chunks&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;doc_id&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;best_score&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
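
&lt;p&gt;The "best chunk per document" aggregation is easy to mirror in Python if you post-process chunk hits in application code. A minimal sketch, assuming hits arrive as &lt;code&gt;(doc_id, chunk_id, distance)&lt;/code&gt; tuples; the data is invented.&lt;/p&gt;

```python
def best_chunk_per_document(chunk_hits, top_n):
    # chunk_hits: list of (doc_id, chunk_id, distance) tuples.
    best = {}
    for doc_id, chunk_id, distance in chunk_hits:
        # Keep the smallest distance seen for each document (the MIN in SQL).
        best[doc_id] = min(distance, best.get(doc_id, distance))
    ranked = sorted(best.items(), key=lambda item: item[1])
    return ranked[:top_n]

# Hypothetical chunk-level results.
hits = [(10, 1, 0.42), (10, 2, 0.15), (20, 1, 0.30), (30, 4, 0.55), (20, 3, 0.35)]
top_docs = best_chunk_per_document(hits, 2)
print(top_docs)  # [(10, 0.15), (20, 0.3)]
```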



&lt;h2&gt;
  
  
  Narrowing Search Results
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Attribute Filtering with WHERE Clause
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Use the WHERE clause to filter results based on metadata or business attributes. You are not limited to ordering by distance: the same query can combine a vector ORDER BY with ordinary WHERE predicates.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This powerful combination enables hybrid search: semantic similarity (vector search) plus exact filters (traditional SQL).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example: Filter by Category and Date&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;COSINE_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Electronics'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;launch_date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="n"&gt;APPROXIMATE&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example: Price Range Filter&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;L2_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;in_stock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Y'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="n"&gt;APPROXIMATE&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best Practices for Filtering:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apply filters in WHERE clause before vector search&lt;/li&gt;
&lt;li&gt;Use indexed columns for better performance&lt;/li&gt;
&lt;li&gt;Combine multiple conditions as needed&lt;/li&gt;
&lt;li&gt;Leverage partition pruning for partitioned tables&lt;/li&gt;
&lt;/ul&gt;
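
&lt;p&gt;Conceptually, a filtered search keeps only rows that pass the attribute predicates and then ranks the survivors by distance. The Python sketch below shows that order of operations on an in-memory list; it says nothing about how the Oracle optimizer actually plans the query, and the &lt;code&gt;catalog&lt;/code&gt; rows are hypothetical.&lt;/p&gt;

```python
def filtered_search(query, rows, category, k):
    # Keep only rows that pass the attribute filter (the WHERE clause)...
    eligible = [r for r in rows if r["category"] == category and r["in_stock"]]
    # ...then rank the survivors by squared Euclidean distance (the ORDER BY).
    eligible.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(r["embedding"], query)))
    return [r["product_id"] for r in eligible[:k]]

# Hypothetical catalog rows.
catalog = [
    {"product_id": 1, "category": "Electronics", "in_stock": True, "embedding": [1.0, 0.0]},
    {"product_id": 2, "category": "Books", "in_stock": True, "embedding": [1.0, 0.0]},
    {"product_id": 3, "category": "Electronics", "in_stock": False, "embedding": [0.9, 0.1]},
    {"product_id": 4, "category": "Electronics", "in_stock": True, "embedding": [0.0, 1.0]},
    {"product_id": 5, "category": "Electronics", "in_stock": True, "embedding": [0.8, 0.2]},
]
matches = filtered_search([1.0, 0.0], catalog, "Electronics", 2)
print(matches)  # [1, 5]
```

&lt;p&gt;Product 3 is closer to the query than product 5 but is excluded because it is out of stock, which is exactly what the filter is for.&lt;/p&gt;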

&lt;h2&gt;
  
  
  Testing Other Distance Functions
&lt;/h2&gt;

&lt;p&gt;Oracle provides shorthand distance functions that simplify syntax and improve code readability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Distance Function Equivalents
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;L1_DISTANCE(v1, v2)&lt;/strong&gt; computes the Manhattan (L1) distance&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;L1_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[1, 2, 3]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[4, 5, 6]'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;manhattan_dist&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;DUAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Result: |1-4| + |2-5| + |3-6| = 9&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;L2_DISTANCE(v1, v2)&lt;/strong&gt; computes the Euclidean (L2) distance&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;L2_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[3, 0]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[0, 4]'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;euclidean_dist&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;DUAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Result: 5.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;COSINE_DISTANCE(v1, v2)&lt;/strong&gt; computes the cosine distance (1 minus the cosine similarity)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;COSINE_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[1, 0]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[1, 1]'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cosine_dist&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;DUAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Result: 1 - 1/sqrt(2), about 0.293&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;INNER_PRODUCT(v1, v2)&lt;/strong&gt; computes the dot product&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;INNER_PRODUCT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[2, 3]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[4, 5]'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dot_product&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;DUAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Result: 2*4 + 3*5 = 23&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note on DOT vs INNER_PRODUCT:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;VECTOR_DISTANCE&lt;/code&gt; function with DOT metric returns the &lt;strong&gt;negated&lt;/strong&gt; inner product, while the &lt;code&gt;INNER_PRODUCT&lt;/code&gt; function returns the actual dot product.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- These are NOT equivalent:&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DOT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;     &lt;span class="c1"&gt;-- Returns -1 * dot_product&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;INNER_PRODUCT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;             &lt;span class="c1"&gt;-- Returns dot_product&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
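
&lt;p&gt;If you want to sanity-check these functions outside the database, the underlying formulas are short enough to write out in plain Python. The function names below simply echo the SQL names for readability; they are a sketch of the math, not Oracle's implementation.&lt;/p&gt;

```python
import math

def l1_distance(a, b):
    # Manhattan distance: sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def l2_distance(a, b):
    # Euclidean distance: square root of summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    norms = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / norms

def dot_distance(a, b):
    # Mirrors VECTOR_DISTANCE(..., DOT): the negated inner product.
    return -inner_product(a, b)

print(l1_distance([1, 2, 3], [4, 5, 6]))          # 9
print(l2_distance([3, 0], [0, 4]))                # 5.0
print(round(cosine_distance([1, 0], [1, 1]), 4))  # 0.2929
print(inner_product([2, 3], [4, 5]))              # 23
print(dot_distance([2, 3], [4, 5]))               # -23
```

&lt;p&gt;Negating the dot product turns "larger is more similar" into "smaller is closer", so it sorts the same way as the other distance metrics.&lt;/p&gt;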



&lt;h2&gt;
  
  
  Other Essential Vector Functions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Vector Constructors
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;TO_VECTOR()&lt;/strong&gt; converts a string or a character large object (CLOB) to a vector&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- From string&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;TO_VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[1.5, 2.3, 4.1]'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;vec&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;DUAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- From CLOB&lt;/span&gt;
&lt;span class="k"&gt;DECLARE&lt;/span&gt;
  &lt;span class="n"&gt;my_clob&lt;/span&gt; &lt;span class="k"&gt;CLOB&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'[0.1, 0.2, 0.3, 0.4]'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="n"&gt;my_vector&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
  &lt;span class="n"&gt;my_vector&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;TO_VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_clob&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;TO_VECTOR also takes another vector as input, adjusts its format, and returns the adjusted vector as output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VECTOR()&lt;/strong&gt; converts a string or CLOB into a vector&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[1, 2, 3]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;vec&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;DUAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;TO_VECTOR and VECTOR are synonymous&lt;/strong&gt;—they perform the same function.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Vector Serializer
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;VECTOR_SERIALIZE()&lt;/strong&gt; converts a vector into a string or a CLOB&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;VECTOR_SERIALIZE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;vec_string&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Result: '[0.12, 0.45, 0.78, ...]'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exporting vectors for external processing&lt;/li&gt;
&lt;li&gt;Debugging and inspection&lt;/li&gt;
&lt;li&gt;Logging and auditing&lt;/li&gt;
&lt;li&gt;Integration with non-Oracle systems&lt;/li&gt;
&lt;/ul&gt;
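
&lt;p&gt;On the receiving side, the serialized text can usually be handled as a JSON-style array. The Python sketch below assumes the string looks like the result shown above (a bracketed, comma-separated list of numbers); the helper names &lt;code&gt;parse_vector_text&lt;/code&gt; and &lt;code&gt;format_vector_text&lt;/code&gt; are made up for illustration and are not part of any Oracle client library.&lt;/p&gt;

```python
import json

def parse_vector_text(text):
    """Parse a serialized vector such as '[0.12, 0.45, 0.78]' into floats."""
    values = json.loads(text)
    if not isinstance(values, list):
        raise ValueError("expected a bracketed list of numbers")
    return [float(v) for v in values]

def format_vector_text(values):
    """Build the bracketed text form that can be passed back to a vector constructor."""
    return "[" + ", ".join(repr(float(v)) for v in values) + "]"

vec = parse_vector_text("[0.12, 0.45, 0.78]")
print(vec)                      # [0.12, 0.45, 0.78]
print(format_vector_text(vec))  # [0.12, 0.45, 0.78]
```

&lt;p&gt;Round-tripping through text like this is convenient for debugging and export, but it is slower than a binary transfer, so it is probably best kept out of hot paths.&lt;/p&gt;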

&lt;h3&gt;
  
  
  3. Vector Norm
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;VECTOR_NORM()&lt;/strong&gt; returns the Euclidean norm of a vector—the distance from the origin to the vector.&lt;/p&gt;

&lt;p&gt;Also known as magnitude or length, it's calculated as the square root of the sum of squared components.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Formula:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;norm = sqrt(x1^2 + x2^2 + ... + xn^2)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;SQL Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;VECTOR_NORM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TO_VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[3, 4]'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;magnitude&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;DUAL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Result: 5.0 (because sqrt(9 + 16) = 5)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
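
&lt;p&gt;To check the arithmetic outside the database, here is a minimal Python sketch of the same calculation. The function name &lt;code&gt;euclidean_norm&lt;/code&gt; is hypothetical and only mirrors what VECTOR_NORM() is described as doing.&lt;/p&gt;

```python
import math

def euclidean_norm(components):
    """Square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in components))

print(euclidean_norm([3, 4]))  # 5.0
```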



&lt;p&gt;&lt;strong&gt;Practical Use: Normalizing Vectors&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
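&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Normalize vector to unit length (conceptually: divide each component by the norm)&lt;/span&gt;
&lt;span class="c1"&gt;-- Check that your release accepts vector / scalar; if not, normalize before loading&lt;/span&gt;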
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;VECTOR_NORM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;normalized_vector&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Normalized vectors have a magnitude of 1.0. For unit vectors the dot product equals the cosine similarity, so normalizing first keeps dot product scores comparable across vectors whose original lengths differ.&lt;/p&gt;
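
&lt;p&gt;A small Python sketch can make the link between normalization and dot product concrete. It is only an illustration under simple assumptions (plain lists of floats, non-zero vectors); the helpers &lt;code&gt;normalize&lt;/code&gt;, &lt;code&gt;dot&lt;/code&gt;, and &lt;code&gt;cosine_similarity&lt;/code&gt; are hypothetical names, not Oracle functions.&lt;/p&gt;

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [3.0, 4.0], [4.0, 3.0]

# On raw vectors the dot product mixes direction and magnitude.
print(dot(a, b))  # 24.0

# On unit vectors it agrees with cosine similarity.
unit_dot = dot(normalize(a), normalize(b))
print(math.isclose(unit_dot, cosine_similarity(a, b)))  # True
```

&lt;p&gt;If embeddings are normalized once at load time, a dot product comparison later ranks rows the same way a cosine comparison would.&lt;/p&gt;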

&lt;h3&gt;
  
  
  4. Vector Dimension Count
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;VECTOR_DIMENSION_COUNT()&lt;/strong&gt; returns the number of dimensions of a vector&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;VECTOR_DIMENSION_COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dimensions&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Result: 768 (for BERT-style embeddings)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Validating embedding dimensions&lt;/li&gt;
&lt;li&gt;Debugging dimension mismatches&lt;/li&gt;
&lt;li&gt;Dynamic schema inspection&lt;/li&gt;
&lt;li&gt;Migration validation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Vector Dimension Format
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;VECTOR_DIMENSION_FORMAT()&lt;/strong&gt; returns the storage format of the vector&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;VECTOR_DIMENSION_FORMAT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;format&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Possible results: 'INT8', 'FLOAT32', 'FLOAT64', 'BINARY'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Schema documentation&lt;/li&gt;
&lt;li&gt;Storage optimization analysis&lt;/li&gt;
&lt;li&gt;Migration planning&lt;/li&gt;
&lt;li&gt;Model compatibility verification&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Complete Working Example
&lt;/h2&gt;

&lt;p&gt;Here's a comprehensive example demonstrating all three search types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Create table&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;research_papers&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;paper_id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="n"&gt;VARCHAR2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;abstract&lt;/span&gt; &lt;span class="k"&gt;CLOB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="n"&gt;VARCHAR2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;publish_date&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Insert sample data&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;research_papers&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s1"&gt;'Advances in Vector Search'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s1"&gt;'This paper explores efficient algorithms...'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s1"&gt;'Computer Science'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nb"&gt;DATE&lt;/span&gt; &lt;span class="s1"&gt;'2024-06-15'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[0.1, 0.2, ...]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- 1. EXACT SIMILARITY SEARCH&lt;/span&gt;
&lt;span class="c1"&gt;-- Most accurate, slower for large datasets&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;paper_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;COSINE_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;  &lt;span class="c1"&gt;-- smaller = more similar&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;research_papers&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Computer Science'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- 2. CREATE VECTOR INDEX for approximate search&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;papers_hnsw_idx&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;research_papers&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ORGANIZATION&lt;/span&gt; &lt;span class="n"&gt;INMEMORY&lt;/span&gt; &lt;span class="n"&gt;NEIGHBOR&lt;/span&gt; &lt;span class="n"&gt;GRAPH&lt;/span&gt;
&lt;span class="n"&gt;DISTANCE&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;TARGET&lt;/span&gt; &lt;span class="n"&gt;ACCURACY&lt;/span&gt; &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- 3. APPROXIMATE SIMILARITY SEARCH&lt;/span&gt;
&lt;span class="c1"&gt;-- Faster, 95%+ accuracy&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;paper_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;publish_date&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;research_papers&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Computer Science'&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;publish_date&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="n"&gt;APPROXIMATE&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- 4. MULTI-VECTOR SEARCH (for chunked documents)&lt;/span&gt;
&lt;span class="c1"&gt;-- Create chunked version&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;paper_chunks&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;paper_id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;chunk_id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;chunk_text&lt;/span&gt; &lt;span class="k"&gt;CLOB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;paper_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chunk_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Find best matching chunks&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;paper_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;chunk_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;L2_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;paper_chunks&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="n"&gt;APPROXIMATE&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Aggregate to document level&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;paper_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;MIN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;COSINE_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;best_match_score&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;research_papers&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;paper_chunks&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;paper_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;paper_id&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;paper_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;best_match_score&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
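
&lt;p&gt;The document-level aggregation in the last query (take the smallest chunk distance per paper, then rank papers) can be sketched in plain Python. This is an illustration of the logic only, assuming chunks arrive as &lt;code&gt;(paper_id, chunk_id, embedding)&lt;/code&gt; tuples; &lt;code&gt;best_papers&lt;/code&gt; and the sample data are hypothetical.&lt;/p&gt;

```python
import math

def cosine_distance(a, b):
    """1 minus cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def best_papers(chunks, query, k):
    """Keep the best (smallest) chunk distance per paper, then return the top k papers."""
    best = {}
    for paper_id, _chunk_id, embedding in chunks:
        d = cosine_distance(embedding, query)
        if paper_id not in best or best[paper_id] > d:
            best[paper_id] = d
    return sorted(best.items(), key=lambda item: item[1])[:k]

chunks = [
    (1, 1, [1.0, 0.0]),   # paper 1, chunk 1: close to the query direction
    (1, 2, [0.0, 1.0]),   # paper 1, chunk 2: far from it
    (2, 1, [1.0, 1.0]),   # paper 2, single chunk: in between
]
for paper_id, score in best_papers(chunks, [1.0, 0.1], k=2):
    print(paper_id, round(score, 3))
```

&lt;p&gt;Using the minimum means one strongly matching chunk is enough to surface a paper. Averaging chunk distances instead would favor papers that are uniformly on topic, which is a different design choice.&lt;/p&gt;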



&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Choose the Right Search Strategy
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exact search&lt;/strong&gt;: Fewer than 100K vectors, need perfect accuracy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approximate with HNSW&lt;/strong&gt;: 100K to 10M vectors, need sub-100ms latency, have memory available&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approximate with IVF&lt;/strong&gt;: 10M+ vectors, limited memory, can tolerate slightly higher latency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-vector&lt;/strong&gt;: Document chunks, multi-modal data, detailed granularity needed&lt;/li&gt;
&lt;/ul&gt;
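
&lt;p&gt;These rules of thumb can be written down as a tiny decision helper. The Python sketch below only encodes the thresholds from the list above; the function name &lt;code&gt;suggest_strategy&lt;/code&gt; and the &lt;code&gt;memory_constrained&lt;/code&gt; flag are assumptions for illustration, and real sizing should be validated against your own latency and memory measurements.&lt;/p&gt;

```python
def suggest_strategy(vector_count, memory_constrained=False):
    """Map dataset size and memory headroom to a search strategy (rule of thumb)."""
    if vector_count >= 10_000_000:
        return "approximate (IVF)"
    if vector_count >= 100_000:
        # Assumption: HNSW needs memory headroom, so fall back to IVF when memory is tight.
        return "approximate (IVF)" if memory_constrained else "approximate (HNSW)"
    return "exact"

print(suggest_strategy(50_000))                              # exact
print(suggest_strategy(2_000_000))                           # approximate (HNSW)
print(suggest_strategy(2_000_000, memory_constrained=True))  # approximate (IVF)
print(suggest_strategy(50_000_000))                          # approximate (IVF)
```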

&lt;h3&gt;
  
  
  2. Optimize Index Configuration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- For high accuracy (slower but better results)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx1&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ORGANIZATION&lt;/span&gt; &lt;span class="n"&gt;INMEMORY&lt;/span&gt; &lt;span class="n"&gt;NEIGHBOR&lt;/span&gt; &lt;span class="n"&gt;GRAPH&lt;/span&gt;
&lt;span class="n"&gt;DISTANCE&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;TARGET&lt;/span&gt; &lt;span class="n"&gt;ACCURACY&lt;/span&gt; &lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- For speed (faster but slightly lower accuracy)&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx2&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ORGANIZATION&lt;/span&gt; &lt;span class="n"&gt;INMEMORY&lt;/span&gt; &lt;span class="n"&gt;NEIGHBOR&lt;/span&gt; &lt;span class="n"&gt;GRAPH&lt;/span&gt;
&lt;span class="n"&gt;DISTANCE&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;TARGET&lt;/span&gt; &lt;span class="n"&gt;ACCURACY&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Use Attribute Filtering Effectively
&lt;/h3&gt;

&lt;p&gt;Combine vector similarity with traditional filters to narrow results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Efficient: Filter first, then search&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;              &lt;span class="c1"&gt;-- Traditional filter&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Electronics'&lt;/span&gt;         &lt;span class="c1"&gt;-- Traditional filter&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;  &lt;span class="c1"&gt;-- Vector search&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="n"&gt;APPROXIMATE&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Monitor and Tune Performance
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Check if index is being used&lt;/span&gt;
&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="n"&gt;PLAN&lt;/span&gt; &lt;span class="k"&gt;FOR&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="n"&gt;APPROXIMATE&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DBMS_XPLAN&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DISPLAY&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Handle Dimension Mismatches
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Validate dimensions before comparison&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;VECTOR_DIMENSION_COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dims&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;VECTOR_DIMENSION_FORMAT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;format&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;VECTOR_DIMENSION_COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Common Pitfalls and Solutions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pitfall 1: Forgetting to Enable Vector Pool
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; HNSW index creation fails with memory error&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Configure vector pool before creating HNSW indexes&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;SYSTEM&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;vector_memory_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="k"&gt;G&lt;/span&gt; &lt;span class="k"&gt;SCOPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SPFILE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Restart database&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pitfall 2: Using Wrong Index Type
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Slow performance despite having an index&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Match your workload to index type:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HNSW for read-heavy, memory-available scenarios&lt;/li&gt;
&lt;li&gt;IVF for write-heavy or memory-constrained scenarios&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Pitfall 3: Metric Mismatch
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Index not being used despite proper syntax&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Ensure query metric matches index metric&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Index uses COSINE&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DISTANCE&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Query must also use COSINE&lt;/span&gt;
&lt;span class="c1"&gt;-- Correct: uses &amp;lt;=&amp;gt; (COSINE operator)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;

&lt;span class="c1"&gt;-- Wrong: uses &amp;lt;-&amp;gt; (EUCLIDEAN operator)&lt;/span&gt;
&lt;span class="c1"&gt;-- This will NOT use the index!&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
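
&lt;p&gt;The reason a cosine index cannot stand in for a Euclidean query is that the two metrics can rank the same rows differently. The Python sketch below shows this with two made-up candidate vectors; the names and data are purely illustrative.&lt;/p&gt;

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 minus cosine similarity; ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 0.0]
candidates = {
    "far_same_direction": [10.0, 1.0],   # long vector, almost the same direction
    "near_other_direction": [1.0, 1.0],  # short vector, 45 degrees away
}

by_cosine = min(candidates, key=lambda name: cosine_distance(candidates[name], query))
by_l2 = min(candidates, key=lambda name: l2_distance(candidates[name], query))

print(by_cosine)  # far_same_direction
print(by_l2)      # near_other_direction
```

&lt;p&gt;Because the nearest neighbor itself changes with the metric, an index built around one ordering has nothing useful to offer a query that asks for the other.&lt;/p&gt;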



&lt;h3&gt;
  
  
  Pitfall 4: Not Using FETCH APPROXIMATE
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Query is slow despite having vector index&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Use &lt;code&gt;FETCH APPROXIMATE&lt;/code&gt; to enable index usage&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Without APPROXIMATE - performs exact search&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- With APPROXIMATE - uses index&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="n"&gt;APPROXIMATE&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Oracle Database 23ai provides a comprehensive vector search framework with multiple strategies to fit different use cases:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Search Strategies:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exact&lt;/strong&gt;: Perfect accuracy, suitable for smaller datasets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Approximate&lt;/strong&gt;: 95%+ accuracy with dramatic speed improvements using HNSW or IVF indexes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-vector&lt;/strong&gt;: Chunk-level granularity for document search&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Requirements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enable vector pool for HNSW indexes: &lt;code&gt;vector_memory_size&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Use FETCH APPROXIMATE for index-accelerated queries&lt;/li&gt;
&lt;li&gt;Match distance metrics between index and query&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Essential Functions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Constructors&lt;/strong&gt;: TO_VECTOR(), VECTOR()&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serializer&lt;/strong&gt;: VECTOR_SERIALIZE()&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Utilities&lt;/strong&gt;: VECTOR_NORM(), VECTOR_DIMENSION_COUNT(), VECTOR_DIMENSION_FORMAT()&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distance Functions&lt;/strong&gt;: L1_DISTANCE(), L2_DISTANCE(), COSINE_DISTANCE(), INNER_PRODUCT()&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best Practices:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose search strategy based on dataset size and accuracy requirements&lt;/li&gt;
&lt;li&gt;Combine vector search with WHERE clause filters for powerful hybrid queries&lt;/li&gt;
&lt;li&gt;Monitor index usage with EXPLAIN PLAN&lt;/li&gt;
&lt;li&gt;Normalize vectors when using dot product similarity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By understanding these search strategies and vector functions, you can build high-performance semantic search applications that scale from thousands to billions of vectors while maintaining excellent accuracy and query speed.&lt;/p&gt;

</description>
      <category>oracleaidatabase</category>
      <category>vectorsearch</category>
      <category>approximatenearestneighbor</category>
    </item>
    <item>
      <title>Oracle Database 23ai: Creating Vectors and Understanding Distance Metrics for Similarity Search</title>
      <dc:creator>Ryan Giggs</dc:creator>
      <pubDate>Mon, 09 Mar 2026 07:52:54 +0000</pubDate>
      <link>https://forem.com/derrickryangiggs/oracle-database-23ai-creating-vectors-and-understanding-distance-metrics-for-similarity-search-2mb6</link>
      <guid>https://forem.com/derrickryangiggs/oracle-database-23ai-creating-vectors-and-understanding-distance-metrics-for-similarity-search-2mb6</guid>
      <description>&lt;p&gt;Oracle Database 23ai introduces native vector capabilities that enable semantic search directly within SQL. Understanding how to create vectors, calculate distances, and choose appropriate metrics is fundamental to building effective AI-powered applications. This comprehensive guide explores vector operations in Oracle 23ai with practical examples and best practices.&lt;/p&gt;

&lt;h2&gt;
  
  
  The VECTOR Data Type
&lt;/h2&gt;

&lt;p&gt;Oracle 23ai introduces a native VECTOR data type designed specifically to store and manage vector embeddings efficiently within the database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Declaration Syntax:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Flexible: any dimensions and format&lt;/span&gt;
&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;

&lt;span class="c1"&gt;-- Specific dimensions, flexible format&lt;/span&gt;
&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;-- Fully specified: dimensions and format&lt;/span&gt;
&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Format Options:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;INT8&lt;/strong&gt;: 8-bit integers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FLOAT32&lt;/strong&gt;: 32-bit floating-point (IEEE standard, most common)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FLOAT64&lt;/strong&gt;: 64-bit floating-point (IEEE standard, higher precision)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BINARY&lt;/strong&gt;: Binary vectors for specialized use cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Oracle Database automatically casts values between formats when needed.&lt;/p&gt;
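
&lt;p&gt;The choice of format mainly affects storage size and precision. The following Python sketch estimates the raw payload of one vector per format. It assumes one bit per dimension for BINARY and deliberately ignores any per-vector overhead inside the database, so treat the numbers as rough lower bounds. The name &lt;code&gt;payload_bytes&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
# Bits used per dimension by each format (raw payload only; any per-vector
# header or storage overhead inside the database is deliberately ignored).
BITS_PER_DIMENSION = {"BINARY": 1, "INT8": 8, "FLOAT32": 32, "FLOAT64": 64}


def payload_bytes(dimensions, fmt):
    """Approximate raw size in bytes of one vector of the given format."""
    bits = dimensions * BITS_PER_DIMENSION[fmt]
    return (bits + 7) // 8  # round up to whole bytes


for fmt in ("BINARY", "INT8", "FLOAT32", "FLOAT64"):
    print(fmt, payload_bytes(512, fmt))
# BINARY 64
# INT8 512
# FLOAT32 2048
# FLOAT64 4096
```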

&lt;h2&gt;
  
  
  Creating Vectors with the VECTOR Constructor
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;VECTOR constructor is a function that allows us to create vectors without storing them&lt;/strong&gt;. It's particularly useful for learning purposes, testing, and ad-hoc queries.&lt;/p&gt;

&lt;h3&gt;
  
  
  Basic Syntax
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;array_literal&lt;/span&gt; &lt;span class="p"&gt;[,&lt;/span&gt; &lt;span class="n"&gt;number_of_dimensions&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;[,&lt;/span&gt; &lt;span class="n"&gt;format&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Parameters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;array_literal&lt;/strong&gt;: String representation of vector values&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;number_of_dimensions&lt;/strong&gt;: Optional, inferred from array if not specified&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;format&lt;/strong&gt;: Optional (INT8, FLOAT32, FLOAT64, BINARY)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Examples
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Simple 2D Vector:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[0, 0]'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt; &lt;code&gt;[0, 0]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vector with Scientific Notation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[10, 0]'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt; &lt;code&gt;[1.0E+001, 0]&lt;/code&gt; (equivalent to [10, 0])&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Specifying Dimensions and Format:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[1, 2, 3]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Creating Vectors from Variables:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DECLARE&lt;/span&gt;
  &lt;span class="n"&gt;my_vector&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
  &lt;span class="n"&gt;my_vector&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[0.5, 0.8, 0.2]'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="n"&gt;DBMS_OUTPUT&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PUT_LINE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;my_vector&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Best Practices for Learning
&lt;/h3&gt;

&lt;p&gt;Use a small number of dimensions (2-4) when learning vector concepts, as they are easier to visualize and understand. For production, match the dimensions to your embedding model (typically 384-1536).&lt;/p&gt;

&lt;h2&gt;
  
  
  The VECTOR_DISTANCE Function
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;VECTOR_DISTANCE is the main function for calculating the distance between two vectors&lt;/strong&gt;. It takes two vector expressions as arguments, plus an optional distance metric.&lt;/p&gt;

&lt;h3&gt;
  
  
  Syntax
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expr1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expr2&lt;/span&gt; &lt;span class="p"&gt;[,&lt;/span&gt; &lt;span class="n"&gt;distance_metric&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Parameters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;expr1, expr2&lt;/strong&gt;: Two vector expressions to compare&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;distance_metric&lt;/strong&gt;: Optional; defaults to COSINE (or HAMMING for BINARY vectors)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Returns:&lt;/strong&gt; BINARY_DOUBLE representing the distance&lt;/p&gt;

&lt;h3&gt;
  
  
  Default Behavior
&lt;/h3&gt;

&lt;p&gt;If you do not specify a distance metric, then the default distance metric is cosine. If the input vectors are BINARY vectors, the default metric is hamming.&lt;/p&gt;

&lt;h3&gt;
  
  
  Basic Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[1, 0]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[0, 1]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;COSINE&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Vector Distance Metrics: Choosing the Right One
&lt;/h2&gt;

&lt;p&gt;Oracle AI Vector Search supports multiple distance metrics, each suited for different use cases and data characteristics.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Euclidean Distance (L2)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Euclidean distance gives us the straight-line distance between two vectors. It follows from the Pythagorean theorem and is sensitive to both the magnitude and the direction of the vectors.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Formula:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;d = √[(x₂-x₁)² + (y₂-y₁)²]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;SQL Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;TO_NUMBER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[3, 0]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[0, 4]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;EUCLIDEAN&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; &lt;code&gt;5&lt;/code&gt; (forming a 3-4-5 right triangle)&lt;/p&gt;
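
&lt;p&gt;The two-dimensional formula extends to any number of dimensions by summing the squared difference of every coordinate pair. As a sanity check outside the database, a small Python sketch (the function name &lt;code&gt;euclidean&lt;/code&gt; is hypothetical) reproduces the 3-4-5 result:&lt;/p&gt;

```python
import math


def euclidean(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean([3, 0], [0, 4]))        # 5.0
print(euclidean([1, 2, 3], [1, 2, 3]))  # 0.0
```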

&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spatial data and geographical coordinates&lt;/li&gt;
&lt;li&gt;Physical measurements&lt;/li&gt;
&lt;li&gt;Image similarity (pixel-level comparisons)&lt;/li&gt;
&lt;li&gt;When absolute distance matters&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Euclidean Squared Distance (L2_SQUARED)
&lt;/h3&gt;

&lt;p&gt;Squared Euclidean distance is the Euclidean distance without the final square root. When only the ordering of results matters, not the distance values themselves, it is a good choice because it skips the square root computation while producing the same ranking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advantage:&lt;/strong&gt; Faster computation for ranking/sorting&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQL Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;TO_NUMBER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[3, 0]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[0, 4]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;EUCLIDEAN_SQUARED&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distance_squared&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; &lt;code&gt;25&lt;/code&gt; (5²)&lt;/p&gt;
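
&lt;p&gt;Because the square root is a monotonically increasing function, dropping it never changes the order of results. The Python sketch below illustrates this with hypothetical candidate points. It is a plain-arithmetic check, not Oracle code.&lt;/p&gt;

```python
import math

query = (0.0, 0.0)
candidates = {"a": (3.0, 4.0), "b": (1.0, 1.0), "c": (6.0, 8.0)}


def squared(a, b):
    """Squared Euclidean distance: the sum of squared differences, no root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


by_squared = sorted(candidates, key=lambda k: squared(query, candidates[k]))
by_true = sorted(candidates, key=lambda k: math.dist(query, candidates[k]))

print(by_squared)                       # ['b', 'a', 'c']
print(by_squared == by_true)            # True
print(squared(query, candidates["a"]))  # 25.0
```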

&lt;h3&gt;
  
  
  3. Cosine Similarity / Cosine Distance
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cosine similarity is the most widely used similarity metric, especially in natural language processing (NLP). The smaller the angle, the more similar the vectors are.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cosine measures the angle between two vectors rather than their magnitude. It's ideal for text embeddings where direction matters more than magnitude.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Formula:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cosine_similarity = (A · B) / (||A|| × ||B||)
cosine_distance = 1 - cosine_similarity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;SQL Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[1, 0]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[1, 1]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;COSINE&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cosine_dist&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key Characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Range:&lt;/strong&gt; 0 (identical) to 2 (opposite)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Normalized:&lt;/strong&gt; Insensitive to vector magnitude&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for:&lt;/strong&gt; Text embeddings, document similarity, semantic search&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why It's Popular:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cosine is one of the most useful metrics since it measures the angle between two vectors instead of the difference in size or position. This makes it perfect for comparing text embeddings where the semantic meaning is encoded in direction, not magnitude.&lt;/p&gt;
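
&lt;p&gt;A short Python sketch of the formula above makes the magnitude-insensitivity visible: scaling one vector by ten leaves the distance unchanged. The function name &lt;code&gt;cosine_distance&lt;/code&gt; here is a local helper for illustration, not the Oracle function.&lt;/p&gt;

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; undefined for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


print(round(cosine_distance([1, 0], [1, 1]), 4))    # 0.2929
print(round(cosine_distance([1, 0], [10, 10]), 4))  # 0.2929 (same direction, longer vector)
print(cosine_distance([1, 0], [0, 1]))              # 1.0 (perpendicular)
print(cosine_distance([1, 0], [-1, 0]))             # 2.0 (opposite)
```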

&lt;h3&gt;
  
  
  4. Dot Product Similarity
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The dot product multiplies the magnitudes of the two vectors by the cosine of the angle between them. It is equivalent to the sum of the products of the vectors' corresponding coordinates. Larger values mean more similar; smaller values mean less similar.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Formula:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dot_product = Σ(aᵢ × bᵢ)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Oracle's DOT metric calculates the &lt;strong&gt;negated&lt;/strong&gt; dot product, so more negative values indicate greater similarity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQL Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[2, 3]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[4, 5]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;DOT&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dot_distance&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Recommendation systems&lt;/li&gt;
&lt;li&gt;Similarity ranking with normalized vectors&lt;/li&gt;
&lt;li&gt;Fast approximate nearest neighbor search&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; For meaningful results, vectors should be normalized to unit length.&lt;/p&gt;
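
&lt;p&gt;To see why normalization matters, the following Python sketch computes the dot product for the vectors in the SQL example and then shows that, once both vectors are scaled to unit length, the dot product equals the cosine similarity. The helpers &lt;code&gt;dot&lt;/code&gt; and &lt;code&gt;normalize&lt;/code&gt; are hypothetical names used only for this check.&lt;/p&gt;

```python
import math


def dot(a, b):
    """Sum of the products of corresponding coordinates."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


a, b = [2, 3], [4, 5]
print(dot(a, b))   # 23  (2*4 + 3*5)
print(-dot(a, b))  # -23, the negated form that a distance-style metric reports

# After normalization the dot product equals the cosine similarity.
cos_sim = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
print(math.isclose(dot(normalize(a), normalize(b)), cos_sim))  # True
```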

&lt;h3&gt;
  
  
  5. Manhattan Distance (L1)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Manhattan distance measures distance along the axes of a uniform grid, such as city blocks, power grids, or chessboards. It is also cheaper to compute than Euclidean distance because it needs no squaring or square root.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Also known as taxicab distance or L1 distance, it calculates the sum of absolute differences between vector components.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Formula:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;manhattan = Σ|aᵢ - bᵢ|
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;SQL Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[1, 2]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[4, 6]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;MANHATTAN&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;manhattan_dist&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; &lt;code&gt;|4-1| + |6-2| = 7&lt;/code&gt;&lt;/p&gt;
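
&lt;p&gt;The same arithmetic in a few lines of Python, with the Euclidean distance of the same pair printed for comparison (the helper name &lt;code&gt;manhattan&lt;/code&gt; is hypothetical):&lt;/p&gt;

```python
import math


def manhattan(a, b):
    """L1 distance: sum of absolute per-dimension differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


print(manhattan([1, 2], [4, 6]))  # 7
print(math.dist([1, 2], [4, 6]))  # 5.0, the straight-line distance is shorter
```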

&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Grid-based problems&lt;/li&gt;
&lt;li&gt;Route planning on city streets&lt;/li&gt;
&lt;li&gt;Feature selection in machine learning&lt;/li&gt;
&lt;li&gt;When diagonal movement isn't allowed&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. Hamming Distance
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Hamming distance counts the dimensions in which two vectors differ. For binary vectors, it is the number of bits that must be flipped to make them match.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It compares the two sequences bit by bit, position by position, and is widely used for error detection and correction in networks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQL Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;-- BINARY vectors are written as packed 8-bit bytes; the dimension count must be a multiple of 8&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[176]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;BINARY&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;-- bits 1 0 1 1 0 0 0 0&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[224]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;BINARY&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;-- bits 1 1 1 0 0 0 0 0&lt;/span&gt;
    &lt;span class="n"&gt;HAMMING&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;hamming_dist&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; &lt;code&gt;2&lt;/code&gt; (positions 2 and 4 differ)&lt;/p&gt;
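
&lt;p&gt;As a check outside the database, this Python sketch counts differing positions for the bit patterns 1 0 1 1 0 and 1 1 1 0 0, first element by element and then on packed integers using XOR. Both helper names are hypothetical.&lt;/p&gt;

```python
def hamming(a, b):
    """Count the positions at which two equal-length bit sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def hamming_packed(x, y):
    """Same count for bits packed into integers: XOR, then count the 1 bits."""
    return bin(x ^ y).count("1")


print(hamming([1, 0, 1, 1, 0], [1, 1, 1, 0, 0]))  # 2
print(hamming_packed(0b10110, 0b11100))           # 2
```

&lt;p&gt;The XOR form is the reason Hamming distance is so cheap on packed binary data: a whole byte or machine word is compared in a single operation.&lt;/p&gt;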

&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Error detection and correction&lt;/li&gt;
&lt;li&gt;Genetic sequence comparison&lt;/li&gt;
&lt;li&gt;Digital communication&lt;/li&gt;
&lt;li&gt;Binary classification problems&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  7. Jaccard Distance
&lt;/h3&gt;

&lt;p&gt;Jaccard distance measures dissimilarity between binary vectors based on the ratio of intersection to union.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Requirements:&lt;/strong&gt; Both vectors must be BINARY&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Formula:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;jaccard = 1 - (|A ∩ B| / |A ∪ B|)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;SQL Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;-- BINARY vectors are written as packed 8-bit bytes; the dimension count must be a multiple of 8&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[208]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;BINARY&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;-- bits 1 1 0 1 0 0 0 0&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[176]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;BINARY&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;-- bits 1 0 1 1 0 0 0 0&lt;/span&gt;
    &lt;span class="n"&gt;JACCARD&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;jaccard_dist&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
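
&lt;p&gt;For the bit patterns 1 1 0 1 and 1 0 1 1, two positions are set in both vectors and four positions are set in at least one, so the formula gives 1 - 2/4. A Python sketch of that calculation follows. The function name and the handling of two all-zero vectors are assumptions made for this illustration.&lt;/p&gt;

```python
def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B| over the positions set to 1."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if either == 0:
        return 0.0  # convention chosen here for two all-zero vectors
    return 1.0 - both / either


print(jaccard_distance([1, 1, 0, 1], [1, 0, 1, 1]))  # 0.5
```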



&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set similarity comparison&lt;/li&gt;
&lt;li&gt;Document deduplication&lt;/li&gt;
&lt;li&gt;Recommendation systems&lt;/li&gt;
&lt;li&gt;Clustering binary data&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Shorthand Distance Operators
&lt;/h2&gt;

&lt;p&gt;Oracle provides convenient shorthand operators for common distance calculations, making SQL queries more concise and readable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Available Operators
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;&amp;lt;-&amp;gt;&lt;/code&gt; Euclidean Distance Operator&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'[1, 2]'&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;-&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[0, 1]'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;euclidean_dist&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Equivalent to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;L2_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[1, 2]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[0, 1]'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;euclidean_dist&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[1, 2]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[0, 1]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;EUCLIDEAN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;euclidean_dist&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; Cosine Distance Operator&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'[1, 0]'&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[0, 1]'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cosine_dist&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Equivalent to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;COSINE_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[1, 0]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[0, 1]'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cosine_dist&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;&amp;lt;#&amp;gt;&lt;/code&gt; Negative Dot Product Operator&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="s1"&gt;'[2, 3]'&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;#&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[4, 5]'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;neg_dot_product&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Equivalent to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;INNER_PRODUCT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[2, 3]'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[4, 5]'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;neg_dot_product&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Practical Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Compare products by embedding similarity&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;p2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;p2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cosine_dist&lt;/span&gt;  &lt;span class="c1"&gt;-- cosine distance: smaller means more similar&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="n"&gt;p2&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
  &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;p2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;cosine_dist&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Shorthand Distance Functions
&lt;/h2&gt;

&lt;p&gt;In addition to operators, Oracle provides shorthand functions for cleaner code:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;L1_DISTANCE&lt;/strong&gt; - Manhattan distance&lt;br&gt;
&lt;strong&gt;L2_DISTANCE&lt;/strong&gt; - Euclidean distance&lt;br&gt;
&lt;strong&gt;COSINE_DISTANCE&lt;/strong&gt; - Cosine distance&lt;br&gt;
&lt;strong&gt;INNER_PRODUCT&lt;/strong&gt; - Dot product (not negated)&lt;br&gt;
&lt;strong&gt;HAMMING_DISTANCE&lt;/strong&gt; - Hamming distance&lt;br&gt;
&lt;strong&gt;JACCARD_DISTANCE&lt;/strong&gt; - Jaccard distance&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;L2_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;euclidean&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;COSINE_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;cosine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;INNER_PRODUCT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dot_prod&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="n"&gt;v2&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
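&lt;p&gt;To make the arithmetic behind these functions concrete, here is a minimal sketch in plain Python, run outside the database. The function names only mirror the Oracle shorthand names for readability; they are illustrative helpers, not Oracle's implementation.&lt;/p&gt;

```python
import math

def l1_distance(a, b):
    # Manhattan distance: sum of absolute per-dimension differences
    return sum(abs(x - y) for x, y in zip(a, b))

def l2_distance(a, b):
    # Euclidean distance: square root of the summed squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    # Dot product: sum of per-dimension products (not negated)
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 minus cosine similarity; assumes neither vector is all zeros
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)

v1 = [3.0, 4.0]
v2 = [4.0, 3.0]
print("manhattan:", l1_distance(v1, v2))
print("euclidean:", l2_distance(v1, v2))
print("dot_prod:", inner_product(v1, v2))
print("cosine:", cosine_distance(v1, v2))
```

&lt;p&gt;The two sample vectors have the same length but point in slightly different directions, so the cosine distance is small while the dot product is large.&lt;/p&gt;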



&lt;h2&gt;
  
  
  Performing Similarity Search
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The VECTOR_DISTANCE function can be used to perform similarity search&lt;/strong&gt; by ordering results based on vector proximity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Exact Similarity Search
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Approximate Similarity Search (with Index)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Using FETCH APPROXIMATE with vector index&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;description&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="n"&gt;APPROXIMATE&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key Difference:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;EXACT&lt;/strong&gt;: Compares query vector with every vector (slower, 100% accurate)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;APPROXIMATE&lt;/strong&gt;: Uses vector indexes (HNSW/IVF) for faster search; accuracy depends on the index's target accuracy (for example, 95%)&lt;/li&gt;
&lt;/ul&gt;
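&lt;p&gt;The trade-off can be sketched in plain Python, outside the database. The helper names (&lt;code&gt;exact_top_k&lt;/code&gt;, &lt;code&gt;recall&lt;/code&gt;) and the sample rows are assumptions made for illustration: the first scores every row the way an exact search does, and the second measures how much of that exact result an approximate result recovered.&lt;/p&gt;

```python
import math

def cosine_distance(a, b):
    # 1 minus cosine similarity; assumes neither vector is all zeros
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def exact_top_k(query, rows, k):
    # rows is a list of (id, embedding) pairs; every row is scored, then sorted
    scored = [(cosine_distance(emb, query), row_id) for row_id, emb in rows]
    return [row_id for _, row_id in sorted(scored)[:k]]

def recall(approx_ids, exact_ids):
    # Fraction of the exact top-k that the approximate result also returned
    return len(set(approx_ids).intersection(exact_ids)) / len(exact_ids)

documents = [
    (1, [1.0, 0.0]),
    (2, [0.9, 0.1]),
    (3, [0.0, 1.0]),
    (4, [-1.0, 0.0]),
]
exact = exact_top_k([1.0, 0.0], documents, 2)
print(exact)                  # [1, 2]
print(recall([1, 3], exact))  # 0.5
```

&lt;p&gt;An approximate result that returned ids 1 and 3 found only one of the two true nearest neighbors, so its recall against the exact answer is 0.5.&lt;/p&gt;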

&lt;h2&gt;
  
  
  Choosing the Right Distance Metric
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Decision Matrix
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Recommended Metric&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Text embeddings&lt;/td&gt;
&lt;td&gt;COSINE&lt;/td&gt;
&lt;td&gt;Captures semantic similarity, magnitude-invariant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image similarity&lt;/td&gt;
&lt;td&gt;EUCLIDEAN&lt;/td&gt;
&lt;td&gt;Pixel-level comparisons benefit from absolute distance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recommendation systems&lt;/td&gt;
&lt;td&gt;DOT (normalized vectors)&lt;/td&gt;
&lt;td&gt;Fast computation, works well with normalized data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grid/route problems&lt;/td&gt;
&lt;td&gt;MANHATTAN&lt;/td&gt;
&lt;td&gt;Natural fit for grid-based navigation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Binary vectors&lt;/td&gt;
&lt;td&gt;HAMMING&lt;/td&gt;
&lt;td&gt;Direct bit difference counting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Error detection&lt;/td&gt;
&lt;td&gt;HAMMING&lt;/td&gt;
&lt;td&gt;Counts differing positions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Set similarity&lt;/td&gt;
&lt;td&gt;JACCARD&lt;/td&gt;
&lt;td&gt;Measures intersection/union ratio&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Performance Considerations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Fastest to Slowest (approximate, by arithmetic per comparison):&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;DOT (simple multiplication and sum)&lt;/li&gt;
&lt;li&gt;EUCLIDEAN_SQUARED (avoids square root)&lt;/li&gt;
&lt;li&gt;MANHATTAN (absolute values and sum)&lt;/li&gt;
&lt;li&gt;EUCLIDEAN (includes square root)&lt;/li&gt;
&lt;li&gt;COSINE (normalization overhead)&lt;/li&gt;
&lt;/ol&gt;
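&lt;p&gt;Skipping the square root is safe for ranking because the square root is monotonic: it changes the distance values but not their order. A minimal Python sketch, with illustrative helper names and sample data, shows both metrics producing the same ranking:&lt;/p&gt;

```python
import math

def euclidean_squared(a, b):
    # Sum of squared per-dimension differences, no square root
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    # Same sum, followed by the square root
    return math.sqrt(euclidean_squared(a, b))

def rank(query, rows, metric):
    # Return row ids ordered from nearest to farthest under the given metric
    ordered = sorted(rows, key=lambda row: metric(row[1], query))
    return [row_id for row_id, _ in ordered]

rows = [(1, [0.0, 0.0]), (2, [3.0, 4.0]), (3, [1.0, 1.0])]
query = [1.0, 2.0]
print(rank(query, rows, euclidean_squared))  # [3, 1, 2]
print(rank(query, rows, euclidean))          # [3, 1, 2]
```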

&lt;h3&gt;
  
  
  Matching Index and Query Metrics
&lt;/h3&gt;

&lt;p&gt;If a similarity search query specifies a distance metric that conflicts with the metric in a vector index, the vector index is not used and an exact search is performed instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best Practice:&lt;/strong&gt; Ensure your query metric matches your index metric for optimal performance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Index created with COSINE&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;docs_idx&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ORGANIZATION&lt;/span&gt; &lt;span class="n"&gt;INMEMORY&lt;/span&gt; &lt;span class="n"&gt;NEIGHBOR&lt;/span&gt; &lt;span class="n"&gt;GRAPH&lt;/span&gt;
&lt;span class="n"&gt;DISTANCE&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;TARGET&lt;/span&gt; &lt;span class="n"&gt;ACCURACY&lt;/span&gt; &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Query should also use COSINE for index usage&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;  &lt;span class="c1"&gt;-- Uses COSINE operator&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="n"&gt;APPROXIMATE&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Complete Working Example
&lt;/h2&gt;

&lt;p&gt;Here's a comprehensive example demonstrating vector creation, storage, and similarity search:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Create table with vector column&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;product_embeddings&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;product_name&lt;/span&gt; &lt;span class="n"&gt;VARCHAR2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="k"&gt;CLOB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Insert sample data&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;product_embeddings&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s1"&gt;'Laptop Computer'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s1"&gt;'High-performance laptop for developers'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[0.2, 0.8, 0.5, ...]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;product_embeddings&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s1"&gt;'Wireless Mouse'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="s1"&gt;'Ergonomic wireless mouse'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[0.1, 0.3, 0.7, ...]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Create vector index for fast similarity search&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;product_emb_idx&lt;/span&gt; 
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;product_embeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ORGANIZATION&lt;/span&gt; &lt;span class="n"&gt;INMEMORY&lt;/span&gt; &lt;span class="n"&gt;NEIGHBOR&lt;/span&gt; &lt;span class="n"&gt;GRAPH&lt;/span&gt;
&lt;span class="n"&gt;DISTANCE&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;TARGET&lt;/span&gt; &lt;span class="n"&gt;ACCURACY&lt;/span&gt; &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Perform similarity search (run SET SERVEROUTPUT ON first to see the output)&lt;/span&gt;
&lt;span class="k"&gt;DECLARE&lt;/span&gt;
  &lt;span class="n"&gt;query_vec&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
  &lt;span class="c1"&gt;-- Create query vector (in practice, from embedding model)&lt;/span&gt;
  &lt;span class="n"&gt;query_vec&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[0.15, 0.75, 0.6, ...]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;-- Find similar products&lt;/span&gt;
  &lt;span class="k"&gt;FOR&lt;/span&gt; &lt;span class="n"&gt;rec&lt;/span&gt; &lt;span class="k"&gt;IN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; 
      &lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="n"&gt;COSINE_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query_vec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;similarity&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;product_embeddings&lt;/span&gt;
    &lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;query_vec&lt;/span&gt;
    &lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="n"&gt;APPROXIMATE&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;LOOP&lt;/span&gt;
    &lt;span class="n"&gt;DBMS_OUTPUT&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PUT_LINE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="s1"&gt;'Product: '&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;rec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_name&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; 
      &lt;span class="s1"&gt;' | Similarity: '&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;rec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;similarity&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;END&lt;/span&gt; &lt;span class="n"&gt;LOOP&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Choose Appropriate Dimensions
&lt;/h3&gt;

&lt;p&gt;Match embedding dimensions to your model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;384&lt;/strong&gt;: MiniLM, lightweight models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;768&lt;/strong&gt;: BERT, sentence transformers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1024&lt;/strong&gt;: Cohere embedding models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1536&lt;/strong&gt;: OpenAI text-embedding-ada-002&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Normalize Vectors When Using DOT
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Check the magnitude: a unit-length vector has a norm of 1.&lt;/span&gt;
&lt;span class="c1"&gt;-- If it does not, normalize in the embedding pipeline before loading.&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="n"&gt;VECTOR_NORM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;magnitude&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
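&lt;p&gt;Normalization is often easiest to do in application code before the vectors are loaded. The Python sketch below is one way to do it, with illustrative helper names; it also checks the property that makes DOT attractive: for unit-length vectors, the dot product equals the cosine similarity of the original vectors.&lt;/p&gt;

```python
import math

def norm(v):
    # Euclidean length (magnitude) of the vector
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    # Scale the vector to unit length; a zero vector has no direction
    n = norm(v)
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    return dot(a, b) / (norm(a) * norm(b))

a = [3.0, 4.0]
b = [1.0, 2.0]
unit_a, unit_b = normalize(a), normalize(b)
print(norm(unit_a))
print(math.isclose(dot(unit_a, unit_b), cosine_similarity(a, b)))
```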



&lt;h3&gt;
  
  
  3. Use Appropriate Formats
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;FLOAT32&lt;/strong&gt;: Default, balances precision and performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FLOAT64&lt;/strong&gt;: When high precision is critical&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;INT8&lt;/strong&gt;: For quantized models, saves storage&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Verify Index Usage
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Check if index is being used&lt;/span&gt;
&lt;span class="k"&gt;EXPLAIN&lt;/span&gt; &lt;span class="n"&gt;PLAN&lt;/span&gt; &lt;span class="k"&gt;FOR&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="n"&gt;APPROXIMATE&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DBMS_XPLAN&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DISPLAY&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Benchmark Different Metrics
&lt;/h3&gt;

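&lt;p&gt;Compare metrics on a sample of your data before committing to one. The query below reports the average pairwise distance under two metrics; because it is a self-join, run it on a small sample:&lt;br&gt;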
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Compare metrics&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="s1"&gt;'COSINE'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;metric&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;COSINE_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_dist&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="n"&gt;e1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="n"&gt;e2&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;e1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;e2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;
&lt;span class="k"&gt;UNION&lt;/span&gt; &lt;span class="k"&gt;ALL&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
  &lt;span class="s1"&gt;'EUCLIDEAN'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;L2_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="n"&gt;e1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="n"&gt;e2&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;e1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;e2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Common Pitfalls and Solutions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pitfall 1: Dimension Mismatch
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Comparing vectors with different dimensions&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- This will error&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[1, 2]'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[1, 2, 3]'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Ensure all vectors have the same dimensions&lt;/p&gt;
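&lt;p&gt;One way to catch this early is to validate dimensions in the loading code before any vector reaches the database. The helper below is a hypothetical Python sketch, not part of any Oracle client library:&lt;/p&gt;

```python
def check_dimensions(vectors, expected_dim):
    # Return the positions of vectors whose length differs from expected_dim
    return [i for i, v in enumerate(vectors) if len(v) != expected_dim]

batch = [[1.0, 2.0], [1.0, 2.0, 3.0], [0.5, 0.5]]
bad_positions = check_dimensions(batch, 2)
print(bad_positions)  # [1]
```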

&lt;h3&gt;
  
  
  Pitfall 2: Wrong Metric for Data Type
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Using JACCARD on non-binary vectors&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Use JACCARD only with BINARY vectors&lt;/p&gt;
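&lt;p&gt;The reason becomes clear from what the binary metrics compute. The Python sketch below models a binary vector as a list of 0/1 bits, which is an assumption made for readability (Oracle packs BINARY vectors into bytes), and the helper names are illustrative:&lt;/p&gt;

```python
def hamming_distance(a, b):
    # Number of positions where the bits differ
    return sum(1 for x, y in zip(a, b) if x != y)

def jaccard_distance(a, b):
    # 1 minus (bits set in both) / (bits set in either)
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if either == 0:
        return 0.0  # convention chosen here for two all-zero vectors
    return 1.0 - both / either

a = [1, 0, 1, 1, 0, 0, 0, 0]
b = [1, 1, 0, 1, 0, 0, 0, 0]
print(hamming_distance(a, b))  # 2
print(jaccard_distance(a, b))  # 0.5
```

&lt;p&gt;Both metrics count set or differing bits, which has no natural meaning for floating-point components.&lt;/p&gt;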

&lt;h3&gt;
  
  
  Pitfall 3: Not Using Indexes
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Slow similarity searches on large datasets&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Create appropriate vector indexes (HNSW for speed, IVF for scale)&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 4: Metric Mismatch with Index
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Query metric conflicts with index metric, so the index is skipped and an exact search runs instead&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Match query metric to index metric&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Oracle Database 23ai's native vector capabilities provide a powerful, integrated platform for semantic search and AI-powered applications. Key takeaways:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vector Creation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VECTOR constructor for creating vectors directly in SQL&lt;/li&gt;
&lt;li&gt;Support for multiple formats (INT8, FLOAT32, FLOAT64, BINARY)&lt;/li&gt;
&lt;li&gt;Flexible dimension specification&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Distance Metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;COSINE: Default, best for text embeddings and semantic similarity&lt;/li&gt;
&lt;li&gt;EUCLIDEAN: Straight-line distance, good for spatial data&lt;/li&gt;
&lt;li&gt;DOT: Fast for normalized vectors, recommendation systems&lt;/li&gt;
&lt;li&gt;MANHATTAN: Grid-based problems, faster than Euclidean&lt;/li&gt;
&lt;li&gt;HAMMING: Binary vectors, error detection&lt;/li&gt;
&lt;li&gt;JACCARD: Set similarity with binary vectors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Shorthand Operators:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;-&amp;gt;&lt;/code&gt; for Euclidean distance&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;=&amp;gt;&lt;/code&gt; for Cosine distance&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;#&amp;gt;&lt;/code&gt; for negative dot product&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best Practices:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Match metric to your use case and embedding model&lt;/li&gt;
&lt;li&gt;Create appropriate vector indexes&lt;/li&gt;
&lt;li&gt;Ensure metric consistency between index and queries&lt;/li&gt;
&lt;li&gt;Use approximate search for large datasets&lt;/li&gt;
&lt;li&gt;Benchmark different metrics on your data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By understanding these vector operations and distance metrics, you can build efficient, accurate similarity search applications entirely within Oracle Database 23ai.&lt;/p&gt;

</description>
      <category>oracledatabase23ai</category>
      <category>vectordatabase</category>
      <category>vectorsearch</category>
    </item>
    <item>
      <title>Batch Processing with Apache Spark</title>
      <dc:creator>Ryan Giggs</dc:creator>
      <pubDate>Sat, 07 Mar 2026 13:04:08 +0000</pubDate>
      <link>https://forem.com/derrickryangiggs/batch-processing-with-apache-spark-35dd</link>
      <guid>https://forem.com/derrickryangiggs/batch-processing-with-apache-spark-35dd</guid>
      <description>&lt;p&gt;Week 6 of Data Engineering Zoomcamp by @DataTalksClub complete&lt;br&gt;
Just finished Module 6 - Batch Processing with Spark. Learned how to:&lt;/p&gt;

&lt;p&gt;✅ Set up PySpark and create Spark sessions&lt;/p&gt;

&lt;p&gt;✅ Read and process Parquet files at scale&lt;/p&gt;

&lt;p&gt;✅ Repartition data for optimal performance&lt;/p&gt;

&lt;p&gt;✅ Analyze millions of taxi trips with DataFrames&lt;/p&gt;

&lt;p&gt;✅ Use Spark UI for monitoring jobs&lt;/p&gt;

&lt;p&gt;Processing 4M+ taxi trips with Spark - distributed computing is powerful&lt;/p&gt;

&lt;p&gt;Here's my homework solution: &lt;a href="https://github.com/Derrick-Ryan-Giggs/pyspark-homework" rel="noopener noreferrer"&gt;https://github.com/Derrick-Ryan-Giggs/pyspark-homework&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Following along with this amazing free course - who else is learning data engineering?&lt;/p&gt;

&lt;p&gt;You can sign up here: &lt;a href="https://github.com/DataTalksClub/data-engineering-zoomcamp/" rel="noopener noreferrer"&gt;https://github.com/DataTalksClub/data-engineering-zoomcamp/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>batchprocessing</category>
      <category>spark</category>
      <category>dataengineering</category>
      <category>datatalksclub</category>
    </item>
    <item>
      <title>From APIs to Warehouses: AI-Assisted Data Ingestion with dlt</title>
      <dc:creator>Ryan Giggs</dc:creator>
      <pubDate>Sun, 01 Mar 2026 11:44:32 +0000</pubDate>
      <link>https://forem.com/derrickryangiggs/from-apis-to-warehouses-ai-assisted-data-ingestion-with-dlt-19d2</link>
      <guid>https://forem.com/derrickryangiggs/from-apis-to-warehouses-ai-assisted-data-ingestion-with-dlt-19d2</guid>
      <description>&lt;p&gt;🚀 dlt Workshop of Data Engineering Zoomcamp by @DataTalksClub complete&lt;br&gt;
Just finished the Data Ingestion workshop with @dltHub. Learned how to:&lt;br&gt;
✅ Build REST API data pipelines with dlt&lt;br&gt;
✅ Use AI-assisted development with dlt MCP Server&lt;br&gt;
✅ Load paginated API data into DuckDB&lt;br&gt;
✅ Inspect pipeline data with dlt Dashboard and marimo notebooks&lt;br&gt;
Built a full NYC taxi data pipeline from a custom API - AI-assisted data engineering is the future&lt;/p&gt;

&lt;p&gt;Here's my homework solution: &lt;a href="https://github.com/Derrick-Ryan-Giggs/-my-dlt-taxi-pipeline" rel="noopener noreferrer"&gt;https://github.com/Derrick-Ryan-Giggs/-my-dlt-taxi-pipeline&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Following along with this amazing free course - who else is learning data engineering?&lt;/p&gt;

&lt;p&gt;You can sign up here: &lt;a href="https://github.com/DataTalksClub/data-engineering-zoomcamp/" rel="noopener noreferrer"&gt;https://github.com/DataTalksClub/data-engineering-zoomcamp/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>dataengineering</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Oracle AI Vector Search: DML and DDL Operations on Vector Columns</title>
      <dc:creator>Ryan Giggs</dc:creator>
      <pubDate>Thu, 26 Feb 2026 08:18:20 +0000</pubDate>
      <link>https://forem.com/derrickryangiggs/oracle-ai-vector-search-dml-and-ddl-operations-on-vector-columns-1d1a</link>
      <guid>https://forem.com/derrickryangiggs/oracle-ai-vector-search-dml-and-ddl-operations-on-vector-columns-1d1a</guid>
      <description>&lt;p&gt;Oracle Database 23ai introduces the VECTOR data type, enabling you to store AI embeddings alongside traditional business data. Understanding how to perform Data Manipulation Language (DML) and Data Definition Language (DDL) operations on vector columns is essential for building effective AI-powered applications. This guide covers everything you need to know about working with vector columns, including supported operations and important restrictions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the VECTOR Data Type
&lt;/h2&gt;

&lt;p&gt;The VECTOR data type can store vectors with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Arbitrary number of dimensions&lt;/strong&gt;: From 1 to 65,535 dimensions (65,528 for BINARY format, where the dimension count must be a multiple of 8)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple formats&lt;/strong&gt;: INT8, FLOAT32, FLOAT64, or BINARY&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexible or fixed specifications&lt;/strong&gt;: Can be declared with or without constraints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Declaration Examples:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Flexible: any dimensions and format&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;flexible_vectors&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Fixed: specific dimensions and format&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;fixed_vectors&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
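
&lt;p&gt;Between the fully flexible and fully fixed declarations, a column can also constrain only one of the two properties by using &lt;code&gt;*&lt;/code&gt; as a wildcard. The sketch below uses hypothetical table names and assumes a release that supports the BINARY format:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Hypothetical table names, for illustration only

-- Fixed dimension count, any format
CREATE TABLE dims_only_vectors (
    id NUMBER,
    embedding VECTOR(768, *)
);

-- Any dimension count, fixed format
CREATE TABLE format_only_vectors (
    id NUMBER,
    embedding VECTOR(*, INT8)
);

-- BINARY format: the dimension count must be a multiple of 8
CREATE TABLE binary_vectors (
    id NUMBER,
    embedding VECTOR(1024, BINARY)
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;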



&lt;h2&gt;
  
  
  DML Operations on Vectors
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. INSERT Operations
&lt;/h3&gt;

&lt;p&gt;You can directly insert vectors into tables using several methods:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Method A: Insert Using Vector Literals&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Insert vector as array literal&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Laptop'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'[0.1, 0.8, 0.5, 0.3]'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Insert with TO_VECTOR function&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Mouse'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TO_VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'[0.2, 0.7, 0.4, 0.6]'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
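
&lt;p&gt;When the column is declared without constraints, it can help to state the expected shape at insert time and check what was stored. A minimal sketch, assuming the same &lt;code&gt;products&lt;/code&gt; table; the row values are made up for illustration:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- TO_VECTOR accepts an optional dimension count and format
-- (product_id 4 and 'Monitor' are illustrative values)
INSERT INTO products (product_id, name, embedding)
VALUES (4, 'Monitor', TO_VECTOR('[0.3, 0.6, 0.2, 0.9]', 4, FLOAT32));

-- Inspect the dimension count and storage format of the stored vector
SELECT product_id,
       VECTOR_DIMENSION_COUNT(embedding)  AS dims,
       VECTOR_DIMENSION_FORMAT(embedding) AS fmt
FROM products
WHERE product_id = 4;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;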



&lt;p&gt;&lt;strong&gt;Method B: Insert with Embedding Generation&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Generate embeddings during insert&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'Keyboard'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'Mechanical gaming keyboard with RGB'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR_EMBEDDING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_model&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="s1"&gt;'Mechanical gaming keyboard with RGB'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Method C: Insert from Another Table&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Copy vectors from another table&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;products_archive&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;created_date&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;ADD_MONTHS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SYSDATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. UPDATE Operations
&lt;/h3&gt;

&lt;p&gt;You can directly update vector columns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Update with vector literal&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'[0.15, 0.85, 0.55, 0.35]'&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Update using embedding generation&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;VECTOR_EMBEDDING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_model&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Conditional update&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;VECTOR_EMBEDDING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_model&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Electronics'&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. DELETE Operations
&lt;/h3&gt;

&lt;p&gt;Delete operations work normally with tables containing vector columns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Delete specific rows&lt;/span&gt;
&lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Delete based on relational criteria&lt;/span&gt;
&lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;created_date&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;ADD_MONTHS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SYSDATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'ARCHIVED'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Delete all rows&lt;/span&gt;
&lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- or (DDL: commits implicitly and cannot be rolled back)&lt;/span&gt;
&lt;span class="k"&gt;TRUNCATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Loading Data Using SQL*Loader
&lt;/h3&gt;

&lt;p&gt;SQL*Loader can load vector data from external files:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Control File Example (products.ctl):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LOAD DATA
INFILE 'products.dat'
INTO TABLE products
-- The quoted vector field contains commas, so allow enclosing quotes
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(
    product_id,
    name,
    description,
    embedding CHAR(4000)
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Data File Example (products.dat):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1,Laptop,High-performance laptop,"[0.1,0.8,0.5,0.3,0.6]"
2,Mouse,Wireless gaming mouse,"[0.2,0.7,0.4,0.6,0.5]"
3,Keyboard,Mechanical keyboard,"[0.15,0.85,0.55,0.35,0.65]"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Load Command:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;sqlldr &lt;span class="nv"&gt;userid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;username/password@db &lt;span class="nv"&gt;control&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;products.ctl &lt;span class="nv"&gt;log&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;products.log
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
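
&lt;p&gt;After the load finishes, review &lt;code&gt;products.log&lt;/code&gt; for rejected records, then confirm that the vectors arrived with the dimension count you expect. One possible check, assuming the &lt;code&gt;products&lt;/code&gt; table from the control file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Count loaded rows per dimension count; more than one group
-- usually points to malformed or mixed input data
SELECT VECTOR_DIMENSION_COUNT(embedding) AS dims,
       COUNT(*)                          AS row_count
FROM products
GROUP BY VECTOR_DIMENSION_COUNT(embedding);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;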



&lt;h3&gt;
  
  
  5. DML on Tables with HNSW Indexes
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Important Update (23ai Release 23.6+):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In Oracle Database 23ai releases 23.4 and 23.5, DML operations were not allowed on tables with HNSW indexes. Starting with Release 23.6, this restriction has been lifted:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Transactional consistency&lt;/strong&gt;: DML modifications are now supported on tables with HNSW indexes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAC support&lt;/strong&gt;: Guarantees transactional consistency even in Oracle RAC environments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistent results&lt;/strong&gt;: Vector search queries using HNSW indexes see transactionally consistent results based on their read snapshot
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- These operations now work with HNSW indexes (23.6+)&lt;/span&gt;
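&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;101&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'New Product'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'[0.1, 0.2, 0.3]'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;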
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'[0.2, 0.3, 0.4]'&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;101&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;101&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
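
&lt;p&gt;For context, this is roughly how an HNSW index is created and queried. The index name is hypothetical, and the sketch assumes that the vector pool is configured (&lt;code&gt;VECTOR_MEMORY_SIZE&lt;/code&gt;) and that every row in &lt;code&gt;embedding&lt;/code&gt; has the same dimension count as the query vector:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Create an in-memory HNSW index (index name is illustrative)
CREATE VECTOR INDEX products_hnsw_idx ON products (embedding)
ORGANIZATION INMEMORY NEIGHBOR GRAPH
DISTANCE COSINE
WITH TARGET ACCURACY 95;

-- Approximate similarity search that can use the index
SELECT product_id, name
FROM products
ORDER BY VECTOR_DISTANCE(embedding, TO_VECTOR('[0.1, 0.2, 0.3]'), COSINE)
FETCH APPROX FIRST 5 ROWS ONLY;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;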



&lt;h2&gt;
  
  
  Vector DDL Operations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Tables with Multiple Vector Columns
&lt;/h3&gt;

&lt;p&gt;Tables can have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;More than one column&lt;/strong&gt; of VECTOR data type&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Different formats and dimensions&lt;/strong&gt; in different columns
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;multimedia_content&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;content_id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="n"&gt;VARCHAR2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="k"&gt;CLOB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;-- Text embedding: 384 dimensions, FLOAT32&lt;/span&gt;
    &lt;span class="n"&gt;text_embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="c1"&gt;-- Image embedding: 512 dimensions, FLOAT32&lt;/span&gt;
    &lt;span class="n"&gt;image_embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="c1"&gt;-- Audio embedding: 256 dimensions, INT8&lt;/span&gt;
    &lt;span class="n"&gt;audio_embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;INT8&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="c1"&gt;-- Flexible dimension column&lt;/span&gt;
    &lt;span class="n"&gt;metadata_embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
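
&lt;p&gt;Each vector column is searched independently, with a query vector that matches that column's dimensions. A sketch using hypothetical bind variables (&lt;code&gt;:text_query_vec&lt;/code&gt; with 384 dimensions, &lt;code&gt;:image_query_vec&lt;/code&gt; with 512):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Nearest content by text similarity (bind name is illustrative)
SELECT content_id, title
FROM multimedia_content
ORDER BY VECTOR_DISTANCE(text_embedding, :text_query_vec, COSINE)
FETCH FIRST 5 ROWS ONLY;

-- Nearest content by image similarity (bind name is illustrative)
SELECT content_id, title
FROM multimedia_content
ORDER BY VECTOR_DISTANCE(image_embedding, :image_query_vec, COSINE)
FETCH FIRST 5 ROWS ONLY;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;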



&lt;h3&gt;
  
  
  Adding Vector Columns
&lt;/h3&gt;

&lt;p&gt;You can add vector columns to existing tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Add vector column with specific dimensions&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="n"&gt;description_vector&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Add flexible vector column&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt;
&lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="n"&gt;preference_vector&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Add with default NULL&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;
&lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="n"&gt;content_vector&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After Adding, Populate the Column:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Generate embeddings for existing data&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;description_vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;VECTOR_EMBEDDING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_model&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Dropping Vector Columns
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Drop a specific vector column&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;description_vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Drop multiple columns&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_vector&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Dropping Tables with Vector Columns
&lt;/h3&gt;

&lt;p&gt;Tables containing vector columns can be dropped normally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Drop table&lt;/span&gt;
&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Drop table with CASCADE CONSTRAINTS&lt;/span&gt;
&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="k"&gt;CASCADE&lt;/span&gt; &lt;span class="k"&gt;CONSTRAINTS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Drop and purge from recycle bin&lt;/span&gt;
&lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="n"&gt;PURGE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Prohibited Operations on Vector Columns
&lt;/h2&gt;

&lt;p&gt;Oracle Database 23ai restricts where and how vector columns can be used. Account for these limits early in schema design, because several of them rule out common table, partitioning, and constraint patterns.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Table and Storage Restrictions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cannot Define Vector Columns In:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;External Tables:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- This will fail&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;ext_vectors&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ORGANIZATION&lt;/span&gt; &lt;span class="k"&gt;EXTERNAL&lt;/span&gt; &lt;span class="p"&gt;(...);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; As of Oracle Database 26ai, external tables CAN be created with VECTOR columns, allowing vector embeddings in text or binary format stored in external files to be rendered as the VECTOR data type. Check your database version for availability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Index-Organized Tables (IOTs):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Cannot use as primary key&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;iot_vectors&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;-- Not allowed&lt;/span&gt;
    &lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="n"&gt;VARCHAR2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ORGANIZATION&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Cannot use as non-key column either&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;iot_vectors&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;-- Not allowed&lt;/span&gt;
    &lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="n"&gt;VARCHAR2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ORGANIZATION&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Clusters and Cluster Tables:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Vectors cannot be part of clusters&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;CLUSTER&lt;/span&gt; &lt;span class="n"&gt;vector_cluster&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;  &lt;span class="c1"&gt;-- Not allowed&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Global Temporary Tables:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- This will fail&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;GLOBAL&lt;/span&gt; &lt;span class="k"&gt;TEMPORARY&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;temp_vectors&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;  &lt;span class="c1"&gt;-- Not allowed&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;COMMIT&lt;/span&gt; &lt;span class="k"&gt;DELETE&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Manual Segment Space Management (MSSM) Tablespaces:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Only the SYS user can create vectors as BasicFiles in MSSM tablespaces. Regular users should use Automatic Segment Space Management (ASSM) tablespaces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Create ASSM tablespace for vectors&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;TABLESPACE&lt;/span&gt; &lt;span class="n"&gt;vector_data&lt;/span&gt;
&lt;span class="n"&gt;DATAFILE&lt;/span&gt; &lt;span class="s1"&gt;'/u01/app/oracle/oradata/vector_data01.dbf'&lt;/span&gt; &lt;span class="k"&gt;SIZE&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="k"&gt;G&lt;/span&gt;
&lt;span class="n"&gt;SEGMENT&lt;/span&gt; &lt;span class="k"&gt;SPACE&lt;/span&gt; &lt;span class="n"&gt;MANAGEMENT&lt;/span&gt; &lt;span class="n"&gt;AUTO&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;-- Required for non-SYS users&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Partitioning Restrictions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Partitioning and Sub-partitioning Keys:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Vector columns cannot be partition or sub-partition keys&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;sales_data&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;sale_id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sale_date&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;RANGE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sale_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;SUBPARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;HASH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;-- Not allowed&lt;/span&gt;
&lt;span class="p"&gt;(...);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Vectors in Partitioned Tables (Allowed):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Vectors CAN exist in partitioned tables&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;sales_data&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;sale_id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sale_date&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;product_name&lt;/span&gt; &lt;span class="n"&gt;VARCHAR2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="k"&gt;RANGE&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sale_date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p2023&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;DATE&lt;/span&gt; &lt;span class="s1"&gt;'2024-01-01'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;PARTITION&lt;/span&gt; &lt;span class="n"&gt;p2024&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="k"&gt;LESS&lt;/span&gt; &lt;span class="k"&gt;THAN&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;DATE&lt;/span&gt; &lt;span class="s1"&gt;'2025-01-01'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Constraint Restrictions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Primary Keys:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;  &lt;span class="c1"&gt;-- Not allowed&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Foreign Keys:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;related&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;vector_ref&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;-- Not allowed&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Unique Constraints:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt;  &lt;span class="c1"&gt;-- Not allowed&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Check Constraints:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;CHECK&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;-- Not allowed&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Default Values:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="s1"&gt;'[0,0,0,0]'&lt;/span&gt;  &lt;span class="c1"&gt;-- Not allowed&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Column Modification Restrictions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cannot Modify Vector Column Definition:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Cannot change dimensions or format&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;MODIFY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT64&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;  &lt;span class="c1"&gt;-- Not allowed&lt;/span&gt;

&lt;span class="c1"&gt;-- Cannot change to/from VECTOR type&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;MODIFY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="k"&gt;CLOB&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;-- Not allowed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Workaround - Add New Column:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Instead, add a new column&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;ADD&lt;/span&gt; &lt;span class="n"&gt;embedding_new&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT64&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Migrate data: regenerate embeddings with a model whose output matches the new column (512 dimensions); "model" is a placeholder name&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;embedding_new&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;VECTOR_EMBEDDING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Drop old column&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="k"&gt;DROP&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Rename new column&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="k"&gt;RENAME&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;embedding_new&lt;/span&gt; &lt;span class="k"&gt;TO&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
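
&lt;p&gt;Before dropping the old column in the workaround above, it may be worth confirming that every row was migrated and that the new vectors have the expected shape. A minimal sketch, assuming the products table and the embedding and embedding_new columns from that example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Rows that had a vector before but did not receive a new one (ideally 0)
SELECT COUNT(*) AS unmigrated_rows
FROM products
WHERE embedding IS NOT NULL
  AND embedding_new IS NULL;

-- Dimension count and storage format actually present in the new column
SELECT VECTOR_DIMENSION_COUNT(embedding_new)  AS dims,
       VECTOR_DIMENSION_FORMAT(embedding_new) AS fmt,
       COUNT(*)                               AS row_count
FROM products
WHERE embedding_new IS NOT NULL
GROUP BY VECTOR_DIMENSION_COUNT(embedding_new),
         VECTOR_DIMENSION_FORMAT(embedding_new);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;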



&lt;h3&gt;
  
  
  5. Index Restrictions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Non-Vector Indexes:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Vector columns cannot be part of traditional indexes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- B-tree index&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_embedding&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;-- Not allowed&lt;/span&gt;

&lt;span class="c1"&gt;-- Bitmap index&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;BITMAP&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_embedding&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;-- Not allowed&lt;/span&gt;

&lt;span class="c1"&gt;-- Reverse key index&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_embedding&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;REVERSE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;-- Not allowed&lt;/span&gt;

&lt;span class="c1"&gt;-- Function-based index&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_embedding&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;UPPER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;  &lt;span class="c1"&gt;-- Not allowed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Vector Indexes Only:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- HNSW vector index&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_hnsw&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ORGANIZATION&lt;/span&gt; &lt;span class="n"&gt;INMEMORY&lt;/span&gt; &lt;span class="n"&gt;NEIGHBOR&lt;/span&gt; &lt;span class="n"&gt;GRAPH&lt;/span&gt;
&lt;span class="n"&gt;DISTANCE&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;TARGET&lt;/span&gt; &lt;span class="n"&gt;ACCURACY&lt;/span&gt; &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- IVF vector index&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_ivf&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ORGANIZATION&lt;/span&gt; &lt;span class="n"&gt;NEIGHBOR&lt;/span&gt; &lt;span class="n"&gt;PARTITIONS&lt;/span&gt;
&lt;span class="n"&gt;DISTANCE&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;TARGET&lt;/span&gt; &lt;span class="n"&gt;ACCURACY&lt;/span&gt; &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
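
&lt;p&gt;Once a vector index exists, an approximate top-k query is the kind of statement that can use it. A minimal sketch, assuming the products table above and a hypothetical bind variable :query_vec that holds the query vector as text with the same number of dimensions as the column. Whether the optimizer actually chooses the index is best confirmed from the execution plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Approximate top-10 search; the distance metric matches the index definition (COSINE)
-- :query_vec is a hypothetical bind variable, e.g. '[0.12, -0.03, ...]' supplied by the application
SELECT product_name
FROM products
ORDER BY VECTOR_DISTANCE(embedding, TO_VECTOR(:query_vec), COSINE)
FETCH APPROX FIRST 10 ROWS ONLY;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;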



&lt;h3&gt;
  
  
  6. Other Restrictions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Continuous Query Notification (CQN):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Vector columns are not supported in CQN queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Comparison Operators:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Standard comparison operators cannot be applied to vector columns:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- These will fail&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'[0.1, 0.2]'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'[0.1, 0.2]'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="s1"&gt;'[0.1, 0.2]'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use Vector Distance Functions Instead:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Correct approach&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;product_name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'[0.1, 0.2, 0.3]'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
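
&lt;p&gt;Where an equality test would otherwise be used, for example to find rows whose vector is effectively identical to a given one, one option is to filter on the distance value itself, which is an ordinary number. A sketch, assuming a hypothetical bind variable :query_vec and a threshold chosen purely for illustration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- Rows whose embedding lies within a small Euclidean distance of the query vector
-- 0.001 is an arbitrary threshold; a suitable value depends on the embedding model
SELECT product_name
FROM products
WHERE VECTOR_DISTANCE(embedding, TO_VECTOR(:query_vec), EUCLIDEAN) &amp;lt; 0.001;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;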



&lt;h2&gt;
  
  
  Best Practices for Vector DML and DDL
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Design Considerations
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Good: Specify dimensions and format upfront&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="n"&gt;VARCHAR2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;-- Clear specification&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Less optimal: flexible vectors are harder to optimize&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="n"&gt;VARCHAR2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;  &lt;span class="c1"&gt;-- May lead to performance issues&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Batch Operations
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Efficient: Batch update&lt;/span&gt;
&lt;span class="k"&gt;UPDATE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;VECTOR_EMBEDDING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_model&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Less efficient: Row-by-row updates in loop&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
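
&lt;p&gt;For very large tables, a single set-based UPDATE can also be split into committed chunks so that each transaction stays a manageable size. A sketch under stated assumptions: the 10000-row chunk size is arbitrary, doc_model and description are the names used in the example above, and the model is assumed to return a vector for every non-null description:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;BEGIN
    LOOP
        -- Embed up to 10000 rows per pass (chunk size is an arbitrary choice)
        UPDATE products
        SET embedding = VECTOR_EMBEDDING(doc_model USING description AS data)
        WHERE embedding IS NULL
          AND description IS NOT NULL  -- skip rows with nothing to embed so the loop can finish
          AND ROWNUM &amp;lt;= 10000;

        EXIT WHEN SQL%ROWCOUNT = 0;  -- no rows left to process
        COMMIT;                      -- keep each transaction small
    END LOOP;
END;
/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;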



&lt;h3&gt;
  
  
  3. Transaction Management
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Good practice: Commit after large operations&lt;/span&gt;
&lt;span class="k"&gt;BEGIN&lt;/span&gt;
    &lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;products_archive&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;created_date&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt; &lt;span class="s1"&gt;'2023-01-01'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;COMMIT&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;END&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Use ASSM Tablespaces
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Create tablespace with ASSM for vector tables&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;TABLESPACE&lt;/span&gt; &lt;span class="n"&gt;vector_ts&lt;/span&gt;
&lt;span class="n"&gt;DATAFILE&lt;/span&gt; &lt;span class="s1"&gt;'/u01/oradata/vector_ts01.dbf'&lt;/span&gt; &lt;span class="k"&gt;SIZE&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="k"&gt;G&lt;/span&gt;
&lt;span class="n"&gt;AUTOEXTEND&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="k"&gt;NEXT&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="k"&gt;G&lt;/span&gt; &lt;span class="n"&gt;MAXSIZE&lt;/span&gt; &lt;span class="n"&gt;UNLIMITED&lt;/span&gt;
&lt;span class="n"&gt;SEGMENT&lt;/span&gt; &lt;span class="k"&gt;SPACE&lt;/span&gt; &lt;span class="n"&gt;MANAGEMENT&lt;/span&gt; &lt;span class="n"&gt;AUTO&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Create table in ASSM tablespace&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;TABLESPACE&lt;/span&gt; &lt;span class="n"&gt;vector_ts&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5. Monitor Vector Column Usage
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- List vector columns in the current schema&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="k"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;column_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;data_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;nullable&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;user_tab_columns&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;data_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'VECTOR'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Check table size with vectors&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;segment_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;segment_type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;bytes&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;size_mb&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;user_segments&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;segment_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'PRODUCTS'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;DML Operations Supported:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;INSERT vectors directly or with embedding generation&lt;/li&gt;
&lt;li&gt;UPDATE vector columns&lt;/li&gt;
&lt;li&gt;DELETE rows from tables with vectors&lt;/li&gt;
&lt;li&gt;Load data using SQL*Loader&lt;/li&gt;
&lt;li&gt;DML on tables with HNSW indexes (23.6+)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;DDL Operations Supported:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create tables with multiple vector columns&lt;/li&gt;
&lt;li&gt;Different formats and dimensions per column&lt;/li&gt;
&lt;li&gt;ADD vector columns to existing tables&lt;/li&gt;
&lt;li&gt;DROP vector columns&lt;/li&gt;
&lt;li&gt;DROP tables containing vectors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Restrictions (vector columns are not supported in or with):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;External tables (restriction lifted in 26ai+)&lt;/li&gt;
&lt;li&gt;Index-Organized Tables (IOTs)&lt;/li&gt;
&lt;li&gt;Clusters and cluster tables&lt;/li&gt;
&lt;li&gt;Global temporary tables&lt;/li&gt;
&lt;li&gt;Sub-partitioning keys&lt;/li&gt;
&lt;li&gt;Primary keys, foreign keys, unique constraints&lt;/li&gt;
&lt;li&gt;Check constraints, default values&lt;/li&gt;
&lt;li&gt;Column modification (dimensions/format)&lt;/li&gt;
&lt;li&gt;MSSM tablespaces (non-SYS users)&lt;/li&gt;
&lt;li&gt;Non-vector indexes (B-tree, bitmap, etc.)&lt;/li&gt;
&lt;li&gt;Standard comparison operators (=, &amp;gt;, &amp;lt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding these operations and restrictions helps you design and manage vector-enabled applications in Oracle Database 23ai that combine AI embeddings with traditional relational data operations.&lt;/p&gt;

</description>
      <category>oracle</category>
      <category>oracledatabase23ai</category>
      <category>vectordatabase</category>
      <category>vectoroperations</category>
    </item>
    <item>
      <title>Data Platforms with Bruin</title>
      <dc:creator>Ryan Giggs</dc:creator>
      <pubDate>Tue, 24 Feb 2026 09:04:29 +0000</pubDate>
      <link>https://forem.com/derrickryangiggs/data-platforms-with-bruin-2k3k</link>
      <guid>https://forem.com/derrickryangiggs/data-platforms-with-bruin-2k3k</guid>
      <description>&lt;p&gt;Week 5 of Data Engineering Zoomcamp by @DataTalksClub complete&lt;/p&gt;

&lt;p&gt;Just finished Module 5 - Data Platforms with Bruin. Learned how to:&lt;/p&gt;

&lt;p&gt;✅ Build end-to-end ELT pipelines with Bruin&lt;br&gt;
✅ Configure environments and connections&lt;br&gt;
✅ Use materialization strategies for incremental processing&lt;br&gt;
✅ Add data quality checks to ensure data integrity&lt;br&gt;
✅ Deploy pipelines from local to cloud (BigQuery)&lt;/p&gt;

&lt;p&gt;Modern data platforms in a single CLI tool - no vendor lock-in&lt;/p&gt;

&lt;p&gt;Here's my homework solution: &lt;a href="https://github.com/Derrick-Ryan-Giggs/bruin-taxi-pipeline-homework" rel="noopener noreferrer"&gt;https://github.com/Derrick-Ryan-Giggs/bruin-taxi-pipeline-homework&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Following along with this amazing free course - who else is learning data engineering?&lt;/p&gt;

&lt;p&gt;You can sign up here: &lt;a href="https://github.com/DataTalksClub/data-engineering-zoomcamp/" rel="noopener noreferrer"&gt;https://github.com/DataTalksClub/data-engineering-zoomcamp/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>googlecloud</category>
      <category>bigquery</category>
      <category>bruin</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Oracle AI Vector Search: Querying Vectors and Optimizing with Indexes</title>
      <dc:creator>Ryan Giggs</dc:creator>
      <pubDate>Mon, 16 Feb 2026 08:23:08 +0000</pubDate>
      <link>https://forem.com/derrickryangiggs/oracle-ai-vector-search-querying-vectors-and-optimizing-with-indexes-27a5</link>
      <guid>https://forem.com/derrickryangiggs/oracle-ai-vector-search-querying-vectors-and-optimizing-with-indexes-27a5</guid>
      <description>&lt;p&gt;Oracle Database 23ai's AI Vector Search enables powerful semantic search capabilities, but to truly harness its potential, you need to understand how to query vector data efficiently and when to use vector indexes. This guide explores querying vectors and optimizing performance with specialized vector indexes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running Basic Queries on Vectors
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Combining Vector and Relational Data
&lt;/h3&gt;

&lt;p&gt;One of Oracle AI Vector Search's most powerful features is the ability to combine vector data with traditional relational data in a single query. This unified approach eliminates data fragmentation and allows you to perform semantic searches alongside business logic filters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Basic Query Pattern:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Create table with both relational and vector data&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;product_name&lt;/span&gt; &lt;span class="n"&gt;VARCHAR2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="k"&gt;CLOB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="n"&gt;VARCHAR2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;in_stock&lt;/span&gt; &lt;span class="n"&gt;VARCHAR2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;description_vector&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Insert sample data&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="mi"&gt;1001&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'Wireless Headphones'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'Premium noise-cancelling Bluetooth headphones with 30-hour battery life'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="mi"&gt;299&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'Electronics'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s1"&gt;'Y'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR_EMBEDDING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_model&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; 
        &lt;span class="s1"&gt;'Premium noise-cancelling Bluetooth headphones with 30-hour battery life'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Query combining vector similarity with relational filters&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;description_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;VECTOR_EMBEDDING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc_model&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="s1"&gt;'wireless audio device'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;COSINE&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;similarity_score&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;in_stock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Y'&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Electronics'&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;similarity_score&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Selecting Vector Data
&lt;/h3&gt;

&lt;p&gt;You can select vector columns like any other column, either with the asterisk (*) wildcard or by name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Select all columns including vectors&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Select specific columns including the vector&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;description_vector&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Inspect the dimension count and storage format of each vector&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR_DIMENSION_COUNT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description_vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;dimensions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR_DIMENSION_FORMAT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description_vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;format&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Important Restriction: No Comparison Operators Between Vectors
&lt;/h3&gt;

&lt;p&gt;Standard comparison operators (such as =, &amp;gt;, &amp;lt;, &amp;gt;=, &amp;lt;=, !=) are not allowed between vectors, so a vector cannot be tested for equality or ordering against another vector.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Invalid Operations:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- These will cause errors&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;description_vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;other_vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;description_vector&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;other_vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;description_vector&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;other_vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Valid Operations:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead, use specialized vector distance functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Correct: Use VECTOR_DISTANCE function&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Correct: Use shorthand operators&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;product_name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;description_vector&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;  &lt;span class="c1"&gt;-- Cosine distance&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
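&lt;p&gt;For intuition about what a cosine distance measures, here is a minimal Python sketch. It assumes the usual definition (one minus the cosine similarity of the two vectors); the function name and sample vectors are illustrative and not part of Oracle's API.&lt;/p&gt;

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 0.0]
print(cosine_distance(query, [2.0, 0.0]))   # same direction, different length
print(cosine_distance(query, [0.0, 3.0]))   # orthogonal
print(cosine_distance(query, [-1.0, 0.0]))  # opposite direction
```

&lt;p&gt;Smaller values mean the vectors point in more similar directions, which is why the queries above sort by distance in ascending order.&lt;/p&gt;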



&lt;h2&gt;
  
  
  Understanding Vector Indexes
&lt;/h2&gt;

&lt;p&gt;Vector indexes are specialized data structures that dramatically improve query performance on vector data. Without an index, every similarity search compares the query against all vectors in the table, which becomes prohibitively slow as datasets grow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Vector Indexes Matter
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exhaustive Search&lt;/strong&gt;: Compares query vector against every vector in the table&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Time Complexity&lt;/strong&gt;: O(n) - grows linearly with dataset size&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real Impact&lt;/strong&gt;: Searches that take seconds can become hours-long operations&lt;/li&gt;
&lt;/ul&gt;
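&lt;p&gt;The exhaustive search described above can be sketched in a few lines of Python. This is an illustration only: the row data and helper names are made up, and squared Euclidean distance stands in for whichever metric you use. The point is that every row is scored on every query.&lt;/p&gt;

```python
import heapq

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def exhaustive_top_k(query, rows, k):
    """Score every (name, vector) row against the query and keep the k closest."""
    scored = ((squared_distance(query, vec), name) for name, vec in rows)
    return [name for _, name in heapq.nsmallest(k, scored)]

rows = [
    ("headphones", [0.9, 0.1]),
    ("speaker", [0.8, 0.3]),
    ("blender", [0.1, 0.9]),
    ("toaster", [0.2, 0.8]),
]
print(exhaustive_top_k([1.0, 0.0], rows, 2))  # ['headphones', 'speaker']
```

&lt;p&gt;The work grows in step with the number of rows, which is the O(n) cost listed above.&lt;/p&gt;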

&lt;p&gt;&lt;strong&gt;The Solution:&lt;/strong&gt;&lt;br&gt;
Vector indexes use sophisticated techniques to reduce the search space:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Clustering&lt;/strong&gt;: Groups similar vectors together&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partitioning&lt;/strong&gt;: Divides the vector space into manageable regions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Neighbor Graphs&lt;/strong&gt;: Creates efficient navigation paths through vector space&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Result:&lt;/strong&gt;&lt;br&gt;
Vector indexes shrink the search space so that an approximate search touches only a fraction of the stored vectors. On large datasets this can cut search time dramatically, in exchange for a small, configurable loss of accuracy.&lt;/p&gt;
&lt;h2&gt;
  
  
  Types of Vector Indexes
&lt;/h2&gt;

&lt;p&gt;Oracle AI Vector Search supports three types of vector indexes:&lt;/p&gt;
&lt;h3&gt;
  
  
  1. In-Memory Neighbor Graph (HNSW) Index
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;HNSW (Hierarchical Navigable Small World)&lt;/strong&gt; is a graph-based index that provides the fastest query performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In-memory only index that can require a lot of memory for large datasets&lt;/li&gt;
&lt;li&gt;Extremely fast query performance (milliseconds)&lt;/li&gt;
&lt;li&gt;Uses hierarchical layers for efficient navigation&lt;/li&gt;
&lt;li&gt;Best for datasets that fit in available memory&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Creating an HNSW Index:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;product_hnsw_idx&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description_vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ORGANIZATION&lt;/span&gt; &lt;span class="n"&gt;INMEMORY&lt;/span&gt; &lt;span class="n"&gt;NEIGHBOR&lt;/span&gt; &lt;span class="n"&gt;GRAPH&lt;/span&gt;
&lt;span class="n"&gt;DISTANCE&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;TARGET&lt;/span&gt; &lt;span class="n"&gt;ACCURACY&lt;/span&gt; &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
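&lt;p&gt;To give a feel for how a neighbor graph avoids scanning every vector, here is a heavily simplified Python sketch. It uses a single layer and a plain greedy walk, whereas real HNSW implementations use several layers and candidate lists. The graph, vectors, and function name are invented for illustration and do not reflect Oracle's internals.&lt;/p&gt;

```python
def greedy_graph_search(graph, vectors, entry, query):
    """Walk from the entry node to the closest neighbor until no neighbor is closer."""
    def dist(node):
        return sum((x - y) ** 2 for x, y in zip(vectors[node], query))

    current = entry
    hops = 0
    while True:
        best = min(graph[current], key=dist)
        if dist(best) >= dist(current):
            return current, hops  # local minimum reached
        current = best
        hops += 1

# Hypothetical one-dimensional vectors and their neighbor lists
vectors = {"a": [0.0], "b": [1.0], "c": [2.0], "d": [3.0], "e": [4.0]}
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c", "e"],
    "e": ["c", "d"],
}
print(greedy_graph_search(graph, vectors, "a", [3.9]))  # ('e', 2)
```

&lt;p&gt;The walk reaches the nearest vector in two hops by following links rather than scoring every node, which is the intuition behind graph-based indexes.&lt;/p&gt;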



&lt;p&gt;&lt;strong&gt;Memory Requirements:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To estimate the memory needed for an HNSW index, use this rough formula: 1.3 × number of vectors × number of dimensions × size in bytes of the dimension type (for example, 4 bytes for FLOAT32).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Calculation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1 million vectors × 384 dimensions × 4 bytes (FLOAT32) × 1.3 = ~2 GB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
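&lt;p&gt;The same estimate can be wrapped in a small helper for trying out other dataset sizes. This is a sketch of the rule of thumb above, not an exact sizing tool. The function name is hypothetical, and the byte sizes assume 1, 4, and 8 bytes per dimension for INT8, FLOAT32, and FLOAT64.&lt;/p&gt;

```python
# Assumed storage size per dimension, in bytes
BYTES_PER_DIMENSION = {"INT8": 1, "FLOAT32": 4, "FLOAT64": 8}

def estimate_hnsw_bytes(num_vectors, dimensions, fmt="FLOAT32", overhead=1.3):
    """Rule-of-thumb HNSW size: overhead x vectors x dimensions x bytes per dimension."""
    return round(overhead * num_vectors * dimensions * BYTES_PER_DIMENSION[fmt])

size = estimate_hnsw_bytes(1_000_000, 384)
print(size)                         # 1996800000 bytes
print(round(size / 1e9, 2), "GB")   # 2.0 GB
```

&lt;p&gt;Switching the format to INT8 in this sketch cuts the estimate to a quarter, which shows how much the dimension type affects the memory budget.&lt;/p&gt;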



&lt;h3&gt;
  
  
  2. Neighbor Partition (IVF) Index
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;IVF (Inverted File Flat)&lt;/strong&gt; is a storage-based index that balances search quality with reasonable speed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Characteristics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Storage-based index not constrained by memory like HNSW&lt;/li&gt;
&lt;li&gt;Uses buffer cache and disk storage&lt;/li&gt;
&lt;li&gt;Can be used for very large datasets and still provide excellent performance compared to exhaustive similarity search&lt;/li&gt;
&lt;li&gt;Supports global and local partitioning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Creating an IVF Index:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;product_ivf_idx&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description_vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ORGANIZATION&lt;/span&gt; &lt;span class="n"&gt;NEIGHBOR&lt;/span&gt; &lt;span class="n"&gt;PARTITIONS&lt;/span&gt;
&lt;span class="n"&gt;DISTANCE&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;TARGET&lt;/span&gt; &lt;span class="n"&gt;ACCURACY&lt;/span&gt; &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
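&lt;p&gt;The partitioning idea behind an IVF index can be sketched as follows. This is a conceptual Python illustration under stated assumptions: the centroids are taken as given (a real index derives them by clustering), and the function names and the &lt;code&gt;probes&lt;/code&gt; parameter are hypothetical rather than Oracle settings.&lt;/p&gt;

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_partitions(vectors, centroids):
    """Assign each vector id to the partition of its nearest centroid."""
    partitions = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda i: sq_dist(vec, centroids[i]))
        partitions[nearest].append(vec_id)
    return partitions

def ivf_search(query, vectors, centroids, partitions, k, probes=1):
    """Search only the partitions whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(query, centroids[i]))
    candidates = [vid for i in order[:probes] for vid in partitions[i]]
    return sorted(candidates, key=lambda vid: sq_dist(query, vectors[vid]))[:k]

vectors = {"p1": [0.1, 0.1], "p2": [0.2, 0.0], "p3": [0.9, 1.0], "p4": [1.0, 0.8]}
centroids = [[0.0, 0.0], [1.0, 1.0]]  # assumed output of a prior clustering step
partitions = build_partitions(vectors, centroids)
print(partitions)  # {0: ['p1', 'p2'], 1: ['p3', 'p4']}
print(ivf_search([0.05, 0.1], vectors, centroids, partitions, k=2))  # ['p1', 'p2']
```

&lt;p&gt;With one probe, only half of the vectors are scored. Probing more partitions raises accuracy at the cost of more work, which is the trade-off a target accuracy setting controls.&lt;/p&gt;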



&lt;p&gt;&lt;strong&gt;When to Use IVF:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dataset is too large to fit in memory&lt;/li&gt;
&lt;li&gt;RAM is limited or constrained&lt;/li&gt;
&lt;li&gt;Need to support very large (billion-scale) datasets&lt;/li&gt;
&lt;li&gt;Working with partitioned tables&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Hybrid Vector Index
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Hybrid indexes&lt;/strong&gt; combine full-text search with semantic vector search in a single index.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creating a Hybrid Index:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;HYBRID&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;product_hybrid_idx&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;PARAMETERS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'MODEL doc_model'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single index for both keyword and semantic searches&lt;/li&gt;
&lt;li&gt;Run textual queries, vector similarity queries, or hybrid queries&lt;/li&gt;
&lt;li&gt;Ideal for applications needing both search approaches&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Vector Pool Memory Configuration
&lt;/h2&gt;

&lt;p&gt;The Vector Pool is a memory area in the System Global Area (SGA) that stores Hierarchical Navigable Small World (HNSW) vector indexes and their associated metadata. Sizing it correctly is essential for HNSW index performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuring Vector Memory Size
&lt;/h3&gt;

&lt;h4&gt;
  
  
  At CDB (Container Database) Level
&lt;/h4&gt;

&lt;p&gt;At the CDB level, VECTOR_MEMORY_SIZE specifies the current size of the Vector Pool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Set vector memory size at CDB level&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;SYSTEM&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;VECTOR_MEMORY_SIZE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="k"&gt;G&lt;/span&gt; &lt;span class="k"&gt;SCOPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;BOTH&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; If the new size cannot be applied dynamically, set it with SCOPE=SPFILE and restart the database for the change to take effect.&lt;/p&gt;

&lt;h4&gt;
  
  
  At PDB (Pluggable Database) Level
&lt;/h4&gt;

&lt;p&gt;The maximum PDB VECTOR_MEMORY_SIZE value is limited to 70% of the PDB sga_target. PDBs inherit settings from the CDB but can specify their own maximum usage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Check current SGA target&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="k"&gt;PARAMETER&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'sga_target'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- PDB inherits from CDB and can set maximum&lt;/span&gt;
&lt;span class="c1"&gt;-- Maximum = 70% of PDB's sga_target&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
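&lt;p&gt;The 70% limit is simple arithmetic, sketched here in Python for a quick sanity check before choosing a PDB value. The function name is hypothetical and the 4 GiB figure is only an example input.&lt;/p&gt;

```python
def max_pdb_vector_memory(pdb_sga_target_bytes):
    """Upper bound for a PDB's VECTOR_MEMORY_SIZE: 70% of its sga_target."""
    return pdb_sga_target_bytes * 70 // 100

GIB = 1024 ** 3
limit = max_pdb_vector_memory(4 * GIB)
print(limit, "bytes, roughly", round(limit / GIB, 1), "GiB")
```

&lt;p&gt;If an HNSW index is estimated to need more than this limit, the options are to raise the PDB's sga_target or to consider a storage-based index instead.&lt;/p&gt;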



&lt;h3&gt;
  
  
  Automatic Vector Pool Growth
&lt;/h3&gt;

&lt;p&gt;Oracle 23ai supports automatic vector pool management:&lt;/p&gt;

&lt;p&gt;If VECTOR_MEMORY_SIZE is set to 1 and sga_target is greater than 0 at CDB initialization, creating an HNSW index automatically grows the vector memory pool to accommodate the new index.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Enable automatic growth&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;SYSTEM&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;VECTOR_MEMORY_SIZE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;SCOPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SPFILE&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Memory Considerations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Important Points:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HNSW indexes are stored in this pool along with their metadata&lt;/li&gt;
&lt;li&gt;HNSW indexes require significant RAM&lt;/li&gt;
&lt;li&gt;Available RAM limits how large an HNSW index can be&lt;/li&gt;
&lt;li&gt;IVF indexes rely mainly on the buffer cache and disk, so they depend far less on the Vector Pool&lt;/li&gt;
&lt;li&gt;IVF indexes still use memory, taken from the vector pool if one is defined or otherwise from the shared pool, to speed up index creation and DML operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Monitoring Vector Pool:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Check vector pool allocation and usage&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;pool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;alloc_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;used_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;populate_status&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;VECTOR_MEMORY_POOL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Check individual index memory usage&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="k"&gt;owner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;index_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;index_organization&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;allocated_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;used_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;num_vectors&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;VECTOR_INDEX&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Choosing the Right Index Type
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Decision Matrix
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;HNSW&lt;/th&gt;
&lt;th&gt;IVF&lt;/th&gt;
&lt;th&gt;Hybrid&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fastest (milliseconds)&lt;/td&gt;
&lt;td&gt;Fast (seconds)&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Memory&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High (in-memory)&lt;/td&gt;
&lt;td&gt;Low (disk-based)&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Accuracy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Highest (95-99%+)&lt;/td&gt;
&lt;td&gt;High (90-99%)&lt;/td&gt;
&lt;td&gt;Variable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dataset Size&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Small to Medium&lt;/td&gt;
&lt;td&gt;Large to Huge&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best For&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Speed-critical apps&lt;/td&gt;
&lt;td&gt;Memory-constrained environments&lt;/td&gt;
&lt;td&gt;Mixed query types&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Practical Guidelines
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Use HNSW When:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dataset fits comfortably in available memory&lt;/li&gt;
&lt;li&gt;Query speed is critical (real-time applications)&lt;/li&gt;
&lt;li&gt;You can allocate sufficient Vector Pool memory&lt;/li&gt;
&lt;li&gt;Working with up to millions of vectors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use IVF When:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dataset is too large for available memory&lt;/li&gt;
&lt;li&gt;Working with billions of vectors&lt;/li&gt;
&lt;li&gt;RAM is limited or expensive&lt;/li&gt;
&lt;li&gt;Using partitioned tables for very large datasets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use Hybrid When:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Application needs both keyword and semantic search&lt;/li&gt;
&lt;li&gt;Users perform mixed query types&lt;/li&gt;
&lt;li&gt;Want to simplify index management&lt;/li&gt;
&lt;/ul&gt;
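&lt;p&gt;These guidelines can be condensed into a small decision helper. The Python sketch below is one possible encoding of the rules above, not an official recommendation; the function name, parameters, and threshold logic are assumptions made for illustration.&lt;/p&gt;

```python
def choose_index(num_vectors, dimensions, bytes_per_dim, vector_pool_bytes,
                 needs_keyword_search=False):
    """Pick an index type from the guidelines: hybrid, in-memory HNSW, or IVF."""
    if needs_keyword_search:
        return "HYBRID"
    # Rule-of-thumb HNSW footprint: 1.3 x vectors x dimensions x bytes per dimension
    hnsw_bytes = 1.3 * num_vectors * dimensions * bytes_per_dim
    return "HNSW" if vector_pool_bytes >= hnsw_bytes else "IVF"

pool = 4 * 10**9  # example: 4 GB available for the vector pool
print(choose_index(1_000_000, 384, 4, pool))        # HNSW
print(choose_index(1_000_000_000, 384, 4, pool))    # IVF
print(choose_index(1_000_000, 384, 4, pool, True))  # HYBRID
```

&lt;p&gt;A real decision would also weigh latency targets and how quickly the data grows, so treat the result as a starting point.&lt;/p&gt;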

&lt;h2&gt;
  
  
  Performance Optimization Tips
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Index Configuration
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Target Accuracy:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Higher accuracy = slower but more precise&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;TARGET&lt;/span&gt; &lt;span class="n"&gt;ACCURACY&lt;/span&gt; &lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;-- Very precise&lt;/span&gt;

&lt;span class="c1"&gt;-- Lower accuracy = faster but less precise  &lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;TARGET&lt;/span&gt; &lt;span class="n"&gt;ACCURACY&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;-- Still good for most cases&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Balance accuracy against performance needs. A TARGET ACCURACY between 90 and 99 suits most workloads.&lt;/p&gt;
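&lt;p&gt;One way to reason about an accuracy figure is as recall: the share of the true nearest neighbors that an approximate search actually returns. The Python sketch below measures it for one query. The function name and the result lists are hypothetical, and reading target accuracy as recall is an interpretation rather than a guarantee.&lt;/p&gt;

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k results that the approximate search also returned."""
    exact = set(exact_ids)
    return len(exact.intersection(approx_ids)) / len(exact)

exact_top10 = list(range(10))                   # ids from an exact search
approx_top10 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 42]  # ids from an approximate search
print(recall_at_k(approx_top10, exact_top10))   # 0.9
```

&lt;p&gt;Running a sample of representative queries both ways and averaging this value gives an empirical check on whether the configured accuracy is being met.&lt;/p&gt;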

&lt;h3&gt;
  
  
  2. Memory Allocation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Allocate adequate memory for HNSW indexes&lt;/span&gt;
&lt;span class="c1"&gt;-- Use formula: 1.3 × vectors × dimensions × format_size&lt;/span&gt;

&lt;span class="c1"&gt;-- Example for 1M vectors, 384 dimensions, FLOAT32:&lt;/span&gt;
&lt;span class="c1"&gt;-- 1.3 × 1,000,000 × 384 × 4 = ~2 GB&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;SYSTEM&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;VECTOR_MEMORY_SIZE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="k"&gt;G&lt;/span&gt; &lt;span class="k"&gt;SCOPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;BOTH&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Query Optimization
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Use FETCH FIRST for better performance&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;similarity_score&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;SELECT&lt;/span&gt; 
        &lt;span class="n"&gt;product_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;similarity_score&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
    &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Electronics'&lt;/span&gt;  &lt;span class="c1"&gt;-- Apply filters early&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;similarity_score&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;-- Limit results&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Approximate vs. Exact Search
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Approximate search (uses index, faster)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;product_name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="n"&gt;APPROX&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Exact search (no index, 100% accurate, slower)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;product_name&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;query_vec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="n"&gt;EXACT&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use approximate search for most queries; reserve exact search for critical accuracy requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Example: E-commerce Product Search
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Step 1: Create table&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="n"&gt;VARCHAR2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="k"&gt;CLOB&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="n"&gt;VARCHAR2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;rating&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;reviews_count&lt;/span&gt; &lt;span class="n"&gt;NUMBER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;desc_vector&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;FLOAT32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;-- Step 2: Configure memory (1M products × 384 dims × 4 bytes × 1.3 = ~2GB)&lt;/span&gt;
&lt;span class="k"&gt;ALTER&lt;/span&gt; &lt;span class="k"&gt;SYSTEM&lt;/span&gt; &lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="n"&gt;VECTOR_MEMORY_SIZE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="k"&gt;G&lt;/span&gt; &lt;span class="k"&gt;SCOPE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;BOTH&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Step 3: Create HNSW index&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;VECTOR&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;products_hnsw_idx&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;desc_vector&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ORGANIZATION&lt;/span&gt; &lt;span class="n"&gt;INMEMORY&lt;/span&gt; &lt;span class="n"&gt;NEIGHBOR&lt;/span&gt; &lt;span class="n"&gt;GRAPH&lt;/span&gt;
&lt;span class="n"&gt;DISTANCE&lt;/span&gt; &lt;span class="n"&gt;COSINE&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;TARGET&lt;/span&gt; &lt;span class="n"&gt;ACCURACY&lt;/span&gt; &lt;span class="mi"&gt;95&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Step 4: Query with combined filters&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;VECTOR_DISTANCE&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;desc_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;VECTOR_EMBEDDING&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="s1"&gt;'comfortable running shoes'&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;COSINE&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;relevance&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Footwear'&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;rating&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;reviews_count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;relevance&lt;/span&gt;
&lt;span class="k"&gt;FETCH&lt;/span&gt; &lt;span class="k"&gt;FIRST&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt; &lt;span class="k"&gt;ROWS&lt;/span&gt; &lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
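
&lt;p&gt;The memory figure in Step 2 comes from the sizing rule of thumb used in this article (1.3 × vectors × dimensions × bytes per dimension). As a rough sketch, the arithmetic can be scripted before picking a value for VECTOR_MEMORY_SIZE. The helper name &lt;code&gt;estimate_hnsw_bytes&lt;/code&gt; and the &lt;code&gt;FORMAT_BYTES&lt;/code&gt; lookup are illustrative, not part of any Oracle tooling, and the result is an estimate rather than an exact requirement.&lt;/p&gt;

```python
# Rough HNSW memory estimate based on the article's rule of thumb:
# 1.3 x vectors x dimensions x bytes per dimension.
# The helper name and the FORMAT_BYTES table are illustrative assumptions.

FORMAT_BYTES = {"FLOAT32": 4, "FLOAT64": 8, "INT8": 1}
HNSW_OVERHEAD = 1.3  # overhead factor taken from the article's formula


def estimate_hnsw_bytes(num_vectors, dimensions, vector_format="FLOAT32"):
    """Return the estimated number of bytes an HNSW index would need."""
    raw_bytes = num_vectors * dimensions * FORMAT_BYTES[vector_format]
    return round(HNSW_OVERHEAD * raw_bytes)


if __name__ == "__main__":
    # Same inputs as the products table: 1M rows, VECTOR(384, FLOAT32).
    needed = estimate_hnsw_bytes(1_000_000, 384, "FLOAT32")
    print(f"{needed / 1e9:.2f} GB ({needed / 2**30:.2f} GiB)")
```

&lt;p&gt;For the products table above (1M vectors, 384 dimensions, FLOAT32) this works out to roughly 2 GB, which is where the 2G setting in Step 2 comes from. Treat the number as a starting point and leave headroom for growth.&lt;/p&gt;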



&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Queries can combine vector and relational data&lt;/strong&gt; in a single SQL statement, eliminating data fragmentation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standard comparison operators (=, &amp;gt;, &amp;lt;) cannot be used between vectors&lt;/strong&gt; - use VECTOR_DISTANCE functions instead&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector indexes dramatically improve performance&lt;/strong&gt; by reducing the search space through clustering, partitioning, and neighbor graphs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HNSW indexes&lt;/strong&gt; provide the fastest queries but require significant memory (rule of thumb: 1.3 × vectors × dimensions × bytes per dimension)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IVF indexes&lt;/strong&gt; suit very large datasets because they are stored on disk and are not constrained by RAM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector Pool memory must be configured&lt;/strong&gt; for HNSW indexes at the CDB level with the VECTOR_MEMORY_SIZE parameter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A database restart is required&lt;/strong&gt; when changing VECTOR_MEMORY_SIZE (unless using automatic growth)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The PDB maximum is 70% of sga_target&lt;/strong&gt;, inherited from the CDB configuration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose index type based on dataset size and memory availability&lt;/strong&gt; - HNSW for speed, IVF for scale&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By understanding these concepts and properly configuring vector indexes, you can build high-performance semantic search applications that seamlessly integrate AI capabilities with traditional relational database operations.&lt;/p&gt;

</description>
      <category>oracledatabase</category>
      <category>vectordatabase</category>
      <category>vectorindexes</category>
      <category>queryoptimisation</category>
    </item>
    <item>
      <title>Analytics Engineering</title>
      <dc:creator>Ryan Giggs</dc:creator>
      <pubDate>Sun, 15 Feb 2026 11:51:03 +0000</pubDate>
      <link>https://forem.com/derrickryangiggs/analytics-engineering-12m0</link>
      <guid>https://forem.com/derrickryangiggs/analytics-engineering-12m0</guid>
      <description>&lt;p&gt;Week 4 of Data Engineering Zoomcamp by @DataTalksClub complete&lt;/p&gt;

&lt;p&gt;Just finished Module 4 - Analytics Engineering with dbt. Learned how to:&lt;/p&gt;

&lt;p&gt;✅ Build transformation models with dbt&lt;br&gt;
✅ Create staging, intermediate, and fact tables&lt;br&gt;
✅ Write tests to ensure data quality&lt;br&gt;
✅ Understand lineage and model dependencies&lt;br&gt;
✅ Analyze revenue patterns across NYC zones&lt;/p&gt;

&lt;p&gt;Transforming raw data into analytics-ready models - the T in ELT&lt;/p&gt;

&lt;p&gt;Here's my homework solution: &lt;a href="https://github.com/Derrick-Ryan-Giggs/Analytics-Engineering" rel="noopener noreferrer"&gt;https://github.com/Derrick-Ryan-Giggs/Analytics-Engineering&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Following along with this amazing free course - who else is learning data engineering?&lt;/p&gt;

&lt;p&gt;You can sign up here: &lt;a href="https://github.com/DataTalksClub/data-engineering-zoomcamp/" rel="noopener noreferrer"&gt;https://github.com/DataTalksClub/data-engineering-zoomcamp/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>googlecloud</category>
      <category>analyticsengineering</category>
      <category>dbt</category>
      <category>dataengineering</category>
    </item>
  </channel>
</rss>
