# Iraq EPC Intelligence Platform — How Live Intelligence Works

Internal technical and functional reference for **Uruk Engineering & Contracting Co. LLC**. Documents how the platform collects, processes, scores, and delivers live Iraq energy sector intelligence — and where its accuracy limits are.

**Live Platform:** https://leads.urukepc.com  
**Access:** Internal use only — Uruk Engineering & Contracting Co. LLC

---

## Table of Contents

1. [System Overview](#1-system-overview)
2. [The Intelligence Pipeline — End to End](#2-the-intelligence-pipeline--end-to-end)
3. [Data Collection — 6 Parallel Sources](#3-data-collection--6-parallel-sources)
4. [Relevance Filtering — What Gets In](#4-relevance-filtering--what-gets-in)
    - [4a. The BD Intelligence Layer — 4 Processing Engines](#4a-the-bd-intelligence-layer--4-processing-engines)
5. [Scoring Engine — How Results Are Ranked](#5-scoring-engine--how-results-are-ranked)
6. [Claude Haiku — The 3 AI Tasks Per Search](#6-claude-haiku--the-3-ai-tasks-per-search)
7. [Background Refresh — The Autonomous Loop](#7-background-refresh--the-autonomous-loop)
8. [Auto-Save to D1 — How Leads Appear Without User Action](#8-auto-save-to-d1--how-leads-appear-without-user-action)
9. [Trend Recording — Historical Memory](#9-trend-recording--historical-memory)
10. [Project Memory — Project Lifecycle Tracking](#10-project-memory--project-lifecycle-tracking)
    - [10a. Analyst Enrichment Pipeline — How the Project Pipeline Stays Current](#10a-analyst-enrichment-pipeline--how-the-project-pipeline-stays-current)
11. [Vector Search — Semantic Memory](#11-vector-search--semantic-memory)
12. [KV Cache Strategy — What Gets Cached and Why](#12-kv-cache-strategy--what-gets-cached-and-why)
13. [Source Credibility Engine](#13-source-credibility-engine)
14. [The Full Data Flow Diagram](#14-the-full-data-flow-diagram)
15. [API Endpoint Reference](#15-api-endpoint-reference)
16. [Configuration Reference](#16-configuration-reference)
17. [Known Accuracy Limitations & Operational Risks](#17-known-accuracy-limitations--operational-risks)

---

## 1. System Overview

The platform operates as an **internal real-time intelligence aggregator** for Uruk Engineering & Contracting Co. LLC that continuously monitors Iraq's energy sector for EPC (Engineering, Procurement & Construction) opportunities. It ingests raw data from 6 source categories simultaneously, filters out noise using keyword signal matching, ranks results by urgency and source quality, enriches high-value items with Claude Haiku AI analysis, and presents actionable intelligence to Uruk's BD team.

> **Accuracy posture:** The platform separates deterministic extraction (tender references, deadlines, operator names) from AI-generated analysis (briefs, confidence scores, recommendations). Extracted facts are reliable; AI-generated outputs are decision-support material and require independent verification before external use.

### Core Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                     CLOUDFLARE PAGES                            │
│                                                                 │
│  ┌──────────┐    ┌───────────────────┐    ┌─────────────────┐  │
│  │  index   │    │  functions/api/   │    │   Cloudflare    │  │
│  │  .html   │◄──►│  search.js        │◄──►│   Services      │  │
│  │  app.js  │    │  intelligence.js  │    │                 │  │
│  └──────────┘    │  headlines.js     │    │  KV Cache       │  │
│                  │  market-pulse.js  │    │  D1 (SQLite)    │  │
│                  │  leads.js         │    │  Vectorize      │  │
│                  │  watchlist.js     │    │  Workers AI     │  │
│                  │  trends.js        │    └─────────────────┘  │
│                  └───────────────────┘                         │
└─────────────────────────────────────────────────────────────────┘
         │                    │                    │
         ▼                    ▼                    ▼
  External APIs         Claude Haiku         Cloudflare Cron
  (RSS, WB, Tavily,    Anthropic API        (refresh.js every
   Serper, Jina, Exa)                        3 hours)
```

### Key Design Principles

- **No framework dependency** — Vanilla JS + HTML + CSS, single-file SPA
- **Graceful degradation** — every API key is optional; missing keys skip that source
- **Subrequest budget** — operates within Cloudflare's 50 subrequest/invocation free-tier limit
- **Speed-first caching** — aggressive KV caching prevents redundant AI calls and external API hits
- **Autonomous operation** — background cron runs every 3 hours without any user interaction
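
The graceful-degradation and subrequest-budget principles can be sketched together. This is a minimal sketch, not the platform's code: the environment variable names `TAVILY_KEY`, `SERPER_KEY` and `JINA_KEY` are hypothetical (only `ANTHROPIC_KEY` and `WEBHOOK_URL` are named in this document), and so is the `createBudgetedFetch()` wrapper.

```javascript
// Keyed sources are skipped when their key is missing (graceful degradation).
// TAVILY_KEY, SERPER_KEY and JINA_KEY are hypothetical variable names.
const SOURCES = [
  { name: "rss", key: null },
  { name: "worldbank-projects", key: null },
  { name: "worldbank-procurement", key: null },
  { name: "tavily", key: "TAVILY_KEY" },
  { name: "serper", key: "SERPER_KEY" },
  { name: "jina", key: "JINA_KEY" },
];

function activeSources(env) {
  return SOURCES.filter((s) => s.key === null || Boolean(env[s.key])).map((s) => s.name);
}

// Wraps a fetch function so that at most `limit` subrequests leave this invocation.
// Once the budget is spent it resolves to null instead of calling the network.
function createBudgetedFetch(fetchFn, limit = 50) {
  let used = 0;
  const budgeted = async (url, init) => {
    if (used >= limit) return null;
    used += 1;
    return fetchFn(url, init);
  };
  budgeted.remaining = () => limit - used;
  return budgeted;
}
```

Returning `null` rather than throwing lets each source treat "over budget" the same way as "key missing": it contributes no items and the search continues.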

---

## 2. The Intelligence Pipeline — End to End

When a user types a query and hits Search, the following sequence executes inside a single Cloudflare Worker invocation:

```
User types: "gas processing EPC Iraq"
       │
       ▼
Step 1 │ CACHE CHECK
       │ Has this query been refreshed in the last 5 minutes?
       │ YES → return cached result immediately (< 100ms)
       │ NO  → continue to Step 2
       │
       ▼
Step 2 │ PARALLEL DATA FETCH  (all 6 sources fire simultaneously)
       │
       ├── RSS Feeds (16 Tier-1 feeds, 4 Arabic) ───────► raw XML items
       ├── World Bank Projects API ──────────────────────► project notices
       ├── World Bank Procurement API (Iraq-filtered) ──► procurement notices
       ├── Tavily AI Search ─────────────────────────────► targeted web results
       ├── Serper (Google Search API) ───────────────────► Google News results
       └── Jina AI Reader (oil.gov.iq scrape) ──────────► gov portal notices
       │
       ▼
Step 3 │ RELEVANCE FILTER
       │ Each item is tested against EPC_SIGNAL_KEYWORDS
       │ Items with zero EPC signals are discarded
       │ Items from iraqRequired sources must mention Iraq
       │
       ▼
Step 4 │ BD INTELLIGENCE LAYER (4 engines run in sequence; dedup spans the item set)
       │
       │ Engine 1: Procurement Confidence (functions/lib/procurement.js)
       │   Scans title+description for 8 positive signals and 3 negative signals
       │   Output: procurementConfidenceScore (0-100), procurementTier, tenderRef
       │
       │ Engine 2: Entity Resolution (functions/lib/entities.js)
       │   Registry of 80+ canonical Iraq energy entities
       │   Normalises aliases: "boc" -> Basra Oil Company, "ge" -> GE Vernova
       │   Output: resolvedEntities[] with canonical name + entity type
       │
       │ Engine 3: Semantic Deduplication (functions/lib/dedup.js)
       │   3-layer dedup: URL match -> tender ref match -> Jaccard title similarity
       │   Threshold: 0.72. Best source wins (official govt > trade RSS > aggregator)
       │   Approx. 50-90 raw items -> 20-40 unique items
       │
       │ Engine 4: Classic enrichment
       │   detectSector, detectLocation, detectCompanies, detectOpportunity
       │   isSubPackage, isAward flags
       │
       ▼
Step 5 │ SCORING
       │ Each item receives a relevance score (0.0 – 2.0)
       │ Based on: query match, source quality, procurement keywords,
       │           energy keywords, location, sub-package flag
       │
       ▼
Step 6 │ SORT
       │ Primary: urgency rank (HIGH=3, MEDIUM=2, LOW=1, INFO=0)
       │ Secondary: relevance score (descending)
       │ Take top 30 for AI processing
       │
       ▼
Step 7 │ CLAUDE HAIKU TASK 1 — VERIFICATION
       │ Top HIGH-urgency items from low-credibility domains
       │ Claude judges: is this a real EPC opportunity or noise?
       │ Adds aiVerified:true/false + aiReason to credibility object
       │
       ▼
Step 8 │ MERGE WITH VECTOR SEARCH
       │ Semantic search in Vectorize for past matching results
       │ New unique items appended, re-sorted
       │ Final slice: top 20 items
       │
       ▼
Step 9 │ PARALLEL FINAL AI TASKS (fire simultaneously)
       │
       ├── CLAUDE HAIKU TASK 2 — OPPORTUNITY BRIEFS
       │   For each HIGH-urgency item: 2-sentence EPC analysis
       │   "What is the EPC scope? Who is the likely prime contractor?"
       │
       └── CLAUDE HAIKU TASK 3 — INTELLIGENCE REPORT
           Executive summary + 4 commercial signals + 3 BD recommendations
           Based on all 20 results combined
       │
       ▼
Step 10│ BACKGROUND WRITES (ctx.waitUntil — non-blocking)
       │
       │ A: D1 trend snapshot
       │    query, result_count, high_urgency, open_bids, awards,
       │    sectors, companies -> trend_snapshots table
       │
       │ B: Project Memory correlation (functions/lib/projectMemory.js)
       │    Top 12 items matched against 38-project seed registry
       │    Creates/advances project records in D1 projects table
       │    Records lifecycle events in D1 project_events table
       │
       ▼
Step 11│ RESPONSE
       │ Returns: summary, signals, recommendations, 20 enriched opportunities
       │ Each opportunity now includes:
       │   title, url, provider, sector, location, stage, urgency,
       │   companies, pubDate, brief (AI), prime (AI), credibility score,
       │   procurementConfidenceScore, procurementTier, tenderRef,
       │   resolvedEntities[], procurementExplanation[]
       │ Meta includes: projectsCorrelated count
```

Total latency: **8–25 seconds** (dominated by Claude Haiku API call ~10–15s)
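
Step 2 fires every source at once, and one slow or failing source should not sink the whole search. A minimal sketch of that fan-out, assuming each source is wrapped in a hypothetical `{ name, run }` fetcher object; it relies on the standard `Promise.allSettled()` and `AbortSignal.timeout()`, both available in Workers and current Node.js.

```javascript
// Runs every source fetcher concurrently. A rejected or timed-out source contributes
// no items and is reported in `failed` instead of failing the whole search.
async function fetchAllSources(fetchers) {
  const settled = await Promise.allSettled(fetchers.map((f) => f.run()));
  const items = [];
  const failed = [];
  settled.forEach((result, i) => {
    if (result.status === "fulfilled" && Array.isArray(result.value)) {
      items.push(...result.value);
    } else {
      failed.push(fetchers[i].name);
    }
  });
  return { items, failed };
}

// Example fetcher with the 6-second per-feed timeout described for RSS feeds.
function rssFetcher(name, url, parse) {
  return {
    name,
    run: async () => {
      const res = await fetch(url, { signal: AbortSignal.timeout(6000) });
      if (!res.ok) throw new Error(`${name}: HTTP ${res.status}`);
      return parse(await res.text());
    },
  };
}
```

`Promise.all()` would reject on the first failure; `allSettled()` keeps the results of every source that did answer.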

---

## 3. Data Collection — 6 Parallel Sources

### Source 1: RSS Feeds (16 Tier-1 Feeds)

Fetched simultaneously with a 6-second timeout each. The top 16 Tier-1 feeds are selected; Arabic-language feeds are included in this slice.

| Feed | Language | Coverage | Weight |
|------|----------|----------|--------|
| oil.gov.iq (Ministry of Oil) | EN | Official Iraq MoO press releases | 1.0 |
| moelc.gov.iq (Ministry of Electricity) | EN | Official MoE tenders and news | 1.0 |
| Iraq Business News | EN | Iraq energy, investment, contract awards | 0.95 |
| Iraq Oil Report | EN | Oil sector deep coverage | 0.90 |
| Shafaq News (English) | EN | Kurdish region oil, gas, investment | 0.85 |
| MEED | EN | Regional EPC procurement intelligence | 0.95 |
| Zawya | EN | Gulf and Iraq project finance, tenders | 0.95 |
| IraqRFP.com | EN | Iraq-specific procurement portal | 1.0 |
| GlobalTenders Iraq | EN | Iraq tender aggregator | 0.97 |
| UNGM Iraq | EN | UN global marketplace — Iraq procurement | 0.98 |
| Reuters Energy | EN | Global energy breaking news | 0.90 |
| **INA Arabic** | **AR** | **Official Iraqi state news agency — Arabic first** | **0.92** |
| **Shafaq Arabic** | **AR** | **Kurdistan/Iraq oil & procurement in Arabic** | **0.85** |
| **Al-Sabah Iraq** | **AR** | **Iraqi government newspaper — ministry notices** | **0.82** |
| **Al-Mada** | **AR** | **Iraqi independent press — oil sector** | **0.78** |
| World Bank Iraq | EN | IDA project procurement notices | 0.90 |

Each RSS item is parsed from XML, cleaned of HTML entities and CDATA wrappers, tested for EPC signal keywords (English or Arabic), and either admitted or discarded.
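
Workers have no built-in DOM or XML parser, so RSS items are often extracted with regular expressions. A minimal sketch under that assumption (the platform's actual parser is not shown in this document): it unwraps CDATA, strips embedded tags and decodes a handful of common entities, but it is not a full XML parser.

```javascript
// Regex-based RSS <item> extraction. Decodes only the entities listed in ENTITY_MAP.
const ENTITY_MAP = { "&amp;": "&", "&lt;": "<", "&gt;": ">", "&quot;": '"', "&#39;": "'", "&apos;": "'" };

function cleanText(raw) {
  return raw
    .replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, "$1") // unwrap CDATA sections
    .replace(/<[^>]+>/g, " ") // drop embedded HTML tags
    .replace(/&(?:amp|lt|gt|quot|#39|apos);/g, (m) => ENTITY_MAP[m]) // single decoding pass
    .replace(/\s+/g, " ")
    .trim();
}

function tagValue(block, tag) {
  const m = block.match(new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`, "i"));
  return m ? cleanText(m[1]) : "";
}

function parseRssItems(xml) {
  const items = [];
  for (const m of xml.matchAll(/<item\b[^>]*>([\s\S]*?)<\/item>/gi)) {
    items.push({
      title: tagValue(m[1], "title"),
      url: tagValue(m[1], "link"),
      description: tagValue(m[1], "description"),
      pubDate: tagValue(m[1], "pubDate"),
    });
  }
  return items;
}
```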

### Source 2: World Bank Projects API

Endpoint: `https://search.worldbank.org/api/v2/projects?format=json&fct=countryshortname_exact:Iraq`

Returns IDA/IBRD-financed infrastructure projects in Iraq. Items are filtered for energy/EPC relevance and mapped to standard opportunity format with urgency derived from funding status and disbursement data.

### Source 3: World Bank Procurement API

Endpoint: `https://search.worldbank.org/api/v2/procnotices?format=json&fct=countryshortname_exact:Iraq`

Returns active procurement notices tied to World Bank-financed projects in Iraq. Post-filtered to exclude items where `countryshortname` does not include "Iraq" (the API occasionally returns other countries). Notice type is mapped to stage: `REOI → Pre-Qualification`, `RFP → RFQ/RFP Released`, `CONTRACT → Contract Awarded`.
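
The post-filter and stage mapping can be sketched as follows. The payload field names `countryshortname` and `notice_type` are assumptions about the notice objects, and leaving unmapped notice types with a `null` stage is this sketch's choice, not documented behaviour.

```javascript
// Documented notice-type → stage mapping.
const NOTICE_STAGE = {
  REOI: "Pre-Qualification",
  RFP: "RFQ/RFP Released",
  CONTRACT: "Contract Awarded",
};

// `countryshortname` and `notice_type` are assumed field names on each notice object.
function mapWorldBankNotices(notices) {
  return notices
    .filter((n) => String(n.countryshortname || "").toLowerCase().includes("iraq")) // drop other countries
    .map((n) => ({
      ...n,
      stage: NOTICE_STAGE[String(n.notice_type || "").toUpperCase()] ?? null, // unmapped → null
    }));
}
```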

### Source 4: Tavily Search API

Targeted AI web search against a curated include-domains list:

```
oil.gov.iq, moelc.gov.iq, investpromo.gov.iq,
meed.com, zawya.com, iraq-businessnews.com,
iraqrfp.com, tenderspage.com, ungm.org,
ted.europa.eu, dgmarket.com, developmentaid.org,
iraqoilreport.com, rigzone.com, shafaq.com
```

Query template: `"Iraq EPC energy procurement tender 2026 {user_query}"` steers Tavily toward procurement-specific content rather than general news. Tavily items receive a weight of `0.92` and a `+0.3` provider score boost.

### Source 5: Serper (Google Search API)

Two merged Google Search queries per user search:
- `"{query} Iraq EPC tender procurement 2026 site:meed.com OR site:zawya.com OR site:iraqbusinessnews.com"`
- `"{query} Iraq oil gas electricity contract award bid"`

Results include news article snippets and links. Serper items receive a `+0.3` provider score boost.

### Source 6: Jina AI Reader (Direct Gov Portal Scraping)

Jina converts `https://www.oil.gov.iq/index.php?name=News` to clean markdown, extracting tender announcements and news items directly from the Iraq Ministry of Oil news page. This can capture notices **before** they appear in press coverage. Results are KV-cached for 2 hours to avoid re-scraping on every search.
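
A minimal sketch of the cached scrape, assuming the Jina Reader's URL-prefix form (`https://r.jina.ai/` followed by the target URL) and a Workers KV namespace passed in as `kv`. The cache key name is hypothetical; `expirationTtl` is the standard KV option, in seconds.

```javascript
const MOO_NEWS_URL = "https://www.oil.gov.iq/index.php?name=News";
const JINA_CACHE_KEY = "jina_oil_gov_iq_v1"; // hypothetical key name
const TWO_HOURS = 2 * 60 * 60; // KV expirationTtl is in seconds

// Returns the page as markdown, from KV when cached, otherwise via the Jina Reader.
async function getMinistryMarkdown(kv, fetchFn = fetch) {
  const cached = await kv.get(JINA_CACHE_KEY);
  if (cached !== null) return { markdown: cached, fromCache: true };

  const res = await fetchFn(`https://r.jina.ai/${MOO_NEWS_URL}`);
  if (!res.ok) throw new Error(`Jina Reader HTTP ${res.status}`);
  const markdown = await res.text();
  await kv.put(JINA_CACHE_KEY, markdown, { expirationTtl: TWO_HOURS });
  return { markdown, fromCache: false };
}
```

Because the cache is checked first, at most one Jina request is made per two-hour window, however many searches run.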

---

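## 4. Relevance Filtering — What Gets In

Relevance filtering is Step 3 of the pipeline and runs before the BD Intelligence Layer. Its rules (EPC signal keywords, the Arabic translation layer, the Iraq geography requirement, and rejection criteria) are documented at the end of Section 4a, after the Engine 4 description.

---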

## 4a. The BD Intelligence Layer — 4 Processing Engines

After relevance filtering and before scoring, every item passes through four purpose-built engines that enrich it with procurement-grade intelligence.

### Engine 1: Procurement Confidence Engine (`functions/lib/procurement.js`)

Scores each item 0–100 for the strength of evidence that it represents a real procurement event.

**Positive signals (additive):**

| Signal | Score Added | Example |
|--------|------------|--------|
| Official tender reference ID | +30 | `ITB/BOC/2025/047`, `RFQ-MoE-2026-14` |
| Submission deadline detected | +20 | "submit by 15 June", "deadline: 30/06/2026" |
| EPC contract structure named | +15 | "EPC contract", "EPCC package" |
| Package/lot naming | +10 | "Package 3", "Lot B civil works" |
| Operator identified | +10 | Named oil company or ministry as client |
| Contract value present | +10 | "$120M", "estimated USD 80 million" |
| Official document format | +10 | "Invitation to Bid", "Expression of Interest" |
| World Bank / UN source | +20 | worldbank.org, ungm.org, tendersontime.com |

**Negative signals (subtractive):**

| Signal | Score Removed | Example |
|--------|--------------|--------|
| Opinion/analysis language | −15 | "analysts say", "could be", "might" |
| Feasibility-only | −10 | "feasibility study", "pre-FEED" with no tender ref |
| General news | −5 | No procurement keywords at all |

**Tier classification:**

| Score | Tier | Badge |
|-------|------|-------|
| 65–100 | VERIFIED | Green badge |
| 40–64 | PROBABLE | Blue badge |
| 20–39 | POSSIBLE | Amber badge |
| 0–19 | INTELLIGENCE | No badge |
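
The additive scoring and tier bands can be sketched as below. The regular expressions are illustrative approximations of the listed signals, not the patterns in `functions/lib/procurement.js`; the 0–100 clamp and the explanation strings (mirroring the `procurementExplanation[]` field) are this sketch's assumptions.

```javascript
// Illustrative patterns only.
const TENDER_REF = /\b(?:ITB|RFQ|RFP|EOI)[-/][A-Z0-9]+(?:[-/][A-Z0-9]+)+/i;
const POSITIVE = [
  { points: 30, name: "tender reference", re: TENDER_REF },
  { points: 20, name: "deadline", re: /\b(?:submit(?:ted)? by|deadline|closing date)\b/i },
  { points: 15, name: "EPC structure", re: /\bEPCC?\s+(?:contract|package)\b/i },
  { points: 10, name: "package/lot", re: /\b(?:package|lot)\s+[A-Z0-9]+\b/i },
  { points: 10, name: "operator", re: /\b(?:oil company|ministry of (?:oil|electricity))\b/i },
  { points: 10, name: "contract value", re: /(?:\$\s?\d+(?:\.\d+)?\s?(?:m|bn|million|billion)\b|\busd\s?\d+)/i },
  { points: 10, name: "official format", re: /\b(?:invitation to bid|expression of interest)\b/i },
  { points: 20, name: "World Bank / UN source", re: /\b(?:worldbank\.org|ungm\.org|tendersontime\.com)\b/i },
];

function tierFor(score) {
  if (score >= 65) return "VERIFIED";
  if (score >= 40) return "PROBABLE";
  if (score >= 20) return "POSSIBLE";
  return "INTELLIGENCE";
}

function procurementConfidence(item) {
  const text = `${item.title || ""} ${item.description || ""} ${item.url || ""}`;
  let score = 0;
  const explanation = [];
  const add = (points, reason) => {
    score += points;
    explanation.push(`${points > 0 ? "+" : ""}${points} ${reason}`);
  };

  for (const s of POSITIVE) if (s.re.test(text)) add(s.points, s.name);
  if (/\b(?:analysts say|could be|might)\b/i.test(text)) add(-15, "opinion language");
  if (!TENDER_REF.test(text) && /\b(?:feasibility study|pre-FEED)\b/i.test(text)) add(-10, "feasibility only");
  if (!/\b(?:tender|bid|rfq|rfp|procurement|contract|award)/i.test(text)) add(-5, "general news");

  score = Math.min(100, Math.max(0, score)); // positives can sum to 125
  return { score, tier: tierFor(score), explanation };
}
```

Because the positive signals can add up to 125, the clamp keeps a maximally signalled item at 100.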

### Engine 2: Entity Resolution Engine (`functions/lib/entities.js`)

A hard-coded registry of 80+ Iraq energy entities. Each entity has a canonical name, entity type, and a list of aliases that resolve to it.

**Entity types and colours in the UI:**

| Type | Colour | Examples |
|------|--------|----------|
| NOC | Red | Basra Oil Company, North Oil Company, Missan Oil Company |
| Operator | Orange | BP, TotalEnergies, ExxonMobil, Shell, LUKOIL |
| EPC | Purple | Petrofac, Saipem, CPECC, Tecnicas Reunidas, Worley |
| OEM | Cyan | GE Vernova, Siemens Energy, ABB, Schneider Electric |
| Ministry | Indigo | Ministry of Oil, Ministry of Electricity, NIC |
| Development | Green | World Bank, AFESD, IsDB, EIB |
| Other | Grey | Regional governments, utilities |

**Example resolutions:**
- `"BOC"` or `"basra oil"` → `Basra Oil Company (NOC)`
- `"cnpc"` or `"china national"` → `PetroChina (Operator)`
- `"ge"` or `"ge vernova"` or `"general electric"` → `GE Vernova (OEM)`
- `"IFC"` → `World Bank Group (Development)`
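
Alias resolution can be sketched with one case-insensitive regular expression per entity. The four-entry registry below is an excerpt built from the examples above, not the real 80+ entry registry. Word boundaries matter for very short aliases: without them, `"ge"` would match inside words such as "change".

```javascript
// Excerpt of an alias registry built from the documented examples.
const ENTITY_REGISTRY = [
  { canonical: "Basra Oil Company", type: "NOC", aliases: ["boc", "basra oil"] },
  { canonical: "PetroChina", type: "Operator", aliases: ["cnpc", "china national", "petrochina"] },
  { canonical: "GE Vernova", type: "OEM", aliases: ["ge", "ge vernova", "general electric"] },
  { canonical: "World Bank Group", type: "Development", aliases: ["ifc", "world bank"] },
];

function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// One whole-word, case-insensitive matcher per canonical entity.
const MATCHERS = ENTITY_REGISTRY.map((e) => ({
  entity: e,
  re: new RegExp(`\\b(?:${e.aliases.map(escapeRegExp).join("|")})\\b`, "i"),
}));

function resolveEntities(text) {
  return MATCHERS.filter((m) => m.re.test(text)).map(({ entity }) => ({
    canonical: entity.canonical,
    type: entity.type,
  }));
}
```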

### Engine 3: Semantic Deduplication Engine (`functions/lib/dedup.js`)

Three deduplication passes run in sequence. When a duplicate is found, the higher-priority source version is kept.

**Pass 1 — URL normalisation:**
URLs are stripped of query strings (which removes tracking parameters) and trailing slashes, then compared. An exact match on the normalised URL marks a duplicate.

**Pass 2 — Tender reference matching:**
Reference numbers are extracted from both items using the same regex as the Procurement Engine. Two items carrying the same reference number are duplicates.

**Pass 3 — Title Jaccard similarity:**
Both titles are tokenised (split, lowercased, stop words removed). The Jaccard coefficient is |intersection| / |union| of the two token sets; a coefficient ≥ 0.72 marks a duplicate.

**Source priority order (highest wins):**
```
oil.gov.iq = 10   (Iraq Ministry of Oil)
moelc.gov.iq = 10  (Iraq Ministry of Electricity)
worldbank.org = 9  (World Bank official)
ungm.org = 9       (UN procurement)
meed.com = 8       (Premium trade press)
tavily = 7         (AI web search)
serper = 6         (Google Search)
RSS feeds = 5      (Default)
```
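
The three checks and the priority rule can be sketched as follows. The stop-word list, tender-reference regex and field names (`url`, `title`, `description`, `source`) are assumptions, and for brevity this sketch applies all three checks to each pair of items rather than running three separate passes over the list. The quadratic cost is modest for 50–90 items.

```javascript
const STOP_WORDS = new Set(["the", "a", "an", "of", "for", "in", "to", "and", "on", "at", "with"]);
const TENDER_REF = /\b(?:ITB|RFQ|RFP|EOI)[-/][A-Z0-9]+(?:[-/][A-Z0-9]+)+/i;
const DOMAIN_PRIORITY = { "oil.gov.iq": 10, "moelc.gov.iq": 10, "worldbank.org": 9, "ungm.org": 9, "meed.com": 8 };
const SOURCE_PRIORITY = { tavily: 7, serper: 6 }; // anything else (RSS) defaults to 5

// Host + path only: drops protocol, "www.", query string, fragment and trailing slashes.
function normaliseUrl(url) {
  try {
    const u = new URL(url);
    return (u.hostname.replace(/^www\./, "") + u.pathname).replace(/\/+$/, "");
  } catch {
    return null; // missing or malformed URL
  }
}

function titleTokens(title) {
  return new Set(
    String(title || "").toLowerCase().split(/[^\p{L}\p{N}]+/u).filter((t) => t && !STOP_WORDS.has(t))
  );
}

function jaccard(a, b) {
  let inter = 0;
  for (const t of a) if (b.has(t)) inter += 1;
  const union = a.size + b.size - inter;
  return union === 0 ? 0 : inter / union;
}

function priorityOf(item) {
  const host = (normaliseUrl(item.url) || "").split("/")[0];
  for (const [domain, p] of Object.entries(DOMAIN_PRIORITY)) {
    if (host === domain || host.endsWith(`.${domain}`)) return p;
  }
  return SOURCE_PRIORITY[item.source] ?? 5;
}

function isDuplicate(a, b) {
  const ua = normaliseUrl(a.url);
  if (ua && ua === normaliseUrl(b.url)) return true; // URL check
  const ra = `${a.title} ${a.description || ""}`.match(TENDER_REF);
  const rb = `${b.title} ${b.description || ""}`.match(TENDER_REF);
  if (ra && rb && ra[0].toUpperCase() === rb[0].toUpperCase()) return true; // tender reference check
  return jaccard(titleTokens(a.title), titleTokens(b.title)) >= 0.72; // title similarity check
}

function dedupe(items) {
  const kept = [];
  for (const item of items) {
    const i = kept.findIndex((k) => isDuplicate(k, item));
    if (i === -1) kept.push(item);
    else if (priorityOf(item) > priorityOf(kept[i])) kept[i] = item; // higher-priority source wins
  }
  return kept;
}
```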

### Engine 4: Project Memory Correlation (`functions/lib/projectMemory.js`)

Runs via `ctx.waitUntil()` — non-blocking. See [Section 10](#10-project-memory--project-lifecycle-tracking) for full details.

### Relevance Filtering Rules (Section 4)

Every item from every source must pass a relevance test before entering the scoring pipeline.

### EPC Signal Keywords

An item is considered relevant if its title + description (first 300 chars) contains at least one English or Arabic signal keyword.

**English signals:**
```
Procurement:  tender, bid, rfq, rfp, contract, award, epc, procurement, prequalif
Oil sector:   oil, gas, petroleum, pipeline, refinery, refin
Power sector: power, electricity, megawatt, mw, gw, turbine, substation, transmission
Renewables:   solar, renewable, wind farm, lng, lpg
Industrial:   petrochemical, fertilizer, ammonia
Construction: construction, engineering project, infrastructure project
Finance:      invest, financ
Institutional: ministry of oil, ministry of electricity, world bank, epc contract
```

**Arabic signals (`ARABIC_EPC_SIGNALS`):**
```
مناقصة (tender)    عطاء (bid)         طلب عروض (RFQ)     عقد (contract)
مشروع (project)   إنشاء (construction) محطة (station)   مصفاة (refinery)
نفط (oil)         غاز (gas)          كهرباء (electricity) طاقة (energy)
خط أنابيب (pipeline) طاقة شمسية (solar) تأهيل (prequalification)
وزارة النفط (Ministry of Oil)       وزارة الكهرباء (Ministry of Electricity)
شركة نفط البصرة (Basra Oil Company) شركة نفط الشمال (North Oil Company)
```
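
A minimal sketch of the relevance test, assuming English keywords are matched as word prefixes (so `refin` matches "refinery" and `bid` matches "bidding" but not "forbid") and that the 300-character limit applies to the description. Arabic keywords use plain substring search, because JavaScript's `\b` only recognises ASCII word characters.

```javascript
const EN_SIGNALS = [
  "tender", "bid", "rfq", "rfp", "contract", "award", "epc", "procurement", "prequalif",
  "oil", "gas", "petroleum", "pipeline", "refin",
  "power", "electricity", "megawatt", "mw", "gw", "turbine", "substation", "transmission",
  "solar", "renewable", "wind farm", "lng", "lpg",
  "petrochemical", "fertilizer", "ammonia",
  "construction", "engineering project", "infrastructure project",
  "invest", "financ",
  "ministry of oil", "ministry of electricity", "world bank", "epc contract",
];
const AR_SIGNALS = [
  "مناقصة", "عطاء", "طلب عروض", "عقد", "مشروع", "إنشاء", "محطة", "مصفاة",
  "نفط", "غاز", "كهرباء", "طاقة", "خط أنابيب", "طاقة شمسية", "تأهيل",
  "وزارة النفط", "وزارة الكهرباء", "شركة نفط البصرة", "شركة نفط الشمال",
];

// English: case-insensitive match at the start of a word (prefix match).
const EN_RE = new RegExp(`\\b(?:${EN_SIGNALS.join("|")})`, "i");

function hasEpcSignal(title, description = "") {
  const text = `${title} ${description.slice(0, 300)}`;
  return EN_RE.test(text) || AR_SIGNALS.some((kw) => text.includes(kw));
}
```

Prefix matching has its own blind spots: `mw` will not match "120MW", because there is no word boundary between a digit and a letter.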

### Arabic NLP Translation Layer

After all sources are merged and before scoring, any item with Arabic Unicode characters in its title is passed to `translateArabicItems()`:

```
detectArabic(item.title)    ← checks Unicode range U+0600–U+06FF
       │
       ▼
translateArabicItems(arabicItems, env)
  │
  ├── Batches up to 8 Arabic items
  ├── Calls Claude Haiku with bilingual system prompt
  ├── Extracts per item: English title, description, sector, stage,
  │   urgency, value, location, companies[]
  └── Returns items with:
        title             ← English translation
        description       ← English summary
        sector/stage/...  ← extracted fields
        arabicOriginal    ← preserved original Arabic title
        translatedFromArabic: true
       │
       ▼
  Translated items replace Arabic originals in all[]
  → Flow continues to credibility scoring, procurement confidence,
    entity resolution, semantic dedup, and AI tasks
```
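
Detection, batching and the merge-back step can be sketched as follows. Reading "batches up to 8 Arabic items" as successive batches of 8 is an assumption; if only the first 8 are translated per search, use the first batch only. The Claude call itself is omitted.

```javascript
// True when the string contains any character in the Arabic block U+0600–U+06FF.
const ARABIC_RE = /[\u0600-\u06FF]/;

function detectArabic(text) {
  return ARABIC_RE.test(text || "");
}

// Splits the Arabic-titled items into batches of at most `size` for the translation prompt.
function arabicBatches(items, size = 8) {
  const arabic = items.filter((it) => detectArabic(it.title));
  const batches = [];
  for (let i = 0; i < arabic.length; i += size) batches.push(arabic.slice(i, i + size));
  return batches;
}

// Overlays translated fields on the original item and preserves the Arabic title.
function applyTranslation(original, translated) {
  return { ...original, ...translated, arabicOriginal: original.title, translatedFromArabic: true };
}
```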

**Why this matters:** The Iraqi Ministry of Oil and Ministry of Electricity publish procurement notices in Arabic **2–3 days before** English trade press coverage. Arabic-only signals from INA and Al-Sabah therefore enter the pipeline as soon as those feeds publish them, rather than when English coverage catches up.

### Iraq Requirement (Global Feeds Only)

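For feeds not exclusively focused on Iraq (Reuters, AGBI, OilPrice, EnergyVoice), items must also pass an Iraq geography test, a case-insensitive match at the start of a word. The non-capturing group applies the leading `\b` to every place name, and matching on word starts also admits forms such as "Iraqi" and "Basrah":

```
/\b(?:iraq|basra|kirkuk|mosul|najaf|baghdad|erbil|kurdistan)/i
```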

This prevents non-Iraq energy news ("Oman renewables deal", "Egyptian airport privatisation") from entering the results.
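
The gate can be sketched as below, assuming each feed carries the `iraqRequired` flag mentioned in Step 3 of the pipeline; feeds without the flag skip the test.

```javascript
// Case-insensitive word-start match on Iraq place names; the group applies \b to every
// alternative, so "Iraqi" and "Basrah" match as well.
const IRAQ_RE = /\b(?:iraq|basra|kirkuk|mosul|najaf|baghdad|erbil|kurdistan)/i;

function passesIraqGate(item, feed) {
  if (!feed.iraqRequired) return true; // Iraq-focused feeds skip the geography test
  return IRAQ_RE.test(`${item.title || ""} ${item.description || ""}`);
}
```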

### What Gets Rejected

- General political news (sanctions, elections, protests) — no EPC signal keywords
- Sports, banking, telecom deals not related to energy infrastructure
- Duplicate URLs (first occurrence wins)
- Items with titles shorter than 10 characters
- World Bank procurement notices from countries other than Iraq

---

## 5. Scoring Engine — How Results Are Ranked

Every admitted item receives a `relevance` score from 0.0 to 2.0 using `scoreRelevance()`.

### Scoring Breakdown

```
Base score                = item.weight (0.5 – 1.0, set per source)

+0.6  Query phrase match  (title+desc contains exact query string)
+0.4  Procurement keyword (tender|rfq|rfp|bid|epc|prequalif)
+0.35 Contract award      (contract award|awarded|contract signed)
+0.2  Energy keyword      (oil|gas|power|electricity|pipeline|refinery|solar|transmission)
+0.4  World Bank source   (high-quality institutional procurement)
+0.3  Serper/Google       (active search returns, query-targeted)
+0.3  Tavily              (AI web search, targeted domains)
+0.25 MEED/Zawya/IOR      (premium trade press)
+0.05 Iraq location       (minor — present in almost all items)
+0.25 Sub-package flag    (EPC sub-packages are high-value leads)
−0.4  No EPC signal       (penalty for items that slipped past filter)

Score capped at 2.0, floored at 0.0
```
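
The additive scheme above can be sketched as a function. The item fields (`weight`, `source`, `provider`, `location`, `isSubPackage`, `hasEpcSignal`) and the exact matching rules are assumptions; only the point values and the 0.0–2.0 clamp come from the breakdown.

```javascript
function scoreRelevance(item, query) {
  const text = `${item.title || ""} ${item.description || ""}`.toLowerCase();
  const q = String(query || "").trim().toLowerCase();
  let score = item.weight ?? 0.5; // per-source base weight

  if (q && text.includes(q)) score += 0.6; // exact query phrase
  if (/\b(?:tender|rfq|rfp|bid|epc|prequalif)/.test(text)) score += 0.4;
  if (/contract award|awarded|contract signed/.test(text)) score += 0.35;
  if (/\b(?:oil|gas|power|electricity|pipeline|refinery|solar|transmission)\b/.test(text)) score += 0.2;
  if (item.source === "worldbank") score += 0.4;
  if (item.source === "serper" || item.source === "tavily") score += 0.3;
  if (/meed|zawya|iraq oil report/i.test(item.provider || "")) score += 0.25;
  if (item.location) score += 0.05; // Iraq location detected
  if (item.isSubPackage) score += 0.25;
  if (item.hasEpcSignal === false) score -= 0.4; // slipped past the filter

  return Math.min(2, Math.max(0, score));
}
```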

### Sort Order

Items are sorted by two criteria in sequence:

1. **Urgency rank** (primary) — HIGH (3) → MEDIUM (2) → LOW (1) → INFO (0)
2. **Relevance score** (secondary) — descending within each urgency tier

This ensures an actual Iraq Ministry of Oil tender (HIGH, score 1.8) always appears above a general news item about Iraq oil revenues (LOW, score 0.9), regardless of how long ago they were published.

### Urgency Detection

`detectOpportunity()` assigns urgency based on keywords in title + description:

| Urgency | Triggered By |
|---------|-------------|
| `HIGH` | rfq, rfp, invitation to bid, deadline, submit by, prequalification, expressions of interest, urgent, immediate, epc contract available |
| `MEDIUM` | tender, bid, procurement, contract, award, opportunity |
| `LOW` | monitoring, planning, feasibility, proposal, pipeline |
| `INFO` | awarded (past tense) — informational contract award |
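
The urgency rules and the two-key sort can be sketched together. The table does not say which rule wins when several match; this sketch assumes the order HIGH, INFO, MEDIUM, LOW, so that "awarded" is classified as INFO before "award" can make it MEDIUM, and it treats items matching no rule as INFO. Both choices are assumptions.

```javascript
// Checked in order; the first matching rule decides the urgency.
const URGENCY_RULES = [
  ["HIGH", /\b(?:rfq|rfp|invitation to bid|deadline|submit by|prequalification|expressions of interest|urgent|immediate|epc contract available)/i],
  ["INFO", /\bawarded\b/i],
  ["MEDIUM", /\b(?:tender|bid|procurement|contract|award|opportunit)/i],
  ["LOW", /\b(?:monitoring|planning|feasibility|proposal|pipeline)/i],
];
const URGENCY_RANK = { HIGH: 3, MEDIUM: 2, LOW: 1, INFO: 0 };

function detectUrgency(text) {
  for (const [level, re] of URGENCY_RULES) {
    if (re.test(text)) return level;
  }
  return "INFO"; // assumption: no matching keyword means informational
}

// Primary key: urgency rank (descending); secondary: relevance score (descending).
function compareItems(a, b) {
  return URGENCY_RANK[b.urgency] - URGENCY_RANK[a.urgency] || b.relevance - a.relevance;
}
```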

---

## 6. Claude Haiku — The 3 AI Tasks Per Search

Every live search (not served from cache) triggers up to 3 Claude Haiku API calls via `callAI()`.

### The `callAI()` Helper

```
callAI(system_prompt, user_prompt, env, max_tokens)
  │
  ├── PRIMARY: Claude Haiku 4.5 (Anthropic)
  │   URL: https://api.anthropic.com/v1/messages
  │   Model: claude-haiku-4-5
  │   Auth: ANTHROPIC_KEY (BOM-stripped, whitespace-trimmed)
  │   Timeout: 25 seconds
  │   │
  │   ├── Success → return text content
  │   └── HTTP error / timeout → fall through to fallback
  │
  └── FALLBACK: Llama 3.3 70B FP8 Fast (Cloudflare Workers AI)
      Binding: env.AI (free, always available)
      Note: Used only if ANTHROPIC_KEY is absent or API call fails
```
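
A minimal sketch of the helper, assuming the standard Anthropic Messages API request shape (`x-api-key` and `anthropic-version` headers, a top-level `system` string plus one user message) and the Workers AI text-generation binding, which returns `{ response }`. The default `max_tokens` value and the simplified error handling are this sketch's choices.

```javascript
const ANTHROPIC_URL = "https://api.anthropic.com/v1/messages";
const FALLBACK_MODEL = "@cf/meta/llama-3.3-70b-instruct-fp8-fast";

async function callAI(system, user, env, maxTokens = 1024) {
  const key = (env.ANTHROPIC_KEY || "").replace(/^\uFEFF/, "").trim(); // strip BOM and whitespace
  if (key) {
    try {
      const res = await fetch(ANTHROPIC_URL, {
        method: "POST",
        headers: {
          "x-api-key": key,
          "anthropic-version": "2023-06-01",
          "content-type": "application/json",
        },
        body: JSON.stringify({
          model: "claude-haiku-4-5",
          max_tokens: maxTokens,
          system,
          messages: [{ role: "user", content: user }],
        }),
        signal: AbortSignal.timeout(25000), // 25-second timeout
      });
      if (res.ok) {
        const data = await res.json();
        const text = (data.content || []).filter((b) => b.type === "text").map((b) => b.text).join("");
        if (text) return text;
      }
    } catch {
      // Timeout or network error: fall through to the Workers AI fallback.
    }
  }
  const out = await env.AI.run(FALLBACK_MODEL, {
    messages: [
      { role: "system", content: system },
      { role: "user", content: user },
    ],
    max_tokens: maxTokens,
  });
  return out.response;
}
```

Any non-OK status, empty reply, timeout or network error falls through to Workers AI, matching the fallback rule in the diagram.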

### Task 1 — Verification (`aiVerifyTopItems`)

**When it runs:** Step 7 of the pipeline, after scoring (Step 5) and sorting (Step 6), on the top HIGH-urgency items from domains with a credibility score below 80.

**What it does:** Claude receives a batch of item titles + sources and judges which are genuine EPC procurement opportunities vs. clickbait or tangentially related news.

**System prompt:** *"You are an Iraq EPC procurement expert. Classify each item as VERIFIED (genuine procurement opportunity) or NOISE (general news, not actionable for EPC teams). Reply ONLY with valid JSON."*

**Output:** Each item gains `credibility.aiVerified: true/false` and `credibility.aiReason: "..."`. Verified items get a credibility boost; NOISE items are deprioritised.

**Cache:** Per search query, cached in KV for 2 hours.
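
"Reply ONLY with valid JSON" is a request, not a guarantee: replies sometimes arrive wrapped in prose or markdown fences. A defensive parsing sketch, assuming a hypothetical reply schema of `[{ index, verdict, reason }]`; the real prompt's response format is not specified here.

```javascript
// Takes the span from the first "[" to the last "]" so surrounding prose or fences are ignored.
function parseVerification(text) {
  const start = text.indexOf("[");
  const end = text.lastIndexOf("]");
  if (start === -1 || end <= start) return [];
  try {
    const parsed = JSON.parse(text.slice(start, end + 1));
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return []; // unparseable reply: leave items unverified
  }
}

// Writes aiVerified / aiReason onto each item's credibility object, keeping existing fields.
function applyVerification(items, verdicts) {
  for (const v of verdicts) {
    const item = items[v.index];
    if (!item) continue;
    item.credibility = {
      ...(item.credibility || {}),
      aiVerified: v.verdict === "VERIFIED",
      aiReason: String(v.reason || ""),
    };
  }
  return items;
}
```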

### Task 2 — Opportunity Briefs (`generateOpportunityBriefs`)

**When it runs:** In parallel with Task 3, after the final 20 items are selected.

**What it does:** For each HIGH-urgency item that lacks an existing brief, Claude writes a 2-sentence analysis:
1. What is the EPC scope, value, and deadline?
2. Who is the likely prime contractor and what sub-packages are available?

**System prompt excerpt:** *"You are the Head of Business Development at a major EPC contractor. For each opportunity: write exactly 2 sentences — (1) the EPC scope, value estimate, and key deadline, (2) the likely prime contractor and sub-package opportunities available. Reply ONLY with valid JSON array."*

**Output:** Each opportunity gets `brief: "..."` and `prime: "Company Name"`. These appear on the expandable card panel in the UI when the user clicks **▼ AI Brief**.

**Cache:** Keyed by item title hash, cached in KV for 2 hours.
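
One way to derive the title-hash key is sketched below. The `brief_v1_` prefix, the normalisation step and the choice of 32-bit FNV-1a (a fast, non-cryptographic hash) are all assumptions; `kv` is any Workers KV namespace, and `get(key, "json")` and `expirationTtl` are standard KV options.

```javascript
// 32-bit FNV-1a over the UTF-8 bytes of the string, as 8 hex digits.
function fnv1a32(str) {
  let h = 0x811c9dc5;
  for (const byte of new TextEncoder().encode(str)) {
    h ^= byte;
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

function briefCacheKey(title) {
  const normalised = title.trim().toLowerCase().replace(/\s+/g, " ");
  return `brief_v1_${fnv1a32(normalised)}`;
}

const BRIEF_TTL_SECONDS = 2 * 60 * 60; // 2 hours

// Returns a cached brief when present; otherwise generates, stores and returns it.
async function getBrief(kv, item, generate) {
  const key = briefCacheKey(item.title);
  const hit = await kv.get(key, "json");
  if (hit) return hit;
  const brief = await generate(item); // e.g. { brief: "...", prime: "..." }
  await kv.put(key, JSON.stringify(brief), { expirationTtl: BRIEF_TTL_SECONDS });
  return brief;
}
```

Normalising before hashing means that trivial whitespace or casing differences between feeds reuse the same cached brief.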

### Task 3 — Intelligence Report (`buildAiIntelligence`)

**When it runs:** In parallel with Task 2.

**What it does:** Claude receives the full 20-item evidence list (with urgency, stage, sector, location, companies, provider for each) and generates a structured intelligence report for EPC decision-makers.

**Output structure:**
```json
{
  "summary": "Executive summary (2-3 sentences with specific project names)",
  "signals": [
    { "type": "TENDER_ALERT|CONTRACT_AWARD|MARKET_SHIFT|RISK",
      "title": "Signal headline",
      "detail": "Specific commercial implication" }
  ],
  "recommendations": [
    { "action": "REGISTER_NOW|PREPARE_BID|MONITOR|INVESTIGATE|PARTNER",
      "text": "Specific BD action with named project, operator, deadline" }
  ],
  "marketSummary": "Iraq market overview sentence"
}
```

**Iraq Market Context injected into every prompt:**

The prompt includes hardcoded 2026 context knowledge so Claude's recommendations reference real active projects:
- BP Kirkuk JMC wellhead upgrades + surface facilities
- GE Vernova / Siemens MoE 24GW framework (ongoing substation + gas turbine packages)
- Pearl Petroleum Khor Mor Phase 2 (C3 gas processing EPC, sub-packages available)
- TotalEnergies Ratawi Solar (1GW in procurement)
- Masdar 1GW Iraq Solar (multi-site, 4 locations)
- CPECC Gharraf / Zubair Gas Gathering (active construction, sub-packages)

This grounding reduces the risk of hallucinated project names and helps keep recommendations commercially actionable.

---

## 7. Background Refresh — The Autonomous Loop

### Cron Schedule

Cloudflare triggers `functions/scheduled/refresh.js` every 3 hours via cron.

The schedule is configured in `wrangler.toml`:
```toml
[triggers]
crons = ["0 */3 * * *"]
```

### What Happens Each Refresh

```
Every 3 hours:
  For each of 14 pre-defined queries:
  ["oil", "gas", "power plant", "solar", "refinery",
   "transmission", "basra", "baghdad", "kirkuk", "pipeline",
   "petrochemical", "water treatment", "EPC", "tender"]
  
    1. Call searchModule.onRequestPost({ query })
       → This triggers the full 6-source fetch + Claude Haiku pipeline
    
    2. Cache the result in KV with key:
       search_refresh_v1_{query_lowercase_underscored}
       TTL: 3 hours (10,800 seconds)
    
    3. Filter results for urgency === "HIGH"
    
    4. For each HIGH-urgency item:
       a. Check D1: SELECT id FROM leads WHERE url = ? OR (url IS NULL AND title = ?)
       b. If NOT found → INSERT INTO leads (auto-saved · background refresh)
       c. If found → skip (deduplication)
    
    5. Collect all newly inserted leads across all 14 queries
    
    6. If any new leads found → fire WEBHOOK_URL (if configured)
  
  After all 14 queries:
    Store refresh summary in KV key: refresh_summary_last
    Summary includes: timestamp, duration, queries, new leads saved
```

### 2-second Delay Between Queries

A 2-second pause between each of the 14 queries prevents rate-limiting by external APIs (especially Tavily and Serper which have per-minute request limits).
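
The refresh loop above can be sketched as one async function. `runSearch`, `saveLeadIfNew` and `notify` are hypothetical injected stand-ins for the search pipeline, the D1 check-and-insert and the webhook call; the `opportunities` field name and the per-query error handling are assumptions.

```javascript
const REFRESH_QUERIES = ["oil", "gas", "power plant", "solar", "refinery",
  "transmission", "basra", "baghdad", "kirkuk", "pipeline",
  "petrochemical", "water treatment", "EPC", "tender"];
const REFRESH_TTL_SECONDS = 3 * 60 * 60; // 10,800 seconds

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// "power plant" → "search_refresh_v1_power_plant"
function refreshKey(query) {
  return `search_refresh_v1_${query.toLowerCase().replace(/\s+/g, "_")}`;
}

async function runRefresh({ kv, runSearch, saveLeadIfNew, notify, delayMs = 2000, queries = REFRESH_QUERIES }) {
  const started = Date.now();
  const newLeads = [];
  for (const [i, query] of queries.entries()) {
    if (i > 0) await sleep(delayMs); // pause between queries to respect API rate limits
    let result;
    try {
      result = await runSearch(query);
    } catch {
      continue; // one failed query should not abort the whole refresh
    }
    await kv.put(refreshKey(query), JSON.stringify(result), { expirationTtl: REFRESH_TTL_SECONDS });
    const high = (result.opportunities || []).filter((o) => o.urgency === "HIGH");
    for (const item of high) {
      if (await saveLeadIfNew(item, query)) newLeads.push(item);
    }
  }
  if (newLeads.length > 0 && notify) await notify(newLeads); // e.g. POST to WEBHOOK_URL
  const summary = {
    timestamp: new Date().toISOString(),
    durationMs: Date.now() - started,
    queries: queries.length,
    newLeads: newLeads.length,
  };
  await kv.put("refresh_summary_last", JSON.stringify(summary));
  return summary;
}
```

With the default 2-second spacing, the pauses alone add about 26 seconds (13 gaps) to each run.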

### Cache Hit Benefit

When users search for "oil" or "gas" within 5 minutes of the background refresh, the result is served from KV cache instantly (< 100ms) with no AI or API calls.

### Client-Side Background Poll (30 min)

Separate from the server cron, the frontend (`app.js`) runs a silent background check every 30 minutes:

```
Every 30 minutes (client-side setInterval):
  Call GET /api/intelligence (silent mode)
  → Compare meta.analysedAt with current state.intelligence.meta.analysedAt
  → If timestamps match (still same 8h cache hit):
       Skip all re-renders. Zero UI disruption.
  → If timestamps differ (analyst has run a new cycle):
       Re-render pipeline, predictions, pipeline board, KPIs, competitors
       Show single toast: "🤖 Intelligence refreshed — pipeline updated"
       Update last-sync timestamp
```

**Why this matters:** Users who leave the platform open for extended periods see new analyst output within about 30 minutes of it being published, without needing to reload manually. The check is silent: no loading spinner, no progress bar, no disruption to the current view. On first page load, intelligence fetches normally with full loading indicators.

**Auto-refresh scope:** The 5-minute frontend auto-refresh timer covers Opportunities, Headlines, and Market Pulse only. Intelligence is deliberately excluded — its 8-hour analyst cycle makes 5-minute re-fetches redundant. The 30-min silent poll is sufficient.
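
A minimal sketch of one silent poll, assuming hypothetical `render` and `toast` UI callbacks; the comparison on `meta.analysedAt` follows the flow above.

```javascript
const REFRESH_TOAST = "🤖 Intelligence refreshed — pipeline updated";

// Re-renders only when the analyst timestamp has changed; returns true if it did.
async function pollIntelligence(state, { render, toast, fetchFn = fetch }) {
  try {
    const res = await fetchFn("/api/intelligence");
    if (!res.ok) return false;
    const data = await res.json();
    const previous = state.intelligence?.meta?.analysedAt;
    if (data?.meta?.analysedAt === previous) return false; // same cycle: no UI work
    state.intelligence = data;
    render(data);
    toast(REFRESH_TOAST);
    return true;
  } catch {
    return false; // a failed silent poll is ignored until the next interval
  }
}

// Wiring in the page, every 30 minutes:
// setInterval(() => pollIntelligence(state, { render: renderIntelligence, toast: showToast }), 30 * 60 * 1000);
```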

---

## 8. Auto-Save to D1 — How Leads Appear Without User Action

This is the mechanism by which the **Saved Leads tab populates automatically**.

### The Auto-Save Filter

Only `urgency === "HIGH"` items are auto-saved. This is intentional — MEDIUM and LOW items are context/monitoring data, not actionable leads.

### Deduplication Logic

Before saving, the system checks:
```sql
SELECT id FROM leads
WHERE url = ?  OR  (url IS NULL AND title = ?)
LIMIT 1
```

If a match is found → skip (already in the leads table).  
If no match → INSERT with `notes = "auto-saved · background refresh"`.
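
The check-then-insert can be sketched with the D1 prepared-statement API (`prepare().bind().first()` and `.run()`). The `db` binding is passed in because its name is not given here, and only a subset of the lead columns is written.

```javascript
const AUTO_SAVE_NOTE = "auto-saved · background refresh";

// Returns true when a new row was inserted, false when the lead already existed.
async function saveLeadIfNew(db, item, query) {
  const existing = await db
    .prepare("SELECT id FROM leads WHERE url = ? OR (url IS NULL AND title = ?) LIMIT 1")
    .bind(item.url ?? null, item.title)
    .first();
  if (existing) return false; // already stored: skip

  await db
    .prepare(
      `INSERT INTO leads (title, url, provider, sector, location, stage, urgency, query, notes, saved_at)
       VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`
    )
    .bind(
      item.title, item.url ?? null, item.provider ?? null, item.sector ?? null,
      item.location ?? null, item.stage ?? null, item.urgency ?? null, query,
      AUTO_SAVE_NOTE, new Date().toISOString()
    )
    .run();
  return true;
}
```

D1 rejects `undefined` bind values, hence the `?? null` defaults. Because the SELECT and INSERT are separate statements, two concurrent refreshes could still insert the same lead; a unique index on `url` would close that gap.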

### Manual vs Auto Saves

Users can distinguish auto-saved leads from manual saves by the `notes` field:
- `"auto-saved · background refresh"` → discovered by the cron without user interaction
- `""` (empty) → saved by user clicking ★ Save on a search result card

### What Gets Stored Per Lead

```sql
-- Base fields (migration 0001)
CREATE TABLE IF NOT EXISTS leads (
  id         INTEGER PRIMARY KEY,  -- Row id returned by the dedup check above
  title      TEXT,     -- Full opportunity title
  url        TEXT,     -- Direct source URL
  provider   TEXT,     -- Source name (e.g. "Tavily / tendersontime.com")
  sector     TEXT,     -- Oil & Gas / Power Generation / Renewables / etc.
  location   TEXT,     -- Iraq / Basra / Baghdad / Kirkuk / etc.
  stage      TEXT,     -- RFQ/RFP Released / EPC Bidding / etc.
  urgency    TEXT,     -- HIGH / MEDIUM / LOW
  companies  TEXT,     -- JSON array of mentioned companies
  pub_date   TEXT,     -- Original publication date
  brief      TEXT,     -- Claude Haiku 2-sentence EPC brief
  prime      TEXT,     -- Likely prime contractor (from Claude)
  query      TEXT,     -- Search query that discovered this lead
  notes      TEXT,     -- User notes or "auto-saved · background refresh"
  saved_at   TEXT      -- Timestamp of when saved to D1
);

-- BD Intelligence fields added in migration 0002
ALTER TABLE leads ADD COLUMN procurement_tier TEXT;          -- VERIFIED / PROBABLE / POSSIBLE / INTELLIGENCE
ALTER TABLE leads ADD COLUMN procurement_confidence INTEGER; -- 0–100 score at time of saving
ALTER TABLE leads ADD COLUMN tender_ref TEXT;                -- Extracted reference number (e.g. ITB/BOC/2025/047)
ALTER TABLE leads ADD COLUMN resolved_entities TEXT;         -- JSON array: [{canonical, type}, ...]
ALTER TABLE leads ADD COLUMN project_id INTEGER;             -- FK to projects table if correlated
```

This means every saved lead is a time-stamped procurement intelligence snapshot — not just a URL bookmark.

---

## 9. Trend Recording — Historical Memory
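
Step 10A of the pipeline writes one trend snapshot per live search to the `trend_snapshots` table in D1, in the background via `ctx.waitUntil()`. A minimal sketch of that write, assuming the column names listed in Step 10A, JSON-encoded `sectors` and `companies`, and a hypothetical rule for `open_bids` (stages mentioning RFQ, RFP, Bidding or Tender); `awards` counts items carrying the `isAward` flag.

```javascript
// Builds one trend_snapshots row from a result set.
function buildSnapshot(query, items) {
  const count = (pred) => items.filter(pred).length;
  const distinct = (values) => [...new Set(values.filter(Boolean))];
  return {
    query,
    result_count: items.length,
    high_urgency: count((i) => i.urgency === "HIGH"),
    open_bids: count((i) => /RFQ|RFP|Bidding|Tender/i.test(i.stage || "")), // assumed rule
    awards: count((i) => i.isAward === true),
    sectors: JSON.stringify(distinct(items.map((i) => i.sector))),
    companies: JSON.stringify(distinct(items.flatMap((i) => i.companies || []))),
  };
}

async function recordTrend(db, query, items) {
  const s = buildSnapshot(query, items);
  await db
    .prepare(
      `INSERT INTO trend_snapshots (query, result_count, high_urgency, open_bids, awards, sectors, companies)
       VALUES (?, ?, ?, ?, ?, ?, ?)`
    )
    .bind(s.query, s.result_count, s.high_urgency, s.open_bids, s.awards, s.sectors, s.companies)
    .run();
}
```

In the search handler this would be scheduled without blocking the response, e.g. `ctx.waitUntil(recordTrend(env.DB, query, results))`, where `env.DB` is a hypothetical binding name.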

---

## 10. Project Memory — Project Lifecycle Tracking

### What It Is

The Project Memory Engine (`functions/lib/projectMemory.js`) automatically builds and maintains a database of Iraq energy projects, correlating incoming search results to known projects and advancing their lifecycle stage over time.

### How It Works

After every live search, the top 12 results are passed to `correlateToProject()` via `ctx.waitUntil()`.

For each item, the engine runs two matching strategies:

**Strategy 1 — Name fingerprinting:**
The item title is checked against 27 seed project patterns (regex). Examples:
- `rumaila|rumayla` → Rumaila Oil Field Development (Basra Oil Company)
- `khor mor|khormor` → Khor Mor Phase 2 C3 Gas Train (Pearl Petroleum)
- `karbala refin` → Karbala Refinery (Ministry of Oil)

**Strategy 2 — Operator + sector + location matching:**
If no name match, the engine queries D1 for existing projects matching the item's operator (via LIKE), sector, and location. This catches new signals about known projects that don't mention the project name directly.

### Lifecycle Stage Progression

When a match is found, the engine detects the lifecycle event type from the item content:

| Event Detected | Keywords | D1 Stage Assigned |
|---------------|---------|-------------------|
| `award` | "contract awarded", "signed" | Award |
| `construction` | "under construction", "commencing" | Construction |
| `commissioning` | "commissioning", "start-up" | Commissioning |
| `commercial_bid` | "commercial bid", "financial" | Commercial Bid |
| `technical_bid` | "technical bid", "technical proposal" | Technical Bid |
| `epc_tender` | "epc tender", "invitation to bid" | EPC Tender |
| `prequalification` | "prequalification", "pre-qual" | Prequalification |
| `financing` | "financing approved", "loan signed" | Financing |
| `feed_complete` | "FEED complete", "FEED approved" | FEED Complete |
| `feed_study` | "FEED study", "front-end engineering" | FEED |
| `announcement` | (fallback) | Intelligence |

The stage only **advances** (never reverts). A project at EPC Tender cannot go back to Prequalification.

### Award Probability by Stage (Iraq-Specific Calibration v4.5)

Probabilities are set to Iraq's actual EPC award rates — **not global averages** — based on Rystad Energy, IHS Markit, and Wood Mackenzie Iraq project data. Iraq's rates are 10–15% lower than global norms due to: budget freeze risk, political disruption, and MoO/MoE mid-tender reallocation.

| Stage | Award Probability | vs. Global Avg | Iraq-Specific Reason |
|-------|------------------|---------------|----------------------|
| Intelligence | 8% | ~5% | Iraq has an active pipeline — slightly above baseline |
| FEED | 20% | ~25% | 1 in 5 studies become tenders |
| FEED Complete | 38% | ~45% | Completion is a strong signal in Iraq |
| Financing | 48% | ~55% | Iraq budget uncertainty discounts this |
| Prequalification | 55% | ~65% | 35–40% cancel before tender in Iraq |
| EPC Tender | **62%** | ~74% | **Key recalibration** — Iraq actual rate ~62% (Rystad 2023) |
| Technical Bid | 72% | ~83% | Late-stage cancellations remain elevated |
| Commercial Bid | 82% | ~91% | BAFO stage still sees ~18% abort in Iraq |
| Award | 100% | 100% | — |
| Construction | 100% | 100% | — |
| Commissioning | 100% | 100% | — |

### Database Records Created

For each correlated item:
1. **`projects` table** — created or updated (stage, probability, source_count, last_updated)
2. **`project_events` table** — one row per signal: event_type, title, source_url, tender_ref, procurement_confidence, detected_date

The full event timeline is accessible via `GET /api/projects?id=N`.

### When It Records

Every **live search** (not served from cache) records a trend snapshot using Cloudflare's `ctx.waitUntil()` to ensure the D1 write completes even after the HTTP response is sent.

```js
waitUntil(
  env.DB.prepare(`INSERT INTO trend_snapshots ...`)
    .bind(query, resultCount, highUrgency, openBids, awards, sectors, companies)
    .run()
);
```

The `ctx.waitUntil()` is critical — without it, Cloudflare terminates the worker after the response is sent, and the async D1 write would be silently dropped.

### What Gets Recorded

```sql
query         -- "gas processing" (lowercased)
result_count  -- 20 (total items returned)
high_urgency  -- 8  (items with urgency = HIGH)
open_bids     -- 3  (items with stage = RFQ/RFP Released)
awards        -- 1  (items with stage = Contract Awarded)
sectors       -- ["Oil & Gas", "Gas", "Power Generation"]  (JSON)
companies     -- ["Petrofac", "Technip", "Saipem"]         (JSON)
snapshot_at   -- 2026-05-19 07:20:33
```

### Reading Trend History

- `GET /api/trends` → Summary of all tracked queries (latest snapshot per query, ordered by recency)
- `GET /api/trends?query=gas+processing&limit=30` → Full history for one query, with delta vs previous snapshot

The delta (▲▼) shows whether market activity for a query is increasing or decreasing over time — useful for spotting emerging sectors (e.g. if "solar Iraq" results jump from 8 to 18, the solar pipeline is heating up).
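The Project Memory rules (the stage only advances, and award probability follows the stage) can be sketched as below. The stage order is taken from the probability table in the Project Memory section; representing `STAGE_PROBABILITY` as an object keyed by D1 stage label is an assumption about `projectMemory.js`, and `advanceStage` / `applySignal` are hypothetical names.

```js
// Hypothetical sketch: stage order and probabilities copied from the Project Memory tables.
const STAGE_ORDER = [
  "Intelligence", "FEED", "FEED Complete", "Financing", "Prequalification",
  "EPC Tender", "Technical Bid", "Commercial Bid", "Award", "Construction", "Commissioning",
];

// Assumed shape: D1 stage label -> award probability (0 to 1).
const STAGE_PROBABILITY = {
  Intelligence: 0.08, FEED: 0.2, "FEED Complete": 0.38, Financing: 0.48,
  Prequalification: 0.55, "EPC Tender": 0.62, "Technical Bid": 0.72,
  "Commercial Bid": 0.82, Award: 1, Construction: 1, Commissioning: 1,
};

// The stage only advances: keep whichever of the two stages is later in STAGE_ORDER.
function advanceStage(currentStage, detectedStage) {
  const cur = STAGE_ORDER.indexOf(currentStage);
  const next = STAGE_ORDER.indexOf(detectedStage);
  if (next === -1) return currentStage; // unrecognised label: ignore the signal
  if (cur === -1) return detectedStage; // no valid stage stored yet
  return next > cur ? detectedStage : currentStage;
}

// Apply one correlated signal to a project record and refresh its probability.
function applySignal(project, detectedStage) {
  const stage = advanceStage(project.stage, detectedStage);
  return { ...project, stage, awardProbability: STAGE_PROBABILITY[stage] };
}
```

A signal that maps to an earlier stage leaves both stage and probability untouched, so a late prequalification article cannot pull an EPC Tender project back down to 55%.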

---

## 10a. Analyst Enrichment Pipeline — How the Project Pipeline Stays Current

> **v4.5 feature.** This section describes the hybrid curated + AI analyst architecture first introduced in v4.2 and refined through v4.5 (full `runAnalystEnrichment` implementation, silent background polling, non-destructive feed refresh).

### The Problem It Solves

A static list of 38 hand-researched projects (`RAW_SCENARIOS`) would become stale within weeks. Procurement stages advance, new tenders are issued, values get revised. The Analyst Enrichment Pipeline keeps the project database current — automatically, conservatively, and without hallucination.

### Three Intelligence Sources

On every request (when the KV cache is stale), the platform gathers three intelligence sources **in parallel** through two fetch functions:

```
1. fetchLiveHeadlinesForScan()  — 9 RSS feeds (EN + AR) · up to 20 headlines
2. fetchD1Intelligence()        — (a) D1: project stage advances (last 30d)
                                  (b) D1: HIGH-urgency auto-saved leads (last 14d)
```

All three are passed to `runAnalystEnrichment()` as context for Claude Haiku. Arabic headlines are **not pre-translated** before analyst enrichment — Claude Haiku reads Arabic natively and responds in structured JSON.
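A minimal sketch of the parallel gather step, assuming `fetchLiveHeadlinesForScan()` resolves to an array of headlines and `fetchD1Intelligence()` resolves to an object holding the two D1 result sets. The property names `stageAdvances` and `highLeads`, and the `gatherAnalystContext` wrapper, are hypothetical. Using `Promise.allSettled` is a design choice of this sketch: one failing source leaves an empty slot instead of aborting the analyst run.

```js
// Hypothetical sketch: fetchers are injected so the function runs without network or D1.
async function gatherAnalystContext(fetchHeadlines, fetchD1Intel) {
  // Start both fetches at once; allSettled never rejects.
  const [headlinesRes, d1Res] = await Promise.allSettled([fetchHeadlines(), fetchD1Intel()]);

  const headlines = headlinesRes.status === "fulfilled" ? headlinesRes.value.slice(0, 20) : [];
  const d1 = d1Res.status === "fulfilled" ? d1Res.value : {};

  return {
    headlines,                              // source 1: RSS headlines (EN + AR), capped at 20
    stageAdvances: d1.stageAdvances ?? [],  // source 2: D1 stage advances (last 30d)
    highLeads: d1.highLeads ?? [],          // source 3: D1 HIGH-urgency leads (last 14d)
  };
}
```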

### The Claude Haiku Analyst Prompt

Claude Haiku receives:
- The full tracked project database (id, title, sector, location, stage, value, timeline)
- Live headlines in English and Arabic (numbered, with URLs)
- D1 stage advances: `• Khor Mor C3 | Gas | Erbil | Lifecycle: EPC Tender | Detected: 2026-05-14`
- D1 HIGH leads: `• "Iraq MOE issues RFQ for 500MW substation" | Power | Baghdad | Stage: RFQ Released`

Claude returns a single JSON object:
```json
{
  "updates": [
    { "id": "khor-mor-c3", "stage": "EPC Bidding", "updateNote": "headline 4" }
  ],
  "additions": [
    { "id": "moe-substation-2026", "title": "500MW Baghdad Substation", "sector": "Transmission & Grid", ... }
  ],
  "deactivations": [
    { "id": "ratawi-solar", "reason": "headline 7 confirms construction started" }
  ]
}
```

**Rules enforced in the prompt:**
- `updates[].id` must exactly match an existing tracked project ID
- Stage/value/timeline only updated if explicitly stated in a source — not inferred
- D1 lifecycle stage advances are treated as high-confidence (already verified by `projectMemory.js`)
- Additions must cite a specific headline or D1 signal URL
- **Deactivations** only triggered when a source explicitly confirms: contract award + construction started, project completion, or cancellation. Sets `active=0` in D1 permanently.
- Max 10 updates, max 4 additions, max 3 deactivations per cycle
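Prompt rules can be violated by the model, so it is worth re-checking them in code after parsing Claude's JSON. A hypothetical sketch (`validateAnalystOutput` is an illustrative name). The caps and the exact-ID rule mirror the list above; assuming each addition carries an `evidence` array of source URLs follows the limitations section, while rejecting additions that reuse a tracked ID and requiring a deactivation `reason` are extra assumptions.

```js
// Hypothetical sketch: re-apply the prompt's structural rules to the parsed analyst JSON.
const MAX_UPDATES = 10;
const MAX_ADDITIONS = 4;
const MAX_DEACTIVATIONS = 3;

function validateAnalystOutput(raw, trackedIds) {
  const src = raw ?? {};
  const known = new Set(trackedIds);
  const list = (x) => (Array.isArray(x) ? x : []);

  // updates[].id must exactly match an existing tracked project ID
  const updates = list(src.updates).filter((u) => u && known.has(u.id)).slice(0, MAX_UPDATES);

  // additions must cite at least one http(s) URL; reusing a tracked ID is rejected (assumption)
  const additions = list(src.additions)
    .filter((a) => a && !known.has(a.id))
    .filter((a) => list(a.evidence).some((url) => /^https?:\/\//.test(String(url))))
    .slice(0, MAX_ADDITIONS);

  // deactivations must target a tracked project and carry a reason (assumption)
  const deactivations = list(src.deactivations)
    .filter((d) => d && known.has(d.id) && typeof d.reason === "string" && d.reason.trim() !== "")
    .slice(0, MAX_DEACTIVATIONS);

  return { updates, additions, deactivations };
}
```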

### D1 Persistent State (`pipeline_state` table)

This is the core of v4.2+. Unlike the previous approach where analyst output lived only in KV (lost after 8h), every update, discovery, and deactivation is now **persisted permanently to D1**.

```
Analyst run completes
     │
     ▼
persistAnalystOutput(env, updates, newItems, deactivations)
     │
     ├── For each stage update:
     │       INSERT INTO pipeline_state (project_id, stage, value, timeline, update_note, ...)
     │       ON CONFLICT DO UPDATE  ← COALESCE keeps the stored value when the new one is NULL
     │
     ├── For each new discovery:
     │       INSERT INTO pipeline_state (project_id, full_project JSON, is_discovery=1, ...)
     │       ON CONFLICT DO UPDATE full_project  ← updates if analyst finds more detail
     │
     └── For each deactivation:
             INSERT INTO pipeline_state (project_id, active=0, update_note=reason, ...)
             ON CONFLICT DO UPDATE active=0  ← permanently removes from active pipeline

Next read: loadPipelineState() collects deactivatedIds
           → base.filter(s => !deactivatedIds.has(s.id))
```
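The non-destructive stage-update upsert can be sketched as a D1 prepared statement. This is an illustration, not the production query: the diagram elides the full column list, and `project_id` is assumed to carry a UNIQUE or PRIMARY KEY constraint, which SQLite's `ON CONFLICT(...)` upsert (used by D1) requires. `excluded` refers to the row that failed to insert. `upsertStageUpdate` and the update field names are hypothetical.

```js
// Hypothetical sketch of the stage-update upsert. `db` is a D1 binding (env.DB).
// COALESCE(excluded.col, pipeline_state.col): take the new value when one was supplied,
// otherwise keep what is already stored.
const UPSERT_STAGE_SQL = `
  INSERT INTO pipeline_state (project_id, stage, value, timeline, update_note)
  VALUES (?, ?, ?, ?, ?)
  ON CONFLICT(project_id) DO UPDATE SET
    stage       = COALESCE(excluded.stage,       pipeline_state.stage),
    value       = COALESCE(excluded.value,       pipeline_state.value),
    timeline    = COALESCE(excluded.timeline,    pipeline_state.timeline),
    update_note = COALESCE(excluded.update_note, pipeline_state.update_note)`;

function upsertStageUpdate(db, u) {
  // D1 does not accept undefined bindings; missing fields become NULL so COALESCE falls back.
  return db
    .prepare(UPSERT_STAGE_SQL)
    .bind(u.id, u.stage ?? null, u.value ?? null, u.timeline ?? null, u.updateNote ?? null)
    .run();
}
```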

### Request Flow (getEnrichedPipeline)

```
 1. ensurePipelineStateTable()  — CREATE TABLE IF NOT EXISTS (safe first run)
 2. loadPipelineState()         — read all pipeline_state rows from D1
      → overrides Map (project_id → {stage, value, timeline, updateNote})
      → discoveries []  (full project JSON objects)
      → deactivatedIds Set (rows with active=0)
 3. Apply overrides to RAW_SCENARIOS and drop deactivated IDs
      → each project gets its D1-persisted state
      → dynamic confidence recalculated from the new stage
 4. KV cache check (analyst_pipeline_v1, TTL 8h)
      ✓ HIT  → return cached projects + any D1 discoveries not yet in cache
      ✕ MISS → continue
 5. runAnalystEnrichment(persistedBase, headlines, d1Intel, env)
      → Claude Haiku reads all 3 intelligence sources
      → returns {updates[], additions[], deactivations[]}
 6. Apply fresh updates to persistedBase
 7. sanitiseAddition() on each new project:
      → sector normalisation ("electricity" → "Power Generation")
      → normalizeEntityList() on companies[] ("CNPC" → "PetroChina")
      → reject if sector not in VALID_SECT
 8. persistAnalystOutput(env, updates, newItems, deactivations)  — write to D1
 9. Merge: freshBase + olderDiscoveries + newItems
10. storeEnrichedPipeline(env, result)  — write to KV (8h TTL)
```
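Steps 2–3 amount to a pure merge of D1 state onto the seed list. A sketch under stated assumptions: each `RAW_SCENARIOS` entry has an `id` and `stage`, overrides arrive as a `Map` keyed by project ID (as step 2 describes), deactivated IDs are a `Set`, and the confidence recalculation is abstracted as a caller-supplied `confidenceFor(stage)` function. `applyPersistedState` and `confidenceFor` are hypothetical names.

```js
// Hypothetical sketch of steps 2-3 of getEnrichedPipeline: seed list + D1 state.
function applyPersistedState(seed, overrides, deactivatedIds, confidenceFor) {
  return seed
    .filter((s) => !deactivatedIds.has(s.id)) // deactivations are permanent
    .map((s) => {
      const o = overrides.get(s.id);
      if (!o) return s; // no persisted state: seed entry is used unchanged
      const merged = { ...s };
      // Only overwrite fields the override actually carries; the seed value survives otherwise.
      for (const key of ["stage", "value", "timeline", "updateNote"]) {
        if (o[key] != null && o[key] !== "") merged[key] = o[key];
      }
      merged.confidence = confidenceFor(merged.stage); // recalculated from the (possibly new) stage
      return merged;
    });
}
```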

### Accuracy Over Time

| Time | Pipeline State |
|------|---------------|
| Day 1 | 38 curated base projects (RAW_SCENARIOS seed) |
| Week 1 | 38 base + analyst stage updates from headlines + 2–4 new discoveries in D1 |
| Month 1 | Stage overrides accumulate in D1 — analyst reads its own past work as new baseline |
| Month 3+ | Pipeline reflects real procurement progress; RAW_SCENARIOS is seed data only |

### Anti-Hallucination Controls

- Analyst only updates a project if the **exact project ID** is matched in the response
- Stage/value/timeline fields are only applied if they are non-empty and not in the `BAD_VALS` set (`"undefined"`, `"null"`, `"unknown"`, `"n/a"`)
- D1 stage advances from `projectMemory.js` are used as supporting evidence — they were already verified from real news searches
- Company names in additions are normalised through `entities.js` (50+ canonical entries) before being stored
- The pipeline falls back to `RAW_SCENARIOS` + `D1 overrides` gracefully if the analyst call fails or returns invalid JSON
- Deactivations only applied if the source explicitly confirms one of three conditions: award + construction, completion, or cancellation
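The `BAD_VALS` field gate can be sketched as a small predicate. The set contents come from the list above; case-insensitive, whitespace-trimmed matching is an assumption of this sketch, as are the `isUsable` and `pickUsableFields` names.

```js
// Sketch: placeholder strings the analyst may emit instead of real data.
const BAD_VALS = new Set(["undefined", "null", "unknown", "n/a"]);

// True when a value is safe to write over a curated field.
function isUsable(v) {
  if (v === null || v === undefined) return false;
  const s = String(v).trim();
  return s !== "" && !BAD_VALS.has(s.toLowerCase());
}

// Copy only usable stage/value/timeline fields from an analyst update.
function pickUsableFields(update, fields = ["stage", "value", "timeline"]) {
  const out = {};
  for (const f of fields) {
    if (isUsable(update[f])) out[f] = String(update[f]).trim();
  }
  return out;
}
```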

---

## 10b. Confidence Calibration — Brier Score Tracking

> **v4.5 feature.** Empirical probability calibration infrastructure.

### The Problem

Confidence scores are only useful if they are accurate. A model that labels every project "74% likely to reach award" is not calibrated if Iraq's actual rate is 62%. The Brier score system measures this gap and enables data-driven recalibration over time.

### How It Works

```
project advances stage
       │
       ▼
logStageTransition(env, projectId, stage, predictedProbability)
  └── INSERT INTO confidence_log (project_id, observed_stage, predicted_prob, outcome='pending')

project reaches Award or Construction detected
       │
       ▼
markProjectAwarded(env, projectId)
  └── UPDATE confidence_log SET outcome='awarded' WHERE project_id=? AND outcome='pending'
```

### Brier Score Formula

```
Brier Score = mean((predicted_probability − actual_outcome)²)

where actual_outcome = 1 (awarded) or 0 (cancelled/not awarded)
```

| Brier Score | Interpretation |
|-------------|----------------|
| 0.00 | Perfect calibration |
| < 0.05 | Excellent |
| < 0.10 | Good |
| < 0.15 | Fair |
| ≥ 0.25 | No better than the uninformative baseline (always forecasting 50%) |

### Calibration Endpoint

`GET /api/intelligence?mode=calibration` returns:
```json
{
  "totalObservations": 47,
  "resolvedObservations": 12,
  "overallBrierScore": 0.087,
  "brierInterpretation": "good",
  "stageStats": [
    {
      "stage": "EPC Tender",
      "observations": 8,
      "resolved": 0.625,
      "meanPredicted": 0.620,
      "brierScore": 0.061,
      "calibrationGap": 0.005
    }
  ],
  "note": "Brier score: 0=perfect, 0.25=uninformative baseline"
}
```

`calibrationGap` = actual rate − predicted. Positive = model underestimates (conservative). Negative = model overestimates (optimistic). After 3–6 months of live operation, this data drives recalibration of `STAGE_PROBABILITY` in `projectMemory.js`.
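The per-stage figures in the calibration response can be reproduced from resolved `confidence_log` rows. A sketch, assuming each resolved row has the shape `{ stage, predicted, awarded }` with `predicted` in 0–1 and `awarded` a boolean, and that pending rows are excluded beforehand. The row shape and `stageStats` name are assumptions. The observed award rate is named `actualRate` here; the endpoint appears to report the same quantity as `resolved`.

```js
// Sketch: per-stage Brier score and calibration gap from resolved predictions.
function stageStats(rows) {
  const byStage = new Map();
  for (const r of rows) {
    if (!byStage.has(r.stage)) byStage.set(r.stage, []);
    byStage.get(r.stage).push(r);
  }
  const round3 = (x) => Math.round(x * 1000) / 1000;
  return [...byStage].map(([stage, rs]) => {
    const n = rs.length;
    const actualRate = rs.filter((r) => r.awarded).length / n;
    const meanPredicted = rs.reduce((s, r) => s + r.predicted, 0) / n;
    // Brier = mean((predicted - outcome)^2), outcome 1 = awarded, 0 = not awarded
    const brier = rs.reduce((s, r) => s + (r.predicted - (r.awarded ? 1 : 0)) ** 2, 0) / n;
    return {
      stage,
      observations: n,
      actualRate: round3(actualRate),
      meanPredicted: round3(meanPredicted),
      brierScore: round3(brier),
      calibrationGap: round3(actualRate - meanPredicted), // positive = model too conservative
    };
  });
}
```

With eight EPC Tender rows all predicted at 0.62 and five awarded, this returns an actual rate of 0.625, a calibration gap of 0.005, and a Brier score of about 0.234. That combination is worth keeping in mind: predictions clustered in the middle of the 0–1 range cannot score far below the 0.25 baseline even when almost perfectly calibrated, because the Brier score also rewards sharpness, so per-stage scores are best read together with `calibrationGap`.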

---

## 11. Vector Search — Semantic Memory

### How It Works

When the Vectorize binding (`iraq-epc-opportunities`) is active, every search also queries the vector index for semantically similar past results.

```
User query "gas processing EPC" 
  → Vectorize semantic search
  → Returns past items about "gas treatment", "processing unit EPC",
    "gas separation facility" even if they don't literally match the query
  → Unique items (not in current RSS/API results) are appended to merged list
```
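The append step can be sketched as a URL-keyed dedup. Assumptions, none confirmed as the production logic: both lists hold items with a `url` field, URLs are compared after light normalisation (host lower-cased by the `URL` parser, fragment and trailing slash dropped), vector matches keep their similarity order, and appended items get a hypothetical `fromVector` flag.

```js
// Sketch: append semantically similar past items that the live fetch did not return.
function normaliseUrl(u) {
  try {
    const url = new URL(u); // lower-cases the host; the fragment is simply not used below
    return url.origin + url.pathname.replace(/\/+$/, "") + url.search;
  } catch {
    return String(u).trim(); // not a parseable URL: compare as-is
  }
}

function mergeVectorResults(liveItems, vectorItems) {
  const seen = new Set(liveItems.map((i) => normaliseUrl(i.url)));
  const extras = [];
  for (const item of vectorItems) {
    const key = normaliseUrl(item.url);
    if (seen.has(key)) continue; // already in current RSS/API results
    seen.add(key);               // also dedup among the vector matches themselves
    extras.push({ ...item, fromVector: true });
  }
  return [...liveItems, ...extras];
}
```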

### Growth Over Time

The vector index grows as items are ingested via `POST /api/ingest`. Over time, the platform builds a semantic memory of every Iraq EPC opportunity it has ever encountered, enabling discovery of related opportunities that keyword search alone would miss.

### Current Status

Vectorize requires a paid Cloudflare plan. If the binding is unavailable, the platform runs in keyword-only mode with no degradation in other features.

---

## 12. KV Cache Strategy — What Gets Cached and Why

| Cache Key | Content | TTL | Reason |
|-----------|---------|-----|--------|
| `search_refresh_v1_{query}` | Full search result (JSON) | 3 hours | Background refresh result — served to users with zero API calls |
| `headlines_v1` | The 15 latest Iraq headlines | 30 min | RSS feeds are slow; headline ticker content doesn't need to be <1 min fresh |
| `market_pulse_v3` | All KPI stats (oil prod, electricity, etc.) | 24 hours | EIA/World Bank data updates at most daily |
| `ai_recs_v2` | Claude's 6 BD recommendations | 2 hours | Recommendations don't need per-request regeneration |
| `analyst_pipeline_v1` | Full enriched pipeline (38 base + overrides + discoveries) | 8 hours | Analyst cycle runs twice a day; permanent state in D1 means no data loss on expiry. Frontend polls this every 30 min (silent, no re-render if `analysedAt` unchanged) |
| `ai_brief_{hash}` | Per-opportunity Claude brief | 2 hours | Same article won't generate a different brief within 2 hours |
| `refresh_summary_last` | Last background refresh stats | 24 hours | Monitoring/debugging only |

### Cache Miss Flow

If no cache hit → full pipeline runs → result is stored in KV before returning to user → next request for same data served from cache.
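The read-through pattern can be sketched with the Workers KV binding's `get(key, "json")` and `put(key, value, { expirationTtl })` calls. The `cached` helper and its arguments are hypothetical; TTLs are in seconds, so the 30-minute headline cache would be `30 * 60`.

```js
// Sketch: serve from KV when present, otherwise compute, store with a TTL, and return.
// `kv` is a Workers KV namespace binding; `compute` is the expensive pipeline call.
async function cached(kv, key, ttlSeconds, compute) {
  const hit = await kv.get(key, "json"); // null when missing or expired
  if (hit !== null) return { data: hit, fromCache: true };

  const data = await compute();
  // Store before returning so the next request for the same key is a hit.
  await kv.put(key, JSON.stringify(data), { expirationTtl: ttlSeconds });
  return { data, fromCache: false };
}
```

Workers KV rejects an `expirationTtl` below 60 seconds, and writes propagate across locations eventually rather than instantly, so a burst of concurrent misses can still trigger more than one pipeline run.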

---

## 13. Source Credibility Engine

Every opportunity item gets a credibility score (0–100) based on its source domain.

### Domain Reputation Tiers

| Tier | Score | Examples |
|------|-------|---------|
| **S — Official Government** | 95–100 | oil.gov.iq, moelc.gov.iq, investpromo.gov.iq, worldbank.org, un.org |
| **A — Premium Trade Press** | 85–94 | meed.com, zawya.com, reuters.com, bloomberg.com, iraqoilreport.com, rigzone.com |
| **B — Quality Regional** | 70–84 | iraq-businessnews.com, shafaq.com, rudaw.net, oilprice.com, energyvoice.com |
| **C — General News** | 55–69 | General news aggregators, lesser-known portals |
| **D — Unverified** | 40–54 | Blogs, forums, aggregators without editorial standards |

### AI Verification Layer

Items from sources with a credibility score below **80** AND classified as HIGH urgency are sent to Claude Haiku for content verification (Task 1). Claude's verdict (`aiVerified: true/false`) is stored on the item and displayed in the UI as a credibility indicator.

This two-tier system means:
- A Ministry of Oil tender from `oil.gov.iq` (score 99) is instantly trusted — no AI verification needed
- A HIGH-urgency item from an unknown blog (score 45) is verified by Claude before being presented as a genuine lead
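The tier boundaries and the verification gate reduce to two small functions. Score ranges come from the tier table and the below-80 + HIGH rule from the paragraph above; `credibilityTier` and `needsAiVerification` are hypothetical names, and the item shape follows the `credibility` object in the search response.

```js
// Sketch: map a 0-100 credibility score to the tier letters in the table above.
function credibilityTier(score) {
  if (score >= 95) return "S"; // official government / multilateral
  if (score >= 85) return "A"; // premium trade press
  if (score >= 70) return "B"; // quality regional
  if (score >= 55) return "C"; // general news
  return "D";                  // unverified (table range 40-54)
}

// Only HIGH-urgency items from sources scoring below 80 go to Claude for Task 1.
function needsAiVerification(item) {
  return item.urgency === "HIGH" && item.credibility.score < 80;
}
```

Because the cut-off sits inside tier B (70–84), a B-tier source scoring 80–84 skips verification while one scoring 70–79 does not.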

---

## 14. The Full Data Flow Diagram

```
┌─────────────────────────────────────────────────────────────────────────┐
│                         USER INTERACTION                                │
│                                                                         │
│  User types "EPC tender Basra" → POST /api/search                       │
└───────────────────────────────────┬─────────────────────────────────────┘
                                    │
                    ┌───────────────▼───────────────┐
                    │     KV CACHE CHECK             │
                    │  search_refresh_v1_epc_...     │
                    └───────────────┬───────────────┘
              Cache hit < 5min      │       Cache miss
              ◄──────── YES         │       NO ──────────────►
              Return instantly      │
                                    ▼
              ┌─────────────────────────────────────────┐
              │         PARALLEL DATA FETCH              │
              │                                          │
              │  RSS ──────► 0–15 relevant items         │
              │  WB Proj ──► 0–5  Iraq projects          │
              │  WB Proc ──► 0–10 Iraq proc notices      │
              │  Tavily ───► 0–10 targeted web results   │
              │  Serper ───► 0–11 Google results         │
              │  Jina ─────► 0–5  gov portal items       │
              │                                          │
              │  Total raw: ~20–60 items                 │
              └─────────────────┬───────────────────────┘
                                │
              ┌─────────────────▼───────────────────────┐
              │    FILTER → DEDUPLICATE → ENRICH          │
              │                                          │
              │  hasEpcSignal() filter                   │
              │  URL-based dedup                         │
              │  detectSector, detectLocation,           │
              │  detectCompanies, detectOpportunity      │
              │  scoreRelevance()                        │
              │                                          │
              │  Result: 10–35 enriched items            │
              └─────────────────┬───────────────────────┘
                                │
              ┌─────────────────▼───────────────────────┐
              │  SORT: urgency rank → relevance score     │
              │  Take top 30 → AI verify → merge vector   │
              │  Final: top 20 items                     │
              └────┬───────────────────────────────┬─────┘
                   │                                │
                   ▼                                ▼
        ┌──────────────────────┐     ┌──────────────────────────┐
        │ CLAUDE HAIKU TASK 1  │     │ CLAUDE HAIKU TASKS 2+3   │
        │ Verify HIGH items    │     │ (run in parallel)         │
        │ from low-cred sources│     │                          │
        └──────────┬───────────┘     │ Task 2: Opportunity      │
                   │                 │   Briefs per item         │
                   │                 │ Task 3: Intelligence      │
                   │                 │   Report (summary +       │
                   │                 │   signals + recs)         │
                   │                 └──────────────┬───────────┘
                   └────────────────────────────────┘
                                    │
              ┌─────────────────────▼───────────────────────┐
              │          RESPONSE ASSEMBLY                    │
              │  summary, signals, recommendations,          │
              │  20 opportunities (with briefs, prime,       │
              │  credibility, urgency, stage, sector,        │
              │  location, companies, url, pubDate)          │
              └─────────┬───────────────────────────────────┘
                        │
           ┌────────────┴─────────────┐
           ▼                          ▼
    ┌──────────────┐         ┌─────────────────────┐
    │ SEND TO USER │         │  ctx.waitUntil()     │
    │  < 100ms for │         │  D1 trend snapshot   │
    │  cached      │         │  INSERT trend_snaps  │
    │  8–25s live  │         └─────────────────────┘
    └──────────────┘

───────────────────────────────────────────────────────────────────────────

                    BACKGROUND LOOP (every 3 hours, no user needed)

    ┌──────────────────────────────────────────────────────────────────┐
    │  Cron fires → refresh.js                                         │
    │                                                                  │
    │  For each of 14 queries (oil, gas, EPC, tender, solar...):       │
    │    → Full search pipeline (6 sources + Claude Haiku)             │
    │    → Cache result in KV (3h TTL)                                 │
    │    → Filter urgency=HIGH items                                   │
    │    → D1 dedup check (url / title)                               │
    │    → INSERT new leads → D1 leads table                           │
    │                                                                  │
    │  After all 14 queries:                                           │
    │    → If new leads found → POST WEBHOOK_URL                       │
    │    → Slack/Discord/Make/Zapier alert with lead list              │
    └──────────────────────────────────────────────────────────────────┘
```
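The ranking box in the diagram (urgency rank first, relevance score as tie-breaker, then truncation) can be sketched as a comparator. The numeric rank values, the treatment of unknown urgency labels, and the `rankOpportunities` name are assumptions; the AI-verify and vector-merge steps between the two cuts are omitted.

```js
// Sketch: HIGH before MEDIUM before LOW; within a level, higher relevance first.
const URGENCY_RANK = { HIGH: 0, MEDIUM: 1, LOW: 2 };

function rankOpportunities(items, limit) {
  const rank = (u) => URGENCY_RANK[u] ?? 3; // unknown urgency sorts last
  return [...items] // copy so the caller's array is not reordered
    .sort((a, b) => rank(a.urgency) - rank(b.urgency) || b.relevance - a.relevance)
    .slice(0, limit);
}
```

For the first cut in the diagram this would be called as `rankOpportunities(enrichedItems, 30)`.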

---

## 15. API Endpoint Reference

### `POST /api/search`

The primary intelligence endpoint.

**Request body** (`useCache` is optional; set it to `false` to bypass the KV cache and force a fresh fetch):
```json
{
  "query": "gas processing EPC Iraq",
  "useCache": false
}
```

**Response:**
```json
{
  "query": "gas processing EPC Iraq",
  "summary": "Executive summary from Claude Haiku...",
  "signals": [
    { "type": "TENDER_ALERT", "title": "...", "detail": "..." }
  ],
  "recommendations": [
    { "action": "REGISTER_NOW", "text": "..." }
  ],
  "marketSummary": "Iraq market context sentence...",
  "opportunities": [
    {
      "title": "...",
      "url": "...",
      "provider": "Tavily / tendersontime.com",
      "sector": "Oil & Gas",
      "location": "Basra",
      "stage": "RFQ/RFP Released",
      "urgency": "HIGH",
      "companies": ["Petrofac", "Technip"],
      "pubDate": "2026-05-19T...",
      "brief": "Claude Haiku 2-sentence EPC analysis...",
      "prime": "Petrofac",
      "relevance": 1.85,
      "credibility": { "score": 92, "tier": "A", "aiVerified": true }
    }
  ],
  "meta": {
    "total": 28,
    "aiEngine": "claude-haiku-4-5",
    "sourceCounts": { "rss": 11, "wbProcurement": 3, "tavily": 8, "serper": 6 },
    "timestamp": "2026-05-19T07:20:00.000Z"
  }
}
```
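A minimal client call, assuming the endpoint is served from the live host listed at the top of this document and accepts the JSON body shown above. The `searchIntel` helper is hypothetical, and the `fetchImpl` option exists only so the function can be exercised without network access; in a browser, Node 18+, or Workers the global `fetch` is used.

```js
// Sketch: POST a query to /api/search and return the parsed response.
async function searchIntel(query, { useCache, baseUrl = "https://leads.urukepc.com", fetchImpl = fetch } = {}) {
  const body = { query };
  if (useCache === false) body.useCache = false; // force a fresh fetch, bypassing KV
  const res = await fetchImpl(`${baseUrl}/api/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`search failed: HTTP ${res.status}`);
  return res.json();
}
```

A caller would typically filter the result, for example `r.opportunities.filter((o) => o.urgency === "HIGH")`.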

### `GET /api/intelligence`

Static + AI-enriched intelligence data. No external API calls on cache hit.

**Response fields:** `predictions` (34), `recentAwards` (9), `recommendations` (6), `competitors`, `stats`

### `GET /api/projects`

All tracked projects with lifecycle stage, award probability, and summary event counts.

**Response:** `{ projects: [...], total: N }` — each project includes `id`, `project_name`, `operator`, `sector`, `location`, `lifecycle_stage`, `award_probability`, `event_count`, `max_confidence`, `latest_tender_ref`

### `GET /api/projects?id=N`

Single project record + full event timeline.

**Response:** `{ project: {...}, events: [...] }` — events ordered by `detected_date DESC`

### `GET /api/headlines`

Live Iraq energy headlines from 7 RSS feeds. 30-min KV cache.

**Response:** `{ headlines: [...], fromCache: bool, count: 15 }`

### `GET /api/market-pulse`

Live Iraq energy KPIs. 24-hour KV cache.

**Response:** `{ oilProd, elecAccess, gasFlared, gridCapacity, oilRevenue, renewables, operators, peakDemand, eplPipeline }`

### `GET /api/leads` | `POST /api/leads` | `PATCH /api/leads` | `DELETE /api/leads`

D1 CRUD for saved leads.

- `GET` → `{ leads: [...], total: N }` — each lead includes all procurement intelligence fields
- `POST { title, url, urgency, procurement_tier, procurement_confidence, tender_ref, resolved_entities, ... }` → `{ ok: true, id: N }`
- `DELETE ?id=N` → `{ ok: true }`
- `PATCH { id, notes }` → `{ ok: true }` (update notes)

### `GET /api/watchlist` | `POST /api/watchlist` | `DELETE /api/watchlist`

D1 CRUD for monitored keywords.

- `GET` → `{ keywords: [...] }`
- `POST { keyword, label }` → `{ ok: true }`
- `DELETE ?id=N` → `{ ok: true }`

### `GET /api/trends` | `POST /api/trends`

D1 trend snapshot history.

- `GET` → `{ summary: [...], total: N }` (all queries, latest snapshot each)
- `GET ?query=oil&limit=30` → `{ query, snapshots: [...], total: N }` (history with deltas)
- `POST { query, result_count, high_urgency, ... }` → `{ ok: true }` (manual record)
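The deltas returned by `GET /api/trends?query=oil&limit=30` compare each snapshot with the one before it. A sketch, assuming snapshots arrive newest-first and that the delta is a plain difference in `result_count`; which other fields receive deltas is not specified here, and `withDeltas` is a hypothetical name.

```js
// Sketch: attach a ▲/▼ delta to each snapshot relative to the previous (older) one.
function withDeltas(snapshotsNewestFirst, field = "result_count") {
  return snapshotsNewestFirst.map((snap, i) => {
    const older = snapshotsNewestFirst[i + 1];
    if (!older) return { ...snap, delta: null, arrow: "" }; // oldest snapshot: nothing to compare
    const delta = snap[field] - older[field];
    return { ...snap, delta, arrow: delta > 0 ? "▲" : delta < 0 ? "▼" : "" };
  });
}
```

Using the "solar Iraq" example from the trend section, a jump from 8 to 18 results yields a delta of +10 with a ▲ marker on the newer snapshot.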

---

## 16. Configuration Reference

Set all secrets in: **Cloudflare Dashboard → Pages → iraq-energy-ai-search → Settings → Environment Variables**

| Variable | Required | Where to Get | Function |
|----------|----------|-------------|---------|
| `ANTHROPIC_KEY` | **Recommended** | https://console.anthropic.com | Enables Claude Haiku 4.5 as primary AI. Without it, falls back to Llama 3.3 70B (free, lower quality). |
| `TAVILY_KEY` | **Recommended** | https://app.tavily.com | AI web search targeting Iraq gov + trade press domains. Free: 1,000 req/month. Without it: no Tavily source. |
| `SERPER_KEY` | **Recommended** | https://serper.dev | Google Search API for real-time results. Free: 2,500 req/month. Without it: no Google results. |
| `EIA_API_KEY` | Optional | https://www.eia.gov/opendata | Live Iraq oil production data in Market Pulse. Free, no rate limit. Without it: falls back to static value. |
| `NEWSAPI_KEY` | Optional | https://newsapi.org | Additional news source. Free: 100 req/day. Without it: source skipped. |
| `WEBHOOK_URL` | Optional | Slack/Discord/Make/Zapier | Receives alert when new HIGH-urgency leads are auto-saved. POST with Slack blocks + raw JSON payload. |
| `EXA_KEY` | Optional | https://exa.ai | Neural search for exact procurement documents. ~$10/1,000 queries. Without it: source skipped. |
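The fallback behaviour in the table can be summarised as a startup check. This is a hypothetical sketch: `describeConfig` and `OPTIONAL_SOURCES` are illustrative names, `claude-haiku-4-5` is taken from the `meta.aiEngine` field of the search response, and the Llama label is a placeholder, not a confirmed model identifier.

```js
// Sketch: derive the active AI engine and optional sources from configured secrets.
const OPTIONAL_SOURCES = {
  TAVILY_KEY: "tavily",
  SERPER_KEY: "serper",
  NEWSAPI_KEY: "newsapi",
  EXA_KEY: "exa",
};

function describeConfig(env) {
  const has = (name) => typeof env[name] === "string" && env[name].trim() !== "";
  return {
    aiEngine: has("ANTHROPIC_KEY") ? "claude-haiku-4-5" : "llama-3.3-70b (fallback)",
    sources: Object.entries(OPTIONAL_SOURCES).filter(([k]) => has(k)).map(([, s]) => s),
    liveOilProduction: has("EIA_API_KEY"), // otherwise Market Pulse uses a static value
    webhookAlerts: has("WEBHOOK_URL"),
  };
}
```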

### Local Development

```bash
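# 1. Copy env template (Windows cmd: copy .dev.vars.example .dev.vars)
cp .dev.vars.example .dev.vars

# 2. Fill in at minimum: ANTHROPIC_KEY, TAVILY_KEY, SERPER_KEY

# 3. Create local D1 database, then apply each later migration file
#    (e.g. migration 0002, which adds the BD intelligence lead fields) the same way
npx wrangler d1 execute iraq-epc-db --local --file=migrations/0001_init.sql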

# 4. Start dev server
npx wrangler pages dev .

# Open http://localhost:8788
```

### Minimum Viable Configuration

To run the full intelligence pipeline, you need:

```
ANTHROPIC_KEY  → Claude Haiku (AI briefs + reports)
TAVILY_KEY     → Tender-specific web search
SERPER_KEY     → Google News results
```

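If any of the three is missing, the platform still works: it drops the Tavily or Google source, or falls back to Llama 3.3 70B in place of Claude Haiku. With none of them set, it runs on RSS feeds + World Bank + Llama AI, and result quality is significantly lower.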

---

## How To Interpret a Search Result

When you receive 20 opportunities from a search, here is how to read the data:

```
Title:                  "Basra Oil Company — EPC Tender for Gas Compression Unit"
Provider:               "Tavily / iotc.oil.gov.iq"         <- Iraq oil ministry portal
Urgency:                HIGH                               <- Active procurement signal
Stage:                  RFQ/RFP Released                   <- Tender LIVE, submit now
Sector:                 Oil & Gas
Location:               Basra
Companies:              ["Basra Oil Company", "Petrofac"]
Brief:                  "BOC is tendering a 150MMscfd gas compression EPC contract
                         at Rumaila, est. $180M, pre-qual deadline June 2026. Likely
                         prime is Petrofac or CPECC; civil, E&I, rotating equipment
                         sub-packages available."          <- Claude Haiku brief
Prime:                  Petrofac
Credibility:            { score: 99, tier: "S", aiVerified: true }

-- BD Intelligence Layer additions --
procurementTier:        "VERIFIED"                         <- Tier badge shown green
procurementConfidenceScore: 87                             <- 87/100
tenderRef:              "ITB/BOC/2025/047"                 <- Extracted ref number
resolvedEntities:       [
                          { canonical: "Basra Oil Company", type: "NOC" },
                          { canonical: "Petrofac", type: "EPC" }
                        ]
procurementExplanation: [
                          "tender ID detected",
                          "submission deadline present",
                          "operator identified",
                          "official source (oil.gov.iq)"
                        ]
```

**Action:** This item would auto-save to Saved Leads on the next 3-hour background refresh (with all procurement fields persisted to D1), appear in the Saved Leads tab with VERIFIED badge and tender ref, and trigger a webhook alert if `WEBHOOK_URL` is configured.

---

## 17. Known Accuracy Limitations & Operational Risks

This section documents the platform's known accuracy boundaries, operational risks, and recommended mitigations. It is the honest counterpart to the capability documentation above.

---

### 17.1 Accuracy Limitations

#### Base Scenario Staleness
`RAW_SCENARIOS` in `intelligence.js` contains 38 hand-researched projects with fixed `reasoning`, `fundingConfirmed`, `stage`, `value`, and `timeline` fields. These reflect the state of each project **at the time of authoring**. Over time:
- Budget allocations referenced may change
- Bid timelines may have passed or been extended
- Named bidders may have been disqualified or added
- Stage may have advanced or regressed

**Mitigation:** The Claude Haiku analyst updates stage/value/timeline via D1 `pipeline_state` overrides twice daily. However, `reasoning` text is never AI-modified (to prevent hallucination). **Action required:** Review and update `RAW_SCENARIOS` quarterly or when a project is known to have materially changed.

---

#### AI-Discovered Project Quality
Projects added by the analyst (`aiDiscovered: true`, stored in `pipeline_state` as `is_discovery=1`) are grounded in an explicit news headline but are not manually verified. A speculative or misleading headline can create a phantom project.

**Current controls:**
- `sanitiseAddition()` validates sector against a whitelist, rejects projects without title+sector
- Company names are normalised through `entities.js`
- Evidence arrays require at least one URL
- `BAD_VALS` set blocks `undefined`, `null`, `unknown`, and `n/a` field values

**Remaining risk:** Single-source AI-discovered projects with no subsequent corroboration should be treated as **leads for investigation**, not confirmed tenders.

**Recommended improvement:** Mark AI-discovered projects in UI as `🤖 AI-Discovered — Verify` until a second independent source confirms.
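The recommended badge rule could be sketched as follows. This is a proposal, not current behaviour; treating evidence from different hostnames (ignoring a leading `www.`) as independent is a simplifying assumption, and `independentSourceCount` / `discoveryBadge` are hypothetical names. The `aiDiscovered` flag and `evidence` array follow the fields described above.

```js
// Sketch: treat evidence as independent when it comes from different hostnames.
function independentSourceCount(evidenceUrls) {
  const hosts = new Set();
  for (const u of evidenceUrls) {
    try {
      hosts.add(new URL(u).hostname.replace(/^www\./, ""));
    } catch {
      // skip malformed URLs rather than counting them as a source
    }
  }
  return hosts.size;
}

// Badge text for the UI, or null once a second independent source confirms the project.
function discoveryBadge(project) {
  if (!project.aiDiscovered) return null;
  return independentSourceCount(project.evidence ?? []) >= 2 ? null : "🤖 AI-Discovered — Verify";
}
```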

---

#### HIGH Urgency Auto-Save False Positives
Urgency classification (`detectOpportunity`) is keyword-based. Words like `deadline`, `urgent`, `bid`, `tender` trigger HIGH urgency regardless of broader context. This means general political or financial news that mentions a deadline may be auto-saved to Saved Leads.

**Current mitigation:** Procurement Tier badge (VERIFIED / PROBABLE / POSSIBLE / INTELLIGENCE) provides a second quality signal on each auto-saved lead.

**Recommended improvement:** Add `procurementConfidenceScore >= 40` as a second gate for auto-save (estimated at about one hour of implementation work).
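The proposed second gate could look like the predicate below. It sketches the recommendation, not current behaviour (today only the urgency check applies); the field name `procurementConfidenceScore` follows the search-result example earlier in this document, and `shouldAutoSave` is a hypothetical name.

```js
// Sketch of the proposed auto-save gate (not yet implemented in the platform).
const MIN_PROCUREMENT_CONFIDENCE = 40;

function shouldAutoSave(item) {
  // Gate 1 (current): keyword-based urgency classification
  if (item.urgency !== "HIGH") return false;
  // Gate 2 (proposed): require some structured procurement evidence;
  // a missing score becomes NaN and fails the gate
  const score = Number(item.procurementConfidenceScore);
  return Number.isFinite(score) && score >= MIN_PROCUREMENT_CONFIDENCE;
}
```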

---

#### Arabic Translation Accuracy
Claude Haiku batch-translates Arabic items from INA, Shafaq Arabic, Al-Sabah, and Al-Mada. Translation is AI-generated and may:
- Misinterpret procurement-specific terminology
- Classify urgency or stage incorrectly based on a mistranslation
- Miss nuances in ministry Arabic that change the meaning

**Mitigation:** `arabicOriginal` field is preserved in all translated items. Arabic-speaking users should verify the source text for any VERIFIED or HIGH-urgency item that originated from an Arabic feed.
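
The verification rule above can be expressed as a filter. This is a minimal sketch, assuming that items carry `procurementTier` and `urgency` fields alongside `arabicOriginal`. A non-empty `arabicOriginal` is used as the marker for Arabic-feed origin.

```javascript
// Hypothetical sketch: should an Arabic speaker check this item's source text?
function needsArabicReview(item) {
  // Translated items keep their source text in `arabicOriginal`.
  const fromArabicFeed =
    typeof item.arabicOriginal === "string" && item.arabicOriginal.trim() !== "";
  // Field names for tier and urgency are assumptions.
  const highStakes = item.procurementTier === "VERIFIED" || item.urgency === "HIGH";
  return fromArabicFeed && highStakes;
}
```

Calling `items.filter(needsArabicReview)` would then give Arabic-speaking reviewers a short queue instead of the full feed.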

---

#### Confidence Score Calibration Lag
The confidence model (v4.5) is calibrated against **external** Iraq EPC award rate data from Rystad Energy, IHS Markit, and Wood Mackenzie. It is not yet calibrated against **Uruk's own bid outcomes**.

The Brier score infrastructure logs predictions to `confidence_log` and compares them against actual outcomes. However, statistically meaningful recalibration requires **6+ months of resolved outcomes** (projects that reach Award or cancellation).
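
For reference, the Brier score over N resolved predictions is the mean of (p - o)², where:

- p is the predicted probability of award.
- o is 1 for an award and 0 for a cancellation.

Lower is better; a model that always predicts 0.5 scores 0.25. The sketch below assumes that `confidence_log` rows expose a `confidence` value on a 0 to 100 scale. It also assumes an `outcome` of `"award"`, `"cancelled"`, or `null` while unresolved. The real column names may differ.

```javascript
// Hypothetical sketch: Brier score over resolved confidence_log rows.
function brierScore(rows) {
  // Unresolved predictions cannot be scored yet.
  const resolved = rows.filter((r) => r.outcome === "award" || r.outcome === "cancelled");
  if (resolved.length === 0) return null;
  let sum = 0;
  for (const r of resolved) {
    // Convert the 0-100 confidence to a probability, clamped to [0, 1].
    const p = Math.min(Math.max(r.confidence / 100, 0), 1);
    const o = r.outcome === "award" ? 1 : 0;
    sum += (p - o) ** 2;
  }
  return { score: sum / resolved.length, n: resolved.length };
}
```

Returning the count `n` alongside the score makes the small-sample problem visible. A score computed over a handful of resolved projects says little about calibration.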

**Current state:** The model is directionally correct but should not be used as a precise probability — treat confidence as a relative ranking signal (higher = more procurement evidence), not an absolute percentage.

---

### 17.2 Operational Risks

| Risk | Severity | Detail | Status |
|------|----------|--------|--------|
| **Source outage silent failure** | High | If the `oil.gov.iq` RSS feed or the MoE feed goes offline, results silently narrow with no alert | No health monitoring yet |
| **No health check endpoint** | High | Failures of `/api/intelligence` are not surfaced to users | Not implemented |
| **D1 table row growth** | Medium | `dedup_log`, `confidence_log`, `project_events`, and `trend_snapshots` are append-only with no purge job | No TTL/purge yet |
| **Anthropic API cost** | Medium | No per-day spend cap; a runaway loop or high query volume could accumulate cost | Monitor via Anthropic dashboard |
| **Stale KV cache after code deploy** | Low | KV cache key names are not versioned; stale data may persist for up to 8 hours after a deploy | Acceptable; invalidate manually if needed |
| **Single-account secret storage** | Low | All API keys in one Cloudflare account; no rotation policy | Review annually |
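
Both High-severity rows could be addressed in two steps: summarise per-source results after each refresh, then expose the summary through a health endpoint. The sketch below assumes each collector reports `{ source, ok, itemCount, error }`. That shape, the `critical` list, and the status labels are hypothetical. A source that responds but returns zero items is treated as failing, since that is exactly the silent-narrowing case.

```javascript
// Hypothetical sketch: turn one refresh cycle's per-source results into a
// health summary, so a dead feed is reported instead of silently dropping out.
function sourceHealthReport(results, { minItems = 1, critical = [] } = {}) {
  const sources = results.map((r) => {
    const itemCount = Number(r.itemCount) || 0;
    return {
      source: r.source,
      // A feed that answers but yields no items counts as failing.
      ok: r.ok === true && itemCount >= minItems,
      itemCount,
      error: r.error || null,
    };
  });
  const failing = sources.filter((s) => !s.ok).map((s) => s.source);
  const criticalDown = failing.some((name) => critical.includes(name));
  let status = "healthy";
  if (criticalDown) status = "degraded";
  else if (failing.length > 0) status = "partial";
  return { status, failing, sources };
}
```

A health endpoint could return this object as JSON. It could also use a non-200 status when `status` is `"degraded"`, so that an external uptime monitor picks it up.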

---

### 17.3 What the Platform Does Not Do

To avoid scope confusion:

- **Does not guarantee tender data is complete** — it monitors known sources; unpublished or offline tenders are invisible
- **Does not replace direct ministry engagement** — procurement officers sometimes issue tenders verbally or through private channels
- **Does not track Uruk's bid submissions or win/loss history** — this requires a separate BD workflow layer
- **Does not extract structured deadline dates** — deadline language is detected but not parsed to a machine-readable `DATE` field
- **Does not send alerts by default** — webhook exists but requires `WEBHOOK_URL` to be configured in Cloudflare secrets
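
The opt-in behaviour can be sketched minimally, assuming secrets are read from an `env` object as in Cloudflare Workers. The payload fields and the `maybeSendAlert` name are illustrative assumptions. `fetchImpl` is injectable so the no-op path can be tested without network access.

```javascript
// Hypothetical sketch: alerts are opt-in and only sent when WEBHOOK_URL is set.
async function maybeSendAlert(env, lead, fetchImpl = globalThis.fetch) {
  if (!env || !env.WEBHOOK_URL) {
    // Default behaviour: no webhook configured, so nothing is sent.
    return { sent: false, reason: "WEBHOOK_URL not configured" };
  }
  // Payload shape is illustrative, not the platform's actual format.
  const res = await fetchImpl(env.WEBHOOK_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ title: lead.title, url: lead.url, urgency: lead.urgency }),
  });
  return { sent: res.ok, status: res.status };
}
```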

---

*Last reviewed: May 2026 — Uruk Engineering & Contracting Co. LLC*

*Document last updated: May 2026 (Phase 4 BD Intelligence Layer) | Platform: https://leads.urukepc.com*
