# Atlas Match – Source File Field Guide

## Overview
Atlas Match accepts CSV files with hotel records to match against the 3.3M hotel master database.
The pipeline runs 4 normalisation layers before matching, so minor inconsistencies (header casing, full country names, phone punctuation) are handled automatically.
However, the **6 mandatory fields** must be present (or mapped via an accepted alias).

---

## Mandatory Fields

| Field | Accepted Aliases | Format | Example |
|-------|-----------------|--------|---------|
| `name` | `hotel_name`, `property_name`, `title` | Text | `Grand Hyatt Dubai` |
| `country_code` | `country`, `country_name`, `cnt`, `country_iso` | ISO 2-letter **or** full name | `AE` or `United Arab Emirates` |
| `address` | `addr`, `street`, `street_address`, `full_address` | Text | `Sheikh Zayed Road` |
| `city` | `city_en`, `city_name`, `town`, `municipality` | Text | `Dubai` |
| `latitude` | `lat`, `lat_deg`, `geo_lat`, `y` | Decimal degrees | `25.2048` |
| `longitude` | `lon`, `lng`, `long`, `geo_lon`, `x` | Decimal degrees | `55.2708` |

> **Note:** If a row's `latitude` / `longitude` values are empty, the geocoder layer will attempt to fill them via OpenStreetMap (capped at 500 rows per job). Rows that still have no coordinates after geocoding are matched on name + country only and will have a lower confidence score.

---

## Optional Fields (improve match quality)

| Field | Accepted Aliases | Format | Example |
|-------|-----------------|--------|---------|
| `source_id` | `id`, `hotel_id`, `property_id`, `supplier_id`, `hid` | Text / Integer | `HBX-12345` |
| `phone` | `telephone`, `tel`, `phone_number`, `contact_phone` | Digits (other characters are stripped) | `97143171234` |
| `star_rating` | `stars`, `rating`, `category`, `star_category` | Integer 1–5 | `5` |

---

## Matching Thresholds

| Outcome | Condition | Action |
|---------|-----------|--------|
| **Auto Match** | Confidence ≥ 85% | Written to `atlas_match_auto_matches.csv` |
| **Review Queue** | 70% ≤ Confidence < 85% | Held for manual approve/reject in the UI |
| **Unmatched** | Confidence < 70% | Written to `atlas_match_unmatched.txt` |

Confidence is a composite score: **Name × 50% + Address × 25% + Geo proximity × 25%**
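
The weighting and thresholds above can be sketched in a few lines. This is only an illustration, not the pipeline's code: the function names are hypothetical, the three component scores are assumed to be on a 0–100 scale, and fractional scores between 84 and 85 are assumed to fall into the review queue.

```python
def composite_confidence(name: float, address: float, geo: float) -> float:
    """Weighted sum of the three component scores (each assumed to be 0-100)."""
    return 0.50 * name + 0.25 * address + 0.25 * geo


def classify(confidence: float) -> str:
    """Map a composite score to one of the three outcomes in the table."""
    if confidence >= 85:
        return "auto_match"
    if confidence >= 70:
        return "review_queue"
    return "unmatched"


# A perfect name match with a weaker address and geo score.
score = composite_confidence(name=100, address=80, geo=60)
outcome = classify(score)
```

Because the name score carries half the weight, a perfect name match still needs a combined 70 out of 100 across address and geo proximity to reach the auto-match threshold.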

---

## File Requirements

- **Format:** CSV (comma-separated), UTF-8 encoded
- **Header row:** Required (first row must be column names)
- **Size:** No hard limit; tested with files of more than 1M rows
- **Empty rows:** Automatically dropped
- **Duplicate detection:** SHA-256 fingerprint deduplication applied before matching
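
To preview which rows would survive empty-row dropping and deduplication, a rough local check might look like the sketch below. The guide does not say which fields feed the SHA-256 fingerprint or how values are normalised first, so hashing every cell after trimming and lower-casing is purely an assumption, and `load_unique_rows` is a hypothetical helper name.

```python
import csv
import hashlib
import io


def row_fingerprint(cells: list) -> str:
    """SHA-256 over trimmed, lower-cased cell values (assumed normalisation)."""
    canonical = "\x1f".join(cell.strip().lower() for cell in cells)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def load_unique_rows(text: str) -> list:
    """Parse CSV text, drop empty rows, and keep the first copy of each duplicate."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    seen = set()
    unique = []
    for cells in reader:
        if not any(cell.strip() for cell in cells):
            continue  # blank line or a row of empty cells
        fingerprint = row_fingerprint(cells)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(dict(zip(header, cells)))
    return unique


sample = "name,city\nBurj Al Arab,Dubai\n,\nburj al arab , Dubai\nThe Savoy,London\n"
rows = load_unique_rows(sample)
```

In this sample the second data row is empty and the third differs from the first only in case and whitespace, so two rows remain.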

---

## Column Name Rules

- Column names are **case-insensitive** (`Hotel_Name`, `hotel_name`, `HOTEL_NAME` all work)
- Spaces and underscores are treated as equivalent (`hotel name` = `hotel_name`)
- Any column not in the schema above is passed through unchanged in output files
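
A pre-upload check for the mandatory fields can follow the same rules. The sketch below is a minimal illustration rather than Atlas Match's own resolver: the alias table is copied from the Mandatory Fields section, the function names are hypothetical, and collapsing runs of spaces or underscores into a single underscore is an assumption.

```python
import re

# Aliases copied from the Mandatory Fields table in this guide.
MANDATORY_ALIASES = {
    "name": {"hotel_name", "property_name", "title"},
    "country_code": {"country", "country_name", "cnt", "country_iso"},
    "address": {"addr", "street", "street_address", "full_address"},
    "city": {"city_en", "city_name", "town", "municipality"},
    "latitude": {"lat", "lat_deg", "geo_lat", "y"},
    "longitude": {"lon", "lng", "long", "geo_lon", "x"},
}


def normalise_header(header: str) -> str:
    """Lower-case a header and treat spaces and underscores alike."""
    return re.sub(r"[\s_]+", "_", header.strip().lower())


def resolve_columns(headers: list) -> dict:
    """Map each canonical field to the first source header that supplies it."""
    lookup = {}
    for canonical, aliases in MANDATORY_ALIASES.items():
        lookup[canonical] = canonical
        for alias in aliases:
            lookup[alias] = canonical
    resolved = {}
    for header in headers:
        canonical = lookup.get(normalise_header(header))
        if canonical is not None and canonical not in resolved:
            resolved[canonical] = header
    return resolved


def missing_mandatory(headers: list) -> list:
    """Return the canonical mandatory fields that no header maps to."""
    return sorted(set(MANDATORY_ALIASES) - set(resolve_columns(headers)))


headers = ["source_id", "Hotel Name", "country", "address", "city", "LAT", "lng", "stars"]
missing = missing_mandatory(headers)
```

Headers that do not resolve to a mandatory field, such as `source_id` and `stars` here, are simply ignored by this check.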

---

## Download the Template

A ready-to-use CSV template is available from the **Dashboard → Download Template** button.
It includes correct headers and 3 sample rows.

---

## Example Valid Rows

```csv
source_id,hotel_name,country,address,city,latitude,longitude,stars
HBX-001,Burj Al Arab,United Arab Emirates,Jumeirah Beach Road,Dubai,25.1412,55.1853,5
HBX-002,The Savoy,United Kingdom,Strand,London,51.5104,-0.1201,5
HBX-003,Hotel de Crillon,France,10 Place de la Concorde,Paris,48.8676,2.3214,5
```

---

## Common Mistakes

| Problem | Fix |
|---------|-----|
| Country as full name (e.g. `Saudi Arabia`) | Accepted — normaliser converts to ISO code |
| Lat/lon swapped | Check ranges: latitude must be between -90 and 90, longitude between -180 and 180 |
| Phone includes `+`, spaces, dashes | Accepted — non-digit characters are stripped |
| Star rating as text (`Five Stars`) | Use integers only: `5` |
| BOM (byte order mark) in CSV | Save as UTF-8 without BOM |
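
Several of these checks are easy to run locally before uploading. The helpers below are hypothetical and only approximate the behaviour described in the table; they do not reproduce the pipeline's normalisers. Note that a range check catches a lat/lon swap only when the swapped latitude falls outside -90 to 90, so a swapped Dubai pair (55.27, 25.20) would pass unnoticed.

```python
import re


def decode_csv_bytes(data: bytes) -> str:
    """Decode UTF-8 and drop a leading BOM if one is present."""
    return data.decode("utf-8-sig")


def clean_phone(raw: str) -> str:
    """Strip every non-digit character, as the phone normaliser is described to do."""
    return re.sub(r"\D", "", raw)


def coordinates_in_range(lat: float, lon: float) -> bool:
    """True when latitude is within [-90, 90] and longitude within [-180, 180]."""
    return -90 <= lat <= 90 and -180 <= lon <= 180


def coordinates_look_swapped(lat: float, lon: float) -> bool:
    """True when the pair is out of range as given but valid once swapped."""
    return not coordinates_in_range(lat, lon) and coordinates_in_range(lon, lat)


def parse_star_rating(raw: str):
    """Return an integer 1-5, or None for anything else (e.g. 'Five Stars')."""
    text = raw.strip()
    return int(text) if re.fullmatch(r"[1-5]", text) else None


phone = clean_phone("+971 4 317-1234")
header_line = decode_csv_bytes(b"\xef\xbb\xbfname,city")
```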

---

*Atlas Match © 2026 — Hotel Data Intelligence*
