OSM Operator Normalization
The OSM operator tag often carries several different spellings for the same transit company. Normalization maps each known alias to a single canonical name.
flowchart LR
subgraph Input["OSM Data"]
A["operator=<br/>'SBB CFF FFS'"]
B["operator=<br/>'CFF'"]
C["operator=<br/>'VBZ'"]
end
subgraph Process["CSV Lookup"]
CSV["operator_normalizations.csv"]
end
subgraph Output["Normalized"]
D["osm_operator=<br/>'SBB'"]
E["osm_operator=<br/>'VBZ'"]
end
A --> CSV --> D
B --> CSV --> D
C --> CSV --> E
Why Normalize?
| OSM Variation | Canonical Name |
|---|---|
| "CFF", "FFS", "SBB CFF FFS" | SBB |
| "Compagnie générale de navigation sur le lac Léman (CGN)" | CGN |
| "Stadtbus Winterthur" | SBW |
Without normalization, attribute comparison flags these as mismatches.
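To illustrate the effect, here is a minimal sketch of an operator comparison with and without normalization. The aliases come from the table above; the `operators_match` helper and the reference value `"SBB"` are hypothetical and only stand in for whatever comparison the pipeline actually performs.

```python
# Aliases taken from the table above. The comparison helper is hypothetical.
ALIASES = {"CFF": "SBB", "FFS": "SBB", "SBB CFF FFS": "SBB"}


def operators_match(osm_operator, reference_operator, aliases=None):
    """Compare two operator strings, optionally normalizing the OSM side first."""
    if aliases is not None:
        osm_operator = aliases.get(osm_operator, osm_operator)
    return osm_operator == reference_operator


raw_result = operators_match("SBB CFF FFS", "SBB")
normalized_result = operators_match("SBB CFF FFS", "SBB", ALIASES)
print(raw_result, normalized_result)  # False True
```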
Mapping File
File location: matching_and_import_db/utils/operator_normalizations.csv
Current Mappings
Behavior
| Characteristic | Behavior |
|---|---|
| Case sensitivity | Case-sensitive; exact match required |
| Whitespace | Trimmed before lookup |
| Unknown operators | Kept as-is |
| Persistence | Only normalized value stored in DB |
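The lookup rules in the table can be sketched roughly as follows. This is an illustration, not the pipeline's actual code: the column names `alias` and `canonical`, the function names, and the sample rows are assumptions, and the real layout of operator_normalizations.csv may differ. The sketch also assumes that unknown operators are returned in trimmed form.

```python
import csv
import io

# Hypothetical CSV layout; the real column names may differ.
SAMPLE_CSV = """alias,canonical
CFF,SBB
FFS,SBB
SBB CFF FFS,SBB
Stadtbus Winterthur,SBW
"""


def load_mappings(csv_file):
    """Read alias -> canonical pairs from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return {row["alias"].strip(): row["canonical"].strip() for row in reader}


def normalize_operator(raw, mappings):
    """Trim whitespace, then do an exact, case-sensitive lookup."""
    if raw is None:
        return None
    trimmed = raw.strip()
    # Unknown operators are kept as-is (here: in trimmed form, an assumption).
    return mappings.get(trimmed, trimmed)


MAPPINGS = load_mappings(io.StringIO(SAMPLE_CSV))

print(normalize_operator("  CFF ", MAPPINGS))  # SBB (whitespace trimmed)
print(normalize_operator("cff", MAPPINGS))     # cff (case differs, no match)
print(normalize_operator("VBZ", MAPPINGS))     # VBZ (unknown, kept as-is)
```

A plain dictionary keeps the lookup exact and case-sensitive by construction, which is why each capitalization needs its own CSV row.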
Maintaining the Mapping
- Check data/debug/org_mismatches_review.txt for unmatched operator pairs
- Add new aliases to the CSV
- Re-run the pipeline
Each alias must appear only once in the file. Because the lookup is case-sensitive, every capitalization of an alias needs its own entry.
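Before re-running the pipeline, a quick check for repeated aliases can catch editing mistakes. The sketch below assumes a header row with an `alias` column, which is a hypothetical layout; adjust `alias_column` to match the real file.

```python
import csv
import io
from collections import Counter


def find_duplicate_aliases(csv_file, alias_column="alias"):
    """Return aliases that appear more than once (case-sensitive, trimmed)."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[alias_column].strip() for row in reader)
    return sorted(alias for alias, n in counts.items() if n > 1)


# Hypothetical sample: "CFF" is listed twice, "cff" is a distinct alias.
sample = io.StringIO("alias,canonical\nCFF,SBB\ncff,SBB\nCFF,SBB\n")
duplicates = find_duplicate_aliases(sample)
print(duplicates)  # ['CFF']
```

To run it against the real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `find_duplicate_aliases`.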