OSM Operator Normalization

The values of the OSM operator tag vary widely for the same transit company. This normalization step maps known aliases to a single canonical name.

```mermaid
flowchart LR
    subgraph Input["OSM Data"]
        A["operator=<br/>'SBB CFF FFS'"]
        B["operator=<br/>'CFF'"]
        C["operator=<br/>'VBZ'"]
    end
    subgraph Process["CSV Lookup"]
        CSV["operator_normalizations.csv"]
    end
    subgraph Output["Normalized"]
        D["osm_operator=<br/>'SBB'"]
        E["osm_operator=<br/>'VBZ'"]
    end
    A --> CSV --> D
    B --> CSV --> D
    C --> CSV --> E
```

Why Normalize?

| OSM variation | Canonical name |
|---|---|
| "CFF", "FFS", "SBB CFF FFS" | SBB |
| "Compagnie générale de navigation sur le lac Léman (CGN)" | CGN |
| "Stadtbus Winterthur" | SBW |

Without normalization, the attribute comparison flags these variants as mismatches, even though each group refers to the same operator.
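To illustrate why the variants above produce false mismatches, here is a minimal sketch comparing an OSM value with and without an alias lookup. The `ALIASES` dictionary and the `operators_match` function are hypothetical, and the sketch assumes the reference side already uses canonical names; the pipeline's actual comparison code may be organized differently.

```python
# Hypothetical alias table: a few rows copied from the "Why Normalize?" table.
ALIASES: dict[str, str] = {
    "CFF": "SBB",
    "FFS": "SBB",
    "SBB CFF FFS": "SBB",
    "Stadtbus Winterthur": "SBW",
}


def operators_match(osm_value: str, reference_value: str, normalize: bool = True) -> bool:
    """Compare an OSM operator tag value with a reference operator name.

    Assumes the reference side already uses canonical names. With
    normalize=False this is a plain string comparison.
    """
    if normalize:
        value = osm_value.strip()
        # Known aliases map to their canonical name; unknown values pass through.
        osm_value = ALIASES.get(value, value)
    return osm_value == reference_value


print(operators_match("CFF", "SBB", normalize=False))  # False: raw strings differ
print(operators_match("CFF", "SBB"))                    # True: "CFF" normalizes to "SBB"
```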

Mapping File

File location: matching_and_import_db/utils/operator_normalizations.csv
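A minimal sketch of loading this file into an alias-to-canonical-name dictionary is shown below. It assumes, hypothetically, that the CSV has a header row followed by two columns in the order shown in the mapping table (alias, then normalized name); the real column names and loader may differ. The function `load_normalizations` is illustrative only.

```python
import csv
import io
from collections.abc import Iterable


def load_normalizations(lines: Iterable[str]) -> dict[str, str]:
    """Build an alias -> canonical-name dict from CSV text.

    Assumption: a header row, then two columns in the order
    alias, normalized name.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the assumed header row
    mapping: dict[str, str] = {}
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        alias, canonical = row[0].strip(), row[1].strip()
        if alias in mapping:
            # Aliases must be unique (see "Maintaining the Mapping").
            raise ValueError(f"duplicate alias: {alias!r}")
        mapping[alias] = canonical
    return mapping


# Usage with the real file:
# with open("matching_and_import_db/utils/operator_normalizations.csv",
#           newline="", encoding="utf-8") as f:
#     mapping = load_normalizations(f)

sample = io.StringIO("alias,normalized\nCFF,SBB\nStadtbus Winterthur,SBW\n")
print(load_normalizations(sample))
```

Accepting any iterable of lines (rather than a path) keeps the function easy to test with in-memory text.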

Current Mappings

The CSV currently contains 6 mappings:

| Alias | Normalized name |
|---|---|
| Compagnie générale de navigation sur le lac Léman (CGN) | CGN |
| CFF | SBB |
| FFS | SBB |
| SBB CFF FFS | SBB |
| Stadtbus Winterthur | SBW |
| TL | Transports Lausannoi |
Source: matching_and_import_db/utils/operator_normalizations.csv

Behavior

| Characteristic | Behavior |
|---|---|
| Case sensitivity | Case-sensitive; exact match required |
| Whitespace | Trimmed before lookup |
| Unknown operators | Kept as-is |
| Persistence | Only the normalized value is stored in the DB |
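The behavior in the table can be sketched as a single lookup function. This is a minimal illustration, not the pipeline's code: `normalize_operator` and `OPERATOR_NORMALIZATIONS` are hypothetical names, the mapping is a hard-coded subset of the CSV, and returning the trimmed value for unknown operators is an assumption about what "kept as-is" means.

```python
from typing import Optional

# Hypothetical in-memory subset of operator_normalizations.csv.
OPERATOR_NORMALIZATIONS: dict[str, str] = {
    "Compagnie générale de navigation sur le lac Léman (CGN)": "CGN",
    "CFF": "SBB",
    "FFS": "SBB",
    "SBB CFF FFS": "SBB",
    "Stadtbus Winterthur": "SBW",
}


def normalize_operator(raw: Optional[str], mapping: dict[str, str]) -> Optional[str]:
    """Return the canonical operator name for an OSM operator tag value."""
    if raw is None:
        return None  # no operator tag: nothing to normalize
    value = raw.strip()  # whitespace is trimmed before lookup
    # Exact, case-sensitive lookup. Unknown operators fall through unchanged
    # (assumption: the trimmed value is returned).
    return mapping.get(value, value)


# Only the normalized value would be stored, e.g. as osm_operator.
tags = {"operator": " SBB CFF FFS "}
osm_operator = normalize_operator(tags.get("operator"), OPERATOR_NORMALIZATIONS)
print(osm_operator)  # SBB
```

Note that "cff" would not match "CFF" here, which is why each capitalization needs its own CSV entry.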

Maintaining the Mapping

  1. Check data/debug/org_mismatches_review.txt for unmatched operator pairs
  2. Add the new aliases to the CSV
  3. Re-run the pipeline

Each alias must be unique. Because the lookup is case-sensitive, different capitalizations of the same name need separate entries.
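Before re-running the pipeline, a quick check can catch duplicate aliases introduced while editing the CSV. The sketch below is hypothetical (`find_duplicate_aliases` is not part of the project) and assumes a header row with the alias in the first column. It compares aliases exactly, so case variants such as "CFF" and "cff" count as distinct and are allowed.

```python
import csv
import io
from collections import Counter
from collections.abc import Iterable


def find_duplicate_aliases(lines: Iterable[str]) -> list[str]:
    """Return aliases that occur more than once in the mapping CSV.

    Assumption: a header row, alias in the first column. Comparison is
    exact and case-sensitive, matching the lookup behavior.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the assumed header row
    counts = Counter(row[0].strip() for row in reader if row)
    return sorted(alias for alias, n in counts.items() if n > 1)


csv_text = "alias,normalized\nCFF,SBB\ncff,SBB\nCFF,SBB\n"
print(find_duplicate_aliases(io.StringIO(csv_text)))  # ['CFF']
```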
