Route Matching

Route matching is the ninth predicate run in the pipeline, after the full distance-matching block, and correlates ATLAS platforms with OSM nodes based on shared GTFS transit routes and directions.

flowchart TB
    CTX["MatchingContext"] --> UA["atlas.get_unmatched_records()"]
    CTX --> OSM["osm.batch_query_radius()<br/><i>excl. used + siblings + is_station nodes</i>"]
    CTX --> DIRS["osm.name_dirs"]
    CTX --> NR["osm.get_node_routes(node_id)<br/><i>from XML relations</i>"]
    UA --> T["atlas.get_routes(sloid)<br/>Build ATLAS tokens (GTFS)"]
    T --> LOOP["For each AtlasNode"]
    LOOP --> C["Find OsmNode candidates within 50m"]
    C --> P1{"P1: GTFS token<br/>intersection?"}
    P1 -->|Yes| M["Match"]
    P1 -->|No| P2{"P2: direction name<br/>fallback?"}
    P2 -->|Yes| M
    P2 -->|No| X["No match"]
    M -->|"ctx.commit()"| OUT["MatchRecord entity"]

Overview

The previous predicates rely on exact UIC references, names, or nearest distance alone. Route Matching adds a different kind of evidence: shared route membership between two nearby stops.

Importantly, this is still a spatial, stop-to-stop matching process, not just linking abstract routes. For every unmatched ATLAS stop, the predicate looks for unmatched OSM stops within a 50m radius. If an ATLAS platform and a nearby OSM node share strong GTFS route-token evidence or compatible direction-name evidence, they are matched together. Route data acts as the "proof" that two physically close points are indeed the same stop. The spatial candidate filter uses OsmNode.is_station, so aerialway stations remain eligible.

Result: 0 route-based matches

Unlike the exact, name, and distance predicates, route matching now re-validates each batched candidate list against the current used_ids set before selecting a match. This prevents an OSM representative consumed earlier in the same predicate run from being re-used by a later ATLAS row.
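The re-validation step can be illustrated with a minimal sketch. The names `pick_unused`, `batched`, and `matches` are hypothetical and are not taken from the predicate; the route-evidence check is left out so that only the `used_ids` filtering is visible.

```python
from typing import Dict, Iterable, List, Optional, Set


def pick_unused(candidates: Iterable[int], used_ids: Set[int]) -> Optional[int]:
    """Return the first candidate that has not been consumed yet, or None."""
    for node_id in candidates:
        if node_id not in used_ids:
            return node_id
    return None


# Candidate lists are computed in one batch, before any match is committed,
# so two ATLAS rows can initially see the same OSM node (101 here).
batched: Dict[str, List[int]] = {
    "sloid:A": [101, 102],
    "sloid:B": [101, 103],
}
used_ids: Set[int] = set()
matches: Dict[str, int] = {}

for sloid, candidates in batched.items():
    # Re-check against the *current* used_ids, not the snapshot taken at batch time.
    chosen = pick_unused(candidates, used_ids)
    if chosen is not None:
        matches[sloid] = chosen
        used_ids.add(chosen)  # later rows can no longer select this node
```

In this sketch node 101 goes to the first row and the second row falls through to 103. Without the re-check, both rows would have selected 101.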

Required Data

Route matching relies entirely on data owned by the state layer — the predicate performs no file I/O:

  • OsmNode candidates found via batched OsmState.batch_query_radius() within max_distance
  • OsmState.name_dirs — per-node direction strings (loaded from osm_directions.csv sidecar or parsed from XML relations)
  • OsmState._node_routes via ctx.osm.get_node_routes(node_id) — per-node GTFS route memberships derived from OSM XML relations during OsmState.from_xml_file()
  • AtlasState._routes_by_sloid via ctx.atlas.get_routes(sloid) — GTFS route entries loaded from atlas_routes_gtfs.csv during AtlasState.from_dataframe()

Token-Based Matching

Route data is converted into comparable tokens. The predicate tries two priority levels:

P1: GTFS Route-ID Tokens

The predicate primarily compares per-stop GTFS route tokens that are already loaded into AtlasState and OsmState:

  • ATLAS Tokens: {(route_id_normalized, direction_id)} from atlas_routes_gtfs.csv.
  • OSM Candidates: For each nearby node, ctx.osm.get_node_routes(node_id) contributes (gtfs_route_id, direction_id) and (normalize_route_id(gtfs_route_id), direction_id) tokens derived from the XML relation pass.
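The token comparison described above can be sketched roughly as follows. This is an illustration under assumptions, not the predicate's code: `placeholder_normalize` is a hypothetical stand-in for `normalize_route_id` (whose actual rules are not described here), direction IDs are assumed to be strings, and `osm_tokens` and `shares_route` are invented names.

```python
from typing import Callable, Iterable, Set, Tuple

Token = Tuple[str, str]  # (route_id, direction_id); string direction IDs are an assumption


def placeholder_normalize(route_id: str) -> str:
    # Hypothetical stand-in for normalize_route_id; the real helper may differ.
    return route_id.strip().lower()


def osm_tokens(node_routes: Iterable[Token], normalize: Callable[[str], str]) -> Set[Token]:
    """Build both the raw and the normalized token for each OSM route membership."""
    tokens: Set[Token] = set()
    for gtfs_route_id, direction_id in node_routes:
        tokens.add((gtfs_route_id, direction_id))
        tokens.add((normalize(gtfs_route_id), direction_id))
    return tokens


def shares_route(
    atlas_tokens: Set[Token],
    node_routes: Iterable[Token],
    normalize: Callable[[str], str] = placeholder_normalize,
) -> bool:
    """P1 check: a non-empty intersection of ATLAS and OSM token sets."""
    return bool(atlas_tokens & osm_tokens(node_routes, normalize))


atlas_tokens = {("91-10-a", "0")}  # already normalized on the ATLAS side
same_direction = shares_route(atlas_tokens, [(" 91-10-A ", "0")])
other_direction = shares_route(atlas_tokens, [(" 91-10-A ", "1")])
```

Because the direction ID is part of the token, a route that matches only in the opposite direction does not count as evidence in this sketch.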

If RouteState already contains an in-process mapping for an OSM relation ID, the predicate also adds that mapped ATLAS route ID and its normalized form to the OSM candidate token set before intersecting it with the ATLAS tokens.

Normalized route IDs are therefore carried on the ATLAS side in atlas_routes_gtfs.csv and computed on the OSM side at match time. RouteState uses the same normalization helper when it is populated.

If ref_trips does not yield a direction, OSM route extraction currently emits both direction buckets (0 and 1) for that relation membership so route-id evidence can still participate.
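As a small illustration of this fallback, the sketch below expands a relation membership into direction buckets. The function name `direction_buckets` and the use of `None` for "no direction derived" are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def direction_buckets(route_id: str, direction: Optional[str]) -> List[Tuple[str, str]]:
    """Return the (route_id, direction_id) tokens for one relation membership."""
    if direction is None:
        # No direction could be derived: emit both buckets so the route ID
        # can still intersect with an ATLAS token in either direction.
        return [(route_id, "0"), (route_id, "1")]
    return [(route_id, direction)]


known = direction_buckets("91-10-A", "1")
unknown = direction_buckets("91-10-A", None)
```

The trade-off is that a membership without a direction is weaker evidence, since it matches an ATLAS token for either direction of the same route.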

P2: Name-Based Direction Fallback

ATLAS direction names are compared against OSM route relation direction strings (first/last member names like "Zurich HB → Bern"), stored in OsmState.name_dirs. The current implementation checks exact direction-string membership.
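A minimal sketch of an exact-membership check is shown below. It assumes that the direction strings for one node are available as a set of strings; the actual shape of `OsmState.name_dirs` may differ, and `direction_name_match` is a hypothetical name.

```python
from typing import Iterable, Set


def direction_name_match(atlas_dirs: Iterable[str], osm_dirs: Set[str]) -> bool:
    """P2 check: at least one ATLAS direction string appears verbatim in the OSM set."""
    return any(direction in osm_dirs for direction in atlas_dirs)


osm_dirs = {"Zurich HB → Bern", "Bern → Zurich HB"}
exact = direction_name_match(["Zurich HB → Bern"], osm_dirs)
variant = direction_name_match(["Zürich HB → Bern"], osm_dirs)
```

Since the comparison is exact, spelling variants such as "Zürich" versus "Zurich" do not match in this sketch.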

Data Sources

| Source | File / Origin | Loaded by | Description |
|---|---|---|---|
| GTFS routes | data/processed/atlas_routes_gtfs.csv | AtlasState | Timetable-derived route entries per SLOID for stop-level matching |
| OSM routes | OSM XML relations | OsmState.from_xml_file() | Route memberships per OSM node (via relation ID) |
| Equivalency cache | data/processed/atlas_routes.csv + data/processed/osm_routes.csv | RouteState | Optional atlas-route crosswalk, primarily populated by the route import path |

Related Documentation

(Route provenance is tracked in the output via match_type — currently always route_gtfs_gtfs; the specific evidence is recorded in notes as either gtfs_tokens or direction_name.)

When Route Matching Succeeds

Route matching is particularly effective for:

  1. Platforms without UIC: Some OsmNode entities lack uic_ref but have route memberships
  2. Ambiguous proximity: When multiple OsmNode entities are nearby, shared routes disambiguate

Code Reference

| Class / Method | Description |
|---|---|
| RouteMatchPredicate | Predicate class; leverages RouteState and batch_query_radius() |
| ctx.atlas.get_routes(sloid) | Returns ATLAS route assignments for a SLOID |
| ctx.osm.get_node_routes(node_id) | Returns relation memberships for an OSM node |
| RouteState.get_atlas_route() | Returns the mapped ATLAS route for a given OSM relation ID |

All predicate logic is in predicates/route_matching_gtfs.py. Route state logic lives in route_state.py.
