Filter and Search Logic

This document defines the canonical filtering and search model used on the index map page.

Performance deep-dive for global stats query cost:

Core Rule

The filter system uses one boolean grammar only:

  • within a filter group, selected values combine with OR
  • across filter groups, active groups combine with AND

Formally:

FinalResult = ScopePredicate AND SearchPredicate AND AtlasPredicate AND OsmPredicate AND DuplicatePredicate

If a predicate group has no active selection, it contributes no restriction.
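
The grammar above can be illustrated with a minimal Python sketch. The row shape, the group names, and the helper names group_matches and final_result are assumptions made for illustration; they are not taken from the actual backend.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


def group_matches(row: Row, selected: List[Predicate]) -> bool:
    """OR within a group; a group with no selection imposes no restriction."""
    if not selected:
        return True
    return any(pred(row) for pred in selected)


def final_result(row: Row, groups: Dict[str, List[Predicate]]) -> bool:
    """AND across all groups."""
    return all(group_matches(row, preds) for preds in groups.values())


# Hypothetical row and selections, for illustration only.
row = {"stop_type": "matched", "operator": "SBB", "node_type": "platform"}
groups = {
    "scope": [
        lambda r: r["stop_type"] == "matched",
        lambda r: r["stop_type"] == "osm_unmatched",
    ],
    "atlas": [lambda r: r.get("operator") == "SBB"],
    "osm": [],  # nothing selected: contributes no restriction
}
```

With these selections the row passes, because it satisfies one scope value, the single ATLAS value, and the empty OSM group restricts nothing.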

1. Scope Predicate

The scope predicate chooses which row categories are eligible.

ScopePredicate = MatchedBranch OR AtlasUnmatchedBranch OR OsmUnmatchedBranch

1.1 Matched Branch

Matched entries can be enabled in two ways:

  • All Matched Stops
  • one or more matched method sub-filters

Semantics:

  • All Matched Stops means all matched rows are included
  • selecting matched sub-filters means only matched rows with one of those methods are included
  • if all matched method children are selected, the UI rolls up to the parent state all

Formally:

  • MatchedBranch = stop_type = matched when the parent is all
  • MatchedBranch = stop_type = matched AND match_type IN selected matched methods when the branch is in subset mode

Matched method values include:

  • exact
  • name
  • distance_matching_trio
  • distance_matching_1
  • distance_matching_2
  • distance_matching_3a
  • distance_matching_3a_second_pass
  • distance_matching_3b
  • route_gtfs_gtfs
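
The two matched-branch modes can be sketched as a small Python function. The function name and the dictionary row shape are illustrative assumptions; only the column names stop_type and match_type come from the rules above.

```python
from typing import Iterable


def matched_branch(row: dict, parent_all: bool,
                   selected_methods: Iterable[str] = ()) -> bool:
    """Return True if the row is admitted by the matched branch."""
    if row.get("stop_type") != "matched":
        return False
    if parent_all:
        # Parent state "all": every matched row is included.
        return True
    # Subset mode: only rows whose method is among the selected ones.
    # With nothing selected the branch is inactive and admits no row.
    return row.get("match_type") in set(selected_methods)
```

For example, matched_branch({"stop_type": "matched", "match_type": "name"}, False, ["exact"]) is False, while the same row passes when the parent is in the all state.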

1.2 ATLAS Unmatched Branch

ATLAS unmatched entries also support a parent state plus sub-filters.

Semantics:

  • ATLAS unmatched means all atlas-unmatched rows are included
  • selecting No OSM < 50m and/or OSM < 50m means only those unmatched reasons are included
  • if both unmatched reasons are selected, the UI rolls up to the parent state all

Formally:

  • AtlasUnmatchedBranch = stop_type = atlas_unmatched when the parent is all
  • AtlasUnmatchedBranch = stop_type = atlas_unmatched AND unmatched_reason IN selected reasons when the branch is in subset mode

The unmatched reason mapping is:

  • No OSM < 50m -> match_type = no_nearby_counterpart
  • OSM < 50m -> match_type != no_nearby_counterpart OR match_type IS NULL
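
A minimal Python sketch of this mapping follows. The reason keys no_osm_within_50m and osm_within_50m and both function names are hypothetical labels chosen for the example; the document only defines the UI labels and the match_type conditions.

```python
from typing import Iterable

# Hypothetical internal keys for the two UI labels.
NO_OSM_NEARBY = "no_osm_within_50m"   # UI label: No OSM < 50m
OSM_NEARBY = "osm_within_50m"         # UI label: OSM < 50m


def unmatched_reason(row: dict) -> str:
    """Derive the unmatched reason from match_type.

    A missing (None) match_type counts as OSM < 50m, mirroring the
    explicit IS NULL clause in the mapping.
    """
    if row.get("match_type") == "no_nearby_counterpart":
        return NO_OSM_NEARBY
    return OSM_NEARBY


def atlas_unmatched_branch(row: dict, parent_all: bool,
                           selected_reasons: Iterable[str] = ()) -> bool:
    """Return True if the row is admitted by the ATLAS unmatched branch."""
    if row.get("stop_type") != "atlas_unmatched":
        return False
    if parent_all:
        return True
    return unmatched_reason(row) in set(selected_reasons)
```

The explicit IS NULL clause matters in SQL because a comparison such as match_type != 'no_nearby_counterpart' evaluates to unknown, not true, when match_type is NULL, so those rows would otherwise be dropped.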

1.3 OSM Unmatched Branch

This branch is included only when it is explicitly selected.

Semantics:

  • if OSM unmatched is checked, all osm_unmatched rows are included
  • if it is not checked, OSM-unmatched rows are not added implicitly by any other selection

Formally:

  • OsmUnmatchedBranch = stop_type = osm_unmatched

2. Search Predicate

Search tokens are OR-combined within the search group.

SearchPredicate = token_1 OR token_2 OR token_3 ...

Meaning:

  • if multiple search tokens are active, a row matches if it matches any of them

2.1 Accepted Formats

The search input (#smartSearchInput) accepts the following formats, parsed by parseSmartSearchInput() in filters.js:

Format                            | Example           | Token kind | Backend identifier_type
----------------------------------+-------------------+------------+------------------------
UIC station code (starts with 85) | 8503000           | station    | station
ATLAS SLOID                       | ch:1:sloid:3000:3 | atlas      | sloid
OSM node ID (digits only)         | 123456789         | osm        | osm_node_id
Route ID (dash-separated)         | 11-T-j25-1        | route      | route
Route + direction                 | 11-T-j25-1 dir:0  | route      | route

Unrecognized input shows the accepted formats hint tooltip (#smartSearchHint).
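
The classification can be approximated in Python as shown below. This is not the code of parseSmartSearchInput() in filters.js; it is a sketch built only from the listed formats. The regular expressions, the SLOID prefix check, and the rule that digit-only input starting with 85 is tried as a UIC code before it is tried as an OSM node ID are all assumptions.

```python
import re
from typing import Optional

# Assumed patterns, derived only from the examples in the format list.
DIGITS_RE = re.compile(r"[0-9]+")
SLOID_RE = re.compile(r"ch:1:sloid:\S+")
ROUTE_RE = re.compile(
    r"(?P<route>[^\s-]+(?:-[^\s-]+)+)(?:\s+dir:(?P<direction>[0-9]+))?"
)


def classify_search_input(raw: str) -> Optional[dict]:
    """Return a token description, or None for unrecognized input."""
    value = raw.strip()
    if DIGITS_RE.fullmatch(value):
        if value.startswith("85"):
            # Assumption: UIC codes take precedence over OSM node IDs.
            return {"kind": "station", "identifier_type": "station",
                    "identifier": value}
        return {"kind": "osm", "identifier_type": "osm_node_id",
                "identifier": value}
    if SLOID_RE.fullmatch(value):
        return {"kind": "atlas", "identifier_type": "sloid",
                "identifier": value}
    route = ROUTE_RE.fullmatch(value)
    if route:
        return {"kind": "route", "identifier_type": "route",
                "identifier": route.group("route"),
                "direction": route.group("direction")}
    return None
```

A None result corresponds to the case where the hint tooltip is shown.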

2.2 Search Flow

  1. User submits input via Enter key
  2. parseSmartSearchInput() classifies the value into a token kind or returns a validation error
  3. addSearchToken() adds the token to activeFilters.station and calls fetchAndCenterSpecificStop()
  4. fetchAndCenterSpecificStop() calls /api/stop_by_id with the identifier and type
  5. On success: the map centers on the result and filters update
  6. On failure: the token is reverted and an error is shown

2.3 Not-Found Feedback

When a correctly formatted input does not match any database entry, /api/stop_by_id returns a 404. The frontend displays an error in the #smartSearchError element, styled via .smart-search-feedback in index.css.

The error message follows the pattern: No {type} found matching: {identifier}, where {type} is the human-readable token kind (e.g. "OSM node", "UIC station", "ATLAS SLOID").
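
The message pattern can be sketched in Python as follows. The mapping from token kind to label is an assumption inferred from the three example labels; the label used for route tokens is not stated here, so the sketch falls back to the raw kind.

```python
# Assumed mapping from token kind to the human-readable label.
KIND_LABELS = {
    "osm": "OSM node",
    "station": "UIC station",
    "atlas": "ATLAS SLOID",
}


def not_found_message(kind: str, identifier: str) -> str:
    """Build the not-found message for a well-formed but unknown identifier."""
    label = KIND_LABELS.get(kind, kind)  # fallback: use the kind itself
    return f"No {label} found matching: {identifier}"
```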

3. ATLAS Predicate

ATLAS-side attributes are OR-combined within the ATLAS predicate group.

AtlasPredicate = atlas_attribute_1 OR atlas_attribute_2 OR ...

Current ATLAS attribute values:

  • ATLAS operator

Semantics:

  • ATLAS predicates are evaluated on every row that has an ATLAS side
  • matched rows can satisfy ATLAS predicates
  • ATLAS-unmatched rows can satisfy ATLAS predicates
  • OSM-unmatched rows naturally do not satisfy ATLAS predicates because they have no ATLAS side

4. OSM Predicate

The OSM predicate is composed of three subgroups that are AND-combined:

OsmPredicate = TransportPredicate AND EntityPredicate AND OsmGroupPredicate

4.1 Transport Predicate

Transport types are OR-combined.

TransportPredicate = transport_type_1 OR transport_type_2 OR ...

Examples:

  • ferry_terminal
  • tram_stop
  • station
  • platform
  • stop_position
  • aerialway_station

4.2 Entity Predicate

OSM entity types (nodes vs ways) are OR-combined.

EntityPredicate = entity_type_1 OR entity_type_2 OR ...

Examples:

  • way (OSM entries derived from ways, identified by a way_ prefix in their ID)

4.3 OSM Group Predicate (Pairs/Trios)

OSM group types are OR-combined. In current terminology, OSM group means OSM pair or OSM trio.

OsmGroupPredicate = group_type_1 OR group_type_2 OR ...

Examples:

  • osm_pair_uic
  • osm_pair_uic_equal_15m
  • osm_pair_name
  • osm_pair_name_equal_15m
  • osm_pair_tram
  • osm_pair_tram_equal_15m
  • osm_trio

If the OSM groups master is selected with no subtype refinement, the system treats it as:

  • OsmGroupPredicate = group_member(any type)

If only osm_trio is selected, pair rows are excluded.
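
The master and subtype behaviour can be sketched in Python. The column name osm_group_type (None for rows that belong to no group) and the function name are hypothetical; the value all matches the serialized form osm_group_types=all used in requests.

```python
from typing import Optional


def osm_group_predicate(row: dict, osm_group_types: Optional[str]) -> bool:
    """Evaluate the OSM group predicate for one row.

    osm_group_types is the comma-separated request value, or None/empty
    when the group is inactive.
    """
    if not osm_group_types:
        return True  # inactive group: no restriction
    group_type = row.get("osm_group_type")  # hypothetical column
    if group_type is None:
        return False  # row is not a member of any pair or trio
    selected = {t.strip() for t in osm_group_types.split(",") if t.strip()}
    if "all" in selected:
        return True  # master selected: any group member qualifies
    return group_type in selected
```

Selecting only osm_trio therefore rejects a row whose osm_group_type is osm_pair_uic, while the master value all accepts it.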

4.4 OSM-side Semantics

OSM predicates always apply to rows that have an OSM side.

This is an intentional product rule.

Consequences:

  • matched rows can satisfy OSM predicates
  • OSM-unmatched rows can satisfy OSM predicates
  • ATLAS-unmatched rows naturally do not satisfy OSM predicates because they have no OSM side

There is no separate applicability toggle for OSM predicates.

5. Duplicate Predicate

The currently exposed duplicate control is Duplicate ATLAS.

Semantics:

  • Duplicate ATLAS is an AND-filter on ATLAS duplicate-group membership:
    • row has representative_sloid set (non-representative member), OR
    • row is a representative referenced by at least one sibling (EXISTS atlas_stops WHERE representative_sloid = this.sloid)

Formally:

  • DuplicatePredicate = atlas_duplicate_member = true
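
The membership rule can be illustrated with an in-memory Python sketch. The real predicate runs server-side in SQL; the list-of-dicts input and the function name are assumptions, while sloid and representative_sloid are the field names used above.

```python
from typing import Iterable, Set


def atlas_duplicate_members(stops: Iterable[dict]) -> Set[str]:
    """Return the SLOIDs of all rows that belong to a duplicate group."""
    stops = list(stops)
    # SLOIDs that at least one sibling points to as its representative.
    referenced = {
        s["representative_sloid"]
        for s in stops
        if s.get("representative_sloid")
    }
    members = set()
    for s in stops:
        is_non_representative = bool(s.get("representative_sloid"))
        is_referenced_representative = s["sloid"] in referenced
        if is_non_representative or is_referenced_representative:
            members.add(s["sloid"])
    return members


# Hypothetical data: B points to A, C stands alone.
sample = [
    {"sloid": "A", "representative_sloid": None},
    {"sloid": "B", "representative_sloid": "A"},
    {"sloid": "C", "representative_sloid": None},
]
```

Here A and B are duplicate-group members and C is not.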

Implementation note:

  • duplicate filtering is a data predicate
  • this predicate is applied server-side across /api/data, /api/top_matches, /api/random_stop, and /api/global_stats
  • whether both sides of a matched row are drawn is still a rendering decision, not a predicate

6. Top N Distances

Top N is not part of the canonical predicate formula for /api/data.

It is a special matched-only mode used by:

  • /api/top_matches
  • /api/random_stop
  • /api/global_stats

Top N is available whenever matched scope exists, meaning either:

  • All Matched Stops is checked
  • or at least one matched sub-filter is selected

If matched scope disappears, Top N is automatically disabled.
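
The availability rule can be expressed as a short Python sketch. Both function names and the use of None for a disabled Top N are illustrative assumptions.

```python
from typing import Optional, Sequence


def top_n_available(all_matched_checked: bool,
                    selected_matched_methods: Sequence[str]) -> bool:
    """Top N needs matched scope: the parent or at least one sub-filter."""
    return bool(all_matched_checked) or len(selected_matched_methods) > 0


def effective_top_n(requested_top_n: Optional[int],
                    all_matched_checked: bool,
                    selected_matched_methods: Sequence[str]) -> Optional[int]:
    """Drop the Top N value when matched scope is absent."""
    if not top_n_available(all_matched_checked, selected_matched_methods):
        return None
    return requested_top_n
```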

7. Low-Zoom Overview Mode

When the map is below the marker threshold and there are no active user filters, the UI switches to an overview mode:

  • stop_filter = atlas_unmatched
  • only the ATLAS side is rendered

This is a display optimization for low zoom, not part of the canonical predicate algebra.

As soon as any user filter is active, normal predicate semantics are used again.
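
The mode switch can be sketched in Python as below. The function name, the returned dictionary shape, and the marker_threshold parameter are assumptions; the actual threshold value is not specified in this document.

```python
def choose_render_mode(zoom: float, marker_threshold: float,
                       has_active_filters: bool) -> dict:
    """Pick overview mode only at low zoom with no user filters."""
    if zoom < marker_threshold and not has_active_filters:
        return {
            "mode": "overview",
            "stop_filter": "atlas_unmatched",
            "render_sides": ["atlas"],  # only the ATLAS side is drawn
        }
    # Any active filter, or sufficient zoom: normal predicate semantics.
    return {"mode": "normal"}
```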

8. Request Serialization Rules

The frontend sends only the filters that are semantically active.

Important examples:

  • All Matched Stops checked -> send stop_filter=matched, optionally alongside match_method refinements
  • Exact checked without All Matched Stops -> send match_method=exact only
  • ATLAS unmatched checked -> send stop_filter=atlas_unmatched, omit unmatched reason refinements
  • No OSM < 50m checked without the parent -> send match_method=no_nearby_counterpart
  • OSM group subtypes selected -> send osm_group_types=subtype_1,subtype_2
  • OSM groups master selected with no subtype -> send osm_group_types=all
  • Duplicate ATLAS checked -> send show_duplicates_only=true

This keeps requests compact, but there is one current implementation wrinkle: the backend treats any matched-method selection as implying matched scope. In practice, match_method=exact works even if stop_filter=matched is omitted.
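
The serialization examples above can be sketched as a Python function. The frontend itself is JavaScript, so this is only a model of the rules; the state keys (all_matched, matched_methods, atlas_unmatched, no_osm_nearby, osm_groups_master, osm_group_subtypes, duplicate_atlas) are hypothetical. The sketch also simplifies one case: when All Matched Stops is checked it omits match_method refinements, which the rules allow but do not require.

```python
def serialize_filters(state: dict) -> dict:
    """Turn a UI filter state into request parameters (only active ones)."""
    stop_filters = []
    match_methods = []

    if state.get("all_matched"):
        stop_filters.append("matched")
    else:
        match_methods.extend(state.get("matched_methods", []))

    if state.get("atlas_unmatched"):
        # Parent checked: reason refinements are omitted.
        stop_filters.append("atlas_unmatched")
    elif state.get("no_osm_nearby"):
        match_methods.append("no_nearby_counterpart")

    params = {}
    if stop_filters:
        params["stop_filter"] = ",".join(stop_filters)
    if match_methods:
        params["match_method"] = ",".join(match_methods)

    subtypes = state.get("osm_group_subtypes", [])
    if subtypes:
        params["osm_group_types"] = ",".join(subtypes)
    elif state.get("osm_groups_master"):
        params["osm_group_types"] = "all"

    if state.get("duplicate_atlas"):
        params["show_duplicates_only"] = "true"
    return params
```

An empty state yields an empty parameter set, which matches the rule that inactive groups contribute no restriction.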

9. Consistency Guarantees

The endpoints below use the same request parameter model and the same scope helper functions (resolve_stop_type_match_filters, build_stop_scope_condition) so stop-type and match-method semantics remain aligned:

  • /api/data
  • /api/global_stats
  • /api/random_stop
  • /api/top_matches

/api/data uses its own query builder function for viewport + attribute predicates, while /api/global_stats, /api/random_stop, and /api/top_matches use QueryBuilder.apply_common_filters. The resulting filter behavior is intended to be equivalent for shared parameters.

10. Worked Examples

Example 1:

(distance stage 1 OR atlas-unmatched OR osm-unmatched) AND operator=SBB AND duplicate_atlas

This means:

  • keep rows in any of those three scope branches
  • then require an ATLAS side with operator SBB
  • then require ATLAS duplicate-group membership

Example 2:

(matched OR osm-unmatched) AND platform

This means:

  • keep matched rows and OSM-unmatched rows in scope
  • then keep only those whose OSM side is platform

Example 3:

operator=SBB AND tram_stop

This means:

  • require an ATLAS side satisfying SBB
  • require an OSM side satisfying tram_stop
  • in practice this mostly yields matched rows because both sides must exist
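
Example 3 can be traced with a small Python sketch. The column names atlas_operator and osm_node_type, the helper names, and the sample rows are hypothetical. The sketch models only the three basic stop types, so it does not cover any other row kinds the real dataset may contain.

```python
def has_atlas_side(row: dict) -> bool:
    return row["stop_type"] in ("matched", "atlas_unmatched")


def has_osm_side(row: dict) -> bool:
    return row["stop_type"] in ("matched", "osm_unmatched")


def passes_example_3(row: dict, operator: str, transport_type: str) -> bool:
    """AtlasPredicate AND TransportPredicate, each requiring its own side."""
    atlas_ok = has_atlas_side(row) and row.get("atlas_operator") == operator
    osm_ok = has_osm_side(row) and row.get("osm_node_type") == transport_type
    return atlas_ok and osm_ok


# Hypothetical rows for illustration.
rows = [
    {"id": 1, "stop_type": "matched",
     "atlas_operator": "SBB", "osm_node_type": "tram_stop"},
    {"id": 2, "stop_type": "atlas_unmatched", "atlas_operator": "SBB"},
    {"id": 3, "stop_type": "osm_unmatched", "osm_node_type": "tram_stop"},
    {"id": 4, "stop_type": "matched",
     "atlas_operator": "SBB", "osm_node_type": "platform"},
]
kept = [r["id"] for r in rows if passes_example_3(r, "SBB", "tram_stop")]
```

Only row 1 is kept: rows 2 and 3 each lack one side, and row 4 has the wrong transport type.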

11. Global Stats Endpoint Semantics and Cache

/api/global_stats now delegates all cache-key construction, scoped query building, and aggregation logic to backend/services/global_stats.py.

The endpoint still follows the same predicate algebra defined in this document.

11.1 Shared Scope Semantics

/api/global_stats uses the same helper path as /api/data for scope selection:

  • resolve_stop_type_match_filters()
  • build_stop_scope_condition()
  • build_trio_middle_with_matched_side_condition()

This preserves the same trio-middle effective-match behavior across map rendering and global summary stats.

11.2 Effective Matched Semantics in Stats

For global stats aggregation, an internal effective_stop_type is computed:

  • rows with stop_type = matched are treated as matched
  • rows with stop_type = effectively_matched are also treated as matched

This is identical to the semantics already used for matched-scope filtering and avoids drift between counts and map behavior.
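
A minimal Python sketch of this normalization follows. The function names and the in-memory counting are assumptions for illustration; the actual aggregation lives in backend/services/global_stats.py and is not reproduced here.

```python
from collections import Counter
from typing import Iterable

# Both values are counted as matched in the stats aggregation.
MATCHED_LIKE = {"matched", "effectively_matched"}


def effective_stop_type(stop_type: str) -> str:
    """Fold effectively_matched into matched; keep other types unchanged."""
    return "matched" if stop_type in MATCHED_LIKE else stop_type


def count_by_effective_type(rows: Iterable[dict]) -> Counter:
    """Count rows per effective stop type."""
    return Counter(effective_stop_type(r["stop_type"]) for r in rows)
```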

11.3 Global vs Viewport Scope

/api/global_stats is filter-scoped, not viewport-scoped.

  • it does not use min_lat, max_lat, min_lon, or max_lon
  • it summarizes the full filtered dataset
  • /api/data remains the viewport-scoped endpoint

11.4 Cache-Key Canonicalization

Global stats cache keys are canonicalized so equivalent requests share one cache entry.

Canonicalization rules:

  • comma lists are trimmed, sorted, and rejoined for:
    • stop_filter
    • match_method
    • transport_types
    • osm_entity_types
    • node_type
    • atlas_operator
    • osm_group_types
  • station filters are canonicalized as sorted triples:
    • (station_filter value, filter_type, route_direction)
  • show_duplicates_only is normalized to true or false
  • top_n is included directly in the key

As a result, different parameter orderings that express the same filter state map to the same cache key.
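
The canonicalization rules can be sketched in Python as follows. The function names, the tuple-based key layout, and the way show_duplicates_only is parsed are assumptions; only the parameter names and the rules themselves come from the list above.

```python
from typing import Iterable, Mapping, Optional, Tuple

LIST_PARAMS = (
    "stop_filter", "match_method", "transport_types", "osm_entity_types",
    "node_type", "atlas_operator", "osm_group_types",
)


def canonical_list(value: Optional[str]) -> str:
    """Trim, sort, and rejoin a comma-separated list."""
    if not value:
        return ""
    return ",".join(sorted(p.strip() for p in value.split(",") if p.strip()))


def canonical_cache_key(
    params: Mapping[str, str],
    station_filters: Iterable[Tuple[str, str, Optional[str]]] = (),
) -> tuple:
    """Build a hashable key that is stable under parameter reordering."""
    parts = [(name, canonical_list(params.get(name))) for name in LIST_PARAMS]
    # Station filters become sorted (value, filter_type, route_direction)
    # triples; None is mapped to "" so the triples stay comparable.
    triples = sorted(
        (str(v), str(t), "" if d is None else str(d))
        for v, t, d in station_filters
    )
    parts.append(("station_filters", tuple(triples)))
    raw_flag = str(params.get("show_duplicates_only", ""))
    parts.append(("show_duplicates_only", raw_flag.strip().lower() == "true"))
    parts.append(("top_n", params.get("top_n")))  # included as-is
    return tuple(parts)
```

Two requests such as stop_filter=matched,atlas_unmatched and stop_filter=atlas_unmatched, matched produce the same key under this scheme, while a different top_n produces a different one.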

11.5 Cache Shape and Operational Notes

  • cache is an in-process LRU (size 5)
  • cache is thread-safe within one process
  • cache is not shared across multiple app processes or containers

The service also exposes clear_global_stats_cache() for explicit invalidation wiring during future write-path integration (problem resolution/import completion hooks).
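
A cache with these properties can be sketched in Python using an OrderedDict guarded by a lock. The class name SmallLRUCache and its method names are hypothetical; the sketch is not the implementation behind clear_global_stats_cache().

```python
import threading
from collections import OrderedDict
from typing import Any, Hashable, Optional


class SmallLRUCache:
    """In-process, thread-safe LRU cache with a fixed maximum size."""

    def __init__(self, maxsize: int = 5) -> None:
        self._maxsize = maxsize
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key: Hashable) -> Optional[Any]:
        with self._lock:
            if key not in self._data:
                return None
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            while len(self._data) > self._maxsize:
                self._data.popitem(last=False)  # evict least recently used

    def clear(self) -> None:
        with self._lock:
            self._data.clear()
```

Because the state lives in one Python process, each worker process or container would hold its own independent copy, which is the limitation noted in the list above.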
