Problem Detection and Prioritization
After the matching pipeline completes (Section 2), the system analyzes its output to identify data quality issues in both datasets (ATLAS and OSM).
Problem Categories
The system distinguishes three categories of problems. Only stop problems are detected today; the two route categories are planned:
| Category | Question | User Action |
|---|---|---|
| Stop Problems | "Is this stop at the correct location with correct attributes?" | Move node, update tags, fix matching. |
| Route Entity Problems | "Does this route exist and is it defined correctly?" | Create or delete route relation; fix route tags. |
| Route Membership Problems | "Is the list of stops for this route consistent?" | Add/remove stops from relation; reorder stops; fix roles. |
1. Stop Problems (Implemented)
Issues with individual stops (Distance, Attributes, Isolation, Duplicates). See 3.1 Stop Problems.
2. Route Entity Problems (Planned)
Issues with the Route object itself.
- Goal: Ensure every ATLAS route has a corresponding OSM route with correct global metadata.
- See 3.2 Route Entity Problems.
3. Route Membership Problems (Planned)
Issues with the stop-route relationship.
- Goal: Ensure the sequence of stops and their directional roles match.
- See 3.3 Route Membership Problems.
Architecture
Problem detection is built on the same domain models as the matching pipeline. The PipelineResult (containing MatchRecord, AtlasNode, and OsmNode entities) flows directly into problem detection, so no ORM or dictionary conversion is needed.
Polymorphic Predicates
Each problem predicate is a plain function with a polymorphic signature:
def predicate(ctx: ProblemContext, record: MatchRecord | AtlasNode | OsmNode) -> list[ProblemResult]
Predicates use isinstance checks to decide what to evaluate:
- distance_problem and attributes_problem only act on MatchRecord (return [] for bare nodes)
- unmatched_problem only acts on a bare AtlasNode or OsmNode (returns [] for MatchRecord)
- duplicates_problem acts on all three types
This allows the same STOP_PROBLEM_PIPELINE list to be used for both matched and unmatched records.
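The dispatch described above can be sketched as follows. This is a minimal illustration, not the project's code: the dataclass fields (`ident`, `distance_m`), the `distance_threshold_m` value, and the priorities returned are all assumptions chosen to keep the example self-contained.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical stand-ins for the real domain models; field names are assumed.
@dataclass(frozen=True)
class AtlasNode:
    ident: str


@dataclass(frozen=True)
class OsmNode:
    ident: str


@dataclass(frozen=True)
class MatchRecord:
    atlas: AtlasNode
    osm: OsmNode
    distance_m: float


@dataclass(frozen=True)
class ProblemResult:
    problem_type: str
    priority: int


@dataclass(frozen=True)
class ProblemContext:
    distance_threshold_m: float = 25.0  # assumed value, for illustration only


def distance_problem(
    ctx: ProblemContext, record: MatchRecord | AtlasNode | OsmNode
) -> list[ProblemResult]:
    # Only matched pairs have a distance to check; bare nodes yield nothing.
    if not isinstance(record, MatchRecord):
        return []
    if record.distance_m > ctx.distance_threshold_m:
        return [ProblemResult("distance", 2)]
    return []


def unmatched_problem(
    ctx: ProblemContext, record: MatchRecord | AtlasNode | OsmNode
) -> list[ProblemResult]:
    # Only bare nodes can be unmatched; matched records yield nothing.
    if isinstance(record, MatchRecord):
        return []
    return [ProblemResult("unmatched", 3)]


# One list serves both record kinds because each predicate filters by type.
STOP_PROBLEM_PIPELINE = [distance_problem, unmatched_problem]

ctx = ProblemContext()
matched = MatchRecord(AtlasNode("a1"), OsmNode("o1"), distance_m=80.0)
bare = OsmNode("o2")

for record in (matched, bare):
    results = [r for predicate in STOP_PROBLEM_PIPELINE for r in predicate(ctx, record)]
    print(type(record).__name__, [r.problem_type for r in results])
```

Because every predicate returns an empty list for types it does not handle, callers never need to pick a predicate subset per record type.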
Two Invocation Paths
| Record type | Invocation | Where |
|---|---|---|
| Matched (MatchRecord) | match_record.evaluate_problems(problem_ctx, STOP_PROBLEM_PIPELINE) | importer.py: calls the method natively on the domain entity |
| Unmatched (AtlasNode / OsmNode) | run_problem_pipeline(STOP_PROBLEM_PIPELINE, problem_ctx, node) | importer.py: uses the standalone runner |
Both paths produce list[ProblemResult], which is then mapped to ORM Problem rows via apply_problem_results().
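To make the two paths concrete, here is a hedged sketch of how a standalone runner and the entity method could relate. The argument order follows the calls shown in the table; everything else is an assumption, including the idea that `evaluate_problems` delegates to the runner, the dictionary used as a context, and the toy predicates.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ProblemResult:
    problem_type: str
    priority: int


# A predicate takes (context, record) and returns zero or more results.
Predicate = Callable[[Any, Any], list[ProblemResult]]


def run_problem_pipeline(
    pipeline: list[Predicate], ctx: Any, record: Any
) -> list[ProblemResult]:
    """Standalone runner: call each predicate in order and concatenate results."""
    results: list[ProblemResult] = []
    for predicate in pipeline:
        results.extend(predicate(ctx, record))
    return results


@dataclass(frozen=True)
class OsmNode:  # hypothetical minimal model
    ident: str


@dataclass(frozen=True)
class MatchRecord:  # hypothetical minimal model
    distance_m: float

    def evaluate_problems(self, ctx: Any, pipeline: list[Predicate]) -> list[ProblemResult]:
        # Assumption: the method simply delegates to the shared runner.
        return run_problem_pipeline(pipeline, ctx, self)


def distance_problem(ctx: Any, record: Any) -> list[ProblemResult]:
    if isinstance(record, MatchRecord) and record.distance_m > ctx["max_distance_m"]:
        return [ProblemResult("distance", 2)]
    return []


def unmatched_problem(ctx: Any, record: Any) -> list[ProblemResult]:
    if isinstance(record, MatchRecord):
        return []
    return [ProblemResult("unmatched", 3)]


STOP_PROBLEM_PIPELINE: list[Predicate] = [distance_problem, unmatched_problem]
problem_ctx = {"max_distance_m": 25.0}  # assumed threshold, illustration only

# Path 1: matched record, method on the domain entity.
matched_results = MatchRecord(distance_m=40.0).evaluate_problems(
    problem_ctx, STOP_PROBLEM_PIPELINE
)

# Path 2: unmatched node, standalone runner.
unmatched_results = run_problem_pipeline(
    STOP_PROBLEM_PIPELINE, problem_ctx, OsmNode("o1")
)
```

Both calls return the same plain list type, so a single mapping step such as apply_problem_results() can consume either without knowing which path produced it.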
ProblemResult Value Object
All predicates return ProblemResult, a lightweight frozen dataclass decoupled from SQLAlchemy:
from dataclasses import dataclass

@dataclass(frozen=True)
class ProblemResult:
    problem_type: str  # 'distance', 'attributes', 'unmatched', 'duplicates'
    priority: int  # 1 = P1, 2 = P2, 3 = P3
    has_atlas_duplicate: bool = False
    has_osm_duplicate: bool = False
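The practical effect of `frozen=True` can be shown with the standard library alone. The sketch below repeats the class definition so that it runs on its own; the sample values are made up.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ProblemResult:
    problem_type: str
    priority: int
    has_atlas_duplicate: bool = False
    has_osm_duplicate: bool = False


result = ProblemResult("duplicates", 2, has_osm_duplicate=True)

# Assigning to a field of a frozen dataclass raises FrozenInstanceError.
try:
    result.priority = 1
    mutated = True
except FrozenInstanceError:
    mutated = False

# To change a value, build a modified copy instead of mutating in place.
escalated = replace(result, priority=1)

# Frozen dataclasses compare by value and are hashable, so equal results
# collapse when collected in a set.
unique = {result, ProblemResult("duplicates", 2, has_osm_duplicate=True), escalated}
```

Value equality and hashability make results easy to deduplicate and to compare in tests, with no database session involved.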
Priority Levels
All problems use a consistent three-level priority system:
| Level | Meaning |
|---|---|
| P1 | Critical |
| P2 | Significant |
| P3 | Minor |
Priority assignment is rule-based and considers factors like:
- Distance thresholds (configurable constants in context.py)
- Operator type (e.g., SBB railway platforms have higher distance tolerance)
- Attribute importance (UIC references are more critical than operator names)
- Isolation status (stops with no nearby counterparts are higher priority)
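As an illustration of rule-based assignment, the sketch below combines two of the factors listed above: distance thresholds and a higher tolerance for railway platforms. Every number and name here (`DistanceThresholds`, `railway_factor`, the metre values) is hypothetical; the real constants live in context.py and are not reproduced in this section.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DistanceThresholds:
    # Hypothetical values in metres; not the project's actual constants.
    p1_m: float = 100.0
    p2_m: float = 50.0
    p3_m: float = 25.0
    # Assumed multiplier modelling the higher tolerance for railway platforms.
    railway_factor: float = 2.0


def distance_priority(
    distance_m: float,
    is_railway_platform: bool,
    thresholds: DistanceThresholds = DistanceThresholds(),
) -> int | None:
    """Return 1, 2, or 3 for P1 to P3, or None when the distance is acceptable."""
    factor = thresholds.railway_factor if is_railway_platform else 1.0
    # Check the strictest level first so the highest applicable priority wins.
    if distance_m > thresholds.p1_m * factor:
        return 1
    if distance_m > thresholds.p2_m * factor:
        return 2
    if distance_m > thresholds.p3_m * factor:
        return 3
    return None


bus_stop = distance_priority(60.0, is_railway_platform=False)
platform = distance_priority(60.0, is_railway_platform=True)
```

With these assumed values, the same 60 m offset is rated P2 for an ordinary stop but only P3 for a railway platform, which is the kind of operator-dependent tolerance the list describes.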
Code Reference
| Component | File | Purpose |
|---|---|---|
| Result value object | problem_detection/result.py | ProblemResult — frozen dataclass |
| Shared context | problem_detection/context.py | ProblemContext.build() — KDTrees, UIC counts, duplicate maps |
| Pipeline runner | problem_detection/pipeline.py | run_problem_pipeline(), STOP_PROBLEM_PIPELINE |
| Distance predicate | predicates/distance.py | distance_problem() |
| Attributes predicate | predicates/attributes.py | attributes_problem() |
| Unmatched predicate | predicates/unmatched.py | unmatched_problem() |
| Duplicates predicate | predicates/duplicates.py | duplicates_problem() |
| Domain models | models.py | MatchRecord.evaluate_problems(), AtlasNode, OsmNode |
| Database import | database/importer.py | Calls problem detection per stop during insertion |
| API endpoints | problems.py | Problem listing, aggregation, and duplicates grouping |