4.1 Import Process

The database import process transforms processed CSV files and match outputs into the PostgreSQL database that powers the web application. We use PostgreSQL 15+ with PostGIS 3.x for spatial queries.

High-Level Architecture

```mermaid
flowchart TB
    subgraph Input["Input Files"]
        CSV["Processed CSVs"]
        Match["Match Results<br/>(matched, unmatched_atlas, unmatched_osm)"]
    end
    subgraph Import["Import Pipeline (matching_and_import_db/database/importer.py)"]
        M["1. Run Alembic Migrations"]
        C["2. TRUNCATE Import Tables"]
        R["3. Load Route Mappings"]
        I["4. Import OSM, Stop Units, Stops, Routes"]
        P["5. Detect Problems During Stop Import"]
        M --> C --> R --> I --> P
    end
    subgraph ImportDB["Import DB (import_db)"]
        Op["stops_matched, atlas_stops, osm_nodes,<br/>osm_stops, osm_stop_members,<br/>atlas_routes, atlas_route_directions,<br/>osm_routes, osm_route_tags,<br/>route_atlas_stops, route_osm_stops,<br/>routes_matched, route_problems, problems"]
    end
    CSV --> R
    Match --> I
```

Table Behavior

| Category | Database | Tables | Behavior |
|---|---|---|---|
| Import | import_db | stops_matched, atlas_stops, osm_nodes, osm_stops, osm_stop_members, atlas_routes, atlas_route_directions, osm_routes, osm_route_tags, route_atlas_stops, route_osm_stops, routes_matched, route_problems, problems | Truncated and repopulated on every run |

The database is intentionally reproducible from source files. Every import run wipes current import data to ensure consistency with the latest processed pipeline output.

Import Steps

The import_to_database() function in matching_and_import_db/database/importer.py orchestrates the import.

1. Schema Update

ensure_schema_updated()  # Runs Alembic migrations to HEAD

Pending Alembic migrations are applied first so the database schema matches the current ORM models.

2. Truncate Import Tables

session.execute(text("TRUNCATE TABLE atlas_stops, osm_nodes, osm_stops, osm_stop_members, route_atlas_stops, route_osm_stops CASCADE"))
session.execute(text("TRUNCATE TABLE routes_matched CASCADE"))
session.execute(text("TRUNCATE TABLE problems, stops_matched CASCADE"))

The importer clears the import tables up front to guarantee a fresh state for the new run.

3. Load Route Mappings

The route loader preloads all route CSVs in one pass:

  • atlas_routes.csv
  • atlas_route_directions.csv
  • atlas_route_stops.csv
  • osm_routes.csv
  • osm_route_tags.csv
  • osm_route_members.csv

This reduces repeated file I/O and prepares entity-first payloads for route insertion.
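As a rough illustration, a one-pass loader can read each route CSV exactly once and index the rows by route identifier so later insertion steps never re-open the files. This is a minimal sketch, not the actual loader: the `load_route_mappings` function, the `open_csv` callback, and the `route_id` column name are all assumptions.

```python
import csv
import io
from collections import defaultdict

# The six route CSVs named in the docs above.
ROUTE_CSVS = [
    "atlas_routes.csv",
    "atlas_route_directions.csv",
    "atlas_route_stops.csv",
    "osm_routes.csv",
    "osm_route_tags.csv",
    "osm_route_members.csv",
]

def load_route_mappings(open_csv):
    """Read every route CSV once and index rows by route id.

    `open_csv(name)` returns a file-like object for that CSV (an
    assumption of this sketch); the result maps
    csv_name -> {route_id: [rows]}.
    """
    mappings = {}
    for name in ROUTE_CSVS:
        by_route = defaultdict(list)
        with open_csv(name) as fh:
            for row in csv.DictReader(fh):
                by_route[row["route_id"]].append(row)
        mappings[name] = dict(by_route)
    return mappings
```

Because the loader only depends on an `open_csv` callback, it can be exercised with in-memory `io.StringIO` buffers in tests.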

4. Insert Entity Data

The importer writes data in this order:

  1. Import all OSM nodes into osm_nodes so dependent route-stop rows can resolve their stop members.
  2. Import OSM stop units into osm_stops.
  3. Import OSM stop membership into osm_stop_members.
  4. Import matched stops into stops_matched, creating atlas_stops rows on demand and evaluating matched-stop problems.
  5. Import unmatched ATLAS stops into stops_matched, creating atlas_stops rows on demand and evaluating unmatched-stop problems.
  6. Import unmatched OSM stops into stops_matched and evaluate unmatched-stop problems.
  7. Import route entities and route membership into atlas_routes, atlas_route_directions, osm_routes, osm_route_tags, route_atlas_stops, route_osm_stops, routes_matched, and route_problems.

Records are accumulated in in-memory lists and committed in batches. The importer uses batched persistence plus periodic session.commit() checkpoints to keep imports fast without holding the entire run in a single transaction. The batch size defaults to 5000 and can be tuned through an environment variable:

export DB_IMPORT_BATCH_SIZE=5000

Progress is logged with timing information during long imports.
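The batching pattern described above can be sketched as a small helper that buffers records and commits every N rows. This is an illustrative sketch, not the importer's actual code; `commit_in_batches` is a hypothetical name, and only `add_all()` and `commit()` are assumed on the session.

```python
def commit_in_batches(session, records, batch_size=5000):
    """Buffer records and flush+commit every batch_size rows.

    `session` only needs add_all() and commit(), so any SQLAlchemy
    session (or a test double) works. Returns the number of commits.
    """
    commits = 0
    buffer = []
    for record in records:
        buffer.append(record)
        if len(buffer) >= batch_size:
            session.add_all(buffer)
            session.commit()  # checkpoint: keeps each transaction small
            buffer.clear()
            commits += 1
    if buffer:  # flush the final partial batch
        session.add_all(buffer)
        session.commit()
        commits += 1
    return commits
```

Periodic commits trade strict all-or-nothing atomicity for bounded memory and transaction size, which is acceptable here because every run starts from truncated tables anyway.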

5. Detect Problems

A ProblemContext is built once from the pipeline output, precomputing structures such as KDTrees, UIC counts, and duplicate maps. Then run_problem_pipeline(STOP_PROBLEM_PIPELINE, ctx, stop) is called for each stop.

The stop-level predicates currently include:

  • distance_problem
  • attributes_problem
  • duplicates_problem
  • unmatched_problem

Each predicate yields zero or more ProblemResult objects with a priority score, which are then converted into ORM Problem rows. See 3.1 Stop Problems for the stop-problem rules.
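The predicate pipeline can be sketched as follows. The names `ProblemResult`, `STOP_PROBLEM_PIPELINE`, and `run_problem_pipeline` follow the text above, but the dataclass fields, the predicate signatures, and the thresholds are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class ProblemResult:
    problem_type: str
    priority: int  # higher = more urgent (assumed convention)

# Each predicate inspects one stop against the precomputed context
# and yields zero or more results.
Predicate = Callable[[dict, dict], Iterable[ProblemResult]]

def distance_problem(ctx, stop):
    # Flag matches whose ATLAS/OSM coordinates are too far apart.
    if stop.get("distance_m", 0) > ctx.get("max_distance_m", 100):
        yield ProblemResult("distance", priority=2)

def unmatched_problem(ctx, stop):
    if stop.get("match_type") == "unmatched":
        yield ProblemResult("unmatched", priority=1)

STOP_PROBLEM_PIPELINE: List[Predicate] = [distance_problem, unmatched_problem]

def run_problem_pipeline(pipeline, ctx, stop):
    """Run every predicate over one stop and collect all results."""
    results = []
    for predicate in pipeline:
        results.extend(predicate(ctx, stop))
    return results
```

Keeping predicates as independent generators makes it cheap to add or reorder rules: the pipeline list is the single place where the active checks are declared.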

Route-level problems are computed separately during route payload construction and persisted to route_problems.

Running the Import

Environment Variables

| Variable | Description | Default |
|---|---|---|
| DATABASE_URI | Import database connection string | postgresql+psycopg://stops_user:1234@localhost:5432/import_db |
| DB_IMPORT_BATCH_SIZE | Records per batch commit | 5000 |
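A minimal sketch of reading these variables with their documented defaults (the exact handling inside importer.py is an assumption):

```python
import os

# Defaults mirror the table above.
DATABASE_URI = os.environ.get(
    "DATABASE_URI",
    "postgresql+psycopg://stops_user:1234@localhost:5432/import_db",
)
DB_IMPORT_BATCH_SIZE = int(os.environ.get("DB_IMPORT_BATCH_SIZE", "5000"))
```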

Command

export DATABASE_URI="postgresql://user:pass@localhost/stopdb"
python matching_and_import_db/database/importer.py

# Or skip certain phases to focus on selected matching stages
python matching_and_import_db/database/importer.py --skip-phase1 --skip-phase2

Persistence Behavior

The current importer does not preserve user modifications between runs. Every import rebuilds the import database from source files.

Related Documentation

  • 4. Database — Database architecture, schema, and indexing