How to Map Road Networks to Graph Nodes and Edges

A routing engine returns “no path found” between two streets that visibly cross on the map, or it reports a detour twice the real distance. The symptom traces back to ingestion, not the pathfinder: two LINESTRING segments that meet at an intersection were loaded with endpoints that differ in the 12th decimal place, so the graph holds two distinct nodes a fraction of a micron apart and no edge bridges them. The root cause is treating raw GIS geometry as if it were already topology: coordinates as continuous floats, segments as edges, intersections left implicit. This page resolves that with a deterministic converter that quantizes coordinates, splits every linestring at its true crossings, collapses shared endpoints onto one canonical node, and emits a directed graph with geodesic edge weights ready to persist. It is the concrete builder behind the node and edge spatial mapping layer.
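
The failure mode can be reproduced without any GIS library. The following minimal sketch uses invented coordinates (`a_end` and `b_start` are hypothetical endpoints, not taken from a real dataset) to show how raw floats produce two nodes where rounding produces one:

```python
# Two digitized endpoints that should be the same intersection but differ
# in the 12th decimal place (coordinates are invented for illustration).
a_end = (13.404954000001, 52.520008)
b_start = (13.404954000002, 52.520008)

# Keyed on raw floats, the two endpoints become two separate graph nodes.
raw_nodes = {a_end, b_start}

# Keyed on coordinates rounded to 6 decimals, they collapse onto one node.
PRECISION = 6
snapped_nodes = {
    (round(x, PRECISION), round(y, PRECISION)) for x, y in (a_end, b_start)
}

print(len(raw_nodes), len(snapped_nodes))
```

The set of raw tuples keeps both endpoints, while the rounded set keeps a single key, which is the property the builder relies on for its node lookup.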

Prerequisites & Versions

Library / Component	Min version	Install
Python	3.10	`pyenv install 3.10` (needs `dict`/`tuple` generics)
`shapely`	2.0	`pip install "shapely>=2.0"` (vectorized predicates, stable `split`)
`geopandas`	0.14	`pip install "geopandas>=0.14"`
`networkx`	3.2	`pip install "networkx>=3.2"`
`neo4j` async driver	5.14	`pip install "neo4j>=5.14"` (only for the persistence step)

The input is a GeoDataFrame of LineString geometries already in WGS-84 (EPSG:4326) geographic coordinates. If your source CRS differs, reproject before calling the builder: the Haversine weight assumes (longitude, latitude) degrees, which matches Shapely’s (x, y) ordering. CRS handling itself is owned upstream by the node and edge spatial mapping cluster’s normalization step.

Implementation

The module below is self-contained. build_spatial_graph quantizes geometry, extracts true crossing points, splits each segment at the crossings that lie on it, and assigns every endpoint a canonical integer node id keyed on its rounded coordinate. Edge weights are geodesic meters from the Haversine formula; one-way tags produce a single directed edge, everything else produces both directions.

import math
from typing import Dict, List, Tuple

import geopandas as gpd
import networkx as nx
from shapely.geometry import LineString, MultiLineString, MultiPoint, Point
from shapely.ops import split, unary_union

EARTH_RADIUS_M = 6_371_000
SNAP_PRECISION = 6  # ~0.11 m at the equator


def haversine_distance(p1: Point, p2: Point) -> float:
    """Geodesic distance in meters. Points are (x, y) = (lon, lat)."""
    lat1, lon1 = math.radians(p1.y), math.radians(p1.x)
    lat2, lon2 = math.radians(p2.y), math.radians(p2.x)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def quantize_coords(geom: LineString, precision: int) -> LineString:
    """Round every vertex so coincident points become bit-identical."""
    return LineString([(round(x, precision), round(y, precision)) for x, y in geom.coords])


def _crossing_points(geoms: List[LineString]) -> List[Point]:
    """True crossings: each segment intersected against the union of the others.

    Intersecting a segment with a union that still contains it would return the
    whole segment, so we subtract it from the full union first. Collinear
    overlaps yield LineStrings, not Points; those are ignored here and
    resolved by endpoint quantization instead.
    """
    points: List[Point] = []
    full_union = unary_union(geoms)
    for g in geoms:
        crossing = g.intersection(full_union.difference(g))
        if crossing.is_empty:
            continue
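        # The result may be a Point, a MultiPoint, or a mixed GeometryCollection;
        # keep every Point part and skip LineString parts (collinear overlaps).
        parts = getattr(crossing, "geoms", [crossing])
        points.extend(p for p in parts if isinstance(p, Point))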
    return points


def build_spatial_graph(gdf_roads: gpd.GeoDataFrame, precision: int = SNAP_PRECISION) -> nx.DiGraph:
    """Convert raw road linestrings into a directed graph with geodesic weights."""
    gdf = gdf_roads.copy()
    gdf["geometry"] = gdf.geometry.apply(
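        # No buffer(0) here: buffering a LineString by zero yields an empty
        # polygon, which would wipe out every road before the graph is built.
        lambda g: quantize_coords(g, precision).simplify(1e-5)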
    )

    split_pts = _crossing_points(list(gdf.geometry))

    G = nx.DiGraph()
    node_ids: Dict[Tuple[float, float], int] = {}
    next_id = 0

    def node_for(pt: Point) -> int:
        nonlocal next_id
        key = (round(pt.x, precision), round(pt.y, precision))
        if key not in node_ids:
            node_ids[key] = next_id
            G.add_node(next_id, lon=key[0], lat=key[1])
            next_id += 1
        return node_ids[key]

    for _, row in gdf.iterrows():
        # Split this line at every crossing point that actually touches it.
        segments: List[LineString] = [row.geometry]
        for pt in split_pts:
            nxt: List[LineString] = []
            for seg in segments:
                if seg.distance(pt) < 1e-7:
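                    # shapely.ops.split returns a GeometryCollection; keep its
                    # LineString parts so later passes can split them again.
                    result = split(seg, pt)
                    nxt.extend(p for p in result.geoms if isinstance(p, LineString))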
                else:
                    nxt.append(seg)
            segments = nxt

        is_oneway = str(row.get("oneway", "no")).lower() in ("yes", "1", "true")
        for seg in segments:
            start, end = Point(seg.coords[0]), Point(seg.coords[-1])
            u, v = node_for(start), node_for(end)
            if u == v:
                continue  # zero-length artifact from over-aggressive snapping
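            # Sum the great-circle length of every vertex-to-vertex hop so curved
            # roads are not shortened to the chord between their endpoints.
            pts = [Point(c) for c in seg.coords]
            weight_m = sum(haversine_distance(a, b) for a, b in zip(pts, pts[1:]))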
            G.add_edge(u, v, weight=weight_m, length_m=weight_m, oneway=is_oneway)
            if not is_oneway:
                G.add_edge(v, u, weight=weight_m, length_m=weight_m, oneway=False)

    return G

Persisting the result to a spatial graph database keeps the topology static and indexable. Store each node’s coordinate as a native point so the planner can seek it, and MERGE on the canonical id so re-ingestion is idempotent:

// Run once before ingestion. A point index lets later routing queries seek
// the location property instead of scanning every RoadNode.
CREATE POINT INDEX road_node_location IF NOT EXISTS
FOR (n:RoadNode) ON (n.location);

from neo4j import AsyncGraphDatabase

BATCH = 5_000


async def persist_graph(G: nx.DiGraph, uri: str, auth: tuple[str, str]) -> None:
    driver = AsyncGraphDatabase.driver(uri, auth=auth, max_connection_pool_size=20)
    node_q = """
    UNWIND $rows AS n
    MERGE (v:RoadNode {id: n.id})
    SET v.location = point({longitude: n.lon, latitude: n.lat, crs: 'wgs-84'})
    """
    edge_q = """
    UNWIND $rows AS e
    MATCH (u:RoadNode {id: e.u}), (v:RoadNode {id: e.v})
    MERGE (u)-[r:CONNECTS {dir: e.dir}]->(v)
    SET r.weight = e.weight, r.length_m = e.length_m, r.oneway = e.oneway
    """
    nodes = [{"id": n, "lon": d["lon"], "lat": d["lat"]} for n, d in G.nodes(data=True)]
    edges = [{"u": u, "v": v, "weight": d["weight"], "length_m": d["length_m"],
              "oneway": d["oneway"], "dir": f"{u}_{v}"} for u, v, d in G.edges(data=True)]
    try:
        async with driver.session() as session:
            await session.run("CALL db.awaitIndexes(120)")
            for i in range(0, len(nodes), BATCH):
                await session.run(node_q, rows=nodes[i:i + BATCH])
            for i in range(0, len(edges), BATCH):
                await session.run(edge_q, rows=edges[i:i + BATCH])
    finally:
        await driver.close()

How It Works

Read the builder against the topology it produces — three decisions carry the correctness:

Quantization makes coincident points identical, not just close. Rounding to precision=6 (~11 cm) collapses survey jitter so two segments that meet at a corner round to the same (lon, lat) key. node_for keys the node map on that rounded tuple, so the canonical-node lookup is an exact dict hit — no tolerance search, no spatial join. This is the same coordinate-precision discipline that the graph query planner optimization layer depends on downstream.
_crossing_points returns only true crossings. Intersecting each segment with the union of the others (not itself) yields the points where distinct roads meet in the 2D geometry. Splitting there turns an X-shaped pair of linestrings into four atomic edges meeting at one shared node, so every junction is an explicit node that a shortest-path algorithm can turn through.
Direction lives on the edge, never the node. A oneway tag emits one CONNECTS edge; an undirected road emits both. The node holds only geometry. That separation is what lets Dijkstra or A* respect one-way restrictions without re-interpreting bidirectional segments mid-traversal, and it keeps the geodesic weight symmetric for two-way roads.
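
The splitting and direction rules above can be illustrated with plain tuples instead of Shapely geometries. This is a simplified sketch, not the builder itself: `segment_crossing`, the two roads, and their one-way flags are hypothetical, and each road is a single straight segment.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float]
Segment = Tuple[Coord, Coord]


def segment_crossing(a: Segment, b: Segment) -> Optional[Coord]:
    """Return the interior crossing point of two straight segments, if any."""
    (x1, y1), (x2, y2) = a
    (x3, y3), (x4, y4) = b
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if denom == 0:
        return None  # parallel or collinear: no single crossing point
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / denom
    s = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / denom
    if 0 < t < 1 and 0 < s < 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None


PRECISION = 6
node_ids: Dict[Coord, int] = {}


def node_for(pt: Coord) -> int:
    """Canonical node id keyed on the rounded coordinate."""
    key = (round(pt[0], PRECISION), round(pt[1], PRECISION))
    return node_ids.setdefault(key, len(node_ids))


road_a: Segment = ((0.0, 0.0), (0.002, 0.002))  # two-way
road_b: Segment = ((0.0, 0.002), (0.002, 0.0))  # one-way, digitized direction
crossing = segment_crossing(road_a, road_b)
assert crossing is not None

edges: List[Tuple[int, int]] = []
for (start, end), oneway in ((road_a, False), (road_b, True)):
    # Each road is cut at the crossing into two atomic pieces.
    for u_pt, v_pt in ((start, crossing), (crossing, end)):
        u, v = node_for(u_pt), node_for(v_pt)
        edges.append((u, v))
        if not oneway:
            edges.append((v, u))  # reverse edge only for two-way roads

print(len(node_ids), len(edges))
```

The X-shaped pair yields five nodes (four endpoints plus one shared crossing node). The two-way road contributes four directed edges and the one-way road two, and the nodes themselves carry no direction information.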

The Haversine weight is a great-circle measure on a sphere of mean Earth radius, which approximates the WGS-84 ellipsoid to within a fraction of a percent, so length_m reflects ground distance in meters rather than the degree-space distance a planar Euclidean metric would report. Once persisted, radius and corridor queries against n.location reuse the pattern described in filtering graph paths by Haversine distance in Cypher.
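
A short worked comparison shows why degree-space distance misleads. The sketch below assumes the same mean Earth radius as the builder. `haversine_m` is a standalone restatement of the formula on bare floats, and the two test locations are arbitrary.

```python
import math

EARTH_RADIUS_M = 6_371_000


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two (lon, lat) pairs in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# The same 0.01 degree step in longitude, once at the equator and once at 60 N.
equator_m = haversine_m(0.0, 0.0, 0.01, 0.0)
north_m = haversine_m(0.0, 60.0, 0.01, 60.0)

# A planar metric in degree space cannot tell the two steps apart.
planar_deg = math.hypot(0.01, 0.0)

print(round(equator_m, 1), round(north_m, 1), planar_deg)
```

The planar distance is 0.01 in both cases, while the ground distance at 60 degrees north is about half the equatorial value, because meridians converge toward the poles.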

Common Failure Patterns

1. Duplicate nodes from a precision mismatch. If build_spatial_graph and the consumer round to different decimal places, “the same” intersection maps to two keys and the graph silently fragments. Symptom: inflated node count and unreachable pairs. Assert connectivity right after the build, before persisting:

def assert_topology(G: nx.DiGraph) -> dict:
    # Count neighbours on the undirected view: a two-way dead-end has directed
    # degree 2 (one edge in, one out) but only one distinct neighbour.
    deg = dict(G.to_undirected(as_view=True).degree())
    dangling = [n for n, d in deg.items() if d == 1]   # dead-ends OR data errors
    isolated = [n for n, d in deg.items() if d == 0]    # always errors
    assert not isolated, f"{len(isolated)} isolated nodes: precision/snapping bug"
    return {"nodes": G.number_of_nodes(), "edges": G.number_of_edges(),
            "dangling": len(dangling), "components": nx.number_weakly_connected_components(G)}

2. Collinear overlaps leave segments unsplit. When two roads share a stretch of geometry (a divided highway digitized twice, or a ramp tracing a main road), intersection returns a LineString, which _crossing_points skips. Those overlaps must be deduplicated before the build (unary_union the input first, or drop near-identical geometries); otherwise you get parallel edges with conflicting oneway tags. The result type of unary_union will not reveal overlaps, because the union dissolves shared linework silently. To detect them early, compare the summed length of the inputs against unary_union(list(gdf.geometry)).length: a noticeably shorter union means some stretch was digitized more than once.
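
For the simplest case, a road digitized twice with the same vertices, duplicates can be found with a direction-insensitive key on the rounded coordinates. This is a minimal sketch on bare coordinate lists; `geometry_key` and `find_duplicates` are hypothetical helpers, and it does not catch partial overlaps, which still need a geometric check.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float]


def geometry_key(coords: Sequence[Coord], precision: int = 6) -> Tuple[Coord, ...]:
    """Direction-insensitive key: a line and its reverse map to the same tuple."""
    rounded = tuple((round(x, precision), round(y, precision)) for x, y in coords)
    return min(rounded, rounded[::-1])


def find_duplicates(lines: List[Sequence[Coord]], precision: int = 6) -> Dict[int, int]:
    """Map the index of each duplicate line to the first line it repeats."""
    first_seen: Dict[Tuple[Coord, ...], int] = {}
    duplicates: Dict[int, int] = {}
    for i, coords in enumerate(lines):
        key = geometry_key(coords, precision)
        if key in first_seen:
            duplicates[i] = first_seen[key]
        else:
            first_seen[key] = i
    return duplicates


lines = [
    [(0.0, 0.0), (0.001, 0.0), (0.002, 0.0)],
    # Same road, digitized in the opposite direction with sub-precision jitter.
    [(0.002, 0.0), (0.001, 0.0000000004), (0.0, 0.0)],
    [(0.0, 0.0), (0.0, 0.001)],
]
print(find_duplicates(lines))
```

Which copy to keep, and how to reconcile conflicting oneway tags between the two, is a data-policy decision the sketch leaves open.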

3. Over-aggressive simplify drops real intersection vertices. A simplify tolerance set in degrees but reasoned about in meters can delete the very vertex where two roads meet, severing connectivity. Keep the tolerance at or below your snap precision. The builder above does not meet that rule as written: its 1e-5 degree tolerance is about 1.1 m, ten times coarser than the 1e-6 degree snap grid and a hundred times coarser than the 1e-7 split test, so tighten it for dense urban data. The if u == v: continue guard discards any zero-length artifact a collapsed segment would otherwise create.
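
Converting each tolerance to meters makes the comparison concrete. The sketch below assumes a spherical Earth of mean radius and an arbitrary latitude of 52 degrees; `degrees_to_meters` is a hypothetical helper, and the three values are the simplify tolerance, snap grid, and split test used in the builder.

```python
import math

EARTH_RADIUS_M = 6_371_000


def degrees_to_meters(deg: float, lat_deg: float = 0.0) -> tuple[float, float]:
    """Return (north-south, east-west) ground distance for a step of `deg` degrees."""
    north_south = math.radians(deg) * EARTH_RADIUS_M
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


tolerances = {
    "simplify tolerance": 1e-5,
    "snap grid": 1e-6,
    "split test": 1e-7,
}
in_meters = {
    name: degrees_to_meters(tol, lat_deg=52.0) for name, tol in tolerances.items()
}
for name, (ns, ew) in in_meters.items():
    print(f"{name}: {ns:.4f} m north-south, {ew:.4f} m east-west")
```

East-west distances shrink with latitude, so a tolerance expressed in degrees is anisotropic on the ground. The north-south figure is the conservative one to compare against lane widths.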

Performance Notes

Construction is dominated by the crossing-point pass. _crossing_points does one unary_union plus a per-segment intersection, so for $N$ input segments the practical cost is roughly

$$ C_{\text{build}} \approx \underbrace{N \log N}_{\text{union / STRtree}} + \underbrace{N \cdot \bar{k}}_{\text{per-segment splits}} $$

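where $\bar{k}$ is the mean number of crossing points touching a segment. The naive variant, intersecting every segment against every other, is $O(N^2)$ and becomes the bottleneck well before a city-scale extract. The union step stays near $N \log N$ because GEOS indexes the linework while noding it, but the estimate only holds if the per-segment work is local. As listed, full_union.difference(g) and the scan over every entry of split_pts each touch the whole network once per segment, so restrict both to nearby candidates (for example with a shapely.STRtree query) before running a city-scale extract.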
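
One way to keep the per-segment work local is to consider only segments whose bounding boxes are close to each other. The sketch below uses a plain uniform grid as a stand-in for a real spatial index such as an R-tree. `candidate_pairs`, the `BBox` alias, and the cell size are illustrative assumptions, not part of the builder.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def candidate_pairs(bboxes: List[BBox], cell: float) -> Set[Tuple[int, int]]:
    """Pairs of indices whose bounding boxes share at least one grid cell."""
    grid: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for i, (minx, miny, maxx, maxy) in enumerate(bboxes):
        # Register the box in every cell it overlaps.
        for cx in range(int(minx // cell), int(maxx // cell) + 1):
            for cy in range(int(miny // cell), int(maxy // cell) + 1):
                grid[(cx, cy)].append(i)
    pairs: Set[Tuple[int, int]] = set()
    for members in grid.values():
        # Indices were appended in ascending order, so each pair is (low, high).
        pairs.update(combinations(members, 2))
    return pairs


bboxes = [
    (0.0, 0.0, 0.001, 0.001),
    (0.0005, 0.0005, 0.0015, 0.0015),  # overlaps the first box
    (0.5, 0.5, 0.501, 0.501),          # far away from both
]
print(candidate_pairs(bboxes, cell=0.01))
```

Only the pairs returned here need an exact intersection test. Sharing a cell is a necessary but not sufficient condition for crossing, so false positives are expected and harmless.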

Memory is the real ceiling. networkx holds the entire topology in RAM; budget on the order of a kilobyte per node and edge once Python object overhead is counted, so a continental extract of tens of millions of edges will not fit. Switch strategies at that scale: either stream atomic edges straight to the database in BATCH-sized chunks, reusing the queries from persist_graph (the streaming counterpart is developed in scaling async graph ingestion with Python asyncio) and drop the in-memory DiGraph entirely, or build per-tile and merge on the canonical node keys. Frequent MERGE on a hot point index also fragments it under sustained ingestion; rebuild during a maintenance window if seek latency drifts, and supply explicit spatial bounds on every routing query so the planner seeks n.location rather than scanning the full graph.
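
The streaming idea can be sketched with a lazy batching helper that never materializes the full edge list. Here `batched` and `edge_rows` are hypothetical names; `edge_rows` stands in for whatever produces atomic edges, and each yielded chunk would be passed as the `rows` parameter of the edge query.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def batched(rows: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield lists of at most `size` rows, pulling from the source lazily."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def edge_rows(n: int) -> Iterator[Dict]:
    """Hypothetical edge source: yields rows one at a time instead of a list."""
    for i in range(n):
        yield {"u": i, "v": i + 1, "weight": 1.0}


sizes = [len(chunk) for chunk in batched(edge_rows(12), size=5)]
print(sizes)
```

Peak memory is bounded by one chunk rather than the whole network, at the cost of losing the in-memory graph for checks such as assert_topology, which would then have to run per tile or in the database.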

Related

Node and Edge Spatial Mapping: CRS normalization and the geometry-to-topology contract this builder implements
Implementing Geohash vs Quadtree Indexing in Neo4j: index the persisted nodes for radius and tile queries
Optimizing Cypher Query Plans for Spatial Data: make the routing queries seek the point index this graph populates
Scaling Async Graph Ingestion with Python asyncio: stream the built edges past the in-memory ceiling

This guide is part of Node and Edge Spatial Mapping, within the Spatial Graph Database Fundamentals for Python reference.
