Efficient Real-World Route Searching
In this assignment, students implement a real-world route searching algorithm for HuskyMaps, a derivative of the street mapping application Bear Maps (Nifty Assignment 2018). The assignment applies several topics commonly taught in CS2, including priority queues, graphs, shortest paths algorithms, binary trees, tree traversals, testing, and debugging large programs. After building HuskyMaps in a previous homework, this homework challenges students to improve upon the relatively simple A* route search algorithm by implementing contraction hierarchies, a 2008 algorithm for efficient routing based on the idea that many shortest paths queries share the same major roadways. This can be done in 3 phases: (1) compute an importance heuristic for each vertex; (2) add new shortcut edges by contracting each vertex in the order given by the importance heuristic; and (3) precompute the shortest paths on the augmented graph from each vertex to increasingly important roadways. Afterwards, a source-to-destination route search query can be answered by finding the common most-important roadways (midpoints) and reconstructing the best path from source to midpoint to destination. Reconstructing the path is an application of inorder tree traversal since each shortcut edge contains two smaller edges, both of which can also be shortcut edges. Each part of this assignment can be scaffolded. Unit tests on a small, manually-computable graph provide immediate, formative feedback to students as they develop their implementation. Randomized route search queries serve as a stress test for the correctness and efficiency of their implementation, which they can compare against their earlier A* search algorithm implementation.
|Summary||Students implement a real-world route search algorithm by preprocessing the graph, precomputing partial shortest paths, and responding to queries using the partial shortest paths as input. They must implement 2 new classes and integrate their optimized data structures and algorithms into a large, existing code base.|
|Topics||priority queues, graphs, shortest paths algorithms, binary trees, tree traversals, testing, debugging|
|Difficulty||Medium or hard depending on scaffolding.|
|Strengths||Algorithm refinement. Students previously solved route search with A* search. In this assignment, students implement a new approach to single-pair shortest paths that combines several ideas from across CS2. Instead of relying on an instructor-provisioned autograder, students can use their own A* search as a correctness oracle.
Crisp algorithm with a clean implementation. In order to get high performance, some graph algorithms can be very complicated to learn or implement. Contraction hierarchies arise from a simple insight: many of our shortest paths queries share the same major roadways. Despite implementing a new algorithm in a large code base, the suggested implementation is cleanly modularized from the rest of the project and requires no API hacking.
Extending a large code base. It can be difficult for instructors to rationalize why students should care about software engineering concerns, such as code style or composition, so long as their code gets the job done. Because students need to extend the functionality of a previous assignment, students can realize some of the benefits of these principles.
Progressive testing and debugging process. The assignment is scaffolded with three levels of testing: a set of test cases on a small, manually-computable graph that students can solve by hand; a set of local integration tests that check for problems in a limited number of scenarios; and random testing with their A* search oracle.
|Weaknesses||Computationally-intensive. While the overall memory usage for working with the augmented graph is only a constant factor increase, graph preprocessing time can take up to 30 minutes on the provided
Depends on HuskyMaps. Adopting this assignment is a large undertaking because it extends HuskyMaps. It is possible to implement contraction hierarchies without the HuskyMaps project, but the loss of interactivity may reduce student interest and motivation.
|Libraries||Spark, Gson, WebSocket, algs4, JUnit, Hamcrest, Apache Commons, SLF4J. All required dependencies are included.|
|Variants||Improved priority heuristic. The Deleted Neighbors heuristic is a simple and effective heuristic at encouraging spatial diversity, but can be improved with ideas from graph theory to produce better contraction orderings.
Directed edges and travel speed. All edges in the graph are inserted in both the forward and backward directions regardless of the actual roadway direction. Edge weight is given as a function of distance alone even though road speed is available in the dataset.
Parallelism. Different subproblems in the algorithm can be parallelized to different extents: some are embarrassingly parallel, others have some concurrency issues, and yet others require going back to the drawing board entirely.
Swap student implementations. Instead of implementing contraction hierarchies on top of their own code base, have students swap projects to highlight the usefulness of a cleanly-designed API.
While many problems in graph theory can be solved with a combination of simple algorithms such as DFS, BFS, and Dijkstra’s algorithm—contraction hierarchies included—it’s sometimes the case that faster solutions require specialized graph augmentations or data structure modifications. For example, it’s possible to solve in two different but closely related ways the special case of single-source shortest paths in an undirected, weighted graph G = (V, E) where each positive integer edge weight w is bounded by a constant k.
On a 2015 CS2 final exam, significantly fewer students used the first approach despite its simplicity.
Our CS2 course spends relatively more time on data structure refinement but comparatively less time on graph algorithm optimization strategies. While class time is spent on iteratively improving tree data structures from ordered linked lists to binary search trees to balanced search trees, it is rare for students to see an equivalent process for graph algorithm refinement since our curriculum introduces different graph algorithms as (typically) solving different graph problems.
Iterative refinement also enables more robust, student-driven testing of their own programs. By relying on their A* search algorithm as a correctness oracle, students can compare the two implementations to evaluate correctness and query runtime.
While the graph preprocessing step is a tractable problem, it is annoyingly slow on large graphs. The full
seattle graph, for instance, can take nearly 30 minutes to preprocess on a single thread. Continuing with the theme of the HuskyMaps project and the themes of CS2 as a whole, we’d like the algorithm to run in a reasonable amount of time on planet-scale graphs as well.
The graph preprocessing step can be parallelized but only in certain places where the result of an operation does not depend on the results of other operations. Improving parallel graph preprocessing runtime beyond a certain point requires a different approach to graph contraction, e.g. an approach based on computing independent level sets in the graph. Parallelizing contraction hierarchies involves judicious application of parallel threads and identification of concurrency bugs. The ceiling for runtime optimization depends on how much graph theory the instructor would like to introduce.
See Contraction Hierarchies briefly explained for a primer on the contraction hierarchies algorithm. What follows is a high-level implementation outline for the assignment without discussion of idea or correctness.
Suppose we have an undirected graph G = (V, E) with non-negative edge weights representing the real-world distance between neighboring vertices. A shortcut edge through vertex v is a pair of edges (u, v) and (v, w) where (u, v, w) is the shortest path from u to w. Shortcut edges are a recursive, binary tree-like structure as either edge can also be a shortcut edge.
At query time, lookup the precomputed shortest paths for the source and destination. Find the set intersection of the most-important roadways. Then, identify the most-important roadway (midpoint) with the lowest total distance from source to midpoint to destination. Finally, reconstruct the shortest path from source to destination.
Components include the JavaSpark web framework and Google Gson library. Alan Yao for creating the original version of this project. Colby Yuan and Eli Lipsitz for substantial refactoring of the front end. Rahul Nangia for substantial refactoring of the back end.