Geospatial Queries in NoSQL: Precise Location Search

How to implement efficient proximity searches in a NoSQL database using Geohashes.

We have all been there. You choose a NoSQL solution like Google Cloud Datastore or Firestore for its schema-less flexibility and automatic scaling. Then, the inevitable requirement drops: “We need a ‘Find Stores Near Me’ feature.”

If you were running PostgreSQL, you would simply enable PostGIS and move on. However, in the NoSQL ecosystem (specifically with older Datastore implementations or pure key-value patterns), native geospatial support is often limited, prohibitively expensive, or completely nonexistent.

I encountered this exact bottleneck while optimizing a high-traffic locator service. Our initial “naive” implementation fetched entire state-level datasets and filtered them in memory. As our data swelled to over 10,000 records per region, timeouts became the norm rather than the exception. We didn’t need a new database; we needed a smarter algorithm.

In this post, I will share how we solved this using Geohashes. This battle-tested pattern allows you to query 2D spatial data using 1D string lookups, drastically reducing query costs while boosting performance.

The Problem: Why “Naive” Filtering Fails

When you first build a locator, the dataset is usually manageable. It is tempting to fetch all records for a generic region (like a ZIP code or State) and calculate distances on the client.

However, as your service scales, this approach hits a hard wall:

  • Network Latency: Fetching 5,000 records just to display the 10 closest ones is a massive waste of bandwidth.
  • Memory Overhead: Parsing megabytes of JSON can freeze mobile UIs or crash backend services.
  • Cost: Many NoSQL databases charge by the read. Client-side filtering means you pay for thousands of reads that the user never actually sees.

We needed a surgical approach: querying only the bounding box relevant to the user, relying on the limited query capabilities of a document store (specifically, equality filters and range comparisons).

The Solution: Geohashes

The solution is Geohashing.

For those unfamiliar, a Geohash encodes a geographic location (latitude/longitude) into a short string of letters and digits. It works by recursively dividing the world into a grid:

  • g covers a massive chunk of the planet.
  • gc is a smaller chunk inside g.
  • gcp is smaller still.
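To make the grid concrete, here is a minimal from-scratch encoder. This is not the geofire-common implementation (the function name `encodeGeohash` is illustrative); it is a sketch of the standard algorithm, showing how each character narrows the cell by alternately bisecting longitude and latitude:

```javascript
const BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

function encodeGeohash(lat, lng, precision = 9) {
  const latRange = [-90, 90];
  const lngRange = [-180, 180];
  let hash = "";
  let bit = 0;
  let ch = 0;
  let evenBit = true; // geohash bits alternate, starting with longitude

  while (hash.length < precision) {
    const range = evenBit ? lngRange : latRange;
    const value = evenBit ? lng : lat;
    const mid = (range[0] + range[1]) / 2;
    if (value >= mid) {
      ch = (ch << 1) | 1; // point is in the upper half: emit a 1 bit
      range[0] = mid;
    } else {
      ch = ch << 1; // lower half: emit a 0 bit
      range[1] = mid;
    }
    evenBit = !evenBit;
    if (++bit === 5) {
      // every 5 bits become one base32 character
      hash += BASE32[ch];
      bit = 0;
      ch = 0;
    }
  }
  return hash;
}

// Lower Manhattan lands in the "dr5..." cell
const nyc = encodeGeohash(40.7128, -74.006, 9);
```

Each extra character divides the parent cell into 32 sub-cells, which is exactly why longer shared prefixes mean tighter physical proximity.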

Why this works for NoSQL

The elegance of Geohashes lies in their prefix-based proximity: shared prefixes indicate physical closeness. If two points share a long prefix (e.g., dr5ru...), they are neighbors.

This transforms a complex two-dimensional spatial query into a simple string range query, a task that NoSQL databases handle with incredible efficiency.
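The string range trick can be shown without any database at all. Treating the geohash column as a sorted index, a proximity query for everything inside cell dr5 becomes a plain range scan (the helper name `prefixRange` is illustrative):

```javascript
// Convert a geohash prefix into a half-open string range.
// "~" sorts after every base32 geohash character, so the range
// [prefix, prefix + "~") covers exactly the strings starting with prefix.
function prefixRange(prefix) {
  return { start: prefix, end: prefix + "~" };
}

// Stand-in for a sorted NoSQL index on the geohash field
const index = ["9q8yy", "dr5regw", "dr5ru7h", "gcpvj0"].sort();

const { start, end } = prefixRange("dr5");
const matches = index.filter((h) => h >= start && h < end);
// matches → ["dr5regw", "dr5ru7h"]
```

In a real document store, the same comparison runs as an indexed `>=` / `<` range filter on the geohash field, so the database only touches the rows inside the target cell.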

Implementation Guide

We utilized the geofire-common library (available via npm). It abstracts away the complex mathematics required to calculate bounding boxes.

1. The Write Path: Storing the Hash

When you create or update your “Store” or “Service Location” record, you must compute the hash and save it alongside the coordinates.

import { geohashForLocation } from "geofire-common";

const lat = 40.7128;
const lng = -74.006;
const hash = geohashForLocation([lat, lng]);

// Save 'hash', 'lat', and 'lng' to your NoSQL document
// We use the hash for queries and lat/lng for precise distance comparisons
await db.collection("locations").doc("nyc-store").update({
  geohash: hash,
  lat: lat,
  lng: lng,
});

2. The Read Path: Calculating Bounds

Here is where the optimization happens. When a user requests “Stores within 5 miles,” we do not scan the entire table. Instead, we calculate the Geohash “bounds” for that specific radius.

Due to grid alignment, a circular search area might overlap up to 9 different geohash grid squares.

import { geohashQueryBounds, distanceBetween } from "geofire-common";

const center = [40.7128, -74.006]; // the user's current position
const radiusInMeters = 5 * 1609.34; // 5 miles

// Get the start/end keys for the relevant grid squares;
// each bound is a [startHash, endHash] pair for one range query
const bounds = geohashQueryBounds(center, radiusInMeters);

3. Querying and Filtering

You will likely need to execute a few parallel queries, one for each bound, and merge the results.

Crucial Note: Geohashes are rectangular, but your search radius is circular. You will encounter “false positives” (results in the corner of the square that are outside your circle). You must filter these out on the client side using a precise distance calculation.
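The merge-and-filter step can be sketched without a live database. Here `haversineMeters` is a hypothetical helper standing in for a precise distance calculation (geofire-common's distanceBetween plays this role, returning kilometers), and `filterByRadius` de-duplicates results from overlapping bounds before dropping the corner false positives:

```javascript
// Great-circle distance in meters between two [lat, lng] points.
function haversineMeters([lat1, lng1], [lat2, lng2]) {
  const R = 6371000; // mean Earth radius in meters
  const toRad = (d) => (d * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Merge per-bound result sets, then keep only docs inside the circle.
function filterByRadius(candidates, center, radiusInMeters) {
  const seen = new Set();
  return candidates
    .filter((doc) => !seen.has(doc.id) && seen.add(doc.id)) // de-dupe overlaps
    .filter(
      (doc) => haversineMeters(center, [doc.lat, doc.lng]) <= radiusInMeters
    );
}
```

The key point: the database does the cheap rectangular pre-filter via the geohash ranges, and this precise circular check runs only on the handful of candidates those ranges return.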

Optimizing the UX: The “Expanding Ring” Pattern

Simply querying a static radius often results in a poor user experience. In a dense city, 5 miles might yield 50 results, whereas in a rural area, it might yield zero.

To solve this without hammering the database, we implemented an expanding ring algorithm:

  1. Attempt 1: Query for the nearest 25 records within 5 miles.
  2. Check: Did we get fewer than 25 results?
  3. Attempt 2: If yes, double the radius to 10 miles and query again.
  4. Repeat: Continue expanding (20, 40, 80 miles) until the threshold is met or a max limit is reached.

This ensures urban users get fast results without over-fetching, while rural users still find what they need.
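The four steps above can be sketched as a simple loop. `queryWithinRadius` is a hypothetical stand-in for the bounds-query-and-filter pipeline from the previous section; in production it would be an async call against your datastore:

```javascript
// Expanding-ring search: double the radius until we have enough results
// or hit the cap. Defaults mirror the 25-result / 5-to-80-mile example.
function expandingRingSearch(queryWithinRadius, center, opts = {}) {
  const { want = 25, startMiles = 5, maxMiles = 80 } = opts;
  let miles = startMiles;
  let results = queryWithinRadius(center, miles * 1609.34);
  while (results.length < want && miles < maxMiles) {
    miles *= 2; // 5 → 10 → 20 → 40 → 80
    results = queryWithinRadius(center, miles * 1609.34);
  }
  return results;
}
```

Because each retry reuses the same cheap geohash-range machinery, a rural miss costs a few extra small queries rather than a full table scan.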

Real-World Edge Cases

Implementing this in production revealed a few “gotchas” that most tutorials skip.

1. The Floating Point Trap

Never blindly trust coordinates provided by CSV uploads or legacy systems. As I discussed in my previous article, “The 15-Mile Mistake”, truncating float precision on your latitude/longitude renders the Geohash inaccurate.

  • If your DB stores 41.8 instead of 41.88203, your Geohash will point to a completely different grid square.
  • Rule: Always sanitize and validate coordinate precision before generating the hash.
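A validation gate along these lines can catch truncated values before they poison the index. This is an illustrative sketch (`validateCoordinate` is a hypothetical helper, and the decimal-count check is a heuristic, not a universal rule):

```javascript
// Reject coordinates that are out of range or suspiciously imprecise.
function validateCoordinate(lat, lng) {
  if (!Number.isFinite(lat) || !Number.isFinite(lng)) return false;
  if (lat < -90 || lat > 90 || lng < -180 || lng > 180) return false;
  // Require at least 4 decimal places (~11 m of precision); a value
  // like 41.8 has almost certainly been truncated upstream.
  const decimals = (v) => (String(v).split(".")[1] || "").length;
  return decimals(lat) >= 4 && decimals(lng) >= 4;
}
```

Records that fail the check go to a review queue instead of the geohash pipeline, so one bad CSV row cannot silently relocate a store.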

2. Data Staleness

Store location information might change. If you rely on third-party APIs to source your locations, you need a strategy for incremental updates.

  • We check the “last updated” timestamp on reads.
  • If a record is “stale” (e.g., > 6 months old), we trigger a background job to re-fetch the geodata from the source of truth (like Google Maps or the provider API) to ensure the geohash remains accurate.
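The staleness gate itself is trivial; a minimal sketch (with an assumed six-month threshold and millisecond timestamps) might look like:

```javascript
// Roughly six months in milliseconds (threshold is illustrative).
const SIX_MONTHS_MS = 182 * 24 * 60 * 60 * 1000;

// True when a record's lastUpdated timestamp is older than the threshold,
// signalling that a background re-fetch of its geodata should be queued.
function isStale(lastUpdatedMs, nowMs = Date.now()) {
  return nowMs - lastUpdatedMs > SIX_MONTHS_MS;
}
```

Running this on the read path keeps refreshes demand-driven: locations nobody queries never burn API quota against the source of truth.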

Conclusion

You don’t always need to migrate to a geospatial-native database to unlock high-performance location features. By leveraging Geohashes and libraries like geofire-common for the JavaScript ecosystem or thephpleague/geotools for PHP, you can build a robust locator service directly on top of Datastore, Firestore, or any other NoSQL database.

It is efficient, cost-effective, and scales effortlessly with your traffic.

If you are struggling with messy coordinate data, I highly recommend reading my deep dive on Float Precision errors.

Have you implemented Geohashing in a non-standard DB? Let me know your war stories in the comments below.