Easy Geoboxing on AppEngine

EDIT: Thanks to Karl, who in the comments pointed out multiple flaws in my implementation. I've fixed them and pushed the updated version to github. /EDIT

Geoboxing is a technique for pre-computing bounding boxes for points of interest. On AppEngine geoboxing allows for fast searching using only equality filters, which saves your single inequality filter for other parameters.

Brett Slatkin released a sample application with an implementation of geoboxing and wrote an article on the AppEngine blog. While the code is free, and works, I found it a little unwieldy for my common use, and a little tough to wrap my head around. I've re-implemented Brett's geoboxing code as a class with a focus on easy integration with the iPhone MapKit framework.

What is a Geobox?

A geobox is a box encompassing all points within, which is easily computed given any point within. Thus given point x we are guaranteed to be able to compute a bounding box, and every point within the box will compute the same bounding box. The bounding box is represented using a string with (in our case) the format top_left_latitude|top_left_longitude|bottom_right_latitude|bottom_right_longitude. Thus we can use a simple string equality filter to check to see if a point of interest is nearby to our current position.

How is Your Implementation Better?

My implementation keeps all geobox logic inside of the geobox2.py file. To compute the set of surrounding geoboxes a user has to simply initialize a new geobox object with the desired latitude and longitude, and request the list of geobox strings surrounding that point.

gb = geobox2.Geobox(latitude, longitude)

geobox_list = gb.storage_geoboxes()
# geobox_list gets stored in a string list property of your model
my_instance.geoboxes = geobox_list
my_instance.put()

Once this info is stored in the the database, searching is similarly simple. Initialize a geobox object with the desired coordinates, and request the (single) geobox string with the span of your map as a parameter.

gb = geobox2.Geobox(latitude, longitude)
box = gb.search_geoboxes(span)

query = MyGeoSearchableModelObject.all().filter("geoboxes IN", box)
results = query.fetch(50)

This implementation is designed for easy searching (and eventual display on) an iPhone MKMapView, thus on the iPhone you can pass latitude, longitude, and span parameters of your mapView.

Customization:

While the implementation can be used as is, there are a few properties you may want to change.

SCOPE_SIZES = [decimal.Decimal('0.00625'), decimal.Decimal('0.0125'), decimal.Decimal('0.025'), decimal.Decimal('0.05')]

Currently the SCOPE_SIZES variable spans from about a block to about 4km (or 2 miles). Change as you see fit.

NUM_PLACES = decimal.Decimal('1.000') # use 3 decimal places

The NUM_PLACES variable determines the number of decimal places will be used in the geobox. Unless you need extremely high accuracy I don't see a need to change this.

MARGIN = decimal.Decimal('0.3')  

The MARGIN variable determines how close to the edge of the bounding box a point has to be before it's neighboring bounding box is also added to the list. A value of 0.3 means one third of the scope away from each edge will trigger adjacent bounding boxes. It wouldn't be advisable to increase this much, since at 0.5 we get the greatest number of results (the box, plus all 8 adjacent boxes) when we need it least (when we are directly in the center of our box.)

License:

This code is licensed under a BSD License. You may use it as you see fit.

Where can I get it?

The code is available on my github repository.

Summary:

My geobox implementation gives you easy, iPhone compatible, location based searching and keeps all the implementation details inside the geobox2.py file.

Comments

finding the adjusted scope

thanks, I appreciate that this is so lightweight and easy to integrate.

one small issue I found is that when finding the right scope to use, it finds the first bounded box that is tighter than the actual distance / scope desired. so, lets say your scopes map to 5, 10, 50 and 100 miles

if you do a search with a scope mapping to 60 miles, it won't find anything between 50 and 60 miles away because it will snap to the 50 mile scope instead of the 100.

IMHO it is better to snap to the smallest containing scope, and then filter in memory if desired.

specifically, I made this adjustment in nearest_scope from this:

    if scope < SCOPE_SIZES[0]:
        adjusted_scope = SCOPE_SIZES[0]
    else:
        for s in SCOPE_SIZES:
            if scope > s:
                adjusted_scope = s

to this:

    if scope >= SCOPE_SIZES[-1]:
        adjusted_scope = SCOPE_SIZES[-1]
    else:
        for s in reversed(SCOPE_SIZES):
            if scope < s:
                adjusted_scope = s

Scope

Thanks, I'll probably integrate that into my code soon.

I've been thinking about how it works over the past couple of days. One of the problems with the code as is, is that to minimize the number of geoboxes in the query, we do our edge detection while saving. This leads to a search scope approximately 1/3 larger than we ask for.

I see a couple of possible alternatives:
1. We decrease the passed–in scope on searches by 1/3 and hopefully get the right amount of results. This is inelegant, but keeps the number of geoboxes in the queries down.
2. We do the edge detection on the searches. This will be easier to understand, but lead to more datastore hits. Bad if you have a popular app. Worrying about this may be a premature optimization though.

Thanks for the suggestion.

close to edge detection over triggered?

Glad you find my finding helpful. Another oddity I just found is that for 11 scopes, my entities are being stored with more boxes than should really occur, like 63. That would mean the close to edge detection is getting detected at nearly every scope and at multiple sides of the box. I'll let you know what I find when I get a chance to look into it.

-Karl

Doh!

OK Karl, You need to know that pointing out my stupid mistakes is not OK. You should just pretend that I'm brilliant, tell me so, and then leave. (Just kidding, thanks again.)

I looked at the code, and it appears to be the round_down function at fault. I had copied it from Brett's implementation without a full understanding of what it did, and not-surprisingly it didn't work properly. I'm in the process of writing some tests that should confirm my fix is good, and then I'll push the new version to github.

Thanks again,
Pat

Very cool!

Thanks for this. -Brett