What Is GIS And Why I Love It

I’ve always loved Geographic Information Systems (GIS) as a branch of technology. I’ve worked with them in a professional capacity for about 8 years and had a focus on highly specialised data analysis and manipulation using GIS for over 5 of them. Of course, describing it succinctly and interestingly is one of the unanswered hard questions in the field. And getting non-GIS IT people to understand what makes it such a specialised sub-field of computer-booping can be difficult, especially when you want to capture the vagaries and complexity inherent in the subject.

What is GIS for the average shmuck?

In its simplest form, the version you’d give someone you meet on the street, it’s just mapping shit to real world locations. Food delivery app shows you restaurants near you? That’s GIS. Weather app telling you when it’ll start pissing down in your area? You guessed it babe, GIS. Grindr? Hell yeah fuckin' GIS.

Note: Those “meet hot mature singles in your area” ads are not GIS. It turns out they always say that no matter where you are. You can try this yourself with location spoofing, unless I am wrong and Wilkes Land, Antarctica is actually MILF/DILF central.

Still keeping it relevant to the person you met on the street, you can point out that the maps on their smartphone are GIS all the way down. Think about all the data they capture and display and how it all builds upon itself to allow complex features all rooted in geography.

The location of all the libraries, cafes, points of interest, bookstores, nightclubs, and post offices; as well as all the opening hours being matched to your current location (usually in a time zone demarcated by geographical borders).
Parks, lakes, beaches, roads, and building/property boundaries.
Of everything directly above, which ones can be traversed and by what mode of transport (walking, cycling, speedboat, car, plane, parkour).
Combining all these, how to get from my favourite bookstore to an excellent cocktail bar on foot in the shortest length of time without violating trespassing law since I am not a crow.
Ensure the route I take is not too congested by using sourced traffic data which is based on the number of people on each path segment on all possible routes.

Hopefully by now the person on the street understands how GIS affects their everyday life and also ignored the thing about hot singles ads. That’s a weird thing to lead with when talking to a stranger. We’ll skip that next time.

What is GIS for the average shmuck who has opinions on IDEs?

The kind of people who argue about whether Emacs or Vim is better for writing code instead of actually writing the code they need to might not realise the challenges involved in GIS. To a lot of our ilk, anything we don’t have a lot of experience with is probably pretty easy; from design and marketing to finance to such simple topics as knowing what date it is or the constraints that make up a person’s name.

But if that is us and we want to build a cool app to find nearby cafes it shouldn’t be difficult, right?

Let’s develop it from first principles!

We just need two float fields on our “cafes” and “users” tables; one for latitude, one for longitude. All our problems solved!*

*No problems are solved at this point. In fact, many more problems are created.

In very broad terms we could just store the location of every cafe that way? Then we show the distance to the nearest cafe in our handy app, that’s so easy to build.

Oh damn, the nearest cafes to our users are across a river and most of them aren’t willing to go for a swim at 08:30 for their pre-work coffee. I guess we need to store uncrossable boundaries like rivers. But you can collapse a river into a polygon! You can store all the points of the river as lat/longs in another table, then you can create a polygon from those points and any line from a user to a cafe that crosses that polygon can be discarded as an unsuitable location.

Well shit, it turns out that rivers are not usually convex polygons, they tend to have sections where a line drawn between two points will be outside the polygon. But I guess we can store the order of points to make the polygon so when we reassemble it it’s correctly formed and then check the line?
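A rough sketch of that "check the line" step, assuming we pretend lat/longs are flat x/y coordinates (they aren't, which is the next problem). The function names and the L-shaped "river" are made up for illustration; the point is that keeping the ring in order lets us test the user-to-cafe line against each edge in turn, concave bends included.

```python
def _orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)


def segments_cross(a, b, c, d):
    """True if segment a-b properly crosses segment c-d (merely touching does not count)."""
    return (_orientation(a, b, c) * _orientation(a, b, d) < 0
            and _orientation(c, d, a) * _orientation(c, d, b) < 0)


def path_crosses_polygon(start, end, ring):
    """True if the straight line start-end crosses any edge of the ordered ring."""
    edges = zip(ring, ring[1:] + ring[:1])  # pair each vertex with the next, wrapping around
    return any(segments_cross(start, end, p, q) for p, q in edges)


# Hypothetical L-shaped (concave) river, vertices stored in drawing order.
river = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]

swim_required = path_crosses_polygon((-1, 2), (2, 2), river)   # cuts through the vertical arm
dry_walk = path_crosses_polygon((2, 2), (2.5, 2), river)       # both points sit inside the bend
```

Shuffle the vertex order and the edges change, so the same walk can flip between "fine" and "bring a towel". This also ignores paths that only graze a vertex, which a real geometry library handles for you.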

Now we’ve got the latitude and longitude and found a cafe, so it’s only a 0.002° walk to the cafe for that user! Although they submitted a complaint that they don’t know what that means, so let’s just convert it to metres… Oh okay apparently this doesn’t work because everything skews depending on whether our users are in Liechtenstein or Australia because the Earth is an oblate spheroid. By now your average developer is kinda wishing that all the Flat Earth Society members around the globe were correct, as it would make this much easier. Developers are second only to physicists in imagining a spherical cow.
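To put numbers on that, here is a minimal sketch using the haversine formula, which itself assumes a perfectly spherical Earth (so it is still the spherical cow, just a better fed one). The radius constant and the rough coordinates for Vaduz and Darwin are my own assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; treating Earth as a sphere is an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/long points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# The same 0.002 degrees of longitude, walked at three different latitudes.
at_equator = haversine_m(0.0, 0.0, 0.0, 0.002)
in_vaduz = haversine_m(47.14, 9.52, 47.14, 9.522)       # approximate location, Liechtenstein
in_darwin = haversine_m(-12.46, 130.84, -12.46, 130.842)  # approximate location, Australia
```

The Vaduz walk comes out noticeably shorter than the Darwin one, because lines of longitude squeeze together as you head towards the poles. For anything needing real accuracy you would reach for an ellipsoidal calculation from a geodesy library instead.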

Some open source packages should solve our problems.

Hell yeah let’s just `import gis` in Python.

Screw it, other people have worked out the hard part already. Let’s just use GDAL, OpenLayers, and PostGIS and she’ll be right. Now we can use fancy geography functions to handle points (our cafes and users) and polygons (rivers, international borders, classified military installations, and other obstructions our users complain that they can’t cross) and use fancy intersection queries to make sure they can get to their destination. We can easily display this in a cute webmap.

We haven’t got route planning in yet but we’ll do that later, I think that should only be 2 story points.

Unfortunately all our cafes and users appear in the ocean off the coast of Ghana? This was correct for the popup cafe at Null Island but not for anywhere else. After much dicking around we discover the submitted location entries are in WGS84 (EPSG:4326) while our web mapping provider uses Web Mercator (EPSG:3857) and now the developers are panicking at the amount they have to learn and the number of hacks they have to write to change coordinate reference systems on the fly. Thankfully projection libraries can sort that out.
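For the curious, the forward conversion from WGS84 degrees to Web Mercator metres is short enough to sketch by hand. This is the standard spherical Mercator formula; the function name is mine, and in a real app you would let a projection library do it rather than trusting this.

```python
import math

WEB_MERCATOR_RADIUS_M = 6_378_137.0  # WGS84 semi-major axis, which EPSG:3857 uses as a sphere radius


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Convert EPSG:4326 degrees to EPSG:3857 metres (valid for |lat| below roughly 85.05)."""
    x = WEB_MERCATOR_RADIUS_M * math.radians(lon_deg)
    y = WEB_MERCATOR_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Illustrative point, roughly Perth: longitude first, then latitude.
x_m, y_m = wgs84_to_web_mercator(115.86, -31.95)
```

It also explains the Null Island pile-up: degrees only run to 180 and 90, so if they get read as metres every cafe on Earth lands within a couple of hundred metres of (0, 0).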

Now we want to overlay drone photos on the Google-provided satellite basemap in our app so people can get a perfect view of their cafe. Unfortunately all the photos are skewed, none of them line up with the imagery? At about this time we discover that global coordinate systems like WGS84 and Web Mercator still have issues with skew over time and also accuracy issues given the problem of assigning one coordinate system across the entire globe. Thankfully it looks like there are over 5000 other coordinate systems we can use and reproject depending on the location. It turns out that all those cartographers and surveyors who spend years designing coordinate systems that help keep things accurate in their areas (and times!) of the globe were doing it for a reason.
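As one small example of picking a coordinate system by location, here is a sketch that chooses a WGS84 UTM zone from a longitude and latitude. UTM is just one family among those thousands and may well not be the right choice for your area; the function name is hypothetical and the special-case zones around Norway and Svalbard are deliberately ignored.

```python
import math


def utm_epsg_for(lon_deg, lat_deg):
    """EPSG code of the WGS84 / UTM zone containing a point (ignores the Norway/Svalbard exceptions)."""
    zone = min(int(math.floor((lon_deg + 180.0) / 6.0)) + 1, 60)  # zones are 6 degrees wide, numbered 1..60
    base = 32600 if lat_deg >= 0 else 32700                        # 326xx is northern, 327xx is southern
    return base + zone


perth_epsg = utm_epsg_for(115.86, -31.95)   # approximate coordinates
vaduz_epsg = utm_epsg_for(9.52, 47.14)      # approximate coordinates
```

National grids and local datums often beat UTM for accuracy, which is rather the point of all those surveyors' hard work.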

By now your average developer is starting to sweat and understand why this is a complex field.

Beyond Consumer Maps

A lot of what I’ve discussed so far is related to vector data; discrete collections of points used for marking locations in the real world of objects that we are representing in abstract. And we can already see that there is a fractal of complexity as we dive further down. As a side note, Benoit B. Mandelbrot’s development of fractal geometry came from the study of geography in the form of the coastline paradox.

But there’s so much more breadth and depth to GIS which buoys my love for it. When we start adding raster data (images mapped to real world locations) we open up whole new fields.

Scientific And Operational Analysis

Brilliant remote sensing and earth science people realised that using satellite imagery, especially with more bands than the red, green, and blue of the visible light spectrum, they can derive new measures like how dense greenery is in an area so you can tell the pasture growth rates in paddocks. Another measure is the soil-adjusted vegetation index which can inform you of the health of different vegetation by taking repeated measurements over time in areas you’re assessing for ecological damage due to climate change or drought.
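A minimal sketch of what those band calculations look like per pixel, assuming reflectance values between 0 and 1. The soil-adjusted vegetation index is the one named above; I am assuming the "how dense is the greenery" measure is the common normalised difference vegetation index (NDVI), and the 0.5 soil factor is just the usual default rather than anything tuned.

```python
def ndvi(nir, red):
    """Normalised difference vegetation index for one pixel's near-infrared and red reflectance."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom


def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index; soil_factor damps the influence of bare soil brightness."""
    denom = nir + red + soil_factor
    return 0.0 if denom == 0 else (nir - red) * (1 + soil_factor) / denom


# Made-up reflectances: healthy pasture bounces back a lot of near-infrared, bare dirt does not.
pasture = savi(nir=0.5, red=0.1)
bare_dirt = savi(nir=0.3, red=0.25)
```

Real workflows run this over whole arrays of pixels at once and then compare captures over time, but the arithmetic per pixel really is this small.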

You can also bring in multiple sources of data and combine them, such as wind observations and predictions along with the amount of unburnt land to determine the likely paths of bushfires. Combined with your understanding of roads and the locations of population centres you can know when to warn people to evacuate or where to send fire crews to be most effective. Or maybe you track the population of age brackets in different suburbs to find which schools need to have increased capacity based on where you’re likely to see more young families. Or find the most cost- and time-effective route for your band’s national tour of Australia that hopefully includes Perth.

Managing All This Data

From a developer and sysadmin standpoint there are also so many things you can and need to consider when handling GIS data. Using our earlier cafe finding app example we need to think about how to cleverly index that data so any queries can be done quickly. We need methods of speeding up those queries, so maybe we need to simplify some polygons, but how do we limit how much our simplification algorithm affects the accuracy of our results? How do we effectively limit our queries so we’re not sending megabytes of location data to a user on a slow connection to show them the map? Often you can preprocess some of it into a JPEG or PNG, but how do you decide when to do that, which images to cache so you’re not doing it a million times a minute, and how to slice it up or recombine it so that a user checking the app after walking five metres doesn’t trigger an entire rasterisation process?
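On the simplification question, one common answer is the Douglas-Peucker algorithm, where a tolerance caps how far any dropped point can sit from the simplified line. Below is a small recursive sketch on flat x/y coordinates; the names are mine, and for real data you would lean on something like PostGIS's `ST_Simplify` rather than this.

```python
import math


def _point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker: drop points that sit within `tolerance` of the simplified line."""
    if len(points) < 3:
        return list(points)
    distances = [_point_segment_distance(p, points[0], points[-1]) for p in points[1:-1]]
    worst = max(range(len(distances)), key=distances.__getitem__)
    if distances[worst] <= tolerance:
        return [points[0], points[-1]]       # everything in between is close enough to drop
    split = worst + 1                        # index of the farthest point in `points`
    left = simplify(points[:split + 1], tolerance)
    right = simplify(points[split:], tolerance)
    return left[:-1] + right                 # the split point appears in both halves, keep it once


riverbank = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
tidy = simplify(riverbank, tolerance=0.1)    # loses only the tiny wobble at (1, 0.05)
brutal = simplify(riverbank, tolerance=5)    # collapses to the two endpoints
```

The tolerance is in whatever units your coordinates are in, so a value that is harmless in metres can be catastrophic in degrees. Being recursive, this sketch would also need reworking for very long lines.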

When you start handling imagery data the challenges become even greater. Satellite images are commonly between 200MB and 10GB so how do you ingest and store that quickly? How do you handle it when you get a new image every few minutes? With that much data you have massive concerns about your storage bill, let alone your egress charges if you want to serve it back out again.

What if someone wants to view some of that imagery? They’re zoomed in to an area the size of a house but your 1GB satellite image covers the whole suburb. How can you quickly slice out just the section they’re looking at without having them get frustrated or bored waiting and closing the app? There’s a lot of work in formats like ECW and GeoTIFF that allows people to extract just the information they want, but now you as the GIS developer need to make sure your imagery conforms or find a way to transcode it efficiently. You also have the opposite problem of someone wanting to look at the whole of a satellite image on their mobile phone without downloading it - now you need to think about overviews and zoom pyramids so you’re not crashing their phone trying to render 10 billion pixels when their phone is only 1080x1920.
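The pyramid idea can be sketched with nothing but arithmetic: precompute copies of the image at half, quarter, eighth size and so on, then serve the coarsest copy that still has enough pixels for the screen. The function names and the 256 pixel stopping size are assumptions for illustration, not any particular format's rules.

```python
import math


def overview_factors(width, height, smallest=256):
    """Power-of-two decimation factors, stopping once the longest side would drop below `smallest`."""
    factors = []
    factor = 2
    while math.ceil(max(width, height) / factor) >= smallest:
        factors.append(factor)
        factor *= 2
    return factors


def pick_overview(width, height, screen_w, screen_h, factors):
    """Largest precomputed factor that still leaves enough pixels to fit the whole image on screen."""
    needed = max(width / screen_w, height / screen_h)  # downscale required to fit the screen
    usable = [f for f in factors if f <= needed]
    return max(usable) if usable else 1                # 1 means read the full-resolution image


# A 100,000 x 100,000 image is the 10 billion pixels from the example above.
levels = overview_factors(100_000, 100_000)
chosen = pick_overview(100_000, 100_000, 1080, 1920, levels)
```

With these numbers the phone ends up reading a copy that is 64 times smaller on each side, a few million pixels rather than ten billion, and only the final small resize happens on the device.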

The analysis side of it is also full of amazing challenges. When you’re dealing with a terapixel image you can’t just load it all in RAM so how do you extract windows for rapid processing? How do you compare images from two different drone captures over time when they have a sub-pixel misalignment?
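The usual trick for the RAM problem is to walk the image in fixed-size windows and only ever hold one in memory. Here is a minimal sketch of the bookkeeping, assuming a hypothetical 512 pixel tile; actually reading the pixels for each window is left to whatever raster library you use.

```python
def iter_windows(width, height, tile=512):
    """Yield (col_off, row_off, win_width, win_height) windows that cover the raster exactly once."""
    for row_off in range(0, height, tile):
        for col_off in range(0, width, tile):
            yield (col_off, row_off,
                   min(tile, width - col_off),    # the last column of windows may be narrower
                   min(tile, height - row_off))   # the last row of windows may be shorter


# A small made-up raster so the edge windows are easy to see.
windows = list(iter_windows(1000, 600))
covered_pixels = sum(w * h for _, _, w, h in windows)
```

Anything that needs neighbouring pixels, like a blur or the sub-pixel alignment problem, also needs the windows to overlap a little, which this sketch does not attempt.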

Conclusion

From my descriptions above you can also see where it intersects with other complex fields like time zones, geopolitical boundaries and the inherent laws and regulations of countries (border crossings, restricted airspaces, etc.), temporal data in general (that cafe closed 10 years ago buddy), queueing and routing theory for route planning, etc.

All of these challenges and many more are what makes it such a complex and interesting field. There’s so much it is used for now, so much more it can be used for, and so much more we can do to make it easier for people to access the data.

2023-08-14 Edit: Fixed spelling errors thanks to the good eyes of Shaggy and Shmouf.