We were working with Mapbox GL, and our app needed a polygon for every postal code in the US. Unsurprisingly, we ran into performance issues, since it had to draw 800 MB of geometries in a single layer.

While searching for ways to improve performance with large TopoJSON files, we found Mapshaper.org. This online service reduces the level of detail, and thus the size, of a dataset by reducing the number of vertices in its geometries.

Mapshaper is a tool for editing a number of geospatial formats. It has a solid set of command-line functions that include simplifying geometries, filtering, editing, erasing, etc.

The software is free to use and can be downloaded from GitHub. It also has a public website that can be used to simplify datasets and convert between multiple formats. The site even includes a console where more specific commands, such as filtering, editing and erasing, are available.
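For anyone who prefers to script the command-line tool rather than use the website, here is a minimal Python sketch that assembles and runs a Mapshaper command. It assumes the mapshaper executable is installed and on the PATH, and that the -simplify (with a percentage and keep-shapes) and -o format= options behave as described in Mapshaper's command reference. The file names and the helper functions are hypothetical, chosen only for illustration.

```python
import shutil
import subprocess
from typing import List


def build_mapshaper_command(src: str, dst: str, keep_percent: int,
                            out_format: str = "topojson") -> List[str]:
    """Assemble a mapshaper call that simplifies src and writes dst.

    keep_percent is the share of removable vertices to retain,
    passed to mapshaper as e.g. "50%".
    """
    if not 0 < keep_percent <= 100:
        raise ValueError("keep_percent must be in the range 1-100")
    return [
        "mapshaper", src,
        # keep-shapes asks mapshaper not to drop small polygons entirely
        "-simplify", f"{keep_percent}%", "keep-shapes",
        "-o", f"format={out_format}", dst,
    ]


def run_mapshaper(command: List[str]) -> None:
    """Run the command, failing early if mapshaper is not installed."""
    if shutil.which(command[0]) is None:
        raise RuntimeError("mapshaper executable not found on PATH")
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical input and output file names.
    cmd = build_mapshaper_command("zip_codes.json",
                                  "zip_codes_simplified.json", 50)
    print(" ".join(cmd))
    # run_mapshaper(cmd)  # uncomment once mapshaper is installed
```

Changing out_format to "geojson" would turn the same call into a format conversion.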

Simplifying the dataset

We had to use a large dataset of all the zip codes in the US, with a file size of around 800 MB. When we loaded the file into Mapshaper and zoomed in, we noticed that the polygons were extremely detailed, with even the smaller ones having roughly 100 vertices or more.
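The same check can be done outside the UI. The sketch below counts vertices per feature, assuming the data has first been converted to a GeoJSON FeatureCollection (TopoJSON stores shared arcs instead of per-polygon rings, so it would need converting first). The function names and the sample square are made up for illustration.

```python
from typing import Any, Dict, List


def count_vertices(geometry: Dict[str, Any]) -> int:
    """Count coordinate pairs in a GeoJSON Polygon or MultiPolygon.

    GeoJSON rings repeat their first point at the end, so that
    closing point is included in the count.
    """
    kind = geometry["type"]
    coords = geometry["coordinates"]
    if kind == "Polygon":
        return sum(len(ring) for ring in coords)
    if kind == "MultiPolygon":
        return sum(len(ring) for polygon in coords for ring in polygon)
    return 0  # points and lines are ignored in this sketch


def vertices_per_feature(collection: Dict[str, Any]) -> List[int]:
    """Return one vertex count per feature that has a geometry."""
    return [
        count_vertices(feature["geometry"])
        for feature in collection["features"]
        if feature.get("geometry")
    ]


if __name__ == "__main__":
    # Made-up sample: a single square polygon.
    sample = {
        "type": "FeatureCollection",
        "features": [{
            "type": "Feature",
            "properties": {},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
            },
        }],
    }
    print(vertices_per_feature(sample))
```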

zip code zoom in

The Mapshaper UI offers a slider to simplify the polygons (by averaging or removing vertices) once the ‘Simplify’ option on the toolbar is clicked. Here is an example of what can be achieved with different values at two zoom levels. We used a TopoJSON dataset of the state of Texas that can be found here:

tx_texas_zip_codes_geo.min.json

animated example

Resolution is the value of the slider on the site.

Reducing the resolution to 50% produces no visible change at a moderate zoom level, yet the file size drops dramatically, to 11.7% of the original. Even at 5%, the polygons only lose detail at their tips.
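To give an intuition for why so many vertices can be dropped without a visible change, here is a minimal sketch of Visvalingam-style simplification, one of the methods listed in Mapshaper's documentation. It is a plain, unweighted and unoptimized version written for illustration, not Mapshaper's actual implementation, and the keep_fraction parameter is only a rough analogue of the slider value. The idea is to repeatedly remove the vertex whose triangle with its two neighbours has the smallest area, since that vertex contributes least to the shape.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def triangle_area(a: Point, b: Point, c: Point) -> float:
    """Area of the triangle a-b-c, via the cross product."""
    return abs((b[0] - a[0]) * (c[1] - a[1])
               - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def simplify_line(points: List[Point], keep_fraction: float) -> List[Point]:
    """Drop the least significant interior vertices of a line.

    Removal stops once about keep_fraction of the points remain.
    The two endpoints are always kept. This naive version recomputes
    every area on each pass, so it is quadratic in the number of points.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in the range (0, 1]")
    target = max(2, math.ceil(len(points) * keep_fraction))
    kept = list(points)
    while len(kept) > target:
        areas = [
            triangle_area(kept[i - 1], kept[i], kept[i + 1])
            for i in range(1, len(kept) - 1)
        ]
        # +1 because areas[0] belongs to the vertex at index 1
        least_significant = areas.index(min(areas)) + 1
        del kept[least_significant]
    return kept


if __name__ == "__main__":
    # Made-up line: small wiggles around one pronounced peak at (3, 5).
    line = [(0, 0), (1, 0.1), (2, 0), (3, 5), (4, 0), (5, 0.2), (6, 0)]
    print(simplify_line(line, 0.5))
```

In the sample line the small wiggles are removed first while the peak survives, which matches the observation that the polygons keep their overall shape and only lose fine detail.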

Conclusions

The tool offers a useful way to reduce the amount of data, and therefore the loading time, of our applications. But it doesn’t stop there, since it can also convert between multiple geospatial file formats. This is very useful when looking for new datasets, which are sometimes only available as shapefiles or CSV files.