GeoJSON doesn't encode the fact that the boundary points are common between adjacent polygons. When you simplify those polygons, each one is handled separately, and you end up with "slivers" where the boundaries are misaligned:
https://www.bing.com/images/search?q=map+slivers+betwen+poly...
TopoJSON solves this by encoding each such boundary only once. So when you simplify the polygons, they are all done together, and the same simplification applies to adjacent polygons. No more slivers!
From what I can tell, the top criticisms of GeoJSON are the under-enforced winding-order specification and the handling of geometries that cross the antimeridian.
I don't think I would trust a zebra or a giraffe for this task either.
https://maps.app.goo.gl/JH93ko96QcoLXuBJ9
https://maps.app.goo.gl/au53iTnsmNdFuEZV8
Even the one zoomed in on the state appears to use maybe 15-20 vertices max.
In the second one, if I squint real hard I can just barely make out one slight dogleg on the western border and one on the south. And that is partly because I knew to look for them in the zoomed-in map.
If we use, say, the Census TIGER/Line boundary definitions for the states, we are probably talking about hundreds of thousands of vertices, perhaps millions. You won't be using those in an online map without simplifying.
20-25 years ago I worked a lot with map data from otherwise high quality, and sometimes authoritative, sources like the USGS and NOAA that had this non-identical shared boundaries problem (in formats other than GeoJSON). If the format doesn't allow such mistakes to be expressed, then they have to fix their data to publish it in said format.
Is there much work developing or using TopoJSON these days? I haven't seen much about it in a few years.
I'm just saying that for the specific task I mentioned, GeoJSON, or any format (like shapefiles) that stores polygons individually, naturally leads to the "sliver" problem.
A nice processing pipeline is:
1. Convert GeoJSON to TopoJSON.
2. Run the simplification on the TopoJSON.
3. Convert the resulting TopoJSON back to GeoJSON.
The TopoJSON GitHub has tools for each of these steps.
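The core idea behind why this works can be sketched in plain TypeScript. This is a toy illustration of the shared-arc principle, not the real topojson-server/topojson-simplify API: the shared boundary is stored once as an arc, so one simplification of that arc applies to both polygons.

```typescript
// Toy illustration of TopoJSON's core idea: boundaries are stored once as
// "arcs", and polygons reference arcs by index. Not the real topojson API.

type Point = [number, number];
type Arc = Point[];

// Two unit squares side by side. Arc 0 is their shared edge (with an
// extra "wiggle" vertex); the remaining arcs are the outer edges.
const arcs: Arc[] = [
  [[1, 0], [1.1, 0.5], [1, 1]], // arc 0: shared boundary
  [[1, 1], [0, 1]],             // left square, top
  [[0, 1], [0, 0]],             // left square, left
  [[0, 0], [1, 0]],             // left square, bottom
  [[1, 0], [2, 0]],             // right square, bottom
  [[2, 0], [2, 1]],             // right square, right
  [[2, 1], [1, 1]],             // right square, top
];

// A polygon is a list of arc indices; ~i means "arc i, reversed"
// (the same convention TopoJSON uses).
const leftPoly = [0, 1, 2, 3];
const rightPoly = [4, 5, 6, ~0];

// A crude "simplification": keep only each arc's endpoints.
function simplifyArc(arc: Arc): Arc {
  return [arc[0], arc[arc.length - 1]];
}

const simplified = arcs.map(simplifyArc);

// Stitch a polygon's ring back together from (possibly reversed) arcs.
function stitch(poly: number[], arcTable: Arc[]): Point[] {
  const ring: Point[] = [];
  for (const idx of poly) {
    const arc = idx >= 0 ? arcTable[idx] : [...arcTable[~idx]].reverse();
    ring.push(...arc);
  }
  return ring;
}

const left = stitch(leftPoly, simplified);
const right = stitch(rightPoly, simplified);
// Both rings now contain the *same* simplified shared edge: no slivers.
```

Had each square carried its own copy of the wiggly edge, a per-polygon simplifier could have dropped different vertices on each side, which is exactly where the slivers come from.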
The dangerous part is that some tools assume this outright and will completely screw up calculations if you're assuming a flatland CRS. So you've got to be careful about checking and setting those parameters.
One nice thing is that the structure of GeoJSON works incredibly well in TypeScript. The `type` tags give you discriminated unions essentially for free, so you can walk entire geodatasets in a pretty comfortable way.
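A minimal sketch of what that looks like, with hand-rolled types (the complete set ships in the `@types/geojson` package); switching on the `type` field narrows the coordinate shape automatically:

```typescript
// Minimal hand-rolled GeoJSON types; "type" is the discriminant.
type Position = [number, number];

type Geometry =
  | { type: "Point"; coordinates: Position }
  | { type: "LineString"; coordinates: Position[] }
  | { type: "Polygon"; coordinates: Position[][] };

interface Feature {
  type: "Feature";
  geometry: Geometry;
  properties: Record<string, unknown> | null;
}

// Each branch sees the correct coordinate shape, checked at compile time.
function countVertices(g: Geometry): number {
  switch (g.type) {
    case "Point":
      return 1;                          // coordinates: Position
    case "LineString":
      return g.coordinates.length;       // coordinates: Position[]
    case "Polygon":                      // coordinates: Position[][]
      return g.coordinates.reduce((n, ring) => n + ring.length, 0);
    default: {
      const exhaustive: never = g;       // compile error if a case is missed
      return exhaustive;
    }
  }
}

const square: Feature = {
  type: "Feature",
  geometry: {
    type: "Polygon",
    coordinates: [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
  },
  properties: { name: "unit square" },
};
```

The `never`-typed default case means adding a new geometry variant to the union forces every switch over it to be updated, which is what makes walking large geodatasets comfortable.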
I thought the spec allowed you to specify the CRS, but I just checked the RFC: they removed that in the 2016 specification, and WGS84 is mandated. It does allow alternative CRSs by prior arrangement, but like you said, that requires a lot of care.
If you have old geojson in a different projection, will your library respect the crs field or will it simply misinterpret your data?
Wondering if anyone could shed light on the decision to remove it as a standard when projection seems to be a critical part of GIS.
It seems like they decided to just opt out of trying (see the yellow box in section 4): https://stevage.github.io/geojson-spec/#section-4
I think they should have completely backed off from touching on projections and datums in the format altogether. I.e., something like: “coordinates are 2- or 3-tuples whose values, in order, correspond to easting/longitude, northing/latitude, and elevation/altitude. See metadata for agreed-upon units and CRS/projection semantics. Standardizing on WGS84 is strongly encouraged when encoding data with an earth-resolvable datum.”
Because GeoJSON otherwise works fine for indoor spaces, video game spaces, fictional lands, other celestial bodies, etc. You just have to educate on the idea that there’s more to data compatibility than it being GeoJSON.
Sounds like Amazon
But thankfully there is also the SQLite-backed GeoPackage, which is not only more flexible but also much smaller. It takes some extra steps to get testing teams working due to its binary nature, but other than that it is the best format in geospatial data analysis.
Long live SQLite!
For example, for map tiles mbtiles (sqlite) files can be used. In many applications though, pmtiles files are better because they allow for http range requests.
[0] https://github.com/OpenDataDE/State-zip-code-GeoJSON/blob/ma... although you can generate newer versions from the last census.
For missing ones you have to fall back to distance-based estimates, and in my business that means your quote may be off and you're exposed.
That said, this is a textbook example of what I have always found so infuriating, personally, about working on commercial software, and one of the many reasons I ultimately moved into a non-software-writing role. The (very sensible and practical) shortcuts and tradeoffs that are commonly made due to time and cost constraints. The attitude of "well the vast majority of our use cases work, so we're done." I've always thought edge cases must be addressed. Something in my brain hurts when I knowingly release something where only 99% of cases work.
I can imagine this is probably the same thing some artists feel when they are commissioned to produce (in their view rushed, flawed, or incomplete) artwork for business purposes.
I only write software at home, as a hobby now, and this gives me the outlet to follow my heart around edge cases!
Also, I hear your point on SWE roles and don't disagree.
GeoPackages also allow you to set a proper CRS, which is not as easy in GeoJSON IIRC.
Getting your CRSes wrong is fun...
[0] https://docs.postgrest.org/en/v14/how-tos/working-with-postg...
[1]: https://github.com/PostgREST/postgrest/blob/f1d0e8ea2266077d...
[2]: PostGIS has https://postgis.net/docs/AsTopoJSON.html but it doesn't take a record.
I've found it very useful for storing geospatial data over time.
If you have a non-trivial number of data points to track, this is going to eat a ton of memory while also being pretty slow to encode/decode.
Imagine, for example, if we encoded this as binary: first 2 bytes for the feature type, next 2 bytes for the geometry type, 3 bytes for a fixed-point x, 3 bytes for a fixed-point y, and optionally the properties as a JSON blob in a trailing string. That's 10 bytes for all the coordinate stuff, fewer bytes than what currently stores the `"type": "Feature"` string.
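A rough sketch of that hypothetical layout in TypeScript (the field widths come from the comment above, not from any real format, and the fixed-point scaling is my own assumption):

```typescript
// Hypothetical 10-byte point encoding:
//   bytes 0-1: feature type, 2-3: geometry type,
//   bytes 4-6: fixed-point x, 7-9: fixed-point y.
function encode(featureType: number, geomType: number, x: number, y: number): Uint8Array {
  const buf = new Uint8Array(10);
  const view = new DataView(buf.buffer);
  view.setUint16(0, featureType);
  view.setUint16(2, geomType);

  // 24-bit fixed point: map lon/lat from [-180, 180] onto 0..2^24-1,
  // i.e. roughly 2.4 m of precision at the equator.
  const fix = (v: number) => Math.round(((v + 180) / 360) * 0xffffff);

  // Write a 24-bit big-endian value (DataView has no 3-byte setter).
  const write24 = (offset: number, v: number) => {
    buf[offset] = (v >> 16) & 0xff;
    buf[offset + 1] = (v >> 8) & 0xff;
    buf[offset + 2] = v & 0xff;
  };
  write24(4, fix(x));
  write24(7, fix(y));
  return buf;
}

const encoded = encode(1, 1, -122.4194, 37.7749); // 10 bytes total
```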
This is a fair critique; however, for any large GeoJSON, the coordinate arrays will dominate the size. I think it's also safe to assume this data will be gzipped at rest and over the wire, which will eliminate most of the "header" metadata size you mention. As you point out, a binary format would be much more efficient, and there are good examples like these, which are ~2-3x smaller in benchmarks:
https://flatgeobuf.org/ https://github.com/mapbox/geobuf
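The gzip point is easy to sanity-check with Node's built-in `zlib` (exact sizes will vary, but the repeated `"type": "Feature"` boilerplate compresses almost entirely away):

```typescript
// Gzip a synthetic FeatureCollection and compare sizes: the repeated
// key/type strings shrink to a tiny fraction of the raw JSON size.
import { gzipSync } from "zlib";

const features = Array.from({ length: 1000 }, (_, i) => ({
  type: "Feature" as const,
  geometry: { type: "Point" as const, coordinates: [i * 0.001, i * 0.002] },
  properties: null,
}));

const json = JSON.stringify({ type: "FeatureCollection", features });
const gzipped = gzipSync(json);
// gzipped.length is a small fraction of json.length.
```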
That said, I think GeoJSON should be compared against other human readable formats like KML, which has a lot of wasted space as well, while being more difficult to read/write.