PostGIS normalize_geometry
PL/pgSQL function to remove spikes and simplify geometries with PostGIS
Contents
- Get it
- Background
- Synopsis
- Input parameters
- How normalization works
- Return value
- Examples
- History
- Comments and Ideas
Get it
GitHub
View project on GitHub or download the latest release.
Background
The initial proof-of-concept about checking the angles between adjacent points was provided by the well-known Andreas Schmidt & Nils Krüger’s function Spike-Remover (kudos to them).
But I felt that wasn’t enough for a generic purpose normalizing function. It produced some false positives and failed to detect many other cases which ultimately made it unusable for me.
I wrote a brand-new algorithm integrating the angle checks with area and points distance ones, building a production-ready function that can be safely used in automated queries.
What it does
The function decomposes any POLYGON
, MULTIPOLYGON
and MULTILINESTRING
into single LINESTRING
, then it iterates over all the points taking into account 3 at a time. When conditions are met depending on the input parameters, the point which produces the unwanted condition is removed.
This effectively removes spikes and points lying on the same straight line, producing normalized geometries.
Synopsis
geometry normalize_geometry(
PAR_geom geometry
, PAR_area_threshold double precision
, PAR_angle_threshold double precision
, PAR_point_distance_threshold double precision
, PAR_null_area double precision
, PAR_union boolean DEFAULT true
) geometry;
Input parameters
PAR_geom
The input geometry. It can be any PostGIS geometry type, but the function will actually do something only on POLYGON
, MULTIPOLYGON
, LINESTRING
and MULTILINESTRING
types, while passing through unchanged anything else.
PAR_area_threshold
Expressed in square units
, depending on the geometry SRID. For example, if the geometry SRID is an UTM one, this value is assumed to be expressed in square meters.
PAR_angle_threshold
Expressed in decimal degrees
(half a degree is 0.5
and a round angle is 360
).
PAR_point_distance_threshold
Expressed in linear units
, depending on the geometry SRID. For example, if the geometry SRID is an UTM one, this value is assumed to be expressed in meters.
PAR_null_area
Expressed in square units
, the same as PAR_area_threshold
.
PAR_union
Set this parameter to false
if you whish to recollect single parts of multigeometries using ST_Collect()
instead of ST_Union()
. More on that here.
What about the other geometry types?
POINT
and MULTIPOINT
are returned unchanged: a POINT
is as simple as it gets, while a MULTIPOINT
cannot have a spike by definition and if it has more than one point with the same coordinates then it isn’t a valid geometry, thus you need to make it valid, not normalize it.
GEOMETRYCOLLECTION
are also returned unchanged: this is because I never use GEOMETRYCOLLECTION
and I didn’t bother about treating them. Actually, to me it doesn’t sound like a brilliant idea to work and normalize directly a GEOMETRYCOLLECTION
, as too many unexpected results may arise. Maybe I will spend some time thinking about it in the future.
How normalization works
The function analyzes all the adjacent points in the input geometry in groups of three.
Now imagine a triangle is drawn connecting those three points. The central point of a group is removed in one of the following cases:
- The area of the triangle is smaller than
PAR_area_threshold
and the angle corresponding to the central point is smaller thanPAR_angle_threshold
. - The area of the triangle is smaller than
PAR_area_threshold
and the angle corresponding to the first or the last point is smaller thanPAR_angle_threshold
while the distance between the other two points is smaller thanPAR_point_distance_threshold
. - The area of the triangle is smaller than
PAR_null_area
, regardless of the angles.
A more technical explaination
Considering 3 adjacent points Pn-1
, Pn
and Pn+1
, the point Pn
will be removed in one of these cases:
Case 1 - Removing spikes
Both the following conditions must be met:
- The area obtained connecting those points (i.e.: the area of the triangle formed by
Pn-1
,Pn
andPn+1
points) is equal or smaller thanPAR_area_threshold
. - The angle in
Pn
is equal or smaller thanPAR_angle_threshold
OR
the distance betweenPn-1
andPn
is equal or smaller thanPAR_point_distance_threshold
and the angle inPn+1
is equal or smaller thanPAR_angle_threshold
OR
the distance betweenPn
andPn+1
is equal or smaller thanPAR_point_distance_threshold
and the angle inPn-1
is equal or smaller thanPAR_angle_threshold
.
Case 2 - Removing point lying almost on the same straight line
The area obtained connecting those points (i.e.: the area of the triangle formed by Pn-1
, Pn
and Pn+1
points) is equal or smaller than PAR_null_area
.
Some considerations and uses of PAR_null_area
parameter
Leaving the parameters PAR_area_threshold
, PAR_angle_threshold
and PAR_point_distance_threshold
set to 0
and using only the last parameter PAR_null_area
, some interesting results can be obtained: the specific value 0
for PAR_null_area
will remove all the useless points lying exactly on the same straight line, while providing any value greater than 0
for PAR_null_area
the function can effectively be used to simplify your geometries. See Example 3.
If (for any undisclosed reason) you don’t even want to remove the points lying on the same straight line, use a value smaller than 0
for PAR_null_area
(e.g.: -1
) and this feature will be disabled.
It is implicit that every polygon or polygon inner ring whose area is smaller than PAR_null_area
will be entirely removed.
Return value
The output of the function is a normalized geometry
(what else did you expect?). In case a geometry is entirely removed (less than 3 points are left for a polygon, less than 2 points are left for a linestring, a polygon area is smaller than PAR_null_area
) NULL
will be returned.
By default, the output is wrapped by PostGIS function ST_Union() which is used to recollect the single parts of MULTI*
geometries given in input (remember that input geometries are decomposed into single LINESTRING
to perform the normalization).
While this should usually be convenient, there might be other cases where one would like to treat the single parts of multigeometries separately (e.g.: not necessarly dissolving multipolygon’s boundaries, etc.). Then just set PAR_union
to false
and ST_Collect() will be used instead of ST_Union()
. A ST_Dump
will then provide all the single parts. See Example 2.
Examples
Example 1 - Basic usage
Those parameters are the ones I use most of the times:
SELECT normalize_geometry(t.geom, 0.5, 0.5, 0.005, 0.0001) FROM my_table;
They will successfully normalize geometries like:
LINESTRING(0 0, 2 2, 0 4, -5 4, 0 4.001, 2 6)
POLYGON((0 0, 1 1, 2 1, 3 0.5, 2 -3, 3 0.499, 0 0))
into:
LINESTRING(0 0, 2 2, 0 4, 2 6)
POLYGON((0 0, 1 1, 2 1, 3 0.5, 0 0))
Example 2 - Filter single parts of multigeometries
Setting the parameter PAR_union
to false
, the output will be recollected with ST_Collect()
, so a ST_Dump()
can be used to obtain the single parts of multigeometries:
SELECT (ST_Dump(normalize_geometry(geom, 0.5, 0.5, 0.005, 0.0001, false))).geom FROM my_table;
If you are on PostgreSQL 9.3+, here is what your average query would look like, taking advantage of LATERAL
and including some filtering condition:
SELECT n.geom [, other-fields]
FROM my_table AS t
CROSS JOIN LATERAL (SELECT (ST_Dump(normalize_geometry(t.geom, 0.5, 0.5, 0.005, 0.0001, false))).geom) AS n
WHERE [some-condition];
If you are on an older version of PostgreSQL, a subquery will do:
SELECT * FROM (
SELECT (ST_Dump(normalize_geometry(geom, 0.5, 0.5, 0.005, 0.0001, false))).geom [, other-fields]
FROM my_table
) AS subq
WHERE [some-condition];
Example 3 - Use the function to simplify geometries
Both ST_Simplify() and ST_SimplifyPreserveTopology() use the Douglas-Peucker algorithm. Who does really (I mean, FOR REAL!) know how their tolerance
parameter works and is able use it in a predictable way? Personally, I’ve had some disastrous results with them.
Using normalize_geometry
with all threshold parameters set to 0
and focusing just on PAR_null_area
, we have a pretty powerful geometry-simplifying function, try it!
-- Remove only points lying on the same straight line
SELECT geometry normalize_geometry(geom, 0, 0, 0, 0) FROM my_table;
-- A higher PAR_null_area value will simplify the geometry removing more points
SELECT
normalize_geometry(geom, 0, 0, 0, 0) AS geom_A
, normalize_geometry(geom, 0, 0, 0, 0.2) AS geom_B
, normalize_geometry(geom, 0, 0, 0, 0.5) AS geom_C
, normalize_geometry(geom, 0, 0, 0, 1) AS geom_D
, normalize_geometry(geom, 0, 0, 0, 3) AS geom_E
FROM (
SELECT ST_Buffer('POINT(0 0)', 10, 12) AS geom
) AS s;
History
28 May 2020 - Version 1.2.1
18 July 2018 - Version 1.2.0
6 Dicember 2016 - Version 1.1.0
2 June 2016 - Version 1.0
Comments and Ideas
Want to leave a comment or give me an idea? Send me an or use the comments section below.