PHP Shapefile 3.4.0
Invalid/corrupted Shapefile recovery, GeoJSON Feature properties support and more
23 January 2021
What’s new in Version 3.4.0
- Capability to ignore DBF and SHX files to recover corrupted Shapefiles
- Full GeoJSON Feature support with properties data
- Improved handling of Logical fields in DBF files
- Handling of unspecified bounding box in SHP and SHX file headers for empty Shapefiles
- Correct behaviour with DBF files for empty Shapefiles
- Increased tolerance coefficient to deal with extremely small areas when determining ring orientation
Capability to ignore DBF and SHX files to recover corrupted Shapefiles
While all three SHP, SHX and DBF files are deemed mandatory, there are cases where one would like to ignore the DBF and/or SHX files specifically, especially when dealing with incomplete or corrupted Shapefiles. The new ShapefileReader boolean constructor options Shapefile::OPTION_IGNORE_FILE_DBF and Shapefile::OPTION_IGNORE_FILE_SHX allow exactly that. It is worth noting that:
- When setting Shapefile::OPTION_IGNORE_FILE_DBF to true, data and field definitions will not be available.
- When setting Shapefile::OPTION_IGNORE_FILE_SHX to true, the library relies on the content length values in the record headers and assumes there are no unused bytes between records in the SHP file. Random access to specific records will not be possible: the ShapefileReader::getTotRecords() method will return the special value Shapefile::UNKNOWN, and calling the ShapefileReader::setCurrentRecord() method will raise a Shapefile::ERR_INPUT_RANDOM_ACCESS_UNAVAILABLE ShapefileException.
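To illustrate why sequential reading still works without the SHX index, here is a minimal sketch of walking an SHP file using only the record headers. It is not the library's code: scanShpRecordOffsets() is a hypothetical helper, and it relies on the same assumption stated above, namely that records are packed back to back after the 100-byte main file header. Each record header holds a big-endian record number and a big-endian content length expressed in 16-bit words.

```php
<?php
// Hypothetical helper (not part of PHP Shapefile): maps record numbers to
// byte offsets by following the content lengths found in the record headers.
function scanShpRecordOffsets(string $shp): array
{
    $offsets = [];
    $pos     = 100;            // the main SHP file header is 100 bytes long
    $size    = strlen($shp);

    while ($pos + 8 <= $size) {
        // Record header: record number and content length, both big-endian int32
        $header       = unpack('Nnumber/Nwords', substr($shp, $pos, 8));
        $contentBytes = $header['words'] * 2;   // length is stored in 16-bit words

        if ($pos + 8 + $contentBytes > $size) {
            break;             // truncated record: stop instead of guessing
        }
        $offsets[$header['number']] = $pos;
        $pos += 8 + $contentBytes;              // assumes no unused bytes in between
    }
    return $offsets;
}

// Synthetic data: empty header, then two records of 20 and 4 content bytes
$shp = str_repeat("\0", 100)
     . pack('NN', 1, 10) . str_repeat("\0", 20)
     . pack('NN', 2, 2)  . str_repeat("\0", 4);
$offsets = scanShpRecordOffsets($shp);
```

The total number of records is only known once the whole file has been scanned, which is consistent with getTotRecords() returning Shapefile::UNKNOWN in this mode.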
Full GeoJSON Feature support with properties data
The Geometry::initFromGeoJSON() method will now correctly load GeoJSON Feature properties into Geometries. Also, parsing is more robust and the Shapefile::ERR_INPUT_GEOJSON_NOT_VALID ShapefileException offers some details in case of malformed GeoJSON.
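As a rough idea of what loading a Feature involves, the sketch below splits a GeoJSON string into its geometry and its properties and reports what is wrong when the input is malformed. splitGeoJsonFeature() is a hypothetical function written for this illustration; it does not reproduce the library's parser or its exception types.

```php
<?php
// Hypothetical helper (not the library's parser): separates the geometry
// from the properties of a GeoJSON Feature, or accepts a bare geometry.
function splitGeoJsonFeature(string $geojson): array
{
    $decoded = json_decode($geojson, true);
    if (!is_array($decoded) || !isset($decoded['type'])) {
        throw new InvalidArgumentException('Malformed GeoJSON: missing "type" member');
    }
    if ($decoded['type'] !== 'Feature') {
        // A bare geometry object carries no properties
        return ['geometry' => $decoded, 'properties' => []];
    }
    if (!isset($decoded['geometry']['type'])) {
        throw new InvalidArgumentException('Malformed GeoJSON Feature: missing "geometry" member');
    }
    return [
        'geometry'   => $decoded['geometry'],
        'properties' => $decoded['properties'] ?? [],   // "properties" may be null
    ];
}

$feature = splitGeoJsonFeature(
    '{"type":"Feature","geometry":{"type":"Point","coordinates":[1,2]},"properties":{"name":"A"}}'
);
```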
Improved handling of Logical fields in DBF files
When reading Shapefiles, values "1" and "0" are now considered truthy and falsy respectively. These are not in the standard, but it seems some software out in the wild is using them.
When writing Shapefiles, the encoding and handling of different values and data types passed as input to Geometries has been improved: numbers are loosely cast to bool before conversion, truthy and falsy string values are strictly checked against the allowed ones (the first non-trimmable char is used, as before) and anything else is considered as null or not initialized.
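The rules above can be sketched roughly as follows. Both functions are hypothetical and only mirror the behaviour described here; in particular, the exact set of accepted characters (T, Y, F, N in either case, plus "1" and "0") is an assumption rather than something taken from the library's source.

```php
<?php
// Hypothetical sketch: interprets a raw Logical value using its first
// non-trimmable character. Accepted characters are an assumption.
function readLogical(string $raw): ?bool
{
    $trimmed = trim($raw);
    if ($trimmed === '') {
        return null;                       // empty: not initialized
    }
    $char = $trimmed[0];
    if (strpos('TtYy1', $char) !== false) {
        return true;
    }
    if (strpos('FfNn0', $char) !== false) {
        return false;
    }
    return null;                           // e.g. "?" or anything unexpected
}

// Hypothetical sketch: normalizes an input value before it is encoded.
function normalizeLogical($value): ?bool
{
    if (is_bool($value)) {
        return $value;
    }
    if (is_int($value) || is_float($value)) {
        return (bool) $value;              // numbers are loosely cast
    }
    if (is_string($value)) {
        return readLogical($value);        // strings are strictly checked
    }
    return null;                           // anything else: null
}
```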
Handling of unspecified bounding box in SHP and SHX file headers for empty Shapefiles
For broader compatibility with external software and file sources, the ShapefileWriter class uses negative Shapefile::SHP_NO_DATA_VALUE (which becomes positive!) for unspecified bounding box min coordinate values and regular Shapefile::SHP_NO_DATA_VALUE for max ones, while the ShapefileReader class ignores the bounding box altogether for empty Shapefiles.
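A small sketch of what such an "inverted" bounding box looks like once written as bytes. The NO_DATA constant below is only a placeholder with an assumed value (the real one is Shapefile::SHP_NO_DATA_VALUE, which is negative), and packEmptyBoundingBox() is a hypothetical helper. The bounding box is laid out as Xmin, Ymin, Xmax, Ymax, each a little-endian double.

```php
<?php
// Assumed placeholder value; with the library use Shapefile::SHP_NO_DATA_VALUE
const NO_DATA = -1.0e38;

// Hypothetical helper: builds the 32 bytes of an unspecified XY bounding box.
function packEmptyBoundingBox(float $noData = NO_DATA): string
{
    // Min values get the negated (hence positive) no-data value,
    // max values get the regular (negative) one.
    return pack('e4', -$noData, -$noData, $noData, $noData);
}

$bytes = packEmptyBoundingBox();
$bbox  = unpack('exmin/eymin/exmax/eymax', $bytes);
```

Since every min value ends up greater than the corresponding max value, a reader can tell that the box is unspecified rather than a real extent.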
Correct behaviour with DBF files for empty Shapefiles
Actually quite a corner case, but the library now allows changing the data structure (i.e. adding fields) of an empty Shapefile with no records opened in Shapefile::MODE_APPEND mode.
Increased tolerance coefficient to deal with extremely small areas when determining ring orientation
I had thought about changing the current algorithm in favour of the method described in this Wikipedia article, which I believe was used by JTS algorithm::Orientation::isCCW() and its GEOS port algorithm::Orientation::isCCW() before they switched to a more complex one.
The algorithm they use right now looks more refined and is said to be more robust. Nonetheless, I decided not to do a dumb copy/paste port without deeply understanding what is actually happening there and why, especially because it appears that JTS falls back to polygon area computation anyway for complex cases, which supposedly gives more accurate results with a broader range of invalid polygons/rings.
My mathematical knowledge and the time I have available to study the subject are very limited, so I decided to simply increase the tolerance coefficient in order to improve the support for extra small areas and not to sink too much time into a pure performance update that doesn’t make much sense right now.
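For readers unfamiliar with area-based orientation, here is a minimal sketch of the general idea using the shoelace formula. It is not the library's implementation: the function names are hypothetical, and the $tolerance parameter is just one assumed way of deciding when an area is too small to trust. Rings are expected to be closed, with the first point repeated at the end.

```php
<?php
// Hypothetical sketch: signed area of a closed ring (shoelace formula).
// Positive for counterclockwise rings, negative for clockwise ones.
function signedRingArea(array $ring): float
{
    $sum = 0.0;
    $n   = count($ring);
    for ($i = 0; $i < $n - 1; ++$i) {
        [$x1, $y1] = $ring[$i];
        [$x2, $y2] = $ring[$i + 1];
        $sum += $x1 * $y2 - $x2 * $y1;
    }
    return $sum / 2.0;
}

// Hypothetical sketch: true if clockwise, false if counterclockwise,
// null when the absolute area does not exceed the given tolerance.
function isRingClockwise(array $ring, float $tolerance = 0.0): ?bool
{
    $area = signedRingArea($ring);
    if (abs($area) <= $tolerance) {
        return null;           // too small (or degenerate) to decide
    }
    return $area < 0;
}

$square = [[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]];   // counterclockwise
$tiny   = [[0, 0], [1e-10, 0], [1e-10, 1e-10], [0, 1e-10], [0, 0]];
```

With this sketch, the tiny ring is still recognized as counterclockwise at zero tolerance, while a tolerance larger than its area (about 1e-20) makes its orientation undetermined.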
Download and documentation
Go to the Lab page: PHP Shapefile