Place+Names+-+parse+and+recombine

=Raison d'etre= toc This page responds to a request from vyger seeking a way to parse the Standardized Place name using SQLite akin to what might be done in Visual Basic with the InStr function: code ... With the RM geocoded table the Normalized is generally a 4 component field delimited by commas, what I would like is set the Name to the substring left of the third comma and the Abbrev to the substring left of the second comma... code =SQLite now has the INSTR function!= As of SQLite 3.7.15 dated 2012-12-12, the INSTR(X,Y) function was finally included. Until that time, I do not think it would have been possible to parse a comma delimited string in SQLite without using a higher level language, either to extend SQLite or to use SQLite as a data source. Now it has become feasible with those SQLite managers that have incorporated SQLite 3.7.15 or later. Unfortunately, as of this writing, SQLiteSpy has not been updated since 2011 and thus does not support the INSTR function; a new version that will has been promised but no timeline given. To carry out the development of a suitable query, I was fortunate to find that SharpPlus SQLite Developer has undergone recent revision and therefore does support it. So does SQLite Expert but only the paid version of SQLite Developer and SQLite Expert support the fake RMNOCASE collation needed to modify the Place Name. I'm hopeful that a future version of RM//trix// will incorporate the SQLite INSTR function and these queries. =Sample results of parsed Place names=

The results above show some of the contents of a temporary table xPlacePartsTable containing the PlaceID and Normalized columns from RM's PlaceTable, the comma placements within the values of Normalized and the parsing of Normalized into four parts. Because it is directly and uniquely related to PlaceTable via PlaceID, it is easy to assemble the Standardized Name parts and update the working Name and Abbrev accordingly. =Where are the commas? query= This query creates an initial temporary table xPlaceCommaTable with the columns from PlaceID to Comma3: code format="sql" -- PlaceCommaParse.sql /* 2013-02-17 Tom Holden ve3meo Creates a temporary table of non-empty Standardized Place names with the positions of up to three commas in the string. Can be used to parse out the 4 parts of the name for further use such as the generation of a 2-part Abbreviation and 3-part Name for reports. DROP TABLE IF EXISTS xPlaceCommaTable;

CREATE TEMP TABLE xPlaceCommaTable AS SELECT PlaceID ,Normalized ,Comma1 ,Comma2 ,Comma2 + INSTR(SUBSTR(Normalized, Comma2 + 1), ',') AS Comma3 FROM (   SELECT PlaceID        ,Normalized        ,Comma1        ,Comma1 + INSTR(SUBSTR(Normalized, Comma1 + 1), ',') AS Comma2    FROM ( SELECT PlaceID ,Normalized ,INSTR(Normalized, ',') AS Comma1 FROM PlaceTable WHERE PlaceType = 0 AND Normalized NOT LIKE '' )   ); code =Parse the Standardized Place names query= This query uses the initial temporary table to build an extended table with all the columns from the first plus the parsed parts of the Standardized Name as shown in the figure above: code format="sql" -- PlaceParse.sql /*  2013-02-17 Tom Holden ve3meo

Requires existence of table created by PlaceCommaParse.sql. Extracts the parts of a 4-part Standardized Place name and saves them to a temporary table */ DROP TABLE IF EXISTS xPlacePartsTable;

CREATE TEMP TABLE xPlacePartsTable AS SELECT * ,CASE WHEN Comma1 > 0 THEN SUBSTR(Normalized, 1, Comma1 - 1) ELSE Normalized END AS Place1 ,CASE WHEN Comma2 > Comma1 THEN SUBSTR(Normalized, Comma1 + 1, Comma2 - Comma1 - 1) WHEN Comma1 > 0 THEN SUBSTR(Normalized, Comma1 + 1) ELSE '' END AS Place2 ,CASE WHEN Comma3 > Comma2 THEN SUBSTR(Normalized, Comma2 + 1, Comma3 - Comma2 - 1) WHEN Comma2 > Comma1 THEN SUBSTR(Normalized, Comma2 + 1) ELSE '' END AS Place3 ,CASE WHEN Comma3 > Comma2 THEN SUBSTR(Normalized, Comma3 + 1) ELSE '' END AS Place4 FROM xPlaceCommaTable; code =Update Place Abbreviations with two parts of the Standardized Place name= This query writes up to the first two parts of the Standardized Name to the Abbrev column of PlaceTable: code format="sql" -- PlaceAbbrevUpdate.sql /* 2013-02-17 Tom Holden ve3meo Combines up to the first two parts of the Standardized Place name (the Normalized column) and places the concatenated result in the Abbrev column of PlaceTable for Places having non-empty Normalized fields.

Requires temp xPlacePartsTable generated by PlaceParse.sql or equiv. UPDATE PlaceTable SET Abbrev = (       SELECT CASE                WHEN Comma1 > 0                    THEN Place1 || ', ' || Place2                ELSE Place1                END AS Abbrev        FROM xPlacePartsTable        WHERE PlaceTable.PlaceID = xPlacePartsTable.PlaceID        ) WHERE PlaceID IN (       SELECT PlaceID        FROM xPlacePartsTable        ); code =Update the working Place name with three parts= A similar query can set the working Place Name to up to the first three parts of the Standardized name: code format="sql" -- PlaceNameUpdate.sql /* 2013-02-17 Tom Holden ve3meo Combines up to the first three parts of the Standardized Place name (the Normalized column) and places the concatenated result in the Name column of PlaceTable for Places having non-empty Normalized fields.

Requires temp xPlacePartsTable generated by PlaceParse.sql or equiv. AND (fake) RMNOCASE collation. UPDATE PlaceTable SET Name = (       SELECT CASE                WHEN Comma2 > Comma1                    THEN Place1 || ', ' || Place2 || ', ' || Place3                WHEN Comma1 > 0                    THEN Place1 || ', ' || Place2                ELSE Place1                END AS Name        FROM xPlacePartsTable        WHERE PlaceTable.PlaceID = xPlacePartsTable.PlaceID        ) WHERE PlaceID IN (       SELECT PlaceID        FROM xPlacePartsTable        ); code =Further ideas= With these queries as examples, others can be readily developed.
 * 1) Some may prefer just the first part of the Standardized Place name for the Place Abbreviation, i.e., typically the municipality's name alone.
 * 2) Those whose Standardized Names are more typically only 3 levels (e.g., Canadian places when geo-coded have only municipality, province, country) may prefer just two parts for the working Place Name (municipality, province).
 * 3) Perhaps revise the UPDATE query so that the working Place Name receives 2 parts of a 3-level Place and 3 parts of a 4-level place.
 * 4) The temporary table, xPlacePartsTable, when viewed is itself useful for inspecting Standardized Place names for incomplete names. Depending on the SQLite manager, it can be sorted and filtered for special views.
 * 5) xPlacePartsTable itself can be edited. Thus Place//n// values can be changed in it and UPDATEs run to propagate changes to the RM database. Note that TEMP tables are lost when the SQLite manager that created them closes its connection to the database nor are they available to another SQLite manager open at the same time.