Events+-+Merge

toc =Intro= [|John_James] posted these requests for help in a [|post] to the Home page: Let's see what we might do.
 * 1) What I would like to do now is clean some data where indi’s have a death event and the gedcom contributor has used a custom “Cause of Death” event so in other words “Details” from “EventType 1021” into “EventType 2” where OwnerID is the same.
 * 2) The other thing I could achieve myself at present would be to change all EventType 1021 to EventType 2 but then I would have duplicate events to deal with and that is another query I would find most useful in the future.

=Copy Description from Cause of Death Event to Death Event Description= The Descriptions are contained in the Details column of EventTable, a TEXT type. We know the EventTable.EventType is 2 and 1021, respectively, for Death and Cause of Death (the second a custom FactType whose ID number may vary with subsequent or other imports. The only Death events we wish to modify are those for which there is a corresponding Casue of Death event, i.e., for a common person, designated by the common OwnerID. Because both Death and Cause of Death are solely individual FactTypes, not Family or other, we can safely ignore OwnerType. It's possible there might be more than one Death event and/or Cause of Death event for a person but we will assume that is not the case. Let's see how many pairs of the two events we have. Here is where the JOIN command is necessary.

Explicit JOIN
code format="sql" -- List of Death events with corresponding Cause of Death events SELECT Death.OwnerID AS [RIN Death] ,Cause.OwnerID AS [RIN Cause] ,Death.DATE AS [Date Death] ,Cause.DATE AS [Date Cause] ,Death.Details AS [Details Death] ,Cause.Details AS [Details Cause] FROM EventTable AS Death INNER JOIN EventTable AS Cause USING (OwnerID) WHERE Death.EventType = 2 AND Cause.EventType = 1021 code EventTable is given two aliases so that it can be JOINed to itself, each alias acting as an independent table with the exact same contents. The INNER JOIN of the two tables constrains the result set to only those records from the first table for which the criteria match with the second table. A LEFT JOIN would include all records from the first table.

Implicit JOIN using WHERE
Instead of the explicit JOIN command, we could have written this with an implicit JOIN as: code format="sql" ... FROM EventTable AS Death ,EventTable AS Cause WHERE Death.EventType = 2 AND Cause.EventType = 1021 AND Death.OwnerID = Cause.OwnerID code It would perform just as well either way.

Inspect the results
Now we can inspect the result set to see if we have any issues to be concerned about, primarily, more than one Death or Cause of Death event per person. This can be done by browsing the results but this could be tiring if the set is very large. A query of the query could be faster and make it more obvious: code format="sql" SELECT COUNT ,* FROM ( -- List of Death events with corresponding Cause of Death events SELECT Death.OwnerID AS [RIN Death] ,Cause.OwnerID AS [RIN Cause] ,Death.DATE AS [Date Death] ,Cause.DATE AS [Date Cause] ,Death.Details AS [Details Death] ,Cause.Details AS [Details Cause] FROM EventTable AS Death INNER JOIN EventTable AS Cause USING (OwnerID) WHERE Death.EventType = 2 AND Cause.EventType = 1021 ) GROUP BY [RIN Death] ,[RIN Cause] ORDER BY COUNT DESC code

COUNTing, GROUPing and green light
If the COUNT column is all 1's (and any values > 1 will be at the top of the list), we're good to go! The original query is wrapped inside an outer query which asks for all the columns of the inner query (the * does that) plus a count of all the records in the result set having each unique combination of the RIN for Death event and the RIN for the Cause event, the GROUP BY clause. This grouping could be simplified to count just one of the columns since the two are forced to be identical by the JOIN criterion USING (OwnerID) but I thought it would be instructive to show that multiple columns can be used to define the grouping. To put the largest counts at the beginning of the list, the query is sorted on the COUNT column in descending order.

Construct for new Death Description
Supposing that there is but one Death fact with but one Cause fact for each person, we can then proceed to copy the value from Cause Description to the Death Description. We will want a space character between the original Death description and the appended Cause description. If the Cause Description is empty, there is no point in appending anything. If the original Death desc ription is empty, there is no need for the space character.

Eliminate empty Cause Descriptions
We can eliminate any empty Cause descriptions by extending the constraints in the above query to include "AND Cause.Details NOT LIKE ''". To get a list of the EventIDs for Death events with mating non-empty Cause descriptions, the first query is revised thusly: code format="sql" -- List of Death events having non-empty Cause of Death event Descriptions SELECT Death.EventID FROM EventTable AS Death INNER JOIN EventTable AS Cause USING (OwnerID) WHERE Death.EventType = 2 AND Cause.EventType = 1021 AND Cause.Details NOT LIKE ''; code

Cause of Death event Description for any given Death event
The Cause of Death description for any particular Death event can be found: code format="sql" -- Cause of Death description for a given Death event SELECT Cause.Details FROM EventTable AS Cause WHERE Cause.OwnerID = 567 --(an example of the Death event's OwnerID) AND Cause.EventType = 1021; code

Test new Death event Description
Put together the Death Description and the Cause of Death Description with a space character between and let's see what we get. LTRIM will clear out the space character if the Death Details field is empty. The double bars || are SQLite's concatenate operator. Single quotes surround text so ' ' is one space character.

code format="sql" -- Test new Death description SELECT Death.OwnerID AS RIN ,LTRIM(Death.Details || ' ' || Cause.Details) AS "New Death Description" FROM EventTable AS Death ,EventTable AS Cause WHERE Death.EventType = 2 AND Cause.EventType = 1021 AND Death.OwnerID = Cause.OwnerID AND Cause.Details NOT LIKE ''; code Look up some of the persons by RIN in RootsMagic to review how the existing Death fact Description compares with the new one that will replace it.

UPDATE EventTable with new Death event Descriptions
Now let's revise the Death descriptions (make a backup of your database first!). We have to tinker with our queries to fit within the rules of the UPDATE command: code format="sql" UPDATE EventTable SET Details = LTRIM(Details || ' ' || ( -- Cause of Death description for a given Death event SELECT Cause.Details FROM EventTable AS Cause WHERE Cause.OwnerID = EventTable.OwnerID --(the OwnerID in the record being updated) AND Cause.EventType = 1021 )) WHERE EventID IN (       -- List of Death events with corresponding Cause of Death events        SELECT Death.EventID        FROM EventTable AS Death        INNER JOIN EventTable AS Cause USING (OwnerID)        WHERE Death.EventType = 2            AND Cause.EventType = 1021            AND Cause.Details NOT LIKE ''        ); code The OwnerID from EventTable for the record being updated is passed to the query that returns the Cause description from the record with a matching OwnerID. Only those records in EventTable whose EventID is in the list of Death EventIDs that have related non-empty Cause of Death descriptions are updated.

=Merging Events= The first half of this page addressed simply the appending of the Description (Details field) from one type of event to another. If that is all that is wanted from the one type of event, then it is a simple matter to delete all the records for that type of event. However, what if there are other elements of that event that we would like to bring over to the target event, e.g., Notes, Sources, Images? To do so involves a much more comprehensive and complicated procedure if they are not to be lost when the secondary event is deleted.

John is quite right that simply changing the event type from Cause of Death to Death could result in another problem to be addressed - now there will be two Death events where there was one before. They are more likely not complete duplicates, differing in some minor or major way, and that will make more difficult the identification of the pairs of events to be merged. Accurate and reliable pairing is fundamental to a successful automatic merging process. A general solution may require a procedure akin to RootsMagic's Duplicate Search Merge for People, i.e., a weighted scoring for similarities between events, manual selection of the primary, and manual initiation of the merge, one pair at a time.

Let's set that aside for John's case where we have but one Death event with but one Cause of Death event per person. We have demonstrated above that they can be readily paired.

What do we need to do with the data from the secondary event?
Just a bit mind-boggling...
 * 1) Date - discard in favour of Primary's
 * 2) Place - discard
 * 3) Place details - discard?
 * 4) Description - append as above
 * 5) Proof - discard or take the lower of the two
 * 6) Primary - set flag
 * 7) Sort Date - discard
 * 8) Private - discard? note that merging Death and Cause of Death forces them to share one privacy setting
 * 9) Note - append as we did for Description
 * 10) Sources - add to Primary (which opens up a whole additional investigation into merging sources!)
 * 11) Images - add to Primary if not duplicate (easier to do than Sources I think)
 * 12) Shares - add to Primary

---more to come---