LinkSolv 8.3 Help Pages and User Guide

How to Make Prior Estimates

 
  1. The linkage model should capture everything you know about the data. Try to use all of your prior knowledge when you set up data specifications and match specifications so that calculated probabilities will be as accurate as possible. The examples discussed here linked fake Crash records to Hospital records, but the guidelines apply to real data linkages as well. The oversights described are common mistakes.
  2. Total Matches. Total Matches was set to 1,200 based on an earlier linkage even though reported data suggested 1,830 true links for the current data sets.
  3. Error Probability. Error probabilities for each field were left at default values of 0.01 when preparing data sources Crash and Hospital even though fake data fields had been simulated with error probabilities of 0.02.
  4. Probability Different. It was expected that Hospital County would not always equal Crash County and that Hospital Hour would not always equal Crash Hour but probability of correct but different was left at default values of 0.00 for both fields.
  5. Goodness of Fit. Prior estimates of model parameters were not accurate because they did not incorporate all prior knowledge. Consequently, linkage counts by decile showed poor goodness of fit:
    CrashHospital__Fit10
    (Poor Use of Prior Knowledge)
    Chi Square p Value = 0.03

    Decile  PairsInDecile  ActualTrue  ExpectedTrue
    1       193            17.33       7.64
    2       194            128         100.67
    3       194            188.67      187.04
    4       194            194         193.49
    5       194            194         193.92
    6       193            193         192.98
    7       194            194         193.99
    8       194            194         194
    9       194            194         194
    10      194            194         194
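
As a quick check on the table above, the ActualTrue column can be totaled and compared with the Total Matches setting of 1,200 from item 2. The sketch below is plain Python, not a LinkSolv feature; the variable names are made up for illustration and the numbers are copied from the table.

```python
# Rows copied from the CrashHospital__Fit10 table (Poor Use of Prior Knowledge).
# Each tuple: (decile, pairs_in_decile, actual_true, expected_true)
poor_fit = [
    (1, 193, 17.33, 7.64),
    (2, 194, 128.0, 100.67),
    (3, 194, 188.67, 187.04),
    (4, 194, 194.0, 193.49),
    (5, 194, 194.0, 193.92),
    (6, 193, 193.0, 192.98),
    (7, 194, 194.0, 193.99),
    (8, 194, 194.0, 194.0),
    (9, 194, 194.0, 194.0),
    (10, 194, 194.0, 194.0),
]

TOTAL_MATCHES_SPECIFIED = 1200  # prior estimate from item 2

# Column totals across all ten deciles.
total_actual = sum(row[2] for row in poor_fit)
total_expected = sum(row[3] for row in poor_fit)

# Per-decile gap between observed and model-expected true links.
gaps = [(decile, actual - expected) for decile, _, actual, expected in poor_fit]

print(f"Specified Total Matches: {TOTAL_MATCHES_SPECIFIED}")
print(f"ActualTrue total:        {total_actual:.2f}")
print(f"ExpectedTrue total:      {total_expected:.2f}")
for decile, gap in gaps:
    print(f"decile {decile:2d}: actual - expected = {gap:+.2f}")
```

The ActualTrue total comes to about 1,691, well above the 1,200 that was specified, and the largest gaps between actual and expected counts fall in deciles 1 and 2.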
 
Bayesian Model Check. The Bayesian Model Check Report provided clues about errors in the linkage model. Observed Actual True equaled about 1,691, much greater than the 1,200 specified. Observed combined error probabilities for County and Hour were 0.156 and 0.133, respectively, much greater than the 0.02 specified. The linkage model was revised to reflect more prior knowledge. Total Matches was set to 1,830. Error probabilities were set to 0.02 for Crash fields and Hospital fields. Probability of correct but different was set to 0.10 for County and Hour fields (based on prior anecdotal evidence, not derived from linkage results). After the revisions, observed Actual True = 1,738 and the linkage model had much better goodness of fit:
CrashHospital__Fit10
(Good Use of Prior Knowledge)
Chi Square p Value = 0.80

Decile  PairsInDecile  ActualTrue  ExpectedTrue
1       276            1           1.80
2       276            7.33        4.27
3       276            18          12.13
4       276            77.33       84.45
5       276            253.33      259.25
6       276            276         275.71
7       276            276         275.98
8       276            276         276
9       276            276         276
10      277            277         277
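
The original and revised settings can be related to the observed combined error probabilities with a rough calculation. The sketch below assumes, purely for illustration, that a field agrees on a true link only when neither source record has an error and the two values are not legitimately different, and that these three events are independent. LinkSolv's own formula is not given here and may differ; the function name is hypothetical.

```python
def combined_disagreement(err_a: float, err_b: float, p_different: float = 0.0) -> float:
    """Probability that a field disagrees on a true link.

    Illustrative assumption (not taken from LinkSolv): errors in source A,
    errors in source B, and "correct but different" values are independent.
    """
    p_agree = (1.0 - err_a) * (1.0 - err_b) * (1.0 - p_different)
    return 1.0 - p_agree


# Original settings: default error 0.01 per source, probability different 0.00.
original = combined_disagreement(0.01, 0.01, 0.00)

# Revised settings: error 0.02 per source, probability different 0.10.
revised = combined_disagreement(0.02, 0.02, 0.10)

print(f"original settings: {original:.4f}")
print(f"revised settings:  {revised:.4f}")
```

Under this independence assumption the original settings imply a combined value of about 0.02, and the revised settings imply about 0.136, which is much closer to the observed values of 0.156 for County and 0.133 for Hour.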
 
Find Missing Links. One strategy to correct the shortfall in Actual True (1,738 is 95% of 1,830) would be to consider an additional match pass using County and Home Zip as join fields. County and Home Zip are equal on 55 of the 92 missing true links.
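
The shortfall figures can be reproduced from the ActualTrue column of the good-fit table. The following is a minimal arithmetic sketch in plain Python with illustrative variable names; the best-case figure at the end assumes that the extra match pass would recover every one of the 55 candidate links, which is not guaranteed.

```python
# ActualTrue by decile, copied from the table (Good Use of Prior Knowledge).
good_fit_actual_true = [1, 7.33, 18, 77.33, 253.33, 276, 276, 276, 276, 277]

expected_true_links = 1830   # Total Matches from reported data
joinable_missing = 55        # missing links equal on County and Home Zip

observed = round(sum(good_fit_actual_true))   # column total, rounded
missing = expected_true_links - observed      # true links not yet found
coverage = observed / expected_true_links     # share of true links found
joinable_share = joinable_missing / missing   # share reachable by the extra pass

# Best case only: assumes the extra pass recovers all 55 candidate links.
best_case_coverage = (observed + joinable_missing) / expected_true_links

print(f"observed true links: {observed}")
print(f"missing true links:  {missing}")
print(f"coverage:            {coverage:.1%}")
print(f"joinable share:      {joinable_share:.1%}")
print(f"best-case coverage:  {best_case_coverage:.1%}")
```

The column total rounds to 1,738, leaving 92 missing links; about 60% of those share County and Home Zip, so the additional pass could raise coverage from roughly 95% to at most about 98%.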
 