The problem: I am trying to add some logic to a database refresh job that will de-identify PHI-related fields, and have come across a strange side effect. I am attempting to exclude a group of patients, in order to allow that set to see their true information, and mask all of the other sets. When I run my update statement, it also updates a large portion of that set I was trying to exclude. Why is that, and how can I change my update statement to avoid that behavior?
The data model:
- Patients table – ID as the key, and GroupID to differentiate groups of patients. Also has an IsDeleted field (bit) to indicate soft deletion.t item
- Phone table – ID as the key, and has three business fields: AreaCode, PhoneNumber, and Extension. Also has an IsDeleted field (bit) to indicate soft deletion.
- PatientPhone – Has an ID as the key, but also uses the combination (PatientID, PhoneID) to link these tables.
The relationship: Patient to PatientPhone – 1 to Many Phone to PatientPhone – 1 to Many
In order to validate before and after, I have been using this query to sample the phone numbers by grouping the first couple of digits. It gives me a wide distribution of values, as I would expect:
SELECT P.GroupID, left(PH.AreaCode, 2) as ShortAreaCode, left(PH.PhoneNumber, 2) as ShortPhoneNumber, left(PH.Extension,2) as ShortExtension, count(*) FROM dbo.Patient P JOIN dbo.PatientPhone PP ON P.ID = PP.PatientID JOIN dbo.Phone_Copy PH ON PP.PatientID = PH.ID WHERE P.GroupID = 12345 GROUP BY P.GroupID, left(PH.AreaCode, 2), left(PH.PhoneNumber, 2), left(PH.Extension,2) ORDER BY count(*) desc, P.GroupID, ShortAreaCode, ShortPhoneNumber, ShortExtension
I then run this update statement against the Phone table to default every other patient’s phone number to 555-5555:
UPDATE PH SET AreaCode = '555', PhoneNumber = '5555555', Extension = NULL FROM dbo.Phone as PH INNER JOIN dbo.PatientPhone as PP on PH.ID = PP.PhoneID INNER JOIN dbo.Patient as P on P.ID = PP.PatientID WHERE P.GroupID <> 12345 --This is the GroupID where we want to retain phone numbers
However, when I re-run my validation query, I see a couple of problems:
- Thousands of records I did NOT want to update were updated.
- Many of the other groups of Patients were NOT updated.
(Sample of the output afterwards): In theory, the data model allows for phones to be related to more than one patient. We did have some issues in the past, and most of the phone numbers have been denormalized. I need help to understand:
- How I can explain this side effect of the update statement
- How I can adjust my update logic to exclude any ‘shared’ phone numbers, so that the set of patients not in one particular GroupID will have their phones defaulted
I would tend to believe the data (phones being shared) is likely the cause of this problem, but I am having trouble proving that. Can someone help me with this? Thanks for your time.
✓ Extra quality
ExtraProxies brings the best proxy quality for you with our private and reliable proxies
✓ Extra anonymity
Top level of anonymity and 100% safe proxies – this is what you get with every proxy package
✓ Extra speed
1,ooo mb/s proxy servers speed – we are way better than others – just enjoy our proxies!
USA proxy location
We offer premium quality USA private proxies – the most essential proxies you can ever want from USA
99,9% servers uptime
No usage restrictions
Perfect for SEO
We are working 24/7 to bring the best proxy experience for you – we are glad to help and assist you!