Securing Identity Management - The Privacy Mandate
RLS - The Traditional Weak Link in the RHIO Security Chain
The centralized store of accessible demographic information employed by most RHIO implementations creates an unacceptable security risk. Data aggregation accentuates three critical risk factors that increase the likelihood that sensitive information will be improperly disclosed:
First, data aggregation increases the value of the centralized store, creating a lucrative target for potential attackers.
Second, it increases the number of entities that legitimately should have access to the central store; this in turn increases the number of avenues that can be compromised by attackers.
Third, a centralized store of sensitive data can become a valuable resource that may be susceptible to political pressure for legalized access by interests claiming a need to know. A concerted effort by the government to obtain data from the large Internet search engines is a compelling example of this third risk factor.
Blinded Record Linkage – The Solution
Methods must be deployed that can strongly secure this centralized data store. The CareEvolution RHIO Technology Platform (RTP) provides a solution to this challenge. The RTP achieves secure, performant record linkage in a distributed system by using a blinded directory for the centralized demographic data used in record location. A set of techniques is implemented to cryptographically (one-way) hash any demographic data that will be aggregated centrally. This ensures that patient demographic data stored in the centralized index is unrecoverable.
There are two direct results of hashing the centralized index:
World Class Security - From any plaintext string (e.g. “Smith”), a one-way hashing algorithm can quickly produce a long sequence of numbers (a “hash”) that represents the string “Smith”. However, taking this hash and reversing the algorithm to arrive at “Smith” would require years of computation, hence the term “one-way” hash.
Record Linking Challenge - Since hashes of similar strings, such as “Smith” and “Smit”, yield drastically different number sequences, the very process of hashing renders the traditionally preferred probabilistic record linking techniques inoperable. As a result, all contemporary providers of MPI or identity management solutions have avoided the formidable technical challenges posed by a crypto-hashed central directory. While this may have been acceptable when such solutions were intended to be implemented behind the security firewalls within an institution, we believe that extending a non-hashed centralized repository of demographic information across a region, let alone the country, poses an unprecedented and unwarranted privacy risk.
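The two results above can be illustrated with a minimal sketch. The source does not name a hashing algorithm, so SHA-256 is an assumption here, and the helper names blind and differing_hex_digits are hypothetical. The sketch hashes “Smith” and “Smit” and counts how many hex digits of the two digests differ.

```python
import hashlib


def blind(value: str) -> str:
    """Return a one-way hash of a normalized demographic string.

    SHA-256 is an assumption; the source does not say which algorithm is used.
    """
    normalized = value.strip().upper()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def differing_hex_digits(a: str, b: str) -> int:
    """Count the positions at which two equal-length hex digests differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


smith = blind("Smith")
smit = blind("Smit")
print(smith)
print(smit)
print(differing_hex_digits(smith, smit), "of", len(smith), "hex digits differ")
```

Although the inputs differ by a single character, the two digests have almost nothing in common, which is why a direct comparison of hashed values can only detect exact matches.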
There is a solution, though it is not technically straightforward. Sophisticated string processing techniques are available that allow for both the security of one-way hashing and effective probabilistic matching. Approximate matching in this scheme is accomplished using a technique called bigramming. Bigramming breaks up the source string into many derived strings built from its bigrams (pairs of adjacent characters, such as “Sm”, “mi”, “it”, and “th” for “Smith”). Each derived string is given a similarity score that indicates how similar it is to the source. Two strings that have been bigrammed can then be compared by determining whether they share a derived string. If so, the two derived similarity scores can be used to compute an overall “Dice score.” Using a bigramming technique to generate derived strings, and then hashing the derived strings, allows for approximate, blinded identifier matching. This is the technique employed by the CareEvolution Crypto-Record Locator Service (Crypto-RLS).
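The bigramming scheme described above might be sketched as follows. This is an illustration under stated assumptions, not the Crypto-RLS implementation: SHA-256 is assumed as the hash, each derived string is formed by dropping a few bigrams from the source's bigram set, and the similarity score of a derived string is stored as the pair (bigrams kept, bigrams in source). The function names and the max_drop parameter are hypothetical.

```python
import hashlib
from itertools import combinations


def bigrams(value: str) -> list[str]:
    """Return the sorted, de-duplicated bigrams of a normalized string."""
    s = value.strip().upper()
    return sorted({s[i:i + 2] for i in range(len(s) - 1)})


def blinded_derivatives(value: str, max_drop: int = 2) -> dict[str, tuple[int, int]]:
    """Map hash(derived string) -> (bigrams kept, bigrams in source).

    A derived string is the source's bigram set with up to max_drop
    bigrams removed (max_drop is a hypothetical tuning parameter).
    """
    grams = bigrams(value)
    total = len(grams)
    derived = {}
    for drop in range(0, min(max_drop, total - 1) + 1):
        for subset in combinations(grams, total - drop):
            digest = hashlib.sha256("|".join(subset).encode("utf-8")).hexdigest()
            derived[digest] = (len(subset), total)
    return derived


def blinded_dice(a: dict[str, tuple[int, int]], b: dict[str, tuple[int, int]]) -> float:
    """Best Dice score over all hashed derived strings the two sides share."""
    best = 0.0
    for digest in a.keys() & b.keys():
        kept, total_a = a[digest]
        _, total_b = b[digest]
        best = max(best, 2 * kept / (total_a + total_b))
    return best


smith = blinded_derivatives("Smith")
smit = blinded_derivatives("Smit")
print(round(blinded_dice(smith, smit), 3))  # 0.857
```

Only hashes and counts are compared, so the party holding the central index never sees a plaintext name. “Smith” and “Smit” share the derived string made of the bigrams “SM”, “MI”, and “IT”, giving a Dice score of 2 x 3 / (4 + 3), or about 0.857. Note that the number of derived strings grows quickly with string length, which is why this sketch caps the number of dropped bigrams.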


