9. Deeper Insight Into Healthcare Data and Data Sovereignty
Over the past decade, big data, machine learning, deep learning, LLMs (Large Language Models), and many other data-driven innovations have been widely discussed, so most people know that data matters.
In healthcare in particular, data measured scientifically and systematically has been used for decades. A prime example is clinical trials, which use data to evaluate the safety and effectiveness of new treatments. Typical data generated during clinical trials includes lab values measured by hospital testing equipment and gene sequences. Such data can be organized into well-structured spreadsheets and used to validate the efficacy and safety of medications, uncover new statistical findings, and more.
In recent years, healthcare data has expanded beyond data generated in hospitals to include data collected from wearable devices, smartphone sensors, patient self-reports, and other sources that are part of a patient's daily life outside the hospital. Combining these different types of data can create a holistic understanding of a patient's health that takes into account factors beyond prescribed medications. This is called real-world data, and it has gained traction through recent research on COVID-19 vaccines and digital therapeutics.
As mentioned above, data is already playing an important role in healthcare. In the next chapter, we'll explore the specific types of healthcare data and the challenges of acquiring and utilizing it.
There are many definitions and categorizations of healthcare data. In this chapter, we will introduce the concepts and methods of categorizing healthcare data used in this paper.
Most major countries, including the U.S., use personal identifiability as an important criterion for classifying data. Avoiding the risk of privacy breaches is important, but health data in particular can be combined with other valuable data to drive innovations that improve the health of patients and individuals. This requires a delicate approach that balances privacy protection against data utilization.
The most prominent laws that follow this approach are HIPAA and HITECH in the U.S. These two laws set out fundamental principles for the protection and use of health information and divide health information into three categories. Health information that does not fall into any of these categories is still subject to general privacy laws.
| Data type | Identifiable | Consent required for use | Use for research purposes |
|---|---|---|---|
| PHI (Protected Health Information) | Yes | Yes | Possible after IRB review |
| DHI (De-identified Health Information) | No | No | Possible after IRB review |
| LDS (Limited Data Sets) | No (under a somewhat relaxed standard) | No (exempt for research purposes) | Possible after submitting a data use agreement that prohibits re-identification and passing IRB review |
PHI (Protected Health Information)
PHI is defined as individually identifiable health information that is created, received, transmitted, or maintained by a HIPAA-covered entity (a healthcare provider, health plan, or healthcare clearinghouse) or its business associate, and that relates to an individual's (1) past, present, or future physical or mental health condition, (2) health insurance information, or (3) healthcare payment status.
Except in certain cases, such as those serving the public interest, PHI may be used, collected, or disclosed for purposes other than treatment only with the patient's consent. Research organizations and others may use PHI for research purposes with the approval of an institutional review board (IRB).
DHI (De-identified Health Information)
Health information is recognized as DHI under either of two methods: (1) the Safe Harbor method or (2) the Expert Determination method. The Safe Harbor method removes the 18 types of identifiers listed below. Under the Expert Determination method, a person with appropriate knowledge of and expertise in statistical or scientific methods for assessing identifiability must determine that the risk of identifying an individual is very small, even when the information is combined with other available information, and must document the methods and results of that analysis.
Institutions regulated by HIPAA may use and disclose DHI freely. However, if the information turns out to be identifiable despite this process, it is treated as PHI.
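The 18 Safe Harbor identifiers are:
1. Names
2. Geographic subdivisions smaller than a state (street address, city, county, ZIP code, with a limited exception for the first three digits of a ZIP code)
3. All elements of dates (except year) directly related to an individual, such as date of birth, admission date, discharge date, and date of death, as well as all ages over 89
4. Telephone numbers
5. Fax numbers
6. Email addresses
7. Social Security numbers (SSN)
8. Medical record numbers
9. Health plan beneficiary numbers
10. Account numbers, such as bank account numbers
11. Certificate and license numbers
12. Vehicle identifiers and serial numbers, including license plate numbers and VINs
13. Device identifiers and serial numbers
14. Web URLs
15. IP addresses
16. Biometric identifiers, including fingerprints and voiceprints
17. Full-face photographs and comparable images
18. Any other unique identifying number, characteristic, or code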
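To make the Safe Harbor method more concrete, the following minimal Python sketch strips identifier fields from a single structured record and keeps only the year of each date. It is an illustration under stated assumptions, not a compliance tool: the field names are hypothetical, and it does not handle free-text notes, the three-digit ZIP code exception, or the rule that dates indicating an age over 89 must also be aggregated.

```python
from datetime import date

# Hypothetical field names for a flat record; a real EMR export needs its own
# mapping onto the 18 Safe Harbor identifier categories.
IDENTIFIER_FIELDS = {
    "name", "street_address", "city", "zip_code", "phone", "fax", "email",
    "ssn", "medical_record_number", "health_plan_id", "account_number",
    "license_number", "vehicle_id", "device_serial", "url", "ip_address",
    "biometric_id", "photo",
}
DATE_FIELDS = {"birth_date", "admission_date", "discharge_date", "death_date"}


def safe_harbor_strip(record: dict) -> dict:
    """Simplified Safe Harbor pass: drop identifiers, keep only the year of dates."""
    result = {}
    for key, value in record.items():
        if key in IDENTIFIER_FIELDS:
            continue  # remove the identifier entirely
        if key in DATE_FIELDS and isinstance(value, date):
            result[key.removesuffix("_date") + "_year"] = value.year
        elif key == "age" and isinstance(value, int) and value > 89:
            result[key] = "90+"  # ages over 89 are aggregated into one category
        else:
            result[key] = value
    return result


record = {
    "name": "Jane Doe",
    "zip_code": "02139",
    "birth_date": date(1950, 3, 1),
    "age": 73,
    "diagnosis_code": "E11",
}
print(safe_harbor_strip(record))
# {'birth_year': 1950, 'age': 73, 'diagnosis_code': 'E11'}
```

The sketch requires Python 3.9 or later for str.removesuffix.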
LDS (Limited Data Sets)
LDS is similar to DHI under the Safe Harbor approach in that it is information that has been stripped of identifiers, but it is subject to more relaxed standards and may include some date information (date of birth, date of admission, date of discharge, etc.) and information such as zip code and place of residence (state, city).
In exchange, it requires users of the information, such as researchers, to submit a data use agreement that prohibits re-identification and describes how they will prevent misuse of the data. If the information is used for certain purposes (research, public health, or healthcare operations), it can then be used without the patient's consent after IRB review. In other words, it places the responsibility for preventing re-identification on the user, making it easier to put the information to valuable use.
Beyond identifiability, there are many other ways to categorize data, such as whether it is structured, who created it and how, what it is used for, and what it describes. However, rather than applying strict classification criteria or describing every type in detail, this whitepaper introduces the representative types that matter most for their use value and explains how each is utilized.
Clinical Data
Clinical data is the most representative type of healthcare data. It consists of patient information generated when medical institutions such as hospitals perform diagnoses, injections, tests, surgeries, and other procedures. It therefore covers a wide range of items, from structured numerical test results to medical records written in natural language and digital medical images (X-ray, CT, MRI, ultrasound, endoscopy, etc.).
When stored electronically, this information is called an EMR (Electronic Medical Record), and the complete set of an individual's medical information stored across many places is called an EHR (Electronic Health Record). Clinical data is PHI (Protected Health Information) from the moment it is generated. Medical institutions are legally obliged to store it, and access to and use of the data by other institutions is strictly prohibited, except by the patient.
Omics data
Omics data is an umbrella term for data on biological systems such as the genome, transcriptome, proteome, metabolome, and microbiome. Each has distinctive features, and accumulating and analyzing the related data at large scale is expected to enable personalized medical services.
Genome data is the most representative type of omics data. It represents the genetic code recorded in DNA, which determines individual traits, as a sequence of the letters A, T, G, and C, much like a cryptogram. Analyzing genome data is indeed like decoding that cryptogram: the main task is to determine how differences in one or more nucleotides at specific positions distinguish one individual from another. In particular, genetic mutations account for more than 80% of the causes of rare diseases, so decoding this cryptogram is important for identifying the genes that cause disease.
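The comparison described above can be sketched in Python for the simplest case. The function below assumes two sequences that are already aligned to the same length and reports every position where a single nucleotide differs; real genome analysis relies on dedicated alignment and variant-calling tools and must also handle insertions, deletions, and sequencing errors.

```python
def find_snvs(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are already aligned to the same length, so only
    single-nucleotide substitutions are detected (no insertions or deletions).
    """
    reference, sample = reference.upper(), sample.upper()
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    if not set(reference + sample) <= set("ATGC"):
        raise ValueError("sequences may contain only A, T, G, and C")
    return [
        (pos, ref_base, sample_base)
        for pos, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


print(find_snvs("ATGCGTAC", "ATGAGTCC"))  # [(4, 'C', 'A'), (7, 'A', 'C')]
```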
Recent advances in machine learning and big data analytics make it possible to analyze omics data in combination with clinical data. This enables early diagnosis and the discovery of biomarkers that predict and measure treatment response.
PGHD (Person-generated Health Data)
PGHD is data generated without depending on external institutions: data from the sensors of wearable devices, smartphones, and other devices owned by patients or individuals, as well as self-reported data such as social media posts and survey responses. A distinguishing feature is that it can be collected frequently during daily life, without visiting a hospital.
SDOH (Social Determinants of Health)
SDOH data describes external social, economic, and environmental factors that affect health, such as demographic statistics, social or political conditions, climate, and the environment.
Research Data
Research data is generated by laboratories, pharmaceutical companies, or hospitals working in medicine, pharmacy, or the life sciences in order to develop new treatments. Typical examples are the results of clinical trials or other studies. Existing clinical or omics data can also become research data when it is reused or collected for research.
Research data is usually collected under rigorous scientific designs, gathered systematically, and verified by academia and review bodies. Before a study begins, review committees such as an IRB also assess the legality and appropriateness of the study subjects and collection methods, which is why research data tends to be of high quality.
Other Data
Some data is not related to health itself but becomes meaningful when combined and analyzed with other healthcare data; personal payment information is one example. Regular payments to a fitness center, for instance, may not seem health-related on their own. However, if certain health indicators improve or worsen over the same period, relating the payment information to those indicators could help predict changes in a person's health.
As this example shows, data can be far more valuable when different types are combined. Making effective use of such data in combination with conventional healthcare data is therefore an important task for the future.
Data is being used to improve health outcomes and pave the way for new advances in disease treatment. However, despite the immense amounts of money invested and the benefits promised by big-data-driven technologies, actual results have fallen short of expectations. A primary reason for this shortfall is the lack of data that is reliable, collected over long periods, and linked across sources.
In simpler terms, beyond artificial intelligence or big data technologies, the quality and quantity of the foundational data are crucial for innovation in healthcare. In this chapter, we will explore the challenges associated with obtaining sufficient, high-quality healthcare data.
Type 1. Difficulties in Combining and Analyzing Data Due to Anonymization
An individual's medical information is one of the most sensitive types of personal data. It is increasingly mandated by laws worldwide to be protected in very rigorous ways. The most prevalent protections are pseudonymization and anonymization. These processes de-identify data, making it challenging or even impossible to identify an individual, thereby reducing the risk of harm from data breaches or misuse. De-identified data, when securely anonymized or pseudonymized, can be freely used for certain purposes, such as research to develop new drugs and treatments. Consequently, major countries are enacting legislation that allows data to be used for valuable purposes while minimizing the risk of personal identification.
Nevertheless, these privacy measures inevitably limit how much value can be extracted from the information. Pseudonymization often generalizes or groups data values: for instance, a 33-year-old is described as being in their 30s, a person weighing 87 kilograms is described as weighing 80-90 kilograms, and so forth. Depending on the purpose of data use, this loss of precision can be unacceptable. Furthermore, combining data sets allows them to be analyzed more comprehensively and often yields results that no single data set could provide, and de-identification makes such combination very difficult.
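The kind of value coarsening described above can be illustrated with a minimal Python sketch; the band widths are arbitrary choices for the example. The last lines show the cost: once two different ages fall into the same band, an analysis that needs the exact value can no longer tell them apart.

```python
def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with the band it falls in, e.g. 33 -> '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_weight(weight_kg: float, width: int = 10) -> str:
    """Replace an exact weight with a [low, low + width) band, e.g. 87 -> '80-90 kg'."""
    low = int(weight_kg // width) * width
    return f"{low}-{low + width} kg"


print(generalize_age(33))     # 30-39
print(generalize_weight(87))  # 80-90 kg

# Information loss: distinct exact values become indistinguishable.
print(generalize_age(31) == generalize_age(38))  # True
```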
Type 2. Difficulties in Transmitting and Using Data Due to Protection Based on Where It Was Generated
Even if the data is not personally identifiable, it may face rigorous protection simply because it was generated in a hospital or pertains to genetic information, thus creating significant obstacles in its practical use. For instance, as of November 2022, in South Korea, if a patient measures their blood sugar at home with a personal device, it is categorized as health information and can be freely transmitted and used for any purpose. However, if the same blood sugar information is collected in a hospital and stored in an Electronic Medical Record (EMR), it falls under the Medical Act's jurisdiction. In such a scenario, even if a patient requests it, the hospital cannot directly send the data to another organization providing blood sugar analysis services. Currently, the only way to transfer this data is by the patient personally visiting the hospital, receiving the information, and delivering it to another organization.
Data subjects should have the right to self-determination over their information: to decide who may know it, to what extent, and how it will be used. For general personal information, laws such as the Personal Information Protection Act guarantee this right. Medical information, however, is governed by the Medical Act, which guarantees only part of the data subject's data portability right (access rights, the use of structured data formats, and the right to request transfer to a third party).
This problem stems from how organizations such as hospitals manage and account for patient data. Although the original intention was to protect sensitive medical information, the rules inadvertently make it difficult for patients to access quality medical and health services, because they cannot transfer or aggregate their data across organizations at their discretion. As a result, patients' medical data is fragmented across individual healthcare organizations, making it virtually impossible to provide customized services such as precision medicine to each patient. Fortunately, 'MyData' laws have already been implemented in the financial sector, and discussions are under way on legislation to realize the portability right and full self-determination over personal medical information.
As explained above, the extent of data currently available for use without explicit patient consent is limited to a select few purposes, like research and statistics. Additionally, the quality of such data may be compromised. To amass as much data as possible without these issues, it's essential to inform and secure consent from data subjects such as patients, concerning the types of data to be gathered, its purpose, and the usage terms.
Type 1. Consent Acquisition Issues
The cause of "inadequate consent", that is, consent given without a real understanding of what is being agreed to, is not necessarily ill intent on the part of organizations. It may stem from the complexity of the language in terms of service and privacy policies, which most people find hard to understand. Alternatively, the very act of consenting, intended to protect privacy, can paradoxically become a burden, leading people to agree or disagree mindlessly rather than checking whether the terms serve their best interests. Hence, even when an organization intends to safeguard privacy (at least by complying with the law in good faith), the outcome may still be inadequate consent.
On the flip side, a patient's understanding of how their data will be beneficial and the potential risks involved could also influence consent acquisition. The greater the perceived personal benefit from using their data, and the more they comprehend the risks, the more likely they are to provide informed consent.
Type 2. The Need to Give Patients Control Over Data After Consent
On January 20, 2020, the U.S. Department of Health and Human Services (HHS) amended the Common Rule to allow the secondary research use of identifiable data without additional consent, even if the data was originally collected for non-research purposes, provided that broad consent was obtained at the outset. This modification aims to improve research efficiency and data value by eliminating the time and cost of obtaining patient consent each time, except when specific risk factors are present. Gathering data with consent for broader purposes can also be beneficial, because many plausible uses cannot be anticipated until after the data has been collected.
However, it's crucial to offer patients access to complete histories of their data use and disclosure, as well as the right to revoke consent for the use of their data. An alternative approach is to implement a dynamic-consent system that collects data upfront but gives patients the chance to view more details at the usage point, and the option to either opt-in or opt-out at any moment, even after consent has been granted.
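A dynamic-consent system of this kind can be sketched as a small Python class. The class name, purpose strings, and in-memory log are hypothetical; the point is that consent is checked at the moment of each use rather than once at collection, that patients can opt in or out at any time, and that every decision is recorded so the patient can review the full history.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DynamicConsent:
    """Hypothetical per-patient consent record with a reviewable history."""
    patient_id: str
    granted_purposes: set[str] = field(default_factory=set)
    history: list[tuple[str, str, str, str]] = field(default_factory=list)

    def _log(self, event: str, purpose: str, actor: str) -> None:
        # Each entry: (UTC timestamp, event, purpose, who triggered it)
        stamp = datetime.now(timezone.utc).isoformat()
        self.history.append((stamp, event, purpose, actor))

    def opt_in(self, purpose: str) -> None:
        self.granted_purposes.add(purpose)
        self._log("opt_in", purpose, self.patient_id)

    def opt_out(self, purpose: str) -> None:
        self.granted_purposes.discard(purpose)
        self._log("opt_out", purpose, self.patient_id)

    def request_use(self, organization: str, purpose: str) -> bool:
        """Check consent at the point of use and record the outcome either way."""
        allowed = purpose in self.granted_purposes
        self._log("use_granted" if allowed else "use_denied", purpose, organization)
        return allowed


consent = DynamicConsent("patient-001")
consent.opt_in("diabetes_research")
print(consent.request_use("pharma-a", "diabetes_research"))  # True
consent.opt_out("diabetes_research")
print(consent.request_use("pharma-a", "diabetes_research"))  # False
for _, event, purpose, actor in consent.history:
    print(event, purpose, actor)
```

A persistent, tamper-evident store would replace the in-memory list in any real deployment.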
Securing informed consent and preserving the right to manage data post-consent is vital for protecting privacy and extracting value from data utilization. Achieving this will deliver a positive experience in terms of information transparency and system trust, expectations that will only increase in the future. This is a crucial consideration for organizations aiming to acquire patients and users, regardless of the legal dimensions. Thus, solutions that facilitate informed consent-based data management and usage are required from the perspectives of patients, organizations aiming to use data, and organizations managing data on behalf of patients.
Required actions for sharing data
Data Structuring and Standardization
Clinical data often presents inconsistencies in the terminology used in unstructured text formats. The challenge here is to structure this data so that it is comprehensible to a computer.
Additional efforts should be made to standardize data types, terminology, and formats to facilitate collaboration via data sharing.
Introduce searchable metadata to determine the existence of duplicate data and to identify combinable data.
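The three steps above can be sketched together in Python: map free-text diagnosis terms to a standard code, convert units, and compute a content hash that can serve as searchable metadata for spotting duplicate records. The term table and field names are hypothetical, and the glucose conversion uses the common approximation of 18 mg/dL per mmol/L; a production system would map to a full terminology such as ICD-10 or SNOMED CT.

```python
import hashlib
import json

# Hypothetical local-term table; a real system would use a maintained
# terminology service covering ICD-10, SNOMED CT, LOINC, etc.
TERM_TO_ICD10 = {"t2dm": "E11", "type 2 diabetes": "E11", "dm type 2": "E11"}
GLUCOSE_MG_DL_PER_MMOL_L = 18.0  # common approximation for glucose


def standardize(record: dict) -> dict:
    """Map a free-text diagnosis to ICD-10 and express glucose in mmol/L."""
    glucose = record["glucose"]
    if record["glucose_unit"] == "mg/dL":
        glucose = glucose / GLUCOSE_MG_DL_PER_MMOL_L
    return {
        "patient_ref": record["patient_ref"],
        "icd10": TERM_TO_ICD10.get(record["diagnosis"].strip().lower(), "UNMAPPED"),
        "glucose_mmol_l": round(glucose, 1),
        "measured_on": record["measured_on"],
    }


def fingerprint(standard_record: dict) -> str:
    """Hash of canonical JSON; equal fingerprints flag likely duplicates."""
    canonical = json.dumps(standard_record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


hospital_a = {"patient_ref": "p-001", "diagnosis": "T2DM", "glucose": 126,
              "glucose_unit": "mg/dL", "measured_on": "2023-05-01"}
hospital_b = {"patient_ref": "p-001", "diagnosis": "Type 2 Diabetes", "glucose": 7.0,
              "glucose_unit": "mmol/L", "measured_on": "2023-05-01"}
print(standardize(hospital_a))
print(fingerprint(standardize(hospital_a)) == fingerprint(standardize(hospital_b)))  # True
```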
Quality Control
Identify and correct errors, such as records for patients with multiple conditions that list only the diagnoses needed for insurance claims, or information unintentionally omitted or entered incorrectly during manual record keeping.
Address accuracy problems with measurement devices, such as results that vary with the user's proficiency.
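Some of these quality checks can be automated. The sketch below flags missing required fields and values outside plausible ranges, and it requires a device identifier so that device-specific discrepancies can be traced. The field names and bounds are illustrative placeholders, not clinical reference values. Checks like these cannot catch diagnoses that were simply left out for insurance purposes; those require cross-checking against other sources.

```python
# Illustrative placeholders only: real bounds must come from clinicians and
# depend on the device, the unit, and the patient population.
PLAUSIBLE_RANGES = {
    "glucose_mmol_l": (1.0, 40.0),
    "systolic_bp_mmhg": (50, 260),
    "heart_rate_bpm": (20, 250),
}
REQUIRED_FIELDS = ("patient_ref", "measured_on", "device_id")


def quality_issues(record: dict) -> list[str]:
    """Return human-readable problems in one record; an empty list means none found."""
    issues = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    for name, (low, high) in PLAUSIBLE_RANGES.items():
        value = record.get(name)
        if value is not None and not low <= value <= high:
            issues.append(f"{name}={value} outside plausible range [{low}, {high}]")
    return issues


entry = {"patient_ref": "p-001", "measured_on": "2023-05-01", "glucose_mmol_l": 70.0}
print(quality_issues(entry))
# ['missing device_id', 'glucose_mmol_l=70.0 outside plausible range [1.0, 40.0]']
```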
Storing Data
Employ technologies that facilitate the storage, management, and transfer of data in a compact form, making it easily reanalyzable.
The concept of incentives is closely tied to consensus on data rights. It's widely agreed that patients should have autonomy over their data. However, the idea of ownership, which is intricately linked to incentives, is more complex. Generally, the notion of ownership applies to physical assets like real estate or objects. For intangible assets, there are intellectual property rights like copyright and patents, but these are only recognized when there's creative input involved. Therefore, copyright only applies to compiled databases, not to raw information or data as such.
Creating a patient's medical data involves substantial work. Beyond the basic data primarily gathered by healthcare professionals and machines in medical institutions, a significant portion is produced through expert evaluation or interpretation, such as diagnoses or positive-negative test results. Furthermore, several steps need to be taken before the data can be shared and used in a meaningful way. Sometimes, additional effort is needed to merge separate data. The outcome is a dataset compiled with significant investment and expertise from medical professionals.
Moreover, healthcare data has a public character, because it is produced with the support of either a health insurance system or a publicly funded healthcare system. Therefore, rather than securing exclusive revenue and usage rights for a specific entity, it would be more beneficial for the public and the data subjects to guarantee non-rivalrous use, in which one entity's use of the data does not limit its use by others. This would enable more extensive data use.
Currently, there's no reliable method to document this history of ownership, data sharing, and usage, and to make it openly available and usable for all stakeholders.
Instead of making individuals trust a company (or agree to terms to use a service) and entrust their personal data to be managed on their behalf, self-sovereign identity technology allows companies to request the individual's consent to access and use their data. Crucially, the individual maintains full control over their information.
This represents a paradigm shift in which individuals can actively decide to share only the information they want and control its use. Furthermore, it creates a landscape where hackers can only attack one individual at a time. In contrast, previously, they only needed to infiltrate a centralized server once to steal vast amounts of personal information, ranging from dozens to millions of records. This reduced incentive to hack individual data makes the system ultimately more privacy-protective.
Leveraging self-sovereign identity technology, Hippo Protocol aims to actualize the concept of 'MyData'. This concept permits individuals to exercise self-determination not only over their personal identity information but also their medical data.
The MyData model addresses the complications of convoluted API and platform models that lack incentives for data interoperability. Source: mydata.org
DID (Decentralized Identifier) offers a solution to these issues. A DID is a unique identifier, created independently using mathematics and cryptography, and can be used freely across the internet. The concept is akin to assigning a unique number to every atom in the universe and randomly selecting one to be given a password (private key). This key enables control over the ID generated from that unique number and its associated information. Personal account details and associated data are stored on a blockchain, with access and control limited to the private key owner. This approach enables us to construct and manage our identities independently, free from government or corporate influence. This method is particularly advantageous in healthcare, a sector necessitating high privacy levels. As long as all computers and the internet connected to the blockchain persist, our identity record and control over it can remain secure, thus actualizing the self-sovereign identity envisioned by Hippo Protocol.
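The W3C DID Core specification defines the generic syntax every DID follows: did:<method-name>:<method-specific-id>. The Python sketch below validates that syntax and derives an identifier by hashing a public key. The method name hippo and the hashing scheme are hypothetical, since each DID method defines its own derivation rules, and the sketch omits key generation and registration on a blockchain.

```python
import hashlib
import re

# Generic DID syntax (W3C DID Core): did:<method-name>:<method-specific-id>
_IDCHAR = r"(?:[A-Za-z0-9._-]|%[0-9A-Fa-f]{2})"
DID_PATTERN = re.compile(rf"did:[a-z0-9]+:(?:{_IDCHAR}*:)*{_IDCHAR}+")


def is_valid_did(did: str) -> bool:
    """Check a string against the generic DID syntax (not any method's own rules)."""
    return DID_PATTERN.fullmatch(did) is not None


def derive_did(public_key: bytes, method: str = "hippo") -> str:
    """HYPOTHETICAL derivation: use part of the public key's SHA-256 hash as the ID.

    Real DID methods each define their own derivation and registration rules.
    """
    return f"did:{method}:{hashlib.sha256(public_key).hexdigest()[:40]}"


placeholder_key = bytes(33)  # stand-in for a real 33-byte compressed public key
did = derive_did(placeholder_key)
print(did.startswith("did:hippo:"), is_valid_did(did))  # True True
print(is_valid_did("did:example:123456789abcdefghi"))   # True
print(is_valid_did("did:Example:123"))                  # False (method names are lowercase)
```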
9.4.2 VC (Verifiable Credential): Everything That Proves Who I Am
Birth certificates, college degrees, passports, driver's licenses, employee IDs, gym memberships, hospital registration cards, and prescriptions all describe and validate specific facts about me. For instance, when I visit a pharmacy and show my prescription, I can demonstrate that a specific medication for a particular condition was prescribed to me by a certain doctor at a particular hospital. This allows the pharmacist to dispense the medication confident that the prescription is valid, and the receipt I get from the pharmacy confirms that I actually received the prescribed medication. When I submit these documents together to my insurance provider, I can make a claim based on a validated record.
A Verifiable Credential (VC) contains specific information that describes and validates certain facts about you. The data in a VC includes the issuer (the issuer's DID), the subject of the credential (the data subject's DID), the claim being made (such as age, relationship, or diagnosis), and the holder (the holder's DID, usually the same as the data subject, although a guardian could be the holder if the data subject is a minor). All of this information must be verifiable, including who issued it, whether it has been tampered with, and whether it has expired or been revoked.
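The verification checks listed above can be sketched in Python using field names from the W3C Verifiable Credentials Data Model (issuer, issuanceDate, expirationDate, credentialSubject). One simplification must be stated plainly: a real VC carries the issuer's digital signature in its proof, whereas this sketch stores a plain SHA-256 digest as a placeholder, which detects tampering but cannot prove who issued the credential. The holder is not modeled, and the DIDs, credential ID, and trusted-issuer set are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

VC_CONTEXT = "https://www.w3.org/2018/credentials/v1"


def _digest(credential: dict) -> str:
    """SHA-256 over canonical JSON of everything except the proof."""
    body = {k: v for k, v in credential.items() if k != "proof"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def issue_credential(cred_id: str, issuer_did: str, subject_did: str, claim: dict,
                     issued: datetime, expires: datetime) -> dict:
    credential = {
        "@context": [VC_CONTEXT],
        "id": cred_id,
        "type": ["VerifiableCredential"],
        "issuer": issuer_did,
        "issuanceDate": issued.isoformat(),
        "expirationDate": expires.isoformat(),
        "credentialSubject": {"id": subject_did, **claim},
    }
    # PLACEHOLDER: a real VC holds the issuer's digital signature here.
    credential["proof"] = {"type": "PlaceholderSha256Digest", "digest": _digest(credential)}
    return credential


def verify_credential(credential: dict, trusted_issuers: set[str],
                      revoked_ids: set[str], now: datetime) -> list[str]:
    """Return the names of failed checks; an empty list means all checks passed."""
    failures = []
    if credential["issuer"] not in trusted_issuers:
        failures.append("untrusted_issuer")
    if credential["proof"]["digest"] != _digest(credential):
        failures.append("tampered")
    if datetime.fromisoformat(credential["expirationDate"]) <= now:
        failures.append("expired")
    if credential["id"] in revoked_ids:
        failures.append("revoked")
    return failures


utc = timezone.utc
vc = issue_credential("urn:example:vc:001", "did:example:hospital", "did:example:patient",
                      {"diagnosis": "E11"}, datetime(2023, 1, 1, tzinfo=utc),
                      datetime(2024, 1, 1, tzinfo=utc))
check_time = datetime(2023, 6, 1, tzinfo=utc)
print(verify_credential(vc, {"did:example:hospital"}, set(), check_time))  # []
vc["credentialSubject"]["diagnosis"] = "E10"  # alter the claim after issuance
print(verify_credential(vc, {"did:example:hospital"}, set(), check_time))  # ['tampered']
```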
The traditional physical credentials in the examples above are all susceptible to forgery and are difficult to verify online. Anti-forgery features such as signatures and holograms, along with separate verification authorities, have been used to address this, but they offer little privacy protection and face many cost and technical limitations for online use at global scale. Because VCs are issued on a blockchain that anyone can verify transparently, they can be authenticated online much faster and at significantly lower cost. Owing to these advantages, DID and VC were developed with financial support from the US Department of Homeland Security, among others, and became open global standards as W3C Recommendations (the VC Data Model in November 2019 and DID Core in July 2022).
The importance of VC is also evident in realizing the collaborative healthcare data ecosystem that Hippo Protocol aims to establish. If patient data could be verified without the need for an intermediary, it would minimize the friction in the distribution and utilization of the data, thus cultivating a more dynamic ecosystem.
As previously discussed, blockchains are best used to store a minimal amount of data in a highly secure and trusted repository, such as an asset register or an identity ledger. While certain information can be recorded on a blockchain, storing large volumes of healthcare data on-chain, such as PGHD or genomic data that can range from a few hundred gigabytes to several terabytes, is neither practical nor necessary.
Secure data exchange can be executed in various ways, and the methods outlined above are not the only ones that Hippo Protocol will consider. We plan to design the protocol in conjunction with the community to ensure it remains open and receptive to improved solutions.
Organizations or entities that issue data can do so via verifiable credentials (VCs). This issuance can either be done individually upon receiving requests from data subjects, or in bulk by inputting the subjects' DIDs into an internal admin page. Either way, it requires transforming the data model so that internal agency data can be released in the form of VCs. Currently, VCs can be issued in two syntactic representations: JSON-LD and JWT. Alternatively, one can encrypt the data file with the data subject's secret key, upload it to distributed storage, and incorporate the file's hash value into the VC without transitioning to these data models.
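The last option, referencing an off-chain file by its hash, can be sketched as follows. The claim names dataUri and dataSha256 and the storage URL are hypothetical, and encryption is assumed to have happened before this step; the sketch only shows how a hash in the VC lets anyone check that a retrieved file is exactly the one that was issued.

```python
import hashlib


def build_data_reference(encrypted_file: bytes, storage_uri: str, subject_did: str) -> dict:
    """credentialSubject that points to off-chain data instead of embedding it.

    'dataUri' and 'dataSha256' are hypothetical claim names, not standard VC terms.
    """
    return {
        "id": subject_did,
        "dataUri": storage_uri,
        "dataSha256": hashlib.sha256(encrypted_file).hexdigest(),
    }


def file_matches_reference(downloaded: bytes, subject: dict) -> bool:
    """Recompute the hash of the retrieved file and compare it with the credential."""
    return hashlib.sha256(downloaded).hexdigest() == subject["dataSha256"]


blob = b"...already-encrypted genome file bytes..."  # placeholder content
subject = build_data_reference(blob, "https://storage.example/objects/record-001",
                               "did:example:patient")
print(file_matches_reference(blob, subject))                # True
print(file_matches_reference(blob + b"tampered", subject))  # False
```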
Another significant detail included in the VC is the commission rate that will be allocated to the issuing organization's share when the issued data is sold. Up until now, data issuing organizations have had no feasible way to earn from their share of data that exits the organization. However, with VCs, all data is subject to the patient's decision to circulate, and any data that is traded for a fee with the patient's consent is automatically divided between the patient and the issuing organization. This enables the issuing organization to gain incentives from data use, besides the data issuance fee. This mechanism motivates issuing organizations to prepare and issue more reliable and usable data.
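The automatic division of sale proceeds can be sketched as simple integer arithmetic. Expressing the price in the smallest currency or token unit and the commission in basis points (1 bp = 0.01%) avoids floating-point rounding; both conventions, and the choice to give any rounding remainder to the patient, are assumptions for illustration rather than Hippo Protocol rules.

```python
def split_sale(price_minor_units: int, issuer_rate_bps: int) -> tuple[int, int]:
    """Split one data sale into (issuer share, patient share).

    price_minor_units: price in the smallest currency or token unit (an integer).
    issuer_rate_bps: issuer commission recorded in the VC, in basis points.
    """
    if price_minor_units < 0 or not 0 <= issuer_rate_bps <= 10_000:
        raise ValueError("price must be non-negative and rate within 0..10,000 bps")
    issuer_share = price_minor_units * issuer_rate_bps // 10_000  # rounded down
    patient_share = price_minor_units - issuer_share  # remainder goes to the patient
    return issuer_share, patient_share


print(split_sale(10_000, 1_500))  # (1500, 8500): a 15% commission
print(split_sale(999, 1_500))     # (149, 850): rounding remainder stays with the patient
```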
Organizations developing AI healthcare solutions, leveraging existing ones to offer data-driven healthcare services, or aiming to screen clinical trial participants, can use Hippo Protocol to harness trusted, consented data. The first step requires preparing a data wallet (Data Hippo) and a Decentralized Identifier (DID) for the organization intending to use the data. They must also integrate a software development kit (SDK) tailored to data utilization into their internal services, readying them for data usage.
The process of data enrichment involves linking the individual's DID with the utilizing organization's DID. Typically, the individual is asked to scan a QR code for login, authentication, or connection purposes, at which point they provide their consent. The organization doesn't need to request all data at once. Initially, it can request only the information needed for basic service usage. Further data requiring a higher level of user consent can be requested separately. This method ensures that the right data is available at each user conversion stage.
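The staged request flow can be sketched in Python. The payload fields, scope names, and base64url encoding for the QR code are hypothetical choices; real wallet protocols such as DIDComm define their own invitation formats and sign the request so the wallet can verify the organization. The sketch shows only the principle: a first request for basic scopes at onboarding, and a separate later request for more sensitive data that the user can approve in part.

```python
import base64
import json

# Hypothetical scope names: basic service data first, sensitive data later.
BASIC_SCOPES = ["profile:age_band", "wearable:steps"]
EXTENDED_SCOPES = ["clinical:lab_results", "omics:genome_summary"]


def make_invitation(org_did: str, scopes: list[str], purpose: str) -> str:
    """Encode a connection/consent request as a URL-safe string for a QR code."""
    payload = {"from": org_did, "scopes": scopes, "purpose": purpose}
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def read_invitation(token: str) -> dict:
    """Decode the string scanned from the QR code (wallet side)."""
    padded = token + "=" * (-len(token) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def approve(granted: set[str], invitation: dict, approved: list[str]) -> set[str]:
    """Add only the requested scopes that the user explicitly approved."""
    return granted | (set(invitation["scopes"]) & set(approved))


first = read_invitation(make_invitation("did:example:clinic-app", BASIC_SCOPES, "onboarding"))
granted = approve(set(), first, approved=BASIC_SCOPES)

# Later, a separate request for more sensitive data; the user approves only part of it.
second = read_invitation(make_invitation("did:example:clinic-app", EXTENDED_SCOPES, "personalized care"))
granted = approve(granted, second, approved=["clinical:lab_results"])
print(sorted(granted))  # ['clinical:lab_results', 'profile:age_band', 'wearable:steps']
```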
In order to obtain consent, the organization must clearly communicate to the individuals who they are, what permissions are being requested, and what data will be used under what conditions. This is similar to presenting and obtaining legal notices for data collection and use, like conventional terms and conditions and privacy policies. However, the key difference is that organizations can use standardized terms certified by a governance framework.
This approach has several advantages. Individuals aren't burdened with fine-tuning legal notices every time, as discussed in the management of consent (signatures) in the data wallet. Furthermore, organizations can adopt licenses that meet specific compliance requirements for different countries and contexts. This significantly reduces the cost and time associated with legal review, making it easier to obtain informed consent.
Data-using institutions can use the standardized protocols and licenses of the Hippo Protocol DAO through the CompliantData SDK.
The features mentioned above could initially be implemented in Data Hippo. Along with the use of a data utilization SDK, patients could access healthcare and community services through the data stored in their personal data wallets. This SDK is freely integrable into other for-profit and non-profit products, requiring no contracts.
Data Hippo could be the initial use case for the Hippo Protocol ecosystem, enabling patients to take advantage of their data wallets. Data Hippo offers trusted, personalized information, healthcare solutions, and community experiences to patients. This is achieved by using data gathered with informed consent. Additionally, patient-derived health data generated during service use, as well as clinical data submitted via the wallet, can be processed into a format that can be used by other data-requiring organizations like pharmaceutical companies. This will allow for value-added data sales. The entire process is based on patient consent, and the revenue generated serves as a shared compensation source between the patient and the data issuing organization. This promotes sustainable data compensation and usage.
To make this scenario work, the conditions for data use and compensation are included in the patient's consent. Depending on the degree of privacy disclosure, data can be categorized as protected health information, de-identified health information, or limited data sets. Alternatively, it can be classified as identified, anonymous, or pseudonymous health information. Generally, the more private and extensive the requested information, the higher the reward. However, this also increases the likelihood that a patient will decline in order to protect their privacy. Consequently, organizations wishing to use the data will strive to request only the data essential to their purpose so as to persuade patients. Moreover, since data creation requires significant public resources, the system can be designed to allocate a larger percentage of the reward to public purposes the more pseudonymous or anonymous the information is and the more it is used for scientific research. This mechanism could help improve the quality of public healthcare.
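The incentive structure described above can be made concrete with a small calculation. Every number below is a hypothetical placeholder rather than a Hippo Protocol parameter; the sketch only encodes the two stated tendencies: more identifying data earns a larger total reward, while more de-identified data and scientific-research use direct a larger percentage of that reward to public purposes.

```python
# Hypothetical placeholders: not Hippo Protocol parameters.
BASE_REWARD = {"PHI": 100, "LDS": 60, "DHI": 30}   # more identifying -> higher reward
PUBLIC_PERCENT = {"PHI": 0, "LDS": 10, "DHI": 20}  # more de-identified -> larger public share
RESEARCH_BONUS_PERCENT = 10                        # extra public share for scientific research


def reward_split(category: str, scientific_research: bool) -> dict[str, int]:
    """Divide one reward between a public pool and the patient/issuer side."""
    total = BASE_REWARD[category]
    percent = PUBLIC_PERCENT[category] + (RESEARCH_BONUS_PERCENT if scientific_research else 0)
    public_pool = total * percent // 100
    return {"total": total, "public_pool": public_pool, "patient_and_issuer": total - public_pool}


print(reward_split("PHI", scientific_research=False))
# {'total': 100, 'public_pool': 0, 'patient_and_issuer': 100}
print(reward_split("DHI", scientific_research=True))
# {'total': 30, 'public_pool': 9, 'patient_and_issuer': 21}
```

The patient_and_issuer amount could then be divided further using a commission rule such as the basis-point split sketched for the issuing organization's share.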
The Hippo SDK is an open-source development kit that lets data subjects build data wallets to manage their identity, the data they acquire from organizations, and the rewards they earn from data sharing. Keeping data in a data wallet is akin to holding a key card or receipt that grants access to that data, rather than storing the original data itself. It is comparable to a physical wallet that holds not just cash and credit cards but also IDs, membership cards, tickets, blood donation records, keys, receipts, and more, all available whenever needed. Just as losing a physical wallet means losing everything in it, complete control over a data wallet also means full responsibility for it.
In this section, we present the primary features we aim to include in the Hippo SDK. Most of them are grounded in open standards and open-source software, so that we can focus on the problem Hippo Protocol is tackling and reuse proven technology rather than reinventing the wheel. This strategy minimizes the intentional and unintentional bugs that new software can introduce, while still benefiting from the ongoing improvements and innovations driven by open standards. It also allows the assets and data in a wallet built with the Hippo SDK to be accessed through a variety of applications, increasing their usefulness to the user.
The Hippo SDK's fundamental function is to securely generate and store the private keys needed to access and manage your assets, data, and decentralized identity (DID). Because it is compatible with the Cosmos SDK standard, it offers the same level of security as most blockchain wallets for private key generation and mnemonic code recovery, so the Hippo SDK can also be used to implement basic blockchain wallet functionality. For additional security, private keys can be stored in the device's trusted execution environment (TEE).
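Cosmos SDK wallets derive their keys from a BIP-39 mnemonic, so the mnemonic step can be illustrated independently of the Hippo SDK. The sketch below follows the BIP-39 rule for turning random entropy into word indices: append the first ENT/32 bits of the entropy's SHA-256 hash as a checksum, then split the result into 11-bit groups. It stops at the indices because the 2048-word English wordlist is not reproduced here, and it does not cover the later BIP-32/BIP-44 key derivation; it is an illustration, not the Hippo SDK's code.

```python
from __future__ import annotations

import hashlib
import secrets


def entropy_to_word_indices(entropy: bytes) -> list[int]:
    """Convert BIP-39 entropy into indices (0-2047) into the BIP-39 wordlist."""
    ent_bits = len(entropy) * 8
    if ent_bits not in (128, 160, 192, 224, 256):
        raise ValueError("BIP-39 entropy must be 128-256 bits, in steps of 32")
    cs_bits = ent_bits // 32  # checksum length: ENT / 32 bits
    checksum = hashlib.sha256(entropy).digest()[0] >> (8 - cs_bits)
    bits = (int.from_bytes(entropy, "big") << cs_bits) | checksum
    n_words = (ent_bits + cs_bits) // 11  # 12 words for 128 bits, 24 for 256
    return [(bits >> (11 * (n_words - 1 - i))) & 0x7FF for i in range(n_words)]


if __name__ == "__main__":
    # Fresh 128-bit entropy from the OS CSPRNG yields a 12-word mnemonic.
    print(entropy_to_word_indices(secrets.token_bytes(16)))
```

For the all-zero 128-bit test vector in BIP-39, this returns eleven 0s followed by 3, which maps to "abandon" eleven times and then "about".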
Once a DID is established, it facilitates peer-to-peer connections with the organization accessing your data without any intermediaries. All data and messages shared between the two parties are encrypted end-to-end, ensuring that only those involved can view the contents, thus minimizing the risk of privacy breaches during communication. The initial connection is usually initiated by scanning a QR code or clicking a button on the organization's app or website. Users then verify and approve information about their connection partner, including permissions and data requests. Once connected, the relationship is remembered as a trusted association unless terminated by either party.
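The connection flow described above (scan an invitation, review the request, approve, remember the relationship until either side ends it) can be sketched as a small state model. Every class, field, and DID below is hypothetical; a real wallet would also resolve the organization's DID and set up the end-to-end encrypted channel, neither of which is shown.

```python
from __future__ import annotations

import json
import secrets
from dataclasses import asdict, dataclass, field
from typing import Callable, Optional


@dataclass
class Invitation:
    """Hypothetical payload an organization encodes in a QR code or button link."""
    org_did: str
    org_name: str
    requested: list[str]
    nonce: str = field(default_factory=lambda: secrets.token_hex(16))

    def to_qr_payload(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


@dataclass
class Connection:
    peer_did: str
    granted: list[str]
    state: str = "active"

    def terminate(self) -> None:
        """Either party may end the trusted association at any time."""
        self.state = "terminated"


class Wallet:
    def __init__(self) -> None:
        self.connections: dict[str, Connection] = {}

    def accept(self, qr_payload: str,
               approve: Callable[[str, list[str]], bool]) -> Optional[Connection]:
        inv = json.loads(qr_payload)
        # Show the user who is connecting and what it requests; stop on refusal.
        if not approve(inv["org_name"], inv["requested"]):
            return None
        conn = Connection(peer_did=inv["org_did"], granted=list(inv["requested"]))
        self.connections[conn.peer_did] = conn  # remembered as a trusted association
        return conn
```

The `approve` callback stands in for the wallet's confirmation screen, so the user decision stays outside the connection logic.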
In the healthcare context, hospitals often require patients to register as new patients during their first visit. Registration involves pressing a registration button and scanning a QR code with the data wallet, which then asks the patient to share their legal identity (name, social security number, photo, gender, etc.). If the patient consents, a hospital representative verifies the patient's identity from their data wallet, and the patient is registered.
This approach eliminates traditional logins and other authentication methods, sparing users from creating and remembering usernames and passwords for every service they wish to connect to. It lets users log in, authenticate, send and receive assets and data, and more, simply by scanning a QR code. The data wallet application can also combine the private key with additional security measures such as PINs and biometrics, providing effective multi-factor authentication (MFA) and much stronger security than typical login methods.
Once you are connected to an entity such as a hospital through your data wallet, either side can send original data or certificates attesting to it, at your request or on the other party's initiative. Typically, when a hospital releases data such as medical records, it first authenticates the patient's identity and then issues the data as a VC to the patient's data wallet. The data wallet then receives a notification that new data has been issued and is awaiting authorization. Once the patient authorizes it, the data becomes accessible. Unlike conventional cloud storage, all data is secured with the user's own encryption key.
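The issuance step can be sketched as follows, assuming the hospital wraps a hash of the record in a credential shaped like the W3C Verifiable Credentials data model (v1.1 field names). The `MedicalRecordCredential` type, the `recordDigest` property, and the `DataWallet` class are hypothetical, and the issuer's signature (the VC `proof`) and the wallet-side encryption are omitted.

```python
from __future__ import annotations

import hashlib
import json
from datetime import datetime, timezone


def record_digest(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so equal records hash equally."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def issue_vc(issuer_did: str, subject_did: str, record: dict) -> dict:
    """Wrap a digest of the record in a credential shaped like a W3C VC."""
    return {
        "@context": ["https://www.w3.org/2018/credentials/v1"],
        "type": ["VerifiableCredential", "MedicalRecordCredential"],  # 2nd type hypothetical
        "issuer": issuer_did,
        "issuanceDate": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "credentialSubject": {"id": subject_did, "recordDigest": record_digest(record)},
        # A real VC also carries a "proof" (the issuer's signature); omitted here.
    }


class DataWallet:
    """Hypothetical inbox: issued data waits until the patient authorizes it."""

    def __init__(self) -> None:
        self.pending: list[tuple[dict, dict]] = []   # (credential, record)
        self.accepted: list[tuple[dict, dict]] = []

    def receive(self, vc: dict, record: dict) -> None:
        self.pending.append((vc, record))
        # Stand-in for the system notification the wallet would raise.
        print(f"New data from {vc['issuer']} awaits your authorization")

    def authorize(self, index: int) -> dict:
        vc, record = self.pending.pop(index)
        self.accepted.append((vc, record))
        return vc
```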
Such data can be presented wherever it is required. For instance, a data-driven healthcare service might ask the patient to scan a QR code to obtain the specific data needed to provide the service. After scanning, the patient sees how the data will be used and the service provider's terms of use. If the patient approves, the data is shared with the service provider, which then verifies that the data's hash matches the hash recorded by the issuer and that the issuer is trusted. Once validation succeeds, the patient receives the service. Although this process is intricate, automated software handles it at high speed, so to the patient it feels like a simple authentication step.
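The two checks the service provider performs (trusted issuer, matching hash) can be sketched as below. The trust registry and the `recordDigest` property are hypothetical; a real verifier would additionally check the issuer's signature in the credential's proof and, where applicable, its revocation status.

```python
import hashlib
import hmac
import json

# Hypothetical trust registry: issuer DIDs this service accepts.
TRUSTED_ISSUERS = {"did:example:hospital"}


def record_digest(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the shared record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_shared_data(record: dict, credential: dict,
                       trusted_issuers: set = TRUSTED_ISSUERS) -> bool:
    """Accept shared data only if its issuer is trusted and its hash matches the issued one."""
    if credential.get("issuer") not in trusted_issuers:
        return False
    expected = credential.get("credentialSubject", {}).get("recordDigest", "")
    # Constant-time comparison of the recomputed hash and the issuer's hash.
    return hmac.compare_digest(record_digest(record), expected)
```

Any edit to the record, however small, changes its digest, so a tampered record fails the second check even when the credential itself is genuine.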
In certain situations, you can share only part of your data and avoid disclosing sensitive personal information. Suppose you need to prove that you are your child's legal guardian. Your data wallet already holds information about your child and your relationship with them. A hospital representative asks you to scan a QR code to confirm your guardianship; after you scan and authorize it, the system confirms only that you are registered as the patient's guardian, without exposing any other personal data. This is sufficient because hospital staff only need to confirm that you are the actual guardian, not see your personal information. Techniques such as zero-knowledge proofs make this kind of minimal verification possible.
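A full zero-knowledge proof system is beyond a short example, so the sketch below shows a simpler, related technique: salted-hash selective disclosure, the idea behind formats such as SD-JWT. The issuer commits to each claim separately, and the holder reveals only the claim the hospital needs. Unlike a true zero-knowledge proof, the revealed claim's value is itself disclosed. The issuer's signature over the digests is omitted, and the claim names and values are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
import secrets


def claim_digest(salt: str, name: str, value) -> str:
    """Salted hash of one claim; the salt stops verifiers guessing hidden values."""
    return hashlib.sha256(json.dumps([salt, name, value]).encode("utf-8")).hexdigest()


def issue(claims: dict) -> tuple[set[str], dict]:
    """Issuer side: returns digests (to be signed) and the holder's secret disclosures."""
    disclosures = {name: (secrets.token_hex(16), value) for name, value in claims.items()}
    digests = {claim_digest(salt, name, value) for name, (salt, value) in disclosures.items()}
    return digests, disclosures


def present(disclosures: dict, reveal: list[str]) -> dict:
    """Holder side: reveal only the requested claims, keeping the rest hidden."""
    return {name: disclosures[name] for name in reveal}


def verify(digests: set[str], presented: dict) -> bool:
    """Verifier side: every revealed claim must match a digest the issuer committed to."""
    return all(claim_digest(salt, name, value) in digests
               for name, (salt, value) in presented.items())


if __name__ == "__main__":
    digests, disclosures = issue({
        "guardian_of": "did:example:child",
        "name": "Jane Doe",
        "national_id": "000000-0000000",
    })
    shown = present(disclosures, ["guardian_of"])  # name and ID stay private
    print(verify(digests, shown))
```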
Giving consent only requires the user to click a consent button. With a data wallet, however, you can also easily see what you have consented to and withdraw consent at any time. Traditional methods often require cumbersome processes or even separate forms to withdraw consent, whereas a data wallet gives you maximum control over your data even after you have consented.
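A wallet-side consent record that supports "view what I agreed to" and one-step withdrawal could look like the hypothetical sketch below. The field names are illustrative, and notifying recipients of a withdrawal is left out.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Consent:
    recipient: str                 # who receives the data
    purpose: str                   # declared use
    data_items: tuple[str, ...]    # what is shared
    consent_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    granted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    withdrawn_at: Optional[datetime] = None

    @property
    def active(self) -> bool:
        return self.withdrawn_at is None


class ConsentLedger:
    """Hypothetical wallet-side record of every consent the user has given."""

    def __init__(self) -> None:
        self._consents: dict[str, Consent] = {}

    def grant(self, recipient: str, purpose: str, data_items: tuple[str, ...]) -> Consent:
        c = Consent(recipient, purpose, tuple(data_items))
        self._consents[c.consent_id] = c
        return c

    def list_active(self) -> list[Consent]:
        return [c for c in self._consents.values() if c.active]

    def withdraw(self, consent_id: str) -> None:
        # One step, at any time; the grant stays on record with its withdrawal time.
        self._consents[consent_id].withdrawn_at = datetime.now(timezone.utc)
```

Keeping withdrawn entries (rather than deleting them) preserves a history of what was shared and when it stopped.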
Furthermore, any data a user consents to share can be signed with the user's cryptographic key as a kind of watermark. Any organization holding the data without this watermark would then have to prove that it acquired the data lawfully, which strengthens the security of data circulation.
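To illustrate the watermark idea, the sketch below signs shared data with the user's private key and lets anyone holding the matching public key check where it came from. It uses a textbook Schnorr signature over the secp256k1 curve (the default curve for Cosmos SDK accounts), written in pure Python. It is not BIP-340 Schnorr, not the ECDSA scheme Cosmos SDK uses for transactions, and not constant-time, so it is for illustration only.

```python
import hashlib
import secrets

# secp256k1 domain parameters (SEC 2): field prime, group order, generator.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    """Add two curve points; None is the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], -1, P) % P
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % P, -1, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)


def point_mul(k, point=G):
    """Double-and-add scalar multiplication (not constant-time)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def challenge(r_point, pub, message: bytes) -> int:
    data = b"".join(c.to_bytes(32, "big") for c in (*r_point, *pub)) + message
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def keygen():
    d = secrets.randbelow(N - 1) + 1
    return d, point_mul(d)


def sign(d: int, message: bytes):
    """Textbook Schnorr: s = k + e*d, where R = kG and e = H(R, pub, message)."""
    k = secrets.randbelow(N - 1) + 1
    r_point = point_mul(k)
    e = challenge(r_point, point_mul(d), message)
    return r_point, (k + e * d) % N


def verify(pub, message: bytes, signature) -> bool:
    """Check sG == R + eP, which holds only if s was made with the private key."""
    r_point, s = signature
    e = challenge(r_point, pub, message)
    return point_mul(s) == point_add(r_point, point_mul(e, pub))
```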
The inherent issue with traditional consent acquisition is that users are "forced" to accept terms and conditions to evidence consent, which primarily safeguards the service provider rather than the data subject. Lengthy and complex terms of service and privacy policies often discourage users from meticulously reading and assessing them, leading to 'insufficient consent'.
To tackle this, Hippo Protocol will introduce a standardized policy license that balances data subject protection and reasonable usage. If multiple services share identical privacy policies, users can be assured of uniformity without painstakingly reading each one. Once read properly, users can accept the same privacy policy license across different services, even setting it to auto-accept if desired. This enhances user convenience, protection, and companies' consent acquisition rates.
Any non-standard terms or previously unagreed terms in the policy will be isolated for separate consent. Users only need to review the changes or additions, fostering more confidence in their consent decision. These licenses will be overseen by a decentralized governance framework to guarantee trustworthiness, which will be elaborated on under Governance and DAOs.
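One way to isolate non-standard terms is to identify each clause by a hash of its normalized text, so clauses identical to an already-accepted standard license are recognized automatically and only the rest are surfaced. The sketch below assumes that approach; the license clauses and function names are hypothetical, and how Hippo Protocol will actually encode and version its licenses is not specified in the text.

```python
from __future__ import annotations

import hashlib

# Hypothetical standard license, expressed as individual clauses.
STANDARD_LICENSE = [
    "Data is used only for the purposes listed in the request.",
    "Data is deleted when consent is withdrawn.",
    "Data is not sold to third parties.",
]


def clause_id(text: str) -> str:
    """Identify a clause by the hash of its whitespace-normalized text."""
    return hashlib.sha256(" ".join(text.split()).encode("utf-8")).hexdigest()


def review_policy(policy: list[str], accepted: set[str]) -> tuple[list[str], list[str]]:
    """Split a service's policy into already-agreed clauses and clauses needing separate consent."""
    known = [c for c in policy if clause_id(c) in accepted]
    new = [c for c in policy if clause_id(c) not in accepted]
    return known, new
```

A user who has once accepted the standard license would be shown only the extra clauses a new service adds.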
To reach its full potential, healthcare data needs to be collected and used on a global scale. This can be accomplished by providing data subjects with value-added services that arise from the use of their information or, where there is no immediate service to offer, by compensating them accordingly. It is therefore essential to support assets that can be transferred quickly and cheaply worldwide over the internet and that can be stored and exchanged in data wallets.
Hippo Protocol's mainnet coin, $HP, and dollar-pegged stablecoins serve as the best assets for this purpose. Stablecoins in particular, as institutional adoption grows, can transmit small amounts of money worldwide at credit-card-payment speeds with fees of less than 0.0x dollars. Moreover, IBC (Inter-Blockchain Communication Protocol) will enable stablecoins and $HP to be managed and transferred across the Cosmos ecosystem. Hippo Protocol and the Hippo SDK will therefore be developed to align with these technologies, enabling global rewards for data collection and use in the form of assets preferred by users or businesses.
Since data wallets contain an individual's valuable assets and data, a secure way to back them up and restore them is also crucial. By default, users can back up the BIP-39 mnemonic code displayed when the data wallet is created by writing it down and keeping it in a secure location. However, this method may be unfamiliar, and many people may be uncomfortable bearing full responsibility for it. To address this concern, we plan to support encrypting the mnemonic code with a password of the user's choice and storing it in their preferred personal cloud storage, such as iCloud or Google Drive.
While this method is generally unsuitable for storing large sums of money, it may be an acceptable compromise for most data wallet users, as it reduces the risk of losing the mnemonic code. Of course, users can opt for a more secure method and avoid storing it in private cloud storage. As we adhere to blockchain wallet standards, methods to enhance the security of blockchain wallets, such as multi-sig wallets and adding passphrases, can be implemented in almost the same way.
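The passphrase option mentioned above comes from BIP-39 itself: the passphrase is mixed into the salt when the mnemonic is stretched into a seed, so the same mnemonic with a different passphrase yields an entirely different wallet. The sketch below implements that seed-derivation step as BIP-39 specifies it, using only Python's standard library. It is an illustration rather than the Hippo SDK's code, and it does not cover the password-encrypted cloud backup, whose design the text leaves open.

```python
import hashlib
import unicodedata


def mnemonic_to_seed(mnemonic: str, passphrase: str = "") -> bytes:
    """BIP-39 seed: PBKDF2-HMAC-SHA512, 2048 rounds, salt = "mnemonic" + passphrase."""
    password = unicodedata.normalize("NFKD", mnemonic).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048)
```

A side effect worth noting in wallet design: a mistyped passphrase does not produce an error, it silently opens a different (empty) wallet, so the passphrase needs a backup as careful as the mnemonic itself.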
Notifications are necessary to ensure that users see required notices and consent requests in a timely manner. To get users' attention effectively, we will use the device's system notifications only when consent or a signature is required.
While the wallet owner and the data subject are typically the same person, many patients may find it difficult to manage their own data wallet because of their health or limited technical literacy. In such cases, the data wallet can let them designate a proxy who makes decisions about consent and data sharing on their behalf. The wallet user holding proxy status can then store and control the data subject's data in the wallet on the patient's behalf.
These representatives can be individuals, such as guardians, or entities. Extending this functionality, it is also possible to designate a guardian who takes over the patient's assets and data upon the patient's death. Although further review is needed to confirm that this is permissible under each country's laws, the identities of representatives and guardians can be authenticated electronically through the verifiable credentials (VCs) described earlier.
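A proxy arrangement needs a rule for when someone other than the data subject may act. The hypothetical sketch below stores delegations with explicit scopes, an optional expiry, and a flag recording that the proxy's identity was checked via a VC; none of these names come from the Hippo SDK.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Delegation:
    """Hypothetical proxy grant stored in the data subject's wallet."""
    subject_did: str
    proxy_did: str
    scopes: frozenset[str]                   # e.g. {"consent", "share"}
    proxy_credential_verified: bool          # proxy's identity checked via a VC
    expires_at: Optional[datetime] = None    # None means valid until revoked


def may_act(delegations: list[Delegation], actor_did: str, subject_did: str,
            action: str, now: Optional[datetime] = None) -> bool:
    """True if the actor is the subject or holds a valid, verified delegation for the action."""
    if actor_did == subject_did:
        return True
    now = now or datetime.now(timezone.utc)
    return any(
        d.subject_did == subject_did and d.proxy_did == actor_did
        and action in d.scopes and d.proxy_credential_verified
        and (d.expires_at is None or now < d.expires_at)
        for d in delegations
    )
```

Scoping each delegation keeps a proxy who may give consent from also, say, transferring the patient's assets unless that is granted explicitly.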
Claim data, derived from clinical data, consists of the information a medical center submits to an insurer when filing an insurance claim, including the patient's personal information, diagnoses, and medications. Korea operates a single national health insurance system, and HIRA (Health Insurance Review & Assessment Service) and NHIS (National Health Insurance Service) have opened public data built from nationwide records (,, etc.). The Korean pharmaceutical company HK Inno-N used this data and launched K-CAB, a new medicine for gastroesophageal reflux disease ().
PGHD may seem unrelated to disease, but combining it with clinical and other data can lead to new discoveries about disease. In fact, recent clinical trials of new medicines have increasingly sought to use PGHD as RWD (Real-world Data) ().
is a real-world example of using SDOH data. This project defines social and economic factors (education, employment, housing, income, social safety), the physical environment, health behaviors (smoking, eating habits, alcohol, sexual activity), and public health (access to medical centers) as key factors, and aims to analyze their influence on health.
Research often requires cooperation among various institutions to secure enough participants. Just as it is hard to communicate with people who speak a different language, collaboration becomes difficult when institutions use different names and units for the same data. Research data is therefore usually well structured and collected under rules agreed upon by the institutions conducting the research together. In the same vein, standardization efforts such as the CDM () continue to enable integrated analysis of patient data accumulated in different hospitals and across various types of research.
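Common data models work by mapping each institution's local codes and units onto shared concepts and standard units, so that records from different sites can be analyzed together. The sketch below shows that idea in miniature. The site codes, concept name, and table layout are hypothetical and are not taken from any particular CDM; only the glucose conversion factor (1 mmol/L is about 18.016 mg/dL, from a molar mass of about 180.16 g/mol) is a real value.

```python
# Hypothetical local vocabularies of two hospitals, mapped to one shared concept.
LOCAL_TO_COMMON = {
    ("hospital_a", "GLU"): "serum_glucose",
    ("hospital_b", "Glucose, fasting"): "serum_glucose",
}

# Factors converting each reported unit into the common unit (mg/dL).
TO_MG_PER_DL = {
    ("serum_glucose", "mg/dL"): 1.0,
    ("serum_glucose", "mmol/L"): 18.016,
}


def harmonize(site: str, local_code: str, value: float, unit: str) -> dict:
    """Translate one local lab result into the shared concept and unit."""
    concept = LOCAL_TO_COMMON[(site, local_code)]
    factor = TO_MG_PER_DL[(concept, unit)]
    return {"concept": concept, "value": round(value * factor, 1), "unit": "mg/dL"}
```

After harmonization, a fasting glucose of 5.0 mmol/L from one site and 90 mg/dL from another become directly comparable values of the same concept.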
Securing this consent is a fundamental prerequisite for any organization seeking to use the data. Consequently, these organizations will seek consent that permits the broadest possible use of the data. The downside is that consent may be obtained in ways that inadequately safeguard the data subject. In fact, both EU () and Korean courts () have ruled that passive consent, such as a pre-selected checkbox, or consent gathered in a way the data subject cannot easily recognize, is invalid.
Major countries are implementing laws that balance privacy and data utilization through patients' self-determination over their data. A notable instance is the in the U.S., which requires healthcare organizations to make the medical information they store interoperable so that patients can access, exchange, and use it in the applications of their choice. Non-compliance can result in penalties of up to $1 million per violation.
However, many healthcare organizations still share data in formats that are difficult to read and use electronically. In addition, companies and researchers may be legally prevented from sharing data with patients even with their consent, or, even in countries without such restrictions, may hesitate to supply data because of data protection concerns (,).
In South Korea, the Ministry of Health and Welfare is promoting MyData legislation and a pilot service in the medical field called MyHealthway. indicate that medical institutions will not be compelled to comply with patients' information transfer requests, but that their voluntary participation will be encouraged to improve service quality for individuals and patients. Private companies other than medical institutions will be eligible to participate after 2024, but this is still far from realizing full data self-determination.
As things stand, data self-determination can be achieved only through legal obligations or penalties. Ideally, it would be driven by the voluntary motivation of ecosystem stakeholders. However, found that executives from healthcare organizations reported a lack of economic incentives to share data, along with concerns about losing their competitive edge by sharing data externally. In reality, data sharing measures such as data structuring, standardization, quality control, and archiving often require the resources and expertise of the healthcare organization that generates the data, while the benefits mostly go to the organization that uses it. This misalignment of incentives makes voluntary participation by data generators difficult.
Store and manage up to 200 GB of genomic data per individual ().
Self-Sovereign Identity (SSI) is a new model for representing an individual's identity online. Its enabling technologies, Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs), are both W3C standards: Decentralized Identifiers (DIDs) v1.0 became a W3C Recommendation in July 2022, following the Verifiable Credentials Data Model, which became one in 2019. ()
Until now, our digital identities have been determined solely by the accounts we hold on the servers of specific companies' services. This arrangement requires creating a new account every time we sign up for a new service, and if identity verification is required, the process must be repeated for each one. Consequently, many people default to logging in with accounts from large services such as Google and Facebook. However, this concentrates personal data on a single server, increasing the risk of hacking. And the more we depend on a particular service, the more vulnerable we are to account suspension or restrictions, particularly if the company considers us to be breaching its policies. This was recently illustrated by the US-based payment service PayPal, which attempted to introduce a policy allowing it to impose a $2,500 penalty on accounts found in violation of its rules. ()
In such a situation, a viable solution is to transmit and receive data using the Elliptic Curve Integrated Encryption Scheme (ECIES), a standard hybrid encryption framework built on Elliptic Curve Diffie-Hellman (ECDH) key agreement. With it, data is encrypted so that only the two parties explicitly involved in the exchange can decrypt it, bypassing any intermediary servers. Patients can therefore exchange large volumes of data directly with healthcare organizations and data-using entities without relying on anyone else, making the data distribution pathway highly efficient while minimizing the risk of privacy breaches. Coupled with appropriate incentives, this can make data trading a reality.
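ECIES combines an ECDH key agreement with a symmetric cipher and a MAC. The sketch below is a toy, standard-library-only illustration of that structure on the secp256k1 curve: the sender derives a shared secret from a fresh ephemeral key and the recipient's public key, and only the recipient's private key can reproduce it. Because Python's standard library has no AES, a SHA-256 counter-mode keystream and a simplified KDF stand in for the cipher and KDF a standardized ECIES profile would specify. The arithmetic is not constant-time, and the code should not be used to protect real data.

```python
import hashlib
import hmac
import secrets

# secp256k1 domain parameters (SEC 2).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    """Add two curve points; None is the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], -1, P) % P
    else:
        lam = (b[1] - a[1]) * pow((b[0] - a[0]) % P, -1, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)


def point_mul(k, point=G):
    """Double-and-add scalar multiplication (not constant-time)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def keygen():
    d = secrets.randbelow(N - 1) + 1
    return d, point_mul(d)


def _keys(shared_point):
    """Toy KDF: split SHA-512 of the shared x-coordinate into encryption and MAC keys."""
    okm = hashlib.sha512(shared_point[0].to_bytes(32, "big")).digest()
    return okm[:32], okm[32:]


def _xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy SHA-256 counter-mode keystream standing in for AES."""
    stream = b"".join(hashlib.sha256(key + i.to_bytes(8, "big")).digest()
                      for i in range(len(data) // 32 + 1))
    return bytes(x ^ y for x, y in zip(data, stream))


def encrypt(recipient_pub, plaintext: bytes):
    r, ephemeral_pub = keygen()                              # fresh key pair per message
    enc_key, mac_key = _keys(point_mul(r, recipient_pub))    # ECDH: r * (d*G)
    ciphertext = _xor_stream(enc_key, plaintext)
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ephemeral_pub, ciphertext, tag


def decrypt(recipient_priv: int, ephemeral_pub, ciphertext: bytes, tag: bytes) -> bytes:
    enc_key, mac_key = _keys(point_mul(recipient_priv, ephemeral_pub))  # d * (r*G)
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ciphertext was altered or is not addressed to this key")
    return _xor_stream(enc_key, ciphertext)
```

Both sides arrive at the same point because r(dG) = d(rG); a relay that sees only the ephemeral public key and the ciphertext cannot compute it.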
An alternative solution is the InterPlanetary File System (), a protocol for distributed file storage and sharing. A file added to an IPFS network can be stored and served by multiple nodes, and a unique identifier, the Content Identifier (CID), is derived from the file's hash and used to locate and link its content. If a large patient-specific dataset is encrypted with the patient's public key, uploaded to IPFS, and its CID is issued in a VC (Verifiable Credential), the patient can confidently share the data with another organization, which can verify the data's integrity by recomputing the CID from the retrieved data and checking that it matches the one in the VC.
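The integrity check works because the identifier is derived from the content itself. The sketch below computes a sha2-256 multihash, which is the hash component inside an IPFS CID, and checks retrieved bytes against it. It is a simplification: a real CID also encodes a version and codec, and for files IPFS hashes the chunked UnixFS DAG rather than the raw bytes, so this value will not match what an IPFS node reports.

```python
import hashlib

SHA2_256 = 0x12  # multihash code for sha2-256


def multihash_sha256(data: bytes) -> bytes:
    """<hash code><digest length><digest>: the hash part of an IPFS CID."""
    digest = hashlib.sha256(data).digest()
    return bytes([SHA2_256, len(digest)]) + digest


def verify_retrieved(data: bytes, expected_hex: str) -> bool:
    """Recompute the identifier from the bytes received and compare with the one in the VC."""
    return multihash_sha256(data).hex() == expected_hex
```

Because the identifier changes whenever a single byte changes, a recipient never needs to trust the nodes that served the file, only the VC that carried the identifier.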
Lastly, secure data exchange can also be facilitated through numerous personal devices and relays, interconnected peer-to-peer with cryptographic key pairs, like and Nostr.