Health Management Strategies for Medical Health Records Incorporating Graph Theory Methods

Personal health data integration and circulation is one of the important parts of China’s medical and health system reform. As a promising self-management tool for personal medical and health data, personal health records, together with electronic health records and electronic medical records, constitute the core content of personal health data [1-3]. Health records are important information for recording personal health status, including disease history, family medical history, living habits, and health examination results. In order to better manage personal health information, it is crucial to develop a scientific, rational and systematic health record management strategy [4-7].

Healthcare records are created by each healthcare organization in the course of its medical activities. They record the process, details, and results of those activities and contain a large amount of valuable medical information. Besides serving as a basis for resolving doctor-patient disputes, medical and health records provide valuable information for scientific research and education; they are both an important support for medical services and an important guarantee of medical quality and safety [8-11]. In the information society, it is therefore necessary to digitize medical and health record management. Deeper application of information technology in healthcare records management is needed both to develop the records management business and to improve the management of people’s health [12-14]. At the same time, the network is a double-edged sword. In building medical and health record information systems, the personnel involved should seize the opportunities that information technology brings while also recognizing the challenges of its application, so that the informatization of medical and health record management can be continuously improved [15-18].

Literature [19] conducted a cross-sectional study of hospitals using electronic medical records in order to explore the impact of electronic medical records on the quality of healthcare services. Questionnaire results were analyzed descriptively and statistically using the Statistical Package for the Social Sciences (SPSS), and confirmed that electronic medical records are conducive to improving the quality of healthcare services. Literature [20] reviewed the literature related to “personal health records” in various fields; the analysis showed that the phrase is an extension of the electronic record, and outlined several aspects of archives and records management that can be usefully studied in personal health records. Literature [21] used a cross-sectional, descriptive, and comparative design to examine the differences in healthcare delivery between hospitals using paper records and hospitals using electronic medical records. The comparative analysis concluded that adopting electronic medical records helps improve the quality of healthcare services. Literature [22] emphasized the important role of “lean management” in hospitals. Based on interviews and observations with medical records department personnel and hospital experts, the study found that the medical records department is an important part of the hospital’s information system and that lean management can improve the daily work of this department. Literature [23] proposed a secure management framework for patient medical records based on the Ethereum blockchain to ensure correctness throughout the medical records lifecycle. By using a secret sharing scheme in which shares are attributed to the patient, it aims to give individual patients full control over their medical records. Literature [24] evaluated a distributed security module for clinical images that uses the RC6 encryption algorithm and CSIS for distributed storage of clinical images and shares them using a fully secret sharing scheme. The scheme is considered able to make the individual key shares and the “n” image shares public, because it has strong, perfect security.

Literature [25] used a questionnaire to understand the importance of electronic medical records for patients. The results showed that electronic medical records not only save healthcare resources effectively but also allow patients to manage their own health. Literature [26] explored a blockchain-based electronic medical record management framework, which uses blockchain technology to ensure the security and privacy of medical information management, ensures the authenticity and integrity of medical records through off-chain storage of records, and emphasizes that the patient is the exclusive owner of the medical record. Literature [27] proposed a blockchain technology framework for electronic medical record management. A proof of concept in medical clinics indicated that private health organizations manage electronic medical records with relatively low security and efficiency, and that developing technological solutions for electronic medical record management is an effective way to address this issue. Literature [28] discussed the use of blockchain networks to solve problems in medical record systems, with the aim of strengthening the patient’s control over access to medical records: healthcare professionals can access and update the records only with the patient’s permission and authorization, which preserves the integrity and security of the medical record data. Literature [29] developed a medical record management framework to support healthcare delivery in hospitals. Relevant data from several hospitals were collected through questionnaires and analyzed quantitatively. The study highlighted deficiencies in medical record management in public hospitals and developed a framework for improving it.

This study aims to develop a disease prediction model to enhance the effectiveness of healthcare management. In this paper, the DHG4DP model is constructed by extracting patients’ symptom information from the electronic health record (EHR). The large hypergraph is then divided into two sub-hypergraphs so that DHG4DP can both mine the patient-disease relationship globally and discover specific patterns of accompanying diseases. In addition, the model associates the sub-hypergraphs with the patient’s time-series information to construct dynamic sub-hypergraphs, and integrates the disease relationships extracted from the two dynamic sub-hypergraphs with the subdivided disease types, ultimately improving disease prediction accuracy. The constructed model is applied in medical practice, and its management effect is examined through controlled experiments.

2

Research on healthcare management strategies

In traditional medical practice, the diagnosis and treatment of diseases lag behind their onset. That is, before the onset of a disease, doctors cannot accurately determine whether a patient is sick, and it is difficult to predict the state of the patient population dynamically. Today, artificial intelligence technology is widely used in medicine, and it strongly supports disease prediction. It makes it possible to analyze the prevalence of diseases in the population dynamically while also taking into account the social environment, population immunity, and other factors. In addition, with the support of a suitable mathematical model, disease prevalence can be predicted and analyzed dynamically, which provides an effective basis for a hospital’s health management strategy. This paper combines graph theory methods with medical and health records to construct a disease prediction network model that supports medical and health management.

2.1

Disease prediction based on medical health records

Electronic health record (EHR) data refers to a longitudinal collection of a patient’s electronic medical information that can record all data generated by the patient at a healthcare facility. An EHR contains a wide variety of patient data, such as demographics, medical history, medication and allergy history, immunization status, laboratory test results, radiological images (e.g., X-rays), vital signs, personal data such as age, height, and weight, medical procedure records, and payment information. This information plays a vital role in analyzing patient characteristics and helping physicians make decisions [30].

2.2

Complex Networks and Graph Theory Methods

It is still difficult to directly apply readily available mathematical formulas to computationally analyze a patient’s medical and health records. Instead, some analytical methods in complex networks and graph theory can be employed to map multiple data and information about patients onto the structure of networks and graphs for intuitive visual representation [31]. Quantitative computational analysis is performed through the dynamics of the model to understand and analyze the laws and properties embedded in the EHR.

The EHR process can be described by a finite graph whose vertices and undirected edges together form an undirected graph G = G(V,E). If the graph has N vertices in total, labeled i (i = 1,2,⋯,N), it can be represented exactly by an adjacency matrix $A=\left( a_{i,j} \right)$, where: (1) \[{{a}_{i,j}}=\left\{ \begin{array}{ll} 1, & e\left( i,j \right)\in E \\ 0, & \text{otherwise} \end{array} \right.\]

Once the undirected graph G = G(V,E) is quantified in this way, it can be analyzed with linear algebra and matrix computations.
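As a minimal sketch of Eq. (1), the following Python function builds the adjacency matrix from an edge list. The toy graph and the 0-based vertex numbering are assumptions made for illustration; the text numbers vertices from 1 to N.

```python
def adjacency_matrix(n, edges):
    """Build the N x N adjacency matrix of an undirected graph.

    Vertices are numbered 0..n-1 here (the text numbers them 1..N).
    """
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1
        a[j][i] = 1  # undirected: the matrix is symmetric
    return a


# Hypothetical toy graph: 4 vertices, vertex 1 linked to all others.
edges = [(0, 1), (1, 2), (1, 3)]
A = adjacency_matrix(4, edges)
for row in A:
    print(row)
```

Because the graph is undirected, the resulting matrix is symmetric, which is why each edge is written twice.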

In addition, this undirected graph can be represented by an incidence matrix $B=\left( b_{i,k} \right)$ of size N×K, where K is the number of edges. The entry $b_{i,k}$ equals 1 when vertex i (i = 1,2,⋯,N) is an endpoint of edge $e_{k}$ (k = 1,2,⋯,K), and 0 otherwise: (2) \[{{b}_{i,k}}=\left\{ \begin{array}{ll} 1, & \text{vertex } i \text{ is an endpoint of } {{e}_{k}}\in E \\ 0, & \text{otherwise} \end{array} \right.\]

At this point, the connection between the behavior and the data is established using some basic methods of graphs.
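The N×K matrix B of Eq. (2) can be sketched in the same way. The example below assumes that column k describes edge e_k and reuses a hypothetical toy edge list; a useful sanity check is that each row sum equals the number of edges touching that vertex.

```python
def incidence_matrix(n, edges):
    """Build the N x K vertex-edge matrix; column k describes edge e_k."""
    b = [[0] * len(edges) for _ in range(n)]
    for k, (i, j) in enumerate(edges):
        b[i][k] = 1  # vertex i is an endpoint of e_k
        b[j][k] = 1  # vertex j is the other endpoint
    return b


# Hypothetical toy graph with 4 vertices (0-based) and 3 edges.
edges = [(0, 1), (1, 2), (1, 3)]
B = incidence_matrix(4, edges)
print(B)                         # [[1, 0, 0], [1, 1, 1], [0, 1, 0], [0, 0, 1]]
print([sum(row) for row in B])   # [1, 3, 1, 1]
```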

When the undirected graph G = G(V,E) has a total of N vertices, the properties of individual vertices can be analyzed and compared. One such property is degree centrality. The degree centrality of a vertex i (i = 1,2,⋯,N) in an undirected graph is expressed by the degree of the vertex, i.e., the number of edges connecting vertex i to other vertices, and indicates the importance of vertex i. The notation $k_{i}$ denotes the degree of vertex i: (3) \[D_{i}^{c}={{k}_{i}}=\sum\limits_{j=1}^{N}{{{a}_{i,j}}}\]

The more edges connect a vertex i to other vertices in an undirected graph, the more central that vertex is among the vertices of the graph, i.e., the greater its centrality. The degree distribution of an undirected graph can also be calculated. If the total number of vertices in the graph is N and the number of vertices with exactly k incident edges is $N_{k}$, then the degree distribution is given by Eq. (4): (4) \[{{p}_{k}}=\frac{{{N}_{k}}}{N}\]

The degree distribution summarizes how connectivity is spread across the vertices and serves as a basic descriptor of the structural complexity of an undirected graph.
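Eqs. (3) and (4) translate directly into row sums and a frequency count. The sketch below is illustrative only; the adjacency matrix is a hypothetical four-vertex star graph.

```python
from collections import Counter


def degree_centrality(a):
    """Eq. (3): the degree k_i of each vertex is the i-th row sum of A."""
    return [sum(row) for row in a]


def degree_distribution(a):
    """Eq. (4): p_k = N_k / N for every degree k that occurs in the graph."""
    n = len(a)
    counts = Counter(degree_centrality(a))
    return {k: n_k / n for k, n_k in sorted(counts.items())}


# Hypothetical star graph: vertex 1 is connected to vertices 0, 2 and 3.
A = [[0, 1, 0, 0],
     [1, 0, 1, 1],
     [0, 1, 0, 0],
     [0, 1, 0, 0]]
print(degree_centrality(A))    # [1, 3, 1, 1]
print(degree_distribution(A))  # {1: 0.75, 3: 0.25}
```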

After representing a complex behavior as an undirected graph with N vertices, the linkage efficiency of the graph can be evaluated. For two vertices i and j of an undirected graph G = G(V,E), the linkage efficiency $e_{i,j}$ is defined as the reciprocal of the distance $d_{i,j}$ between them, i.e., ${{e}_{i,j}}=\frac{1}{{{d}_{i,j}}}$. If no path connects the two vertices i and j, their distance is $d_{i,j}=\infty$, so the linkage efficiency $e_{i,j}$ equals 0. The efficiency of the whole graph G = G(V,E) is then: (5) \[{{E}_{f}}=\frac{1}{N\left( N-1 \right)}\sum\limits_{i=1}^{N}{\sum\limits_{\begin{smallmatrix} j=1 \\ j\ne i \end{smallmatrix}}^{N}{\frac{1}{{{d}_{i,j}}}}}\] that is, the average of the efficiency $e_{i,j}$ over all pairs of distinct vertices i and j. The shorter the distances between vertices, the higher the efficiency of the links represented by the undirected graph.
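A small sketch of Eq. (5) follows. It assumes that the distance d_{i,j} is the shortest-path length in hops, computed here by breadth-first search; unreachable pairs contribute 0, matching the convention above.

```python
from collections import deque


def shortest_path_lengths(a, source):
    """Breadth-first search: hop counts from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, connected in enumerate(a[u]):
            if connected and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def global_efficiency(a):
    """Eq. (5): average of 1/d_ij over all ordered pairs of distinct vertices."""
    n = len(a)
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        dist = shortest_path_lengths(a, i)
        # Unreachable vertices are absent from dist: d = infinity, 1/d = 0.
        total += sum(1.0 / d for j, d in dist.items() if j != i)
    return total / (n * (n - 1))


# Hypothetical star graph: vertex 1 is connected to vertices 0, 2 and 3.
A = [[0, 1, 0, 0],
     [1, 0, 1, 1],
     [0, 1, 0, 0],
     [0, 1, 0, 0]]
print(global_efficiency(A))  # 0.75
```

In the star graph, six ordered pairs are at distance 1 and six at distance 2, so the sum is 6 + 3 = 9 and the efficiency is 9/12 = 0.75.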

3

Disease Prediction Model Based on Dynamic Hypergraphic Networks

3.1

Modeling framework

With the development of graph neural networks, dynamic graph neural networks have received attention for their ability to model temporal information flexibly. Multimodal methods have also performed well in various fields of deep learning: by fusing multiple forms of information, they can outperform unimodal methods [32].

The Dynamic Hypergraph Neural Network based Disease Prediction Model (DHG4DP) designed in this paper is shown in Figure 1. In the first step, the model processes the dataset: the patient’s symptom information is extracted and used to initialize the patient embedding. The model then extracts the global disease relationship graph and, based on this graph, constructs the local disease hypergraph and the sudden disease hypergraph. In the second step, the model divides diseases into persistent and sudden diseases and performs hypergraph convolution and feature fusion according to the two sub-hypergraphs constructed in the previous step to obtain the disease embedding. In the third step, the model handles temporal relationships: it categorizes the impact of diseases on the patient into long-term diseases, emerging associated diseases, and emerging unrelated diseases, generates the embedding of each hospitalization record through the attention mechanism, and then models the patient’s temporal information through the M-GRU module to obtain the final patient embedding. Finally, the model passes the final patient embedding through the MLP layer to compute the probability of each disease and obtain the disease prediction results.

3.2

Formalized definitions

SYMBOL DESCRIPTION: During a medical visit, a patient may be diagnosed with one or more conditions, which are usually indicated by specific medical codes.

Given an electronic medical record dataset $\left\{ {{\gamma }_{u}}\mid u\in U \right\}$, where U denotes the set of patients, ${{\gamma }_{u}}=\left( V_{1}^{u},V_{2}^{u},\cdots ,V_{T}^{u} \right)$ denotes the sequence of historical visits of patient u. Each visit $V_{t}^{u}=\left\{ C_{t}^{u},E_{t}^{u} \right\}$ is recorded with a subset $C_{t}^{u}\subset \mathbb{C}$ of medical codes and a subset $E_{t}^{u}\subset \mathbb{E}$ of clinical events. $\mathbb{C}=\left\{ {{c}_{1}},{{c}_{2}},\cdots ,{{c}_{\left| \mathbb{C} \right|}} \right\}$ is the set of diseases in the EHR dataset, where medical codes represent diseases. Similarly, $\mathbb{E}=\left\{ {{E}_{1}},{{E}_{2}},\cdots ,{{E}_{\left| \mathbb{E} \right|}} \right\}$ is the set of clinical events in the EHR dataset, where $\left| \mathbb{E} \right|$ denotes the number of clinical events. For patient u, the i-th clinical event $E_{t,i}^{u}$ of the t-th visit consists of an event type ${{q}_{i}}\in \mathbb{Q}$ and a set of clinical event features $\left\{ A_{i}^{1},\cdots ,A_{i}^{\left| {{m}_{i}} \right|} \right\}$, where $\left| {{m}_{i}} \right|$ denotes the number of clinical features. Each clinical feature $A_{i}^{k}$ is a tuple $\left( n_{i}^{k},v_{i}^{k} \right)$ consisting of a feature name $n_{i}^{k}\in \mathbb{N}$ and a feature value $v_{i}^{k}\in \mathbb{X}$, where $\mathbb{N}$ and $\mathbb{X}$ are the set of feature names and the set of feature values, respectively.

Problem Statement: Predict the diseases that will be diagnosed at a patient’s (T+1)-th visit. Specifically, given a patient u with a history of T visits, the task is to predict ${{y}^{T+1}}\in {{\left\{ 0,1 \right\}}^{\left| \mathbb{C} \right|}}$, a vector over all diseases indicating which of them appear at the patient’s (T+1)-th visit; the model outputs a probability for each entry.
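To make the notation concrete, the sketch below shows one possible in-memory layout for a visit sequence and the multi-hot label vector. The class name `Visit`, the helper names, and all codes and events are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Visit:
    """One visit V_t^u = {C_t^u, E_t^u}."""
    codes: set                                  # C_t^u: diagnosed medical codes
    events: list = field(default_factory=list)  # E_t^u: (event type, {feature name: value})


def multi_hot(codes, vocabulary):
    """Encode a set of codes as a 0/1 vector over the ordered code vocabulary."""
    return [1 if c in codes else 0 for c in vocabulary]


def make_example(visits, vocabulary):
    """Split T+1 recorded visits into the first T visits and the label y^{T+1}."""
    if len(visits) < 2:
        raise ValueError("at least two visits are needed to form a prediction target")
    return visits[:-1], multi_hot(visits[-1].codes, vocabulary)


# Hypothetical codes and events, for illustration only.
vocabulary = ["c1", "c2", "c3", "c4"]
history = [
    Visit({"c1", "c2"}, [("lab", {"glucose": 7.1})]),
    Visit({"c2"}),
    Visit({"c2", "c4"}),
]
past, y_next = make_example(history, vocabulary)
print(len(past), y_next)  # 2 [0, 1, 0, 1]
```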

3.3

Dynamic word hypergraph construction

Since a patient has multiple visits and is not diagnosed with exactly the same diseases at each visit, a dynamic hypergraph can be obtained from the patient’s historical visits. In this section, let $G=\left\{ {{G}^{1}},{{G}^{2}},\cdots ,{{G}^{T}} \right\}$ denote the dynamic hypergraph and ${{G}^{t}}=\left( {{P}^{t}},{{H}^{t}} \right)$ the hypergraph at the patient’s t-th visit. ${{P}^{t}}\subset \mathbb{C}$ is the set of all nodes in the hypergraph $G^{t}$, i.e., the set of all diseases diagnosed at the patient’s t-th visit. Similarly, $H^{t}$ denotes the set of hyperedges of $G^{t}$. $G^{t}$ has an incidence matrix $O^{t}$ of size $\left| {{P}^{t}} \right|\times \left| {{H}^{t}} \right|$, whose entry $O_{c,u}^{t}$ is 1 when patient u has disease c in the t-th visit record and 0 otherwise.

In practice, a disease diagnosed at a single visit may be either a persistent disease or a sudden-onset disease. In order to model the fine-grained higher-order relationship between persistent and sudden diseases, for the set $\mathbb{D}^{t}$ of diseases in the record of the t-th visit (t ≥ 2), this section further classifies these diseases into persistent and sudden diseases:

Persistent disease: $\mathbb{D}_{p}^{t}={{\mathbb{D}}^{t}}\wedge {{\mathbb{D}}^{t-1}}\in {{\left\{ 0,1 \right\}}^{\left| \mathbb{C} \right|}}$, i.e., a disease that is present in the record of visit t and also in the record of visit t–1.

Sudden disease: $\mathbb{D}_{em}^{t}={{\mathbb{D}}^{t}}\wedge \neg \left( {{\mathbb{D}}^{t-1}} \right)\in {{\left\{ 0,1 \right\}}^{\left| \mathbb{C} \right|}}$, i.e., a disease that is present in the record of visit t but not in the record of visit t–1.
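On multi-hot vectors, the two definitions reduce to element-wise Boolean operations. A minimal sketch with hypothetical visit vectors:

```python
def split_diseases(d_t, d_prev):
    """Split the multi-hot vector of visit t into persistent and sudden parts.

    persistent = D^t AND D^{t-1};  sudden = D^t AND NOT D^{t-1}.
    """
    persistent = [a & b for a, b in zip(d_t, d_prev)]
    sudden = [a & (1 - b) for a, b in zip(d_t, d_prev)]
    return persistent, sudden


# Hypothetical vectors over a vocabulary of four diseases.
d_prev = [1, 1, 0, 0]   # visit t-1
d_t = [1, 0, 1, 0]      # visit t
persistent, sudden = split_diseases(d_t, d_prev)
print(persistent)  # [1, 0, 0, 0]
print(sudden)      # [0, 0, 1, 0]
```

Every disease diagnosed at visit t falls into exactly one of the two groups, so the two vectors add up to the vector of visit t.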

After categorizing the diseases diagnosed at the t visit into two types, this section constructs subgraphs of both types based on the hypergraph G^t:

Local disease hypergraph $\text{G}_{D}^{t}$: This is a hypergraph consisting of the persistent diseases diagnosed at visit t, where the persistent diseases serve as nodes and visit t serves as a hyperedge.

Sudden disease hypergraph $\text{G}_{DN}^{t}$: This is a hypergraph describing the relationship between each sudden disease and the persistent diseases. In this hypergraph, diseases serve as nodes and visit t serves as a hyperedge.

In real healthcare scenarios, a disease diagnosed at a single visit may be long-standing or sudden in onset. To capture these effects, the dynamic hypergraph learning module extracts both the local context and the sudden context.

Local context: for each diagnostic node (i.e., the node corresponding to a diagnosed disease in the hypergraph), this section aggregates the representations of the diagnostic nodes connected to it in the hypergraph $\text{G}_{D}^{t}$ and the representations of the sudden disease nodes connected to it in the hypergraph $\text{G}_{DN}^{t}$ (i.e., the nodes corresponding to diagnosed sudden diseases), as shown in Eq. (6): (6) \[Z_{D}^{t}=\text{HyperGCN}\left( \text{G}_{D}^{t} \right)+\text{HyperGCN}\left( \text{G}_{DN}^{t} \right)\] where HyperGCN(·) denotes the hypergraph convolution operation, $\text{G}_{D}^{t}$ denotes the local disease hypergraph of the patient’s t-th visit, and $\text{G}_{DN}^{t}$ denotes the sudden disease hypergraph of the patient’s t-th visit.

Sudden context: for each sudden disease node, this section aggregates the representations of the diagnostic nodes connected to it in the hypergraph $\text{G}_{DN}^{t}$ to obtain the sudden context, as shown in Eq. (7): (7) \[Z_{N}^{t}=\text{HyperGCN}\left( \text{G}_{DN}^{t} \right)\] where HyperGCN(·) denotes the hypergraph convolution operation and $\text{G}_{DN}^{t}$ denotes the sudden disease hypergraph of the patient’s t-th visit.

The above operations capture the fine-grained higher-order relationships among diseases within the same visit. Diseases in different visits are also related. To capture these relationships, this section designs a transfer function that extracts the transfer context from the historical visits, as follows.

Sudden-onset diseases $\mathbb{D}_{em}^{t}$ diagnosed at a single visit are often induced by a combination of historical diseases. In this section, we design a dot-product attention as the transfer function to capture the sudden transfer context: (8) \[Z_{en}^{t}=\text{Att}\left( Z_{N}^{t-1},Z_{N}^{t-1},Z_{D}^{t} \right)\] where $Z_{en}^{t}$ denotes the sudden transfer context at the t-th visit. The dot-product attention function is defined as: (9) \[\text{Att}\left( Q,K,V \right)=\text{softmax}\left( \frac{Q{{W}_{q}}{{\left( K{{W}_{k}} \right)}^{\top }}}{\sqrt{a}} \right)V{{W}_{v}}\] where $W_{q}$, $W_{k}$, and $W_{v}$ are the attention weights and a is the attention scaling factor.
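The attention function of Eq. (9) can be sketched in plain Python as follows. The matrix shapes, the identity weights, and the choice of a as the width of the projected queries are assumptions made for illustration; a practical implementation would use a tensor library and learned weights.

```python
import math


def matmul(x, y):
    """Plain-Python matrix product of two row-major nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def softmax_rows(x):
    """Row-wise softmax with the usual max-subtraction for stability."""
    out = []
    for row in x:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def attention(q, k, v, w_q, w_k, w_v):
    """Eq. (9): softmax(Q W_q (K W_k)^T / sqrt(a)) V W_v."""
    qp, kp = matmul(q, w_q), matmul(k, w_k)
    a = len(w_q[0])  # assumed: a is the width of the projected queries
    kp_t = [list(col) for col in zip(*kp)]
    scores = [[s / math.sqrt(a) for s in row] for row in matmul(qp, kp_t)]
    return matmul(softmax_rows(scores), matmul(v, w_v))


# Hypothetical inputs: two nodes with two-dimensional features, identity weights.
identity = [[1.0, 0.0], [0.0, 1.0]]
q = k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [1.0, 2.0]]
out = attention(q, k, v, identity, identity, identity)
print(out)
```

Each output row is a convex combination of the rows of V W_v, so with two identical value rows the output reproduces that row regardless of the attention scores.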

Persistent diseases have already been diagnosed at previous visits, so this section assumes that their representation remains the same from visit to visit.

The M-GRU is used to capture the interaction between the persistent disease representation $I_{p}^{t}$ at visit t and the sudden transfer context $Z_{en}^{t}$ at visit t, which yields the transfer output $Z_{p}^{t}$ for visit t: (10) \[Z_{p}^{t}=\text{GRU}\left( \text{Concat}\left( I_{p}^{t},Z_{en}^{t} \right),h_{p}^{t-1} \right)\] where $h_{p}^{t-1}$ is the output of the M-GRU at visit t–1 and Concat(·) is the concatenation operation. Specifically, when t = 1, since no sudden disease has been diagnosed at the first visit, this section sets $\text{P}_{p}^{t}=\text{D}_{p}^{1}$ and uses the M-GRU with an initial hidden state $h_{p}^{0}=0$ to compute the transfer output for the first visit: (11) \[h_{p}^{1}=\text{GRU}\left( I_{p}^{1},h_{p}^{0} \right)\] where $I_{p}^{1}$ denotes the representation of the persistent diseases diagnosed at the first visit.

After that, this section applies max pooling to the transfer output to generate the visit representation $v^{t}$ for the t-th visit: (12) \[{{v}^{t}}=\text{max\_pooling}\left( h_{p}^{t} \right)\]

Finally, this section uses the attention mechanism to weigh the impact of each visit in a patient’s history on the predicted outcome and obtain the final visit representation: (13) \[\alpha =\text{softmax}\left( \left[ {{v}^{1}},{{v}^{2}},\ldots ,{{v}^{T}} \right]{{w}_{\alpha }} \right)\] (14) \[{{O}_{v}}=\alpha {{\left[ {{v}^{1}},{{v}^{2}},\ldots ,{{v}^{T}} \right]}^{\top }}\] where $w_{\alpha}$ is the trainable parameter, $O_{v}$ is the final visit representation, and $\alpha$ denotes the attention score.

3.4

Subhypergraph Convolution and Fusion

After constructing two dynamic sub-hypergraphs in the previous subsection, this paper uses the hypergraph convolution module to process them. Before that, the inputs of hypergraph convolution are clarified.

The different relationships between a disease and a patient lead directly to different impacts on the patient. Therefore, we create two different embeddings for the same disease:

Direct disease. The disease is diagnosed directly in the patient; this embedding is denoted as $E_{d}$.

Indirect disease. The disease is a neighbor of a disease directly diagnosed in the patient; this embedding is denoted as $E_{n}$.

For the convenience of subsequent calculations, both disease embeddings have the same dimensions, i.e., ${{E}_{d}},{{E}_{n}}\in {{\mathbb{R}}^{\left| D \right|\times \left| di{{m}_{d}} \right|}}$. In order to integrate the patient embedding and the disease embedding, a bridge needs to be built between the two, i.e., the relationship between disease and patient. In this paper, we construct a disease-patient relationship graph, represented by the adjacency matrix $A_{dp}$ with entries in {0,1}, such that if patient p suffers from disease d in any hospitalization record, then A_dp[p][d] = 1. Meanwhile, let $c^{t}$ and $n^{t}$ be the multi-hot vectors denoting the diseases that are directly diagnosed in hospitalization record t and the diseases that are adjacent to the directly diagnosed diseases, respectively. The embeddings of the patient and the diseases can then be fused according to the following two equations: (15) \[\begin{aligned} Z_{d}^{t}= & {{A}_{dp}}{{E}_{p}}{{W}_{pd}}+{{c}^{t}}\odot {{E}_{d}}+\text{hyperconv}\left( H_{c}^{t},{{E}_{d}} \right) \\ & +\text{hyperconv}\left( H_{n}^{t},{{E}_{n}} \right) \end{aligned}\] (16) \[Z_{n}^{t}={{n}^{t}}\odot {{E}_{n}}+\text{hyperconv}\left( H_{o}^{t},{{E}_{n}} \right)+\text{hyperconv}\left( H_{n}^{t},{{E}_{d}} \right)\] where ${{A}_{dp}}{{E}_{p}}{{W}_{pd}}$ is the ordinary graph neural network aggregation and $W_{pd}$ is a learnable parameter matrix that maps the patient embedding into the vector space of the disease embedding. ⊙ denotes the element-wise product: ${{c}^{t}}\odot {{E}_{d}}$ keeps the rows of $E_{d}$ for the diseases the patient directly suffers from and sets the rows for all other diseases to 0. hyperconv(·) denotes the hypergraph convolution module.

The two formulas above give the two disease embeddings at hospitalization record t, i.e., the direct disease embedding $Z_{d}^{t}$ and the indirect (neighboring) disease embedding $Z_{n}^{t}$. They are then passed through a fully connected layer and the LeakyReLU activation function to obtain the final disease embedding $F_{\left\{ d,n \right\}}^{t}$: (17) \[F_{\left\{ d,n \right\}}^{t}=\text{LeakyReLU}\left( Z_{\left\{ d,n \right\}}^{t}{{W}_{relu}} \right)\] where $W_{relu}$ is the learnable parameter matrix.
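This section does not spell out the internals of hyperconv(·). As a rough illustration only, the sketch below substitutes a simplified, weight-free variant: a two-stage mean aggregation that first averages node features into each hyperedge and then averages hyperedge features back into each node. This is an assumption, not the module used in DHG4DP. The LeakyReLU of Eq. (17) is applied afterwards without the learnable matrix W_relu, and the slope 0.01 is likewise assumed.

```python
def hyperconv_mean(h, x):
    """Two-stage mean aggregation on a hypergraph (simplified, no learned weights).

    h: incidence matrix, nodes x hyperedges (0/1 entries).
    x: node features, nodes x dim.
    """
    n_nodes, n_edges, dim = len(h), len(h[0]), len(x[0])
    # Stage 1: each hyperedge averages the features of its member nodes.
    edge_feats = []
    for e in range(n_edges):
        members = [v for v in range(n_nodes) if h[v][e]]
        edge_feats.append(
            [sum(x[v][d] for v in members) / len(members) for d in range(dim)]
            if members else [0.0] * dim
        )
    # Stage 2: each node averages the features of the hyperedges it belongs to.
    out = []
    for v in range(n_nodes):
        incident = [e for e in range(n_edges) if h[v][e]]
        out.append(
            [sum(edge_feats[e][d] for e in incident) / len(incident) for d in range(dim)]
            if incident else [0.0] * dim
        )
    return out


def leaky_relu(z, slope=0.01):
    """Element-wise LeakyReLU; the slope value is an assumed default."""
    return [[v if v > 0 else slope * v for v in row] for row in z]


# Hypothetical hypergraph: 3 disease nodes, 2 hyperedges, 1-dimensional features.
H = [[1, 0],
     [1, 1],
     [0, 1]]
X = [[2.0], [4.0], [-6.0]]
Z = hyperconv_mean(H, X)
print(Z)              # [[3.0], [1.0], [-1.0]]
print(leaky_relu(Z))
```

Node 1 belongs to both hyperedges, so its output is the mean of the two hyperedge averages (3.0 and -1.0), which is how information flows between diseases that never share a hyperedge directly.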

3.5

Timing feature learning

In this paper, diseases are divided into three categories in terms of temporal relationships:

Long-term diseases. The vector is denoted $x_{l}^{t}={{c}^{t}}\wedge {{c}^{t-1}}$ and represents diseases that are present in both hospitalization records t and t–1.

Emerging associated diseases. The vector is denoted $x_{nn}^{t}={{c}^{t}}\wedge {{n}^{t-1}}$ and represents diseases that were considered associated potential diseases at hospitalization record t–1 and are diagnosed at hospitalization record t.

Emerging unrelated diseases. The vector is denoted $x_{nu}^{t}={{c}^{t}}\wedge \neg \left( {{c}^{t-1}}\vee {{n}^{t-1}} \right)$ and represents diseases that are diagnosed at hospitalization record t but were neither diagnosed nor neighboring diseases at hospitalization record t–1.
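The three masks can be computed element-wise from the multi-hot vectors. The sketch below uses hypothetical vectors and assumes that the neighbor vector of record t–1 does not overlap the diagnosed vector of record t–1; under that assumption the three masks partition the diseases diagnosed at record t.

```python
def temporal_masks(c_t, c_prev, n_prev):
    """Return (x_l, x_nn, x_nu) for hospitalization record t."""
    x_l = [a & b for a, b in zip(c_t, c_prev)]                           # c^t AND c^{t-1}
    x_nn = [a & n for a, n in zip(c_t, n_prev)]                          # c^t AND n^{t-1}
    x_nu = [a & (1 - (b | n)) for a, b, n in zip(c_t, c_prev, n_prev)]   # c^t AND NOT (c^{t-1} OR n^{t-1})
    return x_l, x_nn, x_nu


# Hypothetical vectors over five diseases.
c_prev = [1, 1, 0, 0, 0]  # diagnosed at record t-1
n_prev = [0, 0, 1, 0, 0]  # neighbors of the diagnosed diseases at record t-1
c_t = [1, 0, 1, 1, 0]     # diagnosed at record t
x_l, x_nn, x_nu = temporal_masks(c_t, c_prev, n_prev)
print(x_l)   # [1, 0, 0, 0, 0]
print(x_nn)  # [0, 0, 1, 0, 0]
print(x_nu)  # [0, 0, 0, 1, 0]
```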

The attention mechanism is then used to learn the impact of the latter two disease categories on the generation of the patient’s disease embedding. Let $E_{r}$ denote the embedding matrix of diseases that appear neither in the existing disease records nor among the direct neighbors of the diseases in those records. The attention is computed as follows: (18) \[h_{nn}^{t}=\text{Att}\left( x_{nn}^{t}\odot F_{n}^{t-1},x_{nn}^{t}\odot F_{n}^{t-1},x_{nn}^{t}\odot F_{d}^{t} \right)\] (19) \[h_{nu}^{t}=\text{Att}\left( x_{nu}^{t}\odot {{E}_{r}},x_{nu}^{t}\odot {{E}_{r}},x_{nu}^{t}\odot F_{d}^{t} \right)\] where the attention module Att(·) is defined as: (20) \[\text{Att}\left( Q,K,V \right)=\text{softmax}\left( \frac{Q{{W}_{q}}{{\left( K{{W}_{k}} \right)}^{\top }}}{\sqrt{a}} \right)V{{W}_{v}}\] where a is the attention dimension and $W_{q}$, $W_{k}$, $W_{v}$ are the attention weights.

For long-term diseases $x_{l}^{t}$, we model the temporal features $h_{l}^{t}$ up to hospitalization record t with the M-GRU module as follows:

(21) \[h_{l}^{t}=\text{M-GRU}\left( x_{l}^{t}\odot F_{d}^{t},h_{nn}^{t},h_{nu}^{t},h_{l}^{t-1} \right)\]

(22) \[{{z}^{t}}=\sigma \left( x_{l}^{t}\odot F_{d}^{t}{{W}_{z}}+h_{l}^{t-1}{{U}_{z}}+{{b}_{z}} \right)\]

(23) \[{{r}^{t}}=\sigma \left( x_{l}^{t}\odot F_{d}^{t}{{W}_{r}}+h_{l}^{t-1}{{U}_{r}}+{{b}_{r}} \right)\]

(24) \[{{\hat{h}}^{t}}=\phi \left( x_{l}^{t}\odot F_{d}^{t}{{W}_{h}}+\left( {{r}^{t}}\odot h_{l}^{t-1} \right){{U}_{h}}+{{b}_{h}} \right)\]

(25) \[{{\tilde{h}}^{t}}=\phi \left( h_{nn}^{t}+h_{nu}^{t} \right)\]

(26) \[h_{l}^{t}=\left( 1-{{z}^{t}} \right)\odot h_{l}^{t-1}+{{z}^{t}}\odot {{\hat{h}}^{t}}+{{\tilde{h}}^{t}}\]

Here, $W_{\{z,r,h\}}$ and $U_{\{z,r,h\}}$ are the weight matrices of the GRU and $b_{\{z,r,h\}}$ are the bias parameters. After the M-GRU, we use a max pooling layer to obtain the embedding $v^{t}$ of hospitalization record t: (27) \[{{v}^{t}}=\text{max\_pooling}\left( h_{l}^{t} \right)\]
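One M-GRU update, Eqs. (22) to (27), can be sketched in plain Python as below. Several points are assumptions: σ is taken to be the logistic sigmoid and ϕ the hyperbolic tangent (the usual GRU choices, which the text does not state), max pooling is taken over the disease axis, and the parameter dictionary with all-zero weights exists only to make the example easy to check by hand.

```python
import math


def matmul(x, y):
    """Plain-Python matrix product of two row-major nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def affine(x, w, h, u, b):
    """Compute x W + h U + b, adding the bias vector b to every row."""
    xw, hu = matmul(x, w), matmul(h, u)
    return [[p + q + bj for p, q, bj in zip(r1, r2, b)] for r1, r2 in zip(xw, hu)]


def sigmoid(v):
    """Numerically stable logistic function."""
    return 1.0 / (1.0 + math.exp(-v)) if v >= 0 else math.exp(v) / (1.0 + math.exp(v))


def m_gru_step(x, h_nn, h_nu, h_prev, p):
    """One M-GRU update; x stands for the masked input x_l^t ⊙ F_d^t."""
    z = [[sigmoid(v) for v in row]
         for row in affine(x, p["W_z"], h_prev, p["U_z"], p["b_z"])]      # Eq. (22)
    r = [[sigmoid(v) for v in row]
         for row in affine(x, p["W_r"], h_prev, p["U_r"], p["b_r"])]      # Eq. (23)
    rh = [[a * b for a, b in zip(r1, r2)] for r1, r2 in zip(r, h_prev)]
    h_hat = [[math.tanh(v) for v in row]
             for row in affine(x, p["W_h"], rh, p["U_h"], p["b_h"])]      # Eq. (24)
    h_tilde = [[math.tanh(a + b) for a, b in zip(r1, r2)]
               for r1, r2 in zip(h_nn, h_nu)]                             # Eq. (25)
    return [[(1 - zi) * hp + zi * hh + ht                                 # Eq. (26)
             for zi, hp, hh, ht in zip(zr, pr, hr, tr)]
            for zr, pr, hr, tr in zip(z, h_prev, h_hat, h_tilde)]


def max_pooling(h):
    """Eq. (27): maximum over the rows (assumed to be the disease axis)."""
    return [max(col) for col in zip(*h)]


# Hypothetical parameters: all zeros, so z = 0.5 and both candidate states are 0.
zeros = [[0.0, 0.0], [0.0, 0.0]]
params = {name: zeros for name in ("W_z", "U_z", "W_r", "U_r", "W_h", "U_h")}
params.update({"b_z": [0.0, 0.0], "b_r": [0.0, 0.0], "b_h": [0.0, 0.0]})
no_context = [[0.0, 0.0]]
h_new = m_gru_step([[1.0, 1.0]], no_context, no_context, [[2.0, -4.0]], params)
print(h_new)                                   # [[1.0, -2.0]]
print(max_pooling([[1.0, -2.0], [0.0, 3.0]]))  # [1.0, 3.0]
```

Note that, unlike a standard GRU, the term from Eq. (25) is added outside the convex combination governed by the update gate, so the emerging-disease context always contributes to the new state.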

After the above computation, a location-based attention mechanism is used to fuse the embeddings of all hospitalization records into the final patient embedding $o_{p}$:

(28) \[\alpha =\mathrm{softmax}\left( \left[ v^{1},v^{2},\ldots ,v^{T} \right]W_{\alpha } \right)\]

(29) \[o_{p}=\alpha \left[ v^{1},v^{2},\ldots ,v^{T} \right]^{T}\]

Here W_α is the weight matrix of the learnable attention mechanism and α is the attention score.
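The fusion in Eqs. (28) and (29) amounts to a softmax-weighted average of the visit embeddings. The sketch below assumes that the visit embeddings are stacked as rows and that $W_{\alpha}$ maps each embedding to a single score, so that $\alpha$ holds one weight per visit; the function name `fuse_visits` and the sample values are hypothetical.

```python
import math

def fuse_visits(visits, w_alpha):
    """Fuse T visit embeddings into one patient embedding.

    visits  : list of T embeddings v^t, each a list of d floats
    w_alpha : list of d floats (assumed shape d x 1 for W_alpha)
    """
    # One scalar score per visit: v^t . W_alpha
    scores = [sum(v_i * w_i for v_i, w_i in zip(v, w_alpha)) for v in visits]
    # Softmax over visits, Eq. (28), shifted by the maximum for stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    # Weighted sum of visit embeddings, Eq. (29)
    dim = len(visits[0])
    o_p = [sum(a * v[j] for a, v in zip(alpha, visits)) for j in range(dim)]
    return alpha, o_p

# Toy check (hypothetical values): zero weights give every visit the same score.
alpha, o_p = fuse_visits([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
```

In the toy call the two visits receive equal weight, so `o_p` is their elementwise mean; a trained $W_{\alpha}$ would instead emphasize the visits that are most informative for the prediction.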

The disease predictions of the model are then obtained through a multilayer perceptron (MLP) layer.

4 Empirical analysis

4.1 Test of disease prediction model effect

4.1.1 Data preprocessing and experimental setup

The MIMIC-III dataset was chosen for this experiment to verify the model effect, and its detailed information can be found in Table 1. This dataset has 1863 diagnostic items.

Table 1. MIMIC-III dataset information

Statistical item	Quantity
Diagnoses	1863
Treatment procedures	1365
Patients	5596
Dictionary size of text data	64352
Average number of diagnoses	12.8
Average number of treatment procedures	4.33
Average words per visit	2234

Because the purpose of this experiment is to predict the diseases with which a patient will present at the next visit, only patients with at least two visits in the EHR data are retained; patients with fewer than two visits are removed. Among these, only patients whose visits contain both textual information and diagnosis and treatment procedure information are kept. From the MIMIC-III database, four tables were used: DIAGNOSES_ICD, PROCEDURES_ICD, ADMISSIONS, and NOTEEVENTS, which provide the patients' diagnostic codes, treatment procedure codes, visit time information, and textual information such as medical advice, respectively. Keywords were extracted from the textual information using the TF-IDF method.
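The keyword extraction step can be illustrated with a small TF-IDF sketch. The paper does not state which TF-IDF variant or tokenizer was used, so the weighting below (relative term frequency times log(N / document frequency)), the function name `tfidf_keywords`, and the sample notes are all assumptions for illustration; the notes are assumed to be tokenized already.

```python
import math
from collections import Counter

def tfidf_keywords(documents, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs
    doc_freq = Counter(term for doc in documents for term in set(doc))
    keywords = []
    for doc in documents:
        counts = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score; ties are broken alphabetically for determinism
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords.append(ranked[:top_k])
    return keywords

# Hypothetical tokenized clinical notes, one list per visit
notes = [
    ["chest", "pain", "fever"],
    ["chest", "pain", "cough"],
    ["fever", "rash", "rash"],
]
top_terms = tfidf_keywords(notes, top_k=1)
```

Terms that occur in many notes receive a low inverse document frequency, so the extracted keywords favor words that distinguish one visit from the others.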

During the experiment, the MIMIC-III dataset is divided into training, validation, and test sets in a ratio of 6:2:2. The selected evaluation metrics are Precision, Recall, and the mean F1 value. Recall is the proportion of actual positive samples that are predicted as positive, and Precision is the proportion of samples predicted as positive that are actually positive. The mean F1 value combines the two and provides a comprehensive evaluation of the model's prediction performance.
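The split and the metrics can be sketched as follows. The function names, the fixed shuffle seed, and the choice to score each visit by comparing sets of diagnosis codes are assumptions; the paper does not describe how the split was randomized or how the F1 value was averaged.

```python
import random

def split_622(items, seed=0):
    """Shuffle items and split them into training, validation, and test sets (6:2:2)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 6 // 10
    n_val = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # remainder goes to the test set
    return train, val, test

def precision_recall_f1(true_codes, predicted_codes):
    """Precision, recall, and F1 for one visit, comparing sets of diagnosis codes."""
    true_set, pred_set = set(true_codes), set(predicted_codes)
    hits = len(true_set & pred_set)
    precision = hits / len(pred_set) if pred_set else 0.0
    recall = hits / len(true_set) if true_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Hypothetical usage: ten patient identifiers and one scored prediction
train_ids, val_ids, test_ids = split_622(range(10))
p, r, f1 = precision_recall_f1({"a", "b", "c"}, {"b", "c", "d", "e"})
```

In the example, two of the four predicted codes are correct (precision 0.5) and two of the three true codes are recovered (recall 2/3); averaging the per-visit F1 values over the test set would give one possible definition of the mean F1.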

4.1.2 Experimental results

To verify the effectiveness of the DHG4DP model, five baseline models are selected for comparison: LIGHTED, INPLIM, Tr-LSTM, tBNA-PR, and TURTAM. The experimental results on the MIMIC-III dataset are shown in Fig. 2. The proposed model outperforms all baseline models in Precision, Recall, and F1 value. Compared with TURTAM, the best-performing baseline, its Precision, Recall, and F1 values improve by 3.8%, 2.6%, and 3.7%, respectively.

4.2 Analysis of the effect of health care management

4.2.1 Subjects

A tertiary hospital in city A was selected for the study of the health management strategy reform. The 180 medical workers who carried out healthcare management with conventional methods from January 2022 to November 2022 formed the control group, and the 180 medical workers who carried out healthcare management with the disease prediction network model from December 2022 to October 2023 formed the observation group. The statistics of the survey data are shown in Table 2. The difference in gender between the two groups was not statistically significant (χ²=0.0311, P=0.7896). The mean ages of the control and observation groups were 33.53 and 32.51 years, respectively, and the difference was not significant (t=0.8724, P=0.4116). In addition, the 1050 patients admitted during each group's period were investigated as the patient groups. The difference in gender between the two patient groups was not statistically significant (χ²=0.2468, P=0.6236). Their mean ages were 53.41 years in the control group and 51.33 years in the observation group, and the difference was not significant (t=0.7365, P=0.4635). The experimental subjects selected in this paper are therefore suitable for a controlled comparison.

Table 2. Survey data

Data	Category	Group	N	M	SD	t/χ²	P
Gender	Female	Observation group	71	-	-	0.0311	0.7896
	Male	Observation group	109	-	-
	Female	Control group	69	-	-
	Male	Control group	111	-	-
Age		Observation group	180	32.51	8.22	0.8724	0.4116
Age		Control group	180	33.53	8.01	0.8724	0.4116
Patient gender	Female	Observation group	506	-	-	0.2468	0.6236
	Male	Observation group	544	-	-
	Female	Control group	495	-	-
	Male	Control group	555	-	-
Patient age		Observation group	1050	51.33	12.63	0.7365	0.4635
Patient age		Control group	1050	53.41	13.11	0.7365	0.4635

4.2.2 Observation indicators

Before and after the reform of the health management strategy, the hospital's health management work scores, the medical staff's health knowledge scores, and the incidence of health emergencies were used as observation indicators. The health management work score was determined from the hospital's health management work score sheet, which covers infectious disease management organization, health management system implementation, infectious disease reporting, medical staff vaccination card checking, hospital health care room management, and so on. Each item was scored on a 1-10 Likert scale, with higher scores indicating more effective health management work. The health knowledge score of medical staff was determined by a hospital questionnaire covering knowledge of infectious diseases, methods of preventing infectious diseases, and mastery of health knowledge. Each item had a total score of 100, with higher scores indicating a higher level of health knowledge. Health emergencies include influenza, chickenpox, mumps, bacillary dysentery, and tuberculosis.

4.2.3 Statistical methods

The data of this study, collected before and after the reform of the health management strategy, were processed using SPSS 26.0. Measurement data (hospital health management work scores and health knowledge scores of medical staff) are expressed as mean ± standard deviation and compared with the t-test. Count data (the incidence of health emergencies) are expressed as percentages and compared with the χ² test. A difference is considered statistically significant when P<0.05.
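For reference, the textbook forms of the two test statistics are given below. This is a sketch under assumptions: the text does not state which variants were run, so an independent two-sample t-test with pooled variance and an uncorrected Pearson χ² test on a 2×2 table are assumed here. The symbols are introduced only for illustration: $\bar{x}_{i}$, $s_{i}$, and $n_{i}$ are the mean, standard deviation, and size of group $i$, and $a$, $b$, $c$, $d$ are the four cell counts of the 2×2 table (events and non-events in each group).

\[t=\frac{\bar{x}_{1}-\bar{x}_{2}}{s_{p}\sqrt{\frac{1}{n_{1}}+\frac{1}{n_{2}}}},\qquad s_{p}^{2}=\frac{\left( n_{1}-1 \right)s_{1}^{2}+\left( n_{2}-1 \right)s_{2}^{2}}{n_{1}+n_{2}-2}\]

\[\chi^{2}=\frac{n\left( ad-bc \right)^{2}}{\left( a+b \right)\left( c+d \right)\left( a+c \right)\left( b+d \right)},\qquad n=a+b+c+d\]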

4.2.4 Experimental results

A comparison of hospital health management scores before and after carrying out healthcare management strategy reform is shown in Table 3. In the observation group, the scores of hospital infectious disease management organization (A1), health management system implementation (A2), infectious disease reporting (A3), medical staff vaccination card checking (A4), and hospital health care room management (A5) were significantly higher than those of the control group, and the differences were statistically significant (P<0.05).

Table 3. Hospital health management performance ratings

Group	N	A1		A2		A3		A4		A5
		M	SD	M	SD	M	SD	M	SD	M	SD
Control group	180	6.23	1.24	6.54	0.98	6.85	0.78	6.89	1.11	6.74	1.07
Observation group	180	9.47	0.33	9.51	0.16	9.44	0.24	9.66	0.32	9.55	0.29
t	-	4.485		2.364		4.514		4.189		3.367
P	-	0.003		0.004		0.001		0.003		0.000

A comparison of health knowledge scores of healthcare workers before and after carrying out healthcare management strategy reform is shown in Table 4. In the observation group, healthcare workers’ knowledge related to infectious diseases (B1), methods of preventing infectious diseases (B2), and scores of health knowledge mastery (B3) were significantly higher than those of the control group, and the difference was statistically significant (P<0.05).

Table 4. Comparison of health knowledge scores of medical staff

Group	N	B1		B2		B3
		M	SD	M	SD	M	SD
Control group	180	55.36	5.36	61.33	5.85	59.63	5.77
Observation group	180	86.35	8.32	88.64	8.01	86.47	8.36
t	-	10.635		11.587		9.658
P	-	0.000		0.000		0.000

A comparison of the incidence of health emergencies before and after carrying out the reform of healthcare management strategies is shown in Table 5. The incidence rate of health emergencies such as influenza (C1), chickenpox (C2), mumps (C3), bacillary dysentery (C4), and tuberculosis (C5) among the patients in the observation group was 1.9%, which was significantly lower than that of the control group (4.95%), and the difference was statistically significant (P=0.016<0.05).

Table 5. Comparison of the incidence of health emergencies

Group	N	C1	C2	C3	C4	C5	Total incidence
Control group	1050	25	9	10	4	4	52 (4.95%)
Observation group	1050	9	7	4	0	0	20 (1.9%)
χ²	-	-	-	-	-	-	6.358
P	-	-	-	-	-	-	0.016

5 Conclusion

In this paper, we combine graph theoretical methods with graph neural networks for in-depth analysis of medical and health records as a way to facilitate healthcare management. The proposed disease prediction model, based on a dynamic hypergraph network, outperforms the five baseline models in prediction performance. Compared with the best-performing baseline model, its Precision, Recall, and F1 values improve by 3.8%, 2.6%, and 3.7%, respectively. Compared with the traditional health management model, applying the disease prediction model to medical work significantly improves the hospital health management work scores and the health care workers' health knowledge scores and significantly reduces the incidence of health emergencies (P<0.05). This shows that applying the disease prediction model to medical work helps to improve the effectiveness of health management.

Funding:

Source: 2023 Henan Provincial Department of Science and Technology Soft Science Project;

Project Title: “Evaluation Study on Home-Based Bedside Care Services in Henan Province”;

Project Number: 242400410363.

Language: English

Publication frequency: 1 issue per year
Journal subjects: Biology, Biology, other, Mathematics, Applied Mathematics, Mathematics, General, Physics, Physics, other

Health Management Strategies for Medical Health Records Incorporating Graph Theory Methods

Yanjie Wang

Junwei Yan

Published online: 17 March 2025

Submitted: 29 Oct. 2024

Accepted: 13 Feb. 2025

DOI: https://doi.org/10.2478/amns-2025-0166

Keywords: Graph theoretic methods, Hypergraph networks, Disease prediction, Health management strategies

© 2025 Yanjie Wang et al., published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.
