Identifying Dementia from EHR Data

An Interview with Vinod Vydiswaran, Ph.D.

In 2009, the Health Information Technology for Economic and Clinical Health Act, wow, that's a mouthful, more commonly known as the HITECH Act, spent billions to promote the uptake of electronic health records by US hospitals. Fast forward more than a decade later, and now approximately four out of five healthcare institutions have electronic health record systems in place that integrate clinical notes, test results, medications, diagnostic images, et cetera. The adoption of EHR systems into healthcare introduces new and exciting opportunities to extract information that can be used to augment other types of data for research. As you might imagine though, it can be tricky to pull out meaningful information from the text of clinical notes. In this episode, we'll speak with a University of Michigan researcher, Dr. Vinod Vydiswaran, who's been developing methods to identify dementia from EHR data. 

More resources

Faculty Profile: https://medicine.umich.edu/dept/lhs/vg-vinod-vydiswaran-phd 

Transcript

Matt Davis:

In 2009, the Health Information Technology for Economic and Clinical Health Act, wow, that's a mouthful, more commonly known as the HITECH Act, spent billions to promote the uptake of electronic health records by US hospitals. Fast forward more than a decade later, and now approximately four out of five healthcare institutions have electronic health record systems in place that integrate clinical notes, test results, medications, diagnostic images, et cetera. The adoption of EHR systems into healthcare introduces new and exciting opportunities to extract information that can be used to augment other types of data for research. As you might imagine though, it can be tricky to pull out meaningful information from the text of clinical notes. In this episode, we'll speak with a researcher who's been developing methods to identify dementia from EHR data. I'm Matt Davis.

Donovan Maust:

I'm Donovan Maust.

Matt Davis:

You're listening to Minding Memory, a podcast devoted to exploring research on Alzheimer's disease and other related dementias.

Our guest today is Dr. Vinod Vydiswaran. Dr. Vydiswaran is an Associate Professor in the Department of Learning Health Sciences at the University of Michigan Medical School and of the University of Michigan School of Information. His research interests include clinical and consumer natural language processing, large scale text mining, and medical information science. Currently, his research is focused on medical and health informatics, specifically on medical information shared between doctors and patients via clinical notes and online portals. His team has focused their efforts on developing methods that can be used to extract information on dementia and caregiving from the free text of clinical notes. He's here today to tell us a little bit about his work. Vinod, welcome to the podcast.

Vinod Vydiswaran:

Hello, everyone. It's great to be here with all of you.

Matt Davis:

So to start things off, as a patient over the last five or 10 years, I've noticed that when you go visit the doctor, things are a little bit different. We've gone from situations where the doctor will sit down with you and jot some things down on a piece of paper and put it in your chart to doctors clicking boxes on computer screens, which I imagine is part of this adoption of EHR systems. So for our listeners that might not have clinical backgrounds, just in broad strokes, what is an electronic health record?

Vinod Vydiswaran:

Yes. So you said it exactly how it got introduced in our lives. An electronic health record is a recording or documentation of any clinical encounter that happens when a patient visits a healthcare setting. It could be a doctor's visit, it could be a lab report if a test is done, or it could even be an emergency visit. Whatever the encounter, it generates some document that captures what the problem was, what medications were prescribed, whether something was discussed but not prescribed, what the assessment of the current condition was, and what the plan was going forward.

This could be in notes, but it is also a collection of the billing codes that might have been generated because of that visit and other diagnostic codes as well. All of that is part of the electronic health record, and it's all stored in a huge data warehouse that can then be used to pull information about past visits for a particular patient, but also about other patients who might have had a similar condition.

Donovan Maust:

So on this podcast, Matt and I are interested in data, and part of what we do in talking with guests is thinking about different kinds of data sources. So thinking about an EHR, to what extent are they similar across institutions? To what extent are the data you can get out similar, versus do you have to do harmonization? How do you think about that and the reproducibility of EHR work?

Vinod Vydiswaran:

So the electronic health record market has two main players, the Epic system and the Cerner system, but even with that, the joke in the community is that every system is a unique system of its own. You can customize it so specifically to a particular institution that even if two institutions run the same vendor's software, the implementations can be very different. So that's why there have been recent efforts on standardization and interoperability. One of the standards is the Fast Healthcare Interoperability Resources standard, FHIR for short, which is breaking down those silos where information currently sits within one institution. If you want to share information across multiple institutions, you can issue a query that gets processed the same way across different healthcare systems, and each of them can respond to it. But it has not really reached all parts of the electronic health record. It is very useful for pulling out something that is already standardized, like a diagnostic code, but it is much less successful with text data, the actual clinical documentation that is written out.
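
As a rough illustration of the kind of standardized query FHIR enables, here is a minimal sketch in Python. The server URL (https://fhir.example.org) and the ICD-10-CM code are hypothetical placeholders; a real deployment would need authentication and an institution-specific endpoint.

```python
import requests

BASE_URL = "https://fhir.example.org/R4"  # hypothetical FHIR R4 server

def fetch_dementia_patient_refs(page_size: int = 50):
    """Search Condition resources coded with an example dementia ICD-10-CM code."""
    params = {
        # FHIR token search: system|code (the code here is illustrative)
        "code": "http://hl7.org/fhir/sid/icd-10-cm|F03.90",
        "_count": page_size,
    }
    response = requests.get(f"{BASE_URL}/Condition", params=params, timeout=30)
    response.raise_for_status()
    bundle = response.json()
    # Each Bundle entry's Condition points back at the patient it belongs to.
    return [entry["resource"]["subject"]["reference"]
            for entry in bundle.get("entry", [])]

if __name__ == "__main__":
    print(fetch_dementia_patient_refs())
```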

Matt Davis:

So I've heard of Epic, but are there other big names and vendors when it comes to EHRs?

Vinod Vydiswaran:

Yes. So Cerner is another healthcare system software, probably the second largest. The VA used to have a different system called VistA, but that is now going away, and the VA is also, I think, one of Cerner's clients now, but there may be other systems that are used at much smaller scales too.

Matt Davis:

Are there best practices that enable EHR data from different vendors to communicate efficiently or are we not at that stage yet?

Vinod Vydiswaran:

There are some nascent efforts in the research world. One thing that comes to mind is a common data model, where if an institution wants to be part of a research network, they agree to transform their data into a common data model that can then reside as a standardized copy of the dataset, and that copy can be queried across institutions. So for example, PCORI, a few years back, had an initiative where they tried to have multiple research networks all standardize their data. Then if you had a patient with a rare diagnosis and you wanted to find other patients across the nation who also had that rare condition, you could issue a query that is processed the same way across all of these institutions to find other places where patients with that rare condition could be identified, and then you could create a cohort to study across these multiple institutions.

Matt Davis:

Considering HIPAA and protections, obviously, something really important, when you're talking about EHR data and vendors and design and all that, do you ever get into governance issues? I would assume that the healthcare institution, I don't know if I can say owns the data, but do these vendors often want to use the data for different things or is governance not an issue when you talk about EHR data?

Vinod Vydiswaran:

It's a critical issue. It is absolutely critical, and the reason for that is, as you correctly said, there are privacy protection requirements for institutions when they have patient data. So most of the time, these data are either anonymized or they never leave the confines of the institution. It's only aggregated data or anonymized data or de-identified data that might be shared across institutions, but most of the work happens within the institution.

There are very recent efforts, including some from my group that I am happy to talk about in more detail, around the notion of federated learning. Think about a very small institution, say a nursing home facility. They may not have the resources to build a high-end artificial intelligence enabled system on their own. However, the data that that nursing home facility has might be critical to building a model that is more generic, more representative of our community. So a federated network solution is one where those data could be shared in a secure way with a secure enclave that consolidates this information, builds a massive artificial intelligence model on these data, and the results of it can benefit all of these smaller institutions that may not be able to run it on their own. So there are some efforts around it, but it's still, let's say, not an extremely commercialized and standardized approach yet.
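
To make the federated-learning idea concrete, here is a minimal sketch of federated averaging (FedAvg), one common way an aggregator can combine models without pooling patient records. The sites, weight vectors, and sample counts are invented for illustration; real systems also handle secure transport, multiple training rounds, and privacy accounting.

```python
import numpy as np

def federated_average(site_weights, site_sizes):
    """Combine per-site model weights, weighting each site by its sample count."""
    proportions = np.array(site_sizes, dtype=float) / sum(site_sizes)
    stacked = np.stack(site_weights)          # shape: (n_sites, n_params)
    return proportions @ stacked              # weighted average, shape: (n_params,)

# Three hypothetical nursing-home sites with different amounts of local data.
site_weights = [np.array([0.20, -0.50]),
                np.array([0.30, -0.40]),
                np.array([0.10, -0.60])]
site_sizes = [120, 300, 80]

print(federated_average(site_weights, site_sizes))
```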

Matt Davis:

I have heard about EHRs building these massive predictive – Part of my question about governance was it does sound like they often will use data across institutions to do something from a predictive standpoint, but it's interesting talking about that example that you mentioned about one institution trying to predict something. It does make me wonder to what – I guess it depends on what you're talking about. Maybe you want to have something that's massively generalizable or maybe you want something generalizable just to your institution because all you do is care about your patient mix in terms of predicting some kind of an outcome.

Vinod Vydiswaran:

Absolutely. I think you probably need both, in the sense that typically what happens is, if there's a research group, they would work with their own data, the dataset that they already have access to, build an extensively tailored model, but then try to see if that model can generalize across data from other institutions. As of now, that happens through collaborations and engagements with multiple other institutions, trying to run the same model or a very similar model at another site and learn from that. We have been part of multiple such research endeavors, one in dementia itself where I'm working with researchers at Oregon Health & Science University to implement the same algorithm across two institutions to see how it identifies patients living with dementia here at Michigan and at Oregon Health.

Matt Davis:

So for our listeners, Donovan mentioned the data sources we talk about, and a couple of the sources we come back to a lot are surveys and healthcare insurance claims. So I guess just broadly speaking, what do you get from an EHR that you don't get from surveys and claims?

Vinod Vydiswaran:

The electronic health record is a good consolidated version of everything clinically relevant. So it has not only the billing codes about what got diagnosed and what got billed, but also the patient history, their family history, things like their prior surgeries and complications, and whether they have a past history of smoking or drinking and so on. That is captured both in questionnaire format, survey responses as you said, but also through the interactions between the doctor and the patient. It's written out, especially in a follow-up visit, where you might say that this patient drinks two times a day or drinks, let's say, four to six standard units within a week, and that would then be mentioned in the context of that diagnosis. Whether it affects the follow-up assessment and plan is immaterial; it's still there.

So this richness of interaction between doctor and patient gets captured in text, which is typically not available when you are just working with claims data or prescription data because another example could be when a medication was discussed but it was considered to be either too expensive at this moment or maybe something that they may want to try out later on if the current medication does not work. That information may not be there if you just look at what got prescribed and what was paid out through claims.

Donovan Maust:

So then thinking about that information that's available in the text, I'm assuming how you get at that is maybe through NLP or natural language processing. So in case folks aren't familiar with that, can you explain what that is and, very basically, how does that actually work to say get some of that text data out?

Vinod Vydiswaran:

Yes. So my area of research is in clinical natural language processing. Natural language processing is a subfield of artificial intelligence where we are trying to build machine-learned models that will look at text data and try to infer information that might already be there but hidden in plain text, extract that information, and also use it for downstream clinical decision support tasks. When combined with other information such as X-rays or other image-based results, you can think of it as part of a broader area of AI working with multimedia, but natural language processing is the component that works specifically with text data. It doesn't have to be cleanly written English, but it is still something that is in words and in text form.

Matt Davis:

You can imagine that it's probably, I assume, relatively straightforward to just pull words, specific words and terms out of a chunk of text, but how much beyond that does NLP extend in the methods that you use?

Vinod Vydiswaran:

Keyword search, just looking at words and finding out whether a particular document mentions a word, is absolutely one form of natural language processing, but artificial intelligence models have advanced significantly over the last decade and a lot of that has found its way to health text as well. An example of that would be a large language model, where, depending on the structure of the note, you can start to prioritize which mentions are really diagnostic in nature and which are just mentions for future discussion, for consideration.

You would want to start distinguishing which mention of breast cancer is about the patient having that condition and which ones are about their mother or somebody in the family having that condition. So trying to tease out those different ways in which the same words appear is critical to then know what is the status of a particular patient in the context of the treatment that they're getting.

These models can also now be generative in nature, which means that you can start to infer what is the typical diagnosis of a patient and if there are signs there in the text about early detection of a particular condition that might happen. So for example in dementia, you know that it's a progressively deteriorating condition, but if you can find out early enough that there are risks of mild cognitive impairment or the mild cognitive impairment is severe enough to move into potential dementia, then early detection can help the clinical decision systems as well. So that is where the current NLP is moving towards.
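
As a toy illustration of the patient-versus-family distinction described above, here is a minimal rule-based sketch. Production NLP systems use trained models and note-section parsing; the family-member cue list and example sentences here are illustrative assumptions.

```python
import re

# Cues suggesting the mention is about a relative rather than the patient.
FAMILY_CUES = re.compile(
    r"\b(mother|father|sister|brother|daughter|son|family history)\b", re.I)

def classify_mention(sentence: str, term: str = "breast cancer") -> str:
    """Label a sentence containing `term` as about the patient or a relative."""
    if term not in sentence.lower():
        return "no mention"
    return "family history" if FAMILY_CUES.search(sentence) else "patient"

print(classify_mention("Mother had breast cancer in her 60s."))    # family history
print(classify_mention("Patient was treated for breast cancer."))  # patient
```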

Matt Davis:

I'm just thinking about clinical notes and how often they say something is not present. Is it at all tricky to distinguish things that are present versus not present in terms of the language used in clinical notes?

Vinod Vydiswaran:

This is a very good example, and I think it happens a lot. Negation and hedging are a common feature of clinical text, but it is predominantly a solved problem. This is really a good example of where just a keyword search would not work. You would want to know, within the context of the mention of that word, whether it is negated or whether it is given in a form that is meant to be instructive rather than actually diagnostic. An example that comes to mind is a patient who is about to undergo surgery and has instructions saying do not drink alcohol. That doesn't mean that the patient drinks alcohol; it's a very standard instruction that is given out whether or not the patient drinks alcohol at all. So the only place alcohol might be mentioned in the notes for this patient is in this instruction saying do not drink alcohol before coming for surgery. Recognizing that those mentions are almost empty mentions, that they are not relevant for diagnosing whether the patient has alcohol risk or not, is a critical task within the machine-learned system.
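
Here is a minimal, NegEx-style sketch of the negation handling described above: a keyword match only counts if no negation or instruction cue appears in a short window before it. The cue list and window size are illustrative; real systems use curated trigger terms and scope rules.

```python
import re

# Simple negation / instruction cues checked in a short window before the term.
NEGATION_CUES = ["no ", "denies ", "do not ", "does not ", "without "]

def is_affirmed_mention(text: str, term: str) -> bool:
    """Return True if `term` appears at least once without a nearby negation cue."""
    for match in re.finditer(re.escape(term), text, flags=re.I):
        left_context = text[max(0, match.start() - 30):match.start()].lower()
        if not any(cue in left_context for cue in NEGATION_CUES):
            return True
    return False

print(is_affirmed_mention("Patient reports drinking alcohol daily.", "alcohol"))  # True
print(is_affirmed_mention("Do not drink alcohol before surgery.", "alcohol"))     # False
```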

Matt Davis:

So thinking back to when you go to a doctor now and they have a screen up, I'm often trying to look at what they're doing on the screen as they check through things. My impression, though, is that some of these systems are designed so that the user might check some boxes, and from that, some form of automatic text is generated. I'm just curious. I could imagine that if there is automatic text generated from a series of selections, that might potentially be a problem when you try to actually analyze the text. Does that come up at all?

Vinod Vydiswaran:

It comes up a lot and, actually, it's not a problem. It's actually something that is a very easy task for a natural language processing system. Because it is so templatized, which means they use standard wording all the time, the differences between one note and the other are really where the information lies. So one of the biggest things about natural language processing is to take away all of the commonalities and really focus on things that are specific to a patient. Having templatized text helps you really zoom in on what was different.

So if it is something that's a checkbox and it gets represented as a text form of the checkbox, you know that one of the checkboxes is marked present and the others are not, and just by looking at differences between two different notes, you can see exactly which one was highlighted and which one was not. So that's actually very good.

The hardest case is when it is in a completely descriptive, free-flowing form and you need to infer what was being mentioned. Sometimes it refers to something already mentioned earlier that the doctor is able to see when they're typing it in but the machine cannot. Those might be the hardest ones for NLP models to decipher, but even there they are being very successful now, because they can look at all of the past notes. So a lot of NLP depends on whether something has been mentioned before, and it's very rare that something of clinical importance is never mentioned in text. So looking at text helps you get a much more complete picture of the patient's status compared to just looking at their lab reports or X-ray reports and so on.

Donovan Maust:

Just as a follow-up in addition to the templates, I'm thinking about for a given patient who's relatively clinically stable across different time points, their notes are going to look very similar. So that same trick of you're just looking for what's different for a patient from time to time is where the money is, I guess.

Vinod Vydiswaran:

Exactly, yes. So there is a lot of copy/paste behavior in clinical documentation. People will just copy from the previous note and then edit it because they don't have to type it all again, but that's exactly-

Donovan Maust:

That is true. I will confirm that.

Vinod Vydiswaran:

Exactly. In fact, now that we know that that is a common phenomenon, you can train machines to find near duplicate text. So all of the near duplicate text where really it's a copy from the previous one can just be ignored and you then completely zoom into what got changed, and then there's a reason behind it. So maybe a medication was removed and that means that this medication is no longer active and should not be included in the follow-up task or something, and that would be information that you can gather based on that.
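
A minimal sketch of detecting the copy/paste behavior just described, using Python's standard-library difflib to score near-duplicate notes and surface only the changed lines. The two example note versions are invented.

```python
import difflib

def copy_paste_ratio(note_a: str, note_b: str) -> float:
    """Similarity in [0, 1]; values near 1 suggest copy/paste between notes."""
    return difflib.SequenceMatcher(None, note_a, note_b).ratio()

def changed_lines(note_a: str, note_b: str):
    """Return only the lines that were added or removed between two note versions."""
    diff = difflib.unified_diff(note_a.splitlines(), note_b.splitlines(), lineterm="")
    return [line for line in diff
            if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]

visit_1 = "Assessment: stable.\nMedications: donepezil 5 mg daily."
visit_2 = "Assessment: stable.\nMedications: donepezil 10 mg daily."

print(copy_paste_ratio(visit_1, visit_2))  # close to 1.0
print(changed_lines(visit_1, visit_2))     # only the medication line differs
```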

Donovan Maust:

Actually, to be honest as a clinician, I wish that I could do that easily because if you're trying to learn about a patient, you're looking at their notes from other clinical services, you're trying to figure out what was changed when to filter in to where a note changed from visit to visit would actually be tremendously helpful, but anyways. So you've told us, so there are two big EHR companies, but, really, each organization or institution can have its own build of the data. Then in talking about how NLP is done and how those data are extracted, I could imagine potentially that that's different from researcher to researcher even or it might be different from EHR to EHR. So how does this field think about the, say, the reproducibility of the work that's done when it seems like it could be so project-specific in terms of what's done?

Vinod Vydiswaran:

That's a very good question, and I think if you follow the paradigm of seeing where the money goes, you can see how the grant support from our government agencies has moved from an institutional focus on building new algorithms to a multi-institutional focus on how these algorithms adapt across multiple institutions. So yes, you have to validate those models at your institution first. A lot of these models do get built for a specific need, but then the question of generalization comes up because you need to make them reproducible elsewhere as well.

That is where a lot of this standardization on computable phenotypes comes in. A computable phenotype is a set of instructions where you have decided on a particular logic to identify a patient, and then you can see whether that logic can be used as is in another institution to identify the patients there who match the same criteria.

This clearly, as of now, requires multi-institutional collaborations, and we have been part of multiple such research collaborations where an algorithm might have been defined at another institution and we test it out at Michigan Medicine, or something that we develop at Michigan Medicine gets tested out elsewhere. Through that, it also helps increase the usability of these algorithms beyond just one institution.

A lot of the AI models are getting fairly complex now, so there is a question of how you can share those models across institutions without leaking any private information on which these models were trained. There is still a bit of a barrier that AI researchers are working through before they can make these models completely publicly available, but through collaborations and data agreements, where a model trained at one institution can still be run at another, the other institution can at least help assess the generalizability of these models.

Matt Davis:

Just curious, the algorithms or predictive models you've been talking about, do they have a cost for a healthcare institution?

Vinod Vydiswaran:

Yes, I would say so. It really depends on whether they are used in clinical care or primarily for health services research. If they are used in clinical care, the typical way it comes up is through alerts when a patient is reaching a threshold where some action needs to be taken. That implementation happens in a clinical setting. An example is sepsis prediction, which is an AI-driven model looking at all of this information, and then an alert gets generated. If the model is not very accurate, it might either generate too many alerts, which would mean that any given alert is not significant and people might not take the action they were supposed to take, or too few alerts, so that they might be missing some critical information. So getting the balance right between when the alerts need to be given out and how accurate they are when they're given out is an important part of any algorithm that is getting trained.
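
A small sketch of the alert-threshold tradeoff described above: raising the score cutoff gives fewer, more precise alerts, while lowering it catches more events at the cost of alert fatigue. The risk scores and outcome labels are invented for illustration.

```python
# Hypothetical risk scores from a sepsis-style model and whether sepsis occurred.
risk_scores = [0.92, 0.81, 0.40, 0.75, 0.30, 0.65, 0.15, 0.88]
true_events = [1,    1,    0,    0,    0,    1,    0,    1]

def alert_stats(threshold: float):
    """Return (number of alerts, precision of alerts, events missed) at a cutoff."""
    flagged = [(score >= threshold, event) for score, event in zip(risk_scores, true_events)]
    n_alerts = sum(1 for alert, _ in flagged if alert)
    hits = sum(1 for alert, event in flagged if alert and event)
    missed = sum(1 for alert, event in flagged if not alert and event)
    precision = hits / n_alerts if n_alerts else 0.0
    return n_alerts, precision, missed

for threshold in (0.5, 0.8):
    print(threshold, alert_stats(threshold))
# 0.5 -> 5 alerts, 80% precise, 0 missed; 0.8 -> 3 alerts, 100% precise, 1 missed
```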

Matt Davis:

So we know that your team has spent some time looking at and reviewing some of the existing methods specifically around identification of dementia and cognitive issues in EHRs. I'm just curious what you found and just about the state of the science.

Vinod Vydiswaran:

So one of the projects I want to focus on and highlight here is a multi-institutional effort that we undertook to validate algorithms, computable phenotypes, across multiple institutions. A computable phenotype is a set of logical instructions that look at different aspects of a patient's health record and infer certain characteristics from it. So in our context, let's say we need to identify everyone who is diagnosed with dementia. It appears to be a very simple task, but there are different algorithms, different sets of logical steps, that have been proposed to identify who has dementia.

Some look at one year of data. Some look at three years of data. Of course, those that look at three years of data might get more information and hence might identify more patients than an algorithm that is only looking at one year of data. Some look for medications in addition to diagnostic codes. Some ask for more than one diagnosis within six months. So all of these are variations of the basic idea of just using diagnostic codes.
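
As an illustration of one such rule variant, here is a minimal sketch of a computable-phenotype check requiring at least two dementia diagnosis codes within a three-year window. The code list and encounter records are hypothetical; published phenotypes specify exact code sets, care settings, and windows.

```python
from datetime import date

# Illustrative ICD-10-CM dementia codes; published phenotypes use exact code lists.
DEMENTIA_CODES = {"F01.50", "F02.80", "F03.90", "G30.9"}

def meets_phenotype(encounters, window_years: int = 3, min_diagnoses: int = 2) -> bool:
    """encounters: list of (date, icd10_code) tuples for one patient."""
    hits = sorted(d for d, code in encounters if code in DEMENTIA_CODES)
    for i, first in enumerate(hits):
        in_window = [d for d in hits[i:] if (d - first).days <= window_years * 365]
        if len(in_window) >= min_diagnoses:
            return True
    return False

patient = [(date(2019, 3, 1), "F03.90"),   # dementia, unspecified
           (date(2020, 11, 5), "G30.9"),   # Alzheimer's disease
           (date(2021, 2, 2), "I10")]      # hypertension (not counted)
print(meets_phenotype(patient))  # True: two dementia codes within three years
```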

So there are multiple such algorithms that have all been published in the biomedical literature. What we set out to do was to identify, really, how different these are from one another. If you were to implement them all in our healthcare system, how many patients would we identify and how many of those would be common across all of these algorithms? Which ones are more relevant for identifying patients living with dementia and which ones are just identifying people who are getting old? There is a lot of overlap between these confounding factors, and it is important to tease out the one that is actually precise in identifying people living with dementia and not just identifying geriatric conditions.

So long story short, we built a system and evaluated our patients and patients at Oregon Health & Science University against already proposed and published algorithms to identify which ones are most closely aligned. One interesting thing we found was that, actually, there's no good answer. There's no one correct answer, because the only way we can really know if a patient has dementia or not is by asking clinicians to look at the chart and, based on just the chart review, independently assign whether the patient has dementia, which means there is a human factor.

When we asked our attending physicians to make that judgment, first of all, they did not completely agree with each other all the time. But that said, even once they do agree, the algorithms don't agree with them one-third of the time. So there's only about 67% overlap between who the physicians call dementia patients and who these algorithms assign as dementia patients, which means we have roughly 33% error if we just use the algorithms, because they are going to identify more patients than our physicians are comfortable calling persons living with dementia.

So that was an interesting finding, and I thought it was important for us to report it because a lot of institutions will blindly implement one such computable phenotype algorithm. First of all, it is just one algorithm when there are seven available, so we need to know which one is better, but even then it might identify more patients than their own attending physicians would comfortably call people living with dementia.

Donovan Maust:

So in the work that you just described, where there was an additional third identified by the algorithms, was that because – I think you mentioned that there were seven different algorithms you used – so did that third come from applying just one of them, or all seven of them, or how did you do that?

Vinod Vydiswaran:

So we evaluated this algorithm by algorithm. We could then use something like a union of these algorithms as our main criterion, asking, "Does at least one of the algorithms say yes to this patient?" and build a selection criterion based on that. So say 50% of our pool is identified by at least one of these algorithms as a person living with dementia, and for the remaining 50% none of the algorithms say this person is living with dementia. We now have a group of, let's say, 100 patients, and all 100 are given to our physicians to make an independent judgment about them. They come back and say 33% of this group actually has dementia. That means we have the remaining 17% where an algorithm said yes but the physicians said no.
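
One way to read the arithmetic in that example, under the assumption that all physician-confirmed cases fall within the algorithm-flagged half of the pool:

```python
pool_size = 100                 # hypothetical chart-review pool
algorithm_positive = 50         # flagged by at least one computable phenotype
physician_confirmed = 33        # chart review agrees dementia is present

false_positives = algorithm_positive - physician_confirmed
agreement = physician_confirmed / algorithm_positive

print(false_positives)          # 17 patients: algorithm says yes, physicians say no
print(round(agreement, 2))      # 0.66, roughly the two-thirds overlap discussed
```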

Donovan Maust:

I'm curious, did the physicians revisit any of those charts? Was there any particular theme or did they say, "Oh, okay, maybe yes, this person really does"? What was going on there?

Vinod Vydiswaran:

So we looked at it and we said, somebody diagnosed them with dementia; that's why the algorithm picked it up. So what happened? When we looked at the error cases, what we found was that dementia is misdiagnosed for multiple reasons. One, it could be that there was just not enough information for our physicians to comfortably say that this person has dementia. A lot of physicians rely on tests of cognitive acuity and so on to really diagnose that a person has dementia. The patient might have mild cognitive impairment, but they'll not make that leap to diagnosing somebody with dementia unless there is something specific in the test results. So that was missing, and hence they would not be comfortable saying so.

Another example was that a lot of the mentions were really about delirium, so the physicians would not be comfortable calling it dementia, whereas the codes that were assigned were making the algorithm believe that these were dementia patients. Those are the two largest classes of mistakes. But sometimes NLP models also make mistakes when the discussion is about dementia in the context of "watch out for dementia" or "evaluate for dementia at the next visit," which can then be attributed to somebody having potential dementia, and that is not right, because maybe the test that was conducted at the next visit came out negative.

Matt Davis:

So you mentioned some of the different characteristics of the different algorithms that you compared. I was just thinking about the time component. It doesn't seem very fair to compare algorithms with different windows of observation in terms of detecting an outcome, but it does make me wonder what the right window is if you observe somebody with their EHR data, because not everybody has a visit every year per se.

Vinod Vydiswaran:

That is a problem, and that is a feature of our healthcare system in the sense that how often do we follow up with care and how often do these particular patients have their records in the healthcare system. Again, we are assuming that all of it is at one institution, that they come back to the same place for their follow-up visit as well. They might actually go somewhere else out of the system and suddenly we have lost information about them too. So that is an important feature to note, but there is no good answer.

Typically, most of the more reliable approaches look at two to three years of past data to make an inference, but also rely on more than one diagnosis within that period. So they do not rely on just one diagnosis and say that is enough, but look for two of them within a particular timeframe, say two years apart or within a three-year window.

These are knobs that can be changed. So it also depends on why you want to identify patients living with dementia, on what the follow-up task is. Sometimes you are looking at a study and identifying a cohort of interest where you want to find people who have not been diagnosed with dementia but have all of the factors related to it, so they have unrecognized dementia. That would be a very different way of identifying them. Sometimes you want to identify dementia patients who also have something else, so it doesn't really matter how cleanly you define a patient living with dementia, because the second condition might be the more restrictive filter.

So the short answer to your question is that it depends on what you want to use it for, because it's most likely not just to identify patients living with dementia, but whether they are receiving some special services, or whether they need some special services and should be recommended more tests or something like that, and that becomes an important factor.

Matt Davis:

Was there anything specific that surprised you?

Vinod Vydiswaran:

One of the things that surprised us really is independent of the dementia care itself. It is really about the informatics behind it. What I mean by that is when we are trying to study and report our performance of these measures across multiple institutions, a lot of people do not take care in how the study was designed and how that affects the measures that get reported. So when we started seeing this, we realized that is actually a more fundamental problem in the way people report their results. If it's a very skewed distribution, if it's a dataset of patients living with dementia who all have some specialized services that they undertake, the population prevalence of dementia in that population is going to be very, very high as compared to if it is in a general tertiary care outpatient clinic.

So that prevalence affects the specificity and sensitivity values that get reported based on that cohort, and when a new algorithm is defined and tested on an unusually high-prevalence cohort, you would get unusually good numbers, but that is not a product of just the algorithm being good. It may be good, but it is also a product of the cohort on which it was studied. So that is a factor that should be incorporated when the results are published or when they are taken up by another institution trying to decide which algorithm is most suitable for them.
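
One concrete way to see the prevalence effect is through positive predictive value, which depends directly on how common the condition is in the cohort even when sensitivity and specificity are held fixed. The numbers below are invented for illustration.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV = true positives / all test positives, via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Same hypothetical algorithm (85% sensitive, 90% specific) in three cohorts.
for prevalence in (0.05, 0.30, 0.60):  # general clinic vs. memory specialty clinic
    ppv = positive_predictive_value(0.85, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}")
```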

So this effort that we took, to take these six or seven algorithms, implement them independently as a third-party validation, and then test them out at another site, was to highlight both of these factors. I think that is just maybe a public service announcement that we are making at this time: be very careful about how you report results and the cohort you reported them on, because that's an important characteristic.

Matt Davis:

So thinking about using data extracted from EHRs, it seems like one of the strengths is that you can get some contextual factors that you can't get from claims specifically. I know that your work extends into some other things as well, like identifying caregiving, and I know other teams are thinking about social determinants of health relevant to dementia. I'm just wondering if you could talk about that a little bit and where things are at.

Vinod Vydiswaran:

Yes, and both of these are examples of factors that might become relevant in a clinical decision support system later on that are not explicitly mentioned in the structured data. What I mean by structured data is something that comes out of a questionnaire or something where a form is filled out, but it might actually be in the free text form or the unstructured form of the dataset. An example is caregiving.

So in dementia care, sometimes the way caregivers are mentioned is just in text, saying, "This patient came with their spouse who also helps with the care at home," and that is the only mention, or it could be, "The patient came with the daughter who was helping answer the patient's questions during the visit," and so on. That would be an expression that highlights who the caregiving partner is in that particular scenario. There is no official form that our patients living with dementia fill out saying, "Yes, my caregiver is so-and-so." So you need to rely on these textual patterns, this typical free-flowing English, to determine who the actual caregiver is.

Sometimes they mention it very explicitly and sometimes they are not described as a caregiver. They're not formally caregiving partners; they might just be the one who is taking the lead on that. So that's an example of where NLP can add value to existing electronic health record systems, because we can identify these patterns and at least highlight them so that they can be validated at the next visit, just to make sure the relationship is documented more formally.
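
A minimal sketch of pulling candidate caregiver mentions out of free text with simple patterns, in the spirit of the example phrases above. A deployed system would use a trained model rather than these illustrative regular expressions.

```python
import re

# Illustrative patterns; a deployed system would use a trained model instead.
CAREGIVER_PATTERNS = [
    r"came (?:in )?with (?:his|her|their) (?P<relation>spouse|wife|husband|daughter|son)",
    r"(?P<relation>spouse|wife|husband|daughter|son)[^.]*helps? with (?:the )?care",
]

def find_caregiver_mentions(note: str):
    """Return family relations mentioned in caregiving-like contexts."""
    mentions = set()
    for pattern in CAREGIVER_PATTERNS:
        for match in re.finditer(pattern, note, flags=re.I):
            mentions.add(match.group("relation").lower())
    return sorted(mentions)

note = ("Patient came with their spouse who also helps with the care at home. "
        "Daughter helped answer the patient's questions during the visit.")
print(find_caregiver_mentions(note))  # ['spouse'] -> flagged for confirmation
```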

Donovan Maust:

Is there anything you're working on these days that you're particularly excited about?

Vinod Vydiswaran:

Yes, I am excited to see how we can bring the power of artificial intelligence to underserved populations, to populations in low-resourced settings and so on. I touched upon it earlier in my response when talking about federated learning, and I think that is an interesting take on trying to see how we can learn from data that is hidden in silos, maybe small silos across a lot of care settings, where you can build a model that can generalize across all of them, and then have a way in which the power of AI can benefit these patients, identify specific characteristics about their cohort, and benefit those local communities as well.

So we are trying to simulate a setting now to see how well these deep neural network models can be trained in that setting, and that is a precursor to a larger grant where we could collaborate with either a healthcare system that is distributed across small nursing home facilities, or multiple institutions that might be able to contribute data to a common resource, to help train a much larger model that is general enough to serve the larger population.

Donovan Maust:

That sounds awesome. Hopefully, you can also harness the power of AI to solve all of the data use issues that I'm sure that work entails.

Vinod Vydiswaran:

Exactly. It's fascinating. I just love working in this field and trying to see how AI can help healthcare do healthcare better.

Matt Davis:

I imagine there is a ton of interest in this area, so your work is very timely. Looking forward to seeing what you do next. Vinod, thanks so much for joining us, and thanks to all of you who listened in.

If you enjoyed our discussion today, please consider subscribing to our podcast. Other episodes can be found on Apple Podcasts, Spotify, and SoundCloud, as well as directly from us at capra.med.umich.edu, where a full transcript of this episode is also available. On our website, you'll also find links to our seminar series and data products we've created for dementia research. Music and engineering for this podcast was provided by Dan Langa. More information available at www.danlanga.com.

Minding Memory is part of the Michigan Medicine Podcast Network. Find more shows at uofmhealth.org/podcast. Support for this podcast comes from the National Institute on Aging at the National Institutes of Health, as well as the Institute for Healthcare Policy and Innovation at the University of Michigan. The views expressed in this podcast do not necessarily represent the views of the NIH or the University of Michigan. Thanks for joining us, and we'll be back soon.

