Audio Transcript: Demystifying Big Data and Machine Learning
Demystifying Big Data and Machine Learning
Podcast Interview with Prashant Natarajan (co-author)
July 28, 2017
Interviewers: Eric Brown (EB) and Marc Russo (MR)
Interviewees: Prashant Natarajan (PN) and Marc Perlman (MP)
EB: Hello everyone and thank you for joining us today. This is our first ever HIMSS NY podcast, where we’re going to be speaking with one of the authors of Demystifying Big Data and Machine Learning. Prashant Natarajan is with us and we’re going to go into a number of questions about the book as well as learn about him as a person and his background. Marc, would you like to say anything before we get started with the questions?
MR: Prashant, thank you very much for taking this time. My name is Marc Russo. I’m here with Eric at Deloitte and we also work together as part of the HIMSS NY chapter on membership engagement and we’ve been looking at this book recently as part of our book club and we’re really excited to speak with you today.
PN: Thank you Marc. It’s absolutely my pleasure to be here to talk to the membership of the HIMSS NY chapter and it’s exciting to hear that we are the first book and the focus of your first ever podcast. Thank you so much for the opportunity.
EB: Thank you so much. Thank you Marc Perlman for making this happen. Marc Perlman is also at Deloitte as well and he contributed to the book. Marc, do you want to say a few words?
MP: Thank you very much for allowing us to be on today. This is a very important book for the industry. I’m glad Prashant could join us also.
EB: Thanks so much. This is all exciting stuff. Why don’t you tell us a little about your background and why you got into the healthcare field for our members of the chapter?
PN: I have an undergraduate degree in Chemical Engineering and a graduate degree in technical communications and linguistics, so I think I have some food for thought for both sides of my brain. I’m currently a Product Director at Oracle Health Sciences, based out of Pleasanton, CA, where I am responsible for a portfolio of health informatics products and solutions covering secondary use of data, integrated big data analytics, AI, precision medicine, and population health. I’m also the Chair of the 2018 Innovation Conference and Showcase for HIMSS Northern California, which includes Silicon Valley. I have about 15 years of experience as an emerging technologies strategist, product manager, and expert consultant, with experience in strategy planning and implementation, market analysis and research, and serving as a trusted advisor to healthcare providers, payers, and life sciences organizations. I’d love to say that it was my dream ever since I was 4 years old and it was something I was laser focused on doing, but it’s a little more boring than that. I actually started in B2B Ecommerce and while it was pretty exciting, I wanted to do something a little more challenging and work in an area where I could bring value not just to my users but potentially to myself, and healthcare was an obvious choice as all of us need healthcare. The emerging technologies that we are developing can only benefit us and our families. So that is really how I got into healthcare.
EB: Quite an impressive background. And I know a lot of our listeners are still in college and still very interested in hearing about your career path. For those who are interested in B2B environments, it’s another place that they can consider as well. The next question we had is around the buzzword of “big data”. A lot of buzzwords tend to lack substance. How do you convey its importance and its context in healthcare, especially for those of our members who, quite frankly, are afraid of big data and for those who lack technical understanding?
PN: As we have written at length in Chapter 2, there are a few foundational principles that we need to deal with. The point you make is absolutely valid. There are a lot of buzzwords. And unfortunately there is a lot of snake oil. It is important to recognize what big data means and what it doesn’t mean. So the foundational principles are the following:
Our industry determines what the characteristics of big data mean to end users. In other words, what qualifies as big data in Ecommerce is not automatically the same qualifier for healthcare and life sciences. Your average e-commerce company (Facebook, Google) gathers more web and user logs and manages much more data by volume than your average healthcare provider on a daily basis. So what that means is that volume is not necessarily the determining factor in healthcare. There are other things in healthcare that matter more than volume, and those are the things that we talk about when we refer to the 3+2 Vs of big data and also make the point that not all the Vs are equal. I’ll stop there for a minute before I go on to the other foundational principles.
EB: I’m very lucky to have this call with you today because when I was reading the book, I did not get it the way I get it now. Putting it into perspective by comparing it to other industries, it’s absolutely valid.
PN: Moving on to the other foundational principles: our research for this book, which lasted more than 2 years and included interviews with over 100 people, both inside and outside healthcare, showed that variety matters much more than volume in healthcare. This doesn’t just mean unstructured data. This means using data to reach new use cases and new audiences who have not been reached before and to create new value for patients, providers, and organizations that has not been created before. So when we look at variety in big data and take a look at the all-encompassing definition, it becomes pretty clear that the challenge in healthcare is not one of volume, but of bringing in new and novel sources to accomplish a diversity of use cases and addressing more users than before. Veracity in healthcare data is extremely important because a large part of healthcare data goes into patient care and wellness, and what that means is that the quality of the data or, as we talk about it, the fidelity of the data is extremely important. Veracity is important whether the data is little data or structured data, discrete data, or big data, because the uses to which the data is put are still going to be patient care or financial wellness. So not having the right data in place could cause harm to the patient or to the financial wellbeing of the organization, which we want to avoid. And the last point is that data fidelity is much more important than “one size fits all” quality. We talk about the passionate debates that take place, where one side of the house says that data quality is important and we shouldn’t be doing other things with secondary use. And we have the other side of the spectrum, which says that data quality actually doesn’t matter and we shouldn’t be spending time on it.
I think that this is a false dichotomy and a false conversation in some ways, because it allows people to dig into their own positions, and while it makes for entertaining discussions, it does not help any healthcare organization make any decision at all. Data quality will always be a challenge. Data creation, unless it’s coming out of a machine directly, is going to be human created in some instances, and when humans are involved, there are going to be mistakes. So trying to go after this ideal situation of data quality is not going to help us. In some cases, what may not be acceptable in a clinical setting may be perfectly acceptable in an operational analytics setting or vice versa. So understanding that, being able to build for that, and acknowledging that data fidelity is the driver for big data is much more important than data quality.
EB: That’s a great point. That was something that I highlighted that I learned from the book around data fidelity.
MR: I wanted to build off that and the 5 Vs. I was wondering if you could walk us through your approach to how organizations can leverage big data and data analytics to create shared value. Are there smart first steps? How do they measure and define those success stories?
PN: We have an entire chapter dedicated to this, and the reason is that we thought there were a lot of misconceptions in the space. The real question is how to move past those misconceptions, though of course we want to debunk them. So, there are a few things to consider in order to get to success. The first is recognizing the fact that the McNamara fallacy doesn’t apply in healthcare. There has always been this fallacy that not everything in healthcare can be measured accurately. We posit that this is because there has not been enough data. It’s time for us to end the fallacy and move on. The second thing that we talk about is that some of the lessons (determining purpose and scope, creating new opportunities for providers and patients, using more data variety or volume) apply in several instances. At the same time, we also need to be conscious that big data is not a one size fits all solution. We shouldn’t be afraid to say no to the latest big data technology because not every use case requires big data management. Determining the benefits of the new technology compared to the cost of deployment, maintenance, and migration is a key precondition to investing at scale across any enterprise. Another point that we make in the chapter is that executive sponsorship is extremely important. Change starts from the top, and one of the consequences of data-fication is the need for a Chief Data Officer or a Chief Analytics Officer, responsible for data across the enterprise, which is going to be as important as any other technical factor or use case.
MR: Marc, I wanted to see if you had anything to add to those points.
MP: I think Prashant described it from a very pragmatic point of view, and one of the key points is that one size doesn’t fit all and that what counts as big data for one industry isn’t the same for others. One of the takeaways was about one of the health plans, and it was really about transforming their business. As people transform their business, it’s about strategically reviewing and doing interesting things. We’re getting to a point where there are regulatory reforms and the way that healthcare is delivered is changing. On one hand, health plans can look at it from a population and wellness perspective, while on the other hand, they can look at it from the perspective of underwriting and delivering more efficiently. We’re going from a time of what I would call “break-fix” medicine, where you’re broken and we’ll fix you, to a time where we are really trying to manage people over time in a system of health. These types of technologies are going to be absolutely vital and this book is very valuable to the industry.
MR: I want to skip to one of my future questions. How do we leverage existing analytics investments and go from analytics 1.0 to 2.0 and then 3.0 and so on?
MP: I think the most important thing to do is to deal with the end in sight. It’s surely not about a deep technologist talking about which appliance is the right one to use. I think we have to really break down the silos and really look at an interoperable platform, and we have to have a secure and accurate way of patient matching across various disparate systems. We have to start looking at a systematic approach. The data may not be residing within one sole organization. The question is how do we make organizations much more proactive, and that’s going to take interoperability and technologies with predictive capabilities to make a difference and build the cause.
EB: I think those are some great points that have been made. Going back to the script that we wanted to outline, we wanted to talk about the risks associated with deep learning itself and predictive analytics and how do we alleviate some of the concerns around privacy, specifically in the provider space. What are some of the risks with overly automating processes?
PN: So, going back to a point made earlier, I don’t think that this risk is particular to deep learning or predictive analytics. I think that the major risk we face now is how we democratize big data, machine learning, and AI and move them, as any emerging technology should, from the places where a lot of expertise is required to practice them to today’s professionals, with tools, training, and support mechanisms that allow them to incorporate these in their daily jobs. I don’t think that the answer lies in educating enough data scientists because we’ll never have enough to manage the data in the world. The biggest risk, in my opinion, is that healthcare is not able to take advantage of it because we don’t have the right skillsets available across organizations, regardless of size. I think that the way we democratize AI and machine learning is to bring the language and the mathematics down to a level of plain English, so that people who are business analysts, technical analysts, architects, and executives today can understand it and, more importantly, use it and leverage it for their daily work. We often hear about the democratization of AI, which is meaningless unless you can bring it down to your own colleagues; otherwise it will never apply to your average Joe and Jane. The data science risks, in my opinion, are some classic ones. The primary one is that correlation is not causation. Understanding the difference between correlation and causation is still important. It doesn’t come down to the fact that people don’t understand the meaning of the terms. It’s just that when you’re taking a look at the insights, if you really don’t understand the drivers, including the biases of the humans and machines, it’s very likely that people will mistake the two and make decisions that may not necessarily be accurate. The other big risk is that folks become enamored of accuracy or performance. Because they get, say, 90% performance in an algorithm, they want to go with it.
But there could be a problem with the algorithm, so it’s important to try different algorithms with the same data set. At the same time, you should always treat your data with suspicion. If the results seem too good to be true, chances are that they are. And the other thing is that when we focus on data fidelity, we must also ensure the quality of the training data set. Another topic related to this is human feedback loops. In healthcare, human feedback loops are extremely important. Downstream users can provide feedback on when algorithms get things right and wrong, and we should be focusing on how we are capturing this feedback from humans so that it can be brought back into the training that happens. This is great for false positives but can miss false negatives, so you’ll want to pay special attention to false negatives as you train, and use this experience to find missed results in production data and review. In healthcare specifically, without human feedback loops, especially from clinicians or executives, it will be very challenging to move forward. Finally, also related to the same thing, healthcare doesn’t trust black boxes. Black boxes work acceptably if you’re trying to recommend the next DVD to watch or the next product to purchase. A black box doesn’t work so well when you’re trying to tell the physician to operate now without considering the comorbidity.
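[Editor's note: The point about being enamored of accuracy and about paying special attention to false negatives can be illustrated with a small sketch. The patient counts, the two "models", and the function names below are all hypothetical, invented for illustration; they are not from the book or the interview. The sketch assumes a binary label where 1 means the condition is present.]

```python
from typing import Dict, List


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels (1 = condition present)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        else:  # truth == 1 and pred == 0: a missed case
            counts["fn"] += 1
    return counts


def accuracy(c: Dict[str, int]) -> float:
    """Share of all predictions that were correct."""
    return (c["tp"] + c["tn"]) / sum(c.values())


def recall(c: Dict[str, int]) -> float:
    """Share of true cases that the model actually caught."""
    positives = c["tp"] + c["fn"]
    return c["tp"] / positives if positives else 0.0


# Hypothetical cohort: 100 patients, 10 of whom have the condition.
y_true = [1] * 10 + [0] * 90

# Hypothetical model A: always predicts "no condition".
pred_a = [0] * 100

# Hypothetical model B: catches 8 of the 10 true cases, but also
# flags 12 healthy patients by mistake.
pred_b = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 78

counts_a = confusion_counts(y_true, pred_a)
counts_b = confusion_counts(y_true, pred_b)

for name, c in (("A", counts_a), ("B", counts_b)):
    print(f"model {name}: accuracy={accuracy(c):.2f} "
          f"recall={recall(c):.2f} false_negatives={c['fn']}")
```

[In this made-up cohort, model A reaches 90% accuracy while missing every patient who has the condition, and model B scores lower on accuracy but misses far fewer. Comparing more than one algorithm on the same data, and looking at false negatives rather than a single headline number, is what exposes the difference.]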
EB: Good point, we definitely don’t want doctors to prescribe medicine based on a black box without understanding those comorbidities. Just to build on a point in your response to the previous question around data quality. Data quality can be challenging with little data, but it’s our key to veracity. So, how do you propose we manage data quality with vast, fast, and varied big data?
PN: The answer to that is very clearly data fidelity and in order to do that, we need to examine data quality for what it is today. Having spent a few years of my life building systems to validate and manage data quality and more importantly, looking at how that changes business successes and ROI, I would be the last person to say that data quality doesn’t matter. At the same time, I think it’s high time for us to move past this one size fits all data quality for little data where you know all of the allowable values for every single attribute and what you need to do to get things there. So really, the conversation has to move towards data fidelity, which is how appropriate is the data in the context of its use. The three key words are appropriate (so we are not asking for perfection), use (in other words, we don’t care about the quality until the data is ready to be used so we are not going to be spending time and effort on all the data that are coming in), and context (the same piece of data or the same analytic can be interpreted by two different users or user communities in two dramatically different ways as we often see in healthcare. In this case, we have to acknowledge the same as well).
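[Editor's note: One way to picture "how appropriate is the data in the context of its use" is a check that runs only when data is about to be used and that depends on who is using it. The following is a minimal sketch under stated assumptions: the context names, the required fields, and the sample record are hypothetical and chosen only to illustrate the idea; they do not come from the book.]

```python
from typing import Any, Dict, List

# Hypothetical mapping from a context of use to the fields that use needs.
# A real organization would define its own contexts and rules.
REQUIRED_FIELDS_BY_CONTEXT: Dict[str, List[str]] = {
    "clinical_decision": ["patient_id", "allergies", "medication_dose_mg"],
    "operational_analytics": ["patient_id", "admit_date", "discharge_date"],
}


def missing_fields(record: Dict[str, Any], context: str) -> List[str]:
    """Return the fields this context needs that are absent or empty in the record."""
    required = REQUIRED_FIELDS_BY_CONTEXT[context]
    return [field for field in required if record.get(field) is None]


def is_fit_for_use(record: Dict[str, Any], context: str) -> bool:
    """Appropriate for this use, not perfect: nothing the context needs is missing."""
    return not missing_fields(record, context)


# One hypothetical record, with the allergy information never captured.
record = {
    "patient_id": "P-001",
    "admit_date": "2017-07-01",
    "discharge_date": "2017-07-04",
    "allergies": None,
    "medication_dose_mg": 50,
}

# The check happens at the point of use, once per context.
for context in REQUIRED_FIELDS_BY_CONTEXT:
    print(context, is_fit_for_use(record, context), missing_fields(record, context))
```

[The same record passes for the operational analytics use and fails for the clinical use, which mirrors the point that fidelity is judged per use and per context rather than by a single one size fits all quality rule.]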
EB: I think that you made some great points such as the context and keeping that in mind. For the next question, I wanted to ask you what are AI contextual intelligence agents and how are they related, if at all, to data and machine learning?
PN: We are in the nascent stages of big data, machine learning, and AI and I don’t see these as separate; rather, they are interconnected. It’s about big and little data, bringing them together, using each, and getting more value out of them than we do today. So, data is the lifeblood of the pipeline, so to speak. Machine learning is a data intensive method that uses learning algorithms essentially to generalize and to predict, serving as a basis for predictive analytics. One of the things that my co-author and I did was to look extensively at industries where there is a huge amount of machine learning happening. The key thing that we saw over and over again was that it’s not really about big data or data intensive methods. The most important thing is context, and we write in Chapter 7 that machine learning, big data, and little data will realize their potential for healthcare when these contextual intelligence agents are put to use. So what are contextual intelligence agents? A contextual intelligence agent is basically an AI system that can interact directly with humans by spoken or written communication and that can understand context to identify what is important in a given situation. These CIAs are the first and crucial stop on the journey to AI. So the question is why is context so important? Well, most machine learning algorithms have been created to answer a specific question (e.g. How can I compare two patients? Is this a heart failure patient? When will this patient convert from pre-diabetes to full-blown diabetes?). But for any given situation, there are many questions that can be reasonably asked and then answered.
But intelligence in a machine learning context and an AI context depends in part on the ability to know which question to ask at which appropriate time based on the context of the situation, who’s asking the question, what has changed, where is the question being asked, why is the question being asked, and so on and so forth. So these systems that are able to sufficiently be aware of context and most importantly, answer the questions, we call CIAs.
EB: That’s very interesting. I remember taking a pause when I was reading this Chapter 7 to post on the LinkedIn group about this. I really love the quote that you have on here from Pablo Picasso: “Computers are useless, they can only give you answers”.
PN: It was a bit of an interesting thing whether that quote was real or not. I did some investigation and it is a real quote. I think we can finally show Picasso only after about 60 years that computers are useful after all.
EB: I think that is all that we had. Is there anything else that I didn’t ask you today that you would like to share with our listeners?
PN: I think we covered a lot of ground here today. I don’t think that there are any specific points in addition to the ones we discussed here as key salient points for people to think about. I think we hit upon several. Of course, I will encourage folks to read the book and specifically the case studies because it’s easy to opine. As authors, by nature, we opine, but what’s more important for healthcare is to bring real life case studies that have been validated, that are in production, and that have shown success to demonstrate the points that we make. We have 8 case studies, so I would encourage all of our readers to read those case studies and relate them back to the previous chapters.
EB: Absolutely, that’s something that we all looked forward to and it’s one of the reasons we selected the book. I definitely agree with you 100% there.
PN: I also want to take this opportunity to thank HIMSS NY, the Deloitte team, and my employer Oracle for letting me participate. And my friend and mentor, Marc Perlman, with whom it is always a pleasure to collaborate.
EB: I also want to thank you guys for taking the time to do this today. Marc, for making this happen. Prashant for taking the time out of your busy day. And Marc Russo as well. This is really exciting and everyone who’s a member will be so thrilled when they see this publication posted.
PN: Thank you so much Eric and Marc. I really appreciate the fact that you took the time to read the book. Thank you for your kind feedback and if there’s any way that I can serve your membership, I’d be happy to do so.
EB: Definitely, we’ll keep in touch and let’s keep the conversation going. This has been a great treat for us. Thank you.