In the ever-evolving landscape of healthcare, data-driven insights are a cornerstone for improved patient care, drug safety, and research. To harness the power of public cloud computing and the variety of advanced machine learning services across the different cloud platforms, we set out to build a multi-cloud AI solution. The application we developed builds on multi-cloud application patterns, together with networking solutions from Aviatrix, and is applicable to all industries, not just healthcare.
If you have a specific use case you would like to discuss, contact us through LinkedIn or our website: https://skypurple.cloud
Our mission: Unlock valuable insights from publicly available healthcare reports and data using AI tools and services provided by the three main public cloud platforms.
In this post, we'll describe our deployments, exploring the architecture, tools, and services we employed across multiple cloud platforms, including AWS, Google Cloud Platform (GCP), and Microsoft Azure.
The multi-cloud secret sauce? Aviatrix, the multi-cloud networking solution that seamlessly connects these cloud environments.
By the end of this post, you'll hopefully understand how the solution utilises multiple cloud providers and be equipped to embark on your own multi-cloud endeavours. The solutions we’ve used for this proof of concept are just scratching the surface of what’s possible.
Hopefully, we'll inspire you to "go build" your own!
Please feel free to get in touch for deeper insights and explanations of how we put the application components together – there are a few design components not included on the diagram!
Figure 1: Multi-Cloud AI Application Architecture - Full Solution Overview
Setting Up Multi-Cloud Networking with Aviatrix: Secure Data Transmission Across Clouds
Our first step was to establish a robust, secure, and agile multi-cloud core network transit. Aviatrix proved to be the ideal choice for this task.
Why Aviatrix?
Security, Security & Security: Aviatrix offers end-to-end encryption, ensuring the highest level of data security between cloud platforms.
Visibility: With its comprehensive visibility into network traffic, Aviatrix makes monitoring and troubleshooting a breeze.
Automation: Aviatrix integrates seamlessly with Terraform, our infrastructure-as-code tool of choice, allowing for effortless deployment and scaling. With a single process we deployed network infrastructure across multiple cloud providers and then layered in the Aviatrix solution.
Figure 2. Aviatrix Multi-Cloud Core Transit Design Pattern
With the added benefit of a single platform for all data transfer between clouds, our Aviatrix network architecture eliminated any connectivity incompatibilities between the cloud providers. More importantly, for industries with extensive compliance and regulatory requirements, we get end-to-end visibility across the entire infrastructure with Aviatrix CoPilot; not to mention, we were able to take advantage of the new Aviatrix Distributed Cloud Firewall (DCF) features to protect the data without routing through a complex (& expensive) firewall cluster!
The entire multi-cloud network infrastructure, together with subnets, security, and the Aviatrix gateway configurations, was deployed with Terraform. Aviatrix and the three public cloud platforms all have Terraform providers, making it easy to automate the entire deployment.
The Aviatrix Terraform provider is available on GitHub.
Data Gathering and Preparation: From FAERS to Insights
The main purpose of our solution lies in the analysis of publicly available healthcare data. With many valuable resources available, such as the FDA Adverse Event Reporting System (FAERS), it's straightforward to collect and analyse drug and drug-interaction data. Before diving into the analysis, we needed to prepare the data.
Data Gathering: We collected FAERS data in ASCII and XML formats, downloaded directly from the FDA, and stored this in Amazon S3, using intelligent tiering for better cost efficiency.
Data Transformation and preparation: ETL Magic
For data preparation, especially when working with textual data like the FAERS data, you can use a combination of AWS services and custom data processing pipelines. Here are some of the cloud services and tools we used during our data preparation:
Amazon S3: Amazon S3 is a scalable object storage service that we used to store FAERS data files in ASCII or XML format. Organize your data in S3 buckets and folders for easy access and management. Use intelligent-tiering for a lower cost profile.
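As a minimal sketch of the "organise your data in S3 buckets and folders" step, the helper below derives a year/quarter-partitioned key from a FAERS archive filename. The bucket name and the exact key layout are assumptions for illustration, not our production configuration; FAERS quarterly extracts do, however, follow a naming pattern like `faers_ascii_2023q2.zip`.

```python
import re

def faers_s3_key(filename: str) -> str:
    """Derive a year/quarter-partitioned S3 key for a FAERS archive.

    FAERS quarterly extracts are named like 'faers_ascii_2023q2.zip'
    or 'faers_xml_2023q2.zip'.
    """
    m = re.search(r"(ascii|xml)_(\d{4})q([1-4])", filename.lower())
    if not m:
        raise ValueError(f"Unrecognised FAERS filename: {filename}")
    fmt, year, quarter = m.groups()
    return f"faers/{fmt}/{year}/Q{quarter}/{filename}"

# The upload itself would then look like this (requires AWS credentials;
# the bucket name is a placeholder):
# import boto3
# s3 = boto3.client("s3")
# s3.upload_file(
#     "faers_ascii_2023q2.zip",
#     "my-faers-bucket",
#     faers_s3_key("faers_ascii_2023q2.zip"),
#     ExtraArgs={"StorageClass": "INTELLIGENT_TIERING"},
# )
```

Setting `StorageClass` to `INTELLIGENT_TIERING` at upload time is what gives the cost profile mentioned above, rather than relying on a later lifecycle transition.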
AWS Data Pipeline: AWS Data Pipeline offers a robust solution for data preparation, allowing businesses to streamline the process of transforming and moving data stored in Amazon S3. With AWS Data Pipeline, users can create automated workflows to extract, transform, and load (ETL) data, making it ready for analysis and reporting. This service supports a variety of data transformation tasks, including those that involve AWS EMR, AWS Glue, or custom scripts using AWS Lambda.
Data Pipeline also provides scheduling options, monitoring capabilities through AWS CloudWatch, error handling, and reliable data movement between different storage locations. The service simplifies data preparation, allowing you to cleanse, structure, or enrich your data, making it an essential tool for organizations looking to harness the power of their data. By leveraging this service, businesses can accelerate their data analytics initiatives and gain actionable insights. Data Pipeline was excluded from the solution diagram to better emphasise the core AI components and the multi-cloud network design. Contact us if you want to investigate the configuration in greater depth.
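To make the ETL step concrete, here is a heavily trimmed illustration of what a Data Pipeline definition looks like, expressed as the JSON document (shown as a Python dict) you would submit with `put-pipeline-definition`. The activity name, script, and schedule are placeholders for illustration, not our actual pipeline.

```python
# A pared-down AWS Data Pipeline definition: a default object, one
# ShellCommandActivity running a (hypothetical) transform script, and the
# EC2 resource it runs on.
pipeline_definition = {
    "objects": [
        {
            "id": "Default",
            "name": "Default",
            "scheduleType": "ondemand",
            "failureAndRerunMode": "CASCADE",
            "role": "DataPipelineDefaultRole",
            "resourceRole": "DataPipelineDefaultResourceRole",
        },
        {
            "id": "FaersEtlActivity",
            "name": "FaersEtlActivity",
            "type": "ShellCommandActivity",
            # Placeholder transform script for the FAERS files.
            "command": "python transform_faers.py",
            "runsOn": {"ref": "EtlEc2Resource"},
        },
        {
            "id": "EtlEc2Resource",
            "name": "EtlEc2Resource",
            "type": "Ec2Resource",
            "instanceType": "t3.medium",
            "terminateAfter": "1 Hour",
        },
    ]
}
```

The same shape extends to EMR or Glue activities; the point is that the whole workflow, including the compute it runs on, lives in one declarative definition.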
AWS Lambda: AWS Lambda is a serverless compute service that can be used to run event-invoked processes and code to call ML Services on all cloud platforms with simple API Calls. The Lambda processes can be included in application pipelines, and Step Functions, to parse the data and orchestrate the data flow to the ML services for downstream consumption and analysis.
Combining Healthcare NLP Services from AWS, GCP and Azure
With a simple UI for viewing the output (which is containerised and hosted on Kubernetes – GKE, AKS and/or EKS), the solution also allows us to manually upload new documents for analysis. A trigger pushes data/documents into an SQS queue, to throttle and streamline the process of ingesting files to the initial Lambda function, which then validates the incoming documents to ensure they conform to acceptable formats. The document metadata is then analysed to determine the size of the document, allowing sizing controls to be applied to the pipeline. The diagram above (Figure 1) shows only the EKS cluster on AWS, but containers could easily be deployed and orchestrated across GKE and AKS using our Kubernetes tool of choice, ArgoCD.
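The validation and sizing steps above can be sketched as a Lambda handler consuming SQS records. The accepted formats and the size threshold are assumptions for illustration; the real pipeline's controls are more involved.

```python
import json

ACCEPTED_FORMATS = {".xml", ".txt", ".json"}   # assumption: formats we accept
SMALL_DOC_LIMIT = 1 * 1024 * 1024              # assumption: 1 MiB size threshold

def validate_document(key: str, size_bytes: int) -> dict:
    """Mimic the initial checks: format validation plus a size
    classification used for downstream sizing controls."""
    suffix = "." + key.rsplit(".", 1)[-1].lower() if "." in key else ""
    if suffix not in ACCEPTED_FORMATS:
        return {"accepted": False, "reason": f"unsupported format {suffix!r}"}
    tier = "small" if size_bytes <= SMALL_DOC_LIMIT else "large"
    return {"accepted": True, "size_tier": tier}

def handler(event, context=None):
    """Lambda entry point for SQS-triggered validation (sketch).

    Each SQS record body is assumed to carry the S3 key and object size
    as JSON, e.g. {"key": "report.xml", "size_bytes": 2048}.
    """
    return [
        validate_document(body["key"], body["size_bytes"])
        for body in (json.loads(r["body"]) for r in event.get("Records", []))
    ]
```

Because the queue sits between the trigger and this function, bursts of uploads are throttled by SQS rather than by the function itself.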
Once the initial document checks are passed, the request is pushed to a Step Function which takes over the data processing. This is where the parallel processing starts and data is sent to the additional cloud platforms, GCP and Azure, via the Aviatrix core transit network, with AWS traffic traversing local Aviatrix gateways for added visibility. All data is sent to the various APIs and cloud services using Private endpoints (e.g. Private Link). This provides internal, private IPv4 IP addresses and enables routing of the network traffic across the Aviatrix core transit, without any data targeted to public APIs or services. With this design pattern we also enabled end-to-end troubleshooting, visibility, and inspection capabilities by integrating with Aviatrix CoPilot.
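The fan-out described above maps naturally onto a Step Functions Parallel state with one branch per cloud provider. The sketch below is a pared-down Amazon States Language definition (as a Python dict); the state names and Lambda ARNs are placeholders, and the real state machine includes error handling and result aggregation omitted here.

```python
# One Parallel state, three branches: AWS, GCP, and Azure NLP calls, each
# wrapped in a (hypothetical) Lambda function that reaches the service over
# a private endpoint.
state_machine = {
    "StartAt": "AnalyseInParallel",
    "States": {
        "AnalyseInParallel": {
            "Type": "Parallel",
            "Branches": [
                {
                    "StartAt": "ComprehendMedical",
                    "States": {
                        "ComprehendMedical": {
                            "Type": "Task",
                            "Resource": "arn:aws:lambda:...:function:call-comprehend-medical",
                            "End": True,
                        }
                    },
                },
                {
                    "StartAt": "GcpHealthcareNlp",
                    "States": {
                        "GcpHealthcareNlp": {
                            "Type": "Task",
                            "Resource": "arn:aws:lambda:...:function:call-gcp-healthcare-nlp",
                            "End": True,
                        }
                    },
                },
                {
                    "StartAt": "AzureTextAnalyticsHealth",
                    "States": {
                        "AzureTextAnalyticsHealth": {
                            "Type": "Task",
                            "Resource": "arn:aws:lambda:...:function:call-azure-ta-health",
                            "End": True,
                        }
                    },
                },
            ],
            "End": True,
        }
    },
}
```

A Parallel state collects the outputs of all branches into a single array, which is convenient for the side-by-side comparison of provider results later in the pipeline.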
Amazon Comprehend Medical: Amazon Comprehend Medical is a specialized NLP service designed specifically for healthcare and life sciences use cases. It is tailored to extract medical information and insights from unstructured medical text, including clinical notes, medical records, and research articles. It can identify medical conditions, medications, treatment plans, dosage information, and more.
We used Amazon Comprehend for tasks like language detection and sentiment analysis, and Comprehend Medical for medical entity recognition, on the FAERS textual data.
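For the entity-recognition step, the `detect_entities_v2` response is a JSON document with an `Entities` list, where each entity carries `Text`, `Category`, and a confidence `Score`. A small, hedged sketch of pulling out high-confidence medication mentions (the 0.8 threshold is an illustrative choice, not a tuned value):

```python
def medication_entities(response: dict, min_score: float = 0.8) -> list:
    """Extract high-confidence MEDICATION entities from a Comprehend
    Medical detect_entities_v2 response."""
    return [
        e["Text"]
        for e in response.get("Entities", [])
        if e.get("Category") == "MEDICATION" and e.get("Score", 0.0) >= min_score
    ]

# The call itself (requires AWS credentials):
# import boto3
# cm = boto3.client("comprehendmedical")
# response = cm.detect_entities_v2(Text=narrative_text)
# drugs = medication_entities(response)
```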
Google Cloud Healthcare: Healthcare Natural Language API is a natural language processing (NLP) service tailored for healthcare and life sciences. It helps extract insights from medical and clinical text, such as electronic health records (EHRs) and medical literature.
Azure Cognitive Services – Text Analytics: Text Analytics for Health performs four key functions: entity recognition, relation extraction, entity linking, and assertion detection, all with a single API call.
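Because each of the three services returns entities in its own shape, a thin normalisation layer makes the results comparable side by side. The sketch below approximates each API's response fields (simplified; consult each service's reference for the full schemas) and maps them onto one common record:

```python
def normalise_entity(provider: str, raw: dict) -> dict:
    """Map a provider-specific entity onto a common schema so results from
    the three NLP services can be compared directly. The field names below
    approximate each API's response shape and are simplified."""
    if provider == "aws":      # Comprehend Medical entity
        return {"text": raw["Text"], "category": raw["Category"],
                "confidence": raw["Score"], "provider": provider}
    if provider == "gcp":      # Healthcare Natural Language API entity mention
        return {"text": raw["text"]["content"], "category": raw["type"],
                "confidence": raw["confidence"], "provider": provider}
    if provider == "azure":    # Text Analytics for health entity
        return {"text": raw["text"], "category": raw["category"],
                "confidence": raw["confidenceScore"], "provider": provider}
    raise ValueError(f"Unknown provider: {provider}")
```

With every entity in the same shape, the downstream comparison and "LLM combined summary" steps can treat the three providers uniformly.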
The specific services and tools used for data preparation and analysis will depend on the format of the ingested data, and the nature of the pre-processing required. AWS offers a range of options to cover various data preparation needs, from automated ETL with Glue and Data Pipelines to custom data processing with Lambda, and NLP capabilities using Comprehend & Comprehend Medical. Azure and GCP also have similar services so, for portability of the solution, we chose methods and services that could be easily updated with calls to APIs on the alternate platforms. With the main application running on Kubernetes, as stated, it would be straightforward to run the application on any other cloud providers’ Kubernetes PaaS service.
The primary purpose of this solution, beyond conceptualising an AI multi-cloud use case, is to give some insights into the way these tools can be used together across public cloud providers. For any specific use case, there may be a better performing tool on a different platform, and once you commit to multi-cloud you are no longer tied to a single provider. This is equally relevant when running applications from on-prem to access multiple cloud providers. For an on-prem use case, the Aviatrix transit would be configured with an Edge gateway in the data centre (or a hosting partner) which would become part of the main transit backbone.
Data Analysis: Unleashing the Power of NLP
The rich ecosystem of machine learning services, including Azure Cognitive Services, AWS ML services, and Google Cloud's AI services, allowed us to perform document and data analysis with ease. We leveraged these tools to extract scores from FAERS data, shedding light on patient experiences. The returned results were shown in isolation as well as combined and passed to some of the new Gen AI services, such as Amazon Bedrock and Azure OpenAI, to provide an "LLM combined summary". The solution allows a user with specific domain expertise to identify which of the results (or the combined result) is the most accurate and relevant, and therefore which cloud platform performs best. They can then prioritise that service or pipeline and minimise ongoing costs for a production solution. Regular analysis across the different cloud providers is possible, allowing a simple switch if one provider starts to outperform the current pipeline.
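The "identify the best cloud platform" step reduces to a simple aggregation once a domain expert has rated results. A minimal sketch, assuming ratings are collected as (provider, relevance score) pairs; the real comparison would weight cost and latency alongside relevance:

```python
from collections import defaultdict

def best_provider(ratings: list[tuple[str, float]]) -> str:
    """Given (provider, relevance_score) ratings assigned by a domain
    expert across many documents, return the provider with the highest
    mean score."""
    totals: dict = defaultdict(float)
    counts: dict = defaultdict(int)
    for provider, score in ratings:
        totals[provider] += score
        counts[provider] += 1
    return max(totals, key=lambda p: totals[p] / counts[p])
```

Re-running this aggregation periodically is what enables the "simple switch" if another provider's pipeline starts to outperform the current one.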
Application UI & Multi-Cloud Kubernetes
As well as being able to deploy the application to any public cloud platform, with a central DNS service managing the application routing, we explored deploying the containers across all three platforms. This would offer a more resilient and cost-effective solution, letting us take advantage of free-tier allocations on each platform as well as ride out any platform-specific cloud outages.
Cost Optimization: Efficiency at Scale
With the ability to scale cloud resources, we optimise costs by utilising serverless services and pipelines, and spin down resources during periods of inactivity. Cost-conscious cloud architecture patterns are what make these machine learning solutions financially viable.
Contact us at skyPurple Cloud for help with refactoring your own applications and identifying cost savings. We have a mature FinOps practice working out of the UK office in Oxford.
Our Conclusions
Our multi-cloud AI solution for healthcare data demonstrates the power of multi-cloud computing, advanced machine learning, and secure cloud networking.
By leveraging the strengths of AWS, Azure, GCP, and Aviatrix, we've unlocked valuable insights from the FAERS data, providing a holistic view of patient experiences. We've also proved that the portability and flexibility of a multi-cloud solution is within reach of all cloud customers. There is no requirement for large investments in technology or complex network configurations; using Aviatrix as our network transit for multi-cloud connectivity kept costs down.
We hope this blog post inspires you to design your own multi-cloud solutions, armed with knowledge to navigate the complexities of cloud computing, connectivity, and data analysis.
The future of healthcare is data-driven, and we're excited to be at the forefront of this innovation.
Multi-cloud is way easier than surfing! ;) Now, GO BUILD!
Stay tuned for more technical insights and solutions as we continue to push the boundaries of what's possible with AI in a multi-cloud world. Connect with us on LinkedIn and arrange to meet with our experts to guide you on your multi-cloud journey.
Next Steps, Final Thoughts & Our Next Blog
Machine Learning Models for Deeper Insights: Training Models for Custom Insights
While the standard NLP tools were good for some of our analyses, we craved deeper insights into, and wanted to extract additional value from, the data. We are working on training custom machine learning models using Amazon SageMaker, Google Cloud AutoML and Azure Machine Learning to uncover patterns and associations within the data and increase performance, as well as identify the best performing ML platform of the three main cloud providers. Again, this research is very relevant to many industries looking to optimise their use of AI in a single or multi-cloud environment.
Compliance and Security: Safeguarding Patient Data
In the healthcare industry, data privacy and compliance are non-negotiable. Our solution adhered to stringent regulations, including HIPAA, to ensure all data remained secure. We will cover how we managed security and risk management in a subsequent post and discuss our partnership with Orca Security. Stay tuned for our comprehensive post on cloud security posture management!