This blog presents a roof damage assessment and reporting solution powered by the AMD Instinct MI300X accelerators on Dell PowerEdge XE9680 servers.

| Industry Challenges in Roof Damage Assessment

Insurance companies often face significant challenges in assessing and reporting roof damage efficiently. Traditional workflows rely heavily on manual inspections and detailed reporting, leading to time-consuming processes that delay claim submissions and resolutions. These inefficiencies not only frustrate policyholders but also increase operational costs and introduce inconsistencies in claim evaluations. With thousands of claims requiring physical inspections annually, the lack of automation hampers transparency, consistency, and speed, straining customer satisfaction and insurer resources.

| A High-Performance Solution

To address these challenges, we developed an automated Roof Damage Assessment and Reporting solution powered by Dell PowerEdge XE9680 servers equipped with AMD Instinct MI300X accelerators. This solution transforms the insurance claims process by automating damage detection and streamlining report generation. Integrating advanced vision language models (VLMs), audio multimodal models, and retrieval-augmented generation (RAG), it enhances operational efficiency and customer experiences.

This blog delves into the solution architecture and highlights its capabilities:

Automated detection and classification of roof damage with RAG and multimodal models.
Deployment of advanced models on Dell PowerEdge XE9680 servers, utilizing AMD Instinct MI300X accelerators.
Rapid inspection report generation in compliance with industry standards to reduce claim processing delays and improve transparency.

| Solution Architecture

To power this solution, we selected the Dell PowerEdge XE9680 equipped with AMD Instinct MI300X accelerators due to its exceptional performance and memory capacity, crucial for handling the latest high parameter count large language models. With 192GB of HBM3 memory per accelerator, we can comfortably run the entire Llama 3.1 70B model on a single accelerator. Memory and compute-intensive workloads, such as serving multiple model instances and fine-tuning, are also possible using only one hardware system with eight accelerators. As shown in the chart above, max token throughput with vLLM model serving of Llama 3.1 70B scales by a factor of ~7.9 with an increase in concurrent requests from 64 to 2048, achievable due to the unparalleled memory capacity of the AMD Instinct MI300X accelerator combined with the Dell PowerEdge XE9680 server.

To deliver an industry specific solution, we paired cutting-edge language models with critical software components as shown in the architecture below, such as vision language models, audio multimodal models, and a vector database. The memory and performance capabilities of Dell PowerEdge XE9680 with AMD Instinct MI300X accelerators make it possible to support this extensive software stack without compromising accuracy or efficiency.

This software stack includes the following key components:

rocm-vLLM v0.6.4, an industry-standard library for optimized open-source large language model (LLM) serving, with support for AMD ROCm 6.2.
Llama 3.1 70B Model, an industry-leading open-weight language model with 70 billion parameters, served using vLLM with AMD ROCm optimizations.
Molmo 72B VLM, an industry-leading vision language model served on vLLM; vision language models are multimodal AI models that can process and understand both visual and textual inputs simultaneously, enabling tasks like visual question answering, image captioning, and text-to-image search.
Ultravox, a multimodal language model that processes audio and text for real-time voice interactions.
EVF SAM, an advanced model that enhances text-prompted segmentation by integrating early vision-language fusion, achieving high accuracy in image and video predictions.
bge-large-en embeddings model, one of the top ranked text embeddings models on Hugging Face APIs, generates high-quality low-dimensional vector representations of text.
MilvusDB, an open-source vector database with high performance embedding and similarity search.
LightRAG, a retrieval-augmented generation (RAG) model that uses graph structures and dual-level retrieval to enhance information accuracy and response times.

| Solution Overview

This solution automates key aspects of damage identification, area segmentation, and report generation, ultimately reducing manual effort, accelerating claims submissions, and enhancing transparency and consistency. Below are the key steps involved in the solution:

VLM-Powered Damage Identification Process:
The core of this solution is the Molmo 72B Vision-Language Model (VLM), which processes high-resolution images collected during roof inspections. The model identifies and describes key types of roof damage, including flashing damage, shingle damage, and chimney damage. This AI-driven approach ensures accurate damage categorization and eliminates the reliance on manual analysis.

Integration with Inspector Feedback (Audio):
To enhance the assessment’s accuracy, the solution integrates inspector feedback via transcribed audio recordings, processed using the Ultravox 8B Audio Multimodal model. It is then fed into the VLM along with the input roof images; this human-in-the-loop approach combines the VLM-generated descriptions with expert insights to provide a comprehensive evaluation of the roof’s condition. Here, the Llama 3.1 70B Large Language Model (LLM) is used partially to classify the roof’s condition as damaged or undamaged.

Automated Marking of Damage Areas:
The EVF-SAM (Segment Anything Model) automatically segments the images to mark areas of roof damage. This visual segmentation provides a clear representation of the damage, supporting insurers with precise, visual evidence for claims processing.

Automated Roof Damage Report Generation:
Using Retrieval-Augmented Generation (RAG), the solution generates a comprehensive damage report for each home, customized to include information from the homeowner’s policy document. This report simplifies the submission process for both homeowners and inspection agencies, ensuring accuracy and compliance with claim requirements. In this step, Llama 3.1 70B is fully utilized for report generation.

Real-Time Dashboard for Hardware Performance:
The solution features a dynamic UI dashboard that showcases hardware performance metrics. Users can adjust the number of simultaneous input inspections and view real-time analytics on server performance, enabling an optimized workflow powered by the Dell PowerEdge XE9680 server with AMD Instinct MI300X accelerators.

By integrating state-of-the-art AI models and scalable hardware, this solution revolutionizes the roof inspection process, significantly reducing the time and effort required for damage assessments and claims submissions while enhancing overall accuracy and transparency.

To learn more, please request access to our reference code by contacting us at contact@metrum.ai.

| Additional Criteria for IT Decision Makers

| What is RAG, and why is it critical for enterprises?

Retrieval-Augmented Generation (RAG), is a method in natural language processing (NLP) that enhances the generation of responses or information by incorporating external knowledge retrieved from a large corpus or database. This approach combines the strengths of retrieval-based models and generative models to deliver more accurate, informative, and contextually relevant outputs.

The key advantage of RAG is its ability to dynamically leverage a large amount of external knowledge, allowing the model to generate responses that are informed not only based on its training data but also by up-to-date and detailed information from the retrieval phase. This makes RAG particularly valuable in applications where factual accuracy and comprehensive details are essential, such as in customer support, academic research, and other fields that require precise information.

Ultimately, RAG provides enterprises with a powerful tool for improving the accuracy, relevance, and efficiency of their information systems, leading to better customer service, cost savings, and competitive advantages.

| Why is the Dell PowerEdge XE9680 Server with AMD Instinct MI300X Accelerators well-suited for RAG Solutions?

Designed especially for AI tasks, the Dell PowerEdge XE9680 server is a powerful data-processing server equipped with eight AMD Instinct MI300X accelerators, making it well-suited for AI-workloads, especially for those involving training, fine-tuning, and conducting inference with Large Language Models (LLMs).

Effectively implementing Retrieval-Augmented Generation (RAG) solutions requires a robust hardware infrastructure that can handle both the retrieval and generation components. Key hardware features for RAG solutions include high-performance accelerator units and large RAM and storage capacity. With 192 GB of GPU memory, a single AMD Instinct MI300X accelerator can host an entire Llama 3 70B parameter model for inference. Optimized for generative AI, the AMD Instinct MI300X accelerator can deliver up to 10.4 Petaflops of performance (BF16/FP16), and provides 1.5TB of total HBM3 memory in a group of eight accelerators.

| References

AMD images: AMD Library, https://library.amd.com/account/dashboard/

Dell images: Dell.com

Copyright © 2024 Metrum AI, Inc. All Rights Reserved. This project was commissioned by Dell Technologies. Dell and other trademarks are trademarks of Dell Inc. or its subsidiaries. AMD, AMD Instinct™, AMD ROCm™, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other product names are the trademarks of their respective owners.

***DISCLAIMER - Performance varies by hardware and software configurations, including testing conditions, system settings, application complexity, the quantity of data, batch sizes, software versions, libraries used, and other factors. The results of performance testing provided are intended for informational purposes only and should not be considered as a guarantee of actual performance.