This is a guest post co-written with Tim Krause, Lead MLOps Architect at CONXAI.
CONXAI Technology GmbH is pioneering the development of an advanced AI platform for the Architecture, Engineering, and Construction (AEC) industry. Our platform uses advanced AI to empower construction domain experts to create complex use cases efficiently.
Construction sites typically employ multiple CCTV cameras, generating vast amounts of visual data. These camera feeds can be analyzed using AI to extract valuable insights. However, to comply with GDPR regulations, all individuals captured in the footage must be anonymized by masking or blurring their identities.
In this post, we dive deep into how CONXAI hosts a state-of-the-art segmentation model on AWS using Amazon Simple Storage Service (Amazon S3), Amazon Elastic Kubernetes Service (Amazon EKS), KServe, and NVIDIA Triton.
Our AI solution is offered in two forms:
- Model as a service (MaaS) – Our AI model is accessible through an API, enabling seamless integration. Pricing is based on processing batches of 1,000 images, offering flexibility and scalability for users.
- Software as a service (SaaS) – This option provides an easy-to-use dashboard, acting as a central control panel. Users can add and manage new cameras, view footage, perform analytical searches, and enforce GDPR compliance with automatic person anonymization.
Our AI model, fine-tuned with a proprietary dataset of more than 50,000 self-labeled images of construction sites, achieves significantly greater accuracy compared to other MaaS solutions. With the ability to recognize more than 40 specialized object classes, such as cranes, excavators, and portable toilets, our AI solution is uniquely designed and optimized for the construction industry.
Our journey to AWS
Initially, CONXAI started with a small cloud provider specializing in offering affordable GPUs. However, it lacked essential services required for machine learning (ML) applications, such as frontend and backend infrastructure, DNS, load balancers, scaling, blob storage, and managed databases. At that time, the application was deployed as one large monolithic container that included Kafka and a database. This setup was neither scalable nor maintainable.
After migrating to AWS, we gained access to a robust ecosystem of services. Initially, we deployed the all-in-one AI container on a single Amazon Elastic Compute Cloud (Amazon EC2) instance. Although this provided a basic solution, it wasn't scalable, necessitating the development of a new architecture.
Our top reasons for choosing AWS were primarily driven by the team's extensive experience with AWS. Additionally, the initial cloud credits provided by AWS were invaluable for us as a startup. We now use AWS managed services wherever possible, particularly for data-related tasks, to minimize maintenance overhead and pay only for the resources we actually use.
At the same time, we aimed to remain cloud agnostic. To achieve this, we chose Kubernetes, enabling us to deploy our stack directly at a customer's edge, such as on construction sites, when needed. Some customers are potentially very compliance-restrictive and don't allow data to leave the construction site. Another opportunity is federated learning: training at the customer's edge and only transferring the model weights, without sensitive data, to the cloud. In the future, this approach could lead to having one fine-tuned model for each camera to achieve the best accuracy, which requires hardware resources on site. For the time being, we use Amazon EKS to offload the management overhead to AWS, but we could easily deploy on a standard Kubernetes cluster if needed.
Our previous model was running on TorchServe. With our new model, we first tried performing inference in Python with Flask and PyTorch, as well as with BentoML. Achieving high inference throughput with high GPU utilization for cost-efficiency was very challenging. Exporting the model to ONNX format was particularly difficult because the OneFormer model lacks strong community support. It took us some time to identify why the OneFormer model was so slow in ONNX Runtime with NVIDIA Triton. We ultimately resolved the issue by converting ONNX to TensorRT.
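As an illustration of that last step, the following is a minimal sketch of an ONNX-to-TensorRT conversion, assuming TensorRT 8.x; the file names, the FP16 flag, and the builder settings are illustrative and not our exact build configuration.

```python
import tensorrt as trt

# Build a serialized TensorRT engine from an ONNX export of the model.
# "oneformer.onnx" and "model.plan" are illustrative names; Triton expects
# the serialized engine in the model repository to be named model.plan.
logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("oneformer.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parsing failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # common speed/accuracy trade-off; validate

serialized_engine = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(serialized_engine)
```

The same conversion can also be performed with the trtexec command-line tool that ships with TensorRT; the resulting model.plan is what Triton loads from the model repository.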
Defining the final architecture, training the model, and optimizing costs took approximately 2-3 months. Currently, we improve our model by incorporating increasingly accurate data, a process that takes around 3-4 weeks of training on a single GPU. Deployment is fully automated with GitLab CI/CD pipelines, Terraform, and Helm, requiring less than an hour to complete without any downtime. New model versions are typically rolled out in shadow mode for 1-2 weeks to validate stability and accuracy before full deployment.
Solution overview
The following diagram illustrates the architecture of the solution.
The architecture consists of the following key components:
- The S3 bucket (1) is the most important data source. It is cost-effective, scalable, and provides almost unlimited blob storage. We encrypt the S3 bucket, and we delete all data with privacy concerns after processing has taken place. Almost all microservices read and write files from Amazon S3, which ultimately triggers (2) Amazon EventBridge (3). The process begins when a customer uploads an image to Amazon S3 using a presigned URL provided by our API, which handles authentication and authorization through Amazon Cognito (see the presigned URL sketch after this list).
- The S3 bucket is configured so that it forwards all events to EventBridge.
- TriggerMesh is a Kubernetes controller where we use the AWSEventBridgeSource (6). It abstracts the infrastructure automation and automatically creates an Amazon Simple Queue Service (Amazon SQS) processing queue (5), which acts as a processing buffer. Additionally, it creates an EventBridge rule (4) to forward the S3 events from the event bus into the SQS processing queue. Finally, TriggerMesh creates a Kubernetes pod to poll events from the processing queue to feed the Knative broker (7). The resources in the Kubernetes cluster are deployed in a private subnet.
- The central place for Knative Eventing is the Knative broker (7). It is backed by Amazon Managed Streaming for Apache Kafka (Amazon MSK) (8).
- The Knative trigger (9) polls the Knative broker based on a specific CloudEventType and forwards it accordingly to the KServe InferenceService (10).
- KServe is a standard model inference platform on Kubernetes that uses Knative Serving as its foundation and is fully compatible with Knative Eventing. It also pulls models from a model repository into the container before the model server starts, eliminating the need to build a new container image for each model version.
- We use KServe's feature of collocating the transformer and predictor in the same pod to maximize inference speed and throughput, because containers within the same pod can communicate over localhost and the network traffic never leaves the node.
- After many performance tests, we achieved the best performance with the NVIDIA Triton Inference Server (11) after converting our model first to ONNX and then to TensorRT.
- Our transformer (12) uses Flask with Gunicorn and is optimized for the number of CPU workers and cores to keep GPU utilization above 90%. The transformer gets a CloudEvent with the reference to the image on Amazon S3, downloads the image, and performs model inference over HTTP. After getting the model results, it performs postprocessing and finally uploads the processed model results to Amazon S3 (see the transformer sketch after this list).
- We use Karpenter as the cluster autoscaler. Karpenter is responsible for scaling the inference component to handle high user request loads. Karpenter launches new EC2 instances when the system experiences increased demand. This allows the system to automatically scale computing resources up to meet the increased workload.
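For readers who want to see the S3 side of this flow in code, the following is a minimal boto3 sketch, assuming a hypothetical bucket name and object key; it enables the bucket-to-EventBridge notification described above and issues a presigned upload URL like the one our API returns.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "conxai-example-uploads"  # hypothetical bucket name

# Forward all S3 events for this bucket to EventBridge (components 2 and 3).
s3.put_bucket_notification_configuration(
    Bucket=BUCKET,
    NotificationConfiguration={"EventBridgeConfiguration": {}},
)

# Issue a presigned URL so the customer can upload an image directly to
# Amazon S3 without holding AWS credentials (component 1).
url = s3.generate_presigned_url(
    "put_object",
    Params={"Bucket": BUCKET, "Key": "uploads/image-0001.jpg"},
    ExpiresIn=900,  # URL valid for 15 minutes
)
print(url)  # the client HTTP PUTs the image bytes to this URL
```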
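To make the transformer's role concrete, the following is a heavily simplified sketch of such a Flask service, assuming a hypothetical oneformer model name, input tensor name, and CloudEvent payload; the production service runs under Gunicorn (for example, gunicorn --workers 4 app:app) and contains the real preprocessing, postprocessing, and anonymization logic.

```python
import io
import json

import boto3
import numpy as np
import requests
from flask import Flask, request
from PIL import Image

app = Flask(__name__)
s3 = boto3.client("s3")

# Triton runs in the same pod, so we reach it over localhost (its default
# HTTP port is 8000). The model name and tensor name are hypothetical.
TRITON_URL = "http://localhost:8000/v2/models/oneformer/infer"


@app.route("/", methods=["POST"])
def handle_cloudevent():
    # Knative delivers CloudEvents in binary mode: attributes arrive as
    # ce-* HTTP headers, and the event data arrives as the JSON request body.
    event = request.get_json()
    bucket, key = event["bucket"], event["key"]  # hypothetical payload shape

    # Download the image from Amazon S3 and do minimal preprocessing.
    obj = s3.get_object(Bucket=bucket, Key=key)
    image = Image.open(io.BytesIO(obj["Body"].read())).convert("RGB")
    tensor = np.asarray(image, dtype=np.float32).transpose(2, 0, 1)[None] / 255.0

    # Call Triton using the KServe v2 inference protocol over localhost.
    payload = {
        "inputs": [{
            "name": "input",
            "shape": list(tensor.shape),
            "datatype": "FP32",
            "data": tensor.flatten().tolist(),
        }]
    }
    result = requests.post(TRITON_URL, json=payload).json()

    # The real postprocessing (mask decoding, person anonymization) is
    # omitted; here we simply store the raw outputs next to the source image.
    s3.put_object(
        Bucket=bucket,
        Key=key + ".result.json",
        Body=json.dumps(result["outputs"]).encode(),
    )
    return "", 204
```

A production transformer would typically use the tritonclient package or Triton's binary tensor extension rather than JSON-encoded tensors.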
All of this splits our architecture mainly into AWS managed data services and the Kubernetes cluster:
- The S3 bucket, EventBridge, and the SQS queue, as well as Amazon MSK, are fully managed services on AWS. This keeps our data management effort low.
- We use Amazon EKS for everything else. TriggerMesh, the AWSEventBridgeSource, the Knative broker, the Knative trigger, KServe with our Python transformer, and the Triton Inference Server are all within the same EKS cluster on a dedicated EC2 instance with a GPU. Because our EKS cluster is used only for processing, it is fully stateless.
Summary
Starting from our own initial, highly customized model, we transitioned to AWS, improved our architecture, and introduced our new OneFormer model. We achieved GPU utilization of over 90%, and the number of processing errors has dropped almost to zero in recent months. One of the main design choices was separating the model from the preprocessing and postprocessing code in the transformer. With this technology stack, we gained the ability to scale down to zero on Kubernetes using the Knative serverless feature, while our scale-up time from a cold state is only 5-10 minutes. This can save significant infrastructure costs for potential batch inference use cases.
The next important step is to use these model results with proper analytics and data science. These model results can also serve as a data source for generative AI features such as automated report creation. Furthermore, we want to label more diverse images and train the model on additional construction domain classes as part of a continuous improvement process. We also work closely with AWS specialists to bring our model to the AWS Inferentia chipsets for better cost-efficiency.
About the authors
Tim Krause is Lead MLOps Architect at CONXAI. He takes care of all activities when AI meets infrastructure. He joined the company with prior platform, Kubernetes, DevOps, and big data knowledge, and was previously training LLMs from scratch.
Mahdi Yosofie is a Solutions Architect at AWS, working with startup customers and leveraging his expertise to help them design their workloads on AWS.