This article describes the components that make up the high-level architecture of Axsy Smart Vision Object Detection and how those components interact during the model training process.



High-Level Architecture

The Axsy Smart Vision Object Detection architecture comprises four main components, listed below and depicted in the accompanying architecture diagram:


Axsy Mobile App with Smart Vision: Runs on each end user's mobile device (iOS, Android or Windows). Trained Axsy Smart Vision Object Detection models are downloaded to the app for use with in-app, on-device image recognition. The Axsy Mobile App is publicly available on the per-platform app stores.
Customer Salesforce Org: The customer-specific instance of the Salesforce Platform with Axsy installed. It orchestrates Axsy Smart Vision Object Detection model training, which is unique per customer.
Axsy Smart Vision Object Detection Training App on Google Cloud Vertex AI: Model training is executed by the Axsy Smart Vision Object Detection Training App hosted on Google Cloud Vertex AI.
Customer-Managed Google Cloud Storage: Training images and the resulting models are stored securely in the customer-managed Google Cloud Storage bucket. Note that these images and models are never stored at rest by the Axsy Smart Vision Object Detection Training App.



Axsy can specify the geographic hosting location of the Axsy Smart Vision Object Detection Training App within Google Cloud, ensuring alignment with any region-specific requirements of its customers.



High-Level Architecture for Axsy Smart Vision Object Detection



Model Training Process

Storage of Training Data

The sequence for storing training images is as follows:


Training images originate from the customer's Salesforce Org and may include both idealised sample images and image corrections made by users of the Axsy Mobile App.
All training images are first routed through the Axsy Smart Vision Object Detection Training App, which acts only as an intermediary for uploading them to the specified Google Cloud Storage bucket. Images are held transiently, for no longer than needed to complete a secure transfer to the designated bucket.
Training images are stored securely at rest in the customer's Google Cloud Storage bucket.
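The relay step above can be sketched as follows. This is an illustrative sketch only: the class and function names (`CustomerBucket`, `relay_training_image`) are hypothetical stand-ins, not part of the actual Training App API, and an in-memory dictionary stands in for Google Cloud Storage.

```python
from dataclasses import dataclass, field

@dataclass
class CustomerBucket:
    """Stand-in for the customer-managed Google Cloud Storage bucket."""
    objects: dict = field(default_factory=dict)

    def upload(self, key: str, data: bytes) -> None:
        self.objects[key] = data

def relay_training_image(bucket: CustomerBucket, key: str, image: bytes) -> None:
    """Route one image through the Training App to the customer's bucket.

    The image is held only transiently while the transfer completes;
    the relay keeps no copy at rest.
    """
    bucket.upload(key, image)
    # `image` goes out of scope here: nothing is retained by the relay.

bucket = CustomerBucket()
relay_training_image(bucket, "corrections/img-001.png", b"<png bytes>")
```

The key design point mirrored here is that the intermediary never persists the image itself; the only durable copy lives in the customer-managed bucket.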


How images are stored for future model training



Initiating A Model Training Run

The following steps and accompanying diagram illustrate the process for initiating a model training run by the Axsy Smart Vision Object Detection Training App:


A model training run is triggered from the customer's Salesforce Org, either manually by a Salesforce admin via the Salesforce Web UI, or automatically when thresholds configured in the Org are met (e.g. a given number of end-user corrections).
When training is initiated, the Axsy Smart Vision Object Detection Training App instantiates a Virtual Machine to run the Axsy training code stored in the Axsy repository.
The Axsy training code retrieves the image dataset from the customer's Google Cloud Storage.
Using the retrieved training images, a new model version is trained.
Once training is complete, the newly trained model is securely stored in the customer's Google Cloud Storage bucket.
The Virtual Machine running the training code is deleted, and all temporary data, including training images and models, is securely erased.
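The lifecycle of a single training run can be sketched as below. This is a minimal, illustrative simulation, not the real Vertex AI job: a plain dictionary stands in for the customer's bucket, another for the ephemeral VM's temporary storage, and "training" is reduced to producing an opaque blob.

```python
def run_training_job(bucket: dict, dataset_prefix: str, model_key: str) -> None:
    """One training run on an ephemeral VM (illustrative sketch only)."""
    workspace = {}  # stands in for the VM's temporary disk
    # 1. Retrieve the image dataset from the customer's bucket.
    workspace["images"] = [v for k, v in bucket.items() if k.startswith(dataset_prefix)]
    # 2. Train a new model version (represented here as an opaque blob).
    workspace["model"] = b"trained on %d images" % len(workspace["images"])
    # 3. Store the trained model back in the customer's bucket.
    bucket[model_key] = workspace["model"]
    # 4. Deleting the VM erases all temporary data, images and model alike.
    workspace.clear()

bucket = {"images/a.png": b"...", "images/b.png": b"..."}
run_training_job(bucket, "images/", "models/v2")
```

Note that once the function returns, the only surviving artifacts are those written back to the customer-managed bucket; everything on the "VM" is gone, matching the erasure step above.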


   

Initiation of a model training run



Delivering A Trained Model to the Mobile App

The following steps and accompanying diagram illustrate how a version of a trained model is delivered to the Axsy Mobile App:


The Axsy Mobile App with Smart Vision requests a model from the Salesforce Org.
The Axsy Smart Vision package running in the customer's Salesforce Org requests a secure, short-lived download URL from the Axsy Smart Vision Object Detection Training App and returns it to the Axsy Mobile App. 
The Axsy Mobile App uses the secure URL to retrieve the Smart Vision Object Detection model from the customer's Google Cloud Storage bucket.
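The "secure, short-lived download URL" in step 2 can be illustrated with an HMAC-signed URL carrying an expiry timestamp. This is a conceptual sketch only: the host name and signing key are made up, and a real implementation against Google Cloud Storage would instead use the signed-URL facility of the `google-cloud-storage` client library (e.g. `Blob.generate_signed_url`).

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SIGNING_KEY = b"hypothetical-signing-key"  # a real system would use a service-account key

def make_download_url(bucket: str, object_name: str, ttl_seconds: int = 300) -> str:
    """Build a short-lived, signed download URL (illustrative only)."""
    expires = int(time.time()) + ttl_seconds
    payload = f"{bucket}/{object_name}:{expires}".encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires, "signature": signature})
    return f"https://storage.example.com/{bucket}/{object_name}?{query}"

url = make_download_url("customer-bucket", "models/v2")
```

Because the expiry is part of the signed payload, a tampered or expired URL fails verification, which is what makes such URLs safe to hand to the mobile app for a one-off download.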



   

Delivery of a trained model version to the Axsy Mobile App for use with on-device image recognition



Axsy Smart Vision and the Salesforce Einstein Trust Layer

Axsy Smart Vision is an on-device AI solution: any AI models in use run locally within the Axsy Mobile App. This is unlike Salesforce’s Einstein 1 Platform, which is designed to run in the cloud.


As such, while security and trust are as important to Axsy Smart Vision as they are to the Einstein 1 Platform, how Axsy Smart Vision, as an on-device solution, approaches these concepts may differ from how the Einstein 1 Platform, as a cloud solution, handles them.


Additionally, as Axsy Smart Vision is an image recognition solution as opposed to a generative AI solution like the Einstein 1 Platform, some trust concepts applicable to the Einstein 1 Platform may not apply to Axsy Smart Vision.


NOTE: For more information on the Einstein Trust Layer referenced in this article, please see HERE.



Secure Data Retrieval

Any data input into the Axsy Smart Vision model originates locally on the user’s device and may be a combination of:

  • Salesforce data (e.g. record fields) that have already been synced to the Axsy Mobile App and stored in the app’s local encrypted database.
  • Dynamic data (e.g. Flow) calculated from the data already synced and stored by the Axsy Mobile App. NOTE: The Axsy Mobile App has an offline Flow engine, so execution of Flows also occurs locally and not on the Salesforce Platform in the Cloud.
  • Photos and/or videos that are captured locally, by the user themselves, with their device’s camera.


Unlike with the Einstein 1 Platform, there is no “server-side grounding” as no data is retrieved or processed from off-device or from the Cloud.



Data Masking and Demasking

As the Axsy Smart Vision models run on-device and all data retrieval is local, no Personally Identifiable Information (PII) is sent in transit over a data connection or shared with a third party. Accordingly, as the Axsy Smart Vision model runs locally, there is no need to mask data input to the model and no need to demask its output.



Prompt Defence and Toxicity Detection

The Einstein 1 Platform is focused on generative AI through the use of Large Language Models (LLMs). As such, the Einstein Trust Layer includes prompt defence, to steer the LLM towards generating a desirable outcome, as well as toxicity detection, to ensure the generated output does not include problematic content.


Axsy Smart Vision is a visual AI model rather than a generative AI model. This means that Axsy Smart Vision does not generate content. Rather, it uses image recognition to identify, count and classify objects in input images and videos. As such, prompt defence and toxicity detection, which are only relevant for generative AI, are not in scope for Axsy Smart Vision.



Feedback and Corrections

Axsy Mobile App users are able to provide corrections to what is output by the Axsy Smart Vision model. These corrections are then uploaded to the Salesforce Platform so they can be used in future training runs of the model. 


It is important to note that only corrections made manually and explicitly by the user are synced to the Salesforce Platform, so the user has full control over which corrections are saved. These corrections serve only to improve the model the user has access to; they are not fed into any wider, generally available models.
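The explicit-corrections-only rule can be sketched as a simple filter applied before sync. The function and field names here are hypothetical, invented for illustration; the actual payload format used by the Axsy Mobile App is not documented in this article.

```python
def corrections_to_sync(corrections: list) -> list:
    """Return only the corrections the user explicitly saved.

    Field names are hypothetical; the point is that anything not
    explicitly confirmed by the user is never uploaded.
    """
    return [c for c in corrections if c.get("explicit_user_correction")]

pending = [
    {"label": "shelf-item", "explicit_user_correction": True},
    {"label": "auto-suggestion", "explicit_user_correction": False},
]
```

Filtering on the client side before upload is what gives the user final say over which corrections ever leave the device.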


As with all network traffic between Axsy and the Salesforce Platform, any corrections saved to the Salesforce Platform are sent over an encrypted connection.