3D Convolutional Neural Network — A Guide for Engineers

Artificial Intelligence (AI) and Deep Learning are revolutionizing how we approach engineering problems. The revolution took off in computer vision and speech recognition; now it is spreading to product design.

One of the most exciting advances in Deep Learning is the Convolutional Neural Network (ConvNet). Recent research has opened the way for implementing even a 3D Convolutional Neural Network. The 3D convolutional neural network is a key enabler for the revolution in engineering, empowering product design engineers with high-end simulation capability.

In this article, we will introduce the concept of a 3D Convolutional Neural Network and explore its business relevance. We will start with a case from the automotive industry.

We will also delve into the basic concepts behind this technology in a way that is friendly to readers without an AI background, and then progress towards the most cutting-edge applications. The final aim is to grasp how neural network technology enables AI to "learn" 3D CAD shapes. The objective of the learning process is a predictive model that can predict the engineering behaviour of any product with associated data, such as a car's aerodynamics.

By the end of this article, as an engineer, you will have a fundamental grasp of this revolutionary technology and the basics to position yourself as a thought leader who can positively impact product design in your organization.

What about introducing a 3D convolutional neural network in your business?

Preliminary Case History: Better Competition with 3D Convolutional Neural Network

The following case history is taken from the competitive landscape of Tier 1 suppliers providing automotive components to major automotive manufacturers (OEMs).

This also stems from the author's experience as engineering director of a Tier 1 supplier for four business lines: original equipment for standard passenger cars, luxury cars and trucks, and finally, aftermarket (spare parts).

Everything starts with an RFQ, a Request for Quotation or bid, issued by an OEM to three or more suppliers, who have to compete to win a project for the supply of a product line, from engineering concept to manufacturing.

The experience, confirmed by the trends in the last decades, is that OEMs and suppliers are beyond the old approach of using product simulation to fix issues after they have been encountered. At least, product simulation should support the comparison between different design solutions, i.e., support the exploration of the design space.

However, two major issues remain unsolved in the classic simulation (CAE) approach:

  • how to simulate in seconds or minutes instead of hours or days
  • how to make the simulation technology available to all engineers without heavy infrastructure and staff training expenses

Or, in a single question: how to enable all engineers with interactive design space exploration?

Regarding the business case, an early engineering predictive capability, approachable not only by R&D but also by product design and technical sales teams, would give sales engineers a competitive advantage in becoming the preferred supplier in an RFQ and establishing a long-term relationship with a customer.

We will now review what tools are available to teams to provide concrete evidence, based on sound engineering judgement, that their product is innovative, cost-effective, and compliant with performance targets.

How to Support Product Design

How do you demonstrate that a product complies with targets and constraints before manufacturing?

There are three possible approaches:

  1. The first approach proposes a "copy and paste" of a previous product solution. However, this "no-brainer" approach, recycling the same design at a lower price, isn't sustainable for long since it leads to win-lose situations between OEM and supplier, a lack of confidence in performance targets and poor adaptation to evolving regulations for decarbonization and sustainability.
  2. The second approach involves deploying resources like testing, prototyping, and CAE simulation. Although this seems like the ultimate answer, it's costly and time-consuming. Setting up testing rooms or involving CAD to CAE workflows can take days, weeks, or months. Because competition is fierce, OEMs and suppliers face costly software and hardware maintenance investments and the need for dedicated R&D teams.
  3. The third and most innovative approach is to embrace the AI Deep Learning approach in design. This approach involves recycling previous solutions to create AI predictive models. Organisations adopting this approach can provide technical product capability demonstrations much faster than before, in real-time and with easily deployable data-driven neural network predictive models.

Once a real-time solution is available, the natural next step is to implement it in an iterative design process, thus reaching your product's desired objectives, such as reduced weight, better heat dissipation, and better energy efficiency, faster. The iterative design process can be automated while still operated by humans, or embedded in a generative design approach.

Can My Company Use AI to Predict Designs?

Entering the world of AI-driven simulation for product design is not as complicated as it may seem.

The two key ingredients you need to get started are a business case and data:

  • A business case is simply the need to be competitive and innovative to attract customers or prospects
  • Data are generated whenever your company designs CAD and performs CAE simulation or testing

For instance, you must deliver a product in 8 weeks instead of 6 months. That's your business case. And you have data from the last 2 years, during which you've created 4 product concepts that can be used to train your AI.

In the following, we will go into more technical detail on how prediction arises from data. Regarding practical deployment, NCS, whose technology is based on 3D convolutional neural networks, already empowers CAE engineers and data scientists to build CFD surrogates and deploy them to designers.

Before diving into the article...

In short, deep learning approaches based on 3D convolutional neural networks can improve accuracy and performance compared to traditional models, resulting in a competitive advantage and increased market share for your company.

Please note that all the following technical details aim to satisfy your curiosity about the inner workings of Deep Learning tools such as NCS. Still, we assure you that the practical usage of the NCS platform for AI-assisted engineering prediction requires no prior AI, data science or computer science skills.

In the next chapters, we will first review how a 3D convolutional neural network works, exploring such topics as classification and regression (two forms of prediction), and then move on to 3D convolutional neural network architecture with more details on its building blocks, such as artificial neurons and filters.

Learning - in Humans and Artificial Agents

We will first give an overview of the AI training process, a key passage towards an actionable AI for designers.

There are three main types of AI learning. We will focus on supervised training, i.e. training with labelled data to predict labels for new examples. Two other approaches are unsupervised training (finding patterns in unlabeled data), and reinforcement training (training through interaction with an environment).

In supervised training, a model is trained to perform a task based on labelled examples. This training process can occur with both humans and artificial intelligence (AI) systems. The differences between human and artificial training can provide insight into the strengths and limitations of AI. Let's use image classification to compare the two: in this task, the deep learning model is trained to identify the class of an image (e.g. dogs, cats, etc.) based on labelled examples of images. What about humans?

Training Humans

In the case of humans, supervised training occurs through experience and observation. A human might learn to recognise the different classes of objects in images through repeated exposure to labelled examples. Of course, for us, the process starts with real-life objects and verbal interaction at a very early age. This process is time-consuming and requires a lot of effort and memory. However, we humans have the advantage of understanding the context of the image and using prior knowledge and experience to recognise objects in new and unseen images.

Introduction to Deep Learning Training

Deep learning systems can perform tasks much faster and more efficiently than humans. However, they are trained to carry out specific tasks without truly understanding the meaning or context behind their actions. We aim to automate engineering tasks that can be easily translated into mathematics and extract data from past experiences, such as design, laboratory testing, and CAE simulations.

It's important to recognize that deep learning techniques have certain limitations. Unlike humans, these systems do not understand the context and meaning of images. Instead, they use the features and characteristics they've learned from labelled examples to make predictions. This means they may not generalize well to new or unseen images.

Technical Aspects of Deep Learning Training

Deep learning systems can be trained on large datasets of labelled images to learn the features of each object class quickly. The training process involves adjusting mathematical connections, called "weights," in the model (a neural network). This minimizes the error between predicted and actual class labels using algorithms such as gradient descent, enabling the deep learning system to learn much faster than a human could.

Predicting with Deep Learning

Deep learning systems can make predictions much faster than humans, achieving superhuman performance. For instance, they can process images and make real-time classification predictions, while humans would take much longer to identify the image class.

The speed advantage is particularly evident with complex industrial designs. Predicting the temperature on the surface of a complex industrial geometry given its operating conditions is precisely the type of data-driven engineering prediction NCS specializes in. Such predictions are known as regression.

Deep Learning Prediction Speed - Regression Speed

Deep learning's speed advantage is even greater when the challenge is not classification but regression, i.e. predicting a number, as in the following question: what is the aerodynamic coefficient of this car, given the image of the car?

What is the aerodynamic performance of this car, i.e. what are the values of pressure and velocity in the space around the car and on its body, used to compute, for example, drag (resistance) and downforce? The answer can be given by either 1) software with a reasoning close to the human brain, i.e. focusing on the context of fluid mechanics, physics of turbulence and car body-air interaction; or 2) data-driven AI, trained with data provided by 1), which then predicts by associating geometrical features with numerical values at each point in space (by regression). (Image source: Audi)

Here, deep learning can outperform both humans and physics-based software coded by humans. Deep learning predictions can be several orders of magnitude (from 1,000 to 1,000,000 times) faster than traditional computational software such as CAE, which is based on physical knowledge (again, having context and meaning!) of aerodynamics and industrial geometries such as cars.

What Are Classification and Regression?

Classification and regression are two common types of supervised machine learning. More precisely: in a classification task, the goal is to predict a categorical label for a given input data. In a regression task, the goal is to predict a continuous value for a given input.

Practical Examples: Cats Vs Dogs & House Prices

Let's start with a simple example to understand the difference between classification and regression.

Imagine you have a dataset (a collection of data) of images of cats and dogs, and you want to train a machine learning model to predict whether each image is a cat or a dog. This is a classic classification task where the output is a categorical label ("cat" or "dog"). In this example, the input data is the image, and the output is a binary label ("0" for cat, "1" for dog). The model would learn to make predictions based on the patterns in the input. For example, it might learn that images with pointed ears and long whiskers are more likely to be cats.

Now let's consider a different example. Imagine you have a dataset of housing prices in a city, and you want to train a machine learning model to predict the price of a house given its size, number of bedrooms, and number of bathrooms. This is a classic regression task where the output is a continuous value (the house price). In this example, the input is the size, number of bedrooms, and number of bathrooms, and the output is the price of the house. The model would learn to make predictions based on the patterns in the input. For example, it might learn that houses with larger sizes, more bedrooms, and more bathrooms tend to have higher prices.

Classification and Regression in Machine Learning

Classification is a type of supervised learning that aims to predict a categorical label for a given input. This category label can have multiple values, but each sample in the dataset belongs to only one category. For example, in a binary classification problem, the category label has two values: cat or dog. In a multi-class classification problem, the category label has more than two values, such as red, green, and blue. In a classification task, a model is trained on a labelled dataset, where each sample is a pair of input data and a categorical label. The training goal is to learn the mapping from the input to the category label.

Various metrics can be used to evaluate a classification model's performance. Accuracy measures the fraction of correctly classified samples, while precision measures the fraction of samples correctly classified as positive among all samples classified as positive. Recall measures the fraction of positive samples that are correctly classified, and the F1 score is the harmonic mean of precision and recall.

A confusion matrix is a tool used to measure a classification model's performance. It is a table that summarises the number of true positive, false positive, true negative, and false negative predictions a classifier makes. It compares a given dataset's predicted and true labels and can help identify which classes the classifier performs well on and which classes it struggles with. The diagonal elements of the matrix represent the number of correct predictions for each class, while the off-diagonal elements represent the number of incorrect predictions. A confusion matrix can be used to calculate performance metrics such as accuracy, precision, recall, and F1 score.
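
The metrics described above can be sketched in a few lines of Python. This is only an illustration for the binary case: the function names and the toy labels (0 = cat, 1 = dog) are assumptions made for the example, not part of any specific tool.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1          # predicted positive, really positive
        elif pred == positive:
            fp += 1          # predicted positive, really negative
        elif truth == positive:
            fn += 1          # predicted negative, really positive
        else:
            tn += 1          # predicted negative, really negative
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Derive accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels for eight images: 0 = cat, 1 = dog
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

With these toy labels the classifier makes one false positive and one false negative, so all four metrics come out at 0.75.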

Regression is a type of supervised learning where we aim to predict a continuous value for a given input. In a regression task, a model is trained on a labelled dataset, where each sample is a pair of input data and a continuous label. The training aims to build a model that can map from an input to a continuous label. Unlike classification tasks, where the labels are discrete categories, regression seeks to predict a numerical outcome for a given input.

To evaluate a regression model's performance, metrics such as mean squared error (MSE), mean absolute error (MAE), and R-squared (R²) can be used.

MSE measures the average squared difference between predicted and true values, while MAE measures the average absolute difference between predicted and true values.

R-squared measures the proportion of variance in the dependent variable explained by the independent variables in the regression model. A high R-squared value does not necessarily indicate a good model, as other factors, such as overfitting and poor predictive ability, should also be considered. R-squared should be used with other metrics, such as residual plots, to comprehensively evaluate a regression model's performance.
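
As a rough illustration of the three regression metrics, the sketch below computes them directly from their definitions. The function names and the four sample values are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R²: 1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical true and predicted values (e.g. prices in arbitrary units)
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

print(mse(y_true, y_pred))        # 0.125
print(mae(y_true, y_pred))        # 0.25
print(r_squared(y_true, y_pred))  # 0.9
```

Note how MSE penalises the two errors of 0.5 less than MAE does here, because squaring shrinks errors smaller than 1; for large errors the opposite holds.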

Deep Learning and 3D CNN Architecture

A 3D Convolutional Neural Network (3D CNN) refers to neural network architectures with multiple layers that can learn hierarchical data representations. Each layer learns increasingly complex spatial features of data. These representations can then be used for various tasks such as classification, regression, or generation. Each type of layer has a definite function.

A ConvNet architecture. (Image source: Sefik Ilkin Serengil, "Handwritten Digit Recognition Using CNN with Keras", https://sefiks.com/2017/11/05/handwritten-digit-recognition-using-cnn-with-keras/)

For example, the max-pooling layer is a "downsampler" for feature maps: it reduces their spatial size and hence the computational burden. A feature map is generated by convolving a filter over an image. Another form of pooling is global average pooling.

Intuitive illustration of the workings in a max pooling layer; only maximum values are kept in each area (source: https://deepai.org/)

A neural network includes interconnected processing nodes - artificial neurons. Let's have a deeper look at the connection between neurons and learning. How can they possibly learn? The key to the approach shown here (supervised learning) is to have examples to learn from.

In deep learning, neurons are organised into layers (this is the "depth" of deep learning).

What is the role of the layers?

Neurons in the first layer process the input, and the output from one layer is used as input for the next layer, and so on until the output from the final layer represents the prediction.

Furthermore, each layer is trained to learn a different data representation. For example, in image classification, the first layer may learn to recognise edges and basic shapes, and the next layer may learn to recognise more complex shapes, and so on. By the time the signal reaches the final layer, the input has been transformed into a high-level representation that is used to make a prediction or classify images.

Neurons' weights in deep neural networks are trained using backpropagation. Backpropagation updates the weights based on the error between the predicted output and the ground truth label. The error is propagated backwards (hence the name of backpropagation!) through the network, and the weights are adjusted to reduce the error. This process is repeated until the error reaches an acceptable level.

If ŷ is the prediction for a given input x, the error between the predicted output and the true label y is expressed by formulas such as E=(ŷ-y)². A training process aims to minimise this error by adjusting weights and biases in the network. The weights are optimized with algorithms such as gradient descent, adjusting the weights in the direction of the negative gradient of the error E.
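
To make the backward propagation of the error concrete, here is a minimal sketch for a deliberately tiny network with one hidden neuron and two scalar weights, using the error E=(ŷ-y)² from above. The network shape, the sigmoid hidden activation, the function names, and all numerical values are assumptions chosen for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    """Forward pass: input -> hidden neuron (sigmoid) -> linear output."""
    h = sigmoid(w1 * x)
    y_hat = w2 * h
    return h, y_hat


def error(x, y, w1, w2):
    """Squared error E = (y_hat - y)^2 for one example."""
    _, y_hat = forward(x, w1, w2)
    return (y_hat - y) ** 2


def backprop(x, y, w1, w2):
    """Apply the chain rule from the output back to each weight."""
    h, y_hat = forward(x, w1, w2)
    dE_dyhat = 2.0 * (y_hat - y)          # derivative of the error w.r.t. output
    dE_dw2 = dE_dyhat * h                 # output weight
    dE_dh = dE_dyhat * w2                 # error propagated back to hidden neuron
    dE_dw1 = dE_dh * h * (1.0 - h) * x    # through the sigmoid, to the first weight
    return dE_dw1, dE_dw2


def gradient_step(x, y, w1, w2, learning_rate=0.1):
    """One gradient descent update in the direction of the negative gradient."""
    g1, g2 = backprop(x, y, w1, w2)
    return w1 - learning_rate * g1, w2 - learning_rate * g2


# Hypothetical example and starting weights
x, y = 1.5, 0.8
w1, w2 = 0.5, -0.3
error_before = error(x, y, w1, w2)
new_w1, new_w2 = gradient_step(x, y, w1, w2)
error_after = error(x, y, new_w1, new_w2)   # smaller than error_before
```

Repeating gradient_step many times is the training loop in miniature; real networks do the same thing with millions of weights at once.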

The "learning" in "deep learning" aims to capture high-level input representations that can be used for various tasks such as classification or regression.

3D Convolutional Neural Networks: Classification

A 3D Convolutional Neural Network is a deep learning model used in various applications, such as computer vision or medical imaging.

In these cases, we want AI (deep learning) to learn how to react to inputs rather than programming the AI according to a predetermined pattern. The outcome of this learning process is a predictive model. The predictive models emerging from the 3D convolutional neural network framework are designed to process and analyse data with a third dimension, whether temporal (such as videos) or spatial (such as volumetric scans). For example, 3D convolutional neural networks can be used for processing 3D point cloud data from LiDAR sensors for object detection and semantic segmentation tasks in robotics and autonomous vehicles.

The 3D CNN can learn spatial relationships within the data and extract features that can be used for tasks such as classification or segmentation. In semantic segmentation, the goal is to assign a semantic label, such as "road", "building", "vehicle", etc., to each element of the input (a pixel in an image, or a point or voxel in a 3D scene), providing a detailed understanding of spatial information in the scene. A recent innovation is using 3D convolutional neural networks to learn how simulation software associates accurate engineering predictions with CAD designs, including predictive uncertainty.

3D Convolutional Neural Networks: Regression

Earlier we mentioned using a 3D CNN to learn how simulation software associates accurate engineering predictions with CAD designs.

If the 3D ConvNet is trained on CAD geometries associated with CFD results, it will perform a regression rather than a classification task. In this scenario, the input to the network would be the CAD geometry, and the output would be the corresponding CFD result.

The 3D ConvNet would learn to map the input CAD geometry to the output CFD result based on the data for training. This could be useful in problems where the goal is to find the optimal CAD geometry, where performance comparison is carried out by CFD or its ConvNet surrogate.

The CFD surrogate works by extracting features from the input CAD geometry using 3D convolutions and processing these features through multiple layers to produce the final output. The network weights are updated during training to minimise the difference between the predicted CFD results and the ground truth CFD results.

Thus, it becomes possible to envisage (and reach) very ambitious objectives, such as coupling the neural network regression capability to a geometrical shape to optimize, leading to generative AI engineering (creating new shapes within the AI software without the need for third-party tools).

Training the Neural Network

During training, the network weights are updated to minimise the difference between the predicted CFD results and the ground truth CFD results. This is done using a loss function, which measures the error between the two.

There are several loss functions; a common one used in regression tasks is the mean squared error (MSE) loss. The loss measures the average squared difference between the predicted CFD results and the ground truth CFD results. The goal of training is to minimise this value, which is expressed as Loss = (1/N) * Σᵢ (ŷᵢ - yᵍᵗᵢ)² where N is the number of samples in the training set, ŷᵢ is the predicted CFD result for the iᵗʰ sample, and yᵍᵗᵢ is simply the ground truth ("gt") CFD result for the iᵗʰ sample. The summation runs over the dummy index i for samples from 1 to N.

Whatever the exact mathematical form, the network weights are updated using an optimisation algorithm such as stochastic gradient descent (SGD) to minimise the loss.

The loss gradient is computed using backpropagation, which allows the network to update the weights in the direction that reduces the loss. The weight update rule for SGD is given by: w' = w - η * ∂Loss/∂w where w is the weight, η is the learning rate, and ∂Loss/∂w is the gradient of the loss with respect to the weight. The learning rate η determines the step size at which the weights are updated, and it is a hyperparameter that must be chosen carefully:

  • a high learning rate can cause the network to converge quickly, but it may overshoot the optimal solution or even diverge
  • a low learning rate can provide a more accurate solution but may take longer to converge.
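
The effect of the learning rate can be seen on a toy problem. The sketch below fits a one-weight model ŷ = w·x by minimising the MSE loss with the update rule given above. For reproducibility it uses the full training set at every step (plain gradient descent) rather than the random sampling of true SGD; the data, function names, and the three learning rates are hypothetical.

```python
def mse_loss(w, xs, ys):
    """Mean squared error of the model y_hat = w * x over the dataset."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradient(w, xs, ys):
    """Derivative of the MSE loss with respect to the weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, learning_rate, steps=20, w=0.0):
    """Repeat the update w' = w - learning_rate * gradient."""
    for _ in range(steps):
        w = w - learning_rate * mse_gradient(w, xs, ys)
    return w


# Hypothetical data generated by the exact relation y = 2 * x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

initial_loss = mse_loss(0.0, xs, ys)
loss_by_rate = {
    rate: mse_loss(train(xs, ys, rate), xs, ys)
    for rate in (0.01, 0.05, 0.3)
}
```

After 20 steps, the rate 0.05 has essentially reached the optimum w = 2, the rate 0.01 is still on its way there, and the rate 0.3 overshoots at every step so that the loss ends up larger than where it started.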

Disambiguation: Recurrent Neural Network

Different types of neural networks are used for different purposes in machine learning. Here, for disambiguation purposes only, we will open and close a small chapter on the concept of a recurrent neural network (RNN).

An RNN is a neural network that processes sequential data, such as time series or natural language. RNNs are characterized by their ability to maintain an internal state or memory, which allows them to process sequences of data of varying lengths and make use of information from previous time steps in the sequence. RNNs are highly effective in speech recognition, machine translation, and text generation tasks.

The Building Brick: The Artificial Neuron

The Artificial Neuron is the basic building block, a sort of atom, of Artificial Neural Networks (ANNs), which are modelled after the structure and function of biological neurons in the human brain. ANNs are defined as "computational models that simulate how the human brain processes information".

It is better to restrict this ambitious definition to a more operative one: a useful ANN for engineers is a computational model that reproduces some specific functions of the brain, or of software produced by humans and used in engineering, to assist humans with superhuman speed.

How the Human Brain Processes Information: Mathematical Model for the Single Physical Neuron

The human brain is complex and processes vast amounts of information, thus enabling us to interact with our surroundings. Neurons, as specialised cells that transmit information, play a crucial role in this process. Neurons receive and process sensory information and send signals to other neurons and muscles through electrical and chemical signals. Networks of interconnected neurons form to process information in parallel, with the strength of the connection between neurons affecting their likelihood of firing together.

Mathematical models of neurons and neural networks have been developed to understand how the brain processes information. These models can predict neuron response to stimuli and show relationships between neurons. A simple mathematical model of a neuron is an artificial neuron consisting of inputs, weights, and an activation function that converts inputs into output.

Neuron: typical structure (source: https://neuroscientificallychallenged.com/glossary/neuron)

Enter the Artificial Neuron

For our engineering application purposes, the basic idea behind artificial neurons is to start from an idealised model of a biological neuron and use that as a building block for more complex models that reproduce specific functions of the human brain or human artefacts such as simulation software. Artificial neurons (or "perceptrons") are mathematical functions that process input signals and generate output signals. It is important to underline here that we will not seek to reproduce a biological neuron mathematically but rather use it as a starting point for an artificial learning process.

From the biological to the artificial neuron. (Image source: Sefik Ilkin Serengil, "Introduction to Neural Networks: Taking Lessons From The Past", https://sefiks.com/2017/01/15/introduction-to-neural-networks-a-mechanism-taking-lessons-from-the-past/)

Building Blocks of an Artificial Neuron

Artificial Neurons have three components: inputs x, weights w, and an activation function f.

The inputs x are the signals the neuron receives from other neurons in the network. The weights w are values that determine the strength of the input signals. The activation function f is a mathematical operation that determines whether the neuron will fire, or generate an output signal y, based on the input signals x and weights w. So we imagine something like y=f(x, w, b) where the bias term b could represent other parameters.

The inputs to an artificial neuron are multiplied by their corresponding weights, and the results are then summed.

The weighted sum can be written as z = wx + b, where "b" is the bias term and "z" is the resulting sum of all weighted inputs plus the bias. The neuron output is then y = f(z).

This weighted sum is passed through an activation function f, as mentioned above, which determines whether the neuron will generate an output signal. This function is usually non-linear, such as a sigmoid or rectified linear unit (ReLU), allowing the network to model complex relationships between inputs and outputs.

The function can be a sigmoid function: f(z) = 1 / (1 + e⁻ᶻ), or a rectified linear unit (ReLU) function: f(z) = max(0, z).
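
Putting the pieces together, a single artificial neuron fits in a few lines. This is a minimal sketch: the function names and the example inputs, weights, and bias are hypothetical.

```python
import math


def sigmoid(z):
    """f(z) = 1 / (1 + e^(-z)): squashes any value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """f(z) = max(0, z): passes positive values, blocks negative ones."""
    return max(0.0, z)


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Hypothetical inputs and parameters: z = 0.5*1.0 + (-1.0)*2.0 + 0.5 = -1.0
inputs = [1.0, 2.0]
weights = [0.5, -1.0]
bias = 0.5

print(neuron(inputs, weights, bias, relu))     # 0.0 (the neuron does not fire)
print(neuron(inputs, weights, bias, sigmoid))  # about 0.269
```

The same weighted sum gives different outputs depending on the activation function, which is why the choice of f is part of the network design.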

From a Single Neuron to a Neural Network

Artificial Neurons are the building blocks of ANNs and can be connected to form more complex models. For example, multiple artificial neurons can be connected to form a layer of a neural network, and multiple layers can be stacked to form a deep neural network. In a neural network, each neuron receives input from multiple neurons in the previous layer, and the output from each neuron in one layer is used as input to multiple neurons in the next layer.
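
The step from a single neuron to a network can be sketched as follows: a layer is a list of neurons sharing the same inputs, and a network is a list of layers applied one after another. The layer sizes, weights, and function names below are illustrative assumptions.

```python
def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense_layer(inputs, weights, biases, activation):
    """One layer: every neuron (one weight row each) sees all the inputs."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the output of each layer into the next one."""
    signal = inputs
    for weights, biases, activation in layers:
        signal = dense_layer(signal, weights, biases, activation)
    return signal


# Hypothetical network: 2 inputs -> 2 hidden neurons (ReLU) -> 1 output
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, 2.0]], [0.5], identity),
]

output = forward([2.0, 1.0], layers)
print(output)  # [4.5]
```

Here the hidden layer produces [1.0, 1.5], and the output neuron combines them as 1.0*1.0 + 2.0*1.5 + 0.5 = 4.5.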

How Does a 3D Convolutional Neural Network Work?

A 3D convolutional neural network is based on the concept of convolutional neural networks (CNNs) but with the addition of a third dimension, which can be temporal (as in video) or spatial (as in volumetric data).

In a traditional 2D CNN, the input is a single image, so multiple image frames are processed and analysed independently of one another. In a 3D convolutional neural network applied to video, the input includes both spatial and temporal discretisation, making it possible to analyze the relationship between frames in time.

The 3D convolutional neural network architecture comprises several layers, including an input layer, multiple convolutional layers, activation functions, max-pooling layers, and a final classification layer.

The convolutional layers learn and extract spatial and temporal features from the input. Activation functions introduce non-linearity into the model, allowing it to model complex relationships between the input and the target output.
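
As a rough sketch of what a 3D convolutional layer computes, the code below slides a small 3D filter over a volume indexed as [frame][row][column]. Like most deep learning libraries, it does not flip the filter, and it uses no padding. The function name, the toy volume, and the hand-written "temporal difference" filter are assumptions for illustration; in a trained network the filter values would be learned.

```python
def conv3d_valid(volume, kernel):
    """Slide a 3D filter over a 3D volume (no padding, stride 1)."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    output = []
    for d in range(depth - kd + 1):
        plane = []
        for i in range(height - kh + 1):
            row = []
            for j in range(width - kw + 1):
                total = 0.0
                # Weighted sum over the small 3D neighbourhood
                for a in range(kd):
                    for b in range(kh):
                        for c in range(kw):
                            total += volume[d + a][i + b][j + c] * kernel[a][b][c]
                row.append(total)
            plane.append(row)
        output.append(plane)
    return output


# Hypothetical "video": 3 frames of 3x3 pixels, brightness equal to frame index
volume = [[[float(d)] * 3 for _ in range(3)] for d in range(3)]

# Filter of shape 2x1x1: next frame minus current frame
kernel = [[[-1.0]], [[1.0]]]

features = conv3d_valid(volume, kernel)
```

The result has shape 2x3x3 and every value is 1.0, because each frame is exactly one unit brighter than the previous one. A filter extending over rows and columns as well would respond to spatial and temporal patterns at the same time.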

Feature Maps

Feature maps are arrays of values representing a certain feature in an image, such as an edge or a texture. The map is generated by applying convolutional filters to an input image. The filters scan the image and detect patterns at different scales, producing maps.
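
A feature map can be illustrated with a hand-made filter. In the sketch below, a 1x2 filter that subtracts each pixel from its right-hand neighbour is scanned over a small image whose left half is dark and right half is bright. The image, the filter, and the function name are hypothetical; a trained network would learn its filter values rather than have them written by hand.

```python
def conv2d_valid(image, kernel):
    """Scan a 2D filter over an image (no padding, stride 1)."""
    rows, cols = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    feature_map = []
    for i in range(rows - kh + 1):
        out_row = []
        for j in range(cols - kw + 1):
            total = 0.0
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            out_row.append(total)
        feature_map.append(out_row)
    return feature_map


# Hypothetical 4x4 image: dark (0) on the left, bright (1) on the right
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]

# Vertical-edge filter: right neighbour minus current pixel
kernel = [[-1.0, 1.0]]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)  # [0.0, 1.0, 0.0]
```

The feature map is non-zero only in the middle column, exactly where the dark-to-bright edge sits: the filter has "detected" a vertical edge.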

Max-Pooling

"Max-pooling" reduces the size of the map by dividing it into non-overlapping regions (often squares) and taking the maximum value in each region. Max-pooling preserves important features in the map by retaining the maximum activation value within each region while reducing the size of the feature map, making the network more computationally efficient. Additionally, reducing the feature map's size helps the network to be more robust to small translations in the input image, as the max-pooling operation is largely insensitive to small translations.
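
A minimal sketch of max-pooling over non-overlapping square regions is shown below; the function name and the 4x4 feature map values are made up for the example.

```python
def max_pool2d(feature_map, size=2):
    """Keep only the maximum of each non-overlapping size x size region."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, rows - size + 1, size):
        pooled_row = []
        for j in range(0, cols - size + 1, size):
            window = [
                feature_map[i + a][j + b]
                for a in range(size)
                for b in range(size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 feature map
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [2, 6, 3, 1],
]

print(max_pool2d(feature_map))  # [[4, 5], [6, 7]]
```

The 4x4 map shrinks to 2x2, a fourfold reduction in values to process, while the strongest activation of each region survives.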

Classification Layer

The classification layer typically comprises one or more fully connected (dense) layers, in which each neuron is connected to every neuron in the previous layer. The predictions made by the classification layer are based on the learned features from the previous layers. These features can include edges, textures, and shapes. Thus, classification layers are a crucial component of a Convolutional Neural Network to make predictions based on the features learned by the network.
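
As an illustration, the sketch below implements a single fully connected layer followed by a softmax, a function commonly used to turn raw scores into class probabilities. Softmax is not mentioned in the text above and is introduced here as an assumption; the feature values, weights, class names, and function names are likewise hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids numerical overflow
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, biases, class_names):
    """Fully connected layer (one weight row per class) followed by softmax."""
    scores = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    probabilities = softmax(scores)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_names[best], probabilities


# Hypothetical learned features coming from the previous layers
features = [1.0, 0.0, 2.0]
weights = [
    [1.0, 0.0, 0.5],   # weights of the "cat" output neuron
    [0.0, 1.0, -0.5],  # weights of the "dog" output neuron
]
biases = [0.0, 0.0]

label, probabilities = classify(features, weights, biases, ["cat", "dog"])
print(label)  # cat
```

The "cat" neuron scores 2.0 and the "dog" neuron -1.0, so after softmax the network assigns roughly 95% probability to "cat".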

Convolution in Detail

Convolution is a mathematical operation that allows information from one function to be transformed and combined with another. It is widely used in image processing, signal processing, and machine learning. Convolution can be represented mathematically using convolutional operators, which are functions that define how the input and output channels are related.

The Mathematics of Convolution

The convolution operation between signals f and g can be expressed mathematically as

(f * g)(t) = ∫ f(τ) g(t - τ) dτ

where f and g are the input signals, t is time, ∫ denotes integration over the "dummy" variable τ, and "*" denotes the convolution operator. The convolution operation "*" produces an output signal, a transformed version of the input signals.
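
For sampled signals, the integral becomes a sum over the index τ. The following sketch implements that sum directly and cross-checks it against NumPy's `np.convolve`; the function name `convolve_1d` and the sample values are assumptions for the example.

```python
import numpy as np


def convolve_1d(f, g):
    """Discrete counterpart of (f * g)(t): sum over tau of f[tau] * g[t - tau]."""
    n = len(f) + len(g) - 1
    out = [0.0] * n
    for t in range(n):
        for tau in range(len(f)):
            if 0 <= t - tau < len(g):
                out[t] += f[tau] * g[t - tau]
    return out


f = [1.0, 2.0, 3.0]
g = [0.0, 1.0, 0.5]

result = convolve_1d(f, g)
print(result)             # [0.0, 1.0, 2.5, 4.0, 1.5]
print(np.convolve(f, g))  # same values from NumPy's "full" convolution
```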

The input signals can be considered image pixels in image processing, and the convolutional operator can be seen as a filter. The filter is applied to each image region, resulting in a transformed version highlighting certain image features. This process can be repeated with different convolution layer filters to extract different image features.
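
The same sliding-filter idea carries over to a third axis. As a sketch, assuming the 3D shape is stored as a voxel grid, the code below applies a 2x2x2 averaging filter to a small volume. The function name `conv3d`, the volume, and the filter are illustrative; as in most deep learning frameworks the filter is not flipped, which makes no difference for this symmetric filter.

```python
import numpy as np


def conv3d(volume, kernel):
    """Slide a 3D filter over a voxel grid (stride 1, no padding)."""
    kd, kh, kw = kernel.shape
    d, h, w = volume.shape
    out = np.zeros((d - kd + 1, h - kh + 1, w - kw + 1))
    for z in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                patch = volume[z:z + kd, y:y + kh, x:x + kw]
                out[z, y, x] = np.sum(patch * kernel)
    return out


# 4x4x4 voxel grid containing a solid 2x2x2 cube in its centre.
volume = np.zeros((4, 4, 4))
volume[1:3, 1:3, 1:3] = 1.0

# Averaging filter: the response is the fraction of filled voxels under it.
kernel = np.ones((2, 2, 2)) / 8.0

response = conv3d(volume, kernel)
print(response.shape)     # (3, 3, 3)
print(response[1, 1, 1])  # 1.0 where the filter sits exactly on the cube
print(response[0, 0, 0])  # 0.125 where it touches a single corner voxel
```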

Convolution in image processing and machine learning has numerous benefits over previous approaches. One of the main ones is its ability to extract features automatically from the input. In traditional image processing techniques, feature extraction was performed manually, which was time-consuming and labour-intensive. Conversely, convolution automates the feature extraction, reducing the time and effort required.

Another benefit of convolution is its ability to preserve the input spatial structure. Convolution is applied locally, meaning it only considers information from a small input region. This allows the spatial input structure to be preserved, which is important for object recognition and image classification tasks. Convolution also can learn patterns in the input. Convolutional layers in deep neural networks are used to learn the most relevant input features. The filters in the convolution layers are adjusted during the training process to maximise the model performance.

Finally, convolution is computationally efficient on modern hardware, making it well-suited for real-time applications.

In summary: convolution is a powerful mathematical operation with practical benefits for image processing because of its ability to extract features, preserve the data spatial structure, learn patterns, and be computationally efficient.

gCNN - What is it?

A geodesic 3D convolutional neural network (gCNN) is a 3D convolutional neural network that uses geodesic convolutions instead of traditional 3D convolutions. Geodesic convolutions take into account the intrinsic geometry of the processed data, for instance the curvature of a surface or the shape of an object with bumps. gCNNs are often used in medical applications: for example, organs, bones and tissues often have complex shapes that can be better captured and analyzed using geodesic convolutions.

The advantages are twofold:

  1. Geodesic convolutions can better analyse the data's intrinsic geometry and improve the model accuracy. 
  2. Geodesic convolutions can be more computationally efficient than traditional convolutions, as they only consider the most important information in the data. The reduced number of model parameters, the reduced number of computations required for each convolution operation, and the reduced amount of data to be processed all contribute to efficiency.

More on gCNNs

Traditional convolutional neural networks (CNNs) use a fixed grid structure to process the input, resulting in a loss of information about the intrinsic data geometry. On the other hand, geodesic convolutions use a flexible grid structure adapted to the data's geometry, allowing a more accurate representation of the data's intrinsic geometry. This is done by defining the grid structure in terms of geodesic distances, rather than Euclidean distances, between the data points. The geodesic distance between two points in the data can be computed using various techniques, including heat diffusion, shortest path algorithms, or curvature-aware algorithms. The choice of technique will depend on the specific task and the type of processed data.
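
Of the techniques listed, the shortest-path approach is the simplest to sketch. The code below approximates geodesic distances by running Dijkstra's algorithm along the edges connecting the data points, and compares the result with the straight-line distance. The function name `geodesic_distances` and the tiny U-shaped point set are assumptions for illustration; a real mesh would have many more points and edges.

```python
import heapq
import math


def geodesic_distances(points, edges, source):
    """Approximate geodesic distances from `source` with Dijkstra's algorithm."""
    graph = {i: [] for i in range(len(points))}
    for a, b in edges:
        length = math.dist(points[a], points[b])  # length of one edge
        graph[a].append((b, length))
        graph[b].append((a, length))

    dist = {i: math.inf for i in graph}
    dist[source] = 0.0
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry
        for neighbour, length in graph[node]:
            candidate = d + length
            if candidate < dist[neighbour]:
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist


# Four points along a U-shaped strip: the two ends are close in space,
# but far apart when travelling along the shape.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
edges = [(0, 1), (1, 2), (2, 3)]

geodesic = geodesic_distances(points, edges, source=0)[3]
euclidean = math.dist(points[0], points[3])
print(geodesic, euclidean)  # 3.0 along the shape versus 1.0 in a straight line
```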

Being Euclidean or Not Euclidean?

Euclidean and non-Euclidean distances are the two kinds of distance measure used in CNNs. Euclidean distance is used in traditional convolutions, while non-Euclidean distance measures are used in geodesic convolutions to better capture the intrinsic geometry of the processed data:

Euclidean distance measures the straight-line distance between two points in space. In the context of CNNs, Euclidean distance is often used to define the distance between the filter and the input in a traditional convolution operation. In this case, the filter is moved across the input in a grid-like structure, with the distance between the filter and the input defined by the Euclidean distance between the two points.

Non-Euclidean distance, on the other hand, refers to any distance measure that is not the Euclidean distance. In the context of CNNs, non-Euclidean distance measures can define the distance between the filter and the input in a geodesic convolution operation. In this case, the grid structure used to move the filter across the input is adapted to the intrinsic geometry of the data. The non-Euclidean distance measure defines the distance between the filter and the input.

Let's Go Back to Cats - Is a Cat Euclidean?

A Euclidean manifold is a mathematical space with a notion of distance and a set of coordinates.

In the case of a 2D picture of a cat (not the cat itself!), the pixels could be thought of as points in the picture's plane. The relationships between the pixels can be captured using a Euclidean distance. In fact, the Euclidean distance between two pixels (x1, y1) and (x2, y2) in a 2D image can be represented as √((x2 - x1)² + (y2 - y1)²). This equation captures the straight-line distance between the two pixels in a 2D plane, where (x1, y1) and (x2, y2) are the pixel coordinates. By computing the Euclidean distances between all pairs of pixels in an image, the relationships between the pixels can be captured and used for various image-processing tasks, such as image segmentation, feature extraction, and clustering.
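
As a small worked example, the sketch below computes the straight-line distance for every pair of pixels in a short list. The helper name `pixel_distance` and the pixel coordinates are assumptions chosen so that the results are round numbers.

```python
import math
from itertools import combinations


def pixel_distance(p, q):
    """Euclidean distance between pixels p = (x1, y1) and q = (x2, y2)."""
    (x1, y1), (x2, y2) = p, q
    return math.hypot(x2 - x1, y2 - y1)


pixels = [(0, 0), (3, 4), (6, 8)]
pairwise = {(p, q): pixel_distance(p, q) for p, q in combinations(pixels, 2)}
for (p, q), d in pairwise.items():
    print(p, q, d)
```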

However, this is a simplified representation of the image data, as the actual pixel values are discrete rather than continuous. The relationships between the pixels are not perfectly smooth or continuous. This representation is enough for many image-processing tasks, allowing standard CNNs, which work with Euclidean spaces, to process the image by default. 

3D Convolutional Neural Network Applications - General

We can think of many applications for a 3D convolutional neural network, including:

Medical Imaging

Medical imaging refers to various techniques used to visualize the human body for clinical purposes. Medical imaging developed during the last century includes a variety of modalities, such as computed tomography (CT), magnetic resonance imaging (MRI), ultrasound, and X-ray imaging.

All these imaging techniques were, per se, revolutions in the field of medicine, enabling doctors and clinicians to non-invasively view internal structures and detect abnormalities that might otherwise go undetected.

However, interpreting medical images can be challenging and time-consuming, requiring expert knowledge. This is where the use of advanced machine learning techniques, such as 3D CNNs, can greatly improve the accuracy and efficiency of diagnosis. A 3D CNN can learn to identify patterns and structures indicative of certain diseases using multiple layers of convolutional filters to extract image features.

For example, a 3D CNN could be trained on a large dataset of MRI scans to identify brain patterns associated with Alzheimer's disease. Once trained, the network could automatically analyze new MRI scans and provide an accurate diagnosis in a fraction of the time it would take a human expert to do the same.

Furthermore, the use of 3D CNNs can also provide insights into the structure and function of the human body that might not be apparent from 2D images alone. By analyzing multiple slices of a 3D image, a CNN can construct a complete representation of the internal structures and their relationships to each other.

In conclusion, medical imaging is a crucial tool for modern medicine, and using advanced machine learning techniques such as 3D CNNs can greatly improve the accuracy and speed of image-based diagnoses. With further research and development, these techniques hold great promise for improving our understanding of the human body and improving patient outcomes.

Computer Vision

A 3D convolutional neural network can be used in computer vision for action recognition, object and pattern recognition, and scene understanding. For example, the network could be trained to recognise human actions such as walking, running, or jumping.

Neural Network Training

Training a neural network involves adjusting its parameters to minimise the error between its predictions and the known target outputs. The network is provided with paired input and output data, called training data. The training aims to find the parameters that allow the network to predict the outputs for these data correctly. This is achieved through learning algorithms such as gradient descent, which adjust the parameters based on the difference between the predicted and actual outputs.
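
The idea can be sketched on the smallest possible "network": a single weight and bias fitted by gradient descent on the mean squared error. The function name `train_linear`, the learning rate, the number of epochs, and the toy data are assumptions for illustration; real networks have many more parameters and rely on automatic differentiation.

```python
def train_linear(xs, ys, lr=0.05, epochs=1000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = train_linear(xs, ys)
print(round(w, 3), round(b, 3))
```

After training, `w` and `b` end up close to the values 2 and 1 used to generate the data.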

Training data of sufficient quantity and quality is crucial for successfully training neural networks. More data means better learning, but data quality is important to avoid biased or noisy results. Balancing underfitting and overfitting is another challenge: a network that is too simple underfits, while one that is too complex overfits, and both result in poor performance on new data. Techniques like regularization, early stopping, and dropout can help find the right balance.
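
Early stopping, for instance, can be sketched as a simple rule on the validation loss: keep the weights from the best epoch and stop once the loss has not improved for a given number of epochs. The function name `early_stopping_epoch`, the patience value, and the loss values below are hypothetical and only serve to show the mechanism.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch whose weights would be kept."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch


# Hypothetical validation losses: they improve, then rise as overfitting sets in.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
best = early_stopping_epoch(val_losses)
print(best)  # 3
```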

Choosing the right loss function is another challenge when training a neural network. The appropriate function depends on the problem being solved: for example, mean squared error for regression tasks, cross-entropy for classification tasks, and hinge loss for margin-based binary classification. The data quality, the balance between underfitting and overfitting, the choice of loss function, and the optimization algorithm all affect model performance.
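
The three loss functions mentioned above can be written in a few lines each. The sketch below uses their textbook definitions; the function names and the example inputs are assumptions for illustration.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Regression: average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_entropy(true_class, probs, eps=1e-12):
    """Classification: negative log of the probability given to the true class."""
    return -math.log(max(probs[true_class], eps))


def hinge_loss(label, score):
    """Binary classification with label in {-1, +1} and a raw model score."""
    return max(0.0, 1.0 - label * score)


print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(cross_entropy(0, [0.5, 0.25, 0.25]))
print(hinge_loss(+1, 0.3))   # correct side, but inside the margin: penalised
print(hinge_loss(-1, -2.0))  # correct side, outside the margin: no penalty
```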

Conclusion: 3D Convolutional Neural Network Applications - 3D Simulation

We have seen how 3D CNNs can extend the range of CNN operations to 3D geometrical data.

This is why they are the key to building surrogate models of numerical solvers that mimic the CAD-CAE process. Here, we sum up all we have learnt and go back to explain the "magic" behind the initially presented business case.

NCS technology was first developed in Lausanne, starting from academic examples such as hydrofoil optimization with neural networks. It moved quickly to industrial use cases in the automotive, aerospace, civil engineering and electronics sectors, where CAD can be very complex and subject to hundreds or thousands of modifications in its lifecycle. In such settings, a 1:1 correspondence between CAD and CAE is not possible because of the latter's long response time.

With 3D CNNs, we alleviate the delay between CAD and CAE with a neural network predictive model.

The predictive model learns, bottom-up, the geometrical features found in CAD and how they correspond to the outputs provided by CAE. The model training aims to emulate the CAD import, meshing and solver workflow, with a lead time between 0.02 and 2 seconds instead of hours. Such speed during the execution phase is due to the simple nature of the activation functions, as we have seen before. Also, the computation of weights and any hyperparameter tuning occur once, upstream of the execution phase of Deep Learning predictions. Predictions can be carried out in dedicated, easy-to-use Apps available to product designers, commissioning engineers, sales engineers, and any other engineer not specialized in simulation.

The Deep Learning concept of App preparation followed by multiple real-time executions differs greatly from CAE.

With Deep Learning, tuning and validating the predictive model are thus preliminary to multiple (hundreds, thousands or more) quick and easy executions where product designers explore their design spaces interactively or use generative algorithms.

Several application examples of Deep Learning and CAE are available, especially in the automotive industry, such as real-time aerodynamic predictions.

In conclusion, with the growing relevance of deep learning and artificial intelligence in engineering, the 3D convolutional neural network concept is becoming important for any engineer looking to stay ahead in her or his field!