There have been multiple attempts to extend convolutional neural networks to irregular 3D data. MeshCNN [1] is a framework for building classifiers out of convolution and pooling operations designed specifically to work with edge features in 3D meshes. In this project, we improve MeshCNN to support large, realistic CAD models and evaluate its performance when segmenting these models into their constituent surface types.
CNNs have been successfully applied to the classification and segmentation of regular data such as images, natural language or time series. However, the convolution and pooling operations performed by classical CNNs are not directly applicable to irregularly sampled data, such as 3D point clouds and meshes. In contrast to previous attempts [2][3] to work around this limitation by transforming 3D data into regular representations, MeshCNN redefines convolution and pooling so that they are directly applicable to mesh edges. MeshCNN provides two architectures built out of these primitives: a mesh classifier, with a typical design of stacked convolution and pooling layers ending in a fully connected part; and a fully convolutional ResUNet architecture for class segmentation.
In this project we first modify MeshCNN to overcome the memory and CPU bottlenecks that make learning on large meshes (more than a few thousand edges) infeasible with the original implementation. We then use our improved version to measure MeshCNN's performance on a class segmentation problem, where the goal is to predict the segmentation of realistic CAD model meshes into explicitly parameterized surfaces. The CAD models used are a subset of the 1 million samples available in the ABC Dataset [4]. As a result of this exercise we identify shortcomings in MeshCNN's design and implementation, solving some of them and leaving others for future work. Additionally, the outcome of our experiments provides some insight into the issues that may arise in surface class segmentation tasks for CAD models.
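MeshCNN defines a set of operations where mesh edges are the primitives. Each edge has 5 input features: the dihedral angle between its two adjacent faces, plus an inner angle and an edge-length ratio for each of those two faces.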
All these features are relative and implicitly invariant to translation, rotation and uniform scaling.
Convolution is applied to the 1-ring neighbourhood of each edge, made of 5 edges (4 neighbours and itself). Edge order invariance is ensured by applying symmetric functions to the neighbour edge features:
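The order invariance can be illustrated with a minimal sketch. It assumes the symmetric functions described in the MeshCNN paper [1], where the opposite neighbour pairs (a, c) and (b, d) are combined through sums and absolute differences. The function name is hypothetical, and scalar features are used for brevity (real edge features are vectors).

```python
def symmetric_features(e, a, b, c, d):
    """Combine an edge feature `e` with its four neighbour features.

    The neighbours can be listed as (a, b, c, d) or (c, d, a, b) depending on
    which adjacent face is visited first. Sums and absolute differences of the
    opposite pairs (a, c) and (b, d) give the same result for both orderings.
    """
    return (e, abs(a - c), a + c, abs(b - d), b + d)


# The same neighbourhood, read starting from either adjacent face.
first_order = symmetric_features(0.5, 1.0, 2.0, 3.0, 4.0)
second_order = symmetric_features(0.5, 3.0, 4.0, 1.0, 2.0)
```

Both calls return the same tuple, so a convolution filter applied to it cannot depend on the arbitrary neighbour ordering.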
In a pooling layer, a mesh is simplified to a target number of edges (a model parameter) by sequentially collapsing the edges with the smallest feature norm. A collapsed edge is removed and the features of its neighbouring edges are averaged in pairs. The correspondence between new and old edges is saved and used during unpooling to recover the original mesh structure.
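The two ingredients of the pooling step described above, the collapse priority and the feature averaging, can be sketched as follows. This is a simplified illustration, not MeshCNN's implementation: it ignores mesh topology entirely, and the names `edges_by_priority` and `average_pair` are hypothetical.

```python
import heapq
import math


def l2_norm(features):
    """Euclidean norm of one edge's feature vector."""
    return math.sqrt(sum(x * x for x in features))


def edges_by_priority(edge_features):
    """Yield edge ids in collapse order: smallest feature norm first."""
    heap = [(l2_norm(f), edge_id) for edge_id, f in edge_features.items()]
    heapq.heapify(heap)
    while heap:
        _, edge_id = heapq.heappop(heap)
        yield edge_id


def average_pair(features_a, features_b):
    """Merge two neighbouring edges by averaging their features channel-wise."""
    return [(x + y) / 2 for x, y in zip(features_a, features_b)]


edge_features = {0: [3.0, 4.0], 1: [0.1, 0.2], 2: [1.0, 1.0]}
order = list(edges_by_priority(edge_features))
merged = average_pair(edge_features[0], edge_features[2])
```

Because the priority depends on learned features rather than on geometry alone, the network itself decides which parts of the mesh are simplified first.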
Out of these mesh operations, MeshCNN builds a ResUNet model designed to learn the segmentation of mesh edges into classes. UNet is a fully convolutional neural network architecture originally designed for medical image segmentation [5]. It is characterized by a contractive path or encoder, where feature maps are convolved and pooled into progressively smaller resolutions (and hence larger receptive fields), followed by an expansive path or decoder, where the encoded features are gradually unpooled to their original resolution. Another hallmark of the UNet architecture is the so-called "skip connections", with which the network convolves together feature maps of the same resolution coming from the contractive and expansive paths. ResUNets [6] add residual blocks to the UNet design in order to facilitate the learning of identity mappings in deep neural networks.
The CAD models used to train and test MeshCNN belong to the ABC Dataset, a compilation of 1 million publicly available models extracted from Onshape. Each model is made of explicitly parameterized surfaces and curves, and is defined as a boundary representation in STEP and Parasolid formats. The dataset comes with a processing pipeline that generates meshes with high-quality triangulation in OBJ format, as well as surface and curve labels for vertices and faces in YAML. We use only the OBJ meshes provided in the dataset for our initial experiments, and remesh the models at a lower resolution at a later stage using the ABC processing tools.
In its original publication [1], MeshCNN is validated using very small synthetic datasets (fewer than 1000 samples) made of relatively small meshes (2250 edges at most). By contrast, we aim to learn segmentation on a much larger dataset (up to 1 million samples) made of large CAD models (up to 200K edges) compiled from real use cases. Our initial experiments expose some limitations in the original MeshCNN implementation that make it incapable of handling this kind of workload. As part of this project we address some of the identified limitations in the implementation. Others are circumvented through data preprocessing and listed for future work.
The first bottleneck is GPU memory usage, which grows quadratically with the number of mesh edges. Each pooling layer keeps in memory a tensor of \(edges^{2}\) elements mapping edge collapses (MeshUnion class). This makes training with half of the meshes in the ABC dataset impossible (a single pooling operation in a 50K-edge mesh would allocate ~18GB of memory just for edge collapse bookkeeping!).
\[ U = \begin{bmatrix} u_{11} & u_{12} & \dots & u_{1N} \\ \vdots & \vdots & \ddots & \vdots \\ u_{M1} & u_{M2} & \dots & u_{MN} \end{bmatrix}, \qquad \begin{cases} M = \text{number of input edges}\\ N = \text{number of output edges}\\ u_{ij} = \text{number of collapses of input edge } i \text{ into output edge } j \end{cases} \]
The tensor is inherently sparse, however, so we reimplement the MeshUnion and MeshUnpool classes using torch.sparse_coo_tensor and sparse operations such as torch.sparse.mm and torch.sparse.sum. We also reduce the memory consumption of the create_GeMM function in the MeshConv layer to less than half by deleting temporary tensors as soon as they are no longer needed and rewriting the symmetric functions so that they operate in place.
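A back-of-the-envelope comparison shows why the sparse layout matters. The sketch below rests on several assumptions that are not stated in the text: 8 bytes per dense element (chosen because it reproduces the ~18GB figure quoted above), a coordinate (COO) layout with two 8-byte indices and one 4-byte value per nonzero, and a bookkeeping matrix that starts as the identity (each edge maps only to itself). The function names are hypothetical.

```python
def dense_bytes(n_input_edges, n_output_edges, bytes_per_element):
    """Memory needed to store every entry of the collapse matrix U."""
    return n_input_edges * n_output_edges * bytes_per_element


def coo_bytes(n_nonzero, bytes_per_index=8, bytes_per_value=4):
    """Memory for a COO layout: a (row, col) index pair plus a value per nonzero."""
    return n_nonzero * (2 * bytes_per_index + bytes_per_value)


GIB = 1024 ** 3
MIB = 1024 ** 2
edges = 50_000

# Dense storage grows with edges squared (assumed 8 bytes per element).
dense_gib = dense_bytes(edges, edges, bytes_per_element=8) / GIB

# Assumed starting point: an identity matrix, i.e. one nonzero per edge.
sparse_mib = coo_bytes(edges) / MIB
```

Under these assumptions the dense matrix needs roughly 18.6 GiB, while the sparse one starts at under 1 MiB and grows only with the number of recorded collapses.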
Another important bottleneck is that edge collapsing in the MeshPool layer is done sequentially on the CPU. As a result, MeshCNN training is CPU-bound and GPU utilization is low (under 30% on an NVIDIA Tesla P100). Training times become impractical even for small datasets (9h per epoch for 1K meshes of 35K edges). To alleviate this problem, we modify the MeshCNN code to allow distributed training across multiple CPUs, GPUs and even nodes. Using torch.nn.parallel.DistributedDataParallel, each batch is split and processed by identical instances of the model on separate devices, and the gradients are averaged across all instances during the backward pass.
Thanks to these optimizations we can train MeshCNN with ABC models of average size (~35K edges) and typical batch sizes (~10), with room left to increase the number of pooling layers.
The methodology followed to evaluate MeshCNN's performance on surface type segmentation for CAD models is refined iteratively through a set of initial experiments. These help us find MeshCNN's limitations as well as the ABC dataset preprocessing steps required for the classification task at hand.
As a first preprocessing step when compiling the dataset, we detect and exclude non-manifold meshes, since they are not supported by MeshCNN. Additionally, small connected components (<10% of the total number of faces) within a model are likely to become non-manifold during pooling. With the original MeshCNN code, models with such small components must be excluded. To overcome this problem, we modify MeshCNN to skip edge collapses that would result in a non-manifold mesh.
ABC models are published with meshes of very different sizes, ranging from 2000 edges and a few surface patches up to 200 thousand edges and thousands of surfaces. However, MeshCNN relies on input meshes having a similar number of edges. Larger meshes need to be pooled further and at more levels than smaller ones, since they have features at more scales that need to be captured in order to detect surface patches of many sizes. Therefore a given MeshCNN parametrization (number of pooling layers and target sizes) may be optimal for some mesh size but inadequate for smaller meshes, or even impracticable if they cannot be simplified enough without breaking their manifold properties.
In our first experiments we use the meshes in OBJ format provided as part of the ABC dataset, taking samples of a target number of edges with a small error margin. Since ABC meshes have very high resolution, sampling meshes of the smallest sizes (5000 edges or fewer) results in very small datasets (3% of the whole ABC dataset) consisting of very simple models made of a few planar and cylindrical surfaces. To obtain subsets of acceptable size and complexity, one has to sample mesh sizes near 35000 edges (the average size of ABC models). However, learning segmentation on larger meshes requires more complex models (more convolution and pooling stages), which in turn need larger training sets to reach the same generalization error as simpler ones. Thus, in order to keep training times reasonable, we remesh ABC models to a target size of 2000 edges (-2% error) using cadmesh, ABC's meshing tool. We limit the meshing error with respect to the original CAD surfaces by setting curvature-dependent face size constraints (gmsh's MeshSizeFromCurvature). That way we obtain more than 10000 meshes out of the first 170000 ABC samples.
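The gradient averaging behind data-parallel training can be illustrated with a small, framework-free sketch. It is not the DistributedDataParallel implementation; it only shows, for a hypothetical one-parameter model y = w * x with a mean squared error loss, that averaging the gradients of equally sized shards reproduces the gradient of the whole batch.

```python
def mse_gradient(w, batch):
    """Gradient of the mean squared error of y = w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_gradient(w, batch, n_workers):
    """Split the batch into equal shards, compute one gradient per worker
    and average the results. Assumes len(batch) is divisible by n_workers."""
    shard_size = len(batch) // n_workers
    shards = [batch[i * shard_size:(i + 1) * shard_size]
              for i in range(n_workers)]
    local_grads = [mse_gradient(w, shard) for shard in shards]
    return sum(local_grads) / n_workers


batch = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]
full = mse_gradient(0.5, batch)
averaged = data_parallel_gradient(0.5, batch, n_workers=2)
```

Since both gradients match, every model replica applies the same update and the replicas stay identical, while the expensive per-sample work (here, the sequential edge collapsing) is spread across processes.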
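One common manifoldness test can be sketched as follows. It is an illustration only, not necessarily the check used in this project: it flags edges shared by more than two triangles, and does not cover other defects such as non-manifold vertices. The function name and the example meshes are hypothetical.

```python
from collections import Counter


def non_manifold_edges(faces):
    """Return the edges shared by more than two triangles.

    `faces` is a list of vertex-index triples. In a manifold triangle mesh
    every edge belongs to at most two faces (exactly one on a boundary).
    """
    counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            # Store each edge with sorted endpoints so direction is ignored.
            counts[(min(u, v), max(u, v))] += 1
    return sorted(edge for edge, n in counts.items() if n > 2)


tetrahedron = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
# A third triangle glued to edge (0, 1) makes that edge non-manifold.
fin = tetrahedron + [(0, 1, 4)]
```

The same count can be evaluated on the would-be result of an edge collapse to decide whether the collapse should be skipped.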
Surfaces in ABC dataset models are predominantly planes and cylinders, with less than 10% of the surfaces belonging to other classes. This imbalance in the surface class occurrence biases the model towards the overrepresented classes and impedes the learning of the underrepresented surface types.
Given that experiments using a weighted loss function, which penalizes misclassifications of the underrepresented classes more heavily, show no noticeable improvement, we address the class imbalance by resampling. Any model made exclusively of planes and cylinders is discarded, which downsamples these surface types. Additionally, synthetic samples are created for the other classes to upsample them. The synthetic samples are created using gmsh to mesh parametric surfaces with uniformly distributed random parameters.
Although generating open meshes with a single surface seems the most straightforward way to keep fine control over the class upsampling, this approach presents multiple problems:
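The downsampling rule can be sketched in a few lines. The data layout (one list of surface type names per model) and the names `keep_model` and `minority_share` are hypothetical, and the four toy models are made up for illustration.

```python
MAJORITY = {"Plane", "Cylinder"}


def keep_model(surface_types):
    """Keep a model only if it has a surface outside the majority classes."""
    return any(t not in MAJORITY for t in surface_types)


def minority_share(models):
    """Fraction of all surfaces that belong to an underrepresented class."""
    surfaces = [t for model in models for t in model]
    return sum(t not in MAJORITY for t in surfaces) / len(surfaces)


models = [
    ["Plane", "Plane", "Cylinder"],
    ["Plane", "Cone"],
    ["Cylinder", "Cylinder", "Plane", "Torus"],
    ["Plane", "Plane"],
]
kept = [m for m in models if keep_model(m)]
share_before = minority_share(models)
share_after = minority_share(kept)
```

In this toy example the share of underrepresented surfaces rises from 2/11 to 2/6. Planes and cylinders still dominate the kept models, which is why upsampling with synthetic samples is applied as well.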
[Figure omitted. Legend: Plane, Cylinder, Cone, Torus, BSpline, Sphere, Revolution, Extrusion, Other]
Therefore we opt for augmenting our datasets with closed meshes made of multiple surface types, much more similar to (albeit still simpler than) ABC models, at the expense of losing fine-grained control over the class occurrence in the augmented dataset.
[Figures omitted. Legend: Plane, Cylinder, Cone, Torus, BSpline, Sphere, Revolution, Extrusion, Other]
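To reduce overfitting we use the AdamW optimizer, a version of Adam that includes decoupled weight decay regularization [7]: \[ \boldsymbol{\theta}_{t} \leftarrow \boldsymbol{\theta}_{t-1} - \eta_{t}\left( \frac{\alpha \boldsymbol{\hat{m}}_{t}}{\sqrt{\boldsymbol{\hat{v}}_{t}}+\epsilon} + \lambda \boldsymbol{\theta}_{t-1}\right) \] where \(\alpha\) is the learning rate, \(\eta_{t}\) the schedule multiplier, \(\boldsymbol{\hat{m}}_{t}\) and \(\boldsymbol{\hat{v}}_{t}\) the bias-corrected first and second moment estimates, and \(\lambda\) the weight decay factor.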
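The update rule above can be written out for a single scalar parameter. This is a minimal sketch that follows the formula term by term; the function name and the default hyperparameter values are assumptions for illustration and are not the settings used in the experiments.

```python
import math


def adamw_step(theta, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2, eta=1.0):
    """One AdamW update for a scalar parameter; returns (theta, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # first moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    # The decay term acts on theta directly instead of being added to the
    # gradient, so it is not rescaled by the adaptive denominator.
    theta = theta - eta * (alpha * m_hat / (math.sqrt(v_hat) + eps)
                           + weight_decay * theta)
    return theta, m, v


# With a zero gradient only the weight decay term moves the parameter.
decayed, _, _ = adamw_step(theta=1.0, grad=0.0, m=0.0, v=0.0, t=1)

# With weight decay disabled the step reduces to plain Adam.
plain, _, _ = adamw_step(theta=1.0, grad=1.0, m=0.0, v=0.0, t=1,
                         weight_decay=0.0)
```

The first call shows the decoupling: the parameter shrinks by \(\eta_{t}\lambda\theta_{t-1}\) even though the gradient is zero.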
Train set size | 262 |
---|---|
Test set size | 30 |
Mesh size | <=5000 edges |
Conv. channels | 32, 64, 128, 256 |
Pooling sizes | 4000, 3000, 1800 |
Batch size | 4 |
Optimizer | Adam |
Test accuracy | 91.6% (15 epochs) |
In this first experiment we train MeshCNN on a toy ABC subset whose size and mesh resolution are similar to those used in [1], with the following objectives:
As a result of the experiment, we identify the memory and processing bottlenecks as well as the required preprocessing steps covered in the methodology.
Train set size | 862 |
---|---|
Test set size | 93 |
Mesh size | [33000,35000] edges |
Conv. channels | 32, 64, 128, 256 |
Pooling sizes | 20000, 15000, 10000 |
Batch size | 10 |
Optimizer | Adam |
Test accuracy | 89.5% (19 epochs) |
In this next experiment we compile a larger dataset with meshes of sizes close to the ABC average. We constrain the size difference (in edge count) among meshes in this dataset to be at most 10%, since training with meshes of very different sizes could potentially affect the classifier performance as explained in the methodology.
The model learns to predict the two most frequent classes (plane and cylinder surfaces), but not the rest. The optimization seems to get stuck in a local minimum of the loss function, most probably due to the imbalance in class occurrence.
[Figures omitted. Legend: Plane, Cylinder, Cone, Torus, BSpline, Sphere, Revolution, Extrusion, Other]
Train set size | 4000 |
---|---|
Test set size | 1000 |
Mesh size | [1920, 2000] edges |
Conv. channels | 32, 64, 128, 256, 512 |
Pooling sizes | 1600, 1280, 1024, 850 |
Batch size | 10 |
Optimizer | AdamW |
Test accuracy | 85% (20 epochs) |
The dataset for this experiment contains 5000 ABC models, sampled so that none of them consists exclusively of plane and cylinder surfaces, and remeshed to 2000 edges. Compared to the previous experiments, the model used here has an additional convolution and pooling stage and adds weight decay regularization to the optimizer (AdamW).
Even with the reduced mesh size, the increased capacity and the surface class resampling, the model shows poor accuracy when predicting surfaces other than planes and cylinders.
[Figures omitted. Legend: Plane, Cylinder, Cone, Torus, BSpline, Sphere, Revolution, Extrusion, Other]
Train set size | 5104 (ABC) + 5100 (synthetic) |
---|---|
Test set size | 1275 |
Mesh size | [1960, 2000] edges |
Conv. channels | 32, 64, 128, 256, 512 |
Pooling sizes | 1600, 1280, 1024, 850 |
Batch size | 10 |
Optimizer | AdamW |
Test accuracy | 86.2% (20 epochs) |
In this final experiment the dataset is increased to more than 10K samples, half of which are synthetic closed meshes made of randomly parameterized surfaces, added to upsample the underrepresented surface types. The results show a substantial improvement in the prediction accuracy for sphere surfaces. There is, however, no improvement in the accuracy for other classes.
In this project we evaluate the performance of MeshCNN in predicting mesh segmentation by surface type for CAD models from the ABC dataset. The outcome of the experiments helps us identify multiple root causes for the poor accuracy exhibited by the tested models, some intrinsic to the input data, others related to architectural and implementation shortcomings in MeshCNN.
We have already established that the problem of segmenting ABC meshes by surface type suffers from class imbalance. More than 90% of the surfaces in ABC models are planes and cylinders, an imbalance that can probably be extrapolated to CAD models in general. [Prati 2004] supports the view that it is class overlapping, a phenomenon highly correlated with class imbalance, that has the greater impact on classification performance. Surface classes certainly overlap, since some surface types are a subset of more general ones:
The most important consequence of surface class overlapping is ambiguous labelling: a concrete, simpler surface class (e.g. a cylinder) is sometimes generated in a CAD model using a more abstract parameterization (e.g. a revolution surface) and is labelled accordingly.
The following CAD model is a perfect pathological case, where planar surfaces, predicted as planes by the classifier, were actually created using BSplines. The other surfaces were the result of a revolution; however, their edges are predicted to belong to an extrusion or a cylinder, and in fact they could have been created out of those surfaces too.
[Figure omitted. Legend: Plane, Cylinder, Cone, Torus, BSpline, Sphere, Revolution, Extrusion, Other]
The next two examples contain BSpline surfaces that are very similar to a torus and a sphere respectively, and as a result some of their edges are classified as such.
[Figures omitted. Legend: Plane, Cylinder, Cone, Torus, BSpline, Sphere, Revolution, Extrusion, Other]
MeshCNN's implementation of mesh pooling suffers from memory and CPU bottlenecks. We have greatly reduced the memory consumption during pooling by using sparse matrices for edge collapse bookkeeping. Additionally, to achieve faster training times, we modify MeshCNN to use PyTorch's distributed training. However, MeshCNN remains essentially CPU-bound, and the speedup offered by distributed training is limited by inter-node communication penalties. Fine-grained GPU parallelization, in contrast, scales better with sample and batch size, and it is one of the main reasons deep learning has achieved great results in the last decade. There are multiple research papers on GPU-parallelized mesh simplification algorithms, some of which use edge collapses [9] [10] and could therefore be implemented in MeshCNN. A full in-GPU MeshCNN implementation would reduce training times considerably and make it practical to train with more aggressive data augmentation techniques (remeshing, vertex perturbation) that can potentially improve the generalization error.
Other limitations reside in the core concept behind MeshCNN: the extension of the convolution and pooling operations powering CNNs to 3D meshes. MeshCNN's edge-oriented pooling exhibits some problems that its pixel-based counterpart does not have. One of these limitations, which affects only open meshes, is that MeshCNN's mesh pooling uses boundary-preserving edge collapsing, so edges in the 1-ring neighbourhoods of mesh boundaries are never collapsed. A reason behind this design choice may be that a mesh boundary can carry information relevant to classification tasks (e.g. classification of open meshes representing polygons). However, we have shown that in segmentation tasks this policy cripples the edge classification accuracy near boundaries. A potential line of future work is to evaluate the impact of unconstrained edge collapsing on MeshCNN's performance for segmentation tasks on open meshes.
A more general limitation of mesh pooling is that the target mesh size can only be reached if the necessary number of edge collapses can be performed without breaking the manifold property of the mesh. Image pooling, by contrast, has no such limit, and in fact images are typically pooled down to much smaller sizes (e.g. 10% of the original size) in order to learn global features. For meshes, the minimum achievable pooling size varies between samples, but lies around 40% of the original size for ABC dataset models. This means that mesh features at the global scale cannot be learned by MeshCNN, since they are never pooled down enough to fit in the CNN receptive field (i.e. the convolution filter size). While some surface classes may be identifiable from local features such as local curvature (e.g. sphere, cylinder, cone, torus), more general surface types, like revolution and extrusion, may only be characterized by global properties (e.g. axial symmetry in revolution surfaces). This inability to learn global features may be further exacerbated by MeshCNN's adaptive edge collapsing policy, by which pooling happens non-uniformly across the mesh, thus increasing the chances that large-scale features are missed.
The following example clearly shows how, after the last pooling stage, most of the edge collapses were done on the planar surfaces, leaving many of the original edges in the extrusion surfaces intact. Features calculated for the unpooled edges are therefore strictly local and as such indistinguishable from those that would be produced for a cylindrical surface. The global features of the extrusion, namely the relation between the two planar surfaces at both ends of the extrusion and the extrusion surface itself, remain mostly out of reach of the model's receptive field.
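The gap between mesh and image pooling can be made concrete with a little arithmetic. The sketch below takes the pooling sizes listed for the experiments on 2000-edge meshes (1600, 1280, 1024, 850) and compares them with a generic image pipeline of four 2x2 pooling layers, which is an assumption chosen only for comparison. The function name is hypothetical.

```python
def pooling_ratios(input_size, pooled_sizes):
    """Fraction of the original elements left after each pooling stage."""
    return [size / input_size for size in pooled_sizes]


# Mesh pooling targets used for the 2000-edge meshes.
mesh_ratios = pooling_ratios(2000, [1600, 1280, 1024, 850])

# Each 2x2 image pooling layer keeps one pixel out of four.
image_ratios = [(1 / 4) ** stage for stage in range(1, 5)]
```

After four stages the mesh still holds 42.5% of its edges, whereas the image pipeline keeps 1/256 of its pixels, which is what lets image CNNs capture global context.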
[Figure omitted. Legend: Plane, Cylinder, Cone, Torus, BSpline, Sphere, Revolution, Extrusion, Other]