Surface class segmentation in CAD models with MeshCNN

Project by Andrés Mandado

There have been multiple approaches to extending convolutional neural networks to irregular 3D data. MeshCNN [1] is an innovative framework that builds classifiers out of convolution and pooling operations designed specifically to operate on edge features of 3D meshes. In this project, we improve MeshCNN to support large, realistic CAD models and evaluate its performance when segmenting these models into their constituent surface types.

Introduction

CNNs have been successfully applied to the classification and segmentation of regular data such as images, natural language and time series. However, the convolution and pooling operations performed by classical CNNs are not directly applicable to irregularly sampled data such as 3D point clouds and meshes. In contrast to previous attempts [2][3] to work around this limitation by transforming 3D data into regular representations, MeshCNN redefines convolution and pooling so that they apply directly to mesh edges. MeshCNN provides two architectures built out of these primitives: a mesh classifier, with a typical design of stacked convolution and pooling layers ending in a fully connected head; and a fully-convolutional ResUNet architecture for class segmentation.

In this project we first modify MeshCNN to overcome memory and CPU bottlenecks that make learning on large meshes (more than a few thousand edges) infeasible with the original implementation. We then use our improved version to measure MeshCNN's performance on a class segmentation problem, where the goal is to predict the segmentation of realistic CAD model meshes into explicitly parameterized surfaces. The CAD models used are a subset of the 1 million samples available in the ABC Dataset [4]. As a result of this exercise we identify shortcomings in MeshCNN's design and implementation, solving some of them and leaving others for future work. Additionally, the outcome of our experiments provides some insight into the issues that may be encountered in surface class segmentation tasks for CAD models.

MeshCNN architecture

Input Edge Features [1]
Mesh Convolution [1]

MeshCNN defines a set of operations where mesh edges are the primitives. Each edge has 5 input features:

  • Dihedral angle between two incident faces (\(\phi\))
  • The angles at the two vertices opposite to the edge (\(\alpha_{1}, \alpha_{2}\))
  • The ratio between the edge length and the triangle height for each incident face (\(\frac{\mid e\mid}{\mid h_{1}\mid}, \frac{\mid e\mid}{\mid h_{2}\mid}\))

All these features are relative and implicitly invariant to translation, rotation and uniform scaling.
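The sketch below shows one way these features can be computed for a single edge (our illustration; MeshCNN's actual implementation differs in sign conventions and vectorization):

```python
import numpy as np

# Minimal sketch of the five input features for an edge (v0, v1) whose two
# incident triangles have opposite vertices a and b. The dihedral-angle sign
# convention is simplified relative to MeshCNN's code.
def edge_features(v0, v1, a, b):
    e = v1 - v0
    n1 = np.cross(e, a - v0); n1 /= np.linalg.norm(n1)  # normal of face (v0, v1, a)
    n2 = np.cross(b - v0, e); n2 /= np.linalg.norm(n2)  # normal of face (v0, b, v1)
    phi = np.pi - np.arccos(np.clip(n1 @ n2, -1.0, 1.0))  # dihedral angle

    def opposite_angle(p):
        # Angle at the vertex opposite to the edge.
        u, w = v0 - p, v1 - p
        return np.arccos(np.clip(u @ w / (np.linalg.norm(u) * np.linalg.norm(w)), -1.0, 1.0))

    def edge_height_ratio(p):
        # |e| / h, where h = 2 * area / |e| is the triangle height over the edge.
        area = 0.5 * np.linalg.norm(np.cross(e, p - v0))
        return np.linalg.norm(e) ** 2 / (2.0 * area)

    return np.array([phi,
                     opposite_angle(a), opposite_angle(b),
                     edge_height_ratio(a), edge_height_ratio(b)])
```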

Convolution is applied to the 1-ring neighbourhood of each edge, which consists of 5 edges (4 neighbours and the edge itself). Invariance to the ordering of the neighbours is ensured by applying symmetric functions to the neighbour edge features:
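\[ e \cdot k_{0} + \sum_{j=1}^{4} k_{j} \cdot e^{j}, \qquad (e^{1}, e^{2}, e^{3}, e^{4}) = \left( |a-c|,\; a+c,\; |b-d|,\; b+d \right) \]

where \(e\) is the feature vector of the central edge, \((a, b, c, d)\) are the features of its four neighbours, and \(k_{0}, \dots, k_{4}\) are the learned kernel weights, following the definition in [1].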

In a pooling layer, a mesh is simplified to a target number of edges (a model parameter) by sequentially collapsing the edges with the smallest feature norm. A collapsed edge is removed and the features of the neighbouring edges are averaged in pairs. The correspondence between new and old edges is saved and used during unpooling to recover the original mesh structure.

Mesh Pooling/Unpooling [1]

Out of these mesh operations, MeshCNN builds a ResUNet model designed to learn the segmentation of mesh edges into classes. UNet is a fully-convolutional neural network architecture originally designed for medical image segmentation [5]. It is characterized by a contractive path or encoder, where feature maps are convolved and pooled into progressively smaller resolutions (and hence larger receptive fields), followed by an expansive path or decoder, where the encoded features are gradually unpooled back to their original resolution. Another hallmark of the UNet architecture are the so-called "skip connections", through which the network convolves together feature maps of the same resolution coming from the contractive and expansive paths. ResUNets [6] add residual blocks to the UNet design in order to facilitate the learning of identity mappings in deep neural networks.
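The sketch below illustrates the UNet data flow, with plain 1D convolutions standing in for MeshConv/MeshPool/MeshUnpool (a conceptual simplification of ours; residual blocks are omitted):

```python
import torch
import torch.nn as nn

# Conceptual UNet skeleton: the encoder pools feature maps to smaller
# resolutions, the decoder upsamples them back, and each decoder step
# concatenates the same-resolution encoder features (skip connection).
class TinyUNet(nn.Module):
    def __init__(self, ch=(32, 64, 128)):
        super().__init__()
        self.enc = nn.ModuleList(nn.Conv1d(a, b, 3, padding=1)
                                 for a, b in zip(ch[:-1], ch[1:]))
        self.pool = nn.MaxPool1d(2)
        self.up = nn.Upsample(scale_factor=2)
        # Decoder convs see upsampled features concatenated with skip features.
        self.dec = nn.ModuleList(nn.Conv1d(2 * b, a, 3, padding=1)
                                 for a, b in zip(ch[:-1], ch[1:]))

    def forward(self, x):  # x: (batch, ch[0], length), length divisible by 4
        skips = []
        for conv in self.enc:                       # contractive path
            x = torch.relu(conv(x))
            skips.append(x)
            x = self.pool(x)
        for conv, skip in zip(reversed(self.dec), reversed(skips)):
            x = torch.relu(conv(torch.cat([self.up(x), skip], dim=1)))
        return x                                    # expansive path output
```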

MeshCNN ResUNet

ABC Dataset

ABC dataset model

The CAD models used to train and test MeshCNN belong to the ABC Dataset, a compilation of 1 million publicly available models extracted from Onshape. Each model is made of explicitly parameterized surfaces and curves, and is defined as a boundary representation (B-rep) in STEP and Parasolid formats. The dataset comes together with a processing pipeline to generate meshes with high-quality triangulation in OBJ format, as well as surface and curve labels for vertices and faces in YAML. We only use the OBJ meshes provided in the dataset for our initial experiments, remeshing the models at a lower resolution at a later stage using the ABC processing tools.

MeshCNN improvements

In its original publication [1], MeshCNN is validated on very small synthetic datasets (fewer than 1000 samples) made of relatively small meshes (at most 2250 edges). By contrast, we aim to learn segmentation on a much larger dataset (up to 1 million samples) made of large CAD models (up to 200K edges) compiled from real use cases. Our initial experiments expose limitations in the original MeshCNN implementation that make it incapable of handling this kind of workload. As part of this project we address some of the identified limitations in the implementation; others are circumvented through data preprocessing and listed as future work.

The first bottleneck is GPU memory usage, which grows quadratically with the number of mesh edges: each pooling layer keeps in memory a tensor of \(edges^{2}\) elements mapping edge collapses (the MeshUnion class). This makes training with half of the meshes in the ABC dataset impossible (a single pooling operation on a 50K-edge mesh would allocate ~18 GB of memory just for edge collapse bookkeeping!).
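As a rough sanity check of that figure (assuming 32-bit floats, PyTorch's default): \[ 50{,}000^{2} \times 4\ \text{bytes} \approx 10\ \text{GB} \] for the tensor itself, roughly doubling once autograd retains intermediate results for the backward pass.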

\[ U = \begin{bmatrix} u_{11} & u_{12} & \dots & u_{1M} \\ \vdots & \ddots & & \vdots \\ u_{N1} & \dots & & u_{NM} \end{bmatrix} \quad \begin{cases} M = \text{number of input edges}\\ N = \text{number of output edges}\\ u_{ij} = \text{number of collapses of input edge } j \text{ into output edge } i \end{cases} \]

However, the tensor is inherently sparse, so we reimplement the MeshUnion and MeshUnpool classes using torch.sparse_coo_tensor and sparse operations such as torch.sparse.mm and torch.sparse.sum. The memory consumption of the create_GeMM function in the MeshConv layer was also reduced to less than half by deleting temporary tensors once they are no longer needed and by rewriting the symmetric functions to operate in place.
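The following sketch illustrates the idea behind the sparse reimplementation (names and signatures are illustrative, not MeshCNN's exact API):

```python
import torch

# Each collapse of input edge j into output edge i is stored as one (i, j)
# coordinate of a sparse COO matrix instead of an entry in a dense N x M tensor.
def rebuild_features_sparse(collapse_pairs, features, n_out, n_in):
    # collapse_pairs: (2, nnz) LongTensor of (output_edge, input_edge) pairs
    # features:       (n_in, channels) edge features before pooling
    values = torch.ones(collapse_pairs.shape[1])
    U = torch.sparse_coo_tensor(collapse_pairs, values, (n_out, n_in)).coalesce()
    summed = torch.sparse.mm(U, features)                    # (n_out, channels)
    counts = torch.sparse.sum(U, dim=1).to_dense().clamp(min=1)
    return summed / counts.unsqueeze(1)                      # average per output edge
```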

Another important bottleneck is that the edge collapsing in the MeshPool layer is done sequentially on the CPU. As a result, MeshCNN training is CPU-bound and GPU utilization is low (under 30% on an NVIDIA Tesla P100). Training times become impractical even for small datasets (9 h per epoch for 1K meshes of 35K edges). To alleviate this problem, we modify the MeshCNN code to allow for distributed training across multiple CPUs, GPUs and even nodes. Using torch.nn.parallel.DistributedDataParallel, each batch is split and processed by identical instances of the model on separate devices, and the gradients are averaged across replicas during the backward pass.
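The wiring looks roughly as follows (model, dataset and hyperparameters are placeholders, not MeshCNN's actual classes):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def train(rank, world_size, model, dataset, epochs=1):
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    model = DDP(model.to(rank), device_ids=[rank])
    # The sampler gives each process a disjoint shard of every epoch.
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=2, sampler=sampler)
    opt = torch.optim.Adam(model.parameters())
    for epoch in range(epochs):
        sampler.set_epoch(epoch)
        for edges, labels in loader:
            opt.zero_grad()
            loss = torch.nn.functional.cross_entropy(model(edges.to(rank)),
                                                     labels.to(rank))
            loss.backward()  # DDP all-reduces (averages) gradients here
            opt.step()
    dist.destroy_process_group()
```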

Thanks to these optimizations we can train MeshCNN with ABC models of average size (~35K edges) and typical batch sizes (~10), while still leaving room to increase the number of pooling layers.

Method

The methodology followed to evaluate MeshCNN's performance on surface type segmentation for CAD models is iteratively refined through a set of initial experiments that help us find MeshCNN's limitations as well as the ABC dataset preprocessing steps required for the classification task at hand.

Dataset preprocessing

As a first preprocessing step when compiling the dataset, we detect and exclude non-manifold meshes, since they are not supported by MeshCNN (a simple filter of this kind is sketched below). Additionally, small connected components (<10% of the total number of faces) within a model are likely to become non-manifold during pooling, so models with such small components must be excluded when using the original MeshCNN code. To overcome this problem, we modify MeshCNN to skip edge collapses that would result in a non-manifold mesh.
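A minimal version of such a filter, assuming triangle faces given as vertex-index triples (our illustration; the actual preprocessing may check more conditions):

```python
from collections import Counter

# A triangle mesh is edge-manifold only if every edge borders at most two faces.
def is_edge_manifold(faces):
    counts = Counter()
    for i, j, k in faces:  # triangle vertex indices
        for u, v in ((i, j), (j, k), (k, i)):
            counts[(min(u, v), max(u, v))] += 1
    return all(c <= 2 for c in counts.values())
```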

Non-manifold mesh (Model 34471)
Small component (Model 49466)

Dataset remeshing

ABC models are published with meshes of very different sizes, ranging from 2000 edges and a few surface patches up to 200 thousand edges and thousands of surfaces. However, MeshCNN relies on input meshes having a similar number of edges. Larger meshes need to be pooled further and at more levels than smaller ones, since they have features at more scales that need to be captured in order to detect surface patches of many sizes. Therefore a given MeshCNN parameterization (number of pooling layers and target sizes) may be optimal for one mesh size but inadequate for smaller meshes, or even impracticable if they cannot be simplified enough without breaking their manifold properties.

In our first experiments we use the meshes in OBJ format provided as part of the ABC dataset, taking samples with a target number of edges within a small error margin. Since ABC meshes have very high resolution, sampling meshes of the smallest sizes (5000 edges or fewer) results in very small datasets (3% of the whole ABC dataset) consisting of very simple models made of a few planar and cylindrical surfaces. To obtain subsets of acceptable size and complexity, one has to sample mesh sizes near 35000 edges (the average size of ABC models). However, learning segmentation on larger meshes requires more complex models (more convolution and pooling stages), which in turn need larger training sets to reach the same generalization error as simpler ones. Thus, in order to keep training times reasonable, we remesh ABC models to a target size of 2000 edges (−2% error margin) using cadmesh, ABC's meshing tool. We limit the meshing error with respect to the original CAD surfaces by setting curvature-dependent face size constraints (gmsh's MeshSizeFromCurvature). In this way we obtain more than 10000 meshes out of the first 170000 ABC samples.
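A hedged sketch of the remeshing step using the gmsh Python API (our pipeline drives gmsh through cadmesh, and the exact size options used to hit the 2000-edge target are omitted here):

```python
import gmsh

gmsh.initialize()
gmsh.open("model.step")  # B-rep input
# Target number of mesh elements per 2*pi radians of curvature: bounds the
# meshing error on curved surfaces while keeping flat regions coarse.
gmsh.option.setNumber("Mesh.MeshSizeFromCurvature", 20)
gmsh.model.mesh.generate(2)  # surface mesh only
gmsh.write("model.stl")      # convert to OBJ afterwards, e.g. with trimesh
gmsh.finalize()
```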

Dataset augmentation and regularization

Edge class frequencies on 35K-edge mesh dataset

Surfaces in ABC dataset models are predominantly planes and cylinders, with less than 10% of the surfaces belonging to other classes. This imbalance in the surface class occurrence biases the model towards the overrepresented classes and impedes the learning of the underrepresented surface types.

Given that experiments using a weighted loss function to penalize misclassifications of the underrepresented classes more heavily show no noticeable improvement, we address the class imbalance by resampling. Any model made exclusively of planes and cylinders is discarded, downsampling these surface types. Additionally, synthetic samples are created for the other classes to upsample them; these synthetic samples are generated with gmsh by meshing parametric surfaces with uniformly distributed random parameters.
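For reference, the weighted-loss variant we tried looks like the following sketch (the class frequencies shown are illustrative placeholders, not measured values):

```python
import torch

# Weight each of the 9 surface classes by its inverse frequency, so that
# misclassifying a rare class costs more than misclassifying a plane.
class_freq = torch.tensor([0.55, 0.35, 0.02, 0.02, 0.02, 0.01, 0.01, 0.01, 0.01])
weights = 1.0 / class_freq
weights = weights / weights.sum()
criterion = torch.nn.CrossEntropyLoss(weight=weights)
```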

Although generating open meshes containing a single surface seems the most straightforward option for fine control over the class upsampling, this approach presents multiple problems:

  1. MeshCNN applies boundary-preserving edge collapses during pooling. As a result, the prediction accuracy near the mesh boundary is severely affected (see the example below).
  2. The samples are too simplistic: the classifier cannot learn to identify class boundaries from synthetic samples made of a single surface.
  3. Most ABC models are closed meshes.
Open cone prediction (original size)
Open cone prediction (last pool)
Color legend (used in all segmentation figures): Plane, Cylinder, Cone, Torus, BSpline, Sphere, Revolution, Extrusion, Other

Therefore we opt for augmenting our datasets with closed meshes made of multiple surface types, much more similar to (albeit still simpler than) ABC models, at the expense of losing fine-grained control over the class occurrence in the augmented dataset.

Closed cone ground truth
Closed sphere ground truth
Closed Cylinder ground truth
Closed Torus ground truth
Closed BSpline ground truth
Closed Extrusion ground truth
Closed BSpline Revolution ground truth
Closed Polygon Revolution ground truth

To reduce overfitting we use the AdamW optimizer, a version of Adam that includes decoupled weight decay regularization [7]: \[ \boldsymbol{\theta_{t}} \leftarrow \boldsymbol{\theta_{t-1}} - \eta_{t}\left( \frac{\alpha \boldsymbol{\hat{m}_{t}}}{\sqrt{\boldsymbol{\hat{v}}_{t}}+\epsilon} + \lambda \boldsymbol{\theta_{t-1}}\right) \]
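In PyTorch this amounts to swapping the optimizer class; a minimal sketch (the hyperparameter values are illustrative, not the ones we used):

```python
import torch

model = torch.nn.Linear(8, 9)  # stand-in for the MeshCNN ResUNet
# AdamW applies the decay term directly to the parameters instead of folding
# it into the gradient, which is what the update rule above expresses.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
```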

Experiments

Ice breaker

Train set size: 262
Test set size: 30
Mesh size: ≤5000 edges
Conv. channels: 32, 64, 128, 256
Pooling sizes: 4000, 3000, 1800
Batch size: 4
Optimizer: Adam
Test accuracy: 91.6% (15 epochs)

In this first experiment we train MeshCNN on a toy ABC subset whose size and mesh resolutions are similar to those used in [1], with the following objectives:

  • Validate the training setup, in particular the transformation of label data into the format expected by MeshCNN.
  • Identify potential changes required in MeshCNN and dataset preprocessing.
  • Verify the feasibility of using MeshCNN for surface class segmentation on ABC models.

As a result of the experiment, we identify the memory and processing bottlenecks as well as required preprocessing steps covered in the methodology.


35K-edge mesh dataset

Train set size: 862
Test set size: 93
Mesh size: 33,000–35,000 edges
Conv. channels: 32, 64, 128, 256
Pooling sizes: 20000, 15000, 10000
Batch size: 10
Optimizer: Adam
Test accuracy: 89.5% (19 epochs)

In this next experiment we compile a larger dataset with meshes of sizes close to the ABC average. We constrain the size difference (in edge count) among meshes in this dataset to at most 10%, since training with meshes of very different sizes could affect the classifier's performance, as explained in the methodology.

The model learns to predict the two most frequent classes (plane and cylinder surfaces), but not the rest. The optimization seems to get stuck in a local minimum of the loss function, most probably due to the imbalance in class occurrence.

Model 38019 (prediction)
Model 38019 (ground truth)
Model 38191 (prediction)
Model 38191 (ground truth)

Remeshed & resampled dataset

Train set size: 4000
Test set size: 1000
Mesh size: 1920–2000 edges
Conv. channels: 32, 64, 128, 256, 512
Pooling sizes: 1600, 1280, 1024, 850
Batch size: 10
Optimizer: AdamW
Test accuracy: 85% (20 epochs)

The dataset for this experiment contains 5000 ABC models, sampled so that they do not consist exclusively of plane and cylinder surfaces, and remeshed to 2000 edges. Compared to the previous experiments, the model used here has an additional convolution and pooling stage, and weight decay regularization is added through the optimizer (AdamW). Even with the reduced mesh size, the increased capacity and the surface class resampling, the model shows poor accuracy predicting surfaces other than planes and cylinders.

Model 25725 (prediction)
Model 25725 (ground truth)
Model 170737 (prediction)
Model 170737 (ground truth)
Model 1578 (prediction)
Model 1578 (ground truth)
Surface type frequency and prediction accuracy

Remeshed, resampled and augmented dataset

Train set size: 5104 (ABC) + 5100 (synthetic)
Test set size: 1275
Mesh size: 1960–2000 edges
Conv. channels: 32, 64, 128, 256, 512
Pooling sizes: 1600, 1280, 1024, 850
Batch size: 10
Optimizer: AdamW
Test accuracy: 86.2% (20 epochs)

In this final experiment the dataset is increased to more than 10K samples, half of which are synthetic closed meshes made of randomly parameterized surfaces, added to upsample the underrepresented surface types. The results show a substantial improvement in prediction accuracy for sphere surfaces. However, there is no improvement in the accuracy for the other classes.

Surface type frequency and prediction accuracy

Discussion and Future Work

In this project we evaluate the performance of MeshCNN at predicting mesh segmentation by surface type in CAD models from the ABC dataset. The outcome of the experiments helps us identify multiple root causes for the poor accuracy exhibited by the tested models, some intrinsic to the input data, others related to architectural and implementation shortcomings in MeshCNN.

CAD surface type imbalance and overlapping

We have already established that the problem of segmenting ABC meshes by surface type suffers from class imbalance. More than 90% of the surfaces in ABC models are planes and cylinders, an imbalance that can probably be extrapolated to CAD models in general. Prati et al. [8] support the view that it is class overlapping, a phenomenon highly correlated with class imbalance, that has the greater impact on classification performance. Surface classes do indeed overlap, with some surface types being subsets of more general ones:

  • Cylinder, Cone, Torus, Sphere \( \subset \) Revolution
  • Cylinder, Cone, Torus, Sphere \( \subset \) BSpline
  • Cylinder \( \subset \) Extrusion
  • Plane \( \cap \) Extrusion \( \neq \emptyset \) (the surfaces resulting from the extrusion of a polygon are planes)

The most important consequence of surface class overlapping is the presence of ambiguous labelling: concrete, simpler surface classes (e.g. cylinders) are sometimes generated in a CAD model using a more abstract parameterization (e.g. a revolution surface) and are thus labelled accordingly.

The following CAD model is a perfect pathological case: its planar surfaces, predicted as planes by the classifier, were actually created using BSplines. The other surfaces are the result of a revolution, yet their edges are predicted to belong to an extrusion or a cylinder, and indeed they could have been created as those surface types too.

Model 18963 (prediction)
Model 18963 (ground truth)

The next two examples contain BSpline surfaces that are very similar to a torus and a sphere respectively, and as a result some of their edges are classified as such.

Model 1966 (prediction)
Model 1966 (ground truth)
Model 19806 (prediction)
Model 19806 (ground truth)

MeshCNN limitations

MeshCNN's implementation of mesh pooling suffers from memory and CPU bottlenecks. We have greatly reduced the memory consumption during pooling by using sparse matrices for edge collapse bookkeeping. Additionally, to achieve faster training times, we modify MeshCNN to use PyTorch's distributed training. However, MeshCNN remains essentially CPU-bound, and the speedup offered by distributed training is limited by inter-node communication penalties. Fine-grained GPU parallelization, in contrast, scales better with sample and batch size and is one of the main reasons deep learning has achieved such strong results in the last decade. There are multiple research papers on GPU-parallelized mesh simplification algorithms, some of which use edge collapses [9][10] and could therefore be implemented in MeshCNN. A fully in-GPU MeshCNN implementation would reduce training times considerably, enabling training with more aggressive data augmentation techniques (remeshing, vertex perturbation) that can potentially improve the generalization error.

Other limitations reside in the core concept behind MeshCNN: the extrapolation of the convolution and pooling operations powering CNNs to 3D meshes. MeshCNN's edge-oriented pooling exhibits some problems that its pixel-based counterpart does not have. One of these limitations, which affects exclusively open meshes, is that mesh pooling uses boundary-preserving edge collapsing, so edges in the 1-ring neighbourhoods of mesh boundaries are never collapsed. A reason behind this design choice may be that a mesh boundary can carry information relevant to classification tasks (e.g. classification of open meshes representing polygons). However, we have shown how in segmentation tasks this policy cripples the edge classification accuracy near boundaries. A potential future line of investigation could evaluate the impact of unconstrained edge collapsing on MeshCNN's performance for segmentation tasks on open meshes.

A more general limitation of mesh pooling is that the target mesh size can only be achieved if the necessary number of edge collapses can be performed without breaking the mesh manifold. Image pooling, by contrast, has no such limit, and in fact images are typically pooled down to much smaller sizes (e.g. 10% of the original size) in order to learn global features. For meshes, the minimum achievable pooling size varies between samples, but lies around 40% of the original size for ABC dataset models. This means that mesh features at the global scale cannot be learned by MeshCNN, since they are never pooled down enough to fit within the CNN's receptive field (i.e. the convolution filter size). While some surface classes may be identifiable from local features such as local curvature (e.g. sphere, cylinder, cone, torus), more general surface types, like revolution and extrusion surfaces, may only be characterized by global properties (e.g. axial symmetry in revolution surfaces). This inability to learn global features may be further exacerbated by MeshCNN's adaptive edge-collapsing policy, under which pooling happens non-uniformly across the mesh, increasing the chances that large-scale features are missed.

The following example clearly shows how, after the last pooling stage, most of the edge collapses were performed on the planar surfaces, leaving many of the original edges in the extrusion surfaces intact. Features calculated for the uncollapsed edges are therefore strictly local and as such indistinguishable from those that would be produced for a cylindrical surface. The global features of the extrusion, namely the relation between the two planar surfaces at both ends of the extrusion and the extrusion surface itself, remain mostly out of reach of the model's receptive field.

Model 12235 last pool (prediction)
Model 12235 last pool (ground truth)

Appendix

References

  • [1] Rana Hanocka, Amir Hertz, Noa Fish, Raja Giryes, Shachar Fleishman, and Daniel Cohen-Or. 2019. MeshCNN: A Network with an Edge. ACM Trans. Graph. 38, 4 (2019). https://ranahanocka.github.io/MeshCNN
  • [2] Hang Su, Subhransu Maji, Evangelos Kalogerakis, and Erik Learned-Miller. 2015. Multi-view Convolutional Neural Networks for 3D Shape Recognition. In International Conference on Computer Vision (ICCV).
  • [3] Zhirong Wu, Shuran Song, Aditya Khosla, Fisher Yu, Linguang Zhang, Xiaoou Tang, and Jianxiong Xiao. 2015. 3D ShapeNets: A Deep Representation for Volumetric Shapes. In Computer Vision and Pattern Recognition (CVPR).
  • [4] Sebastian Koch, Albert Matveev, Zhongshi Jiang, Francis Williams, Alexey Artemov, Evgeny Burnaev, Marc Alexa, Denis Zorin, and Daniele Panozzo. 2019. ABC: A Big CAD Model Dataset for Geometric Deep Learning. In Computer Vision and Pattern Recognition (CVPR). https://deep-geometry.github.io/abc-dataset
  • [5] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI).
  • [6] Foivos Diakogiannis, François Waldner, Peter Caccetta, and Chen Wu. 2020. ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data. ISPRS Journal of Photogrammetry and Remote Sensing 162, 94–114.
  • [7] Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In ICLR.
  • [8] Ronaldo C. Prati, Gustavo E. A. P. A. Batista, and Maria Carolina Monard. 2004. Class Imbalances versus Class Overlapping: An Analysis of a Learning System Behavior. In MICAI 2004: Advances in Artificial Intelligence, 312–321. Springer.
  • [9] Alexandros Papageorgiou and Nikos Platis. 2015. Triangular mesh simplification on the GPU. The Visual Computer 31.
  • [10] Thomas Odaker, Dieter Kranzlmüller, and Jens Volkert. 2016. GPU-Accelerated Real-Time Mesh Simplification Using Parallel Half Edge Collapses.
  • All interactive meshes in this page are rendered using Meshplot