Researchers Davide Boscaini and Fabio Poiesi from our partner Fondazione Bruno Kessler (FBK) recently published the scientific paper “PatchMixer: Rethinking network design to boost generalization for 3D point cloud understanding.” Their contributions include a novel deep learning architecture that processes 3D point cloud data using only MLP layers, in order to minimize inductive biases. They also describe a learnable patch-level feature aggregation mechanism, performed by an attentive token mixer module, and provide a comprehensive transfer learning evaluation that was previously missing from the literature.
As part of an industrial, end-user-driven project to provide an ecosystem of human-centred, AI-based solutions, the AI-PRISM researchers are investigating the best technical means to develop the system and infrastructure that will enable the integration, interaction, and deployment of our solutions. Our partner FBK is working on the Human Centred Collaborative Platform, which aims to generate a digitalized environment through sensor data fusion.
Today, we are interviewing Dr Davide Boscaini from FBK, who is involved in this research within the AI-PRISM project, to discuss the findings and their impact. Davide is a Research Scientist at the Technologies of Vision Lab of the Fondazione Bruno Kessler in Trento, Italy.
Hello, Dr Boscaini, it is a pleasure to have you with us!
Could you give us further details of the challenges of assessing the quality of a deep network architecture?
During our technology transfer activities, we found that methods reaching state-of-the-art performance on official benchmarks often underperform in real-world scenarios. Investigating further, we found that one reason for this behaviour is that official benchmarks measure performance only across data distributions sampled from the same domain. In this paper we show that this choice does not suffice to comprehensively assess the effectiveness of a model, because the datasets provided by official benchmarks have limited cardinality and their training and test distributions are similar. We instead propose to compare architectures by assessing how well each one generalizes to domains different from the one it was trained on.
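To make the evaluation idea concrete, here is a minimal sketch of a cross-domain (transfer) evaluation loop. It is not the protocol from the paper: the domain names `synthetic` and `scanned`, the toy one-dimensional data, and the nearest-centroid classifier are all hypothetical stand-ins chosen so the example runs on its own. The point is only the structure: train on one domain, then report accuracy on every domain's test split, not just the matching one.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]            # (feature vector, class label)
Predictor = Callable[[Sequence[float]], int]


def train_nearest_centroid(train: List[Sample]) -> Predictor:
    """Toy stand-in for a real network: classify by the closest class mean."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x: Sequence[float]) -> int:
        def sq_dist(label: int) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
        return min(centroids, key=sq_dist)

    return predict


def accuracy(predict: Predictor, samples: List[Sample]) -> float:
    return sum(1 for x, y in samples if predict(x) == y) / len(samples)


def transfer_matrix(domains: Dict[str, Tuple[List[Sample], List[Sample]]]):
    """Map (source, target) to accuracy of a model trained on source, tested on target."""
    results = {}
    for src, (train_split, _) in domains.items():
        predict = train_nearest_centroid(train_split)
        for tgt, (_, test_split) in domains.items():
            results[(src, tgt)] = accuracy(predict, test_split)
    return results


# Hypothetical toy domains: "scanned" is "synthetic" shifted by +0.8.
domains = {
    "synthetic": ([([0.0], 0), ([0.2], 0), ([1.0], 1), ([1.2], 1)],
                  [([0.1], 0), ([1.1], 1)]),
    "scanned":   ([([0.8], 0), ([1.0], 0), ([1.8], 1), ([2.0], 1)],
                  [([0.9], 0), ([1.9], 1)]),
}

results = transfer_matrix(domains)
for (src, tgt), acc in results.items():
    print(f"train={src:9s} test={tgt:9s} accuracy={acc:.2f}")
```

On this toy data the in-domain accuracies are 1.00 while both cross-domain accuracies drop to 0.50, which is the kind of gap a same-domain benchmark would never reveal.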
Can you walk us through the approach described in your research?
In this work, we propose PatchMixer, a simple yet effective architecture that extends the ideas behind the recent MLP-Mixer paper to 3D point clouds. Our approach has two novelties. First, it processes local patches instead of the whole shape, which promotes robustness to partial point clouds. Second, it aggregates patch-wise features using an MLP, a simpler alternative to the graph convolutions or attention mechanisms used in prior works. PatchMixer introduces fewer inductive priors than competing architectures and exhibits improved generalization abilities.
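The two ideas described above, local patches and MLP-based aggregation across patches, can be illustrated with a short forward-pass sketch in NumPy. This is not the authors' implementation: the patch sampling strategy (random centres plus nearest neighbours), the layer sizes, the max and mean pooling, and all function names are assumptions made for illustration, the weights are random and untrained, and components such as normalization layers and channel mixing are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_mlp(d_in, d_hidden, d_out):
    """Random weights for a two-layer perceptron (illustrative, untrained)."""
    return (rng.normal(0, 0.1, (d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(0, 0.1, (d_hidden, d_out)), np.zeros(d_out))


def mlp(x, w1, b1, w2, b2):
    """Two-layer perceptron with ReLU, applied along the last axis of x."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2


def extract_patches(points, num_patches, patch_size):
    """Assumed grouping: random patch centres, each with its nearest neighbours."""
    centers = points[rng.choice(len(points), num_patches, replace=False)]
    dists = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=-1)
    idx = np.argsort(dists, axis=1)[:, :patch_size]       # (P, K) point indices
    return points[idx] - centers[:, None, :]              # (P, K, 3), centred


def init_params(num_patches=16, channels=64, num_classes=10):
    return {
        "point": init_mlp(3, channels, channels),
        "mix": init_mlp(num_patches, 2 * num_patches, num_patches),
        "head": (rng.normal(0, 0.1, (channels, num_classes)),
                 np.zeros(num_classes)),
    }


def forward(points, params, num_patches=16, patch_size=32):
    patches = extract_patches(points, num_patches, patch_size)   # (P, K, 3)
    point_feats = mlp(patches, *params["point"])                 # (P, K, C)
    tokens = point_feats.max(axis=1)                             # (P, C) one token per patch
    # Token mixing: an MLP applied across the patch axis, with a residual connection.
    tokens = tokens + mlp(tokens.T, *params["mix"]).T            # (P, C)
    shape_feat = tokens.mean(axis=0)                             # (C,) global descriptor
    w, b = params["head"]
    return shape_feat @ w + b                                    # (num_classes,) logits


cloud = rng.normal(size=(1024, 3))        # stand-in for a real point cloud
logits = forward(cloud, init_params())
print(logits.shape)                       # (10,)
```

The token-mixing step is the part borrowed from MLP-Mixer: transposing the patch tokens lets a plain MLP exchange information between patches, with no neighbourhood graph or attention weights involved.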
What were the results of the experiments conducted, and how will they be used for AI-PRISM developments?
We evaluated our method on the shape classification and part segmentation tasks, achieving superior generalization performance compared to a selection of the most relevant deep architectures. PatchMixer can be listed as one of the AI-based Perception Enhancing modules (T4.2) of the AI-PRISM project, and it will be used to classify the 3D shapes created by the Ambient Digitalization AI-PRISM module (T3.2) without requiring manual annotations of AI-PRISM data.
Finally, which other stakeholders can benefit from your contributions, and what is their impact on their industries or sectors?
PatchMixer impacts academia by proposing a novel approach to measure the quality of a 3D deep learning architecture design.
PatchMixer is of interest to any stakeholder processing 3D shapes for which it is not practical to provide manual annotation.
Thank you very much, Dr Boscaini.
Thanks for having me.
The full paper is available here!
Do you want to learn more about AI-PRISM research and developments? Subscribe to our newsletter and follow us on LinkedIn and Twitter, so you don’t miss a thing!