
Machine learning and deep learning based predictive quality in manufacturing: a systematic review


TL;DR Summary

This review systematically analyzes 2012-2021 studies on ML and DL for predictive quality in manufacturing, categorizing methods and data usage, identifying challenges, and outlining future research to advance data-driven quality assurance.

Abstract

Journal of Intelligent Manufacturing (2022) 33:1879–1905. https://doi.org/10.1007/s10845-022-01963-8. Hasan Tercan · Tobias Meisen. Received: 22 December 2021 / Accepted: 5 May 2022 / Published online: 28 May 2022. © The Author(s) 2022.

With the ongoing digitization of the manufacturing industry and the ability to bring together data from manufacturing processes and quality measurements, there is enormous potential to use machine learning and deep learning techniques for quality assurance. In this context, predictive quality enables manufacturing companies to make data-driven estimations about the product quality based on process data. In the current state of research, numerous approaches to predictive quality exist in a wide variety of use cases and domains. Their applications range from quality predictions during production using sensor data to automated quality inspection in the field based on measurement data. However, there is currently a lack of an overall view of where predictive quality research stands as a whole, what approaches are currently being investigated …


1. Bibliographic Information

1.1. Title

Machine learning and deep learning based predictive quality in manufacturing: a systematic review

1.2. Authors

Hasan Tercan and Tobias Meisen

1.3. Journal/Conference

The paper was published in the Journal of Intelligent Manufacturing (2022), volume 33, pages 1879–1905 (https://doi.org/10.1007/s10845-022-01963-8), with online publication on 28 May 2022. The journal is published by Springer Nature and is a well-established venue for research at the intersection of manufacturing, artificial intelligence, and industrial informatics.

1.4. Publication Year

Received: 22 December 2021 / Accepted: 5 May 2022 / Published online: 28 May 2022. The publication year is 2022. The review covers literature from 2012 to 2021.

1.5. Abstract

This systematic review investigates the application of machine learning (ML) and deep learning (DL) techniques for predictive quality in manufacturing. Motivated by the increasing digitization of manufacturing and the availability of process and quality data, the paper highlights the potential for data-driven quality assurance. Despite numerous existing approaches across diverse use cases—from in-production quality predictions using sensor data to automated quality inspection via measurement data—a comprehensive overview of the field's current state, prevalent approaches, and existing challenges is lacking. To address this, the authors conducted a systematic review of scientific publications from 2012 to 2021. They categorized these publications based on the manufacturing processes addressed, the data bases utilized, and the ML models employed. The review aims to provide key insights into the field's scope, identify gaps and similarities in solution approaches, and derive open challenges. Finally, it offers an outlook on future research directions to overcome these challenges.

The source link is provided as /files/papers/6903060f59708f78ec6faed6/paper.pdf. Based on the abstract and the provided text content, this appears to be the officially published version of the paper (DOI: 10.1007/s10845-022-01963-8).

2. Executive Summary

2.1. Background & Motivation

The core problem the paper aims to solve is the lack of a comprehensive and up-to-date overview of the research landscape concerning predictive quality in manufacturing, particularly using machine learning (ML) and deep learning (DL).

This problem is important because:

  • The manufacturing industry is undergoing significant digitization (Industry 4.0), leading to an abundance of process data and quality measurements.

  • This data presents an enormous potential for quality assurance through ML and DL techniques.

  • Predictive quality allows manufacturing companies to make data-driven estimations about product quality based on process data, enabling proactive decision-making to avoid defects and improve efficiency.

    Specific challenges or gaps in prior research include:

  • While numerous predictive quality approaches exist across various use cases and domains (e.g., inline fault predictions, automated quality inspection), they are often considered in isolation.

  • This isolated view makes it difficult to compare proposed approaches and understand the overall state of predictive quality research.

  • Existing reviews are either too broad (e.g., general ML applications in production) or outdated, failing to cover recent advancements in ML and DL.

    The paper's entry point or innovative idea is to conduct a systematic, comprehensive, and up-to-date review specifically focused on ML and DL based predictive quality in manufacturing over the last decade (2012-2021), categorizing publications by manufacturing processes, data bases, and ML models to identify trends, gaps, and future directions.

2.2. Main Contributions / Findings

The paper's primary contributions are:

  • Comprehensive Overview: Providing a systematic review of 81 scientific publications from 2012 to 2021 on ML and DL based predictive quality in manufacturing.

  • Categorization Framework: Establishing a clear categorization scheme based on manufacturing processes (DIN 8580), quality criteria, data sources, input variables, data modality, learning tasks, and ML/DL models.

  • Key Insights: Identifying the scope of predictive quality applications, common data sources and characteristics, and prevalent ML/DL models used across different manufacturing domains.

  • Identification of Gaps and Challenges: Pinpointing significant research gaps, such as imbalances in covered manufacturing processes, lack of integration into real production, small data amounts, scarcity of benchmark datasets, and underutilization of novel DL methods.

  • Future Research Directions: Proposing concrete future research directions, including synthetic data generation, benchmark data sets, exploration of novel deep learning methods (e.g., Transformer networks, Graph Neural Networks), advanced time series classification and forecasting, transfer learning and continual learning, and strategies for integration and deployment in industrial settings with MLOps and certification.

    Key conclusions or findings reached by the paper include:

  • Predictive quality is a versatile and powerful tool, primarily validated for prognostic quality and accuracy.

  • Cutting and joining processes dominate the research, while others like coating and changing material properties are underrepresented.

  • The majority of studies rely on real manufacturing data, often generated experimentally with small sample sizes (median of 144 samples).

  • Process parameters and sensor data are the most common input variables, sometimes combined for better performance. Product design and material properties are often overlooked.

  • Numerical/continuous and image data are the most frequent data modalities. Time series data is often transformed into scalar features.

  • Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) are the most popular prime models, especially in recent years, with CNNs excelling in image data tasks.

  • There's a significant need for larger, more representative benchmark datasets and better methods for synthetic data generation.

  • The process integration and real-time capability of predictive quality solutions in actual manufacturing environments remain a significant challenge, with a lack of evaluation using quality-oriented metrics.

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To understand this paper, a reader should be familiar with the following fundamental concepts:

  • Industry 4.0 (I4.0): This refers to the fourth industrial revolution, characterized by the digitalization and integration of manufacturing processes. It involves the use of cyber-physical systems, the Internet of Things (IoT), cloud computing, and artificial intelligence to create "smart factories." The goal is to achieve greater automation, real-time data exchange, and decentralized decision-making, leading to increased efficiency, flexibility, and productivity in manufacturing.

  • Digitization of Manufacturing: The process of converting information from analog to digital form and integrating digital technologies into manufacturing operations. This includes collecting data from machines and sensors, using digital twins, and implementing advanced analytics and automation.

  • Quality Assurance (QA): A system of activities designed to ensure that products or services meet specified quality requirements. In manufacturing, it involves monitoring and verifying processes and products throughout their lifecycle to prevent defects and ensure customer satisfaction.

  • Predictive Quality: A data-driven approach to quality assurance that uses machine learning and deep learning models to estimate the quality of a product or process based on various input data (e.g., process parameters, sensor readings). The goal is to predict potential quality issues before they occur or are fully realized, enabling proactive intervention to prevent defects, reduce waste, and optimize production. The paper defines it as: "Predictive quality comprises the use of machine learning and deep learning methods in production to estimate product-related quality based on process and product data with the goal of deriving quality-enhancing insights."

  • Machine Learning (ML): A subfield of artificial intelligence that enables systems to learn from data without being explicitly programmed. ML algorithms build a mathematical model based on sample data, known as "training data," to make predictions or decisions without being specifically programmed to perform the task. Key types of ML relevant here are:

    • Supervised Learning: This is the primary focus of predictive quality. In supervised learning, the algorithm learns from labeled data, meaning the input data is paired with the correct output (target variable). The goal is to learn a mapping function from input to output so that it can predict the output for new, unseen input data.
      • Classification: A supervised learning task where the model predicts a categorical (discrete) output. For example, classifying a product as "OK" or "Not OK" (OK/NOK), or identifying different types of defects (e.g., "crack," "porosity," "roughness").
      • Regression: A supervised learning task where the model predicts a numerical (continuous) output. For example, predicting the exact value of surface roughness, tensile strength, or product dimensions.
    • Unsupervised Learning: Algorithms learn from unlabeled data, identifying patterns or structures within the data without prior knowledge of correct outputs. While not the main focus, anomaly detection is a related concept often addressed by unsupervised learning, though explicitly excluded from this review's scope as anomalies are not initially associated with a known defect.
    • Reinforcement Learning: Algorithms learn to make decisions by performing actions in an environment and receiving rewards or penalties based on their outcomes. Not a primary focus of this paper.
  • Deep Learning (DL): A subfield of ML that uses artificial neural networks with multiple layers (hence "deep") to learn representations of data with multiple levels of abstraction. DL models have shown remarkable success in tasks involving large amounts of data, especially for image, speech, and text processing.

  • Artificial Neural Networks (ANNs): Computational models inspired by the structure and function of biological neural networks. They consist of interconnected nodes (neurons) organized in layers (input, hidden, output). Each connection has a weight, and neurons apply an activation function to their weighted sum of inputs. ANNs can learn complex, non-linear relationships.

    • Multilayer Perceptron (MLP): A type of feed-forward ANN characterized by multiple layers of neurons (at least three: an input layer, one or more hidden layers, and an output layer). MLPs are versatile and widely used for both classification and regression tasks.
    • Convolutional Neural Network (CNN): A specialized type of ANN primarily used for processing data with a grid-like topology, such as image data. CNNs employ convolutional layers that automatically learn spatial hierarchies of features from the input, making them highly effective for image recognition, object detection, and visual inspection tasks.
    • Recurrent Neural Network (RNN): A type of ANN designed to process sequential or time series data. Unlike feed-forward ANNs, RNNs have connections that form directed cycles, allowing them to maintain an internal state (memory) and use information from previous inputs in the sequence to influence the processing of current inputs.
      • Long Short-Term Memory (LSTM): A special kind of RNN capable of learning long-term dependencies. LSTMs have internal memory cells and gating mechanisms (input, output, and forget gates) that regulate the flow of information, effectively addressing the vanishing gradient problem common in traditional RNNs and enabling them to capture patterns over extended sequences.
    • Transformer Networks: A DL model, primarily used in natural language processing (NLP), that relies entirely on self-attention mechanisms, eschewing recurrence and convolutions. They are highly effective for sequential data and have recently been adapted for computer vision (e.g., Vision Transformers).
  • Support Vector Machine (SVM): A supervised learning model used for classification and regression. SVMs find an optimal hyperplane in a high-dimensional space that best separates different classes (or fits data points in regression), maximizing the margin between them. They are effective for non-linear classification by using kernel tricks.

  • Random Forest: An ensemble learning method for classification and regression. It operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (for classification) or mean prediction (for regression) of the individual trees. It reduces overfitting and improves accuracy compared to single decision trees.

  • Decision Tree: A non-parametric supervised learning method used for classification and regression. It creates a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. It structures decisions in a tree-like model of choices and their possible consequences.

  • Data Modality: Refers to the type or format of data collected. Common modalities in manufacturing include:

    • Numerical/Continuous Data: Quantitative data with scalar values that can take any value within a given range (e.g., temperature, pressure, dimensions).
    • Categorical/Discrete Data: Qualitative data that can take on a limited number of values, often representing types or categories (e.g., tool type, material batch, OK/NOK status).
    • Time Series Data: A sequence of data points indexed in time order, typically collected from sensors over a period (e.g., vibration signals, welding current over time).
    • Image Data: Visual data, often 2D images captured by cameras or other imaging sensors, used for visual inspection and defect detection.
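
To make the supervised learning distinction above concrete, the following is a minimal, hedged sketch (not taken from the paper): it trains a regression model for a continuous quality value and a classification model for an OK/NOK label on synthetic process-parameter data. All variable names, parameter ranges, and model settings are illustrative assumptions.

```python
# Minimal sketch of the two supervised learning tasks on synthetic data.
# Feature and target definitions are hypothetical, not from any reviewed study.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))  # e.g., feed rate, cutting speed, depth of cut
roughness = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 200)  # continuous quality value
ok_nok = (roughness > 1.5).astype(int)  # discrete quality label: 1 = NOK, 0 = OK

# Regression: predict the numerical quality value (e.g., surface roughness)
reg = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000).fit(X, roughness)

# Classification: predict the quality class (OK/NOK)
clf = RandomForestClassifier(n_estimators=100).fit(X, ok_nok)

print(reg.predict(X[:1]), clf.predict(X[:1]))
```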

3.2. Previous Works

The paper explicitly mentions and differentiates itself from several related survey papers:

  • Köksal et al. (2011): This study conducted an extensive literature review of data mining applications for quality improvement tasks in manufacturing.
    • Differentiation: While relevant, this study dates back to 2011, meaning it does not cover the significant advancements in ML and DL that have occurred since, especially with the rise of deep learning. The current paper aims to provide an up-to-date view.
  • Rostami et al. (2015): This paper proposed a similar approach but with a specific focus on applications of Support Vector Machines (SVMs) in manufacturing quality assessment.
    • Differentiation: Similar to Köksal et al., this review is also several years old and narrow in scope, focusing only on SVMs. It does not encompass the broader range of ML and DL methods, particularly newer deep learning architectures.
  • Weichert et al. (2019): This review focused on machine learning applications for production process optimization with regard to product- or process-specific metrics, often based on root-cause analysis and fault diagnosis.
    • Differentiation: While there are overlaps, Weichert et al. primarily addressed process optimization approaches. The current paper, in contrast, focuses specifically on quality estimation (prediction and classification) and evaluates approaches based on the data and methods used for that purpose.
  • Broader ML/AI in Production Surveys: The paper also acknowledges other extensive studies that cover AI and ML techniques in the production and manufacturing context but with a broader scope:
    • Shang and You (2019): Overview of data analytics for various application task areas like process monitoring and optimization, also discussing usability and interpretability.
    • Fahle et al. (2020) and Mayr et al. (2019): Studied ML applications in different task scenarios such as process planning and control.
    • Sharp et al. (2018): Focused on cross-domain applications in the product lifecycle.
    • Related Fields: Surveys on ML-based predictive maintenance (Dalzochio et al., 2020; Zonta et al., 2020), condition monitoring (Serin et al., 2020b), and machine fault diagnosis (Ademujimi et al., 2017).
    • Differentiation: The current paper distinguishes itself by explicitly focusing on approaches that primarily address the quality of the products produced, rather than broader ML applications in manufacturing, process optimization, predictive maintenance, or fault diagnosis for machinery.

3.3. Technological Evolution

The evolution of technologies in this field can be traced through several stages:

  1. Early Data-Driven Quality (Pre-2010s): Initial attempts at using data for quality involved traditional statistical process control (SPC), Six Sigma, and basic data mining techniques. These methods often relied on simpler linear models or expert-rule systems. Reviews like Köksal et al. (2011) represent this era, focusing on data mining for quality improvement.

  2. Rise of Traditional Machine Learning (Early 2010s): With increased computational power and algorithm development, more sophisticated ML models like SVMs, Random Forests, and MLPs became accessible. Rostami et al. (2015) focusing on SVMs highlights the growing interest in specific ML algorithms for quality applications.

  3. Deep Learning Revolution (Mid-2010s onwards): The breakthrough of deep learning, particularly CNNs for image data and RNNs/LSTMs for sequential data, transformed AI capabilities. This period saw DL models achieve state-of-the-art performance in complex tasks like visual inspection and time series prediction. The current paper's timeframe (2012-2021) directly captures this shift, showing a significant increase in CNN and LSTM usage.

  4. Industry 4.0 Integration: Parallel to ML/DL advancements, the Industry 4.0 paradigm has driven the integration of sensor technologies, data collection infrastructure, and digital twins into manufacturing. This creates the necessary data ecosystem for predictive quality solutions to thrive.

    This paper fits within the technological timeline by documenting the shift from earlier data mining and traditional ML approaches to the widespread adoption and exploration of deep learning models within the context of Industry 4.0 for the specific application of predictive quality.

3.4. Differentiation Analysis

Compared to the main methods in related work, the core differences and innovations of this paper's approach are:

  • Specific Focus: Unlike broader surveys of ML in manufacturing, this paper has a precise and narrow focus: machine learning and deep learning specifically for predictive quality of products (prediction/classification of product-related quality based on process/product data). This excludes related but distinct fields like predictive maintenance, fault diagnosis (for machines), and general process optimization.
  • Time Scope: By analyzing publications from 2012 to 2021, it captures the most recent decade of research, encompassing the rapid advancements in deep learning that older reviews missed.
  • Systematic Categorization: The review employs a structured methodology (defined guiding questions, detailed categories like DIN 8580 for processes, various data characteristics, and model types) to provide a granular and comparable analysis across studies. This structured approach allows for the identification of specific trends, commonalities, and gaps that a less systematic review might overlook.
  • Identification of Gaps and Future Directions: Beyond summarizing existing work, the paper actively derives open challenges and future research directions grounded in the systematic analysis, offering a forward-looking perspective crucial for guiding future academic and industrial efforts. This includes highlighting the need for synthetic data, benchmark datasets, and the adoption of novel deep learning architectures like Transformers.

4. Methodology

4.1. Principles

The core idea of the method used in this paper is to conduct a systematic review of scientific literature to provide a comprehensive, structured overview of machine learning and deep learning applications for predictive quality in manufacturing. The theoretical basis or intuition behind this approach is that by systematically collecting, categorizing, and analyzing a defined body of literature, one can gain a clear understanding of the current state of research, identify prevalent trends, uncover existing gaps, and formulate informed directions for future work. This methodology ensures rigor and reduces bias compared to anecdotal or less structured literature reviews.

The authors define predictive quality as: "Predictive quality comprises the use of machine learning and deep learning methods in production to estimate product-related quality based on process and product data with the goal of deriving quality-enhancing insights."

The common approach to predictive quality includes four main steps (schematically illustrated in Fig. 1):

  1. Formulation of the manufacturing process and target quality: Clearly defining what process is being analyzed and what specific quality aspect (e.g., surface roughness, defect types) is to be predicted.

  2. Selection and collection of process and quality data: Gathering relevant data from the manufacturing environment, which could include process parameters, sensor data, or product measurements.

  3. Training of an ML/DL model: Using the collected data to train a machine learning or deep learning model to learn the relationship between input data and product quality.

  4. Use of the model for estimations and decision support: Deploying the trained model to make predictions about product quality, which can then inform decisions for quality enhancement, process adjustment, or automated inspection.

    The following figure (Figure 1 from the original paper) illustrates the four main steps of the common approach to predictive quality:

    The figure is a schematic diagram of the machine learning and deep learning based predictive quality workflow in manufacturing, comprising three main parts: the manufacturing process, the data basis, and model training and prediction.
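
As a concrete (and deliberately simplified) illustration of these four steps, the sketch below trains a regression model on synthetic stand-in data and uses it for a quality decision. The process, target quality, feature layout, and decision threshold are assumptions for illustration only; they do not reproduce any specific approach from the reviewed publications.

```python
# Hedged sketch of the four-step predictive quality workflow on synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(42)

# Step 1: formulate process and target quality (here: a machining process -> surface roughness)
# Step 2: select and collect process data (stand-in for process parameters / sensor features)
X = rng.uniform(size=(500, 4))
y = 1.2 * X[:, 0] - 0.8 * X[:, 2] + rng.normal(0, 0.05, 500)

# Step 3: train an ML model on the collected data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = make_pipeline(StandardScaler(), MLPRegressor(hidden_layer_sizes=(64,), max_iter=3000))
model.fit(X_train, y_train)

# Step 4: use the model for estimations and decision support
predicted_roughness = model.predict(X_test)
needs_inspection = predicted_roughness > 0.8  # hypothetical quality threshold
print(f"Flagged {needs_inspection.sum()} of {len(needs_inspection)} parts for inspection")
```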

4.2. Core Methodology In-depth (Layer by Layer)

The systematic review methodology employed in this paper follows a structured process to ensure comprehensiveness and relevance.

4.2.1. Guiding Questions (Q1, Q2, Q3)

The review is driven by three main questions, which serve as the framework for data extraction and analysis:

  • Q1: What are the addressed manufacturing processes and quality criteria? This question aims to understand the scope of predictive quality applications, identify common manufacturing processes and quality criteria being studied, and uncover similarities or gaps in domains.
  • Q2: What are the characteristics of the data used for model training? This focuses on the data sources (e.g., running production, simulation), input variables (e.g., sensor data), data modality (e.g., time series, categorical data), and data amount utilized.
  • Q3: Which machine learning models of supervised learning are commonly trained? This question explores the supervised learning tasks (e.g., classification, regression), specific ML and DL models employed for quality estimations, and whether they are compared against other models.

4.2.2. Categorization Scheme

To answer the guiding questions, the authors developed a set of categories for reviewing and summarizing the publications. The following are the categories (Table 1 from the original paper):

The following are the results from Table 1 of the original paper:

Question Category Description
Q1 Use case Main use case of paper and purpose of predictive quality
Process Addressed manufacturing process (e.g. laser welding, deep drawing)
Category Category of process according to DIN 8580 (e.g. cutting, forming)
Criterion Estimated quality criterion (e.g. product dimensions, OK/NOK quality)
Q2 Data source Main source of process data (e.g. running production, simulation)
Input variables Data parameters used for the model training (e.g. sensor data)
Data modality Data types of gathered data (e.g. time series, categorical data)
Data amount Number of observations used for model training
Q3 Learning task Formulated learning task (e.g. classification, regression)
Prime model Primarily used (or best performing) ML/DL model (e.g. SVM, CNN)
Baselines ML/DL-models used for comparison to prime model (e.g. SVM, CNN)
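
One straightforward way to operationalize this scheme is to record each reviewed publication as a structured record with one field per category. The sketch below shows such a record as a Python dataclass; the field names mirror Table 1, while the example values are invented for illustration.

```python
# Hedged sketch: encoding the Table 1 categorization scheme as one record per publication.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ReviewedPublication:
    use_case: str          # Q1: main use case / purpose of predictive quality
    process: str           # Q1: addressed manufacturing process
    category: str          # Q1: process category according to DIN 8580
    criterion: str         # Q1: estimated quality criterion
    data_source: str       # Q2: e.g., "running production", "simulation"
    input_variables: str   # Q2: e.g., "sensor data"
    data_modality: str     # Q2: e.g., "time series"
    data_amount: int       # Q2: number of observations used for training
    learning_task: str     # Q3: "classification" or "regression"
    prime_model: str       # Q3: primarily used or best-performing model
    baselines: List[str] = field(default_factory=list)  # Q3: models used for comparison

# Invented example entry (not an actual publication from the corpus)
example = ReviewedPublication(
    use_case="in-process quality prediction", process="laser welding",
    category="joining", criterion="tensile strength", data_source="experimental",
    input_variables="process parameters", data_modality="numerical/continuous",
    data_amount=144, learning_task="regression", prime_model="MLP",
    baselines=["SVM", "Random Forest"],
)
```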

4.2.3. Literature Search Strategy

The literature search was performed in the Web of Science and ScienceDirect databases. To cover the broad scope of predictive quality and manufacturing, a comprehensive set of search terms was defined and categorized into three groups:

  • Predictive Quality: Terms directly related to quality prediction (predictive quality, predictive analytics, fault prediction, fault classification, defect prediction, quality prediction, smart manufacturing).

  • Learning: Terms related to the core methodologies (deep learning, neural network, machine learning).

  • Domain: Terms specifying the application area (manufacturing, production, industrial, engineering, automation, assembly).

    The following are the results from Table 2 of the original paper:

    Category Search terms
    Predictive Quality Predictive quality, predictive analytics, fault prediction, fault classification, defect prediction, quality prediction, smart manufacturing
    Learning Deep learning, neural network, machine learning
    Domain Manufacturing, production, industrial, engineering, automation, assembly

Search queries were formulated to find publications containing at least one term from each of these three categories. The search was restricted to publications from 2012 to 2021 (conducted on June 29, 2021). This initial search yielded 1,261 potentially relevant publications.
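
The analysis does not reproduce the exact query syntax the authors used, but the combination logic (at least one term from each of the three categories) can be sketched as follows; the OR/AND grouping is an assumption about how such a query would typically be assembled.

```python
# Hedged sketch: assembling a boolean search query from the Table 2 term groups.
predictive_quality = ["predictive quality", "predictive analytics", "fault prediction",
                      "fault classification", "defect prediction", "quality prediction",
                      "smart manufacturing"]
learning = ["deep learning", "neural network", "machine learning"]
domain = ["manufacturing", "production", "industrial", "engineering",
          "automation", "assembly"]

def or_group(terms):
    # OR-combine all terms of one category
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

# AND-combine the categories so every hit contains at least one term per category
query = " AND ".join(or_group(group) for group in (predictive_quality, learning, domain))
print(query)
```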

4.2.4. Screening and Exclusion Criteria

A multi-stage screening process was applied:

  1. Initial Screening (Title and Abstract): Publications were first screened based on their titles and abstracts. Many were discarded if they fell into unrelated fields such as predictive maintenance, fault diagnosis, remaining useful lifetime prediction, software defect prediction, water quality prediction, process engineering, or civil engineering. This step reduced the pool to 144 publications.
  2. Detailed Review (Full Text) and Exclusion Criteria: The remaining 144 publications were then read in detail and categorized according to the scheme in Table 1. During this process, publications were excluded if they met any of the following criteria:
    • Do not contain information about the addressed manufacturing process or the data basis.

    • Are survey papers or literature studies (as this review aims to analyze primary research).

    • Do not perform any development, implementation, or evaluation of methods.

    • Are not accessible to the reviewers.

      This rigorous selection process resulted in a final corpus of 81 publications for the systematic review. The overall methodology is summarized in the following figure (Figure 2 from the original paper):

      Fig. 2 Methodology of the literature search. The figure is a flow diagram showing the number of publications initially retrieved from the two databases, Web of Science and ScienceDirect, and the number remaining for the review after pre-screening and application of the exclusion criteria.

4.2.5. Final Corpus

The final corpus consisted of 81 selected publications. The majority (69%) were published in journals, and 31% in scientific conference proceedings. The number of publications per year showed a consistent increase, particularly between 2012 and 2020.

The following figure (Figure 3 from the original paper) illustrates the number of publications per year:

Fig. 3 Number of publications per year. The literature search was performed on June 29, 2021. The figure is a bar chart of annual publication counts from 2012 to 2021: the count grows steadily from 2012 to 2019, peaks at 30 publications in 2020, and drops in 2021 (partial year), reflecting the growth of research in this field.

5. Experimental Setup

As this paper is a systematic review, it does not involve traditional experimental setups with datasets, models, and evaluation metrics in the same way an empirical research paper would. Instead, its "experimental setup" is its methodology for conducting the review itself.

5.1. Datasets

The "dataset" for this systematic review is the corpus of 81 selected scientific publications.

  • Source: Publications identified through searches in Web of Science and ScienceDirect databases.
  • Scale: The corpus consists of 81 peer-reviewed papers.
  • Characteristics: These papers exclusively deal with machine learning and deep learning for predictive quality in manufacturing. They span the period from 2012 to 2021. The types include both journal articles (69%) and conference papers (31%).
  • Domain: The publications cover a wide range of manufacturing processes and quality criteria, as detailed in the results section.
  • Why chosen: This corpus was chosen to provide a comprehensive and up-to-date overview of the research landscape, capturing recent advancements in ML/DL and Industry 4.0. The stringent selection criteria ensure that only highly relevant and methodologically sound primary research papers are included.

5.2. Evaluation Metrics

The "evaluation metrics" for this systematic review are the categories defined in Table 1, which serve as criteria for extracting and synthesizing information from the selected publications. These categories are structured to address the three guiding questions (Q1, Q2, Q3) of the review.

For each publication, the following information was extracted and categorized:

  • Use case: The main application scenario and purpose of predictive quality.

  • Process: The specific manufacturing process addressed (e.g., laser welding, deep drawing).

  • Category: The manufacturing process category according to DIN 8580 (e.g., cutting, forming).

  • Criterion: The estimated quality criterion (e.g., product dimensions, OK/NOK quality).

  • Data source: The origin of the process data (e.g., running production, simulation).

  • Input variables: The data parameters used for model training (e.g., sensor data).

  • Data modality: The data types of gathered data (e.g., time series, categorical data).

  • Data amount: The number of observations used for model training.

  • Learning task: The formulated learning task (classification or regression).

  • Prime model: The primarily used (or best performing) ML/DL model.

  • Baselines: ML/DL models used for comparison to the prime model.

    These categories serve as qualitative and quantitative metrics to systematically analyze and compare the diverse approaches presented in the literature. For example, Data amount allows for quantitative analysis of data usage trends, while Prime model and Baselines enable an understanding of model prevalence and comparative studies.

5.3. Baselines

In the context of a systematic review, "baselines" don't refer to models being compared in experiments, but rather to existing literature reviews or surveys that cover similar ground. The paper differentiates its scope from these previous works to establish its unique contribution.

The main baselines (related surveys) against which this review implicitly compares its scope and timeliness are:

  • Köksal et al. (2011): This review focused on data mining for quality improvement. The current paper differentiates by being more recent and focused on ML/DL, especially deep learning.

  • Rostami et al. (2015): This review focused on SVM applications. The current paper differentiates by covering a broader range of ML/DL models and being more up-to-date.

  • Weichert et al. (2019): This review focused on process optimization. The current paper differentiates by focusing specifically on product quality estimation.

  • Broader ML/AI in Production Surveys: Other surveys on ML/AI in manufacturing (e.g., Shang & You, 2019; Fahle et al., 2020; Mayr et al., 2019; Sharp et al., 2018) or predictive maintenance (Dalzochio et al., 2020; Zonta et al., 2020) are mentioned but deemed to have a broader or different scope.

    By highlighting these prior works, the authors establish that their review fills a specific gap: providing a current, comprehensive, and focused analysis of ML/DL for predictive quality in manufacturing.

6. Results & Analysis

This section presents the findings of the systematic review, structured according to the guiding questions.

6.1. Manufacturing Process Types and Quality Criteria

The review categorized manufacturing processes based on DIN 8580:2003-09. Additionally, additive manufacturing, assembly processes, and multi-stage processes were added as separate categories.

The following figure (Figure 4 from the original paper) illustrates the number of publications for each manufacturing process type:

Fig. 4 Number of publications for each manufacturing process type. The figure is a bar chart showing that cutting processes account for the largest number of publications, followed by joining and primary shaping, while the remaining categories such as forming and additive manufacturing have comparatively fewer.

  • Overall Distribution: As seen in Figure 4, cutting processes comprise the largest group (26 publications), followed by joining (14 publications). Primary shaping and forming each have 10 publications. Additive manufacturing has 8, assembly has 5, and coating has 4. Multi-stage processes also have 4 publications. Notably, there are no publications primarily addressing changing material properties.

6.1.1. Cutting Processes

  • Dominance: This category has the most research.
  • Common Criteria: Most research focuses on quality characteristics reflecting the product's shape, with surface roughness being the most prevalent. Other criteria include hole diameter, kerf waviness, geometric deviation, material removal rate, machinability, and groove geometry.
  • Examples:
    • Turning: Surface roughness (e.g., Du et al., 2021; Elangovan et al., 2015) using multivariate regression or ANNs based on sensor data and machine parameters.

    • Drilling: Hole diameter (Neto et al., 2013) and surface roughness (Vrabel et al., 2016).

    • Laser cutting: Surface roughness (Tercan et al., 2016, 2017) and kerf waviness (Nguyen et al., 2020).

      The following are the results from Table 3 of the original paper, showing considered cutting and joining processes and quality criteria:

      Category Process Quality criteria
      Cutting Turning (6) Surface roughness (Acayaba & de Escalona, 2015; Du et al., 2021; Elangovan et al., 2015; Moreira et al., 2019; Tuar et al., 2017), machinability (Lutz et al., 2020)
       Drilling (6) Diameter (Neto et al., 2013; Schorr et al., 2020a, 2020b), surface roughness (Vrabel et al., 2016), hole defects (Jiao et al., 2020), surface gap (Bustillo et al., 2018)
      Laser cutting (4) Surface roughness (Tercan et al., 2016, 2017; Zhang & Lei, 2017), kerf waviness (Nguyen et al., 2020)
      Milling (3) Surface roughness (Hossain & Ahmad, 2014; Serin et al., 2020a), geometric deviation (de Oliveira Leite et al., 2015)
       Honing (2) Surface roughness (Gejji et al., 2020; Klein et al., 2020)
      C. M. polishing (1) Material removal rate (Yu et al., 2019)
      Diamond wire cutting (1) Surface roughness (Kayabasi et al., 2017)
      Grinding (1) Surface roughness (Varma et al., 2017)
      Laser micro grooving (1) Groove geometry (Zahrani et al., 2020)
       Laser machining (1) Dimensions (McDonnell et al., 2021)
       Joining Laser welding (4) Weld bead dimensions (Ai et al., 2016; Lei et al., 2019), tensile strength (Yu et al., 2016), quality types (Yu et al., 2020)
      Resistance spot welding (3) Tensile shear strength (Hamidinejad et al., 2012), tensile shear load bearing (Martín et al., 2016), welding deformation (Li et al., 2020a)
      Ultrasonic welding (3) Quality types (Goldman et al., 2021; Li et al., 2020b), tensile strength (Natesh et al., 2019)
      Gas metal arc welding (2) Weld penetration (Gyasi et al., 2019), weld bead dimensions (Wang et al., 2021)
      Gluing (1) Glue volume (Dimitriou et al., 2020)
      Welding (1) Residual stress (Dhas & Kumanan, 2014)

6.1.2. Joining Processes

  • Focus: The second-largest category, mainly welding applications.
  • Common Criteria: Tensile strength, weld bead dimensions, residual stress, quality types (classification).
  • Examples:
    • Laser welding: ANNs trained on welding parameters or sensor data to predict tensile strength (Yu et al., 2016) or weld bead dimensions (Ai et al., 2016).
    • Spot welding and ultrasonic welding: Similar approaches for tensile shear strength or quality types.
    • Gluing: Glue volume estimation based on 3D laser topology scans (Dimitriou et al., 2020).

6.1.3. Primary Shaping Processes

  • Focus: Producing a defined shape from shapeless material (10 publications).
  • Common Criteria: Casting defects, product dimensions, product weight, warpage, product geometry, yield stress, yarn quality (count-strength-product), leveling action point.
  • Examples:
    • Casting: CNNs on X-ray images (Ferguson et al., 2018) or MLPs on sensor data (Kim et al., 2018) for defect detection.
    • Injection molding: Product dimensions (Ke & Huang, 2020) or product weights (Ge et al., 2012) from machine parameters.

6.1.4. Forming Processes

  • Focus: Transforming raw parts into different shapes (10 publications).
  • Common Criteria: Surface defects, slab geometry, part defects, machine speed, process feasibility, shear deformation.
  • Examples:
    • Metal rolling: In-line quality estimations like surface defects detection using CNNs on camera images (Yun et al., 2020) or NOK quality prediction from ultrasonic measurements (Lieber et al., 2013).
    • Sheet metal forming: Part defects prediction using LSTMs on sensor data (Meyes et al., 2019) or simulated experiments (Dib et al., 2020).

6.1.5. Additive Manufacturing Processes

  • Focus: 8 publications, used both in design and realization phases.
  • Common Criteria: Geometric deviation, inherent strain, structural defects, single-track width, volume porosity, tensile strength, surface roughness.
  • Examples:
    • Design phase: ANNs for predicting inherent strain (Li & Anand, 2020) or geometric deviations (Zhu et al., 2020) from process simulations.

    • Realization phase: Quality predictions based on optical measurements (Gaikwad et al., 2020) or machine sensors (Li et al., 2019).

      The following are the results from Table 4 of the original paper, showing considered primary shaping, forming, and additive manufacturing processes and quality criteria:

      Process type Process Quality criteria
      Primary shaping Casting (3) Casting defects (Ferguson et al., 2018; Kim et al., 2018; Lee et al., 2018)
      Injection molding (3) Product dimensions (Ke & Huang, 2020), product weight (Ge et al., 2012), warpage (Alvarado-Iniesta et al., 2012)
      Plastics extrusion (2) Product geometry (Garcia et al., 2019), yield stress (Mulrennan et al., 2018)
       Spinning (2) Yarn strength (Nurwaha & Wang, 2012), sliver evenness (Abd-Ellatif, 2013)
       Forming Metal rolling (5) Surface defects (Li et al., 2018; Lieber et al., 2013; Liu et al., 2021; Yun et al., 2020), slab geometry (Stähl et al., 2019)
       Sheet metal forming (3) Part defects (Dib et al., 2020; Meyes et al., 2019), machine speed (Essien & Giannetti, 2020)
       Forging (1) Process feasibility (Ciancio et al., 2015)
       Textile draping (1) Shear deformation (Zimmerling et al., 2020)
       Additive manuf. Laser powder bed fusion (4) Geometric deviation (Zhu et al., 2020), inherent strain (Li & Anand, 2020), structural defects (Bartlett et al., 2020), single-track width (Gaikwad et al., 2020)
      Direct metal deposition (1) Volume porosity (Zhang et al., 2019a)
      Fused deposition modeling (2) Tensile strength (Zhang et al., 2018, 2019b)
      PLA 3D printing (1) Surface roughness (Li et al., 2019)

6.1.6. Assembly Processes

  • Focus: 5 publications, mainly on ML-based classification of successful/unsuccessful assembly tasks.
  • Common Criteria: Operation success, component position.
  • Examples: Detection of functioning products in manual assembly (Wagner et al., 2020), correct positioning in SMT assembly (Schmitt et al., 2020a), wire plug connection quality using acoustic signals (Sarivan et al., 2020), fastened screws detection using line camera images (Martinez et al., 2020).

6.1.7. Coating Processes

  • Focus: 4 publications, applying an adhesive layer.
  • Common Criteria: Defect detection (defect types), OK/NOK classification, paint structure.
  • Examples: SVM-based defect detection in dispensing (Oh et al., 2019), CNNs on machine sensor data for electric wafer quality (Hsu & Liu, 2021).

6.1.8. Multi-stage Processes

  • Focus: 4 publications, dealing with complex production lines.

  • Common Criteria: Battery capacity, state of health, product dimensions, quality types, fabric defects.

  • Challenge: Increased complexity and data sources.

  • Examples: DL-methods (e.g., LSTMs) for quality prediction in larger production lines based on multimodal sensor data (Liu et al., 2020b).

    The following are the results from the table continued after Table 4 of the original paper, showing considered assembly, multi-stage, and coating processes and quality criteria:

    Process type Process Quality criteria
    Assembly Manual assembly (2) Operation success (Wagner et al., 2020; Sarivan et al., 2020)
    Screw fastening (1) Operation success (Martinez et al., 2020)
    SMT assembly (1) Component position (Schmitt et al., 2020a)
    Snap-fit assembly (1) Operation success (Doltsinis et al., 2020)
    Multi-stage Battery-cell manufacturing (1) Battery capacity and state of health (Turetskyy et al., 2021)
    Metal forming and machining (1) Product dimensions (Papananias et al., 2019)
    Production line (1) Quality types (Liu et al., 2020b)
    Textile manufacturing (1) Fabric defects (Jun et al., 2021)
    Coating Chemical vapor deposition (1) Quality types (Hsu & Liu, 2021)
    Lacquering (1) Defect types (Thomas et al., 2018)
    Painting (1) Paint structure (Kebisek et al., 2020)
    Primer-sealer dispensing (1) Defect types (Oh et al., 2019)

6.2. Data Bases and Characteristics

6.2.1. Data Set Sources and Amount

The review identifies three main data sources: real data from manufacturing processes (further divided into experimental and running production), virtual data from simulations, and freely available data sets (benchmark/competition).

The following are the results from Table 6 of the original paper, showing main sources of process data to train machine learning models:

Data source Publications
Simulation Alvarado-Iniesta et al. (2012), Ciancio et al. (2015), Dib et al. (2020), Li and Anand (2020), Tercan et al. (2016, 2017), Zhu et al. (2020), Zimmerling et al. (2020)
Benchmark/competition Ferguson et al. (2018), Jun et al. (2021), Liu et al. (2020b, 2021), Yu et al. (2019)
Real data (running production) Essien and Giannetti (2020), Goldman et al. (2021), Kebisek et al. (2020), Lee et al. (2018), Li et al. (2018), Meyes et al. (2019), Oh et al. (2019), Schmitt et al. (2020a), Stähl et al. (2019), Wagner et al. (2020), Yun et al. (2020)
Real data (experimental) Acayaba and de Escalona (2015), Ai et al. (2016), Bartlett et al. (2020), Bustillo et al. (2018), Dhas and Kumanan (2014), Dimitriou et al. (2020), Doltsinis et al. (2020), Du et al. (2021), Elangovan et al. (2015), Gaikwad et al. (2020), Garcia et al. (2019), Gejji et al. (2020), Zahrani et al. (2020), Gyasi et al. (2019), Hamidinejad et al. (2012), Hossain and Ahmad (2014), Hsu and Liu (2021), Jiao et al. (2020), Kayabasi et al. (2017), Ke and Huang (2020), Kim et al. (2018), Klein et al. (2020), Li et al. (2019, 2020a, 2020b), Lutz et al. (2020), Mulrennan et al. (2018), Lei et al. (2019), de Oliveira Leite et al. (2015), Lieber et al. (2013), Martín et al. (2016), Martinez et al. (2020), McDonnell et al. (2021), Moreira et al. (2019), Natesh et al. (2019), Neto et al. (2013), Nguyen et al. (2020), Papananias et al. (2019), Sarivan et al. (2020), Schorr et al. (2020a, 2020b), Serin et al. (2020a), Thomas et al. (2018), Turetskyy et al. (2021), Varma et al. (2017), Vrabel et al.
  • Dominance of Real Data: The majority of publications (65%) use real data obtained experimentally, while 14% use real data from running production. Simulation data accounts for 10%, and benchmark data for 6%.
  • Simulation Data: Used in 8 publications, often to demonstrate feasibility or generate fast ML models for process design. Data samples vary widely (e.g., 30 to >22,000, with an average of 9,864). Data augmentation techniques were sometimes used (e.g., Zhu et al., 2020).
  • Benchmark Data: Identified in 5 publications, primarily for image-based defect classification (e.g., GRIMA X-Ray casting data, NEU-DET, Xuelang manufacturing AI challenge data set). Average size is 5,722 samples/images. Data augmentation was also used here.
  • Real Data (Experimental): Most research (65%) conducts predefined experiments, varying process parameters under fixed conditions. Design of Experiments (e.g., full factorial, Taguchi) is commonly used. The number of parameters varied is typically small (2-8).
    • Data Amount (Experimental): The majority use around 100 samples (median 144). The average is about 5,600, but this is skewed by some publications using data augmentation or multiple measurements per experiment (Figure 5, blue bars).
  • Real Data (Running Production): 11 publications use data from running manufacturing processes, collected over longer periods (e.g., months to years).
    • Data Amount (Running Production): These datasets are generally larger (average 73,984 samples; Figure 5, red bars), with the largest containing 525,600 samples.

      The following figure (Figure 5 from the original paper) illustrates the distribution of publications according to the number of data samples used for model training and evaluation:

      Fig. 5 Distribution of publications according to the number of data samples used for model training and evaluation. The figure is a histogram (logarithmic sample-count axis) in which blue bars represent experimental data and red bars represent real data collected from running production; most studies use between 100 and 1,000 samples.
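
The small experimental data sets described above typically arise directly from the size of the experimental design. As a hedged illustration (parameter names, levels, and ranges are invented), a full factorial design over three parameters at five levels already yields 125 runs, which is on the order of the median data set size (144 samples) reported in the review:

```python
# Hedged sketch: generating a full factorial design of experiments.
from itertools import product
import numpy as np

feed_rate     = np.linspace(0.05, 0.25, 5)  # mm/rev (illustrative)
cutting_speed = np.linspace(100, 300, 5)    # m/min (illustrative)
depth_of_cut  = np.linspace(0.5, 2.5, 5)    # mm (illustrative)

runs = list(product(feed_rate, cutting_speed, depth_of_cut))
print(len(runs))  # 125 experimental runs, each yielding one labeled training sample
```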

6.2.2. Input Variables

Three major types of input variables are identified: process parameters, sensor data, and product measurements.

The following are the results from Table 7 of the original paper, showing types of input variables used for predictive quality models:

Variable type Publications
Process parameters (30) Acayaba and de Escalona (2015), Ai et al. (2016), Alvarado-Iniesta et al. (2012), Bustillo et al. (2018), Ciancio et al. (2015), Dhas and Kumanan (2014), Dib et al. (2020), Ge et al. (2012), Gejji et al. (2020), Zahrani et al. (2020), Hamidinejad et al. (2012), Hossain and Ahmad (2014), Jiao et al. (2020), Kayabasi et al. (2017), Kebisek et al. (2020), Mulrennan et al. (2018), Lei et al. (2019), Li and Anand (2020), Martín et al. (2016), McDonnell et al. (2021), Natesh et al. (2019), Nguyen et al. (2020), Serin et al. (2020a), Tercan et al. (2016, 2017), Varma et al. (2017), Yu et al. (2020), Zhang and Lei (2017), Zhu et al. (2020), Zimmerling et al. (2020)
Sensor data (22) Essien and Giannetti (2020), Doltsinis et al. (2020), Du et al. (2021), Garcia et al. (2019), Goldman et al. (2021), Gyasi et al. (2019), Hsu and Liu (2021), Kim et al. (2018), Lee et al. (2018), Li et al. (2019, 2020a, 2020b), Lieber et al. (2013), Meyes et al. (2019), Moreira et al. (2019), Neto et al. (2013), Nurwaha and Wang (2012), Papananias et al. (2019), Sarivan et al. (2020), Schorr et al. (2020a, 2020b), Turetskyy et al. (2021)
Sensor data + process parameters (9) Elangovan et al. (2015); Ke and Huang (2020), Klein et al. (2020), Lutz et al. (2020), Thomas et al. (2018), Vrabel et al. (2016), Yu et al. (2016), Zhang et al. (2018, 2019b)
Product measurements (16) Bartlett et al. (2020), Dimitriou et al. (2020), Ferguson et al. (2018), Gaikwad et al. (2020), Jun et al. (2021), de Oliveira Leite et al. (2015), Li et al. (2018), Liu et al. (2021), Martinez et al. (2020), Oh et al. (2019), Schmitt et al. (2020a), Stähl et al. (2019), Tuar et al. (2017), Wang et al. (2021), Yun et al. (2020), Zhang et al. (2019a)
  • Process Parameters (30 publications): Settings for production (e.g., feed rate, cutting speed, laser power, focal position, process times, temperatures). Used to predict quality under new parameter spaces. Some works also include design parameters (e.g., hatch patterns, product size).

  • Sensor Data (22 publications): Real-time data from the process/machine (e.g., welding current, temperature, pressure, vibration, torque, force). Used for in-process quality estimation.

  • Sensor Data + Process Parameters (9 publications): A combination of both, shown to significantly improve performance (Elangovan et al., 2015). Also includes material batches and tool types in some cases (Lutz et al., 2020).

  • Product Measurements (16 publications): Data from the product itself during or after production (e.g., images from cameras, geometric measurements, thermal images, X-ray images, melt pool images, laser topology scans). Primarily used for automated defect detection. Image data is particularly common.

    The following figure (Figure 6 from the original paper) shows the number of occurrences of the input variable types in the addressed manufacturing processes:

    Fig. 6 Number of occurrences of the input variable types in the addressed manufacturing processes. Note that only publications using real data (experimental or running production) are considered here. The figure is a dot plot with the manufacturing processes (e.g., cutting, joining, primary shaping, forming) on the horizontal axis and the input data types on the vertical axis; the size of each dot indicates the number of publications.

Figure 6 illustrates the varying prevalence of input variable types across different manufacturing processes. For instance, turning and drilling widely use both parameters and sensor data, while laser cutting predominantly relies on process parameters. Metal rolling shows a strong emphasis on product measurements.

6.2.3. Data Modality

The review identifies four data modalities: categorical, time series, image, and numerical/continuous.

The following are the results from Table 8 of the original paper, showing occurring modalities of data sets used for training the ML and DL models:

Data modality Publications
Categorical/discrete Lee et al. (2018), Liu et al. (2020b), Lutz et al. (2020), Thomas et al. (2018)
Time series Essien and Giannetti (2020), Goldman et al. (2021), Gyasi et al. (2019), Hsu and Liu (2021), Meyes et al. (2019), Sarivan et al. (2020), Stähl et al. (2019), Zhang et al. (2018), Zhang et al. (2019b)
Image Bartlett et al. (2020), Dimitriou et al. (2020), Ferguson et al. (2018), Jun et al. (2021), Li et al. (2018), Liu et al. (2021), Martinez et al. (2020), Oh et al. (2019), Wang et al. (2021), Yun et al. (2020), Zhang et al. (2019a)
Continuous/numerical Abd-Ellatif (2013), Acayaba and de Escalona (2015), Ai et al. (2016), Alvarado-Iniesta et al. (2012), Bustillo et al. (2018), Ciancio et al. (2015), Dhas and Kumanan (2014), Dib et al. (2020), Doltsinis et al. (2020), Du et al. (2021), Elangovan et al. (2015), Gaikwad et al. (2020), Garcia et al. (2019), Ge et al. (2012), Gejji et al. (2020), Zahrani et al. (2020), Hamidinejad et al. (2012), Hossain and Ahmad (2014), Jiao et al. (2020), Kayabasi et al. (2017), Ke and Huang (2020), Kebisek et al. (2020), Kim et al. (2018), Klein et al. (2020), Mulrennan et al. (2018), Lee et al. (2018), Lei et al. (2019), de Oliveira Leite et al. (2015), Li et al. (2019, 2020a, 2020b), Li and Anand (2020), Lieber et al. (2013), Liu et al. (2020b), Lutz et al. (2020), Martín et al. (2016), McDonnell et al. (2021), Moreira et al. (2019), Natesh et al. (2019), Neto et al. (2013), Nguyen et al. (2020), Nurwaha and Wang (2012), Papananias et al. (2019), Schmitt et al. (2020a), Schorr et al. (2020a, 2020b), Serin et al. (2020a), Tercan et al. (2017), Tercan et al. (2016), Thomas et al. (2018), Turetskyy et al. (2021), Tuar et al. (2017), Varma et al. (2017), Vrabel et al. (2016), Yu et al. (2016,
  • Categorical Data: Least common, representing non-numeric entities like tool type or material batch.
  • Time Series Data: Used in 9 publications, primarily derived from sensor data (8 pubs) or product measurements (1 pub). Can be univariate or multivariate.
  • Image Data: Common, especially 2D images from product measurements for defect detection. 3D point clouds are also used. Data augmentation (e.g., adding noise, cropping) is frequently applied to increase image datasets.
  • Numerical/Continuous Data: The vast majority of publications use this type, representing scalar values from parameter settings or transformed sensor/measurement data. Feature extraction (e.g., mean, max, min) or expert-driven aggregation is common to convert time series data into scalar numerical quantities.
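
The transformation of time series into scalar features mentioned in the last point can be sketched as follows; the sensor channels, signal length, and chosen statistics are illustrative assumptions rather than a specific method from the reviewed studies.

```python
# Hedged sketch: reducing a multivariate sensor time series to scalar features
# so that it can be fed to models such as MLPs or random forests.
import numpy as np

def extract_features(signal: np.ndarray) -> np.ndarray:
    """signal: array of shape (timesteps, channels), e.g., force and vibration."""
    return np.concatenate([
        signal.mean(axis=0),  # average level per channel
        signal.max(axis=0),   # peak value per channel
        signal.min(axis=0),   # minimum value per channel
        signal.std(axis=0),   # variability per channel
    ])

sensor_signal = np.random.default_rng(1).normal(size=(2000, 2))  # 2 sensor channels
features = extract_features(sensor_signal)  # 8 scalar features per produced part
print(features.shape)  # (8,)
```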

6.3. Machine Learning Methods

6.3.1. Learning Tasks

  • Classification: 30 of 81 publications, for error detection or defect type classification.
  • Regression: 51 of 81 publications, for numerical prediction of quality variables.

6.3.2. Model Comparison

  • Single Model/Variants: 49% of publications (40 out of 81) evaluate only a single model or its variants.

  • Multiple Models (Comparison): 51% of publications (41 out of 81) compare several models experimentally.

  • Prime Model: The focus model or the best-performing model.

  • Baseline Models: Other models used for comparison.

    The following are the results from Table 9 of the original paper, showing overview of addressed learning tasks and used prime models in all publications:

    Classification
      CNN: Ferguson et al. (2018), Goldman et al. (2021), Hsu and Liu (2021), Jun et al. (2021), Li et al. (2018), Liu et al. (2021), Martinez et al. (2020), Sarivan et al. (2020), Yun et al. (2020), Zhang et al. (2019a)
      Decision tree: Tercan et al. (2016, 2017)
      Ensemble model: Gejji et al. (2020), Kim et al. (2018), Thomas et al. (2018)
      K-NN: Lieber et al. (2013)
      MLP: Bustillo et al. (2018), Dib et al. (2020), Ke and Huang (2020), Kebisek et al. (2020), Lee et al. (2018), Wagner et al. (2020), Yu et al. (2020)
      Naive Bayes: Bartlett et al. (2020)
      Random forest: Zahrani et al. (2020)
      RNN: Liu et al. (2020b), Meyes et al. (2019)
      SVM: Doltsinis et al. (2020), Oh et al. (2019), Schmitt et al. (2020a)

    Regression
      ANFIS: Hossain and Ahmad (2014), Moreira et al. (2019), Varma et al. (2017), Zhang and Lei (2017)
      CNN: Dimitriou et al. (2020), Wang et al. (2021), Zhu et al. (2020), Zimmerling et al. (2020)
      Ensemble model: Li et al. (2019)
      Extra tree: Schorr et al. (2020a)
      ELM: Nguyen et al. (2020)
      GA-BPNN: Ai et al. (2016)
      Linear regression: Elangovan et al. (2015)
      MLP: Abd-Ellatif (2013), Acayaba and de Escalona (2015), Ciancio et al. (2015), Du et al. (2021), Gyasi et al. (2019), Hamidinejad et al. (2012), Jiao et al. (2020), Kayabasi et al. (2017), Lei et al. (2019), de Oliveira Leite et al. (2015), Li et al. (2020a, 2020b), Li and Anand (2020), Lutz et al. (2020), McDonnell et al. (2021), Natesh et al. (2019), Neto et al. (2013), Nurwaha and Wang (2012), Papananias et al. (2019), Serin et al. (2020a), Turetskyy et al. (2021), Vrabel et al. (2016), Yu et al. (2016)
      NN-GA-PSO: Dhas and Kumanan (2014)
      Quadratic regression: Martín et al. (2016)
      Random forest: Klein et al. (2020), Mulrennan et al. (2018), Schorr et al. (2020b), Tuar et al. (2017), Yu et al. (2019)
      Relevance vector machine: Ge et al. (2012)
      RNN: Alvarado-Iniesta et al. (2012), Essien and Giannetti (2020), Stähl et al. (2019), Zhang et al. (2018), Zhang et al. (2019b)
      SeDANN: Gaikwad et al. (2020)
      SVM: Garcia et al. (2019)

6.3.3. Prime Models

  • Multilayer Perceptron (MLP): Most frequently used prime model (30 publications), and still the most common choice in 2020-2021 (cf. Fig. 7). Versatile for both classification (e.g., fault types, OK/NOK) and regression (e.g., surface roughness, tensile strength, dimensions). Often compared against SVM, Random Forest, Linear Regression, Decision Tree, K-NN, and AdaBoost.

  • Convolutional Neural Network (CNN): Second most frequent prime model (14 publications). Well suited for pattern recognition in higher-dimensional and spatial data, and applied to 2D images, 3D point clouds, and time series data. CNN architectures are often compared with variations of CNNs or with other deep learning models such as AlexNet, VGG-16, ResNet-50, and R-CNN variants (an illustrative sketch follows the discussion of Fig. 7 below).

  • Recurrent Neural Network (RNN): Used as a prime model in 7 publications, primarily for time-dependent or sequential data (e.g., time series sensor data). Focus is on LSTM network architectures for binary classification of defects or regression of quantities like material warpage or machine speed. Compared with SVM, Random Forest, XGBoost, polynomial regression, logistic regression, and ARIMA.

    The following figure (Figure 7 from the original paper) illustrates the proportions of ML models (prime) used in publications in 2020 and 2021:

    Fig. 7 Proportions of ML models (prime) used in publications in 2020 and 2021. The chart shows each prime model's share in these two years: MLP 38%, CNN 30%, other models 14%, Random Forest 8%, SVM 5%, and RNN 5%.

Figure 7 shows that in 2020 and 2021, MLP (38%) and CNN (30%) together account for 68% of prime models, highlighting their dominance in recent research.
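
To make the image-based use of CNNs concrete, the following is a minimal Keras sketch of a small CNN for binary OK/NOK classification of inspection images. The architecture, input size, and training setup are illustrative assumptions and do not reproduce any specific model from the reviewed publications:

```python
# Minimal sketch: small CNN for binary defect classification on product images.
# Input size, depth, and training setup are illustrative assumptions only.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),          # grayscale inspection image
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),      # P(defective part)
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(train_images, train_labels, validation_split=0.2, epochs=20)
```

In practice, the reviewed studies often start from established architectures such as AlexNet, VGG-16, or ResNet-50 rather than a hand-built network of this size.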

6.3.4. Non-linear ML Models

Includes SVM (for classification and regression), Relevance Vector Machine (RVM), Decision Trees, Quadratic Regression, K-NN, and Naive Bayes. These are often compared among themselves and with MLPs, gradient boosted trees, or generalized additive models (GAM).

6.3.5. Ensembles

Ensemble methods combine multiple models. Random Forest is the most popular, used for both classification and regression. Extensive comparisons show ensembles can outperform single models (Gejji et al., 2020).
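
The kind of experimental comparison reported in these studies can be sketched roughly as follows; the synthetic data and the particular baseline are assumptions chosen for illustration, not a reproduction of the cited experiments:

```python
# Minimal sketch: cross-validated comparison of a Random Forest ensemble
# against a single decision tree on synthetic quality data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Cross-validated scores give a fairer picture than a single train/test split, which matters given the small datasets common in this field.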

6.3.6. Variants and Hybrid Models with Neural Networks

These include ANFIS (Adaptive Neuro-Fuzzy Inference System), ANN variants like SeDANN (Sequential Decision Analysis Neural Network) and the Extreme Learning Machine (ELM), and hybrid models combining neural networks with evolutionary computation methods like genetic algorithms or particle swarm optimization. These are frequently compared with regular MLPs.
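
As a rough illustration of the hybrid idea, the sketch below runs a toy mutation-only evolutionary search over two MLP hyperparameters (hidden-layer width and initial learning rate). The encoding, operators, fitness function, and data are simplifying assumptions and far cruder than the ANFIS or GA/PSO hybrids in the cited works:

```python
# Minimal sketch: mutation-only evolutionary search over MLP hyperparameters.
# Fitness = cross-validated R^2 of an MLPRegressor on synthetic data.
import random

from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)
random.seed(0)

def fitness(genes):
    width, lr = genes
    model = MLPRegressor(hidden_layer_sizes=(width,), learning_rate_init=lr,
                         max_iter=1000, random_state=0)
    return cross_val_score(model, X, y, cv=3, scoring="r2").mean()

def random_genes():
    return [random.choice([8, 16, 32, 64]), random.choice([1e-4, 1e-3, 1e-2])]

def mutate(genes):
    child = list(genes)
    i = random.randrange(len(child))
    child[i] = random_genes()[i]           # re-draw one gene at random
    return child

population = [random_genes() for _ in range(6)]
for generation in range(5):
    ranked = sorted(population, key=fitness, reverse=True)
    best = ranked[0]
    print(f"generation {generation}: best genes {best}, R^2 = {fitness(best):.3f}")
    parents = ranked[:3]                   # selection: keep the fitter half
    population = parents + [mutate(random.choice(parents)) for _ in range(3)]
```

The hybrids in the cited works typically optimize more than two quantities (e.g., network weights or topology) and add crossover, but the evaluate-select-vary loop shown here is the common core.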

The following figure (Figure 8 from the original paper) illustrates the occurrences of ML models as baselines in all publications:

Fig. 8 Occurrences of ML models as baselines in all publications. The chart counts how often each ML model appears as a baseline across all publications; SVM, MLP, and Random Forest appear most frequently.

Figure 8 reveals that SVM, MLP, and Random Forest are the most common baseline models used for comparison, indicating their established role in evaluating new predictive quality approaches.

7. Conclusion & Reflections

7.1. Conclusion Summary

This systematic review provided a comprehensive overview of 81 scientific publications from 2012 to 2021 concerning machine learning and deep learning based predictive quality in manufacturing. The analysis was structured around three guiding questions, categorizing publications by manufacturing processes and quality criteria, data bases and characteristics, and machine learning models.

Key findings include:

  • Predictive quality is applied across diverse manufacturing processes, with cutting and joining being the most researched areas, while others like coating and changing material properties are underrepresented.

  • The majority of studies use real manufacturing data, often generated experimentally with relatively small sample sizes, although data from running production tends to be larger. Process parameters, sensor data, and product measurements are common input variables.

  • Numerical/continuous data and image data are the predominant data modalities.

  • Multilayer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) are the most popular prime models, particularly in recent years, with CNNs being extensively used for image-based tasks. Traditional ML models like SVMs and Random Forests frequently serve as baselines.

    The review highlights that ML and DL offer significant potential for quality assurance and inspection in manufacturing, but the field is heterogeneous and often isolated in its approaches and results.

7.2. Limitations & Future Work

The authors identified several limitations and suggested future research directions:

7.2.1. Limitations (Identified Gaps)

  • Manufacturing Domain Imbalance: Significant differences in research activity across manufacturing process groups. Many established processes (e.g., riveting, gluing, soldering within joining) are hardly covered. This may stem from varying digitization levels and data availability.
  • Lack of Process Integration: While promising, predictive quality approaches are rarely integrated into actual manufacturing processes. Training, evaluation, and model usage often occur offline. Discussions on implementation in real quality assurance processes are scarce, as is evaluation using quality-oriented metrics (e.g., reject rate reduction, yield rate).
  • Limited Input Variables: Most approaches focus on a few input variables of the same type. Other crucial factors like product design or material properties are often neglected. Models are typically for a single product type, leaving the question of generalizability open.
  • Data Scarcity and Representation: Many approaches are developed on small amounts of experimentally generated data, which may not be representative of running production. This limits the generalizability and industrial applicability of results.
  • Lack of Benchmark Data & Reproducibility: A significant absence of freely available benchmark datasets and shared source code hinders comparability between approaches and reproducibility of research.
  • Limited Deep Learning Model Exploration: Only CNNs and LSTM-based models are extensively investigated. Novel deep learning methods like Transformer networks and Graph Neural Networks are largely unexplored for predictive quality.
  • Time Series Data Processing: While deep learning on image data is mature, time series data is usually reduced to scalar features via feature extraction rather than being processed directly by DL models designed for sequential data (see the sketch after this list).
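
For contrast with the scalar-feature route, the following minimal Keras sketch feeds raw, fixed-length sensor sequences directly into an LSTM-based classifier; sequence length, sensor count, and architecture are illustrative assumptions only:

```python
# Minimal sketch: LSTM classifier operating directly on raw sensor sequences
# of shape (batch, time steps, sensors) instead of hand-crafted scalar features.
import tensorflow as tf
from tensorflow.keras import layers, models

TIME_STEPS, N_SENSORS = 200, 3                 # illustrative assumptions

model = models.Sequential([
    layers.Input(shape=(TIME_STEPS, N_SENSORS)),
    layers.LSTM(64, return_sequences=True),
    layers.LSTM(32),
    layers.Dense(1, activation="sigmoid"),     # P(defective part)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(sequences, labels, validation_split=0.2, epochs=30)
```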

7.2.2. Future Research Directions

To address these limitations, the paper envisions the following research directions:

  • Synthetic Data Generation: Developing and researching methods, particularly generative deep learning models, to create realistic synthetic training data in large quantities, especially for rare process variations and product defects. Expanding data augmentation for sensor and time series data.
  • Benchmark Data Sets: Establishing freely available benchmark datasets for predictive quality tasks to enhance comparability, reproducibility, and further development of approaches.
  • Novel Deep Learning Methods: Exploring the applicability of Transformer networks (for sequential and image data) and Graph Neural Networks (for graph data like CAD or simulation data) for predictive quality scenarios.
  • Time Series Classification and Forecasting: Further research into deep learning model approaches specifically designed for time series classification and forecasting to directly leverage raw sensor data.
  • Transfer Learning and Continual Learning: Investigating data-efficient and cost-effective models using transfer learning (adapting models trained in one domain to another) and continual learning (learning continuously from new data without forgetting previous knowledge) to cope with continuous changes in manufacturing processes and process variants (a minimal fine-tuning sketch follows this list).
  • Integration and Deployment: Researching strategies for evaluating predictive quality solutions in real quality assurance processes, developing automated feedback mechanisms, and quantifying impact with quality-oriented metrics (yield rate, reject reduction). This includes advancing MLOps (machine learning operations) strategies for the continuous monitoring, integration, and delivery of ML models, and establishing certification processes for ML models in industrial manufacturing.
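
To illustrate the transfer learning direction, a common recipe is to fine-tune an ImageNet-pretrained backbone on a small defect-image dataset. The following Keras sketch shows that generic recipe; the backbone choice, input size, and classification head are assumptions for illustration and not taken from the reviewed papers:

```python
# Minimal sketch: transfer learning for defect classification on a small
# image dataset by reusing an ImageNet-pretrained ResNet50 backbone.
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                          # freeze pretrained features

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),      # OK / NOK
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])
# Inputs should be preprocessed with tf.keras.applications.resnet50.preprocess_input.
# model.fit(small_defect_dataset, epochs=10)
# Optionally unfreeze the top of the backbone and continue training at a
# lower learning rate once the new head has converged.
```

Freezing the pretrained backbone first keeps the number of trainable parameters small, which suits the small, experimentally generated datasets that dominate the reviewed studies.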

7.3. Personal Insights & Critique

This systematic review provides a valuable, structured overview that clearly demarcates the field of predictive quality from related ML applications in manufacturing. Its strict adherence to inclusion/exclusion criteria and detailed categorization makes the findings robust and easy to comprehend, even for beginners.

Inspirations:

  • Bridging the Gap: The emphasis on synthetic data generation, benchmark datasets, transfer learning, and continual learning highlights a crucial challenge in industrial ML: the perennial struggle with data scarcity and domain adaptation. These areas are vital for transitioning predictive quality from academic prototypes to robust industrial solutions.
  • Unexplored DL Potential: The call to explore Transformer networks and Graph Neural Networks is exciting. Transformers, with their self-attention mechanisms, could revolutionize time series and even multimodal data fusion in manufacturing. Graph neural networks offer a powerful way to model complex relationships in CAD or process flow graphs, which is a rich but underutilized data source.
  • Operationalization Focus: The explicit mention of MLOps and certification underscores the practical realities of industrial AI deployment. It is not enough to build accurate models; they must also be reliable, maintainable, and trustworthy in safety-critical environments.

Potential Issues, Unverified Assumptions, or Areas for Improvement:

  • Definition of "Predictive Quality": While the paper provides a clear definition of predictive quality, the distinction between "prediction of quality" and "fault diagnosis" (for the machine, not product) or "anomaly detection" can sometimes be subtle in practice. Some boundary cases might be open to interpretation, potentially leading to slight variations in what is included/excluded in similar reviews by other researchers. The explicit exclusion of anomaly detection is a clear scope limitation, but the justification for it could be elaborated more, especially since some anomalies might eventually correlate with product defects.
  • Granularity of Process Categories: While DIN 8580 is a standard, some categories like "Cutting" are still very broad and encompass highly diverse processes. More granular sub-categorization within these major groups could reveal even finer trends or unique challenges specific to, say, milling versus laser cutting.
  • Bias in Publication Selection: Despite systematic search, reliance on Web of Science and ScienceDirect might inadvertently favor certain publication types or regions. Including Scopus or Google Scholar could potentially broaden the initial pool, though it would also increase the workload significantly.
  • Qualitative Depth: While the categorization is quantitative (number of papers), deeper qualitative analysis of why certain ML models are chosen for specific data modalities beyond "well-suited for images" could be beneficial. For example, why MLP is so versatile, or why LSTMs are preferred for time series over other RNN variants.
  • Real-world Impact Metrics: The critique about the lack of quality-oriented metrics for process impact is valid and highlights a major gap. Future work should not only focus on model accuracy but also on quantifiable benefits like cost savings, waste reduction, and throughput increase to truly demonstrate industrial value.

Applicability to Other Domains: The methodology and many of the identified challenges (e.g., data scarcity, need for benchmark data, integration into real systems, transfer learning) are highly transferable to other industrial AI applications beyond predictive quality, such as predictive maintenance, process optimization, or supply chain forecasting. The framework for systematically reviewing literature based on specific guiding questions, categorization, and identification of research gaps is a broadly applicable scientific practice.
