HUMAN ACTIVITY RECOGNITION AND OPTIMIZATION OF BIPED EXOSKELETONS THROUGH ARTIFICIAL INTELLIGENCE: AN INTEGRATED APPROACH
TL;DR Summary
The study integrates inertial sensor-based human activity recognition with reinforcement learning to optimize bipedal exoskeletons, achieving 92% classification accuracy and a 15% metabolic cost reduction, enhancing adaptability and energy efficiency for rehabilitation and augmentation.
Abstract
Journal of Engineering Science, Vol. XXXII, no. 1 (2025), pp. 71-79. Fascicle: Electronics and Computer Science. Topic: Biomedical Engineering. ISSN 2587-3474, eISSN 2587-3482. March 2025.
HUMAN ACTIVITY RECOGNITION AND OPTIMIZATION OF BIPED EXOSKELETONS THROUGH ARTIFICIAL INTELLIGENCE: AN INTEGRATED APPROACH
Mihaela Rusanovschi*, ORCID: 0000-0002-2447-5997; Galina Marusic, ORCID: 0000-0002-2984-2055. Technical University of Moldova, 168 Stefan cel Mare Blvd., Chisinau, Republic of Moldova.
*Corresponding author: Mihaela Rusanovschi, mihaela.rusanovschi@iis.utm.md
Received: 03.02.2025. Accepted: 03.24.2025.
Abstract. This paper explores the integration of inertial sensor-based human activity recognition (HAR) with the optimization of bipedal exoskeletons using artificial intelligence (AI) techniques. The motivation for the study stems from the need to improve the adaptability and energy efficiency of exoskeletons for practical applications. The specific hypothesis is that combining HAR with reinforcement learning (RL) can lead to personalized and efficient control strategies.
In-depth Reading
English Analysis
1. Bibliographic Information
1.1. Title
HUMAN ACTIVITY RECOGNITION AND OPTIMIZATION OF BIPED EXOSKELETONS THROUGH ARTIFICIAL INTELLIGENCE: AN INTEGRATED APPROACH
1.2. Authors
- Mihaela Rusanovschi (Corresponding author, ORCID: 0000-0002-2447-5997, mihaela.rusanovschi@iis.utm.md)
- Galina Marusic (ORCID: 0000-0002-2984-2055)

Affiliation: Technical University of Moldova, 168 Stefan cel Mare Blvd., Chisinau, Republic of Moldova
1.3. Journal/Conference
The paper is published in the Journal of Engineering Science, March 2025, Vol. XXXII (1), pp. 71-79. The journal appears to be a peer-reviewed publication in the field of engineering.
1.4. Publication Year
2025 (Received: 03.02.2025, Accepted: 03.24.2025)
1.5. Abstract
This paper presents an integrated approach combining inertial sensor-based human activity recognition (HAR) with artificial intelligence (AI) techniques, specifically reinforcement learning (RL), to optimize bipedal exoskeletons. The core motivation is to enhance the adaptability and energy efficiency of exoskeletons in practical applications. The study hypothesizes that this integration can lead to personalized and efficient control strategies. The research develops a robust HAR system to classify activities such as normal walking, stair climbing, and sitting/standing. This system involves preprocessing accelerometer and gyroscope data through segmentation and feature extraction, followed by supervised classification using Support Vector Machines (SVM) and Random Forest algorithms. Concurrently, RL optimization is performed in simulated environments like Webots to improve exoskeleton control. Preliminary results demonstrate a 92% accuracy in HAR and a 15% reduction in metabolic cost through RL, which also improves exoskeleton stability and user comfort. This innovative, integrated approach aims to minimize manual adjustments in exoskeleton design, with promising applications in rehabilitation and physical augmentation.
1.6. Original Source Link
/files/papers/690214ed84ecf5fffe471893/paper.pdf
Publication Status: Officially published in the Journal of Engineering Science in 2025.
2. Executive Summary
2.1. Background & Motivation
The core problem addressed by this paper is the limited adaptability and energy inefficiency of traditional bipedal exoskeletons to diverse user needs and environmental conditions. While exoskeletons have advanced significantly for medical, industrial, and military applications, their development faces challenges such as extensive testing on human subjects, complex manually established control laws, and a lack of personalized response.
Prior research often focuses on Human Activity Recognition (HAR) for monitoring purposes or Reinforcement Learning (RL) for control in isolation, without effectively integrating real-time activity data into adaptive control strategies. This creates a gap where exoskeletons struggle to dynamically understand user intentions and adjust their assistance accordingly, leading to suboptimal performance, higher metabolic cost, and reduced user comfort. The paper's entry point is the recognition that combining HAR to identify user activities with RL to adapt control policies could create a synergistic framework, reducing reliance on manual adjustments and improving overall exoskeleton performance.
2.2. Main Contributions / Findings
The paper makes several significant contributions:
- Integrated HAR-RL Framework: It proposes and validates an innovative, integrated approach that combines inertial sensor-based HAR with RL for bipedal exoskeleton optimization, addressing a critical gap in existing research.
- Robust HAR System Development: The research develops a Human Activity Recognition (HAR) system capable of classifying five distinct activities (normal walking, climbing stairs, descending stairs, sitting down, and rising from a chair) with a high accuracy of 92%, using Support Vector Machines (SVM) and Random Forest algorithms on accelerometer and gyroscope data.
- RL-based Exoskeleton Optimization: It demonstrates the effectiveness of Reinforcement Learning (RL), specifically Proximal Policy Optimization (PPO), in simulated environments (Webots and OpenSim) for optimizing exoskeleton control, leading to a 15% reduction in metabolic cost compared to traditional PID controllers.
- Enhanced Exoskeleton Performance: The integrated HAR-RL system significantly improves exoskeleton stability (e.g., a 10% increase for stair climbing) and user comfort (e.g., a 16% decrease in quadriceps muscle force and an 18% reduction in ankle impact).
- Reduced Adaptation Time: The integration of HAR predictions enabled dynamic adjustment of exoskeleton control, reducing the adaptation time to activity changes from 1.2 seconds (without HAR) to 0.5 seconds, highlighting the system's responsiveness.
- Paving the Way for Scalable Applications: By minimizing manual adjustments and enhancing energy efficiency, the research contributes to exoskeleton design with promising applications in rehabilitation and physical augmentation, making exoskeletons more practical and accessible.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To fully understand this paper, a foundational grasp of several key concepts is essential:
- Bipedal Exoskeletons: These are wearable robotic devices designed to enhance human physical capabilities or assist in mobility. "Bipedal" refers to their two-legged structure, mimicking human locomotion. They typically consist of a frame, motors (actuators), sensors, and a control system, worn by a user to provide assistive torque or force to joints. Their applications range from assisting individuals with mobility impairments (rehabilitation) to augmenting strength for industrial or military tasks.
- Artificial Intelligence (AI): A broad field of computer science that enables machines to perform tasks typically requiring human intelligence, such as learning, problem-solving, decision-making, perception, and understanding language. In this paper, AI techniques are used to analyze human movement and optimize robotic control.
- Human Activity Recognition (HAR): A subfield of AI that focuses on identifying and classifying human actions or movements from sensor data. The goal is to automatically determine what a person is doing (e.g., walking, running, sitting, climbing stairs). Inertial sensors are commonly used for HAR.
- Inertial Sensors: Electronic devices that measure and report a body's velocity, orientation, and gravitational forces.
  - Accelerometer: A sensor that measures non-gravitational acceleration, the rate of change of velocity of an object in its own reference frame. It typically measures acceleration along three perpendicular axes (X, Y, Z).
  - Gyroscope: A sensor that measures angular velocity, the rate of rotation around a particular axis. Like accelerometers, gyroscopes commonly measure rotation around three axes.
  - Together, accelerometers and gyroscopes are often combined in Inertial Measurement Units (IMUs) to provide comprehensive motion data.
- Reinforcement Learning (RL): A paradigm of machine learning where an agent learns to make decisions by performing actions in an environment to maximize a cumulative reward.
  - Agent: The learner or decision-maker (e.g., the exoskeleton's control system).
  - Environment: The world with which the agent interacts (e.g., the simulated physical world where the exoskeleton operates).
  - State: A snapshot of the environment at a given time (e.g., exoskeleton joint angles, sensor readings).
  - Action: A decision or output from the agent that affects the environment (e.g., torques applied by exoskeleton actuators).
  - Reward: A scalar feedback signal from the environment indicating the desirability of an agent's actions (e.g., a high reward for stable, energy-efficient movement).
  - Policy: A strategy that maps states to actions, determining the agent's behavior. The goal of RL is to learn an optimal policy.
- Supervised Learning: A type of machine learning where an algorithm learns from a labeled dataset (input-output pairs). The model learns a mapping from inputs to outputs and can then predict outputs for new, unseen inputs. HAR systems often use supervised learning for classification.
- Support Vector Machines (SVM): A powerful supervised learning algorithm used for classification and regression. SVMs work by finding an optimal hyperplane that best separates different classes in the feature space.
  - Hyperplane: A decision boundary that separates data points of different classes. In a 2D space it is a line; in 3D, a plane; in higher dimensions, a hyperplane.
  - Kernel Trick: A technique used by SVMs to handle non-linearly separable data by implicitly mapping the input features into a higher-dimensional space where a linear separation is possible, without explicitly calculating the coordinates in that space. The Radial Basis Function (RBF) kernel is a common choice for this.
- Random Forest: An ensemble learning method for classification and regression that operates by constructing a multitude of decision trees at training time. For classification tasks, the output of the Random Forest is the class selected by most trees (majority vote).
  - Ensemble Learning: The process of combining multiple machine learning models (often called "weak learners") to achieve better predictive performance than a single model.
  - Decision Tree: A flowchart-like structure where each internal node represents a "test" on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label.
  - Bootstrap Aggregating (Bagging): A technique where multiple subsets of the original training data are created by sampling with replacement. Each subset is then used to train a separate model.
- Proximal Policy Optimization (PPO): A popular Reinforcement Learning algorithm that balances ease of implementation, sample efficiency, and good performance. It is an actor-critic method that updates the policy in a stable manner by taking multiple small steps, ensuring that the new policy does not deviate too much from the old one.
  - Actor-Critic: An RL architecture where two neural networks work together: an "actor" network learns the policy (what action to take), and a "critic" network estimates the value function (how good a state or action is).
- Metabolic Cost: In the context of exoskeletons, the physiological energy expenditure of the human user. Reducing metabolic cost means the user expends less energy to perform a task, leading to less fatigue and increased endurance. It is often measured in terms of oxygen consumption or Joules per kilogram of body mass.
- Zero Moment Point (ZMP): A concept in robotics and biomechanics used to analyze the stability of dynamic bipedal locomotion. It represents the point on the ground about which the net moment of all forces (gravitational and inertial) acting on the robot/human equals zero. If the ZMP stays within the support polygon (the area defined by the feet on the ground), the robot/human is stable (a minimal code sketch of this check follows the list).
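To make the ZMP criterion concrete, here is a minimal sketch, not taken from the paper, of the "ZMP inside the support polygon" test described above; the foot geometry and ZMP coordinates are illustrative values.

```python
import numpy as np
from matplotlib.path import Path

def zmp_is_stable(zmp_xy, support_polygon_xy):
    """True if the Zero Moment Point lies inside the support polygon (foot-contact area)."""
    return Path(np.asarray(support_polygon_xy)).contains_point(zmp_xy)

# Illustrative numbers: a ZMP 2 cm ahead of the foot centre, inside a 20 cm x 10 cm footprint.
foot = [(-0.10, -0.05), (0.10, -0.05), (0.10, 0.05), (-0.10, 0.05)]
print(zmp_is_stable((0.02, 0.0), foot))   # True -> balanced at this instant
```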
3.2. Previous Works
The paper contextualizes its work by citing several key developments and identifying gaps in existing research:
- RL for Metabolic Cost Reduction: Studies such as [6] demonstrate that RL can significantly reduce metabolic energy consumption in exoskeleton-assisted locomotion, reporting reductions of up to 20%. This highlights the potential of RL for optimizing exoskeleton efficiency.
- High-Accuracy HAR Systems: Research such as [7] shows that HAR systems, particularly those using supervised learning, can achieve accuracies of over 90% for basic tasks. This establishes HAR as a mature field for activity detection.
- Divergent Research Paths: The authors point out a critical gap: much research focuses either exclusively on HAR for activity monitoring [8] or solely on RL for control without leveraging real-time activity data [9]. This divergence means exoskeletons often lack the ability to truly understand user intent and adapt dynamically.
- Lack of Standardized Metrics: The paper also references [10], which highlights the challenge of comparing exoskeleton performance across studies due to a lack of standardized performance metrics. This complicates the assessment of novel control strategies.
- Need for Personalized Control: Recent work, including [11], emphasizes the importance of personalized control strategies because human biomechanics vary significantly between individuals. This underscores the need for adaptive systems that can tailor assistance to each user.
3.3. Technological Evolution
The field of exoskeletons has evolved from rigid, pre-programmed devices to more intelligent, adaptive systems. Early exoskeletons relied on manually tuned control laws and extensive physical testing, which was time-consuming, costly, and limited their adaptability. The integration of Artificial Intelligence has been a significant step forward. HAR emerged as a way for exoskeletons to "understand" what a user is doing, moving beyond simple predefined motion patterns. Concurrently, Reinforcement Learning has shown promise in learning complex, optimal control policies in simulated environments, reducing the need for exhaustive real-world experimentation.
This paper's work fits within the current technological trajectory of increasingly intelligent and autonomous exoskeletons. It addresses the current limitation where HAR and RL often exist as separate components, rather than synergistically informing each other.
3.4. Differentiation Analysis
Compared to the main methods in related work, the core innovation and differentiation of this paper's approach lie in its integrated framework of Human Activity Recognition (HAR) and Reinforcement Learning (RL).
- Traditional Exoskeletons: Rely on manually established control laws or PID controllers. These are often rigid, require extensive tuning, and lack adaptability to varying user intentions or environmental changes. This paper's approach replaces or augments them with AI-driven adaptive policies.
- HAR-only Systems: While successful in identifying activities, they often provide only monitoring capabilities [8]. They detect an activity but do not inherently feed that information into a real-time, adaptive control system for the exoskeleton. This paper uses HAR predictions as direct input to guide RL's reward function and state space, making exoskeleton control responsive to detected activities.
- RL-only Systems: Some RL approaches optimize exoskeleton control [9] without explicit, high-level HAR input. They learn from generic sensor data but may not explicitly "know" whether the user intends to climb stairs or walk, which can lead to less optimal or slower adaptation for specific tasks. This paper's method uses HAR to explicitly inform the RL agent about the current activity, allowing for more targeted and efficient policy adjustments (e.g., prioritizing stability for stair climbing).

In essence, the paper differentiates itself by creating a feedback loop in which HAR provides context and intent, which then dynamically shapes the RL optimization goals, leading to faster adaptation, improved efficiency, and enhanced comfort tailored to the specific activity being performed. This is a significant step towards truly personalized and intelligent exoskeleton assistance.
4. Methodology
This section describes the integrated human activity recognition (HAR) system and the reinforcement learning (RL) optimization process for bipedal exoskeletons. The methodology is divided into two main parts: HAR based on inertial sensor data and RL optimization in simulated environments.
4.1. Principles
The core idea behind this integrated approach is to create an exoskeleton control system that can intelligently adapt to a user's current activity and optimize its assistance accordingly. This is achieved by first accurately identifying the user's activity using Human Activity Recognition (HAR). Once an activity (e.g., walking, stair climbing) is recognized, this information is fed into a Reinforcement Learning (RL) framework. The RL agent then uses this activity context to dynamically adjust its control policy, focusing on relevant optimization goals (e.g., prioritizing stability for stair climbing or speed for normal walking). This combination aims to provide personalized, energy-efficient, and responsive assistance, minimizing the need for manual adjustments and improving overall user experience.
4.2. Core Methodology In-depth (Layer by Layer)
4.2.1. Human Activity Recognition (HAR)
The HAR system aims to classify human activities using data from inertial sensors.
Data Acquisition
Data is collected from integrated inertial sensors, specifically a triaxial accelerometer and a triaxial gyroscope.
- Accelerometer: Measures linear acceleration along three axes (x, y, z).
- Gyroscope: Measures angular velocity along three axes (x, y, z). The data is collected from human subjects performing five distinct activities: normal walking, climbing stairs, descending stairs, sitting down, and rising from a chair.
- Sampling Rate: 50 Hz (50 data points are collected per second).
- Recording Length: Approximately 30 seconds for each activity.
- Format: Continuous time series stored in CSV (Comma Separated Values) format.
Preprocessing
Raw sensor data undergoes several preprocessing steps to make it suitable for classification.
- Segmentation: The continuous time series data is divided into smaller, overlapping segments called time windows. This is crucial because activities are continuous, and windows create discrete samples for classification (a code sketch of the full preprocessing pipeline follows this list).
  - Window Size: 128 samples. At the 50 Hz sampling rate, this corresponds to 2.56 seconds per window.
  - Overlap: 50% overlap between consecutive windows. This helps capture transitions between activities and ensures that no important information is lost at window boundaries.
  - Each window contains the sensor values of one channel.
- Feature Extraction: From each segmented window, statistical features are calculated. These features reduce the dimensionality of the raw data while extracting relevant characteristics that distinguish different activities. The paper specifies four features:
  - Mean ($\mu$): The average value of the sensor signal within a window.
    $\mu = \frac{1}{N} \sum_{i=1}^{N} \omega_i$
    Where:
    - $N$: The number of samples in the time window (128 in this case).
    - $\omega_i$: The $i$-th sensor value within the window.
    - $\mu$: The mean value of the window.
  - Standard Deviation ($\sigma$): Measures the amount of variation or dispersion of the sensor signal values from the mean. A higher standard deviation indicates greater variability.
    $\sigma = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (\omega_i - \mu)^2}$
    Where:
    - $N$: The number of samples in the time window.
    - $\omega_i$: The $i$-th sensor value within the window.
    - $\mu$: The mean value of the window.
    - $\sigma$: The standard deviation of the window.
  - Root Mean Square (RMS): The quadratic mean of the sensor signal values, often used to quantify the magnitude of a varying quantity. It is particularly useful for signals that oscillate around zero.
    $RMS = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \omega_i^2}$
    Where:
    - $N$: The number of samples in the time window.
    - $\omega_i$: The $i$-th sensor value within the window.
    - $RMS$: The root mean square value of the window.
  - Signal Magnitude Area (SMA): A feature commonly used in HAR to quantify the overall magnitude of the acceleration signal over a period. It is the average of the summed absolute values of the acceleration components.
    $SMA = \frac{1}{N} \sum_{i=1}^{N} \left( |acc_x(i)| + |acc_y(i)| + |acc_z(i)| \right)$
    Where:
    - $N$: The number of samples in the time window.
    - $acc_x(i)$, $acc_y(i)$, $acc_z(i)$: The $i$-th accelerometer readings for the x, y, and z axes, respectively, within the window.
    - $SMA$: The Signal Magnitude Area for the accelerometer data within the window.
  These four features are calculated for each sensor channel (3 accelerometer channels + 3 gyroscope channels), yielding a feature vector of $4 \times 6 = 24$ features per window.
- Normalization: The extracted features are standardized using a Z-score transformation. This scales the features so they have a mean of 0 and a standard deviation of 1, which prevents features with larger numerical ranges from dominating the classification process.
  $x_{scaled} = \frac{x - \mu_{train}}{\sigma_{train}}$
  Where:
  - $x$: The original feature value.
  - $x_{scaled}$: The normalized feature value.
  - $\mu_{train}$: The mean of that specific feature, calculated only from the training dataset.
  - $\sigma_{train}$: The standard deviation of that specific feature, calculated only from the training dataset.
  This ensures that the test set features are scaled consistently with the training set, avoiding data leakage.
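The sketch below illustrates this preprocessing pipeline (windowing, the four statistical features, and train-only Z-score scaling) using NumPy and scikit-learn. It is a minimal illustration, not the paper's code: function names, the exact feature layout, and the placeholder matrices `X_train`/`X_test` are assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

WINDOW, STEP = 128, 64   # 128 samples = 2.56 s at 50 Hz; 64-sample step = 50% overlap

def segment(signal, window=WINDOW, step=STEP):
    """Split one continuous sensor channel into overlapping, fixed-length windows."""
    signal = np.asarray(signal, dtype=float)
    n_windows = 1 + (len(signal) - window) // step
    return np.stack([signal[i * step : i * step + window] for i in range(n_windows)])

def basic_features(w):
    """Mean, standard deviation (1/(N-1) denominator) and RMS of a single window."""
    return [w.mean(), w.std(ddof=1), float(np.sqrt(np.mean(w ** 2)))]

def sma(acc_x_w, acc_y_w, acc_z_w):
    """Signal Magnitude Area over one accelerometer window."""
    return float(np.mean(np.abs(acc_x_w) + np.abs(acc_y_w) + np.abs(acc_z_w)))

# Z-score normalization: statistics are estimated on the training split only and
# then reused on the test split, so no information leaks from test to train.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # X_train / X_test: placeholder feature matrices
X_test_scaled = scaler.transform(X_test)
```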
Classification
Two supervised classification models are employed to identify activities:
- Support Vector Machines (SVM): SVM aims to find an optimal hyperplane that best separates the activity classes in the feature space.
  - Objective: The model minimizes $\frac{1}{2}\|w\|^2$ subject to the constraints $y_i (w \cdot x_i + b) \geq 1$.
    Where:
    - $w$: The normal vector to the hyperplane. Minimizing $\|w\|$ maximizes the margin between classes.
    - $x_i$: The feature vector of the $i$-th training sample.
    - $y_i$: The class label of the $i$-th sample (typically -1 or 1 for binary classification, extended for multi-class).
    - $b$: The bias term (offset) of the hyperplane.
    - $w \cdot x_i + b$: The hyperplane equation. The constraints ensure that all samples are correctly classified and lie outside the margin.
  - Kernel: For non-linear separability (when a straight line or plane cannot separate the data), a Radial Basis Function (RBF) kernel is used. This kernel implicitly maps the data into a higher-dimensional space where a linear separation may be possible.
    $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$
    Where:
    - $K(x_i, x_j)$: The kernel function output, representing the similarity between the feature vectors $x_i$ and $x_j$.
    - $\gamma$: A hyperparameter that defines how much influence a single training example has. A small $\gamma$ means a large radius of influence, and a large $\gamma$ means a small radius of influence. It is adjusted via cross-validation.
- Random Forest: Random Forest is an ensemble method that aggregates the predictions of multiple decision trees.
  - Training: An ensemble of $T$ decision trees is trained. Each tree is trained on a bootstrap sample (random sampling with replacement) of the original data. Additionally, at each node split, only a random subset of features is considered, which decorrelates the trees.
  - Prediction: The final prediction for a new sample is determined by majority vote among all the individual trees.
    $y_{pred} = \mathrm{mode}(\{y_1, y_2, \dots, y_T\})$
    Where:
    - $y_{pred}$: The final predicted class label.
    - $\mathrm{mode}$: The statistical mode (the most frequent class) among the predictions of the individual trees.
    - $y_1, \dots, y_T$: The class predictions from each of the $T$ decision trees.
Training and Testing
- Library: The implementation uses the scikit-learn library (version 1.2.2) in Python 3.9.
- Data Split: The dataset is divided into 70% for training and 30% for testing.
- Stratification: The train_test_split function is used with stratification, so the proportion of each activity class is maintained in both the training and testing sets. This prevents an uneven class distribution from biasing the evaluation (a minimal sketch follows this list).
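A minimal scikit-learn sketch of the split and the two classifiers described above. The feature matrix `X` and label vector `y` come from the preprocessing step; the random seed, the number of trees, and `gamma="scale"` are illustrative choices not stated in the paper.

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# 70/30 stratified split, as described above (X: per-window feature matrix, y: activity labels).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

svm = SVC(kernel="rbf", gamma="scale")          # gamma would be tuned via cross-validation
rf = RandomForestClassifier(n_estimators=100)   # number of trees is not reported in the paper

svm.fit(X_train, y_train)
rf.fit(X_train, y_train)
```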
Statistical Evaluation
The performance of the HAR models is assessed using standard classification metrics derived from the confusion matrix.
- Accuracy: The proportion of correctly classified samples out of the total number of samples.
  $Accuracy = \frac{\sum_k TP_k}{N}$
  Where:
  - $TP_k$: The number of true positives for class $k$ (samples correctly identified as class $k$).
  - $N$: The total number of samples across all classes.
- Precision: For a given class $k$, the ratio of correctly predicted positive observations to the total predicted positive observations. It measures the quality of positive predictions.
  $Precision_k = \frac{TP_k}{TP_k + FP_k}$
  Where:
  - $TP_k$: True positives for class $k$.
  - $FP_k$: False positives for class $k$ (samples incorrectly identified as class $k$).
- Recall (Sensitivity): For a given class $k$, the ratio of correctly predicted positive observations to all observations in the actual class. It measures the ability of the model to find all the positive samples.
  $Recall_k = \frac{TP_k}{TP_k + FN_k}$
  Where:
  - $TP_k$: True positives for class $k$.
  - $FN_k$: False negatives for class $k$ (samples from class $k$ that were incorrectly classified as another class).
- F1 Score: The harmonic mean of Precision and Recall. It provides a single metric that balances both, which is particularly useful when the class distribution is uneven.
  $F1_k = 2 \cdot \frac{Precision_k \cdot Recall_k}{Precision_k + Recall_k}$
  Where:
  - $Precision_k$: The precision for class $k$.
  - $Recall_k$: The recall for class $k$.
- K-fold Cross-validation: A technique to estimate the robustness of the models. The dataset is divided into $k$ equal folds; the model is trained on $k-1$ folds and tested on the remaining fold, and this process is repeated $k$ times, with each fold serving as the test set once. The average performance across all folds provides a more reliable estimate of generalization capability (see the sketch after this list).
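These metrics can all be obtained from scikit-learn, as sketched below; the sketch reuses `svm`, `X`, `y` and the split from the previous example, and `cv=5` is an assumed fold count because the paper does not state its value of $k$.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score

y_pred = svm.predict(X_test)
print(accuracy_score(y_test, y_pred))            # overall accuracy
print(classification_report(y_test, y_pred))     # per-class precision, recall and F1
print(confusion_matrix(y_test, y_pred))          # rows: true classes, columns: predicted classes

# k-fold cross-validation on the full dataset to estimate generalization.
scores = cross_val_score(svm, X, y, cv=5)        # k = 5 is an assumption; the paper does not state k
print(scores.mean(), scores.std())
```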
4.2.2. Optimization of Exoskeletons through Reinforcement Learning (RL)
The RL component focuses on optimizing exoskeleton control policies in simulated environments.
Simulation Environment
To safely and efficiently train RL policies, the research utilizes two simulation environments:
- Webots: An open-source robotic simulator (version 2023a). It incorporates the ODE (Open Dynamics Engine) physics engine to realistically model the exoskeleton's dynamics. The exoskeleton model within Webots includes hip, knee, and ankle joints, each equipped with simulated electric actuators that apply torques.
- OpenSim: Biomechanical software (version 4.4) used to simulate the human-exoskeleton interaction. It models muscle forces and joint angles based on a standard human musculoskeletal model, providing insights into the physiological impact of exoskeleton assistance.
RL Configuration
The Proximal Policy Optimization (PPO) algorithm is chosen for its stability and effectiveness in continuous action spaces.
- State Space: The state vector is the information the RL agent uses to make decisions. It contains:
  - Joint angles of the hip, knee, and ankle joints.
  - IMU data (acceleration and angular velocity from the exoskeleton or user, if integrated).
  - Interaction forces at the human-exoskeleton interface (how the human and exoskeleton push/pull on each other).
- Action Space: The action vector represents the control outputs of the RL agent, namely the torques applied by the actuators at the three exoskeleton joints:
  - Torque applied at the hip joint.
  - Torque applied at the knee joint.
  - Torque applied at the ankle joint.
- Reward Function: The reward function guides the RL agent to learn desired behaviors. It is designed to maximize forward speed, minimize metabolic cost, and maintain stability (a minimal sketch of this reward follows the list).
  $R = \omega_1 \cdot v_{forward} - \omega_2 \cdot E_{metabolic} + \omega_3 \cdot S_{stability}$
  Where:
  - $R$: The scalar reward value.
  - $v_{forward}$: The forward speed of the exoskeleton and user, measured in m·s⁻¹. The RL agent is rewarded for higher forward speed.
  - $E_{metabolic}$: The estimated metabolic cost (energy expenditure) of the user, measured in J·kg⁻¹. This term is subtracted because the goal is to minimize metabolic cost.
  - $S_{stability}$: The stability margin based on the position of the Zero Moment Point (ZMP) within the support base. Higher values indicate greater stability.
  - $\omega_1, \omega_2, \omega_3$: Weighting parameters that adjust the relative importance of speed, metabolic cost, and stability in the reward function. These were adjusted empirically (e.g., 1.0, 0.5, 0.8).
Training
- Library: Training is performed using the Stable-Baselines3 library (version 1.6.0) in Python.
- Architecture: An actor-critic neural network is used, comprising two hidden layers of 64 neurons each. The actor network learns the policy (mapping states to actions), and the critic network estimates the value function (how good a state or action is).
- Training Steps: The RL policy is trained for 1 million training steps, indicating a substantial amount of interaction with the simulated environment to learn robust control strategies (a minimal sketch follows this list).
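A minimal Stable-Baselines3 sketch of this training setup. The network size and step count follow the text; the environment is a stand-in (`Pendulum-v1` from gym), because the paper's Webots exoskeleton environment is not publicly described.

```python
import gym
from stable_baselines3 import PPO

# Stand-in continuous-control task; the Webots exoskeleton environment from the
# paper would be wrapped as a gym-style environment and used here instead.
env = gym.make("Pendulum-v1")

model = PPO(
    "MlpPolicy",
    env,
    policy_kwargs=dict(net_arch=[64, 64]),  # two hidden layers of 64 neurons, as stated above
    verbose=1,
)
model.learn(total_timesteps=1_000_000)      # 1 million training steps
model.save("ppo_exoskeleton_policy")
```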
HAR-RL Integration
The crucial step that links the two main components of the research.
HAR predictions (i.e., the detected activity label, such as "normal walking" or "stair climbing") are used to dynamically adjust the reward function of the RL agent.
- Mechanism: This integration is simulated by passing the HAR labels as an additional input to the RL state space. This allows the RL agent to be aware of the user's current activity.
- Dynamic Adjustment Example: If the HAR system detects "stair climbing," the reward function places a higher weight on stability (e.g., increasing $\omega_3$). If "normal walking" is detected, the reward function prioritizes forward speed and metabolic efficiency (e.g., increasing $\omega_1$). This ensures the exoskeleton provides context-aware assistance (see the sketch after this list).
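The snippet below sketches this activity-aware re-weighting. The paper describes the adjustment only qualitatively; the label strings and the numeric weights per activity are illustrative assumptions, with the paper's example weights (1.0, 0.5, 0.8) as the fallback.

```python
# Hypothetical mapping from HAR labels to reward weights (w1: speed, w2: metabolic
# cost, w3: stability). Values are illustrative, not taken from the paper.
ACTIVITY_WEIGHTS = {
    "normal_walking":  (1.2, 0.6, 0.8),   # favour speed and energy efficiency
    "climbing_stairs": (0.8, 0.5, 1.2),   # favour stability
}

def activity_aware_reward(har_label, v_forward, e_metabolic, s_stability):
    """Reward whose weights are switched according to the activity detected by HAR."""
    w1, w2, w3 = ACTIVITY_WEIGHTS.get(har_label, (1.0, 0.5, 0.8))  # default: example weights from the text
    return w1 * v_forward - w2 * e_metabolic + w3 * s_stability

print(activity_aware_reward("climbing_stairs", 0.9, 6.2, 4.9))
```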
5. Experimental Setup
5.1. Datasets
The paper primarily relies on custom-collected data for HAR and simulated environments for RL optimization.
- For Human Activity Recognition (HAR):
  - Source: Data was collected from human subjects performing five distinct activities.
  - Characteristics: It consists of triaxial accelerometer and triaxial gyroscope data.
    - Activities: normal walking, climbing stairs, descending stairs, sitting down, and rising from a chair.
    - Sampling Rate: 50 Hz.
    - Recording Length: Approximately 30 seconds per recording.
    - Format: Continuous time series data stored in CSV format.
  - Choice Justification: This custom dataset allows for specific control over the types of activities relevant to exoskeleton use and the sensor placement. The chosen activities represent common movements that an exoskeleton would need to assist.
  - Data Sample: The paper does not provide a concrete example of a raw data sample (e.g., a short snippet of CSV data), but describes its format as continuous time series of the three accelerometer and three gyroscope channel values.
- For Reinforcement Learning (RL) Optimization:
  - Source: Simulated environments: Webots (version 2023a) and OpenSim (version 4.4).
  - Characteristics:
    - Webots: Models exoskeleton dynamics (hip, knee, ankle joints, electric actuators) with an ODE physics engine.
    - OpenSim: Simulates human-exoskeleton interaction, including muscle forces and joint angles based on a standard human musculoskeletal model.
  - Choice Justification: Simulation environments are chosen to reduce the risks and costs associated with real-world experiments, allowing for extensive training of RL policies before deployment on physical hardware. They provide a controlled and repeatable environment for testing different control strategies and assessing their impact on metabolic cost, stability, and muscle effort.
5.2. Evaluation Metrics
The paper uses a comprehensive set of metrics to evaluate both the HAR system and the RL optimization, as well as their integration.
- For Human Activity Recognition (HAR): These metrics are derived from the confusion matrix and are standard for classification tasks. They are explained in detail in Section 4.2.1.
  - Accuracy: The overall proportion of correctly classified instances.
  - Precision ($Precision_k$): The proportion of true positive predictions among all positive predictions for a specific class $k$.
  - Recall ($Recall_k$): The proportion of true positive predictions among all actual positive instances for a specific class $k$.
  - F1 Score ($F1_k$): The harmonic mean of Precision and Recall for a specific class $k$.
- For Reinforcement Learning (RL) Optimization and HAR-RL Integration:
  - Speed: The forward velocity of the exoskeleton and user.
    - Conceptual Definition: Quantifies how quickly the exoskeleton can move the user in a straight line. Higher speed is generally desirable for efficient locomotion.
    - Unit: m·s⁻¹ (meters per second).
  - Metabolic Cost: The estimated energy expenditure of the human user.
    - Conceptual Definition: Represents the physiological energy consumed by the user during movement. A primary goal of exoskeletons is to reduce this cost, easing user effort and prolonging endurance.
    - Unit: J·kg⁻¹ (Joules per kilogram of body mass). The paper provides the unit but not the calculation formula; a common approach estimates metabolic power from joint torques and velocities, or from models of muscle activity (a hedged sketch of such an estimate follows this list).
  - Stability: A measure of the exoskeleton's balance, often quantified by the position of the Zero Moment Point (ZMP).
    - Conceptual Definition: Indicates how well the exoskeleton (and user) maintains balance during dynamic movements. A higher stability value implies a lower risk of falling. The paper bases it on the ZMP position within the support base.
    - Unit: cm (centimeters). The specific formula is not provided, but it would typically be inversely related to the deviation of the ZMP from the center of the support polygon.
  - Muscle Force (e.g., quadriceps muscle force):
    - Conceptual Definition: The force generated by specific muscles, indicating the effort exerted by the user. Reducing muscle force implies less physical strain on the user.
    - Unit: N (Newtons).
  - Ankle Impact:
    - Conceptual Definition: The peak force experienced at the ankle joint, particularly during events like foot strike. Reducing impact forces improves user comfort and reduces the risk of injury.
    - Unit: N (Newtons).
  - Adaptation Time: The time it takes for the exoskeleton control system to adjust its behavior in response to a detected change in activity.
    - Conceptual Definition: A measure of responsiveness. A faster adaptation time means the exoskeleton can quickly provide appropriate assistance when the user changes activity.
    - Unit: s (seconds).
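Since the paper does not state its metabolic cost formula, the sketch below shows one common mechanical-work-based approximation (positive joint mechanical power, divided by an assumed muscle efficiency and body mass). It is purely illustrative and is not the paper's method.

```python
import numpy as np

def metabolic_cost_estimate(torques, velocities, dt, body_mass_kg, efficiency=0.25):
    """Rough metabolic-cost proxy in J/kg: positive joint mechanical work divided by
    an assumed muscle efficiency and normalized by body mass.

    torques, velocities: arrays of shape (timesteps, n_joints) in N·m and rad/s.
    dt: simulation timestep in seconds.
    efficiency: assumed muscle efficiency (~25%); not a value from the paper.
    """
    mech_power = np.clip(torques * velocities, 0.0, None).sum(axis=1)  # W, positive work only
    mech_work = np.trapz(mech_power, dx=dt)                            # J
    return mech_work / (efficiency * body_mass_kg)
```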
5.3. Baselines
The proposed methods are compared against established techniques to demonstrate their effectiveness.
- For Human Activity Recognition (HAR):
  - Support Vector Machines (SVM): One of the two primary classification algorithms used and evaluated against Random Forest.
  - Random Forest: The other primary classification algorithm used and evaluated against SVM.
  SVM and Random Forest serve as internal baselines for each other within the HAR component, allowing a comparison of their performance on the specific dataset.
- For Reinforcement Learning (RL) Optimization:
  - Traditional PID Controller: A widely used and well-established control scheme in robotics, including exoskeletons, serving as the baseline for comparison with RL-based control. A PID (Proportional-Integral-Derivative) controller computes an error as the difference between a desired setpoint and a measured process variable and applies a correction based on proportional, integral, and derivative terms. It represents a common, non-adaptive control strategy (a minimal sketch follows this list).
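To make the baseline concrete, here is a textbook PID joint-torque controller in Python; the gains, setpoint, and timestep are illustrative values, not parameters reported in the paper.

```python
class PID:
    """Textbook PID joint-torque controller, included only to illustrate the baseline."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Illustrative gains and values: track a knee angle of 0.35 rad at a 50 Hz control rate.
knee_pid = PID(kp=50.0, ki=1.0, kd=5.0)
torque = knee_pid.step(setpoint=0.35, measurement=0.30, dt=0.02)  # torque command for this timestep
```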
6. Results & Analysis
This section presents and interprets the experimental results for human activity recognition (HAR) and exoskeleton optimization using reinforcement learning (RL), including the performance of their integration.
6.1. Core Results Analysis
6.1.1. HAR System Performance
The HAR system demonstrated strong performance in classifying human activities.
- The SVM model with an RBF kernel achieved an overall accuracy of 92% on the test set.
- The Random Forest algorithm achieved a comparable accuracy of 91%. This indicates that both supervised learning models are effective for this HAR task.
The detailed performance by class, focusing on SVM, is presented in the table below:
The following are the results from [Table 1] of the original paper:
| Activity | Precision | Recall | F1 Score |
| --- | --- | --- | --- |
| Normal walking | 0.95 | 0.94 | 0.94 |
| Climbing the stairs | 0.93 | 0.91 | 0.92 |
| Descending the stairs | 0.91 | 0.90 | 0.91 |
| Sitting down on a chair | 0.90 | 0.88 | 0.89 |
| Rising from a chair | 0.89 | 0.87 | 0.88 |
Analysis of Table 1:
- Normal walking showed the highest accuracy (Precision, Recall, and F1 Score of 0.95, 0.94, and 0.94, respectively). This is attributed to its consistent and often periodic sensor signals.
- Activities like sitting down and rising from a chair showed slightly lower Recall values (0.88 and 0.87, respectively). This suggests minor confusion between these two activities, likely due to similar movement dynamics (e.g., vertical motion, joint flexion/extension) during certain phases.
- Cross-validation confirmed the robustness of the models, with a small standard deviation of accuracy, implying good generalization to unseen data.

The confusion matrix for the SVM model (Figure 1) illustrates an idealized classification; the original figure shows the per-class results for climbing stairs, descending stairs, rising from a chair, sitting down, and walking.
Figure 1. Confusion matrix for HAR classification with SVM.
Analysis of Figure 1:
The confusion matrix (Figure 1) is presented as an idealized scenario with values of 1 on the diagonal, implying perfect classification for all activities. In a real-world scenario, the minor confusion between "sitting down" and "rising up from a chair" mentioned in the text would manifest as non-zero values off the diagonal in the corresponding cells. For instance, some "sitting down" instances might be misclassified as "rising up from a chair," and vice-versa. The authors suggest that additional features, such as signal energy or entropy, could potentially improve the distinction between these similar activities.
6.1.2. Channel Analysis and Feature Importance
- Accelerometer data contributed more significantly to classification than gyroscope data.
- The SMA (Signal Magnitude Area) feature calculated from accelerometer data was highly correlated with dynamic activities such as normal walking.
- For gyroscope data, the RMS (Root Mean Square) was the most informative feature. This suggests accelerometers are better at capturing the linear motion and impact of activities, while gyroscopes capture the rotational aspects.
6.1.3. Effect of Window Size
Additional tests explored the impact of segmentation window size on HAR performance and computational cost.
The following are the results from [Table 2] of the original paper:
| Window size (samples) | Accuracy | Processing time (s) |
| --- | --- | --- |
| 64 | 0.89 | 0.15 ± 0.02 |
| 128 | 0.92 | 0.22 ± 0.03 |
| 256 | 0.93 | 0.35 ± 0.04 |
Analysis of Table 2:
- A smaller window size (64 samples, 1.28 seconds) resulted in slightly lower accuracy (89%) but reduced processing time by about 30%. This represents a trade-off between accuracy and computational efficiency.
- Increasing the window size to 256 samples (5.12 seconds) slightly improved accuracy to 93% but came with a higher computational cost (0.35 s processing time).
- The chosen window of 128 samples (2.56 seconds) provided a good balance, with 92% accuracy and a reasonable processing time of 0.22 s.
6.1.4. SVM Model Training Evolution
The training process of the SVM model was monitored for accuracy and loss over 20 epochs.
The figure is a chart showing the evolution of training accuracy and validation accuracy of the SVM model over the training epochs, reflecting the improvement in model performance.
Figure 2. Evolution of SVM Model Accuracy During Training.
Analysis of Figure 2:
Figure 2 shows the evolution of accuracy for the SVM model. The blue line (training accuracy) rapidly increased, reaching 0.9 after 5 epochs and stabilizing at 0.95. The orange line (validation accuracy) leveled off at 0.92. The discrepancy between training and validation accuracy (0.95 vs. 0.92) suggests a slight overfitting of the model to the training dataset. While not severe, this indicates that the model performed marginally better on data it had seen during training than on unseen validation data.
The figure is a chart showing the loss evolution during SVM model training, with the epoch on the horizontal axis and the loss value on the vertical axis, including training loss and validation loss; both decrease as training progresses, indicating improving model performance.
Figure 3. Evolution of SVM Model Loss During Training.
Analysis of Figure 3:
Figure 3 displays the evolution of loss for the SVM model during training. Both the training loss (blue line) and validation loss (orange line) consistently decreased from an initial value of 1.5 to a final value of 0.2. This consistent reduction in loss for both sets confirms the model's convergence, indicating that the SVM successfully learned to minimize classification errors over the training period.
6.1.5. RL Optimization Performance
The Reinforcement Learning (RL) policy, trained using PPO in Webots, demonstrated significant improvements compared to a traditional PID controller for simulated normal walking.
The following are the results from [Table 3] of the original paper:
| Metric | RL (PPO) | PID |
| --- | --- | --- |
| Speed (m·s⁻¹) | 1.2 ± 0.1 | 1.1 ± 0.1 |
| Metabolic cost (J·kg⁻¹) | 5.1 ± 0.3 | 6.0 ± 0.4 |
| Stability (cm) | 4.5 ± 0.2 | 4.0 ± 0.3 |
Analysis of Table 3:
- Metabolic Cost Reduction: The RL policy reduced the estimated metabolic cost by 15% (from 6.0 ± 0.4 J·kg⁻¹ for PID to 5.1 ± 0.3 J·kg⁻¹ for RL). This is a key finding, demonstrating the energy efficiency benefits of RL.
- Speed Improvement: RL also achieved a slightly higher forward speed (1.2 ± 0.1 m·s⁻¹) compared to PID (1.1 ± 0.1 m·s⁻¹).
- Stability Enhancement: RL improved stability (measured as the stability margin) to 4.5 ± 0.2 cm, better than PID's 4.0 ± 0.3 cm.
6.1.6. HAR-RL Integration Performance
The integration of HAR predictions with RL control allowed for dynamic control adjustment based on the recognized activity.
- Stair Climbing: For "stair climbing," the RL policy (informed by HAR) increased stability by 10%, which is crucial for reducing the risk of falls during this challenging activity.
- Normal Walking: For "normal walking," the RL policy increased velocity by 8% (reaching 1.3 m·s⁻¹), optimizing for energy efficiency during sustained locomotion.
- Variable Terrain: On variable terrains (e.g., slopes), RL demonstrated superior stability compared to PID, reducing the ZMP deviation by 12% (from 3.5 cm to 3.1 cm). This highlights the adaptability of the RL approach.
- Reduced Muscle Effort: Tests conducted in OpenSim validated a reduction in muscle effort. The average quadriceps muscle force decreased by 16% with RL compared to PID, which directly translates to reduced user fatigue.
- Improved Comfort: During stair descending, RL reduced ankle impact by 18% (from 300 N to 246 N), indicating improved user comfort during impact-heavy movements.

Crucially, the HAR-RL integration significantly reduced the exoskeleton's adaptation time to activity changes from 1.2 seconds (without HAR) to 0.5 seconds, a substantial improvement in responsiveness.
The following are the results from [Table 4] of the original paper:
| Activity | Speed (m·s⁻¹) | Metabolic cost (J·kg⁻¹) | Stability (cm) |
| --- | --- | --- | --- |
| Normal walking (RL) | 1.2 ± 0.1 | 5.1 ± 0.3 | 4.5 ± 0.2 |
| Normal walking (HAR-RL) | 1.3 ± 0.1 | 4.9 ± 0.2 | 4.6 ± 0.2 |
| Climbing the stairs (RL) | 0.8 ± 0.1 | 6.5 ± 0.4 | 4.7 ± 0.2 |
| Climbing the stairs (HAR-RL) | 0.9 ± 0.1 | 6.2 ± 0.3 | 4.9 ± 0.2 |
Analysis of Table 4:
This table further clarifies the benefits of HAR-RL integration over RL alone for specific activities.
- For normal walking: HAR-RL slightly increased speed (1.3 vs 1.2 m·s⁻¹), further reduced metabolic cost (4.9 vs 5.1 J·kg⁻¹), and marginally improved stability (4.6 vs 4.5 cm) compared to RL without HAR context.
- For climbing the stairs: HAR-RL also showed improvements across the board, increasing speed (0.9 vs 0.8 m·s⁻¹), reducing metabolic cost (6.2 vs 6.5 J·kg⁻¹), and notably improving stability (4.9 vs 4.7 cm) compared to RL without HAR context.

These results confirm that explicitly informing the RL agent about the current activity via HAR leads to more refined and optimized exoskeleton assistance tailored to the task.
6.2. Ablation Studies / Parameter Analysis
The paper implicitly conducts an ablation study on the HAR component by evaluating the effect of window size on performance (Table 2). This shows a trade-off:
- Smaller windows (e.g., 64 samples) offer faster processing but lower accuracy.
- Larger windows (e.g., 256 samples) provide slightly higher accuracy but increase the computational cost. The choice of 128 samples (2.56 s) represents an optimized hyperparameter setting that balances accuracy (92%) and processing time (0.22 s).

While not a full ablation study in the sense of removing HAR entirely to compare with RL alone, the comparison of RL with and without HAR predictions (Table 4) serves a similar purpose, demonstrating the added value of the HAR component to the RL system. The reward function weights ($\omega_1, \omega_2, \omega_3$) are mentioned as being adjusted empirically, indicating that some parameter tuning was performed to achieve the reported RL performance.
7. Conclusion & Reflections
7.1. Conclusion Summary
This study successfully demonstrated the efficacy of an integrated framework combining Human Activity Recognition (HAR) with Reinforcement Learning (RL) for optimizing the control of bipedal exoskeletons. The HAR system achieved a high accuracy of 92% in classifying five common human activities using inertial sensor data, allowing for precise identification of user movements. Concurrently, the RL optimization, implemented in simulated environments, led to a significant 15% reduction in metabolic cost compared to traditional PID controllers, while also enhancing exoskeleton stability by 10% in dynamic scenarios like stair climbing. The key contribution lies in the seamless HAR-RL integration, which enabled rapid adaptation to activity changes, reducing the response time from 1.2 seconds to 0.5 seconds. This adaptive control minimized user effort (e.g., a 16% decrease in quadriceps muscle force) and improved comfort, offering substantial implications for medical rehabilitation and physical augmentation. The research moves towards reducing reliance on manual adjustments, paving the way for more adaptable and efficient exoskeleton systems.
7.2. Limitations & Future Work
The authors acknowledge several limitations:
- HAR Dependence on Training Data: The HAR system's performance is highly dependent on the training data, which, being collected from a limited number of subjects, may introduce bias and limit its generalizability to a wider population.
- Complexity of RL Simulations: The RL simulations in Webots and OpenSim do not fully reproduce complex real-world conditions, such as irregular surfaces or unpredictable human movements. This is a common challenge in robotics, known as the sim-to-real gap.
- Real-time Processing Computational Resources: Processing HAR data in real time can demand significant computational resources, which might limit its direct applicability on wearable devices with constrained processing power.

Based on these limitations, the authors suggest the following future research directions:
- Expanding the Dataset: To improve HAR robustness and generalizability, future work should focus on collecting data from a larger and more diverse group of subjects.
- Real-world Testing: Validating the integrated HAR-RL system under real-world conditions, beyond simulations, is crucial to address the sim-to-real gap and confirm its practical effectiveness.
- Deep Learning for HAR: Exploring deep learning architectures (e.g., convolutional neural networks, CNNs, or recurrent neural networks, RNNs) for HAR could potentially improve accuracy and robustness, especially for differentiating nuanced activities like sitting down and rising from a chair.
7.3. Personal Insights & Critique
This paper presents a highly relevant and promising approach to developing more intelligent and user-adaptive exoskeletons. The integration of HAR and RL is a logical and powerful synergy, addressing the critical need for exoskeletons to understand user intent and respond dynamically. The demonstrated reductions in metabolic cost and adaptation time are significant practical improvements.
Inspirations:
- The concept of a context-aware RL agent is particularly inspiring. By feeding HAR labels as an explicit part of the RL state, the agent is not just reacting to raw sensor data but is informed about the high-level task. This approach could be transferred to other human-robot interaction domains where robot autonomy needs to be guided by human intent, such as collaborative industrial robots or assistive home robots.
- The detailed analysis of HAR performance with different window sizes highlights the importance of signal processing parameters in AI systems, a crucial consideration often overlooked in favor of purely model-centric optimizations.
Potential Issues/Critique:
- Sim-to-Real Gap: While simulations are invaluable for RL training, the sim-to-real gap remains a major challenge. The metabolic cost and stability metrics are derived from simulations (Webots, OpenSim), and their direct translation to real human physiology and safety in physical exoskeletons requires rigorous validation. Factors like skin-exoskeleton interface friction, sensor noise in real-world scenarios, and unpredictable human perturbations are difficult to fully capture in simulations.
- Generalizability of HAR: The HAR dataset is custom-collected. Without details on the number of subjects, age, gender, or any physical limitations, the generalizability of the 92% accuracy is uncertain. Different individuals have varying biomechanics and activity execution styles, which could challenge the learned HAR model.
- Idealized Confusion Matrix: Presenting an "idealized classification" in the confusion matrix (Figure 1) rather than the actual one slightly diminishes the transparency of the HAR results. While the text discusses the confusions, visualizing them would have provided stronger evidence for the stated limitations.
- Computational Cost for Wearable Devices: The paper mentions computational cost as a limitation for wearable devices. Future work should explicitly consider edge AI solutions, model quantization, or pruning techniques to ensure that these AI algorithms can run efficiently on power-constrained, real-time exoskeleton controllers.
- Empirical Weight Adjustment: The reward function weights ($\omega_1, \omega_2, \omega_3$) were adjusted empirically. While common, this can be a tedious process; future research could explore meta-learning or AutoML techniques to optimize these hyperparameters more systematically.

Overall, this paper provides a solid foundation for adaptive exoskeleton control. Overcoming the sim-to-real gap and ensuring the generalizability and real-time efficiency of the HAR component will be crucial steps in translating this promising research into widely applicable and impactful technologies.