
Title: Comparison of human motor control models and their application to robotics
Supervision: Prof. Dr. André Seyfarth, Prof. Dr. Frank Jäkel
Assistance: Dr. Karl-Otto Steinmetz
Author: Maximilian A. Stasica
Year: 2020

Abstract

Walking on soft grass, reaching for a glass of cold water, children playing - humans perform these everyday movements with the greatest ease. Even though most movements seemingly take no cognitive effort, the underlying processes are tremendously complex. Hitherto, these processes have not been fully understood, despite major efforts from the community. In the past half century, many researchers have tackled the task of decoding and modelling the cognitive processes of motor control, and many of their results are discussed in this work. The common practice of classifying motor control models into inverse and forward models is extended by hybrid models, bionic models and reinforcement-learning-based models. Four inverse models are considered, namely Direct Inverse Modelling, Distal Supervised Learning, Feedback-Error Learning and Auto-Imitation Learning. The forward models are represented by the classic Reafference Principle, and both approaches are combined in the hybrid MOSAIC model. Extending the classic models, a bionic approach is added, represented by the Virtual Pivot Point Principle. Newer approaches tend to focus on reinforcement-learning-based systems; in this category, the PoWER algorithm has proved successful and is therefore included in this work. Finally, to gain insight into a more technical approach, a simple PID controller is considered. To fully understand these processes, Marr's three levels of analysis are consulted: the author discusses the goals of motor control on the level of computational theory and the hardware implementation in the human body before moving on to the representations and algorithms of the models. After clarifying the algorithms, several of them are simulated in a MATLAB/Simulink environment. In the simulation, the models are applied to a simulated arm and tasked with three types of movements: a cyclic movement, a continuous movement and a single-stroke movement. The results are then compared and evaluated to shed light on possible differences in the models' performance. In a second step, it was intended to test the models on real hardware - a two-degree-of-freedom arm - to evaluate their performance in a pseudo-realistic environment. In comparison, AIL proved to outperform FEL in all considered movement types. FEL could not cope with perturbations but instead oscillated into instability, while AIL corrected the perturbation within a few seconds and continued in a stable manner. The PoWER algorithm could only be tested against a different sort of movement task, but yielded solid results nevertheless. In conclusion, comparing ideas from different backgrounds is thought to be not only interesting but also beneficial to the research community.

Categories of models

Inverse models

Inverse models perform a transformation from sensory variables to motor variables, which allows the motor control system to derive motor actions from the desired sensory consequences (Jordan & Wolpert, 1999). Generally, these models describe the transformation of target angles into the necessary succession of efferences or forces (Schiebl, 2008). As they provide the inverse dynamics of the muscles or limbs, they are called inverse-dynamic models (Schiebl, 2008). This distinguishes them from inverse-kinematic models, which transform the coordinates of a limb into a succession of joint angles (Schiebl, 2008).
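
To make the distinction concrete, the sketch below computes the inverse dynamics of a single, gravity-loaded joint: given a desired angular trajectory, it returns the torque sequence that would realise it. The arm parameters and the minimum-jerk-like trajectory are illustrative assumptions, not values from the thesis.

<code python>
import numpy as np

# Minimal sketch (illustrative values): inverse dynamics of a single joint with
# inertia I, viscous damping b and a point mass m at lever arm lc under gravity.
I, b, m, g, lc = 0.05, 0.1, 1.0, 9.81, 0.2

def inverse_dynamics(theta, dtheta, ddtheta):
    """Torque that realises (theta, dtheta, ddtheta) for the gravity-loaded link."""
    return I * ddtheta + b * dtheta + m * g * lc * np.sin(theta)

# Example: torque profile for a smooth reach from 0 to 1 rad within one second.
t = np.linspace(0.0, 1.0, 101)
theta = 10 * t**3 - 15 * t**4 + 6 * t**5          # minimum-jerk-like profile (rad)
dtheta = np.gradient(theta, t)
ddtheta = np.gradient(dtheta, t)
tau = inverse_dynamics(theta, dtheta, ddtheta)
print(f"peak torque: {tau.max():.2f} N*m")
</code>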

Direct Inverse Modelling

One of the first approaches to learning an inverse internal model is direct inverse modelling (DIM) by Jordan (1996). In this approach, a set of test inputs is presented to the plant, which then produces outputs. These input-output pairs are then used as supervised training data for the controller, which thereby learns to produce the plant input that yields a given output (Jordan, 1996).
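
A minimal sketch of this scheme for a linear, non-redundant plant could look as follows; the plant matrix, the random excitation and the least-squares fit are illustrative assumptions.

<code python>
import numpy as np

# Sketch of direct inverse modelling: excite the plant with random commands u,
# record the outputs y, and fit a controller y -> u by least squares.
rng = np.random.default_rng(0)
A = np.array([[1.5, 0.3], [0.2, 0.8]])            # plant gain, unknown to the learner

u_train = rng.normal(size=(500, 2))               # random motor commands
y_train = u_train @ A.T                           # observed plant outputs

# Supervised learning on (output, command) pairs: controller C with u ~ y @ C
C, *_ = np.linalg.lstsq(y_train, u_train, rcond=None)

# Use the learned controller to pick a command for a desired output y_star
y_star = np.array([1.0, -0.5])
u = y_star @ C
print("achieved output:", np.round(u @ A.T, 3))   # should be close to y_star
</code>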

Such a DIM produces good results for linear systems and converges towards correct parameter estimates, as shown by Goodwin and Sin (1984), but the redundancy of nonlinear systems leads to an erroneous controller, since averaging over several valid motor commands does not in general yield a valid command (Jordan, 1996). This is closely related to the degrees-of-freedom problem introduced by Bernstein (1967), which is concerned with the redundancy of the human body: as there is no unique set of controls to reach a target, finding a solution is, at first glance, greatly complicated (Haith & Krakauer, 2013).

Distal Supervised Learning

In contrast, Distal Supervised Learning takes an indirect approach. According to Jordan (1996), it combines indirect self-tuning control (Åström & Wittenmark, 1973) and indirect model reference adaptive control (Åström & Wittenmark, 1989). For a first-order forward plant model that is linear in the estimated state and the desired output, this linearity allows for a least-mean-squares (LMS) regression; the forward model is therefore an LMS processing unit. If the learner acquires a perfect forward model, the inverse model is perfect, too (Jordan, 1996). While this solution is intuitive for linear plants, the more complex case of nonlinear equations must also be considered. Jordan (1996) therefore defines an inverse model as any transformation that, when placed in series with the plant, yields the identity transformation. Considering the controller and the plant together as a composite system, Jordan (1996) concludes that the inverse model can also be obtained by training this composite to be the identity transformation. To cope with the problem of not knowing the dynamics and the state of the plant, Jordan (1996) derives an internal forward model of the plant which is then used for training. In this scenario, the internal forward model is trained using the prediction error, while the composite is trained to be the identity transformation using nonlinear supervised learning (Jordan, 1996). During the latter training, the parameters of the forward model are held fixed, allowing only the controller's parameters to change (Jordan, 1996). Thus, this is an indirect method of training the controller (Jordan, 1996).
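
The two training phases can be sketched with purely linear modules as follows; the plant, the learning rates and the plain gradient steps are simplifying assumptions rather than Jordan's original formulation.

<code python>
import numpy as np

# Sketch of distal supervised learning with linear modules:
# 1) learn a forward model A_hat of the plant from prediction errors,
# 2) freeze it and train the controller W so that the composite
#    (controller followed by forward model) approximates the identity.
rng = np.random.default_rng(1)
A = np.array([[1.2, 0.4], [0.1, 0.9]])            # true plant, unknown to the learner
A_hat = np.zeros((2, 2))                          # forward model estimate
W = np.zeros((2, 2))                              # controller (inverse model)
eta = 0.05

# Phase 1: LMS learning of the forward model from (command, outcome) pairs
for _ in range(2000):
    u = rng.normal(size=2)
    y = A @ u                                     # observed outcome
    A_hat += eta * np.outer(y - A_hat @ u, u)     # reduce the prediction error

# Phase 2: train the controller through the frozen forward model
for _ in range(2000):
    y_star = rng.normal(size=2)                   # desired outcome
    u = W @ y_star
    err = y_star - A_hat @ u                      # predicted performance error
    W += eta * np.outer(A_hat.T @ err, y_star)    # gradient step, A_hat held fixed

print("composite A @ W (should be close to identity):")
print(np.round(A @ W, 3))
</code>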

Feedback Error Learning

In an attempt to analyse the different computations needed to fulfil the demands of motor control, Kawato, Furukawa, and Suzuki (1987) identified the determination of the desired trajectory in visual space, the transformation into body coordinates and the generation of motor commands as the problems to be solved. To cope with these requirements, Feedback-Error Learning uses a feedforward component as well as feedback (Kawato, 1990). An operator calculates a feedforward torque which is thought to be sufficient to reach the target movement. It is then added to a feedback torque, which is calculated by comparing the desired angular position with the sensed angular position (K. T. Kalveram & Seyfarth, 2009). As the feedback torque acts as a measure of error and the output is used not only for sensorimotor control but also to adapt the inverse model, Feedback-Error Learning allows for online training, which is thought to be advantageous (Schiebl, 2008). Moreover, FEL's goal-oriented development proves advantageous if a goal orientation exists before learning begins (Jordan & Wolpert, 1999).
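
A compact sketch of this control loop for a single joint is given below; the plant, the linear-in-parameters inverse model, the PD gains and the learning rate are illustrative assumptions.

<code python>
import numpy as np

# Sketch of feedback-error learning on a simple joint (I*acc + b*vel = torque).
# The feedforward inverse model is linear in the desired kinematics, and the
# feedback torque serves as its teaching signal.
I_true, b_true, dt = 0.05, 0.2, 0.001
Kp, Kd, eta = 20.0, 2.0, 0.5
w = np.zeros(2)                                   # estimates of [I, b]

theta, dtheta = 0.0, 0.0
for ti in np.arange(0.0, 40.0, dt):
    th_d, dth_d, ddth_d = np.sin(ti), np.cos(ti), -np.sin(ti)   # desired trajectory

    x = np.array([ddth_d, dth_d])                 # regressor of the inverse model
    tau_ff = w @ x                                # feedforward torque
    tau_fb = Kp * (th_d - theta) + Kd * (dth_d - dtheta)
    tau = tau_ff + tau_fb

    ddtheta = (tau - b_true * dtheta) / I_true    # plant simulation (explicit Euler)
    dtheta += ddtheta * dt
    theta += dtheta * dt

    w += eta * tau_fb * x * dt                    # feedback torque as error signal

print("learned [I, b]:", np.round(w, 3), "  true:", [I_true, b_true])
</code>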

Auto Imitation Learning

As shown earlier, FEL uses supervised learning, which, according to K. T. Kalveram and Seyfarth (2010), is a sequential rule that seems time-consuming and imprecise. Therefore, K. T. Kalveram (2004) proposes a non-error-based algorithm using non-supervised learning, which can be applied in parallel. The concept is based on so-called auto-imitation, introduced by K. T. Kalveram (1981). To train the system, muscles or a blind teacher produce arbitrary torques which are fed into the controlled system (K. T. Kalveram, 2004). The torque and the resulting sensed angular kinematics are then fed into the learner, where they are correlated (K. T. Kalveram, 2004). Using the modified Hebbian learning rule (K. T. Kalveram, 1999), synaptic weights are calculated (K. T. Kalveram, 2004). After learning, the learner is disconnected and inserted into the operator, where it is used as an inverse model of the plant's dynamics (K. T. Kalveram, 2004).

In a simulation, K. T. Kalveram (1999) showed that this learning approach is capable of learning the inverse model of a one-jointed arm within 30 seconds.
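
The following sketch mimics this setting: a blind teacher injects arbitrary torques into a simulated joint, and torque and sensed kinematics are correlated online. The LMS-style update stands in for the modified Hebbian rule, and the plant and all constants are illustrative assumptions.

<code python>
import numpy as np

# Sketch of the auto-imitation idea: arbitrary torques drive the joint, and the
# torque is correlated with the resulting kinematics to obtain an inverse model.
rng = np.random.default_rng(2)
I_true, b_true, dt = 0.05, 0.2, 0.001
theta, dtheta = 0.0, 0.0
w = np.zeros(2)                                   # weights of the inverse model
eta = 0.5

for step in range(30_000):                        # roughly 30 s of motor babbling
    tau = 0.3 * np.sin(0.7 * step * dt) + 0.1 * rng.normal()   # arbitrary torque
    ddtheta = (tau - b_true * dtheta) / I_true    # resulting (sensed) kinematics
    dtheta += ddtheta * dt
    theta += dtheta * dt

    x = np.array([ddtheta, dtheta])               # kinematic regressor
    w += eta * (tau - w @ x) * x * dt             # correlate torque with kinematics

print("learned [I, b]:", np.round(w, 3), "  true:", [I_true, b_true])
# After learning, tau_ff = w @ [ddtheta_des, dtheta_des] can serve as the operator's
# inverse model, as in the feedback-error-learning sketch above.
</code>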

Forward models

Astonishingly, humans are able to update the estimate of their limb's position during a movement well before proprioceptive feedback is received (Shadmehr & Wise, 2005). In the case of motor control, knowledge of the motor command alone is obviously not sufficient to provide a solid estimate of the limb's future location, as there are more factors to account for (Shadmehr & Wise, 2005). Most notably, the limb's dynamics - its mass, viscosity etc. - and the state of the arm have to be known or at least estimated (Shadmehr & Wise, 2005). This requires the generation of a so-called forward model of the arm to estimate the mapping from force to motion, that is, the dynamics (Shadmehr & Wise, 2005). A good forward model should therefore be capable of predicting future states of the limb (Shadmehr & Wise, 2005).
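
As a minimal illustration, the sketch below predicts the future state of a single-joint arm from its current state estimate and the outgoing motor command, before any sensory feedback arrives; the plant parameters and the prediction horizon are illustrative assumptions.

<code python>
import numpy as np

# Sketch of a forward model: given the current state estimate and the outgoing
# motor command, predict the state some time ahead of the sensory feedback.
I, b, dt = 0.05, 0.2, 0.001

def forward_model(theta, dtheta, tau, steps=100):
    """Predict (theta, dtheta) `steps` time steps ahead under constant torque tau."""
    for _ in range(steps):
        ddtheta = (tau - b * dtheta) / I
        dtheta += ddtheta * dt
        theta += dtheta * dt
    return theta, dtheta

# Predict where the limb will be 100 ms after issuing a 0.1 N*m command.
theta_pred, dtheta_pred = forward_model(theta=0.0, dtheta=0.0, tau=0.1, steps=100)
print(f"predicted angle: {theta_pred:.3f} rad, velocity: {dtheta_pred:.3f} rad/s")
</code>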

The Reafference Principle

To gain an insight into forward models, it makes sense to start with some basic concepts. As neurons transfer signals both to and from the motor centre, we distinguish between afferent and efferent signals: afferences provide information to the motor centre, while efferences convey commands to muscles, limbs or any other effector. This distinction is critical for von Holst and Mittelstaedt's Reafference Principle, a widely known motor control scheme since its publication in 1950. It contains several motor centres of different ranks. If a movement is intended, the high-level centre sends a command K to the respective lower-ranked centres. The lower centre stores a so-called efference copy, which later allows it to compare the intended efference to the afference provided by the effector. The difference is then fed back to the high-level centre, which can alter its commands accordingly (von Holst & Mittelstaedt, 1950).

This system is therefore able to distinguish between signals sent by the self and signals sent by the environment, providing continuous and stable motor control (Schiebl, 2008). However, it is not obvious how the lowest unit could directly process an efference copy and compare it to the afference. To cope with this problem, Hein and Held (1962) introduced the concept of a correlation storage, which K. T. Kalveram (1998) used to create a modified version of the Reafference Principle. By neuronally recoding the efference copy through the correlation storage, an estimated reafference can be calculated, which is then tested against the afference. This yields an estimated exafference that is fed back to the motor centre (K. T. Kalveram, 1998). According to Schiebl (2008), the quality of this correlation storage is of great importance, as it determines not only the estimated reafference but also the exafference; errors could therefore multiply and propagate into new commands (Schiebl, 2008).
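
The following sketch illustrates the modified scheme with a scalar correlation storage: the efference copy is recoded into an estimated reafference, and subtracting it from the afference exposes the estimated exafference. The linear recoding and all signal statistics are illustrative assumptions.

<code python>
import numpy as np

# Sketch of the modified reafference principle with a scalar correlation storage.
rng = np.random.default_rng(3)
w = 0.0                                           # correlation storage (scalar gain)
eta = 0.05

for trial in range(2000):
    efference = rng.normal()                      # outgoing motor command
    reafference = 2.0 * efference                 # self-caused sensory change
    exafference = 0.5 * rng.normal()              # externally caused change
    afference = reafference + exafference         # what the sensors report

    est_reafference = w * efference               # recoded efference copy
    est_exafference = afference - est_reafference # fed back to the motor centre

    # adapt the correlation storage: once it is accurate, the estimated exafference
    # no longer correlates with the system's own command
    w += eta * est_exafference * efference

print("learned recoding gain (true value 2.0):", round(w, 3))
</code>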

Mixture/Hybrid models

Bridging the gap between inverse and forward models, Miall (2002) describes the MOSAIC model, a combination of both inverse and forward models. It uses information about the context of the plant and the desired feedback: the output of different inverse models is fed into the plant, and a forward model is used to estimate the sensory consequences, which are then compared to the actual outcome (Miall, 2002). MOSAIC therefore allows several inverse models to coexist, which are trained online and weighed against each other depending on the context.

Modular Selection and Identification for Control

One of the most common drawbacks of the single models discussed so far is that they not only have to adapt to every new context but also have to re-adapt to already known situations (Schiebl, 2008). Therefore, Wolpert and Kawato (1998) presented the multiple paired forward-inverse model, which features multiple control loops, each of which is responsible for a specific set of situations (Schiebl, 2008). This allows for a convenient way of re-learning already known models (Schiebl, 2008). On this basis, the modular selection and identification for control (MOSAIC) model, which contains multiple predictor-controller pairs, has been proposed (Wolpert & Ghahramani, 2000). Sensory information is used to set the prior probabilities of the possible contexts (Wolpert & Ghahramani, 2000). Suppose context 1 has the higher probability: then the motor commands for context 1 are computed and - before being applied - an efference copy is stored to estimate the sensory consequences for both contexts (Wolpert & Ghahramani, 2000). These predictions are then compared to the sensed feedback, which yields a likelihood value for each context (Wolpert & Ghahramani, 2000). Using Bayes' theorem, prior and likelihood are combined to generate a posterior for both contexts (Wolpert & Ghahramani, 2000). To apply this principle to motor control, the sensory-based prior is used not to generate the motor command directly, but to select the inverse model best suited for generating the appropriate motor command (Wolpert & Ghahramani, 2000). A prior is therefore calculated and fed into several forward models along with the system dynamics (Wolpert & Ghahramani, 2000). Subsequently, the estimates are tested, which allows for an assessment of the context and thus of the inverse model to use (Miall, 2002). According to Wolpert and Ghahramani (2000), such a system can learn multiple inverse and forward models simultaneously. However, because its design is oriented towards biological plausibility, MOSAIC is hard to describe mathematically (Osaga, Hirayama, Takenouchi, & Ishii, 2008).
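
The context-estimation step can be sketched as follows for two predictor-controller pairs; the Gaussian likelihood, the example numbers and the simple blending of commands are illustrative assumptions rather than the full MOSAIC architecture.

<code python>
import numpy as np

# Sketch of MOSAIC's context estimation with two predictor-controller pairs
# (e.g. "light object" vs. "heavy object"). Each forward model predicts the sensory
# consequence of the efference copy; comparing the predictions with the sensed
# feedback yields likelihoods, which Bayes' theorem combines with the sensory prior
# into responsibilities used to weight the inverse models.
def responsibilities(prior, predictions, feedback, sigma=0.1):
    lik = np.exp(-0.5 * ((feedback - predictions) / sigma) ** 2)   # Gaussian likelihoods
    post = prior * lik
    return post / post.sum()                                       # normalised posterior

prior = np.array([0.7, 0.3])            # visual context suggests a light object
predictions = np.array([1.0, 0.4])      # predicted lift velocity for each context
feedback = 0.42                         # sensed lift velocity (the object was heavy)

r = responsibilities(prior, predictions, feedback)
command = r @ np.array([0.5, 2.0])      # blend the two inverse-model commands
print("responsibilities:", np.round(r, 3), "  blended command:", round(command, 3))
</code>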

Bionic models

For balance control, several interesting models derived from biological principles exist. Even though this is not the prime focus of this work, at least one promising representative of this vast field has been included. The model discussed is the Virtual Pivot Point Principle, which enables postural control by redirecting the ground reaction force vector (Sharbafi & Seyfarth, 2017).

Virtual Pivot Point Principle

To understand the Virtual Pivot Point Principle (VPP), one has to know that in locomotion humans encounter so-called ground reaction forces (Sharbafi & Seyfarth, 2017). These forces act on the feet and, in the concept of a virtual pendulum, are redirected towards a point above the body's centre of mass (Sharbafi & Seyfarth, 2017). This point is called the Virtual Pivot Point (Maus et al., 2010). It allows the body to mimic a regular physical pendulum with the Virtual Pivot Point as hinge joint (Sharbafi & Seyfarth, 2017), in contrast to previously proposed models such as the inverted pendulum (Cavagna, Saibene, & Margaria, 1963) or the spring-loaded inverted pendulum (SLIP) (Blickhan, 1989). Advantageously, the Virtual Pivot Point Principle does not require active state-feedback control during the whole gait cycle and is not inherently unstable (Sharbafi & Seyfarth, 2017). The hip redirects the ground reaction forces such that stability is achieved (Sharbafi & Seyfarth, 2017), for which experimental evidence has been found (Maus et al., 2010). According to Maus et al. (2010), the Virtual Pivot Point can be considered a general template for locomotion, which nature exploits in several animals to achieve postural stability at low energy cost.
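
A highly simplified planar sketch of the underlying geometry is given below: for a massless leg transmitting an axial force plus a tangential force generated by hip torque, the hip torque is chosen so that the ground reaction force points from the foot through a point above the hip. Placing the VPP above the hip rather than above the centre of mass, as well as all numbers, are simplifying assumptions, not values from the cited papers.

<code python>
import numpy as np

# Simplified planar VPP sketch: choose the hip torque so that the total ground
# reaction force at the foot is directed from the foot through the virtual pivot
# point placed above the hip.
foot = np.array([0.0, 0.0])
hip = np.array([0.15, 1.0])                        # hip position during stance
r_vpp = 0.25                                       # VPP offset above the hip
F_axial = 700.0                                    # axial leg force in N

leg = hip - foot
leg_len = np.linalg.norm(leg)
leg_hat = leg / leg_len
perp_hat = np.array([-leg_hat[1], leg_hat[0]])     # unit vector normal to the leg

d = (hip + np.array([0.0, r_vpp])) - foot
d = d / np.linalg.norm(d)                          # desired GRF direction (foot -> VPP)

def cross2d(a, b):
    return a[0] * b[1] - a[1] * b[0]

# Choose F_t so that (F_axial*leg_hat + F_t*perp_hat) is parallel to d
F_t = -F_axial * cross2d(leg_hat, d) / cross2d(perp_hat, d)
tau_hip = F_t * leg_len                            # hip torque generating F_t at the foot

grf = F_axial * leg_hat + F_t * perp_hat
print(f"required hip torque: {tau_hip:.1f} N*m")
print("GRF directed through the VPP:", bool(np.isclose(cross2d(grf, d), 0.0)))
</code>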

Reinforcement Learning based models

The last learning mechanism considered is Reinforcement Learning (RL) (Sutton & Barto, 2018). In RL, each system output receives feedback in the form of reward or punishment provided by the environment (Wolpert et al., 2001). Consider, for example, the game of darts. Here, the control system provides an arm movement as output, which leads to a throw and a resulting point score. If the dart hits, for example, the centre of the dartboard, the athlete is rewarded with a high score; if the board is missed completely, the punishment is a zero score. The aim of the game is to maximise the point score - in training, the athlete therefore focuses on a precise and reproducible motor command to achieve maximum success. A good reward function allows RL to weigh the immediate gain against the long-term gain (Wolpert et al., 2001). For the darts athlete, it may not always be favourable to score maximum points (which would be the treble twenty): in a standard 501 game with double finish, the optimal result is a so-called 9-darter, which cannot be achieved by firing every dart into the treble twenty. To allow for a perfect run, the algorithm therefore has to weigh the long-term reward (the 9-darter) against the immediate point score. However, RL's major difference from other learning methods is that it explicitly requires exploration (Dhawale, Smith, & Ölveczky, 2017). It explores the motor space to find further stable and efficient solutions (Dhawale et al., 2017). Similarly, RL can be assumed to search the parameter space introduced by Braun, Aertsen, Wolpert, and Mehring (2009), which comprises the values an agent can vary to achieve its goal. As the parameter space may become exponentially large for higher-dimensional problems (Bishop, 2006), the aim is to reduce the computational cost from the beginning. A commonly used method is imitation (Kormushev, Calinon, & Caldwell, 2010), which uses an expert tutor's performance as a benchmark and explores neighbouring solutions via trial and error (Dhawale et al., 2017). Probably the best-known example of such an algorithm is Google's AlphaGo agent, which first imitated human players and then improved using RL (Silver et al., 2016).
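
As a toy illustration of exploration, the sketch below lets a simulated thrower perturb its release parameters and keep changes that increase the reward; the projectile "physics", the target distance and the hill-climbing rule are illustrative assumptions and not one of the models discussed in this work.

<code python>
import numpy as np

# Toy illustration of trial-and-error exploration in a motor task: a throw is
# parameterised by release angle and speed, and parameter perturbations that
# increase the reward are kept.
rng = np.random.default_rng(4)
TARGET = 2.37                                      # desired landing distance in m

def reward(params):
    angle, speed = params
    landing = speed**2 * np.sin(2 * angle) / 9.81  # idealised projectile range
    return -abs(landing - TARGET)                  # closer to the target is better

params = np.array([0.3, 4.0])                      # initial (poor) throwing habit
best = reward(params)
for trial in range(500):
    candidate = params + rng.normal(scale=0.05, size=2)   # motor exploration noise
    r = reward(candidate)
    if r > best:                                   # keep rewarded variations
        params, best = candidate, r

print("learned [angle, speed]:", np.round(params, 2), "  reward:", round(best, 3))
</code>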

Policy Learning by Weighting Exploration with the Returns

As an alternative to value-function-based RL algorithms, so-called policy search has received much attention in recent years. It is now commonplace that parameterised motor primitives can be used to learn motor skills, much as in imitation learning (Kober & Peters, 2009). But as the dimensionality increases, most common methods cannot cope with the demands of high-dimensional RL problems (Kober & Peters, 2009). As many motor learning tasks are episodic, the need for a powerful yet episodic RL method arose. Kober and Peters (2009) therefore created a framework from which they derived a new algorithm called Policy Learning by Weighting Exploration with the Returns (PoWER). Their goal was to find an RL technique that could be applied to motor primitives - which can be considered a kind of prestructured, parameterised policy - in the context of high-dimensional motor control tasks (Kober & Peters, 2009). Kober and Peters (2009) showed that the resulting algorithm outperforms the episodic Natural Actor-Critic (eNAC), Vanilla Policy Gradient (VPG), Finite Difference Gradient (FDG) and episodic Reward-Weighted Regression (RWR) in several simulated benchmark comparisons.
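
The core update can be sketched in a simplified episodic form with state-independent Gaussian exploration, where the parameter change is the exploration weighted by the returns of the best rollouts; the toy basis-function task, the rollout selection and the decaying exploration variance are simplifying assumptions rather than the experiments of Kober and Peters (2009).

<code python>
import numpy as np

# Sketch of a simplified episodic PoWER-style update: perform several rollouts with
# perturbed policy parameters, keep the best ones, and update the parameters with
# their exploration weighted by the returns, theta <- theta + sum(R*eps) / sum(R).
rng = np.random.default_rng(5)

t = np.linspace(0.0, 1.0, 50)
centers = np.linspace(0.0, 1.0, 5)
basis = np.exp(-((t[:, None] - centers[None, :]) ** 2) / 0.02)   # motor primitives
target = np.sin(np.pi * t)                                       # desired trajectory

def rollout_return(theta):
    trajectory = basis @ theta
    return np.exp(-np.mean((trajectory - target) ** 2))          # bounded, positive return

theta = np.zeros(5)
sigma = 0.3
for iteration in range(200):
    eps = rng.normal(scale=sigma, size=(10, 5))                  # exploration per rollout
    returns = np.array([rollout_return(theta + e) for e in eps])
    best = np.argsort(returns)[-5:]                              # keep the best rollouts
    theta = theta + (returns[best] @ eps[best]) / returns[best].sum()
    sigma *= 0.99                                                # slowly reduce exploration

print("initial return:", round(rollout_return(np.zeros(5)), 3),
      "  final return:", round(rollout_return(theta), 3))
</code>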

References

Åström, K. J., & Wittenmark, B. (1973). On self tuning regulators. Automatica, 9(2), 185–199.

Åström, K. J., & Wittenmark, B. (1989). Adaptive control. Addison Wesley Publishing Company, Reading, MA.

Bernstein, N. (1967). The regulation and coordination of movements. Oxford: Pergamon.

Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.

Blickhan, R. (1989). The spring-mass model for running and hopping. Journal of biomechanics, 22(11-12), 1217–1227.

Braun, D. A., Aertsen, A., Wolpert, D. M., & Mehring, C. (2009). Motor task variation induces structural learning. Current Biology, 19(4), 352–357.

Cavagna, G., Saibene, F., & Margaria, R. (1963). External work in walking. Journal of applied physiology, 18(1), 1–9.

Dhawale, A. K., Smith, M. A., & Ölveczky, B. P. (2017). The role of variability in motor learning. Annual review of neuroscience, 40, 479–498.

Goodwin, G., & Sin, K. (1984). Adaptive Filtering Prediction and Control. Prentice Hall Inc., Englewood Cliffs, New Jersey, USA.

Haith, A. M., & Krakauer, J. W. (2013). Theoretical models of motor control and motor learning. Routledge handbook of motor control and motor learning, 1–28.

Hein, A., & Held, R. (1962). A neural model for labile sensorimotor coordinations. In Biological prototypes and synthetic systems (pp. 71–74). Springer.

Jordan, M. I. (1996). Computational aspects of motor control and motor learning. In Handbook of perception and action (Vol. 2, pp. 71–120). Elsevier.

Jordan, M. I., & Wolpert, D. M. (1999). Computational motor control. MIT Press Cambridge, MA.

Kalveram, K. T. (1981). Erwerb sensumotorischer Koordinationen unter störenden Umwelteinflüssen: Ein Beitrag zum Problem des Erlernens von Werkzeuggebrauch. (Acquisition of sensorimotor co-ordinations under environmental disturbations. A contribution to the problem of learning to use a tool.). Erkennen, Wollen, Handeln. Festschrift für Heinrich Düker, 336–348. (German)

Kalveram, K. T. (1998). Wie das Individuum mit seiner Umwelt interagiert: psychologische, biologische und kybernetische Betrachtungen über die Funktion von Verhalten. (How the individual interacts with his environment: psychological, biological and cybernetic Considerations about the function of behaviour.). Pabst Science Publ. (German)

Kalveram, K. T. (1999). A modified model of the hebbian synapse and its role in motor learning. Human Movement Science, 18(2-3), 185–199.

Kalveram, K. T. (2004). The inverse problem in cognitive, perceptual, and proprioceptive control of sensorimotor behaviour: towards a biologically plausible model of the control of aiming movements. International Journal of Sport and Exercise Psychology, 2(3), 255–273.

Kalveram, K. T., & Seyfarth, A. (2009). Inverse biomimetics: How robots can help to verify concepts concerning sensorimotor control of human arm and leg movements. Journal of Physiology-Paris, 103(3-5), 232–243.

Kalveram, K. T., & Seyfarth, A. (2010). Learning the inverse dynamics of a robot arm by auto-imitation. In Proceedings of The International Multi-Conference on Complexity, Informatics and Cybernetics, April (pp. 6–9).

Kawato, M. (1990). Feedback-error-learning neural network for supervised motor learning. In Advanced neural computers (pp. 365–372). Elsevier.

Kawato, M., Furukawa, K., & Suzuki, R. (1987). A hierarchical neural-network model for control and learning of voluntary movement. Biological cybernetics, 57(3), 169–185.

Kober, J., & Peters, J. R. (2009). Policy search for motor primitives in robotics. In Advances in neural information processing systems (pp. 849–856).

Kormushev, P., Calinon, S., & Caldwell, D. G. (2010). Robot motor skill coordination with EM-based reinforcement learning. In 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 3232–3237).

Maus, H.-M., Lipfert, S., Gross, M., Rummel, J., & Seyfarth, A. (2010). Upright human gait did not provide a major mechanical challenge for our ancestors. Nature communications, 1(1), 1–6.

Miall, C. (2002). Modular motor learning. Trends in cognitive sciences, 6(1), 1–3.

Osaga, S., Hirayama, J.-i., Takenouchi, T., & Ishii, S. (2008). A probabilistic modeling of mosaic learning. Artificial Life and Robotics, 12(1-2), 167–171.

Schiebl, F. (2008). Force-Feedback unter besonderer Berücksichtigung interner Modelle. (Force feedback with special consideration of internal models.). Peter Lang. (German)

Shadmehr, R., & Wise, S. P. (2005). The computational neurobiology of reaching and pointing: a foundation for motor learning. MIT Press.

Sharbafi, M. A., & Seyfarth, A. (2017). Bioinspired legged locomotion: models, concepts, control and applications. Butterworth-Heinemann.

Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., . . . others (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489.

Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.

von Holst, E., & Mittelstaedt, H. (1950). Das Reafferenzprinzip. (The Reafference-principle.). Naturwissenschaften, 37, 464–476. (German)

Wolpert, D. M., & Ghahramani, Z. (2000). Computational principles of movement neuroscience. Nature neuroscience, 3(11), 1212–1217.

Wolpert, D. M., Ghahramani, Z., & Flanagan, J. R. (2001). Perspectives and problems in motor learning. Trends in cognitive sciences, 5(11), 487–494.

Wolpert, D. M., & Kawato, M. (1998). Multiple paired forward and inverse models for motor control. Neural networks, 11(7-8), 1317–1329.
