EP 4174720 A1 20230503 - METHOD AND CONTROL DEVICE FOR CONTROLLING A TECHNICAL SYSTEM

Title (en)

METHOD AND CONTROL DEVICE FOR CONTROLLING A TECHNICAL SYSTEM

Title (de)

VERFAHREN UND STEUEREINRICHTUNG ZUM STEUERN EINES TECHNISCHEN SYSTEMS

Title (fr)

PROCÉDÉ ET DISPOSITIF DE COMMANDE D'UN SYSTÈME TECHNIQUE

Publication

EP 4174720 A1 20230503 (DE)

Application

EP 21205071 A 20211027

Priority

EP 21205071 A 20211027

Abstract (en)

[origin: WO2023072528A1] To control a technical system (TS), training data are read in, each training dataset (TD) comprising a state dataset (S), an action dataset (A) and a resulting performance value (R) of the technical system. Using the training data, a first machine learning module (NN1) is trained to reproduce a resulting performance value (R) from a state dataset (S) and an action dataset (A). State datasets (S) are also supplied to a plurality of different deterministic control agents (P1, P2,...), and the resulting output data are fed into the trained first machine learning module (NN1) as action datasets. Depending on the performance values output by the trained first machine learning module (NN1), several of the control agents are then selected. According to the invention, the technical system is controlled by each of the selected control agents (SP1,...,SPK), wherein further state datasets (ES), action datasets (EA) and performance values (ER) are captured and added to the training data. With the training data thus supplemented, the above method steps are repeated, starting from the training of the first machine learning module (NN1).
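
The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the toy system `system_performance`, the least-squares model standing in for the neural network NN1, the linear agents a = gain * s, and all parameter values (`rounds`, `k`, sample sizes) are assumptions introduced here purely for illustration.

```python
import numpy as np

def system_performance(s, a):
    # Hypothetical stand-in for the technical system TS:
    # performance R is highest when the action equals 2 * state.
    return -(a - 2.0 * s) ** 2

def features(s, a):
    # Quadratic features of state and action for the stand-in model.
    return np.stack([np.ones_like(s), s, a, s * s, s * a, a * a], axis=1)

def train_performance_model(S, A, R):
    # Stand-in for NN1 (assumption: least squares instead of a neural net).
    # Learns to reproduce R from (S, A).
    coef, *_ = np.linalg.lstsq(features(S, A), R, rcond=None)
    return lambda s, a: features(s, a) @ coef

def run(rounds=3, k=2, seed=0):
    rng = np.random.default_rng(seed)
    # Deterministic control agents P1, P2, ...: a = gain * s (hypothetical).
    gains = np.linspace(-1.0, 4.0, 21)
    # Initial training data TD = (S, A, R).
    S = rng.uniform(-1.0, 1.0, 50)
    A = rng.uniform(-1.0, 1.0, 50)
    R = system_performance(S, A)
    selected = np.array([])
    for _ in range(rounds):
        model = train_performance_model(S, A, R)
        # Feed each agent's actions into the trained model and score them.
        scores = np.array([model(S, g * S).mean() for g in gains])
        # Select the k agents with the highest predicted performance.
        selected = gains[np.argsort(scores)[-k:]]
        # Control the system with each selected agent and record
        # further data (ES, EA, ER), then add it to the training data.
        for g in selected:
            ES = rng.uniform(-1.0, 1.0, 20)
            EA = g * ES
            ER = system_performance(ES, EA)
            S = np.concatenate([S, ES])
            A = np.concatenate([A, EA])
            R = np.concatenate([R, ER])
    return selected, len(S)

if __name__ == "__main__":
    agents, n_samples = run()
    print("selected agent gains:", agents)
    print("training samples after all rounds:", n_samples)
```

Each round retrains the model on the supplemented data, so the data gathered by the selected agents influences which agents are selected in the next round.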

Abstract (de)

Zum Steuern eines technischen Systems (TS) werden Trainingsdaten eingelesen, wobei ein jeweiliger Trainingsdatensatz (TD) einen Zustandsdatensatz (S), einen Aktionsdatensatz (A) sowie einen resultierenden Performanzwert (R) des technischen Systems umfasst. Anhand der Trainingsdaten wird ein erstes Maschinenlernmodul (NN1) darauf trainiert, anhand eines Zustandsdatensatzes (S) und eines Aktionsdatensatzes (A) einen resultierenden Performanzwert (R) zu reproduzieren. Weiterhin werden einer Vielzahl von unterschiedlichen deterministischen Steuerungsagenten (P1, P2,...) jeweils Zustandsdatensätze (S) zugeführt und resultierende Ausgabedaten als Aktionsdatensätze in das trainierte erste Maschinenlernmodul (NN1) eingespeist. Abhängig von durch das trainierte erste Maschinenlernmodul (NN1) ausgegebenen Performanzwerten werden dann mehrere der Steuerungsagenten selektiert. Erfindungsgemäß wird das technische System durch die selektierten Steuerungsagenten (SP1,...,SPK) jeweils gesteuert, wobei weitere Zustandsdatensätze (ES), Aktionsdatensätze (EA) und Performanzwerte (ER) erfasst und zu den Trainingsdaten hinzugefügt werden. Mit den so ergänzten Trainingsdaten werden die obigen Verfahrensschritte ab dem Training des ersten Maschinenlernmoduls (NN1) wiederholt.

IPC 8 full level

G06N 3/02 (2006.01); G06N 20/20 (2019.01); G06N 3/08 (2023.01)

CPC (source: EP)

G06N 3/02 (2013.01); G06N 20/20 (2019.01); G06N 3/086 (2013.01); G06N 3/088 (2013.01)

Citation (applicant)

TATSUYA MATSUSHIMA, HIROKI FURUTA, YUTAKA MATSUO, OFIR NACHUM, SHIXIANG GU, DEPLOYMENT-EFFICIENT REINFORCEMENT LEARNING VIA MODEL-BASED OFFLINE OPTIMIZATION, 2021, Retrieved from the Internet <URL:https://arxiv.org/abs/2006.03647>
CHENG S., SHI Y., QIN Q.: "Neural Information Processing", 2011, ICONIP, article "Promoting Diversity in Particle Swarm Optimization to Solve Multimodal Problems"
"Lecture Notes in Computer Science", vol. 7063, SPRINGER

Citation (search report)

[I] DE 102016224207 A1 20180607 - SIEMENS AG [DE]
[A] DE 102012216574 A1 20140320 - SIEMENS AG [DE]
[A] PHILLIP SWAZINNA ET AL: "Overcoming Model Bias for Robust Offline Deep Reinforcement Learning", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 22 July 2021 (2021-07-22), XP081999699
[AD] TATSUYA MATSUSHIMA ET AL: "Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization", ARXIV.ORG, 23 June 2020 (2020-06-23), XP081692404
[A] WANG YUE ET AL: "Competitive Multi-agent Deep Reinforcement Learning with Counterfactual Thinking", 2019 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), IEEE, 8 November 2019 (2019-11-08), pages 1366 - 1371, XP033700541, DOI: 10.1109/ICDM.2019.00175

Designated contracting state (EPC)

AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

Designated extension state (EPC)

BA ME

Designated validation state (EPC)

KH MA MD TN

DOCDB simple family (publication)

EP 4174720 A1 20230503; CN 118176509 A 20240611; WO 2023072528 A1 20230504

DOCDB simple family (application)

EP 21205071 A 20211027; CN 202280072453 A 20220930; EP 2022077309 W 20220930