About

My research connects computer science and economic theory, at the intersection of multiagent systems, game theory and optimization. I study the computational aspects of rational decision making in two distinct settings: when players do not have complete information and must gather it by interacting with and learning from their environment, and when players have the information but are unable to compute an exact solution (e.g., due to computational, memory or time limitations). I have been a member of the board of directors of the Association for Trading Agent Research (ATAR) since 2015.

Multiagent Systems, Machine Learning, Reinforcement Learning, Algorithmic Trading, Computational Game Theory, Mechanism Design, Sensor Networks, Internet of Things, Distributed Systems

New journal paper in JAAMAS


Efficiently detecting switches against non-stationary opponents

Pablo Hernandez-Leal · Yusen Zhan · Matthew E. Taylor · L. Enrique Sucar · Enrique Munoz de Cote

Abstract

Interactions in multiagent systems are generally more complicated than single-agent ones. Game theory provides solutions on how to act in multiple-agent scenarios; however, it assumes that all agents will act rationally. Moreover, some works also assume the opponent will use a stationary strategy. These assumptions usually do not hold in real-world scenarios, where agents have limited capacities and may deviate from a perfectly rational response. Our goal is still to act optimally in these cases by learning the appropriate response, without any prior policies on how to act. Thus, we focus on the problem where another agent in the environment uses different stationary strategies over time. This turns the problem into learning in a non-stationary environment, posing a problem for most learning algorithms. This paper introduces DriftER, an algorithm that 1) learns a model of the opponent, 2) uses it to obtain an optimal policy and then 3) determines when it must re-learn because of an opponent strategy change. We provide theoretical results showing that DriftER is guaranteed to detect switches with high probability. We also provide empirical results showing that our approach outperforms state-of-the-art algorithms, in normal form games such as the prisoner's dilemma and then in a more realistic scenario, the Power TAC simulator.

Keywords Learning · Non-stationary environments · Switching strategies · Repeated games
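DriftER's switch detection hinges on tracking how well the learned opponent model keeps predicting the opponent's actions. As a rough illustration only (the class, window size and threshold below are hypothetical simplifications of mine, not the paper's actual statistical machinery), a sliding-window accuracy monitor captures the flavor of the idea:

```python
from collections import deque

class SwitchDetector:
    """Sketch: flag an opponent switch when the current opponent model's
    prediction accuracy over a sliding window drops below a threshold."""

    def __init__(self, window=30, threshold=0.6):
        self.outcomes = deque(maxlen=window)  # recent hit/miss predictions
        self.threshold = threshold            # minimum acceptable accuracy

    def observe(self, predicted_action, actual_action):
        self.outcomes.append(predicted_action == actual_action)

    def switch_detected(self):
        if len(self.outcomes) < self.outcomes.maxlen:
            return False                      # not enough evidence yet
        return sum(self.outcomes) / len(self.outcomes) < self.threshold

# Demo: an opponent cooperates for 200 steps, then switches to defection;
# the model learned during the first phase keeps predicting cooperation.
detector = SwitchDetector()
for t in range(400):
    detector.observe("C", "C" if t < 200 else "D")
    if detector.switch_detected():
        print(f"switch flagged at step {t}")  # fires shortly after t = 200
        break
```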


Journal paper in Adaptive Behavior

Transfer learning by prototype generation in continuous spaces

Enrique Munoz de Cote · E. O. Garcia · E. Morales

Abstract

In machine learning, learning a task is expensive (many training samples are needed) and it is therefore of general interest to be able to reuse knowledge across tasks. This is the case in aerial robotics applications, where an autonomous aerial robot cannot interact with the environment hazard-free. Prototype generation is a well-known technique commonly used in supervised learning to help reduce the number of samples needed to learn a task. However, little is known about how such techniques can be used in a reinforcement learning task. In this work we propose an algorithm that, in order to learn a new (target) task, first generates new samples—prototypes—based on samples acquired previously in a known (source) task. The proposed approach uses Gaussian processes to learn a continuous multidimensional transition function, rendering the method capable of reasoning directly in continuous (state and action) domains. We base the prototype generation on carefully selecting a subset of samples from the source task (based on known filtering techniques) and transforming those samples using the (little) knowledge acquired in the target task. Our experimental evidence, gathered in known reinforcement learning benchmark tasks as well as a challenging quadcopter-to-helicopter transfer task, suggests that prototype generation is feasible and, furthermore, that the filtering technique used is not as important as a correct transformation model.
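As a toy illustration of the pipeline described above (filter source samples, estimate a transformation from a few target samples, fit a Gaussian process on the transformed prototypes), the sketch below uses scikit-learn on a made-up one-dimensional dynamics pair. The domain, the random subsample standing in for the paper's filtering techniques, and the constant-offset transformation are all simplifying assumptions of mine:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical toy domain: 1-D state and 1-D action, linear dynamics.
rng = np.random.default_rng(0)
sa_source = rng.uniform(-1, 1, size=(200, 2))            # (state, action)
next_source = 0.9 * sa_source[:, 0] + 0.5 * sa_source[:, 1]

# Step 1: filter the source samples down to a representative subset
# (a plain random subsample stands in for the paper's filtering techniques).
idx = rng.choice(len(sa_source), size=40, replace=False)
sa_proto, next_proto = sa_source[idx], next_source[idx]

# Step 2: with only a handful of target-task samples, estimate how the
# target dynamics differ from the source (here: a constant offset).
sa_target = rng.uniform(-1, 1, size=(5, 2))
next_target = 0.9 * sa_target[:, 0] + 0.5 * sa_target[:, 1] + 0.3
gp_source = GaussianProcessRegressor(kernel=RBF()).fit(sa_source, next_source)
offset = np.mean(next_target - gp_source.predict(sa_target))

# Step 3: transform the filtered source samples into target prototypes and
# fit the target-task transition model on prototypes plus the real samples.
X = np.vstack([sa_proto, sa_target])
y = np.concatenate([next_proto + offset, next_target])
gp_target = GaussianProcessRegressor(kernel=RBF()).fit(X, y)
print(gp_target.predict([[0.2, -0.4]]))                  # predicted next state
```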


Journal paper in JAAMAS

An exploration strategy for non-stationary opponents

Pablo Hernandez-Leal · Yusen Zhan · Matthew E. Taylor · L. Enrique Sucar · Enrique Munoz de Cote

Abstract

The success or failure of any learning algorithm is partially due to the exploration strategy it exerts. However, most exploration strategies assume that the environment is stationary and non-strategic. In this work we shed light on how to design exploration strategies in non-stationary and adversarial environments. Our proposed adversarial drift exploration is able to efficiently explore the state space while keeping track of regions of the environment that have changed. This exploration is general enough to be applied in single-agent non-stationary environments as well as in multiagent settings where the opponent changes its strategy over time. We use a two-agent strategic interaction setting to test this new type of exploration, where the opponent switches between different behavioral patterns to emulate a non-deterministic, stochastic and adversarial environment. The agent's objective is to learn a model of the opponent's strategy in order to act optimally. Our contribution is twofold. First, we present drift exploration as a strategy for switch detection. Second, we propose a new algorithm called R-max# for learning and planning against non-stationary opponents. To handle such opponents, R-max# reasons and acts in terms of two objectives: 1) to maximize utilities in the short term while learning and 2) to eventually explore opponent behavioral changes. We provide theoretical results showing that R-max# is guaranteed to detect the opponent's switch and learn a new model in terms of finite sample complexity. R-max# makes efficient use of exploration experiences, which results in rapid adaptation and efficient drift exploration, to deal with the non-stationary nature of the opponent. We show experimentally how using drift exploration outperforms state-of-the-art algorithms that were explicitly designed for modeling opponents (in terms of average rewards) in two complementary domains.
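The short-term/long-term trade-off described above is what distinguishes R-max# from classic R-max, which never re-explores a state-action pair once it is marked "known". The sketch below is a hypothetical, stripped-down illustration of that drift-exploration ingredient only; the staleness rule and all parameters are mine, and the paper's algorithm additionally carries the sample-complexity guarantees mentioned in the abstract:

```python
from collections import defaultdict

class RMaxSharpSketch:
    """Sketch of drift exploration: as in R-max, unknown (state, action)
    pairs get an optimistic reward, but experience that has not been
    refreshed recently decays back to 'unknown' so changed regions
    (e.g., a switched opponent strategy) get re-explored."""

    def __init__(self, m=5, stale_after=100, r_max=1.0):
        self.m = m                      # visits needed to mark a pair 'known'
        self.stale_after = stale_after  # steps before experience goes stale
        self.r_max = r_max              # optimistic reward for unknown pairs
        self.counts = defaultdict(int)  # visit counts per (state, action)
        self.last_visit = {}            # timestep of most recent visit
        self.t = 0

    def optimistic_reward(self, state, action, observed_reward):
        """Optimism in the face of uncertainty: unknown pairs look
        maximally rewarding, which drives the planner to explore them."""
        if self.counts[(state, action)] < self.m:
            return self.r_max
        return observed_reward

    def update(self, state, action):
        self.t += 1
        self.counts[(state, action)] += 1
        self.last_visit[(state, action)] = self.t
        # Drift exploration: forget stale pairs so they become 'unknown'
        # again and the agent can notice an opponent switch there.
        for sa, visited_at in list(self.last_visit.items()):
            if self.t - visited_at > self.stale_after:
                self.counts[sa] = 0
                del self.last_visit[sa]
```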


2nd place at PowerTAC finals! Congratulations to the trading team

The trading team won 2nd place at the PowerTAC 2016 tournament finals held at IJCAI. You can check the results at the tournament's website.

PDF of the certificate


Paper at IEEE-CEC

Congratulations to Fernando and Ansel on the acceptance of the paper "Load Pattern Clustering Using Differential Evolution With Pareto Tournament," to be presented at IEEE-CEC.


Submissions to the Agent-Mix workshop at IJCAI 2016 are open; consider submitting

===============================================
CALL FOR PAPERS
First Workshop on Interactions with Mixed Agent Types (Agent-Mix)
Held at the Int. Joint Conference on Artificial Intelligence 2016 (IJCAI-16)
http://ccc.inaoep.mx/inmat
===============================================



Paper at AAMAS'16

Congratulations to Pablo Hernandez-Leal, Benjamin Rosman, Matthew Taylor and L. Enrique Sucar for the paper "A Bayesian Approach for Learning and Tracking Switching, Non-stationary Opponents," to be presented at AAMAS'16, Singapore, 9-13 May 2016.


Paper "Electrical Load Pattern Shape Clustering Using Ant Colony Optimization" accepted at EvoStar'16

Full paper at EvoStar'16, Porto, Portugal, 30 March - 1 April 2016

Abstract

Electrical Load Pattern Shape (LPS) clustering of customers is an important part of the tariff formulation process. Nevertheless, the patterns describing the energy consumption of a customer have some characteristics (e.g., a high number of features corresponding to time series reflecting the measurements of a typical day) that make their analysis different from other pattern recognition applications. In this paper, we propose a clustering algorithm based on ant colony optimization (ACO) to solve the LPS clustering problem. We use four well-known clustering metrics (i.e., CDI, SI, DEV and CONN), showing that the selection of a clustering quality metric plays an important role in the LPS clustering problem. Also, we compare our LPS-ACO algorithm with traditional algorithms, such as k-means and single-linkage, and a state-of-the-art Electrical Pattern Ant Colony Clustering (EPACC) algorithm designed for this task. Our results show that LPS-ACO performs remarkably well using any of the metrics presented here.
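For a concrete sense of the setting, the sketch below clusters synthetic 24-hour load profiles with the k-means baseline mentioned above and scores the result with a simplified dispersion-style metric. The data, the metric's normalization and all parameters are illustrative assumptions of mine, not the paper's LPS-ACO algorithm:

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.cluster import KMeans

# Synthetic daily load profiles: 24 hourly readings per customer, with a
# morning-peak group and an evening-peak group.
rng = np.random.default_rng(1)
hours = np.arange(24)
morning = np.exp(-0.5 * ((hours - 8) / 2.0) ** 2)
evening = np.exp(-0.5 * ((hours - 19) / 2.0) ** 2)
profiles = np.vstack([
    morning + 0.05 * rng.standard_normal((50, 24)),
    evening + 0.05 * rng.standard_normal((50, 24)),
])

# k-means baseline, one of the traditional algorithms compared in the paper.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles)

def dispersion_indicator(X, labels, centers):
    """Simplified CDI-style score: mean intra-cluster pairwise distance
    divided by the spread of the cluster centers (lower is better).
    Published CDI definitions differ in the exact normalization."""
    intra = np.mean([np.mean(pdist(X[labels == k]))
                     for k in np.unique(labels)])
    return intra / np.mean(pdist(centers))

print("dispersion:", dispersion_indicator(profiles, km.labels_,
                                          km.cluster_centers_))
```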
