Electric demand response and bounded rationality: mean-field control for large populations of heterogeneous bounded-rational agents

The increased penetration of renewable energy sources into existing power systems induces challenges in supply–demand balancing. Demand-side flexibility is seen as an option to accommodate variability and limited predictability from renewable energy generation. Heat pumps at residential level, if well coordinated, can be one of those flexibility sources. The complexity involved is high though, since their coordinated operation combines control, population effects and the fact agents may actually not behave as rational decision-makers. We describe here a coordinated control framework that accounts for those aspects altogether. Decentralized model predictive control for large populations of heterogeneous agents is employed. As the cost to be minimized is affected by the population behaviour as a whole through the electricity price, the decentralized control is re-thought as a mean-field game. Existence and uniqueness of a Nash equilibrium are discussed while the Picard–Banach algorithm is used as a solution approach. It is extended to the case of bounded-rational agents. The impact on system dynamics of modelling agents as bounded rational is illustrated through numerical simulations. This article is part of the theme issue ‘The mathematics of energy systems’.

The increased penetration of renewable energy sources into existing power systems induces challenges in supply-demand balancing. Demandside flexibility is seen as an option to accommodate variability and limited predictability from renewable energy generation. Heat pumps at residential level, if well coordinated, can be one of those flexibility sources. The complexity involved is high though, since their coordinated operation combines control, population effects and the fact agents may actually not behave as rational decision-makers. We describe here a coordinated control framework that accounts for those aspects altogether. Decentralized model predictive control for large populations of heterogeneous agents is employed. As the cost to be minimized is affected by the population behaviour as a whole through the electricity price, the decentralized control is re-thought as a mean-field game. Existence and uniqueness of a Nash equilibrium are discussed while the Picard-Banach algorithm is used as a solution approach. It is extended to the case of bounded-rational agents. The impact on system dynamics of modelling agents as bounded rational is illustrated through numerical simulations.
This article is part of the theme issue 'The mathematics of energy systems'. systems with a more proactive role of consumers, it is of utmost importance to better account for behaviour in the design of mechanisms and operations [20]. An unaddressed problem in the demand response literature is whether bounded rationality impacts equilibria and corresponding agent decisions.
Our main contribution here is hence to show how bounded rationality may be accounted for within a decentralized model predictive control set-up for demand response seen as a mean-field game, as well as its impact on equilibria. The existence and uniqueness of a Nash equilibrium are preserved when generalizing to this bounded rationality case, while it allows to simulate different forms of bounded rationality. Note, however, that we are not aiming to hedge against bounded rationality as could be done in a robust optimization framework [21]. Similarly, we are not looking at how bounded rationality specifically affects decisions (here for demand response), e.g. in the way, the sparse-max model of [22] is employed.
The coordination of a large population of heat pumps through dynamic pricing is used as a representative problem, with interesting complexities coming from constraints on system state (i.e. indoor temperature within dwellings) and the heat pumps themselves (power setpoints and ramping constraints). Network-related aspects are not considered in our problem set-up, which is in line with a broad range of use cases related to demand response, e.g. peak shaving and balancing support, for which grid constraints are not accommodated. Electricity consumers are increasingly facing dynamic pricing of electric energy, whether grid-related constraints are considered or not. We adopt a mean-field decentralized control approach in which agents respond optimally to common price sequences issued by a non-profit central coordinator. These depend both on the predicted inelastic demand and total generation capacity, as-well as on the overall population flexible consumption. The objective is to find the price sequence that drives the population to the Nash equilibrium. The mean-field control of thermostatically controlled loads is performed through model predictive control (often abbreviated MPC) at an agent-based level and on a rolling-window set-up, hence allowing to adapt to information updates. Agents update their optimal control trajectories every time forecasts are revised. We eventually relax the rational agent assumption and assume that agents can deviate from their optimal (if rational) trajectory. We then characterize the resulting equilibria and compare them with the ideal rational case based on simulations. The existence and uniqueness of these equilibria are proven using tools from variational inequalities. The Nash equilibria are obtained using algorithms proposed in [23] for deterministic mean-field games with heterogeneous convex constraints. Finally, our approach is applied to different case studies where we explore the impact of bounded rationality and forecast accuracy on decisions and system outcomes.
The remaining of the paper is organized as follows: §2 describes the decentralized model predictive control set-up and related mean-field game. Section 3 covers the relevant tools from variational inequalities to study the Nash Equilibrium, as well as the solution approach. Section 4 presents the concept of bounded rationality in demand response and the way it is considered in the present work. Section 5 gathers results from numerical simulations to illustrate and analyse the impact of bounded rationality on-demand response within our decentralized model predictive control framework. Finally, §6 gathers a set of conclusions and perspectives for future work.

From decentralized model predictive control to the related mean-field game
We first present the coordination problem as a whole. The local optimization problems at agent level are then described, to eventually close with a discussion of the price determination, which makes the problem a mean-field game. All agents are so far considered rational decision-makers.  corresponding household-the heat pump design problem is not considered here, as emphasis is placed on operational problems only. A heat pump converts electricity into heat according to its coefficient of performance. The heat dynamics of the building allow to partly decouple electricity and heat in time, i.e. the building and air masses have enough thermal inertia to shift the heat peak in time to an hour when the electricity price is cheaper. Heat pumps then provide flexibility to the electricity system side, which may help to accommodate variability and limited predictability of renewable power generation, or different types of power system contingencies.

(a) Coordination problem
Being at a given time w within a simulation period of interest W, a forward-looking time window with τ lead times (referred to as the planning horizon) is considered in order to account for the time-binding constraints of the heat pumps, as well as the temporal dynamics of the indoor temperature in dwellings. In practice, for instance, if aiming to steer the population of heat pumps over the next 6 h with a 15 min temporal resolution, this yields 24 lead times. This also yields τ prices p t , one for each lead time t, t ∈ T = {0, . . . , τ − 1}. When relevant, this price sequence will be summarized by p, with p = [p 0 , . . . , p τ −1 ] . In a model predictive control framework with rolling window, however, only the decisions at lead time t = 0 are implemented and binding-decisions being (i) the electricity price for the central coordinator, and (ii) the electricity consumption of the heat pump at the agent level. Then, the window is shifted by one step, and the process continues.
Each household equipped with a heat pump is considered as an agent in the game, with index i. The set-up is illustrated in figure 1. Agents compete to procure electricity to cover their heat demand in the most cost-effective way-we are dealing with a non-cooperative game. This is while the population of electricity consumers is being coordinated by a non-profit central entity through prices. That central coordinator could be an aggregator, the supervisory node of an energy community [24] or a local electricity market. For a given agent i, i ∈ N = {1, . . . , N}, electricity demand is the sum of an inflexible (and must-serve) component d i and of a flexible (and elastic) component u i . The average inflexible demand (over the population of N agents) is an input to the problem of the central coordinator, in the form of a series of load forecastŝ d = [d 0 , . . . ,d τ −1 ] for the coming τ lead times. The reason why one works with averages instead of sums is due to the mean-field formulation that will be explained towards the end of this section. The other input to the central coordinator is κ, the total generation capacity scaled by the number of agents N.
For simplicity, the flexible component only relates to heat pumps. This may be readily extended to more advanced set-ups with additional flexible appliances, though at the cost of modelling complexity. In addition, having appliances that require integer variables in the modelling of their operations would affect convexity of the problems involved, and hence the results that we describe. This could potentially be accommodated through some form of convex relaxation, though not considered here and left for future work.
At the agent level, decisions are affected by the local system dynamics, but also by the prices p t , t ∈ T , communicated by the central coordinator.
the incentive to adjust her consumption profile according to her inherent flexibility. Following a decentralized model predictive control approach, each agent i computes her optimal heat pump consumption ] throughout a common planning horizon and communicates it to the central coordinator. Based on the aggregated response of the population, the central coordinator updates the price sequence p and communicates them to all agents. This process is repeated until a Nash equilibrium is reached, i.e. where no agent has an incentive to deviate unilaterally from her strategy.

(b) Local optimization problem
Assume that the heat pump dynamics can be described by a deterministic discrete-time linear model, so that where s t i ∈ S t i and u t i ∈ U t i are the state and input (or, decision) variables, respectively, of agent i at time t. From now on, we refer to ] as the sequences of states and input variables along the planning horizon. Note that in the dynamical system model (2.1), the constant C t i summarizes the impact of exogenous ambient conditions at time t, which also drive that dynamical system. In the simplest case, C t i is time-invariant and representing the long-term indoor temperature when heat pumps are off. In the more general case, C t i is a linear function of meteorological drivers, e.g. outdoor temperature, solar irradiance and wind, parameterized by the building characteristics. The detailed formulation of heat dynamics model for heat pump control that we eventually used can be found in [25] and references therein. Finally, for an extensive coverage of more general dynamical models for the heat dynamics of buildings, the reader is referred to [26] among others.
In the above, the state variable represents the indoor temperature, which is bounded by predefined agent comfort setpoints [s i , s i ], also referred to as comfort bounds. For instance, the indoor temperature must be within s i = 19 • C and s i = 22 • C. A t i , B t i and C t i are the coefficients that describe the heat dynamics of the dwelling, considering thermal resistances, heat capacities, coefficient of performance, temperature of the building envelope, ambient temperature, window area and solar irradiance. The initial indoor temperature s 0 i is known and the objective of each agent is to keep it close to a tracking temperatures i set by the agent throughout the planning horizon. The tracking temperature is considered time-invariant, but the time-varying case could be similarly handled. At each lead time, the control has to decide the power u t i it should consume in order to keep the indoor temperature within the comfort bounds [s i , s i ] and as close as possible to the tracking temperature. The sequence of input variables u i is also constrained by the heat pump capacity c i , and ramping limits δu i , δu i , defining the feasibility set U i , i.e.
Consequently, the strategy of agent i ∈ N gathers both state and input variables within a single vector x i = [s i ; u i ]. Those are necessarily linked to the dynamics of the system. The strategy of each agent i ∈ N belongs to the convex and compact strategy set described by the model for the dynamical system, and the natural boundaries for the state and input variables. Each agent i is commonly assumed rational, hence willing to find the strategy x i that minimizes the cost function J i given that all other agents are consuming according to x −i = {x j } j∈N \{i} . This is the aspect we will aim at relaxing in a later part of this work. In practice, x −i is obviously not known to agent i, but its effect is summarized through the price sequence broadcast by the central coordinator. The cost function of agent i ∈ N is defined as We use a game-theoretical notation here since, as will be further described in the next paragraph, the decisions of agent i are necessarily influenced by those of all other agents j, j = i, through the sequence of electricity prices. That cost function is the sum of two terms: the electricity procurement cost and the discomfort term. This discomfort term penalizes the (squared) deviations of the indoor temperature from the tracking temperature by a factor α i > 0. This part of the loss function is based on local information only, and does not relate to the other agents. By contrast, for the calculation of the electricity procurement cost, one needs the sequence of electricity prices p t , t ∈ T . This sequence is communicated by the central coordinator, and is a function of the decisions of the population as a whole.
Eventually, the local optimization problem to be solved by agent i, every time a price sequence is communicated, is The solution of this problem is denoted x i (p), as it readily depends on the price sequence communicated by the central coordinator.

(c) Price determination and mean-field game
The electricity prices p t , t ∈ T , are calculated based on the predicted inelastic demand id t i , on the total generation capacity of the system κ N , and on the aggregated consumption of the heat pumps ( i u t i ). We follow here an approach similar to that used in [10]. The electricity price p t is assumed to be a linear function of the ratio between total demand and total generation capacity κ N at time t. The total demand is the sum of the inflexible demand of the system and the aggregated demand from the population of heat pumps. The total generation capacity is a function of the population size N (hence the use of the index N for the related variable). This assumption of a linear relationship may be deemed acceptable if considering that the energy prices received by the consumers are directly proportional to wholesale energy prices, and overlooking temporal constraints. And, even if it was not linear, that assumption could be justified by the idea of a local linearization around an operating point in the market.
Consequently, the electricity price at time t ∈ T can be expressed as where β > 0 is the coefficient for the linear function,d t i is the predicted inflexible demand for agent i at time t, u t i is the flexible demand from the heat pump of agent i at time t and κ N is the total generation capacity.
The basic idea when rethinking this problem set-up as a mean-field game is to consider the case where N tends towards infinity. There, one has i.e. asymptotically the total generation capacity is the scaled value κ, while This goes along common practice, which is that, instead of predicting consumption for each household individually, one aims at predicting the total consumption of these households readily (here, in a scaled version). Finally, one can also defineū t as so that the sequenceū = [ū 0 , . . . ,ū τ −1 ] is the average electricity consumption profile for heat pumps across the whole population. Then, the individual profiles are to be seen as local deviations due to variations in initial states, constraints, etc. Subsequently, considering that N is large enough, the electricity price at time t ∈ T from (2.6) is approximated as The optimal decisions of all agents involved are coupled by the common electricity price resulting in a deterministic mean-field game: the decision of each agent depends on the mean input variable across the whole population [23]. We refer to this mean-field game as G A . At equilibrium, we write p = [p 0 , . . . , p τ −1 ] the optimal price sequence, and x i (p ) the optimal strategy of agent i, i ∈ N .
x (p ) then concatenates the optimal strategies of all agents.

Theoretical results and solution approach
Since the set-up considered here is different from what can be found in the literature for MF games for, e.g. fleets of electric vehicles [27] and populations of thermostatically controlled loads [12], we first aim here at showing existence and uniqueness of Nash equilibrium for the problem at hand, based on known techniques to connect mean-field games and variational inequalities. We then describe an adequate solution approach to obtain it.

(a) Existence and uniqueness of a Nash equilibrium
We show existence and uniqueness of a Nash equilibrium by connecting the solution of the meanfield game G A to the solution of variational inequalities. This means that, independently of the agent heterogeneity, there exists a price sequence p that drives the population to a unique mean consumption profile, based on the individual optimal strategies x i of all agents. This consumption profile is a Nash equilibrium, implying that each agent does not have an incentive to deviate from her electricity consumption decisions, given the electricity consumption decisions of the other agents. If this price sequence can be computed, it can be used to perform forward and real-time control of heat pump populations. Necessarily, this solution is also a social welfare maximizer, in the sense that it minimizes the overall procurement costs for the population of electricity consumers. Let us first define the variational inequality problem. In the following, we write K = i K i the cartesian product of the strategy sets of all agents. And, to lighten notations, we use m = 2τ N as the dimension of K.
A key result that we are to exploit, for instance found in [7], is that non-cooperative games may be readily connected to the solution of a variational inequality problem. Let us write K −i the cartesian product of the strategy sets of all agents except for agent i, i.e. K −i = j∈N \{i} K j .

Theorem 3.2 (Nash equilibria and variational inequalities, Theorem 2 in [7]
). Suppose that, for the game G A , the non-empty strategy sets K i are closed and convex, and that for every fixed In view of the problem set-up described previously, our mean-field game G A can be written as a variational inequality problem. We consequently use two results from [28] to show existence and uniqueness of the solution of the VI(K, F) problem and, by theorem 3.2, to the mean-field game G A .

Theorem 3.3 (Corollary 2.2.5 in [28]).
Suppose that K is a compact and convex set and that the mapping F is continuous. Then, the set SOL(K, F) is non-empty and compact.
For our mean-field game G A , readily using theorem 3.3, we first deduce that the solution to our VI(K, F) problem exists. That solution is a Nash equilibrium for the mean-field game G A .

Proposition 3.5. For the non-cooperative mean-field game G A related to the coordination of a population of heat pumps within a demand-response framework, a Nash equilibrium exists.
Proof. The strategy set of each agent K i is formed by the upper and lower temperature of the dwelling, as well as the power bounds and ramping constraints of the heat pump. This satisfies the compactness and convexity of K i , and so it does for the strategy set of the game K = i K i . The vector x stands for the decision variables of all agents stacked together Furthermore, the map F(x) ∈ R m is continuous, since the cost functions J i are quadratic and thus, continuous over x and convex over x i for fixed values of x −i ∈ K −i . Therefore, the solution of the game G A exists by theorem 3.3.
Following theorem 3.4, we rewrite the variational inequality problem as an optimization problem and study its properties to eventually deduce the uniqueness of the Nash equilibrium. Proposition 3.6. For the non-cooperative mean-field game G A related to the coordination of a population of heat pumps within a demand-response framework, the Nash equilibrium is unique.
Proof. If the conditions of theorem 3.3 are satisfied, the game G A can be expressed as a VI(K, F) problem. Following theorem 3.4, we can write an equivalent optimization problem that solves VI(K, F). To show that the solution to the VI(K, F) is unique (and by equivalence, uniqueness of Nash equilibria for G A ), we compute the Jacobian ∇F(x) ∈ R m×m as Since the Jacobian of F is symmetric and condition (b,c) of theorem 3 holds, the complete version of theorem 1.
which then results in the following optimization problem: Problem (3.1) has a unique solution due to its convex and compact constraint set, and strict convexity of the objective function, being the sum of strictly convex functions [29]. Since the solution of (3.1) defines the solution set SOL(K, F), the solution of the original mean-field game G A is also unique.

(b) Solution approach
At the agent level, a requirement for computing the optimal strategy is that each agent i has access to the electricity price, which is broadcast by the central coordinator. We use this information structure because it allows agents to compute their optimal strategy without having access to other agent states or aggregation parameters. They all respond to a macroscopic incentive p, the price sequence, that accounts for their contribution u i . Therefore, the objective of the mean-field control is to find the price sequence p , such that the set of resulting strategies {x i } i∈N of the individual agents yield an almost mean-field Nash equilibrium of the game G A . The definition of an almost mean-field Nash equilibrium (also referred to as ε-Nash equilibrium) can be found in [23]. Basically, it slightly relaxes the definition of a Nash equilibrium for the case of mean-field games. For a mean-field ε-Nash equilibrium, agent i has a benefit of maximum ε (with ε small) if altering her own strategy, given the strategies of all other agents. For ε = 0, one obtains a classical Nash equilibrium.
The solution approach that is used here is inspired from the developments in [23], and references therein. As the population size N tends towards infinity (or at least, with N large enough), the mean-field Nash equilibrium becomes a mean-field ε-Nash Equilibrium with ε = O(1/N). In order to compute the price sequence that yields the ε-Nash equilibrium both Picard-Banach and Mann algorithms could be seen as relevant. The Picard-Banach algorithm was already used in [10] to compute optimal price signals for the decentralized charging of fleets of electric vehicles. Even-though our problem does not strictly fulfil the conditions of the Picard-Banach algorithm for finding a fixed point, we have observed from numerical simulations that it actually converged, and even faster than the Mann algorithm. Therefore, we only describe and focus on the Picard-Banach algorithm here. Further theoretical analysis should be performed to understand why it converges and performs so well with our problem structure.
Remember that when describing the problem set-up and related game, emphasis was placed on a planning horizon with τ time steps. However, the start of this planning horizon is at a given time w ∈ W, with W the simulation period. The problem is not solved a single time only. It is solved successively on sliding windows. For a given time w ∈ W even if the problem is solved for the whole period [w, . . . , w + τ − 1], only the decisions for time w is implemented (or 'binding')i.e. both price at the central coordinator level, and electricity consumption for heat pumps at the agent level. Then, the window is moved by one step with new information available, and the problem is solved for that new window. The process iterates this way until the end of the simulation period W.
Because we place ourselves at a given time w that conditions the information available, state of the system, etc., this is here reflected in our notations with '. |w ' indicating that the related variable is conditioned by the information available at time w. The solution approach for the decentralized mean-field model predictive control (hence, the MPC described) consists of the following steps: This solution approach is presented in algorithmic form in Algorithm 1. Compute u k+1 Update p k+1 |w using (2.10)

Accounting for bounded rationality
After reviewing various aspects of bounded rationality that are of relevance to demand response, we look at a simple, though generic, approach based on the perturbation of the agent optimal strategies. It allows looking at the effect of bounded rationality on outcomes, without having to invest into complex and possibly intractable models.

(a) Bounded rationality in demand response
Different strands in behavioural economics have established a new framework to understand human decision-making, as well as its biases, by fundamentally questioning the rational behaviour of human decision-makers. Based on a large collection of experimental studies, behavioural economists and psychologists have shown that decision-makers are not rational, as normative models of utility maximization would predict [15]. These models assume that decisionmakers have full information and can compute optimal decisions. Conversely, experimental evidence shows that decision-makers do not always have known, ordered, and consistent preferences, leading to sub-optimal decisions and systematic biases [16]. Typically, decisionmaking occurs under time constraints and incomplete knowledge. Processing capabilities of decision-makers are restricted by cognitive limitations. Decision-makers are therefore bounded rational, using heuristics and mental shortcuts, especially in situations characterized by high level of complexity and uncertainty [30]. The literature on behavioural economics is now very rich, while easily communicating human biases in decision-making to a broad audience-see [31], for instance. Thus far, demand response programs that assume consumer rationality do not rely on realistic views of energy consumers. Economists argue that existing theories do not represent energyrelated behaviour in the residential sector [17]. In general, consumer information regarding energy use is incomplete and systematically incorrect, e.g. people tend to overestimate the amount of energy they use and the amount that can be saved by adopting energy-efficient technologies [32]. There are also often inconsistencies between observable behaviour and individuals selfreported knowledge, values, attitudes and intentions [16].
A few examples of such bounded rationality include: Time inconsistency: decision-makers show a preference towards immediate outcomes. When valuing costs and benefits over time, they apply hyperbolic discounting rather than a constant discount rate as described by normative economic theory. In this way, decisions in the future are farsighted while decisions in the present are short-sighted. This immediacy effect gives rise to time-inconsistencies, e.g. not prioritizing time and money to purchase new energy-efficient appliances [16]; Anchoring on defaults: there is a tendency to maintain the status-quo with resistance to change even if alternatives may yield better outcomes. Decision-makers generally choose the default option, especially when the amount of information or complexity increases [16]. An example in the residential energy sector is the selection of an electricity retailer: consumers tend to stick to the default option even if there are better deals in the market; Social comparisons: decisions are affected to a large extent by social norms and behaviours of others. There is also a tendency to form social comparisons and to evaluate performance or well-being in relative terms [16]. A good example is the result of the campaign carried out by an electricity retailer who managed to influence consumers behaviour by comparing their energy use to their neighbours.
In all cases, the basic idea of bounded rationality when used in decision-making in relation to demand response is that there is going to be a deviation from the optimal decision one would make if being rational (having complete information, enough time and computing power, etc.). The characteristics of that deviation may depend on the type of bounded rationality, e.g. in terms of bias and variance of such deviations, temporal and population-based structure, etc. It is intuitively expected that it leads to some form of sub-optimality in terms of overall social welfare-hence, higher electricity procurement costs overall for our demand response problem.

(b) Bounded rationality in the decentralized mean-field model predictive control
There are several models proposed by behavioural economists that describe bounded rationality in decision-making. However, it becomes challenging to adapt them to engineering applications such as demand response, for them to be included into decentralized control problems like the one considered here. Our approach to account for bounded rationality hence does not consist in explicitly modelling it in its different forms, but instead in perturbing optimal agent strategies. More advanced approaches should be considered in the future, e.g. by re-thinking the local optimization problem (2.5) using prospect theory (see [33] and references therein) or similar.
We can assume that if agents are bounded rational they will deviate from their optimal response by a small amount ξ . We use this ξ to simulate the gap between the optimal strategy obtained by solving the local optimization problem (2.5) and the actual response of the agents. Focusing on a given time step w of the simulation period W, based on the outcome of (2.5), the actual implemented controlũ 0 i|w for the heat pump is where u ,0 i|w is the optimal power consumption of the heat pump and ξ i|w is the small deviation applied by the bounded-rational agent. This small deviation ξ i|w may be fixed-to be more general, it is considered here as the realization from a random variable Ξ i|w with distribution F i|w , Ξ i|w ∼ F i|w . For instance, in the result section, Gaussian distributions will be employed for simplicity. Other distributions may be used, possibly asymmetric, to reflect different views on bounded rationality. In addition, such distributions may be defined for the population as a whole while being time-invariant. However, in a more general case, such distributions may be defined on an individual agent basis, or more likely for groups of agents, while also varying in time e.g. with the time of day, to reflect that bounded rationality may be linked to the type of activity of the agents. An alternative approach to perturbing the decisions of the agents could consist in defining a function that transforms the prices originally broadcast by the central coordinator into 'perceived' prices used as input to the local optimization problems.
This small deviation is only added for t = 0 of the planning horizon since it is the only decision that the model predictive control actually implements anyway. Therefore, when each agent computes her optimal strategy, at Step 9 in Algorithm 1, we add a value ξ i|w drawn from F i|w to u ,0 i|w and re-optimize fixing u 0 i|w toũ 0 i|w in order to verify that the bounded-rational strategy is within the feasible space K i . Consequently, the bounded-rational decision implemented at t = 0 actually propagates throughout the planning period T . One eventually obtainsx i|w (p k |w ) = [s i|w (p k |w );ũ i|w (p k |w )] the bounded-rational strategy of the agent. The updated version of the Picard-Banach decentralized model predictive control, for the case of bounded-rational agents, is described in Algorithm 2.
This approach to account for bounded rationality in the decentralized model predictive control alters the original mean-field game that was obtained in §2. It is not straightforward to see, based on theoretical arguments, whether the existence and uniqueness of the Nash equilibrium is preserved or not. Extensive sets of simulations showed that it seems to be the case though. The necessary theoretical treatment of this modified mean-field game is left for future work.

Application and case study
The aim of this section is firstly to illustrate the workings of the decentralized model predictive control for a large population of heat pumps. In parallel, it allows illustrating the impact of Calculate p k |w withd |w and u k |w = 1 Obtainũ 0 i|w based on ξ i|w using (4.1), ∀i ∈ N 11 Obtainx i|w (p k |w ) = [s i|w (p k |w );ũ i|w (p k |w )] by re-optimizing through the planning Update p k+1 |w using (2.10)

14
Update Compute s 1 i|w as in (2.1) and set s 0 forecast uncertainty and information updates. Finally, the impact of bounded rationality is assessed, both at the agent and system levels.

(a) General set-up
Different case studies allow showing how forecast accuracy and bounded rationality affect the Nash equilibrium. In all the simulations, we consider that the number of agents is N = 10 6 . We base our studies on the Danish power system, with an installed capacity of 1.2 10 7 kW and a typical inflexible load curve for March 2018 depicted in figure 2. This results in the total generation capacity scaled by the population size κ = 12 kW, as well as the d t the inflexible demand scaled by the population size as shown in the figure. In the perfect information case, the forecast for the inflexible load is the same of the observation. The planning horizon considered in the optimization problem of each agent is τ = 13 h, also using a rolling window approach. It means that, at each time step and based on the information updates, agents consistently solve their local optimization problems for these τ hours. Note we will also focus on a given period of 13 h in following when analysing the results, for which updates where computed at each time step based on local optimization problems solved for the following τ hours. The linear coefficient used in the price function is set to β = 0.12 following [7]. In the simulations, we consider that the population of agents is heterogeneous and is characterized by different input parameters, e.g. heat pump capacity, ramping limits, building thermal characteristics, solar irradiance and ambient temperature profiles, as well as tracking temperature and comfort bounds. Parameter values for individual agents are drawn independently from a Gaussian distribution. An overview of the mean and standard deviation for each parameter distribution can be seen in table 1.
The data used to simulate the building heat dynamics is taken from [25] and the solar irradiance and ambient temperature profiles are obtained from [34]. These sources are used as the mean of a normal distribution from which we sample the building parameters and weather  Table 1. Characterization of the heterogeneous population of agents considered, based on the mean and standard deviation of their parameters. time series for each agent. The temperature penalization factor was set to α = 100. This value was chosen after doing some exploratory tests: lower values were not enough to track the reference temperature while higher values resulted in indifference to electricity prices. The predicted inelastic demand of the system is generated with adding a multivariate Gaussian deviation with an exponentially decreasing covariance structure to the inflexible load measurements. The covariance matrix Σ ∈ R τ ×τ is based on a correlation matrix C ∈ R τ ×τ and a model for the standard deviation σ ∈ R τ as a function of the lead time. This writes where C = (ρ ij ) ij is a correlation matrix, σ a standard deviation vector and • the Schur product. The parameter Ψ controls the growth of uncertainty with increasing lead times, and hence the forecast accuracy. In parallel, ν is a range parameter that controls the exponential decay of the covariance. With such an approach, forecasts are inherently unbiased. Relevant biases in the forecasts could be readily accounted for, and their effects additionally studied.

(b) Model predictive control versus open-loop strategy
We start by showing the difference between the open-loop and the model predictive control strategy. In the first case, the forecasts are not updated, and the profile of heat pump consumption is obtained at once with the available information at 20:00 for the following 13 h of the planning horizon. In contrast, with the model predictive control strategy, forecasts are updated every hour, allowing the population of agents to adjust their profile to changes in the inelastic demand reflected through the electricity prices. To compare both strategies, we simulate a disturbance in the system: at 00:00 the inelastic demand undergoes a step reduction and recovers at 3:00. We differentiate between two cases, perfect and imperfect forecasts. In the former case, the forecast captures perfectly the step reduction and its length, while in the latter case, the forecast step reduction lasts until 8:00. The mean heat pump consumption across the population obtained with the open-loop strategy is plotted in figure 2 with dark blue and cyan dots for both perfect and imperfect forecast respectively. Both coincide since the information available at 20:00 is the same in both cases. However, there is a difference with the model predictive control strategy, plotted with dark blue and cyan lines, for perfect and imperfect forecast cases, respectively. In the open-loop strategy, the agents do not know that there is going to be a reduction in the inelastic demand and consequently in the electricity prices, and only adjusts their consumption to general shape for the inelastic demand trajectory over the coming period, with a succession of smooth ramps down (20:00 to 3:00) and up (3:00 to 9:00). Meanwhile, with the model predictive control strategy, the agents increase their consumption in the hours when the step reduction is happening. In the perfect forecast case, the increase is steeper, since the low electricity prices only last 3 h. In contrast, in the imperfect forecast case the increase happens a bit later, assuming that the electricity prices remain lower for longer time. Obviously, after the disturbance has passed (at 3:00), both MPC with perfect and imperfect forecasts realize that the system is back on the original trajectory. However, the impact of the decisions made during the 3 h window of the disturbance necessarily propagates due to system dynamics and constraints.

(c) Effect of forecast accuracy
Once we know that the MPC strategy works better than the open-loop one (in the sense that it better adapts to the changes in the environment), we now focus on the impact of forecast accuracy. For that purpose, different forecasts are generated by adding multivariate Gaussian deviations as described in the above. We vary the forecast accuracy by choosing different values of Ψ = {0.1, 0.2, 0.3} and perform a Monte Carlo simulation with 10 runs for each value of Ψ . The parameter ν that controls the exponential decay of correlation among lead times is set to ν = 13.
In figure 3, the mean consumption across the population of heat pumps is depicted for each forecast accuracy and compared to the perfect forecast case. The larger forecast inaccuracies lead to larger deviations from the perfect forecast case. This deviation increases with the lead time, which is in line with the covariance structure, and are more pronounced when the inelastic demand rate of change is higher, e.g. at 22:00. Even if the Nash equilibrium obtained in each case is different it does not have a significant impact on the objective function. The mean objective function across the population is A C1649.58 in the perfect forecast case, and A C1649.86, A C1649.21, A C1648.99 for Ψ = {0.1, 0.2, 0.3}, respectively. Those values are very close, and this result should be interpreted as the fact that changes in forecast uncertainty are mitigated by population effects when it comes to population-average costs. In addition, the distributions have similar shapes and quantiles, therefore, we can conclude that the effect of the forecast accuracy is mitigated by the population effects in terms of cost, beyond the mean only, but not in terms of consumption. Recall that the plot shows the mean consumption across the population, but the aggregated effect of this small deviations has a significant impact on the system demand.

(d) Effect of bounded rationality
Finally, we relax the rationality assumption and study the impact of bounded rational agents on the system equilibrium. We model bounded rationality following (4.1). In the cases where adding the ξ i|w lead to infeasibility we stick to the rational decision instead. This assumption could fit to reality, i.e. the local controller for the heat pump may check whether the human intervention may lead to infeasibility and if so overrides it. Further work could focus on other alternatives, e.g. projection on the feasible set. Five different set-ups of bounded rational agents are defined and denoted by 'BR 1', 'BR 2', 'BR 3', 'BR 4' and 'BR 5':   -In the first three cases ('BR 1', 'BR 2' and 'BR 3'), ξ is drawn from a normal distribution with μ = 0 and σ = {83, 167, 250}, respectively. Those distributions are the same for all agents and all times. In practice, for instance, with 'BR 1', 99.7% of the population will deviate from the rational choice by ±249W; -In the last two cases ('BR 4' and 'BR 5'), we cluster the agents into three groups with ξ i|w drawn from three different normal distributions characterized by μ = {−500, 0, 500} and σ = 83. In 'BR 4', agents are equally distributed between the groups, while in 'BR 5' the distribution among groups was randomly chosen. There, the distributions are timeinvariant, but vary among groups of agents.
In all the cases, for simplicity, we consider that there is perfect forecast accuracy. We did not include results where both bounded rationality and forecast accuracy are studied together, since by construction they are not interdependent, and it may then be difficult to isolate the effect of bounded rationality. Obviously, in practice, both effects will be there at the same time and possibly magnify each other.  Figure 4 shows the difference of the mean consumption across the population between the bounded-rational and the rational case. During the hours with low inelastic demand, the equilibria computed for the bounded-rational cases are systematically lower than for the rational case. In those hours, the heat pumps are running at high load, taking advantage of low electricity prices. Therefore, the bounded-rational agents trying to increase their consumption would hit the heat pump capacity. Meanwhile, the bounded-rational agents who try to reduce power consumption of the heat pumps are still capable to do so, resulting in a lower mean consumption. This effect is inverted when the inelastic demand increases, resulting in a higher mean consumption. The spikes that appear in 'BR 4' and 'BR 5' are a result of how bounded rationality diffuses in time. In those cases, there are groups of bounded-rational agents who systematically decrease their consumption by a larger amount. This makes that for given time steps, either the lower bound of the heat pump or the low temperature bound of the comfort settings is reached, and the system needs to recover in the following time step. Further work could focus on adding complexity to the modelling of the heat pump; e.g. by introducing minimum onoff times, the bounded rational decisions are expected to have a higher diffusion in time. Lower mean consumption translates to lower electricity prices, especially in the hours of low inelastic demand as we discussed before. The time series of electricity prices can be observed in figure 5.
The bounded rational decisions also have an impact on indoor temperature and consequently on agent comfort. Figure 6 illustrates the deviations of the indoor temperature from the reference temperatures i . The more the agents deviate from the optimal response, the more the temperature deviates from its reference. This translates to higher discomfort and then higher costs.
The total costs including both the cost of electricity and the discomfort term are depicted in figure 7. The cost function is on average higher for bounded rational agents. This is mainly due to the discomfort term which is 586-15761% larger than in the rational case on average. On the other hand, the cost of electricity is 0-6% smaller than in the rational case on average, while the 90% quantile shows a cost of electricity that is 0-22 larger than in the rational case. Note that the lower electricity costs are linked to the period simulated, as previously discussed with figure 4.

Conclusion
With the increasing deployment of distributed energy resources, it is of utmost relevance to investigate alternative approaches to their control in a market-based environment for various applications, e.g. demand response. We placed emphasis here on the case of population of heat pumps to be coordinated. Model predictive control was chosen as the control framework, to accommodate the temporal inter-dependencies in the heat dynamics of dwellings, and information renewal with regularly updated forecasts. The decentralized control was re-thought as a mean-field game, where the price sequence communicated by the central coordinator summarizes the impact of the decisions (as well as constraints and states) of all agents involved. It allowed to give a number of theoretical results related to the existence and uniqueness of a Nash equilibrium for such a game. The Picard-Banach algorithm was used as a solution approach to obtain the optimal price sequence and strategies of the agents. Importantly, we relaxed the rational assumptions for the agents in order to obtain a complete framework that allows analysis of the impact of bounded rationality on outcomes at system and agent levels. A Picard-Banach algorithm was proposed for the case of bounded-rational agents. Additional theoretical considerations are necessary to better characterize the properties of the game with bounded-rational agents and the approach to account for bounded rationality that we proposed.
The numerical simulations showed that the effect of forecast accuracy is mitigated by population effects. By contrast, bounded rationality has a big influence on system equilibria, and consequently on electricity prices, comfort and total costs. Further emphasis should be placed on understanding human decision-making and behaviour when it comes to electricity consumers and their use of distributed energy resources (also including stationary battery storages, electric cars, etc.) in order to optimally use and steer available flexibility on the demand side. It is clear that more advanced models for various types of bounded rationality, e.g. anchoring and hyperbolic