Document Type




Publication Date



Institute of Electrical and Electronics Engineers

Source Publication

IEEE Transactions on Vehicular Technology

Source ISSN



This paper proposes a demand response method that reduces the long-term charging cost of a single plug-in electric vehicle (PEV) while handling the stochastic nature of the user's driving behaviour, traffic conditions, energy usage, and energy price. The problem is formulated as a Markov Decision Process (MDP) with an unknown transition probability matrix and solved using deep reinforcement learning (RL) techniques. The proposed method requires no initial data on the PEV driver's behaviour and learns faster than a pure model-free reinforcement learning method. Our strategy uses Dyna-Q reinforcement learning, a combination of model-based and model-free learning. Every time a real experience is obtained, the model is updated, and the RL agent learns from both the real experience and “imagined” experiences generated by the model. Because the state space is very large, a table-lookup method is impractical, so a deep neural network approximates the long-term expected reward of every state-action pair. Future prices are predicted using an average of historical prices and a long short-term memory (LSTM) network. Simulation results demonstrate the effectiveness of this approach: compared with existing PEV charging schemes, it reaches an optimal policy more quickly while avoiding state of charge (SOC) depletion during trips.
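The Dyna-Q loop described in the abstract (update the model on each real experience, then learn from both real and imagined experiences) can be illustrated with a minimal tabular sketch. This is not the paper's implementation: the paper approximates values with a deep neural network because a lookup table is impractical for its state space, whereas this sketch uses a table for brevity. The function name `dyna_q_update`, its parameters, and the deterministic one-entry-per-pair model are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def dyna_q_update(Q, model, s, a, r, s_next, actions,
                  alpha=0.1, gamma=0.95, planning_steps=5, rng=random):
    """One Dyna-Q step (illustrative, tabular).

    Q      : dict mapping (state, action) -> estimated long-term reward
    model  : dict mapping (state, action) -> (reward, next_state);
             assumed deterministic here, which is a simplification
    """
    def q_learn(s0, a0, r0, s1):
        # Standard Q-learning backup toward r + gamma * max_b Q(s1, b).
        best_next = max(Q[(s1, b)] for b in actions)
        Q[(s0, a0)] += alpha * (r0 + gamma * best_next - Q[(s0, a0)])

    # 1. Model-free learning from the real experience.
    q_learn(s, a, r, s_next)

    # 2. Update the learned model with the real experience.
    model[(s, a)] = (r, s_next)

    # 3. Planning: replay "imagined" experiences sampled from the model.
    for _ in range(planning_steps):
        (ps, pa), (pr, pn) = rng.choice(list(model.items()))
        q_learn(ps, pa, pr, pn)


if __name__ == "__main__":
    # Hypothetical toy transition: charging in state 0 costs 1 unit
    # (reward -1) and leads to state 1.
    Q = defaultdict(float)
    model = {}
    dyna_q_update(Q, model, s=0, a="charge", r=-1.0, s_next=1,
                  actions=["charge", "idle"])
    print(Q[(0, "charge")], model)
```

Each call performs one real update plus `planning_steps` simulated updates, which is how a Dyna-style agent extracts more learning from each real interaction than a purely model-free agent does.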


Accepted version. IEEE Transactions on Vehicular Technology, Vol. 69, No. 11 (23 September 2020): 12609-12620. DOI. © 2020 Institute of Electrical and Electronics Engineers (IEEE). Used with permission.

gao_14432acc.docx (509 kB)
ADA Accessible Version