The proposed system uses reinforcement learning as its machine learning
algorithm. Reinforcement learning differs from both of these learning
approaches, although it sits closer to the supervised end of the spectrum. In
reinforcement learning, the mapping from state to action is learned through the
aggregate reward the agent receives for its actions. The mapping is learned
online, through a balance of exploration and exploitation. IBM applied
reinforcement learning to IBM Watson, a question-answering system that
understands natural language and can respond in natural language. The
mathematical framework for defining a solution in a reinforcement learning
scenario is the Markov Decision Process (MDP). The reinforcement learning is
implemented using the Deep Q-Learning algorithm, which is described in Table 3.
Table 3: Deep Q-Learning algorithm
Initialize Q values to 0 for all states
Set action value for terminal states as 0
For each user login do
    Initialize state s (s is in S)
    a ← action derived for s
    Take action a, observe r, s'
    Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]
    s ← s'
The following steps describe the Q-learning algorithm in detail.
1. Initialize the Q-values table Q(s, a) and the action value for terminal
states to 0.
2. Observe the current state (s), which is the intent and its supporting
entity questions.
3. Choose an action (a) for that state based on the mapping of states to the
energy pattern.
4. Take the action, and observe the reward (r) and the new state (s').
5. Update the value for the state in the table using the observed reward.
6. Set the state to the new state, and repeat the process until a terminal
state is reached.
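The six steps above can be sketched as a short tabular Q-learning loop. This is a minimal illustration, not the authors' implementation: it uses a lookup table rather than the neural network that Deep Q-Learning usually implies, and the toy environment (`step`, `TERMINAL`, `ACTIONS`), the epsilon-greedy rule in `choose_action`, and all parameter values are assumptions introduced only for the example.

```python
import random
from collections import defaultdict

# Hypothetical toy environment (not from the paper): states 0..3 on a line,
# state 3 is terminal. Action 1 moves one state to the right, action 0 stays.
TERMINAL = 3
ACTIONS = (0, 1)


def step(state, action):
    """Return (reward, next_state); reaching the terminal state pays 1."""
    next_state = min(state + action, TERMINAL)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return reward, next_state


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # step 1: every Q-value starts at 0
    for _ in range(episodes):
        s = 0  # step 2: observe the current state
        while s != TERMINAL:
            a = choose_action(q, s, epsilon, rng)  # step 3: choose an action
            r, s_next = step(s, a)                 # step 4: act, observe r and s'
            best_next = max(q[(s_next, a2)] for a2 in ACTIONS)
            # step 5: move Q(s, a) toward r + gamma * max Q(s', a')
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next                             # step 6: advance to s'
    return q
```

Calling `train()` returns the learned table; in this toy setting the entries for moving right end up larger than those for staying put, because only moving right leads to the rewarded terminal state.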
The proposed system uses the Q-learning algorithm, which consists of an agent,
a set of states, and a set of actions. The agent senses a finite set S of
distinct states and has a finite set A of distinct actions. For each user, the
agent senses the state (s), chooses a relevant action (a), and executes it. The
algorithm finds the Q-value for a given state (s) and action (a). This value
combines the current reward (r) with the maximum future reward expected
according to the table for the next state (s'), discounted by the factor γ.
In the proposed work, the state (s) represents the questions related to the
intents (fans, lights, 5 amp sockets, or 15 amp sockets) together with the
corresponding factors considered in improving energy efficiency, which are
called entities. The action (a) represents the answer to the question, which is
the set of energy plan values.
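One way such states and actions could be encoded as keys of a Q-table is sketched below. Only the four intents come from the text; the `State` class, the entity names (`count`, `hours_per_day`), the candidate plan values in `ACTIONS`, and their unit are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass

# The four intents come from the text; all other names and values are hypothetical.
INTENTS = ("fans", "lights", "5 amp socket", "15 amp socket")

# Hypothetical action set: candidate energy-plan values (unit assumed, e.g. kWh per month).
ACTIONS = (10.0, 20.0, 30.0)


@dataclass(frozen=True)
class State:
    """A question context: one intent plus its supporting entities."""
    intent: str
    entities: tuple  # sorted (name, value) pairs, so the state is hashable


def make_state(intent, entities):
    """Build a hashable state from an intent and a dict of entity values."""
    if intent not in INTENTS:
        raise ValueError(f"unknown intent: {intent!r}")
    return State(intent, tuple(sorted(entities.items())))


q_table = {}  # maps (State, action) -> Q-value


def q_value(state, action):
    """Look up Q(s, a); unseen pairs default to 0, matching the initialization."""
    return q_table.get((state, action), 0.0)


def best_action(state):
    """Greedy choice: the energy-plan value with the highest Q-value."""
    return max(ACTIONS, key=lambda a: q_value(state, a))
```

For example, `make_state("fans", {"count": 3, "hours_per_day": 8})` yields the same state regardless of the order in which the entities were collected, so answers to the same question context share one row of the table.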
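$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r + \gamma \cdot \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$
Here r is the reward, γ is the discount factor, and max_a Q(s_{t+1}, a) is the
estimate of the optimal future value.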
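To make the roles of the learning rate and the discount factor in this update concrete, the following minimal sketch applies a single update with illustrative numbers. The function name `q_update` and all values are assumptions chosen for easy arithmetic, not figures from the proposed system.

```python
def q_update(q_sa, reward, max_q_next, alpha, gamma):
    """One Q-learning update: blend the old estimate with the new target."""
    target = reward + gamma * max_q_next  # reward plus discounted future value
    return q_sa + alpha * (target - q_sa)


# Illustrative numbers only: old estimate 2.0, reward 1.0, best next value 4.0.
# With gamma = 0.75 the target is 1.0 + 0.75 * 4.0 = 4.0.
print(q_update(2.0, 1.0, 4.0, alpha=0.0, gamma=0.75))  # 2.0: nothing is learned
print(q_update(2.0, 1.0, 4.0, alpha=1.0, gamma=0.75))  # 4.0: old estimate discarded
print(q_update(2.0, 1.0, 4.0, alpha=0.5, gamma=0.75))  # 3.0: halfway to the target
```

A learning rate between the two extremes moves the stored value only part of the way toward the target, which is why repeated updates refine the table gradually.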
α is the learning rate (0 < α < 1). It determines to what extent newly acquired information overrides old information. A value of 0 makes the agent learn nothing, while a value of 1 makes the agent consider only the most recent information. The discount factor (γ) determines how important possible future rewards are compared to the present reward. It is included because energy needs may change in the future, depending on certain factors, and such changes could affect the current energy plan to an unlimited extent; the discount factor is added to handle those imbalances. To find the optimal Q function, the reward is summed with the discounted maximum expected future reward. For this purpose, γ takes values between 0.7 and 0.99. Updating the table in this way slowly yields accurate estimates of the expected future reward. It helps the system provide an efficient energy plan that can hold for a long period without requiring changes.

Figure 3: Reinforcement learning

DeepQA is an architecture with an accompanying methodology. Its underlying principles are massive parallelism, many experts, pervasive confidence estimation, and integration of shallow and deep knowledge.
Massive parallelism: Exploit massive parallelism in the consideration of multiple interpretations and hypotheses.
Many experts: Facilitate the integration, application, and contextual evaluation of a wide range of loosely coupled probabilistic question and subject analytics.
Pervasive confidence estimation: No component commits to an answer; all components produce features and associated confidences, scoring different question and content interpretations.
Integrate shallow and deep knowledge: Balance the use of strict and shallow linguistic analysis, leveraging many loosely formed ontologies.
Using the machine learning algorithm above, the system takes data from the end user. It trains the bot with this self-learning algorithm and makes it capable of producing all possible answers to the questions asked by end users.

3. Results and discussion

The electricity consumption of Indian households has been increasing rapidly in the past decades, and it is projected to continue its upward trend. A sound understanding of the factors considered by the proposed system helps to identify efficient energy patterns for the different needs of end users. These patterns are used to make suggestions, which in turn reduce energy wastage and dissipation.

Figure 4: Energy Consumption Growth in India

These measures can target macro-level factors such as technological developments, regulations, and cultural and social norms. They can also target micro-level factors such as the individual decision-making of households on energy efficiency and conservation. The chatbot is used as a medium to gather the needs of individual end users, which are then used for mapping the factors to the energy pattern.