ABSTRACT:

In this paper, we apply a hybrid method based on wavelet transforms and ARIMA
models to the annual rain precipitation series (in millimeters) of Erbil
Province, Iraq, a sample of 45 observations covering the period 1970 to 2014.
We aim to describe how the hybrid method can be used in time series
forecasting and how it can enhance forecasting quality, by applying it to real
data and comparing the classical ARIMA method with the suggested method on the
basis of statistical criteria. The results show an advantage for the hybrid
method: the forecast error is reduced when the Wavelet-ARIMA method is used,
which improves on the classical model in forecasting. Furthermore, among the
wavelets considered, the Daubechies wavelet of order two with fixed form
thresholding and a soft threshold function was the most suitable for
de-noising the data and performed better than the others. The forecasts
indicate that annual rainfall in Erbil in the coming years will be close to
370 millimeters.

KEYWORDS:
Forecasting, Time series,
ARIMA, Wavelet transforms, De-noising.

1. Introduction

Rainfall forecasting is one of the most challenging forecasting problems. Many
algorithms have been developed and proposed, but accurate prediction of
rainfall is still very difficult. Tantanee et al. (2005) presented a new
method for rainfall prediction that combines wavelet analysis with the
conventional autoregressive (AR) model. Their research showed that the wavelet
autoregressive model provides better annual rainfall prediction than the
simple AR model. Al-Safawi et al. (2009) estimated the autoregressive model
using wave shrink. Their results showed that the suitable model under the
classical ARIMA method is AR(6), and that this model improved when the wave
shrink technique was used, especially with the Haar wavelet and a soft
threshold, to forecast the annual rainfall in Erbil city for the period
1992-2007. Al-Shakarchy (2010) applied factor analysis to forecast two series
representing rain rates and relative humidity in Mosul province. The results
showed that the suitable models for the two series are ARIMA(0,0,1) and
ARIMA(1,0,0), respectively. Ali (2013) used the ARIMA method to analyze and
forecast Baghdad rainfall. The seasonal model SARIMA(2,1,3)x(0,1,1) was found
to be the best model, and the rainfall forecasts it produced for the following
years showed a trend and extent similar to the original data. Venkata Ramana
et al. (2013) sought an appropriate method for monthly rainfall prediction by
combining the wavelet technique with an artificial neural network (ANN). The
study indicated that wavelet neural network models are more effective than ANN
models. Shoba and Shobha (2014) analyzed various data mining algorithms used
for rainfall prediction. The study showed that certain algorithms sometimes
perform better and are more effective when combined. Eni and Adeyeye (2015)
applied the seasonal ARIMA method to build a suitable model for forecasting
rainfall in Warri Town, Nigeria. The results showed that the seasonal
ARIMA(1,1,1)(0,1,1) model is appropriate according to some statistical
criteria.

Recently, Shafaei et al. (2016) tested the capability of several techniques to
predict monthly precipitation: wavelet analysis (WA), the seasonal ARIMA
(SARIMA) model, and the artificial neural network (ANN) method. In examining
the effect of the decomposition level on model performance, the study found
that going from 2 to 3 decomposition levels increased the correlation between
observed and estimated data, but that there was no significant difference
between predictions from the 2-level and 3-level models. Ramesh Reddy et al.
(2017) applied the ARIMA model to forecast the monthly mean rainfall of
coastal Andhra, India. They found that the best model for fitting the data is
ARIMA(5,0,0)(2,0,0) according to some performance criteria. Ashley et al.
(2017) applied the discrete cosine transform (DCT) and the discrete wavelet
transform (DWT) to reduce the dimensionality of rainfall time series
observations. The results of the analysis demonstrated that the DWT is
superior to the DCT and best preserves and characterizes the observed rainfall
data records.

From the methods reviewed above, we observe that most of these approaches and
models are limited to short-period forecasts. This paper introduces a
technique for long-range forecasting of annual rainfall data. In other words,
it combines wavelet transformation with the classical ARIMA methodology to
model annual rain precipitation based on the available data. The remainder of
this paper is organized as follows: Section 2 gives brief concepts of the
ARIMA methodology and wavelet transformation and then presents the hybrid
method. Section 3 deals with an application to real data. Section 4 presents
the conclusions.

 

2. ARIMA Methodology, Wavelet Transformation, and Hybrid Method

 

2.1 ARIMA Methodology

 

Box and Jenkins suggested an approach for analyzing time series data that
consists of model identification, parameter estimation, diagnostic checking of
the identified model, and application of the model for forecasting. The ARIMA
model is a mixed model that depends on the parameters p, d, and q, which
represent the order of the autoregressive (AR) part, the degree of
differencing involved, and the order of the moving average (MA) part,
respectively. The model was popularized by (Box et al., 1970) and can be
expressed through the following mathematical formula:

 

 

$$ w_t = \phi_1 w_{t-1} + \dots + \phi_p w_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \dots - \theta_q \varepsilon_{t-q}, \qquad w_t = (1 - B)^d y_t $$

where y_t is the observed series, B is the backshift operator, d is the degree
of differencing, p is the non-seasonal autoregressive order, q is the
non-seasonal moving average order, \phi_1, ..., \phi_p are the autoregressive
coefficients, \theta_1, ..., \theta_q are the moving average coefficients, and
\varepsilon_t stands for the random error. If the data are non-stationary,
first- or second-order differencing is applied. To obtain a suitable model, we
rely on the Autocorrelation Function (ACF) and the Partial Autocorrelation
Function (PACF). The pattern of the ACF/PACF plots indicates which model is
likely to fit best for prediction, and the choice is then confirmed with
statistical performance measures. We also use the Portmanteau test (i.e., the
Box-Pierce test) for the randomness of a time series. We refer the reader to
(Makridakis et al., 1998) for more details.
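
The identification step described above (difference the series if needed, then inspect the autocorrelations) can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the study: the function names are hypothetical, only the first eight values of Table 1 are used to keep it short, and the approximate significance bound 1.96/sqrt(n) is the usual large-sample rule of thumb.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing: w_t = y_t - y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def sample_acf(series, max_lag):
    """Return the sample autocorrelations r_1, ..., r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(v * v for v in dev)  # lag-0 sum of squares
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


# First eight annual values from Table 1 (illustration only).
rain = [255.4, 448.2, 406.4, 261.5, 547.5, 417.2, 452.3, 347.2]
w = difference(rain, d=1)           # differenced series, one value shorter
r = sample_acf(w, max_lag=3)        # autocorrelations at lags 1 to 3
bound = 1.96 / len(w) ** 0.5        # approximate 95% significance bound
significant = [abs(x) > bound for x in r]
```

In practice the full 45-observation series would be used, and the PACF would be examined alongside the ACF before settling on the orders p and q.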

 

2.2 Wavelet Transformation

 

The wavelet transform is a signal-processing tool that has attracted great
interest since its theoretical development (Grossman and Morlet, 1984).
Applications of wavelet analysis have increased in many fields, such as
communications, image processing, optical engineering, and time series
analysis, as an alternative to the Fourier transform for handling local,
non-periodic, and multi-scale phenomena. The difference between wavelet and
Fourier transforms is that wavelets can locate changes in the dynamical
patterns of a sequence in time, whereas Fourier transforms focus mainly on
frequency. In addition, the Fourier transform assumes signals of infinite
length, whereas wavelet transforms can be applied to time series of any kind
and any size, even when the series are not sampled homogeneously (Antonios and
Constantine, 2003). Generally, wavelet transforms can be used to explore,
de-noise, and filter time series data, which supports forecasting and other
analysis. The formula of the wavelet transform can be presented as follows:

 

 

$$ W(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} f(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) dt $$

where \psi(t) represents the basic (mother) wavelet, whose effective length is
commonly much shorter than the target time series f(t), and \psi^{*} is its
complex conjugate; 'a' represents the scale factor or dilation, which
determines the characteristic frequency, so that varying it gives rise to a
spectrum; and 'b' represents the translation in time, so that varying it
corresponds to 'sliding' the wavelet over f(t) (Burrus et al., 1998).
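
The link between the continuous transform described above and the discrete transform used later in the paper can be made explicit with the usual dyadic sampling of scale and translation. The expression below is the standard textbook form for a real-valued wavelet, not a formula taken from this paper; the indices j and k and the coefficient name d_{j,k} are notation introduced here for illustration.

```latex
a = 2^{j}, \quad b = k\,2^{j}, \qquad
\psi_{j,k}(t) = 2^{-j/2}\,\psi\!\left(2^{-j}t - k\right), \qquad
d_{j,k} = \int_{-\infty}^{\infty} f(t)\,\psi_{j,k}(t)\,dt
```

Here j indexes the decomposition level (larger j means coarser scale) and k indexes the position of the wavelet along the series.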

 

2.3 Hybrid Method

 

The suggested method combines the ARIMA methodology with wavelet transforms.
Because the wavelet approach is well suited to signal analysis, this study
uses it to separate the details (the small fluctuations) from the
approximations (the important part) of the rain data. In wavelet analysis, the
approximations are the high-scale, low-frequency components of the signal, and
the details are the low-scale, high-frequency components (Fugal, 2009). The
decomposition is carried out with the discrete wavelet transform (DWT) because
the rain data are recorded in discrete time. Figure 1 shows the hybrid
technique.
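
To make the split into approximations and details concrete, the sketch below performs one level of a discrete wavelet decomposition and its inverse. It uses the Haar wavelet (the Daubechies wavelet of order 1) purely for brevity, which is an assumption of this example; the study itself works with Daubechies wavelets of order 2 to 5. The function names are hypothetical and the input must have even length.

```python
import math


def haar_step(signal):
    """One-level Haar DWT: return (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_step, rebuilding the signal from both coefficient sets."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


rain = [255.4, 448.2, 406.4, 261.5]            # first four values of Table 1
approx, detail = haar_step(rain)
rebuilt = haar_inverse(approx, detail)          # recovers the original values
smooth = haar_inverse(approx, [0.0] * len(detail))  # details discarded
```

Setting every detail coefficient to zero, as in `smooth`, replaces each pair of observations by its mean; de-noising by thresholding is a gentler version of the same idea, shrinking only the small detail coefficients.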

 

 

Figure 1. The process of hybrid method

 

3. Application

3.1 Information About the City

 

Erbil is the capital city of the Kurdistan Region of Iraq. The city is located
at 36°12′17″N 44°20′33″E, about 350 kilometres north of Baghdad. The climate
of Erbil is very hot in summer and very cold and wet in winter, and there is
more rainfall in winter than in summer. The city receives an average of
300-400 millimetres of rain annually. The city is the administrative centre of
Erbil province. The province is bounded to the north by Turkey and Dohuk
Province, to the east by Iran and Sulaymaniyah Province, to the south by
Kirkuk province, and to the west by Mosul province (Wahab and Khayyat, 2014).

 

3.2 Application Using ARIMA Methodology

 

The variable used in the analysis is the annual rain precipitation (in
millimeters) in Erbil province in the Kurdistan Region of Iraq. The sample
consists of 45 observations from 1970 to 2014, shown in Table 1. The data were
obtained from the General Directorate of Meteorology and Seismic Monitoring in
Erbil province.

 

Table 1: Annual rain precipitation data (mm) from 1970 to 2014

| Year | Amount of Rain (mm) | Year | Amount of Rain (mm) |
|------|---------------------|------|---------------------|
| 1970 | 255.4 | 1993 | 601.6 |
| 1971 | 448.2 | 1994 | 583.0 |
| 1972 | 406.4 | 1995 | 494.4 |
| 1973 | 261.5 | 1996 | 418.9 |
| 1974 | 547.5 | 1997 | 441.6 |
| 1975 | 417.2 | 1998 | 337.2 |
| 1976 | 452.3 | 1999 | 229.2 |
| 1977 | 347.2 | 2000 | 272.3 |
| 1978 | 380.1 | 2001 | 330.9 |
| 1979 | 375.6 | 2002 | 361.5 |
| 1980 | 321.5 | 2003 | 587.7 |
| 1981 | 141.8 | 2004 | 255.6 |
| 1982 | 444.1 | 2005 | 297.5 |
| 1983 | 178.3 | 2006 | 514.6 |
| 1984 | 43.9 | 2007 | 273.4 |
| 1985 | 463.9 | 2008 | 410.7 |
| 1986 | 154.0 | 2009 | 411.0 |
| 1987 | 235.9 | 2010 | 359.6 |
| 1988 | 626.9 | 2011 | 301.6 |
| 1989 | 367.3 | 2012 | 366.4 |
| 1990 | 332.0 | 2013 | 345.2 |
| 1991 | 344.1 | 2014 | 385.2 |
| 1992 | 694.0 |      |       |

 

 

 

The time series plot of the rain data for the Erbil region is shown in Figure
2. Following the Box-Jenkins methodology, the first step is identification
using the autocorrelation function (ACF) and partial autocorrelation function
(PACF) plots, which are shown in Figure 3.

Figure 2: Time series plot of rain data in Erbil province from 1970 to 2014

 

Based on the ACF and PACF plots, and after checking for stationarity in mean
and variance, the appropriate model for the series was identified as
ARIMA(2,1,0). This choice followed careful modelling and fitting and was
guided by two performance measures, root mean square error (RMSE) and mean
absolute error (MAE). Table 2 shows the estimated model.
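
The two performance measures named above have simple definitions, sketched here in Python for reference. The function names and the toy numbers in the usage lines are illustrative assumptions and are unrelated to the rainfall results.

```python
import math


def rmse(actual, fitted):
    """Root mean square error between observed and fitted values."""
    errors = [a - f for a, f in zip(actual, fitted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(actual, fitted):
    """Mean absolute error between observed and fitted values."""
    errors = [a - f for a, f in zip(actual, fitted)]
    return sum(abs(e) for e in errors) / len(errors)


# Toy usage: one fitted value is off by 2, the others are exact.
toy_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
toy_mae = mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Because RMSE squares the errors before averaging, it penalizes large misses more heavily than MAE, which is why the two are usually reported together.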

 

 

Figure 3: Autocorrelation function and partial
autocorrelation function of rain data

 

Table 2: Estimation of ARIMA(2,1,0)

| Parameter | Estimate | Std. Error | t-ratio | P-value |
|-----------|----------|------------|---------|---------|
| AR(1) | -0.72091 | 0.129125 | -5.58304 | 0.000002 |
| AR(2) | -0.540025 | 0.128616 | -4.19875 | 0.000136 |
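
To show how the estimates in Table 2 turn into forecasts, the sketch below iterates an ARIMA(2,1,0) recursion on the differenced series. Two assumptions are made and should be checked against the estimation software: the coefficients enter as w_t = phi1*w_{t-1} + phi2*w_{t-2} + e_t, and the model has no constant term. The function name is hypothetical. The inputs are the last three observations of the original series in Table 1, so this illustrates the classical model only and is not expected to reproduce the hybrid forecasts, which are built on the de-noised series.

```python
def forecast_arima_210(history, phi1, phi2, steps):
    """Iterate y_{t+1} = y_t + phi1*(y_t - y_{t-1}) + phi2*(y_{t-1} - y_{t-2}).

    Future errors are set to their expected value of zero.
    """
    if len(history) < 3:
        raise ValueError("need at least three observations")
    y = list(history)
    for _ in range(steps):
        w1 = y[-1] - y[-2]   # most recent difference
        w2 = y[-2] - y[-3]   # previous difference
        y.append(y[-1] + phi1 * w1 + phi2 * w2)
    return y[len(history):]


last_three = [366.4, 345.2, 385.2]   # 2012, 2013, 2014 from Table 1
forecasts = forecast_arima_210(last_three, -0.72091, -0.540025, steps=3)
```

Since both coefficients are negative, successive differences alternate in sign and shrink, so the forecasts oscillate with decreasing amplitude and settle toward a constant level.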

 

After estimating the ARIMA(2,1,0) model, we check whether its residuals are
random. Figure 4 presents the ACF and PACF of the residuals from
ARIMA(2,1,0) fitted to the series.

Figure 4: ACF and PACF of residuals using ARIMA(2,1,0) on series data.

From Figure 4, none of the autocorrelation coefficients in the ACF and PACF
are significant, which suggests that the residual series is completely random
(i.e., white noise). We also tested the randomness of the residuals using the
Portmanteau (Box-Pierce) test mentioned in the theoretical section. The value
of the test statistic was 7.326 and the P-value was 0.835, so the hypothesis
that the residual series is random cannot be rejected at the 5% significance
level (95% confidence level).
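
The Box-Pierce statistic and its P-value can be sketched as follows. The statistic is Q = n * (sum of squared residual autocorrelations), compared with a chi-square distribution. The paper does not state the degrees of freedom; the value 12 used below is an assumption, chosen because it is consistent with the reported P-value. The function names are hypothetical, and the closed-form tail probability is valid only for an even number of degrees of freedom.

```python
import math


def box_pierce(residual_acf, n):
    """Box-Pierce Q statistic from residual autocorrelations r_1..r_m."""
    return n * sum(r * r for r in residual_acf)


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability P(X > x) for even df.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Reported statistic from the text; df = 12 is an assumption of this sketch.
p_value = chi2_sf_even(7.326, 12)
reject_randomness = p_value < 0.05
```

A large P-value means the residual autocorrelations are jointly small enough to be consistent with white noise, which is the outcome wanted for an adequate model.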

 

3.3 Application Using a Hybrid Method

 

In this part, the original data are transformed from the time domain to the
wavelet (time-frequency) domain for filtering. Figure 5 shows the wavelet
analysis of the rain precipitation for the 45 sequential observations, using a
Daubechies wavelet with a five-level multiresolution decomposition. Here s is
the signal, which equals the sum of its approximation and details
(s = a5 + d5 + d4 + d3 + d2 + d1), a5 is the approximation at level 5, and d5,
d4, d3, d2, and d1 are the details at levels 5, 4, 3, 2, and 1.

 

 

Figure 5: Wavelet analysis using Daubechies wavelet with five levels
multiresolution of the rain precipitation

 

The original rain precipitation data were de-noised using the wavelet
de-noising procedure mentioned in the theoretical section (using MATLAB
software, version 2013) with Daubechies wavelets of order 2, 3, 4, and 5, as
shown in Figure 6. After many empirical experiments with several wavelet
families, the Daubechies wavelet was found to perform better than the others
in de-noising the rain data. Figure 7 shows the original and de-noised signals
using the Daubechies wavelet with the Fixed Form Threshold (Patil and Raskar,
2015).

Figure 6: Daubechies wavelet of order 2,3,4, and 5

 

Figure 7: The original and de-noised signals using Daubechies wavelet with Fixed
Form Threshold.

 

The data were first decomposed into five multi-resolution levels for the
selected wavelet and de-noised using the Fixed Form Threshold with soft
thresholding. The de-noised series was then modelled again using the ARIMA
methodology. The forecasting criteria were calculated and compared with those
of the first method. Table 3 summarizes the two performance indicators (RMSE
and MAE) for the model of the original data under the classical ARIMA method
and under the hybrid method.
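
The thresholding step can be sketched as below. The fixed form (universal) threshold is sigma * sqrt(2 ln n), and the soft function shrinks every detail coefficient toward zero by that amount. Estimating the noise level sigma as the median absolute detail coefficient divided by 0.6745 is a common convention and an assumption here, since the paper does not say how the noise level was set; the function names and the toy coefficients are likewise illustrative.

```python
import math


def fixed_form_threshold(finest_detail, n):
    """Universal threshold sigma * sqrt(2 ln n) for a series of length n.

    sigma is estimated as median(|d|) / 0.6745 from the finest-level
    detail coefficients (an assumed, commonly used noise estimate).
    """
    mags = sorted(abs(d) for d in finest_detail)
    m = len(mags)
    median = mags[m // 2] if m % 2 else 0.5 * (mags[m // 2 - 1] + mags[m // 2])
    sigma = median / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n))


def soft_threshold(coeffs, t):
    """Soft thresholding: sign(c) * max(|c| - t, 0) for each coefficient."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


# Toy detail coefficients (not taken from the rainfall data).
toy = [5.0, -5.0, 1.0, -1.0]
shrunk = soft_threshold(toy, 2.0)
t45 = fixed_form_threshold([0.6745, -0.6745, 0.6745, -0.6745], n=45)
```

Compared with hard thresholding, which keeps surviving coefficients unchanged, the soft function also shrinks them, giving a smoother reconstructed series.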

 

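Table 3: The performance measures for the original data model using classical
ARIMA methodology and hybrid method.

| Method | Data | Kind (model or wavelet) | RMSE | MAE |
|--------|------|-------------------------|------|-----|
| Classical ARIMA Method | Original data | ARIMA(2,1,0) | 133.937 | 106.565 |
| Hybrid Method (Fixed Form) | De-noised data | Daubechies(2) | 131.380 | 104.143 |
| Hybrid Method (Fixed Form) | De-noised data | Daubechies(3) | 131.555 | 104.553 |
| Hybrid Method (Fixed Form) | De-noised data | Daubechies(4) | 131.593 | 104.411 |
| Hybrid Method (Fixed Form) | De-noised data | Daubechies(5) | 131.706 | 104.546 |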

 

From Table 3, we observe that the best model for the original data, after
careful modelling and fitting, was ARIMA(2,1,0). When the hybrid method based
on wavelet de-noising was applied to the original data, the forecasting errors
decreased for all wavelet orders, so the new models improved according to both
forecasting measures. Comparing the two procedures, the reduction is largest
with Fixed Form Thresholding and the Daubechies wavelet of order 2: Table 3
shows RMSE falling from 133.937 to 131.380 and MAE falling from 106.565 to
104.143. Figure 8 presents the original and filtered data using the Daubechies
wavelet of order 2.

Figure 8: The original and
filtered signals using Daubechies wavelet of order 2
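
The size of the improvement reported in Table 3 for the Daubechies wavelet of order 2 can be expressed as a percentage of the classical model's error. The short calculation below uses only the Table 3 values; the function name is illustrative.

```python
def pct_reduction(before, after):
    """Percentage reduction of an error measure relative to its old value."""
    return 100.0 * (before - after) / before


rmse_cut = pct_reduction(133.937, 131.380)   # classical vs Daubechies(2) RMSE
mae_cut = pct_reduction(106.565, 104.143)    # classical vs Daubechies(2) MAE
```

Both reductions come out at roughly two percent of the classical model's error.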

The forecast values of the hybrid method are presented in Table 4, which gives
the forecasts of annual rain precipitation (in millimetres) for Erbil
province, Iraq, for the years 2015 to 2030.

 

Table 4: Forecast values of the annual rain of Erbil province-Iraq using
hybrid method

| Period | Forecast (mm) |
|--------|---------------|
| 2015 | 367.8 |
| 2016 | 360.3 |
| 2017 | 373.5 |
| 2018 | 368.1 |
| 2019 | 364.9 |
| 2020 | 370.1 |
| 2021 | 368.1 |
| 2022 | 366.7 |
| 2023 | 368.8 |
| 2024 | 368.0 |
| 2025 | 367.4 |
| 2026 | 368.3 |
| 2027 | 368.0 |
| 2028 | 367.7 |
| 2029 | 368.1 |
| 2030 | 368.0 |


4. Conclusions

 

In this paper, we suggested a hybrid method for improving the Box-Jenkins
ARIMA methodology when forecasting time series data. We concluded that:

1- The appropriate model for forecasting with the classical Box-Jenkins method
was ARIMA(2,1,0).

2- The classical model was improved by filtering the data with Daubechies
wavelets of order 2, 3, 4, and 5; among them, the Daubechies wavelet of order
2 performed better than the others.

3- According to the forecasts of the hybrid method for the coming years, Erbil
city will receive an average total rainfall of 360-370 millimeters annually.

 

References

 

Ali S.M. (2013). Time series analysis of Baghdad rainfall using ARIMA method,
Iraqi Journal of Science, 54, 1136-1142.

Al-Safawi S., Ali T., & Badal M. (2009). Estimation AR(p) model using wave
shrink, Second Scientific Conference of Mathematics – Statistics and
Informatics, University of Mosul, 274-299.

Al-Shakarchy DH. (2010). Using factor analysis to forecast of time series with
an application on two series rain rates and relative humidity in Mosul city,
Tikrit Journal of Administrative and Economic Sciences, 6, 93-108.

Antonios A., & Constantine E.V. (2003). Wavelet exploratory analysis of the
FTSE ALL SHARE index. In Proceedings of the 2nd WSEAS international conference
on non-linear analysis. Non-linear systems and Chaos, Athens, 1-13.

Ashley W., Walker J. P., Robertson D. E., & Pauwels V. R. N. (2017). A
Comparison of the discrete cosine and wavelet transforms for hydrologic model
input data reduction, Journal of Hydrology and Earth System Sciences, 3, 1-23.

Box G., Jenkins G., & Reinsel G. (2008). Time series analysis: Forecasting and
control, third edition, Prentice-Hall International Inc., New Jersey, USA.

Burrus C., Gopinath R., & Guo H. (1998). Introduction to wavelet and wavelet
transforms, Prentice Hall, New Jersey, USA.

Eni D., & Adeyeye F. (2015). Seasonal ARIMA modeling and forecasting of
rainfall in Warri Town, Nigeria, Journal of Geoscience and Environment
Protection, 3, 91-98.

Fugal D. (2009). Conceptual wavelets in digital signal processing, Space and
Signals Technologies LLC, San Diego, California.

Grossman A., & Morlet J. (1984). Decomposition of Hardy functions into square
integrable wavelets of constant shape, SIAM, Journal of Mathematical Analysis,
15, 723-736.

Makridakis S., Wheelwright S., & Hyndman R. (1998). Forecasting methods and
applications, Third edition, Wiley & Sons, Inc, New York.

Patil P. L., & Raskar V. B. (2015). Image denoising with wavelet thresholding
method for different level of decomposition, International Journal of
Engineering Research and General Science, 3, 1092-1099.

Ramesh Reddy J. C., Ganesh T., Venkateswaran M., & Reddy P. (2017).
Forecasting of monthly mean rainfall in Coastal Andhra, International Journal
of Statistics and Applications, 7, 197-204.

Shafaei M., Adamowski J., Fakheri-Fard A., Dinpashoh Y., & Adamowski K.
(2016). A wavelet-SARIMA-ANN hybrid model for precipitation forecasting,
Journal of Water and Land Development, 28, 27-36.

Shoba G., & Shobha G. (2014). Rainfall prediction using data mining
techniques: A survey, International Journal of Engineering and Computer
Science, 3, 6206-6211.

Tantanee S., Patamatammakul S., Oki T., Sriboonlue V., & Prempree T. (2005).
Coupled wavelet-autoregressive model for annual rainfall prediction, Journal
of Environmental Hydrology, 13, 1-8.

Venkata Ramana R., Krishna S., Kumar R., & Pandey N. G. (2013). Monthly
rainfall prediction using wavelet neural network analysis, Springer, Water
Resource Manage, 27, 3697–3711.

Wahab S., & Khayyat A. (2014). Modeling the suitability analysis to establish
new fire stations in Erbil City using the analytic hierarchy process and
geographic information systems, Journal of Remote Sensing and GIS, 2, 1-10.
