Anna Nguyen, Carl Mario Britto, Ines Mezerreg, Jad Osseiran, Clement Ruin, Uday Kumar R K (UC Berkeley)
This project forecasts the price of a metal listed on the commodity market over the next 10 weeks, based on its historical price. We used the Mean Absolute Percentage Error (MAPE) to evaluate the results. Among the models we tried, Linear Regression (MAPE 1.9%) and the traditional time series forecasting method ARIMA (MAPE 2.4%) fared better than XGBoost (MAPE 3.1%) and the deep learning GRU encoder-decoder (MAPE 7.1%).
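For reference, MAPE can be computed as in the minimal sketch below. The function name `mape` and the choice to reject zero actual values are our own illustrative assumptions, not necessarily how the project computed the metric.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, returned in percent.

    Hypothetical helper for illustration: averages |actual - forecast| / |actual|
    over all points and scales the result by 100.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if len(actual) == 0:
        raise ValueError("inputs must be non-empty")
    if any(a == 0 for a in actual):
        # The percentage error is undefined when the true value is zero.
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)
```

Because each error is divided by the actual value, the metric is scale-free, which makes it convenient for comparing models on a price series.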
The data provided by our industry partner DeepVu contained about 140 features at daily granularity. DeepVu expected a forecast of nominal values for the next 10 weeks, so a fundamental first step was pre-processing the data from daily to weekly granularity. The next important step was feature selection, since the performance of the methods described later relies heavily on how we select and clean our features.
During pre-processing, we noticed that values were missing for certain days and, in some cases, for entire weeks. We averaged the values within each week to reduce the granularity, then used forward-backward filling to handle weeks that were missing entirely.
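The weekly averaging and forward-backward filling described above can be sketched as follows. This is a dependency-free illustration under our own assumptions: weeks start on Monday, missing daily values are represented as `None`, and the names `to_weekly` and `fill_forward_backward` are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean


def week_start(day):
    """Return the Monday of the week containing `day` (assumed convention)."""
    return day - timedelta(days=day.weekday())


def to_weekly(daily):
    """Average daily values per week.

    `daily` maps a date to a price, or to None when the value is missing.
    Returns (week_start, mean) pairs for every week between the first and
    last date; weeks with no observed value get None.
    """
    buckets = {}
    for day, value in daily.items():
        if value is not None:
            buckets.setdefault(week_start(day), []).append(value)

    weeks = []
    current, last = week_start(min(daily)), week_start(max(daily))
    while current <= last:
        values = buckets.get(current)
        weeks.append((current, mean(values) if values else None))
        current += timedelta(days=7)
    return weeks


def fill_forward_backward(values):
    """Forward-fill gaps, then backward-fill any gap left at the start."""
    filled = list(values)
    last_seen = None
    for i, value in enumerate(filled):  # forward pass
        if value is None:
            filled[i] = last_seen
        else:
            last_seen = value
    next_seen = None
    for i in range(len(filled) - 1, -1, -1):  # backward pass
        if filled[i] is None:
            filled[i] = next_seen
        else:
            next_seen = filled[i]
    return filled
```

Averaging first means that a week with only a few missing days still yields a value, so the fill step is only needed for weeks with no observations at all.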
We capped the number of features kept for modeling at 20 and used two methods to select them: correlation-based filtering (eliminating features that were highly correlated with each other) and PCA (reducing dimensionality through orthogonal projections in the feature space).
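As an illustration of the correlation-based filter, the sketch below greedily keeps a feature only if it is not highly correlated with any feature already kept. The 0.9 cutoff, the greedy ordering, and the function names are our assumptions for the example; only the cap of 20 features comes from the project. PCA is not shown here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.9, max_features=20):
    """Greedy filter: keep a feature only if |r| < threshold with every kept one.

    `features` maps a feature name to its column of values. The threshold is
    an assumed value; `max_features` mirrors the cap of 20 used in the project.
    """
    kept = []
    for name, column in features.items():
        if len(kept) == max_features:
            break
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

With this greedy scheme the result depends on the order of the features, so in practice one might sort them first, for example by their correlation with the target price.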
We then used four different models for this time-series prediction task: ARIMA, Linear Regression, XGBoost and a GRU encoder-decoder.