Time Series Forecasting with Prophet
Learn how to use Facebook's Prophet library for time series forecasting. This tutorial covers installation, data preparation, model building, evaluation, and practical applications.
Introduction to Prophet
Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonality effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
Installation
Install the Prophet library using pip. Make sure you have Python and pip installed on your system.
pip install prophet
Data Preparation
Prophet requires the input data to be in a specific format. The time column must be named 'ds' (datetime), and the value column must be named 'y'. The code snippet demonstrates how to load data from a CSV file, rename the columns accordingly, and convert the 'ds' column to datetime objects. Note: Replace 'example_data.csv' with your actual data file path and 'Date' and 'Value' with the correct column names.
import pandas as pd
from prophet import Prophet
# Load the data
df = pd.read_csv('example_data.csv')
# Rename columns to 'ds' (datetime) and 'y' (value)
df.rename(columns={'Date': 'ds', 'Value': 'y'}, inplace=True)
# Convert 'ds' to datetime objects
df['ds'] = pd.to_datetime(df['ds'])
print(df.head())
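If you do not have a CSV file at hand, the same preparation steps can be tried on synthetic data. The sketch below is only an illustration: the generated series stands in for 'example_data.csv', and check_prophet_frame is a hypothetical helper written for this tutorial, not part of Prophet. The duplicate-timestamp check is a general data-quality precaution rather than a documented Prophet requirement.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for 'example_data.csv': three years of daily values
# with a linear trend, a weekly cycle, and a little noise.
rng = np.random.default_rng(0)
dates = pd.date_range("2017-01-01", periods=3 * 365, freq="D")
t = np.arange(len(dates))
values = 10 + 0.01 * t + 2 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.5, len(dates))
raw = pd.DataFrame({"Date": dates.strftime("%Y-%m-%d"), "Value": values})

# Same preparation steps as above, written without in-place mutation.
df = raw.rename(columns={"Date": "ds", "Value": "y"})
df["ds"] = pd.to_datetime(df["ds"])


def check_prophet_frame(frame):
    """Return a list of problems found in a ds/y frame (empty list if none)."""
    problems = []
    for col in ("ds", "y"):
        if col not in frame.columns:
            problems.append(f"missing column {col!r}")
    if problems:
        return problems
    if not pd.api.types.is_datetime64_any_dtype(frame["ds"]):
        problems.append("'ds' is not a datetime column")
    if not pd.api.types.is_numeric_dtype(frame["y"]):
        problems.append("'y' is not numeric")
    if frame["ds"].duplicated().any():
        problems.append("'ds' contains duplicate timestamps")
    return problems


print(check_prophet_frame(df))  # an empty list means no problems were found
```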
Model Building
Create a Prophet model instance and fit it to your historical data. The fit() method trains the model on the provided time series.
model = Prophet()
model.fit(df)
Making Predictions
To make predictions, first create a dataframe containing the dates you want to forecast. The make_future_dataframe() method returns a dataframe that, by default, contains the historical dates plus the specified number of future periods (daily by default, 365 here). Then call the predict() method to generate the forecast. The forecast dataframe includes the predicted values (yhat) and the lower and upper bounds of the uncertainty interval (yhat_lower and yhat_upper, an 80% interval by default).
# Create a future dataframe for predictions
future = model.make_future_dataframe(periods=365)
# Make predictions
forecast = model.predict(future)
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())
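To make the shape of the future dataframe concrete, here is a rough pandas approximation of the idea: keep the historical dates and append a number of new dates after the last one. This is a sketch for intuition only, not Prophet's own code, and sketch_future_frame is a hypothetical name.

```python
import pandas as pd


def sketch_future_frame(history_dates, periods, freq="D", include_history=True):
    """Build a one-column 'ds' frame: optional history dates plus `periods` new dates."""
    history_dates = pd.Series(pd.to_datetime(history_dates)).sort_values()
    last_date = history_dates.max()
    # Generate one extra candidate, then keep only dates strictly after the last
    # observed date, so the result never repeats the final historical timestamp.
    candidates = pd.date_range(start=last_date, periods=periods + 1, freq=freq)
    new_dates = pd.Series(candidates[candidates > last_date][:periods])
    parts = [history_dates, new_dates] if include_history else [new_dates]
    return pd.DataFrame({"ds": pd.concat(parts, ignore_index=True)})


history = pd.date_range("2020-01-01", periods=5, freq="D")
future = sketch_future_frame(history, periods=3)
print(len(future))  # 5 historical dates + 3 future dates = 8
print(future.tail(3))
```

Because the historical dates are included, the forecast also contains fitted values for the past, which is what allows the forecast to be plotted over the observed data.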
Visualizing the Forecast
Prophet provides built-in plotting functions to visualize the forecast and its components (trend, yearly seasonality, weekly seasonality). The plot() method displays the forecast along with the historical data. The plot_components() method shows the individual components of the forecast.
# Plot the forecast
fig1 = model.plot(forecast)
# Plot the components of the forecast (trend, seasonality)
fig2 = model.plot_components(forecast)
Evaluating Model Performance
To evaluate the model's performance, use cross-validation. The cross_validation function takes a fitted model and, at each of a series of cutoff dates, refits it on the data up to the cutoff and forecasts the period that follows. The initial parameter specifies the initial training period, the period parameter specifies the spacing between cutoff dates, and the horizon parameter specifies the forecast horizon. Then calculate performance metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) using the performance_metrics function.
from prophet.diagnostics import cross_validation
from prophet.diagnostics import performance_metrics
# Perform cross-validation
df_cv = cross_validation(model, initial='730 days', period='180 days', horizon='365 days')
# Calculate performance metrics
df_p = performance_metrics(df_cv)
print(df_p.head())
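The metrics themselves are simple functions of the actual values (y) and the predictions (yhat). The sketch below computes them by hand on a tiny made-up table that only mimics the y and yhat columns of a cross-validation result. It ignores the per-horizon grouping that performance_metrics applies, so treat it as an illustration of the formulas rather than a replacement.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of a cross-validation result: actuals and predictions.
cv_like = pd.DataFrame({
    "y":    [10.0, 12.0, 11.0, 13.0],
    "yhat": [9.0, 12.5, 12.0, 11.0],
})

errors = cv_like["y"] - cv_like["yhat"]
mae = errors.abs().mean()    # mean absolute error
mse = (errors ** 2).mean()   # mean squared error
rmse = np.sqrt(mse)          # root mean squared error, in the units of y

print(mae, mse, rmse)  # 1.125 1.5625 1.25
```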
Adding Seasonality
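By default, Prophet fits yearly and weekly seasonality automatically when the history is long enough to support them. You can also add custom seasonality patterns, such as monthly seasonality, with the add_seasonality() method. The period parameter specifies the length of the seasonality cycle in days, and the fourier_order parameter controls the flexibility of the seasonality curve: higher orders can fit faster-changing patterns but are more prone to overfitting. Note that the snippet below turns off the built-in weekly and yearly terms, so only the monthly term is fitted; omit those arguments to keep them.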
model = Prophet(weekly_seasonality=False, yearly_seasonality=False)
model.add_seasonality(name='monthly', period=30.5, fourier_order=5)
model.fit(df)
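To see what fourier_order means, it helps to look at the kind of features a Fourier-series seasonality is built from: for each order n from 1 to N, one sine and one cosine term with period P/n. The NumPy sketch below generates such features for the monthly example. The function name and layout are illustrative assumptions, not Prophet's internal API.

```python
import numpy as np


def fourier_features(t_days, period, order):
    """Return an array of shape (len(t_days), 2 * order) of sin/cos terms."""
    t_days = np.asarray(t_days, dtype=float)
    columns = []
    for n in range(1, order + 1):
        angle = 2.0 * np.pi * n * t_days / period
        columns.append(np.sin(angle))  # sine term of order n
        columns.append(np.cos(angle))  # cosine term of order n
    return np.column_stack(columns)


t = np.arange(61)  # 61 consecutive days, i.e. two cycles of 30.5 days
X = fourier_features(t, period=30.5, order=5)
print(X.shape)  # (61, 10): two columns per Fourier order
```

The fitted seasonal curve is a weighted sum of these columns, so a larger order adds higher-frequency terms and lets the curve bend more sharply within one cycle.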
Adding Holidays
Prophet can also incorporate the effects of holidays on the time series. Create a dataframe containing the dates and names of the holidays, along with optional lower_window and upper_window columns to capture effects before and after the holiday. Pass this dataframe to the Prophet constructor using the holidays parameter.
# Create a dataframe of holidays
holidays = pd.DataFrame({
    'holiday': 'new_year',
    'ds': pd.to_datetime(['2017-01-01', '2018-01-01', '2019-01-01']),
    'lower_window': 0,
    'upper_window': 0,
})
# Initialize Prophet with the holidays
model = Prophet(holidays=holidays)
model.fit(df)
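When there is more than one holiday, a common pattern is to build one dataframe per holiday and concatenate them. The sketch below uses only pandas; the choice of holidays and window sizes is an illustrative assumption. A negative lower_window extends the effect to days before the holiday, and a positive upper_window extends it to days after.

```python
import pandas as pd

# Hypothetical holiday calendar; the selection and windows are illustrative only.
new_year = pd.DataFrame({
    "holiday": "new_year",
    "ds": pd.to_datetime(["2017-01-01", "2018-01-01", "2019-01-01"]),
    "lower_window": 0,
    "upper_window": 1,   # also cover the day after
})
black_friday = pd.DataFrame({
    "holiday": "black_friday",
    "ds": pd.to_datetime(["2017-11-24", "2018-11-23", "2019-11-29"]),
    "lower_window": -1,  # also cover the day before
    "upper_window": 3,   # and the three days after, through the following Monday
})

# One combined frame, ready to be passed as the holidays argument.
holidays = pd.concat([new_year, black_friday], ignore_index=True)
print(holidays)
```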
Concepts Behind the Snippet
Prophet leverages a decomposable time series model with three main components: trend, seasonality, and holidays. The trend component models long-term changes in the data. Seasonality captures recurring patterns, and holidays account for irregular events that impact the time series.
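In the notation commonly used to describe Prophet (these symbols do not appear elsewhere in this tutorial), the decomposition can be written as:

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Here g(t) is the trend, s(t) is the sum of the seasonal terms, h(t) is the holiday effect, and \varepsilon_t is the error term that the three components do not explain.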
Real-Life Use Cases
Retail Sales Forecasting: Predicting future sales based on historical data, taking into account seasonality (e.g., holiday shopping seasons) and promotional events. This allows retailers to optimize inventory management and staffing levels.
Demand Forecasting for Energy: Predicting energy demand to optimize power generation and distribution, considering factors like weather patterns (temperature-dependent usage) and time of day.
Website Traffic Forecasting: Predicting website traffic to plan server capacity, marketing campaigns, and content releases, accounting for weekly and yearly patterns and special events.
Best Practices
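Data Cleaning: Inspect your time series for extreme outliers and data errors before fitting. Prophet tolerates missing values, so a practical way to neutralize a bad observation is to set its value to missing rather than to interpolate over it or delete the row.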
Feature Engineering: Consider adding relevant external regressors to improve the model's accuracy (e.g., weather data, economic indicators).
Parameter Tuning: Experiment with different Prophet parameters (e.g., seasonality_prior_scale for seasonality strength and changepoint_prior_scale for trend flexibility) to optimize model performance.
Cross-Validation: Use cross-validation to rigorously evaluate the model's performance and avoid overfitting to the training data.
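As a sketch of the data-cleaning step, the example below flags an extreme value and sets it to missing while keeping the row, so that the date remains in the history. The data and the outlier rule (distance from the median measured in median absolute deviations, with a threshold of 5) are assumptions chosen for illustration; pick a rule that suits your data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ds": pd.date_range("2021-01-01", periods=8, freq="D"),
    "y": [10.0, 11.0, 10.5, 250.0, 11.2, 10.8, 11.1, 10.9],  # 250.0 is a made-up spike
})

# Illustrative rule: flag points that lie far from the median,
# measured in median absolute deviations (MAD).
median = df["y"].median()
mad = (df["y"] - median).abs().median()
is_outlier = (df["y"] - median).abs() > 5 * mad

cleaned = df.copy()
cleaned.loc[is_outlier, "y"] = np.nan  # keep the row, blank out the value
print(cleaned)
```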
Interview Tip
When discussing Prophet in an interview, emphasize its strengths in handling seasonality and holiday effects. Be prepared to explain the underlying model and its components. Also be ready to discuss scenarios where Prophet may not be the best choice (e.g., time series with complex dependencies, or series with too little history to estimate seasonality).
When to Use It
Use Prophet when you have time series data with strong seasonality and/or holiday effects, and when you need a relatively easy-to-use and automated forecasting tool. It is particularly well-suited for business forecasting problems.
Memory Footprint
Prophet's memory footprint depends on the size of the input data and the complexity of the model. For large datasets, consider downsampling or using a smaller number of changepoints to reduce memory usage.
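Downsampling can be done with pandas before the data reaches Prophet. The sketch below aggregates a made-up hourly series to daily means. Whether the mean, the sum, or another aggregate is appropriate depends on what y measures, so the choice of mean() here is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series covering 28 days.
hourly = pd.DataFrame({
    "ds": pd.date_range("2022-01-01", periods=28 * 24, freq=pd.Timedelta(hours=1)),
    "y": np.arange(28 * 24, dtype=float),
})

# Aggregate to one row per day and restore the ds/y column layout.
daily = (
    hourly.set_index("ds")["y"]
    .resample("D")
    .mean()
    .reset_index()
)
print(len(hourly), "->", len(daily))  # 672 -> 28
```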
Alternatives
ARIMA: Autoregressive Integrated Moving Average models. Suitable for time series that are stationary or can be made stationary by differencing.
SARIMA: Seasonal ARIMA models. Extend ARIMA to handle seasonality.
Exponential Smoothing: Methods like Holt-Winters, suitable for time series with trend and seasonality.
Deep Learning (LSTM): Long Short-Term Memory networks. Can handle complex time series patterns, but require more data and computational resources.
Pros
Easy to use: Simple API and automatic handling of many common time series characteristics.
Robust to missing data and outliers: Can handle missing values and outliers relatively well.
Interpretable: Provides insights into trend, seasonality, and holiday effects.
Built-in seasonality: Fits yearly and weekly seasonality by default when the history is long enough to support them.
Cons
Limited ability to model complex dependencies: May not perform well when time series are highly dependent on external factors that are not included in the model.
Requires sufficient historical data: Needs several seasons of historical data to accurately model seasonality.
Can be less accurate than more complex models: May not achieve the same level of accuracy as more sophisticated models like deep learning methods in some cases.
FAQ
What data format does Prophet require?
Prophet requires a pandas DataFrame with two columns: 'ds' (datetime) and 'y' (numeric value).
How does Prophet handle missing data?
Prophet is robust to missing data. The model is fitted only on the observed points, so gaps in the history do not need to be filled, and predict() can still return values for the missing dates.
How can I add custom seasonality to Prophet?
Use the add_seasonality() method to add custom seasonality patterns, specifying the period and Fourier order.
How do I evaluate the performance of my Prophet model?
Use the cross_validation and performance_metrics functions from the prophet.diagnostics module.
How does the changepoint_prior_scale parameter affect the model?
The changepoint_prior_scale parameter controls the flexibility of the trend. Higher values allow larger and more frequent changes in the trend's slope, which can overfit; lower values keep the trend closer to a single straight line, which can underfit.
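To picture what a changepoint does, consider a piecewise-linear trend: a base slope plus a slope adjustment that switches on at each changepoint. The NumPy sketch below is an illustration of that idea with hypothetical names and hand-picked numbers; it is not Prophet's fitting code. In this picture, changepoint_prior_scale governs how large the fitted slope adjustments are allowed to become.

```python
import numpy as np


def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Trend with base slope k and offset m; the slope changes by delta at each changepoint."""
    t = np.asarray(t, dtype=float)
    trend = k * t + m
    for s, delta in zip(changepoints, deltas):
        # The adjustment is zero before the changepoint and grows linearly after it,
        # which keeps the trend continuous.
        trend = trend + delta * np.clip(t - s, 0.0, None)
    return trend


t = np.arange(0, 11)
trend = piecewise_linear_trend(t, k=1.0, m=0.0, changepoints=[5.0], deltas=[-0.5])
print(trend)  # slope 1.0 up to t = 5, slope 0.5 afterwards
```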