SynErgie | Prognose-Service

The Prognose-Service was developed within SynErgie by the Institute for Production Management, Technology and Machine Tools (PTW) at TU Darmstadt to make it easier for engineers to build forecasting models of the useful energy demand (heat, cooling, electricity) of production areas. The service also allows the resulting models to be deployed in real-time environments, thereby supporting the predictive control of energy supply systems for energy-flexible factory operation. It is intended to help research partners and companies put energy-flexible operation into practice.

As shown in Figure 1, the service consists of the steps modeling (part 1) and deployment (part 2). The useful energy demand forecasts are required as an input for the model predictive control of the production infrastructure and can also be used for further applications.

Figure 1 shows how the Prognose-Service is embedded in the control loop of the energy-flexible factory. Within this loop, the forecast is an integral component for operating the energy supply systems in a forward-looking manner. Typical inputs to the forecasting models are measurement data (recent energy demand), production plan data (scheduled manufacturing orders, factory utilization) and weather data (the weather forecast for the next day). In the modeling step, a machine learning model is trained on the recorded data. After deployment, it provides forecasts of the production system's load profile up to 24 hours into the future during live operation.

Figure 1: Embedding of the Prognose-Service in the control loop of the energy-flexible factory
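
As a rough illustration of these input types (this sketch is not part of the service; all column names and values are made up), such data could be assembled into a single time-indexed DataFrame:

import numpy as np
import pandas as pd

# Hypothetical 15-minute index for a single day in the local timezone
index = pd.date_range("2020-10-30", periods=96, freq="15min", tz="Europe/Berlin")
rng = np.random.default_rng(0)

example_inputs = pd.DataFrame(
    {
        "measured_load_kW": rng.normal(5000, 1500, len(index)),  # measurement data: recent energy demand
        "planned_orders": rng.integers(0, 5, len(index)),  # production plan data: orders per interval
        "forecast_temperature_C": 10 + 2 * np.sin(np.linspace(0, 2 * np.pi, len(index))),  # weather data: day-ahead temperature forecast
    },
    index=index,
)

The example dataset used in the notebook below is structured in the same way: a DatetimeIndex and one column per measured or forecast quantity, with the quantity to be forecasted (here thermal_power) as one of the columns.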

The Jupyter Notebook below is a short demonstration of the modeling part of the service. For security reasons, the demonstration is provided as a static page. If you are a SynErgie partner and would like to test the service with your own data, please contact the responsible person. This contact, as well as the documentation of the service, can be found at the following link: Documentation

A video demonstrating the deployment part is available on YouTube.

Video: Prognose-Service

The following video explains how to use the Jupyter Notebook. You can follow along in the code cells below.

Jupyter Notebook

Imports

In [2]:
# Imports
from sys_tool import SystemidentificationTool
import pandas as pd
import numpy as np
from datetime import datetime

Import dataset

To import a dataset, navigate to your preferred location in Jupyter, optionally create a folder, and upload your data there.

Here, we added the example data at prognoseservice_example/data/example_data.csv

In [3]:
# Import dataset

data_path = 'data/example_data.csv'  # Relative path from the notebook directory
data = pd.read_csv(data_path, index_col=0, parse_dates=True)  # index_col and parse_dates set the DatetimeIndex automatically.
# See the pandas documentation for more options, e.g. column separator or decimal separator.

Visualize dataset

The data must be a pandas.DataFrame with a DatetimeIndex, following the format of this example.
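
If the CSV import does not yield such an index (for example because parse_dates was omitted), it can be converted manually. This is only a sketch; the timezone name is an assumption based on the timestamps in the example data:

data.index = pd.to_datetime(data.index)  # parse index strings into timestamps
if data.index.tz is None:
    data = data.tz_localize("Europe/Berlin")  # attach a timezone if the timestamps are naive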

In [4]:
# Visualize dataset

data.head()
Out[4]:
                           Pu_Mgn3.425.ThHy_T_VL  Ort_AU.WS.Atmo_T_Luft  Wetterprognose_Temperatur  thermal_power
Time (with timezone)
2020-10-30 00:00:00+01:00              28.660971              10.311627                  10.837501    4381.777003
2020-10-30 00:15:00+01:00              31.782938              10.433481                  10.912500    8830.517407
2020-10-30 00:30:00+01:00              37.977928              10.480000                  10.987500   16571.353580
2020-10-30 00:45:00+01:00              32.884285              10.506665                  11.062500    7840.641652
2020-10-30 01:00:00+01:00              30.151451              10.471549                  11.125000    4688.792202
In [5]:
data.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 11807 entries, 2020-10-30 00:00:00+01:00 to 2021-03-01 23:30:00+01:00
Data columns (total 4 columns):
 #   Column                     Non-Null Count  Dtype
---  ------                     --------------  -----
 0   Pu_Mgn3.425.ThHy_T_VL      11784 non-null  float64
 1   Ort_AU.WS.Atmo_T_Luft      11784 non-null  float64
 2   Wetterprognose_Temperatur  11807 non-null  float64
 3   thermal_power              11784 non-null  float64
dtypes: float64(4)
memory usage: 461.2 KB

Parametrize the service

Define parameters like target_name, forecasting_horizon, search_space

In this example, only the most important parameters are set. For a complete list of available parameters, refer to the user documentation (see the documentation link above).

In [6]:
# Import dataclasses for parametrization

from sys_tool.default_data_classes.general import General
from sys_tool.default_data_classes.eval_pipeline import EvalPipeline
from sys_tool.default_data_classes.feature_preprocessing import FeaturePreprocessing
from sys_tool.default_data_classes.target import Target
from sys_tool.default_data_classes.search_space import SearchSpace
from sys_tool.default_data_classes.post_processing import PostProcessing
In [15]:
# Most important parameters

# Target
target_name = 'thermal_power'  # Define which column of data should be forecasted
forecasting_horizon = [1, 2, 3, 4]  # Define which future time steps should be forecasted (must be of type list); with 15-minute data, [1, 2, 3, 4] covers the next hour

# Hyperparameter optimization
evaluations = 100  # Define how many evaluations the hyperparameter optimizer should run, e.g. 100
test_set_size = 0.2  # Define the fraction of data used for evaluating the final model
cv_splits = 5  # Define how many cross-validation splits should be used

# Feature engineering
n_time_steps = 4  # Number of past time steps to include in the input data
moving_average_windows = [4]  # Size of moving average windows to include (as list[int])

# Search space
max_n_features = 40  # Maximum number of features to be selected
pca_n_components = 0.9  # Parameter n_components of PCA

General parameters

In [16]:
from eta_ml_lib.hyperparameter_optimization import BayesOptimizer
In [17]:
general_params = General(
    result_path="results",  # Where results should be saved
    hyperparam_optimizer=BayesOptimizer(
        max_evaluations=evaluations,  # How many evaluations to run during hyperparameter optimization
    ),
)

Evaluation pipeline parameters

In [18]:
from sklearn.model_selection import TimeSeriesSplit
In [19]:
eval_pipeline_params = EvalPipeline(
    cvSplit=TimeSeriesSplit(n_splits=cv_splits),  # Cross-validation splitter, must be an sklearn splitter (e.g. TimeSeriesSplit, KFold, ...)
    test_set_size=test_set_size,  # Fraction of the dataset to use as the final test set (not used during hyperparameter tuning)
)

Target parameters

In [20]:
from sys_tool import MyTargetPreprocessor
from eta_ml_lib.target_preprocessing import MyMultiTimeStepForecastingPreprocessor
from sklearn.impute import KNNImputer
In [21]:
target_params = Target(
    target_preprocessing=MyTargetPreprocessor(
        steps=[
            (
                "forecasting_preprocessor",
                MyMultiTimeStepForecastingPreprocessor(
                    steps_to_predict=forecasting_horizon,  # Time steps to forecast (defined above)
                    feature_target_names=[target_name],
                    # TODO: rename variable
                ),
            ),
            ("target_imputer", KNNImputer()),
        ]
    ),
)

Feature engineering parameters

In [22]:
from sys_tool import MyFeaturePreprocessor
from eta_ml_lib.preprocessing import MyOutlierFilterHampel
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import PolynomialFeatures
from eta_ml_lib.feature_engineering import (
    MyTimeStep,
    MyMovingAverage,
    MyDateTimeFeatures,
    MyDifferential,
    GausianMixtureModelClassification,
    DummyFeatureEngineering
)
In [25]:
feature_preprocessing_params = FeaturePreprocessing(
    transformer=MyFeaturePreprocessor(
        steps=[  # Include any sklearn transformer as a pipeline step here if no hyperparameters should be optimized for that step
            ("imputer", KNNImputer()),  # Syntax: ("step_name", Transformer(params)),
            ("outlier_filter", MyOutlierFilterHampel()),
            (
                "feature_engineering",
                FeatureUnion(
                    transformer_list=[
                        ("polynomial_degree2", PolynomialFeatures(degree=2)),
                        ("time_delay", MyTimeStep(steps=np.arange(1, n_time_steps + 1, dtype=int), transform_features="all")),
                        ("weather_forecast", MyTimeStep(steps=np.arange(-1, -5, -1, dtype=int), transform_features="Wetterprognose_Temperatur")),
                        ("smoothing", MyMovingAverage(windows=moving_average_windows, transform_features="all")),
                        ("date_time_features", MyDateTimeFeatures(transform_features="all")),
                        ("differential", MyDifferential(differential_list=[1, 2], transform_features="all")),
                        # (
                        #     "auto_classification",
                        #     GausianMixtureModelClassification(
                        #         time_diff=10, n_components=[2], transform_features="all"
                        #     ),
                        # ),
                        ("original_features", DummyFeatureEngineering()),
                    ]
                ),
            ),
        ]
    ),
)

Modeling and hyperparameter search space

In [26]:
from sys_tool.basics import hypercheck  # utility function to check if a transformer is given with the correct hyperparameters
from hyperopt import hp
from hyperopt.pyll.base import scope

# Preprocessing
from sklearn.preprocessing import StandardScaler
from eta_ml_lib.feature_selection import SelectFromModelOwn
from sklearn.ensemble import ExtraTreesRegressor, GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor, ExtraTreeRegressor
from sklearn.linear_model import LassoCV, ElasticNet, LinearRegression, Lasso, Ridge
from sklearn.decomposition import PCA

# Modeling
from eta_ml_lib.estimator import KerasRegressorOwn
from eta_ml_lib.estimator import DenseLayer, RecurrentLayer
from sklearn.ensemble import (
    BaggingRegressor,
    RandomForestRegressor
)
from sklearn.neighbors import KNeighborsRegressor
In [27]:
# Model definition

ann = [
    # Feedforward neural network
    hypercheck(
        KerasRegressorOwn,
        hyperparameters={
            "epochs": 1000,
            "loss": "mse",
            "optimizer": "adam",
            "batch_size": hp.quniform("ffn_batch_size", 50, 1000, 2),  # hp.qloguniform("batch_size_ffn", np.log(64), np.log(1024), 2)
            "num_hidden_layer": hp.qlognormal("ffn_layer_num", 0, 0.5, 1),
            "net": hp.choice(
                "ffn_architecture",
                [
                    {i: DenseLayer(i) for i in range(5)},  # Adds 5 optional Dense layers
                ],
            ),
            "output_layer": {
                "kernel_initializer": "random_uniform",
                "out_activation": "linear",
            },
            "callbacks": [
                {
                    "callback_function": "EarlyStopping",
                    "hyperparameters": {
                        "monitor": "loss",
                        "patience": 5,
                        "mode": "min",
                        "verbose": 1,
                        "restore_best_weights": True,
                    },
                }
            ],
        },
        returnKeys=("create_model_function", "hyperparameters"),
    ),
]

rnn = [
    # Recurrent neural network
    hypercheck(
        KerasRegressorOwn,
        hyperparameters={
            "epochs": 1000,
            "loss": "mse",
            "optimizer": "adam",
            "batch_size": hp.quniform("rnn_batch_size", 50, 1000, 2),  # hp.qloguniform("rnn_batch_size", np.log(64), np.log(1024), 2)
            "num_hidden_layer": hp.qlognormal("rnn_layer_num", 0, 0.5, 1),
            "net": hp.choice(
                "rnn_architecture",
                [
                    {i: RecurrentLayer(i) for i in range(3)}  # 3 optional LSTM layers
                ],
            ),
            "output_layer": {
                "kernel_initializer": "random_uniform",
                "out_activation": "linear",
            },
            "max_shift": n_time_steps,
            "callbacks": [
                {
                    "callback_function": "EarlyStopping",
                    "hyperparameters": {
                        "monitor": "loss",
                        "patience": 5,
                        "mode": "min",
                        "verbose": 1,
                        "restore_best_weights": True,
                    },
                }
            ],
        },
        returnKeys=("create_model_function", "hyperparameters"),
    ),
]

ensemble_trees = [
    hypercheck(
        BaggingRegressor,
        hyperparameters={"n_estimators": scope.int(hp.quniform("n_estimators", 5, 40, 5))},
        returnKeys=("create_model_function", "hyperparameters"),
    ),
    hypercheck(
        ExtraTreeRegressor,
        hyperparameters={"max_depth": scope.int(hp.quniform("max_depth", 20, 100, 10))},
        returnKeys=("create_model_function", "hyperparameters"),
    ),
    hypercheck(
        ExtraTreesRegressor,
        hyperparameters={
            "n_estimators": scope.int(hp.loguniform("n_estimators_ext", np.log(20), np.log(1000))),
            "max_depth": scope.int(hp.quniform("max_depth_ext", 10, 100, 1)),
            "n_jobs": -1,
        },
        returnKeys=("create_model_function", "hyperparameters"),
    ),
    hypercheck(
        RandomForestRegressor,
        hyperparameters={
            "n_estimators": scope.int(hp.loguniform("n_estimators_rf", np.log(20), np.log(1000))),
            "max_depth": scope.int(hp.quniform("max_depth_rf", 10, 100, 1)),
            "n_jobs": -1,
        },
        returnKeys=("create_model_function", "hyperparameters"),
    ),
]

decision_tree = [
    hypercheck(
        DecisionTreeRegressor,
        hyperparameters={
            "criterion": hp.choice("criterion", ["friedman_mse", "mse", "mae"]),
            "splitter": hp.choice("splitter", ["best", "random"]),
        },
        returnKeys=("create_model_function", "hyperparameters"),
    ),
]

linear_models = [
    hypercheck(
        ElasticNet,
        hyperparameters={
            "alpha": hp.uniform("alpha", 0.5, 1),
            "l1_ratio": hp.choice("l1_ratio", [0.3, 0.5, 0.8]),
        },
        returnKeys=("create_model_function", "hyperparameters"),
    ),
    hypercheck(
        LinearRegression, hyperparameters={}, returnKeys=("create_model_function", "hyperparameters")
    ),
    hypercheck(
        Lasso,
        hyperparameters={"alpha": hp.choice("alpha1", [0.5, 1])},
        returnKeys=("create_model_function", "hyperparameters"),
    ),
    hypercheck(
        Ridge,
        hyperparameters={"alpha": hp.choice("alpha2", [0.5, 1])},
        returnKeys=("create_model_function", "hyperparameters"),
    ),
]

k_neighbors = [
    hypercheck(
        KNeighborsRegressor,
        hyperparameters={"n_neighbors": scope.int(hp.quniform("n_neighbors", 5, 10, 5))},
        returnKeys=("create_model_function", "hyperparameters"),
    ),
]
In [28]:
search_space = SearchSpace(
    ##########################
    # Preprocessing Pipeline #
    ##########################
    dataset_processes={
        ### Scaling
        "scaler": hypercheck(StandardScaler, {}),
        ### Feature Selection
        "feature_selection": hp.choice(  # Use hyperopt-objects to define steps/parameters that should be varied during optimization
            "feature_selection",
            [
                hypercheck(
                    SelectFromModelOwn,
                    hyperparameters={
                        "estimator_type": hp.choice(
                            "feature_selection_estimatortype",
                            [
                                hypercheck(ExtraTreesRegressor, {}),
                                hypercheck(LassoCV, {}),
                                hypercheck(GradientBoostingRegressor, {}),
                                hypercheck(DecisionTreeRegressor, {}),
                            ],
                        ),
                        "k": hp.quniform("Feature_number", 5, max_n_features, 1),
                        "threshold": -np.inf,
                    },
                ),
                # hypercheck(DummyProcess, {}),
                hypercheck(PCA, {'n_components': pca_n_components}),
            ],
        ),
    },
    ###################
    # Model selection #
    ###################
    multi_target_model=ann + ensemble_trees + linear_models,
    single_target_model=[],
    quantile_regression_model=[],
)

Postprocessing parameters

In [29]:
from sys_tool import MyPostprocessor
from eta_ml_lib.postprocessing import MyScorer, Plotter
from eta_ml_lib.postprocessing.my_metrics import root_mean_squared_error, nrmse_quantile_5_95, mean_error, std_error
from sklearn.metrics import mean_absolute_error, r2_score
In [30]:
post_processing_params = PostProcessing(
    post_processor=MyPostprocessor(
        eval_processes=[
            MyScorer(
                metrics={
                    "MAE": {
                        "function_name": mean_absolute_error,
                        "kwargs": {"multioutput": "raw_values"},
                    },
                    "RMSE": {
                        "function_name": root_mean_squared_error,
                        "kwargs": {},
                    },
                    "nRMSE": {
                        "function_name": nrmse_quantile_5_95,
                        "kwargs": {},
                    },
                    "R2-score": {
                        "function_name": r2_score,
                        "kwargs": {"multioutput": "raw_values"},
                    },
                    "mean_error": {
                        "function_name": mean_error,
                        "kwargs": {},
                    },
                    "std_error": {
                        "function_name": std_error,
                        "kwargs": {},
                    },
                }
            ),
            Plotter(),
        ]
    ),
)

Initialize SystemidentificationTool

In [31]:
my_tool = SystemidentificationTool(
    general=general_params,
    eval_pipeline=eval_pipeline_params,
    target=target_params,
    feature_preprocessing=feature_preprocessing_params,
    search_space_class=search_space,
    post_processing=post_processing_params,
)

Run hyperparameter optimization

In [32]:
my_tool.identification(data, pd.DataFrame())

Evaluate results

The results are saved to the specified result folder. Plots of the target and the predictions as well as the optimization history can be inspected there. For information about the trials object, refer to the hyperopt documentation.
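
If the hyperopt Trials object is persisted in the result folder, it can also be inspected directly in Python. The following is only a sketch; the file name and pickle format are assumptions, not a documented interface of the service:

import pickle

with open("results/trials.pkl", "rb") as f:  # hypothetical file name inside the result folder
    trials = pickle.load(f)

losses = [l for l in trials.losses() if l is not None]  # hyperopt Trials API: one loss per evaluation
print("Best loss:", min(losses))
print("Best trial parameters:", trials.best_trial["misc"]["vals"])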

Example result plot
