The forecast service was developed within SynErgie by the Institut für Produktionsmanagement, Technologie und Werkzeugmaschinen (PTW) at TU Darmstadt to make it easier for engineers to build forecast models of useful energy demand (heat, cooling, electricity) in production areas. The service also allows the resulting models to be deployed in real-time environments, supporting the predictive control of energy supply systems for energy-flexible factory operation. It is intended to help research partners and companies implement energy-flexible operation.
As shown in Figure 1, the service consists of the steps modeling (part 1) and deployment (part 2). The useful energy demand forecasts are required as an input for the model predictive control of the production infrastructure and can also be used for further applications.
Figure 1 shows how the forecast service is embedded in the control loop of the energy-flexible factory. The forecast is an integral component for operating the energy supply systems predictively. Typical inputs for the forecast models are measurement data (recent energy demand), production plan data (scheduled manufacturing orders, factory utilization) and weather data (weather forecast for the next day). In the modeling step, a machine learning model is trained on the recorded data. After deployment, it then delivers live forecasts of the production system's load profile up to 24 h into the future.
Figure 1: Embedding of the forecast service in the control loop of the energy-flexible factory
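To make the input format concrete, the typical inputs named above (measurement, production plan and weather data) can be sketched as a single DataFrame at 15-minute resolution. The column names and values below are purely hypothetical, not taken from the service; note that at 15-minute resolution, a 24 h forecast horizon corresponds to 96 steps.

```python
import numpy as np
import pandas as pd

# Hypothetical example: one day of data at 15-minute resolution
index = pd.date_range("2020-10-30", periods=96, freq="15min", tz="Europe/Berlin")
rng = np.random.default_rng(0)

data = pd.DataFrame(
    {
        "measured_power": rng.normal(5000, 500, size=96),    # past energy demand
        "planned_orders": rng.integers(0, 5, size=96),       # production plan data
        "forecast_temperature": rng.normal(10, 1, size=96),  # weather forecast
    },
    index=index,
)

# 24 h ahead at 15-minute resolution corresponds to 96 forecast steps
steps_per_day = 96
```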
The Jupyter notebook below is a short demonstration of the modeling part of the service. For security reasons, the demonstration is provided statically. If you are a SynErgie partner and would like to test the service with your own data, please contact the responsible person. Their contact details, as well as the service documentation, can be found at the following link: Documentation
A video demonstrating the deployment part is available on YouTube.
The following video explains how to use the Jupyter notebook. You can follow along in the code cells below.
# Imports
from sys_tool import SystemidentificationTool
import pandas as pd
import numpy as np
from datetime import datetime
To insert a dataset, navigate to your preferred location in Jupyter, optionally add a folder, and upload your data there.
Here, the example data was added at prognoseservice_example/data/example_data.csv.
# Import dataset
data_path = 'data/example_data.csv'  # Relative path from the notebook directory
data = pd.read_csv(data_path, index_col=0, parse_dates=True)  # index_col and parse_dates
# set the DatetimeIndex automatically. See the pandas documentation for more
# options, e.g. the column separator or decimal separator.
The data must be of type pandas.DataFrame with a DatetimeIndex, following the format of this example.
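A quick sanity check of the index can save debugging time before running the tool. The following sketch (with hypothetical values) verifies the index type and converts it if the CSV was read without parse_dates:

```python
import pandas as pd

# Hypothetical frame whose index was read as plain strings
df = pd.DataFrame(
    {"thermal_power": [4381.8, 8830.5]},
    index=["2020-10-30 00:00:00+01:00", "2020-10-30 00:15:00+01:00"],
)

# Convert a string index to a proper DatetimeIndex if necessary
if not isinstance(df.index, pd.DatetimeIndex):
    df.index = pd.to_datetime(df.index)
```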
# Visualize dataset
data.head()
| Time (with timezone) | Pu_Mgn3.425.ThHy_T_VL | Ort_AU.WS.Atmo_T_Luft | Wetterprognose_Temperatur | thermal_power |
|---|---|---|---|---|
| 2020-10-30 00:00:00+01:00 | 28.660971 | 10.311627 | 10.837501 | 4381.777003 |
| 2020-10-30 00:15:00+01:00 | 31.782938 | 10.433481 | 10.912500 | 8830.517407 |
| 2020-10-30 00:30:00+01:00 | 37.977928 | 10.480000 | 10.987500 | 16571.353580 |
| 2020-10-30 00:45:00+01:00 | 32.884285 | 10.506665 | 11.062500 | 7840.641652 |
| 2020-10-30 01:00:00+01:00 | 30.151451 | 10.471549 | 11.125000 | 4688.792202 |
data.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 11807 entries, 2020-10-30 00:00:00+01:00 to 2021-03-01 23:30:00+01:00
Data columns (total 4 columns):
 #   Column                     Non-Null Count  Dtype
---  ------                     --------------  -----
 0   Pu_Mgn3.425.ThHy_T_VL      11784 non-null  float64
 1   Ort_AU.WS.Atmo_T_Luft      11784 non-null  float64
 2   Wetterprognose_Temperatur  11807 non-null  float64
 3   thermal_power              11784 non-null  float64
dtypes: float64(4)
memory usage: 461.2 KB
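Note that the info() output above reveals gaps in the data: three of the four columns have only 11784 non-null entries out of 11807. This is why imputers (KNNImputer) appear in the pipelines further below. A minimal check on a toy frame:

```python
import numpy as np
import pandas as pd

# Toy frame with gaps, mirroring the non-null counts seen above
df = pd.DataFrame({"thermal_power": [1.0, np.nan, 3.0, np.nan, 5.0]})

missing = df.isna().sum()  # number of missing values per column
```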
Define parameters such as target_name, forecasting_horizon and search_space. In this example, only the most important parameters are set. For a complete list of available parameters, refer to the user documentation (--> Link).
# Import dataclasses for parametrization
from sys_tool.default_data_classes.general import General
from sys_tool.default_data_classes.eval_pipeline import EvalPipeline
from sys_tool.default_data_classes.feature_preprocessing import FeaturePreprocessing
from sys_tool.default_data_classes.target import Target
from sys_tool.default_data_classes.search_space import SearchSpace
from sys_tool.default_data_classes.post_processing import PostProcessing
# Most important parameters
# Target
target_name = 'thermal_power' # Define which column of data should be forecasted
forecasting_horizon = [1, 2, 3, 4] # Define which time steps should be forecasted (must be of type list)
# Hyperparameter optimization
evaluations = 100 # Define how many evaluations the hyperparameter optimizer should run, e.g. 100
test_set_size = 0.2 # Define the fraction of data used for evaluating the final model
cv_splits = 5 # Define how many cross-validation splits should be used
# Feature engineering
n_time_steps = 4 # Number of past time steps to include in the input data
moving_average_windows = [4] # Size of moving average windows to include (as list[int])
# Search space
max_n_features = 40 # Maximum number of features to be selected
pca_n_components = 0.9 # Parameter n_components of PCA
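Since the example data is sampled every 15 minutes, the entries of forecasting_horizon translate directly into lead times. A small sketch of that mapping:

```python
import pandas as pd

# The dataset is sampled every 15 minutes, so forecast step k
# corresponds to a lead time of k * 15 minutes.
forecasting_horizon = [1, 2, 3, 4]
step = pd.Timedelta(minutes=15)

lead_times = [k * step for k in forecasting_horizon]
# i.e. 15 min, 30 min, 45 min and 60 min ahead
```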
from eta_ml_lib.hyperparameter_optimization import BayesOptimizer
general_params = General(
result_path="results", # Where results should be saved
hyperparam_optimizer=BayesOptimizer(
max_evaluations=evaluations, # How many evaluations to run the hyperparameter optimization
),
)
from sklearn.model_selection import TimeSeriesSplit
eval_pipeline_params = EvalPipeline(
cvSplit=TimeSeriesSplit(n_splits=cv_splits), # Cross-validation splitter, must be sklearn type (e.g. TimeSeriesSplit, KFold, ...)
test_set_size=test_set_size, # Fraction of the dataset to use as the final test set (will not be used during hyperparameter tuning)
)
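TimeSeriesSplit differs from ordinary k-fold cross-validation in that each test fold lies strictly after its training fold, so the model is never evaluated on data that precedes its training window. A minimal illustration on twelve consecutive time steps:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)  # 12 consecutive time steps
tscv = TimeSeriesSplit(n_splits=3)

for train_idx, test_idx in tscv.split(X):
    # Every test fold starts after the training fold ends
    assert train_idx.max() < test_idx.min()
```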
from sys_tool import MyTargetPreprocessor
from eta_ml_lib.target_preprocessing import MyMultiTimeStepForecastingPreprocessor
from sklearn.impute import KNNImputer
target_params = Target(
target_preprocessing=MyTargetPreprocessor(
steps=[
(
"forecasting_preprocessor",
MyMultiTimeStepForecastingPreprocessor(
steps_to_predict=forecasting_horizon,
feature_target_names=[target_name],
# TODO: rename variable
),
),
("target_imputer", KNNImputer()),
]
),
)
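The internals of MyMultiTimeStepForecastingPreprocessor are not shown here, but the idea behind multi-time-step targets can be sketched in plain pandas (a simplified stand-in, not the actual eta_ml_lib implementation): one target column is created per forecast step by shifting the original target backwards. The shifted columns end in NaNs, which also motivates the target_imputer step above.

```python
import pandas as pd

series = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], name="thermal_power")
forecasting_horizon = [1, 2]

# One target column per forecast step: y_k(t) = y(t + k)
targets = pd.DataFrame(
    {f"thermal_power_t+{k}": series.shift(-k) for k in forecasting_horizon}
)
# The last rows contain NaNs, so an imputer must follow in the pipeline
```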
from sys_tool import MyFeaturePreprocessor
from eta_ml_lib.preprocessing import MyOutlierFilterHampel
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import PolynomialFeatures
from eta_ml_lib.feature_engineering import (
MyTimeStep,
MyMovingAverage,
MyDateTimeFeatures,
MyDifferential,
GausianMixtureModelClassification,
DummyFeatureEngineering
)
feature_preprocessing_params = FeaturePreprocessing(
transformer=MyFeaturePreprocessor(
steps=[ # Include any sklearn transformer as pipeline-step here, if no hyperparameters should be optimized for that step
("imputer", KNNImputer()), # syntax: ("step_name", Transformer(params)),
("outlier_filter", MyOutlierFilterHampel()),
(
"feature_engineering",
FeatureUnion(
transformer_list=[
("polynomial_degree2", PolynomialFeatures(degree=2)),
("time_delay", MyTimeStep(steps=np.arange(1, n_time_steps + 1, dtype=int), transform_features="all")),
("weather_forecast", MyTimeStep(steps=np.arange(-1, -5, -1, dtype=int), transform_features="Wetterprognose_Temperatur")),
("smoothing", MyMovingAverage(windows=moving_average_windows, transform_features="all")),
("date_time_features", MyDateTimeFeatures(transform_features="all")),
("differential", MyDifferential(differential_list=[1, 2], transform_features="all")),
# (
# "auto_classification",
# GausianMixtureModelClassification(
# time_diff=10, n_components=[2], transform_features="all"
# ),
# ),
("original_features", DummyFeatureEngineering()),
]
),
),
]
),
)
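The feature-engineering transformers above are library-specific, but their effect can be approximated in plain pandas. The following simplified sketch (an assumption about their behavior, not the actual eta_ml_lib implementation) mirrors what MyTimeStep, MyMovingAverage, MyDateTimeFeatures and MyDifferential presumably compute:

```python
import pandas as pd

df = pd.DataFrame(
    {"thermal_power": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    index=pd.date_range("2020-10-30", periods=6, freq="15min"),
)

n_time_steps = 2
feats = pd.DataFrame(index=df.index)
for k in range(1, n_time_steps + 1):
    feats[f"lag_{k}"] = df["thermal_power"].shift(k)    # time-delay features
feats["ma_4"] = df["thermal_power"].rolling(4).mean()   # moving average window
feats["hour"] = df.index.hour                           # date/time feature
feats["diff_1"] = df["thermal_power"].diff()            # first differential
```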
from sys_tool.basics import hypercheck # utility function to check if a transformer is given with the correct hyperparameters
from hyperopt import hp
from hyperopt.pyll.base import scope
# Preprocessing
from sklearn.preprocessing import StandardScaler
from eta_ml_lib.feature_selection import SelectFromModelOwn
from sklearn.ensemble import ExtraTreesRegressor, GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor, ExtraTreeRegressor
from sklearn.linear_model import LassoCV, ElasticNet, LinearRegression, Lasso, Ridge
from sklearn.decomposition import PCA
# Modeling
from eta_ml_lib.estimator import KerasRegressorOwn
from eta_ml_lib.estimator import DenseLayer, RecurrentLayer
from sklearn.ensemble import (
BaggingRegressor,
RandomForestRegressor
)
from sklearn.neighbors import KNeighborsRegressor
# Model definition
ann = [
# Feedforward neural network
hypercheck(
KerasRegressorOwn,
hyperparameters={
"epochs": 1000,
"loss": "mse",
"optimizer": "adam",
"batch_size": hp.quniform("ffn_batch_size", 50, 1000, 2), # hp.qloguniform("batch_size_ffn", np.log(64), np.log(1024), 2)
"num_hidden_layer": hp.qlognormal("ffn_layer_num", 0, 0.5, 1),
"net": hp.choice(
"ffn_architecture",
[
{i: DenseLayer(i) for i in range(5)}, # Adds 5 optional Dense layers
],
),
"output_layer": {
"kernel_initializer": "random_uniform",
"out_activation": "linear",
},
"callbacks": [
{
"callback_function": "EarlyStopping",
"hyperparameters": {
"monitor": "loss",
"patience": 5,
"mode": "min",
"verbose": 1,
"restore_best_weights": True,
},
}
],
},
returnKeys=("create_model_function", "hyperparameters"),
),
]
rnn = [
# Recurrent neural network
hypercheck(
KerasRegressorOwn,
hyperparameters={
"epochs": 1000,
"loss": "mse",
"optimizer": "adam",
"batch_size": hp.quniform("rnn_batch_size", 50, 1000, 2), # hp.qloguniform("rnn_batch_size", np.log(64), np.log(1024), 2)
"num_hidden_layer": hp.qlognormal("rnn_layer_num", 0, 0.5, 1),
"net": hp.choice(
"rnn_architecture",
[
{i: RecurrentLayer(i) for i in range(3)} # 3 optional LSTM layers
],
),
"output_layer": {
"kernel_initializer": "random_uniform",
"out_activation": "linear",
},
"max_shift": n_time_steps,
"callbacks": [
{
"callback_function": "EarlyStopping",
"hyperparameters": {
"monitor": "loss",
"patience": 5,
"mode": "min",
"verbose": 1,
"restore_best_weights": True,
},
}
],
},
returnKeys=("create_model_function", "hyperparameters"),
),
]
ensemble_trees = [
hypercheck(
BaggingRegressor,
hyperparameters={"n_estimators": scope.int(hp.quniform("n_estimators", 5, 40, 5))},
returnKeys=("create_model_function", "hyperparameters"),
),
hypercheck(
ExtraTreeRegressor,
hyperparameters={"max_depth": scope.int(hp.quniform("max_depth", 20, 100, 10))},
returnKeys=("create_model_function", "hyperparameters"),
),
hypercheck(
ExtraTreesRegressor,
hyperparameters={
"n_estimators": scope.int(hp.loguniform("n_estimators_ext", np.log(20), np.log(1000))),
"max_depth": scope.int(hp.quniform("max_depth_ext", 10, 100, 1)),
"n_jobs": -1,
},
returnKeys=("create_model_function", "hyperparameters"),
),
hypercheck(
RandomForestRegressor,
hyperparameters={
"n_estimators": scope.int(hp.loguniform("n_estimators_rf", np.log(20), np.log(1000))),
"max_depth": scope.int(hp.quniform("max_depth_rf", 10, 100, 1)),
"n_jobs": -1,
},
returnKeys=("create_model_function", "hyperparameters"),
),
]
decision_tree = [
hypercheck(
DecisionTreeRegressor,
hyperparameters={
"criterion": hp.choice("criterion", ["friedman_mse", "mse", "mae"]),
"splitter": hp.choice("splitter", ["best", "random"]),
},
returnKeys=("create_model_function", "hyperparameters"),
),
]
linear_models = [
hypercheck(
ElasticNet,
hyperparameters={
"alpha": hp.uniform("alpha", 0.5, 1),
"l1_ratio": hp.choice("l1_ratio", [0.3, 0.5, 0.8]),
},
returnKeys=("create_model_function", "hyperparameters"),
),
hypercheck(
LinearRegression, hyperparameters={}, returnKeys=("create_model_function", "hyperparameters")
),
hypercheck(
Lasso,
hyperparameters={"alpha": hp.choice("alpha1", [0.5, 1])},
returnKeys=("create_model_function", "hyperparameters"),
),
hypercheck(
Ridge,
hyperparameters={"alpha": hp.choice("alpha2", [0.5, 1])},
returnKeys=("create_model_function", "hyperparameters"),
),
]
k_neighbors = [
hypercheck(
KNeighborsRegressor,
hyperparameters={"n_neighbors": scope.int(hp.quniform("n_neighbors", 5, 10, 5))},
returnKeys=("create_model_function", "hyperparameters"),
),
]
search_space = SearchSpace(
##########################
# Preprocessing Pipeline #
##########################
dataset_processes={
### Scaling
"scaler": hypercheck(StandardScaler, {}),
### Feature Selection
"feature_selection": hp.choice( # Use hyperopt-objects to define steps/parameters that should be varied during optimization
"feature_selection",
[
hypercheck(
SelectFromModelOwn,
hyperparameters={
"estimator_type": hp.choice(
"feature_selection_estimatortype",
[
hypercheck(ExtraTreesRegressor, {}),
hypercheck(LassoCV, {}),
hypercheck(GradientBoostingRegressor, {}),
hypercheck(DecisionTreeRegressor, {}),
],
),
"k": hp.quniform("Feature_number", 5, max_n_features, 1),
"threshold": -np.inf,
},
),
# hypercheck(DummyProcess, {}),
hypercheck(PCA, {'n_components': pca_n_components}),
],
),
},
###################
# Model selection #
###################
multi_target_model = ann + ensemble_trees + linear_models,
single_target_model = [],
quantile_regression_model = [],
)
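Conceptually, the search space above describes a set of discrete choices (which feature selector, which model) and numeric ranges (e.g. n_estimators) from which each trial draws one configuration. The following greatly simplified sketch (hypothetical names, plain uniform sampling) illustrates what one drawn trial looks like; the Bayesian optimizer configured above draws trials adaptively instead of uniformly at random:

```python
import random

# Greatly simplified stand-in for the hyperopt search space above
search_space = {
    "feature_selection": ["SelectFromModel", "PCA"],  # discrete choice
    "model": ["RandomForest", "ElasticNet"],          # discrete choice
    "n_estimators": (20, 1000),                       # numeric range
}

random.seed(0)
trial = {
    "feature_selection": random.choice(search_space["feature_selection"]),
    "model": random.choice(search_space["model"]),
    "n_estimators": random.randint(*search_space["n_estimators"]),
}
```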
from sys_tool import MyPostprocessor
from eta_ml_lib.postprocessing import MyScorer, Plotter
from eta_ml_lib.postprocessing.my_metrics import root_mean_squared_error, nrmse_quantile_5_95, mean_error, std_error
from sklearn.metrics import mean_absolute_error, r2_score
post_processing_params = PostProcessing(
post_processor=MyPostprocessor(
eval_processes=[
MyScorer(
metrics={
"MAE": {
"function_name": mean_absolute_error,
"kwargs": {"multioutput": "raw_values"},
},
"RMSE": {
"function_name": root_mean_squared_error,
"kwargs": {},
},
"nRMSE": {
"function_name": nrmse_quantile_5_95,
"kwargs": {},
},
"R2-score": {
"function_name": r2_score,
"kwargs": {"multioutput": "raw_values"},
},
"mean_error": {
"function_name": mean_error,
"kwargs": {},
},
"std_error": {
"function_name": std_error,
"kwargs": {},
},
}
),
Plotter(),
]
),
)
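The metric functions imported from eta_ml_lib are not documented in this demo, so the following sketch shows one plausible reading of them (an assumption based on their names, not the library's actual code): RMSE, an nRMSE normalized by the 5%-95% quantile range of the measurements, and the mean and standard deviation of the forecast error.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

rmse = np.sqrt(mean_squared_error(y_true, y_pred))

# Assumed reading of nrmse_quantile_5_95: RMSE normalized by the
# 5%-95% quantile range of the measured target
q05, q95 = np.quantile(y_true, [0.05, 0.95])
nrmse = rmse / (q95 - q05)

mean_error = np.mean(y_pred - y_true)  # bias of the forecast
std_error = np.std(y_pred - y_true)    # spread of the forecast error
```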
my_tool = SystemidentificationTool(
general=general_params,
eval_pipeline=eval_pipeline_params,
target=target_params,
feature_preprocessing=feature_preprocessing_params,
search_space_class=search_space,
post_processing=post_processing_params,
)
my_tool.identification(data, pd.DataFrame())
The results are saved to the specified folder. Plots of the target and predictions as well as the optimization history can be inspected there. For information about the trials object, refer to the hyperopt documentation.