Predicting Rhine water levels using machine learning

Time: 2024-10-10 07:11:33

Long Short-Term Memory (LSTM) models are a powerful type of neural network well suited to predicting time-dependent data. Rhine water levels fit right into this category: they vary over time, depending on a range of variables such as rain, temperature and snow cover in the Alps.

The Rhine is Europe’s lifeblood. For centuries it has been used as a major artery for shipping goods into Germany, France, Switzerland and Central Europe. However, with climate change, water levels on the river are likely to become more variable. Forecasting the river’s level accurately is therefore a primary concern for a whole range of actors, from shipping companies to commodity traders and industrial conglomerates.

A barge laden with coal navigating the Rhine (source: /wiki/File:Coal_barge_Chilandia_on_Rhine_-_looking_south.jpg)

Unlike classical regression-based models, LSTMs are able to capture non-linear relationships between different variables; more precisely, the sequence dependence among these variables. This blog focuses on the problem of Rhine river forecasting using LSTMs, rather than the theory behind these models.

Problem at hand

The problem we are looking to solve here is the following: we would like to forecast next-day water levels at Kaub, a key chokepoint in western Germany, with the highest possible accuracy.
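
To make the forecasting setup concrete, here is a minimal sketch of how a table of daily rows could be turned into (past window, next-day target) pairs, which is the kind of input a sequence model such as an LSTM typically consumes. The function name `make_windows`, the `lookback` parameter, the column layout and the toy values are all assumptions for illustration; they are not taken from the original dataset or code.

```python
from typing import List, Sequence, Tuple


def make_windows(
    rows: Sequence[Sequence[float]],
    target_col: int,
    lookback: int,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Turn a daily table into (window, next-day target) pairs.

    rows:       one row per day, oldest first; each row holds that day's features
    target_col: index of the column to forecast (hypothetical layout)
    lookback:   number of past days fed to the model for each forecast
    """
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    windows: List[List[List[float]]] = []
    targets: List[float] = []
    # The window covering days i .. i+lookback-1 is paired with day i+lookback.
    for i in range(len(rows) - lookback):
        windows.append([list(r) for r in rows[i : i + lookback]])
        targets.append(rows[i + lookback][target_col])
    return windows, targets


# Toy data: 6 days, 2 features per day (values are made up for illustration).
toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0], [6.0, 60.0]]
X, y = make_windows(toy, target_col=0, lookback=3)
print(len(X), len(X[0]), len(X[0][0]))  # samples, timesteps, features
print(y)  # next-day values of column 0 for each window
```

With a real table, the nested list would usually be converted to an array of shape (samples, timesteps, features) before being passed to a model. The value of `lookback` is a tuning decision rather than something fixed by the problem statement.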

We have historical daily data from 2 January 2000 to 27 July 2020, equivalent to 7513 observations. The dataset includes 15 different categories, displayed as columns:

  • ‘date’: the date of the observation
  • ‘Kaub’: the day-on-day difference in Kaub water level, in centimetres; this is the ‘y’ value we are trying to forecast (source: WSV)
  • ‘Rheinfelden’: the absolute value of the water flow at Rheinfelden, in Switzerland, in cubic metres per second (source: BAFU)
  • ‘Domat’: the absolute value of the water flow at Domat, near the source of the Rhine, in cubic metres per second (source: BAFU)
  • ‘precip_middle’: the average daily amount of rain recorded at 20 weather stations along the Rhine, in millimetres (source: DWD)
  • ‘avgtemp_middle’: the average temperature recorded at the same stations, in degrees Celsius
  • ‘maxtemp_middle’: the maximum temperature recorded at the same stations
  • ‘mintemp_middle’: the minimum temperature recorded at the same stations
  • ‘precip_main’: the average daily amount of rain recorded at 8 weather stations along the Main, a major tributary of the Rhine, in millimetres (source: DWD)
  • ‘avgtemp_main’: the average temperature recorded at the same stations, in degrees Celsius
  • ‘maxtemp_main’: the maximum temperature recorded at the same stations
  • ‘mintemp_main’: the minimum temperature recorded at the same stations
  • ‘precip_neckar’: the average daily amount of rain recorded at 7 weather stations along the Neckar, also a major tributary of the Rhine, in millimetres (source: DWD)