Abstract: Accurate forecasting of building energy consumption is crucial for optimizing energy management, reducing operational costs, and achieving carbon neutrality goals. This study proposes a multi-scale interpretable temporal prediction network (ITSFN) that improves prediction accuracy and reliability through the collaborative optimization of long short-term memory (LSTM) networks and Kolmogorov-Arnold networks (KAN). The model combines temporal-environmental feature decoupling with a dynamic attention mechanism: it explicitly decomposes the time series into seasonal, trend, and residual components to construct a structured feature space, and then models multi-scale features with a parallel architecture of gated recurrent units (GRU) and multi-head attention. On an energy consumption dataset from a university building in a hot-summer/cold-winter region, ITSFN outperforms the baseline models: it reduces the root mean square error (RMSE) of total energy consumption prediction by 13.9% relative to LSTM and the RMSE of sub-item energy consumption prediction by 31.1% relative to Transformer. Through feature decoupling, ITSFN also raises the noise suppression coefficient to 0.89, achieves a local attention angle of 0.92 in abrupt-change (mutation) regions, and reduces over-smoothing by 29.6% compared to traditional methods. By quantifying feature contributions, the model reveals how the component weights evolve over time, which further supports its effectiveness and practical applicability.
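
To make the decomposition step concrete, the following minimal Python sketch splits a series into trend, seasonal, and residual parts with a classical additive moving-average scheme. It is an illustration under stated assumptions, not the method used by ITSFN: the abstract does not say which decomposition the model applies, and the function name `decompose_additive`, the `period` parameter, and the synthetic hourly series are all hypothetical.

```python
import numpy as np


def decompose_additive(y, period):
    """Classical additive decomposition: y = trend + seasonal + residual.

    Assumed scheme for illustration only; not taken from the paper.
    """
    y = np.asarray(y, dtype=float)
    n = y.size

    # Centered moving average spanning one full period estimates the trend.
    # For an even period, the two end points get half weight so the
    # window stays centered on the current sample.
    if period % 2 == 0:
        kernel = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        kernel = np.ones(period) / period
    half = len(kernel) // 2

    # Pad with edge values so the trend has the same length as the input.
    padded = np.pad(y, half, mode="edge")
    trend = np.convolve(padded, kernel, mode="valid")

    # Average the detrended series at each phase of the cycle, then
    # center the pattern so the seasonal component has zero mean.
    detrended = y - trend
    pattern = np.array([detrended[k::period].mean() for k in range(period)])
    pattern -= pattern.mean()
    seasonal = np.tile(pattern, n // period + 1)[:n]

    # Whatever is left after removing trend and seasonality.
    residual = y - trend - seasonal
    return trend, seasonal, residual


# Synthetic hourly load with a daily cycle (made-up data, not the paper's).
t = np.arange(24 * 10)
load = 0.05 * t + 2.0 * np.sin(2 * np.pi * t / 24)
trend, seasonal, residual = decompose_additive(load, period=24)
```

The three returned arrays have the same length as the input and sum back to it, so they could be stacked as separate input channels for a downstream recurrent or attention model.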