Abstract: Building energy consumption forecasting is crucial for optimizing energy management, reducing operational costs, and achieving carbon neutrality goals. To improve prediction accuracy and the reliability of results, this study proposes a Multi-Scale Interpretable Temporal Prediction Network Model (ITSFN) based on the collaborative optimization of Long Short-Term Memory (LSTM) networks and Kolmogorov-Arnold Networks (KAN). The model integrates temporal-environmental feature decoupling with a dynamic attention mechanism, explicitly decomposing the time-series data into seasonal, trend, and residual components to construct a structured feature space. It then models multi-scale features with a parallel architecture of Gated Recurrent Units (GRU) and multi-head attention. The model was tested on an energy consumption dataset from a university teaching building in a hot-summer/cold-winter region. The results show that ITSFN reduces the RMSE of total energy consumption prediction by 13.9% compared with LSTM and reduces the RMSE of sub-item energy consumption prediction by 31.1% compared with Transformer. In addition, through feature decoupling ITSFN raises the noise suppression coefficient to 0.89, achieves a local attention angle of 0.92 in abrupt-change regions, and reduces over-smoothing by 29.6% compared with traditional methods. By quantifying feature contributions, the model also reveals how the component weights evolve, validating its effectiveness and practicality.
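As a rough illustration of the two quantities the abstract relies on, the sketch below splits a series into trend, seasonal, and residual components and computes RMSE. The abstract does not say which decomposition algorithm ITSFN uses, so this assumes a classical additive moving-average decomposition purely for illustration. The function names `decompose_additive` and `rmse`, the 24-step period, and the synthetic series are all hypothetical and not taken from the study.

```python
import numpy as np


def decompose_additive(series, period):
    """Classical additive decomposition: series = trend + seasonal + residual.

    Illustrative assumption only; not the decomposition used by ITSFN.
    """
    y = np.asarray(series, dtype=float)
    if period < 2 or y.size < 2 * period:
        raise ValueError("need period >= 2 and at least two full periods of data")

    # Centered moving-average kernel spanning one period.
    # For an even period the two end weights are halved so the window stays centered.
    if period % 2 == 0:
        kernel = np.ones(period + 1)
        kernel[0] = kernel[-1] = 0.5
    else:
        kernel = np.ones(period)
    kernel /= period

    # Pad by repeating edge values so the trend has the same length as the input.
    half = kernel.size // 2
    padded = np.pad(y, half, mode="edge")
    trend = np.convolve(padded, kernel, mode="valid")

    # Seasonal component: mean of the detrended series at each phase of the cycle,
    # shifted so that it sums to zero over one period.
    detrended = y - trend
    phase_means = np.array([detrended[k::period].mean() for k in range(period)])
    phase_means -= phase_means.mean()
    seasonal = np.tile(phase_means, y.size // period + 1)[: y.size]

    # Whatever is left after removing trend and seasonality.
    residual = y - trend - seasonal
    return trend, seasonal, residual


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    a = np.asarray(y_true, dtype=float)
    b = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))


if __name__ == "__main__":
    # Synthetic hourly load with a daily cycle (hypothetical data, not the study's).
    rng = np.random.default_rng(0)
    t = np.arange(24 * 14)
    load = 50 + 0.05 * t + 10 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 1, t.size)
    trend, seasonal, residual = decompose_additive(load, period=24)
    print("residual std:", residual.std())
```

In a pipeline of the kind the abstract describes, the three components would presumably be passed to the network as separate input features rather than as the raw series, but that wiring is not specified here and is left out of the sketch.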