Abstract: Although the basic requirement that catering staff wear masks is widely acknowledged, insufficient attention has been paid to whether masks are worn correctly. To address this, a detection method for mask-wearing compliance is proposed for complex kitchen working environments involving oil fumes, water vapor, and flames. First, a targeted dataset named CKEMFD-12k was collected and constructed. Second, a multi-task convolutional neural network (MMWN) was built to extract the key element information needed to assess mask-wearing status. By using the self-designed multi-scale hybrid spatial pyramid pooling module (MHSPP) and tube-embedded transformer attention mechanism (TETAM), the method achieved, compared with existing networks, the highest average object detection accuracy (94.68%), the lowest mean error of mouth-nose key points (4.62%), and the best pixel accuracy for mask region segmentation (94.32%). Finally, an algorithm was designed to calculate the area of the mouth-nose key triangle and analyze how much of it is covered by the mask region, which provides a method for judging three mask-wearing states: normative wearing, improper wearing, and no wearing. Experiments show that the overall judgment accuracy is 93.57%, surpassing existing mainstream algorithms.
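The final judgment step can be illustrated with a minimal sketch. The abstract does not specify the key points, the coverage measure, or any thresholds, so everything below is an assumption made for illustration: the triangle is taken to be formed by the nose tip and the two mouth corners, coverage is estimated by sampling pixel centres inside the triangle against a binary segmentation mask, and the threshold `full_thresh` and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def triangle_area(a: Point, b: Point, c: Point) -> float:
    """Area of triangle abc via the shoelace formula."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def _inside(p: Point, a: Point, b: Point, c: Point) -> bool:
    """True if p lies inside or on the boundary of triangle abc."""
    def cross(o: Point, u: Point, v: Point) -> float:
        return (u[0] - o[0]) * (v[1] - o[1]) - (u[1] - o[1]) * (v[0] - o[0])

    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def coverage_ratio(mask: Sequence[Sequence[int]], a: Point, b: Point, c: Point) -> float:
    """Fraction of triangle pixels (sampled at pixel centres) lying in the mask region."""
    h, w = len(mask), len(mask[0])
    x0 = max(0, int(min(a[0], b[0], c[0])))
    x1 = min(w - 1, int(max(a[0], b[0], c[0])))
    y0 = max(0, int(min(a[1], b[1], c[1])))
    y1 = min(h - 1, int(max(a[1], b[1], c[1])))
    total = covered = 0
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if _inside((x + 0.5, y + 0.5), a, b, c):
                total += 1
                if mask[y][x]:
                    covered += 1
    return covered / total if total else 0.0


def wearing_status(mask: Sequence[Sequence[int]], nose: Point, mouth_l: Point,
                   mouth_r: Point, full_thresh: float = 0.9) -> str:
    """Three-way judgment; full_thresh is a hypothetical value, not from the paper."""
    if not any(any(row) for row in mask):
        return "no wearing"  # no mask pixels were segmented at all
    if coverage_ratio(mask, nose, mouth_l, mouth_r) >= full_thresh:
        return "normative wearing"  # triangle is (almost) fully covered
    return "improper wearing"  # mask present but mouth-nose area exposed


# Toy 10x10 example: mask pulled down so that only rows 6 and below are covered.
nose, mouth_l, mouth_r = (5.0, 2.0), (2.0, 8.0), (8.0, 8.0)
low_mask: List[List[int]] = [[1 if y >= 6 else 0 for _ in range(10)] for y in range(10)]
print(triangle_area(nose, mouth_l, mouth_r), wearing_status(low_mask, nose, mouth_l, mouth_r))
```

In this sketch a mask that exists but leaves part of the triangle exposed (for example, worn below the nose) falls into the improper-wearing class, while an empty segmentation result is treated as no wearing. The actual decision rule and thresholds used by the proposed method may differ.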