With the advent of deep learning techniques and large-scale datasets, recent years have witnessed rapid progress in monocular human mesh recovery. Despite their impressive performance on public benchmarks, existing methods are vulnerable to unusual poses, which hinders their practical deployment in scenarios such as dance and martial arts. This vulnerability is mainly attributed to a domain gap caused by data scarcity: most public datasets are captured under constrained settings and lack samples of such complex movements.
To mitigate data scarcity, we propose a pipeline for automatic data crawling, precise annotation, and hard-case mining. Based on this pipeline, we establish a large dataset in a short time. The dataset, named HardMo, contains 7M images with precise annotations, covering 15 categories of dance and 14 categories of martial arts. We observe that failures in these two scenarios are mainly characterized by incorrect hand-wrist and foot-ankle poses. To investigate these two hard cases further, we use the proposed automatic pipeline to filter the collected data and establish two subsets named HardMo-Hand and HardMo-Foot.
Extensive experiments demonstrate the efficacy of the annotation pipeline and the collected dataset. Specifically, after being trained on HardMo, HMR, an early pioneering method, can even outperform the current state-of-the-art method, 4DHumans, on our benchmarks.