One-shot face video re-enactment aims to generate a target face video that preserves the identity of a single source frame while following the facial deformation of a driving video. High-quality generation requires precisely disentangling identity-related from identity-independent characteristics while building expressive features that retain high-frequency facial details, two challenges that remain unaddressed by existing approaches. To tackle them, we propose a two-stage generation model based on StyleGAN. Its key novel techniques are identity-based modeling, which better disentangles identity and deformation codes in the latent space, and the manipulation of intermediate StyleGAN features in the second stage, which augments the facial details of the generated targets. To further improve identity consistency, we introduce a data augmentation method during training that enhances key identity-affecting features such as hair and wrinkles. Extensive experimental results demonstrate the superiority of our approach over state-of-the-art methods.