Abstract: The BERT (Bidirectional Encoder Representations from Transformers) model, a pre-trained language model based on the Transformer architecture, can capture rich contextual information and provide ...