拆解XLNet模型设计，回顾语言表征学习的思想演进(22)_机器之心发布作者：追一科技AILab研究

参考文献

[1] Firth, J. R. (1957). Papers in linguistics 1934–1951. London: Oxford University Press.

[2] Levy O, Goldberg Y. Neural word embedding as implicit matrix factorization[C]//Advances in neural information processing systems. 2014: 2177-2185.

[3] Mahajan D, Girshick R, Ramanathan V, et al. Exploring the limits of weakly supervised pretraining[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 181-196.

[4] Hestness J, Narang S, Ardalani N, et al. Deep learning scaling is predictable, empirically[J]. arXiv preprint arXiv:1712.00409, 2017.

[5] Louwerse M M. Knowing the meaning of a word by the linguistic and perceptual company it keeps[J]. Topics in cognitive science, 2018, 10(3): 573-589.

[6] Gordon J, Van Durme B. Reporting bias and knowledge acquisition[C]//Proceedings of the 2013 workshop on Automated knowledge base construction. ACM, 2013: 25-30.