Investigating the effect of sub-word segmentation on the performance of transformer language models. (arXiv:2305.05480v2 [cs.CL] UPDATED)

By: <a href="http://arxiv.org/find/cs/1/au:+Hou_J/0/1/0/all/0/1">Jue Hou</a>, <a href="http://arxiv.org/find/cs/1/au:+Katinskaia_A/0/1/0/all/0/1">Anisia Katinskaia</a>, <a href="http://arxiv.org/find/cs/1/au:+Vu_A/0/1/0/all/0/1">Anh-Duc Vu</a>, <a href="http://arxiv.org/find/cs/1/au:+Yangarber_R/0/1/0/all/0/1">Roman Yangarber</a> Posted: June 23, 2023

We explore how morpheme-based sub-word segmentation affects the performance of
transformer language models. We trained GPT-2 and BERT models for both Finnish
and Russian using StateMorph, a morpheme segmentation algorithm. For comparison,
we also trained models with BPE and Morfessor segmentation. Our preliminary
results show that StateMorph helps the models converge more efficiently and
achieve better validation scores.
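As an illustration (not from the paper itself), the BPE baseline mentioned in the abstract learns merge rules purely from adjacent-pair frequencies, so the sub-words it produces need not align with morpheme boundaries the way a morpheme segmenter's output does. A minimal toy sketch, using a made-up Finnish-like corpus:

```python
from collections import Counter

def learn_bpe(corpus, num_merges):
    """Learn BPE merges from a {word-as-tuple-of-symbols: frequency} dict.
    Each round joins the most frequent adjacent symbol pair."""
    vocab = Counter(corpus)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

def segment(word, merges):
    """Segment a word by replaying the learned merges in order."""
    symbols = list(word)
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols

# Hypothetical mini-corpus: "talo|ssa" (house + inessive case) has a
# morpheme boundary after "talo", but frequency-driven merges ignore it.
corpus = {tuple("talossa"): 5, tuple("talosta"): 4, tuple("autossa"): 3}
merges = learn_bpe(corpus, 4)
print(segment("talossa", merges))  # e.g. ['talos', 's', 'a']
```

The resulting split crosses the `talo|ssa` morpheme boundary, which is the kind of mismatch a morpheme-aware segmenter such as StateMorph is designed to avoid.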

