Comparison of Normalization of Indonesian Slang Words Using the FastText & Word2vec Model with the Natural Language Processing Approach
DOI: https://doi.org/10.11594/nstp.2025.4805
Keywords: Communication, FastText, Natural Language Processing, slang words, Twitter, Word2Vec
Abstract
Slang words are widely used for communication on social media such as Twitter, but they are a problem for some groups of readers because they are difficult to understand out of context. This can make communication less effective, especially for people who are not familiar with the slang. A word normalization approach is therefore needed to translate slang words into formal language so that they are more widely understood. Natural Language Processing (NLP) is a computational technique that analyzes and represents text or spoken language to achieve human-like language processing. This research focuses on two feature extraction techniques, FastText and Word2Vec, which map words to numerical vectors. In tests on slang words, FastText produced a highest similarity of 0.9934859978 and a lowest of 0.8928895496, while Word2Vec produced a highest similarity of 0.9977979123 and a lowest of 0.0975351095. FastText required 0.432 seconds for training and 0.016 seconds for normalization, while Word2Vec required 0.027 seconds for training and 0.006 seconds for normalization.
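The abstract compares words through the similarity of their vectors. The sketch below is not the authors' implementation. It illustrates the mechanism under stated assumptions. Cosine similarity is assumed as the similarity measure, because the abstract does not name the metric. Normalization is assumed to be a nearest-neighbour lookup against a list of formal words. Word vectors are built FastText-style, by averaging character n-gram vectors (n = 3 to 6, with `<` and `>` marking word boundaries). This is why FastText can give a vector to a slang spelling it never saw during training, whereas Word2Vec learns one vector per whole word and has none for an unseen spelling. The n-gram vectors come from the hypothetical helper `ngram_vector` and are untrained pseudo-random stand-ins, not learned embeddings. The scores it produces therefore illustrate the idea only and are unrelated to the figures reported in the abstract.

```python
import hashlib
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 1.0 means the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction, so report no similarity
    return dot / (norm_u * norm_v)


def char_ngrams(word, n_min=3, n_max=6):
    """FastText-style subwords: character n-grams of the word wrapped in '<' and '>'."""
    token = f"<{word}>"
    return [
        token[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(token) - n + 1)
    ]


def ngram_vector(ngram, dim):
    """Hypothetical stand-in for a learned n-gram embedding.

    A trained FastText model learns these vectors. Here each n-gram is seeded from
    its MD5 hash, so the same n-gram always maps to the same pseudo-random vector.
    """
    seed = int(hashlib.md5(ngram.encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def subword_vector(word, dim=300):
    """Average the word's n-gram vectors, so even unseen slang spellings get a vector."""
    grams = char_ngrams(word)
    total = [0.0] * dim
    if not grams:  # words too short to yield any n-gram
        return total
    for gram in grams:
        for k, x in enumerate(ngram_vector(gram, dim)):
            total[k] += x
    return [x / len(grams) for x in total]


def normalize(slang_word, formal_vocab, dim=300):
    """Return (formal_word, score) for the formal word most similar to the slang word (assumed method)."""
    slang_vec = subword_vector(slang_word, dim)
    scored = [
        (formal, cosine_similarity(slang_vec, subword_vector(formal, dim)))
        for formal in formal_vocab
    ]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    best, score = normalize("gimana", ["bagaimana", "kucing", "rumah"])
    print(best, round(score, 3))
```

With these untrained vectors, `normalize("gimana", ["bagaimana", "kucing", "rumah"])` should select `"bagaimana"`. It is the only candidate that shares character n-grams with the slang form, such as `iman` and `mana>`. Shared n-grams contribute identical vectors to both averages, which pushes their cosine similarity above that of unrelated words. In a trained model, the learned n-gram vectors would also carry contextual meaning.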
License
Copyright (c) 2025 Rifqah Nur Surayya M. Jen, Syarifuddin N. Kapita, Muhammad Fhadli

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with these proceedings agree to the following terms:
Authors retain copyright and grant the Nusantara Science and Technology Proceedings right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in these proceedings.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the proceedings' published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in these proceedings.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (see The Effect of Open Access).