A WEB-BASED THESIS TITLE SIMILARITY DETECTION SYSTEM USING NLP AND WORD EMBEDDINGS

Salman Salman - Universitas Dipa Makassar

Abstract


Manual submission of thesis titles is prone to duplication of, and close similarity to, earlier research. This problem not only hinders innovation and originality in student research but also creates an administrative burden for supervisors and study program administrators. Checking archives of old titles by hand is highly inefficient, especially when no well-documented digital database is available, and it often results in the approval of titles that have already been worked on. To address this, the researchers developed a web-based thesis title similarity detection system using a Natural Language Processing (NLP) and Word Embeddings approach. The system measures the semantic similarity between titles using cosine similarity. The dataset consists of 500 thesis titles from the Informatics Engineering Study Program, collected over the past five years. Testing shows that the system detects title similarity with an accuracy of up to 85%. The system is expected to help academic staff assess the feasibility of thesis titles objectively, efficiently, and in a standardized manner.
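The abstract does not say which embedding model is used or how word vectors are combined into a title vector. As a minimal sketch of the general idea only, the example below assumes that each title is represented by the average of its word vectors and that a new title is ranked against an archive by cosine similarity. The three-dimensional vectors in `TOY_EMBEDDINGS` and the function names are hypothetical and invented for illustration; a real system would load vectors from a trained embedding model.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# A real system would load vectors from a trained embedding model.
TOY_EMBEDDINGS = {
    "web": [0.9, 0.1, 0.0],
    "system": [0.7, 0.3, 0.1],
    "inventory": [0.9, 0.0, 0.3],
    "detection": [0.2, 0.8, 0.1],
    "similarity": [0.1, 0.9, 0.2],
    "thesis": [0.3, 0.2, 0.9],
    "title": [0.2, 0.3, 0.8],
}


def title_vector(title, embeddings):
    """Average the vectors of all known words in a title (an assumed pooling choice)."""
    tokens = title.lower().split()
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # no known words, so no vector can be built
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar_titles(new_title, archive, embeddings):
    """Return (archived title, score) pairs sorted from most to least similar."""
    query = title_vector(new_title, embeddings)
    if query is None:
        return []
    scored = []
    for old_title in archive:
        vec = title_vector(old_title, embeddings)
        if vec is not None:
            scored.append((old_title, cosine_similarity(query, vec)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    archive = ["thesis title detection system", "web inventory system"]
    for old_title, score in rank_similar_titles(
        "thesis title similarity detection", archive, TOY_EMBEDDINGS
    ):
        print(f"{score:.3f}  {old_title}")
```

In a sketch like this, a submitted title would be flagged for review when its top score exceeds a chosen threshold. The threshold value is a design decision that the abstract does not report.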



Keywords


similarity detection, thesis title, NLP, word embeddings, cosine similarity






DOI: https://doi.org/10.59818/jpi.v5i5.1964