WEB-BASED THESIS TITLE SIMILARITY DETECTION SYSTEM USING NLP AND WORD EMBEDDINGS
Abstract
The manual thesis title submission process is prone to duplicating, or closely resembling, previous research. This problem hinders innovation and originality in student research, and it also creates an administrative burden for supervisors and study program administrators. Manually checking archives of past titles is highly inefficient, especially when no well-documented digital database exists, and it often leads to the approval of titles that have in fact already been used. To address this, the researchers developed a web-based thesis title similarity detection system using Natural Language Processing (NLP) and Word Embeddings. The system measures the semantic similarity between titles using Cosine Similarity. The dataset consists of 500 thesis titles from the Informatics Engineering Study Program, collected over the past five years. Test results show that the system detects title similarity with an accuracy of up to 85%. The system is expected to help academics assess the feasibility of thesis titles objectively, efficiently, and in a standardized manner.
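The abstract does not state how title-level vectors are built from word embeddings or how the similarity cut-off is chosen. The sketch below shows one common approach, offered only as an illustration under stated assumptions: each title is lowercased and tokenized, stopwords are dropped, the remaining word vectors are averaged into one title vector, and each archived title is scored against a new submission with cosine similarity, cos(a, b) = (a · b) / (‖a‖ ‖b‖). The toy three-dimensional embeddings, the stopword list, the English example titles, the 0.8 threshold, and the function names (`preprocess`, `title_vector`, `cosine_similarity`, `rank_similar`) are all hypothetical; a real deployment would load pretrained vectors (for example Word2Vec or fastText) and an Indonesian stopword list.

```python
import math
import re

# HYPOTHETICAL toy embeddings (3 dimensions) for illustration only.
# A real system would load vectors trained on a large corpus.
EMBEDDINGS = {
    "detection": [0.9, 0.1, 0.0],
    "detecting": [0.85, 0.15, 0.05],
    "similarity": [0.8, 0.3, 0.1],
    "resemblance": [0.75, 0.35, 0.1],
    "thesis": [0.1, 0.9, 0.2],
    "title": [0.2, 0.8, 0.1],
    "titles": [0.2, 0.8, 0.15],
    "inventory": [0.0, 0.1, 0.9],
    "warehouse": [0.05, 0.0, 0.95],
}

# HYPOTHETICAL stopword list used during preprocessing.
STOPWORDS = {"a", "an", "the", "of", "for", "using", "based", "in", "on", "system"}


def preprocess(title):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return [t for t in tokens if t not in STOPWORDS]


def title_vector(title, embeddings=EMBEDDINGS):
    """Average the vectors of known words; None if no word is known."""
    vectors = [embeddings[t] for t in preprocess(title) if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(candidate, archive, threshold=0.8):
    """Return (archived_title, score) pairs at or above the threshold, best first."""
    cand_vec = title_vector(candidate)
    if cand_vec is None:
        return []
    results = []
    for old in archive:
        old_vec = title_vector(old)
        if old_vec is None:
            continue  # skip titles with no known words
        score = cosine_similarity(cand_vec, old_vec)
        if score >= threshold:
            results.append((old, round(score, 3)))
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    archive = ["Detection of Thesis Title Similarity", "Warehouse Inventory System"]
    for title, score in rank_similar("Detecting Resemblance in Thesis Titles", archive):
        print(f"{score:.3f}  {title}")
```

Here the submission "Detecting Resemblance in Thesis Titles" shares no exact words with the first archived title, yet it is flagged because the hypothetical embeddings place "detecting"/"detection" and "resemblance"/"similarity" close together; the unrelated warehouse title falls below the threshold. Averaging word vectors ignores word order, which is often tolerable for short titles; document-level models such as Doc2Vec capture more context at a higher cost.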
DOI: https://doi.org/10.59818/jpi.v5i5.1964



