PHONOLOGICAL RULES IN URDU COMPOUND WORDS

Mahwish Farooq; Umm-e-Rooman Yaqoob; Asim Mahmood; Rabindra Dev Prasad; Nurien Hidayu Muhamad Rusly; Niaz Ali; Rachel Sing Ee Tan

doi:10.18623/rvd.v23.5190

Authors

Mahwish Farooq https://orcid.org/0000-0002-6333-138X
Umm-e-Rooman Yaqoob
Asim Mahmood
Rabindra Dev Prasad
Nurien Hidayu Muhamad Rusly
Niaz Ali
Rachel Sing Ee Tan

Keywords:

Generative Phonology, Diacritic, Auditory Analysis, Educational Reform and Developing Countries (SDG4), Educational Gap (SDG4)

Abstract

Phonetics and phonology are branches of linguistics that study speech sounds and their patterns in language. Urdu is a major language spoken mainly in South Asia, with over 100 million speakers worldwide. Compound words, in which two or more words combine to form a new word with its own meaning, play an essential role in Urdu. This study explores the phonological rules that govern the formation of compound words in Urdu. It also discusses how diacritics affect pronunciation and provides evidence-based examples from different languages. Certain generative rules are justified through auditory analysis of data collected from the speech of native Urdu speakers. Diacritics are identified as the major reason for multiple pronunciations among native Urdu speakers. This qualitative study deals with the variation that arises within a single word or within a compound word when a phonological process applies during speech production. It considers the speakers' education, age and gender to determine where variation occurs in their speech patterns. The generative phonological rules are identified by closely observing the variations in the speakers' speech.
