ARTIFICIAL INTELLIGENCE IN LINGUISTICS: MODELING UNIVERSAL PHONOLOGICAL SYSTEMS FOR SUSTAINABLE COMMUNICATION
DOI: https://doi.org/10.18623/rvd.v23.5126
Keywords: Linguistic Typology, Interlinguistics, Universal Phonology, Orthography, Multilingual Fairness, NLP
Abstract
Objective: This study examines how artificial intelligence (AI) can be combined with empirical linguistic data to develop models of universal phonological and orthographic systems. The broader aim is to contribute to more sustainable and inclusive tools for cross-linguistic communication. The work addresses a central challenge in contemporary linguistics: the lack of reproducible AI-driven methods that link computational modeling with theoretical analysis and that ensure fair and accessible use of digital language technologies.
Method: A mixed-method framework was adopted that integrates corpus-driven linguistic analysis with neural-network modeling. The empirical data were drawn from two open-access resources: PHOIBLE (Phonetics Information Base and Lexicon) and the r12a database (r12a.github.io). After standardization and tokenization, the datasets were processed with Python-based AI modules to extract frequency distributions, identify clusters and detect structural patterns. The analytical workflow followed a clear, reproducible sequence of steps informed by PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) principles, ensuring transparency and methodological rigor.
Originality/Relevance: The paper brings together corpus linguistics, interlinguistics and artificial intelligence to propose a data-driven approach for identifying shared phonological and orthographic patterns across languages. By combining extensive linguistic datasets with computational techniques, the study demonstrates the potential of AI to support the creation of sustainable knowledge infrastructures and to promote more inclusive forms of digital communication, domains that are becoming central to innovation and strategic growth in the humanities.
Main conclusions: The analysis revealed a relatively small set of phonemes and phoneme-grapheme correspondences that recur across a wide range of the world’s languages. These results offer empirical support for developing streamlined, accessible alphabetic systems and for designing models of universal auxiliary languages. The study further shows that AI-supported modeling can improve linguistic inclusivity and analytical precision, especially in low-resource and multilingual settings, while still relying on the interpretive judgment of human specialists.
Theoretical/methodological contributions: The research contributes to interlinguistics by bringing together the concept of language universals and contemporary AI techniques. It outlines a reproducible pathway for connecting empirical linguistic data with computational tools and theoretical interpretation. In doing so, the study supports the sustainable development of language technologies and enriches our understanding of how human expertise and artificial intelligence can work together to strengthen global communication.
Practical implications: Identifying a universal phoneme core and stable sound-script correspondences can streamline multilingual analytical workflows, lessen structural biases toward non-Latin scripts and lower the overall costs of integrating low-resource languages into sustainable and reproducible knowledge systems.
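The frequency step summarized under Method, together with the "universal phoneme core" named under Practical implications, can be illustrated with a minimal Python sketch. It assumes a long-format table with one row per (inventory, phoneme) pair, loosely shaped like PHOIBLE's downloadable data. The column names `InventoryID` and `Phoneme`, the toy rows, the function names and the 0.6 cut-off are hypothetical and do not reproduce the authors' modules, data or results. Segment strings are normalized to Unicode NFD (one plausible reading of "standardization"), the share of inventories containing each phoneme is computed, and phonemes at or above the cut-off are kept as a candidate core.

```python
import csv
import io
import unicodedata
from collections import Counter

# Hypothetical toy rows shaped like a long-format inventory table
# (one row per inventory/phoneme pair). These are NOT real PHOIBLE data,
# and the column names InventoryID and Phoneme are assumptions.
TOY_CSV = """InventoryID,Phoneme
1,p
1,t
1,k
1,m
1,a
2,t
2,k
2,m
2,i
2,a
3,p
3,t
3,m
3,a
3,u
"""


def load_inventories(text: str) -> dict[str, set[str]]:
    """Group phonemes by inventory, NFD-normalizing each segment string."""
    inventories: dict[str, set[str]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        segment = unicodedata.normalize("NFD", row["Phoneme"].strip())
        inventories.setdefault(row["InventoryID"], set()).add(segment)
    return inventories


def phoneme_frequencies(inventories: dict[str, set[str]]) -> dict[str, float]:
    """Return the share of inventories in which each phoneme occurs."""
    counts = Counter(seg for segs in inventories.values() for seg in segs)
    total = len(inventories)
    return {seg: count / total for seg, count in counts.items()}


def phoneme_core(freqs: dict[str, float], threshold: float) -> list[str]:
    """Keep phonemes found in at least `threshold` of inventories,
    ordered from most to least frequent (ties broken alphabetically)."""
    core = [seg for seg, share in freqs.items() if share >= threshold]
    return sorted(core, key=lambda seg: (-freqs[seg], seg))


if __name__ == "__main__":
    inventories = load_inventories(TOY_CSV)
    freqs = phoneme_frequencies(inventories)
    print(phoneme_core(freqs, threshold=0.6))  # prints ['a', 'm', 't', 'k', 'p']
```

With the toy rows, a cut-off of 0.6 keeps /a/, /m/ and /t/ (present in all three inventories) followed by /k/ and /p/ (present in two of three). Counting per inventory rather than per language is a simplification, since PHOIBLE holds more than one inventory for some languages; comparability problems of this kind are examined by Anderson et al. (2023).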
References
Alaqlobi, O., Alduais, A., Qasem, F., & Alasmari, M. (2024). Artificial intelligence in applied linguistics: A content analysis and future prospects. Cogent Arts & Humanities, 11(1), 2382422. https://doi.org/10.1080/23311983.2024.2382422
Anderson, C., Tresoldi, T., Greenhill, S. J., Forkel, R., Gray, R., & List, J.-M. (2023). Variation in phoneme inventories: Quantifying the problem and improving comparability. Journal of Language Evolution, 8(2), 149-168. https://doi.org/10.1093/jole/lzad011
Baudouin de Courtenay, I. A. (1963). Vspomogatel’nyi mezhdunarodnyi yazyk [International auxiliary language]. In I. A. Baudouin de Courtenay, Izbrannye trudy po obshchemu yazykoznaniyu v 2 tomakh [Selected works on general linguistics in 2 volumes] (Vol. 2, pp. 144-160). Moscow: Izd-vo AN SSSR. (In Russian)
Cheng, S., Zhu, P., Liu, J., & Wang, Z. (2024). A survey of grapheme-to-phoneme conversion methods. Applied Sciences, 14(24), 11790. https://doi.org/10.3390/app142411790
Doucette, A., O'Donnell, T. J., Sonderegger, M., & Goad, H. (2024). Investigating the universality of consonant and vowel co-occurrence restrictions. Glossa: A Journal of General Linguistics, 9(1), 1-39. https://doi.org/10.16995/glossa.9373
Groenewald, E. S., Pallavi, P., Rani, S., Singla, P., Howard, E. M., & Groenewald, C. A. (2024). Artificial intelligence in linguistics research: Applications in language acquisition and analysis. Naturalista Campano, 28(1), 1253-1262. https://www.researchgate.net/publication/379239839
Hair, J. F., & Sabol, M. (2024). Leveraging artificial intelligence (AI) in competitive intelligence (CI) research. Journal of Sustainable Competitive Intelligence, 15(00), e0469. https://doi.org/10.24883/eagleSustainable.v15i.469
Ishida, R. (Ed.). (n.d.). r12a Scripts & Writing Systems App. World Wide Web Consortium (W3C). Available at: https://r12a.github.io/scripts/switch.html
Jespersen, O. (1928). An international language. London: Allen and Unwin.
Lammers, S., & Lasch, A. (2023). Linguistic framing of artificial intelligence: What language to use when talking about artificial intelligence. Chemie Ingenieur Technik, 95(7), 1012-1017. https://doi.org/10.1002/cite.202200226
Martinet, A. (1967). Les langues dans le monde de demain. Paris: Presses Universitaires de France.
Meillet, A. (1918). Les langues dans l’Europe nouvelle. Paris: Payot.
Micallef, L. O. (2025). Lingvistika neyrosetey kak paradigma sovremennoy nauki o yazyke [Neural network linguistics as a paradigm of modern language science]. World of Science, Culture, and Education, 1(110), 467-473. https://doi.org/10.24412/1991-5497-2025-1110-467-469
Micallef, L. O., & Yasnenko, I. P. (2024). Principles of international auxiliary languages creation on the base of essential and artificial languages. Macrosociolinguistics and Minority Languages, 2(1), 50-65. https://doi.org/10.22363/2949-5997-2024-2-1-50-65
Moran, S., & McCloy, D. (Eds.). (2019). PHOIBLE: Phonetics Information Base and Lexicon. Jena: Max Planck Institute for the Science of Human History. Available at: https://phoible.org
Pussaignolli de Paula, M., Noronha, M., Garcia Valente, U., Inacio Domingues, B. R., & Jahn Souza, L. (2024). Mapping of artificial intelligence and robotics technologies applied to offshore wind energy. Journal of Sustainable Competitive Intelligence, 15(00), e0474. https://doi.org/10.24883/eagleSustainable.v15i.474
Saussure, R. de. (1918). La structure logique des mots dans les langues naturelles, considérée au point de vue de son application aux langues artificielles. Berne: Büchler.
Wu, S., Ponti, E. M., & Cotterell, R. (2021). Differentiable generative phonology [Preprint]. arXiv. https://doi.org/10.48550/arXiv.2102.05717
Yang, B. (2025). Frequency distributions and phoneme associations in PHOIBLE. Proceedings of Speech Sciences, 17(3), 23-37.
License
I (We) submit this work, an original and unpublished text of my (our) authorship, for evaluation by Veredas do Direito - Revista de Direito, and I (we) agree that the copyright to it shall become the exclusive property of Revista Veredas. Any total or partial reproduction in any other printed or electronic medium dissociated from Veredas do Direito is prohibited unless prior authorization has been requested in writing from, and granted by, the Managing Editor. I (We) further declare that there is no conflict of interest between the subject addressed, the author(s) and any companies, institutions or individuals.
I (We) also acknowledge that Veredas is licensed under a CREATIVE COMMONS LICENSE:
Creative Commons Attribution 3.0 License