Abstract
One explanation for the poor performance of South African undergraduate students at university is the apparent gap between the exit level of secondary education and the entry level of tertiary education, the so-called articulation gap (Cooper and Van Dyk 2003). Cooper and Van Dyk conclude that academic vocabulary is one of the most significant influences on students’ academic performance, as academic literacy cannot be developed without a solid understanding of academic vocabulary. The writers believe that academic vocabulary, subject-specific vocabulary and terminology can play a vital role in students’ development of academic literacy. This view is supported by Brooman-Jones, Cunningham, Hanna and Wilson (2011), who argue that academic literacy, particularly language competence, can be improved through subject-specific academic activities and assessment tasks.
Academic vocabulary, specifically subject-specific vocabulary, forms the foundation for deeper learning; in other words, deeper learning cannot occur if the student does not understand the meaning of certain terms and how they are used in specific contexts (America and Van der Merwe 2017). Academic vocabulary can be defined broadly as the vocabulary used in academic texts (Smith 2022). Subject-specific vocabulary refers to vocabulary that is used within a particular subject but is not necessarily classified as a term. Terminology and academic terms are defined as words or expressions that have a precise meaning in some uses or are peculiar to a science, art, profession or subject (Merriam-Webster 2025). According to Alberts (2017), a term is a meaningful unit consisting of one word (a simple term) or several words (a complex term) that represents a single, specific concept within a subject area. Therefore, a term should have only one meaning: one concept, one term (Alberts 2017).
Corpora are utilised in this study, so it is necessary to define the term corpus before describing the relevant corpora. A corpus is a body of text that maximally represents a language or language variety (McEnery and Wilson 2001). Biber, Conrad and Reppen (1998) state that corpus linguistics is an empirical approach to language analysis that involves creating a representative sample of the target language, which is stored as an electronic database, or corpus (Anthony 2017). The goal is to answer linguistic research questions through a comprehensive and systematic analysis of the distribution of linguistic phenomena within a linguistic corpus (McEnery, Xiao and Tono 2006). Nesselhauf (2005) explains that corpus linguistics includes studies in which texts are analysed in terms of grammar and vocabulary, and in which the relationship between these factors and the relevant texts is investigated. In brief, corpus linguistics can be described as the analysis of language using computer-based corpora to address questions regarding the nature of language (Nel and Olivier 2018).
To address questions regarding the nature of language, specifically subject-specific vocabulary, and to expand the vocabulary of undergraduate education students who specialise in Afrikaans Home Language, research into a variety of Afrikaans-related resources was initiated. The teaching curriculum for Afrikaans Home Language in South African public schools serves as the starting point when examining subject-related resources, whether about or in support of Afrikaans. The Curriculum and Assessment Policy Statement (CAPS) includes the policy documents for Afrikaans Home Language from grade R to grade 12. CAPS is the first sub-corpus of this investigation. The second subject-related resource for teaching students who specialise in Afrikaans Home Language across all phases (Foundation phase, Intermediate phase, Senior phase and Further education and training phase) is the language textbook Afrikaansmetodiek deur ’n nuwe bril by Lawrence, Le Cordeur, Van der Merwe, Van der Vyver and Van Oort (2014). This textbook forms the second sub-corpus of this investigation.
A comparative corpus investigation was identified as the most appropriate research methodology for creating the keyword lists: the two sub-corpora, which serve as the specialised corpora, were compared with the Language Commission Corpus (Taalkommissiekorpus, abbreviated as TK corpus) as the reference corpus. The TK corpus consists of a large collection of texts that represent written Afrikaans (Van Rooy and Kruger 2016). For this study the sub-corpus Nie-Fiksie_Boeke_Algemeen of the TK corpus was used as the reference corpus, as it is a prototypical corpus that is regarded as representative of standard, general Afrikaans and is derived from various sources (Baker 2010; Nesselhauf 2005). Since the two keyword lists were created from Afrikaans Home Language terms, it was essential to identify, as the specialised corpora, Afrikaans sources relevant to education students specialising in the subject. Basing the statistical findings on this reference sub-corpus makes it possible to distinguish between subject terms and everyday Afrikaans. Additionally, the Opvoedkundewoordeboek was used as a reference source to compile the two keyword lists.
Quantitative research methods were also employed in this study to determine the subject vocabulary associated with Afrikaans Home Language for education students. The quantitative analysis was conducted using the corpus program #LancsBox, a new-generation software package for the analysis of language data and corpora (Brezina, Weill-Tessier and McEnery 2021).
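The comparison of a specialised corpus with a reference corpus can be illustrated with a minimal sketch. It assumes log-likelihood as the keyness statistic, which is a widely used measure in corpus linguistics but is not stated in this abstract as the statistic applied in the study. The function names `log_likelihood` and `keywords`, and the toy token lists in the usage note, are hypothetical and serve only as an illustration; they do not reproduce #LancsBox or the study's data.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for one word.

    a: frequency in the specialised corpus, b: frequency in the reference corpus,
    c: size of the specialised corpus, d: size of the reference corpus (in tokens).
    """
    # Expected frequencies if the word were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(target_tokens, reference_tokens, top_n=50):
    """Rank words that are relatively more frequent in the specialised corpus."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c = sum(target.values())
    d = sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        # Keep only positive keywords (over-represented in the specialised corpus).
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    # Highest keyness first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]
```

As a usage illustration, calling `keywords("die leerder lees die teks en die leerder skryf".split(), "die man en die vrou loop in die straat en die hond blaf".split())` ranks `leerder` first, because it occurs twice in the toy specialised text and not at all in the toy reference text, whereas a common word such as `die` receives a keyness score close to zero.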
The pedagogical tools that have emerged from the investigation are two core keyword lists, each consisting of 50 Afrikaans Home Language subject terms, compiled specifically for Afrikaans education students. The research suggests that these core keyword lists, which focus on Afrikaans subject vocabulary, can support the development of academic literacy, as subject vocabulary and academic literacy are closely linked. Weideman (2018) asserts that students’ language ability is critical to their learning. Alberts (2003) further emphasises that terminology supports language development by enabling the creation and dissemination of knowledge, as terminology is the medium through which information and knowledge are transferred. This comparative corpus investigation contributes to ongoing research on subject vocabulary as a key factor in strengthening complex and content-rich academic texts, and to the identification of essential subject terms of Afrikaans.
Keywords: Afrikaans Home Language; comparative corpus study; corpus linguistics; key glossaries; subject-specific resources; subject vocabulary; undergraduate education students specialising in Afrikaans

