PN-III-P4-ID-PCE-2016-0826 – Summary

This year, after the successful launch of the first lexical database and the drafting interface for Dicționarul limbii române informatizat (DLRi) [Digital Dictionary of the Romanian Language], a large-scale linguistic corpus is urgently needed to refine meanings and to extract the quotations required for dictionary entries. ROMTEXT, a digital corpus of Romanian texts, therefore aims to publish online an annotated, dated and continuously expanding corpus of the Romanian language. No such resource yet exists in Romanian culture. The corpus will serve many purposes in traditional and computational linguistics and will support the lexicographers drafting the DLRi.

ROMTEXT brings together the results of the project CNR. Corpus de referință al limbii române pentru constituirea de dicționare academice [CNR. Reference Corpus of the Romanian Language for the Development of Academic Dictionaries], funded by CNCSIS between 2007 and 2008 and led by Monica Mihaela Busuioc. In addition, over the course of the project ROMTEXT will make at least 500 reference works of Romanian literature available to the public and to researchers. We therefore expect to gather enough occurrences of base forms to carry out, for the first time in Romania, frequency analyses on large corpora. We will also apply a dating and classification system to the corpus, so that researchers using the DLRi interface can link a given quotation to the context in which it originates.

ROMTEXT will not include lexicographic works, since these are already covered by a complementary corpus: CLRE. Corpus lexicografic românesc esențial. 100 de dicționare din Bibliografia DLR aliniate la nivel de intrare și la nivel de sens [CLRE. Essential Romanian Lexicographic Corpus. 100 dictionaries from the DLR Bibliography aligned at entry and sense level], developed by the „Al. Philippide” Institute of Romanian Philology in Iași.