Project PN-III-P4-ID-PCE-2016-0826


Scientific Report 2017

After obtaining final approval for our research project, between July 12th 2017 and December 31st 2017 we carried out a series of administrative operations directly related to starting the activity and to project management, as well as to preparing the actual project work, which starts in 2018.

-      Extending the work team with three additional researchers, according to the number of members provided in the project, in addition to the two already involved in the project (Monica Busuioc, PhD, project director, and Nicoleta Mihai, PhD, team member).

Thus, the legal procedures for organising the employment competition for the three job openings were complied with (placing the advertisement in the newspaper and on the two websites in the competition information package; choosing the bibliography and the topic of the competition; setting the competition date, the competition board and the appeal board).

The following three persons participated in the competition on August 28th 2017, to occupy the three vacancies: Carolina Popușoi, PhD, Anghelina Alexandru Dan, PhD, and Diana Carburean, doctoral candidate.

The competition consisted of a written paper on a topic related to the research project profile: "Role of electronic corpora in the elaboration of lexicographic works. Romanian Language Dictionary".

The three candidates obtained the maximum mark and were employed as of September 1st 2017.

Anghelina Alexandru Dan, PhD, was appointed secretary of the work team to handle financial, administrative and scientific matters.

-      Choosing the first set of 100 literary reference works, in accordance with the Romanian Language Dictionary bibliography, to be part of the corpus.

The works were selected to depict the evolution of the Romanian language under its various functional aspects, from the oldest Romanian written texts to the contemporary period.

The level of representativeness of the various functional styles in the elaboration of a reference corpus of the Romanian language was also discussed.

Special emphasis was placed on literary works of the period after 1990, when the vocabulary underwent an unprecedented enrichment, mainly as a result of English influence. This group of recent texts is necessary to complete the list of words in the previous vocabularies.

The team also discussed including in the corpus a representative number of journalistic texts and a set of school textbooks, in order to identify terminologies, etc.

-      Preparing the selected works for the start of the digitisation activity, using the equipment and tools currently available, within their limitations (before purchasing everything required for the project). These activities consist of:

-                     scanning the books and publications to PDF format

-                     eliminating scanning faults (depending on the quality and age of the original)

-                     performing OCR with the existing software

-                     analysing the results of OCR

-                     choosing the most efficient OCR software

-                     finding solutions to adapt it to Romanian
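One way the comparison of OCR programs could be made measurable is a character error rate computed against a small hand-corrected sample page. The sketch below is only an illustration under that assumption: the report does not say which metric or tools the team used, and the function names (`levenshtein`, `character_error_rate`, `rank_engines`) and the engine labels are hypothetical.

```python
import unicodedata


def levenshtein(a, b):
    """Edit distance where insertion, deletion and substitution each cost 1."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_error_rate(ocr_text, reference):
    """Edit distance divided by the length of the hand-corrected reference."""
    # Compose diacritics first so that "s + combining mark" and the
    # precomposed letter are counted as the same character.
    ocr_text = unicodedata.normalize("NFC", ocr_text)
    reference = unicodedata.normalize("NFC", reference)
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(ocr_text, reference) / len(reference)


def rank_engines(outputs, reference):
    """Return (engine name, error rate) pairs, lowest error rate first."""
    scores = {name: character_error_rate(text, reference)
              for name, text in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    reference = "limba rom\u00e2n\u0103"          # hand-corrected sample
    outputs = {
        "engine_a": "limba romana",               # diacritics lost
        "engine_b": "limba rom\u00e2n\u0103",     # identical to the reference
    }
    for name, rate in rank_engines(outputs, reference):
        print(name, round(rate, 3))
```

A metric of this kind penalises lost diacritics directly, which matters for Romanian; a word-level rate could be computed the same way by passing lists of tokens instead of strings.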

-      Testing and improving an assisted proofreading programme, adapted to Romanian, for PDFs that have run through OCR:

-                     comparing the text run through OCR to the scanned image

-                     calculating the reliability of OCR jobs

-                     automatically correcting systematic anomalies resulting from OCR software's failure to adapt to particularities of Romanian

-                     creating a list of linguistic and symbolic forms missing from the proofreader, including foreign words and phrases

-                     creating a list of proper nouns missing from the proofreader

-                     finding solutions to automatically correct systematic errors

-                     preparing solutions for proofreader feedback

-                     analysing anomalies in terms of spelling or punctuation to be automatically reported or corrected

-                     implementing additional orthographic and orthoepic rules derived from the analysis of texts run through OCR

-                     handling special characters

-                     formatting texts run through OCR

-                     separating footnotes

-                     separating text blocks to correctly delineate paragraphs

-                     analysing quotes, mottos and any other allotext used by the author

-                     importing and exporting proofread texts
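Two of the steps above, automatically correcting systematic anomalies and calculating the reliability of an OCR job, can be sketched together. A well-known systematic anomaly in Romanian text is the use of the legacy cedilla letters (ş, ţ) where standard orthography requires the comma-below letters (ș, ț). The sketch below replaces them and then estimates reliability as the share of words found in a word list, returning the unknown words as candidates for the lists of missing forms. The report does not describe the team's actual programme: the coverage measure, the function names and the tiny lexicon are assumptions made for illustration.

```python
import re
import unicodedata

# Legacy cedilla letters mapped to the comma-below letters of standard Romanian.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015f": "\u0219",  # s with cedilla  -> s with comma below
    "\u015e": "\u0218",  # S with cedilla  -> S with comma below
    "\u0163": "\u021b",  # t with cedilla  -> t with comma below
    "\u0162": "\u021a",  # T with cedilla  -> T with comma below
})

# A word is a run of letters, optionally joined by internal hyphens.
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def normalise_romanian(text):
    """Compose combining marks, then replace cedilla forms with comma-below forms."""
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(CEDILLA_TO_COMMA)


def lexicon_coverage(text, lexicon):
    """Return (share of tokens found in the lexicon, sorted unknown tokens).

    `lexicon` is a hypothetical set of lower-case word forms; the coverage
    figure is only a rough proxy for how reliable the OCR output is.
    """
    tokens = [t.lower() for t in WORD_RE.findall(normalise_romanian(text))]
    if not tokens:
        return 1.0, []
    known = sum(1 for t in tokens if t in lexicon)
    unknown = sorted({t for t in tokens if t not in lexicon})
    return known / len(tokens), unknown


if __name__ == "__main__":
    lexicon = {"\u0219i", "\u021bar\u0103", "limba"}       # toy word list
    page = "limba \u015fi \u0163ar\u0103 rornana"          # OCR output with cedillas
    coverage, unknown = lexicon_coverage(page, lexicon)
    print(coverage, unknown)
```

Unknown tokens are not necessarily OCR errors: proper nouns, foreign words and rare forms would also land in that list, which is why the report treats them as separate lists to be added to the proofreader.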

-      For the proper conduct of the project, steps are being taken to purchase the goods listed in the Logistics section of the project.

-      Organising regular work sessions (MLW) so that team members can improve and familiarise themselves with the work methodology and the programmes needed to digitise, proofread and associate metadata, and estimating the tasks of team members in order to work out the standards for 2018.

-      Making a decision regarding the employment of an expert-collaborator in the project team in 2018.

-      Making a decision regarding the official launch of the project in May 2018.

Project Director, Dr. Monica Busuioc

December 11th 2017

Last updated on Thursday, 14 February 2019, 11:12
