Designing a Multi-Dimensional Space for Hybrid Information Extraction
Language of the talk title:
English
Original conference title:
Text Information Retrieval 2012 Workshop
Language of the conference title:
English
Original abstract:
Information extraction systems are developed for various specific application domains to manage an increasing amount of unstructured data. The majority build either upon the knowledge-based approach, which promises high accuracy but involves labour-intensive coding of extraction rules, or upon the automatically trainable systems approach, which produces highly portable solutions but requires an appropriate learning set. In this paper, we present results of a project that aims to provide a new methodology which combines the knowledge-based and the machine learning approach into a hybrid one in order to compensate for their respective shortcomings and to achieve high IE performance. Firstly, we propose the idea of a multi-dimensional space that guides users in selecting appropriate methods, i.e., different hybrid concepts, depending on the extraction task and the level of available features. Secondly, we provide the concept of one hybrid approach, namely the sequential processing of a knowledge-based approach and a selection of different machine learning methods. Thirdly, we present the evaluation of an implementation of the sequential extraction on a curriculum vitae corpus. Thus, we provide first results for filling the multi-dimensional space for hybrid information extraction.
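The sequential hybrid concept named in the abstract (a knowledge-based stage followed by a machine learning stage) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the rule patterns, the label names, the toy training lines, and the choice of a small naive Bayes classifier as the learning method. Hand-written rules label the lines they recognise, and only the remaining lines are passed to the trained classifier.

```python
import math
import re
from collections import Counter, defaultdict

# Stage 1: knowledge-based rules (hypothetical patterns, chosen for illustration).
RULES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d /-]{6,}\d"),
}


def rule_stage(line):
    """Return the label of the first matching rule, or None if no rule fires."""
    for label, pattern in RULES.items():
        if pattern.search(line):
            return label
    return None


def tokenize(line):
    """Lowercase the line and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", line.lower())


class NaiveBayes:
    """Stage 2: a tiny multinomial naive Bayes with add-one smoothing."""

    def fit(self, lines, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for line, label in zip(lines, labels):
            self.word_counts[label].update(tokenize(line))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, line):
        def score(label):
            counts = self.word_counts[label]
            total = sum(counts.values()) + len(self.vocab)
            log_prob = math.log(self.label_counts[label])
            for word in tokenize(line):
                log_prob += math.log((counts[word] + 1) / total)
            return log_prob

        return max(self.label_counts, key=score)


def hybrid_extract(lines, model):
    """Apply the rules first; fall back to the learned model for unmatched lines."""
    return [(line, rule_stage(line) or model.predict(line)) for line in lines]


# Toy training data (invented for this sketch, not the CV corpus of the paper).
TRAIN = [
    ("BSc in Computer Science, University of Linz", "education"),
    ("MSc in Informatics, Technical University", "education"),
    ("Software engineer at a consulting company", "experience"),
    ("Project manager at a software company", "experience"),
]
model = NaiveBayes().fit([text for text, _ in TRAIN], [label for _, label in TRAIN])

cv_lines = ["jane.doe@example.org", "PhD in Computer Science", "Senior engineer at a bank"]
result = hybrid_extract(cv_lines, model)
for line, label in result:
    print(f"{label:12s} {line}")
```

In this arrangement the rules take precedence wherever they apply, so the learned component only has to cover the cases that are hard to capture with hand-coded patterns. Other orderings or combinations of the two stages would be equally valid hybrid designs.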