  • Publication
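    Die Deduplizierung von bibliothekarischen Metadaten am Beispiel der Datenintegration eines Institutskatalogs in den Bibliotheksverbund IDS St.Gallen [Deduplication of library metadata: the example of integrating an institute catalogue into the library network IDS St.Gallen]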
    (TH Wildau, 2019)
    Duplicates are part of everyday life in libraries. Because they cause major problems for retrieval and database efficiency, considerable effort goes into avoiding them. This master's thesis deals with the deduplication of library metadata. Its aim is to develop and parameterize a dedicated deduplication procedure, based on existing procedures, within the framework of data integration. The starting point is the integration of an institute catalogue into the library network IDS St. Gallen. The data analysis shows that the institute's data are very heterogeneous and that their quality varies greatly. Wherever possible, the original data should therefore be replaced by higher-quality metadata. First, a catalogue of criteria for the procedure is drawn up. Existing deduplication procedures are then examined and tested for their suitability in the present situation. Based on this evaluation, a dedicated deduplication procedure is developed. Analysis of the data to be integrated, schema mapping and data cleansing play an important role in successfully deduplicating the institute's data. The adjustments made are described, and the resulting differences compared with the unadjusted data are presented. The technical implementation of the custom deduplication procedure is documented, and its special features and parameterization are explained. In this case, the data are deduplicated by querying large data pools such as swissbib or GVI, which improves data quality at the same time. The tests carried out and the results of the procedure are presented and discussed. The procedure's effectiveness and efficiency are satisfactory, and it can be put into practice.
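The abstract does not state the matching rules used in the thesis. As a hedged illustration of the general idea, matching a local record against records from a larger data pool, the Python sketch below builds a simple match key from the normalized title, the first author's surname and the year. The field names (`title`, `author`, `year`, `id`), the key design and the sample records are hypothetical assumptions. This is not the thesis's procedure, and it does not model the swissbib or GVI interfaces.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^\w\s]", " ", no_marks.lower())
    return " ".join(cleaned.split())


def match_key(record: dict) -> tuple:
    """Hypothetical match key: (normalized title, first author token, 4-digit year)."""
    title = normalize(record.get("title", ""))
    author_tokens = normalize(record.get("author", "")).split()
    surname = author_tokens[0] if author_tokens else ""
    year_match = re.search(r"\d{4}", record.get("year", ""))
    year = year_match.group(0) if year_match else ""
    return (title, surname, year)


def find_candidates(local: dict, pool: list) -> list:
    """Return pool records whose match key equals the local record's key."""
    key = match_key(local)
    return [rec for rec in pool if match_key(rec) == key]


def group_duplicates(records: list) -> list:
    """Group records sharing a match key; keep only groups with duplicates."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical sample data for illustration only.
local = {"title": "Die Deduplizierung von Metadaten", "author": "Müller, Hans", "year": "2019"}
pool = [
    {"id": "A1", "title": "Die Deduplizierung von Metadaten.", "author": "Muller, H.", "year": "[2019]"},
    {"id": "B2", "title": "Metadaten im Verbund", "author": "Meier, Anna", "year": "2018"},
]

print([rec["id"] for rec in find_candidates(local, pool)])
```

Exact key matching like this is deliberately simple. A real procedure would usually add further fields (e.g. identifiers such as ISBNs) and tolerant comparisons, and would then decide which matched record supplies the better-quality metadata.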