RESEARCH
INTERNAL DEVELOPMENT PROJECTS
The KB continuously runs projects to improve its internal infrastructure. Reports on completed projects are listed in our archive.
Comparative Analysis and Curation of German Metadata in Open Bibliometric Data (OPENBIB)
Term: May 2023 – December 2025
The goal of the project is to establish an open bibliometrics database within the German Kompetenznetzwerk Bibliometrie. This will open up the possibility for the fields of higher education research and science studies to use innovative and open data sources as an alternative to proprietary bibliometrics databases. At the same time, the database promises an enhanced analysis potential with regard to publication venues and modes that are not covered in the proprietary data.
Specifically, an open bibliometrics database based on OpenAlex is to be developed by the KB partners SUB Göttingen, Universität Bielefeld, FZ Jülich, GESIS and DZHW in collaboration with the KB hosting partner FIZ Karlsruhe and with the participation of further KB partners. The joint endeavour pursues four successive sub-goals:
- Database provision: Provision of a free and machine-readable developer instance of the bibliometric database OpenAlex as a basis for curating German publication data using an open licence.
- Database comparison: Comparative analysis of the coverage and quality of the open bibliometric database OpenAlex and the proprietary databases.
- Data curation: Development and application of technical procedures for curating the metadata of publications produced with the participation of authors from German research institutions.
- Networking and usage: Identification of national and international re-use opportunities.
Contact person: Najko Jahn (SUB Göttingen)
You can find more information on the project blog.
Data infrastructure
The KB operates a quality-assured data infrastructure hosted by FIZ Karlsruhe. At the center of the data infrastructure are the bibliographic databases Scopus (Elsevier) and the core collections of the Web of Science (WoS, Clarivate Analytics). The OpenAlex databases will be integrated into the infrastructure on an equal footing with the other two databases in the course of 2025.
The databases are checked during loading using a series of automatic and semi-automatic procedures. Errors in loading and mapping are corrected, and data irregularities are reported to Elsevier and Clarivate. Some standardizations are applied, especially to identifiers and country information. Each database version is accompanied by an internally published quality assurance report. Once a year, aggregated data and indicators are compared with the previous year's status in publicly available annual reports.
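To illustrate the kind of standardization of identifiers and country information described above, here is a minimal Python sketch. It is not the KB's actual loading routine: the function names `normalize_doi` and `normalize_country` and the small country lookup table are assumptions made purely for this example.

```python
import re

# Hypothetical lookup table (assumption); a real pipeline would use a
# complete ISO 3166 list and many more spelling variants.
COUNTRY_CODES = {
    "germany": "DE",
    "deutschland": "DE",
    "federal republic of germany": "DE",
    "fed rep ger": "DE",
    "united states": "US",
    "usa": "US",
}

# Unicode dash variants that sometimes replace the ASCII hyphen in copied DOIs.
_DASHES = dict.fromkeys(map(ord, "\u2010\u2011\u2012\u2013\u2014\u2212"), "-")

# Common resolver and scheme prefixes that are not part of the DOI itself.
_DOI_PREFIX = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def normalize_doi(raw):
    """Return a bare, lowercase DOI with ASCII hyphens."""
    doi = raw.strip().translate(_DASHES)
    doi = _DOI_PREFIX.sub("", doi).strip()
    return doi.lower()


def normalize_country(raw):
    """Map a country spelling to a two-letter code, or None if unknown."""
    key = re.sub(r"[^a-z ]", "", raw.strip().lower())
    key = re.sub(r"\s+", " ", key).strip()
    return COUNTRY_CODES.get(key)
```

Returning `None` for unknown country spellings, rather than guessing, keeps unmatched values visible so they can be reviewed in a semi-automatic step.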
The schemas of the bibliometric databases are designed and optimized for use in bibliometric analyses; they also contain data enrichments and pre-calculated indicators.
A particular added value of the data infrastructure operated by the Competence Network Bibliometrics is its institution coding, which unifies the varying spellings found in the address fields of the supplied raw data. The institution coding routine accesses address information in the WoS, Scopus and OpenAlex raw data and assigns publications unambiguously to research institutions. Structural changes in the institutional landscape over time are represented by two alternative mappings. The coding is applied to all publications from Germany, so that bibliometric evaluations of German research institutions rest on data of improved validity. The institution coding is developed and operated by I²SoS, Bielefeld University, in cooperation with FIZ Karlsruhe.
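The idea behind institution coding can be sketched in a few lines of Python. This is an illustrative toy, not the routine developed at Bielefeld University. The institutions are fictional, and the alias table, the identifiers, the `successor` field and the mode names `"historical"` and `"current"` are all assumptions. In particular, reading the two alternative mappings as "institution at the time of publication" versus "present-day successor" is an assumption of this sketch, and the toy treats the whole input string as the institution part of an address.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Institution:
    inst_id: str
    name: str
    successor: Optional[str] = None  # id of the institution that absorbed this one


# Hypothetical registry with fictional institutions (assumption).
INSTITUTIONS = {
    "I001": Institution("I001", "Example University", successor="I003"),
    "I002": Institution("I002", "Example Research Centre", successor="I003"),
    "I003": Institution("I003", "Example Institute of Technology"),
}

# Hypothetical alias table: normalized address spelling -> institution id.
ALIASES = {
    "example univ": "I001",
    "univ example": "I001",
    "example university": "I001",
    "example res ctr": "I002",
    "example inst technol": "I003",
}


def _normalize(address):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    return re.sub(r"\s+", " ", text).strip()


def code_institution(address, mode="historical"):
    """Return an institution id for an address spelling, or None if unmatched.

    mode="historical": the institution as it was named in the address.
    mode="current":    follow successor links to the present-day institution.
    """
    if mode not in ("historical", "current"):
        raise ValueError("mode must be 'historical' or 'current'")
    inst_id = ALIASES.get(_normalize(address))
    if inst_id is None:
        return None
    if mode == "current":
        while INSTITUTIONS[inst_id].successor is not None:
            inst_id = INSTITUTIONS[inst_id].successor
    return inst_id
```

With this table, `code_institution("Univ. Example")` yields `"I001"`, while `code_institution("Univ. Example", mode="current")` yields `"I003"`, so the same publication can be counted under either view of the institutional landscape.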
Steps towards an open, reproducible infrastructure
To support the reproducibility of bibliometric analyses, the quality-checked bibliometric databases are updated quarterly, and each version is frozen at a defined point in time. Old versions of the databases are archived. To further support the reproducibility and transparency of the data infrastructure, an article was written in 2024 and published as a preprint on Zenodo. It describes the conceptual considerations behind the technical infrastructure, the infrastructure itself, the database schema and loading processes, and the processes for data curation and quality assurance.
The DDL script for creating the tables is also available on Zenodo.
Further details are provided in the respective reports.
Selected, curated data segments from the OPENBIB project have been published on GitHub and on Zenodo.
Application
The Competence Network Bibliometrics is funded by the BMBF to provide this data infrastructure; however, research projects are generally not funded within the framework of the KB. The partner institutions of the KB use their basic funding or other third-party funding to conduct research on the basis of the data provided by the KB. In recent years, numerous publications and presentations have appeared that deal with methodological questions of bibliometrics or use bibliometric data, for example, for questions relating to sociology of science or the economics of innovation.
PUBLICATIONS AND TALKS
These publications and talks were made possible by using the infrastructure of the KB:
Culbert, J. H., Hobert, A., Jahn, N., Haupka, N., Schmidt, M., Donner, P., & Mayr, P. (2025).
Reference coverage analysis of OpenAlex compared to Web of Science and Scopus.
Scientometrics, 130, 2475–2492. https://doi.org/10.1007/s11192-025-05293-3
Frietsch, R., Gruber, S., & Bornmann, L. (2025).
Scientometrics, 130, 881–907. https://doi.org/10.1007/s11192-024-05158-1
Lovakov, A., & Teixeira Da Silva, J. A. (2025).
Scientometrics, 130(3), 1813–1829. https://doi.org/10.1007/s11192-025-05269-3
Mutz, R., Bornmann, L., & Haunschild, R. (2025).
Scientometrics, 130(3), 1519–1546. https://doi.org/10.1007/s11192-025-05254-w
Stephen, D. (2025).
Journal of Informetrics, 19(2), 101640. https://doi.org/10.1016/j.joi.2025.101640