RESEARCH

INTERNAL DEVELOPMENT PROJECTS

The KB engages continuously in projects to improve its internal infrastructure. Reports on finalised projects are listed in our archive.

Compar­a­tive Analy­sis and Curation of German Metada­ta in Open Biblio­met­ric Data (OPENBIB)

Term: May 2023 – December 2025

The goal of the project is to estab­lish an open biblio­met­rics database within the German Kompe­ten­znet­zw­erk Bibliome­trie. This will open up the possi­bil­i­ty for the fields of higher educa­tion research and science studies to use innov­a­tive and open data sources as an alter­na­tive to propri­etary biblio­met­rics databas­es. At the same time, the database promis­es an enhanced analy­sis poten­tial with regard to publi­ca­tion venues and modes that are not covered in the propri­etary data.

Specifically, an open bibliometrics database based on OpenAlex is to be developed by the KB partners SUB Göttingen, Universität Bielefeld, FZ Jülich, GESIS and DZHW in collaboration with the KB hosting partner FIZ Karlsruhe and with the participation of further KB partners. The joint endeavour pursues four consecutive sub-goals:

  1. Database provi­sion: Provi­sion of a free and machine-readable devel­op­er instance of the biblio­met­ric database OpenAlex as a basis for curat­ing German publi­ca­tion data using an open licence.
  2. Database comparison: Comparative analysis of the coverage and quality of the open bibliometric database OpenAlex relative to the proprietary databases.
  3. Data curation: Devel­op­ment and appli­ca­tion of techni­cal proce­dures for curat­ing the metada­ta of publi­ca­tions produced with the partic­i­pa­tion of authors from German research institutions.
  4. Network­ing and usage: Identi­fi­ca­tion of nation­al and inter­na­tion­al re-use opportunities.
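As an illustration of the database comparison in sub-goal 2, coverage can be approximated by matching records on their DOIs. The following is a minimal sketch and not the project's actual procedure: the function names `normalize_doi` and `coverage_report` are hypothetical, and it assumes that each source can be exported as a plain list of DOI strings.

```python
def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip a leading resolver URL or 'doi:' prefix."""
    doi = doi.strip().lower()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


def coverage_report(open_dois, proprietary_dois):
    """Count DOIs found in only one source or in both (hypothetical helper)."""
    open_set = {normalize_doi(d) for d in open_dois}
    prop_set = {normalize_doi(d) for d in proprietary_dois}
    shared = open_set & prop_set
    return {
        "only_open": len(open_set - prop_set),
        "only_proprietary": len(prop_set - open_set),
        "shared": len(shared),
        # Share of the proprietary records that the open source also contains.
        "share_of_proprietary_covered": (
            len(shared) / len(prop_set) if prop_set else 0.0
        ),
    }
```

DOI matching only covers records that carry a DOI, so publication venues without DOIs would need a separate matching strategy, for example on title and publication year.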

Contact person: Najko Jahn (SUB Göttingen)

You can find more infor­ma­tion on the project blog.

Data infra­struc­ture

The KB operates a quality-assured data infrastructure hosted by FIZ Karlsruhe and derived from the contents of the Scopus (Elsevier) and Web of Science (Clarivate Analytics) databases. The OpenAlex database will be integrated into the infrastructure in the same manner as the other two databases in the course of 2025.

The databases are checked using a series of automatic and semi-automatic procedures during the loading processes. Any errors during loading and mapping are corrected, and data irregularities are reported to Elsevier and Clarivate. Unifications and standardizations, e.g. of journal names and country information, are carried out. Each database version is accompanied by an internally published quality assurance report. Once a year, aggregated data and indicators are compared with the previous year's status in publicly available yearly reports.
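The unification of journal names and country information can be pictured as mapping spelling variants to one canonical value. The sketch below is an assumption-laden illustration rather than the KB's actual procedure: the mapping `COUNTRY_VARIANTS` and the functions `standardize_country` and `journal_key` are hypothetical, and a production mapping would be far larger and curated.

```python
import re
import unicodedata

# Hypothetical variant table: normalized spelling -> ISO 3166-1 alpha-3 code.
COUNTRY_VARIANTS = {
    "germany": "DEU",
    "deutschland": "DEU",
    "fed rep ger": "DEU",
    "federal republic of germany": "DEU",
    "usa": "USA",
    "united states": "USA",
}


def _key(text: str) -> str:
    """Build a comparison key: strip accents, lowercase, collapse punctuation."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def standardize_country(raw: str):
    """Return the canonical country code, or None if the variant is unknown."""
    return COUNTRY_VARIANTS.get(_key(raw))


def journal_key(raw_title: str) -> str:
    """Return a key under which spelling variants of a journal title coincide."""
    return _key(raw_title)
```

Returning `None` for unknown variants, instead of guessing, keeps unmapped values visible so that they can be reviewed semi-automatically or reported as irregularities.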

The schemas of the databases are designed and optimized for bibliometric applications. In addition to the raw data, the databases contain enhanced data and pre-computed indicators.

One particular improvement is the address disambiguation of German institutions, that is, the cleaning and unification of their address data. This sub-project is run by I²SOS at Bielefeld University.

To ensure the reproducibility of bibliometric analyses, a database incorporating the most recent data is generated four times a year, and old versions are archived.

An article presenting conceptual considerations on the technical infrastructure was developed in 2024 and published as a preprint on Zenodo. It describes the infrastructure and documents the database schema, the loading processes, and the procedures for data curation and quality assurance.

The DDL script for creat­ing the tables is also avail­able on Zenodo.

Further details are provid­ed in the respec­tive reports.

PUBLICATIONS AND TALKS

These publi­ca­tions and talks were made possi­ble by using the infra­struc­ture of the KB:

NETWORK PARTNERS

The KB is a cross-insti­tu­tion­al network in which the partners cooper­ate to contribute to the further devel­op­ment of biblio­met­rics and its applic­a­bil­i­ty on the basis of a shared data infrastructure.