RESEARCH

INTERNAL DEVELOPMENT PROJECTS

The KB continuously carries out projects to improve its internal infrastructure. Reports on completed projects are listed in our archive.

Comparative Analysis and Curation of German Metadata in Open Bibliometric Data (OPENBIB)

Term: May 2023 to December 2025

The goal of the project is to establish an open bibliometrics database within the German Kompetenznetzwerk Bibliometrie. This will allow higher education research and science studies to use innovative, open data sources as an alternative to proprietary bibliometrics databases. At the same time, the database promises greater analytical potential for publication venues and publication types that the proprietary data do not cover.

Specifically, an open bibliometrics database based on OpenAlex is to be developed by the KB partners SUB Göttingen, Universität Bielefeld, FZ Jülich, GESIS and DZHW, in collaboration with the KB hosting partner FIZ Karlsruhe and with the participation of further KB partners. The joint endeavour pursues four consecutive sub-goals:

  1. Database provision: Provision of a free, machine-readable developer instance of the bibliometric database OpenAlex, under an open licence, as a basis for curating German publication data.
  2. Database comparison: Analysis of the coverage and quality of the open bibliometric database OpenAlex in comparison with the proprietary databases.
  3. Data curation: Development and application of technical procedures for curating the metadata of publications produced with the participation of authors from German research institutions.
  4. Networking and usage: Identification of national and international re-use opportunities.

Contact person: Najko Jahn (SUB Göttingen)

You can find more information on the project blog.

Data infrastructure

The KB operates a quality-assured data infrastructure hosted by FIZ Karlsruhe. At the centre of the data infrastructure are the bibliographic databases Scopus (Elsevier) and the core collections of the Web of Science (WoS, Clarivate Analytics). In the course of 2025, OpenAlex will be integrated into the infrastructure on an equal footing with these two databases.

During loading, the databases are checked by a series of automatic and semi-automatic procedures. Errors that occur during loading and mapping are corrected, and data irregularities are reported to Elsevier and Clarivate. Some standardizations are applied, especially to identifiers and country information. Each database version is accompanied by an internally published quality assurance report, and once a year aggregated data and indicators are compared with the previous year's status in publicly available annual reports.
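The kind of identifier and country standardization mentioned above can be illustrated with a minimal sketch. This is not the KB's actual procedure: the function names, the regular expression and the small country-variant table are hypothetical, chosen only to show the idea of mapping differently written raw values onto one canonical form (lower-cased DOIs without resolver prefixes, ISO 3166-1 alpha-2 country codes).

```python
import re

# Hypothetical lookup of country-name variants to ISO 3166-1 alpha-2 codes.
# A production mapping would be much larger and curated by hand.
COUNTRY_VARIANTS = {
    "germany": "DE",
    "deutschland": "DE",
    "fed rep ger": "DE",
    "federal republic of germany": "DE",
    "usa": "US",
    "united states": "US",
}

# Resolver URLs and "doi:" labels that often precede the actual DOI in raw data.
DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(raw):
    """Strip prefixes and whitespace; DOIs are case-insensitive, so lower-case them.

    Returns None for values that do not look like a DOI, so they can be flagged.
    """
    if raw is None:
        return None
    doi = DOI_PREFIX.sub("", raw.strip()).lower()
    return doi if doi.startswith("10.") else None


def normalize_country(raw):
    """Map a raw country string to an ISO code; None means 'unmapped, review manually'."""
    if raw is None:
        return None
    key = re.sub(r"[^a-z ]", "", raw.lower())   # drop punctuation such as "Fed. Rep. Ger."
    key = re.sub(r"\s+", " ", key).strip()      # collapse repeated whitespace
    return COUNTRY_VARIANTS.get(key)
```

Returning None instead of guessing keeps unmapped values visible, which fits a workflow in which irregularities are reported rather than silently overwritten.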

The schemas of the bibliometric databases are designed and optimized for bibliometric analyses; they also contain data enrichments and pre-calculated indicators.

A particular added value of the data infrastructure operated by the Competence Network Bibliometrics is its institution coding, which unifies the varying spellings of institution names in the address fields of the supplied raw data. The institution coding routine processes the address information in the WoS, Scopus and OpenAlex (OAL) raw data and assigns publications unambiguously to research institutions; structural changes in the institutional landscape over time are represented by two alternative mappings. The institution coding is applied to all publications from Germany, so that bibliometric evaluations of German research institutions are based on data of improved validity. It is developed and operated by I²SoS, Bielefeld University, in cooperation with FIZ Karlsruhe.
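The basic idea of institution coding, mapping many spellings of an address onto one institution identifier, can be sketched as follows. This is a strongly simplified illustration and not the I²SoS routine: the variant table and the identifiers (INST_BI, INST_KIT, ...) are hypothetical. The two mappings are modelled here, purely as an assumption for illustration, as the institution named in the address versus its present-day successor (for example, Forschungszentrum Karlsruhe and Universität Karlsruhe merged into KIT in 2009).

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Optional

# Hypothetical table of normalized address variants -> institution identifiers.
VARIANTS = {
    "univ bielefeld": "INST_BI",
    "bielefeld univ": "INST_BI",
    "universitat bielefeld": "INST_BI",
    "forschungszentrum karlsruhe": "INST_FZK",
    "univ karlsruhe": "INST_UKA",
    "karlsruhe inst technol": "INST_KIT",
}

# Hypothetical successor relation for mergers (assumed to contain no cycles).
SUCCESSOR = {"INST_FZK": "INST_KIT", "INST_UKA": "INST_KIT"}


@dataclass(frozen=True)
class Assignment:
    historical: Optional[str]  # institution as named in the address
    current: Optional[str]     # institution after following later mergers


def normalize(segment):
    """Remove diacritics and punctuation, lower-case and collapse whitespace."""
    text = unicodedata.normalize("NFKD", segment)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def code_address(address):
    """Return both mappings for the first address segment found in the variant table."""
    for segment in address.split(","):
        inst = VARIANTS.get(normalize(segment))
        if inst is not None:
            current = inst
            while current in SUCCESSOR:  # follow the merger chain to the present
                current = SUCCESSOR[current]
            return Assignment(historical=inst, current=current)
    return Assignment(historical=None, current=None)
```

For example, `code_address("Inst Kernphys, Forschungszentrum Karlsruhe, Karlsruhe, Germany")` yields the pre-merger institution in one mapping and KIT in the other. A real routine has to handle far messier addresses, departments and ambiguous names, which is why it is curated rather than purely rule-based.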

Steps towards an open, reproducible infrastructure

To support the reproducibility of bibliometric analyses, the quarterly updated, quality-checked bibliometric databases are frozen at a defined point in time, and previous versions are archived. To further support the reproducibility and transparency of the data infrastructure, an article was written in 2024 and published as a preprint on Zenodo. It describes the conceptual considerations behind the technical infrastructure, the infrastructure itself, the database schema and loading processes, and the procedures for data curation and quality assurance.

The DDL script for creating the tables is also available on Zenodo.

Further details are provided in the respective reports.

Selected curated data segments from the OPENBIB project are published on GitHub and on Zenodo.

Application

The Competence Network Bibliometrics is funded by the BMBF to provide this data infrastructure; research projects, however, are generally not funded within the framework of the KB. The KB partner institutions use their core funding or other third-party funding to conduct research based on the data provided by the KB. In recent years, numerous publications and presentations have appeared that address methodological questions of bibliometrics or use bibliometric data, for example for questions relating to the sociology of science or the economics of innovation.

PUBLICATIONS AND TALKS

These publications and talks were made possible by the infrastructure of the KB: