are not yet widely used to enhance retrieval processes in digital libraries, although they offer value-added effects for users. In this workshop we will explore how statistical modelling of scholarship, such as Bradfordizing or network analysis of co-authorship networks, can improve retrieval services for specific communities as well as for large, cross-domain collections. This workshop aims to raise awareness of the missing link between information retrieval (IR) and bibliometrics/scientometrics and to create a common ground for incorporating bibliometric-enhanced services into retrieval at the digital library interface.
The conference was supported by, among others, the Deutsche Forschungsgemeinschaft (DFG), the Projektträger Fachinformation at the Forschungszentrum Informationstechnik GmbH (GMD-PTF), the Humboldt-Universität zu Berlin, and the Berlin Senate Department for Science, Research and Culture. [...] modern industrial society and was quite naturally also taken as the basis for universities' considerations of efficiency and competition, cannot be avoided under the present societal conditions either.
Implementing FAIR Data Infrastructures (Dagstuhl Perspectives Workshop 18472)
This report documents the programme and the outcomes of Dagstuhl Perspectives Workshop 18472 "Implementing FAIR Data Infrastructures". The workshop aimed to bring together computer scientists and digital infrastructure experts from different domains to discuss open issues in implementing and adopting the FAIR principles in research data infrastructures and to shape the role that the field of computer science has to play.
The idea of the centrality of actors in social networks is one of the earliest concepts that network analysis has produced. It goes back essentially to the pioneering work of Alex Bavelas, who first described it formally (→ 1950). Since then, a multitude of competing concepts of centrality have been proposed. Over decades this led to considerable confusion, because the proposed centrality measures contributed little to clarifying the concept itself and instead represented very different interpretations of centrality.
In recent years, Linked Open Data (LOD) has matured and gained acceptance across various communities and domains. Linked Data technologies are seen as having great potential for application in scientific disciplines. In this article, we present use cases and applications of Linked Data in the social sciences. They focus on (a) interlinking domain-specific information, and (b) linking social science data to external LOD sources (e.g. authority data) from other domains. However, several technical and research challenges arise when applying Linked Data technologies to a scientific domain with its specific data, information needs and use cases. We discuss these challenges and show how they can be addressed.
KMIR 2014 - Knowledge Maps and Information Retrieval : Proceedings of the First Workshop on Knowledge Maps and Information Retrieval co-located with International Conference on Digital Libraries 2014 - ACM/IEEE Joint Conference on Digital Libraries (JCDL 2014)
Proceedings of the annual conference of CAIS, Aug 17, 2016
This paper reports on a survey of 26 social scientists and computer scientists. Using the vignette technique, the survey explored resource use in situations in which scholars have a lot of time and in which they have very little. Findings suggest that academic discipline and time may play a role in resource use.
Purpose - The general science portal vascoda merges structured, high-quality information collections from more than 40 providers on the basis of search engine technology (FAST) and a concept which treats semantic heterogeneity between different controlled vocabularies. First experiences with the portal show some weaknesses of this approach which arise in most metadata-driven Digital Libraries (DLs) or subject-specific portals. The purpose of the paper is to propose models to reduce the semantic complexity in heterogeneous DLs. The aim is to introduce value-added services (treatment of term vagueness and document re-ranking) that gain a certain quality in DLs if they are combined with heterogeneity components established in the project "Competence Center Modeling and Treatment of Semantic Heterogeneity". Design/methodology/approach - First, semantic heterogeneity components translate automatically between different indexing languages. This approach will have an impact on search in a scenario where the searcher uses controlled vocabularies which are cross-linked with cross-concordances. However, users usually formulate query terms freely without any vocabulary support. Empirical observations show that freely formulated user terms and terms from controlled vocabularies are often not the same or match just by coincidence. Therefore, a value-added service will be developed which rephrases the searcher's natural language terms into suggestions from the controlled vocabulary, the Search Term Recommender (STR). Second, the result sets of transformed or expanded queries in distributed collections are often very large, and tests show that the conventional web-based ranking methods are not appropriate for presenting heterogeneous metadata records as suitable result sets to the user.
Therefore, two methods derived from scientometrics and network analysis will be implemented with the objective of re-ranking result sets by the following structural properties: ranking of the results by core journals (so-called Bradfordizing) and ranking by centrality of authors in co-authorship networks. Findings - The methods to be implemented focus on the query side and on the result side of a search and are designed to positively influence each other. Conceptually, they will improve the search quality and guarantee that the most relevant documents in result sets will be ranked higher. Originality/value - The central contribution of the paper is the integration of three structural value-adding methods which aim at reducing the semantic complexity represented in distributed DLs at several stages in the information retrieval process: query construction, search and ranking, and re-ranking.
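The re-ranking by core journals described above can be illustrated with a minimal sketch. It assumes that each search result is a dictionary with hypothetical `id` and `journal` fields, and it simply sorts by how many results each journal contributes to the result set; it is not the portal's implementation, and a fuller treatment would typically also partition the journals into Bradford zones (core, zone 2, zone 3).

```python
from collections import Counter


def bradfordize(records):
    """Re-rank a result set so that documents from the journals contributing
    the most documents to this result set (the "core" journals) come first.

    `records` is a list of dicts; the "journal" key is an assumed field name.
    """
    # Count how many documents each journal contributes to the result set.
    counts = Counter(rec["journal"] for rec in records)
    # Sort by descending journal frequency, then by journal name so that
    # documents of the same journal stay together. sorted() is stable, so
    # the original order is preserved within a journal.
    return sorted(records, key=lambda rec: (-counts[rec["journal"]], rec["journal"]))


# Hypothetical result set in its original (e.g. relevance) order.
results = [
    {"id": 1, "journal": "A"},
    {"id": 2, "journal": "B"},
    {"id": 3, "journal": "B"},
    {"id": 4, "journal": "C"},
    {"id": 5, "journal": "B"},
    {"id": 6, "journal": "A"},
]

reranked = bradfordize(results)
print([rec["id"] for rec in reranked])  # [2, 3, 5, 1, 6, 4]
```

Journal B contributes three documents, so its records move to the top; journal C, with a single document, ends up last. Ranking by author centrality could be sketched analogously by replacing the journal count with a centrality score computed on the co-authorship network.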
This workshop brings together experts from communities which have often been perceived as different ones: bibliometrics / scientometrics / informetrics on the one side and information retrieval on the other. Our motivation as organizers of the workshop started from the observation that the main discourses in both fields are different and that the communities only partly overlap, and from the belief that a knowledge transfer would be profitable for both sides. Bibliometric techniques are not yet widely used to enhance retrieval processes in digital libraries, although they offer value-added effects for users. On the other side, more and more information professionals working in libraries and archives are confronted with applying bibliometric techniques in their services. Knowledge exchange is therefore becoming more urgent. The first workshop set the research agenda by introducing each other's methods, reporting on current research problems and brainstorming about common interests. This follow-up workshop continues the overall communication, but also puts one problem into focus. In particular, we will explore how statistical modelling of scholarship can improve retrieval services for specific communities, as well as for large, cross-domain collections like Mendeley or ResearchGate. This second BIR workshop continues to raise awareness of the missing link between Information Retrieval (IR) and bibliometrics and contributes to creating a common ground for the incorporation of bibliometric-enhanced services into retrieval at the scholarly search engine interface.
Ein Informationssystem für die Sozialwissenschaften: das Projekt GESINE [An Information System for the Social Sciences: The GESINE Project]
Ingénierie Des Systèmes D'information, 1996
... and evaluated. To enable a comparison, GESINE uses the indexing and retrieval software freeWAIS-sf (cf. Pfeifer 1995a, 1995b) alongside the Boolean retrieval model. This system ...
Zenodo (CERN European Organization for Nuclear Research), Feb 13, 2023
In order to gain an overview of the current state of the discussion on PIDs and to identify use cases for the initiation phase of a PID service within the NFDI basic services, the working group Persistent Identifier of the Section Common Infrastructures of the NFDI hosted an online workshop in January 2023. In the course of the workshop, members of nine different NFDI consortia presented the current application of PIDs in their consortia. Slide excerpt: Dr. Michael Selzer (IAM), "Instrument database, metadata and related topic", 27.01.2023. The KIT instrument database ("Gerätepool") inside Kadi4Mat: Kadi4Mat is the Karlsruhe Data Infrastructure for Materials Science, a software for managing research data with the aim of combining new concepts with established technologies and existing solutions.
Zenodo (CERN European Organization for Nuclear Research), Aug 5, 2017
Research in the social sciences is usually based on survey data where individual research questions relate to observable concepts (variables). However, due to a lack of standards for data citations, a reliable identification of the variables used is often difficult. In this paper, we present a work-in-progress study that seeks to provide a solution to the variable detection task based on supervised machine learning algorithms, using a linguistic analysis pipeline to extract a rich feature set, including terminological concepts and similarity metric scores. Further, we present preliminary results on a small dataset that has been specifically designed for this task, yielding a significant increase in performance over the random baseline.
Zenodo (CERN European Organization for Nuclear Research), Dec 31, 2021
The findability of research data is an important factor in enabling re-use and supporting the research data life cycle. In this report, we analyze the findability of data from KonsortSWD partners: we conducted a survey and interviews with the RDC and analyzed web traffic and query logs. From this analysis, we formulate a findability strategy and hands-on recommendations for partners on how to improve their findability, including recommendations on how to shape their metadata. With this contribution, we lay a foundation for ongoing activities to sustainably improve the discoverability and visibility of our research data.
Zenodo (CERN European Organization for Nuclear Research), Jan 31, 2023
Persistent identifiers (PIDs) assigned at the level of attributes, e.g. a variable in social science research data, make it possible to cite data unambiguously and to retrieve it directly. Variables can change over time across study waves, and PIDs foster linking across survey waves, studies and other entities such as questions and concepts in questionnaires. Given the importance of variables and their links, the associated metadata should document such relationships and include machine-actionable features through PIDs and controlled vocabularies. The KonsortSWD measure TA.5-M.1, "Extension of PID services as the basis for a FAIR data infrastructure", delivers a PID registration extension of the da|ra service for assigning PIDs. The underlying metadata schema was extended to meet the growing requirements for interoperability, data mappings and knowledge graphs. This solution comprises a metadata schema for the persistent identification of variables and for storing cross-relationships between variables.
Zenodo (CERN European Organization for Nuclear Research), Jan 31, 2023
This report extends the descriptions of the use cases and gives recommendations for assigning persistent identifiers (PIDs) below the study level, i.e. here, as a first approach, to a variable within a dataset. The recommendations relate to the data types and services of the use case partners. However, they would also benefit other institutions, e.g. data archives that hold tabular data and want to take advantage of uniquely identifying their data at a lower level of granularity. The report describes seven use cases from four consortium partners in KonsortSWD. The target elements that are to receive a PID are variables in large and small datasets, information bundles (groups of variables), and qualitative data such as observation records, interviews and transcriptions. PIDs for variables used by harmonisation tools are also included. The institutions are responsible for documenting the variable or other attribute for which they want to obtain a PID and for submitting the landing page URL and the metadata records to da|ra in order to obtain a PID. For this purpose, the da|ra PID registration service was extended as part of the KonsortSWD measure TA.5-M.1. da|ra then registers the corresponding metadata, obtains the PID on behalf of the institution, and returns the PID to the institution, which stores the data with the corresponding PIDs.
Papers by Peter Mutschke