News from eLiteracias

International Journal of Digital Curation

Documentation and Visualisation of Workflows for Effective Communication, Collaboration and Publication @ Source

Workflows processing data from research activities and driving in silico experiments are becoming an increasingly important method for conducting scientific research. Workflows have the advantage that not only can they be automated and used to process data repeatedly, but they can also be reused, in part or in whole, enabling them to be evolved for use in new experiments. A number of studies have investigated strategies for storing and sharing workflows for the benefit of reuse. These have revealed that simply storing workflows in repositories without additional context does not enable workflows to be successfully reused. These studies have investigated what additional resources are needed to support users of workflows, in particular provenance traces and ways of making workflows and their resources machine-readable. These additions also include metadata for curation, annotations for comprehension, and data sets that give the workflow additional context. Ultimately, though, these mechanisms still rely on researchers having access to the software to view and run the workflows. We argue that there are situations where researchers want an understanding of a workflow that goes beyond what provenance traces provide, without having to run the workflow directly; indeed, there are many situations in which running the original workflow is difficult or impossible. To that end, we have investigated the creation of an interactive workflow visualisation that captures the flow-chart element of the workflow together with additional context, including annotations, descriptions, parameters, metadata, and input, intermediate, and results data, all of which can be added to the record of a workflow experiment to both enhance curation and add value for reuse. We have created interactive workflow visualisations for the popular workflow creation tool KNIME, which does not provide users with a built-in function to extract provenance information, which can otherwise only be viewed through the tool itself. Making use of KNIME's strengths for adding documentation and user-defined metadata, we can extract and create a visualisation and curation package that encourages and enhances curation@source, facilitating effective communication, collaboration, and reuse of workflows.

  • 16 September 2017, 21:00
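The visualisation described above is built from information KNIME already stores on disk. As a rough, illustrative sketch of what extracting the flow-chart element might involve, the Python below parses a workflow.knime XML file into a JSON node/edge graph that a standalone web page could render without KNIME installed. The XML namespace and key names (nodes, connections, sourceID, destID) are assumptions about KNIME's on-disk format, not the authors' actual tooling.

    # Illustrative sketch only: parse a KNIME workflow.knime file into a
    # JSON node/edge graph for a standalone viewer. The namespace and the
    # key names below are assumptions, not a documented KNIME schema.
    import json
    import xml.etree.ElementTree as ET

    NS = {"k": "http://www.knime.org/2008/09/XMLConfig"}  # assumed namespace

    def entries(cfg):
        # Collect <entry key="..." value="..."/> children into a dict.
        return {e.get("key"): e.get("value") for e in cfg.findall("k:entry", NS)}

    def extract_graph(workflow_file):
        root = ET.parse(workflow_file).getroot()
        graph = {"nodes": [], "edges": []}
        nodes_cfg = root.find("k:config[@key='nodes']", NS)
        for node in ([] if nodes_cfg is None else nodes_cfg.findall("k:config", NS)):
            e = entries(node)
            graph["nodes"].append({"id": e.get("id"),
                                   "settings_file": e.get("node_settings_file")})
        conns_cfg = root.find("k:config[@key='connections']", NS)
        for conn in ([] if conns_cfg is None else conns_cfg.findall("k:config", NS)):
            e = entries(conn)
            graph["edges"].append({"source": e.get("sourceID"),
                                   "dest": e.get("destID")})
        return graph

    if __name__ == "__main__":
        print(json.dumps(extract_graph("workflow.knime"), indent=2))

The resulting JSON, combined with the annotations and user-defined metadata the abstract mentions, is the kind of self-contained record a curation@source package could carry.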

Connecting Data Publication to the Research Workflow: A Preliminary Analysis

The data curation community has long encouraged researchers to document collected research data during active stages of the research workflow, to provide robust metadata earlier, and to support research data publication and preservation. Data documentation with robust metadata is one of a number of steps in effective data publication. Data publication is the process of making digital research objects ‘FAIR’, i.e., findable, accessible, interoperable, and reusable: attributes increasingly expected by research communities, funders, and society. Research data publishing workflows are the means to that end. Currently, however, much published research data remains inconsistently and inadequately documented by researchers. Documenting data closer in time to its collection would help mitigate the high cost that repositories associate with the ingest process. More effective data publication and sharing should in principle result from early interactions between researchers and their selected data repository. This paper describes a short study undertaken by members of the Research Data Alliance (RDA) and World Data System (WDS) working group on Publishing Data Workflows. We present a collection of recent examples of data publication workflows that connect data repositories and publishing platforms with research activity ‘upstream’ of the ingest process. We re-articulate previous recommendations of the working group to account for the varied upstream service components and platforms that support the flow of contextual and provenance information downstream. These workflows should be open and loosely coupled to support interoperability, including with preservation and publication environments. Our recommendations aim to stimulate further work on researchers’ views of data publishing and on the extent to which available services and infrastructure facilitate the publication of FAIR data. We also aim to stimulate further dialogue about, and definition of, the roles and responsibilities of research data services and platform providers for the ‘FAIRness’ of research data publication workflows themselves.

  • 16 September 2017, 21:00
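As a concrete, if simplified, reading of ‘FAIR’ as a property of the published record, the sketch below checks a dataset record for the minimal metadata each facet implies. The field names are illustrative assumptions, not any repository's actual schema.

    # Illustrative sketch: report missing metadata per FAIR facet for a
    # dataset record. Field names are invented for illustration.
    REQUIRED = {
        "findable": ["identifier", "title", "description", "keywords"],
        "accessible": ["access_url", "access_rights"],
        "interoperable": ["metadata_standard", "format"],
        "reusable": ["license", "provenance", "creator"],
    }

    def fairness_gaps(record):
        # Return, for each facet, the required fields the record lacks.
        return {facet: [f for f in fields if not record.get(f)]
                for facet, fields in REQUIRED.items()}

    record = {"identifier": "doi:10.5555/example",  # hypothetical DOI
              "title": "Sensor readings, site A",
              "access_url": "https://repository.example.org/d/1",
              "license": "CC-BY-4.0"}
    print(fairness_gaps(record))

Run against records ‘upstream’ of ingest, a check like this is one way a repository could surface documentation gaps while the researcher can still fill them.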

Choose Your Own Research Data Management Guidance

The GW4 Research Data Services Group has developed a Research Data Management Triage Tool to help researchers find answers quickly to the more common research data queries, and direct them to appropriate guidance and sources of advice for more complex queries. The tool takes the form of an interactive web page that asks users questions and updates itself in response. The conversational and dynamic way the tool progresses is similar to the behaviour of text adventures, a genre of interactive fiction that is one of the oldest forms of computer game and was also popular in print in, for example, the Choose Your Own Adventure and Fighting Fantasy series of books. In fact, the tool was written using interactive fiction software. It was tested with staff and students at the four UK universities within the GW4 collaboration.

  • 16 September 2017, 21:00
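The GW4 tool was built with interactive-fiction software; the sketch below shows the same choose-your-own-path mechanic as a plain decision tree, with questions and guidance invented for illustration (they are not the tool's actual content).

    # Illustrative sketch of a triage decision tree: each node is either a
    # question with answer branches or a piece of guidance. All text below
    # is invented, not taken from the GW4 tool.
    TREE = {
        "start": ("Is your query about storing data or sharing data?",
                  {"storing": "storage", "sharing": "sharing"}),
        "storage": ("Is the data sensitive or confidential?",
                    {"yes": "Contact your data protection team.",
                     "no": "Use your institutional research storage service."}),
        "sharing": ("Has the related work been published yet?",
                    {"yes": "Deposit the data in your institutional repository.",
                     "no": "Check your funder's data sharing policy first."}),
    }

    def triage(node="start"):
        question, branches = TREE[node]
        answer = ""
        while answer not in branches:   # re-ask until a known answer is given
            answer = input(f"{question} {sorted(branches)}: ").strip().lower()
        outcome = branches[answer]
        if outcome in TREE:             # the branch leads to another question
            triage(outcome)
        else:                           # a leaf: print the guidance and stop
            print(outcome)

    if __name__ == "__main__":
        triage()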

Amplifying Data Curation Efforts to Improve the Quality of Life Science Data

In the era of data science, datasets are shared widely and used for many purposes unforeseen by their original creators. In this context, defects in datasets can have far-reaching consequences, spreading from dataset to dataset and affecting the consumers of data in ways that are hard to predict or quantify. Some form of waste is often the result. For example, scientists using defective data to propose hypotheses for experimentation may waste their limited wet lab resources chasing the wrong experimental targets. Scarce drug trial resources may be used to test drugs that actually have little chance of producing a cure.

Because of the potential real-world costs, database owners care about providing high-quality data. Automated curation tools can, to an extent, discover and correct some forms of defect. However, in some areas human curation, performed by highly trained domain experts, is needed to ensure that the data accurately represents our current interpretation of reality. Human curators are expensive, and there is far more curation work to be done than there are curators available to perform it. Tools and techniques are needed to extract the full value from the curation effort currently available.

In this paper, we explore one possible approach to maximising the value obtained from human curators: automatically extracting information about data defects and corrections from the work that the curators do. This information is packaged in a source-independent form, allowing it to be used by the owners of other databases for which human curation effort is unavailable or insufficient. This amplifies the efforts of the human curators, allowing their work to be applied to other sources without requiring any additional effort or change in their processes or tool sets. We show that this approach can discover significant numbers of defects that can also be found in other sources.

  • 16 September 2017, 21:00
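One way to picture the approach: a curator's fix becomes a source-independent defect/correction record that can be replayed against other databases. The record structure and the example values below are assumptions for illustration, not the authors' format.

    # Illustrative sketch: package a curator's fix as a source-independent
    # record, then flag the same defect in a second dataset. The structure
    # and values are invented for illustration.
    from dataclasses import dataclass

    @dataclass
    class CorrectionRecord:
        field: str       # attribute the curator corrected
        defective: str   # value the curator replaced
        corrected: str   # value the curator replaced it with

    def find_matches(records, dataset):
        # Yield (row_index, record) wherever another dataset contains a
        # value a curator has previously corrected elsewhere.
        for i, row in enumerate(dataset):
            for rec in records:
                if row.get(rec.field) == rec.defective:
                    yield i, rec

    harvested = [CorrectionRecord("organism", "E. coli K12",
                                  "Escherichia coli K-12")]
    other_db = [{"id": "P1", "organism": "E. coli K12"},
                {"id": "P2", "organism": "Homo sapiens"}]
    for i, rec in find_matches(harvested, other_db):
        print(f"row {i}: {rec.field!r} may need {rec.corrected!r}")

The curator's workflow is unchanged; the records are harvested from work they were doing anyway, which is the amplification the abstract describes.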

Information Integration for Machine Actionable Data Management Plans

Data management plans are free-form text documents describing the data used and produced in scientific experiments. The complexity of data-driven experiments requires precise descriptions of the tools and datasets used in computations to enable their reproducibility and reuse. Data management plans fall short of these requirements. In this paper, we propose machine-actionable data management plans that cover the same themes as standard data management plans, but in which particular sections are filled with information obtained from existing tools. We present a mapping of tools from the domains of digital preservation, reproducible research, open science, and data repositories to data management plan sections. In doing so, we identify the requirements for a good solution and its limitations. We also propose a machine-actionable data model that enables information integration. The model uses ontologies and is based on existing standards.

  • 16 September 2017, 21:00
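To make the idea concrete: a machine-actionable DMP replaces prose sections with structured fields that tools can fill in and read back. The fragment below is a sketch of what such a record might look like; all field names are illustrative assumptions, not the authors' ontology-based model.

    # Illustrative sketch of a machine-actionable DMP fragment. The field
    # names are assumptions for illustration only.
    import json

    madmp = {
        "dmp": {
            "title": "DMP for a sensor-network experiment",
            "dataset": [{
                "title": "Raw sensor readings",
                "format": "text/csv",        # could be filled by the capture tool
                "size_bytes": 104857600,     # could be filled by the repository
                "preservation_statement": "retain for 10 years",
                "distribution": {
                    "host": "https://repository.example.org",  # hypothetical
                    "license": "CC-BY-4.0",
                    "pid": "doi:10.5555/example.dataset",  # assigned on deposit
                },
            }],
        }
    }
    print(json.dumps(madmp, indent=2))

Because every value sits in a named field, other tools (repositories, preservation systems) can both populate and validate the plan, which is the information integration the paper targets.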

Standardising and Harmonising Research Data Policy in Scholarly Publishing

To address the complexities researchers face during publication, and the potential community-wide benefits of wider adoption of clear data policies, the publisher Springer Nature has developed a standardised, common framework for the research data policies of all its journals. An expert working group was convened to audit and identify common features of the research data policies of the journals published by Springer Nature, where policies were present. The group then consulted with approximately 30 editors covering all research disciplines within the organisation, and also with academic editors, librarians, and funders; these consultations informed development of the framework and the creation of supporting resources. Four types of data policy were defined in recognition that some journals and research communities are more ready than others to adopt strong data policies. As of January 2017, more than 700 journals had adopted a standard policy, and this number is growing weekly. To enable potential standardisation and harmonisation of data policy across funders, institutions, repositories, societies, and other publishers, the policy framework was made available under a Creative Commons license. However, the framework requires wider debate with these stakeholders, and an Interest Group within the Research Data Alliance (RDA) has been formed to initiate this process.

  • 16 September 2017, 21:00

Research Transparency: A Preliminary Study of Disciplinary Conceptualisation, Drivers, Tools and Support Services

This paper describes a preliminary study of research transparency, drawing on findings from four focus group sessions with faculty in chemistry, law, urban and social studies, and civil and environmental engineering. The multi-faceted nature of transparency is highlighted by the broad ways in which the faculty conceptualised it (data sharing, ethics, replicability) and by the vocabulary they used, with common core terms identified (data, methods, full disclosure). The associated concepts of reproducibility and trust are noted. The research lifecycle stages are used as a foundation to identify the action verbs and software tools associated with transparency. A range of transparency drivers and motivations are listed. The role of libraries and data scientists is discussed in the context of providing transparency services for researchers.

  • 16 September 2017, 21:00

Next-Generation Data Management Plans: Global, Machine-Actionable, FAIR

At IDCC 2016, the Digital Curation Centre (DCC) and the University of California Curation Center (UC3) at the California Digital Library (CDL) announced plans to merge our respective data management planning tools, DMPonline and DMPTool, into a single platform. By formalizing our partnership and co-developing a core infrastructure for data management plans (DMPs), we aim to meet the skyrocketing demand for our services in our national, and increasingly international, contexts. The larger goal is to engage with what is now a global DMP agenda and help make DMPs a more useful exercise for all stakeholders in the research enterprise. This year we offer a progress report that encompasses our co-development roadmap and future enhancements focused on implementing use cases for machine-actionable DMPs.

  • 16 September 2017, 21:00
