Transforming data into new knowledge: data pipelines for biodiversity research

“As a little girl, I was roaming around in the forest in spring, enjoying the fact that the snow had melted.
I grew up in Norway; we have long winters and looking for the spring flower was one of the favourite activities for kids… And looking for Hepatica nobilis (liverleaf) was one of the most important things we did because we got to get in the local newspaper if you were the first ones. We knew about specific places where the snow melted first and we had some hints of leaves etc. that indicated that this is the place where we could find this precious flower.”

Remember searching for the first spring flowers as a child?

In our third BioDT Talks episode, Bente Lilja Bye, founder of the research and consulting company BLB, shares how her childhood quests for Hepatica nobilis in Norway and her mother’s meticulous nature diaries evolved into groundbreaking work in biodiversity data science!
The information about the first Hepatica nobilis of the year and the “metadata” around these spring flowers, carefully handwritten in her mother’s diary, were her first experience of collecting data and the first repository of her life. They were also an important route into her current work.

In her talk, Bente Lilja Bye explains in particular why it is worth learning about data pipelines and building Digital Twins for biodiversity.
So, first of all, what is a Digital Twin? A Digital Twin for biodiversity is a sophisticated digital representation of ecosystems, species, and their interactions with the environment. This technology integrates various data sources to create a dynamic simulation that mirrors real-world biological systems. “A simple representation of a Digital Twin is that you have a physical system and a virtual system”, Bente Lilja Bye says. “Data or observations of the physical system are used to create the virtual system. Now, the virtual system is running models etc giving feedback into the physical system and in this way we have a loop called Digital Twin”.
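
The loop described in the quote can be sketched in a few lines of Python. This is only a toy illustration, not BioDT code: the class names, the population figure, the threshold and the "protect"/"monitor" actions are all hypothetical and chosen just to show observations flowing from the physical system to the virtual one and feedback flowing back.

```python
class PhysicalSystem:
    """Hypothetical stand-in for a real ecosystem: just a population count."""

    def __init__(self, population):
        self.population = population

    def observe(self):
        # An observation of the physical system (the "data").
        return self.population

    def apply(self, action):
        # Feedback from the virtual system changes the physical system.
        if action == "protect":
            self.population += 10


class VirtualSystem:
    """Hypothetical stand-in for the model side of the twin."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.state = None

    def update(self, observation):
        # Observations are used to build and refresh the virtual system.
        self.state = observation

    def recommend(self):
        # A trivial "model": suggest protection below the threshold.
        return "protect" if self.state < self.threshold else "monitor"


def run_loop(physical, virtual, steps):
    """Run the observe -> model -> feedback loop for a number of steps."""
    actions = []
    for _ in range(steps):
        virtual.update(physical.observe())
        action = virtual.recommend()
        physical.apply(action)
        actions.append(action)
    return actions


physical = PhysicalSystem(population=80)
virtual = VirtualSystem(threshold=100)
actions = run_loop(physical, virtual, steps=3)
```

A real Digital Twin replaces the one-line rule in `recommend` with simulation and prediction models, but the shape of the loop stays the same.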

Data is the core of a Digital Twin: without data, there would be no Digital Twins. There are currently many sources and many types of data, and the challenge is to collect, harmonise, standardise, and process all this information so that the different types of data can be brought together. Data pipelines are essential for efficiently processing vast amounts of data and providing real-time insights for Digital Twins. In simpler words, they are systems that lead from the collection and acquisition of data to its final transformation into new knowledge or possible decisions. Moreover, data pipelines enable industry, academia, and the public sector to share data more efficiently, facilitating interdisciplinary collaboration.
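
As a minimal sketch of what "collect, harmonise, standardise, process" can mean in practice, the Python example below merges spring-flower sightings from two sources into one format and derives a simple piece of knowledge from them. Everything in it is an assumption made for illustration: the records are invented, and the field names, site names and helper functions are hypothetical rather than part of any BioDT or LifeWatch ERIC tool.

```python
from dataclasses import dataclass
from datetime import date

# Invented example records from two sources with different conventions.
DIARY_RECORDS = [
    {"species": "hepatica nobilis", "seen": "1975-04-02", "place": "Site A"},
    {"species": "Hepatica nobilis ", "seen": "1976-03-28", "place": "Site A"},
]
APP_RECORDS = [
    {"taxon": "HEPATICA NOBILIS", "date": "12/04/1977", "location": "Site B"},
]


@dataclass(frozen=True)
class Observation:
    """Common, standardised record shared by all sources."""
    species: str
    observed_on: date
    site: str


def normalise_name(name):
    # Standardise to "Genus species" capitalisation.
    parts = name.strip().lower().split()
    return " ".join([parts[0].capitalize()] + parts[1:])


def harmonise_diary(rec):
    # Diary dates are assumed to be ISO formatted (YYYY-MM-DD).
    return Observation(
        normalise_name(rec["species"]),
        date.fromisoformat(rec["seen"]),
        rec["place"],
    )


def harmonise_app(rec):
    # App dates are assumed to be DD/MM/YYYY.
    day, month, year = (int(p) for p in rec["date"].split("/"))
    return Observation(
        normalise_name(rec["taxon"]),
        date(year, month, day),
        rec["location"],
    )


def first_flowering_day(observations):
    """Earliest day of year on which each species was seen, per year."""
    earliest = {}
    for obs in observations:
        key = (obs.species, obs.observed_on.year)
        day_of_year = obs.observed_on.timetuple().tm_yday
        earliest[key] = min(day_of_year, earliest.get(key, day_of_year))
    return earliest


def run_pipeline():
    # Collect -> harmonise/standardise -> process into new knowledge.
    observations = [harmonise_diary(r) for r in DIARY_RECORDS]
    observations += [harmonise_app(r) for r in APP_RECORDS]
    return first_flowering_day(observations)


result = run_pipeline()
```

The design point is that each source gets its own small harmonisation step, while everything downstream works on a single shared record type. Adding a new data source then means adding one function rather than changing the analysis.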

“By being a foundational component of a Digital Twin, a data pipeline represents a transformative approach to biodiversity conservation, offering enhanced monitoring capabilities, improved decision-making processes, predictive insights, and fostering collaboration among stakeholders. And these benefits are instrumental in addressing the pressing challenges facing global biodiversity today”.
Watch the video, and find out how we can transform data into new knowledge.

BioDT is a research project funded by the European Union that aims to develop a digital twin prototype for the study and analysis of biodiversity, in support of the EU Biodiversity Strategy for 2030. The Biodiversity Digital Twin prototype provides advanced models for simulation and prediction capabilities, through practical use cases addressing critical issues related to global biodiversity dynamics.
BioDT Talks is a new six-part series illustrating how data science and technology are transforming our approach to the biodiversity crisis.
More information on the BioDT Project HERE.
Watch the full playlist on YouTube and find out more!

Semantic Academy: the registration for the LifeWatch ERIC Intensive School is now open!

In recent years, one of the major challenges in Environmental and Earth Sciences has been managing and searching larger volumes of data, collected across multiple disciplines. Many different standards, approaches, and tools have been developed to support the Data Lifecycle from Data Acquisition to Data Curation, Data Publishing, Data Processing and Data Use. In particular, modern semantic technologies provide a promising way to properly describe and interrelate different data sources in ways that reduce barriers to data discovery, integration, and exchange among biodiversity and ecosystem resources and researchers. Therefore, we are delighted to announce the launch of the 2023 edition of The Semantic Academy – The LifeWatch ERIC Intensive School: Boost your research with semantic artifacts. And this time, we are back in person!


This school is organized by LifeWatch ERIC and will take place in Lecce, from 25 to 29 September 2023.
This edition’s title is “Boost your research with semantic artifacts”. The course is a five-day intensive school that teaches how to create semantic artifacts for a specific domain and use them to annotate and analyse data in a Virtual Research Environment (VRE). It will cover topics such as Data Science, Semantics, Ontology, Vocabularies, and Virtual Research Environments (VREs). The School is therefore mainly aimed at IT architects, Research Infrastructure (RI) service developers and user support staff, and RI staff.

The Semantic Academy will open with a welcome cocktail event and social dinner for participants, while the actual Intensive School programme will last from Monday afternoon to Friday morning, closing with a certificate ceremony.

The outline of the School programme is as follows:

  1. Introducing the LifeWatch ERIC eScience Infrastructure
  2. Ontology Engineering
  3. Designing and Developing vocabularies
  4. Using Semantics for discovering, accessing and analysing data in the Notebook-as-a-VRE (NaaVRE)
  5. Putting everything together: practical activity with participants’ project presentations

EXTENDED DEADLINE: Interested persons are invited to apply by 30 July by filling in the sign-up form here.
Participation is free, but registration is compulsory. Three grants are made available by LifeWatch ERIC to support applicants younger than 30 years. Successful candidates will be offered accommodation for the whole duration of the intensive school on the basis of their motivation letter and their curricula, while travel must be self-funded. LifeWatch ERIC is an equal opportunity organisation, and encourages all qualified candidates to apply, regardless of race, gender, age, national origin, or sexual orientation. Follow LifeWatch ERIC updates!

You can access the dedicated minisite with more detailed information on the Semantic Academy here.
You can find information about other Summer Schools on Data FAIRness previously organised by LifeWatch ERIC and the ENVRI Community on our Training & Education page.