Workshops

Day 2 Workshops (20th May)

Data quality workshop

Practical Tools for Using Supermarket Data in Population and Planetary Health Research

A Community Dialogue on the Increasing Challenge of Data Quality in Digital Footprint Data

Day 3 Workshops (21st May)

Reimagining Data Donation – The Data Donation project for Climate Change (CDat and Beyond)

How can we better engage the public in Digital Footprints research?

Showcasing the Priority Places for Food Index among end-users

Data quality workshop

Organisers/speakers: Roy Ruddle, University of Leeds

Consider these three facts:
(a) There are more than 100 ways in which data can be of “low” quality,
(b) Data preparation (which includes data quality checks) often takes more than half of a data science project’s time, and
(c) Few data scientists or analysts have had much training in data quality.

This practical workshop will teach you an efficient and rigorous method for investigating data quality that will:
1. Save you time, by defining a set of tasks and questions to ask about your data
2. Reduce cost by avoiding re-work
3. Improve your results, by correcting your assumptions and understanding any limitations of your data

Structure

You will learn the what, when, how and why of investigating data quality.

What? Will combine attendees’ own experiences with more than 40 tasks and associated questions to ask about your data that have been documented in plain English in a publicly accessible Practitioner’s Guide (https://doi.org/10.5518/1481).

When? Utilises a workflow with the following steps: (1) Look at your data (is anything obviously wrong or missing?), (2) Watch out for special values, (3) Is any data missing? (4) Check each variable (is it what you expect?), (5) Check combinations of variables (do they violate any of your assumptions or business rules?), and (6) Characterise the cleaned data.
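As a rough illustration of steps (2) to (5), the following is a minimal Python sketch, not the method taught in the workshop or documented in the Practitioner’s Guide. The toy records, field names, placeholder codes (-1 and 999), age range and date rule are all hypothetical assumptions chosen for the example.

```python
# Hypothetical toy records; all field names and values are illustrative assumptions.
records = [
    {"id": 1, "age": 34, "admitted": "2021-03-01", "discharged": "2021-03-05"},
    {"id": 2, "age": -1, "admitted": "2021-04-10", "discharged": "2021-04-09"},
    {"id": 3, "age": None, "admitted": "2021-05-02", "discharged": None},
    {"id": 4, "age": 999, "admitted": "2021-06-15", "discharged": "2021-06-20"},
]

# Assumed placeholder codes that may stand in for "unknown".
SPECIAL_VALUES = {-1, 999}


def special_value_counts(rows, fields, special):
    """Step 2: count occurrences of assumed special values per field."""
    return {f: sum(1 for r in rows if r[f] in special) for f in fields}


def missing_counts(rows, fields):
    """Step 3: count missing (None) values per field."""
    return {f: sum(1 for r in rows if r[f] is None) for f in fields}


def out_of_range(rows, field, low, high):
    """Step 4: ids of rows whose value is present but outside [low, high]."""
    return [
        r["id"]
        for r in rows
        if r[field] is not None and not (low <= r[field] <= high)
    ]


def rule_violations(rows):
    """Step 5: ids of rows discharged before admission.

    ISO 8601 date strings sort chronologically, so string comparison suffices.
    """
    return [
        r["id"]
        for r in rows
        if r["discharged"] is not None and r["discharged"] < r["admitted"]
    ]


fields = ["age", "admitted", "discharged"]
special = special_value_counts(records, ["age"], SPECIAL_VALUES)
missing = missing_counts(records, fields)
bad_age = out_of_range(records, "age", 0, 120)
bad_dates = rule_violations(records)

print("special values:", special)
print("missing values:", missing)
print("age out of range (ids):", bad_age)
print("discharged before admitted (ids):", bad_dates)
```

Each check returns counts or row identifiers rather than silently fixing the data, so that decisions about correction can be made, and documented, separately.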

How? Will include both computational and visualisation techniques.

Why? Will centre on gathering evidence of benefits and the value proposition.

Practical examples will include showcasing application of the method on Open Data and to understanding structures of missing data in a large electronic health records dataset.

Practical Tools for Using Supermarket Data in Population and Planetary Health Research

Organisers/speakers: Dr Emma Wilkins, University of Leeds; Dr Alice Kininmonth, University of Leeds; Dr Vicki Jenneson, University of Leeds; Dr Romana Burgess, University of Bristol; Prof. Mark Green, University of Liverpool

The global food system accounts for one third of greenhouse gas emissions, 70% of freshwater withdrawals, and uses a third of available land. To meet net zero targets by 2050, consumer-side reductions of 15-30% in food and drink-related greenhouse gas emissions are needed in the UK. Concurrently, 11 million deaths are attributable to poor diet globally.

In the UK, two-thirds of adults and one-third of children are living with overweight or obesity, and public health nutrition continues to be high on the political agenda. For example, several extensions to nutrition-related legislation are planned in the next 12 months, including for sugar taxes and the placement and promotion of ‘high in fat, sugar, and salt’ (HFSS) foods.

To address the dual challenge of public and planetary health, we need a comprehensive understanding of current food and drink consumption patterns.

Traditional methods for dietary assessment, such as food frequency questionnaires and diet diaries, are time-consuming, meaning sample sizes are typically small and subject to selection bias. Larger surveys generally lack product-level granularity, e.g. collecting data on consumption patterns across broad food categories such as ‘cakes’ or ‘ready meals’. These categories encompass a wide variety of food and drink products that can differ considerably in their impact on population and planetary health. Reporting bias is also a well-known problem in traditional dietary assessment.

Supermarket transaction data can address many of these challenges by providing objective data on food purchasing behaviours across large populations, often with highly granular product-level detail. However, these data are complex, and require significant time and expertise to work with. They also present challenges, such as selection bias and difficulty making reliable inferences about consumption behaviours from snapshots of food purchasing patterns.

In this workshop, we will explore the challenges and complexities of working with supermarket retail data and provide an overview of several tools and methods that have been developed to support robust use of supermarket transaction data in population and planetary health research.

A Community Dialogue on the Increasing Challenge of Data Quality in Digital Footprint Data

Organisers/speakers: Gregor Milligan, Sheffield Hallam University; Gavin Long, University of Nottingham; Torran Semple, University of Oxford

Machine learning (ML) is increasingly influencing public social policy and industrial practices. The number of academic publications and conference proceedings incorporating ML approaches continues to increase, with ML now being routinely applied to digital footprints data for mental health outcomes, public health monitoring, social behaviour analysis, and health inequalities assessment.

While the outputs of these ML models have the potential for impact, they, like all ML studies, are directly limited by the quality of the data used to capture the desired observations. These fundamental challenges are often the first topics in ML textbooks or lecture series, accompanied by approaches to exploratory data analysis (EDA).

The phrase “Garbage In, Garbage Out” embodies a core principle of data science, meaning that if poor quality data are used in an ML model, poor results will be produced. This is particularly pertinent when a model is trained to identify drivers of a specific social concern.

If the data are flawed, fail to capture the concern accurately, or are collected in ways that introduce systematic bias, the model outputs are misleading and could incorrectly inform public policy or industrial practice.

Data quality issues are not always obvious, with latent and pernicious problems often evading initial EDA. This challenge is compounded by the increasing use of ML by interdisciplinary researchers, perhaps without formal data science training, who may prioritise methodological novelty within the discipline over rigorous data quality assessment and model evaluation. Peer review processes frequently fail to scrutinise data provenance and quality with the same rigour as methodological approaches, creating a false sense of confidence. When researchers assume that peer-reviewed datasets are inherently trustworthy, flawed data can propagate through the literature, eroding trust among data scientists, publishers, policymakers and the public.

Despite being fundamental to ML for social good, these data quality challenges persist. This workshop brings together data scientists from academia and industry to share experiences, identify common patterns in data quality failures, and begin developing collective strategies for detection and prevention. Through community dialogue, this panel aims to establish a shared understanding of the challenge’s scope and lay the groundwork for collaborative efforts to improve data quality practices across disciplines.

Reimagining Data Donation – The Data Donation project for Climate Change (CDat and Beyond)

Organisers/speakers: Dr Evgeniya Lukinova, Dr Gavin Long, Dr Daniel Fletcher, Professor James Goulding, Professor Alexa Spence

Data donation is emerging as a powerful approach for capturing real-world digital footprints, yet its methodological, ethical, and analytical foundations remain under development. This panel explores the future of data donation in academic research through the Data Donation for Climate Change (CDat) project, an example of data donation and data convergence in practice.

CDat combines a nationally representative UK survey with donated Tesco Clubcard purchasing data, converged with product-level environmental impact metrics, to generate novel insights into everyday food consumption and sustainability. By linking self-reported attitudes with observed behaviour, the project demonstrates how donated data can move research beyond traditional survey methods, enhancing validity while opening new possibilities for scale, precision, and public engagement.

The session will showcase the CDat project and invite critical discussion of the data donation approach. Panellists will reflect on issues of data validity, representativeness, bias, and interpretation, and consider the value—and future directions—of research outputs generated through data donation. Research outputs include:
- a unique linked dataset integrating survey, shopping, and environmental data;
- the development of a novel Environmental Food Purchasing Index (EFPI); and
- an interactive web portal enabling individuals to visualise the environmental impact of their own food purchasing.

Together, the papers examine what it means to create meaningful knowledge from donated data, how data convergence can strengthen behavioural research, and how researchers might return value to participants and the public.

Panel Papers
Donating and Validating Supermarket Data - Dr Evgeniya Lukinova
As online research faces growing concerns about synthetic and LLM-generated responses, donated supermarket data offers a robust methodological safeguard. Accessed via participant-specific verification codes, shopping data provides a strong link to real-world behaviour. This paper shows how data donation can enhance research credibility while exposing biases inherent in survey-only approaches.

Data Convergence and Measurement: Developing the Environmental Food Purchasing Index - Dr Gavin Long
Measuring the environmental impact of food purchasing is methodologically complex. Building on leading research (e.g. Poore & Nemecek, 2018; Clarke et al., 2022), this paper introduces the Environmental Food Purchasing Index (EFPI), a composite measure capturing the cumulative environmental impact of grocery shopping based on actual consumption patterns.

Creating Meaning from Data: Perceptions and Misperceptions of Sustainable Food - Dr Daniel Fletcher
By linking donated Tesco Clubcard data with survey responses and product-level scientific assessments of environmental impact, this paper directly compares perceived and objective measures of the environmental footprint of food consumption. The findings reveal general awareness of which food categories are environmentally impactful, but also expose a critical blind spot: consumers systematically underestimate the extent to which within-category choices—particularly among meat products—drive overall environmental impact.

Visualising Data for Public Engagement: The CDat.food Portal - Prof James Goulding & Prof Alexa Spence
This paper presents CDat.food, a web portal enabling individuals to visualise their food shopping and associated environmental impacts using their own Clubcard data. Developed through workshops and interviews, the portal represents a step toward participatory data donation infrastructures that return insight and value to contributors.

How can we better engage the public in Digital Footprints research?

Organisers/speakers: Romana Burgess; Samaira Khan (PEDRI); Mark Gardner (SDR UK); Fiona Jamieson & Katie Burns (The PSC); Anya Skatova, University of Bristol

The increase in Digital Footprints research brings huge potential for public benefit, but also significant ethical, governance, and trust challenges. As the number of these projects increases, so too does the need for meaningful, ongoing dialogue between researchers and the members of the public whose data are represented.
This session builds on work by the Public Engagement for Data Research Initiative (PEDRI) to explore how researchers can embed authentic public involvement throughout their Digital Footprints research.

Our aims are twofold:
1. To share practical insights into what effective public engagement looks like in data-intensive contexts; and
2. To give public contributors themselves a platform, acknowledging how their expertise and lived experience can shape research.

Through this session, we aim to highlight how working with the public can strengthen both research quality and legitimacy. Drawing on real examples from UK-based Digital Footprints projects, we will discuss barriers and enablers to engagement, explore how researchers can create inclusive spaces for dialogue, and examine how engagement outcomes can influence data governance, methods, and dissemination.

Showcasing the Priority Places for Food Index among end-users

Organisers/speakers: Andy Newing, Rachel Oldroyd, University of Leeds

Growing rates of food insecurity have social, economic and health impacts at the local level. Local authorities, charities and commercial organisations require neighbourhood insights to:

• identify communities in need of support in accessing healthy and affordable food
• link individuals’ experiences in accessing food to their neighbourhood characteristics
• understand the localised drivers of poor access to food
• enable targeted interventions at a variety of spatial scales

University of Leeds research has generated data products which support the need for local evidence-based interventions, enabling targeted support in accessing healthy and affordable food:

1. E-Food Desert Index (EFDI) – a neighbourhood-level composite indicator capturing characteristics associated with contemporary food deserts.

2. Priority Places for Food Index (PPFI) (co-designed and developed with consumer organisation Which?) – builds on the EFDI and acts as an indicator of neighbourhoods most likely to need support in accessing healthy and affordable food, capturing additional metrics of foodstore provision and household-level impacts of the cost-of-living crisis.

Which? used the PPFI as the cornerstone of its ‘Affordable Food For All’ campaign, lobbying supermarkets and policy makers to address access to food at a neighbourhood level. The PPFI has been viewed online over 21,000 times and, following its migration to the Healthy and Sustainable Places (HASP) Data Service in 2025, has been downloaded by a number of users in the local government and charity sectors. Information collected at the time of download suggests that users are linking these data to other local datasets, including those derived from digital footprints, and using them to support local decision making.

This interactive workshop specifically targets those non-academic users of the PPFI and EFDI and seeks to:

• Understand their use of these tools in supporting evidence-based decision making, targeting support and interventions to those communities most in need.
• Facilitate networking and information sharing between these users.
• Connect users with academics, enabling further support in using these tools, co-design of further work, and identification of potential future improvements to these tools.