How we shape our Data Platform
Network Data Platform Workshop – March 2023
Members of the National Mouse Genetics Network recently met in Glasgow to discuss ideas and use cases for the Network’s data platform. Cluster leads and data scientists from each cluster represented the Network, and we also invited delegates from associate institutions such as the Dementia Research Institute (DRI) and the Spatial Omics Oxford Pipeline (SpOOX).
After an introduction by Crispin Miller, Data Lead for the Network and Professor of Computational Biology at the Beatson Institute for Cancer Research/University of Glasgow, Dr Holly Hall from his research group opened the workshop with a presentation on Human-Mouse Disease positioning.
This was followed by presentations from each cluster lead, who highlighted the most important use cases for the data platform and its potential contribution to their research programmes. The presentations raised many important questions and led to interesting discussions, and the parallels between the needs of the clusters soon became apparent. Examples of common requirements are:
- The need to store very large volumes (100+ TB) of data and handle them with minimal resource use and waste (e.g., how much raw data is stored and for how long, and how it is processed to minimise data volume)
- The need for submitted data to be standardised in format and quality to achieve a ‘gold standard’ for data deposition
- How data is integrated and made accessible (e.g., an animal-centric model where all data from a single animal can be seen and seamlessly accessed, while also allowing data to be grouped and stratified appropriately)
- Integration of other external databases/deposits where possible
- How data can be edited after submission in a way that avoids multiple resubmissions and allows submissions and edits to be tracked
After a refreshing lunch break, the afternoon session was dedicated to a more in-depth analysis of the key topics highlighted in the morning.
Crispin Miller said: “Data science is a key aspect of the work of the network, and it was exciting bringing the data leads from each disease cluster to consider the use-cases, overlaps and synergies that will help shape our approach to data and to drive collaborations between the disease clusters. It was an exciting and stimulating meeting.”
Watch out for more news and information about our data platform, which we will share throughout the development process.