Step 3: Data Collection, Selection, and Preparation

How is data selected and reviewed, and who does it?
Who owns the data?
Is there a language selection decision to be made?
 

 *This Framework does not provide an exhaustive list of the decision points to be addressed in an AI/ML project. Rather, it focuses on tools and guidance for the decision points that offer particularly strong opportunities for co-creation. Below are additional questions relevant to “Data Collection, Selection, and Preparation” for consideration:

  • Have you identified all existing data sets and alternative data sources? Are there potential partner organisations that could provide access to data?

  • Is the data publicly available, or does it require third-party approval? Are there any legal requirements or barriers?

  • Lineage of data: Where does the data originally come from? How was it collected, curated and moved? Is its accuracy maintained over time?

  • How will training data be labelled? Are personal or organisational biases involved?
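
One way a project team might keep the answers to the lineage and ownership questions above in a reviewable form is a small record per data set. The following is a minimal sketch only, not part of the Framework: the class names, field names, and example values are all hypothetical, and a real project would adapt them to its own governance process.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class LineageStep:
    """One event in a data set's history (hypothetical structure)."""
    action: str   # e.g. "collected", "curated", "moved", "labelled"
    actor: str    # person or organisation responsible for the step
    when: date
    note: str = ""


@dataclass
class DatasetRecord:
    """Answers to the checklist questions for a single data set."""
    name: str
    original_source: str
    owner: str
    requires_third_party_approval: bool
    steps: List[LineageStep] = field(default_factory=list)

    def add_step(self, action: str, actor: str, when: date, note: str = "") -> None:
        """Append one lineage event to the record."""
        self.steps.append(LineageStep(action, actor, when, note))

    def last_recorded_step(self) -> Optional[date]:
        """Date of the most recent recorded step, or None if none exist."""
        return max((s.when for s in self.steps), default=None)


# Illustrative values only; none of these come from a real project.
record = DatasetRecord(
    name="household-survey",
    original_source="partner organisation field survey",
    owner="partner organisation",
    requires_third_party_approval=True,
)
record.add_step("collected", "field team", date(2023, 1, 5))
record.add_step("labelled", "volunteer annotators", date(2023, 3, 1),
                note="labelling guidelines reviewed for bias")
print(record.last_recorded_step())
```

Keeping the record as plain structured data means it can be shared with partner organisations and community reviewers without requiring them to read code.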