Data science machine learning
How National Highways information is to be used in the building of machine learning algorithms.
Machine learning means building mathematical models to predict an outcome, using techniques drawn from computer science and statistics.
The potential benefits of machine learning must be balanced against any risk to National Highways information or losing the trust of the public.
Requirement
Any machine learning done using National Highways information, either by National Highways or a National Highways Supplier, must reflect National Highways corporate values and meet National Highways information ethics requirements.
Specification
How the requirement is implemented will depend largely on how a Supplier manages their data science value chain.
That is, how a Supplier moves from the identification of a problem statement to the delivery of a productionised machine-learning solution.
However, certain conditions must be met:
Ethical
Have an ethics checklist for each stage of the data science value chain that aligns with National Highways ethical values.
Auditability
Keep records that allow model development and model decisions to be audited. This includes:
- approvals and release notes
- relevant defect reports during development
- descriptive statistics and suitability commentary for the datasets used
- retention of retired models, for a suitable period, to allow retrospective decision analysis
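As an illustration of the descriptive statistics item above, the following minimal Python sketch summarises one numeric column and fingerprints the data so that an audit record can be tied to the exact dataset used. The function names `summarise_column` and `fingerprint`, the field names and the sample values are hypothetical and are not part of any National Highways specification.

```python
import hashlib
import json
import statistics


def summarise_column(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


def fingerprint(rows):
    """Hash the dataset content so the audit record identifies the exact data."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical sample: vehicle speeds in miles per hour.
speeds = [52, 61, 58, 70, 64]
audit_entry = {
    "dataset_sha256": fingerprint(speeds),
    "speed_mph": summarise_column(speeds),
}
```

A suitability commentary would still need to be written by a person; the sketch only covers the parts that can be generated automatically.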
Interoperability and portability
Wherever possible, machine learning content must be vendor agnostic – to avoid vendor 'lock-in'.
Further guidance on how to achieve interoperability and portability:
Other considerations
Human assessment
Assess the impact of incorrect predictions and, where reasonable, design systems with human-in-the-loop review processes.
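One possible shape for a human-in-the-loop process is to act automatically only on high-confidence predictions and to send the rest to a reviewer. The sketch below assumes the model reports a confidence score; the `Prediction` class, the `route` function and the 0.9 threshold are hypothetical, and a real threshold would need to come from the impact assessment.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    label: str
    confidence: float  # model's estimated probability for the label, 0.0 to 1.0


def route(prediction, threshold=0.9):
    """Send low-confidence predictions to a human reviewer."""
    if prediction.confidence >= threshold:
        return "automatic"
    return "human_review"
```

For example, `route(Prediction("incident", 0.6))` returns `"human_review"`, while the same label at a confidence of 0.95 is handled automatically.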
Bias evaluation
Continuously develop processes that allow National Highways to understand, document and monitor bias in development and production.
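As one example of a bias measure that could be documented and monitored, the sketch below computes the demographic parity difference: the largest gap in positive prediction rate between groups. This is only one of several possible fairness metrics, and its choice here is an assumption, as are the function names and the sample data.

```python
from collections import defaultdict


def positive_rates(groups, predictions):
    """Share of positive (1) predictions for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, pred in zip(groups, predictions):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(groups, predictions):
    """Largest gap in positive prediction rate between any two groups."""
    rates = positive_rates(groups, predictions)
    return max(rates.values()) - min(rates.values())


# Hypothetical data: group membership and binary model predictions.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
gap = demographic_parity_difference(groups, predictions)
```

In this sample, group "a" receives a positive prediction 75% of the time and group "b" 25% of the time, so the gap is 0.5. Tracking the same figure in both development and production makes changes over time visible.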
Explainability versus 'black box' techniques
Where possible, develop tools and processes to continuously improve transparency and explainability of machine learning systems.
Further guidance:
- Partnership on AI - about machine learning
- Cornell University - explainable machine learning in development
Reproducible operations
Develop the infrastructure to allow for a level of reproducibility of the model across different types of machine learning systems.
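A small part of such infrastructure could be a record of what is needed to repeat a training run, together with seeded random sampling. The sketch below is a minimal illustration under that assumption; `run_manifest`, `sample_training_rows` and the manifest fields are hypothetical names, and a full solution would also need to capture library versions and the data used.

```python
import hashlib
import json
import platform
import random


def run_manifest(config, seed):
    """Capture what is needed to repeat a training run."""
    config_bytes = json.dumps(config, sort_keys=True).encode("utf-8")
    return {
        "seed": seed,
        "config_sha256": hashlib.sha256(config_bytes).hexdigest(),
        "python_version": platform.python_version(),
    }


def sample_training_rows(n_rows, sample_size, seed):
    """Draw a repeatable sample of row indices using a local, seeded generator."""
    rng = random.Random(seed)
    return rng.sample(range(n_rows), sample_size)
```

Using a local `random.Random(seed)` rather than the module-level generator keeps the sample repeatable even when other code also draws random numbers.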
Displacement strategy
If the machine learning output has the potential to change the nature or the amount of work for human operators, this must be flagged so that business change processes can be developed to mitigate the impact of automation on workers.
Practical accuracy
Accuracy metrics must take the information lifecycle of the dataset into account.
Privacy
If personally identifiable information is used, the techniques used must allow for privacy by design principles.
For example, differential privacy, homomorphic encryption, or a model design that allows individuals to withdraw their consent for processing.
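To make the differential privacy example more concrete, the sketch below applies the Laplace mechanism to a counting query: noise with scale 1/epsilon is added to the true count before release. It is a teaching sketch under stated assumptions, not a production implementation; the function names and the values of `true_count` and `epsilon` are hypothetical, and the standard `random` module is used only for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(true_count, epsilon, rng):
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(1 / epsilon, rng)


# Illustrative only: a real system would use a vetted privacy library.
rng = random.Random()
noisy = private_count(true_count=120, epsilon=0.5, rng=rng)
```

A smaller epsilon adds more noise and gives stronger privacy at the cost of accuracy, so the choice of epsilon is a policy decision as much as a technical one.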
Data risk management
Develop and improve reasonable capabilities to ensure data and model security are incorporated during the development of machine learning systems.
Auditability and change control
When models need to be changed, follow change-control processes to include:
- storage of previously used models
- governance records (for example person authorising changes, date of change and so on)
- model deployment documentation (for example impact assessment, release notes and roll-back analysis)
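The records listed above could be captured in a single structured entry per model change. The following sketch shows one possible layout; the `ModelChangeRecord` class and all of its field names are hypothetical and would need to be aligned with the Supplier's own change-control tooling.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class ModelChangeRecord:
    model_name: str
    new_version: str
    previous_version: str  # retained in storage to allow roll-back
    authorised_by: str
    change_date: date
    impact_assessment: str
    release_notes: str
    rollback_analysis: str


def to_log_entry(record):
    """Convert a record to a plain dict with an ISO date, ready to serialise."""
    entry = asdict(record)
    entry["change_date"] = record.change_date.isoformat()
    return entry
```

Marking the record as frozen means it cannot be altered after it is created, which suits an audit trail.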
Tuning
Continuously tune models so that the outputs are in line with what's expected from the model.
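One simple way to decide when tuning is needed is to compare recent accuracy against the accuracy recorded when the model was approved. The sketch below assumes that the correctness of recent predictions can be established after the event; the function name `needs_retuning` and the 0.05 tolerance are hypothetical.

```python
def needs_retuning(baseline_accuracy, recent_outcomes, tolerance=0.05):
    """Flag a model whose recent accuracy has dropped below baseline minus tolerance.

    recent_outcomes is a list of booleans: True where the prediction was correct.
    """
    if not recent_outcomes:
        return False  # nothing to evaluate yet
    recent_accuracy = sum(recent_outcomes) / len(recent_outcomes)
    return recent_accuracy < baseline_accuracy - tolerance
```

With a baseline of 0.9, seven correct predictions out of ten (0.7) would trigger a review, while nine out of ten would not.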