Data science: machine learning
How National Highways information is to be used in the building of machine learning algorithms.
Machine learning means building mathematical models to predict an outcome, using techniques drawn from computer science and statistics.
The potential benefits of machine learning must be balanced against any risk to National Highways information and any risk of losing the trust of the public.
Any machine learning done using National Highways information, either by National Highways or a National Highways Supplier, must reflect National Highways corporate values and meet the National Highways information ethics requirement.
How the requirement is implemented will depend largely on how a Supplier manages their data science value chain, that is, how the Supplier moves from identifying a problem statement to delivering a productionised machine learning solution.
However, certain conditions must be met:
Have an ethics checklist for each stage of the data science value chain that aligns with National Highways ethical values.
- approvals and release notes
- relevant defect reports during development
- descriptive statistics and suitability commentary for the datasets used
- retention of retired models, for a suitable period, to allow retrospective decision analysis
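The descriptive statistics item above could be produced along these lines. This is a minimal sketch using only the Python standard library; the function name `describe_column` and the sample speed readings are hypothetical and are not part of any National Highways tooling.

```python
import statistics

def describe_column(values):
    """Summarise one numeric column, counting missing entries (None) separately.

    Raises statistics.StatisticsError if every entry is missing.
    """
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        # The sample standard deviation needs at least two observations.
        "stdev": statistics.stdev(present) if len(present) > 1 else 0.0,
        "min": min(present),
        "max": max(present),
    }

# Hypothetical sample: vehicle speeds in mph, with one missing reading.
speeds = [62, 58, None, 70, 66]
summary = describe_column(speeds)
```

A summary like this, stored alongside a short commentary on whether the dataset suits the problem, gives reviewers something concrete to check.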
Interoperability and portability
Wherever possible, machine learning content must be vendor agnostic, to avoid vendor 'lock-in'.
Further guidance on how to achieve interoperability and portability:
Assess the impact of incorrect predictions and, where reasonable, design systems with human-in-the-loop review processes.
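One common way to put a human in the loop is to act automatically only on high-confidence predictions and queue the rest for review. The sketch below assumes the model reports a confidence score between 0 and 1. The `Prediction` class, the `route` function and the 0.9 threshold are illustrative assumptions. In practice the threshold would follow from the assessed impact of an incorrect prediction.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    label: str
    confidence: float  # model's estimated probability for `label`, 0.0 to 1.0

def route(prediction, threshold=0.9):
    """Return 'auto' to act on the prediction, or 'human_review' to queue it.

    The default threshold is an assumption for illustration only.
    """
    if prediction.confidence >= threshold:
        return "auto"
    return "human_review"

# Hypothetical usage: a low-confidence prediction is sent to a reviewer.
decision = route(Prediction(label="incident", confidence=0.6))
```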
Continuously develop processes that allow National Highways to understand, document and monitor bias in development and production.
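As one possible starting point for monitoring bias, the rate of positive predictions can be compared across groups. The sketch below computes that rate per group and the largest gap between groups, a measure sometimes called the demographic parity difference. This metric is a choice made for this example, not one mandated by this guidance, and the group labels and records are made up.

```python
from collections import defaultdict

def positive_rate_by_group(records):
    """records: iterable of (group, predicted_positive) pairs."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, predicted_positive in records:
        totals[group] += 1
        if predicted_positive:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}

def parity_gap(rates):
    """Largest difference in positive-prediction rate between any two groups."""
    return max(rates.values()) - min(rates.values())

# Hypothetical predictions for two groups, A and B.
records = [
    ("A", True), ("A", False), ("A", True), ("A", True),
    ("B", True), ("B", False), ("B", False), ("B", False),
]
rates = positive_rate_by_group(records)
gap = parity_gap(rates)
```

Logging the gap at each release and at intervals in production gives a documented trend that can be reviewed over time.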
Explainability versus 'black box' techniques
Where possible, develop tools and processes to continuously improve transparency and explainability of machine learning systems.
- Partnership on AI - about machine learning
- Cornell University - explainable machine learning in development
Develop the infrastructure to allow for a level of reproducibility of the model across different types of machine learning systems.
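Reproducibility usually starts with fixing random seeds and recording exactly which configuration and data produced a model. The following is a minimal sketch under those assumptions. The function names are hypothetical, and a real pipeline would also record library versions and the runtime environment.

```python
import hashlib
import json
import random

def run_fingerprint(config, data_rows):
    """Hash the configuration and training data so a run can be matched later."""
    payload = json.dumps({"config": config, "data": data_rows}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def sample_training_rows(data_rows, k, seed):
    """Draw a sample with a local, seeded generator so the draw is repeatable."""
    rng = random.Random(seed)
    return rng.sample(data_rows, k)

# Hypothetical configuration and data.
config = {"model": "example", "learning_rate": 0.1, "seed": 42}
rows = list(range(100))
first = sample_training_rows(rows, k=10, seed=config["seed"])
second = sample_training_rows(rows, k=10, seed=config["seed"])
fingerprint = run_fingerprint(config, rows)
```

Using a local `random.Random` instance rather than the module-level generator keeps the sample independent of any other code that draws random numbers.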
If the machine learning output has the potential to change the nature or the amount of work for human operators, this must be called out so that it can be determined whether business change processes can be developed to mitigate the impact on workers whose tasks are automated.
Ensure that accuracy metrics take the information lifecycle of the dataset into account.
If personally identifiable information is used, the techniques used must allow for privacy by design principles.
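One technique that can support privacy by design is to replace direct identifiers with keyed hashes before the data reaches the modelling stage. The sketch below uses HMAC-SHA256 from the Python standard library. The function name, the example identifier and the key are hypothetical. This is pseudonymisation rather than anonymisation, so the output would normally still need to be treated as personal data, and the key must be stored securely and separately from the dataset.

```python
import hashlib
import hmac

def pseudonymise(identifier, secret_key):
    """Return a stable keyed hash of `identifier`.

    The same identifier and key always give the same token, so records can
    still be joined, but the token cannot be recomputed without the key.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical key and identifier, for illustration only.
key = b"example-key-held-in-a-secrets-store"
token = pseudonymise("AB12 CDE", key)
```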
Data risk management
Develop and improve reasonable capabilities to ensure data and model security are incorporated during the development of machine learning systems.
Auditability and change control
When models need to be changed, follow change-control processes to include:
- storage of previously used models
- governance records (for example person authorising changes, date of change and so on)
- model deployment documentation (for example impact assessment, release notes and roll-back analysis)
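The change-control items above could be captured in a single structured record per model change. The sketch below is one possible shape, not a prescribed schema. The field names, the `record_change` helper and the sample values are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, asdict
from datetime import date

@dataclass(frozen=True)
class ModelChangeRecord:
    model_version: str
    model_sha256: str   # ties the record to the exact stored model artefact
    authorised_by: str
    change_date: str    # ISO 8601 date
    release_notes: str
    rollback_to: str    # version to restore if the change is reverted

def record_change(model_bytes, version, authorised_by, change_date,
                  release_notes, rollback_to):
    """Build an immutable governance record for one model change."""
    digest = hashlib.sha256(model_bytes).hexdigest()
    return ModelChangeRecord(
        model_version=version,
        model_sha256=digest,
        authorised_by=authorised_by,
        change_date=change_date.isoformat(),
        release_notes=release_notes,
        rollback_to=rollback_to,
    )

# Hypothetical change: the serialised model is represented by placeholder bytes.
record = record_change(
    b"serialised-model-placeholder", "1.1.0", "A. Reviewer",
    date(2024, 1, 15), "Retrained on updated data.", "1.0.0",
)
record_as_dict = asdict(record)  # ready to be written to an audit store
```

Storing the hash alongside the retained model file makes it possible to confirm later that the archived model is the one that was actually deployed.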
Continuously tune models so that the outputs are in line with what's expected from the model.
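Deciding when a model needs tuning requires a check that compares recent performance with what was expected at release. The following is a minimal sketch, assuming observed outcomes eventually become available for each prediction. The function name and the 0.05 tolerance are illustrative assumptions.

```python
def needs_retuning(baseline_accuracy, recent_outcomes, tolerance=0.05):
    """Flag a model whose recent accuracy has dropped below its baseline.

    recent_outcomes: list of booleans, True where the prediction matched the
    observed outcome. Returns False when there is nothing to evaluate yet.
    """
    if not recent_outcomes:
        return False
    recent_accuracy = sum(recent_outcomes) / len(recent_outcomes)
    return (baseline_accuracy - recent_accuracy) > tolerance

# Hypothetical check: 8 of the last 10 predictions were correct against a
# baseline accuracy of 0.9.
flag = needs_retuning(0.9, [True] * 8 + [False] * 2)
```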