Step 2: Model Cards for Model Reporting

Model cards were first introduced in Mitchell et al. 2019 as an effort to bring a standardized process for documenting models to the machine learning community.

Machine learning models are becoming increasingly widespread, and they are used even in the most sensitive applications (e.g. healthcare, recruitment, education). It therefore seems appropriate to create a standardized system for gathering information about a model's performance characteristics, intended use cases, and potential pitfalls. Model cards provide this type of information and can thus help users make more informed decisions: they can decide, for example, whether a trained machine learning model is suitable for a particular application and context before deployment. A standardized framework also aids comparison across various axes, including not only performance but also ethics and fairness.

A model card is roughly structured as follows [Mitchell et al. 2019]:

Model Details

Basic information about the model.

  • Person or organization developing model

  • Model date

  • Model version

  • Model type

  • Information about training algorithms, parameters, fairness constraints or other applied approaches, and features

  • Paper or other resource for more information

  • Citation details

  • License

  • Where to send questions or comments about the model

Intended Use

Use cases that were envisioned during development.

  • Primary intended uses

  • Primary intended users

  • Out-of-scope use cases

Factors

Factors could include demographic or phenotypic groups, environmental conditions, technical attributes.

  • Relevant factors

  • Evaluation factors

Metrics

Metrics should be chosen to reflect potential real-world impacts of the model.

  • Model performance measures

  • Decision thresholds

  • Variation approaches

Evaluation Data

Details on the dataset(s) used for the quantitative analyses in the card.

  • Datasets

  • Motivation

  • Preprocessing

Training Data

It may not be possible to provide these details in practice. When possible, this section should mirror Evaluation Data. If such detail is not possible, the minimal allowable information should be provided here, such as details of the distribution over various factors in the training datasets.

Quantitative analyses

  • Unitary results

  • Intersectional results
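To make the distinction concrete: unitary results report a metric sliced along one factor at a time, while intersectional results report it on each combination of factors. The following sketch computes both on a hypothetical toy dataset (the records, factors, and accuracy metric are all illustrative assumptions, not from the paper):

```python
from itertools import product

# Hypothetical toy predictions with two demographic factors (illustrative only)
records = [
    # (sex, age, y_true, y_pred)
    ("F", "young", 1, 1),
    ("F", "old",   0, 1),
    ("M", "young", 1, 0),
    ("M", "old",   0, 0),
    ("F", "young", 1, 1),
    ("M", "old",   1, 1),
]

def accuracy(rows):
    """Fraction of rows where the prediction matches the label."""
    return sum(y == p for _, _, y, p in rows) / len(rows)

# Unitary results: slice along a single factor at a time
for sex in ("F", "M"):
    subset = [r for r in records if r[0] == sex]
    print(f"sex={sex}: accuracy={accuracy(subset):.2f}")

# Intersectional results: slice along each combination of factors
for sex, age in product(("F", "M"), ("young", "old")):
    subset = [r for r in records if r[0] == sex and r[1] == age]
    if subset:
        print(f"sex={sex}, age={age}: accuracy={accuracy(subset):.2f}")
```

Note how aggregate or unitary numbers can hide intersectional gaps: here both sexes score the same overall, while individual subgroups differ sharply.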

Ethical considerations

Any additional ethical considerations.

Caveats and recommendations

Any additional caveats or recommendations.
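The outline above can also be treated as a machine-readable schema. The sketch below is a minimal, hypothetical Python representation of a model card rendered to Markdown; the field names simply mirror the sections listed here, and real tooling (such as Google's model card toolkit) defines a much richer schema:

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    # Hypothetical minimal schema mirroring the sections above
    model_details: dict = field(default_factory=dict)
    intended_use: dict = field(default_factory=dict)
    factors: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)
    evaluation_data: dict = field(default_factory=dict)
    training_data: dict = field(default_factory=dict)
    quantitative_analyses: dict = field(default_factory=dict)
    ethical_considerations: str = ""
    caveats_and_recommendations: str = ""

    def to_markdown(self) -> str:
        """Render each populated section as a Markdown heading plus entries."""
        lines = []
        for name, value in self.__dict__.items():
            if not value:
                continue
            lines.append(f"## {name.replace('_', ' ').title()}")
            if isinstance(value, dict):
                lines.extend(f"- **{k}:** {v}" for k, v in value.items())
            else:
                lines.append(value)
        return "\n".join(lines)

# Illustrative usage with made-up values
card = ModelCard(
    model_details={"developer": "Example Org", "version": "1.0",
                   "license": "Apache-2.0"},
    intended_use={"primary uses": "toy demonstration",
                  "out-of-scope": "any real deployment"},
)
print(card.to_markdown())
```

Keeping the card as structured data rather than free text makes it easy to validate for completeness and to render the same card to multiple formats.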


You can find more information and examples about model cards in Mitchell et al. 2019. Additionally, Google has released a model card toolkit for creating model cards automatically. You can see an example notebook released by Google here.
