# Dataset (machine learning)

> A collection of raw data, commonly (but not exclusively) organized in one of the following formats:[^1]

- a spreadsheet
- a file in format

## Characteristics

A dataset is characterized by its size and diversity. Good datasets are both large and highly diverse:[^2]

- **Size** indicates the number of examples.
- **Diversity** indicates the range those examples cover.

## See also

- [ML crash course - Datasets, generalization, and overfitting](https://wiki.g15e.com/pages/ML%20crash%20course%20-%20Datasets,%20generalization,%20and%20overfitting.txt)

## Footnotes

[^1]: https://developers.google.com/machine-learning/glossary#dataset
[^2]: https://developers.google.com/machine-learning/intro-to-ml/supervised