Datalayer ships some well-known datasets you can analyse, such as:

  • Titanic.
  • Bank.
  • Miserables.
  • Wine.

Other Sources

UCI Machine Learning Repository: This is a collection of almost 300 datasets of various types and sizes for tasks including classification, regression, clustering, and recommender systems. The list is available at

Amazon AWS public datasets: This is a set of often very large datasets that can be accessed via Amazon S3. These datasets include the Human Genome Project, the Common Crawl web corpus, Wikipedia data, and Google Books Ngrams. Information on these datasets can be found at

Kaggle: This is a collection of datasets used in machine learning competitions run by Kaggle. Areas include classification, regression, ranking, recommender systems, and image analysis. These datasets can be found under the Competitions section at

KDnuggets: This has a detailed list of public datasets, including some of those mentioned earlier. The list is available at

Labeled Faces in the Wild: In 2007, Labeled Faces in the Wild was released in an effort to spur research in face recognition, specifically for the problem of face verification with unconstrained images. Since that time, more than 50 papers have been published that improve upon this benchmark in some respect. A remarkably wide variety of innovative methods have been developed to overcome the challenges presented in this database. As performance on some aspects of the benchmark approaches 100% accuracy, it seems appropriate to review this progress, derive what general principles we can from these works, and identify key future challenges in face recognition. In this survey, we review the contributions to LFW for which the authors have provided results to the curators (results found on the LFW results web page). We also review the cross-cutting topic of alignment and how it is used in various methods. We end with a brief discussion of recent databases designed to challenge the next generation of face recognition algorithms.

We will demonstrate saving and loading models in several languages using the popular MNIST dataset for handwritten digit recognition (LeCun et al., 1998; available from the LibSVM dataset page).

This dataset contains handwritten digits 0–9, plus the ground truth labels.
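Files on the LibSVM dataset page use a sparse text format in which each line holds a label followed by `index:value` pairs. As a rough illustration only, the sketch below parses one such line and expands it into a dense pixel vector. The function names, the sample record, and the vector width of 784 (a 28x28 image) are assumptions made for this example and are not taken from the actual file, which may declare a different feature count.

```python
from typing import Dict, List, Tuple


def parse_libsvm_line(line: str) -> Tuple[int, Dict[int, float]]:
    """Parse one LibSVM-format record: '<label> <index>:<value> ...'."""
    parts = line.split()
    # The label is the first token; for digit data it is an integer 0-9.
    label = int(float(parts[0]))
    features: Dict[int, float] = {}
    for token in parts[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def to_dense(features: Dict[int, float], num_features: int) -> List[float]:
    """Expand a sparse feature map into a dense list (LibSVM indices are 1-based)."""
    dense = [0.0] * num_features
    for index, value in features.items():
        dense[index - 1] = value
    return dense


# Hypothetical sample record, invented for illustration (not real MNIST data).
sample = "5 153:3 154:18 155:126"
label, features = parse_libsvm_line(sample)

# 784 = 28 * 28 is an assumed width; adjust it to the feature count of the file you load.
pixels = to_dense(features, 784)
```

Keeping the sparse map and the dense expansion as separate steps means the parser does not need to know the image size, which is useful if the file omits trailing all-zero features.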

Stanford Snap

h2o dataset documentation page

THE MNIST DATABASE of handwritten digits



Public Data Sets

Below are links to publicly available data sets and resources. Datasets are such an integral part of data science and algorithms that it’s almost impossible to talk about our space without talking about data. This is a small but growing collection of links with public data.

Open City Datasets

Palo Alto Open Data


20 yrs crime data


Rents & Neighborhoods

Transportation and Travel

Airlines Dataset

So far it contains years 1987-2007 (based on

Data source:

Open flights database

Capital Bikeshare Data

Sciences and Engineering

Elements of Statistical Learning Data

NASA Open Data

Seismic Data

Weather Public Data


GitHub Archive

Diverse Data Sets

Many Eyes Community Datasets

Kaggle Competitions

UCI Machine Learning Repository

Human Activity Recognition Using Smartphones

MLData repository

GitHub Challenge

Yelp Dataset Challenge

Netflix Prize


Stanford Dataset Library

Million Song Dataset



Find your favorite dataset!

LIBSVM Dataset Compilation

The Data Page NYU

Public Policy Data

European Open Data (6098 datasets!)

US Open Data

WorldBank Data

Guardian Data

Statistics Netherlands

Quandl 6M Financial, Economics, and Social Datasets


