Dataset

Platform Dataset

Datalayer ships some well-known datasets you can analyse, such as:

  • Titanic.
  • Bank.
  • Miserables.
  • Wine.

Notebook Dataset

https://library.ucsd.edu/dc/object/bb2733859v

Public Dataset

http://catalog.data.gov/dataset

https://open-data.europa.eu

https://www.springboard.com/blog/free-public-data-sets-data-science-project

UCI Machine Learning Repository: This is a collection of almost 300 datasets of various types and sizes for tasks including classification, regression, clustering, and recommender systems. The list is available at http://archive.ics.uci.edu/ml

Amazon AWS public datasets: This is a set of often very large datasets that can be accessed via Amazon S3. These datasets include the Human Genome Project, the Common Crawl web corpus, Wikipedia data, and Google Books Ngrams. Information on these datasets can be found at http://aws.amazon.com/publicdatasets.

Kaggle: This is a collection of datasets used in machine learning competitions run by Kaggle. Areas include classification, regression, ranking, recommender systems, and image analysis. These datasets can be found under the Competitions section at http://www.kaggle.com/competitions. See also https://www.kaggle.com/datasets

KDnuggets: This has a detailed list of public datasets, including some of those mentioned earlier. The list is available at http://www.kdnuggets.com/datasets/index.html

Labeled Faces in the Wild: http://vis-www.cs.umass.edu/lfw In 2007, Labeled Faces in the Wild was released in an effort to spur research in face recognition, specifically for the problem of face verification with unconstrained images. Since that time, more than 50 papers have been published that improve upon this benchmark in some respect. A remarkably wide variety of innovative methods have been developed to overcome the challenges presented in this database. As performance on some aspects of the benchmark approaches 100% accuracy, it seems appropriate to review this progress, derive what general principles we can from these works, and identify key future challenges in face recognition. In this survey, we review the contributions to LFW for which the authors have provided results to the curators (results found on the LFW results web page). We also review the cross-cutting topic of alignment and how it is used in various methods. We end with a brief discussion of recent databases designed to challenge the next generation of face recognition algorithms.

http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html

https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki

http://www.quora.com/Data/Where-can-I-get-large-datasets-open-to-the-public

http://www.commoncrawl.org

http://archive.ics.uci.edu/ml

https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets

We will demonstrate saving and loading models in several languages using the popular MNIST dataset for handwritten digit recognition (LeCun et al., 1998; available from the LibSVM dataset page).

This dataset contains handwritten digits 0–9, plus the ground truth labels.
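The files on the LibSVM dataset page use a sparse text format: each line holds a label followed by `index:value` pairs with 1-based indices. As a minimal sketch of how such a line could be read without any library, the snippet below parses one line into a label and a dense pixel vector. The function names, the sample line, and the vector size of 784 (a flattened 28x28 image) are assumptions made for illustration; check the dataset page for the feature count the file actually declares.

```python
from typing import Dict, List, Tuple


def parse_libsvm_line(line: str) -> Tuple[int, Dict[int, float]]:
    """Parse one 'label index:value index:value ...' line (hypothetical helper)."""
    parts = line.split()
    label = int(float(parts[0]))  # labels may be written as '5' or '5.0'
    features: Dict[int, float] = {}
    for item in parts[1:]:
        index, value = item.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def to_dense(features: Dict[int, float], size: int) -> List[float]:
    """Expand sparse 1-based features into a dense list of length `size`."""
    dense = [0.0] * size
    for index, value in features.items():
        dense[index - 1] = value  # LibSVM indices start at 1
    return dense


# Made-up sample line, not taken from the real file.
sample = "5 153:3 154:18 155:18 156:126"
label, features = parse_libsvm_line(sample)
pixels = to_dense(features, 784)  # 784 is an assumed size (28 x 28)
```

Keeping the sparse dictionary and the dense vector as separate steps lets callers skip the dense expansion when the downstream tool accepts sparse input.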

https://github.com/infochimps/chimpmark

http://www.infochimps.com/marketplace

http://wiki.apache.org/cassandra/LargeDataSetConsiderations

http://epp.eurostat.ec.europa.eu/portal/page/portal/energy/data/main_tables

http://research.microsoft.com/en-us/projects/mslr

http://www.netflixprize.com/index

http://www.openbelgium.be

http://snap.stanford.edu/snap/index.html

http://lemurproject.org/clueweb09

http://archive.ics.uci.edu/ml/datasets.html

http://www.readwriteweb.com/archives/where_to_find_open_data_on_the.php

https://archive.org/details/stackexchange

http://blog.luckyoyster.com/post/33592990831/data-mining-the-web-100-worth-of-priceless

http://data.gov.be

http://statbel.fgov.be

Stanford Snap https://snap.stanford.edu/data

H2O dataset documentation page

https://github.com/h2oai/h2o-2/wiki/Hacking-Airline-DataSet-with-H2O

http://stat-computing.org/dataexpo/2009/the-data.html

http://h2o.ai/docs/master/datasets

THE MNIST DATABASE of handwritten digits

http://yann.lecun.com/exdb/mnist

https://samoa.incubator.apache.org/documentation/Getting-Started.html

http://downloads.sourceforge.net/project/moa-datastream/Datasets/Classification/covtypeNorm.arff.zip

https://github.com/vega/vega-datasets.git

http://www.gutenberg.org/wiki/Category:Bookshelf

https://de.dariah.eu/tatom/datasets.html

http://www.e-bookweb.nl/index.php?action=extra&extra=A_gratis_nederlandse_ebooks_in_txt_formaat&lang=NL

Below are links to publicly available datasets and resources. Datasets are such an integral part of data science and algorithms that it’s almost impossible to talk about our space without talking about data. This is a small but growing collection of links to public data.

Open City Dataset

Palo Alto Open Data http://www.cityofpaloalto.org/gov/depts/it/open_data/default.asp

Chicago https://data.cityofchicago.org

Chicago crime data (2001 to present) https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2

NYC https://nycopendata.socrata.com

Rents & Neighborhoods http://www.huduser.org/portal/datasets/HUD_data_matrix.html

Transportation and Travel

Airlines Dataset

http://stat-computing.org/dataexpo/2009/the-data.html

So far it contains years 1987-2007 (based on http://www.stat.purdue.edu/~sguha/rhipe/doc/html/airline.html)

Data source: http://www.transtats.bts.gov/Fields.asp?Table_ID=236

Open flights database http://openflights.org/data.html

Capital Bikeshare Data https://www.capitalbikeshare.com/trip-history-data

Sciences and Engineering Dataset

Elements of Statistical Learning Data http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html

NASA Open Data http://data.nasa.gov/

Seismic Data http://sioseis.ucsd.edu/segy.header.html

Weather Public Data http://OpenWeatherMap.org http://OpenMeteoData.org

NIST http://srdata.nist.gov/gateway/gateway?dblist=0

GitHub Archive http://www.githubarchive.org

Diverse Dataset

Many Eyes Community Datasets http://www-958.ibm.com/software/analytics/manyeyes

Kaggle Competitions http://www.kaggle.com

UCI Machine Learning Repository http://archive.ics.uci.edu/ml/datasets.html

Human Activity Recognition Using Smartphones http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

MLData repository http://mldata.org

GitHub Challenge https://github.com/blog/1450-the-github-data-challenge-ii

Yelp Dataset Challenge https://www.yelp.com/dataset_challenge

Netflix Prize http://stackoverflow.com/questions/1407957/netflix-prize-dataset

Infochimps http://www.infochimps.com/

Stanford Large Network Dataset Collection (SNAP) http://snap.stanford.edu/data/index.html

Million Song Dataset http://labrosa.ee.columbia.edu/millionsong/pages/getting-dataset

Caret http://caret.r-forge.r-project.org/datasets.html

RevolutionR http://www.revolutionanalytics.com/subscriptions/datasets

Find your favorite dataset! http://www.inside-r.org/howto/finding-data-internet

LIBSVM Dataset Compilation http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets

The Data Page NYU http://people.stern.nyu.edu/adamodar/New_Home_Page/data.html

Public Policy Dataset

European Open Data (6098 datasets!) http://open-data.europa.eu/en

US Open Data http://www.data.gov http://www.data.gov/opendatasites

WorldBank Data http://data.worldbank.org/data-catalog

Guardian Data http://www.guardian.co.uk/news/datablog/interactive/2013/jan/14/all-our-datasets-index

Statistics Netherlands http://www.cbs.nl/en-GB/menu/home/default.htm?Languageswitch=on

Quandl: 6M financial, economic, and social datasets http://www.quandl.com

Other Dataset

http://grouplens.org/datasets/movielens

http://thematicmapping.org

http://thematicmapping.org/downloads/world_borders.php

Geo Dataset

https://gisgeography.com/top-6-free-lidar-data-sources

https://environment.data.gov.uk/ds/survey/#/survey

http://enfarchsoc.org/opendata

https://www.3dlasermapping.com/blog-post/3d-laser-mapping-release-open-source-lidar-data

http://www.thedirtdoctors.com/lidar-becomes-open-data-for-england
