
Datasets from Images

This tutorial will demonstrate how you can make datasets in CSV format from images and use them for Data Science, on your laptop.
Aug 2018  · 4 min read

In machine learning, deep learning, and data science, the most commonly used data files are JSON and CSV. Here we will learn about CSV and use it to make a dataset. CSV stands for Comma-Separated Values. A CSV file stores tabular data as plain text: each line holds one record, and commas separate the fields within that record. Because files with the .csv extension are plain text, people who do not run the same database or spreadsheet applications can still share data with one another.
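To make the format concrete, here is a minimal sketch using Python's standard csv module. The column names and the two sample rows are made up for illustration; they are not taken from the tutorial's dataset.

```python
import csv
import io

# Hypothetical sample rows: a header plus two records (paths and labels are made up).
rows = [
    ["image", "label"],
    ["./india/plate_01.jpg", "India"],
    ["./usa/plate_02.jpg", "USA"],
]

# Write the rows to an in-memory text buffer instead of a real file.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Read the text back: each line becomes one record, split on commas.
parsed = list(csv.reader(io.StringIO(csv_text)))
```

Each printed line is one record, and reading the text back gives the same list of rows that was written.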

For this tutorial we have a few requirements:

1->Microsoft Excel
2->A few images
3->Notepad++ (I suggest it because it is easy to use; you can also try WordPad, Notepad, or any other text editor)

With this, let’s get started.

Now let's install Notepad++. Visit https://notepad-plus-plus.org/download/v7.5.6.html and install the version that matches your device.

For 32-Bit

download 32 bit

For 64-Bit

download 64 bit

Now download a few images

Make a new folder (I named it dataset), create a few folders inside it, and fill those folders with images. I downloaded car number plates from a few parts of the world and stored them in these folders.

new dataset
car number plates

Open a terminal/Command Prompt in the current directory, i.e., in the dataset folder, and run the commands given below. Commands for Windows users:

Command to get a list of folders and files in your directory: dir /b/s

Command to get file names and save to a text file: dir /b/s/w *.jpg > "filename.txt"

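For Linux users (Ubuntu):

Command to get a list of folders and files in your directory: ls -LR

Command to get file names and save to a text file: find . -name "*.jpg" > files.txt

(A plain ls -LR *.jpg only matches images in the current folder, because the shell expands *.jpg before ls runs; find searches the subfolders as well.)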

For macOS: macOS is POSIX compliant, so it contains the usual command line utilities found in Unix environments. The /b, /s, and /w switches belong to the Windows dir command and do not apply to ls.

Command to get a list of folders and files in your directory: ls -R

Command to get file names and save to a text file: find . -name "*.jpg" > filename.txt

terminal

Here I named the file 'filename'; you can use any name you like, such as 'names.txt'. The file is stored in the directory where you ran the command. I wanted only images with the .jpg extension, so I used *.jpg to select them; you can use *.jpeg, *.xml, or any other pattern, depending on the extension of your files.

filename
dataset
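If you prefer a script to the shell commands, the listing step can be sketched in Python with pathlib. This is only an illustration: the function names list_images and save_listing are hypothetical, and the sketch writes paths already in ./subfolder/file form.

```python
from pathlib import Path


def list_images(root, pattern="*.jpg"):
    """Return sorted paths of matching files under root, written as ./subfolder/file."""
    root = Path(root)
    # rglob searches root and every subfolder; on Linux the match is case-sensitive,
    # so "*.jpg" will not pick up files ending in ".JPG".
    return sorted(
        "./" + p.relative_to(root).as_posix()
        for p in root.rglob(pattern)
        if p.is_file()
    )


def save_listing(root, out_name="filename.txt", pattern="*.jpg"):
    """Write one image path per line to out_name inside root and return the lines."""
    lines = list_images(root, pattern)
    (Path(root) / out_name).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

Calling save_listing("dataset") from the folder that contains dataset would create dataset/filename.txt with one image path per line.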

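Now press Ctrl+F and remove the main directory details from every line. This step is for the Windows listing, which contains full paths with backslashes. For Excel to pull images into the sheet, we need to give the subdirectory and file name starting with "./", so we replace the first backslash \ with ./ and the second one with /. For example, a line that reads \india\plate1.jpg after the main directory is removed (the names here are only illustrative) becomes ./india/plate1.jpg.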

find next
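The same find-and-replace can be sketched in Python for anyone who would rather not edit the file by hand. The helper name to_relative and the example paths are hypothetical; PureWindowsPath is used so the sketch behaves the same on any operating system.

```python
from pathlib import PureWindowsPath


def to_relative(line, root):
    """Turn an absolute Windows path into ./subfolder/file form relative to root."""
    rel = PureWindowsPath(line.strip()).relative_to(PureWindowsPath(root))
    # as_posix() switches the backslashes to forward slashes.
    return "./" + rel.as_posix()


# Hypothetical paths, for illustration only.
root = r"C:\Users\me\dataset"
print(to_relative(r"C:\Users\me\dataset\india\plate_01.jpg", root))
# prints ./india/plate_01.jpg
```

relative_to raises a ValueError if a line does not start with the given root, which helps catch a wrong main directory early.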

After making changes to the file, save the text file.

save text file

Now open Microsoft Excel, copy all the names from the text file, and paste them into an Excel sheet.

excel

If you want to label the images, add another column named label and fill it in according to how you want to label them. Here I labeled them by country.

excel
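If each subfolder is named after its label (an assumption; the tutorial fills in the labels by hand), the label column could be derived from the path instead. A minimal sketch, with a hypothetical helper name:

```python
from pathlib import PurePosixPath


def label_from_path(rel_path):
    """Use the first folder in a ./folder/file path as the label ('' if there is no folder)."""
    # pathlib drops the leading "." so parts looks like ("india", "plate_01.jpg").
    parts = PurePosixPath(rel_path).parts
    return parts[0] if len(parts) > 1 else ""


# Hypothetical paths, for illustration only.
for path in ["./india/plate_01.jpg", "./usa/plate_02.jpg"]:
    print(path, label_from_path(path))
```

This only works when the folder names are the labels you want; otherwise, typing the labels into the sheet as described above is the safer route.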

Now save the sheet as CSV (Comma delimited) in the dataset folder, alongside the folders containing the images.

csv
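The file that Excel saves here can also be produced directly with Python's csv module. In this sketch the function name write_dataset_csv and the header name "image" are assumptions; only the "label" column name comes from the tutorial.

```python
import csv


def write_dataset_csv(rows, csv_path):
    """Write (image_path, label) pairs to csv_path with a header row."""
    # newline="" lets the csv module control line endings, as its documentation recommends.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["image", "label"])  # "image" is a hypothetical column name
        writer.writerows(rows)


# Example call with made-up rows (left commented out so nothing is written by accident):
# write_dataset_csv([("./india/plate_01.jpg", "India")], "dataset/dataset.csv")
```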

Now remove the text file from that folder and compress the dataset folder into a zip file.
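This cleanup and zip step can also be scripted. The sketch below uses shutil.make_archive from the standard library; the function name zip_dataset and the default listing name are assumptions, and the archive is written outside the dataset folder so that it does not end up inside itself.

```python
import shutil
from pathlib import Path


def zip_dataset(folder, archive_base, listing_name="filename.txt"):
    """Delete the text listing, then zip the folder; returns the path of the .zip file."""
    folder = Path(folder)
    # Remove the intermediate text file if it is still there (needs Python 3.8+).
    (folder / listing_name).unlink(missing_ok=True)
    # root_dir/base_dir keep the top-level folder name inside the archive.
    return shutil.make_archive(
        str(archive_base), "zip", root_dir=str(folder.parent), base_dir=folder.name
    )
```

For example, zip_dataset("dataset", "dataset_archive") would create dataset_archive.zip next to the dataset folder.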

You're Done!!!

You have successfully made a dataset in CSV format.
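Before sharing the dataset, it can be worth checking that every path in the CSV points to a real image. A minimal sketch, assuming the CSV sits in the dataset folder and has a path column named "image" (a hypothetical name; use whatever header your sheet has):

```python
import csv
from pathlib import Path


def missing_images(csv_path, image_column="image"):
    """Return the paths listed in the CSV that do not exist next to the CSV file."""
    csv_path = Path(csv_path)
    root = csv_path.parent
    # utf-8-sig also copes with the byte-order mark that some Excel exports add.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        return [
            row[image_column]
            for row in csv.DictReader(f)
            if not (root / row[image_column]).is_file()
        ]
```

An empty list means every row points at an existing file.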

Conclusion

This tutorial provides a quick guide on how to make datasets in CSV format from images for data science. I hope you find this tutorial useful when you want to make a dataset. Hurray!!! You have completed this tutorial. If you have any questions or thoughts on the tutorial, feel free to reach out in the comments below.
