
image_dataset_from_directory with a subset of subdirectories

Stack Overflow user
Asked on 2022-11-09 05:33:35
1 answer · 36 views · 0 followers · 0 votes

I have downloaded the MINC dataset for material classification, which consists of 23 categories. However, I am only interested in a subset of these categories (e.g. wood, foliage, glass, hair).

Is it possible to load only a subset of the data using tf.keras.preprocessing.image_dataset_from_directory?

I tried tf.keras.preprocessing.image_dataset_from_directory(folder_dir, label_mode="categorical", class_names=["wood", "foliage", "glass", "hair"]), but it raises this error: The `class_names` passed did not match the names of the subdirectories of the target directory.

Is there a way to load a subset of the directories without deleting or modifying the folders? I know datagen.flow_from_directory can do this, but Keras says it is deprecated and that I should use image_dataset_from_directory.

1 Answer

Stack Overflow user

Answered on 2022-11-09 14:13:10

There are two ways to do this. The first is through a generator, but that approach is costly; the other is to use tf.data for finer-grained control. Note that image_dataset_from_directory cannot load a subset directly, because the class_names you pass must match all of the subdirectories of the target directory, which is exactly what your error message says. You can check out the tf.data approach at this link:

https://www.tensorflow.org/tutorials/load_data/images
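For reference, the generator route you mentioned does support subsetting directly: the deprecated flow_from_directory takes a classes argument that restricts loading to the listed subdirectories. A minimal sketch for comparison (the dataset path and image size here are assumptions, not from your question):

Code language: python

# Deprecated generator API; shown only for comparison with tf.data.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0 / 255)

# `classes` loads only the listed subdirectories of the root folder
subset_gen = datagen.flow_from_directory(
    "path/to/minc",                     # hypothetical dataset root
    classes=["wood", "foliage", "glass", "hair"],
    target_size=(180, 180),
    class_mode="categorical",
    batch_size=32,
)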

Below, though, I will show you a short demo of how to load only the folders of your choice with tf.data. So, let's get started...

Code language: python
# First, import the libraries that are needed
import os
import glob
import tensorflow as tf
import matplotlib.pyplot as plt

I am only picking two classes, "cats" and "dogs"; you can use more than two (a generalization sketch follows the code below).

Code language: python
batch_size = 32
img_height = 180
img_width = 180

# Define the directory where your dataset is placed
data_dir = "path/to/your/dataset"   # replace with your actual dataset folder

#Now, here define a list of names for your dataset, like I am only loading cats and dogs... you can fill it with more if you have more
dataset_names = ['cats' , 'dogs']

#Now, glob the list of images in these two directories (cats & dogs)
list_files = [glob.glob(os.path.join(data_dir, images, '*.jpg')) for images in dataset_names]

list_files = list_files[0] + list_files[1]
image_count = len(list_files)

#Now, here pass this list to a tf.data.Dataset
list_files = tf.data.Dataset.from_tensor_slices(list_files)

#Now, define your class names to labels your dataset later...
class_names = ['cats', 'dogs']

#Now, define the train/validation split (20% validation)

val_size = int(image_count * 0.2)
# Shuffle the file list once, so the split is not all-cats / all-dogs
list_files = list_files.shuffle(image_count, reshuffle_each_iteration=False)
train_ds = list_files.skip(val_size)
val_ds = list_files.take(val_size)

#To get labels
def get_label(file_path):
  # Convert the path to a list of path components
  parts = tf.strings.split(file_path, os.path.sep)
  # The class folder name is the second-to-last path component
  one_hot = parts[-2] == class_names
  # Integer encode the label
  return tf.argmax(one_hot)

def decode_img(img):
  # Convert the compressed string to a 3D uint8 tensor
  img = tf.io.decode_jpeg(img, channels=3)
  # Resize the image to the desired size
  return tf.image.resize(img, [img_height, img_width])

def process_path(file_path):
  label = get_label(file_path)
  # Load the raw data from the file as a string
  img = tf.io.read_file(file_path)
  img = decode_img(img)
  return img, label

#Use Dataset.map to create a dataset of image, label pairs:
# Set `num_parallel_calls` so multiple images are loaded/processed in parallel.
train_ds = train_ds.map(process_path, num_parallel_calls=tf.data.AUTOTUNE)
val_ds = val_ds.map(process_path, num_parallel_calls=tf.data.AUTOTUNE)

#Configure dataset for performance
def configure_for_performance(ds):
  ds = ds.cache()
  ds = ds.shuffle(buffer_size=1000)
  ds = ds.batch(batch_size)
  ds = ds.prefetch(buffer_size=tf.data.AUTOTUNE)
  return ds

train_ds = configure_for_performance(train_ds)
val_ds = configure_for_performance(val_ds)

#Visualize the data
image_batch, label_batch = next(iter(train_ds))

plt.figure(figsize=(10, 10))
for i in range(9):
  ax = plt.subplot(3, 3, i + 1)
  plt.imshow(image_batch[i].numpy().astype("uint8"))
  label = label_batch[i]
  plt.title(class_names[label])
  plt.axis("off")

Output: a 3×3 grid of sample images, each titled with its class name.
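If you need more than two classes (for example the four MINC categories from your question), only the hard-coded list concatenation and the class names need to change. A minimal generalization sketch, assuming the same folder layout as above:

Code language: python

# Generalize to any number of classes; the folder names here are the
# MINC categories from the question, adjust them to your dataset.
dataset_names = ['wood', 'foliage', 'glass', 'hair']
class_names = dataset_names

# Flatten the per-class file lists instead of hard-coding [0] + [1]
list_files = [glob.glob(os.path.join(data_dir, images, '*.jpg'))
              for images in dataset_names]
list_files = [f for per_class in list_files for f in per_class]
image_count = len(list_files)

list_files = tf.data.Dataset.from_tensor_slices(list_files)
# ...the rest of the pipeline above works unchanged.

The resulting train_ds and val_ds can then be passed straight to model.fit(train_ds, validation_data=val_ds).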

The link to the Colab file is:

https://colab.research.google.com/drive/1oUNuGVDWDLqwt_YQ80X-CBRL6kJ_YhUX?usp=sharing

Votes: 0
The original content of this page is provided by Stack Overflow. Original link:

https://stackoverflow.com/questions/74370328
