I have downloaded the MINC materials classification dataset, which contains 23 categories. However, I am only interested in a subset of these classes (e.g. wood, foliage, glass, hair).
Is it possible to use tf.keras.preprocessing.image_dataset_from_directory to get a subset of the data?
I tried tf.keras.preprocessing.image_dataset_from_directory(folder_dir, label_mode="categorical", class_names=["wood", "foliage", "glass", "hair"]), but it raises this error: The `class_names` passed did not match the names of the subdirectories of the target directory.
Is there a way to get a subset of the directories without deleting or modifying the folders? I know datagen.flow_from_directory can do this, but Keras says it is deprecated and that I should use image_dataset_from_directory instead.
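For reference, this is the kind of thing I mean with the deprecated generator API; a minimal sketch (the directory path here is just a placeholder):
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1. / 255)
subset_gen = datagen.flow_from_directory(
    'path/to/minc/images',                          # placeholder path
    target_size=(224, 224),
    class_mode='categorical',
    classes=['wood', 'foliage', 'glass', 'hair'])   # only these subfolders are read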
Posted on 2022-11-09 14:13:10
There are two ways to do this. The first is to write your own generator, but that is expensive; the other is to use tf.data, which gives you finer-grained control. You can read more at this link:
https://www.tensorflow.org/tutorials/load_data/images
Below is a short demo of how to load only the folders you choose. So, let's get started...
# First, import the libraries we need
import os
import glob
import tensorflow as tf
import matplotlib.pyplot as plt
I am only picking two classes, "cats" and "dogs"; you can use more than two classes.
batch_size = 32
img_height = 180
img_width = 180
# Define the directory where your dataset is placed
data_dir = 'path/to/your/dataset/folder'   # replace with the path to your dataset folder
# Now define a list of the class folders you want to load; I am only loading cats and dogs, you can add more if you have more
dataset_names = ['cats', 'dogs']
# Glob the list of images in these directories (cats & dogs)
list_files = [glob.glob(os.path.join(data_dir, name, '*.jpg')) for name in dataset_names]
list_files = [f for per_class in list_files for f in per_class]   # flatten into a single list of file paths
image_count = len(list_files)
#Now, here pass this list to a tf.data.Dataset
list_files = tf.data.Dataset.from_tensor_slices(list_files)
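# (Not in the original demo) It is usually worth shuffling the file list once before
# splitting, so the validation split is not taken from a single class folder
list_files = list_files.shuffle(image_count, reshuffle_each_iteration=False)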
# Now define your class names, used to label the dataset later...
class_names = ['cats', 'dogs']
# Now split the file list into training and validation sets
val_size = int(image_count * 0.2)
train_ds = list_files.skip(val_size)
val_ds = list_files.take(val_size)
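# Optional sanity check (not in the original demo): print a few training file paths
# to confirm that only the selected class folders were picked up
for f in train_ds.take(3):
    print(f.numpy())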
# To get the labels
def get_label(file_path):
    # Convert the path to a list of path components
    parts = tf.strings.split(file_path, os.path.sep)
    # The second-to-last path component is the class folder name
    one_hot = parts[-2] == class_names
    # Integer encode the label
    return tf.argmax(one_hot)
def decode_img(img):
    # Convert the compressed string to a 3D uint8 tensor
    img = tf.io.decode_jpeg(img, channels=3)
    # Resize the image to the desired size
    return tf.image.resize(img, [img_height, img_width])
def process_path(file_path):
    label = get_label(file_path)
    # Load the raw data from the file as a string
    img = tf.io.read_file(file_path)
    img = decode_img(img)
    return img, label
#Use Dataset.map to create a dataset of image, label pairs:
# Set `num_parallel_calls` so multiple images are loaded/processed in parallel.
train_ds = train_ds.map(process_path, num_parallel_calls=tf.data.AUTOTUNE)
val_ds = val_ds.map(process_path, num_parallel_calls=tf.data.AUTOTUNE)
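# Another optional check (not in the original demo): look at the shape of one image/label pair
for image, label in train_ds.take(1):
    print("Image shape: ", image.numpy().shape)
    print("Label: ", label.numpy())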
#Configure dataset for performance
def configure_for_performance(ds):
    ds = ds.cache()
    ds = ds.shuffle(buffer_size=1000)
    ds = ds.batch(batch_size)
    ds = ds.prefetch(buffer_size=tf.data.AUTOTUNE)
    return ds
train_ds = configure_for_performance(train_ds)
val_ds = configure_for_performance(val_ds)
#Visualize the data
image_batch, label_batch = next(iter(train_ds))
plt.figure(figsize=(10, 10))
for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(image_batch[i].numpy().astype("uint8"))
    label = label_batch[i]
    plt.title(class_names[label])
    plt.axis("off")
Output: a 3x3 grid of sample images titled with their class names.
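From here the two datasets can be passed straight to Keras. Below is a minimal training sketch; the architecture is only an illustration (not part of the original demo), and a sparse loss is used because get_label returns integer class indices:
num_classes = len(class_names)
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1. / 255, input_shape=(img_height, img_width, 3)),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(num_classes)
])
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])
model.fit(train_ds, validation_data=val_ds, epochs=3)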
The link to the Colab notebook is:
https://colab.research.google.com/drive/1oUNuGVDWDLqwt_YQ80X-CBRL6kJ_YhUX?usp=sharing
Original question: https://stackoverflow.com/questions/74370328