I have downloaded the MINC materials classification dataset, which contains 23 categories. However, I am only interested in a subset of these classes (e.g. wood, foliage, glass, hair).
Is it possible to use tf.keras.preprocessing.image_dataset_from_directory to get a subset of the data?
I tried tf.keras.preprocessing.image_dataset_from_directory(folder_dir, label_mode="categorical", class_names=["wood", "foliage", "glass", "hair"]), but it raises this error: The `class_names` passed did not match the names of the subdirectories of the target directory.
Is there a way to get a subset of the directories without deleting or modifying the folders? I know datagen.flow_from_directory can do this, but Keras says it is deprecated and that I should use image_dataset_from_directory instead.
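For reference, this is the kind of thing I mean with the deprecated generator API; a minimal sketch (the directory path here is just a placeholder):
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1. / 255)
subset_gen = datagen.flow_from_directory(
    'path/to/minc/images',                          # placeholder path
    target_size=(224, 224),
    class_mode='categorical',
    classes=['wood', 'foliage', 'glass', 'hair'])   # only these subfolders are read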
Posted on 2022-11-09 14:13:10
There are two ways to do this. The first is to write your own generator, but that is expensive; the other is to use tf.data, which gives you finer-grained control. You can read more at this link:
https://www.tensorflow.org/tutorials/load_data/images
Below is a short demo of how to load only the folders you choose. So, let's get started...
# First, import the libraries we need
import os
import glob
import tensorflow as tf
import matplotlib.pyplot as plt
I am only picking two classes, "cats" and "dogs"; you can use more than two classes.
batch_size = 32
img_height = 180
img_width = 180
# Define the directory where your dataset is placed
data_dir = 'path/to/your/dataset/folder'   # replace with the path to your dataset folder
# Now define a list of the class folders you want to load; I am only loading cats and dogs, you can add more if you have more
dataset_names = ['cats', 'dogs']
# Glob the list of images in these directories (cats & dogs)
list_files = [glob.glob(os.path.join(data_dir, name, '*.jpg')) for name in dataset_names]
list_files = [f for per_class in list_files for f in per_class]   # flatten into a single list of file paths
image_count = len(list_files)
#Now, here pass this list to a tf.data.Dataset
list_files = tf.data.Dataset.from_tensor_slices(list_files)
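# (Not in the original demo) It is usually worth shuffling the file list once before
# splitting, so the validation split is not taken from a single class folder
list_files = list_files.shuffle(image_count, reshuffle_each_iteration=False)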
# Now define your class names, used to label the dataset later...
class_names = ['cats', 'dogs']
# Now split the file list into training and validation sets
val_size = int(image_count * 0.2)
train_ds = list_files.skip(val_size)
val_ds = list_files.take(val_size)
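# Optional sanity check (not in the original demo): print a few training file paths
# to confirm that only the selected class folders were picked up
for f in train_ds.take(3):
    print(f.numpy())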
# To get the labels
def get_label(file_path):
    # Convert the path to a list of path components
    parts = tf.strings.split(file_path, os.path.sep)
    # The second-to-last path component is the class folder name
    one_hot = parts[-2] == class_names
    # Integer encode the label
    return tf.argmax(one_hot)
def decode_img(img):
    # Convert the compressed string to a 3D uint8 tensor
    img = tf.io.decode_jpeg(img, channels=3)
    # Resize the image to the desired size
    return tf.image.resize(img, [img_height, img_width])
def process_path(file_path):
    label = get_label(file_path)
    # Load the raw data from the file as a string
    img = tf.io.read_file(file_path)
    img = decode_img(img)
    return img, label
#Use Dataset.map to create a dataset of image, label pairs:
# Set `num_parallel_calls` so multiple images are loaded/processed in parallel.
train_ds = train_ds.map(process_path, num_parallel_calls=tf.data.AUTOTUNE)
val_ds = val_ds.map(process_path, num_parallel_calls=tf.data.AUTOTUNE)
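# Another optional check (not in the original demo): look at the shape of one image/label pair
for image, label in train_ds.take(1):
    print("Image shape: ", image.numpy().shape)
    print("Label: ", label.numpy())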
#Configure dataset for performance
def configure_for_performance(ds):
    ds = ds.cache()
    ds = ds.shuffle(buffer_size=1000)
    ds = ds.batch(batch_size)
    ds = ds.prefetch(buffer_size=tf.data.AUTOTUNE)
    return ds
train_ds = configure_for_performance(train_ds)
val_ds = configure_for_performance(val_ds)
#Visualize the data
image_batch, label_batch = next(iter(train_ds))
plt.figure(figsize=(10, 10))
for i in range(9):
    ax = plt.subplot(3, 3, i + 1)
    plt.imshow(image_batch[i].numpy().astype("uint8"))
    label = label_batch[i]
    plt.title(class_names[label])
    plt.axis("off")
Output: a 3x3 grid of sample images titled with their class names.
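From here the two datasets can be passed straight to Keras. Below is a minimal training sketch; the architecture is only an illustration (not part of the original demo), and a sparse loss is used because get_label returns integer class indices:
num_classes = len(class_names)
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1. / 255, input_shape=(img_height, img_width, 3)),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(num_classes)
])
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])
model.fit(train_ds, validation_data=val_ds, epochs=3)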
The link to the Colab notebook is:
https://colab.research.google.com/drive/1oUNuGVDWDLqwt_YQ80X-CBRL6kJ_YhUX?usp=sharing
Original question: https://stackoverflow.com/questions/74370328