
[Translation] What is File IO

早起的鸟儿有虫吃
Published 2025-06-08 15:38:11

What is IO?

The data that we read from or write to external devices in an application can be treated as IO. For instance, reading/writing data to disk, over the network, or from any other external device are all IO operations.

Note:

  • We are focusing on the IO read/write path, which is blocking by default.
  • If read(device1) is blocking, then as long as no data arrives on device1, the caller stays blocked in that read call.
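The blocking behaviour in the note can be seen with a plain pipe standing in for "device 1" (a hypothetical setup for illustration only): the reader stays blocked in read() until the writer delivers data.

```python
# Sketch: read() blocks until the "device" (here, a pipe) has data.
import os
import threading
import time

r, w = os.pipe()  # the read end plays the role of device 1

def writer():
    time.sleep(0.2)        # the reader is already blocked in read() by now
    os.write(w, b"hello")  # data arrives -> the blocked read() returns

threading.Thread(target=writer).start()

start = time.monotonic()
data = os.read(r, 5)       # blocks here until the writer supplies bytes
elapsed = time.monotonic() - start

print(data)                # b'hello'
print(elapsed >= 0.1)      # True: the read really did wait for the data
```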

Why is IO slow?

One of the main reasons for slow IO is the physical limitation of the device hardware; another is the mismatch between processor speed and IO speed. Even though IO devices, whether storage, network, or others, have evolved considerably, for many reasons they have not matched up to processor speeds.

  • To begin with, the way computer architecture is devised imposes an intrinsic limitation: the buses (data/address/control, and specifically the data bus that carries data) restrict IO performance considerably. For instance, in the initial bus architecture the data bus was shared between the CPU and DMA, which accounted for reduced IO performance.

Dual-bus structure

  • Structure: a dual-bus design has two buses: a main-memory bus for data transfer among the CPU, main memory, and IO channels, and an IO bus for data transfer between multiple external devices and the channels.
  • Advantage: the slower IO devices are split off from the single bus, separating the memory bus from the IO bus.

The physical nature of IO devices also accounts for slower IO. Take magnetic disks: they are built with mechanical read/write heads, platters spinning on a motor, and magnetic material for storing data. The time taken to position the read/write head over the correct track is the seek time, the time taken for the platter to rotate the target sector under the head is the rotational latency, and the transfer time is the time taken to actually read the data from that sector.
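To make those mechanical delays concrete, here is a back-of-the-envelope calculation with typical figures for a 7200 RPM disk (illustrative numbers, not taken from the article):

```python
# Rough cost of one random disk read: seek + rotational latency + transfer.
rpm = 7200
avg_rotational_latency_ms = 0.5 * 60_000 / rpm  # half a revolution, on average
avg_seek_ms = 9.0                               # typical consumer-disk figure
transfer_ms = 0.1                               # reading one small block is cheap

total_ms = avg_seek_ms + avg_rotational_latency_ms + transfer_ms
print(round(avg_rotational_latency_ms, 2))  # 4.17
print(round(total_ms, 2))                   # 13.27
```

At roughly 13 ms per random read, such a disk tops out near 75 IOPS, while a CPU executes tens of millions of instructions in the same interval: exactly the processor/IO mismatch discussed above.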


Why do we care so much about the performance of IO?

Roughly, the ratio of IO to compute is usually around 90:10, with the application spending most of its time in IO. Even a minor improvement in IO operations therefore results in a massive performance improvement from the application's perspective.

Everything in Linux is considered a file. Why is that?

The idea of a file is an extremely important property of Linux: input/output resources such as your documents, directories (folders in Mac OS X and Windows), keyboard, monitor, hard drives, removable media, printers, modems, virtual terminals, and even inter-process and network communication are streams of bytes defined by file-system space.

Files give applications a unified and simple abstraction that facilitates data transfer across network, storage, IO devices, and IPC.
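A small sketch of that unified abstraction: the same open()/read() system calls work identically on a regular file and on a device node (assuming a Linux system with /dev/urandom; the temp-file path is made up for the demo).

```python
# "Everything is a file": one code path reads a regular file and a device.
import os

def first_bytes(path, n):
    fd = os.open(path, os.O_RDONLY)  # same syscall regardless of file type
    try:
        return os.read(fd, n)
    finally:
        os.close(fd)

with open("/tmp/unified_demo.txt", "wb") as f:
    f.write(b"data")

regular = first_bytes("/tmp/unified_demo.txt", 4)  # a plain file on disk
device = first_bytes("/dev/urandom", 4)            # a character device

print(regular)      # b'data'
print(len(device))  # 4
```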

The hardware and software components involved in making IO happen

Before we get into the details of the different techniques and optimisations in place to make IO more efficient, let's take a bird's-eye view of all the hardware and software components involved in making IO happen.

  • CPU: executes the IO APIs and system calls used in the application, and handles interrupts from the IO devices.

  • Filesystem: a filesystem is the set of methods and data structures that an operating system uses to keep track of files on a disk or partition. It is a way of organising files, and it facilitates all operations on a file by managing the metadata associated with it. Different filesystems, including ext2, ext3, ext4, NTFS, FAT32, and others, handle this in different ways.
  • VFS: the VFS abstracts out the implementation details of the individual filesystems and provides a common skeleton for implementing a system call across filesystems. Note that the file operations are implemented by the specific filesystem in which the inode resides. When opening a device node (character or block special), most filesystems call special support routines in the VFS that locate the required device-driver information. These support routines replace the filesystem's file operations with those of the device driver and then call the new open() method for the file. This is how opening a device file in the filesystem eventually ends up calling the device driver's open() method.
  • APIs: IO-related APIs enable applications to perform reads/writes on IO devices. A single API call can result in multiple system calls.
  • System calls: from a high-level view, system calls are "services" that the kernel provides to user applications; they resemble library APIs.
  • MMU, IOMMU: depending on how IO devices are addressed, the MMU/IOMMU plays a major role. Most current architectures use memory-mapped IO, and all applications refer to the mapped IO devices using virtual addresses. The MMU maps virtual addresses from the CPU to physical addresses; conversely, the IOMMU maps virtual addresses from a device to physical addresses. What the MMU does for the CPU, the IOMMU does for devices. [More on IO addressing and virtual-address mapping later in the post.]

  • North bridge, south bridge interface:
    • The northbridge (memory controller hub) typically handles communications among the CPU, RAM, BIOS ROM, PCI Express (or AGP) video cards, and the southbridge.
    • The southbridge (I/O controller hub) typically implements the "slower" capabilities of the motherboard. It can usually be distinguished from the northbridge by not being directly connected to the CPU; rather, the northbridge ties the southbridge to the CPU. Through controller-integrated channel circuitry, the northbridge can directly link signals from the I/O units to the CPU for data control and access.
    • North and south bridge refer to the data channels to the CPU: memory and hard-disk data go to the CPU via the northbridge, while mouse, keyboard, CD-ROM, and other external data flow to the CPU via the southbridge.
    • In recent architectures, however, the functionality previously provided by the northbridge has been integrated into the CPU.
  • Bus standard architectures (PCI, PCIe, SATA, IDE) [expansion buses / IO buses]: buses are the veins of the computer, and the efficiency with which they are handled dictates the efficiency of IO. The buses we refer to here are the IO buses that connect IO devices to the CPU; these devices connect to the system bus via a "south bridge" implemented in the processor's chipset.

  • Host bus adapters: in computer hardware, a host controller, host adapter, or host bus adapter (HBA) connects a computer, acting as the host system, to other network and storage devices. The terms are primarily used for devices connecting SCSI, Fibre Channel, and SATA devices; devices for connecting to IDE, Ethernet, FireWire, USB, and other systems may also be called host adapters. Host adapters can be integrated on the motherboard or sit on a separate expansion card; some host bus adapters are integrated circuit boards plugged into PCI.

Note: this part is mainly about the various physical links; we skip the details here.

  • IO controllers (NIC controllers, disk controllers, keyboard controllers): an IO controller provides the interface between a device such as a disk drive and the bus connecting it to the rest of the system. IO controllers drive the read/write head and interpret the commands from the host adapters; they are usually embedded in the IO devices themselves. The component that allows a computer to talk to a peripheral bus is the host adapter (HBA); a disk controller, on the other hand, allows a disk to talk to that same bus. The two are often confused, especially in the PC world. In fact, the signals read by a disk's read-and-write head are converted by the disk controller, transmitted over the peripheral bus, converted again by the host adapter into a format suitable for the motherboard's bus, and then read by the CPU. The term network interface controller (NIC) is more often used for devices connecting to computer networks, while the term converged network adapter applies when protocols such as iSCSI or Fibre Channel over Ethernet provide storage and network functionality over the same physical connection.
  • Device driver: in computing, a device driver is a program that operates or controls a particular type of device attached to a computer. A driver provides a software interface to hardware devices, enabling operating systems and other programs to access hardware functions without needing to know precise details about the hardware being used.
  • DMA: DMA allows certain hardware sub-systems to access system memory and other devices' memory directly, independently of the CPU. This lets the CPU keep working on other tasks concurrently while long-lasting memory operations take place, considerably boosting overall system performance. DMA is used by hardware such as graphics cards, sound cards, network cards, and disk-drive controllers. DMA is a concept rather than a specific technology: there is no specification describing in detail how DMA transfers work. On the contrary, the idea of accessing memory directly without CPU interaction is employed in many different hardware sub-systems in today's computers. The most typical application is communicating with peripheral devices plugged into a bus system such as ATA, SATA, PCI, or PCI Express. Beyond that, DMA transfers are used for intra-core communication in microprocessors and even to copy data from the memory of one computer into the memory of another over the network via remote DMA (not to be confused with NVIDIA's GPUDirect RDMA feature). (See refer1, refer2 for more details.)
  • Interrupts and IRQ lines from devices to the I/O Advanced Programmable Interrupt Controller (I/O APIC): an interrupt is simply a signal that the hardware can send when it wants the processor's attention. Devices can use an IRQ (interrupt request) line to raise an interrupt when IO is complete, when data is available (e.g. on a NIC), or when a catastrophic error occurs in an IO device.

IO device addressing

There are two ways in which IO device addressing is done:

  • Memory-mapped IO
  • Port-mapped IO

CPU and DMA addresses

There are several kinds of addresses involved in the DMA API, and it's important to understand the differences.

The kernel normally uses virtual addresses. Any address returned by kmalloc(), vmalloc(), and similar interfaces is a virtual address.

The virtual memory system (TLB, page tables, etc.) translates virtual addresses to CPU physical addresses. The kernel manages device resources like registers as physical addresses; these are the addresses in /proc/iomem. A physical address is not directly useful to a driver; it must use ioremap() to map the space and produce a virtual address.

I/O devices use a third kind of address: a "bus address". If a device has registers at an MMIO address, or if it performs DMA to read or write system memory, the addresses used by the device are bus addresses. In some systems bus addresses are identical to CPU physical addresses, but in general they are not: IOMMUs and host bridges can produce arbitrary mappings between physical and bus addresses.

From a device's point of view, DMA uses the bus address space, but it may be restricted to a subset of that space. For example, even if a system supports 64-bit addresses for main memory and PCI BARs, it may use an IOMMU so that devices only need to use 32-bit DMA addresses.

Here are some examples:

During the enumeration process, the kernel learns about I/O devices and their MMIO space, and about the host bridges that connect them to the system. For example, if a PCI device has a BAR, the kernel reads the bus address (A) from the BAR and converts it to a CPU physical address (B). The address B is stored in a struct resource and usually exposed via /proc/iomem. When a driver claims a device, it typically uses ioremap() to map physical address B at a virtual address (C). It can then use, e.g., ioread32(C) to access the device registers at bus address A.

If the device supports DMA, the driver sets up a buffer using kmalloc() or a similar interface, which returns a virtual address (X). The virtual memory system maps X to a physical address (Y) in system RAM. The driver can use virtual address X to access the buffer, but the device itself cannot, because DMA does not go through the CPU's virtual memory system.

In some simple systems, the device can do DMA directly to physical address Y. But in many others there is IOMMU hardware that translates DMA addresses to physical addresses, e.g. translating Z to Y. This is part of the reason for the DMA API: the driver can give a virtual address X to an interface like dma_map_single(), which sets up any required IOMMU mapping and returns the DMA address Z. The driver then tells the device to do DMA to Z, and the IOMMU maps it to the buffer at address Y in system RAM.

Application buffering, page cache, memory-mapped IO, direct IO

Application buffering (user-space buffering)

First, a buffer is allocated in user space, and a reference to this buffer is passed to a library call such as fread().

  • For a read operation, the page cache (kernel buffer) is checked first. If the data is available there, it is copied to the buffer in user space.
    • If the data is not available in the page cache, it is fetched from the IO device and cached in the page cache before being copied to the user-space buffer.
  • Similarly for a write, the data is first written to the buffer in user space, and a reference to this buffer is passed to a call such as fwrite().
    • The data is written to the page cache (kernel buffer), the pages are marked dirty, and the write succeeds. The dirty pages are periodically synced to the file; we can also trigger this sync manually with the fsync() system call.
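The write path just described can be traced in a few lines (the file name is arbitrary; this sketch uses Python's os wrappers around the same syscalls):

```python
# User buffer -> page cache -> disk, with each hop made explicit.
import os

with open("/tmp/buffered_write_demo.txt", "w") as f:
    f.write("hello")      # 1. data lands in the user-space buffer
    f.flush()             # 2. write(2): buffer -> kernel page cache (pages dirty)
    os.fsync(f.fileno())  # 3. force writeback of the dirty pages to the device

with open("/tmp/buffered_write_demo.txt") as f:
    content = f.read()

print(content)  # hello
```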

The three types of buffering available are unbuffered, block buffered, and line buffered.

  • When an output stream is unbuffered, information appears in the destination file or terminal as soon as it is written.
  • When it is block buffered, many characters are saved up and written as a block.
  • When it is line buffered, characters are saved up until a newline is output, or until input is read from any stream attached to a terminal device (typically stdin).
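The difference between the modes can be observed directly; here is a sketch using Python's open(buffering=...) parameter as a stand-in for C's setvbuf() (file names are arbitrary):

```python
# Unbuffered vs block-buffered writes: when do bytes reach the file?
import os

u = open("/tmp/unbuf_demo.bin", "wb", buffering=0)        # unbuffered
u.write(b"x")
size_unbuffered = os.path.getsize("/tmp/unbuf_demo.bin")  # 1: visible at once

b = open("/tmp/block_demo.bin", "wb", buffering=8192)     # block buffered
b.write(b"y")
size_before = os.path.getsize("/tmp/block_demo.bin")      # 0: held in user space
b.flush()
size_after = os.path.getsize("/tmp/block_demo.bin")       # 1: block flushed

print(size_unbuffered, size_before, size_after)  # 1 0 1
u.close()
b.close()
```

(Line buffering, the third mode, flushes at each newline and applies to text streams attached to terminals.)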

Page cache

The IO data is stored in two places: once in the page cache and again in the user-space buffer. One key thing to notice is that pages marked dirty will be flushed to disk, since their cached representation now differs from the one on disk. This process is called writeback. Writeback has potential drawbacks, such as queued-up IO requests, so it is worth understanding the thresholds and ratios used for writeback and checking queue depths to make sure you can avoid throttling and high latencies.

  • The page cache exploits the temporal and spatial locality principles: recently accessed and closely located data are accessed more often.
  • The page cache also improves IO performance by delaying writes and coalescing adjacent reads.
  • Since all IO operations go through the page cache, operation sequences such as read-write-read can be served from memory, without further disk accesses.

Direct IO

With direct IO, the intermediate page cache (kernel buffer) is eliminated and the application interacts directly with the device. "Directly" here means the kernel is still involved in the IO interactions; the only catch is that the kernel reads/writes the data immediately to the device.

  • This can result in performance degradation, since normally the kernel buffers and caches writes and allows the cache contents to be shared between applications. Used well, however, direct IO can yield major performance gains and improved memory usage.
  • Developers gain fine-grained control over data access, possibly using a custom IO scheduler and an application-specific buffer cache.
  • Because direct IO accesses the backing store directly, bypassing the intermediate buffers in the page cache, all operations must be aligned to a sector boundary. When using the page cache, alignment is not important because writes first go to memory: when the actual block-device write is performed, the kernel splits the page into parts of the right size and performs aligned writes towards the hardware.
  • Whether or not the O_DIRECT flag is used, it is always a good idea to make sure your reads and writes are block aligned. Crossing a segment boundary causes multiple sectors to be loaded from (or written back to) disk.
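A minimal sketch of a sector-aligned O_DIRECT write, under the assumptions that we are on Linux and that 4096 bytes covers the device's logical sector size; some filesystems (e.g. tmpfs) reject O_DIRECT, so the demo falls back to a normal open in that case. mmap is used here only to obtain a page-aligned buffer, which satisfies O_DIRECT's alignment requirement on address and length.

```python
# Sector-aligned write that bypasses the page cache (when supported).
import mmap
import os

BLOCK = 4096
path = "/tmp/direct_demo.bin"

buf = mmap.mmap(-1, BLOCK)  # anonymous, page-aligned buffer
buf.write(b"A" * BLOCK)     # fill exactly one aligned block

flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
try:
    fd = os.open(path, flags | os.O_DIRECT)  # bypass the page cache
except (AttributeError, OSError):
    fd = os.open(path, flags)  # O_DIRECT unavailable or rejected here
try:
    written = os.write(fd, buf)  # aligned address, aligned length
finally:
    os.close(fd)

print(written)  # 4096
```

An unaligned buffer or a length that is not a multiple of the sector size would make the O_DIRECT write fail with EINVAL, which is exactly the alignment rule described above.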

Memory-mapped IO (mmap)

With mmap, the page cache is mapped directly into the user-space address space, avoiding the additional copy from page cache to user-space buffer.

  • An mmap'd file can be in private mode or shared mode.
  • A private mapping allows reading from the file, but any write triggers copy-on-write of the page in question, leaving the original page intact and keeping the changes private, so none of the changes are reflected in the file itself.
  • In shared mode, the file mapping is shared with other processes, so they can see updates to the mapped memory segment.
  • Unless specified otherwise, file contents are not loaded into memory right away but lazily. The space required for the mapping is reserved but not allocated immediately: the first read or write results in a page fault, triggering allocation of the appropriate page. By passing MAP_POPULATE it is possible to pre-fault the mapped area and force a file read-ahead.
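A short sketch of a shared mapping (file name arbitrary): loads and stores on the mapped region replace read()/write() copies, and stores through the mapping dirty the page cache, which writeback then persists to the file.

```python
# mmap in shared mode: memory stores become file updates via the page cache.
import mmap

path = "/tmp/mmap_demo.bin"
with open(path, "wb") as f:
    f.write(b"hello world")

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:  # shared
        first = bytes(m[:5])  # reading is a plain memory load, no copy via read()
        m[:5] = b"HELLO"      # storing dirties the page; writeback persists it

with open(path, "rb") as f:
    final = f.read()

print(first)  # b'hello'
print(final)  # b'HELLO world'
```

Using access=mmap.ACCESS_COPY instead would give the private, copy-on-write behaviour described above: the in-memory view changes, but the file does not.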

Final thoughts

We started off by highlighting the different hardware and software components that make IO possible. In particular, DMA is a high-speed data-transfer mechanism: under the control of a DMA controller, data is exchanged directly between memory and peripherals (or between peripherals) without passing through the CPU, and the memory-address updates and end-of-transfer reporting are handled by the DMA controller hardware, with the CPU involved only in interrupt handling at the start and end of the transfer. We then discussed the different ways of addressing IO devices, and concluded with the different ways IO is performed.

References

  • Linux Device Drivers, 3rd edition
  • Understanding the Linux Kernel
Originally published 2025-05-27; shared from the WeChat public account 后端开发成长指南 via the Tencent Cloud self-media sync programme.
