A look at the formats operating systems use to store executables and object files

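First, I apologize that this account has gone without updates for a long time. I have actually built up quite a bit of writing material during this period, but work has kept me busy and I have not had long stretches of time to organize it, so it has been put off until now.

Fortunately the end is in sight: the busy stretch at work should be over soon, which will leave more time for writing.

Thank you to everyone who has stuck around in the meantime. I will try to repay you with more in-depth original articles.

Today, though, I am taking a shortcut and recommending a good article I just read. It covers the formats that various operating systems use to store executables and object files, and how those formats evolved on each platform. I hope that the next time you run into one of these formats, you will know how they relate to one another.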


Historically, the common object file formats in use today, PECOFF, ELF, and Mach-O, were developed completely independently for specific operating systems (Windows NT, Unix System V R4, and Mach) and were meant to address shortcomings in previous object file formats.

Unix

a.out

The earliest object file and executable format used by Unix was the a.out format. This was a very simple format, pretty much the bare minimum needed for the hardware at the time. It got its name from the default output file name used by Unix assemblers and linkers. If an output file wasn't specified, these programs would create a file named a.out. Most Unix assemblers and linkers today still do this.

COFF

With Unix System V, AT&T introduced COFF (Common Object File Format) to replace the a.out format. It was also both an executable and an object file format. Its main improvement over the a.out format was support for multiple sections. The a.out format only supported the text and data sections with an implied .bss section. Multiple sections gave tools and applications more flexibility in laying out executables, allowing them to create read-only data sections, for example. COFF also added support for shared libraries.

ELF

In order to resolve a number of problems with COFF, AT&T created ELF (Executable and Linkable Format) for Unix System V R4. The main problem with COFF was that it wasn't very flexible or well defined. As a result, it had to be extended in various incompatible ways by the various vendors that implemented it. It wasn't the "Common" format it was hoped to be. ELF also provides support for position independent shared objects with a more dynamic form of symbol linkage than COFF's shared libraries. It also has a fairly sophisticated and extensible associated debugging format called DWARF.

On Unix machines, ELF is the dominant executable and object file format. Only a few operating systems like AIX have stuck with their custom COFF-based formats. All the big open source operating systems, Linux and the BSDs, use ELF, though notably they went directly from a.out-based formats to ELF, skipping COFF. Since ELF is both an object file format and an executable format, most Unix development tools create ELF objects that are linked together to create ELF executables.
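Because ELF serves as both an object and an executable format, the file header itself records which kind of file it is. The following is a minimal sketch of reading that information, assuming the standard ELF header layout (a 16-byte identification block followed by a 2-byte type field). The function name `describe_elf` and the label strings are illustrative and are not part of any library.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}
ELF_ENDIANNESS = {1: "little-endian", 2: "big-endian"}
ELF_TYPES = {
    1: "relocatable object",
    2: "executable",
    3: "shared object or position-independent executable",
    4: "core dump",
}


def describe_elf(header: bytes):
    """Describe an ELF header, or return None if the bytes are not ELF."""
    if len(header) < 18 or header[:4] != ELF_MAGIC:
        return None
    ei_class = header[4]  # word size of the file
    ei_data = header[5]   # byte order used by the remaining fields
    byte_order = ">" if ei_data == 2 else "<"
    # The type field is the first field after the 16-byte identification block.
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return {
        "class": ELF_CLASSES.get(ei_class, "unknown"),
        "endianness": ELF_ENDIANNESS.get(ei_data, "unknown"),
        "type": ELF_TYPES.get(e_type, "other"),
    }
```

Passing the first 18 or more bytes of a `.o` file produced by a Unix compiler should report a relocatable object, while a linked program should report one of the executable types.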

Microsoft

OMF, .COM and MZ .EXE

Under MS-DOS the standard object file format was OMF (Object Module Format), created by Intel for their x86 processors. MS-DOS itself used two different executable formats. At first it only supported simple .COM files, just flat binaries like your bootloader, taken from CP/M (like pretty much everything in MS-DOS 1.x). With MS-DOS 2.0, Microsoft added support for the "MZ" .EXE executable, so called because it used the two characters MZ as a magic number to identify the file type. The MZ format allowed executables to use multiple segments, while the .COM format effectively limited programs to one single 64K segment. Microsoft's own development tools for MS-DOS, along with most other third-party tools, produced OMF format object files that were linked to create .COM and MZ format executables.
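The MZ magic number makes these executables easy to recognize, whereas a .COM file has no header at all and cannot be identified from its contents. The sketch below assumes the conventional MZ header layout, in which the relocation count sits at byte offset 6 and the header size in 16-byte paragraphs at offset 8. The function name `describe_mz` is illustrative.

```python
import struct


def describe_mz(data: bytes):
    """Return basic MZ header fields, or None if the magic number is absent."""
    if len(data) < 10 or data[:2] != b"MZ":
        return None
    # Skip the two size fields at offsets 2 and 4, then read the relocation
    # count and the header size, both little-endian 16-bit values.
    relocation_count, header_paragraphs = struct.unpack_from("<HH", data, 6)
    return {
        "relocations": relocation_count,
        "header_bytes": header_paragraphs * 16,
    }
```

The relocation count is the number of segment references the loader must fix up after placing the program in memory.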

New Executable (NE)

For Windows 1.0, Microsoft created the New Executable (NE) format for Windows executables. The key feature of this format is that it made segments explicit in the executable, showing where they started and where they ended. The MZ format supported segments by relocating segment values in the code. Otherwise it was a flat binary file that was loaded into memory in one single continuous chunk, just like a .COM file except with the segment references fixed up. Making segments explicit in the executable let Windows load segments separately, and even potentially move them and unload them as necessary. Since Windows, unlike MS-DOS, supported running more than one program at a time, this was very important, as otherwise memory would quickly become too fragmented. Another important feature the NE format added was support for DLLs (Dynamic Link Libraries).

However Microsoft didn't change the object file format used. Microsoft's development tools for the early 16-bit versions of Windows (along with the 16-bit versions of OS/2) continued to create OMF object files that were linked to create NE format executables. The OMF format was easily extensible, and the NE format only required minor additions, most notably for importing and exporting symbols for DLLs.

Linear Executable (LE)

For Windows/386 2.10, Microsoft created yet another executable format called Linear Executable (LE). This new version of Windows had a 32-bit Virtual Machine Manager (VMM), essentially a simple virtual machine that ran Windows and one or more instances of MS-DOS in parallel. To each OS it would look like it had the entire PC to itself, but really it was only a virtual machine. Since the VMM and, more importantly, its drivers (VxDs) were 32-bit, the old 16-bit NE format wouldn't work.

I'm not sure what object format was used when creating LE executables originally. Later it became standard to use PECOFF object files when creating VxDs, but that format wasn't created until a few years later. They probably used OMF, extended to support 32-bit objects, just like IBM did with OS/2 2.0, which also used a variant of the LE format for its 32-bit executables.

PECOFF

Microsoft introduced PECOFF with Windows NT. While they already had a usable 32-bit executable format, the LE format, and OMF supported 32-bit objects, both of these formats were tied to the Intel x86 processors. Windows NT was designed to support many kinds of CPUs, and initially it supported MIPS and DEC Alpha CPUs in addition to x86 Intel-based PCs. Rather than adapt LE and OMF to support these other processors, the team developing Windows NT decided to adapt the existing Unix COFF format, which already supported multiple CPUs. My guess is that early on Microsoft's Windows NT team was using existing Unix-based development tools to develop Windows NT rather than waiting for Microsoft's separate development tools team(s) to create tools for the other CPUs. Otherwise Windows NT probably would've got adapted versions of the LE and OMF formats, if only to make the development tools team's job easier.

The main feature PECOFF adds over COFF is support for DLLs, something that has been intrinsic to Windows from the start. COFF supported a similar but different shared library mechanism. Microsoft later extended the PECOFF format to support 64-bit CPUs.

Since PECOFF is both an object file format and an executable format, Microsoft development tools for 32-bit versions of Windows create PECOFF object files that are linked to create PECOFF executables. Notably, Borland's tools produced OMF files that were linked to create PECOFF executables, but these days most other tools follow Microsoft's lead (e.g. MinGW or Intel's ICC).
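The COFF heritage and the multi-CPU goal are both visible in a PECOFF executable's header. The file begins with an MZ stub, the 4-byte value at offset 0x3C points to a `PE\0\0` signature, and a COFF file header follows whose first field names the target CPU. The sketch below assumes that layout as described in Microsoft's PE format documentation. The function name `describe_pe` and the deliberately short machine table are illustrative.

```python
import struct

# A few well-known Machine values; anything else is reported as a hex string.
PE_MACHINES = {0x014C: "x86", 0x8664: "x86-64", 0xAA64: "ARM64"}


def describe_pe(data: bytes):
    """Return the machine type and section count of a PE image, or None."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # Offset 0x3C of the MZ stub holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if len(data) < pe_offset + 24:
        return None
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        return None
    # The 20-byte COFF file header starts right after the signature.
    machine, section_count = struct.unpack_from("<HH", data, pe_offset + 4)
    return {
        "machine": PE_MACHINES.get(machine, hex(machine)),
        "sections": section_count,
    }
```

A PECOFF object file has no MZ stub or signature and starts directly with the COFF file header, so this function returns None for `.obj` files.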

OS X

Mach-O

The Mach-O format was created for the Mach kernel, thus the name. This kernel was used in NeXTSTEP, which became the basis for Apple's OS X. The Mach-O format was created as an alternative to the a.out format used by BSD, which provides most of the non-kernel parts of Mach-based operating systems. The main driving force seems to be creating position independent shared libraries and executables. One of the particularities of Mach-O and the operating systems that use it is that all code is required to be position independent, not just shared libraries.
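Like the other formats, Mach-O files can be recognized by a magic number in the first four bytes. The sketch below assumes the magic values defined in Apple's `mach-o/loader.h` and `mach-o/fat.h` headers. The function name `describe_macho` is illustrative. Note that Java class files begin with the same 0xCAFEBABE value as universal binaries, so that case is only a hint.

```python
import struct

THIN_MAGICS = {0xFEEDFACE: "32-bit Mach-O", 0xFEEDFACF: "64-bit Mach-O"}
FAT_MAGIC = 0xCAFEBABE  # universal binary header, stored big-endian


def describe_macho(data: bytes):
    """Classify a file by its Mach-O magic number, or return None."""
    if len(data) < 4:
        return None
    (be_magic,) = struct.unpack_from(">I", data, 0)
    (le_magic,) = struct.unpack_from("<I", data, 0)
    if be_magic == FAT_MAGIC:
        return "universal (fat) binary"
    # The magic is written in the target CPU's byte order, so try both.
    if le_magic in THIN_MAGICS:
        return THIN_MAGICS[le_magic] + ", little-endian"
    if be_magic in THIN_MAGICS:
        return THIN_MAGICS[be_magic] + ", big-endian"
    return None
```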

Conclusion

So as you can see, different groups of people working on different operating systems have worked independently to create these different formats based on their needs at the time. Early on, Microsoft needed formats that supported the Intel x86's peculiar segmented memory model, and OMF was the only object file format that supported it. Unix needed something different: support for multiple CPUs. When Microsoft needed multiple CPU support, they chose to base their format on the then recently obsoleted COFF format, which at the time was presumably implemented on more CPU types than the newer ELF format. Meanwhile the Mach kernel developers were off in their own world, futilely trying to create a workable microkernel. I'm not sure exactly what they were thinking with Mach-O, but they had to replace a.out with something else, and I guess COFF wasn't up to the task and ELF didn't exist yet.


Official documentation for some of the formats above is linked below.

Windows PECOFF:

https://docs.microsoft.com/en-us/windows/win32/debug/pe-format

Linux ELF:

https://www.man7.org/linux/man-pages/man5/elf.5.html

This article was shared from the WeChat official account Linux内核及JVM底层相关技术研究 (ytcode). Author: wangyuntao

Originally published: 2020-06-06
