A look at the formats operating systems use to store executables and object files

KINGYT
Published 2020-06-09 10:09:04

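First, my sincere apologies that this official account went without updates for so long. I have actually built up quite a bit of writing material over this period, but work has kept me busy and I haven't had a large block of time to sort it out, so it kept getting put off until now.

Fortunately, the end is in sight: the busy stretch at work should wrap up soon, which will leave me more time to write new posts.

Thanks also to everyone who stuck around during this time. I'll try to write more in-depth original articles going forward to repay you.

Today, though, I'm going to be a bit lazy and recommend a good article I just read. It covers the formats that various operating systems use to store executables and object files, along with the history of those formats on each platform. I hope that the next time you come across these formats, you will know how they relate to one another.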


Historically, the common object file formats in use today, PECOFF, ELF, and Mach-O, were developed completely independently for specific operating systems (Windows NT, Unix System V R4, and Mach) and were meant to address shortcomings in previous object file formats.

Unix

a.out

The earliest object file and executable format used by Unix was the a.out format. This was a very simple format, pretty much the bare minimum needed for the hardware at the time. It got its name from the default output file name used by Unix assemblers and linkers. If an output file wasn't specified, these programs would create a file named a.out. Most Unix assemblers and linkers today still do this.

COFF

With Unix System V, AT&T introduced COFF (Common Object File Format) to replace the a.out format. It was also an executable and object file format. The main thing it improved over the a.out format was multiple sections. The a.out format only supported the text and data sections with an implied .bss section. Multiple sections gave tools and applications more flexibility in laying out executables, allowing them to create read-only data sections for example. COFF also added support for shared libraries.

ELF

In order to resolve a number of problems with COFF, AT&T created ELF (Executable and Linkable Format) for Unix System V R4. The main problem with COFF was that it wasn't very flexible or well defined; as a result it had to be extended in various incompatible ways by the vendors that implemented it. It wasn't the "Common" format it was hoped to be. ELF also provides support for position independent shared objects with a more dynamic form of symbol linkage than COFF's shared libraries. It also has a fairly sophisticated and extensible associated debugging format called DWARF.

On Unix machines ELF is the dominant executable and object file format. Only a few operating systems like AIX have stuck with their custom COFF-based formats. All the big open source operating systems, Linux and the BSDs, use ELF, though notably they went directly from a.out based formats to ELF, skipping COFF. Since ELF is both an object file format and an executable format, most Unix development tools create ELF objects that are linked together to create ELF executables.
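Because the same format serves for objects, executables, and shared objects, the ELF header itself records which kind a file is. Here is a minimal Python sketch of reading that. It assumes the standard header layout described in the elf(5) man page (the 4-byte magic `\x7fELF`, the class and data-encoding bytes, and the 2-byte `e_type` field at offset 16), none of which is spelled out in the article; `describe_elf` and `ELF_TYPES` are hypothetical names chosen for illustration.

```python
import struct

ELF_MAGIC = b"\x7fELF"

# e_type values from the ELF header definition (assumed from elf(5)).
ELF_TYPES = {
    1: "relocatable object",
    2: "executable",
    3: "shared object",
    4: "core dump",
}


def describe_elf(header: bytes) -> str:
    """Summarize an ELF file from its first 18 bytes (hypothetical helper)."""
    if len(header) < 18 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    # Byte 4 (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
    bits = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown-class")
    # Byte 5 (EI_DATA): 1 = little-endian; anything else is read as big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_type follows the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return f"{bits} ELF {ELF_TYPES.get(e_type, 'other')}"
```

In practice you would pass it the start of a file, for example `describe_elf(open(path, "rb").read(18))`. Note that position independent executables are also marked with type 3, so they are reported as shared objects by this sketch.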

Microsoft

OMF, .COM and MZ .EXE

Under MS-DOS the standard object file format was OMF (Object Module Format), created by Intel for their x86 processors. MS-DOS itself used two different executable formats. At first it only supported simple .COM files, just flat binaries like your bootloader, taken from CP/M (like pretty much everything in MS-DOS 1.x). With MS-DOS 2.0, Microsoft added support for the "MZ" .EXE executable, so called because it used the two characters MZ as a magic number to identify the file type. The MZ format allowed executables to use multiple segments, while the .COM format effectively limited programs to one single 64K segment. Microsoft's own development tools for MS-DOS, along with most other third-party tools, produced OMF format object files that were linked to create .COM and MZ format executables.
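The magic-number check is simple enough to sketch in a few lines of Python. This is only an illustration: the function name `dos_program_kind` is hypothetical, and since a .COM file has no header at all, anything that does not start with MZ can only be reported as a possible flat binary.

```python
def dos_program_kind(data: bytes) -> str:
    """Guess how DOS would treat a program image (hypothetical helper)."""
    # An MZ executable announces itself with the two ASCII characters "MZ".
    if data[:2] == b"MZ":
        return "MZ executable"
    # A .COM file is raw code with no header, so there is nothing to match on.
    return "flat binary (possibly .COM)"
```

The asymmetry is the point: the MZ format can be recognized from its contents, while a .COM file is identified only by the absence of a header.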

New Executable (NE)

For Windows 1.0 Microsoft created the New Executable (NE) format for Windows executables. The key feature of this format is that it made segments explicit in the executable, showing where they started and where they ended. The MZ format supported segments by relocating segment values in the code. Otherwise it was a flat binary file that was loaded into memory in one single continuous chunk, just like a .COM file except with the segment references fixed up. Making segments explicit in the executable let Windows load segments separately, and even potentially move them and unload them as necessary. Since Windows, unlike MS-DOS, supported running more than one program at a time, this was very important; otherwise memory would quickly become too fragmented. Another important feature the NE format added was support for DLLs (Dynamic Link Libraries).

However Microsoft didn't change the object file format used. Microsoft's development tools for the early 16-bit versions of Windows (along with the 16-bit versions of OS/2) continued to create OMF object files that were linked to create NE format executables. The OMF format was easily extensible, and the NE format only required minor additions, most notably for importing and exporting symbols for DLLs.

Linear Executable (LE)

For Windows/386 2.10 Microsoft created yet another executable format called Linear Executable (LE). This new version of Windows had a 32-bit Virtual Machine Manager (VMM), essentially a simple virtual machine that ran Windows and one or more instances of MS-DOS in parallel. To each OS it would look like it had the entire PC to itself, but really it was only a virtual machine. Since the VMM, and more importantly its drivers (VxDs), were 32-bit, the old 16-bit NE format wouldn't work.

I'm not sure what object format was used when creating LE executables originally. Later it became standard to use PECOFF object files when creating VxDs, but that format wasn't created until a few years later. They probably used OMF, extended to support 32-bit objects, just like IBM did with OS/2 2.0, which also used a variant of the LE format for its 32-bit executables.

PECOFF

Microsoft introduced PECOFF with Windows NT. While they already had a usable 32-bit executable format, the LE format, and OMF supported 32-bit objects, both these formats were tied to the Intel x86 processors. Windows NT was designed to support many kinds of CPUs, and initially it supported MIPS and DEC Alpha CPUs in addition to x86 Intel-based PCs. Rather than adapt LE and OMF to support these other processors, the team developing Windows NT decided to adapt the existing Unix COFF format, which already supported multiple CPUs. My guess is that early on Microsoft's Windows NT team was using existing Unix-based development tools to develop Windows NT rather than waiting for Microsoft's separate development tools team(s) to create tools for the other CPUs. Otherwise Windows NT probably would've got adapted versions of the LE and OMF formats, if only to make the development tools team's job easier.

The main feature PECOFF adds over COFF is support for DLLs, something that has been intrinsic to Windows from the start. COFF supported a similar but different shared library mechanism. Microsoft later extended the PECOFF format to support 64-bit CPUs.

Since PECOFF is both an object file format and an executable format, Microsoft development tools for 32-bit versions of Windows create PECOFF object files that are linked to create PECOFF executables. Notably Borland's tools produced OMF files that were linked to create PECOFF executables, but these days most other tools follow Microsoft's lead (e.g. MinGW or Intel's ICC).
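To make the multiple-CPU point concrete, here is a minimal Python sketch that reads the target CPU out of a PECOFF executable. It relies on details that are not in the article and are assumed from Microsoft's PE format documentation: the image starts with an MZ stub, a 4-byte little-endian offset at 0x3C points to the `PE\0\0` signature, and the COFF header's 2-byte Machine field follows it. The function name `pe_machine` and the deliberately small `MACHINES` table are illustrative choices, not part of any real API.

```python
import struct

# A few Machine values from the PE format documentation (not exhaustive).
MACHINES = {
    0x014C: "x86",
    0x0166: "MIPS R4000",
    0x0184: "Alpha",
    0x8664: "x86-64",
    0xAA64: "ARM64",
}


def pe_machine(image: bytes) -> str:
    """Return the CPU a PECOFF executable targets (hypothetical helper)."""
    if len(image) < 0x40 or image[:2] != b"MZ":
        raise ValueError("missing MZ stub")
    # The 4-byte field at 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("no PE signature")
    # The COFF header starts right after the signature; Machine is its first field.
    (machine,) = struct.unpack_from("<H", image, pe_offset + 4)
    return MACHINES.get(machine, hex(machine))
```

A PECOFF object file has no MZ stub or signature and begins directly with the COFF header, so this sketch applies to linked executables and DLLs only.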

OS X

Mach-O

The Mach-O format was created for the Mach kernel, thus the name. This kernel was used in NeXTSTEP, which became the basis for Apple's OS X. The Mach-O format was created as an alternative to the a.out format used by BSD, which provides most of the non-kernel parts of Mach-based operating systems. The main driving force seems to be creating position independent shared libraries and executables. One of the particularities of Mach-O and the operating systems that use it is that all code is required to be position independent, not just shared libraries.
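Whether a given Mach-O executable was linked as position independent can be checked from its header. The sketch below assumes the layout from Apple's `<mach-o/loader.h>` (the magic values `0xFEEDFACE` and `0xFEEDFACF`, a 32-bit `flags` word at byte offset 24, and the `MH_PIE` bit `0x200000`), none of which appears in the article. `macho_is_pie` is a hypothetical helper, and it handles only thin (non-universal) files.

```python
import struct

MH_MAGIC = 0xFEEDFACE     # 32-bit Mach-O
MH_MAGIC_64 = 0xFEEDFACF  # 64-bit Mach-O
MH_PIE = 0x200000         # header flag: position independent executable


def macho_is_pie(header: bytes) -> bool:
    """Check the MH_PIE flag in a thin Mach-O header (hypothetical helper)."""
    if len(header) < 28:
        raise ValueError("header too short")
    # Try both byte orders, since the file may have been written on either.
    for endian in ("<", ">"):
        (magic,) = struct.unpack_from(endian + "I", header, 0)
        if magic in (MH_MAGIC, MH_MAGIC_64):
            # flags is the seventh 32-bit word in both header variants.
            (flags,) = struct.unpack_from(endian + "I", header, 24)
            return bool(flags & MH_PIE)
    raise ValueError("not a thin Mach-O file")
```

The flag applies to main executables; this sketch does not try to interpret it for other Mach-O file types.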

Conclusion

So, as you can see, different groups of people working on different operating systems have worked independently to create these different formats based on their needs at the time. Early on Microsoft needed formats that supported the Intel x86's peculiar segmented memory model, and OMF was the only object file format that supported it. Unix needed something different: support for multiple CPUs. When Microsoft needed multiple CPU support they chose to base their format on the then recently obsoleted COFF format, which at the time was presumably implemented on more CPU types than the newer ELF format. Meanwhile the Mach kernel developers were off in their own world, futilely trying to create a workable microkernel. I'm not sure exactly what they were thinking with Mach-O, but they had to replace a.out with something else, and I guess COFF wasn't up to the task and ELF didn't exist yet.


Official documentation for some of the formats above is listed below.

Windows PECOFF:

https://docs.microsoft.com/en-us/windows/win32/debug/pe-format

Linux ELF:

https://www.man7.org/linux/man-pages/man5/elf.5.html

Originally published: 2020-06-06. In case of infringement, contact cloudcommunity@tencent.com for removal.

This article is shared from the WeChat official account Linux内核及JVM底层相关技术研究.
