Grandpa, Tell Me The Story Again About The A20 Gate

After my previous article on the keyboard controller interface, there was a global public outcry (well, OK, there was Rothman) absolutely begging for a follow-up on the A20 gate. And so here we are. Anyway, how can you call yourself a BIOS blog and not have an article on the A20 gate???

First, an anecdote. The year was 1999. Dell Technologies hosted a regular technology forum for the engineering organization called the "STR/AIT" meeting. On Nov. 9th, we were invited to a BIOS Technology Overview.

I remember this session well; it is burned into my memory. Dr. Sato was giving a talk to the entire engineering organization about an introduction to BIOS, and guess what he opted to spend the ENTIRE HOUR talking about? You guessed it, the A20 gate.

A disclaimer: is the A20 gate relevant in modern times? The answer is no. Intel removed the corresponding hardware around 2013, and the A20 gate has no relevance to ARM or RISC-V. However, it makes for a great story, and is super-interesting history, so please read on!

Basic Concepts

There are several foundational concepts that must be established before we can make sense out of any discussion of the A20 gate:

1. 8088/8086 Segmented Memory Model
2. Near vs. Far Pointers
3. Interrupt Vector Table

1. 8088/8086 Segmented Memory Model

The IBM PC (1981) launched with the Intel 8088 CPU, a cost-reduced version of the Intel 8086. These CPUs were 16-bit processors, meaning that the width of their general purpose registers was 16 bits. Moreover, the 8088/8086 had a 20-bit address bus, represented by signals A0 - A19, which allowed addressing up to 1MB of RAM. (2^20 = 1,048,576 = 1MB)

credit: The Indispensable PC HW Book

You may be thinking: how can you form a 20-bit address if the registers can only hold 16-bit numbers? Intel wondered that too, so they implemented a segmented memory model, where memory is addressed via 64KB regions called segments. Memory addresses are formed by combining two 16-bit numbers, a base and an offset. The base (the segment value) is multiplied by 16 to give the starting address of a 64KB segment in the 1MB memory address space, and the offset is then added to locate a byte inside that 64KB segment. Using this scheme, the entire 1MB address space can be addressed using 16-bit registers.
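As a rough sketch of that arithmetic: the segment value is shifted left by four bits (multiplied by 16) and the offset is added. The function name below is made up for illustration; it is not from any library.

```python
def physical_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a linear address.

    Illustrative helper only: segment * 16 + offset.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    return (segment << 4) + offset


# Segment 0x1234, offset 0x0010: 0x12340 + 0x0010
print(hex(physical_address(0x1234, 0x0010)))  # 0x12350

# Different segment:offset pairs can name the same byte of memory.
print(physical_address(0x1234, 0x0010) == physical_address(0x1235, 0x0000))  # True
```

Note that segments overlap: a new one starts every 16 bytes, so many segment:offset pairs alias the same physical location.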

A key point to conclude this section: if you create a segment based near the top of the 1MB address space, and set an offset that extends beyond the 1MB barrier, what happens? On the 8088/8086, there is no address line to carry the 21st bit, so the address wraps around, back to zero. Keep this point in mind.
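Here is a small sketch of the wrap-around, assuming we model the 20-bit address bus as a simple bit mask. The names are hypothetical and chosen only for this example.

```python
ADDRESS_BUS_BITS = 20
ADDRESS_MASK = (1 << ADDRESS_BUS_BITS) - 1  # 0xFFFFF, i.e. lines A0 to A19


def address_on_8088(segment: int, offset: int) -> int:
    """Linear address as seen on a 20-bit bus; any carry into bit 20 is lost."""
    linear = (segment << 4) + offset  # can be as large as 0x10FFEF
    return linear & ADDRESS_MASK


print(hex(address_on_8088(0xFFFF, 0x000F)))  # 0xfffff  (last byte below 1MB)
print(hex(address_on_8088(0xFFFF, 0x0010)))  # 0x0      (wrapped to the bottom)
print(hex(address_on_8088(0xFFFF, 0xFFFF)))  # 0xffef   (almost 64KB past the wrap)
```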

2. Near vs. Far Pointers

In the segmented memory model, when a program references memory, we can differentiate between near pointers and far pointers. A near pointer is a pointer to code within the same 64KB segment, and a far pointer is a pointer to code in a different 64KB segment.

Calling code within the same (near) segment is quick and easy; this is referred to as an intrasegment call. Calling code in a different (far) segment is possible, but involves much more overhead for the CPU; this is called an intersegment call. The overhead of intersegment calls can reduce your program's performance, especially if calls to other segments are happening frequently.
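To make the difference a bit more concrete, here is a toy model of the stack bookkeeping only (it does not model timing). A near call saves just the return offset (IP), while a far call saves both CS and IP and reloads CS. The function names and addresses are invented for this sketch.

```python
def near_call(stack, cs, ip, target_offset):
    """Intrasegment call: only the return offset (IP) is pushed."""
    stack.append(ip)
    return cs, target_offset  # CS is unchanged


def far_call(stack, cs, ip, target_segment, target_offset):
    """Intersegment call: CS and IP are both pushed, and CS is reloaded."""
    stack.append(cs)
    stack.append(ip)
    return target_segment, target_offset


stack = []
cs, ip = near_call(stack, 0x1000, 0x0103, 0x0200)
print(len(stack) * 2, "bytes pushed")  # 2 bytes pushed

stack = []
cs, ip = far_call(stack, 0x1000, 0x0105, 0x2000, 0x0000)
print(len(stack) * 2, "bytes pushed")  # 4 bytes pushed
```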

3. Interrupt Vector Table

The concept of interrupts is popular in many microprocessor and microcontroller paradigms. Either software or hardware agents can interrupt the current program flow. Control is turned over to an interrupt handler which runs a short routine, called an interrupt service routine, which does some work and then returns control back to the program that got interrupted.

Once interrupted, how does the 8088 CPU know where to go to find the appropriate interrupt service routine? The 8088 looks up the address of the appropriate interrupt service routine in the interrupt vector table, which is a simple look-up table that the BIOS fills in at the very beginning of memory, at addresses 0x0 through 0x3FF. The table holds 256 entries, and each entry is a 4-byte far pointer.
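A minimal sketch of that lookup, assuming memory is modeled as a Python bytearray. Each entry stores the 16-bit offset first and the 16-bit segment second, both little-endian. The handler address C000:1234 is a made-up value for illustration.

```python
import struct

VECTOR_SIZE = 4  # one far pointer: 16-bit offset, then 16-bit segment


def read_vector(memory, interrupt_number: int):
    """Return (segment, offset) of the handler for the given interrupt."""
    if not 0 <= interrupt_number <= 0xFF:
        raise ValueError("interrupt number must be 0..255")
    entry_address = interrupt_number * VECTOR_SIZE
    offset, segment = struct.unpack_from("<HH", memory, entry_address)
    return segment, offset


# 256 entries * 4 bytes = 0x400 bytes, covering 0x0 through 0x3FF.
memory = bytearray(0x400)

# Install a hypothetical handler address for interrupt 0x10.
struct.pack_into("<HH", memory, 0x10 * VECTOR_SIZE, 0x1234, 0xC000)

segment, offset = read_vector(memory, 0x10)
print(f"{segment:04X}:{offset:04X}")  # C000:1234
```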

credit: The Undocumented PC

x86 CPUs have an instruction dedicated to calling interrupt service routines, called INT. Typical DOS programs use INT all the time, like calling INT 10h for video services, or INT 13h for storage services, and especially INT 21h, which contains hundreds of different services tailored for DOS programs. Think of services like read/write files, get/set current time and date, print to the screen, print to the printer, etc.

The Hack

With those background concepts established, we can now talk about an optimization, a.k.a. a hack, that some programmers devised to speed up their programs. Considering that:

Programs call interrupts regularly, requiring the CPU to read from the interrupt vector table
The interrupt vector table is located at the beginning of the memory map
Calls above 1MB wrap around back to the beginning of the memory map
Memory accesses within the same segment are quicker than memory accesses in other segments

Clever programmers implemented a technique whereby they would locate their code in a 64KB segment at the top of memory, so when calls to INT were made, the references would wrap around and locate the interrupt vector table and interrupt service routines as if they were in the same segment. The interrupt vector table was technically in the same segment, by virtue of the wrap-around effect, so the call was considered a "near" or "intrasegment" call, and performance was significantly improved over having the call to the interrupt vector table be in a "far" or "intersegment" call.
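As a hedged illustration of the addressing trick (not a reconstruction of any particular program), suppose the code lives in segment 0xFFFF. On a 20-bit bus, offset 0x10 in that segment wraps to physical address 0, so interrupt vector n appears at offset 0x10 + 4n of the same segment. The helper names are hypothetical.

```python
TOP_SEGMENT = 0xFFFF
ADDRESS_MASK_8088 = 0xFFFFF  # 20 address lines


def address_on_8088(segment: int, offset: int) -> int:
    """Linear address on a 20-bit bus; the carry out of bit 19 is dropped."""
    return ((segment << 4) + offset) & ADDRESS_MASK_8088


def wrapped_offset_of_vector(interrupt_number: int) -> int:
    """Offset within segment 0xFFFF that wraps around to the vector's entry."""
    return 0x10 + interrupt_number * 4


offset = wrapped_offset_of_vector(0x21)
print(hex(offset))                                       # 0x94
print(hex(address_on_8088(TOP_SEGMENT, offset)))         # 0x84
print(address_on_8088(TOP_SEGMENT, offset) == 0x21 * 4)  # True
```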

The Intel 80286 is Released (Oops!)

Intel launched the 286 CPU in 1982. One of its new features was a larger address bus, now 24 bits in size, capable of addressing up to 16MB of RAM. (2^24 = 16MB) This meant that the address bus consisted of signals A0 to A23. A 16X increase in memory space; pretty awesome, right?

Intel 286

Well, not so awesome for those programs which relied upon the effect of memory wrapping around back to 0x0 in order to reach the interrupt vector table. Now, the same program that worked on an 8088 CPU tried calling what it expected to be the interrupt vector table, but instead found itself in an empty block of perfectly valid memory above 1MB on the 286.
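The breakage is easy to see with the same arithmetic on two bus widths. This sketch uses FFFF:0094 as the example address, which on a 20-bit bus wraps to 0x84 (the slot for vector 0x21). The function name is an assumption made for this example.

```python
def address_on_bus(segment: int, offset: int, address_bits: int) -> int:
    """Linear address truncated to the number of address lines available."""
    linear = (segment << 4) + offset
    return linear & ((1 << address_bits) - 1)


segment, offset = 0xFFFF, 0x0094

# 8088: 20 address lines, so the access wraps into the interrupt vector table.
print(hex(address_on_bus(segment, offset, 20)))  # 0x84

# 286: 24 address lines, so the same access lands just above 1MB instead.
print(hex(address_on_bus(segment, offset, 24)))  # 0x100084
```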

A20 Gate to the Rescue!

While designing their 286-based PC/AT, IBM saw this problem coming, and so they designed the A20 gate logic, originally an external latch controlled by the keyboard controller, that either allowed or disallowed the A20 address signal to work. Later, the A20 gate functionality was migrated to the embedded controller (EC), which signaled the Platform Controller Hub (PCH) via a dedicated signal, A20GATE; the chipset in turn drove the CPU's A20M# pin.

The A20 gate logic defaulted to the original 8088 behavior—the A20 address line was turned off, and memory accesses beyond 1MB wrapped around back to 0x0. Programs that wanted to take advantage of the 286 and later CPUs' larger memory address space would program the logic gate so that A20 worked normally. Often this was done by a dedicated memory manager, like DOS's HIMEM.SYS.
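A simplified software model of the gate, assuming it does nothing more than force address bit 20 low when disabled. Note that this is not a full 20-bit wrap: only A20 is masked, so higher address bits pass through untouched. The names are illustrative.

```python
A20_BIT = 1 << 20  # the 21st address line


def gated_address(linear: int, a20_enabled: bool) -> int:
    """Address after the A20 gate: bit 20 is forced to 0 when the gate is off."""
    return linear if a20_enabled else linear & ~A20_BIT


linear = (0xFFFF << 4) + 0x0094  # FFFF:0094 = 0x100084

print(hex(gated_address(linear, a20_enabled=False)))  # 0x84     (8088-style wrap)
print(hex(gated_address(linear, a20_enabled=True)))   # 0x100084 (memory above 1MB)

# Only bit 20 is affected; an address with bit 21 set keeps that bit.
print(hex(gated_address(0x300000, a20_enabled=False)))  # 0x200000
```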

credit: The Undocumented PC

Retrospective

These are the kinds of stories and engineering challenges that made the golden era of PC programming fun. Programming wasn't just about importing a framework and calling methods out of pre-defined libraries. Programmers had to understand the interactions between software and hardware, and craft novel solutions, like wrap-around memory accesses, and the A20 gate work-around.

Good times!
