Jesse Luehrs

writing an nes game, part 2

code layout

Before we get into the code itself, there are a couple more concepts to cover, since they explain why the code is laid out the way it is.

First, we need to understand how the PPU handles drawing. The NES was designed to work with CRT screens, which draw one horizontal line at a time from the top of the screen to the bottom, and then start again at the top. This is done by a device called an electron gun, which fires electrons at a screen of pixels. The key point here is that drawing is done sequentially, one pixel at a time, and the mechanism requires some amount of time to move from one line to the next, and to move from the bottom of the screen back to the top. The period of time when the electron gun is moving from the end of one line to the beginning of the next is called the “HBlank” time, and the period of time when the electron gun is moving from the bottom of the screen back to the top is called the “VBlank” time. Outside of HBlank and VBlank, the PPU is busy working out what actually needs to be drawn, and touching it at all can cause all kinds of weird graphical glitches, so we need to make sure we only communicate with the PPU during HBlank or VBlank.

The way the NES handles this is to provide an interrupt (called NMI) which fires at the beginning of every VBlank, which allows you to do all of your drawing during a safe period of time. HBlank is harder to detect and not as useful (since it occurs in the middle of drawing a frame), so it is not typically used except for some visual effects. NTSC television screens refresh at about 60 frames per second and the CPU clock speed in the NES is approximately 1.79MHz, so we get approximately 30,000 CPU cycles per frame, which translates into roughly 5,000-10,000 instructions. VBlank, though, only lasts around 2273 cycles (roughly 400-800 instructions), so drawing code needs to be especially efficient. In particular, we don’t want to do any game logic at all during VBlank time, since that time is so limited.
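The cycle budget above can be reproduced with a little arithmetic. The sketch below is only a back-of-the-envelope check, written in Python rather than 6502 assembly so it can be run anywhere. The scanline and dot counts are the commonly documented NTSC PPU figures (an assumption, they are not stated in this post), and the 3 and 6 cycles per instruction are assumed averages picked to match the estimates in the text.

```python
# NTSC NES timing constants. These are assumptions taken from commonly
# documented hardware figures, not from the post itself.
CPU_CLOCK_HZ = 1_789_773          # approximately 1.79 MHz
PPU_DOTS_PER_SCANLINE = 341
SCANLINES_PER_FRAME = 262
VBLANK_SCANLINES = 20
PPU_DOTS_PER_CPU_CYCLE = 3

# Assumed average instruction costs, chosen to match the post's estimates.
FAST_CYCLES_PER_INSTRUCTION = 3
SLOW_CYCLES_PER_INSTRUCTION = 6


def cpu_cycles(scanlines):
    """CPU cycles that elapse while the PPU steps through `scanlines` lines."""
    return scanlines * PPU_DOTS_PER_SCANLINE / PPU_DOTS_PER_CPU_CYCLE


def instruction_range(cycles):
    """(fewest, most) whole instructions that fit, for the assumed costs."""
    return (int(cycles // SLOW_CYCLES_PER_INSTRUCTION),
            int(cycles // FAST_CYCLES_PER_INSTRUCTION))


cycles_per_frame = cpu_cycles(SCANLINES_PER_FRAME)
vblank_cycles = cpu_cycles(VBLANK_SCANLINES)
frames_per_second = CPU_CLOCK_HZ / cycles_per_frame

print(f"cycles per frame:     {cycles_per_frame:.0f}")
print(f"cycles in vblank:     {vblank_cycles:.0f}")
print(f"frames per second:    {frames_per_second:.2f}")
print(f"instructions/frame:   {instruction_range(cycles_per_frame)}")
print(f"instructions/vblank:  {instruction_range(vblank_cycles)}")
```

The takeaway is the ratio more than the exact numbers: VBlank is well under a tenth of the frame, which is why the drawing code has to stay so small.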

The other aspect that needs to be handled is system initialization. When the CPU starts up, it’s in an undefined state, so we need to set things up to ensure that the game executes in a repeatable way. Emulators tend to be consistent in how they initialize the system state at startup, but this isn’t true of the real hardware, so it’s important to do this explicitly. Also, the PPU requires initialization, but that is handled automatically. It does take roughly 30,000 CPU cycles though (about one frame), so we wait for two frames before starting our main game code. Two frames is plenty of time to do any initialization we might need to do.

To illustrate these concepts, here is an example program which modifies the background color every second. The details of how the background color is set aren’t particularly important (it’s not really a feature you’re likely to use very often), but this should illustrate the basic structure of an NES game. This isn’t intended to be a lesson on 6502 assembly (there are plenty of much better tutorials and references out there for that - see below), but just to show how games for the NES specifically are structured.

.ROMBANKMAP
BANKSTOTAL  2
BANKSIZE    $4000
BANKS       1
BANKSIZE    $2000
BANKS       1
.ENDRO

.MEMORYMAP
DEFAULTSLOT  0
SLOTSIZE     $4000
SLOT 0       $C000
SLOTSIZE     $2000
SLOT 1       $0000
.ENDME

.ENUM $00
; declare the label 'sleeping' to refer to the byte at memory location $0000
sleeping     DB
; 'color' will then be at $0001
color        DB
; and 'frame_count' will be at $0002
frame_count  DB
.ENDE

  .bank 0 slot 0
  .org $0000
RESET:
  ; First, we disable pretty much everything while we try to get the system
  ; into a consistent state. In particular, we really don't want any
  ; interrupts to fire until the stack pointer is set up (because interrupt
  ; calls use the stack), and we don't want any drawing to be done until the
  ; PPU is initialized. 
  SEI              ; disable all IRQs
  CLD              ; disable decimal mode
  LDX #$FF
  TXS              ; Set up stack (grows down from $FF to $00, at $0100-$01FF)
  INX              ; now X = 0
  STX $2000.w      ; disable NMI (we'll enable it later once the ppu is ready)
  STX $2001.w      ; disable rendering (we're not using it in this example)
  STX $4010.w      ; disable DMC IRQs
  LDX #$40
  STX $4017.w      ; disable APU frame IRQ

  ; First wait for vblank to make sure PPU is ready. The PPU sets bit 7 of
  ; $2002 when vblank starts, so we just loop until we notice it.
vblankwait1:
  BIT $2002        ; copies bit 7 of $2002 into the N flag
  BPL vblankwait1  ; and BPL keeps looping while N (bit 7) is clear

  ; set everything in ram ($0000-$07FF) to $00, except for $0200-$02FF which
  ; is conventionally used to hold sprite attribute data. we set that range
  ; to $FE, since that value as a position moves the sprites offscreen, and
  ; when the sprites are offscreen, it doesn't matter which sprites are
  ; selected or what their attributes are
  LDX #$00         ; X still holds $40 from the $4017 write, so reset it
clrmem:
  LDA #$00
  STA $0000, x
  STA $0100, x
  STA $0300, x
  STA $0400, x
  STA $0500, x
  STA $0600, x
  STA $0700, x
  LDA #$FE
  STA $0200, x
  INX
  BNE clrmem       ; loop until X wraps around to $00

  ; initialize variables in ram
  LDA #%10000001
  STA color
  ; no need to initialize frame_count or sleeping, since we just set them to
  ; $00 in the clrmem loop

  ; Second wait for vblank, PPU is ready after this
vblankwait2:
  BIT $2002
  BPL vblankwait2

  LDA #%10000000   ; enable NMI interrupts now that the PPU is ready
  STA $2000

  ; sleep while vblank is happening. this serializes the code flow a bit
  ; (the NMI interrupt will almost certainly occur while we are in this loop
  ; unless we do a significant amount of processing in the main codepath, so
  ; it won't interrupt anything important). it also ensures that our game
  ; logic only executes once per frame.
loop:
  INC sleeping
wait_for_vblank_end:
  LDA sleeping
  BNE wait_for_vblank_end

  ; change color every 60 frames
  LDX frame_count
  INX              ; count the frame that just finished
  CPX #60
  BCS change_color
  STX frame_count
  JMP loop_end

change_color:
  LDA #$00
  STA frame_count
  LDX color
  CPX #%10000001
  BEQ turn_green
  CPX #%01000001
  BEQ turn_red

turn_blue:
  LDA #%10000001
  STA color
  JMP loop_end
turn_green:
  LDA #%01000001
  STA color
  JMP loop_end
turn_red:
  LDA #%00100001
  STA color

loop_end:
  JMP loop

NMI:
  ; save the contents of the registers on the stack, since the interrupt can
  ; be called at any point in our main loop
  PHA
  TXA
  PHA
  TYA
  PHA

  LDA color
  STA $2001

  ; indicate that we're done drawing for this frame
  LDA #$00
  STA sleeping

  ; and restore the register contents before returning
  PLA
  TAY
  PLA
  TAX
  PLA
  RTI

  .orga $FFFA
  .dw NMI
  .dw RESET
  .dw 0

  .bank 1 slot 1
  .org $0000
  .incbin "sprites.chr"

The first thing we do when the system turns on is disable IRQ interrupts. Calling and returning from interrupts uses the system stack, but the stack pointer could be pointing anywhere at this point, so interrupts would confuse things quite a bit. We never reenable IRQ interrupts here because we don’t use them at all (they would be reenabled by the CLI instruction). Next we disable decimal mode (this shouldn’t actually do anything, since the NES’s CPU doesn’t support BCD arithmetic, but there’s no real reason not to do it, to avoid confusion) and set the stack pointer to $FF. The stack pointer is stored in the register named S, and it is a one-byte offset from the RAM address $0100. The stack grows downward, so the stack pointer should start out pointing to $01FF, and then it will be decremented by PHA instructions and incremented by PLA instructions as necessary. Finally, we disable a bunch of other functionality on the PPU and APU, since we don’t want them to be active until we have finished initializing.
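To make the stack behavior concrete, here is a minimal Python model of it. This is a sketch for illustration only: the `Stack6502` class and its method names are made up for this example and are not part of any emulator or toolchain. It assumes the usual 6502 convention that a push stores at $0100 + S and then decrements S, and a pull increments S and then reads.

```python
class Stack6502:
    """Toy model of the 6502 hardware stack (hypothetical helper class)."""

    BASE = 0x0100  # the stack always lives in page 1 of RAM

    def __init__(self, ram):
        self.ram = ram
        self.s = 0xFF  # what LDX #$FF / TXS sets up in the reset code

    def push(self, value):
        """Like PHA: store at $0100 + S, then move S down by one."""
        self.ram[self.BASE + self.s] = value & 0xFF
        self.s = (self.s - 1) & 0xFF

    def pull(self):
        """Like PLA: move S up by one, then read from $0100 + S."""
        self.s = (self.s + 1) & 0xFF
        return self.ram[self.BASE + self.s]


ram = bytearray(0x0800)  # the 2KB of RAM at $0000-$07FF
stack = Stack6502(ram)

# Save three values, the way an interrupt handler saves A, X and Y.
for register_value in (0x11, 0x22, 0x33):
    stack.push(register_value)

print(f"S after three pushes: ${stack.s:02X}")
print(f"first value pulled:   ${stack.pull():02X}")
```

Because S is only one byte, it wraps around within page 1 instead of running into the rest of RAM, which is why setting it to $FF at startup gives the stack the whole page to grow into.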

We need to wait for a total of two frames to ensure that the PPU is entirely initialized, so we next wait for the first frame to end, and then clear out the entire RAM space, and then wait for the second frame to end. At this point, the PPU is initialized, so we can enable NMI interrupts (by setting a bit in the PPU control register at $2000) and begin our main loop.

The main loop is where all of the logic goes. In this example, we just increment the frame count every frame, and change the background color (via some magic) every 60 frames. This allows the NMI interrupt to do nothing more than write a single value to the PPU, without requiring any logic at all. This illustrates the basic principle of using the main game loop to set up values in memory, which the code in the NMI interrupt can just read and act on directly, without requiring any calculations.
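The split between the main loop and the NMI handler can also be modelled outside of assembly. The Python sketch below mirrors the control flow of the example program under some simplifying assumptions: the hardware interrupt is replaced by a direct call to `nmi()`, writes to $2001 are just recorded in a list, and the `Game` class and its method names are invented for this illustration.

```python
# Values the example program stores in 'color' and writes to $2001.
BLUE, GREEN, RED = 0b10000001, 0b01000001, 0b00100001
NEXT_COLOR = {BLUE: GREEN, GREEN: RED}  # anything else goes back to BLUE


class Game:
    """Toy model of the example's main loop and NMI handler."""

    def __init__(self):
        self.sleeping = 0
        self.color = BLUE
        self.frame_count = 0
        self.ppu_mask_writes = []  # stands in for writes to $2001

    def nmi(self):
        """Drawing code: no logic, just copy a prepared value to the PPU."""
        self.ppu_mask_writes.append(self.color)
        self.sleeping = 0

    def update(self):
        """Game logic: runs once per frame, outside of vblank."""
        count = self.frame_count + 1
        if count >= 60:
            self.frame_count = 0
            self.color = NEXT_COLOR.get(self.color, BLUE)
        else:
            self.frame_count = count

    def run(self, frames):
        for _ in range(frames):
            self.sleeping += 1  # INC sleeping
            self.nmi()          # simplification: the interrupt fires here
            self.update()       # the wait loop exits once sleeping is 0


game = Game()
game.run(181)
for frame in (0, 60, 120, 180):
    print(f"frame {frame}: wrote {game.ppu_mask_writes[frame]:08b}")
```

Note that `nmi()` never makes a decision. Everything it needs was already placed in `color` by `update()` during the previous frame, which is the same division of work the assembly uses.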

Here are some more useful links discussing the topics in this post: