Programming the Blitter

This refresher article describes how to set up a blit operation by waiting for a previous operation to finish, then loading the custom registers in the correct order.

Waiting for the blitter

You must wait for the blitter to finish before loading its registers, because a previous blit may still be running, started either by the operating system or by a chunk of your own code executed earlier.
This is done by testing the blitter busy flag (BBUSY, bit 14) in DMACONR. To be compatible with Amiga 1000s that have the first revision of the Agnus chip, you must read DMACONR once before relying on the value of this bit.
DMACONR is a symbol for a custom chip register offset from the chip base address $dff000; see the complete list or the shorter blitter register list.
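BlitWait:				;expects a6 = $dff000
	tst.w DMACONR(a6)		;dummy read, for compatibility with early Agnus
.waitblit:
	btst #6,DMACONR(a6)		;bit 6 of the high byte = BBUSY (bit 14)
	bne.s .waitblit			;loop while the blitter is busy
	rts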
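Why `btst #6` tests bit 14: with a memory operand, `btst` is byte-sized and takes the bit number modulo 8, and the 68000 is big-endian, so DMACONR(a6) addresses the high byte of the 16-bit register, whose bit 6 is bit 14 (BBUSY) of the full word. The Python sketch below only models that arithmetic on a host machine; the function names are illustrative, not part of any Amiga API.

```python
# Illustrative host-side model of how "btst #6,DMACONR(a6)" sees the 16-bit DMACONR word.
# Function names are hypothetical; this does not run on the Amiga.

BBUSY_WORD_BIT = 14  # blitter busy flag, bit 14 of DMACONR


def high_byte(word: int) -> int:
    """Byte at the even address of a big-endian 16-bit word (what btst on memory reads)."""
    return (word >> 8) & 0xFF


def btst_mem(bit: int, word: int) -> bool:
    """Memory-operand btst: byte-sized, bit number taken modulo 8."""
    return bool(high_byte(word) & (1 << (bit % 8)))


def blitter_busy(word: int) -> bool:
    """Direct test of BBUSY in the full 16-bit register value."""
    return bool(word & (1 << BBUSY_WORD_BIT))


if __name__ == "__main__":
    for value in (0x0000, 0x4000, 0x43F0, 0x03F0):
        print(f"DMACONR=${value:04X}  btst #6 -> {btst_mem(6, value)}  BBUSY -> {blitter_busy(value)}")
```

Both tests agree for every possible register value, which is why the byte-sized `btst` is a valid busy check.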

Starting the blit

Besides the mode shown here, processing a rectangular block of bits, the Blitter has area-fill and line-drawing modes. Block mode is used to scroll, combine, clear, and mask rectangular areas in chip memory, such as your sprite graphics or screen buffer. (On the Amiga, graphics drawn with the blitter are called Blitter OBjects, or BOBs for short, to distinguish them from hardware sprites.)
The Blitter reads from up to three source DMA channels (A, B, and C), combines their bits with a selectable logic function, and writes the result through the fourth, destination channel (D).
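The logic function is selected by an 8-bit minterm value in the low byte of BLTCON0. A convenient way to derive it is to evaluate the desired function bitwise on the channel patterns A=$F0, B=$CC, C=$AA, where each bit position stands for one combination of A, B, and C input bits. The Python sketch below (the helper name `minterm` is hypothetical) does exactly that and reproduces the minterms used in this article.

```python
# Sketch: derive a blitter minterm (low byte of BLTCON0) from a logic function.
# The helper name is hypothetical; the $F0/$CC/$AA patterns are the standard trick.
from typing import Callable

A, B, C = 0xF0, 0xCC, 0xAA  # bit i of each pattern = that channel's input for minterm bit i


def minterm(fn: Callable[[int, int, int], int]) -> int:
    """Evaluate fn bitwise on the channel patterns; the low 8 bits are the minterm."""
    return fn(A, B, C) & 0xFF


if __name__ == "__main__":
    print(f"copy A:     ${minterm(lambda a, b, c: a):02X}")                     # D = A
    print(f"A OR B:     ${minterm(lambda a, b, c: a | b):02X}")                 # D = A + B
    print(f"cookie-cut: ${minterm(lambda a, b, c: (a & b) | (~a & c)):02X}")    # D = AB + ~AC
    print(f"clear:      ${minterm(lambda a, b, c: 0):02X}")                     # D = 0
```

The final `& 0xFF` matters because Python's `~` produces negative integers; masking keeps only the eight minterm bits.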

Load order

Load order is important. In general, only one point causes potential trouble: the first word of data is loaded and shifted immediately when a data register is written, using the shift value currently held in BLTCONx. Therefore, a safe convention is 1) BLTCONx 2) (data registers, if used) 3) masks 4) modulos 5) channel pointers. (Since D is only written to, its channel pointer can be changed at any time before the blit starts.)
Always write BLTSIZE last, because writing it starts the blit immediately (on the next available memory cycle).
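On the original chipset, BLTSIZE packs the height in lines into bits 15 to 6 and the width in words into bits 5 to 0, which is where the `blith*64+blitw` expression in the example comes from. A height of 1024 or a width of 64 wraps to 0 in its field, which the hardware treats as the maximum. The Python sketch below (hypothetical helper `bltsize`) computes and range-checks the value under those original-chipset limits.

```python
# Sketch: build a BLTSIZE value (height in bits 15..6, width in words in bits 5..0).
# Assumes original-chipset limits: 1..1024 lines, 1..64 words. Helper name is hypothetical.

def bltsize(width_words: int, height_lines: int) -> int:
    if not 1 <= width_words <= 64:
        raise ValueError("width must be 1..64 words")
    if not 1 <= height_lines <= 1024:
        raise ValueError("height must be 1..1024 lines")
    # A field value of 0 means the maximum (64 words / 1024 lines), so mask to field width.
    return ((height_lines & 0x3FF) << 6) | (width_words & 0x3F)


if __name__ == "__main__":
    print(f"${bltsize(2, 32):04X}")  # 32 lines of 2 words, same as 32*64+2
```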

Example

blitw	=Spritewidth/16			;sprite width in words (Spritewidth must be a multiple of 16)
blith	=Spriteheight			;sprite height in lines

	lea $dff000,a6			;custom chip base
	bsr BlitWait
	move.l #$09f00000,BLTCON0(a6)	;BLTCON0=$09f0 (A->D copy), BLTCON1=0: no shifts, ascending mode
	move.l #$ffffffff,BLTAFWM(a6)	;no masking of first/last word
	move.w #0,BLTAMOD(a6)		;A modulo=0: sprite lines are stored back to back
	move.w #Screenwidth/8-blitw*2,BLTDMOD(a6)	;D modulo=bytes to skip to the next screen line
	move.l #Sprite,BLTAPTH(a6)	;source graphic top left corner
	move.l #Screen+byteoffset,BLTDPTH(a6)	;destination top left corner (word-aligned)
	move.w #blith*64+blitw,BLTSIZE(a6)	;rectangle size, starts blit
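The constants in the example can be checked with a little arithmetic. The Python sketch below assumes a single bitplane, a sprite width that is a multiple of 16 pixels, and a destination x coordinate that is also a multiple of 16 (the example uses no shift). `Screenwidth`, `Spritewidth`, and `byteoffset` come from the example; the function name and its x/y parameters are hypothetical.

```python
# Sketch: compute the register values used by the A->D copy example.
# Assumes one bitplane, Spritewidth a multiple of 16, and x a multiple of 16 (no shift).

def copy_blit_values(screen_width: int, sprite_width: int, sprite_height: int,
                     x: int, y: int) -> dict:
    assert sprite_width % 16 == 0 and x % 16 == 0, "example handles word-aligned blits only"
    bytes_per_row = screen_width // 8           # one bitplane row, in bytes
    blitw = sprite_width // 16                  # blit width in words
    return {
        "BLTCON0/1": 0x09F00000,                # USEA|USED, minterm $F0 (D = A), no shifts
        "BLTAMOD": 0,                           # source rows are packed back to back
        "BLTDMOD": bytes_per_row - blitw * 2,   # bytes to skip to the next screen row
        "byteoffset": y * bytes_per_row + x // 8,  # top-left corner within the bitplane
        "BLTSIZE": sprite_height * 64 + blitw,  # height in bits 15..6, width in bits 5..0
    }


if __name__ == "__main__":
    for name, value in copy_blit_values(320, 32, 32, 48, 10).items():
        print(f"{name:10} ${value:X}")
```

For a 320-pixel-wide screen, each row is 40 bytes, so a 32-pixel (2-word) sprite needs a destination modulo of 36 bytes.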

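This is not a full bob, only a simple variant: it copies a rectangular area from a graphic to the screen and overwrites whatever was there. To preserve the background, you must also read the screen through a source channel, and use another channel as a mask that selects which pixels come from the graphic and which from the background. This "cookie-cut" blit typically uses $fca (all four channels, D = AB + ~AC, with A as the mask, B as the graphic, and C as the background). An OR blit can use $dfc (channels A, B, and D, D = A OR B), and a clear uses $100 (channel D only, D = 0). In each case, replace the $9f0 in BLTCON0 and set up the pointer and modulo registers of the channels in use accordingly.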
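BLTCON0 holds the A shift (ASH) in bits 15 to 12, the channel enable bits USEA, USEB, USEC, and USED in bits 11 to 8, and the minterm in bits 7 to 0; that is how values such as $9f0, $fca, $dfc, and $100 are built. For a shifted cookie-cut, the same shift normally goes into both ASH and BSH (bits 15 to 12 of BLTCON1) so that the mask and the graphic stay aligned. The Python sketch below (hypothetical helper names) only composes the register values; it does not handle the extra word and masks a shifted bob also needs.

```python
# Sketch: compose BLTCON0 from shift, enabled channels and minterm.
# Bit layout: ASH in 15..12, USEA/USEB/USEC/USED in 11..8, minterm in 7..0.
USEA, USEB, USEC, USED = 0x0800, 0x0400, 0x0200, 0x0100


def bltcon0(minterm: int, channels: int, ash: int = 0) -> int:
    if not 0 <= ash <= 15:
        raise ValueError("shift must be 0..15")
    return (ash << 12) | channels | (minterm & 0xFF)


def bltcon1_bsh(bsh: int) -> int:
    """BLTCON1 value with only the B shift set (ascending mode, no fill or line bits)."""
    if not 0 <= bsh <= 15:
        raise ValueError("shift must be 0..15")
    return bsh << 12


if __name__ == "__main__":
    print(f"copy A->D:  ${bltcon0(0xF0, USEA | USED):04X}")
    # A = mask, B = bob graphic, C = background, D = screen
    print(f"cookie-cut: ${bltcon0(0xCA, USEA | USEB | USEC | USED, ash=5):04X}")
    print(f"A OR B:     ${bltcon0(0xFC, USEA | USEB | USED):04X}")
    print(f"clear:      ${bltcon0(0x00, USED):04X}")
```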

Further notes

The Blitter uses DMA and shares chip memory cycles with the CPU, so a heavy blit will block the CPU's chip memory accesses and slow down your code. When the blit does not already take every available cycle, you can make it finish faster by setting the BLTPRI bit (bit 10, also called "blitter nasty") in DMACON, which gives the blitter priority over the CPU. Conversely, clearing this bit leaves every fourth cycle free for the CPU, so that the two can work concurrently.
If the code that loads the blitter registers runs from chip memory, BLTPRI is set, and the blit takes every available memory cycle, you do not have to wait for the blitter: the CPU cannot fetch further instructions, so it effectively stalls until the blit has finished.
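DMACON is written with a SET/CLR convention: if bit 15 of the written word is 1, every other 1-bit in the word sets the corresponding flag; if bit 15 is 0, every other 1-bit clears it, and 0-bits leave flags unchanged. Since BLTPRI is bit 10, writing $8400 to DMACON sets it and writing $0400 clears it. The Python sketch below models that rule on a host machine; the helper name and the sample register value are illustrative only.

```python
# Sketch: model the SET/CLR write semantics of DMACON (host-side illustration only).
# On the Amiga: move.w #$8400,DMACON(a6) sets BLTPRI, move.w #$0400,DMACON(a6) clears it.
SETCLR = 0x8000  # bit 15: 1 = set the other 1-bits, 0 = clear them
BLTPRI = 0x0400  # bit 10: blitter priority over the CPU ("blitter nasty")


def dmacon_write(current: int, value: int) -> int:
    """Return the new DMA control state after writing `value` to DMACON."""
    bits = value & 0x7FFF
    if value & SETCLR:
        return current | bits
    return current & ~bits & 0x7FFF


if __name__ == "__main__":
    state = 0x03F0  # hypothetical current DMA enable bits
    state = dmacon_write(state, SETCLR | BLTPRI)
    print(f"after $8400: ${state:04X}")  # BLTPRI set
    state = dmacon_write(state, BLTPRI)
    print(f"after $0400: ${state:04X}")  # BLTPRI cleared again
```

Because 0-bits are ignored, one write can toggle BLTPRI without disturbing the other DMA enable flags.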