Size Coding for Block Boot

I thought I'd show some of the coding tricks necessary to shoehorn Block Boot, a 1K intro, into an Amiga floppy bootblock.
The full source is available in the Sources directory. It includes the music player for the 1Klång format.
I made a 'stealth mode' with an IF directive and an alternate palette using all near-white colors so I wouldn't reveal what I was working on at the party ;)

Tricks?? What tricks??

Well, at the end of a size-coding project you may find yourself 3 bytes over the limit and have to go byte hunting. Sometimes at times like these you really have to stretch your imagination to do what seems utterly impossible at the time. Because if you've done your job well, you've already sleuthed out the optimal way of bringing down the binary size so that it's "already optimal" with no bytes to gain.
Similarly, if you've done your job well, crunching (data compression) is not an option. You've already eliminated re-used instructions and patterns in the data.
Byte hunting is mostly just pouncing on the few remaining chance occurences that 1 instruction "could be made unnecessary" and be removed (if that makes sense).
Instead, there are many more bytes to be had from thinking of the demo as a whole. In my case, I was already knee deep in other Amiga coding projects but decided to make a bootblock for the Datastorm 2014 compos when this silly pun popped into my head. It absolutely stuck in my head, I just had to code that! :P

Block Boot


Okay... so a boot made out of blocks. I could build it out of Tetris pieces and of course I had to have some Russian music playing :D Luckily I had developed the positively tiny 1Klång synth. But from Piramida I knew that even though it's the tiniest synth, I would have to plan out the song to have room for the visuals code I wanted. I would not go over budget this time.
I went on one of my "research sprees", exploring Polyominoes, "official block colors", and Russian folk music. It's fun to learn stuff! :)

But how to make it any good??

I knew that I would make some sort of Tetris simulator, but just some blocks falling... wouldn't that be boring to watch? Well, Korobushka was played faster and faster in the songs I'd heard, and Tetris usually speeds up. So I'd have tempo changes in the song and then some sort of ending (turned out this ending didn't fit).
Being the author, I knew 1Klång inside and out I was able to eliminate song bytes from the Protracker->1Klång converter until it ended up as 93 bytes:
Song:
.o1	=12*K*1
.o2	=12*K*2
  dc.w .o1+16*K+.p0s0c0-.n,.o2+16*K+.p0s0c3i-.n,12*K-.o1+.p0s0c2i-.n,.o1+16*K+.p0s0c3i-.n
  dc.w .o1+16*K+.p0s1c0-.n,.o2+16*K+.p0s1c3i-.n,16*K-.o1+.p0s1c2-.n,.o1+16*K+.p0s1c3i-.n
  dc.w .o1+24*K+.p0s2c0-.n,.o2+24*K+.p0s2c3i-.n,24*K-.o1+.p0s2c2-.n,.o1+24*K+.p0s2c3i-.n
  dc.w .o1+16*K+.p0s1c0-.n,.o2+16*K+.p0s1c3i-.n,21*K-.o1+.p0s3c2-.n,.o1+16*K+.p0s1c3i-.n

.Notes:
.n:
.p0s0c3i:
	dc.b 0,$F3	;,$FE,$90,$00	;1 note chan delay
.p0s0c0:
  dc.b $d0,$89,$b0,$98,$60,$69,$d0,$b9
.p0s0c2i:
	dc.b $F1
.p0s0c2:
  dc.b $90,$50,$90,$50,$a0,$50,$a0,$50
;;    ---    ---
.p0s1c3i:
	dc.b 0	;;dc.b $FE,$90,$00
.p0s1c0:
  dc.b $80,$89,$b0,$d0,$90,$60,$60,$FF
;;    ---    ---
.p0s2c3i:
	dc.b 0	;dc.b $FE,$90,$00
.p0s2c0:
  dc.b $30,$36,$a0,$86,$50,$10,$50,$31
.p0s2c2:
  dc.b $60,$30,$60,$30	;continues
.p0s1c2:
  dc.b $50,$10,$50,$10,$F2,$60,$60,$F1,$68,$9D
;;    ---    ---
.p0s3c2:
  dc.b $F2,$64,$31,$34,$68,$F1,$d0,$10,$10,$FF
SongE:

I went ahead and drew a boot with pixels in DPaint and scaled it up to se how it'd look. I experimented with shading them to look beveled and less drab. When I had a full screen boot in DPaint that looked "as good as it gets" I planned how to store the pieces and the falling positions. I figured the song + 1Klång player to eat at least half the 1K so I'd better not waste a single bit!
TheBoot:
	dc.b s1+10,l3+8,t1+5,s2+3,z1+0,j3+0,i2+4,i2+8,i2+14,j1+18-16
	dc.b j3+12,o1+15,o1+17-16
	dc.b j2+0,l1+18-16,z1+11,t2+9,s2+11,t2+1,i2+6,i2+3
	dc.b j2+18-16,o1+18-16,o1+11,z1+10,o1+10,j1+18-16,l4+17-16,j4+10,z1+12
	dc.b s1+15	;la pjÄs de la resistance
TheBootE:	;31b

TheBootBits:
	dc.b %00000000
	dc.b %01010010
	dc.b %01100000
	dc.b %00001100

Pieces:	;4x4 grid, y0y1y2y3 for rightmost column, and so on. from BR corner up.
	dc.w %100010001000*f+r9	;I	0 (special: 4th column needed.
	dc.w %100011100000*f+r2	;j	1
	dc.w %111010000000*f+r3	;l	2
	dc.w %110011000000*f+r4	;o	3
	dc.w %100011000100*f+r5	;s	4
	dc.w %010011000100*f+r6	;t	5
	dc.w %010011001000*f+r7	;z	6
	dc.w %110010001000*f+r2	;j4	8

	dc.w %100010001100*f+r3	;l4	7
	dc.w %111000100000*f+r2	;J	9
	dc.w %001011100000*f+r3	;L	a
	dc.w %011011000000*f+r5	;S	b
	dc.w %100011001000*f+r6	;T	c
	dc.w %010001001100*f+r2	;j3	e
	dc.w %110001000100*f+r3	;l3	f

TheBoot defines the piece drop sequence and X position. One of the problems I had was that I needed 4 bits for the piece and more than 4 bits for the position. In .getpiece I map the 20 drop positions to 4 bits using TheBootBits, a 4 bytes shift-match table.
Pieces (called 's', 'j', see Wikipedia) defines the pieces, bit patterns set in a 4x4 grid, and their colors, in 30 bytes.

Starting to code

I needed an absolutely minimal init routine. After experimentation I ended up with:
	lea R(PC),a5			;a single (PC) reference and no more
	lea Cop0-R(a5),a4               ;used as a base register
	move.l a4,$80-C(A6)		;MUST be directly after above wait.

and then setting DMA in the frameloop:
	move.w #$87c0,$96-C(A6)

Pretty minimal if you ask me :)
But, the 1Klång player init is of course excluded from this, and I also added my standard BSS clear trick - to the wave generator loop of said init - by adding two instructions. In the same way, I joined the palette copy loop (a Copper palette list took more bytes) with the bitplane pointer poke loop (same reason) inside the frame loop.

"Just increase the song speed"

How hard can it be!? ;) Well, for one thing, there was no room for a level 6 interrupt, so I'd use the equivalent of Protracker's speed commands F07, F06, F05... So the tempo must be sped up in steps, at even pattern breaks.
But the pieces of course kept falling at the regular speed, using the simply gravity (acceleration) formula. I wanted them to hit the bottom synced with the music and some visual "boom" effect when they landed.
That meant that the pieces would have to predict when they landed! A hairy problem indeed. I did not have the budget for scripted timing data.
It took 3 evenings in math land to create an exponential curve to match the bricks' falling curve:
	move.w #$17,(a5)
;...
	lea PieceCtr-R(A5),a2
	moveq #0,d2
	move.w #16*$17,d2
	divu (a5),d2            ;a5=S1K_Data=_Tempo
	moveq #$17*2+2,d3
	divu (a5),d3
	add.w d3,d2	
	mulu d2,d2		;256 and increasing

With this and a reset curve trigger, I could have a constant that would map to a point ahead on the current tempo curve that would then match the drop impact ahead of time, counted in frames.

Collision detection

I had already designed the pixel-boot in DPaint so the pieces would land in the correct places to form a boot outline. But I would need the pieces to fit onto each other (like in Tetris) as well.
I solved this by adding a collision check on each 16x16 block of a piece that I drew. This allowed the DrawPiece routine to be re-used in a no-draw mode to know when it should snap to a Y position exactly above the colliding block of a previous piece, and trigger the screen shake effect:
	moveq #-1,d6			;don't draw, just check
	bsr.b DrawPiece			;test-draw (just check)
	beq.s .nocoll
.coll:
	moveq #-$10,d2
	and.w (a2),d2
	cmp.w (a2),d2			;prohibit infinite loop
	beq.s .nocoll			;done
	move.w d2,(a2)
	clr.l -8(a2)			;flag no more dropping
;;    ---  trig boing  ---
	addq.b #8,ScrY-R(A5)
	bra.s .tryagain			;once more (to save recalc only)
	
.nocoll:
	moveq #0,d6			;draw (and check)
	bsr.b DrawPiece
;...

Getting away cleanly

The intro would have to end (stop dropping pieces and stop the music) exactly when the boot had been formed (when the last piece had landed).
This caused some problems (size-wise) since I could trigger it with 2 lines of code, causing the pattern to play an extra note before shutting up, or 14 lines of code that would do it properly. I had 9 bytes to spare, but 14 lines of code is much more than 9 bytes...
All the work would become a BOO! block if I didn't fix this. It was vital, and I didn't have the bytes.
So in the end, I had to go on a late night 6 hour byte hunt at the party even though I'd brought a final version on floppy with me (because I had added a few instructions I deemed very necessary for release quality).

The Safe For Work Mode

I made a 'stealth mode' with an IF directive and an alternate palette using all near-white colors so I wouldn't reveal what I was working on at the party ;) It's still in the source, just set MONO=1!
Among other things (you're all over the source when byte hunting), I managed to rearrange the 1Klång player and stripping two or three instructions that I knew the song didn't need.
So finally the release was perfect and fitted in 1021 bytes for the compos the next day.

Summary of tricks used:

  • Plan out the data budgets - represent each data chunk in an optimal format that doesn't waste bits
  • Eliminate byte gaps from EVEN directives: check if you put one odd-length data chunk after another, or make a variable byte size to put in the gap
  • Combine loops to do more than one thing
  • Use loops in the frameloop to set consecutive palette/pointer registers instead of wasting Copper words
  • Re-use routines with "mode flags" where possible
  • BSS clear trick: Instead of dc.w 0's for variables initalized as 0, move them to beyond the end of the binary and clear that block
  • Last: Byte hunting by rearranging code or relying on previous register contents so 1 instruction is "made unnecessary"