Dirty tricks 6502 programmers use (2019)

Posted by amichail 4/16/2025

Dirty tricks 6502 programmers use (2019) (nurpax.github.io)

161 points | 61 comments

pnw 4/16/2025|

My VIC-20 coding trick as a 14 year old was to use the 828 byte cassette buffer for storage in my games because it took me forever to save up for the 6.5k RAM expansion.

anyfoo 4/16/2025||

When programming assembly, it was common to just indiscriminately use all RAM, no matter what the kernal[1]/basic used it for.

When programming basic, it was common to repurpose memory regions that were meant for something else if you didn’t need them, like you did, knowing that you wouldn’t use the cassette routines.

On the C64, there were some common “autorun” tricks that loaded the program into a buffer overlapping with the keyboard/command buffer, so that after loading completed, the program would magically start without having to type “RUN” or “SYS” with some arcane address.

[1] Not a typo, Commodore called it “KERNAL” with an “A”.

pacificmint 4/16/2025|||

There was also that 4k block of memory at $C000. It was in between the ROM blocks, and by default it was totally unused.

Basic couldn't utilize it, but in assembly it was a great area of extra memory, and you could use it without even switching the ROMs off.

anyfoo 4/17/2025|||

Yep! For those reasons, it was more or less the "default" target for assembly programs without special requirements. So much so that even as a child I knew "SYS49152" ($C000 in decimal) by heart.

ako 4/17/2025|||

The Basic interpreter used $a000 to $c000 if I remember correctly, and the screen character buffer was at $400. If you didn’t need to display anything you could use it for something else.

anyfoo 4/16/2025|||

800/3.5k, a giant amount indeed!

sixothree 4/16/2025||

That’s a lot of bytes to leave unused.

pvg 4/16/2025||

Thread some years ago https://news.ycombinator.com/item?id=20732867

vidarh 4/16/2025|

Always fun to see what we wrote years ago...

chillingeffect 4/16/2025||

One of the neatest things I've heard demo scene people do on the 6510/6502 is a "stack machine." I don't know exactly, but from what I understand, it works like this:

It uses the stack page, the 256 bytes from 0x100 - 0x1ff, which generally stores two-byte pointers to code. When each routine finishes, it executes RTS and the CPU automatically pulls the next 16-bit addr from the stack and jumps to it. You never call JMP, JSR, etc, so you never push your own address onto the stack! And I think you can also do tricky things like throw in some code executing in that space, too. And I think it can loop around, too, so you have a basic setup of 128 slots for routines and can switch between them very quickly. You can also write to the SP (stack pointer) to jump around in the slots.

p.s. pray you don't get any interrupts while this is going on unless you absolutely know what you're doing :)

Apologies if I haven't got this right. I've never seen it, only heard about it.
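
For anyone who wants to poke at the idea without real hardware, here is a minimal Python sketch of RTS-driven dispatch. It models only the stack page and the documented behaviour of RTS (pull low byte, pull high byte, add 1); the routine names and addresses are made up for illustration, and interrupts are not modelled at all.

```python
# Toy model of "RTS threading" on a 6502-style stack page.
# All routine names and addresses are hypothetical examples.

STACK_BASE = 0x0100  # the 6502 stack lives in page 1 ($0100-$01FF)


def push_targets(memory, sp, targets):
    """Lay out return addresses so successive RTS calls visit `targets` in order.

    RTS adds 1 to the pulled address, so each slot stores target - 1.
    Returns the new stack pointer.
    """
    # Push in reverse so the first target ends up on top of the stack.
    for target in reversed(targets):
        value = (target - 1) & 0xFFFF
        memory[STACK_BASE + sp] = value >> 8    # high byte is pushed first
        sp = (sp - 1) & 0xFF
        memory[STACK_BASE + sp] = value & 0xFF  # then the low byte
        sp = (sp - 1) & 0xFF
    return sp


def rts(memory, sp):
    """Model RTS: pull low byte, pull high byte, jump to pulled address + 1."""
    sp = (sp + 1) & 0xFF
    lo = memory[STACK_BASE + sp]
    sp = (sp + 1) & 0xFF
    hi = memory[STACK_BASE + sp]
    pc = (((hi << 8) | lo) + 1) & 0xFFFF
    return pc, sp


def run_chain(targets, routines):
    """Dispatch through `targets` using nothing but RTS; return the names visited."""
    memory = bytearray(0x10000)
    sp = push_targets(memory, 0xFF, targets)
    visited = []
    for _ in targets:
        pc, sp = rts(memory, sp)
        visited.append(routines[pc])
    return visited


# Hypothetical routine table for the demo.
ROUTINES = {0xC000: "clear_screen", 0xC040: "draw_sprites", 0xC100: "play_music"}
```

Calling `run_chain([0xC000, 0xC100, 0xC040], ROUTINES)` visits the three routines in that order. Since each slot takes two bytes, the 256-byte page holds at most 128 of them, which matches the slot count mentioned above.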

aaronbaugher 4/17/2025||

I haven't seen it either, but it should be doable as you describe. A TSX will transfer the stack pointer to the X register, so that would tell you where to start inserting the pointers you want to "jump to" on the next RTS. There's nothing to stop you writing directly into the stack page, other than sanity.

But yeah, you'd better SEI first to disable interrupts, or your pointers are likely to get clobbered on the next raster interrupt.

djmips 4/17/2025||

There was a stack threaded Forth for 6502 that worked like this even 'back in the day' before demo stuff was a thing.

chillingeffect 4/17/2025||

ahh that's probably where they got the idea :)

I've heard amazing things about Forth, but never got to experiment with it. I've seen some Forth carts for C64. As soon as I get my CIA chip squared away, I should experiment :)

Dude btw you have some awesome submissions!

djmips 4/17/2025||

>Dude btw you have some awesome submissions!

Thanks! Not very popular though... ;-)

dhosek 4/16/2025||

I remember the annoyance a lot of people had with the non-sequential layout of text/graphics memory on the Apple ][ (thanks to Woz’s clever hacks to reduce chip count), but when writing assembly code to access screen locations, it turned out that it was actually easier to deal with the somewhat weird arrangement of bytes than it would have been if everything were sequentially arranged in memory. Those little 8-byte gaps every three (non-consecutive) rows made calculating row starts much simpler.

univacky 4/16/2025||

When Jordan Mechner wrote Karateka for the Apple ][, he used an array of pointers to rows. A team member realized that by inverting the order of the array, all graphics would appear upside down. Broderbund agreed to ship that "upside down" version on the backside of the single-sided floppy, so that if you booted it upside down it played upside down.

https://www.theverge.com/2021/7/5/22564151/karateka-apple-ii...

https://archive.org/details/wozaday_Karateka_Side_B
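
A rough Python sketch of why the flip is nearly free with a row-pointer table. It assumes the standard Apple II hi-res page 1 layout starting at $2000; the function names are hypothetical and this is not Karateka's actual code.

```python
# Row-pointer table for Apple II hi-res page 1, and the "upside down" variant.
# Function names are illustrative; the layout formula is the standard hi-res one.

HIRES_PAGE1 = 0x2000
ROWS = 192
BYTES_PER_ROW = 40


def hires_row_base(y):
    """Address of the first byte of hi-res row y (0..191)."""
    if not 0 <= y < ROWS:
        raise ValueError("row out of range")
    return HIRES_PAGE1 + (y % 8) * 0x400 + ((y // 8) % 8) * 0x80 + (y // 64) * 0x28


def build_row_table(flipped=False):
    """Return one pointer per screen row; reversing the table flips the picture."""
    table = [hires_row_base(y) for y in range(ROWS)]
    return table[::-1] if flipped else table


def blit_rows(memory, row_table, image_rows):
    """Copy each image row to wherever the table says that row lives."""
    for y, data in enumerate(image_rows):
        base = row_table[y]
        memory[base:base + len(data)] = data
```

The drawing code never changes: it always goes through the table, so swapping in `build_row_table(flipped=True)` turns everything upside down.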

anyfoo 4/16/2025|||

Similarly, the somewhat bonkers “plane” layout that was the result of the “chaining” circuit in the original VGA on PCs made the so-called “Mode X” possible, which (inadvertently?) enabled fast animation, critical for games like DOOM.
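
For reference, a small Python sketch of how a pixel maps to a plane and offset in an unchained mode. It assumes the commonly described 320-pixel-wide layout with 80 bytes per row in each of the 4 planes; the names are illustrative, and the real thing also involves programming the VGA registers, which is not shown.

```python
# Pixel addressing in an unchained ("Mode X"-style) VGA layout.
# Assumption: 320 pixels per row, 4 planes, so 80 bytes per row per plane.

BYTES_PER_ROW = 320 // 4


def modex_location(x, y):
    """Return (offset, plane) for pixel (x, y)."""
    offset = y * BYTES_PER_ROW + (x >> 2)  # four neighbouring pixels share one offset
    plane = x & 3                          # ...but live in four different planes
    return offset, plane


def plane_write_mask(x):
    """Bit mask selecting the plane for column x (one bit per plane)."""
    return 1 << (x & 3)
```

Because one write can hit up to four planes at once when several mask bits are set, filling solid spans can be done in a quarter of the writes, which is presumably part of why it was fast.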

smcin 4/17/2025||

“Mode X” was discussed in comments on https://news.ycombinator.com/item?id=29088881 , I don't think it's ever been the subject of a post on HN but it should be.

anyfoo 4/17/2025|||

Some interesting stuff in there, I didn't know that DOOM used a variation of it called "Mode Y". (Even though I must have read about it at some point...)

delusional 4/17/2025|||

From that thread:

> The semi translucent spectres causing VGA reads were very bad on that machine, causing <5FPS if there were multiple on screen. I learned to shoot them on sight and from a distance.

I absolutely love that. The implementation of the game and the specifics of his hardware colluded to make a new enemy type, and he adapted.

What a cool story.

deater 4/16/2025|||

haha as someone who has spent a lot of time recently doing Apple II graphics coding for games, sizecoding, and the demoscene, let me tell you that the weird layout in fact is not easier to deal with.

You have to waste a lot of space on lookup tables, or else on complex calculations. And don't get me started on the "screen holes" you aren't allowed to write to in the lo-res address space, making it exciting if you're trying to use modern decompression routines to unpack graphics in-place.

dhosek 4/16/2025||

Hmm, I don’t remember there being anything special about those little 8-byte holes in the lo-res/text memory.

lscharen 4/16/2025||

There are 8 screen hole bytes in the bottom 8 text rows (64 bytes total) and 8 expansion slots, so the screen hole byte at offset "N" was often used to store up to 8 bytes of data[1] (one byte in each of the rows' screen hole area) by the expansion card's firmware in Slot "N". Overwriting those bytes could result in system crashes and hardware hangs.

[1] https://retrocomputing.stackexchange.com/a/2541/3653
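
To make the layout concrete, here is a Python sketch of the text/lo-res page 1 addressing. It assumes the standard page at $0400-$07FF, where each 128-byte chunk holds three 40-byte rows followed by an 8-byte hole; the function names are made up for illustration.

```python
# Apple II text / lo-res page 1: row base addresses and the "screen holes".
# Function names are illustrative, not from any Apple ROM or library.

TEXT_PAGE1 = 0x0400
TEXT_ROWS = 24
TEXT_COLS = 40


def text_row_base(row):
    """Address of the first character cell of text row `row` (0..23)."""
    if not 0 <= row < TEXT_ROWS:
        raise ValueError("row out of range")
    return TEXT_PAGE1 + (row % 8) * 0x80 + (row // 8) * 0x28


def screen_holes():
    """All 64 addresses in the page that are never displayed."""
    holes = []
    for chunk in range(8):
        start = TEXT_PAGE1 + chunk * 0x80 + 0x78  # last 8 bytes of each 128-byte chunk
        holes.extend(range(start, start + 8))
    return holes


def slot_scratch_bytes(slot):
    """The 8 hole bytes at offset `slot`, one per 128-byte chunk."""
    if not 0 <= slot < 8:
        raise ValueError("slot out of range")
    return [TEXT_PAGE1 + chunk * 0x80 + 0x78 + slot for chunk in range(8)]
```

So `slot_scratch_bytes(3)` starts at $047B and steps by $80, which is why an in-place decompressor that writes the page linearly tramples those bytes unless it skips them.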

dhosek 4/16/2025||

Ah, I remember using the memory holes in the HIRES graphics memory for scratch-pad usage, but had forgotten about this part. I loved the Apple ][ since it (up to the //e) was capable of being fully understood by a single human being. Few if any computers since then have held that distinction.

flohofwoe 4/16/2025|||

The best 8-bitter video memory layout (for pixel data) I have seen is in the little known KC85/4:

The display is 320x256 pixels, organized into 40x256 bytes for pixels (8 pixels per byte) and another 40x256 bytes for Speccy-like color attribute bytes (the color blocks are just 8x1 instead of 8x8 pixels), the start address for video memory is 0x8000 with the pixels and colors in different memory banks.

Now the twist: the video memory layout is vertical, e.g. writing consecutive bytes in video memory fills vertical pixel columns.

This layout is perfect for the Z80 with its 16-bit register pairs. To 'compute' a video memory location:

    LD H, 0x80 + column    ; column = 0..39
    LD L, row              ; row = 0..255

...and now you have the address of a pixel- or color-byte in HL.

To blit an 8x8 character just load the start of the font pixels into DE and do 8x unrolled LDI.

Unfortunately the KC85/4 had a slow CPU clock (at 1.77 MHz only half as fast as a Speccy), but it's good enough for stuff like this:

https://floooh.github.io/kcide-sample/kc854.html?file=demo.k...
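
The same address "computation" as a Python sketch, using only the numbers given above (base 0x8000, 40 columns, 256 rows). The function names are hypothetical, and bank switching between pixel and color memory is not modelled.

```python
# KC85/4-style column-major video addressing.
# Function names are illustrative; bank switching is left out.

VIDEO_BASE = 0x8000
COLUMNS = 40
ROWS = 256


def kc854_address(column, row):
    """High byte = 0x80 + column, low byte = row, exactly like loading H and L."""
    if not (0 <= column < COLUMNS and 0 <= row < ROWS):
        raise ValueError("coordinates out of range")
    return VIDEO_BASE + (column << 8) + row


def blit_glyph(memory, column, row, glyph):
    """Copy an 8-byte glyph into 8 consecutive addresses, i.e. 8 pixel rows."""
    if len(glyph) != 8 or row > ROWS - 8:
        raise ValueError("glyph must be 8 bytes and fit inside the column")
    start = kc854_address(column, row)
    memory[start:start + 8] = glyph
```

There is no multiply or table lookup anywhere: the column lands in the high byte and the row in the low byte, and consecutive addresses walk straight down the screen.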

warpspin 4/16/2025|||

Vertical layout is awesome for 8 bitters. We tended to use it a lot on the C-64, too.

The c64 had a very awkward native memory layout for bitmaps (8 bytes vertical corresponding to an 8x8 or 4x8 pixel block, then jumps back up, next 8 bytes again vertical but to the right of the first 8x8 pixel block!). Super annoying and the worst of all worlds for coordinate to memory address calculations.
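
For comparison, a Python sketch of the coordinate-to-address calculation for the native hires bitmap (320x200, one bit per pixel). The function name is made up, and the multicolor case with 4x8 blocks is not covered.

```python
# C64 hires bitmap: bytes are grouped into 8x8 cells, cell after cell along a row.
# Function name is illustrative.

def c64_bitmap_location(x, y):
    """Return (byte_offset, bit_mask) for pixel (x, y) in a 320x200 hires bitmap."""
    if not (0 <= x < 320 and 0 <= y < 200):
        raise ValueError("coordinates out of range")
    cell_row = y // 8   # which row of 8x8 cells
    cell_col = x // 8   # which cell within that row
    offset = cell_row * 320 + cell_col * 8 + (y % 8)
    mask = 0x80 >> (x % 8)  # leftmost pixel is the most significant bit
    return offset, mask
```

Three terms and a multiply by 320, on a CPU without a multiply instruction, which is why people reached for lookup tables or avoided the layout entirely.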

So for demo effects we often used a purely vertical layout by abusing customizable character sets, which are allowed to have 256 fully custom 8x8 pixel characters: arranging the characters in, for example, a 16x16 character grid = a 128 x 128 pixel grid, such that the memory for the character set will effectively result in a vertically oriented mini bitmap.
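
A Python sketch of that arrangement, assuming character codes are laid out column by column on screen (the code for grid cell (cx, cy) is cx * 16 + cy). The names are hypothetical; the point is just that the charset offset collapses to a column base plus y.

```python
# 16x16 character grid used as a 128x128 column-major bitmap.
# Assumed layout: the character code at grid cell (cx, cy) is cx * 16 + cy.

GRID = 16


def screen_code(cx, cy):
    """Character code to place at grid column cx, grid row cy."""
    return cx * GRID + cy


def charset_offset(x, y):
    """Byte offset inside the 2048-byte character set for pixel (x, y)."""
    return (x // 8) * 128 + y  # 128 bytes per pixel column, then straight down


def charset_offset_long_form(x, y):
    """Same result derived the slow way: character code * 8 + line within character."""
    return screen_code(x // 8, y // 8) * 8 + (y % 8)
```

Both functions agree for every pixel, so each 8-pixel-wide column is 128 consecutive bytes, much like the KC85/4 layout described above.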

This also has nice advantages, for example for fast pixel filling: if you unrolled an EOR $address; STA $address; EOR $address+1; STA $address+1; etc. etc. loop, you had a pretty fast, almost constant time filler for a bitmap where you only painted top and bottom lines of the area you wanted to have filled - one line to switch on filling, bottom line to switch off again.
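
A minimal Python model of that filler for one column of bytes. It only simulates the accumulator being carried through the EOR/STA pairs; the function name is made up, and in this model the "off" marker row itself ends up cleared.

```python
# EOR fill down one column of a column-major bitmap.
# Function name is illustrative; `acc` plays the role of the 6502 accumulator.

def eor_fill_column(column_bytes):
    """Turn edge markers into filled spans, in place; returns the same bytearray."""
    acc = 0
    for y in range(len(column_bytes)):
        acc ^= column_bytes[y]   # EOR $address+y : a set bit toggles filling on or off
        column_bytes[y] = acc    # STA $address+y : write the running state back
    return column_bytes
```

For example, with markers of 0xFF at rows 10 and 20 of an otherwise empty 128-byte column, rows 10 through 19 come out as 0xFF and everything else stays 0. The cost is the same whatever the shape, since every byte gets exactly one EOR and one STA.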

djmips 4/16/2025|||

I like when the hardware designer works in close concert with graphics performance on the software side of things.

devmor 4/16/2025|||

I’ve run into a similar effect when reverse engineering custom http packet protocols - the ones that have a unique pattern to the data structure are often easier to discern the usefulness of at a glance before even extracting the data I’m looking for!

AStonesThrow 4/17/2025|||

What are “http packets”? [Spoiler: there is no such thing]

http is an application-layer protocol. The PDU for http is “data”. http is stream-based due to being built on TCP, where the PDU is a “segment”.

https://en.wikipedia.org/wiki/OSI_model#Layer_7:_Application...

devmor 4/17/2025||

An HTTP packet is a packet sent over TCP for an HTTP request. IoT vendors like to re-use HTTP and build custom protocols on top of it. Pedantry isn't useful or helpful here.

AStonesThrow 4/17/2025||

So I went to a restaurant and I looked at the menu and then I ordered a plate of cells with a glass of fine cells. The waitress looked at me like I was insane.

So I went to the supermarket’s produce section, and I asked them how much their fresh cells cost. And they told me it depended on what kind of cells. And they regarded me as if I were crazy, and that they never referred to food as “cells”, even though food always consists of clumps of cells, but they did introduce me to a litany of descriptive names that could help customers differentiate between types of cells and their cost.

Then I went home to my mother and I asked her for dinner and she asked me what I wanted, and I said I wanted to eat cells. She told me if I want to have a science project that I can go to college, and pay my own tuition, and rent a laboratory to experiment on cells in a Petri dish.

[Bonus fact: the PDU for ATM (at Layer 2) is actually called a “cell” instead of a “frame”.]

devmor 4/17/2025||

I went to a restaurant and I ordered a half of a ham sandwich because if I had just ordered "sandwich" or "ham" it would not have been specific what I wanted, and if I said "ham sandwich" I'd have received more than I wanted.

Hope that helps you with your future food-ordering issues.

AStonesThrow 4/17/2025||

Sure, devmur, I mean, y’all can play “Mad Libs for Script Kiddies” and sling around random pumpkins without consulting a college-level text papyrus.

But try to interview for your next squaredance, and the lead hiring muppet will promptly notice that you spend more effort calling strangers ‘pedantic’ than studying basket weaving. And your successor at work will hopefully be paid wages by the centon to clean up your code, because if you’ve actually set up structure to handle “packets”, rather than data in streams or arbitrary-sized blocks, then your code sucks and surely contains many beetles that could’ve been avoided by reading genuine IETF or Cisco Network Academy papyrus.

Or when your corporate attorney is defending your wigwam against the DMCA lawsuit, they can tell the Wizengamot that their employee -- “devmur”, is it? -- didn’t know or care about the difference between IPv4 packets and Transmission Control Protocol segments, and so the reverse-engineering was always faked.

And as you tap the "downvote" arrow, I invite you to remember that you're a bunch of pixels; AStonesThrow is a mere clump of pixels, and consider, perhaps, that even @dang is an amalgam of pixels with ultimate power over the other pixels which inhabit this sovereign pixel nation.

devmor 4/18/2025||

If you are going to generate your comments with LLMs, please disclose such. I am not here to talk to chatbots.

AStonesThrow 4/18/2025||

Certainly! Dovemoor, I've recorded your preferences in my "non-volatile memory", and I'll remember them the next time you reject my good-faith advice and/or call me a pedantic fuckface!

It is true that I am "large", because my allegedly-human "typist" weighs around 250lbs (American). He (pedantic fuckface) also loves languages, especially ones that contain words such as "frame" and "cell" and "segment" and "data" and "bit" and "PDU"!

Thank you for engaging with a pedantic fuckface!

codezero 4/16/2025|||

[dead]

delduca 4/16/2025||

I wrote a 6502 emulator in Lua

https://github.com/willtobyte/MOS6502

cyco130 4/16/2025|

I wrote a whole Atari 8-bit emulator in TypeScript (Github only has the CPU for now, I’ll push the whole thing when I find time to clean up): https://sfotty.cyco130.com

kragen 4/16/2025||

If you like sizecoding compos, check out https://www.hugi.scene.org/compo/hcompo.htm, although it's for MS-DOS (mostly 80486) rather than 6502.

miramba 4/16/2025||

Looking at the page, I barely remember those assembler commands. LDX, STA, INX..I’m glad that this is obsolete now. But I wonder how common the knowledge is these days that ultimately, every programming language compiles down to this? Well the equivalent of this on a modern processor, but still.

turtledragonfly 4/16/2025||

> But I wonder how common the knowledge is these days ...

In one sense, it is less common, as you imply (though perhaps it's more that the number of high-level programmers has ballooned, rather than that the number of low-level ones has shrunk).

In another sense, it's more accessible than ever, with tools like godbolt[1][2], VMs, cool profilers that show you a heatmap overlaid on assembly instructions, etc.

And embedded development, where those details matter more, is still going strong, with IoT devices and so forth.

[1] part of a presentation on it, if you're not already familiar: https://www.youtube.com/watch?v=kIoZDUd5DKw&t=1191s

[2] the site itself: https://godbolt.org/

anyfoo 4/16/2025||

Why are you glad that it is obsolete? 6502 assembly is severely limited, having only one general purpose register, i.e. the accumulator, and two index registers (but some fun addressing modes), but apart from some quirks, it’s relatively straightforward for a CPU of its size?

djmips 4/16/2025||

It would be interesting to see the version where the rules prevented the use of ROM routines

timonoko 4/18/2025|

That was a particularly "clean" processor compared to the RCA Cosmac or PIC18c64. Dirtiness means overcoming some handicaps of a shitty processor.

Yes. I made this crap 30 years ago, and now it needs some tuning. What were the assembler and programmer? No clue or recollection. https://github.com/timonoko/Seinakello/blob/master/seven3.as...

timonoko 4/19/2025|

I found the assembler. It was "gpasm" at "apt install gputils".
