PED81C is a video system for AGA Amigas that provides pseudo-native chunky screens, i.e. screens where each byte in CHIP RAM corresponds to a dot on the display. In short, it offers chunky screens without chunky-to-planar conversion or any CPU/Blitter/Copper sweat.
Notes:
* due to the nature of the system, the videos must be watched in their original size (1920x1080);
* YouTube's video processing has slightly reduced the visual quality (i.e. the result is better on real machines).
Full details straight from the documentation:
--------------------------------------------------------------------------------
CORE IDEA
The core idea is using SHRES pixels ("pixels" from now on) to simulate dots in a
CRT/LCD-like fashion.
Each dot is made of 4 pixels as follows:
ABCD
ABCD
ABCD
ABCD
where
X
X
X
X
represents a pixel.
The eye cannot really distinguish the pixels and, instead, perceives them almost
as a single dot whose color is given by the mix of the colors of the pixels. The
pixels thus constitute the color elements ("elements" from now on) of the dot.
The effect is not perfect though, as the pixels can still more or less be seen.
The sharper the display / the bigger the pixels, the worse the visual mix. In
practice, though, the effect works acceptably well on CRT, LCD and LED displays
alike.
The pixels can be assigned any RGB values ("base colors" from now on).
For example, the most obvious choice is:
RGBW
RGBW
RGBW
RGBW
Starting from the left, the pixels are used for the red, green, blue and white
elements of dots. The pixels can be assigned any values in these ranges:
R: $rr0000, where $rr in [$00, $ff]
G: $00gg00, where $gg in [$00, $ff]
B: $0000bb, where $bb in [$00, $ff]
W: $wwwwww, where $ww in [$00, $ff]
As a consequence, there is an overall brightness loss of at least 50%. For
example, the white dot (the brightest one) is obtained by assigning the pixels
the maximum values in the ranges (i.e. R = $ff0000, G = $00ff00, B = $0000ff,
W = $ffffff), which add up to $ffffff*2, i.e. half of the absolute maximum value
of the 4 pixels ($ffffff*4).
Each set of base colors ("color model" from now on) produces the specific
palette that the dots are perceived in ("dots palette" from now on). To
understand how to calculate the dots palette, it is first necessary to look at
how the screens work.
The raster, i.e. the matrix of the bytes (stored as a linear buffer) that
represent the dots, must reside in CHIP RAM. It is used as bitplane 1 and also
as bitplane 2, shifted 4 pixels to the right.
This is how a byte %76543210 (where each digit represents a bit) in the raster is
displayed:
bitplane 2:     76543210
bitplane 1: 76543210
                ****
The marked bits are those that produce the dot that corresponds to the byte:
                ABCD
                ABCD
                ABCD
                ABCD
                ^^^^
bitplane 2:     76543210
bitplane 1: 76543210
The elements are thus indicated by the bit pairs in the byte:
%73 -> element A
%62 -> element B
%51 -> element C
%40 -> element D
Replacing the digits with letters gives a better representation:
%ABCDabcd
where:
X = most significant bit for element X
x = least significant bit for element X
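The mapping between a byte %ABCDabcd and its four element bit pairs can be sketched in Python. Python is used here only for illustration, and the function names are hypothetical, not part of PED81C:

```python
def element_pairs(byte):
    """Split a raster byte %ABCDabcd into the bit pairs of elements A, B, C, D.

    The most significant bit of element i is bit 7-i and the least
    significant bit is bit 3-i (i.e. the pairs %73, %62, %51, %40).
    """
    pairs = []
    for i in range(4):
        msb = (byte >> (7 - i)) & 1
        lsb = (byte >> (3 - i)) & 1
        pairs.append(msb << 1 | lsb)
    return pairs


def make_byte(pairs):
    """Inverse of element_pairs(): build %ABCDabcd from four 2-bit pairs."""
    byte = 0
    for i, pair in enumerate(pairs):
        byte |= ((pair >> 1) & 1) << (7 - i)  # MSB goes to the high nibble
        byte |= (pair & 1) << (3 - i)         # LSB goes to the low nibble
    return byte


print([format(p, "02b") for p in element_pairs(0b10011010)])  # ['11', '00', '01', '10']
```

make_byte() is the reverse direction: a renderer would use it to turn four desired pair values into a raster byte.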
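Each element can have only 4 values, corresponding to the bit pairs %00, %01,
%10 and %11. Such values are those stored in COLORxx. Bitplanes 3 and 4 act as
selector bitplanes that tell the elements apart. They are filled with $55 and
$33 respectively, with bitplane 4 shifted like bitplane 2, as detailed in
SETTING UP AND USING SCREENS. Combined with the selector bits, the bit pairs
therefore represent the COLORxx indexes: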
bitplane 4 and 3 = %00 -> element A -> COLOR00 thru COLOR03
bitplane 4 and 3 = %01 -> element B -> COLOR04 thru COLOR07
bitplane 4 and 3 = %10 -> element C -> COLOR08 thru COLOR11
bitplane 4 and 3 = %11 -> element D -> COLOR12 thru COLOR15
Given that there are 4 elements and that each element can have 4 different
values, the total number of combinations (i.e. of dots colors) is 4^4 = 256.
In the RGBW color model, COLORxx could be set up as follows (for simplicity, the
low-order 12 bits are left to the automatic copy performed by AGA):
Without CPU and/or Blitter intervention, those bits cannot be eliminated - but
processing data is precisely what PED81C tries to avoid, so it is necessary to
find a way to deal with the spurious bits.
This is what happens with two consecutive bytes %ABCDabcd and %EFGHefgh:
Between the dots produced by the bytes as explained above ("desired dots" from
now on) is a dot that is made of bits coming from both the bytes ("middle dot"
from now on), i.e. %EFGH and %abcd. The simplest solution would be masking the
middle dot out with a no-DMA vertically repeating jailbar mask sprite, but that
would introduce a horrible vertical spacing between the columns of dots and
further reduce the brightness of the screen.
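The following Python sketch simulates one rasterline of the four bitplanes to show where desired and middle dots come from. It assumes the setup described in SETTING UP AND USING SCREENS:

* raster in bitplanes 1 and 2;
* bitplanes 3 and 4 filled with $55 and $33;
* bitplanes 2 and 4 delayed by 4 SHRES pixels.

The names are hypothetical, and the code only computes COLORxx indexes, not colors.

```python
SHIFT = 4         # bitplanes 2 and 4 are delayed by 4 SHRES pixels
SELECTOR3 = 0x55  # fill byte of bitplane 3
SELECTOR4 = 0x33  # fill byte of bitplane 4


def bit_at(line, pixel):
    """Bit displayed at SHRES `pixel` by an unshifted bitplane holding `line`."""
    if pixel < 0 or pixel >= 8 * len(line):
        return 0  # outside the fetched data
    return (line[pixel // 8] >> (7 - pixel % 8)) & 1


def color_indexes(raster_line):
    """COLORxx index (bitplane 4 = bit 3 ... bitplane 1 = bit 0) of every pixel."""
    n = len(raster_line)
    sel3 = [SELECTOR3] * n
    sel4 = [SELECTOR4] * n
    indexes = []
    for p in range(8 * n + SHIFT):
        bp1 = bit_at(raster_line, p)
        bp2 = bit_at(raster_line, p - SHIFT)
        bp3 = bit_at(sel3, p)
        bp4 = bit_at(sel4, p - SHIFT)
        indexes.append(bp4 << 3 | bp3 << 2 | bp2 << 1 | bp1)
    return indexes


idx = color_indexes([0b10001000, 0b01000100])
# left desired dot, middle dot, right desired dot
print(idx[4:8], idx[8:12], idx[12:16])  # [3, 4, 8, 12] [2, 5, 8, 12] [0, 7, 8, 12]
```

In every group of 4 pixels, the selector bits (upper two index bits) cycle through elements A, B, C, D. In the middle dot, the pair is built from the left byte's low nibble (MSB) and the right byte's high nibble (LSB).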
A smarter solution would be adding one more selector bitplane to distinguish
between desired dots and middle dots (for readability, from now on, 0 bits are
replaced with '·' where needed):
COLOR16 thru COLOR31 could then be set up so that the middle dots are mixes of
the desired dots, keeping in mind that the middle dots have the most and least
significant bits swapped around (the least significant bits of the left dot end
up in the most significant bits of the middle dot and the most significant bits
of the right dot end up in the least significant bits of the middle dot). The
simplest settings reflect the settings of the desired dots, but with the RGB
values assigned to the %01 and %10 bit pairs swapped around.
For example, in the RGBW color model:
left dot: $550000
middle dot: $ff0000
right dot: $aa0000
The middle dot would end up being a full red, stronger than the desired dots,
which is neither visually correct nor logical, as the middle dots would be more
prominent than the desired dots. A solution could be dimming the RGB values of
middle dots.
For example, if they were halved, the result would be:
left dot: $550000
middle dot: $800000
right dot: $aa0000
The middle dot would be a good average of the desired dots. That works
conceptually, but in practice it causes the columns of middle dots to look like
vertical scanlines - which is not desirable either.
The case of different hues is even more complicated. For example, if the bytes
were %10001000 ($ff0000) and %01000100 ($00ff00), the result would be:
left dot: $ff0000
middle dot: $55aa00
right dot: $00ff00
The middle dot would be a kind of average of the actual dots, although not
really good (a good average would be $808000).
It is possible to experiment with the COLORxx values to achieve different
results, but the overall scanlines-like effect would still remain. Moreover, the
3rd selector bitplane would steal a lot of CHIP bus slots. An alternative is
required.
The proposed solution consists of eliminating the 3rd selector bitplane and
assigning the bit pairs %01 and %10 the same RGB values (which basically gives
the most and least significant bits the same weight). As a downside, this
reduces the number of dots colors: given that each element can have only 3
different values, the total number of colors falls to 3^4 = 81.
For example, in the RGBW color model, the bytes %00001000 and %10000000 would
give:
left dot: $880000
middle dot: $ff0000
right dot: $880000
Again the middle dot would be brighter than the actual dots, but now this can
be easily solved by simply forbidding the %01 bit pair in bytes, given that it
can always be replaced by the %10 bit pair. So, both bytes would instead be
%10000000 ($880000) and the result would be:
left dot: $880000
middle dot: $880000
right dot: $880000
Also the case of different hues, %10001000 ($ff0000) and %01000100 ($00ff00),
gives a correct result (for complete correctness, in this example the low-order
bits of COLOR02 and COLOR05 are set to 0):
left dot: $ff0000
middle dot: $808000
right dot: $00ff00
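The examples above express each dot as the per-channel sum of the RGB values selected by its four elements. Under that convention, the Python sketch below reproduces the last example. The COLORxx values are assumed: %01 and %10 both use the $80 level with the low-order bits cleared, as the text specifies for COLOR02 and COLOR05. The function names are hypothetical.

```python
# Assumed 81-color RGBW setup: RGB values of the bit pairs %00, %01, %10 and
# %11 for each element (%01 and %10 share the same value, low-order bits 0).
RGBW81 = [
    [(0, 0, 0), (0x80, 0, 0), (0x80, 0, 0), (0xFF, 0, 0)],                    # A: red
    [(0, 0, 0), (0, 0x80, 0), (0, 0x80, 0), (0, 0xFF, 0)],                    # B: green
    [(0, 0, 0), (0, 0, 0x80), (0, 0, 0x80), (0, 0, 0xFF)],                    # C: blue
    [(0, 0, 0), (0x80, 0x80, 0x80), (0x80, 0x80, 0x80), (0xFF, 0xFF, 0xFF)],  # D: white
]


def bit(byte, element, msb):
    """MSB (bit 7-element) or LSB (bit 3-element) of an element in %ABCDabcd."""
    return (byte >> ((7 if msb else 3) - element)) & 1


def mix(pairs, model=RGBW81):
    """Per-channel sum of the RGB values selected by the four bit pairs.

    Sums above $ff (e.g. for the white dot) are possible and are not clamped.
    """
    total = [0, 0, 0]
    for element, pair in enumerate(pairs):
        for c in range(3):
            total[c] += model[element][pair][c]
    return tuple(total)


def three_dots(left, right, model=RGBW81):
    """Left desired dot, middle dot and right desired dot of two adjacent bytes."""
    left_dot = [bit(left, e, True) << 1 | bit(left, e, False) for e in range(4)]
    # Middle dot: MSB from the left byte's LSB, LSB from the right byte's MSB.
    middle_dot = [bit(left, e, False) << 1 | bit(right, e, True) for e in range(4)]
    right_dot = [bit(right, e, True) << 1 | bit(right, e, False) for e in range(4)]
    return [mix(d, model) for d in (left_dot, middle_dot, right_dot)]


for r, g, b in three_dots(0b10001000, 0b01000100):
    print(f"${r:02x}{g:02x}{b:02x}")  # prints $ff0000, $808000 and $00ff00 (one per line)
```

Calling three_dots(0b00001000, 0b10000000) instead gives a $ff0000 middle dot between two $800000 dots. That is exactly the problem avoided by forbidding the %01 pair.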
--------------------------------------------------------------------------------
COLOR MODELS
The CORE IDEA section introduces the RGBW color model, but the number of
possible color models is huge (2^288). For best results, it is advisable to
define the color models that are most suitable to the graphics to be displayed.
The most obvious general-purpose color models are:
* CMYW: Cyan Magenta Yellow White
* G: Greyscale
* KC: Key Colors (red yellow green cyan blue magenta white)
* RGBW: Red Green Blue White
This table shows the COLORxx settings for the general-purpose color models.
For the G color model, the arithmetically perfect assignment would be:
* COLOR01, COLOR02: $333333
* COLOR05, COLOR06: $666666
* COLOR09, COLOR10: $999999
* COLOR13, COLOR14: $cccccc
However, the resulting dots palette would contain only 26 unique colors.
Each color model has strengths and weaknesses. This table provides an evaluation
of the general-purpose color models (COLORS = number of unique colors in the
resulting dots palette).
Once the color model is defined, the corresponding dots palette can be
calculated by mixing the RGB values assigned to the bit pairs in the bytes from
0 to 255. The bytes which include a %01 bit pair should be treated as illegal
and thus be assigned one of the RGB values also assigned to a legal byte (the
easiest solution is to use the value of byte 0). The calculation of the RGB
value ($6a2b40) corresponding to the byte %10011010 in the RGBW color model,
done in the CORE IDEA section, makes for a practical example.
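A Python sketch of the palette calculation follows. It rests on these assumptions:

* the 4-level RGBW model of the CORE IDEA section is used ($00/$55/$aa/$ff for %00/%01/%10/%11, which matches the $550000 and $aa0000 examples);
* the 4 pixels are averaged and halves are rounded up. This rule was chosen because it reproduces the $6a2b40 value quoted above. It is not GeneratePalette's normalization.

For a real 81-color model, the %01 and %10 levels would be equal, and forbid_01 remaps illegal bytes to byte 0 as recommended. Function names are hypothetical.

```python
LEVELS = (0x00, 0x55, 0xAA, 0xFF)  # level of the pairs %00, %01, %10, %11
CHANNELS = ((1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1))  # R, G, B, W elements


def pairs_of(byte):
    """Bit pairs of elements A..D in %ABCDabcd."""
    return [((byte >> (7 - e)) & 1) << 1 | ((byte >> (3 - e)) & 1) for e in range(4)]


def dot_color(byte):
    """Average of the 4 pixels of a dot (halves rounded up), as a 24-bit value."""
    total = [0, 0, 0]
    for e, p in enumerate(pairs_of(byte)):
        for c in range(3):
            total[c] += LEVELS[p] * CHANNELS[e][c]
    r, g, b = ((t + 2) // 4 for t in total)  # e.g. 170/4 = 42.5 -> 43 ($2b)
    return r << 16 | g << 8 | b


def dots_palette(forbid_01=True):
    """256-entry dots palette; bytes with a %01 pair may get the color of byte 0."""
    palette = [dot_color(byte) for byte in range(256)]
    if forbid_01:
        for byte in range(256):
            if 1 in pairs_of(byte):
                palette[byte] = palette[0]
    return palette


print(f"${dot_color(0b10011010):06x}")  # $6a2b40
```

With forbid_01=True, the 175 bytes that contain a %01 pair are remapped, leaving 3^4 = 81 legal bytes. Also, dot_color(0xff) returns $808080, which illustrates the 50% brightness loss described in CORE IDEA.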
The PED81C archive includes GeneratePalette, a handy tool that generates a dots
palette according to the desired color model and then saves it to an ILBM file.
It normalizes the components of the calculated colors to $ff, so that the
palette colors are brighter and have a higher dynamic range than the actual dots
palette colors, allowing for better graphics conversion. Also, it assigns the
value of byte 0 to the illegal bytes.
X0: 24-bit RGB value for the %00 pair of element X
X2: 24-bit RGB value for the %10 pair of element X
X3: 24-bit RGB value for the %11 pair of element X
FFIS100: $ff treated internally as $100 (for better rounding)
FILE: output file
The 24-bit RGB values must be in hexadecimal format without prefix.
The palettes are suitable for screens which use bitplanes 3 and 4 as selector
bitplanes.
The PED81C archive also includes:
* the palettes for the general-purpose color models, stored as ILBM pictures;
* GeneratePalettes, a script that generates a few palettes (it can also be used
as a reference for GeneratePalette usage).
The palettes can be used to draw/convert graphics.
For example, to display a picture in an RGBW screen:
1. draw/remap the picture with the RGBW palette;
2. save the picture as raw chunky data;
3. copy the raw chunky data to the raster or use it directly as the raster.
--------------------------------------------------------------------------------
SETTING UP AND USING SCREENS
PED81C screens are obtained by opening SHRES screens with these peculiarities:
* the raster must be used as bitplane 1 and 2;
* bitplane 3 must be filled with %01010101 ($55);
* bitplane 4 must be filled with %00110011 ($33);
* bitplanes 2 and 4 must be shifted horizontally by 4 pixels;
* COLORxx must be set according to the chosen color model;
* the 4 pixels in the leftmost column are made of just the least significant
bits of the leftmost dots, so it is generally advisable to hide them by
moving the left side of the window area by 1 LORES pixel to the right.
Notes:
* to obtain a screen which is W LORES pixels wide, the width of the raster must
be W*4 SHRES pixels = W/2 bytes (e.g. 320 LORES pixels -> 1280 SHRES pixels =
160 bytes = 160 dots);
* to obtain a scrollable screen, allocate a raster bigger than the visible area
and, in case of horizontal scrolling, set BPLxMOD to the number of non-fetched
dots (e.g. for a raster which is 256 dots wide and is displayed in a 320 LORES
pixels area, BPLxMOD must be 256-320/2 = 96);
* HIRES/SHRES resolution scrolling is possible, but it alters the colors of the
leftmost dots;
* given the high CHIP bus load caused by the bitplanes fetch, it is best to
enable the 64-bit fetch mode (FMODE.BPLx = 3).
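The width and modulo arithmetic in the notes above can be checked with a small Python sketch (hypothetical helper names):

```python
def raster_bytes_per_line(lores_width):
    """Bytes (= dots) per rasterline for a screen W LORES pixels wide.

    Each LORES pixel spans 4 SHRES pixels and each byte covers 8 SHRES
    pixels, hence W * 4 / 8 = W / 2 bytes.
    """
    shres_pixels = lores_width * 4
    return shres_pixels // 8


def bplmod(raster_width_dots, lores_width):
    """Modulo that skips the dots not fetched on each rasterline."""
    return raster_width_dots - raster_bytes_per_line(lores_width)


print(raster_bytes_per_line(320), bplmod(256, 320))  # 160 96
```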
In general, given a raster which is RASTERWIDTH dots wide and RASTERHEIGHT dots
tall, the values to write to the chipset registers in order to create a centered
screen can be calculated as follows:
* SCREENWIDTH = RASTERWIDTH * 8
* SCREENHEIGHT = RASTERHEIGHT
* DIWSTRTX = $81 + (160 - SCREENWIDTH / 8)
* DIWSTRTY = $2c + (128 - SCREENHEIGHT / 2)
* DIWSTRT = ((DIWSTRTY & $ff) << 8) | ((DIWSTRTX + 1) & $ff)
* DIWSTOPX = DIWSTRTX + SCREENWIDTH / 4
* DIWSTOPY = DIWSTRTY + SCREENHEIGHT
* DIWSTOP = ((DIWSTOPY & $ff) << 8) | (DIWSTOPX & $ff)
* DIWHIGH = ((DIWSTOPX & $100) << 5) | (DIWSTOPY & $700) |
((DIWSTRTX & $100) >> 3) | (DIWSTRTY >> 8)
* DDFSTRT = (DIWSTRTX - 17) / 2
* DDFSTOP = DDFSTRT+SCREENWIDTH / 8 - 8
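The formulas above translate directly into code. The Python sketch below computes the register values for a given raster but does not write any register. The function name is hypothetical, and integer division is used because the divisions are exact for the usual sizes.

```python
def display_registers(raster_width, raster_height):
    """Chipset values for a centered PED81C screen, following the formulas above.

    raster_width is in dots (bytes), raster_height in rasterlines.
    """
    screen_width = raster_width * 8  # SHRES pixels per line
    screen_height = raster_height
    diwstrt_x = 0x81 + (160 - screen_width // 8)
    diwstrt_y = 0x2C + (128 - screen_height // 2)
    diwstop_x = diwstrt_x + screen_width // 4
    diwstop_y = diwstrt_y + screen_height
    ddfstrt = (diwstrt_x - 17) // 2
    return {
        "DIWSTRT": ((diwstrt_y & 0xFF) << 8) | ((diwstrt_x + 1) & 0xFF),
        "DIWSTOP": ((diwstop_y & 0xFF) << 8) | (diwstop_x & 0xFF),
        "DIWHIGH": ((diwstop_x & 0x100) << 5) | (diwstop_y & 0x700)
        | ((diwstrt_x & 0x100) >> 3) | (diwstrt_y >> 8),
        "DDFSTRT": ddfstrt,
        "DDFSTOP": ddfstrt + screen_width // 8 - 8,
    }


for name, value in display_registers(160, 256).items():
    print(f"{name} = ${value:04x}")
```

For a 160-dot, 256-line raster, this prints DIWSTRT = $2c82, DIWSTOP = $2cc1, DIWHIGH = $2100, DDFSTRT = $0038 and DDFSTOP = $00d0.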
Example register settings for:
* screen equivalent to a 319x256 LORES screen
* 160 dots wide raster
* blanked border
* 64-bit sprites and bitplanes fetch mode
* sprites on top of bitplanes
* sprites colors assigned to COLOR16 thru COLOR31
The selector bitplanes need a lot of RAM. To save RAM drastically it is enough
to store just 1 line for each of them and to reset BPLxPTx with the Copper
during the horizontal blanking period of every rasterline. As a downside, this
steals some CHIP bus slots and complicates Copperlists.
#2
If a selector bitplane is omitted, the elements become 2 couples of identical
elements; if both the selector bitplanes are omitted, all the elements become
equal. Omitting the selector bitplanes saves (a lot of) CHIP bus slots and can
be useful in particular cases. For example, the demo THE CURE does not use any
selector bitplanes and uses bytes of the kind %HHHHLLLL, where H = High bit and
L = Low bit; thanks to jailbar mask sprites, this produces perfect LORES-looking
4-color pixels (which, together with bitplanes DMA toggling every other
rasterline, produce a dot-matrix display).
#3
If the visual output suffers from a heavy "jailbars" effect, it could be
improved by shifting every other rasterline by 1 to 3 pixels - for example:
#4
To lessen the dithering of tweak #3 and improve the color mix, the shifting can
also be inverted on an alternate frame basis - for example, the rasterlines
could be shown on the next frame as follows:
However, this causes flickering visuals (especially on displays with quick
response), so it is not really recommended.
#5
Adding a horizontal scanlines effect by swapping the elements palette on an
alternate line basis (through BPLCON4) makes the visual output resemble that of
a CRT display.
#6
To reduce the amount of graphics to draw and the memory usage, the raster size
can be halved by repeating each rasterline once (which is easily obtained by
means of FMODE.BSCAN2 and BPLxMOD). This combines well with tweak #5.
#7
If needed, the bitplanes order can be reversed, i.e. the selector bitplanes
could be assigned bitplanes 1 and 2, and the raster bitplanes could be assigned
bitplanes 3 and 4:
In this case, COLORxx need to be set up differently:
bitplane 2 and 1 = %00 -> element A -> COLOR00 COLOR04 COLOR08 COLOR12
bitplane 2 and 1 = %01 -> element B -> COLOR01 COLOR05 COLOR09 COLOR13
bitplane 2 and 1 = %10 -> element C -> COLOR02 COLOR06 COLOR10 COLOR14
bitplane 2 and 1 = %11 -> element D -> COLOR03 COLOR07 COLOR11 COLOR15
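The two arrangements place the raster either in bitplanes 1 and 2 (as in CORE IDEA) or in bitplanes 3 and 4 (as above). They differ only in how element and bit pair combine into the COLORxx index. A Python sketch with hypothetical names:

```python
def color_register(element, pair, reversed_planes=False):
    """COLORxx index used by `element` (0-3 = A-D) for bit pair `pair` (0-3).

    Normal layout (raster in bitplanes 1-2, selectors in 3-4):
        index = element * 4 + pair
    Reversed layout (selectors in bitplanes 1-2, raster in 3-4):
        index = pair * 4 + element
    """
    if reversed_planes:
        return pair * 4 + element
    return element * 4 + pair


def color_table(element_values, reversed_planes=False):
    """Arrange element_values[element][pair] into the 16 COLORxx entries."""
    table = [None] * 16
    for element in range(4):
        for pair in range(4):
            index = color_register(element, pair, reversed_planes)
            table[index] = element_values[element][pair]
    return table


print([color_register(0, p, True) for p in range(4)])  # element A: [0, 4, 8, 12]
```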
Note: GeneratePalette does not support such arrangement.
#8
With a careful setup of COLORxx, the unused 4 bitplanes can be used to overlay
other graphics or even up to two more chunky screens, optionally with colorkey
and translucency. That, however, would noticeably increase the CHIP bus load.
The meaning of PED81C is "Pixel Elements Dots, 81 Colors".
#2
Although, due to the middle dots, the logical horizontal resolution is half of
the physical one, the averaging provided by the middle dots and SHRES fools the
eye quite well.
#3
Visually, the best results are obtained with complex/dithered images, as plain
color areas and geometrical shapes reveal the pixels and the middle dots. In
particular, isolated dots look 3x-ish wide.
#4
81 is only the theoretical maximum number of dots colors. The actual number
depends on the chosen base colors.
#5
The core idea could also be used to display 24-bit pictures, but the coarseness
of the method completely wastes the subtlety of such a high color resolution
(also verified experimentally).
#6
Usage of PED81C is of course welcome and encouraged. It would be nice if credit
were given. If used in a commercial production, I would appreciate it if
permission were asked first and if I could receive a little share of the profits.
PED81C is very CHIP bus intensive: the amount of bitplane data fetched is twice
that of an equivalent 256-color LORES screen. If Lisa had been able to use the
BPLxDAT values of inactive bitplanes (as, for example, Denise does with
bitplanes 5 and 6 when only 4 bitplanes and HAM are enabled), BPL3DAT and
BPL4DAT could have been loaded with the selector values, thus halving the DMA
fetches - but unfortunately that is not the case.
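The "twice" figure follows from simple arithmetic. The Python sketch below (hypothetical helper) works it out for a screen 320 LORES pixels wide:

```python
def bitplane_fetch_per_line(pixels, planes):
    """Bytes of bitplane data fetched per rasterline (1 bit per pixel per plane)."""
    return pixels // 8 * planes


ped81c = bitplane_fetch_per_line(1280, 4)              # 4 SHRES bitplanes
ped81c_raster_only = bitplane_fetch_per_line(1280, 2)  # if selectors were not fetched
lores_256 = bitplane_fetch_per_line(320, 8)            # 8 LORES bitplanes = 256 colors
print(ped81c, ped81c_raster_only, lores_256)           # 640 320 320
```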
Therefore, one might wonder whether PED81C is actually advantageous. A lot
depends on how graphics are rendered: for example, a favourable case is when the
CPU can keep on executing cached code after writing to CHIP RAM so that no/few
cycles are wasted between writes. A general and indirect evaluation can be done
by comparing PED81C to the traditional C2P methods as follows.
The measurements, for simplicity, are based on the amount of data to render,
convert (if needed) and fetch for output per line.
Reference regular screen:
* 320 pixels wide LORES
* 6 bits deep screen (for fairness, because PED81C can at most output 81 unique
colors and the actual number of colors, as shown above, might be even less
depending on the color model)
Assumptions:
* 1 chunky pixel = 1 byte
* CPU and Blitter operations in CHIP RAM involve 6 bitplanes
If only CHIP RAM is available, the figures are as follows.
If FAST RAM is available, the figures of PED81C do not change (as the raster
always resides in CHIP RAM), while the figures of the other cases are as
follows.
CPU-only C2P:
* rendering in FAST RAM: 320 bytes
* C2P reads from FAST RAM: 320 bytes
* C2P writes to CHIP RAM: 240 bytes
* bitplanes fetch: 240 bytes
* total: 640 bytes FAST RAM, 480 bytes CHIP RAM
CPU+Blitter C2P, 1 CPU pass and 1 Blitter pass:
* rendering in FAST RAM: 320 bytes
* C2P reads by CPU from FAST RAM: 320 bytes
* C2P writes by CPU to CHIP RAM: 240 bytes
* C2P reads by Blitter from CHIP RAM: 240 bytes
* C2P writes by Blitter to CHIP RAM: 240 bytes
* bitplanes fetch: 240 bytes
* total: 640 bytes FAST RAM, 960 bytes CHIP RAM
Overall, PED81C has the edge performance-wise, especially considering that CPU
and Blitter are not busy with converting data. It must be pointed out, though,
that PED81C's logical horizontal resolution is halved (hence the 160 bytes per
line) and that the overall visual quality is inferior to that of a regular
screen mode.
The idea of using SHRES pixels as elements is by Fabio Bizzetti, who used it for
his Virtual Karting and Virtual Karting II games.
In the late 90s, I was in touch with him and he told me that his idea was to
"fool the RF signal" (or something along these lines). This got me thinking and
I came up with the core idea. Before writing here (in 2022!) I had never
bothered checking what he had actually done, but now I deemed it appropriate to
do it in order to provide a brief description of his method, both as an
acknowledgement of his brilliant idea and to provide more food for thought.
After starting Virtual Karting II in UAE, having a look at the moving graphics,
grabbing a screenshot, checking the values of BPLCON0 and BPLCON1, and checking
the bitplanes memory, I found out that he used bitplanes 1-3 as selector
bitplanes and assigned the pixels these elements (from left to right): red-
orange-yellow-green-cyan-azure-blue-purple (so, there are no middle dots and
dots are really 2x-wide). To mitigate the columns-looking result, he applied the
crosshatch tweak, swapping the scroll offsets on an alternate frame basis.
#2
Between the end of the 90s and 2003 I had created a system (implemented as a
shared library) based on the same core idea, but using 3 selector bitplanes.
PED81C is actually a simplification of that system, born precisely from the
removal of the middle dots selector bitplane to improve the speed.
The old system was really rich feature-wise, as it provided:
* 256 colors screens
* HalfRes screens: screens like PED81C's
* FullRes screens: screens without middle dots - this was achieved by means of
a conversion performed by the CPU, optionally assisted by the Blitter (for
the record, the CPU-only conversion allowed 320x256 screens at about 50 fps
on an Amiga 1200 equipped with a Blizzard 1230-IV and 60 ns FAST RAM)
* chequer effect: crosshatch tweak for HalfRes screens
* double and triple buffering
* 5 embedded color models (RGBW, RGBM, RGBP, RGBPS, RGB332)
* color/palette handling functions (color setting, color remapping, 24-bit
fading and 24-bit cross-fading)
* Cross Playfield mode: 256 color screen overlay on top of another screen with
any degree of opacity between 0 and 256 (in practice, this produced 16-bit
graphics)
* Dual Cross Playfield mode: like Cross Playfield mode, but with a selectable
colorkey
* graphical contexts (clipping, drawing modes)
* pixmap functions (blitting, zooming, rotzooming)
* graphical primitives
* font functions
* ILBM functions
One might wonder why such a system is not public - the reasons are:
* the core would need to be re-designed;
* the implementation could be better;
* the accessory functions (like the graphical ones) should be in a separate
library;
* the documentation would need a major overhaul.
Basically, I do not consider the system suitable for public distribution. I
would rather redo it from scratch... but that is precisely why PED81C was born:
while thinking how to improve the system, I realized how to eliminate the 3rd
selector bitplane and decided to get rid of the FullRes screens, because the
point of these systems is obtaining chunky screens without data conversion
(otherwise, it is better to use one of the traditional C2P methods, which give
better visual results).
#3
Originally I had planned to use PED81C to make a new game. However, I could not
come up with a satisfactory idea; moreover, due to personal reasons, I had to
stop software development. Given that I could not predict when/if I would be able
to produce something with PED81C and given that the war in Ukraine put the world
in deep uncertainty, I decided that it was better to release PED81C, so that it
would not go to waste, and also as a gift to the Amiga community.
I must admit I have been tempted to provide an implementation of PED81C in the
form of a library or of a collection of functions, but since setting up PED81C
screens is easy and since general-purpose routines would perform worse than
tailor-made ones, I decided to let programmers implement it in the way that best
fits their projects.
Edited by saimo on 2022/3/9 18:03:25 Edited by saimo on 2023/6/19 22:33:45 Edited by saimo on 2023/6/21 16:11:05 Edited by saimo on 2023/6/26 20:56:07 Edited by saimo on 2023/11/28 23:01:50 Edited by saimo on 2023/11/28 23:05:45 Edited by saimo on 2023/11/29 12:13:25 Edited by saimo on 2024/4/2 21:12:09
RETREAM - retro dreams for Amiga, Commodore 64 and PC
I don't know how that works, so I can't answer. The best, most complete and shortest explanation I can give is the one in the CORE IDEA section of the documentation.
Clever... is it sort of like subpixel font rendering to achieve the effect?
No. Visual results might be similar, but it's something completely different. For subpixel font rendering you basically just quadruple or sixteenfold the resolution a B/W font is rendered to, then you scale the result down to the original resolution again but using grey scale colours (or matching alpha masks) instead of just B/W for the 2x2 or 4x4 pixels groups which have 1/4, 1/2, 3/4, or in case of sixteenfold rendering additionally 1/16, 2/16 ... 14/16, 15/16, black pixels.
@joerg No, that's not subpixel rendering; that's what Apple uses for Retina displays and such. Subpixel rendering anti-aliases by splitting each full pixel into separate R, G and B subpixels and then setting the boundary subpixels to shades that blur the line between the white and the black.
I guess you've been influenced by the video examples, but "native" chunky would be better used for other things (although, I did try to imagine if an interesting horizontal shooter could be done). To be honest, I feared I'd get a flock of Doom-like game requests/proposals :D
Uploaded an archive with updated documentation. While at it, given that I was asked for a source code example, I whipped up an AMOS Professional program that shows how to set up a PED81C screen and to perform some basic operations on it - hopefully, this will be easy to understand and also open the door to AMOS programmers. The program source is included in the archive.
'-----------------------------------------------------------------------------
'$VER: PED81C example 1.3 (28.11.2023) (c) 2023 RETREAM
'Legal terms: please refer to the accompanying documentation.
'www.retream.com/PED81C
'contact@retream.com
'-----------------------------------------------------------------------------
'-----------------------------------------------------------------------------
'DESCRIPTION
'This shows how to set up a PED81C screen and to perform some basic operations
'on it.
'Screen features:
' * equivalent to a 319x256 LORES screen
' * 160 dots wide raster
' * single buffer
' * blanked border
' * 64-bit bitplanes fetch mode
' * CMYW color model
'
'NOTES
'The code is written to be readable, not to be general-purpose/optimal.
'-----------------------------------------------------------------------------
Procedure _ALLOCATE_BITPLANE[BANKINDEX,SIZE]
'--------------------------------------------------------------------------
'DESCRIPTION
'Allocates a CHIP RAM buffer to be used as a bitplane.
'
'INPUT
'BANKINDEX = index of bank to use
'SIZE = size [bytes] of bitplane
'
'OUTPUT
'64-bit-aligned bitplane address (0 = error)
'
'WARNINGS
'The buffer must be freed with Erase BANKINDEX or Erase All.
'--------------------------------------------------------------------------
Trap Reserve As Chip Data BANKINDEX,SIZE+8
If Errtrap=0 Then A=(Start(BANKINDEX)+7) and $FFFFFFF8
End Proc[A]
Procedure _DEINITIALIZE_SCREEN
'--------------------------------------------------------------------------
'DESCRIPTION
'Deinitializes the screen.
'
'WARNINGS
'Can be called only if the display is off.
'--------------------------------------------------------------------------
Erase All
Doke $DFF1FC,0 : Rem FMODE
End Proc
Procedure _INITIALIZE_AMOS_ENVIRONMENT
'--------------------------------------------------------------------------
'DESCRIPTION
'Ensures the program cannot be interrupted or brought to back, and turns
'off the AMOS video system.
'--------------------------------------------------------------------------
Break Off
Amos Lock
Comp Test Off
Auto View Off
Update Off
Copper Off
_TURN_DISPLAY_DMA_OFF
End Proc
Procedure _INITIALIZE_SCREEN
'--------------------------------------------------------------------------
'DESCRIPTION
'Initializes the screen.
'
'OUTPUT
'-1/0 = OK/error
'
'WARNINGS
'_DEINITIALIZE_SCREEN[] must be called also in case of failure.
'
'NOTES
'Sets RASTERADDRESS.
'--------------------------------------------------------------------------
'Allocate the raster.
_ALLOCATE_BITPLANE[10,RASTERSIZE] : If Param=0 Then Pop Proc[0]
RASTERADDRESS=Param
'Allocate and fill the selector bitplanes.
_ALLOCATE_BITPLANE[11,RASTERSIZE] : If Param=0 Then Pop Proc[0]
B3A=Param
Fill B3A To B3A+RASTERSIZE,$55555555
_ALLOCATE_BITPLANE[12,RASTERSIZE] : If Param=0 Then Pop Proc[0]
B4A=Param
Fill B4A To B4A+RASTERSIZE,$33333333
'Set the chipset.
DIWSTRTX=$81+(160-RASTERWIDTH)
DIWSTRTY=$2C+(128-RASTERHEIGHT/2)
DIWSTRT=((DIWSTRTY and $FF)*256) or((DIWSTRTX+1) and $FF)
DIWSTOPX=DIWSTRTX+RASTERWIDTH*2
DIWSTOPY=DIWSTRTY+RASTERHEIGHT
DIWSTOP=((DIWSTOPY and $FF)*256) or(DIWSTOPX and $FF)
DIWHIGH=((DIWSTOPX and $100)*32) or(DIWSTOPY and $700) or((DIWSTRTX and $100)/8) or(DIWSTRTY/256)
DDFSTRT=(DIWSTRTX-17)/2
DDFSTOP=DDFSTRT+RASTERWIDTH-8
Doke $DFF100,$4241 : Rem BPLCON0
Doke $DFF102,$10 : Rem BPLCON1
Doke $DFF104,$224 : Rem BPLCON2
Doke $DFF108,0 : Rem BPL1MOD
Doke $DFF10A,0 : Rem BPL2MOD
Doke $DFF1FC,$3 : Rem FMODE
End Proc[-1]
Procedure _LOAD_PICTURE_INTO_RASTER[FILEPATH$]
'--------------------------------------------------------------------------
'DESCRIPTION
'Loads a raw 8-bit chunky picture into the raster, ensuring that its size
'is correct.
'
'INPUT
'FILEPATH$ = path of picture file
'
'OUTPUT
'-1/0 = OK/error
'--------------------------------------------------------------------------
Trap Open In 1,FILEPATH$ : If Errtrap Then Pop Proc[0]
L=Lof(1)
Close(1)
If L<>RASTERSIZE Then Pop Proc[0]
Trap Bload FILEPATH$,RASTERADDRESS
End Proc[Errtrap=0]
Procedure _RANDOMIZE_RASTER
'--------------------------------------------------------------------------
'DESCRIPTION
'Randomizes the raster by swapping 16 pairs of dots per frame, until a mouse
'button is pressed.
'--------------------------------------------------------------------------
XM=RASTERWIDTH-1
YM=RASTERHEIGHT-1
Repeat
C=16
While C
X0=Rnd(XM)
Y0=Rnd(YM)
X1=Rnd(XM)
Y1=Rnd(YM)
A0=Y0*RASTERWIDTH+X0+RASTERADDRESS
A1=Y1*RASTERWIDTH+X1+RASTERADDRESS
C0=Peek(A0)
Poke A0,Peek(A1)
Poke A1,C0
Dec C
Wend
_WAIT_SCREEN_BOTTOM
Until Mouse Click
End Proc
Procedure _RESTORE_AMOS_ENVIRONMENT
'--------------------------------------------------------------------------
'DESCRIPTION
'Restores the AMOS environment.
'--------------------------------------------------------------------------
Copper On
Update On
Auto View On
Amos Unlock
Break On
_TURN_DISPLAY_DMA_ON[$20]
End Proc
Procedure _TURN_DISPLAY_DMA_OFF
'--------------------------------------------------------------------------
'DESCRIPTION
'Disables the bitplanes, Copper and sprites DMA.
'--------------------------------------------------------------------------
_WAIT_SCREEN_BOTTOM
Doke $DFF096,$3A0 : Rem DMACON
End Proc
Procedure _TURN_DISPLAY_DMA_ON[SSPRITESFLAG]
'--------------------------------------------------------------------------
'DESCRIPTION
'Enables the bitplanes and Copper DMA.
'
'INPUT
'SSPRITESFLAG = $20/0 = turn / do not turn sprites on
'
'WARNINGS
'The chipset must have been set up properly.
'--------------------------------------------------------------------------
_WAIT_SCREEN_BOTTOM
Doke $DFF096,$8380 or SSPRITESFLAG : Rem DMACON
End Proc
Procedure _WAIT_SCREEN_BOTTOM
'--------------------------------------------------------------------------
'DESCRIPTION
'Waits for the bottom of the screen.
'--------------------------------------------------------------------------
While Deek($DFF004) and $3 : Wend
Repeat : Until(Leek($DFF004) and $3FF00)>$12C00
End Proc
Edited by saimo on 2023/11/28 23:02:09 Edited by saimo on 2023/11/29 12:13:47
Should be a definite plus for AMOS Professional as an extension.
I wrote the example in AMOS to make it easier for more people to understand, but the system isn't intended for any specific language. I decided to let programmers implement PED81C in the way that best fits their projects and in their language of choice, given that setting up PED81C screens is easy and given that general-purpose routines would perform worse than tailor-made ones.
@saimo: I never said that it should be an "AMOS only" system ;) Simply that it may be interesting to add support for it in extension format. The support should handle the screen setup; users would create their own custom methods for the effects they need.
I must admit that I read the doc and didn't understand how it works, but I'll check it again and, if I find the time, I will maybe see if I can do something interesting with it in extension format.
EDIT: From the doc I read this: RGBW. Does this mean it is a 32-bit value? Because afterwards you talk about $ww using $wwwwww... I don't understand. Ew... I think I understand: you are talking about the RGB value of the COLORxx register used for the 4th pixel? Same for the other pixels. Right?
All we have to decide is what to do with the time that is given to us.
Key concepts:
* each dot is 8 bits;
* each dot is made of 4 SHRES pixels;
* each SHRES pixel is 2 bits (i.e. it can be assigned 4 different RGB values);
* the RGB values assigned to the SHRES pixels are 24-bit and can be chosen freely.
$wwwwww is a placeholder for any 24 bit value assigned to the W(hite) SHRES pixels in the RGBW mode (e.g. $333333 or $eeeeee).
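To make these key concepts concrete, the following Python sketch models a dot as four 2-bit element indices. Each index selects one of four 24-bit RGB values, and the perceived color is approximated as the plain per-channel average of the four elements, which is a simplification of the visual mix. The palette values and the bit order used by the hypothetical `pack_dot()` helper are assumptions for illustration only. The actual bit layout in CHIP RAM and the palette tables are given in the PED81C documentation.

```python
# Hypothetical RGBW element palettes: each SHRES pixel (element) of a dot
# has 2 bits, so it selects one of 4 freely chosen 24-bit RGB values.
RGBW_PALETTES = [
    [0x000000, 0x550000, 0xAA0000, 0xFF0000],  # R element: $rr0000
    [0x000000, 0x005500, 0x00AA00, 0x00FF00],  # G element: $00gg00
    [0x000000, 0x000055, 0x0000AA, 0x0000FF],  # B element: $0000bb
    [0x000000, 0x555555, 0xAAAAAA, 0xFFFFFF],  # W element: $wwwwww
]


def pack_dot(indices):
    """Pack four 2-bit element indices into one 8-bit dot.

    ASSUMPTION: element A goes in the top 2 bits, element D in the bottom
    2 bits; this is for illustration, not PED81C's real layout.
    """
    value = 0
    for index in indices:
        if not 0 <= index <= 3:
            raise ValueError("element index must be 0-3")
        value = (value << 2) | index
    return value


def unpack_dot(dot):
    """Inverse of pack_dot (same assumed bit layout)."""
    return [(dot >> shift) & 3 for shift in (6, 4, 2, 0)]


def perceived_color(dot, palettes=RGBW_PALETTES):
    """Approximate the color the eye sees as the average of the 4 elements."""
    totals = [0, 0, 0]
    for element, index in enumerate(unpack_dot(dot)):
        rgb = palettes[element][index]
        totals[0] += (rgb >> 16) & 0xFF
        totals[1] += (rgb >> 8) & 0xFF
        totals[2] += rgb & 0xFF
    return tuple(total // 4 for total in totals)


white = pack_dot([3, 3, 3, 3])
print(hex(white), perceived_color(white))  # 0xff (127, 127, 127)
```

The brightest dot averages out at about half of full white. This is the brightness loss of RGBW mode, seen here as numbers.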
I hope this helps.
I have just released a little update, accompanied by the PED81C Voxel Engine (PVE), i.e. a new demo. If you can't be bothered to try it yourself, you can see it in this video - but beware: YouTube's video compression degraded the visual quality (especially the color saturation and brightness).
PVE is an experiment to test the graphical quality and computational performance
of the PED81C system. It allows the user to move freely around a typical voxel
landscape.
--------------------------------------------------------------------------------
GETTING STARTED
PVE requires:
* Amiga computer
* AGA chipset
* 200 kB of CHIP RAM
* 4 MB of FAST RAM
* PAL SHRES support
* digital joystick/joypad and mouse
* 2.1 MB of storage space
If the monitor / graphics card / scan doubler do(es) not support SHRES, the
colors will look off or even not show at all.
For example:
* MNT's VA2000 graphics card displays only the even columns of pixels, so only
reds and blues show;
* Irix Labs' ScanPlus AGA displays only the odd columns of pixels (contrary to
how it was originally marketed), so only greens and grays show.
To install PVE, unpack the LhA archive to any directory of your choice.
To start PVE, open the program directory and double-click the program icon from
Workbench or execute the program from shell.
* The map wraps around at its edges.
* The number shown in the top-left corner of the action screen indicates the
number of frames rendered in the last second.
* Upon returning to AmigaOS, PVE prints out:
* the total number of frames rendered;
* the total number of frames shown;
* the average number of frames rendered per second;
* the average time (expressed in frames) taken by the rendering of a frame.
* The graphics are first rendered in a raster in FAST RAM and then copied to a
triple-buffered PED81C raster in CHIP RAM.
* The screen resolution is 1020x200 SHRES pixels, which correspond to 255x200
LORES-sized dots and to 128x200 logical dots.
* Rendering is done by columns, from bottom to top and then left to right.
* The code applies a depth of 256 steps per column, so it evaluates 256*128 =
32768 dots per frame (and then renders only those which are actually visible).
* The code is 100% assembly.
* The code is optimized for 68030.
* The program supports only maps of 1024x1024 pixels, but it can be made to
support maps of other sizes by simply redefining the width and height
constants and reassembling the code.
* The height of the camera adapts automatically to that of the point it is at,
but it can be made user-controllable and its maximum value can be increased
almost to the point that the landscape disappears at the bottom of the screen.
* On an Amiga 1200 equipped with a Blizzard 1230 IV mounting a 50 MHz 68030 and
60 ns RAM:
* the program runs at about 20.2 fps;
* the rendering of graphics alone runs at about 22.2 fps;
* the impact of PED81C is about 22.2-20.2 = 2 fps - in other words,
writing the graphics to the PED81C raster requires about 50/20.2-50/22.2 =
0.223 frames (when only the bitplanes DMA is active);
* rendering the graphics directly to the PED81C raster degrades the
performance by about 2 to 3 fps (tested only with an older and less
optimized version).
* On an Amiga 1200 equipped with a PiStorm32, the program runs at 50 fps
(unsurprisingly).
* The map size is 1024x1024 pixels.
* The map requires 2 MB of FAST RAM.
* The program takes over the system entirely and returns to AmigaOS cleanly.
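The frame-time figure in the performance notes above can be checked with a few lines of Python. On a 50 Hz display, a program running at f fps spends 50/f display frames on each rendered frame. The cost of writing to the PED81C raster is therefore the difference between the per-frame times with and without that step. The function name is hypothetical.

```python
def frames_per_render(fps, refresh_hz=50.0):
    """Display frames spent on each rendered frame at the given speed."""
    return refresh_hz / fps


render_only = frames_per_render(22.2)  # rendering of graphics alone
full = frames_per_render(20.2)         # rendering + writing to PED81C raster
overhead = full - render_only          # slower run minus faster run
print(round(overhead, 3))  # 0.223
```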
After a several-month break from programming (due to a computer-unrelated
project), I decided to finally create something for PED81C: I had made nothing
with it other than a few little examples, I wanted to test its graphical quality
and computational performance, and... I felt like having some good fun.
After some inconclusive mental wandering, the idea of making a voxel engine came
to mind for unknown reasons (I had never dabbled with voxels before).
When the engine was mature enough I decided to distribute PVE publicly (which
initially was not planned).
As for the update, I fixed some palette values in a table in the documentation, added the formulas for calculating DIWSTRT, DIWSTOP, DIWHIGH, DDFSTRT and DDFSTOP to the documentation, and implemented them in the AMOS Professional source code example. This is the documentation snippet about the register settings:
In general, given a raster which is RASTERWIDTH dots wide and RASTERHEIGHT dots
tall, the values to write to the chipset registers in order to create a centered
screen can be calculated as follows:
* SCREENWIDTH = RASTERWIDTH * 8
* SCREENHEIGHT = RASTERHEIGHT
* DIWSTRTX = $81 + (160 - SCREENWIDTH / 8)
* DIWSTRTY = $2c + (128 - SCREENHEIGHT / 2)
* DIWSTRT = ((DIWSTRTY & $ff) << 8) | ((DIWSTRTX + 1) & $ff)
* DIWSTOPX = DIWSTRTX + SCREENWIDTH / 4
* DIWSTOPY = DIWSTRTY + SCREENHEIGHT
* DIWSTOP = ((DIWSTOPY & $ff) << 8) | (DIWSTOPX & $ff)
* DIWHIGH = ((DIWSTOPX & $100) << 5) | (DIWSTOPY & $700) |
((DIWSTRTX & $100) >> 3) | (DIWSTRTY >> 8)
* DDFSTRT = (DIWSTRTX - 17) / 2
* DDFSTOP = DDFSTRT + SCREENWIDTH / 8 - 8
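As a cross-check of the formulas above, here is a Python transcription. It is a sketch, not part of the PED81C distribution, and the function name is hypothetical. It computes the register values for a given raster size using integer arithmetic. For a 128x256-dot raster, the display window runs from X = $A2 to X = $1A1, i.e. 255 LORES pixels (1020 SHRES pixels). This appears consistent with the 1020-pixel screen width quoted for PVE and Zoomaniac, since the +1 applied to DIWSTRTX trims one LORES pixel.

```python
def ped81c_registers(raster_width, raster_height):
    """Centered-screen register values from the formulas above.

    raster_width and raster_height are in dots; all arithmetic is integer.
    """
    screen_width = raster_width * 8   # in SHRES pixels
    screen_height = raster_height     # in rasterlines

    diwstrt_x = 0x81 + (160 - screen_width // 8)
    diwstrt_y = 0x2C + (128 - screen_height // 2)
    diwstrt = ((diwstrt_y & 0xFF) << 8) | ((diwstrt_x + 1) & 0xFF)

    diwstop_x = diwstrt_x + screen_width // 4
    diwstop_y = diwstrt_y + screen_height
    diwstop = ((diwstop_y & 0xFF) << 8) | (diwstop_x & 0xFF)

    # DIWHIGH collects the bits that do not fit in DIWSTRT/DIWSTOP:
    # bit 8 of the X positions and bits 8-10 of the Y positions.
    diwhigh = (((diwstop_x & 0x100) << 5) | (diwstop_y & 0x700)
               | ((diwstrt_x & 0x100) >> 3) | (diwstrt_y >> 8))

    ddfstrt = (diwstrt_x - 17) // 2
    ddfstop = ddfstrt + screen_width // 8 - 8

    return {"DIWSTRT": diwstrt, "DIWSTOP": diwstop, "DIWHIGH": diwhigh,
            "DDFSTRT": ddfstrt, "DDFSTOP": ddfstop}


for name, value in ped81c_registers(128, 256).items():
    print(f"{name} = ${value:04x}")
```

For the 128x256 case this prints DIWSTRT = $2ca2, DIWSTOP = $2ca1, DIWHIGH = $2100, DDFSTRT = $0048 and DDFSTOP = $00c0.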
v1.1 (22.12.2023)
* Reworked screen buffering, so that the raster data is written to CHIP RAM more efficiently when the bitplanes DMA is inactive.
* Improved 68030 caches handling.
* Added 68040 and 68060 caches handling.
* Added MMU handling to prevent the MMU from affecting the speed negatively.
* Optimized the rendering core by making it write the dots sequentially.
* Made a little 68060-specific code optimization.
* Ensured 68060 superscalar dispatch is enabled.
* Added a live-togglable staggered lines video filter, which helps see better colors on devices that do not support SHRES and reduces the jailbars effect on devices that support SHRES (to enable/disable: [F1]).
* Made the fps indicator live-togglable (to enable/disable: [F2]).
* Made quitting from the voxel screen return to the splash screen.
* Replaced mouse controls with keyboard controls.
* Added benchmark function.
* Added command line switches to control the CPU caches.
* Fixed bug that caused a longword to be written to a random location when the fps indicator was on.
* Fixed an innocuous initialization bug.
* Made cleanup code more robust.
* Updated, extended and fixed documentation.
For ages I had intended to dig up some 20+ year old code and use it to play with PED81C a little more. I finally got around to doing it and came up with a new test program called Zoomaniac. Details are in the video and in the manual excerpt below. Download available at https://retream.itch.io/ped81c.
Zoomaniac has been written to evaluate the performance on a stock Amiga 1200 of
a general-purpose texture scaling routine that writes directly to a PED81C
raster.
The following results are relative to the full screen effect that zooms the
cosmonaut in and out.
On a stock Amiga 1200, the execution speed is between 25 and 26 fps. If the
staggered lines are turned on, the performance drops by about 1 fps (which was
unexpected, since all that the option adds is a Copper WAIT and a Copper MOVE
for each rasterline).
Given that the DMA load caused by PED81C is "double" (see its documentation for
the details), a version that uses only half the number (2) of bitplanes has been
made to check the performance as if the Amiga had a native chunky video mode.
Surprisingly, the performance did not improve at all: as far as CHIP bus access
is concerned, the scaling code must interleave so nicely with the bitplane data
fetches that having more bus cycles available does not make any/much difference.
An Amiga 1200 equipped with a 68030 clocked at 50 MHz and 60 ns FAST RAM easily
performs at a steady 50 fps. To find out the maximum performance, new tests were
made with special versions of the program that had the video synchronization
code disabled.
The speed when running the program normally was between 77 and 78 fps. The
staggered lines option lowered the fps by about 2. The 2-bitplane versions
performed better, reaching 80-81 fps or, with the staggered lines on, 79-80 fps.
As on the stock Amiga 1200, the extended Copperlist that implements the
staggered lines causes a small and similar performance drop. Unlike on the stock
Amiga 1200, however, halving the bitplanes DMA load did produce a speed increase.
The following table sums up the results.
S = stock Amiga 1200
E = Amiga 1200 68030 @50 MHz / 60 ns FAST RAM (Blizzard 1230 IV)
2 = 2 bitplanes on
4 = 4 bitplanes on
L = staggered lines on
Notes:
* when FAST RAM is detected, an alternative and more suitable scaling routine
is used (although writes still happen to CHIP RAM);
* on (some?) machines equipped with FAST RAM an even faster strategy would be
rendering to FAST RAM and then simply copying at the maximum speed the
rendered frame to the CHIP RAM raster.
* The scaling routine fits any rectangle from a texture into a rectangle of any
size and ratio of another texture with nearest-neighbor matching.
* Logic and rendering are totally asynchronous: the logic runs always at 50 Hz
and the rendering never stops (unless it reaches the limit of 50 fps, imposed
by the display refresh rate), thus exploiting the machine's full potential.
* The screen buffering employs three buffers in CHIP RAM.
* The screen resolution is 1020x256 SHRES pixels, which correspond to 255x256
LORES-sized physical dots and to 128x256 logical dots.
* The code is 100% assembly.
* The program takes over the system entirely and returns to AmigaOS cleanly.
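The nearest-neighbor fitting described in the notes above can be sketched in Python as follows. This is not the author's 68k assembly routine. It is a minimal illustration that steps through the source rectangle with 16.16 fixed-point increments (a common choice for integer-only CPUs, assumed here) and samples the source dot under the center of each destination dot. Textures are modeled as lists of rows of dot bytes, and the function name is hypothetical.

```python
def scale_nearest(src, src_rect, dst, dst_rect):
    """Fit src_rect of texture src into dst_rect of texture dst.

    Textures are lists of rows of dot values (one byte per dot, as in a
    PED81C raster); rects are (x, y, width, height). Each destination dot
    takes the source dot under its center (nearest-neighbor).
    """
    sx, sy, sw, sh = src_rect
    dx, dy, dw, dh = dst_rect
    # 16.16 fixed-point source increments per destination dot/row.
    step_x = (sw << 16) // dw
    step_y = (sh << 16) // dh
    v = step_y >> 1  # start half a step in: sample the center of the dot
    for row in range(dh):
        src_row = src[sy + (v >> 16)]
        dst_row = dst[dy + row]
        u = step_x >> 1
        for col in range(dw):
            dst_row[dx + col] = src_row[sx + (u >> 16)]
            u += step_x
        v += step_y


# Usage: enlarge a 2x2 texture into a 4x4 rectangle.
texture = [[1, 2],
           [3, 4]]
screen = [[0] * 4 for _ in range(4)]
scale_nearest(texture, (0, 0, 2, 2), screen, (0, 0, 4, 4))
print(screen)  # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

The same function also shrinks. A 4-dot row scaled to 2 dots keeps the 2nd and 4th source dots, because those lie under the destination dots' centers.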
CHANGELOG
March 27, 2024
* Added the Zoomaniac demo.
* [PED81C Voxel Engine] Made a couple of minor changes.
* [PED81C Voxel Engine] Updated documentation.
January 1, 2024
* Rebuilt demos against latest custom framework.
* [PED81C Voxel Engine] Slightly optimized background rendering.
* [PED81C Voxel Engine] Corrected benchmark fps calculation (312 rasterlines were considered instead of 313).
* [PED81C Voxel Engine] Built against latest custom framework.
* [PED81C Voxel Engine] Updated, extended and fixed documentation.
In response to the feedback received, I have uploaded a new version of Zoomaniac that allows the fps limit to be enabled/disabled with [F3].
* The number shown in the top-left corner of the effects screen is the fps
indicator, which reports the number of frames rendered in the last second.
It is limited to 999.
* When the fps limit is on, the maximum number of frames rendered per second
is 50 even on the most powerful machines, as the display refresh rate is 50
Hz. When the fps limit is off, frames are rendered without pausing even when
the previously rendered frame/frames has/have not (completely) been displayed
yet. On machines which cannot run the program at 50 fps or more, turning off
the limit has no effect whatsoever; on the other machines, the only visible
effect is that the fps indicator goes beyond 50, thus giving a measure of the
maximum speed that the machine can reach.
Also, this new version runs 1-2 fps faster on 68030 thanks to the data cache burst:
* on 68030, tests proved that it is advantageous to turn the data cache burst
on when scaling a 128 dots wide rectangle to a rectangle wider than 8 dots
(i.e. with an X scaling factor greater than 1/16); with a scaling factor of
1/16 or less the difference proved to be minimal when both the source and
destination rectangles were 256 dots tall; considering that turning the data
cache burst off would therefore be advantageous only with very narrow and
tall rectangles (which are uncommon and intrinsically rather inexpensive),
it is not worth it to implement data cache burst management inside the
scaling routine.
CHANGELOG
v1.1 (28.3.2024)
* Turned the 68030 data cache burst on for slightly faster performance.
* Made a couple of minor optimizations.
* Added frames rendering limit toggle ([F3]).
* Worked on fps indicator: added hundreds digit; made digits smaller; made digits auto-clearing, so that they read correctly even when they are not cleared before drawing.
* Made staggered lines toggle as soon as [F1] is pressed (instead of when it is released).
* Updated splash screen.
* Redesigned the 'M' in the logo.
* Updated and extended manual.
To have a complete set of scaling routines (which hopefully I'll use for something someday), I added support for color-keying, zero-keying (color-keying with color 0), and horizontal and vertical flipping. Moreover, given that initially the focus was on the stock A1200, the performance on expanded machines was not optimal (as the rendering was done directly in CHIP RAM), so I also added an alternative buffering method: when 2 rasters can be allocated in FAST RAM, it renders in FAST RAM and then copies the rendered raster to the raster in CHIP RAM as quickly as possible, starting when the beam reaches the bottom of the screen. For the first effect in the test program (which is the only one whose performance had been measured until now), this produced a gain of 8-9 fps on my 68030-equipped Amiga 1200.
The updated test program (available at https://retream.itch.io/ped81c), to demonstrate the new features, stretches and shrinks a color/zero-keyed texture covering almost the entire screen over a full-screen zooming background, with all the possible flipping combinations. That is of course a bit taxing for a stock A1200, whose performance drops between 12 and 16 fps in the busiest cases.
(Side note: the video was recorded before finalizing the test program, so it shows an outdated splash screen, and the zooming jumps relative to the background when passing from/to the color/zero-keying effects.)
This snippet from the updated manual provides further details.
Zoomaniac has been written to evaluate the performance on stock and modestly
accelerated Amiga 1200s of some general-purpose texture scaling routines in
conjunction with PED81C.
--------------------------------------------------------------------------------
GETTING STARTED
Zoomaniac requires:
* Amiga computer
* AGA chipset
* 170 kB of CHIP RAM
* 1.2 MB of any RAM
* PAL SHRES support
* keyboard
* 1 MB of storage space
To install Zoomaniac, unpack the LhA archive to any directory of your choice.
To start Zoomaniac, open the program directory and double-click the program icon
from Workbench or execute the program from shell.
If your monitor / graphics card / scan doubler do(es) not support SHRES, the
colors will look off or even not show at all. In such a case, to hopefully fix
the colors a bit, try the staggered lines option.
* The staggered lines shift the odd lines by 1 SHRES pixel to the right. On
systems which handle SHRES correctly, that will reduce the jailbars effect
(but give the screen a kind of wavy look). On systems which handle SHRES as
HIRES (for example, MNT's VA2000 graphics card and Irix Labs' ScanPlus AGA -
contrary to how it was originally marketed - display only the even or odd
columns of pixels, so only reds and blues or greens and grays show), that
helps improve the colors a bit (giving the screen a kind of scanline
effect). On other systems, the results are unpredictable, but the option is
still worth a try.
* The number shown in the top-left corner of the effects screen is the fps
indicator, which reports the number of frames rendered in the last second.
It is limited to 999.
* When the fps limit is on, the maximum number of frames rendered per second
is 50 even on the most powerful machines, as the display refresh rate is 50
Hz. When the fps limit is off, frames are rendered without pausing even when
the previously rendered frame/frames has/have not (completely) been displayed
yet. On machines which cannot run the program at 50 fps or more, turning off
the limit has no effect whatsoever; on the other machines, the only visible
effect is that the fps indicator goes beyond 50, thus giving a measure of the
maximum speed that the machines can reach.
The following results are relative to the full screen effect that zooms the
cosmonaut in and out without flipping. The source textures are 256x512 dots and
the screen internally consists of 128x256 dots. Since a dot is represented by a
byte, 128x256 = 32768 bytes are fetched and written to render a frame.
On a stock Amiga 1200, the execution speed is between 25 and 26 fps. If the
staggered lines are turned on, the performance drops by about 1 fps (even though
all that the option adds is a Copper WAIT and a Copper MOVE for each rasterline).
Given that the DMA load caused by PED81C is "double" (see its documentation for
the details), a version that uses only half the number (2) of bitplanes has been
made to check the performance as if the Amiga had a native chunky video mode.
Surprisingly, the performance did not improve at all: as far as CHIP bus access
is concerned, the scaling code must interleave so nicely with the bitplane data
fetches that having more bus cycles available does not make any/much difference.
An Amiga 1200 equipped with a 68030 clocked at 50 MHz and 60 ns FAST RAM easily
performs at a steady 50 fps. To find out the maximum performance, tests were
made with the fps limit off.
The speed when running the program normally was between 84 and 86 fps. The
staggered lines option lowered the fps by about 1. The 2-bitplane version ran
at the same speed - in this case, that is because most of the CHIP RAM accesses
happen when no bitplanes DMA is going on (see TECHNICAL DETAILS section).
expanded Amiga 1200: Blizzard 1230 IV, 68030 @50 MHz, 60 ns FAST RAM
Notes:
* given that a stock Amiga 1200 reaches about 25.5 fps, it manages to render
128*256*25.5 = 835584 dots per second; considering that the 68020 is clocked
at 14.187580 MHz, rendering 1 dot requires about 14187580/835584 = 17 CPU
cycles;
* on 68030, tests proved that it is advantageous to turn the data cache burst
on when scaling a 128 dots wide rectangle to a rectangle wider than 8 dots
(i.e. with an X scaling factor greater than 1/16); with a scaling factor of
1/16 or less the difference proved to be minimal when both the source and
destination rectangles were 256 dots tall; considering that turning the data
cache burst off would therefore be advantageous only with very narrow and
tall rectangles (which are uncommon and intrinsically rather inexpensive),
it is not worth it to manage the data cache burst inside the scaling
routines.
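The cycles-per-dot estimate in the first note above can be reproduced in Python. The budget is the CPU clock divided by the number of dots rendered per second. Applied to 50 fps on the same stock machine, the same formula gives a budget of about 8.7 cycles per dot. The constant and function names are hypothetical.

```python
CPU_HZ = 14_187_580         # stock A1200 68020 clock (PAL), as in the note
DOTS_PER_FRAME = 128 * 256  # logical dots rendered per frame


def cycles_per_dot(fps, cpu_hz=CPU_HZ, dots=DOTS_PER_FRAME):
    """Average CPU cycles available per rendered dot at the given speed."""
    return cpu_hz / (dots * fps)


print(DOTS_PER_FRAME * 25.5)         # 835584.0 dots per second
print(round(cycles_per_dot(25.5)))   # 17
print(round(cycles_per_dot(50), 2))  # 8.66
```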
The scaling routines fit any rectangle from a texture into a rectangle of any
size and ratio of another texture with nearest-neighbor matching. Optionally,
they can flip the rectangles horizontally and/or vertically, and treat as
transparent the dots of a specific color (color-keying) or of color 0 (zero-
keying).
Color/zero-keying allows graphics of arbitrary shapes to be rendered without
masks (which saves RAM and CPU cycles). Thanks to the fact that PED81C graphics
always use at most 81 colors, there are 256-81 = 175 colors that can be used for
color-keying without causing any visual loss.
For performance reasons, there are 3 separate routines.
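The following Python sketch shows one way to layer horizontal/vertical flipping and color/zero-keying on top of a nearest-neighbor scaler. Flipping only mirrors the destination coordinates, and keying skips the write for transparent dots so the background stays intact. It is an illustration, not Zoomaniac's code. The real program uses 3 separate routines for performance, whereas this sketch folds everything into one function with a hypothetical signature and 16.16 fixed-point stepping.

```python
def scale_keyed(src, src_rect, dst, dst_rect, key=None,
                flip_x=False, flip_y=False):
    """Nearest-neighbor scaling with optional flipping and color-keying.

    key=None draws every dot (solid), key=0 is zero-keying, and any other
    value is color-keying. Rects are (x, y, width, height).
    """
    sx, sy, sw, sh = src_rect
    dx, dy, dw, dh = dst_rect
    step_x = (sw << 16) // dw  # 16.16 fixed-point source steps
    step_y = (sh << 16) // dh
    v = step_y >> 1
    for row in range(dh):
        src_row = src[sy + (v >> 16)]
        out_y = dy + (dh - 1 - row if flip_y else row)
        dst_row = dst[out_y]
        u = step_x >> 1
        for col in range(dw):
            dot = src_row[sx + (u >> 16)]
            if dot != key:  # keyed (transparent) dots leave dst untouched
                out_x = dx + (dw - 1 - col if flip_x else col)
                dst_row[out_x] = dot
            u += step_x
        v += step_y


# Usage: zero-keyed, horizontally flipped copy over a background of 9s.
background = [[9, 9], [9, 9]]
scale_keyed([[0, 5], [6, 0]], (0, 0, 2, 2), background, (0, 0, 2, 2),
            key=0, flip_x=True)
print(background)  # [[5, 9], [9, 6]]
```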
--------------------------------------------------------------------------------
OTHER TECHNICAL NOTES
* Logic and rendering are totally asynchronous: the logic runs always at 50 Hz
and the rendering never stops (unless it reaches 50 fps and the fps limit is
on), thus exploiting the machine's full potential.
* The screen is triple-buffered.
* When 2 rasters can be allocated in FAST RAM:
1. the graphics are rendered always to the available raster in FAST RAM;
2. after the rendering has completed and as soon as the bottom rasterline
has been displayed, the rendered raster is copied as quickly as possible
to the raster in CHIP RAM (which is the one that gets displayed).
The copy successfully races the beam (on the expanded Amiga 1200 mentioned in
the PERFORMANCE section, it requires about 57 rasterlines during the vertical
blanking and 35 rasterlines during the fetching of the top rasterlines), so no
tearing occurs.
Such method yields a faster performance than rendering directly to a raster in
CHIP RAM (especially when there is overdraw and/or data gets also read from
the raster).
* The screen resolution is 1020x256 SHRES pixels, which correspond to 255x256
LORES-sized physical dots and to 128x256 logical dots.
* The code is 100% assembly.
* The program takes over the system entirely and returns to AmigaOS cleanly.
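The "racing the beam" claim in the notes above can be illustrated with a simple Python model. A 256-line screen starting at rasterline $2C ends at line 299. On a 313-line PAL frame, that leaves 313 - 300 + 44 = 57 lines of vertical blanking before the top line is fetched again, which matches the 57 rasterlines quoted. Assuming (as a simplification) that the 32768-byte copy proceeds at a constant rate over the measured 57 + 35 rasterlines, the sketch checks that every line is complete in CHIP RAM before the beam fetches it. Names are hypothetical.

```python
PAL_LINES = 313       # rasterlines per PAL frame
TOP = 0x2C            # first displayed rasterline (256-line screen)
HEIGHT = 256          # displayed rasterlines
BYTES_PER_LINE = 128  # one byte per dot, 128 dots per line


def copy_stays_ahead(vblank_lines, display_lines):
    """ASSUMPTION: the copy starts right after the bottom rasterline and
    moves bytes at a constant rate. Check that each raster line is fully
    copied before the beam starts fetching it."""
    total = BYTES_PER_LINE * HEIGHT
    rate = total / (vblank_lines + display_lines)  # bytes per rasterline
    for line in range(HEIGHT):
        copied = min(total, (vblank_lines + line) * rate)
        if copied < (line + 1) * BYTES_PER_LINE:
            return False
    return True


# Lines after the bottom of the display plus lines above its top.
vblank = (PAL_LINES - (TOP + HEIGHT)) + TOP
print(vblank, copy_stays_ahead(vblank, 35))  # 57 True
```

Without the head start gained during vertical blanking, a copy of the same total length would lose the race at once. For example, `copy_stays_ahead(0, 300)` returns False.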
By chance I discovered that Zoomaniac might crash on my real A1200. After some investigation it turned out that it was due to a stack issue that happened when the execution dropped below 50 fps on 68020, 68030 and 68040 (an instruction was executed before instead of after a branch). That's fixed now. While searching for the problem, I realized a way to make the solid scaling routine a bit faster - so the bug, although finding it required some effort, was actually a good thing!