@Elwood Sorry, no. There were no comments on Aros-Exec so it didn't even show up as a blip on Power2People's radar screen. I suppose I could have started a bounty myself by offering some money.
I might just start tinkering soon. I'm getting a used computer from a friend and if I'm lucky it should have an nVidia graphics card so I can get it to run with the AROS Gallium3D drivers.
Here's how it would work: it would open 256-color palette-mapped layers as 8-bit alpha-only framebuffer objects. Some palette entries would correspond to a 1-dimensional texture indexed by the screen's Y coordinate to produce copper-rainbow effects. A value of 0 in any framebuffer would be transparent and pass through to the lower-priority framebuffers behind it. If the fragment shader program defining this could handle up to 10 layers, it would make all of the sprite/playfield capabilities of the Classics available to high-end graphics-card systems such as the SAM 4x0 series and AmigaOnes.
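Roughly, the fragment shader might look something like this (just a sketch with made-up names, untested, and only 3 layers shown, but the same pattern would extend to 10):

// Sketch only: layer0..layer2 are 8-bit alpha-only textures holding palette
// indices (highest to lowest priority), palette is a 256x1 RGBA lookup table,
// rainbow is a 1D texture indexed by screen Y, and COPPER_INDEX is one
// reserved entry that takes its colour from the rainbow instead of the palette.
#version 120
uniform sampler2D layer0, layer1, layer2;
uniform sampler2D palette;
uniform sampler1D rainbow;
uniform float     screenHeight;
varying vec2      uv;          // texture coordinate from the vertex shader

const float COPPER_INDEX = 255.0 / 255.0;   // purely an illustration

vec4 lookup(float index)
{
    if (index == COPPER_INDEX)
        return texture1D(rainbow, gl_FragCoord.y / screenHeight);
    return texture2D(palette, vec2(index, 0.0));
}

void main()
{
    // Index 0 means "transparent": fall through to the next layer down.
    float i = texture2D(layer0, uv).a;
    if (i == 0.0) i = texture2D(layer1, uv).a;
    if (i == 0.0) i = texture2D(layer2, uv).a;
    gl_FragColor = lookup(i);
}

(A real version would centre the palette lookup on texel centres, but that's the general shape of it.)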
I think that I completely missed this thread when it was originally started...
IMHO, you'd be better off using OpenGL rather than Gallium since that would be more portable. Any graphics card that's capable of doing what you describe should have more than enough power to perform the extra 1D texture lookup to go from a 256 colour palette to a 16-/32-bit framebuffer. If you were running EUAE in windowed mode on a truecolour screen then you'd have to do this anyway.
I may end up doing both in the end. The problem isn't the 8-bit CLUT, it's getting OpenGL to open an 8-bit alpha-only framebuffer to save on video RAM. For graphics cards that don't have Gallium, the OpenGL version will require 32 bits of memory per 8-bit pixel, simply because I don't know how to allocate an 8-bit framebuffer in OpenGL.
You could do the conversion in the very same shader that does everything else, thus requiring no 8-bit framebuffer.
Samurai_Crow wrote: So planar-to-chunky could be done in the shader?
You've completely lost me now. You're planning to do a little processing on the GPU, then get the CPU to do P2C in between? In that case, the overhead of copying the data from/to VRAM (or accessing it directly) will probably kill any performance advantage gained from using the GPU for the first part.
If GPUs are capable of doing any planar processing at all,** then of course they can do P2C as well. In fact, if you're operating on planar data, then why use an 8-bit/pixel framebuffer at all? The 8-bit chunky format is meaningless if the data is planar. Use a 32-bit framebuffer so that you can execute operations in parallel on all channels. If you only do scalar processing, then you're not using 3/4 of the available processing power.
Hans
** The GPU would have to be able to treat the input as integers instead of converting to floating point, do bitwise operations, and support multi-texturing with up to 8 textures for AGA.
Thanks for confirming that most of what I wanted to do is possible on some modern hardware. As you may know, I'm a member of the NatAmi team and being able to emulate SuperAGA in shaders would be one goal of mine, but being able to emulate only the AGA planar modes would still be a worthy goal.
@Thread
I'll try to keep you informed. Incidentally, the new machine I got from my friend did not have an nVidia graphics card in it, so I'm back to editing shaders in Linux. If I can get GLSL to perform as needed, I'll let you all know.
@Deniil If you want accurate timing, use UAE. If you want something that will emulate at full speed, use the closest hardware equivalent to what you're emulating. In this case, fragment shaders are the closest things that 3D graphics cards have to copper lists.
A shader is a program that runs on the GPU. They can be run to process vertex data or pixel data (or any other large arrays of data). For example, a vertex shader could take in vertices of triangles being rendered and transform them to screen coordinates. Next a fragment/pixel shader would run on every pixel in that triangle, and perform per-pixel lighting calculations. Or, the fragment shader could perform a blur effect, or a sharpen effect, or something completely different (such as Samurai_Crow's AGA emulation idea).
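To make that concrete, here is about the smallest GLSL vertex/fragment pair there is (purely illustrative):

// --- vertex shader: runs once per vertex ---
#version 120
void main()
{
    // Transform the incoming vertex into clip space.
    gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}

// --- fragment shader: runs once per pixel that the triangle covers ---
#version 120
void main()
{
    // This could just as easily be a blur, a palette lookup, or AGA emulation.
    gl_FragColor = vec4(1.0, 0.5, 0.0, 1.0);   // plain orange
}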
Samurai_Crow: I'll be more than happy to work with you on building this up with a framework around it, as I have in mind to do something similar for CPU-level operations...
so if you want to do the graphics chipset, I can work on handling the non-graphics chipset operations and splitting them out.
I will definitely be putting my own Radeon HD5450 through its paces to test anything if you want any direct coding help as well.
Before I commit to doing anything, I'm going to have to experiment with the shaders. I may have to do some of this in Linux or MacOSX because I want to be sure that the development system works well. Terminills on AROS Exec has agreed to give me an nVidia graphics card so I can try this out in AROS also. Once I get this working on OpenGL, I can look at what it would take to get the full version working on Gallium.
It just occurred to me that if I have a graphics card capable of doing bitwise operations, and I have a 32-bit pixel format, and I know the byte order of the screenmode (e.g. ARGB vs. BGRA), then I can slice the 32-bit pixels up into an 8-bit-per-pixel display mode. The only catch is that the blitter won't recognize it as 8 BPP.
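As a rough sketch of what I mean (made-up names, untested): the 32-bit "chunky" texture would be a quarter of the screen width, with four 8-bit indices packed into every texel, and the fragment shader picks the right byte for the screenmode's component order before running it through the palette. As it happens, this part doesn't even need bitwise operators, just component selection:

// Sketch only: chunky is screenWidth/4 texels wide, with four 8-bit pixel
// indices per 32-bit RGBA texel; palette is a 256x1 colour lookup table.
#version 120
uniform sampler2D chunky;
uniform sampler2D palette;
uniform vec2      screenSize;   // destination screen size in pixels

void main()
{
    float x   = floor(gl_FragCoord.x);
    float y   = floor(gl_FragCoord.y);
    // Fetch the 32-bit texel that holds this pixel's byte.
    vec2  src = vec2((floor(x / 4.0) + 0.5) / (screenSize.x / 4.0),
                     (y + 0.5) / screenSize.y);
    vec4  word = texture2D(chunky, src);

    // Pick the byte by position within the word.  This order assumes an
    // RGBA-laid-out word; for a BGRA screenmode the selection reverses.
    float b = mod(x, 4.0);
    float index = (b < 1.0) ? word.r :
                  (b < 2.0) ? word.g :
                  (b < 3.0) ? word.b : word.a;

    gl_FragColor = texture2D(palette, vec2(index, 0.0));
}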
The shader editor under Xcode 4.0.2 on my Mac doesn't support shaders beyond GLSL 1.20, which corresponds to OpenGL 2.1 (roughly the level of OpenGL ES 2.0). So much for shifting and masking bits.
I need a new OS. Even MacOSX Snow Leopard 10.6.8 is too far behind the curve for me now. :(
@Belxjander
What OS are you running that you can do OpenGL 3.1-level shaders?
Hello @Samurai Crow Could you explain what kind of algorithm you want to use to do the c2p? Because I can't imagine how a shader could do that???
I just imagined an example with 4 bitplanes: a, b, c, d. Each bitplane is read 32 bits at a time (exactly like an RGBA texture),
so we have aR8 aG8 aB8 aA8, bR8 bG8 bB8 bA8, cR8 cG8 cB8 cA8, dR8 dG8 dB8 dA8.
The first 8 pixels to produce are all contained in aR8, bR8, cR8 and dR8.
The first pixel is easy: aR8/128*8 + bR8/128*4 + cR8/128*2 + dR8/128*1 (it can be done by multiplying the input by a constant factor of 1/128, then multiplying by a constant colour (8, 4, 2, 1)), but how do you do the next 7 pixels???
I would convert all of the values to integers and see if there is a sampler for reading an ivec4 instead of a vec4. I couldn't use floating-point textures though. OpenGL 3.x introduced shift operations and modulo, as well as bit masking.
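Roughly, what I have in mind might look something like this (untested, just a sketch): it assumes each bitplane is uploaded as a GL_R8UI texture so that one texel holds 8 horizontally adjacent pixels, takes plane0 as the least-significant plane as on the real chipset, and needs GLSL 1.30 / OpenGL 3.x for usampler2D, texelFetch and the shift/mask operators.

// Sketch only: 4 bitplanes shown; more planes just add fetches.
#version 130
uniform usampler2D plane0, plane1, plane2, plane3;  // GL_R8UI, width = screenWidth/8
uniform sampler2D  palette;                         // 16-entry CLUT for 4 planes
out vec4 fragColor;

void main()
{
    ivec2 pix  = ivec2(gl_FragCoord.xy);
    ivec2 cell = ivec2(pix.x >> 3, pix.y);   // which byte of the bitplane row
    uint  bit  = uint(7 - (pix.x & 7));      // the MSB is the leftmost pixel

    uint a = (texelFetch(plane0, cell, 0).r >> bit) & 1u;
    uint b = (texelFetch(plane1, cell, 0).r >> bit) & 1u;
    uint c = (texelFetch(plane2, cell, 0).r >> bit) & 1u;
    uint d = (texelFetch(plane3, cell, 0).r >> bit) & 1u;

    uint index = a | (b << 1u) | (c << 2u) | (d << 3u);
    fragColor  = texture(palette, vec2((float(index) + 0.5) / 16.0, 0.5));
}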