I have some new stuff in progress and have an issue with one game that uses a heavy shader: enabling it drops the frame rate from 40 FPS to just 10. Maybe someone can spot problems in it, or suggest how to optimise or reduce it so that it doesn't lose that much?
Or at least how to make it "half working" without losing so much FPS.
The shaders implement a "water" effect that is active throughout the whole gameplay. In the original fragment shader "FogMode" was a bool, so I replaced it with an "int". Here they are:
//reflection
_videoDriver->setRenderTarget(_reflectionMap, true, true); //render to reflection
//get current camera
scene::ICameraSceneNode* currentCamera = _sceneManager->getActiveCamera();
//set FOV and far value from current camera
// _camera->setFarValue(currentCamera->getFarValue());
// _camera->setFOV(currentCamera->getFOV());
_camera->setNearValue(500.0f);
_camera->setFarValue(15000.0f); /// Must be bigger than water
_camera->setFOV(10*core::DEGTORAD); /// Playing with FOV and Z distance
core::vector3df position = currentCamera->getAbsolutePosition();
position.Y = -position.Y + 2 * RelativeTranslation.Y; //position of the water
_camera->setPosition(position);
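The Y flip in the snippet above places the reflection camera at the mirror image of the player camera with respect to the water plane. A minimal standalone sketch of that step, assuming the water surface is the horizontal plane y = waterY (Vec3 and mirrorAcrossWater are illustrative names, not part of Irrlicht or of the water node):

```cpp
// Minimal stand-in for a 3D vector; hypothetical, not Irrlicht's vector3df.
struct Vec3 { float x, y, z; };

// Mirror a point across the horizontal plane y = waterY.
// Same arithmetic as: position.Y = -position.Y + 2 * RelativeTranslation.Y
Vec3 mirrorAcrossWater(Vec3 p, float waterY)
{
    p.y = 2.0f * waterY - p.y;
    return p;
}
```

A camera 20 units above the water ends up 20 units below it, with X and Z unchanged.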
/// NODE -> WATER: (Sea)
/// TO-DO: THIS ONE CAUSES MEMORY LEAKS ON RELOAD !!!!!!! ...
if(!waterNode){
waterNode = new RealisticWaterSceneNode(smgr, 10000, 10000, "water", irr::core::dimension2du(512, 512), nodeLevel, -1);
/// waterNode->setParent(nodeLevel); /// Attach to parent: Level
waterNode->setPosition(vector3df(2500,0,0));
waterNode->setRotation(vector3df(0,0,0));
waterNode->setWindForce(10.0f);
waterNode->setWindDirection(irr::core::vector2df(5.f, -5.f));
waterNode->setWaveHeight(0.3f);
waterNode->setWaterColor(video::SColor(0,50,50,50)); /// Black
waterNode->setColorBlendFactor(0.15);
/// Materials:
waterNode->setMaterialFlag(video::EMF_LIGHTING, true); // Node is affected by LIGHT?
waterNode->setMaterialFlag(video::EMF_FOG_ENABLE, true); /// Node is affected by FOG? - (Its the only node with the fog enabled)
waterNode->setMaterialFlag(video::EMF_BACK_FACE_CULLING, true); // true = back faces are culled; set to false to render both sides. Affects the water reflection!
waterNode->setMaterialFlag(video::EMF_NORMALIZE_NORMALS, true);
/// waterNode->setMaterialFlag(video::EMF_ANISOTROPIC_FILTER, true); // Increase view distance quality (similar to sharpness)
waterNode->setMaterialType(video::EMT_SOLID);
};
2) upvector doesn't depend on inputs, it could be defined as const vec3 outside the function.
Interesting: how much speed do shaders usually lose when we declare a vec3 inside main() that could be declared outside? I will test it of course, I'm just curious whether in theory it is a micro-optimisation or more of a general rule like "do as much as possible outside of main()".
Quote:
3) there is one confusing thing, this fogFactor which gets redefined inside branches. Should "float" part (redeclaration) be removed?
I think that if I go with linear fog mode and use it as I showed at the top, those floats will be gone too.
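For reference, linear fog needs no branches and only one fogFactor declaration. The sketch below is C++ rather than GLSL and assumes the shader's linear mode follows the classic fixed-function formula (end - distance) / (end - start); the function and parameter names are illustrative and not taken from the water shader:

```cpp
#include <algorithm>
#include <cassert>

// Linear fog factor in the style of the classic GL_LINEAR fog equation:
// 1.0 = no fog at or before fogStart, 0.0 = fully fogged at or beyond fogEnd.
// Assumption: the shader's linear mode uses this same formula.
float linearFogFactor(float distance, float fogStart, float fogEnd)
{
    assert(fogEnd > fogStart); // a zero-width range would divide by zero
    const float f = (fogEnd - distance) / (fogEnd - fogStart);
    return std::min(1.0f, std::max(0.0f, f)); // clamp to [0, 1]
}
```

In GLSL the same expression would be a single clamp() call, so the per-branch float redeclarations would not be needed.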
Quote:
1) is there a way to reduce details (vertices) somehow? For example, when water node is created by new.
Can't say for now; I only know that this "realistic water effect" is from this github :
Sadly I can't test on AmigaOS for a few days, but I did test the win32 version and it still works.
Though I can't see there whether it improves anything: the FPS is more or less the same on win32 and doesn't depend on the shader (while on AmigaOS4 it depends on it a lot: 12 FPS in game with the shader, 42 without).
@Capehill Do you know any good article about general shader optimisation? I mean some basic rules like "you shouldn't do this and that, as it makes the shader slow", or "never do that, or the shader will be slow"?
All I know so far are little bits from here and there. I know, for example, that our opengles2 has glslangvalidator's shader optimisation disabled because it causes issues with Nova, which means we need to optimise all our shaders ourselves.
First I found an offline shader optimizer which is now built into Unity, but until 2016 it was standalone: https://github.com/aras-p/glsl-optimizer
@Capehill I built glsl-optimizer from the link above and took a look at what kind of shader it produces when I feed it the original fragment shader. Here is the optimized output:
Quite "optimized" :) The "mix" calls and other things are combined and more compact, several calls are folded into one, and so on. I tested it with the Windows version and it still works like the original. I will check in a few days how it behaves on AmigaOS. But the optimizer probably did produce a really "optimized" enough version.
Also, the fact that this thing is built into Unity kind of tells that it should indeed be good enough.
EDIT: After thinking about it a bit more, I think on Windows 10 I will see no difference at all with any kind of shader optimisation, because there is surely some good enough GLSL optimizer built into the Windows drivers already, so when I run the game with a non-optimized shader, the drivers still optimize it for me.
So this needs to be tested on AmigaOS4 only, because only there does it make sense, as we have no built-in shader optimisation.
I only know that Khronos' page that I linked. Shader looks simple, no loops, no branches. If all those calculations must be done on the fragment level and cannot be moved to the vertex shader, then I cannot say what could be optimized. Have you tried lowest float precision?
The shader does, however, do 3 texture lookups, and there are FBOs involved, so it can be tricky to diagnose exactly what takes the time. If I understand correctly, the engine must render the scene to a texture, which is then sampled by the shader so that reflections can be rendered on the water.
Quote:
glslangvalidator being disabled in terms of shader optimisation, because, it cause issues with Nova
No, no tickets. It's just that a year ago Daniel and I tested his ogles2.library with glslangvalidator optimisation enabled at level 1, level 2 and level 3 (meaning not the usual gcc -Ox optimisation, but the built-in optimisation of shader-generated code): at all levels we had more issues with Nova compared with no optimisation at all. All the tests I did were on a bunch of Shadertoy shaders.
But I think it's not a bug per se. I.e. the O2 (or O3) compiler optimisation is of course enabled when all this is built; what I mean is the shader optimizer code. And for us it is probably good to have it disabled, as we can write shaders the way we want them and not rely on internal shader optimizers.
After all, if the shader optimizer code had been enabled in ogles2 before, why would we need to think about shader fixes at all, as everything would be done inside that optimizer :)
@Capehill Ported that glsl-optimizer from Unity to AmigaOS4 natively :) One more tool :)
Btw, see what is written in the glsl-optimizer readme:
Quote:
A C++ library that takes GLSL shaders, does some GPU-independent optimizations on them and outputs GLSL or Metal source back. Optimizations are function inlining, dead code removal, copy propagation, constant folding, constant propagation, arithmetic optimizations and so on.
Apparently quite a few mobile platforms are pretty bad at optimizing shaders; and unfortunately they also lack offline shader compilers. So using a GLSL optimizer offline before can make the shader run much faster on a platform like that. See performance numbers in this blog post.
Even for drivers that have decent shader optimization, GLSL optimizer could be useful to just strip away dead code, make shaders smaller and do uniform/input reflection offline.
Almost all actual code is Mesa 3D's GLSL compiler; all this library does is spits out optimized GLSL or Metal back, and adds GLES type precision handling to the optimizer.
This GLSL optimizer is made for Unity's purposes and is built-in starting with Unity 3.0.
Sadly, it doesn't add a single FPS. Very strange. But with the shader disabled entirely I gain +30 FPS. Then I tried to remove the fog from the optimized shader completely, like this:
Still the same 14 FPS. Once the shader is disabled, 45 FPS. Wtf ..
Though, we use GL4ES there, so it also auto-converts those shaders a bit; usually it doesn't add or change much, but just in case I dumped the shader after the GL4ES conversion (so exactly what we send to ogles2), and this is what we get for the optimized shader with fog disabled:
#version 100
precision highp float;
precision highp int;
float clamp(float f, int a, int b) {
return clamp(f, float(a), float(b));
}
float clamp(float f, float a, int b) {
return clamp(f, a, float(b));
}
float clamp(float f, int a, float b) {
return clamp(f, float(a), b);
}
vec2 clamp(vec2 f, int a, int b) {
return clamp(f, float(a), float(b));
}
vec2 clamp(vec2 f, float a, int b) {
return clamp(f, a, float(b));
}
vec2 clamp(vec2 f, int a, float b) {
return clamp(f, float(a), b);
}
vec3 clamp(vec3 f, int a, int b) {
return clamp(f, float(a), float(b));
}
vec3 clamp(vec3 f, float a, int b) {
return clamp(f, a, float(b));
}
vec3 clamp(vec3 f, int a, float b) {
return clamp(f, float(a), b);
}
vec4 clamp(vec4 f, int a, int b) {
return clamp(f, float(a), float(b));
}
vec4 clamp(vec4 f, float a, int b) {
return clamp(f, a, float(b));
}
vec4 clamp(vec4 f, int a, float b) {
return clamp(f, float(a), b);
}
float max(float a, int b) {
return max(a, float(b));
}
float max(int a, float b) {
return max(float(a), b);
}
uniform vec3 CameraPosition;
uniform float WaveHeight;
uniform vec4 WaterColor;
uniform float ColorBlendFactor;
uniform sampler2D WaterBump;
uniform sampler2D RefractionMap;
uniform sampler2D ReflectionMap;
varying vec2 bumpMapTexCoord;
varying vec3 refractionMapTexCoord;
varying vec3 reflectionMapTexCoord;
varying vec3 position3D;
void main ()
{
vec2 tmpvar_1;
tmpvar_1 = (WaveHeight * (texture2D (WaterBump, bumpMapTexCoord).xy - 0.500000));
gl_FragColor = mix (mix (texture2D (ReflectionMap, clamp (
((reflectionMapTexCoord.xy / reflectionMapTexCoord.z) + tmpvar_1)
, 0.00000, 1.00000)), texture2D (RefractionMap, clamp (
((refractionMapTexCoord.xy / refractionMapTexCoord.z) + tmpvar_1)
, 0.00000, 1.00000)), max (
normalize((CameraPosition - position3D))
.y, 0.00000)), WaterColor, ColorBlendFactor);
}
It doesn't look bad, just about the same, only with some helper functions added on top. But that can't explain the "eating" of 30 FPS, right?
Edited by kas1e on 2022/9/18 8:17:37
@Capehill Commenting out just the refraction part: +8-9 FPS (so instead of 14 I have 22-23). Everything looks about the same; the mirroring of things in the water gets a little lower in quality, and that's all. Almost the same effect.
Then, commenting out just the reflection part and keeping refraction makes the whole effect disappear, but doesn't gain much: just +6 FPS, so the total is only 20 FPS.
If I comment out both, I have just 22 FPS again and nothing else. So the best I get is to comment out the refraction part and have 22-23 FPS. Though when I comment out the whole usage of the shader, I have 45 FPS.
The question is where the other 20 FPS die in the shader. We lose about 10 on refraction (strange, why is that as well?)
See how it looks originally:
So that place gives just 14 FPS.
And this is how it looks when just the refraction lines (133-140) are commented out:
So there I have 22-23 FPS, but to be playable we should have at least a stable 30, or better 60.
Looks like the fps drop is the result of rendering the entire scene an additional two times. Once to generate the refraction map, and once to generate the reflection map.
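A rough way to check this against the numbers in the thread is to compare frame times instead of frame rates, because costs add up in milliseconds, not in FPS. The sketch below assumes that every extra scene pass costs about as much as one frame of the game without the shader, and that nothing else scales; both function names are illustrative:

```cpp
#include <cassert>

// Frame time in milliseconds for a given frame rate.
double frameTimeMs(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Predicted frame rate when the scene is drawn `passes` times per frame.
// Assumption: each pass costs the same as one frame at baseFps.
double predictedFps(double baseFps, int passes)
{
    assert(passes > 0);
    return 1000.0 / (frameTimeMs(baseFps) * passes);
}
```

With the 45 FPS measured without the shader, predictedFps(45.0, 3) gives 15 FPS, close to the 14 FPS measured with reflection and refraction both active, and predictedFps(45.0, 2) gives 22.5 FPS, close to the 22-23 FPS measured with refraction commented out. This is only a model, but it suggests there may be no separate "missing 20 FPS" inside the fragment shader itself.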
@Hans Simply removing the one "_sceneManager->drawAll(); //draw the scene" call right after the refraction calculation of course adds FPS, but then the refraction probably doesn't happen at all.
Anyway, simply commenting out everything inside the "OnAnimate" function gives me 40 FPS back in game, but of course no water effect. Where the slowness comes from, I don't know. And even 40 FPS is not enough for it: the game is simple, just good looking, but written pretty badly.
Some profiling would be needed to dig deeper. I still suspect that GL4ES is doing something that hurts performance on our systems.
@Hans We have already done a lot of profiling of all this stuff many times, as you remember, and it never pointed us to anything except the fact that GL calls happen.
So we need to go another route.
What about writing an analogue of the "slow for us" quake3map Irrlicht test that works directly on top of Nova and/or ogles2, so we can see whether it gives us 500 FPS at least, or stays at the same level as now with GL4ES?
@kas1e The profiling that I've done is with the manually coded driver profiling code (in special driver builds), which provides limited information. That's what gave me the impression that GL4ES and/or some games are glFlush()ing too often.
I haven't used a statistical profiler (not available at the time), or gprof (which still doesn't work?) to dig deeper.
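Until a statistical profiler or gprof is usable, hand-placed timers around suspect sections are one option on the application side. A minimal sketch using only the C++ standard library (ScopedTimer is a hypothetical helper, not part of GL4ES, ogles2 or Nova):

```cpp
#include <chrono>

// Hypothetical helper: adds the lifetime of the object, in milliseconds,
// to an accumulator owned by the caller.
class ScopedTimer
{
public:
    explicit ScopedTimer(double& accumulatedMs)
        : total_(accumulatedMs), start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer()
    {
        const auto end = std::chrono::steady_clock::now();
        total_ += std::chrono::duration<double, std::milli>(end - start_).count();
    }

    // Non-copyable: each timer owns exactly one measurement.
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    double& total_;
    std::chrono::steady_clock::time_point start_;
};
```

Wrapping, for example, each render-to-texture pass in its own block with a ScopedTimer would show how the CPU-side time splits between passes. Keep in mind that GL calls are queued, so without an explicit glFinish() such a timer mostly measures submission time rather than GPU execution time.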
Quote:
What about writing an analogue of the "slow for us" quake3map Irrlicht test that works directly on top of Nova and/or ogles2, so we can see whether it gives us 500 FPS at least, or stays at the same level as now with GL4ES?
That could help, if someone takes the time to write the code. It'll give us some idea of whether GL4ES is indeed the main culprit.
Well.. in that case we are at a dead end. IMHO only the driver's author can find what is wrong :) Because anyone else who writes that test code will make it bad/slow/wrong/or whatever, and it will show us nothing.
We hoped that first DMA and then GART would help us out, but no, nothing helped: we are in the same situation with speed in some cases, while, as I pointed out, pure CPU rendering is OK and more or less comparable with other x86 hardware.
The only solution is for the driver's author to write the necessary test cases, for Nova and for opengles2, which we can test in all conditions and on different platforms: win32, Linux, AmigaOS4. Only then can we find the culprit.
We do have your SDL2+OGL book with examples, which I made work on OS4 as well, so why can't we take that as a base, for example, make some tests and run them on different win32 and different AmigaOS4 machines (under OS4 and Linux)? Wouldn't that be enough to show us anything?
I mean, finding the issue doesn't mean that I or someone else writes yet another buggy test case; sadly it needs your time and full attention :) Test cases, benchmarks, and again test cases and test cases..
Hasn't gfxbench already told us that raw read/write speed is bad and doesn't reach the theoretical limits? Are its values what we expect them to be?
As for GL4ES itself: well, we do have some games which aren't GL4ES based but use OGLES2 directly, for example Eldritch. It is also slow when it comes to heavier scenes. By slow I mean the same 15-20 FPS, and the slowness looks exactly the same as with the other (GL4ES based) games: when the scene gets heavier, we drop a lot.
Edited by kas1e on 2022/9/20 9:50:33