2) Currently, nothing enforces shareability of the memory, since there is no per-task MMU setup yet. Rules for that will be laid out later. Right now the rule is that tasks and sub-tasks can share the memory.
I think we need the per-task MMU setup; it would simply be impractical to have to unmap and map memory all the time.
Quote:
Right now, there are three possible ways to map the pages for the blob
Nice, sounds like Immediately would be the one to use with emulators. And maybe the mapping mode for buffers and smaller stuff.
(NutsAboutAmiga)
Basilisk II for AmigaOS4 AmigaInputAnywhere Excalibur and other tools and apps.
"Rogue, Another example is mapping a large graphics card area. Right now, you are limited to the size of the PCI address space; graphics card drivers will make sure the right stuff is copied in the right addresses."
Did I misread this?
I'm not sure exactly what Rogue is saying. I was under the impression that the graphics card's VRAM PCI BAR size was fixed (at 256MB), which would mean that there would be no way for the CPU to ever directly access the additional VRAM. Rogue makes it sound like that size is adjustable. If so, then the additional VRAM could be made CPU accessible, and theoretically be used via an ExtMem object.
Having said that, even if this is possible, I would strongly discourage any direct CPU access to VRAM. The reason is that CPU access to VRAM is very slow, especially with PCIe devices. Even when using AltiVec, the RAM=>VRAM transfer rate on the A1-X1000 is ~400 MB/s out of a theoretical max of 4 GB/s! DMA/GART is the only way to transfer data to/from VRAM at high speed. Borrowing the extra VRAM for use as generic RAM would be rather slow.
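To make that concrete, here is a rough, hypothetical micro-benchmark in C. The destination buffer is ordinary malloc'd RAM standing in for a CPU-mapped VRAM aperture, and POSIX timing is used for portability (an AmigaOS build would use timer.device instead). It only illustrates how a figure like ~400 MB/s would be measured; it is not taken from any real driver.

/* Hypothetical bandwidth probe: shows how a CPU-copy figure like ~400 MB/s
 * would be measured. The destination here is ordinary malloc'd RAM as a
 * stand-in; on real hardware you would point `dst` at a CPU-mapped VRAM
 * aperture instead. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define CHUNK (16 * 1024 * 1024)   /* 16 MiB per copy */
#define LOOPS 64                   /* 1 GiB moved in total */

int main(void)
{
    unsigned char *src = malloc(CHUNK);
    unsigned char *dst = malloc(CHUNK);  /* stand-in for a VRAM window */
    if (!src || !dst) return 1;
    memset(src, 0xA5, CHUNK);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < LOOPS; i++)
        memcpy(dst, src, CHUNK);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double mib  = (double)CHUNK * LOOPS / (1024.0 * 1024.0);
    printf("copied %.0f MiB in %.3f s -> %.1f MB/s\n", mib, secs, mib / secs);
    return 0;
}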
I'd like to point out that the CPU only being able to directly access the first 256 MiB of VRAM does not mean that the rest cannot be used for graphics purposes. Sure, Picasso96 can't use it because it has no concept of GPU-only VRAM. However, the GPU can access it, so 3D drivers could store textures and other data there. The code to use that extra VRAM isn't written yet, but it is on the to-do list.
I don't understand what Rogue means by that. You have p96LockBitMap(), which gives you a RenderInfo and a pointer to the memory, but as an application developer I never needed to page it.
Maybe it's something low level, something hidden.
With Svgalib on Linux you can access the framebuffer as paged memory using vga_setpage(); the memory window there is a fixed size of 64K.
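For anyone who hasn't seen it, roughly how that banked access looks with svgalib is sketched below. This is written from memory against the old svgalib API (vga.h), so treat it as an illustration rather than tested code: only a 64K window is visible at a time, and vga_setpage() selects which 64K bank of video memory the window shows.

/* Banked VRAM access with svgalib, as mentioned above (sketch, untested). */
#include <vga.h>

static void put_pixel_banked(int x, int y, unsigned char colour, int width)
{
    unsigned long offset = (unsigned long)y * width + x;
    unsigned char *window = vga_getgraphmem();   /* 64K window into VRAM */

    vga_setpage(offset >> 16);                   /* select the 64K bank   */
    window[offset & 0xFFFF] = colour;            /* write within the bank */
}

int main(void)
{
    vga_init();
    vga_setmode(G640x480x256);       /* ~300K framebuffer -> needs paging */
    put_pixel_banked(320, 240, 15, 640);
    vga_getch();                     /* wait for a key */
    vga_setmode(TEXT);
    return 0;
}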
(NutsAboutAmiga)
Basilisk II for AmigaOS4 AmigaInputAnywhere Excalibur and other tools and apps.
Adding additional RAM to upper memory at this point is not possible because there is no physical place for it. However, if you can put just the graphics card there, it would be very advantageous. More can come later.
I'm not sure what you mean, but there is no way that an ExtMem object could use any VRAM that the CPU does not have access to.
@QuikSanz
Quote:
Adding additional RAM to upper memory at this point is not possible because there is no physical place for it. However, if you can put just the graphics card there, it would be very advantageous. More can come later.
Sorry, but I have no idea what you mean, or how it relates to what I said. The 256 MB of VRAM that the CPU can access is already mapped directly into the address space and is, therefore, directly CPU accessible.
If I'm thinking what Rogue is thinking, the graphics card driver can point to any address it wants to. That's the one case I can think of where a program would not even have to be aware of the change, and therefore would not need to be rewritten. If this operation is done at full speed, and it should be because it's a simple transfer, there should be no speed penalty.
Chris
PS: Perhaps we will get more clarification on this later. I'm a user, not a coder.
You cannot "just" map the graphics card area, since there are going to be implications (especially with caching and read/write coherency) but it might offer a way to access large areas of graphics cards that do have a 64 bit address space.
It's mandatory that the CPU can access that memory, though. To be honest, I don't know how modern graphics cards like NVidia or RadeonHD handle that; but I had previously seen the P-10 and a few other cards that could have 64-bit BARs that the CPU could access.
Seriously, if you do want to contact me write me a mail. You're more likely to get a reply then.
No, I don't mean sizes are adjustable. I'm not sure where I saw this, but I recently came across a PCI card with a 64-bit BAR that was pretty large. I thought it was a graphics card, but come to think of it I am no longer sure (although I wouldn't know of any other cards that have such large BARs).
Must be getting old, I can't remember what it was, dammit :(
Seriously, if you do want to contact me write me a mail. You're more likely to get a reply then.
That's too bad. Freeing up system memory would be superb.
Chris
PS: Not sure why the CPU needs to read the memory in the card; it just needs to display the info dumped into it. I'm just a novice anyway; I only look to move forward.
No, I don't mean sizes are adjustable. I'm not sure where I saw this, but I recently came across a PCI card with a 64-bit BAR that was pretty large. I thought it was a graphics card, but come to think of it I am no longer sure (although I wouldn't know of any other cards that have such large BARs).
Okay, that makes sense. All Radeon HD cards have 64-bit BARs but, so far, I've never seen one with a BAR larger than 256MB.
AMD implements hUMA (Heterogeneous Uniform Memory Access) within its newest GPUs. See details: here. As far as I know, it allows the CPU and GPU address space to be unified.
I thought you meant that from the beginning:) EMO is (IMO) indeed a very good idea to be used for GPU Computing together with hUMA :)
Cool! I wish we could already use that. But better get Gallium & OpenGL first...
(I hope we see a blog update on that 3D part soonish as well...)
UPDATE: (because there was some discussion of visible memory space, here is an example of how memory is mapped in one case)

Memory map on T1040QDS
----------------------
Start Address    End Address      Description                      Size
0xF_FFDF_0000    0xF_FFDF_0FFF    IFC - FPGA                       4KB
0xF_FF80_0000    0xF_FF80_FFFF    IFC - NAND Flash                 64KB
0xF_FE00_0000    0xF_FEFF_FFFF    CCSRBAR                          16MB
0xF_F803_0000    0xF_F803_FFFF    PCI Express 4 I/O Space          64KB
0xF_F802_0000    0xF_F802_FFFF    PCI Express 3 I/O Space          64KB
0xF_F801_0000    0xF_F801_FFFF    PCI Express 2 I/O Space          64KB
0xF_F800_0000    0xF_F800_FFFF    PCI Express 1 I/O Space          64KB
0xF_F600_0000    0xF_F7FF_FFFF    Queue manager software portal    32MB
0xF_F400_0000    0xF_F5FF_FFFF    Buffer manager software portal   32MB
0xF_E800_0000    0xF_EFFF_FFFF    IFC - NOR Flash                  128MB
0xF_E000_0000    0xF_E7FF_FFFF    Promjet                          128MB
0xF_0000_0000    0xF_003F_FFFF    DCSR                             4MB
0xC_3000_0000    0xC_3FFF_FFFF    PCI Express 4 Mem Space          256MB
0xC_2000_0000    0xC_2FFF_FFFF    PCI Express 3 Mem Space          256MB
0xC_1000_0000    0xC_1FFF_FFFF    PCI Express 2 Mem Space          256MB
0xC_0000_0000    0xC_0FFF_FFFF    PCI Express 1 Mem Space          256MB
0x0_0000_0000    0x0_FFFF_FFFF    DDR                              2GB
So 256MB seems pretty standard (reservation) for PCIe cards.
Edited by KimmoK on 2014/6/9 13:23:36
- Kimmo --------------------------PowerPC-Advantage------------------------ "PowerPC Operating Systems can use a microkernel architecture with all its advantages yet without the cost of slow context switches." - N. Blachford
4) ExtMem works with both 32-bit and 64-bit CPUs. It theoretically works on the classic too, but you rarely have enough memory on the classic for something like that.
Other systems have limits which make it useless, for example the Sam440ep (512 MB RAM and only 1 PCI slot), and AFAIK it's the same for the µA1, but you underestimate the classic Amigas.
An A3000/A4000 with a 7x Zorro3 bus board: 7 DKB3128/ZorRAM/BigRAM+ 256 MB Zorro3 cards (the current 1 GB Zorro3 space is only a software limit), 128 MB on the CyberStormPPC, 16 MB mainboard fast and 2 MB chip: about 1.9 GB RAM (7 × 256 MB + 128 MB + 16 MB + 2 MB). If it's possible to use a PCI board together with it somehow, an additional 3-4 256 MB Radeon PCI gfx cards used only as RAM would give 2.65-2.9 GB RAM.
AMD implements hUMA (Heterogeneous Uniform Memory Access) within its newest GPUs. See details: here. As far as I know, it allows the CPU and GPU address space to be unified.
Details on that are rather sketchy. AFAICT, it looks like this may be for their APUs only (i.e., CPU and GPU on the same chip), and not for plug-in cards. The CPU on those chips is x86/x64, so they're of no use to us.
With PCIe cards, the size of the VRAM BAR can still determine how much VRAM the CPU can see. This could easily be done, but it's up to the manufacturer to set that BAR's size. I guess that they still see a value in making it small enough for 32-bit systems.
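If anyone wants to check what their own card advertises, here is a small hypothetical C snippet for Linux (not AmigaOS, but Linux already came up above with svgalib) that reads the sysfs resource file, where each line holds the start, end and flags of one PCI region; the region size is simply end - start + 1. The device address in the path is a placeholder.

/* Print the sizes of a PCI device's memory/IO regions from Linux sysfs.
 * The device address below is a placeholder; adjust it for your card. */
#include <stdio.h>

int main(void)
{
    const char *path = "/sys/bus/pci/devices/0000:01:00.0/resource"; /* placeholder */
    FILE *f = fopen(path, "r");
    if (!f) { perror(path); return 1; }

    unsigned long long start, end, flags;
    int region = 0;
    while (fscanf(f, "%llx %llx %llx", &start, &end, &flags) == 3) {
        if (end > start)
            printf("region %d: %llu MiB\n", region, (end - start + 1) >> 20);
        region++;
    }
    fclose(f);
    return 0;
}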
" you'll see that PCI devices (incl. VRAM), the kernel and other memory buffers take up space outside that 2GB."
So when we get SMP support, will it use only memory from the 1st core's bank of memory, or from both cores? Therefore 3 GB available for each core, and will it auto-configure across both? Let me know if you don't understand.
" you'll see that PCI devices (incl. VRAM), the kernel and other memory buffers take up space outside that 2GB."
So when we get SMP support, will it use only memory from the 1st core's bank of memory, or from both cores? Therefore 3 GB available for each core, and will it auto-configure across both? Let me know if you don't understand.
I'm not sure that I understand what you're asking 100%. However, with Symmetric Multi-Processing (SMP), the cores share all memory and any process/thread can execute on any core. All items in memory use the same address on all cores. So, you won't end up with any more address space.
If system memory is spread across all cores it would be much better than 2 GB on one core and 4 GB on the other, as that would be nuts.
Okay, it sounds like you don't understand how SMP works. With SMP there is no such thing as 2 GB for one core and 4 GB for another; all cores can access all memory using exactly the same addresses. This is important because processes/threads can be executed by any core at any time. It's the OS' task to decide which core runs which thread at what time. If one core is reaching maximum capacity while another core is almost idle, then the OS can transfer a few threads to the other core to balance the load, and maximise throughput.
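For anyone who wants to see the "same address on every core" point in action, here is a small illustration using POSIX threads on a Unix-like system (AmigaOS SMP will of course use Exec tasks rather than pthreads, so this is only an analogy): both threads print the same address for the shared counter and update the same memory, regardless of which core the scheduler happens to run them on.

/* Two threads, possibly on different cores, sharing one address space:
 * both see `counter` at the same address and update the same memory. */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    printf("thread %lu sees counter at %p\n",
           (unsigned long)pthread_self(), (void *)&counter);
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);
        counter++;                      /* same memory, whichever core runs us */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %ld (expected 2000000)\n", counter);
    return 0;
}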