Hola.
What are your thoughts on the article about the memory configuration of Navi 12? Will they use GDDR6 in a 256-bit configuration or HBM? (It's in German, but you can just put it through Google Translate.)
Thanks for the comment and the info that came with it. I've been very busy these last few days.
I doubt that AMD is going to use an HBM configuration in any of the RDNA cards, with a single exception: the one they are going to release for the HPC market in the future.
GDDR6 is different from the older GDDR memories: it has two channels per die, which means two 16-bit channels per chip, and because of this the GPU needs one interface per channel.


Well, it seems that Navi 12 has 16 interfaces, which means another 256-bit bus like Navi 10 (RX 5700). We can conclude that AMD is doing with RDNA the same thing Nvidia did with Turing: remember that the full TU106 (RTX 2070) and the TU104 are GPUs with different configurations but the same memory configuration.
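Just to make the arithmetic explicit, here is a minimal sketch of the bus-width count, assuming 16-bit GDDR6 channels and one interface per channel as described above (the 16-interface figure is the reported one, not anything official):

```python
# Quick sanity check of the bus-width arithmetic described above:
# GDDR6 exposes two independent 16-bit channels per die, and the GPU
# needs one interface per channel.
CHANNEL_WIDTH_BITS = 16   # width of a single GDDR6 channel
CHANNELS_PER_DIE = 2      # GDDR6 splits each die into two channels

interfaces = 16           # interface count reportedly found for Navi 12
bus_width = interfaces * CHANNEL_WIDTH_BITS
dies = interfaces // CHANNELS_PER_DIE

print(f"{interfaces} interfaces -> {bus_width}-bit bus across {dies} GDDR6 dies")
# 16 interfaces -> 256-bit bus across 8 GDDR6 dies (same as Navi 10 / RX 5700)
```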
About Navi12, 3DCenter says:
AMD Navi 12
- High end version of Navi, likely resulting in the Radeon RX 5800 series
- Possibly between 3328 and 4096 shader units (source: own assumption)
- Apparently 256 bit (GDDR6) memory interface (Source: 3DCenter-Forum)
- Driver code has slight differences to Navi 10, ergo technologically probably something newer than the first Navi chip
My theory is that Navi12 and Navi14 are newer designs than Navi10.
Do you remember what happened with GCN 1.0 (Southern Islands) and GCN 1.1 (Sea Islands)? The latter was an improvement on the former, made for the console and PC APUs of the era (Kaveri, Kryptos and Liverpool). In the PC market, one year after AMD released the HD 7000 series, they put out a card named HD 7790 that used a GPU named Bonaire; it was almost the same GPU as the one in the Xbox One, without the custom parts (including the ESRAM) and with a GDDR5 interface.
In the RDNA whitepaper they make clear that there are more advanced RDNA designs; for example, the RX 5700 can only do SIMD within a register down to FP16, but we know that some designs support lower-precision ops and a few new instructions.

This means that, just as the Sea Islands ISA was an expansion of the Southern Islands ISA, we could be looking at an RDNA 1.1 architecture in the case of Navi 12 and Navi 14. Why do I say both? Well… I believe that the configuration of Navi 12 is going to be 2x Navi 14, and we know the configuration of the latter. In other words, 24 CUs / 12 WGPs for Navi 14 and 48 CUs / 24 WGPs for Navi 12.
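To put the speculated numbers in one place, a small sketch assuming the RDNA grouping of 2 CUs per WGP, 64 stream processors per CU, and the guess that Navi 12 is 2x Navi 14:

```python
# Making the speculated configurations explicit, assuming the RDNA
# grouping of 2 CUs per WGP (dual compute unit) and Navi 12 = 2x Navi 14.
CUS_PER_WGP = 2
SPS_PER_CU = 64

navi14_cus = 24
navi12_cus = 2 * navi14_cus   # the "2x Navi 14" guess

for name, cus in (("Navi 14", navi14_cus), ("Navi 12", navi12_cus)):
    print(f"{name}: {cus} CUs / {cus // CUS_PER_WGP} WGPs "
          f"({cus * SPS_PER_CU} stream processors)")
# Navi 14: 24 CUs / 12 WGPs (1536 stream processors)
# Navi 12: 48 CUs / 24 WGPs (3072 stream processors)
```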
I can add some extra things. Navi 12 seems to have instructions related to ray tracing, using the new image paths disclosed in the whitepaper. So if they have special units for intersection, does that make it next-generation Navi? (I'd also like to note that the patent says it can still have fixed-function traversal, but that it can be bypassed where necessary so that only the intersection accelerators are used.)
https://github.com/llvm-mirror/llvm/commit/eaed96ae3e5c8a17350821ae39318c70200adaf0#diff-779f5f88afd691030e4189856730b76cR25
The intersection unit just calculates whether the ray intersects the primitive, but it doesn't traverse the BVH. So we need another unit for traversing the BVH. In the case of Turing, the RT Core includes both types of units, and we don't know whether RDNA with RT support will include both, but… Do you know where the complete RT unit sits in the case of Turing? It is the direct neighbour of the texture unit.
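To illustrate why both pieces matter, here is a rough, purely illustrative sketch (not AMD's or Nvidia's actual scheme) of a traversal loop that keeps its state in the shader and only offloads the intersection tests; the helper names and node layout are hypothetical:

```python
# Illustrative only: the traversal loop walks the BVH and at every node
# calls an intersection test. The loop itself could run on the shader
# units (or a fixed-function unit), while the box/triangle tests are the
# part an accelerator next to the texture unit would provide.
# `intersect_box`, `intersect_triangle` and the node fields are hypothetical.

def trace_ray(ray, bvh_root, intersect_box, intersect_triangle):
    closest_hit = None
    stack = [bvh_root]                        # traversal state lives in the shader
    while stack:
        node = stack.pop()
        if not intersect_box(ray, node.bounds):      # accelerated box test
            continue
        if node.is_leaf:
            for tri in node.triangles:
                hit = intersect_triangle(ray, tri)   # accelerated triangle test
                if hit and (closest_hit is None or hit.t < closest_hit.t):
                    closest_hit = hit
        else:
            stack.extend(node.children)       # keep walking the BVH
    return closest_hit
```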
Some pundits are saying that AMD's ray tracing will be inferior to the Nvidia solution only because the patent talks in general terms about traversal using the shader units, but it seems that the same patent also describes a fixed-function traversal unit. I am sure that AMD will go for the full solution instead of the partial one; it will be fun to see all the astroturfers crying when the AMD solution goes toe to toe against Turing.
Do you believe that AMD is happy with the Navi 10 / RX 5700 performance? No, they are not, and the reason is that there is a gap between Pascal performance and Turing performance that they want to cross. Navi 10 / RX 5700 was designed to surpass the performance of Pascal (GTX 1080), but it is a bit worse than the RTX 2070, and they want to at least tie. In other words, Navi 12 and Navi 14 could be a little better than Navi 10, and it seems that in a few months we could have the first AMD GPU with ray tracing support competing directly against the RTX 2080 series.
Also, Navi 12 does not appear to have the ldsmisaligned bug in workgroup mode. I believe you've mentioned it before, but isn't this part good for RT in the texture units? According to the ISA, the texture units and ALUs get more bandwidth (chapter 10.3). Of course, this could be a coincidence. In what mode are the rays submitted, and in what mode are they traversed?
https://github.com/llvm-mirror/llvm/blob/eaed96ae3e5c8a17350821ae39318c70200adaf0/lib/Target/AMDGPU/AMDGPU.td#L142
https://gpuopen.com/wp-content/uploads/2019/08/RDNA_Shader_ISA_7July2019.pdf
This is another direct confirmation that Navi 12 has got ray tracing units.
Was ML denoising the final part to make RT viable for Navi12? They’re receiving praise from Microsoft, and they also have those new units for fp16 and int8.
https://community.amd.com/community/radeon-pro-graphics/blog/2019/07/30/radeon-prorender-at-siggraph-2019-new-integrations-updated-plug-ins-full-spectrum-rendering-availability-and-more
Yes, I talked about it before in this same post: the ALUs are now going to be able to subdivide beyond FP16 and do operations with Int8 and Int4 precision.
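As a rough illustration of what subdividing the ALU buys you, here is an emulation of a packed int8 dot product: one 32-bit register carries four 8-bit lanes that get multiplied and accumulated in one go (in the spirit of the dot-product instructions that show up in the more advanced ISAs); the function names are mine, not the ISA's:

```python
# Emulation of a packed int8 dot product: one 32-bit word holds four
# signed 8-bit lanes, and a single "instruction" multiplies and
# accumulates all four pairs at once.
import struct

def pack_i8x4(vals):
    """Pack four signed 8-bit lanes into one 32-bit word."""
    return int.from_bytes(struct.pack("<4b", *vals), "little")

def unpack_i8x4(word):
    """Split a 32-bit word back into four signed 8-bit lanes."""
    return struct.unpack("<4b", word.to_bytes(4, "little"))

def dot4_i8(a_word, b_word, acc=0):
    a = unpack_i8x4(a_word)
    b = unpack_i8x4(b_word)
    return acc + sum(x * y for x, y in zip(a, b))

# Example: two registers, each carrying four int8 values.
a = pack_i8x4([1, 2, 3, 4])
b = pack_i8x4([-1, 2, 3, 4])
print(dot4_i8(a, b))   # 1*-1 + 2*2 + 3*3 + 4*4 = 28
```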
So with all this in mind, is Navi12 bigger than Navi10?
(You can respond in Spanish)
Of course it is going to be bigger, if we take the latest information you gave me as a source. The total size? About the size of Vega 20… more than 300 mm^2 but less than 350 mm^2.
Also, Gfx908 is the MI-100.
https://github.com/llvm-mirror/llvm/commit/2209d163988a45e3a563ec775d6d9f068705f426
Then it is the MI-Next.
Thanks for all the info.

Seriously, with a 384-bit GDDR6 bus there is more than enough to have a chip that competes with the 2080 Ti. What worries me is the future: they would inevitably have to move to HBM memory if there are no improvements in traditional memories. How would they do a 3080 Ti otherwise?
I don't know if this is the appropriate place to ask this question, but since I don't usually use Discord and it is also related to graphics architecture, I'll leave it here 😉
Ever since I was very young I have been a big geek about the hardware of the consoles and arcades of their respective eras (I'm talking about when I was a kid, in the heyday of the Super NES and Mega Drive, when we were all blown away by the Neo Geo). Then polygons arrived and everything started to get much more complicated. But there was one thing that always caught my attention: the first 3D hardware from the big players of the era (Sega, Namco) was quite a bit more powerful than the consoles of the time (System 22 and 23, Model 1, 2, 3 and all their revisions), until at a certain point (I imagine for unification and cost reduction) console and arcade hardware started to converge (Sega's NAOMI, which was basically a Dreamcast; Namco's System 11 and System 12, which were like a PlayStation 1, etc., etc.).
The question would be… how powerful was the arcade hardware really before that unification, and why were the consoles so far behind (apart from costs)? That is, the Saturn did not reach the Model 2 and was also quite peculiar (I think it used quads instead of triangles for its polygon geometry), and then I read somewhere that the Model 3 was quite complete but also complex and peculiar. Couldn't Sega have done something more similar to its Model boards in the Saturn? Would it have been that expensive?
And while I'm at it, another question :P — from an architecture point of view, what differences were there between that era and the current one? Nowadays there's a lot of talk about GPUs and APIs, but I think back then everything was more spartan, more «rough», no?
Sorry for the barrage of questions, but I can see you know a lot about these topics and I'm always very interested in your opinion. When you replied to a comment of mine about Jim Keller, Snowden, and Meltdown and Spectre, I found it very interesting!
Regards.
Thanks. I needed some sanity checking on my theories.
I have another theory about RT, denoising and Arcturus.
According to AMD, ML training (denoising) takes tremendous amounts of resources and they don’t plan on making the infrastructure available.
And we know that MI-Next is Arcturus, specifically aimed at ML training. (Not sure what data type they use for training, but it says it also has int16 and int4 support, unlike the Navi cards, which only need to do inference. AMD seems to say FP16 is used for training, but it might vary.) “In 2020, we’ll have a new accelerator that will come out that will be: Substantially faster, higher performance, a new architecture as well”
https://github.com/llvm-mirror/llvm/commit/0392724202d6e054771c79a72168a7b3f032960d#diff-983f40a891aaf5604e5f0b955e4051d2R387
AMD are always talking about how they feel that NVIDIA is doing things wrong, that the performance hit is too large; how AMD are pretty much the ones controlling the ecosystem, and that they will make it available when everyone is ready. Could they be waiting for the training part of the ecosystem to be ready?
So wouldn't Arcturus also be a device to train the denoiser on? Of course, since it lacks the proper hardware it can't render quickly itself, so it has to be able to receive and decode (at minimum) images that it can work on and train with. Perhaps that is why it has a VCN, and also a way to decode JPEGs.
https://lists.freedesktop.org/archives/amd-gfx/2019-July/036810.html
So who would get to do the training? Well, Azure currently has a service where they use cloud computing to create realistic audio, something both upcoming consoles have (Sony publicly; AMD mentioned audio in their blog about Scarlett). Microsoft wants to sell services, and Sony has already signed a deal with them. So I guess the most logical choice would be to do the training there? The console (or a cluster of several consoles?) can be left to do the actual rendering.
I know the idea is nuts, but could they render at a smaller resolution on a console and then upscale the image using ML on a secondary GPU to gain some speed? I know AMD is working on a DirectML upscaler. This might be a bit too much together with denoiser training, as even NVIDIA seems to have given up on training DLSS per game…
https://radeon-pro.github.io/RadeonProRenderDocs/rif/filters/ai_upscale.html
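A back-of-the-envelope count of what that would save, assuming (purely as an example, nothing announced) an internal 1440p render upscaled to a 4K output:

```python
# Why rendering low and upscaling with ML is attractive: shading cost
# roughly tracks pixel count. The resolutions below are an assumed
# illustration, not anything AMD or the console makers have stated.
native = (3840, 2160)     # 4K output
internal = (2560, 1440)   # assumed internal render resolution

native_px = native[0] * native[1]
internal_px = internal[0] * internal[1]
saving = 1 - internal_px / native_px

print(f"{internal_px:,} vs {native_px:,} pixels -> "
      f"{saving:.0%} fewer pixels to shade before the ML upscale")
# 3,686,400 vs 8,294,400 pixels -> 56% fewer pixels to shade
```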
It clipped the time codes from the YouTube links, which would make them impossible to track… I'm just writing them here.
15:51
17:28
0:41
4:19
4:01
38:12
Urian, have you seen Navi 14? It is a new chip with 12 DCUs (1 disabled), 24 CUs, 1408 ALUs. This chip suggests that a big version exists, with 2 Shader Complexes totaling 48 CUs (44 active).
https://www.techpowerup.com/gpu-specs/amd-navi-14.g919
Probably an APU with this chip would measure 360 to 370 mm^2 depending on the arrangement of the cores, in line with the size of the One X chip or the stretched Ryzen V1000 dies.
This reminds me of Hawaii…