Device Agnostic Pipeline #140
base: sycl-develop
Conversation
What's the purpose of this PR? Is it to have an example a user can try out on any HW and get decent performance? We don't expect anyone to really use this for anything, right?
The GEMM isn't really device agnostic, is it? It's more that any GPU a user is likely to run on has the features assumed by the SM_70 pipeline.
If so, would it be sufficient to only add the example, but not the new builder, and instead directly use the SM_70 builder as a baseline for most current GPUs?
```cpp
/// Prints the usage statement.
std::ostream & print_usage(std::ostream &out) const {

  out << "PVC GEMM Example\n\n"
```
still "PVC"
Fixed
```cpp
/// Prints the usage statement.
std::ostream & print_usage(std::ostream &out) const {

  out << "PVC GEMM Example\n\n"
```
here too
Fixed
```cpp
// Run examples
//

// The KernelHardwareInfo struct holds the number of EUs on the GPU with a given device ID. This
```
"EU" isn't a general term
Fixed
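For reference, a minimal sketch of how such a hardware-info struct is typically populated in CUTLASS examples. Field and function names follow upstream CUTLASS; the SYCL port may name things differently:

```cpp
#include "cutlass/kernel_hardware_info.h"

// Query the compute-unit count for device 0. Assumption: upstream CUTLASS
// naming, where "sm_count" is the generic compute-unit count despite the
// CUDA-flavored name.
cutlass::KernelHardwareInfo hw_info;
hw_info.device_id = 0;
hw_info.sm_count =
    cutlass::KernelHardwareInfo::query_device_multiprocessor_count(hw_info.device_id);
```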
```cpp
#endif

using TiledMMA = TiledMMA<MMA_Atom<UniversalFMA<ElementAccumulator, ElementA, ElementB, ElementAccumulator>>,
                          Layout<Shape<_4, _4, _1>>>;
```
why this shape?
This would result in a work-group size of 16; it's small enough that it would run on any device, hence the size. No other reason in particular.
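A self-contained sketch of that reasoning, with element types swapped for float purely for illustration: a 4x4x1 layout of single-thread FMA atoms yields a 16-thread tiled MMA.

```cpp
#include <cute/atom/mma_atom.hpp>

using namespace cute;

// One scalar FMA per thread, arranged 4 x 4 x 1 -> 16 threads total.
// The float element types are an illustrative assumption.
using TiledMmaSketch = TiledMMA<
    MMA_Atom<UniversalFMA<float, float, float, float>>,  // D, A, B, C types
    Layout<Shape<_4, _4, _1>>>;                           // thread layout (M, N, K)

// size() of a TiledMMA is its thread count; a work-group of 16 fits any device.
static_assert(size(TiledMmaSketch{}) == 16, "4 * 4 * 1 = 16 threads");
```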
The SM70 mainloop is device agnostic: it implements a tiled GEMM algorithm with data blocked in shared memory and registers. With us passing UniversalCopy and UniversalFMA, this would become a truly device-agnostic GEMM.
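As a hedged illustration of what "passing UniversalCopy" means here, a tiled copy built from plain element-wise copy atoms; the 16-thread layout and float element type are assumptions carried over from the sketch above, and the PR's actual wiring may differ:

```cpp
#include <cute/atom/copy_atom.hpp>

using namespace cute;

// UniversalCopy lowers to ordinary loads/stores, so nothing in this tiled
// copy is architecture specific.
using GmemTiledCopyA = decltype(make_tiled_copy(
    Copy_Atom<UniversalCopy<float>, float>{},  // one element per instruction
    Layout<Shape<_16, _1>>{},                  // 16 threads (assumed)
    Layout<Shape<_1, _1>>{}));                 // 1 value per thread per copy
```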
SM70 does not have a collective builder. Also, I believe the idea is that the API accepts something like a …
Adds CollectiveMma and a CollectiveBuilder API for a device-agnostic pipeline. This piggybacks off of the SM_70 two-stage GEMM pipeline, with blocking in SMem and RMem, to get a somewhat performant GEMM on any device.
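For a sense of scale, this is the general shape of a CUTLASS 3.x CollectiveBuilder invocation that such an API would plug into. Every template argument below (arch tag, op class, element types, tile sizes) is a placeholder assumption; the tags this PR actually introduces are not visible in this excerpt:

```cpp
#include "cutlass/gemm/collective/collective_builder.hpp"

// All parameters here are illustrative placeholders, not the PR's real tags.
using CollectiveMainloop = typename cutlass::gemm::collective::CollectiveBuilder<
    cutlass::arch::Sm70, cutlass::arch::OpClassSimt,  // placeholder arch / op class
    float, cutlass::layout::RowMajor, 1,              // A: element, layout, alignment
    float, cutlass::layout::ColumnMajor, 1,           // B: element, layout, alignment
    float,                                            // accumulator element
    cute::Shape<cute::_32, cute::_32, cute::_8>,      // work-group tile (M, N, K), assumed
    cute::Shape<cute::_1, cute::_1, cute::_1>,        // cluster shape (unused pre-SM90)
    cutlass::gemm::collective::StageCount<2>,         // two-stage pipeline, per the PR
    cutlass::gemm::collective::KernelScheduleAuto     // default schedule
  >::CollectiveOp;
```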