>On that note, what do you suggest for GPU programming?
MPI solutions work with multiple processes, so they are best suited to calculations distributed across a network of many machines. You can use MPI on a single PC, but there are options that are easier to code and perform better there (shared-memory threading such as OpenMP, for instance).
For GPUs, MPI makes sense for syncing between multiple GPUs (or across a distributed network), but it really has nothing to do with writing a program that runs on a single GPU.
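To make the division of labor concrete, here's a rough sketch of the usual multi-GPU pattern: one MPI rank per GPU, with MPI handling only the exchange between ranks. The kernel `step` is a placeholder for your actual per-GPU work, and passing device pointers straight into `MPI_Allreduce` assumes a CUDA-aware MPI build (otherwise you'd stage through host buffers):

```
#include <mpi.h>
#include <cuda_runtime.h>

// Placeholder kernel -- stands in for whatever your per-GPU work is.
__global__ void step(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // One rank per GPU: bind this process to "its" device.
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    cudaSetDevice(rank % ndev);

    const int n = 1 << 20;
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    step<<<(n + 255) / 256, 256>>>(d, n);
    cudaDeviceSynchronize();

    // MPI only handles the sync between GPUs/nodes; the kernel above is
    // where the single-GPU programming model (CUDA here) actually lives.
    // Assumes a CUDA-aware MPI that accepts device pointers directly.
    MPI_Allreduce(MPI_IN_PLACE, d, n, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    cudaFree(d);
    MPI_Finalize();
    return 0;
}
```

Compile with `nvcc` plus your MPI wrapper's flags and launch with `mpirun -np <ngpus>`; the point is just that MPI and the on-GPU code are two separate layers.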
For general-purpose GPU programming, you realistically have only two options: CUDA and OpenCL. Please keep in mind that NVIDIA pays me, so I'm biased.
The showstopper:
If you need the code you write to work on AMD cards, CUDA is out of the question. (Not entirely: they have a new thing coming, but it isn't released yet, will only work on their $5k+ cards at first, etc.)
If you'll be working on NVIDIA cards anyway, CUDA is objectively better supported (we barely update OpenCL), better documented, and you'll find far more code and examples for it online. The utility libraries are also better.
Subjectively, it may be easier to work with too, since it integrates with C++ better than (current) OpenCL does.
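The C++ integration point is easiest to see in code. CUDA is single-source: the kernel below is ordinary (even templated) C++ compiled by nvcc alongside the host code, whereas in current OpenCL the kernel typically lives in a separate string or file that gets compiled at runtime via `clBuildProgram`. A minimal sketch (uses only standard CUDA runtime calls; needs an NVIDIA GPU to actually run):

```
#include <cstdio>
#include <cuda_runtime.h>

// An ordinary templated C++ function, marked __global__ to run on the GPU.
// No string kernels, no runtime compiler invocation on the host side.
template <typename T>
__global__ void saxpy(T a, const T *x, T *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1024;
    float *x, *y;
    // Managed (unified) memory keeps the host/device plumbing minimal.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Launch syntax is a language extension, not an API call chain.
    saxpy<<<(n + 255) / 256, 256>>>(2.0f, x, y, n);  // y = 2*x + y
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

The equivalent OpenCL host program needs platform/device/context/queue setup and a runtime-compiled kernel before it can do anything, which is the "integrates into C++ better" difference in practice.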
If you are not sure whether to go NVIDIA or AMD, a few pointers:
NV if you're doing anything related to Neural Networks / Deep learning
NV if you're doing video encoding/decoding
AMD if you are performing a large volume of simple calculations, such as bitcoin mining and other crypto work
NV if you are writing relatively complex algorithms on the GPU.
AMD cards can perform more raw operations per second than same-tier NVIDIA cards, but the overhead grows as your task becomes more complex.