PyTorch Neural Quirks
Explore implied dimensionality and the channel concept in PyTorch.
PyTorch uses some different software models than you might be used to, especially if you're migrating to it from something like Keras or TensorFlow. The first is, of course, the Tensor concept (something shared with TensorFlow, but not so obvious in Keras). The second is the nn.Module hierarchy you need to use when building a particular network. The final one is implied dimensionality and the channel concept. Of these, I'd really like to give the last one its own article, so let's get the first two out of the way first.
Tensors in PyTorch are really just values, and they mirror many of the functions available for NumPy arrays, like ones(), zeros(), etc. Their methods follow specific naming conventions too. For example, Tensor::add_() adds to the calling tensor in place, while Tensor::add() returns a new Tensor holding the sum and leaves the original untouched. They support list-like indexing and slicing semantics, and you can iterate over them in comprehensions as well. They also convert easily to and from NumPy arrays via the torch.from_numpy() and Tensor::numpy() methods. Finally, they have a sense of location: each tensor is affiliated with a specific device, and this is where things can get tricky.
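To make those conventions concrete, here is a minimal sketch of the in-place and out-of-place methods, indexing, and NumPy conversion described above. The variable names are purely illustrative, and it assumes NumPy is installed alongside PyTorch.

```python
import numpy as np
import torch

a = torch.ones(2, 3)       # mirrors numpy.ones
b = torch.zeros(2, 3)      # mirrors numpy.zeros

c = a.add(1.0)             # out-of-place: returns a new tensor, a is unchanged
a_before = a.clone()       # keep a copy so the in-place change is visible
a.add_(1.0)                # in-place: the trailing underscore mutates a

first_row = a[0]           # list-like indexing
first_col = a[:, 0]        # slicing
doubled = [float(v) * 2 for v in first_row]  # iterating in a comprehension

arr = np.arange(6, dtype=np.float32).reshape(2, 3)
t = torch.from_numpy(arr)  # NumPy array -> tensor (shares memory with arr)
back = t.numpy()           # CPU tensor -> NumPy array (also shares memory)
```

Note that torch.from_numpy() and Tensor::numpy() share the underlying buffer on CPU rather than copying it, so a change made through one view shows up in the other.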
Like most of you, I don't develop code on a box with a GPU. I develop on a CPU and then port to a high-powered GPU machine for training and evaluation once the development work is out of the way. When using a CPU, you don't need to worry about device affinity. Things aren't so simple when using a GPU, but they are at least easy to fix once you see where the problems are.
So, in a nutshell, CUDA Tensors can't be manipulated by the CPU in primary memory. Their data lives in GPU memory, so only code running on the GPU can operate on it directly. To get a CPU copy of a CUDA Tensor, you need to call the Tensor::cpu() method. The problem is that it's tough to identify where and when this needs to happen, especially if you have a development cycle like mine where you spend most of your time on CPU.
I keep running into this when I'm taking predicted and actual values from a prediction run and creating a confusion matrix. I'll usually use scikit-learn for this kind of thing. If you do the same and don't call the Tensor::cpu() method before converting the results to a NumPy array or otherwise processing them on the host, you'll end up with an error about not being able to convert a CUDA tensor to a host-memory type. Call the Tensor::cpu() method on the tensor you're working with, and that'll clear everything up. As a general rule, after running a model, move its outputs to the CPU prior to evaluating, and you should be fine.
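As a rough sketch of that rule, the snippet below moves predictions and labels to the CPU before building a confusion matrix. The logits and labels are made-up stand-ins for a real prediction run, and the small `confusion` helper is a hypothetical NumPy substitute used only to keep the example dependency-free; the same `y_true` and `y_pred` arrays could be passed to scikit-learn instead.

```python
import numpy as np
import torch

# Use the GPU when one is present; the same code runs unchanged on CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Made-up stand-ins for model output (logits) and ground-truth labels.
logits = torch.tensor(
    [[2.0, 0.1], [0.3, 1.5], [1.2, 0.4], [0.2, 0.9]], device=device
)
actual = torch.tensor([0, 1, 1, 1], device=device)

predicted = logits.argmax(dim=1)  # still lives on `device`

# Detach from the autograd graph and copy to host memory before NumPy.
# On a CPU-only machine, .cpu() simply returns the tensor as it is.
y_pred = predicted.detach().cpu().numpy()
y_true = actual.detach().cpu().numpy()

def confusion(y_true, y_pred, n_classes):
    """Hypothetical helper: rows are actual classes, columns are predicted."""
    matrix = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

matrix = confusion(y_true, y_pred, n_classes=2)
```

Because .cpu() is harmless on a tensor that is already on the CPU, calling it unconditionally during development means the code should not need to change when it is ported to a GPU box.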
Second, the nn.Module. In Keras, you assemble a network via a model of some kind (say, a Sequential model) to which you then add layers. PyTorch is different. nn.Module supports PyTorch's more explicit philosophy for model development. In PyTorch, gradient calculations are handled for you, but you need to explicitly define how values are processed by the network as you propagate them forward. PyTorch then uses that forward definition to backpropagate the loss, so you don't have to write the backward pass yourself. But it's certainly a bit of a departure that takes some getting used to.
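Here is a minimal sketch of what that looks like in practice. The class name `TinyNet`, the layer sizes, and the random input data are all made up for illustration; the point is only that you write forward() yourself and the gradients come from calling backward() on the loss.

```python
import torch
from torch import nn

class TinyNet(nn.Module):
    """Illustrative two-layer network; names and sizes are arbitrary."""

    def __init__(self, n_in=4, n_hidden=8, n_out=2):
        super().__init__()
        # Layers assigned as attributes are registered as submodules.
        self.hidden = nn.Linear(n_in, n_hidden)
        self.out = nn.Linear(n_hidden, n_out)

    def forward(self, x):
        # You spell out the forward pass explicitly.
        return self.out(torch.relu(self.hidden(x)))

torch.manual_seed(0)
model = TinyNet()
x = torch.randn(5, 4)                    # batch of 5 made-up samples
target = torch.tensor([0, 1, 0, 1, 1])   # made-up class labels

logits = model(x)                        # calling the module runs forward()
loss = nn.functional.cross_entropy(logits, target)
loss.backward()                          # gradients are computed for you
```

After backward() returns, each parameter's .grad attribute holds its gradient, ready for an optimizer step.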
With these out of the way, we'll look more closely at network definitions and dimensionality next.