I like and use both PyTorch and TensorFlow. You’re entirely correct that TensorFlow is far more verbose than PyTorch, but Keras remedies this (for many problems). I’d also add that PyTorch has a narrower scope than TensorFlow, and because of this it introduces problems when scaling or distributing (PyTorch’s flexibility isn’t without trade-offs). Maybe a convergence of Caffe2 and PyTorch will remedy that problem.
For me, Keras is the least verbose of the three, but I find it too sparse: there’s too much “magic” happening under the nicely done API. I have found PyTorch to be the happy halfway point between TensorFlow and Keras in terms of verbosity.
I’m curious: what parts do you find too magical? I’ve been trying to help with this problem (clarifying or surfacing features that are too magical, without relinquishing Keras’ commitment to “convention over configuration”), so your (or others’) feedback is welcome!