I'm trying to run your benchmark with my binary-parser. I noticed that many of the benchmarks run a single combinator only once. This puts a library like binary at a huge disadvantage, because binary does its offset counting in the outer loop; see haskell/binary#124 for details.
You can argue that this is part of what the benchmark measures, but I think we should at least wrap these cases in replicateM 10 or something similar to make the comparison fair.
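
To make the suggestion concrete, here is a rough sketch of what I mean, using binary's Get. The names `single` and `repeated` and the choice of getWord32be are just for illustration; they are not taken from the benchmark suite.

```haskell
import Control.Monad (replicateM)
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (Get, getWord32be, runGet)
import Data.Word (Word32)

-- Hypothetical "single combinator, run once" case: the fixed cost of
-- starting and finishing one parser run is a large share of the work.
single :: Get Word32
single = getWord32be

-- The same combinator repeated 10 times inside one run, so the fixed
-- per-run cost is spread over ten reads.
repeated :: Get [Word32]
repeated = replicateM 10 getWord32be

main :: IO ()
main = do
  -- Ten big-endian encodings of the value 1 (40 bytes in total).
  let input = BL.pack (concat (replicate 10 [0, 0, 0, 1]))
  print (runGet single input)
  print (runGet repeated input)
```

The benchmark would then time `runGet repeated` instead of `runGet single`, with the equivalent change applied to every library being compared.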
Another problem is that cereal's getByteString makes a copy, which skews the result because allocating pinned memory is far too slow. We should just use getBytes here.
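
For reference, a small sketch of the two cereal calls side by side. The wrapper names `copying` and `sharing` are made up for illustration; only getByteString, getBytes and runGet come from Data.Serialize.Get.

```haskell
import qualified Data.ByteString as B
import Data.Serialize.Get (Get, getBytes, getByteString, runGet)

-- getByteString returns a fresh copy of the requested bytes,
-- so every call allocates a new strict ByteString.
copying :: Int -> Get B.ByteString
copying = getByteString

-- getBytes returns a slice of the input instead of copying it.
sharing :: Int -> Get B.ByteString
sharing = getBytes

main :: IO ()
main = do
  let input = B.replicate 64 0x2a
  -- Both parsers yield the same 16 bytes; only the allocation differs.
  print (fmap B.length (runGet (copying 16) input))
  print (fmap B.length (runGet (sharing 16) input))
```

Both return the same bytes, so swapping one for the other should not change what the benchmark parses, only how much allocation it measures.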