How do Hedgehog and Hypothesis differ in their shrinking strategies?
The article uses the words "integrated" vs. "internal" shrinking.
> the raison d’être of internal shrinking: it doesn’t matter that we cannot shrink the two generators independently, because we are not shrinking generators! Instead, we just shrink the samples that feed into those generators.
Besides that, it seems like falsify has many of the same features, like choice of ranges and distributions.
> The key insight of the Hypothesis library is that instead of shrinking generated values, we instead shrink the samples produced by the PRNG.
Hedgehog loses shrink information when you do a monadic bind (`Gen a -> (a -> Gen b) -> Gen b`). Hypothesis parses values out of the stream of data generated by the PRNG, so when it "binds", you are still just consuming off that stream of random numbers, and you can shrink the stream to shrink the generated values.
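As a rough sketch of the kind of dependent generator this is about, here is what it might look like with Hedgehog's `Gen`/`Range` API (the `lengthPrefixed` generator itself is made up for illustration): the length is drawn first, and the characters are generated from it via monadic bind, so the two choices can no longer be shrunk as one unit.

    import           Hedgehog (Gen)
    import qualified Hedgehog.Gen   as Gen
    import qualified Hedgehog.Range as Range

    -- Hypothetical dependent generator: pick a length, then generate exactly
    -- that many characters.  The second generator only exists once the first
    -- value has been bound, which is where the shrink information suffers.
    lengthPrefixed :: Gen (Int, String)
    lengthPrefixed = do
      n  <- Gen.int (Range.linear 0 10)             -- choose a length first...
      cs <- Gen.list (Range.singleton n) Gen.alpha  -- ...then exactly n characters
      pure (n, cs)

In the Hypothesis model the same generator is just a parser over one stream of random samples, so shrinking that stream shrinks the length and the characters together.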
If I understand correctly, they approximate the language of inputs to a function in order to discover minimal (in some sense, like "shortest description length") inputs that violate relations between the inputs and outputs of the function under scrutiny.
Really? Your examples seem to show the opposite. I am left immediately thinking, "hm, is it failing on a '!', some sort of shell issue? Or is it truncating the string on '#', maybe? Or wait, there's a space in the third one, which looks pretty dangerous, and it's noticeably longer, so there could be a length issue..." As opposed to the shrunk version where I immediately think, "uh oh: one of them is not handling an empty input correctly." Also, way easier to read, copy-paste, and type.
> As opposed to the shrunk version where I immediately think, "uh oh: one of them is not handling an empty input correctly."
I agree that non-empty strings are worse, but unfortunately `("", "", "", "")` wouldn't only make me think of empty strings; e.g. I'd wonder whether duplicate/equal values are the problem.
    fails_on_empty_third_arg(
        a = "",  # or any other generated value
        b = "",  # or any other generated value
        c = "",
        d = "",  # or any other generated value
    )
The special value doesn't stand out, though. All three examples I gave were what I was thinking while skimming his comment, before my brain caught up to his caveat about an empty third argument. The empty string looked like it was by far the most harmless part... Whereas if they are all empty strings, then by definition the empty string stands out as the most suspicious possible part.
I care about the edge between "this value fails" and "one value over succeeds".
I wish shrinking were fast enough to tell me if there are multiple edges between those values.
- Generate a random number N for the size (maybe restricted to some Range)
- Generate N `Char` values, by using a random number for each code point.
- Combine those Chars into a string
falsify runs a generator by applying it to an infinite binary tree, with random numbers in the nodes. A generator can either consume a single number (taken from the root node of a tree), or it can run two other generators (one gets run on the left child, the other gets run on the right). Hence the above generator would use the value in the left child as N, then run the "generate N Chars" generator on the right child. The latter generator would run a Char generator on its left child, and an 'N-1 Chars' generator on its right child; and so on.
To shrink, we just run the generator on a tree with smaller numbers. In this case, a smaller number in the left child will cause fewer `Char`s to be generated, and smaller numbers in the right subtree will cause lower code points to be generated. falsify's tree representation also has a special case for the smallest tree (which returns 0 for its root, and itself for each child).
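A minimal sketch of that idea (this is not falsify's actual API; `SampleTree`, `prim`, etc. are invented for illustration): a generator is a function from an infinite tree of samples to a value, and it either consumes the sample at the root or runs sub-generators on the subtrees.

    -- An (in practice infinite) binary tree with a random sample at each node.
    data SampleTree = Node Int SampleTree SampleTree

    newtype Gen a = Gen { runGen :: SampleTree -> a }

    -- Consume the sample stored at the root.
    prim :: Gen Int
    prim = Gen (\(Node s _ _) -> s)

    -- Run one generator on the left subtree and another on the right.
    both :: Gen a -> Gen b -> Gen (a, b)
    both ga gb = Gen (\(Node _ l r) -> (runGen ga l, runGen gb r))

    -- String generator: N comes from the left subtree, the characters
    -- come from the right subtree.
    genString :: Gen String
    genString = Gen $ \(Node _ l r) ->
      let n = runGen prim l `mod` 10   -- size, kept small for the sketch
      in  runGen (genChars n) r

    -- Generate exactly n characters: one code point from the left subtree,
    -- the remaining n-1 characters from the right subtree.
    genChars :: Int -> Gen String
    genChars 0 = Gen (const [])
    genChars n = Gen $ \(Node _ l r) ->
      toEnum (runGen prim l `mod` 128) : runGen (genChars (n - 1)) r

Shrinking then just means running `genString` again on a tree whose samples have been replaced by smaller numbers: a smaller sample in the left subtree gives a shorter string, and smaller samples on the right give lower code points.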
Here is a talk that applies the Hypothesis idea to test C++: https://www.youtube.com/watch?v=C6joICx1XMY . Discussion of PBT implementation approaches begins at 6:30.
Suppose I have a function which takes four string parameters, and I have a bug which means it crashes if the third is empty.
I'd rather see this in the failure report:
("ldiuhuh!skdfh", "nd#lkgjdflkgdfg", "", "dc9ofugdl ifugidlugfoidufog")
than this:
("", "", "", "")
> ("ldiuhuh!skdfh", "nd#lkgjdflkgdfg", "", "dc9ofugdl ifugidlugfoidufog")
I would prefer LazySmallcheck's result, which would be something like the following:
(_, _, "", _)
Where `_` indicates that part of the input wasn't evaluated.
I guess if we were even more clever we could get to something more like (…, …, "", …).