Two early thoughts as I am reading:
1. I agree with the description of empirical work as a "craft". When I teach, I always tell students that econometrics (in practice) is as much "art" as it is "science". And I try to point out at what points in an analysis we have left the science realm and entered the art realm.
2. The "system" pillar and "vernacular knowledge" seem incontrovertible to me, but the "scale" part is less clear. Yes, we don't want to "sweat the small stuff". But, this requires knowledge that the "stuff" is "small". Too often researchers appeal to the "smallness" of a problem as a justification for ignoring it, but in fact have no idea if the problem is small or not. Thus, ignoring "small" stuff runs the very risk of inducing a lack of credibility. Two common examples of this are issues of measurement error and the choice between LPM/probit/other binary choice models.
A final comment, relating to the chapter on hypothesis testing. I agree with all that is said. However, I think the point could be made more clearly by emphasizing that NHST entails testing hypotheses concerning parameters, not models. Thus, rejecting or failing to reject any null - active or passive - cannot prove a model is correct. At best, it can show that the data are consistent with a particular model. This is the phrasing that was ingrained in me during grad school: "We never prove a model. We either disprove it or show that the data are consistent with it." That said, I agree with your point that how we structure the null and alternative may be more informative about whether the data are consistent with our particular model of interest. So, my comment is really just about another way of phrasing your point.
A smaller, related comment. I think there are some examples (not a ton) of papers that use active nulls. A classic example, if I recall correctly, is Townsend (1994) on village risk sharing. The model is that, with perfect village risk sharing, household consumption is equal to per capita income in the village and own income is irrelevant. So, regressing own consumption on individual income and village mean income should give coefficients of 0 and 1, respectively. This is tested, and it would represent an active null in your terms.
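For concreteness, here is a minimal sketch of what testing that active null might look like. The data are simulated and the variable names (hh_consumption, hh_income, village_mean_income) are hypothetical; this is not Townsend's actual specification, just the mechanics of jointly testing coefficients of 0 and 1.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2_000
df = pd.DataFrame({
    "hh_income": rng.normal(10, 2, n),
    "village_mean_income": rng.normal(10, 1, n),
})
# Simulate an outcome consistent with full risk sharing: consumption tracks the
# village mean one-for-one and own income is irrelevant.
df["hh_consumption"] = df["village_mean_income"] + rng.normal(0, 0.5, n)

fit = smf.ols("hh_consumption ~ hh_income + village_mean_income", data=df).fit()
print(fit.params)

# The "active" null: coefficient on own income = 0 AND on village mean income = 1.
print(fit.f_test("hh_income = 0, village_mean_income = 1"))
```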
I agree with Dan Millimet that ignoring small stuff without knowing it is small is a problem. And I agree with Sheridan Grant that my techniques are intended to help you figure that out. I'm just agreeable all around!
1. I think where the book diverges from "art vs. science" is its emphasis on credibility. I think most (social) scientists wouldn't say that the artistic aspects of an analysis lend it credibility, and they might also think of the artistic choices as not being of much consequence. Taking up your binary outcome model example, it seems like folks tend to choose probit vs. logistic regression based on personal preference, but one is probably truly more appropriate for any given analysis. "Craftsmanship" points out that there are better and worse ways, more and less correct ways, of doing the "art" part of the analysis.
2. I view the "scale" chapter as demonstrating how to decide where the "system" ends and begins. There's always a tradeoff to modeling more stuff, but how can you know whether modeling something is worthwhile? The chapter is also about how you can know that something is small, rather than just assuming it away. I agree that folks often ignore "small" stuff that might actually be pretty important on the basis of convenience, but the book's approach is the opposite: rather than centering your analysis around a model or theory and then assuming away anything inconvenient, you use scale rigorously up front to determine the boundaries of your system.
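To make the binary outcome point concrete, here is a minimal sketch, on simulated data, of checking whether the LPM / probit / logit choice is actually "small" for the quantity of interest (here, the average marginal effect of x) rather than deciding by taste. The data generating process is an assumption chosen for illustration.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
n = 5_000
x = rng.normal(size=n)
X = sm.add_constant(x)
# Simulate a binary outcome from a probit-style data generating process
y = (rng.uniform(size=n) < stats.norm.cdf(-0.5 + 1.0 * x)).astype(int)

lpm = sm.OLS(y, X).fit()
probit = sm.Probit(y, X).fit(disp=0)
logit = sm.Logit(y, X).fit(disp=0)

print("LPM slope:               ", lpm.params[1])
print("Probit avg marginal eff.:", probit.get_margeff().margeff[0])
print("Logit avg marginal eff.: ", logit.get_margeff().margeff[0])
# If these agree to the precision you need, the modeling choice really is
# "small stuff" here; if they do not, the choice is doing real work.
```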