I wrote a DFS solution for the N-Queens II problem, and it was reported as running in 2ms. The same code executes in ~160us on my Mac, so I wondered if you were doing perhaps 10 iterations and taking the average. Out of curiosity, I wrote an implementation that just stores the expected answers in an array and returns the result. This was also reported as 2ms, while on my Mac it runs in 42ns. Since both solutions report 2ms despite taking 160us and 42ns locally, this suggests that the 2ms is almost entirely overhead.
Please take a look at the overhead. Also, if a solution is particularly fast, perhaps you could re-run it and time 1,000 executions instead of one. If the overhead is setup time (instrumentation?), perhaps run the test twice and time only the second run.
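To make the suggestion concrete, here is a rough sketch of what I mean, written in Python only for illustration. The names `mean_time_ns` and `total_n_queens_lookup` are my own, and I'm assuming nothing about how your judge actually works. It does one untimed warm-up call to absorb any setup cost, then times 1,000 calls and reports the mean per call.

```python
import time

# Known N-Queens solution counts for n = 1..9 (the lookup-table approach).
QUEENS_COUNTS = [1, 0, 0, 2, 10, 4, 40, 92, 352]


def total_n_queens_lookup(n):
    """Return the number of N-Queens solutions for 1 <= n <= 9 from the table."""
    return QUEENS_COUNTS[n - 1]


def mean_time_ns(fn, *args, warmup=1, repeats=1000):
    """Return the mean wall-clock time per call of fn(*args), in nanoseconds.

    The warm-up calls are not timed, so one-off setup cost is excluded.
    """
    for _ in range(warmup):
        fn(*args)
    start = time.perf_counter_ns()
    for _ in range(repeats):
        fn(*args)
    elapsed = time.perf_counter_ns() - start
    return elapsed / repeats


if __name__ == "__main__":
    per_call = mean_time_ns(total_n_queens_lookup, 8)
    print(f"{per_call:.0f} ns per call")
```

The loop itself adds a little overhead per iteration, so for very fast solutions the result is an upper bound, but it should still be far closer to the true cost than a single timed run.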