I was wondering why dfs is about 2x faster than bfs? Aren't they all O(n)?

Because typical DFS solutions are recursive and therefore they use the hardware stack (or the VM stack which probably uses the hardware one anyway). It really matters. As an experiment, here is Java code for an iterative DFS solution that runs in 18 ms, which is about the same as BFS. Note that using a typical
int[][] MOVES
array instead of that ugly switch allows to get rid of the wholeMove
class, but then it runs in 21 ms. Additionally, the stack could be allocated once and reused, but that only speeds up about 1 ms or so and it's silly.public void wallsAndGates(int[][] rooms) { for (int i = 0; i < rooms.length; ++i) { for (int j = 0; j < rooms[i].length; ++j) { if (rooms[i][j] == 0) { searchForRooms(rooms, i, j); } } } } private static void searchForRooms(int[][] rooms, int i, int j) { Deque<Move> stack = new ArrayDeque<>(); stack.push(new Move(i, j)); while (!stack.isEmpty()) { Move m = stack.peek(); if (m.hasNext()) { Move next = m.next(rooms, stack.size()); if (next != null) { rooms[next.i][next.j] = stack.size(); stack.push(next); } } else { stack.pop(); } } } private static class Move { private static final int DOWN = 0, UP = 1, RIGHT = 2, LEFT = 3, DONE = 4; final int i, j; int move = DOWN; Move(int i, int j) { this.i = i; this.j = j; } boolean hasNext() { return move != DONE; } Move next(int[][] rooms, int distance) { Move next = null; switch (move) { case DOWN: next = i < rooms.length  1 && rooms[i + 1][j] > distance ? new Move(i + 1, j) : null; break; case UP: next = i > 0 && rooms[i  1][j] > distance ? new Move(i  1, j) : null; break; case RIGHT: next = j < rooms[i].length  1 && rooms[i][j + 1] > distance ? new Move(i, j + 1) : null; break; case LEFT: next = j > 0 && rooms[i][j  1] > distance ? new Move(i, j  1) : null; break; } ++move; return next; } }

I think BFS is better in worst case. However, due to the particular test cases, DFS seems faster.
Let's think about a square matrix
n x n
where all the cells are rooms except one gate at the upper left corner. Both DFS and BFS will visit the cells and assign the distance once. For this case, DFS using system stack will be faster than BFS using Deque class.Now, let's think about a square matrix where we have a gate in each
3 x 3
square submatrix. Something like this.INF INF INF INF INF INF INF INF INF INF INF INF 0 INF 0 INF 0 INF 0 INF 0 INF INF INF INF INF INF INF INF INF INF INF INF INF 0 INF 0 INF 0 INF 0 INF 0 INF INF INF INF INF INF INF INF INF INF INF INF INF 0 INF 0 INF 0 INF 0 INF 0 INF INF INF INF INF INF INF INF INF INF INF INF INF 0 INF 0 INF 0 INF 0 INF 0 INF INF INF INF INF INF INF INF INF INF INF INF INF 0 INF 0 INF 0 INF 0 INF 0 INF INF INF INF INF INF INF INF INF INF INF INF
In this case, BFS will still visit the rooms and calculate the distance once, time is O(n^2). However, DFS will calculate the distance again and again. It's time complexity is about O(n^4).
But apparently we do not have such test case.
