Arguably the key reason for the success of deep neural networks is their ability to
autonomously form non-linear combinations of the input features, which can be used in subsequent
layers of the network. The analogue of this capability in inductive rule learning is to learn a
structured rule base, in which the inputs are combined into new auxiliary concepts that can in turn
serve as inputs to subsequent rules. From a logical point of view, flat rule sets are sufficient,
because every logical expression can be reduced to an equivalent DNF expression. Nevertheless, more
structured representations, which form deep theories by introducing intermediate concepts, could be
easier to learn, in much the same way as deep neural networks are able to outperform shallow
networks even though the latter are also universal function approximators.
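As a small illustration (with arbitrary propositional variables), a deep theory may first define an
auxiliary concept and then reuse it in several rules,
\[
  h \leftarrow x_1 \lor x_2, \qquad
  y \leftarrow (h \land x_3) \lor (h \land x_4),
\]
which, by substituting the definition of $h$, flattens into the equivalent DNF
$y \leftarrow (x_1 \land x_3) \lor (x_2 \land x_3) \lor (x_1 \land x_4) \lor (x_2 \land x_4)$:
the intermediate concept is represented once in the deep theory, but has to be duplicated in every
term of the flat representation.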
However, several non-trivial obstacles need to be overcome before a sufficiently powerful deep rule
learning algorithm can be developed and compared to the state of the art in inductive rule
learning. In this talk, we want to draw attention to this
unsolved problem, and present some preliminary results in which, in the absence of a powerful deep
rule learning algorithm, we empirically compare deep and shallow rule sets that have been optimized
with a single, general mini-batch-based optimization algorithm. In our experiments on both artificial
and real-world benchmark data, deep rule networks outperformed their shallow counterparts, which we
take as an indication that it is worthwhile to devote more effort to learning deep rule structures
from data.
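
To make the setting concrete, the following minimal sketch shows one possible instantiation of a
deep rule network, assuming a differentiable relaxation of conjunctions (product t-norm) and
disjunctions (probabilistic sum) implemented in PyTorch; the layer widths, the toy target, and all
hyperparameters are illustrative assumptions and not the setup used in our experiments.

# Minimal sketch (illustrative assumptions only): a "deep rule network" built from
# alternating soft-AND / soft-OR layers, trained with mini-batch gradient descent
# on a toy Boolean target.
import torch
import torch.nn as nn

class SoftAnd(nn.Module):
    """Each output is a soft conjunction of its inputs; a sigmoid-gated weight
    decides how strongly each input participates (product t-norm relaxation)."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.w = nn.Parameter(torch.randn(n_out, n_in))

    def forward(self, x):                          # x in [0, 1], shape (batch, n_in)
        g = torch.sigmoid(self.w)                  # membership of each input in each conjunction
        # excluded input (g ~ 0) contributes a factor ~ 1, included input (g ~ 1) a factor ~ x
        return (1 - g.unsqueeze(0) * (1 - x.unsqueeze(1))).prod(dim=-1)

class SoftOr(nn.Module):
    """Each output is a soft disjunction (probabilistic sum) of its inputs."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.w = nn.Parameter(torch.randn(n_out, n_in))

    def forward(self, x):
        g = torch.sigmoid(self.w)
        return 1 - (1 - g.unsqueeze(0) * x.unsqueeze(1)).prod(dim=-1)   # OR via De Morgan

# deep network with intermediate rule layers vs. a shallow, DNF-like network
deep    = nn.Sequential(SoftAnd(10, 8), SoftOr(8, 8), SoftAnd(8, 4), SoftOr(4, 1))
shallow = nn.Sequential(SoftAnd(10, 8), SoftOr(8, 1))

# toy Boolean target: y = (x0 AND x1) OR (x2 AND x3)
X = torch.randint(0, 2, (512, 10)).float()
y = ((X[:, 0] * X[:, 1] + X[:, 2] * X[:, 3]) > 0).float().unsqueeze(1)

opt = torch.optim.Adam(deep.parameters(), lr=0.05)
for _ in range(200):
    for i in range(0, len(X), 64):                 # mini-batch updates
        xb, yb = X[i:i + 64], y[i:i + 64]
        loss = nn.functional.binary_cross_entropy(deep(xb), yb)
        opt.zero_grad()
        loss.backward()
        opt.step()

In this relaxation, a gate weight near one includes an input in a conjunction or disjunction and a
weight near zero excludes it, so thresholding the gates after training turns the network back into a
discrete, structured rule base.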