Modern ensemble learning algorithms based on gradient-boosted decision trees, such as XGBoost, LightGBM, and CatBoost, are known to outperform other machine learning methods, including deep neural networks, on certain use cases. For edge computing, tree-based methods have the attractive property of shifting much of the computational complexity from arithmetic operations to memory accesses. However, existing hardware architectures typically rely exclusively on on-chip SRAM storage; they achieve very high throughput and low latency for prediction but are inflexible and cannot be adapted to other tasks at runtime. In this work, we propose a flexible hardware architecture for inference of boosted decision trees that employs external DRAM for storage, greatly reducing the required on-chip resources and making it suitable for integration into system-on-chip (SoC) devices. We present synthesis results targeting an Intel Agilex 7 AGF014 FPGA, highlighting the low resource usage, and report throughput measurements demonstrating competitive performance even when processing speed is limited by the external memory transfer rate. This enables edge AI capabilities even on devices with very limited logic resources.
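To illustrate why boosted-tree inference is dominated by memory accesses rather than arithmetic, the following sketch (not the paper's implementation; node layout and toy trees are invented for illustration) stores each tree as a flat node array. Each prediction step is one node fetch, one feature read, and a single comparison — exactly the access pattern that an architecture streaming nodes from external DRAM would exploit.

```python
# Hypothetical flat node layout: (feature_index, threshold, left, right, leaf_value)
# Internal node: feature_index >= 0; leaf node: feature_index == -1.

def predict_tree(nodes, x):
    """Walk one tree: per level, a node fetch, a feature read, one compare."""
    i = 0
    while True:
        feat, thr, left, right, value = nodes[i]   # memory lookup
        if feat < 0:                               # leaf reached
            return value
        i = left if x[feat] < thr else right       # single comparison

def predict_ensemble(trees, x, base_score=0.0):
    """Boosted ensemble output: base score plus the sum of per-tree outputs."""
    return base_score + sum(predict_tree(t, x) for t in trees)

# Toy ensemble of two depth-1 trees (stumps) for demonstration only
tree_a = [(0, 0.5, 1, 2, 0.0),
          (-1, 0.0, 0, 0, 1.0),    # leaf: x[0] <  0.5
          (-1, 0.0, 0, 0, -1.0)]   # leaf: x[0] >= 0.5
tree_b = [(1, 0.3, 1, 2, 0.0),
          (-1, 0.0, 0, 0, -0.5),   # leaf: x[1] <  0.3
          (-1, 0.0, 0, 0, 0.5)]    # leaf: x[1] >= 0.3

print(predict_ensemble([tree_a, tree_b], [0.2, 0.9]))  # → 1.5
```

Note that the only arithmetic per tree is the final sum; everything else is pointer chasing through the node arrays, which is why storage technology (on-chip SRAM vs. external DRAM) dominates the throughput trade-off.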