Contemporary text-to-speech (TTS) solutions for accessibility applications can generally be classified into two categories: (i) on-device statistical parametric speech synthesis (SPSS) or unit selection (USEL), and (ii) cloud-based neural TTS. SPSS and USEL offer low latency and a low disk footprint at the expense of naturalness and audio quality. Cloud-based neural TTS systems provide significantly better audio quality and naturalness, but they regress in latency and responsiveness, which makes them impractical for real-world applications. More recently, neural TTS models have been shown to run on handheld devices. However, their latency remains higher than that of SPSS and USEL, and their disk footprint prohibits pre-installing multiple voices at once. In this work, we describe a high-quality, compact TTS system that achieves latency on the order of 15 ms with a low disk footprint. The proposed solution is capable of running on low-power devices.