Long prompts present a significant challenge for practical LLM-based systems that need to operate with low latency and limited resources. We investigate prompt compression for zero-shot dialogue systems that learn to use unseen APIs directly in context from their documentation, which may consume hundreds of prompt tokens per API. We build on a recently introduced approach (Mu et al., 2023) that learns to compress the prompt into a few “gist token” activations during finetuning. However, this simple idea is ineffective for compressing API documentation, resulting in low accuracy compared to the baseline with an uncompressed prompt. In this work, we introduce two major improvements. First, we specialize gist tokens for different hierarchies within an API: we use one Gist_arg token to compress an argument and one Gist_value token to compress an acceptable value of a categorical argument. We then dynamically reveal Gist_value tokens only when they are needed. Second, we add a reconstruction loss to predict the API documentation from the gist tokens. Across multiple API-calling tasks, our proposed system maintains the simplicity, efficiency, and large compression factor (20x on SGD) of the gist token approach while achieving significantly higher accuracy.
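As a rough illustration of the second improvement, the training objective can be viewed as the original gist-compression (task) loss plus an auxiliary reconstruction term; the weighting hyperparameter $\lambda$ below is an assumption for exposition, not something stated in the abstract:
\[
\mathcal{L} \;=\; \mathcal{L}_{\text{task}} \;+\; \lambda \, \mathcal{L}_{\text{recon}},
\qquad
\mathcal{L}_{\text{recon}} \;=\; -\sum_{t} \log p_\theta\!\left(d_t \mid d_{<t}, \mathbf{g}\right),
\]
where $d$ denotes the API documentation tokens and $\mathbf{g}$ the gist token activations from which the documentation is predicted.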