Interactive digital agents (IDAs) leverage the APIs of digital environments to perform tasks in response to user requests. While IDAs powered by instruction-tuned large language models (LLMs) can react to feedback from interface invocations over multi-step exchanges, they have not been trained in their respective digital environments. Such methods accomplish fewer than half of the tasks on sophisticated benchmarks such as AppWorld. We present a reinforcement learning (RL) approach that trains IDAs directly in their target environments. We formalize this training as a partially observable Markov decision process and derive LOOP, a data- and memory-efficient variant of proximal policy optimization. LOOP uses no value network and maintains exactly one copy of the underlying LLM in memory, making its implementation straightforward and as memory-efficient as fine-tuning a single LLM. A 32-billion-parameter agent trained with LOOP in the AppWorld environment outperforms the much larger OpenAI o1 agent by 9 percentage points (15% relative). To our knowledge, this is the first reported application of RL to agents that interact with a multi-domain, multi-app environment through direct API calls. Our analysis sheds light on the effectiveness of RL in this setting, showing that the agent learns to consult the API documentation, avoid unwarranted assumptions, minimize confabulation, and recover from setbacks.
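The abstract states only that LOOP is a PPO variant with no value network that keeps a single copy of the LLM in memory. As a rough illustration of how advantages can be estimated without a learned critic, the sketch below uses a leave-one-out baseline over several rollouts of the same task; this concrete baseline choice, and all function names, are assumptions for illustration rather than the paper's confirmed formulation.

```python
# Hedged sketch: PPO-style update without a value network.
# The leave-one-out baseline over K rollouts per task is an assumed
# instantiation; the abstract only says LOOP uses no value network.
import numpy as np


def leave_one_out_advantages(returns: np.ndarray) -> np.ndarray:
    """Baseline each rollout's return with the mean return of the other
    K-1 rollouts of the same task, instead of a learned value function."""
    k = len(returns)
    baselines = (returns.sum() - returns) / (k - 1)  # mean of the peers
    return returns - baselines


def ppo_clipped_surrogate(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate, with the rollout-level advantage
    broadcast to that rollout's actions (tokens)."""
    ratio = np.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return np.minimum(unclipped, clipped).mean()


# Toy usage: 4 rollouts of one AppWorld-style task with terminal task scores.
returns = np.array([1.0, 0.0, 1.0, 0.5])
print(leave_one_out_advantages(returns))  # above-peer rollouts get positive advantage
```

Because the baseline comes from sampled rollouts rather than a critic, only the policy LLM needs to be held in memory, which is consistent with the abstract's claim that training is as memory-efficient as fine-tuning a single LLM.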