Large pre-trained generative transformers have demonstrated exceptional performance on a variety of natural language generation tasks, using large training corpora to capture the patterns of human language. However, tailoring these models to specific applications through fine-tuning poses significant challenges. The computational cost of fine-tuning scales with model size, making it expensive for researchers to work with large models. Fine-tuning on smaller datasets also risks catastrophic forgetting, where the model overfits to a narrow task domain and loses important knowledge acquired during pre-training. As a result, reasoning abilities such as compositional generalization and common sense often degrade when the fine-tuned model is evaluated.
Existing approaches include prompt tuning, which adds trainable tokens or vectors to the input and optimizes their embeddings. This method allows adaptation to new tasks with minimal data and reduces the risk of catastrophic forgetting. A second approach is the Neurally-Decomposed Oracles (NADO) algorithm, which uses a smaller transformer module to steer the base model without changing its parameters. However, questions remain about its optimal training practice under significant distribution discrepancies and about reducing the additional cost of training the NADO module. A third approach is the GeLaTo algorithm, a framework that improves controllable autoregressive text generation by integrating tractable probabilistic models (TPMs).
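To make the NADO idea more concrete, the sketch below shows one way an auxiliary oracle's token-level estimates could reweight a frozen base model's next-token distribution. This is a minimal illustration, not the authors' implementation: the function names, tensor shapes, and the exact combination rule are assumptions made here for clarity.

```python
# Illustrative sketch of NADO-style decoding: a frozen base LM provides the
# next-token distribution, and a small auxiliary "oracle" estimates, per
# candidate token, the (log-)probability that the control signal will be
# satisfied. The composed distribution reweights the base distribution by
# those estimates and renormalizes. Names and details are assumptions.
import torch
import torch.nn.functional as F

def composed_next_token_dist(base_logits: torch.Tensor,
                             oracle_logprob: torch.Tensor) -> torch.Tensor:
    """base_logits: [vocab] next-token logits from the frozen base model.
    oracle_logprob: [vocab] oracle log-estimates that the constraint will
    eventually be satisfied if the corresponding token is chosen."""
    log_q = F.log_softmax(base_logits, dim=-1) + oracle_logprob
    return torch.softmax(log_q, dim=-1)  # renormalize over the vocabulary

# Toy usage with random tensors standing in for real model outputs.
vocab = 8
base_logits = torch.randn(vocab)
oracle_logprob = torch.log(torch.rand(vocab).clamp(min=1e-6))
q = composed_next_token_dist(base_logits, oracle_logprob)
next_token = torch.multinomial(q, num_samples=1)
```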
A team of researchers from the University of California, Los Angeles, Amazon AGI, and Samsung Research America has introduced norm-Disentangled Neurally-Decomposed Oracles (DiNADO), an improved parameterization of the NADO algorithm. DiNADO improves NADO's convergence during supervised fine-tuning and later stages by focusing on the uniqueness of the global parametric optimum. It also addresses inefficient gradient estimation when the control signal function provides only sparse signals, showing how to improve both sample efficiency and gradient estimation. Furthermore, DiNADO combines naturally with approaches such as LoRA, enabling base-model updates through a contrastive formulation that enhances the capacity of the NADO module while improving inference-time performance.
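The norm-disentanglement idea can be sketched roughly as follows: rather than predicting the oracle's reweighting values directly, the module's output is split into a direction (a normalized distribution over next tokens) and a separate scalar norm for the prefix. The code below is only a rough illustration under that assumption; the concrete DiNADO parameterization, class names, and training objective here are not taken from the paper.

```python
# Rough, assumed illustration of a "norm-disentangled" oracle head: the
# per-token reweighting factors are factored into a normalized direction
# over the vocabulary plus a single scalar (log-)norm for the prefix.
# The actual DiNADO parameterization may differ from this sketch.
import torch
import torch.nn as nn

class DisentangledOracleHead(nn.Module):
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.direction = nn.Linear(hidden_size, vocab_size)  # shape of the reweighting
        self.log_norm = nn.Linear(hidden_size, 1)             # overall scale for the prefix

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: [batch, hidden_size] representation of the current prefix.
        log_dir = torch.log_softmax(self.direction(hidden), dim=-1)  # sums to 1 in prob space
        log_r = log_dir + self.log_norm(hidden)                      # re-attach the scalar norm
        return log_r                                                 # [batch, vocab] log-reweighting

# Toy usage.
head = DisentangledOracleHead(hidden_size=16, vocab_size=8)
log_r = head(torch.randn(2, 16))
```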
DiNADO is evaluated on two main tasks: Formal Machine Translation (FormalMT) and Lexically Constrained Generation (LCG). For FormalMT, a formal baseline and a binary classifier are used to approximate the formality score. The LCG task uses the CommonGen dataset, which evaluates the compositional generalization capabilities and common-sense reasoning of text generation models (a simple coverage check is sketched after the list below). The experiments are divided into two parts:
- Results using a GPT-2-Large base distribution, evaluated for generation quality and controllability.
- A sample-efficiency study of how different target designs and reweighting techniques improve NADO's sample efficiency.
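For the LCG setting, controllability essentially amounts to whether every required concept appears in the generated sentence. A minimal, illustrative coverage check might look like the following; note that the real CommonGen evaluation also matches inflected word forms, which this simplified sketch ignores.

```python
# Minimal concept-coverage check for lexically constrained generation:
# a generation "succeeds" to the extent that the required concept words
# occur in the output. Real CommonGen evaluation also accepts inflections
# (e.g. "ski"/"skiing"), which this simplified sketch does not handle.
def concept_coverage(concepts: list[str], generation: str) -> float:
    tokens = set(generation.lower().split())
    covered = sum(1 for c in concepts if c.lower() in tokens)
    return covered / len(concepts)

print(concept_coverage(["dog", "frisbee", "catch"],
                       "a dog jumps to catch a frisbee in the park"))  # 1.0
```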
The results show that DiNADO-Soft outperforms DiNADO-Hard, since DiNADO-Hard's strict forward-consistency requirement can hurt learning of the oracle signal. Higher-capacity NADO modules offer greater flexibility and controllability, with DiNADO-Merge showing the most generalizable performance. Furthermore, DiNADO's norm disentanglement keeps the regularization term below 0.5, ensuring that updates to the R function consistently improve the composed distribution. This contrasts with plain NADO, where divergence of the regularization term can undermine performance gains, highlighting DiNADO's superior training dynamics and effectiveness on controllable text generation tasks.
In summary, the researchers introduced DiNADO, an improved parameterization of the NADO algorithm. One of the main advantages of DiNADO is its compatibility with fine-tuning methods such as LoRA, allowing for a more capable NADO variant. Furthermore, the researchers performed a theoretical analysis of the flawed designs of the original NADO implementation and suggested specific solutions. This paper brings valuable insights and improvements to the field of controllable language generation, which could open new avenues for more efficient and effective text generation applications.
Check out the Paper: amazon.science/publications/dinado-norm-disentangled-neurally-decomposed-oracles-for-controlling-language-models. All credit for this research goes to the researchers of this project.
Sajjad Ansari is a final year student from IIT Kharagpur. As a technology enthusiast, he delves into practical applications of AI, focusing on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.