Meet 'DRESS': a long-view language model (LVLM) that aligns and interacts with humans through natural language feedback
Large Vision and Language Models, or LVLMs, can interpret visual signals and provide simple responses for users to interact with. ...