Discovering and communicating key insights in data through visualizations such as bar charts and line charts is essential to many activities, but it can be time-consuming and labor-intensive. Charts serve two common purposes: analyzing data and communicating key findings. Visual analysis is frequently needed for questions that lack a clear yes-or-no answer, and answering such questions demands considerable perceptual and cognitive effort.
To address these issues, the Chart Question Answering (CQA) task was developed: it takes a chart and a natural-language question as input and produces an answer as output. CQA has attracted considerable research in recent years. The difficulty, however, is that most existing datasets contain only examples whose answers are a single word or phrase.
Because few freely available sources pair charts with related textual descriptions, prior work had not attempted to build datasets of open-ended questions with answer statements written by annotators. The researchers therefore collected charts from Pew Research (pewresearch.org), where experts combine a variety of charts and summaries in articles on market research, public opinion, and social issues.
From roughly 4,000 articles on this site, the researchers extracted 9,285 chart-abstract pairs and, after filtering on the number of words in each abstract, obtained a final dataset of 7,724 examples. The charts in the dataset cover a wide range of topics, from politics and economics to technology and beyond.
OpenCQA supports four question types, with the task's output text serving as the answer:
- Identify: questions about a given target within a set of bars.
- Compare: questions that compare elements of the chart.
- Summarize: questions that ask for a summary of the data shown in the chart.
- Open-ended: undirected queries that require drawing conclusions across the entire chart.
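To make the task concrete, here is a minimal sketch of how one OpenCQA-style record might be represented. The field names, type labels, and values are assumptions for illustration, not the dataset's actual schema.

```python
# Illustrative sketch of an OpenCQA-style record. Field names and the
# question-type labels are assumptions, not the dataset's actual schema.
example = {
    "chart_ocr": "Trust in national government: 1958 73% ... 2022 20%",
    "question_type": "compare",  # identify | compare | summarize | open-ended
    "question": "How did trust in government change between 1958 and 2022?",
    "answer": "Trust fell sharply, from 73% in 1958 to 20% in 2022.",
}

def validate_example(ex: dict) -> bool:
    """Check that a record has the expected fields and a known question type."""
    required = {"chart_ocr", "question_type", "question", "answer"}
    allowed = {"identify", "compare", "summarize", "open-ended"}
    return required <= ex.keys() and ex["question_type"] in allowed

print(validate_example(example))  # True
```

The key point is that the answer is a full explanatory statement rather than a single word or phrase, which is what distinguishes OpenCQA from earlier CQA datasets.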
Models used as a starting point
The new dataset was benchmarked against the following seven pre-existing models:
- BERTQA, which improves on the standard BERT model by adding directed attention layers.
- ELECTRA, which uses self-supervised representation learning, and GPT-2, a transformer-based text generator that predicts the next word from the words that precede it.
- BART, which uses a standard encoder-decoder transformer framework and has achieved state-of-the-art performance on text-generation tasks such as summarization.
- Models framing document-grounded generation, in which the model enhances text generation with information drawn from a document: (a) T5, a unified encoder-decoder transformer that casts language tasks into a text-to-text format; (b) VLT5, a T5-based framework that unifies vision-language tasks as text generation conditioned on multimodal input; and (c) CODR, which proposes a document-grounded generation task.
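As a rough illustration of how a text-to-text model like T5 could be applied to this task, the question and the chart's extracted text can be serialized into a single input string. The prompt template and function below are assumptions for illustration, not the paper's actual input format.

```python
# Hypothetical linearization of a chart and question into one T5-style
# text-to-text input. The template is illustrative, not the paper's format.
def build_model_input(question: str, chart_title: str, ocr_tokens: list) -> str:
    """Join the question, chart title, and OCR'd chart text into one string."""
    chart_text = " ".join(ocr_tokens)
    return f"question: {question} title: {chart_title} data: {chart_text}"

prompt = build_model_input(
    question="What trend does the chart show?",
    chart_title="Smartphone ownership among adults",
    ocr_tokens=["2011", "35%", "2015", "68%", "2021", "85%"],
)
print(prompt)
```

A string like this would then be tokenized and fed to the encoder, with the decoder generating the open-ended answer text.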
Challenges and limitations
The researchers faced several ethical considerations when collecting and annotating the data. To avoid infringing the intellectual property of the original chart producers, they used only freely accessible charts from publicly available resources that permit the dissemination of downloaded content for educational purposes. Pew Research Center data may be used as long as proper credit is given to the organization and no other entity is named as the source.
The researchers also note that the models could be used to spread false information. While current model outputs may read naturally, they contain several inaccuracies, as discussed in the study; publishing such erroneous outputs in their current form could therefore misinform the public.
The task's design also restricts the dataset: only Pew Research (pewresearch.org) data can be used in the analysis. The dataset could be expanded in the future if more relevant data becomes accessible. The researchers also did not evaluate long-range sequence models such as the Linformer or the recently proposed Memorizing Transformer.
The task setup is further limited by relying on automatically produced OCR data, which is often noisy. Future work could focus on refining OCR extraction for this specific task so the data feeds more cleanly into the model.
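As a toy illustration of the kind of OCR post-processing such future work might involve, the sketch below normalizes a couple of common artifacts. These cleanup rules are assumptions for illustration, not the paper's method; real pipelines would be far more involved.

```python
import re

def clean_ocr_token(token: str) -> str:
    """Apply two toy cleanup rules to a noisy OCR token (illustrative only)."""
    token = re.sub(r"[|_]{2,}", "", token)       # drop runs of line artifacts
    token = re.sub(r"(\d)\s+%", r"\1%", token)   # rejoin digits split from '%'
    return token.strip()

print(clean_ocr_token("||| 35 %"))  # "35%"
```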
In conclusion, OpenCQA is proposed as a task for providing detailed answers to free-form chart queries, along with several strong baselines and evaluation metrics. The results show that while advanced generative models can produce natural-sounding language, considerable work remains before they can construct valid arguments grounded in both numbers and logic.
Check out the paper. All credit for this research goes to the researchers of this project.
Dhanshree Shenwai is a computer engineer with solid experience in FinTech companies spanning the finance, cards and payments, and banking domains, and a strong interest in AI applications. She is enthusiastic about exploring new technologies and advancements in today's changing world, making everyone's life easier.