Recent advances in text-to-image generation have made it possible to create detailed graphics from simple natural language descriptions. Results from models like Stable Diffusion and DALL-E often look like real photographs or works of art created by humans. However, these models mainly produce raster images, often at low resolutions, which are not ideal for scientific figures. Scientific figures are essential to research because they help researchers explain complicated concepts or communicate important findings. They demand a high degree of geometric precision and text that remains legible even at small sizes, and these are areas where raster graphics fall short. As a result, many academic conferences encourage vector graphics, which decompose information into geometric shapes, allow text search, and usually have smaller file sizes.
The field of automated vector graphics generation is also growing, although the available approaches have their own drawbacks. They mostly produce low-level path elements in the Scalable Vector Graphics (SVG) format, and either fail to retain precise geometric relationships or produce results of limited complexity, such as individual icons or font characters. To address these limitations, researchers from the University of Bielefeld, the University of Hamburg, and the University of Mannheim are investigating visual languages, which abstract from lower-level vector graphics formats by offering high-level constructs that can be compiled into those formats.
Language models show promise in learning these languages and using them to solve simple tasks. Still, the extent of their abilities, in particular whether they can produce scientific figures, remains unknown. In this work the researchers focus on the TikZ graphics language because of its expressiveness and its emphasis on science, which allows complicated figures to be produced with just a few commands. They want to know whether language models can capture the subtleties of TikZ and automatically create scientific figures from image captions, much like text-to-image generation. This could not only increase productivity and promote inclusivity (helping academics less familiar with programming-like languages, such as social scientists), but could also improve teaching by producing customized TikZ examples. The demand is visible on TeX Stack Exchange, where TikZ is the most commonly discussed topic, accounting for about 10% of the questions.
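To give a sense of what "a few commands" means in practice, here is a minimal TikZ sketch. It is an illustrative example written for this article, not a figure from the paper or from the dataset; the curve, labels, and colors are arbitrary choices.

```latex
\documentclass[tikz,border=5pt]{standalone}
\begin{document}
\begin{tikzpicture}
  % Axes with arrow tips and labels
  \draw[->] (-0.2,0) -- (4.2,0) node[right] {$x$};
  \draw[->] (0,-0.2) -- (0,2.4) node[above] {$y$};
  % A smooth curve, y = sqrt(x), sampled by TikZ's plot operation
  \draw[thick, blue, domain=0:4, samples=100] plot (\x, {sqrt(\x)});
  % A labelled point on the curve at x = 1
  \fill (1,1) circle (1.5pt) node[above left] {$(1,1)$};
\end{tikzpicture}
\end{document}
```

Four drawing commands describe the axes, the curve, and an annotated point, and the compiled result is a vector graphic with searchable text. High-level constructs such as `plot` and `node` are what distinguish a visual language like TikZ from raw SVG paths.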
Their main contributions are:
(i) As part of their AutomaTikZ project, they developed DaTikZ, which has over 120,000 paired TikZ drawings and captions and is the first large-scale TikZ dataset.
(ii) They fine-tune the LLaMA large language model (LLM) on DaTikZ and compare its performance with that of general-purpose LLMs, in particular GPT-4 and Claude 2. Both automatic and human evaluation find that the scientific figures produced by the fine-tuned LLaMA are more similar to figures created by humans.
(iii) They also develop CLiMA, a variant of LLaMA augmented with multimodal CLIP embeddings. With this addition, CLiMA can interpret input captions visually, which improves text-image alignment. It also allows images to be used as additional inputs, which further improves performance.
(iv) They also show that all models produce novel outputs and have few memorization problems. However, GPT-4 and Claude 2 tend to produce simpler results than LLaMA and CLiMA, and sometimes give degenerate solutions that maximize text-image similarity by visibly copying the input caption into the output image.
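The caption-copying degenerate solution mentioned in (iv) can be pictured with a short TikZ sketch. This is a hypothetical illustration, not an actual model output from the study; the caption text is invented for the example.

```latex
\documentclass[tikz,border=5pt]{standalone}
\begin{document}
\begin{tikzpicture}
  % Hypothetical caption, invented for illustration only.
  % The "figure" is nothing but the caption typeset inside a box.
  \node[draw, align=center, text width=6cm]
    {A bar chart comparing the accuracy of three models.};
\end{tikzpicture}
\end{document}
```

An output like this contains the caption's words as rendered text, so a similarity measure that compares the image with the caption can rate it highly even though no chart was drawn.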
All credit for this research goes to the researchers of this project.
Aneesh Tickoo is a consulting intern at MarktechPost. She is currently pursuing her bachelor’s degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. She spends most of her time working on projects aimed at harnessing the power of machine learning. Her research interest is image processing and she is passionate about creating solutions around it. She loves connecting with people and collaborating on interesting projects.