Sponsored content
As organizations strive to leverage generative AI, they often encounter a gap between its promise and realized business value. At Astronomer, we’ve seen firsthand how integrating generative AI (GenAI) into operational processes can transform businesses. But we’ve also observed that the key to success lies in orchestrating the valuable business data needed to power these AI models.
This blog post describes the critical role of data orchestration in implementing generative AI at scale. I will highlight real customer use cases where Apache Airflow, managed with Astronomer’s Astro, has been instrumental in successful applications, before concluding with useful next steps to get started.
What is the role of data orchestration in the GenAI stack?
Generative AI models, with their extensive prior knowledge and impressive ability to generate content, are undoubtedly powerful. However, their true value emerges when they are combined with the institutional knowledge captured in an organization’s proprietary data sets and operational data streams. Successful implementation of GenAI involves orchestrating workflows that integrate valuable data sources from across the enterprise into AI models, grounding their results in relevant, up-to-date business context.
Integrating data into GenAI models (for inference, prompting, or tuning) involves complex, resource-intensive tasks that need to be optimized and executed repeatedly. Data orchestration tools sit at the core of the emerging AI application stack, providing a framework that not only simplifies these tasks but also makes it easier for engineering teams to experiment with the latest innovations from the AI ecosystem.
Task orchestration ensures that computational resources are used efficiently, that workflows are optimized and tuned in real time, and that deployments are stable and scalable. This orchestration capability is especially valuable in environments where generative models must be frequently updated or retrained based on new data or where multiple experiments and versions need to be managed simultaneously.
Apache Airflow has become the standard for this type of data orchestration, crucial for managing complex workflows and enabling teams to efficiently take AI applications from prototype to production. When run as part of Astronomer’s managed service, Astro, it also provides the levels of scalability and reliability critical to business applications, and a layer of governance and transparency essential for managing AI and machine learning operations.
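To make the pattern concrete, here is a minimal sketch of an orchestrated GenAI data pipeline written as an Airflow DAG. The data sources, embedding step, and vector store are hypothetical placeholders; the point is the shape of the workflow: on a schedule, pull fresh business data, embed it, and load it into a retrieval index so downstream prompts stay grounded in current context.

```python
# Minimal illustrative Airflow DAG; extraction and loading logic are placeholders.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@hourly", start_date=datetime(2024, 1, 1), catchup=False)
def genai_context_refresh():
    @task
    def extract_documents() -> list[str]:
        # Hypothetical extraction step: pull support articles, product docs,
        # or transactional records from source systems.
        return ["Refund policy ...", "Booking FAQ ..."]

    @task
    def embed_and_load(docs: list[str]) -> int:
        # Hypothetical load step: call an embedding model and upsert the vectors
        # into a vector store. The count is returned only for observability.
        return len(docs)

    embed_and_load(extract_documents())


genai_context_refresh()
```

Because each step is a task in a DAG, the scheduler handles retries, backfills, and monitoring, which is exactly the operational burden teams want off their plate while iterating on GenAI features.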
The following examples illustrate the role of data orchestration in GenAI applications.
Conversational artificial intelligence for support automation
A leading digital travel platform was already using Airflow, powered by Astro, to manage data pipelines for its analytics and machine learning processes. To accelerate the potential of GenAI in the business, the company’s engineers extended Astro to power its new travel planning tool, which recommends destinations and accommodations to millions of users daily using large language models (LLMs) and operational data streams.
This type of conversational AI, often delivered as chatbots or voicebots, requires well-curated data to avoid low-quality responses and ensure a meaningful user experience. Because the company has standardized on Astro to orchestrate both its existing operational/ML pipelines and its GenAI pipelines, the travel planning tool can surface more relevant recommendations for users while still delivering a seamless experience from browsing to booking.
Astronomer’s own support application, Ask Astro, uses LLMs and Retrieval Augmented Generation (RAG) to provide domain-specific answers by integrating knowledge from multiple data sources. By publishing Ask Astro as an open source project, we show how Airflow simplifies both managing data pipelines and monitoring AI performance in production.
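At its core, a retrieval-augmented answer follows three steps: retrieve the most relevant documents, build a prompt grounded in them, and generate a response. The sketch below is illustrative only; `vector_store` and `llm` stand in for whatever retrieval index and model client a team uses, not Ask Astro’s actual implementation.

```python
# Illustrative RAG flow; `vector_store.search` and `llm.complete` are hypothetical clients.
def answer_with_rag(question: str, vector_store, llm, k: int = 5) -> str:
    # 1. Retrieve: find the k documents most similar to the question.
    docs = vector_store.search(question, top_k=k)

    # 2. Augment: build a prompt that grounds the model in retrieved context.
    context = "\n\n".join(doc.text for doc in docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

    # 3. Generate: the answer is now anchored to domain knowledge rather than
    #    the model's general training data alone.
    return llm.complete(prompt)
```

The orchestration work lies in keeping that vector store fresh: the ingestion and embedding pipelines that feed it are exactly the kind of recurring, dependency-heavy jobs Airflow is built to run.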
Content generation
Laurel, a GenAI company focused on automating timekeeping and billing for professional services, demonstrates content generation, another common GenAI use case. The company employs AI to create timesheets and billing summaries from detailed documentation and transactional data. Managing these upstream data pipelines and maintaining client-specific models is complex and requires robust orchestration.
Astro acts as a “single pane of glass” for Laurel’s data, handling massive amounts of user data efficiently. By incorporating machine learning into its Airflow pipelines, Laurel not only automates critical processes for its customers but also makes them twice as efficient.
Reasoning and analysis
Several support organizations are using Airflow-powered AI models to route support tickets, significantly reducing resolution time by matching tickets to agents based on their expertise. This demonstrates the use of AI for reasoning: applying business logic that improves operational efficiency.
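A routing step like this can be expressed as an ordinary Python function scheduled by Airflow. The sketch below is hypothetical; the `llm` and `assignment_api` clients and the expertise labels are placeholders, but it shows the basic pattern of classifying each ticket with an LLM and assigning it to an agent with matching expertise.

```python
# Hypothetical ticket-routing step; `llm` and `assignment_api` are placeholder clients.
EXPERTISE_AREAS = ["billing", "networking", "authentication", "data-pipelines"]


def route_tickets(tickets: list[dict], llm, assignment_api) -> None:
    for ticket in tickets:
        prompt = (
            f"Classify this support ticket into one of {EXPERTISE_AREAS}.\n"
            f"Subject: {ticket['subject']}\n"
            f"Body: {ticket['body']}\n"
            "Respond with the label only."
        )
        area = llm.complete(prompt).strip().lower()
        # Fall back to a general queue if the model returns an unexpected label.
        if area not in EXPERTISE_AREAS:
            area = "general"
        agent = assignment_api.find_agent(expertise=area)
        assignment_api.assign(ticket["id"], agent)
```

Run as an Airflow task, the same function gains scheduling, retries, and logging with no extra plumbing.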
Dosu, an AI platform for software engineering teams, uses similar orchestration to manage data pipelines that ingest and index information from Slack, GitHub, Jira, and more. Reliable, maintainable, and observable data pipelines are crucial for Dosu’s AI applications, which help automatically categorize and assign tasks for major software projects like LangChain.
Dosu AI workflows orchestrated by Airflow running on Astro
Optimizing application development with AI and Airflow
Large language models also aid in code generation and analysis. Dosu and Astro use large language models to generate code suggestions and manage cloud IDE tasks, respectively. These applications require careful management of data from repositories like GitHub and Jira, ensuring that organizational boundaries are respected and sensitive data is anonymized. Airflow’s orchestration capabilities provide transparency and lineage, giving teams confidence in their data management processes.
Next steps to get started with data orchestration
By leveraging Airflow’s workflow management and Astronomer’s deployment and scalability capabilities, development teams don’t need to worry about infrastructure management and the complexities of MLOps. Instead, they can focus on data transformation and model development, accelerating the deployment of GenAI applications while improving their performance and governance.
To help you get started, we recently published our Data Orchestration Guide for Generative AI. The guide provides more information on the key capabilities required for data orchestration, along with a cookbook of reference architectures for a variety of generative AI use cases.
Our teams are ready to run workshops with you to discuss how Airflow and Astronomer can accelerate your GenAI initiatives. Contact us to schedule your session.