Integrating the capabilities of multiple AI models unlocks a symphony of potential, from automating complex tasks that require multiple skills, such as vision, speech, writing, and synthesis, to improving decision-making processes. However, orchestrating these collaborations presents a significant challenge in managing internal relationships and dependencies. Traditional linear approaches often fall short, struggling to handle the complexity of diverse models and dynamic dependencies.
By translating your machine learning workflow into a graph, you get a clear view of how each model interacts and contributes to the overall result, combining natural language processing, computer vision, and speech models. In the graph approach, nodes represent models or tasks, and edges define the dependencies between them. This graph-based mapping offers several advantages: identifying which models depend on the output of others, and leveraging parallel processing for independent tasks. Additionally, tasks can be executed using existing graph navigation strategies, such as breadth-first or depth-first traversal, depending on task priorities.
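The idea can be sketched in plain Python: a workflow represented as an adjacency dictionary and traversed breadth-first. The task names below are illustrative only, not tied to any particular framework.

```python
from collections import deque

# A toy workflow graph: nodes are model tasks, edges point to dependents.
# Task names here are illustrative, not from any specific framework.
workflow = {
    "scan_text": ["extract_images", "summarize"],
    "extract_images": ["analyze_vision"],
    "summarize": ["final_report"],
    "analyze_vision": ["final_report"],
    "final_report": [],
}

def bfs_order(graph, start):
    """Visit tasks level by level (breadth-first)."""
    visited, order = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return order

print(bfs_order(workflow, "scan_text"))
```

A depth-first strategy would simply swap the queue for a stack; which traversal fits best depends on whether you want to finish whole branches first or sweep the workflow level by level.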
The path to harmonious collaboration between AI models is not without obstacles. Imagine conducting an orchestra where each musician speaks a different language and the instruments operate independently. This challenge mirrors the communication gaps that arise when integrating diverse AI models, which call for a framework to manage the relationships and determine which models can receive each input format.
The graph-based orchestration approach opens doors to interesting possibilities in several domains:
Collaborative tasks for drug discovery
Researchers can accelerate the drug discovery process with a sequence of AI-powered assistants, each designed for a specific task; consider, for example, a three-step discovery mission. The first step involves a language model that scans a large amount of scientific data to highlight possible protein targets strongly linked to specific diseases. A vision model then explains complex diagrams or images, providing detailed information on the structures of the identified proteins. This visual information is crucial to understanding how potential drugs could interact with the protein. Finally, a third model integrates information from the language and vision models to predict how chemical compounds might affect the target proteins, giving researchers valuable insight to steer the process efficiently.
Several challenges arise when integrating the models to deliver the entire project. First, extracting relevant images from scanned content and feeding them to the vision model is not as simple as it seems; an intermediate processor is needed between the text scanning and vision tasks to filter relevant images. Second, the analysis task must combine multiple inputs: the output of the data scan, the explanation from the vision model, and the instructions specified by the user. This requires a template to combine the information for the language model to process. The following sections describe how to use a Python framework to handle these complex relationships.
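A minimal sketch of such an intermediate processor, assuming extracted images arrive as dictionaries with a caption field; the keyword-matching heuristic is an assumption for illustration, not a prescribed method:

```python
# Sketch of the intermediate step described above: filter images
# extracted from scanned documents before passing them to a vision model.
# The caption-keyword predicate is an illustrative assumption.

def filter_relevant_images(images, keywords):
    """Keep only images whose captions mention any target keyword."""
    relevant = []
    for image in images:
        caption = image.get("caption", "").lower()
        if any(kw.lower() in caption for kw in keywords):
            relevant.append(image)
    return relevant

scanned = [
    {"path": "fig1.png", "caption": "Protein binding site structure"},
    {"path": "fig2.png", "caption": "Funding acknowledgements logo"},
]
hits = filter_relevant_images(scanned, ["protein", "binding"])
```

In a real pipeline this predicate could be replaced by a lightweight classifier, but the role is the same: sit between two model tasks and reshape one task's output into the next task's input.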
Creative content generation
Model collaboration can facilitate the creation of interactive content by integrating elements such as music composition, animation, and design models to generate animated scenes. For example, in a graph-based collaborative approach, the first task can plan a scene like a director and pass the information to the music and image generation tasks. Finally, an animation model uses the output of the art and music models to generate a short video.
To optimize this process, we aim to achieve parallel execution of music and graphics generation, since they are independent tasks. Therefore, the music does not need to wait for the graphics to complete. Additionally, we need to address the various input formats through the animation task. While some models like Stable Video Diffusion work with images only, music can be combined using a post-processor.
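The parallel step can be sketched with Python's asyncio; `generate_music` and `generate_graphics` below are hypothetical stand-ins for the actual model calls:

```python
import asyncio

# Hypothetical stand-ins for the music and image generation calls;
# in a real workflow these would invoke the respective model APIs.
async def generate_music(scene_plan):
    await asyncio.sleep(0.1)  # simulate model latency
    return f"music for: {scene_plan}"

async def generate_graphics(scene_plan):
    await asyncio.sleep(0.1)  # simulate model latency
    return f"frames for: {scene_plan}"

async def run_scene(scene_plan):
    # Independent tasks run concurrently instead of one after the other,
    # so the music task never waits for the graphics task.
    music, frames = await asyncio.gather(
        generate_music(scene_plan),
        generate_graphics(scene_plan),
    )
    return music, frames

music, frames = asyncio.run(run_scene("sunset chase"))
```

Both coroutines sleep concurrently, so the scene completes in roughly the time of the slower task rather than the sum of both.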
These examples provide just a glimpse of the potential of graph theory in model integration. The graph-based integration approach allows you to tailor multiple tasks to your specific needs and unlock innovative solutions.
Intelli is an open source Python module for orchestrating AI workflows, leveraging graph principles through three key components:
- Agents: representatives of your AI models. You define each agent by specifying its type (text, image, vision, or voice), its provider (openai, gemini, stability, mistral, etc.), and its mission.
- Tasks: individual units within your AI workflow. Each task leverages an agent to perform a specific action and applies custom pre- and post-processing provided by the user.
- Flow: ties everything together, orchestrating the execution of your tasks according to the dependencies you have established through the graph structure. Flow management ensures that tasks are executed efficiently and in the correct order, allowing both sequential and parallel processing where possible.
Using the flow component to manage the task relationship as a graph provides several benefits when connecting multiple models; however, for the single-task case, this might be overkill and calling the model directly will suffice.
Scaling: As your project grows in complexity, adding more models and tasks requires repetitive code updates to account for data format discrepancies and complex dependencies. The graph approach simplifies this by defining a new node to represent the task, and the framework automatically resolves input/output differences to organize the data flow.
Dynamic adaptation: With traditional approaches, changes to complex tasks ripple through the entire workflow and require adjustments. With Flow, adding, deleting, or modifying connections is handled automatically.
Explainability: The graph enables a deeper understanding of your AI workflow by visualizing how models interact and optimizing task path navigation.
Note: The author participated in the design and development of the Intelli framework. It is an open source project under the Apache license.
Getting started
First, make sure you have Python 3.7+, as Intelli takes advantage of the latest Python asyncio features, and install the package:
pip install intelli
Agents: task executors
Intelli agents are designed to interact with a specific AI model. Each agent includes a unified input layer to access any type of model and accepts a dictionary of custom parameters to pass to the model, such as maximum size, temperature, and model version.
from intelli.flow.agents import Agent

# Define agents for various AI tasks
text_agent = Agent(
    agent_type="text",
    provider="openai",
    mission="write social media posts",
    model_params={"key": OPENAI_API_KEY, "model": "gpt-4"}
)
Tasks: the basic components
Tasks represent individual units of work or operations that agents must perform and include the logic to handle the outcome of the previous task. Each task can be a simple operation such as generating text or a more complex process such as analyzing the sentiment of user comments.
from intelli.flow.tasks import Task
from intelli.flow.input import TextTaskInput

# Define a task for text generation
task1 = Task(
    TextTaskInput("Create a post about ai technologies"),
    text_agent,
    log=True
)
Processors: Tuned I/O
Processors add an additional layer of control by defining a custom preprocess for the task input and a postprocess for the output. The following example demonstrates creating a function to shorten the text output from the previous step before calling the image model.
class TextProcessor:
    @staticmethod
    def text_head(text, size=800):
        return text[:size]

# image_agent is defined like text_agent above, with agent_type="image"
task2 = Task(
    TextTaskInput("Generate image about the content"),
    image_agent,
    pre_process=TextProcessor.text_head,
    log=True,
)
Flow: specify dependencies
Flow translates your ai workflow into a directed acyclic graph (DAG) and leverages graph theory for dependency management. This allows you to easily visualize task relationships and optimize the execution order of your tasks.
from intelli.flow.flow import Flow

flow = Flow(
    tasks={
        "title_task": title_task,
        "content_task": content_task,
        "keyword_task": keyword_task,
        "theme_task": description_theme_task,
        "image_task": image_task,
    },
    map_paths={
        "title_task": ("keyword_task", "content_task"),
        "content_task": ("theme_task",),
        "theme_task": ("image_task",),
    },
)
output = await flow.start()
map_paths dictates task dependencies, guiding Flow to organize the order of execution and ensuring that each task receives the necessary result from its predecessors.
This is how Flow navigates nodes:
- Workflow mapping: Flow builds a DAG using tasks as nodes and dependencies as edges. This visual representation clarifies the task execution sequence and data flow.
- Topological sorting: Flow analyzes the graph to determine the optimal execution order. Tasks without incoming dependencies are prioritized, ensuring that each task receives the necessary input from its predecessors before execution.
- Task execution: The framework iterates through the ordered tasks and executes each one with the corresponding input. Depending on the dependency map, inputs can come from outputs of previous tasks and user-defined values.
- Input preparation: Before execution, the task applies any preprocessing functions defined for it, modifies the input data as necessary, and calls the assigned agent.
- Result management: The agent returns a result, which is stored in a dictionary with the task name as the key and returned to the user.
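The ordering step can be illustrated with Kahn's algorithm, a standard topological sort. This is a simplified sketch, not Intelli's actual implementation, reusing the task names from the earlier flow example:

```python
from collections import deque

def topological_order(tasks, deps):
    """Kahn's algorithm: order tasks so every dependency runs first.
    A simplified sketch, not the framework's actual implementation."""
    # deps maps a task to the tasks that consume its output (successors)
    indegree = {t: 0 for t in tasks}
    for successors in deps.values():
        for s in successors:
            indegree[s] += 1
    # start with tasks that have no incoming dependencies
    queue = deque(t for t in tasks if indegree[t] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for s in deps.get(node, []):
            indegree[s] -= 1
            if indegree[s] == 0:
                queue.append(s)
    return order

tasks = ["title_task", "content_task", "keyword_task", "theme_task", "image_task"]
deps = {
    "title_task": ["keyword_task", "content_task"],
    "content_task": ["theme_task"],
    "theme_task": ["image_task"],
}
order = topological_order(tasks, deps)
```

Any task popped from the queue at the same time as another could also run in parallel, which is how independent branches of the graph avoid waiting on each other.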
To visualize your flow as a graph:
flow.generate_graph_img()
The use of graph theory has transformed traditional linear approaches to orchestrating AI models by providing a symphony of collaboration between diverse models.
Frameworks like Intelli translate your workflow into a visual representation, where tasks become nodes and dependencies are mapped as edges, creating an overview of your entire process to automate complex tasks.
This approach extends to various fields that require collaborative AI models, including scientific research, business decision automation, and interactive content creation. However, effective scaling requires further refinement in managing data exchange between models.