In the age of information overload, it's easy to get lost in the wealth of content available online. YouTube offers billions of videos, and the Internet is full of articles, blogs, and academic papers. With such a large volume of data, it is often difficult to extract useful information without spending hours reading and watching. That's where an AI-powered web summarizer comes in.
In this article, we'll build a Streamlit-based application that uses NLP and AI to turn YouTube videos and websites into detailed summaries. The application uses the Llama 3.2 model, served through Groq, together with LangChain's summarization chains, saving the reader time without missing any points of interest.
Learning outcomes
- Understand the challenges of information overload and the benefits of AI-powered summarization.
- Learn how to create a Streamlit app that summarizes content from YouTube and websites.
- Explore the role of LangChain and Llama 3.2 in generating detailed content summaries.
- Learn how to integrate tools like yt-dlp and UnstructuredURLLoader for media processing.
- Create a powerful web summarizer using Streamlit and LangChain to instantly summarize YouTube videos and websites.
- Create a web summarizer with LangChain to get concise and accurate content summaries from URLs and videos.
This article was published as part of the Data Science Blogathon.
Purpose and Benefits of Summarizer App
From YouTube videos to blog posts to in-depth research articles, a vast repository of information is at your fingertips. However, for most of us, time constraints rule out sitting through lengthy videos or reading long articles. According to studies, a person spends only a few seconds on a website before deciding whether to keep reading. This is the problem that needs a solution.
Enter AI-powered summarization: a technique that allows AI models to digest large amounts of content and provide concise, human-readable summaries. This can be especially useful for busy professionals, students, or anyone who wants to quickly understand the essence of content without spending hours on it.
Summary Application Components
Before we dig into the code, let's look at the key elements that make this app work:
- LangChain: This framework simplifies the process of interacting with large language models (LLMs). It provides a standardized way to manage prompts, chain together different language model operations, and access a variety of LLMs.
- Streamlit: This open source Python library allows us to quickly create interactive web applications. It is easy to use, which makes it a good fit for the interface of our summarizer.
- yt-dlp: When summarizing YouTube videos, yt-dlp (imported in Python as yt_dlp) is used to extract metadata such as the title and description. Unlike other YouTube downloaders, yt-dlp is more versatile and supports a wide range of formats. It is well suited to extracting video details, which are then passed to the LLM for summarization.
- UnstructuredURLLoader: This LangChain utility helps us load and process website content. It handles the complexities of fetching web pages and extracting their textual information.
Creating the app: step by step guide
In this section, we'll walk through each stage of developing the AI summarizer app. We will cover setting up the environment, designing the user interface, implementing the summarization model, and testing the application.
Note: Get the requirements.txt file and the full code on GitHub here.
Importing libraries and loading environment variables
This step involves configuring the essential libraries required for the application, including machine learning and NLP frameworks. We will also load environment variables to securely manage API keys, credentials, and configuration settings needed throughout the development process.
import os
import validators
import streamlit as st
from langchain.prompts import PromptTemplate
from langchain_groq import ChatGroq
from langchain.chains.summarize import load_summarize_chain
from langchain_community.document_loaders import UnstructuredURLLoader
from yt_dlp import YoutubeDL
from dotenv import load_dotenv
from langchain.schema import Document
load_dotenv()
groq_api_key = os.getenv("GROQ_API_KEY")
This section imports the libraries and loads the API key from a .env file, which keeps sensitive information such as API keys out of the source code.
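Because `os.getenv` returns `None` when a variable is missing, a missing key would only surface later as an authentication error. Below is a minimal sketch of a fail-fast check. The helper name `require_env` is hypothetical and not part of the original code.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; the original app calls
    os.getenv directly.
    """
    value = os.getenv(name)
    if value is None or not value.strip():
        raise RuntimeError(
            f"Environment variable {name!r} is not set. "
            "Add it to your .env file or export it in your shell."
        )
    return value.strip()
```

In the app, this could replace the direct lookup after `load_dotenv()`, for example `groq_api_key = require_env("GROQ_API_KEY")`.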
Designing the frontend with Streamlit
In this step, we will create an interactive and easy-to-use interface for the application using Streamlit. This includes adding input forms, buttons, and displaying results, allowing users to seamlessly interact with backend functionalities.
st.set_page_config(page_title="LangChain Enhanced Summarizer", page_icon="🌟")
st.title("YouTube or Website Summarizer")
st.write("Welcome! Summarize content from YouTube videos or websites in a more detailed manner.")
st.sidebar.title("About This App")
st.sidebar.info(
    "This app uses LangChain and the Llama 3.2 model from Groq API to provide detailed summaries. "
    "Simply enter a URL (YouTube or website) and get a concise summary!"
)
st.header("How to Use:")
st.write("1. Enter the URL of a YouTube video or website you wish to summarize.")
st.write("2. Click **Summarize** to get a detailed summary.")
st.write("3. Enjoy the results!")
These lines set the page settings, title, and welcome text for the main user interface of the application.
Text input for URL and model loading
Here, we will set up a text input field where users can enter a URL to analyze. Additionally, we will integrate the necessary model loading functionality to ensure that the application can process the URL efficiently and apply the machine learning model as needed for analysis.
st.subheader("Enter the URL:")
generic_url = st.text_input("URL", label_visibility="collapsed", placeholder="https://example.com")
Users can enter the URL (YouTube or website) they want to summarize in a text entry field.
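When the app later decides how to load the content, it has to tell YouTube links apart from other websites. A plain substring test on the URL would miss `youtu.be` short links and could match unrelated pages that merely mention YouTube in a query string. The sketch below checks the parsed hostname instead. The function name `is_youtube_url` and the host list are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hostnames treated as YouTube; this set is an assumption and may need extending.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    """Return True if the URL's hostname is one of the known YouTube hosts."""
    # urlparse yields hostname=None for strings without a scheme and host.
    host = urlparse(url.strip()).hostname or ""
    return host.lower() in YOUTUBE_HOSTS
```

Matching on the hostname keeps the routing decision independent of whatever appears in the path or query string.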
llm = ChatGroq(model="llama-3.2-11b-text-preview", groq_api_key=groq_api_key)
prompt_template = """
Provide a detailed summary of the following content in 300 words:
Content: {text}
"""
# input_variables must be a list of variable names, not a bare string
prompt = PromptTemplate(template=prompt_template, input_variables=["text"])
The model uses a prompt template to generate a 300-word summary of the provided content. This template is passed to the summarization chain to guide the output.
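To make the role of the `{text}` placeholder concrete, here is a small sketch that builds the same template with a configurable word count. The helper `build_prompt_template` is hypothetical (the original app hard-codes 300 words), and plain `str.format` stands in for `PromptTemplate` so the example runs without LangChain.

```python
def build_prompt_template(word_count: int = 300) -> str:
    """Return a prompt template string with a {text} placeholder.

    Hypothetical helper: the original app hard-codes 300 words.
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    # Only the first line is an f-string, so {text} on the second line
    # stays a literal placeholder for later substitution.
    return (
        f"Provide a detailed summary of the following content in {word_count} words:\n"
        "Content: {text}\n"
    )


template = build_prompt_template(150)
# str.format stands in for PromptTemplate to show what the model would receive.
print(template.format(text="Streamlit turns Python scripts into web apps."))
```

The returned string could be passed as `template=` to `PromptTemplate` with `input_variables=["text"]`, in the same way as the hard-coded version.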
Defining the function to load YouTube content
In this step, we will define a function that is responsible for fetching and loading YouTube content. This function takes the provided URL, extracts the relevant video data, and prepares it for summarization by the LLM.
def load_youtube_content(url):
    ydl_opts = {'format': 'bestaudio/best', 'quiet': True}
    with YoutubeDL(ydl_opts) as ydl:
        # download=False fetches metadata only
        info = ydl.extract_info(url, download=False)
        title = info.get("title", "Video")
        description = info.get("description", "No description available.")
        return f"{title}\n\n{description}"
This function uses yt_dlp to extract information about a YouTube video without downloading it. It returns the title and description of the video, which will be summarized by the LLM.
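One detail worth noting: `dict.get(key, default)` falls back to the default only when the key is absent. If the metadata contains the key with an empty or `None` value, the fallback text is not used. The sketch below shows one way to handle that case. The helper `format_video_metadata` is hypothetical, and the shape of `info` is assumed to match the dictionary used in the function above.

```python
def format_video_metadata(info: dict) -> str:
    """Build the text passed to the LLM from a yt-dlp style metadata dict.

    Hypothetical helper; `info` is assumed to carry optional "title" and
    "description" keys, as in the function above.
    """
    # `or` also covers keys that are present but None, empty, or whitespace,
    # which dict.get(key, default) alone would not replace.
    title = (info.get("title") or "").strip() or "Video"
    description = (info.get("description") or "").strip() or "No description available."
    return f"{title}\n\n{description}"
```

Keeping the formatting in a pure function also makes it testable without any network access.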
Handling summary logic
if st.button("Summarize"):
    if not generic_url.strip():
        st.error("Please provide a URL to proceed.")
    elif not validators.url(generic_url):
        st.error("Please enter a valid URL (YouTube or website).")
    else:
        try:
            with st.spinner("Processing..."):
                # Load content from URL
                if "youtube.com" in generic_url:
                    # Load YouTube content as a string and wrap it in a list of Documents
                    text_content = load_youtube_content(generic_url)
                    docs = [Document(page_content=text_content)]
                else:
                    loader = UnstructuredURLLoader(
                        urls=[generic_url],
                        ssl_verify=False,  # note: disables TLS certificate verification
                        headers={"User-Agent": "Mozilla/5.0"}
                    )
                    docs = loader.load()
                # Summarize using LangChain
                chain = load_summarize_chain(llm, chain_type="stuff", prompt=prompt)
                output_summary = chain.run(docs)
                st.subheader("Detailed Summary:")
                st.success(output_summary)
        except Exception as e:
            # st.exception expects the exception object itself
            st.exception(e)
- If it's a YouTube link, load_youtube_content extracts the content, which is wrapped in a Document and stored in docs.
- If it is a website, UnstructuredURLLoader retrieves the content as a list of documents.
- Running the summary chain: The LangChain summarization chain processes the loaded content and uses the prompt template and the LLM to generate a summary.
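The "stuff" chain type places all loaded text into a single prompt, so a very long page can exceed the model's context window. One common workaround is to split the text into smaller, slightly overlapping chunks first. The following is a minimal, character-based sketch under that assumption. The function name `split_text` and the default sizes are illustrative, not taken from the original app.

```python
def split_text(text: str, chunk_size: int = 4000, overlap: int = 200) -> list[str]:
    """Split text into character-based chunks that overlap slightly."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # how far the window advances each time
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

LangChain also ships its own text splitters, and `load_summarize_chain` supports the "map_reduce" and "refine" chain types for summarizing several chunks. Those chain types take different prompt arguments than "stuff", so the call in the app would need adjusting.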
To round off the app and provide essential information, we add a footer-style section to the sidebar using Streamlit. It can list upcoming features, important links, acknowledgments, or contact details, keeping the interface clean and professional.
st.sidebar.header("Features Coming Soon")
st.sidebar.write("- Option to download summaries")
st.sidebar.write("- Language selection for summaries")
st.sidebar.write("- Summary length customization")
st.sidebar.write("- Integration with other content platforms")
st.sidebar.markdown("---")
st.sidebar.write("Developed with ❤ by Gourav Lohar")
Output
Input: https://www.analyticsvidhya.com/blog/2024/10/nvidia-nim/
YouTube Video Summary
Input video:
Conclusion
By leveraging the LangChain framework, we streamline interaction with the Llama 3.2 language model and generate high-quality summaries. Streamlit makes it easy to build an intuitive web application, which keeps the summarization tool accessible and engaging.
In conclusion, the article offers a practical approach and useful ideas for creating a comprehensive summary tool. By combining cutting-edge language models with efficient frameworks and easy-to-use interfaces, we can open up new possibilities to facilitate information consumption and improve knowledge acquisition in today's content-rich world.
Key takeaways
- LangChain facilitates development by providing a consistent approach to interacting with language models, managing prompts, and chaining processes.
- The Llama 3.2 model, served through the Groq API, demonstrates strong capabilities in understanding and condensing information, resulting in accurate and concise summaries.
- The integration of tools like yt-dlp and UnstructuredURLLoader allows the application to handle content from various sources such as YouTube and web articles easily.
- The web summarizer uses LangChain and Streamlit to provide fast and accurate summaries of YouTube videos and websites.
- Leveraging the Llama 3.2 model, the web summarizer efficiently condenses complex content into easy-to-understand summaries.
Frequently asked questions
A. LangChain is a framework that simplifies interaction with large language models. It helps manage prompts, chain operations, and access multiple LLMs, making it easy to create applications like this summarizer.
A. Llama 3.2 generates high-quality text and excels at understanding and condensing information, making it well suited for summarization tasks. Its weights are also openly available.
A. While it can handle a wide range of content, there are limitations. Extremely long videos or articles may require additional features such as audio transcription or text splitting for optimal summaries.
A. Currently yes. However, future improvements could include language selection for broader applicability.
A. You must run the provided code in a Python environment with the necessary libraries installed. See GitHub for the full code and requirements.txt.
The media shown in this article is not the property of Analytics Vidhya and is used at the author's discretion.