Create your own web and YouTube summarizer

In the age of information overload, it's easy to get lost in the wealth of content available online. YouTube offers billions of videos, and the Internet is full of articles, blogs and academic papers. With such a large volume of data, it is often difficult to extract useful information without spending hours reading and watching. That's where an AI-powered web summarizer can help.

In this article, let's create a Streamlit-based application that uses NLP and AI to turn YouTube videos and websites into detailed summaries. The application uses the Llama 3.2 model served through the Groq API and LangChain's summarization chain, saving the reader time without missing the main points of interest.

Learning outcomes

Understand the challenges of information overload and the benefits of AI-powered summarization.
Learn how to create a Streamlit app that summarizes content from YouTube and websites.
Explore the role of LangChain and Llama 3.2 in generating detailed content summaries.
Learn how to integrate tools like yt-dlp and UnstructuredURLLoader for media processing.
Create a powerful web summarizer using Streamlit and LangChain to instantly summarize YouTube videos and websites.
Create a web summarizer with LangChain to get concise and accurate content summaries from URLs and videos.

This article was published as part of the Data Science Blogathon.

Purpose and Benefits of Summarizer App

From YouTube videos to blog posts to in-depth research articles, a vast repository of information is at your fingertips. However, most of us do not have the time to sit through lengthy videos or read long articles. According to studies, a person spends only a few seconds on a web page before deciding whether to keep reading. This is the problem the summarizer app sets out to solve.

Enter AI-powered summarization: a technique in which AI models digest large amounts of content and produce concise, human-readable summaries. This can be especially useful for busy professionals, students, or anyone who wants to quickly understand the essence of a piece of content without spending hours on it.

Summarizer Application Components

Before we dig into the code, let's look at the key elements that make this app work:

LangChain: This framework simplifies the process of interacting with large language models (LLMs). It provides a standardized way to manage prompts, chain together different language model operations, and access a variety of LLMs.
Streamlit: This open source Python library allows us to quickly create interactive web applications. It is easy to use, which makes it a good fit for building the interface of our summarizer.
yt-dlp: When summarizing YouTube videos, yt-dlp is used to extract metadata such as the title and description. Compared with other YouTube downloaders, yt-dlp is more versatile and supports a wide range of formats. It is well suited to extracting video details, which are then passed to the LLM for summarization.
UnstructuredURLLoader: This LangChain utility helps us load and process website content. It handles the complexities of fetching web pages and extracting their textual information.

Creating the app: step by step guide

In this section, we'll walk through each stage of developing the AI summarizer app. We will cover setting up the environment, designing the user interface, implementing the summarization model, and testing the application.

Note: Get the requirements.txt file and the full code on GitHub here.

Importing libraries and loading environment variables

This step involves configuring the essential libraries required for the application, including machine learning and NLP frameworks. We will also load environment variables to securely manage API keys, credentials, and configuration settings needed throughout the development process.

import os
import validators
import streamlit as st
from langchain.prompts import PromptTemplate
from langchain_groq import ChatGroq
from langchain.chains.summarize import load_summarize_chain
from langchain_community.document_loaders import UnstructuredURLLoader
from yt_dlp import YoutubeDL
from dotenv import load_dotenv
from langchain.schema import Document
load_dotenv()
groq_api_key = os.getenv("GROQ_API_KEY")

This section imports the libraries and loads the API key from a .env file, which keeps sensitive information such as API keys out of the source code.
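Because os.getenv returns None when a variable is missing, a forgotten key would only surface later as an authentication error from the API. A minimal sketch of a fail-fast check is shown below. The helper name require_env is hypothetical and is not part of the original app.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    `require_env` is a hypothetical helper, not part of the original app.
    """
    value = os.getenv(name)
    # Treat an unset variable and a blank value the same way.
    if value is None or not value.strip():
        raise RuntimeError(
            f"Environment variable {name!r} is not set. "
            "Add it to your .env file or export it in your shell."
        )
    return value.strip()
```

In the app, one option would be to call require_env("GROQ_API_KEY") right after load_dotenv() instead of calling os.getenv directly.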

Designing the frontend with Streamlit

In this step, we will create an interactive and easy-to-use interface for the application using Streamlit. This includes adding input forms, buttons, and displaying results, allowing users to seamlessly interact with backend functionalities.

st.set_page_config(page_title="LangChain Enhanced Summarizer", page_icon="🌟")
st.title("YouTube or Website Summarizer")
st.write("Welcome! Summarize content from YouTube videos or websites in a more detailed manner.")
st.sidebar.title("About This App")
st.sidebar.info(
    "This app uses LangChain and the Llama 3.2 model from Groq API to provide detailed summaries. "
    "Simply enter a URL (YouTube or website) and get a concise summary!"
)
st.header("How to Use:")
st.write("1. Enter the URL of a YouTube video or website you wish to summarize.")
st.write("2. Click **Summarize** to get a detailed summary.")
st.write("3. Enjoy the results!")

These lines set the page settings, title, and welcome text for the main user interface of the application.

Text input for URL and model loading

Here, we will set up a text input field where users can enter a URL to analyze. Additionally, we will integrate the necessary model loading functionality to ensure that the application can process the URL efficiently and apply the machine learning model as needed for analysis.

st.subheader("Enter the URL:")
generic_url = st.text_input("URL", label_visibility="collapsed", placeholder="https://example.com")

Users can enter the URL (YouTube or website) they want to summarize in a text entry field.

llm = ChatGroq(model="llama-3.2-11b-text-preview", groq_api_key=groq_api_key)
prompt_template = """
Provide a detailed summary of the following content in 300 words:
Content: {text}
"""
prompt = PromptTemplate(template=prompt_template, input_variables=["text"])

The model uses a prompt template to generate a 300-word summary of the provided content. This template is passed to the summarization chain to guide the process.
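To see what the model actually receives, it can help to fill the template by hand. The sketch below uses plain str.format rather than LangChain, on the assumption that the default template format substitutes {text} in the same way. The function name render_prompt and the sample sentence are illustrative only.

```python
# Same template string as in the app.
prompt_template = """
Provide a detailed summary of the following content in 300 words:
Content: {text}
"""


def render_prompt(text: str) -> str:
    """Fill the {text} placeholder with the content to summarize.

    `render_prompt` is a hypothetical helper for illustration.
    """
    return prompt_template.format(text=text)


rendered = render_prompt("Streamlit turns Python scripts into web apps.")
print(rendered)
```

Printing the rendered prompt is a quick way to check the wording before sending anything to the LLM.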

Defining a function for loading YouTube content

In this step, we will define a function that is responsible for fetching and loading YouTube content. This function takes the provided URL, extracts the relevant video metadata, and prepares it for summarization by the LLM.

def load_youtube_content(url):
    ydl_opts = {'format': 'bestaudio/best', 'quiet': True}
    with YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(url, download=False)
        title = info.get("title", "Video")
        description = info.get("description", "No description available.")
        return f"{title}\n\n{description}"

This function uses yt_dlp to extract information about the YouTube video without downloading it. It returns the title and description of the video, which will be summarized by the LLM.
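One detail worth noting: dict.get only falls back to its default when the key is absent. If the metadata dictionary contains a key whose value is None or an empty string, the default is not used. The sketch below shows a slightly more defensive way to build the text. The function name format_video_text is hypothetical, and whether a given video actually returns an empty description is an assumption.

```python
def format_video_text(info: dict) -> str:
    """Build the text to summarize from a video metadata dictionary.

    `format_video_text` is a hypothetical helper. It uses `or` so that
    missing, None, and empty values all fall back to the defaults.
    """
    title = info.get("title") or "Video"
    description = info.get("description") or "No description available."
    return f"{title}\n\n{description}"


# Example with a description that is present but empty (None).
print(format_video_text({"title": "Demo", "description": None}))
```

The returned string has the same shape as the one produced by load_youtube_content, so it could be dropped into that function's return statement.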

Handling summary logic

if st.button("Summarize"):
    if not generic_url.strip():
        st.error("Please provide a URL to proceed.")
    elif not validators.url(generic_url):
        st.error("Please enter a valid URL (YouTube or website).")
    else:
        try:
            with st.spinner("Processing..."):
                # Load content from URL
                if "youtube.com" in generic_url:
                    # Load YouTube content as a string
                    text_content = load_youtube_content(generic_url)
                    docs = [Document(page_content=text_content)]
                else:
                    loader = UnstructuredURLLoader(
                        urls=[generic_url],
                        ssl_verify=False,
                        headers={"User-Agent": "Mozilla/5.0"}
                    )
                    docs = loader.load()

                # Summarize using LangChain
                chain = load_summarize_chain(llm, chain_type="stuff", prompt=prompt)
                output_summary = chain.run(docs)

                st.subheader("Detailed Summary:")
                st.success(output_summary)

        except Exception as e:
            st.exception(e)

If it's a YouTube link, load_youtube_content extracts the title and description, and the resulting text is wrapped in a Document and stored in docs.
If it is a website, UnstructuredURLLoader retrieves the content as a list of documents.
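The check "youtube.com" in generic_url is a plain substring test, so it misses short youtu.be links and would also match any other site whose URL happens to contain that text. A minimal sketch of a stricter check based on the host name is shown below. The function name is_youtube_url and the set of hosts are assumptions for illustration, not an exhaustive list.

```python
from urllib.parse import urlparse

# Illustrative, non-exhaustive set of host names treated as YouTube.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    """Return True if the URL's host name is a known YouTube host.

    `is_youtube_url` is a hypothetical helper, not part of the original app.
    """
    host = (urlparse(url).hostname or "").lower()
    return host in YOUTUBE_HOSTS


print(is_youtube_url("https://youtu.be/abc123"))
print(is_youtube_url("https://example.com/?ref=youtube.com"))
```

Comparing the parsed host name rather than the raw string keeps query parameters and paths from influencing the decision.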

Running the summarization chain: The LangChain summarization chain processes the loaded content and uses the prompt template and the LLM to generate a summary.
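Because the "stuff" chain type places all of the loaded text into a single prompt, very long pages can exceed the model's context window. One common workaround is to split the text into overlapping chunks first. The sketch below is a plain-Python illustration of that idea. The function name split_text and the default sizes are assumptions, and character counts are only a rough stand-in for tokens.

```python
def split_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that sentences cut
    at a boundary still appear whole in one of the chunks.
    `split_text` is a hypothetical helper with illustrative defaults.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_text("abcdefghij", chunk_size=4, overlap=1))
```

LangChain also ships its own text splitters, and load_summarize_chain accepts other chain types such as "map_reduce" that summarize chunks separately before combining the results.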

To give the app a polished look and provide essential information, we will add a footer-style section to the Streamlit sidebar. This section can display upcoming features, important links, acknowledgments or contact details, keeping the user interface clean and professional.

st.sidebar.header("Features Coming Soon")
st.sidebar.write("- Option to download summaries")
st.sidebar.write("- Language selection for summaries")
st.sidebar.write("- Summary length customization")
st.sidebar.write("- Integration with other content platforms")

st.sidebar.markdown("---")
st.sidebar.write("Developed with ❤ by Gourav Lohar")

Output

Input: https://www.analyticsvidhya.com/blog/2024/10/nvidia-nim/

YouTube Video Summary

Input video:

Conclusion

By leveraging the LangChain framework, we streamlined interaction with the Llama 3.2 language model to generate high-quality summaries. Streamlit made it straightforward to build an intuitive web application, making the summarization tool accessible and engaging.

In conclusion, the article offers a practical approach and useful ideas for creating a comprehensive summarization tool. By combining cutting-edge language models with efficient frameworks and easy-to-use interfaces, we can open up new possibilities for consuming information and acquiring knowledge in today's content-rich world.

Key takeaways

LangChain facilitates development by providing a consistent approach to interacting with language models, managing prompts, and chaining processes.
Groq API's Llama 3.2 model demonstrates strong capabilities in understanding and condensing information, resulting in accurate and concise summaries.
The integration of tools like yt-dlp and UnstructuredURLLoader allows the application to handle content from various sources such as YouTube and web articles easily.
The web summarizer uses LangChain and Streamlit to provide fast and accurate summaries of YouTube videos and websites.
Leveraging the Llama 3.2 model, the web summarizer efficiently condenses complex content into easy-to-understand summaries.

Frequently asked questions

Q1. What is LangChain and why is it used in this application?

A. LangChain is a framework that simplifies interaction with large language models. It helps manage prompts, chain operations, and access multiple LLMs, making it easy to create applications like this summarizer.

Q2. Why was Llama 3.2 chosen as the language model?

A. Llama 3.2 generates high-quality text and is good at understanding and condensing information, making it well suited for summarization tasks. Its weights are also openly available.

Q3. Can this app summarize any YouTube video or web article?

A. While it can handle a wide range of content, there are limitations. Extremely long videos or articles may require additional features such as audio transcription or text splitting for optimal summaries.

Q4. Is the summary limited to English?

A. Currently yes. However, future improvements could include language selection for broader applicability.

Q5. How can I access and use this summarizer?

A. You must run the provided code in a Python environment with the necessary libraries installed. See GitHub for the full code and requirements.txt.

The media shown in this article is not the property of Analytics Vidhya and is used at the author's discretion.

Gourav Lohar

Hi, I'm Gourav, a data science enthusiast with a background in statistical analysis, machine learning and data visualization. My journey into the world of data began with the curiosity to unravel insights from data sets.
