I started my analysis by getting data from HuggingFace. The dataset is called financial-reports-sec (it is licensed under Apache 2.0, which permits commercial use) and, according to its authors, contains the annual reports that US public companies filed with the SEC EDGAR system from 1993 to 2020. Each annual report (10-K filing) is divided into 20 sections.
Two attributes of this data are relevant to the current task:
- Sentence: Excerpts from the 10-K filing reports
- Section: Tags that indicate the section of the 10-K filing to which the sentence belongs
I have focused on three sections:
- Business (Item 1): Describes the company’s business, including subsidiaries, markets, recent events, competition, regulations, and labor. Denoted by 0 in the data.
- Risk Factors (Item 1A): Analyzes risks that could affect the company, such as external factors, potential failures, and other disclosures to warn investors. Denoted by 1.
- Properties (Item 2): Details important physical property assets. It does not include intellectual or intangible assets. Denoted by 3.
For each label, I sampled 10 examples without replacement. The data is structured as follows:
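The sampling step can be sketched as follows. This is a minimal illustration, not the author's code: it uses plain Python rather than a data frame, the row keys `sentence` and `section` are assumed from the attribute names above, and the rows are placeholders rather than real filings.

```python
import random

# Section codes as given in the article: 0 = Business, 1 = Risk Factors, 3 = Properties.
SECTION_TO_ITEM = {0: "Item 1", 1: "Item 1A", 3: "Item 2"}

def sample_per_label(rows, n=10, seed=0):
    """Draw n rows per section label without replacement."""
    rng = random.Random(seed)
    sampled = []
    for code, item in SECTION_TO_ITEM.items():
        pool = [r for r in rows if r["section"] == code]
        # random.sample never picks the same element twice
        for row in rng.sample(pool, n):
            sampled.append({"sentence": row["sentence"], "label": item})
    return sampled

# Placeholder rows standing in for the real dataset (hypothetical content).
rows = [{"sentence": f"placeholder sentence {i}", "section": code}
        for code in (0, 1, 2, 3) for i in range(25)]

subset = sample_per_label(rows)
print(len(subset))  # 30
```

With pandas, `df.groupby("section").sample(n=10, random_state=0)` should give a comparable result after filtering to the three section codes.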
Once the data is ready, all I have to do is create a classifier function that takes the sentence from the data frame and predicts the label.
import openai  # written for the pre-1.0 openai package, which exposes openai.ChatCompletion

Role = '''
You are expert in SEC 10-K forms.
You will be presented by a text and you need to classify the text into either 'Item 1', 'Item 1A' or 'Item 2'.
The text only belongs to one of the mentioned categories so only return one category.
'''

def sec_classifier(text):
    # Send the system prompt plus the sentence to classify
    response = openai.ChatCompletion.create(
        model='gpt-4',
        messages=[
            {"role": "system", "content": Role},
            {"role": "user", "content": text},
        ],
        temperature=0,
        max_tokens=256,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0)
    # The reply text is in the first choice's message
    return response['choices'][0]['message']['content']
I’m using GPT-4 here because it is OpenAI’s most capable model to date. I also set the temperature to 0 to keep the output as deterministic as possible. The really fun part is how I define the Role; that is where I guide the model on what I want it to do. The Role tells the model to stay focused and deliver the type of result I’m looking for. Defining a clear role helps the model generate high-quality, relevant responses. The message in this function is:
You are an expert in SEC 10-K forms.
You will be presented with a text and must classify the text into ‘Item 1’, ‘Item 1A’ or ‘Item 2’.
The text only belongs to one of the mentioned categories, so only return one category.
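Running the classifier over every row also means turning free-text replies back into labels, since the model may answer "Item 1A." or "item 2" rather than the exact string. The sketch below shows one way to do that. The helper names `normalize_label` and `classify_all` are hypothetical, and `fake_classifier` is a stand-in so the example runs without calling the API.

```python
import re

# Try '1A' before '1' so the longer label wins; \b rejects e.g. 'Item 10'.
_ITEM_PATTERN = re.compile(r"item\s*(1a|1|2)\b", re.IGNORECASE)

def normalize_label(response_text):
    """Map a free-text model reply to 'Item 1', 'Item 1A', 'Item 2', or None."""
    match = _ITEM_PATTERN.search(response_text)
    if match is None:
        return None
    return "Item " + match.group(1).upper()

def classify_all(sentences, classifier):
    """Apply classifier (any callable str -> str) to every sentence."""
    return [normalize_label(classifier(s)) for s in sentences]

def fake_classifier(text):
    # Hypothetical stand-in for sec_classifier so the sketch runs offline.
    return "Item 1A." if "risk" in text.lower() else "item 2"

print(classify_all(["Risk of flooding", "We own a warehouse"], fake_classifier))
# ['Item 1A', 'Item 2']
```

Replies that match none of the three labels come back as `None`, which makes them easy to count separately before building the report.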
After applying the classifier function to all rows of the data, I generated a classification report to evaluate the model’s performance. The macro average F1 score was 0.62, indicating reasonably strong predictive capability for this multi-class problem. Since the number of examples was balanced across the 3 classes, the macro and weighted averages are the same. This benchmark score reflects the out-of-the-box accuracy of the pre-trained model before any additional tuning or optimization.
              precision    recall  f1-score   support
      Item 1       0.47      0.80      0.59        10
     Item 1A       0.80      0.80      0.80        10
      Item 2       1.00      0.30      0.46        10
    accuracy                           0.63        30
   macro avg       0.76      0.63      0.62        30
weighted avg       0.76      0.63      0.62        30
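The averages in the report can be checked by hand. The sketch below recomputes each class's F1 from the precision and recall reported above, then takes the macro (unweighted) and support-weighted means. With 10 examples per class the two averages coincide, as noted earlier.

```python
# Per-class precision, recall and support copied from the report above.
report = {
    "Item 1":  {"precision": 0.47, "recall": 0.80, "support": 10},
    "Item 1A": {"precision": 0.80, "recall": 0.80, "support": 10},
    "Item 2":  {"precision": 1.00, "recall": 0.30, "support": 10},
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

scores = {label: f1(r["precision"], r["recall"]) for label, r in report.items()}
total = sum(r["support"] for r in report.values())

# Macro: every class counts equally. Weighted: classes count by support.
macro_f1 = sum(scores.values()) / len(scores)
weighted_f1 = sum(scores[label] * report[label]["support"] for label in report) / total

print({label: round(s, 2) for label, s in scores.items()})
# {'Item 1': 0.59, 'Item 1A': 0.8, 'Item 2': 0.46}
print(round(macro_f1, 2), round(weighted_f1, 2))
# 0.62 0.62
```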
As mentioned, few-shot learning is all about generalizing the model with a few good examples. To that end, I modified my classifier by describing what Item 1, Item 1A, and Item 2 are (based on Wikipedia):
Role_fewshot = '''
You are expert in SEC 10-K forms.
You will be presented by a text and you need to classify the text into either 'Item 1', 'Item 1A' or 'Item 2'.
The text only belongs to one of the mentioned categories so only return one category.
In your classification take the following definitions into account: Item 1 (i.e. Business) describes the business of the company: who and what the company does, what subsidiaries it owns, and what markets it operates in.
It may also include recent events, competition, regulations, and labor issues. (Some industries are heavily regulated, have complex labor requirements, which have significant effects on the business.)
Other topics in this section may include special operating costs, seasonal factors, or insurance matters.
Item 1A (i.e. Risk Factors) is the section where the company lays anything that could go wrong, likely external effects, possible future failures to meet obligations, and other risks disclosed to adequately warn investors and potential investors.
Item 2 (i.e. Properties) is the section that lays out the significant properties, physical assets, of the company. This only includes physical types of property, not intellectual or intangible property.
Note: Only state the Item.
'''
def sec_classifier_fewshot(text):
    # Same call as before, but with the extended system prompt
    response = openai.ChatCompletion.create(
        model='gpt-4',
        messages=[
            {"role": "system", "content": Role_fewshot},
            {"role": "user", "content": text},
        ],
        temperature=0,
        max_tokens=256,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0)
    # The reply text is in the first choice's message
    return response['choices'][0]['message']['content']
The message now says:
You are an expert in SEC 10-K forms.
You will be presented with a text and must classify the text into ‘Item 1’, ‘Item 1A’ or ‘Item 2’.
The text only belongs to one of the mentioned categories, so only return one category.
In your classification take into account the following definitions: Item 1 (i.e. Business) describes the company’s business: who and what the company does, what subsidiaries it owns, and in what markets it operates.
It may also include recent events, competition, regulations, and labor issues. (Some industries are heavily regulated and have complex labor requirements, which have significant effects on the business.)
Other topics in this section may include special operating costs, seasonal factors, or insurance issues.
Item 1A (i.e. Risk Factors) is the section where the company sets out everything that could go wrong, possible external effects, possible future failures to meet obligations, and other disclosed risks to adequately warn investors and potential investors.
Item 2 (i.e. Properties) is the section that sets out the important properties, the physical assets, of the company. This only includes types of physical property, not intellectual or intangible property.
If we run this on the texts we get the following performance:
              precision    recall  f1-score   support
      Item 1       0.70      0.70      0.70        10
     Item 1A       0.78      0.70      0.74        10
      Item 2       0.91      1.00      0.95        10
    accuracy                           0.80        30
   macro avg       0.80      0.80      0.80        30
weighted avg       0.80      0.80      0.80        30
The macro average F1 is now 0.80, a 29% improvement in our prediction, achieved just by providing a good description of each class.
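The 29% figure is a relative gain over the baseline, not a difference in percentage points. A quick check using the two macro F1 scores reported above:

```python
baseline_f1 = 0.62   # macro F1 with the short prompt
improved_f1 = 0.80   # macro F1 with the class definitions added

absolute_gain = improved_f1 - baseline_f1
relative_gain = absolute_gain / baseline_f1

print(f"absolute gain: {absolute_gain:.2f}")   # absolute gain: 0.18
print(f"relative gain: {relative_gain:.0%}")   # relative gain: 29%
```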
Finally you can see the full data set:
In fact, the examples I provided give the model concrete instances to learn from. Examples allow the model to infer patterns and characteristics: by looking at multiple examples, the model can begin to notice the commonalities and differences that characterize the overall concept being learned. This helps the model form a more robust representation. Additionally, providing examples acts as a weak form of supervision, guiding the model toward the desired behavior without requiring large labeled datasets.
In the few-shot function, concrete examples help signal to the model which types of information and patterns to pay attention to. In summary, concrete examples are important for few-shot learning, as they provide anchor points from which the model builds an initial representation of a novel concept and then refines it. Inductive learning from specific instances helps models develop nuanced representations of abstract concepts.
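The few-shot prompt above carries class definitions rather than labeled sentences. One common way to supply labeled examples as well is to place them in the conversation as earlier user and assistant turns. The sketch below only builds such a message list. It is an assumption about how the classifier could be extended, not what was run for the results above, and the example sentences are invented for illustration.

```python
def build_fewshot_messages(system_prompt, examples, text):
    """Build a chat message list with labeled examples placed before the query."""
    messages = [{"role": "system", "content": system_prompt}]
    for example_text, label in examples:
        # Each example becomes a user turn followed by the expected reply
        messages.append({"role": "user", "content": example_text})
        messages.append({"role": "assistant", "content": label})
    # The sentence to classify goes last
    messages.append({"role": "user", "content": text})
    return messages

# Hypothetical examples, not taken from the dataset.
examples = [
    ("We lease office space for our headquarters.", "Item 2"),
    ("Our results could suffer if demand declines.", "Item 1A"),
]

messages = build_fewshot_messages(
    "You are expert in SEC 10-K forms.", examples, "We sell software to banks.")
print(len(messages))  # 6
```

The resulting list can be passed as the `messages` argument in place of the two-element list used in the classifier functions.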
If you’ve enjoyed reading this and want to keep in touch, you can find me on my LinkedIn or through my website: iliateimouri.com
Note: All images, unless otherwise noted, are the author’s own.