Enter the experiment.
I decided over the weekend show, don’t tell. I wanted Skip to experience the new technology for himself, not just listen to me talk about it. What if we could make a business plan to start a hypothetical new farm in less than 40 minutes? You could do this using a ChatGPT 4 with Advanced Data Analysis enabled, sharing the screen with your iPhone; we would collaborate together to build it.
We briefly discuss our assumptions:
- We have 1,000 acres to farm in East Texas
- We have $1 million in initial capital to purchase equipment with
- We need to get a line of credit from a bank on year 1 operating expenses to get started.
- We are interested in growing wheat and soybeans (what Skip actually grew back in the day)
We start simple:
Question 1: “Write me an outline of a business plan to start a new farm in East Texas to grow wheat and soybeans.”
Skip was immediately impressed with the long-form results. But wait, we were just getting started…
Question 2: “Give me a list of the equipment I need to get started”
Question 3: “Put it in a table and prioritize it numerically based on what is most urgently needed and what is least urgent.”
Question 4: “Now create a new column for budget allocation. Suppose I am going to use a combination of leases and purchases, create an additional column to recommend leasing versus buying, assume I have $1 million in initial capital.”
(Elapsed time 4 minutes)
Result:
Question 5: “How much should I set aside to request a variable cost line of credit (seed, fertilizer, labor, etc.), let’s say 1000 hectares? Create a new table.”
Result:
At that point, I felt like Skip had just witnessed Apollo 11 all over again. He could barely contain his excitement. But we weren’t done yet…
Question 6: “Write me a Monte Carlo simulation where I can adjust my input assumptions and test the result on my budget”
Results:
import numpy as np
import matplotlib.pyplot as plt# Define number of simulations
n_simulations = 10000
# Assumptions with (mean, std deviation)
wheat_seed_cost = (20, 2) # Cost per acre
soybean_seed_cost = (50, 5)
fertilizer_wheat_cost = (50, 5)
fertilizer_soybean_cost = (40, 4)
labor_cost = (25, 3)
pesticide_cost = (30, 3)
fuel_maintenance_cost = (15, 2)
miscellaneous_cost = (10, 1)
def simulate_farm_costs():
wheat_seed = np.random.normal(wheat_seed_cost(0), wheat_seed_cost(1))
soybean_seed = np.random.normal(soybean_seed_cost(0), soybean_seed_cost(1))
fertilizer_wheat = np.random.normal(fertilizer_wheat_cost(0), fertilizer_wheat_cost(1))
fertilizer_soybean = np.random.normal(fertilizer_soybean_cost(0), fertilizer_soybean_cost(1))
labor = np.random.normal(labor_cost(0), labor_cost(1))
pesticide = np.random.normal(pesticide_cost(0), pesticide_cost(1))
fuel_maintenance = np.random.normal(fuel_maintenance_cost(0), fuel_maintenance_cost(1))
miscellaneous = np.random.normal(miscellaneous_cost(0), miscellaneous_cost(1))
total_cost = (wheat_seed + soybean_seed + fertilizer_wheat + fertilizer_soybean +
labor + pesticide + fuel_maintenance + miscellaneous) * 1000 # for 1000 acres
return total_cost
results = (simulate_farm_costs() for _ in range(n_simulations))
# Visualization
plt.hist(results, bins=50, edgecolor='black', alpha=0.7)
plt.title('Distribution of Total Farm Costs')
plt.xlabel('Total Cost')
plt.ylabel('Frequency')
plt.grid(True, alpha=0.2)
plt.show()
Now comes the moment of truth: will the code run? We jump to the Jupyter notebook to run the code. And there it is on the first try:
At this point, we were only 10-15 minutes in. We still have some time to spare… Could we make it interactive?
We need to create a list of parameters that allow the user to dynamically adjust the assumptions in a slider. Again we use ChatGPT to get some recommendations on what those parameters should be based on the Monte Carlo we built earlier:
Once we have our parameter list created, we create a ‘measures’ table in Power BI linked to 16 slicer visuals that allow the user to manually select their inputs and dynamically update the Monte Carlo simulation. To do this, we create a ‘Python Visual’ in Power BI, drag in all our measures, and then update the code as follows:
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script: # dataset = pandas.DataFrame(fertilizer_soybean_cost_avg Value, fertilizer_soybean_cost_std Value, fertilizer_wheat_cost_avg Value, fertilizer_wheat_cost_std Value, fuel_maintenance_cost_avg Value, fuel_maintenance_cost_std Value, labor_cost_avg Value, labor_cost_std Value, miscellaneous_cost_avg Value, miscellaneous_cost_std Value, pesticide_cost_avg Value, pesticide_cost_std Value, soybean_seed_cost_avg Value, wheat_seed_cost_avg Value, wheat_seed_cost_std Value)
# dataset = dataset.drop_duplicates()
# Paste or type your script code here:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Assuming the data from Power BI is passed as 'dataset'
df = dataset
# Fetch values from the dataset
wheat_seed_cost_avg = df('wheat_seed_cost_avg Value').iloc(0)
wheat_seed_cost_std = df('wheat_seed_cost_std Value').iloc(0)
soybean_seed_cost_avg = df('soybean_seed_cost_avg Value').iloc(0)
soybean_seed_cost_std = df('soybean_seed_cost_std Value').iloc(0)
fertilizer_wheat_cost_avg = df('fertilizer_wheat_cost_avg Value').iloc(0)
fertilizer_wheat_cost_std = df('fertilizer_wheat_cost_std Value').iloc(0)
fertilizer_soybean_cost_avg = df('fertilizer_soybean_cost_avg Value').iloc(0)
fertilizer_soybean_cost_std = df('fertilizer_soybean_cost_std Value').iloc(0)
labor_cost_avg = df('labor_cost_avg Value').iloc(0)
labor_cost_std = df('labor_cost_std Value').iloc(0)
pesticide_cost_avg = df('pesticide_cost_avg Value').iloc(0)
pesticide_cost_std = df('pesticide_cost_std Value').iloc(0)
fuel_maintenance_cost_avg = df('fuel_maintenance_cost_avg Value').iloc(0)
fuel_maintenance_cost_std = df('fuel_maintenance_cost_std Value').iloc(0)
miscellaneous_cost_avg = df('miscellaneous_cost_avg Value').iloc(0)
miscellaneous_cost_std = df('miscellaneous_cost_std Value').iloc(0)
# Define number of simulations
n_simulations = 10000
# Assumptions with (mean, std deviation)
wheat_seed_cost = (wheat_seed_cost_avg, wheat_seed_cost_std)
soybean_seed_cost = (soybean_seed_cost_avg, soybean_seed_cost_std)
fertilizer_wheat_cost = (fertilizer_wheat_cost_avg, fertilizer_wheat_cost_std)
fertilizer_soybean_cost = (fertilizer_soybean_cost_avg, fertilizer_soybean_cost_std)
labor_cost = (labor_cost_avg, labor_cost_std)
pesticide_cost = (pesticide_cost_avg, pesticide_cost_std)
fuel_maintenance_cost = (fuel_maintenance_cost_avg, fuel_maintenance_cost_std)
miscellaneous_cost = (miscellaneous_cost_avg, miscellaneous_cost_std)
def simulate_farm_costs():
wheat_seed = np.random.normal(wheat_seed_cost(0), wheat_seed_cost(1))
soybean_seed = np.random.normal(soybean_seed_cost(0), soybean_seed_cost(1))
fertilizer_wheat = np.random.normal(fertilizer_wheat_cost(0), fertilizer_wheat_cost(1))
fertilizer_soybean = np.random.normal(fertilizer_soybean_cost(0), fertilizer_soybean_cost(1))
labor = np.random.normal(labor_cost(0), labor_cost(1))
pesticide = np.random.normal(pesticide_cost(0), pesticide_cost(1))
fuel_maintenance = np.random.normal(fuel_maintenance_cost(0), fuel_maintenance_cost(1))
miscellaneous = np.random.normal(miscellaneous_cost(0), miscellaneous_cost(1))
total_cost = (wheat_seed + soybean_seed + fertilizer_wheat + fertilizer_soybean +
labor + pesticide + fuel_maintenance + miscellaneous) * 1000 # for 1000 acres
return total_cost
results = (simulate_farm_costs() for _ in range(n_simulations))
# Convert results into a dataframe
df_simulated_results = pd.DataFrame(results, columns=('Total Cost'))
# Calculate the Interquartile Range (IQR)
Q1 = df_simulated_results('Total Cost').quantile(0.25)
Q3 = df_simulated_results('Total Cost').quantile(0.75)
# Plotting the histogram
plt.figure(figsize=(10, 6))
n, bins, patches = plt.hist(df_simulated_results('Total Cost'), bins=50, color='blue', edgecolor='black', alpha=0.7)
plt.title('Distribution of Year 1 Variable Farm Costs from Simulation')
plt.xlabel('Year 1 Variable Cost')
plt.ylabel('Frequency')
plt.grid(True, which='both', linestyle='--', linewidth=0.5)
# Shade the IQR
for i in range(len(bins)):
if bins(i) > Q1 and bins(i) < Q3:
patches(i).set_facecolor('green')
plt.axvline(Q1, color='red', linestyle='dashed', linewidth=1)
plt.axvline(Q3, color='red', linestyle='dashed', linewidth=1)
plt.tight_layout()
plt.savefig('simulated_costs_histogram.png') # This will save the figure as an image file
plt.show()
Just for fun, we asked ChatGPT to define the interquartile range (IQR) and shade it a different color; We also manually update the chart labels and x-axis. The rest was just cleaning up the images in Power BI a bit to make it a little easier to use. The ending turned out:
We now have a dynamic Monte Carlo simulation to play with different input cost assumptions and project the variable operating expense we would need to get our farming operation off the ground. Leveraging ChatGPT 4, we wrote virtually no code, just modified it a little, did most of the work from an iPhone via screen sharing, created the last piece on the Power BI desktop, and shared the screen through the app Power BI for iPhone. All in about 30-40 minutes of work.
My grandfather’s verdict? “We accomplished in 40 minutes what would have taken him 2 years to do on his own in the past.” Yes, I recognize that there is much more. We could do – and the “simulation” is far from perfect. (For example, we do not delineate between the percentage of crops devoted to soybeans and wheat.) But for 40 minutes? Even I was impressed. And this is the promise of Generation ai: democratizing data science, encouraging experimentation, and accelerating the ability to develop it, all at our fingertips. Allowing a grandfather and his grandson the opportunity to reconnect over some statistics and leverage technology in new and unexpected ways.