As a climate scientist, I consider Google Earth Engine (GEE) one of the most powerful tools in my toolbox. No more downloading heavy satellite images to my computer.
GEE's primary API is JavaScript, although Python users can also access a powerful API to perform similar tasks. Unfortunately, there are fewer materials for learning GEE with Python.
However, I love Python. Since I learned that GEE has a Python API, I imagined a world of possibilities by combining GEE's powerful cloud processing capabilities with Python frameworks.
The five lessons come from my most recent project, which involved analyzing water balance and drought in a watershed in Ecuador. However, the tips, code snippets, and examples could be applied to any project.
The story presents each lesson following the sequence of any data analysis project: data preparation (and planning), data analysis, and visualization.
I also provide some general advice that applies regardless of which language you use.
This beginner-level GEE article assumes an understanding of Python and some geospatial concepts.
If you know Python but are new to GEE (like I was for some time), you should know that GEE has functions optimized for processing satellite images. We won't delve into the details of these features here; you should check the official documentation.
However, my advice is to first check whether GEE can perform the analysis you need. When I started using GEE, I used it as a catalog to search for data, relying only on its basic functions. I would then write Python code for most of the analysis. While this approach can work, it often poses significant challenges. I will discuss these challenges in later lessons.
Don't limit yourself to learning just the basics of GEE. If you know Python (or coding in general), the learning curve for these functions is not very steep. Try to use them as much as possible; it's worth it in terms of efficiency.
A final note: GEE even supports machine learning tasks. These features are easy to implement and can help you solve many problems. Only when you cannot solve your problem with these functions should you consider writing Python code from scratch.
As an example for this lesson, consider the implementation of a clustering algorithm.
Example code with GEE functions
import ee  # assumes ee.Initialize() has already been called

# clustering_image (ee.Image) and galapagos_aoi (ee.Geometry) are defined earlier

# Sample the image to create input for clustering
sample_points = clustering_image.sample(
    region=galapagos_aoi,
    scale=30,  # Scale in meters
    numPixels=5000,  # Number of points to sample
    geometries=False,  # Don't include geometry to save memory
)

# Apply k-means clustering (unsupervised)
clusterer = ee.Clusterer.wekaKMeans(5).train(sample_points)

# Cluster the image
result = clustering_image.cluster(clusterer)
Example code with Python
import numpy as np
from osgeo import gdal, gdal_array

# Tell GDAL to throw Python exceptions and register all drivers
gdal.UseExceptions()
gdal.AllRegister()

# Open the .tiff file
img_ds = gdal.Open('Sentinel-2_L2A_Galapagos.tiff', gdal.GA_ReadOnly)
if img_ds is None:
    raise FileNotFoundError("The specified file could not be opened.")

# Prepare an empty array to store the image data for all bands
img = np.zeros(
    (img_ds.RasterYSize, img_ds.RasterXSize, img_ds.RasterCount),
    dtype=gdal_array.GDALTypeCodeToNumericTypeCode(img_ds.GetRasterBand(1).DataType),
)

# Read each band into the corresponding slice of the array
for b in range(img_ds.RasterCount):
    img[:, :, b] = img_ds.GetRasterBand(b + 1).ReadAsArray()
print("Shape of the image with all bands:", img.shape)  # (height, width, num_bands)

# Reshape for processing
new_shape = (img.shape[0] * img.shape[1], img.shape[2])  # (num_pixels, num_bands)
x = img.reshape(new_shape)
print("Shape of reshaped data for all bands:", x.shape)  # (num_pixels, num_bands)
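The first block of code is not only shorter, but it also handles large satellite data sets more efficiently, because GEE functions are designed to scale in the cloud. Note that the Python block only loads and reshapes the image; the clustering step itself would still need to be written.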
While GEE's features are powerful, understanding the limitations of cloud processing is crucial when scaling your project.
Access to free cloud computing resources to process satellite images is a blessing. However, it is not surprising that GEE imposes limits to ensure a fair distribution of resources. If you plan to use it for a large-scale non-commercial project (for example, deforestation research in the Amazon region) and intend to stay within the limits of the free tier, you should plan accordingly. My general guidelines are:
- Limit the size of your regions, divide them and work in batches. I didn't need to do this in my project because I was working with a single small watershed. However, if your project involves large geographic areas this would be the logical first step.
- Optimize your scripts by prioritizing the use of GEE functions (see Lesson 1).
- Choose data sets that let you save computing power. For example, in my last project, I used the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS). The original data set has a daily temporal resolution. However, it offers an alternative version called “PENTAD”, which provides data every five days; each value is the sum of the rainfall over those five days. Using this aggregated version saved computing power without sacrificing the quality of my results.
- Examine the description of your data set, as it may reveal scale factors that save computing power. For example, in my water balance project, I used the MOD16 data set from the Moderate Resolution Imaging Spectroradiometer (MODIS), which is a readily available evapotranspiration (ET) product. According to the documentation, the stored values must be multiplied by a scale factor of 0.1. Scale factors reduce storage requirements because the values can be kept in a more compact data type.
- If worst comes to worst, be prepared to compromise. Reduce the resolution of the analysis if the standards of your study allow it. For example, the GEE function “reduceRegion” allows you to summarize the values of a region (sum, mean, etc.). It has a parameter called “scale” that allows you to change the scale of the analysis. For example, if your satellite data has a resolution of 10 m and GEE cannot process your analysis, you can adjust the scale parameter to a lower resolution (for example, 50 m).
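The first guideline, dividing a large region and working in batches, can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the helper `tile_bbox` and the bounding box coordinates are hypothetical, and each tile would still need to be converted to a GEE geometry and processed in its own request.

```python
def tile_bbox(west, south, east, north, step_deg):
    """Split a lon/lat bounding box into tiles at most step_deg wide and tall."""
    if step_deg <= 0:
        raise ValueError("step_deg must be positive")
    tiles = []
    y = south
    while y < north:
        y_top = min(y + step_deg, north)
        x = west
        while x < east:
            x_right = min(x + step_deg, east)
            tiles.append((x, y, x_right, y_top))  # (west, south, east, north)
            x = x_right
        y = y_top
    return tiles


# Hypothetical bounding box in degrees, split into 1-degree tiles
tiles = tile_bbox(-80.0, -5.0, -77.0, -3.0, 1.0)
print(len(tiles), "tiles; first tile:", tiles[0])

# Each tile could then be processed as its own batch, for example:
#   region = ee.Geometry.Rectangle(list(tile))
#   stats = image.reduceRegion(reducer=..., geometry=region, scale=...)
```

Smaller tiles mean more requests but less work per request, so the tile size is a trade-off you would tune for your own study area.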
As an example from my drought and water balance project, consider the following code block:
# Reduce the collection to a single image (mean MSI over the time period)
MSI_mean = MSI_collection.select('MSI').mean().clip(pauteBasin)

# Use reduceRegion to calculate the min and max
stats = MSI_mean.reduceRegion(
    reducer=ee.Reducer.minMax(),  # Reducer to get min and max
    geometry=pauteBasin,  # Specify the ROI
    scale=500,  # Scale in meters
    maxPixels=1e9  # Maximum number of pixels to process
)

# Get the results as a dictionary
min_max = stats.getInfo()

# Print the min and max values
print('Min and Max values:', min_max)
In my project, I used a Sentinel-2 satellite image to calculate the soil moisture index (MSI). Next, I applied the GEE function “reduceRegion”, which calculates a summary of values in a region (mean, sum, etc.).
In my case, I needed to find the maximum and minimum MSI values to check if my results made sense. The following graph shows the spatially distributed MSI values in my study region.
The original image has a resolution of 10 m. GEE had difficulty processing the data. Therefore, I used the scale parameter and lowered the resolution to 500 m. After changing this parameter, GEE was able to process the data.
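To get a feel for how much the scale parameter changes the workload, the rough arithmetic below estimates how many pixels cover a region at different scales. It is a back-of-the-envelope sketch: the helper `estimate_pixel_count` is hypothetical, and the 5,000 km² area is an assumed value for illustration, not the measured size of my watershed.

```python
def estimate_pixel_count(area_km2, scale_m):
    """Approximate number of square pixels of size scale_m covering area_km2."""
    area_m2 = area_km2 * 1_000_000
    return area_m2 / (scale_m ** 2)


basin_area_km2 = 5000  # assumed area for illustration only
for scale_m in (10, 50, 500):
    n_pixels = estimate_pixel_count(basin_area_km2, scale_m)
    print(f"{scale_m:>4} m -> {n_pixels:,.0f} pixels")
```

Going from 10 m to 500 m divides the pixel count by (500 / 10)² = 2,500. The price is that the statistics are computed on coarser, aggregated pixels, so local extremes are likely to be smoothed.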
I'm obsessed with data quality. As a result, I rarely trust data without verifying it first. I like to spend time ensuring the data is ready for analysis. However, don't let image corrections stall your progress.
My tendency to spend too much time on image corrections stems from having learned remote sensing the “old-fashioned” way. By this I mean using software to apply atmospheric and geometric corrections to images manually.
Today, the scientific agencies that support satellite missions can deliver images with a high level of preprocessing. In fact, a great feature of GEE is its catalog, which makes it easy to find analysis-ready products.
Preprocessing is often the most time-consuming task in a data science project. Therefore, it must be planned and managed properly.
The best approach before starting a project is to establish data quality standards. Based on your standards, allocate enough time to find the best product (which GEE makes easy) and apply only necessary fixes (e.g. cloud masking).
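As an illustration of a "necessary fix", here is a minimal cloud-masking sketch for Sentinel-2 Level-2A data. It follows the commonly documented QA60 bitmask approach and is not necessarily what I ran in my project. The collection ID, band name, property name, and the 20% cloud threshold are assumptions to check against the catalog page; QA60 availability has varied over time, so verify it for your dates.

```python
CLOUD_BIT_MASK = 1 << 10   # QA60 bit 10: opaque clouds
CIRRUS_BIT_MASK = 1 << 11  # QA60 bit 11: cirrus clouds


def mask_s2_clouds(image):
    """Mask pixels flagged as cloud or cirrus in the QA60 band of an ee.Image."""
    qa = image.select("QA60")
    clear = (
        qa.bitwiseAnd(CLOUD_BIT_MASK).eq(0)
        .And(qa.bitwiseAnd(CIRRUS_BIT_MASK).eq(0))
    )
    return image.updateMask(clear)


def load_masked_collection(region, start, end, max_cloud_pct=20):
    """Build a cloud-masked Sentinel-2 collection (requires ee.Initialize())."""
    import ee  # imported lazily; needs an authenticated Earth Engine session

    return (
        ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")  # assumed catalog ID
        .filterBounds(region)
        .filterDate(start, end)
        .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", max_cloud_pct))
        .map(mask_s2_clouds)
    )
```

A call such as `load_masked_collection(my_region, "2023-01-01", "2023-12-31")` (with a hypothetical `my_region` geometry) returns a collection that can be reduced or mapped like any other.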
If you love programming in Python (like I do), you may often find yourself coding everything from scratch.
As a PhD student (just starting to code), I wrote a script to perform a t-test on a study region. Later, I discovered a Python library that performed the same task. When I compared the results of my script with those from the library, they matched. However, using the library from the beginning would have saved me time.
I'm sharing this lesson to help you avoid these silly mistakes with GEE. I will mention two examples from my water balance project.
Example 1
To calculate the water balance in my watershed, I needed ET data. ET is not an observed variable (like precipitation); you have to calculate it.
The calculation of ET is not trivial. You can look up the equations in textbooks and implement them in Python. However, some researchers published papers related to this calculation and shared their results with the community.
This is where GEE comes into play. The GEE catalog provides not only observed data (as I initially thought) but also many derived products or modeled data sets (e.g. reanalysis data, land cover, vegetation indices, etc.). Guess what? I found a ready-made global ET data set in the GEE catalog – a lifesaver!
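A minimal sketch of how such a ready-made ET product might be loaded and converted to physical units follows. The catalog ID `MODIS/061/MOD16A2`, the band name `ET`, and the helper names are assumptions for illustration; check the catalog page for the exact product and scale factor you need.

```python
MOD16_ID = "MODIS/061/MOD16A2"  # assumed catalog ID for the 8-day ET product
ET_SCALE = 0.1                  # scale factor stated in the data set description


def to_physical_et(raw_value, scale=ET_SCALE):
    """Convert a stored ET value to physical units using the scale factor."""
    return raw_value * scale


def mean_et_image(region, start, end):
    """Mean scaled ET image over a period (requires ee.Initialize())."""
    import ee  # imported lazily; needs an authenticated Earth Engine session

    collection = ee.ImageCollection(MOD16_ID).filterDate(start, end).select("ET")
    return collection.mean().multiply(ET_SCALE).clip(region)


# A stored value of 253 corresponds to about 25.3 in physical units
print(round(to_physical_et(253), 1))
```

Applying the scale factor once, on the reduced image, avoids multiplying every image in the collection.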
Example 2
I also consider myself a Geographic Information Systems (GIS) professional. Over the years, I have acquired a substantial amount of GIS data for my work, such as watershed boundaries in Shapefile format.
In my water balance project, my first instinct was to import the shapefile of my watershed boundaries into my GEE project. From there, I transformed the file into a GeoPandas object and continued my analysis.
In this case, I wasn't as lucky as in Example 1. I wasted precious time trying to work with a GeoPandas object that I couldn't integrate well with GEE. In the end, this approach didn't make sense: the GEE catalog already includes an easy-to-use product that delineates hydrographic basins.
Therefore, a key takeaway is to keep your workflow within GEE whenever possible.
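As a sketch of what staying within GEE can look like, the snippet below selects the catalog basin polygon that contains a point. It assumes the HydroSHEDS HydroBASINS collections are published under `WWF/HydroSHEDS/v1/Basins/hybas_<level>` with levels 1 to 12; the helper names and the example coordinates are hypothetical.

```python
def hydrobasins_asset(level):
    """Return the assumed catalog ID for a HydroBASINS level (1 = coarsest)."""
    if not 1 <= level <= 12:
        raise ValueError("HydroBASINS levels range from 1 to 12")
    return f"WWF/HydroSHEDS/v1/Basins/hybas_{level}"


def basin_containing(lon, lat, level=7):
    """Return the basin polygon(s) containing a point (requires ee.Initialize())."""
    import ee  # imported lazily; needs an authenticated Earth Engine session

    point = ee.Geometry.Point([lon, lat])
    return ee.FeatureCollection(hydrobasins_asset(level)).filterBounds(point)


print(hydrobasins_asset(7))
# Example call with hypothetical coordinates:
#   basin = basin_containing(-78.9, -2.9, level=7)
```

The resulting feature collection can be passed directly as the `geometry` of a reduction or used to clip images, with no conversion to GeoPandas.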
As mentioned at the beginning of this article, integrating GEE with Python libraries can be incredibly powerful.
However, even for simple analyses and plots, the integration is not straightforward.
This is where geemap comes in. It is a Python package designed for interactive geospatial analysis and visualization with GEE.
I also discovered that it can help create static maps in Python. I made layouts using GEE and geemap in my water balance and drought project. The images included in this story were produced with these tools.
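A minimal sketch of an interactive map with geemap is shown below. The visualization parameters, zoom level, and helper name are hypothetical choices for illustration, and the function expects an `ee.Image` and a region that you have already built.

```python
# Hypothetical visualization parameters for a single-band index image
MSI_VIS = {"min": 0.0, "max": 2.0, "palette": ["blue", "white", "red"]}


def make_map(image, region, vis_params=MSI_VIS, layer_name="MSI", zoom=9):
    """Return an interactive geemap map showing image centered on region."""
    import geemap  # imported lazily; needs geemap and an Earth Engine session

    m = geemap.Map()
    m.centerObject(region, zoom)
    m.addLayer(image, vis_params, layer_name)
    return m
```

In a Jupyter notebook, evaluating the returned map object in a cell displays it. For static figures, geemap also ships a `cartoee` module, which is worth exploring in its documentation.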
GEE is a powerful tool. However, as a beginner, obstacles are inevitable. This article provides tips and tricks to help you get started on the right foot with the GEE Python API.
European Space Agency (2025). Harmonized Sentinel-2 MSI: MultiSpectral Instrument, Level-2A.
Friedl, M., Sulla-Menashe, D. (2022). MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 500m SIN Grid V061 (Data set). NASA EOSDIS Land Processes Distributed Active Archive Center. Retrieved January 15, 2025 from https://doi.org/10.5067/MODIS/MCD12Q1.061
Lehner, B., Verdin, K., Jarvis, A. (2008). New global hydrography derived from spaceborne elevation data. Eos, Transactions, AGU, 89(10): 93–94.
Lehner, B., Grill, G. (2013). Global river hydrography and network routing: baseline data and new approaches to study the world's large river systems. Hydrological Processes, 27(15): 2171–2186. Data available at www.hydrosheds.org