Convenient private endpoints and service endpoints – including Azure Backbone, storage account firewall, DNS, VNET, and NSG
Storage accounts play a vital role in a medallion architecture to establish an enterprise data lake. They act as a centralized repository, allowing for seamless data exchange between producers and consumers. This setup allows consumers to perform data science tasks and create machine learning (ML) models. Additionally, consumers can use the data for retrieval augmented generation (RAG), making it easier to interact with enterprise data through large language models (LLM) like ChatGPT.
Highly sensitive data is typically stored in the storage account, so defense-in-depth measures must be in place before data scientists and ML pipelines can access the data. These measures include 1) advanced threat protection to detect malware, 2) authentication using Microsoft Entra, 3) authorization for fine-grained access control, 4) an audit trail to monitor access, 5) data leak prevention, 6) encryption, and last but not least, 7) network access control using service endpoints or private endpoints.
This article focuses on storage account network access control. The next chapter explains the different concepts of storage account network access. This is followed by a hands-on comparison of service endpoints and private endpoints, and finally a conclusion.
A typical scenario is that a virtual machine needs to have network access to a storage account. This virtual machine typically acts as a Spark cluster to analyze data from the storage account. The following image provides an overview of the available network access controls.
The image components can be described as follows:
Azure Global Network – Backbone: Traffic between two regions always travels over the Azure backbone (unless the customer explicitly configures otherwise); see also Microsoft Global Network: Azure | Microsoft Learn. This holds regardless of which firewall rules are set on the storage account and regardless of whether service endpoints or private endpoints are used.
Azure Storage Firewall: Firewall rules can restrict or disable public access. Common rules include whitelisting VNETs/subnets, public IP addresses, or system-assigned managed identities as resource instances, and allowing trusted services. When a VNET/subnet is whitelisted, the storage account can identify the source of the traffic, including its private IP address. However, the storage account itself is not integrated into the VNET/subnet; for that, private endpoints are needed.
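As a rough illustration, the firewall rules described above correspond to the networkAcls property of a storage account in an ARM template. The sketch below only assembles that fragment as a Python dictionary and does not call Azure. The helper name build_network_acls, the subnet ID, and the IP address are hypothetical placeholders, and the fragment should be checked against the current ARM schema before use.

```python
import json


def build_network_acls(subnet_ids=(), public_ips=(), allow_trusted_services=True):
    """Build a networkAcls fragment that denies traffic by default.

    Hypothetical helper for illustration: it returns a dictionary only.
    """
    return {
        # Deny everything that is not explicitly allowed below.
        "defaultAction": "Deny",
        # "AzureServices" lets trusted Microsoft services bypass the firewall.
        "bypass": "AzureServices" if allow_trusted_services else "None",
        # Whitelisted VNET subnets (each also needs a service endpoint enabled).
        "virtualNetworkRules": [{"id": s, "action": "Allow"} for s in subnet_ids],
        # Whitelisted public IP addresses or CIDR ranges.
        "ipRules": [{"value": ip, "action": "Allow"} for ip in public_ips],
    }


# Placeholder values, not real resources.
example_subnet = (
    "/subscriptions/<subscription-id>/resourceGroups/<rg>/providers/"
    "Microsoft.Network/virtualNetworks/<vnet>/subnets/<subnet>"
)
acls = build_network_acls(subnet_ids=[example_subnet], public_ips=["203.0.113.10"])
print(json.dumps(acls, indent=2))
```

Setting defaultAction to Deny and listing only explicit exceptions mirrors the "selected networks" option of the firewall: anything not whitelisted is rejected.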
Public DNS storage account: A storage account always has a public DNS name that can be resolved with network tools; see also Azure Storage account: Public access disabled, but still some level of connectivity – Microsoft Q&A. That is, even when public access is disabled in the storage account's firewall, the public DNS entry remains.
Virtual Network (VNET): The network in which virtual machines are deployed. Although a storage account is never deployed within a VNET, the VNET can be whitelisted in the Azure Storage Firewall. Alternatively, a private endpoint can be created in the VNET for private and secure connectivity.
Service endpoints: When whitelisting a VNET/subnet in the storage account firewall, the service endpoint must be enabled on that VNET/subnet. The service endpoint must be Microsoft.Storage when the VNET and storage account are in the same region, or Microsoft.Storage.Global when they are in different regions. Note that "service endpoints" is also used as a general term that covers both whitelisting a VNET/subnet in the Azure Storage Firewall and enabling the service endpoint on the VNET/subnet.
Private endpoints: A private endpoint places a network interface card (NIC) for the storage account in the VNET where the virtual machine runs. This gives the storage account a private IP address and makes it part of the VNET.
Private DNS storage account: Within a VNET, a private DNS zone can be created in which the storage account's DNS name resolves to the private endpoint. This ensures that the virtual machine can still connect using the storage account URL, and that this URL resolves to a private IP address instead of a public one.
Network Security Group (NSG): Deploy an NSG to limit traffic in and out of the VNET where the virtual machine runs. This can prevent data leakage. However, an NSG only works with IP addresses or service tags, not URLs. For more advanced protection against data exfiltration, use Azure Firewall. For simplicity, this article omits Azure Firewall and uses an NSG to block outgoing traffic.
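The region rule above can be sketched as a small helper. This is a minimal illustration, not an Azure API: the function names are hypothetical, and region names are compared as plain strings. The returned list has the shape of the serviceEndpoints property of a subnet in an ARM template.

```python
def service_endpoint_for(vnet_region: str, storage_region: str) -> str:
    """Pick the service endpoint type according to the rule in the text."""
    same_region = vnet_region.strip().lower() == storage_region.strip().lower()
    return "Microsoft.Storage" if same_region else "Microsoft.Storage.Global"


def subnet_service_endpoints(vnet_region: str, storage_region: str) -> list:
    """Build the serviceEndpoints fragment for a subnet (dictionary only)."""
    return [{"service": service_endpoint_for(vnet_region, storage_region)}]


# VNET and storage account in the same region.
print(subnet_service_endpoints("westeurope", "westeurope"))
# VNET and storage account in different regions.
print(subnet_service_endpoints("westeurope", "northeurope"))
```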
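One way to verify the private DNS setup from the virtual machine is to check whether the storage account URL resolves to a private IP address. The sketch below uses only the Python standard library. The helper name resolves_to_private_ip and the hostname are hypothetical, and the resolver is injectable so the example runs without network access. For blob storage, the private DNS zone is conventionally named privatelink.blob.core.windows.net.

```python
import ipaddress
import socket


def resolves_to_private_ip(hostname, resolve=socket.gethostbyname):
    """Return True if the hostname resolves to a private IP address.

    `resolve` defaults to a real DNS lookup; pass a stub for offline checks.
    """
    ip = ipaddress.ip_address(resolve(hostname))
    return ip.is_private


# Offline illustration with made-up records (placeholder name and addresses).
host = "mystorageaccount.blob.core.windows.net"
with_private_zone = {host: "10.0.1.4"}
without_private_zone = {host: "20.60.1.1"}

print(resolves_to_private_ip(host, resolve=with_private_zone.__getitem__))
print(resolves_to_private_ip(host, resolve=without_private_zone.__getitem__))
```

On a virtual machine inside the VNET, calling resolves_to_private_ip with the real storage account hostname and the default resolver performs an actual DNS lookup.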
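NSG rules are evaluated in priority order, with lower numbers first, and the first matching rule decides. The following simplified model illustrates this for outbound traffic. It is a sketch only: it matches on CIDR prefixes or "*" and leaves out service tags, ports, and protocols. The class, function, and rule names are hypothetical, and the default argument stands in for Azure's built-in default rules.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class OutboundRule:
    name: str
    priority: int      # lower number = evaluated first
    destination: str   # CIDR prefix, or "*" for any address
    access: str        # "Allow" or "Deny"

    def matches(self, dest_ip: str) -> bool:
        if self.destination == "*":
            return True
        return ipaddress.ip_address(dest_ip) in ipaddress.ip_network(self.destination)


def evaluate_outbound(rules, dest_ip, default="Allow"):
    """Return the access decision of the first matching rule by priority."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(dest_ip):
            return rule.access
    return default


# Allow traffic inside the VNET address space, deny everything else.
rules = [
    OutboundRule("deny-all-outbound", 200, "*", "Deny"),
    OutboundRule("allow-vnet", 100, "10.0.0.0/16", "Allow"),
]
print(evaluate_outbound(rules, "10.0.1.4"))   # private endpoint address in the VNET
print(evaluate_outbound(rules, "20.60.1.1"))  # arbitrary public address
```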
The next chapter discusses service endpoints and private endpoints.
The chapter begins by exploring the scenario of unrestricted network access. The details of service endpoints and private endpoints are then discussed with practical examples.
3.1 Not limiting network access: public access enabled
Assume the following scenario, in which you create a virtual machine and a storage account. The storage account firewall has public access enabled; see the image below.
With this configuration, the virtual machine can access the storage account over the network. Since the virtual machine is also deployed in Azure, the traffic goes over the Azure backbone and is accepted; see the image below.
Companies often establish firewall rules to limit network access. This involves either disabling public access entirely or allowing only selected, whitelisted networks. The following image illustrates public access disabled and traffic blocked by the firewall.
In the next section, service endpoints and selected-network firewall rules are used to grant network access to the storage account again.
3.2 Limit network access through service endpoints
To give the virtual machine in the VNET access to the storage account, enable the service endpoint on the VNET. Use Microsoft.Storage when both are in the same region, or Microsoft.Storage.Global across regions. Next, whitelist the VNET/subnet in the storage account's firewall. Traffic is then accepted again; see also the image below.
When the VNET/subnet is removed from the Azure storage account firewall, or public access is disabled, traffic is blocked again.
If an NSG is used to block outgoing traffic to public IP addresses from the VM's VNET, the traffic is also blocked again. This is because the storage account is still reached through its public DNS name and public IP address; see also the image below.
In that case, private endpoints must be used to ensure that traffic does not leave the VNET. This is analyzed in the next section.
3.3 Limit access through private endpoints
To restore the virtual machine's network access to the storage account, use a private endpoint. This action creates a network interface card (NIC) for the storage account within the virtual machine's VNET, ensuring that traffic remains within the VNET. The image below provides further illustration.
Here too, an NSG can be used to block all traffic; see the image below.
However, this is counterintuitive, since a private endpoint is first created in the VNET and an NSG then blocks traffic in that same VNET.
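The scenarios in this chapter can be summarized in a small decision model. This is a simplification of the behavior as described in this article, not a simulation of Azure. The function name, parameter names, and the three public_access values are hypothetical.

```python
def traffic_accepted(
    public_access,                    # "enabled", "selected_networks" or "disabled"
    subnet_whitelisted=False,
    service_endpoint_enabled=False,
    private_endpoint=False,
    nsg_blocks_public_outbound=False,
    nsg_blocks_all_outbound=False,
):
    """Return True if the VM can reach the storage account in this model."""
    if public_access not in ("enabled", "selected_networks", "disabled"):
        raise ValueError(f"unknown public_access value: {public_access!r}")
    # An NSG that blocks all outbound traffic stops every path (section 3.3).
    if nsg_blocks_all_outbound:
        return False
    # A private endpoint keeps traffic inside the VNET (section 3.3).
    if private_endpoint:
        return True
    # Every remaining path uses the storage account's public endpoint.
    if nsg_blocks_public_outbound:
        return False
    if public_access == "enabled":
        return True  # section 3.1
    if public_access == "selected_networks":
        # Section 3.2: both the whitelist entry and the service endpoint are needed.
        return subnet_whitelisted and service_endpoint_enabled
    return False  # public access disabled, no private endpoint


scenarios = {
    "3.1 public access enabled": traffic_accepted("enabled"),
    "3.1 public access disabled": traffic_accepted("disabled"),
    "3.2 service endpoint": traffic_accepted(
        "selected_networks", subnet_whitelisted=True, service_endpoint_enabled=True
    ),
    "3.2 service endpoint + NSG": traffic_accepted(
        "selected_networks", subnet_whitelisted=True,
        service_endpoint_enabled=True, nsg_blocks_public_outbound=True,
    ),
    "3.3 private endpoint + NSG on public IPs": traffic_accepted(
        "disabled", private_endpoint=True, nsg_blocks_public_outbound=True
    ),
}
for name, accepted in scenarios.items():
    print(f"{name}: {'accepted' if accepted else 'blocked'}")
```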
Enterprises always require network rules to limit network access to their storage accounts. In this blog post, both service endpoints and private endpoints are considered as ways to limit access.
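The following holds for both service endpoints and private endpoints: traffic stays on the Azure backbone, and the storage account keeps its public DNS entry (see above).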
For service endpoints, the following is true:
- Requires enabling service endpoints on the VNET/subnet and whitelisting the VNET/subnet in the Azure storage account firewall.
- Requires traffic to leave the VNET of the virtual machine that connects to the storage account. As noted above, the traffic still remains on the Azure backbone.
For private endpoints, the following is true:
- Public access can be disabled in the Azure Storage firewall. As noted above, the storage account's public DNS entry remains.
- Traffic does not leave the VNET in which the virtual machine runs.
There are many other things to consider when choosing between service endpoints and private endpoints: costs, migration effort (service endpoints have been available longer than private endpoints), network complexity when using private endpoints, limited service endpoint support for newer Azure services, and a hard limit of 200 private endpoints per storage account.
However, if it is a hard requirement ("must have") that 1) traffic never leaves the VM's VNET/subnet, or 2) creating firewall rules in the Azure Storage Firewall is not allowed and must be blocked, then service endpoints are not feasible.
In other scenarios, both solutions can be considered, and the best option must be determined based on the specific requirements of each scenario.