Data engineering landscape in the AI-powered world

Image from Bing Image Creator

One of the biggest impacts of generative AI has been the broader adoption of “prompt engineering,” essentially the ability to ask AI to help with coding-related tasks. As Andrej Karpathy joked on Twitter: “The hottest new programming language is English.”

Generative AI has also started a gold rush, with dozens of startups racing to develop an AI that can query the data warehouse and return an intelligent answer to ad hoc questions that data consumers ask in natural language. “This would radically simplify the self-service analytics process and further democratize data, but it will be difficult to solve beyond basic ‘metrics extraction’ given the complexity of data pipelines for more advanced analytics,” commented Shane Murray, CTO at Monte Carlo.

“When I’m evaluating data engineering candidates for a position, I look for their track record of impact and initiative,” Murray said. That could be in their main job or through contributions to open source projects. Either way, what matters is not that you were there, but what impact you had.

If you don’t like change, data engineering is not for you. “Little in this space has escaped reinvention,” Murray said. It is clear that the process of creating and maintaining data pipelines will be much easier, as will the ability for data consumers to access and manipulate data.

What hasn’t changed, however, is the data lifecycle. “It’s emitted, it’s transformed for a use, and then it’s archived,” Murray said. “While the underlying infrastructure may change and automation will shift time and attention to the right or left, human data engineers will continue to play a crucial role in extracting value from data, whether by designing scalable and reliable data systems or as specialist engineers within a chosen data domain.”

I’ve found that data platform teams, which are now quite common in data teams of various sizes, are great places for data engineers to get started.

Murray further explained: “Here, you can specialize in a specific domain of data that is critical to business operations, such as customer data or product/behavioral data. Understanding the problem end to end, from the source to the analytics use case, is what will make you an asset to the team and the business.”

“Alternatively, one could specialize in a specific data platform capability, such as reliability engineering, business intelligence, experimentation, or feature engineering,” Murray specified. “These types of roles typically provide a broader, but shallower, understanding of each business use case, but may be an easier jump from a software engineering role to data.”

“Another path I see more often for data engineers is the role of data product manager,” Murray said. “If one is growing data engineering skills but finds they are more motivated to talk to end users, articulate the problems to be solved, and distill the vision and roadmap for the team, then a product management role may be a future prospect.”

“Data teams are starting to invest in this skill set as we move toward treating ‘data as a product,’ ranging from critical dashboards and decision support tools to machine learning applications that are critical to business operations or customer experience. Data product managers will have an understanding of how to build a reliable and scalable data product, but will also apply product thinking to drive vision, roadmap, and adoption,” Murray said.

The modern data stack is fast becoming the dominant and trending technology stack in the field of data engineering, Murray articulated. This stack has a cloud-based data warehouse or lake at its core, and complementary cloud-based solutions for data ingestion, transformation, orchestration, visualization, and observability.

It is advantageous in that it has a fast time to value, is fundamentally easier to use than the previous generation of tools, is extensible to a wide range of analytics and machine learning use cases, and can scale to the size and complexity of the data managed in today’s world.

“The exact solutions will vary depending on the size of the organization and the specific data use cases, but in general the most common modern data stack is Snowflake, Fivetran, dbt, Airflow, Looker and Monte Carlo. There may also be Atlan and Immuta to address data catalog and access, respectively,” Murray explained. “Larger organizations or those with more machine learning use cases will typically have data stacks that use more Databricks and Spark.”

“The era of the modern data stack ushered in by Snowflake and Databricks has yet to reach a point of consolidation, and we’re already seeing ideas that may further disrupt the status quo of modern data pipelines,” Murray mused. “On the near horizon is more widespread adoption of streaming data, zero-ETL, data sharing, and a unified metrics layer.” Zero-ETL and data sharing are particularly exciting because they have the potential to simplify modern data pipelines, which have multiple integration points and therefore multiple points of failure.
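The link between integration points and failures can be made concrete with a back-of-the-envelope sketch. It assumes, purely for illustration, that each integration point succeeds independently with the same probability. The 99 percent figure and the step counts below are hypothetical and do not come from Murray or any measured pipeline.

```python
import math


def pipeline_success_rate(step_success_rates):
    """Chance that every step of a run succeeds, assuming independent failures."""
    return math.prod(step_success_rates)


# Hypothetical pipeline with five integration points, each 99% reliable.
five_hops = pipeline_success_rate([0.99] * 5)

# Hypothetical simplified pipeline with only two integration points.
two_hops = pipeline_success_rate([0.99] * 2)

print(f"5 integration points: {five_hops:.3f}")  # 0.951
print(f"2 integration points: {two_hops:.3f}")   # 0.980
```

Under these assumptions, removing three integration points lowers the chance of a failed run from roughly 5 percent to roughly 2 percent, which is the kind of simplification that makes zero-ETL and data sharing attractive.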

The tech industry job market is expected to undergo significant change in 2023, fueled by the growth of big data analytics. According to a Dice Media analysis, this shift will occur as the global big data analytics market is expected to grow at an impressive rate of 30.7 percent, reaching a projected value of $346.24 billion by 2030. This growth is anticipated to create numerous opportunities for trained professionals in the field, such as data engineers, business analysts, and data analysts.

“I strongly believe that data engineering jobs will not be just about writing code, but will involve more communication with business stakeholders and end-to-end system design,” said Deexith Reddy, an experienced data engineer and open source enthusiast. “Therefore, to ensure job security, one must focus on both the breadth of data analysis and the depth of data engineering.”

Generative AI is likely to make the field of data engineering more competitive. However, during our call, Reddy also emphasized that contributing to open source projects will always help build a strong portfolio, given recent technological advances, particularly in AI.

Reddy shed more light on the critical role data engineers play in enhancing an organization’s capabilities through the use of open source technologies. For example, there has been widespread adoption of open source technologies such as Apache Spark, Apache Kafka, and Elasticsearch among data engineers, as well as Kubernetes among data scientists for data science practices. These OSS technologies help meet computational requirements for deep learning and machine learning workloads, as well as MLOps workflows.

Companies often identify and recruit top contributors to open source projects like these, fostering an environment that values and encourages open source contributions. This approach helps retain qualified data engineers and allows organizations to benefit from their expertise.

Jan Saqib is a writer and technology analyst with a passion for data science, automation, and cloud computing.