Got Technical Skills As a Data Engineer? Good, Let’s Talk About Non-Technical Skills

Tech Zero
Jul 29, 2022

Data engineering is a technical field. Yes, we all know that. You are expected to be well-versed in data modeling on RDBMS or NoSQL databases, advanced SQL, Python/Scala programming, data warehouse modeling and concepts, and the cloud technologies used to build pipelines.

But is that all there is? What about non-technical skills?

Working as a data engineer, I have used the following skills as much as (and at times perhaps even more than) my technical skills:

Being Comfortable In Someone Else’s Shoes

Pic by Frits Ahlefeldt on http://idioms.languagesystems.edu/2017/02/to-be-in-someones-shoes.html

Working in a team of data engineers, I have lost count of the times I needed to take over a team member's work. That meant understanding their solution from their perspective: how the pipeline was built, what logic it followed, and how to debug it.

When you work in a team, you will run into this scenario often. Perhaps a data engineer has left your organization or is on vacation, but their pipeline needs troubleshooting. It is especially challenging when you have to take over someone's work without proper documentation, or troubleshoot a team member's pipeline against a ticking clock. As a data engineer, you should get comfortable taking over a team member's work when the time comes.

When To Speak Code?

Pic credit — https://i.pinimg.com/736x/0f/ef/8f/0fef8ff07a7a6081557a95565352a7a2.jpg

Yes, I have had plenty of meetings with data engineers on my team to bounce ideas off them or discuss a potential bottleneck. But I have also had many meetings with other project teams for whom I was designing a data pipeline solution. In fact, when I work on a project, I end up speaking with those teams more than with my own.

So when the audience changes, the way you pitch should change too. When you discuss your solution with other teams, be mindful of their expertise and technical know-how. You can quickly lose everyone in a meeting if your explanation sounds like binary instead of plain words.

Why Does It Cost More?

Pic credit — https://stylesage.co/blog/content/images/2018/05/Perezbox-Pricing.jpg

If your organization provides services to a client, the client typically foots the infrastructure bill. That includes the tools and technologies you used to build the data pipelines.

The longer a pipeline runs and consumes compute, the higher the cost.

So be ready to answer questions about the pipeline's total run time, the resources it uses, the tools it relies on, and whether it can be optimized further to speed up execution.
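A back-of-envelope estimate is often enough to start that conversation. The sketch below is a minimal illustration, assuming compute is billed per node-hour at a flat rate; the function name, node count, run times and rate are all hypothetical, and real cloud billing usually has more components than this.

```python
def estimate_run_cost(runtime_minutes: float, node_count: int, rate_per_node_hour: float) -> float:
    """Rough compute cost of one pipeline run, assuming a flat per-node-hour rate (hypothetical model)."""
    hours = runtime_minutes / 60
    return hours * node_count * rate_per_node_hour


# Hypothetical figures: a 90-minute run on 4 nodes at an assumed 0.50 per node-hour,
# compared with the same run after an optimization that halves its runtime.
before = estimate_run_cost(90, 4, 0.50)
after = estimate_run_cost(45, 4, 0.50)
print(f"before: {before:.2f}, after: {after:.2f}, saved per run: {before - after:.2f}")
# prints "before: 3.00, after: 1.50, saved per run: 1.50"
```

Under this simple model, cost scales linearly with run time, so halving the runtime halves the estimate. Multiplying the per-run saving by the number of runs per month gives the kind of concrete figure that helps when a client asks whether further optimization is worth the effort.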

Managing Multiple Tasks

When I start work in the morning, my first task is to check email alerts for any executions that failed the previous day. If there are any, I need to troubleshoot them promptly so that the impact on downstream users consuming that data is minimal.

But this comes at a cost: the time I would otherwise spend building solutions for a new project. Once you build and deploy many solutions, you effectively become the point of contact for any failures in your pipelines (although, with good documentation, that work can be handed to another team member, as discussed under "Being Comfortable In Someone Else's Shoes"). You will quickly find yourself shuttling from Azure Data Factory, to debugging PySpark code in Databricks, to checking table configuration in Snowflake, to managing data in the data lake.

As a data engineer, juggling all of this at once can become overwhelming. This is where the ability to multitask, identify priority deliverables, and rearrange tasks on the sprint board helps immensely.

Dear reader, thanks for reading this far. Of course, these are not the only soft skills you need as a data engineer, but they are the ones I have relied on most throughout my experience in this field.

Tech Zero

Data Engineering Manager | Azure, Databricks and Snowflake stack | Here to share my knowledge with everyone