How To Successfully Fail a Data Pipeline in ADF
This article demonstrates how to use the Fail activity in Azure Data Factory to intentionally fail a data pipeline.
In this article, we will build a data pipeline that uses the Fail activity to terminate the pipeline and log a custom error message.
The goal is to identify whether an Excel file exists in a storage container and, if it doesn't, to fail the data pipeline with an error message.
As a data engineer, I need to build a data pipeline that checks whether a file has been uploaded every day to a folder inside my storage account container. If the pipeline does not find a file named Azure File 1.xlsx in the list of files, it should fail and log a custom error message.
👉 In my storage account container, dev, I have created a folder called temp. Inside that folder, I have placed two files: Azure File 1.xlsx and Azure File 2.xlsx.
Step 1: Create a Customized Binary Dataset
As we are working with Excel files, let's create a new generic Binary dataset as follows:
New dataset > Azure Data Lake Storage Gen2 > Binary > select the linked service that connects to ADLS Gen2. Next, parameterize the dataset's Connection properties (container and directory) as shown below:
The value for container will be dev and the value for directory will be temp.
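To make the parameterization concrete, here is a minimal sketch of the JSON that ADF stores for such a dataset, built as a Python dict. It assumes the standard ADLS Gen2 Binary dataset layout (an `AzureBlobFSLocation` whose `fileSystem` and `folderPath` come from dataset parameters); the dataset name `BinaryGeneric` and the linked service name `LS_ADLSGen2` are hypothetical placeholders, and the definition is trimmed to the relevant fields.

```python
import json


def build_binary_dataset(name: str, linked_service: str) -> dict:
    """Return a trimmed sketch of a parameterized ADLS Gen2 Binary dataset."""
    return {
        "name": name,
        "properties": {
            "linkedServiceName": {
                "referenceName": linked_service,  # hypothetical linked service name
                "type": "LinkedServiceReference",
            },
            # Parameters whose values are supplied by the activity using the dataset.
            "parameters": {
                "container": {"type": "string"},
                "directory": {"type": "string"},
            },
            "type": "Binary",
            "typeProperties": {
                "location": {
                    "type": "AzureBlobFSLocation",  # ADLS Gen2 location type
                    # Container (file system) and folder are resolved at run time.
                    "fileSystem": {"value": "@dataset().container", "type": "Expression"},
                    "folderPath": {"value": "@dataset().directory", "type": "Expression"},
                }
            },
        },
    }


# Hypothetical names used only for illustration.
dataset = build_binary_dataset("BinaryGeneric", "LS_ADLSGen2")
print(json.dumps(dataset, indent=2))
```

Because the container and folder are parameters rather than hard-coded values, the same dataset can be reused for other folders by passing different values.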
Step 2: Add the Get Metadata1 Activity to the ADF Canvas
In the Get Metadata activity, select the new dataset we created above and provide the values for container and directory as shown below. Under Field list, select Child items.
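As a rough illustration, the configured activity corresponds to a definition like the one below. This is a trimmed sketch assuming the usual `GetMetadata` activity layout; the dataset name `BinaryGeneric` is the same hypothetical placeholder as before, and the dev and temp values are the ones from Step 1.

```python
import json

# Trimmed sketch of the Get Metadata1 activity. It references the generic
# dataset (hypothetical name) and fills in the dataset parameters.
get_metadata = {
    "name": "Get Metadata1",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "BinaryGeneric",
            "type": "DatasetReference",
            "parameters": {"container": "dev", "directory": "temp"},
        },
        # "Child items" in the Field list maps to the childItems field.
        "fieldList": ["childItems"],
    },
}

print(json.dumps(get_metadata, indent=2))
```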
The output of the Get Metadata1 activity should look like this:
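For reference, the part of that output we care about is the `childItems` array, where each entry carries a `name` and a `type` (`File` or `Folder`). The dict below is an illustrative sample for the temp folder with both files present, trimmed to that field; the real output also contains run metadata that is omitted here.

```python
# Illustrative Get Metadata1 output, trimmed to the childItems field.
get_metadata_output = {
    "childItems": [
        {"name": "Azure File 1.xlsx", "type": "File"},
        {"name": "Azure File 2.xlsx", "type": "File"},
    ]
}

# The ForEach activity in the next step iterates over exactly this list.
for item in get_metadata_output["childItems"]:
    print(item["type"], item["name"])
```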
Step 3: Add a ForEach Activity and Connect It to the Get Metadata1 Activity
Create a new pipeline variable called files with the type Array. As we loop over the output of the Get Metadata1 activity, we will append each file name to this variable.
In the ForEach activity's settings, enter @activity('Get Metadata1').output.childItems in the Items box and select the Sequential checkbox.
Open the activities inside the ForEach activity and drag an Append Variable activity onto its canvas. Select files as the variable, and in the Value field enter @item().name. On each iteration, this expression appends the name of the current item (in our case, a file name) to the files variable.
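The variable, the loop, and the Append Variable activity can be sketched as pipeline JSON as well. This assumes the standard `ForEach` and `AppendVariable` activity layouts; the activity names `ForEach1` and `Append variable1` are placeholders (they match ADF's default naming but are not taken from the article), and the definitions are trimmed.

```python
import json

# Pipeline-level Array variable that collects the file names.
variables = {"files": {"type": "Array"}}

# Trimmed sketch of the ForEach activity, running after Get Metadata1 succeeds.
for_each = {
    "name": "ForEach1",
    "type": "ForEach",
    "dependsOn": [
        {"activity": "Get Metadata1", "dependencyConditions": ["Succeeded"]}
    ],
    "typeProperties": {
        "isSequential": True,  # the Sequential checkbox
        "items": {
            "value": "@activity('Get Metadata1').output.childItems",
            "type": "Expression",
        },
        # Inner activity: append the current item's name to the files variable.
        "activities": [
            {
                "name": "Append variable1",
                "type": "AppendVariable",
                "typeProperties": {
                    "variableName": "files",
                    "value": {"value": "@item().name", "type": "Expression"},
                },
            }
        ],
    },
}

print(json.dumps({"variables": variables, "activity": for_each}, indent=2))
```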
Step 4: Check If File Exists And Fail Pipeline If File Not Found
Drag an If Condition activity onto the canvas and connect it to the ForEach activity so that it runs after the loop finishes. On its Activities tab, enter the following in the Expression box: @contains(variables('files'), 'Azure File 1.xlsx').
In the above expression, we are looking for the file named 'Azure File 1.xlsx' in the files array. Note that the files array was populated in the previous step, where we looped over all the files and appended their names to it.
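To see what the two expressions evaluate to, here is a small Python emulation of Steps 3 and 4. It is a sketch of the logic, not ADF code: the loop mirrors ForEach plus Append Variable, and the membership test mirrors `contains` applied to an array, which checks for a matching element rather than a substring.

```python
def collect_file_names(child_items: list[dict]) -> list[str]:
    """Mimic ForEach + Append Variable: append item().name for each child item."""
    files: list[str] = []
    for item in child_items:  # one item at a time, like a sequential ForEach
        files.append(item["name"])
    return files


def required_file_present(files: list[str], required: str = "Azure File 1.xlsx") -> bool:
    """Mimic @contains(variables('files'), 'Azure File 1.xlsx') on an array."""
    return required in files  # exact match against whole elements


both = collect_file_names([{"name": "Azure File 1.xlsx", "type": "File"},
                           {"name": "Azure File 2.xlsx", "type": "File"}])
only_second = collect_file_names([{"name": "Azure File 2.xlsx", "type": "File"}])

print(required_file_present(both))         # If Condition takes the True branch
print(required_file_present(only_second))  # If Condition takes the False branch
```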
Open the False branch of the If Condition and drag a Fail activity onto its canvas.
For the Fail message, I entered: Azure File 1.xlsx is not found in the storage account container. The message can also be built with dynamic content, but for this tutorial I'll keep it static.
For the Error code, we can specify a code that signals why the pipeline failed, much like the HTTP 404 error we see when a webpage can't be found. Here, I have used 500 as the error code.
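Put together, the If Condition with the Fail activity in its False branch looks roughly like the definition below. This is a trimmed sketch assuming the standard `IfCondition` and `Fail` activity layouts; the names `If Condition1`, `ForEach1`, and `Fail1` are placeholders. Note that the Fail activity's error code is stored as a string.

```python
import json

# Trimmed sketch of the If Condition activity. Only the False branch has an
# activity: a Fail activity with a custom message and error code.
if_condition = {
    "name": "If Condition1",
    "type": "IfCondition",
    "dependsOn": [{"activity": "ForEach1", "dependencyConditions": ["Succeeded"]}],
    "typeProperties": {
        "expression": {
            "value": "@contains(variables('files'), 'Azure File 1.xlsx')",
            "type": "Expression",
        },
        "ifFalseActivities": [
            {
                "name": "Fail1",
                "type": "Fail",
                "typeProperties": {
                    "message": "Azure File 1.xlsx is not found in the storage account container",
                    "errorCode": "500",
                },
            }
        ],
    },
}

print(json.dumps(if_condition, indent=2))
```

If you want the dynamic message mentioned above, the `message` value could instead be an expression object, for example something along the lines of `@concat('Azure File 1.xlsx not found. Files present: ', join(variables('files'), ', '))`; treat that exact expression as an untested suggestion.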
The final pipeline should look like this —
Time to test the pipeline.
👉 First, I will remove Azure File 1.xlsx from the temp folder. When I run the pipeline now, it should see only Azure File 2.xlsx in the list of files and therefore fail with the error message configured above.
Here’s the output after running the pipeline:
✅ As you can see, the ADF pipeline has successfully failed and reported the reason for the failure.
👉 I will now add Azure File 1.xlsx back to the folder and run the pipeline again.
✅ This time the pipeline run is all green, because it finds the required file in the folder.
Before I sign off on this article, here's Microsoft's official documentation on the Fail activity.
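The two test runs can be mimicked end to end with a short Python simulation. This is only an analogy for how the pipeline behaves: `PipelineFailed` and `run_pipeline` are hypothetical names, and raising an exception stands in for the Fail activity stopping the run with its message and error code.

```python
class PipelineFailed(Exception):
    """Stand-in for a run stopped by the Fail activity (message + error code)."""

    def __init__(self, message: str, error_code: str) -> None:
        super().__init__(f"[{error_code}] {message}")
        self.error_code = error_code


def run_pipeline(folder_listing: list[str]) -> str:
    """Simulate Get Metadata1 -> ForEach -> If Condition -> Fail for one folder."""
    child_items = [{"name": n, "type": "File"} for n in folder_listing]
    files = [item["name"] for item in child_items]   # ForEach + Append Variable
    if "Azure File 1.xlsx" not in files:             # If Condition, False branch
        raise PipelineFailed(
            "Azure File 1.xlsx is not found in the storage account container", "500"
        )
    return "Succeeded"


# First test run: Azure File 1.xlsx has been removed from temp.
try:
    run_pipeline(["Azure File 2.xlsx"])
except PipelineFailed as err:
    print("Failed:", err)

# Second test run: the file is back in the folder.
print(run_pipeline(["Azure File 1.xlsx", "Azure File 2.xlsx"]))
```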