When it comes to integrating different applications on the Azure platform, Microsoft provides many different tools (such as Event Hub, Event Grid, Service Bus, Storage Queues, IoT Hub, Azure Data Factory, API Management, etc.). These tools are often confusing in terms of their purpose, since the names and terminologies seem very similar. Sometimes it's hard for me as well to remember which service does what, so I decided to take matters into my own hands and do a POC on each of them, based on the following two criteria.
- Exploration of Features and Limitations
- Performance benchmarking
In today's blog I am only going to talk about the performance benchmarking (point 2) to keep the discussion short. I will be posting a series of YouTube videos to cover point 1.
Case Study Background:
As part of the performance benchmarking case study, I am going to present the statistics of transferring 50,000 Sales Order records from one place to another (schema given below).

Services Compared:
To limit the size of the case study, I am benchmarking the following four services.
- Azure Event Hub
- Azure Service Bus
- Azure Data Factory (with Copy Data pipeline)
- Azure Data Factory (with Data Flow)
Case Study Method:
Loading 50,000 Sales Order records at the input end and, on the output end, saving them to Azure SQL Database (General Purpose serverless, with a minimum of 0.5 vCores and a maximum of 4 vCores).
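To make the setup concrete, here is a minimal sketch of the output end of the case study. It uses the stdlib `sqlite3` module as a local stand-in for Azure SQL, and the `SalesOrder` columns are assumptions, since the actual schema is not reproduced in this post:

```python
import sqlite3

# Hypothetical Sales Order schema -- the real schema from the case
# study is not shown here, so these columns are assumptions.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS SalesOrder (
    OrderId     INTEGER PRIMARY KEY,
    CustomerId  TEXT    NOT NULL,
    Amount      REAL    NOT NULL,
    OrderDate   TEXT    NOT NULL
)
"""

def make_records(count):
    """Generate `count` synthetic Sales Order rows."""
    return [(i, f"CUST-{i % 100:03d}", round(10.0 + i % 500, 2), "2021-01-01")
            for i in range(1, count + 1)]

def bulk_insert(conn, records):
    """Insert all records in a single transaction via executemany."""
    conn.execute(CREATE_SQL)
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT INTO SalesOrder (OrderId, CustomerId, Amount, OrderDate) "
            "VALUES (?, ?, ?, ?)", records)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    bulk_insert(conn, make_records(50_000))
```

Batching all 50,000 rows into one transaction matters for the timings above: committing row by row would dominate the total time regardless of which Azure service delivers the data.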
Case Study Results:
| Service Name | Input | Output | Time Taken |
| --- | --- | --- | --- |
| Event Hub | Custom .NET Core producer | Azure SQL table | 18 seconds |
| Service Bus | Custom .NET Core producer | Azure SQL table | 135 seconds |
| Data Factory with Copy Data | CSV on Blob Storage | Azure SQL table | 6 to 10 minutes (varied between runs) |
| Data Factory with Data Flow | CSV on Blob Storage | Azure SQL table | 13 seconds (with optimised partitioning) |
Case Study Conclusion:
As mentioned above, the case study was done with bulk operations in mind.
Event Hub:
If the nature of your integration is event-like, you should certainly go for Event Hub, as it is built to handle events at huge scale. Keep in mind, though, that consumers read each partition serially, and it is hard to have multiple consumers or subscribers on the receiving end.
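Event Hub gets its throughput from spreading events across partitions. A minimal stdlib sketch of that idea, with the partition count and key field as assumptions (the real service computes the hash server-side, e.g. via the `azure-eventhub` SDK's partition key):

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed partition count for illustration

def assign_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable hash so the same key always lands on the same partition,
    # which preserves per-key ordering within that partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def spread(records, key_fn):
    """Group records into per-partition batches; each partition can
    then be read by its own consumer in parallel."""
    partitions = defaultdict(list)
    for rec in records:
        partitions[assign_partition(key_fn(rec))].append(rec)
    return partitions
```

Within a single partition, consumption is still serial, which is the limitation mentioned above: parallelism comes from the number of partitions, not from stacking consumers on one partition.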
Service Bus:
Not as fast as Event Hub, but Service Bus is very useful when you need multiple subscribers: with Topics, you can increase the number of subscribers to improve parallelism. It is also a good choice when your use case involves workflows such as sagas.
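The fan-out behaviour of Topics can be sketched with a toy stdlib model (this is conceptual only, not the `azure-servicebus` API):

```python
from collections import deque

class Topic:
    """Toy model of a Service Bus topic: every subscription receives
    its own copy of each published message (fan-out), unlike a queue,
    where each message goes to exactly one consumer."""

    def __init__(self):
        self.subscriptions = {}

    def subscribe(self, name):
        self.subscriptions[name] = deque()

    def publish(self, message):
        # Each subscription gets an independent copy of the message.
        for queue in self.subscriptions.values():
            queue.append(message)

    def receive(self, name):
        return self.subscriptions[name].popleft()
```

Because each subscription is an independent queue, the "billing" and "shipping" parts of a saga can each process the same order event at their own pace.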
Data Factory with Copy Data:
If you are simply dumping data from source to destination, the Copy Data activity is the easiest tool to set up, and the scale of data Data Factory can manage is huge, as it is optimized for Big Data and data-warehouse applications.
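For reference, a Copy activity in an ADF pipeline definition looks roughly like the fragment below; the dataset names are placeholders for this case study's CSV source and SQL sink:

```json
{
  "name": "CopySalesOrders",
  "type": "Copy",
  "inputs": [
    { "referenceName": "SalesOrdersCsv", "type": "DatasetReference" }
  ],
  "outputs": [
    { "referenceName": "SalesOrdersSqlTable", "type": "DatasetReference" }
  ],
  "typeProperties": {
    "source": { "type": "DelimitedTextSource" },
    "sink": { "type": "AzureSqlSink" }
  }
}
```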
Data Factory with Data Flow:
This is the best option to choose when you require customisation in your data pipelines, for example to filter, sort, merge, or cleanse records. That's why it's called Data Flow: you can define complex ETL-like flows and pipelines of data.
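To illustrate what those transformation steps do, here are conceptual Python equivalents of three common Data Flow transformations, applied to hypothetical sales-order dicts (the field names are assumptions, not the case-study schema):

```python
def filter_orders(orders, min_amount):
    # Equivalent of a Data Flow "Filter" transformation.
    return [o for o in orders if o["amount"] >= min_amount]

def sort_orders(orders):
    # Equivalent of a Data Flow "Sort" transformation (largest first).
    return sorted(orders, key=lambda o: o["amount"], reverse=True)

def total_by_customer(orders):
    # Equivalent of a Data Flow "Aggregate" transformation.
    totals = {}
    for o in orders:
        totals[o["customer"]] = totals.get(o["customer"], 0) + o["amount"]
    return totals
```

In Data Flow these steps are chained visually and executed on Spark clusters, which is how it reached the 13-second figure above with optimised partitioning.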
Ending Remarks:
It goes without saying that these benchmarks are very preliminary, and to decide on one of these services you should keep other factors in mind (client capability, performance requirements, data requirements, costing, etc.). But this gives you an initial comparison.