
How to Scope and Estimate an Integration Project

Like many types of software projects, integration projects can be complex and difficult to comprehend at the outset. They're often a necessary evil, especially if you're building integrations as features of a software product. It's important to manage cost when taking these projects on.

Scope management and estimation are two very important parts of managing cost. If integration projects are in your future, it's critical that you become competent at scoping them and estimating them.

In this post we'll share a methodology (that we use at Blended Edge) for scoping and estimating software integration projects.

Endpoint Systems & Data Flows

There are two things you need to know to properly scope an integration project: 1) which systems will be integrated--called endpoint systems--and 2) which data flows must be implemented to meet the business requirements.

Your goal should be to collect a reasonable amount of information to define a reasonable scope and estimate for an integration project. You'll have most of what you need by describing key attributes of the endpoint systems and data flows that make up the integration.

Defining Endpoint Systems

Endpoint systems are pretty easy to get your head around. They are the things being integrated. They are the pieces of business software that, when combined via integration, create a cross-product experience that makes users better, faster, or more accurate at their jobs.

For a given integration project there have to be at least two endpoint systems--otherwise, what are you integrating? Two is usually what you'll end up scoping, though you could have more, depending on how you separate the concerns of project scope.

For a SaaS product integration, the good news is that one of those systems is YOUR PRODUCT. That's a huge advantage when it comes to defining data flow requirements, because you're dealing with a known product on one end of the integration.

Obviously it's a waste of time to define every single possible attribute of a given endpoint system, so it's best to focus on those attributes that will have the biggest impact on the project's complexity and timeline.

You'll need to dive into and document the following attributes for each endpoint system (a sketch of how you might capture them follows the list):

  • Data File/Format - What will the data look like coming out of or going into the system? With modern APIs, this could be JSON. It could be tabular data in a CSV file. There are many ways to produce data in a machine-readable format.
  • Data Protocol - How will the integration have to communicate to retrieve or save that data? HTTP is very common with modern APIs, but SFTP, FTPS, SSH, and a number of other protocols are still widely used.
  • Authentication - How will the integration authenticate into the endpoint system? Does it use an open, standard authentication approach, like OAuth, or something proprietary or unique to that endpoint?
  • Known Capacity Limits - To the extent it's known at scoping time, does the endpoint have any capacity considerations? For example, many modern APIs publish in their docs exactly how many API calls you're allowed to make in a given time frame before the API returns errors.
  • Known Issues - Are there any other known issues about hosting, permissions, technology, or whatever that may impact the integration? The sooner you uncover complexity or uncertainty, the better. This is the catch-all question.
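
To keep these attributes consistent from one project to the next, it can help to capture them in a lightweight structure. Here's a minimal sketch in Python; the field names and example values are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class EndpointSystem:
    """Scoping-level description of one system being integrated."""
    name: str                          # simple, referenceable name (or alias)
    data_format: str                   # e.g. "JSON", "CSV", "XML"
    protocol: str                      # e.g. "HTTP", "SFTP", "FTPS"
    authentication: str                # e.g. "OAuth 2.0", "API key", "proprietary"
    capacity_limits: str = "unknown"   # e.g. "100 requests/minute"
    known_issues: list[str] = field(default_factory=list)

# Hypothetical endpoints for a two-system project
crm = EndpointSystem(
    name="CRM",
    data_format="JSON",
    protocol="HTTP",
    authentication="OAuth 2.0",
    capacity_limits="100 requests/minute",
)
erp = EndpointSystem(
    name="ERP",
    data_format="XML",
    protocol="SFTP",
    authentication="SSH key",
    known_issues=["nightly maintenance window", "strict IP allowlist"],
)
```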

When you're defining the scope for an integration project, it's also helpful to give each endpoint a simple, referenceable name. Sometimes an alias is even appropriate if you have multiple systems with the same or similar names.

You'd be surprised how many different names humans can come up with for different pieces of software. Later, when you're navigating the intricate details of the integration design, having a standard dictionary of names for the endpoint systems in play just removes one more potential point of confusion. You'll also use those names/aliases as part of expressing the data flows for the project.

Defining Data Flows

Data flows are a little more esoteric than endpoint systems. They represent, at an abstract requirements level, which data needs to flow in which direction and when.

Typically a data flow's intention is to move some primary record type, but it may also bring some related types with it. The record type may be fundamentally changed along the way, too (e.g. listening for a marketing event and pushing it to Slack as a message).
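
As a concrete sketch of that kind of transformation, here's roughly what listening for a marketing event and pushing it to Slack as a message might look like. The event shape and webhook URL are hypothetical; Slack's incoming webhooks do accept a JSON payload with a "text" field.

```python
import requests

# Hypothetical marketing event received from the source system
event = {
    "type": "campaign.email_opened",
    "contact": "jane@example.com",
    "campaign": "Spring Launch",
}

# The record type changes fundamentally in flight:
# a marketing event becomes a Slack chat message.
message = {
    "text": f"{event['contact']} opened the '{event['campaign']}' email."
}

# Placeholder webhook URL; a real integration would use a configured
# Slack incoming webhook for the target channel
requests.post("https://hooks.slack.com/services/T000/B000/XXXX", json=message)
```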

Data flows can be confusing for a few reasons...

For starters, data flows are kind of an abstraction. The actual technology that does the integration later, whether custom coded or a commercial product, may not literally execute things exactly as the data flows describe them. A data flow is a concept that helps bridge the gap between business need and technical specification, but sometimes that's hard to get your head around.

Which brings me to the integration technology, which might be custom code, an existing integration framework, or a commercial integration platform-as-a-service (iPaaS) product. Some technologies literally have a thing called a "data flow" (or a comparable term; see the next point). That means that depending on what tech you're using, "data flow" may actually have a very specific, tangible meaning.

Finally, a "data flow" can go by many names that all mean the same thing. Integration flow, data stream, automation flow, workflow, and pipeline are all valid terms for effectively the same concept. Choice of terminology is more about product marketing (and even habit) than anything else. But to someone unfamiliar with the integration world, they may all sound like different concepts.

Here's a relatively straightforward way to think of data flows... Imagine that you have to build pipes from one system to the other to move different data liquids back and forth. Which pipes would flow in which direction? Which would mix multiple different liquids along the way? Which might have the same liquid flowing in both directions in different pipes?

Mechanically speaking, that's what an integration does with data.

Use Cases and Data Flows

The idea with defining data flows is to get to a level deeper than "make X talk to Y". Many business stakeholders need some help getting from that top-level concept to a tangible specification that would allow someone to actually build that integration. Data flows provide that level of specification, but it still might be too in the tech weeds for some people.

It's often helpful to start by defining use cases first. These tend to be expressed in a format that's more comfortable for a business stakeholder, because they are part of virtually any IT or software project. The goal is still to get to a list of data flows, but starting with defining use cases gives business stakeholders the opportunity to break down "make X talk to Y" in business value terms, with little regard to technical realities.

Often you'll find a 1:1 relationship between a use case and a data flow, but that isn't necessarily true. The analyst scoping the integration should be able to refine those use cases into a set of data flow definitions, as sketched below. Relating the data flows back to their use cases will help business stakeholders understand why the technical output is what it is.
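
As a hypothetical illustration of that refinement, one use case might break down into multiple data flows while another maps 1:1:

```python
# Hypothetical use cases refined into directional data flow definitions.
# Note that the relationship isn't always 1:1.
use_cases = {
    "Support agents see order history in the helpdesk": [
        "Order: ERP -> Helpdesk",
        "Order line items: ERP -> Helpdesk",
    ],
    "New helpdesk contacts appear in the CRM": [
        "Contact: Helpdesk -> CRM",
    ],
}
```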

Data Flow Attributes

Just like with endpoint systems, you'll want to define a reasonable set of parameters for each data flow. Those parameters should give you a reasonable estimate of the effort to implement each data flow and, in sum, the entire integration project.

Those parameters should include the following (a sketch of capturing them in a structure follows the list):

  • Source & Target System - Which direction is this particular data flow going? You shouldn't define a bi-directional data flow. If necessary, you should instead define two separate data flows that move the same record type.
  • Primary Record Type - What is the main thing that is moving through the data flow? This will also relate to the trigger and filter later.
  • Additional Record Type(s) - If applicable, are there any secondary or child records that the data flow will move? These show up as "getting more detail" from the source system.
  • Trigger - What should trigger the data flow to move a record or set of records? Is it always running on some schedule? Will the source system provide events to indicate when certain records should move? Will a person take some activity that kicks it off?
  • Filter - Once that trigger is fired, should the data flow ignore any set of primary records? Sometimes properties like statuses or timestamps are used to filter out records that shouldn't flow through the integration.
  • Failure Scenarios - To the extent you know upfront, what would cause data to fail to move? Are there any obvious mismatches between systems? Does one or both have a reputation for storing messy data? Is one system unreliable?
  • Volume Expectations - To the extent you know upfront, what does the expected typical data volume look like? Building for 10 records/day is very different than building for 10 million. It's useful to understand both typical and peak volumes.
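
Continuing the earlier Python sketch, these parameters fit the same kind of lightweight structure. Again, the field names and example values are illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class DataFlow:
    """Scoping-level definition of one directional data flow."""
    source: str                        # endpoint name/alias defined earlier
    target: str                        # never bi-directional; use two flows
    primary_record_type: str           # the main thing moving through the flow
    additional_record_types: list[str] = field(default_factory=list)
    trigger: str = "schedule"          # e.g. "schedule", "source event", "manual"
    filter: str = ""                   # e.g. "status == 'active'"
    failure_scenarios: list[str] = field(default_factory=list)
    typical_volume: str = "unknown"    # e.g. "500 records/day, 5,000 at peak"

# Hypothetical data flow for the use case sketched earlier
orders_flow = DataFlow(
    source="ERP",
    target="Helpdesk",
    primary_record_type="Order",
    additional_record_types=["Order line item"],
    trigger="schedule (hourly)",
    filter="modified since last run",
    failure_scenarios=["order references a contact missing in the helpdesk"],
    typical_volume="500 records/day, 5,000 at peak",
)
```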

Some of this does sound a little technical, and yes, to some extent you are defining a pseudo-integration. The rule of thumb should be: define each of these in business terms, but align them as closely as you reasonably can to actual technical terms. You're trying to document enough information that an integration analyst and engineer can take it to the finish line. Their jobs are to fill in the blanks.

Note: It is not recommended that you get into the weeds and include individual field-to-field or property-to-property mapping as part of defining scope. That mapping effort is where most of the project's work resides. You should be scoping that work, not doing it.

Scoping Integration Projects

Now you understand the two ideas that go into scoping an integration project: endpoint systems and data flows. But, what are you supposed to do with them? How do you use them to actually define the bounds of a given integration project?

Put simply, the scope for an integration project is the sum of the scope of the data flows that must be implemented to achieve the desired business use cases.

Data flows are the primary lever for scope. Add more data flows, and you're adding more scope. But, to understand the data flows completely enough to communicate scope and estimate effort, you must also define the endpoint systems those data flows connect in the terms described above.

Those endpoint system attributes trickle down to the individual data flows and contribute to the complexity and estimate of each one.

For example, let's say you are building a simple integration between system A and system B. System A has a REST API and supports token-based authentication; system B has a custom-built SOAP web services API and proprietary authentication. On their own, those facts are useless. Who cares?

Those facts matter when it comes to defining a data flow between those systems--let's say to move data x from system A to system B. Consideration must be paid to the fact that you're reading data from a REST API and writing to a SOAP web service.

However, if your integration includes a second data flow, moving data y from system B to system A (the other direction), that logic is reversed. Reading and writing data from or to different systems is harder or easier depending on the attributes of that system.

This matrix of detail helps the integration team decide what the parameters for each data flow are, then combined, what the parameters of the overall integration project are. From there you can estimate time and budget for the project.

Estimating Integration Projects

Understanding the parameters of an integration project is important. Documenting everything discussed so far gives all stakeholders the opportunity to read it back and buy into what's ultimately going to be implemented. At some point, you'll be asked, "How long will this take?" or, "How much will this cost?". Now you must take that information and provide an estimate.

Integration projects are not always a straight line to the finish, so estimating the labor that goes into one can be a challenge. You certainly can't consider and estimate around every possible variable in the project, but you also can't abstract them all away. The key to good integration project estimation is focusing on the handful of attributes that will have outsized impacts on risk, and therefore on the time that goes in.

The attributes shared above for defining data flows and endpoint systems are exactly what should contribute to the estimates.

Remember, each of these attributes describes ONE OF the two or more systems involved in the integration. You'll want to consider them for all of those systems, relative to the data flows that are necessary to achieve the desired business value.

We also like to assign a t-shirt size to what we feel each data flow's complexity will be; that does require a little more experienced intuition, though. You don't want to be designing the whole thing before estimating. You want to use a quick glance at important attributes, so you can estimate from an imperfect but reasonable place.

The unit by which you estimate is up to you. You could use labor hours or something more abstract like user stories. Consider making each attribute above a multiple choice question with a specific estimate (or estimate adjuster) hardcoded to each answer. That'll help you codify what impact a GraphQL API (for example) would have on an integration project.
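
Here's a minimal sketch of that idea in Python, combining t-shirt sizes with hardcoded adjusters. The base hours and multipliers are made-up assumptions; the point is the shape of the model, which you'd tune as you learn how right or wrong your estimates were.

```python
# Made-up base estimates per t-shirt size, in labor hours
BASE_HOURS = {"S": 16, "M": 40, "L": 80}

# Made-up multipliers keyed to endpoint/data-flow attributes
ADJUSTERS = {
    "auth:oauth": 1.0,
    "auth:proprietary": 1.3,
    "api:rest": 1.0,
    "api:soap": 1.4,
    "api:graphql": 1.2,
    "volume:high": 1.25,
}

def estimate_flow(size: str, attributes: list[str]) -> float:
    """Base hours for the t-shirt size, scaled by each attribute's adjuster."""
    hours = BASE_HOURS[size]
    for attr in attributes:
        hours *= ADJUSTERS.get(attr, 1.0)
    return hours

# Two flows from the earlier A/B example, one per direction
flows = [
    ("M", ["api:rest", "auth:oauth"]),          # data x: A -> B
    ("M", ["api:soap", "auth:proprietary"]),    # data y: B -> A
]
total = sum(estimate_flow(size, attrs) for size, attrs in flows)
print(f"Estimated project effort: {total:.0f} hours")
```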

Estimation for any project is as much art as science. Start with a reasonable framework, but adjust as you learn how right or wrong you were. Try to codify that experience back into your estimation approach.

And, don't blindly trust any estimate. If you spit one out that feels wrong, check your assumptions.

Integration projects can be complicated enough. There's no reason to tear your hair out with overhead problems like trying to provide a reasonable project estimate. Using the above attributes in a consistent estimation model will help you cruise by this step while still providing informed estimates.

Save for a few details, this is basically how we estimate integration projects at Blended Edge, and it works pretty well. Try it out and let us know if we can help.