Data Platforms: Starting from scratch

Managing data when there’s already a data management system in place is relatively easy, and I wrote about some ways a Product Manager can tackle that problem in an earlier piece.

However, ideating from scratch on data management, as a PM for infrastructure or platform, is a little different and slightly more challenging. For starters, there might not be clear requirements for you to start with as the PM; there might simply be a vague notion that the organization needs someone to manage the backlog of ad-hoc requirements coming in. In addition, there might not be a strategy for how these requirements ought to be prioritized. To top it off, your team might have several stakeholders with conflicting interests, and you, as a PM who is just starting out with the team, might not have the bird’s-eye view needed to prioritize because you don’t have complete information. If you are the only PM managing the data, you’ve got an uphill task ahead.

So how do you go about it, then?

Why exactly was I hired?

That’s the first question you want to ask the person who hired you. Seriously. The answer to that question is a starting point to understand what the organization wants from your team (if you have a team) and you. Are they looking for ideas, because they don’t really know where to start? Do they have a bunch of engineers with ideas, but no one to manage the backlog? Do they have architects and data governance guidelines but too many stakeholders who want to determine the degree to which data is stored, managed and improved upon? Do they have a budget target, an idea of the scale and scope of the work ahead of you, or are you defining those?

What’s the budget?

You can’t start working on the data requirements until you’ve figured out how much money you have to burn through and how quickly.

Platform design can become expensive very quickly, especially if you’re going to move to the cloud, or invest in on-premise infrastructure, or make decisions on common data and infrastructure that multiple teams will rely on.

Some of these data investments will be huge, and you need to chart out a strategy for yourself that will depend on the fiscal tradeoffs you make against the scale, speed and quality of data you will be able to provide. If there isn’t a budget, find one, hunt for one or make one, because this is your baseline for how to invest for year one.

Who are my stakeholders?

This is basically the same regardless of whether you’ve got the data or are building the data. The best way to keep on track while building/ideating on a new data platform is to start with use cases, and to find the most compelling use cases, you have to meet the stakeholders. You want to identify who they are, but you also want them to tell you why they need the data and what their pain points are.

Design Thinking, while not traditionally used for data, can actually be very easily applied to identifying the use cases and then prioritizing them for development.

Start out with the goal of the session, and what the stakeholders’ primary objectives are for wanting a certain kind of data. Help them identify their pain points using empathy mapping and then find the opportunities from the pain points.

Then try finding larger themes around these opportunities by putting together journey maps. You might find that you need to adapt the process a bit. Sometimes, it might also help to journey map the data itself. At other times, you might find that different stakeholders interact at different stages of a process and that’s where they need the data. Either way, you want to get to the point where stakeholders can look at all the opportunities they see from the data or platform, narrow them down to their top few priorities, work through the feasibility-impact analysis and agree on the order in which the use cases will be delivered. If you have an architect or a senior data engineer, it helps to have them attend these sessions too: at best they keep the conversations grounded, and at the very least they are aware of what is to come.
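
To make the feasibility-impact step concrete, here is a minimal sketch in Python of one way to turn workshop scores into an ordering. Everything in it is an assumption for illustration: the use case names, the 1 to 5 scales and the choice to multiply impact by feasibility are hypothetical, and your stakeholders may well prefer a different weighting.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    impact: int       # 1 (low) to 5 (high), agreed with stakeholders (assumed scale)
    feasibility: int  # 1 (hard) to 5 (easy), estimated with the architect (assumed scale)


def rank_use_cases(use_cases):
    """Order use cases by impact x feasibility, highest first; ties go to higher impact."""
    return sorted(
        use_cases,
        key=lambda u: (u.impact * u.feasibility, u.impact),
        reverse=True,
    )


# Hypothetical output of a stakeholder workshop.
workshop = [
    UseCase("Self-serve booking reports", impact=4, feasibility=5),
    UseCase("Real-time pricing feed", impact=5, feasibility=2),
    UseCase("Customer churn dashboard", impact=3, feasibility=4),
]

for use_case in rank_use_cases(workshop):
    print(use_case.name, use_case.impact * use_case.feasibility)
```

The score is only a conversation starter. The point of the workshop is that stakeholders agree on the ordering, not that a formula decides it for them.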

After one long workshop or five, you’re finally able to see the need for the platform. This is where you want to be — you can now PM the hell out of these use cases before taking them to your team.

Product Strategy Document

Yep. You need this one. Looking at a vast set of use cases from your stakeholder meetings, your first instinct might just be to rank them in order of what the stakeholders just put before you, convert them into Epics and tickets and take them to the team.
Resist the urge. Trust me, you will pat yourself on the back for your foresight later.

Why do you need to add in some strategy to this?

Because, having worked with loads of data, my experience has been that the same data processes can be re-used like a library across a myriad of use cases. Building a platform in waterfall usually means you’re going to re-work the same process with significant effort each time you get to a “similar” use case.

Take a first pass and see if all the different use cases have some common elements or follow some common patterns. Do they all need you to simplify access to large transactional data? Do they all need a certain degree of data quality? Is data governance (maintaining PII or PCI data) required for any or some use cases? Do multiple teams/stakeholders need the data set up not just for easy access and simple analytics, but also for building analytics products on top of it? How frequently are they going to rely on data accuracy?
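
One rough way to run that first pass is to tag each use case with the needs it implies and count how often each need recurs. The sketch below assumes hypothetical use cases and need labels; the tagging itself is still a judgment call you make with your stakeholders and engineers.

```python
from collections import Counter

# Hypothetical tagging of use cases with the platform needs they imply.
use_case_needs = {
    "Self-serve booking reports": {"transactional_access", "data_quality"},
    "Customer churn dashboard": {"transactional_access", "data_quality", "pii_governance"},
    "Real-time pricing feed": {"transactional_access", "low_latency"},
}


def shared_needs(needs_by_use_case, min_use_cases=2):
    """Return (need, count) pairs for needs that appear in at least min_use_cases use cases."""
    counts = Counter(
        need for needs in needs_by_use_case.values() for need in needs
    )
    # Sort by count (descending), then by name so the output is stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(need, count) for need, count in ordered if count >= min_use_cases]


for need, count in shared_needs(use_case_needs):
    print(f"{need}: needed by {count} use cases")
```

Needs that show up across several use cases are the candidates for reusable, library-like processes; needs that appear only once may be better left to a later iteration.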

You will find some themes emerging quickly. You then want to put together a strategy document, just like you would for any new product. As a PM, what will be your approach to spending the budget, what processes will you prioritize, and what impact (based on the different use cases these processes will address) will this first MVP have? What are the iterations going to look like? How much time and money is this going to cost? What are the risks?
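
As a back-of-the-envelope aid for the MVP question, the sketch below picks processes for a first iteration under a fixed budget. It is a simple greedy heuristic, not an optimal planner, and the process names, costs and use case counts are all hypothetical placeholders for numbers you would get from your architect.

```python
def pick_mvp(processes, budget):
    """Greedy heuristic: favor processes that serve the most use cases per unit of cost.

    processes is a list of (name, cost, use_cases_served) tuples with cost > 0.
    Returns the chosen process names and the budget left over.
    """
    chosen = []
    remaining = budget
    by_value = sorted(processes, key=lambda p: p[2] / p[1], reverse=True)
    for name, cost, _served in by_value:
        if cost <= remaining:
            chosen.append(name)
            remaining -= cost
    return chosen, remaining


# Hypothetical estimates: (process, cost in budget units, use cases served).
candidate_processes = [
    ("transactional_access", 40, 3),
    ("data_quality_checks", 30, 2),
    ("pii_governance", 50, 1),
    ("low_latency_stream", 60, 1),
]

mvp, left_over = pick_mvp(candidate_processes, budget=100)
print(mvp, left_over)
```

Whatever does not fit in the first pass becomes the starting list for the next iteration, which gives the strategy document a natural shape: MVP, then iterations, each with its own cost and impact.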

So, I can present this strategy to my boss?

Not yet. You first need to ask yourself whether you absolutely understand how this is likely to be built.

Is this new platform best built on-premise, on a cloud service, on a private cloud, or as a hybrid? Do you need to provide an open-access analytics layer on the cloud to all stakeholders, and will that cost more? Are you building any services in-house?

Sitting down with your data architect and engineer/lead up front once this document is ready and letting them take a look to give you a ballpark on the cost is a great idea, and that back-and-forth will get you to a real plan for building the platform, with the correct cost estimates.

Once you’re satisfied you can defend every number and every part of the plan in the document, you can take it to your boss.

And, then I hand it over to engineering and they build it out?

Nope. If you thought this is where you break the plan into epics, and groom and plan your backlog, you’re close, but not quite there.

Because you’ve only tackled the process part of the platform. What about the data part?

Just because the use cases are similar in how the data for them can be obtained and processed, doesn’t mean the data itself is the same.

In fact, most of your use cases will need different sets of data, and identifying what data attributes are worth the time and money you’re spending is important too.

You could make this a stakeholder problem as well — if that’s how they want it. Every stakeholder could have the freedom to create on-demand data for themselves using self service data processes.

Whatever you decide, you still have to give this piece some thought, because you can’t store everything for everyone, and you need the optimal granularity or your platform becomes a labyrinth to navigate.

Can I finally get to building this?

Yes, actually you can. But even as you do this, don’t forget to apply the usual agile methodologies to the implementation. Often, processes can take a long time to implement and platforms can take a long time to build, but knowing how to break that implementation into user stories is critical for stakeholder trust and for team morale.

Once this first MVP is built, you can start working as any other PM who manages data would, as I described in my previous piece, Managing Stubborn, Elusive Data.

This is, by no means, an exhaustive list of things to do, and you will adapt much of this to your organization and needs, but missing any of these steps up front can end up costing a lot in the long run, and can make data evangelism harder for you, especially if your organization is just learning to be data-driven.

While I’m not done writing about managing data, I don’t want to renege on my promise to write more about managing a data product, so in my next few posts, I’ll detour down that route and come back to data in a bit.

Product Manager, Data Products in Travel. I’m curious about human interactions and their reflection in data, and what that says about society at large.
