We are adding a new section to highlight content that was published well before we started our newsletter, often even before we started the DML Slack. We plan to add one or two a week on average. And of course, Zhamak’s articles (#1 and #2) on Martin Fowler’s blog are the biggies, but we want to cover other ones too.
Data Mesh Applied
by Sven Balnojan; article (on Medium…)
While this is a fantastic article, there are some things we call out at the end, in the Anti-Patterns section. We do this specifically because many readers have anchoring problems after reading this article: it is such a good explanation of how a data mesh implementation could work that it is easy to fixate on its particular approach.
The quick and pithy summaries provided are a good introduction to data mesh as an overall concept. See the Anti-Patterns section below regarding the on-demand comment, but overall the summaries are right on.
There is also some useful foreshadowing in the second side-note about the need for a common language in data mesh: an agreed definition of exactly what each word means, so that when anyone in the organization sees the word “customer”, they can find a specific definition that spans all domains.
The article also gives important context on the data lake: mainly that it will still be part of the data mesh, but there won't be one big centralized lake.
The checklist for why you would consider a data mesh is crucial too. DATA MESH IS NOT FOR EVERY COMPANY/ORGANIZATION. Another great checklist for deciding whether data mesh is right for you comes from a Barr Moses post, What is a Data Mesh — and How Not to Mesh it Up.
In general, we agree with the recommended steps for figuring out which initial domains or data products to carve out from the data lake. There are some places to quibble, but overall it makes sense. One caveat on the Stepping Stones section, however: you would need to train your data-generating team on data product ownership, so you are giving them new responsibilities without much incentive.
Another thing to be aware of is that there is not much coverage of the downstream data product idea: a data product created by transforming data from another data product and/or by combining data from two or more data products. These are important to remember when beginning your data mesh approach.
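To make the downstream data product idea concrete, here is a minimal Python sketch. It is our own illustration, not something from Sven's article: the DataProduct class, the revenue_per_customer function, and the crm/sales/finance domains are all hypothetical names, and real data products would carry far more (schemas, SLAs, access control, output ports).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass
class DataProduct:
    """Hypothetical, stripped-down data product (illustration only)."""
    name: str
    owner_domain: str
    read: Callable[[], List[Row]]   # how consumers get the rows
    upstream: Tuple[str, ...] = ()  # names of the products this one is derived from


def make_source_product(name: str, owner_domain: str, rows: List[Row]) -> DataProduct:
    """Wrap a domain's own rows as a source-aligned data product."""
    return DataProduct(name=name, owner_domain=owner_domain, read=lambda: list(rows))


def revenue_per_customer(customers: DataProduct, orders: DataProduct) -> DataProduct:
    """Downstream product: combines two upstream products and transforms them."""
    def read() -> List[Row]:
        # Transformation step: total order amounts per customer.
        totals: Dict[object, float] = {}
        for order in orders.read():
            key = order["customer_id"]
            totals[key] = totals.get(key, 0) + order["amount"]
        # Combination step: attach the totals to the customer records.
        return [
            {
                "customer_id": c["customer_id"],
                "name": c["name"],
                "revenue": totals.get(c["customer_id"], 0),
            }
            for c in customers.read()
        ]

    return DataProduct(
        name="revenue_per_customer",
        owner_domain="finance",  # assumed owner; a downstream product still needs one
        read=read,
        upstream=(customers.name, orders.name),
    )


# Made-up sample data for the two upstream products.
customers = make_source_product(
    "customers", "crm",
    [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Lin"}],
)
orders = make_source_product(
    "orders", "sales",
    [
        {"customer_id": 1, "amount": 30},
        {"customer_id": 1, "amount": 12},
        {"customer_id": 2, "amount": 5},
    ],
)
derived = revenue_per_customer(customers, orders)
```

The point of the sketch is that the downstream product reads only from the published interfaces of its upstream products, records which products it depends on, and has its own owning domain, rather than reaching into the source systems directly.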
Anti-Patterns
1) For most of the article, Sven uses APIs as the only way to access data. He does mention other ways later, but it’s still crucial not to lock on to API-only or even API-first. Most companies will need other “data output ports” (ways to consume data from a data product), as many data analysts will not be able to easily call APIs.
2) Sven recommends calling data via API directly from operational systems. We do not generally recommend this approach unless your team is VERY good at writing data APIs specifically. If your operational system is an S3 bucket, it might not be a big issue. If a malformed query hits an operational database and brings down your production application, that’s a MUCH bigger issue.
3) Sven says:
But that means you are overly limiting your domains to try to fit into one model. You lose the agility and flexibility afforded by data mesh if you do this. There should be some standardization but it’s a push and pull balance between no standards and strict standards that each company needs to find.
4) An API is not