Data Mesh Newsletter #020
Retrospectives 2021 Issue
The quick bit…
This issue is dedicated to our Data Mesh Learning community members. In our last issue, we made an open solicitation for 2021 retrospectives from the community. In short, we wanted to hear how their understanding of Data Mesh has evolved as they came to understand the principles and put them into action in their own organizations and/or their clients’ organizations. Their responses are below. It has been a great year of learning, debate, and early hands-on building. See you in 2022!
Special thanks to the following community members who took the time to reflect on this past year and submit their reflections to the newsletter: Pete Brown, Jon Cooke, Erica Cizniar, Matthew Darwin, Veronika Haderlein-Høgberg, Nick Heudecker, Jesse Paquette, Wannes Rosiers, Max Schultze, Juan Sequeda & Tim Gasper, Phil Taylor, and Molly Vorwerck.
Retrospectives 2021
Community Member: Pete Brown
While 2021 has been a challenge overall, it's been exciting to observe the growth of AutoZone's infrastructure data mesh. The number of onboarded services has more than doubled, stream processing capabilities have been enhanced, and the mesh platform has been successfully tested in DR scenarios. We're ending the year with a few takeaways for improvement. In 2022 we will place emphasis on two key areas: developer user experience and data security. The mesh MUST be more palatable to the developer community in order to continue driving adoption. By soliciting feedback and implementing recommended changes, we hope to double the number of onboarded services again this coming year. Also, we will be working more closely with InfoSec to address security concerns regarding data publication and consumption.
Community Member: Jon Cooke
We entered 2021 with an idea of the Data Mesh in our heads as an enterprise solution that used data products to push valuable insight to every corner of an organisation. We learnt that a lot of people were actually seeing the Mesh as a solution at the boundaries of the domains and not inside the domain itself. We also learnt that there was a fierce debate on some key aspects, like whether products were read-only and whether streaming was part of a typical mesh pattern. Answers largely came from seeing what people were actually building as Data Meshes, and it became clear that these were more guidelines and that a lot of companies were including these aspects in their core builds. Key challenges: one of the more interesting challenges we faced was building a mesh that can surface more than just tables in a database. We have one use case where Excel spreadsheets needed to be surfaced on the mesh and had to be intact, i.e. not converted to a CSV. The only way to do this was via data virtualisation, and we had to include a full self-service lifecycle, as the spreadsheets could be changed by the business at any time.
Community Member: Erica Cizniar
Genius Sports is a "Sports Tech" company of 1600 people on 5 continents, registered in London and listed on the NYSE. We build innovative technology, connecting the worlds of sport, betting and media, linking the fans with the federations. Two years ago, we designed a data strategy based on domain ownership, accessibility, cataloguing and data stewardship, at a time when the Distributed Data Mesh approach hadn't started trending. It is only recently that we found out that it belonged to the same school of Data Management, although this was not originally consciously intended. Like the DDM, we also promote a self-service data platform and federated governance. This approach is perfectly adapted to a company like ours, constantly growing through mergers and acquisitions, and where the autonomy of the domains is a sacred principle.
In mid-2021, we started focusing on a new aspect of our data strategy which will gain full momentum in 2022: Data as a Product. By this we mean that we are progressively adopting a product management mindset towards analytical data, putting data consumers at the heart of data product design, reducing bottlenecks, and empowering the subject matter experts -- the domains that produce the data. "Data as a Product" at Genius Sports is technically a simplified version of the DDM one: less microservices-inspired and better suited to the specific needs and realities of the company. But the cultural revolution - the famous "paradigm shift" coined by Zhamak - is the same; and so are the expected benefits along with the challenges. To mitigate the latter, the Data Management team will provide methodological guidance and support, while internal consultancy resources are provided to the domains with the bespoke skills that they are potentially missing; the domains themselves retain the ownership and product management of their data products. As we roll out our "Data as a Product" approach at Genius Sports, we expect our biggest challenge not to be technology but, as for any project of this kind, the management of change.
Erica Cizniar - Data Architect - Data Management - Genius Sports (geniussports.com)
Community Member: Matthew Darwin
2021 has definitely been "Data Mesh Year". Almost all my clients have talked about it in some guise or another, from "what is it?" to "how do we actually implement this?" What every discussion centred around, however, is the ambiguity in the architecture - is data mesh even an architecture, or a set of ideas to work existing architectures around? There have been some who have used Data Mesh as a buzzword in order to simply refactor existing architectures and technologies on their data platform, and some very interesting use cases that really stretch the ideas brought and question whether there really is a need for a divide between the operational plane and the analytical plane. I also presented a view in my blog for Slalom discussing options for those organisations who feel it is just too much change altogether, and whether they can successfully implement the required organisational changes alone.
Next year brings some new challenges for me personally, which I'll share more about in the new year, but they will involve a bit of a different course for me. From a Data Mesh perspective, I'm really hoping for some agreed-upon reference architectures to be presented, and I continue to be interested in those who are actually implementing any of the principles. And of course, the book! Above all, I'm hoping that the spark of conversation around data that Data Mesh has brought continues and people continue to think about solving their data challenges in new and interesting ways!
Community Member: Veronika Haderlein-Høgberg
My Data Mesh-year 2021 has been characterized by lively discussions with colleagues and colleagues-turned-friends about the role Data Mesh is playing in different industries – why some industries or regions are quicker to adopt this new way of thinking, while others do not seem to see a lot of fit for their everyday life. Also, living in the Data Mesh-, the Knowledge Graph- and the International Data Space-communities in my daily work, I have started exploring how these can complement each other. Obviously, there is already a lot of exchange between the DM- and KG-communities, but there has been less exchange between these two communities and the IDS-community. It is a pity, because I think a lot of fruitful thinking could come out of the three communities talking more to each other:
From my perspective, what is unique to the IDS and could bring some true added value to the KG- and Data Mesh-communities is the “trust as code”-approach – how they go about encoding the negotiation over who is allowed to use my data, under which conditions, etc., in actual software and/or standards and interfaces. One can say it is the realization of parts of what Zhamak Dehghani refers to as the Self-serve data infrastructure.
On the other hand, the IDS approach – perhaps for natural reasons, as it springs out of the IoT-world – doesn’t seem to address how the data comes about in the first place – what the organizational (Data Mesh) and conceptual (Knowledge Graph / Data fabric) pre-conditions are as to how to create data products people actually want to consume.
Community Member: Nick Heudecker
Data mesh became widely known, but not widely understood, in 2021. The response from the data management community ranged between curiosity and outright scorn, and it isn’t clear what triggered such visceral reactions. Some seemed unwilling to accept that data management is as much about managing the social aspects of people interacting with data as it is about acquiring and deploying technology. I’m hopeful that 2022 brings more open-mindedness around the data mesh concept and a willingness to try new approaches.
Community Member: Jesse Paquette
In 2021, with Tag.bio, I've primarily been focusing on refining and documenting the low-code syntax for governed, testable ingestion of domain-centric data into our universal data product engine (AKA, the "Flux Capacitor", AKA, the "FC"). I've also been refining and documenting the low-code syntax for creation of domain-driven apps that get deployed within each data product, particularly those that use pluggable R and Python scripts for analysis, visualization and reporting. I feel like both sides for configuration of data products (ingestion and applications) have matured nicely this year, and I think we're ready for a 2022 launch of the FC as a "domain-driven data product development kit" (4DK) to a wider audience. I've also focused on refactoring the testing of data and app integrity from manually written tests to an automated framework.
As a startup co-founder and technical leader, I also "wear other hats", and I've been working with both Tag.bio and customer developers on high-utility data products and data mesh implementations. Their feedback has been highly useful to the 4DK system described above. I've been working with our Core Stack team on our deployable Kubernetes cluster and CI/CD system for automated, cloud-agnostic installation, and automated testing and deployment of data products from Git repositories. And while my front-end developer responsibilities have (fortunately) been relaxed in 2021, I coordinate our team around fixing bugs, refining UX, and improving data visualizations in the Tag.bio web application.
2022 is fixing to be a banner year for Tag.bio - our Healthcare and Life Sciences customer base is already hitting a "hockey stick" inflection, and with a healthy Series A round in Q1, we'll expand the business and technical sides of the company to handle the surge. I'm hoping to learn what it takes to better guide the process and manage the teams as we scale up - it's never as easy as I expect it to be. I'm perpetually thankful for the hard work and support of my co-founder and colleagues - and I'm pleased to have had so many useful discussions with the Data Mesh Learning Community.
Community Member: Wannes Rosiers
2021 was the year that we put the name data mesh on the north star we as a data team had already been chasing since October 2019. It was the year we started presenting the end state, both internally (to business stakeholders) and externally. The chats with Scott and the presentation at the data mesh meetup brought a bigger perspective to the challenges we tried to solve with data mesh and the DATSIS principles. This led to introducing the concepts of discoverable and published data products, to better-defined roles in our ownership triangle, and much more. The interaction with my business counterparts and executives led to pragmatism, which implies a higher likelihood of successful implementation. One more insight to share, in a quote from my architect: "there is no such thing as domain transcendent or agnostic data products, you just have introduced a new domain."
2021 is also the year that I changed employer: I joined the Golazo group in November. A smaller team, a broader responsibility. Those two months have confirmed to me that the setup we pursued at DPG fits them but is overkill at Golazo. Yet Golazo tries to tackle the same issues as well, and having the luxury to tackle some elements at the source instead of in your data landscape brings new opportunities. That's my hope for 2022: to bring forward the maturity of domain-driven design in the application landscape, to take those benefits to our data landscape so that we pragmatically tackle the issues data mesh wants to overcome in a smaller-sized company, and to communicate broadly about it so that other smaller companies see a productive example, a blueprint, and bigger companies can benefit from the learnings of our pragmatism. And of course I hope as well to keep learning and being inspired by the stories of all other community members.
Community Member: Max Schultze
Coming out of 2020, I was lucky enough to be among the few early adopters of Data Mesh, and I had taken many opportunities to discuss with fellow community members and share my experience. 2021 for me was all about actively contributing to the development of Data Mesh myself, growing the community, and educating many more data practitioners who were as yet less experienced, always putting the focus on how to turn the theory into practice. In January I was given the opportunity, for the first time, to hold a Data Mesh in Practice online live training at O'Reilly, later to be repeated 3 more times throughout the year. Over the months I was able to get involved even more with the community and the thought leaders within it, for example by participating in panel discussions with Zhamak herself or getting to meet hundreds of data practitioners at my first in-person conference in almost two years. The peak for me, however, was being allowed to write an ebook on Data Mesh in Practice, which has just recently been published as a major deep-dive resource with a focus on turning things into practice. For the next year I sincerely hope that more community members will be able to join the conversations and drive where Data Mesh as a movement is going.
The ebook is available directly on O'Reilly or also publicly available through Starburst at https://bit.ly/3E5i4Qu
Community Members: Juan Sequeda & Tim Gasper
As hosts of Catalog and Cocktails, an honest, no-bs, non-salesy podcast about data management, we have touched on a variety of topics. Data Mesh has been the hottest one. We began the year trying to decipher what data mesh is. It started at a high level and was fairly abstract. Our April 2021 Catalog and Cocktails podcast with Zhamak had the following takeaways:
Data Mesh is not a thing. Not just an architecture. Not something you buy. It’s an approach. It’s a vision for a better future and the path to get there. Think about the ideal world. Break the problem into smaller pieces. Result: Move data to the source and give it ownership
Nodes on the mesh are shareable data. Make your data product awesome, balanced with incentives. Give bonus if people like it, but also if it’s connected with others. Data has a heartbeat.
Compute + Policy + Data = one autonomous unit
Throughout the year, we all enjoyed the data mesh drama that appeared in social media. We even had our own data mesh debate podcast.
This is where we have ended after the year:
It’s a paradigm shift towards 1) treating data as a product and 2) decentralization
Treating data as a product happens if you move data to the domain. The experience should be the same as buying a product on Amazon (find it on your own, have helpful information to make the decision to buy and how to use, have reviews, provide feedback).
You need to find the right balance between centralization and decentralization and this depends on the size of the company and culture.
Four clear pillars: domain-driven ownership of data, data as a product, self-service platform and federated computational governance
It’s not a technology. If someone is selling you a data mesh product, RUN AWAY!!!
The current focus is on technology, especially the self-service platform. This is not surprising (we are technologists and this is what we know best) but worrisome (will we end up going back to old habits?)
Federated Computational Governance is the key enabling pillar, but to the best of our knowledge, we haven’t seen enough concrete detail about it.
What we are expecting to see in 2022 is defining what Federated Computational Governance actually means for data mesh and the role of data catalogs, and more case studies of data meshes successfully implemented in the wild!
Juan Sequeda and Tim Gasper
Hosts of Catalog and Cocktails Podcast
Community Member: Phil Taylor
I was one of Scott’s earlier recruits into Data Mesh Learning at the beginning of February. The invitation was compelling: to join a newly-formed vendor-free community with the simple goal of discussing and promoting the concept of a data mesh and how that might transform the world of data analytics for the better. I was already aware of the idea and Zhamak’s early presentation, so naturally I was ‘in’. And, I have to say it has been great from day #1. First, it has been tremendous to watch the community grow ten-fold and more; seeing how people have become much more actively engaged since those humble beginnings, and as we each find our respective feet with this emerging story. Second, there have been many interesting shared experiences and perspectives; bouncing ideas off each other, sometimes riffing-off a little too freely but all the same, a great go-to place for an encouraging, reinforcing, or alternate second opinion, and more. Debate is good. Third, and perhaps best, has been the formation of new friendships and collaborations, some that are outwith the confines of DML. Now that is great community.
Outside of community and within my professional consulting work, I’m finding something of a mixed or delaying response to Data Mesh. One or two have shown open interest enough to explore (the usual early-adopter types). Most are skeptical or plain dismissive. The first observation is that a few too many clients still don’t make a distinction between data and information (this is especially true of those who consider themselves ‘data-rich’) and this is hampering their ability to see the topological relationships that exist in the latter and how product-level thinking can better support the interconnectedness that needs to exist or form as business evolves. The second observation is that clients remain gridlocked on existing organization fault lines, and for many there’s little-to-no appetite for setting up mini software data-factories within domain teams, even if the platform were to be largely provided for them. In both examples, clients are seeing data mesh as a curiosity lacking relevance rather than for the transformation value on offer. For them, more social proof with demonstrable cost-benefit value will be needed before they overcome current inertia.
So, my one big data mesh wish to Santa this Christmas and the year ahead is to see more real everyday wins that my clients will readily relate to at their level of maturity, and (if you’re not too busy, Santa) maybe a blueprint or two to help get them started… Seasonal best wishes to everyone. Stay safe.
Community Member: Molly Vorwerck
It's been an honor to be part of this community, and witness firsthand the evolution of the discussion around data mesh and building distributed data systems. Over the past several months in particular, I've spoken to dozens of companies embarking on this journey, and have had the opportunity to write about the challenges, opportunities, and tradeoffs associated with adopting the data mesh model. Regardless of whether or not teams choose to migrate to the mesh, there is a level of openness to new ideas and perspectives in DML that I've rarely experienced in technical groups. I anticipate that the early discussions in the DML Slack group about data mesh will help form the basis for the future of domain-driven data architectures for years to come!
/Newsletter
If you have questions/comments/concerns/suggestions for future newsletters, please let us know at datacequia@gmail.com.
Special thanks to Datacequia LLC for contributing time and effort to focus on this community and hopefully helping you learn more and give back.