
Making Data Governance Fun with Tiankai Feng, Data Strategy & Data Governance Lead at ThoughtWorks

Adel and Tiankai explore the importance of data governance in data-driven organizations, how to define success criteria and measure the ROI of governance initiatives, non-invasive and creative approaches to data governance and much more.
Apr 2024

Guest
Tiankai Feng

Tiankai has worn many data hats in his career so far: marketing data analyst, data product owner, analytics capability lead, and, for the last few years, data governance leader. He has found a passion for the human side of data: how to collaborate, coordinate, and communicate around data. Tiankai often uses his music and humor to make data more approachable and fun.


Host
Adel Nehme

Adel is a Data Science educator, speaker, and Evangelist at DataCamp where he has released various courses and live training on data analysis, machine learning, and data engineering. He is passionate about spreading data skills and data literacy throughout organizations and the intersection of technology and society. He has an MSc in Data Science and Business Analytics. In his free time, you can find him hanging out with his cat Louis.

Key Quotes

There's this one line in my data governance rap song that says, if data is the force, then we're the Jedi council, right? It's really clearly a Star Wars reference, but the Jedi council and the force, it kind of makes sense and everybody can kind of get behind it.

If I already notice a bias or a certain resistance towards the topic of data governance, I actually start my communication with that quote and then see the aha effect in many people, who are like, oh, OK, this is not how I expected this presentation or this workshop to start. Let's hear that person out.

Starting data governance schemes can be tricky. You can say, we'll invest in MDM, let's start immediately, all at once. But that isn't really working, right? People are really afraid of failure and misinvestment. So asking people or organizations to invest such a big amount of money immediately is really doomed to failure, right? So I would also say that with that quick win approach, you're actually on a better track. And sometimes you don't even need money to do something. Maybe it's really just an internal process with specific stakeholders that can help to get something fixed. So, for example, let's say again, with that example of a product company, there are product managers that are putting money and producing product data at the very beginning. And then it flows down to, let's say, different data teams that are doing data science with it.

And if the data science teams think the quality is too bad, and the accuracy goes down and they cannot properly forecast the demand anymore, leading to some financial impact, then there needs to be an agreement on what is actually correct and what isn't correct. So basically agreeing on what quality the data should have. And then, let's say, even training these data producers to fill in the data attributes in a certain way, and to prioritize certain attributes so they arrive on time and in the right way, could already improve the quality significantly, just with that small kind of communication measure, right? The other way around could be where certain master data, for example, is not harmonized, but on specific attributes you see the same pattern of issues popping up over and over again. You could basically even code one kind of harmonization algorithm into the downstream systems.

And just to showcase, look, this is how we are rule-based harmonizing these master data inconsistencies, and it's working perfectly for all the systems downstream. If we would give a little bit of investment, then we could actually run it properly as a service or as a tool, so we don't have to do it in-house. But this is definitely the way to go. So you can always be creative about how to start small.

Key Takeaways

1. Data governance should be integrated seamlessly into existing business processes rather than being a disruptive force. This non-invasive approach ensures smoother implementation and better acceptance across departments.

2. Employ creative and engaging methods to communicate data governance initiatives. This could include using analogies, stories, or even gamification to make the concepts more accessible and interesting to stakeholders.

3. To showcase the value of data governance, focus on delivering small, quick wins that clearly demonstrate benefits. This strategy helps build momentum and support for broader data governance efforts within the organization.


Transcript

Adel Nehme: Hello everyone, I'm Adel, Data Evangelist and Educator at DataCamp. And if you're new here, DataFramed is a weekly podcast in which we explore how individuals and organizations can succeed with data and AI. Now here's an intriguing thought for you to think about. While every organization claims the importance of data quality, curiously their investments don't always match up.

It seems like despite the critical importance of data quality, data governance in general is suffering from a serious branding problem. Data governance is often looked at as the data police within the organization, whereas that shouldn't really be the case. So how can we shift the perspective and inject a little fun into data governance?

Enter Tiankai Feng. Tiankai has worn many data hats in his career so far: marketing data analyst, data product owner, analytics capability lead, and, for the last few years, data governance leader. He's currently Data Governance Lead at ThoughtWorks, where he has found a passion for the human side of data: how to collaborate, coordinate, and communicate around data.

Tiankai often uses music and humor to make data more approachable and fun. Throughout the episode, we talk about how important data quality is, why data governance efforts don't have to be seen as policing, how to best approach building effective relationships with stakeholders as a data lead, best practices for injecting fun into data governance, and a lot more.

And without further ado, on to today's chat. Tiankai Feng, it's great to have you on the show.

Tiankai Feng: Thank you so much for having me. Absolutely.

Adel Nehme: You are a principal data consultant and head of data strategy and data governance at ThoughtWorks. You are best known for producing amazing content and songs on data governance that really make data governance fun for everyone, even for non-data folks.

So I'm excited to unpack your approach to making data governance fun, and how leaders can maybe learn a trick or two. But maybe let's set the stage. Why is data governance essential for today's data-driven organizations?

Tiankai Feng: I think the reason why data governance has increased in importance over the last few years is really fourfold, I would say. The first one is that we live in a world now where business models are constantly changing, simply because of digitalization. COVID has also caused a lot of transformation towards digital sales channels as compared to, for example, offline sales channels. That means that because the operational data landscape is changing so much, business models, business processes, and also the data landscape are changing accordingly and now need to catch up, basically, with the new models that we have. Also, we have made a lot of effort, as a collective of all the organizations, to invest more in data literacy and data fluency.

And we made data more available to democratize data. But that also meant that, with more people knowing how to use data and how to get their hands on data, we lose a little bit of the overview of what data is being used for and what's happening to the data that we have in our organizations. We also have more and more regulations, right? Last but not least, the EU AI Act. We have had GDPR in place for years, and there's more and more coming as well, which puts more pressure and, I would say, requirements on what we do with data. And lastly, all the hype around AI and any kind of innovative technologies puts a lot of additional requirements on what the data needs to look like and what it needs to do. And to really have a scalable way of making all of that safe and, let's say, valuable in a structured way, we need data governance in place, to really ensure that we keep data secure and trustworthy, and to enable everything I just mentioned.

And we need the right basically guardrails and guidelines to do so.

Adel Nehme: That's an awesome overview, and I love the different factors that you lay out here. You mentioned something around AI: to be able to operationalize a lot of the cool use cases that we see today, we need high-quality data in a scalable fashion, right?

I think there's often when you look at leaders, especially maybe non data leaders, there's often a competition between, you know, data governance projects and investments, or, you know, that shiny new toy like AI. And I think this creates, you know, this view and this perception that data governance is seen as an overhead and maybe not as a strategic priority.

Maybe unpack that perception a bit more. What impact do you think it has on data governance initiatives?

Tiankai Feng: I think it is true, right? For data governance, there are always three types of impact: avoiding risk, decreasing costs, and increasing revenue. And exactly in that order is also the level of difficulty of assigning and quantifying a price tag for it. Avoiding risk, I think, is pretty clear, right? We have certain legal requirements, and if we don't follow them, there are either fines or somebody has to go to jail. Decreasing costs usually comes down to a lot of manual effort being put into cleaning data. So the aim is really to operationalize and automate this, so we don't have to clean up data in the first place, because it is just good by default.

That would basically decrease the cost of the manual effort. And the empowerment of data to actually generate more revenue is tricky, because that means you have to have a counterpart that is actually using it in a better way to then generate revenue. The thing is, data governance on its own cannot define its value.

And this is, I think, what we all need to agree with. Data governance always needs the cross functional collaboration of any decision making teams and data using teams to then actually feedback the value of data governance back to data governance so we can all agree on what the value of data governance is.

So I think in principle we all agree on what data governance can do, but only when we start taking action and actually do it can we start assigning value. And that, I think, is really the crux here in the value of data governance.

Adel Nehme: Yeah, and that's such an interesting insight here on the value of data governance, and I think this connects really well to my next question on what makes a successful data governance program, right? And I want to deep dive into how you measure the ROI of a data governance program once we unpack the tenets of success.

So maybe walk us through the foundations of a successful data governance program. What do you think are the key dimensions for succeeding with data governance within the organization?

Tiankai Feng: I think the traditional framework for data governance is always people, process, and tools, right? That's, I think, a given, and it impacts a lot of data activities, to be honest. But I think when it comes to data governance specifically, there are a few success criteria that are really important. The first one, I would say, is really the buy-in and the agreement of leaders, collaborators, and team members to the cause of data governance. So, first of all, we need to agree that data governance is the right thing to do, and get the right funding and priority and commitment from everyone to do it together, right? Once you have that, there are a lot of expectations towards data governance. So here, I think, the next step is to not try to boil the ocean, but rather to start small and think big.

So where can we quickly show value in a small amount of time, to then scale it out and get more and more buy-in, so we can actually do big things with data governance as well? I think another thing is that data governance should not really be disruptive, but rather non-invasive, so to say, where it's really embedded into business processes and not interrupting anything too much.

And once that all comes together, I think then you can actually start data governance in the right way.

Adel Nehme: Okay. And, you know, I wanna unpack a lot of what you mentioned here. You know, you talked about the importance of being iterative and agile, right? And being able to deliver quick wins to scale the governance program. You also talked about the importance of data governance not being disruptive.

I wanna focus on that because I'm gonna touch upon the agile nature of a successful data governance program. Maybe walk me through, in a bit more detail, what a disruptive data governance approach is and what a non-invasive data governance approach is. Maybe walk me through an example of a business process.

Tiankai Feng: Absolutely. So let's say, for example, data quality management. Let's say a certain data set that is downstream, and that data scientists are using, is continuously, in the same way, of bad quality. The disruptive way would be to say, let's stop all of the activities around that data set.

Now let's talk about what is going on with the data set. We assign new roles and we hire new people to really deal with that data quality problem. And then we fix it and get funding for it. Everything else needs to wait until this is fixed. That would be the disruptive way. The non-invasive way is basically following what is already there and trying to leverage that to make it as little disruptive as possible.

For example, that means first starting with the complaints of the data science teams, right? So when they say this is bad data quality, try to switch the conversation and say, instead of telling us what is wrong, could you tell us what you see as being correct in the data? That leads to requirements.

You can then take those requirements to the data production and upstream system side and check, through small conversations or just by going into the systems yourself, whether those requirements are met or not. Once you have that, you would know that certain people or certain functions probably have an impact on why the data has not been inputted or produced in line with the requirements.

So the idea would be to just say, for the next iteration, when you create the data, let's give you a little bit of training on what you could improve, because once you improve that, we could gain, let's say, 50 percent more accuracy in our data model, the algorithm that our data scientists are working on.

So really be the coach, be that consultant that goes in and embeds new thinking and new ways into the existing tasks and mindsets, rather than saying, we need to stop everything just to do this one thing now. Then nobody's happy, because you're interrupting the business operations, right?

So that would, for example, be one difference.
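
One way to picture the "tell us what correct looks like" step is to write the agreed requirements down as small, checkable rules and measure how often upstream records meet them. The Python sketch below is only an illustration under stated assumptions: the field names (`price`, `category`), the two rules, and the sample records are hypothetical and do not come from the episode.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Requirement:
    """One agreed definition of 'correct', phrased as a check on a single record."""
    name: str
    check: Callable[[dict], bool]  # returns True when the record meets the requirement


# Hypothetical requirements, as they might be gathered from a data science team.
REQUIREMENTS = [
    Requirement(
        "price is a positive number",
        lambda r: isinstance(r.get("price"), (int, float)) and r["price"] > 0,
    ),
    Requirement(
        "category is filled in",
        lambda r: bool(str(r.get("category") or "").strip()),
    ),
]


def pass_rates(records: list[dict]) -> dict[str, float]:
    """Return the share of records (0.0 to 1.0) that meet each requirement."""
    total = len(records)
    return {
        req.name: (sum(req.check(r) for r in records) / total if total else 0.0)
        for req in REQUIREMENTS
    }


# Hypothetical product records as produced upstream.
records = [
    {"sku": "A1", "price": 19.9, "category": "shoes"},
    {"sku": "A2", "price": None, "category": "shoes"},
    {"sku": "A3", "price": 49.0, "category": " "},
    {"sku": "A4", "price": 5.0, "category": "shirts"},
]

rates = pass_rates(records)
for name, rate in rates.items():
    print(f"{name}: {rate:.0%} of records pass")
```

Pass rates like these give the coaching conversation with data producers something concrete to point at, without stopping any existing process.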

Adel Nehme: I love that example. And what you're touching upon here is actually quite a big challenge when it comes to managing even data governance projects, let alone a data governance program, which is the importance of cross-functional collaboration: working with downstream and upstream teams and business stakeholders, and keeping everyone together. Maybe walk me through best practices you've learned in managing these types of programs and projects. What tips and tricks can you share here?

Tiankai Feng: I think one thing I would say to get started is to follow the pain, as I call it. My assumption is that in any organization, there's a certain team or certain people who are very frustrated with the data. And it's kind of a big part of their day-to-day job that they continuously and frequently get upset over the data.

So the idea is, if we follow the pain and talk to those people to get their data fixed or get their data problems sorted, there are two things that already work from the beginning. First of all, they're very frustrated about it, meaning that they're very willing to work with you, because they will do whatever it takes to get the thing started for them, right?

And number two, because they are potentially fixing data manually all the time already, they have a very clear opinion of what is correct and what is not correct. So you start in exactly the right way by saying, we know how important this is for you, and we also know how to define correct for you. So let's go with this one.

And the good thing is, once the problem is actually solved for that stakeholder group, they become advocates for data governance too, right? They know that data governance can work, and they know how important cross-functional collaboration is, and they will spread the word about how good data governance is and how effective it is to others as well.

Thereby bringing more and more people to it.

Adel Nehme: So it's creating that virtuous cycle of evangelism within the organization.

Tiankai Feng: Absolutely. Absolutely.

Adel Nehme: That's really great. And you touched upon people, process, and tools as the crux of a successful data governance program. Maybe deep dive into how to balance these different dimensions, right? Are they in tandem?

What is the most important dimension? What do you start with? So I'd love to get your feedback here.

Tiankai Feng: I would say always people and process first, and then technology. That is, I think, always the thing I'm trying to evangelize: don't just buy into a data catalog, or any tool that calls itself a data governance tool, without actually having the right mindset and the right organizational structure to do something with it, right?

So it really starts on the roles and responsibilities and the processes side of how we want to do data governance. That really means asking what roles we need to have in place to get started. For example, you would need to have defined data owners or data stewards, people who are the business experts in their domains, who can help you with the data problems and be responsible for them.

That also means having clearly defined data use case owners or data product owners, for example, who actually feel responsible for the downstream side and are able to give feedback, and so on. It also means that you have the right technological counterparts, right?

That could be various system owners of operational systems, for example, who actually know that they are a data source, so they need to respond to certain things. Once you have assigned those roles, you can bring the script together and come up with a plan: how do we do data governance best?

And then the next part is to say, how do we embed our data governance mechanisms into the existing processes? This is where the process part kicks in. Let's say we are a product-driven company, and we are creating fashion products, for example, as a key business process.

So when, during our product creation-to-market process, do we check data quality, and how can we ensure that this data is compliant along the way as well? Can we put mechanisms in as early as possible, or does it have to be later, and so on? And then agree on what that process should look like.

Only then do you know what you need, right? That means: this is what we as people need, and this is what we, working on that process, need. Let's now look for the tools and technology that exactly meet our requirements to make this happen. And then you go into selecting technologies, for example data catalogs or data quality tools or master data management tools, to really enable the process and the people part that you've established before.

Adel Nehme: Yeah. And you mentioned something when you were talking about the people part: do not invest in tools before you have the right mindset within the organization. Do you want to expand on what the right mindset looks like?

Tiankai Feng: The thing is, tools are not supposed to replace human beings, right? They're supposed to amplify human beings, or make them more productive in their day-to-day tasks, so they can focus on other things. Having said that, if we don't have the right mindset, where, for example, instead of thinking in data lineage or in rules and policies, we think in quick fixes and manual data cleaning, then giving me a data catalog is not helping me, right?

Like, this is just additional work. I still just want to clean the data manually. Why do I have to start documenting? So if you haven't convinced somebody up front that a data catalog, for example, is there to enable transparency and findability of data, and to have definitions in place that everybody adheres to, then they don't really know what to do with it.

So you are actually making it even worse: you bring in a new tool and nobody's going to use it, because they don't even know what to do with it. This is what I mean by having the right mindset first. So when the tool comes, they know what they're supposed to do with it and what they're focusing on with that specific tool.

Adel Nehme: Yeah. And then touching upon something that we discussed earlier around getting people together and organizing folks, right? I think one big challenge on the people side that we often see in any transformation program, let alone something as challenging as data governance, is aligning people around a common goal and dealing with resistance, right?

So how commonly do you get resistance in these types of engagements? Maybe unpack the reasons behind the resistance and how to deal with it.

Tiankai Feng: Absolutely. I always make the joke, just for a bit of context, that when I started my career, I was a data analyst, and as a data analyst, you're super welcomed by anybody, right? Like, oh, that person brings insights and recommendations. Let's invite them into any team meeting and let them present. Once I switched to the data governance side, people didn't answer my emails or my calendar invites anymore.

It was just like all the doors got shut and nobody wanted to even look at who it was. And it was a really interesting learning, because I realized that there are so many stereotypes, and almost a bias, towards data governance because of the word itself, so that you get resistance from the very beginning, right?

So what I noticed is that the main reason for it is twofold, right? Governance itself sounds like I'm the police, right? So governance is coming, they're going to check what we're doing wrong and tell us to stop things. That is what the initial perception is. And the second one is, once governance tells us what to do, we need to change the way we're working, and we don't like change.

We just want to continue how we're doing it now and try to ignore governance as long as we can, right? So that fear of being policed and the fear of change are really, I think, for me, the two main reasons for resistance. And the way to deal with it is to not start with what people are doing wrong, with me as the police telling you what is right and what is wrong, but rather to start from an empathetic place. That means I usually start the conversation with, tell me about what you do with data and what your day-to-day problems with data are. Once people have shared that, because everybody loves complaining, I can say, well, data governance can help with that, and this is what our plan could look like to deal with it together. And the second step is also to say, we as a data governance team want to work with you and not against you, right? In the end, we all want it to be successful. And when we do it together, it's better than when we do it in isolation. So really rally people together behind the same cause.

Again, the moment you have a proof of concept where you can showcase the first successful collaboration, the resistance immediately drops, because other people and other teams also see that it can work. The step to that very first case, that is hard, but it's doable, I would say.

You just need a little bit of patience.

Adel Nehme: And you mentioned here gathering momentum by showing a win, right? Or showing the impact of a quick win in data governance. I think this is really related to what we discussed earlier in our conversation, when we talked about data governance being seen as an overhead. We had Randy Bean on the DataFramed podcast. He was talking about the short tenure of chief data officers within organizations, right? And one of the reasons why chief data officers tend to have a short tenure is that they get a lot of resistance to a lot of the projects that they want to lead, one of them being data governance related.

He told the story about this chief data officer who went to the CEO and said, hey, I need a 25 million budget for a master data management investment, and the CEO was like, I don't know what master data management means, right? So,

Tiankai Feng: Right.

Adel Nehme: And I think my hunch is that data governance is often a marathon and not a sprint, and it can quickly become an expensive effort, right?

Think of all these tools, all these efforts. And so CEOs and CFOs are very reticent to approve these investments. They don't want to approve these investments quickly. So given that, what are quick wins data leaders can show on the data governance side that can drive excitement around the data quality agenda?

Tiankai Feng: That's a great question. I mean, generally speaking, and also given the example that you just mentioned, asking for 25 million, for example, for an MDM tool is what I would call a big bang approach. So you basically start with, I need so much money to do all of this. Let's start immediately, all at once.

But that isn't really working. People are really afraid of failure and misinvestment. So asking people or organizations to invest such a big amount of money immediately is really doomed to failure. So I would also say that with a quick win approach, you're actually on a better track. And sometimes you don't even need money to do something, right?

Maybe it's really just an internal process with specific stakeholders that can help to get something fixed. So, for example, let's say again with that example of a product company, right, where there are product managers that are putting money and producing product data at the very beginning, and then it flows down to, let's say, different data teams that are doing data science with it.

And if the data science teams think the quality is too bad, and the accuracy goes down and they cannot properly forecast the demand anymore, leading to some financial impact, then there needs to be an agreement on what is actually correct and what isn't correct. So basically agreeing on what quality the data should have.

And then, let's say, even training these data producers to fill in the data attributes in a certain way, and to prioritize certain attributes so they arrive on time and in the right way, could already improve the quality significantly, just with that small kind of communication measure. The other way around could be where certain master data, for example, is not harmonized, but on specific attributes you see the same pattern of issues popping up over and over again. You could basically even code one kind of harmonization algorithm into the downstream systems, just to showcase: look, this is how we are harmonizing these master data inconsistencies in a rule-based way, and it's working perfectly for all the systems downstream.

If we would give a little bit of investment, then we could actually run it properly as a service or as a tool, so we don't have to do it in-house. But this is definitely the way to go. So you can always be creative about how to start small and to really give a good faith aspect to say, look, we can do it this way.

And if you invest in this, we can make this even bigger and have even more impact with it. But look how this small little thing already worked. So really go from there, and then you can hopefully roll out data governance in a better way.
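
As a rough picture of the rule-based harmonization quick win described above, the Python sketch below maps known variants of one master data attribute to a single canonical value and leaves anything unrecognized untouched for review. It is a minimal illustration, not the approach used in any specific project: the attribute (`country`), the variant lists, and the function name `harmonize_country` are all assumptions made for the example.

```python
import re

# Hypothetical rules for one attribute: canonical value -> known variants.
COUNTRY_RULES = {
    "DE": {"de", "deu", "germany", "deutschland"},
    "US": {"us", "usa", "united states", "united states of america"},
}

# Invert the rules into a flat lookup: variant -> canonical value.
_LOOKUP = {
    variant: canonical
    for canonical, variants in COUNTRY_RULES.items()
    for variant in variants
}


def harmonize_country(raw):
    """Return (value, matched).

    Known variants map to their canonical code. Unknown values are passed
    through unchanged with matched=False so they can be reviewed by a person.
    """
    if raw is None:
        return None, False
    key = re.sub(r"[^a-z ]", "", raw.strip().lower())  # drop dots, digits, etc.
    key = re.sub(r"\s+", " ", key).strip()             # collapse repeated spaces
    if key in _LOOKUP:
        return _LOOKUP[key], True
    return raw, False


for value in [" Germany ", "U.S.A.", "United  States", "Atlantis"]:
    print(repr(value), "->", harmonize_country(value))
```

Keeping the unmatched values visible, rather than guessing, is a deliberate choice here: those leftovers are the recurring issue patterns worth bringing back to the data producers.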

Adel Nehme: And that's a great insight, and this connects back to something we discussed earlier around how to frame the value. We talked about cost reduction and then about actually creating more revenue. Maybe walk me through what the next step is on this quick win.

Because it's very cross functional, right? Like the value needs to be from the data team, needs to come back to the data governance efforts, and it's kind of a bit of a cycle. Yeah. What does proving the value look like in these circumstances for these quick wins? So I'd love if you can maybe rely on an example here.

Tiankai Feng: Yeah, absolutely. I mean, in that example again, with a data scientist using data of better quality, they can ideally say, oh, using this input, we spent 40 percent less time manually cleaning the data, and the model's accuracy immediately increased by 50 percent regarding the forecast against the actuals, for example, meaning that we actually made a far better choice regarding our demand, and we have a good financial impact from a good sell-through in the end.

But I think the real challenge here is that the impact assessment doesn't happen immediately. Imagine the data coming in, and you're doing a forecast for a year or so from now. So the forecast is there for a year until the actuals come in, and only comparing the actuals against the forecast means that you can see the accuracy.

And this is where a lot of the problems lie, like it's so foundational with data governance that you need to just wait until the actual impact shows itself. But during that one year, you're already being asked five times by your leader to say what the value of data governance is. So you need to, I think, really play it smart and even start multiple use cases, maybe in parallel, like in a phased approach to then really get to the value and have really just strong endorsement by your stakeholders as well to get it going.

But basically it's one thing leading to another. Better data quality means better accuracy, means better financial impact, means higher profitability. But there are so many levels to go through that you can only assess that together, cross-functionally.
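
To make the "actuals against forecast" comparison concrete, here is a small Python sketch. It assumes one common way of scoring a demand forecast, 1 minus the weighted absolute percentage error (WAPE); the episode does not say which metric was used, and the demand numbers are invented for illustration.

```python
def forecast_accuracy(forecast, actual):
    """Score a forecast as 1 - WAPE.

    WAPE is the total absolute error divided by total actual demand, so a
    result of 1.0 means the forecast matched the actuals exactly.
    """
    if len(forecast) != len(actual):
        raise ValueError("forecast and actual must have the same length")
    total_actual = sum(actual)
    if total_actual == 0:
        raise ValueError("total actual demand must be non-zero")
    abs_error = sum(abs(f - a) for f, a in zip(forecast, actual))
    return 1 - abs_error / total_actual


# Hypothetical demand per product, known only once the season is over.
actual = [100, 200, 100]

# Hypothetical forecasts made from the original data and from the improved data.
before = forecast_accuracy([60, 260, 140], actual)
after = forecast_accuracy([90, 210, 100], actual)

print(f"accuracy before: {before:.2f}, after: {after:.2f}")
```

The score can only be computed once the actuals exist, which is exactly the waiting period described above, so it helps to agree on the metric with stakeholders before the forecast is made.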

Adel Nehme: Excellent. And the crux of today's conversation is not just about how to frame the value and best practices for data governance, but about how to evangelize data governance and make data governance accessible for the wider organization. And you're well known for your unique approach to making data governance fun, most notably through the songs and fun content you create on LinkedIn.

Maybe what inspired you to start this and how has it impacted the perception of data governance with those around you?

Tiankai Feng: Absolutely. I mentioned before how there was resistance at the very beginning, immediately when I switched to data governance, because of what the area meant, right? And my leader actually immediately gave me the task: data governance needs rebranding, and I should leverage my communication and creativity skills to actually do that.

So I went all in, right? I was like, there's really not a lot to lose, because if people don't even answer my emails or answer my invitations, then it cannot get worse than that, right? So I basically started being really creative. I created a rap song around data governance called Governance of Data, for example, and put it on the intranet.

That broke some ice. We started a talk show where my team members were actually talking to stakeholders to really show in 10 minutes how great the collaboration was and what kind of impact they had. We started gamifying knowledge, right? We basically asked, what do you think are the right values for certain attributes? And we gamified it so the winner would get some kind of small present, during academy sessions, for example.

And we got more invested in onboarding sessions for new joiners to the company as well. So, for example, just to say: look, we also exist, this is data governance, and we know it doesn't sound sexy, but here's how we're going to make your job a little bit easier if you help us as well in return.

So we really embedded ourselves into how it works. And I have to say, even if data is, for many of us, an exciting place, for even more people it's not an exciting place to be in. That means there's a stereotype that data people are actually pretty dry and very serious about the topic. So anything creative has the element of surprise on our side, right?

Nobody expects data people to come with a rap song or do a game or do some kind of quiz, right, to make it work. So I think that is really the key. I don't expect anyone here to start writing rap songs about data governance or anything, but I would just encourage everyone to just be creative.

There's really not a lot to lose. The worst that it can get is people don't think it was really appealing, but then that's fine. We just try something else. You're not losing anything by doing it. So why not give it a try? But we can all be a little bit more creative, I think, when it comes to communicating data things.

Adel Nehme: And have you brought in that creativity in your actual work with clients and your data governance stakeholders? Or is this just on the online spaces?

Tiankai Feng: No, absolutely. I'm bringing that in. I think it, of course, still depends a lot on the different clients. Certain clients are a little bit more serious, let's say, because of the industries and how the organizational culture is. But I think bringing in analogies, or just bringing in things and stories to loosen up the vibe around it, always helps. So I'm using my talent for that relatability of using pop culture analogies, for example, for data governance, also in conversations, and that I think really helps people to understand it and to really get on board with our approach.

Adel Nehme: Okay, that's awesome. Maybe do you have a story to share here on how you approached this with one particular client that left an impression?

Tiankai Feng: I would say that there's this one line in my data governance rap song that says, if data is the Force, then we're the Jedi Council. Like, really clearly a Star Wars reference, but the Jedi Council and the Force, it kind of makes sense and everybody can get behind it. So I realized that that Star Wars reference works 90 percent of the time.

So it's one of the things where, if I already notice a bias or a certain resistance towards the topic of data governance, I actually start my communication with that quote, and basically then see the aha effect of many people that are like, "Oh, okay, this is not how I expected this presentation or this workshop to start. Let's hear that person out."

And that really helps to break the ice immediately, right? And then just to get more into the details while keeping a little bit of that entertainment and, let's say, fun factor to it.

Adel Nehme: Yeah, and of course, you know, you mentioned this earlier, not everyone is a singer or can do a rap song about data governance. Maybe what are ways that you advise data leaders to make data governance fun and drive engagement with the data governance program within their own organizations?

Tiankai Feng: I would say using analogies or any kind of anecdotes about data governance always helps, because that is basically using narratives and data storytelling to do it, right? The other way is, I would say, the more you can communicate cross-functionally, not just as a central data governance team that is communicating by itself, the better it is, right?

Especially if you're a data leader and you want data governance to be understood by other departments, then let your data governance people do it together with their stakeholders. Communicate it together. And then you also get a more authentic word of mouth, rather than it just feeling like bragging about how important and how cool data governance is, right?

So really go for that approach where you use the element of surprise for, let's say, creative communication, but also take care of who is saying that and who is communicating that, making sure that it's actually perceived as authentic as well.

Adel Nehme: Great. And then we discussed this earlier in our discussion when I asked you why data governance is so important. And you said one of the things that's driving the importance of data governance as an agenda is the surge in generative AI. So do you believe that the focus on building generative AI solutions at the moment might detract from or reinforce the importance of data governance and data quality?

And how do you balance these priorities when you know you want to go to market really quickly with your gen AI solutions, but your data quality may not be up to par?

Tiankai Feng: It's definitely reinforcing data governance. I think, given the amount of high-quality data that LLMs or gen AI need right now, it's impossible to manually deal with it. It's just not scalable to have more and more people manually correcting data to get it to high quality. And the only solution to fix this is proper data governance mechanisms and getting the data right from the beginning.

And that having been said, I feel like, specifically for AI that is text-based, you need a lot of metadata. And metadata is one of the key outcomes of data governance. You want to document everything, and you want to document it in a business-friendly way so everybody understands.

And that is, in many cases, the base for LLMs and their generative use cases. So I already feel, just from conversations I'm having with clients and potential clients, that this is how they realize the importance of data governance. It's even the other way around, where they say, "We wanted to do gen AI, but realized we don't have data governance. Can you please help us?"

And it's really exciting how that has now created a reverse trend to invest back into foundations, because we all realized that doing advanced stuff without the right foundations is like building a house without the concrete underneath, and it just falls down over and over again. So I think it's definitely going in the right direction.

Adel Nehme: Okay, and do you see that generative AI will create a nuance maybe in the data governance approach? Maybe I'll reframe the question. Are there any data governance or data quality nuances that are specific to generative AI and not to other types of data projects or initiatives? I think about the importance of unstructured data with generative AI solutions, and what's the data quality angle to managing unstructured data here?

Tiankai Feng: Absolutely. I think, generally speaking, first of all, you have the whole data security and data privacy aspect that gets even more important. You don't want any secure or private data to be exposed one-to-one through GenAI. So how do we put in the right protection mechanisms for that to not happen in the future?

And how does it work? The other side, which I think is even more tricky, is the ethical part of it. Because before, let's say, having descriptive or just forecasting use cases, that was fine. But with GenAI, you actually go into the territory of giving advice, and you're almost imitating human interactions with these kinds of chatbots.

So what are ethically and legally acceptable scenarios and where do you have to draw the line? And once you draw the line, how do we technically ensure that this doesn't happen, right? What are, again, the right governance policies and mechanisms to put in for this to not happen anymore? So it's going to be really interesting.

It grows from data governance already being cross-functional to being even more cross-functional when it comes to Gen AI, because you need all of these experts that have a legal and ethical and people point of view to actually do the right thing in the future. So, yeah, very exciting times.


Adel Nehme: Yeah, definitely. And you discussed the AI Act at the beginning of the episode as having implications for the data governance conversation as well. Maybe what should organizations expect when it comes to the AI Act, especially if they're based in the EU or if they serve customers in the EU, from a data quality and data governance perspective?

Tiankai Feng: Yes, I think, I mean, the good thing, I'm not sure if I should call it a good thing, but it's really early stages, right? So the first agreements have been made on the EU AI Act, but it's really not in effect yet. But I would not wait until it's in effect. I would start getting familiar with the rules in there as soon as possible, to lay the right foundations for when it then starts, right?

So there are already classifications and some guidelines out there on what the EU AI Act, for example, classifies as high-risk or low-risk or middle-risk AI use cases. So start implementing these frameworks and categorizations in your organization already. Make conscious choices about what you're using AI for, thinking ahead about how a law or certain regulatory bodies might audit you or investigate your portfolio in the future, and make sure you stand on the right side, basically, of the legal side.

So when it comes to any legal thing, right, be more up to date and be more anticipating of things, and don't wait until it's too late. Reacting is always worse than preventing, I always say, and that is especially true for data governance. So all we can do is predict and try to prevent as much as we can.

Adel Nehme: Yeah, and maybe on that preventative approach, what do you think are must-avoid AI use cases at the moment, given how the regulation is evolving?

Tiankai Feng: I think anything that is going deep and very heavily on privacy data, I think would be a complete no go for me, at least in the space. And there are luckily already ways where you can make privacy data less private, right, by anonymization or any obfuscation measures, and that is fine. But make the effort, right?

Don't just put raw PII data into your algorithm and start doing things with it. Do the obfuscation and do your privacy mechanisms upfront so you don't get into trouble there. I think the other ones are, I mean, very much related: anything that goes into dangerous territory, like weapons or cybercrime, is difficult, I would say, right?

I mean, but those are not just data boundaries. They're legal boundaries anyway. But AI unfortunately creates a whole lot of new opportunities to be more in the gray areas, where you can make money with not-so-legal measures, but I would, of course, avoid those as well.
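To make the "do the obfuscation upfront" advice concrete, here is a minimal sketch of replacing PII fields with keyed hashes before a record reaches any model or algorithm. The field names, the key, and the `pseudonymize` helper are hypothetical and chosen only for illustration. Note that keyed hashing is pseudonymization rather than full anonymization, so whether it is sufficient depends on the applicable privacy requirements.

```python
import hashlib
import hmac

# Hypothetical key; in practice this would come from a managed secret store.
SECRET_KEY = b"replace-with-a-managed-secret"

# Hypothetical set of fields treated as PII in this example.
PII_FIELDS = frozenset({"name", "email"})


def pseudonymize(record, pii_fields=PII_FIELDS, key=SECRET_KEY):
    """Return a copy of `record` with PII fields replaced by keyed hashes.

    The same input always maps to the same token, so records can still be
    joined or counted without exposing the raw value.
    """
    cleaned = {}
    for field, value in record.items():
        if field in pii_fields and value is not None:
            digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            cleaned[field] = digest.hexdigest()[:16]  # shortened token
        else:
            cleaned[field] = value
    return cleaned


raw = {"name": "Jane Doe", "email": "jane@example.com", "basket_value": 42.5}
safe = pseudonymize(raw)
print(safe)
```

Only the `safe` record would be passed downstream; the raw record stays in the governed source system.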

Adel Nehme: And maybe final question on the generative AI boom when it comes to data governance, how do you advise the leaders to take advantage of the current boom of generative AI to drive value with data governance?

Tiankai Feng: I think that's a really good point, because we talked about how we can govern the data that is the input for GenAI, but we didn't talk about how to apply GenAI to data governance work itself. For example, we have to write a lot of policies in data governance, and writing data governance policies can be a really tedious task, right?

So if you train your AI to, for example, write policies in a certain structure and in a certain wording and language that is specific to your organization, then you could actually write policies and adapt them in a much more efficient way. Also, I think, using AI generally to identify outliers or any anomalies in your data can be a lot smarter than it was in the past. Besides only the averages and the outliers, you might even get unstructured data anomalies and unstructured data advice as well. So we can actually use AI to identify data quality problems up front, before any team has to complain about it, and maybe deal with them early in the pipeline rather than too late downstream.

So, there's a lot of ways, I think, where this can already get implemented and can help us. I would only say, let's never forget the human oversight of it. So always fact-check if the output is really correct or not. But that applies to everything. I would only use it to help me with the operational stuff, but not too much, let's say, with the human thinking stuff at this point.
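For reference, the traditional baseline of "averages and outliers" mentioned above can be sketched as a simple statistical check that runs early in a pipeline. This is not an AI technique and not something described in the conversation; it is a modified z-score based on the median absolute deviation, with a commonly used threshold of 3.5, and the sample values are hypothetical.

```python
import statistics


def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds `threshold`.

    Uses the median and the median absolute deviation (MAD), which are less
    distorted by extreme values than the mean and standard deviation.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    return [
        i
        for i, v in enumerate(values)
        if abs(0.6745 * (v - median) / mad) > threshold
    ]


# Hypothetical daily order quantities with one suspicious entry.
quantities = [10, 11, 9, 10, 12, 10, 95]
suspect_indices = flag_outliers(quantities)
print(suspect_indices)
```

Flagged rows would still go to a person for review, in line with the point about human oversight.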

Adel Nehme: And then maybe, Tiankai, as we wrap up our conversation, what are key trends that you see in the data governance space this year, and what are you most excited about?

Tiankai Feng: I think one key trend I see is still the decentralized data landscape, right? That would mean data-as-a-product thinking is really here to stay. We all see the benefits of having data products and not centralized bottleneck data teams anymore.

But that comes with its own challenges for data governance, right? So you need to implement data governance in a federated way. So how do we balance centralization versus decentralization when it comes to governance? That's going to be a big topic this year as well. The second is, I would say we mentioned it already, but AI is going to be a lot of the reasons why we need to do data governance.

So I would suggest all data governance professionals get more familiar with how AI actually works and really anticipate what requirements might come that way. And I would say that data governance all of a sudden getting more and more attention, especially this year, means that we should also invest more in upskilling others so they have a lower barrier to entry into the data governance space.

Data governance should not be as complicated as we all make it out to be at this point. And that's on all of us who work in data governance to hopefully find easier ways to let people understand what it is and to get into it more as well.

Adel Nehme: Yeah, that's very exciting. I'm also very excited for that. And maybe, Tiankai, as we wrap up, any final closing notes or calls to action for the audience before we end today's episode?

Tiankai Feng: Not really. I would say only one thing, right? I would hope that more and more data leaders also get creative and find their own ways of making data topics more approachable and fun. We cannot afford for it to be a silo anymore, right? We need people to understand data more and we can use fun and entertainment a little bit more for that.

So I look forward, hopefully, to more exciting ways to communicate around data.

Adel Nehme: Likewise. Thank you so much, Tiankai, for coming on the episode.

Tiankai Feng: Thank you.
