Is more data always better? Are companies collecting an unnecessary amount of consumer data?
I hear in what you have presented a fundamental separation-of-duties commentary around AI: the segmentation of the consumer experience from privacy and security.
That separation of duty is a principle that we should have in AI.
I teach my students this concept with a 10-digit number: 3013170124. Out of context it's a 10-digit number; it has absolutely, positively no meaning. You throw context around it, you put commas in certain places, and it's a number slightly over 3 billion. You break it up into two groups of five or five groups of two; you put the three and zero up front and it's the country code for Greece. You put a couple of dashes between the third and fourth and the sixth and seventh digits and you get a North American telephone number. And that is the correct context for this. Now, if I add another couple of pieces of data or information to tell you that 301 is one of the area codes for Maryland and that I lived in Maryland from '95 to 2003, you may be able to glean some intelligence here that says that may have been my old phone number.
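The point above can be illustrated in a few lines of code: the same ten digits yield different "information" depending on the context applied to them. This is a hedged, illustrative sketch, not production code.

```python
# Illustrative only: the same ten digits rendered under different contexts.
digits = "3013170124"

# As a bare integer with grouping commas: slightly over 3 billion.
as_number = f"{int(digits):,}"

# As a North American phone number: area code 301 (Maryland).
as_phone = f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

# As an international string: +30 is the country calling code for Greece.
as_intl = f"+{digits[:2]} {digits[2:]}"

print(as_number)  # 3,013,170,124
print(as_phone)   # 301-317-0124
print(as_intl)    # +30 13170124
```

The data never changes; only the surrounding context does, and that context is what turns data into information.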
When I have discussions about what we should be doing regarding data collection, and then what we do with it, I try to swing the discussion so that we first determine what information or knowledge we're trying to gather and what we are trying to do with it, which then dictates what data we need. We seem so focused on collecting as much data as possible because we don't necessarily know what we can or want to do with it. We have stopped asking the questions: What is it I'm trying to gather or achieve? And those questions can help frame the discussion regarding what we keep, what we don't keep, and, by the way, how far we are willing to go to get that data.
Adding to that the greater American public in my mind has not actually understood that distinction between data and information. I have a piece of data therefore I have information. They have conflated the two and we have seen countless examples of that, not the least of which is the stuff that's going on in the background in U.S. politics right now. But a lot of that has to deal with conflating data versus information versus knowledge. And if we're going to have these discussions we need to begin to separate those and define them appropriately.
The first thing that comes to my mind is the automotive industry and the technology industry's much beloved OKR model, Objective Key Result. And if we look at this space and we have an objective building upon the narrative you just expressed: our objective is to provide this set of services to this customer base and these business units, no more and no less, with an emphasis on doing that excellently and dominating the market in that way. Perhaps the very capitalist desire to surge and dominate and excel in a market space if aligned with an objective can be a helpful tool. I'm not espousing a view or an ideology or anything like that, I'm saying perhaps objectives and key results could be a very powerful tool in applying principles to AI and working on that normative piece, the “should” of it.
If we have OKRs defined properly, by the nature of defining them properly, it does data minimization.
That's the process we went through when our CISO became the head of consumer product engineering: the conversation flipped to, what is the data required to do this? Excess data is excess risk; let's eliminate everything we do not specifically need. In other words: we conducted data minimization exercises across the board.
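The exercise described above can be sketched in code. This is a hypothetical, minimal example (the field names and `REQUIRED_FIELDS` set are invented for illustration): the allowed fields are derived from the defined objective, and everything else is dropped before storage.

```python
# Hypothetical data-minimization filter; field names are invented for illustration.
# The allow-list is driven by the defined objective (the OKR), not by what is collectable.
REQUIRED_FIELDS = {"account_id", "plan_tier", "billing_email"}

def minimize(record: dict) -> dict:
    """Keep only the fields the service specifically needs; excess data is excess risk."""
    return {k: v for k, v in record.items() if k in REQUIRED_FIELDS}

raw = {
    "account_id": "A-1001",
    "plan_tier": "pro",
    "billing_email": "user@example.com",
    "device_fingerprint": "f3a9...",   # collected "just in case" -- dropped
    "location_history": ["..."],       # no defined use -- dropped
}
print(minimize(raw))  # only the three required fields survive
```

Defining the objective first makes the allow-list small and explicit; anything not on it never enters storage, so it can never be breached, subpoenaed, or fall under GDPR/CCPA scope.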
That would be a mindset shift for most businesses, which, if they think about their data strategically at all, think of it as an asset or a resource.
So does Dow Chemical, or DuPont, when it's creating all of its products, but it recognizes that what it's creating, all of the stuff inside of that creation, has a degree of toxicity, with serious safety concerns not only for the workers but for the environment around the factory if it gets screwed up.
I see where you're going now and I like that, that's amazing.
In 2020, and soon to be 2021, data is a liability. If you have more data, that is more data that can be breached. If you have more data and are in scope for GDPR, CCPA, etc., then there is a lot more data falling under regulations.
More data is better if you have specific use cases that warrant it. If not, every bit of data is a liability and should be stored only if there is a specific need.
And companies are collecting an unnecessary amount of consumer data. They often do not realize it until they are on the receiving end of a warrant, subpoena, etc., and suddenly have a very rude awakening.
As folks who live and exist to support the business, part of our job as businessmen and women is to explore the art of the possible. When you are young and struggling and looking to dominate a market space, you set the very low "is it legal" bar for that, which is reasonable, to be honest with you. If there is no standard that attaches illegality or liability, there is going to be that tendency just to try and create a new product, new service, new revenue stream, or find a niche in the market and head in that direction. But the larger issue that we have here is the speed to discriminate. Now, back in the day my dad would talk about, and I grew up in Massachusetts, those folks who would steer people of color away from certain neighborhoods or would price people of color out of doing certain things and create a level of exclusiveness or discrimination on an individual basis. That takes time, that takes energy, it's more evolutionary. It will happen, but the great conspiracy to do this is probably not going to be very effective because there are six of us trying to do this across a 150-mile radius, et cetera. It's going to take time. AI shrinks that time potentially down to picoseconds and expands that reach over a broader geography. So, I think the question that not just businesses but folks like us have to struggle with is, "Yeah, I get it's not illegal, I get it's not liable, but before we dip our toe in the water, should we establish that level of principles around what we will and won't do?" Because if we rely solely on legality, or even worse liability, to determine what we do and don't do, we will be farther down a path with potentially larger implications before we think to hang back.
It’s those race-to-the-bottom minimum compliance standards, where overall regulation trails industry adoption of new technologies by 10-15 years. I think it's a terrible way to look at the “should” standard of an emerging technical area. And I think you see this bubble up in the crypto space a lot around blockchain. You're seeing a very similar conversation. Regulatory narratives trail by a significant lead time.