
Episode 2: Building Trust Scores to Fight Fraud and Reward Good Users

Written by David Nesbitt | October 1, 2025 at 1:08 PM


Listen / Watch: Spotify | Apple Podcasts | YouTube

What happens when a gig economy platform replaces gut instinct with data-driven trust scores?

In this episode, we sit down with Kunal Kumar, Principal Product Manager at Rebel Foods—a global cloud kitchen company in the food delivery industry—who led the design and rollout of trust scores for both customers and delivery riders. He shares how Rebel built their models, what signals matter most, and the lessons learned while balancing fraud prevention with customer experience.

Key discussion topics include:

  • Why Rebel Foods decided to create trust scores in the first place
  • How trust scores balance fraud detection with fairness
  • Segmenting users into tiers and tailoring treatment flows
  • Cross-team alignment and the implementation process
  • Using metrics to monitor effectiveness
  • Pitfalls to avoid when designing scoring models

Key Takeaways

  • Data over instinct. Trust scores remove bias and subjectivity from fraud decisions, replacing gut calls with a consistent, data-driven framework.
  • Segmentation & treatment. Customers and riders are divided into tiers, with high scorers getting perks like fast refunds and low scorers facing stricter checks.
  • Impact without compromise. Rebel Foods reduced retention spend and doubled refund automation after launching trust scores—while keeping customer satisfaction steady.
  • Built for trust. Trust scores were built to reward genuine behavior, with every user starting with a high score and only dropping if risky patterns appear.
  • Location is critical. GPS fidelity is a key rider signal, used to improve customer ETAs, ensure fair pay for riders, and catch compensation abuse.

Show notes:

About Fraud On The Go Podcast

Fraud On The Go is the podcast for fraud fighters on the frontlines of the gig economy. Through candid conversations with experts, we explore how delivery, ride sharing, and marketplace platforms are fighting fraud and risk at scale, and what it takes to stay ahead.

Episode Transcript

David: Welcome to Fraud on the Go. This is the podcast for fraud fighters in the gig economy, and I'm your host, David Nesbitt. Today's episode is going to be all about scoring. And we're not talking about ratings or reviews, but actually internal trust scores for customers and drivers—or riders or couriers, whatever you call them—on gig platforms.

My guest today is Kunal Kumar. He's the principal product manager at Rebel Foods, and he's helped lead the development and implementation of trust scoring systems on both sides of the platform. In this episode, we're going to explore how Rebel built their customer and rider scores, how those scores impact decisions like refund policies and rider allocation, and some of the lessons they've learned along the way.

So excited to jump in. Kunal, thank you so much for joining me on the podcast.

Kunal: Thanks, David. Pleasure to be here. Excited to spill some secrets from the kitchen of risk and resilience.

David: Absolutely. Okay, so first, just to give some explanation—Rebel does use the term rider for what you might call a driver or courier on other platforms. So we're going to use that term today. But just so you know, that's what we're talking about: the driver/courier/rider side of the platform, and the customer side.

So let's jump in. Our first topic here is about designing a score—what goes into this. I want to ask you a first question, Kunal, to get us started. What led your team at Rebel to start developing internal customer and rider scores? What was the problem you were trying to solve?

Kunal: Well, Shakespeare once said, “Suspicion always haunts the guilty mind.” But in platform business, suspicion without data is just chaos. At Rebel, we saw abuse patterns repeating—COD fraud, location fraud, etc.—and we realized that our customer delight team was spending more time firefighting than actually delighting the customer.

That's when we realized there was a need for a trust score—not based on gut or anecdote, but on solid data. The trigger point came when we noticed multiple things. First, some customers were false positives—flagged as fraudulent when they were not. At the same time, some genuinely fraudulent customers slipped through.

In fact, we saw some Reddit threads where people were actually posting ways of getting a Faasos wrap for free. I think that was the trigger point when we decided maybe we need a way to quantify fraud and create a comprehensive customer trust score.

David: Yeah, that makes total sense. You need something concrete to act on so you're not just depending on gut instinct or someone's decision-making in the moment, but instead you have more of a framework behind what you're doing.

Help our audience think about this. People listening are probably at different stages—some might use something like this, some might not have even thought about it yet, and some may be building it right now. How do you think about the purpose of a trust score? What are the behaviors you're trying to reward or discourage when creating a trust score?

Kunal: Basically, we are not treating the trust score as something we use to penalize the customer. We think of it this way: most of our customers are genuine. They like ordering from our platform, and we like serving them and giving them the best possible experience.

However, there is always that 1–2% of customers who cause issues, and because of them, other customers may also suffer. Customer support agents are often the first line of defense, so to speak, and when they interacted with customers there was a lot of subjectivity—looking at history, making judgment calls on the go. We wanted to make this whole process objective, with no room for subjectivity.

So what we're trying to do is segregate our customers based on their intent of purchase with us. We want to reward customers who order frequently and are genuine, while at the same time creating roadblocks or checks for customers more likely to game the system to get free orders, discounts, etc.

So that's the whole purpose of creating a trust score—or really any abuse detection mechanism. At the core, it's still about customer experience. Because once we segregate that 1–2% of fraudulent customers, we feel we'll be able to serve our genuine customers better.

David: So for the next question I want to ask—when this idea was originally brought up at Rebel of creating the trust score, was there internal alignment? Was everyone on the same page, or did you have to build the case with other teams at Rebel about why this made sense and why now was the right time to do it?

Kunal: Right. So we had both qualitative and quantitative data, along with multiple pieces of feedback, showing that the same type of customer might be treated differently depending on which agent was handling them. And of course, there was a lot of pushback. Some feared that NPS might go down. Others worried that too many fraudulent customers might slip through, or on the other side, that too many false positives would appear.

Different teams also often have different KRAs (key result areas), and any new change in the process could impact someone's KRA. So it was essential to bring everyone on board. The good part was that when we initially did a pilot—and even before the pilot, when we simulated the data—we were able to show everyone that only a very small percentage of customers were actually fraudulent.

We also realized that less than 1% of our customers accounted for 14–15% of our refunds. Once that data was shared, other stakeholders started trusting the process and got on board. We also made it clear that this was a two-way door: we’d pilot it, and only once our success criteria were met would we scale the pilot across all geographies and customers.

So yes, there was initial resistance, but eventually we built enough trust to bring everyone on board.

David: That's interesting. Internally at Incognia, we've talked about the idea that a lot of fraud comes from a very small percentage of users. So let’s talk next about how Rebel Foods approached building this. Could you walk us through what goes into your customer score, and then we'll talk about the Rider Trust score after?

Kunal: Okay. Think of the customer score as a multidimensional credit score for behavior. There's the usual RFM—recency, frequency, and monetary value. Then we layer in refund count, refund value, number of cancellations, device usage, location quality, etc. There are many other factors, but these are the main ones. Each signal tells us something significant about the reliability of the customer.

For example, if you’ve ordered three times in the past week, never requested a refund, and never canceled an order, then you’re good—you’ll probably have a very high trust score. On the other hand, if you order once a month and in three out of your last five orders you’ve either requested a refund or canceled, that’s cause for doubt.

So instead of saying customers have to prove themselves through multiple orders before they get a high trust score, we said that by default, all customers are good. You have to do something bad for your score to go down. So when customers first sign up to our platform, they start with a very high trust score. If there are cancellations and other negative signals, their score gradually reduces.

Based on the trust score, we categorize our customers—ideal, good, etc.—all the way down to abusive and very abusive customers. And of course, we treat these different categories of customers differently.

David: So “innocent until proven guilty” is kind of the idea behind it. Getting into the nitty-gritty—curiosity question—is it a number score on a 0–100 scale, or how do you define it?

Kunal: Yeah, it's a scale of 0 to 100, with 100 being the highest. The weights differ, of course. Some signals carry less weight—say, when it's a 50/50 chance that the issue is either a customer mistake or an error by the kitchen or rider. That's weighted much less than conclusive evidence that someone is trying to game the system, which carries a much higher weight. So eventually every customer has a score from 0 to 100.
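For a concrete picture of the mechanics Kunal describes—every customer starts at 100, and negative signals subtract weighted penalties—here's a minimal sketch. The signal names and weights are illustrative assumptions, not Rebel Foods' actual model:

```python
# Illustrative penalty-based trust score (0-100).
# Signal names and weights are hypothetical, not Rebel Foods' model.

# Heavier weights for conclusive evidence of gaming; lighter weights
# where fault is ambiguous (customer mistake vs. kitchen/rider error).
PENALTY_WEIGHTS = {
    "refund_requested": 4.0,   # ambiguous: could be a kitchen error
    "order_cancelled": 3.0,
    "confirmed_abuse": 25.0,   # conclusive evidence of gaming
}

def customer_trust_score(events: list[str]) -> float:
    """Every customer starts at 100 ('innocent until proven guilty');
    each negative event subtracts its weight, floored at 0."""
    score = 100.0
    for event in events:
        score -= PENALTY_WEIGHTS.get(event, 0.0)
    return max(score, 0.0)

# A frequent, genuine customer keeps a high score...
print(customer_trust_score([]))                        # 100.0
# ...while repeated refunds and cancellations pull it down.
print(customer_trust_score(["refund_requested"] * 3 +
                           ["order_cancelled"] * 2))   # 82.0
```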

David: That’s interesting. Thanks for explaining. It helps to understand more about what goes into it. So let’s turn the page to the Rider Trust score. What signals are most important on that side?

Kunal: I think it’s similar in philosophy, just different signals. Some additional ones we use are GPS relevance—or GPS fidelity—SLA compliance, and customer feedback. We take customer feedback very seriously. There’s a metric we track called customer contact. Our north star is that throughout the customer journey, there should be no reason for a customer to contact us by mail, call, chatbot, anything.

There are also different buckets of contact, some more severe than others. For example, signals like a rider marking an order as delivered before actually delivering it are treated more seriously than others. So I’d say GPS fidelity and customer feedback have higher weight than many other signals. Other signals include SLA compliance, route consistency, device sharing (like one number being used by multiple riders or a rider using multiple numbers), and GPS spoofing—which is a big one for us.

Again, it’s a weighted average. Riders are scored, and each rider starts at the same level. Based on feedback, their score can move up or down in the tier.

David: That’s interesting. You mentioned GPS accuracy and GPS spoofing separately. Can you explain why GPS accuracy is so important, and how it factors in?

Kunal: Yes. I think there are three reasons it’s important.

The first is customer experience. Let me deviate a bit from the fraud angle and move toward customer experience. We want to be transparent with customers so they know when their order is coming and what their ETA is, and they should be able to locate their rider on the map. High GPS accuracy ensures our ETA predictions are correct and the customer sees their rider moving in real time. We don’t want situations—like many of us have seen—where the rider looks like they’re flying all over the map.

Second is rider experience. We want to ensure riders are paid fairly. With accurate GPS data, we can track their route precisely—say, by pulling their latitude and longitude every ten seconds. That way, we can make sure they’re compensated fairly for the actual distance traveled. Sometimes the customer’s address is 500–600 meters away from the pin location, or there may be one-way streets or barricades that maps don’t account for. In those cases, Google Maps might say the distance is 1.5 km, but in reality the rider traveled 2.5 km. We want to compensate them fairly.

Third is the fraud and abuse angle. Since distance traveled is a component of rider payment, some riders try to manipulate it. We’ve seen cases where riders take longer routes, deliver the order but don’t mark it as delivered until later, or even spoof their GPS. All of this inflates their kilometer count, meaning they get overpaid.

So GPS accuracy is critical—it ties directly into customer experience, rider fairness, and fraud prevention. That’s why it’s such a big signal for us.
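As a rough illustration of the fair-pay point, a platform might sum straight-line (haversine) distances between consecutive GPS pings to approximate the route actually traveled. This is a generic sketch under that assumption, not Rebel Foods' implementation:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points."""
    r = 6371.0  # Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def distance_traveled_km(pings):
    """Sum pairwise distances over (lat, lon) pings sampled every ~10 s.
    With accurate GPS, this approximates the rider's true route length,
    even where the map's shortest path underestimates it."""
    return sum(
        haversine_km(*pings[i], *pings[i + 1])
        for i in range(len(pings) - 1)
    )

# Hypothetical pings: the map may say 1.5 km, but one-way streets
# and barricades can force a longer actual route.
route = [(12.9716, 77.5946), (12.9740, 77.5960), (12.9770, 77.5985)]
print(f"{distance_traveled_km(route):.2f} km")
```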

David: Yeah, that makes sense. From a customer's perspective, I can definitely relate to watching the map, trying to figure out how long it's going to take. Is the ETA actually correct? I'm watching where they are, and if it's jumping all over, it's not accurate, it's not moving for a long time—it can be frustrating from a customer perspective.

And I think on the location spoofing angle, our team found an example with one of the platforms we were working with: a driver accepted an order and—either right before or right after accepting it, I forget which—spoofed their location into the middle of a large lake. You could see the location point in the lake. So clearly, either they had some kind of amphibious vehicle, or there were some shenanigans going on. But it makes a lot of sense, and it's interesting to hear how location plays into the trust score as well.

So, you've hit on segmentation a couple of times, which I know is a super important topic in this industry, as platforms try to make good, fast automated decisions about what's happening on the platform and how to handle each user's activity. So I'm curious—and I think every platform is going to have a somewhat different answer here—how do you segment customers and riders into tiers, and how does treatment change from one score or segment to another?

Kunal: Yeah. So first, the base for segmenting them is always the score. For customers, it's the customer trust score; for riders, the rider scorecard or rider trust score. Now, there are two ways of segmenting. The simple way is percentage-wise segregation—top 5%, top 10%, and so on. The second is stricter, based on absolute score thresholds—anyone above 90, then 75 to 90, and so forth.

Based on that, we obviously name them as ideal customer, good customer, etc. For all of these tiers, we have a mapped resolution workflow. If you belong to a very high customer score tier and there are any issues, if you reach out to our chatbot, you'll probably get a full refund, no questions asked, and everything is automated—you'll be happy, etc. If you’re somewhere in between, in certain cases you might get a partial refund; in others, you might be routed to a human agent who will do some investigation. And if you are at the bottom of the spectrum, things will be stricter for you—more surveillance, more checks, and most probably it will be routed to a human agent who will do due diligence.

That's the first part. The second part is, since we now have a customer score, we can use this beyond the normal customer delight flow. For example, if I am launching some exclusive deal, I can target top-tier customers. If I want to launch a new product and beta test it, I know who to target. Opportunities are endless once this score is created.

Similarly for riders, one clear use case is incentives. Instead of building an incentive program around just one metric, the scorecard bakes in a lot of other factors. So the top 5%, top 2% will get higher incentives. Similarly, when I have fewer orders and enough riders, the system automatically assigns the order to riders with a high rider score. For things like bulk orders, we always try to assign them to high-scoring riders. These are some of the ways; I think we still haven't explored everything we can do with these scores. The opportunities are endless, and we're still looking at ways to reuse the same scoring mechanism to ultimately treat our customers better.
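To make the threshold-based tiering and mapped resolution workflows concrete, here's a minimal sketch; the cutoffs, tier names, and treatments are illustrative assumptions, not Rebel Foods' actual configuration:

```python
# Hypothetical score cutoffs and treatments for a 0-100 trust score.
TIERS = [
    (90, "ideal",   "auto full refund, no questions asked"),
    (75, "good",    "auto partial refund"),
    (50, "neutral", "route to human agent for investigation"),
    (0,  "abusive", "strict checks + mandatory human due diligence"),
]

def resolve_refund_request(trust_score: float) -> tuple[str, str]:
    """Map a trust score to its tier and mapped resolution workflow."""
    for cutoff, tier, treatment in TIERS:
        if trust_score >= cutoff:
            return tier, treatment
    return TIERS[-1][1], TIERS[-1][2]

print(resolve_refund_request(95))  # ('ideal', 'auto full refund, ...')
print(resolve_refund_request(42))  # ('abusive', 'strict checks ...')
```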

David: That's super interesting. A question that came to mind as you were talking through that: do you ever expose your segments to riders or customers? Do you let them know they're in a higher or lower tier? Are there cases in which that's helpful?

Kunal: For customers, no—we don't tell them. However, these numbers are not fixed. One can start at a number and move up and down. A customer today might fall in an abusive category, but they can have a way of redemption. These move, but we don't explicitly tell customers their score or the tier they fall into.

We do have a loyalty program, and of course customers know the tier they belong to. Loyalty tier is also one of the signals that we use. But the trust score is something we’ve made a conscious decision not to tell customers upfront.

David: That makes sense. I know on the rider side—the driver/courier side—I've heard of another platform where, after a certain number of strikes for a violation, like abusive behavior, they might let the rider know they're at the threshold of something, to give them a chance to redeem themselves and change their behavior. It's interesting to think about the advantages and disadvantages. One danger is that people figure out how to game that. If you're a fraudster engaging in abusive behavior and you know you're on the verge of being banned, you may switch to another account. Now the warning is actually helping you, because you know the right time to switch up your approach. So it's an interesting challenge and decision to make.

Okay, I’d love to move on to our next section, which is about implementation and tuning. At Rebel Foods, what did this look like? What are some best practices? First question I’m curious about: how do you monitor—after you implement this trust score on the rider side, and you’re rolling it out with a smaller group first to make sure you’re not going to break everything—how do you monitor if it’s actually working like you intended it to?

Kunal: Yeah. We always track metrics. Our north star is to reduce our retention spend as a percentage of net sales. We saw that getting reduced. We also track multiple secondary metrics like refund automation and decisions overturned by human agents, etc.

At the same time, we have a few guardrail metrics like CSAT rating. We don't want to end up in a situation where we reduce our retention spend by displeasing customers or reducing our CSAT. We would rather err on the side of caution.

Once we implemented it, we saw a massive reduction in retention spend, and at the same time we saw our refund automation grow by 2x. These are metrics we constantly track—at least weekly.

David: So break that down for me a little bit in terms of retention spend. Help me understand what goes into that, and why measuring that is a good indication of the success of your trust score.

Kunal: I’d not say retention spend is only impacted by the trust score or is the sole measure of how good it is—that’s one part of it, but it depends on a lot of other factors. Basically, in case of any issue, how do we compensate the customer? Let's say an order has not been delivered—we would probably place another order for that customer. If the order is canceled, the food is wasted; again, we have to take that hit.

Any sort of refund, reordering, cashback given, etc.—we track all of these as a percentage of our net sales, and that’s a metric we call retention spend. It shows two things: (1) how good our trust score is, because it helps us catch fraudsters in real time; and (2) how good our process is, because sometimes it’s not the customer's problem if we are not able to meet our ETAs, or if we forgot to deliver the entire order correctly—missing items, wrong items, etc. So this metric shows both our fraud detection capability and our operational performance. That’s why it’s our north star metric in this regard.
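Put as a formula, retention spend as Kunal describes it is roughly (refunds + reorders + cashback + other make-goods) ÷ net sales. A tiny sketch, with the exact components assumed from his examples:

```python
def retention_spend_pct(refunds: float, reorders: float,
                        cashback: float, net_sales: float) -> float:
    """Retention spend: everything paid out to make customers whole
    (refunds, replacement orders, cashback, ...) as a % of net sales.
    Components are assumed from the examples in this episode."""
    return 100.0 * (refunds + reorders + cashback) / net_sales

# Hypothetical month: the metric falls as the trust score catches
# fraudulent refund requests in real time.
print(f"{retention_spend_pct(40_000, 15_000, 5_000, 2_000_000):.1f}%")  # 3.0%
```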

David: That's super helpful. I wasn't familiar with the term, so that was helpful—thanks for explaining it. It could be helpful to some of our audience who don’t use that same term or haven’t thought about it that way.

So you've implemented the score and you start to see how it actually works as you're rolling it out. Have you run into behavior changing because of the score? You said you're not exposing it to them, but as they see the platform behavior and decisions change—have you seen behavior changing from customers or riders?

Kunal: We have—both on the positive side and the negative side. On the positive side, metrics have improved. And as a fraudster, if you repeatedly try to game the system and can't, after a point you kind of give up and try your luck on some other platform. We've seen some of those patterns actually reduce.

At the same time, fraud is always evolving. I just remembered an anecdote. There’s a very old play written in probably the 5th century in India, in Sanskrit. Part of the story is: there was a thief who was finally caught and presented in front of the king. The king was so impressed by the creativity of the way he used to steal things—among many other things—that he actually forgave the thief. Sometimes you can't help but be impressed with the creativity some of these fraudsters bring to the table.

We have seen some of them figure out how the whole scoring system works. They cool down for a bit and then do the same thing again. Sometimes they try to behave like a regular customer—place three or four regular orders—and then go ahead and cancel an order or do something like that. So yeah, it's a constant cat-and-mouse game. But I think this is one game where now we have a radar to catch them as quickly as possible.

David: That's great. And I love that reference to the play. On admiring the creativity of fraudsters—if we applied that same level of creativity to our own work, we'd probably be doing pretty well. Sometimes you just have to shake your head in admiration at the creativity of the scheme, even as you disapprove of what they did.

So I want to finish up our time by pivoting to talk about both the best practices you and your team found, as well as some of the pitfalls. I always think it's interesting to find out afterwards not just what was successful, but also what mistakes were made—what ditches you can fall into when trying to do this.

Let’s think of people listening to this podcast who might be building this for their own platform right now, or are just starting to think about it. How do we arm them with the right information to help them be successful? I think you can help us do that. So first, what are some of the common pitfalls you would warn other people about when designing a scoring model like this?

Kunal: Yeah, I have a few points.

First, we should not overfit for recent trends, because in that case we’ll always be reactive.

Second—and I think this is very important—the trust score should be very easy for the whole team to understand. You want your CX team to completely understand how it works and what the benefits are, so they can internalize it and get on board with it.

Third, don’t look at the trust score as something permanent. It can and should change. We should always launch and iterate, because as things evolve, fraudsters bring new ways to abuse or game the system. We have to stay on our toes to keep our systems and models updated.

So yeah, those three would be the common pitfalls to avoid.

David: That's great. That’s helpful.

Have you seen cases where the score flagged legitimate users—made them look like bad users—or failed to catch risky ones? Any examples come to mind?

Kunal: There are many examples, but there’s a funny one that was a false positive.

We were launching a couple of important features on a particular day, and we wanted to check the entire flow, all the way to placing an order in production. So from our office we placed a lot of production orders. We spoke to a particular kitchen and told them these were test orders, and that immediately after placing them we'd cancel, so they wouldn't waste food preparing them.

Obviously, a lot of such orders were placed over a couple of days, and the system flagged the entire office as a potential risk area. That was a funny one. But we managed to bake in these signals, so now it's safe for QA to place test orders in production without being flagged for abuse.

David: Yeah, that’s important. You don’t want the whole team to have to go through that every time you do a big testing release.

One question I thought of as well—were there any signals you originally included in the rider or customer score that turned out not to be helpful and you threw them out?

Kunal: Not many, but there’s one. For example, we tracked what percentage of our orders were delivered within 30 minutes. We also tracked delivery within PDT—promised delivery time, which is the first ETA the customer gets when they place an order. We included this as a rider score signal.

But later we realized that’s not completely dependent on the rider. Many factors affect those ETAs: maybe the model sets an unrealistic, overly aggressive ETA; there could be delays in the kitchen for many reasons; there could be bad weather or road conditions. So we realized it wasn’t fair to judge riders on how soon the order was delivered.

Of course, once the rider picks up the order, if they don’t take the prescribed path or do something wrong, then it’s on them. But in most cases, there’s very little the rider can do. So that was one we decided to throw out.

David: That’s an interesting example. Thanks for sharing a specific one.

So my last question: if you were starting over this process of building the customer and rider trust scores from scratch, what would you do differently?

Kunal: I would focus more on storytelling. Not just looking at the numerical value of the score, but digging deeper into what it actually signifies.

I’d also focus on building better feedback loops. That’s something we’ve started doing now. For example, earlier we would audit customer-agent conversations, but only take a sample. Now, with the help of AI, we audit almost 100% of our conversations. That gives us a much better feedback loop.

So those two—storytelling and feedback loops—are what I’d prioritize if I had to start today.

David: That’s great. Well, thanks, Kunal, so much for joining me on the episode and talking in such detail about the process you took to build your customer and rider scores. Thanks for all the examples you shared and the time you took. I really appreciate it, and I think our audience will really benefit from hearing this—no matter what stage they’re at in the process, or whether they have a system like this yet.

I think it’s a great topic, super relevant. So thanks for joining today.