Josip Mihaljević
Java developer

Increasing first attempt package delivery success rate using a custom-developed solution

 

During the COVID-19 pandemic, consumer behavior changed drastically, boosting online shopping like never before. The online shopping trend is here to stay, but how does it affect the processes behind online store browsing and package delivery?

Over the years, our teams at BISS have gained experience developing software solutions that improve logistics business processes across multiple European countries. One of them is AddressHub, a custom-developed solution for a large foreign package logistics company.

In short, AddressHub resolves the address and the receiver of a package from the data on the package sticker itself. The solution runs in production in depots in one European country.

 

It’s not that hard to deliver a package, is it?

Delivering packages can be simplified into three steps:

  • the sender sends a package with a receiver (recipient) on it (written by hand, printed out, …)
  • the delivery company accepts that package and delivers it to the receiver
  • the receiver accepts the package

But, for every one of these steps, numerous things can complicate successful package delivery.

  • What if the address noted on the package has been renamed or no longer exists?
  • What if the noted receiver does not reside at the given address?
  • What if the receiver is not at home?
  • What if there are multiple spelling errors in the address part?
  • What if the sender uses colloquial or alternative address names, for example "Deliver this package to John Smith, Empire State Building, NYC"?

How should new senders, receivers, and addresses be handled? What is the PIN code to enter a building?

(Numerous scenarios can occur during the package delivery, so it is vital to be prepared, and expect the unexpected.)

 

Main features

The two foundations of this kind of solution are “source of truth” databases for addresses and receivers, and a custom matching algorithm.

AddressHub should import address data from any relevant source (government data, the official postal service) and use it as a source of truth for further matching. It can import regions, cities and their alternative names, streets and their alternative names, and house numbers and their alternative names (generally meaning building names like Empire State Building). Every address part (city, street, house number) can be enriched with GPS coordinates, which can later be used for more accurate deliveries (for example, for planning whether a delivery truck, van, or bicycle should be used).

For receivers, the situation is a bit more complicated, since people and their names can vary a lot. For example, a person can move out of the noted address, change their name or use name abbreviations, use a company name for delivery to the workplace, and much more. So, the idea is to initially import receivers from an existing static database of receivers (for example, internally from invoices, or externally from third-party services) and link them to addresses. At the time of writing, AddressHub does not differentiate between private and business receivers, but support for this is planned.

Later, with the influx of new planned and successful deliveries, AddressHub can open temporary addresses and receivers (those not in the current database) and confirm them when feedback about a successful delivery arrives.

That’s why AddressHub supports addresses and receivers in three states:

  • exact (source of truth data)
  • temporary (created manually or through new planned delivery)
  • confirmed (confirmed through successful delivery)
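The three states can be pictured as a small state model. The following is only a sketch: the `State` enum, the `Record` class, and the `on_successful_delivery` function are hypothetical names, not AddressHub's actual data model.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    EXACT = "exact"          # imported from a source-of-truth dataset
    TEMPORARY = "temporary"  # created manually or by a new planned delivery
    CONFIRMED = "confirmed"  # confirmed by a successful delivery


@dataclass
class Record:
    """Hypothetical address or receiver record; only the state matters here."""
    name: str
    state: State


def on_successful_delivery(record: Record) -> Record:
    """Promote a temporary record once delivery feedback arrives."""
    if record.state is State.TEMPORARY:
        record.state = State.CONFIRMED
    return record  # exact and confirmed records are left unchanged
```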

With new successful and unsuccessful deliveries every day, relevance scores can have an impact on receivers. An address is scored only by lexical relevance (more about that in the following chapter), whereas a receiver has, alongside its lexical score, a weighted score that decays over time.

 

What does that mean?

It means that if a receiver has not received a package for a long (configurable) time, its low weighted score decreases the total score (the average of the lexical and weighted scores). In practice, if receiver A at address A receives packages every week, its weighted score will be better than that of receiver B at address A, who receives packages every 6 months, and we would consider receiver A the more relevant one for delivery. With this approach, we have “solved” the issue of a person no longer living at an address or changing their name.
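To make the idea concrete, here is a minimal sketch of a decaying score. The article does not state the decay function, so the exponential half-life used below, the 90-day default, and both function names are assumptions; only the averaging of the lexical and weighted scores comes from the text.

```python
def weighted_score(days_since_last_delivery: float, half_life_days: float = 90.0) -> float:
    """Assumed decay: the score halves every `half_life_days` without a delivery."""
    return 0.5 ** (days_since_last_delivery / half_life_days)


def total_score(lexical: float, weighted: float) -> float:
    """Average of the lexical and weighted scores, as described in the text."""
    return (lexical + weighted) / 2.0


# Receiver A got a package a week ago, receiver B six months ago.
# Both match the input equally well lexically.
score_a = total_score(lexical=0.9, weighted=weighted_score(7))
score_b = total_score(lexical=0.9, weighted=weighted_score(180))
```

With equal lexical scores, `score_a` comes out higher than `score_b`, so receiver A would be ranked first.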

AddressHub also supports edge cases such as a hotel located on a street corner that is registered on street A with house number A, while its delivery ramp is actually on street B with house number B. This is what we call an address alias, where address B is the owner and address A is the alias. So, if we ask for the hotel at address A, we get the same hotel at address B with 100% relevance, and that is the actual delivery point where the delivery can be made.
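In its simplest form, an alias is a lookup from the alias address to its owner. The table and identifiers below are hypothetical and only illustrate the idea.

```python
# Hypothetical alias table: alias address id -> owner (deliverable) address id.
ALIASES = {"street-a-house-a": "street-b-house-b"}


def resolve_alias(address_id: str) -> str:
    """Return the owner address for an alias, or the id itself if it is no alias."""
    return ALIASES.get(address_id, address_id)
```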

A similar thing applies to private receivers too, where we tend to minimize duplicate receivers with spelling errors or abbreviated names and aim to resolve them also with 100% relevance.

Every address and receiver can have its own attributes (metadata). For an address, an example is the PIN code for entering the building or a true/false value stating whether the address is deliverable. Receivers can have emails, phone numbers, or any other specific information related to them. These attributes are also searchable through the matching algorithm and can be used for finding relevant business results.

And finally, the matching algorithm, the black magic supporting all these features, and many more.

 

Matching algorithm and scoring

It is important to state that the algorithm is fine-tuned for the target country and the company's needs, but it can easily be applied to similar scenarios. This chapter is going to get a bit more technical, but I’ll do my best to keep it straightforward.

Matching is split into several steps:

  • storing normalized data in a searchable format (data structure) in a storage engine (currently Elasticsearch)
  • querying the data with normalized input using fuzzy search and boolean similarity
  • receiving the best matches, then scoring and sorting them with an internal algorithm
  • (optional) retrying if necessary (e.g. when the response to the first query does not satisfy the requested criteria, retrying with or without some of the data)
  • enriching the result with all necessary data and returning it to the user
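The optional retry step could look roughly like the sketch below. The `search` callable stands in for the storage query, and the score threshold and the order in which fields are dropped are assumptions made for illustration.

```python
from typing import Callable

Match = tuple[float, str]  # (score, address id)


def resolve_with_retry(
    search: Callable[[dict], list[Match]],
    fields: dict,
    min_score: float = 0.8,
    drop_order: tuple = ("house_number", "street"),
) -> list[Match]:
    """Query once, then retry with fewer fields while the best score is too low."""
    query = dict(fields)
    matches = sorted(search(query), reverse=True)
    for field in drop_order:
        if matches and matches[0][0] >= min_score:
            break  # the best match already satisfies the criteria
        if field not in query:
            continue
        query.pop(field)  # retry without this part of the input
        matches = sorted(search(query), reverse=True)
    return matches
```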

Have you noticed that I mentioned normalization? Well, that is another configurable part of AddressHub. Normalization is a set of predefined rules used to preprocess input data before matching. Some examples are expanding frequently used abbreviations, removing unusable data, handling language-specific characters, and much more.
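A minimal sketch of such rules follows. The abbreviation table and the exact rule set are made up for illustration; the real rules are configured per country and are not listed in this article.

```python
import re
import unicodedata

# Hypothetical abbreviation table; a real one would be configured per country.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "bldg": "building"}


def normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, expand known abbreviations."""
    # Decompose accented characters and drop the combining marks (e.g. č -> c).
    # Letters without a decomposition (e.g. đ) would need an explicit mapping.
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_like = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not a lowercase letter, digit, or space with a space.
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", ascii_like.lower())
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in cleaned.split()]
    return " ".join(tokens)
```

The same function would be applied both to the stored data and to the incoming query, so that both sides are compared in the same form.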

Firstly, by storing data with the correct relations between records, we also support all the above-mentioned features, such as aliases and alternative names (including support for multiple languages). One more thing we support for addresses is historical names. By history, we mean every address that has had some of its parts changed or deleted (a renamed street, a changed postal code, a house number deleted because the house was demolished). With this support, if the sender sends a package to an address using an old (no longer used) street name, AddressHub will still manage to resolve the new address with a high relevance percentage.

Once the data is stored, it is available for querying. The fuzzy search lets the algorithm rank tokens (words) differently by assigning them weights (boosts). Essentially, AddressHub can give higher priority to the postal code, since it is the least likely to be misspelled.
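As an illustration, a boosted fuzzy query could be expressed in the Elasticsearch query DSL roughly as follows. The field names (`postal_code`, `city`, `street`) and the boost values are assumptions; the actual index mapping and weights are not given in this article.

```python
def build_address_query(postal_code: str, city: str, street: str) -> dict:
    """Build an Elasticsearch bool query; field names and boosts are illustrative."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Postal code gets the highest boost: least likely to be misspelled.
                    {"match": {"postal_code": {"query": postal_code, "boost": 3.0}}},
                    # Textual parts tolerate typos through fuzzy matching.
                    {"match": {"city": {"query": city, "fuzziness": "AUTO", "boost": 1.5}}},
                    {"match": {"street": {"query": street, "fuzziness": "AUTO", "boost": 1.0}}},
                ]
            }
        }
    }
```

The returned dictionary is the request body that would be sent to the search endpoint of the index.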

When the results are fetched from storage, AddressHub does its internal scoring using the Damerau–Levenshtein distance algorithm combined with custom logic, coefficients, and penalties.
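For reference, below is a sketch of the optimal string alignment distance, the commonly used restricted variant of Damerau–Levenshtein, together with a simple conversion to a similarity in the range 0 to 1. Which variant AddressHub uses is not stated, and its custom coefficients and penalties are not shown here.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # A swap of two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def similarity(a: str, b: str) -> float:
    """Map the distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - osa_distance(a, b) / longest
```

The transposition rule is what makes this family of distances a good fit for typing errors: "steret" is one edit away from "street", whereas plain Levenshtein distance would count two.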

House numbers within a street are not matched lexically, since house numbers 1 and 11 are lexically similar but could be far away from each other. That’s why a custom algorithm is implemented for human-like resolving, where house number 3 is closer to house numbers 2 and 4 than to 33. The same applies to the textual part of a house number, where the nearest house numbers to 3C are 3B and 3D.
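One possible way to get this "human-like" ordering is to compare the numeric part and the letter suffix separately. The sketch below is an assumption about how this could be done, not the actual AddressHub algorithm; the function names and the 1/100 weight of the suffix are made up.

```python
import re


def parse_house_number(value: str) -> tuple[int, str]:
    """Split '3C' into (3, 'C'); a missing letter becomes ''."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]?)\s*", value)
    if match is None:
        raise ValueError(f"unsupported house number: {value!r}")
    return int(match.group(1)), match.group(2).upper()


def house_number_distance(a: str, b: str) -> float:
    """The numeric gap dominates; the letter suffix adds a small fraction on top."""
    num_a, suffix_a = parse_house_number(a)
    num_b, suffix_b = parse_house_number(b)
    # Map '' -> 0, 'A' -> 1, 'B' -> 2, ... so that 3 and 3A are neighbours.
    rank_a = ord(suffix_a) - ord("A") + 1 if suffix_a else 0
    rank_b = ord(suffix_b) - ord("A") + 1 if suffix_b else 0
    return abs(num_a - num_b) + abs(rank_a - rank_b) / 100.0


def nearest(target: str, candidates: list[str]) -> list[str]:
    """Candidates ordered from closest to farthest."""
    return sorted(candidates, key=lambda c: house_number_distance(target, c))
```

With this distance, 2 and 4 rank ahead of 33 for target 3, and 3B and 3D rank ahead of 3A and 4 for target 3C.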

In the end, response data is enriched with some additional data such as GPS coordinates, attributes, etc.

 

How does it work?

AddressHub is an API-oriented web service that also includes a UI for configuring the system itself. It has two main API methods: one for resolving an address and one for resolving a receiver-address pair.

In essence, AddressHub receives the necessary data to resolve the best possible address and/or receiver. For example, if a full address was given (postal code, town name, street name, and house number), AddressHub will try to resolve the full address. But if the house number was not present, AddressHub will resolve the street only.
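The choice of resolution level can be pictured as below. The request shape and the level names are hypothetical, and the "town" fallback is an assumption that goes beyond the two cases described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AddressRequest:
    """Hypothetical request shape; the real API fields are not given in the article."""
    postal_code: Optional[str] = None
    town: Optional[str] = None
    street: Optional[str] = None
    house_number: Optional[str] = None


def resolution_level(req: AddressRequest) -> str:
    """Pick the most specific level the supplied fields allow."""
    if req.street and req.house_number:
        return "full_address"
    if req.street:
        return "street"  # no house number given: resolve the street only
    if req.town or req.postal_code:
        return "town"    # assumed fallback, not described in the text
    return "unresolvable"
```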

The response from the web service returns a defined number of the best (most relevant) matches, alongside all additional data such as coordinates, attributes, and other information.

AddressHub is implemented to support both single and bulk requests, and to handle multiple calls per second so that it keeps up with depot needs. This information and much more, such as total requests, average response time, and the percentage of successfully matched addresses, is available and visualized in the monitoring tool.

 

Delivery improvements

With all the features and use cases that AddressHub covers, it has brought some measurable improvements that reduce costs.

Some of them are:

  • the basis for coordinate-based routing or sorting inside the depot
  • reduced manual entries for sorting by postal code by more than 50%
  • reduced sorting and postal code errors by ~80% (which decreases the need for a big depot storing lots of packages)
  • achieved high delivery success rate (due to more precise data and presence of coordinates for each address)
  • decreased carbon footprint by a more effective logistics process
  • reduced incorrect routing between depots by 80%

These are all numbers, but what do they mean?

With AddressHub, parcel delivery drivers have better-quality data, which increases their first-attempt delivery percentage as well as the efficiency of their routes. Depots are not clogged with packages, drivers (hopefully) return to the depot with fewer packages, and as a result fewer packages go through the same process all over again. With reduced incorrect routing between depots, there is a smaller chance of packages circulating endlessly between depots.

 

Possible improvements & future of AddressHub

Since AddressHub holds address data, it is a solid basis for autocomplete search, which would let users select an address instead of typing it manually.

Attributes can also be expanded to record “safe places” where the receiver accepts package deliveries when not at home. They can also be used for detecting households so that, for example, one family member can receive a package for another who is currently not at home. Blacklisting or blocking receivers is another option. All that information can be fed in from any system integrated with AddressHub.