Thanh's Islet 🏝️

A Mildly Interesting Technical Challenge

I think I have been asked about a challenge I faced twice or thrice across my interviews, and I could never come up with a good enough answer. Recently, I ran into one such mildly interesting technical challenge, and I wanted to jot it down before it fades from my mind.

Context

As you may (or may not) know, I joined a “typical” blockchain startup for a “typical” backend job: EVM-compatible data synchronization; NFT-related stuff, of course. To be more specific, I needed to create a “cache” layer for an NFT Marketplace Service that answers questions like “who owns this NFT?” and “is it listed for sale?”.

More Details

The “data flow” is also a “typical” one:

[Blockchain] --(1)--> [Event Queue] --(2)--> [Data Storage]

(1): A service named "Mediator", which periodically fetches events from
blockchain, does simple processing, and saves the data to an event queue (Kafka)

(2): Four services, in order "Enricher", "Metadata", "Master Data", and
"Marketplace", interact with each other to make sure that, in the end, there is
ready-to-serve data within "Data Storage" (Elasticsearch in this case)

The “processing” within the data flow is also quite “normal”:

[Event Queue] --(1)--> [Enricher] <--(2)-- [Master Data]
                          |   ^
                          |   |
                         (4) (3)--- [Metadata]
                          |
                          |
                          v
                    [Marketplace] --(5)--> [Data Storage]

For everything except Data Storage, there is not a lot to say.

What is stored within Data Storage also matters. To keep it simple, however, we just need to know that it stores… NFTs’ data, which includes these two fields:

- "owner": the address that currently holds the NFT
- "state": whether the NFT is “available” or “being listed”

A few events are handled; again, to keep things simple, we only need to pay attention to three of them, two of which (“Transfer” and “Listing”) play the starring roles in this story.

There are also two rules, illustrated by the examples below: an NFT that is “being listed” must be owned by the marketplace contract, and an NFT that is “available” must be owned by a regular user.

// valid
{
    "owner": "0x00_market",
    "state": "being listed"
}

// valid
{
    "owner": "0x00_user",
    "state": "available"
}

// invalid
{
    "owner": "0x00_user",
    "state": "being listed"
}

// invalid
{
    "owner": "0x00_market",
    "state": "available"
}
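
To make these rules concrete, here is a minimal sketch of the invariant in Python (the function and constant names are mine, not from the actual codebase):

# Hypothetical names: "0x00_market" stands in for whichever address
# actually escrows listed NFTs.
MARKET_ADDRESS = "0x00_market"

def is_valid(record: dict) -> bool:
    # A record is valid when ownership and state agree.
    if record["state"] == "being listed":
        return record["owner"] == MARKET_ADDRESS
    if record["state"] == "available":
        return record["owner"] != MARKET_ADDRESS
    return False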

Things Went Well, Until They Did Not

Everything went well for a while, until there was more data. 99.9% of the records were handled correctly and ended up in the correct state, but incorrect ones showed up from time to time, and I could do nothing but fix them manually. I got fed up and started investigating.

Reading the code was my first thought. It did not give me anything insightful. This pseudocode should give you a rough idea of what was written back then:

def handle_event(data):
    item = fetch_item(data['id'])  # fetch the record from the database
    mutate(item, data)             # mutate that record in-memory
    save_item(item)                # save the whole mutated record back

I spent a while thinking about where it could go wrong, and (luckily) came to the right conclusion: the invalid state happens when two events are handled at once (a Transfer and a Listing arriving at the same time).

The next thing to do was to set the environment up. Let me remind you of the services: “Mediator”, “Enricher”, “Metadata”, “Master Data”, and “Marketplace”.

The next thing I did was simulating the requests, i.e., faking a few “valid” events and pushing them into the Event Queue. Doing it manually through AKHQ was ruled out immediately; kcat (formerly known as kafkacat) was the next choice:

kcat -P -b 127.0.0.1:9092 -t nft.events /tmp/transfer.json
kcat -P -b 127.0.0.1:9092 -t nft.events /tmp/listing.json
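
Each command produces the contents of one JSON file to the nft.events topic as a single message (kcat treats a file argument as one message unless you pass -l to split it line by line).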

I got stuck for a while as I could not reproduce the error, so I thought of something more “bare-bones”:

curl -X POST -d '@/tmp/transfer_request.json' 127.0.0.1:9000/handle-transfer
curl -X POST -d '@/tmp/listing_request.json' 127.0.0.1:9000/handle-listing

I got frustrated for a while, until I realized I was using vim-slime to send those commands to another tmux pane. The way my shell (zsh) handled those commands was:

- execute the first command
- wait until it is done
- execute the second command
- wait until it is done

A bit of modification to the commands and to the way I ran them (copying the two commands and pasting them into the shell at once, instead of using vim-slime) did reproduce the error:

curl -X POST -d '@/tmp/transfer_request.json' 127.0.0.1:9000/handle-transfer &
curl -X POST -d '@/tmp/listing_request.json' 127.0.0.1:9000/handle-listing &
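
The trailing & backgrounds each curl, so both requests are in flight at the same time instead of running back to back. If you prefer scripting it, a rough Python equivalent (using the requests library against the same endpoints) could look like this:

import threading
import requests  # assumed available; any HTTP client would do

def post(path, payload_file):
    # Fire one event at the service, mirroring the curl commands above.
    with open(payload_file) as f:
        requests.post(f"http://127.0.0.1:9000/{path}", data=f.read())

# Start both requests at (almost) the same time to trigger the race.
threads = [
    threading.Thread(target=post,
                     args=("handle-transfer", "/tmp/transfer_request.json")),
    threading.Thread(target=post,
                     args=("handle-listing", "/tmp/listing_request.json")),
]
for t in threads:
    t.start()
for t in threads:
    t.join()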

Back To The… Fundamentals

I think people already had a faint idea of what went wrong after reading the pseudocode, but I will try to make it clearer here. The sequence of steps that we needed was:

1. Handle Transfer Event
1.1. Fetch Data
1.2. Mutate Data
1.3. Save Data

2. Handle Listing Event
2.1. Fetch Data
2.2. Mutate Data
2.3. Save Data

But in reality, when those two events come at once, the order can get interleaved:

1.1. Fetch Data
2.1. Fetch Data
1.2. Mutate Data
1.3. Save Data
2.2. Mutate Data
2.3. Save Data
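
Here is a minimal, self-contained sketch of that lost update (hypothetical code: two threads stand in for the two handlers, and a plain dict stands in for the record in Data Storage):

import threading
import time

# A plain dict stands in for the record in Data Storage.
db = {"owner": "0x00_user", "state": "available"}

def handle(field, value):
    snapshot = dict(db)      # 1. fetch: read the whole record
    snapshot[field] = value  # 2. mutate: change one field in memory
    time.sleep(0.1)          # widen the race window for the demo
    db.clear()
    db.update(snapshot)      # 3. save: write the WHOLE record back

# Transfer moves the NFT to the marketplace; Listing marks it as listed.
transfer = threading.Thread(target=handle, args=("owner", "0x00_market"))
listing = threading.Thread(target=handle, args=("state", "being listed"))
transfer.start(); listing.start()
transfer.join(); listing.join()

# Both threads read the same snapshot, so the one that saves last wins
# and we end up in one of the two invalid states, e.g.:
# {'owner': '0x00_user', 'state': 'being listed'}
print(db)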

Then, of course, the final version within our Data Storage becomes incorrect. A natural solution that came to mind was using a mutex to lock around the data mutation.

You have guessed it: the lock did not work, either. (My guess, looking back: the handlers ran across multiple service instances, and an in-process lock cannot serialize writes coming from different processes.)

The Simplest Solution

Looking at the data and its states again, I had another insight: Transfer mutates the ownership, while Listing mutates the current state, and those are two different fields of the record.

{
    "owner": "0x00_market", // `Transfer` mutates this
    "state": "being listed" // `Listing` mutates this
}

A simple fix is just:

def handle_event(data):
    item = fetch_item(data['id'])  # fetch the record from the database
    mutate(item, data)             # mutate that record in-memory
    fields_to_update = [...]       # only the fields this event touches
    update_partial(                # partially update just those fields
        item,
        fields_to_update,
    )
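
For Elasticsearch in particular, this maps naturally onto the _update API, which merges only the fields you send into the stored document. A rough sketch over plain HTTP (the index name, document id, and helper are my stand-ins, not the real code):

import requests  # any HTTP client works; this is just a sketch

ES = "http://127.0.0.1:9200"

def update_partial(index, item_id, changed_fields):
    # POST /<index>/_update/<id> with a "doc" body merges only the
    # listed fields into the stored document, leaving the rest alone.
    # retry_on_conflict re-runs the update if a concurrent write bumps
    # the document version in between.
    resp = requests.post(
        f"{ES}/{index}/_update/{item_id}?retry_on_conflict=3",
        json={"doc": changed_fields},
    )
    resp.raise_for_status()

# A Transfer only touches "owner"; a concurrent Listing only touches
# "state", so neither can clobber the other's field any more.
update_partial("nfts", "token-42", {"owner": "0x00_market"})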

Conclusion

Congratulations on slogging through my explanation of this mildly interesting technical challenge. I hope it gave you some useful insights, and I also hope that I can tell the story coherently to my next interviewers.