Software Portfolio (WIP)
TL;DR
I am a Fullstack Software Engineer who has mostly worked on the backend side, and I am comfortable with most layers of modern web development (frontend, backend, and platform/DevOps).
I am going to list software projects (professional or personal) that I have worked on and what I contributed to them. The purpose is mainly to make it easier for my readers (future employers, hopefully) to see what I have done, evaluate my technical skills, and decide whether I am a suitable candidate.
Mones
The product was originally called Ceres M, a “traditional Web2 game”, which underwent:
- In-game balance patches
- Various UX improvements
- Various blockchain integration implementations
To become a “Web3 Play-N-Earn game” on BNB Chain. The differences are:
- Using cryptocurrencies to replace traditional top-up payments
- Turning in-game items into NFTs and building an NFT marketplace to replace traditional game marketplaces
NFT Marketplace Events Indexing
Problem
The game’s NFT marketplace is essentially a smart contract. End users can interact with the blockchain directly via JSON-RPC calls to smart contracts, but doing so is not friendly enough in terms of speed and UX.
Solution
Implement various services to synchronize on-chain data with a “local” database, and serve the data to a user-friendly frontend.
Implementation
I took part in the design and implemented a few services and functionalities of the overall system.
- Without the data synchronization:
- With the data synchronization:
- Details of the data synchronization:
Technologies used: Golang, Elasticsearch, TypeScript, web3.js
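To make the synchronization concrete, here is a minimal sketch of one sync pass, written in Python for brevity (the production services used Golang and TypeScript). The node URL, contract address, and index name are placeholders:

```python
from elasticsearch import Elasticsearch
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://bsc-dataseed.binance.org"))  # example node
es = Elasticsearch("http://localhost:9200")

MARKETPLACE = "0x0000000000000000000000000000000000000000"  # placeholder address

def sync(from_block: int, to_block: int) -> None:
    # Pull the marketplace contract's events for the given block range...
    logs = w3.eth.get_logs({
        "address": MARKETPLACE,
        "fromBlock": from_block,
        "toBlock": to_block,
    })
    # ...and mirror them into the "local" database.
    for log in logs:
        # txhash + log index is a stable id, so re-syncing a range is idempotent
        doc_id = f"{log['transactionHash'].hex()}-{log['logIndex']}"
        es.index(
            index="marketplace-events",
            id=doc_id,
            document={"tx": log["transactionHash"].hex(), "block": log["blockNumber"]},
        )
```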
Challenges
- Race Condition Between Enricher, Internal API, and Database
I have a more detailed post here, but in short, the chain of events is:
1. Enricher sometimes handles events so fast that the requests it sends to Internal API effectively arrive simultaneously
2. Internal API also handles those requests simultaneously
3. The handling creates a case where two threads write to the database at once
4. In the end, the concurrent writes from the two threads create inconsistent states
To solve this issue, at step 3, I changed the implementation from updating the whole record to updating one field only.
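A minimal sketch of the before/after, using the Python Elasticsearch client for illustration (the real services were in Golang; the index, id, and field names are hypothetical):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Before: each thread rewrote the whole document, so the slower writer
# could clobber fields that the faster writer had just set.
es.index(index="nft-items", id="42", document={"price": 100, "owner": "0xabc"})

# After: each thread updates only the field it is responsible for, so
# concurrent writers no longer overwrite each other's fields.
es.update(index="nft-items", id="42", doc={"price": 100})
```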
Oxalus
The name “Oxalus” is a brand for a suite of NFT-related products, namely:
- Oxalus Games: NFT games including Mones, Gunfire Hero, and StepHero.
- Oxalus Wallet: a wallet for NFTs
- Oxalus Analytics: NFT data insights; the product aimed to provide users with insights on NFT activities
- Oxalus NFT Aggregator: an NFT marketplace data aggregator

As for Oxalus NFT Aggregator, it was created to address users’ problem of browsing and buying NFTs across multiple marketplaces. It also allows them to save on gas fees by purchasing multiple NFTs at once.
Before having NFT Aggregator:
After having NFT Aggregator:
ERC-20, ERC-721, and ERC-1155 Event Data Indexing
Disclaimer: I am not the one who implemented the whole service, but I fixed a few complicated bugs in it.
Problem
Let us suppose we have an ERC-20 contract’s address: we can easily know the current balance of an account (by invoking `balanceOf`). Knowing the balance’s change history is another story, however. It involves manually indexing `Transfer` events. For ERC-721, apart from an NFT’s transfer history, we also want to know every NFT that a wallet owns, which also involves indexing `Transfer` events.
Indexing one or two contracts is not hard, but for more than a hundred, or even a thousand, contracts, it simply becomes infeasible to create many connections to blockchain nodes and query the data, as we may get rate-limited.
More specifically, in Oxalus Wallet’s case, the product’s backend has an unknown number of wallets that it needs to track. Oxalus Analytics can also derive other insights from the transfer data.
Solution
The team implemented a service that fetches every ERC-20, ERC-721, and ERC-1155 `Transfer` event, and puts the data into blob storage.
Implementation
The service’s algorithm is (a sketch follows the list):
- Read the last completed block from blob storage
- Pick a big block range
- Split the block range into smaller chunks
- Fetch the events of each chunk concurrently
- Join the data from the previous step
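A minimal sketch of this algorithm in Python with web3.py (the chunk sizes, node URL, and persistence helper are made up; ERC-1155 also emits `TransferSingle`/`TransferBatch`, which have different signatures and are omitted here):

```python
from concurrent.futures import ThreadPoolExecutor
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://bsc-dataseed.binance.org"))  # example node

# keccak256("Transfer(address,address,uint256)"), shared by ERC-20 and ERC-721
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

CHUNK = 1_000   # blocks per eth_getLogs call
RANGE = 10_000  # the "big block range" processed per pass

def fetch_chunk(start: int, end: int) -> list:
    # No address filter, so this matches Transfer events from every contract
    return w3.eth.get_logs(
        {"fromBlock": start, "toBlock": end, "topics": [TRANSFER_TOPIC]}
    )

def save_to_blob_storage(events: list, last_block: int) -> None:
    ...  # persist the events and the new last-completed block; details omitted

def run_once(last_block: int) -> int:
    end = last_block + RANGE
    chunks = [
        (s, min(s + CHUNK - 1, end))
        for s in range(last_block + 1, end + 1, CHUNK)
    ]
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = pool.map(lambda c: fetch_chunk(*c), chunks)
    events = [event for chunk in results for event in chunk]  # join the chunks
    save_to_blob_storage(events, end)
    return end
```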
Challenges
- “Unreliable” Data
The first time the service ran over the same block range (1 to 10, for example), there were 100 records in the storage; the second time, there were 101; the third time, 99. My point is: since a blockchain’s data is immutable, we expect the number to be stable, i.e. the same every time, yet we got a different number on each run.
Diving deep into the problem, I found that the issue lay in the way data is returned by the blockchain’s nodes. To be more specific, the nodes from NodeReal cap their responses at 50,000 records, which caused the unreliability.
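One possible mitigation (not necessarily what the team shipped): treat any response that reaches the cap as suspect and bisect the block range until each response is safely under the limit, reusing `w3` and `TRANSFER_TOPIC` from the sketch above:

```python
NODE_CAP = 50_000  # NodeReal's observed per-response record limit

def fetch_all(start: int, end: int) -> list:
    logs = w3.eth.get_logs(
        {"fromBlock": start, "toBlock": end, "topics": [TRANSFER_TOPIC]}
    )
    # A response below the cap cannot have been truncated; one at the cap might be
    if len(logs) < NODE_CAP or start == end:
        return logs
    mid = (start + end) // 2
    return fetch_all(start, mid) + fetch_all(mid + 1, end)
```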
Marketplace Contract Events Indexing
TODO
Minh Phu Analytics
The product aimed to aid Minh Phu, one of the biggest seafood corporations in Vietnam and in the world, with:
- Market Analytics
- Decision Making Support
Scheduled News Aggregation
Problem
Minh Phu’s staff watch the market through various sources every day, and need a more convenient way to do so.
Solution
Build simple services to aggregate (crawl) news from various sources.
Implementation
I designed, implemented, and deployed services which schedule and dispatch crawling workers. The overall model was something like this:
Scheduler first fetches configurations from the database, and then creates jobs periodically. The jobs are pushed into a queue, and Workers consume from that queue. The workers fetch news from various websites, and then put the fetched data into a database. A queue is needed to scale the fetching horizontally.
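A minimal sketch of a Worker’s consume loop with pika; the queue name and the fetch-and-store helper are hypothetical:

```python
import pika

def crawl_and_store(url: str) -> None:
    ...  # fetch the page, parse it, write the result to the database

def handle_job(channel, method, properties, body):
    crawl_and_store(body.decode())
    channel.basic_ack(delivery_tag=method.delivery_tag)  # ack only after success

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="crawl_jobs", durable=True)
channel.basic_qos(prefetch_count=1)  # hand each worker one job at a time
channel.basic_consume(queue="crawl_jobs", on_message_callback=handle_job)
channel.start_consuming()
```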
Challenges
- Making sure that Scheduler stays in sync with the configuration database (a sketch follows). The strategy is:
  - On Scheduler start: fetch every configuration row and do the scheduling normally.
  - On an update to any configuration row: send a message to Scheduler so that it updates the corresponding local data.
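A sketch of the second point, assuming APScheduler’s `add_job`/`remove_job` and a message payload shaped like a configuration row (the field names are hypothetical; `scheduler` is the APScheduler instance and `crawl` is the base crawling function shown below):

```python
def on_config_update(message: dict) -> None:
    job_id = f"crawl-{message['id']}"
    if message.get("deleted"):
        scheduler.remove_job(job_id)
        return
    # replace_existing=True swaps the job in place if it is already scheduled
    scheduler.add_job(
        crawl,
        "interval",
        minutes=message["interval_minutes"],
        kwargs={"url": message["url"]},
        id=job_id,
        replace_existing=True,
    )
```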
- Understanding advanced Python syntax: first-class functions, `*args`, and `**kwargs`. The syntax is used by APScheduler when creating a new job instance, like this:

```python
scheduler.add_job(my_func, args=[...], kwargs={...})
```
So in my case, I have a base function that has this signature to crawl data:

```python
def crawl(url: str): ...
```
The way to make it work with APScheduler is:
```python
fetched_configurations = ...
urls = convert(fetched_configurations)
for url in urls:
    # pass the crawl target as a keyword argument
    scheduler.add_job(crawl, kwargs={"url": url})
```
The “magic” comes from Python’s support for dynamic parameters:
```python
def f(a, b): ...

kwargs = {"a": 1, "b": 2}

# this "typical" way of invoking a function with named parameters
f(a=1, b=2)
# is equivalent to
f(**kwargs)
```
The technologies that were used are: PostgreSQL, Python, RabbitMQ, pika, APScheduler, BeautifulSoup, and Selenium.
Time Series Prediction
Problem
Minh Phu has historical data related to seafood production, and they want to use the data to predict future prices.
Solution
Implement a service which allows users to rapidly try out different combinations of models and data columns.
Implementation
Assuming we have this data:
Date | Column 1 | Column 2 | Price |
---|---|---|---|
2022-01-01 | X1_1 | X2_1 | P1 |
2022-01-02 | X1_2 | X2_2 | P2 |
2022-01-03 | X1_3 | X2_3 | P3 |
2022-01-04 | X1_4 | X2_4 | P4 |
Adding the previous day’s price as a column, and changing `Price` to `Target Price`:
Date | Column 1 | Column 2 | Previous Price | Target Price |
---|---|---|---|---|
2022-01-01 | X1_1 | X2_1 | _ | P1 |
2022-01-02 | X1_2 | X2_2 | P1 | P2 |
2022-01-03 | X1_3 | X2_3 | P2 | P3 |
2022-01-04 | X1_4 | X2_4 | P3 | P4 |
We can put everything into a formula like this:
```
C1*X1 + C2*X2 + C3*P_{n-1} = P_n
```

Where:

- `X1`: first parameter
- `X2`: second parameter
- `P_{n-1}`: previous price
- `P_n`: target price
- `C1`: constant 1
- `C2`: constant 2
- `C3`: constant 3
Statistical models (or machine learning models, to use a fancier term) are used to find `C1`, `C2`, and `C3`.
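A minimal sketch of that fitting step with pandas and scikit-learn, assuming the table above is available as a CSV with those exact column names (the file name is hypothetical):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.read_csv("prices.csv")               # Date, Column 1, Column 2, Price
df["Previous Price"] = df["Price"].shift(1)  # the previous day's price
df = df.dropna()                             # drop the first row: no previous price

X = df[["Column 1", "Column 2", "Previous Price"]]
y = df["Price"]  # the "Target Price"

model = LinearRegression().fit(X, y)
print(model.coef_)  # C1, C2, C3
```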
The service then exposes these options for the “numbers finding” process:
- Model
- Model parameters
- Columns to use
- Time range
Challenges
- Data Wrangling: listing the steps is easy, but I recall that I struggled a lot, even though it sounded simple:
- Fetch the data
- Skip the first row
- Create the “previous price” column
- Create model
- Fit the data into the model
- Save the calculated data
- TBA
Technologies used: Python, pandas, sklearn, and statsmodels.
Side Projects
Personal Blog
Technologies used: Hugo, Caddy, systemd.
Darkest Savior
TBA
Esoterica
TBA
Miscellanies
WIP
GitOps
Disclaimer: I encountered this model at two places, and dabbled a bit with one that was built from scratch. I know how it works in theory, but not in detail.
The model is something like this: the desired state of the system is described in a Git repository, and automation (CI/CD pipelines) applies every change in the repository to the running infrastructure, so Git remains the single source of truth.
Technologies used: Docker, GitLabCI, Google Cloud, and Kubernetes.