
Belfast: Reverse engineering a mobile game server

tl;dr summary

I started Belfast because I wanted to stop playing a game and learn mobile reverse engineering. Somewhere along the way, I ended up building a server emulator, a tooling ecosystem, and an LLM-driven testing loop.

In December 2024, I decided to quit a game.

Instead of uninstalling it, I reverse engineered the protocol.

Belfast became a Go server emulator for Azur Lane with custom TCP framing, protobuf handlers, PostgreSQL persistence (migrations + SQLC), gameplay domains, and reverse-engineering tooling.

The project only became sustainable once I forced a strict decoding order:

  1. Frame boundaries
  2. Packet IDs
  3. Payload encoding
  4. State transitions
  5. Gameplay semantics

If you skip that order, every experiment looks random.


Wire protocol, decoded

The first meaningful clue was this header sample:

0x01 0x89 0x00 0x2a 0x31 0x00

Bytes 3-4, 0x2a31, decode big-endian to 10801 (SC_10801). From there the frame shape became clear:

  • 2 bytes: packet size
  • 1 byte: sentinel (0x00)
  • 2 bytes: packet ID
  • 2 bytes: packet index
  • N bytes: protobuf payload

func GetPacketId(offset int, buffer *[]byte) int {
	// Packet ID: big-endian uint16 at bytes 3-4 of the frame.
	id := int((*buffer)[3+offset]) << 8
	id += int((*buffer)[4+offset])
	return id
}

func GetPacketSize(offset int, buffer *[]byte) int {
	// Packet size: big-endian uint16 at bytes 0-1; it covers
	// everything after itself (sentinel, ID, index, payload).
	size := int((*buffer)[0+offset]) << 8
	size += int((*buffer)[1+offset])
	return size
}

On send, Belfast rebuilds headers explicitly so packet reproduction stays deterministic:

func GeneratePacketHeader(packetId int, payload *[]byte, packetIndex int) []byte {
	var buffer bytes.Buffer
	payloadSize := len(*payload) + 5
	buffer.Write([]byte{byte(payloadSize >> 8), byte(payloadSize)})
	buffer.Write([]byte{0x00})
	buffer.Write([]byte{byte(packetId >> 8), byte(packetId)})
	buffer.Write([]byte{byte(packetIndex >> 8), byte(packetIndex)})
	return buffer.Bytes()
}

Packet index is often 0x0000 but can be 0x0001 in multi-packet frames; ignoring it causes subtle replay bugs.
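
Because the size field covers everything after itself, splitting a stream that holds several back-to-back frames is just repeated size reads. A minimal sketch of that loop (splitFrames is my name for it, not a Belfast function):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// splitFrames cuts a byte stream into frames. The 2-byte size field
// counts sentinel+id+index+payload, so each frame spans size+2 bytes.
func splitFrames(buf []byte) [][]byte {
	var frames [][]byte
	offset := 0
	for offset+2 <= len(buf) {
		size := int(binary.BigEndian.Uint16(buf[offset : offset+2]))
		end := offset + 2 + size
		if end > len(buf) {
			break // incomplete frame; wait for more bytes
		}
		frames = append(frames, buf[offset:end])
		offset = end
	}
	return frames
}

func main() {
	// Two back-to-back frames: SC_10801 with a 1-byte payload,
	// then a header-only frame with packet index 1.
	stream := []byte{
		0x00, 0x06, 0x00, 0x2a, 0x31, 0x00, 0x00, 0xff,
		0x00, 0x05, 0x00, 0x2a, 0x31, 0x00, 0x01,
	}
	fmt.Println(len(splitFrames(stream))) // 2
}
```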


Bootstrap flow (real packets, real handlers)

After framing, the next challenge is consistent boot/login sequencing with coherent state.

  1. CS_10800 -> SC_10801 (Update check)
  2. CS_10700 -> SC_10701 (Gateway info)
  3. CS_10020 -> SC_10021 (Auth confirm + server list)
  4. CS_10022 -> SC_10023 (Join server)
  5. CS_10024 -> SC_10025 (Create player, if needed)
  6. CS_11001 -> initial state sync fan-out
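
Written down as data, the happy path is a short table of request/response pairs, which is roughly how I sketch smoke tests for it (the structure below is illustrative, not Belfast's actual test code; the CS_11001 fan-out has no single reply, so it is omitted):

```go
package main

import "fmt"

// bootstrapStep pairs a client packet with the reply it must trigger;
// illustrative scaffolding, not Belfast's actual test types.
type bootstrapStep struct {
	Request, Response int
	Note              string
}

var bootstrap = []bootstrapStep{
	{10800, 10801, "update check"},
	{10700, 10701, "gateway info"},
	{10020, 10021, "auth confirm + server list"},
	{10022, 10023, "join server"},
	{10024, 10025, "create player (if needed)"},
}

func main() {
	for _, s := range bootstrap {
		fmt.Printf("CS_%d -> SC_%d (%s)\n", s.Request, s.Response, s.Note)
	}
}
```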

HandleAuthConfirm (internal/answer/auth_confirm.go) binds login input to account identity, emits a server ticket, and can create accounts in skip_onboarding mode:

intArg2, err := strconv.Atoi(payload.GetArg2())
if err != nil {
	return 0, 10021, fmt.Errorf("failed to convert arg2 to int: %s", err.Error())
}
client.AuthArg2 = uint32(intArg2)
protoValidAnswer.ServerTicket = proto.String(formatServerTicket(client.AuthArg2))

yostarusAuth, err := orm.GetYostarusMapByArg2(uint32(intArg2))
if err != nil && db.IsNotFound(err) && config.Current().CreatePlayer.SkipOnboarding {
	accountID, err := client.CreateCommander(uint32(intArg2))
	if err != nil {
		return 0, 10021, err
	}
	protoValidAnswer.AccountId = proto.Uint32(accountID)
}

JoinServer (internal/answer/join_server.go) resolves identity from account_id, device_id, and server ticket, then enforces one active session per commander:

if client.Server != nil {
	existingKicked := client.Server.DisconnectCommander(
		client.Commander.CommanderID,
		consts.DR_LOGGED_IN_ON_ANOTHER_DEVICE,
		client,
	)
	if existingKicked {
		logger.LogEvent("Server", "LoginKick",
			fmt.Sprintf("kicked previous session for commander %d", client.Commander.CommanderID),
			logger.LOG_LEVEL_INFO)
	}
}

CreateNewPlayer (internal/answer/onboarding/create_new_player.go) validates name bounds, starter ship IDs, and account/device mapping before provisioning:

nameLength := utf8.RuneCountInString(nickname)
if nameLength < createPlayerNameMin {
	response.Result = proto.Uint32(2012)
	return client.SendMessage(10025, &response)
}
if nameLength > createPlayerNameMax {
	response.Result = proto.Uint32(2011)
	return client.SendMessage(10025, &response)
}

if _, ok := starterShipIDs[shipID]; !ok {
	response.Result = proto.Uint32(1)
	return client.SendMessage(10025, &response)
}

Architecturally, these packets are where identity and session coherence are finalized. If 10020/10022/10024 are only partially accurate, later gameplay handlers can look broken even when their internal logic is correct, because the upstream commander/session state is already inconsistent.


Transport and dispatch choices

The network layer stays explicit by design; hidden abstractions around binary protocols usually make debugging worse.

Server (internal/connection/server.go): accept TCP, validate maintenance/private-client constraints, read into ring buffer, parse size/body, enqueue per-client frames.

Client (internal/connection/client.go): bounded queue (packetQueueSize = 512), packet pool (packetPoolSize = 128), dedicated dispatch goroutine, backpressure, runtime metrics.

Dispatch (internal/packets/handler.go) resolves handlers by packet ID, runs all handlers, then flushes buffered writes once:

handlers, ok := PacketDecisionFn[packetId]
headerlessBuffer := (*buffer)[offset+HEADER_SIZE:]
if !ok {
	_, _, err := client.SendMessage(10998, &protobuf.SC_10998{
		Cmd:    proto.Uint32(uint32(packetId)),
		Result: proto.Uint32(1),
	})
	if err != nil {
		return
	}
} else {
	for _, handler := range handlers {
		_, _, err := handler(&headerlessBuffer, client)
		if err != nil {
			client.CloseWithError(err)
			return
		}
	}
}

Flushing buffered writes once per cycle reduces syscall churn and keeps ordering deterministic.
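
The bounded-queue/backpressure side of the client can be sketched with a buffered channel and a non-blocking send. This illustrates the pattern only; Belfast's actual client type is richer (pooling, metrics, dispatch goroutine):

```go
package main

import "fmt"

// Sketch of the bounded-queue pattern: a buffered channel sized like
// packetQueueSize, with a non-blocking enqueue that reports when the
// consumer has fallen behind.
const packetQueueSize = 512

type packet struct {
	id   int
	body []byte
}

type client struct {
	queue chan packet
}

func newClient() *client {
	return &client{queue: make(chan packet, packetQueueSize)}
}

// enqueue never blocks the reader goroutine: when the queue is full it
// returns false, letting the caller throttle or drop the connection.
func (c *client) enqueue(p packet) bool {
	select {
	case c.queue <- p:
		return true
	default:
		return false
	}
}

func main() {
	c := newClient()
	for i := 0; i < packetQueueSize; i++ {
		c.enqueue(packet{id: i})
	}
	fmt.Println(c.enqueue(packet{id: -1})) // false: backpressure kicks in
}
```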


Region routing, persistence, and data ingestion

Azur Lane diverges by region (CN/EN/JP/KR/TW). Belfast resolves that at registration time instead of scattering region checks in handlers:

packets.RegisterLocalizedPacketHandler(13101, packets.LocalizedHandler{
	CN: &[]packets.PacketHandler{answer.ChapterTracking},
	EN: &[]packets.PacketHandler{answer.ChapterTracking},
	JP: &[]packets.PacketHandler{answer.ChapterTracking},
	KR: &[]packets.PacketHandler{answer.ChapterTrackingKR},
	TW: &[]packets.PacketHandler{answer.ChapterTracking},
})

The persistence stack is deliberately boring: PostgreSQL + SQLC + strict migrations with advisory locks and checksums.

if _, err := lockConn.ExecContext(acquireCtx,
	`SELECT pg_advisory_lock($1, $2)`,
	migrationAdvisoryLockClassID,
	lockObjectID,
); err != nil {
	return err
}

if appliedChecksum, ok := applied[m.Version]; ok {
	if appliedChecksum != m.Checksum {
		return fmt.Errorf("migration %d already applied but checksum changed", m.Version)
	}
	continue
}

Most gameplay handlers depend on external game data (ships, chapter templates, shop data). misc.UpdateAllData centralizes importers and applies them in a stable order so reseeding remains reproducible:

err := db.DefaultStore.WithTx(ctx, func(q *gen.Queries) error {
	for _, key := range order {
		fn := dataFnSQLC[key]
		if fn == nil {
			return fmt.Errorf("missing sqlc importer for %s", key)
		}
		if err := fn(ctx, region, q); err != nil {
			return err
		}
	}
	return nil
})

That ordered ingest path matters because templates cross-reference each other. Applying imports in arbitrary order can create silent data gaps that only surface later in runtime behavior.


Chapter system deep dive

Core chapter handlers (internal/answer/chapter):

  • CS_13101 -> SC_13102 (tracking/start)
  • CS_13103 -> SC_13104 (actions)
  • CS_13106 -> SC_13105 (battle result request)
  • SC_13000 (base sync)

ChapterTracking computes oil costs, validates resources, builds CURRENTCHAPTERINFO, and persists chapter state:

baseOil := template.Oil
oilCost := uint32(float64(baseOil) * rate)
if !client.Commander.HasEnoughResource(2, oilCost) {
	response := protobuf.SC_13102{Result: proto.Uint32(1)}
	return client.SendMessage(13102, &response)
}

if oilCost > 0 {
	if err := client.Commander.ConsumeResource(2, oilCost); err != nil {
		return 0, 13102, err
	}
}

CS_13103 movement uses BFS over walkable cells before updating fleet position and step counters:

start := chapterPos{Row: group.GetPos().GetRow(), Column: group.GetPos().GetColumn()}
end := chapterPos{Row: payload.GetActArg_1(), Column: payload.GetActArg_2()}
path := findMovePath(grids, start, end)
if len(path) == 0 {
	response := protobuf.SC_13104{Result: proto.Uint32(1)}
	return client.SendMessage(13104, &response)
}

stepDelta := uint32(len(path) - 1)
group.Pos = buildPos(end)
group.StepCount = proto.Uint32(group.GetStepCount() + stepDelta)
current.MoveStepCount = proto.Uint32(current.GetMoveStepCount() + stepDelta)
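
findMovePath itself is not shown above; a plausible BFS sketch over a walkable-cell set, which may differ from Belfast's real grid representation and neighbor ordering:

```go
package main

import "fmt"

type chapterPos struct{ Row, Column uint32 }

// findMovePath runs BFS over walkable cells and returns the shortest
// path from start to end inclusive, or nil if end is unreachable.
func findMovePath(walkable map[chapterPos]bool, start, end chapterPos) []chapterPos {
	if !walkable[end] {
		return nil
	}
	prev := map[chapterPos]chapterPos{start: start}
	queue := []chapterPos{start}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur == end {
			// Reconstruct by walking predecessors back to start.
			var path []chapterPos
			for p := end; ; p = prev[p] {
				path = append([]chapterPos{p}, path...)
				if p == start {
					return path
				}
			}
		}
		for _, n := range []chapterPos{
			{cur.Row + 1, cur.Column}, {cur.Row - 1, cur.Column},
			{cur.Row, cur.Column + 1}, {cur.Row, cur.Column - 1},
		} {
			// Underflow on uint32 just produces a cell that is
			// absent from the walkable map, so no bounds check needed.
			if walkable[n] {
				if _, seen := prev[n]; !seen {
					prev[n] = cur
					queue = append(queue, n)
				}
			}
		}
	}
	return nil
}

func main() {
	grid := map[chapterPos]bool{
		{0, 0}: true, {0, 1}: true, {0, 2}: true, {1, 2}: true,
	}
	path := findMovePath(grid, chapterPos{0, 0}, chapterPos{1, 2})
	fmt.Println(len(path) - 1) // 3 steps
}
```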

Ambush probability mirrors client-side formulas documented in code to avoid visible statistical drift:

rate := 0.05 + posExtra + globalExtra
if step > 0 {
	denom := inv + investSums
	if denom > 0 {
		rate += (inv / denom) / 4 * float64(step)
	}
}
if posExtra == 0 {
	rate -= calculateFleetEquipAmbushRateReduce(group, client)
}
rate = clampChance(rate)
return uint32(rate * chapterChanceBase)
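
Plugging numbers in makes the shape obvious. The sketch below assumes clampChance clamps to [0, 1] and chapterChanceBase is 100 (i.e. the result is a percentage); both are my assumptions about helpers the snippet references, not verified against Belfast's source:

```go
package main

import "fmt"

// Assumed stand-ins: the real clampChance and chapterChanceBase live in
// Belfast's chapter package; 100 (percent) and a [0, 1] clamp are guesses.
const chapterChanceBase = 100

func clampChance(rate float64) float64 {
	if rate < 0 {
		return 0
	}
	if rate > 1 {
		return 1
	}
	return rate
}

func main() {
	// Worked example: 5% base + 2% position bonus, 3 steps taken,
	// inv = 0.5 and investSums = 0.5 (so denom = 1.0); the equipment
	// reduction branch is skipped because posExtra != 0.
	rate := 0.05 + 0.02
	step := 3.0
	inv, investSums := 0.5, 0.5
	rate += (inv / (inv + investSums)) / 4 * step
	rate = clampChance(rate)
	fmt.Println(uint32(rate * chapterChanceBase)) // 44
}
```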

Tooling that paid for itself

cmd/pcap_decode reconstructs TCP streams, parses frames, decodes protobuf by reflection, and emits JSON lines:

packetID := int(binary.BigEndian.Uint16(buffer[3:5]))
packetIndex := int(binary.BigEndian.Uint16(buffer[5:7]))
payload := buffer[packets.HEADER_SIZE:frameSize]

if constructor, ok := s.registry[packetID]; ok {
	msg := constructor()
	if err := proto.Unmarshal(payload, msg); err != nil {
		record.Error = err.Error()
		record.RawHex = hex.EncodeToString(payload)
	} else {
		record.JSON, _ = protojson.MarshalOptions{EmitUnpopulated: true}.Marshal(msg)
	}
}

I also built cmd/gateway_dump, a tiny utility that dials a gateway, sends CS_10018, requires SC_10019, then JSON-dumps the protobuf server list (ids, name, ip, port, state, optional proxy fields). It uses strict framing checks and timeout-bounded reads so bad responses fail fast.

A friend and I used it with IP-range scans plus targets recovered from client constants, then diffed server lists across regions/builds. One hit was an Audit server (likely store-submission/QA). We connected once, finished the tutorial, created accounts with our nicknames, and stopped there.

cmd/packet_progress solved a recurring reverse-engineering problem: claiming “coverage is good” without any measurable definition of coverage. It walks packet registrations, parses handler ASTs, applies weighted heuristics, and emits machine-readable diagnostics.

Status model:

  • implemented
  • partial
  • stub
  • panic
  • missing

const (
	statusImplemented = "implemented"
	statusPartial     = "partial"
	statusStub        = "stub"
	statusPanic       = "panic"
	statusMissing     = "missing"
)

Weights: heuristicWeights{
	SendMessage:  3,
	ResponseType: 2,
	RequestType:  1,
	ProtoSetter:  1,
	RequestParse: 1,
	CommanderUse: 2,
	ORMUsage:     2,
	DBWrite:      2,
},
Thresholds: heuristicThresholds{ImplementedMin: 4}

Scoring is weighted instead of binary: SendMessage, request/response typing, protobuf setter usage, commander/ORM touches, and DB writes all contribute to confidence.
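
The mechanism reduces to a weighted sum over detected signals, compared against ImplementedMin. A stripped-down sketch (the real tool derives the signal set from AST analysis, which is elided here; names are illustrative):

```go
package main

import "fmt"

// scoreHandler sums the weights of every signal detected in a handler.
// Signal detection (SendMessage calls, proto setters, DB writes, ...)
// is where the real work happens and is not shown.
func scoreHandler(signals map[string]bool, weights map[string]int) int {
	score := 0
	for name, present := range signals {
		if present {
			score += weights[name]
		}
	}
	return score
}

func main() {
	weights := map[string]int{
		"SendMessage": 3, "ResponseType": 2, "RequestType": 1,
		"ProtoSetter": 1, "RequestParse": 1, "CommanderUse": 2,
		"ORMUsage": 2, "DBWrite": 2,
	}
	// A handler that builds a typed response and sends it, nothing else.
	signals := map[string]bool{"SendMessage": true, "ResponseType": true}
	score := scoreHandler(signals, weights)
	fmt.Println(score, score >= 4) // 5 true -> counts as implemented
}
```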

This replaced intuition with repeatable planning: sort high-impact missing packets, detect regressions after refactors, and separate intentional stubs from silent breakage.

The report also includes packet-level and handler-level diagnostics (score, matched signals, file, line), which made implementation planning much more operational than “coverage feels fine”.

It also helped explain progress to contributors who were not deep in packet internals yet: people could see exactly why a handler scored as stub or partial and what specific signals were missing.

As protocol and gameplay coverage expanded, the bottleneck became repetitive Android UI navigation. Belfast already had ADB primitives (internal/debug/adb_watcher.go) for controls, logcat lifecycle, PID tracking, and optional restart automation, so I added an external MCP-style LLM loop:

  1. Capture screenshot
  2. Infer UI state
  3. Select tap target
  4. Send ADB input
  5. Observe logcat + server behavior
  6. Repeat until scenario ends

I do not treat the model as a truth source; I treat it as a repeatable integration-test operator.

Deterministic pass/fail checks stay server-side; the model just drives tedious UI traversal. That boundary kept the loop productive without letting probabilistic behavior decide correctness.

In practice, this was most useful for long, boring validation paths where humans make inconsistent taps and timing mistakes: reconnect loops, onboarding retries, and repeat chapter navigation after server changes. The model did not replace assertions; it reduced operator fatigue and improved run-to-run consistency for client-facing checks.


Lessons and roadmap

Main lessons:

  1. Reverse engineering is mostly systems engineering.
  2. Packet-level correctness is required; gameplay semantics are the finish line.
  3. Deterministic transport, strict persistence, and measurable coverage carry the project.
  4. Stubs are scaffolding.
  5. Once testing becomes manual, automation quality controls velocity.

And yes, one major breakthrough was still converting hex to decimal in GNOME Calculator.

Roadmap now focuses on making updates boring:

  1. Protocol diff pipeline for protobuf/packet drift detection.
  2. Data sync hardening with region-aware versions and strict validation.
  3. Scenario regression suite combining packet assertions with UI/log assertions.
  4. Handler ergonomics via domain-tagged coverage and impact-ranked missing packets.

Concretely, that means generating change candidates before manual reversing starts, validating data imports against shape drift and missing references, and turning manual play routes into scripted scenarios for login, chapter start, battle result, and shop flows. On the maintenance side, packet coverage needs domain tags (auth, chapter, shop, world) so missing work can be ranked by runtime frequency and player impact instead of a flat backlog.

I started this project to stop playing a game and ended up with a protocol emulator, ingestion pipeline, and AI-assisted test harness. That accidental system-building arc is still my favorite part of reverse engineering.
