Reverse engineer your own build


Disclaimer

Before we start, I want to make it very clear that this is in no way intended to dunk on the developers of the projects I’m mentioning here. I’m trying to use these artifacts as an example of a phenomenon I’m seeing, not as an excuse for cheap laughs.

It should also be noted that the software under review was in the state described below as of 2024-04-13. Developers might have changed artifacts, builds and games since then. Your mileage may vary when trying this yourself.

One of my hobbies is reverse engineering the games I play. What keeps surprising me is how many of them ship files that have no business being on my computer. These files range from debug symbols to OS-specific files such as .DS_Store or folders like __MACOSX.

To show you what I mean, I’ve trawled through my game library to see if I could find some examples. It didn’t take long to find Cross Blitz:

$ ls -1 "Cross Blitz"

Cross Blitz_Data
.DS_Store
baselib.dll
Cross Blitz.exe
GameAssembly.dll
UnityCrashHandler64.exe
UnityPlayer.dll

In all honesty, I might be splitting hairs here, as .DS_Store is the only offending file in this listing. But even for something as seemingly benign as a .DS_Store, shipping it to your customers can have far-reaching consequences. Not noticing the .DS_Store file is something I can understand: it doesn’t show up in Finder by default, and if you only ever use a Mac you’ve probably never had to bother cleaning them up.
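
A check like this is easy to automate. The following is a minimal sketch, assuming you can point a script at the folder you are about to upload; the function name `find_os_junk` and the two name sets are my own, seeded only with the names mentioned in this post.

```python
import os
from pathlib import Path

# Only the names mentioned in this post; extend these sets for your own pipeline.
JUNK_FILES = {".DS_Store"}
JUNK_DIRS = {"__MACOSX"}


def find_os_junk(build_root):
    """Return every path under build_root whose name is a known OS leftover."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(build_root):
        for name in dirnames:
            if name in JUNK_DIRS:
                hits.append(Path(dirpath) / name)
        for name in filenames:
            if name in JUNK_FILES:
                hits.append(Path(dirpath) / name)
    return sorted(hits)
```

Running `find_os_junk("Cross Blitz")` against the listing above would be expected to report the single .DS_Store file; failing the build when the list is non-empty is one possible policy.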

But .DS_Store files are just the start of what you can find in game folders. Next on the list is Hardspace: Shipbreaker by Blackbird Interactive, whose extra files are a bit more problematic:

$ ls -1 "Hardspace Shipbreaker"
app_1161580.mustache
app_1161580.vdf
config.ini
Data/
lib_burst_generated.pdb
MonoBleedingEdge/
Shipbreaker.exe*
Shipbreaker_BurstDebugInformation_DoNotShip/
Shipbreaker_Data/
UnityCrashHandler64.exe
UnityPlayer.dll

Two items pop out here: lib_burst_generated.pdb and Shipbreaker_BurstDebugInformation_DoNotShip/. A .pdb file contains debug symbols for PE executables, so lib_burst_generated.pdb likely contains debug symbols for code generated by the Unity Burst Compiler. Generally speaking, debug symbols aren’t something you need unless you’re debugging. To my mind, these symbols shouldn’t be part of the build shipped to end users.

But in case there was any question about shipping these files, the message the BurstDebugInformation_DoNotShip folder is sending is definitely clear: “do not ship this with your game!”. What’s even funnier is the fact that the debug symbols in the root folder and the ones in the Burst directory are identical!

$ cmp -b lib_burst_generated.pdb \
    Shipbreaker_BurstDebugInformation_DoNotShip/Data/Plugins/x86_64/lib_burst_generated.pdb
# nothing! that means they are byte-for-byte identical!
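
`cmp` works when you already suspect two specific files. To find such pairs across a whole build, one option is to group files by content hash. This is a rough sketch, not anything Unity or the developers provide; `sha256_of` and `find_duplicates` are hypothetical helper names.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large artifacts do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(build_root, min_size=1):
    """Return groups of files under build_root that share the same SHA-256."""
    by_hash = defaultdict(list)
    for path in Path(build_root).rglob("*"):
        if path.is_file() and path.stat().st_size >= min_size:
            by_hash[sha256_of(path)].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Matching hashes make identical content overwhelmingly likely rather than proven; a final `cmp` (or `filecmp.cmp(a, b, shallow=False)`) on each group settles it byte for byte.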

“But debug symbols are just harmless files” I hear you cry! Well, that may be so, but these are two ~18MB files sitting on my computer that I don’t need. And if Unity’s own guidance is anything to go by, the only people who would ever need them are the game devs.

And this brings me to my point: it’s not about calling out Blackbird in this case. It’s that if they are doing this, more games are likely doing it too.

Naturally, if I look in my game library:

$ find . -name "*_BurstDebugInformation_DoNotShip"

card-board-town_BurstDebugInformation_DoNotShip
Cult Of The Lamb_BurstDebugInformation_DoNotShip
Death Must Die_BurstDebugInformation_DoNotShip
Shipbreaker_BurstDebugInformation_DoNotShip
Soul Stalker_BurstDebugInformation_DoNotShip
Potion Craft_BurstDebugInformation_DoNotShip
Stellar Settlers_BurstDebugInformation_DoNotShip
The Murder of Sonic The Hedgehog_BurstDebugInformation_DoNotShip
PanOrama_BurstDebugInformation_DoNotShip

That’s 9 games leaving unneeded Burst debug symbols and compilation information on my hard drive. And while debug symbols and Burst compilation info might not be huge by themselves, things tend to add up.
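
To see how much “things add up”, the `find` above can be extended to also total the size of each match. A minimal sketch, assuming the only thing you know is the folder suffix; `dir_size_bytes` and `find_do_not_ship` are names I made up for illustration.

```python
import os
from pathlib import Path

# Suffix taken from the folder names listed above.
BURST_SUFFIX = "_BurstDebugInformation_DoNotShip"


def dir_size_bytes(path):
    """Sum the sizes of all regular files below path, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def find_do_not_ship(library_root, suffix=BURST_SUFFIX):
    """Map every directory whose name ends with suffix to its size in bytes."""
    results = {}
    for dirpath, dirnames, _filenames in os.walk(library_root):
        for name in list(dirnames):
            if name.endswith(suffix):
                full = Path(dirpath) / name
                results[full] = dir_size_bytes(full)
                dirnames.remove(name)  # a match needs no further descent
    return results
```

Summing `find_do_not_ship(".").values()` gives the total for a library; passing a different `suffix` lets the same walk look for other telltale folder names.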

And we’re not done yet, there is always more. Let’s take a look at Necrosmith 2 by Alawar:

$ ls -1 "Necrosmith 2"

baselib.dll
GameAssembly.dll
Necrosmith2.exe
Necrosmith2_BackUpThisFolder_ButDontShipItWithYourGame/
Necrosmith2_Data/
UnityCrashHandler64.exe
UnityPlayer.dll

Another folder that explicitly tells you not to ship it! So what’s in the folder that Alawar should back up, but not ship?

$ ls -1 Necrosmith2_BackUpThisFolder_ButDontShipItWithYourGame/

il2cppOutput/
Managed/

If you see a Managed folder in a Unity game it generally means the .NET assemblies can be found there and it’s using Unity’s custom Mono runtime for scripting. But we just saw GameAssembly.dll in the root folder, which is a dead giveaway that the developer used IL2CPP instead. So that il2cppOutput folder must be interesting then!

$ ls -1 il2cppOutput | less

__Generated.cpp
__Generated_CodeGen.c
analytics.json
Assembly-CSharp.cpp
Assembly-CSharp__1.cpp
Assembly-CSharp__10.cpp
Assembly-CSharp__11.cpp
Assembly-CSharp__12.cpp
Assembly-CSharp__13.cpp
Assembly-CSharp__14.cpp
Assembly-CSharp__15.cpp

I had to pipe the output of ls into less to not flood my terminal. How many files are there anyway?

$ ls -1 il2cppOutput | wc -l
498

Oh dear, this looks like the entire output of IL2CPP for our reading pleasure. But wait a minute, if I’m not mistaken IL2CPP is notorious for generating big compilation units. How big is the folder we’re digging in?

$ du -hd 1 .
676M    ./il2cppOutput
15M     ./Managed
691M    .

…right. Necrosmith 2’s installation size is 1.76GB total. That means that a whopping 38.3% of the game’s install size is just… inert data. While reverse engineers will probably rejoice that they can throw the managed code into a .NET disassembler to figure out the game’s inner workings, it’s probably not what Alawar intended.
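
For anyone checking the percentage: it works out if both figures are read as binary units, which is an assumption on my part (`du -h` reports powers of 1024, and the 1.76GB is taken to be GiB as well).

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# 691M is the du total for the BackUpThisFolder directory shown above.
leftover_bytes = 691 * MIB
# 1.76GB is the install size quoted in the text, assumed here to mean GiB.
install_bytes = 1.76 * GIB

share = leftover_bytes / install_bytes
print(f"{share:.1%}")  # 38.3%
```

If the install size were decimal gigabytes instead, the share would come out roughly one percentage point higher, so the conclusion holds either way.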

Now the point I’m trying to make here is not that devs are bad just for forgetting things. Rather, I feel like there’s a tendency to view “the build” as the least interesting part of making software. And if game devs are leaving data on my disk that just sits there, you can bet more software is doing this. Now think past games and to all software installed on your computer. I get the sneaking suspicion there’s a non-trivial amount of it doing nothing more than taking up space on my hard drive.

Oh great, they’re going to talk about web bloat

Now before I start this next rant section, I want to make clear this isn’t a general “the web is bloated” point. People much smarter than me have already written way better posts about that. Instead, I want to highlight another form of bloat. It’s not one that solely depends on your framework of choice, or how many trackers and ads you decide to install. It’s more subtle, and thus way harder to detect and even remediate.

As an example, I was ordering some beers for Father’s Day a while back. While checking the tracking page for the package the store had sent me, I poked around in the website source to see what tech it used. Digging around in the sources tab gets you a license file for vue-i18n pretty quickly: that means Vue.js! But then something stood out in the middle of all that minified JavaScript:

// not shown is megabytes of minified JS
/**
 * Prism: Lightweight, robust, elegant syntax highlighting
 *
 * @license MIT <https://opensource.org/licenses/MIT>
 * @author Lea Verou <https://lea.verou.me>
 * @namespace
 * @public
 */

Is that Prism? As in: the code syntax highlighting library Prism?! What on earth does a parcel tracking system need with code syntax highlighting? I made a few wild guesses as to which feature of the site could have caused Prism to end up in their bundle.

To my mind, none of the features I could come up with requires syntax highlighting. So why is it included in their bundle? A likely explanation is that the editor the site uses pulls it in by default, but that makes me wonder even more: didn’t someone look at their build artifacts and ask themselves if they needed code syntax highlighting for tracking parcels?
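
License banners like the Prism one tend to survive minification, which makes them a cheap way to inventory what actually ended up in a bundle. Here is a rough heuristic sketch; `license_banners` is a hypothetical helper, and a regex over JavaScript source will misfire on comment-like text inside string literals.

```python
import re

# Non-greedy match of /* ... */ block comments, including across newlines.
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
# Words that usually mark a comment as a license banner.
LICENSE_HINT = re.compile(r"\blicen[sc]e\b|\bcopyright\b", re.IGNORECASE)


def license_banners(bundle_source):
    """Return the block comments in bundle_source that look like license banners."""
    banners = []
    for match in BLOCK_COMMENT.finditer(bundle_source):
        comment = match.group(0)
        if LICENSE_HINT.search(comment):
            banners.append(comment)
    return banners
```

Feeding a built bundle through this and reading the resulting list takes a minute, and any library name you do not recognise is a question worth asking.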

Another, somewhat similar story: at some point I was using some app at my work and, being the nosy must-know-it-all, I naturally got curious and started poking around. It didn’t take me long to find out it was an Angular app (as is required by law in Enterprise™), but I was not prepared when I opened up the first JS file:

/******/ (() => { // webpackBootstrap
/******/ 	var __webpack_modules__ = ({
/***/ 3092:
/***/ ((__unused_webpack_module, exports) => {
"use strict";

exports.byteLength = byteLength;
exports.toByteArray = toByteArray;
exports.fromByteArray = fromByteArray;
var lookup = [];
var revLookup = [];
// megabytes of output omitted

Oh boy, deep breath, the developer had apparently just shipped Webpack debug output. No wonder the JS bundle size was over 7MB. All those comments and whitespace are just sitting there doing nothing! The big difference in this instance was that I could actually talk with the developer in question, since it was an internal app. So after a while of finding out who’s who, I stumbled upon the git repo in question. Looking around, I found my smoking gun in angular.json:

// other configuration omitted
"optimization": {
  "scripts": false
}

Okay, tell me Angular docs, what does that do exactly?

This option enables various optimizations of the build output, including:
- Minification of scripts and styles
- Tree-shaking
- Dead-code elimination

Right, we’re disabling basically all optimizations just like that. And since the default is true, it means someone consciously set that parameter to false! So I created an issue and a pull request flipping the option back on, which was summarily accepted and merged without much further comment. So what then even was the point of this option being disabled? Well, when I asked the developer about it they told me a coworker (I guess you could say “customer” in this case) was having issues with the app, so they started debugging and turned off optimizations. And, they confessed, “I guess I forgot to turn it back on again”. Well, okay then.
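
A forgotten flag like this is the kind of thing a small check in CI could catch. The sketch below assumes the usual workspace layout of `projects.<name>.architect.build` with `options` and `configurations`; the function names are mine, and it only flags values that are literally `false`. A missing `optimization` key is treated as enabled, in line with the default mentioned above.

```python
import json


def load_workspace(path="angular.json"):
    """Load the workspace file; assumes plain JSON without comments."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def disabled_optimizations(value):
    """Return which optimization kinds are switched off for a config value."""
    if value is False:
        return ["all"]
    if isinstance(value, dict):
        return sorted(key for key, setting in value.items() if setting is False)
    return []


def audit_angular_config(config, configuration="production"):
    """List (project, disabled kinds) for the given build configuration."""
    findings = []
    for project_name, project in config.get("projects", {}).items():
        build = project.get("architect", {}).get("build", {})
        # Configuration-specific values override the shared build options.
        effective = dict(build.get("options", {}))
        effective.update(build.get("configurations", {}).get(configuration, {}))
        disabled = disabled_optimizations(effective.get("optimization"))
        if disabled:
            findings.append((project_name, disabled))
    return findings
```

Calling `audit_angular_config(load_workspace())` and failing the pipeline on a non-empty result would have flagged the `"scripts": false` above, provided it sat in the configuration that gets deployed.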

Looking at the git log, this had been there for over a year. No one on the team noticed that they were shipping debug output all this time?
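
Noticing does not have to depend on someone opening the bundle by hand. As a heuristic sketch, a script can look for the same markers that gave it away to me: the `// webpackBootstrap` comment and the `/******/` and `/***/` line prefixes. The function names are hypothetical, and other bundlers or Webpack settings may leave different traces.

```python
# Line prefixes seen in the unminified bundle quoted above.
WEBPACK_DEBUG_PREFIXES = ("/******/", "/***/")


def debug_bundle_report(source):
    """Count the tell-tale signs of unminified Webpack output in a JS source."""
    lines = source.splitlines()
    marker_lines = sum(
        1 for line in lines if line.lstrip().startswith(WEBPACK_DEBUG_PREFIXES)
    )
    return {
        "marker_lines": marker_lines,
        "has_bootstrap_comment": "// webpackBootstrap" in source,
        "total_lines": len(lines),
    }


def looks_like_debug_bundle(source):
    """Heuristic: any marker at all suggests comments were not stripped."""
    report = debug_bundle_report(source)
    return report["has_bootstrap_comment"] or report["marker_lines"] > 0
```

Run against every `.js` file in the release output, a `True` here is a prompt to go look, not proof of a misconfiguration.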

It’s not exclusive to the web

Now, you might be thinking: “okay, Duck is angry at the internet, join the bloody queue”. But my point is that it’s not just that. Games tend to be asset heavy, so they focus on optimizing those as much as possible. But even then folders saying “please do not ship me” end up in the final build. An enterprise app can apparently just float around for over a year on debug output.

I could go hunting for examples of how it’s doom and gloom on your mobile phone, but it turns out people way smarter than me have already done the hard work. Look at the Emerge Tools Blog and be amazed at the sheer amount of debug symbols, stale fonts, duplicate libraries and unused assets being downloaded that just sit there on your phone.

So what’s with all the examples? Well, I confess I was very careful in picking them. The central theme in all of these examples (and most of the things on Emerge’s blog) is that these things tend to happen over time. Stuff inside an app ages, gets refactored, rebuilt, removed and switched around. It’s not unique to a platform, programming language or framework. But each of these examples evokes a nasty and pervasive feeling in me that developers just don’t seem to care as much about what they’re throwing over the fence onto the user’s devices. To me, that feels alien. If I’m in the business of selling cakes, isn’t it normal that I want to see and test the final product? How come developers don’t seem to do that as much? Don’t you want to know what you are giving away behind that fancy icon in the app store?

Now, I want to end this post on a high note. Being angry on the internet is easy, but that doesn’t solve anything. Instead, I want to ask something of you, dear reader. When you are making software, don’t just run it on your dev machine and check off that ticket. Don’t just install that new dependency, or add that extra file. Look at your outputs. Be bold, take a release build and see what it is you’re delivering to other people. Try to reverse engineer your own code. By looking at the code that truly runs on your users’ devices, you might just find some interesting stuff, and even learn a thing or two along the way!