Reverse engineer your own build
Disclaimer
Before we start, I want to make it very clear that this is in no way intended to dunk on the developers of the projects I’m mentioning here. I’m trying to use these artifacts as an example of a phenomenon I’m seeing, not as an excuse for cheap laughs.
It should also be noted that the software under review was in the state described below as of 2024-04-13. Developers might have changed artifacts, builds and games since then. Your mileage may vary when trying this yourself.
One of my hobbies is reverse engineering the games I play. What surprises me is how often they contain files that generally should not be on my computer. These range from debug symbols to OS-specific files such as .DS_Store or folders like __MACOSX.
To show you what I mean, I’ve trawled through my game library to see if I could find some examples. It didn’t take long to find Cross Blitz:
$ ls -1 "Cross Blitz"
Cross Blitz_Data
.DS_Store
baselib.dll
Cross Blitz.exe
GameAssembly.dll
UnityCrashHandler64.exe
UnityPlayer.dll
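In all honesty, I might be splitting hairs here, as .DS_Store is the only offending file in this listing.
But even something as seemingly benign as a .DS_Store can have far-reaching consequences when you ship it to your customers: the file stores Finder metadata keyed by file name, so it can leak the names of files that sat in that folder on the developer’s machine.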
Not noticing the .DS_Store file is something I can understand: it doesn’t show up in Finder
by default, and if you’re only using a Mac you’ve probably never had to bother cleaning them up.
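If you want to catch these before your customers do, a few lines of script over the build output are enough. Here’s a minimal sketch in Python, assuming the build sits in a directory on disk; `JUNK_NAMES` and `find_os_junk` are hypothetical names, and the `Thumbs.db`/`desktop.ini` entries are Windows counterparts I’ve added as an assumption:

```python
import os
from pathlib import Path

# OS-specific metadata that should never end up in a release build.
# .DS_Store and __MACOSX come from macOS; Thumbs.db and desktop.ini are
# Windows counterparts added here as an assumption. Extend as needed.
JUNK_NAMES = {".DS_Store", "__MACOSX", "Thumbs.db", "desktop.ini"}


def find_os_junk(build_dir: str) -> list[Path]:
    """Return every file or folder under build_dir whose name is in JUNK_NAMES."""
    hits = []
    for root, dirs, files in os.walk(build_dir):
        for name in dirs + files:
            if name in JUNK_NAMES:
                hits.append(Path(root) / name)
    return sorted(hits)
```

Failing a build step whenever the returned list is non-empty is one way to keep files like this out of a shipped game.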
But .DS_Store files are just the start of what you can find in game folders.
Next on the list is Hardspace: Shipbreaker
by Blackbird Interactive, whose extra files are a bit more problematic:
$ ls -1 "Hardspace Shipbreaker"
app_1161580.mustache
app_1161580.vdf
config.ini
Data/
lib_burst_generated.pdb
MonoBleedingEdge/
Shipbreaker.exe*
Shipbreaker_BurstDebugInformation_DoNotShip/
Shipbreaker_Data/
UnityCrashHandler64.exe
UnityPlayer.dll
Two items pop out here: lib_burst_generated.pdb and Shipbreaker_BurstDebugInformation_DoNotShip/.
A .pdb file contains debug symbols for PE executables,
so lib_burst_generated.pdb likely contains
debug symbols for code generated by the
Unity Burst Compiler.
Generally speaking, debug symbols aren’t something you need unless you’re debugging.
To my mind, these symbols shouldn’t be part of the build shipped to end users.
But in case there was any question about shipping these files, the message the
BurstDebugInformation_DoNotShip folder is sending is definitely clear:
“do not ship this with your game!”.
What’s even funnier is that the debug symbols in the root folder and the ones in the Burst directory are identical!
$ cmp -b lib_burst_generated.pdb \
Shipbreaker_BurstDebugInformation_DoNotShip/Data/Plugins/x86_64/lib_burst_generated.pdb
# nothing! that means they are byte-for-byte identical!
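`cmp` works for a pair of files you already suspect. To find duplicates you don’t know about yet, you can group the files in a build by size and then by content hash. A minimal Python sketch; `find_duplicates` is a hypothetical helper, and files are read in chunks so large `.pdb` files never need to fit in memory:

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(build_dir: str) -> list[list[str]]:
    """Group files under build_dir whose contents have the same SHA-256 hash."""
    # Files of different sizes can't be identical, so bucket by size first
    # and only hash files that share a size with at least one other file.
    by_size = defaultdict(list)
    for root, _dirs, files in os.walk(build_dir):
        for name in files:
            path = os.path.join(root, name)
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Run over the Shipbreaker folder, the two `lib_burst_generated.pdb` copies that `cmp` found identical would come back as one group.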
“But debug symbols are just harmless files,” I hear you cry! Well, that may be so, but these are two ~18MB files sitting on my computer that I don’t need. And if Unity’s own guidance is anything to go by, the only people who would ever need them are the game devs.
And this brings me to my point: it’s not about calling out Blackbird, but about the fact that if they are doing this, other games are likely doing it too.
Naturally, if I look in my game library:
$ find . -name "*_BurstDebugInformation_DoNotShip"
card-board-town_BurstDebugInformation_DoNotShip
Cult Of The Lamb_BurstDebugInformation_DoNotShip
Death Must Die_BurstDebugInformation_DoNotShip
Shipbreaker_BurstDebugInformation_DoNotShip
Soul Stalker_BurstDebugInformation_DoNotShip
Potion Craft_BurstDebugInformation_DoNotShip
Stellar Settlers_BurstDebugInformation_DoNotShip
The Murder of Sonic The Hedgehog_BurstDebugInformation_DoNotShip
PanOrama_BurstDebugInformation_DoNotShip
That’s 9 games leaving unneeded Burst debug symbols and compilation information on my hard drive. And while debug symbols and Burst compilation info might not be huge by themselves, things tend to add up.
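Both finding these folders and measuring how much they add up to are easy to automate. A Python sketch, assuming the folders can be recognized by the `DoNotShip` marker above plus the `DontShip` spelling Unity uses in its `<Game>_BackUpThisFolder_ButDontShipItWithYourGame` folder; the marker list and function names are hypothetical:

```python
import os
from pathlib import Path

# Case-insensitive substrings marking folders that aren't meant to ship:
# "DoNotShip" as in the Burst folders above, "DontShip" as in Unity's
# "<Game>_BackUpThisFolder_ButDontShipItWithYourGame". Assumed list.
MARKERS = ("donotship", "dontship")


def tree_size(folder: Path) -> int:
    """Total size in bytes of the regular files under folder (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def find_do_not_ship(library_dir: str) -> dict[Path, int]:
    """Map every marker-named folder under library_dir to its size in bytes."""
    found = {}
    for root, dirs, _files in os.walk(library_dir):
        for name in list(dirs):  # iterate over a copy so we can prune dirs
            if any(marker in name.lower() for marker in MARKERS):
                folder = Path(root) / name
                found[folder] = tree_size(folder)
                dirs.remove(name)  # already measured; don't descend into it
    return found
```

Summing the values of the returned dictionary gives the total space these folders take up across a whole game library.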
And we’re not done yet; there is always more. Let’s take a look at Necrosmith 2 by Alawar:
$ ls -1 "Necrosmith 2"
baselib.dll
GameAssembly.dll
Necrosmith2.exe
Necrosmith2_BackUpThisFolder_ButDontShipItWithYourGame/
Necrosmith2_Data/
UnityCrashHandler64.exe
UnityPlayer.dll
Another folder that explicitly tells you not to ship it! So what’s in the folder that Alawar should back up, but not ship?
$ ls -1 Necrosmith2_BackUpThisFolder_ButDontShipItWithYourGame/
il2cppOutput/
Managed/
If you see a Managed folder in a Unity game, it generally means the
.NET assemblies can be found there and the game uses Unity’s custom Mono
runtime for scripting. But we just saw GameAssembly.dll in the root folder,
which is a dead giveaway that the developer used IL2CPP
instead. So that il2cppOutput folder must be interesting, then!
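The rule of thumb above can be turned into a quick check. This is a heuristic sketch in Python, assuming the usual Windows layout of a Unity build (`GameAssembly.dll` next to the executable for IL2CPP, a `Managed` folder inside `<Game>_Data` for Mono); `guess_scripting_backend` is a hypothetical name, and file names alone can only suggest an answer, not prove it:

```python
from pathlib import Path


def guess_scripting_backend(game_dir: str) -> str:
    """Guess from file names whether a Windows Unity build uses IL2CPP or Mono."""
    root = Path(game_dir)
    # IL2CPP builds ship the game's compiled code as GameAssembly.dll in the root.
    if (root / "GameAssembly.dll").is_file():
        return "il2cpp"
    # Mono builds keep their .NET assemblies in <Game>_Data/Managed/.
    if any((data / "Managed").is_dir() for data in root.glob("*_Data")):
        return "mono"
    return "unknown"
```

The IL2CPP check runs first, so a leftover Managed folder like the one in Necrosmith’s backup directory doesn’t change the answer.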
$ ls -1 il2cppOutput | less
__Generated.cpp
__Generated_CodeGen.c
analytics.json
Assembly-CSharp.cpp
Assembly-CSharp__1.cpp
Assembly-CSharp__10.cpp
Assembly-CSharp__11.cpp
Assembly-CSharp__12.cpp
Assembly-CSharp__13.cpp
Assembly-CSharp__14.cpp
Assembly-CSharp__15.cpp
I had to pipe the output of ls into less to not flood my terminal.
How many files are there anyway?
$ ls -1 il2cppOutput | wc -l
498
Oh dear, this looks like the entire output of IL2CPP for our
reading pleasure. But wait a minute, if I’m not mistaken IL2CPP
is notorious for generating big compilation units.
How big is the folder we’re digging in?
$ du -hd 1 .
676M ./il2cppOutput
15M ./Managed
691M .
…right. Necrosmith’s installation size is 1.76GB total. That means that a whopping 38.3% of the game’s install size is just… inert data. While reverse engineers will probably rejoice that they can throw the managed code into a .NET disassembler to figure out the game’s inner workings, it’s probably not what Alawar intended.
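A note on units, since they matter for that percentage: `du -h` counts in powers of 1024, so `691M` is 691 MiB. The 38.3% figure holds if the 1.76GB install size is read the same way (1.76 GiB, about 1802 MiB); dividing by a decimal 1760 MB instead gives roughly 39.3%. As a quick check:

```python
def share_percent(part_mib: float, total_gib: float) -> float:
    """Percentage of an install (in GiB) taken up by a folder measured in MiB."""
    total_mib = total_gib * 1024  # 1 GiB = 1024 MiB
    return part_mib / total_mib * 100


# Sizes from the du output and the install size above.
print(f"{share_percent(691, 1.76):.1f}%")  # 38.3%
```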
Now, the point I’m trying to make here is not that devs are bad just for forgetting things. Rather, I feel like there’s a tendency to view “the build” as the least interesting part of making software. And if devs are leaving data on my disk that just sits there, you can bet more software is doing this. Now think beyond games, to all the software installed on your computer. I get the sneaking suspicion that a non-trivial amount of it is doing nothing more than taking up space on my hard drive.
Oh great, they’re going to talk about web bloat
Now before I start this next rant section, I want to make clear this isn’t a general “the web is bloated” point.
People much smarter than me have already written way better posts about that.
Instead, I want to highlight another form of bloat. It’s not one that depends solely on
your framework of choice, or on how many trackers and ads you decide to install. It’s more
subtle, and thus way harder to detect, let alone remediate.
As an example, I was ordering some beers for Father’s Day a while back. While checking the tracking page for the package
the store had sent me, I poked around in the website source to see what tech it used.
Digging around in the sources tab gets you a license file for vue-i18n pretty quickly:
that means Vue.js!
But then something stood out in the middle of all that minified JavaScript:
// not shown is megabytes of minified JS
/**
* Prism: Lightweight, robust, elegant syntax highlighting
*
* @license MIT <https://opensource.org/licenses/MIT>
* @author Lea Verou <https://lea.verou.me>
* @namespace
* @public
*/
Is that Prism? As in: the code syntax highlighting library Prism?! What on earth does a parcel tracking system need with code syntax highlighting? I’ve come up with a few wild guesses as to why the site might include Prism in its bundle:
- The tracking service’s customers (the shops) need a fancy text editor to write custom messages for their own customers, and it’s included by default.
- They provide some form of code customization and a convenient editor with syntax highlighting for their customers’ developers.
- They use a fancy editor for a chatbot so I can chat with customer service.
Now, to my mind none of those features require syntax highlighting. So why is it included in their bundle? A likely explanation is that the editor the site uses pulls it in by default, but that makes me wonder even more: didn’t someone look at their build artifacts and ask themselves if they needed code syntax highlighting for tracking parcels?
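You don’t have to stumble on this by accident. License banners like the one above often survive minification, because bundlers and minifiers commonly preserve `@license` comments (sometimes moving them to a separate file instead). That means scanning your own bundles for them gives a rough inventory of the third-party code you ship. A minimal Python sketch; `license_banners` is a hypothetical helper, and libraries without such a banner won’t show up:

```python
import re
from pathlib import Path

# A /* ... */ block comment containing "@license". The (?:(?!\*/).) parts
# stop the match from running across the end of one comment into the next.
LICENSE_COMMENT = re.compile(r"/\*(?:(?!\*/).)*?@license(?:(?!\*/).)*\*/", re.DOTALL)


def license_banners(bundle_path: str) -> list[str]:
    """Return the first text line of every @license comment in a JS bundle."""
    text = Path(bundle_path).read_text(encoding="utf-8", errors="replace")
    banners = []
    for match in LICENSE_COMMENT.finditer(text):
        # Strip comment decoration ("/**", " * ", "*/") from each line and
        # use the first non-empty line as a label, e.g. the library name.
        lines = (line.strip(" \t*/!") for line in match.group(0).splitlines())
        banners.append(next((line for line in lines if line), ""))
    return banners
```

Fed a bundle containing the Prism banner above, this returns `Prism: Lightweight, robust, elegant syntax highlighting` as one of its entries.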
Another, somewhat similar story: at some point I was using an app at work and, being a nosy must-know-it-all, I naturally got curious and started poking around. It didn’t take me long to find out it was an Angular app (as is required by law in Enterprise™), but I was not prepared when I opened up the first JS file:
/******/ (() => { // webpackBootstrap
/******/ var __webpack_modules__ = ({
/***/ 3092:
/***/ ((__unused_webpack_module, exports) => {
"use strict";
exports.byteLength = byteLength;
exports.toByteArray = toByteArray;
exports.fromByteArray = fromByteArray;
var lookup = [];
var revLookup = [];
// megabytes of output omitted
Oh boy, deep breath, the developer had apparently just shipped Webpack debug output.
No wonder the JS bundle size was over 7MB. All those comments and whitespace are just
sitting there doing nothing!
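Output like this is easy to flag automatically. Minified bundles tend to pack code into a few very long lines, while development output is mostly short, commented lines, so average line length makes a crude signal. A heuristic Python sketch; `looks_unminified` is a hypothetical name and the 200-character threshold is an arbitrary assumption you’d tune against your own builds:

```python
from pathlib import Path


def looks_unminified(bundle_path: str, min_avg_line_length: int = 200) -> bool:
    """Heuristic: flag a JS file whose average line length is suspiciously short."""
    text = Path(bundle_path).read_text(encoding="utf-8", errors="replace")
    lines = text.splitlines()
    if not lines:
        return False  # an empty file isn't evidence of anything
    average = sum(len(line) for line in lines) / len(lines)
    return average < min_avg_line_length
```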
The big difference in this instance was that I could actually talk with the developer in question, since it
was an internal app. After a bit of figuring out who’s who, I tracked down the git repo.
Looking around, I found my smoking gun in angular.json:
// other configuration omitted
"optimization": {
  "scripts": false
}
Okay, tell me Angular docs, what does that do exactly?
This option enables various optimizations of the build output, including:
- Minification of scripts and styles
- Tree-shaking
- Dead-code elimination
Right, we’re disabling basically all script optimizations just like that. And since the default is true, someone must have consciously
set that option to false! So I created an issue and a pull request flipping the option back on, which was summarily
accepted and merged without much further comment.
So what was the point of disabling this option in the first place?
Well, when I asked the developer about it they told me a coworker (I guess you could say “customer” in this case)
was having issues with the app, so they started debugging and turned off optimizations. And, they confessed, “I guess
I forgot to turn it back on again”. Well, okay then.
Looking at the git log, this had been there for over a year. No one on the team noticed that they were shipping debug output all this time?
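A year is a long time for a one-line change to go unnoticed, and a check in CI could have flagged it. The Python sketch below assumes the standard `angular.json` layout (`projects.<name>.architect.build`, with a `production` configuration) and that the file is plain JSON; `optimization` may be a boolean or an object with a `scripts` key. The function names are hypothetical, and only the production build is checked, since development configurations often turn optimization off on purpose:

```python
import json
from pathlib import Path
from typing import Any


def scripts_optimized(setting: Any) -> bool:
    """`optimization` may be a boolean or an object such as {"scripts": false}."""
    if isinstance(setting, dict):
        return setting.get("scripts", True) is not False
    return setting is not False


def unoptimized_production_builds(config: dict) -> list[str]:
    """Names of projects whose production build turns off script optimization."""
    flagged = []
    for name, project in config.get("projects", {}).items():
        build = project.get("architect", {}).get("build", {})
        options = build.get("options", {})
        production = build.get("configurations", {}).get("production", {})
        # A value in the production configuration overrides the shared options.
        setting = production.get("optimization", options.get("optimization", True))
        if not scripts_optimized(setting):
            flagged.append(name)
    return flagged


def check_angular_json(path: str = "angular.json") -> list[str]:
    """Load angular.json (assumed to contain no comments) and run the check."""
    return unoptimized_production_builds(json.loads(Path(path).read_text(encoding="utf-8")))
```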
It’s not exclusive to the web
Now, you might be thinking: “okay, Duck is angry at the internet, join the bloody queue”. But my point is that it’s not just that. Games tend to be asset heavy, so they focus on optimizing those as much as possible. But even then folders saying “please do not ship me” end up in the final build. An enterprise app can apparently just float around for over a year on debug output.
I could go hunting for examples of how it’s all doom and gloom on your mobile phone, but it turns out people way smarter than me have already done the hard work. Look at the Emerge Tools Blog and be amazed at the sheer amount of debug symbols, stale fonts, duplicate libraries and unused assets that get downloaded only to sit there on your phone.
A small selection of their excellent analysis:
- Showing stale assets and a lot of debug symbols in iOS apps
- Showing how a tooling upgrade can cost you 100MB extra
- Explaining how GM’s companion apps are almost 0.5GB
- How Candy Crush gets bitten by OS-specific disk allocation units
- A Twitter thread explaining why LinkedIn’s iOS app is 468MB
So what’s with all these rants? Well, I confess I was very careful in picking my examples. The central
theme in all of them (and most of the things on Emerge’s blog) is that these things tend to happen over time.
Stuff inside an app ages, gets refactored, rebuilt, removed and switched around. It’s not unique to a platform,
programming language or framework.
But each of these examples evokes a nasty and pervasive
feeling in me that developers just don’t seem to care as much about what they’re throwing over
the fence onto their users’ devices. To me, that feels alien. If I’m in the business of selling cakes, isn’t
it normal that I want to see and test the final product? How come developers don’t seem to do that as much?
Don’t you want to know what you are giving away behind that fancy icon in the app store?
Now, I want to end this blog on a high. Being angry on the internet is easy, but that doesn’t solve anything. Instead, I want to ask something of you, dear reader. When you are making software, don’t just run it on your dev machine and check off that ticket. Don’t just install that new dependency, or add that extra file. Look at your outputs. Be bold, take a release build and see what it is you’re delivering to other people. Try to reverse engineer your own code. By looking at the code that truly runs on the users’ device, you might just find some interesting stuff, and even learn a thing or two along the way!
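If you’re not sure where to start, sorting a release build by file size is a cheap first pass: go down the list and ask, for each entry, whether your users need it. A minimal Python sketch; `largest_files` and `print_report` are hypothetical helpers and the default of 20 entries is arbitrary:

```python
import heapq
import os


def largest_files(build_dir: str, count: int = 20) -> list[tuple[int, str]]:
    """Return (size_in_bytes, path) pairs for the `count` largest files under build_dir."""
    sizes = (
        (os.path.getsize(path), path)
        for root, _dirs, files in os.walk(build_dir)
        for path in (os.path.join(root, name) for name in files)
    )
    return heapq.nlargest(count, sizes)


def print_report(build_dir: str, count: int = 20) -> None:
    """Print the largest files in MiB, biggest first."""
    for size, path in largest_files(build_dir, count):
        print(f"{size / 2**20:8.1f} MiB  {path}")
```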