# Understanding C++ Mangled Symbols in Ghidra (FA.SMS Case Study)

This document explains how mangled C++ symbols found in the Fighter Anthology (FA) `FA.SMS` file can be decoded and classified using Microsoft-style name mangling rules. It is intended for advanced modders and reverse engineers working with Ghidra or similar disassemblers.

---

## 📦 What Are Mangled Symbols?

C++ compilers encode function names, classes, return types, and calling conventions into "mangled" strings to support **overloading**, **namespaces**, and **type safety** in compiled binaries.

Example:

```
?DrawHUD@@YAXXZ
```

This encodes the C++ function:

```cpp
void DrawHUD();
```

---

## 🧠 Anatomy of a Mangled Name (Microsoft-style)

A typical Microsoft-style C++ mangled name looks like:

```
?FunctionName@ClassName@@
```

The characters that follow `@@` encode the symbol kind, the calling convention, and the types.

### Common Parts:

| Code      | Meaning                                                          |
|-----------|------------------------------------------------------------------|
| `?`       | Start of a mangled name                                          |
| `@`, `@@` | `@` ends each name component; `@@` ends the full qualified name  |
| `Y`       | Free (global) function                                           |
| `Q`       | Public non-static member function                                |
| `S`       | Public static member function                                    |
| `0`-`2`   | Static class data member (private, protected, public)            |
| `3`       | Global variable                                                  |
| `XZ`      | Empty (`void`) parameter list, then the closing `Z`              |

`Q` and `S` are the public variants. Private and protected members use sibling codes (`A`/`I` for non-static, `C`/`K` for static member functions).

---

## 🔍 How to Classify Mangled Symbols

Using patterns, we can classify symbols:

### ✅ Global / Free Functions

Pattern:

```
?FunctionName@@Y...Z
```

Example:

```
?DrawHUD@@YAXXZ → void DrawHUD();
```

### ✅ Global / Static Variables

Pattern:

```
?VariableName@@3...
```

Example:

```
?seqFonts@@3PAUSEQFNT@@A → SEQFNT* seqFonts; (global pointer to struct)
```

### ✅ Static Class Methods

Pattern:

```
?FunctionName@ClassName@@S...Z
```

`S` = public static member function. The next letter is the calling convention; in `SG`, `G` = `__stdcall`.

Example:

```
?Init@HUD@@SGXXZ → static void __stdcall HUD::Init();
```

### ✅ Member Functions (non-static)

Pattern:

```
?FunctionName@ClassName@@Q...Z
```

`Q` = public non-static member function. In `QAE`, `A` means the `this` pointer has no `const`/`volatile` qualifier and `E` = `__thiscall` (object pointer passed implicitly).

Example:

```
?Update@Radar@@QAEHXZ → int Radar::Update();
```

### ✅ Class Member Variables

Pattern:

```
?Variable@Class@@...
```

If the character after `@@` is `0`, `1`, or `2`, the symbol is a static data member (private, protected, or public). Non-static data members never appear as symbols because they live inside each object instance.

---

## 🧱 Summary of Classifications

| Symbol Format              | Meaning                          | Category       |
|----------------------------|----------------------------------|----------------|
| `?Name@@Y...Z`             | Free-standing function           | Function       |
| `?Name@@3...`              | Global variable                  | Variable       |
| `?Func@Class@@S...Z`       | Public static class method       | Function       |
| `?Func@Class@@Q...Z`       | Public member function           | Function       |
| `?Var@Class@@0...` (`1`, `2`) | Static data member            | Class Variable |

---

## 🛠 How This Helps in Ghidra

By parsing mangled symbols and decoding them:

- You can **automatically rename** `FUN_0040AD40` to meaningful names like `HUD::Draw()`
- You can sort and classify functions vs variables
- You'll understand **class structure**, not just flat functions

---

## ✅ Next Steps

- Write a Python script to parse and classify all symbols from FA.SMS
- Use this classification to help automate labeling in Ghidra
- Reverse engineer class relationships and group logic

---

## 🏁 Final Notes

The ability to decode these mangled names gives you a HUGE advantage in understanding old C++ games like Jane's Fighter Anthology. You're essentially re-creating part of the original code architecture from raw binary, and that's next-level modding.

---

## 🔎 Real Examples: Mangled → C++ Readable

Below are real samples showing how Ghidra displays C++ symbols (mangled), and what they mean in clean C++ code.

---

### 🟢 Global Free Function

```
Ghidra Symbol: ?DrawHUD@@YAXXZ
C++ Version:   void DrawHUD();
```

```
Ghidra Symbol: ?GetAltitude@@YAMXZ
C++ Version:   float GetAltitude();
```

---

### 🔵 Static Class Method

```
Ghidra Symbol: ?Init@HUD@@SGXXZ
C++ Version:   static void __stdcall HUD::Init();
```

```
Ghidra Symbol: ?Shutdown@Renderer@@SGXXZ
C++ Version:   static void __stdcall Renderer::Shutdown();
```

---

### 🟠 Member Function (non-static)

```
Ghidra Symbol: ?Update@Radar@@QAEHXZ
C++ Version:   int Radar::Update();
```

```
Ghidra Symbol: ?SetWeapon@Aircraft@@QAEXPAD@Z
C++ Version:   void Aircraft::SetWeapon(char*);
```

---

### 🟣 Global / Static Variable

```
Ghidra Symbol: ?seqFonts@@3PAUSEQFNT@@A
C++ Version:   SEQFNT* seqFonts;
```

```
Ghidra Symbol: ?g_bEnableDebug@@3_NA
C++ Version:   bool g_bEnableDebug;
```

---

### 🟤 Class Member Variable (static)

```
Ghidra Symbol: ?altitude@Aircraft@@1MA
C++ Version:   static float Aircraft::altitude;
```

```
Ghidra Symbol: ?nTargets@RadarSystem@@1HA
C++ Version:   static int RadarSystem::nTargets;
```

The digit `1` after `@@` marks a protected static data member (`0` is private, `2` is public). Only static data members get a symbol of their own.

---

These mappings are useful when reverse engineering binaries in Ghidra and rebuilding class or function layouts as they existed in the original C++ source code.

---

## 🔍 Advanced Mangled Name Decoding: Functions with Arguments

As we dive deeper into FA.SMS and Microsoft-style C++ symbol mangling, many function names include encoded argument types. Here's how they work and how to decode them.

### 🧠 Structure Breakdown

A mangled symbol like:

```
?CN_NewSetLines@@YAXFFF@Z
```

Breaks down to:

- `?` → Symbol prefix for mangling
- `CN_NewSetLines` → Function name
- `@@` → End of the qualified name
- `YAX` → Kind, calling convention, and return type
  - `Y` = free function, `A` = `__cdecl`, `X` = `void` return
- `FFF@Z` → Three arguments of type `short`; `@` closes the argument list and `Z` closes the symbol

---

### 📘 Common Argument Type Codes

| Code         | C++ Type         | Notes                                   |
|--------------|------------------|-----------------------------------------|
| `H`          | `int`            |                                         |
| `I`          | `unsigned int`   |                                         |
| `F`          | `short`          | Not `float`                             |
| `D`          | `char`           | Not `double`                            |
| `M`          | `float`          |                                         |
| `N`          | `double`         |                                         |
| `_N`         | `bool`           | Two-character code                      |
| `J`          | `long`           |                                         |
| `PAD`        | `char*`          |                                         |
| `PAE`        | `unsigned char*` |                                         |
| `PAUCLASS@@` | `CLASS*`         | `U` = struct; class pointers use `PAV`  |

---

### ✅ Real Example Mapping

| Mangled Symbol                             | Decoded C++                                  |
|--------------------------------------------|----------------------------------------------|
| `?CN_NewSetLines@@YAXFFF@Z`                | `void CN_NewSetLines(short, short, short)`   |
| `?CN_GetString@@YAJPADPAE@Z`               | `long CN_GetString(char*, unsigned char*)`   |
| `?SetWeapon@Aircraft@@QAEXPAD@Z`           | `void Aircraft::SetWeapon(char*)`            |
| `?BuildVDOList@@YAPAUVDOLinkedList@@PAD@Z` | `VDOLinkedList* BuildVDOList(char*)`         |

---

### 🧩 Why This Matters

Understanding these symbol encodings allows us to:

- Reconstruct original function signatures
- Rebuild C++ classes and headers for simulation/modding
- Refactor or relabel symbols intelligently in Ghidra

We are effectively **reverse engineering the interfaces of the developer's original C++ source**, one declaration at a time.

---

## 🧩 Distinguishing GLOBAL vs LOCAL in Mangled C++ Symbols

When working with Microsoft C++ symbol mangling (like in FA.SMS), understanding **what is global vs local** is essential for reconstructing the original source layout.

### 🔵 GLOBAL (Variables, Arrays, Pointers)

A global symbol uses the `@@3` signature in mangled format:

```
?Active@@3PAFA → short* Active;
```

- `3` → Global variable
- `PAF` → Pointer to `short`
- Trailing `A` → Storage class of the variable itself (no `const`/`volatile`)

🧠 These are most often:

- Global variables (shared across files)
- Static arrays or config pointers

---

### 🔴 LOCAL (Functions, Methods)

A local/function symbol uses the `@@Y...Z` or similar pattern:

```
?AbortCampaign@@YIDXZ → char __fastcall AbortCampaign();
?CN_NewSetLines@@YAXFFF@Z → void CN_NewSetLines(short, short, short);
```

In `YID`, `I` is the `__fastcall` calling convention and `D` is a `char` return type.

🧠 These include:

- Free functions
- Static or member class methods
- Functions with typed arguments

---

### 📘 Why This Division Is Critical

| Use Case                         | GLOBAL Symbol | LOCAL Symbol            |
|----------------------------------|---------------|-------------------------|
| Search in Ghidra easily          | ✅ Yes        | ❌ Only via disassembly |
| Access to memory/data segments   | ✅ Yes        | 🚫 Not relevant         |
| Actual game logic flow           | 🚫 No         | ✅ Yes                  |
| Reconstruct headers / prototypes | ✅ Variables  | ✅ Full functions       |

---

### 💡 Naming Recommendation

Although we label them as `GLOBAL` and `LOCAL` in our output, "local" here means code rather than local scope, so you may rename them:

- `GLOBAL` → `VARIABLES`
- `LOCAL` → `FUNCTIONS`

This better reflects C++ source organization when rebuilding `.h` and `.cpp` files from reverse-engineered data.

---

## 🧬 Advanced C++ Symbol Breakdown Examples

We are now decoding more advanced symbols with custom classes, structures, and enums.

### 🧩 Example Symbols and Meanings

#### `?CN_ReadConfig@@YAXPAUCN_INFO@@PAE@Z`

```cpp
void CN_ReadConfig(CN_INFO*, unsigned char*);
```

- `YAX` = free function, `__cdecl`, `void` return
- `PAUCN_INFO@@` = pointer to struct, `CN_INFO*`
- `PAE` = `unsigned char*`

---

#### `?CN_SetFactoryDefaults@@YAXPAUCN_INFO@@@Z`

```cpp
void CN_SetFactoryDefaults(CN_INFO*);
```

---

#### `?COLDrawInfo@@YGXGF@Z`

```cpp
void __stdcall COLDrawInfo(unsigned short, short);
```

- `YG` = free function, `__stdcall`
- `X` = `void` return, `G` = `unsigned short`, `F` = `short`

---

#### `?COLFlatGround@@YIDJPAUF24_POINT3@@00@Z`

```cpp
char __fastcall COLFlatGround(long, F24_POINT3*, F24_POINT3*, F24_POINT3*);
```

- `YI` = free function, `__fastcall`
- `D` = `char` return, `J` = `long`
- `PAU...@@` = pointer to struct
- `0` = back-reference to the first multi-character argument type, here `F24_POINT3*`

---

#### `?COMMENT_TYPE@@3W4__unnamed@@A`

```cpp
enum __unnamed COMMENT_TYPE;
```

- `W4` = enum (with `int` as the underlying type)
- `3` = global variable

---

### 🔍 Updated Type Map Used in Demangling

| Mangled     | Meaning                          |
|-------------|----------------------------------|
| `PAD`       | `char*`                          |
| `PAE`       | `unsigned char*`                 |
| `PAUName@@` | `Name*` (pointer to struct)      |
| `J`         | `long`                           |
| `G`         | `unsigned short`                 |
| `W4`        | `enum`                           |

This update improves detection for class pointers and function arguments for advanced reverse engineering of FA.SMS.

---

## 🔄 Update: Demangling Advanced Return Types and Arguments

We now support decoding of mangled symbols with:

- Return types like `char`, `unsigned short`, `long`
- Arguments using codes like `G`, `J`, and class pointers
- Repeated argument types written as back-references (`0`-`9`)

---

### 🧪 New Examples

#### `?COLDrawInfo@@YGXGF@Z`

```cpp
void __stdcall COLDrawInfo(unsigned short, short);
```

#### `?COLFlatGround@@YIDJPAUF24_POINT3@@00@Z`

```cpp
char __fastcall COLFlatGround(long, F24_POINT3*, F24_POINT3*, F24_POINT3*);
```

#### `?CPGetContact@@YGGJ@Z`

```cpp
unsigned short __stdcall CPGetContact(long);
```

Here the first `G` is the `__stdcall` calling convention, the second `G` is the `unsigned short` return type, and `J` is the single `long` argument.

---

### 🔠 New Supported Argument Codes

| Code    | Meaning                                                              |
|---------|----------------------------------------------------------------------|
| `G`     | `unsigned short`                                                     |
| `J`     | `long`                                                               |
| `0`-`9` | Back-reference to an earlier multi-character argument type           |

These codes improve parsing for more complex function signatures.
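
The back-reference rule is easiest to see in code. The sketch below is a minimal, hypothetical argument-list decoder; the names `PRIMS`, `QUALS`, `parse_type`, and `parse_args` are illustrative and are not taken from the FA.SMS script. It assumes only the primitive codes listed in this document, `P` + qualifier pointers, and unqualified `U`/`V` struct or class names; anything else raises an error instead of guessing.

```python
# Hypothetical sketch: decode an MSVC argument list, including back-references.
PRIMS = {
    "D": "char", "E": "unsigned char", "F": "short", "G": "unsigned short",
    "H": "int", "I": "unsigned int", "J": "long", "K": "unsigned long",
    "M": "float", "N": "double", "X": "void",
}
QUALS = {"A": "", "B": "const ", "C": "volatile ", "D": "const volatile "}


def parse_type(s, i):
    """Decode one type starting at s[i]; return (text, next_index)."""
    c = s[i]
    if c == "P":                       # pointer: P + qualifier + pointee type
        inner, j = parse_type(s, i + 2)
        return QUALS[s[i + 1]] + inner + "*", j
    if c in "UV":                      # struct (U) or class (V) name, ends with @@
        end = s.index("@@", i)
        return s[i + 1:end], end + 2
    if s.startswith("_N", i):          # two-character code for bool
        return "bool", i + 2
    if c in PRIMS:
        return PRIMS[c], i + 1
    raise ValueError(f"unsupported type code at {i}: {s[i:]!r}")


def parse_args(s):
    """Decode an argument list such as 'JPAUF24_POINT3@@00@Z'."""
    if s == "XZ":                      # empty (void) parameter list
        return []
    args, seen, i = [], [], 0
    while s[i] != "@":                 # '@' ends a non-empty list
        if s[i].isdigit():             # back-reference to an earlier type
            args.append(seen[int(s[i])])
            i += 1
            continue
        start = i
        text, i = parse_type(s, i)
        if i - start > 1:              # only multi-character types are remembered
            seen.append(text)
        args.append(text)
    return args


print(parse_args("JPAUF24_POINT3@@00@Z"))
print(parse_args("PADPAE@Z"))
```

Single-letter types such as `J` are never stored in the back-reference table, which is why `0` in `COLFlatGround` points at `F24_POINT3*` rather than at `long`.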

---

## 🔄 Update: Real World Function Signature Examples

Here are more native C++ examples successfully translated from Ghidra-style or mangled MSVC symbols:

---

```cpp
void APEndArrestorCatch();
double APLanding();
void APStartFinalApproach();
char __fastcall AbortCampaign();
double AtFriendlyAP();
VDOLinkedList* BuildVDOList(char*);
void CN_ClearLines();
long CN_GetBigString(char*, unsigned char*);
long CN_GetString(char*, unsigned char*);
void CN_NewPrint(char*);
void CN_NewSetLines(short, short, short);
void CN_Print(unsigned char*);
void CN_ReadConfig(CN_INFO*, unsigned char*);
void CN_SetFactoryDefaults(CN_INFO*);
```

---

### 🔎 Details:

- `CN_INFO*` decoded from `PAUCN_INFO@@`
- `unsigned char*` decoded from `PAE`
- `char*` decoded from `PAD`
- `short`, `int`, `long`, `char` and class pointer returns are now recognized
- The `double` returns shown for `APLanding` and `AtFriendlyAP` should be re-checked against their mangled symbols: `N` encodes `double`, while `D` encodes `char`
- Clean naming and signature formatting added

---

These help in reconstructing real C++ source files like `.h` and `.cpp` from reverse-engineered content.

---

## 🔄 Update: Support for More Return Types and Arguments

We've added support for decoding new mangled symbol formats including:

- `YAHXZ` → `int ()`
- `YAXH@Z` → `void (int)`
- `YAPAUX@@XZ` → `X* ()` for user-defined types

---

### 🧪 New Demangled Examples

#### `?CreateNewCampaign@@YAPAUCAMPAIGN@@XZ`

```cpp
CAMPAIGN* CreateNewCampaign();
```

#### `?D3DChangeSettings@@YAHPADHH@Z`

```cpp
int D3DChangeSettings(char*, int, int);
```

#### `?D3DCreateZBuffer@@YAHXZ`

```cpp
int D3DCreateZBuffer();
```

#### `?D3DInitTexture@@YAXH@Z`

```cpp
void D3DInitTexture(int);
```

#### `?D3DInitTextures@@YAXXZ`

```cpp
void D3DInitTextures();
```

#### `?D3DRender@@YAXXZ`

```cpp
void D3DRender();
```

These cover a wider range of return types and argument patterns for decoding MSVC-mangled C++ symbols from the FA.SMS file.
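
The three patterns above (`Y`, a calling-convention letter, a return type, then the arguments) can be handled with a small regex tokenizer. This is only a sketch under stated assumptions: `decode_free_function`, `TOKEN`, `CONVENTIONS`, and `PRIMS` are hypothetical names, only plain `PA` pointers and single-letter primitives are supported, and back-references, `_N`, and nested pointers are deliberately rejected rather than guessed.

```python
import re

# Hypothetical sketch: turn '?Name@@Y<conv><return><args>Z' into a prototype.
CONVENTIONS = {"A": "__cdecl", "G": "__stdcall", "I": "__fastcall"}
PRIMS = {
    "D": "char", "E": "unsigned char", "F": "short", "G": "unsigned short",
    "H": "int", "I": "unsigned int", "J": "long", "K": "unsigned long",
    "M": "float", "N": "double", "X": "void",
}

# One token is a plain pointer to a struct/class (PAUName@@ or PAVName@@),
# a plain pointer to a primitive (PAD), or a bare primitive (H).
TOKEN = re.compile(r"PA[UV](?P<name>\w+)@@|PA(?P<ptr>[A-Z])|(?P<prim>[A-Z])")


def decode_free_function(symbol):
    m = re.fullmatch(r"\?(\w+)@@Y([A-Z])(.+)Z", symbol)
    if m is None:
        raise ValueError(f"not a simple free-function symbol: {symbol}")
    name, conv, body = m.groups()
    if conv not in CONVENTIONS:
        raise ValueError(f"unsupported calling convention {conv!r}")

    types, pos = [], 0
    while pos < len(body) and body[pos] != "@":   # '@' closes the argument list
        tok = TOKEN.match(body, pos)
        if tok is None:
            raise ValueError(f"unsupported code in {symbol}: {body[pos:]!r}")
        if tok["name"]:
            types.append(tok["name"] + "*")
        else:
            code = tok["ptr"] or tok["prim"]
            if code not in PRIMS:
                raise ValueError(f"unknown type code {code!r} in {symbol}")
            types.append(PRIMS[code] + ("*" if tok["ptr"] else ""))
        pos = tok.end()

    ret, args = types[0], types[1:]               # first type is the return type
    if args == ["void"]:                          # 'X' means no parameters
        args = []
    prefix = "" if conv == "A" else CONVENTIONS[conv] + " "
    return f"{ret} {prefix}{name}({', '.join(args)})"


for sym in ("?CreateNewCampaign@@YAPAUCAMPAIGN@@XZ",
            "?D3DChangeSettings@@YAHPADHH@Z",
            "?D3DInitTexture@@YAXH@Z"):
    print(decode_free_function(sym))
```

The `__cdecl` convention is left out of the printed prototype because it is the compiler default; other conventions are printed so that a `G` or `I` after `Y` is not mistaken for a type.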

---

## 🔄 Update: Support for Long Return Types and `char*, unsigned char*` Arguments

### ✔️ Added `YAJ` handling:

This indicates:

- `Y` = free function, `A` = `__cdecl`, `J` = return type `long`
- Arguments now support combinations like `PADPAE`

---

### 🧪 New Examples

#### `?CN_GetBigString@@YAJPADPAE@Z`

```cpp
long CN_GetBigString(char*, unsigned char*);
```

#### `?CN_GetString@@YAJPADPAE@Z`

```cpp
long CN_GetString(char*, unsigned char*);
```

These symbols now decode properly using the updated Python script!

---

## 🆕 Update: Pointer Qualifiers, `void`, `unsigned char`, and Function Pointers

As our demangler evolved we encountered a number of additional encodings that show up in Fighter Anthology's SMS. To fully decode the remaining symbols we added support for the following:

### New primitive codes

| Code | C++ Type        | Notes                                                 |
|------|-----------------|-------------------------------------------------------|
| `E`  | `unsigned char` | Single-byte unsigned integer values.                  |
| `X`  | `void`          | Indicates a `void` return type or a `void` parameter. |

### Pointer qualifiers

Pointers begin with `P` followed by a qualifier letter and a base type code. The qualifier letter encodes const/volatile attributes of the pointed-to type:

| Qualifier | Meaning                     | Example                        |
|-----------|-----------------------------|--------------------------------|
| `A`       | plain pointer               | `PAX` → `void*`                |
| `B`       | pointer to `const`          | `PBX` → `const void*`          |
| `C`       | pointer to `volatile`       | `PCX` → `volatile void*`       |
| `D`       | pointer to `const volatile` | `PDX` → `const volatile void*` |

Each extra level of indirection adds another `P` + qualifier prefix, so `PAPAD` decodes to `char**` (pointer to pointer to `char`). A trailing `A` at the very end of a variable symbol, as in `?seqFonts@@3PAUSEQFNT@@A`, is the storage-class code of the variable itself and does not add a pointer level. Class pointers such as `PAUMYCLASS@@` decode to `MYCLASS*`.

### Function pointer types (`P6`)

A particularly tricky pattern is `P6`, which introduces a **pointer to function**. The letter following `P6` denotes the calling convention (`A` = `__cdecl`), followed by a return type code and then a series of argument codes.
A non-empty argument list is terminated by `@`, an empty one is written as a single `X`, and the function type then closes with `Z`. For example:

```
?MP_Disconnect@@3P6AXJ@ZA → void (*MP_Disconnect)(long);
?MP_ReadAvail@@3P6AJJJ@ZA → long (*MP_ReadAvail)(long, long);
```

In the second symbol the first `J` after `P6A` is the return type, so only the remaining two `J` codes are arguments. The final `A` is the storage class of the global variable.

The script now identifies such patterns for global variables and decodes them into proper C++ function pointer declarations. Similarly, `P6` can appear in function arguments, allowing functions that accept callbacks to be demangled.

### Function pointers as return types

In rare cases a symbol may represent a function that **returns** a pointer to another function. The `P6` token then appears immediately after the `YA` in the mangled name to encode the return type itself. The demangler has been extended to recognise and decode these patterns. For example:

```
?_query_new_handler@@YAP6AHI@ZXZ
```

decodes to:

```cpp
int (*_query_new_handler())(unsigned int);
```

The return type `P6AHI@Z` means "pointer to a `__cdecl` function returning `int` (`H`) and taking an `unsigned int` (`I`) argument". The trailing `X` afterwards denotes that `_query_new_handler` itself takes no parameters. In valid C++ the function name and its own parameter list sit inside the parentheses of the returned pointer type.

Another example with a parameter:

```
?_set_new_handler@@YAP6AHI@ZP6AHI@Z@Z
```

becomes:

```cpp
int (*_set_new_handler(int (*)(unsigned int)))(unsigned int);
```

The function returns the previous new-handler (a function pointer) and accepts a new handler of the same signature. Without special handling, symbols like these would remain partially mangled.

### New primitive code `K` (unsigned long)

The type map has been expanded with the `K` code, which represents `unsigned long`. Its pointer forms `PAK` (pointer to `unsigned long`) and `PBK` (pointer to `const unsigned long`) are also recognised. This addition allowed functions and variables using 32-bit unsigned integers to be demangled correctly.

### Heuristics for rare codes

A few symbols in FA.SMS include obscure codes like bare `P` or `Y` inside complex signatures.
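The script does not model these yet (inside a type, `Y` usually introduces array dimensions rather than a scalar type), so to avoid leaving `UNKNOWN(...)` markers it heuristically maps them to `void*` and `int` respectively. Every symbol then decodes to a syntactically valid C++ signature, but the affected types are placeholders and should be treated as guesses whenever the exact original type can't be recovered.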
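
Pulling the classification rules from the earlier sections together, the first pass of a "parse and classify" script could look like the sketch below. It is a minimal illustration, not the FA.SMS script itself: `KIND_CODES`, `SYMBOL`, and `classify` are hypothetical names, only the kind codes discussed in this document are mapped, and special names that start with `??` (constructors, operators) are skipped.

```python
import re

# Hypothetical sketch: classify a mangled symbol by the code after '@@'.
KIND_CODES = {
    "Y": "free function",
    "Q": "public member function",
    "S": "public static member function",
    "0": "private static data member",
    "1": "protected static data member",
    "2": "public static data member",
    "3": "global variable",
}

# '?' + name components separated by '@' + '@@' + one kind character.
SYMBOL = re.compile(r"\?(?P<scope>\w+(?:@\w+)*)@@(?P<kind>.)")


def classify(symbol):
    """Return (qualified_name, kind), or None if the symbol is not recognised."""
    m = SYMBOL.match(symbol)
    if m is None:
        return None
    # MSVC lists scopes innermost first: 'Update@Radar' means Radar::Update.
    qualified = "::".join(reversed(m["scope"].split("@")))
    return qualified, KIND_CODES.get(m["kind"], "unknown")


for sym in ("?DrawHUD@@YAXXZ",
            "?Update@Radar@@QAEHXZ",
            "?seqFonts@@3PAUSEQFNT@@A",
            "?altitude@Aircraft@@1MA"):
    print(sym, "->", classify(sym))
```

Symbols reported as `unknown` are good candidates for manual review before any automatic renaming in Ghidra.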