Database creation

In this article:

  • Why this database was created
  • What are the underlying technologies
  • Some early strategies and stories
  • Information on the collection scripts
  • Some « fun facts » about broken things

Why this database was created

Initially, I did some DFIR work at my previous employer. When doing forensics, especially on Microsoft Windows workstations, you end up scrolling through log listings, file listings, registry listings, permission listings, etc.

While some text-based artifacts can be read and understood (« InstallerFunction », « PointlessMediaMenu »), the most unnerving thing was to stumble upon a UUID, paste it into Google, and find out that it is a standard UUID used in every Windows OS since MS-DOS to do something pointless.

At other times, the UUID in question is the only handle you have to trace back the attack path, because it is a CLSID mapped through the Microsoft Windows registry to a DLL. This allows an attacker to reference « 45640654065 » instead of « remote-code-execution.exe », which is far from human-readable.

I needed a global UUID database several times while doing forensics, and started drafting one in my hotel room in Dublin, during the week that ended with our team earning a SANS forensics coin (great training btw, thanks, $employer).

Fast forward a few months: it's December, the new year starts in a few days, no one is home, and I finally have an occasion to reopen my long list of abandoned projects. Time to build this global UUID database.

How this database was created

I created this database on 2018-01-01, back when it did not hold many UUIDs. It's a standard Django application whose main objects are UUIDs, indexed by their value. There are « Comments », mapped to UUIDs, and that's pretty much all (I recently added « Labels », but they are not mature yet).

The whole thing had little to no CSS, a simple API using the « Referer » header as an additional mandatory parameter, and a « full list » special page (which can no longer exist given the database size).

Having collected a few UUIDs, I could not resist publishing it on 2018-01-01, such a nice date:

I started with approximately 8000 UUIDs, which were mostly extracted from the Microsoft Windows CLSIDs of my host.

Gathering more UUIDs

I searched online for funny keywords (« error », « strange », « program », « erroneous », etc.) with « UUID » or « GUID » appended, to find interesting new UUIDs as well as new UUID families (« UUID namespaces »).

Also, whenever I stumbled upon a new UUID, I would look it up on Google and find a list of related UUIDs. At the beginning, I mostly found MSDN listings. Fun fact: they are OCR'd, and sometimes contain '?' or 'L' instead of the proper characters, so you cannot just copy-paste the data.
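Such OCR damage is easy to flag mechanically before ingestion. Here is a minimal Python sketch (Python because the database itself is a Django application); the `check_uuid_text` helper is hypothetical, and it only reports characters that cannot appear in the canonical 8-4-4-4-12 hexadecimal form, without guessing which digit was intended.

```python
import re
from typing import List

# Canonical textual UUID: 8-4-4-4-12 hexadecimal digits, any case.
CANONICAL = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.IGNORECASE
)
HEX_DIGITS = set("0123456789abcdef")


def check_uuid_text(candidate: str) -> List[str]:
    """Return a list of problems; an empty list means the string is a well-formed UUID."""
    text = candidate.strip().strip("{}")  # tolerate registry-style {...} braces
    if CANONICAL.match(text):
        return []
    problems = []
    for position, char in enumerate(text):
        # OCR typically yields '?' or 'L' where a hex digit should be.
        if char != "-" and char.lower() not in HEX_DIGITS:
            problems.append(f"position {position}: unexpected {char!r}")
    if len(text) != 36:
        problems.append(f"length {len(text)} instead of 36")
    if not problems:
        # Only hex digits and hyphens, right length, but wrong grouping.
        problems.append("hyphens are misplaced")
    return problems
```

For instance, `check_uuid_text("0000000?-0000-0000-C000-000000000046")` returns `["position 7: unexpected '?'"]`, so the entry can be fixed by hand against another source.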

Microsoft's own search wasn't that helpful:

Sometimes I would also stumble upon GitHub listings, which were way more useful. They usually cited a source (here: mingw, itself referencing the Windows SDK), which could then be fetched directly.

  • https://github.com/EddieRingle/portaudio/blob/master/src/hostapi/wasapi/mingw-include/propkey.h
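Turning such a header into a flat list can be automated. Below is a Python sketch assuming the usual `DEFINE_GUID(name, l, w1, w2, b1, ..., b8)` layout of Windows SDK and mingw headers; `DEFINE_PROPERTYKEY` takes the same fields plus a trailing property ID, which this sketch ignores. `parse_guid_macros` is a hypothetical helper, and fields that are not plain `0x...` literals are not handled.

```python
import re
from typing import Iterator, Tuple

# Symbol name, then exactly 11 hex fields: Data1, Data2, Data3 and the 8 bytes of Data4.
# Anything after the 11th field (e.g. DEFINE_PROPERTYKEY's pid) is left unmatched.
MACRO = re.compile(
    r"DEFINE_(?:GUID|PROPERTYKEY)\s*\(\s*(\w+)\s*,"
    r"((?:\s*0x[0-9A-Fa-f]+\s*,){10}\s*0x[0-9A-Fa-f]+)"
)


def parse_guid_macros(source: str) -> Iterator[Tuple[str, str]]:
    """Yield (symbol name, canonical lowercase UUID) pairs found in C header text."""
    for match in MACRO.finditer(source):
        # int(..., 16) accepts the "0x" prefix and surrounding whitespace/newlines.
        data1, data2, data3, *data4 = (int(field, 16) for field in match.group(2).split(","))
        tail = "".join(f"{byte:02x}" for byte in data4[2:])
        yield match.group(1), f"{data1:08x}-{data2:04x}-{data3:04x}-{data4[0]:02x}{data4[1]:02x}-{tail}"
```

Running it over a header such as the propkey.h linked above should yield one (name, UUID) pair per macro, ready to be appended to a flat list; macros split over several lines are handled because `\s` also matches newlines.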

I eventually ended up with a folder full of flat lists and parsers. (I removed the parsers from the listing below, but you'll get the idea.)

$ find . -maxdepth 1 -type f -perm 644 |sort | xargs wc -l
   1778 ./alltclsid.txt
     13 ./android_effects
    100 ./apple-uid
    111 ./baselinemgt
     82 ./bh-win-04-seki-up2
    111 ./biosbits
     78 ./bluetooth-logs
     19 ./boot_bcd
    366 ./bt
     62 ./btresponses
    419 ./canonical_names
    153 ./clspush
     30 ./control_panel
      7 ./dce-sec-acl-manager
     24 ./dcom
     44 ./dcomcnfg
  47499 ./dmde
     37 ./dsdt.alaska.ami.intl
     13 ./dsdt.dsl.1
  33973 ./dsdt.dsl.1.old
     19 ./dsdt.dsl.uuid
     19 ./dsdt.unknown
   2125 ./EDK2_2015_GUIDs-2017-04-27.csv
     35 ./efivar-guids
    126 ./efivar-protocol-guids
     77 ./extended-rights-reference
     52 ./fdisk
     20 ./flowerpower
     65 ./folder_type_identifiers
     21 ./gppref
    104 ./gpt-guid
    155 ./impacket-dcerpc-v5-epm
    148 ./impacket-dcerpc-v5-epm-knownuuid
   1141 ./ISMMC.html
    203 ./linux-uuid
      7 ./.linux-uuid.py.swp
      8 ./.linux-uuid.swp
     61 ./ms_audit
     24 ./ms-dcom-assign
     36 ./msgpp
   3813 ./msi-guids-windows.html
     15 ./mstrust
     70 ./mstscax
     42 ./ms-vds-assign
     54 ./nmap_nse_msrpc
     23 ./pset2
     17 ./psetid
     48 ./rdp_h
   2645 ./reactos
     21 ./reactos-acuuid
    795 ./schema_nt4.txt
    794 ./shellbags_tln.pl
    928 ./shellbags_xp.pl
     13 ./sony_smartband
     40 ./updateGroupGUIDs
    152 ./vc-redist
    971 ./vc-redist-packages-and-related-registry-entries
     11 ./ves-sony-w64
     17 ./windows-azure-permissions
   4572 ./wine
     31 ./yaho
 104437 total

This was (and still is) useful for static lists found in weird places, but definitely not suited to a large-scale collection effort. I had to automate.

Automating the gathering

Because gathering lists manually is slow, painful and error-prone, I automated what I could, starting with the enumeration of CLSIDs in the Microsoft Windows registry:

I wrote a first collection script, which simply enumerated a lot of weird places in Microsoft Windows and produced a CSV.

Then, I implemented the upload feature, allowing for a fully automated operation mode.

So far, no one on the Internet has run it to contribute to the database, except:

  • xer, when directly asked to in PM
  • probably some Polish CERT, since the installed software is what you would expect on a Polish detonation VM

There is far more than one class of UUIDs in the Microsoft Windows namespaces. The DCOM namespace (related to RPC) is full of them as well, so I had to look in different places and write the appropriate code:

Here is a version of the script, gathering the following UUIDs on a workstation:

  • MSI
  • CLSID (disabled)
  • ActiveX
  • COM applications
  • WMI (disabled)

Here is the code:

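# MSI products listed under Installer\UserData: the product code comes from each UninstallString (msiexec.exe /I{...} or /X{...})
function collect-msi() {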
    Get-ChildItem HKLM:SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\UserData\*\Products | ForEach-Object {
     $root  = $_.PsPath
     $_.GetSubKeyNames() | ForEach-Object {
         try {
             $RegKeyPath = (Join-Path -Path (Join-Path -Path $root -ChildPath $_) -ChildPath InstallProperties)
             $obj = Get-ItemProperty -Path $RegKeyPath -ErrorAction Stop
             if ($obj.UninstallString) {
                 [PSCustomObject]@{
                    uuid = ($obj.UninstallString -replace "msiexec\.exe\s/[IX]{1}","")
                     name = $obj.DisplayName ;
                     comment = @(
                     ("Publisher : {0}" -f $obj.Publisher),
                     ("Version : {0}" -f $obj.DisplayVersion) ,
                     ("RegKeyPath : {0}" -f $RegKeyPath)
                     ) -join "`n";

                 }
             }
         } catch {
         }
     }
}}


# Active Setup components (HKLM\...\Active Setup\Installed Components), one subkey per GUID
function collect-activex() {
Get-ChildItem "HKLM:SOFTWARE\Microsoft\Active Setup\Installed Components\" | ForEach-Object {
     $root  = $_.PsPath.split("\\")[-1];
     $a=$_;
     $name=$a.GetValue("");if ($name -eq $null) {$name="none"}; $name=$name.Trim();
     $compid=$a.GetValue("ComponentID");if ($compid -eq $null) {$compid="none"};$compid=$compid.Trim();
     $name = ( "compid:{0}_name:{1}" -f ($compid,$name))
     [PSCustomObject]@{
                     uuid = $root;
                     name = $name ;
                     comment = @(
                        ("ComponentID : {0}" -f $a.GetValue("ComponentID")),
                        ("Version : {0}" -f $a.GetValue("Version") )
                     ) -join "`n";

 }
}}

# WMI classes carrying a UUID qualifier, in each namespace directly under root
function collect-wmi () {
Get-WmiObject -Namespace Root -Class __Namespace | ForEach-Object {
    $ns=$_
    Get-WmiObject -Namespace ("root/{0}" -f $ns.Name) -List | ForEach-Object {
        $class=$_;
        $_.Qualifiers | Where-Object Name -eq "UUID" | % {
            $uuid=$_.Value;
            $comment = "{0}:{1}`n" -f ($ns.Name,$class.Name);
            foreach ($property in $class.Qualifiers) {$comment += "Qualifier : {0} : {1}`n" -f ($property.Name,($property.Value -join(", ")))}
            $comment = "{0}`nProperties:" -f ($comment.Trim(", "));
            foreach ($property in $class.Properties) {$comment += "{0}, " -f ($property.Name)}
            $comment = "{0}`nMethods:" -f ($comment.Trim(", "));
            foreach ($property in $class.Methods) {$comment += "{0}, " -f ($property.Name)}
            [pscustomobject]@{uuid=$uuid;name=("wmi:{0}:{1}" -f ($ns.Name,$class.Name));comment=$comment.Trim(", ")}
         }
    }
}
}

# COM classes registered under HKLM\SOFTWARE\Classes\CLSID, plus the AppID they point to
function collect-clsid() {
Get-ChildItem "HKLM:\SOFTWARE\Classes\CLSID" | % {
 $uuid=(($_.PSPath -split("\\"))[-1]);
 $name=$_.GetValue("")
 # Todo : add the InProcServer32 dll and other properties
 $comment="CLSID,`n";
 foreach ($key in $_.GetValueNames()) {
    if ($key -ne "") {
        $comment += ("{0}: {1}`n" -f ($key,$_.GetValue($key)))
    }
 }
 if ($name.Length -eq 0) {
    $name="noname"
 }
 $childs = Get-ChildItem $_.PSPath
 foreach ($subvalue in $childs) {
    $proname= ($subvalue.Name -split("\\"))[-1];
    foreach ($subvalueproperty in $subvalue.GetValueNames()) {
        $comment += ("{0}.{1}: {2}`n" -f ($proname,$subvalueproperty,$subvalue.GetValue($subvalueproperty)));
    }
    if ($proname -like "*Server32") {
        $name = ("{0}_{1}" -f ($name,($subvalue.GetValue("") -split "\\")[-1]));
    }
 }
 [pscustomobject]@{uuid=$uuid; name=$name; comment=$comment.Trim()}
 if ($_.GetValue("AppID") -eq $null) {} else {
    if ($uuid -notlike $_.GetValue("AppID")) {
        [pscustomobject]@{uuid=$_.GetValue("AppID"); name=("AppID_{0}" -f $name); comment=$comment.Trim()}
    }
 }

}
}


# COM applications exposed through WMI (Win32_COMApplication), keyed by their AppID
function collect-comapplications {
    Get-CimInstance Win32_COMApplication | % {
        $comapp=$_;
        [pscustomobject]@{
            uuid=$comapp.AppID;
            name=("DCOMApp:{0}" -f $comapp.Name);
            comment=@(
                ("Description: {0}" -f $comapp.Description),
                ("Caption: {0}" -f $comapp.Caption),
                ("ToString(): {0}" -f $comapp.ToString())
            ) -join"`n"
        }
    }
}

function get-context () {
    "{0} v{1}" -f (
        (Get-WmiObject -class Win32_OperatingSystem).Caption,
        ([System.Environment]::OSVersion.Version)
)}

function filter-gooduuid {
    # Keep records whose "uuid" field contains a well-formed UUID; salvage
    # truncated ones (last group too short) by right-padding them with zeroes.
    Process {
        $x = $_
        $full = [regex]::Match($x.uuid, '[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}')
        if ($full.Success) {
            [pscustomobject]@{source=$x.source; uuid=$full.Value; name=$x.name; comment=$x.comment; author=$x.author}
            return  # done with this pipeline item
        }
        $short = [regex]::Match($x.uuid, '[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{2,}')
        if ($short.Success) {
            $uuid = $short.Value
            # Number of padded bits (4 per missing hex digit), recorded in the name.
            $padlen = 4 * (36 - $uuid.Length)
            [pscustomobject]@{
                source  = $x.source
                uuid    = $uuid.PadRight(36, "0")
                name    = ("0PAD{0}_{1}" -f ($padlen, $x.name))
                comment = $x.comment
                author  = $x.author
            }
        }
    }
}

function extract-uuids () {
    $context=("c.ps1v001 {0} {1}" -f ((get-context),(Get-Date -UFormat "%Y-%m-%d")));
    foreach ($x in @(
      "collect-msi",
      "collect-activex",
      "collect-comapplications"
      #"collect-wmi",
      #"collect-clsid"
)) {
        &$x | % {
            [pscustomobject]@{source=$x;uuid=$_.uuid;name=$_.name;comment=$_.comment;author=$context}
        } | filter-gooduuid
    }
}

function push($record) {
    $ProgressPreference = "SilentlyContinue"
    # secret debug line :DD
    # $record ; return ;
    $url=("https://uuid.pirate-server.com/{0}" -f $record.uuid)
    echo ("{0} {1}" -f ($record.uuid,$record.name));
    $response=Invoke-WebRequest -Uri $url -SessionVariable session
    # Blog post note: there is no more CSRF check on this API page, so only one call is enough.
    $csrf=((($response.ToString() -split "csrfmiddlewaretoken' value='")[-1]) -split "'")[0];
    $params=@{
        csrfmiddlewaretoken=$csrf;
        title=$record.name;
        details=$record.comment;
        author=$record.author;
    };
    # Post the comment form; the trailing backtick continues the command on the next line.
    $response=Invoke-WebRequest -Uri "https://uuid.pirate-server.com/comment" -Method Post `
        -Body $params -WebSession $session -Headers @{Referer=$url;};
}

function upload-uuids {
    Process {
        $record=$_;
        push($record);
    }
}

function main () {
    extract-uuids | upload-uuids
}
main
# $a=push([pscustomobject]@{uuid="deafbeef-deaf-beef-deaf-beefbeefbeef";name="ignore this junk id-pspushtest";author="posh";comment="helo"})

Unfortunately, it requires at least PowerShell 3.0, whereas older Microsoft Windows hosts such as Windows 7 ship with PowerShell 2.0 by default. Still, it was useful to gather a lot of UUIDs.

All in all, I gathered 10000 UUIDs!
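The upload half of the script does not depend on Windows at all, so records collected elsewhere could be pushed from any machine. Here is a Python standard-library sketch that only *builds* the request `push` sends; `build_comment_request` is a hypothetical name, the form fields (`title`, `details`, `author`) and the Referer header are copied from the script above, and the CSRF token is left out on the assumption, stated in the script's own comment, that the API page no longer checks it. Judging from `push`, the server seems to take the target UUID from the Referer rather than from the form body.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://uuid.pirate-server.com"


def build_comment_request(record: dict) -> urllib.request.Request:
    """Build, without sending, the POST that the PowerShell push function issues."""
    page_url = f"{BASE_URL}/{record['uuid']}"
    # urlencode percent-encodes everything, so the result is plain ASCII.
    form = urllib.parse.urlencode(
        {"title": record["name"], "details": record["comment"], "author": record["author"]}
    ).encode("ascii")
    return urllib.request.Request(
        f"{BASE_URL}/comment",
        data=form,
        # The Referer points at the UUID page the comment belongs to, as in push().
        headers={"Referer": page_url},
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(request)`, which also adds the form-urlencoded Content-Type; whether the live server still accepts this exact form is not something this sketch can guarantee.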

Ingesting specification documents

Many UUIDs are defined in specification documents:

  • The Bluetooth specification (which, for the record, does not mention Harald Bluetooth at all)
  • The ACPI & EFI specifications
  • Many Microsoft Windows Server Protocols specifications
  • The RFCs

While extracting UUIDs from the text-based documents proved simple, the PDF-only ones were trickier. Fortunately, poppler's pdftotext was there to convert PDF to text. Extracting from the Windows_Server_Protocols.zip file was more convenient:

In the end, all those specification UUIDs were ingested in one of two ways:

  • For small or simple lists: a big copy-paste into a TSV file, then an upload with my TSV uploader
  • For complex ones (XML with a lot of attributes, broken PDF tables): pdftotext and other custom scripts. Regular expressions for the win :)
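The regular-expression part can be as small as the Python sketch below. It assumes the PDF has already been converted with `pdftotext`, and that the only layout damage is a UUID wrapped onto the next line right after one of its hyphens, something to expect from narrow table cells; other breakages, such as a line break in the middle of a hex group, are not handled. `extract_uuids` is a hypothetical helper; it lowercases and de-duplicates results while keeping the order of first appearance.

```python
import re
from typing import List

HEX = "[0-9a-fA-F]"
# A full 8-4-4-4-12 UUID that is not part of a longer run of hex digits.
UUID_RE = re.compile(rf"(?<!{HEX}){HEX}{{8}}-{HEX}{{4}}-{HEX}{{4}}-{HEX}{{4}}-{HEX}{{12}}(?!{HEX})")
# A hyphen ending a line between hex digits: "...1234-\n   5678..." gets re-joined.
WRAPPED_HYPHEN = re.compile(rf"(?<={HEX})-[ \t]*\r?\n\s*(?={HEX})")


def extract_uuids(text: str) -> List[str]:
    """Return the distinct UUIDs found in text, lowercased, in order of first appearance."""
    joined = WRAPPED_HYPHEN.sub("-", text)
    found = {}  # dicts keep insertion order, so this de-duplicates in order
    for match in UUID_RE.finditer(joined):
        found.setdefault(match.group(0).lower(), None)
    return list(found)
```

The output of `pdftotext -layout spec.pdf -` (file name hypothetical) can be fed straight into it, and the resulting list pasted into a TSV for the uploader.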

Diving into the different types of UUIDs

There are different versions and variants of UUIDs, as well as some vendor-defined custom flavors. To make them stand out, I added colors (and used Django templates as well): UUIDv1 are green, so that one can instantly rejoice whenever one shows up, and the rest have funny colors as well.
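Python's standard `uuid` module already exposes both fields, so the coloring logic can be sketched as below. Only « UUIDv1 are green » comes from this post; the other colors and the `classify` name are made up for illustration. The version number is only meaningful for the RFC 4122 variant, which is why `uuid.UUID.version` returns `None` for the others.

```python
import uuid
from typing import Tuple

# Hypothetical palette: only "version 1 is green" reflects the actual site.
VERSION_COLORS = {1: "green", 2: "olive", 3: "blue", 4: "grey", 5: "purple"}


def classify(text: str) -> Tuple[str, str]:
    """Return a (label, color) pair for a textual UUID (braces are accepted)."""
    value = uuid.UUID(text)
    if value.variant != uuid.RFC_4122:
        # NCS, Microsoft and "future" variants carry no RFC 4122 version number.
        return value.variant, "orange"
    return f"v{value.version}", VERSION_COLORS.get(value.version, "red")
```

Many old COM interface IDs, such as `{00000000-0000-0000-C000-000000000046}`, fall into the Microsoft-reserved variant and would get the fallback color.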

Also, the timestamps follow the specification only some of the time. To produce information out of them, I had to:

Also, Adobe makes a funny use of the available UUIDs, but I haven't implemented anything to make use of this (yet).
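For UUIDv1, the timestamp is a 60-bit count of 100-nanosecond intervals since 1582-10-15 00:00:00 UTC, split across the first three groups (RFC 4122, section 4.1.4). Here is a Python sketch of the decoding with a hypothetical `uuid1_details` helper; it returns a naive `datetime` meant as UTC and does not try to detect the non-conforming timestamps mentioned above. For the `w64:RIFF` UUID in the listing further down, it gives 1996-04-08 11:04:28.

```python
import uuid
from datetime import datetime, timedelta

# UUIDv1 epoch: the Gregorian reform date (naive datetime, interpreted as UTC).
GREGORIAN_EPOCH = datetime(1582, 10, 15)


def uuid1_details(text: str) -> dict:
    """Decode the creation time, clock sequence and node of a version 1 UUID."""
    value = uuid.UUID(text)
    if value.variant != uuid.RFC_4122 or value.version != 1:
        raise ValueError(f"{text} is not an RFC 4122 version 1 UUID")
    # value.time reassembles time_hi, time_mid and time_low into the 60-bit count
    # of 100 ns intervals; // 10 converts it to microseconds.
    created = GREGORIAN_EPOCH + timedelta(microseconds=value.time // 10)
    return {"created": created, "clock_seq": value.clock_seq, "node": f"{value.node:012x}"}
```

The node field is where a MAC address may show up; whether it is a real one is another matter.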

Broken UUIDs

Since UUIDs are commonly used and stored in their text representation rather than in their raw 16-byte form, non-UUID strings can end up where a UUID is expected. For example, Microsoft uses such malformed strings in some WMI locations.

Since WMI is an interoperability layer with non-trivial interfaces, I guess the missing digits were zeroes that got cut out by some standard C string handling along the way. Since one random byte in 256 is zero (and one hex digit in 16), this could be expected.

Information on 455ce053-2552-4051-a3e4-c4200dc31b70:

0PAD8_wmi:CIMV2:Win32_VolumeChangeEvent

CIMV2:Win32_VolumeChangeEvent
Qualifier : abstract : True
Qualifier : Locale : 1033
Qualifier : UUID : 455CE053-2552-4051-A3E4-C4200DC31B7  <- missing bits
...................455CE053-2552-4051-A3E4-C4200DC31B70 <- padded to have 6 'node' bytes
...................                        123456789012 <- padded to highlight the nibble count

Properties:DriveName, EventType, SECURITY_DESCRIPTOR, TIME_CREATED

See more here:

I arbitrarily padded them with zeroes, so that they could be recorded as UUIDs, but I honestly don't know if it's the correct choice.
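The zero-padding itself is simple; the Python sketch below mirrors what the `PadRight(36, "0")` branch of `filter-gooduuid` does. `pad_truncated_uuid` is a hypothetical name, it only repairs a too-short last group (the case seen in WMI), it accepts a single remaining digit where the PowerShell regex requires two, and the returned count is in hex digits (nibbles), not bits.

```python
import re
from typing import Optional, Tuple

# First four groups intact, last ("node") group holding 1 to 12 hex digits.
NODE_TRUNCATED = re.compile(
    r"^([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-)([0-9a-fA-F]{1,12})$"
)


def pad_truncated_uuid(text: str) -> Optional[Tuple[str, int]]:
    """Right-pad a short node group with zeroes; return (uuid, nibbles added) or None."""
    match = NODE_TRUNCATED.match(text.strip())
    if match is None:
        return None
    prefix, node = match.groups()
    return prefix + node.ljust(12, "0"), 12 - len(node)
```

Keeping the number of added nibbles next to the padded value, as the « 0PAD » name prefix does, makes it possible to undo the guess later.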

Enough about the UUID families I found; here are the numbers!

Statistics on UUIDs

Since UUIDv1 embed a timestamp, it is possible to extract the day of the week and the hour of the day. As gnuplot is one of my tools of choice, I plotted all the UUIDv1 I had in a table:

  • X-axis: 7 days
  • Y-axis: 24 hours

I guess the week-end isn't at the end of the 7-day axis because the days are counted from 1582-10-15, which was a Friday, not a Monday. There's clearly a pattern, which means the timestamps are not fully random, which means my database yields more than 0 bits of entropy :)
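The per-weekday, per-hour counts behind such a plot can be computed as below. This Python sketch is an assumption about the method, not the actual script: it counts weekdays Monday-first from the Unix epoch (so the weekend lands in the last two columns), skips anything that is not an RFC 4122 version 1 UUID, and prints 24 rows of 7 columns, a layout gnuplot can read as a matrix (for instance with `plot "grid.dat" matrix with image`, file name hypothetical).

```python
import uuid

# 100 ns intervals between 1582-10-15 (UUIDv1 epoch) and 1970-01-01 (Unix epoch).
UUID_TO_UNIX_100NS = 0x01B21DD213814000


def weekday_hour_grid(uuids):
    """Count version 1 UUIDs per (weekday, hour of day, UTC); weekday 0 is Monday."""
    grid = [[0] * 24 for _ in range(7)]
    for text in uuids:
        value = uuid.UUID(text)
        if value.variant != uuid.RFC_4122 or value.version != 1:
            continue  # only UUIDv1 carry a timestamp
        seconds = (value.time - UUID_TO_UNIX_100NS) // 10_000_000
        days, day_seconds = divmod(seconds, 86400)  # floor division also works before 1970
        weekday = (days + 3) % 7  # 1970-01-01 was a Thursday, i.e. index 3 when Monday is 0
        grid[weekday][day_seconds // 3600] += 1
    return grid


def to_gnuplot_matrix(grid) -> str:
    """One row per hour, one column per weekday."""
    return "\n".join(" ".join(str(grid[day][hour]) for day in range(7)) for hour in range(24))
```

Starting the week on Monday is purely a presentation choice; shifting the `+ 3` offset moves the weekend anywhere on the axis.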

Many curious facts can be extracted from timestamps; expect more in the future. For now, draw your own conclusions from the listing below:

  • https://uuid.pirate-server.com/search?q=w64%3A

  • Results for "w64:"

    UUID                                  Timestamp            Name
    66666972-912e-11cf-a5d6-28db04c10000  1996-04-08 11:04:28  w64:RIFF
    7473696c-912f-11cf-a5d6-28db04c10000  1996-04-08 11:12:02  w64:LIST
    abf76256-392d-11d2-86c7-00c04f8edb8a  1998-08-21 19:32:26  w64:MARKER
    925f94bc-525a-11d2-86dc-00c04f8edb8a  1998-09-22 20:26:50  w64:SUMMARYLIST
    20746d66-acf3-11d3-8cd1-00c04f8edb8a  1999-12-07 22:10:34  w64:FMT
    61746164-acf3-11d3-8cd1-00c04f8edb8a  1999-12-07 22:12:23  w64:DATA
    65766177-acf3-11d3-8cd1-00c04f8edb8a  1999-12-07 22:12:30  w64:WAVE
    6b6e756a-acf3-11d3-8cd1-00c04f8edb8a  1999-12-07 22:12:40  w64:JUNK
    6c76656c-acf3-11d3-8cd1-00c04f8edb8a  1999-12-07 22:12:42  w64:LEVL
    74636166-acf3-11d3-8cd1-00c04f8edb8a  1999-12-07 22:12:55  w64:FACT
    74786562-acf3-11d3-8cd1-00c04f8edb8a  1999-12-07 22:12:55  w64:BEXT

Here is another graph I made:

New UUID namespaces

My quest for new APIs, namespaces and specifications took me to new places, and made me discover:

  • That some publications have an « LSID », a unique publication identifier. It is not what I was initially looking for (a UUID for each living species), but it is enough, since I can now have a lizard in my database:

I also discovered funny attack surfaces (yes, my main job is security-related), but this is for another article ;)

Digging in old sources

Some old UUIDs can only be found in legacy places, such as the Microsoft Windows SDK or old Microsoft Windows builds, and only in binary form: for example, the legendary « UUID.LIB » file, which isn't that simple to parse.
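A classic catch with binary sources is byte order: a Windows `GUID` structure stores `Data1`, `Data2` and `Data3` little-endian, while the 8 bytes of `Data4` keep their order. Python's `uuid.UUID(bytes_le=...)` handles exactly that layout. The sketch below assumes you already know where the 16-byte structures sit; locating them inside a `.LIB` archive requires parsing its COFF members and symbol tables, which is not shown. `guid_from_struct` is a hypothetical helper.

```python
import uuid


def guid_from_struct(blob: bytes, offset: int = 0) -> uuid.UUID:
    """Decode the 16-byte Windows GUID structure found at `offset` in `blob`."""
    chunk = blob[offset:offset + 16]
    if len(chunk) != 16:
        raise ValueError("a GUID structure needs 16 bytes")
    # Data1 (4 bytes), Data2 and Data3 (2 bytes each) are little-endian;
    # Data4 (8 bytes) is stored as-is. bytes_le= applies exactly this layout.
    return uuid.UUID(bytes_le=chunk)
```

Reading the same 16 bytes with `uuid.UUID(bytes=...)` instead gives a plausible-looking but wrong value (first three groups byte-swapped), which is an easy way to pollute a database.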

Also, a great place to visit in the Microsoft Windows SDK is the banned APIs header file:

Current status

Now that I have ingested most of the UUIDs I could find, I catch myself annoying random people on the Internet and on Twitter, searching every once in a while for « UUID » and « GUID ».

This led to a few discoveries, including the sphere of UUID enthusiasts on Twitter. More on that later ;)

The current status of the database is :

  • I found some malware-shipping websites (download-dll.exe) that list MSI uninstallation UUIDs; each of these websites is being dumped with custom scripts. « requests_html » is a great Python module.

  • I plan to add more sections to my PowerShell acquisition script, and to make it compatible with PowerShell 2.0 as well. I should also run it on the Microsoft-provided virtual machines (modern.ie).

  • I'm preparing a set of blog posts, the first of which you've just finished reading :)

Thanks for reading, and have a nice day!