UuidCreate predictability

Abstract

This blog post attempts to list the current state of things in the field of random UUID prediction. While util-linux/libuuid/gen_uuid.c always uses 16 full random new bytes from the OS entropy source (namely, /dev/u?random), the Windows couterpart has predictability issues.

  • libuuid uses pure non-diluted entropy bits
  • Windows 10 uses an AES (CTR?)-based entropy dilution method
  • Windows XP uses an RC4-based entropy dilution method

We'll see the extents up to which someone could predict the UuidCreate outputs.

Context

So, Nikolay «denish» Denishchenko made an excellent review of the UuidCreate function entropy sources and predictability, which he published in 2008 on some russian .NET forum, gotdotnet.ru, which now redirects to a Microsoft MSDN Russian page : https://social.msdn.microsoft.com/Forums/ru-RU/home.

The last job (2008-2011) of our lad was to be Lead Developer of the « Astraea » project at Kaspersky (their main in-house scanning engine), so this guy is definitely not a randomer pressing F5 in Hex-Rays.

The guy even has some patents registered:

  • Detection and minimization of false positives in anti-malware processing - United States 7 640 589

Final words about the author (whose online traces seem to vanish in 2012), from a Linkedin « Recommendation » :

I would like to recommend Nikolay as researcher, team leader and software architect. On the Astraea project he did a splendid job in each of these roles. It was a pleasure for me to work with Nikolay in one team. He is a very creative person with strong technical background, excellent analytical and troubleshooting skills. He designed the real-time expert system “Astraea” from scratch and driven its development into production phase. He chose and designed mathematical model for the system and proved its applicability to the subject matter in series of researches. As an architect he paid incredible attention to the details and left no place for gaps that seemed to be unavoidable in such a complex project.

Even though no credentials check was needed, the lad happens to be such a pointure that I couldn't resist mentioning it.

The blog post

Back to our topic, the blog post he published on gotdotnet has been archived on the web archive, and a Chromium & Google Translate copy can be found on this site.

Online links:

It also seems to have been published in « RSDN Magazine » :

And an online copie with pictures exists !

Metadata:

  • Author: denish
  • Date: 06/23/2008 10:28 PM
  • Tags: Programming , Security , SQL , system development

Offline copies:

It's a good read, and here is a quick summary :

UuidCreate produces UUIDs. To do so, 16 random bytes are generated, and two masks are applied :

1
2
u[7] = ( u[7] & 0x0f ) | 0x40 # Version : 4
u[9] = ( u[9] & 0x3f ) | 0x80 # Variant : RFC4122

To generate those random bytes, a « round-robin » of 8 RC4 keystream generators is used. Each generator is initialized to a 256 bytes random key, and is reset to a new random state every 500 000 bytes of output. The 256 random bytes are passed to the RC4 KSA, which shuffles them a little bit. Since it's already 256 bytes of random bytes it does not (? :D) change anything.

UUID generation involves picking one more random byte from each generator modulo 8, which means every generator has produced two random bytes. As such, 250 000 UUIDs can be generated before the keys are changed.

UuidGenerator2

The 8 keys themselves are filled with the following function from ADVAPI32.DLL, which has the following stacktrace :

advapi32.dll!_SystemFunction036@8 ()
rsaenh.dll!_FIPS186Gen@28 ()        + 0x5f bytes
rsaenh.dll!__CPGenRandom@12 ()      + 0x33 bytes
advapi32.dll!_CryptGenRandom@12 ()  + 0x3d bytes

Confirming the UuidCreate internals layout

Before doing anything else, we need to know if our analysis is still valid for the current version of Microsoft Windows. Let's fire radare2 on our Windows dual-boot.

1
./Cutter-v1.7.2-x86_64.Linux.AppImage /media/windows/Windows/System32/rpcrt4.dll

So, our UuidCreate is still here, and it calls some weird remote function, then applies the two (0x3f,0x80) and (0x0fff,0x4000) AND & OR operations to set the versions and variants :

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
/ (fcn) sym.RPCRT4.dll_UuidCreate 65
|   sym.RPCRT4.dll_UuidCreate (int arg4);
|           ; arg int arg4 @ rcx
|           0x18000a750      push rbx
|           0x18000a752      sub rsp, 0x20
|           0x18000a756      mov edx, 0x10 ; 16
|           0x18000a75b      mov rbx, rcx ; arg4
|           0x18000a75e      call qword sym.imp.ext_ms_win_core_winrt_remote_l1_1_0.dll_RemoteWinRTActivation_2 ; sym.imp.ext_ms_win_core_winrt_remote_l1_1_0.dll_GetLocalVmAliasName_8 ; [0x1801171a8:8]=0x1800752e0
|           0x18000a764      test eax, eax
|       ,=< 0x18000a766      je 0x18000a78a
|       |   0x18000a768      and byte [rbx + 8], 0x3f
|       |   0x18000a76c      mov eax, 0xfff
|       |   0x18000a771      and word [rbx + 6], ax
|       |   0x18000a775      mov eax, 0x4000
|       |   0x18000a77a      or word [rbx + 6], ax
|       |   0x18000a77e      or byte [rbx + 8], 0x80
|       |   0x18000a782      xor eax, eax
|      .--> 0x18000a784      add rsp, 0x20
|      :|   0x18000a788      pop rbx
|      :|   0x18000a789      ret
|      :`-> 0x18000a78a      mov eax, 0xe ; 14
\      `==< 0x18000a78f      jmp 0x18000a784

On no, this seems to be an entry point in some .NET runtime hell. I tried to dig a little bit further, but it's not simple.

It seems that the Microsoft runtime changed significantly in 2014:

A few links later :

, it turns out we'll have to debug a live windows to know which DLL is used, as this approach is a finite-time approach to getting the actual DLL name. (When compared to the other path: browsing random websites looking for an answer to a vague question)

Experimenting on Windows 10

To debug a live process using a specific Win32 API function, there is no better tool than the Python ctypes library. The following script isn't the one used, but a slightly more complex one focusing on the offset between the actual RFC4122 UUIDv1 time and the produced ones (using the 100ns increment trick).

The goal of this section is to find out if the RC4 environment is still in use today (in Window 10).

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
#!/usr/bin/env python3

import struct
import uuid
import time
import ctypes

def reorder(x):
    return struct.pack('<I2H8c',*struct.unpack('>I2H8c',x))

def get4():
    x = bytes(16)
    ctypes.windll.rpcrt4.UuidCreate(x)
    return uuid.UUID(bytes=reorder(x))

def get1():
    x = bytes(16)
    ctypes.windll.rpcrt4.UuidCreateSequential(x)
    return uuid.UUID(bytes=reorder(x))

def get0():
    x = bytes(16)
    ctypes.windll.rpcrt4.UuidCreateNil(x)
    return uuid.UUID(bytes=reorder(x))

N=400
dataset = [
    [
        get1().time_low,
        uuid.uuid1().time_low,
        (int(time.time()*1e9/100)+0x01b21dd213814000) & 0xffffffff,
        time.sleep(0.001),
    ]
    for x in range(N)]

As you might notice, proper byte reordering is needed to marshal the UUIDs out of the Win32 API. The '4' column isn't in the right place, while the [89ab] column is. Hence the above « I2H8C » reordering struct.

PS C:\WINDOWS\system32> py -3 -c "import uuid,ctypes;x=bytes(16);print('\n'.join([[ctypes.windll.rpcrt4.UuidCreate(x),str(uuid.UUID(bytes=x))][1] for i in range(10)]))"
2ca895c6-6f7d-ac49-917a-e3f6b02b5b4d
0371e22f-f2ab-6144-9f50-4efe1571590b
6743f16a-26e4-7b48-ac92-59cfa0ada7bc
ab6f9a10-f5d2-a446-95d0-ee45e31fa345
f609c983-6f8d-7245-b9ef-08328fd6d191
ac5b03cb-d70c-0344-ae75-b5a1ca6213b2
bbd84648-a17d-754a-b835-a6ba0cbdd0de
cfb08dd7-fdd7-8a4a-800b-294cdf93f962
c2d67a86-ed54-814f-a776-e1beb82fde28
0d1bd2f8-a21a-b74f-9afa-45636cbe1fd9

Before digging further, and since debugging processes randomly isn't very efficient, I found a bit of information online :

  • https://msdn.microsoft.com/en-us/library/bb417a2c-7a58-404f-84dd-6b494ecf0d13#id11
  • [10] Section 2.5.5.2: Windows uses the cryptographic PRNG from the Cryptographic API (CAPI) and the Cryptographic API Next Generation (CNG) for generation of Version 4 GUIDs.
  • [11] Section 2.5.5.2: For reasons of increased privacy protection for our customers, Microsoft systems beginning with Windows 2000 prefer to generate version 4 GUIDs in which the 122 bits of nonformat information are random. Although only a small minority of version 4 GUIDs require cryptographic randomness, the random bits for all version 4 GUIDs built in Windows are obtained via the Windows CryptGenRandom cryptographic API or the equivalent, the same source that is used for generation of cryptographic keys. This source is FIPS 140-certified in various versions of Windows, as documented at [MSFIP140CryptCerts]

  • GUID 2.5.5.2 Internal Structure : https://msdn.microsoft.com/en-us/library/58d8c8ed-3cff-4a43-855a-351b4325c077#endNote11

(Fun fact, there seems to be a world of fake websites translating the Stack Overflow content in random languages :


Here are : the actual source post, the answer, and a related question as well.

A few boring screenshots and a stack trace later, I reproduced what the other lad found: there does not seem to be RC4 anymore, but some bcryptdll AES RNG used. While digging in the FIPS requirements met by Microsoft's implementation, I recall having read AES CTR. This needs to be investigated a bit further.

Debugger Debugger Debugger

And here's a sortof stack trace reproducing the results of another lad (Will Dean) that has as well investigated a little bit the UuidCreate generator. (https://stackoverflow.com/users/987/will-dean)

Address          To               From             Siz Comment                                  Party 
0000001EB4FEEDB8 00007FFCA3BE04AA 00007FFCA3BF4780 B0  bcryptprimitives.SymCryptWipeAsm         System
0000001EB4FEEE68 00007FFCA3BE0296 00007FFCA3BE04AA B0  bcryptprimitives.AesRNGState_generate+EA System
0000001EB4FEEF18 00007FFCA5C4A764 00007FFCA3BE0296 30  bcryptprimitives.ProcessPrng+126         System
0000001EB4FEEF48 0000000076500933 00007FFCA5C4A764 50  rpcrt4.00007FFCA5C4A764                  User
0000001EB4FEEF98 00000000764FF1E3 0000000076500933 80  _ctypes.0000000076500933                 User
0000001EB4FEF018 00000000764FAB9B 00000000764FF1E3 150 _ctypes.00000000764FF1E3                 User
0000001EB4FEF168 00000000764FB42D 00000000764FAB9B 100 _ctypes.00000000764FAB9B                 User
0000001EB4FEF268 00000000764F69DC 00000000764FB42D B0  _ctypes.00000000764FB42D                 User
0000001EB4FEF318 000000007655A84F 00000000764F69DC E0  _ctypes.00000000764F69DC                 User
0000001EB4FEF3F8 0000000076538979 000000007655A84F 190 python36.000000007655A84F                User
0000001EB4FEF588 000000007655A6CE 0000000076538979 E0  python36.0000000076538979                User
0000001EB4FEF668 0000000076538979 000000007655A6CE 190 python36.000000007655A6CE                User
0000001EB4FEF7F8 000000007655B35B 0000000076538979 B0  python36.0000000076538979                User
0000001EB4FEF8A8 0000000076566143 000000007655B35B 90  python36.000000007655B35B                User
0000001EB4FEF938 00000000765660A1 0000000076566143 70  python36.0000000076566143                User
0000001EB4FEF9A8 000000007656604B 00000000765660A1 40  python36.00000000765660A1                User
0000001EB4FEF9E8 00000000766BA661 000000007656604B 70  python36.000000007656604B                User
0000001EB4FEFA58 00000000766BAE01 00000000766BA661 70  python36.00000000766BA661                User
0000001EB4FEFAC8 00000000766BA53F 00000000766BAE01 30  python36.00000000766BAE01                User
0000001EB4FEFAF8 000000007660B268 00000000766BA53F 40  python36.00000000766BA53F                User
0000001EB4FEFB38 00000000765AA044 000000007660B268 100 python36.000000007660B268                User
0000001EB4FEFC38 000000001C4C126D 00000000765AA044 40  python36.00000000765AA044                User
0000001EB4FEFC78 00007FFCA69C1FE4 000000001C4C126D 30  python.000000001C4C126D                  System
0000001EB4FEFCA8 00007FFCA702CB31 00007FFCA69C1FE4 50  kernel32.00007FFCA69C1FE4                System
0000001EB4FEFCF8 0000000000000000 00007FFCA702CB31     ntdll.00007FFCA702CB31                   User

This presumably means a similar configuration to the previous state :

  • An actual «secure» RNG is used to seed a high-throughput RNG
  • The high-throughput is now AES-based instead of RC4-based

Consequence: just the same risk, would someone steal the seed of the state once, then he or she would be able to predict N UUIDs, given the following limits :

  • After N UUIDs generated, the seed is renewed
  • There might be thread-wise reorderings

Open questions :

  • How is this AES RNG output used ?
  • What are the reseeding conditions ?
  • How does the AES RNG gets its seed ?

Experimenting on Windows XP, take 1

So, the latest version of .NET (since UuidCreate is exposed by the .NET runtime, right ?) on Windows 10 are not using RC4 any more. What about installing Windows XP and testing this in this ancient OS ?

(This section is fairly short, but you can still figure yourself someone struggling with VirtualBox and TLS errors on Windows XP for a while. With Benny Hill playing in the background)

Statically analyzing the Windows XP rpcrt4.dll

After a set of failed attempts at live debugging a Windows XP (these fresh XP vms have a significant amount of configuration debt, so to speak. I tried MinGW gdb as a server to a remote IDA on wine64, a local x64dbg, tried the dbgsvr (svrdbg?) shipped with the Windows SDK something, and gave up.), I finally accessed the C:/windows/system32/rpcrt4.dll file thanks to a python.exe -m http.server 8000 and the VirtualBox bridge mode. (Not to mention « wmic nicconfig list brief » and « ipconfig /renew », there's so much things I'm skipping right now).

python server listing

So. Please pardon these digressions. Here we are, looking at the IDA views of rpcrt4.dll (MD5).

|Name                    Segment Start    Length    Locals    Arguments W h a t e v e r
 cookiecheck             .text  77E5602D  00000019  00000000  00000000  R  .  .  .  .  .  .
 actual_uuidcreate       .text  77E5604B  0000006D  00000118  00000008  R  .  .  .  B  T  .
 somthing_table_bitmask  .text  77E560BD  0000002B  00000004  0000000C  R  .  .  .  B  T  .
 precaller_crypto_maybe  .text  77E560ED  0000003F  00000010  00000010  R  .  .  .  B  T  .
 actual_crypto_call      .text  77E5612C  00000123  00000010  0000000C  R  .  .  .  .  T  .
*UuidCreate              .text  77E56254  00000036  00000008  00000004  R  .  .  .  B  T  .
+DllEntryPoint           .text  77E5628F  00000015  00000004  0000000C  R  .  .  .  B  T  .
 sub_77E562A4            .text  77E562A4  000001D8  0000010C  00000010  R  .  .  .  B  .  .
 sub_77E56481            .text  77E56481  00000064  0000000C  00000004  R  .  .  .  B  .  .
 sub_77E564F4            .text  77E564F4  00000029  00000014  00000000  R  .  .  .  B  .  .
 sub_77E56556            .text  77E56556  0000003F  00000008  00000004  R  .  .  .  B  T  .
 sub_77E5659A            .text  77E5659A  00000017  00000004  00000004  R  .  .  .  B  .  .
 sub_77E56651            .text  77E56651  00000035  00000008  00000004  R  .  .  .  B  .  .

As an experienced reverse engineer might know, in a binary the following rule of thumb can be applied : the lower the address, the juicier the function. Here's a small graph of the calls between the children of UuidCreate :

UuidCreate childrens

Down a few function calls, here we are, looking at something that looks like it finishes with a RC4 PRGA :

_BYTE *__stdcall actual_crypto_call(int generator, unsigned int len_16, _BYTE *output_buffer) { [ .. snip .. ] _BYTE *result; // eax int rc4_tmp_A; // eax char rc4_tmp_C; // bl _BYTE *v32; // [esp+1Ch] [ebp+Ch] output = output_buffer; rc4_i = *(unsigned __int8 *)(generator + 256); len_16_1 = len_16; rc4_tmp_B = *(unsigned __int8 *)(generator + 257); if ( len_16 ) { if ( len_16 >> 2 && !((unsigned __int8)output_buffer & 3) ) { output_1 = output_buffer; v32 = &output_buffer[4 * (len_16 >> 2)]; v8 = (int *)(output_1 - 4); v9 = len_16 >> 2; do { v10 = v8[1]; Ok, is this : v11 = (unsigned __int8)(rc4_i + 1); A/ The RC4 KSA (hint: there's no XOR int the RC4 KSA) v12 = *(_BYTE *)(generator + v11); B / A call to an external Random() function (hint: there's no «call» nor «jump») v13 = (unsigned __int8)(v12 + rc4_tmp_B); C / Some in-house weird thing (hint: it's weird, right ?) v14 = *(_BYTE *)(generator + v13); D / Where my asm-reading skills meet their end (hint: yes) *(_BYTE *)(generator + v11) = v14; ? *(_BYTE *)(generator + v13) = v12; v15 = *(unsigned __int8 *)(generator + (unsigned __int8)(v14 + v12)) ^ v10; v16 = (unsigned __int8)(v11 + 1); v17 = *(_BYTE *)(generator + v16); v18 = (unsigned __int8)(v17 + v13); Well, it might be a loop-less version of the v19 = *(_BYTE *)(generator + v18); RC4 PRGA, used for alignment reasons, maybe. *(_BYTE *)(generator + v16) = v19; *(_BYTE *)(generator + v18) = v17; v20 = (*(unsigned __int8 *)(generator + (unsigned __int8)(v19 + v17)) << 8) ^ v15; v21 = (unsigned __int8)(v16 + 1); v22 = *(_BYTE *)(generator + v21); v23 = (unsigned __int8)(v22 + v18); v24 = *(_BYTE *)(generator + v23); *(_BYTE *)(generator + v21) = v24; *(_BYTE *)(generator + v23) = v22; v25 = (*(unsigned __int8 *)(generator + (unsigned __int8)(v24 + v22)) << 16) ^ v20; rc4_i = (unsigned __int8)(v21 + 1); v26 = *(_BYTE *)(generator + rc4_i); rc4_tmp_B = (unsigned __int8)(v26 + v23); v27 = *(_BYTE *)(generator + rc4_tmp_B); *(_BYTE *)(generator + rc4_i) = v27; *(_BYTE *)(generator + rc4_tmp_B) = v26; v28 = (*(unsigned __int8 *)(generator + (unsigned __int8)(v27 + v26)) << 24) ^ v25; v8 = (int *)&v32[-4 * v9]; *v8 = v28; --v9; } while ( v9 ); len_16_1 = len_16 & 3; if ( !(len_16 & 3) ) goto LABEL_7; output = v8 + 1; } do { But there's no doubt, here is the RC4 PRGA ! :) rc4_i = (rc4_i + 1) & 0xFF; rc4_tmp_A = *(unsigned __int8 *)(generator + rc4_i); rc4_tmp_B = (rc4_tmp_A + rc4_tmp_B) & 0xFF; rc4_tmp_C = *(_BYTE *)(generator + rc4_tmp_B); *(_BYTE *)(generator + rc4_i) = rc4_tmp_C; *(_BYTE *)(generator + rc4_tmp_B) = rc4_tmp_A; *output++ ^= *(_BYTE *)(generator + (unsigned __int8)(rc4_tmp_C + rc4_tmp_A)); --len_16_1; } while ( len_16_1 ); } LABEL_7: result = (_BYTE *)(generator + 256); *result = rc4_i; result[1] = rc4_tmp_B; return result; }

I won't bother my dear reader anymore with the steps I had to take while doing this static reverse engineering effort, but here are the key results :

  • There does not seem to be a call to an external random source
  • Windows XP indeed uses RC4, but maybe not in the way described by Nikolay

Experimenting on Windows XP, take 2

We needed to confirm whether or not rpcrt4.dll!UuidCreate calls an actual random function. Well, here we go :

Debugger

A few « F7 » after our breakpoint on UuidCreate we stumble upon the infamous SystemFunction036, cited by Nikolay.

Debugger

Thanks for the answer, live debugging ! We won't go further, for this would mean digging in the native Windows entropy generator, which is for another story. (That's a funny one, for sure!)

https://sci-hub.tw/10.1145/1315245.1315304

Debugger

Counting our entropy bits

There seem to be a thread safety interlocking failure in the RC4 code, but the main finding of this analysis is that even though the very first entropy sources might be correct ones, the algorithm reuses the same 8 * 256 bytes = 16 KiB of initial key material to produce 500000 * 8 bytes = 31250 KiB of random bytes.

Which means roughly that each source entropy bit is « diluted » (derived) in 1953 bits of random output, in a place where Microsoft could just have called CryptGenRandom every time if security was the primary need. Turns out, UUID generation has specific performance needs, which can explain resorting on RC4.

Entropy

Each produced UUID has 4 + 2 bits wiped out to write down the version and variant, as explained above, which means 122 bits of entropy output at each produced UUID.

So, here are the entropy counts :

  • 16 Kib of initial random key material, applying the RC4 KSA is considered to preserve entropy;
  • 31250 Kib of produced keystream;
  • 122 bits per produced UUID. (Yes, this does not take into account the internal layout of 8 parallel PRNG. We have 8 bits (=100%) of info for most of them, but only 4 and 6 for two of them (=50% & 75%), but let's ignore this for a moment)

Without diving in the details of CryptGenRandom (while being an interesting trip in forbidden & cursed Redmond crypts, it is out of the scope of our article for valid reasons, as any informed reader would already have deduced.), we can reduce our current problem (predicting UuidCreate output) to the following :

  • Recovering the internal state of RC4 using less than 500000 bytes of known PRGA output. (2^22 bits)

Current RC4 weaknesses

RC4 has been known for a while as an insecure stream cipher for several reasons:

  • The KSA is poor, the key is nearly used as-is
  • The first bytes of PRGA are poorly random as well
  • The PRGA stream can be recognized as such

Would RC4 be the identity function, we would need 16 Kib (2048 bytes) of entropy to recover the internal state of our 8 PRNG, which means collecting 128 successive UUIDs.

Let's say we collected 128 consecutive UUIDs, we then know 256 bytes of PRGA output for our 8 PRNG (two of them being known at 50% and 75%).

Open questions :

  • Can we recover the internal state ?