Abstract
This blog post attempts to list the current state of things in the field of random UUID prediction. While util-linux/libuuid/gen_uuid.c always uses 16 full random new bytes from the OS entropy source (namely, /dev/u?random), the Windows couterpart has predictability issues.
- libuuid uses pure non-diluted entropy bits
- Windows 10 uses an AES (CTR?)-based entropy dilution method
- Windows XP uses an RC4-based entropy dilution method
We'll see the extents up to which someone could predict the UuidCreate outputs.
Context
So, Nikolay «denish» Denishchenko made an excellent review of the UuidCreate function entropy sources and predictability, which he published in 2008 on some russian .NET forum, gotdotnet.ru, which now redirects to a Microsoft MSDN Russian page : https://social.msdn.microsoft.com/Forums/ru-RU/home.
The last job (2008-2011) of our lad was to be Lead Developer of the « Astraea » project at Kaspersky (their main in-house scanning engine), so this guy is definitely not a randomer pressing F5 in Hex-Rays.
- https://www.linkedin.com/in/ndenish
- https://eugene.kaspersky.com/2012/11/15/finding-the-needle-in-the-haystack-introducing-astraea/
- https://eugene.kaspersky.com/2012/06/20/fighting-false-positives/
The guy even has some patents registered:
- Detection and minimization of false positives in anti-malware processing - United States 7 640 589
Final words about the author (whose online traces seem to vanish in 2012), from a Linkedin « Recommendation » :
I would like to recommend Nikolay as researcher, team leader and software architect. On the Astraea project he did a splendid job in each of these roles. It was a pleasure for me to work with Nikolay in one team. He is a very creative person with strong technical background, excellent analytical and troubleshooting skills. He designed the real-time expert system “Astraea” from scratch and driven its development into production phase. He chose and designed mathematical model for the system and proved its applicability to the subject matter in series of researches. As an architect he paid incredible attention to the details and left no place for gaps that seemed to be unavoidable in such a complex project.
Even though no credentials check was needed, the lad happens to be such a pointure that I couldn't resist mentioning it.
The blog post
Back to our topic, the blog post he published on gotdotnet has been archived on the web archive, and a Chromium & Google Translate copy can be found on this site.
Online links:
- https://web.archive.org/web/20120607034312/http://www.gotdotnet.ru/blogs/denish/1965
- https://web.archive.org/web/20120311182016/http://www.gotdotnet.ru:80/blogs/denish/
It also seems to have been published in « RSDN Magazine » :
And an online copie with pictures exists !
Metadata:
- Author: denish
- Date: 06/23/2008 10:28 PM
- Tags: Programming , Security , SQL , system development
Offline copies:
- /data/Device and cryptanalysis of the UUID-generator in OS Windows _ GotDotNet.Ru.html
- /data/Устройство и криптоанализ UUID-генератора в ОС Windows _ GotDotNet.Ru.html
- /data/Device and cryptanalysis of a UUID generator in Windows.html
It's a good read, and here is a quick summary :
UuidCreate produces UUIDs. To do so, 16 random bytes are generated, and two masks are applied :
1 2 |
|
To generate those random bytes, a « round-robin » of 8 RC4 keystream generators is used. Each generator is initialized to a 256 bytes random key, and is reset to a new random state every 500 000 bytes of output. The 256 random bytes are passed to the RC4 KSA, which shuffles them a little bit. Since it's already 256 bytes of random bytes it does not (? :D) change anything.
UUID generation involves picking one more random byte from each generator modulo 8, which means every generator has produced two random bytes. As such, 250 000 UUIDs can be generated before the keys are changed.
The 8 keys themselves are filled with the following function from ADVAPI32.DLL, which has the following stacktrace :
advapi32.dll!_SystemFunction036@8 ()
rsaenh.dll!_FIPS186Gen@28 () + 0x5f bytes
rsaenh.dll!__CPGenRandom@12 () + 0x33 bytes
advapi32.dll!_CryptGenRandom@12 () + 0x3d bytes
Confirming the UuidCreate internals layout
Before doing anything else, we need to know if our analysis is still valid for the current version of Microsoft Windows. Let's fire radare2 on our Windows dual-boot.
1 |
|
So, our UuidCreate is still here, and it calls some weird remote function, then applies the two (0x3f,0x80) and (0x0fff,0x4000) AND & OR operations to set the versions and variants :
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
|
On no, this seems to be an entry point in some .NET runtime hell. I tried to dig a little bit further, but it's not simple.
It seems that the Microsoft runtime changed significantly in 2014:
- https://blogs.msdn.microsoft.com/vcblog/2014/06/10/the-great-c-runtime-crt-refactoring/
- https://blogs.msdn.microsoft.com/vcblog/2015/03/03/introducing-the-universal-crt/
A few links later :
- https://ofekshilon.com/2016/03/27/on-api-ms-win-xxxxx-dll-and-other-dependency-walker-glitches/
- https://blog.quarkslab.com/runtime-dll-name-resolution-apisetschema-part-i.html
- https://docs.microsoft.com/en-us/windows/desktop/apiindex/windows-apisets
- The UuidCreateNil (= UUIDv0) function MSDN link : https://docs.microsoft.com/en-us/windows/desktop/api/rpcdce/nf-rpcdce-uuidcreatenil
- The UuidCreate (= UUIDv4) function MSDN link : https://docs.microsoft.com/en-us/windows/desktop/api/rpcdce/nf-rpcdce-uuidcreate
- The UuidCreateSequential (= UUIDv1) function MSDN link : https://docs.microsoft.com/en-us/windows/desktop/api/rpcdce/nf-rpcdce-uuidcreatesequential
- WTH (: https://support.microsoft.com/en-us/help/981080/you-receive-a-warning-when-you-use-the-uuidgen-exe-command-or-the-uuid
- The UuidHash function MSDN link : https://docs.microsoft.com/en-us/windows/desktop/api/rpcdce/nf-rpcdce-uuidhash (fun fact, it's a 16-bit xor sum of a given UUID)
, it turns out we'll have to debug a live windows to know which DLL is used, as this approach is a finite-time approach to getting the actual DLL name. (When compared to the other path: browsing random websites looking for an answer to a vague question)
Experimenting on Windows 10
To debug a live process using a specific Win32 API function, there is no better tool than the Python ctypes library. The following script isn't the one used, but a slightly more complex one focusing on the offset between the actual RFC4122 UUIDv1 time and the produced ones (using the 100ns increment trick).
The goal of this section is to find out if the RC4 environment is still in use today (in Window 10).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
|
As you might notice, proper byte reordering is needed to marshal the UUIDs out of the Win32 API. The '4' column isn't in the right place, while the [89ab] column is. Hence the above « I2H8C » reordering struct.
PS C:\WINDOWS\system32> py -3 -c "import uuid,ctypes;x=bytes(16);print('\n'.join([[ctypes.windll.rpcrt4.UuidCreate(x),str(uuid.UUID(bytes=x))][1] for i in range(10)]))"
2ca895c6-6f7d-ac49-917a-e3f6b02b5b4d
0371e22f-f2ab-6144-9f50-4efe1571590b
6743f16a-26e4-7b48-ac92-59cfa0ada7bc
ab6f9a10-f5d2-a446-95d0-ee45e31fa345
f609c983-6f8d-7245-b9ef-08328fd6d191
ac5b03cb-d70c-0344-ae75-b5a1ca6213b2
bbd84648-a17d-754a-b835-a6ba0cbdd0de
cfb08dd7-fdd7-8a4a-800b-294cdf93f962
c2d67a86-ed54-814f-a776-e1beb82fde28
0d1bd2f8-a21a-b74f-9afa-45636cbe1fd9
Before digging further, and since debugging processes randomly isn't very efficient, I found a bit of information online :
- https://msdn.microsoft.com/en-us/library/bb417a2c-7a58-404f-84dd-6b494ecf0d13#id11
- [10] Section 2.5.5.2: Windows uses the cryptographic PRNG from the Cryptographic API (CAPI) and the Cryptographic API Next Generation (CNG) for generation of Version 4 GUIDs.
-
[11] Section 2.5.5.2: For reasons of increased privacy protection for our customers, Microsoft systems beginning with Windows 2000 prefer to generate version 4 GUIDs in which the 122 bits of nonformat information are random. Although only a small minority of version 4 GUIDs require cryptographic randomness, the random bits for all version 4 GUIDs built in Windows are obtained via the Windows CryptGenRandom cryptographic API or the equivalent, the same source that is used for generation of cryptographic keys. This source is FIPS 140-certified in various versions of Windows, as documented at [MSFIP140CryptCerts]
-
GUID 2.5.5.2 Internal Structure : https://msdn.microsoft.com/en-us/library/58d8c8ed-3cff-4a43-855a-351b4325c077#endNote11
- http://paginaswebpublicidad.com/questions/40101/uuidcreate-co-su-dung-csprng-khong
- http://landcareweb.com/questions/40208/uuidcreateshi-fou-shi-yong-csprng
- http://formacdownload.com/questions/43973/czy-uuidcreate-uzywa-csprng
- https://devask.in/questions/48875929/rtlgenrandomcryptgenrandom-or-other-winapi-to-generate-cryptographically-secure-random-numbers-first-quarter-of-2018 )
Here are : the actual source post, the answer, and a related question as well.
- https://stackoverflow.com/questions/35366368/does-uuidcreate-use-a-csprng
- https://stackoverflow.com/questions/35366368/does-uuidcreate-use-a-csprng/35384818#35384818
- https://stackoverflow.com/questions/3652944/how-securely-unguessable-are-guids
A few boring screenshots and a stack trace later, I reproduced what the other lad found: there does not seem to be RC4 anymore, but some bcryptdll AES RNG used. While digging in the FIPS requirements met by Microsoft's implementation, I recall having read AES CTR. This needs to be investigated a bit further.
And here's a sortof stack trace reproducing the results of another lad (Will Dean) that has as well investigated a little bit the UuidCreate generator. (https://stackoverflow.com/users/987/will-dean)
Address To From Siz Comment Party
0000001EB4FEEDB8 00007FFCA3BE04AA 00007FFCA3BF4780 B0 bcryptprimitives.SymCryptWipeAsm System
0000001EB4FEEE68 00007FFCA3BE0296 00007FFCA3BE04AA B0 bcryptprimitives.AesRNGState_generate+EA System
0000001EB4FEEF18 00007FFCA5C4A764 00007FFCA3BE0296 30 bcryptprimitives.ProcessPrng+126 System
0000001EB4FEEF48 0000000076500933 00007FFCA5C4A764 50 rpcrt4.00007FFCA5C4A764 User
0000001EB4FEEF98 00000000764FF1E3 0000000076500933 80 _ctypes.0000000076500933 User
0000001EB4FEF018 00000000764FAB9B 00000000764FF1E3 150 _ctypes.00000000764FF1E3 User
0000001EB4FEF168 00000000764FB42D 00000000764FAB9B 100 _ctypes.00000000764FAB9B User
0000001EB4FEF268 00000000764F69DC 00000000764FB42D B0 _ctypes.00000000764FB42D User
0000001EB4FEF318 000000007655A84F 00000000764F69DC E0 _ctypes.00000000764F69DC User
0000001EB4FEF3F8 0000000076538979 000000007655A84F 190 python36.000000007655A84F User
0000001EB4FEF588 000000007655A6CE 0000000076538979 E0 python36.0000000076538979 User
0000001EB4FEF668 0000000076538979 000000007655A6CE 190 python36.000000007655A6CE User
0000001EB4FEF7F8 000000007655B35B 0000000076538979 B0 python36.0000000076538979 User
0000001EB4FEF8A8 0000000076566143 000000007655B35B 90 python36.000000007655B35B User
0000001EB4FEF938 00000000765660A1 0000000076566143 70 python36.0000000076566143 User
0000001EB4FEF9A8 000000007656604B 00000000765660A1 40 python36.00000000765660A1 User
0000001EB4FEF9E8 00000000766BA661 000000007656604B 70 python36.000000007656604B User
0000001EB4FEFA58 00000000766BAE01 00000000766BA661 70 python36.00000000766BA661 User
0000001EB4FEFAC8 00000000766BA53F 00000000766BAE01 30 python36.00000000766BAE01 User
0000001EB4FEFAF8 000000007660B268 00000000766BA53F 40 python36.00000000766BA53F User
0000001EB4FEFB38 00000000765AA044 000000007660B268 100 python36.000000007660B268 User
0000001EB4FEFC38 000000001C4C126D 00000000765AA044 40 python36.00000000765AA044 User
0000001EB4FEFC78 00007FFCA69C1FE4 000000001C4C126D 30 python.000000001C4C126D System
0000001EB4FEFCA8 00007FFCA702CB31 00007FFCA69C1FE4 50 kernel32.00007FFCA69C1FE4 System
0000001EB4FEFCF8 0000000000000000 00007FFCA702CB31 ntdll.00007FFCA702CB31 User
This presumably means a similar configuration to the previous state :
- An actual «secure» RNG is used to seed a high-throughput RNG
- The high-throughput is now AES-based instead of RC4-based
Consequence: just the same risk, would someone steal the seed of the state once, then he or she would be able to predict N UUIDs, given the following limits :
- After N UUIDs generated, the seed is renewed
- There might be thread-wise reorderings
Open questions :
- How is this AES RNG output used ?
- What are the reseeding conditions ?
- How does the AES RNG gets its seed ?
Experimenting on Windows XP, take 1
So, the latest version of .NET (since UuidCreate is exposed by the .NET runtime, right ?) on Windows 10 are not using RC4 any more. What about installing Windows XP and testing this in this ancient OS ?
(This section is fairly short, but you can still figure yourself someone struggling with VirtualBox and TLS errors on Windows XP for a while. With Benny Hill playing in the background)
Statically analyzing the Windows XP rpcrt4.dll
After a set of failed attempts at live debugging a Windows XP (these fresh XP vms have a significant amount of configuration debt, so to speak. I tried MinGW gdb as a server to a remote IDA on wine64, a local x64dbg, tried the dbgsvr (svrdbg?) shipped with the Windows SDK something, and gave up.), I finally accessed the C:/windows/system32/rpcrt4.dll file thanks to a python.exe -m http.server 8000 and the VirtualBox bridge mode. (Not to mention « wmic nicconfig list brief » and « ipconfig /renew », there's so much things I'm skipping right now).
So. Please pardon these digressions. Here we are, looking at the IDA views of rpcrt4.dll (MD5).
|Name Segment Start Length Locals Arguments W h a t e v e r
cookiecheck .text 77E5602D 00000019 00000000 00000000 R . . . . . .
actual_uuidcreate .text 77E5604B 0000006D 00000118 00000008 R . . . B T .
somthing_table_bitmask .text 77E560BD 0000002B 00000004 0000000C R . . . B T .
precaller_crypto_maybe .text 77E560ED 0000003F 00000010 00000010 R . . . B T .
actual_crypto_call .text 77E5612C 00000123 00000010 0000000C R . . . . T .
*UuidCreate .text 77E56254 00000036 00000008 00000004 R . . . B T .
+DllEntryPoint .text 77E5628F 00000015 00000004 0000000C R . . . B T .
sub_77E562A4 .text 77E562A4 000001D8 0000010C 00000010 R . . . B . .
sub_77E56481 .text 77E56481 00000064 0000000C 00000004 R . . . B . .
sub_77E564F4 .text 77E564F4 00000029 00000014 00000000 R . . . B . .
sub_77E56556 .text 77E56556 0000003F 00000008 00000004 R . . . B T .
sub_77E5659A .text 77E5659A 00000017 00000004 00000004 R . . . B . .
sub_77E56651 .text 77E56651 00000035 00000008 00000004 R . . . B . .
As an experienced reverse engineer might know, in a binary the following rule of thumb can be applied : the lower the address, the juicier the function. Here's a small graph of the calls between the children of UuidCreate :
Down a few function calls, here we are, looking at something that looks like it finishes with a RC4 PRGA :
I won't bother my dear reader anymore with the steps I had to take while doing this static reverse engineering effort, but here are the key results :
- There does not seem to be a call to an external random source
- Windows XP indeed uses RC4, but maybe not in the way described by Nikolay
Experimenting on Windows XP, take 2
We needed to confirm whether or not rpcrt4.dll!UuidCreate calls an actual random function. Well, here we go :
A few « F7 » after our breakpoint on UuidCreate we stumble upon the infamous SystemFunction036, cited by Nikolay.
Thanks for the answer, live debugging ! We won't go further, for this would mean digging in the native Windows entropy generator, which is for another story. (That's a funny one, for sure!)
https://sci-hub.tw/10.1145/1315245.1315304
Counting our entropy bits
There seem to be a thread safety interlocking failure in the RC4 code, but the main finding of this analysis is that even though the very first entropy sources might be correct ones, the algorithm reuses the same 8 * 256 bytes = 16 KiB of initial key material to produce 500000 * 8 bytes = 31250 KiB of random bytes.
Which means roughly that each source entropy bit is « diluted » (derived) in 1953 bits of random output, in a place where Microsoft could just have called CryptGenRandom every time if security was the primary need. Turns out, UUID generation has specific performance needs, which can explain resorting on RC4.
Each produced UUID has 4 + 2 bits wiped out to write down the version and variant, as explained above, which means 122 bits of entropy output at each produced UUID.
So, here are the entropy counts :
- 16 Kib of initial random key material, applying the RC4 KSA is considered to preserve entropy;
- 31250 Kib of produced keystream;
- 122 bits per produced UUID. (Yes, this does not take into account the internal layout of 8 parallel PRNG. We have 8 bits (=100%) of info for most of them, but only 4 and 6 for two of them (=50% & 75%), but let's ignore this for a moment)
Without diving in the details of CryptGenRandom (while being an interesting trip in forbidden & cursed Redmond crypts, it is out of the scope of our article for valid reasons, as any informed reader would already have deduced.), we can reduce our current problem (predicting UuidCreate output) to the following :
- Recovering the internal state of RC4 using less than 500000 bytes of known PRGA output. (2^22 bits)
Current RC4 weaknesses
RC4 has been known for a while as an insecure stream cipher for several reasons:
- The KSA is poor, the key is nearly used as-is
- The first bytes of PRGA are poorly random as well
- The PRGA stream can be recognized as such
Would RC4 be the identity function, we would need 16 Kib (2048 bytes) of entropy to recover the internal state of our 8 PRNG, which means collecting 128 successive UUIDs.
Let's say we collected 128 consecutive UUIDs, we then know 256 bytes of PRGA output for our 8 PRNG (two of them being known at 50% and 75%).
Open questions :
- Can we recover the internal state ?