Robert Hensing's Blog

Software Security . . . and stuff.


Password vs. Passphrase redux


So today Jesper Johansson, a gentleman whom I have the pleasure of speaking with on occasion, posted his 2nd installment on the topic of passwords here.  I encourage you all to read it. In this installment he goes deep into the math and science behind passwords and pass-phrases and attempts to measure the strength of XX character passwords and XX character pass-phrases, using experience from his own pen-testing efforts and studies from reputable sources that he's managed to find and cite. (I'm not quite sure how he manages to do it: this man travels more and is busier than most people I know, yet he still finds time to do his homework and write these amazing articles.)

If you haven't read his article yet - here's a teaser:

If there are 5 characters per word, we have 25+4=29 characters, where 4 are spaces, in the pass phrase. How much entropy that pass phrase contains depends on whose estimates you use. Using Shannon’s estimates of 2.3 bits per letter in an 8-letter word nets a total entropy of 29*2.3=66.7 bits. The 66.7 bits calculation is probably a reasonable upper bound on the entropy of a pass phrase, and it compares favorably with a 9-character password with only 45 bits of entropy. For a lower bound, we can use Bruce Schneier’s estimate of 1.3 bits per letter, based on a study by Thomas Cover (B. Schneier, “Applied Cryptography, 2nd Edition,” Wiley, 1996). Shannon advanced 1.3 bits per letter for 16-letter words, though, so it is probably not entirely applicable to our 5-character words. In any case, using 1.3 as the entropy estimate computes to 29*1.3= 37.7, which is actually worse than the 9-character password. Based on that number, you would need a 6-word pass phrase to attain roughly the same entropy as a 9-character password.
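
The arithmetic in that teaser is easy to check for yourself. Here is a minimal sketch, assuming (as the quote does) 5 characters per word, a single space between words, and a flat bits-per-character estimate; the function names are my own and not from Jesper's article.

```python
# Per-character entropy estimates quoted in the teaser above.
LOW_BITS_PER_CHAR = 1.3
HIGH_BITS_PER_CHAR = 2.3
RANDOM_PASSWORD_BITS = 9 * 5  # 9 random characters at 5 bits each


def phrase_length(words, chars_per_word=5):
    """Characters in a phrase of equal-length words joined by single spaces."""
    return words * chars_per_word + (words - 1)


def phrase_entropy_bits(words, bits_per_char, chars_per_word=5):
    """Estimated entropy: total characters times a flat per-character rate."""
    return phrase_length(words, chars_per_word) * bits_per_char


def words_needed(target_bits, bits_per_char, chars_per_word=5):
    """Smallest number of words whose estimated entropy reaches target_bits."""
    words = 1
    while phrase_entropy_bits(words, bits_per_char, chars_per_word) < target_bits:
        words += 1
    return words


if __name__ == "__main__":
    print("characters in 5 words:", phrase_length(5))
    print("upper estimate (bits):", round(phrase_entropy_bits(5, HIGH_BITS_PER_CHAR), 1))
    print("lower estimate (bits):", round(phrase_entropy_bits(5, LOW_BITS_PER_CHAR), 1))
    print("words to match 45 bits at 1.3 bits/char:",
          words_needed(RANDOM_PASSWORD_BITS, LOW_BITS_PER_CHAR))
```

At the pessimistic 1.3 bits per character the loop stops at 6 words, which matches the figure in the quote.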

I'm really glad Jesper has taken the initiative to dive into the math and science behind our language and calculating entropy. This was in fact a popular point made in many of the replies to my blog: people wanted me to address it, but I knew Jesper was working on it so I of course deferred to him . . . Jesper makes the assumption of 5 characters per word and a space between each word - but what about punctuation?  I believe Jesper is assuming that a pass-phrase will be composed of just 5 random words strung together with spaces (i.e. the user has swapped characters for whole words as the 'token' in their password, so instead of using random letters / numbers / symbols you're using random words).

In my first blog on passwords, I was talking about pass-phrases (passwords composed of words) but I really wanted people to start using full sentences with proper punctuation and everything as their passwords.

It would be interesting to see some math behind using proper sentences as your pass-phrases (which is what I personally do as much as possible).  How many words are in the average proper sentence and would a user be more likely to remember a longer sentence over a shorter 'random word' pass-phrase composed of 5 random words?  I believe that a user is more likely to remember "I took my dog to the vet and he's got fleas!" vs. "cow moon cars women security" (wow, that should give you some great insight into how my brain works! <G>) and in addition to the former being longer (and thus containing more entropy) than the latter, it's easier to remember and thus more usable to boot!

Don't forget, sentences usually end with a punctuation mark (like a period, question mark or exclamation point), so that adds one extra character that must be accounted for when doing calculations on the entropy of a pass-phrase (and what if the sentence is a quote and the user surrounds the quote with "double quote" characters?).  Looking back at some of my recent pass-phrases that are sentences, I have a couple that are composed of way more than 5 words (e.g. "If we weren't all crazy, we would go insane!") but I also have some that are shorter than 5 . . .

Jesper does point out in his article that sentences may be weaker than pass-phrases composed of random words due to assumptions that can be made about word grouping and the English language (ever watch Wheel of Fortune?  If so you're probably good at this).  In addition he also points out that if the world moves to pass-phrases then cracking tools will merely adapt to this new paradigm by incorporating words instead of characters as the symbols to try in cracking . . .
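
To put a rough number on the "words as symbols" attack: if every word is picked uniformly at random from a fixed list, each word is worth log2(list size) bits no matter how many letters it has. A small sketch follows; the vocabulary sizes are assumptions chosen for illustration (300 as a small working vocabulary, 7776 as the size of the standard Diceware list), and the function names are mine.

```python
import math


def word_phrase_bits(num_words, vocabulary_size):
    """Entropy of num_words drawn uniformly at random from a word list."""
    return num_words * math.log2(vocabulary_size)


def random_password_bits(length, alphabet_size):
    """Entropy of a password of uniformly random characters."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    # 9 characters at 5 bits each corresponds to a 32-symbol alphabet.
    print(f"9 random chars, 32 symbols : {random_password_bits(9, 32):.1f} bits")
    for vocab in (300, 7776):  # assumed list sizes, for illustration only
        print(f"5 random words, {vocab:>4} words: {word_phrase_bits(5, vocab):.1f} bits")
```

This only covers words chosen at random. A grammatical sentence is more predictable than that, so the same word count buys fewer bits, which is exactly Jesper's caveat.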

I say bring it on . . . I know how to make pass-phrases based on sentences that are both extremely long AND easy to remember, and that make the computational power needed to crack them (using lookup tables) mind-boggling and unattainable.  "If we weren't all crazy, we would go insane!" contains 44 characters with spaces, is easy to remember and easy to type for a touch typist like myself, and it contains anywhere from 44*1.3=57.2 bits of entropy (on the low end) to 44*2.3=101.2 bits of entropy (perhaps on the mid to high end). Either way this compares very favorably to a completely random mess of 9 characters with 5 bits of entropy per character (45 bits total).
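
For the curious, here is a back-of-the-envelope sketch of those numbers and of what a lookup table would cost in storage. It assumes the table stores nothing but one 16-byte MD4 digest per candidate (a real table would also need the plaintexts or some chaining scheme), and it treats the per-character estimates as exact, which they are not. All names are my own.

```python
SENTENCE = "If we weren't all crazy, we would go insane!"

LOW_BITS_PER_CHAR = 1.3   # lower estimate from Jesper's article
HIGH_BITS_PER_CHAR = 2.3  # higher estimate from Jesper's article
MD4_DIGEST_BYTES = 16     # an MD4 digest is 128 bits


def entropy_bounds(phrase):
    """Return (low, high) entropy estimates in bits for a phrase."""
    n = len(phrase)
    return n * LOW_BITS_PER_CHAR, n * HIGH_BITS_PER_CHAR


def lookup_table_bytes(entropy_bits, bytes_per_entry=MD4_DIGEST_BYTES):
    """Bytes to store one digest per candidate (assumes nothing else is stored)."""
    return (2 ** entropy_bits) * bytes_per_entry


if __name__ == "__main__":
    low, high = entropy_bounds(SENTENCE)
    print(f"{len(SENTENCE)} characters: {low:.1f} to {high:.1f} bits")
    print(f"table for the sentence (low bound): {lookup_table_bytes(low):.2e} bytes")
    print(f"table for 9 random characters:      {lookup_table_bytes(45):.2e} bytes")
```

Even at the low bound the sentence needs a table thousands of times larger than the one for the 9-character password, since every extra bit doubles the table.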

Finally - there have been some posts to certain, ahem, "security lists" calling into question the strength of Windows password hashes (of *any* length) - no doubt the work of some of those 'chainsaw consultants' that I referred to in my original post, with all of the standard misconceptions about how Windows stores password hashes.  Let me set the record straight.  The NT password hash is a straight MD4 of whatever the user types in when they are asked for their password.  Microsoft did not invent the MD4 algorithm (I believe a Mr. Ron Rivest did, the 'R' in RSA) and I am not aware of any weaknesses in our implementation of the MD4 algorithm that would make it weaker than others.  The problems involving the LM hash do not apply in any way to the NT hash and the two should not be confused with each other in discussions about password hashes. They are two completely different hashes using two completely different algorithms.
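
If you want to see what "a straight MD4" means in practice, here is a minimal sketch. It assumes the password is encoded as UTF-16LE before hashing (that detail is not spelled out above), and it carries its own small MD4 written from RFC 1320, because many Python builds no longer expose MD4 through hashlib. The function names are mine and this is an illustration, not production code.

```python
import struct

MASK = 0xFFFFFFFF


def _rotl(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def md4(data: bytes) -> bytes:
    """Pure-Python MD4 following RFC 1320 (for illustration, not for speed)."""
    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    # Pad with 0x80, then zeros, then the 64-bit little-endian bit length.
    msg = data + b"\x80"
    msg += b"\x00" * ((56 - len(msg) % 64) % 64)
    msg += struct.pack("<Q", (len(data) * 8) & 0xFFFFFFFFFFFFFFFF)

    # Each round: (boolean function, additive constant, word order, shifts).
    rounds = (
        (lambda x, y, z: (x & y) | (~x & z), 0x00000000,
         tuple(range(16)), (3, 7, 11, 19)),
        (lambda x, y, z: (x & y) | (x & z) | (y & z), 0x5A827999,
         (0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15), (3, 5, 9, 13)),
        (lambda x, y, z: x ^ y ^ z, 0x6ED9EBA1,
         (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15), (3, 9, 11, 15)),
    )

    for offset in range(0, len(msg), 64):
        x = struct.unpack("<16I", msg[offset:offset + 64])
        a, b, c, d = state
        for func, const, order, shifts in rounds:
            for i, k in enumerate(order):
                t = (a + func(b, c, d) + x[k] + const) & MASK
                # Update one register, then rotate the roles of a, b, c, d.
                a, b, c, d = d, _rotl(t, shifts[i % 4]), b, c
        state = [(s + v) & MASK for s, v in zip(state, (a, b, c, d))]

    return struct.pack("<4I", *state)


def nt_hash(password: str) -> str:
    """NT hash as hex: MD4 over the UTF-16LE bytes of the password (assumed encoding)."""
    return md4(password.encode("utf-16-le")).hex()


if __name__ == "__main__":
    print(nt_hash("If we weren't all crazy, we would go insane!"))
```

Note that nothing in the computation depends on the user or the machine, so the same pass-phrase always produces the same hash.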

Regarding 'salt' - it is true that Windows does not use 'salt' when generating / storing the hashes on disk; we chose to take a different approach.  The password hashes that are stored on disk are encrypted with 128-bit encryption - to get at the password hash on disk from an offline system, you'd first have to crack the 128-bit key used to protect the hash (called the SYSKEY).  This symmetric key is by default stored in the registry, but you can take that key, make it a pass-phrase and store it in your head if you're really worried about physical offline attacks.

  • The problem is that many "memorable" phrases are likely to show up somewhere (on the web / in a book / in a forwarded email of quotable oneliners) and hence a "dictionary" can be built along these lines. Combine this with some simple word substitution (along the same lines as existing password crackers presumably do to substitute 3 for e, etc, because the people who write password crackers aren't stupid) and you can get a dictionary attack that, while *more* difficult than the equivalent attack on a password, isn't the orders-of-magnitude harder that you really need.

    I think that in order to be truly secure you need to combine multiple strategies: Combine phrases from multiple sources (or invent your own and never mention it anywhere in any form - witty phrases are out, because you usually want to share those), intentionally misspell and miscapitalize words, insert strange punctuation as well as normal punctuation, and throw in at least one word that's made up of a truly random sequence of characters. But if you do all these things, you've just made the passphrase even harder to remember than the password was.

  • His study compares passphrases with between 1.3 and 2.3 bits per character of entropy, and a completely random 9 character word with 5 bits/character entropy. He then goes on to assume that passphrases would be composed from a vocabulary of 300 words.

    Firstly, no-one ever uses completely random passwords, and if they do, they likely write them down somewhere, thus defeating the purpose.

    Secondly, even though people tend to have a small working vocabulary, that doesn't necessarily mean their passphrase will be drawn from a small subset of that vocabulary. They will also draw on places and names, and all kinds of vocabulary that they don't use in day-to-day conversation, not to mention intentional misspellings and the insertion of punctuation.

    Perhaps a good rule for a passphrase would be that the phrase must be at least N characters, at least one word must be 8 characters or more, and there must be at least one non-letter in the mix somewhere.

    A memorable 29 character passphrase, with one long word and one non-alphabetic character in it, will almost certainly be more secure than a memorable (i.e. not completely random) password of some kind (e.g. a mixed-case dictionary word with one non-alphabetic character).

    I'm with you on this.

  • Have been using passphrases for years. The combination of upper/lower case chars and punctuation can only be a good thing and they are more memorable! The comments about brute forcing passphrases using a lookup of song lyrics, etc., are interesting. During WWII the Germans would do a similar thing to crack SIS and SOE's poem codes, until Leo Marks instigated the use of one time letter pads - a virtually unbreakable cipher (as long as the pads were only used once and destroyed immediately). Nice to see that RSA now provide Secure-ID tokens for Windows (I've been using them for years for dialup authentication) which will provide the same sort of unbreakable authentication as One Time Pads.

    That's the ultimate security for a network and until I can convince people to pay for it I will be using passphrases as they are inherently more secure - and if I get hacked I might just lose my job - not my life like the agents though!

  • Christmas comes early this year; I was typing up some ideas on passphrases, but then saw some other items in this post and its replies that I wanted to address. (Maybe another reply, I'd rather not hog space.)

    On to the first item: the MD4 hash. There's a reason Rivest invented MD5 a year later (1991); attacks against MD4 had been theorized within months of its publication. Most recently a Chinese team determined a way to calculate collisions by hand (earlier attacks like Hans Dobbertin's work in '96 actually required some computer time). Fortunately, this doesn't extend to pre-image resistance; but weaknesses in the hash do make it possible to compute a full MD4 pre-image collision[0] in 2^40 operations. Thus, the theoretical upper bound on the security of the NT4 hash is 40 bits, regardless of the passphrase used. (This is, incidentally, why for over a decade MD4 has been considered "Broken, Do Not Use" within the cryptographic community.)

    The next part is about salt. I would say that requiring the passphrase on startup can be very impractical for two reasons: first, doing reboots after patching or upgrades requires someone with the key to be at the machine. Second, if the box gets '0wned' the (unsalted) hashes can be accessed by the attacker. If you're using Active Directory you don't need to worry about the SAM files on each desktop (and requiring a startup password for a server is much more reasonable). However, you'll want to make sure your desktops aren't caching the users' passwords. The SYSKEY idea would work as an alternative to salt, but only in limited cases with a knowledgeable admin. That is, it doesn't target the "average user" case that well. (I'll stop here, this is something that could easily turn into a ko-fight.)

    Finally, this link is for Jamie, since he mentioned Secure-ID tokens:

    Improved Cryptanalysis of SecurID
    (Short version: don't leave the token unattended, someone could derive the key from the "random" numbers.)

    [0] Hans Dobbertin: Cryptanalysis of MD4. J. Cryptology 11(4): 253-271 (1998)

  • Requiem, this was a very good reply - thanks for the information.

    As for your issue with salts you state that 'if the box gets '0wn3d' the unsalted hashes can be accessed by the attacker'.

    I presume you are talking about when the box is ON-line (i.e. the attacker gets a remote shell on your box and is then able to dump your hashes using something like pwdumpX.exe).

    This is correct - SYSKEY only protects the SAM when the box is OFF-line. While the box is ON-line, an administrator (or a remote attacker exploiting a vulnerability that elevates privileges) can access the un-encrypted password hashes.

    This is no different on other platforms (i.e. Linux) where the /etc/shadow file is accessible by root when the box is on-line.

    If you have admin access to the box, the game is already over - you 0wN the box and the dumping of password hashes is the least of your worries - after all you could just install a keystroke logger and record the admin's pass-phrase vs. having to try and crack hashes - which do you think an attacker would prefer to do? :)

    My point about SYSKEY was that it's good for protecting the SAM against offline attacks. I travel a lot with my notebook and I use SYSKEY with the key stored in my head, so that if my notebook ever gets stolen I'm not too terribly worried about someone cracking my password after dumping the hash out of the SAM, because it's encrypted and the symmetric key used to encrypt the SAM file on my notebook is itself derived from a pass-phrase. :)

  • We use diceware-style passphrases here, with a custom (and much larger) wordlist.

    However, it has become apparent to me in the last few years that passwords are not even close to being the weak link in the security of a Windows network.

    None of our users have weak passwords, none run as local administrators, we use SUS to distribute patches, and we have managed AV software. But we still get a machine now and then that is infected with adware or spyware.

    Why? One is the laptop that hasn't been connected to the network for 2 months, and is then infected with spyware the first time the business traveller plugs it into a hotel network. This is a real problem - there should be patch management tools that check the patch status of the machine before any other services or applications are brought online.

    Second, the social engineering attacks that enable adware & spyware infections still get us. All of our education efforts can't seem to prevent people from clicking on "yes" when asked to install CoolWebSearch.

    The weak links in Windows security aren't passwords; they are still enforced patch management and the gullibility of the user.

  • I'd like to be able to use pass phrases everywhere.

    I have found that Microsoft's site won't support them. I tried to change my password and the form says that spaces are not allowed.

    Any idea if they will change this?


  • Dear Robert Hensing,

    I have read the Japanese version of "The Great Debates: Pass Phrases vs. Passwords. Part 1 of 3".
    I sent the following comment for it.

    Please don't avoid explaining that it is easy to crack a challenge-response exchange such as LM authentication.
    It takes less than two months to crack LM authentication against all possible 14-character passwords using the 69-character set.
    The img src="file://\\\test" attack is still alive after 7 years.

    I know it becomes increasingly more complex if the explanation of challenge-response is added. But it is easier to capture packets of challenge-response than to steal LM hashes.

    Thank you.
