Passphrases only marginally more secure than passwords because of poor choices

Passwords that contain multiple words aren't as resistant as some researchers expected to certain types of cracking attacks, mainly because users frequently pick phrases that occur regularly in everyday speech, a recently published paper concludes.

Security managers have long regarded passphrases as an easy-to-remember way to pack dozens of characters into the string that must be entered to access online accounts or to unlock private encryption keys. The more characters, the thinking goes, the harder it is for attackers to guess or otherwise crack the code, since there are orders of magnitude more possible combinations.

But a pair of computer scientists from Cambridge University has found that a significant percentage of passphrases used in a real-world scenario were easy to guess. Using a dictionary containing 20,656 phrases of movie titles, sports team names, and other proper nouns, they were able to find about 8,000 passphrases chosen by users of Amazon's now-defunct PayPhrase system. That's an estimated 1.13 percent of the available accounts. The promise of passphrases' increased entropy, it seems, was undone by many users' tendency to pick phrases that are staples of the everyday lexicon.

"Our results suggest that users aren't able to choose phrases made of completely random words, but are influenced by the probability of a phrase occurring in natural language," researchers Joseph Bonneau and Ekaterina Shutova wrote in the paper (PDF), which is titled "Linguistic properties of multi-word passphrases." "Examining the surprisingly weak distribution of phrases in natural language, we can conclude that even 4-word phrases probably provide less than 30 bits of security which is insufficient against offline attack," the paper says.

The "30 bits of security" means the chances of a single guess cracking a four-word passphrase would be one in 2^30, or roughly one in a billion. What's more, the two-word phrases cracked in the study provided just 20.8 bits of security, since 20,656/0.0113 works out to about 2^20.8. Another way of expressing the same finding is that a dictionary of slightly less than 21,000 phrases is enough to guess the login credentials that slightly more than 1 percent of people in the real world will use.
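The arithmetic behind that figure is simple enough to check. What follows is a minimal Python sketch, not the researchers' own code: the function name `effective_bits` is made up for illustration, and it assumes the estimate is just the base-2 logarithm of the dictionary size divided by the fraction of accounts cracked, as the 20,656/0.0113 figure above implies.

```python
import math


def effective_bits(dictionary_size: int, fraction_cracked: float) -> float:
    """Bits of security implied by a dictionary that cracks a given fraction.

    Hypothetical helper: log2(dictionary_size / fraction_cracked).
    """
    return math.log2(dictionary_size / fraction_cracked)


# Figures reported in the article: 20,656 phrases, 1.13 percent of accounts.
two_word_bits = effective_bits(20_656, 0.0113)
print(f"two-word passphrases: {two_word_bits:.1f} bits")  # 20.8 bits
```

Plugging in the password figures cited from the RockYou analysis (a two-entry dictionary reaching about 1 percent of accounts) gives `effective_bits(2, 0.01)`, or a little under 8 bits, which is the gap the researchers describe as an improvement that still falls short.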

To be sure, that's a vast improvement over the security of normal passwords. Analyses of compromised passwords leaked onto the 'Net, including a corpus of 32 million plaintext codes dumped following the 2009 hack of online games provider RockYou, show that it's trivial to crack a sizable proportion of real-world codes. A dictionary of just the two most common passwords, 123456 and 12345, typically guesses 1 percent of login credentials.

The study by Bonneau and Shutova is among the first to examine passphrases used by real-world people to access accounts. While it concludes phrases are harder to guess, the increased entropy isn't enough to withstand offline attacks, in which a stolen database of hashed passwords may be subjected to hundreds of millions of guesses in an attempt to find the right combination of characters.
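To get a feel for why those figures fall short offline, here is a rough Python sketch. The guessing rate of 100 million hashes per second is an assumption chosen for illustration, not a number from the paper, and `seconds_to_exhaust` is a hypothetical helper name. Real rates vary widely with the hashing algorithm and the attacker's hardware.

```python
def seconds_to_exhaust(bits: float, guesses_per_second: float) -> float:
    """Worst-case time to try every candidate in a guess space of 2**bits."""
    return 2.0 ** bits / guesses_per_second


# Assumption for illustration only: 100 million hash guesses per second.
ASSUMED_RATE = 100_000_000

scenarios = [
    ("two-word phrase, 20.8 bits", 20.8),
    ("four-word phrase, 30 bits", 30.0),
]

for label, bits in scenarios:
    seconds = seconds_to_exhaust(bits, ASSUMED_RATE)
    print(f"{label}: {seconds:.3f} seconds")
```

Under that assumed rate, even the full 2^30 space is exhausted in under 11 seconds, and the two-word space in a small fraction of a second. An online service that locks accounts after a handful of failed logins is a very different setting, which is why the paper's warning is specific to offline attacks.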

"The most important thing about this paper is it provides some hard data on how people create passphrases when they are forced to use passphrases instead of passwords due to policy requirements," said independent security researcher Matt Weir, who focused on password cracking for his PhD at Florida State University. "That's actually a big deal because organizations can start using those findings when creating their own password policies. It makes it easier to estimate the actual effectiveness of security controls vs just saying 'it's more of a pain so it has to be secure.'"

Heard that before

To understand why passphrases failed to live up to their potential, the researchers extracted two-word phrases from sources including the British National Corpus and compared them to the phrases they had cracked from Amazon's PayPhrase system. They found most of the overlap involved common nominal modifier-noun phrases such as "bedtime story" or adverbial-modifier verb relations such as "never leave."

They ran another comparison using the Google n-gram Corpus, which harvests vast numbers of words and phrases published online. To evaluate how Amazon users may have chosen their passphrases, they compared them to a ranked list of the most common phrases and found a high correlation. "This leads us to conclude that users don't stray far from natural language patterns when choosing passphrases," their paper states.

Similarly, the researchers crawled Facebook's public index from 2010 and pulled 10,000 randomly selected names. A full four percent of the names were chosen as passphrases by Amazon users.

The findings are preliminary because the researchers could find passphrases only when they attempted to register an account and used a combination of characters that had already been selected. Unlike studies involving the RockYou compromise and other breaches, they didn't have the opportunity to analyze all the credentials. What's more, the Amazon service required users to combine their passphrase with a four-digit PIN, and that may have influenced how phrases were selected. The paper suggests further collaboration between security and linguistics researchers.

The inquiry into the added security benefits of passphrases comes as a report from security firm Trusteer concludes that passwords are the weakest link in the IT security chain, particularly those used to secure network administrative controls. The findings also arrive as some people are looking to passphrases to tame the problems that result when users authenticate themselves on smartphones, whose input interfaces aren't ideal for entering many non-alphanumeric characters. The research is also important to those who use passphrases to protect the private keys used to encrypt email and SSH sessions.

"These finding suggest that multi-word phrases, if chosen naively according to natural language tendencies, are not as effective at mitigated guessing attacks as alternate choices, such as choosing 2 random words or choosing a personal name at random," the paper warns.

Channel Ars Technica