Unicode passwords strike back!

TOPlap continues its struggle with password security.

The IT department encouraged the use of accented characters in its password policy for extra security. But then users started complaining that they could not log in on some devices. After a lengthy investigation, two related problems were discovered¹.

The first problem is that different devices may have different input methods. Users may switch from one device to another and expect to log in with the same password. But depending on the device, an accented character can be entered in either composed or decomposed form, so what looks like the same password can actually be encoded as different strings. An input string may even contain a mix of composed and decomposed characters.
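The difference between the two forms can be made visible with Python's standard unicodedata module. This is a minimal sketch; the single letter used here is an illustrative choice and is not taken from the puzzle input.

```python
import unicodedata

# "È" as one precomposed code point (U+00C8) ...
composed = "\u00c8"
# ... and as "E" followed by a combining grave accent (U+0300).
decomposed = "E\u0300"

# Both render identically, but they are different strings.
print(composed == decomposed)          # False
print(len(composed), len(decomposed))  # 1 2

# NFC composes and NFD decomposes; after normalizing, the forms compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True

# The UTF-8 bytes differ as well, so a hash of one form will not match the other.
print(composed.encode("utf-8"))    # b'\xc3\x88'
print(decomposed.encode("utf-8"))  # b'E\xcc\x80'
```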

The second problem is that the passwords are already hashed. We never store passwords in plaintext, as this would be very poor security. The passwords are hashed using bcrypt (libraries for bcrypt are available for most programming languages). Given the one-way nature of bcrypt, it's impossible to recover the original password from the hash. The only time we can know the password is when the user enters it and we've checked that it hashes to the same result.

The result of a bcrypt hash may look like this:

bcrypt.hash("secret", 10)
$2b$10$v3I80pwHtgxp2ampg4Opy.hehc03wCR.JBZE6WHsrSQtxred57/PG

Note that the hashed string includes a salt, a random string that increases the security of the password. Because the salt is random, each invocation of bcrypt can give a different result for the same password.

To check if a password matches, you must feed the salt back into the algorithm, and check that you still get the same hash.

bcrypt.hash("secret", "$2b$10$v3I80pwHtgxp2ampg4Opy.")
$2b$10$v3I80pwHtgxp2ampg4Opy.hehc03wCR.JBZE6WHsrSQtxred57/PG
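The salt does not have to be stored separately: it is the leading part of the stored string. The sketch below splits the example hash above into its salt and digest, assuming the usual 60-character bcrypt layout (version, two-digit cost, 22-character salt, 31-character digest). The helper name split_bcrypt_hash is hypothetical and exists only for illustration.

```python
def split_bcrypt_hash(stored: str) -> tuple[str, str]:
    """Split a 60-character bcrypt string into (salt_with_prefix, digest).

    Hypothetical helper for illustration; bcrypt libraries normally do
    this internally when they verify a password.
    """
    # Layout: "$" version "$" cost "$" 22-char salt + 31-char digest.
    _, version, cost, rest = stored.split("$")
    salt, digest = rest[:22], rest[22:]
    return f"${version}${cost}${salt}", digest


stored = "$2b$10$v3I80pwHtgxp2ampg4Opy.hehc03wCR.JBZE6WHsrSQtxred57/PG"
salt, digest = split_bcrypt_hash(stored)
print(salt)    # $2b$10$v3I80pwHtgxp2ampg4Opy.
print(digest)  # hehc03wCR.JBZE6WHsrSQtxred57/PG
```

In practice there is rarely a need to split the string by hand. With the third-party Python bcrypt package, for example, bcrypt.checkpw takes the candidate password and the stored hash (both as bytes) and does the re-hashing and comparison in one step.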

You receive a list of login attempts. For each attempt, check whether it matches, trying both normalized and unnormalized forms.

Given some (UTF-8 encoded) test input:

etasche $2b$07$0EBrxS4iHy/aHAhqbX/ao.n7305WlMoEpHd42aGKsG21wlktUQtNu
mpataki $2b$07$bVWtf3J7xLm5KfxMLOFLiu8Mq64jVhBfsAwPf8/xx4oc5aGBIIHxO
ssatterfield $2b$07$MhVCvV3kZFr/Fbr/WCzuFOy./qPTyTVXrba/2XErj4EP3gdihyrum
mvanvliet $2b$07$gf8oQwMqunzdg3aRhktAAeU721ZWgGJ9ZkQToeVw.GbUlJ4rWNBnS
vbakos $2b$07$UYLaM1I0Hy/aHAhqbX/ao.c.VkkUaUYiKdBJW5PMuYyn5DJvn5C.W
ltowne $2b$07$4F7o9sxNeaPe..........l1ZfgXdJdYtpfyyUYXN/HQA1lhpuldO

etasche .pM?XÑ0i7ÈÌ
mpataki 2ö$p3ÄÌgÁüy
ltowne 3+sÍkÜLg._
ltowne 3+sÍkÜLg?_
mvanvliet *íÀŸä3hñ6À
ssatterfield 8É2U53N~Ë
mpataki 2ö$p3ÄÌgÁüy
mvanvliet *íÀŸä3hñ6À
etasche .pM?XÑ0i7ÈÌ
ssatterfield 8É2U53L~Ë
mpataki 2ö$p3ÄÌgÁüy
vbakos 1F2£èÓL

The first section of the input contains entries from the authentication database: usernames, each followed by the bcrypt-hashed password. The last section of the input contains a series of login /attempts/. Some of these attempts may be invalid (perhaps there was a typo, or perhaps somebody else tried to log in with a random password). The passwords and login attempts may contain composed or decomposed accented characters, and may even contain a mix of both.
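Reading this two-section format could look roughly like the sketch below. It assumes the sections are separated by one blank line, that lines end in "\n", and that the username never contains a space. The function name parse_input and the passwords in the sample are made up for illustration; only the two hash lines are copied from the test input.

```python
def parse_input(text: str) -> tuple[dict[str, str], list[tuple[str, str]]]:
    """Return (hash_by_user, attempts) from the two-section input."""
    db_section, attempt_section = text.strip("\n").split("\n\n", 1)

    hash_by_user = {}
    for line in db_section.split("\n"):
        user, hashed = line.split(" ", 1)
        hash_by_user[user] = hashed

    attempts = []
    for line in attempt_section.split("\n"):
        # Split on the first space only, and leave the password untouched:
        # stripping or normalizing here would lose the form the user typed.
        user, password = line.split(" ", 1)
        attempts.append((user, password))
    return hash_by_user, attempts


sample = (
    "etasche $2b$07$0EBrxS4iHy/aHAhqbX/ao.n7305WlMoEpHd42aGKsG21wlktUQtNu\n"
    "ltowne $2b$07$4F7o9sxNeaPe..........l1ZfgXdJdYtpfyyUYXN/HQA1lhpuldO\n"
    "\n"
    "etasche E\u0300 x\n"  # made-up password containing a space
    "ltowne \u00c8\n"      # made-up password
)
hash_by_user, attempts = parse_input(sample)
print(len(hash_by_user), len(attempts))  # 2 2
```

When reading the real input from a file, open it explicitly as UTF-8 so the accented characters arrive as entered.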

Looking at the first login attempt above, user etasche logged in with .pM?XÑ0i7ÈÌ. This expands to:


.	p	M	?	X	Ñ	0	i	7	E	◌̀	Ì

But the original password was entered as .pM?XÑ0i7ÈÌ, which expands to:


.	p	M	?	X	N	◌̃	0	i	7	È	Ì

As you can see, in the original password the Ñ was decomposed, but in the login attempt the È was decomposed. This login is therefore valid, because both passwords normalize to the same string.
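Because only the hash of the originally entered form is stored, one possible approach is to try every composed/decomposed mix of the attempted password until one matches. The sketch below is an illustration under stated assumptions, not the reference solution. The names variants and is_valid_login are hypothetical, and the hash check is passed in as a callable so the example runs without a bcrypt library; a plain string comparison stands in for it here.

```python
import itertools
import unicodedata
from typing import Callable, Iterator


def variants(password: str) -> Iterator[str]:
    """Yield every composed/decomposed mix of `password`."""
    # Start from the fully composed form so each accented letter is one unit.
    composed = unicodedata.normalize("NFC", password)
    # Per character: the composed form and, if it differs, the decomposed form.
    choices = []
    for ch in composed:
        decomposed = unicodedata.normalize("NFD", ch)
        choices.append((ch,) if decomposed == ch else (ch, decomposed))
    for combo in itertools.product(*choices):
        yield "".join(combo)


def is_valid_login(password: str, matches_hash: Callable[[str], bool]) -> bool:
    """True if any variant of `password` satisfies the supplied hash check."""
    return any(matches_hash(candidate) for candidate in variants(password))


# Stand-in for the bcrypt check: the "database" holds the originally typed form.
original = "N\u0303" + "\u00cc"   # Ñ decomposed, Ì composed
attempt = "\u00d1" + "I\u0300"    # Ñ composed, Ì decomposed
check = lambda candidate: candidate == original

print(len(list(variants(attempt))))               # 4
print(is_valid_login(attempt, check))             # True
print(is_valid_login("\u00d1" + "\u00cd", check)) # False (Í is a different letter)
```

The number of variants doubles with every accented character, and bcrypt is deliberately slow, so it may be worth trying the most likely forms first. The sketch also treats each letter as either fully composed or fully decomposed; letters carrying several accents have intermediate forms that it does not generate.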

In the same vein, we can check all twelve login attempts and get the following result:

etasche .pM?XÑ0i7ÈÌ is a valid login.
mpataki 2ö$p3ÄÌgÁüy is not a valid login.
ltowne 3+sÍkÜLg._ is not a valid login.
ltowne 3+sÍkÜLg?_ is a valid login.
mvanvliet *íÀŸä3hñ6À is not valid.
ssatterfield 8É2U53N~Ë is not valid.
mpataki 2ö$p3ÄÌgÁüy is not valid.
mvanvliet *íÀŸä3hñ6À is not valid.
etasche .pM?XÑ0i7ÈÌ is valid.
ssatterfield 8É2U53L~Ë is valid.
mpataki 2ö$p3ÄÌgÁüy is not valid.
vbakos 1F2£èÓL is not valid.

In this example, 4 (out of 12) logins were valid.

How many valid logins are there for your puzzle input?

Reading & reference materials

Unicode Normalization

Unicode normalization forms

Comparing Unicode codepoints can be tricky, but it's essential when searching in texts

¹ Thanks to Roel Spilker for providing inspiration for this puzzle.
