User:Gmaxwell/visual fingerprint comparison

From Bitcoin Wiki
Revision as of 05:17, 8 August 2014 by Gmaxwell (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Users often need to compare two cryptographic hashes or fingerprints.

Because of the tedium of comparing full fingerprints users often only really compare a few digits (often from the beginning or end or both the beginning and end). Because of this an attacker with a lot of computational power can search for visually similar hashes with these restrictions in mind and have some degree of success.

For a while I've been talking about the notion of setting up a system that boosts security via cut-and-choose with the user— the notion being that if the presentation expands the size of the fingerprint and helps the user compare unpredictable parts (rather than just the first and end) the security will be greatly increased.

If we expand the input data to 1024 bits we can represent it as 204 base-32 characters arranged in a 2d hexagonal grid. To help users pick random spots to compare on every run the three random (not data dependent) hexagons are highlighted. Users should be instructed to compare at least the three hexagons highlighted by their own system (or all if they like).

Each hexagon involves 7 base 32 values, and so this simple and fast comparison has roughly 126 bits of security (ignoring some loss where hexagons overlap, and that a user might erroneously match a slightly offset hexagon— though not likely unless more of the surrounding values were visually similar).

Additionally, an application specific salt can be provided. For some applications it may be feasible to use the date in the salt (taking some risk that the desynchronized clocks will cause the wrong output, which can be made less bad by always showing two outputs (today and the previous day)) or for some a user provided challenge string.

To further improve security the base32 alphabet has vowels eliminated so that users are less likely to unintentionally or intentionally create "vanity fingerprints" with visible words that bias the comparison locations. This also prevents against accidentally offensive fingerprints. Furthermore, the alphabet is selected to maximize the visual distinctiveness of the characters in it (unfortunately, I only have similarity data for letters, not numbers right now).

Finally, as a last measure against abuse the transformation of the input to the 1024 bit output is selected to be as computationally expensive as would be tolerated by typical users on low end hardware, this should add several bits of effective security against grinding attacks for partial collisions.

Here are two fingerprints from the system (side by side):

+ 0 1 2 3 4 5 6 7 8 9 A B C D E F #  +     + 0 1 2 3 4 5 6 7 8 9 A B C D E F #  +
0 Y l r l z L d s M n r H x Y s f X  0     0 G X d L g F F w L q D d j j s g m  0
1  B G p Y M D G q j x G z t q x l V 1     1  k n p T r n H z j p q X p g D*z w 1
2 m V F D Z x X B J s z s*V B g n j  2     2 D x f W G g q t m M Z k W S s*n*D  2
3  s k q Y G n d s m w F*Z*k Y H H Z 3     3  w W d d D k Y Q M k s Y S F z*j Y 3
4 L j W f Y H Y Y H T V g*H L p q S  4     4 D r L L r*M B H t Z q F T W Y V X  4
5  M Q g Y s q F Z*G p f Y t d G V r 5     5  D V j H*Q*d B m k*f m w S z x Y B 5
6 d s l G Z Y D X*S*X k V p X g k z  6     6 V q L Y n*k f s w*B*q r s d m l L  6
7  J k f l S S H x*M S M q x H Q w s 7     7  L d J V D k F B t*V p z x t B p t 7
8 f Z H r B D l d G F T j V f m G J  8     8 l l Z D n w V k D S V D D r D p z  8
9  f Y M T s*G w W f k l T d D d F j 9     9  t V H r M w k n G m k V Y X w Q X 9
A G Y k p Z*D*w r s F V g r M f L D  A     A T r k Y f Y V S H G W l V d k V f  A
B  l f Q m g*m t t s p F H Y j m W m B     B  Y z F r H n F Q z B l f k Z t d g B
+  0 1 2 3 4 5 6 7 8 9 A B C D E F # +     +  0 1 2 3 4 5 6 7 8 9 A B C D E F # +

Fancier displays may choose to highlight the random points in more attractive ways.