32 hex character strings md5 hashing to themselves

A place to discuss the science of computers and programs, from algorithms to computability.

Formal proofs preferred.

Moderators: phlip, Prelates, Moderators General

32 hex character strings md5 hashing to themselves

Postby Blipo » Thu Nov 22, 2007 3:23 am UTC

Is it possible for a 32-hex-character-string to md5 hash to... itself?

Had this thought while on #stats. Anyone know if this is possible?
Last edited by Blipo on Thu Nov 22, 2007 3:35 am UTC, edited 1 time in total.
Blipo
 
Posts: 89
Joined: Mon Jul 30, 2007 8:15 pm UTC

Re: 32 hex character strings hashing to themselves

Postby Rysto » Thu Nov 22, 2007 3:32 am UTC

What's your hash function?
Rysto
 
Posts: 1443
Joined: Wed Mar 21, 2007 4:07 am UTC

Re: 32 hex character strings hashing to themselves

Postby Blipo » Thu Nov 22, 2007 3:35 am UTC

Rysto wrote:What's your hash function?


Whoa. Fix'd. =\
Blipo
 
Posts: 89
Joined: Mon Jul 30, 2007 8:15 pm UTC

Re: 32 hex character strings md5 hashing to themselves

Postby joeframbach » Thu Nov 22, 2007 3:50 am UTC

md5 you say?

well I guess there's only one way to find out. There's a finite number of possible strings. Get crackin.
xxv/♂/♫
User avatar
joeframbach
 
Posts: 1478
Joined: Sun Nov 05, 2006 12:49 am UTC
Location: The City of Steel

Re: 32 hex character strings md5 hashing to themselves

Postby Blipo » Thu Nov 22, 2007 4:50 am UTC

joeframbach wrote:md5 you say?

well I guess there's only one way to find out. There's a finite number of possible strings. Get crackin.


It'd take rather more time than I have to figure this out, and I don't have nearly enough talent to write code to check for me.

Anyways, it's hypothetical. Is there anything preventing it from happening?
Blipo
 
Posts: 89
Joined: Mon Jul 30, 2007 8:15 pm UTC

Re: 32 hex character strings md5 hashing to themselves

Postby Notch » Thu Nov 22, 2007 2:50 pm UTC

There are 16^32 possible strings like that.
A 128 bit md5 has has 16^32 (2^128) possible values.

If we pretend md5 is totally random, the odds of the first string ("0000000000000000") hashing to itself is 1/(16^32). The second string ("0000000000000001") also has a 1/(16^32) chance of hashing to itself.

So the odds of no string hashing to itself is (1-1/(16^32))^(16^32), which works out to.. uh.. something like 0.37.
In other words there probably is a md5 hash that hashes to itself. And would someone please double check my math? ;)
Notch
 
Posts: 318
Joined: Tue Dec 12, 2006 5:52 pm UTC
Location: Stockholm, Sweden

Re: 32 hex character strings md5 hashing to themselves

Postby Rysto » Thu Nov 22, 2007 5:08 pm UTC

I got the same. The chances seem to come out to about 1/e.
Rysto
 
Posts: 1443
Joined: Wed Mar 21, 2007 4:07 am UTC

Re: 32 hex character strings md5 hashing to themselves

Postby Yakk » Thu Nov 22, 2007 7:41 pm UTC

Just to note: there are hash functions that do not hash to themselves. The hash function "+5" doesn't, as an example. ;)

Code: Select all
#include <vector>
#include <assert.h>

static bool progress_report = true;
static unsigned int progress_freq = 100000;

typedef unsigned int (*hash_function)( unsigned char* data, size_t length );

bool find_to_itself( hash_function hash,  unsigned int* result, size_t min_length_ = 0 ) {
  unsigned int min_length = (min_length_<sizeof(unsigned int))?sizeof(unsigned int):min_length);
  std::vector<unsigned char> buff(min_length);
  unsigned char* buff_ptr = &buff.front();
  unsigned int* data_ptr = reinterpret_cast<unsigned int*>(buff_ptr);
  if (!data_ptr) {
    assert(false);
    return false;
  }
  unsigned int& data = *data_ptr;
  data = 0;
  unsigned int hash_result = hash_function( &buff.front(), min_length );
  if (hash_result == data) {
    if (result) *result = data;
    return true;
  }
  while( (++data) != 0 ) {
    hash_result = hash_function( &buff.front(), min_length );
    if (hash_result == data) {
      if (result) *result = data;
      return true;
    }
    if (progress_report) {
      if ((progress_freq % data) == 0) {
        printf("Progress: up to %d checked\n", data);
      }
    }
  }
  return false;
}

void main() {
  unsigned int result = 0;
  bool found = find_to_itself( md5hash, &result );
  if (found) {
    printf("Found: %d hashes to itself\n", result);
  } else {
    printf("No self-hash found\n");
  }
}


Write or find an md5 function, and that should do it.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.
User avatar
Yakk
Poster with most posts but no title.
 
Posts: 10423
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: 32 hex character strings md5 hashing to themselves

Postby Rysto » Thu Nov 22, 2007 8:46 pm UTC

There are 2^128 different possible strings. Surely that would take a prohibitively long time to check.
Rysto
 
Posts: 1443
Joined: Wed Mar 21, 2007 4:07 am UTC

Re: 32 hex character strings md5 hashing to themselves

Postby Yakk » Thu Nov 22, 2007 9:50 pm UTC

sizeof(unsigned int) is usually 4, and char's usually have 8 bits.

So that's 2^32 different unsigned ints to check, which is about 4 billion. I'd be shocked if the above program took more than a week to run, and if you remove the printf's and do some optimization you should be able to do this on the order of hours (if not minutes or seconds) on a modern consumer computer.

The real problem becomes "how expensive is generating an md5 checksum". And the easiest way to check that is to run the above program and time how long it takes to hit 1 million md5s. Multiply that by 4000, and that is your estimated time of computation.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.
User avatar
Yakk
Poster with most posts but no title.
 
Posts: 10423
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: 32 hex character strings md5 hashing to themselves

Postby Yakk » Thu Nov 22, 2007 10:04 pm UTC

Oh god, I'm sorry. I misread the original problem.

32 character hex string? Ahhhh! PAIN!

Ya, that's far to large to test exhastively.

I thought he said 32 bit number. :)
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.
User avatar
Yakk
Poster with most posts but no title.
 
Posts: 10423
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: 32 hex character strings md5 hashing to themselves

Postby phlip » Thu Nov 22, 2007 10:33 pm UTC

Now the real question is whether you're trying to hash the 16 bytes of binary data, or 32 bytes of that data as hexadecimal in ASCII... and in that latter case, whether it's uppercase or lowercase... or whether you'll accept a hex string with mixed cases that still hashes to itself... or maybe EBCDIC is OK...

Better test them all, just to be sure.
While no one overhear you quickly tell me not cow cow.
but how about watch phone?
User avatar
phlip
Restorer of Worlds
 
Posts: 7172
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia

Re: 32 hex character strings md5 hashing to themselves

Postby Yakk » Thu Nov 22, 2007 11:59 pm UTC

Oh ya: regardless of the medium, if the 32 bit result equals the input, the input must fit in 32 bits of freedom.

So it is only a 2^32 search, which is quite doable. My code should work, you just have to have the hash function take the integer input, turn it into the format you want to CRC from, then do the CRC32, and return the result as a 32 bit int.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.
User avatar
Yakk
Poster with most posts but no title.
 
Posts: 10423
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: 32 hex character strings md5 hashing to themselves

Postby Rysto » Fri Nov 23, 2007 3:13 am UTC

SHA-5 gives a 128-bit hash.
Rysto
 
Posts: 1443
Joined: Wed Mar 21, 2007 4:07 am UTC

Re: 32 hex character strings md5 hashing to themselves

Postby rrenaud » Fri Nov 23, 2007 3:22 am UTC

In case you are curious, the md5 hash function introduces some permutation on 32 hex character strings, as you realized. You are asking the question if the md5 induced permutation is (not) a derangement. You can read about them on wikipedia here.

http://en.wikipedia.org/wiki/Derangement
rrenaud
 
Posts: 47
Joined: Fri Jul 28, 2006 12:34 am UTC

Re: 32 hex character strings md5 hashing to themselves

Postby Marvin » Fri Nov 23, 2007 9:31 am UTC

i find taht very improbable, when you look how md5 works. First it makes a block of 512bits, then makes 16blocks of 32bits, then makes some nuts functions, rotations and XORing on that. You'd have to have mighty luck to find such a hash... and i still think it doesn't exist and i'm just not able to se how to prove it by looking at md5...
42
--
If God intended us to program we would be born with serial I/O ports.
User avatar
Marvin
 
Posts: 153
Joined: Thu Nov 22, 2007 8:03 am UTC
Location: Croatia

Re: 32 hex character strings md5 hashing to themselves

Postby xyzzy » Fri Nov 23, 2007 5:43 pm UTC

Simple.

    First, generate all possible MD5 hashes

    Second, hash them all.

    Third, compare the two for any items that match themselves

Done.

You'll be wanting a supercomputer to do this quickly though.
"Wile E. Coyote was a theoretical mathematician." - Leliel
"Modern life can be so boring without elements of the bizarre or the fantastical. Hence, we have steampunk." - Me
User avatar
xyzzy
Meta-Titled
 
Posts: 263
Joined: Sun Mar 18, 2007 10:02 pm UTC
Location: Colossal Cave

Re: 32 hex character strings md5 hashing to themselves

Postby Marvin » Fri Nov 23, 2007 6:38 pm UTC

xyzzy wrote:Simple.

    First, generate all possible MD5 hashes

    Second, hash them all.

    Third, compare the two for any items that match themselves

Done.

You'll be wanting a supercomputer to do this quickly though.


many of those i think...
42
--
If God intended us to program we would be born with serial I/O ports.
User avatar
Marvin
 
Posts: 153
Joined: Thu Nov 22, 2007 8:03 am UTC
Location: Croatia

Re: 32 hex character strings md5 hashing to themselves

Postby Yakk » Fri Nov 23, 2007 7:25 pm UTC

Rysto wrote:SHA-5 gives a 128-bit hash.

That is what I get for having CRC-32 on the brain.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.
User avatar
Yakk
Poster with most posts but no title.
 
Posts: 10423
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove


Return to Computer Science

Who is online

Users browsing this forum: Google Feedfetcher and 2 guests