As a Thought Exercise

The following question was posed as a thought exercise: “If you were given $1 billion to help the world, by solving a problem, what would it be?” My mind immediately went to problems such as poverty, but let’s dig deeper. What does solving poverty look like? Is it allowing everyone in the world to enjoy a certain living standard? In order to do this, we need resources and energy to build infrastructure, provide clothing, and food. Therefore, this will drain the resources that our planet has to offer faster than we already are.

For the sake of argument, let’s say we do find an infinite amount of resources that we are able to consume to help us solve our world’s problems. As we transform the resources into energy or goods, there are bi-products such as heat and pollution. This leads to the next problem to solve: How do we clean up this pollution, such that we are able sustain life? To illustrate this, let’s examine yeast populations during the fermentation process. As the yeast feed, they start to divide and grow their population. However, they also create the bi-product of alcohol and carbon-dioxide. At a certain point, they produce so much waste (alcohol) that their population begins to die off complete. At what point will we populate our planet, produce enough waste that we can’t sustain life anymore?

From that line of reasoning I think learning to clean up after ourselves, so that we can continue to sustain life on this planet is the problem that I would want to solve. Some questions to ponder are: What are the sources of pollution today? Why is that pollution being emitted? Are there alternative methods to achieve a given goal without emitting pollution? If pollution must be emitted for a given process, must that process occur on earth? I think all of these questions may lead to interesting future businesses that can help shape our future. Companies like Tesla are already helping solve this problem by replacing gasoline powered vehicles with electricity powered vehicles. Where is this electricity coming from? From renewable energy sources such as solar panels. Where else can we bring this thinking to?

Interview Question #2

I found the following question today that piqued my interest:

Given an array of items where each item has a name and a weight (integer value), select a random item from the array based on the weight.

What this means is that items in the array should be randomly chosen such that the larger the weight, the more likely the item will be picked.

To approach this problem, I will create an array where each element gives us the probability the ith item would be chosen.  To build this array, I will first find the total amount of weight amongst all items and divided each item’s weight by this total value.  Once I have this array, I will be able to pick a random number between 0.0 (Inclusive) and 1.0 (Exclusive) to determine which item to pick.

My language of choice for this example will be Python.  Let’s say I am given the data as an ordered list of Tuples where the first element is the name of the item and the second element is the weight of that item.

items = [('Item1', 4), ('Item2', 3), ('Item3', 9), ('Item4', 7)]

With this array of items, I will first grab the weights and store it in an array such that the ith element in the this array corresponds to the ith element in the items array.

weights = map(lambda item: item[1], items) # [4, 3, 9, 2]

Once I have this array of weights, it’s trivial to find the total weight for all items.

total = float(sum(weights)) # 23

Note that I’m casting the sum to a float. I will need this total as a float to keep precision for when I divide each weight by this value.  If we left the total as an integer, the divide operation for each weight will return 0 since an integer divided by an integer is an integer.

probabilities = map(lambda weight: weight / total, weights) # [0.1739, 0.1304, 0.3913, 0.304] - Values shortened for brevity

Since these are probabilities, these values added together should equal 1.0.  Visually, think each of each entry as a slice of a pie.  The higher the probability value, the larger the piece of pie is.  We now need to determine which slice of pie we get, if we randomly select a piece.

To do this, we can use the Python function random.random(), which will return a value between 0.0 and 1.0.  To pick the correct value, I will iterate through this list of normalized weights and add the current item to a total.  If the random value is less than the total, then we have our random item.

random_val = random.random()
total_weight = 0
  for i, weight in enumerate(probabilities):
      total_weight += weight
      if random_val < total_weight:
          return items[i][0]

Let’s step through our example above to get a better understanding of how this all works. Let’s say our random value is 0.6195. During the first iteration, total_weight is set to to 0.1739. Our random value is greater than this value, so we continue onto the next iteration. During the second iteration total_weight is set to 0.3043. Again, the random value is greater than this value, so we continue onto the 3rd iteration where we set total_weight to 0.6956. Finally, the random value is less than total_weight, so the third item is our random value.

Overall this algorithm runs in O(n) time. If we know that the values are already sorted, we can instead use a binary search to get this down to O(lgn) time. However, sorting will take O(nlgn) operations, so an O(n) solution will suffice for now.

Interview Question #1

I was going through some old bookmarks on my computer and found a site of programming teaser questions that were asked of candidates applying at Google.  Being a sucker for these sort of things I checked out the first question:

There is an array A[N] of N numbers. You have to compose an array Output[N] such that Output[i] will be equal to multiplication of all the elements of A[N] except A[i]. For example Output[0] will be multiplication of A[1] to A[N-1] and Output[1] will be multiplication of A[0] and from A[2] to A[N-1].

Solve it without division operator and in O(n).

I thought about this question for a while and I have to admit that it absolutely stumped me!  In attempt to learn something for this question, we’re going to look at the answer and then examine why this works.

For this example, let’s say we’re given the array [1, 2, 3, 4, 5]; The solution to this problem would then be [120, 60, 40, 30, 24].

The solution is as follows:

  1. Initialize a variable called product, to 1.
  2. Starting at the second element of the array, iterate through the array going forwards.  Multiply the current value of the given array with the product and store that value in the previous element of the results array.
  3. Re-initialize the product variable to 1.
  4. Starting at the last element, Iterate through the elements going backwards. Multiply the current value of the given array with the product and store it in the product variable.
  5. Multiply the current value of the product with the previous element of the results array.

Before we look at any code, let’s work this out.

The first step is trivial, so using the sample given above, let’s examine how we would get to the end result.  Let’s check out what the results array looks like during the second step.

  1. results = [1, 1, 1, 1, 1]   // results[1] = data[0] × product (1)
  2. results = [1, 1, 2, 1, 1]   // results[2] = data[1] × product (1)
  3. results = [1, 1, 2, 6, 1]   // results[3] = data[2] × product (2)
  4. results = [1, 1, 2, 6, 24] // results[4] = data[3] × product (6)

Now, let’s take a look at the value of the product and the results array during step 4.

  1. results = [1, 1, 2, 6, 24] product = 5 (product × 5)
  2. results = [1, 1, 2, 30, 24] product = 20 (product × 4)
  3. results = [1, 1, 40, 30, 24] product = 60 (product × 3)
  4. results = [1, 60, 40, 30, 24] product = 120 (product × 2)
  5. results = [120, 60, 40, 30, 24] product = 120 (product × 1)

Bam! We end up our answer!  Why does this work?  Well, let’s examine the factors of the answer:

results[0] = 5 × 4 × 3 × 2
results[1] = 5 × 4 × 3 × 1
results[2] = 5 × 4 × 2 × 1
results[3] = 5 × 3 × 2 × 1
results[4] = 4 × 3 × 2 × 1

Let’s see how we accumulate those factors in step 2

  1. results = [1, 1, 1, 1, 1]
  2. results = [1, 1, 2 × 1, 1, 1]
  3. results = [1, 1, 2 × 1, 3 × 2 × 1 , 1]
  4. results = [1, 1, 2 × 1, 3 × 2 × 1, 4 x 3 x 2 x 1]

Step 3

  1. results = [1, 1, 2 × 1, 3 × 2 × 1, 4 x 3 x 2 x 1] product = 5
  2. results = [1, 1, 2 × 1, 5 × 3 × 2 × 1, 4 x 3 x 2 x 1] product = 5 × 4
  3. results = [1, 1, 5 × 4 × 2 × 1, 5 × 3 × 2 × 1, 4 x 3 x 2 x 1] product = 5 × 4 × 3
  4. results = [1, 5 × 4 × 3 × 1, 5 × 4 × 2 × 1, 5 × 3 × 2 × 1, 4 x 3 x 2 x 1] product = 5 × 4 × 3 × 2
  5. results = [5 × 4 × 3 × 2 × 1, 5 × 4 × 3 × 1, 5 × 4 × 2 × 1, 5 × 3 × 2 × 1, 4 x 3 x 2 x 1] product = 5 × 4 × 3 × 2 × 1

In step 2, we’re first distributing the factors that appear before the given element in the array. In step 4, the product is distributing the factors that appear after each element as we go back.  Therefore, each element in the results array contains all factors that appear and before the same element in the given array.

Anyway, here’s the code for it:

int[] data = { 1, 2, 3, 4, 5 };
int n = data.length;
int[] results = new int[n];
for (int i = 0; < n; i++) results[i] = 1
// Step 2
for (int i = 0, product = 1; i < n - 1; i++) {
 product *= data[i];
 result[i + 1] = product;
}
// Step 4
for (int i = n - 1, product = 1; i > 0; i--) {
 product *= data[i];
 result[i - 1] *= product;
}

Java String Pool

What will the following lines of code print?

1  String a = "abc";
2  String b = "abc";
3  System.out.println(a == b);
4  String c = new String("abc");
5  String d = new String("abc");
6  System.out.println(c == d);

In order to test for equality between two objects, you must use the equals method.  Otherwise, if you test the equality between two variables that point to objects, you’re comparing the value it stores; in this case the address that reference the objects.  As a result, both println statements should both return false, right?  Why does the first println statement (line 3) print true?

It turns out that variables a and b both point to the same object.  This is because both variables reference string literal objects.  A string literal is instead stored in a special area of the Java PermGen heap space called the String Pool.  The String Pool acts as a cache for string literals; If you create a string literal that already exists in the String Pool, Java is smart enough to use the existing reference.

On line 1, when I set the variable a to the literal “abc”, it will store that value in the String Pool.  When I do the same again for variable b, on line 2, it will find the same value in the String Pool and use the reference to the existing object.  Not only does this save memory, but the values of the variables a and b are both equal since they reference the same object.

On the other hand, if we create a String using the new operator (line 4), we will first find the “abc” reference that exists in the String Pool.  This reference will be passed into the String constructor to instantiate a completely new object that now lives in the Java heap space.  After we create two instances of the String object using the new operator (line 5), we have two completely separate instances of the String object.  Therefore the variables c and d reference two completely separate variables.

As you can see, it almost always never makes sense to use the new String(String) constructor.  If for whatever reason you must use this constructor, you may use the String.intern() method to place the object in the String Pool. This method will then return the  reference to the object that lives in the String Pool.

Given what you have just learned, what will the following lines of code print?

System.out.println(c.intern() == d.intern());
System.out.println("a" + "bc" == "ab" + "c");
System.out.println("abcd" == "abc" + "d");

Gawker Password Database Cracking

If you haven’t heard already, Gawker Media has been hacked! Long story short: The Gnosis crew has taken credit for breaking into Gawker Media’s servers and downloaded a database of roughly 1.3 million user records. One of the big stories here was that Gawker stored user passwords using the block cipher DES (Data Encryption Standard). This encryption scheme only encrypts the first 8 characters of a password before storing it in the database. This means ‘password1sSecureBec@useIt’sLong_4nd_uses_5pecial_characters’ will only be stored as DES(‘password’).

My senior project in college involved using the distributed computing framework Hadoop to crack passwords, so naturally this piqued my interest. Unfortunately I don’t have access to a cluster of computers, so instead I wanted to take the opportunity to learn something new (Cuda comes to mind). Before I jump into playing with Cuda, I first wanted to get a feel for what I was doing by writing some Python code to get me started.

First thing I had to do was filter out the records in the database that were not valid or didn’t have an encrypted password. Of the 1,248,120 records there were only 748,508 that I considered valid. Each password was encrypted using a salt, so I couldn’t simply create a rainbow table of encrypted passwords and do a lookup. Instead I had to crack each password individually. I decided to crack the passwords using a dictionary-based attack. I used a list of the 500 most commonly use passwords as my initial dictionary (The most popular passwords being first). I figured this would be my best bet for quickly shortening the list passwords that I would need to crack. The jist of how this works is the following:
1. Read in a word from the list.
2. Iterate through each account and compute the encrypted value using the user’s given salt.
3. If the computed value is the same as the encrypted password, then we have the original password.
4. If the password was cracked, remove it from the list so we don’t have to worry about it in future computations.

As of this writing I’m at the 39th word and I’ve managed to shorten the list by 16,512 users, or by 0.02%. This doesn’t seem to be going fast enough, so perhaps I’ll look more into doing this with something faster such as C or Cuda. The point of this exercise was really to get myself familiar with the process of cracking these passwords. I was able to quickly write some code in Python to test my ideas, and I quickly found out what ideas worked and what didn’t.

Starting out with Nmap

Nmap is one of those programs that I can’t live without.  Whether I’m using it to learn more about my own network, or somebody else’s network, it really gives me a better picture of what other devices are on the network without actually having to actually physically see the device.

Scanning your network

One of the basic ways to discover devices is to use a ping sweep with Nmap.  This sends out a ping to a single device or a range of devices. It used to be that hosts that dropped ICMP echo requests (ping) would up show as being down in Nmap. It seems that newer versions of Nmap are able to detect if a host is up if it doesn’t respond to a ping. (Try ping microsoft.com and see what it responds with.)

Pinging a single device

nmap -sP 192.168.0.1

Results

Nmap scan report for 192.168.0.1
Host is up (0.055s latency).

Pinging a range of IP addresses

nmap -sP 192.168.0.1-5

Results

Nmap scan report for 192.168.0.1
Host is up (0.0046s latency).
Nmap scan report for 192.168.0.3
Host is up (0.016s latency).
Nmap scan report for 192.168.0.5
Host is up (0.000098s latency).

Pinging the entire network

nmap -sP 192.168.0.0/24

Results

Nmap scan report for 192.168.0.1
Host is up (0.055s latency).
Nmap scan report for 192.168.0.3
Host is up (0.014s latency).
Nmap scan report for 192.168.0.5
Host is up (0.00029s latency).
Nmap scan report for 192.168.0.100
Host is up (0.0090s latency).

The last command will send a ping out to any device that has their IP address starting with 192.168.0.x.  (The /24 is short-hand for the subnet mask of 255.255.255.0.)

Device Enumeration

Once we have a list of devices on our network we may want to know some information about it.  Simply typing the command nmap along with the IP address or name of the device, will return what ports are open on the device and guess what service is running on the device.  For example, when I run nmap on my own machine I get the following:

nmap 192.168.0.5

Results

PORT     STATE SERVICE
22/tcp   open  ssh
88/tcp   open  kerberos-sec
139/tcp  open  netbios-ssn
445/tcp  open  microsoft-ds
515/tcp  open  printer
631/tcp  open  ipp
3689/tcp open  rendezvous

What if I want some more information like what version of ssh am I running?  Use the -sV flag.

nmap -sV 192.16.0.5

Results

PORT     STATE SERVICE      VERSION
22/tcp   open  ssh          OpenSSH 5.2 (protocol 2.0)
88/tcp   open  kerberos-sec Mac OS X kerberos-sec
139/tcp  open  netbios-ssn  Samba smbd 3.X (workgroup: WORKGROUP)
445/tcp  open  netbios-ssn  Samba smbd 3.X (workgroup: WORKGROUP)
515/tcp  open  printer
631/tcp  open  ipp          CUPS 1.4
3689/tcp open  daap         Apple iTunes DAAP 9.0.3
Service Info: OS: Mac OS X

Also notice we get some extra information such as the OS I’m running and the workgroup my samba shares are on.

Now let’s take what we learned and put them together.  How would I get information on all devices on my network?

nmap -sV 192.168.0.0/24

The output of this is too long for here, but I think you get the idea; it’ll display all the service information (along with service versions) for all hosts on the network.

That’s the basics of using nmap.  With the ideas presented here, you should be able to find out most information about about devices on the network you are currently connected to.

Most listened to music of the year

According to last.fm, this is the final count of the bands I listened to the most this year.

  1. Nine Inch Nails (528 Plays) – Nine Inch Nails is and has been one of my favorite bands for about 13 years now. NIN has always been the goto band for me for when I don’t feel like listening to anything else. This year I finally got around to actually listening to Year Zero. I was not enthusiastic about this CD when it first came out, but after seeing them live at the Santa Barbara Bowl this year, I finally got around to listening to it and love it! I actually found that it made for good running music! On top of that, I rediscovered what has to be one of my favorite albums: The Fragile. This CD is up there with Dark Side of the Moon in terms of replayability.
  2. NOFX (431 Plays) – Another one of goto bands when I don’t feel like listening to anything else. NOFX has been another one of my favorite bands for about 11 years now and I’m still not tired of them. This year they release coaster which has been in regular rotation in my library. This summer in particular I have also been re-listening to a lot of their albums when I went on my runs.
  3. The Appleseed Cast (329 Plays) – The Appleseed Cast have been one of my favorite bands of recent times. This year they released Sagarmatha which I absolutely feel in love with. The first track “As The Little Things Go” has to be my favorite track of the year. I will hopefully get to see them live for the first time in 2010.
  4. Placebo (291 Plays) – Again, another one of my favorite bands that I started to around 9 years ago. This year they released “Battle for the Sun” and did it without their original drummer Steven Hewitt. Despite his absense, Placebo released this slightly popier sounding album, but still staying true to their original sound. I have yet to see this band live; this needs to change!
  5. Muse (203 Plays) – This year muse released “The Resistance”, but I’ve also been rediscovering their CDs such as “Origin of Symmetry” and “Absolution”. I also recently watched the movie “Southland Tales” (which I won’t go into here), but it featured the song “Blackout” off of Absolution that I really never listened to. Holy crap what a great track!
  6. Asobi Seksu (181 Plays) – This band is a Japanese/American Indie/Pop band that I admit is a guilty pleasure. This year I’ve been listening to their self-titled album and “Citrus” mostly. This year they came out with the CD “Hush,” but it hasn’t gotten as much playtime as the other two albums from me.
  7. Pixies (163 Plays) – I had my pixies phase around six years ago. I think now was the time to come back and check out my old favorites. Not much to say here really.
  8. Steve Aoki (148 Plays) – Introduced to me randomly by a friend in SLO in August, Steve Aoki has been a running favorite this year.  Steve Aoki kind of reminds me of Girl Talk which I probably listened to A LOT last year.
  9. The Protomen (142 Plays) – The concept of a rock opera based on the Mega Man video game series just cannot fail. I was a big fan of the first act, but when act II came out, I listened to both acts back to back. Both CDs have a unique feel to them; the first act is more inspired by the videogame series (with a hint of Ennio Morricone) while the second act follows up with a sound inspired by the 80s.  Someone needs to turn these albums into an anime!
  10. Pink Floyd (140 Plays) – Pink Floyd is not one of those bands where you pick and choose their greatest hits. Instead when you listen to Pink Floyd, you listen to them an entire album at a time. In particular I remember listening to The Wall, Animals, and Dark Side of the Moon this year.

Find the Largest Prime Factor

I love programming competitions.  In the past two years I’ve participated in both my school’s programming competition and the ACM Southern California regional competition.  Recently I’ve been turned onto http://projecteuler.net which is a competition-like site that gives you a series of math questions which you need to solve by writing a program.  I saw this as a good opportunity to brush up on python, so today I’ve been trying my hand at a few of these problems.

Question #3 asks you to find the largest prime factor of a very large number.  My first instinct was to start at Sqrt(n) and work my way back until I find the first factor.  The problem here is that the first factor you find may not be the first PRIME factor.  Instead, my approach to this problem was to find all factors of the number between 2 and Sqrt(n) and put them in in a list.

Of all the numbers in the list, one of them is the largest prime factor, but which one?  To find out, I made an assumption that turned out to be right:  The composite factors have at least one prime factor that is already in the list.  If I can show that a given number has no factors in the list, then it is a prime factor.  Further, if I have a sorted list and start at the end, I can show that the number is the largest prime factor.

Anyway, here’s the code I used:

n = 600851475143 # Find the largest prime factor of this
div = [x for x in range(2,int(math.sqrt(n))) if n % x == 0] # Get all factors
i, j = 0, len(div) – 1
while j > i: # Determine which one of the factors is the largest prime
# This number is composite, go to the next largest factor
if div[j] % div[i] == 0: i,j = 0,j-1
# See if the next number at the beginning of the list divides the current number
else: i += 1
print div[j] # Print out the largest prime factor
n = 600851475143 # Find the largest prime factor of this
div = [x for x in range(2,int(math.sqrt(n))) if n % x == 0] # Get all factors
i, j = 0, len(div) - 1
while j > i: # Determine which one of the factors is the largest prime
# This number is composite, go to the next largest factor
if div[j] % div[i] == 0: i,j = 0,j-1
# See if the next number at the beginning of the list divides the current number
else: i += 1
print div[j] # Print out the largest prime factor

Fix for CodeIgniter White Screen on Bluehost

This past weekend I spent writing an application using the CodeIgniter framework for my programming languages class.  Sunday night I finally got it working and it was time to move it to production on bluehost.  After I made created the database and made some configuration changes I checked it out and I got nothing.  It was a white page of nothing.  I checked my logs and found nothing.  Normally this wouldn’t be such an issue, but when you have nothing in your logs, then you don’t have much to work with.  I looked at the source of the blank page and found this: “<!– SHTML Wrapper – 500 Server Error –>.”  Still not a lot to work with.

Today I spent some time investing the issue and found out that a lot of people were having the issue, but I didn’t find any fixes.  Long story short the fix was to enable FastCGI PHP5 on Bluehost.  When I did that an error finally appeared on the screen! Apparently I misspelled my database name >_<.

Anyway, if you’re having the same issue, let me know if this fix worked for you.

PHP Frameworks

I’ve been coding in PHP off and on for years now.  Each project I’ve done thus far has always been by hand.  Doing the architecture of your site by hand is the easiest way to introduce bugs and end up with sloppy code.   For the first time I’ve started a project where I used a framework in PHP.  My framework? Codeigniter.  I like it’s simplicity, MFC design and how it’s not like Rails.  I thought Rails was a bit too complicated and involved hacking the internal code way too much to get anything done, but that’s a whole other rant.

Anyway, the project I’m working on?  Just a twitter clone.  Why?  For the sake of learning, that’s all.  It’s actually for a project in school.  The idea is to implement this idea in something that I’m already familiar with (PHP) and then implement it again in another language I’m not familiar with at all.  After I’m doing with the first part I want to implement it in Python using the Django framework.  Python has always been one of those languages I’ve been meaning to learn.  In fact it seems I try to learn it every summer, but I just don’t seem to have the dedication to sit in doors learning it.

Regardless, I hope to use this knowledge in my professional life.  The last PHP project I worked on was stressful just because of some of the design decisisons I made (Doing a half-assed MVC implementation…didn’t do the best job of dividing up the view and controller).  However, I’ve learned from those mistakes and I might be taking up a new project at work where I’ll probably be using this framework.