DEEPAK.SURTI
  • Home
  • Blog
  • Mobile Apps
  • Open Source
  • Cool 3D Demos
  • About

Word Numbers (ITA Archived Puzzle, Solved)

4/28/2011

0 Comments

 
Picture
Please refer to the word numbers problem on this page.

Before I even explain how I tried to solve this problem, you can download and install the code from my bitbucket word-nums repository.

After you have tried finding numbers and are convinced that this application does solve the problem, if you are interested in understanding the details of the solution, you may continue reading.

Analysis

The characteristics of the problem are:
  • A collection of numbers sorted by their word equivalent
  • We need to look up a number by the number of letters counted while reaching that number in the sorted list
  • An extremely large input of 1 billion numbers
Thus while we build the collection of numbers, if we can keep the collection sorted and as a next step, compute the number of letters ’l’ covered while reaching a number in the sorted list, we can then use ’l’ as an index into the sorted collection. The collection then is a data structure with the requirements of fast inserts and queries, which is a binary search tree. Since we have an index to maintain, an indexed B-Tree will be a good choice of data structure. Thankfully, a B-Tree is also useful when the input is large, which is true in our case, but that does not imply we will not attempt an elegant solution simply because brute force might as well work here.

Finally, we first need to convert a given number into its word equivalent. The algorithm to do the conversion is described next.
Picture
Some analysis
Picture
Some more analysis

Converting a number into it's word equivalent

It can be observed that any number is a repetitive pattern of 1 to 999, scaled by thousand for the next pattern and by million for the next and so on. In our case, we scale only upto millions. Thus it would suffice to generate a word equivalent of a 3 digit number between 1 and 999 and use the scaling factor of thousand and million based upon the number of times the pattern is repeated.

Further for a 3 digit number, any digit that occurs in unit’s or hundred’s place is a simple look up from the numeric form of the digit to its word form. However the translations for the digit in ten’s place are best described as quirky as 1 in ten’s place means ten if followed by 0 in units place, 11 if followed by 1 and so on. The digits 2 to 9 in ten’s place are less quirkier as the conversion depends only on the digit and not by the digit in unit’s place as is the case for 1 in ten’s place.

Thus,it is now possible to generate word equivalent for any number by splitting it into patterns matching a 3 digit number and using the appropriate scale such as thousand and million.

The code that generates a word equivalent is shown here.
(defun join-words (words)
  "Join the separate numeric words into a single word."
  (apply #'concatenate 'string words))

(defun 3dnum->word (n)
  "Convert a 3 digit number into its equivalent word"
  (let* ((words (modify-word (worded-digits n)))
         (word (join-words words)))
    (replace-quirks word)))

(defun num->word (n)
  "Converts a number n into word equivalent. Works only for numbers upto
   999,999,999"
  (labels ((self (splits scales &optional acc)
             (if (null splits)
                 acc
                 (let* ((c (3dnum->word (car splits))))
                   (if (string= "" c)
                       (self (cdr splits) (cdr scales) acc)
                       (progn
                         (push c acc)
                         (push (car scales) acc)
                         (self (cdr splits) (cdr scales) acc)))))))
     (join-words (reverse (self (split-n n) *scales*)))))

(defun replace-all (string part replacement &key (test #'char=))
"Returns a new string in which all the occurences of the part
is replaced with replacement."
    (with-output-to-string (out)
      (loop with part-length = (length part)
            for old-pos = 0 then (+ pos part-length)
            for pos = (search part string
                              :start2 old-pos
                              :test test)
            do (write-string string out
                             :start old-pos
                             :end (or pos (length string)))
            when pos do (write-string replacement out)
            while pos)))

How about brute force?

Having identified that an index b-tree is a suitable data structure for our needs, and we have already figured out the generation of a word equivalent, it would be straightforward to generate an indexed b-tree of billion numbers. We can then traverse the tree in sorted order to update the index which is the number of letters covered. Then a simple lookup using the index should suffice.

However this approach will suffer from the problem of time to build an index, sorted b-tree as we have a billion numbers. The time is dependent upon the processing power and memory availabe on the machine that builds the data structure. Of course, we could use a large number of machines (or multiple cores) so that we can build the data structure faster. Even in that case, we still have to ensure that we distribute the numbers evenly between the machines so that all parallel computations complete around the same time!

But would there not be an elegant way out? It turns out, there is!

Elegant way out

We have already identified that there is a repititive pattern of digits. We used this property to generate a word equivalent. Then the same property should be helpful in lexicographical ordering of the words. Is this true? Let us find out.

If we sort just the first 999 numbers, the sorted list (A) looks something as shown below:
Picture
First 999 numbers in lexical order
Now consider the numbers upto 999,999. These are actually a group of 999 numbers where each group starts with [1000;2000;3000;999,000]. Their word equivalents are [onethousand,twothousand,threethousand,..,ninehundredninetyninethousand]. These word equivalents can be generated by adding the suffix ’thousand’ to the word equivalents of numbers from 1 to 999, which is our pattern (building block). So the sort order for the numbers upto 999,999 remains the same when expressed as a group of numbers as described above. Further each such group contains an additonal 999 numbers, which happen to be 1 to 999. The same argument holds true for numbers upto 1 billion. Just that a group in this case will contain 1 million numbers.

Overall the three sorted lists A, B, C respectively of multiples of one, thousand and million are shown below.
Picture
Multiples of 1, 1000 and 1,000,000 in lexical order
Since any number is either a million or thousand or hundred, we just need to know the lexicographical position of the successive scales starting from the highest scale. Thus for a number that is multiple of million, I need to find out the sorted order position of the millionth part of the number, followed by that for the thousandth part and then for the hundredth part.

As we have a billion numbers to look at, we need to maintain the following 3 databases:

  • A database of all the numbers that represent a multiple of one, thousand and million, denoted asmdb.
  • A database of all the numbers that represent a multiple of one and thousand, denoted as tdb.
  • A database of all the numbers that represent a multiple of one, denoted as odb.
These databases are shown visually below, in terms of A, B, C above.
Picture
Databases containing multiples of (1 1000 and 1000000), (1 and 1000) and 1 in lexical order
Now if we can precompute the number of letters counted and the sum, we can have an elegant solution to our problem.

Precomputing the sum and number of letters

Since we know the word equivalent of a number, we also know its length. For a number that represents a multiple of one, its length can be denoted as: \(l_i\). A number that is multiple of 1 is the smallest scale and we need to look no further. So the overall sum of all letters in word equivalents of numbers that are multiple of 1 can be represented as:
$$l_{ones} = \sum_{i=1}^{999} l_i$$
Now lets consider a number that represents a multiple of thousand. Let the length of its word equivalent be denoted by \(l_t\). Since such a group contains further 1000 numbers, the length for the group can be described as
$$l_{tg} = l_t + l_{ones}$$
So the total number of letters covered upto 1 million can be denoted as
$$l_{thousands} = \sum_{g=1000}^{999000} l_{tg}$$
Similary for a multiple of millions, the formula is
$$l_{mg} = l_m + l_{thousands}$$
So the total number of letters covered upto 1 million can be denoted as
$$l_{thousands} = \sum_{g=1000}^{999000} l_{tg}$$
Similary for a multiple of millions, the formula is
$$l_{mg} = l_m + l_{thousands}$$

Here we do not need to compute the overall summation as we are scaling only upto a billion.

On the same lines, the sum of numbers covered can also be computed.
$$s_{thousand} = \sum_{i=1}^{999} i$$
$$s_{million} = \sum_{i=1}^{999999} i$$
$$s_{tg} = t + s_{thousand}$$
where t = (1000 or 2000 or ... 999000)
$$s_{mg} = m + s_{million}$$
where m = (1000,000 or 200,000 or 999,000)

Algorithm

Now we can describe the entire algorithm. As a first step, we build the 3 databases which contains the multiples of one, thousand and million. Each database is an indexed btree. Next we update the btree which contains the multiples upto million with the sum and letters covered using the formuale described in the previous section.

Now given the number of letters covered, which is actually an index into our database, we look up the number starting with the millions database using this index. We find the first node n in the btree such that for this node n, if index is \(k_found\),
$$k_{found} >= index$$
If \(k_{found}\) is index, we have found our number. Else, we navigate to the previous node in the tree and check if that is the node we are looking for by computing the number of letters covered. Otherwise, this node forms a part of number and serves as a key into the next database, which is thousands database if it is multiple of thousand or one’s database. Before searching in the next database, we update that database with the letters and sums on each node, as the key is a prefix to each number in the next database., 
Every node of the binary tree contains the following data:
  • The number n
  • The word equivalent of the number n
  • The sum of numbers upto n when traversed in sorted order
  • The total number of letters covered upto n when traversed in sorted order
This is the code that creates the btrees, and finds an integer.
(defun find-integer (idx db &optional (n 0) (sum 0) (vals (list 0 0)))
  (with-btree-cursor (mcur (get-index db 'letter))
    (multiple-value-bind (f i v k) (cursor-pset-range mcur idx)
      (declare (ignore k))
      (if f
          (if (= i idx)
              (values (+ n (car v)) (+ sum (caddr v)))
              (multiple-value-bind (found index value key) 

                    (cursor-pprev mcur)
                (declare (ignore key))
                (if found
                    (let ((check (+ index (cadr value) (cadr vals))))
                      (cond ((= idx check) (values (+ n (car value))
                                                   (+ sum (car value) 

                                                              (caddr value))))
                            (t (let* ((next-idx (- idx index 

                                                       (cadr value) 
                                                       (cadr vals)))
                                      (pn (+ (car vals) (car value)))
                                      (pl (+ (cadr vals) (cadr value)))
                                      (nvals (list pn pl))
                                      (db (select-db (car value) nvals)))
                                 (if db
                                     (find-integer next-idx
                                                         db
                                                         (+ n (car value))
                                                         (+ sum (caddr value))
                                                         nvals)))))))))))))
A visual depiction of the algorithm is shown below.
Picture
Word numbers algorithm
Since we use an indexed B-Tree, the efficiency of the algorithm while creating and updating a btree is
$$O(n  (log n))$$
while looking up the number is 
$$\approx O(log n)$$. 
Hence the overall efficency is:
$$O(n  (log n))$$

Disclaimers

The solution is tested on SBCL 1.0.39 on Mac OS X 10.6 using cl-elephant as the btree implementation. I would like to emphasize that the solution was not generated in the sequence described above. I did some detailed whiteboarding and tried out code snippets and finally figured out the solution.

Further, the solution uses only \(999 * (3 + 2 + 1) = 5994\) nodes. So an in-memory version of a binary search tree should as well suffice. Since I had a ready to use disk based indexed B-Tree implementation, I chose to use that.

What is the number at 51th billion letter?

As per my solution, there is no number that ends at 51 billion. In fact, there is a number that ends at 51 billion and one. And that number is 676465328.

The sum of numbers upto that point is 206392395198132239862723. From the code run at REPL
* (word-nums:find-at 51000000001) 
==> 3676465325 
        4206392395198132916327723

0 Comments

Your comment will be posted after it is approved.


Leave a Reply.

    Picture

    Me

    I am a 3D graphics software engineer.

    profile for dmsurti at Stack Overflow, Q&A for professional and enthusiast programmers

    Archives

    December 2011
    November 2011
    October 2011
    September 2011
    August 2011
    July 2011
    June 2011
    May 2011
    April 2011
    August 2010
    January 2010
    December 2009
    September 2009
    August 2009

    Categories

    All
    Automator
    Books
    Code
    Continuations
    Css
    Drakma
    Education
    Express Card
    External Drive
    Html
    Ita
    Learning
    Life
    Lisp
    Mac
    Macros
    Mercurial
    Productivity
    Puzzle
    Reading
    Rss
    Simplicity
    Sudoku
    Work

    RSS Feed

    Picture
    me @ delicious
Powered by Create your own unique website with customizable templates.
  • Home
  • Blog
  • Mobile Apps
  • Open Source
  • Cool 3D Demos
  • About