The array code in the previous post on binary search in haskell is messy. Mostly due to my lack of style and the use of arrays, which add a layer of indirection and bounds checks.
This time, I'm going to rewrite the examples to be slightly more readable and I'll implement a linear interpolation search routine. The latter could be handy if function evaluation or array access (access to data on disk?) are relatively slow, and the data is sorted and relatively linear. Although the implementations could be coded even tighter, with less checks, I skip that optimisation for the sake of readability.
First, binary search:
binarySearch :: (Int -> Int) -> Int -> Int -> Int -> Maybe Int
binarySearch f val low high = loop low high
where
loop low high
| high < low = Nothing
| (f mid) > val = loop low (mid - 1)
| (f mid) < val = loop (mid + 1) high
| otherwise = Just mid
where
mid = low + (high - low) `div` 2It is a fairly tight loop, and I think fairly readable. Then in similar vein, linear interpolation search:
interpSearch :: (Int -> Int) -> Int -> Int -> Int -> Maybe Int
interpSearch f val low high = loop low high
where
loop low high
| high < low = Nothing
| fm > val = loop low (mid -1)
| fm < val = loop (mid + 1) high
| otherwise = Just mid
where
mid = low + (val - fl) * (high - low) `div` (fh - fl)
fm = f mid
fl = f low
fh = f highThe difference between the two routines is in the calculation of the midpoint. The linear interpolation of the second one is way heavier, but with large amounts of reasonable, fairly linear data it should outperform the binary search method. Big O love - log2(N) vs log2(log2(N))[on average].
The performance of the binary search method depends only on the size of the interval, while the linear interpolation search depends on the linearity of the data. The two search methods are closely related to the bisection and the secant numerical methods for finding roots of functions. You could substitute the mid-point calculation with quadratic or cubic interpolation, or indeed any other interpolation technique or a combination of them, but that usually gets too heavy for normal applications.
One obvious generalisation is to enable the use any old method for calculating the midpoint:
bisect :: (Int -> Int) -> Int -> Int -> Int -> Int
bisect f val low high = low + (high - low) `div` 2
secant :: (Int -> Int) -> Int -> Int -> Int -> Int
secant f val low high = mid
where
mid = low + (val - fl) * (high - low) `div` (fh - fl)
fm = f mid
fl = f low
fh = f high
intSearch mf f val low high = loop low high
where
loop low high
| high < low = Nothing
| (f mid) > val = loop low (mid -1)
| (f mid) < val = loop (mid + 1) high
| otherwise = Just mid
where
mid = mf f val low high
binarySearch' :: (Int -> Int) -> Int -> Int -> Int -> Maybe Int
binarySearch' = intSearch bisect
interpSearch' :: (Int -> Int) -> Int -> Int -> Int -> Maybe Int
interpSearch' = intSearch secant
Still fairly readable, and more importantly, the relation between the underlying maths - the recurrent function, and the code is explicit. The code can be refactored further to suit other needs. For example, it can be refactored into a fold . build pipeline, or as a corecursive functions. Each of those can help with compiler optimisations and more importantly to serve as blocks for compositional search algorithm lego. Alas, that will wait for now.