Stabilization of Thermal Medical Images based on a user-selected area of interest

I’ve converted to HTML a PowerPoint presentation I made in 2006 about the algorithm we developed in the CWU imaging lab for aligning brain images. Here it is!

Image Registration

  • Image registration is the process of aligning images in such a way that their features can be related
  • For medical purposes, accurately registering images is essential for proper diagnosis

Uses of Image Registration

  • Characterization of normal vs. abnormal shape/variation
  • Functional brain mapping/removing shape variation
  • Surgical planning and evaluation
  • Image guided surgery
  • Pre-surgical Simulation

Methods of Image Registration

  • Manual
  • Automatic
      • Rigid
        • Global autocorrelation
        • Affine transform using points
        • Area ratios
      • Non-rigid
        • Mathematical models

Problem

  • We have a sequence of thermal images with local movement, global movement, and temperature changes
  • Movement can be indistinguishable from temperature change
  • The global movement needs to be removed
  • Local movement needs to be preserved

Factors

  • A thermal image is a set of discrete pixels
  • In a sequence of thermal images, each pixel intensity can be defined in terms of the previous image in the sequence
  • p_{i,j,k+1} = GlobalMotionPixel(p_{a,b,k}) + ThermalImpact(p_{a,b,k}) + LocalMotion(p_{a,b,k})

Solvability

  • Global movement can be solved for only when local movement and the temperature impact from surrounding pixels are known. Otherwise, the components are indistinguishable by observing the intensity of the destination pixels.

Other sources of information

  • Heartbeat
    • Detectable by looking at changes in blood vessel size
    • Can be used to find the temperature effects of blood flow
  • Ruler
    • Is detectable by using nonlinear color filters
    • Known to not have global movement
  • Bones
    • Easily seen and resistant to both local movement and temperature change

Ruler


  • Not visible in grayscale
  • Very low resolution
  • Influence from other pixels heavily distorts ruler’s edge
  • Use of an artificial ruler may help

Bones


  • Easy to see in grayscale
  • No local movement
  • Very little temperature change
  • Temperature influence from outside pixels is small

Basics of Autocorrelation

 

[Images: Image A, Image B, and Image A minus Image B]
  1. Place image A on image B
  2. Subtract pixels (or subpixels)
  3. Move image B in some direction, then repeat step 2
  4. The location where image B had the least difference from image A is the position where image B is registered with image A (see the sketch below)
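The original slides contain no code, but here is a minimal NumPy sketch of this kind of exhaustive shift search using absolute pixel differences. It is only an illustration of the idea (integer shifts, wraparound borders, arbitrary search range), not the code used in the paper.

import numpy as np

def best_shift(a, b, max_shift=5):
    """Try every integer (dy, dx) shift of image b within +/- max_shift and
    return the shift that minimizes the mean absolute difference from image a."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(b, dy, axis=0), dx, axis=1)  # wraps at borders
            err = np.mean(np.abs(a - shifted))
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best, best_err

# Toy example: b is a copy of a moved down 2 rows and right 1 column,
# so the best shift to register b back onto a is (-2, -1).
a = np.random.rand(64, 64)
b = np.roll(np.roll(a, 2, axis=0), 1, axis=1)
print(best_shift(a, b))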

Problems

  • Autocorrelation of whole images takes too long
  • Correlation of the entire image will remove some desired movement-like effects
  • Some global movement remains

Enhancements to Autocorrelation

  • Let user select most important features using different color mappings to bring out details
  • Do autocorrelation at a subpixel level on the parts that the user selected rather than the whole image
[Image: unfiltered image with no selections (before)]
[Image: the same image showing selected regions with various filters applied]

Advantages

  • Letting the user choose the most stable areas minimizes the chance that desired movement will be removed
  • Much faster than whole-image autocorrelation
  • Experimentally shown to produce better shifts

Measuring results

  • Compute pixel differences for selected area before and after shifts
  • Compute max, average, and min difference for each pixel
  • Average improvement = (average difference before shifts) / (average difference after shifts), computed as in the sketch below
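A small sketch of this metric, assuming the reference frame, the frames before and after shifting, and a boolean mask for the user-selected area are available as NumPy arrays (all names here are illustrative, not from the paper):

import numpy as np

def improvement(ref, before, after, mask):
    """Mean absolute pixel difference within the selected area (mask),
    before vs. after the stabilizing shift; a ratio > 1 means the shift helped."""
    d_before = np.abs(ref - before)[mask]
    d_after = np.abs(ref - after)[mask]
    stats = {"max": (d_before.max(), d_after.max()),   # max, average, and min
             "avg": (d_before.mean(), d_after.mean()),  # differences, as above
             "min": (d_before.min(), d_after.min())}
    return d_before.mean() / d_after.mean(), stats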

Results


  • Average improvement of 1.66 for selected area
  • Experiments using an artificial ruler showed slightly less improvement

Comparison with Area Ratio Conflation Algorithm

  • Regions were chosen in areas with large amounts of temperature change
  • Best regions not considered.

    Comparison with Area ratio conflation algorithm.

Comparison with UCSB WebReg

  • Our algorithm: worst pixel difference 15.5, average pixel difference 4.77259, best pixel difference 0
  • UCSB method: worst pixel difference 30.45, average pixel difference 10.29, best pixel difference 0.08
  • Our algorithm’s average improvement over the UCSB WebReg method: 2.715

[Image: points chosen by the UCSB method]
[Image: user-selected area of interest in the brain image used]

Future research

  • Use predictive thermal models to better match images
  • Try to learn parameters that users choose to identify stable image areas
  • Use a database of known anatomical features to help in identifying points that should remain stable

This blog post is based on a paper I coauthored, “An algorithm to stabilize a sequence of thermal brain images,” published in Proceedings of SPIE – The International Society for Optical Engineering, vol. 6512, February 2007.

You can read the full paper here:

https://www.researchgate.net/publication/252234562_An_algorithm_to_stabilize_a_sequence_of_thermal_brain_images

Kovalerchuk, Boris, Joseph Lemley, and Alexander M. Gorbach. “An algorithm to stabilize a sequence of thermal brain images.” Medical Imaging. International Society for Optics and Photonics, 2007.

Stock market prediction using machine learning (Elman, regression, and GMDH)

My primary interest is machine learning and computer vision, but in winter quarter, I took a graduate course in computational statistics.

We had a fun group project that involved using R to analyze stock prices, which later turned into a presentation at SOURCE 2016 after we added some machine learning techniques to make it more interesting.

There is a great R package called quantmod, which we used to get stock data: http://www.quantmod.com/

It is very easy to use, for example:


library(quantmod)
library(ggplot2) # Include ggplot so we can graph it. 
start <- as.Date("1986-03-01")
end <- as.Date("2015-12-30")
getSymbols(c('AAPL','MSFT','^IXIC','NDX'), from = start, to = end)

This loads the quantmod package and automatically gets stock price information between 1986-03-01 and 2015-12-30 for Apple, Microsoft, the NASDAQ Composite (^IXIC), and the NASDAQ-100 (NDX).

Want to quickly graph the closing prices of Microsoft stocks during that time? That’s just 2 lines of code:



MSFT.df = data.frame(date=time(MSFT), Cl(MSFT))

ggplot(data = MSFT.df, aes(x = date, y = MSFT.Close)) + geom_point() + geom_smooth(se = F) + labs(x = "Date", y = "Close")

Closing prices of Microsoft stock as given by quantmod package and graphed with ggplot2.

As you can see, R facilitates very fast data analytics.

We went on to make some simple predictive regression models using the R packages RSNNS and GMDH.

Like most R packages, RSNNS is very easy to use:


library(quantmod)  # for stock data
library(RSNNS)     # Stuttgart Neural Network Simulator

The training and prediction code segment is here:


# Train an Elman recurrent network on the closing prices, then append the
# prediction for the next time step to the running list of predictions.
modelElman = elman(df$date, df$MSFT.Close, size = 8, learnFuncParams = c(0.1), maxit = 1000)
pre = append(pre, predict(modelElman, n + 1)[1])

We ran this in a loop to get a series of predictions for various dates.

It’s similarly easy to use the GMDH model:


#####create time series
n = nrow(df)
stock <- ts(df, start=1, end=n, frequency=1)
#####predict
out = fcast(stock, input = 3, layer = 4, f.number = 1, tf = "all")
pre = append(pre,out$mean[1])

We then did a simulation to see which method performs best on a range of stocks, using a simple investment strategy (a sketch of the rule in code follows the list):

Every time the model says the stock prices will go up tomorrow, buy 10 shares.
Every time the model says the stock prices will go down tomorrow: sell everything!
Continue for a year.
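Here is a minimal sketch of that trading rule, written in Python rather than R for brevity; it is not the exact code we used, and predicted and actual are assumed to be aligned lists of predicted and actual daily closing prices.

def simulate(predicted, actual, shares_per_buy=10):
    """Buy 10 shares whenever tomorrow's predicted close is above today's close;
    sell everything whenever it is predicted to be lower. Returns the profit."""
    cash, shares = 0.0, 0
    for today in range(len(actual) - 1):
        if predicted[today + 1] > actual[today]:   # model says "up": buy 10 shares
            cash -= shares_per_buy * actual[today]
            shares += shares_per_buy
        else:                                      # model says "down": sell everything
            cash += shares * actual[today]
            shares = 0
    return cash + shares * actual[-1]              # mark remaining shares to market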

Elman neural networks gave the best results on a per-stock basis, followed very closely by GMDH, with regression far behind. Interestingly, however, if you were to follow this strategy with all the models in 2015, you would actually gain money from both Elman and regression. Surprisingly, GMDH lost money.

This is what you’d make if you used our models and this investment strategy on Yahoo, JP Morgan, CMS Energy Corporation, Verizon, Apple, and Microsoft.

2015:
Elman: $1334.999
Regression: $383.696
GMDH: -$623.0998

It’s surprising that an Elman neural network did this well with only closing prices. Obviously, closing prices alone are not very reliable predictors of future stock prices, but it managed anyway.

Clearly no one should actually use such a simple method with real money at stake, but it’s still interesting.

I’m graduating and I’ve been accepted to a PhD program in Ireland

I’m very happy to announce that I’m graduating and that I’ve been accepted to a PhD program in Ireland. I can’t wait to push the boundaries of my knowledge of deep learning and image processing.

I defended my master’s thesis on May 27 and am all set to graduate this spring. The thesis defense went very well. My wife and a friend recorded it. I’d upload a recording of it here, but there is a temporary embargo on my thesis because we are planning a third publication based on some of the work I’ve been doing over the last month or so, if it works out.

[Image: thesis defense announcement]

On June 11, I’ll be speaking in Atlanta at COMPSAC 2016 to present my research on finding large empty areas in high-dimensional spaces.

I also published a full paper at the Modern AI and Cognitive Science Conference. The title was “Comparison of Recent Machine Learning Techniques for Gender Recognition from Facial Images,” and it can be accessed here: Modern AI and Cognitive Science 2016 paper 21
I also made a presentation aimed at a more general audience, which I presented at SOURCE 2016 and which can be accessed here: http://digitalcommons.cwu.edu/source/2016/cos/2/

It feels strange (but nice) to not have any urgently pressing deadlines after months of non-stop urgency.

Thesis due in 18 days

I’ve been too busy to write anything here for a while.

Some interesting output from my algorithm on a 2600-dimensional dataset with around 19,000 entries:

If (D2: Is Between 0 and 0.756863) AND
(D44: Is Between 0.069281 and 1) AND
(D225: Is Between 0 and 0.538562) AND
(D301: Is Between 0.0993464 and 1) AND
(D575: Is Between 0.0627451 and 1) AND
(D669: Is Between 0.538562 and 1) AND
(D752: Is Between 0.231373 and 1) AND
(D823: Is Between 0 and 0.598693) AND
(D1033: Is Between 0.454902 and 1) AND
(D1172: Is Between 0 and 0.635294) AND
(D1262: Is Between 0 and 0.945098) AND
(D1269: Is Between 0.0300654 and 1) AND
(D1418: Is Between 0 and 0.929412) AND
(D1509: Is Between 0 and 0.947712) AND
(D1577: Is Between 0 and 0.96732) AND
(D1615: Is Between 0.290196 and 1) AND
(D1629: Is Between 0.266667 and 1) AND
(D1787: Is Between 0.266667 and 1) AND
(D1977: Is Between 0 and 0.971242) AND
(D1986: Is Between 0 and 0.971242) AND
(D2130: Is Between 0 and 0.988235) AND
(D2177: Is Between 0 and 0.831373) AND
(D2287: Is Between 0.133333 and 1) AND
(D2416: Is Between 0.0261438 and 1) AND
(D2507: Is Between 0 and 0.836601) AND
(D2566: Is Between 0.0862745 and 1) AND
All other attributes range from 0 to 1.

Then the hyper-rectangle bounded by those intervals is empty and has a volume of 0.00351057.

If you counted every elementary particle in the universe, that number would be MUCH smaller than the number of holes in this dataset.
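To make the rule above concrete, here is a small hypothetical sketch (not my search algorithm, which finds such rectangles; this only verifies one) that checks whether a hyper-rectangle given as per-dimension intervals contains no data points and computes its volume as the product of the interval widths:

import numpy as np

def check_rectangle(data, bounds):
    """data: (n_points, n_dims) array with every attribute scaled to [0, 1].
    bounds: dict mapping dimension index -> (low, high); unlisted dimensions span [0, 1].
    Returns (is_empty, volume) for the hyper-rectangle described by bounds."""
    inside = np.ones(len(data), dtype=bool)
    volume = 1.0
    for dim in range(data.shape[1]):
        low, high = bounds.get(dim, (0.0, 1.0))
        inside &= (data[:, dim] > low) & (data[:, dim] < high)
        volume *= (high - low)
    return not inside.any(), volume

# Tiny example: 3 dimensions, rectangle restricted only in dimension 0.
data = np.random.rand(100, 3)
print(check_rectangle(data, {0: (0.0, 0.001)}))  # volume 0.001; emptiness depends on the draw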

Installing Theano, Scikit-learn, and Scipy on Windows with Cygwin.

I’ve started writing computer vision applications using Theano, but I’ve found it does not work very well with the Windows 7 + Visual Studio 2013 setup installed in the lab. The Theano docs indicate that Theano “should” be able to work with Cygwin, but that it has not yet been tested.

So I figured I’d try it out. Here is what I did.

1. Grab Cygwin: https://cygwin.com/
2. From the setup program, install fortran, gcc, BLAS, LAPACK, and Python 2.7, plus any other tools you’d like. Don’t forget to install the libgfortran and gcc-fortran libraries. To reduce the chances of future problems, I chose to install the source for these packages as well.

3. Get pip working. Use:
chmod 775 /usr/lib
chmod 775 /usr/bin
wget https://bootstrap.pypa.io/ez_setup.py -O - | python
easy_install pip

4. pip install numpy
5. pip install Tempita

6. Install SciPy with:
mkdir scipy
cd scipy

Use wget https://github.com/scipy/scipy/archive/master.zip to download the very latest source for scipy.

unzip master.zip
cd scipy-master

$ python setup.py install --user

7. pip install scikit-learn

8. pip install six
9. Install Theano from the latest build:

pip install --upgrade --no-deps git+git://github.com/Theano/Theano.git --user

10. Find the location where Theano stores cutils_ext (on my system it is ~/.theano/compiledir_CYGWIN_NT-6.1-2.3.1-0.291-5-3-x86_64-64bit--2.7.10-64/cutils_ext).

Copy cutils_ext.pyd to a new file named cutils_ext.dll (if you don’t, then importing Theano will fail):

cp cutils_ext.pyd cutils_ext.dll

If you do step 9 with just “pip install theano”, it will try to uninstall your SciPy and download a buggy one. Don’t let it!

Note: if you came here from Google and tried to install SciPy 0.16.1 on SunOS or Cygwin, you are probably looking for this:

There is currently a bug in the above SciPy branch related to a variable named infinity. It was fixed in November for the master branch, but the fix was not applied to the 0.16.1 archive.

If you use 0.16.1 from that link, you’ll get a compile error on Cygwin:

"error: Command "g++ -fno-strict-aliasing -ggdb -O2 -pipe -Wimplicit-function-declaration -fdebug-prefix-map=/usr/src/ports/python/python-2.7.10-1.x86_64/build=/usr/src/debug/python-2.7.10-1 -fdebug-prefix-map=/usr/src/ports/python/python-2.7.10-1.x86_64/src/Python-2.7.10=/usr/src/debug/python-2.7.10-1 -DNDEBUG -g -fwrapv -O3 -Wall -I/usr/include/python2.7 -I/usr/lib/python2.7/site-packages/numpy/core/include -Iscipy/spatial/ckdtree/src -I/usr/lib/python2.7/site-packages/numpy/core/include -I/usr/include/python2.7 -c scipy/spatial/ckdtree/src/ckdtree_query.cxx -o build/temp.cygwin-2.3.1-x86_64-2.7/scipy/spatial/ckdtree/src/ckdtree_query.o" failed with exit status 1"
See this commit for more info if you want to investigate that error: https://github.com/scipy/scipy/commit/832baa20f0b5

If you get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "theano/__init__.py", line 72, in <module>
    from theano.scan_module import scan, map, reduce, foldl, foldr, clone
  File "theano/scan_module/__init__.py", line 41, in <module>
    from theano.scan_module import scan_opt
  File "theano/scan_module/scan_opt.py", line 65, in <module>
    from theano import tensor
  File "theano/tensor/__init__.py", line 7, in <module>
    from theano.tensor.subtensor import *
  File "theano/tensor/subtensor.py", line 26, in <module>
    import theano.gof.cutils # needed to import cutils_ext
  File "theano/gof/cutils.py", line 295, in <module>
    compile_cutils()
  File "theano/gof/cutils.py", line 260, in compile_cutils
    preargs=args)
  File "theano/gof/cmodule.py", line 2014, in compile_str
    return dlimport(lib_filename)
  File "theano/gof/cmodule.py", line 289, in dlimport
    rval = __import__(module_name, {}, {}, [module_name])
ImportError: No module named cutils_ext

Try step 10 again.

Algorithms, Writing papers, Python, and classes.

I’ve been so busy that I have not had much time to write anything in a while.

This quarter I’m taking Swim Conditioning, Graduate Research, and CS 540 – Algorithms for Biological Data Analysis.

I’m also taking some Coursera courses in security and machine learning as a review.

I particularly like the machine learning course so far. This is the first time I’ve used IPython Notebook, which I highly recommend; it is included with Anaconda. The course also introduced me to GraphLab’s SFrame, which I’m very impressed by.

In just a few lines of code, I was able to create and test a regression model from a file and display it on a very nice graph.

I’m also the teaching assistant for CS 528 – Advanced Data Structures and Algorithms. I’m teaching the labs and grading the homework. Last Thursday I went over dictionaries, lists, and functions (including lambda functions, which I think are awesome) and discussed Python’s string functions, which will be needed for next week’s homework.

I’ve been working on finishing a paper that I plan to submit to an international conference this month.

Searching for Maximal Holes in Databases

I’ll be giving a presentation, open to the public, tomorrow on my research.

Here is my abstract:

The problem of finding maximal empty rectangles in a set of points in 2D and 3D space has been well studied, and efficient algorithms exist to identify maximal rectangles in 2D space. Unfortunately, such efficiency is lacking in higher dimensions, where the problem has been shown to be NP-complete when the dimensions are included in the input. We compare existing methods and suggest a novel technique to discover interesting maximal empty hyper-rectangles in cases where dimensionality and input size would otherwise make analysis impractical. Applications include big data analysis, recommender systems, automatic knowledge discovery, and query optimization.
Keywords: Maximal Empty Rectangle, Maximal Cuboid, Big Data

Curse of Dimensionality

I’ve been too busy to update this blog, but I have an upcoming presentation on my research into holes in multi-dimensional data, where I plan to present a new algorithm that can quickly generate useful results.

One of the big problems is that (until now) there have been no algorithms that scale well for finding empty hyper-rectangles.

This is related to something called “The curse of dimensionality”.

My wife is finishing up a drawing (she’s an artist) to include in my presentation, and I’m too tired to keep working, so I figured I’d post something about the curse here.

Here are some interesting lecture notes on The curse of dimensionality: http://math.arizona.edu/~hzhang/math574m/2014Lect10_curse.pdf

Pollard’s rho algorithm for fast factorization in Python

Algorithms to factor numbers are useful in many areas but especially encryption.

The most obvious method of factoring a number (by repeated division) is a very slow process (indeed, the security of modern encryption depends on this process being slow).
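For contrast, here is a minimal sketch of that naive trial-division approach (just for illustration; it is not from the book or from my implementation below):

def trial_division(n):
    """Factor n by repeatedly dividing out candidate divisors, O(sqrt(n)) divisions."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)   # whatever remains is a prime factor
    return factors

print(trial_division(1200))   # [2, 2, 2, 2, 3, 5, 5]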

Pollard’s Rho is a very fast, probabilistic method that can be used to find factors much faster than the standard method.

My implementation, based on the pseudocode on page 976 of Introduction to Algorithms, third edition, is as follows:



from random import randint
from fractions import gcd
from math import *

def PollardRho(n):
    i = 0
    xi = randint(0, n-1)        # x_1: random starting value
    k = 2
    y = xi                      # y holds the most recently saved x_k
    while i < n:                # the textbook version loops forever; n tries is plenty
        i = i + 1
        xi = (xi*xi - 1) % n    # x_i = (x_{i-1}^2 - 1) mod n
        d = gcd(y - xi, n)
        if d != 1 and d != n:
            print d             # found a nontrivial factor of n
        if i == k:
            y = xi
            k = 2*k

PollardRho(1200)

The first two lines initialize i to 0 (it becomes 1 at the start of the first loop iteration) and x_1 to a random integer between 0 and n-1.

The line xi = (xi*xi - 1) % n implements the recurrence x_i = (x^2_{i-1} - 1) mod n, which produces the next x_i value of the infinite sequence. (In Python, ^ is bitwise XOR, so squaring must be written as xi*xi or xi**2.)

Most pseudocode has this running forever with a while(true) loop, so I modified it to try only n times. This is probably overkill, because if n is composite we can expect this method to discover enough divisors to factor n completely after about n^(1/4) updates.

Adaptive Trapezoidal integration in Python

I’ve decided to start blogging about my experiences in graduate school and some of the more interesting projects I’ve been working on. As my first post I’ll cover the idea of trapezoidal integration and the Python code I wrote to do it.

The fundamental idea of trapezoidal integration is to use a sequence of trapezoids to perform integration numerically. Numerical integration is also called numerical quadrature.


from math import *

f1 = lambda x: x**3-x**2+3*x-cos(x)*x+atan(sin(x)+1)

First we import the math library, nothing special about that.

The next line may be unfamiliar to people who have not explored Python. Python allows so-called “lambda functions”. It’s a really cool feature that is great for declaring mathematical functions (as opposed to functions that perform non-mathematical tasks).

The above code is equivalent to saying f(x) = x^3-x^2+3x-cos(x)*x+arctan(sin(x)+1)


def ti1(f, a, b):
    # Single trapezoid over the whole interval [a, b].
    return ((b - a) / 2.0) * (f(a) + f(b))

def diff2(f, x, h=1E-6):
    # Central-difference approximation of the second derivative f''(x).
    return (f(x - h) - 2*f(x) + f(x + h)) / float(h*h)

def trapezint(f, a, b, n):
    # Composite trapezoidal rule with n subintervals of width h.
    h = (b - a) / float(n)
    part1 = 0.5 * h * (f(a) + f(b))
    total = 0.0
    for i in range(1, n):
        total += f(a + i*h)
    return part1 + h*total

def adaptivetrap(f, a, b, ep):
    # Estimate max|f''| on [a, b] by sampling 1000 points.
    maxd2 = 0.0
    step = abs(b - a) / 1000.0
    for i in range(1, 1001):
        dval = diff2(f, a + step*i)
        if abs(dval) > maxd2:
            maxd2 = abs(dval)
    # The composite trapezoid error satisfies |E| <= (b-a) * h^2 * max|f''| / 12,
    # so choosing h = sqrt(12*ep / ((b-a) * max|f''|)) keeps the error within ep.
    h = sqrt(12*ep / ((b - a)*maxd2))
    n = (b - a) / h
    return trapezint(f, a, b, int(ceil(n)))

print adaptivetrap(f1, 0.0, 10.0, 1E-5)

The call above takes the f1 we defined earlier and integrates it from 0 to 10 using the adaptive trapezoid rule with an error tolerance of 1E-5.
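As a quick sanity check (not part of the original post), you can integrate a function with a known antiderivative and compare, assuming the definitions above:

g = lambda x: x**3              # exact integral over [0, 10] is 10**4 / 4 = 2500
approx = adaptivetrap(g, 0.0, 10.0, 1E-5)
print approx, abs(approx - 2500.0)   # the error should be within the requested tolerance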