Geoff Hinton talking about neural networks (2007)

Saw this on YouTube: G. E. Hinton on neural networks for machine learning, a talk at Google.


Doppelganger search

 

 

Image

Sophie Robehmed, a freelance journalist, is searching for her own doppelganger using the power of the internet: social networks like Twitter and Facebook, doyoulooklikeme.wordpress.com, etc. It seems that Google’s neural-network-based image search (search link for above image) (search with alternative image) is not able to provide this functionality yet.

The image search is able to find other copies of the original flyer, lots of pictures of women with brown hair, and pictures of people surrounded by colours similar to those in the image you are searching with. I guess that is pretty impressive, but it is not doppelganger level yet. This is to be expected, since it is image search, not face search. I think better software is available for face matching, and there is software for automatic tagging of pictures (Facebook would make a good database for training your neurons!).

Detexify – Find LaTeX symbols

This website (http://detexify.kirelabs.org/classify.html) is really useful if you don’t know the markup/command for the LaTeX symbol you want to write, provided you are able to draw it.

Detexify

The website talks about the software learning how to recognise the symbols so I guess it is using some neural network scheme.

LaTeX Poster / Modelling Thermal Conductivity

Norman Gray and Graeme Stewart at Glasgow have provided useful examples for making posters using LaTeX.

I used these to make a tex file, which produced a poster using pdflatex. It should also be possible using the latex command if you include graphics as eps rather than pdf. If you convert graphics to pdf or ps, make sure that you have a bounding box (use the pstoeps command, for example), otherwise your poster will break horribly. It was important to convert from eps to pdf using the epstopdf command rather than ps2pdf.

This saved me time by allowing me to use the equations from LaTeX directly, and allowed me to include my graphs at high resolution. It doesn’t make sense to convert them to jpg or another bitmap format just to include them in a PowerPoint presentation file.

You can find the poster here: Poster (best viewed with xpdf)
and the latex file to make it here: Poster source file

Thanks to the people who showed an interest in the poster. I wish I had taken a picture at the conference to make this post more dynamic!

You can also see this poster on my website http://mathewpeet.org/publications/posters/

Empirical Rant

In metallurgy we often call very simple models ‘empirical models’, in contrast to ‘physical models’. I really wish there was a better name for the ‘empirical models’, since physical models are more empirical and ‘empirical models’ are actually less empirical. Such equations can be very useful because they provide a summary of observations within some range of observed behaviour. Even when a physical model exists, these simple models are often still preferred because of the ease with which they can be used.

The source of my confusion is the now contradictory uses of the word empirical…

Physical models incorporate more physical understanding and are based on theory. But any theory can only be based on, and validated against, observations. (Edit: i.e. empirical observations)

A better description for our ‘empirical models’ would be ad hoc, make-do, summary or arbitrary.

Comparison of empirical and physical models
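This is best described by an example. The martensite start temperature (MS) is often described by an equation of the form MS = A*XC + B*XMn + C*XCr…, where XC, XMn and XCr are the concentrations of the alloying elements and A, B and C are fitted constants. For example:

MS (°C) = 521 – 353.C – 225.Si – 24.3.Mn – 27.4.Ni – 17.7.Cr – 25.8.Mo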

Another example is the use of various ‘carbon equivalents’.
Carbon Equivalent = CE = C + Mn/5 + Mo/5 + Cr/10 + Ni/50

Thomas Sourmail and Carlos Garcia-Mateo have written a paper on the prediction of MS by various methods (Critical assessment of models for predicting the Ms temperature of steels, T. Sourmail and C. Garcia-Mateo, Comp. Mater. Sci., 2005:34, p323-334). It is available on Thomas’s webpage: Predicting the martensite start temperature (Ms) of steels.

Ms / K, all compositions in wt%
[8]  772 - 316.7 C - 33.3 Mn - 11.1 Si - 27.8 Cr - 16.7 Ni - 11.1 Mo - 11.1 W
[9]  811 - 361 C - 38.9 Mn - 38.9 Cr - 19.4 Ni - 27.8 Mo
[10] 772 - 300 C - 33.3 Mn - 11.1 Si - 22.2 Cr - 16.7 Ni - 11.1 Mo
[11] 834.2 - 473.9 C - 33 Mn - 16.7 Cr - 16.7 Ni - 21.2 Mo
[12] 812 - 423 C - 30.4 Mn - 12.1 Cr - 17.7 Ni - 7.5 Mo
[12] 785 - 453 C - 16.9 Ni - 15 Cr - 9.5 Mo + 217 (C)^2 - 71.5 (C)(Mn) - 67.6 (C)(Cr)
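
The linear equations listed above are simple enough to evaluate in a few lines. The following is a minimal sketch in Python, using the coefficients of rows [9] and the linear [12] exactly as listed; the function name and the example composition are my own assumptions for illustration, not taken from any of the cited papers.

```python
# Linear Ms equations in kelvin, compositions in wt%.
# Intercepts and coefficients are copied from rows [9] and [12] (linear form) of the list above.
MS_EQUATIONS = {
    "[9]": (811.0, {"C": -361.0, "Mn": -38.9, "Cr": -38.9, "Ni": -19.4, "Mo": -27.8}),
    "[12] linear": (812.0, {"C": -423.0, "Mn": -30.4, "Cr": -12.1, "Ni": -17.7, "Mo": -7.5}),
}


def ms_kelvin(equation, composition):
    """Evaluate one linear Ms equation; elements missing from `composition` count as 0 wt%."""
    intercept, coefficients = MS_EQUATIONS[equation]
    return intercept + sum(
        coeff * composition.get(element, 0.0)
        for element, coeff in coefficients.items()
    )


# Hypothetical steel, chosen only to illustrate the arithmetic (wt%).
example_steel = {"C": 0.4, "Mn": 0.7, "Cr": 1.0}

ms_row_9 = ms_kelvin("[9]", example_steel)
ms_row_12 = ms_kelvin("[12] linear", example_steel)
print(f"[9]:  {ms_row_9:.1f} K")
print(f"[12]: {ms_row_12:.1f} K")
```

Keeping the coefficients in a dictionary per equation makes it easy to add the other rows and compare how far the different fits disagree for the same steel.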

Potency of elements on the MS temperature (change per weight percent).

Element   N     C     Ni    Co    Cu    Mn    W     Si    Mo    Cr    V     Al    Ref.
Change   -450  -450  -20   +10   -35   -30   -36   -50   -45   -20   -46   -53    P-1976
  • P-1976 F. B. Pickering, `Physical metallurgy of stainless steel developments’, Int. Met. Rev., 21, pp 227-268, 1976.
  • 8 P. Payson and C. H. Savage. Trans. ASM, 33:261-281, 1944.
  • 9 R. A. Grange and H. M. Stewart. Trans. AIME, 167:467-494, 1945.
  • 10 A. E. Nehrenberg. Trans. AIME, 167:494-501, 1945.
  • 11 W. Steven and A. G. Haynes. JISI, 183:349-359, 1956.
  • 12 K. W. Andrews. JISI, 203:721-727, 1965.
  • 13 C. Y. Kung and J. J. Rayment. Metall. Trans. A, 13:328-331, 1982.

Neural network models have been developed to predict both martensite start and bainite start temperatures. It is also possible to calculate these using ‘physically’ based models based on thermodynamics.

They’re all just maths! 🙂

Casino Royalties

Bond

The new Bond movie, Casino Royale, has been out for a few weeks now and seems to be doing fairly well at the box office.

Previously I had posted about my neural network model of James Bond box office takings and predicted that the movie should make between 350 and 500 million USD, depending upon the amount of kiss kiss and bang bang in the film. The ‘average’ Bond film would make 350 million and a high-grossing film should make 350-500 million.

Network, Neural Network. License to model?

Image from James Bond starting sequence.

I’ve created a neural network analysis of James Bond movies using Neuromat’s Model Manager software. The Neuromat software implements a neural network within a Bayesian framework, using the methods developed by David MacKay.

Training the model

I hoped to find which factors are important in making a successful Bond film, and then to predict the revenue of the James Bond film to be released in November, Casino Royale. The model is limited to variables that can be easily quantified and to which I had easy access.

The inputs were the number of female conquests by Bond, the number of Martinis he drinks, the number of licensed kills, the year of the film, and the number of times Bond introduces himself with the catch-phrase ‘Bond, James Bond’. The worldwide box office for each film in dollars was converted to a present-day value using the year of release and US inflation rate data.
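
The inflation adjustment can be sketched as a ratio of price indices. This is only an illustration of the arithmetic, assuming a price index keyed by year; the function name is hypothetical and the index values below are made-up placeholders, not the US inflation data used for the database.

```python
def to_present_value(nominal, release_year, price_index, base_year=2006):
    """Scale a nominal amount by the ratio of the base-year index to the release-year index."""
    return nominal * price_index[base_year] / price_index[release_year]


# Made-up index values purely for illustration; NOT real inflation data.
EXAMPLE_INDEX = {1996: 100.0, 2006: 125.0}

# A hypothetical film taking 200 million in 1996 would count as 250 million in 2006 money.
adjusted = to_present_value(200.0, 1996, EXAMPLE_INDEX)
print(adjusted)
```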

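After adjusting the box office takings for inflation the database looked like this (“BJB” is the number of ‘Bond, James Bond’ introductions and “M$*2006” is the worldwide box office in millions of 2006 dollars):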

   "Conquests"   "Martinis"   "Kills"   "BJB"    "Year"    "M$*2006"      "Label"
        2              2         16       1      2002       470.44      "Die_Another_Day"
        3              1         19       2      1999       423.91      "The_World_Is_Not_Enough"
        3              1         25       1      1997       419.98      "Tomorrow_Never_Dies"
        2              1         12       1      1995       465.61      "GoldenEye"
        2              1         12       1      1989       262.48      "Licence_To_Kill"
        2              2          2       1      1987       347.68      "The_Living_Daylights"
        4              0          5       2      1985       292.92      "A_View_To_A_Kill"
        2              0         14       1      1983       381.22      "Octopussy"
        2              0         11       2      1981       480.78      "For_Your_Eyes_Only"
        3              1         14       1      1979       651.71      "Moonraker"
        3              1         14       1      1977       690.11      "The_Spy_Who_Loved_Me"
        2              0          1       2      1974       385.46      "The_Man_With_The_Golden_Gun"
        3              0          6       1      1973       658.51      "Live_and_Let_die"
        1              0          7       1      1971       652.83      "Diamonds_are_Forever"
        3              1          8       2      1969       408.40      "On_Her_Majesty's_Secret_Service"
        3              1          21      0      1967       758.09      "You_only_live_twice"
        3              0          22      0      1965      1004.90      "Thunderball"
        2              1          10     1.5     1964       900.42      "Goldfinger"
        4              0          17      0      1963       575.94      "From_Russia_with_Love"
        3              2          5       1      1962       439.60      "Dr_No"

The data is also represented in the figure directly below, with each variable normalised by dividing by its maximum value in the database.

Variation of inputs and box office with year.

After training on half of the data, 208 potential models were tested on their ability to predict the unseen data. When attempting to create a committee from the best models, it was found that the best predictions could be made using just one model. This model was retrained using all of the data. Bayesian inference should automatically prevent overtraining, since each model represents a distribution of weights and complex relationships are penalised.
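
The committee selection step can be sketched as follows. This is my own minimal illustration of the idea (rank models by test error, then grow the committee one model at a time and keep the size with the lowest error), not the code used by the Neuromat software; the function names and the toy predictions are assumptions.

```python
def mean_squared_error(predicted, target):
    """Average squared difference between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def best_committee(model_predictions, target):
    """Return (committee size, test error) for the best committee of top-ranked models.

    `model_predictions` holds one list of test-set predictions per model.
    A committee's prediction is the plain average of its members' predictions.
    """
    ranked = sorted(model_predictions, key=lambda preds: mean_squared_error(preds, target))
    best_size, best_error = 0, float("inf")
    for size in range(1, len(ranked) + 1):
        members = ranked[:size]
        committee = [sum(values) / size for values in zip(*members)]
        error = mean_squared_error(committee, target)
        if error < best_error:
            best_size, best_error = size, error
    return best_size, best_error


# Toy numbers, invented for illustration only.
toy_target = [1.0, 2.0, 3.0]
toy_models = [[2.0, 3.0, 4.0], [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
size, error = best_committee(toy_models, toy_target)
print(size, error)
```

In this toy case one model already matches the targets, so adding the others only makes the average worse, which is the same situation as described above where a committee of one gave the best predictions.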

The graph below is a plot of the output against the target after selection of the committee and retraining with all the data. The failure of all the points to lie on the line could indicate that we haven’t taken account of all the factors which influence the box office takings; the two most profitable films, Goldfinger and Thunderball, out-perform the expectation of the model.
Committee predictions of the training data.

The significance of each input as perceived by the model shows that the year of the film and the number of kills have a strong influence on the box office takings, as we can see directly below. The number of conquests also has an influence, but the number of Martinis drunk and the use of the ‘Bond, James Bond’ catch-phrase are not very important at the box office.

James Bond input Significances

Bond Movie Trends

To see the trends in the data, predictions were made using average values of the inputs in the database, and stepping each value. The average values were 2.6 conquests, 0.75 martinis, 12.05 kills, 1.12 utterings of ‘Bond, James Bond’ and year of 1979.
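
The averages quoted above can be checked directly from the database, and the same code shows how one input is stepped while the others are held at their mean. This is a sketch under my own naming; the rows are copied from the table earlier in this post, and the trained model itself is not reproduced here.

```python
# Input columns copied from the database table: conquests, martinis, kills, BJB, year.
NAMES = ("conquests", "martinis", "kills", "bjb", "year")
ROWS = [
    (2, 2, 16, 1, 2002), (3, 1, 19, 2, 1999), (3, 1, 25, 1, 1997), (2, 1, 12, 1, 1995),
    (2, 1, 12, 1, 1989), (2, 2, 2, 1, 1987), (4, 0, 5, 2, 1985), (2, 0, 14, 1, 1983),
    (2, 0, 11, 2, 1981), (3, 1, 14, 1, 1979), (3, 1, 14, 1, 1977), (2, 0, 1, 2, 1974),
    (3, 0, 6, 1, 1973), (1, 0, 7, 1, 1971), (3, 1, 8, 2, 1969), (3, 1, 21, 0, 1967),
    (3, 0, 22, 0, 1965), (2, 1, 10, 1.5, 1964), (4, 0, 17, 0, 1963), (3, 2, 5, 1, 1962),
]


def column_means(rows):
    """Mean of every input column, keyed by column name."""
    return {name: sum(row[i] for row in rows) / len(rows) for i, name in enumerate(NAMES)}


def sweep(rows, name, steps=5):
    """Input points with `name` stepped from its minimum to its maximum, others at their mean."""
    means = column_means(rows)
    index = NAMES.index(name)
    low = min(row[index] for row in rows)
    high = max(row[index] for row in rows)
    points = []
    for k in range(steps):
        point = dict(means)
        point[name] = low + (high - low) * k / (steps - 1)
        points.append(point)
    return points


means = column_means(ROWS)
kill_sweep = sweep(ROWS, "kills")
print(means)
print([point["kills"] for point in kill_sweep])
```

Each point in `kill_sweep` would then be fed to the trained model to draw the trend line for kills; the other four inputs are swept in the same way.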

Bond, James Bond
Pierce Brosnan drinking a Martini cocktail.
James Bond, Martinis

There is a linear decrease in worldwide box office takings with year; this may be due to a decrease in the popularity of James Bond or a general decrease in the total size of the international box office. This prediction is for the average number of kills; however, films since 1995 had a higher than average number of kills (and conquests, Martinis and ‘Bond, James Bond’s) and made ‘average’ box office takings, as seen in the database. The recent films have therefore maintained their box office takings at about 400 million dollars.

There may be a trend that recent films make less at the box office but more from secondary sources such as movie rental, merchandising and cross-promotion, in which case box office takings may not be the best index of the profitability or popularity of a film. For example, many James Bond computer games exist, such as GoldenEye, which was popular in its own right and would have generated a large amount of revenue.
James Bond, Year

The number of kills in the film has a strong positive correlation with the box office takings. The film with the most kills was Tomorrow Never Dies (1997) with 25, followed by Thunderball (1965) with 22 and You Only Live Twice (1967) with 21. According to the model the trend continues to higher numbers of kills. It seems that action is popular in James Bond films.
James Bond, Kills

Daniel Craig poses with the Bond girls from Casino Royale 2006.

Surprisingly, the number of Bond’s conquests has a negative correlation with the box office takings. So according to the neural network analysis the producers should minimise the number of times Bond has to make this sacrifice in the line of duty. There is no film in which James Bond neglects to have a female conquest, but according to the neural network this would be more profitable.

James Bond, Conquests

In conclusion

According to this simple analysis the box office takings of the next James Bond movie can be maximised by increasing the number of on-screen kills and, contrary to expectation, by minimising the number of conquests. The neural network predicts a slightly larger box office with 0 conquests than with 1 conquest, although no Bond film exists where he does not go to bed with a Bond girl. It would be very brave of the producers to take this action, since it has been a characteristic feature of the Bond films. However, it seems excesses aren’t appreciated, presumably because they take time away from other kinds of action with broader appeal (or plot development?).

Of the factors included in the database, we saw that the number of Martinis drunk and the use of the Bond catch-phrase were not regarded as significant and had flat trend lines with small error bars. These factors can safely be removed from the database in a future model. Many other possible inputs can be imagined: for example, it is possible to get estimates of the budget for each film, which should be related to the number of stunts, or we could count the number of explosions, or the time that ‘Bond girls’ are on screen.

One factor which is not simple to include, but which is probably the most frequently discussed, is the actor playing Bond. There is no simple way to include this objectively in the model without extra information; perhaps the best way would be to use the wage the actor received, which should be at least a measure of his popularity as perceived by the production team.

Daniel Craig will play Bond in the 2006 movie, Casino Royale

If 2006 movie had same inputs as previous movies
The graph directly above shows the predicted box office if the previous movies were released in 2006. In reality the last 3 movies made 420, 424 and 470 million dollars at the box office; however, according to this model they should have made 250, 300 and 150 million if released in 2006. If we simply take an average of the last 3 movies we could expect the Bond movie to make about 440 million. The highest revenue predicted by the model would be for a new version of Thunderball, which is predicted to make 350-500 million. It seems that the model has been influenced by the downward trend in box office revenue between 1975 and 1990.

The prediction for the ‘average’ film above says that we can expect the movie to make 350 million dollars. Since the production of the film will have included some analysis of the previous movies, we can expect that it will have a large amount of violence and action, and so a high number of kills.

The failure of the model to predict the box office revenue of the most popular films suggests that not all of the most important factors have been included in the model. It might be worth considering other factors for which data is available, for example the estimated budget of the film. The model has succeeded in showing us trends in the data which can give us some idea of how a Bond film can be optimised.