Archive for the ‘data’ Category

Accessing Papers from University of Cambridge

1 October, 2007

The University of Cambridge news feed has announced that Raven passwords are now used to access electronic resources.

The ATHENS system of controlling access to networked services has been centrally funded as a national service for many years. That funding is being withdrawn by the JISC in favour of national Federated Access Control based on local user login.

ATHENS handles around 3 million users and 250 online services, but it is a purely UK solution for providing access to remote services. The JISC has established that a federated solution using open source software and international standards will lead to a single sign-on system, with individuals using their institutional login for both national and local services. This will also tie in with strategies for e-learning and e-science.

Raven is the web authentication system administered by the University of Cambridge Computing Service.

Using gnuplot for programming

18 February, 2007

Here’s an example script to do a calculation in gnuplot. gnuplot supports many of the mathematical functions found in the C programming language, such as sin(x) and erf(x). It is easy to extend, since it is possible to define functions and perform logic.

Save this into a file called calc.gnu and run it with the command > gnuplot calc.gnu

# gnuplot script by Mathew Peet
# 18 February 2007
# Example script to calculate something
#
A = 1
B = 2
Answer = A + B
print "The answer: ", Answer

This could be used in conjunction with regression of data to do all sorts of useful things.

List of supported functions

In general, any mathematical expression accepted by C, FORTRAN, Pascal, or BASIC may be plotted. The precedence of operators is determined by the specifications of the C programming language.

The supported functions include:
abs(x), acos(x), asin(x), atan(x), cos(x), cosh(x), erf(x), exp(x), inverf(x), invnorm(x), log(x), log10(x), norm(x), rand(x), sgn(x), sin(x), sinh(x), sqrt(x), tan(x), tanh(x).

Light cars

25 August, 2006

Heavy cars may be good for the owners but worse for the environment. Roel Boesenkool of Corus presented at the SMEA conference. He noted that new cars are getting heavier despite the development of technology that could easily make them lighter. I wrote about this in the previous post about light cars.

The light but inefficient car is the Ferrari.

I guess fuel doesn’t cost enough yet.

Weight and Fuel Consumption of Cars
Roel Boesenkool presented this graph at the SMEA conference in Sheffield

Network, Neural Network. License to model?

5 June, 2006

Image from James Bond starting sequence.

I’ve created a neural network analysis of James Bond movies using Neuromat’s Model Manager software. The Neuromat software implements a neural network within a Bayesian statistical framework, using the methods developed by David MacKay.

Training the model

I hoped to find which factors are important in making a successful Bond film, and then to predict the revenue of the James Bond film to be released in November, Casino Royale. The model is limited to variables that can be easily quantified and to which I had easy access.

The inputs were the number of female conquests by Bond, the number of Martinis he drinks, the number of licensed kills, the year of the film, and the number of times Bond introduces himself with the catch-phrase ‘Bond, James Bond’. The worldwide box office for the film in dollars was converted to a present-day value using the year of release and US inflation rate data.
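The inflation adjustment can be sketched roughly as follows. This is only an illustration of compounding yearly rates forward to 2006; the function name, the flat 3% rate and the 100 million figure are assumptions made up for the example, not the data or method actually used for the table.

```python
def to_present_value(amount, release_year, annual_inflation, present_year=2006):
    """Compound `amount` forward from release_year to present_year.

    annual_inflation maps a year to that year's inflation rate (0.03 = 3%).
    """
    value = amount
    for year in range(release_year, present_year):
        value *= 1.0 + annual_inflation[year]
    return value


# Hypothetical flat 3% rate for every year, purely for illustration.
# Real US inflation figures would replace this dictionary.
rates = {year: 0.03 for year in range(1962, 2006)}

nominal = 100.0  # made-up takings in millions of dollars
adjusted = to_present_value(nominal, 2002, rates)
print(f"{nominal:.2f} M$ in 2002 is about {adjusted:.2f} M$ in 2006")
```

With real yearly rates the older films are scaled up far more than the recent ones, which is why Thunderball and Goldfinger top the adjusted table.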

After adjusting the box office takings for inflation the database looked like this:

   "Conquests"   "Martinis"   "Kills"   "BJB"    "Year"    "M$*2006"      "Label"
        2              2         16       1      2002       470.44      "Die_Another_Day"
        3              1         19       2      1999       423.91      "The_World_Is_Not_Enough"
        3              1         25       1      1997       419.98      "Tomorrow_Never_Dies"
        2              1         12       1      1995       465.61      "GoldenEye"
        2              1         12       1      1989       262.48      "Licence_To_Kill"
        2              2          2       1      1987       347.68      "The_Living_Daylights"
        4              0          5       2      1985       292.92      "A_View_To_A_Kill"
        2              0         14       1      1983       381.22      "Octopussy"
        2              0         11       2      1981       480.78      "For_Your_Eyes_Only"
        3              1         14       1      1979       651.71      "Moonraker"
        3              1         14       1      1977       690.11      "The_Spy_Who_Loved_Me"
        2              0          1       2      1974       385.46      "The_Man_With_The_Golden_Gun"
        3              0          6       1      1973       658.51      "Live_and_Let_die"
        1              0          7       1      1971       652.83      "Diamonds_are_Forever"
        3              1          8       2      1969       408.40      "On_Her_Majesty's_Secret_Service"
        3              1          21      0      1967       758.09      "You_only_live_twice"
        3              0          22      0      1965      1004.90      "Thunderball"
        2              1          10     1.5     1964       900.42      "Goldfinger"
        4              0          17      0      1963       575.94      "From_Russia_with_Love"
        3              2          5       1      1962       439.60      "Dr_No"

The data is also represented in the figure directly below, with each variable normalised by dividing by its maximum value in the database.

Variation of inputs and box office with year.
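As a small sketch of the normalisation used for the figure, each column is divided by its largest value. Only four rows of the table are copied here to keep the example short; they include the films with the most kills and the largest box office, so the maxima match those of the full database. The variable and function names are my own.

```python
# (label, kills, box office in millions of 2006 dollars), copied from the table.
rows = [
    ("Tomorrow_Never_Dies", 25, 419.98),
    ("Thunderball", 22, 1004.90),
    ("Goldfinger", 10, 900.42),
    ("Dr_No", 5, 439.60),
]


def normalise(values):
    """Divide every value by the maximum so the largest becomes 1."""
    peak = max(values)
    return [v / peak for v in values]


kills = normalise([row[1] for row in rows])
takings = normalise([row[2] for row in rows])

for (label, _, _), k, t in zip(rows, kills, takings):
    print(f"{label:22s} kills={k:.2f} box_office={t:.2f}")
```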

After training on half of the data, 208 potential models were tested by their ability to predict the unseen data. When attempting to create a committee from the best models, it was found that the best predictions could be made using just one model. This model was retrained using all of the data. Bayesian inference should automatically prevent overtraining, since each model represents a distribution of weights and complex relationships are penalised.
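The idea of choosing a committee size can be sketched as below. This is not the Neuromat implementation, which may rank and combine models differently; it simply ranks models by their error on the unseen data, averages the best k, and keeps the k with the lowest error. The three models and all the numbers are made up for illustration.

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def best_committee_size(model_preds, target):
    """Return (k, error) for the best committee built from the k best models."""
    ranked = sorted(model_preds, key=lambda p: mse(p, target))
    best_k, best_err = 1, float("inf")
    for k in range(1, len(ranked) + 1):
        # The committee prediction is the mean of its members' predictions.
        committee = [sum(col) / k for col in zip(*ranked[:k])]
        err = mse(committee, target)
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err


# Hypothetical unseen targets and predictions from three trained models.
target = [400.0, 500.0, 600.0]
model_preds = [
    [450.0, 560.0, 640.0],
    [410.0, 495.0, 605.0],
    [300.0, 420.0, 700.0],
]

k, err = best_committee_size(model_preds, target)
print(f"best committee size: {k}, test error: {err:.1f}")
```

In this toy case one model is much better than the others, so adding members only makes the committee worse, which is the same situation described above.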

The graph below plots the output against the target after selection of the committee and retraining with all the data. The failure of all the points to lie on the line could indicate that we haven’t taken account of all the factors which influence the box office takings, with the two most profitable films, Goldfinger and Thunderball, out-performing the expectation of the model.
Committee predictions of the training data.

The significance of each input as perceived by the model shows that the year of the film and the number of kills have a strong influence on the box office takings, as we can see directly below. The number of conquests also has an influence, but the number of Martinis drunk and the use of the ‘Bond, James Bond’ catch-phrase are not very important at the box office.

James Bond input Significances

Bond Movie Trends

To see the trends in the data, predictions were made using average values of the inputs in the database, and stepping each value. The average values were 2.6 conquests, 0.75 martinis, 12.05 kills, 1.12 utterings of ‘Bond, James Bond’ and year of 1979.
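The averages quoted above can be checked directly from the table, and the same averages give the baseline for stepping one input at a time. The sketch below only builds the input sets; the trained network itself is not reproduced here, and the names `means` and `sweep` are my own.

```python
# Columns: conquests, martinis, kills, BJB, year (rows in table order).
data = [
    (2, 2, 16, 1, 2002), (3, 1, 19, 2, 1999), (3, 1, 25, 1, 1997),
    (2, 1, 12, 1, 1995), (2, 1, 12, 1, 1989), (2, 2, 2, 1, 1987),
    (4, 0, 5, 2, 1985), (2, 0, 14, 1, 1983), (2, 0, 11, 2, 1981),
    (3, 1, 14, 1, 1979), (3, 1, 14, 1, 1977), (2, 0, 1, 2, 1974),
    (3, 0, 6, 1, 1973), (1, 0, 7, 1, 1971), (3, 1, 8, 2, 1969),
    (3, 1, 21, 0, 1967), (3, 0, 22, 0, 1965), (2, 1, 10, 1.5, 1964),
    (4, 0, 17, 0, 1963), (3, 2, 5, 1, 1962),
]
names = ["conquests", "martinis", "kills", "bjb", "year"]

# Mean of each column across the 20 films.
means = {n: sum(col) / len(col) for n, col in zip(names, zip(*data))}


def sweep(name, values):
    """Input sets with one variable stepped and the rest held at their means."""
    return [{**means, name: v} for v in values]


kill_sweep = sweep("kills", range(0, 26, 5))

for n in names:
    print(f"{n:10s} {means[n]:.2f}")
print(len(kill_sweep), "input sets for the kills trend")
```

Each dictionary in `kill_sweep` would be passed to the trained model to trace out the trend for kills, and the same is done for the other inputs.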

Bond, James Bond
Pierce Brosnan drinking a Martini cocktail.
James Bond, Martinis

The worldwide box office takings decrease linearly with the year. This may be due to a decrease in the popularity of James Bond or a general decrease in the total size of the international box office. This prediction is for the average number of kills; however, films since 1995 had a higher than average number of kills (and conquests, Martinis and ‘Bond, James Bond’s) and made ‘average’ box office takings, as seen in the database. The recent films have therefore maintained their box office takings at about 400 million dollars.

There may be a trend that recent films make less at the box office but more from secondary sources such as movie rental, merchandising and cross-promotion, in which case the box office takings may not be the best index of the profitability or popularity of a film. For example, many James Bond computer games exist, such as GoldenEye, which was popular in its own right and would have generated a large amount of revenue.
James Bond, Year

The number of kills made in the film has a strong positive correlation with the box office takings. The film with the most kills was Tomorrow Never Dies (1997) with 25, followed by Thunderball (1965) with 22 and You Only Live Twice (1967) with 21. According to the model the trend continues to higher numbers of kills. It seems that action is popular in James Bond films.
James Bond, Kills

Daniel Craig poses with the Bond girls from Casino Royale 2006.

Surprisingly, the number of Bond’s conquests has a negative correlation with the box office takings. So according to the neural network analysis, the producers should minimise the number of times Bond has to make this sacrifice in the line of duty. There is no film in which James Bond neglects to make a female conquest, but according to the neural network this would be more profitable.

James Bond, Conquests

In conclusion

According to this simple analysis, the box office takings of the next James Bond movie can be maximised by increasing the number of on-screen kills and, contrary to expectation, by minimising the number of conquests. The neural network predicts a slightly larger box office with 0 conquests than with 1 conquest, although no Bond film exists where he does not go to bed with a Bond girl. It would be very brave of the producers to take this action, since it has been a characteristic feature of the Bond films; however, it seems excesses aren’t appreciated, presumably because they take time away from other kinds of action with broader appeal (or plot development?).

Of the factors included in the database, we saw that the number of Martinis drunk and the use of the Bond catchphrase were not regarded as significant and had flat trend lines with small error bars. These factors can be safely removed from the database in a future model. Many possible inputs can be imagined: for example, it is possible to get estimates of the budget for each film, which should be related to the number of stunts, or we could count the number of explosions, or the time that ‘Bond girls’ are on screen.

One factor which is not simple to include, but which is probably the most frequently discussed, is the actor playing Bond. There is no simple way to include this objectively in the model without extra information; perhaps the best way would be to use the wage the actor received, which should at least be a measure of his popularity as perceived by the production team.

Daniel Craig will play Bond in the 2006 movie, Casino Royale

If 2006 movie had same inputs as previous movies
The graph directly above shows the predicted box office if the previous movies were released in 2006. In reality the last 3 movies made 420, 424 and 470 million dollars at the box office; however, according to this model they would have made 250, 300 and 150 million if released in 2006. If we simply take an average of the last 3 movies, we could expect the Bond movie to make roughly 440 million (the mean of 420, 424 and 470 is 438). The highest revenue predicted by the model would be for a new version of Thunderball, which is predicted to make 350-500 million. It seems that the model has been influenced by the downward trend in box office revenue between 1975 and 1990.

The prediction for the ‘average’ film above says that we can expect the movie to make 350 million dollars. Since the production of the film will have included some analysis of the previous movies, we can expect that it will have a large amount of violence and action, and so a high number of kills.

The failure of the model to predict the box office revenue of the most popular films suggests that not all of the most important factors have been included in the model. It might be worth considering other factors for which data is available, for example the estimated budget of the film. The model has succeeded in showing us trends in the data which give us some idea of how a Bond film can be optimised.

Some effects of Alloying elements in Steel

21 April, 2006

Alloying additions are commonly added to steels to:

  • increase hardenability,
  • improve strength,
  • improve mechanical properties (at operating temperature),
  • improve toughness for a given strength or hardness,
  • increase wear resistance,
  • improve magnetic properties.

Increasing the hardenability means that pearlite transformation will be delayed to longer times. This means it is easier to obtain martensite or bainite on cooling, or by isothermal holding after cooling past the pearlite start temperature.

Classification of alloying elements by Bain in The Alloying Elements in Steel

Dissolved in Ferrite

Ni, Si, Al, Zr, Mn, Cr, W, Mo, V, Ti, P, S (?), Cu.

Nickel, silicon, aluminium, zirconium, manganese, chromium, tungsten, molybdenum, vanadium, titanium, phosphorus, sulphur and copper.

Combined in Carbide

Mn, Cr, W, Mo, V, Ti.

Manganese, chromium, tungsten, molybdenum, vanadium, titanium.

In Nonmetallic Inclusions

SiO2, MxOy, Al2O3, etc

ZrO, MnS, MnFeO, MnO, SiO2, CrxOy

VxOy, TixOy, MnFeS, ZrS

Special Intermetallic Compounds

Ni-Si Compound (?), AlxNy, ZrxNy

VxNy, TixNyC2, TixNy

Elemental state

Cu above 0.8%

Pb (?)

The effects of common alloying elements in steel were summarised as follows (data from ‘Metals Handbook’ 1948, American Society for Metals, Metals Park, Ohio).

Al - Aluminium

Solid Solubility

In Gamma Iron (austenite)

1.1 % (increased by C)

In Alpha Iron (ferrite)

36 %

Influence on ferrite

Hardens considerably by solid solution.

Influence on austenite (hardenability)

Increases hardenability mildly, if dissolved in austenite.

Influence exerted through carbide

Carbide forming tendency

Negative (graphitizes).

Action during tempering

-

Principal functions

  • Deoxidises efficiently.
  • Restricts grain growth (by forming dispersed oxides or nitrides).
  • Alloying element in nitriding steel.

Cr - Chromium

Solid Solubility

In Gamma Iron (austenite)

12.8 % (20 % with 0.5 % C)

In Alpha Iron (ferrite)

Unlimited

Influence on ferrite

Hardens slightly; increases corrosion resistance.

Influence on austenite (hardenability)

Increases hardenability moderately.

Influence exerted through carbide

Carbide forming tendency

Greater than Mn; less than W.

Action during tempering

Mildly resists softening.

Principal functions

  • Increases resistance to corrosion and oxidation.
  • Increases hardenability.
  • Adds some strength at high temperatures.
  • Resists abrasion and wear (with high carbon).

Co - Cobalt

Solid Solubility

In Gamma Iron (austenite)

Unlimited

In Alpha Iron (ferrite)

75 %

Influence on ferrite

Hardens considerably by solid solution.

Influence on austenite (hardenability)

Decreases hardenability as dissolved.

Influence exerted through carbide

Carbide forming tendency

Similar to Fe.

Action during tempering

Sustains hardness by solid solution.

Principal functions

  • Contributes to red-hardness by hardening the ferrite.

Mn - Manganese

Solid Solubility

In Gamma Iron (austenite)

Unlimited

In Alpha Iron (ferrite)

3 %

Influence on ferrite

Hardens markedly; reduces plasticity somewhat.

Influence on austenite (hardenability)

Increases hardenability moderately.

Influence exerted through carbide

Carbide forming tendency

Greater than Fe; less than Cr.

Action during tempering

Very little in usual quantities.

Principal functions

  • Counteracts brittleness from sulphur (by forming MnS sulphides).
  • Increases hardenability inexpensively.

Mo - Molybdenum

Solid Solubility

In Gamma Iron (austenite)

~3% (8% with 0.3% C)

In Alpha Iron (ferrite)

37.5% (less with lowered temperature)

Influence on ferrite

Provides age hardening system in high Mo-Fe alloys.

Influence on austenite (hardenability)

Increases hardenability strongly (Mo > Cr).

Influence exerted through carbide

Carbide forming tendency

Strong; greater than Cr.

Action during tempering

Opposes softening, by secondary hardening.

Principal functions

  • Raises grain-coarsening temperature of austenite.
  • Deepens hardening.
  • Counteracts tendency toward temper brittleness.
  • Raises hot and creep strength, red-hardness.
  • Enhances corrosion resistance in stainless steel.
  • Forms abrasion resisting particles.

Ni - Nickel

Solid Solubility

In Gamma Iron (austenite)

Unlimited

In Alpha Iron (ferrite)

10% (irrespective of carbon content)

Influence on ferrite

Strengthens and toughens by solid solution.

Influence on austenite (hardenability)

Increases hardenability mildly, but tends to retain austenite at higher carbon contents.

Influence exerted through carbide

Carbide forming tendency

Negative (graphitizes).

Action during tempering

Very little in small percentages.

Principal functions

  • Strengthens unquenched or annealed steels.
  • Toughens pearlitic-ferritic steels (especially at low temperature).
  • Renders high-chromium iron alloys austenitic.

P - Phosphorus

Solid Solubility

In Gamma Iron (austenite)

0.5%

In Alpha Iron (ferrite)

2.8% (irrespective of carbon content)

Influence on ferrite

Hardens strongly by solid solution.

Influence on austenite (hardenability)

Increases hardenability.

Influence exerted through carbide

Carbide forming tendency

Nil

Action during tempering

-

Principal functions

  • Strengthens low-carbon steel.
  • Increases resistance to corrosion.
  • Improves machinability in free-cutting steels.

Si - Silicon

Solid Solubility

In Gamma Iron (austenite)

~2% (9% with 0.35% C)

In Alpha Iron (ferrite)

18.5% (not much changed by carbon).

Influence on ferrite

Hardens with loss in plasticity (Mn < Si < P).

Influence on austenite (hardenability)

Increases hardenability moderately.

Influence exerted through carbide

Carbide forming tendency

Negative (graphitizes).

Action during tempering

Sustains hardness by solid solution.

Principal functions

  • Used as a general purpose deoxidiser.
  • Alloying element for electrical and magnetic sheet.
  • Improve oxidation resistance.
  • Increase hardenability of steels carrying non-graphitising elements.
  • Strengthens low-alloy steels.

Ti - Titanium

Solid Solubility

In Gamma Iron (austenite)

0.75% (1% with 0.2 % C)

In Alpha Iron (ferrite)

~6% (less with lowered temperature)

Influence on ferrite

Provides age hardening system in high Ti-Fe alloys.

Influence on austenite (hardenability)

Probably increases hardenability very strongly as dissolved, the carbide effects reduce hardenability.

Influence exerted through carbide

Carbide forming tendency

Greatest known (2% Ti renders 0.5% carbon steel unhardenable).

Action during tempering

Persistent carbides probably unaffected. Some secondary hardening.

Principal functions

  • Fixes carbon in inert particles;
  • reduces martensitic hardness and hardenability in medium Cr steels.
  • prevents formation of austenite in high Cr steels.
  • prevents localised depletion of chromium in stainless steel during long heating.

W - Tungsten

Solid Solubility

In Gamma Iron (austenite)

6% (11% with 0.25% C)

In Alpha Iron (ferrite)

33% (less with lowered temperature)

Influence on ferrite

Provides age hardening system in high W-Fe alloys.

Influence on austenite (hardenability)

Increases hardenability strongly in small amounts.

Influence exerted through carbide

Carbide forming tendency

Strong.

Action during tempering

Opposes softening by secondary hardening.

Principal functions

  • Forms hard, abrasion resistant particles in tool steels.
  • Promotes hardness and strength at elevated temperature.

V - Vanadium

Solid Solubility

In Gamma Iron (austenite)

1% (4% with 0.2% C)

In Alpha Iron (ferrite)

Unlimited.

Influence on ferrite

Hardens moderately by solid solution.

Influence on austenite (hardenability)

Increases hardenability very strongly as dissolved.

Influence exerted through carbide

Carbide forming tendency

Very strong

Action during tempering

Maximum for secondary hardening.

Principal functions

  • Elevates coarsening temperature of austenite (promotes fine grain).
  • Increases hardenability (when dissolved).
  • Resists tempering and causes marked secondary hardening.
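
For quick comparisons the solubility figures above can be put into a small lookup table. The sketch below copies the values listed for ten of the elements (approximate values marked ~ are entered as plain numbers, and ‘Unlimited’ is represented by infinity); vanadium is left out, and the dictionary layout and function name are my own choices rather than anything from the handbook.

```python
UNLIMITED = float("inf")

# Maximum solid solubility in weight %, (gamma iron, alpha iron),
# without the carbon-dependent values given in brackets above.
solubility = {
    "Al": (1.1, 36.0),
    "Cr": (12.8, UNLIMITED),
    "Co": (UNLIMITED, 75.0),
    "Mn": (UNLIMITED, 3.0),
    "Mo": (3.0, 37.5),
    "Ni": (UNLIMITED, 10.0),
    "P": (0.5, 2.8),
    "Si": (2.0, 18.5),
    "Ti": (0.75, 6.0),
    "W": (6.0, 33.0),
}


def unlimited_in(phase):
    """Elements with unlimited solubility in 'gamma' or 'alpha' iron."""
    index = {"gamma": 0, "alpha": 1}[phase]
    return sorted(el for el, sol in solubility.items() if sol[index] == UNLIMITED)


print("Unlimited in austenite:", unlimited_in("gamma"))
print("Unlimited in ferrite:  ", unlimited_in("alpha"))
```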