
Genetic Programming
I programmed a Genetic Programming classifier for Weka, based on the work of Koza.

1. Developer
2. Objective
3. What is it?
4. System structure
5. Download and installation
6. Contribution

1. Developer of GP for Weka

Yan Levasseur:
contact me
Imagery, Vision and Artificial Intelligence Lab (LIVIA), http://www.livia.etsmtl.ca/?LANG=en
Ecole de Technologie Superieure (ETS), Montreal, Canada, http://www.etsmtl.ca/english/index.html

2. Objective

Many Genetic Programming (GP) tools are already available on the internet. Why create another?
I wanted to develop a collaborative GP tool that would include as many options as possible and that would be part of a software package that already includes a large library of algorithms. This should allow easy comparison of classifier performance. Weka is a popular tool among researchers in the AI / data-mining domain, and I thought it important that Weka have its own GP algorithm.

The GP algorithm should be user-friendly, easy to compare, fast and modular. This last point is extremely important for a collaborative project: in open software, being able to adapt the algorithm easily to the particularities of one's own research is of great interest.

Many new techniques have been developed for GP since its creation. These techniques modify the structure of the program, its creation and its evolution. In order to study the possibilities of GP in detail, every promising feature should be implemented (as an option) in our algorithm. At the very least, we should design the structure so that these techniques can be integrated later.

3. What is it?

It's an algorithm to be used with Weka, an open source (GPL license) data-mining software.

Our Genetic Programming algorithm (only the tree structure is available for now) is inspired by Koza (1) and Banzhaf et al. (2). It can perform symbolic regression (continuous output) or classification. Classification uses multiple one-against-all classifiers. Classification accuracy is improved by the integration of a boosting technique (3).

(1) Koza, J.R. (1992), Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press
(2) Banzhaf, W., Nordin, P., Keller, R.E., Francone, F.D. (1998), Genetic Programming: An Introduction: On the Automatic Evolution of Computer Programs and Its Applications, Morgan Kaufmann
(3) Freund, Y., Schapire, R.E. (1996), Experiments with a new boosting algorithm, Proc. International Conference on Machine Learning, pages 148-156, Morgan Kaufmann, San Francisco
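
As an illustration of the ideas above, here is a minimal sketch of a tree-structured program and of a one-against-all decision. It is not the code of GP for Weka: the class GpTreeSketch, its helper methods and the rule "the class whose program gives the highest output wins" are assumptions made for this example, and the programs are written by hand instead of being evolved.

```java
import java.util.function.DoubleBinaryOperator;

/** Hypothetical sketch: tree-structured programs and a one-against-all decision. */
final class GpTreeSketch {

    /** A program tree node: evaluates to a real number for one input vector. */
    interface Node {
        double eval(double[] x);
    }

    /** Terminal node: reads one input attribute. */
    static Node var(int index) {
        return x -> x[index];
    }

    /** Terminal node: a constant. */
    static Node constant(double value) {
        return x -> value;
    }

    /** Function node: applies a binary operator to the outputs of two subtrees. */
    static Node op(DoubleBinaryOperator f, Node left, Node right) {
        return x -> f.applyAsDouble(left.eval(x), right.eval(x));
    }

    /** One-against-all (assumed rule): one program per class, the highest output wins. */
    static int classify(Node[] programPerClass, double[] x) {
        int best = 0;
        for (int c = 1; c < programPerClass.length; c++) {
            if (programPerClass[c].eval(x) > programPerClass[best].eval(x)) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Symbolic regression: this tree encodes x0 * x0 + 1.
        Node regression = op(Double::sum, op((a, b) -> a * b, var(0), var(0)), constant(1.0));
        System.out.println(regression.eval(new double[] {3.0})); // prints 10.0

        // Classification: hand-written stand-ins for evolved programs.
        Node[] programs = {
            op((a, b) -> a - b, constant(1.0), var(0)), // class 0: high output for small x0
            op((a, b) -> a - b, var(0), constant(1.0))  // class 1: high output for large x0
        };
        System.out.println(classify(programs, new double[] {5.0})); // prints 1
    }
}
```

In a real run, each of the per-class programs would be the result of an evolution, and the boosting technique would combine several such programs.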

The third chapter of my master's thesis gives more details about the internals and the options of the algorithm. You can download this chapter (only available in French for now) with this link.

4. System structure

The programming language of Weka is Java, a fast and portable object-oriented language.
To get a better idea of how the classes of the whole algorithm interact, have a look at these two UML class diagrams:
The main class of the algorithm is:
weka.classifiers.functions.GeneticProgramming

Here are the options currently available for the GP algorithm:
  • Proportion of training data used for validation
  • Preprocessing of data
  • Population size
  • Max depth of program trees
  • Stop criterion
  • Functions available for programs
  • Automatically Defined Functions (ADF)
  • Population creation method
  • Program selection method
  • Proportion for each genetic operator
  • Number of parents and children for each genetic operator
  • Evolution controller
  • Elite Manager
  • Size of Elite
The help box of Weka, or a call to the main class with the -h option, shows the name and an explanation of each available option.
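
To give an idea of what some of these options control, here is a minimal sketch of three building blocks of one generation: choosing a genetic operator according to its proportion, selecting a parent, and extracting the elite. It is not taken from the GP for Weka source. The class GpGenerationSketch, the three operator names and the use of tournament selection are assumptions made for this example, and individuals are reduced to a fitness value where higher is better.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of operator proportions, program selection and elitism. */
final class GpGenerationSketch {

    /** Assumed set of genetic operators, in the same order as the proportions array. */
    enum Operator { CROSSOVER, MUTATION, REPRODUCTION }

    /** Picks an operator with probability equal to its proportion (proportions sum to 1). */
    static Operator pickOperator(double[] proportions, Random rng) {
        double r = rng.nextDouble();
        double cumulative = 0.0;
        Operator[] ops = Operator.values();
        for (int i = 0; i < ops.length; i++) {
            cumulative += proportions[i];
            if (r < cumulative) {
                return ops[i];
            }
        }
        return ops[ops.length - 1]; // guards against rounding error in the sum
    }

    /** Tournament selection (assumed method): best of k individuals drawn at random. */
    static int tournament(double[] fitness, int k, Random rng) {
        int best = rng.nextInt(fitness.length);
        for (int i = 1; i < k; i++) {
            int challenger = rng.nextInt(fitness.length);
            if (fitness[challenger] > fitness[best]) {
                best = challenger;
            }
        }
        return best;
    }

    /** Elitism: indices of the eliteSize fittest individuals, best first. */
    static int[] elite(double[] fitness, int eliteSize) {
        Integer[] order = new Integer[fitness.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(fitness[b], fitness[a]));
        int[] result = new int[Math.min(eliteSize, order.length)];
        for (int i = 0; i < result.length; i++) {
            result[i] = order[i];
        }
        return result;
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[] fitness = {0.2, 0.9, 0.5, 0.7};
        double[] proportions = {0.8, 0.1, 0.1}; // crossover, mutation, reproduction

        System.out.println("Elite: " + Arrays.toString(elite(fitness, 2))); // Elite: [1, 3]
        System.out.println("Operator: " + pickOperator(proportions, rng));
        System.out.println("Parent index: " + tournament(fitness, 3, rng));
    }
}
```

The elite individuals would be copied unchanged into the next generation, while the rest of the population would be filled by applying the chosen operators to the selected parents.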

5. Download and installation

The GP algorithm is not in the default stable Weka distribution. To use it, you have to download and install some files. There are two ways to do it.
  • 1. Download Weka 3.4.12 WITH the GP algorithm already integrated. Use the project's website at SourceForge. The release name is WekaGP 1.1 (filename is WekaGP-3-4-12.zip). Unzip the file and use the Weka shortcut to start the application. More information from Weka.

  • 2. Download the GP algorithm for Weka ONLY, using the SourceForge website. The name of the release is GPforWeka 1.0. In this case, you have to extract the files into the folder weka/classifiers/functions of your weka.jar file (you can use 7zip or another file compression tool for that purpose). Note that you could integrate GP into a more recent version of Weka (or a development version), but there is no guarantee that it will work correctly.

6. Contribution

The documentation of the code behind the GP algorithm is incomplete, but the code is very clear (at least according to my criteria). I'm not a professional programmer, but I have some experience.

Here are a few things that I would like to see added to the algorithm, as options:
  • Graph program structure
  • Linear program structure
  • Parallelisation (the island method)
I also think that it would help to optimize the integrated boosting method and the code for managing programs and population (to reduce the learning time of GP). Finally, I would like to integrate the GP algorithm into the RapidMiner software.
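
For readers who do not know the island method mentioned in the list above: several populations evolve separately (possibly in parallel) and exchange individuals from time to time. Here is a minimal sketch of one migration step. It is only an illustration, not a planned design: the class IslandMigrationSketch, the ring topology and the rule "the best of each island replaces the worst of the next one" are assumptions, and individuals are reduced to a fitness value where higher is better.

```java
import java.util.Arrays;

/** Hypothetical sketch of one migration step between islands arranged in a ring. */
final class IslandMigrationSketch {

    /** Index of the highest value (sign = +1) or of the lowest value (sign = -1). */
    static int indexOfExtreme(double[] fitness, int sign) {
        int index = 0;
        for (int i = 1; i < fitness.length; i++) {
            if (sign * fitness[i] > sign * fitness[index]) {
                index = i;
            }
        }
        return index;
    }

    /** Copies the best individual of each island over the worst one of the next island. */
    static void migrate(double[][] islands) {
        int n = islands.length;
        // Pick all emigrants first, so that arrivals do not influence the choice.
        double[] emigrants = new double[n];
        for (int i = 0; i < n; i++) {
            emigrants[i] = islands[i][indexOfExtreme(islands[i], +1)];
        }
        for (int i = 0; i < n; i++) {
            double[] target = islands[(i + 1) % n];
            target[indexOfExtreme(target, -1)] = emigrants[i];
        }
    }

    public static void main(String[] args) {
        double[][] islands = {{1.0, 5.0}, {2.0, 3.0}, {9.0, 4.0}};
        migrate(islands);
        // prints [[9.0, 5.0], [5.0, 3.0], [9.0, 3.0]]
        System.out.println(Arrays.deepToString(islands));
    }
}
```

Between two migration steps, each island would run its own generations independently, which is what makes the method easy to run in parallel.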

This project is Open Source and uses the GNU General Public License (GPL). The source code is included in each of the packages at SourceForge. If you would like to collaborate with me on this project, please contact me.

Created by: Yan last modification: Saturday 28 of March, 2009[20:40:32 UTC] by Yan