Please note that the statistics writers available in core.main have changed. There are now two: best-candidate-per-generation and candidates-per-generation. The behaviour of both can be configured through the following attributes:
prefix-data, a boolean value that makes the writer prefix the attributes with search-space-value- and optimisation-space-value-. This is handy in some use cases.
store-search-space, a boolean value that controls whether the search space values of a candidate are stored.
store-optimisation-space, a boolean value that controls whether the optimisation space values of a candidate are stored.
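To make the effect of the three attributes concrete, here is a minimal Python sketch of how a writer might flatten one candidate into an output row. It is not the core.main implementation: the function `build_row`, the dictionary layout of a candidate, and the exact prefix strings are assumptions based only on the description above.

```python
def build_row(candidate, prefix_data=False,
              store_search_space=True, store_optimisation_space=True):
    """Flatten one candidate into a dict of column name -> value.

    Hypothetical helper: the three keyword arguments mirror the writer
    attributes prefix-data, store-search-space and store-optimisation-space.
    """
    row = {}
    if store_search_space:
        for name, value in candidate["search_space"].items():
            # Assumed prefix string, taken from the attribute description.
            key = f"search-space-value-{name}" if prefix_data else name
            row[key] = value
    if store_optimisation_space:
        for name, value in candidate["optimisation_space"].items():
            key = f"optimisation-space-value-{name}" if prefix_data else name
            row[key] = value
    return row


# Made-up candidate, for illustration only.
candidate = {
    "search_space": {"x": 0.5, "y": -1.0},
    "optimisation_space": {"fitness": 1.25},
}

print(build_row(candidate, prefix_data=True))
print(build_row(candidate, store_search_space=False))
```

Prefixing matters mostly when a search space parameter and an optimisation value share a name; without it, one column would silently overwrite the other in a flat row like this.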
@cplump Do you think it would be a good idea to add something like a raw-optimisation-value-of-best-candidate-per-generation and raw-optimisation-value-of-candidates-per-generation? This would allow us to store the optimisation values before any adjustment.
We have the standard use case, where we evaluate an optimisation run via the individuals of its population and their respective (adapted) fitness values. For this, the above variant sounds good, although an array attribute listing the (adapted) fitness variants to store would be handy as well, e.g. optimisation values = ['objective-function', 'adjusted-fitness', 'predicted-value']. Some additional information might be useful too, e.g. pareto-classes (@lau_pau) or gof-measures for the given individual, the number of violated constraints, or the generated malus value. @berber, should we make a list?
Then, we have correlation writers. These, however, are somewhat distinct from the actual optimisation and always relate to the entire generation, not just one individual. Maybe, it's worth giving them their own statistics writer instead of only attributes?
I think we should distinguish between different writers for different situations. A list such as values = ['objective-function', 'adjusted-fitness', 'predicted-value'] would be possible, but it would break any separation of concerns. My current feeling is that we have to rework the documentation part to make it more flexible and less intrusive, as the current structure requires us to calculate everything at the end of a generation. It would be easier to allow any component in the calculation process to add information. This is a larger structural change and should be moved to the next release, I think.
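As a rough illustration of what "any component can add information" could look like, here is a small Python sketch. The class `GenerationLog` and its methods are hypothetical and do not exist in the code base; the key names are borrowed from the examples in this thread.

```python
from collections import defaultdict


class GenerationLog:
    """Hypothetical collector: components add values during a generation,
    writers read the merged rows afterwards."""

    def __init__(self):
        # Maps candidate index -> {key: value}.
        self._records = defaultdict(dict)

    def add(self, index, key, value):
        """Record one piece of information for the candidate at `index`."""
        self._records[index][key] = value

    def rows(self):
        """Return one flat row per candidate, ordered by candidate index."""
        return [{"index": i, **self._records[i]} for i in sorted(self._records)]


log = GenerationLog()
# Different components contribute at different points in the calculation.
log.add(0, "objective-function", 3.2)
log.add(0, "adjusted-fitness", 3.7)
log.add(1, "objective-function", 1.1)
log.add(0, "violated-constraints", 1)

for row in log.rows():
    print(row)
```

In such a design each writer would only select the keys it cares about, so nothing has to be recomputed at the end of the generation.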
The middle two were the ones I was referring to in the above comment with:
<Then, we have correlation writers. These, however, are somewhat distinct from the actual optimisation and always relate to the entire generation, not just one individual. Maybe, it's worth giving them their own statistics writer instead of only attributes?>
Testing is surely a good idea, and I think some of them need work because research has evolved since then, so we wouldn't use them like that anymore.
In my opinion, but I may be wrong about that, there are three groups of writers:
focusing on some value on the co-domain side (fitness, objective, adjusted, predicted, pareto-class)
focusing on some value on the domain side (violated constraints of which type, considered amount of malus)
focusing on the entire population on either side (atm: correlated, range-correlated)
No, since the idea was to avoid logging the same data several times. If you want the search-space values as well, you have to use candidates-per-generation and prediction-per-individual and join the data on the columns target, run, generation, and index. Perhaps we should revisit this topic for the next version of the "Logging API".
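For reference, the join described above can be sketched in plain Python as follows. The sample rows and the extra column names (`x`, `predicted-value`) are made up; only the four key columns come from the comment above. Rows are assumed to be available as dictionaries, e.g. after reading the writer output with `csv.DictReader`.

```python
KEYS = ("target", "run", "generation", "index")


def join_on_keys(left_rows, right_rows, keys=KEYS):
    """Inner-join two lists of dict rows on the given key columns."""
    # Index the right-hand rows by their key tuple for constant-time lookup.
    lookup = {tuple(row[k] for k in keys): row for row in right_rows}
    joined = []
    for row in left_rows:
        match = lookup.get(tuple(row[k] for k in keys))
        if match is not None:
            joined.append({**row, **match})
    return joined


# Made-up sample data standing in for the two writer outputs.
candidates = [
    {"target": 0, "run": 1, "generation": 0, "index": 0, "x": 0.5},
    {"target": 0, "run": 1, "generation": 0, "index": 1, "x": 0.9},
]
predictions = [
    {"target": 0, "run": 1, "generation": 0, "index": 0, "predicted-value": 1.2},
]

joined = join_on_keys(candidates, predictions)
print(joined)
```

If the data is already loaded into pandas, the same inner join should be a single `candidates.merge(predictions, on=["target", "run", "generation", "index"])`.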
@berber I have done so, see commit 8d6a4d7f. However, I am wondering whether we should talk about including the non-surrogate-specific writers in the general optimisation.dl file, as they can also be used when working with benchmarks rather than surrogates.
The correlated writer generates the information only for the winning individual, since the calculation is expensive. We could remove the age from the output, as it is not necessary for the use case.
@cplump Can you provide a simple example for range-correlated and constraints?
A more general aspect: is it possible to make "target" optional? It is only relevant when we're optimising towards different targets, and most cases optimise towards 0. As you made some search space information optional as well, could that be extended to target?
This is a great idea. Target and run are a kind of external context that we use to repeat evaluations. I am not sure whether there is a short-term way of removing them. Definitely something for the next release.