See section Preprocess Raw Image for details.
There are three ways to open a data file:
To keep the application working smoothly on larger data sets, it is possible to load the data in batches. To switch between single batches use the drop down menu Batch No. There is also a Reload button to update the gallery. The default batch size is 10000. It can be changed in the preferences dialog Preferences Dialog.
Sorting occurs linearly from the upper left to the bottom right corner. It is necessary to define, dependent on the sort algorithm, at least one reference example. After sorting the reference example will be positioned always in the upper left corner of the view. CellCognition Explorer uses by default the cosine similarity as measure. Other sort algorithms are explained here.
Sorting is controlled most easily by the
Sorting Toolbar.
It has buttons to sort ascending, descending and a drop down
menu to select the sort algorithm.
The Quick Sort-button is used as follows:
First click the button and then a reference example in the gallery. Note the button stays pressed until the procedure has finished. Also the cursor changes to a cross if hovered over the gallery.
From top to bottom: Drop down menu for sort algorithm, tool bar with buttons, list view with reference examples. Note: double-click on a item in the list view selects the item in the gallery view to.
To add/remove items to the sorting dock , simply select/mark one or several items in the graphics view simply press the corresponding button (/)in the dock widget. | The tab "Measurements" allows to sort the objects taking only certain feature groups into account. It is same functionality that is descripted in section Feature Groups, but here, the feature selection applies only for sorting. |
Buttons in the tool bar:
Add items | |
Remove items | |
Clear items | |
Sort ascending | |
Sort descending |
Euclidean distance | Rank items simply by its euclidean distance to the reference example, works well in low dimensional space. |
Cosine Similarity | Measures orientation in the feature space not the distance. This is the default sorting method. Robust against outliers in one dimension, works well in high dimensional space. |
Class labels | First sorts items by class labels, second whether it is a training example, and third by prediction probability.
|
Treatment | If during image preprocessing the option DirectoryPerTreatment was used, it is possible to sort the gallery alpha-numerically by treatment, experimental condition resp. If available, class prediction probabilities are used to rank the thumbnails within the treatment groups. |
A white (neutral) square indicates that the item, cell resp. is not classified yet. | |
A colored solid square indicates the class color i.e. the predicted class. | |
A patterned square with a half circle on top indicates the predicted and annotated class. Color of the square indicates the prediction and the color of the half circle the annotation. Prediction and annotation are not necessarily the same. |
Double Click to select an item. To select multiple items, click on many items while holding down the Ctrl-button. Drag the mouse while holding down the Ctrl-button will also select multiple items.
If a thumbnail is hoverd by the mouse coursor, a tooltip pops up that shows an Id number (e.g. Id: #99). The same Id number is saved in the first column in the exported csv file.
Depending on the input data, CellCognition Explorer displays one or more color channels.
It is important to know that CellCognition Explorer does not support RGB
images (3-channels, 8bit only). It supports as many color channels as
there are in the data. The color display also depends on the input
data. The displayed colors are saved to hdf5 along with the numerical image
data.
To support multiple colors particularly means that feature vectors calculated from different
channels are always concatenated.
Zoom: It is
possible to zoom either using the drop down menu or one
presses the Shift-key and scroll the mouse wheel.
Classification: Toggle class indicator squares in the left bottom corner of a thumbnail.
Mask: Toggle black background mask of a thumbnail.
Outline: Toggle outlines in a thumbnail.
Treatment: Toggle text that indicates the experimental condition.
Microscope images can be extremely dim and at least for display purpose it is necessary to adjust the contrast parameters i.e. min. and max. value, brightness and contrast. A dock widget supplies this functionality for each channel independently. It can be toggled using the Menu View->Contrast or using the shortcut Alt-Shift-C.
For prediction it is useful to toggle the class label indicators in the gallery viewer i.e. enable the check box Classification in the view tool bar. Then simply press the Predict button and the view gets updated automatically. Sometimes an item can't be predicted because it's feature vector contains NAN's making prediction impossible. The Predict button is color encoded meaning it's indicating a necessary update:
Predict | Update required: either the parameters of the classifier or the training set has changed. |
Predict | No update required. |
Function of the buttons:
Add class to tree view. | |
Remove class from tree view. | |
Open dialog for cross validation & grid search. | |
allow reassign | If this option is disabled (default) it is not possible to annotate a cell that is already annotated to a different class. This option helps to prevent annotation errors. For reannotation, in case of already wrong annotated cells, enable this option and reannotation is possible without any warning. |
Remove selected items from annotations. | |
Clear class definition and remove all items. | |
Saves the current classifier. | |
Load a previously trained classifier. |
The Dialogs for saving and loading a classifier are similar. To save a classifier a classifier name and an optional description. It is possible to overwrite a classifier. To load a classifier, a drop down menu shows a list of classifiers saved in the currently opened data file.
Main Elements are a tool bar and annotation tree view:
Each class is represented by a line in the tree view. It has a name and color. Name and color can be changed by clicking on the corresponding field in the line. To add items to a particular class, select the items in the the gallery and click on the Add button ()
Open the dialog by clicking on the symbol . If possible grid search is triggered directly after the dialog pops up. The dialog consists of three tabs and some buttons at the bottom.
Cross Validation | Triggers Cross Validation for the current setup of parameters (no grid search). |
Grid search | Trigger Grid Search. |
Apply | Apply current parameters to the classifier, leave the dialog open. |
Ok | Apply current parameters to the classifier and close the dialog. |
The first tab (Cross Validation & Grid Search) is to setup parameters manually: five or ten-fold cross validation, grid size for grid search or setup Gamma and C the cost parameter.
The output
contains different measures of accuracy: Accuracy, F1,Precession and
Recall. A detailed description of the measures can be
found in the documentation of the
sklearn python package.
The result of a grid search is displayed as a contour plot in a logarithmic scale. The yellow cross hairs indicates the optimal parameters. The value displayed is the Accuracy measure mentioned above.
The third tab shows the confusion matrix
By clicking in the plot and using the shortcut t the matplotlib navigationtool shows up at the bottom of a plot. See shortcuts for a detailed description.
This classifier convers data sets which small sample numbers.
Training a automated classifier is here not possible anymore but one wants to maintain
the data output in the same format for subsequent data analysis.
The destinction of training set and test set does not exist. Adding samples to a class in the sidebar,
a sample is immediately classified.
In order to avoid bias due to annotation erros caused by the user, CellCognition Explorer proviedes three different methods for Outlier & Novelty detection.
The parameter setup is slightly different from SVC, since cross validation is not possible one has to setup the parameter semi-automatically.
Nu | Minimum fraction of outliers in the training set (training error). In this use case it is a valid assumption to have a very low number of training errors (1%). |
Gamma | Kernel band width: A smaller value means a lower fraction of support vectors, a higher value means more support vectors. To many support vectors can lead to over fitting. |
Estimate Gamma under the constraint that max. 20% of the training samples are support vectors. The value of 20% can be adjusted in the "Preferences" dialog of the application. Nu still has to be set manually. | |
Add items to the training set (annotate items as inlier). | |
Remove selected items from classifier. | |
Clear all items from the classifier. |
Quantile | Cutoff used to estimate the denstiy threshold. |
Bandwidth | Kernel band width. |
Estimate Kernel bandwidth | |
Add items to the training set (annotate items as inlier). | |
Remove selected items from classifier. | |
Clear all items from the classifier. |
Quantile | Cutoff used to estimate the denstiy threshold. |
Add items to the training set (annotate items as inlier). | |
Remove selected items from classifier. | |
Clear all items from the classifier. |
CellCognition Explorer is an interactive application. To improve performance and prevent unnecessary computation of data some steps in the pipeline are preprocessed such as segmentation, calculation of bounding boxes, features or thumbnails. Image preprocessing is the first step for image analysis and classifier training.
Currently CellCognition Explorer supports 4D Zeiss LSM images and 4D tiff images (ImageJ version of tiff). 4D means (XYZC) i.e. z-stacks and multiple color channels are supported. In terms of an tiff image one color channel has to be saved as single grey level image. i.e. a multi-color image is a multi-page tiff. CellCognition Explorer does not support RGB images!
The tif-standard does not support multidimensional image data as it is used here.
Microscope manufacturers and image processing software projects
(e.g. ImageJ) came up with
their own variant of the format which usually does not comply the tif-standard.
CellCognition Explorer supports reading of lsm files and multipage tifs.
If another format such as PNG are is used to store the data, the dimensional information
has to be stored in the file name. In this case open the Image Dimensions dialog
by clicking on the button, which opens
the Image Dimensions dialog. One can use regular expression
to map the images to a 4d stack.
Separator | String between single coordinates (e.g. "_") |
Z-stack | String at the beginning of the regex group indicating the z-slice. |
Channel | String at the beginning of the regex group indicating the color channel. |
Example using the settings above:
The file tubulin_P0037_T00001_Crfp_Z1_S1.tif has a single underscore as separator, the z-slice is "1" and the color channel is "rfp".
FlatDirectory | Scan the Image directory for lsm and tiff files. No subdirectories are scanned. | |
DirectoryPerTreatment | Scan each subdirectory of the Image directory for lsm and tiff files. No furhter (sub-)subdirectories are scanned. The names of the subdirectories are used as identifiers of the treatment. |
The color channels can be toggled by clicking on the corresponding check boxes. The the color map can be changed to. Contrast sliders work similar as those for the gallery viewer. Additionally there is a "Auto"-button that cuts off 1% of the image histogram and the buttons "Min/Max" and "Reset" are self explanatory.
Hint: It is good practice to use size and intensity filters to reduce the number of foreground objects. Small objects (<100 px²) can be removed since the origin usually from noise and dim objects caused by out-of-focus cells. They have ragged outlines and lot of features evaluate to NAN which make objects unpredictable. |
Close | Close the dialog. |
Start | Start image preprocessing for all images. |
Segmentation is performed on single color channels. For each channel gets a particular
algorithm (plugin) assigned. The program distinguishes between primary- and and non-primary
plugins.
Available primary plugins are:
"Channels 1" | Select the color channel in the drop down menu. The list in the menu is updated automatically, if
the image directory is changed. |
Gallery size | Size of the thumbnails (bounding boxes) around foreground objects. This value is fixed after preprocessing is finished and can't be changed later on. |
Z Project | Standard method for z slice selection. (select, minimum projection, mean, maximum projection max. total intesity) |
Outline smoothing | Smoothes outlines of foreground objects using a morphological closing operation followed by an morphological opening. Dim out of focus cells have usually a ragged outline causing a lot of features to be invalid i.e. their value is NaN. Smoothing outlines helps to overcome this problem and increases classifier performance. The parameter is the size of the structuring element of the morphological operation. Negative values disable outline smoothing. |
Min. contrast | Minimum contrast (i.e. threshold) for local adaptive threshold segmentation. |
Mean radius | Radius in pixels for the Gaussian blurring filter. |
Window size | Filter size used for local adaptive thresholding. |
Watershed | Enable/disable watershed segmentation. |
Seeding size | Size parameter for finding seed for the watershed segmentation. Seeding is applying minimum and maximum filters. This parameter is the the size of the filters. There is only one seed within an area of radius seeding size. |
Intensity filter | Remove objects outside the intensity range (min, max). A value of -1disables the border e.g. (10, -1) filters all objects with a mean intensity lower than 10 (uint8 gray values). |
Size filter | Remove foreground objects outside the range (min, max). A value of -1 disables the border e.g. (100, -1) filters all objects smaller than 100 px². |
Remove border objects | Removes all foreground objects that touches the image boundary (border objects are artefacts). |
Fill holes | Topologically close all foreground objects. |
Intensity normalization | Renormalize input images from min/max (by default 0, 255) to unsigned 8 bit images. The data type of the images determines the range of the values. By default the values 0 and 255 (8bit images) are used. For 16bit and higher bit depths minimum and maximum values of the current image are proposed. |
Drop down menu | One of Expand, Shrink, Ring or Voronoi |
Dist inner | Distance between primary outline and shrunk outline (any value from 0 to inf) |
Dist outer | Distance between primary outline and expanded outline (any value from 0 to inf) |
Intensity Normalization | Renormalize input images from min/max (by default 0, 255) to unsigned 8 bit images. The data type of the images determines the range of the values. By default the values 0 and 255 (8bit images) are used. For 16bit and higher bit depths minimum and maximum values of the current image are proposed. |
Segmentation settings are automatically saved to the data file. Additionally it is possible to save settings as xml file which can be edited manually. Loading or reusing of settings is therefore possible either from the data file or from xml.
Application Preferences can be found in the Menu File -> Preferences.
Use complementary colors for outlines | If enabled, outlines are displayed in complementary colors e.g. thumbnails are red, outlines are cyan. For gray level images the outlines are yellow. If this option is disabled outlines use the same color as the thumbnails. |
Default similarity metric | Set the default metric for the method Quick Sort |
Feature Grouping | Default feature grouping used in the feature selection dialog. |
Enable Segmentation Plugins | If enabled, the drop down menu in the in the segmentation dialog is shown allow different primary segmentation plugins. By default the plugin AdaptiveThreshold is used. |
Max. fraction of outliers | Parameter estimation for the One Class SVM assumes a that max. 20% of the trainings samples are support vectors. This option allows to change the value. A high fraction of support vectors may lead to overfitting. |
Hdf5 compression | Use "gzip", "lzf" or "None" as compression algorithm. Degree of compression ranges from 1 (low compression) to 9 (high compression). "lzf" has no options. Default is gzip, 4. This is a good compromise of speed vs. compression factor. It is recommended to not change the default. |
Batch Size | Default batch size to keep the application working smoothly. The Batch size determines the max. number of thumbnails that are shown in the gallery view. See the Batch Toolbar how to control the Batch loading feature. |
Data Export | If on save raw image data to hdf5. Default is off. |
The term "data file" which was used quite often above means either a hdf5 file, or a cellh5 file.
It is possible to export data for further analysis with external programs.
The export format is a simple csv-file since any data analysis program
can import csv files.
See the menu Data to proceed
Save View Panel as image | Saves the current gallery view as png image (large images size!). |
Save counting statistics | For a quick overview over the counts per class and treatment after classification. Here is an example. The counting statistics does not take samples from the training set i.e. samples used to train the classifier, into account. |
Save data to csv |
Saves data of all cells in the gallery view in a csv file. The convetion is one line per cell. Each column represents a different attribute. i.e. id, class name, class label, treatment, training sample, feature(s). Here is an example. The meaning of the features is documented somewhere else. Note: The first column is the cell Id number. It establishes a relation between destinct thumbmail (Id appears it the tooltip) and a line in the exported table. |
The current implementation uses the descriptors set of the
cellcognition-framework.
It consists of 239 single values including simple features like
mean intensity or roisize but also more abstract ones
like Haralick features, descriping the texture of an
object. It is important to note that different color channels are treated
independently i.e. it is possible to toggle features (or feature groups)
always separately.
By Default all 239 features are used for analysis but it is possible to select only a subset and there are two ways to do this. Select feature by name and select feature by group.
First open the Feature selection Dialog (Menu: View):
Select features by name | Select features by group |
---|---|
This is the general case. Only selected (checked) features are used for sorting and classification. | Here it is possible to toggle features in groups. Feature groups are provided alongside with the feature names by the hdf5 and cellh5 files and updated automatically if a file is opened. Feature groups provide a more intutitive access to the choise of the features since feature names are choosen usually cryptically. Be aware that feature groups are arbitrary. |
Both feature names and feature groups are saved in the hdf file (or ch5 file resp.) and both depend on the image processing applicaton that processed the input data. CellCognition Explorer and Cellcognition use the same feature set. |
Simple box- and bar graphs can be displayed using either the corresponding menu or shortcut. The plots give a quick overview over the class counts and dedicated features. Each color channel generates an independent plot. The colors of the bars are defined by the user in the class definition.
Please note that the plots adjust the ranges and scales automatically. Custom adjustment is possible using the toolbar at the bottom of each plot.
Following plots are available: