birl.utilities.dataset module

Some functionality related to datasets.

birl.utilities.dataset.args_expand_images(parser, nb_workers=1, overwrite=True)

Expand the parser with standard parameters related to images:

image paths
allow overwrite (optional)
number of jobs

Parameters

parser (obj) – existing parser
nb_workers (int) – default number of threads
overwrite (bool) – allow overwriting images

Return obj: expanded parser

>>> import argparse
>>> args_expand_images(argparse.ArgumentParser())  
ArgumentParser(...)
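A minimal sketch of the kind of parser extension described above, using only the standard argparse module. The helper name expand_image_args and the option names (--path_images, --overwrite, --nb_workers) are hypothetical illustrations, not necessarily what BIRL registers. Converting the parsed namespace with vars() gives a dict, similar in spirit to what args_expand_parse_images returns.

```python
import argparse


def expand_image_args(parser, nb_workers=1, overwrite=True):
    """Hypothetical sketch: add image-related options to an existing parser.

    The option names below are illustrative assumptions.
    """
    # image paths (e.g. a glob pattern)
    parser.add_argument('-i', '--path_images', type=str, required=True,
                        help='path (pattern) to the input images')
    if overwrite:
        # optional flag allowing existing output images to be replaced
        parser.add_argument('--overwrite', action='store_true', default=False,
                            help='allow overwriting existing images')
    # number of parallel jobs
    parser.add_argument('--nb_workers', type=int, default=nb_workers,
                        help='number of parallel processes')
    return parser


parser = expand_image_args(argparse.ArgumentParser(), nb_workers=2)
args = vars(parser.parse_args(['-i', './images/*.png', '--overwrite']))
print(args)  # {'path_images': './images/*.png', 'overwrite': True, 'nb_workers': 2}
```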

birl.utilities.dataset.args_expand_parse_images(parser, nb_workers=1, overwrite=True)

Expand the parser with standard parameters related to images:

image paths
allow overwrite (optional)
number of jobs

Parameters

parser (obj) – existing parser
nb_workers (int) – default number of threads
overwrite (bool) – allow overwriting images

Return dict: parsed arguments

birl.utilities.dataset.common_landmarks(points1, points2, threshold=1.5)

Find common landmarks in two point sets.

Parameters

points1 (ndarray|list(list(float))) – first point set
points2 (ndarray|list(list(float))) – second point set
threshold (float) – threshold for assignment (for landmarks in pixels)

Return ndarray

pairs of indices (i, j) of matching landmarks in the first and the second set

>>> np.random.seed(0)
>>> common = np.random.random((5, 2))
>>> pts1 = np.vstack([common, np.random.random((10, 2))])
>>> pts2 = np.vstack([common, np.random.random((15, 2))])
>>> common_landmarks(pts1, pts2, threshold=1e-3)
array([[0, 0],
       [1, 1],
       [2, 2],
       [3, 3],
       [4, 4]])
>>> np.random.shuffle(pts2)
>>> common_landmarks(pts1, pts2, threshold=1e-3)
array([[ 0, 13],
       [ 1, 10],
       [ 2,  9],
       [ 3, 14],
       [ 4,  8]])
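One way to pair landmarks is sketched below, under the assumption that two landmarks correspond when they are mutual nearest neighbours closer than threshold; the library may use a different assignment rule. The name match_common_landmarks is hypothetical, and the output mirrors the index-pair format of the examples above.

```python
import numpy as np


def match_common_landmarks(points1, points2, threshold=1.5):
    """Sketch: pair landmarks that are mutual nearest neighbours within `threshold`."""
    pts1 = np.asarray(points1, dtype=float)
    pts2 = np.asarray(points2, dtype=float)
    # pairwise Euclidean distances, shape (len(pts1), len(pts2))
    dist = np.linalg.norm(pts1[:, None, :] - pts2[None, :, :], axis=-1)
    nearest_in_2 = np.argmin(dist, axis=1)  # closest point of set 2 for each point of set 1
    nearest_in_1 = np.argmin(dist, axis=0)  # closest point of set 1 for each point of set 2
    pairs = [(i, j) for i, j in enumerate(nearest_in_2)
             if nearest_in_1[j] == i and dist[i, j] < threshold]
    return np.array(pairs, dtype=int).reshape(-1, 2)


pts1 = [[0, 0], [10, 0], [0, 10], [50, 50]]
pts2 = [[10, 0.5], [100, 100], [0, 0]]
print(match_common_landmarks(pts1, pts2).tolist())  # [[0, 2], [1, 0]]
```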

birl.utilities.dataset.compute_bounding_polygon(landmarks)

Get the polygon enclosing all points.

Parameters: landmarks (ndarray) – set of points
Return ndarray: points of the polygon

>>> np.random.seed(0)
>>> points = np.random.randint(1, 9, (45, 2))
>>> compute_bounding_polygon(points)  
[[1, 2], [2, 4], [1, 5], [2, 8], [7, 8], [8, 7], [8, 1], [3, 1], [3, 2]]

birl.utilities.dataset.compute_convex_hull(landmarks)

Compute the convex hull around the landmarks.

Parameters: landmarks (ndarray) – set of points
Return ndarray: points of the polygon

>>> np.random.seed(0)
>>> pts = np.random.randint(15, 30, (10, 2))
>>> compute_convex_hull(pts)
array([[27, 20],
       [27, 25],
       [22, 24],
       [16, 21],
       [15, 18],
       [26, 18]])
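For reference, a convex hull can be computed with Andrew's monotone chain algorithm, sketched below in plain Python. The function name convex_hull is hypothetical; its vertices come out counter-clockwise (in x-y axes) starting from the smallest point, so the order and starting vertex can differ from the output of compute_convex_hull above.

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices counter-clockwise, first vertex not repeated."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return [list(p) for p in pts]

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # the last point of each chain is the first point of the other one
    return [list(p) for p in lower[:-1] + upper[:-1]]


print(convex_hull([[0, 0], [2, 0], [2, 2], [0, 2], [1, 1], [1, 0]]))
# [[0, 0], [2, 0], [2, 2], [0, 2]]
```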

birl.utilities.dataset.compute_half_polygon(landmarks, idx_start=0, idx_end=-1)

Compute a half-polygon path.

Parameters

idx_start (int) – index of starting point
idx_end (int) – index of ending point
landmarks (ndarray) – set of points

Return ndarray

set of points

>>> pts = [(-1, 1), (0, 0), (0, 2), (1, 1), (1, -0.5), (2, 0)]
>>> compute_half_polygon(pts, idx_start=0, idx_end=-1)
[[-1.0, 1.0], [0.0, 2.0], [1.0, 1.0], [2.0, 0.0]]
>>> compute_half_polygon(pts[:2], idx_start=-1, idx_end=0)
[[-1, 1], [0, 0]]
>>> pts = [[0, 2], [1, 5], [2, 4], [2, 5], [4, 4], [4, 6], [4, 8], [5, 8], [5, 8]]
>>> compute_half_polygon(pts)
[[0, 2], [1, 5], [2, 5], [4, 6], [4, 8], [5, 8]]

birl.utilities.dataset.convert_landmarks_from_itk(lnds, image_size)

Convert landmarks from the ITK format to the format used in ImageJ.

Parameters

lnds (ndarray) – landmarks
image_size ((int,int)) – image size - height, width

Return ndarray

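landmarks

The example below calls the companion function convert_landmarks_to_itk, which converts in the opposite direction.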

>>> convert_landmarks_to_itk([[5, 20], [100, 150], [0, 100]], (150, 200))
array([[ 20, 145],
       [150,  50],
       [100, 150]])
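The example above is consistent with swapping the two coordinates and flipping the vertical one by the image height. The sketch below encodes that inferred convention (an assumption, not the library source) as two hypothetical helpers, to_itk and from_itk, where the second inverts the first.

```python
import numpy as np


def to_itk(lnds, image_size):
    """Hypothetical forward conversion inferred from the documented example:
    swap the two coordinates and flip the vertical one by the image height."""
    height, _width = image_size
    lnds = np.array(lnds)[:, [1, 0]]  # fancy indexing returns a copy
    lnds[:, 1] = height - lnds[:, 1]
    return lnds


def from_itk(lnds, image_size):
    """Inverse of `to_itk`: swap the coordinates back and undo the flip."""
    height, _width = image_size
    lnds = np.array(lnds)[:, [1, 0]]
    lnds[:, 0] = height - lnds[:, 0]
    return lnds


itk = to_itk([[5, 20], [100, 150], [0, 100]], (150, 200))
print(itk.tolist())                          # [[20, 145], [150, 50], [100, 150]]
print(from_itk(itk, (150, 200)).tolist())    # [[5, 20], [100, 150], [0, 100]]
```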

birl.utilities.dataset.detect_binary_blocks(vec_bin)

Detect binary objects in a 1D signal, described by their beginnings, ends, and lengths.

Parameters: vec_bin (list(bool)) – binary vector with 1 for an object
Return tuple(list(int),list(int),list(int)): beginnings, ends, and lengths

>>> vec = np.array([1] * 15 + [0] * 5 + [1] * 20)
>>> detect_binary_blocks(vec)
([0, 20], [15, 39], [14, 19])

birl.utilities.dataset.estimate_scaling(images, max_size=5000)

Find the scaling for a given set of images and a maximal image size.

Parameters

images (list(ndarray)) – input images
max_size (float) – max image size in any dimension

Return float

scaling in range (0, 1]

>>> estimate_scaling([np.zeros((12000, 300, 3))])  
0.4...
>>> estimate_scaling([np.zeros((1200, 800, 3))])
1.0
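The two examples above are consistent with shrinking the largest spatial dimension to max_size and never upscaling. A hypothetical estimate_scale implementing that assumed rule:

```python
import numpy as np


def estimate_scale(images, max_size=5000):
    """Sketch: shrink the largest spatial dimension over all images to
    `max_size`; images that are already small enough keep scale 1.0."""
    largest = max(max(img.shape[:2]) for img in images)
    return min(1.0, float(max_size) / largest)


print(estimate_scale([np.zeros((12000, 30, 3), dtype=np.uint8)]))  # approx. 0.4167
print(estimate_scale([np.zeros((1200, 800, 3), dtype=np.uint8)]))  # 1.0
```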

birl.utilities.dataset.find_largest_object(hist, threshold=0.01)

Find the largest object and return its beginning and end.

Parameters

hist (list(float)) – input vector
threshold (float) – threshold for input vector

Return tuple(int,int)

>>> vec = np.array([1] * 15 + [0] * 5 + [1] * 20)
>>> find_largest_object(vec)
(20, 39)

birl.utilities.dataset.find_split_objects(hist, nb_objects=2, threshold=0.01)

Find the N largest objects and place a split in the middle of the gap between each pair of neighbouring objects.

Parameters

hist (list(float)) – input vector
nb_objects (int) – number of desired objects
threshold (float) – threshold for input vector

Return list(int)

>>> vec = np.array([1] * 15 + [0] * 5 + [1] * 20)
>>> find_split_objects(vec)
[17]
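A sketch of the splitting idea, assuming the threshold is applied directly to the values of hist: detect runs above the threshold, keep the nb_objects longest ones, and split in the middle of each gap between neighbouring runs. The name split_between_objects is hypothetical; on the example vector above it gives the same split.

```python
import numpy as np


def split_between_objects(hist, nb_objects=2, threshold=0.01):
    """Sketch: split positions in the middle of gaps between the longest runs."""
    mask = np.asarray(hist, dtype=float) > threshold
    # pad with zeros so every run has both a rising and a falling edge
    diff = np.diff(np.concatenate([[0], mask.astype(int), [0]]))
    begins = np.flatnonzero(diff == 1)
    ends = np.flatnonzero(diff == -1) - 1  # inclusive end index
    lengths = ends - begins + 1
    # indices of the longest runs, re-sorted by their position in the vector
    keep = np.sort(np.argsort(lengths)[::-1][:nb_objects])
    return [int((ends[a] + begins[b]) // 2) for a, b in zip(keep[:-1], keep[1:])]


vec = np.array([1] * 15 + [0] * 5 + [1] * 20)
print(split_between_objects(vec))  # [17]
```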

birl.utilities.dataset.generate_pairing(count, step_hide=None)

Generate registration pairs, optionally with hidden landmarks.

Parameters

count (int) – total number of samples
step_hide (int|None) – hide the landmarks of every N-th sample

Return list((int, int)), list(bool)

registration pairs and flags marking the pairs where neither image has hidden landmarks

>>> generate_pairing(4, None)  
([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)],
 [True, True, True, True, True, True])
>>> generate_pairing(4, step_hide=3)  
([(0, 1), (0, 2), (1, 2), (3, 1), (3, 2)],
 [False, False, True, False, False])
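The documented outputs can be reproduced by the following rule, which is a reconstruction from the examples rather than the library source: samples 0, step_hide, 2*step_hide, ... have hidden landmarks, pairs of two hidden samples are dropped, the hidden sample is moved to the first position, and a pair is flagged True only when neither sample is hidden. The function name pairing is hypothetical.

```python
from itertools import combinations


def pairing(count, step_hide=None):
    """Sketch reconstructed from the documented examples."""
    # every `step_hide`-th sample (starting at 0) has hidden landmarks
    hidden = set(range(0, count, step_hide)) if step_hide else set()
    pairs, public = [], []
    for i, j in combinations(range(count), 2):
        if i in hidden and j in hidden:
            continue  # no visible landmarks on either side
        if j in hidden:
            i, j = j, i  # put the hidden sample first
        pairs.append((i, j))
        public.append(i not in hidden and j not in hidden)
    return pairs, public


print(pairing(4, step_hide=3))
# ([(0, 1), (0, 2), (1, 2), (3, 1), (3, 2)], [False, False, True, False, False])
```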

birl.utilities.dataset.get_close_diag_corners(points)

Find the points closest to the top-left and bottom-right corners.

Parameters: points (ndarray) – set of points
Return tuple(ndarray,ndarray,(int,int)): begin and end of the imaginary diagonal, and their indices

>>> np.random.seed(0)
>>> points = np.random.randint(1, 9, (20, 2))
>>> get_close_diag_corners(points)
(array([1, 2]), array([7, 8]), (12, 10))
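A minimal sketch of the corner search, assuming "closest" means the Euclidean distance to the minimum and maximum corners of the points' bounding box, and that the third returned value holds the indices of the two selected points. The name close_diag_corners is hypothetical.

```python
import numpy as np


def close_diag_corners(points):
    """Sketch: points nearest to the min (top-left) and max (bottom-right)
    corners of the bounding box, plus their indices."""
    pts = np.asarray(points)
    corner_min, corner_max = pts.min(axis=0), pts.max(axis=0)
    idx_begin = int(np.argmin(np.linalg.norm(pts - corner_min, axis=1)))
    idx_end = int(np.argmin(np.linalg.norm(pts - corner_max, axis=1)))
    return pts[idx_begin], pts[idx_end], (idx_begin, idx_end)


begin, end, idxs = close_diag_corners([[5, 5], [1, 2], [8, 9], [3, 1], [9, 7]])
print(begin.tolist(), end.tolist(), idxs)  # [1, 2] [8, 9] (1, 2)
```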

birl.utilities.dataset.histogram_match_cumulative_cdf(source, reference, norm_img_size=1024)

Adjust the pixel values of a gray-scale image so that its histogram matches that of a reference image.

Parameters

source (ndarray) – 2D image to be transformed, np.array<height1, width1>
reference (ndarray) – reference 2D image, np.array<height2, width2>
norm_img_size (int) – subsample image to this max size

Return ndarray

transformed image, np.array<height1, width1>

>>> np.random.seed(0)
>>> img = histogram_match_cumulative_cdf(np.random.randint(128, 145, (150, 200)),
...                                      np.random.randint(0, 18, (200, 180)))
>>> img.astype(int)  
array([[13, 16,  0, ..., 12,  2,  5],
       [17,  9,  1, ..., 16,  9,  0],
       [11, 12, 14, ...,  8,  5,  4],
       ...,
       [12,  6,  3, ..., 15,  0,  3],
       [11, 17,  2, ..., 12, 12,  5],
       [ 6, 12,  3, ...,  8,  0,  1]])
>>> np.bincount(img.ravel()).astype(int)  
array([1705, 1706, 1728, 1842, 1794, 1866, 1771,    0, 1717, 1752, 1757,
       1723, 1823, 1833, 1749, 1718, 1769, 1747])
>>> img_source = np.random.randint(50, 245, (2500, 3000)).astype(float)
>>> img_source[-1, -1] = 255
>>> img = histogram_match_cumulative_cdf(img_source / 255., img)
>>> np.array(img.shape, dtype=int)
array([2500, 3000])
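The underlying technique is the classic CDF-based histogram matching: each source intensity is mapped to the reference intensity at the same quantile. The sketch below shows that core step with NumPy only; it skips the norm_img_size subsampling, and the name match_histogram_cdf is hypothetical.

```python
import numpy as np


def match_histogram_cdf(source, reference):
    """Sketch: classic CDF-based histogram matching for gray-scale images."""
    src = np.asarray(source).ravel()
    ref = np.asarray(reference).ravel()
    # unique source values, the index of each pixel into them, and their counts
    _src_values, src_idx, src_counts = np.unique(src, return_inverse=True, return_counts=True)
    ref_values, ref_counts = np.unique(ref, return_counts=True)
    # normalised cumulative distribution functions
    src_cdf = np.cumsum(src_counts) / float(src.size)
    ref_cdf = np.cumsum(ref_counts) / float(ref.size)
    # map each source quantile to the reference intensity at the same quantile
    matched = np.interp(src_cdf, ref_cdf, ref_values)
    return matched[src_idx].reshape(np.shape(source))


source = np.arange(100).reshape(10, 10)
reference = np.repeat([0, 10], 50)
matched = match_histogram_cdf(source, reference)
print(matched.shape)  # (10, 10)
```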

birl.utilities.dataset.image_histogram_matching(source, reference, use_color='hsv', norm_img_size=4096)

Adjust the histogram of the source image to match the reference image.

Optionally, transform the images to a more continuous color space first. The source and reference images do not need to be the same size, but both must be either RGB or gray-scale.

See related information:

Parameters

source (ndarray) – 2D image to be transformed
reference (ndarray) – reference 2D image
use_color (str) – color space used for histogram matching
norm_img_size (int) – subsample image to this max size

Return ndarray

transformed image

>>> from birl.utilities.data_io import update_path, load_image
>>> path_imgs = os.path.join(update_path('data-images'), 'rat-kidney_', 'scale-5pc')
>>> img1 = load_image(os.path.join(path_imgs, 'Rat-Kidney_HE.jpg'))
>>> img2 = load_image(os.path.join(path_imgs, 'Rat-Kidney_PanCytokeratin.jpg'))
>>> image_histogram_matching(img1, img2).shape == img1.shape
True
>>> img = image_histogram_matching(img1[..., 0], np.expand_dims(img2[..., 0], 2))
>>> img.shape == img1.shape[:2]
True
>>> # this should return unchanged source image
>>> image_histogram_matching(np.random.random((10, 20, 30, 5)),
...                          np.random.random((30, 10, 20, 5))).ndim
4

birl.utilities.dataset.inside_polygon(polygon, point)

Check whether a point is strictly inside the polygon.

Parameters

polygon (ndarray|list) – polygon contour
point (tuple|list) – sample point

Return bool

inside

>>> poly = [[1, 1], [1, 3], [3, 3], [3, 1]]
>>> inside_polygon(poly, [0, 0])
False
>>> inside_polygon(poly, [1, 1])
False
>>> inside_polygon(poly, [2, 2])
True
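A plain-Python sketch of a strict point-in-polygon test using even-odd ray casting, with an explicit check so that points on an edge or a vertex count as outside. The name point_strictly_inside is hypothetical, and the exact zero test on the cross product assumes integer (or exactly representable) coordinates.

```python
def point_strictly_inside(polygon, point):
    """Sketch: even-odd ray casting; points on the boundary count as outside."""
    px, py = point
    n = len(polygon)
    inside = False
    for k in range(n):
        (x1, y1), (x2, y2) = polygon[k], polygon[(k + 1) % n]
        # boundary test: collinear with the edge and within its bounding box
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        on_segment = (cross == 0
                      and min(x1, x2) <= px <= max(x1, x2)
                      and min(y1, y2) <= py <= max(y1, y2))
        if on_segment:
            return False
        # does a horizontal ray to the right of the point cross this edge?
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


poly = [[1, 1], [1, 3], [3, 3], [3, 1]]
print([point_strictly_inside(poly, p) for p in ([0, 0], [1, 1], [2, 2])])  # [False, False, True]
```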

birl.utilities.dataset.is_point_above_line(point_begin, point_end, point_test)

Check whether the point is to the left of the line.

Parameters

point_begin (list(float)) – starting line point
point_end (list(float)) – ending line point
point_test (list(float)) – testing point

Return bool

True if the point is to the left of the line

>>> is_point_above_line([1, 1], [2, 2], [3, 4])
True
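The left/right test is usually the sign of a 2D cross product, as sketched below. The name is_left_of_line is hypothetical, and "left" here assumes a y-up coordinate frame; in image coordinates with the y axis pointing down, the visual side is mirrored.

```python
def is_left_of_line(point_begin, point_end, point_test):
    """Sketch: sign of the cross product of (end - begin) and (test - begin);
    a positive value means the test point lies on the left side."""
    (x1, y1), (x2, y2), (x, y) = point_begin, point_end, point_test
    cross = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)
    return cross > 0


print(is_left_of_line([1, 1], [2, 2], [3, 4]))  # True
print(is_left_of_line([1, 1], [2, 2], [3, 2]))  # False
```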

birl.utilities.dataset.is_point_in_quadrant_left(point_begin, point_end, point_test)

Check whether the point lies in the left quadrant relative to the line end point.

Note

A negative response does not mean that the point is on the right side.

Parameters

point_begin (list(float)) – starting line point
point_end (list(float)) – ending line point
point_test (list(float)) – testing point

Return int

+1 if the point is above, -1 if below, and 0 otherwise

>>> is_point_in_quadrant_left([1, 1], [3, 1], [2, 2])
1
>>> is_point_in_quadrant_left([3, 1], [1, 1], [2, 0])
1
>>> is_point_in_quadrant_left([1, 1], [3, 1], [2, 0])
-1
>>> is_point_in_quadrant_left([1, 1], [3, 1], [4, 2])
0

birl.utilities.dataset.is_point_inside_perpendicular(point_begin, point_end, point_test)

Check whether the point is to the left of the line and lies perpendicularly within the line segment.

Note

A negative response does not mean that the point is on the right side.

Parameters

point_begin (list(float)) – starting line point
point_end (list(float)) – ending line point
point_test (list(float)) – testing point

Return int

+1 if the point is above, -1 if below, and 0 otherwise

>>> is_point_inside_perpendicular([1, 1], [3, 1], [2, 2])
1
>>> is_point_inside_perpendicular([1, 1], [3, 1], [2, 0])
-1
>>> is_point_inside_perpendicular([1, 1], [3, 1], [4, 2])
0

birl.utilities.dataset.line_angle_2d(point_begin, point_end, deg=True)

Compute the direction of a line given by two points.

The zero angle is horizontal, in the direction [1, 0].

Parameters

point_begin (list(float)) – starting line point
point_end (list(float)) – ending line point
deg (bool) – return angle in degrees

Return float

orientation

>>> [line_angle_2d([0, 0], p) for p in ((1, 0), (0, 1), (-1, 0), (0, -1))]
[0.0, 90.0, 180.0, -90.0]
>>> line_angle_2d([1, 1], [2, 3])  
63.43...
>>> line_angle_2d([1, 2], [-2, -3])  
-120.96...
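The direction can be obtained with atan2 on the difference vector, which gives angles between -180 and 180 degrees with zero along [1, 0]. A hypothetical line_angle sketch that reproduces the examples above:

```python
import math


def line_angle(point_begin, point_end, deg=True):
    """Sketch: direction of the vector begin -> end, zero along [1, 0]."""
    dx = point_end[0] - point_begin[0]
    dy = point_end[1] - point_begin[1]
    angle = math.atan2(dy, dx)  # radians, between -pi and pi
    return math.degrees(angle) if deg else angle


print([round(line_angle([0, 0], p), 2) for p in ((1, 0), (0, 1), (-1, 0), (0, -1))])
# [0.0, 90.0, 180.0, -90.0]
print(round(line_angle([1, 1], [2, 3]), 2))    # 63.43
print(round(line_angle([1, 2], [-2, -3]), 2))  # -120.96
```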

birl.utilities.dataset.list_sub_folders(path_folder, name='*')

List all sub-folders matching a particular name pattern.

Parameters

path_folder (str) – path to a particular folder
name (str) – name pattern

Return list(str)

folders

>>> from birl.utilities.data_io import update_path
>>> paths = list_sub_folders(update_path('data-images'))
>>> list(map(os.path.basename, paths))  
['images', 'landmarks', 'lesions_', 'rat-kidney_'...]
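A minimal sketch with the standard glob and os modules; the name sub_folders is hypothetical, and sorting the result is an added assumption that makes the order deterministic.

```python
import glob
import os
import tempfile


def sub_folders(path_folder, name='*'):
    """Sketch: sorted list of sub-folders of `path_folder` matching `name`."""
    candidates = glob.glob(os.path.join(path_folder, name))
    return sorted(p for p in candidates if os.path.isdir(p))


# usage on a temporary folder with two sub-folders and one plain file
with tempfile.TemporaryDirectory() as tmp:
    for sub in ('images', 'landmarks'):
        os.mkdir(os.path.join(tmp, sub))
    open(os.path.join(tmp, 'readme.txt'), 'w').close()
    names = [os.path.basename(p) for p in sub_folders(tmp)]
print(names)  # ['images', 'landmarks']
```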

birl.utilities.dataset.load_large_image(img_path)

Load very large images.

Note

Loading relies on matplotlib, because neither ImageMagick nor other libraries (OpenCV, skimage, Pillow) can load images larger than 64k or 32k.

Parameters: img_path (str) – path to the image
Return ndarray: image

birl.utilities.dataset.norm_angle(angle, deg=True)

Normalise the angle to the range (-180, 180) degrees.

Parameters

angle (float) – input angle
deg (bool) – use degrees

Return float

normalised angle
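One common way to wrap an angle is modular arithmetic, sketched below. The name normalize_angle is hypothetical, and the boundary convention (180 maps to -180, giving the half-open range [-180, 180)) is an assumption, since the text above does not specify which end is included.

```python
import math


def normalize_angle(angle, deg=True):
    """Sketch: wrap an angle into [-180, 180) degrees, or [-pi, pi) radians."""
    half_turn = 180.0 if deg else math.pi
    return (angle + half_turn) % (2 * half_turn) - half_turn


print(normalize_angle(190))   # -170.0
print(normalize_angle(-190))  # 170.0
print(normalize_angle(720))   # 0.0
```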

birl.utilities.dataset.parse_path_scale(path_folder)

Parse the scale from a given path with annotations.

Parameters: path_folder (str) – path to the scale folder
Return int: scale (nan if the scale cannot be parsed)

>>> parse_path_scale('scale-.1pc')
nan
>>> parse_path_scale('user-JB_scale-50pc')
50
>>> parse_path_scale('scale-10pc')
10
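A sketch of the parsing using the REEXP_FOLDER_SCALE pattern listed at the end of this module; the name parse_scale is hypothetical, and returning NaN on a failed match follows the first example above.

```python
import math
import re

# same pattern as the module constant REEXP_FOLDER_SCALE
REEXP_FOLDER_SCALE = r'\S*scale-(\d+)pc'


def parse_scale(path_folder):
    """Sketch: integer scale (in percent) from a folder name, NaN if absent."""
    match = re.match(REEXP_FOLDER_SCALE, path_folder)
    return int(match.group(1)) if match else math.nan


for name in ('scale-.1pc', 'user-JB_scale-50pc', 'scale-10pc'):
    print(name, parse_scale(name))
# scale-.1pc nan
# user-JB_scale-50pc 50
# scale-10pc 10
```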

birl.utilities.dataset.project_object_edge(img, dimension)

Scale the image, binarise it with Otsu's method, and project it onto one dimension.

Parameters

img (ndarray) – input image
dimension (int) – dimension onto which the image is projected

Return list(float)

>>> img = np.zeros((20, 10, 3))
>>> img[2:6, 1:7, :] = 1
>>> img[10:17, 4:6, :] = 1
>>> project_object_edge(img, 0).tolist()  
[0.0, 0.0, 0.7, 0.7, 0.7, 0.7, 0.0, 0.0, 0.0, 0.0,
 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0, 0.0, 0.0]

birl.utilities.dataset.save_large_image(img_path, img)

Save large images (more than 50k x 50k).

Note

Saving relies on OpenCV, because other libraries (matplotlib, Pillow, ITK) cannot save images larger than 32k.

Parameters

img_path (str) – path to the new image
img (ndarray) – image

>>> img = np.zeros((2500, 3200, 4), dtype=np.uint8)
>>> img[:, :, 0] = 255
>>> img[:, :, 1] = 127
>>> img_path = './sample-image.jpg'
>>> save_large_image(img_path, img)
>>> img2 = load_large_image(img_path)
>>> img2[0, 0].tolist()
[255, 127, 0]
>>> img.shape[:2] == img2.shape[:2]
True
>>> os.remove(img_path)
>>> img_path = './sample-image.png'
>>> save_large_image(img_path, img.astype(np.uint16) * 255)
>>> img3 = load_large_image(img_path)
>>> img.shape[:2] == img3.shape[:2]
True
>>> img3[0, 0].tolist()
[255, 127, 0]
>>> save_large_image(img_path, img2 / 255. * 1.15)  # test overwrite message
>>> os.remove(img_path)

birl.utilities.dataset.scale_large_images_landmarks(images, landmarks)

Scale images and landmarks so that the images do not exceed the maximal image size.

Parameters

images (list(ndarray)) – list of images
landmarks (list(ndarray)) – list of landmarks

Return tuple(list(ndarray),list(ndarray))

lists of images and landmarks

>>> scale_large_images_landmarks([np.zeros((8000, 500, 3), dtype=np.uint8)],
...                              [None, None])  
([array(...)], [None, None])

birl.utilities.dataset.simplify_polygon(points, tol_degree=5)

Simplify the path by dropping points that lie on the same line.

Parameters

points (ndarray) – points of the polygon
tol_degree (float) – tolerance on the change in orientation, in degrees

Return list(list(float))

points of the polygon

>>> pts = [[1, 2], [2, 4], [1, 5], [2, 8], [3, 8], [5, 8], [7, 8], [8, 7],
...     [8, 5], [8, 3], [8, 1], [7, 1], [6, 1], [4, 1], [3, 1], [3, 2], [2, 2]]
>>> simplify_polygon(pts)
[[1, 2], [2, 4], [1, 5], [2, 8], [7, 8], [8, 7], [8, 1], [3, 1], [3, 2]]
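A minimal sketch of the simplification, assuming the contour is closed and a vertex is dropped when the directions of its incoming and outgoing edges differ by less than tol_degree. The name simplify_closed_polygon is hypothetical; on the example above it returns the same nine points.

```python
import math


def simplify_closed_polygon(points, tol_degree=5):
    """Sketch: drop vertices where the closed contour turns by less than `tol_degree`."""
    n = len(points)
    kept = []
    for i in range(n):
        prev_pt, pt, next_pt = points[i - 1], points[i], points[(i + 1) % n]
        angle_in = math.degrees(math.atan2(pt[1] - prev_pt[1], pt[0] - prev_pt[0]))
        angle_out = math.degrees(math.atan2(next_pt[1] - pt[1], next_pt[0] - pt[0]))
        # signed turn between the two edges, wrapped into [-180, 180)
        turn = (angle_out - angle_in + 180) % 360 - 180
        if abs(turn) >= tol_degree:
            kept.append(list(pt))
    return kept


pts = [[1, 2], [2, 4], [1, 5], [2, 8], [3, 8], [5, 8], [7, 8], [8, 7],
       [8, 5], [8, 3], [8, 1], [7, 1], [6, 1], [4, 1], [3, 1], [3, 2], [2, 2]]
print(simplify_closed_polygon(pts))
# [[1, 2], [2, 4], [1, 5], [2, 8], [7, 8], [8, 7], [8, 1], [3, 1], [3, 2]]
```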

birl.utilities.dataset.CONVERT_RGB = {'hed': (skimage.color.rgb2hed, skimage.color.hed2rgb), 'hsv': (skimage.color.rgb2hsv, skimage.color.hsv2rgb), 'lab': (skimage.color.rgb2lab, skimage.color.lab2rgb), 'lch': (<function <lambda>>, <function <lambda>>), 'luv': (skimage.color.rgb2luv, skimage.color.luv2rgb), 'rgb': (<function <lambda>>, <function <lambda>>)}: defines pairs of forward and backward color space conversions

birl.utilities.dataset.IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg'): supported image extensions

birl.utilities.dataset.MAX_IMAGE_SIZE = 5000: maximal image size for visualisations; larger images are downscaled

birl.utilities.dataset.REEXP_FOLDER_SCALE = '\\S*scale-(\\d+)pc': template for detecting/parsing the scale from a folder name

birl.utilities.dataset.TISSUE_CONTENT = 0.01: threshold of tissue/background presence on a potential cutting line