birl.utilities.dataset module

Functionality related to datasets.

birl.utilities.dataset.args_expand_images(parser, nb_workers=1, overwrite=True)

Expand the parser with standard parameters related to images:

image paths
allow overwrite (optional)
number of jobs

Parameters:
	parser (obj) – existing parser
	nb_workers (int) – default number of threads
	overwrite (bool) – allow overwriting images
Return obj:	the expanded parser

>>> import argparse
>>> args_expand_images(argparse.ArgumentParser())  # doctest: +ELLIPSIS
ArgumentParser(...)
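
As a rough illustration of how such a parser expansion could look, the following sketch adds the three kinds of options listed above to an existing parser. The function name `expand_parser_images` and the option names (`--path_images`, `--nb_workers`, `--overwrite`) are assumptions made for this example and may differ from the ones the library actually registers.

```python
import argparse


def expand_parser_images(parser, nb_workers=1, overwrite=True):
    """Add image-related options to an existing parser (illustrative sketch)."""
    # hypothetical option names, chosen for this example only
    parser.add_argument('-i', '--path_images', type=str, required=True,
                        help='path (pattern) to the input images')
    parser.add_argument('--nb_workers', type=int, default=nb_workers,
                        help='number of jobs running in parallel')
    if overwrite:
        # the overwrite switch is registered only when requested
        parser.add_argument('--overwrite', action='store_true',
                            help='allow overwriting existing images')
    return parser


parser = expand_parser_images(argparse.ArgumentParser())
# parsing and converting to a dictionary, as a parse-style variant might do
args = vars(parser.parse_args(['-i', 'images/*.jpg', '--nb_workers', '2']))
print(args)
```

Converting the parsed namespace with `vars` gives a plain dictionary, which is convenient for passing the options on as keyword arguments.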

birl.utilities.dataset.args_expand_parse_images(parser, nb_workers=1, overwrite=True)

Expand the parser with standard parameters related to images:

image paths
allow overwrite (optional)
number of jobs

Parameters:
	parser (obj) – existing parser
	nb_workers (int) – default number of threads
	overwrite (bool) – allow overwriting images
Return dict:	parsed arguments

birl.utilities.dataset.common_landmarks(points1, points2, threshold=1.5)

Find common landmarks in two sets.

Parameters:
	points1 (ndarray|list(list(float))) – first point set
	points2 (ndarray|list(list(float))) – second point set
	threshold (float) – threshold for assignment (for landmarks in pixels)
Return list(bool):	flags (note that the doctest below shows an array of matched index pairs)

>>> np.random.seed(0)
>>> common = np.random.random((5, 2))
>>> pts1 = np.vstack([common, np.random.random((10, 2))])
>>> pts2 = np.vstack([common, np.random.random((15, 2))])
>>> common_landmarks(pts1, pts2, threshold=1e-3)
array([[0, 0],
       [1, 1],
       [2, 2],
       [3, 3],
       [4, 4]])
>>> np.random.shuffle(pts2)
>>> common_landmarks(pts1, pts2, threshold=1e-3)
array([[ 0, 13],
       [ 1, 10],
       [ 2,  9],
       [ 3, 14],
       [ 4,  8]])
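
The matching idea can be sketched in pure Python as a greedy pairing of points that lie closer than the threshold. This is only an illustration under assumptions: the function name `match_common_points` is hypothetical, and the library may use a different assignment strategy.

```python
import math


def match_common_points(points1, points2, threshold=1.5):
    """Greedily pair points from two sets that lie closer than `threshold`."""
    # collect all candidate pairs together with their Euclidean distance
    candidates = []
    for i, p in enumerate(points1):
        for j, q in enumerate(points2):
            dist = math.hypot(p[0] - q[0], p[1] - q[1])
            if dist < threshold:
                candidates.append((dist, i, j))
    # take the closest pairs first and use every point at most once
    used1, used2, pairs = set(), set(), []
    for _, i, j in sorted(candidates):
        if i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            pairs.append((i, j))
    return sorted(pairs)


pts1 = [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0)]
pts2 = [(9.2, 1.1), (0.1, 0.0), (20.0, 20.0)]
pairs = match_common_points(pts1, pts2, threshold=1.0)
print(pairs)
```

Each pair holds the index in the first set and the index in the second set, similar to the rows printed in the doctest above.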

birl.utilities.dataset.compute_bounding_polygon(landmarks)

Get the polygon inside which all points lie.

Parameters:	landmarks (ndarray) – set of points
Return ndarray:	points of the polygon

>>> np.random.seed(0)
>>> points = np.random.randint(1, 9, (45, 2))
>>> compute_bounding_polygon(points)  # doctest: +NORMALIZE_WHITESPACE
[[1, 2], [2, 4], [1, 5], [2, 8], [7, 8], [8, 7], [8, 1], [3, 1], [3, 2]]

birl.utilities.dataset.compute_convex_hull(landmarks)

Compute the convex hull around the landmarks.

Parameters:	landmarks (ndarray) – set of points
Return ndarray:	points of the polygon

>>> np.random.seed(0)
>>> pts = np.random.randint(15, 30, (10, 2))
>>> compute_convex_hull(pts)
array([[27, 20],
       [27, 25],
       [22, 24],
       [16, 21],
       [15, 18],
       [26, 18]])
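
For readers unfamiliar with convex hulls, the sketch below uses Andrew's monotone chain algorithm in pure Python. It is a swapped-in technique for illustration, not necessarily what the library uses, so the vertex order can differ from the doctest above. The name `convex_hull` is an assumption of this example.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of the cross product of vectors o->a and o->b
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(seq):
        hull = []
        for p in seq:
            # drop points that would create a clockwise or straight turn
            while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    # the last point of each half is the first point of the other half
    return lower[:-1] + upper[:-1]


hull = convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)])
print(hull)
```

The interior point (1, 1) and the point (1, 0) lying on an edge are both dropped, leaving only the four corners.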

birl.utilities.dataset.compute_half_polygon(landmarks, idx_start=0, idx_end=-1)

Compute the half-polygon path.

Parameters:
	landmarks (ndarray) – set of points
	idx_start (int) – index of the starting point
	idx_end (int) – index of the ending point
Return ndarray:	set of points

>>> pts = [(-1, 1), (0, 0), (0, 2), (1, 1), (1, -0.5), (2, 0)]
>>> compute_half_polygon(pts, idx_start=0, idx_end=-1)
[[-1.0, 1.0], [0.0, 2.0], [1.0, 1.0], [2.0, 0.0]]
>>> compute_half_polygon(pts[:2], idx_start=-1, idx_end=0)
[[-1, 1], [0, 0]]
>>> pts = [[0, 2], [1, 5], [2, 4], [2, 5], [4, 4], [4, 6], [4, 8], [5, 8], [5, 8]]
>>> compute_half_polygon(pts)
[[0, 2], [1, 5], [2, 5], [4, 6], [4, 8], [5, 8]]

birl.utilities.dataset.convert_landmarks_from_itk(lnds, image_size)

Convert landmarks from the ITK format to the format used in ImageJ.

Parameters:
	lnds (ndarray) – landmarks
	image_size ((int,int)) – image size: height, width
Return ndarray:	landmarks

>>> convert_landmarks_from_itk([[ 20, 145], [150,  50], [100, 150]], (150, 200))
array([[  5,  20],
       [100, 150],
       [  0, 100]])
>>> lnds = [[ 20, 145], [150,  50], [100, 150], [0, 0], [150, 200]]
>>> img_size = (150, 200)
>>> lnds2 = convert_landmarks_from_itk(convert_landmarks_to_itk(lnds, img_size), img_size)
>>> np.array_equal(lnds, lnds2)
True

birl.utilities.dataset.convert_landmarks_to_itk(lnds, image_size)

Convert the used landmarks to the ITK format.

Parameters:
	lnds (ndarray) – landmarks
	image_size ((int,int)) – image size: height, width
Return ndarray:	landmarks

>>> convert_landmarks_to_itk([[5, 20], [100, 150], [0, 100]], (150, 200))
array([[ 20, 145],
       [150,  50],
       [100, 150]])
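
The coordinate mapping can be read off the doctest values: the two coordinates are swapped and one of them is flipped against the image height. The sketch below reproduces exactly that arithmetic on plain lists. The function names `to_itk` and `from_itk` are hypothetical, and the mapping is inferred from the examples rather than taken from the library source.

```python
def to_itk(landmarks, image_size):
    """Swap the coordinates and flip the first one against the image height."""
    height = image_size[0]
    return [[b, height - a] for a, b in landmarks]


def from_itk(landmarks, image_size):
    """Inverse mapping of `to_itk`."""
    height = image_size[0]
    return [[height - b, a] for a, b in landmarks]


size = (150, 200)
points = [[5, 20], [100, 150], [0, 100]]
converted = to_itk(points, size)
restored = from_itk(converted, size)
print(converted, restored)
```

Applying both functions in sequence returns the original landmarks, which mirrors the round-trip check in the doctest of `convert_landmarks_from_itk`.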

birl.utilities.dataset.detect_binary_blocks(vec_bin)

Detect binary objects by their beginning, end and length in a 1D signal.

Parameters:	vec_bin (list(bool)) – binary vector with 1 for an object
Return tuple(list(int),list(int),list(int)):	beginnings, ends and lengths

>>> vec = np.array([1] * 15 + [0] * 5 + [1] * 20)
>>> detect_binary_blocks(vec)
([0, 20], [15, 39], [14, 19])

birl.utilities.dataset.estimate_scaling(images, max_size=5000)

Find the scaling for a given set of images and a maximal image size.

Parameters:
	images (list(ndarray)) – input images
	max_size (float) – maximal image size in any dimension
Return float:	scaling in range (0, 1]

>>> estimate_scaling([np.zeros((12000, 300, 3))])  # doctest: +ELLIPSIS
0.4...
>>> estimate_scaling([np.zeros((1200, 800, 3))])
1.0
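
One way to obtain values like those in the doctest is to divide the maximal allowed size by the largest image dimension and cap the result at 1. The sketch below works on image shapes instead of arrays to stay dependency-free. The name `estimate_scale` and the exact formula are assumptions that merely agree with the two examples above.

```python
def estimate_scale(image_shapes, max_size=5000):
    """Scale so that the largest height or width fits into `max_size`."""
    # consider only the spatial dimensions, not the colour channels
    largest = max(max(shape[:2]) for shape in image_shapes)
    # never upscale, so the factor is capped at 1
    return min(1.0, max_size / float(largest))


scale_large = estimate_scale([(12000, 300, 3)])
scale_small = estimate_scale([(1200, 800, 3)])
print(scale_large, scale_small)
```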

birl.utilities.dataset.find_largest_object(hist, threshold=0.01)

Find the largest object and give its beginning and end.

Parameters:
	hist (list(float)) – input vector
	threshold (float) – threshold for the input vector
Return list(int):	beginning and end of the largest object

>>> vec = np.array([1] * 15 + [0] * 5 + [1] * 20)
>>> find_largest_object(vec)
(20, 39)

birl.utilities.dataset.find_split_objects(hist, nb_objects=2, threshold=0.01)

Find the N largest objects and set each split in the middle of the distance between them.

Parameters:
	hist (list(float)) – input vector
	nb_objects (int) – number of desired objects
	threshold (float) – threshold for the input vector
Return list(int):	split positions

>>> vec = np.array([1] * 15 + [0] * 5 + [1] * 20)
>>> find_split_objects(vec)
[17]

birl.utilities.dataset.generate_pairing(count, step_hide=None)

Generate registration pairs with an option of hidden landmarks.

Parameters:
	count (int) – total number of samples
	step_hide (int|None) – hide every N-th sample
Return list((int, int)), list(bool):	registration pairs

>>> generate_pairing(4, None)  # doctest: +NORMALIZE_WHITESPACE
([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)],
 [True, True, True, True, True, True])
>>> generate_pairing(4, step_hide=3)  # doctest: +NORMALIZE_WHITESPACE
([(0, 1), (0, 2), (1, 2), (3, 1), (3, 2)],
 [False, False, True, False, False])

birl.utilities.dataset.get_close_diag_corners(points)

Find the points closest to the top-left and bottom-right corners.

Parameters:	points (ndarray) – set of points
Return tuple(ndarray,ndarray):	begin and end of the imaginary diagonal

>>> np.random.seed(0)
>>> points = np.random.randint(1, 9, (20, 2))
>>> get_close_diag_corners(points)
(array([1, 2]), array([7, 8]), (12, 10))

birl.utilities.dataset.histogram_match_cumulative_cdf(source, reference, norm_img_size=1024)

Adjust the pixel values of a gray-scale image such that its histogram matches that of a target image.

Parameters:
	source (ndarray) – 2D image to be transformed, np.array<height1, width1>
	reference (ndarray) – reference 2D image, np.array<height2, width2>
Return ndarray:	transformed image, np.array<height1, width1>

>>> np.random.seed(0)
>>> img = histogram_match_cumulative_cdf(np.random.randint(128, 145, (150, 200)),
...                                      np.random.randint(0, 18, (200, 180)))
>>> img.astype(int)  # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
array([[13, 16,  0, ..., 12,  2,  5],
       [17,  9,  1, ..., 16,  9,  0],
       [11, 12, 14, ...,  8,  5,  4],
       ...,
       [12,  6,  3, ..., 15,  0,  3],
       [11, 17,  2, ..., 12, 12,  5],
       [ 6, 12,  3, ...,  8,  0,  1]])
>>> np.bincount(img.ravel()).astype(int)  # doctest: +NORMALIZE_WHITESPACE
array([1705, 1706, 1728, 1842, 1794, 1866, 1771,    0, 1717, 1752, 1757,
       1723, 1823, 1833, 1749, 1718, 1769, 1747])
>>> img_source = np.random.randint(50, 245, (2500, 3000)).astype(float)
>>> img_source[-1, -1] = 255
>>> img = histogram_match_cumulative_cdf(img_source / 255., img)
>>> np.array(img.shape, dtype=int)
array([2500, 3000])

birl.utilities.dataset.image_histogram_matching(source, reference, use_color='hsv', norm_img_size=4096)

Adjust the image histogram between two images.

Optionally transform the image to a more continuous color space. The source and target images do not need to be the same size, but both have to be RGB or gray-scale.

Parameters:
	source (ndarray) – 2D image to be transformed
	reference (ndarray) – reference 2D image
	use_color (str) – color space used for histogram matching
	norm_img_size (int) – subsample the image to this maximal size
Return ndarray:	transformed image

>>> from birl.utilities.data_io import update_path, load_image
>>> path_imgs = os.path.join(update_path('data_images'), 'rat-kidney_', 'scale-5pc')
>>> img1 = load_image(os.path.join(path_imgs, 'Rat-Kidney_HE.jpg'))
>>> img2 = load_image(os.path.join(path_imgs, 'Rat-Kidney_PanCytokeratin.jpg'))
>>> image_histogram_matching(img1, img2).shape == img1.shape
True
>>> img = image_histogram_matching(img1[..., 0], np.expand_dims(img2[..., 0], 2))
>>> img.shape == img1.shape[:2]
True
>>> # this should return unchanged source image
>>> image_histogram_matching(np.random.random((10, 20, 30, 5)),
...                          np.random.random((30, 10, 20, 5))).ndim
4

birl.utilities.dataset.inside_polygon(polygon, point)

Check if a point is strictly inside the polygon.

Parameters:
	polygon (ndarray|list) – polygon contour
	point (tuple|list) – sample point
Return bool:	inside

>>> poly = [[1, 1], [1, 3], [3, 3], [3, 1]]
>>> inside_polygon(poly, [0, 0])
False
>>> inside_polygon(poly, [1, 1])
False
>>> inside_polygon(poly, [2, 2])
True

birl.utilities.dataset.is_point_above_line(point_begin, point_end, point_test)

Check if the point is to the left of the line.

Parameters:
	point_begin (list(float)) – starting line point
	point_end (list(float)) – ending line point
	point_test (list(float)) – testing point
Return bool:	True if the point is to the left of the line

>>> is_point_above_line([1, 1], [2, 2], [3, 4])
True
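
A common way to implement such a side test is the sign of the 2D cross product between the line direction and the vector to the tested point. The sketch below assumes that approach and a mathematical axis orientation (y pointing up). The name `is_left_of_line` is hypothetical and the library may decide the boundary case differently.

```python
def is_left_of_line(point_begin, point_end, point_test):
    """Return True if `point_test` lies to the left of the directed line."""
    # z-component of the cross product (end - begin) x (test - begin)
    cross = ((point_end[0] - point_begin[0]) * (point_test[1] - point_begin[1])
             - (point_end[1] - point_begin[1]) * (point_test[0] - point_begin[0]))
    # positive means a counter-clockwise turn, i.e. the left side
    return cross > 0


print(is_left_of_line([1, 1], [2, 2], [3, 4]))
print(is_left_of_line([1, 1], [2, 2], [4, 3]))
```

Points lying exactly on the line give a zero cross product and are reported as not left in this sketch.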

birl.utilities.dataset.is_point_in_quadrant_left(point_begin, point_end, point_test)

Check if the point is in the left quadrant relative to the line end point.

Note that a negative response does not mean that the point is on the right side.

Parameters:
	point_begin (list(float)) – starting line point
	point_end (list(float)) – ending line point
	point_test (list(float)) – testing point
Return int:	gives +1 if it is above, -1 if below and 0 elsewhere

>>> is_point_in_quadrant_left([1, 1], [3, 1], [2, 2])
1
>>> is_point_in_quadrant_left([3, 1], [1, 1], [2, 0])
1
>>> is_point_in_quadrant_left([1, 1], [3, 1], [2, 0])
-1
>>> is_point_in_quadrant_left([1, 1], [3, 1], [4, 2])
0

birl.utilities.dataset.is_point_inside_perpendicular(point_begin, point_end, point_test)

Check if the point is to the left of the line and perpendicularly in between the ends of the line segment.

Note that a negative response does not mean that the point is on the right side.

Parameters:
	point_begin (list(float)) – starting line point
	point_end (list(float)) – ending line point
	point_test (list(float)) – testing point
Return int:	gives +1 if it is above, -1 if below and 0 elsewhere

>>> is_point_inside_perpendicular([1, 1], [3, 1], [2, 2])
1
>>> is_point_inside_perpendicular([1, 1], [3, 1], [2, 0])
-1
>>> is_point_inside_perpendicular([1, 1], [3, 1], [4, 2])
0

birl.utilities.dataset.line_angle_2d(point_begin, point_end, deg=True)

Compute the direction of a line given by two points.

Zero is the horizontal direction [1, 0].

Parameters:
	point_begin (list(float)) – starting line point
	point_end (list(float)) – ending line point
	deg (bool) – return the angle in degrees
Return float:	orientation

>>> [line_angle_2d([0, 0], p) for p in ((1, 0), (0, 1), (-1, 0), (0, -1))]
[0.0, 90.0, 180.0, -90.0]
>>> line_angle_2d([1, 1], [2, 3])  # doctest: +ELLIPSIS
63.43...
>>> line_angle_2d([1, 2], [-2, -3])  # doctest: +ELLIPSIS
-120.96...
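
The values in the doctest are consistent with the two-argument arctangent of the coordinate differences. A minimal sketch under that assumption follows. The name `line_angle` is hypothetical and the library's handling of degenerate input (identical points) is not covered here.

```python
import math


def line_angle(point_begin, point_end, deg=True):
    """Orientation of the line from `point_begin` to `point_end`."""
    diff_x = point_end[0] - point_begin[0]
    diff_y = point_end[1] - point_begin[1]
    # atan2 keeps track of the quadrant, giving angles in [-pi, pi]
    angle = math.atan2(diff_y, diff_x)
    return math.degrees(angle) if deg else angle


angles = [line_angle([0, 0], p) for p in ((1, 0), (0, 1), (-1, 0), (0, -1))]
print(angles)
print(line_angle([1, 1], [2, 3]))
```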

birl.utilities.dataset.list_sub_folders(path_folder, name='*')

List all sub-folders matching a particular name pattern.

Parameters:
	path_folder (str) – path to a particular folder
	name (str) – name pattern
Return list(str):	folders

>>> from birl.utilities.data_io import update_path
>>> paths = list_sub_folders(update_path('data_images'))
>>> list(map(os.path.basename, paths))  # doctest: +ELLIPSIS
['images', 'landmarks', 'lesions_', 'rat-kidney_'...]

birl.utilities.dataset.load_large_image(img_path)

Load very large images.

Note: loading relies on matplotlib, because neither ImageMagick nor the other libraries (OpenCV, skimage, Pillow) are able to load images larger than 64k or 32k.

Parameters:	img_path (str) – path to the image
Return ndarray:	image

birl.utilities.dataset.norm_angle(angle, deg=True)

Normalise the angle to be in range (-180, 180) degrees.

Parameters:
	angle (float) – input angle
	deg (bool) – use degrees
Return float:	normalised angle
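
Since no example is given for this function, here is a small sketch of one usual way to wrap an angle with the modulo operator. The name `normalise_angle` is hypothetical. Note that this variant maps onto the half-open interval [-180, 180), so the treatment of the boundary values may differ from the library.

```python
import math


def normalise_angle(angle, deg=True):
    """Wrap an angle into [-180, 180) degrees or [-pi, pi) radians."""
    half_turn = 180.0 if deg else math.pi
    # shift by half a turn, wrap with modulo, and shift back
    return (angle + half_turn) % (2 * half_turn) - half_turn


print(normalise_angle(190))
print(normalise_angle(-190))
print(normalise_angle(45))
```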

birl.utilities.dataset.parse_path_scale(path_folder)

Parse the scale from the given path with annotations.

Parameters:	path_folder (str) – path to the scale folder
Return int:	scale

>>> parse_path_scale('scale-.1pc')
nan
>>> parse_path_scale('user-JB_scale-50pc')
50
>>> parse_path_scale('scale-10pc')
10
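
The parsing can be sketched with the regular expression listed for `REEXP_FOLDER_SCALE` at the end of this page. The function name `parse_scale`, the use of the folder base name and the NaN fallback are assumptions chosen to agree with the doctest above.

```python
import os
import re

# same pattern as the value shown for REEXP_FOLDER_SCALE
PATTERN_SCALE = r'\S*scale-(\d+)pc'


def parse_scale(path_folder):
    """Extract the integer scale (in percent) from a folder name."""
    folder = os.path.basename(path_folder)
    match = re.match(PATTERN_SCALE, folder)
    # fall back to NaN when the name does not follow the pattern
    return int(match.group(1)) if match else float('nan')


print(parse_scale('user-JB_scale-50pc'))
print(parse_scale('scale-10pc'))
print(parse_scale('scale-.1pc'))
```

The pattern accepts only whole numbers between `scale-` and `pc`, which is why a name such as `scale-.1pc` does not match.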

birl.utilities.dataset.project_object_edge(img, dimension)

Scale the image, binarise it with Otsu thresholding and project it to one dimension.

Parameters:
	img (ndarray) – input image
	dimension (int) – select the dimension for projection
Return list(float):

>>> img = np.zeros((20, 10, 3))
>>> img[2:6, 1:7, :] = 1
>>> img[10:17, 4:6, :] = 1
>>> project_object_edge(img, 0).tolist()  # doctest: +NORMALIZE_WHITESPACE
[0.0, 0.0, 0.7, 0.7, 0.7, 0.7, 0.0, 0.0, 0.0, 0.0,
 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0, 0.0, 0.0]

birl.utilities.dataset.save_large_image(img_path, img)

Save large images, larger than 50k x 50k.

Note: saving relies on OpenCV, because the other libraries (matplotlib, Pillow, ITK) are not able to save images larger than 32k.

Parameters:
	img_path (str) – path to the new image
	img (ndarray) – image

>>> img = np.zeros((2500, 3200, 4), dtype=np.uint8)
>>> img[:, :, 0] = 255
>>> img[:, :, 1] = 127
>>> img_path = './sample-image.jpg'
>>> save_large_image(img_path, img)
>>> img2 = load_large_image(img_path)
>>> img2[0, 0].tolist()
[255, 127, 0]
>>> img.shape[:2] == img2.shape[:2]
True
>>> os.remove(img_path)
>>> img_path = './sample-image.png'
>>> save_large_image(img_path, img.astype(np.uint16) * 255)
>>> img3 = load_large_image(img_path)
>>> img.shape[:2] == img3.shape[:2]
True
>>> img3[0, 0].tolist()
[255, 127, 0]
>>> save_large_image(img_path, img2 / 255. * 1.15)  # test overwrite message
>>> os.remove(img_path)

birl.utilities.dataset.scale_large_images_landmarks(images, landmarks)

Scale images and landmarks up to the maximal image size.

Parameters:
	images (list(ndarray)) – list of images
	landmarks (list(ndarray)) – list of landmarks
Return tuple(list(ndarray),list(ndarray)):	lists of images and landmarks

>>> scale_large_images_landmarks([np.zeros((8000, 500, 3), dtype=np.uint8)],
...                              [None, None])  # doctest: +ELLIPSIS
([array(...)], [None, None])

birl.utilities.dataset.simplify_polygon(points, tol_degree=5)

Simplify the path by dropping points lying on the same line.

Parameters:
	points (ndarray) – points of the polygon
	tol_degree (float) – tolerance on the change in orientation
Return list(list(float)):	points of the polygon

>>> pts = [[1, 2], [2, 4], [1, 5], [2, 8], [3, 8], [5, 8], [7, 8], [8, 7],
...     [8, 5], [8, 3], [8, 1], [7, 1], [6, 1], [4, 1], [3, 1], [3, 2], [2, 2]]
>>> simplify_polygon(pts)
[[1, 2], [2, 4], [1, 5], [2, 8], [7, 8], [8, 7], [8, 1], [3, 1], [3, 2]]
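
The simplification can be sketched by comparing the direction of the incoming and outgoing segment at every vertex of the closed polygon and keeping only vertices where the direction changes by more than the tolerance. The name `drop_collinear_points` is hypothetical and the approach is an assumption that happens to reproduce the doctest result above.

```python
import math


def drop_collinear_points(points, tol_degree=5):
    """Keep only vertices where the path direction changes noticeably."""
    kept = []
    count = len(points)
    for i, point in enumerate(points):
        # neighbours in the closed polygon (index -1 wraps around)
        prev_pt = points[i - 1]
        next_pt = points[(i + 1) % count]
        angle_in = math.degrees(
            math.atan2(point[1] - prev_pt[1], point[0] - prev_pt[0]))
        angle_out = math.degrees(
            math.atan2(next_pt[1] - point[1], next_pt[0] - point[0]))
        # smallest absolute difference between the two directions
        change = abs((angle_out - angle_in + 180) % 360 - 180)
        if change > tol_degree:
            kept.append(list(point))
    return kept


pts = [[1, 2], [2, 4], [1, 5], [2, 8], [3, 8], [5, 8], [7, 8], [8, 7],
       [8, 5], [8, 3], [8, 1], [7, 1], [6, 1], [4, 1], [3, 1], [3, 2], [2, 2]]
simple = drop_collinear_points(pts)
print(simple)
```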

birl.utilities.dataset.CONVERT_RGB: pairs of forward and backward color space conversion functions for the keys 'hed', 'hsv', 'lab', 'lch', 'luv' and 'rgb'

birl.utilities.dataset.IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg'): supported image extensions

birl.utilities.dataset.MAX_IMAGE_SIZE = 5000: maximal image size for visualisations, larger images will be downscaled

birl.utilities.dataset.REEXP_FOLDER_SCALE = '\\S*scale-(\\d+)pc': template for detecting/parsing the scale from a folder name

birl.utilities.dataset.TISSUE_CONTENT = 0.01: threshold of tissue/background presence on a potential cutting line
