birl.utilities.dataset module¶

Utility functions related to datasets

birl.utilities.dataset.args_expand_images(parser, nb_workers=1, overwrite=True)[source]¶

expand the parser by standard parameters related to images:

image paths
allow overwrite (optional)
number of jobs

Parameters:
	parser (obj) – existing parser
	nb_workers (int) – default number of threads
	overwrite (bool) – allow overwriting images
Return obj:	the expanded parser

>>> import argparse
>>> args_expand_images(argparse.ArgumentParser())  # doctest: +ELLIPSIS
ArgumentParser(...)

birl.utilities.dataset.args_expand_parse_images(parser, nb_workers=1, overwrite=True)[source]¶

expand the parser by standard parameters related to images:

image paths
allow overwrite (optional)
number of jobs

Parameters:
	parser (obj) – existing parser
	nb_workers (int) – default number of threads
	overwrite (bool) – allow overwriting images
Return dict:	parsed arguments

birl.utilities.dataset.common_landmarks(points1, points2, threshold=1.5)[source]¶

find common landmarks in two sets

Parameters:
	points1 (ndarray|list(list(float))) – first point set
	points2 (ndarray|list(list(float))) – second point set
	threshold (float) – threshold for assignment (for landmarks in pixels)
Return list(bool):	flags

>>> np.random.seed(0)
>>> common = np.random.random((5, 2))
>>> pts1 = np.vstack([common, np.random.random((10, 2))])
>>> pts2 = np.vstack([common, np.random.random((15, 2))])
>>> common_landmarks(pts1, pts2, threshold=1e-3)
array([[0, 0],
       [1, 1],
       [2, 2],
       [3, 3],
       [4, 4]])
>>> np.random.shuffle(pts2)
>>> common_landmarks(pts1, pts2, threshold=1e-3)
array([[ 0, 13],
       [ 1, 10],
       [ 2,  9],
       [ 3, 14],
       [ 4,  8]])
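
The matching idea can be sketched with plain NumPy. This is a minimal illustration, not the BIRL implementation: the helper name `match_common_points` is hypothetical, and it assumes a greedy nearest-neighbour rule in which a point from the first set is paired with its closest point in the second set when their Euclidean distance does not exceed the threshold.

```python
import numpy as np


def match_common_points(points1, points2, threshold=1.5):
    """Pair points from two sets that lie within `threshold` of each other.

    Hypothetical helper for illustration only (greedy, not one-to-one).
    """
    pts1 = np.asarray(points1, dtype=float)
    pts2 = np.asarray(points2, dtype=float)
    # pairwise Euclidean distances, shape (len(pts1), len(pts2))
    dist = np.linalg.norm(pts1[:, None, :] - pts2[None, :, :], axis=-1)
    pairs = []
    for i in range(len(pts1)):
        j = int(np.argmin(dist[i]))  # closest point in the second set
        if dist[i, j] <= threshold:
            pairs.append((i, j))
    # always return an (N, 2) integer array, even when nothing matched
    return np.array(pairs, dtype=int).reshape(-1, 2)
```

Each row of the result holds an index into the first set and the index of its match in the second set. Because the rule is greedy, two points of the first set may share one partner; a one-to-one assignment would need an extra step.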

birl.utilities.dataset.compute_bounding_polygon(landmarks)[source]¶

get the polygon that contains all the points

Parameters:	landmarks (ndarray) – set of points
Return ndarray:	points of polygon

>>> np.random.seed(0)
>>> points = np.random.randint(1, 9, (45, 2))
>>> compute_bounding_polygon(points)  # doctest: +NORMALIZE_WHITESPACE
[[1, 2], [2, 4], [1, 5], [2, 8], [7, 8], [8, 7], [8, 1], [3, 1], [3, 2]]

birl.utilities.dataset.compute_convex_hull(landmarks)[source]¶

compute convex hull around landmarks

Parameters:	landmarks (ndarray) – set of points
Return ndarray:	points of polygon

>>> np.random.seed(0)
>>> pts = np.random.randint(15, 30, (10, 2))
>>> compute_convex_hull(pts)
array([[27, 20],
       [27, 25],
       [22, 24],
       [16, 21],
       [15, 18],
       [26, 18]])
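
For readers who want to see how a convex hull can be computed without extra dependencies, here is a minimal sketch using Andrew's monotone chain algorithm. It is not necessarily the method used by `compute_convex_hull`; the function name `convex_hull_monotone_chain` is hypothetical, and the starting vertex and orientation of the result may differ from the output shown above.

```python
def convex_hull_monotone_chain(points):
    """Return hull vertices in counter-clockwise order (y axis pointing up)."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return [list(p) for p in pts]

    def cross(o, a, b):
        # z component of the cross product of vectors o->a and o->b
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        # drop the last vertex while it makes a clockwise or straight turn
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # the last point of each chain is the first point of the other one
    return [list(p) for p in lower[:-1] + upper[:-1]]
```

Collinear points on the hull boundary are dropped by the `<= 0` test; changing it to `< 0` would keep them.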

birl.utilities.dataset.compute_half_polygon(landmarks, idx_start=0, idx_end=-1)[source]¶

compute half polygon path

Parameters:
	idx_start (int) – index of starting point
	idx_end (int) – index of ending point
	landmarks (ndarray) – set of points
Return ndarray:	set of points

>>> pts = [(-1, 1), (0, 0), (0, 2), (1, 1), (1, -0.5), (2, 0)]
>>> compute_half_polygon(pts, idx_start=0, idx_end=-1)
[[-1.0, 1.0], [0.0, 2.0], [1.0, 1.0], [2.0, 0.0]]
>>> compute_half_polygon(pts[:2], idx_start=-1, idx_end=0)
[[-1, 1], [0, 0]]
>>> pts = [[0, 2], [1, 5], [2, 4], [2, 5], [4, 4], [4, 6], [4, 8], [5, 8], [5, 8]]
>>> compute_half_polygon(pts)
[[0, 2], [1, 5], [2, 5], [4, 6], [4, 8], [5, 8]]

birl.utilities.dataset.convert_landmarks_from_itk(lnds, image_size)[source]¶

convert landmarks from the ITK format to the format used in ImageJ

Parameters:
	lnds (ndarray) – landmarks
	image_size ((int,int)) – image size - height, width
Return ndarray:	landmarks

>>> convert_landmarks_from_itk([[ 20, 145], [150,  50], [100, 150]], (150, 200))
array([[  5,  20],
       [100, 150],
       [  0, 100]])
>>> lnds = [[ 20, 145], [150,  50], [100, 150], [0, 0], [150, 200]]
>>> img_size = (150, 200)
>>> lnds2 = convert_landmarks_from_itk(convert_landmarks_to_itk(lnds, img_size), img_size)
>>> np.array_equal(lnds, lnds2)
True

birl.utilities.dataset.convert_landmarks_to_itk(lnds, image_size)[source]¶

convert landmarks from the used (ImageJ) format to the ITK format

Parameters:
	lnds (ndarray) – landmarks
	image_size ((int,int)) – image size - height, width
Return ndarray:	landmarks

>>> convert_landmarks_to_itk([[5, 20], [100, 150], [0, 100]], (150, 200))
array([[ 20, 145],
       [150,  50],
       [100, 150]])
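
The two conversions can be reproduced with a few lines of NumPy. The sketch below is inferred only from the doctest values above (swap the two columns and flip one coordinate by the image height); the helper names `to_itk` and `from_itk` are hypothetical and the actual implementation may differ in details such as dtype handling.

```python
import numpy as np


def to_itk(landmarks, image_size):
    """Map (a, b) to (b, height - a), as suggested by the doctest values."""
    lnds = np.asarray(landmarks)
    height = image_size[0]  # image_size is (height, width)
    return np.stack([lnds[:, 1], height - lnds[:, 0]], axis=1)


def from_itk(landmarks, image_size):
    """Inverse mapping: (x, y) to (height - y, x)."""
    lnds = np.asarray(landmarks)
    height = image_size[0]
    return np.stack([height - lnds[:, 1], lnds[:, 0]], axis=1)
```

Applying `from_itk` after `to_itk` with the same `image_size` returns the original landmarks, which mirrors the round-trip check in the doctest.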

birl.utilities.dataset.detect_binary_blocks(vec_bin)[source]¶

detect binary objects by their beginning, end and length in a 1D signal

Parameters:	vec_bin (list(bool)) – binary vector with 1 for an object
Return tuple(list(int),list(int),list(int)):	beginnings, ends and lengths

>>> vec = np.array([1] * 15 + [0] * 5 + [1] * 20)
>>> detect_binary_blocks(vec)
([0, 20], [15, 39], [14, 19])
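
Runs of ones in a binary vector can be located from the sign changes of its first difference. The sketch below is an independent illustration with a hypothetical name, `find_runs`; it uses exclusive end indices and true run lengths, so its numbers intentionally differ from the convention of the doctest output above.

```python
import numpy as np


def find_runs(vec_bin):
    """Return (begins, ends, lengths) of runs of ones; ends are exclusive."""
    vec = np.asarray(vec_bin).astype(bool).astype(int)
    # pad with zeros so runs touching either border are detected as well
    padded = np.concatenate([[0], vec, [0]])
    diff = np.diff(padded)
    begins = np.flatnonzero(diff == 1)   # 0 -> 1 transitions
    ends = np.flatnonzero(diff == -1)    # 1 -> 0 transitions
    lengths = ends - begins
    return begins.tolist(), ends.tolist(), lengths.tolist()
```

For the vector used in the doctest, this sketch reports runs starting at 0 and 20 with lengths 15 and 20.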

birl.utilities.dataset.estimate_scaling(images, max_size=5000)[source]¶

find scaling for given set of images and maximal image size

Parameters:
	images (list(ndarray)) – input images
	max_size (float) – max image size in any dimension
Return float:	scaling in range (0, 1]

>>> estimate_scaling([np.zeros((12000, 300, 3))])  # doctest: +ELLIPSIS
0.4...
>>> estimate_scaling([np.zeros((1200, 800, 3))])
1.0
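
The two doctest values are consistent with a simple rule: divide the size limit by the largest image dimension and never scale up. The following sketch implements that rule under the assumption that the largest dimension over all images decides; the name `scaling_for_max_size` is hypothetical and the library may aggregate the image sizes differently.

```python
import numpy as np


def scaling_for_max_size(images, max_size=5000):
    """Scale factor that fits the largest image side into `max_size`."""
    # only the first two axes (height, width) matter; channels are ignored
    largest = max(max(img.shape[:2]) for img in images)
    # never enlarge: images that already fit get a factor of 1.0
    return min(1.0, max_size / float(largest))
```

For a 12000 x 300 image and the default limit this gives 5000 / 12000, roughly 0.417.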

birl.utilities.dataset.find_largest_object(hist, threshold=0.01)[source]¶

find the largest object and give its beginning and end

Parameters:
	hist (list(float)) – input vector
	threshold (float) – threshold for input vector
Return list(int):

>>> vec = np.array([1] * 15 + [0] * 5 + [1] * 20)
>>> find_largest_object(vec)
(20, 39)

birl.utilities.dataset.find_split_objects(hist, nb_objects=2, threshold=0.01)[source]¶

find the N largest objects and set each split in the middle of the gap between them

Parameters:
	hist (list(float)) – input vector
	nb_objects (int) – number of desired objects
	threshold (float) – threshold for input vector
Return list(int):

>>> vec = np.array([1] * 15 + [0] * 5 + [1] * 20)
>>> find_split_objects(vec)
[17]

birl.utilities.dataset.generate_pairing(count, step_hide=None)[source]¶

generate registration pairs with an option of hidden landmarks

Parameters:
	count (int) – total number of samples
	step_hide (int|None) – hide every N-th sample
Return list((int, int)), list(bool):	registration pairs

>>> generate_pairing(4, None)  # doctest: +NORMALIZE_WHITESPACE
([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)],
 [True, True, True, True, True, True])
>>> generate_pairing(4, step_hide=3)  # doctest: +NORMALIZE_WHITESPACE
([(0, 1), (0, 2), (1, 2), (3, 1), (3, 2)],
 [False, False, True, False, False])

birl.utilities.dataset.get_close_diag_corners(points)[source]¶

find the points closest to the top left and bottom right corners

Parameters:	points (ndarray) – set of points
Return tuple(ndarray,ndarray):	begin and end of imaginary diagonal

>>> np.random.seed(0)
>>> points = np.random.randint(1, 9, (20, 2))
>>> get_close_diag_corners(points)
(array([1, 2]), array([7, 8]), (12, 10))

birl.utilities.dataset.histogram_match_cumulative_cdf(source, reference, norm_img_size=1024)[source]¶

Adjust the pixel values of a gray-scale image such that its histogram matches that of a reference image

Parameters:
	source (ndarray) – 2D image to be transformed, np.array<height1, width1>
	reference (ndarray) – reference 2D image, np.array<height2, width2>
Return ndarray:	transformed image, np.array<height1, width1>

>>> np.random.seed(0)
>>> img = histogram_match_cumulative_cdf(np.random.randint(128, 145, (150, 200)),
...                                      np.random.randint(0, 18, (200, 180)))
>>> img.astype(int)  # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
array([[13, 16,  0, ..., 12,  2,  5],
       [17,  9,  1, ..., 16,  9,  0],
       [11, 12, 14, ...,  8,  5,  4],
       ...,
       [12,  6,  3, ..., 15,  0,  3],
       [11, 17,  2, ..., 12, 12,  5],
       [ 6, 12,  3, ...,  8,  0,  1]])
>>> np.bincount(img.ravel()).astype(int)  # doctest: +NORMALIZE_WHITESPACE
array([1705, 1706, 1728, 1842, 1794, 1866, 1771,    0, 1717, 1752, 1757,
       1723, 1823, 1833, 1749, 1718, 1769, 1747])
>>> img_source = np.random.randint(50, 245, (2500, 3000)).astype(float)
>>> img_source[-1, -1] = 255
>>> img = histogram_match_cumulative_cdf(img_source / 255., img)
>>> np.array(img.shape, dtype=int)
array([2500, 3000])
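
The principle of histogram matching through cumulative distribution functions (CDFs) can be shown in a few lines. This is a generic textbook sketch rather than the BIRL code: the name `match_histogram_cdf` is hypothetical, and the `norm_img_size` subsampling of the real function is not modelled.

```python
import numpy as np


def match_histogram_cdf(source, reference):
    """Map source intensities so that their CDF follows the reference CDF."""
    src = np.asarray(source)
    ref = np.asarray(reference)
    # unique intensities, the position of every pixel among them, and counts
    src_vals, src_idx, src_counts = np.unique(
        src.ravel(), return_inverse=True, return_counts=True)
    ref_vals, ref_counts = np.unique(ref.ravel(), return_counts=True)
    # empirical CDFs normalised to the range (0, 1]
    src_cdf = np.cumsum(src_counts) / src.size
    ref_cdf = np.cumsum(ref_counts) / ref.size
    # for each source quantile, look up the reference intensity there
    mapped = np.interp(src_cdf, ref_cdf, ref_vals)
    return mapped[src_idx].reshape(src.shape)
```

The result keeps the shape of the source image and takes values from the intensity range of the reference image.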

birl.utilities.dataset.image_histogram_matching(source, reference, use_color='hsv', norm_img_size=4096)[source]¶

adjust image histogram between two images

Optionally transform the image to a more continuous color space. The source and reference images do not need to be the same size, but both must be RGB or gray-scale.

Parameters:
	source (ndarray) – 2D image to be transformed
	reference (ndarray) – reference 2D image
	use_color (str) – color space used for histogram matching
	norm_img_size (int) – subsample image to this max size
Return ndarray:	transformed image

>>> from birl.utilities.data_io import update_path, load_image
>>> path_imgs = os.path.join(update_path('data_images'), 'rat-kidney_', 'scale-5pc')
>>> img1 = load_image(os.path.join(path_imgs, 'Rat-Kidney_HE.jpg'))
>>> img2 = load_image(os.path.join(path_imgs, 'Rat-Kidney_PanCytokeratin.jpg'))
>>> image_histogram_matching(img1, img2).shape == img1.shape
True
>>> img = image_histogram_matching(img1[..., 0], np.expand_dims(img2[..., 0], 2))
>>> img.shape == img1.shape[:2]
True
>>> # this should return unchanged source image
>>> image_histogram_matching(np.random.random((10, 20, 30, 5)),
...                          np.random.random((30, 10, 20, 5))).ndim
4

birl.utilities.dataset.inside_polygon(polygon, point)[source]¶

check if a point is strictly inside the polygon

Parameters:
	polygon (ndarray|list) – polygon contour
	point (tuple|list) – sample point
Return bool:	inside

>>> poly = [[1, 1], [1, 3], [3, 3], [3, 1]]
>>> inside_polygon(poly, [0, 0])
False
>>> inside_polygon(poly, [1, 1])
False
>>> inside_polygon(poly, [2, 2])
True

birl.utilities.dataset.is_point_above_line(point_begin, point_end, point_test)[source]¶

check whether the point lies to the left of the line

Parameters:
	point_begin (list(float)) – starting line point
	point_end (list(float)) – ending line point
	point_test (list(float)) – testing point
Return bool:	True if the point is left of the line

>>> is_point_above_line([1, 1], [2, 2], [3, 4])
True
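
A common way to test on which side of a directed line a point lies is the sign of a 2D cross product. The sketch below assumes that approach and a y axis pointing up; the name `is_left_of_line` is hypothetical and the library function may handle points exactly on the line differently.

```python
def is_left_of_line(point_begin, point_end, point_test):
    """True when point_test is strictly left of the line begin -> end."""
    (x0, y0), (x1, y1), (x, y) = point_begin, point_end, point_test
    # z component of (end - begin) x (test - begin)
    cross = (x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)
    # positive: left side, negative: right side, zero: on the line
    return cross > 0
```

With the doctest input `[1, 1]`, `[2, 2]`, `[3, 4]` the cross product is 1, so the point is reported as being on the left.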

birl.utilities.dataset.is_point_in_quadrant_left(point_begin, point_end, point_test)[source]¶

check whether the point lies in the left quadrant relative to the line end point

Note

a negative response does not mean that the point is on the right side

Parameters:
	point_begin (list(float)) – starting line point
	point_end (list(float)) – ending line point
	point_test (list(float)) – testing point
Return int:	gives +1 if it is above, -1 if below and 0 elsewhere

>>> is_point_in_quadrant_left([1, 1], [3, 1], [2, 2])
1
>>> is_point_in_quadrant_left([3, 1], [1, 1], [2, 0])
1
>>> is_point_in_quadrant_left([1, 1], [3, 1], [2, 0])
-1
>>> is_point_in_quadrant_left([1, 1], [3, 1], [4, 2])
0

birl.utilities.dataset.is_point_inside_perpendicular(point_begin, point_end, point_test)[source]¶

check whether the point is left of the line and perpendicularly within the line segment

Note

a negative response does not mean that the point is on the right side

Parameters:
	point_begin (list(float)) – starting line point
	point_end (list(float)) – ending line point
	point_test (list(float)) – testing point
Return int:	gives +1 if it is above, -1 if below and 0 elsewhere

>>> is_point_inside_perpendicular([1, 1], [3, 1], [2, 2])
1
>>> is_point_inside_perpendicular([1, 1], [3, 1], [2, 0])
-1
>>> is_point_inside_perpendicular([1, 1], [3, 1], [4, 2])
0

birl.utilities.dataset.line_angle_2d(point_begin, point_end, deg=True)[source]¶

Compute the direction of a line given by two points

the zero angle is horizontal, in the direction [1, 0]

Parameters:
	point_begin (list(float)) – starting line point
	point_end (list(float)) – ending line point
	deg (bool) – return angle in degrees
Return float:	orientation

>>> [line_angle_2d([0, 0], p) for p in ((1, 0), (0, 1), (-1, 0), (0, -1))]
[0.0, 90.0, 180.0, -90.0]
>>> line_angle_2d([1, 1], [2, 3])  # doctest: +ELLIPSIS
63.43...
>>> line_angle_2d([1, 2], [-2, -3])  # doctest: +ELLIPSIS
-120.96...
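
The doctest values above agree with the two-argument arctangent of the coordinate differences. A minimal sketch under that assumption, with the hypothetical name `line_angle`:

```python
import math


def line_angle(point_begin, point_end, deg=True):
    """Orientation of the vector begin -> end, zero along [1, 0]."""
    dx = point_end[0] - point_begin[0]
    dy = point_end[1] - point_begin[1]
    # atan2 resolves the quadrant and returns a value in [-pi, pi]
    angle = math.atan2(dy, dx)
    return math.degrees(angle) if deg else angle
```

For example, the vector from `[1, 1]` to `[2, 3]` has differences (1, 2), which gives about 63.43 degrees.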

birl.utilities.dataset.list_sub_folders(path_folder, name='*')[source]¶

list all sub folders with particular name pattern

Parameters:
	path_folder (str) – path to a particular folder
	name (str) – name pattern
Return list(str):	folders

>>> from birl.utilities.data_io import update_path
>>> paths = list_sub_folders(update_path('data_images'))
>>> list(map(os.path.basename, paths))  # doctest: +ELLIPSIS
['images', 'landmarks', 'lesions_', 'rat-kidney_'...]

birl.utilities.dataset.load_large_image(img_path)[source]¶

loading very large images

Note

For the loading we have to use matplotlib because neither ImageMagick nor other libraries (OpenCV, skimage, Pillow) are able to load images larger than 64k or 32k.

Parameters:	img_path (str) – path to the image
Return ndarray:	image

birl.utilities.dataset.norm_angle(angle, deg=True)[source]¶

Normalise the angle to the range (-180, 180) degrees

Parameters:
	angle (float) – input angle
	deg (bool) – use degrees
Return float:	normalised angle
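
Since this function has no example, here is a minimal sketch of one common way to wrap an angle with the modulo operator. The name `normalise_angle` is hypothetical, and the treatment of the boundary is an assumption: this version maps into the half-open interval [-180, 180), which may differ from the library at exactly ±180.

```python
import math


def normalise_angle(angle, deg=True):
    """Wrap an angle into [-180, 180) degrees or [-pi, pi) radians."""
    half_turn = 180.0 if deg else math.pi
    # shift by half a turn, wrap into one full turn, then shift back
    return (angle + half_turn) % (2 * half_turn) - half_turn
```

For instance, 190 degrees wraps to -170 and -190 degrees wraps to 170.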

birl.utilities.dataset.parse_path_scale(path_folder)[source]¶

parse the scale from a given annotated folder path

Parameters:	path_folder (str) – path to the scale folder
Return int:	scale

>>> parse_path_scale('scale-.1pc')
nan
>>> parse_path_scale('user-JB_scale-50pc')
50
>>> parse_path_scale('scale-10pc')
10
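
The doctest results can be reproduced with the `REEXP_FOLDER_SCALE` pattern listed among the module constants. The sketch below assumes the pattern is matched against the last path component; the function name `parse_scale` is hypothetical.

```python
import os
import re

# same pattern as the module constant REEXP_FOLDER_SCALE
REEXP_FOLDER_SCALE = r'\S*scale-(\d+)pc'


def parse_scale(path_folder):
    """Return the integer scale in percent, or NaN when it cannot be parsed."""
    folder = os.path.basename(path_folder)
    match = re.match(REEXP_FOLDER_SCALE, folder)
    # the pattern accepts whole numbers only, so 'scale-.1pc' does not match
    return int(match.group(1)) if match else float('nan')
```

A fractional scale such as `.1` is rejected because `\d+` requires at least one digit directly after `scale-`.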

birl.utilities.dataset.project_object_edge(img, dimension)[source]¶

scale the image, binarise with Otsu and project to one dimension

Parameters:
	img (ndarray) – input image
	dimension (int) – select dimension for projection
Return list(float):

>>> img = np.zeros((20, 10, 3))
>>> img[2:6, 1:7, :] = 1
>>> img[10:17, 4:6, :] = 1
>>> project_object_edge(img, 0).tolist()  # doctest: +NORMALIZE_WHITESPACE
[0.0, 0.0, 0.7, 0.7, 0.7, 0.7, 0.0, 0.0, 0.0, 0.0,
 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.0, 0.0, 0.0]

birl.utilities.dataset.save_large_image(img_path, img)[source]¶

saving large images, more than 50k x 50k

Note

For the saving we have to use OpenCV because other libraries (matplotlib, Pillow, ITK) are not able to save images larger than 32k.

Parameters:
	img_path (str) – path to the new image
	img (ndarray) – image

>>> img = np.zeros((2500, 3200, 4), dtype=np.uint8)
>>> img[:, :, 0] = 255
>>> img[:, :, 1] = 127
>>> img_path = './sample-image.jpg'
>>> save_large_image(img_path, img)
>>> img2 = load_large_image(img_path)
>>> img2[0, 0].tolist()
[255, 127, 0]
>>> img.shape[:2] == img2.shape[:2]
True
>>> os.remove(img_path)
>>> img_path = './sample-image.png'
>>> save_large_image(img_path, img.astype(np.uint16) * 255)
>>> img3 = load_large_image(img_path)
>>> img.shape[:2] == img3.shape[:2]
True
>>> img3[0, 0].tolist()
[255, 127, 0]
>>> save_large_image(img_path, img2 / 255. * 1.15)  # test overwrite message
>>> os.remove(img_path)

birl.utilities.dataset.scale_large_images_landmarks(images, landmarks)[source]¶

scale images and landmarks so that they fit the maximal image size

Parameters:
	images (list(ndarray)) – list of images
	landmarks (list(ndarray)) – list of landmarks
Return tuple(list(ndarray),list(ndarray)):	lists of images and landmarks

>>> scale_large_images_landmarks([np.zeros((8000, 500, 3), dtype=np.uint8)],
...                              [None, None])  # doctest: +ELLIPSIS
([array(...)], [None, None])

birl.utilities.dataset.simplify_polygon(points, tol_degree=5)[source]¶

simplify the path, drop points lying on the same line

Parameters:
	points (ndarray) – points of the polygon
	tol_degree (float) – tolerance on change in orientation
Return list(list(float)):	points of polygon

>>> pts = [[1, 2], [2, 4], [1, 5], [2, 8], [3, 8], [5, 8], [7, 8], [8, 7],
...     [8, 5], [8, 3], [8, 1], [7, 1], [6, 1], [4, 1], [3, 1], [3, 2], [2, 2]]
>>> simplify_polygon(pts)
[[1, 2], [2, 4], [1, 5], [2, 8], [7, 8], [8, 7], [8, 1], [3, 1], [3, 2]]

birl.utilities.dataset.CONVERT_RGB[source]¶: pairs of forward and backward color space conversion functions, keyed by color space name ('hed', 'hsv', 'lab', 'lch', 'luv', 'rgb')

birl.utilities.dataset.IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg')[source]¶: supported image extensions

birl.utilities.dataset.MAX_IMAGE_SIZE = 5000[source]¶: maximal image size for visualisations, larger images will be downscaled

birl.utilities.dataset.REEXP_FOLDER_SCALE = '\\S*scale-(\\d+)pc'[source]¶: template for detecting/parsing scale from folder name

birl.utilities.dataset.TISSUE_CONTENT = 0.01[source]¶: threshold of tissue/background presence on potential cutting line
