utilities

Overview

A collection of various useful routines.

Functions

pymisclib.utilities.dir_path(path: str) -> Path

Convert the string to a path and return it if it is a directory.

This function is intended to be used as the type argument to argparse.ArgumentParser.add_argument().

Parameters:

path (str) – String containing path to check.

Returns:

Path to a directory.

Return type:

Path

Raises:

argparse.ArgumentTypeError – The string was not a valid path to a directory.
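The following is a minimal sketch of how a type function of this kind plugs into argparse. The name dir_path_sketch and the option --out-dir are hypothetical stand-ins; the library's actual implementation may differ.

```python
import argparse
import tempfile
from pathlib import Path


def dir_path_sketch(path: str) -> Path:
    """Hypothetical stand-in for dir_path(): return a Path only for directories."""
    candidate = Path(path)
    if not candidate.is_dir():
        # argparse reports this exception to the user as a usage error.
        raise argparse.ArgumentTypeError(f"not a directory: {path}")
    return candidate


parser = argparse.ArgumentParser()
parser.add_argument("--out-dir", type=dir_path_sketch)

# Use a temporary directory so the example works on any machine.
with tempfile.TemporaryDirectory() as tmp:
    args = parser.parse_args(["--out-dir", tmp])
    parsed_is_dir = args.out_dir.is_dir()
```

Because the conversion happens while parsing, the rest of the program can rely on args.out_dir already being a valid directory Path.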

pymisclib.utilities.exit_hard(returncode: int = 0)

Terminate the running application.

Parameters:

returncode (int) – Exit code to return to spawning shell.

pymisclib.utilities.extract_from_zip_preserving_mtime(zip_file: ZipFile, zip_info: ZipInfo, out_path: dir_path)

Extract file from a ZIP archive preserving the modification time.

Parameters:
  • zip_file (ZipFile) – The archive to extract from.

  • zip_info (ZipInfo) – The information about the file to extract.

  • out_path (dir_path) – Destination directory for the extracted file (must exist).

See:

https://stackoverflow.com/q/9813243
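A sketch of the general technique, assuming the approach discussed in the linked question: extract the member, then set the file's modification time from ZipInfo.date_time. The function name is hypothetical and the library's own code may handle more cases (for example directories).

```python
import os
import tempfile
import time
import zipfile
from pathlib import Path


def extract_preserving_mtime_sketch(zip_file, zip_info, out_path):
    """Extract one member, then copy its archive timestamp onto the file."""
    extracted = Path(zip_file.extract(zip_info, path=out_path))
    # ZipInfo.date_time is a local-time tuple (Y, M, D, h, m, s); pad it to the
    # nine fields time.mktime() expects and let it work out DST (-1).
    mtime = time.mktime(zip_info.date_time + (0, 0, -1))
    os.utime(extracted, (mtime, mtime))
    return extracted


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "demo.zip"
    member = zipfile.ZipInfo("hello.txt", date_time=(2020, 1, 2, 3, 4, 6))
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(member, "hello")
    with zipfile.ZipFile(archive) as zf:
        info = zf.getinfo("hello.txt")
        target = extract_preserving_mtime_sketch(zf, info, Path(tmp))
        expected_mtime = time.mktime(info.date_time + (0, 0, -1))
        actual_mtime = target.stat().st_mtime
```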

pymisclib.utilities.file_path(path: str) -> Path

Convert the string to a path and return it if it is a file.

Parameters:

path (str) – String containing path to check.

Returns:

Path to a file.

Return type:

Path

Raises:

argparse.ArgumentTypeError – The string did not contain a valid path to a file.

pymisclib.utilities.get_language() -> str

Determine the language the current user has set their OS to.

pymisclib.utilities.hexdump(b: bytes, bytes_per_line: int = 16, start_offset: int = 0, offset_digits: int = 8, show_ascii: bool = True) -> str

Generator function that creates a formatted representation of the given bytes, yielding one line per iteration.

first_prefix   00000000  64 65 66 67 68 69 6A 6B  @ABCDEFG
always_prefix  00000000  64 65 66 67 68 69 6A 6B  @ABCDEFG

Parameters:
  • b (bytes) – The bytes to print.

  • bytes_per_line (int) – Number of bytes per line.

  • start_offset (int) – Starting offset for the first byte.

  • offset_digits (int) – Number of digits in the offset.

  • show_ascii (bool) – Print ASCII characters after the hex dump if True.
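The parameters above can be illustrated with a small generator. This is a sketch under assumptions: the exact column spacing and the use of a dot for non-printable bytes are guesses, and hexdump_sketch is a hypothetical name.

```python
def hexdump_sketch(b, bytes_per_line=16, start_offset=0, offset_digits=8, show_ascii=True):
    """Yield one formatted line per chunk of bytes_per_line bytes."""
    for pos in range(0, len(b), bytes_per_line):
        chunk = b[pos:pos + bytes_per_line]
        line = f"{start_offset + pos:0{offset_digits}X}  "
        line += " ".join(f"{byte:02X}" for byte in chunk)
        if show_ascii:
            # Pad short final lines so the ASCII column stays aligned.
            line += "   " * (bytes_per_line - len(chunk))
            # Show printable ASCII as-is and everything else as a dot.
            line += "  " + "".join(chr(c) if 32 <= c < 127 else "." for c in chunk)
        yield line


lines = list(hexdump_sketch(b"@ABCDEFG\x00\x01", bytes_per_line=8))
# lines[0] is "00000000  40 41 42 43 44 45 46 47  @ABCDEFG"
```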

pymisclib.utilities.initialize_console()

Initialize the console to work with UTF-8 encoded strings.

On Windows, console handling is idiosyncratic, and more so when output is redirected to a file. This confuses Python: PEP 528 solves the problem for interactive consoles, but it does not help with non-interactive streams, which redirected streams are.

The solution for now is to reconfigure the codecs for stdout and stderr to always use UTF-8 and replace unmappable characters.
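The reconfiguration described above can be sketched with io.TextIOWrapper.reconfigure(), which is available since Python 3.7. The helper name is hypothetical; real usage would pass sys.stdout and sys.stderr, while the demo below uses an in-memory stream so that it has no side effects.

```python
import io


def force_utf8_sketch(stream):
    """Switch a text stream to UTF-8, replacing characters it cannot encode."""
    # Some stream objects (e.g. replacements installed by test runners) lack
    # reconfigure(), so check before calling it.
    if hasattr(stream, "reconfigure"):
        stream.reconfigure(encoding="utf-8", errors="replace")


# An in-memory stream that starts out as ASCII demonstrates the effect.
raw = io.BytesIO()
demo = io.TextIOWrapper(raw, encoding="ascii")
force_utf8_sketch(demo)
demo.write("é")
demo.flush()
written = raw.getvalue()
```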

pymisclib.utilities.logging_add_trace()

Add a loglevel TRACE.

This function should be called only once.
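As a rough illustration, a custom level can be registered with logging.addLevelName(). The numeric value 5 is an assumption chosen to sort below logging.DEBUG (10); the value used by the library is not stated here.

```python
import logging

TRACE = 5  # assumed numeric value, below logging.DEBUG (10)


def add_trace_level_sketch():
    """Register the name TRACE so records at that level are labelled properly."""
    logging.addLevelName(TRACE, "TRACE")


add_trace_level_sketch()
logger = logging.getLogger("trace_demo")
logger.setLevel(TRACE)
logger.log(TRACE, "very detailed message")
```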

pymisclib.utilities.logging_initialize(loglevel: int = 30, log_dir_path: Path = PosixPath('.'), log_file_name_format: str = '%P.log', log_rotation: int = 0, log_compression: int = 5, loglevel_console: int = 30, loglevel_file: int = 10) -> Logger

Initialize the logging interface with the given parameters. An instance of the root logger is returned to the caller.

If a submodule uses logging.getLogger(‘somename’), the logger will be a child of the root logger and inherit the settings made here.

Parameters:
  • loglevel (int) – Loglevel of the logger. Log messages not meeting the level requirement are not processed at all.

  • log_dir_path (Path) – Path of the directory containing log files. None will prevent log file creation.

  • log_file_name_format (str) – Format string for the log file name. None will prevent log file creation.

  • log_rotation (int) – How many logfiles of the same name to keep. If set to 0, any existing log file is overwritten. If set to >0, that many old log file copies (named <name>.1, <name>.2, etc.) are retained.

  • log_compression (int) – Zlib compression level to use when compressing log files. Level 0 is no compression, level 9 is the highest possible.

  • loglevel_console (int) – Loglevel filter of the console logger.

  • loglevel_file (int) – Loglevel filter of the file logger.

Returns:

The root logger instance for the application.

Return type:

logging.Logger

Note:

If the log_file_name_format contains a timestamp, log_rotation will only work on other log file copies with the exact same timestamp.
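The interplay of the three level parameters can be sketched with the standard logging module alone: the logger level gates records first, then each handler applies its own filter. The in-memory buffers stand in for the console and the log file; this is an illustration, not the library's setup code.

```python
import io
import logging

logger = logging.getLogger("levels_demo")
logger.setLevel(logging.DEBUG)      # plays the role of loglevel
logger.propagate = False            # keep the demo away from the root logger

console_buf, file_buf = io.StringIO(), io.StringIO()
console = logging.StreamHandler(console_buf)
console.setLevel(logging.WARNING)   # plays the role of loglevel_console
log_file = logging.StreamHandler(file_buf)
log_file.setLevel(logging.DEBUG)    # plays the role of loglevel_file
logger.addHandler(console)
logger.addHandler(log_file)

logger.debug("details")
logger.warning("problem")

console_lines = console_buf.getvalue().splitlines()
file_lines = file_buf.getvalue().splitlines()
```

Here the debug message reaches only the "file" handler, while the warning reaches both.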

pymisclib.utilities.initialize_logging(args: Namespace)

Initialize the logging interface with the command line options passed through the object ‘args’. An instance of the root logger is returned to the caller.

If a submodule uses logging.getLogger(‘somename’), the logger will be a child of the root logger and inherit the settings made here.

Parameters:

args (argparse.Namespace) – Namespace containing attributes used as parameters.

Returns:

The root logger instance for the application.

Return type:

logging.Logger

Namespace attributes used:
  • args.debug bool - Enable or disable debug mode.

  • args.logging str - Loglevel (e.g. CRITICAL, ERROR, etc.)

  • args.verbose bool - Enable or disable verbose mode.

Deprecated since version 1.2.0: Use logging_initialize() and logging_add_trace() (if needed) instead.

pymisclib.utilities.is_power_of_two(n: int) -> bool

Return True if the given number n is a power of two.

Parameters:

n (int) – number to check

Returns:

True if n is a power of two, False otherwise.

Return type:

bool
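One common way to perform this check is the bit trick below. It is a sketch, not necessarily the library's implementation; in particular, rejecting zero and negative numbers is an assumption.

```python
def is_power_of_two_sketch(n: int) -> bool:
    # A power of two has exactly one bit set, so clearing the lowest set bit
    # with n & (n - 1) leaves zero. Zero and negatives are rejected here,
    # which is an assumption about the intended behaviour.
    return n > 0 and (n & (n - 1)) == 0
```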

pymisclib.utilities.iso8601_str(d: datetime, seconds_digits: int = 0, compact: bool = False, utc: bool = False) -> str

Convert a datetime to an ISO8601 string.

ISO8601 does not prescribe a single string format; implementations have some leeway in the formatting. This function produces different strings depending on the input parameters.

The timezone information (“±HHMM”) is only present for aware datetime objects (i.e. datetime objects with tzinfo).

The number of fractional seconds digits is given by seconds_digits. If zero digits are requested, the decimal point is left out (e.g. “2023-11-21T18:37:31”). If more than 6 digits are requested, only 6 digits will be output because datetime objects are limited to microsecond resolution.

Compact notation does not separate the different fields of the string (e.g. “YYYY-MM-DDTHH:MM:SS” becomes “YYYYMMDDTHHMMSS”).

UTC notation will convert the datetime into UTC. Note that if the datetime is not aware, Python will assume it is in local time. Instead of the time zone offset, a “Z” will be appended.

Parameters:
  • d (datetime) – Date to convert to a string.

  • seconds_digits (int) – Number of fractional digits of the seconds (0 to 6).

  • compact (bool) – Use separators between values if False, compact representation if True.

  • utc (bool) – Output UTC time with ‘Z’ suffix instead of timezone offset.

Returns:

String representation of the datetime.

Example

Given the datetime 2023-11-21T18:37:31.123456+0100 and that we are running in GMT-02, the following will be output:

aware  seconds_digits  compact  utc    Result
-----  --------------  -------  -----  -------------------------------
Yes    0               False    False  2023-11-21T18:37:31+0100
Yes    1               False    False  2023-11-21T18:37:31.1+0100
Yes    6               False    False  2023-11-21T18:37:31.123456+0100
Yes    7               False    False  2023-11-21T18:37:31.123456+0100
No     6               False    False  2023-11-21T18:37:31.123456
Yes    6               True     False  20231121T183731.123456+0100
Yes    6               False    True   2023-11-21T17:37:31.123456Z
No     6               False    True   2023-11-21T15:37:31.123456Z

Note: The result for a non-aware datetime with utc=True depends on the current local time zone.

New in version 1.4.1.
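The rules described above can be sketched with datetime.strftime(). This is an illustration under assumptions: the name iso8601_sketch is hypothetical, and truncating (rather than rounding) the fractional seconds is a guess.

```python
from datetime import datetime, timedelta, timezone


def iso8601_sketch(d, seconds_digits=0, compact=False, utc=False):
    """Format d following the rules described for iso8601_str()."""
    if utc:
        # For a naive datetime, astimezone() assumes local time.
        d = d.astimezone(timezone.utc)
    digits = max(0, min(seconds_digits, 6))  # microseconds give at most 6 digits
    date_fmt, time_fmt = ("%Y%m%d", "%H%M%S") if compact else ("%Y-%m-%d", "%H:%M:%S")
    text = d.strftime(f"{date_fmt}T{time_fmt}")
    if digits:
        # Truncate the zero-padded microseconds to the requested precision.
        text += "." + f"{d.microsecond:06d}"[:digits]
    if utc:
        text += "Z"
    elif d.tzinfo is not None:
        text += d.strftime("%z")  # e.g. "+0100"
    return text


aware = datetime(2023, 11, 21, 18, 37, 31, 123456,
                 tzinfo=timezone(timedelta(hours=1)))
naive = aware.replace(tzinfo=None)
```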

pymisclib.utilities.log_hexdump(fn_logger: Logger, b: bytes, bytes_per_line: int = 16, level: int = 10, start_offset: int = 0, first_prefix: str = '', always_prefix: str = '', show_ascii: bool = True)

Log a pretty representation of the given bytes to the logger.

first_prefix   00000000  64 65 66 67 68 69 6A 6B  @ABCDEFG
always_prefix  00000000  64 65 66 67 68 69 6A 6B  @ABCDEFG

Parameters:
  • fn_logger (logging.Logger) – The logger to log to.

  • b (bytes) – The bytes to print.

  • bytes_per_line (int) – The number of bytes per line.

  • level (int) – Level for logging (e.g. CRITICAL, ERROR, …, DEBUG)

  • start_offset (int) – The starting offset for the first byte.

  • first_prefix (str) – A string before the offset on the first line.

  • always_prefix (str) – A string that will be printed before every line (except the first if first_prefix was specified).

  • show_ascii (bool) – Print ASCII characters after the hex dump if True.

pymisclib.utilities.log_pp(fn_logger: Logger, obj: Any, level: int = 10, indent: int = 0, nesting_indent: int = 1, max_width: int = 80, compact: bool = False, sort_dicts: bool = True)

Log an object formatted with pprint.pformat to a logger. Multi-line output is spread over multiple log entries.

Parameters:
  • fn_logger (logging.Logger) – Logger instance to log to.

  • obj (Any) – Object to log.

  • level (int) – Level for logging (e.g. logging.DEBUG, logging.INFO, …)

  • indent (int) – Number of spaces to indent all entries.

  • nesting_indent (int) – Additional indentation for each object nesting level.

  • max_width (int) – Maximum width of the formatted object if indent is zero. The maximum log message length is max_width + indent.

  • compact (bool) – True to use compact representation, False for pretty.

  • sort_dicts (bool) – True to sort dictionaries before output.

log_pp() Parameter correspondence to pprint.PrettyPrinter

log_pp          pprint.PrettyPrinter
--------------  --------------------
nesting_indent  indent
max_width       width
compact         compact
sort_dicts      sort_dicts
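The parameter correspondence can be made concrete with a short sketch that forwards the arguments to pprint.pformat() and logs each resulting line as its own entry. The names log_pp_sketch and ListHandler are hypothetical; the library's implementation may differ in detail.

```python
import logging
import pprint


def log_pp_sketch(fn_logger, obj, level=logging.DEBUG, indent=0,
                  nesting_indent=1, max_width=80, compact=False, sort_dicts=True):
    """Format obj with pprint.pformat() and log each line separately."""
    text = pprint.pformat(obj, indent=nesting_indent, width=max_width,
                          compact=compact, sort_dicts=sort_dicts)
    for line in text.splitlines():
        fn_logger.log(level, "%s%s", " " * indent, line)


class ListHandler(logging.Handler):
    """Collect log messages in a list (demo helper)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
demo_logger = logging.getLogger("log_pp_demo")
demo_logger.setLevel(logging.DEBUG)
demo_logger.propagate = False
demo_logger.addHandler(handler)
log_pp_sketch(demo_logger, {"b": 1, "a": 2}, indent=2, max_width=10)
```

With max_width=10 the dictionary does not fit on one line, so two log entries are produced, each indented by two spaces.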

pymisclib.utilities.log_stacktrace(fn_logger: Logger, level: int = 10)

Log the current stack trace inside or outside an exception.

Parameters:
  • fn_logger (logging.Logger) – The logger to log to.

  • level (int) – Level for logging (e.g. CRITICAL, ERROR, …, DEBUG)

pymisclib.utilities.log_xml(fn_logger: Logger, xml_elem: Element | ElementTree, level: int = 10, indent: int = 0)

Log the XML as one or more log entries.

Parameters:
  • fn_logger (logging.Logger) – The logger to log to.

  • xml_elem (Element | ElementTree) – The XML element or tree to log.

  • level (int) – Level for logging (e.g. logging.DEBUG, logging.INFO, …)

  • indent (int) – Number of spaces to indent all entries.

pymisclib.utilities.resolve_wildcards(filenames: list[str]) -> list[Path]

Resolve unique Paths from a list containing wildcards.

Parameters:

filenames (list[str]) – A list of filenames that may contain wildcards.

Returns:

A list of unique Paths.

Return type:

list[Path]
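A minimal sketch of the idea using glob from the standard library. The ordering of the result and the handling of patterns without matches (silently dropped here) are assumptions, and resolve_wildcards_sketch is a hypothetical name.

```python
import glob
import tempfile
from pathlib import Path


def resolve_wildcards_sketch(filenames):
    """Expand each pattern and drop duplicates while keeping first-seen order."""
    unique = {}
    for pattern in filenames:
        # sorted() makes the order deterministic; glob itself gives no such promise.
        for match in sorted(glob.glob(pattern)):
            unique.setdefault(Path(match), None)
    return list(unique)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt", "c.log"):
        (Path(tmp) / name).touch()
    patterns = [str(Path(tmp) / "*.txt"), str(Path(tmp) / "a.*")]
    resolved_names = [p.name for p in resolve_wildcards_sketch(patterns)]
```

Both patterns match a.txt, but it appears only once in the result.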

pymisclib.utilities.rotate_file(file_name: str, file_dir: Path, rotation: int, compress_level: int = 9, fn_logger: Logger = <Logger pymisclib.utilities (WARNING)>)

Rotate a (log-)file.

The parameter rotation specifies the number of copies that will be retained.

When rotating, file_name.rotation-1 is renamed to file_name.rotation for all values of rotation down to 1.

The initial file (with no .rotation suffix) gains a suffix. If compress_level > 0, the initial file is also compressed.

Parameters:
  • file_name (str) – Name of the log file.

  • file_dir (Path) – Path to the directory containing the log file.

  • rotation (int) – Number of copies to rotate. 0 to disable.

  • compress_level (int) – Zlib compression level with 0 being no compression and 9 the highest possible.

  • fn_logger (logging.Logger) – Logger used by the function.
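The renaming scheme can be sketched as follows. Compression is left out entirely, and the assumption is that the initial file becomes <name>.1; the name rotate_sketch is hypothetical and the library may name compressed copies differently.

```python
import tempfile
from pathlib import Path


def rotate_sketch(file_name, file_dir, rotation):
    """Shift <name>.(n-1) to <name>.n, then move <name> to <name>.1."""
    if rotation <= 0:
        return
    # Start with the highest suffix so no copy is overwritten before it moves.
    for n in range(rotation, 1, -1):
        src = file_dir / f"{file_name}.{n - 1}"
        if src.exists():
            src.replace(file_dir / f"{file_name}.{n}")
    current = file_dir / file_name
    if current.exists():
        current.replace(file_dir / f"{file_name}.1")


with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp)
    (log_dir / "app.log").write_text("newest")
    (log_dir / "app.log.1").write_text("older")
    (log_dir / "app.log.2").write_text("oldest")
    rotate_sketch("app.log", log_dir, rotation=2)
    remaining = sorted(p.name for p in log_dir.iterdir())
    first_copy = (log_dir / "app.log.1").read_text()
    second_copy = (log_dir / "app.log.2").read_text()
```

With rotation=2 the oldest copy is discarded, so only app.log.1 and app.log.2 remain afterwards.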

pymisclib.utilities.round_down(n: int, m: int) -> int

Round the given number n down to the nearest multiple of m.

Parameters:
  • n (int) – number to round

  • m (int) – multiple to round to

Returns:

n rounded down to a multiple of m.

Return type:

int

pymisclib.utilities.round_up(n: int, m: int) -> int

Round the given number n up to the nearest multiple of m.

Parameters:
  • n (int) – number to round

  • m (int) – multiple to round to

Returns:

n rounded up to a multiple of m.

Return type:

int
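Both rounding directions can be expressed with integer floor division. The sketch below assumes a positive m and uses hypothetical function names; it is one possible formulation, not necessarily the library's.

```python
def round_down_sketch(n: int, m: int) -> int:
    # Floor division drops the remainder; multiplying back gives the multiple.
    return (n // m) * m


def round_up_sketch(n: int, m: int) -> int:
    # Negating twice turns floor division into ceiling division.
    return -(-n // m) * m
```

For example, with m = 4 the value 13 rounds down to 12 and up to 16, while 12 is left unchanged by both functions.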

pymisclib.utilities.std_open(filename: str | Path | None = None, mode: str = 'rt', encoding: str = 'utf-8', newline: str | None = None)

Open either a file or stdin/stdout for use in a with statement.

If the filename is None or ‘-’ then stdin or stdout (depending on the mode) are used. Otherwise, the file is used. It is closed when the with block is done.

If the filename ends with a well-known compression format extension, this compression format is used:

  • .bz2 – bzip2 compressed file, uses bz2.open().

  • .gz – GZip compressed file, uses gzip.open().

  • .xz – LZMA compressed file, uses lzma.open().

Parameters:
  • filename (str|Path|None) – A filename, Path-like, ‘-’, or None.

  • mode (str) – The mode to use for open(). Valid modes are: ‘rb’, ‘ab’, ‘wb’, ‘xb’ for binary mode, or ‘rt’, ‘at’, ‘wt’, or ‘xt’ for text mode. The default is ‘rt’.

  • encoding (str) – The encoding to pass to open(). Defaults to ‘utf-8’.

  • newline (str) – Universal newline mode. Passed to open().

Note

The modes ‘a’, ‘r’, ‘w’, and ‘x’ default to text mode for uncompressed files and to binary mode for compressed formats. I recommend avoiding them. Always express your program's intent explicitly to avoid problems.

Example

with std_open(fn, 'rt') as f:
    c = f.read()

Changed in version 1.3.1: Default mode changed to ‘r’ to minimize chances of accidental file clobbering.

Changed in version 1.4.2: Add parameter newline.

Changed in version 1.5.1: Add ability to read/write compressed files. Allow Path-like object for the filename.
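The dispatch described above can be sketched as a small context manager. The name std_open_sketch is hypothetical, the newline parameter is omitted for brevity, and the way the real function selects stdin versus stdout is an assumption.

```python
import bz2
import gzip
import lzma
import sys
import tempfile
from contextlib import contextmanager
from pathlib import Path

# Map well-known suffixes to the matching open() function.
_OPENERS = {".bz2": bz2.open, ".gz": gzip.open, ".xz": lzma.open}


@contextmanager
def std_open_sketch(filename=None, mode="rt", encoding="utf-8"):
    if filename is None or str(filename) == "-":
        # Standard streams are handed out as-is and never closed here.
        stream = sys.stdin if "r" in mode else sys.stdout
        yield stream.buffer if "b" in mode else stream
        return
    opener = _OPENERS.get(Path(filename).suffix, open)
    # An encoding is only valid in text mode.
    kwargs = {} if "b" in mode else {"encoding": encoding}
    with opener(filename, mode, **kwargs) as f:
        yield f


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "demo.txt.gz"
    with std_open_sketch(target, "wt") as f:
        f.write("hello\n")
    with std_open_sketch(target, "rt") as f:
        round_trip = f.read()
    magic = target.read_bytes()[:2]  # gzip files start with 1F 8B
```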

pymisclib.utilities.string_from_format(fmt: str, timestamp: datetime = None) -> str

Given a strftime()-like format, expand into a string.

Additional formats:

  • %P – Name of the application.

Parameters:
  • fmt (str) – Format string to apply.

  • timestamp (datetime) – Timestamp to format or None to use the current time.

Returns:

Resulting string.
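A sketch of how such an expansion could work: substitute the custom %P first, then pass the rest to strftime(). The app_name parameter is a hypothetical addition for the demo, deriving the name from sys.argv[0] is an assumption, and an escaped "%%P" is not handled.

```python
import sys
from datetime import datetime
from pathlib import Path


def string_from_format_sketch(fmt, timestamp=None, app_name=None):
    if app_name is None:
        # Assumption: the application name is the stem of the running script.
        app_name = Path(sys.argv[0]).stem
    if timestamp is None:
        timestamp = datetime.now()
    # Expand the custom %P first, then hand the rest to strftime().
    return timestamp.strftime(fmt.replace("%P", app_name))


log_name = string_from_format_sketch("%P-%Y%m%d.log",
                                     datetime(2024, 1, 2), app_name="demo")
```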

pymisclib.utilities.string_to_snake_case(string: str) -> str

Convert a string to snake_case.

Parameters:

string (str) – String to convert.

Returns:

Converted snake_case string.

Return type:

str
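The exact conversion rules are not spelled out here, so the following is only one plausible sketch: split at lower-to-upper case boundaries, collapse other separators into underscores, and lower-case the result. The function name is hypothetical.

```python
import re


def snake_case_sketch(string: str) -> str:
    # Insert "_" at each boundary between a lowercase letter or digit
    # and an uppercase letter.
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", string)
    # Collapse runs of non-alphanumeric characters into one underscore.
    s = re.sub(r"[^A-Za-z0-9]+", "_", s)
    return s.strip("_").lower()
```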

pymisclib.utilities.true_stem(path: Path) -> str

Return the true stem (i.e. the name without any suffixes) of the Path.

Parameters:

path (Path) – A Path object.

Return type:

str

Returns:

The final file or directory name without any suffixes.

Example

print(true_stem(Path('/var/tmp/abc.tar.gz')))
abc

pymisclib.utilities.wrapping_counter_delta(first: int, second: int, max_value: int) -> int

Calculate the difference between two values of a wrapping counter.

Parameters:
  • first (int) – The first counter value.

  • second (int) – The second (later) counter value.

  • max_value (int) – The maximum value the counter can reach plus one. For fixed-length integers, this is 2**<num_bits>.

Returns:

The difference between the two counter values.

Example for a 4 bit counter (range 0..15):

 0             0            0             0 1             1
 0             f            0             f 0             f
|---------------|          |---------------|---------------|
   ^        ^       ===>               ^      ^
   2nd      1st                        1st    2nd

first  = 0x0c (= 12)
second = 0x03 (=  3)
delta = 0x03 + 0x10 - 0x0c = 0x07
           3 +   16 -   12 =    7

New in version 1.5.0.
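The calculation in the example amounts to a modular subtraction, which can be sketched in one line. The function name is hypothetical and the library's implementation may be written differently.

```python
def wrapping_counter_delta_sketch(first: int, second: int, max_value: int) -> int:
    # If second < first the counter wrapped; the modulo then adds max_value
    # back exactly once. Without a wrap it is a plain subtraction.
    return (second - first) % max_value


delta = wrapping_counter_delta_sketch(0x0C, 0x03, 0x10)  # 3 + 16 - 12 = 7
```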