22. filetools — A collection of file utilities.

22.1. Classes defined in module filetools

class filetools.File(filename, mode, compr=None, level=5, delete_temp=True)[source]

Read/write files with transparent file compression.

This class is a context manager providing transparent file compression and decompression. It is commonly used in a with statement, as follows:

with File('filename.ext','w') as f:
    f.write('something')
    f.write('something more')

This will create an uncompressed file with the specified name, write some things to the file, and close it. The file can be read back similarly:

with File('filename.ext','r') as f:
    for line in f:
        print(line)

Because File is a context manager, the file is automatically closed when leaving the with block.

So far this doesn’t look very different from using open(). But when the filename ends in ‘.gz’ or ‘.bz2’, the File class will automatically compress (on writing) or decompress (on reading) the file. Your code can stay the same as above: just use an appropriate filename.
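The extension-based selection can be illustrated with the standard library alone. This is only a sketch, not the implementation of File (which works through temporary files, as described under open()); the helper name open_by_extension is hypothetical.

```python
import bz2
import gzip
import os
import tempfile

# Hypothetical helper: choose an opener from the file name suffix.
OPENERS = {'.gz': gzip.open, '.bz2': bz2.open}

def open_by_extension(filename, mode='r'):
    """Open filename in text mode, (de)compressing for .gz and .bz2 names."""
    ext = os.path.splitext(filename)[1]
    opener = OPENERS.get(ext, open)
    # gzip.open and bz2.open default to binary mode: ask for text explicitly
    return opener(filename, mode + 't')

# The calling code is identical for all three file names.
roundtrip = {}
with tempfile.TemporaryDirectory() as tmp:
    for name in ('data.txt', 'data.txt.gz', 'data.txt.bz2'):
        path = os.path.join(tmp, name)
        with open_by_extension(path, 'w') as f:
            f.write('something\n')
        with open_by_extension(path, 'r') as f:
            roundtrip[name] = f.read()
```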

Parameters:
  • filename (path_like) – Path of the file to open. If the filename ends with ‘.gz’ or ‘.bz2’, transparent (de)compression will be used, with gzip or bzip2 compression algorithms respectively. For other file names, it can be forced with the compr argument.

  • mode (str) – File open mode: ‘r’ for read, ‘w’ for write or ‘a’ for append mode. See also the Python documentation for the open() builtin function. For compressed files, append mode is not yet available.

  • compr ('gz' | 'bz2') – The compression algorithm to be used: gzip or bzip2. If not provided and the file name ends with ‘.gz’ or ‘.bz2’, compr is set automatically from the extension.

  • level (int (1..9)) – Compression level for gzip/bzip2. Higher values result in smaller files, but require longer compression times. The default of 5 already gives a fairly good compression ratio.

  • delete_temp (bool) – If True (default), the temporary files needed to do the (de)compression are deleted when the File instance is closed. This can be set to False to keep the files (mainly intended for debugging).

The File class can also be used outside a with statement. In that case the user has to open and close the File explicitly. The following examples are roughly equivalent to those above, although the with statement handles exceptions better:

fil = File('filename.ext','w')
f = fil.open()
f.write('something')
f.write('something more')
fil.close()

This will create an uncompressed file with the specified name, write some things to the file, and close it. The file can be read back similarly:

fil = File('filename.ext','r')
f = fil.open()
for line in f:
    print(line)
fil.close()

open()[source]

Open the File in the requested mode.

This can be used to open a File object outside a with statement. It returns a Python file object that can be used to read from or write to the File. It performs the following:

  • If no compression is used, open the file in the requested mode.

  • For reading a compressed file, decompress the file to a temporary file and open the temporary file for reading.

  • For writing a compressed file, open a temporary file for writing.

See the documentation for the File class for an example of its use.

close()[source]

Close the File.

This can be used to close the File if it was not opened using a with statement. It performs the following:

  • The underlying file object is closed.

  • If the file was opened in write or append mode and compression is requested, the file is compressed.

  • If a temporary file was in use and delete_temp is True, the temporary file is deleted.

See the documentation for the File class for an example of its use.
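The open() and close() steps for writing a compressed file can be pictured with the following sketch. It uses only the standard library and gzip compression; the class name CompressOnClose is hypothetical and the actual pyFormex implementation may differ.

```python
import gzip
import os
import shutil
import tempfile

class CompressOnClose:
    """Hypothetical sketch: write to a temporary file, gzip it on close."""

    def __init__(self, filename, level=5, delete_temp=True):
        self.filename = filename
        self.level = level
        self.delete_temp = delete_temp
        self.tmpname = None
        self.file = None

    def open(self):
        # Writing: open a temporary file instead of the target file
        fd, self.tmpname = tempfile.mkstemp(suffix='.tmp')
        self.file = os.fdopen(fd, 'w')
        return self.file

    def close(self):
        # Close the underlying file object
        self.file.close()
        # Compress the temporary file into the target file
        with open(self.tmpname, 'rb') as src, \
                gzip.open(self.filename, 'wb', compresslevel=self.level) as dst:
            shutil.copyfileobj(src, dst)
        # Delete the temporary file unless asked to keep it
        if self.delete_temp:
            os.remove(self.tmpname)

    def __enter__(self):
        return self.open()

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'out.txt.gz')
    writer = CompressOnClose(target)
    with writer as f:
        f.write('something')
    with gzip.open(target, 'rt') as f:
        content = f.read()
    temp_deleted = not os.path.exists(writer.tmpname)
```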

reopen(mode='r')[source]

Reopen the file, possibly in another mode.

This makes it possible, for example, to read back data from a file that was just saved, without having to destroy the File instance.

Returns the open file object.

class filetools.TempDir(suffix=None, prefix='pyf_', dir=None, chdir=False, keep=False)[source]

A temporary directory that can be used as a context manager.

This is a wrapper around Python’s tempfile.TemporaryDirectory, with the following differences:

  • the default value for prefix is set to pyf_,

  • it has an extra attribute ‘.path’ returning the directory name as a Path,

  • the context manager returns a Path instead of a str,

  • the context manager can automatically change into the temporary directory,

  • the context manager automatically changes back to the original working directory.
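A minimal sketch of such a wrapper, assuming only the differences listed above, could look as follows. The class name TempDirSketch is hypothetical, and the keep argument from the signature is left out because its behavior is not described here.

```python
import os
import tempfile
from pathlib import Path

class TempDirSketch(tempfile.TemporaryDirectory):
    """Hypothetical sketch of a TemporaryDirectory that yields a Path."""

    def __init__(self, suffix=None, prefix='pyf_', dir=None, chdir=False):
        super().__init__(suffix=suffix, prefix=prefix, dir=dir)
        self.path = Path(self.name)   # extra attribute: the name as a Path
        self.chdir = chdir
        self._oldcwd = None

    def __enter__(self):
        if self.chdir:
            self._oldcwd = os.getcwd()
            os.chdir(self.path)
        return self.path              # a Path instead of a str

    def __exit__(self, exc_type, exc, tb):
        if self._oldcwd is not None:
            os.chdir(self._oldcwd)    # leave the directory before removing it
        super().__exit__(exc_type, exc, tb)

start = os.getcwd()
with TempDirSketch(chdir=True) as d:
    inside = Path.cwd().resolve() == d.resolve()
    name_ok = d.name.startswith('pyf_')
back = os.getcwd() == start
removed = not d.exists()
```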

class filetools.ChDir(dirname=None, create=True)[source]

A context manager to temporarily change the working directory.

The context manager changes the current working directory and guarantees to come back to the previous, even if an exception occurs.
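The guarantee to return to the previous directory is the classic try/finally pattern. The sketch below shows it with contextlib; the name chdir_sketch is hypothetical and the temporary-directory case (dirname=None) is not covered.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def chdir_sketch(dirname, create=True):
    """Hypothetical sketch: change into dirname and always change back."""
    olddir = os.getcwd()
    if create:
        os.makedirs(dirname, exist_ok=True)   # includes missing parents
    os.chdir(dirname)                         # raises OSError on failure
    try:
        yield os.getcwd()
    finally:
        os.chdir(olddir)                      # runs even if the block raised

start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    try:
        with chdir_sketch(os.path.join(tmp, 'a', 'b')):
            raise RuntimeError('simulated failure inside the block')
    except RuntimeError:
        pass
    restored = os.getcwd() == start
```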

Parameters:
  • dirname (path_like | None) – The relative or absolute path name of the directory to change into. If the directory does not exist, it will be created, unless create=False was specified. If None, a temporary working directory will be created and used, and it will be deleted with all its contents on leaving the context.

  • create (bool) – If True (default), the directory (including missing parents) will be created if it does not exist. If False, and a path was specified for dirname, the directory should exist and be accessible.

Returns:

context – A context manager object that can be used in a with statement. On entry, it changes into the specified or temporary directory, and on exit it changes back to the previous working directory.

Raises:

OSError or subclass – If the specified path cannot be changed into or cannot be created.

Examples

>>> olddir = os.getcwd()
>>> with ChDir() as newdir:
...    print(os.getcwd()==newdir, newdir!=olddir)
True True
>>> os.getcwd()==olddir
True

class filetools.NameSequence(template, ext='', start=0, step=1)[source]

A class for autogenerating sequences of names.

Sequences of names are autogenerated by combining a fixed string with a numeric part. The latter is incremented at each creation of a new name (by using the next() function or by calling the NameSequence).

Parameters:
  • template (str) –

    Either a template to generate the names, or an example name from which the template can be derived. If the string contains a ‘%’ character, it is considered a template and will be used as such. It must be a valid template to format a single int value. For example, a template ‘point-%d’ with a value 5 will generate a name ‘point-5’.

    If the string does not contain a ‘%’ character, a template is generated as follows. The string is split in three parts (prefix, numeric, suffix), where numeric only contains digits and suffix does not contain any digits. Thus, numeric is the last numeric part in the string. Use ext if the variable part is not the last numeric part of the names. If the string does not contain any numeric part, it is split as a file name in stem and suffix, and ‘-0’ is appended to the stem. Thus, ‘point.png’ will be treated like ‘point-0.png’. Finally, if the string is empty, it is replaced with ‘0’. To create the template, the numeric part is replaced with a ‘%0#d’ format (where # is the length of the numeric part), concatenated again with prefix and suffix, and ext is appended. Also, the start value is set to the numeric part (unless a nonzero start value is provided).

  • ext (str, optional) – If provided, this is an invariable string appended to the template. It is mostly useful when providing a full name as template and the variable numeric part is not the last numeric part in the name. For example, NameSequence(‘x1’, ‘.5a’) will generate names ‘x1.5a’, ‘x2.5a’, …

  • start (int, optional) – Starting value for the numerical part. If template contains a full name, it will only be acknowledged if nonzero.

  • step (int, optional) – Step for incrementing the numerical value.

Notes

If N is a NameSequence, then next(N) and N() are equivalent.
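The equivalence of next(N) and N() can be obtained by binding both to the same method. The following sketch shows the idea for the simple case where a ‘%’ template is given; the class name NameSequenceSketch is hypothetical and the derivation of a template from an example name is omitted.

```python
class NameSequenceSketch:
    """Hypothetical sketch: generate names from a '%' template."""

    def __init__(self, template, start=0, step=1):
        self.template = template
        self.nr = start
        self.step = step

    def peek(self):
        # Format the next name without advancing the counter
        return self.template % self.nr

    def __next__(self):
        name = self.peek()
        self.nr += self.step
        return name

    __call__ = __next__   # N() behaves exactly like next(N)

    def __iter__(self):
        return self

N = NameSequenceSketch('obj-%03d', start=5)
names = [next(N), N(), N.peek(), N()]
```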

Examples

>>> N = NameSequence('obj')
>>> next(N)
'obj-0'
>>> N()
'obj-1'
>>> [N() for i in range(3)]
['obj-2', 'obj-3', 'obj-4']
>>> N.peek()
'obj-5'
>>> N()
'obj-5'
>>> N.template
'obj-%d'
>>> N = NameSequence('obj-%03d', start=5)
>>> [next(N) for i in range(3)]
['obj-005', 'obj-006', 'obj-007']
>>> N = NameSequence('obj-005')
>>> [next(N) for i in range(3)]
['obj-005', 'obj-006', 'obj-007']
>>> N = NameSequence('abc.98', step=2)
>>> [next(N) for i in range(3)]
['abc.98', 'abc.100', 'abc.102']
>>> N = NameSequence('abc-8x.png')
>>> [next(N) for i in range(3)]
['abc-8x.png', 'abc-9x.png', 'abc-10x.png']
>>> N.template
'abc-%01dx.png'
>>> N.glob()
'abc-*x.png'
>>> next(NameSequence('abc','.png'))
'abc-0.png'
>>> next(NameSequence('abc.png'))
'abc-0.png'
>>> N = NameSequence('/home/user/abc23','5.png')
>>> [next(N) for i in range(2)]
['/home/user/abc235.png', '/home/user/abc245.png']
>>> N = NameSequence('')
>>> next(N), next(N)
('0', '1')
>>> N = NameSequence('12')
>>> next(N), next(N)
('12', '13')

peek()[source]

Peek at the next name without advancing the sequence.

glob()[source]

Return a UNIX glob pattern for the generated names.

A NameSequence is often used as a generator for file names. The glob() method returns a pattern that can be used in a UNIX-like shell command to select all the generated file names.
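One way to derive such a pattern is to replace the integer format field of the template with a ‘*’ wildcard. The sketch below assumes a template with a single ‘%d’-style field; the function name glob_from_template is hypothetical.

```python
import fnmatch
import re

def glob_from_template(template):
    """Replace a %d, %3d or %03d field in the template by '*'."""
    return re.sub(r'%0?\d*d', '*', template)

pattern = glob_from_template('abc-%01dx.png')
# fnmatchcase applies shell-style matching without touching the file system
matches = [n for n in ('abc-8x.png', 'abc-10x.png', 'abd-1x.png')
           if fnmatch.fnmatchcase(n, pattern)]
```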

22.2. Functions defined in module filetools

filetools.TempFile(*args, **kargs)[source]

Return a temporary file that can be used as a context manager.

This is a wrapper around Python’s tempfile.NamedTemporaryFile, with the difference that the returned object has an extra attribute ‘.path’, returning the file name as a Path.
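A wrapper of this kind can be sketched in a few lines. The function name temp_file_sketch is hypothetical; only the extra path attribute described above is added.

```python
import tempfile
from pathlib import Path

def temp_file_sketch(*args, **kargs):
    """Hypothetical sketch: NamedTemporaryFile with an extra .path attribute."""
    f = tempfile.NamedTemporaryFile(*args, **kargs)
    f.path = Path(f.name)
    return f

with temp_file_sketch('w', suffix='.txt') as f:
    f.write('hello')
    f.flush()
    existed = f.path.exists()
    suffix = f.path.suffix
gone = not f.path.exists()   # deleted on close by default
```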

filetools.gzip(filename, gzipped=None, remove=True, level=5, compr='gz')[source]

Compress a file in gzip/bzip2 format.

Parameters:
  • filename (path_like) – The input file name.

  • gzipped (path_like, optional) – The output file name. If not specified, it will be set to the input file name + ‘.’ + compr. An existing output file will be overwritten.

  • remove (bool) – If True (default), the input file is removed after successful compression.

  • level (int 1..9) – The gzip/bzip2 compression level. Higher values result in smaller files, but require longer compression times. The default of 5 already gives a fairly good compression ratio.

  • compr ('gz' | 'bz2') – The compression algorithm to be used. The default is ‘gz’ for gzip compression. Setting to ‘bz2’ will use bzip2 compression.

Returns:

Path – The path of the compressed file.

Examples

>>> f = Path('./test_gzip.out')
>>> f.write_text('This is a test\n'*100)
1500
>>> print(f.size)
1500
>>> g = gzip(f)
>>> print(g)
test_gzip.out.gz
>>> print(g.size)
60
>>> f.exists()
False
>>> f = gunzip(g)
>>> f.exists()
True
>>> print(f.read_text().split('\n')[50])
This is a test
>>> g.exists()
False

filetools.gunzip(filename, unzipped=None, remove=True, compr='gz')[source]

Uncompress a file in gzip/bzip2 format.

Parameters:
  • filename (path_like) – The compressed input file name (usually ending in ‘.gz’ or ‘.bz2’).

  • unzipped (path_like, optional) – The output file name. If not provided and filename ends with ‘.gz’ or ‘.bz2’, it will be set to the filename with the ‘.gz’ or ‘.bz2’ removed. If not provided and filename does not end in ‘.gz’ or ‘.bz2’, or if an empty string is provided, the name of a temporary file is generated. Since you will normally want to read something from the decompressed file, this temporary file is not deleted after closing. It is up to the user to delete it (using the returned file name) when the file has been dealt with.

  • remove (bool) – If True (default), the input file is removed after successful decompression. You probably want to set this to False when decompressing to a temporary file.

  • compr ('gz' | 'bz2') – The compression algorithm used in the input file. If not provided, it is automatically set from the extension of the filename if that is either ‘.gz’ or ‘.bz2’, or else the default ‘gz’ is used.

Returns:

Path – The name of the uncompressed file.

Examples

See gzip.
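For readers without pyFormex at hand, the round trip can be approximated with the standard library. This is a sketch under simplifying assumptions (the output name is always derived from the input name, no temporary files); the function names compress_file and uncompress_file are hypothetical.

```python
import bz2
import gzip
import shutil
import tempfile
from pathlib import Path

OPENERS = {'gz': gzip.open, 'bz2': bz2.open}

def compress_file(filename, remove=True, level=5, compr='gz'):
    """Compress filename to filename + '.' + compr and return the new Path."""
    src = Path(filename)
    dst = src.with_name(src.name + '.' + compr)
    with open(src, 'rb') as fin, \
            OPENERS[compr](dst, 'wb', compresslevel=level) as fout:
        shutil.copyfileobj(fin, fout)
    if remove:
        src.unlink()
    return dst

def uncompress_file(filename, remove=True, compr='gz'):
    """Uncompress filename, stripping its final suffix, and return the Path."""
    src = Path(filename)
    dst = src.with_suffix('')   # drops the trailing '.gz' or '.bz2'
    with OPENERS[compr](src, 'rb') as fin, open(dst, 'wb') as fout:
        shutil.copyfileobj(fin, fout)
    if remove:
        src.unlink()
    return dst

text = 'This is a test\n' * 100
with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / 'test_gzip.out'
    f.write_text(text)
    g = compress_file(f)
    after_gzip = (g.name, f.exists(), g.stat().st_size < len(text))
    f2 = uncompress_file(g)
    after_gunzip = (f2.name, g.exists(), f2.read_text() == text)
```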

filetools.zipList(filename)[source]

List the files in a zip archive.

Returns a list of file names.

filetools.zipExtract(filename, members=None)[source]

Extract the specified member(s) from the zip file.

By default, all members are extracted.
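Both operations map closely onto Python's zipfile module, as the following sketch suggests. The function names and the path argument (the extraction directory) are assumptions made for the illustration and are not part of the documented signatures.

```python
import tempfile
import zipfile
from pathlib import Path

def zip_list_sketch(filename):
    """Return the member names of a zip archive."""
    with zipfile.ZipFile(filename) as zf:
        return zf.namelist()

def zip_extract_sketch(filename, members=None, path='.'):
    """Extract the given members (default: all) into the directory path."""
    with zipfile.ZipFile(filename) as zf:
        zf.extractall(path=path, members=members)

with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / 'demo.zip'
    with zipfile.ZipFile(archive, 'w') as zf:
        zf.writestr('a.txt', 'alpha')
        zf.writestr('b.txt', 'beta')
    names = zip_list_sketch(archive)
    out = Path(tmp) / 'out'
    zip_extract_sketch(archive, members=['b.txt'], path=out)
    extracted = sorted(p.name for p in out.iterdir())
```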

filetools.dos2unix(infile)[source]

Convert a text file to unix line endings.

filetools.unix2dos(infile, outfile=None)[source]

Convert a text file to dos line endings.
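The two conversions amount to replacing line terminators. The sketch below works on bytes read from a file in binary mode rather than on files directly, and the function names to_unix and to_dos are hypothetical.

```python
def to_unix(data: bytes) -> bytes:
    """Replace DOS line endings (CRLF) by Unix line endings (LF)."""
    return data.replace(b'\r\n', b'\n')

def to_dos(data: bytes) -> bytes:
    """Replace Unix line endings (LF) by DOS line endings (CRLF)."""
    # Normalize first, so existing CRLF pairs do not become CR CR LF
    return to_unix(data).replace(b'\n', b'\r\n')

unix_text = to_unix(b'one\r\ntwo\r\n')
dos_text = to_dos(b'one\ntwo\r\n')
```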

filetools.countLines(fn)[source]

Return the number of lines in a text file.

filetools.hsorted(l)[source]

Sort a list of strings in human order.

When humans sort a list of strings, they tend to interpret the numerical fields as numbers and sort these parts numerically, instead of using the lexicographic sorting done by the computer.

Returns the list of strings sorted in human order.

Example:

>>> hsorted(['a1b','a11b','a1.1b','a2b','a1'])
['a1', 'a1.1b', 'a1b', 'a2b', 'a11b']
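Human ordering can be obtained with a sort key that turns every run of digits into an integer. The sketch below is one possible way to do this, not necessarily how hsorted is implemented; the names human_key and hsorted_sketch are hypothetical.

```python
import re

def human_key(s):
    """Split s into text and digit runs, with the digit runs as integers."""
    parts = re.split(r'(\d+)', s)
    # Odd positions hold the digit runs: compare them numerically
    parts[1::2] = [int(p) for p in parts[1::2]]
    return parts

def hsorted_sketch(strings):
    return sorted(strings, key=human_key)

result = hsorted_sketch(['a1b', 'a11b', 'a1.1b', 'a2b', 'a1'])
```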

filetools.numsplit(s)[source]

Split a string in numerical and non-numerical parts.

Returns a series of substrings of s. The odd items do not contain any digits. The even items only contain digits. Joined together, the substrings restore the original.

The number of items is always odd: if the string starts or ends with a digit, the first or last item, respectively, is an empty string.
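The same splitting behavior is provided by re.split with a capturing group, which keeps the separators (here the digit runs) in the result. The name numsplit_sketch is hypothetical.

```python
import re

def numsplit_sketch(s):
    """Split s into alternating non-digit and digit substrings."""
    return re.split(r'(\d+)', s)

parts = numsplit_sketch('11.22bb')
```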

Example:

>>> print(numsplit("aa11.22bb"))
['aa', '11', '.', '22', 'bb']
>>> print(numsplit("11.22bb"))
['', '11', '.', '22', 'bb']
>>> print(numsplit("aa11.22"))
['aa', '11', '.', '22', '']

filetools.splitDigits(s, pos=-1)[source]

Split a string at a sequence of digits.

The input string is split in three parts, where the second part is a contiguous series of digits. The second argument specifies at which numerical substring the splitting is done. By default (pos=-1) this is the last one.

Returns a tuple of three strings, any of which can be empty. The second string, if non-empty, is a series of digits. The first and last items are the parts of the string before and after that series. If the string does not contain any digits, or if the specified splitting position exceeds the number of numerical substrings, the second and third items are empty strings.
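The rules above can be sketched on top of a digit/non-digit split. This is an illustration only; the function name split_digits_sketch is hypothetical.

```python
import re

def split_digits_sketch(s, pos=-1):
    """Split s around its pos-th run of digits (default: the last one)."""
    parts = re.split(r'(\d+)', s)
    digit_positions = range(1, len(parts), 2)   # indices of the digit runs
    try:
        i = digit_positions[pos]
    except IndexError:
        # No digits at all, or pos beyond the number of digit runs
        return s, '', ''
    return ''.join(parts[:i]), parts[i], ''.join(parts[i + 1:])

last = split_digits_sketch('abc123def456fghi')
first = split_digits_sketch('abc123def456fghi', 0)
```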

Example:

>>> splitDigits('abc123')
('abc', '123', '')
>>> splitDigits('123')
('', '123', '')
>>> splitDigits('abc')
('abc', '', '')
>>> splitDigits('abc123def456fghi')
('abc123def', '456', 'fghi')
>>> splitDigits('abc123def456fghi',0)
('abc', '123', 'def456fghi')
>>> splitDigits('123-456')
('123-', '456', '')
>>> splitDigits('123-456',2)
('123-456', '', '')
>>> splitDigits('')
('', '', '')

filetools.template_from_name(name, ext='')[source]

Return template and current number from a given name.

Return a template for generating a family of names with an increasing numeric part.

Parameters:
  • name (str) – The intended name format. The name is split in three parts (prefix, numeric, suffix), where numeric only contains digits and suffix does not contain any digits. Thus, numeric is the last numeric part in the name. If the name does not contain any numeric part, it is split as a file name in stem and suffix, and ‘-0’ is appended to the stem. Thus, ‘point.png’ will be treated like ‘point-0.png’. Finally, if name is an empty string, it is replaced with ‘0’.

  • ext (str, optional) – An extra string to be appended to the returned template string. This can be used to make the variable part not the last numeric part in the name.

Returns:

  • template (str) – A template that can be used to generate names like the input but with another numeric part. It is the concatenation of (prefix, ‘%0#d’, suffix, ext), where # is the length of the numeric part.

  • number (int) – The integer value of the numeric part or 0 if there wasn’t one.

Notes

If the input name contained a numeric part, and ext is empty, the result of template % number is the input name.
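The splitting rules can be expressed with a single regular expression, as in the following sketch. It assumes the name itself contains no ‘%’ character; the function name template_from_name_sketch is hypothetical and the real implementation may be organized differently.

```python
import os
import re

def template_from_name_sketch(name, ext=''):
    """Return (template, number) derived from an example name."""
    # prefix (empty or ending in a non-digit), last digit run, digit-free suffix
    m = re.fullmatch(r'(.*\D)?(\d+)(\D*)', name, re.DOTALL)
    if m:
        prefix, numeric, suffix = m.group(1) or '', m.group(2), m.group(3)
        return f'{prefix}%0{len(numeric)}d{suffix}{ext}', int(numeric)
    if not name:
        return '%d' + ext, 0          # empty name: only the number remains
    # No numeric part: insert '-<number>' between stem and file suffix
    stem, suffix = os.path.splitext(name)
    return f'{stem}-%d{suffix}{ext}', 0

t, n = template_from_name_sketch('abc-8x.png')
rebuilt = t % n
```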

Examples

>>> t, n = template_from_name('abc-8x.png')
>>> (t, n)
('abc-%01dx.png', 8)
>>> t % n
'abc-8x.png'
>>> template_from_name('abc-000.png')
('abc-%03d.png', 0)
>>> template_from_name('abc.png')
('abc-%d.png', 0)
>>> template_from_name('abc', ext='-1.png')
('abc-%d-1.png', 0)
>>> template_from_name('abc')
('abc-%d', 0)
>>> template_from_name('')
('%d', 0)

filetools.autoName(clas)[source]

Return the autoname class instance for objects of type clas.

This allows for objects of a certain class to be automatically named throughout pyFormex.

Parameters:

clas (str or class or object) – The object class name. If a str, it is the class name. If a class, the name is found from it. If an object, the name is taken from the object’s class. In all cases the name is converted to lower case.

Returns:

NameSequence instance – A NameSequence that will generate subsequent names corresponding with the specified class.

Examples

>>> from pyformex.formex import Formex
>>> F = Formex()
>>> print(next(autoName(Formex)))
formex-0
>>> print(next(autoName(F)))
formex-1
>>> print(next(autoName('Formex')))
formex-2

filetools.listFonts(pattern='', include=None, exclude=None)[source]

List the fonts known to the system.

This uses the ‘fc-list’ command from the fontconfig package to find a list of font files installed on the user’s system. The list of files can be restricted by three parameters: a pattern to be passed to the fc-list command, an include regexp specifying which of the matching font files should be retained, and an exclude regexp specifying which files should be removed from the remaining list.

Parameters:
  • pattern (str) – A pattern string to pass to the fc-list command. For example, a pattern ‘mono’ will only list monospaced fonts. Multiple elements can be combined with a colon as separator. Example: pattern=’family=DejaVuSans:style=Bold’. An empty string selects all font files.

  • include (str) – Regex for grep to select the font files to include in the result. If not specified, the pattern from the configuration variable ‘fonts/include’ is used. Example: the default configured include=’.ttf$’ will only return font files with a .ttf suffix. An empty string will include all files selected by the pattern.

  • exclude (str) – Regex for grep to select the font files to exclude from the result. If not specified, the pattern from the configuration variable ‘fonts/exclude’ is used. Example: the default configured exclude=’Emoji’ will exclude font files that have ‘Emoji’ in their name. An empty string will exclude no files.

Returns:

list of Path – A list of the font files found on the system. If fontconfig is not installed, produces a warning and returns an empty list.

Examples

>>> fonts = listFonts('mono')
>>> print(len(fonts) > 0 and fonts[0].is_file())
True

filetools.listMonoFonts()[source]

List the monospace fonts found on the system.

This is equivalent to listFonts('mono').

See also

listFonts

filetools.defaultMonoFont()[source]

Return a default monospace font for the system.

Returns:

Path – If the configured ‘fonts/default’ has a matching font file on the system, that Path is returned. Else, the first file from listFonts('mono') is returned.

Raises:

ValueError – If no monospace font was found on the system.

Examples

>>> print(defaultMonoFont())
/...DejaVuSansMono.ttf

filetools.diskSpace(path, units=None, ndigits=2)[source]

Return the amount of disk space on a file system.

Parameters:
  • path (path_like) – A path name inside the file system to be probed.

  • units (str) – If provided, results are reported in these units. See humanSize() for possible values. The default is to return the number of bytes.

  • ndigits (int) – If provided, and units is also provided, specifies the number of decimal digits to report. See humanSize() for details.

Returns:

  • total (int | float) – The total disk space of the file system containing path.

  • used (int | float) – The used disk space on the file system containing path.

  • available (int | float) – The available disk space on the file system containing path.

Notes

The sum used + available does not necessarily equal total, because a file system may (and usually does) have reserved blocks.
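The same three values, in bytes, are available from the standard library through shutil.disk_usage. The sketch below is an approximation without the units and ndigits handling; the function name disk_space_sketch is hypothetical, and whether diskSpace itself relies on shutil is not stated here.

```python
import os
import shutil

def disk_space_sketch(path):
    """Return (total, used, available) in bytes for the file system at path."""
    usage = shutil.disk_usage(path)
    return usage.total, usage.used, usage.free

total, used, available = disk_space_sketch(os.getcwd())
# used + available may be smaller than total because of reserved blocks
```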

filetools.humanSize(size, units, ndigits=-1)[source]

Convert a number to a human size.

Large numbers are often represented in a more human readable form using k, M, G prefixes. This function returns the input size as a number with the specified prefix.

Parameters:
  • size (int or float) – A number to be converted to human readable form.

  • units (str) – A string specifying the target units. The first character should be one of k,K,M,G,T,P,E,Z,Y. ‘k’ and ‘K’ are equivalent. A second character ‘i’ can be added to use binary (K=1024) prefixes instead of decimal (k=1000).

  • ndigits (int, optional) – If provided and >=0, the result will be rounded to this number of decimal digits.

Returns:

float – The input value in the specified units and possibly rounded to ndigits.
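The conversion is a division by a power of 1000 (or 1024 for binary prefixes), optionally followed by rounding. The following sketch illustrates the arithmetic; the function name human_size_sketch is hypothetical and no input validation is done.

```python
def human_size_sketch(size, units, ndigits=-1):
    """Express size in the given units, e.g. 'k', 'M', 'Gi'."""
    prefixes = 'KMGTPEZY'
    # 'k' and 'K' are equivalent; the exponent is the 1-based prefix position
    order = prefixes.index(units[0].upper()) + 1
    base = 1024 if units[1:2] == 'i' else 1000
    value = size / base ** order
    return round(value, ndigits) if ndigits >= 0 else value

decimal_giga = human_size_sketch(1234567890, 'G', 3)
binary_giga = human_size_sketch(1234567890, 'Gi', 3)
```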

Examples

>>> humanSize(1234567890,'k')
1234567.89
>>> humanSize(1234567890,'M',0)
1235.0
>>> humanSize(1234567890,'G',3)
1.235
>>> humanSize(1234567890,'Gi',3)
1.15

filetools.getDocString(pyfile)[source]

Return the docstring from a Python file.

Parameters:

pyfile (path_like) – The file to search for the docstring.

Returns:

str – The first multiline string (delimited by triple double/single quote characters) from the file.
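Finding the first triple-quoted string can be sketched with a regular expression. The example works on text that has already been read from the file; the function name first_triple_quoted is hypothetical, and returning None when nothing is found is an assumption, since the behavior for files without a docstring is not documented here.

```python
import re

# Opening delimiter (three double or three single quotes), shortest
# possible body, then the same delimiter again
TRIPLE_QUOTED = re.compile(r'(\"{3}|\'{3})(.*?)\1', re.DOTALL)

def first_triple_quoted(text):
    """Return the body of the first triple-quoted string in text, or None."""
    m = TRIPLE_QUOTED.search(text)
    return m.group(2) if m else None

sample = '# header comment\n"""Module doc.\n\nMore text.\n"""\nx = 1\n'
doc = first_triple_quoted(sample)
```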