ged2doc.input

Module which handles input files.

This module is responsible for locating all files (GEDCOM data and images) given the application inputs. Currently it handles two cases:

  • Input is specified as a path to a GEDCOM file. That file can contain names of image files that are either absolute or relative to the directory containing the GEDCOM file or to some other directory. Program options can specify the directory where images are located.

  • Input file is a ZIP archive that includes both the GEDCOM file and the image files. Depending on how the GEDCOM file and the archive were prepared, names of image files in the GEDCOM file can be specified as absolute paths to their original location or as paths relative to their common directory.

An additional issue to consider is that files can be prepared on a system that is different from the system where they are parsed. For example, a GEDCOM file could be prepared on a Windows machine with image file names given using the Windows path convention (either absolute, as C:\Users\JosephSmith\Documents\Pictures\Family\Tree\Me.BMP, or relative, as Pictures\Family\Tree\Me.BMP), and later copied to a Linux host and processed using the ged2doc package. Files on the Linux machine will have different absolute and possibly relative paths, and definitely a different path separator character.
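As a rough illustration of the separator problem, the sketch below splits a name recorded on another system into relative path components using the standard pathlib.PureWindowsPath class, which accepts both "\" and "/" as separators. The helper name split_foreign_path is hypothetical and not part of ged2doc; dropping the drive letter or leading root is an assumption about how such names might be normalized.

```python
from pathlib import PureWindowsPath


def split_foreign_path(name):
    """Split an image name taken from a GEDCOM file into path components.

    Hypothetical helper (not part of ged2doc). PureWindowsPath treats both
    "\\" and "/" as separators; the anchor (drive letter and/or root, e.g.
    "C:\\" or "\\") is dropped so only relative components remain.
    """
    p = PureWindowsPath(name)
    # When the path has an anchor, it is always the first element of parts.
    return list(p.parts[1:] if p.anchor else p.parts)
```

The resulting components could then be joined with os.path.join() under a local directory. Note that on POSIX systems a backslash is a legal file name character, so treating it as a separator is a trade-off.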

In the case of a ZIP archive, the names of images in the GEDCOM file could differ from the names in the archive (e.g. image path C:\Users\JosephSmith\Documents\Pictures\Family\Tree\Me.BMP in the GEDCOM file could be stored in the ZIP archive as Pictures/Family/Tree/Me.BMP).
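One possible way to relate a GEDCOM image name to ZIP entry names is to compare trailing path components. The sketch below is an assumption about such a rule, not ged2doc's actual matching logic; match_zip_member and components are hypothetical helpers, and the case-insensitive comparison is a guess motivated by names originating on Windows.

```python
from pathlib import PureWindowsPath


def components(name):
    """Return relative path components, accepting "\\" and "/" separators."""
    p = PureWindowsPath(name)
    return list(p.parts[1:] if p.anchor else p.parts)


def match_zip_member(gedcom_name, zip_names):
    """Return ZIP entry names whose components form a suffix of gedcom_name.

    Hypothetical rule: an archive entry matches if all of its components
    equal the last components of the GEDCOM name (case-insensitive here,
    which is an assumption).
    """
    target = [c.lower() for c in components(gedcom_name)]
    matches = []
    for entry in zip_names:
        if entry.endswith("/"):
            continue  # skip directory entries
        parts = [c.lower() for c in entry.split("/")]  # ZIP always uses "/"
        if len(parts) <= len(target) and target[-len(parts):] == parts:
            matches.append(entry)
    return matches
```

With this rule a short entry such as Me.BMP also matches, so a caller can end up with several candidates, which is the situation MultipleMatchesError describes.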

The logic in this module is intended to handle all of these cases, where names of files in the GEDCOM file can differ from their location on the target storage system.

Typically, the GEDCOM file returned by this module is passed to methods in the ged4py package, which expects a true filesystem-backed file that supports the seek() and tell() methods. Image files do not usually need these methods and are read as a byte stream using the read() method. This module therefore returns a seekable file object opened in binary mode for the GEDCOM file (which means that a temporary file on disk may need to be created in some cases) and a “simple” binary stream for images.
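When the GEDCOM file lives inside a ZIP archive, one way to obtain a seekable, filesystem-backed object is to copy the member into an anonymous temporary file. This is a minimal sketch using only the standard library; open_seekable_member is a hypothetical name, and ged2doc may do this differently.

```python
import shutil
import tempfile
import zipfile


def open_seekable_member(archive_path, member_name):
    """Copy one ZIP member into an anonymous temporary file.

    Hypothetical helper, not part of ged2doc. The result is opened in
    binary mode ("w+b"), supports seek() and tell(), and is deleted
    automatically when it is closed.
    """
    tmp = tempfile.TemporaryFile()  # default mode is "w+b"
    try:
        with zipfile.ZipFile(archive_path) as zf, zf.open(member_name) as src:
            shutil.copyfileobj(src, tmp)  # stream in chunks, not all at once
    except BaseException:
        tmp.close()  # do not leak the temporary file on failure
        raise
    tmp.seek(0)  # rewind so the caller starts reading at the beginning
    return tmp
```

Using an anonymous temporary file rather than an in-memory buffer keeps large GEDCOM files off the heap while still giving the caller a real file descriptor.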

Functions

make_file_locator(input_file, …)

Create and return a file locator instance.

Classes

FileLocator()

Abstract interface for file locator instances.

Exceptions

MultipleMatchesError

Class for exceptions generated when more than one file matches the specified criteria.

ged2doc.input.make_file_locator(input_file, file_name_pattern, image_path)

Create and return a file locator instance.

For a given input file (which can be a GEDCOM file or a ZIP archive), return the corresponding file locator object (an instance of the FileLocator type).
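The choice between the two cases could be made by probing the input with the standard zipfile.is_zipfile() function, which accepts either a path or a file object. The sketch below is an assumption about how such dispatch might look; detect_input_kind is a hypothetical helper, and it restores the file position because is_zipfile() moves it.

```python
import zipfile


def detect_input_kind(input_file):
    """Return "zip" or "gedcom" for a path or a binary file object.

    Hypothetical helper, not part of ged2doc. For file objects the
    original position is restored after probing, since is_zipfile()
    seeks around to look for the ZIP end-of-archive record.
    """
    if hasattr(input_file, "read"):
        pos = input_file.tell()  # AttributeError if tell() is missing
        try:
            is_zip = zipfile.is_zipfile(input_file)
        finally:
            input_file.seek(pos)
    else:
        is_zip = zipfile.is_zipfile(input_file)
    return "zip" if is_zip else "gedcom"
```

For a path that does not exist, is_zipfile() returns False, so this sketch reports "gedcom" and the OSError only surfaces when the file is actually opened.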

Parameters
input_file

Path of the input file or a file object; can be a ZIP archive or a GEDCOM file. If the argument is a file object, it must support the seek() method and be opened in binary mode.

file_name_pattern : str

If the input file is a ZIP archive, this pattern is used to search for a GEDCOM file in the archive. It could be "*.ged", for example, or a more specific pattern.

image_path : str

Directory on the filesystem where images are found. Images can be located in sub-directories of the given path. If input_file is a ZIP archive, images are searched for inside the ZIP archive first and then in image_path. If image_path is None, the filesystem is not searched for files. If image_path is an empty string, the current directory is searched.

Returns
locator : FileLocator

File locator instance.

Raises
OSError

Raised if the file is not found.

AttributeError

Raised if a file object is given as the input file but it does not support the seek() method.
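The documented image_path semantics (None disables the filesystem search, an empty string means the current directory, and sub-directories are searched too) can be sketched with os.walk(). find_image_on_disk is a hypothetical helper, and comparing only the final file name component is a simplifying assumption; the real locator may match longer path suffixes.

```python
import os


def find_image_on_disk(image_path, name):
    """Search image_path recursively for files with the same base name.

    Hypothetical helper, not part of ged2doc. image_path=None disables the
    search; an empty string means the current directory. Both "\\" and "/"
    in `name` are treated as separators.
    """
    if image_path is None:
        return []  # filesystem search disabled
    root = image_path or os.curdir  # "" -> current directory
    base = name.replace("\\", "/").rsplit("/", 1)[-1]
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if base in filenames:
            found.append(os.path.join(dirpath, base))
    return sorted(found)
```

Returning every hit rather than the first one lets the caller detect ambiguity and report it instead of silently picking an arbitrary file.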

class ged2doc.input.FileLocator

Bases: object

Abstract interface for file locator instances.

Methods

open_gedcom()

Returns a file object for the input GEDCOM file.

open_image(name)

Returns an open file object for the named image file.
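A minimal sketch of an abstract base class with the two documented methods, built with the standard abc module, plus a toy in-memory implementation. FileLocatorSketch and InMemoryLocator are hypothetical names used only for illustration; they are not the ged2doc classes.

```python
import abc
import io


class FileLocatorSketch(abc.ABC):
    """Stand-in mirroring the two documented abstract methods."""

    @abc.abstractmethod
    def open_gedcom(self):
        """Return a seekable binary file object, or None if not found."""

    @abc.abstractmethod
    def open_image(self, name):
        """Return a readable binary file object, or None if not found."""


class InMemoryLocator(FileLocatorSketch):
    """Hypothetical locator serving GEDCOM and image bytes from memory."""

    def __init__(self, gedcom_bytes, images):
        self._gedcom = gedcom_bytes
        self._images = images  # mapping: image name -> bytes

    def open_gedcom(self):
        if self._gedcom is None:
            return None
        return io.BytesIO(self._gedcom)  # BytesIO supports seek()/tell()

    def open_image(self, name):
        data = self._images.get(name)
        return None if data is None else io.BytesIO(data)
```

Because both methods are abstract, instantiating the base class directly raises TypeError; only subclasses that implement both can be created.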

abstract open_gedcom()

Returns a file object for the input GEDCOM file.

If no GEDCOM file is found, None is returned. If more than one file is found, a MultipleMatchesError exception is raised. Other exceptions can be raised, e.g. if the file cannot be opened.

The returned file object will be opened in binary mode and will support the seek() and tell() methods. Note that this may be a temporary file which will be deleted after it is closed.

Returns
file

File object opened in binary mode, supporting the seek() and tell() methods.

Raises
MultipleMatchesError

Raised if more than one file is found.
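For a ZIP input, the documented open_gedcom() contract (None for no match, MultipleMatchesError for several) might be implemented along these lines. The use of fnmatch against full member names is an assumption, and MultipleMatchesError below is a local stand-in defined with the documented base class RuntimeError, not an import from ged2doc; find_gedcom_member is likewise hypothetical.

```python
import fnmatch
import zipfile


class MultipleMatchesError(RuntimeError):
    """Local stand-in for ged2doc.input.MultipleMatchesError."""


def find_gedcom_member(zf, file_name_pattern):
    """Return the single ZIP member matching the pattern, or None.

    No match gives None; more than one match raises MultipleMatchesError.
    Directory entries (names ending in "/") are ignored.
    """
    names = [n for n in zf.namelist()
             if not n.endswith("/") and fnmatch.fnmatch(n, file_name_pattern)]
    if not names:
        return None
    if len(names) > 1:
        raise MultipleMatchesError(
            f"multiple files match {file_name_pattern!r}: {names}")
    return names[0]
```

In fnmatch the "*" wildcard also matches "/", so "*.ged" finds a GEDCOM file in any sub-directory of the archive; on Windows, fnmatch.fnmatch() also compares case-insensitively.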

abstract open_image(name)

Returns an open file object for the named image file.

If the image file is not found, None is returned. If more than one matching file is found, a MultipleMatchesError exception is raised. Other exceptions can be raised if the file cannot be opened.

Note that this file object may not support all operations (it may be an object inside a ZIP archive, for example), so you may need to copy it if you want full file protocol support.
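If an image stream must be handed to code that calls seek() or tell(), copying it into an in-memory buffer is one simple option. to_seekable is a hypothetical helper, not part of ged2doc; note that it holds the whole image in memory.

```python
import io
import shutil


def to_seekable(stream):
    """Return a seekable in-memory copy of a read()-only binary stream."""
    buf = io.BytesIO()
    shutil.copyfileobj(stream, buf)  # only stream.read() is called
    buf.seek(0)  # rewind so the copy is read from the start
    return buf
```

Since open_image() may return None, a caller would check for that before copying, e.g. `img = locator.open_image(name)` followed by `data = to_seekable(img) if img is not None else None`.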

Parameters
name : str

Name of the image file to open. This can be a relative or absolute path name. Usually this is the name stored in the GEDCOM file, and it can use a separator character different from that of the system reading the file.

Returns
image

File object opened in binary mode; only the read() method is guaranteed to work.

Raises
MultipleMatchesError

Raised if more than one file is found.

exception ged2doc.input.MultipleMatchesError

Bases: RuntimeError

Class for exceptions generated when more than one file matches the specified criteria.