Base Classes
This is a reference API class listing, useful mainly for developers.
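intake.source.base.DataSource | An object which can produce data
intake.source.base.PatternMixin | Helper class to provide file-name parsing abilities to a driver class
intake.source.base.AliasSource | Refer to another named source, unmodified
intake.container.base.RemoteSource | Base class for all DataSources living on an Intake server
intake.catalog.Catalog | Manages a hierarchy of data sources as a collective unit.
intake.catalog.entry.CatalogEntry | A single item appearing in a catalog
intake.catalog.local.UserParameter | A user-settable item that is passed to a DataSource upon instantiation.
intake.auth.base.BaseAuth | Base class for authorization
intake.source.cache.BaseCache | Provides utilities for managing cached data files.
intake.source.base.Schema | Holds details of data description for any type of data-source
intake.container.persist.PersistStore | Specialised catalog for persisted data-sources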
-
class
intake.source.base.
DataSource
(storage_options=None, metadata=None)¶ An object which can produce data
This is the base class for all Intake plugins, including catalogs and remote (server) data objects. Producing a new plugin commonly involves subclassing this definition and overriding some or all of its methods.
This class is not useful in itself; most methods raise NotImplementedError.
-
close
()¶ Close open resources corresponding to this data source.
-
configure_new
(**kwargs)¶ Create a new instance of this source with altered arguments
Enables the picking of options and re-evaluating templates from any user-parameters associated with this source, or overriding any of the init arguments.
Returns a new data source instance. The instance will be recreated from the original entry definition in a catalog if this source was originally created from a catalog.
-
discover
()¶ Open resource and populate the source attributes.
-
export
(path, **kwargs)¶ Save this data for sharing with other people
Creates a copy of the data in a format appropriate for its container, in the location specified (which can be remote, e.g., s3).
Returns the resultant source object, so that you can, for instance, add it to a catalog (catalog.add(source)) or get its YAML representation (.yaml()).
-
get
(**kwargs)¶ Create a new instance of this source with altered arguments
Enables the picking of options and re-evaluating templates from any user-parameters associated with this source, or overriding any of the init arguments.
Returns a new data source instance. The instance will be recreated from the original entry definition in a catalog if this source was originally created from a catalog.
-
property
gui
¶ Source GUI, with parameter selection and plotting
-
property
hvplot
¶ Returns a hvPlot object to provide a high-level plotting API.
-
persist
(ttl=None, **kwargs)¶ Save data from this source to local persistent storage
- Parameters
ttl: numeric, optional
Time to live in seconds. If provided, the original source will be accessed and a new persisted version written transparently when more than ttl seconds have passed since the old persisted version was written.
kwargs: passed to the _persist method on the base container.
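The expiry rule described for ttl can be sketched with plain timestamps. This is an illustration only, not Intake's implementation: the function name is_stale and its arguments are hypothetical, and treating a missing ttl as "never expires" is an assumption drawn from the "If provided" wording above.

```python
import time


def is_stale(written_at, ttl, now=None):
    """Report whether a persisted copy is older than its time to live.

    written_at and now are POSIX timestamps in seconds; ttl is in seconds.
    """
    if ttl is None:
        # Assumption: without a ttl, the persisted copy never expires on its own
        return False
    now = time.time() if now is None else now
    return (now - written_at) > ttl


# A copy written at t=1000 with a 60 second ttl
stale_after_30s = is_stale(written_at=1000.0, ttl=60, now=1030.0)
stale_after_61s = is_stale(written_at=1000.0, ttl=60, now=1061.0)
```

Passing now explicitly keeps the check deterministic, which is convenient for testing.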
-
property
plot
Returns a hvPlot object to provide a high-level plotting API.
To display in a notebook, be sure to run intake.output_notebook() first.
-
property
plots
¶ List custom associated quick-plots
-
read
()¶ Load entire dataset into a container and return it
-
read_chunked
()¶ Return iterator over container fragments of data source
-
read_partition
(i)¶ Return a part of the data corresponding to i-th partition.
By default, assumes i should be an integer between zero and npartitions; override for more complex indexing schemes.
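How read, read_chunked and read_partition relate to one another can be sketched with a toy class. ListSource below is hypothetical and does not subclass DataSource; it only mirrors the method names, and it assumes that "between zero and npartitions" means 0 <= i < npartitions.

```python
class ListSource:
    """Toy stand-in for a partitioned data source (hypothetical, not part of Intake)."""

    def __init__(self, partitions):
        # Each partition is a list of records
        self._partitions = [list(p) for p in partitions]
        self.npartitions = len(self._partitions)

    def read_partition(self, i):
        # Default scheme: an integer index, assumed here to satisfy 0 <= i < npartitions
        if not isinstance(i, int) or not 0 <= i < self.npartitions:
            raise IndexError(f"partition {i!r} not in range(0, {self.npartitions})")
        return self._partitions[i]

    def read_chunked(self):
        # Iterate over container fragments, one partition at a time
        for i in range(self.npartitions):
            yield self.read_partition(i)

    def read(self):
        # Load the entire dataset into a single container
        out = []
        for chunk in self.read_chunked():
            out.extend(chunk)
        return out


source = ListSource([[1, 2], [3], [4, 5, 6]])
first = source.read_partition(0)
chunks = list(source.read_chunked())
everything = source.read()
```

A real driver would typically override read_partition when partitions are addressed by something other than a single integer.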
-
to_dask
()¶ Return a dask container for this data source
-
to_spark
()¶ Provide an equivalent data object in Apache Spark
The mapping of python-oriented data containers to Spark ones will be imperfect, and only a small number of drivers are expected to be able to produce Spark objects. The standard arguments may be translated, unsupported or ignored, depending on the specific driver.
This method requires the package intake-spark.
-
yaml
(with_plugin=False)¶ Return YAML representation of this data-source
The output may be roughly appropriate for inclusion in a YAML catalog. This is a best-effort implementation.
- Parameters
with_plugin: bool
If True, create a “plugins” section, for cases where this source is created with a plugin not expected to be in the global Intake registry.
-
-
class
intake.catalog.
Catalog
(*args, name=None, description=None, metadata=None, auth=None, ttl=1, getenv=True, getshell=True, persist_mode='default', storage_options=None)¶ Manages a hierarchy of data sources as a collective unit.
A catalog is a set of available data sources for an individual entity (remote server, local file, or a local directory of files). This can be expanded to include a collection of subcatalogs, which are then managed as a single unit.
A catalog is created with a single URI or a collection of URIs. A URI can either be a URL or a file path.
Each catalog in the hierarchy is responsible for caching the most recent refresh time to prevent overeager queries.
Attributes
metadata
(dict) Arbitrary information to carry along with the data source specs.
-
discover
()¶ Open resource and populate the source attributes.
-
filter
(func)¶ Create a Catalog of a subset of entries based on a condition
Note that, whatever specific class this is performed on, the return instance is a Catalog. The entries are passed unmodified, so they will still reference the original catalog instance and include its details such as directory.
- Parameters
func : function
This should take a CatalogEntry and return True or False. Those items returning True will be included in the new Catalog, with the same entry names
- Returns
New Catalog
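The selection rule can be sketched on a plain mapping. This is not Intake code: filter_entries is a hypothetical helper, and SimpleNamespace objects stand in for CatalogEntry instances.

```python
from types import SimpleNamespace


def filter_entries(entries, func):
    """Keep the entries for which func(entry) is true, under their original names."""
    return {name: entry for name, entry in entries.items() if func(entry)}


# Stand-ins for catalog entries; the attribute names are illustrative only
entries = {
    "raw_csv": SimpleNamespace(name="raw_csv", container="dataframe"),
    "images": SimpleNamespace(name="images", container="ndarray"),
}

tables = filter_entries(entries, lambda e: e.container == "dataframe")
```

As in the description above, the selected entries are passed through unmodified: tables["raw_csv"] is the same object as entries["raw_csv"].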
-
force_reload
()¶ Force an immediate reload of the data
-
classmethod
from_dict
(entries, **kwargs)¶ Create Catalog from the given set of entries
- Parameters
entries : dict-like
A mapping of name:entry which supports dict-like functionality, e.g., is derived from collections.abc.Mapping.
kwargs : passed on to the constructor
Things like metadata, name; see __init__.
- Returns
Catalog instance
-
property
gui
¶ Source GUI, with parameter selection and plotting
-
items
()¶ Get an iterator over (key, value) tuples for the catalog entries.
-
pop
(key)¶ Remove entry from catalog and return it
This relies on the _entries attribute being mutable, which it normally is. Note that if a catalog automatically reloads, any entry removed here may soon reappear.
- Parameters
key : str
Key of the entry to remove from the catalog
-
reload
()¶ Reload catalog if sufficient time has passed
-
save
(url, storage_options=None)¶ Output this catalog to a file as YAML
- Parameters
url : str
Location to save to, perhaps remote
storage_options : dict
Extra arguments for the file-system
-
serialize
()¶ Produce YAML version of this catalog.
Note that this is not the same as .yaml(), which produces a YAML block referring to this catalog.
-
walk
(sofar=None, prefix=None, depth=2)¶ Get all entries in this catalog and sub-catalogs
- Parameters
sofar: dict or None
Within recursion, use this dict for output
prefix: list of str or None
Names of levels already visited
depth: int
Number of levels to descend; needed to truncate circular references and for cleaner output
- Returns
Dict where the keys are the entry names in dotted syntax, and the
values are entry instances.
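The dotted-name output and the role of depth can be sketched with nested dicts standing in for catalogs and sub-catalogs. This is a simplified illustration, not the library's code; it reuses the parameter names above but takes the entries mapping as an explicit first argument.

```python
def walk(entries, sofar=None, prefix=None, depth=2):
    """Flatten nested mappings into {dotted.name: item}, descending at most depth levels."""
    out = sofar if sofar is not None else {}
    prefix = [] if prefix is None else prefix
    for name, item in entries.items():
        # A nested dict plays the role of a sub-catalog in this sketch
        if isinstance(item, dict) and depth > 1:
            walk(item, out, prefix + [name], depth - 1)
        out[".".join(prefix + [name])] = item
    return out


tree = {"a": "source-a", "sub": {"b": "source-b", "deeper": {"c": "source-c"}}}
two_levels = walk(tree, depth=2)
three_levels = walk(tree, depth=3)
```

Because the recursion stops once depth is used up, a structure that refers back to itself still produces a finite result.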
-
-
class
intake.catalog.entry.
CatalogEntry
(getenv=True, getshell=True)¶ A single item appearing in a catalog
This is the base class, used by local entries (i.e., read from a YAML file) and by remote entries (read from a server).
-
describe
()¶ Get a dictionary of attributes of this entry.
- Returns: dict with keys
- name: str
The name of the catalog entry.
- container: str
Kind of container used by this data source
- description: str
Markdown-friendly description of data source
- direct_access: str
Mode of remote access: forbid, allow, force
- user_parameters: list[dict]
List of user parameters defined by this entry
-
get
(**user_parameters)¶ Open the data source.
Equivalent to calling the catalog entry like a function.
Note: entry(), entry.attr, entry[item] check for persisted sources, but directly calling .get() will always ignore the persisted store (equivalent to self._pmode=='never').
- Parameters
user_parameters : dict
Values for user-configurable parameters for this data source
- Returns
DataSource
-
property
has_been_persisted
¶ For the source created with the given args, has it been persisted?
-
property
plots
¶ List custom associated quick-plots
-
-
class
intake.container.base.
RemoteSource
(url, headers, name, parameters, metadata=None, **kwargs)¶ Base class for all DataSources living on an Intake server
-
to_dask
()¶ Return a dask container for this data source
-
-
class
intake.catalog.local.
UserParameter
(name, description=None, type=None, default=None, min=None, max=None, allowed=None)¶ A user-settable item that is passed to a DataSource upon instantiation.
For string parameters, default may include special functions func(args), which may be expanded from environment variables or by executing a shell command.
- Parameters
name: str
the key that appears in the DataSource argument strings
description: str
narrative text
type: str
one of list(COERSION_RULES)
default: type value
same type as type. If a str, may include special functions env, shell, client_env, client_shell.
min, max: type value
for validation of user input
allowed: list of type
for validation of user input
-
describe
()¶ Information about this parameter
-
expand_defaults
(client=False, getenv=True, getshell=True)¶ Compile env, client_env, shell and client_shell commands
-
validate
(value)¶ Does value meet parameter requirements?
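A check against min, max and allowed could look like the sketch below. It is not the library's method: the standalone functions, the argument names minimum and maximum, the coercion step and the use of ValueError are all assumptions made for illustration.

```python
def validate(value, type_, minimum=None, maximum=None, allowed=None):
    """Coerce value to type_ and check it against optional bounds and an allow-list."""
    value = type_(value)
    if minimum is not None and value < minimum:
        raise ValueError(f"{value!r} is less than the minimum {minimum!r}")
    if maximum is not None and value > maximum:
        raise ValueError(f"{value!r} is greater than the maximum {maximum!r}")
    if allowed is not None and value not in allowed:
        raise ValueError(f"{value!r} is not one of {allowed!r}")
    return value


def is_valid(value, type_, **rules):
    """Boolean wrapper: True when validate accepts the value."""
    try:
        validate(value, type_, **rules)
    except ValueError:
        return False
    return True


year = validate("2015", int, minimum=2000, maximum=2020)
too_early = is_valid(1999, int, minimum=2000)
colour_ok = is_valid("red", str, allowed=["red", "green"])
```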
-
class
intake.auth.base.
BaseAuth
(*args)¶ Base class for authorization
Subclass this and override the methods to implement a new type of auth.
This basic class allows all access.
-
allow_access
(header, source, catalog)¶ Is the given HTTP header allowed to access given data source
- Parameters
header: dict
The HTTP header from the incoming request
source: CatalogEntry
The data source the user wants to access.
catalog: Catalog
The catalog object containing this data source.
-
allow_connect
(header)¶ Is the given request header allowed to talk to the server
- Parameters
header: dict
The HTTP header from the incoming request
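A new auth type overrides the two methods above. The sketch below shows one possible policy, a shared secret carried in a request header. The class name, the header key and the policy are hypothetical, and the class is written standalone so the example runs without Intake; in practice it would subclass intake.auth.base.BaseAuth.

```python
class SharedSecretAuth:
    """Hypothetical policy: admit requests whose header carries a known secret."""

    def __init__(self, secret, key="x-shared-secret"):
        self.secret = secret
        self.key = key  # Name of the header holding the secret (illustrative)

    def allow_connect(self, header):
        # header is the dict of HTTP headers from the incoming request
        return header.get(self.key, "") == self.secret

    def allow_access(self, header, source, catalog):
        # Same rule for every source in every catalog in this sketch
        return self.allow_connect(header)


auth = SharedSecretAuth("s3cret")
accepted = auth.allow_connect({"x-shared-secret": "s3cret"})
rejected = auth.allow_connect({"x-shared-secret": "wrong"})
```

A finer-grained policy could inspect source and catalog in allow_access, for example to restrict individual entries.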
-
get_case_insensitive
(dictionary, key, default=None)¶ Case-insensitive search of a dictionary for key.
Returns the value if key match is found, otherwise default.
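HTTP header names are case-insensitive, which is why a lookup of this kind is useful when inspecting request headers. A minimal standalone sketch of the behaviour described, written here as a plain function rather than a method:

```python
def get_case_insensitive(dictionary, key, default=None):
    """Return the value whose key matches key regardless of case, else default."""
    lower_key = key.lower()
    for k, v in dictionary.items():
        if k.lower() == lower_key:
            return v
    return default


headers = {"Content-Type": "application/json", "Authorization": "token abc"}
content_type = get_case_insensitive(headers, "content-type")
missing = get_case_insensitive(headers, "x-missing", default="n/a")
```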
-
-
class
intake.source.cache.
BaseCache
(driver, spec, catdir=None, cache_dir=None, storage_options={})¶ Provides utilities for managing cached data files.
Providers of caching functionality should derive from this, and appear as entries in registry. The principal methods to override are _make_files(), _load() and _from_metadata().
-
clear_all
()¶ Clears all cache and metadata.
-
clear_cache
(urlpath)¶ Clears cache and metadata for a given urlpath.
- Parameters
urlpath: str, location of data
May be a local path, or remote path if including a protocol specifier such as 's3://'. May include glob wildcards.
-
get_metadata
(urlpath)¶
- Parameters
urlpath: str, location of data
May be a local path, or remote path if including a protocol specifier such as 's3://'. May include glob wildcards.
- Returns
Metadata (dict) about a given urlpath.
-
load
(urlpath, output=None, **kwargs)¶ Downloads data from a given url, generates a hashed filename, logs metadata, and caches it locally.
- Parameters
urlpath: str, location of data
May be a local path, or remote path if including a protocol specifier such as 's3://'. May include glob wildcards.
output: bool
Whether to show progress bars; turn off for testing
- Returns
List of local cache_paths to be opened instead of the remote file(s). If caching is disabled, the urlpath is returned.
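One way a hashed local filename might be derived from a urlpath is sketched below. The hashing scheme, the directory layout and the name hashed_cache_path are assumptions made for illustration; the scheme actually used by the cache classes is not specified here.

```python
import hashlib
import os


def hashed_cache_path(urlpath, cache_dir):
    """Map a urlpath to a stable local path under cache_dir (hypothetical scheme)."""
    # Hashing the full urlpath keeps same-named files from different locations apart
    digest = hashlib.sha256(urlpath.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest, os.path.basename(urlpath))


path_a = hashed_cache_path("s3://bucket-one/data.csv", "cache")
path_b = hashed_cache_path("s3://bucket-two/data.csv", "cache")
```

The mapping is deterministic, so a later call with the same urlpath finds the previously cached file.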
-
-
class
intake.source.base.
AliasSource
(target, mapping=None, metadata=None, **kwargs)¶ Refer to another named source, unmodified
The purpose of an Alias is to be able to refer to other source(s) in the same catalog, perhaps leaving the choice of which target to load up to the user. This source makes no sense outside of a catalog.
In this case, the output of the target source is not modified, but this class acts as a prototype ‘derived’ source for processing the output of some standard driver.
After initial discovery, the source’s container and other details will be updated from the target; initially, the AliasSource container is not any standard type.
-
__init__
(target, mapping=None, metadata=None, **kwargs)¶
- Parameters
target: str
Name of the source to load, must be a key in the same catalog
mapping: dict or None
If given, use this to map the string passed as target to entries in the catalog
metadata: dict or None
Extra metadata to associate
kwargs: passed on to the target
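The role of mapping can be sketched with a plain dict standing in for the catalog. The helper resolve_alias and the entry names are hypothetical, and defaulting to an identity mapping when none is given is an assumption based on the parameter descriptions above.

```python
def resolve_alias(catalog, target, mapping=None):
    """Look up the catalog entry that target refers to."""
    # Assumption: without a mapping, target is itself the key in the catalog
    mapping = mapping or {target: target}
    return catalog[mapping[target]]


# A dict stands in for the catalog; values stand in for its sources
catalog = {"sales_2019": "source-2019", "sales_2020": "source-2020"}

direct = resolve_alias(catalog, "sales_2019")
latest = resolve_alias(catalog, "latest", mapping={"latest": "sales_2020"})
```

With a mapping, the string a user chooses (here "latest") is decoupled from the actual entry names.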
-
-
class
intake.source.base.
PatternMixin
¶ Helper class to provide file-name parsing abilities to a driver class
-
class
intake.source.base.
Schema
(**kwargs)¶ Holds details of data description for any type of data-source
This should always be pickleable, so that it can be sent from a server to a client, and contain all information needed to recreate a RemoteSource on the client.
-
class
intake.container.persist.
PersistStore
(path=None, **storage_options)¶ Specialised catalog for persisted data-sources
-
add
(key, source)¶ Add the persisted source to the store under the given key
- key: str
The unique token of the un-persisted, original source
- source: DataSource instance
The thing to add to the persisted catalogue, referring to persisted data
-
backtrack
(source)¶ Given a unique key in the store, recreate original source
-
get_tok
(source)¶ Get string token from object
Strings are assumed to already be a token; if source or entry, see if it is a persisted thing (“original_tok” is in its metadata), else generate its own token.
-
needs_refresh
(source)¶ Has the (persisted) source expired in the store
Will return True if the source is not in the store at all, if its TTL is set to None, or if more seconds have passed than the TTL.
-
refresh
(key)¶ Recreate and re-persist the source for the given unique ID
-
remove
(source, delfiles=True)¶ Remove a dataset from the persist store
- source: str or DataSource or Lo
If a str, this is the unique ID of the original source, which is the key of the persisted dataset within the store. If a source, can be either the original or the persisted source.
- delfiles: bool
Whether to remove the on-disc artifact
-