jsonschema is an implementation of the JSON Schema specification for Python.
>>> from jsonschema import validate

>>> # A sample schema, like what we'd get from json.load()
>>> schema = {
...     "type": "object",
...     "properties": {
...         "price": {"type": "number"},
...         "name": {"type": "string"},
...     },
... }

>>> # If no exception is raised by validate(), the instance is valid.
>>> validate(instance={"name": "Eggs", "price": 34.99}, schema=schema)

>>> validate(
...     instance={"name": "Eggs", "price": "Invalid"}, schema=schema,
... )
Traceback (most recent call last):
    ...
ValidationError: 'Invalid' is not of type 'number'
It can also be used from the command line by installing check-jsonschema.
Two extras are available when installing the package, both currently related to format validation:
format
format-nongpl
They can be used when installing in order to include additional dependencies, e.g.:
$ pip install jsonschema'[format]'
Be aware that the mere presence of these dependencies – or even the specification of format checks in a schema – does not activate format checks (as per the specification).
Please read the format validation documentation for further details.
If you have nox installed (perhaps via pipx install nox or your package manager), running nox in the directory of your source checkout will run jsonschema’s test suite on all of the versions of Python jsonschema supports.
If you don’t have all of the versions that jsonschema is tested under, you’ll likely want to run using nox’s --no-error-on-missing-interpreters option.
Of course you’re also free to just run the tests on a single version with your favorite test runner.
The tests live in the jsonschema.tests package.
Most of the documentation for this package assumes you’re familiar with the fundamentals of writing JSON schemas themselves, and focuses on how this library helps you validate with them in Python.
If you aren’t already comfortable with writing schemas and need an introduction which teaches about JSON Schema the specification, you may find Understanding JSON Schema to be a good read!
>>> validate([2, 3, 4], {"maxItems": 2})
Traceback (most recent call last):
    ...
ValidationError: [2, 3, 4] is too long
validate() will first verify that the
provided schema is itself valid, since not doing so can lead to less
obvious error messages and to failures in less obvious or less consistent ways.
If you know you have a valid schema already, especially
if you intend to validate multiple instances with
the same schema, you likely would prefer using the
jsonschema.protocols.Validator.validate method directly on a
specific validator (e.g. Draft202012Validator.validate).
If the cls argument is not provided, two things will happen
in accordance with the specification. First, if the schema has a
$schema keyword containing a known meta-schema [1] then the
proper validator will be used. The specification recommends that
all schemas contain $schema properties for this reason. If no
$schema property is found, the default validator class is the
latest released draft.
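As a sketch of the dispatch just described, a schema that declares its dialect via $schema will be validated with the matching validator class:

```python
from jsonschema import validate

# The $schema keyword identifies the dialect, so validate() selects
# the Draft 2020-12 validator class automatically.
schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "integer",
}
validate(12, schema)  # no exception: 12 is a valid integer
```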
Any other provided positional and keyword arguments will be passed
on when instantiating the cls.
If you are unfamiliar with protocols, either as a general notion or as specifically implemented by typing.Protocol, you can think of them as a set of attributes and methods that all objects satisfying the protocol have.
The protocol to which all validator classes adhere.
Parameters:
schema – The schema that the validator object will validate with.
It is assumed to be valid, and providing
an invalid schema can lead to undefined behavior. See
Validator.check_schema to validate a schema first.
registry – a schema registry that will be used for looking up JSON references
resolver –
a resolver that will be used to resolve $ref
properties (JSON references). If unprovided, one will be created.
format_checker – if provided, a checker which will be used to assert about
format properties present in the schema. If unprovided,
no format validation is done, and the presence of format
within schemas is strictly informational. Certain formats
require additional packages to be installed in order to assert
against instances. Ensure you’ve installed jsonschema with
its extra (optional) dependencies when
invoking pip.
Deprecated since version v4.12.0: Subclassing validator classes now explicitly warns that doing so is not part of
their public API.
The returned object satisfies the validator protocol, but may not
be of the same concrete class! In particular this occurs
when a $ref resolves to a schema with a different
$schema than this one (i.e. for a different draft).
Lazily yield each of the validation errors in the given instance.
>>> schema = {
...     "type": "array",
...     "items": {"enum": [1, 2, 3]},
...     "maxItems": 2,
... }
>>> v = Draft202012Validator(schema)
>>> for error in sorted(v.iter_errors([2, 3, 4]), key=str):
...     print(error.message)
4 is not one of [1, 2, 3]
[2, 3, 4] is too long
Deprecated since version v4.0.0: Calling this function with a second schema argument is deprecated.
Use Validator.evolve instead.
To handle JSON Schema’s type keyword, a Validator uses
an associated TypeChecker. The type checker provides an immutable
mapping between names of types and functions that can test if an instance is
of that type. The defaults are suitable for most users - each of the
versioned validators that are included with
jsonschema have a TypeChecker that can correctly handle their respective
versions.
Produce a new checker with the given type redefined.
Parameters:
type – The name of the type to check.
fn (collections.abc.Callable) – A callable taking exactly two parameters - the type
checker calling the function and the instance to check.
The function should return true if instance is of this
type and false otherwise.
Occasionally it can be useful to provide additional or alternate types when
validating JSON Schema’s type keyword.
jsonschema tries to strike a balance between performance in the common
case and generality. For instance, JSON Schema defines a number type, which
can be validated with a schema such as {"type":"number"}. By default,
this will accept instances of Python numbers.Number. This includes in
particular ints and floats, along with
decimal.Decimal objects, complex numbers etc. For
integer and object, however, rather than checking for
numbers.Integral and collections.abc.Mapping,
jsonschema simply checks for int and dict, since the
more general instance checks can introduce significant slowdown, especially
given how commonly these types are validated.
If you do want the generality, or just want to add a few specific additional
types as being acceptable for a validator object, then you should update an
existing jsonschema.TypeChecker or create a new one. You may then create a new
Validator via jsonschema.validators.extend.
jsonschema ships with validator classes for various versions of the JSON Schema specification.
For details on the methods and attributes that each validator class provides see the Validator protocol, which each included validator class implements.
Each of the below covers a specific release of the JSON Schema specification.
JSON Schema defines the format keyword which can be used to check if primitive types (strings, numbers, booleans) conform to well-defined formats.
By default, as per the specification, no validation is enforced.
Optionally, however, validation can be enabled by hooking a format-checking object into a Validator.
>>> validate("127.0.0.1", {"format": "ipv4"})
>>> validate(
...     instance="-12",
...     schema={"format": "ipv4"},
...     format_checker=Draft202012Validator.FORMAT_CHECKER,
... )
Traceback (most recent call last):
    ...
ValidationError: "-12" is not a "ipv4"
Some formats require additional dependencies to be installed.
The easiest way to ensure you have what is needed is to install jsonschema using the format or format-nongpl extras.
For example:
$ pip install jsonschema[format]
Or if you want to avoid GPL dependencies, a second extra is available:
$ pip install jsonschema[format-nongpl]
At the moment, format-nongpl supports all of the available checkers except for iri and iri-reference.
Warning
It is your own responsibility ultimately to ensure you are license-compliant, so you should be double checking your own dependencies if you rely on this extra.
The more specific list of formats along with any additional dependencies they have is shown below.
Warning
If a dependency is not installed when using a checker that requires it, validation will succeed without raising an error, as the specification also requires.
JSON Schema does not mandate that the format property actually do any
validation. If validation is desired however, instances of this class can
be hooked into validators to enable format validation.
FormatChecker objects always return True when asked about
formats that they do not know how to validate.
A mapping of currently known formats to tuples of the functions that validate them and the errors that should be caught.
New checkers can be added and removed either per-instance or globally for all checkers using the FormatChecker.checks decorator.
Register a decorated function as globally validating a new format.
Any instance created after this function is called will pick up the supplied checker.
Parameters:
format (str) – the format that the decorated function will check
raises (Exception) – the exception(s) raised
by the decorated function when an invalid instance is
found. The exception object will be accessible as the
jsonschema.exceptions.ValidationError.cause attribute
of the resulting validation error.
Deprecated since version v4.14.0: Use FormatChecker.checks on an instance instead.
Given that there is no current library in Python capable of supporting the ECMA 262 dialect, the regex format will instead validate Python regular expressions, which are the ones used by this implementation for other keywords like pattern or patternProperties.
In most cases, “validating” an email address is really an attempt to confirm that mail sent to it will be delivered, and delivered to the intended recipient. Since many valid email addresses are incorrectly rejected in many places, and many invalid email addresses are incorrectly accepted, the email format keyword only provides a sanity check, not full RFC 5322 validation.
The same applies to the idn-email format.
If you indeed want a particular well-specified set of emails to be considered valid, you can use FormatChecker.checks to provide your specific definition.
The full schema that this error came from. This is potentially a
subschema from within the schema that was passed in originally,
or even an entirely different schema if a $ref was
followed.
A collections.deque containing the path to the failed
keyword within the schema, but always relative to the
original schema as opposed to any subschema (i.e. the one
originally passed into a validator class, not schema).
A collections.deque containing the path to the
offending element within the instance. The deque can be empty if
the error happened at the root of the instance.
A collections.deque containing the path to the
offending element within the instance. The absolute path
is always relative to the original instance that was
validated (i.e. the one passed into a validation method, not instance). The deque can be empty if the error happened
at the root of the instance.
The instance that was being validated. This will differ from
the instance originally passed into validate if the
validator object was in the process of validating a (possibly
nested) element within the top-level instance. The path within
the top-level instance (i.e. ValidationError.path) could
be used to find this object, but it is provided for convenience.
If the error was caused by errors in subschemas, the list of errors
from the subschemas will be available on this property. The
schema_path and path of these errors will be relative
to the parent error.
If the error was caused by a non-validation error, the
exception object will be here. Currently this is only used
for the exception raised by a failed format checker in
jsonschema.FormatChecker.check.
The error messages in this situation are not very helpful on their own.
for error in errors:
    print(error.message)
outputs:
{} is not valid under any of the given schemas
3 is not valid under any of the given schemas
'foo' is not valid under any of the given schemas
If we look at ValidationError.path on each of the errors, we can find
out which elements in the instance correspond to each of the errors. In
this example, ValidationError.path will have only one element, which
will be the index in our list.
for error in errors:
    print(list(error.path))
[0]
[1]
[2]
Since our schema contained nested subschemas, it can be helpful to look at
the specific part of the instance and subschema that caused each of the errors.
This can be seen with the ValidationError.instance and
ValidationError.schema attributes.
With keywords like anyOf, the ValidationError.context
attribute can be used to see the sub-errors which caused the failure. Since
these errors actually came from two separate subschemas, it can be helpful to
look at the ValidationError.schema_path attribute as well to see where
exactly in the schema each of these errors come from. In the case of sub-errors
from the ValidationError.context attribute, this path will be relative
to the ValidationError.schema_path of the parent error.
[0, 'type'], {} is not of type 'string'
[1, 'type'], {} is not of type 'integer'
[0, 'type'], 3 is not of type 'string'
[1, 'minimum'], 3 is less than the minimum of 5
[0, 'maxLength'], 'foo' is too long
[1, 'type'], 'foo' is not of type 'integer'
The string representation of an error combines some of these attributes for
easier debugging.
print(errors[1])
3 is not valid under any of the given schemas
Failed validating 'anyOf' in schema['items']:
{'anyOf': [{'maxLength': 2, 'type': 'string'},
{'minimum': 5, 'type': 'integer'}]}
On instance[1]:
3
If you want to programmatically query which validation keywords
failed when validating a given instance, you may want to do so using
jsonschema.exceptions.ErrorTree objects.
Retrieve the child tree one level down at the given index.
If the index is not in the instance that this tree corresponds
to and is not known by this tree, whatever error would be raised
by instance.__getitem__ will be propagated (usually this is
some subclass of LookupError).
ErrorTrees support a number of useful operations. The first one we
might want to perform is to check whether a given element in our instance
failed validation. We do so using the in operator:
>>> 0 in tree
True
>>> 1 in tree
False
The interpretation here is that the element at index 0 in the instance ("spam")
did have an error (in fact it had two), while the element at index 1 (2) did not
(i.e. it was valid).
If we want to see which errors a child had, we index into the tree and look at
the ErrorTree.errors attribute.
>>> sorted(tree[0].errors)
['enum', 'type']
Here we see that the enum and type keywords failed for
index 0. In fact ErrorTree.errors is a dict, whose values are the
ValidationErrors, so we can get at those directly if we want them.
>>> print(tree[0].errors["type"].message)
'spam' is not of type 'number'
Of course this means that if we want to know if a given validation
keyword failed for a given index, we check for its presence in
ErrorTree.errors:
Finally, if you were paying close enough attention, you’ll notice that
we haven’t seen our minItems error appear anywhere yet. This is
because minItems is an error that applies globally to the instance
itself. So it appears in the root node of the tree.
>>> "minItems" in tree.errors
True
That’s all you need to know to use error trees.
To summarize, each tree contains child trees that can be accessed by
indexing the tree to get the corresponding child tree for a given
index into the instance. Each tree and child has a ErrorTree.errors
attribute, a dict, that maps the failed validation keyword to the
corresponding validation error.
The best_match function is a simple but useful function for attempting
to guess the most relevant error in a given bunch.
>>> from jsonschema import Draft202012Validator
>>> from jsonschema.exceptions import best_match

>>> schema = {
...     "type": "array",
...     "minItems": 3,
... }
>>> print(best_match(Draft202012Validator(schema).iter_errors(11)).message)
11 is not of type 'array'
Try to find an error that appears to be the best match among given errors.
In general, errors that are higher up in the instance (i.e. for which
ValidationError.path is shorter) are considered better matches,
since they indicate “more” is wrong with the instance.
If the resulting match is either oneOf or anyOf, the
opposite assumption is made – i.e. the deepest error is picked,
since these keywords only need to match once, and any other errors
may not be relevant.
Parameters:
errors (collections.abc.Iterable) – the errors to select from. Do not provide a mixture of
errors from different validation attempts (i.e. from
different instances or schemas), since it won’t produce
sensible output.
key (collections.abc.Callable) – the key to use when sorting errors. See relevance and
transitively by_relevance for more details (the default is
to sort with the defaults of that function). Changing the
default is only useful if you want to change the function
that rates errors but still want the error context descent
done by this function.
Returns:
the best matching error, or None if the iterable was empty
Note
This function is a heuristic. Its return value may change for a given
set of inputs from version to version if better heuristics are added.
jsonschema.exceptions.relevance(validation_error)
A key function that sorts errors based on heuristic relevance.
If you want to sort a bunch of errors entirely, you can use
this function to do so. Using this function as a key to e.g.
sorted or max will cause more relevant errors to be
considered greater than less relevant ones.
Within the different validation keywords that can fail, this
function considers anyOf and oneOf to be weak
validation errors, and will sort them lower than other errors at the
same level in the instance.
If you want to change the set of weak [or strong] validation
keywords you can create a custom version of this function with
by_relevance and provide a different set of each.
Create a key function that can be used to sort errors by relevance.
Parameters:
weak (set) – a collection of validation keywords to consider to be
“weak”. If there are two errors at the same level of the
instance and one is in the set of weak validation keywords,
the other error will take priority. By default, anyOf
and oneOf are considered weak keywords and will be
superseded by other same-level validation errors.
strong (set) – a collection of validation keywords to consider to be
“strong”
The JSON Schema $ref and $dynamicRef keywords allow schema authors to combine multiple schemas (or subschemas) together for reuse or deduplication.
The referencing library was written in order to provide a simple, well-behaved and well-tested implementation of this kind of reference resolution [1].
It has its own documentation which is worth reviewing, but this page serves as an introduction which is tailored specifically to JSON Schema, and even more specifically to how to configure referencing for use with Validator objects in order to customize the behavior of the $ref keyword and friends in your schemas.
Configuring jsonschema for custom referencing behavior is essentially a two step process: first, create a referencing.Registry which maps the URIs you wish to use onto the schemas they identify, and then pass that registry to your Validator object.
As a concrete example, the simple schema {"type": "integer"} may be interpreted as a schema under either Draft 2020-12 or Draft 4 of the JSON Schema specification (amongst others); in draft 2020-12, the float 2.0 must be considered an integer, whereas in draft 4, it potentially is not.
If you mean the former (i.e. to associate this schema with draft 2020-12), you’d use referencing.Resource(contents={"type": "integer"}, specification=referencing.jsonschema.DRAFT202012), whereas for the latter you’d use referencing.jsonschema.DRAFT4.
The $schema keyword should generally be used to remove all ambiguity and identify, internally to the schema, which version it is written for.
A schema may be identified via one or more URIs, either because they contain an $id keyword (in suitable versions of the JSON Schema specification) which indicates their canonical URI, or simply because you wish to externally associate a URI with the schema, regardless of whether it contains an $id keyword.
You could add the aforementioned simple schema to a referencing.Registry by creating an empty registry and then identifying it via some URI:
referencing.Registry is an entirely immutable object.
All of its methods which add schemas (resources) to itself return new registry objects containing the added schemas.
You could also confirm your schema is in the registry if you’d like, via referencing.Registry.contents, which will show you the contents of a resource at a given URI:
The most common scenario one is likely to encounter is the desire to include a small number of additional in-memory schemas, making them available for use during validation.
For instance, imagine the below schema for non-negative integers:
We may wish to have other schemas we write be able to make use of this schema, and refer to it as http://example.com/nonneg-int-schema and/or as urn:nonneg-integer-schema.
To do so we make use of APIs from the referencing library to create a referencing.Registry which maps the URIs above to this schema:
What’s above is likely mostly self-explanatory, other than the presence of the referencing.Resource.from_contents function.
Its purpose is to convert a piece of “opaque” JSON (or really a Python dict containing deserialized JSON) into an object which indicates what version of JSON Schema the schema is meant to be interpreted under.
Calling it will inspect a $schema keyword present in the given schema and use that to associate the JSON with an appropriate specification.
If your schemas do not contain $schema dialect identifiers, and you intend for them to be interpreted always under a specific dialect – say Draft 2020-12 of JSON Schema – you may instead use e.g.:
You can now pass this registry to your Validator, which allows a schema passed to it to make use of the aforementioned URIs to refer to our non-negative integer schema.
Here for instance is an example which validates that instances are JSON objects with non-negative integral values:
from jsonschema import Draft202012Validator

validator = Draft202012Validator(
    {
        "type": "object",
        "additionalProperties": {"$ref": "urn:nonneg-integer-schema"},
    },
    registry=registry,  # the critical argument, our registry from above
)
validator.validate({"foo": 37})
validator.validate({"foo": -37})  # Uh oh!
Another common request from schema authors is to be able to map URIs to the file system, perhaps while developing a set of schemas in different local files.
If you have a set of fixed or static schemas in a few files, you still likely will want to follow the above in-memory instructions, and simply load all of your files by reading them in-memory from your program.
If however you wish to dynamically read files off of the file system, perhaps because they may change during the lifetime of your process, then the referencing library supports doing so fully dynamically by configuring a callable which can be used to retrieve any schema which is not already pre-loaded in-memory.
Here we resolve any schema beginning with http://localhost to a directory /tmp/schemas on the local filesystem (note of course that this will not work if run directly unless you have populated that directory with some schemas):
Such a registry can then be used with Validator objects in the same way shown above, and any such references to URIs which are not already in-memory will be retrieved from the configured directory.
We can mix the two examples above if we wish for some in-memory schemas to be available in addition to the filesystem schemas, e.g.:
Generalizing slightly, the retrieval function provided need not even assume that it is retrieving JSON.
As long as you deserialize what you have retrieved into Python objects, you may equally be retrieving references to YAML documents or any other format.
Here for instance we retrieve YAML documents in a way similar to the above using PyYAML:
JSON Schema is defined specifically for JSON, and has well-defined behavior strictly for Python objects which could have possibly existed as JSON.
If you stick to the subset of YAML for which this is the case then you shouldn’t have issue, but if you pass schemas (or instances) around whose structure could never have possibly existed as JSON (e.g. a mapping whose keys are not strings), all bets are off.
One could similarly imagine a retrieval function which switches on whether to call yaml.safe_load or json.loads by file extension (or some more reliable mechanism) and thereby support retrieving references of various different file formats.
In the general case, the JSON Schema specifications tend to discourage implementations (like this one) from automatically retrieving references over the network, or even assuming such a thing is feasible (as schemas may be identified by URIs which are strictly identifiers, and not necessarily downloadable from the URI even when such a thing is sensical).
However, if you as a schema author are in a situation where you indeed do wish to do so for convenience (and understand the implications of doing so), you may do so by making use of the retrieve argument to referencing.Registry.
Here is how one would configure a registry to automatically retrieve schemas from the JSON Schema Store on the fly using the httpx library:
Given such a registry, we can now, for instance, validate instances against schemas from the schema store by passing the registry we configured to our Validator as in previous examples:
Retrieving resources from a SQLite database or some other network-accessible resource should be more or less similar, replacing the HTTP client with one for your database of course.
Warning
Be sure you understand the security implications of the reference resolution you configure.
And if you accept untrusted schemas, doubly sure!
You wouldn’t want a user causing your machine to go off and retrieve giant files off the network by passing it a $ref to some huge blob, or exploiting similar vulnerabilities in your setup.
Older versions of jsonschema used a different object – _RefResolver – for reference resolution, which you, as a schema author, may already be configuring for your own use.
If you are not already constructing your own _RefResolver, this change should be transparent to you (or even recognizably improved, as the point of the migration was to improve the quality of the referencing implementation and enable some new functionality).
Whilst _RefResolver did automatically retrieve remote references (against the recommendation of the spec, and in a way which therefore could lead to questionable security concerns when combined with untrusted schemas), referencing.Registry does not do so.
If you rely on this behavior, you should follow the above example of retrieving resources over HTTP.
meta_schema – the meta schema for the new validator class
validators –
a mapping from names to callables, where each callable will
validate the schema property with the given name.
Each callable should take 4 arguments:
a validator instance,
the value of the property being validated within the
instance
the instance
the schema
version – an identifier for the version that this validator class will
validate. If provided, the returned validator class will
have its __name__ set to include the version, and also
will have jsonschema.validators.validates automatically
called for the given version.
type_checker –
a type checker, used when applying the type keyword.
If unprovided, a jsonschema.TypeChecker will be created
with a set of default types typical of JSON Schema drafts.
format_checker –
a format checker, used when applying the format keyword.
If unprovided, a jsonschema.FormatChecker will be created
with a set of default formats typical of JSON Schema drafts.
id_of – A function that given a schema, returns its ID.
applicable_validators – A function that, given a schema, returns the list of
applicable schema keywords and associated values
which will be used to validate the instance.
This is mostly used to support pre-draft 7 versions of JSON Schema
which specified behavior around ignoring keywords if they were
siblings of a $ref keyword. If you’re not attempting to
implement similar behavior, you can typically ignore this argument
and leave it at its default.
a mapping of new validator callables to extend with, whose
structure is as in create.
Note
Any validator callables with the same name as an
existing one will (silently) replace the old validator
callable entirely, effectively overriding any validation
done in the “parent” validator class.
If you wish to instead extend the behavior of a parent’s
validator callable, delegate and call it directly in
the new validator function by retrieving it using
OldValidator.VALIDATORS["validation_keyword_name"].
version (str) – a version for the new validator class
The new validator class will have its parent’s meta schema.
If you wish to change or extend the meta schema in the new
validator class, modify META_SCHEMA directly on the returned
class. Note that no implicit copying is done, so a copy should
likely be made before modifying it, in order to not affect the
old validator.
Any validating function that validates against a subschema should call
descend, rather than iter_errors. If it recurses into the
instance, or schema, it should pass one or both of the path or
schema_path arguments to descend in order to properly maintain
where in the instance or schema respectively the error occurred.
My schema specifies format validation. Why do invalid instances seem valid?
The format keyword can be a bit of a stumbling block for new
users working with JSON Schema.
In a schema such as:
{"type": "string", "format": "date"}
JSON Schema specifications have historically differentiated between the
format keyword and other keywords. In particular, the
format keyword was specified to be informational as much
as it may be used for validation.
In other words, for many use cases, schema authors may wish to use
values for the format keyword but have no expectation
they be validated alongside other required assertions in a schema.
Of course this does not represent all or even most use cases – many
schema authors do wish to assert that instances conform fully, even to
the specific format mentioned.
In drafts prior to draft 2019-09, the decision on whether to
automatically enable format validation was left up to
validation implementations such as this one.
This library made the choice to leave it off by default, for two reasons:
for forward compatibility and implementation complexity reasons
– if format validation were on by default, and a
future draft of JSON Schema introduced a hard-to-implement format,
either the implementation of that format would block releases of
this library until it was implemented, or the behavior surrounding
format would need to be even more complex than simply
defaulting to on. It was therefore safer to start with it off,
and to defend against the expectation that a given format would always
automatically work.
given that a common use of JSON Schema is portability across
languages (and therefore across implementations of JSON Schema),
leaving format validation off makes users aware of this point about
format validation itself, and reminds them to check that any other
implementations they use also have it explicitly enabled.
As of draft 2019-09 however, the opt-out by default behavior mentioned here is now required for all implementations of JSON Schema.
Difficult as this may sound for new users, at this point it at least means they should expect the same behavior that has always been implemented here, across any other implementation they encounter.
Can jsonschema be used to validate YAML, TOML, etc.?
Like most JSON Schema implementations, jsonschema doesn’t actually deal directly with JSON at all (other than in relation to the $ref keyword, elaborated on below).
In other words as far as this library is concerned, schemas and instances are simply runtime Python objects.
The JSON object {} is simply the Python dict {}, and a JSON Schema like {"type": "object", "properties": {}} is really an assertion about particular Python objects and their keys.
Specifically, in the case where jsonschema is asked to resolve a remote reference, it has no choice but to assume that the remote reference is serialized as JSON, and to deserialize it using the json module.
One cannot today therefore reference some remote piece of YAML and have it deserialized into Python objects by this library without doing some additional work.
See Resolving References to Schemas Written in YAML for details.
In practice, what this means for JSON-like formats such as YAML and TOML is that one can indeed generally schematize and then validate them exactly as if they were JSON, by first deserializing them using libraries like PyYAML and then passing the resulting Python objects into functions within this library.
Beware however that there are cases where the behavior of the JSON Schema specification itself is only well-defined within the data model of JSON itself, and therefore only for Python objects that could have “in theory” come from JSON.
As an example, JSON supports only string-valued keys, whereas YAML supports additional types.
The JSON Schema specification does not deal with how to apply the patternProperties keyword to non-string properties.
The behavior of this library is therefore similarly not defined when presented with Python objects of this form, which could never have come from JSON.
In such cases one is recommended to first pre-process the data such that the resulting behavior is well-defined.
In the previous example, if the desired behavior is to transparently coerce numeric properties to strings, as JavaScript might, then do the conversion explicitly before passing data to this library.
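A sketch of such pre-processing, recursively coercing mapping keys to strings (the helper name is hypothetical):

```python
def stringify_keys(value):
    # Recursively convert mapping keys to strings, so that the result
    # could, at least in principle, have come from a JSON document.
    if isinstance(value, dict):
        return {str(k): stringify_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [stringify_keys(v) for v in value]
    return value


# e.g. data loaded from YAML, where integer keys are permitted:
data = {1: "one", 2: {3: "three"}}
coerced = stringify_keys(data)
```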
Why doesn’t my schema’s default property set the default on my instance?
The basic answer is that the specification does not require that
default actually do anything.
For an inkling as to why it doesn’t actually do anything, consider
that none of the other keywords modify the instance either. More
importantly, having default modify the instance can produce
quite peculiar things. It’s perfectly valid (and perhaps even useful)
to have a default that is not valid under the schema it lives in! So an
instance modified by the default would pass validation the first time,
but fail the second!
Still, filling in defaults is a thing that is useful. jsonschema
allows you to define your own validator classes and callables, so you can easily create a jsonschema.protocols.Validator
that does do default setting. Here’s some code to get you started. (In
this code, we add the default properties to each object before the
properties are validated, so the default values themselves will need to
be valid under the schema.)
from jsonschema import Draft202012Validator, validators


def extend_with_default(validator_class):
    validate_properties = validator_class.VALIDATORS["properties"]

    def set_defaults(validator, properties, instance, schema):
        for property, subschema in properties.items():
            if "default" in subschema:
                instance.setdefault(property, subschema["default"])

        for error in validate_properties(
            validator, properties, instance, schema,
        ):
            yield error

    return validators.extend(
        validator_class, {"properties": set_defaults},
    )


DefaultValidatingValidator = extend_with_default(Draft202012Validator)


# Example usage:
obj = {}
schema = {'properties': {'foo': {'default': 'bar'}}}
# Note jsonschema.validate(obj, schema, cls=DefaultValidatingValidator)
# will not work because the metaschema contains `default` keywords.
DefaultValidatingValidator(schema).validate(obj)
assert obj == {'foo': 'bar'}
See the above-linked document for more info on how this works,
but basically, it just extends the properties keyword on a
jsonschema.validators.Draft202012Validator to then go ahead and update
all the defaults.
Note
If you’re interested in a more interesting solution to a larger
class of these types of transformations, keep an eye on Seep, which is an experimental
data transformation and extraction library written on top of
jsonschema.
Hint
The above code can provide default values for an entire object and
all of its properties, but only if your schema provides a default
value for the object itself, like so:
schema = {
    "type": "object",
    "properties": {
        "outer-object": {
            "type": "object",
            "properties": {
                "inner-object": {
                    "type": "string",
                    "default": "INNER-DEFAULT",
                }
            },
            "default": {},  # <-- MUST PROVIDE DEFAULT OBJECT
        }
    },
}

obj = {}
DefaultValidatingValidator(schema).validate(obj)
assert obj == {'outer-object': {'inner-object': 'INNER-DEFAULT'}}
…but if you don’t provide a default value for your object, then
it won’t be instantiated at all, much less populated with default
properties.
This means broadly that no backwards-incompatible changes should be made
in minor releases (and certainly not in dot releases).
The full picture requires defining what constitutes a
backwards-incompatible change.
The following are simple examples of things considered public API,
and therefore should not be changed without bumping a major version
number:
module names and contents, when not marked private by Python
convention (a single leading underscore)
function and object signature (parameter order and name)
The following are not considered public API and may change without
notice:
the exact wording and contents of error messages; typical reasons
to rely on this seem to involve downstream tests in packages using
jsonschema. These use cases are encouraged to use the extensive
introspection provided in jsonschema.exceptions.ValidationErrors
instead to make meaningful assertions about what failed rather than
relying on how what failed is explained to a human.
the order in which validation errors are returned or raised
the contents of the jsonschema.tests package
the contents of the jsonschema.benchmarks package
the specific non-zero error codes presented by the command line
interface
the exact representation of errors presented by the command line
interface, other than that errors represented by the plain outputter
will be reported one per line
anything marked private
With the exception of the last two of those, flippant changes are
avoided, but changes can and will be made if there is improvement to be
had. Feel free to open an issue ticket if there is a specific issue or
question worth raising.
If called directly, this does not check the store first; however, after
retrieving the document at the specified URI, it will be saved in
the store if cache_remote is True.
Note
If the requests library is present, jsonschema will use it to
request the remote URI, so that the correct encoding is
detected and used.
If it isn’t, or if the scheme of the URI is not http or
https, UTF-8 is assumed.
meta_schema – the meta schema for the new validator class
validators –
a mapping from names to callables, where each callable will
validate the schema property with the given name.
Each callable should take 4 arguments:
a validator instance,
the value of the property being validated within the
instance
the instance
the schema
version – an identifier for the version that this validator class will
validate. If provided, the returned validator class will
have its __name__ set to include the version, and also
will have jsonschema.validators.validates automatically
called for the given version.
type_checker –
a type checker, used when applying the type keyword.
If unprovided, a jsonschema.TypeChecker will be created
with a set of default types typical of JSON Schema drafts.
format_checker –
a format checker, used when applying the format keyword.
If unprovided, a jsonschema.FormatChecker will be created
with a set of default formats typical of JSON Schema drafts.
id_of – A function that, given a schema, returns its ID.
applicable_validators – A function that, given a schema, returns the list of
applicable schema keywords and associated values
which will be used to validate the instance.
This is mostly used to support pre-draft 7 versions of JSON Schema
which specified behavior around ignoring keywords if they were
siblings of a $ref keyword. If you’re not attempting to
implement similar behavior, you can typically ignore this argument
and leave it at its default.
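A minimal, illustrative use of create might look like the following sketch; the permissive empty meta-schema and the bare-bones maxItems implementation are assumptions for brevity, not how the bundled validator classes are defined:

```python
from jsonschema import validators
from jsonschema.exceptions import ValidationError


def max_items(validator, max_items, instance, schema):
    # A bare-bones implementation of the maxItems keyword.
    if isinstance(instance, list) and len(instance) > max_items:
        yield ValidationError(f"{instance!r} is too long")


MiniValidator = validators.create(
    meta_schema={},  # a fully permissive meta-schema, for brevity
    validators={"maxItems": max_items},
)

validator = MiniValidator({"maxItems": 2})
assert validator.is_valid([1, 2])
assert not validator.is_valid([1, 2, 3])
```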
a mapping of new validator callables to extend with, whose
structure is as in create.
Note
Any validator callables with the same name as an
existing one will (silently) replace the old validator
callable entirely, effectively overriding any validation
done in the “parent” validator class.
If you wish to instead extend the behavior of a parent’s
validator callable, delegate and call it directly in
the new validator function by retrieving it using
OldValidator.VALIDATORS["validation_keyword_name"].
version (str) – a version for the new validator class
The new validator class will have its parent’s meta schema.
If you wish to change or extend the meta schema in the new
validator class, modify META_SCHEMA directly on the returned
class. Note that no implicit copying is done, so a copy should
likely be made before modifying it, in order to not affect the
old validator.
>>> validate([2, 3, 4], {"maxItems": 2})
Traceback (most recent call last):
    ...
ValidationError: [2, 3, 4] is too long
validate() will first verify that the
provided schema is itself valid, since not doing so can produce less
obvious error messages and cause validation to fail in less obvious or
consistent ways.
If you know you have a valid schema already, especially
if you intend to validate multiple instances with
the same schema, you likely would prefer using the
jsonschema.protocols.Validator.validate method directly on a
specific validator (e.g. Draft202012Validator.validate).
If the cls argument is not provided, two things will happen
in accordance with the specification. First, if the schema has a
$schema keyword containing a known meta-schema [1] then the
proper validator will be used. The specification recommends that
all schemas contain $schema properties for this reason. If no
$schema property is found, the default validator class is the
latest released draft.
Any other provided positional and keyword arguments will be passed
on when instantiating the cls.
Try to find an error that appears to be the best match among given errors.
In general, errors that are higher up in the instance (i.e. for which
ValidationError.path is shorter) are considered better matches,
since they indicate “more” is wrong with the instance.
If the resulting match is either oneOf or anyOf, the
opposite assumption is made – i.e. the deepest error is picked,
since these keywords only need to match once, and any other errors
may not be relevant.
Parameters:
errors (collections.abc.Iterable) – the errors to select from. Do not provide a mixture of
errors from different validation attempts (i.e. from
different instances or schemas), since it won’t produce
sensible output.
key (collections.abc.Callable) – the key to use when sorting errors. See relevance and
transitively by_relevance for more details (the default is
to sort with the defaults of that function). Changing the
default is only useful if you want to change the function
that rates errors but still want the error context descent
done by this function.
Returns:
the best matching error, or None if the iterable was empty
Note
This function is a heuristic. Its return value may change for a given
set of inputs from version to version if better heuristics are added.
Create a key function that can be used to sort errors by relevance.
Parameters:
weak (set) – a collection of validation keywords to consider to be
“weak”. If there are two errors at the same level of the
instance and one is in the set of weak validation keywords,
the other error will take priority. By default, anyOf
and oneOf are considered weak keywords and will be
superseded by other same-level validation errors.
strong (set) – a collection of validation keywords to consider to be
“strong”
The protocol to which all validator classes adhere.
Parameters:
schema – The schema that the validator object will validate with.
It is assumed to be valid, and providing
an invalid schema can lead to undefined behavior. See
Validator.check_schema to validate a schema first.
registry – a schema registry that will be used for looking up JSON references
resolver –
a resolver that will be used to resolve $ref
properties (JSON references). If unprovided, one will be created.
Deprecated since version v4.18.0: RefResolver has been deprecated in favor of
referencing, and with it, this argument.
format_checker – if provided, a checker which will be used to assert about
format properties present in the schema. If unprovided,
no format validation is done, and the presence of format
within schemas is strictly informational. Certain formats
require additional packages to be installed in order to assert
against instances. Ensure you’ve installed jsonschema with
its extra (optional) dependencies when
invoking pip.
Deprecated since version v4.12.0: Subclassing validator classes now explicitly warns this is not part of
their public API.
The returned object satisfies the validator protocol, but may not
be of the same concrete class! In particular, this happens
when a $ref resolves to a schema with a different
$schema than this one (i.e. for a different draft).
Lazily yield each of the validation errors in the given instance.
>>> schema = {
...     "type": "array",
...     "items": {"enum": [1, 2, 3]},
...     "maxItems": 2,
... }
>>> v = Draft202012Validator(schema)
>>> for error in sorted(v.iter_errors([2, 3, 4]), key=str):
...     print(error.message)
4 is not one of [1, 2, 3]
[2, 3, 4] is too long
Deprecated since version v4.0.0: Calling this function with a second schema argument is deprecated.
Use Validator.evolve instead.
The main functionality is provided by the validator classes for each of the
supported JSON Schema versions.
Most commonly, jsonschema.validators.validate is the quickest way to simply
validate a given instance under a schema, and will create a validator
for you.
JSON Schema does not mandate that the format property actually do any
validation. If validation is desired however, instances of this class can
be hooked into validators to enable format validation.
FormatChecker objects always return True when asked about
formats that they do not know how to validate.
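For example, using FormatChecker.conforms, which returns a boolean rather than raising:

```python
from jsonschema import FormatChecker

checker = FormatChecker()

# Unknown formats are not asserted against, so anything "conforms":
assert checker.conforms("anything at all", "made-up-format")

# Known formats are actually checked:
assert checker.conforms("127.0.0.1", "ipv4")
assert not checker.conforms("not an ip", "ipv4")
```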
Produce a new checker with the given type redefined.
Parameters:
type – The name of the type to check.
fn (collections.abc.Callable) – A callable taking exactly two arguments: the type
checker calling the function, and the instance to check.
The function should return True if the instance is of this
type and False otherwise.