pycassa.columnfamily – Column Family

Provides an abstraction of Cassandra’s data model to allow for easy manipulation of data inside Cassandra.

pycassa.columnfamily.gm_timestamp()

Returns the number of microseconds since the Unix Epoch.

class pycassa.columnfamily.ColumnFamily(pool, column_family)

An abstraction of a Cassandra column family or super column family. Operations on this, such as get() or insert(), will get data from or insert data into the corresponding Cassandra column family.

pool is a ConnectionPool that the column family will use for all operations. A connection is drawn from the pool before each operation and is returned afterwards.

column_family should be the name of the column family that you want to use in Cassandra. Note that the keyspace to be used is determined by the pool.

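A minimal usage sketch (the keyspace and column family names here are hypothetical)::

    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    # All operations on 'Users' draw a connection from this pool
    pool = ConnectionPool('Keyspace1', server_list=['localhost:9160'])
    users = ColumnFamily(pool, 'Users')

    users.insert('john', {'email': 'john@example.com'})
    print users.get('john')
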
read_consistency_level = 1

The default consistency level for every read operation, such as get() or get_range(). This may be overridden per-operation. This should be an instance of ConsistencyLevel. The default level is ONE.

write_consistency_level = 1

The default consistency level for every write operation, such as insert() or remove(). This may be overridden per-operation. This should be an instance of ConsistencyLevel. The default level is ONE.

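Continuing the sketch above, both defaults can be changed for a ColumnFamily, or a single operation can override them (assuming ConsistencyLevel is imported from pycassa.cassandra.ttypes, where the Thrift-generated class lives)::

    from pycassa.cassandra.ttypes import ConsistencyLevel

    # Require a quorum of replicas for every read and write by default
    users.read_consistency_level = ConsistencyLevel.QUORUM
    users.write_consistency_level = ConsistencyLevel.QUORUM

    # Override the default for one operation only
    users.get('john', read_consistency_level=ConsistencyLevel.ONE)
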
autopack_names = True

Controls whether column names are automatically converted to or from their natural type to the binary string format that Cassandra uses. The data type used is controlled by column_name_class for column names and super_column_name_class for super column names. By default, this is True.

autopack_values = True

Whether column values are automatically converted to or from their natural type to the binary string format that Cassandra uses. The data type used is controlled by default_validation_class and column_validators. By default, this is True.

autopack_keys = True

Whether row keys are automatically converted to or from their natural type to the binary string format that Cassandra uses. The data type used is controlled by key_validation_class. By default, this is True.

column_name_class

The data type of column names, which pycassa will use to determine how to pack and unpack them. This is set automatically by inspecting the column family’s comparator_type, but it may also be set manually if you want autopacking behavior without setting a comparator_type. Options include an instance of any class in pycassa.types, such as LongType().

super_column_name_class

Like column_name_class, but for super column names.

default_validation_class

The default data type of column values, which pycassa will use to determine how to pack and unpack them. This is set automatically by inspecting the column family’s default_validation_class, but it may also be set manually if you want autopacking behavior without setting a default_validation_class. Options include an instance of any class in pycassa.types, such as LongType().

column_validators

Like default_validation_class, but is a dict mapping individual columns to types.

key_validation_class

The data type of row keys, which pycassa will use to determine how to pack and unpack them. This is set automatically by inspecting the column family’s key_validation_class (which only exists in Cassandra 0.8 or greater), but it may be set manually if you want the autopacking behavior without setting a key_validation_class, or if you are using Cassandra 0.7. Options include an instance of any class in pycassa.types, such as LongType().

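As a sketch, a ColumnFamily cf whose schema is unknown to pycassa could be configured manually (the column names and types chosen here are assumptions)::

    from pycassa.types import UTF8Type, IntegerType

    cf.key_validation_class = UTF8Type()      # row keys are UTF-8 strings
    cf.column_name_class = UTF8Type()         # column names are UTF-8 strings
    cf.default_validation_class = UTF8Type()  # values are UTF-8 by default
    # Per-column overrides take precedence over default_validation_class
    cf.column_validators = {'age': IntegerType()}
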
dict_class = <class 'collections.OrderedDict'>

Results are returned as dictionaries. By default, Python 2.7’s collections.OrderedDict is used if available; otherwise pycassa’s bundled pycassa.util.OrderedDict is used, so that column order is maintained. A different class, such as dict, may be used instead by setting this attribute.

buffer_size = 1024

When calling get_range() or get_indexed_slices(), the intermediate results need to be buffered if we are fetching many rows; otherwise performance may suffer and the Cassandra server may overallocate memory and fail. This is the size of that buffer in number of rows. The default is 1024.

timestamp = <unbound method ColumnFamily.gm_timestamp>

Each insert() or remove() sends a timestamp with every column. This attribute is a function that is used to get this timestamp when needed. The default function is gm_timestamp().

load_schema()

Loads the schema definition for this column family from Cassandra and updates comparator and validation classes if necessary.

get(key[, columns][, column_start][, column_finish][, column_reversed][, column_count][, include_timestamp][, include_ttl][, super_column][, read_consistency_level])

Fetches all or part of the row with key key.

The columns fetched may be limited to a specified list of column names using columns.

Alternatively, you may fetch a slice of columns or super columns from a row using column_start, column_finish, and column_count. Setting these will cause columns or super columns to be fetched starting with column_start, continuing until column_count columns or super columns have been fetched or column_finish is reached. If column_start is left as the empty string, the slice will begin with the start of the row; leaving column_finish blank will cause the slice to extend to the end of the row. Note that column_count defaults to 100, so rows larger than this will not be completely fetched by default.

If column_reversed is True, columns are fetched in reverse sorted order, beginning with column_start. In this case, if column_start is the empty string, the slice will begin with the end of the row.

You may fetch all or part of only a single super column by setting super_column. If this is set, column_start, column_finish, column_count, and column_reversed will apply to the subcolumns of super_column.

To include every column’s timestamp in the result set, set include_timestamp to True. Results will include a (value, timestamp) tuple for each column.

To include every column’s TTL in the result set, set include_ttl to True. Results will include a (value, ttl) tuple for each column.

If this is a standard column family, the return type is of the form {column_name: column_value}. If this is a super column family and super_column is not specified, the results are of the form {super_column_name: {column_name: column_value}}. If super_column is set, the super column name will be excluded and the results are of the form {column_name: column_value}.

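A few representative calls against the users column family from the earlier sketch (the column names are hypothetical)::

    # The whole row (up to the first 100 columns)
    row = users.get('john')

    # Only the named columns
    row = users.get('john', columns=['email', 'name'])

    # A bounded slice, with each value returned as (value, timestamp)
    row = users.get('john', column_start='a', column_finish='n',
                    column_count=10, include_timestamp=True)
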
multiget(keys[, columns][, column_start][, column_finish][, column_reversed][, column_count][, include_timestamp][, super_column][, read_consistency_level][, buffer_size])

Fetch multiple rows from a Cassandra server.

keys should be a list of keys to fetch.

buffer_size is the number of rows from the total list to fetch at a time. If left as None, the ColumnFamily’s buffer_size will be used.

All other parameters are the same as get(), except that a list of keys may be passed in.

Results will be returned in the form {key: {column_name: column_value}}. If an OrderedDict is used, the rows will have the same order as keys.

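For example, fetching one column from several rows at once (keys that have no data in Cassandra are simply omitted from the result)::

    rows = users.multiget(['john', 'jane', 'bill'], columns=['email'])
    for key, columns in rows.items():
        print key, columns['email']
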
xget(key[, column_start][, column_finish][, column_reversed][, column_count][, include_timestamp][, read_consistency_level][, buffer_size])

Like get(), but creates a generator that pages over the columns automatically.

The number of columns fetched at once can be controlled with the buffer_size parameter. The default is column_buffer_size.

The generator yields (name, value) tuples.

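For a very wide row, the generator form avoids materializing everything at once (timeline and process() are placeholders for this sketch)::

    # Pages over the row 'user_1' 1024 columns at a time
    for name, value in timeline.xget('user_1', buffer_size=1024):
        process(name, value)
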
get_count(key[, super_column][, columns][, column_start][, column_finish][, read_consistency_level][, column_reversed][, max_count])

Count the number of columns in the row with key key.

You may limit the columns or super columns counted to those in columns. Additionally, you may limit the columns or super columns counted to only those between column_start and column_finish.

You may also count only the number of subcolumns in a single super column using super_column. If this is set, columns, column_start, and column_finish only apply to the subcolumns of super_column.

To put an upper bound on the number of columns that are counted, set max_count.

multiget_count(keys[, super_column][, columns][, column_start][, column_finish][, read_consistency_level][, buffer_size][, column_reversed][, max_count])

Perform a column count in parallel on a set of rows.

The parameters are the same as for get_count(), except that a list of keys may be used. A dictionary of the form {key: int} is returned.

buffer_size is the number of rows from the total list to count at a time. If left as None, the ColumnFamily’s buffer_size will be used.

To put an upper bound on the number of columns that are counted, set max_count.

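A short sketch of both counting calls (the returned numbers are illustrative)::

    # Count the columns in one row, scanning at most 10,000 of them
    n = users.get_count('john', max_count=10000)

    # Count several rows in parallel
    counts = users.multiget_count(['john', 'jane'])
    # e.g. {'john': 3, 'jane': 7}
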
get_range([start][, finish][, start_token][, finish_token][, columns][, column_start][, column_finish][, column_reversed][, column_count][, row_count][, include_timestamp][, super_column][, read_consistency_level][, buffer_size][, filter_empty])

Get an iterator over rows in a specified key range.

The key range begins with start and ends with finish. If left as empty strings, these extend to the beginning and end, respectively. Note that if RandomPartitioner is used, rows are stored in the order of the MD5 hash of their keys, so getting a lexicographical range of keys is not feasible.

In place of start and finish, you may use start_token and finish_token, or a combination of start and finish_token. In this case, you are specifying a token range to fetch instead of a key range. This can be useful for fetching all data owned by a node or for parallelizing a full data set scan. Otherwise, you should typically just use start and finish. When using RandomPartitioner or Murmur3Partitioner, start_token and finish_token should be string versions of the numeric tokens; for ByteOrderedPartitioner, they should be hex-encoded string versions of the token.

The row_count parameter limits the total number of rows that may be returned. If left as None, the number of rows that may be returned is unlimited (this is the default).

When calling get_range(), the intermediate results need to be buffered if we are fetching many rows; otherwise the Cassandra server will overallocate memory and fail. buffer_size is the size of that buffer in number of rows. If left as None, the ColumnFamily’s buffer_size attribute will be used.

When filter_empty is left as True, empty rows (including range ghosts) will be skipped and will not count towards row_count.

All other parameters are the same as those of get().

A generator over (key, {column_name: column_value}) is returned. To convert this to a list, use list() on the result.

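For instance, to scan the column family in buffered pages::

    # Buffers 1024 rows per underlying request; stops after 10,000 rows
    for key, columns in users.get_range(row_count=10000, buffer_size=1024):
        print key, columns
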
get_indexed_slices(index_clause[, columns][, column_start][, column_finish][, column_reversed][, column_count][, include_timestamp][, read_consistency_level][, buffer_size])

Similar to get_range(), but an IndexClause is used instead of a key range.

index_clause limits the keys that are returned based on expressions that compare the value of a column to a given value. At least one of the expressions in the IndexClause must be on an indexed column.

Note that Cassandra does not support secondary indexes or get_indexed_slices() for super column families.

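A sketch using the helpers in pycassa.index, assuming a 'state' column that has a secondary index::

    from pycassa.index import create_index_expression, create_index_clause

    # Match rows whose indexed 'state' column equals 'TX'
    expr = create_index_expression('state', 'TX')
    clause = create_index_clause([expr], count=100)

    for key, columns in users.get_indexed_slices(clause):
        print key, columns
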
insert(key, columns[, timestamp][, ttl][, write_consistency_level])

Insert or update columns in the row with key key.

columns should be a dictionary of columns or super columns to insert or update. If this is a standard column family, columns should look like {column_name: column_value}. If this is a super column family, columns should look like {super_column_name: {sub_column_name: value}}. If this is a counter column family, you may use integers as values, and those will be used as counter adjustments.

A timestamp may be supplied for all inserted columns with timestamp.

ttl sets the “time to live” in number of seconds for the inserted columns. After this many seconds, Cassandra will mark the columns as deleted.

The timestamp Cassandra reports as being used for the insert is returned.

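For example (the session row is hypothetical)::

    # Returns the timestamp that was used for the mutation
    ts = users.insert('john', {'email': 'john@example.com'})

    # These columns will be marked as deleted one hour from now
    users.insert('session', {'token': 'abc123'}, ttl=3600)
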
batch_insert(rows[, timestamp][, ttl][, write_consistency_level])

Like insert(), but multiple rows may be inserted at once.

The rows parameter should be of the form {key: {column_name: column_value}} if this is a standard column family, or {key: {super_column_name: {column_name: column_value}}} if this is a super column family.

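For a standard column family, for example::

    users.batch_insert({
        'john': {'email': 'john@example.com'},
        'jane': {'email': 'jane@example.com'},
    })
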
add(key, column[, value][, super_column][, write_consistency_level])

Increment or decrement a counter.

value should be an integer, either positive or negative, to be added to a counter column. By default, value is 1.

New in version 1.1.0: Available in Cassandra 0.8.0 and later.

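A sketch against a hypothetical counter column family page_views::

    page_views.add('index.html', 'hits')      # increment by 1
    page_views.add('index.html', 'hits', 10)  # increment by 10
    page_views.add('index.html', 'hits', -2)  # decrement by 2
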
remove(key[, columns][, super_column][, write_consistency_level])

Remove a specified row or a set of columns within the row with key key.

A set of columns or super columns to delete may be specified using columns.

A single super column may be deleted by setting super_column. If super_column is specified, columns will apply to the subcolumns of super_column.

If columns and super_column are both None, the entire row is removed.

The timestamp used for the mutation is returned.

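For example::

    # Delete two columns from john's row
    users.remove('john', columns=['email', 'name'])

    # Delete the entire row
    users.remove('john')
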
remove_counter(key, column[, super_column][, write_consistency_level])

Remove a counter at the specified location.

Note that counters have limited support for deletes: if you remove a counter, you must wait to issue any following update until the delete has reached all the nodes and all of them have been fully compacted.

New in version 1.1.0: Available in Cassandra 0.8.0 and later.

truncate()

Marks the entire ColumnFamily as deleted.

From the user’s perspective, a successful call to truncate will result in complete data deletion from this column family. Internally, however, disk space will not be immediately released; as with all deletes in Cassandra, this one only marks the data as deleted.

The operation succeeds only if all hosts in the cluster are available, and will throw an UnavailableException if some hosts are down.