Wednesday, March 14, 2012

Cassandra PHPCassa & Composite Types

This post has been updated to support phpcassa 1.0.a.1.

Cassandra Composite Type using PHPCassa

phpcassa 1.0.a.1 uses PHP namespaces, which are supported from PHP 5.3.0 onwards.
Make sure you have the relevant package.
The script below is a copy of the PHPCassa composite example.

I will explain it step by step.

(1) Creating Keyspace using PHPCassa
        Name => "Keyspace1"
        Replication Factor => 1
        Placement Strategy => Simple Strategy
(2) Creating Column Family with Composite Keys using PHPCassa
        Name => "Composites"
        Column Comparator => CompositeType of LongType, AsciiType (Ex: 1:example)
        Row Key Validation => CompositeType of AsciiType, LongType (Ex: example:1)
        Sample Row:
                'example':1 => { 1:'columnName' => "value", 1:'d' => "Hai", 2:'b' => "Fine", 112:'a' => "Sorry" }
        Columns are sorted based on their component types, as shown above.
        112 > 2 as LongType, but "112" < "2" as Ascii.
        Cassandra properly honors the types given in the column family definition.
        I have used '' to denote Ascii components; the quotes are not part of the values.
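To see why the component type matters for ordering, here is a minimal sketch in plain PHP that sorts the sample column names. It does not use Cassandra or phpcassa; the helper names compare_composite and format_names are made up for this illustration, and the comparison only imitates how a LongType component followed by an AsciiType component would be ordered.

```php
<?php
// Column names from the sample row, written as array(long part, ascii part).
$names = array(
    array(112, 'a'),
    array(2, 'b'),
    array(1, 'd'),
    array(1, 'columnName'),
);

// Hypothetical helper: compare the first component as a number and the
// second as a string, imitating CompositeType(LongType, AsciiType).
function compare_composite(array $a, array $b) {
    if ($a[0] !== $b[0]) {
        return $a[0] < $b[0] ? -1 : 1;
    }
    return strcmp($a[1], $b[1]);
}

// Hypothetical helper: render names as "1:d, 2:b, ..." for printing.
function format_names(array $names) {
    $parts = array();
    foreach ($names as $name) {
        $parts[] = $name[0] . ':' . $name[1];
    }
    return implode(', ', $parts);
}

$typed = $names;
usort($typed, 'compare_composite');
echo format_names($typed), "\n"; // 1:columnName, 1:d, 2:b, 112:a

// The same first components compared as strings give the opposite answer.
var_dump(112 > 2);                // bool(true)
var_dump(strcmp("112", "2") < 0); // bool(true)
```

The typed order matches the sample row above: 112 comes last because it is compared as a number, not as text.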

use phpcassa\Connection\ConnectionPool;
use phpcassa\ColumnFamily;
use phpcassa\ColumnSlice;
use phpcassa\SystemManager;
use phpcassa\Schema\StrategyClass;

// Create a new keyspace and column family
$sys = new SystemManager('');
$sys->create_keyspace('Keyspace1', array( // (1)
    "strategy_class" => StrategyClass::SIMPLE_STRATEGY,
    "strategy_options" => array('replication_factor' => '1')
));

// Use composites for column names and row keys
$sys->create_column_family('Keyspace1', 'Composites', array( //(2)
    "comparator_type" => "CompositeType(LongType, AsciiType)",
    "key_validation_class" => "CompositeType(AsciiType, LongType)"
));

Start a connection pool and create an instance of the Composites ColumnFamily:
$pool = new ConnectionPool('Keyspace1', array(''));
$cf = new ColumnFamily($pool, 'Composites');
Specifying Row Keys and Column Keys
Both our row key [key_validation_class] and column key [comparator] are composite types.
That means each key is made up of components, and the type of each component may differ.
So we can't specify a key as a single value; doing so could violate the data types that the Cassandra cluster expects.
For example, in our row keys, component 1 is Ascii and component 2 is Long.
When a write or read request is sent to Cassandra, the type of each component must be preserved.
Specifying "key:1" won't work and results in a Cassandra exception.

Hence we keep the components of a key in a PHP array and set insert_format and return_format to ARRAY_FORMAT.
Ex: $key1 = array("key", 1); //Ascii, Long
The other available formats for insert and return are:
  • DICTIONARY_FORMAT // Here, array keys correspond to row keys. So we can't use this, as our keys have components
  • OBJECT_FORMAT // This is close to what Thrift returns
For columns, each column name corresponds to a value. Hence each column is written as array ( array ( components ) , value ).
The array inside an array is required because PHP associative arrays accept only integers and strings as keys, so an array of components cannot be used as a key.
As we need to preserve the type of each component, we can't specify "columnKey" => value anymore.
Hence we map each column to array(key, value), where key is itself an array(components).
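A small plain-PHP sketch of why the array(key, value) form is needed. It runs without Cassandra or phpcassa and only shows how PHP itself treats array keys; the variable names are illustrative.

```php
<?php
// PHP converts a numeric-looking string key to an integer, so the
// original type of the key is lost.
$byKey = array("1" => "value");
var_dump(array_keys($byKey) === array(1)); // bool(true)

// Joining the components into one string loses the types as well:
// nothing records that 1 was a Long and "a" was Ascii.
$flat = array("1:a" => "val1a");
var_dump(is_string(key($flat))); // bool(true)

// The array(array(components), value) form keeps every component typed.
$columns = array(
    array(array(1, "a"), "val1a"),
);
list($name, $value) = $columns[0];
var_dump(is_int($name[0]));    // bool(true)
var_dump(is_string($name[1])); // bool(true)
```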
// Make it easier to work with non-scalar types
$cf->insert_format = ColumnFamily::ARRAY_FORMAT;
$cf->return_format = ColumnFamily::ARRAY_FORMAT;

// Composite row keys (Ascii, Long)
$key1 = array("key", 1);
$key2 = array("key", 2);

$columns = array(
    array(array(0, "a"), "val0a"),

    array(array(1, "a"), "val1a"),
    array(array(1, "b"), "val1b"),
    array(array(1, "c"), "val1c"),

    array(array(2, "a"), "val2a"),

    array(array(3, "a"), "val3a")
);

$cf->insert($key1, $columns);
$cf->insert($key2, $columns);

Then we fetch the data:
(1) Get all the columns for a key.
(2) The insert and return formats are ARRAY_FORMAT, so we access the result by index.
(3) This should output an array holding the components of the column name.
// Constructor of ColumnSlice
__construct( mixed $start = "", mixed $finish = "", integer $count = phpcassa\ColumnSlice::DEFAULT_COLUMN_COUNT, boolean $reversed = False )

(4) ColumnSlice => ColumnSlice(array(1), array(1))
  1. $start => an array, because the column name is a composite type
    Ex: array(component, array(component, INCLUSIVE_FLAG), ...) // the inner array is per component and is required only if you wish to override INCLUSIVE_FLAG
  2. $finish (named $end in the script) => same format as $start
So we ask for all columns whose first component runs from 1 to 1 (note the array, because of the composite type).
That indirectly means all columns whose first component is 1.
(5) $start => "" means the beginning of the row, and
array(1, array("c", false)) as $finish means everything less than 1:c, following the sort order described at the beginning.
(6) Selects all columns whose first component lies between 0 and 2, excluding 0 and 2 themselves.
(7) Selects the same columns as (6) in reverse order (note that $reversed is set to true and that the start and end bounds are swapped).
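The bounds in (4) to (7) can be tried out locally with a rough model in plain PHP. This is only a sketch of the semantics described above: it does not contact Cassandra, it is not how phpcassa implements slices, and compare_to_bound and slice_columns are hypothetical helper names. It assumes the columns are already in comparator order.

```php
<?php
// Compare the leading components of a column name with a slice bound.
// A bound component is either a value or array(value, inclusive flag).
// Returns array(cmp, inclusive): cmp is -1, 0 or 1, and inclusive is the
// flag of the last bound component that was looked at.
function compare_to_bound(array $name, array $bound) {
    $inclusive = true;
    foreach ($bound as $i => $component) {
        if (is_array($component)) {
            list($value, $inclusive) = $component;
        } else {
            $value = $component;
            $inclusive = true;
        }
        // Integers are compared as numbers, strings byte by byte.
        $cmp = is_int($value) ? $name[$i] - $value : strcmp($name[$i], $value);
        if ($cmp !== 0) {
            return array($cmp < 0 ? -1 : 1, $inclusive);
        }
    }
    return array(0, $inclusive);
}

// Return the values of all columns that fall inside the slice.
// An empty string as a bound means "no limit on this side".
function slice_columns(array $columns, $start, $finish, $reversed = false) {
    if ($reversed) {
        // A reversed slice runs from the higher bound down to the lower one.
        $tmp = $start;
        $start = $finish;
        $finish = $tmp;
    }
    $out = array();
    foreach ($columns as $column) {
        list($name, $value) = $column;
        if ($start !== '') {
            list($cmp, $inclusive) = compare_to_bound($name, $start);
            if ($cmp < 0 || ($cmp === 0 && !$inclusive)) {
                continue;
            }
        }
        if ($finish !== '') {
            list($cmp, $inclusive) = compare_to_bound($name, $finish);
            if ($cmp > 0 || ($cmp === 0 && !$inclusive)) {
                continue;
            }
        }
        $out[] = $value;
    }
    return $reversed ? array_reverse($out) : $out;
}

// Same columns as in the script, already sorted by (Long, Ascii).
$columns = array(
    array(array(0, "a"), "val0a"),
    array(array(1, "a"), "val1a"),
    array(array(1, "b"), "val1b"),
    array(array(1, "c"), "val1c"),
    array(array(2, "a"), "val2a"),
    array(array(3, "a"), "val3a"),
);

// (4) first component equal to 1
echo implode(', ', slice_columns($columns, array(1), array(1))), "\n";
// (5) everything before (1, c), exclusive
echo implode(', ', slice_columns($columns, '', array(1, array("c", false)))), "\n";
// (6) between 0 and 2, exclusive on both ends
echo implode(', ', slice_columns($columns, array(array(0, false)), array(array(2, false)))), "\n";
// (7) the same slice in reverse
echo implode(', ', slice_columns($columns, array(array(2, false)), array(array(0, false)), true)), "\n";
```

Under this model, (4) and (6) both give val1a, val1b, val1c; (5) gives val0a, val1a, val1b; and (7) gives val1c, val1b, val1a.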
// Fetch a user record
$row = $cf->get($key1); //(1)
$col1 = $row[0];
list($name, $value) = $col1; //(2)
echo "Column name: ";
print_r($name); //(3)
echo "Column value: ";
print_r($value);
echo "\n\n";

// Fetch columns with a first component of 1
$slice = new ColumnSlice(array(1), array(1)); // (4)
$columns = $cf->get($key1, $slice);
foreach($columns as $column) {
    list($name, $value) = $column;
    echo "$value, ";
}
echo "\n\n";

// Fetch everything before (1, c), exclusive
$inclusive = False;
$slice = new ColumnSlice('', array(1, array("c", $inclusive))); // (5)
$columns = $cf->get($key1, $slice);
foreach($columns as $column) {
    list($name, $value) = $column;
    echo "$value, ";
}
echo "\n\n";

// Fetch everything between 0 and 2, exclusive on both ends
$slice = new ColumnSlice( // (6)
    $start = array(array(0, False)),
    $end   = array(array(2, False))
);
$columns = $cf->get($key1, $slice);
foreach($columns as $column) {
    list($name, $value) = $column;
    echo "$value, ";
}
echo "\n\n";

// Do the same thing in reverse
$slice = new ColumnSlice(    //(7)
    $start = array(array(2, False)),
    $end   = array(array(0, False)),
    $count = 10,
    $reversed = True
);
$columns = $cf->get($key1, $slice);
foreach($columns as $column) {
    list($name, $value) = $column;
    echo "$value, ";
}
echo "\n\n";

// Clear out the column family
$cf->truncate();

// Destroy our schema
$sys->drop_keyspace("Keyspace1");

// Close our connections
$pool->close();
$sys->close();
This version of PHPCassa is an awesome revamp of its earlier version.
  • This has come out with Thrift 0.8 Support
  • Composite Type Support [no more serialize or unserialize required ;)]
  • Full Support for Batch Mutate
  • Implementation using namespaces
  • All new API Reference
  • And Complete Examples
Awesome work by Tyler Hobbs :)
Hope this helps :)