Cassandra PHPCassa & Composite Types

This post has been updated to support phpcassa 1.0.a.1.

Cassandra Composite Type using PHPCassa


phpcassa 1.0.a.1 uses PHP namespaces, which are available in PHP 5.3.0 and later.
Make sure you have the relevant package installed.
The script below is a copy of the PHPCassa composite example.

I will explain it step by step.

(1) Creating a keyspace using PHPCassa
        Name => "Keyspace1"
        Replication Factor => 1
        Placement Strategy => Simple Strategy
(2) Creating a column family with composite keys using PHPCassa
        Name => "Composites"
        Column Comparator => CompositeType of LongType, AsciiType (Ex: 1:example)
        Row Key Validation => CompositeType of AsciiType, LongType (Ex: example:1)
        Sample Row:
                'example':1 => { 1:'columnName' => "value", 1:'d' => "Hai", 2:'b' => "Fine", 112:'a' => "Sorry" }
        Columns are sorted component by component, according to each component's type, as shown above.
        112 > 2 as LongType, but "112" < "2" as AsciiType.
        Cassandra honors the types declared in the column family definition.
        I have used '' to denote Ascii components; the quotes are not part of the values.
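The sort order described above can be checked in plain PHP, without a Cassandra cluster. This is only a sketch of the ordering rule: `compare_composite` is a hypothetical helper written for this post, not part of phpcassa, and it assumes column names of the form array(long, ascii).

```php
<?php
// Compare two composite column names of the form array(long, ascii).
// The first component is compared as a number, the second as text.
function compare_composite(array $a, array $b) {
    if ($a[0] !== $b[0]) {
        return ($a[0] < $b[0]) ? -1 : 1;   // LongType-style numeric comparison
    }
    return strcmp($a[1], $b[1]);           // AsciiType-style byte-wise comparison
}

// Column names from the sample row, deliberately out of order.
$names = array(
    array(112, 'a'),
    array(2, 'b'),
    array(1, 'd'),
    array(1, 'columnName'),
);
usort($names, 'compare_composite');

foreach ($names as $name) {
    echo $name[0] . ':' . $name[1] . "\n";
}
// Prints 1:columnName, 1:d, 2:b, 112:a (one per line)

// The same first components sorted as text instead of numbers.
$asText = array('112', '2', '1');
sort($asText, SORT_STRING);
echo implode(', ', $asText) . "\n";
// Prints 1, 112, 2
```

Sorting the first component as text would put 112 before 2, which is why declaring the right component type in the comparator matters.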
require_once(__DIR__.'/../lib/autoload.php');

use phpcassa\Connection\ConnectionPool;
use phpcassa\ColumnFamily;
use phpcassa\ColumnSlice;
use phpcassa\SystemManager;
use phpcassa\Schema\StrategyClass;

// Create a new keyspace and column family
$sys = new SystemManager('127.0.0.1');
$sys->create_keyspace('Keyspace1', array( // (1)
    "strategy_class" => StrategyClass::SIMPLE_STRATEGY,
    "strategy_options" => array('replication_factor' => '1')
));

// Use composites for column names and row keys
$sys->create_column_family('Keyspace1', 'Composites', array( //(2)
    "comparator_type" => "CompositeType(LongType, AsciiType)",
    "key_validation_class" => "CompositeType(AsciiType, LongType)"
));


Start a connection pool and create an instance of the Composites ColumnFamily
$pool = new ConnectionPool('Keyspace1', array('127.0.0.1'));
$cf = new ColumnFamily($pool, 'Composites');
Specifying Row Keys and Column Keys
Both our row key [key_validation_class] and our column key [comparator] are composite types.
That means each key is made of components, and the type of each component may differ.
So we can't specify a key as a single string; that could violate the data types the Cassandra cluster expects.
For example, in our row keys component 1 is Ascii and component 2 is Long.
When a write or read request is sent to Cassandra, the type of each component has to be preserved.
Specifying "key:1" won't work and would result in a Cassandra exception.

Hence we keep the components of a key in a PHP array and set insert_format and return_format to ARRAY_FORMAT.
Ex: $key1 = array("key", 1); //Ascii, Long
The other available formats for insert and return are
  • DICTIONARY // Here, array keys correspond to row keys. So we can't use it, as our keys have components
  • OBJECT // This is almost the same as what Thrift returns
For columns, each column name maps to a value. Hence each column is array ( array ( components ) , value )
The array inside an array is required because PHP array keys can only be integers or strings, never arrays.
As we need to preserve the types, we can't specify "columnKey" => value anymore.
Hence we map each column to array(key, value), where key itself is array(components).
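The reasoning above can be tried out in plain PHP. The sketch below builds a row as a list of array(name, value) pairs and shows what is lost when a composite name is flattened into a string key. `composite_column` and `find_column` are hypothetical helpers written for illustration; they are not phpcassa functions.

```php
<?php
// Hypothetical helper: pair a composite column name with its value.
function composite_column(array $nameComponents, $value) {
    return array($nameComponents, $value);
}

// Hypothetical helper: look up a value by its composite name in a row
// made of array(name, value) pairs. Returns null if the name is absent.
function find_column(array $row, array $nameComponents) {
    foreach ($row as $column) {
        list($name, $value) = $column;
        if ($name === $nameComponents) {   // strict: component types must match
            return $value;
        }
    }
    return null;
}

$row = array(
    composite_column(array(1, 'a'), 'val1a'),
    composite_column(array(1, 'b'), 'val1b'),
);

echo find_column($row, array(1, 'b')) . "\n";        // prints val1b
var_dump(find_column($row, array('1', 'b')));        // NULL: '1' is a string, not a Long

// Flattening the name into a string key throws the types away.
$flat = array('1:a' => 'val1a');
$parts = explode(':', key($flat));                   // array('1', 'a')
var_dump(is_int($parts[0]));                         // bool(false)
```

With the pair layout each component keeps its PHP type, so the library has what it needs to serialize every component according to the declared composite type.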
// Make it easier to work with non-scalar types
$cf->insert_format = ColumnFamily::ARRAY_FORMAT;
$cf->return_format = ColumnFamily::ARRAY_FORMAT;

// Composite row keys (Ascii, Long)
$key1 = array("key", 1);
$key2 = array("key", 2);

$columns = array(
    array(array(0, "a"), "val0a"),

    array(array(1, "a"), "val1a"),
    array(array(1, "b"), "val1b"),
    array(array(1, "c"), "val1c"),

    array(array(2, "a"), "val2a"),

    array(array(3, "a"), "val3a")
);

$cf->insert($key1, $columns);
$cf->insert($key2, $columns);

Then we fetch the data
(1) Get all the columns corresponding to a key
(2) The insert and return formats are arrays, so we access the column by index
(3) Should output an array holding the components of the column name
//Constructor of ColumnSlice
__construct( mixed $start = "", mixed $finish = "", integer $count = phpcassa\ColumnSlice::DEFAULT_COLUMN_COUNT, boolean $reversed = False )

(4) ColumnSlice => ColumnSlice(array(1), array(1))
  1. $start => an array, because the column name is a composite type
    Ex: array(component, array(component, INCLUSIVE_FLAG), ...) // the inner array is per component and is required only if you wish to override INCLUSIVE_FLAG
  2. $finish => same format as $start
So we ask for all columns whose first component [note the array, because of the composite type] ranges from 1 to 1.
That indirectly means all columns whose first component is 1.
(5) $start => "" means the beginning of the row, and
array(1, array("c", false)) means everything less than 1:c, as per the sorting I mentioned in the beginning
(6) Selects all columns whose first component lies between 0 and 2, exclusive of both 0 and 2
(7) Selects the same columns as (6) but in reverse order (notice $reversed set to true, and $start and $finish swapped)
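To get a feel for slices (4) to (7) without a running cluster, the selection rules can be modelled in plain PHP. This is a simplified model written for this post, not phpcassa's implementation: `parse_bound`, `compare_component`, `compare_to_prefix` and `model_slice` are hypothetical names, the $count argument is ignored, only the inclusive flag of the last component of a bound is considered, and the input columns are assumed to be already in comparator order.

```php
<?php
// Split a bound such as array(1, array("c", false)) into its plain values
// and the inclusive flag of its last component (inclusive unless overridden).
function parse_bound(array $bound) {
    $values = array();
    $inclusive = true;
    foreach ($bound as $component) {
        if (is_array($component)) {
            list($value, $inclusive) = $component;
        } else {
            $value = $component;
            $inclusive = true;
        }
        $values[] = $value;
    }
    return array($values, $inclusive);
}

// Integers compare numerically, everything else byte-wise as text.
function compare_component($a, $b) {
    if (is_int($a) && is_int($b)) {
        return ($a < $b) ? -1 : (($a > $b) ? 1 : 0);
    }
    $c = strcmp((string) $a, (string) $b);
    return ($c < 0) ? -1 : (($c > 0) ? 1 : 0);
}

// Compare a full column name against a (possibly shorter) bound prefix.
// Returns 0 when the name starts with the prefix.
function compare_to_prefix(array $name, array $prefix) {
    foreach ($prefix as $i => $expected) {
        $c = compare_component($name[$i], $expected);
        if ($c !== 0) {
            return $c;
        }
    }
    return 0;
}

// Return the values of the columns selected by the modelled slice.
function model_slice(array $columns, $start = '', $finish = '', $reversed = false) {
    if ($reversed) {
        // A reversed slice walks the row backwards, so its bounds are swapped.
        list($start, $finish) = array($finish, $start);
    }
    $selected = array();
    foreach ($columns as $column) {
        list($name, $value) = $column;
        if ($start !== '') {
            list($prefix, $inclusive) = parse_bound($start);
            $c = compare_to_prefix($name, $prefix);
            if ($c < 0 || ($c === 0 && !$inclusive)) {
                continue;   // before the start of the slice
            }
        }
        if ($finish !== '') {
            list($prefix, $inclusive) = parse_bound($finish);
            $c = compare_to_prefix($name, $prefix);
            if ($c > 0 || ($c === 0 && !$inclusive)) {
                continue;   // past the end of the slice
            }
        }
        $selected[] = $value;
    }
    return $reversed ? array_reverse($selected) : $selected;
}

// The same columns that the script inserts, in comparator order.
$columns = array(
    array(array(0, "a"), "val0a"),
    array(array(1, "a"), "val1a"),
    array(array(1, "b"), "val1b"),
    array(array(1, "c"), "val1c"),
    array(array(2, "a"), "val2a"),
    array(array(3, "a"), "val3a"),
);

echo implode(', ', model_slice($columns, array(1), array(1))) . "\n";                 // (4) val1a, val1b, val1c
echo implode(', ', model_slice($columns, '', array(1, array("c", false)))) . "\n";    // (5) val0a, val1a, val1b
echo implode(', ', model_slice($columns, array(array(0, false)), array(array(2, false)))) . "\n";        // (6) val1a, val1b, val1c
echo implode(', ', model_slice($columns, array(array(2, false)), array(array(0, false)), true)) . "\n";  // (7) val1c, val1b, val1a
```

The model treats a bound as a prefix of the column name, which is why array(1) on both ends matches every column whose first component is 1.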
// Fetch all columns of the row
$row = $cf->get($key1); //(1)
$col1 = $row[0];
list($name, $value) = $col1; //(2)
echo "Column name: ";
print_r($name); //(3)
echo "Column value: ";
print_r($value);
echo "\n\n";

// Fetch columns with a first component of 1
$slice = new ColumnSlice(array(1), array(1)); // (4)
$columns = $cf->get($key1, $slice);
foreach($columns as $column) {
    list($name, $value) = $column;
    var_dump($name); 
    echo "$value, ";
}
echo "\n\n";

// Fetch everything before (1, c), exclusive
$inclusive = False;
$slice = new ColumnSlice('', array(1, array("c", $inclusive))); // (5)
$columns = $cf->get($key1, $slice);
foreach($columns as $column) {
    list($name, $value) = $column;
    echo "$value, ";
}
echo "\n\n";

// Fetch everything between 0 and 2, exclusive on both ends
$slice = new ColumnSlice( // (6)
    $start = array(array(0, False)),
    $end   = array(array(2, False))
);
$columns = $cf->get($key1, $slice);
foreach($columns as $column) {
    list($name, $value) = $column;
    echo "$value, ";
}
echo "\n\n";

// Do the same thing in reverse
$slice = new ColumnSlice(    //(7)
    $start = array(array(2, False)),
    $end   = array(array(0, False)),
    $count = 10,
    $reversed = True
);
$columns = $cf->get($key1, $slice);
foreach($columns as $column) {
    list($name, $value) = $column;
    echo "$value, ";
}
echo "\n\n";

// Clear out the column family
$cf->truncate();

// Destroy our schema
$sys->drop_keyspace("Keyspace1");

// Close our connections
$pool->close();
$sys->close();
Actually, this version of PHPCassa is an awesome revamp of its earlier version.
  • This has come out with Thrift 0.8 Support
  • Composite Type Support [no more serialize or unserialize required ;)]
  • Full Support for Batch Mutate
  • Implementation using namespaces
  • All new API Reference
  • And Complete Examples
Awesome work by Tyler Hobbs :)
Hope this helps :)

Comments

  1. Hi,

    I have a column family like:

    CREATE COLUMN FAMILY Users
    WITH key_validation_class = 'CompositeType(LexicalUUIDType,LexicalUUIDType)'
    AND comparator = UTF8Type
    AND column_metadata = [
    {column_name: id, validation_class: LongType}
    {column_name: client_id, validation_class: LongType}
    {column_name: user, validation_class: UTF8Type}
    {column_name: pass, validation_class: UTF8Type}
    {column_name: name, validation_class: UTF8Type}
    {column_name: e-mail, validation_class: UTF8Type}
    ]

    How can i insert data with phpcassa in a column like that?

    Replies
    1. Hi,
      You are using key_validation_class as a composite type. That means your row key is composite. PHPCassa currently doesn't support composite row keys. I have registered a request regarding the same in its group. Currently I'm doing it via CQL. I don't know how it works, but it works.
      Try
      "UPDATE Users SET 'id'=123, 'user'='tamil', 'client_id'=123 [..ur columns] where key='1:3'" using http://itsallabtamil.blogspot.in/2011/10/cassandra-cql-php.html
      Note that there is no proper support for composites either in cql or phpcassa. But the above statement works as expected when executed via phpcassa in my application

    2. I have updated the post. Now PHPCassa Supports key_validation class too :)

  2. Hi, I'm new to Cassandra.

    Hope you don't mind my question. If we choose to do CompositeType for row key, which partitioning strategy we should choose? RP or OPP? Can we still use CompositeType for Row Key if we choose OPP?

    Replies
    1. Hi linnhtun,
      The partitioning strategy has nothing to do with CompositeTypes. One standard basis of Cassandra is that rows are kept sorted by row key and columns are kept sorted by column name [not value]. With OPP, the structure of your row key [key_validation_class] is maintained and sorting happens based on it. RP, on the other hand, does not preserve that structure while sorting: key distribution and sorting happen based on md5(ROWKEY). The latter guarantees a well balanced cluster [both on load and data]. You can experiment with the distribution if you know your row key format in advance [Ex: YYYYMMDDHH]. Try to model your data with wide rows, because Cassandra is good for such a use case. OPP is a tough choice, because with a distributed cluster you want all nodes busy all the time, and OPP might result in HOT SPOTS. Post your use case on Stack Overflow and you will get really nice suggestions.

  3. Hi, I have a problem with Cassandra (particularly with phpcassa, I guess), maybe you can help me..

    I have a row with about 200 000 columns in it. When I'm writing these columns to Cassandra, everything seems to be perfect.
    But when I'm getting these columns from the row (like $cf_name->get($row_key)) everything is extremely SLOW.
    PHP needs about 13 minutes!! to get 50 000 columns. And I need much more!

    Have You faced the problem before? Can I somehow make it quicker?

    Thanx in advance

    Replies
    1. Hi Stas, Check out this question http://stackoverflow.com/questions/8270365/phpcassa-get-range-is-too-slow. Have you got C Extension of phpcassa in place? Also are you executing the get query from the local cassandra machine or somewhere else in the network? What about your replication factor? Can you post a detailed description of your problem @phpcassa groups or Stack Overflow?

  4. Tamil, THANKS A LOT! I was literally shocked when your rapid answer hit the spot!
    You were damn right, there was no C extension installed.
    Now it takes 3 sec max to get the columns..

    ps
    I've also posted my problem on github (https://github.com/thobbs/phpcassa/issues/101). I've already written about your answer there )

    thanx again

  5. Good day, Tamil.
    I guess I have another problem..
    If I install C extension on the server without Apache , everything works just great.
    But when installing the extension on the server with Apache, the latter complains:
    [notice] child pid 346 exit signal Segmentation fault (11)
    [Tue Oct 23 11:41:38 2012] [notice] child pid 343 exit signal Segmentation fault (11)
    [Tue Oct 23 11:41:38 2012] [notice] child pid 342 exit signal Segmentation fault (11)
    [Tue Oct 23 11:41:39 2012] [notice] child pid 376 exit signal Segmentation fault (11)
    [Tue Oct 23 11:41:39 2012] [notice] child pid 348 exit signal Segmentation fault (11)
    ....
    and nothing works....
    Is there some solution?

    Replies
    1. Hi Stas, can you just mail the same to the phpcassa Google group? I have never come across such an issue.

