

CQL for Cassandra 2.2 & later Documentation

January 21, 2016

Apache, Apache Cassandra, Apache Hadoop, Hadoop and the eye logo are trademarks of the Apache Software Foundation.

© 2016 DataStax, Inc. All rights reserved.

Contents

Introduction to Cassandra Query Language
CQL data modeling
    Data modeling concepts
    Data modeling analysis
Using CQL
    Starting cqlsh
        Starting cqlsh on Linux and Mac OS X
        Starting cqlsh on Windows
    Creating and updating a keyspace
        Example of creating a keyspace
        Updating the replication factor
    Creating a table
        Creating a table
        Using the keyspace qualifier
        Simple Primary Key
        Composite Partition Key
        Compound Primary Key
        Creating a counter table
        Create table with COMPACT STORAGE
        Table schema collision fix
    Creating a materialized view
    Creating advanced data types in tables
        Creating collections
        Creating a table with a tuple
        Creating a user-defined type (UDT)
    Creating functions
        Create user-defined function (UDF)
        Create User-Defined Aggregate Function (UDA)
    Inserting and updating data
        Inserting simple data into a table
        Inserting and updating data into a set
        Inserting and updating data into a list
        Inserting and updating data into a map
        Inserting tuple data into a table
        Inserting data into a user-defined type (UDT)
        Inserting JSON data into a table
        Using lightweight transactions
        Expiring data with Time-To-Live
        Expiring data with TTL example
        Inserting data using COPY and a CSV file
    Batching data insertion and updates
        Using and misusing batches
        Use of BATCH statement
        Misuse of BATCH statement
        Using unlogged batches
    Querying tables
        Retrieval and sorting results
        Retrieval using collections
        Retrieval using JSON
        Retrieval using the IN keyword
        Retrieval by scanning a partition
        Retrieval using standard aggregate functions
        Retrieval using a user-defined function (UDF)
        Retrieval using user-defined aggregate (UDA) functions
        Querying a system table
    Indexing tables
        Indexing
        Building and maintaining indexes
    Altering a table
        Altering columns in a table
        Altering a table to add a collection
        Altering the data type of a column
    Altering a materialized view
    Altering a user-defined type
    Removing a keyspace, schema, or data
        Dropping a keyspace, table or materialized view
        Deleting columns and rows
        Dropping a user-defined function (UDF)
    Securing a table
        Database users
        Database roles
        Database Permissions
    Tracing consistency changes
        Setup to trace consistency changes
        Trace reads at different consistency levels
        How consistency affects performance
    Paging through an unordered partitioner
    Determining time-to-live for a column
    Determining the date/time of a write
    Legacy tables
        Working with legacy applications
        Querying a legacy table
        Using a CQL legacy table query
CQL reference
    Introduction
    CQL lexical structure
        Uppercase and lowercase
        Escaping characters
        Valid literals
        Exponential notation
        CQL code comments
        CQL Keywords
    CQL data types
        Blob type
        Collection type
        Counter type
        UUID and timeuuid types
        uuid and Timeuuid functions
        Timestamp type
        Tuple type
        User-defined type
    CQL keyspace and table properties
        Table properties
        Compaction subproperties
        Compression subproperties
    Functions
    CQL limits
    cqlsh commands
        cqlsh
        CAPTURE
        CLEAR
        CONSISTENCY
        COPY
        DESCRIBE
        EXPAND
        EXIT
        PAGING
        SERIAL CONSISTENCY
        SHOW
        SOURCE
        TRACING
    CQL commands
        ALTER KEYSPACE
        ALTER MATERIALIZED VIEW
        ALTER ROLE
        ALTER TABLE
        ALTER TYPE
        ALTER USER
        BATCH
        CREATE AGGREGATE
        CREATE INDEX
        CREATE FUNCTION
        CREATE KEYSPACE
        CREATE MATERIALIZED VIEW
        CREATE TABLE
        CREATE TRIGGER
        CREATE ROLE
        CREATE USER
        DELETE
        DESCRIBE
        DROP AGGREGATE
        DROP FUNCTION
        DROP INDEX
        DROP KEYSPACE
        DROP MATERIALIZED VIEW
        DROP ROLE
        DROP TABLE
        DROP TRIGGER
        DROP TYPE
        DROP USER
        GRANT
        INSERT
        LIST PERMISSIONS
        LIST ROLES
        LIST USERS
        REVOKE
        SELECT
        TRUNCATE
        UPDATE
        USE
Using the docs


Introduction to Cassandra Query Language

Cassandra Query Language (CQL) is a query language for the Cassandra database.

Cassandra Query Language (CQL) is the primary language for communicating with the Cassandra database. The most basic way to interact with Cassandra is using the CQL shell, cqlsh. Using cqlsh, you can create keyspaces and tables, insert data, query tables, and much more. If you prefer a graphical tool, you can use DataStax DevCenter. For production, DataStax supplies a number of drivers so that CQL statements can be passed from client to cluster and back. Other administrative tasks can be accomplished using OpsCenter.

Important: This document assumes you are familiar with the Cassandra 2.2 documentation, Cassandra 3.0 documentation, or Cassandra 3.x documentation.

Table: CQL for Cassandra 2.2 and later features

New CQL features include:

• JSON support for CQL3
• User Defined Functions (UDFs)
• User Defined Aggregates (UDAs)
• Role Based Access Control (RBAC)
• Native Protocol v.4
• In Cassandra 3.0 and later, Materialized Views
• Addition of the CLEAR command for cqlsh

Improved CQL features include:

• Additional COPY command options
• New CREATE TABLE WITH ID command
• Support for the IN clause on any partition key column or clustering column
• Acceptance of dollar-quoted strings
• Allowing the mixing of token and partition key restrictions
• Support for indexing key/value entries on map collections
• Addition of the date data type and improved time/date conversion functions
• Change to CREATE TABLE syntax for compression options

Removed CQL features include:

• Removal of CQL2
• Removal of cassandra-cli

Native protocol

The Native Protocol has been updated to version 4, with implications for CQL use in the DataStax drivers.

CQL data modeling

Data modeling topics.

Note: DataStax Academy provides a course in Cassandra data modeling. This course presents techniques using the Chebotko method for translating a real-world domain model into a running Cassandra schema.



Data modeling concepts

How data modeling should be approached for Cassandra. A Pro Cycling statistics example is used throughout the CQL document.

Data modeling is a process that involves identifying the entities (items to be stored) and the relationships between entities. To create your data model, identify the patterns used to access data and the types of queries to be performed. These two ideas inform the organization and structure of the data, and the design and creation of the database's tables. Indexing the data can improve performance in some cases, so decide which columns will have secondary indexes.

Data modeling in Cassandra uses a query-driven approach, in which specific queries are the key to organizing the data. Queries are the result of selecting data from a table; schema is the definition of how data in the table is arranged. Cassandra's database design is based on the requirement for fast reads and writes, so the better the schema design, the faster data is written and retrieved. In contrast, relational databases normalize data based on the tables and relationships designed, and then write the queries that will be made. Data modeling in relational databases is table-driven, and any relationships between tables are expressed as table joins in queries.

Cassandra's data model is a partitioned row store with tunable consistency. Tunable consistency means that for any given read or write operation, the client application decides how consistent the requested data must be. Rows are organized into tables; the first component of a table's primary key is the partition key, and within a partition, rows are clustered by the remaining columns of the key. Other columns can be indexed separately from the primary key. Because Cassandra is a distributed database, efficiency is gained for reads and writes when data is grouped together on nodes by partition. The fewer partitions that must be queried to get an answer to a question, the faster the response. Tuning the consistency level is another factor in latency, but it is not part of the data modeling process.

Cassandra data modeling focuses on the queries. Throughout this topic, the example of Pro Cycling statistics demonstrates how to model the Cassandra table schema for specific queries. The conceptual model for this data model shows the entities and relationships.

The entities and their relationships are considered during table design. Queries are best designed to access a single table, so all entities involved in a relationship that a query encompasses must be in the table. Some tables will involve a single entity and its attributes, like the first example shown below. Others will involve more than one entity and its attributes, such as the second example. Including all data in a single Cassandra table contrasts with a relational database approach, where the data would be stored in two or more tables and foreign keys would be used to relate the data between the tables. Because Cassandra uses this single table-single query approach, queries can perform faster.

One basic query (Q1) for Pro Cycling statistics is a list of cyclists, including each cyclist's id, firstname, and lastname. To uniquely identify a cyclist in the table, an id using UUID is used. For a simple query to list all cyclists, a table is created that includes all the columns identified, with a partition key (K) of id. The diagram below shows a portion of the logical model for the Pro Cycling data model.

Figure: Query 1: Find a cyclist's name with a specified id

A related query (Q2) searches for all cyclists by a particular race category. For Cassandra, this query is more efficient if a table is created that groups all cyclists by category. Some of the same columns are required (id, lastname), but now the primary key of the table includes category as the partition key (K) and groups within the partition by the id (C). This choice ensures that unique records for each cyclist are created.

Figure: Query 2: Find cyclists given a specified category
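As a sketch of these two access patterns in CQL, the following schemas match the logical models above. The cyclist_name table is the one created later in this document; cyclist_by_category is a hypothetical name for the Q2 table:

-- Q1: find a cyclist's name by id; id is the partition key (K).
CREATE TABLE cycling.cyclist_name (
  id UUID PRIMARY KEY,
  firstname text,
  lastname text
);

-- Q2: find cyclists by category; category is the partition key (K) and
-- id is a clustering column (C), so each cyclist gets a unique row.
CREATE TABLE cycling.cyclist_by_category (
  category text,
  id UUID,
  lastname text,
  PRIMARY KEY (category, id)
);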

These are two simple queries; more examples will be shown to illustrate data modeling using CQL. Notice that the main principle in designing the table is not the relationship of the table to other tables, as it is in relational database modeling. Data in Cassandra is often arranged as one query per table, and data is repeated amongst many tables, a process known as denormalization. Relational databases instead normalize data, removing as much duplication as possible. The relationship of the entities is important, because the order in which data is stored in Cassandra can greatly affect the ease and speed of data retrieval. The schema design captures much of the relationship between entities by including related attributes in the same table. Client-side joins in application code are used only when the table schema cannot capture the complexity of the relationships.



Data modeling analysis

How to analyze a logical data model.

You've created a conceptual model of the entities and their relationships. From the conceptual model, you've used the expected queries to create table schema. The last step in data modeling involves completing an analysis of the logical design to discover modifications that might be needed. These modifications can arise from understanding partition size limitations, the cost of data consistency, and performance costs due to a number of design choices still to be made.

For efficient operation, partitions must be sized within certain limits. Two measures of partition size are the number of values in a partition and the partition size on disk. The maximum number of columns per row is two billion. Sizing the disk space is more complex, and involves the number of rows and the number of columns, primary key columns, and static columns in each table. Each application will have different efficiency parameters, but a good rule of thumb is to keep the maximum number of values below 100,000 items and the disk size under 100 MB.

Data redundancy must be considered as well. Two redundancies that are a consequence of Cassandra's distributed design are duplicate data in tables and multiple partition replicas. Data is generally duplicated in multiple tables, which introduces performance latency during writes and requires more disk space. Consider storing a cyclist's name and id in more than one table, along with other items like race categories, finished races, and cyclist statistics. Storing the name and id in multiple tables results in linear duplication, with two values stored in each table. Table design must take into account the possibility of higher-order duplication, such as unlimited keywords stored in a large number of rows. A case of n keywords stored in m rows is not a good table design; rethink the table schema for a better design, still keeping the query foremost. Cassandra replicates partition data based on the replication factor, using more disk space. Replication is a necessary aspect of distributed databases, and sizing disk storage correctly is important.

Application-side joins can be a performance killer. In general, analyze your queries that require joins and consider pre-computing and storing the join results in an additional table. In Cassandra, the goal is to use one table per query for performant behavior.

Lightweight transactions (LWT) can also affect performance. Consider whether or not the queries using LWT are necessary, and remove the requirement if it is not strictly needed.
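As a sketch of the pre-computed join idea, a hypothetical table (the name and columns are illustrative, not part of the reference schema) stores a cyclist's teams alongside the cyclist, so a single read of one partition replaces an application-side join:

-- Pre-computed "join": the teams are duplicated here instead of being
-- joined in from a separate teams table at read time.
CREATE TABLE cycling.cyclist_career_teams (
  id UUID PRIMARY KEY,
  lastname text,
  teams set<text>
);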

Using CQL

CQL provides an API to Cassandra that is simpler than the Thrift API.

CQL provides an API to Cassandra that is simpler than the Thrift API. The Thrift API and legacy versions of CQL expose the internal storage structure of Cassandra. CQL adds an abstraction layer that hides implementation details of this structure and provides native syntaxes for collections and other common encodings.

Accessing CQL

Common ways to access CQL are:

• Start cqlsh, the Python-based command-line client, on the command line of a Cassandra node.
• Use DataStax DevCenter, a graphical user interface.
• For developing applications, use one of the official DataStax C#, Java, or Python open-source drivers.
• Use the set_cql_version Thrift method for programmatic access.

This document presents examples using cqlsh.



Starting cqlsh

How to start cqlsh.

Starting cqlsh on Linux and Mac OS X

A brief description of starting cqlsh on Linux and Mac OS X.

This procedure briefly describes how to start cqlsh on Linux and Mac OS X. The cqlsh command is covered in detail later.

Procedure

1. Navigate to the Cassandra installation directory.
2. Start cqlsh, for example on Mac OS X:

   $ bin/cqlsh

   If you use security features, provide a user name and password.
3. Optionally, specify the IP address and port to start cqlsh on a different node:

   $ bin/cqlsh 1.2.3.4 9042

Note: You can use tab completion to see hints about how to complete a cqlsh command. Some platforms, such as Mac OS X, do not ship with tab completion installed. You can use easy_install to install tab completion capabilities on Mac OS X:

$ easy_install readline

Starting cqlsh on Windows

A brief description of starting cqlsh on Windows.

This procedure describes how to start cqlsh on Windows. The cqlsh command is covered in detail later.

Procedure

You can start cqlsh in two ways:

• From the Start menu:
  a) Navigate to Start > Programs > DataStax Distribution of Apache Cassandra.
  b) If using Cassandra 3.0+, click DataStax Distribution of Apache Cassandra > Cassandra CQL Shell.
  c) If using Cassandra 2.2, click DataStax Community Edition > Cassandra CQL Shell.
  The cqlsh prompt appears: cqlsh>
• From the Command Prompt:
  a) Open the Command Prompt.
  b) Navigate to the Cassandra bin directory:
     Cassandra 3.0+: C:\> cd C:"Program Files\DataStax-DDC\apache-cassandra\bin"
     Cassandra 2.2: C:\> cd C:"Program Files\DataStax Community\apache-cassandra\bin"
  c) Enter cqlsh.
  The cqlsh prompt appears: cqlsh>



To start cqlsh on a different node, add the IP address and port:

C:\> cqlsh 1.2.3.4 9042

Note: You can use tab completion to see hints about how to complete a cqlsh command. To install tab completion capabilities on Windows, you can use pip install pyreadline.

Creating and updating a keyspace

Creating a keyspace is the CQL counterpart to creating an SQL database.

Creating a keyspace is the CQL counterpart to creating an SQL database, but a little different. The Cassandra keyspace is a namespace that defines how data is replicated on nodes. Typically, a cluster has one keyspace per application. Replication is controlled on a per-keyspace basis, so data that has different replication requirements typically resides in different keyspaces. Keyspaces are not designed to be used as a significant map layer within the data model; they are designed to control data replication for a set of tables.

When you create a keyspace, specify a strategy class for replicating keyspaces. Using the SimpleStrategy class is fine for evaluating Cassandra. For production use or for use with mixed workloads, use the NetworkTopologyStrategy class. To use NetworkTopologyStrategy for evaluation purposes with, for example, a single-node cluster, use the default data center name. To use NetworkTopologyStrategy for production use, you need to change the default snitch, SimpleSnitch, to a network-aware snitch, define one or more data center names in the snitch properties file, and use the data center name(s) to define the keyspace; see Snitch. For example, if the cluster uses the PropertyFileSnitch, create the keyspace using the user-defined data center and rack names in the cassandra-topology.properties file. If the cluster uses the Ec2Snitch, create the keyspace using EC2 data center and rack names. If the cluster uses the GoogleCloudSnitch, create the keyspace using GoogleCloud data center and rack names.

If you fail to change the default snitch and use NetworkTopologyStrategy, Cassandra will fail to complete any write request, such as inserting data into a table, and log this error message:

Unable to complete request: one or more nodes were unavailable.

Note: You cannot insert data into a table in a keyspace that uses NetworkTopologyStrategy unless you define the data center names in the snitch properties file or you use a single data center named datacenter1.
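For example, assuming a network-aware snitch whose properties file defines data centers named DC1 and DC2, a production keyspace might be defined as shown below. The keyspace and data center names are illustrative; the data center names must match the snitch configuration exactly:

-- Three replicas in DC1 and two in DC2.
CREATE KEYSPACE cycling_prod
  WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 2 };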

Example of creating a keyspace

A simple example of querying Cassandra by creating a keyspace and then using it.

To query Cassandra, create and use a keyspace. Choose an arbitrary data center name and register the name in the properties file of the snitch. Alternatively, in a cluster in a single data center, use the default data center name, for example, datacenter1, and skip registering the name in the properties file.

Procedure

1. If using NetworkTopologyStrategy, determine the default data center name using nodetool status:

   $ bin/nodetool status

   The output is:

   Datacenter: datacenter1
   =======================
   Status=Up/Down
   |/ State=Normal/Leaving/Joining/Moving
   --  Address    Load      Tokens  Owns (effective)  Host ID      Rack
   UN  127.0.0.1  41.62 KB  256     100.0%            75dcca8f...  rack1

2. Create a keyspace:

   cqlsh> CREATE KEYSPACE IF NOT EXISTS cycling WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 };

3. Use the keyspace:

   cqlsh> USE cycling;

Updating the replication factor

Increasing the replication factor increases the total number of copies of keyspace data stored in a Cassandra cluster.

Increasing the replication factor increases the total number of copies of keyspace data stored in a Cassandra cluster. If you are using security features, it is particularly important to increase the replication factor of the system_auth keyspace from the default (1) because you will not be able to log into the cluster if the node with the lone replica goes down. It is recommended to set the replication factor for the system_auth keyspace equal to the number of nodes in each data center.

Procedure

1. Update a keyspace in the cluster and change its replication strategy options:

   cqlsh> ALTER KEYSPACE system_auth WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};

   Or, if using SimpleStrategy:

   cqlsh> ALTER KEYSPACE "Excalibur" WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

2. On each affected node, run the nodetool repair command.
3. Wait until repair completes on a node, then move to the next node.

Creating a table

How to create tables to store data.

In CQL, data is stored in tables containing rows of columns.

Creating a table

How to create CQL tables.

In CQL, data is stored in tables containing rows of columns, similar to SQL definitions.

Note: Rows and columns in the internal implementation of Cassandra are not the same as CQL rows and columns. For more information, see A thrift to CQL3 upgrade guide or CQL3 for Cassandra experts.

Tables can be created, dropped, and altered at runtime without blocking updates and queries. When you create a table, define a primary key and columns, along with table properties. Use the optional WITH clause and keyword arguments to configure table properties: caching, compaction, and other operations. Table properties are specified per table, as shown in table settings.
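As a brief sketch of the WITH clause (the table name, columns, and property values here are illustrative, not part of the reference schema):

-- Hypothetical table overriding two table properties at creation time.
CREATE TABLE cycling.race_times (
  race_name text,
  cyclist_id UUID,
  finish_time int,
  PRIMARY KEY (race_name, cyclist_id)
) WITH comment = 'Finish times per race'
  AND compaction = { 'class' : 'SizeTieredCompactionStrategy' };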



Create schema using cqlsh

Create table schema using cqlsh. Dynamic schema generation is not supported; collision can occur if multiple clients attempt to generate tables simultaneously. To fix problems if collision does occur, see the schema collision fix instructions.

Primary Key

A primary key identifies the location and order of data storage. The primary key is defined at table creation time and cannot be altered. If the primary key must be changed, a new table schema is created and the data is written to the new table. Cassandra is a partitioned row store, and a component of the primary key, the partition key, identifies which node will hold a particular table row. See ALTER TABLE for details on altering a table after creation.

At a minimum, the primary key must consist of a partition key. Composite partition keys can split a data set so that related data is stored on separate partitions. Compound primary keys include clustering columns, which order the data on a partition.

The definition of a table's primary key is critical in Cassandra. Carefully model how data in a table will be inserted and retrieved before choosing which columns will define the primary key. The size of the partitions, the order of the data within partitions, and the distribution of the partitions amongst the nodes of the cluster all determine selection of the best primary key for a table.

Table characteristics

Valid table names are strings of alphanumeric characters and underscores that begin with a letter. To specify a table name:

• Use dot notation: create the table using the keyspace name separated from the table name with a period. The current keyspace does not change, but the table is created in the specified keyspace.
• Use the current keyspace: create the table using only the table name.

Column characteristics

Columns are an essential component of a CQL table. Several column types exist to afford flexibility to table schema design. Each column in a table is assigned a data type during table creation. Column types, other than collection-type columns, are specified as a parenthesized, comma-separated list of column name and type pairs. The following example illustrates three data types, UUID, text, and timestamp:

CREATE TABLE cycling.cyclist_alt_stats (
  id UUID PRIMARY KEY,
  lastname text,
  birthday timestamp,
  nationality text,
  weight text,
  height text
);

Collection column types supported are map, set, and list. A collection column is declared using the collection type, followed by another type, such as int or text, in angle brackets. Like other columns, collection columns are specified as a parenthesized, comma-separated list of column name and type pairs. The following example illustrates each collection type, but is not designed for an actual query:

CREATE TABLE cycling.whimsey (
  id UUID PRIMARY KEY,
  lastname text,
  cyclist_teams set<text>,
  events list<text>,
  teams map<int, text>
);

Collection types cannot currently be nested. Collections can include a frozen data type. For examples and usage, see Collection type.

The tuple data type holds fixed-length sets of typed positional fields. Use a tuple as an alternative to a user-defined type. A tuple can accommodate many fields (32768), more than can be prudently used; typically, create a tuple with a small number of fields, 2 to 5. To create a tuple in a table, use angle brackets and a comma delimiter to declare the tuple component types. Tuples can be nested. The following example illustrates a tuple type composed of a text field and a nested tuple of two float fields:

CREATE TABLE cycling.route (
  race_id int,
  race_name text,
  point_id int,
  lat_long tuple<text, tuple<float, float>>,
  PRIMARY KEY (race_id, point_id)
);

Note: Cassandra 2.1.0 to 2.1.2 requires declaring tuples with the frozen keyword, for example frozen<tuple<text, tuple<float, float>>>, while Cassandra 2.1.3 and later does not require this keyword. For more information, see "Tuple type".

User-defined types (UDTs) can be used for a column type when a data type of several fields was created using CREATE TYPE. A UDT is created when it will be used for multiple table definitions. The column type user-defined type (UDT) requires the frozen keyword. A frozen value serializes multiple components into a single value; Cassandra treats the value of a frozen type as a blob, and the entire value must be overwritten. Non-frozen types allow updates to individual fields. The scope of a user-defined type is the keyspace in which you define it. Use dot notation to access a type from a keyspace outside its scope: the keyspace name followed by a period, followed by the name of the type. Cassandra accesses the type in the specified keyspace, but does not change the current keyspace; if you do not specify a keyspace, Cassandra accesses the type within the current keyspace. For examples and usage information, see "Using a user-defined type"; a short sketch also follows at the end of this section.

A counter is a special column used to store a number that is changed in increments. A counter can only be used in a dedicated table that includes a column of counter data type. For more examples and usage information, see "Using a counter".
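The sketch mentioned above, assuming the cycling keyspace: the fullname type and the race_winners columns mirror the keyspace-qualifier example later in this document, though the exact primary key shown here is an assumption:

-- Define the type once per keyspace, then reuse it in table definitions.
CREATE TYPE cycling.fullname (
  firstname text,
  lastname text
);

-- The UDT column requires the frozen keyword; the whole value is written
-- and overwritten as a single unit.
CREATE TABLE cycling.race_winners (
  race_name text,
  race_position int,
  cyclist_name frozen<fullname>,
  PRIMARY KEY (race_name, race_position)
);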

Using the keyspace qualifier

To simplify tracking multiple keyspaces, use the keyspace qualifier instead of the USE statement.

Sometimes issuing a USE statement to select a keyspace is inconvenient. Connection pooling requires managing multiple keyspaces. To simplify tracking multiple keyspaces, use the keyspace qualifier instead of the USE statement. You can specify the keyspace using the keyspace qualifier in these statements:

• ALTER TABLE
• CREATE TABLE
• DELETE
• INSERT
• SELECT
• TRUNCATE
• UPDATE

Procedure

To specify a table when you are not in the keyspace that contains the table, use the name of the keyspace followed by a period, then the table name. For example, cycling.race_winners.

cqlsh> INSERT INTO cycling.race_winners ( race_name, race_position, cyclist_name )
       VALUES ( 'National Championships South Africa WJ-ITT (CN)', 1,
                {firstname:'Frances',lastname:'DU TOUT'} );



Simple Primary Key

A simple primary key consists of only the partition key, which determines which node stores the data.

For a table with a simple primary key, Cassandra uses one column name as the partition key; the primary key consists of only the partition key in this case. If a large amount of data will be ingested, but few values will be inserted for the column that defines the primary key, the partitions can grow large. Large partitions might be slow to respond to read requests, or grow too large for a node's assigned disk space. On the other hand, data stored with a simple primary key will be fast to insert and retrieve if many values for the column can distribute the partitions across many nodes.

Often, your first venture into using Cassandra involves tables with simple primary keys. Keep in mind that, if secondary indexes are not used, only the primary key can be specified when retrieving data from the table. Many production tables use a unique identifier as a simple primary key: look up the identifier, and the rest of the data in the row is retrieved. If an application needs a simple lookup table, use a simple primary key. The table shown uses id as the primary key.

To create a table having a single primary key, two methods can be used:

• Insert the PRIMARY KEY keywords after the column name in the CREATE TABLE definition.
• Insert the PRIMARY KEY keywords after the last column definition in the CREATE TABLE definition, followed by the column name of the key. The column name is enclosed in parentheses.

Using a simple primary key

Use a simple primary key to create columns that you can query to return sorted results.

Use a simple primary key to create columns that you can query to return sorted results. This example creates a cyclist_name table storing an ID number and a cyclist's first and last names in columns. The table uses a UUID as a primary key. This table can be queried to discover the name of a cyclist given their ID number. A simple primary key table can be created in three different ways, as shown.

Procedure

• Create the table cyclist_name in the cycling keyspace, making id the primary key. Before creating the table, set the keyspace with a USE statement.

  cqlsh> USE cycling;
  CREATE TABLE cyclist_name ( id UUID PRIMARY KEY, lastname text, firstname text );

• This same example can be written with the primary key identified at the end of the table definition.

  cqlsh> USE cycling;
  CREATE TABLE cyclist_name ( id UUID, lastname text, firstname text, PRIMARY KEY (id) );

• The keyspace name can be used to identify the keyspace in the CREATE TABLE statement instead of the USE statement.

  cqlsh> CREATE TABLE cycling.cyclist_name ( id UUID, lastname text, firstname text, PRIMARY KEY (id) );

Cassandra 2.2 and later can use a date or time data type:

CREATE TABLE cycling.birthdays ( id UUID PRIMARY KEY, bday date, btime time );

Composite Partition Key

A composite partition key is a partition key defined with multiple table columns, which together determine which node stores the data.

For a table with a composite partition key, Cassandra uses multiple columns as the partition key. These columns form logical sets inside a partition to facilitate retrieval. In contrast to a simple partition key, a composite partition key uses two or more columns to identify where data will reside. Composite partition keys are used when the data stored is too large to reside in a single partition. Using more than one column for the partition key breaks the data into chunks, or buckets; the data is still grouped, but in smaller chunks. This method can be effective if a Cassandra cluster experiences hotspotting, or congestion in writing data to one node repeatedly, because a single partition is being written heavily. Cassandra is often used for time series data, and hotspotting can be a real issue. Breaking incoming data into buckets by year:month:day:hour, using four columns to route to a partition, can decrease hotspots; a sketch of this idea appears after the method below.

Data is retrieved using the partition key. Keep in mind that, if secondary indexes are not used, values for all columns defined in the partition key have to be supplied to retrieve data from the table. The table shown uses race_year and race_name in the primary key as a composite partition key. To retrieve data, both parameters must be identified.

To create a table having a composite partition key, use the following method:

• Insert the PRIMARY KEY keywords after the last column definition in the CREATE TABLE definition, followed by the column names of the partition key. The column names for the composite partition key are enclosed in double parentheses.

Cassandra stores an entire row of data on a node by partition key. If you have too much data in a partition and want to spread the data over multiple nodes, use a composite partition key.
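As a hedged sketch of the year:month:day:hour bucketing pattern mentioned above, a hypothetical telemetry table (all names and types are illustrative):

-- The four bucket columns form a composite partition key, so each hour of
-- incoming readings lands on its own partition; rows cluster by time within it.
CREATE TABLE cycling.race_telemetry (
  year int,
  month int,
  day int,
  hour int,
  reading_time timestamp,
  speed double,
  PRIMARY KEY ((year, month, day, hour), reading_time)
);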

Using a composite partition key

Use a composite partition key to identify where data will be stored.

Use a composite partition key in your primary key to create columns that you can query to return sorted results. This example creates a rank_by_year_and_name table storing the ranking and name of cyclists who competed in races. The table uses race_year and race_name as the columns defining the composite partition key of the primary key. The query discovers the ranking of cyclists who competed in races by supplying year and race name values. A composite partition key table can be created in two different ways, as shown.



Procedure

• Create the table rank_by_year_and_name in the cycling keyspace. Use race_year and race_name for the composite partition key. The table definition shown has an additional column, rank, used in the primary key. Before creating the table, set the keyspace with a USE statement. This example identifies the primary key at the end of the table definition. Note the double parentheses around the first two columns defined in the PRIMARY KEY.

  cqlsh> USE cycling;
  CREATE TABLE rank_by_year_and_name ( race_year int, race_name text, cyclist_name text, rank int, PRIMARY KEY ((race_year, race_name), rank) );

• The keyspace name can be used to identify the keyspace in the CREATE TABLE statement instead of the USE statement.

  cqlsh> CREATE TABLE cycling.rank_by_year_and_name ( race_year int, race_name text, cyclist_name text, rank int, PRIMARY KEY ((race_year, race_name), rank) );

Compound Primary Key

A compound primary key consists of a partition key that determines which node stores the data and of clustering column(s) which determine the order of the data on the partition.

For a table with a compound primary key, Cassandra uses a partition key that is either simple or composite. In addition, clustering column(s) are defined. Clustering is a storage engine process that sorts data within each partition based on the definition of the clustering columns. Normally, columns are sorted in ascending alphabetical order, but generally a different grouping of data will benefit reads and writes more than this simplistic choice.

Remember that data is distributed throughout the Cassandra cluster. An application can experience high latency while retrieving data from a large partition if the entire partition must be read to gather a small amount of data. On a physical node, when rows for a partition key are stored in order based on the clustering columns, retrieval of rows is very efficient. Grouping data in tables using clustering columns is the equivalent of JOINs in a relational database, but is much more performant because only one table is accessed. This table uses category as the partition key and points as the clustering column. Notice that for each category, the points are ordered in descending order.

Retrieving data from a partition is more versatile with clustering columns. For the example shown, a query could retrieve all point values greater than 200 for the One-day-races, as the sketch after the method below shows. To create a table having a compound primary key, use the following method:

• Insert the PRIMARY KEY keywords after the last column definition in the CREATE TABLE definition, followed by the column names of the key. The column names are enclosed in parentheses.
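As a sketch of that versatility, assuming the cyclist_category table created in the following procedure, a range query on the clustering column might look like this:

-- Partition key equality plus a range restriction on the clustering column:
-- returns all One-day-races rows with more than 200 points, already sorted.
SELECT * FROM cycling.cyclist_category
WHERE category = 'One-day-races' AND points > 200;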

Using a compound primary key

Use a compound primary key to create columns that you can query to return sorted results.

Use a compound primary key to create columns that you can query to return sorted results. If our pro cycling example was designed in a relational database, you would create a cyclists table with a foreign key to the races. In Cassandra, you denormalize the data because joins are not performant in a distributed system. Later, other schemas are shown that improve Cassandra performance; collections and indexes are two such data modeling methods. This example creates a cyclist_category table storing a cyclist's last name, ID, and points for each type of race category. The table uses category for the partition key and points for a single clustering column. This table can be queried to retrieve a list of cyclists and their points in a category, sorted by points. A compound primary key table can be created in two different ways, as shown.

Procedure

• To create a table having a compound primary key, use two or more columns as the primary key. This example uses an additional clause, WITH CLUSTERING ORDER BY, to order the points in descending order. Ascending order is more efficient to store, but descending queries are faster due to the nature of the storage engine.

  cqlsh> USE cycling;
  CREATE TABLE cyclist_category ( category text, points int, id UUID, lastname text, PRIMARY KEY (category, points) ) WITH CLUSTERING ORDER BY (points DESC);

• The keyspace name can be used to identify the keyspace in the CREATE TABLE statement instead of the USE statement.

  cqlsh> CREATE TABLE cycling.cyclist_category ( category text, points int, id UUID, lastname text, PRIMARY KEY (category, points) ) WITH CLUSTERING ORDER BY (points DESC);

Note: The combination of category and points uniquely identifies a row in the cyclist_category table. More than one row with the same category can exist as long as the rows contain different points values. Consider the example again: is the data modeling for storing this data optimal? Under what conditions might errors occur?

Creating a counter table

A counter is a special column for storing a number that is changed in increments.

Counters are useful for many data models. For example, a company may wish to keep track of the number of web page views received on a company website. Scorekeeping applications could use a counter to keep track of the number of games played online or the number of players who have joined a game. The table shown uses id as the primary key and keeps track of the popularity of a cyclist based on thumbs up/down clicks in the popularity field of a counter table.



Tracking count in a distributed database presents an interesting challenge. In Cassandra, the counter value will be located in the Memtable, commit log, and one or more SSTables at any given moment. Replication between nodes can cause consistency issues in certain edge cases. Cassandra counters were redesigned in Cassandra 2.1 to alleviate some of the difficulties. Read "What’s New in Cassandra 2.1: Better Implementation of Counters" to discover the improvements made in the counters. A counter is a special column used to store a number that is changed in increments. A counter can be decremented or incremented in integer values. Because counters are implemented differently from other columns, counter columns can only be defined in dedicated tables. The counter data type is used to set the data type. Do not assign this type to a column that serves as the primary key or partition key. Also, do not use the counter type in a table that contains anything other than counter types and the primary key. Explicitly, the only non-counter columns in a counter table must be part of the primary key. Many counter-related settings can be set in the cassandra.yaml file. A counter column cannot be indexed or deleted.. To load data into a counter column, or to increase or decrease the value of the counter, use the UPDATE command. Cassandra rejects USING TIMESTAMP or USING TTL when updating a counter column. To create a table having one or more counter columns, use this: •

Use CREATE TABLE to define the counter and non-counter columns. Use all non-counter columns as part of the PRIMARY KEY definition.
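As a sketch of these rules, a counter table keyed by more than one column simply lists every non-counter column in the PRIMARY KEY definition. The table and column names here are hypothetical, not part of the examples that follow.

cqlsh> CREATE TABLE cycling.popular_by_country (
         country text,
         cyclist_name text,
         popularity counter,
         PRIMARY KEY (country, cyclist_name)
       );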

Using a counter A counter is a special column for storing a number that is changed in increments.

To load data into a counter column, or to increase or decrease the value of the counter, use the UPDATE command. Cassandra rejects USING TIMESTAMP or USING TTL in the command to update a counter column.

Procedure •

Create a table for the counter column.

cqlsh> USE cycling;
cqlsh> CREATE TABLE popular_count (
         id UUID PRIMARY KEY,
         popularity counter
       );

Loading data into a counter column differs from loading data into other tables: the data is updated rather than inserted.

cqlsh> UPDATE cycling.popular_count SET popularity = popularity + 1 WHERE id = 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;

Take a look at the counter value and note that popularity has a value of 1. SELECT * FROM cycling.popular_count;

Additional increments or decrements will change the value of the counter column.
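For example, a thumbs-down click might decrement the same counter; this sketch reuses the id from the previous step.

cqlsh> UPDATE cycling.popular_count SET popularity = popularity - 1 WHERE id = 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;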


Create table with COMPACT STORAGE Create a table using the legacy (Thrift) storage engine format.

When you create a table using compound primary keys, for every piece of data stored, the column name is stored along with it. Instead of each non-primary-key column being stored so that each column corresponds to one column on disk, an entire row is stored in a single column on disk. If you need to conserve disk space, use the WITH COMPACT STORAGE directive, which stores data in the legacy (Thrift) storage engine format.

CREATE TABLE sblocks (
  block_id uuid,
  subblock_id uuid,
  data blob,
  PRIMARY KEY (block_id, subblock_id)
) WITH COMPACT STORAGE;

Using the compact storage directive prevents you from defining more than one column that is not part of a compound primary key. A compact table using a primary key that is not compound can have multiple columns that are not part of the primary key. A compact table that uses a compound primary key must define at least one clustering column. Columns cannot be added or removed after creation of a compact table. Unless you specify WITH COMPACT STORAGE, CQL creates a table with non-compact storage. Collections and static columns cannot be used with COMPACT STORAGE tables.

Table schema collision fix How to fix schema collision problems.

Dynamic schema creation or updates can cause schema collision resulting in errors.

Procedure
1. Run a rolling restart on all nodes to ensure the schema matches. Run nodetool describecluster on all nodes and check that there is only one schema version.
2. On each node, check the data directory, looking for two directories for the table in question. If there is only one directory, go on to the next node. If there are two or more directories, the old table directory from before the update and a new table directory from after the update, continue.
3. Identify which cf_id (column family ID) is the newest table ID in system.schema_columnfamilies. The column family ID is the fifth column in the results.

   $ cqlsh -e "SELECT * FROM system.schema_columnfamilies" | grep <table name>

4. Move the data from the older table directory to the newer table's directory and remove the old directory. Repeat this step as necessary.
5. Run nodetool refresh.
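A sketch of the commands involved in steps 1, 2, and 5; the data path and the keyspace and table names are illustrative.

$ nodetool describecluster                 # step 1: all nodes must report a single schema version
$ ls /var/lib/cassandra/data/cycling       # step 2: look for duplicate directories for the table
$ nodetool refresh cycling cyclist_name    # step 5: load the moved SSTables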

Creating a materialized view How to create CQL materialized views.

In Cassandra 3.0 and later, a materialized view is a table built from another table's data with a new primary key specified. In Cassandra, queries are optimized by primary key definition, and there is often one table per query. If a new query is needed, a new table is created. Previously, these additional tables had to be created and updated manually in the client application. Materialized views handle this automatically: when data is updated or deleted in the original table, the corresponding values in the view are updated or deleted as well.


Whereas secondary indexes are suited for low cardinality data, materialized views are suited for high cardinality data. Secondary indexes on high cardinality data require all nodes in a cluster to be queried, causing high read latency. With materialized views, the data is arranged serially based on the new primary key in a new table. Materialized views will cause hotspots if low cardinality data is inserted. To create a materialized view, certain requirements must be met:

• The columns of the original table's primary key must be part of the materialized view's primary key.
• Only one new column may be added to the materialized view's primary key.

The following table is the original, or base, table for the materialized views that will be built. CREATE TABLE cyclist_mv (cid UUID PRIMARY KEY, name text, age int, birthday date, country text); This table holds values for the name, age, birthday, and country affiliation of several cyclists.

A materialized view can be created from cyclist_mv that uses age as the partition key.

CREATE MATERIALIZED VIEW cyclist_by_age AS
  SELECT age, birthday, name, country
  FROM cyclist_mv
  WHERE age IS NOT NULL AND cid IS NOT NULL
  PRIMARY KEY (age, cid);

The CREATE MATERIALIZED VIEW statement has several features. The AS SELECT phrase identifies the columns to copy from the base table to the materialized view. The FROM phrase identifies the base table from which the data is copied. The WHERE clause must include all primary key columns with the IS NOT NULL phrase so that only rows with data for all of the primary key's columns are copied. Finally, as with any table, the materialized view identifies its primary key columns. Since cyclist_mv uses cid as its primary key, cid must be present in the materialized view's primary key. In the new materialized view, data can be selected based on the cyclist's age. Because the data is now partitioned by age, the data is easily and quickly retrieved.

SELECT age, name, birthday FROM cyclist_by_age WHERE age = 18;


Similarly, materialized views can be created that key the information to cyclists' birthdays or country of origin.

CREATE MATERIALIZED VIEW cyclist_by_country AS
  SELECT age, birthday, name, country
  FROM cyclist_mv
  WHERE country IS NOT NULL AND cid IS NOT NULL
  PRIMARY KEY (country, cid);

CREATE MATERIALIZED VIEW cyclist_by_birthday AS
  SELECT age, birthday, name, country
  FROM cyclist_mv
  WHERE birthday IS NOT NULL AND cid IS NOT NULL
  PRIMARY KEY (birthday, cid);

Now, queries based on the country of origin or the birthday can use the new materialized views.

SELECT age, name, birthday FROM cyclist_by_country WHERE country = 'Netherlands';
SELECT age, name, birthday FROM cyclist_by_birthday WHERE birthday = '1987-09-04';

If another INSERT is executed on the cyclist_mv table, both the base table and all three of the materialized views created in the examples above are updated. When data is deleted from the base table, the data is also deleted from any affected materialized views. Materialized views allow fast lookup of data using the normal Cassandra read path. However, materialized views do not have the same write performance as normal tables, because an additional read-before-write must be done to update each view, and a data consistency check on each replica must complete before the update is done. A write that updates materialized views therefore incurs extra latency. Deletes can also hurt performance: several tombstones are created, one for the base table and one for each materialized view. Materialized views are built asynchronously after data is inserted into the base table, which can delay the appearance of updated data. Data cannot be written directly to materialized views, only to base tables. Read repair is done to base tables, resulting in materialized view repair. Additional information on how materialized views work can be found in "New in Cassandra 3.0: Materialized Views" and the Cassandra Summit 2015 talk on materialized views.
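For example, a new row inserted into the base table becomes visible through each view once the asynchronous update completes. The sample values in this sketch are illustrative; uuid() generates a new id on the server.

cqlsh> INSERT INTO cyclist_mv (cid, name, age, birthday, country) VALUES (uuid(), 'Anna VAN DER BREGGEN', 25, '1990-04-18', 'Netherlands');
cqlsh> SELECT name, country FROM cyclist_by_age WHERE age = 25;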

Creating advanced data types in tables How to create collections and user defined types (UDTs) in tables.

Data can be inserted into tables using more advanced types of data.


Creating collections Collection types provide a way to group and store data together in a table column.

Cassandra provides collection types as a way to group and store data together in a column. For example, in a relational database a grouping such as a user's multiple email addresses is implemented as a many-to-one joined relationship between a user table and an email table. Cassandra avoids the join by storing the user's email addresses in a collection column in the user table. Each collection specifies the data type of the data it holds. A collection is appropriate if the data for collection storage is limited. If the data has unbounded growth potential, like messages sent or sensor events registered every second, do not use collections. Instead, use a table with a compound primary key where the data is stored in the clustering columns. CQL contains these collection types:

• set
• list
• map

Observe the following limitations of collections:

• Never insert more than 2 billion items in a collection, as only that number can be queried.
• The maximum number of keys for a map collection is 65,535.
• The maximum size of an item in a list or a map collection is 2GB.
• The maximum size of an item in a set collection is 65,535 bytes.
• Keep collections small to prevent delays during querying. Collections cannot be "sliced"; Cassandra reads a collection in its entirety, impacting performance. Thus, collections should be much smaller than the maximum limits listed. The collection is not paged internally.

Lists can incur a read-before-write operation for some insertions. Sets are preferred over lists whenever possible.

Note: The limits specified for collections are for non-frozen collections. You can expire each element of a collection by setting an individual time-to-live (TTL) property. Also see Using frozen in a collection.

Creating the set type Using the set data type, store multiple unordered items.

A set consists of a group of elements with unique values. Duplicate values are not allowed. The values of a set are stored unordered, but will return the elements in sorted order when queried. Use the set data type to store data that has a many-to-one relationship with another column. For example, in the example below, a set called teams stores all the teams that a cyclist has been a member of during their career.

Procedure •

Define teams in a table cyclist_career_teams. Each team listed in the set will have a text data type.

cqlsh> CREATE TABLE cycling.cyclist_career_teams (
         id UUID PRIMARY KEY,
         lastname text,
         teams set<text>
       );


Creating the list type Use a list when the order of elements matters or when you need to store the same value multiple times.

A list has a form much like a set, in that a list groups and stores values. Unlike a set, the values stored in a list do not need to be unique and can be duplicated. In addition, a list stores the elements in a particular order and may be inserted or retrieved according to an index value. Use the list data type to store data that has a possible many-to-many relationship with another column. For example, in the example below, a list called events stores all the race events on an upcoming calendar. Each month/year pairing might have several events occurring, and the races are stored in a list. The list can be ordered so that the races appear in the order that they will take place, rather than alphabetical order.

Procedure •

Define events in a table upcoming_calendar. Each event listed in the list will have a text data type.

cqlsh> CREATE TABLE cycling.upcoming_calendar (
         year int,
         month int,
         events list<text>,
         PRIMARY KEY (year, month)
       );

Creating the map type Use a map when pairs of related elements must be stored as a key-value pair.

A map relates one item to another with a key-value pair. For each key, only one value may exist, and duplicates cannot be stored. Both the key and the value are designated with a data type. Using the map type, you can store timestamp-related information in user profiles. Each element of the map is internally stored as one Cassandra column that you can modify, replace, delete, and query. Each element can have an individual time-to-live and expire when the TTL ends.

Procedure •

Define teams in a table cyclist_teams. Each team listed in the map will have an integer data type for the year a cyclist belonged to the team and a text data type for the team name. The map collection is specified with a map column name and the pair of data types enclosed in angle brackets.

cqlsh> CREATE TABLE cycling.cyclist_teams (
         id UUID PRIMARY KEY,
         lastname text,
         firstname text,
         teams map<int, text>
       );


Creating a table with a tuple How to create a table using the tuple data type.

A tuple is a data type that allows two or more values to be stored together in a column. A user-defined type can be used instead, but for simple groupings, a tuple is a good choice.

Procedure •

Create a table cycling.route using a tuple to store each waypoint's location name, latitude, and longitude.

cqlsh> CREATE TABLE cycling.route (
         race_id int,
         race_name text,
         point_id int,
         lat_long tuple<text, tuple<float, float>>,
         PRIMARY KEY (race_id, point_id)
       );

Create a table cycling.nation_rank using a tuple to store the rank, cyclist name, and points total for a cyclist, keyed by the country name as the primary key.

CREATE TABLE cycling.nation_rank (
  nation text PRIMARY KEY,
  info tuple<int, text, int>
);

The table cycling.nation_rank is keyed to the country as the primary key. The same data can also be stored keyed to the rank. Create a table cycling.popular using a tuple to store the country name, cyclist name, and points total for a cyclist, with the rank as the primary key.

CREATE TABLE cycling.popular (
  rank int PRIMARY KEY,
  cinfo tuple<text, text, int>
);

Creating a user-defined type (UDT) An example of creating a user-defined type (UDT) to attach multiple data fields to a column.

In Cassandra 2.1 and later, user-defined types (UDTs) can attach multiple data fields to a column. This example has two steps:

• Create the user-defined type (UDT) basic_info.
• Create a table that defines a column of the basic_info type.

Procedure •

Use the cycling keyspace. cqlsh> USE cycling;

Create a user-defined type named basic_info. cqlsh> CREATE TYPE cycling.basic_info ( birthday timestamp, nationality text, weight text, height text );

Create a table for storing cyclist data in columns of type basic_info. Use the frozen keyword in the definition of the user-defined type column. When using the frozen keyword, you cannot update parts of a user-defined type value; the entire value must be overwritten. Cassandra treats the value of a frozen, user-defined type like a blob.

cqlsh> CREATE TABLE cycling.cyclist_stats (
         id uuid PRIMARY KEY,
         lastname text,
         basics FROZEN<basic_info>
       );


A user-defined type can be nested in another column type. This example nests a UDT in a list.

CREATE TYPE cycling.race (race_title text, race_date timestamp, race_time text);

CREATE TABLE cycling.cyclist_races (
  id UUID PRIMARY KEY,
  lastname text,
  firstname text,
  races list<FROZEN<race>>
);

Creating functions How to create functions.

In Cassandra 2.2 and later, users can create user-defined functions (UDFs) and user-defined aggregate functions (UDAs). Functions are used to manipulate stored data in queries. Standard aggregate functions are also available for queries. For more information on user-defined aggregates, see Cassandra Aggregates - min, max, avg, group by and A few more Cassandra aggregates.

Create user-defined function (UDF) User-Defined Functions (UDFs) can be used to manipulate stored data with a function of the user's choice.

Cassandra 2.2 and later allows users to define functions that can be applied to data stored in a table as part of a query result. The function must be created prior to its use in a SELECT statement, and it is performed on each row of the result. To use user-defined functions, enable_user_defined_functions must be set to true in the cassandra.yaml file. User-defined functions are defined within a keyspace; if no keyspace is specified, the current keyspace is used. User-defined functions are executed in a sandbox in Cassandra 3.0 and later. By default, Cassandra 2.2 and later supports defining functions in Java and JavaScript. Other scripting languages, such as Python, Ruby, and Scala, can be added by adding a JAR to the classpath.
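The relevant cassandra.yaml entry looks like the following; it defaults to false.

enable_user_defined_functions: true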

Procedure •

Create a function, specifying the data type of the returned value, the language, and the actual code of the function to be performed. The following function, fLog(), computes the logarithm of each input value by wrapping the built-in Java method Math.log(); logarithms are commonly used to generate linear plots of non-linear data. For this example, it is a simple math function that shows the capabilities of user-defined functions.

cqlsh> CREATE OR REPLACE FUNCTION fLog (input double)
         CALLED ON NULL INPUT
         RETURNS double
         LANGUAGE java
         AS 'return Double.valueOf(Math.log(input.doubleValue()));';

Note: CALLED ON NULL INPUT ensures the function will always be executed. RETURNS NULL ON NULL INPUT ensures the function will always return NULL if any of the input arguments is NULL. RETURNS defines the data type of the value returned by the function. A function can be replaced with a different function if OR REPLACE is used, as shown in the example above. Optionally, the IF NOT EXISTS keywords can be used to create the function only if another function with the same signature does not exist in the keyspace. OR REPLACE and IF NOT EXISTS cannot be used in the same command.

cqlsh> CREATE FUNCTION IF NOT EXISTS fLog (input double)
         CALLED ON NULL INPUT
         RETURNS double
         LANGUAGE java
         AS 'return Double.valueOf(Math.log(input.doubleValue()));';


Create User-Defined Aggregate Function (UDA) User-defined aggregates (UDAs) can be used to manipulate stored data across rows of data, returning a result that is further manipulated by a final function.

Cassandra 2.2 and later allows users to define aggregate functions that can be applied to data stored in a table as part of a query result. The function must be created prior to its use in a SELECT statement and the query must only include the aggregate function itself, but no columns. The state function is called once for each row, and the value returned by the state function becomes the new state. After all rows are processed, the optional final function is executed with the last state value as its argument. Aggregation is performed by the coordinator. The example shown computes the team average for race time for all the cyclists stored in the table. The race time is computed in seconds.

Procedure •

Create a state function, as a user-defined function (UDF), if needed. This function adds all the race times together and counts the number of entries.

cqlsh> CREATE OR REPLACE FUNCTION avgState ( state tuple<int, bigint>, val int )
         CALLED ON NULL INPUT
         RETURNS tuple<int, bigint>
         LANGUAGE java
         AS 'if (val != null) { state.setInt(0, state.getInt(0)+1); state.setLong(1, state.getLong(1)+val.intValue()); } return state;';

Create a final function, as a user-defined function (UDF), if needed. This function computes the average of the values passed to it from the state function.

cqlsh> CREATE OR REPLACE FUNCTION avgFinal ( state tuple<int, bigint> )
         CALLED ON NULL INPUT
         RETURNS double
         LANGUAGE java
         AS 'double r = 0; if (state.getInt(0) == 0) return null; r = state.getLong(1); r /= state.getInt(0); return Double.valueOf(r);';

Create the aggregate function using these two functions, and add an STYPE to define the data type of the state. Different argument types distinguish one aggregate from another with the same name. An aggregate can be replaced with a different aggregate if OR REPLACE is used, as shown in the examples above. Optionally, the IF NOT EXISTS keywords can be used to create the aggregate only if another aggregate with the same signature does not exist in the keyspace. OR REPLACE and IF NOT EXISTS cannot be used in the same command.

cqlsh> CREATE AGGREGATE IF NOT EXISTS average ( int )
         SFUNC avgState
         STYPE tuple<int, bigint>
         FINALFUNC avgFinal
         INITCOND (0, 0);

Inserting and updating data How to insert data into a table with either regular or JSON data.

Data can be inserted into tables using the INSERT command. With Cassandra 2.2 and later, JSON data can be inserted.

Inserting simple data into a table Inserting set data with the INSERT command.

In a production database, inserting columns and column values programmatically is more practical than using cqlsh, but often, testing queries using this SQL-like shell is very convenient. Insertion, update, and deletion operations on rows sharing the same partition key for a table are performed atomically and in isolation.


Procedure •

To insert simple data into the table cycling.cyclist_name, use the INSERT command. This example inserts a single record into the table. cqlsh> INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'VOS','Marianne');

You can insert complex string constants using double dollar signs to enclose a string with quotes, backslashes, or other characters that would normally need to be escaped. cqlsh> INSERT INTO cycling.calendar (race_id, race_start_date, race_end_date, race_name) VALUES (201, '2015-02-18', '2015-02-22', $$Women's Tour of New Zealand$$);

Inserting and updating data into a set How to insert or update data into a set.

If a table specifies a set to hold data, then either INSERT or UPDATE is used to enter data.

Procedure •

Insert data into the set, enclosing the values in curly brackets. Set values must be unique, and no order is defined in a set internally.

cqlsh> INSERT INTO cycling.cyclist_career_teams (id, lastname, teams)
       VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'VOS', {
         'Rabobank-Liv Woman Cycling Team',
         'Rabobank-Liv Giant',
         'Rabobank Women Team',
         'Nederland bloeit'
       });

Add an element to a set using the UPDATE command and the addition (+) operator. cqlsh> UPDATE cycling.cyclist_career_teams SET teams = teams + {'Team DSB - Ballast Nedam'} WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

Remove an element from a set using the subtraction (-) operator. cqlsh> UPDATE cycling.cyclist_career_teams SET teams = teams - {'WOMBATS - Womens Mountain Bike & Tea Society'} WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

Remove all elements from a set by using the UPDATE or DELETE statement. A set, list, or map needs to have at least one element; otherwise, Cassandra cannot distinguish the collection from a null value.

cqlsh> UPDATE cycling.cyclist_career_teams SET teams = {} WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;
cqlsh> DELETE teams FROM cycling.cyclist_career_teams WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

A query for the teams now returns null.

cqlsh> SELECT id, teams FROM cycling.cyclist_career_teams WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;


Inserting and updating data into a list How to insert or update data into a list.

If a table specifies a list to hold data, then either INSERT or UPDATE is used to enter data.

Procedure •

Insert data into the list, enclosing values in square brackets. INSERT INTO cycling.upcoming_calendar (year, month, events) VALUES (2015, 06, ['Criterium du Dauphine','Tour de Suisse']);

Use the UPDATE command to insert values into the list. Prepend an element to the list by enclosing it in square brackets and using the addition (+) operator. cqlsh> UPDATE cycling.upcoming_calendar SET events = ['The Parx Casino Philly Cycling Classic'] + events WHERE year = 2015 AND month = 06;

Append an element to the list by switching the order of the new element data and the list name in the UPDATE command. cqlsh> UPDATE cycling.upcoming_calendar SET events = events + ['Tour de France Stage 10'] WHERE year = 2015 AND month = 06;

These update operations are implemented internally without any read-before-write. Appending and prepending a new element to the list writes only the new element. Add an element at a particular position using the list index position in square brackets. cqlsh> UPDATE cycling.upcoming_calendar SET events[2] = 'Vuelta Ciclista a Venezuela' WHERE year = 2015 AND month = 06;

To add an element at a particular position, Cassandra reads the entire list, and then rewrites the part of the list that needs to be shifted to the new index positions. Consequently, adding an element at a particular position results in greater latency than appending or prepending an element to a list.

To remove an element from a list, use the DELETE command and the list index position in square brackets. For example, remove the event just placed in the list in the last step.

cqlsh> DELETE events[2] FROM cycling.upcoming_calendar WHERE year = 2015 AND month = 06;

Removing elements by indexed position requires an internal read. In addition, the client-side application can only discover the indexed position by reading the whole list and finding the values to remove, adding further latency to the operation. If another thread or client prepends elements to the list before the operation is done, incorrect data will be removed.

Remove all elements having a particular value using the UPDATE command, the subtraction operator (-), and the list value in square brackets.

cqlsh> UPDATE cycling.upcoming_calendar SET events = events - ['Tour de France Stage 10'] WHERE year = 2015 AND month = 06;

Using the UPDATE command as shown here is recommended over removal by index, because it is safer and faster.

Inserting and updating data into a map How to insert or update data into a map.

If a table specifies a map to hold data, then either INSERT or UPDATE is used to enter data.

Procedure •

Set or replace map data, using the INSERT or UPDATE command, and enclosing the integer and text values in a map collection with curly brackets, separated by a colon. cqlsh> INSERT INTO cycling.cyclist_teams (id, lastname, firstname, teams) VALUES ( 5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'VOS', 'Marianne', {2015 : 'Rabobank-Liv Woman Cycling Team', 2014 : 'Rabobank-Liv Woman Cycling Team', 2013 : 'Rabobank-Liv Giant', 2012 : 'Rabobank Women Team', 2011 : 'Nederland bloeit' });

Note: Using INSERT in this manner will replace the entire map. Use the UPDATE command to insert values into the map. Append an element to the map by enclosing the key-value pair in curly brackets and using the addition (+) operator. cqlsh> UPDATE cycling.cyclist_teams SET teams = teams + {2009 : 'DSB Bank - Nederland bloeit'} WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

Set a specific element using the UPDATE command, enclosing the specific key of the element, an integer, in square brackets, and using the equals operator to map the value assigned to the key. cqlsh> UPDATE cycling.cyclist_teams SET teams[2006] = 'Team DSB - Ballast Nedam' WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

Delete an element from the map using the DELETE command and enclosing the specific key of the element in square brackets:

cqlsh> DELETE teams[2009] FROM cycling.cyclist_teams WHERE id = e7cd5752-bc0d-4157-a80f-7523add8dbcd;

Alternatively, remove elements by their keys using the UPDATE command, the subtraction operator (-), and the map keys in curly brackets. The key set matches the map's key type, int in this case.

cqlsh> UPDATE cycling.cyclist_teams SET teams = teams - {2013, 2014} WHERE id = e7cd5752-bc0d-4157-a80f-7523add8dbcd;


Inserting tuple data into a table Tuples are used to group small amounts of data together that are then stored in a column.

Procedure •

Insert data into the table cycling.route which has tuple data. The tuple is enclosed in parentheses. This tuple has a tuple nested inside; nested parentheses are required for the inner tuple, then the outer tuple. cqlsh> INSERT INTO cycling.route (race_id, race_name, point_id, lat_long) VALUES (500, '47th Tour du Pays de Vaud', 2, ('Champagne', (46.833, 6.65)));

Insert data into the table cycling.nation_rank which has tuple data. The tuple is enclosed in parentheses. The tuple called info stores the rank, name, and point total of each cyclist. cqlsh> INSERT INTO cycling.nation_rank (nation, info) VALUES ('Spain', (1,'Alejandro VALVERDE' , 9054));

Insert data into the table popular which has tuple data. The tuple called cinfo stores the country name, cyclist name, and points total. cqlsh> INSERT INTO cycling.popular (rank, cinfo) VALUES (4, ('Italy', 'Fabio ARU', 163));

Inserting data into a user-defined type (UDT) How to insert or update data into a user-defined type (UDT).

If a table specifies a user-defined type (UDT) to hold data, then either INSERT or UPDATE is used to enter data.

Procedure
1. Set or replace user-defined type data, using the INSERT or UPDATE command, and enclosing the user-defined type with curly brackets, separating each key-value pair in the user-defined type by a colon.

   cqlsh> INSERT INTO cycling.cyclist_stats (id, lastname, basics) VALUES (e7ae5cf3-d358-4d99-b900-85902fda9bb0, 'FRAME', { birthday : '1993-06-18', nationality : 'New Zealand', weight : null, height : null });

   Note: Note the inclusion of null values for UDT elements that have no value. A value, whether null or otherwise, must be included for each element of the UDT.

2. Data can be inserted into a UDT that is nested in another column type. For example, a list of races, where the race name, date, and time are defined in a UDT, has elements enclosed in curly brackets that are in turn enclosed in square brackets.

   cqlsh> INSERT INTO cycling.cyclist_races (id, lastname, firstname, races) VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 'VOS', 'Marianne', [ { race_title : 'Rabobank 7-Dorpenomloop Aalburg', race_date : '2015-05-09', race_time : '02:58:33' }, { race_title : 'Ronde van Gelderland', race_date : '2015-04-19', race_time : '03:22:23' } ]);

   Note: The UDT nested in the list is frozen, so the entire list will be read when querying the table.

Inserting JSON data into a table Inserting JSON data with the INSERT command for testing queries.

In a production database, inserting columns and column values programmatically is more practical than using cqlsh, but testing queries in this SQL-like shell is often very convenient. With Cassandra 2.2 and later, JSON data can be inserted. All values are given as strings unless they are numbers, but are stored using the column's data type. For example, the id below is inserted as a string, but is stored as a UUID. For more information, see What's New in Cassandra 2.2: JSON Support.

Procedure •

To insert JSON data, add JSON to the INSERT command. Note the absence of the keyword VALUES and of the list of columns that is present in other INSERT commands.

cqlsh> INSERT INTO cycling.cyclist_category JSON '{ "category" : "GC", "points" : 780, "id" : "829aa84a-4bba-411f-a4fb-38167a987cda", "lastname" : "SUTHERLAND" }';

A null value is entered if a defined column, like lastname, is not supplied in the JSON map.

cqlsh> INSERT INTO cycling.cyclist_category JSON '{ "category" : "Sprint", "points" : 700, "id" : "829aa84a-4bba-411f-a4fb-38167a987cda" }';

Using lightweight transactions INSERT and UPDATE statements that use the IF clause support lightweight transactions, also known as Compare and Set (CAS).

INSERT and UPDATE statements using the IF clause support lightweight transactions, also known as Compare and Set (CAS). A common use for lightweight transactions is an insertion operation that must be unique, such as a cyclist's identification. Lightweight transactions should not be used casually, as the latency of operations increases fourfold due to their read-before-write nature. Cassandra 2.1.1 and later support non-equal conditions for lightweight transactions: the <, <=, >, >=, != and IN operators can be used in WHERE clauses to query with lightweight transactions.


It is important to note that when using IF NOT EXISTS on an INSERT, the timestamp is designated by the lightweight transaction, and USING TIMESTAMP is prohibited.

Procedure •

Insert a new cyclist with their id. cqlsh> INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (4647f6d3-7bd2-4085-8d6c-1229351b5498, 'KNETEMANN', 'Roxxane') IF NOT EXISTS;

Perform a CAS operation against a row that does exist by adding the predicate for the operation at the end of the query. For example, reset Roxane Knetemann's firstname because of a spelling error.

cqlsh> UPDATE cycling.cyclist_name SET firstname = 'Roxane' WHERE id = 4647f6d3-7bd2-4085-8d6c-1229351b5498 IF firstname = 'Roxxane';
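Running an INSERT ... IF NOT EXISTS for an id that already exists does not overwrite the row; cqlsh reports the outcome in an [applied] column, returning the existing row alongside [applied] = False. A sketch, repeating the earlier insert:

cqlsh> INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (4647f6d3-7bd2-4085-8d6c-1229351b5498, 'KNETEMANN', 'Roxane') IF NOT EXISTS;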

Expiring data with Time-To-Live Data in a column, other than a counter column, can have an optional expiration period called TTL (time to live).

Data in a column, other than a counter column, can have an optional expiration period called TTL (time to live). The client request specifies a TTL value, defined in seconds, for the data. TTL data is marked with a tombstone after the requested amount of time has expired. A tombstone exists for gc_grace_seconds. After data is marked with a tombstone, the data is automatically removed during the normal compaction and repair processes. Use CQL to set the TTL for data. If you want to change the TTL of expiring data, the data must be re-inserted with a new TTL. In Cassandra, the insertion of data is actually an insertion or update operation, depending on whether or not a previous version of the data exists. TTL data has a precision of one second, as calculated on the server. Therefore, a very small TTL is not very useful. Moreover, the clocks on the servers must be synchronized; otherwise reduced precision will be observed because the expiration time is computed on the primary host that receives the initial insertion but is then interpreted by other hosts on the cluster. Expiring data has an additional overhead of 8 bytes in memory and on disk (to record the TTL and expiration time) compared to standard data.

Expiring data with TTL example Using the INSERT and UPDATE commands for setting the expire time for data in a column.

Both the INSERT and UPDATE commands support setting a time for data in a column to expire. The expiration time (TTL) is set using CQL.

Procedure •

Use the INSERT command to set calendar listing in the calendar table to expire in 86400 seconds, or one day. cqlsh> INSERT INTO cycling.calendar (race_id, race_name, race_start_date, race_end_date) VALUES (200, 'placeholder', '2015-05-27', '2015-05-27') USING TTL 86400;

Extend the expiration period to three days by using the UPDATE command, and change the race name.

cqlsh> UPDATE cycling.calendar USING TTL 259200
       SET race_name = 'Tour de France - Stage 12'
       WHERE race_id = 200 AND race_start_date = '2015-05-27' AND race_end_date = '2015-05-27';
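To check how much time remains before a column expires, apply the TTL() function to a non-primary-key column; the number of seconds remaining is returned.

cqlsh> SELECT TTL(race_name) FROM cycling.calendar WHERE race_id = 200;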

Inserting data using COPY and a CSV file Inserting data with the cqlsh command COPY from a CSV file is common for testing queries.

In a production database, inserting columns and column values programmatically is more practical than using cqlsh, but testing queries in this SQL-like shell is often very convenient. A comma-delimited (CSV) file is useful when several records need inserting. While COPY is not strictly an INSERT command, it is a common method for inserting data.

Procedure
1. Locate your CSV file and check the options to use.

   category|points|id|lastname
   GC|1269|2003|TIRALONGO
   One-day-races|367|2003|TIRALONGO
   GC|1324|2004|KRUIJSWIJK

2. To insert the data, use the COPY command with the CSV data.

   cqlsh> COPY cycling.cyclist_category FROM 'cyclist_category.csv' WITH DELIMITER='|' AND HEADER=TRUE;

Batching data insertion and updates How to batch insert or update data into a table.

Batching is used to insert or update data in tables. Understanding the use of batching, if used, is crucial to performance.

Using and misusing batches When to use and not use batches.

Batches are often mistakenly used in an attempt to optimize performance. Improved performance is not a reason to use batches: batches place a burden on the coordinator for both logged and unlogged batches. Batches are best used when a small number of tables must synchronize inserted or updated data. The number of partitions involved in a batch operation, and thus the potential for multi-node access, can increase the time the operation takes to complete. Logged batch statements cause additional latency: the coordinator sends a batch log to two other nodes, so that if the coordinator fails, the batch will be retried by those nodes. For unlogged batches, the coordinator manages all insert and update operations, causing that single node to do more work; if the partition keys for the operations are stored on more than one node, extra network hops occur. Note: Unlogged batches are deprecated in Cassandra 2.2 and later. Batched statements can save network round-trips between the client and the server, and possibly between the coordinator and the replicas. However, consider carefully before implementing batch operations, and decide if they are truly necessary. For information about the fastest way to load data, see "Cassandra: Batch loading without the Batch keyword."


Use of BATCH statement How to use a BATCH statement.

Batch operations can be either beneficial or detrimental. Look at the examples below to see a good use of BATCH.

Procedure •

An example of a good batch that is logged. Note in the table definition for cyclist_expenses, the balance column is STATIC. cqlsh> CREATE TABLE cycling.cyclist_expenses ( cyclist_name text, balance float STATIC, expense_id int, amount float, description text, paid boolean, PRIMARY KEY (cyclist_name, expense_id) );

The first INSERT in the BATCH statement sets the balance to zero. The next two statements insert an expense and change the balance value. All the INSERT and UPDATE statements in this batch write to the same partition, keeping the latency of the write operation low. cqlsh> BEGIN BATCH INSERT INTO cycling.cyclist_expenses (cyclist_name, balance) VALUES ('Vera ADRIAN', 0) IF NOT EXISTS; INSERT INTO cycling.cyclist_expenses (cyclist_name, expense_id, amount, description, paid) VALUES ('Vera ADRIAN', 1, 7.95, 'Breakfast', false); APPLY BATCH; As explained in the BATCH statement reference, in Cassandra 2.0.6+ you can batch conditional updates. This example shows batching conditional updates combined with using static columns. Note: It would be reasonable to expect that an UPDATE to the balance could be included in this BATCH statement: cqlsh> UPDATE cycling.cyclist_expenses SET balance = -7.95 WHERE cyclist_name = 'Vera ADRIAN' IF balance = 0; However, it is important to understand that all the statements processed in a BATCH statement timestamp the records with the same value. The operations may not perform in the order listed in the BATCH statement. The UPDATE might be processed BEFORE the first INSERT that sets the balance value to zero, allowing the conditional to be met. An acknowledgement of a batch statement is returned if the batch operation is successful.


The resulting table will only have one record so far.

The balance can be adjusted separately with an UPDATE statement. Now the balance will reflect that breakfast was unpaid. cqlsh> UPDATE cycling.cyclist_expenses SET balance = -7.95 WHERE cyclist_name = 'Vera ADRIAN' IF balance = 0;

The table cyclist_expenses stores records about each purchase by a cyclist and includes the running balance of all the cyclist's purchases. Because the balance is static, all purchase records for a cyclist have the same running balance. This BATCH statement inserts expenses for two more meals and changes the balance to reflect that breakfast and dinner were unpaid.

cqlsh> BEGIN BATCH
  INSERT INTO cycling.cyclist_expenses (cyclist_name, expense_id, amount, description, paid) VALUES ('Vera ADRIAN', 2, 13.44, 'Lunch', true);
  INSERT INTO cycling.cyclist_expenses (cyclist_name, expense_id, amount, description, paid) VALUES ('Vera ADRIAN', 3, 25.00, 'Dinner', false);
  UPDATE cycling.cyclist_expenses SET balance = -32.95 WHERE cyclist_name = 'Vera ADRIAN' IF balance = -7.95;
APPLY BATCH;


Finally, the cyclist pays off all outstanding bills and the balance of the account goes to zero.

cqlsh> BEGIN BATCH
  UPDATE cycling.cyclist_expenses SET balance = 0 WHERE cyclist_name = 'Vera ADRIAN' IF balance = -32.95;
  UPDATE cycling.cyclist_expenses SET paid = true WHERE cyclist_name = 'Vera ADRIAN' AND expense_id = 1 IF paid = false;
  UPDATE cycling.cyclist_expenses SET paid = true WHERE cyclist_name = 'Vera ADRIAN' AND expense_id = 3 IF paid = false;
APPLY BATCH;

Because the column is static, you can provide only the partition key when updating the data. To update a non-static column, you would also have to provide a clustering key. Using batched conditional updates, you can maintain a running balance. If the balance were stored in a separate table, maintaining a running balance would not be possible because a batch having conditional updates cannot span multiple partitions.

Misuse of BATCH statement How to misuse a BATCH statement.

Misused, BATCH statements can cause many problems in a distributed database like Cassandra. Batch operations that involve multiple nodes are a definite anti-pattern. Keep in mind which partition data will be written to when grouping INSERT and UPDATE statements in a BATCH statement. Writing to several partitions might require interaction with several nodes in the cluster, causing a great deal of latency for the write operation.

Procedure •

This example shows an anti-pattern, since the BATCH statement will write to several different partitions given the partition key id.

cqlsh> BEGIN BATCH
  INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (6d5f1663-89c0-45fc-8cfd-60a373b01622, 'HOSKINS', 'Melissa');
  INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (38ab64b6-26cc-4de9-ab28-c257cf011659, 'FERNANDES', 'Marcia');
  INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (9011d3be-d35c-4a8d-83f7-a3c543789ee7, 'NIEWIADOMA', 'Katarzyna');
  INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (95addc4c-459e-4ed7-b4b5-472f19a67995, 'ADRIAN', 'Vera');
APPLY BATCH;

Using unlogged batches How to use an unlogged BATCH statement.

Unlogged BATCH statements require management by the coordinator. The best case for using unlogged BATCH statements is when all inserts take place on a single node.


Procedure •

An unlogged batch that writes to a single partition resolves to only one write internally, regardless of the number of statements, and is an acceptable use of batch. In this example, the primary key includes both date and time, with date as the partition key, so both inserts target the same partition.

cqlsh> BEGIN UNLOGGED BATCH
  INSERT INTO sensor_readings (date, time, reading) VALUES (20140910, '2014-09-10T11:00:00.00+0000', 6335.2);
  INSERT INTO sensor_readings (date, time, reading) VALUES (20140910, '2014-09-10T11:00:15.00+0000', 5222.2);
APPLY BATCH;
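A table definition consistent with this example might look like the following sketch, with date as the partition key and time as a clustering column, so that both inserts above land in the same partition.

cqlsh> CREATE TABLE sensor_readings (
         date int,
         time timestamp,
         reading double,
         PRIMARY KEY (date, time)
       );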

Querying tables How to query data from tables.

Data can be queried from tables using the SELECT command. With Cassandra 3.0, many new options are available, such as retrieving JSON data, using standard aggregate functions, and manipulating retrieved data with user-defined functions (UDFs) and user-defined aggregate functions (UDAs).

Retrieval and sorting results Using the SELECT command for simple queries.

Querying tables to select data is the reason data is stored in databases. Similar to SQL, CQL can SELECT data using simple or complex qualifiers. At its simplest, a query selects all data in a table. At its most complex, a query delineates which data to retrieve and display, and can even calculate new values based on user-defined functions.

Procedure •

The example below illustrates how to create a query that uses category as a filter. cqlsh> SELECT * FROM cycling.cyclist_category WHERE category = 'SPRINT';

Note that Cassandra will reject this query if category is not a partition key or clustering column, because the query would otherwise require a sequential scan across the entire cyclist_category table. In a distributed database like Cassandra, this is a crucial concept to grasp: scanning all data across all nodes is prohibitively slow and is therefore blocked from execution. The use of partition key and clustering columns in a WHERE clause must result in the selection of a contiguous set of rows. Queries can also filter using secondary indexes, discussed in the Indexing tables section; a query based on lastname can give satisfactory results if the lastname column is indexed.


You can also pick the columns to display instead of choosing all data. cqlsh> SELECT category, points, lastname FROM cycling.cyclist_category;

For a large table, limit the number of rows retrieved using LIMIT. The default limit is 10,000 rows. To sample data, pick a smaller number. To retrieve more than 10,000 rows, set LIMIT to a larger value.

cqlsh> SELECT * FROM cycling.cyclist_name LIMIT 3;

You can fine-tune the display order using the ORDER BY clause. The partition key must be defined in the WHERE clause, and the ORDER BY clause defines the clustering column to use for ordering.

cqlsh> CREATE TABLE cycling.cyclist_cat_pts ( category text, points int, id UUID, lastname text, PRIMARY KEY (category, points) );
cqlsh> SELECT * FROM cycling.cyclist_cat_pts WHERE category = 'GC' ORDER BY points ASC;

Tuples are retrieved in their entirety. This example uses AS to change the header of the tuple name. cqlsh> SELECT race_name, point_id, lat_long AS CITY_LATITUDE_LONGITUDE FROM cycling.route;


Retrieval using collections How to retrieve data from a collection.

Collections do not differ from other columns in retrieval. To query for a subset of the collection, a secondary index for the collection must be created.

Procedure •

Retrieve teams for a particular cyclist id from the set.

cqlsh> SELECT lastname, teams FROM cycling.cyclist_career_teams WHERE id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

To query a table containing a collection, Cassandra retrieves the collection in its entirety. Keep collections small enough to be manageable, because the whole collection is held in memory. Alternatively, construct a data model to replace a collection if it must accommodate large amounts of data. Cassandra returns results in an order based on the type of the elements in the collection. For example, a set of text elements is returned in alphabetical order. If you want elements of the collection returned in insertion order, use a list.

Retrieve events stored in a list from the upcoming calendar for a particular year and month. cqlsh> SELECT * FROM cycling.upcoming_calendar WHERE year=2015 AND month=06;

Note: The order is not alphabetical, but rather in the order of insertion. Retrieve teams for a particular cyclist id from the map. cqlsh> SELECT lastname, firstname, teams FROM cycling.cyclist_teams WHERE id=5b6962dd-3f90-4c93-8f61-eabfa4a803e2; The order of the map output depends on the key type of the map. In this case, the key is an integer type.


Retrieval using JSON Using the SELECT command to return JSON data.

The SELECT command can be used to retrieve data from a table in JSON format. For more information, see What's New in Cassandra 2.2: JSON Support.

Procedure Specify that the result should use the JSON format with the keyword JSON. cqlsh> SELECT JSON month, year, events FROM cycling.upcoming_calendar;

Retrieval using the IN keyword Using the SELECT command with the IN keyword.

The IN keyword can define a set of clustering columns to fetch together, supporting a "multi-get" of CQL rows. A single clustering column can be defined if all preceding columns are defined for either equality or group inclusion. Alternatively, several clustering columns may be defined to collect several rows, as long as all preceding columns are queried for equality or group inclusion. The defined clustering columns can also be queried for inequality. Note that using both IN and ORDER BY requires turning off paging with the PAGING OFF command in cqlsh.

Procedure •

Turn off paging. cqlsh> PAGING OFF

Retrieve and sort results in descending order. cqlsh> SELECT * FROM cycling.cyclist_cat_pts WHERE category IN ('Timetrial', 'Sprint') ORDER BY id DESC;

Alternatively, retrieve and sort results in ascending order. To retrieve results, use the SELECT command. cqlsh> SELECT * FROM cycling.cyclist_cat_pts WHERE category IN ('Timetrial', 'Sprint') ORDER BY id ASC;


Retrieve rows using multiple clustering columns. This example searches the partition key race_ids for several races, but the partition key can also be composed as an equality for one value. cqlsh> SELECT * FROM cycling.calendar WHERE race_id IN (100, 101, 102) AND (race_start_date, race_end_date) IN (('2015-05-09','2015-05-31'), ('2015-05-06', '2015-05-31'));

Retrieve several rows using multiple clustering columns and inequality. cqlsh> SELECT * FROM cycling.calendar WHERE race_id IN (100, 101, 102) AND (race_start_date, race_end_date) >= ('2015-05-09','2015-05-24');

Retrieval by scanning a partition Scanning partition when the table has more than one clustering column and comparing groups of clustering keys to certain values.

Queries can scan a partition to retrieve a segment of stored data. The segment must be sequentially stored, so clustering columns can be used to define the slice of data selected.

Procedure •

Create a table race_times to hold the race times of various cyclists for various races.

CREATE TABLE cycling.race_times (race_name text, cyclist_name text, race_time text, PRIMARY KEY (race_name, race_time));

Scan the race times in the table to find a particular segment of data using conditional operators.

SELECT * FROM cycling.race_times WHERE race_name = '17th Santos Tour Down Under' AND race_time >= '19:15:19' AND race_time <= '19:15:39';

Retrieval using standard aggregate functions Using the SELECT command to return data and apply standard aggregate functions.

In Cassandra 2.2 and later, the standard aggregate functions min, max, avg, sum, and count are built in. Create a table cyclist_points to hold each cyclist's race points.

CREATE TABLE cycling.cyclist_points (id UUID, firstname text, lastname text, race_title text, race_points int, PRIMARY KEY (id, race_points));

Calculate the standard aggregation function sum to find the sum of race points for a particular cyclist. The value of the aggregate will be returned. cqlsh> SELECT sum(race_points) FROM cycling.cyclist_points WHERE id=e3b19ec4-774a-4d1c-9e5a-decec1e30aac;

Another standard aggregate function is count. A table country_flag records the country of each cyclist. CREATE TABLE cycling.country_flag (country text, cyclist_name text, flag int STATIC, PRIMARY KEY (country, cyclist_name));


Calculate the standard aggregation function count to find the number of cyclists from Belgium. The value of the aggregate will be returned. cqlsh> SELECT count(cyclist_name) FROM cycling.country_flag WHERE country='Belgium';

Retrieval using a user-defined function (UDF) Using the SELECT command to return data and applying a UDF.

The SELECT command can be used to retrieve data from a table while applying a user-defined function (UDF) to it.

Procedure Use the user-defined function (UDF) fLog() created previously to retrieve data from a table cycling.cyclist_points. cqlsh> SELECT id, lastname, fLog(race_points) FROM cycling.cyclist_points;

Retrieval using user-defined aggregate (UDA) functions Using the SELECT command to return data and apply user-defined aggregate functions.

Referring back to the user-defined aggregate average(), retrieve the average of the column cyclist_time_sec from a table.


Procedure 1. List all the data in the table. cqlsh> SELECT * FROM cycling.team_average;

2. Apply the user-defined aggregate function average() to the cyclist_time_sec column. cqlsh> SELECT average(cyclist_time_sec) FROM cycling.team_average WHERE team_name='UnitedHealthCare Pro Cycling Womens Team' AND race_title='Amgen Tour of California Women''s Race presented by SRAM Stage 1 - Lake Tahoe > Lake Tahoe';

Querying a system table Details about Cassandra database objects and cluster configuration in the system keyspace tables.

The system keyspace includes a number of tables that contain details about your Cassandra database objects and cluster configuration. Cassandra populates these tables and others in the system keyspace.

Table: Columns in System Tables

local
  Columns: "key", bootstrapped, broadcast_address, cluster_name, cql_version, data_center, gossip_generation, host_id, native_protocol_version, ..., truncated_at map
  Comment: Information a node has about itself and a superset of gossip.

peers
  Columns: peer, data_center, host_id, preferred_ip, rack, release_version, rpc_address, schema_version, tokens
  Comment: Each node records what other nodes tell it about themselves over gossip.

schema_aggregates
  Columns: keyspace_name, aggregate_name, signature, argument_types, final_func, initcond, return_type, state_func, state_type
  Comment: Information about user-defined aggregates.

schema_columnfamilies
  Columns: see comment.
  Comment: Inspect schema_columnfamilies to get detailed information about specific tables.

schema_columns
  Columns: keyspace_name, columnfamily_name, column_name, component_index, index_name, index_options, index_type, validator
  Comment: Information on columns and column indexes; used internally for compound primary keys.

schema_functions
  Columns: keyspace_name, function_name, signature, argument_names, argument_types, body, called_on_null_input, language, return_type
  Comment: Information on user-defined functions.

schema_keyspaces
  Columns: keyspace_name, durable_writes, strategy_class, strategy_options
  Comment: Information on keyspace strategy class and replication factor.

schema_usertypes
  Columns: keyspace_name, type_name, field_names, field_types
  Comment: Information about user-defined types.

Keyspace, table, and column information Querying system.schema_* tables directly to get keyspace, table, and column information.

An alternative to the cqlsh DESCRIBE commands or to using DevCenter to discover keyspace, table, and column information is querying the system.schema_* tables directly.

Procedure •

Query the defined keyspaces using the SELECT statement. cqlsh> SELECT * FROM system.schema_keyspaces;

Query schema_columnfamilies about a particular table. cqlsh> SELECT * FROM system.schema_columnfamilies WHERE keyspace_name = 'cycling' AND columnfamily_name = 'cyclist_name';


Query schema_columns about the columns in a table. cqlsh> SELECT * FROM system.schema_columns WHERE keyspace_name = 'cycling' AND columnfamily_name = 'cyclist_name';


Cluster information Querying system tables to get cluster topology information.

You can query system tables to get cluster topology information. Display the IP address of peer nodes, data center and rack names, token values, and other information. "The Data Dictionary" article describes querying system tables in detail.

Procedure
After setting up a cluster, query the peers and local tables.

SELECT * FROM system.peers;

Output from querying the peers table looks something like this:

 peer      | data_center | host_id     | preferred_ip | rack  | release_version | rpc_address | schema_version | tokens
-----------+-------------+-------------+--------------+-------+-----------------+-------------+----------------+-----------
 127.0.0.3 | datacenter1 | edda8d72... | null         | rack1 | 2.1.0           | 127.0.0.3   | 59adb24e-f3... | {3074...}
 127.0.0.2 | datacenter1 | ef863afa... | null         | rack1 | 2.1.0           | 127.0.0.2   | 3d19cd8f-c9... | {-3074...}

(2 rows)

Functions, aggregates, and user types Querying system.schema_* tables directly to get information about user-defined functions, aggregates, and user types.

Currently, the system tables are the only method of displaying information about user-defined functions, aggregates, and user types.

Procedure •

Show all user-defined functions in the system.schema_functions table. cqlsh> SELECT * FROM system.schema_functions;


Show all user-defined aggregates in the system.schema_aggregates table. cqlsh> SELECT * FROM system.schema_aggregates;

Show all user-defined types in the system.schema_usertypes table. cqlsh> SELECT * FROM system.schema_usertypes;

Indexing tables How to query data from tables using indexes.

Data can be queried from tables using indexes, once created.

Indexing An index provides a means to access data in Cassandra using attributes other than the partition key for fast, efficient lookup of data matching a given condition.

An index provides a means to access data in Cassandra using attributes other than the partition key. The benefit is fast, efficient lookup of data matching a given condition. The index stores indexed column values in a separate, hidden table apart from the table that contains the values being indexed. Cassandra has a number of techniques for guarding against the undesirable scenario where data might be incorrectly retrieved during a query involving indexes, on the basis of stale values in the index. As mentioned earlier, in Cassandra 2.1 and later you can index collection columns.

When to use an index When and when not to use an index.

Cassandra's built-in indexes are best on a table having many rows that contain the indexed value. The more unique values that exist in a particular column, the more overhead you will have, on average, to query and maintain the index. For example, suppose you had a races table with a billion entries for cyclists in hundreds of races and wanted to look up a cyclist's rank by race year. Many cyclists' ranks share the same value in the race_year column, so the race_year column is a good candidate for an index.

When not to use an index

Do not use an index in these situations:

• On high-cardinality columns for a query of a huge volume of records for a small number of results. See Problems using a high-cardinality column index below.
• In tables that use a counter column.
• On a frequently updated or deleted column. See Problems using an index on a frequently updated or deleted column below.
• To look for a row in a large partition unless narrowly queried. See Problems using an index to look for a row in a large partition unless narrowly queried below.

Problems using a high-cardinality column index


If you create an index on a high-cardinality column, which has many distinct values, a query between the fields will incur many seeks for very few results. In a table with a billion songs, looking up songs by writer (a value that is typically unique for each song) instead of by their artist is likely to be very inefficient. It would probably be more efficient to manually maintain the table as a form of an index instead of using the Cassandra built-in index. For columns containing unique data, it is sometimes fine performance-wise to use an index for convenience, as long as the query volume to the table having an indexed column is moderate and not under constant load.

Conversely, creating an index on an extremely low-cardinality column, such as a boolean column, does not make sense. Each distinct value becomes a single row in the index, resulting in a huge row for all the false values, for example. Indexing a column that holds only the values true and false is not useful.

Problems using an index on a frequently updated or deleted column

Cassandra stores tombstones in the index until the tombstone limit reaches 100K cells. After exceeding the tombstone limit, the query that uses the indexed value will fail.

Problems using an index to look for a row in a large partition unless narrowly queried

A query on an indexed column in a large cluster typically requires collating responses from multiple data partitions. The query response slows down as more machines are added to the cluster. You can avoid a performance hit when looking for a row in a large partition by narrowing the search.
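As a sketch of the low-cardinality anti-pattern described above (the table and column names here are hypothetical), this index would divide every row in the table between only two index entries:

cqlsh> CREATE INDEX active_idx ON cycling.cyclist_roster ( is_active ); -- not recommended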

Using a secondary index Using CQL to create a secondary index on a column after defining a table.

Using CQL, you can create an index on a column after defining a table. In Cassandra 2.1 and later, you can index a collection column. Secondary indexes are used to query a table using a column that is not normally queryable, that is, a column other than the partition key or the partition key plus the first clustering column. Secondary indexes are tricky to use and can impact performance greatly. The index table is stored on each node in a cluster, so a query involving a secondary index can rapidly become a performance nightmare if multiple nodes are accessed. A general rule of thumb is to index a column with low cardinality (a small number of distinct values). Before creating an index, be aware of when and when not to create an index.

Procedure

The table rank_by_year_and_name can yield the rank of cyclists for races.

cqlsh> CREATE TABLE cycling.rank_by_year_and_name (
         race_year int,
         race_name text,
         cyclist_name text,
         rank int,
         PRIMARY KEY ((race_year, race_name), rank)
       );

Both race_year and race_name must be specified, as these columns comprise the partition key.

cqlsh> SELECT * FROM cycling.rank_by_year_and_name WHERE race_year = 2015 AND race_name = 'Tour of Japan - Stage 4 - Minami Shinshu';


A logical query to try is a listing of the rankings for a particular year. Because the table has a composite partition key, this query will fail if only the first column is used in the conditional operator. cqlsh> SELECT * FROM cycling.rank_by_year_and_name WHERE race_year=2015;

An index is created for the race year, and the query will succeed. An index name is optional and must be unique within a keyspace. If you do not provide a name, Cassandra will assign a name like race_year_idx. cqlsh> CREATE INDEX ryear ON cycling.rank_by_year_and_name (race_year); SELECT * FROM cycling.rank_by_year_and_name WHERE race_year=2015;

A clustering column can also be used to create an index.
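For example, a minimal sketch indexing the rank clustering column of the table created above (the index name rank_idx is an arbitrary choice):

cqlsh> CREATE INDEX rank_idx ON cycling.rank_by_year_and_name ( rank );
cqlsh> SELECT * FROM cycling.rank_by_year_and_name WHERE rank = 1;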

Using multiple indexes How to use multiple secondary indexes.

Indexes can be created on multiple columns and used in queries. The general rule about cardinality applies to all columns indexed. In a real-world situation, certain columns might not be good choices, depending on their cardinality.

Procedure

The table cycling.cyclist_alt_stats can yield statistics about cyclists.

cqlsh> CREATE TABLE cycling.cyclist_alt_stats ( id UUID PRIMARY KEY, lastname text, birthday timestamp, nationality text, weight text, height text );

Create indexes on the columns birthday and nationality. cqlsh> CREATE INDEX birthday_idx ON cycling.cyclist_alt_stats ( birthday ); CREATE INDEX nationality_idx ON cycling.cyclist_alt_stats ( nationality );

Query for all the cyclists with a particular birthday from a certain country. cqlsh> SELECT * FROM cycling.cyclist_alt_stats WHERE birthday = '1982-01-29' AND nationality = 'Russia';


The indexes have been created on appropriate low-cardinality columns, but the query still fails. Why? The answer lies with the partition key, which has not been defined. When you attempt a potentially expensive query, such as searching a range of rows, Cassandra requires the ALLOW FILTERING directive. The error is not due to the multiple indexes, but to the lack of a partition key definition in the query.

cqlsh> SELECT * FROM cycling.cyclist_alt_stats WHERE birthday = '1990-05-27' AND nationality = 'Portugal' ALLOW FILTERING;

Indexing a collection How to index collections and query the database to find a collection containing a particular value.

In Cassandra 2.1 and later, you can index collections and query the database to find a collection containing a particular value. Sets and lists are indexed slightly differently from maps, given the key-value nature of maps. All the cautions about using secondary indexes apply to indexing collections.

Procedure

Create an index on a set to find all the cyclists that have been on a particular team. CREATE INDEX team_idx ON cycling.cyclist_career_teams ( teams ); SELECT * FROM cycling.cyclist_career_teams WHERE teams CONTAINS 'Nederland bloeit';

Create an index on a map to find all cyclist/team combinations for a particular year.

CREATE INDEX team_year_idx ON cycling.cyclist_teams ( KEYS (teams) );
SELECT * FROM cycling.cyclist_teams WHERE teams CONTAINS KEY 2015;


Indexes on the keys and values of a map cannot co-exist. For example, if you created an index on teams for the second table, cyclist_teams, you would need to drop it to create an index on the map keys using the KEYS keyword and map name in nested parentheses as shown. Create an index on the entries of a map and find cyclists who are the same age. An index using ENTRIES is only valid for maps.

CREATE TABLE cycling.birthday_list (cyclist_name text PRIMARY KEY, blist map<text, text>);
CREATE INDEX blist_idx ON cycling.birthday_list (ENTRIES(blist));
SELECT * FROM cycling.birthday_list WHERE blist['age'] = '23';

Using the same index, find cyclists from the same country.

SELECT * FROM cycling.birthday_list WHERE blist['nation'] = 'NETHERLANDS';

Create an index on the full content of a FROZEN list. The table in this example stores the number of Pro race wins, Grand Tour starts, and Classic starts that a cyclist has. The SELECT statement finds any cyclist who has 39 Pro race wins, 7 Grand Tour starts, and 14 Classic starts.

CREATE TABLE cycling.race_starts (cyclist_name text PRIMARY KEY, rnumbers FROZEN<LIST<int>>);
CREATE INDEX rnumbers_idx ON cycling.race_starts (FULL(rnumbers));
SELECT * FROM cycling.race_starts WHERE rnumbers = [39, 7, 14];

Building and maintaining indexes Indexes provide operational ease for populating and maintaining the index.

An advantage of indexes is the operational ease of populating and maintaining the index. Indexes are built in the background automatically, without blocking reads or writes. Client-maintained tables used as indexes must be created and populated manually; for example, if the artists column had been indexed by creating a table such as songs_by_artist, your client application would have to populate the table with data from the songs table. To perform a hot rebuild of an index, use the nodetool rebuild_index command.
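For example, a minimal sketch of the rebuild command, reusing the birthday_idx example from earlier; note that the exact form of the index-name argument varies across Cassandra versions:

$ nodetool rebuild_index cycling cyclist_alt_stats birthday_idx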


Altering a table How to alter a table to add or delete columns or change table properties.

Tables can be changed with the ALTER command.

Altering columns in a table Adding or changing columns to a table with the ALTER TABLE command.

The ALTER TABLE command can be used to add new columns to a table and to alter the column type of an existing column.

Procedure

Add an age column of type int to the table cycling.cyclist_alt_stats.

cqlsh> ALTER TABLE cycling.cyclist_alt_stats ADD age int;

This creates the column metadata, adds the column to the table schema, and sets the value to null for all rows.

Add a column favorite_color of type varchar, and then change the data type of the same column to text.

cqlsh> ALTER TABLE cycling.cyclist_alt_stats ADD favorite_color varchar;
cqlsh> ALTER TABLE cycling.cyclist_alt_stats ALTER favorite_color TYPE text;

Note: There are limitations on altering the data type of a column. The original type and the new type must be compatible.

Altering a table to add a collection Adding or changing collection columns to a table with the ALTER TABLE command.

The ALTER TABLE command can be used to add new collection columns to a table and to alter the column type of an existing column.

Procedure

Alter the table cycling.upcoming_calendar to add a collection map, description, that can store a description for each race listed.

cqlsh> ALTER TABLE cycling.upcoming_calendar ADD description map<text, text>;

After updating the cycling.upcoming_calendar table to insert some data, the description can be displayed.

cqlsh> UPDATE cycling.upcoming_calendar


SET description = description + {'Criterium du Dauphine' : 'Easy race', 'Tour du Suisse' : 'Hard uphill race'} WHERE year = 2015 AND month = 6;

Altering the data type of a column Changing the data type of a column after it is defined or added to a table using ALTER TABLE.

Using ALTER TABLE, you can change the data type of a column after it is defined or added to a table.

Procedure

Change the coupon_code column to store coupon codes as integers instead of text by changing the data type of the column.

cqlsh:demodb> ALTER TABLE users ALTER coupon_code TYPE int;

Only newly inserted values, not existing coupon codes, are validated against the new type.

Altering a materialized view Altering the properties of a materialized view with the ALTER MATERIALIZED VIEW command.

In Cassandra 3.0 and later, a materialized view has table properties (see Table properties) like base tables. The ALTER MATERIALIZED VIEW command can be used to alter the view's properties, specifying the properties using a WITH clause. Materialized views do not perform repair, so properties regarding repair are invalid.

Procedure

Alter a materialized view to change the caching properties. cqlsh> ALTER MATERIALIZED VIEW cycling.cyclist_by_birthday WITH caching = {'keys' : 'NONE', 'rows_per_partition' : '15' };

Altering a user-defined type Adding columns to a user-defined type with the ALTER TYPE command.

The ALTER TYPE command can be used to add new columns to a user-defined type and to alter the data type of an existing column in a user-defined type.

Procedure

Add a middlename column of type text to the user-defined type cycling.fullname. cqlsh> ALTER TYPE cycling.fullname ADD middlename text;


This creates the column metadata and adds the column to the type schema. To verify, display the table system.schema_usertypes.

A column can be renamed in either ALTER TABLE or ALTER TYPE. cqlsh> ALTER TYPE cycling.fullname RENAME middlename TO middleinitial;

Removing a keyspace, schema, or data Using the DROP and DELETE commands.

To remove data, you can set column values for automatic removal using the TTL (time-to-live) table attribute. You can also drop a table or keyspace, and delete keyspace column metadata.

Dropping a keyspace, table or materialized view Steps for dropping keyspace, table or materialized view using the DROP command.

You drop a table or keyspace using the DROP command.

Procedure

Drop the test keyspace. cqlsh> DROP KEYSPACE test;

Drop the cycling.last_3_days table. cqlsh> DROP TABLE cycling.last_3_days;

Drop the cycling.cyclist_by_age materialized view in Cassandra 3.0 and later. cqlsh> DROP MATERIALIZED VIEW cycling.cyclist_by_age;

Deleting columns and rows How to use the DELETE command to delete a column or row.

CQL provides the DELETE command to delete a column or row. Deleted values are removed completely by the first compaction following deletion.


Procedure

1. Delete the value of the column lastname from the table cyclist_name.

cqlsh> DELETE lastname FROM cycling.cyclist_name WHERE id = c7fceba0-c141-4207-9494-a29f9809de6f;

2. Delete the entire row for a particular race from the table calendar.

cqlsh> DELETE FROM cycling.calendar WHERE race_id = 200;

Dropping a user-defined function (UDF) How to use the DROP command to delete a user-defined function (UDF).

You drop a user-defined function (UDF) using the DROP command.

Procedure

Drop the fLog() function. The conditional option IF EXISTS can be included.

cqlsh> DROP FUNCTION [IF EXISTS] fLog;

Securing a table How to secure a table.

Data in CQL tables is secured using either user-based or role-based security commands. cassandra.yaml settings must be changed in order to use authentication and authorization. Change the following settings:

authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
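After restarting the nodes with these settings, clients must authenticate. For example, a minimal sketch of logging in with the default superuser account (the default credentials shown here should be changed in production):

$ cqlsh -u cassandra -p cassandra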

Database users How to create and work with users.

User-based access control enables authorization management on a per-user basis.

Procedure

Create a user with a password. IF NOT EXISTS is included to ensure a previous user definition is not overwritten.

cqlsh> CREATE USER IF NOT EXISTS sandy WITH PASSWORD 'Ride2Win@' NOSUPERUSER;

Create a user with SUPERUSER privileges. SUPERUSER grants the ability to create users and roles unconditionally. cqlsh> CREATE USER chuck WITH PASSWORD 'Always1st$' SUPERUSER; Note: WITH PASSWORD implicitly specifies LOGIN.


Alter a user to change options. A user with SUPERUSER status can alter the SUPERUSER status of another user, but not that of the currently logged-in user. To modify the properties of a user, the user issuing the command must have permission to do so.

cqlsh> ALTER USER sandy SUPERUSER;

List the users. cqlsh> LIST USERS;

Drop a user other than the current user. The user issuing the command must be a SUPERUSER.

DROP USER IF EXISTS chuck;

Database roles How to create and work with roles.

Roles-based access control is available in Cassandra 2.2 and later. Roles enable authorization management on a larger scale than security per user can provide. A role is created and granted to users or other roles. Hierarchical sets of permissions can be created. For more information, see Role Based Access Control in Cassandra.

Procedure

Create a role with a password. IF NOT EXISTS is included to ensure a previous role definition is not overwritten. cqlsh> CREATE ROLE IF NOT EXISTS team_manager WITH PASSWORD = 'RockIt4Us!';


Create a role with LOGIN and SUPERUSER privileges. LOGIN allows a client to identify as this role when connecting. SUPERUSER grants the ability to create roles unconditionally. cqlsh> CREATE ROLE sys_admin WITH PASSWORD = 'IcanDoIt4ll' AND LOGIN = true AND SUPERUSER = true;

Alter a role to change options. A role with SUPERUSER status can alter the SUPERUSER status of another role, but not that of a role it currently holds. PASSWORD, LOGIN, and SUPERUSER can be modified with ALTER ROLE. To modify the properties of a role, the user must have permission.

cqlsh> ALTER ROLE sys_admin WITH PASSWORD = 'All4one1forAll' AND SUPERUSER = false;

Grant a role to a user or a role. cqlsh> GRANT sys_admin TO team_manager; GRANT team_manager TO sandy;

List roles of a user. cqlsh> LIST ROLES; LIST ROLES OF sandy; Note: NORECURSIVE is an option to discover all roles directly granted to a user. Without NORECURSIVE, transitively acquired roles are also listed.
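For example, a minimal sketch listing only the roles granted directly to sandy:

cqlsh> LIST ROLES OF sandy NORECURSIVE;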

Revoke role that was previously granted to a user or a role. Any permission that derives from the role is revoked. cqlsh> REVOKE sys_admin FROM team_manager; REVOKE team_manager FROM sandy;

Drop a role other than a currently held role. The user issuing the command must be a SUPERUSER.

DROP ROLE IF EXISTS sys_admin;

Database Permissions How to set user and role permissions.

Permissions can be granted at any level of the database hierarchy and flow downwards. Keyspaces and tables are hierarchical as follows: ALL KEYSPACES > KEYSPACE > TABLE. Functions are hierarchical in the following manner: ALL FUNCTIONS > KEYSPACE > FUNCTION. ROLES can also be hierarchical and encompass other ROLES. Permissions can be granted on:

• CREATE - keyspace, table, function, role, index
• ALTER - keyspace, table, function, role
• DROP - keyspace, table, function, role, index
• SELECT - keyspace, table
• MODIFY - INSERT, UPDATE, DELETE, TRUNCATE - keyspace, table
• AUTHORIZE - GRANT PERMISSION, REVOKE PERMISSION - keyspace, table, function, role
• DESCRIBE - LIST ROLES
• EXECUTE - SELECT, INSERT, UPDATE - functions

The permissions are extensive with many variations. A few examples are described below.

Procedure

The first line grants anyone with the team_manager role the ability to INSERT, UPDATE, DELETE, and TRUNCATE any table in the keyspace cycling. The second line grants anyone with the sys_admin role the ability to view all roles in the database.

GRANT MODIFY ON KEYSPACE cycling TO team_manager;
GRANT DESCRIBE ON ALL ROLES TO sys_admin;

The first line revokes SELECT in all keyspaces for anyone with the team_manager role. The second line prevents the team_manager role from executing the named function fLog().

REVOKE SELECT ON ALL KEYSPACES FROM team_manager;
REVOKE EXECUTE ON FUNCTION cycling.fLog(double) FROM team_manager;

All permissions can be listed, for either all keyspaces or a single keyspace.

LIST ALL PERMISSIONS OF sandy;
LIST ALL PERMISSIONS ON cycling.cyclist_name OF chuck;

Tracing consistency changes This tutorial shows the difference between these consistency levels and the number of replicas that participate to satisfy a request.

In a distributed system such as Cassandra, the most recent value of data is not necessarily on every node all the time. The client application configures the consistency level per request to manage response time versus data accuracy. By tracing activity on a five-node cluster, this tutorial shows the difference between these consistency levels and the number of replicas that participate to satisfy a request:

• ONE: Returns data from the nearest replica.
• QUORUM: Returns the most recent data from the majority of replicas.
• ALL: Returns the most recent data from all replicas.

Follow the instructions to set up five nodes on your local computer, trace reads at different consistency levels, and then compare the results.


Setup to trace consistency changes Steps for tracing consistency changes.

Set up five nodes on your local computer, trace reads at different consistency levels, and then compare the results.

Procedure

1. Get the ccm library of scripts from github. You will use this library in subsequent steps to perform the following actions:

   • Download Apache Cassandra source code.
   • Create and launch an Apache Cassandra cluster on a single computer.

   Refer to the ccm README for prerequisites.

2. Set up loopback aliases. For example, enter the following commands on the command line to set up the alias on the Mac. On some platforms, you can probably skip this step.

   $ sudo ifconfig lo0 alias 127.0.0.2 up
   $ sudo ifconfig lo0 alias 127.0.0.3 up
   $ sudo ifconfig lo0 alias 127.0.0.4 up
   $ sudo ifconfig lo0 alias 127.0.0.5 up

3. Download Apache Cassandra source code, version 2.1.0 for example, into the ~/.ccm/repository, and start the ccm cluster named trace_consistency.

   $ ccm create trace_consistency -v 2.1.0

   Downloading http://archive.apache.org/dist/cassandra/2.1.0/apache-cassandra-2.1.0-src.tar.gz to /var/folders/9k/ywsprd8n14s7hzb5qnztgb5h0000gq/T/ccm-d7fGAN.tar.gz (15.750MB)
   16514874 [100.00%]
   Extracting /var/folders/9k/ywsprd8n14s7hzb5qnztgb5h0000gq/T/ccm-d7fGAN.tar.gz as version 2.1.0 ...
   Compiling Cassandra 2.1.0 ...
   Current cluster is now: trace_consistency

4. Use the following commands to populate and check the cluster:

   $ ccm populate -n 5
   $ ccm start

5. Check that the cluster is up:

   $ ccm node1 ring

   The output shows the status of all five nodes.

6. Connect cqlsh to the first node in the ring.

   $ ccm node1 cqlsh

Related information
Cassandra 2.0 cassandra.yaml
Cassandra 2.1 cassandra.yaml
Cassandra 2.2 cassandra.yaml
Cassandra 3.0 cassandra.yaml


Trace reads at different consistency levels Running and tracing queries that read data at different consistency levels.

After performing the setup steps, run and trace queries that read data at different consistency levels. The tracing output shows that using three replicas on a five-node cluster, a consistency level of ONE processes responses from one of three replicas, QUORUM from two of three replicas, and ALL from three of three replicas.

Procedure

1. On the cqlsh command line, create a keyspace that specifies using three replicas for data distribution in the cluster.

   cqlsh> CREATE KEYSPACE demo_cl WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};

2. Create a table, and insert some values:

   cqlsh> USE demo_cl;
   cqlsh> CREATE TABLE demo_table ( id int PRIMARY KEY, col1 int, col2 int );
   cqlsh> INSERT INTO demo_table (id, col1, col2) VALUES (0, 0, 0);

3. Turn on tracing and use the CONSISTENCY command to check that the consistency level is ONE, the default.

   cqlsh> TRACING on;
   cqlsh> CONSISTENCY;

   The output should be: Current consistency level is 1.

4. Query the table to read the value of the primary key.

   cqlsh> SELECT * FROM demo_table WHERE id = 0;

   The output includes tracing information:

    id | col1 | col2
   ----+------+------
     0 |    0 |    0

   (1 rows)

   Tracing session: 0f5058d0-6761-11e4-96a3-fd07420471ed

   activity

| timestamp | source | source_elapsed -------------------------------------------------------------------------------------+----------------------------+-----------+---------------Execute CQL3 query | 2014-11-08 08:05:29.437000 | 127.0.0.1 | 0 Parsing SELECT * FROM demo_table WHERE id = 0 LIMIT 10000; [SharedPool-Worker-1] | 2014-11-08 08:05:29.438000 | 127.0.0.1 | 820 Preparing statement [SharedPool-Worker-1] | 2014-11-08 08:05:29.438000 | 127.0.0.1 | 1637


Sending message to /127.0.0.3 [WRITE-/127.0.0.3] | 2014-11-08 08:05:29.439000 | 127.0.0.1 | 2211 Sending message to /127.0.0.4 [WRITE-/127.0.0.4] | 2014-11-08 08:05:29.439000 | 127.0.0.1 | 2237 Message received from /127.0.0.1 [Thread-10] | 2014-11-08 08:05:29.441000 | 127.0.0.3 | 75 Executing single-partition query on demo_table [SharedPool-Worker-1] | 2014-11-08 08:05:29.441000 | 127.0.0.4 | 818 Acquiring sstable references [SharedPool-Worker-1] | 2014-11-08 08:05:29.441000 | 127.0.0.4 | 861 Merging memtable tombstones [SharedPool-Worker-1] | 2014-11-08 08:05:29.441000 | 127.0.0.4 | 915 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2014-11-08 08:05:29.442000 | 127.0.0.4 | 999 Merging data from memtables and 0 sstables [SharedPool-Worker-1] | 2014-11-08 08:05:29.442000 | 127.0.0.4 | 1018 Merging memtable tombstones [SharedPool-Worker-1] | 2014-11-08 08:05:29.442000 | 127.0.0.3 | 1058 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2014-11-08 08:05:29.442000 | 127.0.0.3 | 1146 Merging data from memtables and 0 sstables [SharedPool-Worker-1] | 2014-11-08 08:05:29.442000 | 127.0.0.3 | 1165 Read 1 live and 0 tombstoned cells [SharedPool-Worker-1] | 2014-11-08 08:05:29.442000 | 127.0.0.3 | 1223 Enqueuing response to /127.0.0.1 [SharedPool-Worker-1] | 2014-11-08 08:05:29.442001 | 127.0.0.3 | 1504 Message received from /127.0.0.4 [Thread-7] | 2014-11-08 08:05:29.443000 | 127.0.0.1 | 6399 Sending message to /127.0.0.1 [WRITE-/127.0.0.1] | 2014-11-08 08:05:29.443000 | 127.0.0.3 | 1835 Message received from /127.0.0.3 [Thread-8] | 2014-11-08 08:05:29.443000 | 127.0.0.1 | 6449 Processing response from /127.0.0.4 [SharedPool-Worker-2] | 2014-11-08 08:05:29.443000 | 127.0.0.1 | 6623 Processing response from /127.0.0.3 [SharedPool-Worker-3] | 2014-11-08 08:05:29.443000 | 127.0.0.1 | 6635 Request complete | 2014-11-08 08:05:29.443897 | 127.0.0.1 | 6897 5. Change the consistency level to QUORUM and run the SELECT statement again. cqlsh> CONSISTENCY quorum;


cqlsh> SELECT * FROM demo_table WHERE id = 0; id | col1 | col2 ----+------+-----0 | 0 | 0 (1 rows) Tracing session: 3bbae430-6761-11e4-96a3-fd07420471ed activity

| timestamp | source | source_elapsed -------------------------------------------------------------------------------------+----------------------------+-----------+---------------Execute CQL3 query | 2014-11-08 08:06:43.955000 | 127.0.0.1 | 0 Parsing SELECT * FROM demo_table WHERE id = 0 LIMIT 10000; [SharedPool-Worker-1] | 2014-11-08 08:06:43.955000 | 127.0.0.1 | 71 Preparing statement [SharedPool-Worker-1] | 2014-11-08 08:06:43.955000 | 127.0.0.1 | 267 Sending message to /127.0.0.4 [WRITE-/127.0.0.4] | 2014-11-08 08:06:43.956000 | 127.0.0.1 | 1628 Sending message to /127.0.0.5 [WRITE-/127.0.0.5] | 2014-11-08 08:06:43.956000 | 127.0.0.1 | 1690 Message received from /127.0.0.1 [Thread-9] | 2014-11-08 08:06:43.957000 | 127.0.0.5 | 95 Executing single-partition query on demo_table [SharedPool-Worker-2] | 2014-11-08 08:06:43.957000 | 127.0.0.4 | 229 Acquiring sstable references [SharedPool-Worker-2] | 2014-11-08 08:06:43.957000 | 127.0.0.4 | 249 Merging memtable tombstones [SharedPool-Worker-2] | 2014-11-08 08:06:43.957000 | 127.0.0.4 | 299 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2014-11-08 08:06:43.957000 | 127.0.0.4 | 387 Merging data from memtables and 0 sstables [SharedPool-Worker-2] | 2014-11-08 08:06:43.957000 | 127.0.0.4 | 408 Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2014-11-08 08:06:43.957000 | 127.0.0.4 | 469 Enqueuing response to /127.0.0.1 [SharedPool-Worker-2] | 2014-11-08 08:06:43.957001 | 127.0.0.4 | 734 Sending message to /127.0.0.1 [WRITE-/127.0.0.1] | 2014-11-08 08:06:43.957001 | 127.0.0.4 | 894 Message received from /127.0.0.4 [Thread-7] | 2014-11-08 08:06:43.958000 | 127.0.0.1 | 3383


Processing response from /127.0.0.4 [SharedPool-Worker-2] | 2014-11-08 08:06:43.958000 | 127.0.0.1 | 3612 Executing single-partition query on demo_table [SharedPool-Worker-1] | 2014-11-08 08:06:43.959000 | 127.0.0.5 | 1462 Acquiring sstable references [SharedPool-Worker-1] | 2014-11-08 08:06:43.959000 | 127.0.0.5 | 1509 Merging memtable tombstones [SharedPool-Worker-1] | 2014-11-08 08:06:43.959000 | 127.0.0.5 | 1569 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2014-11-08 08:06:43.959000 | 127.0.0.5 | 1662 Merging data from memtables and 0 sstables [SharedPool-Worker-1] | 2014-11-08 08:06:43.959000 | 127.0.0.5 | 1681 Read 1 live and 0 tombstoned cells [SharedPool-Worker-1] | 2014-11-08 08:06:43.959000 | 127.0.0.5 | 1760 Enqueuing response to /127.0.0.1 [SharedPool-Worker-1] | 2014-11-08 08:06:43.959001 | 127.0.0.5 | 2104 Message received from /127.0.0.5 [Thread-10] | 2014-11-08 08:06:43.960000 | 127.0.0.1 | 5330 Sending message to /127.0.0.1 [WRITE-/127.0.0.1] | 2014-11-08 08:06:43.960000 | 127.0.0.5 | 2423 Processing response from /127.0.0.5 [SharedPool-Worker-2] | 2014-11-08 08:06:43.960000 | 127.0.0.1 | 5519 Request complete | 2014-11-08 08:06:43.960947 | 127.0.0.1 | 5947 6. Change the consistency level to ALL and run the SELECT statement again. cqlsh> CONSISTENCY ALL; cqlsh> SELECT * FROM demo_table WHERE id = 0; id | col1 | col2 ----+------+-----0 | 0 | 0 (1 rows) Tracing session: 4da75ca0-6761-11e4-96a3-fd07420471ed activity

| timestamp | source | source_elapsed -------------------------------------------------------------------------------------+----------------------------+-----------+---------------Execute CQL3 query | 2014-11-08 08:07:14.026000 | 127.0.0.1 | 0 Parsing SELECT * FROM demo_table WHERE id = 0 LIMIT 10000; [SharedPool-Worker-1] | 2014-11-08 08:07:14.026000 | 127.0.0.1 | 73


Preparing statement [SharedPool-Worker-1] | 2014-11-08 08:07:14.026000 | 127.0.0.1 | 271 Sending message to /127.0.0.4 [WRITE-/127.0.0.4] | 2014-11-08 08:07:14.027000 | 127.0.0.1 | 978 Message received from /127.0.0.1 [Thread-9] | 2014-11-08 08:07:14.027000 | 127.0.0.5 | 56 Sending message to /127.0.0.5 [WRITE-/127.0.0.5] | 2014-11-08 08:07:14.027000 | 127.0.0.1 | 1012 Executing single-partition query on demo_table [SharedPool-Worker-2] | 2014-11-08 08:07:14.027000 | 127.0.0.3 | 253 Sending message to /127.0.0.3 [WRITE-/127.0.0.3] | 2014-11-08 08:07:14.027000 | 127.0.0.1 | 1054 Acquiring sstable references [SharedPool-Worker-2] | 2014-11-08 08:07:14.027000 | 127.0.0.3 | 275 Merging memtable tombstones [SharedPool-Worker-2] | 2014-11-08 08:07:14.027000 | 127.0.0.3 | 344 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2014-11-08 08:07:14.028000 | 127.0.0.3 | 438 Acquiring sstable references [SharedPool-Worker-1] | 2014-11-08 08:07:14.028000 | 127.0.0.5 | 461 Merging memtable tombstones [SharedPool-Worker-1] | 2014-11-08 08:07:14.028000 | 127.0.0.5 | 525 Skipped 0/0 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-1] | 2014-11-08 08:07:14.028000 | 127.0.0.5 | 622 Merging data from memtables and 0 sstables [SharedPool-Worker-1] | 2014-11-08 08:07:14.028000 | 127.0.0.5 | 645 Read 1 live and 0 tombstoned cells [SharedPool-Worker-1] | 2014-11-08 08:07:14.028000 | 127.0.0.5 | 606 Enqueuing response to /127.0.0.1 [SharedPool-Worker-1] | 2014-11-08 08:07:14.028001 | 127.0.0.5 | 1125 Message received from /127.0.0.3 [Thread-8] | 2014-11-08 08:07:14.029000 | 127.0.0.1 | 3224 Sending message to /127.0.0.1 [WRITE-/127.0.0.1] | 2014-11-08 08:07:14.029000 | 127.0.0.5 | 1616 Processing response from /127.0.0.3 [SharedPool-Worker-3] | 2014-11-08 08:07:14.029000 | 127.0.0.1 | 3417 Message received from /127.0.0.5 [Thread-10] | 2014-11-08 08:07:14.029000 | 127.0.0.1 | 3454 Message received from /127.0.0.4 [Thread-7] | 2014-11-08 08:07:14.029000 | 127.0.0.1 | 3516 Processing response from /127.0.0.5 [SharedPool-Worker-2] | 2014-11-08 08:07:14.029000 | 127.0.0.1 | 3627


Processing response from /127.0.0.4 [SharedPool-Worker-2] | 2014-11-08 08:07:14.030000 | 127.0.0.1 | 3688 Request complete | 2014-11-08 08:07:14.030347 | 127.0.0.1 | 4347

How consistency affects performance Changing the consistency level can affect read performance. The tracing output shows that as you change the consistency level, performance is affected.

Changing the consistency level can affect read performance. The tracing output shows that as you change the consistency level from ONE to QUORUM to ALL, performance degrades from 2585 to 2998 to 5219 microseconds, respectively. If you follow the steps in this tutorial, you are not guaranteed to see the same trend, because querying a one-row table is a degenerate case used for example purposes. The difference between QUORUM and ALL is slight in this case, so depending on conditions in the cluster, performance using ALL might be faster than QUORUM. Performance using ALL is worse than QUORUM under the following conditions:

• The data consists of thousands of rows or more.
• One node is slower than others.
• A particularly slow node was not selected to be part of the quorum.

Tracing queries on large datasets You can use probabilistic tracing on databases having at least ten rows, but this capability is intended for tracing through much more data. After configuring probabilistic tracing using the nodetool settraceprobability command, you query the system_traces keyspace.

SELECT * FROM system_traces.events;
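For example, a minimal sketch that traces roughly 0.1% of requests and then inspects the collected events (the probability value here is arbitrary):

$ nodetool settraceprobability 0.001
cqlsh> SELECT * FROM system_traces.events;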

Paging through an unordered partitioner How to use CQL to page through rows.

When using the RandomPartitioner or Murmur3Partitioner, Cassandra rows are ordered by the hash of their partition key value, and hence the order of rows is not meaningful. Using CQL, you can page through rows even when using the random partitioner or the murmur3 partitioner by using the token function, as shown in this example. The ByteOrderedPartitioner arranges tokens the same way as key values, but the RandomPartitioner and Murmur3Partitioner distribute tokens in a completely unordered manner. The token function makes it possible to page through these unordered partitioner results: you can use the TOKEN function to express a conditional relation on a partition key column. In this case, the query returns rows based on the token of the partition key rather than on its value.

Procedure

Select data based on a token of a particular value. SELECT * FROM cycling.last_3_days WHERE token(year) > token('2015-05-24');


The results will not always be consistent with expectations, because the token function queries directly using tokens: it compares the tokens of the year values rather than the year values themselves (token(year) > token('2015-05-26'), not year > '2015-05-26').

SELECT * FROM cycling.last_3_days WHERE token(year) > token('2015-05-26');

Display the tokens for all values of the column year. SELECT TOKEN(year) FROM cycling.last_3_days;


Tokens and partition keys can be mixed in conditional statements, but again the results may not match expectations.

SELECT * FROM cycling.last_3_days WHERE token(year) < token('2015-05-26') AND year IN ('2015-05-24','2015-05-25');

Determining time-to-live for a column How to insert and retrieve data using the TTL function to determine the date/time of the writes to the columns.

The time-to-live (TTL) function can be used to insert data into a table for a specific duration of time. You can also determine the current time-to-live for a record using the TTL function.

Procedure

Insert data into the table cycling.calendar and use the USING TTL clause to set the expiration period to 86400 seconds. INSERT INTO cycling.calendar (race_id, race_name, race_start_date, race_end_date) VALUES (200, 'placeholder','2015-05-27', '2015-05-27') USING TTL 86400;

Wait for a while and then issue a SELECT statement to determine how much longer the data has to live.

SELECT TTL(race_name) FROM cycling.calendar WHERE race_id = 200;

The time-to-live value can also be updated with USING TTL. UPDATE cycling.calendar USING TTL 300 SET race_name = 'dummy' WHERE race_id = 200 AND race_start_date = '2015-05-27' AND race_end_date = '2015-05-27';


Determining the date/time of a write Using the WRITETIME function in a SELECT statement to determine the date/time that the column was written to the database.

A table contains a timestamp representing the date/time that a write occurred to a column. Using the WRITETIME function in a SELECT statement returns the date/time that the column was written to the database. The output of the function is in microseconds, except for Cassandra 2.1 counter columns, whose writetime is in milliseconds. This procedure continues the example from the previous procedure and calls the WRITETIME function to retrieve the date/time of the writes to the columns.

Procedure

Retrieve the date/time that the value Paolo was written to the firstname column in the table cyclist_points. Use the WRITETIME function in a SELECT statement, followed by the name of a column in parentheses:

SELECT WRITETIME (firstname) FROM cycling.cyclist_points WHERE id = 220844bf-4860-49d6-9a4b-6b5d3a79cbfb;

Note: The writetime output in microseconds converts to Wed, 24 Jun 2015 01:12:05 GMT.


Legacy tables How to work with legacy tables.

Legacy tables must be handled differently from currently built CQL tables.

Working with legacy applications Internally, CQL does not change the row and column mapping from the Thrift API mapping. CQL and Thrift use the same storage engine.

Internally, CQL does not change the row and column mapping from the Thrift API mapping. CQL and Thrift use the same storage engine. CQL supports the same query-driven, denormalized data modeling principles as Thrift. Existing applications do not have to be upgraded to CQL. The CQL abstraction layer makes CQL easier to use for new applications. For an in-depth comparison of Thrift and CQL, see "A Thrift to CQL Upgrade Guide" and CQL for Cassandra experts.

Creating a legacy table You can create legacy (Thrift/CLI-compatible) tables in CQL using the COMPACT STORAGE directive. The compact storage directive used with the CREATE TABLE command provides backward compatibility with older Cassandra applications; new applications should generally avoid it. Compact storage stores an entire row in a single column on disk instead of storing each non-primary-key column in its own column on disk. Using compact storage prevents you from adding new columns that are not part of the PRIMARY KEY.
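For example, a minimal sketch of a legacy-compatible table definition (the table name is hypothetical):

cqlsh> CREATE TABLE cycling.cyclist_legacy ( id UUID PRIMARY KEY, lastname text ) WITH COMPACT STORAGE;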

Querying a legacy table Using CQL, you can query a legacy table. A legacy table managed in CQL includes an implicit WITH COMPACT STORAGE directive. Using a music service example, select all the columns in the playlists table that was created in CQL, this time from the legacy CLI. This output appears:

[default@music] GET playlists[62c36092-82a1-3a00-93d1-46196ee77204];
=> (column=7db1a490-5878-11e2-bcfd-0800200c9a66:, value=, timestamp=1357602286168000)
=> (column=7db1a490-5878-11e2-bcfd-0800200c9a66:album, value=4e6f204f6e6520526964657320666f722046726565, timestamp=1357602286168000)
. . .
=> (column=a3e64f8f-bd44-4f28-b8d9-6938726e34d4:title, value=4c61204772616e6765, timestamp=1357599350478000)

Returned 16 results.

The output of cell values is unreadable because GET returns the values in byte format.

Using a CQL legacy table query Using CQL to query a legacy table.

Using CQL, you can query a legacy table. A legacy table managed in CQL includes an implicit WITH COMPACT STORAGE directive. When you use CQL to query legacy tables with no column names defined for data within a partition, Cassandra generates the names (column1 and value1) for the data. Using the RENAME clause, you can change the default column name to a more meaningful name. ALTER TABLE users RENAME userid to user_id;


CQL supports dynamic tables created in the Thrift API, CLI, and earlier CQL versions. For example, a dynamic table is represented and queried like this:

CREATE TABLE clicks (
  userid uuid,
  url text,
  timestamp date,
  PRIMARY KEY (userid, url)
) WITH COMPACT STORAGE;

SELECT url, timestamp FROM clicks WHERE userid = 148e9150-1dd2-11b2-0000-242d50cf1fff;

SELECT timestamp FROM clicks WHERE userid = 148e9150-1dd2-11b2-0000-242d50cf1fff AND url = 'http://google.com';

In these queries, only equality conditions are valid.

CQL reference CQL reference topics.

Introduction About using CQL.

All of the commands included in the CQL language are available on the cqlsh command line. There is a group of commands that are available on the command line but are not supported by the CQL language. These commands are called cqlsh commands. You can run cqlsh commands from the command line only. You can run CQL commands in a number of ways. This reference covers CQL and cqlsh based on the CQL specification 3.3.

CQL lexical structure CQL input consists of statements that change data, look up data, store data, or change the way data is stored.

CQL input consists of statements. Like SQL, statements change data, look up data, store data, or change the way data is stored. Statements end in a semicolon (;). For example, the following is valid CQL syntax:

SELECT * FROM MyTable;
UPDATE MyTable
  SET SomeColumn = 'SomeValue'
  WHERE columnName = B70DE1D0-9908-4AE3-BE34-5573E5B09F14;

This is a sequence of two CQL statements. This example shows one statement per line, although a statement can usefully be split across lines as well.


Uppercase and lowercase Keyspace, column, and table names created using CQL are case-insensitive unless enclosed in double quotation marks.

Keyspace, column, and table names created using CQL are case-insensitive unless enclosed in double quotation marks. If you enter names for these objects using any uppercase letters, Cassandra stores the names in lowercase. You can force the case by using double quotation marks. For example:

CREATE TABLE test (
  Foo int PRIMARY KEY,
  "Bar" int
);

The following table shows partial queries that work and do not work to return results from the test table:

Table: What Works and What Doesn't

Queries that Work          Queries that Don't Work
SELECT foo FROM . . .      SELECT "Foo" FROM . . .
SELECT Foo FROM . . .      SELECT "BAR" FROM . . .
SELECT FOO FROM . . .      SELECT bar FROM . . .
SELECT "foo" FROM . . .    SELECT Bar FROM . . .
SELECT "Bar" FROM . . .

SELECT "foo" FROM ... works because internally, Cassandra stores foo in lowercase. The double-quotation mark character can be used as an escape character for the double quotation mark. Case sensitivity rules in earlier versions of CQL apply when handling legacy tables. CQL keywords are case-insensitive. For example, the keywords SELECT and select are equivalent. This document shows keywords in uppercase.

Valid characters for data types and keyspace/table/column names Keyspace, column, and table names created using CQL have restrictions regarding valid characters.

Keyspace, column, and table names created using CQL can only contain alphanumeric and underscore characters. User-defined data type names and field names, user-defined function names, and user-defined aggregate names created using CQL can only contain alphanumeric and underscore characters. If you enter names for these objects using anything other than alphanumeric characters or underscores, Cassandra will issue an invalid syntax message and fail to create the object.

Table: What Works and What Doesn't

Creations that Work                Creations that Don't Work
CREATE TABLE foo ...               CREATE TABLE foo!$% ...
CREATE TABLE foo_bar ...           CREATE TABLE foo[]"90 ...
ALTER TABLE foo5 ...               ALTER TABLE foo5$$ ...
CREATE FUNCTION foo5 ...           CREATE FUNCTION foo5$$ ...
CREATE AGGREGATE foo5 ...          CREATE AGGREGATE foo5$$ ...
CREATE TYPE foo5 (bar9 text, ...   CREATE TYPE foo5$$ (bar#9 int ...


Escaping characters Using single and double quotation marks in CQL.

Column names that contain characters that CQL cannot parse need to be enclosed in double quotation marks in CQL. Dates, IP addresses, and strings need to be enclosed in single quotation marks. To use a single quotation mark itself in a string literal, escape it using a single quotation mark.

cqlsh> INSERT INTO cycling.calendar (race_id, race_start_date, race_end_date, race_name) VALUES (201, '2015-02-18', '2015-02-22', 'Women''s Tour of New Zealand');

An alternative is to use dollar-quoted strings. Dollar-quoted string constants can be used to create functions, insert data, and select data when complex quoting is needed. Use double dollar signs to enclose the desired string.

cqlsh> INSERT INTO cycling.calendar (race_id, race_start_date, race_end_date, race_name) VALUES (201, '2015-02-18', '2015-02-22', $$Women's Tour of New Zealand$$);

Valid literals Values and definitions of valid literals.

Valid literals consist of these kinds of values:

• blob: hexadecimal, defined as 0[xX](hex)+

• boolean: true or false, case-insensitive, not enclosed in quotation marks

• numeric constant: A numeric constant can consist of integers 0-9 and a minus sign prefix. A numeric constant can also be float. A float can be a series of one or more decimal digits, followed by a period (.) and one or more decimal digits. There is no optional + sign. The forms .42 and 42. are unacceptable. You can use leading or trailing zeros before and after decimal points. For example, 0.42 and 42.0. A float constant, expressed in E notation, consists of the characters in this regular expression: '-'?[0-9]+('.'[0-9]*)?([eE][+-]?[0-9]+)? NaN and Infinity are floats.

• identifier: A letter followed by any sequence of letters, digits, or the underscore. Names of tables, columns, and other objects are identifiers and enclosed in double quotation marks.

• integer: An optional minus sign, -, followed by one or more digits.

• string literal: Characters enclosed in single quotation marks. To use a single quotation mark itself in a string literal, escape it using a single quotation mark. For example, use '' to make dog possessive: dog''s.

• uuid: 32 hex digits, 0-9 or a-f, which are case-insensitive, separated by dashes, -, after the 8th, 12th, 16th, and 20th digits. For example: 01234567-0123-0123-0123-0123456789ab

• timeuuid: Uses the time in 100 nanosecond intervals since 00:00:00.00 UTC (60 bits), a clock sequence number for prevention of duplicates (14 bits), plus the IEEE 801 MAC address (48 bits) to generate a unique identifier. For example: d2177dd0-eaa2-11de-a572-001b779c76e3

• whitespace: Separates terms and is used inside string literals, but otherwise CQL ignores whitespace.

Exponential notation Cassandra supports exponential notation.

Cassandra supports exponential notation. This example shows exponential notation in the output from a cqlsh command.

CREATE TABLE test (
  id varchar PRIMARY KEY,
  value_double double,
  value_float float
);

INSERT INTO test (id, value_float, value_double) VALUES ('test1', -2.6034345E+38, -2.6034345E+38);

SELECT * FROM test;

 id    | value_double | value_float
-------+--------------+-------------
 test1 |  -2.6034e+38 | -2.6034e+38

CQL code comments Commenting CQL code.

You can use the following notation to include comments in CQL code:

• Double hyphen: -- Single-line comment
• Double forward slash: // Single-line comment
• Forward slash asterisk: /* Multi-line comment */
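For example, all three comment styles can annotate a statement; the id value below is an arbitrary example:

-- look up one cyclist
// the comparison value is an arbitrary example id
/* comments may also
   span multiple lines */
SELECT * FROM cycling.cyclist_name WHERE id = e7cd5752-bc0d-4157-a80f-7523add8dbcd;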

CQL Keywords Table of keywords and whether or not the words are reserved.

This table lists keywords and whether or not the words are reserved. A reserved keyword cannot be used as an identifier unless you enclose the word in double quotation marks. Non-reserved keywords have a specific meaning in certain contexts but can be used as an identifier outside these contexts.

Table: Keywords

Keyword         Reserved
ADD             yes
AGGREGATE       yes
ALL             no
ALLOW           yes
ALTER           yes
AND             yes
ANY             yes
APPLY           yes
AS              no
ASC             yes
ASCII           no
AUTHORIZE       yes
BATCH           yes
BEGIN           yes
BIGINT          no
BLOB            no
BOOLEAN         no
BY              yes
CLUSTERING      no
COLUMNFAMILY    yes
COMPACT         no
CONSISTENCY     no
COUNT           no
COUNTER         no
CREATE          yes
CUSTOM          no
DECIMAL         no
DELETE          yes
DESC            yes
DISTINCT        no
DOUBLE          no
DROP            yes
EACH_QUORUM     yes
ENTRIES         yes
EXISTS          no
FILTERING       no
FLOAT           no
FROM            yes
FROZEN          no
GRANT           yes
IF              yes
IN              yes
INDEX           yes
INET            yes
INFINITY        yes
INSERT          yes
INT             no
INTO            yes
KEY             no
KEYSPACE        yes
KEYSPACES       yes
LEVEL           no
LIMIT           yes
LIST            no
LOCAL_ONE       yes
LOCAL_QUORUM    yes
MAP             no
MATERIALIZED    yes
MODIFY          yes
NAN             yes
NORECURSIVE     yes
NOSUPERUSER     no
NOT             yes
OF              yes
ON              yes
ONE             yes
ORDER           yes
PASSWORD        yes
PERMISSION      no
PERMISSIONS     no
PRIMARY         yes
QUORUM          yes
RENAME          yes
REVOKE          yes
SCHEMA          yes
SELECT          yes
SET             yes
STATIC          no
STORAGE         no
SUPERUSER       no
TABLE           yes
TEXT            no
TIME            yes
TIMESTAMP       no
TIMEUUID        no
THREE           yes
TO              yes
TOKEN           yes
TRUNCATE        yes
TTL             no
TUPLE           no
TWO             yes
TYPE            no
UNLOGGED        yes
UPDATE          yes
USE             yes
USER            no
USERS           no
USING           yes
UUID            no
VALUES          no
VARCHAR         no
VARINT          no
VIEW            yes
WHERE           yes
WITH            yes
WRITETIME       no

CQL data types Built-in data types for columns.

CQL defines built-in data types for columns. The counter type is unique. Each entry below lists the CQL type, the constants it accepts, and a description.

Table: CQL Data Types

ascii (constants: strings)
  US-ASCII character string.

bigint (constants: integers)
  64-bit signed long.

blob (constants: blobs)
  Arbitrary bytes (no validation), expressed as hexadecimal.

boolean (constants: booleans)
  true or false.

counter (constants: integers)
  Distributed counter value (64-bit long).

date (constants: strings)
  Date string, such as 2015-05-03.

decimal (constants: integers, floats)
  Variable-precision decimal Java type. Note: When dealing with currency, it is a best practice to have a currency class that serializes to and from an int, or use the decimal form.

double (constants: integers, floats)
  64-bit IEEE-754 floating point Java type.

float (constants: integers, floats)
  32-bit IEEE-754 floating point Java type.

frozen (constants: user-defined types, collections, tuples)
  A frozen value serializes multiple components into a single value. Non-frozen types allow updates to individual fields. Cassandra treats the value of a frozen type as a blob; the entire value must be overwritten. Note: Cassandra 2.1.0 to 2.1.2 requires using frozen for tuples, while Cassandra 2.1.3 and later does not require this keyword.

inet (constants: strings)
  IP address string in IPv4 or IPv6 format, used by the python-cql driver and CQL native protocols.

int (constants: integers)
  32-bit signed integer.

list (constants: n/a)
  A collection of one or more ordered elements: [ literal, literal, literal ]

map (constants: n/a)
  A JSON-style array of literals: { literal : literal, literal : literal ... }

set (constants: n/a)
  A collection of one or more elements: { literal, literal, literal }

smallint (constants: integers)
  2-byte integer.

text (constants: strings)
  UTF-8 encoded string.

time (constants: strings)
  Time string, such as 13:30:54.234.

timestamp (constants: integers, strings)
  Date plus time, encoded as 8 bytes since epoch.

timeuuid (constants: uuids)
  Version 1 UUID only.

tinyint (constants: integers)
  1-byte integer.

tuple (constants: n/a)
  Cassandra 2.1 and later. A group of 2-3 fields.

uuid (constants: uuids)
  A UUID in standard UUID format.

varchar (constants: strings)
  UTF-8 encoded string.

varint (constants: integers)
  Arbitrary-precision integer Java type.

In addition to the CQL types listed in this table, you can use a string containing the name of a Java class (a subclass of AbstractType loadable by Cassandra) as a CQL type. The class name should either be fully qualified or relative to the org.apache.cassandra.db.marshal package. Enclose ASCII text, timestamp, and inet values in single quotation marks. Enclose names of a keyspace, table, or column in double quotation marks.

Java types

The Java types, from which most CQL types are derived, are obvious to Java programmers. The derivation of the following types, however, might not be obvious:

Table: Derivation of selective CQL types

CQL type   Java type
decimal    java.math.BigDecimal
float      java.lang.Float
double     java.lang.Double
varint     java.math.BigInteger

Blob type Cassandra blob data type represents a constant hexadecimal number.

The Cassandra blob data type represents a constant hexadecimal number defined as 0[xX](hex)+ where hex is a hexadecimal character, such as [0-9a-fA-F]. For example, 0xcafe. The maximum theoretical size for a blob is 2 GB. The practical limit on blob size, however, is less than 1 MB, ideally even smaller. A blob type is suitable for storing a small image or short string.

Blob conversion functions
These functions convert native types to and from binary data (blob):

• typeAsBlob(type)
• blobAsType(blob)

For every native, nonblob type supported by CQL, the typeAsBlob function takes an argument of type type and returns it as a blob. Conversely, the blobAsType function takes a blob argument and converts it to a value of the corresponding type; for example, blobAsBigint takes an 8-byte blob argument and converts it to a bigint value. This example shows how to use bigintAsBlob:

CREATE TABLE bios ( user_name varchar PRIMARY KEY, bio blob );
INSERT INTO bios (user_name, bio) VALUES ('fred', bigintAsBlob(3));
SELECT * FROM bios;

 user_name | bio
-----------+--------------------
      fred | 0x0000000000000003

This example shows how to use blobAsBigint:

ALTER TABLE bios ADD id bigint;
INSERT INTO bios (user_name, id) VALUES ('fred', blobAsBigint(0x0000000000000003));
SELECT * FROM bios;

 user_name | bio                | id
-----------+--------------------+----
      fred | 0x0000000000000003 |  3

Collection type A collection column is declared using the collection type, followed by another type.

A collection column is declared using the collection type, followed by another type, such as int or text, in angle brackets. For example, you can create a table having a list of textual elements, a list of integers, or a list of some other element type:

list<text>
list<int>

Collection types cannot currently be nested. For example, you cannot define a list within a list:

list<list<int>>   // not allowed

In Cassandra 2.1 and later, you can create an index on a column of type map, set, or list.
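For example, a minimal sketch declaring and populating a list column (the table name and values are hypothetical):

cqlsh> CREATE TABLE cycling.race_wins ( cyclist text PRIMARY KEY, races list<text> );
cqlsh> INSERT INTO cycling.race_wins (cyclist, races) VALUES ('VOS', ['Tour of Flanders']);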


Using frozen in a collection
A frozen value serializes multiple components into a single value. Non-frozen types allow updates to individual fields. Cassandra treats the value of a frozen type as a blob; the entire value must be overwritten.

Note: Frozen collections can be used for primary key columns. Non-frozen collections cannot be used for primary key columns.

column_name collection_type<frozen<cql_type>>

For example:

CREATE TABLE mykeyspace.users (
  id uuid PRIMARY KEY,
  name frozen<fullname>,
  direct_reports set<frozen<fullname>>,     // a collection set
  addresses map<text, frozen<address>>      // a collection map
);

Counter type A counter column value is a 64-bit signed integer.

A counter column value is a 64-bit signed integer. You cannot set the value of a counter; it supports only two operations: increment and decrement. Use counter types as described in the "Using a counter" section. Do not assign this type to a column that serves as the primary key or partition key. Also, do not use the counter type in a table that contains anything other than counter types and the primary key. To generate sequential numbers for surrogate keys, use the timeuuid type instead of the counter type. You cannot create an index on a counter column or set data in a counter column to expire using the Time-To-Live (TTL) property.
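For example, a minimal sketch of a counter table (the table name and id value are hypothetical); because counters cannot be set directly, the value is modified only through UPDATE:

cqlsh> CREATE TABLE cycling.popular_count ( id UUID PRIMARY KEY, popularity counter );
cqlsh> UPDATE cycling.popular_count SET popularity = popularity + 1 WHERE id = 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47;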

UUID and timeuuid types The UUID (universally unique id) comparator type for avoiding collisions in column names.

The UUID (universally unique id) comparator type is used to avoid collisions in column names. Alternatively, you can use the timeuuid. Timeuuid types can be entered as integers for CQL input. A value of the timeuuid type is a Type 1 UUID. A Version 1 UUID includes the time of its generation and is sorted by timestamp, making it ideal for use in applications requiring conflict-free timestamps. For example, you can use this type to identify a column (such as a blog entry) by its timestamp and allow multiple clients to write to the same partition key simultaneously. Collisions that would potentially overwrite data that was not intended to be overwritten cannot occur. A valid timeuuid conforms to the timeuuid format shown in valid literals.

uuid and Timeuuid functions About using Timeuuid functions.

Cassandra 2.0.7 and later includes the uuid() function. This function takes no parameters and generates a random Type 4 UUID suitable for use in INSERT or SET statements. Several timeuuid functions are designed for use with the timeuuid type:
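For example, a minimal sketch using uuid() to generate a surrogate key on insert (the cyclist_name table follows examples used elsewhere in this document; the name values are arbitrary):

cqlsh> INSERT INTO cycling.cyclist_name (id, lastname, firstname) VALUES (uuid(), 'WELTEN', 'Bram');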


dateOf()

Used in a SELECT clause, this function extracts the timestamp of a timeuuid column in a result set and returns the extracted timestamp as a date. Use unixTimestampOf() to get a raw timestamp.

now()

In the coordinator node, generates a new unique timeuuid in milliseconds when the statement is executed. The timestamp portion of the timeuuid conforms to the UTC (Universal Time) standard. This method is useful for inserting values. The value returned by now() is guaranteed to be unique.

minTimeuuid() and maxTimeuuid()

Returns a UUID-like result given a conditional time component as an argument. For example:

SELECT * FROM myTable
  WHERE t > maxTimeuuid('2013-01-01 00:05+0000')
  AND t < minTimeuuid('2013-02-02 10:00+0000')

This example selects all rows where the timeuuid column, t, is strictly later than 2013-01-01 00:05+0000 but strictly earlier than 2013-02-02 10:00+0000. The condition t >= maxTimeuuid('2013-01-01 00:05+0000') does not select a timeuuid generated exactly at 2013-01-01 00:05+0000; it is essentially equivalent to t > maxTimeuuid('2013-01-01 00:05+0000').

The values returned by minTimeuuid and maxTimeuuid are not true UUIDs: the values do not conform to the time-based UUID generation process specified by RFC 4122. The results of these functions are deterministic, unlike the now() function.

unixTimestampOf()

Used in a SELECT clause, this function extracts the timestamp in milliseconds of a timeuuid column in a result set and returns the value as a raw, 64-bit integer timestamp.
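Pulling these functions together, a hedged sketch (the user_events table and its columns are hypothetical; created_at must be a timeuuid column for now() to apply):

CREATE TABLE user_events (
  id uuid PRIMARY KEY,
  created_at timeuuid
);

-- uuid() supplies a random Type 4 key; now() supplies a timeuuid:
INSERT INTO user_events (id, created_at) VALUES (uuid(), now());

-- Extract readable values from the timeuuid column:
SELECT dateOf(created_at), unixTimestampOf(created_at) FROM user_events;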

Cassandra 2.2 and later support additional timeuuid and timestamp functions to manipulate dates. The functions can be used in INSERT, UPDATE, and SELECT statements.

toDate(timeuuid)
Converts timeuuid to date in YYYY-MM-DD format.

toTimestamp(timeuuid)
Converts timeuuid to timestamp format.

toUnixTimestamp(timeuuid)
Converts timeuuid to UNIX timestamp format.

toDate(timestamp)
Converts timestamp to date in YYYY-MM-DD format.

toUnixTimestamp(timestamp)
Converts timestamp to UNIX timestamp format.

toTimestamp(date)
Converts date to timestamp format.

toUnixTimestamp(date)
Converts date to UNIX timestamp format.

An example of the new functions creates a table and inserts various time-related values:

CREATE TABLE sample_times (
  a int,
  b timestamp,
  c timeuuid,
  d bigint,
  PRIMARY KEY (a,b,c,d)
);

INSERT INTO sample_times (a,b,c,d)
  VALUES (1, toTimestamp(now()), 50554d6e-29bb-11e5-b345-feff819cdc9f, toUnixTimestamp(now()));


Select data and convert it to a new format:

SELECT toDate(c) FROM sample_times;

Timestamp type A timestamp type can be entered as an integer for CQL input, or as a string literal in ISO 8601 formats.

Values for the timestamp type are encoded as 64-bit signed integers representing a number of milliseconds since the standard base time known as the epoch: January 1 1970 at 00:00:00 GMT. A timestamp type can be entered as an integer for CQL input, or as a string literal in any of the following ISO 8601 formats:

yyyy-mm-dd HH:mm
yyyy-mm-dd HH:mm:ss
yyyy-mm-dd HH:mmZ
yyyy-mm-dd HH:mm:ssZ
yyyy-mm-dd'T'HH:mm
yyyy-mm-dd'T'HH:mmZ
yyyy-mm-dd'T'HH:mm:ss
yyyy-mm-dd'T'HH:mm:ssZ
yyyy-mm-dd
yyyy-mm-ddZ

where Z is the RFC-822 4-digit time zone, expressing the time zone's difference from UTC. For example, for the date and time of Feb 3, 2011, at 04:05:00 AM, GMT:

2011-02-03 04:05+0000
2011-02-03 04:05:00+0000
2011-02-03T04:05+0000
2011-02-03T04:05:00+0000

If no time zone is specified, the time zone of the Cassandra coordinator node handling the write request is used. For accuracy, DataStax recommends specifying the time zone rather than relying on the time zone configured on the Cassandra nodes. If you only want to capture date values, the time of day can be omitted. For example:

2011-02-03
2011-02-03+0000

In this case, the time of day defaults to 00:00:00 in the specified or default time zone. Timestamp output appears in the following format by default:

yyyy-mm-dd HH:mm:ssZ

You can change the format by setting the time_format property in the [ui] section of the cqlshrc file.
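For instance, a minimal sketch inserting a timestamp as an ISO 8601 string literal (the race_results table is hypothetical):

INSERT INTO race_results (race_id, finished_at)
  VALUES (1, '2011-02-03 04:05:00+0000');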

Tuple type You can use a tuple as a convenient alternative to a user-defined type.

The tuple data type holds fixed-length sets of typed positional fields. Use a tuple as an alternative to a user-defined type. A tuple can accommodate many fields (32768), more than can be prudently used. Typically, create a tuple with a few fields.


In the table creation statement, use angle brackets and a comma delimiter to declare the tuple component types. Surround tuple values in parentheses to insert the values into a table, as shown in this example.

CREATE TABLE collect_things (
  k int PRIMARY KEY,
  v tuple<int, text, float>
);

INSERT INTO collect_things (k, v) VALUES (0, (3, 'bar', 2.1));

SELECT * FROM collect_things;

 k | v
---+-----------------
 0 | (3, 'bar', 2.1)

Note: Cassandra 2.1.0 to 2.1.2 requires using frozen for tuples, while Cassandra 2.1.3 and later does not require this keyword:

frozen <tuple<int, text, float>>

You can filter a selection using a tuple.

CREATE INDEX on collect_things (v);

SELECT * FROM collect_things WHERE v = (3, 'bar', 2.1);

 k | v
---+-----------------
 0 | (3, 'bar', 2.1)

You can nest tuples as shown in the following example:

CREATE TABLE nested (k int PRIMARY KEY, t frozen <tuple<int, tuple<text, double>>>);

INSERT INTO nested (k, t) VALUES (0, (3, ('foo', 3.4)));

User-defined type A user-defined type facilitates handling multiple fields of related information in a table.

Cassandra 2.1 supports user-defined types. A user-defined type facilitates handling multiple fields of related information in a table. Applications that would otherwise require multiple tables can be simplified to use fewer tables by using a user-defined type to represent the related fields of information instead of storing the information in a separate table. The address type example demonstrates how to use a user-defined type. You can create, alter, and drop a user-defined type using these commands:

• CREATE TYPE
• ALTER TYPE
• DROP TYPE

The cqlsh utility includes these commands for describing a user-defined type or listing all user-defined types:

• DESCRIBE TYPE
• DESCRIBE TYPES

The scope of a user-defined type is the keyspace in which you define it. Use dot notation to access a type from a keyspace outside its scope: the keyspace name, followed by a period, followed by the name of the type. Cassandra accesses the type in the specified keyspace, but does not change the current keyspace. If you do not specify a keyspace, Cassandra accesses the type within the current keyspace.
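For example, a hedged sketch of dot notation (the keyspace, type, and table names are hypothetical):

CREATE TYPE test_ks.address (
  street text,
  city text,
  zip int
);

-- Reference the type from another keyspace by qualifying it:
CREATE TABLE other_ks.users (
  id uuid PRIMARY KEY,
  home frozen <test_ks.address>
);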

CQL keyspace and table properties About using CQL keyspace and table properties.

The CQL WITH clause specifies keyspace and table properties in these CQL commands:

• ALTER KEYSPACE
• ALTER TABLE
• CREATE KEYSPACE
• CREATE TABLE

CQL keyspace properties CQL supports setting the following keyspace properties in addition to naming data centers.

class
The name of the replication strategy: SimpleStrategy or NetworkTopologyStrategy. You set the replication factor independently for each data center.

replication_factor
The replication_factor property is used only when specifying the SimpleStrategy, as shown in CREATE KEYSPACE examples. The replication factor value is the total number of replicas across the cluster.

For production use or for use with mixed workloads, create the keyspace using NetworkTopologyStrategy. SimpleStrategy is fine for evaluation purposes. NetworkTopologyStrategy is recommended for most deployments because it makes expanding to multiple data centers easier when future growth requires it. You can also configure the durable_writes property when creating or altering a keyspace.
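For example, a minimal sketch of a production-style keyspace (the keyspace and data center names are hypothetical):

CREATE KEYSPACE prod_ks
  WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3 }
  AND DURABLE_WRITES = true;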

Table properties A list of CQL table properties and their syntax.

CQL supports Cassandra table properties, such as comments and compaction options, listed in the following table. In CQL commands, such as CREATE TABLE, you format properties in either the name-value pair or collection map format. The name-value pair property syntax is:

name = value AND name = value

The collection map format, used by compaction and compression properties, is:

{ name : value, name : value, name : value ... }

Enclose properties that are strings in single quotation marks. See CREATE TABLE for examples.

Table: CQL properties

bloom_filter_fp_chance
Default: 0.01 for SizeTieredCompactionStrategy and DateTieredCompactionStrategy; 0.1 for LeveledCompactionStrategy.
Desired false-positive probability for SSTable Bloom filters. See Bloom filter below.

caching
Default: Cassandra 2.1: ALL for keys, NONE for rows_per_partition; Cassandra 2.0.x: keys_only.
Optimizes the use of cache memory without manual tuning. See caching below.

comment
Default: N/A.
A human-readable comment describing the table. See comments below.

compaction
Default: SizeTieredCompactionStrategy.
Sets the compaction strategy for the table. See compaction below.

compression
Default: LZ4Compressor.
The compression algorithm to use. Valid values are LZ4Compressor, SnappyCompressor, and DeflateCompressor. See compression below.

dclocal_read_repair_chance
Default: 0.1 (Cassandra 2.1, Cassandra 2.0.9 and later); 0.0 (Cassandra 2.0.8 and earlier).
Specifies the probability of read repairs being invoked over all replicas in the current data center.

default_time_to_live
Default: 0.
The default expiration time in seconds for a table. Used in MapReduce scenarios when you have no control of TTL.

gc_grace_seconds
Default: 864000 (10 days).
Specifies the time to wait before garbage collecting tombstones (deletion markers). The default value allows a great deal of time for consistency to be achieved prior to deletion. In many deployments this interval can be reduced, and in a single-node cluster it can be safely set to zero.

memtable_flush_period_in_ms
Default: 0.
Forces flushing of the memtable after the specified time in milliseconds elapses.

read_repair_chance
Default: 0.0 (Cassandra 2.1, Cassandra 2.0.9 and later); 0.1 (Cassandra 2.0.8 and earlier).
Specifies the basis for invoking read repairs on reads in clusters. The value must be between 0 and 1.

speculative_retry
Default: 99percentile (Cassandra 2.0.2 and later).
Overrides normal read timeout when read_repair_chance is not 1.0, sending another request to read. See speculative retry below.

Bloom filter
The Bloom filter property is the desired false-positive probability for SSTable Bloom filters. When data is requested, the Bloom filter checks if the row exists before doing disk I/O. The Bloom filter property value ranges from 0 to 1.0. The effects of the minimum and maximum values are:

0
Enables the unmodified, effectively the largest possible, Bloom filter.

1.0
Disables the Bloom filter.

The recommended setting is 0.1. A higher value yields diminishing returns.
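For instance, a brief sketch applying the recommended setting (the table name is hypothetical):

ALTER TABLE users WITH bloom_filter_fp_chance = 0.1;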


caching
Caching optimizes the use of cache memory without manual tuning. You set table properties to configure caching when you create or alter the table. Cassandra weights the cached data by size and access frequency. After configuring the caching table property, configure the global caching properties in the cassandra.yaml file. For information about global caching properties, see the Cassandra 3.0 documentation. Configure the cache by creating a property map of values for the caching property. Options are:

• keys: ALL or NONE
• rows_per_partition: number of CQL rows, NONE, or ALL

For example:

CREATE TABLE DogTypes (
  block_id uuid,
  species text,
  alias text,
  population varint,
  PRIMARY KEY (block_id)
) WITH caching = { 'keys' : 'NONE', 'rows_per_partition' : '120' };

comments Comments can be used to document CQL statements in your application code. Single line comments can begin with a double dash (--) or a double slash (//) and extend to the end of the line. Multi-line comments can be enclosed in /* and */ characters.
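A quick sketch of the three comment styles described above:

-- This is a single-line comment
// This is also a single-line comment
/* This comment
   spans multiple lines */
SELECT * FROM mytable;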

compaction
The compaction property defines the compaction strategy class to use. Choose the compaction strategy that best fits your data and environment. Note: For more guidance, see the When to Use Leveled Compaction and Leveled Compaction in Apache Cassandra blog posts.

• SizeTieredCompactionStrategy (STCS): The default compaction strategy. This strategy triggers a minor compaction when there are a number of similar-sized SSTables on disk as configured by the table subproperty, min_threshold. A minor compaction does not involve all the tables in a keyspace. Also see STCS compaction subproperties.
• DateTieredCompactionStrategy (DTCS): Available in Cassandra 2.0.11 and 2.1.1 and later. This strategy is particularly useful for time series data. DateTieredCompactionStrategy stores data written within a certain period of time in the same SSTable. For example, Cassandra can store your last hour of data in one SSTable time window, the next 4 hours of data in another time window, and so on. Compactions are triggered when the min_threshold (4 by default) for SSTables in those windows is reached. The most common queries for time series workloads retrieve the last hour/day/month of data, and Cassandra can limit the SSTables returned to those having the relevant data. Also, Cassandra can store data that has been set to expire using TTL in an SSTable with other data scheduled to expire at approximately the same time; Cassandra can then drop the SSTable without doing any compaction. Also see DTCS compaction subproperties.
• LeveledCompactionStrategy (LCS): The leveled compaction strategy creates SSTables of a fixed, relatively small size (160 MB by default) that are grouped into levels. Within each level, SSTables are guaranteed to be non-overlapping. Each level (L0, L1, L2 and so on) is 10 times as large as the previous. Disk I/O is more uniform and predictable on higher than on lower levels as SSTables are continuously being compacted into progressively larger levels. At each level, row keys are merged into non-overlapping SSTables. This can improve performance for reads, because Cassandra can determine which SSTables in each level to check for the existence of row key data. This compaction strategy is modeled after Google's leveldb implementation. Also see LCS compaction subproperties.

Hybrid (leveled and size-tiered) compaction improvements to the leveled compaction strategy reduce the performance overhead on read operations when compaction cannot keep pace with a write-heavy workload. When using LCS, if Cassandra cannot keep pace with the workload, the compaction strategy switches to STCS until Cassandra catches up. For this reason, it is a best practice to configure the max_threshold subproperty for a table to use when the switch occurs.

You can specify a custom strategy. Use the full class name as a string constant.
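For instance, a hedged sketch switching a table to LCS and setting a subproperty (the table name is hypothetical):

ALTER TABLE cycling_events
  WITH compaction = { 'class' : 'LeveledCompactionStrategy',
                      'sstable_size_in_mb' : 160 };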

compression
To configure compression, choose the LZ4Compressor, SnappyCompressor, or DeflateCompressor property to use in creating or altering a table. Use an empty string ('') to disable compression, as shown in the example of how to use subproperties. Choosing the right compressor depends on your requirements for space savings over read performance. LZ4 is fastest to decompress, followed by Snappy, then by Deflate. Compression effectiveness is inversely correlated with decompression speed. The extra compression from Deflate or Snappy is not enough to make up for the decreased performance for general-purpose workloads, but for archival data they may be worth considering. Developers can also implement custom compression classes using the org.apache.cassandra.io.compress.ICompressor interface. Specify the full class name enclosed in single quotation marks. Also use the compression subproperties.
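For instance, a minimal sketch using the compression map format (the table name is hypothetical; the subproperties are described under Compression subproperties below):

ALTER TABLE users
  WITH compression = { 'sstable_compression' : 'SnappyCompressor',
                       'chunk_length_kb' : 64 };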

speculative retry
To override the normal read timeout when read_repair_chance is not 1.0, sending another request to read, choose one of these values and use the property to create or alter the table:

• ALWAYS: Retry reads of all replicas.
• Xpercentile: Retry reads based on the effect on throughput and latency.
• Yms: Retry reads after specified milliseconds.
• NONE: Do not retry reads.

Using the speculative retry property, you can configure rapid read protection in Cassandra 2.0.2 and later. Use this property to retry a request after some milliseconds have passed or after a percentile of the typical read latency has been reached, which is tracked per table. For example:

ALTER TABLE users WITH speculative_retry = '10ms';

Or:

ALTER TABLE users WITH speculative_retry = '99percentile';

Related information
Cassandra 2.0 cassandra.yaml
Cassandra 2.1 cassandra.yaml
Cassandra 2.2 cassandra.yaml
Cassandra 3.0 cassandra.yaml

Compaction subproperties Constructing a map of the compaction property and its subproperties.

Using CQL, you can configure a table to use SizeTieredCompactionStrategy (STCS), DateTieredCompactionStrategy (DTCS), or LeveledCompactionStrategy (LCS). You construct a map of the compaction property and the following subproperties:

CQL Compaction Subproperties for STCS

bucket_high
Default: 1.5.
Size-tiered compaction considers SSTables to be within the same bucket if the SSTable size diverges by 50% or less from the default bucket_low and default bucket_high values: [average-size × bucket_low, average-size × bucket_high].

bucket_low
Default: 0.5.
Same as above.

cold_reads_to_omit
Default: 0.05.
The maximum percentage of reads/sec that ignored SSTables may account for. The recommended range of values is 0.0 to 1.0. See cold_reads_to_omit below.

enabled
Default: true.
Enables background compaction. See Enabling and disabling background compaction below.

max_threshold
Default: 32.
Sets the maximum number of SSTables to allow in a minor compaction.

min_threshold
Default: 4.
Sets the minimum number of SSTables to trigger a minor compaction.

min_sstable_size
Default: 50MB.
STCS groups SSTables for compaction into buckets. The bucketing process groups SSTables that differ in size by less than 50%. This results in a bucketing process that is too fine-grained for small SSTables. If your SSTables are small, use min_sstable_size to define a size threshold (in bytes) below which all SSTables belong to one unique bucket.

only_purge_repaired_tombstones
Default: false.
In Cassandra 3.0 and later, to avoid resurrecting data if repair is not run within gc_grace_seconds, this option allows purging tombstones only from repaired SSTables when set to true. If you do not run repair for a long time, all tombstones are kept and may cause problems.

tombstone_compaction_interval
Default: 1 day.
The minimum time to wait after an SSTable creation time before considering the SSTable for tombstone compaction. Tombstone compaction is the compaction triggered if the SSTable has more garbage-collectable tombstones than tombstone_threshold.

tombstone_threshold
Default: 0.2.
A ratio of garbage-collectable tombstones to all contained columns, which if exceeded by the SSTable triggers compaction (with no other SSTables) for the purpose of purging the tombstones.

unchecked_tombstone_compaction
Default: false.
Setting this to true enables more aggressive than normal tombstone compactions: a single-SSTable tombstone compaction runs without checking the likelihood of success. Cassandra 2.0.9 and later.

CQL Compaction Subproperties for DTCS

base_time_seconds
Default: 3600 (1 hour).
Set to the size of the first time window.

enabled
Default: true.
Set to enable background compaction. See Enabling and disabling background compaction below.

max_sstable_age_days
Default: 1000.
Stop compacting SSTables only having data older than the specified number of days. Fractional days can be set. This parameter is deprecated in Cassandra 3.2.

max_window_size_seconds
Default: 86400.
Set the maximum window size in seconds. The default is 1 day.

max_threshold
Default: 32.
Set the maximum number of SSTables to allow in a minor compaction.

min_threshold
Default: 4.
Set the minimum number of SSTables to trigger a minor compaction.

timestamp_resolution
Default: MICROSECONDS.
Set to MICROSECONDS or MILLISECONDS, depending on the timestamp unit of the data you insert.

tombstone_compaction_interval
Default: 1 day.
Set to the minimum time to wait after an SSTable creation time before considering the SSTable for tombstone compaction. Tombstone compaction is the compaction triggered if the SSTable has more garbage-collectable tombstones than tombstone_threshold.

tombstone_threshold
Default: 0.2.
Set the ratio of garbage-collectable tombstones to all contained columns, which if exceeded by the SSTable triggers compaction (with no other SSTables) for the purpose of purging the tombstones.

unchecked_tombstone_compaction
Default: false.
Setting this to true enables more aggressive than normal tombstone compactions: a single-SSTable tombstone compaction runs without checking the likelihood of success.

CQL Compaction Subproperties for LCS

enabled
Default: true.
Set to enable background compaction. See Enabling and disabling background compaction below.

sstable_size_in_mb
Default: 160MB.
Set the target size for SSTables that use the leveled compaction strategy. Although SSTable sizes should be less than or equal to sstable_size_in_mb, it is possible to have a larger SSTable during compaction. This occurs when data for a given partition key is exceptionally large. The data is not split into two SSTables.

tombstone_compaction_interval
Default: 1 day.
Set the minimum time to wait after an SSTable creation time before considering the SSTable for tombstone compaction. Tombstone compaction is the compaction triggered if the SSTable has more garbage-collectable tombstones than tombstone_threshold.

tombstone_threshold
Default: 0.2.
Set a ratio of garbage-collectable tombstones to all contained columns, which if exceeded by the SSTable triggers compaction (with no other SSTables) for the purpose of purging the tombstones.

unchecked_tombstone_compaction
Default: false.
Setting this to true enables more aggressive than normal tombstone compactions: a single-SSTable tombstone compaction runs without checking the likelihood of success.

cold_reads_to_omit
Using SizeTieredCompactionStrategy, you can configure the maximum percentage of reads/sec that ignored SSTables may account for. The recommended range of values is 0.0 to 1.0. In Cassandra 2.1 and later, Cassandra ignores the coldest 5% of SSTables. In Cassandra 2.0.3 and later, cold_reads_to_omit is 0.0 by default: all SSTables are compacted. You can increase the cold_reads_to_omit property value to tune performance per table. A value of 1.0 completely disables compaction. The "Optimizations around Cold SSTables" blog includes detailed information about tuning performance using this property, which avoids compacting cold SSTables. Use the ALTER TABLE command to configure cold_reads_to_omit.
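For instance, a brief sketch (the table name is hypothetical):

ALTER TABLE users
  WITH compaction = { 'class' : 'SizeTieredCompactionStrategy',
                      'cold_reads_to_omit' : 0.05 };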

Enabling and disabling background compaction
The following example uses the enabled subproperty to disable background compaction:

ALTER TABLE mytable
  WITH COMPACTION = {'class': 'SizeTieredCompactionStrategy', 'enabled': 'false'};

Disabling background compaction can prove harmful: disk space is never regained, and zombies (deleted data that resurfaces as live data) are possible. In most cases, although compaction uses I/O, it is better to leave it enabled rather than disabling it.

Compression subproperties Configuring compression for a table.

Using CQL, you can configure compression for a table by constructing a map of the compression property and the following subproperties:

Table: CQL Compression Subproperties

class
Default: LZ4Compressor.
The compression algorithm to use. Valid values are LZ4Compressor, SnappyCompressor, and DeflateCompressor. See sstable_compression below.

chunk_length_kb
Default: 64KB.
On disk, SSTables are compressed by block to allow random reads. This subproperty of compression defines the size (in KB) of the block. Values larger than the default value might improve the compression rate, but increase the minimum size of data to be read from disk when a read occurs. The default value is a good middle ground for compressing tables. Adjust the compression size to account for read/write access patterns (how much data is typically requested at once) and the average size of rows in the table.

crc_check_chance
Default: 1.0.
When compression is enabled, each compressed block includes a checksum of that block for the purpose of detecting disk bitrot and avoiding the propagation of corruption to other replicas. This option defines the probability with which those checksums are checked during read. By default they are always checked. Set to 0 to disable checksum checking, or to 0.5, for instance, to check them on every other read.

sstable_compression
The compression algorithm to use. Valid values are LZ4Compressor, SnappyCompressor, and DeflateCompressor. Use an empty string ('') to disable compression:

ALTER TABLE mytable WITH COMPRESSION = {'sstable_compression': ''};

Choosing the right compressor depends on your requirements for space savings over read performance. LZ4 is fastest to decompress, followed by Snappy, then by Deflate. Compression effectiveness is inversely correlated with decompression speed. The extra compression from Deflate or Snappy is not enough to make up for the decreased performance for general-purpose workloads, but for archival data they may be worth considering. Developers can also implement custom compression classes using the org.apache.cassandra.io.compress.ICompressor interface. Specify the full class name as a string constant.

Functions CQL supports functions for transforming one or more column values into a new value.

CQL supports several functions that transform one or more column values into a new value. Aggregation functions are not supported.

• Blob conversion functions
• UUID and Timeuuid functions
• Token function
• WRITETIME function
• TTL function

Use the token function to compute the token for a given partition key. The exact signature of the token function depends on the table and the partitioner used by the cluster. The type of the arguments to the token function depends on the type of the partition key columns. The return type depends on the partitioner in use:

• Murmur3Partitioner: bigint
• RandomPartitioner: varint
• ByteOrderedPartitioner: blob

For instance, in a cluster using the default Murmur3Partitioner, the token function that computes the token for the partition key of this table takes a single argument of type text. The partition key is userid. There is no clustering column, so the partition key is the same as the primary key, and the return type is bigint.

CREATE TABLE users (
  userid text PRIMARY KEY,
  username text,
  ...
)


Regardless of the partitioner in use, Cassandra does not support non-equal conditional operations on the partition key. Use the token function for range queries on the partition key.
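For example, a hedged sketch of a token range query against the users table above (the literal values are illustrative):

SELECT * FROM users
  WHERE token(userid) > token('tom')
  AND token(userid) < token('bob');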

CQL limits Upper CQL limits.

Observe the following upper limits:

• Columns in a partition: 2B (2^31); single column value size: 2 GB (1 MB is recommended)
• Clustering column value, length of: 65535 (2^16 - 1)
• Key length: 65535 (2^16 - 1)
• Query parameters in a query: 65535 (2^16 - 1)
• Statements in a batch: 65535 (2^16 - 1)
• Fields in a tuple: 32768 (2^15) (just a few fields, such as 2-10, are recommended)
• Collection (List): collection size: 2B (2^31); values size: 65535 (2^16 - 1) (Cassandra 2.1 and later, using native protocol v3)
• Collection (Set): collection size: 2B (2^31); values size: 65535 (2^16 - 1) (Cassandra 2.1 and later, using native protocol v3)
• Collection (Map): collection size: 2B (2^31); number of keys: 65535 (2^16 - 1); values size: 65535 (2^16 - 1) (Cassandra 2.1 and later, using native protocol v3)
• Blob size: 2 GB (less than 1 MB is recommended)

Note: The limits specified for collections are for non-frozen collections.

cqlsh commands The cqlsh commands.

cqlsh Start the CQL interactive terminal.

Start the CQL interactive terminal.

Synopsis

cqlsh [options] [host [port]]
python cqlsh [options] [host [port]]

Description The Cassandra installation includes the cqlsh utility, a python-based command line client for executing Cassandra Query Language (CQL) commands. The cqlsh command is used on the Linux or Windows command line to start the cqlsh utility. On Windows, the keyword python is used if the PATH environment variable does not point to the python installation. You can use cqlsh to execute CQL commands interactively. cqlsh supports tab completion. You can also execute cqlsh commands, such as TRACE.

Requirements
In Cassandra 2.1+, the cqlsh utility uses the native protocol. In Cassandra 2.1+, which uses the DataStax Python driver, the default cqlsh listen port is 9042. In Cassandra 2.0, the cqlsh utility uses the Thrift transport. In Cassandra 2.0.x, the default cqlsh listen port is 9160. By default, Cassandra 2.0.x and earlier enables Thrift by configuring start_rpc to true in the cassandra.yaml file. The cqlsh utility uses the Thrift RPC service. Also, firewall configuration to allow access through the Thrift port might be required. For more information about configuration, see the Cassandra 2.2 cassandra.yaml or Cassandra 2.0 cassandra.yaml file.

Options

-C, --color
Always use color output.

--debug
Show additional debugging information.

--cqlshrc path
Use an alternative cqlshrc file location, path. (Cassandra 2.1.1)

-e cql_statement, --execute cql_statement
Accept and execute a CQL command in Cassandra 2.1 and later. Useful for saving CQL output to a file.

-f file_name, --file=file_name
Execute commands from file_name, then exit.

-h, --help
Show the online help about these options and exit.

-k keyspace_name
Use the given keyspace. Equivalent to issuing a USE keyspace command immediately after starting cqlsh.

--no-color
Never use color output.

-p password
Authenticate using password. Default = cassandra.

-t transport_factory_name, --transport=transport_factory_name
Use the provided Thrift transport factory function.

-u user_name
Authenticate as user. Default = cassandra.

--version
Show the cqlsh version.

Creating and using the cqlshrc file
You can create a cqlshrc file that resides in the hidden .cassandra directory in your home directory. When present, the file passes default configuration information to cqlsh. A sample file looks like this:

; Sample ~/.cassandra/cqlshrc file.
[authentication]
username = fred
password = !!bang!!

The Cassandra installation includes a cqlshrc.sample file in the conf directory of a tarball installation. On Windows, in Command Prompt, create this file by copying the cqlshrc.sample file from the conf directory to the hidden .cassandra folder in your user home folder, and renaming it to cqlshrc. You can use a cqlshrc file to configure SSL encryption instead of overriding the SSL_CERTFILE environmental variables repeatedly. Cassandra internal authentication must be configured before users can use the authentication options.


cqlshrc options
You configure the cqlshrc file by setting these options in the [authentication], [ui], or [ssl] sections of the file.

[authentication] options are:

keyspace
Use the given keyspace. Equivalent to issuing a USE keyspace command immediately after starting cqlsh.

password
Authenticate using password.

username
Authenticate as user.

[connection] option (Cassandra 2.1.1+) is:

client_timeout
Configures the cqlsh timeout in seconds. Set to None or to the number of seconds of inactivity that triggers a timeout.

[csv] option (Cassandra 2.0.15+, 2.1.5+) is:

field_size_limit
Configures the cqlsh field size. Set to a particular field size. For instance, field_size_limit = 1000000000.

[ssl] options are covered in the Cassandra documentation.

[ui] options are:

color
Always use color output.

completekey
Use this key for autocompletion of a cqlsh shell entry. Default is the tab key.

float_precision
Use this many decimal digits of precision. Default = 5.

time_format
Configure the output format of database objects using Python strftime syntax.
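Putting several of these options together, a hedged sample cqlshrc (the values are illustrative):

; ~/.cassandra/cqlshrc
[connection]
client_timeout = 10

[ui]
float_precision = 5
time_format = %Y-%m-%d %H:%M:%S%z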

Using CQL commands
On startup, cqlsh shows the name of the cluster, IP address, and the port used for connection to the cqlsh utility. The cqlsh prompt initially is cqlsh>. After you specify a keyspace to use, the prompt includes the name of the keyspace. For example:

cqlsh 1.2.3.4 9042 -u jdoe -p mypassword

Connected to Test Cluster at 1.2.3.4:9042.
[cqlsh 5.0.1 | Cassandra 2.2.0 | CQL spec 3.3.0 | Native protocol v4]
Use HELP for help.
cqlsh> USE mykeyspace;
cqlsh:mykeyspace>

At the cqlsh prompt, type CQL commands. Use a semicolon to terminate a command. A new line does not terminate a command, so commands can be spread over several lines for clarity.

cqlsh> USE demo_cl;
cqlsh:demo_cl> SELECT * FROM demo_table
           ... WHERE id = 0;

If a command is sent and executed successfully, results are sent to standard output.


The lexical structure of commands, covered earlier in this reference, includes how upper- and lower-case literals are treated in commands, when to use quotation marks in strings, and how to enter exponential notation.

Saving CQL output in a file
Using the -e option to the cqlsh command, followed by a CQL statement enclosed in quotation marks, accepts and executes the CQL statement. For example, to save the output of a SELECT statement to myoutput.txt:

cqlsh -e "SELECT * FROM mytable" > myoutput.txt

Using files as input To execute CQL commands in a file, use the -f option and the path to the file on the operating system command line. Or, after you start cqlsh, use the SOURCE command and the path to the file on the cqlsh command line.

cqlsh environment variables
You can override the default cqlsh host and listen port by setting the CQLSH_HOST and CQLSH_PORT environment variables. You set CQLSH_HOST to a host name or IP address. When configured, cqlsh uses the variables instead of the default values of localhost and port 9042 (Cassandra 2.1 or later) or 9160 (Cassandra 2.0.x). A host and port number given on the command line takes precedence over configured variables.

Related information
Cassandra 2.0 cassandra.yaml
Cassandra 2.1 cassandra.yaml

CAPTURE Captures command output and appends it to a file.

Captures command output and appends it to a file.

Synopsis

CAPTURE ( 'file' | OFF )

Synopsis Legend

• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.


Description
To start capturing the output of a query, specify the path of the file relative to the current directory. Enclose the file name in single quotation marks. The shorthand notation in this example is supported for referring to $HOME.

Examples

CAPTURE '~/mydir/myfile.txt'

Output is not shown on the console while it is captured. Only query result output is captured. Errors and output from cqlsh-only commands still appear. To stop capturing output and return to normal display of output, use CAPTURE OFF. To determine the current capture state, use CAPTURE with no arguments.

CLEAR Clears cqlsh console screen.

Clears cqlsh console screen.

Synopsis

CLEAR | CLS

Synopsis Legend

• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description
Clears the cqlsh console screen. Pressing CTRL+L also clears the cqlsh console screen.

CONSISTENCY Shows the current consistency level, or given a level, sets it.

Shows the current consistency level, or given a level, sets it.

Synopsis

CONSISTENCY level

Synopsis Legend

• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

Description
Providing an argument to the CONSISTENCY command overrides the default consistency level of ONE, and configures the consistency level for future requests. Valid values are: ANY, ONE, TWO, THREE, QUORUM, ALL, LOCAL_QUORUM, EACH_QUORUM, SERIAL and LOCAL_SERIAL. Running the command without an argument shows the current consistency level.

Examples
Use CONSISTENCY without arguments to discover the current consistency level.

cqlsh> CONSISTENCY

If you haven't changed the default, the output of the CONSISTENCY command with no arguments is:

Current consistency level is ONE.

Use CONSISTENCY with an argument to set the consistency level to a new value.

cqlsh> CONSISTENCY QUORUM
Consistency level set to QUORUM.

The consistency level can be set to SERIAL or LOCAL_SERIAL, but only SELECT queries are supported. While the consistency level is set to either of these options, INSERT and UPDATE commands will fail. See SERIAL CONSISTENCY to set the serial consistency level.

Note: Consistency level refers to the consistency level for all non-lightweight transaction reads and writes. Serial consistency level refers to the consistency level of lightweight transaction reads and writes where IF EXISTS and IF NOT EXISTS are used.

cqlsh> CONSISTENCY SERIAL
Consistency level set to SERIAL.

Attempt to INSERT some data.

cqlsh> INSERT INTO bar (foo) VALUES ('foo');
InvalidRequest: code=2200 [Invalid query] message="You must use conditional updates for serializable writes"

cqlsh> INSERT INTO bar (foo) VALUES ('foo') IF NOT EXISTS;
InvalidRequest: code=2200 [Invalid query] message="SERIAL is not supported as conditional update commit consistency. Use ANY if you mean "make sure it is accepted by I don't care how many replicas commit it for non-SERIAL reads""

COPY Imports and exports CSV (comma-separated values) data to and from Cassandra.

Imports and exports CSV (comma-separated values) data to and from Cassandra.


Synopsis

COPY table_name ( column, ... )
  FROM ( 'file_name' | STDIN )
  WITH option = 'value' AND ...

COPY table_name ( column, ... )
  TO ( 'file_name' | STDOUT )
  WITH option = 'value' AND ...

Synopsis Legend

• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

Description
Using the COPY options in a WITH clause, you can change the CSV format. These tables describe the available options:

Table: Common COPY options for TO and FROM

DELIMITER
Default: comma (,).
Set the character that separates fields having newline characters in the file.

QUOTE
Default: quotation mark (").
Set the character that encloses field values.

ESCAPE
Default: backslash (\).
Set the character that escapes literal uses of the QUOTE character.

HEADER
Default: false.
Set true to indicate that the first row of the file is a header.

NULL
Default: an empty string.
Represents the absence of a value.

DATETIMEFORMAT
Default: '%Y-%m-%d %H:%M:%S%z'.
Set the time format for reading or writing CSV time data. The timestamp uses the strftime format. If not set, the default value is set to the time_format value in the cqlshrc file.

MAXATTEMPTS
Default: 5.
Set the maximum number of attempts for errors.

REPORTFREQUENCY
Default: 0.25.
Set the frequency with which status is displayed, in seconds.

DECIMALSEP
Default: period (.).
Set a separator for decimal values.

THOUSANDSSEP
Default: None.
Set a separator for thousands digit groups.

BOOLSTYLE
Default: True, False.
Set a representation for boolean values for True and False. The values are case insensitive. Example: yes,no or 1,0.

NUMPROCESSES
Default: number of cores - 1.
Set the number of worker processes. Maximum value is 16.

CONFIGFILE
Default: None.
Specify a configuration file with the same format as the cqlshrc file to set WITH options. The following sections can be specified: [copy], [copy-to], [copy-from], [copy:ks.table], [copy-to:ks.table], [copy-from:ks.table], where ks is the keyspace name and table is the table name. Options are read from these sections, in the order specified above. Command line options always override options in configuration files. Depending on the COPY direction, only the relevant copy-from or copy-to sections are used. If no configuration file is specified, the cqlshrc file is searched instead.

RATEFILE
Default: None.
Specify a file for printing output statistics.

Table: COPY FROM options

CHUNKSIZE
Default: 1,000.
Set the size of chunks passed to worker processes.

INGESTRATE
Default: 100,000.
Set an approximate ingest rate in rows per second. Must be set to a greater value than chunk size.

MAXBATCHSIZE
Default: 20.
Set the maximum size of an import batch.

MINBATCHSIZE
Default: 2.
Set the minimum size of an import batch.

MAXROWS
Default: -1.
Set the maximum number of rows. "-1" sets no maximum.

SKIPROWS
Default: 0.
Set the number of rows to skip.

SKIPCOLS
Default: None.
Set a comma-separated list of column names to skip.

MAXPARSEERRORS
Default: -1.
Set the maximum global number of parsing errors. "-1" sets no maximum.

MAXINSERTERRORS
Default: -1.
Set the maximum global number of insert errors. "-1" sets no maximum.

ERRFILE
Default: None.
Set a file to store all rows that are not imported. If no value is set, the information is stored in import_ks_table.err, where ks is the keyspace and table is the table name.

TTL
Default: 3600.
Set the time to live in seconds. By default, data will not expire.

Table: COPY TO options

ENCODING
Default: UTF8.
Set the COPY TO command to output unicode strings.

PAGESIZE
Default: 1,000.
Set the page size for fetching results.

PAGETIMEOUT
Default: 10.
Set the page timeout for fetching results.

BEGINTOKEN
Default: None.
Set the minimum token string for exporting data.

ENDTOKEN
Default: None.
Set the maximum token string for exporting data.

MAXREQUESTS
Default: 6.
Set the maximum number of requests each worker process can work on in parallel.

MAXOUTPUTSIZE
Default: -1.
Set the maximum size of the output file, measured in number of lines. If a value is set, the output file will be split into segments when the value is exceeded. "-1" sets no maximum.

The ENCODING option is available only for the COPY TO command. As the tables show, by default, Cassandra expects the CSV data to consist of fields separated by commas (,), records separated by line separators (a newline, \r\n), and field values enclosed in double quotation marks (""). Also, to avoid ambiguity, escape a literal double quotation mark using a backslash inside a string enclosed in double quotation marks ("\""). By default, Cassandra does not expect the CSV file to have a header record on the first line that consists of the column names. COPY TO includes the header in the output if HEADER=TRUE. COPY FROM ignores the first line if HEADER=TRUE. You cannot copy data to or from counter tables.

COPY FROM a CSV file
By default, when you use the COPY FROM command, Cassandra expects every row in the CSV input to contain the same number of columns. The number of columns in the CSV input is the same as the number of columns in the Cassandra table metadata. Cassandra assigns fields in the respective order. To apply your input data to a particular set of columns, specify the column names in parentheses after the table name.

COPY FROM loads rows from a CSV file in a parallel non-deterministic order. Empty data for a column is treated by default as a NULL value and will overwrite an existing value; if data is overwritten, unexpected results can occur. Data can be successfully loaded but with non-deterministic random results if there is more than one row in the CSV file with the same primary key. Having more than one row with the same primary key is not explicitly checked, so unintended results can occur.

COPY FROM is intended for importing small datasets (a few million rows or less) into Cassandra. For importing larger datasets, use the Cassandra bulk loader.

COPY TO a CSV file
For example, assume you have the following table in CQL:

cqlsh> SELECT * FROM test.airplanes;

 name          | mach | manufacturer | year
---------------+------+--------------+------
 P38-Lightning |  0.7 | Lockheed     | 1937

After inserting data into the table, you can copy the data to a CSV file in another order by specifying the column names in parentheses after the table name:

COPY airplanes (name, mach, year, manufacturer) TO 'temp.csv'


Specifying the source or destination files
Specify the source file of the CSV input or the destination file of the CSV output by a file path. Alternatively, you can use the STDIN or STDOUT keywords to import from standard input and export to standard output. When using STDIN, signal the end of the CSV data with a backslash and period ("\.") on a separate line. If the data is being imported into a table that already contains data, COPY FROM does not truncate the table beforehand.

You can copy only a partial set of columns. Specify the entire set or a subset of column names in parentheses after the table name in the order you want to import or export them. By default, when you use the COPY TO command, Cassandra copies data to the CSV file in the order defined in the Cassandra table metadata. You can also omit listing the column names when you want to import or export all the columns in the order they appear in the source table or CSV file.

Roundtrip copying of a simple table
Copy a table to a CSV file.

1. Using CQL, create a table named airplanes and copy it to a CSV file.

CREATE KEYSPACE test
  WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1 };

USE test;

CREATE TABLE airplanes (
  name text PRIMARY KEY,
  manufacturer ascii,
  year int,
  mach float
);

INSERT INTO airplanes (name, manufacturer, year, mach)
  VALUES ('P38-Lightning', 'Lockheed', 1937, 0.7);

COPY airplanes (name, manufacturer, year, mach) TO 'temp.csv';
1 rows exported in 0.004 seconds.

2. Clear the data from the airplanes table and import the data from the temp.csv file.

TRUNCATE airplanes;

COPY airplanes (name, manufacturer, year, mach) FROM 'temp.csv';
1 rows imported in 0.087 seconds.

Copy data from standard input to a table.

1. Enter data directly during an interactive cqlsh session, using the COPY command defaults.

TRUNCATE airplanes;

COPY airplanes (name, manufacturer, year, mach) FROM STDIN;

The output is:

[Use \. on a line by itself to end input]
[copy]


2. At the [copy] prompt, enter the following data:

'F-14D Super Tomcat', Grumman, 1987, 2.34
'MiG-23 Flogger', Russian-made, 1964, 2.35
'Su-27 Flanker', U.S.S.R., 1981, 2.35
\.

3. Query the airplanes table to see data imported from STDIN:

SELECT * FROM airplanes;

Output is:

 name               | manufacturer | year | mach
--------------------+--------------+------+------
 F-14D Super Tomcat | Grumman      | 1987 | 2.34
 P38-Lightning      | Lockheed     | 1937 |  0.7
 Su-27 Flanker      | U.S.S.R.     | 1981 | 2.35

Copying collections
Cassandra supports round-trip copying of collections to and from CSV files. To perform this example, download the sample code now.

1. Unzip the downloaded file named cql_collections.zip.
2. Copy/paste all the CQL commands from the cql_collections.txt file to the cqlsh command line.
3. Take a look at the contents of the songs table. The table contains a map of venues, a list of reviews, and a set of tags.

cqlsh> SELECT * FROM music.songs;

id          |album|artist|data|reviews            |tags              |title|venue
------------+-----+------+----+-------------------+------------------+-----+-----------------------------------------------------------------------
7db1a490... | null| null |null|['hot dance music']|{'rock'}          | null|{'2013-09-22...': 'The Fillmore', '2013-10-01...': 'The Apple Barrel'}
a3e64f8f... | null| null |null| null              |{'1973', 'blues'} | null|null
8a172618... | null| null |null| null              |{'2007', 'covers'}| null|null

4. Copy the music.songs table to a CSV file named songs-20140603.csv.

cqlsh> COPY music.songs to 'songs-20140603.csv';
3 rows exported in 0.006 seconds.

5. Check that the copy operation worked.

cqlsh> exit;
$ cat songs-20140603.csv
7db1a490...,,,,['hot dance music'],{'rock'},,"{'2013-09-22...': 'The Fillmore', '2013-10-01....': 'The Apple Barrel'}"
a3e64f8f...,,,,,"{'1973', 'blues'}",,
8a172618...,,,,,"{'2007', 'covers'}",,


6. Start cqlsh again, and create a table definition that matches the data in the songs-20140603 file.

cqlsh> CREATE TABLE music.imported_songs (
  id uuid PRIMARY KEY,
  album text,
  artist text,
  data blob,
  reviews list<text>,
  tags set<text>,
  title text,
  venue map<timestamp, text>
);

7. Copy the data from the CSV file into the imported_songs table.

cqlsh> COPY music.imported_songs from 'songs-20140603.csv';
3 rows imported in 0.004 seconds.

DESCRIBE Provides information about the connected Cassandra cluster, or about the data objects stored in the cluster.

Provides information about the connected Cassandra cluster, or about the objects stored in the cluster.

Synopsis

DESCRIBE FULL ( CLUSTER | SCHEMA )
  | KEYSPACES
  | ( KEYSPACE keyspace_name )
  | TABLES
  | ( TABLE table_name )
  | TYPES
  | ( TYPE user_defined_type )
  | FUNCTIONS
  | ( FUNCTION user_defined_function )
  | AGGREGATES
  | ( AGGREGATE user_defined_aggregate )
  | INDEX
  | ( INDEX index_name )

Synopsis Legend

• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

Description
The DESCRIBE or DESC command outputs information about the connected Cassandra cluster, or about the objects stored in the cluster. To query the system tables directly, use SELECT. In Linux, the keyspace and table name arguments are case-sensitive and need to match the upper or lowercase names stored internally. In Windows, the keyspace and table name arguments are not case-sensitive. Use the DESCRIBE commands to list objects by their internal names. Use DESCRIBE FULL SCHEMA if you need the schema of system_* keyspaces.


DESCRIBE functions in the following ways:

DESCRIBE CLUSTER
Example: DESCRIBE CLUSTER;
Output information about the connected Cassandra cluster. Cluster name, partitioner, and snitch are output. For non-system keyspaces, the endpoint-range ownership information is also shown.

DESCRIBE KEYSPACES
Example: DESCRIBE KEYSPACES;
Output a list of all keyspace names.

DESCRIBE KEYSPACE keyspace_name
Example: DESCRIBE KEYSPACE cycling;
Output the CQL command that could be used to recreate the given keyspace, and the objects in it, such as the tables, types, and functions.

DESCRIBE [FULL] SCHEMA
Example: DESCRIBE SCHEMA;
Output the CQL command that could be used to recreate the entire non-system schema. Use the FULL option to also include system keyspaces.

DESCRIBE TABLES
Example: DESCRIBE TABLES;
Output a list of all tables in the current keyspace, or in all keyspaces if there is no current keyspace.

DESCRIBE TABLE keyspace_name.table_name
Example: DESCRIBE TABLE upcoming_calendar;
Output the CQL command that could be used to recreate the given table.

DESCRIBE INDEX keyspace_name.index_name
Example: DESCRIBE INDEX team_entry;
Output the CQL command that could be used to recreate the given index.

DESCRIBE TYPES
Example: DESCRIBE TYPES;
Output a list of all user-defined types in the current keyspace, or in all keyspaces if there is no current keyspace.

DESCRIBE TYPE keyspace_name.type_name
Example: DESCRIBE TYPE basic_info;
Output the CQL command that can be used to recreate the given user-defined type.

DESCRIBE FUNCTIONS
Example: DESCRIBE FUNCTIONS;
Output the names of all user-defined functions in the given keyspace, or in all keyspaces if there is no current keyspace.

DESCRIBE FUNCTION keyspace_name.function_name
Example: DESCRIBE FUNCTION cycling.myFunction;
Output the CQL command that could be used to recreate the given user-defined function.

DESCRIBE AGGREGATES
Example: DESCRIBE AGGREGATES;
Output a list of all user-defined aggregates in the given keyspace, or in all keyspaces if there is no current keyspace.

DESCRIBE AGGREGATE keyspace_name.aggregate_name
Example: DESCRIBE AGGREGATE cycling.myAggregate;
Output the CQL command that could be used to recreate the given user-defined aggregate.

DESCRIBE MATERIALIZED VIEW view_name
Example: DESCRIBE MATERIALIZED VIEW cyclistsByAge;
Output the CQL command that could be used to recreate the given materialized view.


EXPAND Formats the output of a query vertically.

Formats the output of a query vertically.

Synopsis

EXPAND ( ON | OFF )

Synopsis Legend

• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

Description
This command lists the contents of each row of a table vertically, providing a more convenient way to read long rows of data than the default horizontal format. You scroll down to see more of the row instead of scrolling to the right. Each column name appears on a separate line in column one and the values appear in column two. Sample output of EXPAND ON is:

cqlsh:my_ks> EXPAND ON
Now printing expanded output
cqlsh:my_ks> SELECT * FROM users;

@ Row 1
------------+----------------------------------------------
 userid     | samwise
 emails     | {'[emailprotected]', '[emailprotected]'}
 first_name | Samwise
 last_name  | Gamgee
 todo       | {'2013-09-22 12:01:00-0700': 'plant trees'}
 top_scores | null

@ Row 2
------------+----------------------------------------------
 userid     | frodo
 emails     | {'[emailprotected]', '[emailprotected]'}
 first_name | Frodo
 last_name  | Baggins
 todo       | {'2012-10-02 12:00:00-0700': 'throw my precious into mount doom'}
 top_scores | null

(2 rows)

EXIT Terminates cqlsh.

Terminates cqlsh.


Synopsis

EXIT | QUIT

Synopsis Legend

• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

PAGING Enables or disables query paging.

Enables or disables query paging.

Synopsis

PAGING ( ON | OFF )

Synopsis Legend
In the synopsis section of each statement, formatting has the following meaning:

• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description
In Cassandra 2.1.1, you can use query paging in cqlsh. Paging provides the output of queries in 100-line chunks followed by the more prompt. To get the next chunk, press the space bar. Turning paging on enables query paging for all further queries. Using no ON or OFF argument with the command shows the current query paging status.

SERIAL CONSISTENCY Sets the current serial consistency level.

Sets the current serial consistency level that will be used for lightweight transactions. The serial consistency level is different from the consistency level for non-lightweight transaction commands.

Synopsis

SERIAL CONSISTENCY level

Synopsis Legend

• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

Description
Providing an argument to the SERIAL CONSISTENCY command sets the serial consistency level. Valid values are: SERIAL and LOCAL_SERIAL.

Examples
Use SERIAL CONSISTENCY without an argument to set the serial consistency level to SERIAL.

cqlsh> SERIAL CONSISTENCY
Serial consistency level set to SERIAL.

cqlsh> SERIAL CONSISTENCY LOCAL_SERIAL
Serial consistency level set to LOCAL_SERIAL.

Attempt to INSERT some data.

cqlsh> INSERT INTO bar (foo) VALUES ('foo') IF NOT EXISTS;

 [APPLIED] | foo
-----------+-----
     FALSE | foo

A value of FALSE is returned if the value of foo already exists.

SHOW Shows the Cassandra version, host, or tracing information for the current cqlsh client session.

Shows the Cassandra version, host, or tracing information for the current cqlsh client session.

Synopsis

SHOW VERSION | HOST | SESSION tracing_session_id

Synopsis Legend

• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

Description
A SHOW command displays this information about the current cqlsh client session:

• The version and build number of the connected Cassandra instance, as well as the CQL mode for cqlsh and the native protocol version used by the connected Cassandra instance.
• The host information of the Cassandra node that the cqlsh session is currently connected to.
• Tracing information for the current cqlsh session.

The SHOW SESSION command retrieves tracing session information, which is available for 24 hours. After that time, the tracing information time-to-live expires. These examples show how to use the commands.

cqlsh:my_ks> SHOW version
[cqlsh 5.0.1 | Cassandra 2.1.0 | CQL spec 3.3 | Native protocol v3]

cqlsh:my_ks> SHOW host
Connected to Test Cluster at 127.0.0.1:9042.

cqlsh:my_ks> SHOW SESSION d0321c90-508e-11e3-8c7b-73ded3cb6170

Sample output of SHOW SESSION is:

Tracing session: d0321c90-508e-11e3-8c7b-73ded3cb6170

 activity                                                 | timestamp    | source    | source_elapsed
----------------------------------------------------------+--------------+-----------+----------------
 execute_cql3_query                                       | 12:19:52,372 | 127.0.0.1 |              0
 Parsing CREATE TABLE emp (\n empID int,\n deptID int,\n
 first_name varchar,\n last_name varchar,\n
 PRIMARY KEY (empID, deptID)\n);                          | 12:19:52,372 | 127.0.0.1 |            153
 Request complete                                         | 12:19:52,372 | 127.0.0.1 |            650
 . . .

SOURCE Executes a file containing CQL statements.

Executes a file containing CQL statements.

Synopsis

SOURCE 'file'

Synopsis Legend

• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.


Description To execute the contents of a file, specify the path of the file relative to the current directory. Enclose the file name in single quotation marks. The shorthand notation in this example is supported for referring to $HOME:

Examples
SOURCE '~/mydir/myfile.txt'

The output for each statement, if there is any, appears in turn, including any error messages. Errors do not abort execution of the file. Alternatively, use the --file option when starting cqlsh to execute a file.
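For example (a minimal sketch; the file path is illustrative):

$ cqlsh --file '~/mydir/myfile.txt'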

TRACING Enables or disables request tracing.


Synopsis TRACING ( ON | OFF )

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

Description
To turn tracing of read/write requests on or off, use the TRACING command. After turning on tracing, database activity creates output that can help you understand Cassandra internal operations and troubleshoot performance problems. For example, using the tracing tutorial you can see how different consistency levels affect operations. Some tracing messages refer to internals of the database that users may not understand, but can provide to the Cassandra team for troubleshooting.

For 24 hours, Cassandra saves the tracing information in the sessions and events tables in the system_traces keyspace, which you query when using probabilistic tracing. For information about probabilistic tracing, see the Cassandra 3.0 documentation or Cassandra 2.2 documentation.

CREATE TABLE sessions (
  session_id uuid PRIMARY KEY,
  coordinator inet,
  duration int,
  parameters map<text, text>,
  request text,
  started_at timestamp
);

CREATE TABLE events (
  session_id uuid,
  event_id timeuuid,
  activity text,
  source inet,
  source_elapsed int,
  thread text,
  PRIMARY KEY (session_id, event_id)
);

The source_elapsed column stores the elapsed time in microseconds before the event occurred on the source node. To keep tracing information, copy the data in the sessions and events tables to another location. Alternatively, use the tracing session id to retrieve session information using SHOW SESSION. Tracing session information expires after one day.
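Because the tracing data lives in ordinary tables, you can query it directly with CQL; for example (a minimal sketch, assuming at least one traced request has run):

SELECT session_id, coordinator, duration, request
FROM system_traces.sessions;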

Tracing a write request
This example shows tracing activity on a 3-node cluster created by ccm on Mac OS X. Using a keyspace that has a replication factor of 3 and an employee table similar to the one in "Using a compound primary key," the tracing shows that the coordinator performs the following actions:
• Identifies the target nodes for replication of the row.
• Writes the row to the commitlog and memtable.
• Confirms completion of the request.

TRACING ON

INSERT INTO emp (empID, deptID, first_name, last_name)
VALUES (104, 15, 'jane', 'smith');

Cassandra provides a description of each step it takes to satisfy the request, the names of nodes that are affected, the time for each step, and the total time for the request.

Tracing session: 740b9c10-3506-11e2-0000-fe8ebeead9ff

 activity                            | timestamp    | source    | source_elapsed
-------------------------------------+--------------+-----------+----------------
 execute_cql3_query                  | 16:41:00,754 | 127.0.0.1 |              0
 Parsing statement                   | 16:41:00,754 | 127.0.0.1 |             48
 Preparing statement                 | 16:41:00,755 | 127.0.0.1 |            658
 Determining replicas for mutation   | 16:41:00,755 | 127.0.0.1 |            979
 Message received from /127.0.0.1    | 16:41:00,756 | 127.0.0.3 |             37
 Acquiring switchLock read lock      | 16:41:00,756 | 127.0.0.1 |           1848
 Sending message to /127.0.0.3       | 16:41:00,756 | 127.0.0.1 |           1853
 Appending to commitlog              | 16:41:00,756 | 127.0.0.1 |           1891
 Sending message to /127.0.0.2       | 16:41:00,756 | 127.0.0.1 |           1911
 Adding to emp memtable              | 16:41:00,756 | 127.0.0.1 |           1997
 Acquiring switchLock read lock      | 16:41:00,757 | 127.0.0.3 |            395
 Message received from /127.0.0.1    | 16:41:00,757 | 127.0.0.2 |             42
 Appending to commitlog              | 16:41:00,757 | 127.0.0.3 |            432
 Acquiring switchLock read lock      | 16:41:00,757 | 127.0.0.2 |            168
 Adding to emp memtable              | 16:41:00,757 | 127.0.0.3 |            522
 Appending to commitlog              | 16:41:00,757 | 127.0.0.2 |            211
 Adding to emp memtable              | 16:41:00,757 | 127.0.0.2 |            359
 Enqueuing response to /127.0.0.1    | 16:41:00,758 | 127.0.0.3 |           1282
 Enqueuing response to /127.0.0.1    | 16:41:00,758 | 127.0.0.2 |           1024
 Sending message to /127.0.0.1       | 16:41:00,758 | 127.0.0.3 |           1469
 Sending message to /127.0.0.1       | 16:41:00,758 | 127.0.0.2 |           1179
 Message received from /127.0.0.2    | 16:41:00,765 | 127.0.0.1 |          10966
 Message received from /127.0.0.3    | 16:41:00,765 | 127.0.0.1 |          10966
 Processing response from /127.0.0.2 | 16:41:00,765 | 127.0.0.1 |          11063
 Processing response from /127.0.0.3 | 16:41:00,765 | 127.0.0.1 |          11066
 Request complete                    | 16:41:00,765 | 127.0.0.1 |          11139

Tracing a sequential scan Due to the log structured design of Cassandra, a single row is spread across multiple SSTables. Reading one row involves reading pieces from multiple SSTables, as shown by this trace of a request to read the employee table, which was pre-loaded with 10 rows of data. SELECT * FROM emp; Output is:

 empid | deptid | first_name | last_name
-------+--------+------------+-----------
   110 |     16 |      naoko |     murai
   105 |     15 |       john |     smith
   111 |     15 |       jane |     thath
   113 |     15 |       lisa |     amato
   112 |     20 |       mike |     burns
   107 |     15 |     sukhit |       ran
   108 |     16 |        tom |     brown
   109 |     18 |        ann |     green
   104 |     15 |       jane |     smith
   106 |     15 |        bob |     jones

The tracing output of this read request looks something like this (a few rows have been truncated to fit on this page):

Tracing session: bf5163e0-350f-11e2-0000-fe8ebeead9ff

 activity                                        | timestamp    | source    | source_elapsed
-------------------------------------------------+--------------+-----------+----------------
 execute_cql3_query                              | 17:47:32,511 | 127.0.0.1 |              0
 Parsing statement                               | 17:47:32,511 | 127.0.0.1 |             47
 Preparing statement                             | 17:47:32,511 | 127.0.0.1 |            249
 Determining replicas to query                   | 17:47:32,511 | 127.0.0.1 |            383
 Sending message to /127.0.0.2                   | 17:47:32,512 | 127.0.0.1 |            883
 Message received from /127.0.0.1                | 17:47:32,512 | 127.0.0.2 |             33
 Executing seq scan across 0 sstables for . . .  | 17:47:32,513 | 127.0.0.2 |            670
 Read 1 live cells and 0 tombstoned              | 17:47:32,513 | 127.0.0.2 |            964
 Read 1 live cells and 0 tombstoned              | 17:47:32,514 | 127.0.0.2 |           1268
 Read 1 live cells and 0 tombstoned              | 17:47:32,514 | 127.0.0.2 |           1502
 Read 1 live cells and 0 tombstoned              | 17:47:32,514 | 127.0.0.2 |           1673
 Scanned 4 rows and matched 4                    | 17:47:32,514 | 127.0.0.2 |           1721
 Enqueuing response to /127.0.0.1                | 17:47:32,514 | 127.0.0.2 |           1742
 Sending message to /127.0.0.1                   | 17:47:32,514 | 127.0.0.2 |           1852
 Message received from /127.0.0.2                | 17:47:32,515 | 127.0.0.1 |           3776
 Processing response from /127.0.0.2             | 17:47:32,515 | 127.0.0.1 |           3900
 Sending message to /127.0.0.2                   | 17:47:32,665 | 127.0.0.1 |         153535
 Message received from /127.0.0.1                | 17:47:32,665 | 127.0.0.2 |             44
 Executing seq scan across 0 sstables for . . .  | 17:47:32,666 | 127.0.0.2 |           1068
 Read 1 live cells and 0 tombstoned              | 17:47:32,667 | 127.0.0.2 |           1454
 Read 1 live cells and 0 tombstoned              | 17:47:32,667 | 127.0.0.2 |           1640
 Scanned 2 rows and matched 2                    | 17:47:32,667 | 127.0.0.2 |           1694
 Enqueuing response to /127.0.0.1                | 17:47:32,667 | 127.0.0.2 |           1722
 Sending message to /127.0.0.1                   | 17:47:32,667 | 127.0.0.2 |           1825
 Message received from /127.0.0.2                | 17:47:32,668 | 127.0.0.1 |         156454
 Processing response from /127.0.0.2             | 17:47:32,668 | 127.0.0.1 |         156610
 Executing seq scan across 0 sstables for . . .  | 17:47:32,669 | 127.0.0.1 |         157387
 Read 1 live cells and 0 tombstoned              | 17:47:32,669 | 127.0.0.1 |         157729
 Read 1 live cells and 0 tombstoned              | 17:47:32,669 | 127.0.0.1 |         157904
 Read 1 live cells and 0 tombstoned              | 17:47:32,669 | 127.0.0.1 |         158054
 Read 1 live cells and 0 tombstoned              | 17:47:32,669 | 127.0.0.1 |         158217
 Scanned 4 rows and matched 4                    | 17:47:32,669 | 127.0.0.1 |         158270
 Request complete                                | 17:47:32,670 | 127.0.0.1 |         159525

The sequential scan across the cluster shows:
• The first scan found 4 rows on node 2.
• The second scan found 2 more rows on node 2.
• The third scan found the 4 rows on node 1.

Related information Cassandra 3.0 probabilistic tracing Cassandra 2.2 probabilistic tracing

CQL commands

ALTER KEYSPACE Change property values of a keyspace.


Synopsis
ALTER ( KEYSPACE | SCHEMA ) keyspace_name
  WITH REPLICATION = map
  | ( WITH DURABLE_WRITES = ( true | false ))
  AND ( DURABLE_WRITES = ( true | false ))

map is a map collection, a JSON-style array of literals:

{ literal : literal , literal : literal, ... }

Synopsis legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description ALTER KEYSPACE changes the map that defines the replica placement strategy and/or the DURABLE_WRITES value. You can also use the alias ALTER SCHEMA. Use these properties and values to construct the map. To set the replica placement strategy, construct a map of properties and values, as shown in the table of map properties on the CREATE KEYSPACE reference page. CQL property map keys must be lower case. For example, class and replication_factor are correct. You cannot change the name of the keyspace.


Example
Change the definition of the "Excalibur" keyspace to use NetworkTopologyStrategy in a single data center. Use the default data center name in Cassandra and a replication factor of 3.

ALTER KEYSPACE "Excalibur"
  WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 };
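You can change the DURABLE_WRITES setting in the same way (a minimal sketch following the synopsis above; disabling durable writes risks data loss):

ALTER KEYSPACE "Excalibur"
  WITH DURABLE_WRITES = false;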

ALTER MATERIALIZED VIEW Modify the properties of a materialized view in Cassandra 3.0 and later.


Synopsis
ALTER MATERIALIZED VIEW keyspace_name.view_name
  ( WITH property AND property ... )

property is a CQL table property and value, such as speculative_retry = '10ms'. Enclose a string property in single quotation marks.

Synopsis legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description
ALTER MATERIALIZED VIEW manipulates the view metadata. You can change view properties. No results are returned. You can qualify view names by keyspace. For example, to alter the cyclist_by_age view in the cycling keyspace:

ALTER MATERIALIZED VIEW cycling.cyclist_by_age WITH ....

Modifying table properties
To change the materialized view storage properties established during creation of the materialized view, use ALTER MATERIALIZED VIEW and a WITH directive that includes the property name and value. To change multiple properties, use AND as shown in this example:

ALTER MATERIALIZED VIEW cycling.cyclist_by_age
  WITH comment = 'A most excellent and useful view'
  AND bloom_filter_fp_chance = 0.02;

Enclose a text property value in single quotation marks.


Modifying compression and compaction
Use a property map to alter a compression or compaction option.

ALTER MATERIALIZED VIEW cycling.cyclist_by_age
  WITH compression = { 'sstable_compression' : 'DeflateCompressor', 'chunk_length_kb' : 64 };

ALTER MATERIALIZED VIEW cycling.cyclist_by_age
  WITH compaction = { 'class' : 'SizeTieredCompactionStrategy', 'cold_reads_to_omit' : 0.05 };

Note: In Cassandra 3.0 and later, cold_reads_to_omit is removed for SizeTieredCompactionStrategy.

Changing caching
In Cassandra 2.1 and later, you create and change the caching options using a property map. For example, change the keys option from ALL, the default, to NONE and change the rows_per_partition to 15.

ALTER MATERIALIZED VIEW cycling.cyclist_by_age
  WITH caching = { 'keys' : 'NONE', 'rows_per_partition' : '15' };

Finally, take a look at the view definition.

DESCRIBE MATERIALIZED VIEW cycling.cyclist_by_age

CREATE MATERIALIZED VIEW cycling.cyclist_by_age AS
  SELECT age, cid, birthday, country, name
  FROM cycling.cyclist_mv
  WHERE age IS NOT NULL AND cid IS NOT NULL
  PRIMARY KEY (age, cid)
  WITH CLUSTERING ORDER BY (cid ASC)
  AND bloom_filter_fp_chance = 0.02
  AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
  AND comment = 'A most excellent and useful view'
  AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
  AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.DeflateCompressor'}
  AND crc_check_chance = 1.0
  AND dclocal_read_repair_chance = 0.1
  AND default_time_to_live = 0
  AND gc_grace_seconds = 864000
  AND max_index_interval = 2048
  AND memtable_flush_period_in_ms = 0
  AND min_index_interval = 128
  AND read_repair_chance = 0.0
  AND speculative_retry = '99PERCENTILE';

ALTER ROLE Alter a role.

Alter existing role options.

Synopsis
ALTER ROLE role_name WITH PASSWORD = 'password'
  [AND LOGIN = boolean] [AND SUPERUSER = boolean]


Synopsis Legend
In the synopsis section of each statement, formatting has the following meaning:
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description Superusers can change a role's password or superuser status. To prevent disabling all superusers, superusers cannot change their own superuser status. Ordinary roles can change only their own password. Enclose the role name in single quotation marks if it contains non-alphanumeric characters. Enclose the password in single quotation marks.

Examples ALTER ROLE coach WITH PASSWORD 'bestTeam';
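To change superuser status as well, the same WITH ... AND form applies (a minimal sketch following the synopsis above; the role is assumed to exist):

ALTER ROLE coach WITH PASSWORD = 'bestTeam' AND SUPERUSER = false;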

ALTER TABLE Modify the column metadata of a table.


Synopsis
ALTER TABLE keyspace_name.table_name instruction

instruction is:
ALTER column_name TYPE cql_type
| ( ADD column_name cql_type )
| ( DROP column_name )
| ( RENAME column_name TO column_name )
| ( WITH property AND property ... )

cql_type is compatible with the original type and is a CQL type, other than a collection or counter. Exceptions: ADD supports a collection type and also, if the table is a counter table, a counter type.

property is a CQL table property and value, such as speculative_retry = '10ms'. Enclose a string property in single quotation marks.

Synopsis legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.


Description ALTER TABLE manipulates the table metadata. You can change the data storage type of columns, add new columns, drop existing columns, and change table properties. No results are returned. You can also use the alias ALTER COLUMNFAMILY. First, specify the name of the table to be changed after the ALTER TABLE keywords, followed by the type of change: ALTER, ADD, DROP, RENAME, or WITH. Next, provide the rest of the needed information, as explained in the following sections. You can qualify table names by keyspace. For example, to alter the addamsFamily table in the monsters keyspace: ALTER TABLE monsters.addamsFamily ALTER lastKnownLocation TYPE uuid;

Changing the type of a column
To change the storage type for a column, the type you are changing from must be compatible with the type you are changing to. You can change an ascii type to text. You cannot change text (varchar) to ascii because not every UTF8 string is ascii. You can convert text to blob. You cannot convert a blob to text because not every blob is a UTF8 string. For example, change the type of the bio column in the users table from ascii to text, and then from text to blob.

CREATE TABLE users (
  user_name varchar PRIMARY KEY,
  bio ascii
);

ALTER TABLE users ALTER bio TYPE text;
ALTER TABLE users ALTER bio TYPE blob;

The column must already exist in current rows. The bytes stored in values for that column remain unchanged; if existing data cannot be deserialized according to the new type, your CQL driver or interface might report errors. These changes to a column type are not allowed:
• Changing the type of a clustering column.
• Changing columns on which an index is defined.

Altering the type of a column after inserting data can confuse CQL drivers and tools if the new type is incompatible with the data.

Adding a column
To add a column, other than a column of a collection type, to a table, use ALTER TABLE and the ADD keyword in the following way:

ALTER TABLE addamsFamily ADD gravesite varchar;

To add a column of a collection type:

ALTER TABLE users ADD top_places list<text>;

No validation of existing data occurs. These additions to a table are not allowed:
• Adding a column having the same name as an existing column
• A static column


Dropping a column
To drop a column from the table, use ALTER TABLE and the DROP keyword. Dropping a column removes the column from the table.

ALTER TABLE addamsFamily DROP gender;

ALTER DROP removes the column from the table definition, removes data pertaining to that column, and eventually reclaims the space formerly used by the column. The column is unavailable for querying immediately after being dropped. The actual data removal occurs during compaction; the data is not included in future SSTables. To force the removal of dropped columns before compaction occurs, use the nodetool upgradesstables command followed by an ALTER TABLE statement, which updates the table metadata to register the drop. After re-adding a dropped column, a query does not return values written before the column was last dropped. Do not re-add a dropped column to a table that uses client-supplied timestamps (timestamps not generated by Cassandra). You cannot drop columns from tables defined with the COMPACT STORAGE option.
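For example, dropping and later re-adding a column (a minimal sketch; values written before the drop are not returned after the re-add):

ALTER TABLE addamsFamily DROP gender;
ALTER TABLE addamsFamily ADD gender varchar;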

Renaming a column
The main purpose of the RENAME clause is to change the names of CQL-generated primary key and column names that are missing from a legacy table. Primary key columns can be renamed. You cannot rename an indexed column or a static column (static columns are supported in Cassandra 2.0.6 and later).

Modifying table properties
To change the table storage properties established during creation of the table, use one of the following formats to alter a table:
• ALTER TABLE and a WITH directive that includes the property name and value
• ALTER TABLE and a property map, shown in the next section

Using a WITH directive, for example, you can modify the read_repair_chance property, which configures the basis for invoking read repairs on reads in clusters configured for a non-quorum consistency. To change multiple properties, use AND as shown in this example:

ALTER TABLE addamsFamily
  WITH comment = 'A most excellent and useful table'
  AND read_repair_chance = 0.2;

Enclose a text property value in single quotation marks. You cannot modify properties of a table having compact storage.

Modifying compression and compaction
Use a property map to alter a compression or compaction option.

ALTER TABLE addamsFamily
  WITH compression = { 'sstable_compression' : 'DeflateCompressor', 'chunk_length_kb' : 64 };

ALTER TABLE mykeyspace.mytable
  WITH compaction = { 'class' : 'SizeTieredCompactionStrategy', 'cold_reads_to_omit' : 0.05 };

Note: In Cassandra 3.0 and later, cold_reads_to_omit is removed for SizeTieredCompactionStrategy.

Changing caching
In Cassandra 2.1, you create and change the caching options using a property map. For example, change the keys option from ALL, the default, to NONE and change the rows_per_partition to 15.

//Cassandra 2.1 only
ALTER TABLE users WITH caching = { 'keys' : 'NONE', 'rows_per_partition' : '15' };

Next, change just the rows_per_partition to 25.

//Cassandra 2.1 only
ALTER TABLE users WITH caching = { 'rows_per_partition' : '25' };

Finally, take a look at the table definition.

//Cassandra 2.1 only
DESCRIBE TABLE users;

CREATE TABLE mykeyspace.users (
  user_name text PRIMARY KEY,
  bio blob
) WITH bloom_filter_fp_chance = 0.01
  AND caching = '{"keys":"NONE", "rows_per_partition":"25"}'
  AND comment = ''
  AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
  AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
  AND default_time_to_live = 0
  AND gc_grace_seconds = 864000
  AND max_index_interval = 2048
  AND memtable_flush_period_in_ms = 0
  AND min_index_interval = 128
  AND read_repair_chance = 0.1
  AND speculative_retry = '99.0PERCENTILE';

In Cassandra 2.0.x, you alter the caching options using the WITH directive.

//Cassandra 2.0.x only
ALTER TABLE users WITH caching = 'keys_only';

Important: Use row caching in Cassandra 2.0.x with caution.

ALTER TYPE Modify a user-defined type. Cassandra 2.1 and later.


Synopsis
ALTER TYPE type_name instruction

instruction is:
ALTER field_name TYPE new_type
| ( ADD field_name new_type )
| ( RENAME field_name TO field_name ( AND field_name TO field_name ) ... )

type_name is the name of the user-defined type. field_name is an arbitrary identifier for the field. new_type is an identifier other than the reserved type names.

Synopsis legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description
ALTER TYPE can change a user-defined type in the following ways:
• Change the type of an existing field.
• Append a new field to an existing type.
• Rename a field defined by the type.

First, after the ALTER TYPE keywords, specify the name of the user-defined type to be changed, followed by the type of change: ALTER, ADD, or RENAME. Next, provide the rest of the needed information, as explained in the following sections.

Changing the type of a field
To change the type of a field, the field must already exist in the type definition and its type must be compatible with the new type. You can change an ascii type to text. You cannot change text (varchar) to ascii because not every UTF8 string is ascii. You can convert text to blob. You cannot convert a blob to text because not every blob is a UTF8 string. This example shows changing the type of the model field from ascii to text and then to blob.

CREATE TYPE version ( model ascii, version_number int );

ALTER TYPE version ALTER model TYPE text;
ALTER TYPE version ALTER model TYPE blob;

You cannot change the type of these fields:
• Clustering columns, because doing so induces the on-disk ordering of rows
• Indexed columns

Adding a field to a type To add a new field to a type, use ALTER TYPE and the ADD keyword. ALTER TYPE version ADD release_date timestamp;


To add a collection map field called point_release in this example that represents the release date and decimal designator, use this syntax:

ALTER TYPE version ADD point_release map<timestamp, text>;

Renaming a field of a type
To change the name of a field of a user-defined type, use the RENAME old_name TO new_name syntax. You can't use different keyspace prefixes for the old and new names. Make multiple changes to field names of a type by appending AND old_name TO new_name to the command.

ALTER TYPE version RENAME model TO sku;
ALTER TYPE version RENAME sku TO model AND version_number TO num;

ALTER USER Alter existing user options.


Synopsis ALTER USER user_name WITH PASSWORD 'password' ( NOSUPERUSER | SUPERUSER )

Synopsis legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description Superusers can change a user's password or superuser status. To prevent disabling all superusers, superusers cannot change their own superuser status. Ordinary users can change only their own password. Enclose the user name in single quotation marks if it contains non-alphanumeric characters. Enclose the password in single quotation marks.

Examples ALTER USER moss WITH PASSWORD 'bestReceiver';
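To change only the superuser status (a minimal sketch; the user is assumed to exist):

ALTER USER moss SUPERUSER;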

BATCH Write multiple DML statements.


Synopsis
Cassandra 2.1 and later. Note that the UNLOGGED option is deprecated from Cassandra 3.0.

BEGIN UNLOGGED BATCH
USING TIMESTAMP timestamp
dml_statement;
dml_statement;
...
APPLY BATCH;

Cassandra 2.0.x

BEGIN ( UNLOGGED | COUNTER ) BATCH
USING TIMESTAMP timestamp
dml_statement;
dml_statement;
...
APPLY BATCH;

dml_statement is:
• INSERT
• UPDATE
• DELETE

Synopsis legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description
A BATCH statement combines multiple data modification language (DML) statements (INSERT, UPDATE, DELETE) into a single logical operation, and sets a client-supplied timestamp for all columns written by the statements in the batch. Batching multiple statements can save network exchanges between the client and server, and between the coordinator and replicas. Cassandra spreads requests across nearby nodes as much as possible to optimize performance. Batch statements can have an uneven workload, and may access many nodes or a single node repeatedly. Using batches to optimize performance is generally not successful, as described in the Using and misusing batches topic. Instead, using batches to synchronize data to tables is a legitimate operation. For information about the fastest way to load data, see "Cassandra: Batch loading without the Batch keyword."

Batches are logged by default. In the context of a Cassandra batch operation, logged batches means that if any part of the batch succeeds, all of it will. To achieve this, Cassandra first writes the serialized batch to the batchlog system table, which consumes the serialized batch as blob data. When the rows in the batch have been successfully written and persisted (or hinted), the batchlog data is removed. There is a performance penalty associated with the batchlog, as it is written to two other nodes. If you do not want to incur this penalty, prevent Cassandra from writing to the batchlog system by using the UNLOGGED keyword.

Note: UNLOGGED BATCH is deprecated starting in Cassandra 3.0.

Although a logged batch guarantees that if any part of the batch succeeds, all of it will, no other transactional enforcement is done at the batch level. For example, there is no batch isolation. Clients are able to read the first updated rows from the batch, while other rows are still being updated on the server. However, transactional row updates within a partition key are isolated: clients cannot read a partial update. Statement order does not matter within a batch; Cassandra applies the same timestamp to all rows. Use client-supplied timestamps to achieve a particular order.


Using a timestamp
BATCH supports setting a client-supplied timestamp, an integer, in the USING clause with one exception: if a DML statement in the batch contains a compare-and-set (CAS) statement, such as the following statement, do not attempt to use a timestamp:

cqlsh> INSERT INTO users (id, lastname) VALUES (999, 'Sparrow') IF NOT EXISTS;

The timestamp applies to all statements in the batch. If not specified, the current time of the insertion (in microseconds) is used. The individual DML statements inside a BATCH can specify a timestamp if one is not specified in the USING clause. For example, specify a timestamp in an INSERT statement:

cqlsh> BEGIN BATCH
  INSERT INTO purchases (user, balance) VALUES ('user1', -8) USING TIMESTAMP 19998889022757000;
  INSERT INTO purchases (user, expense_id, amount, description, paid) VALUES ('user1', 1, 8, 'burrito', false);
APPLY BATCH;

Verify that the balance column has the client-provided timestamp.

cqlsh> SELECT balance, WRITETIME(balance) FROM purchases;

 balance | writetime_balance
---------+--------------------
      -8 |  19998889022757000

Batching conditional updates
In Cassandra 2.0.6 and later, you can batch conditional updates introduced as lightweight transactions in Cassandra 2.0. Only updates made to the same partition can be included in the batch because the underlying Paxos implementation works at the granularity of the partition. You can group updates that have conditions with those that do not, but when a single statement in a batch uses a condition, the entire batch is committed using a single Paxos proposal, as if all of the conditions contained in the batch apply. This example shows batching of conditional updates: the statements for inserting values into purchase records use the IF conditional clause.

cqlsh> BEGIN BATCH
  INSERT INTO purchases (user, balance) VALUES ('user1', -8) IF NOT EXISTS;
  INSERT INTO purchases (user, expense_id, amount, description, paid) VALUES ('user1', 1, 8, 'burrito', false);
APPLY BATCH;

cqlsh> BEGIN BATCH
  UPDATE purchases SET balance = -208 WHERE user = 'user1' IF balance = -8;
  INSERT INTO purchases (user, expense_id, amount, description, paid) VALUES ('user1', 2, 200, 'hotel room', false);
APPLY BATCH;

A continuation of this example shows how to use a static column with conditional updates in batch, as sketched below.
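A sketch of that continuation (assuming balance is declared STATIC in the purchases table, so the condition applies across the partition):

cqlsh> BEGIN BATCH
  UPDATE purchases SET balance = -200 WHERE user = 'user1' IF balance = -208;
  INSERT INTO purchases (user, expense_id, amount, description, paid) VALUES ('user1', 3, 8, 'burrito', false);
APPLY BATCH;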

Batching counter updates In Cassandra 2.1 and later, batches of counters should use UNLOGGED because, unlike other writes in Cassandra, counter updates are not an idempotent operation.


In Cassandra 2.0, use BEGIN COUNTER BATCH in a batch statement for batched counter updates.

Cassandra 2.1 Example cqlsh> BEGIN UNLOGGED BATCH UPDATE UserActionCounts SET total = total + 2 WHERE keyalias = 523; UPDATE AdminActionCounts SET total = total + 2 WHERE keyalias = 701; APPLY BATCH;

Cassandra 2.0 Example cqlsh> BEGIN COUNTER BATCH UPDATE UserActionCounts SET total = total + 2 WHERE keyalias = 523; UPDATE AdminActionCounts SET total = total + 2 WHERE keyalias = 701; APPLY BATCH;

CREATE AGGREGATE Create user-defined aggregate.

Synopsis
CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS] keyspace_name.aggregate_name ( cql_type )
  SFUNC state_function_name
  STYPE state_cql_type
  FINALFUNC final_function_name
  INITCOND ( value , value )

Synopsis Legend
In the synopsis section of each statement, formatting has the following meaning:
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description
A user-defined aggregate function is built from user-defined functions. It returns a single value that is the aggregate value over all rows returned by the query.

Examples
Create an aggregate that calculates a user-defined average.

cqlsh> CREATE OR REPLACE AGGREGATE IF NOT EXISTS average ( int )
  SFUNC avgState
  STYPE tuple<int, bigint>
  FINALFUNC avgFinal
  INITCOND (0,0);
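The aggregate depends on a state function and a final function. A sketch of matching definitions (adapted from the usual averaging example; the function bodies are illustrative):

CREATE OR REPLACE FUNCTION avgState ( state tuple<int, bigint>, val int )
  CALLED ON NULL INPUT
  RETURNS tuple<int, bigint>
  LANGUAGE java
  AS 'if (val != null) { state.setInt(0, state.getInt(0) + 1); state.setLong(1, state.getLong(1) + val.intValue()); } return state;';

CREATE OR REPLACE FUNCTION avgFinal ( state tuple<int, bigint> )
  CALLED ON NULL INPUT
  RETURNS double
  LANGUAGE java
  AS 'if (state.getInt(0) == 0) return null; double r = state.getLong(1); r /= state.getInt(0); return Double.valueOf(r);';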

CREATE INDEX Define a new index on a single column of a table.



Synopsis
CREATE CUSTOM INDEX IF NOT EXISTS index_name
  ON keyspace_name.table_name ( KEYS ( column_name ) )
  (USING class_name) (WITH OPTIONS = map)

Restrictions: Using class_name is allowed only if CUSTOM is used and class_name is a string literal containing a java class name. index_name is an identifier, enclosed or not enclosed in double quotation marks, excluding reserved words. map is a map collection, a JSON-style array of literals:

{ literal : literal, literal : literal ... }

Synopsis legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description
CREATE INDEX creates a new index on the given table for the named column. Attempting to create an already existing index will return an error unless the IF NOT EXISTS option is used. If you use the option, the statement will be a no-op if the index already exists. Optionally, specify a name for the index itself before the ON keyword. Enclose a single column name in parentheses. It is not necessary for the column to exist on any current rows. The column and its data type must be specified when the table is created, or added afterward by altering the table.

You can use dot notation to specify a keyspace for the table: keyspace name followed by a period followed by the name of the table. Cassandra creates the index on the table in the specified keyspace, but does not change the current keyspace; otherwise, if you do not use a keyspace name, Cassandra creates the index for the table within the current keyspace.

If data already exists for the column, Cassandra indexes the data during the execution of this statement. After the index is created, Cassandra indexes new data for the column automatically when new data is inserted. Cassandra supports creating an index on most columns, including a clustering column of a compound primary key or on the partition (primary) key itself. Cassandra 2.1 and later supports creating an index on a collection or the key of a collection map. Cassandra rejects an attempt to create an index on both the collection key and value.

Indexing can impact performance greatly. Before creating an index, be aware of when and when not to create an index. Counter columns cannot be indexed. Cassandra supports creating a custom index, which is primarily for internal use, and options that apply to the custom index. For example:

CREATE CUSTOM INDEX ON users (email) USING 'path.to.the.IndexClass';


CREATE CUSTOM INDEX ON users (email) USING 'path.to.the.IndexClass' WITH OPTIONS = {'storage': '/mnt/ssd/indexes/'};

Creating an index on a column
Define a table and then create an index on two of its columns:

CREATE TABLE myschema.users (
  userID uuid,
  fname text,
  lname text,
  email text,
  address text,
  zip int,
  state text,
  PRIMARY KEY (userID)
);

CREATE INDEX user_state ON myschema.users (state);

CREATE INDEX ON myschema.users (zip);

Creating an index on a clustering column Define a table having a composite partition key, and then create an index on a clustering column.

CREATE TABLE mykeyspace.users (
  userID uuid,
  fname text,
  lname text,
  email text,
  address text,
  zip int,
  state text,
  PRIMARY KEY ((userID, fname), state)
);

CREATE INDEX ON mykeyspace.users (state);

Creating an index on a collection
In Cassandra 2.1 and later, create an index on a collection column as you would any other column. Enclose the name of the collection column in parentheses at the end of the CREATE INDEX statement. For example, add a collection of phone numbers to the users table to index the data in the phones set.

ALTER TABLE users ADD phones set<text>;
CREATE INDEX ON users (phones);

If the collection is a map, Cassandra creates an index on map values. Assume the users table contains this map data from the example of a todo map:

{'2014-10-2 12:10' : 'die' }

The map value is located to the right of the colon, 'die'. The map key, the timestamp, is located to the left of the colon. You can also create an index on map keys using a slightly different syntax. If an index of the map keys of the collection exists, drop that index before creating an index on the map collection values.
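After indexing a set, you can filter on its values with CONTAINS (a minimal sketch; the phone number is illustrative):

SELECT * FROM users WHERE phones CONTAINS '555-0101';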


Creating an index on map keys In Cassandra 2.1 and later, you can create an index on map collection keys. If an index of the map values of the collection exists, drop that index before creating an index on the map collection keys. To index map keys, you use the KEYS keyword and map name in nested parentheses. For example, index the collection keys, the timestamps, in the todo map in the users table:

CREATE INDEX todo_dates ON users (KEYS(todo)); To query the table, you can use CONTAINS KEY in WHERE clauses.
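For example, a query on the indexed map keys (a minimal sketch; the timestamp literal is illustrative):

SELECT * FROM users WHERE todo CONTAINS KEY '2014-10-2 12:10';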

Creating an index on the map entries
In Cassandra 2.2 and later, you can create an index on map entries. An ENTRIES index can be created only on a map column of a table that doesn't have an existing index. To index collection entries, you use the ENTRIES keyword and map name in nested parentheses. For example, index the collection entries in the race_wins map in a race table:

CREATE INDEX entries_idx ON race (ENTRIES(race_wins));

To query the table, you can use a WHERE clause.
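For example (a minimal sketch; race_wins is assumed to be a map<text, int>):

SELECT * FROM race WHERE race_wins['Paris-Nice'] = 2;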

Creating an index on a full collection
In Cassandra 2.2 and later, you can create an index on a full FROZEN collection. A FULL index can be created on a set, list, or map column of a table that doesn't have an existing index. To index the whole collection, you use the FULL keyword and collection name in nested parentheses. For example, index the list rnumbers.

CREATE INDEX rnumbers_idx ON cycling.race_starts (FULL(rnumbers));

To query the table, you can use a WHERE clause.
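For example, matching the entire frozen list (a minimal sketch; rnumbers is assumed to be a frozen<list<int>>):

SELECT * FROM cycling.race_starts WHERE rnumbers = [1, 2, 3];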

CREATE FUNCTION Create user-defined function.

Synopsis
CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS] keyspace_name.function_name ( arg_name arg_type , ... )
  (CALLED | RETURNS NULL) ON NULL INPUT
  RETURNS cql_type
  LANGUAGE language_name
  AS 'function_body'

Synopsis Legend
In the synopsis section of each statement, formatting has the following meaning:
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope


A semicolon that terminates CQL statements is not included in the synopsis.

Description
Create or replace a user-defined function (UDF) that defines a function using Java or JavaScript.

Examples
Compute the logarithm of an input value. CALLED ON NULL INPUT ensures that the function will always be executed.

CREATE OR REPLACE FUNCTION fLog (input double)
  CALLED ON NULL INPUT
  RETURNS double
  LANGUAGE java
  AS 'return Double.valueOf(Math.log(input.doubleValue()));';

Compute the logarithm of an input value. Return NULL if the input argument is NULL.

CREATE OR REPLACE FUNCTION fLog (input double)
  RETURNS NULL ON NULL INPUT
  RETURNS double
  LANGUAGE java
  AS 'return Double.valueOf(Math.log(input.doubleValue()));';

CREATE KEYSPACE Define a new keyspace and its replica placement strategy.


Synopsis
CREATE ( KEYSPACE | SCHEMA ) IF NOT EXISTS keyspace_name
  WITH REPLICATION = map
  AND DURABLE_WRITES = ( true | false )

map is a map collection, a JSON-style array of literals:

{ literal : literal, literal : literal ... }

Synopsis legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description CREATE KEYSPACE creates a top-level namespace and sets the keyspace name, replica placement strategy class, replication factor, and DURABLE_WRITES options for the keyspace. For information about the replica placement strategy, see Cassandra 2.2 replica placement strategy or Cassandra 2.1 replica placement strategy. When you configure NetworkTopologyStrategy as the replication strategy, you set up one or more virtual data centers. Alternatively, you use the default data center. Use the same names for data centers as those used by the snitch. For information about the snitch, see Cassandra 2.2 snitch documentation or Cassandra 2.1 snitch documentation.


You assign different nodes, depending on the type of workload, to separate data centers. For example, assign Hadoop nodes to one data center and Cassandra real-time nodes to another. Segregating workloads ensures that only one type of workload is active per data center. The segregation prevents incompatibility problems between workloads, such as different batch requirements that affect performance.

A map of properties and values defines the two different types of keyspaces:

{ 'class' : 'SimpleStrategy', 'replication_factor' : <integer> };

{ 'class' : 'NetworkTopologyStrategy'[, '<data center>' : <integer>, '<data center>' : <integer>] . . . };

Table: Table of map properties and values

 Property                | Value                                           | Value Description
-------------------------+-------------------------------------------------+-------------------
 'class'                 | 'SimpleStrategy' or 'NetworkTopologyStrategy'   | Required. The name of the replica placement strategy class for the new keyspace.
 'replication_factor'    | <integer>                                       | Required if class is SimpleStrategy; otherwise, not used. The number of replicas of data on multiple nodes.
 '<first data center>'   | <integer>                                       | Required if class is NetworkTopologyStrategy and you provide the name of the first data center. This value is the number of replicas of data on each node in the first data center.
 '<second data center>'  | <integer>                                       | Required if class is NetworkTopologyStrategy and you provide the name of the second data center. The value is the number of replicas of data on each node in the data center.
 ...                     | ...                                             | More replication factors for optional named data centers.

CQL property map keys must be lower case. For example, class and replication_factor are correct. Keyspace names are 48 or fewer alpha-numeric characters and underscores, the first of which is an alpha character. Keyspace names are case-insensitive. To make a name case-sensitive, enclose it in double quotation marks. You can use the alias CREATE SCHEMA instead of CREATE KEYSPACE. Attempting to create an already existing keyspace will return an error unless the IF NOT EXISTS option is used. If the option is used, the statement will be a no-op if the keyspace already exists.

Example of setting the SimpleStrategy class
To construct the CREATE KEYSPACE statement, first declare the name of the keyspace, followed by the WITH REPLICATION keywords and the equals symbol. The name of the keyspace is case insensitive unless enclosed in double quotation marks. Next, to create a keyspace that is not optimized for multiple data centers, use SimpleStrategy for the class value in the map. Set replication_factor properties, separated by a colon and enclosed in curly brackets. For example:

CREATE KEYSPACE Excelsior
  WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

Using SimpleStrategy is fine for evaluating Cassandra. For production use or for use with mixed workloads, use NetworkTopologyStrategy.

Example of setting the NetworkTopologyStrategy class
Using NetworkTopologyStrategy is also fine for evaluating Cassandra. To use NetworkTopologyStrategy for evaluation purposes using, for example, a single node cluster, specify the default data center name of the cluster. To determine the default data center name, use nodetool status.

$ nodetool status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns    Host ID                               Rack
UN  127.0.0.1  46.59 KB  256     100.0%  dd867d15-6536-4922-b574-e22e75e46432  rack1

Cassandra uses datacenter1 as the default data center name. Create a keyspace named NTSkeyspace on a single node cluster, for example:

CREATE KEYSPACE NTSkeyspace
  WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 1 };

To use NetworkTopologyStrategy with data centers in a production environment, you need to change the default snitch, SimpleSnitch, to a network-aware snitch, define one or more data center names in the snitch properties file, and use those data center name(s) to define the keyspace; otherwise, Cassandra will fail to find a node to complete a write request, such as inserting data into a table. After configuring Cassandra to use a network-aware snitch, such as the PropertyFileSnitch, you define data center and rack names in the cassandra-topology.properties file.

Construct the CREATE KEYSPACE statement using NetworkTopologyStrategy for the class value in the map. Set one or more key-value pairs consisting of the data center name and number of replicas per data center, separated by a colon and enclosed in curly brackets. For example:

CREATE KEYSPACE "Excalibur"
  WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};

This example sets three replicas for a data center named dc1 and two replicas for a data center named dc2. The data center name you use depends on the cluster-configured snitch you are using. There is a correlation between the data center name defined in the map and the data center name as recognized by the snitch you are using. The nodetool status command prints out data center names and rack locations of your nodes if you are not sure what they are.


Setting DURABLE_WRITES
You can set the DURABLE_WRITES option after the map specification of the CREATE KEYSPACE command. When set to false, data written to the keyspace bypasses the commit log. Be careful using this option because you risk losing data. Do not set this attribute on a keyspace using the SimpleStrategy.

CREATE KEYSPACE Risky
  WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 }
  AND DURABLE_WRITES = false;

Checking created keyspaces
Check that the keyspaces were created:

SELECT * FROM system.schema_keyspaces;

 keyspace_name | durable_writes | strategy_class                                        | strategy_options
---------------+----------------+-------------------------------------------------------+----------------------------
 excelsior     | True           | org.apache.cassandra.locator.SimpleStrategy           | {"replication_factor":"3"}
 Excalibur     | True           | org.apache.cassandra.locator.NetworkTopologyStrategy  | {"dc2":"2","dc1":"3"}
 risky         | False          | org.apache.cassandra.locator.NetworkTopologyStrategy  | {"datacenter1":"1"}
 system        | True           | org.apache.cassandra.locator.LocalStrategy            | {}
 system_traces | True           | org.apache.cassandra.locator.SimpleStrategy           | {"replication_factor":"1"}

(5 rows)

Cassandra converted the excelsior keyspace to lowercase because quotation marks were not used to create the keyspace and retained the initial capital letter for Excalibur because quotation marks were used.

Related information
Cassandra 2.2 replication strategy
Cassandra 2.1 replication strategy
Cassandra 2.2 snitch configuration
Cassandra 2.1 snitch configuration
Cassandra 2.2 property file snitch
Cassandra 2.1 property file snitch

CREATE MATERIALIZED VIEW Create a materialized view in Cassandra 3.0 and later.

Synopsis
CREATE MATERIALIZED VIEW [IF NOT EXISTS] keyspace_name.view_name
  AS SELECT column1, column2, ...
  FROM keyspace_name.table_name
  WHERE column1 IS NOT NULL AND column2 IS NOT NULL ...
  PRIMARY KEY ( column1, column2, ... )

Synopsis legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description
In Cassandra 3.0 and later, CREATE MATERIALIZED VIEW creates a new materialized view from a specified base table. To create a materialized view, certain requirements must be met.
• The columns of the original table's primary key must be part of the materialized view's primary key.
• Only one new column may be added to the materialized view's primary key.

Examples
A materialized view is created from a table. The fields used in the primary key must be constrained to non-NULL values in the WHERE phrase.

CREATE MATERIALIZED VIEW cyclist_by_age AS
  SELECT age, birthday, name, country
  FROM cyclist_mv
  WHERE age IS NOT NULL AND cid IS NOT NULL
  PRIMARY KEY (age, cid);
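Once created, the view is queried like a table, with the view's partition key constrained (a minimal sketch; the age value is illustrative):

SELECT name, birthday FROM cyclist_by_age WHERE age = 32;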

CREATE TABLE Define a new table.


Synopsis
CREATE TABLE IF NOT EXISTS keyspace_name.table_name
  ( column_definition (, column_definition, ...) )
  WITH property (AND property ...) | option (AND option ...)

column_definition is:
column_name cql_type STATIC PRIMARY KEY
| column_name frozen<tuple<tuple_type>> PRIMARY KEY
| column_name frozen<user_defined_type> PRIMARY KEY
| ( PRIMARY KEY ( partition_key ) )

Note: Frozen collections can be used for primary key columns. Non-frozen collections cannot be used for primary key columns.

Restrictions:
• There should always be exactly one primary key definition.
• cql_type of the primary key must be a CQL data type or a user-defined type.
• cql_type of a collection uses this syntax:
  LIST<cql_type> | SET<cql_type> | MAP<cql_type, cql_type>

PRIMARY KEY is:
column_name
| ( column_name1, column_name2, column_name3 ... )
| ( (column_name4, column_name5), column_name6, column_name7 ... )

column_name1 is the partition key. column_name2, column_name3 ... are clustering columns. column_name4, column_name5 are partitioning keys. column_name6, column_name7 ... are clustering columns.

option is:
COMPACT STORAGE
| ID = 'table_id'
| CLUSTERING ORDER BY ( clustering_column ASC | DESC, ... )

property is a CQL table property, enclosed in single quotation marks in the case of strings, or one of these directives:

WITH compaction = { 'class' : 'LeveledCompactionStrategy' } AND read_repair_chance = 1.0

Synopsis legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description
CREATE TABLE creates a new table under the current keyspace. Cassandra 2.1.1 and later supports the IF NOT EXISTS syntax. Attempting to create an existing table returns an error unless the IF NOT EXISTS option is used. If the option is used, the statement is a no-op if the table already exists. A static column can store the same data in multiple clustered rows of a partition, and then retrieve that data with a single SELECT statement. You can add a counter column, which has been improved in Cassandra 2.1, to a table.

Defining a column You assign a type to columns during table creation. Column types, other than collection-type columns, are specified as a parenthesized, comma-separated list of column name and type pairs.


This example shows how to create a table that includes collection columns and a tuple.

CREATE TABLE users (
  userid text PRIMARY KEY,
  first_name text,
  last_name text,
  emails set<text>,
  top_scores list<int>,
  todo map<timestamp, text>,
  readings frozen<tuple<int, text, float>>
);

See "Creating a user-defined type" for information on creating UDTs. This example shows the use of a UDT in a table.

CREATE TYPE fullname ( firstname text, lastname text );

CREATE TABLE users (
  userid text PRIMARY KEY,
  username frozen<fullname>,
  emails set<text>,
  top_scores list<int>,
  todo map<timestamp, text>
);

Using compact storage
If you need to conserve disk space, use the WITH COMPACT STORAGE directive that stores data in the legacy (Thrift) storage engine format.

CREATE TABLE sblocks (
  block_id uuid,
  subblock_id uuid,
  data blob,
  PRIMARY KEY (block_id, subblock_id)
) WITH COMPACT STORAGE;

Creating a table WITH ID
If a table is accidentally dropped with DROP TABLE, this option can be used to recreate the table and run a commitlog replay to retrieve the data.

CREATE TABLE users (
  userid text PRIMARY KEY,
  emails set<text>
) WITH ID = '5a1c395e-b41f-11e5-9f22-ba0be0483c18';

Defining a primary key column

The only schema information that must be defined for a table is the primary key and its associated data type. Unlike earlier versions, CQL does not require a column in the table that is not part of the primary key. A primary key can have any number (1 or more) of component columns. If the primary key consists of only one column, you can use the keywords, PRIMARY KEY, after the column definition:

CREATE TABLE users (
  user_name varchar PRIMARY KEY,


  password varchar,
  gender varchar,
  session_token varchar,
  state varchar,
  birth_year bigint
);

Alternatively, you can declare the primary key consisting of only one column in the same way as you declare a compound primary key, as sketched below. Do not use a counter column for a key.
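A minimal sketch of that alternative form (the table name users2 is illustrative):

CREATE TABLE users2 (
  user_name varchar,
  password varchar,
  PRIMARY KEY (user_name)
);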

Setting a table property

Using the optional WITH clause and keyword arguments, you can configure caching, compaction, and a number of other operations that Cassandra performs on a new table. You can use the WITH clause to specify the properties of tables listed in CQL table properties, including caching, table comments, compression, and compaction. Format the property as either a string or a map. Enclose a string property in single quotation marks. For example, to embed a comment in a table, you format the comment as a string property:

CREATE TABLE MonkeyTypes (
  block_id uuid,
  species text,
  alias text,
  population varint,
  PRIMARY KEY (block_id)
) WITH comment = 'Important biological records'
  AND read_repair_chance = 1.0;

To configure compression and compaction, you use property maps:

CREATE TABLE DogTypes (
  block_id uuid,
  species text,
  alias text,
  population varint,
  PRIMARY KEY (block_id)
) WITH compression = { 'sstable_compression' : 'DeflateCompressor', 'chunk_length_kb' : 64 }
  AND compaction = { 'class' : 'SizeTieredCompactionStrategy', 'min_threshold' : 6 };

To specify using compact storage or clustering order, use the WITH clause. To configure caching in Cassandra 2.1, you also use a property map.

// Cassandra 2.1
CREATE TABLE DogTypes (
  block_id uuid,
  species text,
  alias text,
  population varint,
  PRIMARY KEY (block_id)
) WITH caching = { 'keys' : 'NONE', 'rows_per_partition' : '120' };


To configure caching in Cassandra 2.0.x, you do not use a property map. Simply set the caching property to a value:

// Cassandra 2.0.x only
CREATE TABLE DogTypes (
  block_id uuid,
  species text,
  alias text,
  population varint,
  PRIMARY KEY (block_id)
) WITH caching = 'keys_only';

Important: In Cassandra 2.0.x, use row caching with caution.

Using a compound primary key

A compound primary key consists of more than one column and treats the first column declared in a definition as the partition key. To create a compound primary key, use the keywords, PRIMARY KEY, followed by the comma-separated list of column names enclosed in parentheses.

CREATE TABLE emp (
  empID int,
  deptID int,
  first_name varchar,
  last_name varchar,
  PRIMARY KEY (empID, deptID)
);

Using a composite partition key

A composite partition key is a partition key consisting of multiple columns. You use an extra set of parentheses to enclose the columns that make up the composite partition key. The columns within the primary key definition but outside the nested parentheses are clustering columns. These columns form logical sets inside a partition to facilitate retrieval.

CREATE TABLE Cats (
  block_id uuid,
  breed text,
  color text,
  short_hair boolean,
  PRIMARY KEY ((block_id, breed), color, short_hair)
);

In this example, the composite partition key consists of block_id and breed. The clustering columns, color and short_hair, determine the clustering order of the data. Generally, Cassandra will store columns having the same block_id but a different breed on different nodes, and columns having the same block_id and breed on the same node.

Using clustering order Ordering query results to make use of the on-disk sorting of columns.

You can order query results to make use of the on-disk sorting of columns. You can order results in ascending or descending order. The ascending order will be more efficient than descending. If you need results in descending order, you can specify a clustering order to store columns on disk in the reverse order of the default. Descending queries will then be faster than ascending ones.


The following example shows a table definition that changes the clustering order to descending by insertion time.

CREATE TABLE timeseries (
  event_type text,
  insertion_time timestamp,
  event blob,
  PRIMARY KEY (event_type, insertion_time)
) WITH CLUSTERING ORDER BY (insertion_time DESC);
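With this layout, a query for the most recent events reads in on-disk order (a minimal sketch; the event_type value is illustrative):

SELECT * FROM timeseries
WHERE event_type = 'some_event'
ORDER BY insertion_time DESC
LIMIT 10;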

Sharing a static column In a table that uses clustering columns, non-clustering columns can be declared static in the table definition.

In a table that uses clustering columns, non-clustering columns can be declared static in the table definition. Static columns are only static within a given partition.

CREATE TABLE t (
  k text,
  s text STATIC,
  i int,
  PRIMARY KEY (k, i)
);
INSERT INTO t (k, s, i) VALUES ('k', 'I''m shared', 0);
INSERT INTO t (k, s, i) VALUES ('k', 'I''m still shared', 1);
SELECT * FROM t;

Output is:

 k | s                  | i
---+--------------------+---
 k | "I'm still shared" | 0
 k | "I'm still shared" | 1

Restrictions
• A table that does not define any clustering columns cannot have a static column. A table having no clustering columns has a one-row partition in which every column is inherently static.
• A table defined with the COMPACT STORAGE directive cannot have a static column.
• A column designated to be the partition key cannot be static.

You can batch conditional updates to a static column. In Cassandra 2.0.9 and later, you can use the DISTINCT keyword to select static columns. In this case, Cassandra retrieves only the beginning (static column) of the partition.
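A minimal sketch of both capabilities, reusing table t; the new value and the condition are illustrative:

-- Conditional batched update of the shared static column
BEGIN BATCH
  UPDATE t SET s = 'shared by all' WHERE k = 'k' IF s = 'I''m still shared';
APPLY BATCH;

-- Read only the static beginning of each partition
SELECT DISTINCT k, s FROM t;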

CREATE TRIGGER Registers a trigger on a table.

Registers a trigger on a table.

Synopsis CREATE TRIGGER IF NOT EXISTS trigger_name ON table_name USING 'java_class'

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description
The implementation of triggers includes the capability to register a trigger on a table using the familiar CREATE TRIGGER syntax. This implementation is experimental.

CREATE TRIGGER myTrigger
  ON myTable
  USING 'org.apache.cassandra.triggers.InvertedIndex'

In Cassandra 2.1 and later, you need to enclose trigger names that use uppercase characters in double quotation marks. The logic comprising the trigger can be written in any Java (JVM) language and exists outside the database. The Java class in this example that implements the trigger is named org.apache.cassandra.triggers.InvertedIndex and is defined in an Apache repository. The trigger defined on a table fires before a requested DML statement occurs to ensure the atomicity of the transaction. Place the custom trigger code (JAR) in the triggers directory on every node. The custom JAR loads at startup. The location of the triggers directory depends on the installation:
• Cassandra 2.0.x tarball: install_location/lib/triggers
• Cassandra 2.1.x tarball: install_location/conf/triggers
• DataStax Enterprise 4.5 and later, Installer-No Services and tarball: install_location/resources/cassandra/conf/triggers
• DataStax Enterprise 4.5 and later, Installer-Services and packages: /etc/dse/cassandra/triggers

Cassandra 2.1.1 and later supports lightweight transactions for creating a trigger. Attempting to create an existing trigger returns an error unless the IF NOT EXISTS option is used. If the option is used, the statement is a no-op if the trigger already exists.
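A minimal sketch of the conditional form, reusing the example trigger class:

CREATE TRIGGER IF NOT EXISTS myTrigger
  ON myTable
  USING 'org.apache.cassandra.triggers.InvertedIndex';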

CREATE ROLE Create a new role.

Create a new role.

Synopsis CREATE ROLE [IF NOT EXISTS] role_name WITH PASSWORD ='password' [AND LOGIN = boolean] [AND SUPERUSER = boolean]

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.


Description
CREATE ROLE defines a new database role. By default, roles do not have superuser status. Only a superuser can issue CREATE ROLE requests. Role accounts are used for logging in under internal authentication and authorization. Enclose the role name in single quotation marks if it contains non-alphanumeric characters. You cannot recreate an existing role. To change the superuser status or password, use ALTER ROLE.

Creating internal roles
You need to use the WITH PASSWORD clause when creating a role for internal authentication. Enclose the password in single quotation marks.

CREATE ROLE coach WITH PASSWORD = 'NinersRule' AND LOGIN = true AND SUPERUSER = true;

If internal authentication has not been set up, you do not need the WITH PASSWORD clause:

CREATE ROLE manager;

Roles are created without superuser status unless SUPERUSER = true is specified.

Creating a role conditionally
In Cassandra 2.2 and later, you can test that the role does not exist before attempting to create one. Attempting to create an existing role results in an invalid query condition unless the IF NOT EXISTS option is used. If the option is used, the statement will be a no-op if the role exists.

cqlsh> CREATE ROLE IF NOT EXISTS new_role WITH PASSWORD = 'password';

CREATE USER Create a new user.

Create a new user.

Synopsis CREATE USER IF NOT EXISTS user_name WITH PASSWORD 'password' ( NOSUPERUSER | SUPERUSER )

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description
CREATE USER defines a new database user account. By default, user accounts do not have superuser status. Only a superuser can issue CREATE USER requests. User accounts are required for logging in under internal authentication and authorization.


Enclose the user name in single quotation marks if it contains non-alphanumeric characters. You cannot recreate an existing user. To change the superuser status or password, use ALTER USER.

Creating internal user accounts
You need to use the WITH PASSWORD clause when creating a user account for internal authentication. Enclose the password in single quotation marks.

CREATE USER spillman WITH PASSWORD 'Niner27';
CREATE USER akers WITH PASSWORD 'Niner2' SUPERUSER;
CREATE USER boone WITH PASSWORD 'Niner75' NOSUPERUSER;

If internal authentication has not been set up, you do not need the WITH PASSWORD clause:

CREATE USER test NOSUPERUSER;

Creating a user account conditionally
In Cassandra 2.0.9 and later, you can test that the user does not have an account before attempting to create one. Attempting to create an existing user results in an invalid query condition unless the IF NOT EXISTS option is used. If the option is used, the statement will be a no-op if the user exists.

$ bin/cqlsh -u cassandra -p cassandra
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.1.0 | CQL spec 3.3.0 | Native protocol v3]
Use HELP for help.
cqlsh> CREATE USER newuser WITH PASSWORD 'password';
cqlsh> CREATE USER newuser WITH PASSWORD 'password';
code=2200 [Invalid query] message="User newuser already exists"
cqlsh> CREATE USER IF NOT EXISTS newuser WITH PASSWORD 'password';
cqlsh>

DELETE Removes entire rows or one or more columns from one or more rows.

Removes entire rows or one or more columns from one or more rows.

Synopsis
DELETE column_name, ... | ( column_name term )
FROM keyspace_name.table_name
USING TIMESTAMP integer
WHERE row_specification
( IF ( EXISTS | ( condition ( AND condition ) . . . ) ) )

term is:
[ list_position ] | key_value

row_specification is one of:
primary_key_name = key_value
primary_key_name IN ( key_value, key_value, ...)

condition is:
column_name = key_value | column_name [list_position] = key_value

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description A DELETE statement removes one or more columns from one or more rows in a table, or it removes the entire row if no columns are specified. Cassandra applies selections within the same partition key atomically and in isolation.

Deleting columns or a row
After the DELETE keyword, optionally list column names, separated by commas.

DELETE col1, col2, col3 FROM Planeteers WHERE userID = 'Captain';

When no column names are specified, the entire row(s) specified in the WHERE clause are deleted.

DELETE FROM MastersOfTheUniverse WHERE mastersID IN ('Man-At-Arms', 'Teela');

When a column is deleted, it is not removed from disk immediately. The deleted column is marked with a tombstone and then removed after the configured grace period has expired. The optional timestamp defines the new tombstone record.

Conditionally deleting columns In Cassandra 2.0.7 and later, you can conditionally delete columns using IF or IF EXISTS. Deleting a column is similar to making an insert or update conditionally. Conditional deletions incur a non-negligible performance cost and should be used sparingly.
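A minimal sketch of both conditional forms, reusing the SomeTable example from the next subsection; the column names are illustrative:

DELETE col1 FROM SomeTable WHERE userID = 'some_key_value' IF EXISTS;
DELETE col1 FROM SomeTable WHERE userID = 'some_key_value' IF col2 = 'some_value';

The first deletion applies only if the row exists; the second only if col2 currently holds 'some_value'.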

Specifying the table The table name follows the list of column names and the keyword FROM.

Deleting old data
You can identify the column for deletion using a timestamp.

DELETE email, phone FROM users USING TIMESTAMP 1318452291034 WHERE user_name = 'jsmith';


The TIMESTAMP input is an integer representing microseconds. The WHERE clause specifies which row or rows to delete from the table.

DELETE col1 FROM SomeTable WHERE userID = 'some_key_value';

This form provides a list of key names using the IN notation and a parenthetical list of comma-delimited key names.

DELETE col1 FROM SomeTable WHERE userID IN (key1, key2);
DELETE phone FROM users WHERE user_name IN ('jdoe', 'jsmith');

In Cassandra 2.0 and later, CQL supports an empty list of values in the IN clause, useful in Java Driver applications when passing empty arrays as arguments for the IN clause.

Using a collection set, list or map
To delete an element from the map, use the DELETE command and enclose the key of the element in square brackets:

DELETE todo ['2012-9-24'] FROM users WHERE user_id = 'frodo';

To remove an element from a list, use the DELETE command and the list index position in square brackets:

DELETE top_places[3] FROM users WHERE user_id = 'frodo';

To remove all elements from a set, you can use the DELETE statement:

DELETE emails FROM users WHERE user_id = 'frodo';

DESCRIBE Provides information about the connected Cassandra cluster, or about the data objects stored in the cluster.

Provides information about the connected Cassandra cluster, or about the objects stored in the cluster.

Synopsis
DESCRIBE FULL ( CLUSTER | SCHEMA )
  | KEYSPACES
  | ( KEYSPACE keyspace_name )
  | TABLES
  | ( TABLE table_name )
  | TYPES
  | ( TYPE user_defined_type )
  | FUNCTIONS
  | ( FUNCTION user_defined_function )
  | AGGREGATES
  | ( AGGREGATE user_defined_aggregate )
  | INDEX
  | ( INDEX index_name )

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

Description
The DESCRIBE or DESC command outputs information about the connected Cassandra cluster, or about the objects stored in the cluster. To query the system tables directly, use SELECT. In Linux the keyspace and table name arguments are case-sensitive and need to match the upper or lowercase names stored internally. In Windows, the keyspace and table name arguments are not case-sensitive. Use the DESCRIBE commands to list objects by their internal names. Use DESCRIBE FULL SCHEMA if you need the schema of system_* keyspaces. DESCRIBE functions in the following ways:

DESCRIBE CLUSTER (example: DESCRIBE CLUSTER;)
Output information about the connected Cassandra cluster. Cluster name, partitioner, and snitch are output. For non-system keyspaces, the endpoint-range ownership information is also shown.

DESCRIBE KEYSPACES (example: DESCRIBE KEYSPACES;)
Output a list of all keyspace names.

DESCRIBE KEYSPACE keyspace_name (example: DESCRIBE KEYSPACE cycling;)
Output the CQL command that could be used to recreate the given keyspace, and the objects in it, such as the tables, types and functions.

DESCRIBE [FULL] SCHEMA (example: DESCRIBE SCHEMA;)
Output the CQL command that could be used to recreate the entire non-system schema. Use the FULL option to also include system keyspaces.

DESCRIBE TABLES (example: DESCRIBE TABLES;)
Output a list of all tables in the current keyspace, or in all keyspaces if there is no current keyspace.

DESCRIBE TABLE keyspace_name.table_name (example: DESCRIBE TABLE upcoming_calendar;)
Output the CQL command that could be used to recreate the given table.

DESCRIBE INDEX keyspace_name.index_name (example: DESCRIBE INDEX team_entry;)
Output the CQL command that could be used to recreate the given index.

DESCRIBE TYPES (example: DESCRIBE TYPES;)
Output a list of all user-defined types in the current keyspace, or in all keyspaces if there is no current keyspace.

DESCRIBE TYPE keyspace_name.type_name (example: DESCRIBE TYPE basic_info;)
Output the CQL command that could be used to recreate the given user-defined type.

DESCRIBE FUNCTIONS (example: DESCRIBE FUNCTIONS;)
Output names of all user-defined functions in the given keyspace, or in all keyspaces if there is no current keyspace.

DESCRIBE FUNCTION keyspace_name.function_name (example: DESCRIBE FUNCTION cycling.myFunction;)
Output the CQL command that could be used to recreate the given user-defined function.

DESCRIBE AGGREGATES (example: DESCRIBE AGGREGATES;)
Output a list of all user-defined aggregates in the given keyspace, or in all keyspaces if there is no current keyspace.

DESCRIBE AGGREGATE keyspace_name.aggregate_name (example: DESCRIBE AGGREGATE cycling.myAggregate;)
Output the CQL command that could be used to recreate the given user-defined aggregate.

DESCRIBE MATERIALIZED VIEW view_name (example: DESCRIBE MATERIALIZED VIEW cyclistsByAge;)
Output the CQL command that could be used to recreate the given materialized view.

DROP AGGREGATE Drop a user-defined aggregate.

Synopsis DROP AGGREGATE IF EXISTS keyspace_name.aggregate_name

Synopsis Legend
In the synopsis section of each statement, formatting has the following meaning:
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description Drop a user-defined aggregate.

Examples
cqlsh> DROP AGGREGATE IF EXISTS myAverage;

DROP FUNCTION Drop user-defined function (UDF).

Synopsis DROP FUNCTION IF EXISTS keyspace_name.function_name

Synopsis Legend
In the synopsis section of each statement, formatting has the following meaning:
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description Drop a user-defined function from a keyspace.

Examples
cqlsh> DROP FUNCTION IF EXISTS fLog;

DROP INDEX Drop the named index.

Drop the named index.

Synopsis DROP INDEX IF EXISTS keyspace.index_name

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description
A DROP INDEX statement removes an existing index. If the index was not given a name during creation, the index name is table_name_column_name_idx. If the index does not exist, the statement returns an error, unless IF EXISTS is used, in which case the operation is a no-op. You can use dot notation to specify a keyspace for the index you want to drop: keyspace name followed by a period followed by the name of the index. Cassandra drops the index in the specified keyspace, but does not change the current keyspace. If you do not use a keyspace name, Cassandra drops the index for the table within the current keyspace.

Example
DROP INDEX user_state;
DROP INDEX users_zip_idx;
DROP INDEX myschema.users_state;

DROP KEYSPACE Remove the keyspace.

Remove the keyspace.


Synopsis DROP ( KEYSPACE | SCHEMA ) IF EXISTS keyspace_name

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description
A DROP KEYSPACE statement results in the immediate, irreversible removal of a keyspace, including all tables and data contained in the keyspace. You can also use the alias DROP SCHEMA. If the keyspace does not exist, the statement returns an error unless IF EXISTS is used, in which case the operation is a no-op. Cassandra takes a snapshot of the keyspace before dropping it. In Cassandra 2.0.4 and earlier, the user was responsible for removing the snapshot manually.

Example DROP KEYSPACE MyTwitterClone;

DROP MATERIALIZED VIEW Remove the named materialized view.

Remove the named materialized view.

Synopsis DROP MATERIALIZED VIEW IF EXISTS keyspace_name.view_name

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description A DROP MATERIALIZED VIEW statement results in the immediate, irreversible removal of a materialized view, including all data contained in the materialized view.


Example cqlsh> DROP MATERIALIZED VIEW cycling.cyclist_by_age;

DROP ROLE Remove a role.

Synopsis DROP ROLE IF EXISTS role_name

Synopsis Legend
In the synopsis section of each statement, formatting has the following meaning:
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description
DROP ROLE removes an existing role. In Cassandra 2.2 and later, you can test that the role exists. Attempting to drop a role that does not exist results in an invalid query condition unless the IF EXISTS option is used. If the option is used, the statement will be a no-op if the role does not exist. You have to be logged in as a superuser to issue a DROP ROLE statement. Roles cannot drop themselves. Enclose the role name in single quotation marks only if it contains non-alphanumeric characters.

Examples DROP ROLE IF EXISTS coach;

DROP TABLE Remove the named table.

Remove the named table.

Synopsis DROP TABLE IF EXISTS keyspace_name.table_name

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.


Description A DROP TABLE statement results in the immediate, irreversible removal of a table, including all data contained in the table. You can also use the alias DROP COLUMNFAMILY.

Example DROP TABLE worldSeriesAttendees;

DROP TRIGGER Removes registration of a trigger.

Removes registration of a trigger.

Synopsis DROP TRIGGER IF EXISTS trigger_name ON table_name

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description The experimental DROP TRIGGER statement removes the registration of a trigger created using CREATE TRIGGER. Cassandra 2.1.1 and later supports the IF EXISTS syntax for dropping a trigger. Cassandra checks for the existence of the trigger before dropping it.
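A minimal sketch, reusing the trigger from the CREATE TRIGGER example:

DROP TRIGGER IF EXISTS myTrigger ON myTable;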

DROP TYPE Drop a user-defined type. Cassandra 2.1 and later.

Drop a user-defined type. Cassandra 2.1 and later.

Synopsis DROP TYPE IF EXISTS type_name type_name is the name of a user-defined type.

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.


Description
This statement immediately and irreversibly removes a type. Attempting to drop a type that does not exist returns an error unless the IF EXISTS option is used. If the option is used, the statement is a no-op if the type does not exist. Dropping a user-defined type that is in use by a table or another type is not allowed.

DROP TYPE version;

DROP USER Remove a user.

Remove a user.

Synopsis DROP USER IF EXISTS user_name

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description DROP USER removes an existing user. In Cassandra 2.0.9 and later, you can test that the user exists. Attempting to drop a user that does not exist results in an invalid query condition unless the IF EXISTS option is used. If the option is used, the statement will be a no-op if the user does not exist. You have to be logged in as a superuser to issue a DROP USER statement. Users cannot drop themselves. Enclose the user name in single quotation marks only if it contains non-alphanumeric characters.

Examples DROP USER IF EXISTS boone;

GRANT Provide access to database objects.

Provide access to database objects.

Synopsis
GRANT ( permission_name PERMISSION | ALL PERMISSIONS )
ON resource
TO user_name | role_name

permission_name is one of these:
• ALL
• ALTER
• AUTHORIZE
• CREATE
• DROP
• MODIFY
• SELECT

resource is one of these:
• ALL KEYSPACES
• KEYSPACE keyspace_name
• TABLE keyspace_name.table_name

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description
Permissions to access all keyspaces, a named keyspace, or a table can be granted to a user or role. Enclose the user name or role name in single quotation marks if it contains non-alphanumeric characters. This table lists the permissions needed to use CQL statements:

Table: CQL Permissions

Permission | CQL Statement
-----------+--------------------------------------------------------
ALL        | All statements
ALTER      | ALTER KEYSPACE, ALTER TABLE, CREATE INDEX, DROP INDEX
AUTHORIZE  | GRANT, REVOKE
CREATE     | CREATE KEYSPACE, CREATE TABLE
DROP       | DROP KEYSPACE, DROP TABLE
MODIFY     | INSERT, DELETE, UPDATE, TRUNCATE
SELECT     | SELECT

To be able to perform SELECT queries on a table, you have to have SELECT permission on the table, on its parent keyspace, or on ALL KEYSPACES. To be able to CREATE TABLE you need CREATE permission on its parent keyspace or ALL KEYSPACES. You need to be a superuser or to have AUTHORIZE permission on a resource (or one of its parents in the hierarchy) plus the permission in question to be able to GRANT or REVOKE that permission to or from a user. GRANT, REVOKE and LIST permissions check for the existence of the table and keyspace before execution. GRANT and REVOKE check that the user exists.


Examples
Give spillman permission to perform SELECT queries on all tables in all keyspaces:

GRANT SELECT ON ALL KEYSPACES TO spillman;

Give akers permission to perform INSERT, UPDATE, DELETE and TRUNCATE queries on all tables in the field keyspace:

GRANT MODIFY ON KEYSPACE field TO akers;

Give boone permission to perform ALTER KEYSPACE queries on the forty9ers keyspace, and also ALTER TABLE, CREATE INDEX and DROP INDEX queries on all tables in the forty9ers keyspace:

GRANT ALTER ON KEYSPACE forty9ers TO boone;

Give boone permission to run all types of queries on the ravens.plays table:

GRANT ALL PERMISSIONS ON ravens.plays TO boone;

Give the role coach permission to run all types of queries on the ravens.plays table:

GRANT ALL PERMISSIONS ON ravens.plays TO coach;

Grant access to a keyspace to just one user, assuming nobody else has ALL KEYSPACES access:

GRANT ALL ON KEYSPACE keyspace_name TO user_name;

INSERT Add or update columns.

Add or update columns.

Synopsis
INSERT INTO keyspace_name.table_name
( identifier, column_name, ... )
VALUES ( value, value, ... )
IF NOT EXISTS
USING option AND option

value is one of:
• a literal
• a set: { literal, literal, . . . }
• a list: [ literal, literal, . . . ]
• a map collection, a JSON-style array of literals: { literal : literal, literal : literal, . . . }

option is one of:
• TIMESTAMP microseconds
• TTL seconds


Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description
An INSERT writes one or more columns to a record in a Cassandra table atomically and in isolation. No results are returned. You do not have to define all columns, except those that make up the primary key. Missing columns occupy no space on disk. If the column exists, it is updated. The row is created if none exists. Use IF NOT EXISTS to perform the insertion only if the row does not already exist. Using IF NOT EXISTS incurs a performance hit associated with using Paxos internally. For information about Paxos, see the Cassandra 3.0 documentation. You can qualify table names by keyspace. INSERT does not support counters, but UPDATE does. Internally, insert and update operations are identical.
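A minimal sketch of the conditional form, reusing the users table from the collection examples below; the row values are illustrative:

INSERT INTO users (userid, first_name, last_name)
VALUES ('pippin', 'Peregrin', 'Took')
IF NOT EXISTS;

The insert is applied only if no row with that userid exists; the result includes an [applied] column indicating the outcome.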

Specifying TIMESTAMP and TTL
• Time-to-live (TTL) in seconds
• Timestamp in microseconds

INSERT INTO Hollywood.NerdMovies (user_uuid, fan)
VALUES (cfd66ccc-d857-4e90-b1e5-df98a3d40cd6, 'johndoe')
USING TTL 86400;

TTL input is in seconds. TTL column values are automatically marked as deleted (with a tombstone) after the requested amount of time has expired. TTL marks the inserted values, not the column itself, for expiration. Any subsequent update of the column resets the TTL to the TTL specified in the update. By default, values never expire. You cannot set data in a counter column to expire. The TIMESTAMP input is in microseconds. If not specified, the time (in microseconds) that the write occurred to the column is used.

INSERT INTO cycling.calendar (race_id, race_name, race_start_date, race_end_date)
VALUES (200, 'placeholder', '2015-05-27', '2015-05-27')
USING TIMESTAMP 123456789;

Note: IF NOT EXISTS and USING TIMESTAMP cannot be simultaneously used.

Using a collection set or map
To insert data into a collection, enclose values in curly brackets. Set values must be unique. For example:

INSERT INTO users (userid, first_name, last_name, emails)
VALUES ('frodo', 'Frodo', 'Baggins', {'[emailprotected]', '[emailprotected]'});

Insert a map named todo to add a reminder, 'die', on October 2 for user frodo.

INSERT INTO users (userid, todo)
VALUES ('frodo', {'2014-10-2 12:10' : 'die'});

Values of items in collections are limited to 64K. To insert data into a collection column of a user-defined type, enclose components of the type in parentheses within the curly brackets, as shown in "Using a user-defined type."

Related information:
Cassandra 2.1 tunable consistency
Cassandra 2.0 tunable consistency

Example of inserting data into playlists An example showing how to insert data into the music service example playlists table.

The "Example of a music service" section described the playlists table. This example shows how to insert data into that table.

Procedure
Use the INSERT command to insert UUIDs for the compound primary keys, title, artist, and album data of the playlists table.

INSERT INTO playlists (id, song_order, song_id, title, artist, album)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 1, a3e64f8f-bd44-4f28-b8d9-6938726e34d4, 'La Grange', 'ZZ Top', 'Tres Hombres');

INSERT INTO playlists (id, song_order, song_id, title, artist, album)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 2, 8a172618-b121-4136-bb10-f665cfc469eb, 'Moving in Stereo', 'Fu Manchu', 'We Must Obey');

INSERT INTO playlists (id, song_order, song_id, title, artist, album)
VALUES (62c36092-82a1-3a00-93d1-46196ee77204, 3, 2b09185b-fb5a-4734-9b56-49077de9edbf, 'Outside Woman Blues', 'Back Door Slam', 'Roll Away');

LIST PERMISSIONS List permissions granted to a user.

List permissions granted to a user.

Synopsis
LIST ( permission_name PERMISSION | ALL PERMISSIONS )
ON resource
OF user_name
NORECURSIVE

permission_name is one of these:
• ALTER
• AUTHORIZE
• CREATE
• DROP
• MODIFY
• SELECT

resource is one of these:
• ALL KEYSPACES
• KEYSPACE keyspace_name
• TABLE keyspace_name.table_name

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description
Permissions checks are recursive. If you omit the NORECURSIVE specifier, permissions on the requested resource and its parents in the hierarchy are shown.
• Omitting the resource name (ALL KEYSPACES, keyspace, or table) lists permissions on all tables and all keyspaces.
• Omitting the user name lists permissions of all users. You need to be a superuser to list permissions of all users. If you are not, you must add OF user_name.
• Omitting the NORECURSIVE specifier lists permissions on the resource and its parent resources.
• Enclose the user name in single quotation marks only if it contains non-alphanumeric characters.

After creating users and granting the permissions in the GRANT examples, you can list permissions that users have on resources and their parents.

Example
Assuming you completed the examples in Examples, list all permissions given to akers:

LIST ALL PERMISSIONS OF akers;

Output is:

 username | resource         | permission
----------+------------------+------------
    akers | <keyspace field> |     MODIFY

List permissions given to all the users:

LIST ALL PERMISSIONS;

Output is:

 username | resource             | permission
----------+----------------------+------------
    akers |     <keyspace field> |     MODIFY
    boone | <keyspace forty9ers> |      ALTER
    boone | <table ravens.plays> |     CREATE
    boone | <table ravens.plays> |      ALTER
    boone | <table ravens.plays> |       DROP
    boone | <table ravens.plays> |     SELECT
    boone | <table ravens.plays> |     MODIFY
    boone | <table ravens.plays> |  AUTHORIZE
 spillman |      <all keyspaces> |     SELECT

List all permissions on the plays table:

LIST ALL PERMISSIONS ON ravens.plays;

Output is:

 username | resource             | permission
----------+----------------------+------------
    boone | <table ravens.plays> |     CREATE
    boone | <table ravens.plays> |      ALTER
    boone | <table ravens.plays> |       DROP
    boone | <table ravens.plays> |     SELECT
    boone | <table ravens.plays> |     MODIFY
    boone | <table ravens.plays> |  AUTHORIZE
 spillman |      <all keyspaces> |     SELECT

List permissions on the ravens.plays table, excluding permissions inherited from its parents:

LIST ALL PERMISSIONS ON ravens.plays NORECURSIVE;

Output is:

 username | resource             | permission
----------+----------------------+------------
    boone | <table ravens.plays> |     CREATE
    boone | <table ravens.plays> |      ALTER
    boone | <table ravens.plays> |       DROP
    boone | <table ravens.plays> |     SELECT
    boone | <table ravens.plays> |     MODIFY
    boone | <table ravens.plays> |  AUTHORIZE

LIST ROLES List existing roles.

List existing roles

Synopsis LIST ROLES

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.


Description
Assuming you use internal authentication, created the roles in the CREATE ROLE examples, and have not yet changed the default user, the following example shows the output of LIST ROLES.

Example
LIST ROLES;

LIST USERS List existing users and their superuser status.

List existing users and their superuser status.

Synopsis LIST USERS

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description Assuming you use internal authentication, created the users in the CREATE USER examples, and have not yet changed the default user, the following example shows the output of LIST USERS.

Example
LIST USERS;

Output is:

 name      | super
-----------+-------
 cassandra | True
     boone | False
     akers | True
  spillman | False

LOGIN Use LOGIN to switch from one user to another within cqlsh.

Synopsis cqlsh> LOGIN user_name password

Synopsis Legend
In the synopsis section of each statement, formatting has the following meaning:
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description Use this command to change login information without requiring cqlsh to restart. Login using a specified username. If the password is specified, it will be used. Otherwise, you will be prompted to enter the password.

Examples
Log in as the user cutie with the password patootie.

LOGIN cutie patootie

REVOKE Revoke user permissions.

Revoke user permissions.

Synopsis
REVOKE ( permission_name PERMISSION | ALL PERMISSIONS )
ON resource
FROM user_name | role_name

permission_name is one of these:
• ALL
• ALTER
• AUTHORIZE
• CREATE
• DROP
• MODIFY
• SELECT

resource is one of these:
• ALL KEYSPACES
• KEYSPACE keyspace_name
• TABLE keyspace_name.table_name

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description
Permissions to access all keyspaces, a named keyspace, or a table can be revoked from a user. Enclose the user name in single quotation marks if it contains non-alphanumeric characters. The table in GRANT lists the permissions needed to use CQL statements.

Example
REVOKE SELECT ON ravens.plays FROM boone;

The user boone can no longer perform SELECT queries on the ravens.plays table. Exceptions: Because of inheritance, the user can perform SELECT queries on ravens.plays if one of these conditions is met:
• The user is a superuser.
• The user has SELECT on ALL KEYSPACES permissions.
• The user has SELECT on the ravens keyspace.

REVOKE SELECT ON ravens.plays FROM coach;

The role coach can no longer perform SELECT queries on the ravens.plays table.

SELECT Retrieve data from a Cassandra table.

Retrieve data from a Cassandra table.

Synopsis
SELECT select_expression
FROM keyspace_name.table_name
WHERE relation AND relation ...
ORDER BY ( clustering_column ( ASC | DESC ) ... )
LIMIT n
ALLOW FILTERING

select expression is:
selection_list | DISTINCT selection_list | ( COUNT ( * | 1 ) )

selection_list is one of:
• A list of partition keys (used with DISTINCT)
• selector AS alias, selector AS alias, ... | *

alias is an alias for a column name.

selector is:
column name
| ( WRITETIME (column_name) )
| ( TTL (column_name) )
| ( function (selector, selector, ...) )

function is a timeuuid function, a token function, or a blob conversion function.

relation is:
column_name op term
| ( column_name, column_name, ... ) op term-tuple
| column_name IN ( term, ( term ... ) )
| ( column_name, column_name, ... ) IN ( term-tuple, ( term-tuple ... ) )
| TOKEN (column_name, ...) op ( term )

op is = | < | > | <= | >= | CONTAINS | CONTAINS KEY

term-tuple (Cassandra 2.1 and later) is:
( term, term, ... )

term is:
• a constant: string, number, uuid, boolean, hex
• a bind marker (?)
• a function
• a set: { literal, literal, ... }
• a list: [ literal, literal, ... ]
• a map: { literal : literal, literal : literal, ... }

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.


Description
A SELECT statement reads one or more records from a Cassandra table. The input to the SELECT statement is the select expression. The output of the select statement depends on the select expression:

Table: Select Expression Output

Select Expression           | Output
----------------------------+----------------------------------------------------------------
Column or list of columns   | Rows having a key value and collection of columns
COUNT aggregate function    | One row with a column that has the value of the number of rows
                            | in the result set
DISTINCT partition key list | Values of columns that are different from other column values
WRITETIME function          | The date/time that a write to a column occurred
TTL function                | The remaining time-to-live for a column

Specifying columns
The SELECT expression determines which columns, if any, appear in the result. Using the asterisk specifies selection of all columns:

SELECT * FROM People;

Columns in big data applications duplicate values. Use the DISTINCT keyword to return only distinct (different) values of partition keys.
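A minimal sketch, reusing the playlists table from the INSERT examples, where id is the partition key and several songs share one id:

SELECT DISTINCT id FROM playlists;

Each partition key value appears once in the result, no matter how many rows share it.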

Counting returned rows
A SELECT expression using COUNT(*) returns the number of rows that matched the query. Alternatively, you can use COUNT(1) to get the same result. In Cassandra 2.2.1 and later, COUNT(*) or COUNT(1) can be used in conjunction with other aggregate functions or columns in a SELECT statement. Count the number of rows in the users table:

SELECT COUNT(*) FROM users;

Count the number of rows and get the max values in the users table:

SELECT max(name), name, COUNT(*) FROM users;

The capability to use an alias for a column name is particularly useful when using a function call on a column, such as dateOf(created_at), in the select expression.

SELECT event_id, dateOf(created_at), blobAsText(content) FROM timeline;

Using a column alias You can define an alias on columns using the AS keyword.

SELECT event_id, dateOf(created_at) AS creation_date, blobAsText(content) AS content FROM timeline;


In the output, the columns assume the alias names.

 event_id                | creation_date            | content
-------------------------+--------------------------+------------
 550e8400-e29b-41d4-a716 | 2013-07-26 10:44:33+0200 | Some stuff

Specifying rows returned using LIMIT
Using the LIMIT option, you can specify that the query return a limited number of rows.

SELECT COUNT(*) FROM big_table LIMIT 50000;
SELECT COUNT(*) FROM big_table LIMIT 200000;

If you had 105,291 rows in the database, the output of these statements would be 50,000 and 105,291, respectively. The cqlsh shell has a default row limit of 10,000. The Cassandra server and native protocol do not limit the number of rows that can be returned, although a timeout stops running queries to protect against malformed queries that would cause system instability.

Specifying the table using FROM
The FROM clause specifies the table to query. Optionally, specify a keyspace for the table followed by a period, (.), then the table name. If a keyspace is not specified, the current keyspace is used. For example, count the number of rows in the IndexInfo table in the system keyspace:

SELECT COUNT(*) FROM system."IndexInfo";

Filtering data using WHERE
The WHERE clause specifies which rows to query. In the WHERE clause, refer to a column using the actual name, not an alias. Columns in the WHERE clause need to meet one of these requirements:
• The partition key definition includes the column.
• The column is indexed using CREATE INDEX.

The primary key in the WHERE clause tells Cassandra to race to the specific node that has the data. Put the name of the column to the left of the = or IN operator. Put the column value to the right of the operator. For example, empID and deptID columns are included in the partition key definition in the following table, so you can query all the columns using the empID in the WHERE clause:

CREATE TABLE emp (
  empID int,
  deptID int,
  first_name varchar,
  last_name varchar,
  PRIMARY KEY (empID, deptID));

SELECT deptid FROM emp WHERE empid = 104;

Cassandra supports these conditional operators in the WHERE clause: CONTAINS, CONTAINS KEY, IN, =, >, >=, <, or <=. A query that might involve data filtering, such as a range scan on a non-indexed, non-key column, requires the ALLOW FILTERING clause; imposing a limit with LIMIT n is recommended to reduce memory use. For example:

SELECT * FROM ruling_stewards
WHERE king = 'none'
  AND reign_start >= 1500
  AND reign_start < 3000
LIMIT 10 ALLOW FILTERING;

Critically, LIMIT doesn't protect you from the worst liabilities. For instance, what if there are no entries with no king? Then you have to scan the entire list no matter what LIMIT is. ALLOW FILTERING will probably become less strict as we collect more statistics on our data. For example, if we knew that 90% of entries have no king we would know that finding 10 such entries should be relatively inexpensive.

Using the IN filter condition
Use IN, an equals condition operator, in the WHERE clause to specify multiple possible values for a column. For example, select two columns, first_name and last_name, from three rows having employee ids (primary key) 105, 107, or 104:

SELECT first_name, last_name FROM emp WHERE empID IN (105, 107, 104);

Format values for the IN conditional test as a comma-separated list. The list can consist of a range of column values.

Using IN to filter on a compound or composite primary key
The IN condition is recommended on the last column of the partition key only if you query all preceding columns of the key for equality. For example:

CREATE TABLE parts (
  part_type text,
  part_name text,
  part_num int,
  part_year text,
  serial_num text,
  PRIMARY KEY ((part_type, part_name), part_num, part_year));

SELECT * FROM parts
WHERE part_type='alloy' AND part_name='hubcap'
  AND part_num=1249 AND part_year IN ('2010', '2015');

You can omit the equality test for clustering columns other than the last when using IN, but such a query might involve data filtering and thus may have unpredictable performance. Such a query requires use of ALLOW FILTERING. For example:

SELECT * FROM parts
WHERE part_num=123456 AND part_year IN ('2010', '2015')
ALLOW FILTERING;

CQL supports an empty list of values in the IN clause, useful in Java Driver applications when passing empty arrays as arguments for the IN clause.

When not to use IN
The recommendations about when not to use an index apply to using IN in the WHERE clause. Under most conditions, using IN in the WHERE clause is not recommended. Using IN can degrade performance because usually many nodes must be queried. For example, in a single, local data center cluster with 30 nodes, a replication factor of 3, and a consistency level of LOCAL_QUORUM, a single key query goes out to two nodes, but if the query uses the IN condition, the number of nodes being queried is most likely even higher, up to 20 nodes depending on where the keys fall in the token range.

Comparing clustering columns
In Cassandra 2.0.6 and later, you can group the partition key and clustering columns and compare the tuple to values for scanning a partition. For example:

SELECT * FROM ruling_stewards WHERE (steward_name, king) = ('Boromir', 'Brego');


The syntax used in the WHERE clause compares records of steward_name and king as a tuple against the Boromir, Brego tuple.

Paging through unordered results
The TOKEN function can be used with a condition operator on the partition key column to query. The query selects rows based on the token of their partition key rather than on their value. The token of a key depends on the partitioner in use. Using the TOKEN function with the RandomPartitioner or Murmur3Partitioner will not give you a meaningful order. For example, assume you defined this table:

CREATE TABLE periods (
  period_name text,
  event_name text,
  event_date timestamp,
  weak_race text,
  strong_race text,
  PRIMARY KEY (period_name, event_name, event_date)
);

After inserting data, this query uses the TOKEN function to find the data using the partition key.

SELECT * FROM periods
WHERE TOKEN(period_name) > TOKEN('Third Age')
  AND TOKEN(period_name) < TOKEN('Fourth Age');

Using compound primary keys and sorting results
ORDER BY clauses can select a single column only. That column has to be the second column in a compound PRIMARY KEY. This also applies to tables with more than two column components in the primary key. Ordering can be done in ascending or descending order, default ascending, and specified with the ASC or DESC keywords. In the ORDER BY clause, refer to a column using the actual name, not an alias. For example, set up the playlists table, which uses a compound primary key, insert the example data, and use this query to get information about a particular playlist, ordered by song_order. You do not need to include the ORDER BY column in the select expression.

SELECT * FROM playlists
WHERE id = 62c36092-82a1-3a00-93d1-46196ee77204
ORDER BY song_order DESC
LIMIT 50;

Or, create an index on playlist artists, and use this query to get titles of Fu Manchu songs on the playlist:

CREATE INDEX ON playlists(artist);
SELECT album, title FROM playlists WHERE artist = 'Fu Manchu';


Filtering a collection set, list, or map
You can query a table containing a collection to retrieve the collection in its entirety. You can also index the collection column, and then use the CONTAINS condition in the WHERE clause to filter the data for a particular value in the collection. Using a music service example, after adding the collection of tags to the playlists table, adding some tag data, and then indexing the tags, you can filter on 'blues' in the tags set.

SELECT album, tags FROM playlists WHERE tags CONTAINS 'blues';

After indexing the music venue map, you can filter on map values, such as 'The Fillmore':

SELECT * FROM playlists WHERE venue CONTAINS 'The Fillmore';

After indexing the collection keys in the venues map, you can filter on map keys.

SELECT * FROM playlists WHERE venue CONTAINS KEY '2014-09-22 22:00:00-0700';

Filtering a map's entries
You can query a table containing a collection to retrieve rows based on map entries. Only maps can be used for this option.

CREATE INDEX blist_idx ON cycling.birthday_list (ENTRIES(blist));

The following query finds all cyclists who are 23 years old based on their entry in the blist map of the table birthday_list.

SELECT * FROM cycling.birthday_list WHERE blist['age'] = '23';

Filtering a full frozen collection
You can query a table containing a FROZEN collection (set, list, or map) to retrieve rows that have a full match of a collection's values.

CREATE INDEX rnumbers_idx ON cycling.race_starts (FULL(rnumbers));

The following query finds any cyclist who has 39 Pro wins, 7 Grand Tour starts, and 14 Classic starts in a frozen list.

SELECT * FROM cycling.race_starts WHERE rnumbers = [39,7,14];

Retrieving the date/time a write occurred Using WRITETIME followed by the name of a column in parentheses returns date/time in microseconds that the column was written to the database.


Retrieve the date/time that a write occurred to the first_name column of the user whose last name is Jones:

SELECT WRITETIME (first_name) FROM users WHERE last_name = 'Jones';

 writetime(first_name)
-----------------------
      1353010594789000

The writetime output in microseconds converts to November 15, 2012 at 12:16:34 GMT-8.
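The TTL function, listed in the select expression output table above, works the same way. A minimal sketch, assuming first_name was written with a TTL:

SELECT TTL (first_name) FROM users WHERE last_name = 'Jones';

The result is the number of seconds remaining before the value expires, or null if no TTL was set.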

TRUNCATE Remove all data from a table.

Remove all data from a table.

Synopsis TRUNCATE keyspace_name. table_name

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description A TRUNCATE statement results in the immediate, irreversible removal of all data in the named table. Note: The consistency level must be set to ALL prior to performing a TRUNCATE operation. All replicas must remove the data.

Examples
Set the consistency level to ALL first. TRUNCATE or TRUNCATE TABLE can be used to remove all data from a named table.

CONSISTENCY ALL;
TRUNCATE user_activity;

CONSISTENCY ALL;
TRUNCATE TABLE menu_item;

UPDATE Update columns in a row.

Update columns in a row.

Synopsis
UPDATE keyspace_name.table_name
USING option AND option
SET assignment, assignment, ...
WHERE row_specification
IF column_name = literal AND column_name = literal . . .
IF EXISTS

option is one of:
• TIMESTAMP microseconds
• TTL seconds

assignment is one of:
column_name = value
set_or_list_item = set_or_list_item ( + | - ) ...
map_name = map_name ( + | - ) ...
map_name = map_name ( + | - ) { map_key : map_value, ... }
column_name [ term ] = value
counter_column_name = counter_column_name ( + | - ) integer

set is: { literal, literal, . . . }

list is: [ literal, literal ]

map is: { literal : literal, literal : literal, . . . }

term is: list_index_position | key_value

row_specification is:
primary key name = key_value
primary key name IN ( key_value, ... )

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description An UPDATE writes one or more column values for a given row to a Cassandra table. No results are returned. A statement begins with the UPDATE keyword followed by a Cassandra table name. The row is created if none existed before, and updated otherwise. Specify the row to update in the WHERE clause by including all columns composing the partition key. The IN relation is supported only for the last column of the partition key. The UPDATE SET operation is not valid on a primary key field. Specify other column values using SET. To update multiple columns, separate the name-value pairs using commas.


You can invoke a lightweight transaction using UPDATE:

UPDATE customer_account
SET customer_email = '[emailprotected]'
IF customer_email = '[emailprotected]';

Use the IF keyword followed by a condition to be met for the update to succeed. Using an IF condition incurs a performance hit associated with using Paxos internally to support linearizable consistency. In an UPDATE statement, all updates within the same partition key are applied atomically and in isolation. To update a counter column value in a counter table, specify the increment or decrement to the current value of the counter column. Unlike the INSERT command, the UPDATE command supports counters. Otherwise, the update and insert operations are identical internally.

UPDATE UserActionCounts SET total = total + 2 WHERE keyalias = 523;

In an UPDATE statement, you can specify these options on columns that are not counter columns:
• TTL seconds
• TIMESTAMP microseconds

TTL input is in seconds. TTL column values are automatically marked as deleted (with a tombstone) after the requested amount of time has expired. TTL marks the inserted values, not the column itself, for expiration. Any subsequent update of the column resets the TTL to the TTL specified in the update. By default, values never expire. The TIMESTAMP input is an integer representing microseconds. If not specified, the time (in microseconds) that the write occurred to the column is used. Each update statement requires a precise set of primary keys to be specified using a WHERE clause. You need to specify all keys in a table having compound and clustering columns. For example, update the value of a column in a table having a compound primary key, userid and url:

UPDATE excelsior.clicks USING TTL 432000
SET user_name = 'bob'
WHERE userid = cfd66ccc-d857-4e90-b1e5-df98a3d40cd6
  AND url = 'http://google.com';

UPDATE Movies SET col1 = val1, col2 = val2 WHERE movieID = key1;
UPDATE Movies SET col3 = val3 WHERE movieID IN (key1, key2, key3);
UPDATE Movies SET col4 = 22 WHERE movieID = key4;

CQL supports an empty list of values in the IN clause, useful in Java Driver applications when passing empty arrays as arguments for the IN clause.

Examples of updating a column
Update a column in several rows at once:

UPDATE users SET state = 'TX'
WHERE user_uuid IN (88b8fd18-b1ed-4e96-bf79-4280797cba80,
                    06a8913c-c0d6-477c-937d-6c1b69a95d43,
                    bc108776-7cb5-477f-917d-869c12dfffa8);

Update several columns in a single row:

UPDATE users SET name = 'John Smith', email = '[emailprotected]'
WHERE user_uuid = 88b8fd18-b1ed-4e96-bf79-4280797cba80;


Updating a counter column
You can increase or decrease the value of a counter column by an arbitrary numeric value through the assignment of an expression that adds or subtracts the value. To update the value of a counter column, use the syntax shown in the following example:

UPDATE counterks.page_view_counts
SET counter_value = counter_value + 2
WHERE url_name = 'www.datastax.com' AND page_name = 'home';

To group counter updates so they are applied together, put one or more counter updates in a counter batch statement.
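A minimal sketch of such a batch, reusing the page_view_counts table; the 'about' page row is hypothetical:

BEGIN COUNTER BATCH
  UPDATE counterks.page_view_counts SET counter_value = counter_value + 1
    WHERE url_name = 'www.datastax.com' AND page_name = 'home';
  UPDATE counterks.page_view_counts SET counter_value = counter_value + 1
    WHERE url_name = 'www.datastax.com' AND page_name = 'about';
APPLY BATCH;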

Updating a collection set
To add an element to a set, use the UPDATE command and the addition (+) operator:

UPDATE users SET emails = emails + {'[emailprotected]'} WHERE user_id = 'frodo';

To remove an element from a set, use the subtraction (-) operator.

UPDATE users SET emails = emails - {'[emailprotected]'} WHERE user_id = 'frodo';

To remove all elements from a set, you can use the UPDATE statement:

UPDATE users SET emails = {} WHERE user_id = 'frodo';

Updating a collection map
To set or replace map data, you can use the UPDATE command. Enclose the timestamp and text values in map collection syntax: strings in curly brackets, separated by a colon.

UPDATE users SET todo = { '2012-9-24' : 'enter mordor',
  '2012-10-2 12:00' : 'throw ring into mount doom' }
WHERE user_id = 'frodo';

You can also update or set a specific element using the UPDATE command. For example, update a map named todo to insert a reminder, 'die', on October 2 for user frodo.

UPDATE users SET todo['2014-10-2 12:10'] = 'die' WHERE user_id = 'frodo';

You can set a TTL for each map element:

UPDATE users USING TTL <ttl> SET todo['2012-10-1'] = 'find water' WHERE user_id = 'frodo';

In Cassandra 2.1.1 and later, you can update the map by adding one or more elements separated by commas:

UPDATE users SET todo = todo + { '2012-10-1': 'find water', '2014-12-15': 'buy presents' } WHERE user_id = 'frodo';

You can remove elements from a map in the same way using - instead of +.


Using a collection list
To insert values into the list:

UPDATE users SET top_places = [ 'rivendell', 'rohan' ] WHERE user_id = 'frodo';

To prepend an element to the list, enclose it in square brackets, and use the addition (+) operator:

UPDATE users SET top_places = [ 'the shire' ] + top_places WHERE user_id = 'frodo';

To append an element to the list, switch the order of the new element data and the list name in the UPDATE command:

UPDATE users SET top_places = top_places + [ 'mordor' ] WHERE user_id = 'frodo';

To add an element at a particular position, use the list index position in square brackets:

UPDATE users SET top_places[2] = 'riddermark' WHERE user_id = 'frodo';

To remove all elements having a particular value, use the UPDATE command, the subtraction operator (-), and the list value in square brackets:

UPDATE users SET top_places = top_places - ['riddermark'] WHERE user_id = 'frodo';

To update data in a collection column of a user-defined type, enclose components of the type in parentheses within the curly brackets, as shown in "Using a user-defined type."

USE Connect the client session to a keyspace.

Connect the client session to a keyspace.

Synopsis USE keyspace_name

Synopsis Legend
• Uppercase means literal
• Lowercase means not literal
• Italics mean optional
• The pipe (|) symbol means OR or AND/OR
• Ellipsis (...) means repeatable
• Orange ( and ) means not literal, indicates scope

A semicolon that terminates CQL statements is not included in the synopsis.

Description A USE statement identifies the keyspace that contains the tables to query for the current client session. All subsequent operations on tables and indexes are in the context of the named keyspace, unless otherwise specified or until the client connection is terminated or another USE statement is issued. To use a case-sensitive keyspace, enclose the keyspace name in double quotation marks.


Example
USE PortfolioDemo;

Continuing with the example of checking created keyspaces:

USE "Excalibur";

Tips for using DataStax documentation Describes navigation icons and provides search tips and links to other resources.

Navigating the documents
To navigate, use the Contents pane or search. Additional controls are:

Toolbar icons
• Go back through the topics as listed in the Contents pane.
• Go forward through the topics as listed in the Contents pane.
• See doc tweets and provide feedback.
• Display PDF version.
• Send email to DataStax docs.
• Print page.

Contents, bookmarking, and legend icons
• Opens Contents items. Also expands and collapses text, such as Synopsis and Nodetool legends.
• Closes the Contents items.
• Appears on headings for bookmarking. Right-click to get the link.

Searching documents
Search is designed for each product guide. For example, if searching in DataStax Enterprise 4.8, the results include topics from DataStax Enterprise 4.8, Cassandra 2.1, and CQL 3.1. The results are displayed in tabs.


Other resources
You can find more information and help at:
• Documentation home page
• DataStax Academy
• Datasheets
• Webinars
• Whitepapers
• DataStax Developer Blogs
• Support
