Introducing Comega
by Dare Obasanjo
|
Pages: 1, 2
Query Operators in Cω
Cω adds two broad classes of query operators to the C# language: XPath-based operators for querying the member variables of an object by name or by type and SQL-based operators for performing sophisticated queries involving projection, grouping, and joining of data from one or more objects.
XPath-Based Operators
With the existence of streams and anonymous structs that can have
multiple members with the same name, even ordinary direct member
access with the . operator in Cω can be
considered a query operation. For example, the
operation books.Book.title from the previous section
returns the titles of all the Book objects contained
within the Books class. This is equivalent to the XPath
query /Books/Book/title which returns all the titles
of Book elements contained within
the Books element.
The wildcard member access operator .* is used to
retrieve all the fields of a type. This operator is equivalent to
the XPath query child::* which returns all the child
nodes of the principal node type of the current node. For example,
the operation books.Book.* returns a stream container
of all the members of all the Book objects contained
within the Books class. This is equivalent to the XPath
query /Books/Book/* which returns all the child
elements of the Book elements contained within
the Books element.
Cω also supports transitive member access using
the ... operator which is analogous to the
descendent-or-self axis or ..
abbreviated path in XPath. The operation
books...title returns a stream containing
all title member fields that are contained within the
Books class or are member fields of any of its
contents in a recursive manner. This is equivalent to the XPath
query /Books//title which returns
all title elements that are descendants of
the Books element. The transitive member access
operator can also be used to match nodes restricted to a
particular type using syntax of the form
...typename::*. For example, the
operation books...string::* returns a stream
containing all the member fields of
type System.String that are contained within
the Books class or are member fields of any of its
contents in a recursive manner. This is analogous to the XPath 2.0
query
/Books//element(*, xs:string) which matches
any descendent of the Books element of
type xs:string.
Filter operations can be applied to the results of a transitive
member access in the same manner that predicates can be used to
filter XPath queries. Just as in XPath, a Cω filter is applied
to a query operation by using the [expression]
operator placed after the query. As is the case with
apply-to-all-expressions, the filter may contain the special
variable it which is bound to each successive element
of the iterated stream. Below is an example, which queries all the
fields of type System.Int32 in an anonymous struct then
filters the results to those whose value is greater than 8.
struct {int a; int b; int c;} z = new {a=5, b=10, c= 15};
int* values = z...int::*[it > 8];
foreach(int i in values){
Console.WriteLine(i + " is greater than 8");
}
SQL-Based Operators
Cω includes a number of constructs from the SQL language as keywords. Operators for performing selection with projection, filtering, ordering, grouping, and joins are all built into Cω. The SQL operators can be applied to both in-memory objects and relational stores that can be accessed via ADO.NET. When applied to a relational database, the Cω query operators are converted to SQL queries over the underlying store. The primary advantage of using the SQL operators from the Cω language is that the query syntax and results can be checked at compile time instead of at runtime, as is the case with embedding SQL expressions in strings using traditional relational APIs.
To connect to a SQL database in Cω, it must be exposed as a
managed assembly (i.e. a .NET library file), which is then referenced
by the application. A relational database can be exposed to a Cω
as a managed assembly either by using the sql2comega.exe
command line tool or the
Add Database Schema... dialog from within Visual
Studio. Database objects are used by Cω to
represent the relational database hosted by the
server. A Database object has a public property for each
table or view and a method for each table-valued function found in the
database. To query a relational database, a table, view, or
table-valued function must be specified as input to one or more of
the SQL-based operators.
The following sample program and output shows some of the
capabilities of using the SQL-based operators to query a relational
database in Cω. The database used in this example is the sample
Northwind database that comes with Microsoft SQL Server. The
name DB used in the example refers to a global instance
of a Database object in the Northwind namespace of
the Northwind.dll assembly generated
using sql2comega.exe.
using System;
using System.Data.SqlTypes;
using Northwind;
class Test {
static void Main() {
// The foreach statement can infer the type of the
iteration variable 'row'
// statically evaluating the result type of the select expression
foreach( row in select ContactName from DB.Customers ) {
Console.WriteLine("{0}", row.ContactName);
}
}
}
The following sample program and output shows some of the capabilities of using the SQL-based operators to query in-memory objects in Cω:
Code Sample
using Microsoft.Comega;
using System;
public class Test{
enum CDStyle {Alt, Classic, HipHop}
static struct{ string Title; string Artist; CDStyle Style; int Year;}* CDs =
new{new{ Title="Lucky Frog", Artist="Holly Holt", Style=CDStyle.Alt, Year=2001},
new{ Title="Kamikaze", Artist="Twista", Style=CDStyle.HipHop, Year=2004},
new{ Title="Stop Light Green", Artist="Robert O'Hara",
Style=CDStyle.Alt, Year=1981},
new{ Title="Noctures", Artist="Chopin", Style=CDStyle.Classic, Year=1892},
new{ Title="Mimosa!", Artist="Brian Groth", Style=CDStyle.Alt, Year=1980},
new {Title="Beg For Mercy", Artist="G-Unit", Style=CDStyle.HipHop, Year=2003}
};
public static void Main(){
struct { string Title; string Artist;}* results;
Console.WriteLine("QUERY #1: select Title, Artist from CDs where Style ==
CDStyle.HipHop");
results = select Title, Artist from CDs where Style == CDStyle.HipHop;
results.{ Console.WriteLine("Title = {0}, Artist = {1}",
it.Title, it.Artist); };
Console.WriteLine();
struct { string Title; string Artist; int Year;}* results2;
Console.WriteLine("QUERY #2: select Title, Artist, Year from CDs order by Year");
results2 = select Title, Artist, Year from CDs order by Year;
results2.{ Console.WriteLine("Title = {0}, Artist = {1}, Year = {2}", it.Title,
it.Artist, it.Year); };
}
}
Output
QUERY #1: select Title, Artist from CDs where Style == CDStyle.HipHop
Title = Kamikaze, Artist = Twista
Title = Beg For Mercy, Artist = G-Unit
QUERY #2: select Title, Artist, Year from CDs order by Year
Title = Noctures, Artist = Chopin, Year = 1892
Title = Mimosa!, Artist = Brian Groth, Year = 1980
Title = Stop Light Green, Artist = Robert O'Hara, Year = 1981
Title = Lucky Frog, Artist = Holly Holt, Year = 2001
Title = Beg For Mercy, Artist = G-Unit, Year = 2003
Title = Kamikaze, Artist = Twista, Year = 2004
A number of operations that require tedious nested loops can be processed in a straightforward manner using the declarative SQL-like operators in Cω. Provided below is a brief description of the major classes of SQL operators included in Cω.
Projection
The projection of the select expression is the list of expressions
following the select keyword. The projection is executed
once for each row specified by the from clause. The
job of the projection is to shape the resulting rows of data into rows
containing only the columns required. The simplest form of the select
command consists of the select keyword followed by a
projection list of one or more expressions identifying columns from
the source, followed by the from keyword and then an
expression identifying the source of the query. For example,
rows = select ContactName, Phone from DB.Customers;
foreach( row in rows ) {
Console.WriteLine("{0}", row.ContactName);
}
In the above example, the type designator for the results of the select query are not specified. The Cω compiler automatically infers the correct type. The actual type of an individual result row is a Cω tuple type. One can specify the result type directly using a tuple type (i.e. an anonymous struct) and the asterisk (*) to designate a stream of results. For example,
struct{SqlString ContactName;}* rows =
select ContactName from DB.Customers;
struct{SqlString ContactName; SqlString Phone;}* rows =
select ContactName, Phone from DB.Customers;
Filtering
The results of a select expression can be filtered using one of
three keywords; distinct,
top and where. The distinct keyword is used to
restrict the resulting rows to only unique values. The top
keyword is used to restrict the total number of rows produced by the
query. The keyword top is followed by a constant numeric
expression which specifies the number of rows to return. One can also
create a distinct top selection which restricts the total
number of unique rows returned by the query. The where clause
is used to specify a Boolean expression for filtering the rows
returned by the query source. Rows where the expression evaluates to
true are retained, while the rest are discarded. The example below
shows all three filter operators in action:
select distinct top 10 ContactName from DB.Customers where City == "London";
Ordering
The resulting rows from a select expression can be sorted by using
the order by clause. The order by clause is always the last
clause of the select expression, if it is specified at all.
The order by clause consists of the two keywords order
by followed immediately by a comma separated list of expressions
that define the values that will determine the order. The first
expression in the list defines the ordering criteria with the most
precedence. It is also possible to specify whether each expression
should be considered in ascending or descending order. The default for
all expressions is to be considered in ascending order. The example
below shows the order by clause in action.
rows = select ContactName, Phone from DB.Customers
order by ContactName desc, Phone asc;
Grouping
Values can be aggregated across multiple rows using the group
by clause and the built-in aggregate functions. The group
by clause enables one to specify how different rows are actually
related so they can be grouped together. Aggregate functions can then
be applied to the columns to compute values over the group. An
aggregate function is a function that computes a single value from a
series of inputs, such as computing the sum or average of a series of
numbers. There are six aggregate functions built into Cω. They
are Count, Min, Max, Avg, Sum and Stddev. To use these functions in a
query, one must first import the System.Query namespace. The
following example shows how to use the group by clause and the
built-in aggregate functions.
rows = select Country, Count(Country) from DB.Customers
group by Country;
This example uses an aggregate to produce the set of all countries and
the count of customers within each country. The Count()
aggregate tallies the number of items in the group.
Aggregates can be used in all clauses evaluated after the group
by clause. The projection list is evaluated after the group
by clause, even though it is specified earlier in the query. A
consequence of this is that aggregate functions can not be applied in
the where clause. However, it is still possible to filter
grouped rows by using the having clause. The having
clause acts just like the where clause, except that it is evaluated
after the group by clause. The following example shows how
the having clause is used.
rows = select Country, Count(Country) from DB.Customers
group by Country
having Count(Country) > 1;
Joins
Select queries can be used to combine results from multiple tables. A
SQL joining is a Cartesian product of one or more tables where each
row from one table is paired up with each row from another table. The
full Cartesian product consists of all such pairings. To select
multiple sources whose data should be joined to perform a query,
the from clause can actually contain a comma-separated list of
source expressions, each with its own iteration alias. The following
example pairs up all Customer rows with their corresponding Order
rows, and produces a table listing the customer's name and the shipping
date for the order.
rows = select c.ContactName, o.ShippedDate
from c in DB.Customers, o in DB.Orders
where c.CustomerID == o.CustomerID;
Cω also supports more sophisticated table-joining semantics from
the SQL world including
inner join, left join, right join, and outer
join using the corresponding keywords. Explanations of the
semantics of the various kinds of joins is available at the
W3Schools tutorial
on SQL JOIN. The following example shows a select expression that
uses the inner join keyword:
rows = select c.ContactName, o.ShippedDate
from c in DB.Customers
inner join o in DB.Orders
on c.CustomerID == o.CustomerID;
Data Modification
The relational data access capabilities of Cω are not limited
to querying data. One can also insert new rows into a table using
the insert command, modify existing rows within a table using
the update command, or delete rows from a table using
the delete command.
The insert command is an expression that evaluates to the
number of successful inserts that have been made to the table as a
result of executing the command. The following example inserts a new
customer in the Customers table:
int n = insert CustomerID = "ABCDE", ContactName="Frank", CompanyName="Acme"
into DB.Customers;
The same effect can be obtained by using an anonymous struct as opposed to setting each field directly as shown below:
row = new{CustomerID = "ABCDE", ContactName="Frank", CompanyName="Acme"};
int n = insert row into DB.Customers;
The update command is an expression that evaluates to the
number of rows that were successfully modified as a result of
executing the command. The following example shows how to do a global
replace of all misspelled references to the city "London":
int n = update DB.Customers
set City = "London"
where Country == "UK" && City == "Lundon";
It is also possible to modify all rows in the table by omitting the
where clause. The delete command is an expression that
evaluates to the number of rows that were successfully deleted as a
result of executing the command. The following example deletes all the
orders for customers in London:
int n = delete o from c in DB.Customers
inner join o in DB.Orders
on c.CustomerID == o.CustomerID
where c.City == "London";
Most applications that use insert, update, and delete expressions
will also use transactions to guarantee
ACIDity
(atomicity, consistency, isolation, and durability) of one or
more changes to the database. The Cω language includes
a transact statement that promotes the notion of initiating and
exiting transactions in a programming language
feature. The transact statement is a statement bounding a block
of code associated with a database transaction. If the code executes
successfully and attempts to exit the block, the transaction will
automatically be committed. If the block is exited due to a thrown
exception, the transaction will automatically be aborted. The developer
can also specify handlers that execute some user-defined code once the
transact block is exited using the commit and rollback
handlers. The following example shows transactions in action:
transact(DB) {
delete from DB.Customers where CustomerID == "ALFKI";
}
commit {
Console.WriteLine("commited");
}
rollback {
Console.WriteLine("aborted");
}
Using XML Literals in Cω
In Cω one can construct object instances using XML syntax. This feature is modelled after the ability to construct elements in languages like XQuery and XSLT. Like XQuery and XSLT, the XML literals can contain embedded code for constructing values. However, since Cω is statically typed, the names of members and types must be known at compile time and thus can not be constructed dynamically.
One can control the shape of the XML using a
number of constructs. One can specify that fields should be treated
as attributes in the XML literal with the use of
the attribute modifier on the member
declaration. Similarly, choice types and anonymous structs are
treated as the children of the XML element that maps to the content
class. The following example shows how to initialize a content class
from an XML literal:
using Microsoft.Comega;
using System;
public class NewsItem{
attribute string title;
attribute string author;
struct { DateTime date; string body; }
public static void Main(){
NewsItem news = <NewsItem title="Hello World" author="Dare Obasanjo">
<date>{DateTime.Now}</date>
<body>I am the first post of the New Year.</body>
</NewsItem>;
Console.WriteLine(news.title + " by " + news.author + " on " + news.date);
}
}
XML literals are intended to make the process of constructing strongly
typed XML much simpler. Consider the following XQuery example taken
from the W3C XQuery Use cases document. It iterates over a
bibliogrpahy that contains a number of books. For each book in the
bibliography, it lists the title and authors, grouped inside
a result element.
for $b in $bs/book
return
<result>
{$b/title}
{$b/author}
</result>
Performing the equivalent task in Cω looks very similar.
foreach (b in bs.book){
yield return <result>
{b.title}
{b.author}
</result>
}
Conclusion
The Cω language is an interesting attempt to bridge the impedence mismatches involved in typical enterprise development efforts when crossing the boundaries of the relational, object oriented, and XML worlds. A number of the ideas in the language have taken hold in academia and within Microsoft as is evidenced by comments by Anders Hejlsberg about the direction of C# 3.0. Until then, developers can explore what it means to program in a more data-oriented programming language by downloading the Cω compiler or Visual Studio.NET plugin
Acknowledgements
I'd like to thank Don Box, Mike Champion, Denise Draper, and Erik Meijer for their ideas and feedback while writing this article.
- SQL Features
2005-01-20 07:33:31 Len Bullard - That's Not The XPath .. I Know
2005-01-16 01:39:16 Richard Cox