Indexes and Queries
Here is our simple object graph again:
using System;
using System.Collections.Generic;

[Serializable]
public partial class Customer
{
    public Customer()
    {
        Contacts = new List<Contact>();
    }

    public int Number { get; set; }
    public string Name { get; set; }
    public IList<Contact> Contacts { get; private set; }
}

[Serializable]
public class Contact
{
    public string GivenName { get; set; }
    public string Initial { get; set; }
    public string FamilyName { get; set; }
    public DateTime DateOfBirth { get; set; }
}
We can create an index over our customer graph. Here we index the customer number, which is analogous to a primary key.
public class CustomersByNumber : IIndex<Customer, int>
{
    /// <summary>
    /// For each Customer we yield one value for the customer number.
    /// </summary>
    public IEnumerable<int> Yield(Customer customer)
    {
        yield return customer.Number;
    }
}
This produces an index of customer numbers, where each customer number points to the related persisted customer instance.
The Yield method declared on IIndex<TGraph,TIndex> returns IEnumerable<TIndex>. This allows us to return multiple index entries for a single graph.
The following index yields multiple keys, one per contact for the customer. The index is always against the graph root, rather than a member (i.e. customer, not the contact).
public class CustomersByContactFamilyName : IIndex<Customer, string>
{
    public IEnumerable<string> Yield(Customer customer)
    {
        return customer.Contacts.Select(contact => contact.FamilyName.ToUpper());
    }
}
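To make the "many keys, one root graph" idea concrete, here is a minimal in-memory sketch. It is not how Stash stores its indexes; SketchCustomer, SketchContact and IndexSketch are hypothetical stand-ins invented for this illustration, and a plain dictionary plays the part of the backing store. Each key yielded for a customer maps back to the customer number (the root), never to the individual contact.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for illustration only; these are not Stash types.
public class SketchContact
{
    public string FamilyName { get; set; }
}

public class SketchCustomer
{
    public int Number { get; set; }
    public List<SketchContact> Contacts { get; } = new List<SketchContact>();
}

public static class IndexSketch
{
    // Same shape as the Yield method above: one upper-cased key per contact.
    public static IEnumerable<string> YieldFamilyNames(SketchCustomer customer)
    {
        return customer.Contacts.Select(contact => contact.FamilyName.ToUpper());
    }

    // Builds key -> customer numbers. Every key points at the root graph.
    public static Dictionary<string, List<int>> Build(IEnumerable<SketchCustomer> customers)
    {
        var entries = new Dictionary<string, List<int>>();
        foreach (var customer in customers)
        {
            // Distinct() keeps a customer from being listed twice under one key.
            foreach (var key in YieldFamilyNames(customer).Distinct())
            {
                if (!entries.TryGetValue(key, out var numbers))
                {
                    numbers = new List<int>();
                    entries[key] = numbers;
                }
                numbers.Add(customer.Number);
            }
        }
        return entries;
    }

    // Two customers that share a contact family name.
    public static List<SketchCustomer> SampleCustomers()
    {
        var acme = new SketchCustomer { Number = 5 };
        acme.Contacts.Add(new SketchContact { FamilyName = "Smith" });
        acme.Contacts.Add(new SketchContact { FamilyName = "Jones" });

        var waldo = new SketchCustomer { Number = 20 };
        waldo.Contacts.Add(new SketchContact { FamilyName = "Smith" });

        return new List<SketchCustomer> { acme, waldo };
    }
}
```

Calling IndexSketch.Build(IndexSketch.SampleCustomers()) gives a dictionary in which the key "SMITH" lists both customer numbers, while "JONES" lists only customer 5.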
Indexes can be against supertypes or interfaces, which provides for some powerful polymorphic behaviour in queries. TODO Demo this.
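Until a proper demo exists, the following sketch shows the general shape such an index might take. Everything in it is an assumption made for illustration: ISketchIndex is a local stand-in that copies the shape of IIndex<TGraph,TIndex>, and IHaveName, SketchSupplier, SketchReseller and SketchGraphsByName are hypothetical names. It only shows one index yielding keys for several implementing types; how Stash registers and queries such an index is not covered here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local stand-in with the same shape as IIndex<TGraph, TIndex>; not the Stash interface.
public interface ISketchIndex<TGraph, TKey>
{
    IEnumerable<TKey> Yield(TGraph graph);
}

// Hypothetical interface shared by otherwise unrelated graph types.
public interface IHaveName
{
    string Name { get; }
}

public class SketchSupplier : IHaveName
{
    public string Name { get; set; }
}

public class SketchReseller : IHaveName
{
    public string Name { get; set; }
}

// One index written against the interface, so it can key any implementing graph.
public class SketchGraphsByName : ISketchIndex<IHaveName, string>
{
    public IEnumerable<string> Yield(IHaveName graph)
    {
        yield return graph.Name.ToUpper();
    }
}

public static class PolymorphicIndexSketch
{
    // Collects the keys the single index yields for a mixed set of graphs.
    public static List<string> KeysFor(IEnumerable<IHaveName> graphs)
    {
        var index = new SketchGraphsByName();
        return graphs.SelectMany(graph => index.Yield(graph)).ToList();
    }
}
```

Passing a SketchSupplier and a SketchReseller to PolymorphicIndexSketch.KeysFor returns one upper-cased key for each, even though the two classes share nothing but the interface.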
The following index yields aggregate information for a customer, in this case the number of contacts.
public class CustomersByNumberOfContacts : IIndex<Customer, int>
{
    public IEnumerable<int> Yield(Customer customer)
    {
        yield return customer.Contacts.Count();
    }
}
We need to register indexes in addition to graphs. We register instances of indexes (rather than types as we do for graphs). The instances are used at runtime by Stash, so ensure that indexes are stateless and do not hold on to resources.
Kernel.Kickstart(
    new BerkeleyBackingStore(new DefaultBerkeleyBackingStoreEnvironment(TempDir)),
    register =>
        {
            register.Graph<Customer>();
            register.Index(new CustomersByNumber());
            register.Index(new CustomersByContactFamilyName());
            register.Index(new CustomersByNumberOfContacts());
        });
We can persist some customers to demonstrate the use of these indexes in queries.
var session = Kernel.SessionFactory.GetSession();
var customerStash = session.GetStashOf<Customer>();
var customer1 = new Customer {Number = 5, Name = "Acme Tackle"};
customer1.Contacts.Add(new Contact {GivenName = "Bob", FamilyName = "Smith"});
customer1.Contacts.Add(new Contact {GivenName = "Jane", FamilyName = "Jones"});
var customer2 = new Customer {Number = 20, Name = "Waldo Robotics"};
customer2.Contacts.Add(new Contact {GivenName = "Henry", FamilyName = "Dangerfield"});
customer2.Contacts.Add(new Contact {GivenName = "Roberta", FamilyName = "Williams"});
customer2.Contacts.Add(new Contact {GivenName = "Fred", FamilyName = "Smith"});
var customer3 = new Customer {Number = 1, Name = "Spam4U"};
customer3.Contacts.Add(new Contact {GivenName = "Dick", FamilyName = "Dastardly"});
customerStash.Endure(customer1);
customerStash.Endure(customer2);
customerStash.Endure(customer3);
session.Complete();
Stash automatically indexes the graphs as we endure them. Once we have stashed the graphs, we can get a stashed set within a session and match against these indexes.
Let's explore some of the operators offered by Stash:
[Fact]
public void we_can_get_customers_by_their_number_of_contacts()
{
    var customers = session.GetStashOf<Customer>();
    var theSpammer =
        customers
            .Matching(_ => _.Where<CustomersByNumberOfContacts>().EqualTo(1))
            .Single();
    theSpammer.Number.ShouldEqual(1);
    theSpammer.Name.ShouldEqual("Spam4U");
}

[Fact]
public void we_can_get_customers_with_more_than_one_contact()
{
    var customersWithManyContacts =
        session.GetStashOf<Customer>()
            .Matching(_ => _.Where<CustomersByNumberOfContacts>().GreaterThan(1));
    customersWithManyContacts.ShouldHaveCount(2);
}

[Fact]
public void we_can_get_customers_with_contacts_having_the_family_name_Smith()
{
    var customersEmployingSmiths =
        session.GetStashOf<Customer>()
            .Matching(_ => _.Where<CustomersByContactFamilyName>().EqualTo("SMITH"));
    customersEmployingSmiths.ShouldHaveCount(2);
}

[Fact]
public void we_can_get_customers_with_contacts_having_the_family_name_starting_with_Da()
{
    var customersWithEmployeesHavingNamesStartingDa =
        session.GetStashOf<Customer>()
            .Matching(_ => _.Where<CustomersByContactFamilyName>().StartsWith("DA"));
    customersWithEmployeesHavingNamesStartingDa.ShouldHaveCount(2);
}

[Fact]
public void we_can_get_customers_having_between_1_and_2_contacts()
{
    var customersWithOneOrTwoContacts =
        session.GetStashOf<Customer>()
            .Matching(_ => _.Where<CustomersByNumberOfContacts>().Between(1, 2));
    customersWithOneOrTwoContacts.ShouldHaveCount(2);
}

[Fact]
public void we_can_join_customers_with_contacts_having_the_family_name_starting_with_Da_and_between_1_and_2_contacts()
{
    var customersWithOneOrTwoContactsAndEmployeesHavingNamesStartingDa =
        session.GetStashOf<Customer>()
            .Matching(
                _ =>
                _.IntersectionOf(
                    _.Where<CustomersByContactFamilyName>().StartsWith("DA"),
                    _.Where<CustomersByNumberOfContacts>().Between(1, 2)
                    )
            )
            .ToList();
    customersWithOneOrTwoContactsAndEmployeesHavingNamesStartingDa.ShouldHaveCount(1);
    customersWithOneOrTwoContactsAndEmployeesHavingNamesStartingDa.Single().Number.ShouldEqual(1);
}
We have introduced something quite subtle in this last example. Take notice of the ToList() call, which materialises the query (i.e. executes the deferred query). Without it, each of the assertions that follow would re-execute the query. Stash is designed to work with LINQ to Objects (i.e. StashedSet implements IEnumerable). As with any LINQ expression, execution is deferred until an operation enumerates the results. Materialising explicitly avoids unintentionally re-executing the entire query for each assertion.
This concept is particularly important because some queries are potentially expensive. To make this more obvious we have introduced the Materialize() extension method (which is simply a wrapper around ToList()). Use Materialize() rather than ToList(), as it makes the intent of your code clearer.
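The cost of re-enumeration is easy to observe with plain LINQ to Objects. The sketch below uses no Stash types: an array stands in for the stashed set, and MaterializeSketch is a hypothetical wrapper written from the description above, not Stash's own source. A counter records how often the filter predicate runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionSketch
{
    // Hypothetical wrapper modelled on the description above; not Stash's implementation.
    public static List<T> MaterializeSketch<T>(this IEnumerable<T> source)
    {
        return source.ToList();
    }

    // Enumerates a deferred query twice and reports how often the predicate ran.
    public static int PredicateCallsWhenDeferred()
    {
        var calls = 0;
        var numbers = new[] { 1, 5, 20 };
        IEnumerable<int> query = numbers.Where(n => { calls++; return n > 1; });

        query.Count(); // first enumeration runs the predicate for every element
        query.Count(); // second enumeration runs it for every element again
        return calls;
    }

    // Materialises once, then reads the resulting list twice.
    public static int PredicateCallsWhenMaterialized()
    {
        var calls = 0;
        var numbers = new[] { 1, 5, 20 };
        List<int> results = numbers.Where(n => { calls++; return n > 1; }).MaterializeSketch();

        results.Count(); // reads the list; the predicate does not run again
        results.Count();
        return calls;
    }
}
```

With three source elements, the deferred version invokes the predicate six times and the materialised version three times. The gap widens with every extra assertion or loop over the results.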
[Fact]
public void we_can_union_customers_with_contacts_having_the_family_name_starting_with_Da_and_more_than_2_contacts()
{
    // UnionOf returns customers matching either clause, so the variable name says "Or".
    var customersWithMoreThanTwoContactsOrEmployeesHavingNamesStartingDa =
        session.GetStashOf<Customer>()
            .Matching(
                _ =>
                _.UnionOf(
                    _.Where<CustomersByContactFamilyName>().StartsWith("DA"),
                    _.Where<CustomersByNumberOfContacts>().GreaterThan(2))
            )
            .Materialize();
    customersWithMoreThanTwoContactsOrEmployeesHavingNamesStartingDa.ShouldHaveCount(2);
    customersWithMoreThanTwoContactsOrEmployeesHavingNamesStartingDa.ShouldContain(_ => _.Name == "Waldo Robotics");
    customersWithMoreThanTwoContactsOrEmployeesHavingNamesStartingDa.ShouldContain(_ => _.Name == "Spam4U");
}
In the following example we make use of the deferred nature of the StashedSet by building a base query and then extending it into two distinct queries which are then separately materialised. This behaviour is useful for building specifications dynamically at runtime.
[Fact]
public void we_need_the_call_to_Materialize_as_we_can_extend_our_queries_by_adding_additional_matching_clauses()
{
    var withTwoOrMoreContacts =
        session.GetStashOf<Customer>()
            .Matching(_ => _.Where<CustomersByNumberOfContacts>().GreaterThanEqual(2));

    var twoOrMoreAndEmployingSmith =
        withTwoOrMoreContacts
            .Matching(_ => _.Where<CustomersByContactFamilyName>().EqualTo("SMITH"))
            .Materialize();

    var twoOrMoreAndCustomerNumberGreaterThan10 =
        withTwoOrMoreContacts
            .Matching(_ => _.Where<CustomersByNumber>().GreaterThan(10))
            .Materialize();

    twoOrMoreAndEmployingSmith.ShouldHaveCount(2);
    twoOrMoreAndCustomerNumberGreaterThan10.ShouldHaveCount(1);
}
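The same idea of building specifications at runtime can be sketched in plain LINQ to Objects. This is an analogy only: it uses ordinary predicates and Where rather than Stash's Matching clauses, and SpecificationSketch and its members are hypothetical names. Each predicate is layered onto the query without enumerating the source, so nothing runs until the caller materialises the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SpecificationSketch
{
    // Layers each predicate onto the query in turn.
    // The source is not enumerated here, so the returned query is still deferred.
    public static IEnumerable<T> ApplyAll<T>(
        IEnumerable<T> source,
        IEnumerable<Func<T, bool>> specifications)
    {
        return specifications.Aggregate(source, (query, spec) => query.Where(spec));
    }

    // Builds the predicate list at runtime and materialises once at the end.
    public static List<int> NumbersAboveOneAndBelowTen()
    {
        var customerNumbers = new[] { 5, 20, 1 };
        var specifications = new List<Func<int, bool>>
        {
            number => number > 1,
            number => number < 10
        };
        return ApplyAll(customerNumbers, specifications).ToList();
    }
}
```

Because the predicate list is ordinary data, it can be assembled from user input or configuration before the query is ever executed.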