Wednesday, 18 October 2017

Infinispan coming to Basel One 2017!


Tomorrow, 19th October, I'll be presenting at Basel One 2017 a talk on "Streaming Data Analysis with Kubernetes", where I'll be combining OpenShift, Infinispan and Vert.x to demonstrate how to process streaming data.

If you're in Basel and want to find out more about these technologies, make sure you come! The talk is at 16:00 :D

Cheers,
Galder

Friday, 6 October 2017

Infinispan 9.2.0.Alpha1 released

Dear Infinispan Community,

Today we continue with our time-boxed releases, this time on 9.2 release branch! We prepared The Infinispan 9.2.0.Alpha1 for you. As usual, it can be found on our download page.

9.2.0.Alpha1 brings Infinispan MultimapCache - a new distributed and clustered collection type that maps keys to values in which each key can contain multiple values. We rolled out support for embedded cache mode, but upcoming releases will have support for other Infinispan modes, including client invocation through hotrod.

Aside from MultimapCache we also include the usual slew of bug fixes, clean ups, and general improvements. Full details of the new features and enhancements included in this release can be found here.

We have a lot more exciting features coming up on Infinispan 9.2 branch. Thank you for following us and stay tuned!

The Infinispan Team

Monday, 2 October 2017

Better Late than Never: Remote Cache collections

One of the main benefits of Infinispan extending the java.util.Map interface when we introduced our Cache interface  was that users would immediately be able to use a well established and familiar API.

The unfortunate thing about this relationship is that now the Cache interface also has to implement all of the other methods such as keySet, values and entrySet. Originally Infinispan didn't implement these collections or returned an immutable copy (requiring all elements to be in memory). Neither choice is obviously desirable.

This all changed with ISPN-4836 which provided backing implementations of keySet, values and entrySet collections. This means that all methods were now provided and would keep up to date with changes to the underlying Cache and updates to these collections would be persisted down to the Cache. The implementation also didn't keep a copy of all contents and instead allowed for memory efficient iteration. And if the user still wanted to use a copy they could still do that, by iterating over the collection and copying themselves. This later spring boarded our implementation of Distributed Stream as well.

The problem was that the RemoteCache was left in the old state, where some things weren't implemented and others were copies just like how embedded caches used to be.
Well I can now gladly say with the release of Infinispan 9.1 that RemoteCache now has backing implementations of keySet, values and entrySet implemented via ISPN-7900. Thus these collections support all methods on these collections and are backed by the underlying RemoteCache.

Unfortunately the Stream methods on these collections are not distributed like embedded, but we hope to someday improve that as well. Instead these streams must iterate over the cache to perform the operations locally. By default these will pull 10,000 entries at a time to try to make sure that memory is not overburdened on the client. If you want to decrease this number (less memory - lower performance) or increase (more memory - higher performance) you can tweak this by changing the batchSize parameter via ConfigurationBuilder or infinispan.client.hotrod.batch_Size if you use a property based file.

You can read more about this and the remote iterator which drives these collections on our user guide.

We hope you find that this improves your usage of RemoteCaches in the future by allowing you to have backed collections that also allow you to use the improvements of Java 8 with Streams.

If you have yet you can acquire Infinispan 9.1.1 or the latest stable version at http://infinispan.org/download/

Wednesday, 27 September 2017

JGroups workshops in Rome and Berlin in November

I'm happy to announce that I've revamped the JGroups workshop and I'll be teaching 2 workshops in November: Rome Nov 7-10 and Berlin Nov 21-24.

The new workshop is 4 days (TUE-FRI). I've updated the workshop to use the latest version of JGroups (4.0.x), removed a few sections and added sections on troubleshooting/diagnosis and handling of network partitions (split brain).

Should be a fun and hands-on workshop!

To register and for the table of contents, navigate to http://www.jgroups.org/workshops.html

To get the early-bird discount, use code EARLYBIRD.

Cheers,

Bela

Wednesday, 20 September 2017

Infinispan 9.1.1.Final is out

Dear Infinispan users,

after 9.1.0.Final we decided to go on a bug-squashing spree before embarking on the feature work that will be part of 9.2.

So, fresh out of the build baking machine is 9.1.1.Final. You can peruse the list of fixed bugs and you can get the bits from our download page.

Enjoy !

Friday, 15 September 2017

DevNation Live video and slides available now!


Last week I gave a talk for DevNation Live on how to handle big data with Infinispan. The recording of the talk is available now from here, and the slides are here too.

In case you missed the talk, don't miss the opportunity to see what Infinispan can do with big data!

Cheers,
Galder

Sunday, 10 September 2017

Multi-tenancy - Infinispan as a Service (also on OpenShift)

In recent years Software as a Service concept has gained a lot of traction. I'm pretty sure you've used it many times before. Let's take a look at a practical example and explain what's going on behind the scenes.

Practical example - photo album application

Imagine a very simple photo album application hosted within the cloud. Upon the first usage you are asked to create an account. Once you sign up, a new tenant is created for you in the application with all necessary details and some dedicated storage just for you. Starting from this point you can start using the album - download and upload photos. 

The software provider that created the photo album application can also celebrate. They have a new client! But with a new client the system needs to increase its capacity to ensure it can store all those lovely photos. There are also other concerns - how to prevent leaking photos and other data from one account into another? And finally, since all the content will be transferred through the Internet, how to secure transmission?

As you can see, multi-tenancy is not that easy as it would seem. The good news is that if it's properly configured and secured, it might be beneficial both for the client and for the software provider. 

Multi-tenancy in Infinispan

Let's think again about our photo album application for a moment. Whenever a new client signs up we need to create a new account for him and dedicate some storage. Translating that into Infinispan concepts this would mean creating a new CacheContainer. Within a CacheContainer we can create multiple Caches for user details, metadata and photos. You might be wondering why creating a new Cache is not sufficient? It turns out that when a Hot Rod client connects to a cluster, it connects to a CacheContainer exposed via a Hot Rod Endpoint. Such a client has access to all Caches. Considering our example, your friends could possibly see your photos. That's definitely not good! So we need to create a CacheContainer per tenant. Before we introduced Multi-tenancy, you could expose each CacheContainer using a separate port (using separate Hot Rod Endpoint for each of them). In many scenarios this is impractical because of proliferation of ports. For this reason we introduced the Router concept. It allows multiple clients to access their own CacheContainers through a single endpoint and also prevents them from accessing data which doesn't belong to them. The final piece of the puzzle is transmitting sensitive data through an unsecured channel such as the Internet. The use of TLS encryption solves this problem. The final outcome should look like the following:


The Router component on the diagram above is responsible for recognizing data from each client and redirecting it to the appropriate Hot Rod endpoint.
As the name implies, the router inspects incoming traffic and reroutes it to the appropriate underlying CacheContainer. To do this it can use two different strategies depending on the protocol: TLS/SNI for the Hot Rod protocol, matching each server certificate to a specific cache container  and path prefixes for REST.
The SNI strategy detects the SNI Host Name (which is used as tenant) and also requires TLS certificates to match. By creating proper trust stores we can match which tenant can access which CacheContainers.
URL path prefix is very easy to understand, but it is also less secure unless you enable authentication. For this reason it should not be used in production unless you know what you are doing (the SNI strategy for the REST endpoint will be implemented in the near future). Each client has its own unique REST path prefix that needs to be used for accessing the data (e.g. http://127.0.0.1:8080/rest/client1/fotos/2).

Confused? Let's clarify this with an example.

Foto application sample configuration

The first step is to generate proper key/trust stores for the server and client:


The next step is to configure the server. The snippet below shows only the most important parts:


Let's analyze the most critical lines:
  • 7, 15 - We need to add generated key stores to the server identities
  • 25, 30 - It is highly recommended to use separate CacheContainers
  • 38, 39 - A Hot Rod connector (but without socket binding) is required to provide proper mapping to CacheContainer. You can also use many useful settings on this level (like ignored caches or authentication).
  • 42 - Router definition which binds into default Hot Rod and REST ports.
  • 44 - 46 - The most important bit which states that only a client using SSLRealm1 (which uses trust store corresponding to client_1_server_keystore.jks) and TLS/SNI Host name client-1 can access Hot Rod endpoint named multi-tenant-hotrod-1 (which points to CacheContainer multi-tenancy-1).

Improving the application by using OpenShift

Hint: You might be interested in looking at our previous blog posts about hosting Infinispan on OpenShift. You may find them at the bottom of the page.

So far we've learned how to create and configure a new CacheContainer per tenant. But we also need to remember that system capacity needs to be increased with each new tenant. OpenShift is a perfect tool for scaling the system up and down. The configuration we created in the previous step almost matches our needs but needs some tuning.

As we mentioned earlier, we need to encrypt transport between the client and the server. The main disadvantage is that OpenShift Router will not be able to inspect it and take routing decisions. A passthrough Route fits perfectly in this scenario but requires creating TLS/SNI Host Names as Fully Qualified Application Names. So if you start OpenShift locally (using oc cluster up) the tenant names will look like the following: client-1-fotoalbum.192.168.0.17.nip.io

We also need to think how to store generated key stores. The easiest way is to use Secrets:


Finally, a full DeploymentConfiguration:



If you're interested in playing with the demo by yourself, you might find a working example here. It mainly targets OpenShift but the concept and configuration are also applicable for local deployment.

Links

Wednesday, 6 September 2017

Join us online on 7th September for DevNation talk on Infinispan!


Tomorrow, 7th of September at 12:00pm EDT, I will be doing an online live tech talk for Red Hat DevNation on showing how to handle big data with Infinispan.

If you didn't manage to attend my talk at Great Indian Developer Summit or Berlin Buzzwords, this is your chance to see it live and direct, with live coding demos included!

You can register to see the talk live visiting the Red Hat DevNation Live site.

Cheers,
Galder

Friday, 21 July 2017

Cluster Counter

In Infinispan 9.1 we introduce the clustered counters. It is a counter distributed and shared among all nodes in the cluster and today we are going to know more about it.

Installation

To use a counter in your Infinispan cluster, first you have to add the maven dependency to your project. As you can see, it is simple as doing:

After adding the module, you can retrieve the CounterManager and start creating and using counters.

CounterManager

Each CounterManager is associated to a CacheManager. But, before showing how to use it, first we have some configuration to be done.

There are two attributes that you can configure: The num-owner - that represents the number of copies of the counter's value in a cluster; and the reliability - that represents the behavior of the counters in case of partitions.

Below, is the configuration example with their default values.
XML:
Programatically:
Then, you can retrieve the CounterManager from the CacheManager, as shown below, and start using the counters!

Counter

A counter is identified by its name. Also, it is initialized with an initial value (by default 0) and it can be persisted, if the value needs to survive a cluster restart.

There are 2 types of counters: strong and weak counters.

Strong Counters

The strong counter provides higher consistency. Its value is known during the update and its updates are applied atomically. This allows to set boundaries and provides conditional operation (as compare-and-set).

Configuration

A strong counter can be configured in the configuration file or programatically. They can be also created dynamically at runtime. Below shows us how it can be done:

XML:
Programatically:
Runtime:

Use Case

The strong counter fits the following uses cases:
  • Global Id Generator
Due to its strong consistency, it can be used as a global identifier generator, as in the example below:

  • Rate Limiter
If bounded, it can be used as a simple rate limiter. Just don't forget to invoke reset()...

  • Simply count "stuff"
Well, it is a counter after all...

Weak Counters

The weak counter provides eventual consistency and its value is not known during updates. It provides faster writes when comparing with the strong counter.

Configuration

As in strong counter, the weak counter can be configure its name and its initial value. In addition, a concurrency-level can be configure to set the number of concurrent updates that can be handled in parallel. Below shows us how to configure it:

XML:
Programatically:
Runtime:

Use Case

The main use case for the weak counter includes all scenarios where its value isn't needed while updating the counter. For example, it can be used to count the number of visits to some resource:

For more information, take a look at the documentation. If you have any feedback, or would like to request some new features, or found some issue, let us know via the forumissue tracker or the #infinispan channel on Freenode.

Monday, 17 July 2017

Conflict Management and Partition Handling

In Infinispan 9.1.0.Final we have overhauled the behaviour and configuration of partition handling in distributed and replicated caches.  Partition handling is no longer simply enabled/disabled, instead a partition strategy is configured. This allows for more fine-grained control of a cache's behaviour when a split brain occurs. Furthermore, we have created the ConflictManager component so that conflicts on cache entries can be automatically resolved on-demand by users and/or automatically during partition merges .


Conflict Manager


During a cache's lifecycle it is possible for inconsistencies to appear between replicas of a cache entry  due to a variety of reasons (e.g replication failures, incorrect use of flags etc).  The conflict manager is a tool that allows users to retrieve all stored replica values for a cache entry. In addition to allowing users to process a stream of cache entries whose stored replicas have conflicting values. Furthermore, by utilising implementations of the EntryMergePolicy interface it is possible for said conflicts to be resolved deterministically.

EntryMergePolicy


In the event of conflicts arising between one or more replicas of a given CacheEntry, it is necessary for a conflict resolution algorithm to be defined, therefore we provide the EntryMergePolicy interface. This interface consists of a single method, "merge", whose output is utilised as the "resolved" CacheEntry for a given key. A non-null return value is put to all replicas of the CacheEntry in question, whereas a null return value results in all replicas being removed from the cache.

The merge method takes two parameters: the "preferredEntry" and "otherEntries". In the context of a partition merge, the preferredEntry is the CacheEntry associated with the partition whose coordinator is conducting the merge (or if multiple entries exist in this partition, it’s the primary replica). However, in all other contexts, the preferredEntry is simply the primary replica. The second parameter, "otherEntries" is simply a list of all other entries associated with the key for which a conflict was detected.

Currently Infinispan provides the following implementations of EntryMergePolicy:


Policy Description
MergePolicies.PREFERRED_ALWAYS Always utilise the "preferredEntry".
MergePolicies.PREFERRED_NON_NULL Utilise the "preferredEntry" if it is non-null, otherwise utilise the first entry from "otherEntries".
MergePolicies.REMOVE_ALL Always remove a key from the cache when a conflict is detected.

Application Usage


For conflict resolution during partition merges, once an EntryMergePolicy has been configured for the cache, no additional actions are required by the user.  However, if an Infinispan user would like to utilise the ConflictManager explicitly in their application, it should be retrieved by passing an AdvancedCache instance to the ConflictManagerFactory

Note, that depending on the number of entries in the cache, the getConflicts and resolveConflict methods are expensive operations, as they both depend on a spliterator which lazily loads cache entries on a per segment basis. Consequently, when operating in distributed mode, if many conflicts exist, it is possible for an OutOfMemoryException to occur on the node searching for conflicts.


Partition Handling Strategies


In 9.1.0.Final the partition handling enabled/disabled option has been deprecated and users must now configure an appropriate PartitionHandling strategy for their application. A partition handling strategy determines what operations can be performed on a cache when a split brain event has occurred. Ultimately, in terms of Brewer’s CAP theorem, the configured strategy determines whether the cache's availability or consistency is sacrificed in the presence of partition(s). Below is a table of the provided strategies and their characteristics:


Strategy Description CAP
DENY_READ_WRITES If the partition does not have all owners for a given segment, both reads and writes are denied for all keys in that segment.

This is equivalent to setting partition handling to true in Infinispan 9.0.
Consistency
ALLOW_READS Allows reads for a given key if it exists in this partition, but only allows writes if this partition contains all owners of a segment. Availability
ALLOW_READ_WRITES Allow entries on each partition to diverge, with conflicts resolved during merge.

This is equivalent to setting partition handling to false in Infinispan 9.0.
Availability

Conflict Resolution on Partition Merge


When utilising the ALLOW_READ_WRITES partition strategy it is possible for the values of cache entries to diverge between competing partitions. Therefore, when the two partitions merge, it is necessary for these conflicts to be resolved. Internally Infinispan utilises a cache's ConflictManager to search for cache entry conflicts and then applies the configured EntryMergePolicy to automatically resolve said conflicts before rebalancing the cache. This conflict resolution is completely automatic and does not require any additional code or input from Infinispan users.

Note, that if you do not want conflicts to be resolved automatically during a partition merge, i.e. the behaviour before 9.1.x, you can set the merge-policy to null (or NONE in xml). 


Configuration

Programmatic



XML




Conclusion


Partition handling has been overhauled in Infinispan 9.1.0.Final to allow for increased control over a cache's behaviour. We have introduced the ConflictManager which enables users to inspect and manage the consistency of their cache entries via custom and provided merge policies.

If you have any feedback on the partition handling changes, or would like to request some new features/optimisations, let us know via the forumissue tracker or the #infinispan channel on Freenode.

Friday, 14 July 2017

Infinispan 9.1 "Bastille"

Dear Infinispan users,

after 3½ months, we are proud to present to you our latest stable release, Infinispan 9.1, codenamed "Bastille".

While minor releases are traditionally evolutionary instead of revolutionary, this release still comes loaded with a number of great features:

Scattered cache

A new clustered cache, similar to a distributed cache, but with a higher write throughput.

Consistency Checker, Conflict Resolution and Automatic merge policies

An overhaul to partition handling which allows much finer control about whether to allow reads and writes in split clusters and how data is reconciled when partitions are merged.

Clustered Counters

An implementation of clustered counters with both strong and weak semantics, threshold events, optional persistence and bounding. Currently these are only available in embedded mode, but they will be usable over Hot Rod in Infinispan 9.2.

Locked Streams

Locked streams allow you to run your stream processing operations knowing that another update cannot be performed while the Consumer is executed on an entry. Note this only works in non transactional and pessimistic transactional caches (optimistic transactional caches are not supported).

API improvements

The compute(), computeIfPresent() and computeIfAbsent() methods on the Cache interface are now implemented as proper distributed operations so that they run local to the entries.
The DeltaAware interface for supporting granular clustered operations has been deprecated in favour of functional commands.

Persistence improvements

The CacheStore SPI now supports write batching. The JDBC, JPA, RocksDB, Remote and File stores have been modified to take advantage of this. You should see great benefits when using write-behind or when using putAll operations.

Remote query with JBoss Marshalling

Remote query now also works with Java entities annotated with Hibernate Search annotations and JBoss Marshalling without requiring ProtoBuf.

HTTP/2 and ALPN support on the REST endpoint

The REST endpoint has been completely rewritten so that it now supports both HTTP/1.1 and HTTP/2 as well as ALPN (even on Java 8). The new endpoint is also 30% faster during reads and 6% faster during writes.

Hot Rod Java client improvements

The Java Hot Rod client now has proper entrySet(), keySet() and values() implementations which iterate over the remote data instead of pulling it all locally.
It is now also finally possible to create and remove caches directly from the client.

Server Administration console improvements

The console has received a number of updates for usability and consistency. It is also finally possible to configure and manage the remote endpoints.

Component upgrades

Hibernate Search 5.8, JGroups 4.0.4, KUBE Ping 1.0.0.Beta1

Bug fixes

We have also dropped the guillotine on a large number of bugs.

If all goes well, we plan to release Infinispan 9.2 at the end of October, with lots of great updates.

So, head over to our download page, consult the upgrading guide and let us know about how you use Infinispan.

Cheers !

The Infinispan team

Thursday, 6 July 2017

Reactive Big Data demo working with Infinispan 9.0.3.Final

A couple of months ago I did an extensive blog post on a reactive Big Data demo I did for Great Indian Developer Summit. At the time, the demo relied on a custom Infinispan build which fixed ISPN-7814 and ISPN-7710 issues.

These issues are now fixed in the main repository and the 9.0.x branch, and so you can now run the demo, as is, using Infinispan 9.0.3.Final. I've updated the demo so that it uses this version.

Cheers,
Galder

Monday, 3 July 2017

Scattered cache

Infinispan strives for high throughput and low latency. Version 9.1.0.CR1 comes with a new cache mode - scattered cache - that's our answer to low round-trip times for write operations. Through a smart routing algorithm it guarantees that the write operation will result in only single RPC, where distributed caches with 2 owners would often use 2 RPCs. Scattered cache is resilient against single node failure (this is equivalent to distributed cache with 2 owners) and does not support transactions.

Declarative configuration is straigtforward: and programmatic is a piece of cake, too:

What does the routing algorithm differently, then? In distributed cache, one node is always designated as the primary owner and the others owners are backups. When a (non-owner) node does a write (invoke cache.put("k", "v")), it sends the command to the primary owner, primary forwards it to the backups and the operation is completed only when all owners confirm this to the first node (originator). It's not possible to contact all owners in a single multinode RPC from the originator as the primary has to decide upon ordering in case of concurrent writes.

In scattered cache every node may be the backup. We don't designate backup owners in the routing table (also called consistent hash for historical reasons), only primaries are set there. When a node does a write it sends the command to the primary owner and when this confirms the operation the originator stores the entry locally, effectively becoming a backup. That means that there can be more than 2 copies of the entry in the cluster, the others being outdated - but only temporarily, the other copies are eventually invalidated through a background process that does not slow down the synchronous writes. As you can see, this algorithm cannot be easily extended to multiple owners (keeping the performance characteristics) and therefore scattered cache does not support multiple backups.

In case of primary crash, the reconciliation process is somewhat more complex, because there's no central record telling who are the backups. Instead, the new primary owner has to search all nodes and pick the last write - previous primary owner has assigned each entry a sequence number which makes this possible. And there are more technical difficulties - you can read about these in the design document or in the documentation.

There is a con, of course. When reading an entry from a distributed cache, there is some chance that the entry will be located on this node (being one of the owners), and local reads are very fast. In scattered cache we always read from the primary node, as the local version of the entry might be already outdated. That halves the chance for a local read - and this is the price you pay for faster writes. Don't worry too much, though - we have plans for L1-like caching that will make reads great again!

Don't touch my data stream!

In Infinispan 8.0 we were very excited to announce Distributed Streams as we moved to Java 8. This feature allows applying any of the various java.util.stream.Stream operations on the datagrid, which are performed in a distributed nature, providing the highest possible performance as data is processed on the node where it lives, only requiring the terminal operation intermediate results to be returned to the invoker.

One problem with distributed streams though is that data is processed without acquiring locks: great for performance, but there is no guarantee that some other process isn't concurrently modifying the cache entry you're working on. Consider the following example which iterates through the entire contents of a cache, modifying each entry based on its existing value:

This works great until you have another cache put() running concurrently that changes a value. In this case the only way to be sure that an update is applied properly is to perform an optional update in the forEach. In a transactional cache you could also lock the entry manually (pessimistic) or retry on a WriteSkewException (optimistic). For example this is how the optional update could be performed.

As you can see the code isn't as pretty as it was before, but is still pretty concise.

Infinispan 9.1 introduces locked streams, which allow you to run your operation knowing that another update cannot be performed while running the Consumer. Note this only works in non transactional and pessimistic transactional caches (optimistic transactional caches are not supported).

If you notice the code looks very similar to the first example. You just have to invoke the lockedStream method on the AdvancedCache and then you can use the stream knowing that data for the given key won't change while performing your update.

This locked stream has a slightly limited API compared to the normal java.util.stream API. Only the filter method is allowed in addition to forEach. The CacheStream API is also supported, with a few exceptions. For more details on the API and what methods are supported you should check out the Javadoc.

The lock is only acquired for the given key while invoking the Consumer, allowing other updates on other keys to be performed concurrently, just like a normal put operation would behave. It is not suggested to perform operations on other keys in the Consumer, as this could cause possible deadlocks.

Now go forth and perform operations using the data stream knowing that the data underneath has not changed!

Thursday, 29 June 2017

Infinispan 9.1.0.CR1

Dear Infinispan Community,

the Infinispan 9.1.0.CR1 is out and can be found on our downloads page. Almost Final!


Full details of the new features and enhancements included in this release can be found here.

Short list of highlights:
  • [ISPN-6245] - Remote query should be able to work with JBoss marshaling, compat mode and hibernate-search annotations
  • [ISPN-6645] - Scattered cache
  • [ISPN-7900] - Provide entrySet, values, keySet implementation for RemoteCache
  • [ISPN-7753] - Compute, ComputeIfPresent, ComputeIfAbsent (now distributed)
  • The usual slew of bug fixes, clean ups and general improvements.
As usual, we will be blogging about each feature and improvement.

Always consult the Upgrading guide to see what has changed. Thank you for following us and stay tuned! The Infinispan Team

Hotrod clients C++/C# 8.2.0.Alpha1 released!

Dear Infinispanners,

we're pleased to announce the C++ and C# 8.2.0.Alpha1 release.

This is the first step on the 8.2 roadmap, which is not formally defined in a Jira at the moment but will surely include: SASL authentication, continuous queries, cluster counters.

In this release you can try and enjoy an Alpha version of the SASL authentication, sample code is here (C++, C#). Tell us what you think!

Check the release notes, browse the source code (C++, C#) or just download an try it!

Cheers,
The Infinispan Team

Thursday, 22 June 2017

Cache configuration inheritance: you're no son of mine

Once upon a time Infinispan cache configurations were all orphans.

Actually, it wasn't as sad as that: they all shared a single parent - the default cache. While this gave caches a limited form of inheritance, it led to confusion as users weren't really aware of it and it was impossible to turn it off: the limited gene pool was propagating possibly unwanted traits to all of its children.

Templates and real configuration inheritance

Infinispan 7.2 finally introduced proper configuration templates and inheritance. But there was a catch. Backwards compatibility dictated that the "default mother of all caches" behaviour survived.

In the above example, the default cache is a replicated cache with a file store. The distributed cache inherits the "transactional" configuration. However, because of default inheritance present up to Infinispan 8.2, the distributed-cache also ended up having a file store. Confusing or what !?!
The best workaround was to never give it a specific configuration, let Infinispan use its internal defaults and essentially avoid it. Just like the black sheep in the family.

Bye bye default cache 

With Infinispan 9.0 we decided it was finally time to cut the umbilical cord between the default cache and all the other caches: if you declare one, it will never be used as default inheritance for every other cache. In the above example, the distributed cache won't have a file store any more.

We've gone even further: unless you declare a default cache, we will not even set one up for you, not even one with default settings!

Aleksandr Sergeevich Serebrovskii, the Russian geneticist  who first formulated the concept of the gene pool and the diversity benefits it brings, would be proud of us.

Wednesday, 21 June 2017

Infinispan 9.0.3.Final and 8.2.7.Final are out!

Dear Infinispanner,

we're proud to announce the release of two new versions of Infinispan for the 9.0 and 8.2 users.

If you're on these branches please check the list of the bugs we've caught and consider to upgrade.

Download, docs and more info are available on the Infinispan Site.

Cheers,
The Infinispan Team

Bugs closed in 9.0.3 (full release notes):
  • [ISPN-6730] - EmbeddedCompatContinuousQueryTest.testContinuousQuery fails with CCE
  • [ISPN-7710] - CompatibilityProtoStreamMarshaller can't be set in server
  • [ISPN-7779] - State transfer does not work with protobuf encoded entities
  • [ISPN-7802] - Use chunked reads/writes in TcpTransport
  • [ISPN-7814] - Remove auth check in CacheDecodeContext
  • [ISPN-7895] - ArrayIndexOutOfBoundsException when using off heap with expiration
  • [ISPN-7901] - Postgres drop-on-exit remove index fails
  • [ISPN-7906] - Infinispan Query DSL does not handle inheritance of properties/fields correctly
  • [ISPN-7922] - OffHeap stream causes too many entries in memory at once
  • [ISPN-7930] - Remove unnecessary provided dependencies

Bugs closed in 8.2.7 (full release notes):
  • [ISPN-6029] - DDL for JDBC store tables should never allow null
  • [ISPN-6539] - ClassCastException with Remote Cache Loader and GetWithMetadata
  • [ISPN-6766] - hot rod client: RemoteCache.removeClient method does not remove the listener from the list after server restart
  • [ISPN-7430] - Slowdown when using PutAll with transactions
  • [ISPN-7480] - JDBC cache store doesn't work on Sybase
  • [ISPN-7495] - XSD files missing from infinispan-embedded
  • [ISPN-7535] - Cache creation requires specific permissions when using security manager
  • [ISPN-7547] - ISPN-7207 fix swallows JMX exceptions
  • [ISPN-7572] - Infinispan initialization via DirectoryProvider can't use any CacheStore or other extensions
  • [ISPN-7584] - Rolling upgrade fails with "java.lang.ClassCastException: SimpleClusteredVersion cannot be cast to NumericVersion"
  • [ISPN-7622] - Hot Rod Rolling Upgrade throws TimeOutException
  • [ISPN-7779] - State transfer does not work with protobuf encoded entities
  • [ISPN-7838] - JBoss Modules NPE in Domain mode
  • [ISPN-7860] - DataContainerFactory doesn't work with MANUAL eviction mode (CacheConfigurationException: Unknown eviction strategy MANUAL)
  • [ISPN-7906] - Infinispan Query DSL does not handle inheritance of properties/fields correctly 

Tuesday, 20 June 2017

Back from Berlin Buzzwords, video already available!

Exactly one week ago I was presenting a talk on big data in action with Infinispan and two days later the video has already been uploaded!! Fastest ever conference video release I've seen! Kudos to Berlin Buzzwords 👏

The slides for the talk can be found here, and the video is here:


Berlin Buzzwords was a very interesting conference. Similar to J On The Beach, it's a conference focused on data related technologies, but Berlin Buzzwords had a more Apache focus. So, you had many talks on Solr, Lucene, Spark, Flink, Kafka, Beam...etc, as well as Apache spinoffs such as ElasticSearch.

The conference was very well organised and the talks were good, although I did miss some demos in the talks I attended. Having been presenting Infinispan for over 8 years, I am fully aware that coming up with data related demos is not an easy task. However, with some many open data streams available these days, there has never been a bigger opportunity to put some of that data to work in a live demo and demonstrate why your tech is so awesome.

From an Infinispan perspective, it was fascinating talking to Flink, Beam...etc developers and learn how Infinispan could be integrated with these projects. We already have Hadoop and Spark integrations, but we're not standing still and we will continue to integrate with other popular data processing technologies.

On a personal level, it was awesome to meet William Benton once again and we had some very interesting discussions about Radanalytics, a project that helps you build data-driven applications on top of OpenShift. I also had some interesting chats with fellow Basel residents working for Baloise insurance group and the University of Basel.

Cheers,
Galder


Monday, 19 June 2017

Infinispan 9.1.0.Beta1

Dear Infinispan Community,

the Infinispan 9.1.0.Beta1 is out and can be found on our downloads page.


Full details of the new features and enhancements included in this release can be found here.

Short list of highlights:
  • [ISPN-7114] Consistency Checker, Conflict Resolution and Automatic merge policies
  • [ISPN-5218] Batching for CacheStores
  • [ISPN-7896] On-demand data conversion in caches
  • [ISPN-6676] HTTP/2 suport in the REST endpoint with TLS/ALPN upgrade
  • [ISPN-7841] Add stream operations that can operate upon data exclusively
  • [ISPN-7868] Add encryption and authentication support to the Remote Store
  • [ISPN-7772] Hot Rod Client create/remove cache operations
  • [ISPN-6994] Add an AdvancedCache.withSubject(Subject) method for explicit impersonation
  • [ISPN-7803] Functional commands-based AtomicMaps
  • The usual slew of bug fixes, clean ups and general improvements.
As usual, we will be blogging about each feature and improvement.

Always consult the Upgrading guide to see what has changed. thank you for following us and stay tuned! The Infinispan Team

Cache operations impersonation: do as I say (or maybe as she says)

The implementation of cache authorization in Infinispan has traditionally followed the JAAS model of wrapping calls in a PrivilegedAction invoked through Subject.doAs(). This led to the following cumbersome pattern:


We also provided an implementation which, instead of relying on enabling the SecurityManager, could use a lighter and faster ThreadLocal for storing the Subject:


While this solves the performance issue, it still leads to unreadable code.
This is why, in Infinispan 9.1 we have introduced a new way to perform authorization on caches:


Obviously, for multiple invocations, you can hold on to the "impersonated" cache and reuse it:

We hope this will make your life simpler and your code more readable !

Tuesday, 13 June 2017

Infinispan coming to Berlin Buzzwords 2017

Are you attending Berlin Buzzwords and want to find out more how Infinispan can help your systems react to real-time data quickly, and see the cool stuff we have for data analytics, make sure you come to my talk on Big Data In Action with Infinispan on Tuesday, 13th June at 16:30.



Cheers,
Galder

Wednesday, 31 May 2017

Infinispan 9.1.0.Alpha1 Released

Dear Infinispan Community,

The first Alpha release of Infinispan 9.1 is out and can be found on our downloads page.

Highlights include:



Full details of the new features and enhancements included in this release can be found here.

Check out the new features and enhancements, download the release and tell us all about it on the forum, on our issue tracker or on IRC on the #infinispan channel on Freenode.

Cheers,
The Infinispan Team

Monday, 29 May 2017

Hotrod clients C++/C# 8.1.1.Final released!

Dear Infinispanners,

we're pleased to announce that 8.1.1.Final release for C++/C# clients is out!

Check the release notes and browse the source code, effort this time has been put in reducing code complexity.

This is the first release built by our new CI Jenkins environment, this is supposed to not affect the binaries but if you feel that something has gone wrong please fill a jira issue.

Enjoy and thanks for reading!

The Infinispan Team

Tuesday, 23 May 2017

KUBE_PING 0.9.3 released

I'm happy to announce that JGroups KUBE_PING 0.9.3 was released. The major changes include:
  • Fixed releasing connections for embedded HTTP Server
  • Fixed JGroups 3/4 compatibility issues
  • Fixed test suite
  • Fixed `Message.setSrc` compatibility issues
  • Updated documentation
The bits might be downloaded from JBoss Repository as soon as the sync completes. Please download them from here in the meantime. 

I would also like to recommend you recent blog post created by Bela Ban. KUBE_PING was completely revamped (no embedded HTTP Server, reduced dependencies) and we plan to use new, 1.0.0 version in Infinispan soon! If you'd like to try it out, grab it from here.

Infinispan 9.0.1.Final Released

Dear Infinispan Community,

We have just released Infinispan 9.0.1.Final which can be found on our downloads page. Full details of the fixes included in this release can be found here.

Check out the fixed issues, download the release and tell us all about it on the forum, on our issue tracker or on IRC on the #infinispan channel on Freenode.

Cheers,
The Infinispan Team

Monday, 22 May 2017

Infinispan Spring Boot Starters 1.0.0.Final released

We are happy to announce that Infinispan Spring Boot Starters 1.0.0.Final have been released.

Change-list:


You can grab the bits from JBoss Repository after the sync is complete. In the meantime, grab them from here.

Sunday, 21 May 2017

J On The Beach: An unmissable conference for distributed computing tech!

J On The Beach was a blast! It's only their second year doing the conference, but it was really well managed and it had an amazing lineup of speakers. To top that up, it was in Malaga so the good weather made it possible to stay outside in the garden at La Termica chatting to attendees and speakers.

The evening before the start of the conference, we had a welcome reception at the Ayuntamiento de Malaga learning about IT and Big Data promotion that the major and his team are helping with.

The conference started with a mind-blowing keynote on quantum computing by Eric Ladizinsky. It was a super talk with very interesting information about what the future might hold in terms of computing. The challenges of quantum computing are immense but the possibilities it opens up staggering as well.

That first morning I had the chance to see Kyle Kingsbury's Jepsen talk which was very entertaining. He gave an intro on Jepsen and looked back at the results of different distributed environments. This allowed the audience to get a good overview on what each system is capable of and what guarantees they provide. Also in the first day I attended, Christopher Meiklejohn's talk on Antidote, a geo-replicated NoSQL database with strong guarantees based on Riak. It uses CRDTs and Hight Available Transactions to achieve this.

On the second day I had my presentation on Functional Reactive Programming with Elm, Node.js and Infinispan. It was well received and got good feedback. Slides can be found here, and the demo repository is here. Unfortunately, due to scheduling and preparations for my talk, I couldn't go to Duarte Dunes' ScyllaDB and Tyler Akidau's Apache Beam talks, but I hope to catch those up when the videos are shared.

However, I was able to attend Caitie Mccaffrey's talk on Distributed Sagas, a protocol for coordinating microservices. Even though such protocol would be hard to implement in all situations, e.g. online ticket shop for a very popular artist, it had some interesting characteristics. The talk itself was delivered masterfully.

Finally, I was at Martin Thompson's High Performance Managed Languages talk which was superb! With years of experience and the development of Aeron on his back, he was able to give a interesting overview of the performance characteristics of managed vs unmanaged languages. Flexibility in managed languages, such as in C#, seems to be the best way to achieve the best performance.

All in all it was a fantastic conference, and I was delighted to have been part of it. Valo, the company behind J On The Beach were fantastic hosts and met some amazing people that are or had been part of this company, including: Luis, Justo, Michael, Danielle...etc.

I hope to come up another time :)

Cheers,
Galder

Thursday, 18 May 2017

Infinispan landed on J On The Beach 2017

Are you in Malaga for J On The Beach 2017 and want to know more about functional reactive programming with Elm, Node.js and Infinispan? Then, make sure you come to this talk on Friday, 11am at Mollete Hall. It's a fun, live coding talk that you just can't miss :)

Cheers,
Galder

Monday, 15 May 2017

NoSQL Unit supports Infinispan 9.0.0.Final

NoSQL Unit is a JUnit extension that helps you write NoSQL unit tests, created by Alex Soto. It brings the ideas first introduced by DBUnit to the world of NoSQL databases.

The essence of DBUnit or NoSQL Unit is that before running each test, the persistence layer is found in a known state. This makes your test repeatable, independent of other test failures or potential database corruptions.

You can use NoSQL Unit for testing embedded or remote Infinispan instances, and since version 1.0.0-rc.5which was released a few days back, it supports the latest Infinispan 9.0.0.Final.

We have a created a little demo GitHub repository showing you how to test Infinispan using NoSQL Unit. Go and give it a go! :)

Thanks Alex bringing NoSQL Unit to my attention!

Cheers,
Galder

Wednesday, 10 May 2017

Running an Infinispan cluster with Kubernetes on Google Container Engine (GKE)

Over the past few years we've been blogging a lot on how to use Infinispan in cloud environments based on Docker, Kubernetes or OpenShift.

Continuing with this series of blog posts, Bela Ban, chief-in-charge of JGroups, posted an unmissable blog post yesterday not only how to run Infinispan with Kubernetes on Google Container Engine (GKE), but also how to load test it with IspnPerfTest.

If any of these topics interests you, don't miss out and head to Bela's blog to read all about it!

Thanks Bela for the blog post!!!

Cheers,
Galder

Friday, 5 May 2017

Reactive Big Data on OpenShift In-Memory Data Grids

Thanks a lot to everyone who attended the Infinispan sessions I gave in Great Indian Developer Summit! Your questions after the talks were really insightful.

One of the talks I gave was titled Big Data In Action with Infinispan (slides are available here), where I was looking at how Infinispan based in-memory data grids can help you deal with the problems of real-time big data and how to do big data analytics.

During the talk I live coded a demo showing both real-time and analytics parts, running on top of OpenShift and using Vert.x for joining the different parts. The demo repository contains background information on instructions to get started with the demo, but I thought it'd be useful to get focused step-by-step instructions in this blog post.

Set Up


Before we start with any of the demos, it's necessary to run some set up steps:

    1. Check out git repository:
    
        git clone https://github.com/galderz/swiss-transport-datagrid

    2. Install OpenShift Origin or Minishift to get an OpenShift environment running in your own 
        machine. I decided to use OpenShift Origin, so the instructions below are tailored for that 
        environment, but similar instructions could be used with Minishift.

    3. Install Anaconda for Python 3, this is required to run Jupyter notebook for plotting.

Demo Domain


Once the set up is complete, it's time to talk about the demos before we run them.

Both demos shown below work with the same application domain: swiss rail transport systems. In this domain, we differentiate between physical stations, trains, station boards which are located in stations, and finally stops, which are individual entries in station boards.

Real Time Demo


The first demo is about working with real-time data from station boards around the country and presenting a centralised dashboard of delayed trains around the country. The following diagrams shows how the following components interact with each other to achieve this:



Infinispan, which provides the in-memory data grid storage, and Vert.x, which provides the glue for the centralised delayed dashboard to interact with Infinispan, all run within OpenShift cloud. 

Within the cloud, the Injector verticle cycles through station board data and injects it into Infinispan. Also within the cloud, a Vert.x verticle that uses Infinispan's Continuous Query to listen for station board entries that are delayed, and these are pushed into the Vert.x event bus, which in turn, via a SockJS bridge, get consumed via WebSockets from the dashboard. The centralised dashboards is written with JavaFX and runs outside the cloud.

To run the demo, do the following:

    1. Start OpenShift Origin if you've not already done so:

        oc cluster up --public-hostname=127.0.0.1

    2. Deploy all the OpenShift cloud components:

        cd ~/swiss-transport-datagrid
        ./deploy-all.sh

    3. Open the OpenShift console and verify that all pods are up.

    4. Load github repository into your favourite IDE and run
        delays.query.continuous.fx.FxApp Java FX application. This will load the
        centralised dashboard. Within seconds delayed trains will start appearing. For example:


Analytics Demo


The first demo is focused on how you can use Infinispan for doing offline analytics. In particular, this demo tries to answer the following question:

Q. What is the time of the day when there is the biggest ratio of delayed trains?

Once again, this demo runs on top of OpenShift cloud, uses Infinispan as in-memory data grid for storage and Vert.x for glueing components together.

To answer this question, Infinispan data grid will be loaded with 3 weeks worth of data from station boards using a Vert.x verticle. Once the data is loaded, the Jupyter notebook will invoke an HTTP restful endpoint which will invoke an Vert.x verticle called AnalyticsVerticle

This verticle will invoke a remote server task which will use Infinispan Distributed Java Streams to calculate the two pieces of information required to answer the question: per hour, how many trains are going through the system, and out of those, how many are delayed.

An important aspect to bear in mind about this server tasks is that it will only be executed in one of the nodes in the cluster. It does not matter which one. In turn, this node will will ship the lambdas required to do the computation to each of the nodes so that they can executed against their local data. The other nodes will reply with the results and the node where the server task was invoked will aggregate the results.

The results will be sent back to the originating invoker, the Jupyter notebook which will plot the results. The following diagrams shows how the following components interact with each other to achieve this:



Here is the demo step-by-step guide:

    1. Start OpenShift Origin and deploy all components as shown in previous demo.

    2. Start the Jupyter notebook:

        cd ~/swiss-transport-datagrid/analytics/analytics-jupyter
        ~/anaconda/bin/jupyter notebook

    3.  Once the notebook opens, click open live-demo.ipynb notebook and execute each of the cells in order. You should end up seeing a plot like this:


So, the answer to the question:

Q. What is the time of the day when there is the biggest ratio of delayed trains?

is 2am! That's because last connecting trains of the day wait for each other to avoid leaving passengers stranded.

Conclusion


This has been a summary of the demos that I presented at Great Indian Developer Summit with the intention of getting you running these demos as quickly as possible. The repository contains more detailed information of these demos. If there's anything unclear or any of the instructions above are not working, please let us know!

Once again, a very special thanks to Alexandre Masselot for being the inspiration for these demos. Merci @Alex!!

Over the next few months we will be enhancing the demo and hopefully we'll be able to do some more live demonstrations at other conferences.

Cheers,
Galder

Monday, 24 April 2017

Learn about Infinispan at Great Indian Developer Summit

I've just arrived in India where I'll be speaking about Infinispan, JBoss Data Grid and other related technologies in the Great Indian Developer Developer Summit in Bangalore. So if you're attending and want to find out more how Infinispan can help your systems react to real-time data quickly, and see the cool stuff we have for data analytics, make sure you come!!




For more details, check the conference schedule :)

Cheers,
Galder

Hotrod clients C++/C# 8.1.0.Final released!

Dears,

we're pleased to announce that 8.1.0.Final release for C++/C# clients is out!

Check the Release Notes and try it yourself without fear, it's tagged as stable!

As in the best TV series: Final doesn't mean the last! Stay tuned for the next 8.2.0 "More Fun Is Coming" season :)

Enjoy and thanks for reading!

The Infinispan Team

Tuesday, 11 April 2017

Infinispan Archetype 1.0.19 Released

I'm pleased to announce that we have just released version 1.0.19 of the infinispan-archetype. This release focuses on making the archetype compatible with Infinispan 9.0 as well as adding a store archetype for creating custom cache writer/loader implementations.

Archetype Usage


To utilise the archetypes use the following commands:


Contributing


If you encounter any issues with the archetypes, or would like to request additional archetypes, please raise an issue on GitHub.

Friday, 7 April 2017

In Memory Data Grid Patterns Demos from Devoxx France!

Devoxx France 2017 was a blast!! Emmanuel and I would like to thank all attendees to our in-memory data grids patterns talk. The room was full and we thoroughly enjoyed the experience!

During the talk we presented a couple of small demos that showcased some in-memory data grid use cases. The demos are located here, but I thought it'd be useful to provide some step-by-step here so that you can get them running as quickly as possible.

Before we start with any of the demos, it's necessary to run some set up steps:

  1. Check out git repository:

    git clone https://github.com/galderz/datagrid-patterns

  2. Download Infinispan Server 9.0.0.Final and at the same level as the git repository.

  3. Go into the datagrid-patterns directory, start the servers and wait until they've started:

    cd datagrid-patterns
    ./run-servers.sh

  4. Install Anaconda for Python 3, this is required to run Jupyter notebook for plotting.

  5. Install Maven 3.

Once the set up is complete, it's time to start with the individual demos.

Both demos shown below work with the same application domain: rail transport systems. In this domain, we differentiate between physical stations, trains, station boards which are located in stations, and finally stops, which are individual entries in station boards.

Analytics Demo


The first demo is focused on how you can use Infinispan for doing offline analytics. In particular, this demo tries to answer the following question:

Q. What is the time of the day when there is the biggest ratio of delayed trains?

To answer this question, Infinispan data grid will be loaded with 3 weeks worth of data from station boards. Once the data is loaded, we will execute a remote server task which will use Infinispan Distributed Java Streams to calculate the two pieces of information required to answer the question: per hour, how many trains are going through the system, and out of those, how many are delayed.

An important aspect to bear in mind about this server tasks is that it will only be executed in one of the nodes in the cluster. It does not matter which one. In turn, this node will will ship the lambdas required to do the computation to each of the nodes so that they can executed against their local data. The other nodes will reply with the results and the node where the server task was invoked will aggregate the results.

Then, these results are sent back to the client, which in turn, stores the results as JSON in an intermediate cache. Once the results are in place, we will use a Jupyter notebook to read those results and plot the result.

Let's see these steps in action:

  1. First, we need to install the server tasks in the running servers above:

    cd datagrid-patterns/analytics
    mvn clean install package -am -pl analytics-server
    mvn wildfly:deploy -pl analytics-server
    
  2. Open the datagrid-pattern repo with your favourite IDE and run delays.java.stream.InjectApp class located in analytics/analytics-server project. This command will inject the data into the cache. On my environment, it takes between 1 and 2 minutes.

  3. With the data loaded, we need to run the remote task that will calculate the total number of trains per hour and how many of those are delayed. To do that, execute delays.java.stream.AnalyticsApp class located in analytics/analytics-server project from your IDE.

  4. You can verify that the results have been calculating by going to the following address:


  5. With the results in place, it's time to start the Jupyter notebook:

    cd datagrid-patterns/analytics/analytics-jupyter
    ~/anaconda/bin/jupyter notebook

  6. Once the notebook opens, click open live-demo.ipynb notebook and execute each of the cells in order. You should end up seeing a plot like this:


So, the answer to the question:

Q. What is the time of the day when there is the biggest ratio of delayed trains?

is 2am! That's because last connecting trains of the day wait for each other to avoid leaving passengers stranded.

Real Time Demo


The second demo that we presented uses the same application domain as above, but this time we're trying to use our data grid as a way of storing the station board state of each station at a given point in time. So, the idea is to use Infinispan as an in memory data grids for working with real time data.

So, what can we do with this type of data? In our demo, we will create a centralised dashboard of delayed trains around the country. To do that, we will take advantage of Infinispan's Continuous Query functionality which allows us to find those station boards which contain stops that are delayed, and as new delayed trains appeared these will be pushed to our dashboard.

To run this demo, keep the same servers running as above and do the following:

1. Run delays.query.continuous.FxApp application located in real-time project inside the datagrid-patterns demo. This app will inject some live station board data and will launch a JavaFX dashboard that shows delayed trains as they appear. It should look something like this:



Conclusion

This has been a summary of the demos that we run in our talk at Devoxx France with the intention of getting you running these demos as quickly as possible. The repository contains more detailed information of these demos. If there's anything unclear or any of the instructions above are not working, please let us know!

Thanks to Emmanuel Bernard for partnering with me for this Devoxx France talk and for the continuous feedback while developing the demos. Thanks as well to Tristan Tarrant for the input in the demos and many thanks to all Devoxx France attendees who attended our talk :)

A very special thanks to Alexandre Masselot whose "Swiss Transport in Real Time: Tribulations in the Big Data Stack" talk at Soft-Shake 2016 was the inspiration for these demos. @Alex, thanks a lot for sharing the demos and data with me and the rest of the community!!

In a just a few weeks I'll be at Great Indian Developer Summit presenting these demos and much more! Stay tuned :)

Cheers,
Galder