Wednesday, 19 November 2014

Infinispan 7.0.2.Final released!

Dear community,

Infinispan 7.0.2.Final is now available!

This release removes duplication from the service lookup metadata. Please consult the release notes for details.

Thanks to everyone involved in this release! 

Visit our downloads section to find the latest release.
If you have any questions please check our forums, our mailing lists or ping us directly on IRC.

Monday, 17 November 2014

Infinispan 7.0.1.Final released!

Dear community,

Infinispan 7.0.1.Final is now available!

This is a bug-fix release and contains query performance improvements. For the complete list of changes please consult the release notes.

Thanks to everyone involved in this release! 

Visit our downloads section to find the latest release.
If you have any questions please check our forums, our mailing lists or ping us directly on IRC.

Tuesday, 4 November 2014

Why doesn't Map.size return the size of the entire cluster?

Many people may have been surprised the first time they called the Map.size method on a distributed Infinispan cluster: only the local node's size was returned.

Infinispan took this approach to limit the chance that, instead of getting the full cluster size, you would receive an OutOfMemoryError.  Returning only the local answer seems fair, but you secretly always wanted the entire cluster size.

As of the Infinispan 7.0.0.Final release, forget what you know about using the Map interface with Infinispan.

Enter Distributed Entry Iterator

We already announced this feature a while back at http://blog.infinispan.org/2014/05/iterate-all-entries-in-cache.html.  You can check it out for more details but it is essentially a memory efficient way of retrieving all the entries in the cache by iterating over them.

This has opened the way to implementing the various bulk methods on the Map interface that we could never do efficiently in the past (i.e. Map.size, Map.keySet, Map.entrySet & Map.values).

Map size

Okay, I admit, size could have been done more efficiently before, but the answer would have contained a very high margin of error.  Now size will give you a value with consistency semantics just slightly weaker than those of ConcurrentHashMap, but for the entire cluster.  Be warned that size may be slower for larger clusters or ones with a lot of data stored in a cache loader.

The size method behavior can be controlled by using a supplied Flag such as SKIP_CACHE_LOAD to not count any configured cache loaders or CACHE_MODE_LOCAL if you want the local count only.  These flags are not exclusive, so both can be passed if desired.

Map Collection Views

In the past the Map.values, Map.keySet & Map.entrySet methods only ever returned in-memory copies of the local data at the time they were invoked, similar to Map.size.

Now these collection views are cluster wide and, additionally, they show updated contents when the cache is updated, and your writes to the collection are reflected in the Cache itself!  The only operations you can't perform on these collections are adding elements to the keySet or values collections, as a lone key or value isn't a key/value pair.
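To make the write-through behaviour concrete, here is a minimal sketch of the java.util.Map view contract that the paragraph above describes.  It uses a plain ConcurrentHashMap as a stand-in (an assumption made so the example runs without a cluster); the class and method names are hypothetical, and this shows the Map contract only, not anything about the cluster-wide implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class CollectionViewSketch {

    /** Returns true if adding a lone key through the key set view is rejected. */
    static boolean keySetRejectsAdd(Map<String, String> map) {
        try {
            map.keySet().add("orphan-key");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true; // a key without a value is not a key/value pair
        }
    }

    /** Modifies a map only through its collection views and returns it. */
    static Map<String, String> demo() {
        // Stand-in for a cache; chosen only so the sketch is self-contained.
        Map<String, String> map = new ConcurrentHashMap<>();
        map.put("k1", "v1");
        map.put("k2", "v2");
        map.put("k3", "v3");

        map.keySet().remove("k1");   // removes the k1 mapping from the map
        map.values().remove("v2");   // removes the k2 mapping from the map
        for (Map.Entry<String, String> entry : map.entrySet()) {
            entry.setValue(entry.getValue().toUpperCase()); // writes through to the map
        }
        return map;
    }
}
```

After demo() only the k3 mapping is left, and its value was changed through the entry set view.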

If your cache was configured with a Flag such as SKIP_CACHE_LOAD or CACHE_MODE_LOCAL it will also be reflected in the collection view for both reads and writes.

Some caution is advised when using toArray, retainAll, or size methods as they will require full iteration to complete.

KeySet Optimization

The key set collection also has an optimization: it never pulls down the values, so it has a lower network and serialization/deserialization overhead (unlike entrySet and values).

Transactionality

All of the aforementioned methods still support transactions in the way you would expect.  There is one guarantee we don't provide, and it concerns REPEATABLE_READ isolation: we will not store entries read from an iterator in the transactional context, as this could very easily run your local node out of memory with a large enough data set.

For reference, the methods that use an iterator internally are toArray, retainAll, isEmpty & size on the various collections as well as contains and containsAll on the values collection.

Other API Changes

These changes have also loosened some restrictions on other methods.

Map.isEmpty

Previously this method only worked locally, as it calculated the size to determine whether the cache was empty.  It now uses the entry iterator and returns as soon as it finds even a single entry.  This is an important change, as the old implementation had to query the complete size of any configured Cache Loader before returning.

Map.containsValue

We never supported this method before (not even locally).  It now uses the iterator and returns immediately if it finds the value at any point, so it only has to iterate over the entire contents when the value doesn't exist.  However, if you want to do this operation often, you should really use Indexing to make it faster.
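The early exit described for isEmpty and containsValue can be sketched as follows.  This is only an illustration over a local java.util.Iterator, assuming a hypothetical CountingIterator wrapper to show how many entries are actually pulled; it is not the Infinispan implementation.

```java
import java.util.Iterator;

class ShortCircuitSketch {

    /** Iterator wrapper that counts how many elements were actually pulled. */
    static final class CountingIterator<T> implements Iterator<T> {
        private final Iterator<T> delegate;
        private int pulled = 0;

        CountingIterator(Iterator<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean hasNext() {
            return delegate.hasNext();
        }

        @Override
        public T next() {
            pulled++;
            return delegate.next();
        }

        public int pulled() {
            return pulled;
        }
    }

    /** Empty check that only asks whether a first element exists. */
    static boolean isEmpty(Iterator<?> values) {
        return !values.hasNext();
    }

    /** Stops at the first match; a full pass happens only when the value is absent. */
    static boolean containsValue(Iterator<?> values, Object wanted) {
        while (values.hasNext()) {
            if (wanted.equals(values.next())) {
                return true;
            }
        }
        return false;
    }
}
```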

Code Examples

I could put an example here, but I think some could take it as an insult.  You have already seen hundreds of examples of how to use the Map interface, and now in Infinispan you can use them in the exact same way and they will work just how you would expect them to.

Infinispan 7.0.0.Final is out!!

Hi all,

We are really proud to announce the release of Infinispan 7.0.0.Final!!

This is the culmination of several months of development which has focused on Security, Cluster Partition handling, JSR-107 JCache 1.0.0 support, Clustered Listeners, Remote Events, Query improvements and a brand new XML configuration.

To mark the occasion, the team has prepared a thorough release notes page highlighting all the major features and enhancements implemented in the Infinispan 7.0 series.

The Infinispan team would like to recognise all the community members that have contributed to this release, in no particular order:

  • Radim Vansa for his Soft-Index File Store and many more enhancements and fixes
  • Takayoshi Kimura for fixes such as ISPN-3752, ISPN-4476 and ISPN-4477
  • Jiri Holusa for his tremendous work to improve our test coverage and for fixing issues like ISPN-3442
  • Karl von Randow for his documentation fixes, init.d fixes in ISPN-4141 and enhancements to putForExternalRead method as part of ISPN-3792
  • Jakub Markos for his work to optimise the Infinispan Server testsuite in ISPN-4317 and many fixes and test suite enhancements
  • Michal Linhard for his ISPN-3750 fix
  • Vitalii Chepeliuk for his work on extending test coverage and fixes such as ISPN-3880
  • Wolf Dieter-Fink for fixes such as ISPN-3916 and ISPN-3912
  • Vojtech Juranek for his continued work to improve Infinispan with fixes such as ISPN-4072 and his work to increase the test coverage
  • Martin Gencur for the many issues he fixed including ISPN-3771, ISPN-4499 and others...
  • Norman Maurer for porting Infinispan Servers to use Netty 4
  • Alan Field for fixes such as ISPN-4645 and ISPN-4376
  • Tomáš Sýkora for fixes such as ISPN-3136 and ISPN-4076, and improved test coverage
  • Paul Ferraro for many fixes such as ISPN-4375 and ISPN-4374
  • Nicolas Filotto for his ISPN-3689 fix
  • Rajesh Jangam for his ISPN-3877 and ISPN-3894 fixes
  • Brett Meyer for his amazing work to get Infinispan working in OSGI environments as part of ISPN-800 and many related fixes
  • Radoslav Husar for his several fixes
  • Sebastian Łaskawiec for his work to improve our CDI integration and moving to Jackson for JSON
  • Karsten Blees for his LIRS eviction fixes
  • Niels Bertramn for his ISPN-4679 fix
  • Duncan Doyle for his work on ISPN-4637
  • Emmanuel Bernard for his documentation improvements
  • Gabriel Francisco for his work to revamp the Mongo DB cache store
  • Bilgin Ibryam for his OSGI fixes
  • Erik Salter for his work on orphaned transactions and fixes such as ISPN-4872

Thanks to all contributors for your amazing work and effort! We hope you carry on contributing in future releases.

Finally, during the Infinispan 7.0 series, Gustavo Fernandes has joined the team making outstanding contributions in our Query project, and Tristan Tarrant has joined the team full time taking on Infinispan's Security layer. Thanks to both!!

Cheers,
Galder



Friday, 31 October 2014

Soft-Index File Store

Recently, Infinispan got a new local file-based cache store, called Soft-Index File Store. Why have we created yet another cache store, what problems does it solve, what are its limitations and how is it designed?

Single File Store is a well performing cache store, but it keeps all keys in memory, which limits the number of keys you can store. File fragmentation could be even more of an issue: if you store larger and larger values (that happens quite a lot, as users e.g. add stuff into their shopping carts), the space is not reused and instead the entry is appended at the end of the file. The space (now empty) is reused only if you write an entry that can fit there. Also, even if you remove all entries from the cache, the file won't shrink, nor will it be defragmented.

The LevelDB cache store uses Google's well-performing library, written in native code. The major drawback is the native code - if LevelDB has a bug that ends in a segfault, the whole JVM crashes, bringing your application server down.

Our new Soft Index File Store is a pure Java implementation that tries to get around Single File Store's drawbacks by implementing a variant of a B+ tree that is cached in memory using Java's soft references - that's where the name Soft Index File Store comes from. This B+ tree (called the Index) is offloaded to a single file on the filesystem: in theory, this has similar fragmentation problems to Single File Store, but in practice it shouldn't cause such problems. The index file does not need to be persisted - it is purged and rebuilt when the cache store restarts; its purpose is only offloading.

The data that should be persisted are stored in a set of files that are written in an append-only way - that means that if you store them on a conventional magnetic disk, it does not have to seek when writing a burst of entries. The data are not stored in a single file but in a set of files. When the usage of any of these files drops below 50% (the entries are marked as removed or overwritten), the file starts being collected: live entries are moved into another file and, in the end, the old file is removed from disk.

Most of the in-memory structures in Soft Index File Store are bounded, therefore you don't have to be afraid of OOMEs. You can also configure the limit for concurrently open files (so that you don't run out of file descriptors).
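The 50% rule for collecting a data file can be sketched with some simple bookkeeping.  This is a rough illustration under assumptions: the class, field and method names are hypothetical, and the sketch only tracks byte counts in memory rather than touching any real file.

```java
class DataFileUsageSketch {
    // Threshold taken from the text above: collect when usage drops below 50%.
    static final double COMPACTION_THRESHOLD = 0.5;

    private long totalBytes; // everything ever appended to this file
    private long liveBytes;  // bytes of entries not yet removed or overwritten

    /** An append-only write: the file only ever grows. */
    void append(int entrySize) {
        totalBytes += entrySize;
        liveBytes += entrySize;
    }

    /** Called when an entry in this file is removed or overwritten elsewhere. */
    void markDead(int entrySize) {
        liveBytes -= entrySize;
    }

    double usage() {
        return totalBytes == 0 ? 1.0 : (double) liveBytes / totalBytes;
    }

    /** True once less than half of the file still holds live entries. */
    boolean shouldCompact() {
        return usage() < COMPACTION_THRESHOLD;
    }
}
```

With four 100-byte entries, the file becomes a candidate for collection only after the third one is marked dead (25% usage); at exactly 50% it is left alone.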

How to configure SIFS

The configuration is no different from that of a regular cache store.
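As a configuration fragment, a programmatic setup might look roughly like the sketch below.  Treat it as an assumption to verify against the documentation of your Infinispan version: the builder class name, its package and the indexLocation/dataLocation methods are written from memory of the soft-index store module, the paths are made-up examples, and the code needs the Infinispan core and soft-index store jars on the classpath.

```java
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.persistence.sifs.configuration.SoftIndexFileStoreConfigurationBuilder;

class SifsConfigSketch {

    /** Builds a cache configuration with a Soft-Index File Store attached. */
    static Configuration build() {
        ConfigurationBuilder builder = new ConfigurationBuilder();
        builder.persistence()
               // Assumed builder class; check the name in your version's Javadoc.
               .addStore(SoftIndexFileStoreConfigurationBuilder.class)
               // Hypothetical directories for the offloaded Index and the data files.
               .indexLocation("/tmp/sifs/exampleCache/index")
               .dataLocation("/tmp/sifs/exampleCache/data");
        return builder.build();
    }
}
```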

Implementation details

The Index does not use a single file; in fact it can be split into multiple segments. That's because the algorithm updating this B+ tree is designed as single writer - multiple readers, which could make the writer thread (called 'Index Updater') the bottleneck. Therefore, you can set how many segments the Index should be split into (according to the keys' hashCode()).

Each node in the Index stores the common 'prefix' of all keys (or rather of their serialized forms) used in the node in order to reduce the space required for the node. This comes with the assumption that the prefixes are often similar (e.g. when you use keys "user000001" and "user000002"). If you can change how the keys are serialized, you are encouraged to move the changing part of the key to the end of the serialized data.

The data are written by a single thread as well, the 'Log Appender'. There's no reason to let threads that access the cache store compete over the file-system - Log Appender queues the write requests, writes them into the file and wakes up the waiting thread. There are 2 possibly unnecessary context-switches, but in the original design we wanted to allow the write request to return only after the data have been fsynced. By batching the writes, Log Appender allows this as a configuration option - then you can be sure that the data are already on disk when the call returns.

When an entry is modified, the Index needs to be updated. The request is sent to the Index Updater via a bounded queue and the newest entry location is stored in the Temporary Table until it is stored in the Index. The updated nodes are eventually offloaded onto disk in this way.
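Picking a segment from a key's hashCode() could look like the sketch below.  The exact mapping SIFS uses is not described here, so the floorMod approach and the names are assumptions; the point is only that every key maps deterministically to one segment, and each segment can then have its own single writer.

```java
class IndexSegmentSketch {

    /** Maps a key to a segment in [0, segments), also for negative hash codes. */
    static int segmentFor(Object key, int segments) {
        return Math.floorMod(key.hashCode(), segments);
    }
}
```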
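To see why similar prefixes save space, here is a small sketch that extracts the common prefix of a node's serialized keys.  The helper names are hypothetical and the keys are simply UTF-8 encoded strings, which is an assumption; real serialized forms depend on your marshaller.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class KeyPrefixSketch {

    /** Longest byte prefix shared by all serialized keys in a node. */
    static byte[] commonPrefix(List<byte[]> keys) {
        if (keys.isEmpty()) {
            return new byte[0];
        }
        byte[] first = keys.get(0);
        int length = first.length;
        for (byte[] key : keys) {
            int max = Math.min(length, key.length);
            int i = 0;
            while (i < max && key[i] == first[i]) {
                i++;
            }
            length = i; // the shared prefix can only shrink as more keys are checked
        }
        return Arrays.copyOf(first, length);
    }

    /** Bytes saved by storing the prefix once instead of once per key. */
    static int bytesSaved(List<byte[]> keys) {
        if (keys.isEmpty()) {
            return 0;
        }
        return commonPrefix(keys).length * (keys.size() - 1);
    }

    static byte[] utf8(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }
}
```

For "user000001" and "user000002" the shared prefix is "user00000", so only the last byte of each key differs.  Had the counter been at the start of the key, the prefix would be empty and nothing would be saved.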
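The single-writer hand-off can be sketched with a queue and a future per request.  This is an illustration only: the names are hypothetical, an in-memory ByteArrayOutputStream stands in for the data file, there is no batching, and the place where an fsync could happen is marked with a comment.

```java
import java.io.ByteArrayOutputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

class LogAppenderSketch {

    private static final class WriteRequest {
        final byte[] payload;
        final CompletableFuture<Long> offset = new CompletableFuture<>();

        WriteRequest(byte[] payload) {
            this.payload = payload;
        }
    }

    private final BlockingQueue<WriteRequest> queue = new LinkedBlockingQueue<>();
    // Stand-in for the append-only data file; only the appender thread touches it.
    private final ByteArrayOutputStream log = new ByteArrayOutputStream();
    private final Thread appender;

    LogAppenderSketch() {
        appender = new Thread(this::run, "log-appender-sketch");
        appender.setDaemon(true);
        appender.start();
    }

    private void run() {
        try {
            while (true) {
                WriteRequest request = queue.take();
                long offset = log.size();
                log.write(request.payload, 0, request.payload.length);
                // A real store could fsync here before waking the caller.
                request.offset.complete(offset);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop() was called
        }
    }

    /** Called from any thread; blocks until the appender has written the entry. */
    long write(byte[] payload) {
        WriteRequest request = new WriteRequest(payload);
        queue.add(request);
        return request.offset.join();
    }

    void stop() {
        appender.interrupt();
    }
}
```

The two context switches mentioned above are visible here: the caller hands the request to the appender thread and then waits to be woken up with the offset at which its entry was written.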

Known limitations

The size of a node in the Index is limited; by default it is 4096 bytes, though it can be configured. This size also limits the key length (or rather the length of the serialized form): you can't use keys longer than the node size minus 15 bytes. Moreover, the key length is stored as a 'short', limiting it to 32767 bytes. There's no way to use longer keys - SIFS throws an exception when the key is longer after serialization.

When entries are stored with expiration, SIFS does not detect that a file is full of expired entries, and the compaction of old data files may never be started (method AdvancedStore.purgeExpired() is not implemented). This can lead to excessive file-system space usage.
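The two limits combine into a simple calculation, sketched below.  The constant and method names are hypothetical, and the exception type is an assumption (the text only says that SIFS throws an exception).

```java
class KeyLengthLimitSketch {
    static final int DEFAULT_NODE_SIZE = 4096;  // default node size from the text
    static final int NODE_OVERHEAD_BYTES = 15;  // bytes of a node not available to the key

    /** Largest serialized key that fits: node size minus 15, capped by the 'short' length field. */
    static int maxKeyLength(int nodeSize) {
        return Math.min(nodeSize - NODE_OVERHEAD_BYTES, Short.MAX_VALUE);
    }

    /** Rejects a serialized key that is too long for the given node size. */
    static void checkKey(byte[] serializedKey, int nodeSize) {
        int max = maxKeyLength(nodeSize);
        if (serializedKey.length > max) {
            throw new IllegalArgumentException(
                "Serialized key is " + serializedKey.length + " bytes, limit is " + max);
        }
    }
}
```

With the default node size this gives 4096 - 15 = 4081 bytes; raising the node size helps only until the 32767-byte cap of the length field is reached.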

Future work

What we need to do now is to benchmark SIFS in many configurations and set the optimal values as defaults. However, we run mostly synthetic benchmarks - and that's where you can help. Play with Soft Index File Store a bit and tell us what configuration works best for you!

For storing large keys, building the B+ tree of hashCodes could perform better than storing the whole keys, though it would need additional handling for collisions. Tell us what keys you use, please!

Currently, each index update needs to be eventually stored, and that means one or more writes into the file-system even when this is not necessary. In the future, we might try to use phantom references instead of soft references to write the Index only when it needs to release some memory. However, this requires a lot of further work, so test SIFS today and let us know how you like it!

Tuesday, 28 October 2014

Infinispan HotRod .NET Client 7.0.0.CR2

Dear community,

Infinispan HotRod .NET Client 7.0.0.CR2 is now available.

This is mostly a bug-fix release. For the complete list of changes please consult the release notes (which also include the changes from the corresponding version of the C++ Client).
 
Visit our downloads section to find the latest release.
If you have any questions please check our forums, our mailing lists or ping us directly on IRC.

Thanks to everyone involved for the changes and bug reports contributed!

Monday, 27 October 2014

Infinispan HotRod C++ Client 7.0.0.CR2

Dear community,

Infinispan HotRod C++ Client 7.0.0.CR2 is now available.

This is mostly a bug-fix release. I would like to bring to your attention the following changes:
For the complete list of changes please consult the release notes.
Visit our downloads section to find the latest release.
If you have any questions please check our forums, our mailing lists or ping us directly on IRC.

Thanks to everyone involved for the changes and bug reports contributed!