Thursday, 24 July 2014

Infinispan Security #2: Authorization Reloaded and Auditing

Since the last post on Infinispan Security, there have been a few changes related to how we handle authorization in Infinispan. All of these are available in the recently released 7.0.0.Alpha5.

Authorization Performance

Previously we mentioned the need to use a SecurityManager and javax.security.auth.Subject.doAs() in order to achieve authorization in embedded mode. Unfortunately those components have a severe performance impact. In particular, retrieving the current Subject from the AccessControlContext adds, on my test environment, 3.5µs per call. Since the execution of a Cache.put takes about 0.5µs when authorization is not in use, this means that every invocation is 8x slower.

For this reason we have introduced an alternative for those of you (hopefully the majority), that do not need to use a SecurityManager and can avoid the AccessControlContext: invoke your PrivilegedActions using the new org.infinispan.security.Security.doAs() method, which uses a ThreadLocal to store the current Subject. Using this new approach, the overhead per invocation falls to 0.5µs: a 7x improvement !
Security.doAs() is actually smart: if it detects that a SecurityManager is in use, it will fallback to retrieving the Subject via the AccessControlContext, so you don't need to decide what approach you will be using in production. The following chart shows the impact of using each approach when compared to running without authorization:


Auditing


An essential part of an authorization framework, is the ability to track WHO is doing WHAT (or is being prevented from doing it !). Infinispan now offers a pluggable audit framework which you can use to track the execution of cache/container operations. The default audit logger sends audit messages to your logging framework of choice (e.g. Log4J), but you can write your own to take any appropriate actions you deem appropriate for your security policies.

As usual, for more details, check out the Security chapter in the Infinispan documentation and the org.infinispan.security JavaDocs.

Thursday, 17 July 2014

Infinispan 7.0.0.Alpha5 relased!

Dear Community,

It is our pleasure to announce the Alpha5 release of Infinispan 7.0.0. This release includes numerous fixes and enhancements in every area of Infinispan. This will be our last alpha release as we move into beta stage and prepare for the final Infinispan 7.0 release.

Major theme of this release is Hot Rod eventing. Java Hot Rod clients can now subscribe to cache entry created, modified and removed events. For more details see  documentation. Infinispan Query is now based on Hibernate Search 5; we also use protoparser lib instead of relying on binary descriptors generated by protoc. Our JPA cache store works in Karaf; we have also made further Map/Reduce improvements and bug fixes. Starting with this release Infinispan server is based on WildFly 8.1 while Hot Rod SASL implementation supports QOP for encryption.

For a complete list of features and bug fixes included in this release please refer to the release notes. Visit our downloads section to find the latest release.

If you have any questions please check our forums, our mailing lists or ping us directly on IRC.

Cheers,
The Infinispan team.

Thursday, 5 June 2014

Map/Reduce Performance improvements between Infinispan 6 and 7


Introduction


There have been a number of recent Infinispan 7.0 Map/Reduce performance related improvements that we were eager to test in our performance lab and subsequently share with you. The results are more than promising. In the word count use case, Map/Reduce task execution speed and throughput improvement is between fourfold and sixfold in certain situations that were tested.

We have achieved these improvements by focusing on:
  • Optimized mapper/reducer parallel execution on all nodes
  • Improving the handling and processing of larger data sets
  • Reducing the amount of memory needed for execution of MapReduceTask

Performance Test Results


The performance tests were run using the following parameters:
  • An Infinispan 7.0.0-SNAPSHOT build created after the last commits from the list were committed to the Infinispan GIT repo on May 9th vs Infinispan 6.0.1.Final 
  • OpenJDK version 1.7.0_55 with 4GB of heap and the following JVM options:
    -Xmx4096M -Xms4096M -XX:+UseLargePages -XX:MaxPermSize=512m -XX:NewRatio=3 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
  • Random data filled 30% of the Java heap, and 100 random words were used to create the 8 kilobyte cache values. The cache keys were generated using key affinity, so that the generated data would be distributed evenly in the cache. These values were chosen, so that a comparison to Infinispan 6 could be made. Infinispan 7 can handle a final result map with a much larger set of keys than is possible in Infinispan 6. The actual amount of heap size that is used for data will be larger due to backup copies, since the cluster is running in distributed mode.
  • The MapReduceTask executes a word count against the cache values using mapper, reducer, combiner, and collator implementations. The collator returns the 10 most frequently occurring words in the cache data. The task used a distributed reduce phase and a shared intermediate cache. The MapReduceTask is executed 10 times against the data in the cache and the values are reported as an average of these durations.

    From 1 to 8 nodes using a fixed amount of data and 30% of the heap


    This test executes two word count executions on each cluster with an increasing number of nodes. The first execution uses an increasing amount of data equal to 30% of the total Java heap across the cluster (i.e. With one node, the data consumes 30% of 4 GB. With two nodes, the data consumes 30% of 8 GB, etc.), and the second execution uses a fixed amount of data, (1352 MB which is approximately 30% of 4 GB). Throughput is calculated by dividing the total amount of data processed by the Map/Reduce task by the duration. The following charts show the throughput as nodes are added to the cluster for these two scenarios:

    These charts clearly show the increase in throughput that were made in Infinispan 7. The throughput also seems to scale in an almost linear fashion for this word count scenario. With one node, Infinispan 7 processes the 30% of heap data in about 100 MB/sec, two nodes process almost 200 MB/sec, and 8 nodes process over 700 MB/sec.

    From 1 to 8 nodes using different heap size percentages


    This test executes the word count task using different percentages of heap size as nodes are added to the cluster. (5%, 10%, 15%, 20%, 25%, and 30%) Here are the throughput results for this test:

    Once again, these charts show an increase in throughput when performing the same word count task using Infinispan 7. The chart for Infinispan 7 shows more fluctuation in the throughput across the different percentages of heap size. The throughput plotted in the Infinispan 6 chart is more consistent.

    From 1 to 8 nodes using different value sizes


    This test executes the word count task using 30% of the heap size and different cache value sizes as nodes are added to the cluster. (1KB, 2KB, 4KB, 8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1MB, and 2MB) Here are the throughput results for this test:

    These results are more interesting. The throughput in Infinispan 7 is higher for certain cache size values, but closer to Infinispan 6 or even slower for other cache size values. The throughput peaks for 32KB cache values, but can be much lower for larger and smaller values. Smaller values require more overhead, but for larger values this behavior is not expected. This result needs to be investigated more closely.

    Conclusion


    The performance tests show that Infinispan 7 Map/Reduce improvements have increased the throughput and execution speed four to sixfold in some use cases. The changes have also allowed Infinispan 7 to process data sets that include larger intermediate results and produce larger final result maps. There are still areas of the Map/Reduce algorithm that need to be improved:
    • The Map/Reduce algorithm should be self-tuning. The maxCollectorSize parameter controls the number of values that the collector holds in memory, and it is not trivial to determine the optimal value for a given scenario. The value is based on the size of the values in the cache and the size of the intermediate results. A user is likely to know the size of the cache values, but currently Infinispan does not report statistics about the intermediate results to the user. The Map/Reduce algorithm should analyze the environment at runtime and adjust the size of the collector dynamically.
    • The fact that the throughput results vary with different value sizes needs to be investigated more closely. This could be due to the fact that the maxCollectorSize value used for these tests is not ideal for all value sizes, but there might be other causes for this behaviour.

    Tuesday, 27 May 2014

    Iterate all the entries in the cache

    Dear all, with the release of 7.0.0.Alpha4 it was mentioned that we now support Distributed Entry Iterator which allows for iteration over all entries in the cache.  Iterating over all the entries in the cache has always been an highly demanded community feature. Existing methods (entrySet, keySet, size) were not a good fit because of potential OOM and were causing a lot of user annoyance. Voila a nice distributed solution :-)  ISPN-4222

    Public Interface Additions


    The added public API changes are as follows:


    AdvancedCache


    This returns an EntryIterable that can be used directly as an Iterable over the contents or also to pass a converter to convert the resulting value that is returned to another value or even type itself.


    EntryIterable



    EntryIterable also implements AutoCloseable and as such should be closed after iteration or if an exception case occurs.  Thus the Java 7 try with resources syntax should be used.

    Note that EntryIterable has a method that allows you to also provide an optional Converter to change the values to another type if desired. This conversion is done on the remote nodes and is preferable to be used when the values can be reduced in size to reduce overall payload size.

    An example of how to perform the iteration with any cache type.

    General Algorithm


    Essentially when the iterator is generated it will start an iteration process on the local node to retrieve all values local to that node (including from loader) and also a remote thread that will do the same thing on nodes one at at time. As values are retrieved they are made available to the iterator for processing. The chunkSize configuration for the State Transfer configuration will limit how many values are available to be waiting to be iterated on at a time (loader, local and remotely retrieved values count towards this). This is important to limit how many values are stored in memory when both using a loader and in distributed caches to help prevent an OOM condition from occurring.

    The provided KeyValueFilter is used on the various nodes to limit what entries are returned to the iterator and are sent to the remote node(s) when using a Distributed cache to limit how many results are returned. A converter is similar to the KeyValueFilter but it is ran on any entry that passes the filter to possibly converter the value to another such as a projection view if desired. Both the KeyValueFilter and Converter must be serializable for proper operation!

    The operation is also aware of rehash events occurring, since this could alter which node owns what entry. This is handled automatically by the iterator by tracking what segments have moved and requesting them from the other node if needed.

    Local, Replicated and Invalidation Cache Optimizations


    These caches have some additional optimizations from above in the following
    1. The KeyValueFilter and Converter do not need to be Serializable
    2. KeyValueFilter optimization is only relevant when using a loader
    3. Converter optimization is minimial, the main benefit being it allows code to be the same between cache types

    Gotchas


    This is just to talk about some various cases that users should be aware of.

    Transactional Behaviour

    When using the entry iterator in a transactional context, all of the values are retrieved outside of the current transaction if there is one, and no transaction is started if there isn't one.

    This is done due to the behaviour of Repeatable Read isolation level.  If not then then all of the retrieved values would have to be stored locally in the current context for that transaction, which would most likely cause an OOM condition in many cases.

    Removal using Iterator

    Since the iteration process does not take part of transactions, the remove operation of the iterator is not supported as well.  If desired the user should just invoke the remove method from the Cache itself to do this.

    Consistency Guarantees

    This iterator only guarantees consistency in regards to each value independently. That is it will show a view of each value that existed during the period of when the iteration began and when it completed. Thus it is entirely possible to see a subset of values if say a transaction was committed at the same time as iteration. This would require additional isolation level changes outside of the scope of the iterator to implement this, such as adding Serializable isolation level.

    Return type change

    Before ISPN 7 is released, it is still needed to change the return type from Map.Entry to instead be CacheEntry as users may need the Metadata stored with the entry as well. This will come in ISPN-4326

    Try it out


    Let us know if and how you guys plan on using this and any feedback would be appreciated!

    Wednesday, 14 May 2014

    Infinispan 7.0.0.Alpha4 is out!

    Dear Community,

    It is our pleasure to announce the Alpha4 release of Infinispan 7.0.0.

    The release highlights are:

    * HotRod protocol now supports authorization and the SKIP_CACHE_LOAD flag;
    * Distributed entry iterator which allows iterate over all entries in the cluster;
    * Object filtering and preview using query DSL;
    * Apache Lucene 4.8.0 is now supported and JGroups was upgraded to 3.5.0.Beta5;
    * Multiple improvements and bug fixes! 

    For a complete list of features and bug fixes included in this release please refer to the release notes. Visit our downloads section to find the latest release.

    If you have any questions please check our forums, our mailing lists or ping us directly on IRC.

    Cheers,
    The Infinispan team.

    Friday, 11 April 2014

    Clustered cache quickstart updates

    As developers, it's always easy for us to "forget" about documentation and tutorials, and let them get out of date. And this is exactly what happened with our clustered cache tutorial.

    Even though we kept updating the tutorial to use the latest configuration style, the core of the tutorial was still assuming that state transfer was disabled by default - something that we changed back in Infinispan 5.0.0.Final.

    This was causing a bit of confusion, so I'm happy to report that I've updated the tutorial and I've removed all traces of the ClusterValidation class. Now the tutorial allows you to start as many nodes as you want, and it also shows how a joining node receives data from the existing members during startup.

    Infinispan Security #1: Authorization

    Dear all, with the release of 7.0.0.Alpha3, Infinispan has finally gained the ability to perform Access Control (aka Authorization) on CacheManagers and Caches. This is the first stepping-stone towards the full-fledged security work that will be completed during the 7.0 cycle.

    Infinispan authorization is built around the standard security features available in a JDK near you, such as JAAS and the SecurityManager. Here's a worked example.

    Running within a SecurityManager

    In order for Infinispan to be able to enforce access restrictions, you should enable the SecurityManager in your JVM. This can be done from the command-line:

    java -Djava.security.manager ...

    or programmatically:

    System.setSecurityManager(new SecurityManager());

    You don't have to use the default implementation that comes with the JDK, but if you do you need to supply an appropriate policy file. The Infinispan distribution comes with an example policy file which illustrates the permissions required by some of Infinispan's JAR files. Integrate these permissions with the ones required by your application.

    While Infinispan's authorization can work without a SecurityManager for the basic cache operations (put, get, etc), some more complex tasks (distexec, map/reduce, query) will fail without one.

    Configuring Infinispan for authorization

    Authorization in Infinispan is configured at two levels: at the cache container and at the single cache.
    Let's look at cache containers (aka CacheManagers) first:
    Each cache container determines the following:
    • whether to use authorization, via the enabled attribute. 
    • a class which will map the user's principals to a set of roles
    • a set of named roles and the permissions they represent
    We then need to define the specific roles for each cache:

    As you can see you can choose to use only a subset of the roles defined at the container level.

    Before you can start using a secured cache, you need to get yourself a javax.security.auth.Subject.

    Obtaining a Subject

    Infinispan is not fussy about how you obtain a JAAS Subject: you may use your container's features, or a third-party library (such as JBoss PicketBox or Apache Shiro). The important thing is that your Subject should be populated with a set of Principals which represent the user and the groups it belongs to in your security domain (e.g. LDAP, Active Directory, etc).
    It is then the duty of the mapper to look through the principals associated with the Subject and convert them into roles suitable for matching those you have defined at the container level.
    Once you have a Subject, you interact with the Cache within the context of a PrivilegedAction as follows:

    Obviously if you're lucky enough to use Java 8, you can use the following, more concise, lambda-enabled code:


    For more details consult the Security chapter in the Infinispan documentation and the org.infinispan.security JavaDocs.

    Stay tuned for the next parts in the Infinispan security saga !