Facebook has released its new open-source compression algorithm, which is both quite efficient and fast. To read more about it, please look here or here.
It's actually so efficient that it seems to be a great candidate for Cassandra table compression.
So to save you all the required work, I've implemented a Cassandra Zstandard compressor that is ready to use: https://github.com/MatejTymes/cassandra-zstd
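Wiring a custom compressor into a table boils down to pointing the table's compression options at the compressor class. Here is a hedged CQL sketch - the fully qualified class name below is a placeholder, so check the repo's README for the exact class name, and note that older Cassandra versions use the 'sstable_compression' option name while newer ones use 'class':

```sql
-- assumption: the cassandra-zstd jar is on Cassandra's classpath and
-- 'com.example.ZstdCompressor' stands in for the real class name from the README
ALTER TABLE my_keyspace.my_table
  WITH compression = { 'sstable_compression': 'com.example.ZstdCompressor' };
```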
Friday, November 11, 2016
Wednesday, June 15, 2016
Reproducing missing matched documents on searches and updates
This blog recently exposed an interesting concurrency caveat in MongoDB, where matching documents won't be found (or updated) if they are being reindexed.
The only part missing from that entry was a way to reproduce the issue. So I decided to create a test which you can run against your version of MongoDB to check if it is still a problem.
Here it is:
package co.uk.matejtymes.mongodb;

import com.mongodb.*;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.util.*;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;

import static com.mongodb.BasicDBObjectBuilder.start;
import static java.util.Arrays.asList;
import static java.util.UUID.randomUUID;
import static java.util.concurrent.Executors.newFixedThreadPool;
import static java.util.concurrent.TimeUnit.SECONDS;
import static org.hamcrest.CoreMatchers.equalTo;
import static org.hamcrest.Matchers.hasSize;
import static org.junit.Assert.assertThat;

public class ReindexFailureTest {

    private static final String STATE_FIELD = "state";
    private static final Random RANDOM = new Random();

    private DBCollection coll;

    @Before
    public void setUp() throws Exception {
        // todo: provide connection details for your mongoDB instance
        MongoClient mongo = new MongoClient("localhost", 27017);
        DB db = mongo.getDB("testDb");
        coll = db.getCollection("indexTest");
    }

    @After
    public void tearDown() throws Exception {
        coll.drop();
    }

    @Test
    public void shouldFindAllMatchingItemsEvenWhenRecalculatingIndex() throws Exception {
        int docCount = 250;
        int concurrentUpdates = 40;
        int attemptsCount = 1_000;
        List<String> stateValues = asList("Active", "Inactive");

        coll.createIndex(new BasicDBObject(STATE_FIELD, 1));
        List<String> allIds = createNDocumentsWithState(docCount, stateValues);

        ExecutorService executor = newFixedThreadPool(concurrentUpdates);
        for (int attempt = 1; attempt <= attemptsCount; attempt++) {
            System.out.println(attempt + ". attempt");

            CountDownLatch beginLatch = new CountDownLatch(concurrentUpdates + 1);
            CountDownLatch endLatch = new CountDownLatch(concurrentUpdates + 1);

            for (int update = 0; update < concurrentUpdates; update++) {
                executor.submit(() -> updateState(pickRandomItem(allIds), stateValues, beginLatch, endLatch));
            }

            List<String> foundIds = findDocumentsInState(stateValues, beginLatch, endLatch);

            Set<String> uniqueIds = new HashSet<>();
            Set<String> duplicateIds = new HashSet<>();
            Set<String> missingIds = new HashSet<>(allIds);
            for (String foundId : foundIds) {
                if (uniqueIds.contains(foundId)) {
                    duplicateIds.add(foundId);
                }
                uniqueIds.add(foundId);
                missingIds.remove(foundId);
            }

            if (!missingIds.isEmpty()) {
                System.err.println(missingIds.size() + ". missingIds: " + missingIds);
            }
            if (!duplicateIds.isEmpty()) {
                System.err.println(duplicateIds.size() + ". duplicateIds: " + duplicateIds);
            }

            assertThat(foundIds, hasSize(allIds.size()));
            assertThat(missingIds, hasSize(0));
            assertThat(duplicateIds, hasSize(0));
        }

        executor.shutdown();
        executor.awaitTermination(3, SECONDS);
    }

    @Test
    public void shouldUpdateAllMatchingItemsEvenWhenRecalculatingIndex() throws Exception {
        int docCount = 250;
        int concurrentUpdates = 40;
        int attemptsCount = 1_000;
        List<String> stateValues = asList("Active", "Inactive");

        coll.createIndex(new BasicDBObject(STATE_FIELD, 1));
        List<String> allIds = createNDocumentsWithState(docCount, stateValues);

        ExecutorService executor = newFixedThreadPool(concurrentUpdates);
        for (int attempt = 1; attempt <= attemptsCount; attempt++) {
            System.out.println(attempt + ". attempt");

            String fieldToUpdate = "field" + attempt;
            Object valueToSet = true;

            CountDownLatch beginLatch = new CountDownLatch(concurrentUpdates + 1);
            CountDownLatch endLatch = new CountDownLatch(concurrentUpdates + 1);

            for (int update = 0; update < concurrentUpdates; update++) {
                executor.submit(() -> updateState(pickRandomItem(allIds), stateValues, beginLatch, endLatch));
            }

            BasicDBObject query = new BasicDBObject(STATE_FIELD, new BasicDBObject("$in", stateValues));
            BasicDBObject update = new BasicDBObject("$set", new BasicDBObject(fieldToUpdate, valueToSet));
            beginLatch.countDown();
            int n = coll.updateMulti(query, update).getN();
            endLatch.countDown();

            List<String> updatedIds = new ArrayList<>();
            coll.find(new BasicDBObject(fieldToUpdate, valueToSet)).forEach(
                    dbObject -> updatedIds.add((String) dbObject.get("_id"))
            );

            Set<String> missingIds = new HashSet<>(allIds);
            missingIds.removeAll(updatedIds);

            if (!missingIds.isEmpty()) {
                System.err.println(missingIds.size() + ". missingIds: " + missingIds);
            }
            if (n != allIds.size()) {
                System.err.println("n = " + n);
            }
            if (updatedIds.size() != allIds.size()) {
                System.err.println("updateIds = " + updatedIds.size());
            }

            assertThat(n, equalTo(allIds.size()));
            assertThat(updatedIds, hasSize(allIds.size()));
            assertThat(missingIds, hasSize(0));
        }

        executor.shutdown();
        executor.awaitTermination(3, SECONDS);
    }

    /* ====================== */
    /* --- helper methods --- */
    /* ====================== */

    private List<String> createNDocumentsWithState(int docCount, List<String> stateValues) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < docCount; i++) {
            String id = randomUUID().toString();
            String state = stateValues.get(i % stateValues.size());
            DBObject dbObject = start()
                    .add("_id", id)
                    .add(STATE_FIELD, state)
                    .get();
            coll.insert(dbObject);
            ids.add(id);
        }
        return ids;
    }

    private void updateState(String id, List<String> stateValues, CountDownLatch beginLatch, CountDownLatch endLatch) {
        BasicDBObject query = new BasicDBObject("_id", id);
        String oldStateValue = (String) coll.find(query).next().get(STATE_FIELD);
        String newStateValue = stateValues.stream()
                .filter(state -> !state.equals(oldStateValue))
                .findFirst().get();
        BasicDBObject update = new BasicDBObject("$set", new BasicDBObject(STATE_FIELD, newStateValue));
        beginLatch.countDown();
        coll.update(query, update);
        endLatch.countDown();
    }

    private List<String> findDocumentsInState(List<String> stateValues, CountDownLatch beginLatch, CountDownLatch endLatch) {
        BasicDBObject query = new BasicDBObject(STATE_FIELD, new BasicDBObject("$in", stateValues));
        Iterator<DBObject> dbObjects = coll.find(query).iterator();
        List<String> foundIds = new ArrayList<>();
        beginLatch.countDown();
        while (dbObjects.hasNext()) {
            foundIds.add((String) dbObjects.next().get("_id"));
        }
        endLatch.countDown();
        return foundIds;
    }

    private static <T> T pickRandomItem(List<T> values) {
        return values.get(RANDOM.nextInt(values.size()));
    }
}
Friday, April 29, 2016
Executor that notifies you when tasks finish
Java Executors don't let you know when all tasks are finished - or, to be more precise, don't block you until the tasks are finished. You could call shutdown() on them and then awaitTermination(), but this way you can't reuse the executor anymore, which is not great. This is why I created a class called Runner that can accomplish this. It's used like this:
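To illustrate the drawback, here is a minimal sketch using only the standard java.util.concurrent API: once you call shutdown() in order to be able to awaitTermination(), any further submit() is rejected, so the executor cannot be reused.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

public class ShutdownDemo {

    // returns true if the executor rejects new tasks after shutdown()
    static boolean reuseRejectedAfterShutdown() throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        executor.submit(() -> System.out.println("task done"));

        // the only standard way to block until submitted tasks finish...
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);

        // ...but now the executor is unusable for any further work
        try {
            executor.submit(() -> { });
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("rejected after shutdown: " + reuseRejectedAfterShutdown());
    }
}
```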
Runner runner = Runner.runner(10);

runner.runIn(2, SECONDS, runnable);
runner.run(runnable);

runner.waitTillDone(); // blocks until all tasks are finished (or failed)

// and reuse it
runner.runIn(500, MILLISECONDS, callable);
runner.waitTillDone();

runner.shutdownAndAwaitTermination();
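If you are curious how a waitTillDone() like this can work, here is a simplified sketch of the task-counting idea - note this is my illustrative version under my own names (CountingRunner is hypothetical), not the actual JavaFixes implementation:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// illustrative sketch only - not the actual JavaFixes Runner code
public class CountingRunner {

    private final ExecutorService executor;
    private final AtomicInteger inFlight = new AtomicInteger(0);
    private final Object doneLock = new Object();

    public CountingRunner(int threads) {
        this.executor = Executors.newFixedThreadPool(threads);
    }

    // count the task before submitting, un-count it when it ends (even on failure)
    public void run(Runnable task) {
        inFlight.incrementAndGet();
        executor.submit(() -> {
            try {
                task.run();
            } finally {
                if (inFlight.decrementAndGet() == 0) {
                    synchronized (doneLock) {
                        doneLock.notifyAll();
                    }
                }
            }
        });
    }

    // blocks until every submitted task has finished (or failed),
    // after which the runner can keep accepting new tasks
    public void waitTillDone() throws InterruptedException {
        synchronized (doneLock) {
            while (inFlight.get() > 0) {
                doneLock.wait();
            }
        }
    }

    public void shutdown() {
        executor.shutdown();
    }
}
```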
The code for it can be found here:
https://github.com/MatejTymes/JavaFixes
Hope this will help.
Tuesday, April 19, 2016
End of BigDecimal BSting
BigDecimal is so close to being great, until you face a few things that make you just scream.
- values are sometimes not equal (although you would like them to be):
// yes: -1.2 is not equal to -1.20
assertThat(new BigDecimal("-1.2").equals(new BigDecimal("-1.20")), is(false));
Maybe this seems harmless, but once your tests start failing because the actual and expected domain objects (using BigDecimal) are not equal although they should be, you will just ask yourself: why do we have to go through this?
Also, have you ever been pairing with somebody in an interview, where the candidate told you that you should use BigDecimal for an interest rate calculation, but in the end you both decided not to, as the interview is not long enough? Really, BigDecimal adds an additional 10 to 30 minutes to an interview exercise, as you have to deal with the equals method - and THIS is the STANDARD!!!
- equals is bad, but hashCode is even worse
// yes: hashCodes for -1.2 and -1.20 are not the same as well
assertThat(new BigDecimal("-1.2").hashCode(), is(not(new BigDecimal("-1.20").hashCode())));
But why would that even matter? Well, if you ever decide to make BigDecimal a key in a HashMap, then a situation might occur where you won't find any value for your number, as the hashCodes won't match.
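A minimal demonstration of that HashMap trap - the value stored under -1.20 is unreachable via -1.2, because the two keys have different scales and therefore different hashCodes:

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

public class BigDecimalMapTrap {
    public static void main(String[] args) {
        Map<BigDecimal, String> labels = new HashMap<>();
        labels.put(new BigDecimal("-1.20"), "promo rate");

        // same numeric value, different scale -> different hashCode -> lookup misses
        System.out.println(labels.get(new BigDecimal("-1.2"))); // prints: null
    }
}
```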
- the ways to create a BigDecimal are not unified at all (depending on your originating value, there are a few different ways to create it)
new BigDecimal("-1.20"); // from string
BigDecimal.valueOf(-120L, 2); // from long
new BigDecimal(BigInteger.valueOf(-120L), 2); // from BigInteger
// you should not create BigDecimal from float or double as you might get really weird value (because of transition from binary to decimal form)
I assumed this was going to be fixed, as these problems have been present for many years, but it seems this is the design we'll have to live with.
This is why I decided to create a rewrite of BigDecimal called Decimal which you can find on this page (currently I'm finalizing the implementation):
https://github.com/MatejTymes/JavaFixes
It provides a few advantages over BigDecimal:
- unified creation using two factory methods (a more readable decimal(...) and a shorter d(...))
- fixed equals
- fixed hashCode:
// equals now works
assertThat(decimal("-1.2").equals(decimal("-1.200")), is(true));
assertThat(d("-1.2").equals(d("-1.200")), is(true));
// and surprisingly hashCode as well
assertThat(d("-1.2").hashCode(), is(d("-1.20").hashCode()));
- also the creation approach is always the same
decimal("-1.20"); // from string
decimal(-120L, 2); // from long
decimal(BigInteger.valueOf(-120L), 2); // from BigInteger
// you are not able to create Decimal from float or double but have to transform them into string first - otherwise you might get surprising values
- you can use underscores in the numbers to make them more readable
Decimal value = d("-125_550_00.00"); // using underscores, just like in Java number literals
Decimal is an abstract class currently extended by two implementations: LongDecimal (used when the value can be backed by a long) and HugeDecimal (backed by a BigInteger, for all other values). You can't reference them directly, but the library handles the transitions between these types seamlessly as you call arithmetic operations.
And that's it. Please let me know if you can think of any other improvements, or just what you feel about this. I would be happy to hear your thoughts.