AADDA 2018 Selected Papers

Four papers out of Tens papers were selected selected as work-in-progress

Title: Distributed Synthetic Minority Oversampling Technique

Authors: Avnish Kumar Rastogi, Nitin Narang, Mohammad Ajmal

Abstract:

Most real world prediction problems have imbalanced data. Here an imbalance in dataset is represented by a mismatch in class representation. An application of classification algorithms on imbalanced data is biased in favor of the majority class and gets further biased in case of high-dimensional data. This class-imbalance problem can be reduced by under-sampling of majority data or by oversampling of minority data. Synthetic Minority Oversampling Technique (SMOTE) is one such popular technique proposed by Nitesh Chawala that vastly improves random oversampling. Unfortunately, existing SMOTE implementations are inept to handle big data, demanding need for distributed computing solution. In this paper we investigate challenges for implementing SMOTE in distributed environment and present algorithm and results of our distributed SMOTE implementation. Our implementation is done in Apache Spark using variations of high and low dimensional data as well as large and small datasets. The results are compared to existing monolithic implementation in python SMOTE. We were able to successfully demonstrate a distributed version of Spark SMOTE which generated higher or equal quality artificial samples in comparison to python implementation while preserving spatial distribution.

Paper Download

------------------------------------------------------------------------------------------------------------------------

Title: An Innovative Approach for Achieving Composability in Concurrent Systems using Multi-Version Object Based STMs

Speaker: Sandeep Kulkarni, Sweta Kumari, Sathya Peri, Archit Somani

Abstract:

In the modern era of multicore processors, utilizing multiple cores properly is a tedious job. Synchronization and communication among processors involve high cost. Software transaction memory systems (STMs) addresses this issues and provide better concurrency in which programmer need not have to worry about consistency issues. Several big-data applications which deal large amounts of data can benefit from Transactional Memory Systems.

In this paper, we introduce a new STM system as multi-version object based STM (MV-OSTM) which is the fusion of object based STM with multiple versions. As the name suggests MV-OSTM, works on a higher level and keeping the multiple versions corresponding to each key. Presently, we have developed MV-OSTM with the unlimited number of versions corresponding to each key. To overcome traversal overhead, it performs the garbage collection method to delete the unwanted versions corresponding to the key. It provides greater concurrency while reducing the number of aborts. It ensures composability by making the transaction as atomic. In the proposed algorithm, k is the input parameter and the value of it will be decided by the programmer and depends on the application. Programmer can tune the value of k from 1 to ∞. If k equal to 1 then it will boil down to single version object based STM (OSTM) and if k equal to ∞ then it will be equivalent to multi-version OSTM with ∞ versions.

MV-OSTM satisfies correctness-criteria as opacity. For a given version order of keys, if any history H generated by MV-OSTM produces acyclic graph then H is opaque. The progress condition of the proposed MV-OSTM is multi-version permissiveness or mv-permissiveness which never aborts a transaction which is having return-value method only. To the best of our knowledge, this is the first work to explore the idea of using multiple versions in OSTMs to achieve greater concurrency.

Paper Download

------------------------------------------------------------------------------------------------------------------------

Title: Building Efficient Concurrent Graph Object through Composition of List-based Set

Authors: Sathya Peri, Muktikanta Sa, Nandini Singhal

Abstract:

In this paper, we propose a generic concurrent directed graph (for shared memory architecture) that is concurrently being updated by threads adding/deleting vertices and edges. The graph is constructed by the composition of the well known concurrent list-based set data-structure from the literature. Our construction is generic, in the sense that it can be used to obtain various progress guarantees, depending on the granularity of the underlying concurrent set implementation - either blocking or non-blocking. We prove that the proposed construction is linearizable by identifying its linearization points. Finally, we compare the performance of all the variants of the concurrent graph data-structure along with its sequential implementation. We observe that our concurrent graph data-structure mimics the performance of the concurrent list based set

Paper Download

------------------------------------------------------------------------------------------------------------------------

Title: Proving Correctness of Concurrent Objects by Validating Linearization Points

Authors: Nandini Singhal, Muktikanta Sa, Ajay Singh, Archit Somani, Sathya Peri

Abstract:

Concurrent data structures (CDS) and algorithms can accelerate the big data applications. But, developing a CDS and verifying its correctness is an uphill task. Thus, in this paper we address the most basic problem of this domain: given the set of LPs of a CDS, how to show its correctness? We assume that we are given a CDS and its LPs(linearisation points). We have developed a hand-crafted technique of proving correctness of the CDSs by validating its LPs. We also believe that this technique might also offer the programmer some insight to develop more efficient variants of the CDS, by iteratively applying the steps of the proposed technique uncovering hidden parallelism in each iteration or errors in CDS design, if any. The proof technique can be applied to prove the correctness of several commonly used CDSs developed in literature such as Lock-free list based Sets, Skiplists etc. To show the efficacy of this technique, we show the correctness of lazy-list and hoh-locking-list based set.

Paper Download