Last week at Rangespan we started a new pairing model based on the experience at Silver Platter described in the research paper ‘Promiscuous Pairing and Beginner’s Mind’. Let me first introduce the traditional flow model and compare it to promiscuous pairing before I give some feedback from my own experience.
The paper provides compelling insight into an alternative mode of pairing that optimises learning and the spread of knowledge through fast switching. This is an alternative to the traditional idea of flow, where the intent is to achieve full immersion in a task. In programming this commonly includes an internalised understanding of the goal, the dependencies of the task, and how to achieve it. Highly productive flow, however, requires deep knowledge of the current task and its context. Acquiring that knowledge is time-consuming and requires a person or pair to focus on a task for a long period, and flow is even harder to achieve in a paired situation. As a result people specialise, which reduces the spread of knowledge. Consequently only a small share of the team’s talent and experience is exposed to the task, which may lead to a suboptimal solution.
The proposed alternative is promiscuous pairing, which stipulates that partners and tasks change frequently throughout the day. This prevents a state of flow and (deliberately) forces the new pair members into a ‘Beginner’s Mind’. A beginner to a new task has no preconception of the supposedly optimal solution and is thus more likely to approach a problem creatively. The beginner experiments because she has no idea of the ideal or traditional solution to a problem. Hopefully, she will do this with eagerness and curiosity as well as a readiness to fail and learn. This comes more easily to someone new to a task than to someone who has faced it before. The result is innovative solutions, motivated team members and a rapid spread of knowledge, patterns and best practices. These outcomes are usually almost mutually exclusive: you foster either best practices or innovation. Furthermore, the different talents in a team are more likely to be exposed to each problem, and no single pair, which may lack the talent needed for a (sub-)task, gets bogged down in a problem.
We started this Monday by pairing for 90 minutes, which according to the paper is the ideal time before the beginner’s mind settles and the gains taper off. Each pair consists of a master, who worked on the ticket in the previous iteration, and a (new) student. They work together on the task/ticket until the 90 minutes are over. The master then moves on to be a student in another pair, while the student becomes the master for the current task/ticket and is joined by a new student. This way a hard problem or long task is not stuck with one pair and everyone gets exposed to it.
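The rotation described above can be sketched in a few lines of Python. This is only an illustration, not the scheme we or the paper prescribe in detail: the `Pair` class, the `rotate` function, the names and the tickets are all hypothetical, and the rule that a departing master joins the next ticket in the list is an assumption made to keep the example small.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    task: str     # ticket the pair is working on
    master: str   # person who already spent the previous slot on this ticket
    student: str  # person who is new to this ticket


def rotate(pairs):
    """Return the pairs for the next 90-minute slot.

    The student stays on the ticket and becomes its master. The old master
    moves on and becomes the student of the next ticket in the list
    (an assumed rule; any assignment of departing masters to tickets would do).
    """
    n = len(pairs)
    return [
        Pair(task=p.task, master=p.student, student=pairs[(i - 1) % n].master)
        for i, p in enumerate(pairs)
    ]


# Hypothetical team and tickets, for illustration only.
slot = [
    Pair("ticket-A", master="Ann", student="Bob"),
    Pair("ticket-B", master="Cat", student="Dan"),
    Pair("ticket-C", master="Eve", student="Fay"),
]
for number in range(3):
    print(f"slot {number}:", [(p.task, p.master, p.student) for p in slot])
    slot = rotate(slot)
```

Under this rule every person spends two consecutive slots on a ticket, first as student and then as master, before moving on to the next one.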
After one week of promiscuous pairing I consider it very useful for new team members like myself. It exposed me to many parts of the product that I otherwise would not have seen. It gives you the opportunity to test-drive others’ tools and setups, which helps tremendously when you are new to a language and framework. Moreover, it allows you to be productive, since you can take part in solving the underlying problem independently of the language and framework that would normally hold you back. This combines nicely with subsequently implementing the solution together, learning useful patterns and idiosyncrasies by example.
There are some side effects to this pairing mode that need consideration. Creative work is harder, since you have little time left to go off on a tangent, explore long-term ideas or do some blue-sky digging. This may need some dedicated time. While the pairing does not take up the whole workday, meetings and auxiliary tasks often eat up the remaining time. Another point of reflection at the moment is the length of the iterations. For some tasks a longer pairing may improve productivity. We may try variations in the coming week. I wonder whether this is similar to Test Driven Development, where a short-term gain comes at a long-term expense, or whether some specialisation is simply in the nature of growing software engineering teams and should be accounted for in promiscuous pairing by allowing some iterations to be longer. On the other hand, a specialist or talented individual does spend three hours over two iterations (first as student, then as master) on a task.
Comparing Big Data
At Mendeley, we work with an ever-increasing document collection, currently on the order of 100,000,000 documents. Besides the documents, we process related PDFs, extracted and user-generated meta-data, user information, user libraries and groups. Together, the data set and its applications at Mendeley are large and complex. On closer inspection we can identify a core operation and challenge besides scale. In almost every feature/product, internal and client-facing, we have to compare data items. This basic operation becomes challenging not just because of the scale. We deal with a noisy data set with different types of information coming from users, meta-data extraction and partner archives. In short, we have to compare items in a huge set efficiently and effectively. This is a core challenge for big data. Like Mendeley, most if not all real-world big data services face some kind of noise in their data and use comparisons extensively in their algorithms/products.
Three main classes of comparison come to mind in our context:
- Search – comparing patterns and frequencies within and across items. Example: a text query against documents.
- Recommendation – comparing items based on their occurrence. Example: collaborative filtering of co-occurring items.
- Classification/clustering – comparing items based on their features. Example: clustering and merging (near) duplicate items.
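To make the third class concrete, here is a minimal sketch of a feature-based comparison that flags (near) duplicate titles using Jaccard similarity over word sets. It is not a description of what Mendeley runs: the function names, the choice of Jaccard similarity and the 0.8 threshold are assumptions for illustration, and the all-pairs loop is quadratic, so it would not scale to a collection of this size.

```python
import re


def tokens(text):
    """Lower-case the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of the word sets of two strings (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty strings are treated as identical
    return len(ta & tb) / len(ta | tb)


def near_duplicates(titles, threshold=0.8):
    """Return index pairs (i, j), i < j, whose similarity meets the threshold.

    Compares every pair, which is only feasible for small inputs.
    """
    found = []
    for i in range(len(titles)):
        for j in range(i + 1, len(titles)):
            if jaccard(titles[i], titles[j]) >= threshold:
                found.append((i, j))
    return found


# Hypothetical noisy records, for illustration only.
titles = [
    "Promiscuous Pairing and Beginner's Mind",
    "promiscuous pairing and beginner's mind.",
    "Comparing Big Data",
]
print(near_duplicates(titles))
```

The first two titles differ only in case and punctuation, which is the kind of noise that tokenising before comparing is meant to absorb.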
Products are available along these classes of comparison, e.g. Lucene or Solr. The problem is that each product specialises in one use case, for example search in the case of Lucene and Solr. The specialisation commonly focuses on one or a small subset of the aspects of the information we have available, e.g. patterns or relationships. Some of the data are utilised only poorly or not at all, and comparison across types is often hard or impossible. Moreover, in many situations we have to perform similar comparisons internally, where using a specialised product is not always a sensible approach. Where we do use existing technologies and algorithms, we are limited by their abilities and insight (or lack thereof).
We pose these challenges:
- To unify the data comparison classes in one system to extract value from the full data set (patterns, frequencies, relationships, co-occurrence, …) and access it transparently from different services (search, recommendation, de-duplication) according to their needs.
- To scale it for Mendeley (to 10^8 and beyond).
- To be as effective as, or even better than, dedicated products.
We will solve these challenges as part of the TEAM project, applying and extending state-of-the-art research. The outcome will a) extend knowledge in the form of peer-reviewed research publications, and b) result in a real-world, working system at Mendeley.