Understanding consumer behaviour is important for a wide range of applications, from developing more successful marketing strategies to economic policy design. The aim of this project is to extract and understand the patterns that are present in shopping data. The data takes the form of a transaction history for each consumer: we know when they bought a particular product and how much of it, and we have additional information on consumer demographics and on the products themselves. The industrial partner for this project is Unilever Ltd.
The underlying structure of the data can be conveniently represented as a temporal, weighted, bipartite network, where consumers and products form the two classes of nodes and links are given by the transactions. One can then use community detection to extract mesoscopic structure, yielding groups of products and consumers that are more densely connected than would be expected if consumers bought products at random. The algorithm used to find these clusters is based on optimising a generalised version of the popular modularity quality function that allows for time-dependence and different types of edges. One of the challenges of finding meaningful communities in shopping data is the high level of behavioural noise. In addition, as modularity maximisation is an NP-hard problem, one has to apply heuristic methods that will usually only find local maxima. We have been able to obtain more robust results by using consensus clustering to combine the information from multiple runs of a stochastic heuristic.
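The two ideas in the paragraph above can be illustrated with a minimal sketch. It makes several simplifying assumptions: it uses a static, Barber-style bipartite modularity rather than the generalised time-dependent quality function used in the project, and it summarises repeated runs with a co-association matrix, which is one common ingredient of consensus clustering and not necessarily the procedure used here. The function names `bipartite_modularity` and `coassociation` are hypothetical.

```python
import numpy as np


def bipartite_modularity(B, consumer_labels, product_labels):
    """Static bipartite modularity of a weighted consumer-by-product matrix B.

    Assumption: the null model is "purchases at random, given each
    consumer's and each product's total purchase weight".
    """
    B = np.asarray(B, dtype=float)
    m = B.sum()                    # total transaction weight
    k = B.sum(axis=1)              # total weight bought by each consumer
    d = B.sum(axis=0)              # total weight sold of each product
    expected = np.outer(k, d) / m  # expected weight under the null model
    same = (np.asarray(consumer_labels)[:, None]
            == np.asarray(product_labels)[None, :])
    # Sum observed-minus-expected weight over same-community pairs.
    return ((B - expected) * same).sum() / m


def coassociation(partitions):
    """Fraction of runs in which each pair of nodes shares a community.

    `partitions` has shape (runs, nodes); row r holds the labels from run r.
    """
    P = np.asarray(partitions)
    return (P[:, :, None] == P[:, None, :]).mean(axis=0)
```

A co-association matrix built this way can itself be clustered, so that pairs of nodes grouped together in most runs of the stochastic heuristic end up in the same consensus community.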
The two datasets we have available are the purchase history of a Swiss online supermarket, provided by Unilever, and the purchase history as well as detailed demographic and spatial information for a selection of customers of a number of supermarkets in the Midwestern United States, provided by Brian Uzzi of Kellogg School of Management, Northwestern University. Thus far, we have been able to establish significant clustering in both datasets. For the Swiss online supermarket, demographic factors appear to have at most a weak impact on the clusters found, and limited usable product information makes interpretation difficult. Preliminary results for the US data indicate that brand loyalty is a potential source of clustering, at least in some product categories.
In the future, we plan to use the data to understand urban versus suburban shopping patterns, which should yield actionable insights that can be used by Unilever, as well as interesting sociological results. We are also interested in further developing methods to detect overlapping communities in noisy and temporal data.
Key references in this area
- Mucha, P. J., Richardson, T., Macon, K., Porter, M. A., & Onnela, J.-P. (2010). Community structure in time-dependent, multiscale, and multiplex networks. Science 328(5980): 876-878.
- Porter, M. A., Onnela, J.-P., & Mucha, P. J. (2009). Communities in Networks. Notices of the AMS 56(9): 1082-1097 & 1164-1166.
Crystalline silicon photovoltaics
The dramatic improvement in the energy output of crystalline silicon photovoltaics has moved this technology from a novelty to one with a tangible impact on renewable energy supply. This project is an investigation of the electrical contact between n-type silicon and the silver electrode in a p-base crystalline photovoltaic cell. Current models indicate that the electron flow path is through a thin interfacial glass layer between the bulk silicon and the silver conductor. A mathematical model, based on drift-diffusion equations, is under development for the electron transport through this glassy layer, to determine whether "crystalline" or "colloid assisted tunneling" theories best describe the situation.
The first step is a one-dimensional model describing the flow of electrons through a homogeneous glassy layer. This model has been solved and the solutions analysed using a number of asymptotic and numerical techniques. The model predicts that the effective resistance across a glass contact may be a nonmonotonic function of the current.
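For orientation, a generic textbook form of steady one-dimensional drift-diffusion transport for electrons is sketched below. This is an assumption about the general shape of such a model, not the project's actual equations: the symbols $n$ (electron density), $E$ (electric field), $J$ (current density), $\mu_n$ (mobility), $D_n$ (diffusivity), $\varepsilon$ (permittivity of the glass) and $N$ (a fixed background charge density) are all introduced here for illustration.

```latex
\begin{align}
  J &= q \mu_n n E + q D_n \frac{\mathrm{d}n}{\mathrm{d}x}, \\
  \frac{\mathrm{d}J}{\mathrm{d}x} &= 0, \\
  \varepsilon \frac{\mathrm{d}E}{\mathrm{d}x} &= q\,(N - n),
  \qquad D_n = \frac{\mu_n k_B T}{q}.
\end{align}
```

In a system of this kind the current $J$ is constant across the layer, and an effective resistance can be defined as the potential drop across the layer divided by $J$. Because $n$ and $E$ are coupled nonlinearly, that ratio generally depends on $J$.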
In the future, the model will be developed further by first extending it to two dimensions and then considering the effect on the contact resistance of silver precipitates present in the glassy layer. The industrial partner for this project is DuPont (UK) Ltd.
Key references in this area
- C Ballif, D Huljic, G Willeke and A Hessler-Wyser (2003). Contact resistance scanning for process optimization: the corescanner method. Applied Physics Letters 82(12): 1878-1880.
- Z Li, L Liang, A Ionkin, B Fish, L Cheng, K Mikeska (2011). Microstructural comparison of silicon solar cells' front-side Ag contact and the evolution of current conduction mechanisms. Journal of Applied Physics 110(7), 074304.
The turmoil witnessed in financial markets in recent years has illustrated important links between seemingly disparate markets and a high level of connectivity of the global financial system. These interdependencies between financial institutions or assets are often poorly understood and can have large and unforeseen consequences. Understanding them is therefore very important for gaining insight into macro-economic risk and large corporate risk.
Networks are used to represent complex systems of interacting entities. We are interested in investigating the structure and dynamics of financial networks (using data from HSBC). "Community detection" is an important tool in network analysis; it is used to cluster the data into densely connected groups and can reveal underlying structure in the network and detect functionalities or relationships between the nodes. In particular, we have been using new methods of network science developed specifically for community detection in time-dependent networks.
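For reference, the multislice modularity of Mucha et al. (2010), a widely used quality function for community detection in time-dependent networks, can be written as follows. Whether the project uses exactly this form is an assumption. Here $A_{ijs}$ is the weight of the edge between nodes $i$ and $j$ in time slice $s$, $k_{js}$ is the strength of node $j$ in slice $s$, $2m_s = \sum_j k_{js}$, $\gamma_s$ is a resolution parameter, $C_{jsr}$ couples node $j$ in slice $s$ to itself in slice $r$, and $g_{is}$ is the community of node $i$ in slice $s$.

```latex
\begin{equation}
  Q = \frac{1}{2\mu} \sum_{ijsr}
      \left[ \left( A_{ijs} - \gamma_s \frac{k_{is} k_{js}}{2 m_s} \right) \delta_{sr}
             + \delta_{ij} C_{jsr} \right] \delta(g_{is}, g_{jr}),
  \qquad 2\mu = \sum_{jr} \Bigl( k_{jr} + \sum_{s} C_{jrs} \Bigr).
\end{equation}
```

The null-model term $k_{is} k_{js} / (2 m_s)$ is built from node strengths and so presumes non-negative edge weights.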
Some challenges that arise in extracting communities from financial data include choosing an appropriate network representation (choice of the nodes and edges in the network), applying the method to the chosen network model and interpreting the output of the method. Another issue is that of allowing overlap between the different communities: no method for this has been developed for time-dependent networks, and the problem is still unresolved at the level of static networks.
To address some of these difficulties, we have been re-thinking some of the ideas in community detection for evolving networks, carrying out numerical experiments to extract robust community partitions and formulating null models to test the significance of the resulting partition.
Up to this point, we have been studying a dataset of financial assets from different markets using a signed, weighted, fully connected time-dependent correlation network. Although we have been able to extract communities that seem to be consistent with previous studies carried out on the same dataset and identify some important financial events across time, it seems that some of the features defined in the current community detection method need to be modified to account for signed edges.
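One common way to build a signed, weighted, fully connected, time-dependent correlation network is to compute correlation matrices over rolling time windows. The sketch below does this under stated assumptions: the input is a hypothetical array of asset returns with one row per time step and one column per asset, the similarity measure is the Pearson correlation, and the function name and the `window` and `step` parameters are illustrative choices, not the project's.

```python
import numpy as np


def rolling_correlation_layers(returns, window, step):
    """Return one signed correlation matrix (network layer) per time window.

    `returns` has shape (time steps, assets). Each layer is a fully
    connected weighted network whose edge weights lie in [-1, 1].
    """
    returns = np.asarray(returns, dtype=float)
    n_times = returns.shape[0]
    layers = []
    for start in range(0, n_times - window + 1, step):
        block = returns[start:start + window]
        corr = np.corrcoef(block, rowvar=False)  # assets are columns
        np.fill_diagonal(corr, 0.0)              # remove self-loops
        layers.append(corr)
    return np.array(layers)


# Usage with synthetic data (an assumption, standing in for real returns).
rng = np.random.default_rng(0)
example_returns = rng.normal(size=(100, 5))
example_layers = rolling_correlation_layers(example_returns, window=50, step=25)
```

Each layer can contain negative weights, which is why quality functions whose null model assumes non-negative weights may need to be adapted before they are applied to such a network.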
The industrial partner for this project is HSBC.
Key references in this area
- P. J. Mucha, T. Richardson, K. Macon, M. A. Porter, J-P. Onnela (2010). Community Structure in Time-Dependent, Multiscale, and Multiplex Networks. Science 328(5980): 876-878.
- M. A. Porter, J-P. Onnela, P. J. Mucha (2009). Communities in Networks. Notices of the American Mathematical Society 56(9): 1082-1097 & 1164-1166.
- D. J. Fenn, M. A. Porter, S. Williams, M. McDonald, N. F. Johnson, N. S. Jones (2011). Temporal Evolution of Financial Market Correlations. Physical Review E 84(2): 026109.
- D. J. Fenn, M. A. Porter, S. Williams, M. McDonald, N. F. Johnson, N. S. Jones (2009). Dynamic Communities in Multichannel Data: An Application to the Foreign Exchange Market During the 2007-2008 Credit Crisis. Chaos 19(3): 033119.