{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Quantum Finance Application on Portfolio Diversification\n",
    "\n",
    "<em> Copyright (c) 2021 Institute for Quantum Computing, Baidu Inc. All Rights Reserved. </em>"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Overview\n",
    "\n",
    "Current finance problem can be mainly tackled by three areas of quantum algorithms: quantum simulation, quantum optimization, and quantum machine learning [1,2]. Many financial problems are essentially a combinatorial optimization problem, and corresponding algorithms usually have high time complexity and are difficult to implement. Due to the power of quantum computing, these complex problems are expected to be solved by quantum algorithms in the future.\n",
    "\n",
    "The Quantum Finance module of Paddle Quantum focuses on quantum optimization: how to apply quantum algorithms in real finance optimization problems. This tutorial focuses on how to use quantum algorithms to solve the portfolio diversification problem."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Portfolio Diversification Problem\n",
    "\n",
    "Limited by the lack of professional knowledge and market experience, the investor prefers a passive investment strategy in the actual investment. Index investing is a typical example of passive investing, e.g. an investor buys and holds the Standard & Poor’s 500 (S&P 500) for a long period of time. As an investor, if you do not want to invest an existing index portfolio, you can also create your own specific index portfolio by picking representative stocks from the market.\n",
    "\n",
    "An important way to balance risk and return in an investment portfolio is to diversify your assets. A specific description of portfolio diversification is as follows: the number of investable stocks is $n$ and the number of stocks included in the portfolio is $K$. Based on some criteria, you need to divide all the stocks into $K$ categories and select the stock from each category that best represent that category. Adding representatives of each category to the index portfolio is better for investment management."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Encoding Portfolio Diversification Problem\n",
    "\n",
    "To transform the portfolio diversification problem into a problem applicable for parameterized quantum circuits, we need to encode portfolio diversification problem into a Hamiltonian.\n",
    "\n",
    "To model the problem, two issues need to be clarified. The first one is how to classify different stocks, and the second one is what criteria are used to select representative stocks. To solve these two problems, firstly we define the similarity $\\rho_{ij}$ between stock $i$ and stock $j$:\n",
    "* $\\rho_{ii} = 1 \\quad $ The stock is similar to itself with a similarity of 1 \n",
    "* $\\rho_{ij} \\leq 1 \\quad$  The larger $\\rho_{ij}$ , the higher the similarity between stock $i$ and stock $j$\n",
    "\n",
    "Due to the correlation of returns between stocks, we can further measure the similarity between the time series on the basis of the covariance matrix. Dynamic Time Warping (DTW) is a common method to measure the similarity of two time series. In this paper, the DTW algorithm is used to calculate the similarity between two stocks. So based on the similarity between different stocks, we can classify the stocks and select representative stock in each category. We can define $n$ binary variables $x_{ij}$ and $1$ binary variables $y_j$ for each stock. Therefore, given $n$ stocks, there are $n^2 + n$ binary variables. For the variable $x_{ij}$, $i$ denotes the order of stock, and $j$ denotes the position among the $n$ binary variables corresponding to that stock. If two stock has same index of $j$, they are classified in the same category. Meanwhile, the stock of $i = j$ is the most representative one in that category selected to the index portfolio:\n",
    "\n",
    "$$\n",
    "x_{ij}=\n",
    "\\begin{cases}\n",
    "1, & \\text{stock $j$ is in the portfolio and it has the highest similarity to stock $i$}\\\\\n",
    "0, & \\text{otherwise}\n",
    "\\end{cases},\n",
    "$$\n",
    "\n",
    "$$\n",
    "y_{j}=\n",
    "\\begin{cases}\n",
    "1, & \\text{stock $j$ is selected to the index portfolio}\\\\\n",
    "0, & \\text{otherwise}\n",
    "\\end{cases}.\n",
    "$$\n",
    "\n",
    "The model can be written as follows:\n",
    "\n",
    "$$\n",
    "\\mathcal{M}= \\max_{x_{ij}}\\sum_{i=1}^n\\sum_{j=1}^n \\rho_{ij}x_{ij}. \\tag{1}\n",
    "$$\n",
    "\n",
    "The model needs to satisfy the following constraints:\n",
    "* Clustering constraint: the index portfolio only include $K$ stocks\n",
    "    - $ \\sum_{j=1}^n y_j = K$\n",
    "* Integer constraint: a stock is either in the index portfolio or not\n",
    "     - $ x_{ij},y_j\\in{\\{0,1\\}}, \\forall i = 1, \\dots,n; j = 1, \\dots, n$\n",
    "* Consistency constraint: if a stock can represent another stock, it must be in the index portfolio\n",
    "    - $\\sum_{j=1}^n x_{ij} = 1, \\forall i = 1,\\dots,n$\n",
    "    - $x_{ij} \\leq y_j, \\forall i = 1,\\dots,n; j = 1,\\dots, n$\n",
    "    - $x_{jj} = y_j, \\forall j = 1,\\dots,n$\n",
    "\n",
    "The objective of the model is to maximize the similarity between the $n$ stocks and the selected index stock portfolio.\n",
    "\n",
    "Since the loss function is to be optimized using the gradient descent method, some modifications are made in the loss function based on the the model equation and constrains:\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "C_x &= -\\sum_{i=1}^{n}\\sum_{j=1}^{n}\\rho_{ij}x_{ij} + A\\left(K- \\sum_{j=1}^n y_j \\right)^2 + \\sum_{i=1}^n A\\left(\\sum_{j=1}^n 1- x_{ij} \\right)^2 \\\\\n",
    "&\\quad + \\sum_{j=1}^n A\\left(x_{jj} - y_j\\right)^2 + \\sum_{i=1}^n \\sum_{j=1}^n A\\left(x_{ij}(1 - y_j)\\right).\\\\ \n",
    "\\end{aligned} \\tag{2}\n",
    "$$ \n",
    "\n",
    "The first term represents the similarity maximization, the next four terms are constraints. $A$ is the penalty parameter, which is usually set to a larger number so that the final binary string representing the index portfolio results satisfies the constraints.\n",
    "\n",
    "We now need to transform the cost function $C_x$ into a Hamiltonian to realize the encoding of the portfolio diversification problem. Each variable $x_{ij}$ has two possible values, $0$ and $1$, corresponding to quantum states $|0\\rangle$ and $|1\\rangle$. Note that every variable corresponds to a qubit and so $n^2 + n$ qubits are needed for solving the portfolio diversification  problem. The Pauli $Z$ operator has two eigenstates which are the same as the states $|0\\rangle$ and $|1\\rangle$ . Their corresponding eigenvalues are 1 and -1, respectively. So we consider encoding the cost function as a Hamiltonian using the pauli $Z$ matrix.\n",
    "\n",
    "Now we would like to consider the mapping\n",
    "\n",
    "$$\n",
    "x_{ij} \\mapsto \\frac{I-Z_{ij}}{2}, \\tag{3}\n",
    "$$\n",
    "\n",
    "where $Z_{ij} = I \\otimes I \\otimes \\ldots \\otimes Z \\otimes \\ldots \\otimes I$ with $Z$ operates on the qubit at position $ij$. Under this mapping, the value of $x_{ij}$ represent different meanings. If the qubit $ij$ is in state $|1\\rangle$, then $x_{ij} |1\\rangle = \\frac{I-Z_{ij}}{2} |1\\rangle = 1|1\\rangle $, which means stock $i$ is in index portfolio. Also, for the qubit $ij$ in state $|0\\rangle$, $x_{ij}|0\\rangle  = \\frac{I-Z_{ij}}{2} |0\\rangle = 0 |0\\rangle $.\n",
    "\n",
    "Thus using the above mapping, we can transform the cost function $C_x$ into a Hamiltonian $H_C$ for the system of $n^2+n$ qubits and realize the quantumization of the portfolio diversification problem. Then the ground state of $H_C$ is the optimal solution to the portfolio diversification problem. In the following section, we will show how to use a parametrized quantum circuit to find the ground state, i.e., the eigenvector with the smallest eigenvalue."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Paddle Quantum Implementation\n",
    "\n",
    "To investigate the portfolio diversification problem using Paddle Quantum, there are some required packages to import, which are shown below."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "source": [
    "# Import packages needed\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import datetime\n",
    "\n",
    "# Import related modules from Paddle Quantum and PaddlePaddle\n",
    "import paddle\n",
    "from paddle_quantum.circuit import UAnsatz\n",
    "from paddle_quantum.finance import DataSimulator\n",
    "from paddle_quantum.finance import portfolio_diversification_hamiltonian"
   ],
   "outputs": [],
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-05-17T08:00:15.901429Z",
     "start_time": "2021-05-17T08:00:12.708945Z"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Prepare experimental data\n",
    "\n",
    "In this tutorial, we choose stocks as the investment assets. For the data used in the experimental tests, two options are provided:\n",
    "* The first method is to generate random data according to certain requirements, e.g. number of assets.\n",
    "\n",
    "If the user prepares data using this method, then when initializing the data, it is necessary to give the list of parameters: a list of names of investable stocks (assets), the start date and end date of the trading data."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "source": [
    "num_assets = 3 # Number of investable projects\n",
    "stocks = [(\"TICKER%s\" % i) for i in range(num_assets)]\n",
    "data = DataSimulator( stocks = stocks, start = datetime.datetime(2016, 1, 1), end = datetime.datetime(2016, 1, 30))  \n",
    "data.randomly_generate() # Generate random data"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "* The second method is that the user can choose to set the data themselves, i.e. real stock data collected by themselves. Considering that the number of stocks contained in the file may be large, the user can specify the number of stocks used for this experiment, i.e. `num_assets` as initialized above.\n",
    "\n",
    "We collect the closing prices of $12$ stocks for $35$ trading days into the `realStockData_12.csv` file, where we choose to read only the first $3$ stocks.\n",
    "\n",
    "In this tutorial, we choose to read real data as experimental data."
   ],
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-05-17T08:00:16.212260Z",
     "start_time": "2021-05-17T08:00:15.918792Z"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "source": [
    "df = pd.read_csv('realStockData_12.csv') \n",
    "dt = []\n",
    "for i in range(num_assets):\n",
    "    mylist = df['closePrice'+str(i)].tolist()\n",
    "    dt.append(mylist)\n",
    "# Output the closing price of the seven stocks read from the file for the 35 trading days\n",
    "print(dt)   \n",
    "# Specify the experimental data as a local file read by the user\n",
    "data.set_data(dt)  "
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "[[16.87, 17.18, 17.07, 17.15, 16.66, 16.79, 16.69, 16.99, 16.76, 16.52, 16.33, 16.39, 16.45, 16.0, 16.09, 15.54, 13.99, 14.6, 14.63, 14.77, 14.62, 14.5, 14.79, 14.77, 14.65, 15.03, 15.37, 15.2, 15.24, 15.59, 15.58, 15.23, 15.04, 14.99, 15.11, 14.5], [32.56, 32.05, 31.51, 31.76, 31.68, 32.2, 31.46, 31.68, 31.39, 30.49, 30.53, 30.46, 29.87, 29.21, 30.11, 28.98, 26.63, 27.62, 27.64, 27.9, 27.5, 28.67, 29.08, 29.08, 29.95, 30.8, 30.42, 29.7, 29.65, 29.85, 29.25, 28.9, 29.33, 30.11, 29.67, 29.59], [5.4, 5.48, 5.46, 5.49, 5.39, 5.47, 5.46, 5.53, 5.5, 5.47, 5.39, 5.35, 5.37, 5.24, 5.26, 5.08, 4.57, 4.44, 4.5, 4.56, 4.52, 4.59, 4.66, 4.67, 4.66, 4.72, 4.84, 4.81, 4.84, 4.88, 4.89, 4.82, 4.74, 4.84, 4.79, 4.63]]\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Encoding Hamiltonian\n",
    "\n",
    "Here we construct the Hamiltonian $H_C$ of Eq. (2) with the replacement in Eq. (3). "
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "In the process of encoding Hamiltonian, we first need to calculate the similarity matrix $\\rho$ between the returns of each stock, which is available in the ``finance`` module and can be called directly. "
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "source": [
    "rho = data.get_similarity_matrix()"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Based on the provided and calculated parameters, the Hamiltonian is constructed below. Here we set the penalty parameter to the number of investable stocks.\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "source": [
    "q = 2 # Number of stocks in the index portfolio\n",
    "penalty = num_assets # penalty parameter \n",
    "hamiltonian = portfolio_diversification_hamiltonian(penalty, rho, q)"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Calculating the loss function \n",
    "\n",
    "We adopt a parameterized quantum circuit consisting of $U_3(\\vec{\\theta})$ and $\\text{CNOT}$ gates, that can be constructed by calling the built-in function [`complex entangled layer`](https://qml.baidu.com/api/paddle_quantum.circuit.uansatz.html).\n",
    "\n",
    "After running the quantum circuit, we obtain the circuit output $|\\vec{\\theta\n",
    "}\\rangle$. From the output state of the circuit we can calculate the objective function, and also the loss function of the portfolio diversification problem:\n",
    "\n",
    "$$\n",
    "L(\\vec{\\theta}) =  \\langle\\vec{\\theta}|H_C|\\vec{\\theta}\\rangle.\n",
    "\\tag{4}\n",
    "$$\n",
    "\n",
    "We then use a classical optimization algorithm to minimize this function and find the optimal parameters $\\vec{\\theta}^*$. The following code shows a complete network built with Paddle Quantum and PaddlePaddle."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "source": [
    "class PDNet(paddle.nn.Layer):\n",
    "\n",
    "    def __init__(self, n, p, dtype=\"float64\"):\n",
    "        super(PDNet, self).__init__()\n",
    "\n",
    "        self.p = p\n",
    "        self.num_qubits = n * (n+1)\n",
    "        self.theta = self.create_parameter(shape=[self.p, self.num_qubits, 3],\n",
    "            default_initializer=paddle.nn.initializer.Uniform(low=0, high=2 * np.pi),\n",
    "            dtype=dtype, is_bias=False)\n",
    "        # print(self.theta)\n",
    "\n",
    "    def forward(self, hamiltonian):\n",
    "        \"\"\"\n",
    "        Forward propagation\n",
    "        \"\"\"\n",
    "        cir = UAnsatz(self.num_qubits)\n",
    "        cir.complex_entangled_layer(self.theta, self.p)\n",
    "        cir.run_state_vector()\n",
    "        loss = cir.expecval(hamiltonian)\n",
    "\n",
    "        return loss, cir"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Training the quantum neural network\n",
    "\n",
    "After defining the quantum neural network, we use the gradient descent method to update the parameters to minimize the expectation value in Eq. (4). "
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "source": [
    "SEED = 1100   # Set a global RNG seed \n",
    "p = 2        # Number of layers in the quantum circuit\n",
    "ITR = 150    # Number of training iterations\n",
    "LR = 0.4     # Learning rate of the optimization method based on gradient descent"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Here, we optimize the network defined above in PaddlePaddle."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "source": [
    "# number of qubits\n",
    "n = len(rho)\n",
    "\n",
    "# Fix paddle random seed\n",
    "paddle.seed(SEED)\n",
    "\n",
    "# Building Quantum Neural Networks\n",
    "net = PDNet(n, p)\n",
    "\n",
    "# Use Adam optimizer\n",
    "opt = paddle.optimizer.Adam(learning_rate=LR, parameters=net.parameters())\n",
    "\n",
    "# Gradient descent iteration\n",
    "for itr in range(1, ITR + 1):\n",
    "    loss, cir = net(hamiltonian)\n",
    "    loss.backward()\n",
    "    opt.minimize(loss)\n",
    "    opt.clear_grad()\n",
    "    if itr % 10 == 0:\n",
    "        print(\"iter:\", itr, \"    loss:\", \"%.4f\"% loss.numpy())\n"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "iter: 10     loss: 7.7804\n",
      "iter: 20     loss: 5.4414\n",
      "iter: 30     loss: 3.6022\n",
      "iter: 40     loss: 3.2910\n",
      "iter: 50     loss: 1.9358\n",
      "iter: 60     loss: 0.3872\n",
      "iter: 70     loss: 0.1344\n",
      "iter: 80     loss: 0.0774\n",
      "iter: 90     loss: 0.0122\n",
      "iter: 100     loss: 0.0068\n",
      "iter: 110     loss: -0.0001\n",
      "iter: 120     loss: -0.0019\n",
      "iter: 130     loss: -0.0025\n",
      "iter: 140     loss: -0.0028\n",
      "iter: 150     loss: -0.0028\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Decoding the quantum solution\n",
    "\n",
    "After obtaining the minimum value of the loss function and the corresponding set of parameters $\\vec{\\theta}^*$, our task has not been completed. In order to obtain an approximate solution to the portfolio diversification problem, it is necessary to decode the solution to the classical optimization problem from the quantum state $|\\vec{\\theta}^*\\rangle$ output by the circuit. Physically, to decode a quantum state, we need to measure it and then calculate the probability distribution of the measurement results:\n",
    "\n",
    "$$\n",
    "p(z) = |\\langle z|\\vec{\\theta}^*\\rangle|^2.\n",
    "\\tag{5}\n",
    "$$\n",
    "\n",
    "In the case of quantum parameterized circuits with sufficient expressiveness, the greater the probability of a certain bit string, the greater the probability that it corresponds to an optimal solution of the portfolio diversification problem.\n",
    "\n",
    "Paddle Quantum provides a function to read the probability distribution of the measurement results of the state output by the quantum circuit:"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "source": [
    "# Repeat the simulated measurement of the circuit output state 2048 times\n",
    "\n",
    "prob_measure = cir.measure(shots=2048)\n",
    "investment = max(prob_measure, key=prob_measure.get)\n",
    "print(\"The bit string form of the solution: \", investment)"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "The bit string form of the solution:  100001001101\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "After measurement, we have found the bit string with the highest probability of occurrence, the index portfolio in the form of the bit string. As the result above ``100001001101``, we have $n = 3$ investable stocks and choose two to the index portfolio。The first $n^2 = 9$ bits of ``100001001`` represent $x_{ij}$, and every $3$ bits are grouped together. The first bit of the first group of ``100`` is set to $1$, which means it is classified as a class. The third bit in the second group ``001`` and the third group ``001`` is set to $1$, which means they are classified as one class. Also, the positions of $1$ in the first and third groups are satisfied with $i = j$, i.e., these two stocks are the most representative stock of their respective classes. It can be seen that $1$ appears at $j = 1$ and $j = 3$, i.e., two positions are possible for $1$, which corresponds to our presumption of having two stocks in the index portfolio. \n",
    "The last $3$ position is ``101``, which represents $y_j$, indicating that the first stock and the third stock are selected to the index portfolio. If the final result is not such a valid solution as described above, users can still get a better training result by adjusting the parameter values of the parameterized quantum circuit."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Conclusion\n",
    "\n",
    "In this tutorial, we focus on how to classify investable stocks and how to select representative ones to our portfolio. In this problem, each investment item requires $n$ qubits to represent the classification and $1$ qubit to represent whether it is selected to portfolio or not. Due to the limitation of the number of qubits, the number of investment items that can be handled is still small."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "_______\n",
    "\n",
    "## References\n",
    "\n",
    "[1] Orus, Roman, Samuel Mugel, and Enrique Lizaso. \"Quantum computing for finance: Overview and prospects.\" [Reviews in Physics 4 (2019): 100028.](https://arxiv.org/abs/1807.03890)\n",
    "\n",
    "[2] Egger, Daniel J., et al. \"Quantum computing for Finance: state of the art and future prospects.\" [IEEE Transactions on Quantum Engineering (2020).](https://arxiv.org/abs/2006.14510)"
   ],
   "metadata": {}
  }
 ],
 "metadata": {
  "interpreter": {
   "hash": "3b61f83e8397e1c9fcea57a3d9915794102e67724879b24295f8014f41a14d85"
  },
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3.7.11 64-bit ('pq_env': conda)"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.11"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}