# Sankey Diagram Generator (Python & R)

This repository provides a dual implementation (Python and R) of a customizable Sankey diagram generator.  
It reads flows from Excel and produces **JSON** and (optionally) **JS** files that can be consumed by D3.js or other front-end code.

---

## Features

- Reads flow data from Excel (`data` sheet required; `config` and `nodes_positions` optional)  
- Custom node & link colors  
- Adjustable node positions  
- Transparency for zero-value flows  
- Outputs:
  - `sankey.json` (data only)
  - `sankey.js` (wraps the data as `window.SANKEY_DATA` for use with `file://` or simple HTML pages)
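
A minimal sketch of how the two output files relate, assuming the payload is a plain Python dict. `write_outputs` is a hypothetical helper, not the project's actual function, and the exact formatting of the generated files may differ:

```python
import json
from pathlib import Path


def write_outputs(data, out_json, out_js=None):
    """Write the payload as JSON and, optionally, as a JS wrapper (illustrative)."""
    text = json.dumps(data, indent=2)
    Path(out_json).write_text(text, encoding="utf-8")
    if out_js is not None:
        # Assigning to a global lets a plain <script src="sankey.js"> tag expose
        # the data without an HTTP request, which is what makes file:// work.
        Path(out_js).write_text(f"window.SANKEY_DATA = {text};\n", encoding="utf-8")
```

Because JSON is valid JavaScript object-literal syntax, the JS file is just the JSON text with an assignment in front of it.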

---

## Requirements

### Python
- Python 3.8+
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

### R
- R 4.0+
- Install required packages:

  ```r
  install.packages(c("readxl","dplyr","tidyr","stringr","jsonlite","tibble"))
  ```

---

## File Structure

```
project/
├── data/
│   └── sankey_data.xlsx      # Input Excel file (with optional config and positions)
├── index.html                # D3.js page that renders the Sankey chart
├── sankey.py                 # Python script (CLI)
├── sankey.ipynb              # Python notebook - alternative to the script
├── sankey.Rmd                # R Markdown script
├── sankey.json*              # Output JSON (auto-generated)
├── sankey.js*                # Output JS with window.SANKEY_DATA (auto-generated)
└── README.md
```


Files marked `*` are generated when you run the Python and/or R scripts.

---

## Input File Details

Excel file: `data/sankey_data.xlsx`

- `data` sheet (required):
  - Source (column A): origin of the flow
  - Target (column B): destination of the flow
  - Flow Value (column C): numeric flow value

- `nodes_positions` sheet (optional):
  - `node`: must match a value in the Source or Target column
  - `x`, `y`: normalized coordinates (0–1)
  - `node_color`: HEX color for the node (e.g., `#CCCCCC`)
  - `incoming_flow_color`, `outgoing_flow_color`: HEX colors for links entering and leaving the node

- `config` sheet (optional):
  - Cell B1: diagram title
  - Cell B2: subtitle
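
The sheet layout above can be sketched as a conversion from rows to the `meta`/`nodes`/`links` structure. This assumes the rows have already been read from the workbook (for example with openpyxl's `Worksheet.iter_rows(values_only=True)`). The helper name `build_sankey`, the field names inside each node and link, and the link-color precedence (the source's `outgoing_flow_color`, then the target's `incoming_flow_color`, then the default) are illustrative assumptions, not the scripts' documented behavior:

```python
import re

DEFAULT_NODE_COLOR = "#CCCCCC"  # grey, as listed under Notes
DEFAULT_FLOW_COLOR = "#009EDB"  # UN blue, as listed under Notes
HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def build_sankey(data_rows, position_rows=(), title=None, subtitle=None):
    """Convert sheet rows into a {meta, nodes, links} dict (illustrative schema).

    data_rows:      (source, target, value) tuples from the `data` sheet
    position_rows:  dicts from `nodes_positions`; only "node" is required
    title/subtitle: values of config!B1 and config!B2, if present
    """
    positions = {}
    for row in position_rows:
        # Coordinates are normalized, so anything outside 0-1 is a data error.
        for axis in ("x", "y"):
            v = row.get(axis)
            if v is not None and not 0 <= float(v) <= 1:
                raise ValueError(f"{row['node']}: {axis}={v} is outside 0-1")
        for key in ("node_color", "incoming_flow_color", "outgoing_flow_color"):
            c = row.get(key)
            if c is not None and not HEX_COLOR.fullmatch(str(c)):
                raise ValueError(f"{row['node']}: {key}={c!r} is not a HEX color")
        positions[row["node"]] = row

    index = {}  # node name -> position in the nodes list (first-seen order)
    links = []
    for source, target, value in data_rows:
        for name in (source, target):
            index.setdefault(name, len(index))
        # Assumed precedence: source's outgoing color, then target's incoming color.
        color = (positions.get(source, {}).get("outgoing_flow_color")
                 or positions.get(target, {}).get("incoming_flow_color")
                 or DEFAULT_FLOW_COLOR)
        links.append({"source": index[source], "target": index[target],
                      "value": float(value), "color": color})

    # Every nodes_positions entry must match a Source/Target value.
    unknown = set(positions) - set(index)
    if unknown:
        raise ValueError(f"unknown nodes in nodes_positions: {sorted(unknown)}")

    nodes = []
    for name in index:
        pos = positions.get(name, {})
        nodes.append({"name": name, "x": pos.get("x"), "y": pos.get("y"),
                      "color": pos.get("node_color") or DEFAULT_NODE_COLOR})
    return {"meta": {"title": title, "subtitle": subtitle},
            "nodes": nodes, "links": links}
```

Validating positions and colors up front surfaces spreadsheet typos as clear errors instead of as a silently misdrawn chart.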

---

## Usage

### Python (CLI)

Run from the project directory:

```bash
python sankey.py -i ./data/sankey_data.xlsx -o sankey.json --out-js sankey.js
```

- Always writes `sankey.json`.
- Writes `sankey.js` only when `--out-js` is given; if the flag is omitted, no JS file is created.
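
The flag behavior described above can be reproduced with `argparse`. This is a hypothetical reconstruction: only `-i`, `-o`, and `--out-js` come from the command shown, and the `sankey.json` default for `-o` is an assumption:

```python
import argparse


def parse_args(argv=None):
    """Hypothetical re-creation of the sankey.py command-line flags."""
    p = argparse.ArgumentParser(description="Build Sankey JSON/JS from Excel")
    p.add_argument("-i", dest="input", required=True, help="input .xlsx file")
    p.add_argument("-o", dest="out_json", default="sankey.json",
                   help="output JSON path (default assumed: sankey.json)")
    # No default path: leaving the flag out yields None, i.e. "skip the JS file".
    p.add_argument("--out-js", dest="out_js", default=None,
                   help="optional JS output path")
    return p.parse_args(argv)
```

Using `None` as the "not requested" value keeps the optional JS output a simple `if args.out_js is not None` check downstream.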

### R (R Markdown)

Open `sankey.Rmd` in RStudio and click **Knit**, or render it from an R console:

```r
rmarkdown::render("sankey.Rmd", params = list(
  excel = "./data/sankey_data.xlsx",
  out_json = "sankey.json",
  out_js = NULL  # NULL falls back to "sankey.js"
))
```

- By default, the R script writes both `sankey.json` and `sankey.js`; when `out_js` is `NULL`, the JS file is written to `sankey.js`.

---

## Front-End Consumption

An example `index.html` with a ready-made D3.js Sankey script is included in the project.  
After generating the output files (`sankey.json` and/or `sankey.js`):

- Open `index.html` in your browser.  
- The page will automatically load the Sankey data and render the diagram.  

By default:
- `sankey.json` is loaded if you are serving the files via a local server (`http://...`).  
- `sankey.js` can be used when opening `index.html` directly via `file://`. Browsers typically block pages opened this way from loading local JSON files, so this file provides the data through the global variable `window.SANKEY_DATA` instead.

No additional setup is required: regenerate the JSON/JS files whenever your Excel data changes, then refresh `index.html`.
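
To load `sankey.json` over `http://`, any static file server works; running `python -m http.server 8000` from the project directory is the simplest option. A programmatic equivalent using only the standard library might look like this (`make_server` is a hypothetical helper):

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory=".", port=8000):
    """Return (but do not start) a static file server rooted at `directory`.

    Binding to 127.0.0.1 keeps the server reachable only from this machine.
    Pass port=0 to let the operating system pick a free port.
    """
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server().serve_forever()` from the project directory serves the files until interrupted, with the page available at `http://127.0.0.1:8000/index.html`.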

---

## Notes

- Zero-flow links are rendered semi-transparent.  
- Default colors: nodes → grey (#CCCCCC), flows → UN blue (#009EDB).  
- Node labels include total flow values with thousands separators.  
- Python and R implementations produce the same JSON schema (meta, nodes, links).
- Do not rename the sheets (`data`, `config`, `nodes_positions`); the scripts look them up by name.
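
The label and transparency notes can be illustrated with a short sketch. It assumes a node's total is the larger of its incoming and outgoing sums (the rule d3-sankey uses to size nodes), that labels are rounded to whole numbers, and that zero-value links get a fixed lower opacity. Links here refer to nodes by name, and the function names and opacity values are hypothetical:

```python
def node_total(name, links):
    """Larger of a node's incoming and outgoing sums (assumed, as in d3-sankey)."""
    incoming = sum(link["value"] for link in links if link["target"] == name)
    outgoing = sum(link["value"] for link in links if link["source"] == name)
    return max(incoming, outgoing)


def node_label(name, links):
    # The "," format option inserts thousands separators; ".0f" rounds to an integer.
    return f"{name} ({node_total(name, links):,.0f})"


def link_opacity(value, normal=0.5, zero=0.15):
    """Zero-value links stay visible but faint; both opacity levels are illustrative."""
    return zero if value == 0 else normal


links = [
    {"source": "A", "target": "B", "value": 1234567},
    {"source": "B", "target": "C", "value": 1000},
    {"source": "B", "target": "D", "value": 0},
]
print(node_label("B", links))  # B (1,234,567)
print([link_opacity(l["value"]) for l in links])  # [0.5, 0.5, 0.15]
```

Keeping zero-value links faint rather than dropping them preserves the diagram's structure when a flow is empty in a given dataset.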

---

## Privacy and Copyright

The code runs locally and does not connect to external databases or APIs.  
You may adapt and modify it freely to suit your needs.
