danyoung committed on
Commit
c10d1eb
1 Parent(s): 2958dbe

Upload misc documentation files

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ProjectResilience[[:space:]]Overview[[:space:]]LF\[27\].pdf filter=lfs diff=lfs merge=lfs -text
ProjectResilience Overview LF[27].pdf ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d900140a203f124b7dd9c20494741d50b6ad7507cede6eddf6f58aafff2cbe3e
+size 8608823
data_requirements.md ADDED
@@ -0,0 +1,91 @@
# Project Resilience Data Requirements and Tips

## Format and Features

Each row of the data should include columns that can be cast as **Context**,
**Actions**, and **Outcomes** of a decision pertaining to the unit of
decision-making.

For example, if the problem is carbon emissions decisions per power plant:
- the unit of decision-making is a power plant, so each row should represent
  a decision made for a power plant
- Context features are features about the plant that can't be changed
  (e.g., location, weather, reactor type)
- Actions are policies for the plant that can be changed within a reasonable
  time so that the effect can be observed and attributed to the action
  (e.g., generator setup config, carbon capture level, change in generation
  hours)
- Outcomes are quantifiable values that can be attributed to a single region
  within a reasonable lag (e.g., carbon emissions, cost of actions, energy
  output)

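To make the row layout concrete, here is a minimal sketch of one such row for the power-plant example. All column names below are illustrative assumptions, not a schema prescribed by the project:

```python
# Hypothetical schema for the power-plant example: each row is one decision
# instance. Field names are illustrative only.
CONTEXT = ["location", "weather", "reactor_type"]       # fixed per plant
ACTIONS = ["carbon_capture_level", "generation_hours"]  # controllable
OUTCOMES = ["co2_tons", "cost_usd"]                     # measured effects

row = {
    "plant_id": "P1",
    # Context: features the decision-maker can't change
    "location": "ES", "weather": "temperate", "reactor_type": "CCGT",
    # Actions: changed within a reasonable time window
    "carbon_capture_level": 0.5, "generation_hours": 18,
    # Outcomes: attributable to this unit within a reasonable lag
    "co2_tons": 290.0, "cost_usd": 61_000.0,
}

# Every row must cover all three groups (plus an identifier for the unit).
assert all(col in row for col in CONTEXT + ACTIONS + OUTCOMES)
```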
## Predictability

We need some a priori theory of why/how Actions could affect Outcomes,
and why we should expect prediction of Outcomes to be easier from
Context and Actions than from Context alone. A human being should,
just by looking at the context/action data, be able to predict more or less
what the outcome should be, or at least be able to reason about it.
Alternatively, a basic predictor model mapping Context/Actions to Outcomes
should be able to show that it uses the Actions to make better predictions
than with Context alone. This simple predictor model does not need to use
the full data or input/output spaces; it just needs to make it clear that
there's something there.

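One way to run this check, sketched on synthetic data (the linear model and the data-generating process below are illustrative assumptions, not the project's required method), is to fit one predictor on Context alone and one on Context plus Actions and compare their errors:

```python
import numpy as np

# Synthetic decision data: the outcome genuinely depends on the action,
# so a model that sees Actions should fit better than Context alone.
rng = np.random.default_rng(0)
n = 500
context = rng.normal(size=(n, 2))   # e.g., plant size, ambient temperature
actions = rng.normal(size=(n, 1))   # e.g., carbon capture level
outcome = context @ [0.5, -0.3] + 2.0 * actions[:, 0] \
    + rng.normal(scale=0.1, size=n)

def mse(X, y):
    # Least-squares linear fit with intercept, then training-set MSE.
    X1 = np.hstack([X, np.ones((len(X), 1))])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return float(np.mean((X1 @ beta - y) ** 2))

err_context = mse(context, outcome)
err_full = mse(np.hstack([context, actions]), outcome)
print(err_full < err_context)  # Actions carry signal, so this prints True
```

If the two errors are indistinguishable on real data, that is a warning sign that the Actions columns carry no usable signal.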
## Rules of Thumb

### Time-series

Either (1) we have an outcome value at each time step, in which case the row
should indicate the time step; or (2) we have an outcome value only at
particular time steps (e.g., we have daily power plant CO2 output
but only monthly cost reports). In either case, if some time steps
are missing some values (context, action, or outcome), it's ok for them to be
NA in the row for that time step: we can still construct time series to train
on from such a dataset.

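Case (2) can be sketched as follows; the field names and the NaN convention for missing outcomes are illustrative assumptions:

```python
import math

# Hypothetical daily rows for one plant: CO2 is observed daily, cost only
# monthly, so most rows carry cost_usd = NaN.
rows = [
    {"plant_id": "P1", "date": "2024-01-30", "co2_tons": 14.1, "cost_usd": math.nan},
    {"plant_id": "P1", "date": "2024-01-31", "co2_tons": 13.8, "cost_usd": 61_000.0},
    {"plant_id": "P1", "date": "2024-02-01", "co2_tons": 13.9, "cost_usd": math.nan},
]

# A per-unit time series is still recoverable: sort on the time column and
# keep the NA outcomes; the training code decides how to handle the gaps.
series = sorted((r for r in rows if r["plant_id"] == "P1"),
                key=lambda r: r["date"])
observed_costs = [r["cost_usd"] for r in series
                  if not math.isnan(r["cost_usd"])]
```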
### Missing Data

To give the project the best chance of success, the amount of missing data
should be minimal and/or structured, e.g., we only get cost reports monthly.

### Data Sufficiency

Data rows should cover variations of decisions sufficiently, so, in the
case of time-series data, we need historical decision instances that include
different actions taken for similar contexts.

A single row should represent one observation, which includes the context,
actions, and outcomes for that observation.

We need enough cases for our predictor to learn something about how Actions
affect Outcomes. If we have thousands of samples to begin with, that certainly
gives us a better shot. A quick-and-dirty check for correlations between
actions and outcomes can be used as a gating function, i.e., the
correlation matrix should not look like noise. If it does look like noise,
the project may still be possible, but it will be hard.

The data requirement grows exponentially with the number of outcome
objectives.

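The gating check can be sketched like this on synthetic data; the 0.3 threshold is an illustrative choice, not a project standard:

```python
import numpy as np

# Quick-and-dirty gate: the actions-vs-outcomes block of the correlation
# matrix should contain at least some clearly non-zero entries.
rng = np.random.default_rng(1)
n = 1000
actions = rng.normal(size=(n, 3))
outcomes = np.column_stack([
    1.5 * actions[:, 0] + rng.normal(size=n),  # outcome driven by action 0
    rng.normal(size=n),                        # pure-noise outcome
])

# Correlate all columns at once, then slice out the cross block.
full = np.corrcoef(np.hstack([actions, outcomes]), rowvar=False)
cross = full[:3, 3:]                 # shape: (n_actions, n_outcomes)
print(np.abs(cross).max() > 0.3)     # gate passes if any strong link exists
```

On real data, a `cross` block whose entries are all near zero is the "looks like noise" case described above.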
### Consistency

The same context and actions should result in similar outcomes, and
contradicting samples should be minimal. In other words, there should not be
too many rows with the same Context and Actions resulting in different
Outcomes.

It should be possible to observe the outcome of an action within a reasonable
amount of time (e.g., less than 3 months).

### Availability and Updates to the Data

As a rule of thumb, the number of new samples should be at least on the order
of the problem dimension, i.e., (dim(A) + dim(C)) × dim(O). More important
than the number of new samples is which data is sampled: one sample in a
previously unknown region of interest may be more useful than thousands
in a region we already know well or don't care about. So, if we control
which data is sampled, we don't need as much of it.

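A worked instance of this rule of thumb, using hypothetical dimensions rather than any real dataset:

```python
# Illustrative dimensions: 3 Context features, 2 Action features,
# 2 Outcome objectives.
dim_C, dim_A, dim_O = 3, 2, 2

# Rule of thumb: at least (dim(A) + dim(C)) x dim(O) new samples.
min_new_samples = (dim_A + dim_C) * dim_O
print(min_new_samples)  # → 10
```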
### Transparency/Accountability

Data should come from reliable, trusted, scientific, ethical sources
(e.g., not black boxes or your mom's Facebook surveys).
project_resilience_conceptual_architecture.pdf ADDED
Binary file (179 kB)