Skip to content

Latest commit

 

History

History
627 lines (516 loc) · 43.8 KB

int.org

File metadata and controls

627 lines (516 loc) · 43.8 KB

*+TITLE: Int

Study Plan

CategoryItemDesc
Java
Memory ManagementJVM internals.pdf
LambdaJava 8 in action
StreamsJava 8 in action
**Linux**
# SQL
innerjoin, outerSQL queries for mere mortals
pass riddle sql questions
# Search
Relevant search bookPractice!
# Troubleshooting
python load log file
python parse log filehttps://www.nltk.org/book/ch03.html
bucket time and plot number of GET**CURRENT**
python chart line/hist log fileLog file line chart or hystogram based on time timestamp
2 variable line chart based on file
# Alexa
Create a skill
General continue
# AWS
Start vm, create user, connect ssh
Install hadoop on custom vm
Run hadoop job on cusotm vm
# BigData
Create spark job
Short snippets of spark
Install local kafka
Send and receive topic events
# Machine Learning
Build model to predict house prices
NLP Basic steps with python code
Deep learning basic model python
# Distributed Systems
`
Design a simple distributed systemWith all the pitfalls concurrency etc.
# Data Structures and Algorithms
General

System Design

itemdescription
DB FederationSplits databases by function https://tinyurl.com/dbfederation
ShardingEach DB subset of data same function multiple db different sections of rows
DenormalizatoinImprove read performance in experse of write performance

Java

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ JVM Architecture (http://blog.jamesdbloom.com/JVMInternals.html) │ └─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ ┌──────────────────────────────────────┐ ┌──────────────────────────────────────┐ ┌───────────────────────────────┐ │ Stack │ │ Non Heap │ │ Heap │ │ ┌──────────────────┐ │ │ │ │ ┌───────────┐ ┌─────┐ │ │ │ Thread │ │ │┌───────────┐┌───────────────────────┐│ │ │ Young │ │ Old │ │ │ └──────────────────┘ │ ││Code Cache ││ Permanent Generation ││ │ ├─────┬─────┤ ├─────┤ │ │ ┌──────────────────┐ │ │├───────────┤├───────────────────────┤│ │ │ E │ │ │ │ │ │ │ Program Counter │ │ ││ ││┌───────────┐ ││ │ │ d │ S │ │ │ │ │ └──────────────────┘ │ ││ │││ Interned │ ││ │ │ e │ u │ │ │ │ │ ┌───────────┐ ┌───────────┐ │ ││ │││ Strings │ ││ │ │ n │ r │ │ │ │ │ │ Stack │ │ Native │ │ ││ ││└───────────┘ ││ │ │ │ v │ │ │ │ │ │ │ │ Stack │ │ ││ ││┌───────────┐ ││ │ │ S │ i │ │ │ │ │ │ │ │ │ │ ││ │││Method Area│ ││ │ │ p │ r │ │ │ │ │ │ │ │ │ │ ││ │││┌────────┐ │ ││ │ │ a │ o │ │ │ │ │ │ │ │ │ │ ││ ││││ ├┐│ ││ │ │ c │ r │ │ │ │ │ ├───────────┤ ├───────────┤ │ ││ │││└┬───────┘││ ││ │ │ e │ s │ │ │ │ │ ├───────────┤ ├───────────┤ │ ││ │││ └───┬────┘│ ││ │ │ │ │ │ │ │ │ ├─────┬─────┤ ├───────────┤ │ ││ ││└─────┼─────┘ ││ │ ├─────┴─────┤┌─────┴─────┤ │ │ └─────┼─────┘ └───────────┘ │ ││ ││ │ ││ │ │ Minor ││ Major │ │ │ │ │ │└───────────┘└──────┼────────────────┘│ │ └───────────┘└───────────┘ │ │ │ │ │ └─────┐ │ │ │ ┌─┼───────┘ │ │ │ │ │ │ │ └──────────────────────────────────────┘ └──────────────────────────┼───────────┘ └───────────────────────────────┘ │ │ │ ◎ │ ┌────────────────────────────────────────────────┐ ┌────────────────────────────────┐ │ │Frame ┌─────────────┐┌────────────┐│ │Class Data │ │ │┌────────┐┌───────┐│Operand Stack││ Current ││ │┌─────────────┐ ┌─────────────┐ │ │ ││ Return ││ Local │├─────────────┤│ Class ││ ││ Run-Time │ │ Method Code │ │ └─◎│ Value ││ Vars │├─────────────┤│ Constant ││────▶│Constant Pool│ │ │ │ │└────────┘└───────┘└─────────────┘│ Pool ││ │├─────────────┤ │ │ │ │ │ Reference ││ ││String Consts│ │ │ │ │ └────────────┘│ │├─────────────┤ │ │ │ └────────────────────────────────────────────────┘ ││ Num Consts │ │ │ │ │├─────────────┤ │ │ │ ││ Class Refs │ │ │ │ │├─────────────┤ │ │ │ ││ invoke dyn │ │ │ │ │└─────────────┘ └─────────────┘ │ └────────────────────────────────┘

itemdesc
**Basic**
Integer32 bit signed (all is signed in java) so effectively (2^31)-1
**java8**
lambdainterface with single abstract method (can have in addition static,default), parameters, return type match lambda
Expressions(parameters) -> expression e.g. (int x, int y) -> x + y
(parameters) -> statement e.g. () -> System.out.println("hi " + s)
(parameters) -> { statements } e.g. (String s) -> { int n = s.length(); return n; }
Runnable r = () -> System.out.println("Hello!");
Collections.sort(persons, (p1, p2) -> p1.name.compareTo(p2.name));
Callable<String> callable = () -> s;
Method ReferenceClass::staticMethod e.g. Arrays.sort(items, Util::compareItems);
instance::instanceMethod e.g. items.forEach(System.out::print);
Default methodsinterface Descriptive { default String describe() { return "fantastic"
One liner RESTRapidoid: On.get("/size").json((String msg) -> msg.length());
StreamsArrays.asList("a", "", "b", "", "c").stream().filter(x -> !x.isEmpty()).count();
Arrays.asList("A", "bbb", "CC", "dd").stream()..map(String::toUpperCase)..collect(Collectors.joining(":"));
Arrays.asList(10, 3, 2, 5, 9, 4).stream().mapToInt(x -> x).summaryStatistics()
Resourceshttp://www.java8.org/
**JVM**
Internal Threads
VM ThreadRuns safe-point stop-the-world operations (like major gc, thread dumps, thread suspension)
Periodic task threadRuns Timer events like interrupts to schedule execution of periodic operations
GC Threads
Compiler threadsbyte code –> native code at runtime
Signal DispatcherReceives external signals
Threads ComponentsPC, stack, frame per method executed, native stack, stack restriction (stackoverflow)
FrameNew frame per method invocation, local vars, return val, operand stack, reference to runtime constant pool
Dynamic LinkingIn C++ at compile time: objects –> one-object symbols –> address, at java this is at runtime
GCThe permanent generation is collected every time old generation is collected, it happens when they are full
**ClassLoader**JVM Startsup BootstrapClassLoader Loads Main causes Loading, Linking, Initializing additional classes

General

jps -l Show current java processes.

Performance

Measure

cpu

measurments
user cpu

you want: linear relation: increase load on system and increased user cpu.

system cpu
  1. also known as kernel cpu.
  2. reduce - time spent on system cpu is time we don’t have on user cpu, > 5% oepn eye on it.
idle time
HOWOTO
vmstat

`vmstat 5` Global cpu stats

  1. `r` - run queue threads waiting to run.
  2. `si/so` - paging.
mpstat

`mpstat -P ALL` to see virtual cpu stats

virtual memory

if your heap memory is in virtual memory gc would be very slow and gc pauses will take long time

process

context switching

high voluntary context swiching can be an indication of waiting for locks, io, contention on locks and io.

Voluntary
High Voluntary

Java apps that are experiencing lock contention. They use high cpu. `pidstat -w -I -t -p 23132 5` - 23132 is pid, more than 5% of available clock cycles on voluntary context switches is likely suffering from lock contention. General tule of thumb you have 80,000 clock cycles. `cswch` - involuntary context switches, this means locks. Sum up all the involunary numbers for the process and divide by 4 because of 4 core, then multiply by by 80,000 shouldnt be more than 5%, compare to how many clock cycles we have `more /proc/cpuinfo` Mhz is how many clock cycles we have `nvcswch` - non vountary involuntary context switches

Involuntary

more threads than can run.

Scheduling Queue

goes together with involuntary context switching we have more threads than can be handled.

  1. When thread is ready to run it’s placed on the `run queue`
  2. Run Queue Size > Num VCPU * 2 => System is slow

GC

  1. gc scans large chunks of memory, if we have paging, it would be much slower, so check si/so in vmstat

![gcgenerations](https://tinyurl.com/gcyoungold)

  1. -XX:+PrintGCDetails logs: -Xlogcc -
  2. -XX:+PrintGCDateStamps or -XX:+PrintGCtimeStamps
  3. -XX:+PrintGCApplicationStoppedTime How much time did the application stop waiting for gc or safepoint. important.
  4. -XX:+PrintApplicationConcurrentTime How much time did the application run between the gc and safepoints. important.
  5. -XX:+PrintTenuringDistribution - How much time objects stay alive in your generation spaces. “new threshold 1 (max 15)” means at age 1 it’s promotion objects to old generation space. Meaning survivor space is not large enough so it was choosing age 1.
  6. -XX:+PrintAdaptiveSizePolicy (Parallel GC or G1 Only)
  7. VisualVM/VisualGC remote to troubleshoot remotely, install jstatd on server. Requires to run with same javaapp user, and policy, jstatd policy file (search for it). Then start jstatd remotely. Then from client jstat -gcutil pid@remoteip 2000
Sections
Young

Moving them back and forth between s0 and s1. Also known as from and to space.

Eden
  1. Most objects die here.
  2. When eden is full we have minor GC which copies to s0 or s1 in addition in minor gc objects are moved from s1 to s0 and back.
  3. Move objects to survivor (s0).
TLAB’s each thread has it’s own space to allocate data so eden is split and each thread has it’s own space. ![eden tlabs](http://i.umumble.com/img/topic-1-1506586679.png)Survivor ALL objects from s0 are movbed to s1 on minor gc, all objects from s1 are moved to s0 on minor gc. At any point in time only s0 or s1 has objects. The other one is empty.From SurvivorTo Survivor
Old

moving to here from survivor after a couple of minor gc are moved here to old. Here we have the full GC. We try to have objects not arrive to old so that they won’t have full gc.

Permanent

VM Meta Classes

Tools
VisualVM
VisualGC Plugin
Resources

Safari Java Performance LiveLessons

Network

nicstat

`nicstat -i eth0 5`

DiskIO

iostat -xm 5 : include io%util we are interested in it.

https://www.safaribooksonline.com/library/view/java-performance-livelessons

concurrent

delayed operation

“`java Scheduler scheduler = Executors.newSingleThreadScheduledExecutor() scheduler.schedule(new Runnable() { override def run(): Unit = { Some Code } }, 1, TimeUnit.SECONDS) “`

URLConnection

“`java URL url = new URL(“http://example.com”); HttpURLConnection connection = (HttpURLConnection)url.openConnection(); connection.setRequestMethod(“GET”); connection.connect();

int code = connection.getResponseCode(); “`

gc

“`markdown ![gcgenerations](https://tinyurl.com/gcyoungold) “`

young

Moving them back and forth between s0 and s1. Also known as from and to space.

eden

“`markdown

  1. Most objects die here.
  2. When eden is full we have minor GC which copies to s0 or s1 in addition in minor gc objects are moved from s1 to s0 and back.
  3. Move objects to survivor (s0).

“`

TLAB’s

“`markdown each thread has it’s own space to allocate data so eden is split and each thread has it’s own space. ![eden tlabs](http://i.umumble.com/img/topic-1-1506586679.png) “`

survivor

ALL objects from s0 are movbed to s1 on minor gc, all objects from s1 are moved to s0 on minor gc. At any point in time only s0 or s1 has objects. The other one is empty.

from survivor
to survivor

old

moving to here from survivor after a couple of minor gc are moved here to old. Here we have the full GC. We try to have objects not arrive to old so that they won’t have full gc.

Permanent

VM Meta classes.

resources

https://www.safaribooksonline.com/library/view/advanced-java-performance/9780134653273/ajph_01_01.html?autoStart=True

tools

visualvm

visualgc plugin

performance troubleshooting

measure

cpu

user cpu

you want: linear relation: increase load on system and increased user cpu.

system cpu

also known as kernel cpu.

reduce

time spent on system cpu is time we don’t have on user cpu

idle time

virtual memory

if your heap memory is in virtual memory gc would be very slow and gc pauses will take long time

process

context switching

high voluntary context swiching can be an indication of waiting for locks, io, contention on locks and io.

voluntary
involuntary

more threads than can run.

scheduling queue

goes together with involuntary context switching we have more threads than can be handled.

resources

CS Patterns

DetailsItem
[a-z]: [65,90], [A-Z]: [97-122], Space: [32]Ascii
Java
Character.isLetter(c)is char letter
String.split(" ")Split to works
Arrays.copyOfRange(arr, from = 0, to = 2)SubArray
Arrays.toString(new int[] {1,2,})Print Array
Arrays.deepToString(…)Print 2 dim array
Arrays.sort(arr)Sort Array
int[][] my2dimarr = new int[3][3]Initialize 2 dim array
PriorityQueue<Integer> q = new PriorityQueue<>(); q.add(2); q.poll()Heap PriorityQueue
  1. **Brute force**
  2. **Massage input** if can (sort, precalc, cache, turn to graph, turn to priority-queue/heap, HashMap, Set)
  3. If cannot massage input then `greedy algorithm`
  4. Try being `greedy` in O(n) see if works.
  5. Recursion, first item either in result or not, if yes, do this if not do that.
  6. Dummy trick compare - str1, str2, str3? maybe instead of combinations you can just compare the length of str1+str2 to the length of str3?
  7. **impossible** what is the BEST HEAVEN data structure you would want to solve it? Now prepare that DS one time even with high calculation cost and use it to solve the problem. [http://www.ardendertat.com/2011/10/17/programming-interview-questions-8-transform-word/](http://www.ardendertat.com/2011/10/17/programming-interview-questions-8-transform-word/)
  8. **Mental jump** convert input data structure to the best one you want! cost is one time then all lookups and you always have that output data structure, think you have hadoop.

Signs you need recursion

  1. **Permutations** The problem requires many permutations like 2^n
  2. **impossible** It looks impossible to solve the problem. Let recursion help you.
  3. **Impossible** cannot get answer? When questions looks impossible most probably recursion. In this case you must get some help from recursion friend to reduce the problem. Impossible question is a big hint we need a recursion, we can’t boggle our mind around it.

Signs you need to massage input

Impossible to solve, too many options, you need to turn the input data structure into the dream data structure that would help you resolve the question. As you do it one time, precalc, and then for each test function you just run on the new data structure.

Coding Patterns

  1. Use **`PriorityQueue`** instead of Max/Min heap!!!! This will save you a huge load of time as you need a heap in rather many of the questions.
  2. Use `Arrays.binarySearch` - to find!!
  3. Use `Arrays.toList(new int {1, 2, 3})` to turn an array to list.
  4. `assert(condition for tests)` with `-ea` flag to turn it on.
  5. Throw `IllegalArgumentException` for quick validation.
  6. Use `while (cur.next() != null)` in linkedList to avoid holding two pointers `prev` and `cur`.
  7. `BFS` always finds the shortest path.
  8. `DFS` uses less space.
  9. Use `ArrayQueue` instead of `ArrayList` for efficient `FIFO` in arrayList remove would be `O(n)`
  10. Use `Collections.reverse` to reverse a `list`
  11. `DFS` and `BFS` both take `O(m+n)`
  12. `BFS` store `nodesAlreadyVisited` don’t revisit them wasting time in shortest path finding.

CS Literacy

  1. **Random Forests** => machine learning, take “average” of multiple decision trees as your result, avoic noise.
  2. **DFS** => Init: Stack, Push head ==> Loop while stack not empty ==> Pop one, Mark it, push all adjucent vertexes to stack. O(V + E) [https://www.youtube.com/watch?v=1MBr9swUPE8](https://www.youtube.com/watch?v=1MBr9swUPE8)
  3. **merkel trees** tree of hashes you send it in between the nodes, so that nodes can very quickly know if they have the wrong data, so they sync.
  4. **vector clocks** logical clocks, think git and distributed resolution, although we are distributed each commit get’s a hash and we can fix collisions.

CS Interview Resources

TopicCategoryResource
----------------------------------------------------------------------------------------
CS Programming Interview querstions and good answershttp://www.ardendertat.com/2011/10/18/programming-interview-questions-9-convert-array/

Sort

QuickSort

recurse: choose pivot, forwardI++, backwardI++, swap if left smaller pivot nad right bigger pivot.

BFS

*+BEGIN_SRC java Lpublic int findeftMostNode(TreeNode root) { Queue<TreeNode> queue = new LinkedList<>(); queue.add(root); while (!queue.isEmpty()) { root = queue.poll(); if (root.right != null) queue.add(root.right); if (root.left != null) queue.add(root.left); } return root.val; } *+END_SRC

queue

insert root

repeat
pop node
insert children

BTree

In computer science, a B-tree is a self-balancing tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children. ![btree](http://www.virtualmachinery.com/images/tree.gif)

infix

postfix/prefix

convert to postfix/prefix

postfix and prefix do not need parenthasis A + B * C => B C * + to convert: operands stay in same relative places, only operators change positions.

no need parenthasis

evaluate

push operands as long as we have operands once we have operator pop 2 operands and run operation on them.

Resources

Problem solving with algorithms and data structures

online concise python book

Security

ItemDescription
OAuth2https://aaronparecki.com/oauth-2-simplified/
TokenEnd goal to get token then use it with Authorization: Bearer RsT5OjbzRn430zqMLgV3Ia
Register AppFirst step to register your app give redirect_url
httpsAs opposed to OAuth1 which used crypt here we must use https
Client id/secretApp receives client id and secret, secret used when server makes the call not in webapp/mobile
1st request userUser given url that asks him to authenticate to server with redirect back uri and &state=rand27873
Access codeWe get back ok with access code to redirect url
2nd request appApp now goes to server with received auth-code and same state to exchange for token
TokenThis is it we got the token

Softskills

meetings

end

you already konw how you want the meeting to end, before meeting and during meeting you should stick as fast as possible to how the meeting should and and put your voice.

Math

mod

only the reminder so 2 % 3 is 1 and 4 % 3 is 1

WORK

amazon

machine learning

models lifecycle

release process from data scientists to production

measure effectiveness

effectiveness of models are they good?

Scala

codedesc
def wrapCodeWithLog(blockOfCode: () => String): () => String = log.info("before");blockOfCode()function that wraps block of code with logging, just wraps does not run
scala

mockito
import org.scalatest.mock.MockitoSugar
import org.mockito.Mockito._

with MockitoSugar

val mockClient = mock[Client]
when(mockClient.status).thenReturn(200)
play
json
implicit val writesMutableListBuffer: Writes[ListBuffer[(String, mutable.ListBuffer[T])]] = new Writes[ListBuffer[(String, mutable.ListBuffer[T])]] {
 
    def writes(q: ListBuffer[(String, mutable.ListBuffer[T])]): JsValue = {
      Json.obj("myobj" -> q.map(
        item => Json.obj(
          "listbuffer-key" -> Json.toJson(item._1),
          "listbuffer-values" -> Json.toJson(item._2)
        )
      )
      )
    }
  }
 
  val someReader = new Reads[MyObj] {
    override def reads(json: JsValue): JsResult[MyObj] = {
      val fieldValue = (json \ "someField").as[String]
    }
  }
 
  // for simple case classes just define the default writes/reads
 
  case class Customer(name: String)
  object Customer {
    implicit val customerJsonWriter = Json.writes[Customer]
    implicit val customerJsonReader = Json.reads[Customer]
  }
 
  // for inheritance define case object for the train with pattern matching
 
  object RequestData {
    implicit val requestDataWriter = new Writes[RequestData] {
      override def writes(o: RequestData): JsValue = {
        o match {
          case stringRequestData:StringRequestData => StringRequestData.stringRequestDataWriter.writes(stringRequestData)
          case _ => throw new IllegalArgumentException(s"requestDataWriter: No writer for $o")
        }
      }
    }
 
    implicit val requestDataReader = new Reads[RequestData] {
      override def reads(json: JsValue): JsResult[RequestData] = {
        val requestType = (json \ "requestType").as[String]
        requestType match {
          case "stringRequestData" => StringRequestData.stringRequestDataReader.reads(json)
          case _ => throw new RuntimeException(s"requestDataReader: does not support json: $json with type $requestType")
        }
      }
    }
  }
 
  // Map to json and json to map, array
  implicit val moreDetailsJsonWriter = new Writes[Map[String, SomeValue]] {
    override def writes(o: Map[String, SomeValue]): JsValue = {
      Json.arr(o.map( {
        case (key, value) => key -> SomeValue.writes(value)
      }))
    }
  }
 
  implicit val moreDetailsJsonReader = new Reads[Map[String, SomeValue]] {
    override def reads(json: JsValue): JsResult[Map[String, SomeValue]] = {
      val mapAsJson = json.as[JsArray]
      val kvSeq = mapAsJson.value.flatMap { entry =>
        val keyValSeq = entry.asInstanceOf[JsObject].fields
 
        keyValSeq.map(keyValEntry => keyValEntry._1.asInstanceOf[String] ->
          SomeValue.reads(keyValEntry._2).get)
 
      }
      JsSuccess(kvSeq.toMap)
    }
  }
 
  // Take first key of a json { "somekey": "somevalue" } will return "somekey"
  json.asInstanceOf[JsObject].fields(0)._1

Ubuntu

shortcutdesc
sudo apt install gnome-tweak-toolinstall to remove alt-shift from keyboard shortcut then run tweak

Status

quick sort Quick Sort Code