Forem

CodingBlocks

95. Data Structures – Arrays and Array-ish

We continue our deep dive into data structures, this time focusing in on arrays and array-like types as Allen gives Shania Twain some singing competition, Joe is going to owe us another tattoo, and wait … when does Michael think C++ was invented?

How are you reading this? If you’re using your podcast player to read these show notes, you can visit https://www.codingblocks.net/episode95 to view this show’s full show notes and be a part of the discussion.

Sponsors

  • Datadog.com/codingblocks – Sign up today for a free 14 day trial and get a free Datadog t-shirt after creating your first dashboard.
  • Manning Publications – Sign up to Manning’s Deal of the Day (here) to participate in the Countdown to 2019 to be in the running to win awesome prizes and get special daily discounts.
  • Discover.bot – a digital space for bot developers and enthusiasts of all skill levels to learn from one another, share stories, and move the conversation forward. New to bots? Get started with their Beginner’s Guide.

Survey Says

Anonymous Vote
Sign in with Wordpress
What do you want to focus on improving in 2019?
  • Front-End, there's a 3p service for everything now.
  • Back-End, cuz Flexbox...done.
  • Persistence is king, data data data.
  • Algorithms and data structures, can't go wrong with fundamentals.
  • Clean Code, master the tactical before the strategic.
  • Architecture, I've ascended to higher levels of abstraction.
  • DevOps, good luck doing anything without me!

News

  • We take a moment to say thank you to everyone that took time out of their busy schedules to leave us a review:
    • iTunes: Mr_Joe1, Scweiss1, akaryrye, JacobCastiglioni
    • Stitcher: kannan, WeirdFlexButOK
  • How do you get other people within a company to get on board with writing better code?
  • Is it megabits? Or megabytes?

Arrays + Cousins

Array

What is it? And how does it work?

  • A way to store a collection of items.
  • In C, these arrays are stored in a continuous block of memory.
    • You either have to declare the array type and size or you have to initialize the array with the elements at the time the type is defined, or both. Example initializations:
      • int myArray[10];
      • int myArray[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
      • int myArray[10] = {1, 2, 3}
  • No index checking in C – ever get that index out of bounds error in our other languages?
    • So what happens? It doesn’t blow up – you’ll just get unexpected / garbage results.
  • C also won’t blow up if you initialize it with more data that what’s in the defined size – C++, C#, Java, etc will complain at compile time.
  • In languages like C, sometimes people get confused between arrays and pointers. It’s true that arrays start at a spot in memory and then contain the contiguous space in memory, but it’s not the same thing as having a pointer to that first memory location.
    • You can check by doing a sizeof in C to see that they are indeed different.
    • Confusion typically happens because arrays are passed as pointers in C and you access the array members using pointer arithmetic.
  • C arrays can contain anything except void or functions.
  • Can be initiated in any of three memory segments – data, heap and stack.
    • Dynamically allocated arrays go on the heap.
      • What’s a dynamically allocated array? One that’s created using malloc – allocates a chunk of space on the heap by defining how much space you need.
    • Static / Global arrays get allocated to the data segment.
    • Local arrays are put on the stack.

What about JavaScript?

  • Things are loosely enforced at best.
  • You don’t define a length or a type.
  • Technically, it’s nothing more than a special object behind the scenes that use numerical keys for sorting purposes.
  • Because it’s an object, it is copied by reference.
    • Don’t believe us, just put this into your console typeof []
  • If using the array (without breaking the rules) there are built in optimizations like storing the data in contiguous memory spaces.
    • What are examples of _breaking the rules_?
      • Manually set a property on an array either with dot notation or using a non-numerical key.
      • Leave major gaps in the indices.
      • Reverse adding items to the array from largest index to smallest.
    • Add items to the array ordered, and life will be great
    • Push / Pop are fast, but Shift / Unshift are costly.
  • Use for .. of
  • Technically can use for .. in because it’s an object, but not a good idea. 10 – 100x slower, AND you can get additional items that you didn’t expect.
  • Length isn’t really length…it’s the largest numbered index + 1
    • let myArr = []; myArr[200] = true; // myArr.length === 201;
  • Length is writeable – what?!
    • Yeah, you can increase it, and nothing majorly bad happens … but if you decrease it, it’ll truncate data in the array above the specified length.

More about Arrays

  • In a strongly typed language, these items must be of the same type.
  • In loosely typed languages, typically you can store whatever you’d like in the collection.
  • In most typed languages, an array is a fixed sized sequential set of data – typically stored in contiguous memory.
  • In scripting languages like JavaScript, the memory allocation may not be contiguous – rather it’s an implementation detail of the engine itself.

When is an array not an array?

  • Arrays generally have fixed sizes because the allocated memory needs to be contiguous. The size can be dynamically set at run time but you generally can’t grow it without moving it.
  • Ruby has dynamic arrays, which are allocated dynamically, and contiguous…but will automatically reallocate more space as needed – copying all of the existing data to a new spot and allocating more space.
  • Python doesn’t have arrays, it has lists which are not fixed and are not contiguous.
  • JavaScript has “arrays” which are objects, with properties like “length” that are saved properties on the object.
  • “In V8 and Carakan (and presumably Chakra), all (non-host) objects (both those that are arrays and those that aren’t) with properties whose names are array indexes (as defined in ES5) are stored as either a dense array (a C array containing some value wrapper) or a sparse array (which is implemented as a binary search tree).”
    • When is a JS array not an array? When you do weird stuff!

Sizes

  • JavaScript: Maximum Size: 2^31 (unsigned 32 bit integer ~ 4.29 billion elements)
  • C#: Maximum Size: By default, the maximum size of an Array is 2 gigabytes (GB). In a 64-bit environment, you can avoid the size restriction by setting the enabled attribute of the gcAllowVeryLargeObjects configuration element to true in the run-time environment. However, the array will still be limited to a total of 4 billion elements, and to a maximum index of 0X7FEFFFFF in any given dimension (0X7FFFFFC7 for byte arrays and arrays of single-byte structures).
  • Java: Maximum Size: 2GB -> 2^31

When to use

  • In typed languages, you’ll probably want to use this if trying to squeeze out every bit of performance available.
    • Otherwise, using the built in List types are typically more flexible and powerful.

Linked List

What is it? And how does it work?

  • Ordered set of objects that contain references the next item in a list, like a conga line.
  • There are also double-linked lists which contain a reference to the previous object.
  • What’s the deal with double?
    • You have the additional overhead of setting previous and next values.
    • Advantage is that it’s easy to drop a node, given the node (i.e.: node.prev.next = node.next) (rather than iterating through the list to find the node‘s prev).

Pros

  • Don’t need to specify a size up front.
    • No over-allocating.
    • No need for contiguous memory.
  • Easy to insert/delete, think about removing an item from a sorted array.

Cons

  • No random access.
    • You need to iterate through the list to get to any particular object.
  • Terrible choice for some algorithms, like binary search.
  • Extra memory required, one pointer per object.

When to use

  • When you have lots of insert/deletes.
  • When you don’t need random access.
  • When you don’t know a fixed size.

What about …

  • Languages that don’t have pointers? Use a reference: var node2 = { value: 5, next: node1 }
  • C#? LinkedList class …isn’t it trivial? Why would you ever need that?
    • Nice to have helpers like AddFirst, AddLast, RemoveLast, AddAfter, etc.
  • LinkedList vs ListList is actually backed by an array, and LinkedList can cheaply add/remove items out of the middle of the list, List can only cheaply add to the end.

Queue

What is it? And how does it work?

  • Very similar to a Stack … the biggest difference is the order in which data is removed.
  • Queues are FIFO’s – First In First Out – oldest items get removed first.
  • Think about your school lunch line – you got in line first, you got served first and then you left the line.

Useful When

  • Ordering is important via the FIFO.
    • Breadth first searches.
    • Resource sharing among multiple consumers – you basically go to your first consumer in line, 2nd, etc.
    • Asynchronous data syncing – data not received at the same rate it is sent.

Derivatives

Priority Queues
  • Similar to the standard queue with the exception that queue items have a priority associated with them, and the higher priority items are dequeued first.
    • If the priorities are the same, then the queue falls back to standard operating mode FIFO.
Useful When
  • Efficient for finding shortest path in Dijkstra’s algorithm.
  • Data compression (Huffman).
  • AI
Deque – Double Ended Queue
  • Allows insert and delete at both ends.
    • This essentially makes a queue a hybrid of a stack and a queue.
Useful When
  • Handling clockwise and counter-clockwise rotations in O(1) time.
  • Efficient at adding or removing items from both ends of the queue.
Circular Queue
  • Also known as a Ring Buffer.
  • Last item in the queue links back to the first item.
  • Seeing if a queue is full – if the first item -1 == last item, then the queue is full.
Useful When
  • Used for memory management – allows you to fully utilize memory locations that might have been skipped otherwise.
  • Traffic systems – constantly cycle through and switch on the next traffic light on a timer basis.
  • CPU scheduling.

Stack

What is it? And how does it work?

  • A stack is a “last in, first out” data structure (LIFO).
  • Think of anything you’ve ever stacked … a stack of plates, a stack of books, a stack of pancakes … you don’t have random access but instead access the top most item on the stack.
  • Stacks have two principle operations, push and pop, and often a peek operation.
    • push – put a new element at the top of the stack.
    • pop – remove the top most element from the stack.
    • peek – see the top most element on the stack without removing it.
  • Stacks can be implemented using at its core either an array of a linked list
    • Technically only a singly linked list is necessary … i.e. we only need to point to the next item in the stack (the one below), because we’re only going to traverse down the stack.

Pros

  • Fast way to know where you’ve been.
    • Reads/writes are O(1) since you are only touching the top of the stack.

Cons

  • In some languages, stacks are used to store both input data as well as return addresses of the calls making them susceptible to stack smashing attacks when the input sizes aren’t verified.
  • This is no random access to a stack … it’s last in, first out.

When to use

  • Backtracking while traversing a graph or tree.
    • Remember depth-first search.
  • Reversing things like a string (or any other array/list like thing).
  • Call stack.
  • Writing your own compiler … stacks are often used to ensure opening/closing constructs like curly braces, parenthesis, etc. are balanced.

Resources We Like

Tip of the Week

Want to share a tip? Submit your tip at https://www.codingblocks.net/tips.

  • Use npm ci for faster, more reliable builds. (The NPM Blog)
    • Thanks gaprogman!
  • doFactory .NET Design Pattern Framework 4.5 (doFactory)
    • Thanks JoeRecursionJoe!
  • BenchmarkDotNet – Powerful .NET library for benchmarking. (GitHub)
  • Use git subtree to merge repositories together or split portions out into new repositories all while maintaining the history. (GitHub)
    • Thanks GreenFieldCoder!
  • devdocs.io – All of the documentation for everything you’ll ever use in one place.
    • Thanks Konrad!

Episode source