ch.ninecode.cim

CIMNormalize

class CIMNormalize extends CIMRDD with Serializable

Handle normalization.

For each element with a 1:N relation, ensure that the N side references the 1 side, and not vice versa.
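
As an illustration of this rule, here is a minimal sketch using plain Scala collections rather than Spark. The `Item` type, the `normalize` helper and the field names used in the usage note are hypothetical stand-ins chosen for the example; they are not part of CIMNormalize or its Element type.

```scala
object NormalizeSketch {
  // Hypothetical stand-in for a CIM element: an id plus named reference fields.
  final case class Item(id: String, refs: Map[String, List[String]] = Map.empty)

  /** Move a 1:N reference list from the "1" side onto each "N" element.
    *
    * @param items     all elements keyed by id
    * @param listField the denormalized field on the parent that lists its children
    * @param backField the field on each child that should point at the parent
    */
  def normalize(items: Map[String, Item], listField: String, backField: String): Map[String, Item] = {
    // child id -> parent id, for every parent that carries the list
    val fixes: Map[String, String] = (for {
      parent <- items.values
      child <- parent.refs.getOrElse(listField, Nil)
    } yield child -> parent.id).toMap

    items.map { case (id, item) =>
      // drop the list from the "1" side ...
      val stripped = item.refs - listField
      // ... and add the back reference on the "N" side, if this item is a child
      val fixed = fixes.get(id) match {
        case Some(parentId) => stripped + (backField -> List(parentId))
        case None => stripped
      }
      id -> item.copy(refs = fixed)
    }
  }
}
```

For instance, calling `NormalizeSketch.normalize(items, "Terminals", "ConductingEquipment")` on a parent that lists two children leaves the parent without the list and gives each child a reference back to the parent.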

Linear Supertypes
Serializable, Serializable, CIMRDD, AnyRef, Any

Instance Constructors

  1. new CIMNormalize(spark: SparkSession, storage: StorageLevel = StorageLevel.MEMORY_AND_DISK_SER)

    spark

    The Spark session this class is running in.

    storage

    The storage level used to cache the resulting RDD.

Type Members

  1. case class Relation(parent: String, parent_class: String, relationship: CIMRelationship, referred: List[String]) extends Product with Serializable

    A denormalized relation to be fixed.

    parent

    the id of the object containing the denormalized list

    parent_class

    the class of the denormalized object

    relationship

    the relationship (from the static list in the companion object of the class or superclass with the 1:N relation)

    referred

    the list of ids of the N-side elements that need to be fixed
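
To show how such a record might be consumed, the sketch below regroups relations by the id of each referred element, which is the kind of keying that the `(String, (Element, Iterable[Relation]))` arguments of add and remove suggest. This is an assumption about the data flow, not the library's confirmed implementation. `Rel` is a hypothetical, simplified stand-in for CIMRelationship, whose fields are not shown on this page, and `byChild` is an illustrative helper.

```scala
object RelationSketch {
  // Hypothetical, simplified stand-in for CIMRelationship.
  final case class Rel(field: String, clazz: String)

  // Same shape as the documented Relation, but using the stand-in Rel type.
  final case class Relation(
    parent: String,
    parent_class: String,
    relationship: Rel,
    referred: List[String])

  // Key every relation by each N-side id it mentions, so that all fixes
  // for one child element end up grouped under that child's id.
  def byChild(relations: Seq[Relation]): Map[String, Seq[Relation]] =
    relations
      .flatMap(r => r.referred.map(child => child -> r))
      .groupBy(_._1)
      .map { case (child, pairs) => child -> pairs.map(_._2) }
}
```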

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def add(arg: (String, (Element, Iterable[Relation]))): (String, Element)

    Add the normalized fields given by the relation.

    arg

    the id and a tuple of the original element and the relations to add

    returns

    a tuple of the id and a new (fixed) element

  5. def asDataSet[T <: Product](implicit arg0: ClassTag[T], arg1: scala.reflect.api.JavaUniverse.TypeTag[T], spark: SparkSession): Dataset[T]

    Get a typed DataSet for the given class.

    T

    The type of RDD.

    spark

    The Spark session.

    returns

    A typed dataset of the RDD.

    Definition Classes
    CIMRDD
  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. def biggest(name: String): (Int, (Int, RDD[_])) ⇒ Int

    Returns a fold function, for use with foldLeft, that finds the largest integer n among RDD names of the form name|n.

    name

    the name to check

    Definition Classes
    CIMRDD
  8. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native() @HotSpotIntrinsicCandidate()
  9. def companion[T](name: String)(implicit man: Manifest[T]): T

    Retrieve a (static) companion object.

    T

    the trait of the companion object (what to cast it to)

    name

    the class name of the object with the companion to get

    man

    the type information for the object

    returns

    the companion object as type T

  10. def do_normalization(): RDD[Element]
  11. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  12. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  13. def get[T](implicit arg0: ClassTag[T], spark: SparkSession, log: Logger): RDD[T]

    Get the typed RDD.

    Convenience method where the name of the RDD is the same as the contained class type (the usual case).

    T

    The type of the RDD, e.g. RDD[T].

    spark

    The Spark session which persisted the typed RDD.

    log

    A logger for error messages.

    returns

    The RDD with the given type of objects, e.g. RDD[ACLineSegment].

    Definition Classes
    CIMRDD
  14. def get[T](name: String)(implicit arg0: ClassTag[T], spark: SparkSession, log: Logger): RDD[T]

    Get the named RDD.

    T

    The type of objects contained in the named RDD.

    name

    The name of the RDD, usually the same as the CIM class.

    spark

    The Spark session which persisted the named RDD.

    log

    A logger for error messages.

    returns

    The typed RDD, e.g. RDD[T].

    Definition Classes
    CIMRDD
    Example:
    1. The RDD of all elements is a special case: it is currently named Elements (plural), so this method must be used:

      val elements: RDD[Element] = get[Element]("Elements")
  15. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  16. def getOrElse[T](implicit arg0: ClassTag[T], spark: SparkSession): RDD[T]

    Get the typed RDD or an empty RDD if none was registered.

    Convenience method where the name of the RDD is the same as the contained class type (the usual case).

    T

    The type of the RDD, e.g. RDD[T].

    spark

    The Spark session which persisted the typed RDD.

    returns

    The RDD with the given type of objects, e.g. RDD[ACLineSegment], or an empty RDD of the requested type.

    Definition Classes
    CIMRDD
  17. def getOrElse[T](name: String)(implicit arg0: ClassTag[T], spark: SparkSession): RDD[T]

    Get the named RDD or else an empty RDD of the requested type.

    T

    The type of objects contained in the named RDD.

    name

    The name of the RDD, usually the same as the CIM class.

    spark

    The Spark session which persisted the named RDD.

    returns

    The typed RDD, e.g. RDD[T], as either the persisted RDD or an empty one if none was found.

    Definition Classes
    CIMRDD
  18. def get_denormalized(arg: (String, Element)): List[Relation]

    Identify the denormalized 1:N relations of an element.

    arg

    the id and element to check for denormalized relations

    returns

    a list of denormalized 1:N relations

  19. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  20. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  21. implicit val level: StorageLevel
  22. def like(name: String): ((Int, RDD[_])) ⇒ Boolean

    Match names with pattern "name|xxx".

    name

    the name to match

    returns

    true if the RDD name matches the pattern

    Definition Classes
    CIMRDD
  23. implicit val log: Logger
  24. def nameOf[T](implicit arg0: ClassTag[T]): String

    Get the base name of the CIM class.

    T

    The type of the class.

    returns

    The base name of the class.

    Definition Classes
    CIMRDD
  25. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  26. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  27. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  28. def put[T <: Product](rdd: RDD[T], keep: Boolean = false)(implicit arg0: ClassTag[T], arg1: scala.reflect.api.JavaUniverse.TypeTag[T], spark: SparkSession, storage: StorageLevel): Unit

    Persist the typed RDD using the class name, checkpoint it if that is enabled, and create the SQL view for it.

    T

    The type of RDD.

    rdd

    The RDD to persist.

    keep

    If true then don't unpersist any existing RDD with the given name.

    spark

    The Spark session.

    storage

    The storage level for persistence.

    Definition Classes
    CIMRDD
  29. def put[T <: Product](rdd: RDD[T], name: String, keep: Boolean)(implicit arg0: ClassTag[T], arg1: scala.reflect.api.JavaUniverse.TypeTag[T], spark: SparkSession, storage: StorageLevel): Unit

    Persist the typed RDD using the given name, checkpoint it if that is enabled, and create the SQL view for it.

    Since the use of RDD persistence can be problematic for memory reuse, the keep parameter has special processing. When keep is true, any existing RDD of the same name is renamed to name|n, where n is the next available integer in an increasing sequence. When keep is false, all existing RDDs of that form are unpersisted.

    This facilitates the use case where the new RDD depends on the existing one. For example, the Elements RDD is rewritten by about, dedup, join, normalize and topological processing, where each new RDD is derived from the original. When a completely new CIM RDF file is read in, however, keep=false unpersists all the derived RDDs.

    T

    The type of RDD.

    rdd

    The RDD to persist.

    name

    The name under which to persist it.

    keep

    If true then don't unpersist the existing RDD with the given name.

    spark

    The Spark session.

    storage

    The storage level for persistence.

    returns

    The named, viewed and possibly checkpointed original RDD.

    Definition Classes
    CIMRDD
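
The name|n renaming scheme described for keep can be sketched without Spark, over plain strings instead of persisted RDDs. The helpers below only mirror the roles of like, biggest and toInt; their signatures are simplified for illustration, `nextName` is a hypothetical helper, and starting the numbering at 1 is an assumption.

```scala
object NameSketch {
  // Parse an integer, returning None when the text is not a number.
  def toInt(s: String): Option[Int] = scala.util.Try(s.toInt).toOption

  // True if candidate has the form "name|xxx".
  def like(name: String)(candidate: String): Boolean =
    candidate.startsWith(name + "|")

  // Fold step: keep the largest integer suffix seen so far among "name|n" candidates.
  def biggest(name: String)(max: Int, candidate: String): Int =
    if (like(name)(candidate))
      toInt(candidate.substring(name.length + 1)).fold(max)(n => math.max(max, n))
    else
      max

  // Next free "name|n", given the names already in use.
  def nextName(name: String, existing: Seq[String]): String = {
    val largest = existing.foldLeft(0)((max, c) => biggest(name)(max, c))
    s"$name|${largest + 1}"
  }
}
```

With existing names `Elements`, `Elements|1` and `Elements|2`, `NameSketch.nextName("Elements", ...)` yields `Elements|3`; names belonging to other RDDs are ignored by the fold.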
  30. def remove(arg: (String, (Element, Iterable[Relation]))): (String, Element)

    Remove the denormalized fields given by the relations.

    arg

    the id and a tuple of the original element and the relations to remove

    returns

    a tuple of the id and a new (fixed) element

  31. implicit val session: SparkSession
  32. def set(element: Element, child: String, field: String, value: String, setbit: Boolean = true): Element

    Create a new Element with the (possibly child) class's field set to value.

    Recursively searches the hierarchical element for the class named by child and sets its field to value.

    element

    the element to copy with the changed field

    child

    the name of the class with the field

    field

    the field to set

    value

    the value to set the field to

    setbit

    if true set the bitfield of the field, else clear it

    returns

    the new (cloned) element with the field set
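
The recursive copy described above can be sketched on a toy hierarchy. `Node` is a hypothetical stand-in for Element, with an optional `sup` part representing the superclass portion of the object; the bitfield handling controlled by setbit is omitted, so this shows only the search-and-copy idea under those assumptions.

```scala
object SetSketch {
  // Hypothetical stand-in for a hierarchical Element: each node holds its own
  // fields and, optionally, its superclass part (sup).
  final case class Node(
    clazz: String,
    fields: Map[String, String],
    sup: Option[Node] = None)

  // Return a copy of node in which the part whose class is `child` has
  // `field` set to `value`. Non-matching parts are copied and the search
  // continues in sup; if no part matches, the result equals the input.
  def set(node: Node, child: String, field: String, value: String): Node =
    if (node.clazz == child)
      node.copy(fields = node.fields + (field -> value))
    else
      node.copy(sup = node.sup.map(set(_, child, field, value)))
}
```

Because every level is rebuilt with `copy`, the original object is left unchanged, which matches the documented behaviour of returning a new (cloned) element.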

  33. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  34. def test[T](implicit arg0: ClassTag[T], spark: SparkSession): Boolean

    Check for the typed RDD.

    Convenience method where the name of the RDD is the same as the contained class type (the usual case).

    T

    The type of the RDD, e.g. RDD[T].

    spark

    The Spark session which persisted the typed RDD.

    returns

    true if the named RDD exists, false otherwise.

    Definition Classes
    CIMRDD
  35. def test[T](name: String)(implicit arg0: ClassTag[T], spark: SparkSession): Boolean

    Check for the named RDD.

    T

    The type of objects contained in the named RDD.

    name

    The name of the RDD, usually the same as the CIM class.

    spark

    The Spark session which persisted the named RDD.

    returns

    true if the named RDD exists, false otherwise.

    Definition Classes
    CIMRDD
  36. def toInt(s: String): Option[Int]
    Definition Classes
    CIMRDD
  37. def toString(): String
    Definition Classes
    AnyRef → Any
  38. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  39. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  40. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] ) @Deprecated @deprecated
    Deprecated

    See the corresponding Javadoc for more information.
